The Singular Value Decomposition

We’ve now seen the eigenvalue decomposition of a linear transformation (in the form of a matrix). We can think of what we did in that decomposition as breaking up the original transformation into three transformations. If we multiply the rightmost matrix by any vector, and then multiply the middle matrix by that product, and then multiply the leftmost matrix on the right-hand side by that product, we would see that starting vector be transformed three times. That process would be equivalent to multiplying the starting vector by the original matrix.

We can also say that, in that original transformation matrix, which we’ll call \(\mathtt{A}\), we mapped a set of orthogonal vectors, or vectors at right angles to each other, (1, 0) and (0, 1), onto a set of non-orthogonal vectors (0, –2) and (1, –3). We don’t have to multiply each vector by the transformation matrix one at a time. We can multiply the set of vectors, as a matrix, by the transformation matrix, like so.


Of course, we just multiplied the original matrix by the identity matrix, so it spit out the original matrix again. But the above interpretation is different, though it gives the same results.

Okay, great, but we humans seem to love our right angles. So, this question arises about linear transformations: could we use any matrix to map some pair of orthogonal vectors (vectors at right angles to each other) to a different set of orthogonal vectors? That is, could we transform our way from one orthogonal “system” to another with any single transformation matrix?

Could we find a pair of orthogonal vectors which, after undergoing our transformation \(\mathtt{A}\), were mapped to a different pair of orthogonal vectors? If we could, that would mean that if we multiplied \(\mathtt{A}\) by a set of orthogonal vectors \(\mathtt{V}\) (i.e., transformed the set of vectors \(\mathtt{V}\) by the action of \(\mathtt{A}\)), it would be equivalent to just starting with a different set of orthogonal vectors already in position (we’ll call these orthogonal vectors \(\mathtt{U}\)) and just scaling them by a scaling matrix \(\mathtt{\Sigma}\). In notation, we’ll write this hypothesis as \[\mathtt{AV=U \Sigma}\]

The word hypothesis is important here. We’re not really writing down something we know is equivalent—that is, we don’t really know if there is a \(\mathtt{V}\), \(\mathtt{U}\), and \(\mathtt{\Sigma}\) which will make this equivalence true. Students (and I) have a hard time not seeing the equals sign as meaning “I know that it is true that these are equivalent.” But that’s not what it means here, and it’s good to get used to that flexibility. What it means here is that we are supposing for the time being that these two products are equivalent. If some contradiction falls out of our algebra (or we get some kind of infinity), we’ll know that the equivalence fails—at least insofar as we want just one matrix to pop out for each unknown matrix.

Let’s first see how this equivalence appears before we dig into figuring out what \(\mathtt{V}\), \(\mathtt{U}\), and \(\mathtt{\Sigma}\) are. That is, let me show you that the matrices \(\mathtt{V}\), \(\mathtt{U}\), and \(\mathtt{\Sigma}\) are possible before we look at what they are. On the left is our original transformation matrix acting on a set of orthogonal vectors \(\mathtt{V}\) (purple and red), so this transformation shows \(\mathtt{AV}\). (You can see that the original (1, 0) and (0, 1) vectors go to where they’re supposed to.)

\[\mathtt{AV\quad\quad\quad\quad =\quad\quad\quad\quad U \Sigma}\]

On the right is a set of two different orthogonal vectors \(\mathtt{\Sigma}\) (purple and red) which start as being aligned to the horizontal and vertical grid lines and then are rotated and reflected (which is a part of scaling) by a matrix \(\mathtt{U}\). And the two transformations are equivalent! (In the demo they are very close.) We can squeeze out an orthogonal transformation out of that weird matrix we saw in the eigenvalue decomposition which took a square and rotated and stretched it into a funky parallelogram.

I note that, above, I called \(\mathtt{\Sigma}\) a scaling matrix, but here I’m using it as just a set of orthogonal vectors. Luckily for us, both things are true. It just depends on how you look at them. A square matrix like the ones we are using represents both a pair of 2-dimensional vectors or a linear transformation. We get to decide how to interpret the matrix in any given situation.

Getting \(\mathtt{V}\)

To begin to know what these matrices are, we can start by writing the equation above like this. \[\mathtt{A=U \Sigma V^T}\] We can do this because \(\mathtt{V}\) is orthogonal, and I briefly mentioned last time that the transpose of an orthogonal matrix is the same as its inverse. So, in effect, we multiplied both sides of the equation by \(\mathtt{V^{-1}}\), which removed the V from the left-hand side.

Okay, now we pull a little transpose magic, but let’s walk through it. Start by multiplying the expression on each side, on the left, by its transpose. (We’ll circle back to grounding all this some other time.) So, we have \[\mathtt{A^TA=(U\Sigma V^T)^T(U\Sigma V^T)}\] We multiplied \(\mathtt{A}\) by its transpose by multiplying to the left of \(\mathtt{A}\), and we multiplied \(\mathtt{U\Sigma V^T}\) by its transpose by multiplying to the left of \(\mathtt{U\Sigma V^T}\). The product \(\mathtt{A^TA}\) is simple: \[\begin{bmatrix}\mathtt{0}&\mathtt{-2}\\\mathtt{1}&\mathtt{-3}\end{bmatrix}\begin{bmatrix}\mathtt{\,\,\,\,0}&\mathtt{\,\,\,\,1}\\\mathtt{-2}&\mathtt{-3}\end{bmatrix}=\begin{bmatrix}\mathtt{4}&\mathtt{6}\\\mathtt{6}&\mathtt{10}\end{bmatrix}\] But let’s take some time to simplify \(\mathtt{(U\Sigma V^T)^T(U\Sigma V^T)}\). As an example of what to do when we have the transpose of a product of matrices, consider these products.

\[\left(\begin{bmatrix}\mathtt{1}&\mathtt{3}\\\mathtt{2}&\mathtt{4}\end{bmatrix}\begin{bmatrix}\mathtt{4}&\mathtt{6}\\\mathtt{5}&\mathtt{7}\end{bmatrix}\right)^\mathtt{T} \longrightarrow \]
\[\left(\begin{bmatrix}\mathtt{4}&\mathtt{5}\\\mathtt{6}&\mathtt{7}\end{bmatrix}\begin{bmatrix}\mathtt{1}&\mathtt{2}\\\mathtt{3}&\mathtt{4}\end{bmatrix}\right)^\mathtt{T} \longrightarrow \]

Each matrix product on the left is equal to the expression on its right. Test them out for yourself. But the expressions on the right are equal to each other, which means the products on the left are equal to each other. This example shows that \(\mathtt{(AB)^{T}=B^{T}A^{T}}\). The transpose of a product is equal to the product of the transposes, multiplied in reverse order. Again, we’ll ground this later, but this suggests, correctly, that we can rewrite the transposed product above, \(\mathtt{(U\Sigma V^T)^T}\), as \(\mathtt{V\Sigma^{T}U^{T}}\), considering that \(\mathtt{(V^{T})^{T}=V}\). Multiplying that by the remaining part of the right-hand side, \(\mathtt{U\Sigma V^T}\), we get \(\mathtt{V\Sigma^{T}U^{T}U\Sigma V^T}\). Since \(\mathtt{U}\) is orthogonal, its transpose is its inverse, so the \(\mathtt{U}\) terms cancel, leaving us with \(\mathtt{V\Sigma^{T}\Sigma V^T}\). The transpose of a scaling matrix, \(\mathtt{\Sigma}\) in this case, is itself (try it out!), so the middle can be written more simply as \(\mathtt{\Sigma^{2}}\). So, finally (pretty far from finally, actually), we have \[\mathtt{A^{T}A=V\Sigma^{2}V^T}\]

And this is an eigenvalue decomposition for \(\mathtt{A^{T}A}\)! We’ve got a scaling matrix as the lunchmeat, sandwiched by a vector and its inverse (which is the same as the transpose in this case). So, now we can figure out \(\mathtt{V}\) and \(\mathtt{\Sigma}\) by doing the eigenvalue decomposition like we did previously. Here it is: \[\begin{bmatrix}\mathtt{4}&\mathtt{6}\\\mathtt{6}&\mathtt{10}\end{bmatrix}=\begin{bmatrix}\mathtt{\color{purple}{-\frac{521}{991}}}&\mathtt{\color{red}{-\frac{3725}{4379}}}\\\mathtt{\color{purple}{-\frac{3725}{4379}}}&\mathtt{\,\,\,\,\color{red}{\frac{521}{991}}}\end{bmatrix}\begin{bmatrix}\mathtt{\frac{4510}{329}}&\mathtt{0}\\\mathtt{0}&\mathtt{\frac{658}{2255}}\end{bmatrix}\begin{bmatrix}\mathtt{-\frac{521}{991}}&\mathtt{-\frac{3725}{4379}}\\\mathtt{-\frac{3725}{4379}}&\mathtt{\,\,\,\,\frac{521}{991}}\end{bmatrix}\]

The purple and red vectors are our starting vectors from the animation on the left above: (–0.53, –0.85) and (–0.85, 0.53). When we ask what orthogonal starting vectors can we pick such that matrix \(\mathtt{A}\) will transform them to a different pair of orthogonal vectors, matrix \(\mathtt{V}\)—the red and purple vectors above—is our answer. The square roots of the diagonal values of \(\mathtt{\Sigma}\) above (remember, this matrix was squared) are called the singular values of \(\mathtt{A}\). To me, getting \(\mathtt{V}\) is the coolest part of this decomposition. The rest, below, is just gravy.

And Now for \(\mathtt{U}\) and \(\mathtt{\Sigma}\)

In this case, we multiply both sides of the equation \(\mathtt{A=U\Sigma V^T}\) on the right by the transpose of \(\mathtt{A}\) and get an eigenvalue decomposition of \(\mathtt{AA^T}\). \[\,\,\mathtt{AA^T=(U\Sigma V^T)(U\Sigma V^T)^T=U\Sigma V^{T}V\Sigma^{T}U^T=U\Sigma^{2}U^T}\]


Now the square roots of the purple and red vectors are our starting vectors from the animation on the right above: (3.7, 0) and (0, 0.54). You’ll notice, of course, that \(\mathtt{\Sigma^2}\) is the same as above, so we really found it earlier. This step is only to pin down what \(\mathtt{U}\) is. At any rate, we are done, and we can write down the full singular value decomposition (SVD) of the original matrix \(\mathtt{A}\). \[\quad\begin{bmatrix}\mathtt{\,\,\,\,0}&\mathtt{\,\,\,\,1}\\\mathtt{-2}&\mathtt{-3}\end{bmatrix}=\begin{bmatrix}\mathtt{-\frac{400}{1741}}&\mathtt{\frac{764}{785}}\\\mathtt{\,\,\,\,\frac{764}{785}}&\mathtt{\frac{400}{1741}}\end{bmatrix}\begin{bmatrix}\mathtt{\frac{1655}{447}}&\mathtt{0}\\\mathtt{0}&\mathtt{\frac{894}{1655}}\end{bmatrix}\begin{bmatrix}\mathtt{-\frac{521}{991}}&\mathtt{-\frac{3725}{4379}}\\\mathtt{-\frac{3725}{4379}}&\mathtt{\,\,\,\,\frac{521}{991}}\end{bmatrix}\]

We’re finding an orthogonal-to-orthogonal transformation in a hurricane, so it shouldn’t be surprising to get weird numbers. One thing the SVD makes clear (the eigenvalue decomposition does this too) is that linear transformations can be described as combinations of rotations and scalings (the latter of which include reflections) and that’s it.

Published by

Josh Fisher

Instructional designer, software development in K-12 mathematics education.