The Singular Value Decomposition

We’ve now seen the eigenvalue decomposition of a linear transformation (in the form of a matrix). We can think of what we did in that decomposition as breaking up the original transformation into three transformations. If we multiply the rightmost matrix by any vector, and then multiply the middle matrix by that product, and then multiply the leftmost matrix on the right-hand side by that product, we would see that starting vector be transformed three times. That process would be equivalent to multiplying the starting vector by the original matrix.

We can also say that, in that original transformation matrix, which we’ll call \(\mathtt{A}\), we mapped a set of orthogonal vectors, or vectors at right angles to each other, (1, 0) and (0, 1), onto a set of non-orthogonal vectors (0, –2) and (1, –3). We don’t have to multiply each vector by the transformation matrix one at a time. We can multiply the set of vectors, as a matrix, by the transformation matrix, like so.

\(\begin{bmatrix}\mathtt{\,\,\,\,0}&\mathtt{\,\,\,\,1}\\\mathtt{-2}&\mathtt{-3}\end{bmatrix}\begin{bmatrix}\mathtt{1}&\mathtt{0}\\\mathtt{0}&\mathtt{1}\end{bmatrix}=\begin{bmatrix}\mathtt{1}\begin{bmatrix}\mathtt{\,\,\,\,0}\\\mathtt{-2}\end{bmatrix}+\mathtt{0}\begin{bmatrix}\mathtt{\,\,\,\,1}\\\mathtt{-3}\end{bmatrix}&\mathtt{0}\begin{bmatrix}\mathtt{\,\,\,\,0}\\\mathtt{-2}\end{bmatrix}+\mathtt{1}\begin{bmatrix}\mathtt{\,\,\,\,1}\\\mathtt{-3}\end{bmatrix}\end{bmatrix}=\begin{bmatrix}\mathtt{\,\,\,\,0}&\mathtt{\,\,\,\,1}\\\mathtt{-2}&\mathtt{-3}\end{bmatrix}\)

Of course, we just multiplied the original matrix by the identity matrix, so it spit out the original matrix again. But the above interpretation is different, though it gives the same results.

Okay, great, but we humans seem to love our right angles. So, this question arises about linear transformations: could we use any matrix to map some pair of orthogonal vectors (vectors at right angles to each other) to a different set of orthogonal vectors? That is, could we transform our way from one orthogonal “system” to another with any single transformation matrix?

Could we find a pair of orthogonal vectors which, after undergoing our transformation \(\mathtt{A}\), were mapped to a different pair of orthogonal vectors? If we could, that would mean that if we multiplied \(\mathtt{A}\) by a set of orthogonal vectors \(\mathtt{V}\) (i.e., transformed the set of vectors \(\mathtt{V}\) by the action of \(\mathtt{A}\)), it would be equivalent to just starting with a different set of orthogonal vectors already in position (we’ll call these orthogonal vectors \(\mathtt{U}\)) and just scaling them by a scaling matrix \(\mathtt{\Sigma}\). In notation, we’ll write this hypothesis as \[\mathtt{AV=U \Sigma}\]

The word hypothesis is important here. We’re not really writing down something we know is equivalent—that is, we don’t really know if there is a \(\mathtt{V}\), \(\mathtt{U}\), and \(\mathtt{\Sigma}\) which will make this equivalence true. Students (and I) have a hard time not seeing the equals sign as meaning “I know that it is true that these are equivalent.” But that’s not what it means here, and it’s good to get used to that flexibility. What it means here is that we are supposing for the time being that these two products are equivalent. If some contradiction falls out of our algebra (or we get some kind of infinity), we’ll know that the equivalence fails—at least insofar as we want just one matrix to pop out for each unknown matrix.

Let’s first see how this equivalence appears before we dig into figuring out what \(\mathtt{V}\), \(\mathtt{U}\), and \(\mathtt{\Sigma}\) are. That is, let me show you that the matrices \(\mathtt{V}\), \(\mathtt{U}\), and \(\mathtt{\Sigma}\) are possible before we look at what they are. On the left is our original transformation matrix acting on a set of orthogonal vectors \(\mathtt{V}\) (purple and red), so this transformation shows \(\mathtt{AV}\). (You can see that the original (1, 0) and (0, 1) vectors go to where they’re supposed to.)

\[\mathtt{AV\quad\quad\quad\quad =\quad\quad\quad\quad U \Sigma}\]

On the right is a set of two different orthogonal vectors \(\mathtt{\Sigma}\) (purple and red) which start as being aligned to the horizontal and vertical grid lines and then are rotated and reflected (which is a part of scaling) by a matrix \(\mathtt{U}\). And the two transformations are equivalent! (In the demo they are very close.) We can squeeze an orthogonal transformation out of that weird matrix we saw in the eigenvalue decomposition, which took a square and rotated and stretched it into a funky parallelogram.

I note that, above, I called \(\mathtt{\Sigma}\) a scaling matrix, but here I’m using it as just a set of orthogonal vectors. Luckily for us, both things are true. It just depends on how you look at them. A square matrix like the ones we are using can represent either a pair of 2-dimensional vectors or a linear transformation. We get to decide how to interpret the matrix in any given situation.

Getting \(\mathtt{V}\)

To begin to know what these matrices are, we can start by writing the equation above like this. \[\mathtt{A=U \Sigma V^T}\] We can do this because \(\mathtt{V}\) is orthogonal, and I briefly mentioned last time that the transpose of an orthogonal matrix is the same as its inverse. So, in effect, we multiplied both sides of the equation by \(\mathtt{V^{-1}}\), which removed the V from the left-hand side.

Okay, now we pull a little transpose magic, but let’s walk through it. Start by multiplying the expression on each side, on the left, by its transpose. (We’ll circle back to grounding all this some other time.) So, we have \[\mathtt{A^TA=(U\Sigma V^T)^T(U\Sigma V^T)}\] We multiplied \(\mathtt{A}\) by its transpose by multiplying to the left of \(\mathtt{A}\), and we multiplied \(\mathtt{U\Sigma V^T}\) by its transpose by multiplying to the left of \(\mathtt{U\Sigma V^T}\). The product \(\mathtt{A^TA}\) is simple: \[\begin{bmatrix}\mathtt{0}&\mathtt{-2}\\\mathtt{1}&\mathtt{-3}\end{bmatrix}\begin{bmatrix}\mathtt{\,\,\,\,0}&\mathtt{\,\,\,\,1}\\\mathtt{-2}&\mathtt{-3}\end{bmatrix}=\begin{bmatrix}\mathtt{4}&\mathtt{6}\\\mathtt{6}&\mathtt{10}\end{bmatrix}\] But let’s take some time to simplify \(\mathtt{(U\Sigma V^T)^T(U\Sigma V^T)}\). As an example of what to do when we have the transpose of a product of matrices, consider these products.

\[\left(\begin{bmatrix}\mathtt{1}&\mathtt{3}\\\mathtt{2}&\mathtt{4}\end{bmatrix}\begin{bmatrix}\mathtt{4}&\mathtt{6}\\\mathtt{5}&\mathtt{7}\end{bmatrix}\right)^\mathtt{T} \longrightarrow \]
\[\begin{bmatrix}\mathtt{19}&\mathtt{27}\\\mathtt{28}&\mathtt{40}\end{bmatrix}^\mathtt{T}\]
\[\left(\begin{bmatrix}\mathtt{4}&\mathtt{5}\\\mathtt{6}&\mathtt{7}\end{bmatrix}\begin{bmatrix}\mathtt{1}&\mathtt{2}\\\mathtt{3}&\mathtt{4}\end{bmatrix}\right) \longrightarrow \]
\[\begin{bmatrix}\mathtt{19}&\mathtt{28}\\\mathtt{27}&\mathtt{40}\end{bmatrix}\,\,\,\]

Each matrix product on the left is equal to the expression on its right. Test them out for yourself. But the expressions on the right are equal to each other, which means the products on the left are equal to each other. This example shows that \(\mathtt{(AB)^{T}=B^{T}A^{T}}\). The transpose of a product is equal to the product of the transposes, multiplied in reverse order. Again, we’ll ground this later, but this suggests, correctly, that we can rewrite the transposed product above, \(\mathtt{(U\Sigma V^T)^T}\), as \(\mathtt{V\Sigma^{T}U^{T}}\), considering that \(\mathtt{(V^{T})^{T}=V}\). Multiplying that by the remaining part of the right-hand side, \(\mathtt{U\Sigma V^T}\), we get \(\mathtt{V\Sigma^{T}U^{T}U\Sigma V^T}\). Since \(\mathtt{U}\) is orthogonal, its transpose is its inverse, so the \(\mathtt{U}\) terms cancel, leaving us with \(\mathtt{V\Sigma^{T}\Sigma V^T}\). The transpose of a scaling matrix, \(\mathtt{\Sigma}\) in this case, is itself (try it out!), so the middle can be written more simply as \(\mathtt{\Sigma^{2}}\). So, finally (pretty far from finally, actually), we have \[\mathtt{A^{T}A=V\Sigma^{2}V^T}\]

And this is an eigenvalue decomposition for \(\mathtt{A^{T}A}\)! We’ve got a scaling matrix as the lunchmeat, sandwiched by an eigenvector matrix and its inverse (which is the same as its transpose in this case). So, now we can figure out \(\mathtt{V}\) and \(\mathtt{\Sigma}\) by doing the eigenvalue decomposition like we did previously. Here it is: \[\begin{bmatrix}\mathtt{4}&\mathtt{6}\\\mathtt{6}&\mathtt{10}\end{bmatrix}=\begin{bmatrix}\mathtt{\color{purple}{-\frac{521}{991}}}&\mathtt{\color{red}{-\frac{3725}{4379}}}\\\mathtt{\color{purple}{-\frac{3725}{4379}}}&\mathtt{\,\,\,\,\color{red}{\frac{521}{991}}}\end{bmatrix}\begin{bmatrix}\mathtt{\frac{4510}{329}}&\mathtt{0}\\\mathtt{0}&\mathtt{\frac{658}{2255}}\end{bmatrix}\begin{bmatrix}\mathtt{-\frac{521}{991}}&\mathtt{-\frac{3725}{4379}}\\\mathtt{-\frac{3725}{4379}}&\mathtt{\,\,\,\,\frac{521}{991}}\end{bmatrix}\]

The purple and red vectors are our starting vectors from the animation on the left above: (–0.53, –0.85) and (–0.85, 0.53). When we ask what orthogonal starting vectors can we pick such that matrix \(\mathtt{A}\) will transform them to a different pair of orthogonal vectors, matrix \(\mathtt{V}\)—the red and purple vectors above—is our answer. The square roots of the diagonal values of \(\mathtt{\Sigma}\) above (remember, this matrix was squared) are called the singular values of \(\mathtt{A}\). To me, getting \(\mathtt{V}\) is the coolest part of this decomposition. The rest, below, is just gravy.
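
Here is the “getting \(\mathtt{V}\)” step as a short sketch (mine, not from the post; it assumes NumPy is available). The eigenvectors of \(\mathtt{A^{T}A}\) give \(\mathtt{V}\), and the square roots of its eigenvalues give the singular values—though NumPy may order the columns or flip their signs differently than the matrices above.

    import numpy as np

    A = np.array([[0.0, 1.0], [-2.0, -3.0]])

    # Eigenvalue decomposition of A^T A (eigenvalues come back in ascending order)
    eigenvalues, V = np.linalg.eigh(A.T @ A)
    print(V)                        # columns are the orthogonal starting vectors
    print(np.sqrt(eigenvalues))     # [0.540...  3.702...] -- the singular values of A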

And Now for \(\mathtt{U}\) and \(\mathtt{\Sigma}\)

In this case, we multiply both sides of the equation \(\mathtt{A=U\Sigma V^T}\) on the right by the transpose of \(\mathtt{A}\) and get an eigenvalue decomposition of \(\mathtt{AA^T}\). \[\,\,\mathtt{AA^T=(U\Sigma V^T)(U\Sigma V^T)^T=U\Sigma V^{T}V\Sigma^{T}U^T=U\Sigma^{2}U^T}\]

\[\begin{bmatrix}\mathtt{\,\,\,\,1}&\mathtt{-3}\\\mathtt{-3}&\mathtt{\,\,\,\,13}\end{bmatrix}=\begin{bmatrix}\mathtt{-\frac{400}{1741}}&\mathtt{\frac{764}{785}}\\\mathtt{\,\,\,\,\frac{764}{785}}&\mathtt{\frac{400}{1741}}\end{bmatrix}\begin{bmatrix}\mathtt{\color{purple}{\frac{4510}{329}}}&\mathtt{\color{red}{0}}\\\mathtt{\color{purple}{0}}&\mathtt{\color{red}{\frac{658}{2255}}}\end{bmatrix}\begin{bmatrix}\mathtt{-\frac{400}{1741}}&\mathtt{\frac{764}{785}}\\\mathtt{\,\,\,\,\frac{764}{785}}&\mathtt{\frac{400}{1741}}\end{bmatrix}\]

Now the square roots of the purple and red vectors are our starting vectors from the animation on the right above: (3.7, 0) and (0, 0.54). You’ll notice, of course, that \(\mathtt{\Sigma^2}\) is the same as above, so we really found it earlier. This step is only to pin down what \(\mathtt{U}\) is. At any rate, we are done, and we can write down the full singular value decomposition (SVD) of the original matrix \(\mathtt{A}\). \[\quad\begin{bmatrix}\mathtt{\,\,\,\,0}&\mathtt{\,\,\,\,1}\\\mathtt{-2}&\mathtt{-3}\end{bmatrix}=\begin{bmatrix}\mathtt{-\frac{400}{1741}}&\mathtt{\frac{764}{785}}\\\mathtt{\,\,\,\,\frac{764}{785}}&\mathtt{\frac{400}{1741}}\end{bmatrix}\begin{bmatrix}\mathtt{\frac{1655}{447}}&\mathtt{0}\\\mathtt{0}&\mathtt{\frac{894}{1655}}\end{bmatrix}\begin{bmatrix}\mathtt{-\frac{521}{991}}&\mathtt{-\frac{3725}{4379}}\\\mathtt{-\frac{3725}{4379}}&\mathtt{\,\,\,\,\frac{521}{991}}\end{bmatrix}\]
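
And here is a check of the final result (again a sketch of mine, assuming NumPy): the built-in SVD returns the same singular values, and multiplying \(\mathtt{U\Sigma V^T}\) back together reproduces \(\mathtt{A}\).

    import numpy as np

    A = np.array([[0.0, 1.0], [-2.0, -3.0]])

    U, S, Vt = np.linalg.svd(A)
    print(S)                                     # [3.702...  0.540...]
    print(np.allclose(U @ np.diag(S) @ Vt, A))   # True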

We’re finding an orthogonal-to-orthogonal transformation in a hurricane, so it shouldn’t be surprising to get weird numbers. One thing the SVD makes clear (the eigenvalue decomposition does this too) is that linear transformations can be described as combinations of rotations and scalings (the latter of which include reflections) and that’s it.


Inverses and Transposes

This will now be my 22nd post on linear algebra, and I hope it’ll be noticeable, looking at all of them together so far, that we haven’t talked about systems of equations. And there’s a good reason for that: because they suck you down an ugly hole of mindless calculation, meaning-challenged tedium, and fruitless, pointless backward thinking. Systems are awesome and important, and I’m sure someone can make a strong argument for introducing them early, but, for me, they can wait.

The inverses of matrices are pretty interesting. The inverse of a 2 × 2 matrix is the matrix that, when multiplied by the original matrix, gives the identity matrix as the product. The inverse is given by the middle matrix below: \[\begin{bmatrix}\mathtt{a} & \mathtt{b}\\\mathtt{c} & \mathtt{d}\end{bmatrix}\begin{bmatrix}\mathtt{\,\,\,\,\frac{d}{ad-bc}} & \mathtt{-\frac{b}{ad-bc}}\\\mathtt{-\frac{c}{ad-bc}} & \mathtt{\,\,\,\,\frac{a}{ad-bc}}\end{bmatrix}\mathtt{=}\begin{bmatrix}\mathtt{1} & \mathtt{0}\\\mathtt{0} & \mathtt{1}\end{bmatrix}\]
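
In code, the formula amounts to a few lines. Here is a sketch (the helper name inverse_2x2 is just mine, and it assumes NumPy and a nonzero determinant) that builds the inverse from the formula above and checks it against the transformation matrix we have been using:

    import numpy as np

    def inverse_2x2(m):
        # Inverse via the formula above; assumes ad - bc is not zero.
        (a, b), (c, d) = m
        det = a * d - b * c
        return np.array([[d, -b], [-c, a]]) / det

    A = np.array([[0.0, 1.0], [-2.0, -3.0]])
    print(inverse_2x2(A))                               # [[-1.5 -0.5] [ 1.   0. ]]
    print(np.allclose(A @ inverse_2x2(A), np.eye(2)))   # True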

There is an easy-to-follow derivation of this formula here if you’re interested—one that requires only some high school algebra.

You can notice, given this setup, that the identity matrix is its own inverse. Are there any others that fit this description? One thought: we need \(\mathtt{ad-bc}\) to be equal to 1, and we also want \(\mathtt{a=d}\). The values on the other diagonal, \(\mathtt{b}\) and \(\mathtt{c}\), should both be 0, since they have to be equal to their opposites.

In that case, \(\begin{bmatrix}\mathtt{-1} & \mathtt{\,\,\,\,0}\\\mathtt{\,\,\,\,0} & \mathtt{-1}\end{bmatrix}\) would be its own inverse too. What does that mean?

For the identity matrix, it means that if we apply the do-nothing transformation to the home-base matrix, we stay on home base, with the “x” vector pointed to (1, 0) and the “y” vector pointed to (0, 1). The matrix above represents a reflection across the origin. So, applying a reflection across the origin to it should return us to home base.

Wouldn’t a reflection across the x-axis be an inverse of itself, then? Yes! The criteria above for an inverse need to be amended a little to allow for negatives effectively canceling each other out to make positives. \[\begin{bmatrix}\mathtt{1} & \mathtt{\,\,\,\,0}\\\mathtt{0} & \mathtt{-1}\end{bmatrix}\begin{bmatrix}\mathtt{1} & \mathtt{\,\,\,\,0}\\\mathtt{0} & \mathtt{-1}\end{bmatrix}\mathtt{=}\begin{bmatrix}\mathtt{1} & \mathtt{0}\\\mathtt{0} & \mathtt{1}\end{bmatrix}\]

The inverse of a matrix is represented with a superscript \(\mathtt{-1}\). So, the inverse of the matrix \(\mathtt{A}\) is written as \(\mathtt{A^{-1}}\).

The Transpose

The transpose of a matrix is the matrix you get when you take the rows of the matrix and turn them into columns instead. The transpose of a matrix is represented with a superscript \(\mathtt{T}\). So: \[\begin{bmatrix}\mathtt{\,\,\,\,1} & \mathtt{3}\\\mathtt{-2} & \mathtt{4}\end{bmatrix}^{\mathtt{T}}\mathtt{=}\begin{bmatrix}\mathtt{1} & \mathtt{-2}\\\mathtt{3} & \mathtt{\,\,\,\,4}\end{bmatrix}\]

If a 2 × 2 matrix \(\mathtt{V}\) is orthogonal—meaning that its column vectors are perpendicular and each column vector has a length of 1—then its transpose is the same as its inverse, or \(\mathtt{V^{-1}=V^T}\).
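
Here is a quick sanity check of that fact (a sketch of mine, assuming NumPy), using a rotation matrix, which is orthogonal for any angle:

    import numpy as np

    t = 0.3   # any angle; the columns below are perpendicular unit vectors
    V = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])

    print(np.allclose(V.T, np.linalg.inv(V)))   # True
    print(np.allclose(V.T @ V, np.eye(2)))      # True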

Eigenvectors for Reflections

Having just finished a post on eigenvalue decomposition, involving eigenvalues and eigenvectors, I was flipping through this nifty little textbook, where I saw a description of geometric reflections using eigenvectors, and I thought, “Oh my gosh, of course!”

We have seen reflections here, yet back then we had to find something called the foot of a point to figure out the reflection. But we can construct a reflection matrix (a kind of scaling matrix) using only knowledge about eigenvectors.

When we talked about eigenvectors before—those vectors which do not change direction under a transformation (except if they reverse direction)—we were looking for them. But in a reflection, we already know them. In any reflection across a line, we already know that the vector that matches the line of reflection will not change direction, and the vector perpendicular to the line of reflection will only be scaled by \(\mathtt{-1}\). So, both the vector which describes the line of reflection and the vector perpendicular to that line are eigenvectors.

So let’s say we want to reflect point \(\mathtt{C}\) across the line described by the vector \(\mathtt{\alpha(2, -1)}\).

Our eigenvector matrix will be \[\begin{bmatrix}\mathtt{\,\,\,\,2}&\mathtt{-1}\\\mathtt{-1}&\mathtt{-2}\end{bmatrix}\] The second column is the second eigenvector; since any vector perpendicular to the line of reflection will do, we can simply use a vector perpendicular to the first column if we don’t already know the second eigenvector.

Our eigenvalue matrix will be \[\begin{bmatrix}\mathtt{1}&\mathtt{\,\,\,\,0}\\\mathtt{0}&\mathtt{-1}\end{bmatrix}\] since we will keep the first eigenvector—the line of reflection—fixed (eigenvalue of 1) and just flip the second eigenvector (eigenvalue of –1).

As a penultimate step, we calculate the inverse of the eigenvector matrix (we’ll get into inverses fairly soon), and then finally multiply all those matrices together (right to left), as we saw with the eigenvalue decomposition, to get the reflection matrix. \[\begin{bmatrix}\mathtt{\,\,\,\,2}&\mathtt{-1}\\\mathtt{-1}&\mathtt{-2}\end{bmatrix}\begin{bmatrix}\mathtt{1}&\mathtt{\,\,\,\,0}\\\mathtt{0}&\mathtt{-1}\end{bmatrix}\begin{bmatrix}\mathtt{\,\,\,\,\frac{2}{5}}&\mathtt{-\frac{1}{5}}\\\mathtt{-\frac{1}{5}}&\mathtt{-\frac{2}{5}}\end{bmatrix}\mathtt{=}\begin{bmatrix}\mathtt{\,\,\,\,\frac{3}{5}}&\mathtt{-\frac{4}{5}}\\\mathtt{-\frac{4}{5}}&\mathtt{-\frac{3}{5}}\end{bmatrix}\]

The final matrix on the right side is the reflection matrix for reflections across the line represented by the vector \(\mathtt{\alpha(2, -1)}\)—that is, the line through the origin with a slope of \(\mathtt{-\frac{1}{2}}\).
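
Here is the whole construction as a short sketch (assuming NumPy; the point \(\mathtt{C = (1, 3)}\) is just a made-up example, since the post doesn’t give coordinates for \(\mathtt{C}\)):

    import numpy as np

    P = np.array([[ 2.0, -1.0],          # column 1: the line of reflection
                  [-1.0, -2.0]])         # column 2: perpendicular to it
    D = np.diag([1.0, -1.0])             # keep the line, flip the perpendicular

    R = P @ D @ np.linalg.inv(P)         # the reflection matrix
    print(R)                             # [[ 0.6 -0.8] [-0.8 -0.6]]

    C = np.array([1.0, 3.0])             # hypothetical point to reflect
    print(R @ C)                         # [-1.8 -2.6]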

It’s worth playing around with building these matrices and using them to find reflections. A lot of what’s here is pretty obvious, but reflection matrices can come in handy in some non-obvious ways too. The perpendicular vector at the very least has to be on the same side of the line as the point you are reflecting.

Eigenvalue Decomposition

I wrote about eigenvalues and eigenvectors a while back, here. In this post, I’ll show how determining the eigenvalues and eigenvectors of a matrix (2 by 2 in this case) is pretty much all of the work of what’s called eigenvalue decomposition. We’ll start with this matrix, which represents a linear transformation: \[\begin{bmatrix}\mathtt{\,\,\,\,0}&\mathtt{\,\,\,\,1}\\\mathtt{-2}&\mathtt{-3}\end{bmatrix}\]

You can see the action of this matrix at the right (sort of). It sends the (1, 0) vector to (0, –2) and the (0, 1) vector to (1, –3).

The eigenvectors of this transformation are any nonzero vectors that do not change their direction during this transformation, but only scale up or down (or stay the same) by a factor of \(\mathtt{\lambda}\) as a result of the transformation. So,

\(\begin{bmatrix}\mathtt{\,\,\,\,0}&\mathtt{\,\,\,\,1}\\\mathtt{-2}&\mathtt{-3}\end{bmatrix}\begin{bmatrix}\mathtt{r_1}\\\mathtt{r_2}\end{bmatrix}=\lambda \begin{bmatrix}\mathtt{r_1}\\\mathtt{r_2}\end{bmatrix}\)

Using our calculations from the previous post linked above, we calculate the eigenvalues to be \(\mathtt{\lambda_1=-2}\) and \(\mathtt{\lambda_2=-1}\). And the corresponding eigenvectors are of the form \(\mathtt{(r, -2r)}\) and \(\mathtt{(-r, r)}\), respectively.

The red vector (representing the eigenvector \(\mathtt{(-r, r)}\)) at right starts at \(\mathtt{(-1, 1)}\). It is scaled by the eigenvalue of \(\mathtt{-1}\) during the transformation—meaning it simply turns in the opposite direction and its magnitude doesn’t change. Any vector of the form \(\mathtt{(-r, r)}\) will behave this way during this transformation.

The purple vector (representing the eigenvector \(\mathtt{(r, -2r)}\)) starts at \(\mathtt{(-1, 2)}\). It is scaled by the eigenvalue of \(\mathtt{-2}\) during the transformation—meaning it turns in the opposite direction and is scaled by a factor of \(\mathtt{2}\). Any vector of the form \(\mathtt{(r, -2r)}\) will behave this way during this transformation.

And Now for the Decomposition

We can now use the equation above and plug in each eigenvalue and its corresponding eigenvector to create two matrix equations.

\(\begin{bmatrix}\mathtt{\,\,\,\,0}&\mathtt{\,\,\,\,1}\\\mathtt{-2}&\mathtt{-3}\end{bmatrix}\begin{bmatrix}\mathtt{-1}\\\mathtt{\,\,\,\,2}\end{bmatrix}=\mathtt{-2}\begin{bmatrix}\mathtt{-1}\\\mathtt{\,\,\,\,2}\end{bmatrix}\) \[\begin{bmatrix}\mathtt{\,\,\,\,0}&\mathtt{\,\,\,\,1}\\\mathtt{-2}&\mathtt{-3}\end{bmatrix}\begin{bmatrix}\mathtt{-1}\\\mathtt{\,\,\,\,1}\end{bmatrix}=\mathtt{-1}\begin{bmatrix}\mathtt{-1}\\\mathtt{\,\,\,\,1}\end{bmatrix}\]

We can combine the items on the left side of each equation and the items on the right side of each equation into one matrix equation.

\(\begin{bmatrix}\mathtt{\,\,\,\,0}&\mathtt{\,\,\,\,1}\\\mathtt{-2}&\mathtt{-3}\end{bmatrix}\begin{bmatrix}\mathtt{-1}&\mathtt{-1}\\\mathtt{\,\,\,\,2}&\mathtt{\,\,\,\,1}\end{bmatrix}=\begin{bmatrix}\mathtt{-1}&\mathtt{-1}\\\mathtt{\,\,\,\,2}&\mathtt{\,\,\,\,1}\end{bmatrix}\begin{bmatrix}\mathtt{-2}&\mathtt{\,\,\,\,0}\\\mathtt{\,\,\,\,0}&\mathtt{-1}\end{bmatrix}\)

This leaves us with [original matrix][eigenvector matrix] = [eigenvector matrix][eigenvalue matrix]. Finally, we multiply both sides on the right by the inverse of the eigenvector matrix, in order to remove it from the left side of the equation. We can’t remove it from the right side, because matrix multiplication is not commutative. That leaves us with the final decomposition (hat tip to Math the Beautiful for some of the ideas in this post): \[\begin{bmatrix}\mathtt{\,\,\,\,0}&\mathtt{\,\,\,\,1}\\\mathtt{-2}&\mathtt{-3}\end{bmatrix}=\begin{bmatrix}\mathtt{-1}&\mathtt{-1}\\\mathtt{\,\,\,\,2}&\mathtt{\,\,\,\,1}\end{bmatrix}\begin{bmatrix}\mathtt{-2}&\mathtt{\,\,\,\,0}\\\mathtt{\,\,\,\,0}&\mathtt{-1}\end{bmatrix}\begin{bmatrix}\mathtt{\,\,\,\,1}&\mathtt{\,\,\,\,1}\\\mathtt{-2}&\mathtt{-1}\end{bmatrix}\]

Multiplying these three matrices together, or combining the transformations represented by the matrices as we showed here, will result in the original matrix.
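
Here is that check as a sketch in NumPy (not from the post): eigenvector matrix times eigenvalue matrix times the inverse of the eigenvector matrix gives back the original matrix.

    import numpy as np

    P = np.array([[-1.0, -1.0],          # columns are the eigenvectors
                  [ 2.0,  1.0]])
    D = np.diag([-2.0, -1.0])            # the eigenvalues

    print(P @ D @ np.linalg.inv(P))      # [[ 0.  1.] [-2. -3.]] -- the original matrix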

Making Parallelepipeds

We have talked about the cross product (here and here), so let’s move on to doing something marginally interesting with it: we’ll make a rectangular prism, or the more general parallelepiped.

A parallelepiped is shown at the right. It is defined by three vectors: u, v, and w. The cross product vector \(\mathtt{v \wedge w}\) is perpendicular to both v and w, and its magnitude, \(\mathtt{||v \wedge w||}\), is equal to the area of the parallelogram formed by the vectors v and w (something I didn’t mention in the previous two posts).

The perpendicular height of the skewed prism, or parallelepiped, is given by \(\mathtt{||u||\text{cos}(θ)}\), where \(\mathtt{θ}\) is the angle between \(\mathtt{u}\) and the cross product vector \(\mathtt{v \wedge w}\).

The volume of the parallelepiped can thus be written as the area of the base times the height, or \(\mathtt{V = (||v \wedge w||)(||u||\text{cos}(θ))}\).

We can write \(\mathtt{\text{cos}(θ)}\) in this case as \[\mathtt{\text{cos}(θ) = \frac{(v \wedge w) \cdot u}{(||v \wedge w||)(||u||)}}\]

Which means that, after simplifying the volume equation, we’re left with \(\mathtt{V = (v \wedge w) \cdot u}\): so, the dot product of the vector perpendicular to the base and the slanted edge vector \(\mathtt{u}\). The result is a scalar value, of course, for the volume of the parallelepiped, and, because it is a dot product, it is a signed value. We can get negative volumes, which doesn’t mean the volume is negative but tells us something about the orientation of the parallelepiped.
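
Here is the signed-volume calculation as a small sketch (my own made-up edge vectors, assuming NumPy):

    import numpy as np

    # Made-up edge vectors for a parallelepiped
    u = np.array([1.0, 1.0, 2.0])
    v = np.array([3.0, 0.0, 0.0])
    w = np.array([0.0, 2.0, 0.0])

    # Signed volume = (v x w) . u
    print(np.dot(np.cross(v, w), u))     # 12.0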

Creating Some Parallelepipeds

Creating prisms and skewed prisms can be done in Geogebra 3D, but here I’ll show how to create these figures from scratch using Three.js. Click and drag on the window to rotate the scene below. Right click and drag to pan left, right, up, or down. Scroll to zoom in and out.

Click on the Pencil icon above in the lovely Trinket window and navigate to the parallelepiped.js tab to see the code that makes the cubes. You can see that vectors are used to create the vertices (position vectors, so just points). Each face is composed of 2 triangles: (0, 1, 2) means to create a face from the 0th, 1st, and 2nd vertices from the vertices list. Make some copies of the box in the code and play around!

To determine the volume of each cube: \[\left(\begin{bmatrix}\mathtt{1}\\\mathtt{0}\\\mathtt{0}\end{bmatrix} \wedge \begin{bmatrix}\mathtt{0}\\\mathtt{1}\\\mathtt{0}\end{bmatrix}\right) \cdot \begin{bmatrix}\mathtt{0}\\\mathtt{0}\\\mathtt{1}\end{bmatrix} \mathtt{= 1}\]

Making Sense of the Cross Product

Last time, we saw that the cross product is a product of two 3d vectors which delivers a vector perpendicular to those two factor vectors.

The cross product is built using three determinants. To determine the x-component of the cross product from the factor vectors (1, 3, 0) and (–2, 0, 0), you find the determinant of the vectors (3, 0) and (0, 0)—the vectors built from the “not-x” components (y- and z-components) of the factors. Repeat this process for the other two components of the cross product, making sure to reverse the sign of the result for the y-component.

But why does this work? How does the cross product make itself perpendicular to the two factor vectors by just using determinants? Below, we’ll still be using magic, but we get a little closer to making our understanding magic free.

Getting the Result We Want

We can actually start with a result we definitely want from the cross product and go from there. (1) The result we want is that when we determine the cross product of a “pure” x-vector (\(\mathtt{1,0,0}\)) and a “pure” y-vector (\(\mathtt{0,1,0}\)), we should get a “pure” z-vector (\(\mathtt{0,0,1}\)). The same goes for other pairings as well. Thus:

\(\begin{bmatrix}\mathtt{1}\\\mathtt{0}\\\mathtt{0}\end{bmatrix} \otimes \begin{bmatrix}\mathtt{0}\\\mathtt{1}\\\mathtt{0}\end{bmatrix} = \begin{bmatrix}\mathtt{0}\\\mathtt{0}\\\mathtt{1}\end{bmatrix} \quad \quad \) \(\begin{bmatrix}\mathtt{0}\\\mathtt{0}\\\mathtt{1}\end{bmatrix} \otimes \begin{bmatrix}\mathtt{1}\\\mathtt{0}\\\mathtt{0}\end{bmatrix} = \begin{bmatrix}\mathtt{0}\\\mathtt{1}\\\mathtt{0}\end{bmatrix} \quad \quad \begin{bmatrix}\mathtt{0}\\\mathtt{1}\\\mathtt{0}\end{bmatrix} \otimes \begin{bmatrix}\mathtt{0}\\\mathtt{0}\\\mathtt{1}\end{bmatrix} = \begin{bmatrix}\mathtt{1}\\\mathtt{0}\\\mathtt{0}\end{bmatrix} \)

A simpler way to write this is to use \(\mathtt{i}\), \(\mathtt{j}\), and \(\mathtt{k}\) to represent the pure x-, y-, and z-vectors, respectively. So, \(\mathtt{i \otimes j = k}\) and so on.

Another thing we want—and here comes some (more) magic—is for (2) the cross product to be antisymmetric, which means that when we change the order of the factors, the cross product’s sign flips but its magnitude stays the same. So, we want \(\mathtt{i \otimes j = k}\), but then \(\mathtt{j \otimes i = -k}\). And, as before, the same goes for the other pairings as well: \(\mathtt{j \otimes k = i}\), \(\mathtt{k \otimes j = -i}\), \(\mathtt{k \otimes i = j}\), \(\mathtt{i \otimes k = -j}\). This property allows us to use the cross product in order to get a sense of how two vectors are oriented relative to each other in 3d space.

With those two magic beans in hand (and a third and fourth to come in just a second), we can go back to notice that any vector can be written as a linear combination of \(\mathtt{i}\), \(\mathtt{j}\), and \(\mathtt{k}\). The two vectors at the end of the previous post on this topic, for example, (0, 4, 1) and (–2, 0, 0) can be written as \(\mathtt{4j + k}\) and \(\mathtt{-2i}\), respectively.

The cross product, then, of any two 3d vectors \(\mathtt{v = (v_x,v_y,v_z)}\) and \(\mathtt{w = (w_x,w_y,w_z)}\) can be written as: \[\mathtt{(v_{x}i+v_{y}j+v_{z}k) \otimes (w_{x}i+w_{y}j+w_{z}k)}\]

For the final bits of magic, we (3) assume that the cross product distributes over addition as we would expect it to, and (4) decide that the cross product of a “pure” vector (i, j, or k) with itself is 0. If that all works out, then we get this: \[\mathtt{v_{x}w_{x}i^2 + v_{x}w_{y}ij + v_{x}w_{z}ik + v_{y}w_{x}ji + v_{y}w_{y}j^2 + v_{y}w_{z}jk + v_{z}w_{x}ki + v_{z}w_{y}kj + v_{z}w_{z}k^2}\]

Then, by applying the ideas in (1) and (4), we simplify to this: \[\mathtt{(v_{y}w_{z} - v_{z}w_{y})i + (v_{z}w_{x} - v_{x}w_{z})j + (v_{x}w_{y} - v_{y}w_{x})k}\]

And that’s our cross product vector that we saw before. The cross product of the two vectors mentioned above, (0, 4, 1) and (–2, 0, 0), would be the vector (0, –2, 8).
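
To tie this back to the determinant description, here is a sketch (the helper name is mine, nothing official) that builds the cross product component by component and compares it with NumPy’s built-in version, using the ramp vectors above:

    import numpy as np

    def cross_from_determinants(v, w):
        # One 2x2 determinant per component; the middle one gets its sign flipped.
        return np.array([v[1]*w[2] - v[2]*w[1],
                         v[2]*w[0] - v[0]*w[2],
                         v[0]*w[1] - v[1]*w[0]])

    v = np.array([0.0, 4.0, 1.0])
    w = np.array([-2.0, 0.0, 0.0])
    print(cross_from_determinants(v, w))   # [ 0. -2.  8.]
    print(np.cross(v, w))                  # [ 0. -2.  8.]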

The Cross Product

The cross product of two vectors is another vector (whereas the dot product was just another number—a scalar). The cross product vector is perpendicular to both of the factor vectors. Typically, books will say that we need 3d vectors (vectors with 3 components) to talk about the cross product, which is true, sort of, but we can give 3d vectors a third component of zero to see how the cross product works with 2d-ish vectors, like below.

At the right, we show the vector (1, 3, 0), the vector (–2, 0, 0), and the cross product of those two vectors (in that order), which is the cross product vector (0, 0, 6).

Since we’re calling it a product, we’ll want to know how we built that product. So, let’s talk about that.

Deconstructing the Cross Product

The cross product vector is built using three determinants, as shown below.

For the x-component of the cross product vector, we deconstruct the factor vectors into 2d vectors made up of the y- and z-components. Then we find the determinant of those two 2d vectors (the area of the parallelogram they form, if any). We do the same for each of the other components of the cross product vector—if we’re working on the y-component of the cross product vector, then we create two 2d vectors from the x- and z-components of the factor vectors and find their parallelogram area, or determinant. And the same for the third component of the cross product vector. (Notice, though, that we reverse the sign of the second component of the cross product vector. It’s not evident here, because it’s zero.)

We’ll look more into the intuition behind this later. It is not immediately obvious why three simple area calculations (the determinants) should be able to deliver a vector that is exactly perpendicular to the two factor vectors (which is an indication that we don’t know everything there is to know about the seemingly dirt-simple concept of area!). But the cross product has a lot of fascinating connections to and uses in physics and engineering—and computer graphics.

I’ll leave you with this exercise to determine the cross product, or a vector perpendicular to this little ramp. The blue vector is (0, 4, 1), and the red vector is (–2, 0, 0).


Vectors and Complex Numbers

Vectors share a lot of characteristics with complex numbers. They are both multi-dimensional objects, so to speak. Position vectors with 2 components \(\mathtt{(x_1, x_2)}\) behave in much the same way geometrically as complex numbers \(\mathtt{a + bi}\). At the right, you can see that Geogebra displays the position vectors as arrows and the complex numbers as points. In some sense, though, we could use both the vector and the complex number to refer to the same object if we wanted.

You’ll have no problem finding out about how to multiply two complex numbers, though a similar product result for multiplying 2 vectors seems to be hard to come by. For complex numbers, we just use the Distributive Property: \[\mathtt{(a + bi)(c + di) = ac + adi + bci + bdi^2 = ac – bd + (ad + bc)i}\] In fact, we are told that we can think of multiplying complex numbers as rotating points on the complex plane. Since \(\mathtt{0 + i}\) is at a 90° angle to the x-axis, multiplying \(\mathtt{3 + 2i}\) by \(\mathtt{0 + i}\) will rotate the point \(\mathtt{3 + 2i}\) ninety degrees about the origin: \[\mathtt{(3 + 2i)(0 + 1i) = (3)(0) + (3)(1)i + (2)(0)i + (2)(1)i^2 = -2 + 3i}\]

We’ll get the same result after changing the order of the factors too, of course, since complex multiplication is commutative, but now we have to say that \(\mathtt{0 + i}\) was not only rotated (by the angle that \(\mathtt{3 + 2i}\) makes with the positive x-axis) but scaled as well.

By what was it scaled? Well, since the straight vertical vector has a length of 1, it was scaled by the length of the vector represented by the complex number \(\mathtt{3 + 2i}\), or \(\mathtt{\sqrt{13}}\).
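
Python’s built-in complex numbers make this easy to try out (a quick sketch of mine, not from the post):

    import cmath, math

    z = 3 + 2j
    print(z * 1j)                    # (-2+3j): 3 + 2i rotated 90 degrees about the origin

    # Multiplying by a unit complex number cos(t) + i*sin(t) rotates by angle t.
    t = math.radians(19)
    print(z * cmath.exp(1j * t))     # 3 + 2i rotated 19 degrees counterclockwise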

Multiplying Vectors in the Same Way

It seems that we can multiply vectors in the same way that you can multiply complex numbers, though I’m hard pressed to find a source which describes this possibility.

That is, we can rotate the position vector (a, b) so many degrees (\(\mathtt{tan^{-1}(\frac{d}{c})}\)) counterclockwise by multiplying by the position vector (c, d) of unit length, like so: \[\begin{bmatrix}\mathtt{a}\\\mathtt{b}\end{bmatrix}\begin{bmatrix}\mathtt{c}\\\mathtt{d}\end{bmatrix} = \begin{bmatrix}\mathtt{ac – bd}\\\mathtt{ad + bc}\end{bmatrix}\]

Want to rotate the vector (5, 2) by 19°? First we determine the unit vector which forms a 19° angle with the x-axis. That’s (cos(19°), sin(19°)). Then multiply as above:

\[\begin{bmatrix}\mathtt{5}\\\mathtt{2}\end{bmatrix}\begin{bmatrix}\mathtt{cos(19^\circ)}\\\mathtt{sin(19^\circ)}\end{bmatrix} = \begin{bmatrix}\mathtt{5cos(19^\circ) – 2sin(19^\circ)}\\\mathtt{5sin(19^\circ) + 2cos(19^\circ)}\end{bmatrix}\]
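
And here is the same calculation as a sketch in Python (the rotate helper is just my own illustration):

    import math

    def rotate(a, b, degrees):
        # Multiply (a, b) by the unit vector (cos t, sin t), complex-number style.
        c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
        return (a * c - b * s, a * s + b * c)

    print(rotate(5, 2, 19))          # approximately (4.08, 3.52)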

Seems like a perfectly satisfactory way of multiplying vectors to me. We have some issues with undefined values and generality, etc., but for cobbling some things together, multiplying vectors in this crazy way seems easier to think about than hauling out full-blown matrices to do the job.

Word Vectors and Dot Products

A really cool thing about vectors is that they are used to represent and compare a lot of different things that don’t, at first glance, appear to be mathematically representable or comparable. And a lot of this power comes from working with vectors that are “bigger” than the 2-component vectors we have looked at thus far.

\(\begin{bmatrix}\mathtt{1}\\\mathtt{0}\\\mathtt{1}\\\mathtt{0}\\\mathtt{1}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{1}\\\mathtt{0}\\\mathtt{0}\\\mathtt{1}\\\mathtt{0}\\\mathtt{1}\\\mathtt{0}\\\mathtt{1}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\end{bmatrix}\) \(\begin{bmatrix}\mathtt{1}\\\mathtt{0}\\\mathtt{1}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{1}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{1}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\end{bmatrix}\)

For example, we could have a vector with 26 components. Some would say that this is a vector with 26 dimensions, but I don’t see the need to talk about dimensions—for the most part, if we’re talking about 26-component vectors, we’re probably not talking about dimensions in any helpful sense, except to help us look smart.

At the right are two possible 26-component vectors. We can say that the vector on the left represents the word pelican. The vector on the right represents the word clap. Each component of the vectors is a representation of a letter from a to z in the word. So, each vector may not be unique to the word it represents. The one on the left could also be the vector for capelin, a kind of fish, or panicle, which is a loose cluster of flowers.

The words, however, are similar in that the longer word pelican contains all the letters that the shorter word clap contains. We might be able to see this similarity show up if we measure the cosine between the two vectors. The cosine can be had, recall, by determining the dot product of the vectors (multiply each pair of corresponding elements and add all the products) and dividing the result by the product of their lengths (the lengths being, in each case, the square root of \(\mathtt{component_1^2 + component_2^2 + \ldots}\)). What we get for the two vectors on the right is: \[\mathtt{\frac{4}{\sqrt{7}\sqrt{4}} \approx 0.756}\]

This is fairly close to 1. The angle measure between the two words would be about 41°. Now let’s compare pelican and plenty. These two words are also fairly similar—there is the same 4-letter overlap between the words—but should yield a smaller cosine because of the divergent letters. Confirm for yourself, but for these two words I get: \[\mathtt{\frac{4}{\sqrt{7}\sqrt{6}} \approx 0.617\quad\quad}\]

And that’s about a 52-degree angle between the words. An even more different word, like sausage (the a and s components have 2s), produces a cosine (with pelican) of about 0.342, which is about a 70° angle.
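
Here is a sketch that reproduces these cosines (my own helper functions, using only the Python standard library):

    import math
    from collections import Counter

    def letter_vector(word):
        # 26 components: how many times each letter a-z appears in the word
        counts = Counter(word)
        return [counts.get(ch, 0) for ch in "abcdefghijklmnopqrstuvwxyz"]

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

    print(cosine(letter_vector("pelican"), letter_vector("clap")))      # 0.755...
    print(cosine(letter_vector("pelican"), letter_vector("plenty")))    # 0.617...
    print(cosine(letter_vector("pelican"), letter_vector("sausage")))   # 0.341...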

So, we see that with vectors we can apply a numeric measurement to the similarity of words, with anagrams having cosines of 1 and words sharing no letters at all as being at right angles to each other (having a dot product and cosine of 0).

Combining Matrix Transformations

Something that stands out in my mind as I have learned more linear algebra recently is how much more sane it feels to do a lot of forward thinking before getting into the backward “solving” thinking—to, for example, create a bunch of linear transformations and strengthen my ability to do stuff with the mathematics before throwing a wrench in the works and having me wonder what would happen if I didn’t know the starting vectors.

So, we’ll continue that forward thinking here by looking at the effect of combining transformations. Or, if we think about a 2 × 2 matrix as representing a linear transformation, then we’ll look at combining matrices.

How about this one, then? This is a transformation in which the (1, 0) basis vector goes to \(\mathtt{(1, \frac{1}{3})}\) and the (0, 1) basis vector goes to (–2, 1). You can see the effect this transformation has on the unshaded triangle (producing the shaded triangle).

Before we combine this with another transformation, notice that the horizontal base of the original triangle, which was parallel to the horizontal basis vector, appears to be, in its transformed form, now parallel to the transformed horizontal basis vector. Let’s test this. \[\begin{bmatrix}\mathtt{1} & \mathtt{-2}\\\mathtt{\frac{1}{3}} & \mathtt{\,\,\,\,1}\end{bmatrix}\begin{bmatrix}\mathtt{2}\\\mathtt{2}\end{bmatrix} = \begin{bmatrix}\mathtt{-2}\\\mathtt{2\frac{2}{3}}\end{bmatrix} \quad\text{and}\quad\begin{bmatrix}\mathtt{1} & \mathtt{-2}\\\mathtt{\frac{1}{3}} & \mathtt{\,\,\,\,1}\end{bmatrix}\begin{bmatrix}\mathtt{4}\\\mathtt{2}\end{bmatrix} = \begin{bmatrix}\mathtt{0}\\\mathtt{3\frac{1}{3}}\end{bmatrix}\]

The slope of the originally horizontal but now transformed base is, then, \(\mathtt{\frac{3\frac{1}{3}\, - \,2\frac{2}{3}}{0\,-\,(-2)} = \frac{\frac{2}{3}}{2} = \frac{1}{3}}\), which is the same slope as the transformed horizontal basis vector \(\mathtt{(1, \frac{1}{3})}\).

Transform the Transformation

Okay, so let’s transform the transformation, as shown at the right, under this matrix: \[\begin{bmatrix}\mathtt{-1} & \mathtt{0}\\\mathtt{\,\,\,\,0} & \mathtt{\frac{1}{2}}\end{bmatrix}\]

Is it possible to multiply the two matrices to get our final (purple) transformation? Here’s how to multiply the two matrices and the result: \[\begin{bmatrix}\mathtt{-1} & \mathtt{0}\\\mathtt{\,\,\,\,0} & \mathtt{\frac{1}{2}}\end{bmatrix}\begin{bmatrix}\mathtt{1} & \mathtt{-2}\\\mathtt{\frac{1}{3}} & \mathtt{\,\,\,\,1}\end{bmatrix}\mathtt{=}\begin{bmatrix}\mathtt{-1} & \mathtt{\,\,\,\,2}\\\mathtt{\,\,\,\,\frac{1}{6}} & \mathtt{\,\,\,\,\frac{1}{2}}\end{bmatrix}\]

You should be able to check that, yes indeed, the last matrix takes the original triangle to the purple triangle. You should also be able to test that reversing the order of the multiplication of the two matrices changes the answer completely, so matrix multiplication is not commutative. Notice also that the determinant is approximately \(\mathtt{-0.8333…}\). This tells us that the area of the new triangle is \(\mathtt{\frac{5}{6}}\) that of the original. And the negative indicates the reflection the triangle underwent. The determinant of the first matrix is \(\mathtt{-\frac{1}{2}}\), and that of the second is \(\mathtt{\frac{5}{3}}\). Multiply those together and you get the determinant of the combined transformations matrix.
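
For completeness, here is a quick sketch (mine, assuming NumPy) that combines the two transformation matrices and checks the determinant claim:

    import numpy as np

    M1 = np.array([[-1.0, 0.0],
                   [ 0.0, 0.5]])       # the first (leftmost) matrix in the product
    M2 = np.array([[1.0, -2.0],
                   [1/3,  1.0]])       # the second: the original transformation

    combined = M1 @ M2                 # the original transformation happens first
    print(combined)                    # [[-1.    2.  ] [ 0.1666...  0.5 ]]

    # The determinant of the product is the product of the determinants.
    print(np.linalg.det(M1) * np.linalg.det(M2))   # -0.8333...
    print(np.linalg.det(combined))                 # -0.8333...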