Making Sense of the Cross Product

Last time, we saw that the cross product is a product of two 3d vectors which delivers a vector perpendicular to those two factor vectors.

The cross product is built using three determinants. To determine the x-component of the cross product from the factor vectors (1, 3, 0) and (–2, 0, 0), you find the determinant of the vectors (3, 0) and (0, 0)—the vectors built from the “not-x” components (y- and z-components) of the factors. Repeat this process for the other two components of the cross product, making sure to reverse the sign of the result for the y-component.

But why does this work? How does the cross product make itself perpendicular to the two factor vectors by just using determinants? Below, we’ll still be using magic, but we get a little closer to making our understanding magic free.

Getting the Result We Want

We can actually start with a result we definitely want from the cross product and go from there. (1) The result we want is that when we determine the cross product of a “pure” x-vector (\(\mathtt{1,0,0}\)) and a “pure” y-vector (\(\mathtt{0,1,0}\)), we should get a “pure” z-vector (\(\mathtt{0,0,1}\)). The same goes for other pairings as well. Thus:

\(\begin{bmatrix}\mathtt{1}\\\mathtt{0}\\\mathtt{0}\end{bmatrix} \otimes \begin{bmatrix}\mathtt{0}\\\mathtt{1}\\\mathtt{0}\end{bmatrix} = \begin{bmatrix}\mathtt{0}\\\mathtt{0}\\\mathtt{1}\end{bmatrix} \quad \quad \) \(\begin{bmatrix}\mathtt{0}\\\mathtt{1}\\\mathtt{0}\end{bmatrix} \otimes \begin{bmatrix}\mathtt{0}\\\mathtt{0}\\\mathtt{1}\end{bmatrix} = \begin{bmatrix}\mathtt{1}\\\mathtt{0}\\\mathtt{0}\end{bmatrix} \quad \quad \begin{bmatrix}\mathtt{0}\\\mathtt{0}\\\mathtt{1}\end{bmatrix} \otimes \begin{bmatrix}\mathtt{1}\\\mathtt{0}\\\mathtt{0}\end{bmatrix} = \begin{bmatrix}\mathtt{0}\\\mathtt{1}\\\mathtt{0}\end{bmatrix} \)

A simpler way to write this is to use \(\mathtt{i}\), \(\mathtt{j}\), and \(\mathtt{k}\) to represent the pure x-, y-, and z-vectors, respectively. So, \(\mathtt{i \otimes j = k}\) and so on.

Another thing we want—and here comes some (more) magic—is for (2) the cross product to be antisymmetric, which means that when we change the order of the factors, the cross product’s sign changes but its magnitude does not. So, we want \(\mathtt{i \otimes j = k}\), but then \(\mathtt{j \otimes i = -k}\). And, as before, the same goes for the other pairings as well: \(\mathtt{j \otimes k = i}\), \(\mathtt{k \otimes j = -i}\), \(\mathtt{k \otimes i = j}\), \(\mathtt{i \otimes k = -j}\). This property allows us to use the cross product to get a sense of how two vectors are oriented relative to each other in 3d space.

With those two magic beans in hand (and a third and fourth to come in just a second), we can go back and notice that any vector can be written as a linear combination of \(\mathtt{i}\), \(\mathtt{j}\), and \(\mathtt{k}\). The two vectors at the end of the previous post on this topic, for example, (0, 4, 1) and (–2, 0, 0), can be written as \(\mathtt{4j + k}\) and \(\mathtt{-2i}\), respectively.

The cross product, then, of any two 3d vectors \(\mathtt{v = (v_x,v_y,v_z)}\) and \(\mathtt{w = (w_x,w_y,w_z)}\) can be written as: \[\mathtt{(v_{x}i+v_{y}j+v_{z}k) \otimes (w_{x}i+w_{y}j+w_{z}k)}\]

For the final bits of magic, we (3) assume that the cross product distributes over addition as we would expect it to, and (4) decide that the cross product of a “pure” vector (i, j, or k) with itself is 0. If that all works out, then we get this: \[\mathtt{v_{x}w_{x}i^2 + v_{x}w_{y}ij + v_{x}w_{z}ik + v_{y}w_{x}ji + v_{y}w_{y}j^2 + v_{y}w_{z}jk + v_{z}w_{x}ki + v_{z}w_{y}kj + v_{z}w_{z}k^2}\]

Then, by applying the ideas in (1) and (4), we simplify to this: \[\mathtt{(v_{y}w_{z} – v_{z}w_{y})i + (v_{z}w_{x} – v_{x}w_{z})j + (v_{x}w_{y} – v_{y}w_{x})k}\]

And that’s our cross product vector that we saw before. The cross product of the vectors shown in the image above would be the vector (0, –2, 8).
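We can even let a few lines of code do the (1) through (4) bookkeeping for us. Here's a minimal Python sketch (the lookup table and the function name are mine, nothing standard) that multiplies the basis terms and collects the results:

```python
# Nonzero basis products from (1) and (2): i*j = k, j*i = -k, and so on.
BASIS_PRODUCTS = {
    ('i', 'j'): (+1, 'k'), ('j', 'i'): (-1, 'k'),
    ('j', 'k'): (+1, 'i'), ('k', 'j'): (-1, 'i'),
    ('k', 'i'): (+1, 'j'), ('i', 'k'): (-1, 'j'),
}

def cross(v, w):
    """Expand (vx*i + vy*j + vz*k) x (wx*i + wy*j + wz*k) term by term,
    dropping the i*i, j*j, k*k terms (idea (4)) and collecting the rest."""
    result = {'i': 0, 'j': 0, 'k': 0}
    for a, va in zip('ijk', v):
        for b, wb in zip('ijk', w):
            product = BASIS_PRODUCTS.get((a, b))   # None when a == b
            if product is not None:
                sign, basis = product
                result[basis] += sign * va * wb
    return (result['i'], result['j'], result['k'])

print(cross((1, 0, 0), (0, 1, 0)))   # (0, 0, 1), i.e., i times j gives k
print(cross((0, 4, 1), (-2, 0, 0)))  # (0, -2, 8), the vector mentioned above
```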

The Cross Product

The cross product of two vectors is another vector (whereas the dot product was just another number—a scalar). The cross product vector is perpendicular to both of the factor vectors. Typically, books will say that we need 3d vectors (vectors with 3 components) to talk about the cross product, which is true, sort of, but we can give 2d vectors a third component of zero to see how the cross product works with 2d-ish vectors, like below.

At the right, we show the vector (1, 3, 0), the vector (–2, 0, 0), and the cross product of those two vectors (in that order), which is the cross product vector (0, 0, 6).

Since we’re calling it a product, we’ll want to know how we built that product. So, let’s talk about that.

Deconstructing the Cross Product

The cross product vector is built using three determinants, as shown below.

For the x-component of the cross product vector, we deconstruct the factor vectors into 2d vectors made up of the y- and z-components. Then we find the determinant of those two 2d vectors (the area of the parallelogram they form, if any). We do the same for each of the other components of the cross product vector—if we’re working on the y-component of the cross product vector, then we create two 2d vectors from the x- and z-components of the factor vectors and find their parallelogram area, or determinant. And the same for the third component of the cross product vector. (Notice, though, that we reverse the sign of the second component of the cross product vector. It’s not evident here, because it’s zero.)
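Spelled out in code, that construction might look like the short Python sketch below (the helper names are made up for this post):

```python
def det2(u, v):
    """Determinant of two 2d vectors: the signed area of their parallelogram."""
    return u[0] * v[1] - u[1] * v[0]

def cross(v, w):
    """Build each component of the cross product from a 2x2 determinant."""
    x = det2((v[1], v[2]), (w[1], w[2]))    # from the y- and z-components
    y = -det2((v[0], v[2]), (w[0], w[2]))   # from the x- and z-components, sign reversed
    z = det2((v[0], v[1]), (w[0], w[1]))    # from the x- and y-components
    return (x, y, z)

print(cross((1, 3, 0), (-2, 0, 0)))   # (0, 0, 6)
```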

We’ll look more into the intuition behind this later. It is not immediately obvious why three simple area calculations (the determinants) should be able to deliver a vector that is exactly perpendicular to the two factor vectors (which is an indication that we don’t know everything there is to know about the seemingly dirt-simple concept of area!). But the cross product has a lot of fascinating connections to and uses in physics and engineering—and computer graphics.

I’ll leave you with this exercise: determine the cross product (a vector perpendicular to this little ramp) of the blue vector (0, 4, 1) and the red vector (–2, 0, 0).


Vectors and Complex Numbers

Vectors share a lot of characteristics with complex numbers. They are both multi-dimensional objects, so to speak. Position vectors with 2 components \(\mathtt{(x_1, x_2)}\) behave in much the same way geometrically as complex numbers \(\mathtt{a + bi}\). At the right, you can see that Geogebra displays the position vectors as arrows and the complex numbers as points. In some sense, though, we could use both the vector and the complex number to refer to the same object if we wanted.

You’ll have no problem finding out about how to multiply two complex numbers, though a similar product result for multiplying 2 vectors seems to be hard to come by. For complex numbers, we just use the Distributive Property: \[\mathtt{(a + bi)(c + di) = ac + adi + bci + bdi^2 = ac – bd + (ad + bc)i}\] In fact, we are told that we can think of multiplying complex numbers as rotating points on the complex plane. Since \(\mathtt{0 + i}\) is at a 90° angle to the x-axis, multiplying \(\mathtt{3 + 2i}\) by \(\mathtt{0 + i}\) will rotate the point \(\mathtt{3 + 2i}\) ninety degrees about the origin: \[\mathtt{(3 + 2i)(0 + 1i) = (3)(0) + (3)(1)i + (2)(0)i + (2)(1)i^2 = -2 + 3i}\]

We’ll get the same result after changing the order of the factors too, of course, since complex multiplication is commutative, but now we have to say that \(\mathtt{0 + i}\) was not only rotated by β (the angle that \(\mathtt{3 + 2i}\) makes with the positive x-axis) but scaled as well.

By what was it scaled? Well, since the straight vertical vector has a length of 1, it was scaled by the length of the vector represented by the complex number \(\mathtt{3 + 2i}\), or \(\mathtt{\sqrt{13}}\).
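If you want to check those two claims quickly, Python's built-in complex type will do it (this snippet is mine, not part of the Geogebra sketch):

```python
z = 3 + 2j
print(z * 1j)   # (-2+3j): the point 3 + 2i rotated 90 degrees about the origin
print(abs(z))   # 3.605..., i.e., sqrt(13): the scale factor when we multiply in the other order
```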

Multiplying Vectors in the Same Way

It seems that we can multiply vectors in the same way that you can multiply complex numbers, though I’m hard pressed to find a source which describes this possibility.

That is, we can rotate the position vector (a, b) so many degrees (\(\mathtt{tan^{-1}(\frac{d}{c})}\)) counterclockwise by multiplying by the position vector (c, d) of unit length, like so: \[\begin{bmatrix}\mathtt{a}\\\mathtt{b}\end{bmatrix}\begin{bmatrix}\mathtt{c}\\\mathtt{d}\end{bmatrix} = \begin{bmatrix}\mathtt{ac – bd}\\\mathtt{ad + bc}\end{bmatrix}\]

Want to rotate the vector (5, 2) by 19°? First we determine the unit vector which forms a 19° angle with the x-axis. That’s (cos(19°), sin(19°)). Then multiply as above:

\[\begin{bmatrix}\mathtt{5}\\\mathtt{2}\end{bmatrix}\begin{bmatrix}\mathtt{cos(19^\circ)}\\\mathtt{sin(19^\circ)}\end{bmatrix} = \begin{bmatrix}\mathtt{5cos(19^\circ) – 2sin(19^\circ)}\\\mathtt{5sin(19^\circ) + 2cos(19^\circ)}\end{bmatrix}\]

Seems like a perfectly satisfactory way of multiplying vectors to me. We have some issues with undefined values and generality, etc., but for cobbling some things together, multiplying vectors in this crazy way seems easier to think about than hauling out full-blown matrices to do the job.
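Here is that multiplication as a small Python sketch (I'm calling the function cmul, which is just a made-up name), used to rotate (5, 2) by 19°:

```python
import math

def cmul(v, w):
    """Multiply 2d vectors the way complex numbers multiply: (a, b)(c, d) = (ac - bd, ad + bc)."""
    a, b = v
    c, d = w
    return (a * c - b * d, a * d + b * c)

theta = math.radians(19)
unit = (math.cos(theta), math.sin(theta))   # unit vector at 19 degrees to the x-axis
print(cmul((5, 2), unit))                   # about (4.08, 3.52): the vector (5, 2) rotated 19 degrees
```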

Word Vectors and Dot Products

A really cool thing about vectors is that they are used to represent and compare a lot of different things that don’t, at first glance, appear to be mathematically representable or comparable. And a lot of this power comes from working with vectors that are “bigger” than the 2-component vectors we have looked at thus far.

\(\begin{bmatrix}\mathtt{1}\\\mathtt{0}\\\mathtt{1}\\\mathtt{0}\\\mathtt{1}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{1}\\\mathtt{0}\\\mathtt{0}\\\mathtt{1}\\\mathtt{0}\\\mathtt{1}\\\mathtt{0}\\\mathtt{1}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\end{bmatrix}\) \(\begin{bmatrix}\mathtt{1}\\\mathtt{0}\\\mathtt{1}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{1}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{1}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\\\mathtt{0}\end{bmatrix}\)

For example, we could have a vector with 26 components. Some would say that this is a vector with 26 dimensions, but I don’t see the need to talk about dimensions—for the most part, if we’re talking about 26-component vectors, we’re probably not talking about dimensions in any helpful sense, except to help us look smart.

At the right are two possible 26-component vectors. We can say that the vector on the left represents the word pelican. The vector on the right represents the word clap. Each component of the vectors is a representation of a letter from a to z in the word. So, each vector may not be unique to the word it represents. The one on the left could also be the vector for capelin, a kind of fish, or panicle, which is a loose cluster of flowers.

The words, however, are similar in that all the letters of the shorter word clap appear in the longer word pelican. We might be able to see this similarity show up if we measure the cosine between the two vectors. The cosine can be had, recall, by determining the dot product of the vectors (multiply each pair of corresponding elements and add all the products) and dividing the result by the product of their lengths (the lengths being, in each case, the square root of \(\mathtt{component_1^2 + component_2^2 + \ldots}\)). What we get for the two vectors on the right is: \[\mathtt{\frac{4}{\sqrt{7}\sqrt{4}} \approx 0.756}\]

This is fairly close to 1. The angle measure between the two words would be about 41°. Now let’s compare pelican and plenty. These two words are also fairly similar—there is the same 4-letter overlap between the words—but should yield a smaller cosine because of the divergent letters. Confirm for yourself, but for these two words I get: \[\mathtt{\frac{4}{\sqrt{7}\sqrt{6}} \approx 0.617}\]

And that’s about a 52-degree angle between the words. An even more different word, like sausage (the a and s components have 2s), produces a cosine (with pelican) of about 0.342, which is about a 70° angle.
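Here's a quick sketch to check those numbers (the letter_vector and cosine helpers are just names I'm making up for this post):

```python
import math
from collections import Counter

ALPHABET = 'abcdefghijklmnopqrstuvwxyz'

def letter_vector(word):
    """A 26-component vector of letter counts for a word."""
    counts = Counter(word)
    return [counts.get(ch, 0) for ch in ALPHABET]

def length(v):
    return math.sqrt(sum(a * a for a in v))

def cosine(v, w):
    return sum(a * b for a, b in zip(v, w)) / (length(v) * length(w))

for word in ('clap', 'plenty', 'sausage'):
    c = cosine(letter_vector('pelican'), letter_vector(word))
    print(word, round(c, 3), round(math.degrees(math.acos(c))))
    # clap 0.756 41, plenty 0.617 52, sausage 0.342 70
```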

So, we see that with vectors we can apply a numeric measurement to the similarity of words, with anagrams having cosines of 1 and words sharing no letters at all being at right angles to each other (having a dot product and cosine of 0).

Combining Matrix Transformations

Something that stands out in my mind as I have learned more linear algebra recently is how much more sane it feels to do a lot of forward thinking before getting into the backward “solving” thinking—to, for example, create a bunch of linear transformations and strengthen my ability to do stuff with the mathematics before throwing a wrench in the works and having me wonder what would happen if I didn’t know the starting vectors.

So, we’ll continue that forward thinking here by looking at the effect of combining transformations. Or, if we think about a 2 × 2 matrix as representing a linear transformation, then we’ll look at combining matrices.

How about this one, then? This is a transformation in which the (1, 0) basis vector goes to (1, 1/3) and the (0, 1) basis vector goes to (–2, 1). You can see the effect this transformation has on the unshaded triangle (producing the shaded triangle).

Before we combine this with another transformation, notice that the horizontal base of the original triangle, which was parallel to the horizontal basis vector, appears to be, in its transformed form, now parallel to the transformed horizontal basis vector. Let’s test this. \[\begin{bmatrix}\mathtt{1} & \mathtt{-2}\\\mathtt{\frac{1}{3}} & \mathtt{\,\,\,\,1}\end{bmatrix}\begin{bmatrix}\mathtt{2}\\\mathtt{2}\end{bmatrix} = \begin{bmatrix}\mathtt{-2}\\\mathtt{2\frac{2}{3}}\end{bmatrix} \quad\text{and}\quad\begin{bmatrix}\mathtt{1} & \mathtt{-2}\\\mathtt{\frac{1}{3}} & \mathtt{\,\,\,\,1}\end{bmatrix}\begin{bmatrix}\mathtt{4}\\\mathtt{2}\end{bmatrix} = \begin{bmatrix}\mathtt{0}\\\mathtt{3\frac{1}{3}}\end{bmatrix}\]

The slope of the originally horizontal but now transformed base is, then, \(\mathtt{\frac{3\frac{1}{3}\, – \,2\frac{2}{3}}{0\,-\,(-2)} = \frac{\frac{2}{3}}{2} = \frac{1}{3}}\), which is the same slope as the transformed horizontal basis vector (1, 1/3).

Transform the Transformation

Okay, so let’s transform the transformation, as shown at the right, under this matrix: \[\begin{bmatrix}\mathtt{-1} & \mathtt{0}\\\mathtt{\,\,\,\,0} & \mathtt{\frac{1}{2}}\end{bmatrix}\]

Is it possible to multiply the two matrices to get our final (purple) transformation? Here’s how to multiply the two matrices and the result: \[\begin{bmatrix}\mathtt{-1} & \mathtt{0}\\\mathtt{\,\,\,\,0} & \mathtt{\frac{1}{2}}\end{bmatrix}\begin{bmatrix}\mathtt{1} & \mathtt{-2}\\\mathtt{\frac{1}{3}} & \mathtt{\,\,\,\,1}\end{bmatrix} = \begin{bmatrix}\mathtt{-1} & \mathtt{2}\\\mathtt{\,\,\,\,\frac{1}{6}} & \mathtt{\frac{1}{2}}\end{bmatrix}\]

You should be able to check that, yes indeed, the last matrix takes the original triangle to the purple triangle. You should also be able to test that reversing the order of the multiplication of the two matrices changes the answer completely, so matrix multiplication is not commutative. Notice also that the determinant is approximately \(\mathtt{-0.8333…}\). This tells us that the area of the new triangle is 5 sixths that of the original. And the negative indicates the reflection the triangle underwent. The determinant of the first matrix is –0.5, and that of the second is 5 thirds. Multiply those together and you get the determinant of the combined transformations matrix.
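Here's a minimal numpy check of that combination (numpy isn't part of the post; I'm just using it to verify the arithmetic):

```python
import numpy as np

original = np.array([[1, -2], [1/3, 1]])            # (1,0) -> (1, 1/3) and (0,1) -> (-2, 1)
reflect_and_squash = np.array([[-1, 0], [0, 1/2]])  # reflect across the y-axis, halve heights

combined = reflect_and_squash @ original            # written right to left: apply original first
print(combined)                                     # [[-1, 2], [1/6, 1/2]]
print(np.linalg.det(combined))                      # about -0.8333, i.e., -5/6
print(np.linalg.det(reflect_and_squash) * np.linalg.det(original))   # the same: (-1/2)(5/3)
```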

Inverse of a Scaling Matrix

Well, we should be pretty comfortable moving things around with vectors and matrices. We’re good on some of the forward thinking. We can think of a matrix \(\mathtt{A}\) as a mapping of one vector (or an entire set of vectors) to another vector (or to another set of vectors). Then we can think of \(\mathtt{B}\) as the matrix which undoes the mapping of \(\mathtt{A}\). So, \(\mathtt{B}\) is the inverse of \(\mathtt{A}\).

How do we figure out what \(\mathtt{A}\) and \(\mathtt{B}\) are?

\[\mathtt{A}\color{green}{\begin{bmatrix}\mathtt{\,\,3\,\,} \\\mathtt{\,\,3\,\,} \end{bmatrix}} \,= \color{green}{\begin{bmatrix}\mathtt{-4} \\\mathtt{\,\,\,\,1} \end{bmatrix}}\]
\[\mathtt{B}\color{green}{\begin{bmatrix}\mathtt{-4} \\\mathtt{\,\,\,\,\,1} \end{bmatrix}} = \color{green}{\begin{bmatrix}\mathtt{\,\,3\,\,} \\\mathtt{\,\,3\,\,} \end{bmatrix}}\]

Eyeballing Is a Lost Art in Mathematics Education

It is! We can figure out the matrix \(\mathtt{A}\) without doing any calculations. Break down the movement of the green point into horizontal and vertical components. Horizontally, the green point is reflected across the “y-axis” and then stretched another third of its distance from the y-axis. This corresponds to multiplying the horizontal component of the green point by –1.333…. For the vertical component, the green point starts at 3 and ends at 1, so the vertical component is dilated by a factor of 0.333…. We can see both of these transformations shown in the change in the sizes and directions of the blue and orange basis vectors. So, our transformation matrix \(\mathtt{A}\) is shown below. When we multiply the vector (3, 3) by this transformation matrix, we get the point, or position vector, (–4, 1). \[\begin{bmatrix}\mathtt{-\frac{4}{3}} & \mathtt{0}\\\mathtt{\,\,\,\,0} & \mathtt{\frac{1}{3}}\end{bmatrix}\begin{bmatrix}\mathtt{3}\\\mathtt{3}\end{bmatrix} = \begin{bmatrix}\mathtt{-4}\\\mathtt{\,\,\,\,1}\end{bmatrix}\]

You can see that \(\mathtt{A}\) is a scaling matrix, which is why it can be eyeballed, more or less. And what is the inverse matrix? We can use similar reasoning and work backward from (–4, 1) to (3, 3). For the horizontal component, reflect across the y-axis and scale by a factor of three fourths. For the vertical component, multiply by 3. So, the inverse matrix, \(\mathtt{B}\), when multiplied to the vector, produces the correct starting vector: \[\begin{bmatrix}\mathtt{-\frac{3}{4}} & \mathtt{0}\\\mathtt{\,\,\,\,0} & \mathtt{3}\end{bmatrix}\begin{bmatrix}\mathtt{-4}\\\mathtt{\,\,\,\,1}\end{bmatrix} = \begin{bmatrix}\mathtt{3}\\\mathtt{3}\end{bmatrix}\]

You’ll notice that we use the reciprocals of the non-zero scaling numbers in the original matrix to produce the inverse matrix. You can do the calculations with the other points on the animation above to test it out.
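A quick numpy check of the eyeballing (again, just a sketch to verify the arithmetic):

```python
import numpy as np

A = np.array([[-4/3, 0], [0, 1/3]])   # the scaling matrix we eyeballed
B = np.linalg.inv(A)                  # its inverse: reciprocals on the diagonal

print(B)                       # [[-0.75, 0], [0, 3]]
print(A @ np.array([3, 3]))    # [-4.  1.]
print(B @ np.array([-4, 1]))   # [3.  3.]
```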

Incidentally, we can also eyeball the eigenvectors—those vectors which don’t change direction but are merely scaled as a result of the transformations—and even the eigenvalues (the scale factor of each transformed eigenvector). The vector (1, 0) is an eigenvector, with an eigenvalue of \(\mathtt{-\frac{4}{3}}\) for the original transformation and an eigenvalue of –0.75 for the inverse, and the vector (0, 1) is an eigenvector, with an eigenvalue of \(\mathtt{\frac{1}{3}}\) for the original transformation and an eigenvalue of 3 for the inverse.

Rotations, Reflections, Scalings

I just wanted to pause briefly to showcase how some of the linear transformations we have been looking into can be represented in computerese (or at least one version of computerese). You can click on the pencil icon and then on the matrix_transform.js file in the trinket below and look for the word matrix. Change the numbers in those lines to check the effects on the transformations. You can get some fairly wild stuff.

By the way, trinket is an incredibly beautiful product if you like tinkering with all kinds of code. Grab a free account and share your work!

For this demo, I stuck with simple transformations centered at the origin of a coordinate system (so to speak). As you can imagine, there are much more elaborate things you can do when you combine transformations and move the center point around.

Reflections and Foot of a Point

So, we did rotations with matrices. Now what about reflections? The basic reflections (across the x- or y-axis, say, which just flip signs in the identity matrix) aren’t worth mentioning at the moment. The more puzzling reflections, those about a line that is not horizontal or vertical, are worth looking at.

The more complicated way to do this, though, we’ll save for another time. The simpler way involves something called the foot of the point. Back when we were working out the distance of a point to a line, naturally we were thinking about the perpendicular distance of that point from the line. And where that perpendicular from the point intersects the line is called the foot of the point.

This point is also the midpoint of \(\mathtt{\overline{rr’}}\), the line segment connecting the point \(\mathtt{r}\) with its reflection across the line (the line itself is the perpendicular bisector of that segment). So, if we can get the foot of the point we are reflecting, we can get the reflected point.

Determining the Foot of the Point

Let’s start with a different diagram. The line shown here can be represented by the following vector equation: \[\mathtt{p +\, α}\begin{bmatrix}\mathtt{2}\\\mathtt{1}\end{bmatrix}\] What is the ordered pair for point r’, the reflection of point r across the line?

Let’s start by finding the location of q, the foot of the point. Since we know p (it’s [0, 4]), and we know that the line is described by the vector \(\mathtt{(2α, α)}\), what we need to know is the scalar that scales us from p to q. We’ll call that scalar t.

To get at the scalar t, we can equate two cosine equations, writing \(\mathtt{d}\) for the line’s direction vector \(\mathtt{(2, 1)}\), so that \(\mathtt{q = p + td}\). The equation on the left shows the cosine of β that we learned when we looked at the dot product. And the equation on the right shows the cosine of β as the simple adjacent over hypotenuse ratio, the adjacent side being \(\mathtt{|q\,-\,p| = |td|}\): \[\mathtt{\text{cos(β)} = \frac{d \cdot (r\,-\,p)}{|d||r\,-\,p|} \quad\quad\quad\text{cos(β)} = \frac{|td|}{|r\,-\,p|}}\]

When we set the two right-hand expressions equal to each other and solve for t, we get the scalar t. (The direction vector d is just the vector [2, 1], and r – p is just the vector [5, –1].) \[\mathtt{t = \frac{d \cdot (r\,-\,p)}{|d|^2}} \,\,\longrightarrow\,\, \mathtt{t =} \frac{\begin{bmatrix}\mathtt{2}\\\mathtt{1}\end{bmatrix} \cdot \begin{bmatrix}\mathtt{\,\,\,\,5}\\\mathtt{-1}\end{bmatrix}}{5} \,\,\longrightarrow\,\,\mathtt{t = 1.8}\]

Using the equation for the line at the start of this section, we see that we can set \(\mathtt{α}\) equal to t to determine the location of point q. So, point q is at \[\begin{bmatrix}\mathtt{0}\\\mathtt{4}\end{bmatrix} \mathtt{+\,\,\, 1.8}\begin{bmatrix}\mathtt{2}\\\mathtt{1}\end{bmatrix} = \begin{bmatrix}\mathtt{3.6}\\\mathtt{5.8}\end{bmatrix}\]

The Midpoint and the Reflection

Now that we have found the location of point q, we can treat it as the midpoint of \(\mathtt{\overline{rr’}}\), or the line segment connecting the point \(\mathtt{r}\) with its reflection across the line.

This is yet another thing we haven’t covered, but the midpoint between \(\mathtt{r}\) and \(\mathtt{r’}\) is \(\mathtt{q = \frac{1}{2}(r + r’)}\). Thus, the equation for the reflection \(\mathtt{r’}\) of \(\mathtt{r}\) across the given line, once we have figured out the foot of the point q, is \[\mathtt{r’ = 2q\,-\,r}\]
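Putting the whole recipe together in a few lines (a sketch; I'm reading r off the diagram as (5, 3), since r − p was given as (5, −1)):

```python
import numpy as np

p = np.array([0.0, 4.0])   # a point on the line
d = np.array([2.0, 1.0])   # the line's direction vector
r = np.array([5.0, 3.0])   # the point to reflect, so r - p = (5, -1)

t = np.dot(d, r - p) / np.dot(d, d)   # the scalar from the cosine equations: 1.8
q = p + t * d                         # the foot of the point: (3.6, 5.8)
r_prime = 2 * q - r                   # q is the midpoint of r and r': (2.2, 8.6)

print(t, q, r_prime)
```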

I have to say, this makes reflections seem like a lot of work anyway.

Eigenvalues and Eigenvectors

We can do all kinds of weird scalings with matrices, which we saw first here. For example, stretch the ‘horizontal’ vector (1, 0) to, say, (2, 0) and then stretch and move the ‘vertical’ vector (0, 1) to, say, (–3, 5). Our transformation matrix, then, is

\(\begin{bmatrix}\mathtt{2} & \mathtt{-3}\\\mathtt{0} & \mathtt{\,\,\,\,5}\end{bmatrix}\)

What will this do to a position vector (a point) at, say, (1, 1)? We multiply the matrix and the vector to find out:

\(\begin{bmatrix}\mathtt{2} & \mathtt{-3}\\\mathtt{0} & \mathtt{\,\,\,\,5}\end{bmatrix}\begin{bmatrix}\mathtt{1}\\\mathtt{1}\end{bmatrix} = 1\begin{bmatrix}\mathtt{2}\\\mathtt{0}\end{bmatrix} + 1\begin{bmatrix}\mathtt{-3}\\\mathtt{\,\,\,\,5}\end{bmatrix} = \begin{bmatrix}\mathtt{-1}\\\mathtt{\,\,\,\,5}\end{bmatrix}\)

The vector representing point A in this case clearly changed directions as a result of the transformation, in addition to getting stretched. However, a question that doesn’t seem worth asking now but will later is whether there are any vectors that don’t change direction as a result of the transformation—either staying the same or just getting scaled. That is, are there vectors (\(\mathtt{r_1, r_2}\)), such that (using lambda, \(\mathtt{\lambda}\), as a constant to be cool again):

\(\begin{bmatrix}\mathtt{2} & \mathtt{-3}\\\mathtt{0} & \mathtt{\,\,\,\,5}\end{bmatrix}\begin{bmatrix}\mathtt{r_1}\\\mathtt{r_2}\end{bmatrix} = \mathtt{\lambda}\begin{bmatrix}\mathtt{r_1}\\\mathtt{r_2}\end{bmatrix}\)?

A good guess would be that any ‘horizontal’ vector would not change direction, since the original (1, 0) was only scaled to (2, 0). Anyway, remembering that the identity matrix represents the do-nothing transformation, we can also write the above equation like this:

\(\begin{bmatrix}\mathtt{2} & \mathtt{-3}\\\mathtt{0} & \mathtt{\,\,\,\,5}\end{bmatrix}\begin{bmatrix}\mathtt{r_1}\\\mathtt{r_2}\end{bmatrix} = \mathtt{\lambda}\begin{bmatrix}\mathtt{1} & \mathtt{0}\\\mathtt{0} & \mathtt{1}\end{bmatrix}\begin{bmatrix}\mathtt{r_1}\\\mathtt{r_2}\end{bmatrix} = \begin{bmatrix}\mathtt{\lambda} & \mathtt{0}\\\mathtt{0} & \mathtt{\lambda}\end{bmatrix}\begin{bmatrix}\mathtt{r_1}\\\mathtt{r_2}\end{bmatrix}\)

And although we haven’t yet talked about the idea that you can combine transformation matrices (add and subtract them), let me just say now that you can do this. So, we can manipulate the sides of the equation above (the far left and far right) and rewrite using the Distributive Property in reverse to get:

\(\left(\begin{bmatrix}\mathtt{2} & \mathtt{-3}\\\mathtt{0} & \mathtt{\,\,\,\,5}\end{bmatrix} – \begin{bmatrix}\mathtt{\lambda} & \mathtt{0}\\\mathtt{0} & \mathtt{\lambda}\end{bmatrix}\right)\begin{bmatrix}\mathtt{r_1}\\\mathtt{r_2}\end{bmatrix} = \mathtt{0} \rightarrow \begin{bmatrix}\mathtt{2\,-\,\lambda} & \mathtt{-3}\\\mathtt{0} & \mathtt{5\,-\,\lambda}\end{bmatrix}\begin{bmatrix}\mathtt{r_1}\\\mathtt{r_2}\end{bmatrix} = \mathtt{0}\)

The vector (\(\mathtt{r_1}, \mathtt{r_2}\)) could, of course, always be the zero vector. But we ignore that solution and assume that it represents some non-zero vector. Given this assumption, the transformation matrix that has the lambdas subtracted from integers above must have a determinant of 0. We haven’t talked about that last point yet either, but it should make some sense even now. If a transformation matrix takes a non-zero vector (a one-dimensional ray, so to speak) to zero, no positive areas will survive. If you take a square and shrink one of its side lengths to zero, it becomes a one-dimensional object with no area.

Getting the Eigenvalues and Eigenvectors

Moving on, we know how to calculate the determinant, and we know that the determinant must be 0. So, \(\mathtt{(2 – \lambda)(5 – \lambda) = 0}\). The solutions here are \(\mathtt{\lambda = 2}\) and \(\mathtt{\lambda = 5}\). These two numbers are the eigenvalues. To get the eigenvectors, plug in each of the eigenvalues into that transformation matrix above and solve for the vector: \[\begin{bmatrix}\mathtt{2\,-\,2} & \mathtt{-3}\\\mathtt{0} & \mathtt{5\,-\,2}\end{bmatrix}\begin{bmatrix}\mathtt{r_1}\\\mathtt{r_2}\end{bmatrix} = \mathtt{0}\]

We have to kind of fudge a solution to that system of equations, but in the end we wind up with the result that one of the eigenvectors will be any vector of the form \(\mathtt{(c, 0)}\), where c represents any number. This confirms our earlier intuition that one of the vectors that will not change directions will be any ‘horizontal’ vector. The eigenvalue tells us that any vector of this form will be stretched by a factor of 2 in the transformation.

A similar process with the eigenvalue of 5 results in an eigenvector of the form \(\mathtt{(c, -c)}\). Any vector of this form will not change its direction as a result of the transformation, but will be scaled by a factor of 5.
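numpy will happily confirm the eigenvalues and eigenvectors (this is a check, not part of the derivation above):

```python
import numpy as np

A = np.array([[2, -3], [0, 5]])
values, vectors = np.linalg.eig(A)

print(values)    # [2. 5.]
print(vectors)   # columns are unit eigenvectors: one along (1, 0), one along (1, -1)

r = np.array([1, -1])
print(A @ r)     # [ 5 -5], i.e., 5 * (1, -1): scaled by 5, direction unchanged
```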

Check out and play with this interactive to watch how the transformation matrix works and to watch how the eigenvectors appear in the transformation. Be sure to check out the video linked at the top too!

Rotations with Matrices

Okay, now let’s move stuff around with linear algebra. We’ll eventually do rotations, reflections, and maybe translations too, while mixing that up with stretchings and skewings and other things that matrices can do for us.

We learned here that a matrix gives us information about two arrows—the x-axis arrow and the y-axis arrow. What we really mean is that a 2 × 2 matrix represents a transformation of 2D space. This transformation is given by 2 column vectors—the 2 columns of the matrix. The identity matrix, as we saw previously, represents the do-nothing transformation:

\[\begin{bmatrix}\mathtt{\color{blue}{1}} & \mathtt{\color{orange}{0}}\\\mathtt{\color{blue}{0}} & \mathtt{\color{orange}{1}}\end{bmatrix} \leftarrow \begin{bmatrix}\mathtt{\color{blue}{1}}\\\mathtt{\color{blue}{0}}\end{bmatrix} \text{and} \begin{bmatrix}\mathtt{\color{orange}{0}}\\\mathtt{\color{orange}{1}}\end{bmatrix}\]


Another way to look at this matrix is that it tells us about the 2D space we’re looking at and how to interpret ANY vector in that space. So, what does the vector (1, 2) mean here? It means take 1 of the (1, 0) vectors and add 2 of the (0, 1) vectors.

\[\begin{bmatrix}\mathtt{1} & \mathtt{0}\\\mathtt{0} & \mathtt{1}\end{bmatrix}\begin{bmatrix}\mathtt{1}\\\mathtt{2}\end{bmatrix} = \mathtt{1}\begin{bmatrix}\mathtt{1}\\\mathtt{0}\end{bmatrix} + \mathtt{2}\begin{bmatrix}\mathtt{0}\\\mathtt{1}\end{bmatrix} = \begin{bmatrix}\mathtt{(1)(1) + (2)(0)}\\\mathtt{(1)(0) + (2)(1)}\end{bmatrix}\]


But what if we reflect the entire coordinate plane across the y-axis? That’s a new system, and it’s a system given by where the blue and orange vectors would be under that reflection:

\[\begin{bmatrix}\mathtt{\color{blue}{-1}} & \mathtt{\color{orange}{0}}\\\mathtt{\color{blue}{\,\,\,\,0}} & \mathtt{\color{orange}{1}}\end{bmatrix} \leftarrow \begin{bmatrix}\mathtt{\color{blue}{-1}}\\\mathtt{\color{blue}{\,\,\,\,0}}\end{bmatrix} \text{and} \begin{bmatrix}\mathtt{\color{orange}{0}}\\\mathtt{\color{orange}{1}}\end{bmatrix}\]

In that new system, we can guess where the vector (1, 2) will end up. It will just be reflected across the y-axis. But matrix-vector multiplication allows us to figure that out by just multiplying the vector and the matrix:

\[\begin{bmatrix}\mathtt{-1} & \mathtt{0}\\\mathtt{\,\,\,\,0} & \mathtt{1}\end{bmatrix}\begin{bmatrix}\mathtt{1}\\\mathtt{2}\end{bmatrix} = \mathtt{1}\begin{bmatrix}\mathtt{-1}\\\mathtt{\,\,\,\,0}\end{bmatrix} + \mathtt{2}\begin{bmatrix}\mathtt{0}\\\mathtt{1}\end{bmatrix} = \begin{bmatrix}\mathtt{-1}\\\mathtt{\,\,\,\,2}\end{bmatrix}\]


This opens up a ton of possibilities for specifying different kinds of transformations. And it makes it pretty straightforward to specify transformations and play with them—just set the two column vectors of your matrix and see what happens! We can rotate and reflect the column vectors and scale them up together or separately.
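Here's that combination-of-columns view of matrix-vector multiplication as a short Python sketch (the function name is mine):

```python
def apply_matrix(columns, vector):
    """Multiply a 2x2 matrix, given as its two column vectors, by a 2d vector:
    take vector[0] copies of the first column plus vector[1] copies of the second."""
    (a, c), (b, d) = columns
    x, y = vector
    return (x * a + y * b, x * c + y * d)

identity = ((1, 0), (0, 1))
reflect_over_y = ((-1, 0), (0, 1))    # the y-axis reflection described above

print(apply_matrix(identity, (1, 2)))        # (1, 2)
print(apply_matrix(reflect_over_y, (1, 2)))  # (-1, 2)
```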

Rotations

Let’s start with rotations. And we’ll throw in some scaling too, just to make it more interesting. The image shows a coordinate system that has been rotated –135°, by rotating our column vectors from the identity matrix through that angle. The coordinate system has also been dilated by a factor of 0.5. This results in \(\mathtt{\triangle{ABC}}\) rotated –135° and scaled down by a half as shown.

What matrix represents this new rotated and scaled down system? The rotation of the first column vector, (1, 0), can be represented as (\(\mathtt{cos\,θ, sin\,θ}\)). And the second column vector, which is (0, 1) before the rotation, is perpendicular to the first column vector, so we just flip the components and make one of them the opposite of what it originally was:
(\(\mathtt{-sin\,θ, cos\,θ}\)). So, a general rotation matrix looks like the matrix on the left. The rotation matrix for a –135° rotation is on the right: \[\begin{bmatrix}\mathtt{cos \,θ} & \mathtt{-sin\,θ}\\\mathtt{sin\,θ} & \mathtt{\,\,\,\,\,cos\,θ}\end{bmatrix}\quad\quad\begin{bmatrix}\mathtt{-\frac{\sqrt{2}}{2}} & \mathtt{\,\,\,\,\frac{\sqrt{2}}{2}}\\\mathtt{-\frac{\sqrt{2}}{2}} & \mathtt{-\frac{\sqrt{2}}{2}}\end{bmatrix}\]

You can eyeball that the rotation matrix is correct by interpreting the columns of the matrix as the new positions of the horizontal vector and vertical vector, respectively (the new coordinates they are pointing to). A –135° rotation is a clockwise rotation of 90° + 45°.

Now for the scaling, or dilation by a factor of 0.5. This is accomplished by the matrix on the left, which, when multiplied by the rotation matrix on the right, will give us the one combo transformation matrix: \[\begin{bmatrix}\mathtt{\frac{1}{2}} & \mathtt{0}\\\mathtt{0} & \mathtt{\frac{1}{2}}\end{bmatrix}\begin{bmatrix}\mathtt{-\frac{\sqrt{2}}{2}} & \mathtt{\,\,\,\,\frac{\sqrt{2}}{2}}\\\mathtt{-\frac{\sqrt{2}}{2}} & \mathtt{-\frac{\sqrt{2}}{2}}\end{bmatrix} = \begin{bmatrix}\mathtt{-\frac{\sqrt{2}}{4}} & \mathtt{\,\,\,\,\frac{\sqrt{2}}{4}}\\\mathtt{-\frac{\sqrt{2}}{4}} & \mathtt{-\frac{\sqrt{2}}{4}}\end{bmatrix}\]

The result is another 2 × 2 matrix, with two column vectors. The calculations below show how we find those two new column vectors: \[\mathtt{-\frac{\sqrt{2}}{2}}\begin{bmatrix}\mathtt{\frac{1}{2}}\\\mathtt{0}\end{bmatrix} + -\frac{\sqrt{2}}{2}\begin{bmatrix}\mathtt{0}\\\mathtt{\frac{1}{2}}\end{bmatrix} = \begin{bmatrix}\mathtt{-\frac{\sqrt{2}}{4}}\\\mathtt{-\frac{\sqrt{2}}{4}}\end{bmatrix}\quad\quad\mathtt{\frac{\sqrt{2}}{2}}\begin{bmatrix}\mathtt{\frac{1}{2}}\\\mathtt{0}\end{bmatrix} + -\frac{\sqrt{2}}{2}\begin{bmatrix}\mathtt{0}\\\mathtt{\frac{1}{2}}\end{bmatrix} = \begin{bmatrix}\mathtt{\,\,\,\,\frac{\sqrt{2}}{4}}\\\mathtt{-\frac{\sqrt{2}}{4}}\end{bmatrix}\]

Now for the Point of Rotation

We’ve got just one problem left. Our transformation matrix, let’s call it \(\mathtt{A}\), is perfect, but we don’t rotate around the origin. So, we have to do some adding to get our final expression. To rotate, for example, point B around point C, we don’t use point B’s position vector from the origin—we rewrite this vector as though point C were the origin. So, point B has a position vector of B – C = (1, 0) in the point C–centered system. Once we’re done rotating this new position vector for point B, we have to add the position vector for C back to the result. So, we get: \[\mathtt{B’} = \begin{bmatrix}\mathtt{-\frac{\sqrt{2}}{4}} & \mathtt{\,\,\,\,\frac{\sqrt{2}}{4}}\\\mathtt{-\frac{\sqrt{2}}{4}} & \mathtt{-\frac{\sqrt{2}}{4}}\end{bmatrix}\begin{bmatrix}\mathtt{1}\\\mathtt{0}\end{bmatrix} + \begin{bmatrix}\mathtt{2}\\\mathtt{2}\end{bmatrix} = \begin{bmatrix}\mathtt{2\,-\,\frac{\sqrt{2}}{4}}\\\mathtt{2\,-\,\frac{\sqrt{2}}{4}}\end{bmatrix}\]

Which gives us a result, for point B’, of approximately (1.65, 1.65). We can do the calculation for point A as well: \[\,\,\,\,\,\mathtt{A’} = \begin{bmatrix}\mathtt{-\frac{\sqrt{2}}{4}} & \mathtt{\,\,\,\,\frac{\sqrt{2}}{4}}\\\mathtt{-\frac{\sqrt{2}}{4}} & \mathtt{-\frac{\sqrt{2}}{4}}\end{bmatrix}\begin{bmatrix}\mathtt{-1}\\\mathtt{\,\,\,\,2}\end{bmatrix} + \begin{bmatrix}\mathtt{2}\\\mathtt{2}\end{bmatrix} = \begin{bmatrix}\mathtt{2\,+\,\frac{3\sqrt{2}}{4}}\\\mathtt{2\,-\,\frac{\sqrt{2}}{4}}\end{bmatrix}\]

This puts A’ at about (3.06, 1.65). Looks right! By the way, the determinant is \(\mathtt{\frac{1}{4}}\)—go calculate that for yourself. This is no surprise, of course, since a dilation by a factor of 0.5 will scale areas down by one fourth. The rotation has no effect on the determinant, because rotations do not affect areas.

Our general formula, then, for a rotation through \(\mathtt{θ}\) of some point \(\mathtt{x}\) (as represented by a position vector) about some point \(\mathtt{r}\) (also represented by a position vector) is: \[\mathtt{x’} = \begin{bmatrix}\mathtt{cos\,θ} & \mathtt{-sin\,θ}\\\mathtt{sin\,θ} & \mathtt{\,\,\,\,\,cos\,θ}\end{bmatrix}\begin{bmatrix}\mathtt{x_1\,-\,r_1}\\\mathtt{x_2\,-\,r_2}\end{bmatrix} + \begin{bmatrix}\mathtt{r_1}\\\mathtt{r_2}\end{bmatrix}\]
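And here is that general formula as a small Python function (a sketch; the coordinates C = (2, 2), B = (3, 2), and A = (1, 4) are read off the calculations above):

```python
import math

def rotate_about(x, r, theta_degrees, scale=1.0):
    """Rotate point x about point r by theta degrees, optionally scaling about r as well."""
    theta = math.radians(theta_degrees)
    c, s = math.cos(theta), math.sin(theta)
    dx, dy = x[0] - r[0], x[1] - r[1]          # rewrite x as though r were the origin
    return (r[0] + scale * (c * dx - s * dy),  # rotate (and scale), then add r back
            r[1] + scale * (s * dx + c * dy))

print(rotate_about((3, 2), (2, 2), -135, scale=0.5))   # B': about (1.65, 1.65)
print(rotate_about((1, 4), (2, 2), -135, scale=0.5))   # A': about (3.06, 1.65)
```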