Algebra students usually learn at some point in their studies that you can dilate a function like \(\mathtt{f(x)=x^2}\) by multiplying it by a constant value. Usually the multiplier is written as \(\mathtt{A}\), so you get \(\mathtt{f(x)=Ax^2}\), which can be \(\mathtt{f(x)=2x^2}\) or \(\mathtt{f(x)=3x^2}\) or \(\mathtt{f(x)=\frac{1}{2}x^2}\), and so on.

What we focus on at the beginning is how this change affects the graph of the function—and, importantly, how we can consistently describe that change as it applies to any function.

So, for example, the brightest (orange-brown) curve in the image at the right represents \(\mathtt{f(x)=x^2}\), or \(\mathtt{f(x)=1x^2}\). And when we increase the A-value, to 2 and then to 3, the curve gets narrower. Decreasing the A-value, to \(\mathtt{\frac{1}{2}}\) for example, causes the curve to get wider. The same kind of “squishing” happens with every function type. (Check out this article by Better Explained for a really nice explanation.)

Making New Functions

We can also make higher-degree functions from lower-degree functions using dilations. The only change we make to the process of dilation shown above is that now we multiply each point of a function by a non-constant value. More specifically, we can multiply the y-coordinate of each point by its x-coordinate to get its new y-coordinate.

At the left, we transform the constant function \(\mathtt{f(x)=5}\) into a higher-order function in this way. By multiplying the y-coordinate of each original point by its x-coordinate, we change the function from \(\mathtt{f(x)=5}\) to \(\mathtt{f(x)=(x)(5)}\), or \(\mathtt{f(x)=5x}\). Another multiplication of all the points by x would get us \(\mathtt{f(x)=5x^2}\). In that case, you can see that all the points to the left of the y-axis have to reflect across the x-axis, since each y-coordinate would be a negative times a negative.
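This point-by-point process can be sketched in a few lines of Python (the function name and sample points are my own, just for illustration):

```python
# Sketch: build higher-degree functions by multiplying each point's
# y-coordinate by its x-coordinate (a non-constant "dilation").
def dilate_by_x(points):
    """Map each point (x, y) to (x, x * y)."""
    return [(x, x * y) for x, y in points]

# Start with the constant function f(x) = 5 on a few sample x-values.
xs = [-3, -2, -1, 0, 1, 2, 3]
f = [(x, 5) for x in xs]            # f(x) = 5

g = dilate_by_x(f)                  # now y = 5x
h = dilate_by_x(g)                  # now y = 5x^2

# Points left of the y-axis reflect across the x-axis between g and h:
# at x = -2, g gives y = -10 (negative), but h gives y = 20 (positive),
# since a negative times a negative is positive.
```

Running `dilate_by_x` once more on `h` would give points of \(\mathtt{f(x)=5x^3}\), and so on up the degrees.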

Another idea that becomes clearer when working with non-constant dilations in this way is that zeros start to make a little more sense.

Try it with other dilations (say, \(\mathtt{x \cdot (x+3)}\) or even \(\mathtt{x \cdot (x-1)^2}\)) and pay attention to what happens to those points that wind up getting multiplied by 0.

I came across this case study recently that I managed to like a little. It focuses on an analysis of a Singapore teacher’s practice of making things explicit in his classroom. Specifically, the paper outlines three ways the teacher engages in explicitation (as the authors call it): (1) making ideas in the base materials (i.e., textbook) explicit in the lesson plan, (2) making ideas within the plan of the unit more explicit, and (3) making ideas explicit in the enactment of teaching the unit(s). These parts are shown in the diagram below, which I have redrawn, with minor modifications, from the paper.

The teacher interviewed for this case study, “Teck Kim,” taught math to Year 11 (10th grade) students in the “Normal (Academic)” track, and the work focus of the case study was on a unit the teacher called “Vectors in Two Dimensions.”

Explicit From

The first category of explicitation, Explicit From, involves using base materials such as a textbook as a starting point and adapting these materials to make more explicit what it is the teacher wants students to learn. The paper provides an illustration of some of the textbook content related to explaining column vectors, along with Kim’s adaptation. I have again redrawn below what was provided in the paper. Here I also made minor modifications to the layout of the textbook example and one small change to fix a possible translation error (or typo) in the teacher’s example. The textbook content is on the left, and the teacher’s is on the right (if it wasn’t painfully obvious).

There are many interesting things to notice about the teacher’s adaptation. Most obviously, it is much simpler than the textbook’s explanation. This is due, in part, to the adaptation’s leaving magnitude unexplained during the presentation and instead asking a leading question about it.

The textbook presented the process of calculating the magnitudes of the given vectors, leading to a ‘formula’ of \(\mathtt{\sqrt{x^2+y^2}}\) for column vector \(\mathtt{\binom{x}{y}}\). In its place, Teck Kim’s notes appeared to compress all these into one question: “How would you calculate the magnitude?” On the surface, it appears that Teck Kim was less explicit than the textbook in the computational process of magnitude. But a careful examination into the pre-module interview reveals that the compression of this section into a question was deliberate . . . He meant to use the question to trigger students’ initial thoughts on the manner—which would then serve to ready their frame of mind when the teacher explains the procedure in class.

So, it is not the case that explanation has been removed—only that the teacher has moved the explication of vector magnitude into the Explicit To section of the process. We can also notice, then, in this Explicit From phase, that the teacher makes use of both dual coding and variation theory in his compression of the to-be-explained material. The text in the teacher’s work is placed directly next to the diagram as labels to describe the meaning of each component of the vector, and the vector that students are to draw varies minimally from the one demonstrated: a change in sign is the only difference, allowing students to see how negative components change the direction of a vector. All much more efficient and effective than the textbook’s try at the same material.

Explicit Within

Intriguingly, Explicit Within is harder to explain than the other two, but is closer to the work I do every day. A quote from the article nicely describes explicitation within the teacher’s own lesson plan as an “inter-unit implicit-to-explicit strategy”:

This inter-unit implicit-to-explicit strategy reveals a level of sophistication in the crafting of instructional materials that we had not previously studied. The common anecdotal portrayal of Singapore mathematics teachers’ use of materials is one of numerous similar routine exercise items for students to repetitively practise the same skill to gain fluency. In the case of Teck Kim’s notes, it was not pure repetitive practice that was in play; rather, students were given the opportunity to revisit similar tasks and representations but with added richness of perspective each time.

We saw a very small example of explicit-within above as well. The plan, following the textbook, would have delayed the introduction of negative components of vectors, but Teck Kim introduces it early, as a variational difference. The idea is not necessarily that students should know it cold from the beginning, but that it serves a useful instructional purpose even before it is consolidated.

Explicit To

Finally, there is Explicit To, which refers to the classroom implementation of explicitation, and which needs no lengthy description. I’ll leave you with a quote again from the paper.

No matter how well the instructional materials were designed, Teck Kim recognised the limitations to the extent in which the notes by itself can help make things explicit to the students. The explicitation strategy must go beyond the contents contained in the notes. In particular, he used the notes as a springboard to connect to further examples and explanations he would provide during in-class instruction. He drew students’ attention to questions spelt out in the notes, created opportunities for students to formulate initial thoughts and used these preparatory moves to link to the explicit content he subsequently covered in class.

Inductive teaching or learning, although it has a special name, happens all the time without our having to pay any attention to technique. It is basically learning through examples. As the authors of the paper we’re discussing here indicate, through inductive learning:

Children . . . learn concepts such as ‘boat’ or ‘fruit’ by being exposed to exemplars of those categories and inducing the commonalities that define the concepts. . . . Such inductive learning is critical in making sense of events, objects, and actions—and, more generally, in structuring and understanding our world.

The paper describes three experiments conducted to further test the benefit of interleaving on inductive learning (“further” because an interleaving effect has been demonstrated in previous studies). Interleaving is one of a handful of powerful learning and practicing strategies mentioned throughout the book Make It Stick: The Science of Successful Learning. In the book, the power of interleaving is highlighted by the following summary of another experiment involving determining volumes:

Two groups of college students were taught how to find the volumes of four obscure geometric solids (wedge, spheroid, spherical cone, and half cone). One group then worked a set of practice problems that were clustered by problem type . . . The other group worked the same practice problems, but the sequence was mixed (interleaved) rather than clustered by type of problem . . . During practice, the students who worked the problems in clusters (that is, massed) averaged 89 percent correct, compared to only 60 percent for those who worked the problems in a mixed sequence. But in the final test a week later, the students who had practiced solving problems clustered by type averaged only 20 percent correct, while the students whose practice was interleaved averaged 63 percent.

The research we look at in this post does not produce such stupendous results, but it is nevertheless an interesting validation of the interleaving effect. Although there are three experiments described, I’ll summarize just the first one.

Discriminative-Contrast Hypothesis

But first, you can try out an experiment like the one reported in the paper. Click start to study pictures of different bird species below. There are 32 pictures, and each one is shown for 4 seconds. After this study period, you will be asked to try to identify 8 birds from pictures that were not shown during the study period, but which belong to one of the species you studied.

Once the study phase is over, click test to start the test and match each picture to a species name. There is no time limit on the test. Simply click next once you have selected each of your answers.


Based on previous research, one would predict that, in general, you would do better in the interleaved condition, where the species are mixed together in the study phase, than you would in the ‘massed,’ or grouped condition, where the pictures are presented in species groups. The question the researchers wanted to home in on in their first experiment was about the mechanism that made interleaved study more effective.

So, their experiment was conducted much like the one above, except with three groups, which all received the interleaved presentation. However, two of the groups were interrupted in their study by trivia questions in different ways. One group—the alternating trivia group—received a trivia question after every picture; the other group—the grouped trivia group—received 8 trivia questions after every group of 8 interleaved pictures. The third group—the contiguous group—received no interruption in their study.

What the researchers discovered is that while the contiguous group performed the best (of course), the grouped trivia group did not perform significantly worse; the alternating trivia group, however, performed significantly worse than both the contiguous and grouped trivia groups. This was seen as providing some confirmation for the discriminative-contrast hypothesis:

Interleaved studying might facilitate noticing the differences that separate one category from another. In other words, perhaps interleaving is beneficial because it juxtaposes different categories, which then highlights differences across the categories and supports discrimination learning.

In the grouped trivia condition, participants were still able to take advantage of the interleaving effect because the disruptions (the trivia questions) had less of an effect when grouped in packs of 8. In the alternating trivia condition, however, a trivia question appeared after every picture, frustrating the discrimination mechanism that seems to help make the interleaving effect tick.

Takeaway Goodies (and Questions) for Instruction

The paper makes it clear that interleaving is not a slam dunk for instruction. Massed studying or practice might be more beneficial, for example, when the goal is to understand the similarities among the objects of study rather than the differences. Massed studying may also be preferred when the objects are ‘highly discriminable’ (easy to tell apart).

Yet, many of the misconceptions we deal with in mathematics education in particular can be seen as the result of dealing with objects of ‘low discriminability’ (objects that are hard to tell apart). In many cases, these objects really are hard to tell apart, and in others we simply make them hard through our sequencing. Consider some of the items listed in the NCTM’s wonderful 13 Rules That Expire, which students often misapply:

When multiplying by ten, just add a zero to the end of the number.

You cannot take a bigger number from a smaller number.

Addition and multiplication make numbers bigger.

You always divide the larger number by the smaller number.

In some sense, these are problematic because they are like the sparrows and finches above when presented only in groups—they are harder to stamp out because we don’t present them in situations that break the rules, or interleave them. Appending a zero to a number to multiply by 10 works on counting numbers but not on decimals; addition and multiplication make counting numbers bigger but don’t always make fractions bigger; and you cannot take a bigger counting number from a smaller one and get a counting number. For that, you need integers.
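The expiration of each rule can be checked mechanically. A quick Python sketch (the counterexamples are my own, not from the NCTM article):

```python
from fractions import Fraction

# "To multiply by ten, just add a zero to the end of the number."
assert int("35" + "0") == 35 * 10          # fine for counting numbers
assert float("3.5" + "0") != 3.5 * 10      # "3.50" is still 3.5, not 35

# "Addition and multiplication make numbers bigger."
assert 3 + 2 > 3 and 3 * 2 > 3             # true for counting numbers
half = Fraction(1, 2)
assert 6 * half < 6                        # multiplying made it smaller

# "You cannot take a bigger number from a smaller number."
assert 3 - 5 == -2                         # you can, with integers
```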

Notice any similarities above? Can we please talk about how we keep kids trapped for too long in counting number land? I’ve got this marvelous study to show you which might provide some good reasons to interleave different number systems throughout students’ educations. It’s linked above, and below.

Last time, we saw that the cross product is a product of two 3d vectors which delivers a vector perpendicular to those two factor vectors.

The cross product is built using three determinants. To determine the x-component of the cross product from the factor vectors (1, 3, 0) and (–2, 0, 0), you find the determinant of the vectors (3, 0) and (0, 0)—the vectors built from the “not-x” components (y- and z-components) of the factors. Repeat this process for the other two components of the cross product, making sure to reverse the sign of the result for the y-component.

But why does this work? How does the cross product make itself perpendicular to the two factor vectors by just using determinants? Below, we’ll still be using magic, but we get a little closer to making our understanding magic free.

Getting the Result We Want

We can actually start with a result we definitely want from the cross product and go from there. (1) The result we want is that when we determine the cross product of a “pure” x-vector (\(\mathtt{1,0,0}\)) and a “pure” y-vector (\(\mathtt{0,1,0}\)), we should get a “pure” z-vector (\(\mathtt{0,0,1}\)). The same goes for other pairings as well. Thus:

A simpler way to write this is to use \(\mathtt{i}\), \(\mathtt{j}\), and \(\mathtt{k}\) to represent the pure x-, y-, and z-vectors, respectively. So, \(\mathtt{i \otimes j = k}\) and so on.

Another thing we want—and here comes some (more) magic—is for (2) the cross product to be antisymmetric, which means that when we change the order of the factors, the cross product’s sign flips but its magnitude stays the same. So, we want \(\mathtt{i \otimes j = k}\), but then \(\mathtt{j \otimes i = -k}\). And, as before, the same goes for the other pairings as well: \(\mathtt{j \otimes k = i}\), \(\mathtt{k \otimes j = -i}\), \(\mathtt{k \otimes i = j}\), \(\mathtt{i \otimes k = -j}\). This property allows us to use the cross product to get a sense of how two vectors are oriented relative to each other in 3d space.

With those two magic beans in hand (and a third and fourth to come in just a second), we can go back to notice that any vector can be written as a linear combination of \(\mathtt{i}\), \(\mathtt{j}\), and \(\mathtt{k}\). The two vectors at the end of the previous post on this topic, for example, (0, 4, 1) and (–2, 0, 0) can be written as \(\mathtt{4j + k}\) and \(\mathtt{-2i}\), respectively.

The cross product, then, of any two 3d vectors \(\mathtt{v = (v_x,v_y,v_z)}\) and \(\mathtt{w = (w_x,w_y,w_z)}\) can be written as: \[\mathtt{(v_{x}i+v_{y}j+v_{z}k) \otimes (w_{x}i+w_{y}j+w_{z}k)}\]

For the final bits of magic, we (3) assume that the cross product distributes over addition as we would expect it to, and (4) decide that the cross product of a “pure” vector (i, j, or k) with itself is 0. If that all works out, then we get this: \[\mathtt{v_{x}w_{x}i^2 + v_{x}w_{y}ij + v_{x}w_{z}ik + v_{y}w_{x}ji + v_{y}w_{y}j^2 + v_{y}w_{z}jk + v_{z}w_{x}ki + v_{z}w_{y}kj + v_{z}w_{z}k^2}\]

Then, by applying the ideas in (1), (2), and (4), we simplify to this: \[\mathtt{(v_{y}w_{z} - v_{z}w_{y})i + (v_{z}w_{x} - v_{x}w_{z})j + (v_{x}w_{y} - v_{y}w_{x})k}\]
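If you would rather not trust the algebra, here is a small Python sketch (names are mine) that encodes rules (1) through (4) directly as a multiplication table for \(\mathtt{i}\), \(\mathtt{j}\), and \(\mathtt{k}\), distributes the product, and confirms the result matches the component formula:

```python
# Each basis pairing maps to (resulting basis, sign); pairings of a
# basis vector with itself vanish (rule 4).
TABLE = {
    ("i", "j"): ("k", 1), ("j", "i"): ("k", -1),   # rules (1) and (2)
    ("j", "k"): ("i", 1), ("k", "j"): ("i", -1),
    ("k", "i"): ("j", 1), ("i", "k"): ("j", -1),
    ("i", "i"): None, ("j", "j"): None, ("k", "k"): None,
}

def cross_by_distribution(v, w):
    """Distribute (v_x i + v_y j + v_z k) ⊗ (w_x i + w_y j + w_z k)."""
    out = {"i": 0, "j": 0, "k": 0}
    for a, va in zip("ijk", v):
        for b, wb in zip("ijk", w):
            hit = TABLE[(a, b)]
            if hit is not None:                # rule (4): i⊗i etc. vanish
                basis, sign = hit
                out[basis] += sign * va * wb   # rule (3): distribute
    return (out["i"], out["j"], out["k"])

def cross_by_formula(v, w):
    return (v[1]*w[2] - v[2]*w[1],
            v[2]*w[0] - v[0]*w[2],
            v[0]*w[1] - v[1]*w[0])

v, w = (0, 4, 1), (-2, 0, 0)
assert cross_by_distribution(v, w) == cross_by_formula(v, w) == (0, -2, 8)
```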

And that’s our cross product vector that we saw before. The cross product of the vectors shown in the image above would be the vector (0, –2, 8).

The cross product of two vectors is another vector (whereas the dot product was just another number—a scalar). The cross product vector is perpendicular to both of the factor vectors. Typically, books will say that we need 3d vectors (vectors with 3 components) to talk about the cross product, which is true, sort of, but we can give 3d vectors a third component of zero to see how the cross product works with 2d-ish vectors, like below.

At the right, we show the vector (1, 3, 0), the vector (–2, 0, 0), and the cross product of those two vectors (in that order), which is the cross product vector (0, 0, 6).

Since we’re calling it a product, we’ll want to know how we built that product. So, let’s talk about that.

Deconstructing the Cross Product

The cross product vector is built using three determinants, as shown below.

For the x-component of the cross product vector, we deconstruct the factor vectors into 2d vectors made up of the y- and z-components. Then we find the determinant of those two 2d vectors (the area of the parallelogram they form, if any). We do the same for each of the other components of the cross product vector—if we’re working on the y-component of the cross product vector, then we create two 2d vectors from the x- and z-components of the factor vectors and find their parallelogram area, or determinant. And the same for the third component of the cross product vector. (Notice, though, that we reverse the sign of the second component of the cross product vector. It’s not evident here, because it’s zero.)
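Here is a minimal Python sketch of that three-determinant construction (the function names are mine), checked against the vectors from the image:

```python
def det2(p, q):
    """Determinant of the 2d vectors p and q (parallelogram area)."""
    return p[0] * q[1] - p[1] * q[0]

def cross(v, w):
    x = det2((v[1], v[2]), (w[1], w[2]))    # drop the x-components
    y = -det2((v[0], v[2]), (w[0], w[2]))   # drop y, and reverse the sign
    z = det2((v[0], v[1]), (w[0], w[1]))    # drop the z-components
    return (x, y, z)

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

v, w = (1, 3, 0), (-2, 0, 0)
c = cross(v, w)
assert c == (0, 0, 6)                # the cross product vector shown
assert dot(c, v) == dot(c, w) == 0   # perpendicular to both factors
```

The dot-product check at the end is the punch line: zero dot products mean the constructed vector really is perpendicular to both factors.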

We’ll look more into the intuition behind this later. It is not immediately obvious why three simple area calculations (the determinants) should be able to deliver a vector that is exactly perpendicular to the two factor vectors (which is an indication that we don’t know everything there is to know about the seemingly dirt-simple concept of area!). But the cross product has a lot of fascinating connections to and uses in physics and engineering—and computer graphics.

I’ll leave you with this exercise to determine the cross product, or a vector perpendicular to this little ramp. The blue vector is (0, 4, 1), and the red vector is (–2, 0, 0).

Vectors share a lot of characteristics with complex numbers. They are both multi-dimensional objects, so to speak. Position vectors with 2 components \(\mathtt{(x_1, x_2)}\) behave in much the same way geometrically as complex numbers \(\mathtt{a + bi}\). At the right, you can see that Geogebra displays the position vectors as arrows and the complex numbers as points. In some sense, though, we could use both the vector and the complex number to refer to the same object if we wanted.

You’ll have no problem finding out about how to multiply two complex numbers, though a similar product result for multiplying 2 vectors seems to be hard to come by. For complex numbers, we just use the Distributive Property: \[\mathtt{(a + bi)(c + di) = ac + adi + bci + bdi^2 = ac – bd + (ad + bc)i}\] In fact, we are told that we can think of multiplying complex numbers as rotating points on the complex plane. Since \(\mathtt{0 + i}\) is at a 90° angle to the x-axis, multiplying \(\mathtt{3 + 2i}\) by \(\mathtt{0 + i}\) will rotate the point \(\mathtt{3 + 2i}\) ninety degrees about the origin: \[\mathtt{(3 + 2i)(0 + 1i) = (3)(0) + (3)(1)i + (2)(0)i + (2)(1)i^2 = -2 + 3i}\]
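Python’s built-in complex type makes this rotation claim easy to verify:

```python
import cmath

# Multiplying by 0 + 1j rotates a point 90 degrees about the origin.
z = 3 + 2j
rotated = z * 1j
assert rotated == -2 + 3j     # (3, 2) lands at (-2, 3)

# More generally, multiplying by the unit complex number
# cos(theta) + i sin(theta) rotates by theta.
quarter_turn = cmath.exp(1j * cmath.pi / 2)   # numerically 0 + 1j
assert abs(z * quarter_turn - rotated) < 1e-9
```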

We’ll get the same result after changing the order of the factors too, of course, since complex multiplication is commutative, but now we have to say that \(\mathtt{0 + i}\) was not only rotated by β but scaled as well.

By what was it scaled? Well, since the straight vertical vector has a length of 1, it was scaled by the length of the vector represented by the complex number \(\mathtt{3 + 2i}\), or \(\mathtt{\sqrt{13}}\).

Multiplying Vectors in the Same Way

It seems that we can multiply vectors in the same way that you can multiply complex numbers, though I’m hard pressed to find a source which describes this possibility.

That is, we can rotate the position vector (a, b) so many degrees (\(\mathtt{tan^{-1}(\frac{d}{c})}\)) counterclockwise by multiplying by the position vector (c, d) of unit length, like so: \[\begin{bmatrix}\mathtt{a}\\\mathtt{b}\end{bmatrix}\begin{bmatrix}\mathtt{c}\\\mathtt{d}\end{bmatrix} = \begin{bmatrix}\mathtt{ac – bd}\\\mathtt{ad + bc}\end{bmatrix}\]

Want to rotate the vector (5, 2) by 19°? First we determine the unit vector which forms a 19° angle with the x-axis. That’s (cos(19°), sin(19°)). Then multiply as above:
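A small Python sketch of this multiplication (the `rotate` helper is my own name for it), checking that the result of rotating (5, 2) by 19° has the same length as (5, 2) and sits 19° further counterclockwise:

```python
import math

def rotate(v, degrees):
    """Rotate the 2d vector v counterclockwise by multiplying (a, b)
    by the unit vector (cos θ, sin θ), complex-multiplication style."""
    a, b = v
    c = math.cos(math.radians(degrees))
    d = math.sin(math.radians(degrees))
    return (a * c - b * d, a * d + b * c)

x, y = rotate((5, 2), 19)

# The rotated vector keeps its length...
assert math.isclose(math.hypot(x, y), math.hypot(5, 2))

# ...and its angle from the x-axis grows by exactly 19 degrees.
before = math.degrees(math.atan2(2, 5))
after = math.degrees(math.atan2(y, x))
assert math.isclose(after - before, 19)
```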

Seems like a perfectly satisfactory way of multiplying vectors to me. We have some issues with undefined values and generality, etc., but for knocking some things together, multiplying vectors in this crazy way seems easier to think about than hauling out full-blown matrices to do the job.

A really cool thing about vectors is that they are used to represent and compare a lot of different things that don’t, at first glance, appear to be mathematically representable or comparable. And a lot of this power comes from working with vectors that are “bigger” than the 2-component vectors we have looked at thus far.

For example, we could have a vector with 26 components. Some would say that this is a vector with 26 dimensions, but I don’t see the need to talk about dimensions—for the most part, if we’re talking about 26-component vectors, we’re probably not talking about dimensions in any helpful sense, except to help us look smart.

At the right are two possible 26-component vectors. We can say that the vector on the left represents the word pelican. The vector on the right represents the word clap. Each component of the vectors is a representation of a letter from a to z in the word. So, each vector may not be unique to the word it represents. The one on the left could also be the vector for capelin, a kind of fish, or panicle, which is a loose cluster of flowers.

The words, however, are similar in that the longer word pelican contains all the letters of the shorter word clap. We might be able to see this similarity show up if we measure the cosine between the two vectors. The cosine can be had, recall, by determining the dot product of the vectors (multiply each pair of corresponding components and add all the products) and dividing the result by the product of their lengths (the length being, in each case, the square root of \(\mathtt{c_{1}^{2} + c_{2}^{2} + \ldots}\), where \(\mathtt{c_{i}}\) is the vector’s i-th component). Since pelican has 7 distinct letters and clap has 4, what we get for the two vectors on the right is: \[\mathtt{\frac{4}{\sqrt{7}\sqrt{4}} \approx 0.756}\]

This is fairly close to 1. The angle measure between the two words would be about 41°. Now let’s compare pelican and plenty. These two words are also fairly similar—there is the same 4-letter overlap between the words—but should yield a smaller cosine because of the divergent letters. Confirm for yourself, but for these two words I get: \[\mathtt{\frac{4}{\sqrt{7}\sqrt{6}} \approx 0.617}\]

And that’s about a 52-degree angle between the words. An even more different word, like sausage (the a and s components have 2s), produces a cosine (with pelican) of about 0.342, which is about a 70° angle.

So, we see that with vectors we can apply a numeric measurement to the similarity of words, with anagrams having cosines of 1 and words sharing no letters at all being at right angles to each other (having a dot product and cosine of 0).
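A minimal Python sketch of this letter-vector comparison (the helper names are mine):

```python
from collections import Counter
from math import isclose, sqrt
from string import ascii_lowercase

def letter_vector(word):
    """26-component vector of letter counts, a through z."""
    counts = Counter(word)
    return [counts[ch] for ch in ascii_lowercase]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    length = lambda w: sqrt(sum(c * c for c in w))
    return dot / (length(u) * length(v))

# Anagrams have identical vectors, so their cosine is 1...
assert isclose(cosine(letter_vector("listen"), letter_vector("silent")), 1)

# ...and words sharing no letters at all are at right angles.
assert cosine(letter_vector("fruit"), letter_vector("gecko")) == 0

print(round(cosine(letter_vector("pelican"), letter_vector("clap")), 3))
```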

I’ve started thinking about Modules 0 for Grade 6. And I’ve written my first sequence for inequalities, which I’ll show below. Although I tried to design the sequence using ideas from variation theory, I found that the specific goal I had for this sequence—writing inequalities of the form x < c and c < x from number line models—did not make it easy to think of a boatload of questions I could ask, each slightly different from the previous one. Plus, I had some slightly more robust instructional goals in mind. Still, I found that it paid off to even just try thinking about variation.

So, I start with the video below, which serves as the first (and only) instructional worked example in the sequence.

I use the Silent Teacher method, wherein I essentially show the worked example twice, the second time with my voice annotating what I’m seeing, doing, and thinking as I write the inequality to represent the two models. In the lesson, I include a brief reminder to students above the video what the inequality symbols mean and what the equals sign means.

My assumptions with regard to this content are that students have seen and used inequality symbols for a long time before they get to Grade 6, though primarily with positive numbers and not variables or negatives. So, this represents a kind of “start-again” topic, which is one reason why I include the block models along with the number line model. It is a compromise between extending the concept and reviewing it, so I do a bit of both.

Another reason I include the block models is because they make a solid, albeit abstract, connection to the use of inequalities with algebraic expressions to express relative values in situations where we don’t know one of the values. We know that q above represents a number greater than x, but we can’t mark q on the number line because we don’t know its exact value. This is what the thinking question below the video is hopefully getting at. It’s numbered in case an instructor wants to assign the sequence to a student.

The Sequence

After the video, there is a sequence of a mere 8 questions. The first of these, shown at the right, is not a typical “Your Turn” type of question, where the student tries out a technique on a very similar problem. Here we unpack the other ways to express the inequalities shown in the video—it’s important to constantly make the point that there are almost always a few different ways of looking at mathematical relationships—and we include the equation, in part because research tells us that comparing the equals sign with other relational operators reinforces the correct relational view of the equals sign.

Next up is a more typical Your Turn, with a block model and number line model both closely mirroring the models shown in the video.

Students can write n or 1 to represent the single block (or the point labeled with both n and 1 on the number line). Doing so helpfully reinforces a slightly better meaning of “variable,” which is a letter that represents any quantity, known or unknown.

And here, for the first time (in a thinking question), I ask students to relate the number line model to the blocks model.

The next question in the sequence is an example of some minimal variation. What’s different here is that the m and n block towers switch sides in the illustration, and the inequality model on the number line shifts to the right. Everything else stays the way it was.

We could continue in this way, adding or subtracting blocks, switching sides, etc., but this kind of model has limitations that don’t allow for examining more of the variation space. But we can hint at the fact that adding the same number to both sides of an inequality doesn’t change the direction of the inequality.

And that’s what we do in the next exercise in the sequence. Here also, the known number is moved along the number line. The thinking question I ask here is:

Would adding 1 block to each tower change the direction of the inequality? Why or why not?

I phrase the question as a hypothetical because, strictly speaking, it’s not evident from the diagram that I added exactly 1 block to tower m.

And Now for a Big Change

Now we see how this isn’t really a sequence of minimal variation. One reason for the change-up is that I realized too late that the model I started with could only show the greater quantity as the unknown quantity. I thought about changing to a different model, one which could show the full range of variation, but I couldn’t think of a situation that worked.

This example, in which the larger quantity (the greater height) is the known, was too good to pass up. And it gave me a context to foreshadow subtracting the same number from both sides of an inequality, which is what (kind of) happens in the next exercise.

Here, though—and again—it was not plausible to hit this balance of operations idea directly (plus, it’s outside of the scope anyway). We only hint at it. But we still ask the thinking question—again, as a hypothetical—about whether subtracting the same value from both quantities changes the direction of the inequality.

The height examples, and perhaps all of the items in the sequence, lie somewhere between minimal variation and maximal variation. At some point while designing it, I had to stop searching for more perfect examples and just run with it.

The final two items in the sequence present two more (more or less abstract) situations where inequalities seem to fit.

The first, shown at the right, is the “swarm,” which contains too many items to count, though we can know for sure that the number is a greater value than 6. Here too is an example situation that better fits with the idea of a larger unknown that couldn’t be handled by the earlier block models.

In this example, I’ve switched up the labels on the number line for a small taste of minimal variation within all the macro variation going on.

Finally, there’s temperature and a quick example showing negative numbers.

What we get at here, also, is that we haven’t left the universe of comparing numbers just because we’re introducing a little algebra. Plus, I’ve eliminated the number line model here, just for a little flavor—and it’s too close in appearance to the thermometer levels. I didn’t want that confusion creeping in.

Something that stands out in my mind as I have learned more linear algebra recently is how much more sane it feels to do a lot of forward thinking before getting into the backward “solving” thinking—to, for example, create a bunch of linear transformations and strengthen my ability to do stuff with the mathematics before throwing a wrench in the works and having me wonder what would happen if I didn’t know the starting vectors.

So, we’ll continue that forward thinking here by looking at the effect of combining transformations. Or, if we think about a 2 × 2 matrix as representing a linear transformation, then we’ll look at combining matrices.

How about this one, then? This is a transformation in which the basis vector \(\mathtt{(1, 0)}\) goes to \(\mathtt{(1, \frac{1}{3})}\) and the basis vector \(\mathtt{(0, 1)}\) goes to \(\mathtt{(-2, 1)}\). You can see the effect this transformation has on the unshaded triangle (producing the shaded triangle).

Before we combine this with another transformation, notice that the horizontal base of the original triangle, which was parallel to the horizontal basis vector, appears to be, in its transformed form, now parallel to the transformed horizontal basis vector. Let’s test this. \[\begin{bmatrix}\mathtt{1} & \mathtt{-2}\\\mathtt{\frac{1}{3}} & \mathtt{\,\,\,\,1}\end{bmatrix}\begin{bmatrix}\mathtt{2}\\\mathtt{2}\end{bmatrix} = \begin{bmatrix}\mathtt{-2}\\\mathtt{2\frac{2}{3}}\end{bmatrix} \quad\text{and}\quad\begin{bmatrix}\mathtt{1} & \mathtt{-2}\\\mathtt{\frac{1}{3}} & \mathtt{\,\,\,\,1}\end{bmatrix}\begin{bmatrix}\mathtt{4}\\\mathtt{2}\end{bmatrix} = \begin{bmatrix}\mathtt{0}\\\mathtt{3\frac{1}{3}}\end{bmatrix}\]

The slope of the originally horizontal but now transformed base is, then, \(\mathtt{\frac{3\frac{1}{3}\, - \,2\frac{2}{3}}{0\,-\,(-2)} = \frac{\frac{2}{3}}{2} = \frac{1}{3}}\), which is the same slope as that of the transformed horizontal basis vector \(\mathtt{(1, \frac{1}{3})}\).
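Here’s a quick way to double-check that arithmetic in Python, using exact fractions (the matrix and helper names here are just my own labels, not from any particular library):

```python
from fractions import Fraction as F

# The transformation matrix from above: columns are the images
# of the basis vectors (1, 0) and (0, 1).
M = [[F(1), F(-2)],
     [F(1, 3), F(1)]]

def apply(M, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

# Transform the endpoints of the triangle's horizontal base.
p1 = apply(M, (F(2), F(2)))   # (-2, 2 2/3)
p2 = apply(M, (F(4), F(2)))   # (0, 3 1/3)

# The slope of the transformed base matches the slope of the
# transformed horizontal basis vector (1, 1/3).
slope = (p2[1] - p1[1]) / (p2[0] - p1[0])
print(slope)  # 1/3
```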

Transform the Transformation

Okay, so let’s transform the transformation, as shown at the right, under this matrix: \[\begin{bmatrix}\mathtt{-1} & \mathtt{0}\\\mathtt{\,\,\,\,0} & \mathtt{\frac{1}{2}}\end{bmatrix}\]

Is it possible to multiply the two matrices to get our final (purple) transformation? Here’s the multiplication of the two matrices (note the order: the later transformation goes on the left) and the result: \[\begin{bmatrix}\mathtt{-1} & \mathtt{0}\\\mathtt{\,\,\,\,0} & \mathtt{\frac{1}{2}}\end{bmatrix}\begin{bmatrix}\mathtt{1} & \mathtt{-2}\\\mathtt{\frac{1}{3}} & \mathtt{\,\,\,\,1}\end{bmatrix} = \begin{bmatrix}\mathtt{-1} & \mathtt{\,\,\,\,2}\\\mathtt{\,\,\,\,\frac{1}{6}} & \mathtt{\,\,\,\,\frac{1}{2}}\end{bmatrix}\]

You should be able to check that, yes indeed, this last matrix takes the original triangle to the purple triangle. You should also be able to verify that reversing the order of the multiplication changes the answer completely: matrix multiplication is not commutative. Notice also that the determinant of the combined matrix is \(\mathtt{-\frac{5}{6}}\), or about \(\mathtt{-0.8333}\). This tells us that the area of the new triangle is \(\mathtt{\frac{5}{6}}\) that of the original, and the negative sign indicates the reflection the triangle underwent. The determinant of the first matrix is \(\mathtt{-\frac{1}{2}}\), and that of the second is \(\mathtt{\frac{5}{3}}\). Multiply those together and you get the determinant of the combined transformation matrix.
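All of those claims can be tested concretely with a few lines of Python (again with exact fractions; the function names are mine):

```python
from fractions import Fraction as F

def matmul(A, B):
    """2x2 matrix product A*B (A is applied after B)."""
    return [[A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]],
            [A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]]]

def det(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0]*M[1][1] - M[0][1]*M[1][0]

first  = [[F(-1), F(0)], [F(0), F(1, 2)]]   # the later transformation (on the left)
second = [[F(1), F(-2)], [F(1, 3), F(1)]]   # the original transformation

combined = matmul(first, second)            # entries -1, 2, 1/6, 1/2
print(matmul(second, first) == combined)    # False: not commutative
print(det(combined) == F(-5, 6))            # True: about -0.8333
print(det(first) * det(second) == det(combined))  # True
```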

Well, we should be pretty comfortable moving things around with vectors and matrices. We’re good on some of the forward thinking. We can think of a matrix \(\mathtt{A}\) as a mapping of one vector (or an entire set of vectors) to another vector (or to another set of vectors). Then we can think of \(\mathtt{B}\) as the matrix which undoes the mapping of \(\mathtt{A}\). So, \(\mathtt{B}\) is the inverse of \(\mathtt{A}\).

How do we figure out what \(\mathtt{A}\) and \(\mathtt{B}\) are?

As it turns out, we can figure out the matrix \(\mathtt{A}\) without doing any calculations. Break the movement of the green point into horizontal and vertical components. Horizontally, the green point is reflected across the y-axis and then stretched another third of its distance from the y-axis, which corresponds to multiplying the horizontal component by \(\mathtt{-\frac{4}{3}}\). Vertically, the green point starts at 3 and ends at 1, so the vertical component is dilated by a factor of \(\mathtt{\frac{1}{3}}\). We can see both of these transformations in the changed sizes and directions of the blue and orange basis vectors. So, our transformation matrix \(\mathtt{A}\) is shown below. When we multiply the vector \(\mathtt{(3, 3)}\) by this transformation matrix, we get the point, or position vector, \(\mathtt{(-4, 1)}\). \[\begin{bmatrix}\mathtt{-\frac{4}{3}} & \mathtt{0}\\\mathtt{\,\,\,\,0} & \mathtt{\frac{1}{3}}\end{bmatrix}\begin{bmatrix}\mathtt{3}\\\mathtt{3}\end{bmatrix} = \begin{bmatrix}\mathtt{-4}\\\mathtt{\,\,\,\,1}\end{bmatrix}\]

You can see that \(\mathtt{A}\) is a scaling matrix, which is why it can be eyeballed, more or less. And what is the inverse matrix? We can use similar reasoning and work backward from \(\mathtt{(-4, 1)}\) to \(\mathtt{(3, 3)}\). For the horizontal component, reflect across the y-axis and scale down by \(\mathtt{\frac{3}{4}}\). For the vertical component, multiply by 3. So, the inverse matrix, \(\mathtt{B}\), applied to the vector, produces the correct starting vector: \[\begin{bmatrix}\mathtt{-\frac{3}{4}} & \mathtt{0}\\\mathtt{\,\,\,\,0} & \mathtt{3}\end{bmatrix}\begin{bmatrix}\mathtt{-4}\\\mathtt{\,\,\,\,1}\end{bmatrix} = \begin{bmatrix}\mathtt{3}\\\mathtt{3}\end{bmatrix}\]

You’ll notice that we use the reciprocals of the non-zero scaling numbers in the original matrix to produce the inverse matrix. You can do the calculations with the other points on the animation above to test it out.
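One way to test it out: if \(\mathtt{B}\) really is the inverse of \(\mathtt{A}\), their product (in either order) should be the identity matrix. A quick sketch, with my own helper names:

```python
from fractions import Fraction as F

def matmul(A, B):
    """2x2 matrix product A*B."""
    return [[A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]],
            [A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]]]

A = [[F(-4, 3), F(0)], [F(0), F(1, 3)]]   # the scaling matrix from above
B = [[F(-3, 4), F(0)], [F(0), F(3)]]      # reciprocal scales: the claimed inverse

identity = [[1, 0], [0, 1]]
print(matmul(B, A) == identity)  # True: B undoes A
print(matmul(A, B) == identity)  # True: and A undoes B
```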

Incidentally, we can also eyeball the eigenvectors—those vectors which don’t change direction but are merely scaled by the transformation—and even the eigenvalues (the scale factor of each transformed eigenvector). The vector \(\mathtt{(1, 0)}\) is an eigenvector, with an eigenvalue of \(\mathtt{-\frac{4}{3}}\) for the original transformation and \(\mathtt{-\frac{3}{4}}\) for the inverse. And the vector \(\mathtt{(0, 1)}\) is an eigenvector, with an eigenvalue of \(\mathtt{\frac{1}{3}}\) for the original transformation and \(\mathtt{3}\) for the inverse.
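The eyeballed eigen-facts can be checked directly, too: applying each matrix to a candidate eigenvector should give back that same vector, merely scaled by the eigenvalue. A sketch (names mine):

```python
from fractions import Fraction as F

def apply(M, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return (M[0][0]*v[0] + M[0][1]*v[1],
            M[1][0]*v[0] + M[1][1]*v[1])

A     = [[F(-4, 3), F(0)], [F(0), F(1, 3)]]  # original transformation
A_inv = [[F(-3, 4), F(0)], [F(0), F(3)]]     # its inverse

e1, e2 = (F(1), F(0)), (F(0), F(1))

# Each basis vector comes back merely scaled, so it is an eigenvector;
# the scale factor is the eigenvalue.
print(apply(A, e1) == (F(-4, 3), F(0)))      # True: eigenvalue -4/3
print(apply(A_inv, e1) == (F(-3, 4), F(0)))  # True: eigenvalue -3/4
print(apply(A, e2) == (F(0), F(1, 3)))       # True: eigenvalue 1/3
print(apply(A_inv, e2) == (F(0), F(3)))      # True: eigenvalue 3
```

Note that each eigenvalue of the inverse is the reciprocal of the corresponding eigenvalue of the original, just as with the diagonal entries themselves.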