The Wrong Prior Knowledge

One of the biggest sins one can commit inside instruction of any kind—curriculum or live teaching—is to simply instantiate a concept or skill seemingly out of thin air, like magic. “Here’s how to solve a proportion using the means-and-extremes method,” for example—no functional connection to prior learning, no motivating the need for the knowledge, just poof. (Unfortunately, it is this degenerate caricature of telling that is wrongly held up as the only form of telling possible in teaching.)

The force of this moral commitment against magic in instruction is not absolute in anyone’s mind. But it is also strong enough, I think, to make us desperate to include, for a given concept, any prior-knowledge connection we can lay our hands on, some kind of motivating idea. And not only is it possible for us to choose poorly in this regard, it is certainly possible that such desperation makes bad choices more likely.

This is what I think has happened with the concept of multiplication. Desperate to find some way of making multiplication make sense to students, we made the mistake of connecting it to students’ prior knowledge of addition, and, by doing so, have created a situation where generations of students confuse the two operations at multiple points in their educational journeys. Even with less “additive” connections like area models or equal groups, most students (and then adults) still maintain a notion of multiplication as a kind of fast addition. Here’s how I think the operations are schematized in most students’ minds:

Addition and subtraction are conceptualized correctly, not only in themselves as appending and taking away quantities, but also in relation to each other: taking away is, more or less intuitively, the inverse of appending. But because multiplication is just turbo-charged addition, the only good candidate for its inverse would be turbo-charged subtraction. Yet, we have decided, collectively, that a different connection underlies division: equal shares. This leaves us in a state where we sort of know, intellectually, that multiplication and division are inverses; we just don’t really think of them or treat them that way. This schema also leaves us with the notion that division doesn’t increase or decrease a quantity—it just spreads it out. And that multiplication only increases a quantity. It’s a mess.

Multiplication as Cloning

Students’ schemas around multiplication certainly don’t have to match the “official” organization of concepts in mathematics, but it is somewhat concerning that we have drifted so far from the official version. Ideally, the schema should have two rows, one for addition and its inverse, subtraction, and one for multiplication and its inverse, division. Connections can be drawn among all the operations, of course, but these two rows should be distinct. Multiplication and division do different things to numbers than addition and subtraction.

The best example of this distinction I have come across, for adults, is to imagine a number line and then what each operation does to all the numbers on that line. Adding 2 to every number on the number line translates all the numbers 2 units to the right. Subtracting 2 moves them all back. Multiplication, on the other hand, does something very different. With 0 fixed, multiplying by 2, say, stretches (dilates) every other number to twice its distance from 0. Dividing by 2 compresses them back.
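To see the contrast in a couple of lines of code (a sketch of my own, not from the original post), apply each operation to a handful of points on the number line:

```python
# Addition translates every point by the same amount;
# multiplication (with 0 fixed) scales each point's distance from 0.
points = [-2, -1, 0, 1, 2, 3]

translated = [x + 2 for x in points]  # slide everything 2 units right
dilated = [x * 2 for x in points]     # stretch everything away from 0

print(translated)  # [0, 1, 2, 3, 4, 5]
print(dilated)     # [-4, -2, 0, 2, 4, 6]
```

Note that 0 moves to 2 under the translation but stays put under the dilation, which is the sense in which 0 is fixed.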

All of that is pretty abstract for students. So, what if we started like this instead, with the notion of multiplication as a kind of cloning device:

We’re not simply appending a quantity to a starting quantity, we’re cloning what we started with (which is why we can’t go anywhere when we start with 0). And we also have a code that we can consistently decipher: for integers, \(\mathtt{a\times b}\) means that every 1 unit in \(\mathtt{a}\) expands to \(\mathtt{b}\) units. We can do this in different ways, because we can change what we mean by “1 unit.” The quantity 3 on the left expands to 6, using 1 square consistently as 1 unit. On the right, we essentially have \(\mathtt{\left(2+1\right)\times 2}\), using 2 as the unit in one case and 1 as the unit in the other. The code simply tells us that every unit expands to \(\mathtt{b}\) units. With the right perspective on multiplication, we pick up the Distributive Property for free (and axiomatically free it is in mathematics).
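Worked out in symbols for the picture just described (my arithmetic): splitting the 3 into 2 + 1 and applying the code to each piece is exactly the Distributive Property in action.

\[\mathtt{3\times 2=\left(2+1\right)\times 2=2\times 2+1\times 2=4+2=6}\]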

The best part of this code is that it doesn’t have to change when we reach fractions (whereas the additive model has to be thrown away at this point; good luck, students!).

Even for integers written as fractions, what \(\mathtt{4\times 3}\) means is that every 1 unit (the denominator of the multiplier) in 4 expands to 3 units. Given this interpretation, I’m hoping you can feel, in this next image, how satisfying it is to see not only how fraction multiplication and division use the same code, but also see division in its rightful place as the inverse of multiplication:

Despite its potential benefits, this approach will no doubt face resistance from some educators due to its unconventional nature. But in my mind, this is how curriculum improves: here’s an idea, let’s talk with the experts. Experts (of classroom teaching): the realities of the classroom mean that I can only take the idea this far. Let’s work out the solution! (More examples welcome!)

Some TikTok Math

@rebrokeraaron, duet with @dustin_wheeler

I came across a nice TikTok vid recently that definitely screamed out for a mathematical explanation. It was interesting to watch because I had written an activity not too long ago which connected the two mathematical ideas relevant to this video—I just didn’t have this nice context to apply it to.

Anyway, you can see it at the right there.

The two connected ideas at work here are the Triangle Proportionality Theorem and the Midpoint Formula. We don’t actually need both of these concepts to explain the video. It’s just more pleasing to apply them both together.

The Triangle Proportionality Theorem (see an interactive proof here) tells us that if we draw a line segment inside a triangle that connects two sides and is parallel to the third side, then the segment will divide the two connected sides proportionally.

So, \(\mathtt{\overline{DE}}\) above connects sides \(\mathtt{\overline{AB}}\) and \(\mathtt{\overline{AC}}\), is parallel to side \(\mathtt{\overline{BC}}\), and thus divides both \(\mathtt{\overline{AB}}\) and \(\mathtt{\overline{AC}}\) proportionally, such that \(\mathtt{a:b=c:d}\).

And of course the Midpoint Formula, which we’ll suss out in a second, is \(\mathtt{\left(\frac{x_{1}+x_{2}}{2}, \frac{y_{1}+y_{2}}{2}\right)}\).

Okay, Let’s Explain This

What’s happening in the video is that the measurer is creating a right triangle. They first make a horizontal measurement—which they don’t know the midpoint of. That’s the horizontal leg of the right triangle. Then they take the tape measure up the board diagonally. That’s the hypotenuse of the right triangle. We do know that midpoint (because it’s a simpler whole number).

We can drop a line segment from the midpoint that we know straight down vertically, such that the segment is parallel to the vertical side of the triangle (the vertical side of the board).

Now we have set up everything we need for the Triangle Proportionality Theorem. The vertical segment connects two sides—the hypotenuse and the horizontal side—and is parallel to the third (vertical) side. Thus, the vertical segment divides the hypotenuse and the horizontal side proportionally.

But since we know that the segment is at the midpoint of the hypotenuse, we know that it divides the hypotenuse into the ratio \(\mathtt{1:1}\). That is, it divides the hypotenuse into two congruent segments. Therefore, it must also divide the horizontal line segment into two congruent segments!

Okay, so that pretty much nails it (ha!). Why do we need the Midpoint Formula?

Now for the Midpoint Formula

Well, we don’t really need the Midpoint Formula. It’s just interesting that the Triangle Proportionality Theorem explains why the Midpoint Formula works. Working through the explanation above, it may have occurred to you that what the Triangle Proportionality Theorem says, indirectly, about ANY right triangle, is that—if you can wrangle the legs to be horizontal and vertical—the midpoint of the hypotenuse is always directly above or below the midpoint of the horizontal side, and the midpoint of the hypotenuse is always directly to the left or right of the midpoint of the vertical side. That’s what the Midpoint Formula basically says in symbols.
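Here’s a quick numerical check of that claim (my own sketch, with made-up coordinates): for a right triangle whose legs are horizontal and vertical, the midpoint of the hypotenuse lands directly above the midpoint of the horizontal leg.

```python
# Right triangle with legs aligned to the axes.
A = (3.0, 7.0)   # top vertex
B = (3.0, 1.0)   # right-angle vertex
C = (11.0, 1.0)  # far end of the horizontal leg

def midpoint(p, q):
    """The Midpoint Formula, applied to two points."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

hyp_mid = midpoint(A, C)  # midpoint of the hypotenuse
leg_mid = midpoint(B, C)  # midpoint of the horizontal leg

print(hyp_mid, leg_mid)   # (7.0, 4.0) (7.0, 1.0)
assert hyp_mid[0] == leg_mid[0]  # directly above: same x-coordinate
```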

Imitation and the Ratchet Effect

Comparative psychologist Michael Tomasello, in his 1999 book The Cultural Origins of Human Cognition, popularized the now widely adopted metaphor of the “ratchet effect” in human cultural evolution:

Basically none of the most complex human artifacts or social practices—including tool industries, symbolic communication, and social institutions—were invented once and for all at a single moment by any one individual or group of individuals. Rather, what happened was that some individual or group of individuals first invented a primitive version of the artifact or practice, and then some later user or users made a modification, an “improvement,” that others then adopted perhaps without change for many generations, at which point some other individual or group of individuals made another modification, which was then learned and used by others, and so on over historical time in what has sometimes been dubbed “the ratchet effect” (Tomasello, Kruger, and Ratner, 1993). The process of cumulative cultural evolution requires not only creative invention but also, and just as importantly, faithful social transmission that can work as a ratchet to prevent slippage backward—so that the newly invented artifact or practice preserves its new and improved form at least somewhat faithfully until a further modification or improvement comes along.

But the ratchet effect presents us with a bit of a puzzle for children’s learning—or how we typically think about that learning. One can imagine, for example, a first-generation technology for dividing resources into fair shares where rocks are used as symbols and moved around into equal groups. Future generations learn this technique and then gradually innovate on it by—again, for example—recognizing that one can divide 18 into fair shares by first dividing the 10 into equal groups and then dividing the 8 into the same number of equal groups, rather than taking and moving around all 18 at once.

Even at this stage the challenge of explaining to a new generation of children why one can do this should seem more daunting than explaining the first-generation method. But now throw on top all of the cumulative innovations we can imagine here for analog division across thousands of generations: rocks are eventually replaced by written symbols, contexts where the division process applies proliferate and become more abstract, and a technology is eventually developed (long division) that allows a user to mechanistically divide any number into just about any other without needing to think about the context at all.

All of these developments are positive (or neutral) cultural innovations. But the learner in the one-thousandth generation is not neurologically all that different from the child in the first generation watching rocks being moved around. Yet, the more modern student is asked to learn a much more causally opaque process—one that has been refined over millennia, which the child was obviously not there to witness, and one whose moving parts are not intuitively related to a goal. It is much simpler for a child just arriving on the scene to intuit the goal of a tribal elder who is separating 105 beads into 3 equal groups than it is for a very similar and similarly-situated modern child to understand the goal of the seemingly random number scrawling associated with long division.

So, the puzzle is this: If the process of cumulative cultural evolution has continued to ratchet over time, how has it been maintained over tens of thousands of years when each new generation starts out marginally further from the goal of understanding any given beneficial technology? For the example of division above, we can point to instructional techniques that actually do start with separating rocks (or counters) into equal groups and building up to the more abstract long division algorithm. But this suite of techniques is already a relic. Digital computing has thoroughly taken over this work, and it’s probably safe to say that very few people (adults and children) really know how it works.

If long division is not a salient example for you, you can relate to the feeling of being an ignorant stranger to your own species’ cultural achievements by asking yourself how much you really understand about how toilets work, how cars work, and on and on. Or consider one of the many gruesome examples—described by Joseph Henrich in his book The Secret of Our Success—of what happens when otherwise intelligent and strong people find themselves outside the protections of relevant cultural understandings:

In June 1845 the HMS Erebus and the HMS Terror, both under the command of Sir John Franklin, sailed away from the British Isles in search of the fabled Northwest Passage, a sea channel that could energize trade by connecting western Europe to East Asia. This was the Apollo mission of the mid-nineteenth century, as the British raced the Russians for control of the Canadian Arctic and to complete a global map of terrestrial magnetism. The British admiralty outfitted Franklin, an experienced naval officer who had faced Arctic challenges before, with two field-tested, reinforced ice-breaking ships equipped with state-of-the-art steam engines, retractable screw propellers, and detachable rudders. With cork insulation, coal-fired internal heating, desalinators, five years of provisions, including tens of thousands of cans of food (canning was a new technology), and a twelve-hundred-volume library, these ships were carefully prepared to explore the icy north and endure long Arctic winters.

As expected, the expedition’s first season of exploration ended when the sea ice inevitably locked them in for the winter around Devon and Beechey Islands, 600 miles north of the Arctic Circle. After a successful ten-month stay, the seas opened and the expedition moved south to explore the seaways near King William Island, where in September they again found themselves locked in by ice. This time, however, as the next summer approached, it soon became clear that the ice was not retreating and that they’d remain imprisoned for another year. Franklin promptly died, leaving his crew to face the coming year in the pack ice with dwindling supplies of food and coal (heat). In April 1848, after nineteen months on the ice, the second-in-command, an experienced Arctic officer named Crozier, ordered the 105 men to abandon ship and set up camp on King William Island.

The details of what happened next are not completely known, but what is clear is that everyone gradually died. . . .

King William Island lies at the heart of Netsilik territory, an Inuit population that spent its winters out on the pack ice and their summers on the island, just like Franklin’s men. In the winter, they lived in snow houses and hunted seals using harpoons. In the summer, they lived in tents, hunted caribou, musk ox, and birds using complex compound bows and kayaks, and speared salmon using leisters. The Netsilik name for the main harbor on King William Island is Uqsuqtuuq, which means “lots of fat” (seal fat). For the Netsilik, this island is rich in resources for food, clothing, shelter, and tool-making (e.g., drift wood).

It’s Not the Innovation

What can explain the rapid progress in cumulative cultural achievements in our species (and no others, to the same extent) when each new generation must in many ways “catch up” to the ratcheted accomplishments of the previous ones? Let’s start with what the answer cannot possibly be. Tomasello again:

Perhaps surprisingly, for many animal species it is not the creative component, but rather the stabilizing ratchet component, that is the difficult feat. Thus, many nonhuman primate individuals regularly produce intelligent behavioral innovations and novelties, but then their groupmates do not engage in the kinds of social learning that would enable, over time, the cultural ratchet to do its work (Kummer and Goodall, 1985).

Similarly, Franklin’s men did not turn to cannibalism and eventually succumb to the elements because they lacked creativity or innovation or could not think outside the box.

The reason Franklin’s men could not survive is that humans don’t adapt to novel environments the way other animals do, nor by using our individual intelligence. None of the 105 big brains figured out how to use driftwood, which was available on King William Island’s west coast where they camped, to make the recurve composite bows that the Inuit used when stalking caribou. They further lacked the vast body of cultural know-how about building snow houses, creating fresh water, hunting seals, making kayaks, spearing salmon, and tailoring cold-weather clothing.

Innovation, by itself, gets us nowhere. The notion that our culture progresses because our species is endowed with big innovative brains (and we just need to unlock that potential) is nonsense in light of what we know about cultural evolution. In reality, what best explains the ratchet effect is a lot of imitation (solving the more difficult problem of storing and transmitting cultural knowledge) and a little bit of innovation (solving the problem of occasionally generating novel ideas, spread by imitation).

It’s the Imitation

The Inuit who survive and thrive in an environment that killed all of Franklin’s men do so because, like Franklin’s men and like us, they are good imitators within their own cultures (and not very good innovators on average). All of us imitate valuable cultural knowledge without completely understanding what we’re doing. We need this skill precisely because of the ratchet effect. It is simply not possible, in general, to personally innovate solutions that can rival the effectiveness of those built up over thousands of generations, and it is similarly impossible to conceptually understand everything in the world before we need to use it. Thus, we imitate first and understand later. Indeed, “understandings” (or, answers to “why” questions) are imitated just as readily as answers to “how” questions, and can be equally causally opaque. If asked by a child why we don’t fly off into space when we jump, your answer would involve copying an understanding—an understanding not of your own devising—about gravity. And you don’t know what gravity is because no one does.

Lest you think (despite the story about Sir John Franklin) that causal opacity and rapid ratcheting are just a puzzle for tech-rich, conventionally educated, Western cultures in developed countries, here’s Henrich again:

Let’s briefly consider just a few of the Inuit cultural adaptations that you would need to figure out to survive on King William Island. To hunt seals, you first have to find their breathing holes in the ice. It’s important that the area around the hole be snow covered—otherwise the seals will hear you and vanish. You then open the hole, smell it to verify that it’s still in use (what do seals smell like?), and then assess the shape of the hole using a special curved piece of caribou antler. The hole is then covered with snow, save for a small gap at the top that is capped with a down indicator. If the seal enters the hole, the indicator moves, and you must blindly plunge your harpoon into the hole using all your weight. Your harpoon should be about 1.5 meters (5 ft) long, with a detachable tip that is tethered with a heavy braid of sinew line. You can get the antler from the previously noted caribou, which you brought down with your driftwood bow. The rear spike of the harpoon is made of extra-hard polar bear bone (yes, you also need to know how to kill polar bears; best to catch them napping in their dens). Once you’ve plunged your harpoon’s head into the seal, you’re then in a wrestling match as you reel him in, onto the ice, where you can finish him off with the aforementioned bear-bone spike.

Another reason to believe that imitation is (most of) the secret sauce for cultural evolution is that imitation shows up very early and robustly in development. In fact, children engage in what is called overimitation—imitating actions performed by a model even when those actions are obviously causally irrelevant to achieving the model’s goal. Other primates don’t do this. Legare and Nielsen explain this counterintuitive finding from research:

Why faithfully copy all of the actions of a demonstrator, even those that are obviously irrelevant? Given the potentially overwhelming number of objects, tools, and artifacts children must learn to use, it is useful to replicate the entire suite of actions used by an expert when first learning how to do something. Some propose that overimitation is an adaptive human strategy facilitating more rapid social learning of instrumental skills than would be possible if copying required a full representation of the causal structure of an event.


There are many takeaways and elaborations that come to mind in light of the above—all of which I’m still sussing out. One important takeaway worth mentioning, I think, is that, because humans have had culture for possibly hundreds of thousands of years, it is not out of the question that we have undergone psychological adaptations that allow us to, most importantly, store and transmit (and, less importantly, innovate on) valuable prefabricated solutions in our cultural groups.

Is it possible that the ratchet effect can help explain a foundational concept in Cognitive Load Theory: that our working memories (our innovation engines) are severely limited while our long-term memories (our imitation engines) are functionally infinite?

The other takeaway comes from Paul Harris, in the last paragraph of his book Trusting What You’re Told: How Children Learn from Others, which follows many of the same themes elaborated above, specifically from the child development angle. It is a takeaway worth taking away, especially for those in education who believe, without question or doubt, that children should be thought of as “little scientists”:

The classic method in social anthropology is not the scientific method in the way that experimental scientists conceive of it. It includes no experiments or control groups. Instead, when anthropologists want to understand a new culture, they immerse themselves in the language, learn from participant observation, and rely on trusted informants. Of course, this method has an ancient pedigree. Human children have successfully used it for millennia across innumerable cultures. Indeed, judging by their methods and their talents, we would do well to think of children not as scientists, but as anthropologists.

GCF and LCM Triangles

Go grab some dot paper or grid paper—or just make some dots in a square grid on a blank piece of paper. Let’s start with a 4 × 4 grid of dots, like so.

A 4 by 4 array of dots.

Now, start at the top left corner, draw a vertical line down to the bottom of the grid, and count each dot that your pen enters—which just means that you won’t count the first dot, since your pen leaves that dot but does not enter it. Then, draw a horizontal line to the right, starting over with your counting. Again, count each dot that your pen enters. Count just 2 dots as you draw to the right.

4 by 4 array of dots with an L-shape 3 high and 2 wide

Finally, draw a straight line (a hypotenuse) back to your starting point. Here again, count the number of dots you enter.

4 by 4 array of dots with a right triangle 3 high and 2 wide

One example is not, of course, enough to convince you that the number of dots your pen enters when drawing the hypotenuse is the greatest common factor (GCF) of the number of counted vertical dots and the number of counted horizontal dots. So, here are a few more examples with just a 4 × 4 grid.
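If a few drawings aren’t convincing, here’s a brute-force check (my own, not from the post): collect the lattice dots lying on the hypotenuse from \(\mathtt{(0, h)}\) to \(\mathtt{(w, 0)}\), drop the starting dot, and compare the count with the GCF.

```python
from math import gcd

def dots_entered(w, h):
    """Dots the pen enters drawing the hypotenuse from (0, h)
    to (w, 0), not counting the starting dot."""
    # (x, y) lies on the segment exactly when x/w + y/h == 1,
    # i.e., x*h + y*w == w*h, with x and y in range.
    on_segment = [(x, y) for x in range(w + 1) for y in range(h + 1)
                  if x * h + y * w == w * h]
    return len(on_segment) - 1  # drop the starting dot

for w in range(1, 9):
    for h in range(1, 9):
        assert dots_entered(w, h) == gcd(w, h)
print("dots entered == GCF for every triangle checked")
```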

No doubt there are tons of people out there for whom this display is completely unsurprising. But it surprised me. The GCF of two numbers is an object that seems as though it should be rather hidden—a value that may appear when we crack two numbers open and do some calculations with them, not something that just pops up when we draw lines on dot paper. We use prime factorization to suss out GCF, after all, and that is by no means an intuitive process.


There are some very nice mathematical connections here. The first is to the coordinate plane, or perhaps more simply to orthogonal axes, which we use to compare values all the time—but only in certain contexts. Widen or eliminate the context constraint, and it seems obvious that comparing two numbers orthogonally could yield insights about GCF.

And slope is, ultimately, the “reason” why this all works. The slope of a line in lowest terms is just the rise over the run with both the numerator and denominator divided by the GCF: \[\mathtt{\frac{\text{rise}\div\text{GCF}}{\text{run}\div\text{GCF}}=\text{slope in lowest terms}}\]
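Python’s `fractions.Fraction` performs exactly this reduction, dividing the numerator and denominator by their GCF, so the connection is easy to check (my example numbers):

```python
from fractions import Fraction
from math import gcd

rise, run = 6, 4
g = gcd(rise, run)            # GCF = 2

slope = Fraction(rise, run)   # Fraction reduces to lowest terms
print(slope)                  # 3/2
assert slope == Fraction(rise // g, run // g)
```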

Once slope is there, all kinds of connections take hold: divisibility, fractions, lowest terms, etc. Linear algebra, too, contains a connection, which itself is connected to something called Bézout’s Identity. There is also a weird connection to calculus—maybe—that I haven’t quite teased out. To see what I mean, let’s also draw the LCM out of these images.

From the lowest entered point on the hypotenuse, draw a horizontal line extending to the width of the triangle. Then draw a vertical line to the bottom right corner of the triangle. Now go left: draw a horizontal line all the way to the left edge of the triangle. Then a vertical line extending to the height of the lowest entered point on the hypotenuse. Finally, move right and draw a horizontal line back to where you started. You should draw a rectangle as shown in each of these examples. The area of each rectangle is the LCM of the two numbers.
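If I’m reading the construction right, the rectangle spans the full width of the triangle at the height of the lowest entered hypotenuse dot, so its dimensions are \(\mathtt{w}\) by \(\mathtt{h\div \text{GCF}}\), and its area is \(\mathtt{w\times h\div \text{GCF}}\), which is one standard formula for the LCM. A sketch of that check (mine, under that reading of the figures):

```python
from math import gcd

def rectangle_area(w, h):
    # Lowest entered dot on the hypotenuse sits at height h // gcd(w, h);
    # the rectangle (as I read the construction) spans the full width w.
    g = gcd(w, h)
    return w * (h // g)

def lcm(a, b):
    return a * b // gcd(a, b)

for w in range(1, 13):
    for h in range(1, 13):
        assert rectangle_area(w, h) == lcm(w, h)
print("rectangle area == LCM for every size checked")
```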

The maybe-calculus connection I speak of is the visible curve vs. area-under-the-curve vibe we’ve got going on there. I’m still noodling on that one.

The Farey Mean?

I had never heard of the Farey mean, but here it is, brought to you by @howie_hua.

When you add two fractions, you of course remember that you should never just add the numerators and add the denominators across. The resulting fraction will not be the sum of the two addends. But if you do add across (under certain conditions which I’ll show below), the result will be a fraction between the two “addend” fractions. So, you can use the add-across method to find a fraction between two other fractions.

For example, \(\mathtt{\frac{1}{2}+\frac{4}{3}\rightarrow\frac{5}{5}}\). The first “addend” is definitely less than 1, and the second definitely greater than 1. The Farey mean here is exactly 1 (or \(\mathtt{\frac{5}{5}}\)), which is between the two “addend” fractions.
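This add-across result is often called the mediant, and the betweenness claim is easy to spot-check (my sketch; note that `Fraction` reduces automatically, so this works with the lowest-terms representations):

```python
from fractions import Fraction
from itertools import combinations

def mediant(p, q):
    """Add numerators and denominators straight across."""
    return Fraction(p.numerator + q.numerator,
                    p.denominator + q.denominator)

a, b = Fraction(1, 2), Fraction(4, 3)
m = mediant(a, b)
print(m)           # 1
assert a < m < b

# Spot-check: the mediant of two unequal fractions lies between them.
fracs = [Fraction(n, d) for n in range(1, 8) for d in range(1, 8)]
for p, q in combinations(fracs, 2):
    if p != q:
        lo, hi = min(p, q), max(p, q)
        assert lo < mediant(lo, hi) < hi
```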

Why Does It Work?

Since this site is fast becoming all linear algebra all the time, let’s throw some linear algebra at this. What we want to show is that, given \(\mathtt{\frac{a}{b}<\frac{c}{d}}\) (we'll go with this assumption for now), \[\mathtt{\frac{a}{b}<\frac{a+c}{b+d}<\frac{c}{d}}\]

for certain positive integer values of \(\mathtt{a,b,c,}\) and \(\mathtt{d}\). I would probably do better to make those inequality signs less-than-or-equal-tos, but let’s stick with this for the present. We’ll start by representing the fraction \(\mathtt{\frac{a}{b}}\) as the vector \(\scriptsize\begin{bmatrix}\mathtt{b}\\\mathtt{a}\end{bmatrix}\) along with the fraction \(\mathtt{\frac{c}{d}}\) as the vector \(\scriptsize\begin{bmatrix}\mathtt{d}\\\mathtt{c}\end{bmatrix}\).

We’re looking specifically at the slopes or angles here (which is why we can represent a fraction as a vector in the first place), so we’ve made \(\scriptsize\begin{bmatrix}\mathtt{d}\\\mathtt{c}\end{bmatrix}\) have a greater slope to keep in line with our assumption above that \(\mathtt{\frac{a}{b}<\frac{c}{d}}\).

The fraction \(\mathtt{\frac{a+c}{b+d}}\) is the same as the vector \(\scriptsize\begin{bmatrix}\mathtt{b+d}\\\mathtt{a+c}\end{bmatrix}\). And since this vector is the diagonal of the vector parallelogram, it will of course have a slope greater than \(\mathtt{\frac{a}{b}}\)’s but less than \(\mathtt{\frac{c}{d}}\)’s. You can keep going forever—just take one of the side vectors and use the diagonal vector as the other side. So long as you’re making parallelograms, the new diagonal’s slope will lie between the slopes of its two side vectors, and the result will be a fraction between the other two.

Incidentally, our assumption at the beginning that \(\mathtt{\frac{a}{b}<\frac{c}{d}}\) doesn't really matter to this picture. If we make \(\mathtt{\frac{c}{d}}\) less than \(\mathtt{\frac{a}{b}}\), our picture simply flips. The diagonal vector still has to be located between the two side vectors.

What Doesn’t Work?

The linear algebra picture of this concept also tells us where this method fails to find a fraction between the two addend fractions. When the two “addend” fractions are equivalent, \(\mathtt{c}\) and \(\mathtt{d}\) are multiples of \(\mathtt{a}\) and \(\mathtt{b}\), respectively, or vice versa. In that case, the resulting fraction looks like this.

The slopes or angles for both addends and for the result are the same, producing a Farey mean that is equal to both fractions.

Cosine Similarity and Correlation

I wrote a lesson not too long ago that started with a Would You Rather? survey activity. For our purposes here, we can pretend that each question had a Likert scale from 1–10 attached to it, though in reality, the lesson was about categorical data.

At any rate, here are the questions—edited a bit. Feel free to rate your answers on the scales provided. Careful! Once you click, you lock in your answer.

Would you rather . . .

  a. be able to fly (1) or be able to read minds (10)?

  b. go way back in time (1) or go way into the future (10)?

  c. be able to talk to animals (1) or speak all languages (10)?

  d. watch only historical movies (1) or sci-fi movies (10) for the rest of your life?

  e. be just a veterinarian (1) or just a musician (10)?

Finally, one last question that is not a would-you-rather. Once you’ve answered this and the rest of the questions, you can press the I'm Finished! button to submit your responses.

  f. Rate your fear of heights from (1) not at all afraid to (10) very afraid.

Check out the results so far.

Are Your Responses Correlated?

Next in the lesson, I move on to asking whether you think some of the survey responses are correlated. For example, if you scored “low” on the veterinarian-or-musician scale—meaning you would strongly prefer to be a veterinarian over a musician—would that indicate that you probably also scored “low” on Question (c) about talking to animals or speaking all the languages? In other words, are those two scores correlated? What about choosing the ability to fly and your fear of heights? Are those correlated? How could we measure this using a lot of responses from a lot of different people?

An ingenious way of looking at this question is by using cosine similarity from linear algebra. (We looked at the cosine of the angle between two vectors here and here.)

For example, suppose you really would rather have the ability to fly and you have almost no fear of heights. So, you answered Question (a) with a 1 and Question (f) with, say, a 2. Another person has no desire to fly and a terrible fear of heights, so they answer Question (a) with an 8 and Question (f) with a 10. From this description, we would probably guess that the two quantities wish-for-flight and fear-of-heights are strongly correlated. But we’ve also now got the vectors (1, 2) and (8, 10) to show us this correlation.

See that tiny angle between the vectors on the left? The cosine of tiny angles (as we saw) is close to 1, which indicates a strong correlation. On the right, you see the opposite idea. One person really wants to fly but is totally afraid of heights (1, 10) and another almost couldn’t care less about flying (or at least would really rather read minds) but has a low fear of heights (8, 2). The cosine of the close-to-90°-angle between these vectors will be close to 0, indicating a weak correlation between responses to our flight and heights questions.
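To make this concrete, here is a minimal Python sketch of the cosine calculation for the two pairs of response vectors above (the `cosine` helper is my own illustrative name, not anything from the post):

```python
from math import sqrt

def cosine(u, v):
    """Cosine of the angle between two vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Left graph: both people pair a wish for flight with a low fear of heights.
print(cosine((1, 2), (8, 10)))   # close to 1

# Right graph: flight paired with opposite fears of heights.
print(cosine((1, 10), (8, 2)))   # noticeably smaller
```

The first cosine comes out near 0.98; the second near 0.34.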

But That’s Not the Ingenious Part

That’s pretty cool, but it is not, in fact, how we measure correlation. The first difficulty we encounter happens after adding more people to the survey, giving us several angles to deal with—not impossible, but pretty messy for a hundred or a thousand responses. The second, more important, difficulty is that the graph on the right above doesn’t show a weak correlation; it shows a strong negative correlation. Given just the two response pairs to work from in that graph, we would have to conclude that a strong fear of heights would make you more likely to want the ability to fly (or vice versa) rather than less likely. But the “weakest” the cosine can measure in this kind of setup is 0.

The solution to the first difficulty is to take all the x-components of the responses and make one giant vector out of them. Then do the same to the y-components. Now we’ve got just two vectors to compare! For our data on the left, the vectors (1, 2) and (8, 10) become (1, 8) and (2, 10). The vectors on the right—(1, 10) and (8, 2)—become (1, 8) and (10, 2).

The solution to the second difficulty—no negative correlations—we can achieve by centering the data. Let’s take our new vectors for the right-hand graph: (1, 8) and (10, 2). Add the components in each vector and divide by the number of components (2) to get an average. Then subtract the average from each component. So, our new centered vectors are

(1 – ((1 + 8) ÷ 2), 8 – ((1 + 8) ÷ 2)) and (10 – ((10 + 2) ÷ 2), 2 – ((10 + 2) ÷ 2))

Or (–3.5, 3.5) and (4, –4). It’s probably not too tough to see that a vector in the 2nd quadrant and a vector in the 4th quadrant are heading in opposite directions. These vectors form a 180° angle, and the cosine of 180° is –1, which is the actual lowest correlation we can get, indicating a strong negative correlation.
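The centering-then-cosine computation above can be sketched in a few lines of Python (the helper names are mine):

```python
from math import sqrt

def center(v):
    """Subtract the mean of the components from each component."""
    mean = sum(v) / len(v)
    return [a - mean for a in v]

def cosine(u, v):
    """Cosine of the angle between two vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

x = center([1, 8])    # → [-3.5, 3.5]
y = center([10, 2])   # → [4.0, -4.0]
print(cosine(x, y))   # -1 (up to floating-point rounding): strong negative correlation
```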

And That’s Correlation

To summarize, the way to determine correlation linear-algebra style is to find the cosine of the angle between the centered x- and y-vectors of the data. That formula is \[\mathtt{\frac{(x-\overline{x}) \cdot (y-\overline{y})}{|x-\overline{x}|\,|y-\overline{y}|} = \cos(\theta)}\]

This is just another way of writing the more common version of the r-value correlation.
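The whole formula translates to Python in a handful of lines (a sketch; the `correlation` helper and the second data set are illustrative, not from the post):

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson's r, computed as the cosine of the angle between
    the centered x-vector and the centered y-vector."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    cx = [a - x_mean for a in xs]
    cy = [b - y_mean for b in ys]
    dot = sum(a * b for a, b in zip(cx, cy))
    return dot / (sqrt(sum(a * a for a in cx)) * sqrt(sum(b * b for b in cy)))

# The two "left graph" responses, regrouped into one x-vector (flight)
# and one y-vector (heights):
print(correlation([1, 8], [2, 10]))      # 1.0 (up to rounding)
print(correlation([1, 2, 3], [6, 4, 2])) # -1.0: a perfectly negative example
```

Note that with only two respondents, r always comes out as ±1 (two centered points always lie on a line through the origin), which is one more reason a real survey needs many responses.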

The Formula for Combinations

And now, finally, let’s get to the formula for combinations. The math in my last post got a little tricky toward the end, with the strange exclamation mark notation floating around. So let’s recap permutations without that notation.

[Interactive tree diagram: moving right multiplies cumulatively by the branches (× 3, × 3 × 2, × 3 × 2 × 1); moving left divides cumulatively (÷ 1, ÷ (1 × 2), ÷ (1 × 2 × 3)).]

You should see that, to traverse a tree diagram, we multiply by the tree branches, cumulatively, to move right, and then divide by those branches—again, cumulatively—to move left. The formula for the number of permutations of 2 cards chosen from 4, \(\mathtt{\frac{4!}{(4-2)!}}\), tells us to multiply all the way to the right, to get 24 in the numerator, and then divide two steps to the left (divide by \(\mathtt{(4-2)!}\), or 2) to get 12 permutations of 2 cards chosen from 4.


An important point about the above is that the permutations of \(\mathtt{r}\) cards chosen from \(\mathtt{n}\) cards, counted by \(\mathtt{_{n}P_r}\), are a subset of the permutations of all \(\mathtt{n}\) cards, counted by \(\mathtt{n!}\) The tree diagram shows all \(\mathtt{n!}\) arrangements, and contained within it are the \(\mathtt{_{n}P_r}\) arrangements.

Combinations of \(\mathtt{r}\) items chosen from \(\mathtt{n}\), denoted as \(\mathtt{_{n}C_r}\), are a further subset. That is, \(\mathtt{_{n}C_r}\) are a subset of \(\mathtt{_{n}P_r}\). In our example of 2 cards chosen from 4, \(\mathtt{_{n}P_r}\) represents the first two columns of the tree diagram combined. In those columns, we have, for example, the permutations JQ and QJ. But these two permutations represent just one combination. The same goes for the other pairs in those columns. Thus, we can see that to get the number of combinations of 2 cards chosen from 4, we take \(\mathtt{_{n}P_r}\) and divide by 2. So, \[\mathtt{\frac{4!}{(4-2)!}\div 2=\frac{4!}{(4-2)!\cdot2}}\]

What about combinations of 3 cards chosen from 4? That’s the first 3 columns combined. Now the repeats are, for example, JQK, JKQ, QJK, QKJ, KJQ, KQJ. Which is 6. Noticing the pattern? For \(\mathtt{_{4}C_2}\), we divide \(\mathtt{_{4}P_2}\) further by 2! For \(\mathtt{_{4}C_3}\), we divide \(\mathtt{_{4}P_3}\) further by 3! We’re dividing (further) by \(\mathtt{r!}\)

When you think about it, this makes sense. We need to collapse every permutation of \(\mathtt{r}\) cards down to 1 combination. So we divide by \(\mathtt{r!}\) Here, finally then, is the formula for combinations: \[\mathtt{_{n}C_r=\frac{n!}{(n-r)!r!}}\]
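The formula can be checked against the worked examples above with a few lines of Python (a sketch; `nCr` is an illustrative name):

```python
from math import factorial

def nCr(n, r):
    """Combinations of r items chosen from n: n! / ((n - r)! * r!)."""
    return factorial(n) // (factorial(n - r) * factorial(r))

print(nCr(4, 2))   # → 6  (JQ, JK, JA, QK, QA, KA)
print(nCr(4, 3))   # → 4  (JQK, JQA, JKA, QKA)
```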

And Now for the Legal Formula

So, did you come up with a working rule to describe the pattern we looked at last time? Here’s what I came up with:

As we saw last time, the “root” of the tree diagram (the first column) shows \(\mathtt{_{4}P_1}\), which is the number of permutations of 1 card chosen from 4. The first and second columns combined show \(\mathtt{_{4}P_2}\), the number of permutations of 2 cards chosen from 4. So, to determine \(\mathtt{_{n}P_r}\), according to this pattern, we start with \(\mathtt{n}\) and then multiply by \(\mathtt{(n-1)}\), then \(\mathtt{(n-2)}\), and so on, until we reach \(\mathtt{n-(r-1)}\).

The number of permutations of, say, 3 items chosen from 5, then, would be \[\mathtt{_{5}P_3=5\cdot (5-1)(5-2)=60}\]

This is a nice rule that works every time for permutations of \(\mathtt{r}\) things chosen from \(\mathtt{n}\) things. It can even be represented a little more ‘mathily’ as \[\mathtt{_{n}P_r=\prod_{k=0}^{r-1}(n-k)}\]
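That product rule translates directly to Python (a sketch; `nPr` is an illustrative name):

```python
from math import prod  # Python 3.8+

def nPr(n, r):
    """Permutations of r items chosen from n: n(n-1)...(n-(r-1))."""
    return prod(n - k for k in range(r))

print(nPr(5, 3))   # → 60
print(nPr(4, 2))   # → 12
print(nPr(4, 4))   # → 24
```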

So let’s move on to the “legal” formula for \(\mathtt{_{n}P_r}\). A quick sidebar on notation, though, which we’ll need in a moment.

When we count the number of permutations at the end of a tree diagram, what we get is actually \(\mathtt{_{n}P_n}\). In our example, that’s \(\mathtt{_{4}P_4}\). The way we write this amount is with an exclamation mark: \(\mathtt{n!}\), or, in our case, \(\mathtt{4!}\) What \(\mathtt{4!}\) means is \(\mathtt{4\times(4-1)\times(4-2)\times(4-3)}\) according to our rule above, or just \(\mathtt{4\times3\times2\times1}\). And \(\mathtt{3!}\) is \(\mathtt{3\times(3-1)\times(3-2)}\), or just \(\mathtt{3\times2\times1}\).

In general, we can say that \(\mathtt{n!=n\times(n-1)!}\) So, for example, \(\mathtt{4!=4\times3!}\), etc. And since this rule gives \(\mathtt{1!=1\times(1-1)!=1\times0!}\), while \(\mathtt{1!=1}\), it must be that \(\mathtt{0!=1}\).

So, for the tree diagram, \(\mathtt{_{4}P_4}\) means multiplying all the way to the right, which gives \(\mathtt{n!}\) But if we’re interested in the number of arrangements of just \(\mathtt{r}\) cards chosen from \(\mathtt{n}\) cards, then we need to come back to the left by \(\mathtt{(n-r)!}\) And since moving right is multiplying, moving left is dividing.

[Interactive tree diagram: moving right multiplies up to 4 × 3 × 2 × 1; moving left divides by (4 − 1)!, (4 − 2)!, (4 − 3)!, or (4 − 4)!.]

The division we need is not immediately obvious, but if you study the tree diagram above, I think it’ll make sense. This gives us, finally, the “legal” formula for the number of permutations of \(\mathtt{r}\) items from \(\mathtt{n}\) items: \[\mathtt{_{n}P_r=\frac{n!}{(n-r)!}}\]
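As a quick check, here is a Python sketch confirming that the “legal” formula agrees with the multiply-right, divide-left reading of the tree diagram (the function name is mine):

```python
from math import factorial, prod

def nPr_legal(n, r):
    """The 'legal' formula for permutations: n! / (n - r)!"""
    return factorial(n) // factorial(n - r)

# Agrees with the product rule n(n-1)...(n-(r-1)) for every r:
for r in range(5):
    assert nPr_legal(4, r) == prod(4 - k for k in range(r))

print(nPr_legal(4, 2))   # → 12
```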

A New Formula for Permutations?

Last time, we saw that combinations are a subset of permutations, and we wondered what the relationship between the two is. Before we get there, though, let’s look at another possible relationship—one we only hinted at last time. And to examine this relationship, we’ll use a tree diagram.

Tree Diagram

This tree diagram shows the number of permutations of the 4 cards J, Q, K, A—the number of ways we can arrange the 4 cards. The topmost branch shows the result JQKA. And you can see all 24 results from our list last time here in the tree diagram.

[Interactive tree diagram: moving right multiplies up to 4 × 3 × 2 × 1; moving left divides by 3!, 2!, 1!, or 0!.]

Here’s where, normally, people would talk about the multiplication 4 × 3 × 2 × 1 and tell you that another way to write that is with an exclamation mark: 4! But that’s skipping over something important.

And that something important is this: Notice that the first column of the tree diagram—the root of the tree—shows 4 items. This is the number of different permutations you can make of just 1 card, chosen from 4 different cards. And the first and second columns combined show the number of permutations you can make of 2 cards, chosen from 4 cards (JQ, JK, JA, etc.).

And so on. You might think that to go from “permutations of 1 card chosen from 4″ to “permutations of 2 cards chosen from 4″ you would multiply by 2. But of course that’s not right (and the tree diagram tells us so). You actually multiply 4 by 4 – 1. And to go from “permutations of 1 card chosen from 4″ to “permutations of 3 cards chosen from 4″ you multiply 4 • (4 – 1) • (4 – 2).

We’re on the verge of being able to describe the relationship, which I’ll put in question form (and mix in some notation, too):

What is the relationship between the number of permutations of \(\mathtt{n}\) things, \(\mathtt{P(n)}\), and the number of permutations of \(\mathtt{r}\) things chosen from \(\mathtt{n}\) things, \(\mathtt{_{n}P_r}\)?

We can see from our example above that \(\mathtt{P(4)=24}\). That is, the number of permutations of 4 things is 24. But we also noticed these three results: $$\begin{aligned}_{\mathtt{4}}\mathtt{P}_{\mathtt{1}}&= \mathtt{4} \\ _{\mathtt{4}}\mathtt{P}_{\mathtt{2}}&= \mathtt{4}(\mathtt{4}-\mathtt{1}) \\ _{\mathtt{4}}\mathtt{P}_{\mathtt{3}}&= \mathtt{4}(\mathtt{4}-\mathtt{1})(\mathtt{4}-\mathtt{2})\end{aligned}$$

A New Formula?

Study the pattern above and see if you can write a rule that will get you the correct result for any \(\mathtt{_{n}P_r}\). Check your results here (for example, for \(\mathtt{_{16}P_{12}}\), you can just enter 16P12 and press Enter).

The rule you write, if you get it right, won’t be the “legal” formula. But it’ll work every time! This is the step we always skip when teaching about permutations. The next step is to think hard about why it works. We’ll get to the “legal” formula for permutations next time.

Permutations & Combinations

I have now been blogging for 16 years, and my very first post (long gone) was on combinations and permutations. So, it’s fun to come back to the idea now. In 2004, my experience with the two concepts was limited to how textbooks often used the awkward “care about order” (permutations) or “don’t care about order” (combinations) language to introduce the ideas. So, that’s what I wrote about then. Now I want to talk about how the two concepts are related.

What They Are

When you count permutations, you count how many different ways you can sequentially arrange some things. When you count combinations, you count how many ways you can have some things. So, given 2 cards, there are 2 different ways you can sequentially arrange 2 cards, but given 2 cards, there’s just one way to have 2 cards.

Right off the bat, the language is weird, and it’s hard to see why combinations should ever be a thing (there’s always just 1 way to have a set of things). But combinations make better sense when you are not choosing from all the elements you are given.

So, for example, how many permutations and combinations can I make of 2 cards, chosen from a total of 3 cards?

Now having the two categories of permutation and combination makes a little more sense. There are 6 permutations of 2 cards chosen from 3 cards and there are 3 combinations of 2 cards chosen from 3 cards. That is, there are 6 different ways to sequentially arrange 2 cards chosen from 3 and just 3 different ways to have 2 cards chosen from 3. And you can see, by the way, that the combinations are a subset of the permutations.
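Here’s a quick sanity check of the 3-card example using Python’s itertools, which enumerates both counts directly:

```python
from itertools import combinations, permutations

cards = ["J", "Q", "K"]

perms = list(permutations(cards, 2))   # ordered arrangements of 2 cards
combs = list(combinations(cards, 2))   # unordered selections of 2 cards

print(len(perms))   # → 6
print(len(combs))   # → 3
```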

In fact, let’s do an example with 4 cards to show the actual relationship between permutations and combinations. Here we’ll just use letters to save space. The permutations of JQKA if we choose 3 cards are:


That’s 24 permutations. For combinations, we get JQK, JQA, QKA, KAJ. That’s 4 combinations. What’s the relationship? We’ll come back to that next time.