Imitation and the Ratchet Effect

a bunch of gears

Comparative psychologist Michael Tomasello, in his 1999 book The Cultural Origins of Human Cognition, popularized the now widely adopted metaphor of the “ratchet effect” in human cultural evolution:

Basically none of the most complex human artifacts or social practices—including tool industries, symbolic communication, and social institutions—were invented once and for all at a single moment by any one individual or group of individuals. Rather, what happened was that some individual or group of individuals first invented a primitive version of the artifact or practice, and then some later user or users made a modification, an “improvement,” that others then adopted perhaps without change for many generations, at which point some other individual or group of individuals made another modification, which was then learned and used by others, and so on over historical time in what has sometimes been dubbed “the ratchet effect” (Tomasello, Kruger, and Ratner, 1993). The process of cumulative cultural evolution requires not only creative invention but also, and just as importantly, faithful social transmission that can work as a ratchet to prevent slippage backward—so that the newly invented artifact or practice preserves its new and improved form at least somewhat faithfully until a further modification or improvement comes along.

But the ratchet effect presents us with a bit of a puzzle for children’s learning—or how we typically think about that learning. One can imagine, for example, a first-generation technology for dividing resources into fair shares where rocks are used as symbols and moved around into equal groups. Future generations learn this technique and then gradually innovate on it by—again, for example—recognizing that one can divide 18 into fair shares by first dividing 10 of the items into equal groups and then dividing the remaining 8 into the same number of equal groups, rather than taking and moving around all 18 at once.

Even at this stage the challenge of explaining to a new generation of children why one can do this should seem more daunting than explaining the first-generation method. But now throw on top all of the cumulative innovations we can imagine here for analog division across thousands of generations: rocks are eventually replaced by written symbols, contexts where the division process applies proliferate and become more abstract, and a technology is eventually developed (long division) that allows a user to mechanistically divide any number into just about any other without needing to think about the context at all.

All of these developments are positive (or neutral) cultural innovations. But the learner in the one-thousandth generation is not neurologically all that different from the child in the first generation watching rocks being moved around. Yet, the more modern student is asked to learn a much more causally opaque process—one that has been refined over millennia, which the child was obviously not there to witness, and one whose moving parts are not intuitively related to a goal. It is much simpler for a child just arriving on the scene to intuit the goal of a tribal elder who is separating 105 beads into 3 equal groups than it is for a very similar and similarly situated modern child to understand the goal of the seemingly random number scrawling associated with long division.

So, the puzzle is this: If the process of cumulative cultural evolution has continued to ratchet over time, how has it been maintained over tens of thousands of years when each new generation starts out marginally further from the goal of understanding any given beneficial technology? For the example of division above, we can point to instructional techniques that actually do start with separating rocks (or counters) into equal groups and building up to the more abstract long division algorithm. But this suite of techniques is already a relic. Digital computing has thoroughly taken over this work, and it’s probably safe to say that very few people (adults and children) really know how it works.

If long division is not a salient example for you, you can relate to the feeling of being an ignorant stranger to your own species’ cultural achievements by asking yourself how much you really understand about how toilets work, how cars work, and on and on. Or consider one of the many gruesome examples—described by Joseph Henrich in his book The Secret of Our Success—of what happens when otherwise intelligent and strong people find themselves outside the protections of relevant cultural understandings:

In June 1845 the HMS Erebus and the HMS Terror, both under the command of Sir John Franklin, sailed away from the British Isles in search of the fabled Northwest Passage, a sea channel that could energize trade by connecting western Europe to East Asia. This was the Apollo mission of the mid-nineteenth century, as the British raced the Russians for control of the Canadian Arctic and to complete a global map of terrestrial magnetism. The British admiralty outfitted Franklin, an experienced naval officer who had faced Arctic challenges before, with two field-tested, reinforced ice-breaking ships equipped with state-of-the-art steam engines, retractable screw propellers, and detachable rudders. With cork insulation, coal-fired internal heating, desalinators, five years of provisions, including tens of thousands of cans of food (canning was a new technology), and a twelve-hundred-volume library, these ships were carefully prepared to explore the icy north and endure long Arctic winters.

As expected, the expedition’s first season of exploration ended when the sea ice inevitably locked them in for the winter around Devon and Beechey Islands, 600 miles north of the Arctic Circle. After a successful ten-month stay, the seas opened and the expedition moved south to explore the seaways near King William Island, where in September they again found themselves locked in by ice. This time, however, as the next summer approached, it soon became clear that the ice was not retreating and that they’d remain imprisoned for another year. Franklin promptly died, leaving his crew to face the coming year in the pack ice with dwindling supplies of food and coal (heat). In April 1848, after nineteen months on the ice, the second-in-command, an experienced Arctic officer named Crozier, ordered the 105 men to abandon ship and set up camp on King William Island.

The details of what happened next are not completely known, but what is clear is that everyone gradually died. . . .

King William Island lies at the heart of Netsilik territory, an Inuit population that spent its winters out on the pack ice and their summers on the island, just like Franklin’s men. In the winter, they lived in snow houses and hunted seals using harpoons. In the summer, they lived in tents, hunted caribou, musk ox, and birds using complex compound bows and kayaks, and speared salmon using leisters. The Netsilik name for the main harbor on King William Island is Uqsuqtuuq, which means “lots of fat” (seal fat). For the Netsilik, this island is rich in resources for food, clothing, shelter, and tool-making (e.g., drift wood).

It’s Not the Innovation

What can explain the rapid progress in cumulative cultural achievements in our species (and no others, to the same extent) when each new generation must in many ways “catch up” to the ratcheted accomplishments of the previous ones? Let’s start with what the answer cannot possibly be. Tomasello again:

Perhaps surprisingly, for many animal species it is not the creative component, but rather the stabilizing ratchet component, that is the difficult feat. Thus, many nonhuman primate individuals regularly produce intelligent behavioral innovations and novelties, but then their groupmates do not engage in the kinds of social learning that would enable, over time, the cultural ratchet to do its work (Kummer and Goodall, 1985).

Similarly, Franklin’s men did not turn to cannibalism and eventually succumb to the elements because they lacked creativity or innovation or could not think outside the box.

The reason Franklin’s men could not survive is that humans don’t adapt to novel environments the way other animals do or by using our individual intelligence. None of the 105 big brains figured out how to use driftwood, which was available on King William Island’s west coast where they camped, to make the recurve composite bows, which the Inuit used when stalking caribou. They further lacked the vast body of cultural know-how about building snow houses, creating fresh water, hunting seals, making kayaks, spearing salmon and tailoring cold-weather clothing.

Innovation, by itself, gets us nowhere. The notion that our culture progresses because our species is endowed with big innovative brains (and we just need to unlock that potential) is nonsense in light of what we know about cultural evolution. In reality, what best explains the ratchet effect is a lot of imitation (solving the more difficult problem of storing and transmitting cultural knowledge) and a little bit of innovation (solving the problem of occasionally generating novel ideas, spread by imitation).

It’s the Imitation

The Inuit who can survive and thrive in an environment that killed all of Franklin’s men do so because, like Franklin’s men and like us, they are good imitators within their own cultures (and not very good innovators on average). All of us imitate valuable cultural knowledge without completely understanding what we’re doing. We need this skill precisely because of the ratchet effect. It is simply not possible, in general, to personally innovate solutions that can rival the effectiveness of those built up over thousands of generations, and it is similarly impossible to conceptually understand everything in the world before we need to use it. Thus, we imitate first and understand later. Indeed, “understandings” (or, answers to “why” questions) are imitated just as readily as answers to “how” questions, and can be equally causally opaque. If asked by a child why we don’t fly off into space when we jump, your answer would involve copying an understanding—an understanding not of your own devising—about gravity. And you don’t know what gravity is because no one does.

Lest you think (despite the story about Sir John Franklin) that causal opacity and rapid ratcheting are just puzzles for tech-rich, conventionally educated, Western cultures in developed countries, here’s Henrich again:

Let’s briefly consider just a few of the Inuit cultural adaptations that you would need to figure out to survive on King William Island. To hunt seals, you first have to find their breathing holes in the ice. It’s important that the area around the hole be snow covered—otherwise the seals will hear you and vanish. You then open the hole, smell it to verify that it’s still in use (what do seals smell like?), and then assess the shape of the hole using a special curved piece of caribou antler. The hole is then covered with snow, save for a small gap at the top that is capped with a down indicator. If the seal enters the hole, the indicator moves, and you must blindly plunge your harpoon into the hole using all your weight. Your harpoon should be about 1.5 meters (5 ft) long, with a detachable tip that is tethered with a heavy braid of sinew line. You can get the antler from the previously noted caribou, which you brought down with your driftwood bow. The rear spike of the harpoon is made of extra-hard polar bear bone (yes, you also need to know how to kill polar bears; best to catch them napping in their dens). Once you’ve plunged your harpoon’s head into the seal, you’re then in a wrestling match as you reel him in, onto the ice, where you can finish him off with the aforementioned bear-bone spike.

Another reason to believe that imitation is (most of) the secret sauce for cultural evolution is that imitation shows up very early and robustly in development. In fact, children engage in what is called overimitation—imitating actions performed by a model even when those actions are obviously causally irrelevant to achieving the model’s goal. Other primates don’t do this. Legare and Nielsen explain this counterintuitive finding from research:

Why faithfully copy all of the actions of a demonstrator, even those that are obviously irrelevant? Given the potentially overwhelming number of objects, tools, and artifacts children must learn to use, it is useful to replicate the entire suite of actions used by an expert when first learning how to do something. Some propose that overimitation is an adaptive human strategy facilitating more rapid social learning of instrumental skills than would be possible if copying required a full representation of the causal structure of an event.


There are many takeaways and elaborations that come to mind in light of the above—all of which I’m still sussing out. One important takeaway worth mentioning, I think, is that, because humans have had culture for possibly hundreds of thousands of years, it is not out of the question that we have undergone some psychological adaptations that allow us to store and transmit (most importantly) and innovate on (less importantly) valuable prefabricated solutions within our cultural groups.

Is it possible that the ratchet effect can help explain a foundational concept in Cognitive Load Theory: that our working memories (our innovation engines) are severely limited while our long-term memories (our imitation engines) are functionally infinite?

The other takeaway comes from Paul Harris, in the last paragraph of his book Trusting What You’re Told: How Children Learn from Others, which follows many of the same themes elaborated above, specifically from the child development angle. It is a takeaway worth taking away, especially for those in education who believe, without question or doubt, that children should be thought of as “little scientists”:

The classic method in social anthropology is not the scientific method in the way that experimental scientists conceive of it. It includes no experiments or control groups. Instead, when anthropologists want to understand a new culture, they immerse themselves in the language, learn from participant observation, and rely on trusted informants. Of course, this method has an ancient pedigree. Human children have successfully used it for millennia across innumerable cultures. Indeed, judging by their methods and their talents, we would do well to think of children not as scientists, but as anthropologists.

GCF and LCM Triangles

Go grab some dot paper or grid paper—or just make some dots in a square grid on a blank piece of paper. Let’s start with a 4 × 4 grid of dots, like so.

A 4 by 4 array of dots.

Now, start at the top left corner, draw a vertical line down to the bottom of the grid, and count each dot that your pen enters—which just means that you won’t count the first dot, since your pen leaves that dot but does not enter it. Then, draw a horizontal line to the right, starting over with your counting. Again, count each dot that your pen enters. Count just 2 dots as you draw to the right.

4 by 4 array of dots with an L-shape 3 high and 2 wide

Finally, draw a straight line (a hypotenuse) back to your starting point. Here again, count the number of dots you enter.

4 by 4 array of dots with a right triangle 3 high and 2 wide

One example is not, of course, enough to convince you that the number of dots your pen enters when drawing the hypotenuse is the greatest common factor (GCF) of the number of counted vertical dots and the number of counted horizontal dots. So, here are a few more examples with just a 4 × 4 grid.
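If you’d rather not draw dozens of triangles, a brute-force check works too. Here’s a minimal sketch in Python (the function name and coordinate setup are my own): it counts the lattice dots a pen enters while drawing the hypotenuse from the bottom-right corner (run, 0) back up to the top-left corner (0, rise), and compares that count against the GCF.

```python
from math import gcd

def dots_entered(rise, run):
    # Lattice dots on the hypotenuse from (run, 0) to (0, rise),
    # excluding the starting dot, since the pen leaves it but never enters it.
    count = 0
    for x in range(run + 1):
        for y in range(rise + 1):
            if (x, y) == (run, 0):
                continue
            # (x, y) lies on the segment exactly when rise*x + run*y == rise*run
            if rise * x + run * y == rise * run:
                count += 1
    return count

# The triangle from the example: 3 dots down, 2 dots right
assert dots_entered(3, 2) == gcd(3, 2) == 1

# And every triangle that fits on a 7 x 7 grid of dots
for rise in range(1, 7):
    for run in range(1, 7):
        assert dots_entered(rise, run) == gcd(rise, run)
```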

No doubt there are tons of people out there for whom this display is completely unsurprising. But it surprised me. The GCF of two numbers is an object that seems as though it should be rather hidden—a value that may appear when we crack two numbers open and do some calculations with them, not something that just pops up when we draw lines on dot paper. We use prime factorization to suss out GCF, after all, and that is by no means an intuitive process.


There are some very nice mathematical connections here. The first is to the coordinate plane, or perhaps more simply to orthogonal axes, which we use to compare values all the time—but only in certain contexts. Widen or eliminate the context constraint, and it seems obvious that comparing two numbers orthogonally could yield insights about GCF.

And slope is, ultimately, the “reason” why this all works. The slope of a line in lowest terms is just the rise over the run with both the numerator and denominator divided by the GCF: \[\mathtt{\frac{\text{rise}\,\div\,\text{GCF}}{\text{run}\,\div\,\text{GCF}}=\text{slope in lowest terms}}\]

Once slope is there, all kinds of connections take hold: divisibility, fractions, lowest terms, etc. Linear algebra, too, contains a connection, which itself is connected to something called Bézout’s Identity. There is also a weird connection to calculus—maybe—that I haven’t quite teased out. To see what I mean, let’s also draw the LCM out of these images.

From the lowest entered point on the hypotenuse, draw a horizontal line extending to the width of the triangle. Then draw a vertical line to the bottom right corner of the triangle. Now go left: draw a horizontal line all the way to the left edge of the triangle. Then a vertical line extending to the height of the lowest entered point on the hypotenuse. Finally, move right and draw a horizontal line back to where you started. You should draw a rectangle as shown in each of these examples. The area of each rectangle is the LCM of the two numbers.
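If I’m reading the construction right, the rectangle always has the triangle’s full width and the height of the lowest entered hypotenuse dot, which sits at rise ÷ GCF. A quick sketch (function name mine) confirms the area claim:

```python
from math import gcd, lcm  # math.lcm requires Python 3.9+

def rectangle_area(rise, run):
    # The lowest entered dot on the hypotenuse sits at height rise // g;
    # the rectangle spans the triangle's full width, run.
    g = gcd(rise, run)
    return run * (rise // g)

for rise in range(1, 13):
    for run in range(1, 13):
        assert rectangle_area(rise, run) == lcm(rise, run)
```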

The maybe-calculus connection I speak of is the visible curve vs. area-under-the-curve vibe we’ve got going on there. I’m still noodling on that one.

The Farey Mean?

I had never heard of the Farey mean, but here it is, brought to you by @howie_hua.

When you add two fractions, you of course remember that you should never just add the numerators and add the denominators across. The resulting fraction will not be the sum of the two addends. But if you do add across (under certain conditions which I’ll show below), the result will be a fraction between the two “addend” fractions. So, you can use the add-across method to find a fraction between two other fractions.

For example, \(\mathtt{\frac{1}{2}+\frac{4}{3}\rightarrow\frac{5}{5}}\). The first “addend” is definitely less than 1, and the second definitely greater than 1. The Farey mean here is exactly 1 (or \(\mathtt{\frac{5}{5}}\)), which is between the two “addend” fractions.
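The “add across” result (sometimes called the mediant) is easy to play with using Python’s fractions module. A minimal sketch:

```python
from fractions import Fraction

def add_across(p, q):
    # Sum the numerators and the denominators separately -- not real addition!
    return Fraction(p.numerator + q.numerator, p.denominator + q.denominator)

a, b = Fraction(1, 2), Fraction(4, 3)
m = add_across(a, b)   # 5/5, which Fraction reduces to 1
assert a < m < b
```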

Why Does It Work?

Since this site is fast becoming all linear algebra all the time, let’s throw some linear algebra at this. What we want to show is that, given \(\mathtt{\frac{a}{b}<\frac{c}{d}}\) (we'll go with this assumption for now), \[\mathtt{\frac{a}{b}<\frac{a+c}{b+d}<\frac{c}{d}}\]

for certain positive integer values of \(\mathtt{a,b,c,}\) and \(\mathtt{d}\). I would probably do better to make those inequality signs less-than-or-equal-tos, but let’s stick with this for the present. We’ll start by representing the fraction \(\mathtt{\frac{a}{b}}\) as the vector \(\scriptsize\begin{bmatrix}\mathtt{b}\\\mathtt{a}\end{bmatrix}\) along with the fraction \(\mathtt{\frac{c}{d}}\) as the vector \(\scriptsize\begin{bmatrix}\mathtt{d}\\\mathtt{c}\end{bmatrix}\).

We’re looking specifically at the slopes or angles here (which is why we can represent a fraction as a vector in the first place), so we’ve made \(\scriptsize\begin{bmatrix}\mathtt{d}\\\mathtt{c}\end{bmatrix}\) have a greater slope to keep in line with our assumption above that \(\mathtt{\frac{a}{b}<\frac{c}{d}}\).

The fraction \(\mathtt{\frac{a+c}{b+d}}\) is the same as the vector \(\scriptsize\begin{bmatrix}\mathtt{b+d}\\\mathtt{a+c}\end{bmatrix}\). And since this vector is the diagonal of the vector parallelogram, it will of course have a slope greater than that of \(\mathtt{\frac{a}{b}}\) but less than that of \(\mathtt{\frac{c}{d}}\). You can keep going forever—just take one of the side vectors and use the diagonal vector as the other side. So long as you’re making parallelograms, you’ll get a new diagonal whose slope lies between those of the two side vectors, and the result will be a fraction between the other two.
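For readers who want an algebraic companion to the parallelogram picture: with \(\mathtt{b}\) and \(\mathtt{d}\) positive, cross-multiplying shows that the left half of the sandwich is equivalent to our starting assumption (the right half works the same way): \[\mathtt{\frac{a}{b}<\frac{a+c}{b+d}\iff a(b+d)<b(a+c)\iff ad<bc\iff\frac{a}{b}<\frac{c}{d}}\]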

Incidentally, our assumption at the beginning that \(\mathtt{\frac{a}{b}<\frac{c}{d}}\) doesn't really matter to this picture. If we make \(\mathtt{\frac{c}{d}}\) less than \(\mathtt{\frac{a}{b}}\), our picture simply flips. The diagonal vector still has to be located between the two side vectors.

What Doesn’t Work?

The linear algebra picture of this concept also tells us where this method fails to find a fraction between the two addend fractions. When the two “addend” fractions are equivalent, \(\mathtt{c}\) and \(\mathtt{d}\) are multiples of \(\mathtt{a}\) and \(\mathtt{b}\), respectively, or vice versa. In that case, the resulting fraction looks like this.

The slopes or angles for both addends and for the result are the same, producing a Farey mean that is equal to both fractions.

Cosine Similarity and Correlation

I wrote a lesson not too long ago that started with a Would You Rather? survey activity. For our purposes here, we can pretend that each question had a Likert scale from 1–10 attached to it, though in reality, the lesson was about categorical data.

At any rate, here are the questions—edited a bit. Feel free to rate your answers on the scales provided. Careful! Once you click, you lock in your answer.

Would you rather . . .

  a. be able to fly (1) or be able to read minds (10)?

  b. go way back in time (1) or go way into the future (10)?

  c. be able to talk to animals (1) or speak all languages (10)?

  d. watch only historical movies (1) or sci-fi movies (10) for the rest of your life?

  e. be just a veterinarian (1) or just a musician (10)?

Finally, one last question that is not a would-you-rather. Once you’ve answered this and the rest of the questions, you can press the I'm Finished! button to submit your responses.

  f. Rate your fear of heights from (1) not at all afraid to (10) very afraid.

Check out the results so far.

Are Your Responses Correlated?

Next in the lesson, I move on to asking whether you think some of the survey responses are correlated. For example, if you scored “low” on the veterinarian-or-musician scale—meaning you would strongly prefer to be a veterinarian over a musician—would that indicate that you probably also scored “low” on Question (c) about talking to animals or speaking all the languages? In other words, are those two scores correlated? What about choosing the ability to fly and your fear of heights? Are those correlated? How could we measure this using a lot of responses from a lot of different people?

An ingenious way of looking at this question is by using cosine similarity from linear algebra. (We looked at the cosine of the angle between two vectors here and here.)

For example, suppose you really would rather have the ability to fly and you have almost no fear of heights. So, you answered Question (a) with a 1 and Question (f) with, say, a 2. Another person has no desire to fly and a terrible fear of heights, so they answer Question (a) with an 8 and Question (f) with a 10. From this description, we would probably guess that the two quantities wish-for-flight and fear-of-heights are strongly correlated. But we’ve also now got the vectors (1, 2) and (8, 10) to show us this correlation.

See that tiny angle between the vectors on the left? The cosine of tiny angles (as we saw) is close to 1, which indicates a strong correlation. On the right, you see the opposite idea. One person really wants to fly but is totally afraid of heights (1, 10) and another almost couldn’t care less about flying (or at least would really rather read minds) but has a low fear of heights (8, 2). The cosine of the close-to-90°-angle between these vectors will be close to 0, indicating a weak correlation between responses to our flight and heights questions.
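Here’s that computation as a minimal Python sketch (the helper name is mine):

```python
import math

def cosine(u, v):
    # cos(theta) = (u . v) / (|u| |v|)
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

close = cosine((1, 2), (8, 10))   # the near-parallel pair: about 0.98
far = cosine((1, 10), (8, 2))     # the widely separated pair: about 0.34
assert close > far
```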

But That’s Not the Ingenious Part

That’s pretty cool, but it is not, in fact, how we measure correlation. The first difficulty we encounter happens after adding more people to the survey, giving us several angles to deal with—not impossible, but pretty messy for a hundred or a thousand responses. The second, more important, difficulty is that the graph on the right above doesn’t show a weak correlation; it shows a strong negative correlation. Given just the two response pairs to work from in that graph, we would have to conclude that a strong fear of heights would make you more likely to want the ability to fly (or vice versa) rather than less likely. But the “weakest” the cosine can measure in this kind of setup is 0.

The solution to the first difficulty is to take all the x-components of the responses and make one giant vector out of them. Then do the same to the y-components. Now we’ve got just two vectors to compare! For our data on the left, the vectors (1, 2) and (8, 10) become (1, 8) and (2, 10). The vectors on the right—(1, 10) and (8, 2)—become (1, 8) and (10, 2).

The solution to the second difficulty—no negative correlations—we can achieve by centering the data. Let’s take our new vectors for the graph on the right: (1, 8) and (10, 2). Add the components in each vector and divide by the number of components (2) to get an average. Then subtract the average from each component. So, our new centered vectors are

(1 – ((1 + 8) ÷ 2), 8 – ((1 + 8) ÷ 2)) and (10 – ((10 + 2) ÷ 2), 2 – ((10 + 2) ÷ 2))

Or (–3.5, 3.5) and (4, –4). It’s probably not too tough to see that a vector in the 2nd quadrant and a vector in the 4th quadrant are heading in opposite directions. These vectors now form a close-to-180° angle, and the cosine of 180° is –1, the actual lowest correlation we can get, indicating a strong negative correlation.
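The centering step, sketched in Python (helper names mine), lands the cosine right at –1:

```python
import math

def center(v):
    # Subtract the mean of the components from each component.
    mean = sum(v) / len(v)
    return [x - mean for x in v]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

x = center([1, 8])    # [-3.5, 3.5]
y = center([10, 2])   # [4.0, -4.0]
assert abs(cosine(x, y) - (-1.0)) < 1e-9
```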

And That’s Correlation

To summarize, the linear-algebra way to determine correlation is to take the cosine of the angle between the centered x- and y-vectors of the data. That formula is \[\mathtt{\frac{(x-\overline{x}) \cdot (y-\overline{y})}{|x-\overline{x}||y-\overline{y}|} = \cos(θ)}\]

This is just another way of writing the more common version of the r-value correlation.

The Formula for Combinations

And now, finally, let’s get to the formula for combinations. The math in my last post got a little tricky toward the end, with the strange exclamation mark notation floating around. So let’s recap permutations without that notation.

Tree diagram of permutations, annotated with the cumulative multiplications × 3, × 3 × 2, and × 3 × 2 × 1 moving right, and the cumulative divisions ÷ (1 × 2 × 3), ÷ (1 × 2), and ÷ 1 moving left.

You should see that, to traverse a tree diagram, we multiply by the tree branches, cumulatively, to move right, and then divide by those branches—again, cumulatively—to move left. The formula for the number of permutations of 2 cards chosen from 4, \(\mathtt{\frac{4!}{(4-2)!}}\), tells us to multiply all the way to the right, to get 24 in the numerator, and then divide two steps to the left (divide by \(\mathtt{(4-2)!}\), or 2) to get 12 permutations of 2 cards chosen from 4.


An important point about the above is that the permutations of \(\mathtt{r}\) cards chosen from \(\mathtt{n}\) cards, counted by \(\mathtt{_{n}P_r}\), are a subset of the permutations of all \(\mathtt{n}\) cards, counted by \(\mathtt{n!}\) The tree diagram shows all \(\mathtt{n!}\) arrangements, and contained within it are the \(\mathtt{_{n}P_r}\) arrangements.

Combinations of \(\mathtt{r}\) items chosen from \(\mathtt{n}\), denoted as \(\mathtt{_{n}C_r}\), are a further subset. That is, the \(\mathtt{_{n}C_r}\) combinations are a subset of the \(\mathtt{_{n}P_r}\) permutations. In our example of 2 cards chosen from 4, \(\mathtt{_{n}P_r}\) represents the first two columns of the tree diagram combined. In those columns, we have, for example, the permutations JQ and QJ. But these two permutations represent just one combination. The same goes for the other pairs in those columns. Thus, we can see that to get the number of combinations of 2 cards chosen from 4, we take \(\mathtt{_{n}P_r}\) and divide by 2. So, \[\mathtt{\frac{4!}{(4-2)!}\div 2=\frac{4!}{(4-2)!\cdot2}}\]

What about combinations of 3 cards chosen from 4? That’s the first 3 columns combined. Now the repeats are, for example, JQK, JKQ, QJK, QKJ, KJQ, KQJ. That’s 6 of them. Noticing the pattern? For \(\mathtt{_{4}C_2}\), we divide \(\mathtt{_{4}P_2}\) further by 2! For \(\mathtt{_{4}C_3}\), we divide \(\mathtt{_{4}P_3}\) further by 3! We’re dividing (further) by \(\mathtt{r!}\)

When you think about it, this makes sense. We need to collapse every permutation of \(\mathtt{r}\) cards down to 1 combination. So we divide by \(\mathtt{r!}\) Here, finally then, is the formula for combinations: \[\mathtt{_{n}C_r=\frac{n!}{(n-r)!r!}}\]
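As a quick sanity check, the formula agrees with Python’s built-in math.comb:

```python
from math import comb, factorial

def nCr(n, r):
    # Permutations of r from n, collapsed by dividing out the r! orderings.
    return factorial(n) // (factorial(n - r) * factorial(r))

assert nCr(4, 2) == comb(4, 2) == 6
assert nCr(4, 3) == comb(4, 3) == 4
```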

And Now for the Legal Formula

So, did you come up with a working rule to describe the pattern we looked at last time? Here’s what I came up with:

As we saw last time, the “root” of the tree diagram (the first column) shows \(\mathtt{_{4}P_1}\), which is the number of permutations of 1 card chosen from 4. The first and second columns combined show \(\mathtt{_{4}P_2}\), the number of permutations of 2 cards chosen from 4. So, to determine \(\mathtt{_{n}P_r}\), according to this pattern, we start with \(\mathtt{n}\) and then multiply by \(\mathtt{(n-1)}\), then \(\mathtt{(n-2)}\), and so on, until we reach \(\mathtt{n-(r-1)}\).

The number of permutations of, say, 3 items chosen from 5, then, would be \[\mathtt{_{5}P_3=5\cdot (5-1)(5-2)=60}\]

This is a nice rule that works every time for permutations of \(\mathtt{r}\) things chosen from \(\mathtt{n}\) things. It can even be represented a little more ‘mathily’ as \[\mathtt{_{n}P_r=\prod_{k=0}^{r-1}(n-k)}\]
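The product rule translates directly into a short loop, and math.perm (Python 3.8+) agrees with it:

```python
from math import perm  # Python 3.8+

def nPr(n, r):
    # Multiply n, (n - 1), ..., down to (n - (r - 1)).
    result = 1
    for k in range(r):
        result *= n - k
    return result

assert nPr(5, 3) == perm(5, 3) == 60
assert nPr(4, 2) == 12
```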

So let’s move on to the “legal” formula for \(\mathtt{_{n}P_r}\). A quick sidebar on notation, though, which we’ll need in a moment.

When we count the number of permutations at the end of a tree diagram, what we get is actually \(\mathtt{_{n}P_n}\). In our example, that’s \(\mathtt{_{4}P_4}\). The way we write this amount is with an exclamation mark: \(\mathtt{n!}\), or, in our case, \(\mathtt{4!}\) What \(\mathtt{4!}\) means is \(\mathtt{4\times(4-1)\times(4-2)\times(4-3)}\) according to our rule above, or just \(\mathtt{4\times3\times2\times1}\). And \(\mathtt{3!}\) is \(\mathtt{3\times(3-1)\times(3-2)}\), or just \(\mathtt{3\times2\times1}\).

In general, we can say that \(\mathtt{n!=n\times(n-1)!}\) So, for example, \(\mathtt{4!=4\times3!}\), and so on. And since this means that \(\mathtt{1!=1\times0!}\), and \(\mathtt{1!=1}\), it must be that \(\mathtt{0!=1}\).
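That recursive definition, base case included, fits in three lines of Python:

```python
def fact(n):
    # n! = n * (n - 1)!, with 0! = 1 as the base case.
    return 1 if n == 0 else n * fact(n - 1)

assert fact(4) == 4 * fact(3) == 24
assert fact(0) == 1
```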

So, for the tree diagram, \(\mathtt{_{4}P_4}\) means multiplying all the way to the right, which gives \(\mathtt{n!}\). But if we’re interested in the number of arrangements of \(\mathtt{r}\) cards chosen from \(\mathtt{n}\) cards, then we need to come back to the left by \(\mathtt{(n-r)!}\) And since moving right is multiplying, moving left is dividing.

Tree diagram of permutations, annotated with the cumulative multiplications 4 × 3, 4 × 3 × 2, and 4 × 3 × 2 × 1 moving right, and the cumulative divisions ÷ (4 – 1)!, ÷ (4 – 2)!, ÷ (4 – 3)!, and ÷ (4 – 4)! moving left.

The division we need is not immediately obvious, but if you study the tree diagram above, I think it’ll make sense. This gives us, finally, the “legal” formula for the number of permutations of \(\mathtt{r}\) items from \(\mathtt{n}\) items: \[\mathtt{_{n}P_r=\frac{n!}{(n-r)!}}\]
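We can check the “legal” formula against brute-force enumeration with itertools.permutations:

```python
from itertools import permutations
from math import factorial

def nPr(n, r):
    # Multiply all the way right (n!), then divide back left by (n - r)!.
    return factorial(n) // factorial(n - r)

cards = ["J", "Q", "K", "A"]
assert nPr(4, 2) == len(list(permutations(cards, 2)))   # 12 arrangements
assert nPr(4, 4) == len(list(permutations(cards)))      # 24 arrangements
```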

A New Formula for Permutations?

Last time, we saw that combinations are a subset of permutations, and we wondered what the relationship between the two is. Before we get there, though, let’s look at another possible relationship—one we only hinted at last time. And to examine this relationship, we’ll use a tree diagram.

Tree Diagram

This tree diagram shows the number of permutations of the 4 cards J, Q, K, A—the number of ways we can arrange the 4 cards. The topmost branch shows the result JQKA. And you can see all 24 results from our list last time here in the tree diagram.

Tree diagram of permutations of the 4 cards, annotated with the cumulative multiplications 4 × 3, 4 × 3 × 2, and 4 × 3 × 2 × 1 moving right, and the cumulative divisions ÷ 3!, ÷ 2!, ÷ 1!, and ÷ 0! moving left.

Here’s where, normally, people would talk about the multiplication 4 × 3 × 2 × 1 and tell you that another way to write that is with an exclamation mark: 4! But that’s skipping over something important.

And that something important is this: Notice that the first column of the tree diagram—the root of the tree—shows 4 items. This is the number of different permutations you can make of just 1 card, chosen from 4 different cards. And the first and second columns combined show the number of permutations you can make of 2 cards, chosen from 4 cards (JQ, JK, JA, etc.).

And so on. You might think that to go from “permutations of 1 card chosen from 4” to “permutations of 2 cards chosen from 4” you would multiply by 2. But of course that’s not right (and the tree diagram tells us so). You actually multiply 4 by 4 – 1. And to go from “permutations of 1 card chosen from 4” to “permutations of 3 cards chosen from 4” you multiply 4 • (4 – 1) • (4 – 2).

We’re on the verge of being able to describe the relationship, which I’ll put in question form (and mix in some notation too):

What is the relationship between the number of permutations of \(\mathtt{n}\) things, \(\mathtt{P(n)}\), and the number of permutations of \(\mathtt{r}\) things chosen from \(\mathtt{n}\) things, \(\mathtt{_{n}P_r}\)?

We can see from our example above that \(\mathtt{P(4)=24}\). That is, the number of permutations of 4 things is 24. But we also noticed these three results: $$\begin{aligned}_{\mathtt{4}}\mathtt{P}_{\mathtt{1}}&= \mathtt{\,\,4}\cdot \mathtt{1} \\ _{\mathtt{4}}\mathtt{P}_{\mathtt{2}}&= (\mathtt{4}\cdot \mathtt{1})(\mathtt{4}-\mathtt{1}) \\ _{\mathtt{4}}\mathtt{P}_{\mathtt{3}}&= (\mathtt{4}\cdot \mathtt{1})(\mathtt{4}-\mathtt{1})(\mathtt{4}-\mathtt{2})\end{aligned}$$

A New Formula?

Study the pattern above and see if you can write a rule that will get you the correct result for any \(\mathtt{_{n}P_r}\). Check your results here (for example, for \(\mathtt{_{16}P_{12}}\), you can just enter 16P12 and press Enter).

The rule you write, if you get it right, won’t be an algorithm. But it’ll work every time! This is the step we always skip when teaching about permutations! The next step is to think hard about why it works. We’ll get to the “legal” formula for permutations next time.
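If you want to check a candidate rule programmatically, the pattern above amounts to multiplying r descending factors starting at n. A sketch in Python (the function name is my own):

```python
def nPr(n, r):
    # Multiply r descending factors, starting at n: n × (n−1) × … × (n−r+1)
    result = 1
    for k in range(r):
        result *= n - k
    return result

assert nPr(4, 3) == 24  # matches the tree diagram
print(nPr(16, 12))      # → 871782912000
```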

Permutations & Combinations

I have now been blogging for 16 years, and my very first post (long gone) was on combinations and permutations. So, it’s fun to come back to the idea now. In 2004, my experience with the two concepts was limited to how textbooks often used the awkward “care about order” (permutations) or “don’t care about order” (combinations) language to introduce the ideas. So, that’s what I wrote about then. Now I want to talk about how the two concepts are related.

What They Are

When you count permutations, you count how many different ways you can sequentially arrange some things. When you count combinations, you count how many ways you can have some things. So, given 2 cards, there are 2 different ways you can sequentially arrange 2 cards, but given 2 cards, there’s just one way to have 2 cards.

Right off the bat, the language is weird, and it’s hard to see why combinations should ever be a thing (there’s always just 1 way to have a set of things). But combinations make better sense when you are not choosing from all the elements you are given.

So, for example, how many permutations and combinations can I make of 2 cards, chosen from a total of 3 cards?

Now having the two categories of permutation and combination makes a little more sense. There are 6 permutations of 2 cards chosen from 3 cards and there are 3 combinations of 2 cards chosen from 3 cards. That is, there are 6 different ways to sequentially arrange 2 cards chosen from 3 and just 3 different ways to have 2 cards chosen from 3. And you can see, by the way, that the combinations are a subset of the permutations.

In fact, let’s do an example with 4 cards to show the actual relationship between permutations and combinations. Here we’ll just use letters to save space. The permutations of JQKA if we choose 3 cards are:


That’s 24 permutations. For combinations, we get JQK, JQA, QKA, and KAJ. That’s 4 combinations. What’s the relationship? We’ll come back to that next time.
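The two counts above can be confirmed with Python’s itertools, which draws the same distinction between arranging and simply having:

```python
from itertools import permutations, combinations

cards = ["J", "Q", "K", "A"]
perms = list(permutations(cards, 3))   # sequential arrangements
combos = list(combinations(cards, 3))  # ways to simply have the cards

print(len(perms))   # → 24
print(len(combos))  # → 4
```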

Rotating Coordinate Systems

I’ve just started with Six Not-So-Easy Pieces, based on Feynman’s famous lectures, and already there’s some decently juicy stuff. In the beginning, Feynman discusses the symmetry of physical laws—that is, the invariance of physical laws under certain transformations (like rotations):

If we build a piece of equipment in some place and watch it operate, and nearby we build the same kind of apparatus but put it up on an angle, will it operate in the same way?

He goes on to explain that, of course, a grandfather clock will not operate in the same way under specific rotations. Assuming the invariance of physical laws under rotations, this change in operation tells us something interesting: that the operation of the clock is dependent on something outside of the “system” that is the clock itself.

The theorem is then false in the case of the pendulum clock, unless we include the earth, which is pulling on the pendulum. Therefore we can make a prediction about pendulum clocks if we believe in the symmetry of physical law for rotation: something else is involved in the operation of a pendulum clock besides the machinery of the clock, something outside it that we should look for.

Rotation Coordinates

Feynman then proceeds with a brief mathematical analysis of forces under rotations. A somewhat confusing prelude to this is a presentation that involves expressing the coordinates of a rotated system in terms of the original system. He uses the diagram below to derive those coordinates (except for the blue highlighting, which I use to show what (x’, y’) looks like in Moe’s system). What we want is to express (x’, y’) in terms of x and y—to describe Moe’s point P in terms of Joe’s point P.

“We first drop perpendiculars from P to all four axes and draw AB perpendicular to PQ.”

The first confusion that is not dealt with (because Feynman makes the assumption that his audience is advanced students) is what angles in the diagram are congruent to θ shown. And here again we see the value of the easy-to-forget art of eyeballing and common sense in geometric reasoning.

The y’ axis is displaced just as much as the x’ axis by rotation, and “displaced just as much by rotation” is a perfectly good definition of angle congruence that we tend to forget after hundreds of hours of deriving work. The same reasoning applies to the rotational displacement from AP to AB. If we imagine rotating AP to AB, we see that we are starting perpendicular to the x-axis and ending perpendicular to the x’-axis. The y-to-y’ rotation does the same thing, so the displacement angle must be the same. So let’s put in those new thetas, only one of which we’ll need.

Inspection of the figure shows that x’ can be written as the sum of two lengths along the x’-axis, and y’ as the difference of two lengths along AB.

Here is x’ as the sum of two lengths (red and orange): \[\mathtt{x’=OA\cdot\color{red}{\frac{OC}{OA}}+AP\cdot\color{orange}{\frac{BP}{AP}}\quad\rightarrow\quad x\cdot\color{red}{cos\,θ}+y\cdot\color{orange}{sin\,θ}}\]

And here is y’ as the difference of two lengths (green – purple): \[\mathtt{y’=AP\cdot\color{green}{\frac{AB}{AP}}-OA\cdot\color{purple}{\frac{AC}{OA}}\quad\rightarrow\quad y\cdot\color{green}{cos\,θ}-x\cdot\color{purple}{sin\,θ}}\]

So, if Joe describes the location of point P to Moe, and the rotational displacement between Moe and Joe’s systems is known (and it is known that the two systems share an origin), Moe can use the manipulations above to determine the location of point P in his system.

Another, exactly equal, way of saying this—the way we said it when we talked about rotation matrices—is that, if we represent point P in Joe’s system as a position vector (x, y), then Moe’s point P vector is \[\small{\begin{bmatrix}\mathtt{x’}\\\mathtt{y’}\end{bmatrix}=\begin{bmatrix}\mathtt{\,\,\,\,\,cos\,θ} & \mathtt{sin\,θ}\\\mathtt{-sin\,θ} & \mathtt{cos\,θ}\end{bmatrix}\begin{bmatrix}\mathtt{x}\\\mathtt{y}\end{bmatrix}=\mathtt{x}\begin{bmatrix}\mathtt{\,\,\,\,\,cos\,θ}\\\mathtt{-sin\,θ}\end{bmatrix}+\mathtt{y}\begin{bmatrix}\mathtt{sin\,θ}\\\mathtt{cos\,θ}\end{bmatrix}=\begin{bmatrix}\mathtt{\,\,\,\,\,\,x\cdot \color{red}{cos\,θ}\,\,\,+y\cdot \color{orange}{sin\,θ}}\\\mathtt{-(x\cdot \color{purple}{sin\,θ})+y\cdot \color{green}{cos\,θ}}\end{bmatrix}}\]

The first rotation matrix above actually describes a clockwise rotation, which is both different from what we discussed at the link above (our final matrix there was for counterclockwise rotations) and unexpected, since we know that Moe’s system is a counterclockwise rotation of Joe’s system.

The resolution to that unexpectedness can again be found after a little eyeballing. The position vector for point P in Joe’s system is at a steep angle, whereas in Moe’s system, it is at a shallow angle. Only a clockwise rotation will change the coordinates in the appropriate way.
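The coordinate change can be sketched in a few lines of Python, using the same formulas derived above; the function name and the 90° spot check are my own:

```python
from math import cos, sin, radians, isclose

def to_rotated_system(x, y, theta):
    # Moe's coordinates for Joe's point P, when Moe's axes are rotated
    # counterclockwise by theta relative to Joe's:
    #   x' = x·cos θ + y·sin θ
    #   y' = −x·sin θ + y·cos θ
    return (x * cos(theta) + y * sin(theta),
            -x * sin(theta) + y * cos(theta))

# Spot check: with a 90° rotation, a point on Joe's y-axis
# lands on Moe's x'-axis.
xp, yp = to_rotated_system(0, 1, radians(90))
assert isclose(xp, 1) and isclose(yp, 0, abs_tol=1e-9)
```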

Mr Barton’s Second Book

It has now been just two years since I reviewed Mr Barton’s stellar first book. I say “just,” in part because the last three weeks during this pandemic have felt like five years, and in part because Barton packs so much into his second book that it is a little surprising he did it in just two years.

The central theme of Reflect, Expect, Check, Explain is using and constructing ‘intelligent’ sequences of mathematics exercises, “providing opportunities to think mathematically.” The intelligence behind these sequences is the way we order and arrange them, allowing for comparison (reflection) between two or more exercises, the anticipation of what the answer or solution method will be (expectation) based on what the previous answer or solution method was, determination of the answer (check), and then an explanation of the connection between the exercises (explain).

Consider, for example, the sequence at left, from early in the book. During the reflect phase, for the first pair of exercises, I can notice that the lower and upper bounds have stayed the same and that the second number line has a minor tick for every second minor tick of the first. I can also notice that the sought-after decimal value is at the same location on both number lines. This noticing can lead me to expect that, since I identified the missing value on the first number line as 2.6, my answer should be the same for the second. It’s possible, though, that I won’t come up with an expectation. In the check phase, I fill in the values for the equal intervals on the second number line, arriving at the value for the question mark. Finally, when I explain, I can talk about my earlier expectation and why it was correct or off; or, if I couldn’t formulate an expectation, I can explain why the question-marked values are the same even though the tick marks are different.

As I move through the sequence, there are really interesting thoughts to have.

  • Why did the question-marked values line up when moving from 10 to 5 equal intervals (between Questions 1 and 2) but not when moving from 5 to 4 equal intervals (between Questions 3 and 4)?
  • Why does “lining up” fail me in Questions 4, 5, and 6 when it worked between Questions 1 and 2?
  • I can’t rely on inspection every time to figure out the intervals. Is there something I can do to make that task simpler?
  • Is the question-marked value in Question 9 just the question-marked value in Question 8, divided by 10?
  • Can I extend my interval calculator method to decimals?

If this were the entire book, that would be enough for me, to be honest. But Mr Barton spends an exemplary amount of effort addressing possible questions and misconceptions about such sequences (the FAQ chapter is excellent) and explaining how they can both fit into more extensive learning episodes and function in ways different from practice. All the while, the sequences remain the stars of the show.

I highly recommend (again) Mr Barton’s book, especially to math teachers. He outlines in brilliant detail how you can turn a set of boring exercises into a powerful method for soliciting students’ mathematical thinking. No revolution required.

Choice Quotes

Below are just a few snips from the book that I added to my notebook while reading. These are not necessarily reflective of the entire argument. But after a long day of educhatter, which more often than not reads like an ancient scroll from some monist cult, it is comforting to read these thoughts and know that there is still a place for practical, technical, dispassionate thinking about teaching and learning in the 21st century—a place for waging the cerebral battle, rather than constantly leading with our chin or our hearts.

Teaching a method in isolation and practising it in isolation is important to develop confidence and competence with that method, and indeed, students can get pretty good pretty quickly. But if we do not then challenge them to decide when they should use that method – and crucially when they should not – we deny them the opportunity to identify the strategy needed to solve the problem.

There are two main arguments in favour of teaching a particular method before delving into why it works.

The path to flexible knowledge

The key point that Willingham makes is that acquiring inflexible knowledge is a necessary step on the path to developing flexible knowledge. There is no short cut. The ‘why’ is conceptual and abstract. We understand concepts through examples. The ‘how’ generates our students’ experience of examples. In other words, often we have to do things several times to appreciate exactly how and why they work.

Motivation

As Garon-Carrier et al. (2015) conclude, motivation is likely to be built on a foundation of success, and not the other way around.

The mistake I made for much of my career was trying to fast track my students to this [problem solving] stage. This was partly due to my obsession with differentiation – heaven forbid a child should be in their comfort zone for more than a few seconds – but also based on my belief that problem solving offered some sort of incredible 2-for-1 deal. I thought it would enable my students to practice the basics, whilst at the same time allowing them to develop that magic problem solving skill.

I will again quote John Mason: “It is the ways of thinking that are rich, not the task itself.”