Imitation and the Ratchet Effect

Comparative psychologist Michael Tomasello, in his 1999 book The Cultural Origins of Human Cognition, popularized the now widely adopted metaphor of the “ratchet effect” in human cultural evolution:

Basically none of the most complex human artifacts or social practices—including tool industries, symbolic communication, and social institutions—were invented once and for all at a single moment by any one individual or group of individuals. Rather, what happened was that some individual or group of individuals first invented a primitive version of the artifact or practice, and then some later user or users made a modification, an “improvement,” that others then adopted perhaps without change for many generations, at which point some other individual or group of individuals made another modification, which was then learned and used by others, and so on over historical time in what has sometimes been dubbed “the ratchet effect” (Tomasello, Kruger, and Ratner, 1993). The process of cumulative cultural evolution requires not only creative invention but also, and just as importantly, faithful social transmission that can work as a ratchet to prevent slippage backward—so that the newly invented artifact or practice preserves its new and improved form at least somewhat faithfully until a further modification or improvement comes along.

But the ratchet effect presents us with a bit of a puzzle for children’s learning—or how we typically think about that learning. One can imagine, for example, a first-generation technology for dividing resources into fair shares where rocks are used as symbols and moved around into equal groups. Future generations learn this technique and then gradually innovate on it by—again, for example—recognizing that one can divide 18 into fair shares by first dividing the ten items into equal groups and then dividing the 8 into the same number of equal groups, rather than taking and moving around all 18 at once.

Even at this stage the challenge of explaining to a new generation of children why one can do this should seem more daunting than explaining the first-generation method. But now throw on top all of the cumulative innovations we can imagine here for analog division across thousands of generations: rocks are eventually replaced by written symbols, contexts where the division process applies proliferate and become more abstract, and a technology is eventually developed (long division) that allows a user to mechanistically divide any number into just about any other without needing to think about the context at all.

All of these developments are positive (or neutral) cultural innovations. But the learner in the one-thousandth generation is not neurologically all that different from the child in the first generation watching rocks being moved around. Yet, the more modern student is asked to learn a much more causally opaque process—one that has been refined over millennia, which the child was obviously not there to witness, and one whose moving parts are not intuitively related to a goal. It is much simpler for a child just arriving on the scene to intuit the goal of a tribal elder who is separating 105 beads into 3 equal groups than it is for a very similar and similarly-situated modern child to understand the goal of the seemingly random number scrawling associated with long division.

So, the puzzle is this: If the process of cumulative cultural evolution has continued to ratchet over time, how has it been maintained over tens of thousands of years when each new generation starts out marginally further from the goal of understanding any given beneficial technology? For the example of division above, we can point to instructional techniques that actually do start with separating rocks (or counters) into equal groups and building up to the more abstract long division algorithm. But this suite of techniques is already a relic. Digital computing has thoroughly taken over this work, and it’s probably safe to say that very few people (adults and children) really know how it works.

If long division is not a salient example for you, you can relate to the feeling of being an ignorant stranger to your own species’ cultural achievements by asking yourself how much you really understand about how toilets work, how cars work, and on and on. Or consider one of the many gruesome examples—described by Joseph Henrich in his book The Secret of Our Success—of what happens when otherwise intelligent and strong people find themselves outside the protections of relevant cultural understandings:

In June 1845 the HMS Erebus and the HMS Terror, both under the command of Sir John Franklin, sailed away from the British Isles in search of the fabled Northwest Passage, a sea channel that could energize trade by connecting western Europe to East Asia. This was the Apollo mission of the mid-nineteenth century, as the British raced the Russians for control of the Canadian Arctic and to complete a global map of terrestrial magnetism. The British admiralty outfitted Franklin, an experienced naval officer who had faced Arctic challenges before, with two field-tested, reinforced ice-breaking ships equipped with state-of-the-art steam engines, retractable screw propellers, and detachable rudders. With cork insulation, coal-fired internal heating, desalinators, five years of provisions, including tens of thousands of cans of food (canning was a new technology), and a twelve-hundred-volume library, these ships were carefully prepared to explore the icy north and endure long Arctic winters.

As expected, the expedition’s first season of exploration ended when the sea ice inevitably locked them in for the winter around Devon and Beechey Islands, 600 miles north of the Arctic Circle. After a successful ten-month stay, the seas opened and the expedition moved south to explore the seaways near King William Island, where in September they again found themselves locked in by ice. This time, however, as the next summer approached, it soon became clear that the ice was not retreating and that they’d remain imprisoned for another year. Franklin promptly died, leaving his crew to face the coming year in the pack ice with dwindling supplies of food and coal (heat). In April 1848, after nineteen months on the ice, the second-in-command, an experienced Arctic officer named Crozier, ordered the 105 men to abandon ship and set up camp on King William Island.

The details of what happened next are not completely known, but what is clear is that everyone gradually died. . . .

King William Island lies at the heart of Netsilik territory, an Inuit population that spent its winters out on the pack ice and their summers on the island, just like Franklin’s men. In the winter, they lived in snow houses and hunted seals using harpoons. In the summer, they lived in tents, hunted caribou, musk ox, and birds using complex compound bows and kayaks, and speared salmon using leisters. The Netsilik name for the main harbor on King William Island is Uqsuqtuuq, which means “lots of fat” (seal fat). For the Netsilik, this island is rich in resources for food, clothing, shelter, and tool-making (e.g., drift wood).

It’s Not the Innovation

What can explain the rapid progress in cumulative cultural achievements in our species (and no others, to the same extent) when each new generation must in many ways “catch up” to the ratcheted accomplishments of the previous ones? Let’s start with what the answer cannot possibly be. Tomasello again:

Perhaps surprisingly, for many animal species it is not the creative component, but rather the stabilizing ratchet component, that is the difficult feat. Thus, many nonhuman primate individuals regularly produce intelligent behavioral innovations and novelties, but then their groupmates do not engage in the kinds of social learning that would enable, over time, the cultural ratchet to do its work (Kummer and Goodall, 1985).

Similarly, Franklin’s men did not turn to cannibalism and eventually succumb to the elements because they lacked creativity or innovation or could not think outside the box.

The reason Franklin’s men could not survive is that humans don’t adapt to novel environments the way other animals do or by using our individual intelligence. None of the 105 big brains figured out how to use driftwood, which was available on King William Island’s west coast where they camped, to make the recurve composite bows, which the Inuit used when stalking caribou. They further lacked the vast body of cultural know-how about building snow houses, creating fresh water, hunting seals, making kayaks, spearing salmon and tailoring cold-weather clothing.

Innovation, by itself, gets us nowhere. The notion that our culture progresses because our species is endowed with big innovative brains (and we just need to unlock that potential) is nonsense in light of what we know about cultural evolution. In reality, what best explains the ratchet effect is a lot of imitation (solving the more difficult problem of storing and transmitting cultural knowledge) and a little bit of innovation (solving the problem of occasionally generating novel ideas, spread by imitation).

It’s the Imitation

The Inuit that can survive and thrive in an environment that killed all of Franklin’s men do so because, like Franklin’s men and like us, they are good imitators within their own cultures (and not very good innovators on average). All of us imitate valuable cultural knowledge without completely understanding what we’re doing. We need this skill precisely because of the ratchet effect. It is simply not possible, in general, to personally innovate solutions that can rival the effectiveness of those built up over thousands of generations, and it is similarly impossible to conceptually understand everything in the world before we need to use it. Thus, we imitate first and understand later. Indeed, “understandings” (or, answers to “why” questions) are imitated just as readily as answers to “how” questions, and can be equally causally opaque. If asked by a child why we don’t fly off into space when we jump, your answer would involve copying an understanding—an understanding not of your own devising—about gravity. And you don’t know what gravity is because no one does.

Lest you think (despite the story about Sir John Franklin) that causal opacity and rapid ratcheting are just a puzzle for tech-rich, conventionally educated, Western cultures in developed countries, here’s Henrich again:

Let’s briefly consider just a few of the Inuit cultural adaptations that you would need to figure out to survive on King William Island. To hunt seals, you first have to find their breathing holes in the ice. It’s important that the area around the hole be snow covered—otherwise the seals will hear you and vanish. You then open the hole, smell it to verify that it’s still in use (what do seals smell like?), and then assess the shape of the hole using a special curved piece of caribou antler. The hole is then covered with snow, save for a small gap at the top that is capped with a down indicator. If the seal enters the hole, the indicator moves, and you must blindly plunge your harpoon into the hole using all your weight. Your harpoon should be about 1.5 meters (5 ft) long, with a detachable tip that is tethered with a heavy braid of sinew line. You can get the antler from the previously noted caribou, which you brought down with your driftwood bow. The rear spike of the harpoon is made of extra-hard polar bear bone (yes, you also need to know how to kill polar bears; best to catch them napping in their dens). Once you’ve plunged your harpoon’s head into the seal, you’re then in a wrestling match as you reel him in, onto the ice, where you can finish him off with the aforementioned bear-bone spike.

Another reason to believe that imitation is (most of) the secret sauce for cultural evolution is that imitation shows up very early and robustly in development. In fact, children engage in what is called overimitation—imitating actions performed by a model even when those actions are obviously causally irrelevant to achieving the model’s goal. Other primates don’t do this. Legare and Nielsen explain this counterintuitive finding from research:

Why faithfully copy all of the actions of a demonstrator, even those that are obviously irrelevant? Given the potentially overwhelming number of objects, tools, and artifacts children must learn to use, it is useful to replicate the entire suite of actions used by an expert when first learning how to do something. Some propose that overimitation is an adaptive human strategy facilitating more rapid social learning of instrumental skills than would be possible if copying required a full representation of the causal structure of an event.

Conclusion

There are many takeaways and elaborations that come to mind in light of the above—all of which I’m still sussing out. One important takeaway worth mentioning, I think, is that, because humans have had culture for possibly hundreds of thousands of years, it is not out of the question that we have undergone some psychological adaptations that allow us to, most importantly, store and transmit and, less importantly, innovate on, valuable prefabricated solutions in our cultural groups.

Is it possible that the ratchet effect can help explain a foundational concept in Cognitive Load Theory: that our working memories (our innovation engines) are severely limited while our long-term memories (our imitation engines) are functionally infinite?

The other takeaway comes from Paul Harris, in the last paragraph of his book Trusting What You’re Told: How Children Learn from Others, which follows many of the same themes elaborated above, specifically from the child development angle. It is a takeaway worth taking away, especially for those in education who believe, without question or doubt, that children should be thought of as “little scientists”:

The classic method in social anthropology is not the scientific method in the way that experimental scientists conceive of it. It includes no experiments or control groups. Instead, when anthropologists want to understand a new culture, they immerse themselves in the language, learn from participant observation, and rely on trusted informants. Of course, this method has an ancient pedigree. Human children have successfully used it for millennia across innumerable cultures. Indeed, judging by their methods and their talents, we would do well to think of children not as scientists, but as anthropologists.

GCF and LCM Triangles

Go grab some dot paper or grid paper—or just make some dots in a square grid on a blank piece of paper. Let’s start with a 4 × 4 grid of dots, like so.

A 4 by 4 array of dots.

Now, start at the top left corner, draw a vertical line down to the bottom of the grid, and count each dot that your pen enters. That means you won’t count the first dot, since your pen leaves that dot but does not enter it. Then, draw a horizontal line to the right, starting over with your counting. Again, count each dot that your pen enters, and this time stop once you have counted 2 dots.

4 by 4 array of dots with an L-shape 3 high and 2 wide

Finally, draw a straight line (a hypotenuse) back to your starting point. Here again, count the number of dots you enter.

4 by 4 array of dots with a right triangle 3 high and 2 wide

One example is not, of course, enough to convince you that the number of dots your pen enters when drawing the hypotenuse is the greatest common factor (GCF) of the number of counted vertical dots and the number of counted horizontal dots. So, here are a few more examples with just a 4 × 4 grid.

No doubt there are tons of people out there for whom this display is completely unsurprising. But it surprised me. The GCF of two numbers is an object that seems as though it should be rather hidden—a value that may appear when we crack two numbers open and do some calculations with them, not something that just pops up when we draw lines on dot paper. We use prime factorization to suss out GCF, after all, and that is by no means an intuitive process.
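If you would rather check the claim by brute force than by drawing, here is a minimal Python sketch. The function name and the coordinate setup (right angle at the origin, pen travelling from the bottom-right corner back up to the starting dot) are assumptions made for illustration; the counting rule is the one described above, where a dot counts only if the pen enters it.

```python
from math import gcd


def dots_entered_on_hypotenuse(rise: int, run: int) -> int:
    """Count the grid dots the pen enters while drawing the hypotenuse.

    Assumed setup: the right angle sits at (0, 0), the pen starts the
    hypotenuse at (run, 0) and finishes at the original starting dot
    (0, rise). The dot the pen leaves, (run, 0), is not counted.
    """
    count = 0
    for x in range(run):  # x = run is the dot the pen leaves, so it is skipped
        # On the hypotenuse, y = rise * (run - x) / run.
        # The pen passes through a dot only when y is a whole number.
        if (rise * (run - x)) % run == 0:
            count += 1
    return count


# Every triangle that fits on a 4 x 4 grid of dots (legs of 1 to 3).
for rise in range(1, 4):
    for run in range(1, 4):
        print(rise, run, dots_entered_on_hypotenuse(rise, run), gcd(rise, run))
```

The last two columns of the printout should agree on every row, which is the pattern the drawings show.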

Connections

There are some very nice mathematical connections here. The first is to the coordinate plane, or perhaps more simply to orthogonal axes, which we use to compare values all the time—but only in certain contexts. Widen or eliminate the context constraint, and it seems obvious that comparing two numbers orthogonally could yield insights about GCF.

And slope is, ultimately, the “reason” why this all works. The slope of a line in lowest terms is just the rise over the run with both the numerator and denominator divided by the GCF: \[\mathtt{\frac{\text{rise}}{\text{run}}\div\frac{\text{GCF}}{\text{GCF}}=\text{slope in lowest terms}}\]
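As a worked instance (the numbers are chosen only for illustration, and need a grid larger than 4 × 4): with a rise of 6 and a run of 4, the GCF is 2, so \[\mathtt{\frac{6}{4}=\frac{6\div 2}{4\div 2}=\frac{3}{2}}\] The hypotenuse climbs 3 dots for every 2 dots across, so it lands exactly on a dot twice on its way back to the start, which matches the GCF of 2.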

Once slope is there, all kinds of connections take hold: divisibility, fractions, lowest terms, etc. Linear algebra, too, contains a connection, which itself is connected to something called Bézout’s Identity. There is also a weird connection to calculus—maybe—that I haven’t quite teased out. To see what I mean, let’s also draw the LCM out of these images.

From the lowest entered point on the hypotenuse, draw a horizontal line extending to the width of the triangle. Then draw a vertical line to the bottom right corner of the triangle. Now go left: draw a horizontal line all the way to the left edge of the triangle. Then a vertical line extending to the height of the lowest entered point on the hypotenuse. Finally, move right and draw a horizontal line back to where you started. You should draw a rectangle as shown in each of these examples. The area of each rectangle is the LCM of the two numbers.
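The rectangle construction can also be checked numerically. This is a small sketch under the same assumed setup as the drawings: the rectangle is as wide as the triangle, and as tall as the lowest entered dot on the hypotenuse, which sits rise ÷ GCF dots above the base. The function names are hypothetical, and the LCM is found by a deliberately naive search so the comparison is not circular.

```python
from math import gcd


def lcm_rectangle_area(rise: int, run: int) -> int:
    """Area of the rectangle described above, in grid units."""
    # The lowest dot the pen enters on the hypotenuse is rise // gcd dots
    # above the base of the triangle; that is the rectangle's height.
    height = rise // gcd(rise, run)
    width = run  # the rectangle spans the full width of the triangle
    return width * height


def smallest_common_multiple(a: int, b: int) -> int:
    """Find the LCM the slow way: step through multiples of a."""
    multiple = a
    while multiple % b != 0:
        multiple += a
    return multiple


for rise in range(1, 4):
    for run in range(1, 4):
        print(rise, run, lcm_rectangle_area(rise, run),
              smallest_common_multiple(rise, run))
```

When the GCF is 1, the lowest entered dot is the starting dot itself, so the rectangle is the full rise-by-run box and its area is simply the product of the two numbers.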

The maybe-calculus connection I speak of is the visible curve vs. area-under-the-curve vibe we’ve got going on there. I’m still noodling on that one.

Mr Barton’s Second Book

It has now been just two years since I reviewed Mr Barton’s stellar first book. I say “just,” in part because the last three weeks during this pandemic have felt like five years, and in part because Barton packs so much into his second book that it is a little surprising he did it in just two years.

The central theme of Reflect, Expect, Check, Explain is using and constructing ‘intelligent’ sequences of mathematics exercises, “providing opportunities to think mathematically.” The intelligence behind these sequences is the way we order and arrange them, allowing for comparison (reflection) between two or more exercises, the anticipation of what the answer or solution method will be (expectation) based on what the previous answer or solution method was, determination of the answer (check), and then an explanation of the connection between the exercises (explain).

Consider, for example, the sequence at left, from early in the book. During reflect, for the first pair of exercises, I can notice that the lower and upper bounds have stayed the same, and the second number line has minor ticks for every second minor tick of the first number line. I can also notice that the sought-after decimal value is at the same location on both number lines. This noticing can lead me to expect that since I identified the missing value for the first number line as 2.6, my answer should be the same for the second number line. It’s possible, though, that I won’t come up with an expectation. In the check phase, I fill in the values for the equal intervals on the second number line, coming up with the value for the question mark. Finally, when I explain, I either have a chance to talk about my earlier expectation and explain why I was off or why my expectation was correct or, if I couldn’t formulate an expectation, I can explain why the question-marked values are the same even though the tick marks are different.

As I move through the sequence, there are really interesting thoughts to have.

  • Why did the question-marked values line up when moving from 10 to 5 equal intervals (between Questions 1 and 2) but not when moving from 5 to 4 equal intervals (between Questions 3 and 4)?
  • Why does “lining up” fail me in Questions 4, 5, and 6 when it worked between Questions 1 and 2?
  • I can’t rely on inspection every time to figure out the intervals. Is there something I can do to make that task simpler?
  • Is the question-marked value in Question 9 just the question-marked value in Question 8, divided by 10?
  • Can I extend my interval calculator method to decimals?

If this were the entire book, that would be enough for me, to be honest. But Mr Barton spends an exemplary amount of effort addressing possible questions and misconceptions about such sequences (the FAQ chapter is excellent) and explaining how these sequences can both fit into more extensive learning episodes and function in ways that differ from practice. All the while, the sequences remain the stars of the show.

I highly recommend (again) Mr Barton’s book, especially to math teachers. He outlines in brilliant detail how you can turn a set of boring exercises into a powerful method for soliciting students’ mathematical thinking. No revolution required.

Choice Quotes

Below are just a few snips from the book that I added to my notebook while reading. These are not necessarily reflective of the entire argument. But after a long day of educhatter, which more often than not reads like an ancient scroll from some monist cult, it is comforting to read these thoughts and know that there is still a place for practical, technical, dispassionate thinking about teaching and learning in the 21st century—a place for waging the cerebral battle, rather than constantly leading with our chin or our hearts.

Teaching a method in isolation and practising it in isolation is important to develop confidence and competence with that method, and indeed, students can get pretty good pretty quickly. But if we do not then challenge them to decide when they should use that method – and crucially when they should not – we deny them the opportunity to identify the strategy needed to solve the problem.

There are two main arguments in favour of teaching a particular method before delving into why it works.

The path to flexible knowledge. The key point that Willingham makes is that acquiring inflexible knowledge is a necessary step on the path to developing flexible knowledge. There is no short cut. The ‘why’ is conceptual and abstract. We understand concepts through examples. The ‘how’ generates our students’ experience of examples. In other words, often we have to do things several times to appreciate exactly how and why they work.

Motivation. As Garon-Carrier et al. (2015) conclude, motivation is likely to be built on a foundation of success, and not the other way around.

The mistake I made for much of my career was trying to fast track my students to this [problem solving] stage. This was partly due to my obsession with differentiation – heaven forbid a child should be in their comfort zone for more than a few seconds – but also based on my belief that problem solving offered some sort of incredible 2-for-1 deal. I thought it would enable my students to practice the basics, whilst at the same time allowing them to develop that magic problem solving skill.

I will again quote John Mason: “It is the ways of thinking that are rich, not the task itself.”

Scala Math

I’ve started a writing project recently that I’m having a good time working on so far. I’ve called it Scala Math (and on Twitter here) for now, because its central focus is deconstructing concepts and procedures into steps, and la scala is Italian for ‘staircase’. You can see the word at work in ‘escalator’, ‘scale’, etc. Scala is also the name of a programming language. Here are some reasons for that I found online.

Most of the projects I’ve worked on over the past few years have also been ways for me to learn new software languages or libraries. For Geometry Theorems, it was d3. For Scala, it was React—as well as the beautiful, amazing database that a normal person can actually look at and edit and it’s still a database: Airtable.

How It Works: Learn

Every Scala has a display window, where images and videos are shown, and a steps window, where you find the text of the steps, or ‘parts’. These areas are divided by a brain, which I’ll talk about below. When you land on a Scala (this one is Solving Arithmetic Sequences), the first thing shown in the display window is an image presenting a quick snippet of what will be covered. The image shows an essential question at the top. The use case for the snippet is a student who wants a quick reminder about something they are working on, perhaps for homework, without having to search online and wade through tons of stuff that sorta-kinda matches what they want but not really.

The remainder of the section shown at left (called ‘Learn’ mode) is a series of steps (in this case, six), explained with text, audio narration, and the accompanying images that you can see appearing when clicking on each step. The dot navigation at the top shows us that we are on the first screen of this Scala.

Each step card has a button to replay the step, which can be pressed at any time while the step is active, and a button (up arrow) to go to the preceding step.

How It Works: Reflect

As you can see at the end of the video above, there is a Reflection question which calls for a short or extended text response. This is where the audio input on my cell phone comes in handy. Students’ responses are, at the moment, compared to a few ‘correct’ responses that I have written and that others have contributed to. The highest numerical match, on a scale from 0 to 100, is presented as your score, and the pre-written response that produced it is presented as a suggested answer.
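To make the "best match out of a few model answers" idea concrete, here is a rough Python sketch. It is not how Scala Math computes its match, which this post does not describe; it swaps in the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, and the function name and sample answers are invented for illustration.

```python
from difflib import SequenceMatcher


def best_match(response, model_answers):
    """Return (score, model_answer) for the closest pre-written answer.

    The score is a 0-100 integer. SequenceMatcher is only a stand-in
    for whatever text comparison the real app uses (an assumption).
    """
    def score(a, b):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        return round(100 * ratio)

    scored = [(score(response, answer), answer) for answer in model_answers]
    # Keep the pair with the highest score; that answer is the suggestion.
    return max(scored, key=lambda pair: pair[0])


# Hypothetical model answers for an arithmetic-sequences reflection.
answers = [
    "Add the common difference to get the next term.",
    "Subtract consecutive terms to find the common difference.",
]
print(best_match("you add the common difference each time", answers))
```

Character-level matching like this rewards similar wording rather than similar meaning, so it is best read as an illustration of the scoring flow, not of a good scoring method.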

How It Works: Try

After the Learn phase is the Try phase, which consists of example-problem pairs (usually; for a very few cases, so far, stepped-out problems only). Or, more specifically, stepped-out problems followed by not-stepped-out problems. These look a little different from what I typically see as example-problem pairs, where the example and the problem are set side by side. Here, the problem follows the example, and the example is not provided when solving the problem. The typical sequence is shown below.

For the Try and Test phases, it’s always multiple choice, although it’s in the plan to look at other response inputs. When students are logged in, they build up (not earn; see below) points for every question. Right now, it’s just 50 points for each, though that gets cut in half and rounded up to the nearest integer for every incorrect answer. For an item with 3 choices, the lowest point total possible is 13. For an item with 4 choices, the lowest is 7.
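The halving rule is easy to check with a few lines of Python. This is a sketch of the arithmetic described above, not the app's actual code; the function names are hypothetical, and it assumes a student can be wrong at most once per wrong choice before only the correct choice remains.

```python
def try_points(incorrect_answers: int, start: int = 50) -> int:
    """Points left on a Try item after some number of incorrect answers."""
    points = start
    for _ in range(incorrect_answers):
        # Cut in half and round up to the nearest integer.
        points = (points + 1) // 2
    return points


def lowest_possible(choices: int) -> int:
    """Lowest total for an item, assuming every wrong choice is picked once."""
    return try_points(choices - 1)


print(lowest_possible(3))  # 50 -> 25 -> 13
print(lowest_possible(4))  # 50 -> 25 -> 13 -> 7
```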

On desktop, students can have the question read aloud via text-to-speech. As far as I know, that hasn’t yet come to mobile as a built-in feature, but I’ll keep my ears open for when it does.

How It Works: Test

Finally, there’s the Test phase. This is typically 4 to 6 questions that are of the same form as the ‘problems’ in the example-problem-pair Try phase. I’m just showing one such question in the video at the right.

When students are logged in, they can earn points by taking the test. The points are built up in both the Learn and Try phases. I have described how the points work for the Try phase above. The Learn phase is simpler: just clicking on a step builds up 100 points. At the moment, no points are tied to the Reflect question.

Once a student reaches the Test phase, the greatest number of points he or she can ‘bank’ is the number he or she has built up over the course of the Learn and Try phases. And the Test phase is fairly high stakes, in that each incorrect answer cuts in half the total number of points that can be earned.

The stars shown on the score modal are awarded based on percent of total points earned. For the lesson shown in this post, the total that can be earned is 1700. So, approximately 560 points is 1 star (33%), 1130 points is 2 stars (66%), and 1360 points is 3 stars (80%).
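Here is a small sketch of the star rule in Python. The function name is hypothetical, and treating 33%, 66%, and 80% as exact cutoffs is an assumption: the point values quoted above are rounded, so the real boundaries may sit a few points either side.

```python
def stars(earned: int, total: int) -> int:
    """Number of stars for a score, using assumed exact percent cutoffs."""
    percent_cutoffs = (33, 66, 80)
    # Compare with integers (earned * 100 vs. cutoff * total) so that a
    # score sitting exactly on a cutoff is not lost to rounding.
    return sum(1 for cutoff in percent_cutoffs if earned * 100 >= cutoff * total)


for earned in (100, 600, 1200, 1360, 1700):
    print(earned, stars(earned, 1700))
```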

Finally, to make sure this product connects students with knowledgeable people (whether they be parents or teachers or both) and guards against mindlessly pressing buttons to earn points, there is a final front-and-back activity, wherein students solve a different problem by listing the steps themselves and showing all their work.

Post-Hoc Confidences

A smart defense of any argument for less teacher-directed instruction in mathematics classrooms is to point to the logical connectedness of mathematics as a body of knowledge and suggest that students are capable of crossing many if not all of the logical bridges between propositions themselves, or with minimal guidance.

Such connectedness–it can be suggested–makes mathematics somewhat different from other school subjects. For example, given a student’s conceptual understanding of a fraction as a part-to-whole ratio, which can include his or her ability to represent a fraction with a visual or physical model, it seems to follow logically that he or she can then add two fractions and get the correct sum, so long as the student knows (intuitively or more formally) that addition is about combining values linearly. It doesn’t matter how many prerequisites there are for adding fractions. The suggestion is that once those prerequisites have been met, it is a matter of merely crossing a logical bridge to adding fractions (mostly) correctly.

By way of contrast, a student can’t really induce what happened after, say, the bombing of Pearl Harbor. They have to be informed about it directly. The effects can certainly be narrowed down using common sense reasoning and other domain-specific knowledge. But, ultimately, what happened happened and there is no reason to suspect that, in general, students can make their way through a study of history mostly blindfolded, relying only on logic and common sense.

The example of history brings up an interesting point (to me, anyway) about the example of mathematics, though. Historical consequences from historical causes can be dubbed “inevitable” only after the fact. How can we be sure it is not the same when learning anything, including mathematics? Once you know, conceptually as it were, what adding fractions is, of course it seems to be a purely logical consequence of what fractions are fundamentally. But is this seeming inevitability available to the novice, the learner who is aware of what fractions are but hasn’t ever thought about adding them? It is with the average novice, after all, that the feeling of logical inevitability has to lie. It is not enough for educated adults to think of something as ‘logical’ after they already know it.

Bertrand Russell argues, in a 1907 essay, that even in mathematics we don’t proceed from premises to conclusions, but rather the other way around.

We tend to believe the premises because we can see that their consequences are true, instead of believing the consequences because we know the premises to be true. But the inferring of premises from consequences is the essence of induction [abduction]; thus the method in investigating the principles of mathematics is really an inductive method, and is substantially the same as the method of discovering in any other science.

So, how can we decide whether some bridge in reasoning is available to and crossable by the average novice? I hope it’s clear that we can’t just figure it out via anecdotes and armchair reasoning. Our intuitions can’t be trusted with this question. And our opinions one way or the other on the matter are not helpful, no matter what they are.

Providing Bad Intel

A really nice thing about scientific research is its transparency. Researchers write down the methods they use in their experiments—sometimes in excruciating detail—so that others can try to replicate their work if they choose. And scrutinizable methods allow us and other researchers to think about issues that the original experimenters might have overlooked—or, at least, didn’t mention in their published work.

Every once in a while we come across research which individuals themselves can simulate at home on a computer, even if they don’t have any participants, and this allows us to bring the experiment to life a little more than can be done with text descriptions.

The research I look at in this post is such a study. Students in the study (81 in all, from 7 to 10 years of age) were given an “app” very similar to the one shown below. Play with it a bit by clicking on the animal pictures to see what students were exposed to in this study.

The Method

In this study, students were presented with a question and then an explanation answering that question for the 12 animals shown above (the images used in the study were different from those above). Students rated the quality of explanations about animal biology on a 5-point scale. (In the version above, your ratings are not recorded. You can just click on the image of the rating system to move on.) The audio in the app above uses the questions and explanations from the study verbatim, though in the actual study two different people speak the questions and explanations (above, it’s just me).

As you could no doubt tell if you played around with the app above, some of the explanations are laughably bad. Researchers designated these as circular explanations (e.g., How do colugos use their skin flaps to travel? Their skin flaps help them to move from one place to another). The other, better explanations were identified as mechanistic explanations (e.g., How do thorny dragons use the grooves between their thorns to help them drink water? Their grooves collect water and send the water to their mouths). After rating the explanation, students were then given a choice to either get more information about the animal or to move on to a different animal. Here again, all you get is a screen to click on, and any click takes you back to the main screen with the 12 animals. In the actual study, students were given an even more detailed mechanistic explanation when clicking to get more information (e.g., Thorny dragons have grooves between their thorns, which are able to collect water. The water is drawn from groove to groove until it reaches their mouths, so they can suck water from all over their bodies).

The Curious Case of Curiosity

What the researchers found was that, in general, students were significantly more likely to click to get more information on an animal when the explanation given was circular. And, importantly, students were more likely to click to get more information when they rated the explanation as poor. This behavior—of clicking to get more information—was operationalized as curiosity and can be explained using the deprivation theory of curiosity.

In everyday life, children sometimes receive weak explanations in response to their questions. But what do children do when they receive weak explanations? According to the deprivation theory of curiosity, if children think that an explanation is unsatisfying, then they should sometimes feel inclined to seek out a better answer to their question to bolster their knowledge; the same is not true for explanations appraised as high in quality. To our knowledge, our research is the first to investigate this theory in regards to children’s science learning, examining whether 7- to 10-year-olds are more likely to seek out additional information in response to weak explanation than informative ones in the domain of biology.

But is that really curiosity? Do I stimulate your curiosity about colugos’ skin flaps by not really answering your questions about them? We can more easily answer no to this question if we assume that Square 1 represents students’ wanting to know something about colugos’ skin flaps. In that case, the initial question stimulates curiosity, as it were, and the non-explanation simply fails to satisfy this curiosity, or initial desire for knowledge. The circular explanation has not made them curious or even more curious. They were already curious. Not helping them scratch that itch just fails to move them to Square 2, which is where they wanted to go after hearing the question (knowing something about how colugos’ skin flaps work). The fact that students with unscratched itches were more likely to go to Square 3 is not surprising, since Square 3, for them, was actually Square 2, the square that everyone wanted to get to.

An Unavoidable Byproduct of Quality Teaching

If you are more inclined to believe the above interpretation, as I am, it might seem that we still must contend with the evidence that quality explanations were indeed shown to reduce information-seeking, relative to the levels of information-seeking shown for circular explanations. But this is not necessarily the case. What we see, from this study at least, is that not scratching the initial itch likely caused a different behavior in students than did scratching it. A clicking behavior did increase for students who still had itches, but this does not mean that it decreased for students who had no itch. We have evidence here that bad explanations are recognizably bad. We do not have evidence suggesting that quality explanations make students incurious.

If this is the case, though—if quality explanations reduce curiosity—it seems likely to me that it is simply an unavoidable byproduct of quality teaching. One that can be anticipated and planned for. Explanations are, after all, designed to reduce curiosity, in some sense. What high quality explanations do—in every scientific field and likely in our everyday lives—is move us on to different, better things to be curious about.


Thinking About and Thinking With

I have a tendency, when writing blog posts, to leave important things unsaid. So, let me fix that up front before I forget. What I wanted to say here was that, in my view, learning doesn’t happen unless we tick off all three boxes: encoding, consolidation, and retrieval.

It’s not that learning gets better or stronger when more of those boxes are ticked off. Learning isn’t possible—to some degree of certainty—in the first place without all three. And it’s not the case that focusing on just retrieval instead of just encoding or just consolidation represents some kind of revolution in pedagogical thinking. You’re simply ignoring one or two vital components of learning when before you were ignoring one or two other ones. Learning can still happen even when we don’t think about one or two (or all three) of the above components, but then it’s haphazard, random, implicit, and/or incidental. (In that case, learning comes down to mental horsepower and genes rather than processes over which we have some control.) All three components still must be addressed for learning to occur; it’s just that we can decide to not be in control of one or all of them (to students’ detriment).

But even then we’re not done. We’ve covered the components of the process of learning, but all three of those components intersect with another dimension of learning, which describes the products of learning: thinking about and thinking with.

Thinking With

In the previous post linked above, the examples of slope could all be categorized as “thinking about” slope. Put too simply, encoding the concept of slope means absorbing information about slope, retrieving knowledge about slope means remembering the slope concept and saying your knowledge out loud or writing it down, and consolidating knowledge about slope means practicing, such that what is encoded stays encoded and what is known can be retrieved.

All of this (encoding, consolidation, and retrieval) must happen with “thinking with” as well as with “thinking about.” Encoding–Thinking With, for example, would involve absorbing information about how slope can be applied to do other things, whether mathematically or in the real world. Common examples include designing wheelchair ramps (which have an ADA-recommended maximum height-to-length ratio of 1 : 12), measuring and comparing the steepnesses of things, and determining whether two lines are parallel, perpendicular, or neither. Consolidating–Thinking With would involve practice with that encoded knowledge; solving word problems is a typical example. Finally, Retrieving–Thinking With would involve remembering that encoded knowledge, particularly after some time has passed, say by using slope to solve a programming problem or a problem on a test.
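To make a couple of these examples concrete, here is a minimal Python sketch of thinking with slope. The function names (slope, relationship, ramp_within_guideline) are my own inventions for illustration, the points are assumed to have integer coordinates, and the ramp check simply uses the 1 : 12 ratio mentioned above as its threshold.

```python
from fractions import Fraction


def slope(p, q):
    """Exact slope of the line through points p and q, or None if the line is vertical.

    Assumes integer coordinates (an assumption made for this sketch).
    """
    dx = q[0] - p[0]
    if dx == 0:
        return None
    return Fraction(q[1] - p[1], dx)


def relationship(m1, m2):
    """Classify two lines by their slopes: 'parallel', 'perpendicular', or 'neither'.

    Equal slopes are reported as parallel, even if the two lines happen to coincide.
    """
    if m1 == m2:
        return "parallel"
    if m1 is None or m2 is None:
        # A vertical line is perpendicular only to a horizontal one.
        other = m2 if m1 is None else m1
        return "perpendicular" if other == 0 else "neither"
    return "perpendicular" if m1 * m2 == -1 else "neither"


def ramp_within_guideline(rise, run):
    """True if a ramp's rise-to-run ratio is no steeper than 1 : 12."""
    return Fraction(rise, run) <= Fraction(1, 12)


print(relationship(slope((0, 0), (2, 4)), slope((0, 0), (2, -1))))  # perpendicular
print(ramp_within_guideline(2, 24))  # True
```

Notice that neither question being answered (Are these lines perpendicular? Is this ramp too steep?) is a question about slope. Slope is just the thing we think with to get the answer.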

All six boxes have to be checked off for learning to occur (such that it is within our control).

Teach Thinking With

In education, we have difficulties—again, in my view—with Encoding–Thinking With, and Consolidating–Thinking With. As far as these two are concerned, it is rare in my experience to see guidance and practice on a wide variety of different problems involving thinking with (for example) slope to answer questions that aren’t about slope. We tend to think that all we should do is give students a bunch of think-about slope facts and then hope for them to magically retrieve and apply those to thinking-with situations. We misunderstand transfer as some kind of conjuring out of thin air, so that’s what we give students—thin air. Then we stand back and hope to see some conjuring. When this doesn’t produce results that we’d like, we—for some unimaginably stupid reason—blame knowing facts for the problem, and instead of supplementing that knowledge with thinking-with teaching, we swap them. Which is much worse than what it tries to replace.

Instead, it is necessary to teach students how to think with slope and many other mathematical concepts, and to provide them with practice in thinking with these concepts. Thinking with is as much knowledge as thinking about is.

One of my favorite examples of thinking with slope—and one which I have, admittedly, not yet written a lesson about—has to do with drawing convex or concave quadrilaterals. Given 4 coordinate pairs for points that can form a convex or concave quadrilateral (no three points lie on the same line, etc.), how can I decide, somewhat algorithmically, on the order in which I should connect the points, such that the line segments I actually draw do create a quadrilateral and not the image on the far right?

One way to go about it is to first select the leftmost point—the point with the lowest x-coordinate (there could be two, with equal x-coordinates, but I’ll leave that to the reader to figure out). Then calculate the slope of each line connecting the leftmost point with each other point. The order in which the points can be connected is the order of the slopes from least to greatest. This process would create a different proper quadrilateral than the one shown in the middle above.
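The procedure above can be sketched in a few lines of Python. This is only one way to write it, and it makes two assumptions that are not part of the description above: if two points tie for the lowest x-coordinate, I start from the lower of the two, and I treat the slope to the point directly above it as infinitely steep so that it sorts last. The function name quadrilateral_order and the sample points are made up for illustration.

```python
from math import inf


def quadrilateral_order(points):
    """Return the points in an order that can be connected to form a quadrilateral.

    points: four (x, y) pairs, no three of which lie on the same line.
    """
    # Start from the leftmost point. Ties on x go to the lower point
    # (a tie-breaking assumption made for this sketch).
    start = min(points, key=lambda p: (p[0], p[1]))
    others = [p for p in points if p != start]

    def slope_from_start(p):
        dx = p[0] - start[0]
        dy = p[1] - start[1]
        # dx is 0 only for a point directly above the start; sort it last.
        return inf if dx == 0 else dy / dx

    # Connect the remaining points in order of slope, least to greatest.
    return [start] + sorted(others, key=slope_from_start)


print(quadrilateral_order([(4, 3), (1, 5), (0, 0), (4, 0)]))
# [(0, 0), (4, 0), (4, 3), (1, 5)]
```

Connecting the points in the returned order, and then connecting the last point back to the first, sweeps around the leftmost point from the lowest slope to the highest, which is why the sides never cross.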

Checking Off All the Boxes

Students’ minds are not magical. They don’t turn raw facts magically into applied understanding (the extreme traditionalist view), and they don’t magically vacuum up knowledge hidden in applied contexts (the extreme constructivist view). Put more accurately, this kind of magic does happen (which is why we believe in it), but it happens outside of our control, as a result of genetic and socioeconomic differences, so we can take no credit for it.

Importantly, ignoring components of students’ learning, for whatever reason, subjects them to a roll of the dice. Those students who start behind stay behind, and those who are underserved stay so. We seem to have enough leftover energy to try our hand at amateur psychology and social-emotional learning. Why not take a fraction of that energy and channel it into, you know, plain ol’ teaching?

Learn: You Keep Using That Word

Sometimes kids say “nothing” when their parents ask them what they learned in school today. And, although that response is something we don’t want to hear, it is probably closer to the truth than we want to believe, because, as we all most certainly know, learning doesn’t really happen in a single class period. And, when it does, it’s not learning per se, but encoding, consolidation, or retrieval—or some mixture of the three.

Encoding

The encoding stage involves introducing you to some knowledge pattern in the natural, social, or academic environment. For example, you may know ratios and rates, how to graph lines on the coordinate plane, and what steepness is, but at some point you are completely new to the concept of slope—which packages those former concepts into a unique bundle—so encoding is what happens when you are first introduced to slope.

There are a few important things to note here. First, slope could have been introduced, or encoded, as an isolated dot. (Well, not exactly. Nothing is ever completely “isolated.” But you get the idea.) Second, regardless of whether it is encoded as a standalone concept or as a package of concepts, slope is a new object of knowledge. It is perhaps possible now for the slope blob above to interact or connect with the green blob of content knowledge, whereas none of the individual items can do so. And, third, whatever we mean by slope above, we cannot mean the entire concept of slope (whatever that means anyway).

Consolidation

The new concept of slope on the right is a little too complete to represent encoding, plus any structure created there fades quickly over time like pictures in Back to the Future (forgetting). This is where consolidation comes in. Consolidation solidifies and maintains the arrangements of knowledge components assembled by encoding.

Generally, consolidation is associated with simple practice—i.e., practicing the concept you have encoded rather than extending or altering the concept in any way. But it is as true to say that you are learning slope via simple practice as it is to say that you are doing so by encoding the concept in an introductory lesson.

Retrieval

Finally, there is retrieval, which is the process of reconstructing an encoded concept from memory in response to a natural or artificial stimulus. What is the slope of a horizontal line? The answer to this question requires triggering the slope concept, where the answer may be directly stored, or you may have to drill down into the slope package above—into the ratios and rates concepts—to figure out that the slope of a horizontal line is a 0 rise over some nonzero run, so the answer is 0. Or, the fact that the slope of a horizontal line is 0 can be stored together with the concept package shown above, giving you two ways to figure out the answer.
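The drill-down route is a short calculation. For any two distinct points on a horizontal line, the rise is 0 and the run is some nonzero number, so (writing \(\mathtt{m}\) for the slope, which is my own label here):

\(\mathtt{m = \frac{rise}{run} = \frac{0}{run} = 0, \quad run \neq 0}\)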

Why should retrieving a concept to answer questions be considered a part of learning that concept? Because, at minimum, retrieving strengthens an encoded concept.

Explicitation


I came across this case study recently that I managed to like a little. It focuses on an analysis of a Singapore teacher’s practice of making things explicit in his classroom. Specifically, the paper outlines three ways the teacher engages in explicitation (as the authors call it): (1) making ideas in the base materials (i.e., textbook) explicit in the lesson plan, (2) making ideas within the plan of the unit more explicit, and (3) making ideas explicit in the enactment of teaching the unit(s). These parts are shown in the diagram below, which I have redrawn, with minor modifications, from the paper.

The teacher interviewed for this case study, “Teck Kim,” taught math to Year 11 (10th grade) students in the “Normal (Academic)” track, and the case study focused on a unit the teacher called “Vectors in Two Dimensions.”

Explicit From

The first category of explicitation, Explicit From, involves using base materials such as a textbook as a starting point and adapting these materials to make more explicit what it is the teacher wants students to learn. The paper provides an illustration of some of the textbook content related to explaining column vectors, along with Kim’s adaptation. I have again redrawn below what was provided in the paper. Here I also made minor modifications to the layout of the textbook example and one small change to fix a possible translation error (or typo) in the teacher’s example. The textbook content is on the left, and the teacher’s is on the right (if it wasn’t painfully obvious).

There are many interesting things to notice about the teacher’s adaptation. Most obviously, it is much simpler than the textbook’s explanation. This is due, in part, to the adaptation’s leaving magnitude unexplained during the presentation and instead asking a leading question about it.

The textbook presented the process of calculating the magnitudes of the given vectors, leading to a ‘formula’ of \(\mathtt{\sqrt{x^2+y^2}}\) for column vector \(\mathtt{\begin{pmatrix}x \\ y\end{pmatrix}}\). In its place, Teck Kim’s notes appeared to compress all these into one question: “How would you calculate the magnitude?” On the surface, it appears that Teck Kim was less explicit than the textbook in the computational process of magnitude. But a careful examination into the pre-module interview reveals that the compression of this section into a question was deliberate . . . He meant to use the question to trigger students’ initial thoughts on the manner—which would then serve to ready their frame of mind when the teacher explains the procedure in class.

So, it is not the case that explanation has been removed—only that the teacher has moved the explication of vector magnitude into the Explicit To section of the process. We can also notice, then, in this Explicit From phase, that the teacher makes use of both dual coding and variation theory in his compression of the to-be-explained material. The text in the teacher’s work is placed directly next to the diagram as labels to describe the meaning of each component of the vector, and the vector that students are to draw varies minimally from the one demonstrated: a change in sign is the only difference, allowing students to see how negative components change the direction of a vector. All much more efficient and effective than the textbook’s try at the same material.
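For readers who want to see the two ideas side by side, here is a small Python sketch of the magnitude formula quoted above and of what a sign change does to a vector. The components 3 and 4 are numbers I picked for illustration (they are not the ones in the teacher’s notes), and the function names are hypothetical.

```python
from math import atan2, degrees, hypot


def magnitude(x, y):
    """Length of the column vector (x, y): the square root of x^2 + y^2."""
    return hypot(x, y)


def direction_degrees(x, y):
    """Angle of the vector (x, y), measured counterclockwise from the positive x-axis."""
    return degrees(atan2(y, x))


# Flipping the sign of one component leaves the magnitude unchanged...
print(magnitude(3, 4), magnitude(3, -4))  # 5.0 5.0

# ...but the vector now points below the x-axis instead of above it.
print(direction_degrees(3, 4) > 0, direction_degrees(3, -4) < 0)  # True True
```

This is the kind of contrast that a minimal variation makes visible: one thing changes (a sign), and students can attribute the change in direction to it.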

Explicit Within

Intriguingly, Explicit Within is harder to explain than the other two, but is closer to the work I do every day. A quote from the article nicely describes explicitation within the teacher’s own lesson plan as an “inter-unit implicit-to-explicit strategy”:

This inter-unit implicit-to-explicit strategy reveals a level of sophistication in the crafting of instructional materials that we had not previously studied. The common anecdotal portrayal of Singapore mathematics teachers’ use of materials is one of numerous similar routine exercise items for students to repetitively practise the same skill to gain fluency. In the case of Teck Kim’s notes, it was not pure repetitive practice that was in play; rather, students were given the opportunity to revisit similar tasks and representations but with added richness of perspective each time.

We saw a very small example of explicit-within above as well. The plan, following the textbook, would have delayed the introduction of negative components of vectors, but Teck Kim introduces it early, as a variational difference. The idea is not necessarily that students should know it cold from the beginning, but that it serves a useful instructional purpose even before it is consolidated.

Explicit To

Finally, there is Explicit To, which refers to the classroom implementation of explicitation, and which needs no lengthy description. I’ll leave you with a quote again from the paper.

No matter how well the instructional materials were designed, Teck Kim recognised the limitations to the extent in which the notes by itself can help make things explicit to the students. The explicitation strategy must go beyond the contents contained in the notes. In particular, he used the notes as a springboard to connect to further examples and explanations he would provide during in-class instruction. He drew students’ attention to questions spelt out in the notes, created opportunities for students to formulate initial thoughts and used these preparatory moves to link to the explicit content he subsequently covered in class.

Almost Variation with Inequalities

I’ve started thinking about Modules 0 for Grade 6. And I’ve written my first sequence for inequalities, which I’ll show below. Although I tried to design the sequence using ideas from variation theory, I found that the specific goal I had for this sequence (writing inequalities of the form x < c and c < x from number line models) did not make it easy to think of a boatload of questions I could ask, each slightly different from the previous one. Plus, I had some slightly more robust instructional goals in mind. Still, I found that it paid off to even just try thinking about variation.

So, I start with the video below, which serves as the first (and only) instructional worked example in the sequence.


I use the Silent Teacher method, wherein I essentially show the worked example twice, the second time with my voice annotating what I’m seeing, doing, and thinking as I write the inequality to represent the two models. In the lesson, above the video, I include a brief reminder to students of what the inequality symbols mean and what the equals sign means.

My assumptions with regard to this content are that students have seen and used inequality symbols for a long time before they get to Grade 6, though primarily with positive numbers and not variables or negatives. So, this represents a kind of “start-again” topic, which is one reason why I include the block models along with the number line model. It is a compromise between extending the concept and reviewing it: so I do a bit of both.

Another reason I include the block models is that they make a solid, albeit abstract, connection to the use of inequalities with algebraic expressions to express relative values in situations where we don’t know one of the values. We know that q above represents a number greater than x, but we can’t mark q on the number line because we don’t know its exact value. This is what the thinking question below the video is hopefully getting at. It’s numbered in case an instructor wants to assign the sequence to a student.

The Sequence

After the video, there is a sequence of a mere 8 questions. The first of these, shown at the right, is not a typical “Your Turn” type of question, where the student tries out a technique on a very similar problem. Here we unpack the other ways to express the inequalities shown in the video (it’s important to constantly make the point that there are almost always a few different ways of looking at mathematical relationships), and we include the equation, in part because research tells us that comparing the equals sign with other relational operators reinforces the correct relational view of the equals sign.

Next up is a more typical Your Turn, with a block model and number line model both closely mirroring the models shown in the video.

Students can write n or 1 to represent the single block (or the point labeled with both n and 1 on the number line). Doing so helpfully reinforces a slightly better meaning of “variable,” which is a letter that represents any quantity, known or unknown.

And here, for the first time (in a thinking question), I ask students to relate the number line model to the blocks model.

The next question in the sequence is an example of some minimal variation. What’s different here is that the m and n block towers switch sides in the illustration, and the inequality model on the number line shifts to the right. Everything else stays the way it was.

We could continue in this way, adding or subtracting blocks, switching sides, etc., but this kind of model has limitations that don’t allow for examining more of the variation space. But we can hint at the fact that adding the same number to both sides of an inequality doesn’t change the direction of the inequality.

And that’s what we do in the next exercise in the sequence. Here also, the known number is moved along the number line. The thinking question I ask here is:

Would adding 1 block to each tower change the direction of the inequality? Why or why not?

I phrase the question as a hypothetical because, strictly speaking, it’s not evident from the diagram that I added exactly 1 block to tower m.
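For anyone who wants to poke at the hypothetical directly, here is a quick Python sketch. The tower heights below are numbers I made up (the diagram doesn’t give them), and the helper names are my own.

```python
def direction(a, b):
    """Return '<', '>' or '=' to describe how a compares with b."""
    if a < b:
        return "<"
    if a > b:
        return ">"
    return "="


def same_direction_after_adding(m, n, k):
    """True if adding k blocks to both towers leaves the comparison unchanged."""
    return direction(m, n) == direction(m + k, n + k)


# Hypothetical tower heights: m has 3 blocks and n has 5.
print(direction(3, 5), direction(3 + 1, 5 + 1))  # < <
print(same_direction_after_adding(3, 5, 1))  # True
```

Trying other whole-number heights gives the same result, which is the pattern the thinking question is nudging students to explain: both towers grow by the same amount, so the gap between them stays the same.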

And Now for a Big Change

Now we see how this isn’t really a sequence of minimal variation. One reason for the change-up is that I realized too late that the model I started with could only show the greater quantity as the unknown quantity. I thought about changing to a different model, one which could show the full range of variation, but I couldn’t think of a situation that worked.

This example, in which the larger quantity (the greater height) is the known, was too good to pass up. And it gave me a context to foreshadow subtracting the same number from both sides of an inequality, which is what (kind of) happens in the next exercise.

Here, though—and again—it was not plausible to hit this balance of operations idea directly (plus, it’s outside of the scope anyway). We only hint at it. But we still ask the thinking question—again, as a hypothetical—about whether subtracting the same value from both quantities changes the direction of the inequality.

The height examples, and perhaps all of the items in the sequence, lie somewhere between minimal variation and maximal variation. At some point while designing it, I had to stop searching for more perfect examples and just run with it.

The final two items in the sequence present two more (more or less abstract) situations where inequalities seem to fit.

The first, shown at the right, is the “swarm,” which contains too many items to count, though we can know for sure that the number is a greater value than 6. Here too is an example situation that better fits with the idea of a larger unknown that couldn’t be handled by the earlier block models.

In this example, I’ve switched up the labels on the number line for a small taste of minimal variation within all the macro variation going on.

Finally, there’s temperature and a quick example showing negative numbers.

What we get at here, also, is that we haven’t left the universe of comparing numbers just because we’re introducing a little algebra. Plus, I’ve eliminated the number line model here, just for a little flavor—and it’s too close in appearance to the thermometer levels. I didn’t want that confusion creeping in.