## Post-Hoc Confidences

A smart defense of any argument for less teacher-directed instruction in mathematics classrooms is to point to the logical connectedness of mathematics as a body of knowledge and suggest that students are capable of crossing many if not all of the logical bridges between propositions themselves, or with minimal guidance.

Such connectedness–it can be suggested–makes mathematics somewhat different from other school subjects. For example, given a student’s conceptual understanding of a fraction as a part-to-whole ratio, which can include his or her ability to represent a fraction with a visual or physical model, it seems to follow logically that he or she can then add two fractions and get the correct sum, so long as the student knows (intuitively or more formally) that addition is about combining values linearly. It doesn’t matter how many prerequisites there are for adding fractions. The suggestion is that once those prerequisites have been met, it is a matter of merely crossing a logical bridge to adding fractions (mostly) correctly.

By way of contrast, a student can’t really induce what happened after, say, the bombing of Pearl Harbor. They have to be informed about it directly. The effects can certainly be narrowed down using common sense reasoning and other domain-specific knowledge. But, ultimately, what happened happened and there is no reason to suspect that, in general, students can make their way through a study of history mostly blindfolded, relying only on logic and common sense.

The example of history brings up an interesting point (to me, anyway) about the example of mathematics, though. Historical consequences from historical causes can be dubbed “inevitable” only after the fact. How can we be sure it is not the same when learning anything, including mathematics? Once you know, conceptually as it were, what adding fractions is, of course it seems to be a purely logical consequence of what fractions are fundamentally. But is this seeming inevitability available to the novice, the learner who is aware of what fractions are but hasn’t ever thought about adding them? It is with the average novice, after all, that this feeling of logical inevitability has to lie. It is not enough for educated adults to think of something as ‘logical’ after they already know it.

Bertrand Russell argues, in a 1907 essay, that even in mathematics we don’t proceed from premises to conclusions, but rather the other way around.

We tend to believe the premises because we can see that their consequences are true, instead of believing the consequences because we know the premises to be true. But the inferring of premises from consequences is the essence of induction [abduction]; thus the method in investigating the principles of mathematics is really an inductive method, and is substantially the same as the method of discovering in any other science.

So, how can we decide whether some bridge in reasoning is available to and crossable by the average novice? I hope it’s clear that we can’t just figure it out via anecdotes and armchair reasoning. Our intuitions can’t be trusted with this question. And our opinions one way or the other on the matter are not helpful, no matter what they are.


A really nice thing about scientific research is its transparency. Researchers write down the methods they use in their experiments—sometimes in excruciating detail—so that others can try to replicate their work if they choose. And scrutinizable methods allow us and other researchers to think about issues that the original experimenters might have overlooked—or, at least, didn’t mention in their published work.

Every once in a while we come across research which individuals themselves can simulate at home on a computer, even if they don’t have any participants, and this allows us to bring the experiment to life a little more than can be done with text descriptions.

The research I look at in this post is such a study. Students in the study (81 in all, from 7 to 10 years of age) were given an “app” very similar to the one shown below. Play with it a bit by clicking on the animal pictures to see what students were exposed to in this study.

The Method

In this study, students were presented with a question and then an explanation answering that question for the 12 animals shown above (images used in the study were different from those above). Students rated the quality of explanations about animal biology on a 5-point scale. (In the version above, your ratings are not recorded. You can just click on the image of the rating system to move on.) The audio recorded in the app above uses the questions and explanations from the study verbatim, though in the actual study two different people speak the questions and explanations (above, it’s just me).

As you could no doubt tell if you played around with the app above, some of the explanations are laughably bad. Researchers designated these as circular explanations (e.g., How do colugos use their skin flaps to travel? Their skin flaps help them to move from one place to another). The other, better explanations were identified as mechanistic explanations (e.g., How do thorny dragons use the grooves between their thorns to help them drink water? Their grooves collect water and send the water to their mouths). After rating the explanation, students were then given a choice to either get more information about the animal or to move on to a different animal. Here again, all you get is a screen to click on, and any click takes you back to the main screen with the 12 animals. In the actual study, students were given an even more detailed mechanistic explanation when clicking to get more information (e.g., Thorny dragons have grooves between their thorns, which are able to collect water. The water is drawn from groove to groove until it reaches their mouths, so they can suck water from all over their bodies).

The Curious Case of Curiosity

What the researchers found was that, in general, students were significantly more likely to click to get more information on an animal when the explanation given was circular. And, importantly, students were more likely to click to get more information when they rated the explanation as poor. This behavior—of clicking to get more information—was operationalized as curiosity and can be explained using the deprivation theory of curiosity.

In everyday life, children sometimes receive weak explanations in response to their questions. But what do children do when they receive weak explanations? According to the deprivation theory of curiosity, if children think that an explanation is unsatisfying, then they should sometimes feel inclined to seek out a better answer to their question to bolster their knowledge; the same is not true for explanations appraised as high in quality. To our knowledge, our research is the first to investigate this theory in regards to children’s science learning, examining whether 7- to 10-year-olds are more likely to seek out additional information in response to weak explanation than informative ones in the domain of biology.

But is that really curiosity? Do I stimulate your curiosity about colugos’ skin flaps by not really answering your questions about them? We can more easily answer no to this question if we assume that Square 1 represents students’ wanting to know something about colugos’ skin flaps. In that case, the initial question stimulates curiosity, as it were, and the non-explanation simply fails to satisfy this curiosity, or initial desire for knowledge. The circular explanation has not made them curious or even more curious. They were already curious. Not helping them scratch that itch just fails to move them to Square 2, which is where they wanted to go after hearing the question (knowing something about how colugos’ skin flaps work). The fact that students with unscratched itches were more likely to go to Square 3 is not surprising, since Square 3, for them, was actually Square 2, the square that everyone wanted to get to.

An Unavoidable Byproduct of Quality Teaching

If you are more inclined to believe the above interpretation, as I am, it might seem that we still must contend with the evidence that quality explanations were indeed shown to reduce information-seeking, relative to the levels of information-seeking shown for circular explanations. But this is not necessarily the case. What we see, from this study at least, is that not scratching the initial itch likely caused a different behavior in students than did scratching it. A clicking behavior did increase for students who still had itches, but this does not mean that it decreased for students who had no itch. We have evidence here that bad explanations are recognizably bad. We do not have evidence suggesting that quality explanations make students incurious.

If this is the case, though—if quality explanations reduce curiosity—it seems likely to me that it is simply an unavoidable byproduct of quality teaching. One that can be anticipated and planned for. Explanations are, after all, designed to reduce curiosity, in some sense. What high quality explanations do—in every scientific field and likely in our everyday lives—is move us on to different, better things to be curious about.

## Thinking About and Thinking With

I have a tendency, when writing blog posts, to leave important things unsaid. So, let me fix that up front before I forget. What I wanted to say here was that, in my view, learning doesn’t happen unless we tick off all three boxes: encoding, consolidation, and retrieval.

It’s not that learning gets better or stronger when more of those boxes are ticked off. Learning isn’t possible—to some degree of certainty—in the first place without all three. And it’s not the case that focusing on just retrieval instead of just encoding or just consolidation represents some kind of revolution in pedagogical thinking. You’re simply ignoring one or two vital components of learning when before you were ignoring one or two other ones. Learning can still happen even when we don’t think about one or two (or all three) of the above components, but then it’s haphazard, random, implicit, and/or incidental. (In that case, learning comes down to mental horsepower and genes rather than processes over which we have some control.) All three components still must be addressed for learning to occur; it’s just that we can decide to not be in control of one or all of them (to students’ detriment).

But even then we’re not done. We’ve covered the components of the process of learning, but all three of those components intersect with another dimension of learning, which describes the products of learning: thinking about and thinking with.

Thinking With

In the previous post linked above, the examples of slope could all be categorized as “thinking about” slope. Put too simply, encoding the concept of slope means absorbing information about slope, retrieving knowledge about slope means remembering the slope concept and saying your knowledge out loud or writing it down, and consolidating knowledge about slope means practicing, such that what is encoded stays encoded and what is known can be retrieved.

All of this (encoding, consolidation, and retrieval) must happen with “thinking with” as well as with “thinking about.” Encoding–Thinking With, for example, would involve absorbing information about how slope can be applied to do other things, whether mathematically or in the real world. Common examples include designing wheelchair ramps (which have ADA-recommended height-length ratios of 1 : 12), measuring and comparing the steepnesses of things, and determining whether two lines are parallel, perpendicular, or neither. Consolidating–Thinking With would involve practice with that encoded knowledge; solving word problems is a typical example. Finally, Retrieving–Thinking With would involve remembering that encoded knowledge, particularly after some time has passed, say by using slope to solve a programming problem or a problem on a test.
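Two of the examples above (ramp design and the parallel/perpendicular check) can be sketched in a few lines of Python. This is only an illustration of thinking with slope, not a lesson design: the function names, the use of `None` for a vertical line, and the constant `RAMP_RATIO` are assumptions made for this sketch.

```python
from fractions import Fraction

# The 1 : 12 height-length ratio mentioned above (name is hypothetical).
RAMP_RATIO = Fraction(1, 12)


def slope(p, q):
    """Slope of the line through points p and q; None stands in for a vertical line."""
    dx = Fraction(q[0]) - Fraction(p[0])
    dy = Fraction(q[1]) - Fraction(p[1])
    return None if dx == 0 else dy / dx


def min_ramp_run(rise):
    """Shortest horizontal run that keeps a ramp at the 1 : 12 ratio for a given rise."""
    return Fraction(rise) / RAMP_RATIO


def relationship(m1, m2):
    """Classify two lines by their slopes as parallel, perpendicular, or neither."""
    if m1 == m2:
        return "parallel"  # equal slopes, or both vertical
    if (m1 is None and m2 == 0) or (m2 is None and m1 == 0):
        return "perpendicular"  # one vertical line, one horizontal line
    if m1 is not None and m2 is not None and m1 * m2 == -1:
        return "perpendicular"  # slopes are negative reciprocals
    return "neither"


print(min_ramp_run(2))  # a 2-unit rise needs a 24-unit run
print(relationship(slope((0, 0), (2, 4)), slope((0, 0), (4, -2))))  # perpendicular
```

Exact fractions are used instead of floats so that the negative-reciprocal test is not thrown off by rounding.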

All six boxes have to be checked off for learning to occur (such that it is within our control).

Teach Thinking With

In education, we have difficulties—again, in my view—with Encoding–Thinking With, and Consolidating–Thinking With. As far as these two are concerned, it is rare in my experience to see guidance and practice on a wide variety of different problems involving thinking with (for example) slope to answer questions that aren’t about slope. We tend to think that all we should do is give students a bunch of think-about slope facts and then hope for them to magically retrieve and apply those to thinking-with situations. We misunderstand transfer as some kind of conjuring out of thin air, so that’s what we give students—thin air. Then we stand back and hope to see some conjuring. When this doesn’t produce results that we’d like, we—for some unimaginably stupid reason—blame knowing facts for the problem, and instead of supplementing that knowledge with thinking-with teaching, we swap them. Which is much worse than what it tries to replace.

Instead, it is necessary to teach students how to think with slope and many other mathematical concepts, and to provide them with practice in thinking with these concepts. Thinking with is as much knowledge as thinking about is.

One of my favorite examples of thinking with slope—and one which I have, admittedly, not yet written a lesson about—has to do with drawing convex or concave quadrilaterals. Given 4 coordinate pairs for points that can form a convex or concave quadrilateral (no three points lie on the same line, etc.), how can I decide, somewhat algorithmically, on the order in which I should connect the points, such that the line segments I actually draw do create a quadrilateral and not the image on the far right?

One way to go about it is to first select the leftmost point—the point with the lowest x-coordinate (there could be two, with equal x-coordinates, but I’ll leave that to the reader to figure out). Then calculate the slope of each line connecting the leftmost point with each other point. The order in which the points can be connected is the order of the slopes from least to greatest. This process would create a different proper quadrilateral than the one shown in the middle above.
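The procedure just described can be sketched in Python. This is a minimal sketch under stated assumptions: the points are distinct, no three are collinear, and the function name `connection_order` is made up for illustration. The tie case that the post leaves to the reader is handled here in one possible way (pick the lower of the two leftmost points and treat a vertical slope as positive infinity); that choice is an assumption of this sketch, not something taken from the post.

```python
import math


def connection_order(points):
    """Return the points in an order that can be connected, in sequence, into a quadrilateral."""
    # Start from the leftmost point. Ties on x are broken by the lower y
    # (an assumption of this sketch).
    start = min(points, key=lambda p: (p[0], p[1]))
    others = [p for p in points if p != start]

    def slope_from_start(p):
        dx = p[0] - start[0]
        dy = p[1] - start[1]
        # A point directly above the start has an undefined slope;
        # treating it as +infinity places it last in the ordering.
        return math.inf if dx == 0 else dy / dx

    # Connect the remaining points in order of slope, least to greatest.
    return [start] + sorted(others, key=slope_from_start)


print(connection_order([(4, 1), (0, 0), (1, 3), (3, -2)]))
# [(0, 0), (3, -2), (4, 1), (1, 3)]
```

Connecting the returned points in order, and then closing back to the first one, sweeps around the leftmost point in a single direction, which is why the segments do not cross.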

Checking Off All the Boxes

Students’ minds are not magical. They don’t turn raw facts magically into applied understanding (the extreme traditionalist view), and they don’t magically vacuum up knowledge hidden in applied contexts (the extreme constructivist view). Put more accurately, this kind of magic does happen (which is why we believe in it), but it happens outside of our control, as a result of genetic and socioeconomic differences, so we can take no credit for it.

Importantly, ignoring components of students’ learning, for whatever reason, subjects them to a roll of the dice. Those students who start behind stay behind, and those who are underserved stay so. We seem to have enough leftover energy to try our hand at amateur psychology and social-emotional learning. Why not take a fraction of that energy and channel it into, you know, plain ol’ teaching?

## Learn: You Keep Using That Word

Sometimes kids say “nothing” when their parents ask them what they learned in school today. And, although that response is something we don’t want to hear, it is probably closer to the truth than we want to believe, because, as we all most certainly know, learning doesn’t really happen in a single class period. And, when it does, it’s not learning per se, but encoding, consolidation, or retrieval—or some mixture of the three.

Encoding

The encoding stage involves introducing you to some knowledge pattern in the natural, social, or academic environment. For example, you may know ratios and rates, how to graph lines on the coordinate plane, and what steepness is, but at some point you are completely new to the concept of slope—which packages those former concepts into a unique bundle—so encoding is what happens when you are first introduced to slope.

There are a few important things to note here. First, slope could have been introduced, or encoded, as an isolated dot. (Well, not exactly. Nothing is ever completely “isolated.” But you get the idea.) Second, regardless of whether it is encoded as a standalone concept or as a package of concepts, slope is a new object of knowledge. It is perhaps possible now for the slope blob above to interact or connect with the green blob of content knowledge, whereas none of the individual items can do so. And, third, whatever we mean by slope above, we cannot mean the entire concept of slope (whatever that means anyway).

Consolidation

The new concept of slope on the right is a little too complete to represent encoding, plus any structure created there fades quickly over time like pictures in Back to the Future (forgetting). This is where consolidation comes in. Consolidation solidifies and maintains the arrangements of knowledge components assembled by encoding.

Generally, consolidation is associated with simple practice—i.e., practicing the concept you have encoded rather than extending or altering the concept in any way. But it is as true to say that you are learning slope via simple practice as it is to say that you are doing so by encoding the concept in an introductory lesson.

Retrieval

Finally, there is retrieval, which is the process of reconstructing an encoded concept from memory in response to a natural or artificial stimulus. What is the slope of a horizontal line? The answer to this question requires triggering the slope concept, where the answer may be directly stored, or you may have to drill down into the slope package above—into the ratios and rates concepts—to figure out that the slope of a horizontal line is a 0 rise over some nonzero run, so the answer is 0. Or, the fact that the slope of a horizontal line is 0 can be stored together with the concept package shown above, giving you two ways to figure out the answer.
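The “drill down” route can be written out as a short worked equation. The point labels are assumptions for illustration: take any two points $$\mathtt{(x_1, y_1)}$$ and $$\mathtt{(x_2, y_2)}$$ on a horizontal line, so that $$\mathtt{y_1 = y_2}$$ and $$\mathtt{x_1 \neq x_2}$$.

$$\mathtt{\text{slope} = \frac{\text{rise}}{\text{run}} = \frac{y_2 - y_1}{x_2 - x_1} = \frac{0}{x_2 - x_1} = 0}$$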

Why should retrieving a concept to answer questions be considered a part of learning that concept? Because, at minimum, retrieving strengthens an encoded concept.

## Explicitation


I came across this case study recently that I managed to like a little. It focuses on an analysis of a Singapore teacher’s practice of making things explicit in his classroom. Specifically, the paper outlines three ways the teacher engages in explicitation (as the authors call it): (1) making ideas in the base materials (i.e., textbook) explicit in the lesson plan, (2) making ideas within the plan of the unit more explicit, and (3) making ideas explicit in the enactment of teaching the unit(s). These parts are shown in the diagram below, which I have redrawn, with minor modifications, from the paper.

The teacher interviewed for this case study, “Teck Kim,” taught math to Year 11 (10th grade) students in the “Normal (Academic)” track, and the case study focused on a unit the teacher called “Vectors in Two Dimensions.”

Explicit From

The first category of explicitation, Explicit From, involves using base materials such as a textbook as a starting point and adapting these materials to make more explicit what it is the teacher wants students to learn. The paper provides an illustration of some of the textbook content related to explaining column vectors, along with Kim’s adaptation. I have again redrawn below what was provided in the paper. Here I also made minor modifications to the layout of the textbook example and one small change to fix a possible translation error (or typo) in the teacher’s example. The textbook content is on the left, and the teacher’s is on the right (if it wasn’t painfully obvious).

There are many interesting things to notice about the teacher’s adaptation. Most obviously, it is much simpler than the textbook’s explanation. This is due, in part, to the adaptation’s leaving magnitude unexplained during the presentation and instead asking a leading question about it.

The textbook presented the process of calculating the magnitudes of the given vectors, leading to a ‘formula’ of $$\mathtt{\sqrt{x^2+y^2}}$$ for column vector $$\mathtt{\begin{pmatrix} x \\ y \end{pmatrix}}$$. In its place, Teck Kim’s notes appeared to compress all these into one question: “How would you calculate the magnitude?” On the surface, it appears that Teck Kim was less explicit than the textbook in the computational process of magnitude. But a careful examination into the pre-module interview reveals that the compression of this section into a question was deliberate . . . He meant to use the question to trigger students’ initial thoughts on the manner—which would then serve to ready their frame of mind when the teacher explains the procedure in class.

So, it is not the case that explanation has been removed—only that the teacher has moved the explication of vector magnitude into the Explicit To section of the process. We can also notice, then, in this Explicit From phase, that the teacher makes use of both dual coding and variation theory in his compression of the to-be-explained material. The text in the teacher’s work is placed directly next to the diagram as labels to describe the meaning of each component of the vector, and the vector that students are to draw varies minimally from the one demonstrated: a change in sign is the only difference, allowing students to see how negative components change the direction of a vector. All much more efficient and effective than the textbook’s try at the same material.

Explicit Within

Intriguingly, Explicit Within is harder to explain than the other two, but is closer to the work I do every day. A quote from the article nicely describes explicitation within the teacher’s own lesson plan as an “inter-unit implicit-to-explicit strategy”:

This inter-unit implicit-to-explicit strategy reveals a level of sophistication in the crafting of instructional materials that we had not previously studied. The common anecdotal portrayal of Singapore mathematics teachers’ use of materials is one of numerous similar routine exercise items for students to repetitively practise the same skill to gain fluency. In the case of Teck Kim’s notes, it was not pure repetitive practice that was in play; rather, students were given the opportunity to revisit similar tasks and representations but with added richness of perspective each time.

We saw a very small example of explicit-within above as well. The plan, following the textbook, would have delayed the introduction of negative components of vectors, but Teck Kim introduces it early, as a variational difference. The idea is not necessarily that students should know it cold from the beginning, but that it serves a useful instructional purpose even before it is consolidated.

Explicit To

Finally, there is Explicit To, which refers to the classroom implementation of explicitation, and which needs no lengthy description. I’ll leave you with a quote again from the paper.

No matter how well the instructional materials were designed, Teck Kim recognised the limitations to the extent in which the notes by itself can help make things explicit to the students. The explicitation strategy must go beyond the contents contained in the notes. In particular, he used the notes as a springboard to connect to further examples and explanations he would provide during in-class instruction. He drew students’ attention to questions spelt out in the notes, created opportunities for students to formulate initial thoughts and used these preparatory moves to link to the explicit content he subsequently covered in class.

## Almost Variation with Inequalities

I’ve started thinking about Modules 0 for Grade 6. And I’ve written my first sequence for inequalities, which I’ll show below. Although I tried to design the sequence using ideas from variation theory, I found that the specific goal I had for this sequence (writing inequalities of the form x < c and c < x from number line models) did not make it easy to think of a boatload of questions I could ask, each slightly different from the previous one. Plus, I had some slightly more robust instructional goals in mind. Still, I found that it paid off to even just try thinking about variation.

So, I start with the video below, which serves as the first (and only) instructional worked example in the sequence.

I use the Silent Teacher method, wherein I essentially show the worked example twice, the second time with my voice annotating what I’m seeing, doing, and thinking as I write the inequality to represent the two models. In the lesson, I include a brief reminder to students, above the video, of what the inequality symbols mean and what the equals sign means.

My assumptions with regard to this content are that students have seen and used inequality symbols for a long time before they get to Grade 6, though primarily with positive numbers and not variables or negatives. So, this represents a kind of “start-again” topic, which is one reason why I include the block models along with the number line model. It is a compromise between extending the concept and reviewing it: so I do a bit of both.

Another reason I include the block models is that they make a solid, albeit abstract, connection to the use of inequalities with algebraic expressions to express relative values in situations where we don’t know one of the values. We know that q above represents a number greater than x, but we can’t mark q on the number line because we don’t know its exact value. This is what the thinking question below the video is hopefully getting at. It’s numbered in case an instructor wants to assign the sequence to a student.

The Sequence

After the video, there is a sequence of a mere 8 questions. The first of these, shown at the right, is not a typical “Your Turn” type of question, where the student tries out a technique on a very similar problem. Here we unpack the other ways to express the inequalities shown in the video (it’s important to constantly make the point that there are almost always a few different ways of looking at mathematical relationships), and we include the equation, in part because research tells us that comparing the equals sign with other relational operators reinforces the correct relational view of the equals sign.

Next up is a more typical Your Turn, with a block model and number line model both closely mirroring the models shown in the video.

Students can write n or 1 to represent the single block (or the point labeled with both n and 1 on the number line). Doing so helpfully reinforces a slightly better meaning of “variable,” which is a letter that represents any quantity, known or unknown.

And here, for the first time (in a thinking question), I ask students to relate the number line model to the blocks model.

The next question in the sequence is an example of some minimal variation. What’s different here is that the m and n block towers switch sides in the illustration, and the inequality model on the number line shifts to the right. Everything else stays the way it was.

We could continue in this way, adding or subtracting blocks, switching sides, etc., but this kind of model has limitations that don’t allow for examining more of the variation space. But we can hint at the fact that adding the same number to both sides of an inequality doesn’t change the direction of the inequality.

And that’s what we do in the next exercise in the sequence. Here also, the known number is moved along the number line. The thinking question I ask here is:

Would adding 1 block to each tower change the direction of the inequality? Why or why not?

I phrase the question as a hypothetical because, strictly speaking, it’s not evident from the diagram that I added exactly 1 block to tower m.
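For readers who want the reasoning behind the hint spelled out, here is one way to write it, assuming for illustration that tower m is the shorter one, so $$\mathtt{m < n}$$. Saying $$\mathtt{m < n}$$ is the same as saying the difference $$\mathtt{n - m}$$ is positive, and adding 1 block to each tower leaves that difference unchanged:

$$\mathtt{(n + 1) - (m + 1) = n - m > 0 \quad \Longrightarrow \quad m + 1 < n + 1}$$

So the direction of the inequality stays the same.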

And Now for a Big Change

Now we see how this isn’t really a sequence of minimal variation. One reason for the change-up is that I realized too late that the model I started with could only show the greater quantity as the unknown quantity. I thought about changing to a different model, one which could show the full range of variation, but I couldn’t think of a situation that worked.

This example, in which the larger quantity (the greater height) is the known, was too good to pass up. And it gave me a context to foreshadow subtracting the same number from both sides of an inequality, which is what (kind of) happens in the next exercise.

Here, though—and again—it was not plausible to hit this balance of operations idea directly (plus, it’s outside of the scope anyway). We only hint at it. But we still ask the thinking question—again, as a hypothetical—about whether subtracting the same value from both quantities changes the direction of the inequality.

The height examples, and perhaps all of the items in the sequence, lie somewhere between minimal variation and maximal variation. At some point while designing it, I had to stop searching for more perfect examples and just run with it.

The final two items in the sequence present two more (more or less abstract) situations where inequalities seem to fit.

The first, shown at the right, is the “swarm,” which contains too many items to count, though we can know for sure that the number is a greater value than 6. Here too is an example situation that better fits with the idea of a larger unknown that couldn’t be handled by the earlier block models.

In this example, I’ve switched up the labels on the number line for a small taste of minimal variation within all the macro variation going on.

Finally, there’s temperature and a quick example showing negative numbers.

What we get at here, also, is that we haven’t left the universe of comparing numbers just because we’re introducing a little algebra. Plus, I’ve eliminated the number line model here, just for a little flavor—and it’s too close in appearance to the thermometer levels. I didn’t want that confusion creeping in.

## Sicklied O’er


My grandfather used to tell me a story about a young boy who was stuck in traffic with his family for hours because an 18-wheeler had got itself pinned under an overpass bridge ahead of them. The huge truck was wedged in so strongly and strangely that a flock of engineers had descended on the scene. They argued back and forth about their favorite physical and mathematical models that would unpin the trapped vehicle and release the miles-long stream of cars idling behind it on the freeway. This bickering went on for hours—until the boy got out of his car, walked up to the group of engineers, and shouted, “Why don’t you just let the air out the tires!”

It’s a nice story, precisely because it’s so rare and noticeable. We don’t notice unbroken strings of solved problems from experts, because that’s what we expect of experts and, for the most part, what we get from them. We notice when they fail. And, because these failures are more noticeable than the far more boring and numerous successes, we fall prey to availability bias, and assume that expert failure occurs with much more regularity than it actually does. (In turn, we start to think that it’s maybe a good idea to keep students naive and, therefore, creative and open-minded rather than have them study things that other people have already figured out.) As Tom Nichols writes in The Death of Expertise:

At the root of all this is an inability among laypeople to understand that experts being wrong on occasion about certain issues is not the same thing as experts being wrong consistently on everything. The fact of the matter is that experts are more often right than wrong, especially on essential matters of fact. And yet the public constantly searches for the loopholes in expert knowledge that will allow them to disregard all expert advice they don’t like.

A 2008 study which put this folk notion of expert inflexibility to the test compared chess experts and novices, and measured the famous Einstellung effect in both groups across three experiments.

In the first experiment, the experts were given the board on the left and were instructed to find the shortest solution. The board on the left is designed to activate a motif familiar to chess experts (and thus activate Einstellung)—the smothered mate motif—which can be carried out using 5 moves. A shorter solution (3 moves) also exists, however.

If the experts failed to find the three-move solution, they were then given the board on the right. This board can be solved by the shorter three-move solution but not by the Einstellung motif of the smothered mate. The group of novices in the experiment were all given this second board (the one on the right) featuring the three-move mate solution without the Einstellung motif as well.

Findings

If knowledge corrupts insight, as it were, then the experts would, by and large, be fixated by the smothered mate sequence and miss the three-move solution. And this is indeed what happened—sort of. What the researchers found was that level of expertise correlated strongly with the results. Grandmasters (those with the highest levels of chess expertise) were not taken in by the Einstellung motif at all. Every one of them found the optimal three-move solution. However, experts with lower ratings, such as International Masters, Masters, and Candidate Masters, all experienced the Einstellung effect, with 50%, 18%, and 0%, respectively, finding the shorter solution on the first board, even though all of them found the optimal solution when it was presented on the second board, in the absence of the smothered mate motif.

The novices’ performance showed a positive correlation with rating also. Sixty-three percent of the highest rated (Class A) players in the novices group found the optimal solution on the right board, while 13% of Class B players and 0% of Class C players found the three-move solution. Thus, the Einstellung effect made International Masters perform like Class A players, Masters perform like Class B players, and Candidate Masters perform like Class C players.

Experiment 2 replicated the above finding in a slightly more naturalistic setting, and Experiment 3 did so with strategic Einstellungs instead of tactical ones.

Knowledge Is Essential for Cognitive Flexibility

While this study shows that Einstellung effects are powerful and observable in expert performance, it also demonstrates that the notion that expertise causes cognitive inflexibility is probably wrong.

The failure of the ordinary experts to find a better solution when they had already found a good one supports the view that experts can be vulnerable to inflexible thought patterns. But the performance of the super experts shows that ‘experts are inflexible’ would be the wrong conclusion to draw from this failure. The Einstellung effect is very powerful—the problem solving capability of our ordinary experts was reduced by about three SDs when a well-known solution was apparent to them. But the super experts, at least with the range of difficulty of problems used here, were less susceptible to the effect. Greater expertise led to greater flexibility, not less.

Knowledge, and the expertise inevitably linked to it, were also responsible for both forms of expert flexibility demonstrated in the experiments. The optimal solution was more likely to be noticed immediately, even before the nominally more familiar solution, among some super experts. Hence, expertise helped super experts avoid an Einstellung situation in the first place because they immediately found the optimal solution. Even when experts did not find the optimal solution immediately, expertise and knowledge were positively associated with the probability of finding the optimal solution after the non-optimal solution had been generated first. Finally, when knowledge discrepancy was minimized, as in the third experiment, super experts had sufficient resources to outperform their slightly weaker colleagues. In all three instances, knowledge was inextricably and positively related to expert flexibility. . . .

The training required to produce experts should not be seen as a source of potential problems but as a way to acquire the skill to deal effectively and flexibly with all the situations that can arise in the domain. Creativity is a consequence of expertise rather than expertise being a hindrance to creativity. To produce something novel and useful it is necessary first to master the previous knowledge in the domain. More knowledge empowers creativity rather than hurting it (e.g., Kulkarni & Simon, 1988; Simonton, 1997; Weisberg, 1993, 1999).

## Makin’ Copies

At the heart of many calls to improve education is the taken-for-granted notion that because the world is now changing so rapidly, it is better for schools to focus on producing innovative and critical thinkers and ‘not just’ knowledgable students. The common instructional approach deployed, at all scales, to produce this effect—whether it is inquiry learning or personalized learning—is to remove or dramatically lessen the influence of knowledgable others.

But important research on learning strategies in the wild shows that, at the very least, different intuitions are possible here. Researchers discovered—much to their surprise—that, in a rapidly changing environment, copying the effective behaviors of knowledgable others (social learning) could be a much more effective learning strategy than learning directly from the environment (asocial learning). This result held even when social learning was “noisy” and asocial learning was noise free.

The team has gone on to further investigate and apply their findings to other animal studies, and a book, Darwin’s Unfinished Symphony, was released just last year, detailing their work.

Social Learning Strategies Tournament

The method used for this research was a tournament in which the researchers designed a computer simulation environment and entrants to the tournament (104 in all) designed ‘agents’ that competed to survive in the generated environment by learning behaviors and applying them to receive payoffs for those behaviors. Each agent had three possible moves it could play: Observe, Innovate, or Exploit. The first two of these moves—Observe and Innovate—were learning moves, which allowed the agent to acquire new behaviors (or not in some cases), and the third move, Exploit, allowed agents to apply their acquired behaviors to receive a payoff (or not, depending on the environment and the behavior). As was mentioned above, Observe moves were “noisy,” whereas Innovate moves were noise free:

Innovate represented asocial learning, that is, individual learning stemming solely through direct interaction with the environment, for example, through trial and error. An Innovate move always returned accurate information about the payoff of a randomly selected behavior previously unknown to the agent. Observe represented any form of social learning or copying through which an agent could acquire a behavior performed by another individual, whether by observation of or interaction with that individual. An Observe move returned noisy information about the behavior and payoff currently being demonstrated in the population by one or more other agents playing Exploit. Playing Observe could return no behavior if none was demonstrated or if a behavior that was already in the agent’s repertoire is observed and always occurred with error, such that the wrong behavior or wrong payoff could be acquired. The probabilities of these errors occurring and the number of agents observed were parameters we varied.
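To make the three moves concrete, here is a toy sketch of how they might behave. It is not the tournament's actual code. The function names, the dictionary-based repertoire, and the noise parameters `p_wrong` and `payoff_sd` are all assumptions of mine, chosen only to mirror the description above.

```python
import random

def innovate(repertoire, payoffs, rng):
    """Asocial learning: acquire one previously unknown behavior with its exact payoff."""
    unknown = [b for b in payoffs if b not in repertoire]
    if unknown:
        b = rng.choice(unknown)
        repertoire[b] = payoffs[b]  # noise free, per the description

def observe(repertoire, demonstrated, payoffs, rng, p_wrong=0.1, payoff_sd=1.0):
    """Social learning: copy a behavior that others are currently exploiting, with noise."""
    if not demonstrated:
        return  # nothing is being demonstrated, so nothing is learned
    b = rng.choice(demonstrated)
    if rng.random() < p_wrong:
        b = rng.choice(list(payoffs))  # assumed error model: the wrong behavior is picked up
    if b in repertoire:
        return  # already known, so the move adds no new behavior
    repertoire[b] = payoffs[b] + rng.gauss(0, payoff_sd)  # assumed error model: noisy payoff

def exploit(repertoire, payoffs):
    """Perform the behavior the agent believes is best and collect its true payoff."""
    if not repertoire:
        return None, 0.0
    b = max(repertoire, key=repertoire.get)
    return b, payoffs[b]
```

Notice that an agent playing `exploit` always demonstrates the behavior it believes is best, which matters for what `observe` has available to copy.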

Some Key Findings

It was not effective to play a lot of learning moves. But when learning moves were played, agents which relied almost exclusively on Observe outperformed the rest, and an increase in copying was strongly positively correlated with higher payoffs. When the winning agent (called DISCOUNTMACHINE) was modified to learn only through Innovate moves, it placed last.

Even when learning by copying was made noisier—the probability and size of copying errors increased—agents which relied on it heavily still did best.

Finally, agents that combined asocial and social learning in more balanced ways performed worse than those that opted for social learning most of the time (winning agents used social learning at least 95% of the time).

Why Copying Is Effective

It must be underscored, again, that, in more naturalistic environments, there is a cost to asocial learning that copying does not have. Learning by observation is safer than learning by interacting directly with the environment, alone. But in this simulation, that cost was erased. And social learning (copying) STILL outperformed innovation, even when social learning was noisy (Observe “failed to introduce new behavior into an agent’s repertoire in 53% of all the Observe moves in the first tournament phase, overwhelmingly because agents observed behaviors they already knew”).

So, why was copying effective? The researchers boiled it down to being surrounded by rational agents, which I choose to rephrase as “knowledgable adults”:

Social learning proved advantageous because other agents were rational in demonstrating the behavior in their repertoire with the highest payoff, thereby making adaptive information available for others to copy. This is confirmed by modified simulations wherein social learners could not benefit from this filtering process and in which social learning performed poorly. Under any random payoff distribution, if one observes an agent using the best of several behaviors that it knows about, then the expected payoff of this behavior is much higher than the average payoff of all behaviors, which is the expected return for innovating. Previous theory has proposed that individuals should critically evaluate which form of learning to adopt in order to ensure that social learning is only used adaptively, but a conclusion from our tournament is that this may not be necessary. Provided the copied individuals themselves have selected the best behavior to perform from at least two possible options, social learning will be adaptive.
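The statistical point in that passage (the best of several known behaviors beats a randomly chosen one, on average) is easy to check with a small simulation. This is only a sketch under my own assumptions: payoffs are drawn from an exponential distribution, the copied agent knows `k = 3` behaviors, and copying is noise free. None of these numbers come from the tournament.

```python
import random

def mean_payoffs(k=3, n_behaviors=20, trials=20000, seed=0):
    """Compare the average payoff from innovating with the average payoff from copying."""
    rng = random.Random(seed)
    innovate_total = 0.0
    copy_total = 0.0
    for _ in range(trials):
        # Assumed payoff distribution: exponential with mean 1.
        payoffs = [rng.expovariate(1.0) for _ in range(n_behaviors)]
        # Innovating returns a randomly selected behavior.
        innovate_total += rng.choice(payoffs)
        # The observed agent knows k behaviors and demonstrates its best one.
        known = rng.sample(payoffs, k)
        copy_total += max(known)
    return innovate_total / trials, copy_total / trials

innovate_mean, copy_mean = mean_payoffs()
print(f"innovate: {innovate_mean:.2f}, copy best of 3: {copy_mean:.2f}")
```

For this particular distribution, theory puts the innovator's expected payoff at 1 and the copier's at 1 + 1/2 + 1/3, or about 1.83, and the simulated averages should land close to those values. The size of the gap depends on the distribution and on `k`, but the ordering is the same whenever the demonstrator has picked the best of at least two options.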

Any takeaways for education from this will be stretches. The research was a computer simulation, after all. But, whatever. My takeaway from all this is that, as long as there are knowledgable adults around, we should encourage students to learn directly from them. A milder takeaway (or maybe stronger, depending on your point of view): regardless of how adept you feel yourself to be in your social world, social worlds are not intuitive. What seems to make sense to you as a strong connection between ideas A and B (in this case, changing world → promote innovation) will not necessarily be effective just because a lot of people believe it and it makes intuitive sense. The way to change that is not to stop making those arguments, because few people do. The way to change it is to stop forwarding those kinds of arguments along when they are made. That way, the behavior won’t be copied. : )

Coda

I should add, by way of the quote below from Darwin’s Unfinished Symphony, that, although copying was a more successful strategy than innovating, it was not, by itself, the reason for success. What made the difference was better, more efficient, more accurate copying behaviors:

The tournament teaches us that natural selection will tend to favor those individuals who exhibit more efficient, more strategic, and higher-fidelity (i.e., more accurate) copying over others who either display less efficient or exact copying, or are reliant on asocial learning.

## Variation and Example Spaces

I’ve been thinking a lot about Craig Barton’s wonderful book How I Wish I’d Taught Maths and have been scanning three of his new websites, Variation Theory, Same Surface, Different Deep Problems, and Maths Venns, as well as some research and other books on variation, and a lot of online commentary, in anticipation of starting to implement these ideas in some way.

Writing Algebraic Expressions

As I was reading the last page of Mr Barton’s Book, I was working on instruction around writing algebraic expressions, so this is the topic kind of hovering next to me wherever I go, waiting for when I have time to dig in. This topic is a little more fraught than the purely procedural examples that have been circulating, so it’s worth exploring how variation can be applied to something a little looser.

What does writing algebraic expressions involve (for a beginner)? Well, if I force myself to ignore what other people think writing algebraic expressions involves (essentially ignoring standards and any written material on the topic), then I would say that writing algebraic expressions means to write something like s + 2 or 2 + s when presented with a question like “How old in years will Sam be in exactly two years?”1

This, then, I would call the first example in my example space. Or, rather, an example of an example in the example space—because, if this example is any good, then I will use it as an instructional example to start and leave it out of variation work, which is about PRACTICE, not instruction.2 So, something like this, with the brilliantly simple Silent Teacher method, mentioned in Barton’s book (and a few other places), though without the natural pauses and instructions for students to copy down the correct worked example used during a normal classroom implementation of this.

Try This One

Write an algebraic expression to model the situation.

How old in years was Sam exactly 10 years ago?

I would include a follow-up to this process, here involving a discussion around (a) the idea that the resulting algebraic expression represents an answer to the question of how old Sam will be—it’s just that one part of that expression is not known, (b) asking students to check that the answer makes sense, here by substituting different values for s and comparing the result to the situation, (c) the idea that any letter can be chosen for the variable, and (d) perhaps drawing a visual model of the result (an annotated number line). Some of these could be packaged into the instruction and question above, of course—or perhaps I’ll decide to split this up even more, considering how much “in addition to” I’ve now done about this—but I think that, in general, leaving room for a stepping back step at the end of this is a good idea, to catch the kind of overflow that is difficult to squeeze into expositions like this.
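The substitution check in (b) can be sketched in a few lines. The expression for the first question, s + 2, is given above. The expression s - 10 for the paired question is my own answer to it, and the sample ages are arbitrary.

```python
def age_in_two_years(s):
    """Sam's age in years, two years from now, if Sam is s years old today."""
    return s + 2

def age_ten_years_ago(s):
    """Sam's age in years, ten years ago, if Sam is s years old today (assumed answer)."""
    return s - 10

# Try a few ages for Sam and compare each result with the situation.
for s in (10, 12, 35):
    print(f"Sam is {s}: in 2 years, {age_in_two_years(s)}; 10 years ago, {age_ten_years_ago(s)}")
```

A value like s = 10 is worth including on purpose, since it gives an age of 0 ten years ago and invites the question of which values of s make sense for the situation.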

And Now Enters Variation

The paired problem here has opened up a dimension of variation—using addition or subtraction in the expression, so we can play with that during Intelligent Practice (really love that phrase). Technically, the instruction was open to all four operations, but I think it makes sense to focus exclusively on addition and subtraction, leaving multiplication and division expressions for another round.

Here’s what I cooked up.

How much money in dollars did Sam have if he got exactly 10 dollars?

How much money in dollars did Sam have if he got exactly 10 cents?

How much money in dollars did Sam have if he got exactly 2 dollars?

How much money in dollars did Sam have if he lost exactly 10 cents?

How much money in dollars did Sam have if he got exactly 1 dollar?

How much money in dollars did Sam have if he lost exactly 1 dollar?

How much money in dollars did Sam have if he got exactly 50 cents?

How much money in dollars did Sam have if he lost exactly 2 dollars?

How much money in dollars did Sam have if he got exactly 25 cents?

How much money in dollars did Sam have if he didn’t lose or gain any money?

After this, it might be good to have students cut out the strips and place them on a number line.

It’s interesting how much my experience and training rebels against this process. What I want to get to, right away, are the difficult and ambiguous situations. In particular, I started with, and then rejected, a variation sequence involving height: How tall in inches will Sam be if he grows 2 inches? The subtraction variation is bound to confuse: How tall in inches was Sam if he grew 2 inches? That’s tricky.

But knowing about and looking out for those tricky and ambiguous and interesting situations can serve you well when creating instructional routines like this. It shows you where you’re going, and your example space can be richer and broader. And if you’re serious about implementing minimally different variation like this, it shows you how far away your knowledge really is from a beginner’s. You just have to learn to have more sympathy for learners who are encountering, for the first time, mathematics that you’ve seen a gazillion times.

1. It’s important to me—at the moment, at least—that the examples in this example space should also involve identifying the correct unknown, rather than simply recording the unknown, as would happen with a question like, “Sam is s years old. How old will he be in 2 years?” or with an exercise of the form “2 more than a number.” In both of these cases, the unknown is entirely exposed.
2. This is an important aspect of variation that I worry will be lost on U.S. teachers. Intelligent practice can’t happen, beneficially, until some acquisition has happened. In 20 years, I haven’t seen a robust public discussion about acquisition. The rhetoric around instruction in the States treats it as just one long assessment, though almost no one realizes that’s what it has become.

## Mr Barton’s Book

It wasn’t too long ago—not even three years—that I finished reading David Didau’s terrific book (this one), so I still remember the excitement that I felt reading it, and watching all of the silly certainties of common wisdom in education being dismantled in front of my eyes, making way—I could only hope—for pedagogical practices informed by a real science of learning.

I felt a similar excitement reading Craig Barton’s book How I Wish I’d Taught Maths, because in this book, at long last, are many of those practices in one place, constructed, as readers will see, next to the debris of familiar canards and shallow reasoning that once guided parts of Barton’s teaching.

It is not a book full of proclamations about “best” practice. But you will find in this book a beautiful translation of the science of learning to the classroom. And far from the drudgery that one may imagine this to be, the joy of effective explicit instruction, for both teacher and students, comes through in every chapter of the author’s writing. It is serious, thorough, humble, and humane. And accessible: perhaps the greatest pleasure in reading it is knowing that you could turn around and start to implement many of these practices in short order—or, perhaps, that you already do these things, but don’t know why you should stick with them or how you could improve on them.

I have a lot of underlines and margin notes, but I think these three snippets together, from the chapter on problem solving and independence, are my favorites. The section starts, as they all do, with what the author used to think:

I used to love the sight of my students struggling through problems. Scratching heads, heavy sighs, and even the snap of a pencil thrown down in frustration were the soundtrack to learning. . . .

And then we are introduced to one of these problems, Question 23 from this paper (PDF), along with a deep concern for how novices will handle it. Contrast Barton’s new diagnosis below with common wisdom—that students ask why they are doing math because it is boring, tedious, procedural, or not relevant to their lives.

The task of choosing cards and calculating their totals may prove so cognitively demanding that novices do not have any spare cognitive capacity to recognise patterns. They do not realise that it is not the actual totals that matter, but whether those totals are odd or even. They just carry on regardless. Moreover, students are so consumed with the minutiae of the problem that no cognitive capacity remains to consider the global picture—why are they doing this? The result is that the novices may end up with an assortment of lists and totals, but not actually do anything with it—the fact that this is a probability question was pushed out of working memory long ago when the first set of cards was being processed.

As you might imagine, since the diagnosis is different from that received from common wisdom, the prescribed treatment is different too:

Before I set students off to work independently, I ensure they have enough domain-specific knowledge to solve problems on their own.

Although the snippets above are certainly grist for my mill, How I Wish I’d Taught Maths is not an ideological tome. It is eminently practical, taking the best ideas from all corners of the educational universe, squeezing them through the filter of cognitive science, and setting them in the right proportion to create a firm foundation that any educator—and especially any math educator—can use and build on. I highly highly recommend it to anyone who wants to strive for better in teaching and learning.