Mr Barton’s Second Book

It has now been just two years since I reviewed Mr Barton’s stellar first book. I say “just,” in part because the last three weeks during this pandemic have felt like five years, and in part because Barton packs so much into his second book that it is a little surprising he did it in just two years.

The central theme of Reflect, Expect, Check, Explain is using and constructing ‘intelligent’ sequences of mathematics exercises, “providing opportunities to think mathematically.” The intelligence behind these sequences is the way we order and arrange them, allowing for comparison (reflection) between two or more exercises, the anticipation of what the answer or solution method will be (expectation) based on what the previous answer or solution method was, determination of the answer (check), and then an explanation of the connection between the exercises (explain).

Consider, for example, the sequence at left, from early in the book. During reflect, for the first pair of exercises, I can notice that the lower and upper bounds have stayed the same, and the second number line has minor ticks for every second minor tick of the first number line. I can also notice that the sought-after decimal value is at the same location on both number lines. This noticing can lead me to expect that since I identified the missing value for the first number line as 2.6, my answer should be the same for the second number line. It’s possible, though, that I won’t come up with an expectation. In the check phase, I fill in the values for the equal intervals on the second number line, coming up with the value for the question mark. Finally, when I explain, I either have a chance to talk about my earlier expectation and explain why I was off or why my expectation was correct or, if I couldn’t formulate an expectation, I can explain why the question-marked values are the same even though the tick marks are different.

As I move through the sequence, there are really interesting thoughts to have.

  • Why did the question-marked values line up when moving from 10 to 5 equal intervals (between Questions 1 and 2) but not when moving from 5 to 4 equal intervals (between Questions 3 and 4)?
  • Why does “lining up” fail me in Questions 4, 5, and 6 when it worked between Questions 1 and 2?
  • I can’t rely on inspection every time to figure out the intervals. Is there something I can do to make that task simpler?
  • Is the question-marked value in Question 9 just the question-marked value in Question 8, divided by 10?
  • Can I extend my interval calculator method to decimals?
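
The “interval calculator” idea in the list above can be sketched in a few lines of Python. This is only an illustration: the bounds of 2 and 3 are my assumption (the book's number lines are not reproduced here), and the function name `tick_values` is hypothetical. Exact fractions are used so that a value like 2.6 can be compared without floating-point noise.

```python
from fractions import Fraction

def tick_values(lower, upper, intervals):
    """Return the value at every tick when [lower, upper] is cut into equal intervals."""
    lower, upper = Fraction(lower), Fraction(upper)
    step = (upper - lower) / intervals
    return [lower + i * step for i in range(intervals + 1)]

# Assumed bounds of 2 and 3; the book's actual number lines may differ.
target = Fraction(13, 5)  # 2.6
for intervals in (10, 5, 4):
    ticks = tick_values(2, 3, intervals)
    print(intervals, "intervals:", target in ticks)
```

Under these assumed bounds, 2.6 sits on a tick for 10 and for 5 equal intervals but not for 4, which is one way to see why the question-marked values stop lining up.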

If this were the entire book, that would be enough for me, to be honest. But Mr Barton spends an exemplary amount of effort addressing possible questions and misconceptions about such sequences (the FAQ chapter is excellent) and explaining how these sequences can both fit into more extensive learning episodes and function in different ways from practice. All the while, the sequences remain the stars of the show.

I highly recommend (again) Mr Barton’s book, especially to math teachers. He outlines in brilliant detail how you can turn a set of boring exercises into a powerful method for soliciting students’ mathematical thinking. No revolution required.

Choice Quotes

Below are just a few snips from the book that I added to my notebook while reading. These are not necessarily reflective of the entire argument. But after a long day of educhatter, which more often than not reads like an ancient scroll from some monist cult, it is comforting to read these thoughts and know that there is still a place for practical, technical, dispassionate thinking about teaching and learning in the 21st century—a place for waging the cerebral battle, rather than constantly leading with our chin or our hearts.

Teaching a method in isolation and practising it in isolation is important to develop confidence and competence with that method, and indeed, students can get pretty good pretty quickly. But if we do not then challenge them to decide when they should use that method – and crucially when they should not – we deny them the opportunity to identify the strategy needed to solve the problem.

There are two main arguments in favour of teaching a particular method before delving into why it works.

The path to flexible knowledge The key point that Willingham makes is that acquiring inflexible knowledge is a necessary step on the path to developing flexible knowledge. There is no short cut. The ‘why’ is conceptual and abstract. We understand concepts through examples. The ‘how’ generates our students’ experience of examples. In other words, often we have to do things several times to appreciate exactly how and why they work.

Motivation As Garon-Carrier et al. (2015) conclude, motivation is likely to be built on a foundation of success, and not the other way around.

The mistake I made for much of my career was trying to fast track my students to this [problem solving] stage. This was partly due to my obsession with differentiation – heaven forbid a child should be in their comfort zone for more than a few seconds – but also based on my belief that problem solving offered some sort of incredible 2-for-1 deal. I thought it would enable my students to practice the basics, whilst at the same time allowing them to develop that magic problem solving skill.

I will again quote John Mason: “It is the ways of thinking that are rich, not the task itself.”

Scala Math

I’ve started a writing project recently that I’m having a good time working on so far. I’ve called it Scala Math (and on Twitter here) for now, because its central focus is deconstructing concepts and procedures into steps, and la scala is Italian for ‘staircase’. You can see the word at work in ‘escalator’, ‘scale’, etc. Scala is also the name of a programming language. Here are some reasons for that I found online.

Most of the projects I’ve worked on over the past few years have also been ways for me to learn new software languages or libraries. For Geometry Theorems, it was d3. For Scala, it was React—as well as the beautiful, amazing database that a normal person can actually look at and edit and it’s still a database: Airtable.

How It Works: Learn

Every Scala has a display window, where images and videos are shown, and a steps window, where you find the text of the steps, or ‘parts’. These areas are divided by a brain, which I’ll talk about below. When you land on a Scala (this one is Solving Arithmetic Sequences), the first thing shown in the display window is an image presenting a quick snippet of what will be covered. The image shows an essential question at the top. The use-case for the snippet was a student wanting a quick reminder about something they are working on, perhaps for homework, without having to search online and wade through tons of stuff that sorta-kinda matches what they want but not really.

The remainder of the section shown at left (called ‘Learn’ mode) is a series of steps (in this case, six), explained with text, audio narration, and the accompanying images that you can see appearing when clicking on each step. The dot navigation at the top shows us that we are on the first screen of this Scala.

Each step card has a button to replay the step, which can be pressed at any time while the step is active, and a button (up arrow) to go to the preceding step.

How It Works: Reflect

As you can see at the end of the video above, there is a Reflection question which calls for a short or extended text response. This is where the audio input on my cell phone comes in handy. Students’ responses are, at the moment, compared to a few ‘correct’ responses that I have written and that others have contributed to. The pre-written response with the highest numerical match, on a scale from 0 to 100, is presented as a suggested answer, and that match is presented as your score.
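
The post does not say how the 0-to-100 match is computed, so the following is only a rough sketch of one way such a comparison could work. It uses word-level matching from Python's standard `difflib` module, which is my substitution and not necessarily what Scala Math does; the function name `best_match` is hypothetical.

```python
from difflib import SequenceMatcher

def best_match(student_response, model_responses):
    """Return (score, model_response) for the closest pre-written response.

    The score is a 0-100 integer. Word-level SequenceMatcher similarity is an
    assumption made for illustration, not the site's actual scoring method.
    """
    def score(a, b):
        ratio = SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()
        return round(ratio * 100)

    return max((score(student_response, model), model) for model in model_responses)

model_answers = [
    "add the common difference to the previous term",
    "multiply the previous term by two",
]
print(best_match("Add the common difference to the previous term", model_answers))
```

The returned pair corresponds to the two things shown to the student: the score and the suggested answer.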

How It Works: Try

After the Learn phase is the Try phase, which consists of example-problem pairs (usually; for a very few cases, so far, stepped-out problems only). Or, more specifically, stepped-out problems followed by not-stepped-out problems. These look a little different from what I typically see as example-problem pairs, where the example and the problem are set side by side. Here, the problem follows the example, and the example is not provided when solving the problem. The typical sequence is shown below.

For the Try and Test phases, it’s always multiple choice, although it’s in the plan to look at other response inputs. When students are logged in, they build up (not earn; see below) points for every question. Right now, it’s just 50 points for each, though that gets cut in half and rounded up to the nearest integer for every incorrect answer. For an item with 3 choices, the lowest point total possible is 13. For an item with 4 choices, the lowest is 7.
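
The halving rule above can be checked with a short sketch. The function names are hypothetical, and I am assuming a student can answer incorrectly at most once per wrong choice, so an item with n choices allows at most n − 1 incorrect answers.

```python
def try_points(incorrect_answers, start=50):
    """Points left on a Try item after halving (rounded up) once per incorrect answer."""
    points = start
    for _ in range(incorrect_answers):
        points = (points + 1) // 2  # integer halving, rounded up
    return points

def lowest_possible(num_choices, start=50):
    """Lowest total for an item, assuming every wrong choice is picked once."""
    return try_points(num_choices - 1, start)

print(lowest_possible(3))  # 50 -> 25 -> 13
print(lowest_possible(4))  # 50 -> 25 -> 13 -> 7
```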

On desktop, students can have the question read aloud via text-to-speech. As far as I know, that hasn’t yet come to mobile as a built-in feature, but I’ll keep my ears open for when it does.

How It Works: Test

Finally, there’s the Test phase. This is typically 4 to 6 questions that are of the same form as the ‘problems’ in the example-problem-pair Try phase. I’m just showing one such question in the video at the right.

When students are logged in, they can earn points by taking the test. The points are built up in both the Learn and Try phases. I have described how the points work for the Try phase above. The Learn phase is simpler: just clicking on a step builds up 100 points. At the moment, no points are tied to the Reflect question.

Once a student reaches the Test phase, the greatest number of points he or she can ‘bank’ is the number he or she has built up over the course of the Learn and Try phases. And the Test phase is fairly high stakes, in that each incorrect answer halves the total points still available to earn.

The stars shown on the score modal are awarded based on percent of total points earned. For the lesson shown in this post, the total that can be earned is 1700. So, approximately 560 points is 1 star (33%), 1130 points is 2 stars (66%), and 1360 points is 3 stars (80%).
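
Here is a minimal sketch of the banking and star rules described above. The function names are hypothetical, rounding up when halving in the Test phase is my assumption (the rounding is only stated for the Try phase), and the star thresholds are taken as exact percentages of the 1700-point total, so the point values quoted above are approximations of these cutoffs.

```python
def bankable_points(built_up, incorrect_test_answers):
    """Points still available to bank after each incorrect Test answer halves the total."""
    points = built_up
    for _ in range(incorrect_test_answers):
        points = (points + 1) // 2  # rounding up is an assumption
    return points

def stars(earned, total=1700):
    """Stars awarded by percent of total points earned: 33%, 66%, and 80% cutoffs."""
    for count, percent in ((3, 80), (2, 66), (1, 33)):
        if earned * 100 >= percent * total:  # integer comparison avoids float error
            return count
    return 0

for wrong in range(4):
    banked = bankable_points(1700, wrong)
    print(wrong, "incorrect:", banked, "points,", stars(banked), "stars")
```

Under these assumptions, a single incorrect Test answer already drops a student with a full 1700 points below the 2-star cutoff, which shows how high the stakes are.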

Finally, to make sure this product connects knowledgeable people with students (whether they be parents or teachers or both) and guards against mindlessly pressing buttons to earn points, there is a final front-and-back activity, wherein students solve a different problem by listing the steps themselves and showing all their work.

Post-Hoc Confidences

A smart defense of any argument for less teacher-directed instruction in mathematics classrooms is to point to the logical connectedness of mathematics as a body of knowledge and suggest that students are capable of crossing many if not all of the logical bridges between propositions themselves, or with minimal guidance.

Such connectedness–it can be suggested–makes mathematics somewhat different from other school subjects. For example, given a student’s conceptual understanding of a fraction as a part-to-whole ratio, which can include his or her ability to represent a fraction with a visual or physical model, it seems to follow logically that he or she can then add two fractions and get the correct sum, so long as the student knows (intuitively or more formally) that addition is about combining values linearly. It doesn’t matter how many prerequisites there are for adding fractions. The suggestion is that once those prerequisites have been met, it is a matter of merely crossing a logical bridge to adding fractions (mostly) correctly.

By way of contrast, a student can’t really induce what happened after, say, the bombing of Pearl Harbor. They have to be informed about it directly. The effects can certainly be narrowed down using common sense reasoning and other domain-specific knowledge. But, ultimately, what happened happened and there is no reason to suspect that, in general, students can make their way through a study of history mostly blindfolded, relying only on logic and common sense.

The example of history brings up an interesting point (to me, anyway) about the example of mathematics, though. Historical consequences from historical causes can be dubbed “inevitable” only after the fact. How can we be sure it is not the same when learning anything, including mathematics? Once you know, conceptually as it were, what adding fractions is, of course it seems to be a purely logical consequence of what fractions are fundamentally. But is this seeming inevitability available to the novice, the learner who is aware of what fractions are but hasn’t ever thought about adding them? It is with the average novice, after all, that this feeling of logical inevitability has to lie. It is not enough for educated adults to think of something as ‘logical’ after they already know it.

Bertrand Russell argues, in a 1907 essay, that even in mathematics we don’t proceed from premises to conclusions, but rather the other way around.

We tend to believe the premises because we can see that their consequences are true, instead of believing the consequences because we know the premises to be true. But the inferring of premises from consequences is the essence of induction [abduction]; thus the method in investigating the principles of mathematics is really an inductive method, and is substantially the same as the method of discovering in any other science.

So, how can we decide whether some bridge in reasoning is available to and crossable by the average novice? I hope it’s clear that we can’t just figure it out via anecdotes and armchair reasoning. Our intuitions can’t be trusted with this question. And our opinions one way or the other on the matter are not helpful, no matter what they are.

Providing Bad Intel

A really nice thing about scientific research is its transparency. Researchers write down the methods they use in their experiments—sometimes in excruciating detail—so that others can try to replicate their work if they choose. And scrutinizable methods allow us and other researchers to think about issues that the original experimenters might have overlooked—or, at least, didn’t mention in their published work.

Every once in a while we come across research which individuals themselves can simulate at home on a computer, even if they don’t have any participants, and this allows us to bring the experiment to life a little more than can be done with text descriptions.

The research I look at in this post is such a study. Students in the study (81 in all, from 7 to 10 years of age) were given an “app” very similar to the one shown below. Play with it a bit by clicking on the animal pictures to see what students were exposed to in this study.

The Method

In this study, students were presented with a question and then an explanation answering that question for the 12 animals shown above (images used in the study were different from above). Students rated the quality of explanations about animal biology on a 5-point scale. (In the version above, your ratings are not recorded. You can just click on the image of the rating system to move on.) The audio recorded in the app above uses the questions and explanations from the study verbatim, though in the actual study two different people speak the questions and explanations (above, it’s just me).

As you could no doubt tell if you played around with the app above, some of the explanations are laughably bad. Researchers designated these as circular explanations (e.g., How do colugos use their skin flaps to travel? Their skin flaps help them to move from one place to another). The other, better explanations were identified as mechanistic explanations (e.g., How do thorny dragons use the grooves between their thorns to help them drink water? Their grooves collect water and send the water to their mouths). After rating the explanation, students were then given a choice to either get more information about the animal or to move on to a different animal. Here again, all you get is a screen to click on, and any click takes you back to the main screen with the 12 animals. In the actual study, students were given an even more detailed mechanistic explanation when clicking to get more information (e.g., Thorny dragons have grooves between their thorns, which are able to collect water. The water is drawn from groove to groove until it reaches their mouths, so they can suck water from all over their bodies).

The Curious Case of Curiosity

What the researchers found was that, in general, students were significantly more likely to click to get more information on an animal when the explanation given was circular. And, importantly, students were more likely to click to get more information when they rated the explanation as poor. This behavior—of clicking to get more information—was operationalized as curiosity and can be explained using the deprivation theory of curiosity.

In everyday life, children sometimes receive weak explanations in response to their questions. But what do children do when they receive weak explanations? According to the deprivation theory of curiosity, if children think that an explanation is unsatisfying, then they should sometimes feel inclined to seek out a better answer to their question to bolster their knowledge; the same is not true for explanations appraised as high in quality. To our knowledge, our research is the first to investigate this theory in regards to children’s science learning, examining whether 7- to 10-year-olds are more likely to seek out additional information in response to weak explanation than informative ones in the domain of biology.

But is that really curiosity? Do I stimulate your curiosity about colugos’ skin flaps by not really answering your questions about them? We can more easily answer no to this question if we assume that Square 1 represents students’ wanting to know something about colugos’ skin flaps. In that case, the initial question stimulates curiosity, as it were, and the non-explanation simply fails to satisfy this curiosity, or initial desire for knowledge. The circular explanation has not made them curious or even more curious. They were already curious. Not helping them scratch that itch just fails to move them to Square 2, which is where they wanted to go after hearing the question (knowing something about how colugos’ skin flaps work). The fact that students with unscratched itches were more likely to go to Square 3 is not surprising, since Square 3, for them, was actually Square 2, the square that everyone wanted to get to.

An Unavoidable Byproduct of Quality Teaching

If you are more inclined to believe the above interpretation, as I am, it might seem that we still must contend with the evidence that quality explanations were indeed shown to reduce information-seeking, relative to the levels of information-seeking shown for circular explanations. But this is not necessarily the case. What we see, from this study at least, is that not scratching the initial itch likely caused a different behavior in students than did scratching it. A clicking behavior did increase for students who still had itches, but this does not mean that it decreased for students who had no itch. We have evidence here that bad explanations are recognizably bad. We do not have evidence suggesting that quality explanations make students incurious.

Even if it were the case that quality explanations reduce curiosity, though, it seems likely to me that this would simply be an unavoidable byproduct of quality teaching, one that can be anticipated and planned for. Explanations are, after all, designed to reduce curiosity, in some sense. What high quality explanations do (in every scientific field and likely in our everyday lives) is move us on to different, better things to be curious about.


Thinking About and Thinking With

I have a tendency, when writing blog posts, to leave important things unsaid. So, let me fix that up front before I forget. What I wanted to say here was that, in my view, learning doesn’t happen unless we tick off all three boxes: encoding, consolidation, and retrieval.

It’s not that learning gets better or stronger when more of those boxes are ticked off. Learning isn’t possible—to some degree of certainty—in the first place without all three. And it’s not the case that focusing on just retrieval instead of just encoding or just consolidation represents some kind of revolution in pedagogical thinking. You’re simply ignoring one or two vital components of learning when before you were ignoring one or two other ones. Learning can still happen even when we don’t think about one or two (or all three) of the above components, but then it’s haphazard, random, implicit, and/or incidental. (In that case, learning comes down to mental horsepower and genes rather than processes over which we have some control.) All three components still must be addressed for learning to occur; it’s just that we can decide to not be in control of one or all of them (to students’ detriment).

But even then we’re not done. We’ve covered the components of the process of learning, but all three of those components intersect with another dimension of learning, which describes the products of learning: thinking about and thinking with.

Thinking With

In the previous post linked above, the examples of slope could all be categorized as “thinking about” slope. Put too simply, encoding the concept of slope means absorbing information about slope, retrieving knowledge about slope means remembering the slope concept and saying your knowledge out loud or writing it down, and consolidating knowledge about slope means practicing, such that what is encoded stays encoded and what is known can be retrieved.

All of this (encoding, consolidation, and retrieval) must happen with “thinking with” as well as with “thinking about.” Encoding–Thinking With, for example, would involve absorbing information about how slope can be applied to do other things, whether mathematically or in the real world. Common examples include designing wheelchair ramps (which have ADA-recommended height-length ratios of 1 : 12), measuring and comparing the steepnesses of things, and determining whether two lines are parallel, perpendicular, or neither. Consolidating–Thinking With would involve practice with that encoded knowledge; solving word problems is a typical example. Finally, Retrieving–Thinking With would involve remembering that encoded knowledge, particularly after some time has passed, say by using slope to solve a programming problem or a problem on a test.

All six boxes have to be checked off for learning to occur (such that it is within our control).

Teach Thinking With

In education, we have difficulties—again, in my view—with Encoding–Thinking With, and Consolidating–Thinking With. As far as these two are concerned, it is rare in my experience to see guidance and practice on a wide variety of different problems involving thinking with (for example) slope to answer questions that aren’t about slope. We tend to think that all we should do is give students a bunch of think-about slope facts and then hope for them to magically retrieve and apply those to thinking-with situations. We misunderstand transfer as some kind of conjuring out of thin air, so that’s what we give students—thin air. Then we stand back and hope to see some conjuring. When this doesn’t produce results that we’d like, we—for some unimaginably stupid reason—blame knowing facts for the problem, and instead of supplementing that knowledge with thinking-with teaching, we swap them. Which is much worse than what it tries to replace.

Instead, it is necessary to teach students how to think with slope and many other mathematical concepts, and to provide them with practice in thinking with these concepts. Thinking with is as much knowledge as thinking about is.

One of my favorite examples of thinking with slope—and one which I have, admittedly, not yet written a lesson about—has to do with drawing convex or concave quadrilaterals. Given 4 coordinate pairs for points that can form a convex or concave quadrilateral (no three points lie on the same line, etc.), how can I decide, somewhat algorithmically, on the order in which I should connect the points, such that the line segments I actually draw do create a quadrilateral and not the image on the far right?

One way to go about it is to first select the leftmost point—the point with the lowest x-coordinate (there could be two, with equal x-coordinates, but I’ll leave that to the reader to figure out). Then calculate the slope of each line connecting the leftmost point with each other point. The order in which the points can be connected is the order of the slopes from least to greatest. This process would create a different proper quadrilateral than the one shown in the middle above.
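
The procedure above can be sketched in Python as follows. The function name `connection_order` is hypothetical, and the sketch assumes integer coordinates, a single leftmost point, and no three points on the same line; the case of two points sharing the lowest x-coordinate is still left to the reader, and here it would raise a division-by-zero error.

```python
from fractions import Fraction

def connection_order(points):
    """Order points so that connecting them in sequence, and closing the loop,
    draws a quadrilateral whose sides do not cross.

    Assumes integer coordinates, a unique leftmost point, and no three
    collinear points.
    """
    leftmost = min(points)  # tuples compare by x-coordinate first
    others = [p for p in points if p != leftmost]

    def slope(p):
        # Exact slope of the line from the leftmost point to p
        return Fraction(p[1] - leftmost[1], p[0] - leftmost[0])

    # Connect the remaining points in order of slope, least to greatest
    return [leftmost] + sorted(others, key=slope)

print(connection_order([(3, 1), (0, 0), (1, 3), (2, -1)]))
# [(0, 0), (2, -1), (3, 1), (1, 3)]
```

In the example, the slopes from (0, 0) are −1/2, 1/3, and 3, so the points are visited in a single sweep from the lowest line to the highest, which is what keeps the sides from crossing.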

Checking Off All the Boxes

Students’ minds are not magical. They don’t turn raw facts magically into applied understanding (the extreme traditionalist view), and they don’t magically vacuum up knowledge hidden in applied contexts (the extreme constructivist view). Put more accurately, this kind of magic does happen (which is why we believe in it), but it happens outside of our control, as a result of genetic and socioeconomic differences, so we can take no credit for it.

Importantly, ignoring components of students’ learning, for whatever reason, subjects them to a roll of the dice. Those students who start behind stay behind, and those who are underserved stay so. We seem to have enough leftover energy to try our hand at amateur psychology and social-emotional learning. Why not take a fraction of that energy and channel it into, you know, plain ol’ teaching?

Learn: You Keep Using That Word

Sometimes kids say “nothing” when their parents ask them what they learned in school today. And, although that response is something we don’t want to hear, it is probably closer to the truth than we want to believe, because, as we all most certainly know, learning doesn’t really happen in a single class period. And, when it does, it’s not learning per se, but encoding, consolidation, or retrieval—or some mixture of the three.

Encoding

The encoding stage involves introducing you to some knowledge pattern in the natural, social, or academic environment. For example, you may know ratios and rates, how to graph lines on the coordinate plane, and what steepness is, but at some point you are completely new to the concept of slope—which packages those former concepts into a unique bundle—so encoding is what happens when you are first introduced to slope.

There are a few important things to note here. First, slope could have been introduced, or encoded, as an isolated dot. (Well, not exactly. Nothing is ever completely “isolated.” But you get the idea.) Second, regardless of whether it is encoded as a standalone concept or as a package of concepts, slope is a new object of knowledge. It is perhaps possible now for the slope blob above to interact or connect with the green blob of content knowledge, whereas none of the individual items can do so. And, third, whatever we mean by slope above, we cannot mean the entire concept of slope (whatever that means anyway).

Consolidation

The new concept of slope on the right is a little too complete to represent encoding, plus any structure created there fades quickly over time like pictures in Back to the Future (forgetting). This is where consolidation comes in. Consolidation solidifies and maintains the arrangements of knowledge components assembled by encoding.

Generally, consolidation is associated with simple practice—i.e., practicing the concept you have encoded rather than extending or altering the concept in any way. But it is as true to say that you are learning slope via simple practice as it is to say that you are doing so by encoding the concept in an introductory lesson.

Retrieval

Finally, there is retrieval, which is the process of reconstructing an encoded concept from memory in response to a natural or artificial stimulus. What is the slope of a horizontal line? The answer to this question requires triggering the slope concept, where the answer may be directly stored, or you may have to drill down into the slope package above—into the ratios and rates concepts—to figure out that the slope of a horizontal line is a 0 rise over some nonzero run, so the answer is 0. Or, the fact that the slope of a horizontal line is 0 can be stored together with the concept package shown above, giving you two ways to figure out the answer.

Why should retrieving a concept to answer questions be considered a part of learning that concept? Because, at minimum, retrieving strengthens an encoded concept.

Explicitation

I came across this case study recently that I managed to like a little. It focuses on an analysis of a Singapore teacher’s practice of making things explicit in his classroom. Specifically, the paper outlines three ways the teacher engages in explicitation (as the authors call it): (1) making ideas in the base materials (i.e., textbook) explicit in the lesson plan, (2) making ideas within the plan of the unit more explicit, and (3) making ideas explicit in the enactment of teaching the unit(s). These parts are shown in the diagram below, which I have redrawn, with minor modifications, from the paper.

The teacher interviewed for this case study, “Teck Kim,” taught math to Year 11 (10th grade) students in the “Normal (Academic)” track, and the work focus of the case study was on a unit the teacher called “Vectors in Two Dimensions.”

Explicit From

The first category of explicitation, Explicit From, involves using base materials such as a textbook as a starting point and adapting these materials to make more explicit what it is the teacher wants students to learn. The paper provides an illustration of some of the textbook content related to explaining column vectors, along with Kim’s adaptation. I have again redrawn below what was provided in the paper. Here I also made minor modifications to the layout of the textbook example and one small change to fix a possible translation error (or typo) in the teacher’s example. The textbook content is on the left, and the teacher’s is on the right (if it wasn’t painfully obvious).

There are many interesting things to notice about the teacher’s adaptation. Most obviously, it is much simpler than the textbook’s explanation. This is due, in part, to the adaptation’s leaving magnitude unexplained during the presentation and instead asking a leading question about it.

The textbook presented the process of calculating the magnitudes of the given vectors, leading to a ‘formula’ of \(\mathtt{\sqrt{x^2+y^2}}\) for column vector \(\begin{pmatrix} \mathtt{x} \\ \mathtt{y} \end{pmatrix}\). In its place, Teck Kim’s notes appeared to compress all these into one question: “How would you calculate the magnitude?” On the surface, it appears that Teck Kim was less explicit than the textbook in the computational process of magnitude. But a careful examination into the pre-module interview reveals that the compression of this section into a question was deliberate . . . He meant to use the question to trigger students’ initial thoughts on the manner, which would then serve to ready their frame of mind when the teacher explains the procedure in class.

So, it is not the case that explanation has been removed—only that the teacher has moved the explication of vector magnitude into the Explicit To section of the process. We can also notice, then, in this Explicit From phase, that the teacher makes use of both dual coding and variation theory in his compression of the to-be-explained material. The text in the teacher’s work is placed directly next to the diagram as labels to describe the meaning of each component of the vector, and the vector that students are to draw varies minimally from the one demonstrated: a change in sign is the only difference, allowing students to see how negative components change the direction of a vector. All much more efficient and effective than the textbook’s try at the same material.
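
To make the magnitude ‘formula’ and the sign variation concrete, here is a small sketch. The vectors (3, 4) and (3, −4) are my own illustration and are not taken from the paper or from Teck Kim’s notes; the function names are hypothetical.

```python
import math

def magnitude(x, y):
    """Magnitude of the column vector (x, y): the square root of x^2 + y^2."""
    return math.hypot(x, y)

def direction_degrees(x, y):
    """Angle of the vector, in degrees, measured from the positive x-axis."""
    return math.degrees(math.atan2(y, x))

v = (3, 4)    # illustrative vector, not from the paper
w = (3, -4)   # same vector with the sign of one component changed

print(magnitude(*v), magnitude(*w))  # 5.0 5.0
print(direction_degrees(*v), direction_degrees(*w))
```

Changing the sign of a component leaves the magnitude alone and changes only the direction, which is the contrast the minimally varied drawing task seems designed to bring out.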

Explicit Within

Intriguingly, Explicit Within is harder to explain than the other two, but is closer to the work I do every day. A quote from the article nicely describes explicitation within the teacher’s own lesson plan as an “inter-unit implicit-to-explicit strategy”:

This inter-unit implicit-to-explicit strategy reveals a level of sophistication in the crafting of instructional materials that we had not previously studied. The common anecdotal portrayal of Singapore mathematics teachers’ use of materials is one of numerous similar routine exercise items for students to repetitively practise the same skill to gain fluency. In the case of Teck Kim’s notes, it was not pure repetitive practice that was in play; rather, students were given the opportunity to revisit similar tasks and representations but with added richness of perspective each time.

We saw a very small example of explicit-within above as well. The plan, following the textbook, would have delayed the introduction of negative components of vectors, but Teck Kim introduces them early, as a variational difference. The idea is not necessarily that students should know this cold from the beginning, but that it serves a useful instructional purpose even before it is consolidated.

Explicit To

Finally, there is Explicit To, which refers to the classroom implementation of explicitation, and which needs no lengthy description. I’ll leave you with a quote again from the paper.

No matter how well the instructional materials were designed, Teck Kim recognised the limitations to the extent in which the notes by itself can help make things explicit to the students. The explicitation strategy must go beyond the contents contained in the notes. In particular, he used the notes as a springboard to connect to further examples and explanations he would provide during in-class instruction. He drew students’ attention to questions spelt out in the notes, created opportunities for students to formulate initial thoughts and used these preparatory moves to link to the explicit content he subsequently covered in class.

Almost Variation with Inequalities

I’ve started thinking about Modules 0 for Grade 6. And I’ve written my first sequence for inequalities, which I’ll show below. Although I tried to design the sequence using ideas from variation theory, I found that the specific goal I had for this sequence (writing inequalities of the form x < c and c < x from number line models) did not make it easy to think of a boatload of questions I could ask, each slightly different from the previous one. Plus, I had some slightly more robust instructional goals in mind. Still, I found that it paid off to even just try thinking about variation.

So, I start with the video below, which serves as the first (and only) instructional worked example in the sequence.


I use the Silent Teacher method, wherein I essentially show the worked example twice, the second time with my voice annotating what I’m seeing, doing, and thinking as I write the inequality to represent the two models. In the lesson, I include a brief reminder to students above the video of what the inequality symbols mean and what the equals sign means.

My assumptions with regard to this content are that students have seen and used inequality symbols for a long time before they get to Grade 6, though primarily with positive numbers and not variables or negatives. So, this represents a kind of “start-again” topic, which is one reason why I include the block models along with the number line model. It is a compromise between extending the concept and reviewing it: so I do a bit of both.

Another reason I include the block models is that they make a solid, albeit abstract, connection to the use of inequalities with algebraic expressions to express relative values in situations where we don’t know one of the values. We know that q above represents a number greater than x, but we can’t mark q on the number line because we don’t know its exact value. This is what the thinking question below the video is hopefully getting at. It’s numbered in case an instructor wants to assign the sequence to a student.

The Sequence

After the video, there is a sequence of a mere 8 questions. The first of these, shown at the right, is not a typical “Your Turn” type of question, where the student tries out a technique on a very similar problem. Here we unpack the other ways to express the inequalities shown in the video (it’s important to constantly make the point that there are almost always a few different ways of looking at mathematical relationships), and we include the equation, in part because research tells us that comparing the equals sign with other relational operators reinforces the correct relational view of the equals sign.

Next up is a more typical Your Turn, with a block model and number line model both closely mirroring the models shown in the video.

Students can write n or 1 to represent the single block (or the point labeled with both n and 1 on the number line). Doing so helpfully reinforces a slightly better meaning of “variable,” which is a letter that represents any quantity, known or unknown.

And here, for the first time (in a thinking question), I ask students to relate the number line model to the blocks model.

The next question in the sequence is an example of some minimal variation. What’s different here is that the m and n block towers switch sides in the illustration, and the inequality model on the number line shifts to the right. Everything else stays the way it was.

We could continue in this way, adding or subtracting blocks, switching sides, etc., but this kind of model has limitations that don’t allow for examining more of the variation space. But we can hint at the fact that adding the same number to both sides of an inequality doesn’t change the direction of the inequality.

And that’s what we do in the next exercise in the sequence. Here also, the known number is moved along the number line. The thinking question I ask here is:

Would adding 1 block to each tower change the direction of the inequality? Why or why not?

I phrase the question as a hypothetical because, strictly speaking, it’s not evident from the diagram that I added exactly 1 block to tower m.
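For anyone who wants the reasoning behind the answer spelled out, here is one way to sketch it in symbols. I am assuming, just for illustration, that m is the smaller tower, and I use a general number of blocks k where the exercise uses 1; neither choice comes from the diagram itself.

\(\mathtt{m < n \quad\rightarrow\quad n - m > 0}\)

\(\mathtt{(n + k) - (m + k) = n - m > 0 \quad\rightarrow\quad m + k < n + k}\)

The gap between the towers is the same before and after, so the taller tower stays taller and the inequality keeps its direction.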

And Now for a Big Change

Now we see how this isn’t really a sequence of minimal variation. One reason for the change-up is that I realized too late that the model I started with could only show the greater quantity as the unknown quantity. I thought about changing to a different model, one which could show the full range of variation, but I couldn’t think of a situation that worked.

This example, in which the larger quantity (the greater height) is the known, was too good to pass up. And it gave me a context to foreshadow subtracting the same number from both sides of an inequality, which is what (kind of) happens in the next exercise.

Here, though—and again—it was not plausible to hit this balance of operations idea directly (plus, it’s outside of the scope anyway). We only hint at it. But we still ask the thinking question—again, as a hypothetical—about whether subtracting the same value from both quantities changes the direction of the inequality.

The height examples, and perhaps all of the items in the sequence, lie somewhere between minimal variation and maximal variation. At some point while designing it, I had to stop searching for more perfect examples and just run with it.

The final two items in the sequence present two more (more or less abstract) situations where inequalities seem to fit.

The first, shown at the right, is the “swarm,” which contains too many items to count, though we can know for sure that the number is a greater value than 6. Here too is an example situation that better fits with the idea of a larger unknown that couldn’t be handled by the earlier block models.

In this example, I’ve switched up the labels on the number line for a small taste of minimal variation within all the macro variation going on.

Finally, there’s temperature and a quick example showing negative numbers.

What we get at here, also, is that we haven’t left the universe of comparing numbers just because we’re introducing a little algebra. Plus, I’ve eliminated the number line model here, just for a little flavor—and it’s too close in appearance to the thermometer levels. I didn’t want that confusion creeping in.

Sicklied O’er

My grandfather used to tell me a story about a young boy who was stuck in traffic with his family for hours because an 18-wheeler had got itself pinned under an overpass bridge ahead of them. The huge truck was wedged in so strongly and strangely that a flock of engineers had descended on the scene. They argued back and forth about their favorite physical and mathematical models that would unpin the trapped vehicle and release the miles-long stream of cars idling behind it on the freeway. This bickering went on for hours—until the boy got out of his car, walked up to the group of engineers, and shouted, “Why don’t you just let the air out the tires!”

It’s a nice story, precisely because it’s so rare and noticeable. We don’t notice unbroken strings of solved problems from experts, because that’s what we expect of experts and, for the most part, what we get from them. We notice when they fail. And, because these failures are more noticeable than the far more boring and numerous successes, we fall prey to availability bias, and assume that expert failure occurs with much more regularity than it actually does. (In turn, we start to think that it’s maybe a good idea to keep students naive and, therefore, creative and open-minded rather than have them study things that other people have already figured out.) As Tom Nichols writes in The Death of Expertise:

At the root of all this is an inability among laypeople to understand that experts being wrong on occasion about certain issues is not the same thing as experts being wrong consistently on everything. The fact of the matter is that experts are more often right than wrong, especially on essential matters of fact. And yet the public constantly searches for the loopholes in expert knowledge that will allow them to disregard all expert advice they don’t like.

A 2008 study which put this folk notion of expert inflexibility to the test compared chess experts and novices, and measured the famous Einstellung effect in both groups across three experiments.

In the first experiment, the experts were given the board on the left and were instructed to find the shortest solution. The board on the left is designed to activate a motif familiar to chess experts (and thus activate Einstellung)—the smothered mate motif—which can be carried out using 5 moves. A shorter solution (3 moves) also exists, however.

If the experts failed to find the three-move solution, they were then given the board on the right. This board can be solved by the shorter three-move solution but not by the Einstellung motif of the smothered mate. The group of novices in the experiment were all given this second board (the one on the right) featuring the three-move mate solution without the Einstellung motif as well.

Findings

If knowledge corrupts insight, as it were, then the experts would, by and large, be fixated by the smothered mate sequence and miss the three-move solution. And this is indeed what happened—sort of. What the researchers found was that level of expertise correlated strongly with the results. Grandmasters (those with the highest levels of chess expertise) were not taken in by the Einstellung motif at all. Every one of them found the optimal three-move solution. However, experts with lower ratings, such as International Masters, Masters, and Candidate Masters, all experienced the Einstellung effect, with 50%, 18%, and 0%, respectively, finding the shorter solution on the first board, even though all of them found the optimal solution when it was presented on the second board, in the absence of the smothered mate motif.

The novices’ performance showed a positive correlation with rating also. Sixty-three percent of the highest rated (Class A) players in the novices group found the optimal solution on the right board, while 13% of Class B players and 0% of Class C players found the three-move solution. Thus, the Einstellung effect made International Masters perform like Class A players, Masters perform like Class B players, and Candidate Masters perform like Class C players.

Experiment 2 replicated the above finding in a slightly more naturalistic setting, and Experiment 3 did so with strategic Einstellungs instead of tactical ones.

Knowledge Is Essential for Cognitive Flexibility

While this study shows that Einstellung effects are powerful and observable in expert performance, it also demonstrates that the notion that expertise causes cognitive inflexibility is probably wrong.

The failure of the ordinary experts to find a better solution when they had already found a good one supports the view that experts can be vulnerable to inflexible thought patterns. But the performance of the super experts shows that ‘experts are inflexible’ would be the wrong conclusion to draw from this failure. The Einstellung effect is very powerful—the problem solving capability of our ordinary experts was reduced by about three SDs when a well-known solution was apparent to them. But the super experts, at least with the range of difficulty of problems used here, were less susceptible to the effect. Greater expertise led to greater flexibility, not less.

Knowledge, and the expertise inevitably linked to it, were also responsible for both forms of expert flexibility demonstrated in the experiments. The optimal solution was more likely to be noticed immediately, even before the nominally more familiar solution, among some super experts. Hence, expertise helped super experts avoid an Einstellung situation in the first place because they immediately found the optimal solution. Even when experts did not find the optimal solution immediately, expertise and knowledge were positively associated with the probability of finding the optimal solution after the non-optimal solution had been generated first. Finally, when knowledge discrepancy was minimized, as in the third experiment, super experts had sufficient resources to outperform their slightly weaker colleagues. In all three instances, knowledge was inextricably and positively related to expert flexibility. . . .

The training required to produce experts should not be seen as a source of potential problems but as a way to acquire the skill to deal effectively and flexibly with all the situations that can arise in the domain. Creativity is a consequence of expertise rather than expertise being a hindrance to creativity. To produce something novel and useful it is necessary first to master the previous knowledge in the domain. More knowledge empowers creativity rather than hurting it (e.g., Kulkarni & Simon, 1988; Simonton, 1997; Weisberg, 1993, 1999).

Makin’ Copies

At the heart of many calls to improve education is the taken-for-granted notion that because the world is now changing so rapidly, it is better for schools to focus on producing innovative and critical thinkers and ‘not just’ knowledgable students. The common instructional approach deployed, at all scales, to produce this effect—whether it is inquiry learning or personalized learning—is to remove or dramatically lessen the influence of knowledgable others.

But important research on learning strategies in the wild shows that, at the very least, different intuitions are possible here. Researchers discovered—much to their surprise—that, in a rapidly changing environment, copying the effective behaviors of knowledgable others (social learning) could be a much more effective learning strategy than learning directly from the environment (asocial learning). This result held even when social learning was “noisy” and asocial learning was noise free.

The team has gone on to further investigate and apply their findings to other animal studies, and a book, Darwin’s Unfinished Symphony, was released just last year, detailing their work.

Social Learning Strategies Tournament

The method used for this research was a tournament in which the researchers designed a computer simulation environment and entrants to the tournament (104 in all) designed ‘agents’ that competed to survive in the generated environment by learning behaviors and applying them to receive payoffs for those behaviors. Each agent had three possible moves it could play: Observe, Innovate, or Exploit. The first two of these moves—Observe and Innovate—were learning moves, which allowed the agent to acquire new behaviors (or not in some cases), and the third move, Exploit, allowed agents to apply their acquired behaviors to receive a payoff (or not, depending on the environment and the behavior). As was mentioned above, Observe moves were “noisy,” whereas Innovate moves were noise free:

Innovate represented asocial learning, that is, individual learning stemming solely through direct interaction with the environment, for example, through trial and error. An Innovate move always returned accurate information about the payoff of a randomly selected behavior previously unknown to the agent. Observe represented any form of social learning or copying through which an agent could acquire a behavior performed by another individual, whether by observation of or interaction with that individual. An Observe move returned noisy information about the behavior and payoff currently being demonstrated in the population by one or more other agents playing Exploit. Playing Observe could return no behavior if none was demonstrated or if a behavior that was already in the agent’s repertoire is observed and always occurred with error, such that the wrong behavior or wrong payoff could be acquired. The probabilities of these errors occurring and the number of agents observed were parameters we varied.
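To make the three moves a little more concrete, here is a minimal sketch of how they might be modeled. This is not the tournament's actual code: the function names, the error parameters `p_wrong` and `payoff_sd`, and the choice to have Exploit always play the behavior the agent believes is best are all my assumptions, and the sketch leaves out the changing environment entirely.

```python
import random

def innovate(payoffs, repertoire, rng):
    """Asocial learning: learn one random unknown behavior with its exact payoff."""
    unknown = [b for b in range(len(payoffs)) if b not in repertoire]
    if unknown:
        b = rng.choice(unknown)
        repertoire[b] = payoffs[b]

def observe(payoffs, repertoire, demonstrated, rng, p_wrong=0.1, payoff_sd=1.0):
    """Social learning: noisily copy a behavior that another agent is exploiting."""
    if not demonstrated:
        return  # nothing is being demonstrated, so nothing is learned
    b = rng.choice(demonstrated)
    if rng.random() < p_wrong:
        b = rng.randrange(len(payoffs))  # copying error: wrong behavior acquired
    if b in repertoire:
        return  # behavior already known, so nothing new is learned
    # Copying error: the payoff estimate is the true payoff plus noise.
    repertoire[b] = payoffs[b] + rng.gauss(0, payoff_sd)

def exploit(payoffs, repertoire):
    """Play the behavior believed to be best; return (behavior, true payoff)."""
    if not repertoire:
        return None, 0.0
    b = max(repertoire, key=repertoire.get)
    return b, payoffs[b]
```

An agent in this sketch is just a dictionary mapping known behaviors to estimated payoffs, and a strategy is a rule for choosing which of the three functions to call on each turn.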

Some Key Findings

It was not effective to play a lot of learning moves. But when learning moves were played, agents which relied almost exclusively on Observe outperformed the rest, and an increase in copying was strongly positively correlated with higher payoffs. When the winning agent (called DISCOUNTMACHINE) was modified to learn only through Innovate moves, it placed last.

Even when learning by copying was made noisier—the probability and size of copying errors increased—agents which relied on it heavily still did best.

Finally, agents that combined asocial and social learning in more balanced ways performed worse than those that opted for social learning most of the time (the winning agents used social learning at least 95% of the time).

Why Copying Is Effective

It must be underscored, again, that, in more naturalistic environments, there is a cost to asocial learning that copying does not have. Learning by observation is safer than learning by interacting directly with the environment alone. But in this simulation, that cost was erased. And social learning (copying) STILL outperformed innovation, even when social learning was noisy (Observe “failed to introduce new behavior into an agent’s repertoire in 53% of all the Observe moves in the first tournament phase, overwhelmingly because agents observed behaviors they already knew”).

So, why was copying effective? The researchers boiled it down to being surrounded by rational agents, which I choose to rephrase as “knowledgable adults”:

Social learning proved advantageous because other agents were rational in demonstrating the behavior in their repertoire with the highest payoff, thereby making adaptive information available for others to copy. This is confirmed by modified simulations wherein social learners could not benefit from this filtering process and in which social learning performed poorly. Under any random payoff distribution, if one observes an agent using the best of several behaviors that it knows about, then the expected payoff of this behavior is much higher than the average payoff of all behaviors, which is the expected return for innovating. Previous theory has proposed that individuals should critically evaluate which form of learning to adopt in order to ensure that social learning is only used adaptively, but a conclusion from our tournament is that this may not be necessary. Provided the copied individuals themselves have selected the best behavior to perform from at least two possible options, social learning will be adaptive.
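The filtering argument in that quote is easy to check with a toy simulation. The sketch below is my own, not the researchers' model: I assume payoffs are drawn uniformly between 0 and 1, that copying is error free, and that the demonstrator knows `k` behaviors and always shows its best one.

```python
import random

def mean_payoffs(k=2, trials=100_000, seed=0):
    """Compare innovating (one random behavior) with copying a demonstrator's best of k."""
    rng = random.Random(seed)
    innovate_total = 0.0
    copy_total = 0.0
    for _ in range(trials):
        # Innovating returns the payoff of a single randomly drawn behavior.
        innovate_total += rng.random()
        # Copying returns the best of the k behaviors the demonstrator knows.
        copy_total += max(rng.random() for _ in range(k))
    return innovate_total / trials, copy_total / trials

innovate_mean, copy_mean = mean_payoffs(k=2)
print(innovate_mean, copy_mean)
```

For uniform payoffs the expected values are 1/2 for innovating and k/(k + 1) for copying, so even a demonstrator choosing between only two options gives the copier an expected payoff of about 2/3. That matches the quote's point that "at least two possible options" is enough.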

Any takeaways for education from this will be stretches. The research was a computer simulation, after all. But, whatever. My takeaway from all this is that, as long as there are knowledgable adults around, we should encourage students to learn directly from them. A milder takeaway (or maybe stronger, depending on your point of view): regardless of how adept you feel yourself to be in your social world, social worlds are not intuitive. What seems to make sense to you as a strong connection between ideas A and B (in this case, changing world → promote innovation) will not necessarily be effective just because a lot of people believe it and it makes intuitive sense. The way to change that is not to stop making those arguments, because few people do. The way to change it is to stop forwarding those kinds of arguments along when they are made. That way, the behavior won’t be copied. : )

Coda

I should add, by way of the quote below from Darwin’s Unfinished Symphony, that, although copying was a more successful strategy than innovating, it was not, by itself, the reason for success. What made the difference was better, more efficient, more accurate copying behaviors:

The tournament teaches us that natural selection will tend to favor those individuals who exhibit more efficient, more strategic, and higher-fidelity (i.e., more accurate) copying over others who either display less efficient or exact copying, or are reliant on asocial learning.