Inverses and Transposes

This will now be my 22nd post on linear algebra, and I hope it’ll be noticeable, looking at all of them together so far, that we haven’t talked about systems of equations. And there’s a good reason for that: because they suck you down an ugly hole of mindless calculation, meaning-challenged tedium, and fruitless, pointless backward thinking. Systems are awesome and important, and I’m sure someone can make a strong argument for introducing them early, but, for me, they can wait.

The inverses of matrices are pretty interesting. The inverse of a 2 × 2 matrix is the matrix that, when multiplied by the original matrix, gives the identity matrix as the product. The inverse is given by the middle matrix below: \[\begin{bmatrix}\mathtt{a} & \mathtt{b}\\\mathtt{c} & \mathtt{d}\end{bmatrix}\begin{bmatrix}\mathtt{\,\,\,\,\frac{d}{ad-bc}} & \mathtt{-\frac{b}{ad-bc}}\\\mathtt{-\frac{c}{ad-bc}} & \mathtt{\,\,\,\,\frac{a}{ad-bc}}\end{bmatrix}\mathtt{=}\begin{bmatrix}\mathtt{1} & \mathtt{0}\\\mathtt{0} & \mathtt{1}\end{bmatrix}\]

There is an easy-to-follow derivation of this formula here if you’re interested—one that requires only some high school algebra.
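The formula is also easy to check numerically. Here’s a minimal sketch, assuming NumPy is available (the matrix entries are arbitrary examples):

```python
import numpy as np

def inverse_2x2(a, b, c, d):
    """Invert [[a, b], [c, d]] using the 2 x 2 formula above."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("ad - bc = 0, so the matrix has no inverse")
    return np.array([[d, -b], [-c, a]]) / det

A = np.array([[2.0, 1.0],
              [5.0, 3.0]])
A_inv = inverse_2x2(2, 1, 5, 3)

print(A @ A_inv)  # the 2 x 2 identity matrix
```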

Given this setup, you can notice that the identity matrix is its own inverse. Are there any other matrices that fit this description? One thought: we need \(\mathtt{ad-bc}\) to be equal to 1, and we also want \(\mathtt{a=d}\). The values on the other diagonal, \(\mathtt{b}\) and \(\mathtt{c}\), should both be 0, since each has to be equal to its own opposite.

In that case, \(\begin{bmatrix}\mathtt{-1} & \mathtt{\,\,\,\,0}\\\mathtt{\,\,\,\,0} & \mathtt{-1}\end{bmatrix}\) would be its own inverse too. What does that mean?

For the identity matrix, it means that if we apply the do-nothing transformation to the home-base matrix, we stay on home base, with the “x” vector pointed to (1, 0) and the “y” vector pointed to (0, 1). The matrix above represents a reflection across the origin. So, applying a reflection across the origin to it should return us to home base.

Wouldn’t a reflection across the x-axis be an inverse of itself, then? Yes! The criteria above for an inverse need to be amended a little to allow for negatives effectively canceling each other out to make positives. \[\begin{bmatrix}\mathtt{1} & \mathtt{\,\,\,\,0}\\\mathtt{0} & \mathtt{-1}\end{bmatrix}\begin{bmatrix}\mathtt{1} & \mathtt{\,\,\,\,0}\\\mathtt{0} & \mathtt{-1}\end{bmatrix}\mathtt{=}\begin{bmatrix}\mathtt{1} & \mathtt{0}\\\mathtt{0} & \mathtt{1}\end{bmatrix}\]

The inverse of a matrix is represented with a superscript \(\mathtt{-1}\). So, the inverse of the matrix \(\mathtt{A}\) is written as \(\mathtt{A^{-1}}\).

The Transpose

The transpose of a matrix is the matrix you get when you take the rows of the matrix and turn them into columns instead. The transpose of a matrix is represented with a superscript \(\mathtt{T}\). So: \[\begin{bmatrix}\mathtt{\,\,\,\,1} & \mathtt{3}\\\mathtt{-2} & \mathtt{4}\end{bmatrix}^{\mathtt{T}}\mathtt{=}\begin{bmatrix}\mathtt{1} & \mathtt{-2}\\\mathtt{3} & \mathtt{\,\,\,\,4}\end{bmatrix}\]

If a 2 × 2 matrix \(\mathtt{V}\) is orthogonal—meaning that its column vectors are perpendicular and each column vector has a length of 1—then its transpose is the same as its inverse, or \(\mathtt{V^{-1}=V^T}\).
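A rotation matrix makes a handy check of this fact, since its columns are always perpendicular unit vectors. A quick sketch, assuming NumPy (the angle is arbitrary):

```python
import numpy as np

theta = 0.7  # any angle
V = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# For an orthogonal matrix, the transpose undoes the transformation,
# which is exactly what the inverse does.
print(np.allclose(V.T @ V, np.eye(2)))     # True
print(np.allclose(V.T, np.linalg.inv(V)))  # True
```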

Eigenvectors for Reflections

Having just finished a post on eigenvalue decomposition, involving eigenvalues and eigenvectors, I was flipping through this nifty little textbook, where I saw a description of geometric reflections using eigenvectors, and I thought, “Oh my gosh, of course!”

We have seen reflections here, yet back then we had to find something called the foot of a point to figure out the reflection. But we can construct a reflection matrix (in essence, a scaling matrix in the right basis) using only knowledge about eigenvectors.

When we talked about eigenvectors before—those vectors which do not change direction under a transformation (except if they reverse direction)—we were looking for them. But in a reflection, we already know them. In any reflection across a line, we already know that the vector that matches the line of reflection will not change direction, and the vector perpendicular to the line of reflection will only be scaled by \(\mathtt{-1}\). So, both the vector which describes the line of reflection and the vector perpendicular to that line are eigenvectors.

So let’s say we want to reflect point \(\mathtt{C}\) across the line described by the vector \(\mathtt{\alpha(2, -1)}\).

Our eigenvector matrix will be \[\begin{bmatrix}\mathtt{\,\,\,\,2}&\mathtt{-1}\\\mathtt{-1}&\mathtt{-2}\end{bmatrix}\] Here we see the second eigenvector, but we can simply use the vector perpendicular to the first if we don’t know the second one.

Our eigenvalue matrix will be \[\begin{bmatrix}\mathtt{1}&\mathtt{\,\,\,\,0}\\\mathtt{0}&\mathtt{-1}\end{bmatrix}\] since we will keep the first eigenvector—the line of reflection—fixed (eigenvalue of 1) and just flip the second eigenvector (eigenvalue of –1).

As a penultimate step, we calculate the inverse of the eigenvector matrix (we’ll get into inverses fairly soon), and then finally multiply all those matrices together (right to left), as we saw with the eigenvalue decomposition, to get the reflection matrix. \[\begin{bmatrix}\mathtt{\,\,\,\,2}&\mathtt{-1}\\\mathtt{-1}&\mathtt{-2}\end{bmatrix}\begin{bmatrix}\mathtt{1}&\mathtt{\,\,\,\,0}\\\mathtt{0}&\mathtt{-1}\end{bmatrix}\begin{bmatrix}\mathtt{\,\,\,\,\frac{2}{5}}&\mathtt{-\frac{1}{5}}\\\mathtt{-\frac{1}{5}}&\mathtt{-\frac{2}{5}}\end{bmatrix}\mathtt{=}\begin{bmatrix}\mathtt{\,\,\,\,\frac{3}{5}}&\mathtt{-\frac{4}{5}}\\\mathtt{-\frac{4}{5}}&\mathtt{-\frac{3}{5}}\end{bmatrix}\]

The final matrix on the right side is the reflection matrix for reflections across the line represented by the vector \(\mathtt{\alpha(2, -1)}\), or essentially all lines with a slope of \(\mathtt{-\frac{1}{2}}\).
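The whole construction can be reproduced in a few lines. A sketch, assuming NumPy; the point \(\mathtt{C}\) below is an arbitrary example:

```python
import numpy as np

P = np.array([[ 2.0, -1.0],   # columns: the line of reflection (2, -1)
              [-1.0, -2.0]])  # and the perpendicular vector (-1, -2)
D = np.diag([1.0, -1.0])      # keep the first fixed, flip the second

R = P @ D @ np.linalg.inv(P)  # the reflection matrix
print(R)                      # [[3/5, -4/5], [-4/5, -3/5]]

# Reflecting any point twice returns it to where it started.
C = np.array([3.0, 1.0])
print(R @ (R @ C))            # [3. 1.]
```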

It’s worth playing around with building these matrices and using them to find reflections. A lot of what’s here is pretty obvious, but reflection matrices can come in handy in some non-obvious ways too. The perpendicular vector at the very least has to be on the same side of the line as the point you are reflecting.

Eigenvalue Decomposition

I wrote about eigenvalues and eigenvectors a while back, here. In this post, I’ll show how determining the eigenvalues and eigenvectors of a matrix (2 by 2 in this case) is pretty much all of the work of what’s called eigenvalue decomposition. We’ll start with this matrix, which represents a linear transformation: \[\begin{bmatrix}\mathtt{\,\,\,\,0}&\mathtt{\,\,\,\,1}\\\mathtt{-2}&\mathtt{-3}\end{bmatrix}\]

You can see the action of this matrix at the right (sort of). It sends the (1, 0) vector to (0, –2) and the (0, 1) vector to (1, –3).

The eigenvectors of this transformation are any nonzero vectors that do not change their direction during this transformation, but only scale up or down (or stay the same) by a factor of \(\mathtt{\lambda}\) as a result of the transformation. So,

\(\begin{bmatrix}\mathtt{\,\,\,\,0}&\mathtt{\,\,\,\,1}\\\mathtt{-2}&\mathtt{-3}\end{bmatrix}\begin{bmatrix}\mathtt{r_1}\\\mathtt{r_2}\end{bmatrix}=\lambda \begin{bmatrix}\mathtt{r_1}\\\mathtt{r_2}\end{bmatrix}\)

Using our calculations from the previous post linked above, we calculate the eigenvalues to be \(\mathtt{\lambda_1=-2}\) and \(\mathtt{\lambda_2=-1}\). And the corresponding eigenvectors are of the form \(\mathtt{(r, -2r)}\) and \(\mathtt{(-r, r)}\), respectively.

The red vector (representing the eigenvector \(\mathtt{(-r, r)}\)) at right starts at \(\mathtt{(-1, 1)}\). It is scaled by the eigenvalue of \(\mathtt{-1}\) during the transformation—meaning it simply turns in the opposite direction and its magnitude doesn’t change. Any vector of the form \(\mathtt{(-r, r)}\) will behave this way during this transformation.

The purple vector (representing the eigenvector \(\mathtt{(r, -2r)}\)) starts at \(\mathtt{(-1, 2)}\). It is scaled by the eigenvalue of \(\mathtt{-2}\) during the transformation—meaning it turns in the opposite direction and is scaled by a factor of \(\mathtt{2}\). Any vector of the form \(\mathtt{(r, -2r)}\) will behave this way during this transformation.
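If you’d rather not grind through the characteristic-polynomial calculation by hand, NumPy will hand you the eigenvalues and (normalized) eigenvectors directly. A sketch; note that the order in which the eigenvalues come back is not guaranteed:

```python
import numpy as np

A = np.array([[ 0.0,  1.0],
              [-2.0, -3.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)  # -1 and -2, in some order

# Each column of `eigenvectors` is a unit eigenvector: check Av = (lambda)v.
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ v, lam * v))  # True
```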

And Now for the Decomposition

We can now use the equation above and plug in each eigenvalue and its corresponding eigenvector to create two matrix equations.

\(\begin{bmatrix}\mathtt{\,\,\,\,0}&\mathtt{\,\,\,\,1}\\\mathtt{-2}&\mathtt{-3}\end{bmatrix}\begin{bmatrix}\mathtt{-1}\\\mathtt{\,\,\,\,2}\end{bmatrix}=\mathtt{-2}\begin{bmatrix}\mathtt{-1}\\\mathtt{\,\,\,\,2}\end{bmatrix}\) \[\begin{bmatrix}\mathtt{\,\,\,\,0}&\mathtt{\,\,\,\,1}\\\mathtt{-2}&\mathtt{-3}\end{bmatrix}\begin{bmatrix}\mathtt{-1}\\\mathtt{\,\,\,\,1}\end{bmatrix}=\mathtt{-1}\begin{bmatrix}\mathtt{-1}\\\mathtt{\,\,\,\,1}\end{bmatrix}\]

We can combine the items on the left side of each equation and the items on the right side of each equation into one matrix equation.

\(\begin{bmatrix}\mathtt{\,\,\,\,0}&\mathtt{\,\,\,\,1}\\\mathtt{-2}&\mathtt{-3}\end{bmatrix}\begin{bmatrix}\mathtt{-1}&\mathtt{-1}\\\mathtt{\,\,\,\,2}&\mathtt{\,\,\,\,1}\end{bmatrix}=\begin{bmatrix}\mathtt{-1}&\mathtt{-1}\\\mathtt{\,\,\,\,2}&\mathtt{\,\,\,\,1}\end{bmatrix}\begin{bmatrix}\mathtt{-2}&\mathtt{\,\,\,\,0}\\\mathtt{\,\,\,\,0}&\mathtt{-1}\end{bmatrix}\)

This leaves us with [original matrix][eigenvector matrix] = [eigenvector matrix][eigenvalue matrix]. Finally, we multiply both sides on the right by the inverse of the eigenvector matrix, in order to remove it from the left side of the equation. We can’t cancel it from the right side, because matrix multiplication is not commutative. That leaves us with the final decomposition (hat tip to Math the Beautiful for some of the ideas in this post): \[\begin{bmatrix}\mathtt{\,\,\,\,0}&\mathtt{\,\,\,\,1}\\\mathtt{-2}&\mathtt{-3}\end{bmatrix}=\begin{bmatrix}\mathtt{-1}&\mathtt{-1}\\\mathtt{\,\,\,\,2}&\mathtt{\,\,\,\,1}\end{bmatrix}\begin{bmatrix}\mathtt{-2}&\mathtt{\,\,\,\,0}\\\mathtt{\,\,\,\,0}&\mathtt{-1}\end{bmatrix}\begin{bmatrix}\mathtt{\,\,\,\,1}&\mathtt{\,\,\,\,1}\\\mathtt{-2}&\mathtt{-1}\end{bmatrix}\]

Multiplying these three matrices together, or combining the transformations represented by the matrices as we showed here, will result in the original matrix.
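You can verify this numerically. A minimal sketch, assuming NumPy, using the eigenvector and eigenvalue matrices from the combined equation above:

```python
import numpy as np

P = np.array([[-1.0, -1.0],  # eigenvector matrix: columns (-1, 2) and (-1, 1)
              [ 2.0,  1.0]])
D = np.diag([-2.0, -1.0])    # eigenvalue matrix

# [eigenvector matrix][eigenvalue matrix][its inverse] recovers the original.
A = P @ D @ np.linalg.inv(P)
print(A)  # [[ 0.  1.], [-2. -3.]]
```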

Subtractive Knowledge

I was intrigued by a pedagogical insight offered by the example below, from the introductory class of a course called Computational Linear Algebra. The setup is that the graph (diagram) at the bottom of the image represents a Markov model, and it shows the probabilities of moving from one stage of a disease to another in a year.

So, if a patient is asymptomatic, there is a 7% (0.07) probability of moving from asymptomatic to symptomatic, a 90% chance of staying at asymptomatic (indicated by a curved arrow), and so on. This information is also encoded in the stochastic matrix shown.

Here’s the problem: Given a group of people in which 85% are asymptomatic, 10% are symptomatic, 5% have AIDS, and of course 0% are deceased, what percent will be in each health state in a year? Putting yourself in the mind of a student right now, take a moment to try to answer the question and, importantly, reflect on your thinking at this stage, even if that thinking involves having no clue how to proceed.

I hope that, given this setup, you’ll be somewhat surprised to learn the following: If a high school student (or even middle school student) knows a little bit about probability and how to multiply and add, they should be able to answer this question.

Why? Well, if 85 out of every 100 people in a group are asymptomatic, and there is a 90% probability of remaining asymptomatic in a year, then (0.85)(0.9) = 76.5% of the group is predicted to be asymptomatic in a year. The symptomatic group has two products that must be added: 10% of the group is symptomatic, and there is a 93% probability of remaining that way, so that’s (0.93)(0.1). But this group also takes on 7% of the 85% that were asymptomatic but moved to symptomatic. So, the total is (0.93)(0.1) + (0.07)(0.85) = 15.25%. The AIDS group percent is the sum of three products, for a total of 6.45%, and the Death group percent is a sum of four products, for a total of 1.8%.

Probability, multiplication, and addition are all you need to know. No doubt, knowing something about matrix-vector multiplication, as we have discussed, (and transposes) can be helpful, but it does not seem to be necessary in this case.
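For readers who do want the matrix version, the entire update is one matrix-vector multiplication. A sketch assuming NumPy; the 0.90, 0.07, and 0.93 entries are quoted in the text above, while the remaining transition probabilities are assumptions filled in so that the totals match the 6.45% and 1.8% computed above:

```python
import numpy as np

# Transition probabilities per year: row = next state, column = current state.
# States, in order: asymptomatic, symptomatic, AIDS, deceased.
T = np.array([[0.90, 0.00, 0.00, 0.00],
              [0.07, 0.93, 0.00, 0.00],
              [0.02, 0.05, 0.85, 0.00],
              [0.01, 0.02, 0.15, 1.00]])

group = np.array([0.85, 0.10, 0.05, 0.00])  # today's distribution

print(T @ group)  # 76.5%, 15.25%, 6.45%, 1.8%
```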

Bamboozled

I think it’s reasonable to suspect that many knowledgeable students—and knowledgeable adults—would be bamboozled by the highfalutin language here into believing that they cannot solve this problem, when in fact they can. If that’s true, then why is that the case?

Knowledge is domain specific, of course (and very context specific), and that would seem to be the best explanation of students’ hypothesized difficulties. That is, given the cues (both verbal and visual) that this problem involves knowledge of linear algebra, Markov models, and/or stochastic matrices, anyone without that knowledge would naturally assume that they don’t have what is required to solve the problem and give up. And even if they suspected that some simple probability theory, multiplication, and addition were all that they needed, being bombarded by even a handful of foreign mathematical terms would greatly reduce their confidence in this suspicion.

Perhaps, then, the reason we are looking for—the reason students don’t believe they can solve problems when in fact they can—has to do with students’ attitudes, not their knowledge. And situations like these during instruction are enough to convince many that knowledge is overrated. The solution to this psychological reticence is, for many people, to encourage students to be fearless, to have problem-solving orientations and growth mindsets. After all, it’s clear that more knowledge would be helpful, but it’s not necessary in many cases. We’ll teach knowledge, sure, but we can do even better if we spend time encouraging soft skills along the way. Do we want students to give up every time they face a situation in life that they were not explicitly taught to deal with?

Knowledge Is Not Just Additive

The problem with this view is that it construes knowledge as only additive. That is, it is thought, knowledge only works to give its owner things to think about and think with. So, in the above example, students already have all the knowledge things to work with: probability, multiplication, and addition. Anything else would only serve to bring in more things to think about, which would be superfluous.

But this isn’t the only way knowledge works. It can also be subtractive—that is, knowing something can tell you that it is irrelevant to the current problem. Not knowing it means that you can’t know about its relevance (and situations like the above will easily bias you to giving superficial information a high degree of relevance). So, students cannot know with high confidence that matrices are essentially irrelevant to the problem above if they don’t know what matrices are. But even knowing nothing about matrices, knowing that, computationally, linear algebra is fundamentally about multiplying and adding things may be enough. Taking that perspective can allow you to ignore the superficial setup of the problem. But that’s still knowledge.

A better interpretation of students’ difficulties with the above is that, in fact, they do need more knowledge to solve the problem. The knowledge they need is subtractive; it will help them ignore superficial irrelevant details to get at the marrow of the problem.

Knowledge is obviously additive, but it is much more subtly subtractive too, helping to clear away facts that are irrelevant to a given situation. Subtractive knowledge is like the myelinated sheaths around some nerve cells in the brain. It acts as an insulator for thinking—making it faster, more efficient, and, as we have seen, more effective.

The Gricean Maxims

When we converse with one another, we implicitly obey a principle of cooperation, according to language philosopher Paul Grice’s theory of conversational implicature.

This ‘cooperative principle’ has four maxims, which although stated as commands are intended to be descriptions of specific rules that we follow—and expect others will follow—in conversation:

  • quality: Be truthful.
  • quantity: Don’t say more or less than is required.
  • relation: Be relevant.
  • manner: Be clear and orderly.

I was drawn recently to these maxims (and to Grice’s theory) because they rather closely resemble four principles of instructional explanation that I have been toying with off and on for a long time now: precision, clarity, order, and cohesion.

In fact, there is a fairly snug one-to-one correspondence between our respective principles, a relationship which is encouraging to me precisely because it is coincidental. Here they are in an order corresponding to the above:

  • precision: Instruction should be accurate.
  • cohesion: Instruction should group related ideas.
  • clarity: Instruction should be understandable and present to its audience.
  • order: Instruction should be sequenced appropriately.

Both sets of principles likely seem dumbfoundingly obvious, but that’s the point. As principles (or maxims), they are footholds on the perimeters of complex ideas—in Grice’s case, the implicit contexts that make up the study of pragmatics; in my case (insert obligatory note that I am not comparing myself with Paul Grice), the explicit “texts” that comprise the content of our teaching and learning.

The All-Consuming Clarity Principle

Frameworks like these can be more than just armchair abstractions; they are helpful scaffolds for thinking about the work we do. Understanding a topic up and down the curriculum, for example, can help us represent it more accurately in instruction. We can think about work in this area as related specifically to the precision principle and, in some sense, as separate from (though connected to) work in other areas, such as topic sequencing (order), explicitly building connections (cohesion), and motivation (clarity).

But principle frameworks can also lift us to some height above this work, where we can find new and useful perspectives. For instance, simply having these principles, plural, in front of us can help us see—I would like to persuade you to see—that “clarity,” or in Grice’s terminology, “relevance,” is the only one we really talk about anymore, and that this is bizarre given that it’s just one aspect of education.

The work of negotiating the accuracy, sequencing, and connectedness of instruction drawn from our shared knowledge has been largely outsourced to publishers and technology startups and Federal agencies, and goes mostly unquestioned by the “delivery agents” in the system, whose role is one of a go-between, tasked with trying to sell a “product” in the classroom to student “customers.”

Making Parallelepipeds

We have talked about the cross product (here and here), so let’s move on to doing something marginally interesting with it: we’ll make a rectangular prism, or the more general parallelepiped.

A parallelepiped is shown at the right. It is defined by three vectors: u, v, and w. The cross product vector \(\mathtt{v \wedge w}\) is perpendicular to both v and w, and its magnitude, \(\mathtt{||v \wedge w||}\), is equal to the area of the parallelogram formed by the vectors v and w (something I didn’t mention in the previous two posts).

The perpendicular height of the skewed prism, or parallelepiped, is given by \(\mathtt{||u||\,\text{cos}(θ)}\).

The volume of the parallelepiped can thus be written as the area of the base times the height, or \(\mathtt{V = (||v \wedge w||)(||u||\text{cos}(θ))}\).

We can write \(\mathtt{\text{cos}(θ)}\) in this case as \[\mathtt{\text{cos}(θ) = \frac{(v \wedge w) \cdot u}{(||v \wedge w||)(||u||)}}\]

Which means that, after simplifying the volume equation, we’re left with \(\mathtt{V = (v \wedge w) \cdot u}\): so, the dot product of the vector perpendicular to the base and the vector \(\mathtt{u}\) along the slant edge of the prism. The result is a scalar value, of course, for the volume of the parallelepiped, and, because it is a dot product, it is a signed value. We can get negative volumes, which doesn’t mean the volume is negative but tells us something about the orientation of the parallelepiped.

Creating Some Parallelepipeds

Creating prisms and skewed prisms can be done in Geogebra 3D, but here I’ll show how to create these figures from scratch using Three.js. Click and drag on the window to rotate the scene below. Right click and drag to pan left, right, up, or down. Scroll to zoom in and out.

Click on the Pencil icon above in the lovely Trinket window and navigate to the parallelepiped.js tab to see the code that makes the cubes. You can see that vectors are used to create the vertices (position vectors, so just points). Each face is composed of 2 triangles: (0, 1, 2) means to create a face from the 0th, 1st, and 2nd vertices from the vertices list. Make some copies of the box in the code and play around!

To determine the volume of each cube: \[\left(\begin{bmatrix}\mathtt{1}\\\mathtt{0}\\\mathtt{0}\end{bmatrix} \wedge \begin{bmatrix}\mathtt{0}\\\mathtt{1}\\\mathtt{0}\end{bmatrix}\right) \cdot \begin{bmatrix}\mathtt{0}\\\mathtt{0}\\\mathtt{1}\end{bmatrix} \mathtt{= 1}\]
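The signed-volume formula is a one-liner in code. A sketch assuming NumPy; the second example is an arbitrary skewed prism:

```python
import numpy as np

def parallelepiped_volume(u, v, w):
    """Signed volume of the parallelepiped spanned by u, v, w: (v ^ w) . u"""
    return np.dot(np.cross(v, w), u)

# The unit cube from the equation above:
print(parallelepiped_volume([0, 0, 1], [1, 0, 0], [0, 1, 0]))  # 1

# A skewed prism with base edges (2, 0, 0) and (0, 2, 0) and
# slant edge (1, 1, 3): base area 4 times height 3.
print(parallelepiped_volume([1, 1, 3], [2, 0, 0], [0, 2, 0]))  # 12
```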

Spooky Action at a Distance

I really like this recent post, called Tell Me More, Tell Me More, by math teacher Dani Quinn. The content is an excellent analysis of expert blindness in math teaching. The form, though, is worth seeing as well—it is a traditional educational syllogism, which Quinn helpfully commandeers to arrive at a non-traditional conclusion, that instructional effects have instructional causes, in the alternative argument below:

The Traditional Argument

Premise: There is a problem in how we teach: We typically spoon-feed students procedures for answering questions that will be on some kind of test.

Conclusion: This is why students can’t generalize to non-routine problems: we got in the way of their thinking and didn’t allow them to take ownership and creatively explore material on their own.

An Alternative Argument

Premise: “There is a problem in how we teach: We typically show pupils only the classic forms of a problem or a procedure.”

Conclusion: “This is why they then can’t generalise: we didn’t show them anything non-standard or, if we did, it was in an exercise when they were floundering on their own with the least support.”

Problematically for education debates, each of these premises and conclusions taken individually is true. That is, they exist. At our (collective) weakest, we do sometimes spoon-feed kids procedures to get them through tests. We do cover only a narrow range of situations—what Engelmann refers to as the problem of stipulation. And we can be, regrettably in either case, systematically unassertive or overbearing.

Solving equations provides a nice example of the instructional effects of both spoon-feeding and stipulation. Remember how to solve equations? Inverse operations. That was the way to do equations. If you have something like \(\mathtt{2x + 5 = 15}\), the table shows how it goes.

  • Subtract \(\mathtt{5}\) from both sides of the equation: \(\mathtt{2x + 5 \color{red}{- 5} = 15 \color{red}{- 5}}\), which gives \(\mathtt{2x = 10}\).
  • Divide both sides of the equation by \(\mathtt{2}\): \(\mathtt{2x \color{red}{\div 2} = 10 \color{red}{\div 2}}\).
  • You have solved the equation: \(\mathtt{x = 5}\).

Do that a couple dozen times and maybe around 50% of the class freezes when they encounter \(\mathtt{22 = 4x + 6}\), with the variable on the right side, or, even worse, \(\mathtt{22 = 6 + 4x}\).

That’s spoon-feeding and stipulation: do it this one way and do it over and over—and, crucially, doing that summarizes most of the instruction around solving equations.

Of course, the lack of prior knowledge exacerbates the negative instructional effects of stipulation and spoon-feeding. But we’ll set that aside for the moment.

The Connection Between Premises and Conclusion

The traditional and alternative arguments above are easily (and often) confused, though, until you include the premise that I have omitted in the middle for each. These help make sense of the conclusions derived in each argument.

The Traditional Argument

Premise: There is a problem in how we teach: We typically spoon-feed students procedures for answering questions that will be on some kind of test.

Middle premise: Students’ success in schooling is determined mostly by internal factors, like creativity, motivation, and self-awareness.

Conclusion: This is why students can’t generalize to non-routine problems: we got in the way of their thinking and didn’t allow them to take ownership and creatively explore material on their own.

An Alternative Argument

Premise: “There is a problem in how we teach: We typically show pupils only the classic forms of a problem or a procedure.”

Middle premise: Students’ success in schooling is determined mostly by external factors, like amount of instruction, socioeconomic status, and curricula.

Conclusion: “This is why they then can’t generalise: we didn’t show them anything non-standard or, if we did, it was in an exercise when they were floundering on their own with the least support.”

In short, the argument on the left tends to diagnose pedagogical illnesses and their concomitant instructional effects as people problems; the alternative sees them as situation problems. The solutions generated by each argument are divergent in just this way: the traditional one looks to pull the levers that mostly benefit personal, internal attributes that contribute to learning; the alternative messes mostly with external inputs.

It’s Not the Spoon-Feeding, It’s What’s on the Spoon

I am and have always been more attracted to the alternative argument than the traditional one. Probably for a very simple reason: my role in education doesn’t involve pulling personal levers. Being close to the problem almost certainly changes your view of it—not necessarily for the better. But, roles aside, it’s also the case that the traditional view is simply more widespread, and informed by the positive version of what is called the Fundamental Attribution Error:

We are frequently blind to the power of situations. In a famous article, Stanford psychologist Lee Ross surveyed dozens of studies in psychology and noted that people have a systematic tendency to ignore the situational forces that shape other people’s behavior. He called this deep-rooted tendency the “Fundamental Attribution Error.” The error lies in our inclination to attribute people’s behavior to the way they are rather than to the situation they are in.

What you get with the traditional view is, to me, a kind of spooky action at a distance—a phrase attributed to Einstein, in remarks about the counterintuitive consequences of quantum physics. Adopting this view forces one to connect positive instructional effects (e.g., thinking flexibly when solving equations) with something internal, ethereal and often poorly defined, like creativity. We might as well attribute success to rabbit’s feet or lucky underwear or horoscopes!

Teaching and Learning Coevolved?

Just a few pages in to David Didau and Nick Rose’s new book What Every Teacher Needs to Know About Psychology, and I’ve already come across what is, for me, a new thought—that teaching ability and learning ability coevolved:

Strauss, Ziv, and Stein (2002) . . . point to the fact that the ability to teach arises spontaneously at an early age without any apparent instruction and that it is common to all human cultures as evidence that it is an innate ability. Essentially, they suggest that despite its complexity, teaching is a natural cognition that evolved alongside our ability to learn.

Or perhaps this is, even for me, an old thought, but just unpopular enough—and for long enough—to seem like a brand new thought. Perhaps after years of exposure to the characterization of teaching as an anti-natural object—a smoky, rusty gearbox of torture techniques designed to break students’ wills and control their behavior—I have simply come to accept that it is true, and have forgotten that I had done so.

Strauss et al., however, provide some evidence in their research that it is not true. Very young children engage in teaching behavior before formal schooling by relying on a naturally developing ability to understand the minds of others, known as theory of mind (ToM).

Kruger and Tomasello (1996) postulated that defining teaching in terms of its intention—to cause learning—suggests that teaching is linked to theory of mind, i.e., that teaching relies on the human ability to understand the other’s mind. Olson and Bruner (1996) also identified theoretical links between theory of mind and teaching. They suggested that teaching is possible only when a lack of knowledge can be recognized and that the goal of teaching then is to enhance the learner’s knowledge. Thus, a theory of mind definition of teaching should refer to both the intentionality involved in teaching and the knowledge component, as follows: teaching is an intentional activity that is pursued in order to increase the knowledge (or understanding) of another who lacks knowledge, has partial knowledge or possesses a false belief.

The Experiment

One hundred children were separated into 50 pairs—25 pairs with a mean age of 3.5 and 25 with a mean age of 5.5. Twenty-five of the 50 children in each age group served as test subjects (teachers); the other 25 were learners. The teachers completed three groups of tasks before teaching, the first of which (1) involved two classic false-belief tasks. If you are not familiar with these kinds of tasks, the video at right should serve as a delightfully creepy precis—from what appears to be the late 70s, when every single instructional video on Earth was made. The second and third groups of tasks probed participants’ understanding that (2) a knowledge gap between teacher and learner must exist for “teaching” to occur and (3) a false belief about this knowledge gap is possible.

Finally, children participated in the teaching task by teaching the learners how to play a board game. The teacher-children were, naturally, taught how to play the game prior to their own teaching, and they were allowed to play the game with the experimenter until they demonstrated some proficiency. The teacher-learner pair was then left alone, “with no further encouragement or instructions.”

The Results

Consistent with the results from prior false-belief studies, there were significant differences between the 3- and 5-year-olds in Tasks (1) and (3) above, both of which relied on false-belief mechanisms. In Task (3), when participants were told, for example, that a teacher thought a child knew how to read when in fact he didn’t, 3-year-olds were much more likely to say that the teacher would still teach the child. Five-year-olds, on the other hand, were more likely to recognize the teacher’s false belief and say that he or she would not teach the child.

Intriguingly, however, the development of a theory of mind does not seem necessary to either recognizing the need for a special type of discourse called “teaching” or to teaching ability itself—only to a refinement of teaching strategies. Task (2), in which participants were asked, for instance, whether a teacher would teach someone who knew something or someone who didn’t, showed no significant differences between 3- and 5-year-olds in the study. But the groups were significantly different in the strategies they employed during teaching.

Three-year-olds have some understanding of teaching. They understand that in order to determine the need for teaching as well as the target learner, there is a need to recognize a difference in knowledge between (at least) two people . . . Recognition of the learner’s lack of knowledge seems to be a necessary prerequisite for any attempt to teach. Thus, 3-year-olds who identify a peer who doesn’t know [how] to play a game will attempt to teach the peer. However, they will differ from 5-year-olds in their teaching strategies, reflecting the further change in ToM and understanding of teaching that occurs between the ages of 3 and 5 years.

Coevolution of Teaching and Learning

The study here dealt with the innateness of teaching ability and sensibilities, but not with whether teaching and learning coevolved—a question the paper raises at the beginning and then sets aside.

It is an interesting question, however. Discussions in education are increasingly focused on “how students learn,” and it seems to be widely accepted that teaching should adjust itself to what we discover about this. But if teaching is as natural a human faculty as learning—and coevolved alongside it—then this may be only half the story. How students (naturally) learn might be caused, in part, by how teachers (naturally) teach, and vice versa. And learners perhaps should be asked to adjust to what we learn about how we teach as much as the other way around.

Those seem like new thoughts to me. But they’re probably not.

Providing Bad Intel


A really nice thing about scientific research is its transparency. Researchers write down the methods they use in their experiments—sometimes in excruciating detail—so that others can try to replicate their work if they choose. And scrutinizable methods allow us and other researchers to think about issues that the original experimenters might have overlooked—or, at least, didn’t mention in their published work.

Every once in a while we come across research that readers can simulate at home on a computer, even without any participants, and this allows us to bring the experiment to life a little more than can be done with text descriptions.

The research I look at in this post is such a study. Students in the study (81 in all, from 7 to 10 years of age) were given an “app” very similar to the one shown below. Play with it a bit by clicking on the animal pictures to see what students were exposed to in this study.

The Method

In this study, students were presented with a question and then an explanation answering that question for each of the 12 animals shown above (the images used in the study were different from those above). Students rated the quality of explanations about animal biology on a 5-point scale. (In the version above, your ratings are not recorded. You can just click on the image of the rating system to move on.) The audio recordings in the app above use the questions and explanations from the study verbatim, though in the actual study two different people speak the questions and explanations (above, it’s just me).

As you could no doubt tell if you played around with the app above, some of the explanations are laughably bad. Researchers designated these as circular explanations (e.g., How do colugos use their skin flaps to travel? Their skin flaps help them to move from one place to another). The other, better explanations were identified as mechanistic explanations (e.g., How do thorny dragons use the grooves between their thorns to help them drink water? Their grooves collect water and send the water to their mouths). After rating the explanation, students were then given a choice to either get more information about the animal or to move on to a different animal. Here again, all you get is a screen to click on, and any click takes you back to the main screen with the 12 animals. In the actual study, students were given an even more detailed mechanistic explanation when clicking to get more information (e.g., Thorny dragons have grooves between their thorns, which are able to collect water. The water is drawn from groove to groove until it reaches their mouths, so they can suck water from all over their bodies).

The Curious Case of Curiosity

What the researchers found was that, in general, students were significantly more likely to click to get more information on an animal when the explanation given was circular. And, importantly, students were more likely to click to get more information when they rated the explanation as poor. This behavior—of clicking to get more information—was operationalized as curiosity and can be explained using the deprivation theory of curiosity.

In everyday life, children sometimes receive weak explanations in response to their questions. But what do children do when they receive weak explanations? According to the deprivation theory of curiosity, if children think that an explanation is unsatisfying, then they should sometimes feel inclined to seek out a better answer to their question to bolster their knowledge; the same is not true for explanations appraised as high in quality. To our knowledge, our research is the first to investigate this theory in regards to children’s science learning, examining whether 7- to 10-year-olds are more likely to seek out additional information in response to weak explanation than informative ones in the domain of biology.

But is that really curiosity? Do I stimulate your curiosity about colugos’ skin flaps by not really answering your questions about them? We can more easily answer no to this question if we assume that Square 1 represents students’ wanting to know something about colugos’ skin flaps. In that case, the initial question stimulates curiosity, as it were, and the non-explanation simply fails to satisfy this curiosity, or initial desire for knowledge. The circular explanation has not made them curious or even more curious. They were already curious. Not helping them scratch that itch just fails to move them to Square 2, which is where they wanted to go after hearing the question (knowing something about how colugos’ skin flaps work). The fact that students with unscratched itches were more likely to go to Square 3 is not surprising, since Square 3, for them, was actually Square 2, the square that everyone wanted to get to.

An Unavoidable Byproduct of Quality Teaching

If you are more inclined to believe the above interpretation, as I am, it might seem that we still must contend with the evidence that quality explanations were indeed shown to reduce information-seeking, relative to the levels of information-seeking shown for circular explanations. But this is not necessarily the case. What we see, from this study at least, is that not scratching the initial itch likely caused a different behavior in students than did scratching it. A clicking behavior did increase for students who still had itches, but this does not mean that it decreased for students who had no itch. We have evidence here that bad explanations are recognizably bad. We do not have evidence suggesting that quality explanations make students incurious.

If this is the case, though—if quality explanations reduce curiosity—it seems likely to me that it is simply an unavoidable byproduct of quality teaching. One that can be anticipated and planned for. Explanations are, after all, designed to reduce curiosity, in some sense. What high quality explanations do—in every scientific field and likely in our everyday lives—is move us on to different, better things to be curious about.


Thinking About and Thinking With

I have a tendency, when writing blog posts, to leave important things unsaid. So, let me fix that up front before I forget. What I wanted to say here was that, in my view, learning doesn’t happen unless we tick off all three boxes: encoding, consolidation, and retrieval.

It’s not that learning gets better or stronger when more of those boxes are ticked off. Learning isn’t possible—with any degree of certainty—in the first place without all three. And it’s not the case that focusing on just retrieval instead of just encoding or just consolidation represents some kind of revolution in pedagogical thinking. You’re simply ignoring one or two vital components of learning when before you were ignoring one or two other ones. Learning can still happen even when we don’t think about one or two (or all three) of the above components, but then it’s haphazard, random, implicit, and/or incidental. (In that case, learning comes down to mental horsepower and genes rather than processes over which we have some control.) All three components still must be addressed for learning to occur; it’s just that we can decide to not be in control of one or all of them (to students’ detriment).

But even then we’re not done. We’ve covered the components of the process of learning, but all three of those components intersect with another dimension of learning, which describes the products of learning: thinking about and thinking with.

Thinking With

In the previous post linked above, the examples of slope could all be categorized as “thinking about” slope. Put too simply, encoding the concept of slope means absorbing information about slope, retrieving knowledge about slope means remembering the slope concept and saying your knowledge out loud or writing it down, and consolidating knowledge about slope means practicing, such that what is encoded stays encoded and what is known can be retrieved.

All of this—encoding, consolidation, and retrieval—must happen with “thinking with” as well as with “thinking about.” Encoding–Thinking With, for example, would involve absorbing information about how slope can be applied to do other things, whether mathematically or in the real world. Common examples include designing wheelchair ramps (which have ADA-recommended height-length ratios of 1 : 12), measuring and comparing the steepnesses of things, and determining whether two lines are parallel or perpendicular (or neither). Consolidating–Thinking With would involve practice with that encoded knowledge—solving word problems is a typical example. Finally, Retrieving–Thinking With would involve remembering that encoded knowledge, particularly after some time has passed, say by using slope to solve a programming problem or a problem on a test.
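As a small illustration of thinking with slope, here is one way the parallel-or-perpendicular question might look as a programming problem (the function names and sample points are mine, a sketch rather than anything from a particular lesson; it assumes no vertical lines, since those have undefined slope):

```python
def slope(p, q):
    """Slope of the line through points p and q (assumes q[0] != p[0])."""
    return (q[1] - p[1]) / (q[0] - p[0])

def classify(line1, line2):
    """Classify two lines, each given as a pair of points, by their slopes.

    Parallel lines have equal slopes; perpendicular lines have slopes
    whose product is -1.
    """
    m1, m2 = slope(*line1), slope(*line2)
    if m1 == m2:
        return "parallel"
    if m1 * m2 == -1:
        return "perpendicular"
    return "neither"

# An ADA-recommended ramp rises 1 unit for every 12 units of run.
print(slope((0, 0), (12, 1)))  # 1/12, about 0.083
print(classify(((0, 0), (1, 2)), ((3, 1), (4, 3))))   # parallel
print(classify(((0, 0), (1, 2)), ((0, 0), (2, -1))))  # perpendicular
```

Notice that answering “are these lines perpendicular?” is not a question *about* slope at all; slope is just the tool that gets you the answer.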

All six boxes have to be checked off for learning to occur (such that it is within our control).

Teach Thinking With

In education, we have difficulties—again, in my view—with Encoding–Thinking With and Consolidating–Thinking With. As far as these two are concerned, it is rare in my experience to see guidance and practice on a wide variety of different problems involving thinking with (for example) slope to answer questions that aren’t about slope. We tend to think that all we should do is give students a bunch of think-about slope facts and then hope for them to magically retrieve and apply those to thinking-with situations. We misunderstand transfer as some kind of conjuring out of thin air, so that’s what we give students—thin air. Then we stand back and hope to see some conjuring. When this doesn’t produce results that we’d like, we—for some unimaginably stupid reason—blame knowing facts for the problem, and instead of supplementing that knowledge with thinking-with teaching, we swap them. Which is much worse than what it tries to replace.

Instead, it is necessary to teach students how to think with slope and many other mathematical concepts, and to provide them with practice in thinking with these concepts. Thinking with is as much knowledge as thinking about is.

One of my favorite examples of thinking with slope—and one which I have, admittedly, not yet written a lesson about—has to do with drawing convex or concave quadrilaterals. Given four coordinate pairs for points that can form a convex or concave quadrilateral (no three points lie on the same line, etc.), how can I decide, somewhat algorithmically, on the order in which I should connect the points, such that the line segments I actually draw do create a quadrilateral and not the image on the far right?

One way to go about it is to first select the leftmost point—the point with the lowest x-coordinate (there could be two, with equal x-coordinates, but I’ll leave that to the reader to figure out). Then calculate the slope of each line connecting the leftmost point with each other point. The order in which the points can be connected is the order of the slopes from least to greatest. This process would create a different proper quadrilateral than the one shown in the middle above.
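That leftmost-point, slope-sorting procedure can be sketched in a few lines of Python (the function name and the sample points are mine; like the description above, it assumes no three points are collinear and leaves the equal-x tie case to the reader):

```python
def quadrilateral_order(points):
    """Order four points so that connecting them in sequence (and closing
    back to the start) traces a simple quadrilateral.

    Assumes no three points are collinear and no two points share an
    x-coordinate.
    """
    # Select the leftmost point: the one with the lowest x-coordinate.
    anchor = min(points, key=lambda p: p[0])
    others = [p for p in points if p != anchor]
    # Sort the remaining points by the slope of the segment joining each
    # of them to the anchor, from least to greatest.
    others.sort(key=lambda p: (p[1] - anchor[1]) / (p[0] - anchor[0]))
    return [anchor] + others

# Four points that self-intersect if connected in the order given.
print(quadrilateral_order([(1, 1), (5, 4), (2, 5), (6, 0)]))
# [(1, 1), (6, 0), (5, 4), (2, 5)]
```

Sorting by slope around the leftmost point works because it is equivalent to sorting the other three points by the angle they make with the anchor, so the segments sweep around without crossing.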

Checking Off All the Boxes

Students’ minds are not magical. They don’t turn raw facts magically into applied understanding (the extreme traditionalist view), and they don’t magically vacuum up knowledge hidden in applied contexts (the extreme constructivist view). Put more accurately, this kind of magic does happen (which is why we believe in it), but it happens outside of our control, as a result of genetic and socioeconomic differences, so we can take no credit for it.

Importantly, ignoring components of students’ learning, for whatever reason, subjects them to a roll of the dice. Those students who start behind stay behind, and those who are underserved stay so. We seem to have enough leftover energy to try our hand at amateur psychology and social-emotional learning. Why not take a fraction of that energy and channel it into, you know, plain ol’ teaching?