Educational Achievement and Religiosity


In an earlier post, I outlined a somewhat speculative argument supporting the prediction that increased religiosity at the social level should have a negative effect on educational achievement. There, I suggested that:

Educators surrounded by cultures with higher religiosity—and regardless of their own personal religious orientations—will simply have greater exposure to concerns about moral and spiritual harm that can be wrought by science, in addition to the benefits it can bring.

Such weakened confidence in science may not only directly water down the content of instruction in both science and mathematics (by, for example, diluting science content antagonistic to religious beliefs in published standards and curriculum guides) but could also represent an environment in which it is seen as inartful or even taboo for educators of any stripe to lean on scientific findings and perspectives in order to improve educational outcomes (because nurturing children may be seen to be the province of more spiritual and less scientific approaches). Both of these effects, one social, one policy-level, could have a negative effect on achievement.

A new paper, coauthored by renowned evolutionary psychologist David Geary, shows that religiosity at a national level is indeed strongly negatively correlated with achievement (r = –0.72, p < 0.001). Yet, Stoet and Geary’s research suggests a different, simpler mechanism at work than the mechanisms I suggested above to explain the connection between religiosity and math and science educational achievement. This mechanism is displacement.

The Displacement Hypothesis

It’s a bit much to give this hypothesis its own section heading—not that it isn’t important, necessarily. It’s just self-explanatory. Religiosity may be correlated with lower educational achievement because people have a finite amount of time and attention, and spending time learning about religion or engaging in religious activities necessarily takes time away from learning math and science.

It is not necessarily the content of the religious beliefs that might influence educational growth (or lack thereof), but that investment of intellectual abilities that support educational development are displaced by other (religious) activities (displacement hypothesis). This follows from Cattell’s (1987) investment theory, with investment shifting from secular education to religious materials rather than shifts from one secular domain (e.g., mathematics) to another (e.g., literature). This hypothesis might help to explain part of the variation in educational performance broadly (i.e., across academic domains), not just in science literacy.

One reason the displacement hypothesis makes sense is that religiosity is as powerfully negatively correlated with achievement in mathematics as it is with science achievement.

The Scattering Hypothesis

But certainly a drawback of the displacement hypothesis is that there are activities we engage in—as unrelated to mathematics and science as religion is—which don’t, as far as we know, correlate strongly negatively with achievement. Physical exercise, for goodness’ sake, is one example of such an activity. Perhaps there is something especially toxic about religiosity as the displacer which deserves our attention.

Maybe religiosity (or, better, a perspective which allows for supernatural explanations or, indeed, unexplainable phenomena) has a diluent or scattering effect on learning. If so, here are two analogies for how that might work:

  • Consider object permanence. Prior to developing the understanding that objects continue to exist once they are out of view, children will almost immediately lose interest in an object that is deliberately hidden from them, even if they were attending to it just moments earlier. Why? Because it is possible (to them) that the object has vanished from existence when you move it out of their view. If it were possible for a 4-month-old to crawl up and look behind the sofa to see that grandma had actually disappeared during a game of peek-a-boo, they would have nothing to wonder about. The disappearance was possible, so why shouldn’t it happen? This possibility is gone once you develop object permanence.
  • Perhaps more relevant, not to mention ominous: climate change. It is well known that religiosity and acceptance of the theory of evolution are negatively correlated. And it turns out there is a strong positive link between evolution denialism and climate-change denialism. How might religiosity support both of these denialisms? Here we can benefit from substituting for ‘religiosity’ some degree of subscription to supernatural explanations: If the universe was made by a deity for us, then how can we be intruders in it, and how could we—by means that do not transgress the laws of this deity—defile it? This seems a perfectly reasonable use of logic, once you have allowed for the possibility of an omniscient benevolence who gifted your species the entire planet you live on.

The two of these together seem pretty bizarre. But I’m sure you catch the drift. In each case, I would argue that the constriction of possibilities—to those supported by naturalistic explanations rather than supernatural ones—is actually a good thing. You are less likely to be prodded to explain how the natural world works when supernatural reasons are perfectly acceptable. And supernaturalism can prevent you from fully appreciating your own existence and the effects it has on the natural world. Under supernaturalism, you can still engage in logical arguments and intellectual activity. You can write books and go to seminars. Your neurons could be firing. But if you’re not thinking about reality, it doesn’t do you any good.

Religiosity or supernaturalism does not make you dumb. But perhaps it has the broader effect of making it more difficult to fasten minds onto reality, as it fills the solution space with only those possibilities that have little bearing on the real world we live in. This would certainly show up in measures of educational achievement.

Stoet, G., & Geary, D. (2017). Students in countries with higher levels of religiosity perform lower in science and mathematics. Intelligence. DOI: 10.1016/j.intell.2017.03.001

Expert Knowledge: Birds and Worms


Pay attention to your thought process and how you use expert knowledge as you answer the question below. How do you think very young students would think about it?

Here are some birds and here are some worms. How many more birds than worms are there?

Hudson (1983) found that, among a small group of first-grade children (mean age of 7.0), just 64% completed this type of task correctly. However, when the task was rephrased as follows, all of the students answered correctly.

Here are some birds and here are some worms. Suppose the birds all race over, and each one tries to get a worm. Will every bird get a worm? How many birds won’t get a worm?

This is consistent with adults’ intuitions about the two tasks as well. Members of the Google+ mathematics education community were polled on the two birds-and-worms tasks recently, and, as of today, 69% predicted that more students would answer the second one correctly.

Interpret the Results

Still, what can we say about these results? Is it the case that 100% of the students used “their knowledge of correspondence to determine exact numerical differences between disjoint sets”? That is how Hudson describes students’ unanimous success in the second task. The idea seems to be that the knowledge exists; it’s just that a certain magical turn of phrase unlocks and releases this otherwise submerged expertise.

But that expert knowledge is given in the second task: “each one tries to get a worm.” The question paints the picture of one-to-one correspondence, and gives away the procedure to use to determine the difference. So, “their knowledge” is a bit of a stretch, and “used their knowledge” is even more of a stretch, since the task not only sets up a structure but animates its moving parts as well (“suppose the birds all race over”).

Further, questions about whether or not students are using knowledge they possess raise questions about whether or not students are, in fact, determining “exact numerical differences between disjoint sets.” On the contrary, it can be argued that students are simply watching almost all of a movie in their heads (a mental simulation)—a movie for which we have provided the screenplay—and then telling us how it ends (spoiler: 2 birds don’t get a worm). The deeper equivalence between the solution “2” and the response “2” to the question “How many birds won’t get a worm?” is evident only to a knowledgeable onlooker.
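The contrast between the two tasks can be sketched in code. This is only an illustration: the function names and the set sizes (five birds, three worms) are assumptions made here, and only the difference of 2 comes from the discussion above. The first function computes an abstract numerical difference; the second simply acts out the story the "won't get" wording supplies.

```python
def how_many_more(birds, worms):
    """Abstract comparison: the numerical difference between two set sizes."""
    return len(birds) - len(worms)


def birds_without_worm(birds, worms):
    """Act out the 'won't get' story: each bird races over and takes one worm."""
    remaining_worms = list(worms)
    left_out = []
    for bird in birds:
        if remaining_worms:
            remaining_worms.pop()   # this bird gets a worm
        else:
            left_out.append(bird)   # no worms are left for this bird
    return len(left_out)


# Hypothetical sets, chosen only so that the difference is 2.
birds = ["bird1", "bird2", "bird3", "bird4", "bird5"]
worms = ["worm1", "worm2", "worm3"]

print(how_many_more(birds, worms))       # 2
print(birds_without_worm(birds, worms))  # 2
```

The two functions agree here, but they are not the same computation. With more worms than birds, the first returns a negative number while the second returns 0, which is one way of seeing that the two questions are different questions.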

Experiment 3

Hudson anticipates some of the skepticism on display above when he introduces the third and last experiment in the series.

It might be argued, success in the Won’t Get task does not require a deep level of mathematical understanding; the children could have obtained the exact numerical differences by mimicking by rote the actions described by the problem context . . . In order to determine more fully the level of children’s understanding of correspondences and numerical differences, a third experiment was carried out that permitted a detailed analysis of children’s strategies for establishing correspondences between disjoint sets.

The wording in the Numerical Differences task of this third experiment, however, did not change. The “won’t get” locutions were still used. Yet, in this experiment, when paying attention to students’ strategies, Hudson observed that most children did not mentally simulate in the way directly suggested by the wording (pairing up the items in a one-to-one correspondence).

This does not defeat the complaint above, though. The fact that a text does not effectively compel the use of a procedure does not mean that it is not the primary influence on correct answers. It still seems more likely than not that participants who failed the “how many more” task simply didn’t have stable, abstract, transferable notions about mathematical difference. And the reformulation represented by the “won’t get” task influenced students to provide a response that was correct.

But this was a correct response to a different question. As adults with expert knowledge, we see the logical and mathematical similarities between the “how many more” and “won’t get” situations, and thus we are easily fooled into believing that applying skills and knowledge in one task is equivalent to doing so in the other.

Hudson, T. (1983). Correspondences and Numerical Differences between Disjoint Sets. Child Development, 54(1). DOI: 10.2307/1129864

Religiosity and Confidence in Science


In response to a question posed on Twitter recently asking why people from the U.K. seemed to show a great deal more interest in applying cognitive science to education than their U.S. counterparts, I suggested, linking to this article, that the differences in the religiosity of the two countries might play a role.

Princeton economist Roland Bénabou led a study, for instance, which found that religiosity and scientific innovation were negatively correlated. Across the world, regions with higher levels of religiosity also had lower levels of scientific and technical innovation—a finding which held even when controlling for income, population, and education. Bénabou commented in this article:

Much comes down to the political power of the religious population in a given location. If it is large enough, it can wield its strength to block new insights. “Disruptive new ideas and practices emanating from science, technical progress or social change are then met with greater resistance and diffuse more slowly,” comments Bénabou, citing everything from attempts to control science textbook content to efforts to cut public funding of certain kinds of research (for instance involving embryonic stem cells or cloned human embryos). In secular places, by contrast, “discoveries and innovations occur faster, and some of this new knowledge inevitably erodes beliefs in any fixed dogma.”


The study’s analysis also includes a comparison of U.S. States, which showed a similar negative correlation, as shown at the left.

Importantly, this kind of analysis has nothing to say about the effects of one’s personal religious beliefs on one’s innovativeness or acceptance of science. This song is not about you. It is a sociological analysis which suggests that the religiosity of the culture one finds oneself in (regardless of income and education levels) can have an effect on one’s exposure to scientific innovation.

Religiosity can have this effect at the political and cultural levels while simultaneously having a quite different effect (or no similar effect) at the personal level.

But About That Personal Level

Perhaps more apropos of the original question, researchers have found that individual religiosity is not significantly correlated with interest in science, nor with knowledge of science—but it is significantly negatively correlated with one’s confidence in scientific findings.

More religious individuals report the same interest levels and knowledge of science as less religious people, but they report significantly lower levels of confidence in science. This means that their lack of confidence is not a product of interest or ignorance but represents some unique uneasiness with science. . . .

Going a little further, the researchers provide this quote in the conclusion, which is as perfect an echo of educators’ qualms with education research (that I’ve heard) as can likely be found in literature discussing a completely different topic (emphases mine):

Religious individuals may be fully aware of the potential for material and physical gains through biotechnology, neuroscience, and other scientific advancements. Despite their knowledge of and interest in this potential, they may also hold deep reservations about the moral and spiritual costs involved . . . Religious individuals may interpret [questions about future harms and benefits from science] as involving spiritual and moral harms and benefits. Concerns about these harms and gains are probably moderated by a perception, not entirely unfounded given the relatively secular nature of many in the academic scientific community (Ecklund and Scheitle 2007; Ecklund 2010), that the scientific community does not share the same religious values and therefore may not approach issues such as biotechnology in the same manner as a religious respondent.

It may be, then, that educators surrounded by cultures with higher religiosity—and regardless of their own personal religious orientations—will simply have greater exposure to concerns about moral and spiritual harm that can be wrought by science, in addition to the benefits it can bring. Consistent with my own thinking about the subject, these concerns would be amplified in situations, like education, where science looks to produce effects on human behavior and cognition, especially children’s behavior and cognition.

Johnson, D., Scheitle, C., & Ecklund, E. (2015). Individual Religiosity and Orientation towards Science: Reformulating Relationships. Sociological Science, 2, 106-124. DOI: 10.15195/v2.a7

Provided Examples vs. Generated Examples

The results reported in this research (below) about the value of provided examples versus generated examples are a bit surprising. To get a sense of why that’s the case, start with this definition of the concept availability heuristic used in the study—a term from the social psychology literature:

Availability heuristic: the tendency to estimate the likelihood that an event will occur by how easily instances of it come to mind.

All participants first read this definition, along with the definitions of nine other social psychology concepts, in a textbook passage. Participants then completed two blocks of practice trials in one of three groups: (1) subjects in the provided examples group read two different examples, drawn from an undergraduate psychology textbook, of each of the 10 concepts (two practice blocks, so four examples total for each concept), (2) subjects in the generated examples group created their own examples for each concept (four generated examples total for each concept), and (3) subjects in the combination group were provided with an example and then created their own example of each concept (two provided and two generated examples total for each concept).

The researchers—Amanda Zamary and Katharine Rawson at Kent State University in Ohio—made the following predictions, with regard to both student performance and the efficiency of the instructional treatments:

We predicted that long-term learning would be greater following generated examples compared to provided examples. Concerning efficiency, we predicted that less time would be spent studying provided examples compared to generating examples . . . [and] long-term learning would be greater after a combination of provided and generated examples compared to either technique alone. Concerning efficiency, our prediction was that less time would be spent when students study provided examples and generate examples compared to just generating examples.

Achievement Results

All participants completed the same two self-paced tests two days later. The first assessment, an example classification test, asked subjects to classify each of 100 real-world examples into one of the 10 concept definition categories provided. Sixty of these 100 were new (Novel) to the provided-examples group, 80 of the 100 were new to the combination group, and of course all 100 were likely new to the generated-examples group. The second assessment, a definition-cued recall test, asked participants to type in the definition of each of the 10 concepts, given in random order. (The test order was varied among subjects.)

Given that participants in the provided-examples and combination groups had an advantage over participants in the generated-examples group on the classification task (they had seen between 20 and 40 of the examples previously), the researchers helpfully drew out results on just the 60 novel examples.
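The counts of novel items follow from the practice design described earlier. Here is a small sketch of that arithmetic, assuming (as the description implies) that every provided practice example reappeared among the 100 test items. The names are illustrative only.

```python
CONCEPTS = 10
TEST_ITEMS = 100

# Provided examples read per concept during practice, per the design above.
PROVIDED_PER_CONCEPT = {
    "provided": 4,     # two examples in each of two practice blocks
    "combination": 2,  # two provided, plus two self-generated
    "generated": 0,    # all four examples self-generated
}


def novel_items(provided_per_concept):
    """Number of test items a group had not seen during practice."""
    seen = provided_per_concept * CONCEPTS
    return TEST_ITEMS - seen


for group, per_concept in PROVIDED_PER_CONCEPT.items():
    print(f"{group}: {novel_items(per_concept)} novel items")
```

Restricting the comparison to the 60 items that were new to every group removes the head start that the provided-examples and combination groups would otherwise have had.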

Subjects who were given only textbook-provided examples of the concepts outperformed other subjects on applying these concepts to classifying real-world examples. This difference was significant. No significant differences were found on the cued-recall test between the provided-examples and generated-examples groups.

Also, Students’ Time Is Valuable

Another measure of interest to the researchers in this study, as mentioned above, was the time used by the participants to read through or create the examples. What the authors say about efficiency is worth quoting, since it does not often seem to be taken as seriously as measures of raw achievement (emphasis mine):

Howe and Singer (1975) note that in practice, the challenge for educators and researchers is not to identify effective learning techniques when time is unlimited. Rather, the problem arises when trying to identify what is most effective when time is fixed. Indeed, long-term learning could easily be achieved if students had an unlimited amount of time and only a limited amount of information to learn (with the caveat that students spend their time employing useful encoding strategies). However, achieving long-term learning is difficult because students have a lot to learn within a limited amount of time (Rawson and Dunlosky 2011). Thus, long-term learning and efficiency are both important to consider when competitively evaluating the effectiveness of learning techniques.

With that in mind, and given the results above, it is noteworthy that the provided-examples group outperformed the generated-examples group on real-world examples after engaging in practice that took less than half as much time. The researchers divided subjects’ novel classification score by the amount of time they spent practicing and determined that the provided-examples group had an average gain of 5.7 points per minute of study, compared to 2.2 points per minute for the generated-examples group and 1.7 points per minute for the combination group.
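The efficiency measure is just a ratio of score to practice time. A minimal sketch, with a hypothetical function name and made-up inputs that are not the study's data:

```python
def points_per_minute(novel_score, practice_minutes):
    """Efficiency: novel classification score per minute of practice."""
    if practice_minutes <= 0:
        raise ValueError("practice_minutes must be positive")
    return novel_score / practice_minutes


# Hypothetical inputs, chosen only to show the arithmetic.
print(points_per_minute(60.0, 12.0))  # 5.0
```

A group can therefore come out ahead on this measure either by scoring higher or by spending less time, and the provided-examples group did both.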

For learning declarative concepts in a domain and then identifying those concepts in novel real-world situations, provided examples proved to be better than student-generated examples for both long-term learning and for instructional efficiency. The second experiment in the study replicated these findings.

Some Commentary

First, some familiarity with the research literature makes the above results not so surprising. The provided-examples group likely outperformed the other groups because participants in that group practiced with examples generated by experts. Becoming more expert in a domain does not necessarily involve becoming more isolated from other people and their interests. Such expertise is likely positively correlated with better identifying and collating examples within a domain that are conceptually interesting to students and more widely generalizable. I reported on two studies, for example, which showed that greater expertise was associated with a significantly greater number of conceptual explanations, as opposed to “product oriented” (answer-getting) explanations—and these conceptual explanations resulted in the superior performance of students receiving them.

Second, I am sympathetic to the efficiency argument, as laid out here by the study’s authors—that is, I agree that we should focus in education on “trying to identify what is most effective when time is fixed.” Problematically, however, a wide variety of instructional actions can be informed by decisions about what is and isn’t “fixed.” Time is not the only thing that can be fixed in one’s experience. The intuition that students should “own their own learning,” for example, which undergirds the idea in the first place that students should generate their own examples, may rest on the more fundamental notion that students themselves are “fixed” identities that adults must work around rather than try to alter. This notion is itself circumscribed by the research summarized above. So, it is worth having a conversation about what should and should not be considered “fixed” when it comes to learning.

Zamary, A., & Rawson, K. (2016). Which Technique is most Effective for Learning Declarative Concepts—Provided Examples, Generated Examples, or Both? Educational Psychology Review. DOI: 10.1007/s10648-016-9396-9

Retrieval Practice with Kindle: Feel the Learn

I use Amazon’s free Kindle Reader for all of my (online and offline) book reading, except for any book that I really want that just can’t be had digitally. Besides notes and highlights, the Reader has a nifty little Flashcards feature that works really well for retrieval practice. Here’s how I do retrieval practice with Kindle.

Step 1: Construct the Empty Flashcard Decks

Currently I’m working through Sarah Guido and Andreas Müller’s book Introduction to Machine Learning with Python. I skimmed the chapters before starting and decided that the authors’ breakdown by chapter was pretty good—not too long and not too short. So, I made a flashcard deck for each chapter in the book, as shown at the right. On your Kindle Reader, click on the stacked cards icon. Then click on the large + sign next to “Flashcards” to create and name each new deck.

Depending on your situation, you may not have a choice in how you break things down. But I think it’s good advice to set up the decks—however far in advance you want—before you start reading.

So, if I were assigned to read the first half of Chapter 2 for a class, I would create a flashcard deck for the first half of Chapter 2 before I started reading. And, although I didn’t set titles in this example, it’s probably a good idea to give the flashcard deck a title related to what it’s about (e.g., Supervised Learning).

Step 2: Read, Just Read

You still need to read and comprehend the content. Retrieval practice adds, it doesn’t replace. So, I read and highlight and write notes like I normally would. I don’t worry at this point about the flashcards, about what is important or not. I just read for the pleasure of finding things out. I highlight things that strike me as especially interesting and write notes with questions, or comments I want to make on the text.

Read a section of the content represented by one flashcard deck. Since I divided my decks by chapter, I read the first chapter straight through, highlighting and making notes as I went.

The reading doesn’t have to be done in one sitting. The important thing is to just focus on reading one section before moving on to the next step.

Step 3: Create the Fronts for the Flashcards

Now, go through the content of your first section of reading and identify important concepts, items worth remembering, things you want to be able to produce. You’ll want to add these as prompts on your flashcards. You don’t necessarily have to write these all down in a list. You can enter a prompt on a flashcard, return to the text for another prompt, enter a prompt on another flashcard, and on and on.

Screenshot 1

Screenshot 2

Screenshot 3

When you have at least one prompt, click on the flashcard deck and then click on Add a Card (Screenshot 1) and enter the prompt.

Enter the prompt at the top. (Screenshot 2) This will be the front of the flashcard you will see when testing yourself. Leave the back blank for the moment. Click Save and Add Another Card at the bottom right to repeat this with more prompts.

When you are finished entering one card or all the cards, click on Save at the top right. This will automatically take you to the testing mode (Screenshot 3), which you’ll want to ignore for a while. Click on the stacked cards icon to return to the text for more prompts. When you come back to the flashcards, your decks may have shifted, since the most recently edited deck will be at the top.

Importantly, though, Screenshot 3 is the screen you will see when you return and click on a deck. To add more cards from this screen, click on the + sign at the bottom right. When you are done entering the cards for a section, get ready for the retrieval practice challenge! This is where it gets good (for learning).

Step 4: Create the Backs for the Flashcards

Rather than simply enter the backs of the flashcards from the information in the book, I first fill out the backs by simply trying to retrieve what I can remember. For example, for the prompt, “Write the code for the Iris model, using K Nearest Neighbors,” I wrote something like this on the back of the card:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(,

There are a lot of omissions here and some errors, and I moved things around after I wrote them down, but I tried as hard as I could to remember the code. To make the back of the card right, I filled in the omissions and corrected the errors. As I went through this process with all the cards in a section, I edited the fronts and backs of the cards and even added new cards as the importance of some material presented itself more clearly.
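For comparison, a corrected card back might look something like the sketch below. It is not a copy of my actual card or of the book's listing; it relies on scikit-learn's documented API, and the specific choices (random_state=0, n_neighbors=1, the default 75/25 split) are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the Iris data: 150 flowers, 4 measurements each, 3 species.
iris_dataset = load_iris()

# Hold out part of the data for testing (default split: 75% train, 25% test).
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0
)

# Fit a k-nearest-neighbors classifier that consults a single neighbor.
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

# Fraction of held-out flowers whose species is predicted correctly.
accuracy = knn.score(X_test, y_test)
print(f"Test set accuracy: {accuracy:.2f}")
```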

Create the backs of the flashcards for a section by first trying as hard as you can to retrieve the information asked for in the prompt. Then, correct the information and fill in omissions. Repeat this for each card in the deck.

Step 5: Test Yourself and Feel the Learn

One thing you should notice when you do this is that it hurts. And it should. In my view, the prompts should not be easy to answer. Another prompt I have for a different chapter is “Explain how k-neighbors regression works for both 1 neighbor and multiple neighbors.” My expectations for my response are high—I want to answer completely with several details from the text, not just a mooshy general answer. I keep the number of cards per chapter fairly low (about 5 to 10 cards per 100 pages). But your goals for retaining information may be different.
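To give a sense of the level of detail I am after on that card, the core idea of k-neighbors regression can be written in a few lines. This is a plain-Python sketch with a made-up one-feature dataset, not the book's code and not scikit-learn's implementation: with one neighbor, the prediction is the target of the closest training point, and with several neighbors, it is the mean of their targets.

```python
def knn_regress(train_x, train_y, query, k=1):
    """Predict a target for `query` as the mean target of its k nearest points."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training points by distance to the query (one feature, so abs works).
    by_distance = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))
    nearest_targets = [y for _, y in by_distance[:k]]
    return sum(nearest_targets) / k


# Made-up training data for illustration.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [1.0, 3.0, 2.0, 5.0]

print(knn_regress(train_x, train_y, 2.2, k=1))  # 3.0 (target of the nearest point, x=2.0)
print(knn_regress(train_x, train_y, 2.2, k=3))  # 2.0 (mean of 3.0, 2.0, and 1.0)
```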

But once you have a set of cards for a section, come back to them occasionally and complete a round of testing for the section. To test yourself, click on the deck and respond to the first prompt you see without looking at the answer. Try to be as complete (correct) as possible before looking at the correct response.

To view the correct response, click on the card. Then, click on the checkmark if you completely nailed the response. Anything short of that, I click on the red X.

For large decks, you may want to restudy those items you got incorrect. In that case, you can click on Study Incorrect to go back over just those cards you got wrong. There is also an option to shuffle the deck (at the bottom left), which you should make use of if the cards build on each other, making them too predictable.

For more information on retrieval practice, go talk to The Learning Scientists (on Twitter as @AceThatTest).

ResearchEd: Getting Beyond Appearances

Not from ResearchEd.

The Beuchet Chair shown at the right is a fun visual illusion—a trick involving distance and perspective—and illusions like it are solid, predictable go-tos for anyone trying to make the case for the importance of learning about science and research at events like ResearchEd.

The idea is to show you how appearances can be deceiving, how your own cognitive apparatus is not designed to present the world to you perfectly as it is, and that, most importantly, experiences alone, whether isolated or combined, do not reliably illuminate the hidden patterns and regularities which govern our lives and the natural world.

Once this doubt is sown, what we hope happens next is that you will re-evaluate your beliefs about the world as you continue to move through your life, strengthening some of them with better explanations and justifications, loosening the threads of others, and considering new beliefs and motives, too.

And central to this ongoing project for those of us both inside and outside of science are, I think, two tendencies, represented at some of the sessions I attended at the ResearchEd Washington event last week:

  1. The tendency to distrust the superficial, shallow, easy, or popular—those things that are, like the illusion above, true only from a limited perspective. It is the tendency to be dissatisfied with everyday explanations, short-term thinking, folk wisdom, and faith-based certainty.
  2. The tendency to seek out deep explanations rather than ephemeral ones—a preference for connected, theoretical (though still fallible), conceptual knowledge, which “constitutes the means society uses to transcend the limits of individual experience to see beyond appearances to the nature of relations in the natural and social world.”

Robert Pondiscio: Why Knowledge Matters

Robert Pondiscio’s session was as pure a distillation of this latter tendency as you’ll find. Robert memorably contrasted two reactions to President Obama’s inauguration: one which expressed an elation that the United States now had a president who looked like many underrepresented students, and one which expressed a deep connection to the nearly 50 years of American history that came full circle on January 20, 2009, a history that could not be accessed except by the knowledgeable.

He cautioned that the two reactions are not mutually exclusive, while still driving home the importance of conceptual knowledge and the school’s vital role in providing students access to it.

Knowing stuff is pretty exciting!

I was reminded, again, of scenes we often see when something of astronomical importance has just happened—that roomful of jubilant scientists at, say, the Jet Propulsion Laboratory.

Sure, the images of, say, the Mars Rover’s safe landing come along eventually. But pictures are not what gets these folks excited. It’s data. Data that says the Rover has entered orbit, has deployed the parachute, has fired its rockets. What causes all the excitement is, quite literally, knowledge.

The Learning Scientists: Teaching the Science of Learning

The Learning Scientists continued to reinforce the power of investigating the deep and often hidden patterns and regularities involved in education as they presented evidence for the benefits of spaced practice and retrieval practice on student learning.

Many lifetimes lived out in close proximity to children and students have failed to systematically reveal these robust effects on learning. Yet, stand back, apply a little (1) and (2) from above, and you get results that help overturn the destructive notion that the brain is like a tape recorder. While it would be a mistake to assume that a result is true just because it’s counterintuitive, results around spacing and retrieval often are, even to the participants in the study.

Dylan Wiliam, Ben Riley

What I took away from Dylan’s keynote and Ben’s presentation (and from the Learning Scientists’ session), other than what they were about (info on Ben’s session here), is this: while I am attracted to those ideas in education that feature a suspicion of everyday thinking and a search for deeper regularities, it is absolutely vital that we have people in our community who can bring this search for general meaning to our everyday thinking. (And not the other way around, which is essentially searching for empirical justification for low-level theorizing, also called just-so stories.) These are people who understand the realities of the classroom, where much of what is discovered in education science must play out. People who are much more diplomatic than I, but with whom I could easily find common cause.

Because we all have a desire to see learning science and other education research have a tangible, practical, positive effect on students’ (and teachers’) lives. But we can’t pull it off alone. We have such a great start in connecting research with practice in groups like ResearchEd!

Contiguity Effective for Deductive Inference

The discourse that surrounds the technicalities in this paper contains an agenda: to convince readers that the benefits of retrieval practice extend beyond the boring old “helps you remember stuff” caricature to something more “higher order” like deductive inference. But I’m not convinced that the experiments show this. Rather, what they demonstrate fairly convincingly is that informational contiguity, not retrieval practice, benefits inference-making. A related result from the research, on the benefits of text coherence, is explained here.

The Setup

The arch-nemesis of this research is a paper published last year by Tran et al., which appeared to show some domain limitations on retrieval practice:

They found that retrieval practice failed to benefit participants’ later ability to make accurate deductions from previously retrieved information. In their study, participants were presented sentences one at a time to learn . . . The sentences could be related to one another to derive inferences about particular scenarios. Although retrieval practice was shown to improve memory of the sentences relative to a restudy control condition, there was no benefit on a final inference test that required integration of information from across multiple sentences.


So, in this study, researchers replicated Tran et al.’s methods, except in one important way: they did not present the sentences to be learned one at a time but together instead.

Participants were each presented with four scenarios (two of which are outlined at right) consisting of seven to nine premises in the form of sentences. In each scenario, deductions to specific conclusions were possible. For two of the four scenarios, subjects used retrieval practice. They were given a chance to read the sentences in a scenario at their own pace and then were shown the sentences again for five minutes—in cycles where the order of the sentences was randomized. During this five-minute session, subjects were asked to type in the missing words in each premise (between one and three missing words). The complete sentences were then shown as feedback. Each participant used restudy for the other two scenarios. During the restudy five-minute session, subjects simply reread the premises again, in cycles again, with the order of the premises randomized for each cycle.
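The retrieval-practice cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the premises are invented, the blanked words are chosen at random (the study presumably chose them deliberately), the five-minute time limit is left out, and `make_cloze` and `practice_cycle` are hypothetical helper names, not anything from the paper.

```python
import random

def make_cloze(premise, n_missing, rng):
    """Blank out n_missing words; return the prompt and the removed words."""
    words = premise.split()
    hidden = set(rng.sample(range(len(words)), n_missing))
    prompt = " ".join("____" if i in hidden else w for i, w in enumerate(words))
    answers = [w for i, w in enumerate(words) if i in hidden]
    return prompt, answers

def practice_cycle(premises, rng):
    """One pass through every premise, in a fresh random order."""
    order = list(premises)
    rng.shuffle(order)
    trials = []
    for premise in order:
        # Between one and three words are removed from each premise.
        prompt, answers = make_cloze(premise, rng.randint(1, 3), rng)
        # The complete sentence is what the learner sees as feedback.
        trials.append({"prompt": prompt, "answers": answers, "feedback": premise})
    return trials

# Invented premises, for illustration only.
premises = [
    "The museum is north of the library.",
    "The library is north of the station.",
    "The station is west of the harbor.",
]
rng = random.Random(0)
for trial in practice_cycle(premises, rng):
    print(trial["prompt"], "->", trial["answers"])
```

A restudy cycle would be the same loop with the full premise shown in place of the prompt.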

The Results and Discussion

Two days later, participants were given a 32-item multiple-choice test which “assessed participants’ ability to draw logical conclusions derived from at least two premises within each scenario.” Consistent with the researchers’ hypothesis, the retrieval practice conditions yielded significantly better results on this test of deductive inference than did the restudy conditions.

Yet, it’s not at all clear that retrieval practice was the cause of the better performance with respect to inference-making. There was another cause preceding it: the improved contiguity of the presented information, as compared with Tran et al.’s one-at-a-time procedure. It’s possible that the effectiveness of retrieval practice is limited to recall of already-integrated information, and the contiguity of the premises in this study allowed for such integration, which, in turn, allowed retrieval practice to outperform restudy. It is a possibility the researchers raise in the paper and one that, in my view, the current research has not effectively answered:

However, other recent studies have failed to find a benefit of retrieval practice for learning educational materials (Leahy et al. 2015; Tran et al. 2015; Van Gog and Sweller 2015). These studies all used learning materials that required learners to simultaneously relate multiple elements of the materials during study and/or test. Such materials that are high in element interactivity need constituent elements to be related to one another in order for successful learning or task completion (element interactivity may also be considered as a measure of the complexity of materials, see (Sweller 2010)).

What we can say, with some confidence, is that even if the benefits of retrieval practice were limited to improvements in recall (as prior research has demonstrated), such improvements do not stand in the way of improvements to higher-order reasoning, such as inference-making. (And shaping the path for students, such as by improving informational contiguity, can have a positive effect too.)

Eglington, L., & Kang, S. (2016). Retrieval Practice Benefits Deductive Inference. Educational Psychology Review. DOI: 10.1007/s10648-016-9386-y

Inference Calls in Text

Britton and Gülgöz (1991) conducted a study to test whether removing “inference calls” from text would improve retention of the material. Inference calls are locations in text that demand inference from the reader. One simple example from the text used in the study is below:

Air War in the North, 1965

By the Fall of 1964, Americans in both Saigon and Washington had begun to focus on Hanoi as the source of the continuing problem in the South.

There are at least a few inferences that readers need to make here. Readers need to infer the causal link between “the fall of 1964” and “1965,” they are asked to infer that “North” in the title refers to North Vietnam, and they need to infer that “Hanoi” refers to the capital of North Vietnam.

The authors of the study identified 40 such inference calls (using the “Kintsch” computer program) throughout the text and “repaired” them to create a new version called a “principled revision.” Below is their rewrite of the text above, which appeared in the principled revision:

Air War in the North, 1965

By the beginning of 1965, Americans in both Saigon and Washington had begun to focus on Hanoi, capital of North Vietnam, as the source of the continuing problem in the South.

Two other versions (revisions), the details of which you can read about in the study, were also produced. These revisions acted as controls in one way or another for the original text and the principled revision.

Method and Predictions

One hundred seventy college students were each randomly assigned one of the four texts: the original or one of the three revisions. The students were asked to read the texts carefully and were informed that they would be tested on the material. Eighty subjects took a free recall test, in which they were asked to write down everything they could remember from the text. The other ninety subjects took a ten-question multiple-choice test on the information explicitly stated in each text.

It’s not at all difficult, given this setup, to anticipate the researchers’ predictions:

We predicted that the principled revision would be retrieved better than the original version on a free-recall test. This was because the different parts of the principled revision were more likely to be linked to each other, so the learner was more likely to have a retrieval route available to use…. Readers of the original version would have to make the inferences themselves for the links to be present, and because some readers will fail to make some inferences, we predicted that there would be more missing links among readers of this version.

This is, indeed, what researchers found. Subjects who read the principled revision recalled significantly more propositions from the text (adjusted mean = 58.6) than did those who read the original version (adjusted mean = 35.5). Researchers’ predictions for the multiple-choice test were also accurate:

On the multiple-choice test of explicit factual information that was present in all versions, we predicted no advantage for the principled revision. Because we always provided the correct answer explicitly as one of the multiple choices, the learner did not have to retrieve this information by following along the links but only had to test for his or her recognition of the information by using the stem and the cue that was presented as one of the response alternatives. Therefore, the extra retrieval routes provided by the principled revision would not help, because according to our hypothesis, retrieval was not required.

Analysis and Principles

Neither of the two results mentioned above is surprising, but the latter is interesting. Although we might say that students “learned more” from the principled revision, subjects in the original and principled groups performed equally well on the multiple-choice test (which tests recognition, as opposed to free recall). As the researchers noted, this result was likely due to the fact that repairing the inference calls provided no advantage to the principled group in recognizing explicit facts, only in connecting ideas in the text.

But the result also suggests that students who were troubled by inference calls in the text just skipped over them. Indeed, there was no significant difference in reading rate between subjects who read the original text and subjects who read the principled revision, and both groups finished in about the same amount of time. Yet students who read the original text recalled significantly less than those who read the principled revision.

In repairing the inference calls, the authors of the study identified three principles for better texts:

Principle 1: Make the learner’s job easier by rewriting the sentence so that it repeats, from the previous sentence, the linking word to which it should be linked. Corollary of Principle 1: Whenever the same concept appears in the text, the same term should be used for it.

Principle 2 is to make the learner’s job easier by arranging the parts of each sentence so that (a) the learner first encounters the old part of the sentence, which specifies where that sentence is to be connected to the rest of his or her mental representation; and (b) the learner next encounters the new part of the sentence, which indicates what new information to add to the previously specified location in his or her mental representation.

Principle 3 is to make the learner’s job easier by making explicit any important implicit references; that is, when a concept that is needed later is referred to implicitly, refer to it explicitly if the reader may otherwise miss it.

Britton, B., & Gülgöz, S. (1991). Using Kintsch’s computational model to improve instructional text: Effects of repairing inference calls on recall and cognitive structures. Journal of Educational Psychology, 83 (3), 329-345 DOI: 10.1037//0022-0663.83.3.329

Are Teaching and Learning Coevolved?

some hummingbirds coevolved with some flower species

Just a few pages into David Didau and Nick Rose’s new book What Every Teacher Needs to Know About Psychology, and I’ve already come across what is, for me, a new thought: that teaching ability and learning ability coevolved.

Strauss, Ziv, and Stein (2002) . . . point to the fact that the ability to teach arises spontaneously at an early age without any apparent instruction and that it is common to all human cultures as evidence that it is an innate ability. Essentially, they suggest that despite its complexity, teaching is a natural cognition that evolved alongside our ability to learn.

Or perhaps this is, even for me, an old thought, but just unpopular enough—and for long enough—to seem like a brand new thought. Perhaps after years of exposure to the characterization of teaching as an anti-natural object—a smoky, rusty gearbox of torture techniques designed to break students’ wills and control their behavior—I have simply come to accept that it is true, and have forgotten that I had done so.

Strauss et al., however, provide some evidence in their research that it is not true. Very young children engage in teaching behavior before formal schooling by relying on a naturally developing ability to understand the minds of others, known as theory of mind (ToM).

Kruger and Tomasello (1996) postulated that defining teaching in terms of its intention—to cause learning, suggests that teaching is linked to theory of mind, i.e., that teaching relies on the human ability to understand the other’s mind. Olson and Bruner (1996) also identified theoretical links between theory of mind and teaching. They suggested that teaching is possible only when a lack of knowledge can be recognized and that the goal of teaching then is to enhance the learner’s knowledge. Thus, a theory of mind definition of teaching should refer to both the intentionality involved in teaching and the knowledge component, as follows: teaching is an intentional activity that is pursued in order to increase the knowledge (or understanding) of another who lacks knowledge, has partial knowledge or possesses a false belief.

The Experiment

One hundred children were separated into 50 pairs—25 pairs with a mean age of 3.5 and 25 with a mean age of 5.5. Twenty-five of the 50 children in each age group served as test subjects (teachers); the other 25 were learners. The teachers completed three groups of tasks before teaching, the first of which (1) involved two classic false-belief tasks. If you are not familiar with these kinds of tasks, the video at right should serve as a delightfully creepy precis—from what appears to be the late 70s, when every single instructional video on Earth was made. The second and third groups of tasks probed participants’ understanding that (2) a knowledge gap between teacher and learner must exist for “teaching” to occur and (3) a false belief about this knowledge gap is possible.

Finally, children participated in the teaching task by teaching the learners how to play a board game. The teacher-children were, naturally, taught how to play the game prior to their own teaching, and they were allowed to play the game with the experimenter until they demonstrated some proficiency. The teacher-learner pair was then left alone, “with no further encouragement or instructions.”

The Results

Consistent with the results from prior false-belief studies, there were significant differences between the 3- and 5-year-olds in Tasks (1) and (3) above, both of which relied on false-belief mechanisms. In Task (3), when participants were told, for example, that a teacher thought a child knew how to read when in fact he didn’t, 3-year-olds were much more likely to say that the teacher would still teach the child. Five-year-olds, on the other hand, were more likely to recognize the teacher’s false belief and say that he or she would not teach the child.

Intriguingly, however, the development of a theory of mind does not seem necessary either to recognizing the need for a special type of discourse called “teaching” or to teaching ability itself, only to a refinement of teaching strategies. Task (2), in which participants were asked, for instance, whether a teacher would teach someone who knew something or someone who didn’t, showed no significant differences between 3- and 5-year-olds in the study. But the groups were significantly different in the strategies they employed during teaching.

Three-year-olds have some understanding of teaching. They understand that in order to determine the need for teaching as well as the target learner, there is a need to recognize a difference in knowledge between (at least) two people . . . Recognition of the learner’s lack of knowledge seems to be a necessary prerequisite for any attempt to teach. Thus, 3-year-olds who identify a peer who doesn’t know [how] to play a game will attempt to teach the peer. However, they will differ from 5-year-olds in their teaching strategies, reflecting the further change in ToM and understanding of teaching that occurs between the ages of 3 and 5 years.

Coevolution of Teaching and Learning

The study here dealt with the innateness of teaching ability and sensibilities but not with whether teaching and learning coevolved, which it mentions at the beginning and then leaves behind.

It is an interesting question, however. Discussions in education are increasingly focused on “how students learn,” and it seems to be widely accepted that teaching should adjust itself to what we discover about this. But if teaching is as natural a human faculty as learning—and coevolved alongside it—then this may be only half the story. How students (naturally) learn might be caused, in part, by how teachers (naturally) teach, and vice versa. And learners perhaps should be asked to adjust to what we learn about how we teach as much as the other way around.

Those seem like new thoughts to me. But they’re probably not.

Strauss, S., Ziv, M., & Stein, A. (2002). Teaching as a natural cognition and its relations to preschoolers’ developing theory of mind. Cognitive Development, 17 (3-4), 1473-1487. DOI: 10.1016/S0885-2014(02)00128-4

Problem Solving, Instruction: Chicken, Egg

problem solving before instruction

We’ve looked before at research which evaluated the merits of different instructional sequences like problem solving before instruction.

In this post, for example, I summarized a research review by Rittle-Johnson that revealed no support for the widespread belief that conceptual instruction must precede procedural instruction in mathematics. The authors of that review went so far as to call the belief (one held and endorsed by the National Council of Teachers of Mathematics) a myth. And another study, summarized in this post, finds little evidence for another very popular notion about instruction—that cognitive conflict of some kind is a necessary prerequisite to learning.

The review we will discuss in this post looks at studies which compared two types of teaching sequences: problem solving followed by instruction (PS-I) and instruction followed by problem solving (I-PS). As far as horserace comparisons go, the main takeaway is shown in the table below. Each positive (+) is a study result which showed that PS-I outperformed an I-PS control, each equals sign (=) a result where the two conditions performed the same, and each negative (–) a result where I-PS outperformed PS-I.

Procedural: = = = = = = = = =   – –

Conceptual: + + + +   = = =   – –

Transfer: + + + +   = = = =

Summary of learning outcomes for PS-I vs. I-PS.

Importantly, 8 of the results reviewed are not represented in the table above. In these results, the review authors suggest, participants in the PS-I conditions were given better learning resources than those in the I-PS conditions. This difference confounded those outcomes (see Greg’s post on this) and, unsurprisingly, added 15 plusses, 7 equals, and just 1 negative to the overall picture of PS-I.

When research has more fairly compared PS-I with I-PS, it has concluded that, in general, the sequence doesn’t matter all that much, though there are some positive trends on conceptual and transfer assessments for PS-I. Even if we ignore the Procedural column, roughly 55% of the results are equal or negative for PS-I. Whether you place problem solving before instruction or after it appears to make little reliable difference.
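That rough percentage can be checked by tallying the symbols in the table. A minimal Python sketch, assuming the symbols group by column as Procedural (9 equal, 2 negative), Conceptual (4 positive, 3 equal, 2 negative), and Transfer (4 positive, 4 equal). That grouping is my reading of the table, and `outcomes` and `share_not_positive` are names made up for illustration.

```python
from fractions import Fraction

# Assumed reading of the table: counts of (+, =, -) results per outcome type.
outcomes = {
    "Procedural": {"+": 0, "=": 9, "-": 2},
    "Conceptual": {"+": 4, "=": 3, "-": 2},
    "Transfer": {"+": 4, "=": 4, "-": 0},
}

def share_not_positive(columns):
    """Fraction of results in the given columns that are '=' or '-'."""
    total = sum(sum(outcomes[c].values()) for c in columns)
    not_positive = sum(outcomes[c]["="] + outcomes[c]["-"] for c in columns)
    return Fraction(not_positive, total)

# Ignore the Procedural column, as in the text above.
share = share_not_positive(["Conceptual", "Transfer"])
print(f"{share} = {float(share):.0%}")
```

Under that reading, 9 of the 17 conceptual and transfer results are equal or negative for PS-I, or about 53%, in line with the rough figure above.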

Contrasting Cases and Building on Student Solutions

Horserace aside (sort of), an intriguing discussion in this review centers around two of the confounds identified above—those extra benefits provided in some studies to learners in the ‘problem solving before instruction’ conditions. They were (1) using contrasting cases during problem solving and (2) building on student solutions during instruction. Here the authors describe contrasting cases (I’ve included their example from the paper):

Contrasting cases consist of small sets of data, examples, or strategies presented side-by-side (e.g., Schwartz and Martin 2004; Schwartz and Bransford 1998). These minimal pairs differ in one deep feature at a time ceteris paribus [other things being equal], thereby highlighting the target features. In the example provided in the right column of Table 2, the datasets of player A and player B differ with regard to the range, while other features (e.g., mean, number of data points) are held constant. The next pair of datasets addresses another feature: player B and C have the same mean and range but different distribution of the data points.
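The structure of these minimal pairs is easy to check numerically. Here is a small Python sketch with made-up scores (the numbers for players A, B, and C are invented for illustration and are not the paper's data): A and B share a mean and a number of data points but differ in range, while B and C share a mean and a range but differ in how the points are distributed.

```python
from statistics import mean, pstdev

# Hypothetical scores, invented for illustration (not taken from the paper).
players = {
    "A": [4, 5, 5, 5, 6],
    "B": [1, 3, 5, 7, 9],
    "C": [1, 5, 5, 5, 9],
}

def describe(scores):
    """Summarize the features a contrasting pair holds constant or varies."""
    return {
        "n": len(scores),
        "mean": mean(scores),
        "range": max(scores) - min(scores),
        "spread": round(pstdev(scores), 2),  # stands in for "distribution"
    }

features = {name: describe(scores) for name, scores in players.items()}
for name, summary in features.items():
    print(name, summary)
```

Each pair changes exactly one feature, which is what lets the learner's attention land on that feature and nothing else.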

There’s something funny about this, I have to admit, given the soaring rhetoric one encounters in education about the benefits of “rich” problems and the awkwardness of textbook problems. Although they are confounds in these studies, contrasting cases manage to be helpful to learning in PS-I as sets of (a) small, (b) artificial problems which (c) vary one idea at a time. “Rich” problems, in contrast, do not show the same positive effects.

And here, some more detail about building on student solutions in instruction. The only note I have here is that it seems worthwhile to point out the obvious: that this confound which also improves learning in ‘problem solving before instruction’ has almost everything to do with the I, and very little to do with the PS:

Another way of highlighting deep features in problem solving before instruction is to compare non-canonical student solutions to each other and to the canonical solution during subsequent instruction (e.g., Kapur 2012; Loibl and Rummel 2014a). Explaining why erroneous solutions are incorrect has been found to be beneficial for learning, in some cases even more than explaining correct solutions (Booth et al. 2013). Furthermore, the comparison supports students in detecting differences between their own prior ideas and the canonical solution (Loibl and Rummel 2014a). More precisely, through comparing them to other students’ solution and to the canonical solution, students experience how their solution approaches fail to address one or more important aspects of the problem (e.g., diSessa et al. 1991). This process guides students’ attention to the deep features addressed by the canonical solution (cf. Durkin and Rittle-Johnson 2012).

Loibl, K., Roll, I., & Rummel, N. (2016). Towards a Theory of When and How Problem Solving Followed by Instruction Supports Learning. Educational Psychology Review. DOI: 10.1007/s10648-016-9379-x