So Easy a “Housewife” Could Do It?

Two years before the United States put men on the moon (some context for the word “housewife” in the title), William James Popham and colleagues conducted two very interesting, and to a 21st-century reader bizarre, education experiments in southern California, designed to validate a test they had developed to measure teaching proficiency.

Instructors in both studies were given a specific set of social science objectives, a list of possible activities they could use, and then time with high-school students. Each instructor’s “proficiency” was judged solely by how well students did on a post-test of those objectives after instruction, relative to a pre-test given before instruction. Thus, rather than being rated on how well he or she followed “good teaching procedures,” an instructor’s performance was measured only by student growth from pre-test to post-test after one instructional session.

What’s fascinating about these experiments is to whom the researchers compared experienced teachers with regard to achieving these instructional objectives: “housewives” and college students:

Our plan has been to attempt a validation wherein the performance of nonteachers (housewives or college students, for example) is pitted against that of experienced teachers. The validation hypothesis predicts that the experienced teachers will secure better pupil achievement than will the nonteachers. This particular hypothesis, of course, is an extremely gross test in that we wish to secure a marked contrast between (1) those that have never taught, and (2) those who have taught for some time.

Keep in mind that the purpose of this study was not simply to compare the instructional performances of teachers and nonteachers; it was to see if the measure they had developed (student growth) would pick up a difference between the groups. In theory, of course (their theory), differences in student growth between teacher-taught and nonteacher-taught students should be noticeable on a test purporting to measure teacher proficiency.

It is also worth emphasizing that instructional approaches were not prescribed by the researchers. The various instructors were simply given a list of suggested activities, which could have been immediately thrown away if the instructor so chose.

Results, People, and Procedures

Let’s just skip to the results and work our way out from there. In short, there was no difference between experienced teachers and either “housewives” or college students in effecting student growth. This first table compares experienced teachers and “housewives,” with each group’s six instructors shown as a separate block. It shows the “student growth” from pre-test to post-test by instructor:

Subjects   n   Mean Post-Test   Mean Pretest
   1       4       59.2             11.2
   2       4       58.2              8.4
   3       3       57.6             12.3
   4       4       67.2             14.6
   5       4       51.5              9.8
   6       3       60.0             15.8

   1       4       56.5              9.5
   2       3       59.3             12.4
   3       4       63.7             14.2
   4       3       64.0             14.3
   5       4       58.0             12.3
   6       4       61.7              8.9
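For the curious, the gains implied by the table are easy to recompute. Here’s a quick sketch in Python; treating the two blocks of six instructors as the two groups and comparing their mean pre-to-post gains with a Welch t statistic is my own back-of-the-envelope check, not the authors’ published analysis:

```python
import math
import statistics as st

# (post-test mean, pre-test mean) per instructor, copied from the table.
block_a = [(59.2, 11.2), (58.2, 8.4), (57.6, 12.3),
           (67.2, 14.6), (51.5, 9.8), (60.0, 15.8)]
block_b = [(56.5, 9.5), (59.3, 12.4), (63.7, 14.2),
           (64.0, 14.3), (58.0, 12.3), (61.7, 8.9)]

def gains(block):
    """Pre-to-post growth for each instructor's small group."""
    return [post - pre for post, pre in block]

def welch_t(xs, ys):
    """Welch's two-sample t statistic (unequal variances assumed)."""
    mx, my = st.mean(xs), st.mean(ys)
    vx, vy = st.variance(xs), st.variance(ys)  # sample variances (n - 1)
    return (mx - my) / math.sqrt(vx / len(xs) + vy / len(ys))

ga, gb = gains(block_a), gains(block_b)
print(round(st.mean(ga), 1), round(st.mean(gb), 1))  # mean gain per group
print(round(welch_t(ga, gb), 2))                     # small |t|: groups look alike
```

Both groups gain between 46 and 49 points on average, and the t statistic is nowhere near significance, which matches the “no difference” result.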

The 6 “experienced teachers” in the study were actually student teachers, yet these six represented half of the mere 6.5% of candidates who met the following criteria: “(1) they were social science majors; (2) they had completed at least one quarter of student teaching in which they were judged superior by their supervisors; and (3) they had received a grade of ‘A’ in a pre-service curriculum and instruction class which emphasized the use of behavioral objectives and the careful planning of instruction to achieve these goals.”

The 6 nonteachers in this first experiment were “housewives” who (1) were not enrolled in school at the time of the study, (2) did not have teaching experience, (3) completed at least 2 years of college, and (4) were social science majors.

As you can no doubt already tell, the whole gestalt here is very Skinnerian, very “behavioral” (it’s the mid-60s!), so I’ll just quote selectively from the article about the procedure used in the first experiment:

The subjects [the instructors] were selected three weeks prior to the day of the study. . . . All subjects were mailed a copy of the resource unit, including the list of 13 objectives and sample test items for each. An enclosed letter related that the purpose of the study was to try out a new teaching unit for social studies. They also received a set of directions telling them where to report, that they would have six hours in which to teach, that they would be teaching a group of three or four high school students, and that they should try to teach all of the objectives. . . .

Learners reported at 8:45 in the morning, and the Wonderlic Personnel Test, a 12 minute test of mental ability, was administered. The learners were next allowed 15 minutes to complete the 33 item pretest . . . Students were then assigned and dispersed to their rooms.

At 9:30 a.m. all learners and teachers were in their designated places and each of the 12 teachers commenced his instruction. After a 45 minute lunch break, instruction was resumed and continued until 4:00 p.m., at which time the high school students . . . were first given the 68 item post-test measuring each of the explicit objectives. They next completed a questionnaire (found in Appendix D) designed to measure their feelings about the content of the unit and the instruction they received. [No significant “affective” differences between the teachers and nonteachers either.]

The next table compares experienced teachers and college students on mean post-test scores, a comparison that was conducted in a second experiment. Only a post-test was given:

Class   Teacher   Nonteacher
  1      31.49      36.89
  2      33.14      33.79
  3      33.07      31.26
  4      35.07      35.85
  5      37.78      34.47
  6      34.61      30.75
  7      28.41      34.54
  8      34.51      30.82
  9      35.86      34.76
 10      32.43      27.63
 11      29.25      30.16
 12      31.79      31.13
 13      31.79      27.86
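Because this experiment produced one teacher mean and one nonteacher mean per class, a paired comparison is the obvious back-of-the-envelope check. Again a quick Python sketch; the pairing by class and the paired t statistic are my own reading of the table, not the authors’ analysis:

```python
import math
import statistics as st

# Mean post-test score per class: (teacher-taught, nonteacher-taught).
classes = [(31.49, 36.89), (33.14, 33.79), (33.07, 31.26), (35.07, 35.85),
           (37.78, 34.47), (34.61, 30.75), (28.41, 34.54), (34.51, 30.82),
           (35.86, 34.76), (32.43, 27.63), (29.25, 30.16), (31.79, 31.13),
           (31.79, 27.86)]

# Per-class difference (teacher minus nonteacher) and a paired t statistic.
diffs = [t - n for t, n in classes]
mean_d = st.mean(diffs)
t_stat = mean_d / (st.stdev(diffs) / math.sqrt(len(diffs)))

print(round(mean_d, 2))   # average teacher advantage, in test points
print(round(t_stat, 2))   # well under the ~2.18 critical value for 12 df
```

The per-class differences average well under a point in the teachers’ favor, again consistent with the “no difference” result.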

This second experiment went down differently from the first, but it is worth mentioning that it was designed to remedy some of the possible weaknesses of the first. (You can read the researchers’ rationale in the study embedded above.)

In particular, the experienced teachers in the second study were working teachers. The college students (all female) were social science majors or minors who had completed at least two years of college and had neither teaching experience nor college coursework in education. There were other minor differences in the second experiment, which you can read about in the study, but the most significant was that instruction time was reduced from 6 hours to 4.

It’s worth quoting the researchers’ interpretation of both studies in detail (emphasis is the authors’). It’s pretty comprehensive, so I think I’ll let this stand as its own section:

Some of the possible reasons for the results obtained in the initial validation study were compensated for by adjustments in the second study. We cannot, therefore, easily explain away the unfulfilled prediction on the basis of such explanations as “The study should have been conducted in a school setting,” or “The nonteachers were too highly motivated.” Nor can we readily dismiss the lack of differences between teachers and nonteachers because of a faulty measuring device. The internal consistency estimates were acceptable and there was sufficient “ceiling” for high learner achievement.

Indeed, in the second validation study the teacher group had several clear advantages over their nonteacher counterparts. They were familiar with the school setting, e.g., classroom facilities, resource materials, etc. They knew their students, having worked with them for approximately three weeks prior to the time the study was conducted. Couple these rather specific advantages with those which might be attributed to teaching experience (such as ability to attain better classroom discipline, ease of speaking before high school students, sensitivity to the learning capabilities of this age group, etc.) and one might expect the teachers to do better on this type of task. The big question is “Why not?”

Although there are competing explanations, such as insufficient teaching time, the explanation that seems inescapably probable is the following: Experienced teachers are not experienced at bringing about intentional behavior changes in learners. . . .

Lest this sound like an unchecked assault on the integrity of the teaching profession, it should be quickly pointed out that there is little reason to expect that teachers should be skilled goal achievers. [No unchecked assault here!] Certainly they have not been trained to be; teacher education institutions rarely foster this sort of competence. There is no premium placed on such instructional skill; neither the general public nor professional teachers’ groups attach any special importance to the teacher’s attainment of clearly stated instructional objectives. Whatever rewards exist for the teacher in his typical school environment are not dependent upon his skill in promoting measurable behavior changes in learners. Indeed, the entire educational establishment seems drawn to any method of rewarding instructors other than by their ability to alter the behavior of pupils.

So there you have most of the authors’ interpretations of the results. What’s your interpretation?

Update I: The study linked below is not the same as the one embedded above, but it’s so closely related that I thought it okay to use it (Research Blogging couldn’t find the citation for the study above). The linked study has the same basic design as the one above, except the domain was “vocational studies,” like shop and home ec. In that study as well, no significant difference was reported between teachers and nonteachers.

Update II: Just to be perfectly fair, the tenor of the quoted writing above is not reflective of its author’s current views with regard to education. Here’s an interview with Mr Popham in 2012.

Image mask credit: Classic Film

Popham, W. (1971). Performance Tests of Teaching Proficiency: Rationale, Development, and Validation. American Educational Research Journal, 8(1), 105–117. DOI: 10.3102/00028312008001105


Published by

Josh Fisher

Instructional designer and editor for K-12 mathematics. My research interests center mostly around mathematics education.
