# Just Some Data

I’ve got nothing much lately. Here’s some data I’ve been playing with from the Department of Ed. It might take a second or two (or ten) to load.

These data show school-wide (all grade levels) mathematics achievement on state assessments for over 68,000 schools in the United States for the 2014–15 school year. Each point represents a school, and its position on the plot gives (x) the percent of male students at the school scoring at or above proficient and (y) the percent of female students at the school scoring at or above proficient.

You’ll notice some rectangularity to the data. This is because many of the percent-proficient values were given as ranges. For each gender reported, I translated each range to its top value. So, if a school reported 50–54 percent at or above proficient for females and 50–54 percent at or above proficient for males, that school would be placed at (54, 54).
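Here is a rough sketch of that range-to-top-value translation. It assumes each value arrives as a string that is either a single number or a range like `50-54`; the function name `top_of_range` is made up for illustration, and anything that doesn’t match that shape is rejected rather than guessed at.

```python
import re

# A single number ("54") or a range ("50-54"), with a hyphen or an en dash.
_PERCENT = re.compile(r"\s*(\d+)\s*(?:[-\u2013]\s*(\d+))?\s*%?\s*")


def top_of_range(value: str) -> int:
    """Return the upper bound of a percent value such as "50-54" or "54"."""
    match = _PERCENT.fullmatch(value)
    if match is None:
        # Assumption: values in any other shape are not handled here.
        raise ValueError(f"unrecognized percent value: {value!r}")
    low, high = match.group(1), match.group(2)
    return int(high if high is not None else low)


# The example from the text: both genders reported as 50-54.
male, female = "50-54", "50-54"
point = (top_of_range(male), top_of_range(female))  # (x, y) = (male, female)
print(point)  # (54, 54)
```

Taking the top of each range is what snaps schools onto a grid, which is where the rectangular look comes from.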

Another noticeable feature of the plot is that it doesn’t look at all like there are over 68,000 points represented. This is because many of the values are stacked on top of each other. The lightest shade of blue that is present on the plot is the color of every data point, so if you’re seeing dark blue, you’re likely seeing 4 or 5 schools all at one location.
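To get a feel for why a handful of stacked points reads as dark blue, here is a small sketch. It assumes the points are drawn semi-transparent and blended with ordinary alpha compositing, and the per-point opacity of 0.3 is a made-up value for illustration, not the one used in the plot.

```python
def stacked_opacity(alpha: float, n: int) -> float:
    """Combined opacity of n identical semi-transparent markers drawn on top of each other."""
    # Each marker lets (1 - alpha) of what is beneath it show through.
    return 1 - (1 - alpha) ** n


ALPHA = 0.3  # hypothetical per-point opacity; the plot's real value isn't stated

for n in range(1, 6):
    print(n, round(stacked_opacity(ALPHA, n), 2))
```

Under that assumption the combined opacity climbs quickly and then levels off, so the darkest spots can hide more schools than the 4 or 5 it takes to reach that shade.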

The data cut straight down the middle, as you might expect, though perhaps much closer to the middle than you would expect. So, in general, the scores for males and females on state math assessments are very close. The regression line is $$\mathtt{y = 0.9396x + 3.98204}$$, which shows an almost indiscernible advantage for the boys across all these data.

According to the regression line, for a male percent at or above proficient anywhere from 0% to about 65%, you would predict a slightly higher female percent. From 67% upward, that prediction is reversed; the line crosses y = x at about 66%. You can see from the data points that what seems to weigh the line downward is what you might call outlier male-female disparities at the top of the range.
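The point where the prediction flips comes from setting y = x in the regression equation. A minimal sketch, using only the slope and intercept quoted above; the names here are my own:

```python
SLOPE = 0.9396
INTERCEPT = 3.98204


def predicted_female(male_pct: float) -> float:
    """Predicted female percent proficient for a given male percent proficient."""
    return SLOPE * male_pct + INTERCEPT


# Setting y = x gives x = intercept / (1 - slope).
crossover = INTERCEPT / (1 - SLOPE)
print(round(crossover, 1))  # 65.9

# Predicted female-minus-male gap at a few male percentages.
for male_pct in (0, 50, 65, 67, 100):
    gap = predicted_female(male_pct) - male_pct
    print(male_pct, round(gap, 2))
```

The gap is positive below the crossover and negative above it, and it stays within a few percentage points across the whole 0 to 100 range.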