Jersey Jazzman: DFER's Misleading and Innumerate "Research"

Saturday, June 1, 2013

Bruce Baker has sort of cornered the market on critiquing stupid education policy graphs, so I feel like maybe I need to pay him a royalty or something when I say this:

Democrats For Education Reform has produced quite possibly the year's dumbest education policy graph.

Ladies and gentlemen, feast your eyes (p.4):

Let's go right to the text of the policy brief this comes from, written by Mac LeBuhn, and see if we can figure out what the young gentleman is trying to say:

The hypothetical graph above...

Whoa, whoa, WHOA, WHOA! "Hypothetical"?! You mean this graph doesn't actually show anything real? You just made this up? This graph is based on nothing?!

Yep. Keep that in mind as we continue:

The hypothetical graph above offers another way to visualize administrators’ frustration with observation feedback. The x-axis displays scores on two measures of teacher effectiveness, with a score of 1 being the lowest and a score of 5 being the highest. Under the student achievement measure, teachers are distributed across the range scores, with some high-performers, some low-performers, and most falling somewhere in the middle. In contrast, the observation scores are lopsided: few teachers are identified under the lowest rating and many more are described as excellent.

Let's start with a pet peeve. The visually astute among you might have noticed that the blue line approximates a normal distribution or "bell curve" - except it's not smooth. Both lines start, end, and bend in only five places, implying that there are only five data points. That's because the "scores" are what statisticians call "ordinal" measures: they are ordered categories, with nothing in between, and no way of knowing how far apart one "ranking" is from the next.

So a line graph is completely inappropriate here. Which, I guess, is nit-picking, as this entire thing is completely made up. But we'll still push on...

Notice where the data points are for each of the scores: if you were to add up all five in each line, you'd get 100%. That makes sense... except why would the "student achievement score" be so perfectly distributed? In other words, how do we know there are just as many "5s" as there are "1s" when measuring the teacher impact on student achievement?

The answer is: we don't. What LeBuhn is giving us is a forced ranking; he is forcing the "student achievement scores" into a normal distribution, whether they belong there or not.
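To see what a forced ranking does, here is a minimal Python sketch. Everything in it is my own assumption for illustration: the function names, the 10/20/40/20/10 quotas, and both made-up groups of teacher scores. None of it comes from LeBuhn's brief. It simply ranks the scores and hands out the five ratings by quota.

```python
def forced_ratings(scores, quotas=(10, 20, 40, 20, 10)):
    """Assign ratings 1-5 by rank, giving each rating a fixed percentage share.

    The quotas are hypothetical percentages chosen to look like a bell curve.
    """
    n = len(scores)
    # Indices of the scores, ordered from lowest score to highest.
    order = sorted(range(n), key=lambda i: scores[i])
    ratings = [0] * n
    start = 0
    cumulative = 0
    for rating, share in enumerate(quotas, start=1):
        cumulative += share
        end = cumulative * n // 100  # last rank position that gets this rating
        for i in order[start:end]:
            ratings[i] = rating
        start = end
    return ratings


def rating_counts(ratings):
    """Count how many teachers received each rating, from 1 to 5."""
    return [ratings.count(r) for r in range(1, 6)]


# Two made-up groups of 20 teachers with very different underlying scores.
spread_out = list(range(20))                       # evenly spread from 0 to 19
all_high = [90 + 0.1 * i for i in range(20)]       # everyone between 90 and 92

counts_spread = rating_counts(forced_ratings(spread_out))
counts_high = rating_counts(forced_ratings(all_high))
print(counts_spread)
print(counts_high)
```

Both groups come out with exactly the same counts per rating, because the shape is set by the quotas and not by the scores. That is all a "perfectly distributed" line can tell you when the distribution has been imposed in advance.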

I've been going on for a while about a methodology for ranking students and teachers based on their "growth" in test scores: Student Growth Percentiles, or SGPs. These are important here in New Jersey, because the Education Commissioner has been pushing them for use in teacher evaluation, even though they are completely inappropriate and even their "inventor" says they can't be used to determine a teacher's effect on test score outcomes.

The problem, as I've shown before, is that SGPs force a ranking of teachers into a normal distribution. In other words, somebody's got to lose: even if every kid in the state showed "growth" in standardized tests, SGPs force some kids to the bottom, because they are ranked against each other. And that means that some teachers are forced to the bottom on their evaluations.
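A rough sketch of the "somebody's got to lose" problem, in Python. To be clear about the assumptions: this is a plain percentile rank on made-up score gains, not the actual SGP calculation, which is more involved. The function name and the numbers are hypothetical.

```python
def percentile_ranks(gains):
    """Percentile rank of each gain: the percentage of the group with a strictly smaller gain."""
    n = len(gains)
    return [100 * sum(other < g for other in gains) // n for g in gains]


# Hypothetical test score gains for ten students. Every single one improved.
gains = [5, 8, 12, 15, 20, 22, 25, 30, 35, 40]
ranks = percentile_ranks(gains)

# Give every student an extra 100 points of growth and rank them again.
boosted_ranks = percentile_ranks([g + 100 for g in gains])

print(ranks)
print(boosted_ranks)
```

Every student gained, yet one still lands at the very bottom of the ranking, and adding 100 points of growth to everyone changes nothing. A ranking only knows who is ahead of whom; it cannot tell you whether anyone actually did badly.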

What does this have to do with DFER's graph? Well, the state where SGPs really got going is Colorado, thanks to State Senator Mike Johnston. It's clear that the Senator has no statistics or education research background; otherwise he never would have promoted this inappropriate measure for use in the Colorado teacher evaluation system. But here's the thing: guess who interned in his office on this policy? Yes, that's right: Mac LeBuhn, the author of this DFER policy brief.

Undoubtedly, LeBuhn is using an SGP model for his contention that "student achievement scores" in a teacher evaluation follow a normal distribution. But that is only because they have been forced into that distribution! Since LeBuhn is so keen on hypothetical graphs, let me show him one of mine* from a previous post:

It's possible that every teacher's students could show "growth," just not as much growth as other teachers' students. And the "growth" could be unevenly distributed. LeBuhn's chart is not a reflection of the reality of students' "growth"; it is a forced ranking that is both misleading and mathematically unsound. The premise of LeBuhn's argument is that teacher observations don't match up with student achievement scores - but those scores are artificial. His entire thesis is based on an unproven assumption.

It speaks volumes that DFER thinks they are contributing something helpful by putting out briefs like this. All they are doing is muddying the waters - probably deliberately.

* Note to Rick Hess: this is how you use a hypothetical example. Got it?

1 comment:

Unknown said (June 2, 2013 at 11:28 PM PDT): I will NEVER forget earning a 94 in my college psychology course and having the prof give me a 'B' because "there were too many A's"! I went to the dean and got my A back because we weren't warned about this. This is no different.