Like everybody in L.A., I was intrigued when our local paper of record claimed to have combed through the publicly available data on all the teachers in the LAUSD and computed the relative educational “value” each contributed to his students, as measured by year-over-year test score improvements:
In essence, it projects a child’s future performance by using past scores — in this case, on math and English tests. That projection is then compared to the student’s actual results. The difference is the “value” that the teacher added or subtracted. Comparing each student to him or herself in the past largely controls for differences in students’ backgrounds.
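The mechanics described there reduce to a residual calculation: project, compare, average. Here is a minimal sketch in Python with invented students and a bare-bones one-variable regression; the Times’ actual model is far more elaborate, but the arithmetic of “value” is the same.

```python
# A toy sketch of the "value-added" idea: project this year's score from last
# year's, then credit the teacher with the gap between actual and projected.
# Students, scores, and the one-variable regression are invented illustrations.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - b * mx, b

# (last year's score, this year's score, teacher) for six fake students
records = [
    (60, 66, "A"), (70, 74, "A"), (80, 83, "A"),
    (60, 61, "B"), (70, 69, "B"), (80, 78, "B"),
]

# Step 1: project each student's current score from the prior one.
a, b = fit_line([r[0] for r in records], [r[1] for r in records])

# Step 2: a teacher's "value added" is the average actual-minus-projected gap.
gaps = {}
for prior, actual, teacher in records:
    gaps.setdefault(teacher, []).append(actual - (a + b * prior))
value_added = {t: sum(g) / len(g) for t, g in gaps.items()}

print(value_added)  # teacher A comes out around +2.5 points, teacher B around -2.5
```

Note that each student is compared only to the projection from his own past score, which is what the Times means by saying the method “largely controls for differences in students’ backgrounds.”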
Elana and I, researching the schools near our new apartment, recently stopped to think seriously about this last sentence, and what it means in real life.
For the past year, we’ve lived in a largely Hispanic neighborhood on the east side. Many of our neighbors are first- or second-generation immigrants, and many speak Spanish in the home. Apparently a large number of children in our area don’t speak English as their first language: according to LAUSD statistics, of the 603 students at the nearby Loreto Street Elementary School, a whopping 268 are classified as “English learners.” Unsurprisingly, when nearly half your students don’t fully understand the language of instruction, academic performance lags. The school is given a 4/10 rating by the nonprofit greatschools.org, which reports that in grades 2 and 3, students at Loreto are drastically below average in English and math, but that they come very close to catching up by 5th grade — exactly what you would expect from a school where nearly half the student body is still learning the language of instruction when they arrive.
The Times claims to have accounted for English-language deficiencies:
In addition to students’ past test scores, value-added models can adjust for a variety of factors to measure a teacher’s effect on students. In our first version, we also took into account gender, poverty, number of years in the district and whether a student was classified as an English-language learner. Our new model includes further adjustments for the educational attainment of the student’s parents, class size, student mobility and five levels of English proficiency.
And in the semi-academic paper that further explains the methodology, there’s this paragraph explaining the measurement of “peer composition” as a teacher-independent factor:
Classroom peers may influence student academic progress. Individual students might learn less in classes with large concentrations of disadvantaged students or ELLs [English language learners]. The peer composition is based on proportion of students with particular individual characteristics, for example, the proportion of beginning ELLs or the proportion of parents with college degrees. The analysis included the full range of measured individual characteristics.
But this seems to me to misstate what a preponderance of ELLs in the classroom does to year-over-year change, which is what this supposed metric of teacher effectiveness is ultimately trying to measure. The presence of large numbers of ELLs would actually inflate a teacher’s apparent effectiveness at producing change over time, because those students are constantly improving their ability to understand both the instruction and the tests themselves.
Consider a hypothetical student who starts at Loreto Street with practically no knowledge of English. He starts taking ESL classes in kindergarten, and by 2nd grade, which is when testing starts, he’s reasonably fluent in everyday conversation, but he still makes a lot of grammatical mistakes and has little understanding of the more technical and formal English he encounters on the standardized tests. But by 5th grade, he’s had three more years of reading, writing, and interacting with teachers. He’s certainly got enough English to have mastered the limited and technical vocabulary of math classes, and he’s closing the gap in “English Language Arts” (ELA). Thus, as the methodological paper notes,
ELA achievement is inversely correlated with the concentration of ELLs in low proficiency categories, but math achievement is less sensitive to the mix of ELLs in the class.
It’s obviously possible, in theory, to “factor out” the influence of a high concentration of ELLs in the lower grades on the year-over-year improvements, which is what the researchers claim to have done here. But this misses (I think) the larger point that schools like Loreto (and its individual teachers) start with a large population of students who are simply further behind at the beginning of the year than comparable students at other schools. This means that the amount of improvement possible for students in a given year is substantially greater than it would be for students from more advantaged backgrounds.
For example, consider Gary Fong, an award-winning teacher whose students get “consistently high scores on the California Standards Test.” Fong teaches at Clover Avenue Elementary, which is around the corner from our new apartment. Clover gets a 10/10 from greatschools.org and is rated by the state as having an API (a single-number rating of school quality devised by the state) of 941/1000, compared to Loreto’s 758. (The target number is 800.) Clover gets a big boost from the massive blocks of UCLA graduate student housing within its geographic district, which ensure a high level of educational attainment among the school’s parents. It also can’t hurt that many of them are semi-employed academics, which translates to more time for engagement with the school as volunteers. Finally, Clover has only 100 or so English Language Learners, and at least some of those are children of foreign graduate students, who for obvious reasons are going to be in a good position to reinforce English learning in the home.
Given all these advantages, of course, it’s no surprise that Gary Fong’s students consistently do well on standardized tests, or that Clover’s test scores are vastly superior to the state average. (The number of students meeting state standards at Clover is generally above 80%, while the percentages statewide hover in the 50s and 60s.) But that also means that Gary Fong is at a huge disadvantage when it comes to trying to improve his students’ scores, because they already come in operating at or near maximum capacity. They’re already hugging the far right end of the bell curve, so it’s unlikely they’re going to improve as much, even under a great teacher, as students who come in scoring far below what their innate talents probably allow. (Fong is also the Gifted program teacher, which only exacerbates this problem.)
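A toy simulation makes the ceiling problem concrete. In the sketch below (all numbers invented, and the hard per-student cap is a deliberately crude stand-in for a test’s limited top end), two teachers produce an identical “true” gain of 8 points, but the class that starts near the top of a 100-point scale shows a smaller measured gain simply because scores can’t rise past the cap.

```python
# Toy illustration of the ceiling effect: two equally good teachers (the same
# "true" gain of 8 points per student), but one class starts near the top of
# the scale. All numbers are invented.

CAP = 100  # top of the test's scale (hypothetical)

def measured_gain(prior, true_gain):
    """The gain as the test reports it: scores can't rise past the cap."""
    return min(prior + true_gain, CAP) - prior

low_start = [55, 60, 65, 70]    # a Loreto-like class, far from the ceiling
high_start = [90, 94, 96, 98]   # a Clover-like class, hugging the ceiling

def avg(xs):
    return sum(xs) / len(xs)

print(avg([measured_gain(p, 8) for p in low_start]))   # 8.0 -- the full gain shows up
print(avg([measured_gain(p, 8) for p in high_start]))  # 5.0 -- part of it is clipped
```

Same teaching, three fewer measured points for the high-scoring class: on a value-added scale, the Clover-like teacher looks worse despite doing nothing differently.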
Yet according to the L.A. Times’ analysis, Fong is a “less effective than average” teacher, and Clover is a “less effective than average” school. (Loreto Street, on the other hand, is considered “a most effective school.” Yes, that is really the Gilbert-and-Sullivan-esque language the Times has decided to go with.)
How do we square this with the school’s outstanding reputation and strong test scores? No doubt the researchers working with the Times would say that their measurements can only show the degree of academic improvement that can be imputed to the teacher or school, and that they can’t help the fact that some schools and teachers start off with kids who just don’t need that much improvement. But by using the language of “effectiveness,” they’ve made it appear that these ratings measure some sort of objective pedagogical ability, which is probably not the case.
Ultimately, parents are probably not better prepared to make decisions about their children’s education after having read the Times’ analyses. More disturbing, though, is that the Times has portrayed its teacher rankings as a kind of public accountability project, holding public employees accountable through standardized measurements of their effectiveness at their jobs. Given the almost-certain distortions of the data by factors that don’t appear to have been controlled for, it’s hard to see this as a valuable service. But it certainly provides ammunition for the “teacher accountability” movement, because it gives the (probably illusory) sense that a bureaucrat at LAUSD headquarters or in Sacramento or Washington could get a meaningful sense of a teacher’s effectiveness by looking at a few numbers. (Probably the smarter move, in terms of getting more control over bad teachers, is to give principals, who can observe teachers in much finer detail, more freedom in hiring and firing, as Geoffrey Canada argues in The Lottery.)
And all of this, of course, is to say nothing of the occasional mess like the one experienced by Paul Barger, a substitute teacher whose “value-added” score was determined solely by one class that he took over two weeks before testing, after the original teacher had fallen ill and the students had already had 50 different substitutes over the course of the year. Yipes.