Food for thought: A warning label on value-added testing?
Audrey Amrein-Beardsley has a caveat when it comes to measuring young students’ growth from one year to the next. The Arizona State University assistant professor of teacher preparation has a buyer-beware message about the legitimacy of the Education Value-Added Assessment System (EVAAS) that has become the nation’s standard in evaluating the knowledge that districts, schools and teachers add to young minds as pupils progress through school.
Her voice is being heard. Having recently raised the issue in the prestigious Educational Researcher (March 2008), she is preparing another provocative piece, what she calls “part two,” for Phi Delta Kappan, a professional journal for education that advocates for research-based school reform.
A former middle and high school math teacher who earned her Ph.D. in educational policy and research methods from ASU in 2002, Amrein-Beardsley argues that, although EVAAS is probably the most sophisticated value-added model, it has flaws that must be addressed before it receives blanket acceptance. These include the lack of external review, its insufficient user-friendliness, and methodological issues surrounding missing data and student background variables.
For example, she asks whether students’ state standardized test scores – from the spring of one year to the spring of the next – can be used to accurately evaluate the “value” teachers added to students’ learning. Among the confounding variables she cites are students switching from one teacher to another during that time, the loss of varying levels of knowledge over summer breaks, and the differences between classes with team teachers vs. those with individualized instruction. Additionally, students may take multiple classes in one subject area with two different teachers, each with a different impact on their learning. With these and other extraneous variables affecting students inside and outside the school doors, Amrein-Beardsley asks whether value-added analysis of test scores truly reveals an objective reality.
As value-added research designs continue to attract attention and gain in popularity, they have come under increased scrutiny within the education profession, and Amrein-Beardsley warns that the model, in her opinion, is “promising way more than it can deliver.”
Just as Food and Drug Administration (FDA) warning labels appear on food items in grocery stores across the country, Amrein-Beardsley believes the education consumer deserves the same type of protection.
“Might the FDA approach also serve as a model to protect our country’s intellectual health?” she asks. “Might this be a model educational leaders follow when they pass legislation or policies, the benefits and risks of which are unknown?
“I think these questions come at a critical time with a new presidential administration, a chaotic economy, and the future of No Child Left Behind unknown.”
EVAAS is currently taking over as a way to gauge Adequate Yearly Progress (AYP), as required by the No Child Left Behind Act of 2001. The act mandates that all states measure student academic achievement using standardized tests and report on yearly progress. The question being asked by Amrein-Beardsley and her peers is what kind of measurement system should be used to judge AYP.
Amrein-Beardsley, nationally recognized for her research on high-stakes testing policies, issues related to teacher quality, and the effects of both on student academic achievement, says studies must be conducted to determine whether EVAAS works in the ways professed, whether the inferences drawn from the results are valid, and whether the results just plain make sense.
“The question is whether EVAAS works as advertised and deserves a methodological stamp of approval,” says Amrein-Beardsley. “Are the claims made by the developers of EVAAS in fact true, and does the assessment method work in the ways purported? The answer is no, and yet the model continues to be oversold without the strong validation studies needed and being called for.”
ASU Regents Professor of Education David Berliner says Amrein-Beardsley is a “gutsy scholar” who is willing to do the hard work of looking deeply at another person’s work and then upholding the best norms of science.
“To be a skeptic is the proper position to take as a scientist,” says Berliner, who is a past president of the American Educational Research Association (AERA). “Science advances and corrects itself when the work one does is subject to criticism.
“She is an active and a creative scholar with a wide range of interests and strong methodological skills, and she is working in one of the most important areas of education and methodology right now. Value-added ideas are ‘hot,’ so we need to get it right for the nation.”
Amrein-Beardsley says that for the EVAAS model to work, it must be statistically sophisticated; however, as it becomes more complicated, it becomes less user-friendly. She notes that confusing data reports and a lack of training for teachers and administrators in how to understand the data reports prevent schools and teachers from using value-added data to improve student learning and achievement. “The EVAAS value-added model is stuck on the horns of this dilemma,” she says.
The sixth-year ASU faculty member points to a set of standards for high-stakes testing, drawn from the 12 issued by AERA in 2000, all of which pertain to the EVAAS value-added model. These guidelines for defensible assessments include: high-stakes decisions should not be made on the basis of a single test score; high-stakes tests must be validated for each intended use; the negative side effects of a high-stakes assessment program must be fully disclosed to policy makers; the accuracy of achievement levels must be established; students with disabilities must be appropriately accommodated; and the intended and unintended effects of the testing program must be continuously evaluated and disclosed.
“These are time-tested principles and commitments that should be applied to all large-scale assessment systems, especially when high stakes are attached to the test results,” she says. “These questions have yet to be addressed satisfactorily by EVAAS developers.”
Mari Koerner, dean of ASU’s College of Teacher Education and Leadership, says Amrein-Beardsley’s research has never been more important, considering the fragile state of the economy.
“Almost all evaluations for federally funded programs use this methodology and believe all of its results,” says Koerner. “Value-added evaluations often focus only on certain kinds of programs, ignoring research that shows other methods are effective. We may be wasting money and time relying on instruction that will, later down the road, be shown to be ineffective.”