
Wednesday, November 30, 2016

Teachers Should Teach to the Test

Should teachers teach to the test? Some say of course we should, in order to give students the best chance to achieve their highest potential score. Some have even made teaching to the test a lucrative business. Schools are sacrificing more and more instructional time to test prep. Others say that teaching to the test games the outcome in favor of some students without actually reflecting the acquisition of real knowledge or achievement. Who is correct?

First, we must be careful to distinguish between tests teachers write covering material they themselves taught, and standardized tests. Standardized tests are not written by the teacher who is teaching the material, and indeed, it is considered cheating if teachers see the questions ahead of time. Teacher-written tests cover a specific subset of content. The purpose of the test is to evaluate the students’ learning of that specific knowledge. Theoretically, if everyone in the class masters the material, everyone can potentially score 100%. Practically, teachers try to include a mix of harder and easier questions in order to differentiate levels of mastery. However, there should not be any questions outside the subset domain.

Standardized tests are very different. Test designers try to ensure that half the students will score above the target median and half below. Students perceive right away that they seem not to know half the questions. That realization often makes them feel inadequate and creates much of the test anxiety surrounding standardized tests. I have found that explaining the difference between the tests I write and standardized tests relieves much of the anxiety.

There is, of course, no point in explaining jargon like normative evaluation, median, etc. It is sufficient to simply say that the people who wrote the bubble test wrote it for lots and lots of students who have been taught by lots and lots of teachers. The writers really have no idea what I taught or how I taught it. So the writers include lots of questions that they expect no one will know the answer to. In fact, they write the test expecting that students will miss fully half the questions. I reassure them that it is perfectly normal to feel as if they are probably missing a lot of questions. Go ahead and guess anyway.

I tell them that the test designers include questions from lower grades as well as questions from higher grades. The test designers know which questions are which, but of course the students do not. I tell them that if they feel like they do not know a question, it is probably from a higher grade and not to worry about it. The test designers look at the answer sheet and can tell whether the students correctly answered the questions from their own grade level. If they do, they will get at least 50. I tell them this does not mean 50 points, nor does it mean 50%. I tell them it is a different kind of scoring system because it is not a test that their own teacher (like me) wrote. With high school students, I discuss a little more statistics and the idea of percentiles.

This kind of explanation usually satisfies students, removes perplexity and frustration, and helps them do their best. If the teacher’s curricular philosophy and design is strong and the teacher is skilled, then there is no need to worry about standardized tests. Simply teach, and the standardized test will take care of itself. If the curriculum is weak, teachers will feel a strong need to teach directly to the test. However, by all means, teach to your own tests.

Monday, June 1, 2009

Can the Top-Scoring State Beat International Scores?

How does the math covered in the highest-ranking American state stack up against that of a top-scoring international performer?

International comparison studies typically focus on comparing the scores achieved by same-age students in different countries. Also typically, students from Asian countries outperform US students over and over again. Each time a report like that comes out, just as predictably there will be an outpouring of the same old tired excuses. Their students are different from our students. Their culture is homogeneous whereas ours is diverse. Their schools are allowed to teach whereas our schools must meet social, medical and nutritional needs. Their parents value education whereas our parents, not so much. On and on. The excuses act as a sedative to put society back to sleep. Okay, society says, there are understandable reasons for the differences in performance. The results are not really comparable. Apples to oranges. What a relief. So we stop thinking about it.

Could there be something more?

Sean Cavanaugh of Edweek reports:

A host of recent studies have examined how U.S. students’ mathematics skills compare against those of their foreign peers. Now, a new analysis probes a more precise question: How does the math covered in the highest-ranking American state stack up against that of a top-scoring international performer?


Let's repeat the question: How does the math covered in the highest-ranking American state stack up against that of a top-scoring international performer? It does not matter whether the results are comparable or not. Whatever the reason our kids come out second rate, other kids are beating our kids in the worldwide competition. Remember, President Obama said that if we want our kids to out-compete the world, we must out-educate them.

So how does the math covered stack up?

A study released last week finds that elementary students in Hong Kong are exposed to more difficult and complex math than pupils in Massachusetts, an elite scorer on national and international exams. The analysis, published by the American Institutes for Research, in Washington, examines the math content of Hong Kong and Massachusetts by comparing the two jurisdictions’ standardized tests in 3rd grade math.


We're talking about third grade, part of the foundation of the rest of a child's academic career. The study did not look at scores on a specially designed test for international comparison purposes. The study did not look at the content of such a specially designed test. The study examined each jurisdiction's in-house test, the standardized tests for Massachusetts and Hong Kong. Even more interesting, the study had no interest in the children's scores on these tests. It examined the test content itself. And why Massachusetts?

Massachusetts is also a consistent elite-scorer on the primary U.S. domestic test, the National Assessment of Educational Progress.


What the study found is that the Hong Kong test emphasizes number and measurement concepts. The test also contains a larger percentage of constructed responses rather than chosen responses. The Hong Kong test questions were more complex, requiring the application of knowledge and non-routine, multi-step solutions over simple recall. From the foundations, children in Hong Kong are tested on higher-order thinking skills to a greater degree than American children, even “elite” American children.

Do Chinese teachers teach to the test?

(Steven Leinwand, one of the study's authors) said the authors chose to examine test content in Hong Kong and Massachusetts because the two jurisdictions' early-grades math curricula were relatively similar—and because state tests in the United States tend to guide math instruction.

American educators “pay attention to the tests,” he observed. “If you change the state tests, it’s a powerful lever for what goes on in the classroom.”


In the US, the favorite quick and dirty way to reform education is to redesign the tests. That's what Arizona did in the 1990s with its AIMS test. Arizona created high-stakes tests for fifth, eighth and eleventh grade, as if new tests automatically change educational philosophy and encourage innovation. Even honor students flunked these tests. The overwhelming response to high-stakes tests is to teach to the test, a response well documented under No Child Left Behind. When a test reflects existing educational philosophy, there is no need for sample tests or practice materials.


Liping Ma has documented the emphasis Chinese teachers place on concept development over computational procedures. James Stigler reiterated many of the same points. Chinese math education, exemplified by Hong Kong, already values conceptual understanding, and the test reflects that value. The US, regardless of all the pretty talk in the media, values computational procedures, and the Massachusetts test reflects that value.

How did Mr. Leinwand put it? “... state tests in the United States tend to guide math instruction.” That is a large part of the problem. We are supposed to test what we teach, not teach what we test. The US mistakenly thinks testing drives instruction.

The Uncomfortable Conclusion

Laying solid foundations in the early years matters.


Hong Kong’s use of more difficult and complex test items could be connected to a higher proportion of its test-takers, 40 percent, scoring at the “advanced” TIMSS level, than Massachusetts, at 22 percent. Just 10 percent of American students, on the whole, reached that level, the authors argue. In addition, research shows a “strong correlation” between nations’ math performance in early and later grades, they say.

Sunday, May 10, 2009

Why Standardized Testing Will Always Fail

The most basic characteristic of any test is validity, that is, whether the test actually tests what it purports to test. Everyone, from the “professionals” who write standardized tests, to the everyday classroom teacher putting together a five-point quiz, learns that a test that does not actually test what it claims to test is worthless. They all learned about validity in the colleges of education.

So John Pearson makes a great point when he observes that every test is a reading test.

TAKS is stressful enough to prepare for at the 3rd grade level, and our kids at least can get reading assistance on the math test! There has been a little bit of debate over exactly what that means, but at least it is specified that, on an individual basis, a student may ask to have a word or a question read aloud. This helps immensely, especially with a child who is a struggling reader and/or an English Language Learner.

However, after 3rd grade, the kids are completely on their own for every TAKS test -- excepting those kids with special modifications, of course. The vast majority of kids taking these tests every year cannot ask to have a word read, cannot ask for clarification on a question, cannot ask ANYTHING except a question about the directions, and the directions are usually "Pick the best answer."

So what it comes down to is that these kids are taking a series of reading tests. Some of them are ABOUT math or ABOUT science, but they don't strictly assess those subject areas as much as they assess whether or not the child can read the questions, some of which are highly complicated.


I knew a little boy in Japan who was completely bilingual in Japanese and English, but who had attended only Japanese preschool and kindergarten. The first thing to understand about his situation is that Japanese kindergarten ends near the end of March, so when he “graduated” from kindergarten, his parents decided to enroll him in an international school where instruction was conducted in English. The principal said the first grade teacher needed to assess the boy's readiness.

On the appointed day in March, this boy sat down with a clearly unhappy first grade teacher. She did not want any new students entering her class so close to the end of the school year, especially one whose parents had the idea the child would go on to second grade after less than three months in first grade. The teacher asked a number of questions about fairy tales and a few addition problems and announced that the boy was “marginal.” She would allow him into her first grade class on the condition that the parents understood that in September he would very likely have to “repeat” the first grade. The parents accepted the condition.

In April, the school gave the annual Stanford 9 bubble tests. The first grade teacher made a copy of this boy's answer sheets to hand grade, because the score reports would not be available before the end of the school year. She needed ammunition for the parent-teacher conference she was sure was coming, at which she planned to tell the parents that yes, indeed, their son would have to repeat first grade.

To her utter astonishment, the boy had almost a perfect set of answer sheets. The score report, when it eventually arrived, placed the boy in the 99th percentile on every battery. Obviously he went to the second grade along with his class. Eventually the same boy graduated from an American university at age eighteen with a degree in chemistry.

So why did the teacher consider the boy marginal? Mostly because he did not know who Rumpelstiltskin was. The boy could have told her all about Momotaro, a Japanese fairy-tale character the teacher had never heard of, if only she had known to ask; then again, Momotaro was not included in the school's first grade curriculum anyway.

Imagine going to live in Russia for a year and taking a math class. After 3 months, you are given a math test in Russian, consisting of word problems and lengthy questions. I don't know about you, but I would fail that test miserably. Would ANYONE in their right mind think that that means I don't know math?? Or that that test accurately gauged my knowledge??


I was a teacher in that international school in Japan. I taught math and science to the middle-schoolers. Every year fully 50% of my students were non-native speakers of English. One year four of my students were non-English speakers who had transferred from the Japanese school just that year. Lucky for me I also speak Japanese. I was the only American teacher in the school who spoke Japanese. There were a few Japanese-speaking teacher's aides.

I made all kinds of accommodations to help my non-native English-speaking students. I paired each one with a native speaker for labs. I translated my instruction to Japanese on the fly on a regular basis. I adapted reading instruction techniques usually used in much lower grades to the science book as if the science book were a basal reader. I read words or whole questions from my tests for any student who asked. And for those four non-English speakers, I translated the whole test into Japanese. I did all these things because I knew what every tester should know, that is, the purpose of the tests. The purpose of my tests was to evaluate the students' mastery of my instruction, with the corollary purpose of giving the students the best chance for success.

We may think the purpose of standardized tests is to evaluate an individual student's knowledge, but in reality, the tests serve to rank students against the norming population, and then by extension, to rank the quality of the school relative to the norming population. The reality will always frustrate because the nature of norming means that half will score above the 50th percentile and half below when compared to the total population.

If some schools can attract an overabundance of topside students, obviously other schools will end up with an overabundance of bottomside students. Testing can, by design or not, perpetuate the inequality of educational opportunity and undermine any promising efforts of school reform.

So who would want to perpetuate inequality of educational opportunity? Sadly, dear parents and other adults, Lake Wobegon does not exist.

Wednesday, May 7, 2008

Why “Good” Schools Need “Bad” Schools

Good schools need bad schools. That is one reason education reform cannot work. Here-today-gone-tomorrow education fads give the appearance of constant effort, keeping researchers employed, giving administrators something to implement, and making busy teachers even busier. These fads, masquerading as reform efforts, deflect attention from the need to maintain “bad” schools for the benefit of “good” schools.

Standardized tests drive this strange relationship. Most standardized tests are norm-referenced as opposed to criterion-referenced. Norm-referenced tests compare the test taker to the whole population of test takers. Criterion-referenced tests compare the test taker to a set of criteria.

Therefore, norm-referenced tests often express the score in terms of percentile. For example, if you score at the 85th percentile, it means you did better than 85 percent of the test takers. By definition, the 50th percentile means that half the test takers did better than the other half, and half did worse. Percentile scoring treats the median as the average. It does not matter how well a student learned or how well the teacher taught; half of the students are destined to be "below average" on a norm-referenced test (like nearly all standardized tests). Therefore the main problem with percentile is that the existence of schools with above-average performance necessitates the existence of schools with below-average performance. It is impossible for all boats to rise.
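
To make the arithmetic concrete, here is a minimal sketch in Python of how a percentile rank is computed against a norming population. The scores and the size of the norming group are made up purely for illustration.

# Minimal sketch: percentile rank against a hypothetical norming population.
# All scores below are invented for illustration only.

def percentile_rank(score, norming_scores):
    """Percent of the norming population scoring below the given score."""
    below = sum(1 for s in norming_scores if s < score)
    return 100 * below / len(norming_scores)

# Hypothetical norming population of raw scores (0-100 scale).
norming_scores = [38, 42, 47, 51, 55, 58, 61, 64, 68, 73, 77, 82, 88, 91, 95]

print(percentile_rank(61, norming_scores))   # 40.0  (above 6 of the 15 norming scores)
print(percentile_rank(95, norming_scores))   # 93.3... (above 14 of the 15 norming scores)

Notice that the percentile rank says nothing about how much a student actually learned; it only says where the score falls in the distribution of other test takers.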

A second problem with norm-referencing is the inherent competition. Academic achievement is an individual, personal achievement, or should be. All children have the potential to improve their academic achievement; all boats have the potential to rise. Norm-referencing undermines that potential. A third problem with norm-referencing is that it can actually disguise truly poor performance with a mask of apparent excellence. If nearly everyone performs badly, a bad score can still beat 95 percent of the other scores, and the misleadingly high percentile gives the test taker a false sense of their performance. A fourth problem is that not all norm-referenced tests are expressed as percentiles. A good example is the SAT, which seems to report an actual score, but the scores are recalibrated periodically to ensure the mean and the median are the same.
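
As a rough illustration of that last point, the sketch below (again in Python, with invented numbers) shows the general idea of mapping raw scores onto a reporting scale with a fixed center. It is not the SAT's actual equating procedure, just the generic shape of such a recalibration.

# Generic illustration of recalibrating raw scores onto a reporting scale
# with a chosen center and spread. Not any real test's actual procedure;
# all numbers here are hypothetical.

from statistics import mean, pstdev

def scale_scores(raw_scores, target_mean=500, target_sd=100):
    """Linearly map raw scores onto a scale centered on target_mean."""
    m, sd = mean(raw_scores), pstdev(raw_scores)
    return [round(target_mean + target_sd * (s - m) / sd) for s in raw_scores]

raw = [38, 42, 55, 61, 64, 77, 95]   # hypothetical raw scores
print(scale_scores(raw))             # centered near 500 no matter how well
                                     # or poorly the group actually performed

Because the reported scale is anchored to the group's own distribution, the center of the scale stays put regardless of how much the group actually knows, which is exactly what makes norm-referenced scores so hard to read as measures of learning.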

Research has shown that time on task under the guidance of a skilled teacher is the major determinant of academic achievement. Every test reduces instructional time. It is ironic that so-called experts who should know better recommend more testing as the answer. Classroom teachers do not need tests to know how their students are doing. The system, however, does require some nominally “objective” measure of student performance. Maybe we can live with some tests; what we do not need is more tests, especially more norm-referenced tests.