Implications of the
North Carolina Psychology Association
Report on Accountability Standards
released 2/2001
(1) Are the EOGs & EOCs valid?
(2) The Dubious Nature of Norm-Referenced Tests
(3) The "Thinking Skills" component of NC’s Standardized Tests. How NC rewards those with good "test-taking" skills.
(4) The (non)Effectiveness of Retention
(5) Additional Pressures Brought to Bear on Kindergarten and First-grade Students.
(6) The Narrowing Down of the Curriculum
(7) Consequences for Student Motivation
(1) Are the EOGs &
EOCs valid?
(A) The end-of-grade tests (EOGs) have not been validated for use on individual students. The EOGs were validated only for assessing the performance of individual schools or school districts. The only application where the result may be retention, according to the NC Technical Report (a document which addresses the proper use of the EOGs), is the use of the eighth grade EOG (commonly known as the "competency test") as a requirement for graduation. Federal guidelines strongly suggest, if not outright demand, that if tests are to be used for a particular purpose they be expressly validated for that purpose, and that states cannot rely on previous validity studies for new applications of the same instrument.
Why not? The report states: "Over 160,000 students
participated in the 1992 field test. The EOG manual cites the relationship
between teacher judgments of student’s achievement levels and their concurrent
EOG scores as evidence of the test’s criterion-related validity." In other
words, teachers indicated 60% of the time on the surveys that students were at
Level III or IV, and 40% of the time that students were at Level I or II. Since roughly 60% of the students fell into the Level III/IV category and roughly 40% fell into the Level I/II category, the state decided there was enough of a correlation between what the teachers had predicted and what the test measured.
Therefore, according to the state, the test was indeed valid. This "group validity" glosses over the fact that thousands of students were "mis-predicted": in thousands of cases, the teacher’s prediction did not match the student’s performance on the test. Maybe the teacher was at fault, maybe the test was faulty, maybe both; we have no way of knowing. Yet it is on this shaky foundation that the state demands a student be designated "retained" until he or she can prove he or she is worthy of promotion: guilty until proven innocent.
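The gap between group-level agreement and individual-level accuracy can be sketched with a small hypothetical example. Only the 60/40 split comes from the report; every cell count below is invented purely for illustration:

```python
# Hypothetical 2x2 breakdown for 100 students. Only the 60/40 split
# mirrors the report; the individual cell counts are invented.
predicted_high_scored_high = 50  # teacher said III/IV, test agreed
predicted_high_scored_low = 10   # teacher said III/IV, test said I/II
predicted_low_scored_high = 10   # teacher said I/II, test said III/IV
predicted_low_scored_low = 30    # teacher said I/II, test agreed

total = (predicted_high_scored_high + predicted_high_scored_low
         + predicted_low_scored_high + predicted_low_scored_low)

# Group-level "validity": the marginal proportions match exactly.
teacher_high_rate = (predicted_high_scored_high + predicted_high_scored_low) / total
test_high_rate = (predicted_high_scored_high + predicted_low_scored_high) / total
print(teacher_high_rate, test_high_rate)  # 0.6 0.6

# Individual-level agreement tells a different story.
mispredicted = predicted_high_scored_low + predicted_low_scored_high
print(mispredicted / total)  # 0.2 -- 20% of students mis-predicted
```

Both marginals come out at 60%, yet one student in five is classified differently by the teacher and the test; matching group proportions prove nothing about individual accuracy.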
(B) The report asserts that the very process used to validate the test for group-level use makes it an improper instrument for assessing an individual student. In 1992, during the field test and validation phase
of the EOGs, teachers were asked to rank their students from Level I (the
lowest) to Level IV (the highest). The subheading attached to Level II was
"achieves at a basic level." It is reasonable to suppose that
teachers attached the Level II/achieves at a basic level label to their average
or only slightly-below-average performing students. Compare the 1992 phrase
attached to Level II "achieves at a basic level" to the current
subheading attached to Level II students: "Students performing at this level demonstrate inconsistent mastery of knowledge and skills in this subject area and are minimally prepared to be successful at the next grade level." Nine years makes a lot of difference. NCCDS contends most reasonable people would see a substantial difference between the two subheadings. It would appear that retaining students who perform at Level II,
even if the test were valid, would be committing an unjust act when one
considers how the labels have shifted under current state guidelines. Beginning
in the 5th grade this year and in grades 3 and 8 thereafter, all Level II
and below students are subject to retention.
(2) The Dubious Nature
of Norm-Referenced Tests
Most North Carolinians are either uninformed about the nature of the EOG and EOC tests in NC or they believe that the tests are "criterion-referenced." A criterion-referenced test measures specific mastery of well-defined information and/or skills to be learned over a period of time. The EOGs and EOCs are not criterion-referenced; they are norm-referenced. This practice has several consequences, almost all of them negative.
Norm-referencing means a student’s performance on a test is not measured against a set standard (the way it is when you take a driver’s license test). Rather, the student’s performance is compared to that of other students who have taken the same test. By this measurement standard, a
student could perform at a mediocre level on the test but if most students have
performed in a similar or worse fashion, then his percentiles would appear
higher than they would on a criterion-referenced test. On the other hand, if
most students cluster near the high-end and the student performs at a mediocre
level, his percentile would appear lower than it would under a criterion-referenced
test.
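The cohort-dependence of a percentile can be sketched in a few lines. The score distributions below are invented for illustration; only the logic, that the same raw score lands at different percentiles in different cohorts, is the point:

```python
def percentile(score, cohort):
    """Percent of the cohort scoring strictly below `score`."""
    below = sum(1 for s in cohort if s < score)
    return 100 * below / len(cohort)

raw = 55  # the same mediocre raw score in both scenarios

# Cohort A: most students performed similarly or worse (invented scores).
cohort_a = [30, 40, 45, 50, 52, 54, 60, 62, 65, 70]
# Cohort B: most students clustered near the high end (invented scores).
cohort_b = [50, 60, 70, 75, 80, 82, 85, 88, 90, 95]

print(percentile(raw, cohort_a))  # 60.0 -- looks comfortably above average
print(percentile(raw, cohort_b))  # 10.0 -- looks near the bottom
```

The raw performance is identical in both cases; only the company the student keeps changes the reported standing.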
According to the Psychology Association
Report, the EOG was constructed so that:
25% of the items are easy (meaning they can be answered correctly by 70% of examinees)
50% of the items are at the medium level (meaning they can be answered correctly by 50% - 60% of students)
25% are at the difficult level (meaning they can be answered correctly by only 20% - 30% of the students)
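Taking the midpoint of each band (the band percentages are from the report; the midpoints are our assumption), this design implies a typical examinee answers only about half the items correctly, a back-of-the-envelope computation:

```python
# Expected fraction of items a typical examinee answers correctly,
# using the midpoint of each difficulty band stated in the report.
# The midpoint choice is our assumption, not the report's.
item_mix = [
    (0.25, 0.70),  # 25% easy items: ~70% answer correctly
    (0.50, 0.55),  # 50% medium items: 50-60% -> midpoint 55%
    (0.25, 0.25),  # 25% hard items: 20-30% -> midpoint 25%
]
expected = sum(share * p_correct for share, p_correct in item_mix)
print(expected)  # roughly 0.51 -- about half the items
```

In other words, the test is deliberately built so the average student misses nearly half the questions, which guarantees a wide spread of scores to rank.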
Why norm-referenced
tests are class-referenced tests
In short, one can theoretically make a
"100" on a criterion-referenced test but not a norm-referenced test.
More importantly, norm-referenced tests guarantee winners and guarantee
losers. The "winners" might be incompetent or the
"losers" might be just a shade under the "winners". Who
knows? Nobody, really.
Since the test ordains that there must be
losers, it is important to consider who is most likely to fall into the
"loser" category since those are the students most likely to be
retained. Furthermore, the test determines which schools will bear the negative
label of "low-performing" or "maintaining status quo." The
latter is a negative label in the ABCs game since no bonus is paid to the
faculty and staff.
All counties now operate some sort of
remediation program to get their Level I/II students to do better on the tests.
It stands to reason that the wealthier counties are best able to financially
provide optimum remediation, while poorer counties have the double burden of
more students to remediate as well as less money to provide the services. The
state has partially made up for this shortfall but it is unrealistic to expect
that the state’s current efforts even begin to put poorer counties on a level
playing field. This means that a wealthier county such as Guilford can and will do more to "bring up the scores" than a poorer county. On a norm-referenced test, the Guilford students will typically perform better than their less-prepared counterparts. The NC Psychology Association Report concluded that
"since test preparation materials are purchased locally, students in more
wealthy school systems would seem to have a significant advantage over students
in less wealthy districts."
(3) The "Thinking
Skills" component of NC’s Standardized Tests. How NC rewards those with
good "test-taking" skills.
The "thinking skills" component of
the NC curriculum receives a scattershot approach — it is vague, thin, and
confusing. Consequently, the teaching "thinking skills" in a NC
classroom is an exercise in guesswork. Since the factual recall portion of the
curriculum is more easily understood and taught, it generally receives more
attention. In the portion of the NC Standard Course of Study that is entitled
"Dimensions of Thinking" teachers are given the following golden
nuggets of direction and inspiration from the state:
(1) "All students can become better
thinkers."
(2) "Thinking is improved when the
learner takes control of his/her thinking processes and skills."
(3)"The teaching of thinking should be
deliberate and explicit....." (DPI, 1999, p. xi)
With such clear and inspiring direction as
this, it’s hard to imagine where NC teachers could go wrong. Of course, for the
students, this is no joke.
Question: So how does NC measure
"thinking skills"? Answer: By using a "best answer" format.
On NC multiple-choice EOGs and EOCs, each
answer is supposed to "appear plausible for someone who has not achieved
mastery of the representative objective." The NC Psychology Report states
that the "EOG stresses the ability to apply information in new and different
ways rather than just mastery of learned information." On the surface,
this seems fair enough. But when one confronts the likelihood that
"low-ability students, who can acquire core academic skills, will not be
able to demonstrate their mastery of those skills on the EOG" because of
the "best-answer" format, it seems unfair. The hard-working
student with lower-level development will be penalized. On the other hand, the
student who just happens to have better "thinking-skills"
(higher-level of development) and who may have performed very little work or
actually learned much new content will be disproportionately rewarded because
he is the more adept game player. Furthermore, his game-playing ability
exists independently from the possible impact of classroom instruction.
If the ABCs are to keep schools and teachers accountable, how can measuring a
variable on which schools and teachers have such little impact translate into
accountability? And most important to consider: is this an appropriate
approach to sort students into stacks of "promote" and
"retain?"
(4) The (non)
Effectiveness of Retention
Before the Gateways became "live"
in 2001, retention in NC was already on the rise. Retention has risen from 3.2%
in 1992-1993 to 5.0% in 1999. The NC Psychology Association lists several possible reasons students are retained, along with statistics on dropping out among retainees. Most compelling was a study finding that students feared retention less only than going blind or the death of a parent. William Romey’s article that appeared in a 2000 issue of
the Phi Delta Kappan says a lot: "Retaining a child who hasn’t passed a
certain level at the end of June isn’t really retention at all. It is moving
the child clear back to the beginning of the year he or she has failed rather
than working with the individual child at his or her achievement level."
Romey suggests focused practice in the areas where the deficits exist is more fruitful than a total and complete repeat of the year, and does not stigmatize the child the way retention generally does. Even the state acknowledges the issue, in a twisted fashion.
Parents are urged to counsel their retained child and to explain to him that
this is "not the end of the world."
(5) Additional Pressures
Brought to Bear on Kindergarten and First-grade Students.
From the NC Psychology Association
Report: " It does appear,
however, that K — 2 students are being affected by the Student Accountability
Standards through developmentally inappropriate instructional practices,
downward pressure on the curriculum and early identification of those who might
not "pass the test."" The report was written in February 2001.
Last week (April 11, 2001), a
bill was proposed mandating that students read and "enjoy reading" at grade level upon entering 2nd grade. If enacted, this more than likely
means high-stakes testing for kindergarten and first grade students.
(6) The Narrowing Down
of the Curriculum
The report highlights the concerns
discovered by a UNC-Chapel Hill survey conducted in 1999 and reported in the
Phi Delta Kappan. In that study "80% of the teachers stated that their
students spent more than 20% of their instructional time practicing for the
test." Many teachers report spending less time on non-tested subjects such
as social studies, science, and health while tailoring their math, reading, and
writing instruction to fit the requirements of the test. In Scarsdale, NY, this month (April 2001) a large group of upper-middle-class parents kept their middle-school students home on the day the state tests were given to protest the dumbing down and narrowing of the curriculum that occurred as a consequence of the standardized tests.
(7) Consequences for
Student Motivation
From the NC Psychology Association
Report: "Positive emotions,
such as curiosity, generally enhance motivation and facilitate learning and
performance. However, intense negative emotions (e.g. anxiety, panic, rage,
and insecurity) and related thoughts (e.g. worrying about competence,
ruminating about failure, fearing punishment, ridicule, or stigmatizing
consequences) generally detract from motivation, interfere with learning, and
contribute to low performance." In other words, once a student starts on a shame-based spiral, it is quite difficult to break the cycle. Telling a
student she is a Level II or "you failed the competency test, again"
is unlikely to restore the requisite confidence for a student to approach
remedial instruction and have it be meaningful and helpful. The report concludes:
"Emphasizing a student’s improvement over time, rather than comparing a
student’s performance to other students, is likely to increase the student’s
self-efficacy for learning."
John deville, April 2001.