Implications of the
North Carolina Psychology Association
Report on Accountability Standards
released 2/2001
(1) Are the EOGs & EOCs valid?
(2) The Dubious Nature of Norm-Referenced Tests
(3) The "Thinking Skills" component of NC’s Standardized Tests. How NC rewards those with good "test-taking" skills.
(4) The (non)Effectiveness of Retention
(5) Additional Pressures Brought to Bear on Kindergarten and First-grade Students.
(6) The Narrowing Down of the Curriculum
(7) Consequences for Student Motivation
(1) Are the EOGs &
EOCs valid?
(A) The end-of-grade tests (EOGs) have not been validated for use on individual students. The EOGs were validated only for assessing the performance of individual schools or school districts. The only application where the result may be retention, according to the NC Technical Report (a document which addresses the proper use of the EOGs), is the use of the eighth grade EOG (commonly known as the "competency test") as a requirement for graduation. Federal guidelines strongly suggest, if not outright demand, that if tests are to be used for a particular purpose they be expressly validated for that purpose, and that states cannot rely on previous validity studies for new applications of the same instrument.
Why not? The report states: "Over 160,000 students
participated in the 1992 field test. The EOG manual cites the relationship
between teacher judgments of student’s achievement levels and their concurrent
EOG scores as evidence of the test’s criterion-related validity." In other
words, teachers indicated 60% of the time on the surveys that students were at
Level III or IV, and 40% of the time that students were at Level I or II. Since roughly 60% of the students fell into the Level III/IV category and roughly 40% fell into the Level I/II category, the state decided there was enough of a correlation between what the teachers had predicted and what the test measured.
Therefore, according to the state, the test was indeed valid. This "group validity" glosses over the fact that thousands of students were "mis-predicted": in thousands of cases, the teacher’s prediction did not match the student’s performance on the test. Maybe the teacher was at fault, maybe the test was faulty, maybe both; we have no way of knowing. Yet it is on this shaky foundation that the state demands a student be designated "retained" until he or she can prove he or she is worthy of promotion: guilty until proven innocent.
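The gap between group-level agreement and individual-level accuracy can be sketched with a small hypothetical example. Only the 60/40 split comes from the report; every cell count below is invented purely for illustration:

```python
# Hypothetical 2x2 breakdown for 100 students. Only the 60/40 split
# mirrors the report; the individual cell counts are invented.
predicted_high_scored_high = 50  # teacher said III/IV, test agreed
predicted_high_scored_low = 10   # teacher said III/IV, test said I/II
predicted_low_scored_high = 10   # teacher said I/II, test said III/IV
predicted_low_scored_low = 30    # teacher said I/II, test agreed

total = (predicted_high_scored_high + predicted_high_scored_low
         + predicted_low_scored_high + predicted_low_scored_low)

# Group-level "validity": the marginal proportions match exactly.
teacher_high_rate = (predicted_high_scored_high + predicted_high_scored_low) / total
test_high_rate = (predicted_high_scored_high + predicted_low_scored_high) / total
print(teacher_high_rate, test_high_rate)  # 0.6 0.6

# Individual-level agreement tells a different story.
mispredicted = predicted_high_scored_low + predicted_low_scored_high
print(mispredicted / total)  # 0.2 -- 20% of students mis-predicted
```

Both marginals come out at 60%, yet one student in five is classified differently by the teacher and the test; matching group proportions prove nothing about individual accuracy.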
(B) The report asserts that the very process used to validate the test for group-level use makes it an improper instrument for assessing an individual student. In 1992, during the field test and validation phase
of the EOGs, teachers were asked to rank their students from Level I (the
lowest) to Level IV (the highest). The subheading attached to Level II was
"achieves at a basic level." It is reasonable to suppose that
teachers attached the Level II/achieves at a basic level label to their average
or only slightly-below-average performing students. Compare the 1992 phrase
attached to Level II "achieves at a basic level" to the current
subheading attached to Level II students: "Students performing at this level demonstrate inconsistent mastery of knowledge and skills in this subject area and are minimally prepared to be successful at the next grade level." Nine years makes a lot of difference. NCCDS contends most reasonable people would see a substantial difference between the two subheadings. It would appear that retaining students who perform at Level II,
even if the test were valid, would be committing an unjust act when one
considers how the labels have shifted under current state guidelines. Beginning
in the 5th grade this year and in grades 3 and 8 thereafter, all Level II
and below students are subject to retention.
(2) The Dubious Nature
of Norm-Referenced Tests
Most North Carolinians are either uninformed about the nature of the EOG and EOC tests in NC or they believe that the tests are "criterion-referenced." A criterion-referenced test measures specific mastery of well-defined information and/or skills to be learned over a period of time. The EOGs and EOCs are not criterion-referenced; they are norm-referenced. This practice has several consequences, almost all of them negative.
Norm-referencing means a student’s performance on a test is not measured against a set standard (the way it is when you take a driver’s license test). Rather, the student’s performance is compared to that of other students who have taken the same test. By this measurement standard, a
student could perform at a mediocre level on the test but if most students have
performed in a similar or worse fashion, then his percentiles would appear
higher than they would on a criterion-referenced test. On the other hand, if
most students cluster near the high-end and the student performs at a mediocre
level, his percentile would appear lower than it would under a criterion-referenced
test.
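The cohort-dependence of a percentile can be sketched in a few lines. The score distributions below are invented for illustration; only the logic, that the same raw score lands at different percentiles in different cohorts, is the point:

```python
def percentile(score, cohort):
    """Percent of the cohort scoring strictly below `score`."""
    below = sum(1 for s in cohort if s < score)
    return 100 * below / len(cohort)

raw = 55  # the same mediocre raw score in both scenarios

# Cohort A: most students performed similarly or worse (invented scores).
cohort_a = [30, 40, 45, 50, 52, 54, 60, 62, 65, 70]
# Cohort B: most students clustered near the high end (invented scores).
cohort_b = [50, 60, 70, 75, 80, 82, 85, 88, 90, 95]

print(percentile(raw, cohort_a))  # 60.0 -- looks comfortably above average
print(percentile(raw, cohort_b))  # 10.0 -- looks near the bottom
```

The raw performance is identical in both cases; only the company the student keeps changes the reported standing.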
According to the Psychology Association
Report, the EOG was constructed so that:
25% of the items are easy (meaning they can be answered correctly by 70% of examinees)
50% of the items are at the medium level (meaning they can be answered correctly by 50% - 60% of students)
25% are at the difficult level (meaning they can be answered correctly by only 20% - 30% of the students)
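Taking the midpoint of each band (the band percentages are from the report; the midpoints are our assumption), this design implies a typical examinee answers only about half the items correctly, a back-of-the-envelope computation:

```python
# Expected fraction of items a typical examinee answers correctly,
# using the midpoint of each difficulty band stated in the report.
# The midpoint choice is our assumption, not the report's.
item_mix = [
    (0.25, 0.70),  # 25% easy items: ~70% answer correctly
    (0.50, 0.55),  # 50% medium items: 50-60% -> midpoint 55%
    (0.25, 0.25),  # 25% hard items: 20-30% -> midpoint 25%
]
expected = sum(share * p_correct for share, p_correct in item_mix)
print(expected)  # roughly 0.51 -- about half the items
```

In other words, the test is deliberately built so the average student misses nearly half the questions, which guarantees a wide spread of scores to rank.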
Why norm-referenced
tests are class-referenced tests
In short, one can theoretically make a
"100" on a criterion-referenced test but not a norm-referenced test.
More importantly, norm-referenced tests guarantee winners and guarantee
losers. The "winners" might be incompetent or the
"losers" might be just a shade under the "winners". Who
knows? Nobody, really.
Since the test ordains that there must be
losers, it is important to consider who is most likely to fall into the
"loser" category since those are the students most likely to be
retained. Furthermore, the test determines which schools will bear the negative
label of "low-performing" or "maintaining status quo." The
latter is a negative label in the ABCs game since no bonus is paid to the
faculty and staff.
All counties now operate some sort of
remediation program to get their Level I/II students to do better on the tests.
It stands to reason that the wealthier counties are best able to financially
provide optimum remediation, while poorer counties have the double burden of
more students to remediate as well as less money to provide the services. The
state has partially made up for this shortfall but it is unrealistic to expect
that the state’s current efforts even begin to put poorer counties on a level
playing field. This means that a wealthier county such as Guilford can and will do more to "bring up the scores" than a poorer county. On a norm-referenced test, the Guilford students will typically perform better than their less-prepared counterparts. The NC Psychology Association Report concluded that
"since test preparation materials are purchased locally, students in more
wealthy school systems would seem to have a significant advantage over students
in less wealthy districts."
(3) The "Thinking
Skills" component of NC’s Standardized Tests. How NC rewards those with
good "test-taking" skills.
The "thinking skills" component of
the NC curriculum receives a scattershot approach — it is vague, thin, and
confusing. Consequently, the teaching "thinking skills" in a NC
classroom is an exercise in guesswork. Since the factual recall portion of the
curriculum is more easily understood and taught, it generally receives more
attention. In the portion of the NC Standard Course of Study that is entitled
"Dimensions of Thinking" teachers are given the following golden
nuggets of direction and inspiration from the state:
(1) "All students can become better
thinkers."
(2) "Thinking is improved when the
learner takes control of his/her thinking processes and skills."
(3)"The teaching of thinking should be
deliberate and explicit....." (DPI, 1999, p. xi)
With such clear and inspiring direction as
this, it’s hard to imagine where NC teachers could go wrong. Of course, for the
students, this is no joke.
Question: So how does NC measure
"thinking skills"? Answer: By using a "best answer" format.
On NC multiple-choice EOGs and EOCs, each
answer is supposed to "appear plausible for someone who has not achieved
mastery of the representative objective." The NC Psychology Report states
that the "EOG stresses the ability to apply information in new and different
ways rather than just mastery of learned information." On the surface,
this seems fair enough. But when one confronts the likelihood that
"low-ability students, who can acquire core academic skills, will not be
able to demonstrate their mastery of those skills on the EOG" because of
the "best-answer" format, it seems unfair. The hard-working
student with lower-level development will be penalized. On the other hand, the
student who just happens to have better "thinking-skills"
(higher-level of development) and who may have performed very little work or
actually learned much new content will be disproportionately rewarded because
he is the more adept game player. Furthermore, his game-playing ability
exists independently from the possible impact of classroom instruction.
If the ABCs are to keep schools and teachers accountable, how can measuring a
variable on which schools and teachers have such little impact translate into
accountability? And most important to consider: is this an appropriate
approach to sort students into stacks of "promote" and
"retain?"
(4) The (non)
Effectiveness of Retention
Before the Gateways became "live"
in 2001, retention in NC was already on the rise. Retention has risen from 3.2%
in 1992-1993 to 5.0% in 1999. The NC Psychology Association lists several possible reasons students are retained, along with statistics on dropping out among retainees. Most compelling was a study finding that students feared retention less only than going blind or the death of a parent. William Romey’s article that appeared in a 2000 issue of
the Phi Delta Kappan says a lot: "Retaining a child who hasn’t passed a
certain level at the end of June isn’t really retention at all. It is moving
the child clear back to the beginning of the year he or she has failed rather
than working with the individual child at his or her achievement level."
Romey suggests focused practice in the areas where the deficits exist is more fruitful than a total and complete repeat of the year, and does not stigmatize the child the way retention generally does. Even the state acknowledges the issue, in a twisted fashion.
Parents are urged to counsel their retained child and to explain to him that
this is "not the end of the world."
(5) Additional Pressures
Brought to Bear on Kindergarten and First-grade Students.
From the NC Psychology Association
Report: " It does appear,
however, that K — 2 students are being affected by the Student Accountability
Standards through developmentally inappropriate instructional practices,
downward pressure on the curriculum and early identification of those who might
not "pass the test."" The report was written in February 2001.
Last week (April 11, 2001), a
bill was proposed mandating that students read and "enjoy reading" at grade level upon entering 2nd grade. If enacted, this more than likely
means high-stakes testing for kindergarten and first grade students.
(6) The Narrowing Down
of the Curriculum
The report highlights the concerns
discovered by a UNC-Chapel Hill survey conducted in 1999 and reported in the
Phi Delta Kappan. In that study "80% of the teachers stated that their
students spent more than 20% of their instructional time practicing for the
test." Many teachers report spending less time on non-tested subjects such
as social studies, science, and health while tailoring their math, reading, and
writing instruction to fit the requirements of the test. In Scarsdale, NY, this month (April 2001) a large group of upper-middle-class parents kept their middle-school students home on the day the state tests were given to protest the dumbing down and narrowing of the curriculum that occurred as a consequence of the standardized tests.
(7) Consequences for
Student Motivation
From the NC Psychology Association
Report: "Positive emotions,
such as curiosity, generally enhance motivation and facilitate learning and
performance. However, intense negative emotions (e.g. anxiety, panic, rage,
and insecurity) and related thoughts (e.g. worrying about competence,
ruminating about failure, fearing punishment, ridicule, or stigmatizing
consequences) generally detract from motivation, interfere with learning, and
contribute to low performance." In other words, once a student starts on a shame-based spiral, it is quite difficult to break the cycle. Telling a
student she is a Level II or "you failed the competency test, again"
is unlikely to restore the requisite confidence for a student to approach
remedial instruction and have it be meaningful and helpful. The report concludes:
"Emphasizing a student’s improvement over time, rather than comparing a
student’s performance to other students, is likely to increase the student’s
self-efficacy for learning."
John deville, April 2001.