Stephen van Vlack
Sookmyung Women's University
Graduate School of TESOL
Approaches to English Grammar
Fall 2006
Week 12; Answers
Thornbury (1999) Chapter 9: How to test grammar, pp. 141-150.
1. What are some of the different types of tests you can use with a grammar focus? Give specific examples.
Thornbury breaks tests with a grammatical focus down into two basic groups: discrete item tests and production tests. This is a highly simplified picture of the different types of tests, or rather assessment tools, we can use with a grammar focus. While this distinction is all well and good, it relates only to tests whose primary focus is the assessment of grammar. It still makes the mistake of equating language competence with grammatical knowledge. Since grammar is everywhere, there are all sorts of ways to get students to use it for all sorts of purposes. Thus, we can and should be inventive with the means we use to test grammar.
Anything we have our students do will, in effect, involve grammar at various levels and to various degrees. We really need to get over the idea of grammar as a distinct component which can and should be addressed separately from other components. Based on this we need holistic tests which measure all four skills. A simple example would be a test based on reading skills. It is obvious that to read a passage effectively a test-taker would need to use not only grammatical skill but other skills as well, such as vocabulary, not to mention pragmatic skills. Thus, listening or reading tests are also tests of grammar, and we can still use a rather traditional format for them. All we would need to do is extend the range of questions to include ones which focus on grammar as a necessary mechanism for the processing of the text. We can, therefore, see that creating receptive tests with a familiar format, but with a grammar focus included, is a simple way of diversifying our means of testing our students. One of the main tricks or, often, stumbling blocks in receptive tests is, however, that we need to go out and find good texts which we can use to test our students. They need to be valid with respect to our goals as well as students' expectations, which means it may be hard to use completely authentic texts for testing. Teachers may have to adapt pre-existing authentic texts for receptive testing purposes.
Production tests, as we discussed in class, are harder to manufacture than their receptive counterparts. While at first glance the questions might seem easier to make (this is NOT true), they are also very challenging to grade. As with all tests, we need validity on both sides. We need construct validity, which means that each test is designed using the same set of principles (same rubric). The basic idea is that each test should be similar but not the same. Thus the same person taking the test twice should come up with a similar score. Once this has been done, people need to be trained to actually administer the test. More than one person may well be giving the test, and this is a distinct possibility because in productive tests, unless you have developed a highly sophisticated computer program like the MATE, you will only be able to test one person at a time. Thus, several testers will need to be trained in how to give the test to students. The most common type of productive test is the interview, but there are some serious questions about its validity as a way of testing global competence. We might want to get our test-takers to act out specific scenarios, and this will require a lot of planning and training on the part of the examiner, who will probably need to act in the scenario as well, but without affecting the result. Then the test-taker's performance must also be assessed. Unlike in the receptive tests, for which there are probably right or wrong answers, the assessment of a productive test will always be a matter of judgement. Thus judges need to be trained with a particular scale so that they all come to the same conclusions. Raters or judges need to think the same way, and there is no room for deviation from the predetermined scale. Based on all this it should be apparent that it is very hard for an individual or even a small underfunded organization to create an effective productive test which is actually valid.
This does not mean that we can't use some of these ideas in the classroom, however.
One means of assessing students which is often overlooked is assessing them less formally in the classroom. This means that our classroom is not merely used to prepare students informationally for a formal exam, as is all too often done, but is used as a forum allowing students to practice real language use, albeit in an unreal setting. To do this we need to shift our focus away from the idea of instilling in students the knowledge they will need to successfully complete a test, and toward actually using the target language. We assess each of their performances as best we can within the confines of our attention. To do this we will need to have a grading system not unlike the ones that might be used in formal productive exams. The trick is to make sure we always assess in the same way. The teacher herself must remain consistent and systematic to keep the assessment valid.
2. What is face validity?
The idea underlying face validity is very simple. In order for a test to have face validity it must appear fair to the students. This means, for one, that it should actually test what was presented in class. Nothing is more annoying to a student than a test which asks students to do things and know things which they could not prepare for. This doesn't always mean that you have to teach every single thing that's on the test, but in introducing a test you need to specify what it is they should do to prepare for it, particularly if you haven't prepared them entirely in class. Obviously students are going to study hard for tests because they have high stakes. It would seem efficient to try to use the test to actually get the students to study something that maybe you don't have time for in class. But it must be doable. No tricks, or the test will simply not have face validity. The test should also be constructed in a way that is clear for students to follow. In doing a test the students should always know exactly what it is that they're supposed to do. If the directions are unclear and as a result students do the wrong thing, then it is the teacher's fault because the test lacks validity. Finally, students need to be able to come up with the right answers while understanding what the right answer might be. This means that as students move through the test they should be able to self-assess how well they are doing. All these things basically boil down to the fact that students should feel, if not good about the test, at least that the test was fair.
3. How can tests be made to be reliable and/or valid?
Only tests that are reliable can be said to be valid. Unfortunately, reliability is a difficult level to attain and, to be honest, most tests that are given are probably not reliable. One type of reliability revolves around the idea of construct validity. The test, first of all, should be constructed so that it actually tests what it's supposed to test. Strange as it may seem, this is not a given, and often tests wind up doing something different from what they were actually designed for. For example, language tests involving listening often become memory tests, or reading tests are not necessarily tests of language but of scanning ability. What is more, the same tests given to different students should yield the same results. In the same vein, the same student taking the same test should wind up getting the same score, or around the same score, if the test has construct validity. Looking at this we can see that test design is extremely important.
In addition to design there is also the very important area of scoring. Different people scoring the same test should come up with the same scores. This is relatively easy on a discrete point test, provided the test is designed well, but can pose a major and often insurmountable obstacle for more production-based tests. To best deal with this, a strict scale and procedure for assessing performance must not only be established, but must also be strictly adhered to.
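To make the idea of scorer consistency a little more concrete, here is a minimal sketch in Python of how we might check whether two raters are applying the same scale in the same way. Everything in it is an assumption made for illustration: the scores, the five-band scale, and the function names are hypothetical, and Cohen's kappa is simply one commonly used agreement statistic, not something taken from Thornbury.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of performances on which two raters gave the same band."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of performances")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters pick the same band at random,
    # given how often each of them uses each band.
    expected = sum(counts_a[band] * counts_b[band] for band in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical band throughout
    return (observed - expected) / (1 - expected)


# Hypothetical band scores (1-5) given by two raters to the same eight interviews.
rater_a = [3, 4, 2, 5, 3, 4, 1, 3]
rater_b = [3, 4, 3, 5, 3, 3, 1, 3]

print(percent_agreement(rater_a, rater_b))       # 0.75
print(round(cohens_kappa(rater_a, rater_b), 2))  # 0.64
```

A low figure here would suggest that the raters need more training with the scale, or that the scale itself is not specific enough, before the scores can be trusted.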
4. What is backwash and how can we use it?
Backwash flows in two different directions. First of all, it describes how useful a test might be from the initial teaching perspective. In this respect, a test is said to have good backwash effects if, by teaching for the test, the students are also learning important and useful things, as evidenced by research into language acquisition. This can be tied to the idea of validity, but validity from a research perspective. A test should somehow relate to what we know about the language acquisition process. Therefore, it might actually be useful to teach for the test. Since we invariably teach for a test, whether or not it is a good test based on acquisition criteria, we can therefore say that a test should produce good backwash effects. Simply put, by teaching for the test we should also be teaching students things which will help their language acquisition. Thus, a test that requires lots of specific strategies and skills related solely to that test or type of test will not show good backwash effects. In the other direction, a test is said to show good backwash effects when it is useful as a teaching tool, that is, when going over the test after it has been given and assessed. So, by going over the test and the answers the students produced, the students should be learning useful information. This is the other side of backwash, which again relates to the validity of the design in relation to second language acquisition research.
Lewis (1997) Chapter 8: Classroom reports, pp. 143-176.
5. Which of the reports did you find most useful?
I think all six reports deal with aspects of teaching which we have, to some small extent at least, discussed in this class. Reports 3 and 4 deal with a similar idea, namely that of chunking, which we already know is of great importance. It is equally clear that chunking can be dealt with effectively and meaningfully by turning to the lexical approach. So these are good, but they are not necessarily surprising. I like report 2 the best because it gives us a clear idea of how we can concretely deal with a problem by using the lexical approach. It is particularly useful because it shows how we can create materials which are interesting and will aid our students in language acquisition while using the lexical approach. For the same reason of practicality I also like report 6, which describes the use of lexical notebooks. For some time in the class we have advocated the use of lexical notebooks, remarking on the initial difficulty of getting the students to understand the underlying rationale as well as of organizing the notebooks. This report gives a better idea of how we might go about doing this.
6. Which did you find least useful?
The least useful report is maybe report 1, for its generality.
7. What is one conclusion you can draw from all of these reports?
The main conclusion we can draw from these reports is that the lexical approach is indeed teachable, and in just the way Lewis advocates throughout the book: as an enhancer. That is, we can use the lexical approach as a way of enhancing our regular teaching. Based on some of these reports, it really is not so hard to do.