IV Criteria of a good language test
Reliability and validity
• Appeared with the development of the second generation of language tests
• Though challenged by the third generation, they still remain very important in language testing studies
Some think that they cannot be separated.
Others consider them two entirely distinct concepts.
Bachman believes both can be better understood by recognizing them as complementary aspects of a common concern in measurement: identifying, estimating and controlling the effects of factors that affect test scores. (Bachman: 160)
• 100 students
• A 100-item test
• Taken on Wednesday afternoon and again on Thursday afternoon
• Test is excellently designed
• Conditions of administration almost identical
• No subjective judgment on the part of the scorers (though impossible)
• Carried out with perfect care
• No learning or forgetting taking place during the one-day interval
Would we expect the students to get exactly the same scores on both days? The answer: no!
This is inevitable. We must accept it.
What we have to do is:
construct, administer and score the tests in such a way that the scores of the two tests taken on different days are likely to be very similar to each other.
• Same student, with the same ability, at a different time.
• The more similar the scores, the more reliable the test is.
Reliability is concerned with the question, “How much of an individual’s test performance is due to measurement errors, to factors other than the language ability we want to measure?” (Bachman: 160)
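The two-day thought experiment above can be tried out with a small simulation. This is only an illustrative sketch: the function name `simulate_two_sittings`, the ability range (0.3 to 0.9) and the idea of treating each item as an independent chance event are all assumptions made for the example, not part of the source. Each simulated student keeps exactly the same ability on both days, yet the two scores still differ because of measurement error.

```python
import random


def simulate_two_sittings(n_students=100, n_items=100, seed=0):
    """Simulate the same test taken twice by students whose ability does not change.

    Assumption for illustration: a student's ability is the probability of
    answering any single item correctly, and items are independent.
    """
    rng = random.Random(seed)
    abilities = [rng.uniform(0.3, 0.9) for _ in range(n_students)]

    def sit(ability):
        # One sitting: count how many of the items come out correct.
        return sum(1 for _ in range(n_items) if rng.random() < ability)

    wednesday = [sit(a) for a in abilities]
    thursday = [sit(a) for a in abilities]
    return wednesday, thursday


wednesday, thursday = simulate_two_sittings()
identical = sum(1 for w, t in zip(wednesday, thursday) if w == t)
mean_gap = sum(abs(w - t) for w, t in zip(wednesday, thursday)) / len(wednesday)
print(f"Students with identical scores on both days: {identical} of {len(wednesday)}")
print(f"Average difference between the two scores: {mean_gap:.1f} points")
```

Under these assumptions, only a minority of students would be expected to get the same score twice, which matches the point that some variation is inevitable even under ideal conditions.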
4.1.2 Factors that affect reliability
1. Length of test: the longer the test and the more choices, the more reliable.
2. Homogeneity: similarity in difficulty, test form, number of items, coverage, test layout and directions.
3. Power of discrimination: the stronger, the more reliable.
4. Similar test conditions (including health)
5. Grading method
4.1.3 How to make tests more reliable?
1). Have enough samples, but too long a test also decreases reliability.
2). Do not allow candidates too much freedom.
3). Write unambiguous items. An example of an ambiguous item:
• Where does the author direct the reader who is interested in non-standard dialects of English?
• Some answered: P3
• But the expected answer: the further reading section of the book.
Compare the following writing tasks, which differ in how much freedom they allow candidates:
A) Write a composition on tourism.
B) Write a composition on tourism in this country.
C) Write a composition on how we might develop the tourist industry in this country.
D) Discuss the following measures intended to increase the number of foreign tourists coming to this country:
   I) More/better advertising and/or information (Where? What form should it take?)
   II) Improve facilities (hotels, transportation, communication)
   III) Training of personnel (guides, hotel managers)
Successive tasks impose more and more control over what is written.
The fourth task is likely to be much more reliable for a writing test.
4). Provide clear and explicit instructions.
5). Ensure that tests are well laid out and perfectly legible.
6). Candidates should be familiar with forms and testing techniques.
7). Provide uniform and non-distracting conditions of administration.
8). Use items that permit scoring which is as objective as possible.
9). Make comparison between candidates as direct as possible (one topic rather than different ones).
10). Provide a detailed scoring key. For a composition or oral test, sample performances representing different levels should be selected. Only when all scorers agree on the scores for these samples should real scoring begin.
11). Train scorers.
12). Identify candidates by number, not name.
13). Employ multiple, independent scorers, compare the two sets of scores and investigate discrepancies.
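Point 13 (comparing two independent sets of scores and investigating discrepancies) can be sketched in a few lines. The function name `flag_discrepancies`, the tolerance of 2 points and the sample scores are all invented for illustration; an actual marking scheme would set its own threshold. Candidates are keyed by number rather than name, in line with point 12.

```python
def flag_discrepancies(scores_a, scores_b, tolerance=2):
    """Return candidates whose two independent scores differ by more than `tolerance`.

    scores_a, scores_b: dicts mapping candidate number -> score from each scorer.
    The tolerance of 2 points is an arbitrary illustrative choice.
    """
    flagged = []
    # Only candidates scored by both scorers can be compared.
    for candidate in sorted(scores_a.keys() & scores_b.keys()):
        gap = abs(scores_a[candidate] - scores_b[candidate])
        if gap > tolerance:
            flagged.append((candidate, scores_a[candidate], scores_b[candidate], gap))
    return flagged


# Hypothetical composition scores out of 20 from two scorers.
scorer_1 = {101: 14, 102: 9, 103: 17, 104: 11}
scorer_2 = {101: 15, 102: 13, 103: 17, 104: 8}

for candidate, a, b, gap in flag_discrepancies(scorer_1, scorer_2):
    print(f"Candidate {candidate}: scorer 1 gave {a}, scorer 2 gave {b} (gap {gap})")
```

Flagged scripts would then go to a third scorer or be discussed by the two scorers, depending on local practice.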
4.1.4 How to know if the test scores are reliable? (coefficient of reliability)
Three ways (Gui: 130; Shu: 61):
1). Test-retest method: use the same test paper twice, with an interval in between. Risk: candidates may remember the items.
2). Equivalent forms method: use two papers at two different times (2 weeks apart). Advantage: avoids mechanical repetition. Risk: the two forms may not be really equivalent.
3). Split-half method: use just one test paper, treating the first half as one test and the second half as the other, or the odd-numbered items as one test and the even-numbered items as the other.
If one test paper has a high reliability coefficient, it can be used as an equivalent form for checking whether another test is reliable. (Gui: 1986)
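As a rough sketch of how such a coefficient might be computed, the code below correlates two sets of scores (as in the test-retest and equivalent forms methods) and applies the odd/even split to a single paper. The Pearson correlation is the usual choice for this, but the source does not name a formula, so treat that as an assumption. The Spearman-Brown step, which adjusts the half-test correlation up to full test length, is also not mentioned in the source and is included only as a commonly used companion to the split-half method. The function names and sample scores are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


def split_half(item_matrix):
    """Split-half estimate from one administration.

    item_matrix[s][i] is 1 if student s answered item i correctly, else 0.
    Returns (correlation between halves, Spearman-Brown adjusted estimate).
    """
    odd = [sum(row[0::2]) for row in item_matrix]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_matrix]  # items 2, 4, 6, ...
    r_half = pearson(odd, even)
    # Spearman-Brown adjustment (an addition, not from the source).
    return r_half, 2 * r_half / (1 + r_half)


# Test-retest: hypothetical scores of five students on two occasions.
first_sitting = [72, 65, 88, 54, 91]
second_sitting = [70, 68, 85, 57, 93]
print(f"Test-retest coefficient: {pearson(first_sitting, second_sitting):.2f}")
```

A coefficient close to 1 indicates that the two sets of scores rank the students in nearly the same way; how high is high enough depends on what the test is used for.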
Questions for thought:
1. Look at your own instructional tests. Use the list of points in the chapter to say in what ways you could improve their reliability.
2. Zhou Shen: 43:2