Test Validity
This is a rewrite of a post I wrote in 2016. I raise the issues again now because September is nigh, and thus the owners of high-stakes exams are rubbing their hands in anticipation of all the lovely money that's going to come tumbling in: more than 500,000 people will pay approx. $250 each to take the IELTS test, for example.
Does a test accurately measure what it’s supposed to measure? If it does, it’s a valid test. Easy, right? But maybe a bit too easy. Cronbach and Meehl (1955) introduced the ‘trinitarian’ view of validity, which was dominant until the 1990s. Validity was seen as comprising content validity, criterion-related validity and construct validity.
Large-scale tests are often used by the state or other authorities to ration limited resources and opportunities, and such tests are currently being used all over the world to achieve a wide range of political goals, including curbing immigration and promoting private education. Shohamy (2001) argues that “centralized systems” use externally imposed, standardized, one-shot, high-stakes tests to control educational systems by defining what kind of knowledge is prestigious. Glenn Fulcher goes further and suggests that we need to understand the political philosophies which lead to centralised or decentralised types of government, and their associated ways of using tests as policy tools. Fulcher argues that “depending on where a political philosophy stands on the cline between the state and the individual, we can identify the kind of government likely to be favoured, and the kind of society valued. It is my contention that it also explains (and predicts) the uses of tests that we are likely to find” (Fulcher, 2009, p. 5).
Fulcher defines “collectivist societies” as “those in which the identity, life, and value of the individual is determined by membership of the state and its institutions. Decisions are made to benefit the collective and its survival rather than its individual members.” In contrast, “modern individualism” starts from the claim that, as Locke put it, “men are by nature all free, equal, and independent”, that “no one can be subjected to the political power of another without his own consent”, and that there are limits on the authority of the state: laws apply to all equally, they protect the rights of individuals, and they can only be made by a democratically elected legislature. I’m not entirely happy with Fulcher’s use of these two “isms”, but at least they don’t equate simply with left- and right-wing politics, and, anyway, they can certainly be used to examine test use.
Collectivism and Testing
Fulcher argues that in societies that tend towards collectivism, the centralization of both educational systems and testing is a priority. Modern collectives use testing to control the educational system, to select and allocate individuals to roles or tasks that benefit the collective, and to ensure uniformity and standardization. While we might think immediately of countries like North Korea or China in this regard, Fulcher argues that established democracies are not immune from “neocollectivism”: we need look no further than the UK.
Examples of centrally controlled standards-based education systems, with a high level of control over teacher training and school learning, are not hard to find (Brindley, 2008). The clearest example is that of the United Kingdom, which has systematically introduced standards-based testing in an accountability framework that ensures total state control over the national curriculum and national tests, as well as teacher training; even educational staff are rewarded or disciplined based on national league tables (Mansell, 2007). (Fulcher, 2009, p. 7).
Fulcher argues that these hyper-accountability policies are pursued by the state in an attempt to improve performance in the global marketplace: “the educational system is reengineered to deliver the kinds of people who will serve the perceived needs of the economy” (Fulcher, 2009, p. 7).
Fulcher goes on to give the Common European Framework of Reference (CEFR) as an example of neocollectivism at the supranational level, claiming that the system is used to control language learning so as to deal with Europe’s weakened position in global markets. Fulcher claims that the CEFR is being used “as a tool for designing curricula, reporting both standards and outcomes on its scales, and for the recognition of language qualifications through linking test scores to levels on the CEFR scales.” He goes on:
We now see stronger evidence for more intrusive collectivist policy emerging in calls for claims of linkage to the CEFR to be approved by a central body (Alderson, 2007), and the removal of the principle of subsidiarity from language education in Europe (Bonnet, 2007). If realized, these changes would lead to unaccountable centralized control of education and qualification recognition across the continent. (Fulcher, 2009, p. 8).
Individualism and Testing
Enlightenment individualism claims “the right of each person to be free from control or oppression from a state that acquires too much power and begins to control the lives of citizens” (Fulcher, 2009, p. 9). Fulcher is quick to point out that “this is not a right-wing position” and that “attempts to summarily dismiss individualistic critiques of test use as right-wing reactionism by labelling them ‘Eurosceptic’ (Alderson, 2007, p. 660) …fail to engage with the social consequences of test use and misuse” (p. 10).
In societies that lean towards Fulcher’s individualistic political philosophy, the state has little say in what is taught, or how it’s taught, and the role of tests is to promote personal growth, or to provide individuals with new learning opportunities. Fulcher gives these examples of the uses of tests which are in keeping with individualism:
According to Fulcher, the general characteristics of this “individualistic paradigm” are:
Conclusion
The problem with Fulcher’s two “isms” is that the individualist political philosophy he champions seems utopian; in today’s neoliberal global capitalism, where are the examples of the individualistic paradigm in action? It’s instructive to note that the UK has adopted such a highly centralised “neocollectivist” educational policy, but I can’t think of any country where the state has little say in what is taught, or how it’s taught, and the role of tests is to promote personal growth, or to provide individuals with new learning opportunities.
Nevertheless, I think Fulcher does help those of us who are fighting for change in countries where you can at least express your view and experiment without being thrown in jail. In such environments, with reference to ELT, I think there is a place for standardised large-scale tests as long as they’re carefully used within the constraints of Fulcher’s individualistic paradigm, i.e., when they’re used as an index of proficiency and are intended to give test takers the opportunity to demonstrate their mastery of a range of skills and abilities so as to gain access to further education, jobs and other opportunities. Standardised large-scale tests should not be used by the state or other authorities to carry out political objectives, and should not influence normal language classroom practice, although, in my opinion, there’s a legitimate place for well-defined exam preparation courses. Likewise, I think classroom assessment is fine when it is used to make decisions about learning and teaching which result in more efficacious classroom practice.
When considering standardised tests and classroom assessment, the fundamental distinction is the one Fulcher makes between the uses to which the two are put. Because of these different uses, standardised tests must be fair to all who take them, whereas classroom assessment need not concern itself with fairness and can instead concentrate on further growth. While collaboration in a standardised test is labelled ‘cheating’, in the classroom it can be valued and praised. In standardised tests, score users are concerned with how meaningful the score is beyond the specific context that generated it. Thus, score reliability (dependent on consistency of measurement, discrimination between test takers, the length of the test, and the homogeneity of what is tested) is of prime importance. But in a learning environment like the language classroom, we value divergent and conflicting opinion, and we often encourage it through dialogue and debate. “The only meaning we could ascribe to ‘reliability’ would be the extent to which the decisions we make for future growth are more appropriate than inappropriate” (Fulcher and Davidson, 2007, p. 7).
P.S. I'm off on my hols now, so no talking till the bell goes.
References
See Fulcher, G. (2009) Test Use and Political Philosophy. Annual Review of Applied Linguistics, 29, 3–20, for all references except:
Fulcher, G. and Davidson, F. (2007) Tests in Life and Learning: A Deathly Dialogue. Educational Philosophy and Theory, 40(3), 407–417.
Fulcher’s “Test Use and Political Philosophy” can be downloaded here: