The problem of saltatory cut-score: some issues and recommendations for applying the Angoff to test Item Banks
conference contribution posted on 26.05.2006, 09:51 by William Coscarelli, Andrew Barrett, John Kleeman, Sharon Shrock
A fundamental issue in criterion-referenced test (CRT) development is: What should the cut score be to determine mastery? The literature has suggested three types of strategies for answering this question: Informed Judgment, Contrasting Groups, and Conjectural Techniques. For a number of reasons, the Conjectural approaches are probably the most common solution to this problem, and within this class the Angoff is probably the most commonly used technique for setting cut scores. The Angoff uses subject matter experts (SMEs) to review each item and assign a weight to the item based on the SME's conjecture about how likely a minimally competent performer would be to answer the item correctly. These weights, which are fundamentally different from a traditional difficulty index, are then summed to provide the initial recommendation for the cut score. As CRT development has become more widespread, the use of multiple forms of the same test has also become more common. Computerized test development tools allow questions to be selected at random, which can make the number of possible forms combinatorially large. This creates a new problem: theoretically, each form of the same test could have a different cut score. This bouncing score would be defensible from a statistical perspective, but it might pose implementation challenges for organizations for political and perhaps legal reasons. In this paper we look at how using Angoff weights to determine a cut score might work for an assessment whose questions are selected at random, and we give an illustrative example so that readers can consider it in action. We use a real data set from a certification test and look at the differences in cut scores under three assumptions: 1) random sampling from the data set, 2) a "random" sample that draws on the extremes of the data set, and 3) stratified random sampling of the set.
We then conclude with suggestions for sampling from item banks based on the size of the bank and the criticality of the test.
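The three sampling assumptions in the abstract can be made concrete with a minimal sketch. Everything in it is illustrative rather than taken from the paper: the item bank is synthetic (100 made-up Angoff weights between 0.30 and 0.95), the function names are hypothetical, and stratifying by Angoff weight is an assumption, since the abstract does not say which variable the authors stratified on.

```python
import random


def angoff_cut_score(weights):
    """Sum the per-item Angoff weights to get the raw cut score of one form."""
    return sum(weights)


def simple_random_form(bank, form_size, rng):
    """Assumption 1: draw items from the bank without replacement."""
    return rng.sample(bank, form_size)


def extreme_cut_scores(bank, form_size):
    """Assumption 2: the lowest and highest cut scores any form could have,
    obtained by taking only the lowest-weight or only the highest-weight items."""
    ordered = sorted(bank)
    return (angoff_cut_score(ordered[:form_size]),
            angoff_cut_score(ordered[-form_size:]))


def stratified_form(bank, form_size, n_strata, rng):
    """Assumption 3: sort by weight, split into equal bands, and draw the
    same number of items from each band (stratification variable assumed)."""
    if form_size % n_strata or len(bank) % n_strata:
        raise ValueError("bank size and form size must be divisible by n_strata")
    ordered = sorted(bank)
    band = len(ordered) // n_strata
    per_band = form_size // n_strata
    form = []
    for i in range(n_strata):
        stratum = ordered[i * band:(i + 1) * band]
        form.extend(rng.sample(stratum, per_band))
    return form


def cut_score_range(draw_form, n_forms):
    """Generate n_forms forms and return the (min, max) cut score observed."""
    cuts = [angoff_cut_score(draw_form()) for _ in range(n_forms)]
    return min(cuts), max(cuts)


# Synthetic bank of 100 Angoff weights; NOT the certification data in the paper.
rng = random.Random(0)
bank = [round(rng.uniform(0.30, 0.95), 2) for _ in range(100)]
FORM_SIZE = 20

lowest, highest = extreme_cut_scores(bank, FORM_SIZE)
random_range = cut_score_range(
    lambda: simple_random_form(bank, FORM_SIZE, rng), 200)
stratified_range = cut_score_range(
    lambda: stratified_form(bank, FORM_SIZE, 4, rng), 200)

print("extreme forms:    ", lowest, highest)
print("simple random:    ", random_range)
print("stratified random:", stratified_range)
```

Comparing the three printed ranges shows how far the cut score can move from form to form under each sampling scheme. Dividing each cut score by the form size gives the corresponding proportion-correct standard.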
- University Academic and Administrative Support
- Professional Development
- CAA Conference