TOUGH FOR STUDENTS BUT EASY FOR JUDGES OR VICE VERSA?! AN EARLY EVALUATION OF ANGOFF ITEM ACCURACY IN THREE EXAMINATIONS
** Poster Award Nominee
Donna Beman, Tomlin J. Paul*, Joseph M. Branday, Lauriann Young, Elaine Williams, The University of the West Indies, Mona Campus, JAMAICA
Purpose
The modified Angoff method was used for the first time in December 2006 at the University of the West Indies, Jamaica, to determine pass/fail cut-points for Stage 1 medical examinations. This paper examines the level of agreement between judges’ item-estimates and the observed difficulty of the items.
Methods
All item-estimates for three multiple-choice examinations (250 items) were collected from the standard-setting process. Items were categorized into quartiles of difficulty using their p-values (the proportion of students answering each item correctly). The difference between each item’s p-value and its item-estimate was then measured.
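The analysis described above can be illustrated with a minimal Python sketch. It rests on several assumptions not stated in the abstract: that quartiles are equal-sized groups formed by ranking items on p-value, that each item has a single item-estimate (for example, the mean across judges), and that both quantities are expressed as percentages. The function name `quartile_differences` and the sample data are hypothetical and are not the study's data.

```python
from statistics import mean, stdev

# Quartile labels ordered from hardest (lowest p-value) to easiest.
LABELS = ["Highest", "Moderately High", "Moderately Low", "Lowest"]

def quartile_differences(p_values, estimates):
    """Summarise (p-value - item-estimate) within each difficulty quartile.

    Assumption: quartiles are equal-sized groups by p-value rank.
    Returns {label: (mean difference, sample S.D.)}.
    """
    # A lower p-value means a harder item, so sort ascending by p-value.
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    n = len(order)
    summary = {}
    for q, label in enumerate(LABELS):
        idx = order[q * n // 4:(q + 1) * n // 4]
        diffs = [p_values[i] - estimates[i] for i in idx]
        sd = stdev(diffs) if len(diffs) > 1 else 0.0
        summary[label] = (mean(diffs), sd)
    return summary

# Hypothetical data for eight items (percent correct vs. judges' estimate).
p_values = [20, 30, 45, 55, 65, 70, 85, 95]
estimates = [40, 45, 50, 50, 60, 60, 65, 70]

for label, (m, sd) in quartile_differences(p_values, estimates).items():
    print(f"{label}: {m:.1f} ({sd:.1f})")
```

Under this sign convention, a negative mean difference means judges expected more students to answer correctly than actually did, that is, they underestimated the item's difficulty.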
Results
Judges’ item-estimates approximated the p-values in the middle quartiles, especially in the moderately high difficulty quartile (mean differences from -1.9 to 5.4). Item-estimates in the highest and lowest quartiles of difficulty diverged more from the p-values.
Mean difference (S.D.) between p-values and item-estimates (p-value – item-estimate), by quartile of item difficulty

| Item difficulty | Exam A | Exam B | Exam C |
|---|---|---|---|
| Highest | -17.4 (8.1) | -20.1 (5.8) | -14.1 (8.2) |
| Moderately High | -1.9 (7.2) | 2.5 (8.7) | 5.4 (9.2) |
| Moderately Low | 7.2 (9.4) | 18.4 (13.1) | 19.9 (6.8) |
| Lowest | 26.6 (12.5) | 31.6 (8.2) | 32.5 (8.9) |
Conclusion
Consistent with the literature on standard setting, judges in this introductory period of standard setting tended to underestimate the difficulty of hard items and overestimate the difficulty of easy items. Efforts should be made to improve judges’ appreciation of the possible range of item difficulty.