TOUGH FOR STUDENTS BUT EASY FOR JUDGES OR VICE VERSA?!  AN EARLY EVALUATION OF ANGOFF ITEM ACCURACY IN THREE EXAMINATIONS

 

** Poster Award Nominee

 

Donna Beman, Tomlin J. Paul*, Joseph M. Branday, Lauriann Young, Elaine Williams, The University of the West Indies, Mona Campus, JAMAICA

 

Purpose

The modified Angoff method was used for the first time in December 2006 at the University of the West Indies, Jamaica, to determine pass/fail cut-points for Stage 1 medical examinations. This paper examines the level of agreement between judges' item estimates and the observed difficulty (p-values) of the items.
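The abstract does not spell out how the cut-points were aggregated. The sketch below shows one common form of the modified Angoff calculation. Each judge estimates, for every item, the percentage of borderline (minimally competent) candidates expected to answer it correctly. The item estimate is the mean of these ratings across judges, and the cut-point is the mean item estimate across the paper. The aggregation rule, the function name `angoff_cut_score` and the judge ratings are all hypothetical.

```python
from statistics import mean


def angoff_cut_score(ratings: dict[str, list[float]]) -> float:
    """Return a cut score as a percentage of the maximum mark.

    `ratings` maps a hypothetical judge ID to that judge's estimates, one per
    item, of the percentage of borderline candidates expected to answer correctly.
    """
    per_judge = list(ratings.values())
    n_items = len(per_judge[0])
    if any(len(r) != n_items for r in per_judge):
        raise ValueError("every judge must rate every item")
    # Item estimate: mean rating across judges for each item.
    item_estimates = [mean(col) for col in zip(*per_judge)]
    # Cut score (assumed rule): mean item estimate across the examination.
    return float(mean(item_estimates))


# Hypothetical ratings from three judges on a four-item paper.
ratings = {
    "judge_1": [60, 70, 45, 80],
    "judge_2": [55, 75, 50, 85],
    "judge_3": [65, 65, 40, 75],
}
cut = angoff_cut_score(ratings)
print(cut)
```

With these made-up ratings the item estimates are 60, 70, 45 and 80, which gives a cut score of 63.75%.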

Methods

All item estimates for three multiple-choice examinations (250 items) were collected from the standard-setting process. Items were categorized into quartiles of difficulty using their p-values. The difference between each item's p-value and its item estimate was then measured.
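The comparison described in the Methods can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' analysis code:

- p-values and item estimates are on the same percentage scale.
- Quartiles are formed by ranking items on p-value, so the lowest p-values fall in the highest-difficulty quartile.
- The S.D. is the sample standard deviation.

The function name and the eight items are hypothetical.

```python
from statistics import mean, stdev

# Difficulty quartile labels, ordered from the lowest p-values (hardest items).
LABELS = ["Highest", "Moderately high", "Moderately low", "Lowest"]


def difficulty_quartile_summary(p_values, estimates):
    """Mean and sample SD of (p-value - item estimate) per difficulty quartile.

    Items are ranked by p-value and split into four groups of near-equal size;
    this rank-based split is an assumption, not the authors' documented rule.
    """
    if len(p_values) != len(estimates):
        raise ValueError("need one item estimate per p-value")
    n = len(p_values)
    if n < 8:
        raise ValueError("need at least two items per quartile to compute an SD")
    # Item indices sorted from hardest (lowest p-value) to easiest.
    order = sorted(range(n), key=lambda i: p_values[i])
    summary = {}
    for q, label in enumerate(LABELS):
        idx = order[q * n // 4:(q + 1) * n // 4]
        diffs = [p_values[i] - estimates[i] for i in idx]
        summary[label] = (mean(diffs), stdev(diffs))
    return summary


# Hypothetical p-values (% correct) and judges' item estimates for eight items.
p_values_demo = [25, 35, 50, 55, 70, 75, 90, 95]
estimates_demo = [45, 50, 52, 55, 65, 68, 70, 72]
summary = difficulty_quartile_summary(p_values_demo, estimates_demo)
for label, (m, s) in summary.items():
    print(f"{label}: {m:.1f} ({s:.1f})")
```

In this invented data the hardest quartile gets a negative mean difference (-17.5) and the easiest a positive one (21.5). This is the same pattern the Results describe.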

Results

Judges' item estimates approximated the p-values in the two middle quartiles, particularly in the moderately high difficulty quartile. Item estimates in the highest and lowest difficulty quartiles diverged more widely from the p-values.

 

Item difficulty     Mean difference (S.D.), p-value – item estimate
                    Exam A          Exam B          Exam C
Highest             -17.4 (8.1)     -20.1 (5.8)     -14.1 (8.2)
Moderately high      -1.9 (7.2)       2.5 (8.7)       5.4 (9.2)
Moderately low        7.2 (9.4)      18.4 (13.1)     19.9 (6.8)
Lowest               26.6 (12.5)     31.6 (8.2)      32.5 (8.9)
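The sign convention in these results works as follows. A negative mean difference (p-value – item estimate) means the judges' estimate exceeded the observed p-value. In other words, judges expected more students to answer correctly than actually did, so they underestimated the item's difficulty. A positive difference means the reverse.

The short script below re-reads only the mean differences reported in this abstract. It averages their absolute values across the three examinations as one rough way to rank the quartiles by divergence. This averaged figure and the helper `direction` are illustrative additions, not statistics reported in the study.

```python
from statistics import mean

# Mean differences (p-value - item estimate) as reported in this abstract,
# listed as (Exam A, Exam B, Exam C) for each item-difficulty quartile.
REPORTED = {
    "Highest": (-17.4, -20.1, -14.1),
    "Moderately high": (-1.9, 2.5, 5.4),
    "Moderately low": (7.2, 18.4, 19.9),
    "Lowest": (26.6, 31.6, 32.5),
}


def direction(d):
    """Interpret the sign of a mean difference (p-value - item estimate)."""
    if d < 0:
        return "difficulty underestimated"
    if d > 0:
        return "difficulty overestimated"
    return "no net bias"


# Average absolute divergence across the three exams (illustrative summary only).
divergence = {q: mean(abs(d) for d in ds) for q, ds in REPORTED.items()}
for q, ds in REPORTED.items():
    print(f"{q}: mean |difference| = {divergence[q]:.1f}; "
          + ", ".join(direction(d) for d in ds))
```

On this reading, all three exams show negative differences in the hardest quartile and positive differences in the easiest. The smallest average divergence falls in the moderately high difficulty quartile.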

Conclusion

Consistent with the literature on standard setting, judges in this introductory period tended to underestimate the difficulty of hard items and overestimate the difficulty of easy items. Efforts should be made to improve judges' appreciation of the possible range of item difficulty.