Does the Rasch model fit the usability evaluation process?
In the paper to be presented at CHI 2008 (Florence, Italy), I propose the Rasch model for measuring the factors that affect the outcome of usability evaluations. In particular, the skills of inspectors and the difficulty of defects are in question.
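For readers who have not met the model before, here is its standard dichotomous form, written for this setting. The symbols are my choice for this post and need not match the paper: \theta_i is the skill of inspector i, \beta_j is the difficulty of defect j, and X_{ij} = 1 means that inspector i detected defect j.

```latex
P(X_{ij} = 1 \mid \theta_i, \beta_j)
  = \frac{\exp(\theta_i - \beta_j)}{1 + \exp(\theta_i - \beta_j)}
```

Detection becomes more likely as the skill exceeds the difficulty, and the probability is exactly 0.5 when \theta_i = \beta_j.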
The Rasch model is feared for at least two reasons:
- it’s an axiomatic theory, and the data have to fulfill some strict properties to be Rasch scalable
- large sample sizes are required to get good estimates
At the moment I’m reanalysing two data sets of usability evaluations that I found in the literature.
A mixed picture emerges. The standard goodness-of-fit tests give fair results, which suggests that the Rasch model holds for the data. However, a subsequent reliability analysis (split-half procedure) gives good results for the item parameters only. The reliability of the inspector skill parameters, which is the much more interesting quantity, is very poor.
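To make the split-half idea concrete, here is a minimal sketch in Python. It is not the analysis from the paper. The data are simulated rather than taken from the two data sets, the sample sizes are hypothetical, and a smoothed raw-score logit stands in for a proper Rasch estimate as a loudly simplified assumption. All function names are mine. No Spearman-Brown correction is applied.

```python
import math
import random


def simulate_rasch(theta, beta, rng):
    """Draw an inspectors x defects matrix of 0/1 detections from the Rasch model."""
    return [[1 if rng.random() < 1.0 / (1.0 + math.exp(-(t - b))) else 0
             for b in beta] for t in theta]


def crude_logit(score, n):
    """Logit of the raw-score proportion, smoothed so that scores of 0 and n stay finite.

    Assumption: this is only a rough stand-in for a real Rasch estimate.
    """
    prop = (score + 0.5) / (n + 1.0)
    return math.log(prop / (1.0 - prop))


def pearson(x, y):
    """Plain Pearson correlation of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half_correlation(rows, rng):
    """Estimate each row's parameter from two random halves of the columns, then correlate."""
    n_cols = len(rows[0])
    cols = list(range(n_cols))
    rng.shuffle(cols)
    half_a, half_b = cols[: n_cols // 2], cols[n_cols // 2:]
    est_a = [crude_logit(sum(r[c] for c in half_a), len(half_a)) for r in rows]
    est_b = [crude_logit(sum(r[c] for c in half_b), len(half_b)) for r in rows]
    return pearson(est_a, est_b)


def transpose(rows):
    """Swap rows and columns so that defects become the rows."""
    return [list(col) for col in zip(*rows)]


rng = random.Random(0)
theta = [rng.gauss(0, 1) for _ in range(30)]  # hypothetical inspector skills
beta = [rng.gauss(0, 1) for _ in range(40)]   # hypothetical defect difficulties
data = simulate_rasch(theta, beta, rng)

# Inspector reliability: split the defects into two halves.
r_inspectors = split_half_correlation(data, rng)
# Defect reliability: split the inspectors into two halves. The crude logit
# then measures easiness rather than difficulty, which leaves the correlation unchanged.
r_defects = split_half_correlation(transpose(data), rng)
print(f"inspectors: {r_inspectors:.2f}, defects: {r_defects:.2f}")
```

The two correlations answer different questions. The first asks whether an inspector’s skill estimate is stable across different sets of defects. The second asks whether a defect’s difficulty estimate is stable across different groups of inspectors.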
1 comment so far
I’m fond of Rasch measurement and software usability, and would very much like a copy of your paper. Your blog makes it sound as if you’ve traditionally used Birnbaum/Lord’s Item Response Theory approach, and I was also surprised to see you use split-half reliability when Rasch measurement (e.g. Winsteps) produces an entirely different parameter for both reliability and separation. I’m sure reading your paper would clear up some of my questions. Would you please post or email a copy of your paper?