36-720: The Rasch Model

Brian Junker
October 15, 2007

- Multivariate Binary Response Data
- Rasch Model
- Rasch Marginal Likelihood as a GLMM
- Rasch Marginal Likelihood as a Log-Linear Model
- Example

For more reading, see:
- Rasch, G. (1980). Probabilistic Models for Some Intelligence and Attainment Tests. University of Chicago.
- De Boeck, P. & Wilson, M. (2004). Explanatory Item Response Models. NY: Springer.
- van der Linden, W. J. & Hambleton, R. K. (1997). Handbook of Modern Item Response Theory. NY: Springer.

Multivariate Binary Response Data

Ubiquitous in
- Education (standardized testing);
- Psychology (positive and negative responses to stimuli);
- Social Science & Marketing (opinion/attitude/preference data);
- and other areas.

For specificity, we use the language of educational testing: for student $i$ and question $j$ on a particular exam, define

\[ y_{ij} = \begin{cases} 1, & \text{if student } i \text{ got question } j \text{ correct} \\ 0, & \text{else} \end{cases} \]

for, say, $i = 1, \ldots, N$ students and $j = 1, \ldots, J$ questions.

Viewing the data as a contingency table

For a test of $J$ questions, we construct a $J$-way table, with each dimension of the table corresponding to a single question, with two levels (0 = wrong; 1 = right):

\[ \{ n_y : y \text{ ranges over all } 2^J \text{ possible patterns } (y_1, \ldots, y_J) \} \]

is a $2^J$ table ($J$-way table with two levels each way).

Even if $N = \sum_y n_y$ is large, the $2^J$ table quickly becomes sparse: for example, with $N = 100$ and only $J = 8$ questions, there must be over 100 sampling zeros in the table (why??).

Thus, the usual hierarchical log-linear models for the $2^J$ table won't be of much use, because sampling zeros will frustrate many model fit and model comparison efforts.

However, there are log-linear models that are useful with $\{n_y\}$, and we will return to this representation later.
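One way to see the answer to the "(why??)" above is a counting argument: $N$ examinees can occupy at most $N$ distinct cells, so at least $2^J - N$ cells must be empty. The short sketch below just does that arithmetic; the function name is a hypothetical helper, not something from the lecture.

```python
def min_sampling_zeros(n_examinees: int, n_items: int) -> int:
    """Lower bound on the number of empty cells in the 2^J table.

    Each examinee falls in exactly one cell, so at most n_examinees
    cells can be non-empty.
    """
    n_cells = 2 ** n_items
    return max(n_cells - n_examinees, 0)


# N = 100 examinees, J = 8 questions: 2^8 = 256 cells in total.
print(min_sampling_zeros(100, 8))
```

The bound is attained only if every examinee produces a different response pattern; repeated patterns leave even more cells empty.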

Viewing the data as two-way ANOVA data

Instead of considering the table of counts $n_y$ we may consider the rectangular array

\[ Y = \begin{pmatrix} y_{11} & y_{12} & \cdots & y_{1J} \\ y_{21} & y_{22} & \cdots & y_{2J} \\ \vdots & \vdots & & \vdots \\ y_{N1} & y_{N2} & \cdots & y_{NJ} \end{pmatrix} \]

- The $i$th row corresponds to the correct & incorrect answers given by examinee $i$ to all $J$ questions, and
- The $j$th column corresponds to the correct & incorrect answers given by all $N$ examinees to the $j$th question.

A logit analogue to the two-way additive ANOVA model for this array would be

\[ \log \frac{p_{ij}}{1 - p_{ij}} = \theta_i - \beta_j \tag{1} \]

where $p_{ij} = P[y_{ij} = 1 \mid \theta_i, \beta_j]$. $\theta_i$ is the row effect and $\beta_j$ is the column effect.

Rasch Model

In the model

\[ \log \frac{p_{ij}}{1 - p_{ij}} = \theta_i - \beta_j, \]

- As $\theta_i$ increases so does $p_{ij}$: $\theta_i$ represents examinee $i$'s proficiency, regardless of question.
- As $\beta_j$ increases, $p_{ij}$ decreases: $\beta_j$ represents the question's difficulty [a].

The model in (1) is called the Rasch Model (after Rasch's 1960 monograph); in logistic form it is written

\[ p_{ij} = P[y_{ij} = 1 \mid \theta_i, \beta_j] = \frac{\exp\{\theta_i - \beta_j\}}{1 + \exp\{\theta_i - \beta_j\}} \tag{2} \]

and is an example of an item response theory (IRT) model ("item" = survey or test question).

[a] The choice of sign here, i.e. $\theta_i - \beta_j$ instead of $\theta_i + \beta_j$, is just a convention, but leads to this nice interpretation for $\beta_j$.
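As a small illustration of the logistic form (2), the sketch below evaluates the response probability for given proficiency and difficulty. The function name and the numeric values are hypothetical, chosen only to show the two monotonicity properties described above.

```python
import math


def rasch_prob(theta: float, beta: float) -> float:
    """P[y = 1 | theta, beta] = exp(theta - beta) / (1 + exp(theta - beta))."""
    z = theta - beta
    # Two algebraically equivalent branches, chosen to avoid overflow in exp().
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Higher proficiency raises the probability; higher difficulty lowers it.
print(rasch_prob(0.0, 0.0))                        # proficiency equals difficulty
print(rasch_prob(1.0, 0.0) > rasch_prob(0.0, 0.0))
print(rasch_prob(0.0, 1.0) < rasch_prob(0.0, 0.0))
```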

The likelihood for the $i$th examinee is a product of Bernoulli likelihoods for each $y_{ij}$:

\[ P[y_{i1}, \ldots, y_{iJ} \mid \theta_i; \beta_1, \ldots, \beta_J] = \prod_{j=1}^{J} p_{ij}^{y_{ij}} (1 - p_{ij})^{1 - y_{ij}} = \prod_{j=1}^{J} \frac{\exp\{y_{ij}(\theta_i - \beta_j)\}}{1 + \exp\{\theta_i - \beta_j\}} \tag{3} \]

We could formulate a joint likelihood for all examinees (and hence the entire array $Y$ above) as

\[ P[Y \mid \theta_1, \ldots, \theta_N; \beta_1, \ldots, \beta_J] = \prod_{i=1}^{N} \prod_{j=1}^{J} \frac{\exp\{y_{ij}(\theta_i - \beta_j)\}}{1 + \exp\{\theta_i - \beta_j\}} \tag{4} \]

and maximize over the $\theta$'s and $\beta$'s, but it is well-known [a] that this will result in inconsistent estimates as $N$ increases, since the number of $\theta_i$ parameters also increases.

[a] E.g. Haberman, S. J. (1977). Maximum likelihood estimates in exponential response models. The Annals of Statistics, 5.
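The logarithms of (3) and (4) are simple sums, which the following sketch evaluates. It assumes the responses are stored as a list of 0/1 rows; the function names and the tiny data set are hypothetical.

```python
import math


def log_lik_examinee(y_row, theta, betas):
    """Log of (3): sum over items of y*(theta - beta) - log(1 + exp(theta - beta))."""
    total = 0.0
    for y, beta in zip(y_row, betas):
        z = theta - beta
        total += y * z - math.log1p(math.exp(z))
    return total


def joint_log_lik(Y, thetas, betas):
    """Log of (4): the per-examinee log-likelihoods added over all rows of Y."""
    return sum(log_lik_examinee(row, theta, betas)
               for row, theta in zip(Y, thetas))


# Hypothetical data: N = 3 examinees, J = 2 questions.
Y = [[1, 0], [1, 1], [0, 0]]
thetas = [0.0, 1.0, -1.0]
betas = [-0.5, 0.5]
print(joint_log_lik(Y, thetas, betas))
```

Maximizing this function jointly over `thetas` and `betas` is the approach the text warns against, since the number of proficiency parameters grows with $N$.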

Rasch Marginal Likelihood as a GLMM

A way around this is to think of $\theta_i$ as a random effect, so that the likelihood for one examinee is really a mixture over the random effect,

\[ P[y_{i1}, \ldots, y_{iJ} \mid \beta_1, \ldots, \beta_J; \sigma] = \int \prod_{j=1}^{J} \frac{\exp\{y_{ij}(\theta_i - \beta_j)\}}{1 + \exp\{\theta_i - \beta_j\}} \, f(\theta_i \mid \sigma) \, d\theta_i \tag{5} \]

and the joint likelihood for all examinees is

\[ P[Y \mid \beta_1, \ldots, \beta_J; \sigma] = \prod_{i=1}^{N} \int \prod_{j=1}^{J} \frac{\exp\{y_{ij}(\theta_i - \beta_j)\}}{1 + \exp\{\theta_i - \beta_j\}} \, f(\theta_i \mid \sigma) \, d\theta_i \tag{6} \]

Often $f(\theta \mid \sigma)$ is taken to be a normal density with mean 0 and variance $\sigma^2$, but in fact any parametric family $f(\theta \mid \sigma)$ would do.

This is essentially the likelihood that is maximized when we fit the Rasch model as a GLMM with lmer() in R (or other software).
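To make the mixture in (5) concrete, here is a minimal sketch that approximates the integral with a midpoint rule, assuming a normal density with mean 0 and standard deviation $\sigma$. This is not how lmer() approximates the integral (the lecture's fit uses a Laplace approximation); the grid settings, function names, and parameter values are all assumptions made for illustration.

```python
import math
from itertools import product


def normal_pdf(theta, sigma):
    """Density of N(0, sigma^2) at theta."""
    return math.exp(-0.5 * (theta / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def marginal_pattern_prob(y, betas, sigma, n_grid=2000, width=8.0):
    """Midpoint-rule approximation of (5) over theta in [-width*sigma, width*sigma]."""
    lo = -width * sigma
    step = 2.0 * width * sigma / n_grid
    total = 0.0
    for g in range(n_grid):
        theta = lo + (g + 0.5) * step
        cond = 1.0  # conditional likelihood (3) at this theta
        for y_j, beta_j in zip(y, betas):
            z = theta - beta_j
            cond *= math.exp(y_j * z) / (1.0 + math.exp(z))
        total += cond * normal_pdf(theta, sigma) * step
    return total


# Hypothetical difficulties for J = 3 questions and sigma = 1.
betas, sigma = [-1.0, 0.0, 1.0], 1.0
total = sum(marginal_pattern_prob(y, betas, sigma)
            for y in product([0, 1], repeat=len(betas)))
print(total)  # the 2^J pattern probabilities should sum to about 1
```

The log of (6) is then the sum of the logs of these pattern probabilities over examinees.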

One can use (6) in several different ways, e.g.:

- MLEs $\hat\beta_j$ and $\hat\sigma$ are useful in calibrating how easy or difficult the questions are. For fixed $J$ as $N$ grows, the $\hat\beta_j$'s and $\hat\sigma$ are consistent and efficient estimators of the $\beta_j$'s and $\sigma$.
- Given the $\hat\beta_j$'s and $\hat\sigma$ we can produce predictors $\hat\theta_i$ of the $\theta_i$'s (e.g. conditional MLEs, empirical Bayes posterior modes, etc.), e.g. to rank examinees, compare examinees' performance on different tests (given the right experimental design), etc.
- Fully Bayesian versions could be obtained by assigning priors to the $\beta_j$'s and to $\sigma$, and obtaining a joint posterior distribution for $\theta_1, \ldots, \theta_N$, $\beta_1, \ldots, \beta_J$, $\sigma$, providing similar information to the MLEs and predictors above.
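As one illustration of the predictors mentioned above, the sketch below computes an empirical Bayes posterior mode for a single examinee by a plain grid search, assuming a normal $N(0, \sigma^2)$ random-effects density and treating the difficulties and $\sigma$ as already estimated. The grid search, function names, and parameter values are assumptions for illustration; real software would typically use a Newton-type optimizer.

```python
import math


def log_posterior(theta, y, betas, sigma):
    """Log of likelihood (3) times the N(0, sigma^2) density, up to a constant."""
    log_lik = sum(y_j * (theta - b_j) - math.log1p(math.exp(theta - b_j))
                  for y_j, b_j in zip(y, betas))
    return log_lik - 0.5 * (theta / sigma) ** 2


def posterior_mode(y, betas, sigma, lo=-6.0, hi=6.0, n_grid=12001):
    """Grid point in [lo, hi] with the largest log posterior."""
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + g * step for g in range(n_grid)]
    return max(grid, key=lambda t: log_posterior(t, y, betas, sigma))


# Hypothetical estimated difficulties and random-effect standard deviation.
betas_hat, sigma_hat = [-1.0, 0.0, 1.0], 1.0
for pattern in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]:
    print(pattern, round(posterior_mode(pattern, betas_hat, sigma_hat), 3))
```

Because the likelihood depends on the responses only through the total score and the difficulties, examinees with the same number correct receive the same predictor.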

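Rasch Marginal Likelihood as a Log-Linear Model

We can view the probability $p_y = P[y_1, \ldots, y_J \mid \beta_1, \ldots, \beta_J; \sigma]$ in equation (5) as a cell probability in a multinomial model for the $2^J$ table $n_y$. This turns out to be a certain log-linear model:

\[
\begin{aligned}
p(y_1, \ldots, y_J) &= \int \prod_{j=1}^{J} \frac{\exp\{y_j(\theta - \beta_j)\}}{1 + \exp\{\theta - \beta_j\}} \, f(\theta \mid \sigma) \, d\theta \\
&= \int \exp\Big\{ -\sum_{j=1}^{J} \beta_j y_j \Big\} \prod_{j=1}^{J} \frac{\exp\{y_j \theta\}}{1 + \exp\{\theta - \beta_j\}} \, f(\theta \mid \sigma) \, d\theta \\
&= \exp\Big\{ -\sum_{j=1}^{J} \beta_j y_j \Big\} \int \frac{\exp\{\theta y_+\}}{\prod_{j=1}^{J} \left(1 + \exp\{\theta - \beta_j\}\right)} \, f(\theta \mid \sigma) \, d\theta
\end{aligned}
\]

where $y_+ = \sum_{j=1}^{J} y_j$. The remaining integral depends on $y$ only through $y_+$; writing $\gamma_k$ for the log of its value when $y_+ = k$, we therefore have

\[ \log p(y_1, \ldots, y_J) = -\sum_{j=1}^{J} \beta_j y_j + \sum_{k=0}^{J} \gamma_k 1_{\{y_+ = k\}} \]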
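The identity derived above can be checked numerically. The sketch below assumes a normal random-effects density and a simple midpoint rule, and takes $\gamma_k$ to be the log of the integral of $e^{k\theta} / \prod_j (1 + e^{\theta - \beta_j})$ against $f(\theta \mid \sigma)$; the function names and parameter values are hypothetical.

```python
import math


def normal_pdf(theta, sigma):
    """Density of N(0, sigma^2) at theta."""
    return math.exp(-0.5 * (theta / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def integrate(func, sigma, n_grid=2000, width=8.0):
    """Midpoint-rule approximation of the integral of func(theta) * f(theta | sigma)."""
    lo = -width * sigma
    step = 2.0 * width * sigma / n_grid
    total = 0.0
    for g in range(n_grid):
        theta = lo + (g + 0.5) * step
        total += func(theta) * normal_pdf(theta, sigma) * step
    return total


def log_pattern_prob(y, betas, sigma):
    """log p(y_1, ..., y_J), integrating the Rasch likelihood directly."""
    def integrand(theta):
        out = 1.0
        for y_j, b_j in zip(y, betas):
            out *= math.exp(y_j * (theta - b_j)) / (1.0 + math.exp(theta - b_j))
        return out
    return math.log(integrate(integrand, sigma))


def gamma_k(k, betas, sigma):
    """Log of the integral that depends on y only through the total score k."""
    def integrand(theta):
        denom = 1.0
        for b_j in betas:
            denom *= 1.0 + math.exp(theta - b_j)
        return math.exp(k * theta) / denom
    return math.log(integrate(integrand, sigma))


betas, sigma = [-1.0, 0.0, 0.5], 1.2  # hypothetical values
y = (1, 0, 1)
lhs = log_pattern_prob(y, betas, sigma)
rhs = -sum(b_j * y_j for b_j, y_j in zip(betas, y)) + gamma_k(sum(y), betas, sigma)
print(abs(lhs - rhs))  # difference between the two sides of the identity
```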

To maintain the hierarchy principle, we incorporate an intercept term, writing

\[ \log p_y = \alpha - \sum_{j=1}^{J} \beta_j y_j + \sum_{k=0}^{J} \gamma_k 1_{\{y_+ = k\}} \tag{$*$} \]

where we define $y_+ = \sum_{j=1}^{J} y_j$, and $1_{\{y_+ = k\}}$ is a dummy variable that equals 1 when $y_+ = k$ and equals 0 otherwise. Note that

- The $\beta_j$ in ($*$) are exactly the item difficulties in the Rasch model;
- The $\gamma_k$ can be written as
  \[ \exp(\gamma_k) = E[(e^{\theta})^k \mid y = (0, 0, \ldots, 0)], \]
  i.e. the $\exp(\gamma_k)$ are moments of a positive random variable, with $\alpha = \log p_{(0, \ldots, 0)}$.

The $\gamma_k$'s are constrained by the $\beta_j$'s in a complicated way, but as a first approximation the model can be fit, ignoring these constraints, as a straightforward log-linear model.

Cressie & Holland (1981, Psychometrika); Holland (1990, Psychometrika).
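To fit this as an ordinary log-linear model, each response pattern needs a row of predictors: an intercept, the item responses $y_1, \ldots, y_J$, and the dummy variables for the total score. The sketch below builds those rows for every pattern; the column layout and function name are assumptions for illustration, not the layout of any data set used in the lecture.

```python
from itertools import product


def design_row(y):
    """Row [1, y_1, ..., y_J, 1{y+ = 0}, ..., 1{y+ = J}] for one response pattern."""
    n_items = len(y)
    y_plus = sum(y)
    score_dummies = [1 if y_plus == k else 0 for k in range(n_items + 1)]
    return [1] + list(y) + score_dummies


# All 2^J patterns for a hypothetical J = 3 item test.
for pattern in product([0, 1], repeat=3):
    print(pattern, design_row(pattern))
```

These columns are linearly dependent (the score dummies add up to the intercept column, and $\sum_j y_j = \sum_k k \, 1_{\{y_+ = k\}}$), so fitting software will drop or alias some of them.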

If we match up the terms in the model ($*$)

\[ \log p_y = \alpha - \sum_{j=1}^{J} \beta_j y_j + \sum_{k=0}^{J} \gamma_k 1_{\{y_+ = k\}} \]

with the non-redundant u-terms in the usual hierarchical log-linear model

\[ \log p_y = u_0 + \sum_{j} u_{j1} + \sum_{j<k} u_{jk11} + \sum_{j<k<l} u_{jkl111} + \cdots + u_{j_1 j_2 \cdots j_J} \]

we can see that

- $u_0 = \alpha$ and $u_{j1} = -\beta_j$, for all $j$;
- $u_{jk11}$ corresponds to $\gamma_2$, for all $j, k$: the two-way interactions are symmetric;
- $u_{jkl111}$ corresponds to $\gamma_3$, for all $j, k, l$: the three-way interactions are symmetric;
- etc., i.e. each set of $s$-way interactions is symmetric.

For these reasons, ($*$) is sometimes called the model of quasi-symmetry. The model of symmetry would also have symmetric main effects (all $u_{j1}$ equal to each other), and is equivalent to asserting that the $y_j$'s are exchangeable random variables.

Example

We return to the LSAT example that we used to illustrate GLMM fits of the Rasch model last time. We can directly compare estimates of the fixed effects, $\beta_j$:

> rasch.lmer <- lmer(y ~ j - 1 + (1 | i), data=lsat,
+                    family=binomial, method="laplace")
> summary(rasch.lmer)@coefs
   Estimate Std. Error z value Pr(>|z|)
j                                  e-98
j                                  e-40
j                                  e-04
j                                  e-59
j                                  e

In current releases of lme4 a model of this kind is fit with glmer() rather than lmer(), and the coefficient table is obtained with coef(summary(...)).

> rasch.glm <- glm(n ~ ., data=lsat.table, family=poisson)
> summary(rasch.glm)$coef
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                 e-02
Y                                           e-44
Y                                           e-03
Y                                           e-02
Y                                           e-08
Y                                           e-26
Yplus                                       e-02
Yplus                                       e-04
Yplus                                       e-07
Yplus                                       e

How are these related?

>
+   summary(rasch.glm)$coef[2:6,1])
> lm(summary(rasch.glm)$coef[2:6,1] ~
+    summary(rasch.lmer)@coefs[1:5,1])

Coefficients:
(Intercept)  summary(rasch.lmer)@coefs[1:5, 1]

[Figure: scatterplot of summary(rasch.glm)$coef[2:6, 1] against summary(rasch.lmer)@coefs[1:5, 1]]

Almost perfectly:

\[ \log \frac{p_{ij}}{1 - p_{ij}} = \theta_i - \beta_j = a \left( \frac{\theta_i - c}{a} - \frac{\beta_j - c}{a} \right) \]

The regression result above suggests $a \approx 1$ and $c = 0.558$, so that the random effects distribution implied by the log-linear fit has the same scale but is shifted down from the random effects distribution estimated by lmer. This is another change in parametrization that does not affect the fit.
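The claim that this change of parametrization does not affect the fit can be checked directly: subtracting the same constant $c$ from every proficiency and every difficulty leaves each $\theta_i - \beta_j$, and hence each $p_{ij}$, unchanged. The sketch below uses $c = 0.558$ from the text; the proficiency and difficulty values and the function name are hypothetical.

```python
import math


def rasch_prob(theta, beta):
    """P[y = 1 | theta, beta] under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


c = 0.558                   # shift suggested by the regression in the text
thetas = [-1.0, 0.3, 2.0]   # hypothetical proficiencies
betas = [-1.2, 0.0, 0.8]    # hypothetical difficulties

# Largest change in any p_ij after shifting both parameter sets by c.
max_diff = max(
    abs(rasch_prob(t, b) - rasch_prob(t - c, b - c))
    for t in thetas
    for b in betas
)
print(max_diff)  # zero up to floating-point rounding
```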
