A Marginal Maximum Likelihood Procedure for an IRT Model with Single-Peaked Response Functions

A Marginal Maximum Likelihood Procedure for an IRT Model with Single-Peaked Response Functions Cees A.W. Glas Oksana B. Korobko University of Twente, the Netherlands OMD Progress Report 07-01. Cees A.W. Glas, Department OMD, Faculty of Behavioral Science, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands. e- mail: c.a.w.glas@gw.utwente.nl

Difficulty of examination subjects - 2 This Report is a companion to the article Comparing the difficulty of examination subjects with item response theory by Korobko, Glas, Bosker and Luyten (2008). It gives details on the MML estimation and testing procedure used. The Model Consider a response variable if studentresponsed in category ( on subject if studentdid not chose examination subject, (1) (whereis an arbitrary constant) and a choice variable if studentdid chose examination subject if studentdid not chose examination subject, (2) for students and examination subjects. The model for the observed responses is the multidimensional version of the generalized partial credit model (Muraki, 1992). The probability of a grade in category is given by (3) The model for the choice variables is single peaked: (4) where (5)

Difficulty of examination subjects - 3 and to guarantee that is positive. The model given by (4) is related to a special case of the graded response model by Samejima (1969, 1973). The graded response model pertains to a polytomously scored response variable, for instance, a response variable that assumes the values 0, 1 or 2. The response probabilities are given by with as defined by (5). So model (5) can be derived from the graded response model by noting that the responses and are extreme cases and collapsed to. MML estimates The purpose is to concurrently compute estimates of the item parameters and the mean and covariance matrix of the distribution of the person parameters, by maximizing a likelihood function that is marginalized with respect to these person parameters. So let be the vector of the estimands, that is, whereand are vectors of the item parameters of the response model given by (3), is a vector of the item parameters of the choice model given by (4) and (5), andand are a vector of the mean and a vector of the variances and covariances of. Usually, the model is identified by settingequal to zero, and fixing a number of elements in, so these parameters are not estimated but fixed. Only one ability distribution will be considered here; the generalization to multiple groups is straightforward. In general, necessary and sufficient conditions for the existence of estimates are not available. However, the most

Difficulty of examination subjects - 4 important necessary condition, which in practice covers most problems, is that the design is linked. A design is linked, if for every pair of items (subjects) and there exists a sequence of items such that for every pair of consecutive items in the sequence, say, there exists some personsuch that. The marginal log-likelihood function is given by where is a vector with elements and can either be the grade or the choice of the student. The probability of is given by where is the density of which assumed to follow a multivariate normal distribution with mean vector 0 and variance-covariance. The maximum of the loglikelihood function can be found by solving The first order derivatives of the log-likelihood function for MML estimation can be found using Fisher s identity (Louis, 1982). In the IRT framework, Fisher s identity is given by (6) where the expectation is with respect to the posterior expectation, and (see, for instance, Glas, 1992). Notice that the derivative is a sum of the logarithm of the probability the response pattern and the logarithm of the density of the student proficiency parameter. The power of Fisher s identity is that the derivatives are simply to derive, while the derivation of is a cumbersome enterprise. For instance, it is well know that the

Difficulty of examination subjects - 5 maximum likelihood estimate of a covariance matrix is obtained as Inserting this into (6) results in (7) where and the posterior density has a form The likelihood equations for the item parameters of the choice model, and are also easily found by using Fisher s identity. The probability of the choice pattern of studentgivenand the item parameters is where where and are given by Formula (5), and the logarithm of this function is

Difficulty of examination subjects - 6 Using the short-hand notation forandwe have with Let denote the Kronecker symbol, that is equal to one if and equal to zero if and. Then Substitution of this expression into (6) and dropping the short-hand notation we obtain the equations and for In the analyses presented in this article, these equations were solved simultaneously with the well-known MML equations for the item parameters of the multidimensional version of the GPCM and the MML equation for the covariance matrix given by (7).

Difficulty of examination subjects - 7 A test for model fit Most item fit statistics are based on splitting up the sample of respondents into subgroups with different proficiency distributions and evaluating whether the item response frequencies in these subgroups differ from their expected values. Orlando and Thissen (2000) point out that the splitting criteria should be directly observable (for instance, number-correct scores) rather than estimated (for instance, estimated proficiency parameters). Following this suggestion, we split up the sample of students using a splitter examination labeled!. Two subgroups are formed, one subgroup of students that did choose subject! (so ) and subgroup of students that did not choose subject!(so ). The test is based on the assumption that this criterion splits the sample up in two subgroups with different proficiency distributions. We compute the average grade on subjectof students in both subgroups as " and " whereis the maximum grade on examination. These average grades can be compared with their expected values given by and

Difficulty of examination subjects - 8 where is with respect to a#-variate posterior distribution, so (8) We will show that a Pearson-type fit-statistic bases on the squared differences between these observed and expected values can be used to evaluate whether the observed and expected response frequencies are acceptably close given the observed value on the choice variable of the splitter examination. Further, we will show that the statistic has an asymptotic$ distribution with one degree of freedom. Fit of IRT models can be evaluated using the Lagrange Multiplier (LM) test (Glas, 1998, 1999; Glas & Suarez-Falcon, 2003; te Marvelde, Glas, Van Landeghem, & Van Damme, 2006). The LM test can be used to test an IRT model against a more general alternative, which is an IRT model with additional parameters. evaluated using the MML-estimates of the special IRT model only. The LM statistic is The statistic is asymptotically chi-square distributed with degrees of freedom equal to the number of fixed parameters. With a proper choice of alternative model, the LM test becomes a test based on residuals, that is, a test based on differences between observed and expected frequencies. As the alternative model we choose (9) for. In the model under the null hypothesis, the additional parameters are equal to zero and the model is equivalent with the multidimensional GPCM given by (3).

Difficulty of examination subjects - 9 Following Glas (1999) it can be inferred that the first order derivative of the likelihood with respect to is given by % A test for the null-hypothesis can be based on the fit-statistic & % ' (10) where' is the variance matrix of%, which is the opposite of the second order derivatives of the likelihood function with respect to the -parameter. If the statistic & is evaluated using, that is, using the MML-estimates of the null-model, & has an asymptotic $ distribution with one degree of freedom (see Glas, 1999; te Marvelde, Glas, Van Landeghem, & Van Damme, 2006). References Glas, C.A.W. (1992). A Rasch model with a multivariate distribution of proficiency. In M. Wilson, (Ed.), Objective measurement: Theory into practice, Vol. 1, (pp.236-258), New Jersey, NJ: Ablex Publishing Corporation. Glas, C. A. W. (1998) Detection of differential item functioning using Lagrange multiplier tests. Statistica Sinica, 8, vol. 1. 647-667. Glas, C. A. W. (1999). Modification indices for the 2-pl and the nominal response model. Psychometrika, 64, 273-294. Glas, C. A. W., & Suarez-Falcon, J.C. (2003). A Comparison of Item-Fit Statistics for the Three-Parameter Logistic Model. Applied Psychological Measurement, 27, 87-106. Louis, T.A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B, 44, 226-233. Muraki, E. (1992). A generalized partial credit model: application of an EM algorithm. Applied Psychological Measurement, 16, 159-176.

Difficulty of examination subjects - 10 Orlando, M., & Thissen, D. (2000). Likelihood-based item-fit indices for dichotomous item response theory models. Applied Psychological Measurement, 24, 50-64. Samejima, F. (1969). Estimation of latent proficiency using a pattern of graded scores. Psychometrika, Monograph Supplement, No. 17. Samejima, F. (1973). Homogeneous case of the continuous response model. Psychometrika, 38, 203-219. te Marvelde, J. M., Glas, C. A. W., Van Landeghem, G., & Van Damme, J. (2006). Application of multidimensional IRT models to longitudinal data. Educational and Psychological Measurement, 66, 5-34.