A Marginal Maximum Likelihood Procedure for an IRT Model with Single-Peaked Response Functions

Cees A.W. Glas
Oksana B. Korobko
University of Twente, the Netherlands

OMD Progress Report 07-01

Cees A.W. Glas, Department OMD, Faculty of Behavioral Science, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands. E-mail: c.a.w.glas@gw.utwente.nl

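Difficulty of examination subjects - 2

This report is a companion to the article "Comparing the difficulty of examination subjects with item response theory" by Korobko, Glas, Bosker and Luyten (2008). It gives details on the MML estimation and testing procedure used.

The Model

Consider a response variable

$$x_{ni} = \begin{cases} j & \text{if student } n \text{ responded in category } j \text{ on subject } i, \\ c & \text{if student } n \text{ did not choose examination subject } i, \end{cases} \qquad (1)$$

(where $c$ is an arbitrary constant) and a choice variable

$$d_{ni} = \begin{cases} 1 & \text{if student } n \text{ did choose examination subject } i, \\ 0 & \text{if student } n \text{ did not choose examination subject } i, \end{cases} \qquad (2)$$

for students $n = 1, \ldots, N$ and examination subjects $i = 1, \ldots, K$. The model for the observed responses is the multidimensional version of the generalized partial credit model (Muraki, 1992). The probability of a grade in category $j$ is given by

$$P(X_{ni} = j \mid \theta_n) = \frac{\exp\left(j \sum_{q=1}^{Q} \alpha_{iq}\theta_{nq} - \beta_{ij}\right)}{1 + \sum_{h=1}^{m_i} \exp\left(h \sum_{q=1}^{Q} \alpha_{iq}\theta_{nq} - \beta_{ih}\right)}, \qquad j = 0, \ldots, m_i, \qquad (3)$$

where $\theta_n = (\theta_{n1}, \ldots, \theta_{nQ})$ is the vector of proficiency parameters of student $n$, $m_i$ is the highest category of subject $i$, and $\beta_{i0} = 0$. The model for the choice variables is single-peaked:

$$P(d_{ni} = 1 \mid \theta_n) = \Psi(\eta_{ni1}) - \Psi(\eta_{ni2}), \qquad (4)$$

where

$$\Psi(\eta_{nih}) = \frac{\exp(\eta_{nih})}{1 + \exp(\eta_{nih})}, \qquad \eta_{nih} = \sum_{q=1}^{Q} \gamma_{iq}\theta_{nq} - \delta_{ih}, \qquad h = 1, 2, \qquad (5)$$

and $\delta_{i1} < \delta_{i2}$ to guarantee that (4) is positive.

The model given by (4) is related to a special case of the graded response model by Samejima (1969, 1973). The graded response model pertains to a polytomously scored response variable, for instance, a response variable $Y_{ni}$ that assumes the values 0, 1 or 2. The response probabilities are given by

$$P(Y_{ni} = 0 \mid \theta_n) = 1 - \Psi(\eta_{ni1}), \qquad P(Y_{ni} = 1 \mid \theta_n) = \Psi(\eta_{ni1}) - \Psi(\eta_{ni2}), \qquad P(Y_{ni} = 2 \mid \theta_n) = \Psi(\eta_{ni2}),$$

with $\Psi$ as defined by (5). So the model given by (4) and (5) can be derived from the graded response model by noting that the responses $Y_{ni} = 0$ and $Y_{ni} = 2$ are extreme cases and are collapsed to $d_{ni} = 0$.

MML estimates

The purpose is to concurrently compute estimates of the item parameters and the mean and covariance matrix of the distribution of the person parameters, by maximizing a likelihood function that is marginalized with respect to these person parameters. So let $\xi$ be the vector of the estimands, that is, $\xi = (\alpha, \beta, \zeta, \mu, \sigma)$, where $\alpha$ and $\beta$ are vectors of the item parameters of the response model given by (3), $\zeta$ is a vector of the item parameters of the choice model given by (4) and (5), and $\mu$ and $\sigma$ are a vector of the means and a vector of the variances and covariances of $\theta_n$. Usually, the model is identified by setting $\mu$ equal to zero and fixing a number of further elements of $\xi$, so these parameters are not estimated but fixed. Only one ability distribution will be considered here; the generalization to multiple groups is straightforward.

In general, necessary and sufficient conditions for the existence of estimates are not available. However, the most important necessary condition, which in practice covers most problems, is that the design is linked. A design is linked if, for every pair of items (subjects) $i$ and $i'$, there exists a sequence of items running from $i$ to $i'$ such that for every pair of consecutive items in the sequence, say $i_a$ and $i_b$, there exists some person $n$ such that $d_{n i_a} = d_{n i_b} = 1$.

The marginal log-likelihood function is given by

$$\log L(\xi) = \sum_{n=1}^{N} \log p(y_n; \xi),$$

where $y_n$ is a vector with elements $y_{ni}$, and $y_{ni}$ can either be the grade or the choice of the student. The probability of $y_n$ is given by

$$p(y_n; \xi) = \int \cdots \int p(y_n \mid \theta_n, \xi)\, g(\theta_n \mid \Sigma)\, d\theta_n,$$

where $g(\theta_n \mid \Sigma)$ is the density of $\theta_n$, which is assumed to follow a multivariate normal distribution with mean vector 0 and variance-covariance matrix $\Sigma$. The maximum of the log-likelihood function can be found by solving

$$\frac{\partial \log L(\xi)}{\partial \xi} = 0.$$

The first-order derivatives of the log-likelihood function for MML estimation can be found using Fisher's identity (Louis, 1982). In the IRT framework, Fisher's identity is given by

$$\frac{\partial \log L(\xi)}{\partial \xi} = \sum_{n=1}^{N} E\left(\omega_n(\xi) \mid y_n, \xi\right), \qquad (6)$$

where the expectation is with respect to the posterior distribution of $\theta_n$, and

$$\omega_n(\xi) = \frac{\partial}{\partial \xi}\left[\log p(y_n \mid \theta_n, \xi) + \log g(\theta_n \mid \Sigma)\right]$$

(see, for instance, Glas, 1992). Notice that $\omega_n(\xi)$ is the derivative of a sum of the logarithm of the probability of the response pattern and the logarithm of the density of the student proficiency parameter. The power of Fisher's identity is that the derivatives $\omega_n(\xi)$ are simple to derive, while the direct derivation of $\partial \log L(\xi) / \partial \xi$ is a cumbersome enterprise. For instance, it is well known that, if the $\theta_n$ were observed, the maximum likelihood estimate of the covariance matrix of a zero-mean normal distribution would be obtained as

$$\hat{\Sigma} = \frac{1}{N} \sum_{n=1}^{N} \theta_n \theta_n^{\top}.$$

Inserting this into (6) results in

$$\Sigma = \frac{1}{N} \sum_{n=1}^{N} E\left(\theta_n \theta_n^{\top} \mid y_n, \xi\right), \qquad (7)$$

where

$$E\left(\theta_n \theta_n^{\top} \mid y_n, \xi\right) = \int \cdots \int \theta_n \theta_n^{\top}\, p(\theta_n \mid y_n, \xi)\, d\theta_n$$

and the posterior density has the form

$$p(\theta_n \mid y_n, \xi) = \frac{p(y_n \mid \theta_n, \xi)\, g(\theta_n \mid \Sigma)}{\int \cdots \int p(y_n \mid \theta, \xi)\, g(\theta \mid \Sigma)\, d\theta}.$$

The likelihood equations for the item parameters of the choice model, $\gamma_{iq}$ and $\delta_{ih}$, are also easily found by using Fisher's identity. The probability of the choice pattern $d_n$ of student $n$ given $\theta_n$ and the item parameters is

$$p(d_n \mid \theta_n, \zeta) = \prod_{i=1}^{K} P_{ni}^{\,d_{ni}} \left(1 - P_{ni}\right)^{1 - d_{ni}},$$

where

$$P_{ni} = \Psi(\eta_{ni1}) - \Psi(\eta_{ni2}),$$

where $\Psi(\eta_{ni1})$ and $\Psi(\eta_{ni2})$ are given by Formula (5), and the logarithm of this function is

$$\log p(d_n \mid \theta_n, \zeta) = \sum_{i=1}^{K} \left[ d_{ni} \log P_{ni} + (1 - d_{ni}) \log\left(1 - P_{ni}\right) \right].$$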

Using short-hand notation for the two logistic functions given by Formula (5), the first-order derivatives of this logarithm with respect to the item parameters of the choice model follow directly. They can be written compactly using the Kronecker symbol, which is equal to one if its two indices are equal and equal to zero otherwise. Substituting the resulting expression into (6) and dropping the short-hand notation gives the likelihood equations for the item parameters of the choice model.

In the analyses presented in this article, these equations were solved simultaneously with the well-known MML equations for the item parameters of the multidimensional version of the GPCM and the MML equation for the covariance matrix given by (7).
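As an illustration of the quantities involved, the following is a minimal sketch rather than the authors' implementation. It assumes a single proficiency dimension, a GPCM kernel of the form j·alpha·theta − beta_j, a choice probability written as the difference of two logistic functions with thresholds delta1 < delta2, local independence of grades and choices given theta, and a simple equally spaced grid in place of whatever numerical integration the report used. All function and parameter names are hypothetical.

```python
import math

def logistic(eta):
    """Logistic function exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def gpcm_probs(theta, alpha, betas):
    """Probabilities of grades 0..m under the assumed GPCM kernel.

    betas[j - 1] is the category parameter of grade j; grade 0 has kernel 0.
    """
    kernels = [0.0] + [j * alpha * theta - b for j, b in enumerate(betas, start=1)]
    top = max(kernels)  # subtract the maximum for numerical stability
    weights = [math.exp(k - top) for k in kernels]
    total = sum(weights)
    return [w / total for w in weights]

def choice_prob(theta, gamma, delta1, delta2):
    """Assumed single-peaked choice probability; positive when delta1 < delta2."""
    return logistic(gamma * theta - delta1) - logistic(gamma * theta - delta2)

def pattern_likelihood(theta, grades, items):
    """P(grades and choices | theta); grades[i] is None if subject i was not chosen."""
    like = 1.0
    for grade, item in zip(grades, items):
        p_choose = choice_prob(theta, item["gamma"], item["delta1"], item["delta2"])
        if grade is None:
            like *= 1.0 - p_choose
        else:
            like *= p_choose * gpcm_probs(theta, item["alpha"], item["betas"])[grade]
    return like

def normal_grid(sigma, n_nodes=121, half_width=6.0):
    """Grid nodes and normalised N(0, sigma^2) weights (a crude quadrature rule)."""
    step = 2.0 * half_width / (n_nodes - 1)
    nodes = [(-half_width + k * step) * sigma for k in range(n_nodes)]
    dens = [math.exp(-0.5 * (t / sigma) ** 2) for t in nodes]
    norm = sum(dens)
    return nodes, [d / norm for d in dens]

def marginal_loglik(data, items, sigma=1.0):
    """Marginal log-likelihood: theta is integrated out over the grid."""
    nodes, weights = normal_grid(sigma)
    total = 0.0
    for grades in data:
        total += math.log(sum(w * pattern_likelihood(t, grades, items)
                              for t, w in zip(nodes, weights)))
    return total

def variance_update(data, items, sigma=1.0):
    """Average posterior expectation of theta^2, the scalar analogue of (7)."""
    nodes, weights = normal_grid(sigma)
    total = 0.0
    for grades in data:
        joint = [w * pattern_likelihood(t, grades, items) for t, w in zip(nodes, weights)]
        total += sum(p * t * t for p, t in zip(joint, nodes)) / sum(joint)
    return total / len(data)

# Hypothetical parameters for two subjects and four students.
items = [
    {"alpha": 1.0, "betas": [-0.5, 0.5], "gamma": 1.0, "delta1": -1.0, "delta2": 1.5},
    {"alpha": 0.8, "betas": [0.0, 1.0], "gamma": 1.2, "delta1": -0.5, "delta2": 2.0},
]
data = [[2, None], [0, 1], [None, None], [1, 2]]
print(marginal_loglik(data, items))
print(variance_update(data, items))
```

Iterating `variance_update` while re-estimating the item parameters would give an EM-type scheme. In the multidimensional case the grid would be replaced by a multivariate quadrature rule or simulation.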

A test for model fit

Most item fit statistics are based on splitting up the sample of respondents into subgroups with different proficiency distributions and evaluating whether the item response frequencies in these subgroups differ from their expected values. Orlando and Thissen (2000) point out that the splitting criteria should be directly observable (for instance, number-correct scores) rather than estimated (for instance, estimated proficiency parameters). Following this suggestion, we split up the sample of students using a splitter examination labeled $k$. Two subgroups are formed: one subgroup of students who did choose subject $k$ (so $d_{nk} = 1$) and one subgroup of students who did not choose subject $k$ (so $d_{nk} = 0$). The test is based on the assumption that this criterion splits the sample into two subgroups with different proficiency distributions.

We compute the average grade on subject $i$ of the students in both subgroups, where $m_i$ is the maximum grade on examination $i$. These average grades can be compared with their expected values, where the expectation is with respect to a $Q$-variate posterior distribution, with $Q$ the number of proficiency dimensions (8).

We will show that a Pearson-type fit statistic based on the squared differences between these observed and expected values can be used to evaluate whether the observed and expected response frequencies are acceptably close given the observed value on the choice variable of the splitter examination. Further, we will show that the statistic has an asymptotic $\chi^2$ distribution with one degree of freedom.

Fit of IRT models can be evaluated using the Lagrange Multiplier (LM) test (Glas, 1998, 1999; Glas & Suarez-Falcon, 2003; te Marvelde, Glas, Van Landeghem, & Van Damme, 2006). The LM test can be used to test an IRT model against a more general alternative, which is an IRT model with additional parameters. The statistic is evaluated using the MML estimates of the special IRT model only. The LM statistic is $LM = h^{\top} W^{-1} h$, where $h$ is the vector of first-order derivatives of the log-likelihood with respect to the additional parameters and $W$ is its covariance matrix. The statistic is asymptotically chi-square distributed with degrees of freedom equal to the number of parameters fixed at zero under the special model. With a proper choice of alternative model, the LM test becomes a test based on residuals, that is, a test based on differences between observed and expected frequencies. As the alternative model we choose the model given by (9). In the model under the null hypothesis, the additional parameters are equal to zero and the model is equivalent to the multidimensional GPCM given by (3).
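For a single additional parameter, the computation behind such a test can be sketched as follows. This is a hedged illustration, not the report's code: it assumes the scalar form of the statistic, the squared first-order derivative divided by its variance, and uses the fact that a chi-square variable with one degree of freedom is a squared standard normal variable. The names `lm_statistic` and `chi2_sf_1df` and the numbers are hypothetical.

```python
import math

def lm_statistic(h, w):
    """LM statistic for one additional parameter: h squared over its variance w."""
    if w <= 0.0:
        raise ValueError("the variance w must be positive")
    return h * h / w

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with one degree of freedom.

    P(Z^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    """
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical residual (observed minus expected) and its variance.
h, w = 2.5, 1.6
lm = lm_statistic(h, w)
p_value = chi2_sf_1df(lm)
print(lm, p_value)
```

A small p-value would indicate that the grades in the two subgroups formed by the splitter examination deviate from what the model under the null hypothesis predicts.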

Following Glas (1999), it can be inferred that the first-order derivative of the log-likelihood with respect to the additional parameter is a residual, denoted $h$, that is, a difference between an observed and an expected value. A test of the null hypothesis can be based on the fit statistic

$$LM = h^{\top} W^{-1} h, \qquad (10)$$

where $W$ is the variance matrix of $h$, which is the negative of the second-order derivatives of the log-likelihood function with respect to the additional parameter. If the statistic $LM$ is evaluated using $\hat{\xi}$, that is, using the MML estimates of the null model, $LM$ has an asymptotic $\chi^2$ distribution with one degree of freedom (see Glas, 1999; te Marvelde, Glas, Van Landeghem, & Van Damme, 2006).

References

Glas, C. A. W. (1992). A Rasch model with a multivariate distribution of proficiency. In M. Wilson (Ed.), Objective measurement: Theory into practice, Vol. 1 (pp. 236-258). Norwood, NJ: Ablex Publishing Corporation.

Glas, C. A. W. (1998). Detection of differential item functioning using Lagrange multiplier tests. Statistica Sinica, 8, 647-667.

Glas, C. A. W. (1999). Modification indices for the 2-PL and the nominal response model. Psychometrika, 64, 273-294.

Glas, C. A. W., & Suarez-Falcon, J. C. (2003). A comparison of item-fit statistics for the three-parameter logistic model. Applied Psychological Measurement, 27, 87-106.

Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B, 44, 226-233.

Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159-176.

Orlando, M., & Thissen, D. (2000). Likelihood-based item-fit indices for dichotomous item response theory models. Applied Psychological Measurement, 24, 50-64.

Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, No. 17.

Samejima, F. (1973). Homogeneous case of the continuous response model. Psychometrika, 38, 203-219.

te Marvelde, J. M., Glas, C. A. W., Van Landeghem, G., & Van Damme, J. (2006). Application of multidimensional IRT models to longitudinal data. Educational and Psychological Measurement, 66, 5-34.