Lesson 7: Item response theory models (part 2)

Patrícia Martinková
Department of Statistical Modelling, Institute of Computer Science, Czech Academy of Sciences
Institute for Research and Development of Education, Faculty of Education, Charles University, Prague
NMST570, November 20, 2018
Patrícia Martinková NMST570, L7: IRT models (part 2) Nov 20, 2018 1/30

Outline
1. Review: IRT models
2. Parameter estimation
3. Further topics
4. Conclusion

Review: IRT models
Framework for estimating latent traits (ability levels) θ by means of manifest (observable) variables (item responses) and an appropriate psychometric (statistical) model.
Notes:
- Ability θ is often treated as a random variable (but see further)
- Items: dichotomous, polytomous, multiple-choice, ...
- IRT model: describes the probability of a (correct) answer as a function of ability level and item parameters
- This function is called the item response function (IRF) or the item characteristic curve (ICC)

Review: Introduction to IRT models
Use of IRT models
- To calibrate items (i.e. to estimate difficulty, discrimination, guessing, ...)
- To assess respondents' latent trait (ability, satisfaction, anxiety, ...)
- To describe test properties (standard error, test information, ...)
- Test linking and equating, computerized adaptive testing, etc.
IRT model assumptions
1. Model definition (functional form, usually monotonic ICC), e.g. the 2PL IRT model:
$$P(Y_{ij} = 1 \mid \theta_i, a_j, b_j) = \pi_{ij} = \frac{e^{a_j(\theta_i - b_j)}}{1 + e^{a_j(\theta_i - b_j)}}$$
2. Unidimensionality of the latent variable θ
3. Local independence (conditional independence), e.g.
$$P(Y_{i1} = 1, Y_{i2} = 1 \mid \theta_i, a_j, b_j) = \pi_{i1}\,\pi_{i2}$$
$$P(Y_{i1} = 1, Y_{i2} = 0 \mid \theta_i, a_j, b_j) = \pi_{i1}\,(1 - \pi_{i2})$$
4. Invariance of parameters
5. Independence of respondents

Rasch Model
$$\pi_{ij} = P(Y_{ij} = 1 \mid \theta_i, b_j) = \frac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)} \tag{1}$$
- $\theta_i$: ability of person i, for i = 1, ..., I
- $b_j$: difficulty of item j (location of inflection point), for j = 1, ..., J
[Figure: Item Characteristic Curve (ICC)]
Note: Originally, the Rasch model was denoted as $\pi_{ij} = \frac{\tau_i}{\tau_i + \xi_j}$. To get to (1), consider $\theta_i = \log(\tau_i)$ and $b_j = \log(\xi_j)$ for $\tau_i > 0$, $\xi_j > 0$.

Logistic vs. Probit model (Note on Scaling parameter D)
The Rasch model is sometimes defined as:
$$\pi_{ij} = P(Y_{ij} = 1 \mid \theta_i, b_j) = \frac{\exp(D[\theta_i - b_j])}{1 + \exp(D[\theta_i - b_j])}$$
D = 1.702 is a scaling parameter introduced in order to match the logistic and probit metrics very closely (Lord and Novick, 1968).
Note: Probit (normal-ogive) model: $\pi_{ij} = \Phi(\theta_i - b_j)$, where $\Phi(x)$ is the cumulative distribution function of the standard normal distribution.
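
How closely D = 1.702 aligns the two metrics can be checked numerically. The sketch below is only an illustration: the function names, the grid from -6 to 6 and the step count are assumptions, not part of the lesson. It compares the scaled logistic curve with the normal ogive and reports the largest gap found on the grid.

```python
import math

D = 1.702  # scaling constant quoted on the slide


def logistic(x: float) -> float:
    """Standard logistic function exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(-x))


def normal_cdf(x: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def max_gap(scale: float, lo: float = -6.0, hi: float = 6.0, steps: int = 1200) -> float:
    """Largest |logistic(scale * x) - Phi(x)| on an evenly spaced grid (grid is an assumption)."""
    gap = 0.0
    for k in range(steps + 1):
        x = lo + (hi - lo) * k / steps
        gap = max(gap, abs(logistic(scale * x) - normal_cdf(x)))
    return gap


gap_scaled = max_gap(D)      # logistic with D = 1.702
gap_unscaled = max_gap(1.0)  # plain logistic, no scaling
print(f"max gap with D = 1.702: {gap_scaled:.4f}")
print(f"max gap with D = 1:     {gap_unscaled:.4f}")
```

With the scaling constant the two curves stay within one hundredth of each other on this grid, while the unscaled logistic differs from the normal ogive far more.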

Item-Person Map (Wright Map)
IRT models allow us to put items and persons on the same scale.
Note: See an example of a 32-item test of body height (van der Linden, 2017), compare to Figure 2.4.

1PL IRT Model
$$\pi_{ij} = P(Y_{ij} = 1 \mid \theta_i, a, b_j) = \frac{\exp[a(\theta_i - b_j)]}{1 + \exp[a(\theta_i - b_j)]}$$
- $\theta_i$: ability of person i, for i = 1, ..., I
- $b_j$: difficulty of item j (location of inflection point), for j = 1, ..., J
- $a$: discrimination common to all items (slope at inflection point)

2PL IRT Model
$$\pi_{ij} = P(Y_{ij} = 1 \mid \theta_i, a_j, b_j) = \frac{\exp[a_j(\theta_i - b_j)]}{1 + \exp[a_j(\theta_i - b_j)]}$$
- $\theta_i$: ability of person i, for i = 1, ..., I
- $b_j$: difficulty of item j (location of inflection point)
- $a_j$: discrimination of item j (slope at inflection point), for j = 1, ..., J

3PL IRT Model
$$\pi_{ij} = P(Y_{ij} = 1 \mid \theta_i, a_j, b_j, c_j) = c_j + (1 - c_j)\,\frac{\exp[a_j(\theta_i - b_j)]}{1 + \exp[a_j(\theta_i - b_j)]}$$
- $\theta_i$: ability of person i, for i = 1, ..., I
- $b_j$: difficulty of item j (location of inflection point)
- $a_j$: discrimination of item j (slope at inflection point)
- $c_j$: pseudo-guessing parameter of item j (lower/left asymptote), for j = 1, ..., J

4PL IRT Model
$$\pi_{ij} = P(Y_{ij} = 1 \mid \theta_i, a_j, b_j, c_j, d_j) = c_j + (d_j - c_j)\,\frac{\exp[a_j(\theta_i - b_j)]}{1 + \exp[a_j(\theta_i - b_j)]}$$
- $\theta_i$: ability of person i, for i = 1, ..., I
- $b_j$: difficulty of item j (location of inflection point)
- $a_j$: discrimination of item j (slope at inflection point)
- $c_j$: pseudo-guessing parameter of item j (lower/left asymptote)
- $d_j$: inattention parameter of item j (upper/right asymptote), for j = 1, ..., J
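
The Rasch, 1PL, 2PL and 3PL models are all special cases of the 4PL response function, obtained by fixing some of its parameters. A minimal sketch follows; the function name `irf_4pl`, the default values and the example item parameters are illustrative assumptions.

```python
import math


def irf_4pl(theta: float, a: float = 1.0, b: float = 0.0,
            c: float = 0.0, d: float = 1.0) -> float:
    """4PL item response function: c + (d - c) * exp(z) / (1 + exp(z)), z = a * (theta - b)."""
    z = a * (theta - b)
    return c + (d - c) / (1.0 + math.exp(-z))


# Special cases obtained by fixing parameters (example values are hypothetical):
p_rasch = irf_4pl(0.5, b=-0.25)                 # a = 1, c = 0, d = 1
p_2pl = irf_4pl(0.5, a=1.8, b=-0.25)            # c = 0, d = 1
p_3pl = irf_4pl(0.5, a=1.8, b=-0.25, c=0.2)     # d = 1
p_4pl = irf_4pl(0.5, a=1.8, b=-0.25, c=0.2, d=0.95)

# At theta = b the curve sits halfway between the two asymptotes.
midpoint = irf_4pl(-0.25, a=1.8, b=-0.25, c=0.2, d=0.95)
print(p_rasch, p_2pl, p_3pl, p_4pl, midpoint)
```

Keeping one function with defaults mirrors the nesting of the models: each simpler model is the richer one with a parameter held fixed.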

Information Function
$$P(\theta, a_j, b_j, c_j, d_j) = c_j + (d_j - c_j)\,\frac{\exp[a_j(\theta - b_j)]}{1 + \exp[a_j(\theta - b_j)]}$$
$$I_j(\theta, a_j, b_j, c_j, d_j) = \frac{\partial P}{\partial \theta} = a_j (d_j - c_j)\,\frac{\exp[a_j(\theta - b_j)]}{\{1 + \exp[a_j(\theta - b_j)]\}^2}$$

Test Information and Reliability
$$I(\theta) = \sum_j I_j(\theta, a_j, b_j, c_j, d_j)$$
Note:
- Standard error: $SE(\hat\theta \mid \theta) = 1 / \sqrt{I(\hat\theta \mid \theta)}$
- Reliability: $SE(\hat\theta \mid \theta) = \sigma \sqrt{1 - r_{xx}(\hat\theta \mid \theta)}$

Maximum Likelihood Estimation
Once the data have been collected, we can ask: Which (item/person) parameters would most likely produce these results?
Estimating ability parameters:
- Assume five items with known item parameters
- Assume response pattern 11000
- A student with what ability is most likely to produce these responses?
Estimating item parameters:
- Assume 20 students with known abilities $\theta_1, \ldots, \theta_{20}$
- Assume responses to the first item 11000011110101001110
- An item with what difficulty b is most likely to lead to these student responses?

Estimating ability parameter θ
Problem
Assume five items (obeying the Rasch model) with known item parameters $b_1 = -1.90$, $b_2 = -0.60$, $b_3 = -0.25$, $b_4 = 0.30$, $b_5 = 0.45$. Assume response pattern 11000.
- How likely is an average student (θ = 0) to produce these responses?
- How likely is a weaker student (θ = -1) to produce these responses?
- Which student is more likely to produce these responses?
Solution
1. Calculate the probability of each response in the pattern
2. Calculate the probability of the response pattern; use the assumption of conditional independence: the product of the probabilities of the individual responses in the pattern

Estimating ability parameter θ
Solution
$$P(Y_1 = 1 \mid \theta = 0) = \frac{e^{(0 - (-1.9))}}{1 + e^{(0 - (-1.9))}}, \ldots$$
$$P(Y = 11000 \mid \theta = 0) = \frac{e^{1.9}}{1 + e^{1.9}} \cdot \frac{e^{0.6}}{1 + e^{0.6}} \cdot \left(1 - \frac{e^{0.25}}{1 + e^{0.25}}\right) \cdot \left(1 - \frac{e^{-0.3}}{1 + e^{-0.3}}\right) \cdot \left(1 - \frac{e^{-0.45}}{1 + e^{-0.45}}\right)$$
$$= 0.87 \cdot 0.65 \cdot 0.44 \cdot 0.57 \cdot 0.61 = 0.086$$
$$P(Y = 11000 \mid \theta = -1) = 0.71 \cdot 0.40 \cdot 0.68 \cdot 0.79 \cdot 0.81 = 0.123$$
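
The two pattern probabilities can be reproduced in a few lines. This is a sketch under the Rasch model; the difficulties are those implied by the exponents in the likelihood for this example (the first three are negative), and the function names are illustrative.

```python
import math


def rasch_p(theta: float, b: float) -> float:
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def pattern_probability(theta: float, responses: list[int], difficulties: list[float]) -> float:
    """Probability of a response pattern under conditional independence."""
    prob = 1.0
    for y, b in zip(responses, difficulties):
        p = rasch_p(theta, b)
        prob *= p if y == 1 else (1.0 - p)
    return prob


difficulties = [-1.90, -0.60, -0.25, 0.30, 0.45]
responses = [1, 1, 0, 0, 0]  # response pattern 11000

p_average = pattern_probability(0.0, responses, difficulties)
p_weaker = pattern_probability(-1.0, responses, difficulties)
print(f"P(11000 | theta =  0) = {p_average:.3f}")
print(f"P(11000 | theta = -1) = {p_weaker:.3f}")
```

The weaker student is the more likely source of this pattern, which is what motivates searching over all θ for the maximum.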

Estimating ability parameter θ
Problem
A student with what ability θ is most likely to produce responses 11000?
Solution
1. Calculate the probability of each response in the pattern (as a function of θ)
2. Calculate the probability of the response pattern (as a function of θ); this function is known as the likelihood function L; use the assumption of conditional independence:
$$P(11000 \mid \theta) = L(11000 \mid \theta) = p_1\, p_2\, (1 - p_3)(1 - p_4)(1 - p_5)$$
3. Find the maximum value of the likelihood function

Estimating ability parameter θ
Problem
A student with what ability θ is most likely to produce responses 11000?
Solution
$$P(Y = 11000 \mid \theta) = \frac{e^{\theta + 1.9}}{1 + e^{\theta + 1.9}} \cdot \frac{e^{\theta + 0.6}}{1 + e^{\theta + 0.6}} \cdot \left(1 - \frac{e^{\theta + 0.25}}{1 + e^{\theta + 0.25}}\right) \left(1 - \frac{e^{\theta - 0.3}}{1 + e^{\theta - 0.3}}\right) \left(1 - \frac{e^{\theta - 0.45}}{1 + e^{\theta - 0.45}}\right)$$
For which θ is the likelihood the highest?

Estimating ability parameter θ
Log-likelihood
- Reaches its maximum for the same θ as the likelihood
- Easier to handle
$$\log P(Y = 11000 \mid \theta) = \log\frac{e^{\theta + 1.9}}{1 + e^{\theta + 1.9}} + \log\frac{e^{\theta + 0.6}}{1 + e^{\theta + 0.6}} + \log\left(1 - \frac{e^{\theta + 0.25}}{1 + e^{\theta + 0.25}}\right) + \log\left(1 - \frac{e^{\theta - 0.3}}{1 + e^{\theta - 0.3}}\right) + \log\left(1 - \frac{e^{\theta - 0.45}}{1 + e^{\theta - 0.45}}\right)$$

Maximum Likelihood Estimation - Technical details
For which θ is the likelihood the highest?
- Empirical MLE: method of brackets; does not provide a standard error of the estimate
- Newton-Raphson:
  - looks for $(\log L)' = 0$ (zero derivative of log L)
  - uses the second derivative $(\log L)''$ to find it quickly:
$$\theta_{new} = \theta_{old} - \frac{(\log L)'}{(\log L)''}$$
  - the derivatives can be further used for estimation of item information and standard error
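
The Newton-Raphson update can be sketched for the 11000 example. Under the Rasch model the first derivative of log L is the sum of (y_j - p_j) and the second derivative is minus the sum of p_j(1 - p_j); the function names, starting value, tolerance and iteration cap below are assumptions made for illustration.

```python
import math


def rasch_p(theta: float, b: float) -> float:
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def mle_theta(responses, difficulties, theta=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson for the ability MLE; returns (estimate, standard error)."""
    for _ in range(max_iter):
        ps = [rasch_p(theta, b) for b in difficulties]
        first = sum(y - p for y, p in zip(responses, ps))   # (log L)'
        second = -sum(p * (1.0 - p) for p in ps)            # (log L)''
        step = first / second
        theta -= step                                       # theta_new = theta_old - (log L)' / (log L)''
        if abs(step) < tol:
            break
    info = sum(p * (1.0 - p) for p in (rasch_p(theta, b) for b in difficulties))
    return theta, 1.0 / math.sqrt(info)


difficulties = [-1.90, -0.60, -0.25, 0.30, 0.45]
responses = [1, 1, 0, 0, 0]  # response pattern 11000

theta_hat, se = mle_theta(responses, difficulties)
print(f"theta_hat = {theta_hat:.3f}, SE = {se:.3f}")
```

The estimate lands between the two candidate abilities from the worked example, and the same second derivative that drives the iteration also yields the standard error. Note that all-correct or all-incorrect patterns have no finite maximum, so this sketch is not suitable for them.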

Estimation of item parameters
Problem (Estimating item difficulty b)
Assuming that person abilities are known, an item with what difficulty b is most likely to produce student responses 110010011000?
Solution
1. Calculate the probability of the student response pattern (as a function of b); this function is again known as the likelihood function L; use the assumption of conditional independence
2. Find the maximum value of the likelihood function
Note: For 2PL models the likelihood function is 2-dimensional!
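
The same Newton-Raphson idea applies to an item difficulty when abilities are treated as known. In the sketch below the twelve ability values are hypothetical (the lesson does not list them), only the response string 110010011000 comes from the slide, and the Rasch model is assumed.

```python
import math


def rasch_p(theta: float, b: float) -> float:
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def mle_difficulty(responses, thetas, b=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson for item difficulty b with known abilities."""
    for _ in range(max_iter):
        ps = [rasch_p(t, b) for t in thetas]
        first = sum(p - y for y, p in zip(responses, ps))   # d log L / d b
        second = -sum(p * (1.0 - p) for p in ps)            # d^2 log L / d b^2
        step = first / second
        b -= step
        if abs(step) < tol:
            break
    return b


# Hypothetical abilities of 12 students (assumption, for illustration only).
thetas = [1.5, 0.9, -0.4, -1.2, 0.3, -0.8, -1.5, 0.6, 1.1, -0.2, -0.9, -1.7]
responses = [int(ch) for ch in "110010011000"]  # responses from the slide

b_hat = mle_difficulty(responses, thetas)
expected_correct = sum(rasch_p(t, b_hat) for t in thetas)
print(f"b_hat = {b_hat:.3f}, expected number correct = {expected_correct:.3f}")
```

At the maximum the expected number of correct responses equals the observed number, which is a convenient check on the result.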

Three types of ML Estimates in IRT models
Usually, both person and item parameters need to be estimated.
Joint Maximum Likelihood
- Used in Winsteps
- Ping-pong between person and item MLE
- With an increasing number of examinees, the number of parameters to be estimated increases
- May lead to inconsistent, biased estimates
Marginal Maximum Likelihood
- Used in IRTPRO, ltm, mirt
- Assumes a prior ability distribution (usually N(0, 1))
- Ability is integrated out to get ML estimates of item parameters
- Expected a posteriori estimates of abilities
Conditional Maximum Likelihood
- Used in eRm
- Only applicable in 1PL (Rasch) models, where:
  - Total score is a sufficient statistic for ability
  - Percent correct is a sufficient statistic for difficulty
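
Sufficiency of the total score can be illustrated numerically: under the Rasch model, the ratio of the probabilities of two patterns with the same total score does not depend on θ. The sketch below uses illustrative difficulties and hypothetical function names.

```python
import math


def rasch_p(theta: float, b: float) -> float:
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def pattern_probability(theta, responses, difficulties):
    """Probability of a response pattern under conditional independence."""
    prob = 1.0
    for y, b in zip(responses, difficulties):
        p = rasch_p(theta, b)
        prob *= p if y == 1 else (1.0 - p)
    return prob


difficulties = [-1.90, -0.60, -0.25, 0.30, 0.45]  # illustrative values
pattern_a = [1, 1, 0, 0, 0]  # total score 2
pattern_b = [0, 0, 0, 1, 1]  # total score 2, different items

ratios = [
    pattern_probability(t, pattern_a, difficulties) / pattern_probability(t, pattern_b, difficulties)
    for t in (-2.0, -1.0, 0.0, 1.0, 2.0)
]
print(ratios)  # the same value at every theta
```

Because θ cancels out once the total score is fixed, conditioning on that score removes the person parameters, which is the property conditional maximum likelihood relies on.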

Joint Maximum Likelihood
Mathematical and technical details:
$$L = P(Y \mid \theta, a, b) = \prod_{i=1}^{I} \prod_{j=1}^{J} \pi_{ij}^{y_{ij}} (1 - \pi_{ij})^{(1 - y_{ij})}$$
- the logarithm simplifies the above expression to a sum:
$$\ln L = \sum_{i=1}^{I} \sum_{j=1}^{J} \left[ y_{ij} \ln(\pi_{ij}) + (1 - y_{ij}) \ln(1 - \pi_{ij}) \right]$$
- maximization incorporates computation of partial derivatives
- indeterminacy of parameter estimates in the origin and unit: resolved by person centering or item centering
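
The joint log-likelihood and the indeterminacy of origin and unit can both be shown on a tiny example. The response matrix and all parameter values below are hypothetical, and the 2PL form is assumed for the probabilities.

```python
import math


def p_2pl(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def joint_loglik(Y, thetas, a_s, b_s) -> float:
    """ln L = sum_i sum_j [ y_ij ln(pi_ij) + (1 - y_ij) ln(1 - pi_ij) ]."""
    total = 0.0
    for i, theta in enumerate(thetas):
        for j, (a, b) in enumerate(zip(a_s, b_s)):
            p = p_2pl(theta, a, b)
            total += Y[i][j] * math.log(p) + (1 - Y[i][j]) * math.log(1.0 - p)
    return total


# Hypothetical data: 4 persons x 3 items.
Y = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 1]]
thetas = [0.5, -1.0, 0.2, 1.5]
a_s = [1.0, 0.8, 1.4]
b_s = [-0.5, 0.0, 0.7]

ll = joint_loglik(Y, thetas, a_s, b_s)

# Origin: shifting every theta and b by the same constant changes nothing.
ll_shifted = joint_loglik(Y, [t + 2.0 for t in thetas], a_s, [b + 2.0 for b in b_s])

# Unit: doubling theta and b while halving a changes nothing either.
ll_rescaled = joint_loglik(Y, [2.0 * t for t in thetas], [a / 2.0 for a in a_s], [2.0 * b for b in b_s])

print(ll, ll_shifted, ll_rescaled)
```

Since the data cannot distinguish these parameter sets, a constraint such as centering the persons or the items is needed before the estimates are identified.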

Marginal Maximum Likelihood
Mathematical and technical details:
- marginal likelihood (θ is integrated out):
$$L = P(Y) = \int P(Y \mid \theta, a, b)\, g(\theta \mid a, b)\, d\theta$$
- $g(\theta \mid a, b)$ is the so-called prior distribution (usually assumed N(0, 1))
- the integration is solved using Gauss-Hermite quadrature (numerical integration)
- it can be understood as a weighted sum: at each theta interval (rectangle), the likelihood of the response pattern is weighted by that rectangle's probability of being observed
- L does not depend on θ and can be maximized with respect to a, b
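
The weighted-sum view of the marginal likelihood can be sketched with a plain rectangle rule on an evenly spaced grid. This is a deliberate simplification and not Gauss-Hermite quadrature; the Rasch model, the N(0, 1) prior, the grid limits and the function names are all assumptions made for illustration.

```python
import math
from itertools import product


def rasch_p(theta: float, b: float) -> float:
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def pattern_probability(theta, responses, difficulties):
    """Likelihood of a response pattern at a fixed theta."""
    prob = 1.0
    for y, b in zip(responses, difficulties):
        p = rasch_p(theta, b)
        prob *= p if y == 1 else (1.0 - p)
    return prob


def normal_grid(lo=-6.0, hi=6.0, n=121):
    """Evenly spaced nodes with normalized N(0, 1) weights (rectangle rule, not Gauss-Hermite)."""
    nodes = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    dens = [math.exp(-0.5 * t * t) for t in nodes]
    total = sum(dens)
    return nodes, [d / total for d in dens]


def marginal_likelihood(responses, difficulties):
    """P(Y) approximated as sum_k w_k * P(Y | theta_k)."""
    nodes, weights = normal_grid()
    return sum(w * pattern_probability(t, responses, difficulties) for t, w in zip(nodes, weights))


difficulties = [-1.90, -0.60, -0.25, 0.30, 0.45]  # illustrative item difficulties
p_11000 = marginal_likelihood([1, 1, 0, 0, 0], difficulties)
total = sum(marginal_likelihood(list(pat), difficulties) for pat in product([0, 1], repeat=5))
print(f"P(11000) = {p_11000:.4f}, sum over all 32 patterns = {total:.6f}")
```

The result no longer involves any individual θ, so it is a function of the item parameters alone and can be maximized over them.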

Assessing Ability Levels
Once the item parameters are known (estimated):
- Maximum likelihood estimator (MLE)
- Weighted likelihood
- Bayes modal estimator (BME), maximum a posteriori (MAP)
- Expected a posteriori (EAP)
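
An expected a posteriori estimate can be sketched as the posterior mean of θ on a grid. The sketch assumes the Rasch model, a N(0, 1) prior, illustrative item difficulties and an evenly spaced grid; the function names are hypothetical.

```python
import math


def rasch_p(theta: float, b: float) -> float:
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def likelihood(theta, responses, difficulties):
    """Likelihood of a response pattern at a fixed theta."""
    value = 1.0
    for y, b in zip(responses, difficulties):
        p = rasch_p(theta, b)
        value *= p if y == 1 else (1.0 - p)
    return value


def eap(responses, difficulties, lo=-6.0, hi=6.0, n=241):
    """Posterior mean and SD of theta under a N(0, 1) prior, on an evenly spaced grid."""
    nodes = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    post = [likelihood(t, responses, difficulties) * math.exp(-0.5 * t * t) for t in nodes]
    norm = sum(post)
    mean = sum(t * w for t, w in zip(nodes, post)) / norm
    var = sum((t - mean) ** 2 * w for t, w in zip(nodes, post)) / norm
    return mean, math.sqrt(var)


difficulties = [-1.90, -0.60, -0.25, 0.30, 0.45]  # illustrative item difficulties
theta_eap, post_sd = eap([1, 1, 0, 0, 0], difficulties)
print(f"EAP = {theta_eap:.3f}, posterior SD = {post_sd:.3f}")
```

With only five items the prior pulls the estimate toward 0, so the EAP is less extreme than the maximum likelihood estimate for the same pattern. Unlike the MLE, it is also defined for all-correct and all-incorrect patterns.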

MLE - iterative process
1. Select a set of starting values, randomly or intelligently; the closer the starting values are to the actual values, the better
2. Maximize the likelihood to get new estimates
3. Check the stopping rule and stop if:
   - the maximal number of runs is reached
   - the likelihood does not change too much

Model selection
- Log-likelihood: the bigger the better
- AIC (Akaike information criterion), BIC (Bayesian information criterion): the smaller the better
- LRT (likelihood ratio test): if significant (p < 0.05), the submodel is rejected; use the model with more parameters
- 3PL model: possible problems with local maxima and with distinguishing between models
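
The three criteria can be computed from fitted log-likelihoods as sketched below. All numbers are hypothetical (a 20-item test, 500 respondents, made-up log-likelihoods for a Rasch fit and a 2PL fit) and the function names are assumptions. The chi-square tail probability uses the closed form that holds for even degrees of freedom only, to keep the example within the standard library.

```python
import math


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * loglik


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * loglik


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper tail of the chi-square distribution; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Hypothetical fits (made-up values for illustration only).
n = 500
ll_rasch, k_rasch = -5400.0, 20   # 20 difficulties
ll_2pl, k_2pl = -5375.0, 40       # 20 difficulties + 20 discriminations

lrt_stat = 2 * (ll_2pl - ll_rasch)
p_value = chi2_sf_even_df(lrt_stat, k_2pl - k_rasch)

print(f"AIC: Rasch {aic(ll_rasch, k_rasch):.1f}, 2PL {aic(ll_2pl, k_2pl):.1f}")
print(f"BIC: Rasch {bic(ll_rasch, k_rasch, n):.1f}, 2PL {bic(ll_2pl, k_2pl, n):.1f}")
print(f"LRT: statistic {lrt_stat:.1f}, df {k_2pl - k_rasch}, p = {p_value:.5f}")
```

In this made-up case the LRT and AIC favor the 2PL model while BIC, with its heavier penalty per parameter, favors the Rasch model, so the criteria need not agree.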

Item and Person Fit Assessment
Ames & Penfield (2015)
- Comparing the ICC of the fitted model to the observed proportion of correct responses
- Detection of improbable response patterns
- Comparing the number of respondents with a given response pattern to what is expected by the model ($X^2$ test)

Further Topics
Further models
- Polytomous IRT models (ordinal/nominal)
- Multidimensional IRT models
- Hierarchical IRT models, etc.
- Accounting for differential item functioning, etc.
Applications
- Test equating
- Computerized adaptive testing, etc.

Vocabulary
- Rasch model, 1PL, 2PL, 3PL, 4PL IRT models
- Item Characteristic Curve (ICC)
- Item Response Function (IRF)
- Item Information Function (IIF)
- Test Information Function (TIF)
- Likelihood function
- Parameter estimation: JML, CML, MML
- Model fit, Item fit, Person fit

Thank you for your attention! www.cs.cas.cz/martinkova