New Developments for Extended Rasch Modeling in R


New Developments for Extended Rasch Modeling in R Patrick Mair, Reinhold Hatzinger Institute for Statistics and Mathematics WU Vienna University of Economics and Business

Content: Rasch models: theory, extensions. eRm package: implementation structure, package features, recent developments. Goodness-of-fit: nonparametric tests using the RaschSampler package. Use case: math exams at WU.

Item Response Theory (IRT): IRT is a branch of psychometrics that focuses on the probabilistic modeling of item responses. The aim is to measure an underlying latent construct. Estimation of item difficulty parameters. Estimation of person ability parameters. R packages: eRm (Mair & Hatzinger, 2007), ltm (Rizopoulos, 2006), mokken (van der Ark, 2007), etc. A special, restrictive IRT model is the Rasch model (Rasch, 1960).

Rasch Models: Georg Rasch (1901–1980). Danish mathematician and philosopher. Student: Erling B. Andersen (statistician). Core publications:
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research.
Rasch, G. (1961). On general laws and the meaning of measurement in psychology. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, IV, pp. 321–334. Berkeley.
Rasch, G. (1977). On Specific Objectivity: An attempt at formalizing the request for generality and validity of scientific statements. The Danish Yearbook of Philosophy, 14, 58–93.

Rasch Model: Formal Representation. Georg Rasch (1952). Let $X$ be a binary $n \times k$ data matrix (Rasch, 1960):

$$P(X_{vi} = 1) = \frac{\exp(\theta_v - \beta_i)}{1 + \exp(\theta_v - \beta_i)}$$

with $\beta_i$ ($i = 1, \ldots, k$) as item difficulty parameter and $\theta_v$ ($v = 1, \ldots, n$) as person ability parameter (interval scale).
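The formula above can be evaluated directly in base R. The following is a minimal sketch, not part of the slides: the function name `rasch_prob` and the parameter values are hypothetical and chosen only for illustration.

```r
# Rasch model: P(X_vi = 1) = exp(theta_v - beta_i) / (1 + exp(theta_v - beta_i))
rasch_prob <- function(theta, beta) {
  # outer() builds the n x k matrix of theta_v - beta_i;
  # plogis() is the logistic function exp(x) / (1 + exp(x))
  plogis(outer(theta, beta, "-"))
}

# Hypothetical illustration values (assumptions, not from the slides)
theta <- c(-1, 0, 1)       # person abilities
beta  <- c(-0.5, 0, 0.5)   # item difficulties
P <- rasch_prob(theta, beta)

# Simulate a binary n x k data matrix from these probabilities
set.seed(1)
X <- matrix(rbinom(length(P), size = 1, prob = P), nrow = nrow(P))
```

A person whose ability equals the item difficulty solves that item with probability 0.5, which is what `P[2, 2]` returns here.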

Properties of Rasch Models. Unidimensionality: only ONE latent construct is being measured. Local independence: conditional independence of the item responses. Logistic, parallel item characteristic curves (ICC): formal restrictions, the logistic curves are not allowed to cross. Sufficiency of the raw scores: the margins (sum scores) contain the whole information. The last property leads to the epistemological theory of specific objectivity (Rasch, 1977), which implies subgroup invariance of the parameters, sample independence, etc.
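Why the raw score is sufficient can be seen in a short derivation. This is the standard textbook argument, added here as a sketch; the symbols $r_v$ and $\gamma_r$ are notation introduced for this illustration and do not appear on the slides.

```latex
% Response pattern x_v = (x_{v1}, ..., x_{vk}) with raw score r_v = \sum_i x_{vi}.
% Under local independence:
P(x_v \mid \theta_v)
  = \prod_{i=1}^{k} \frac{\exp\{x_{vi}(\theta_v - \beta_i)\}}{1 + \exp(\theta_v - \beta_i)}
  = \frac{\exp(r_v \theta_v)\,\exp\!\left(-\sum_i x_{vi}\beta_i\right)}
         {\prod_{i=1}^{k}\{1 + \exp(\theta_v - \beta_i)\}}

% Conditioning on the raw score, the person parameter cancels:
P(x_v \mid r_v)
  = \frac{\exp\!\left(-\sum_i x_{vi}\beta_i\right)}{\gamma_{r_v}(\beta)},
\qquad
\gamma_{r}(\beta) = \sum_{y:\ \sum_i y_i = r} \exp\!\left(-\sum_i y_i \beta_i\right)
```

Because $\theta_v$ no longer appears in the conditional probability, the item parameters can be estimated without reference to the person parameters, which is the basis of conditional maximum likelihood estimation.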

Extended Rasch Models. Extension to polytomous items (Rasch, 1961; Andersen, 1995) with $h = 0, \ldots, m_i$ item categories:

$$P(X_{vi} = h) = \frac{\exp(\phi_h \theta_v + \beta_{ih})}{\sum_{l=0}^{m_i} \exp(\phi_l \theta_v + \beta_{il})}$$

with $\phi_h$ as scoring ($\phi_h = h$; Andersen, 1977). Linear decomposition of the item-category parameters (Fischer, 1973):

$$\beta_{ih} = \sum_{j=1}^{p} w_{ihj}\, \eta_j$$

with $W$ as design matrix with $p$ columns ($p < k$).
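Both formulas can be illustrated in a few lines of base R. This is a minimal sketch under stated assumptions: `category_probs`, the parameter values, and the design matrix `W` are hypothetical. Note that the item-category parameters enter with a plus sign here, so larger values make a category more likely.

```r
# Category probabilities for one item with categories h = 0, ..., m:
# P(X = h) = exp(phi_h * theta + beta_h) / sum_l exp(phi_l * theta + beta_l)
category_probs <- function(theta, beta_cat) {
  h <- seq_along(beta_cat) - 1        # scoring phi_h = h
  num <- exp(h * theta + beta_cat)
  num / sum(num)
}

# Hypothetical item-category parameters; the first entry (category 0) is set to 0
p <- category_probs(theta = 0.3, beta_cat = c(0, 0.5, -0.2))

# Linear decomposition beta = W %*% eta with a hypothetical design matrix W
# (3 item parameters explained by p = 2 basic parameters)
W <- rbind(c(1, 0),
           c(0, 1),
           c(1, 1))
eta  <- c(0.4, -0.7)
beta <- drop(W %*% eta)
```

The third parameter is the sum of the two basic parameters, which is the kind of restriction a design matrix with fewer columns than rows imposes.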

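Model Hierarchy: LPCM (linear partial credit model), PCM (partial credit model), LRSM (linear rating scale model), RSM (rating scale model), LLTM (linear logistic test model), RM (Rasch model).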

Implementation Structure

Some eRm features and recent developments: Missing values are allowed. Design matrix approach (basic parameters): $\beta = W\eta$. ML-based person parameter estimation. Parametric and nonparametric goodness-of-fit tests. Some utility functions for data simulations. Plots: ICC plots, goodness-of-fit plots (sample split), person-item maps, pathway maps.

Goodness-of-Fit in eRm
Item fit, person fit: infit and outfit statistics. Function calls: itemfit(), personfit().
Wald test: z-statistics at item level based on a binary sample split. Function call: Waldtest().
Andersen's LR-test: LR-statistic based on sample splits (Andersen, 1973). Function call: LRtest().
Martin-Löf test (Martin-Löf, 1973). Function call: MLoef().
Nonparametric tests (Ponocny, 2001). Function call: NPtest().
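To make the infit and outfit statistics concrete, the following base-R sketch computes the usual mean-square versions from known parameters. It is an illustration under assumptions: `item_fit_msq` is a hypothetical helper, the data are simulated, and the package's itemfit() may differ in details such as the handling of missing values and extreme scores.

```r
# Infit and outfit mean squares per item, given person and item parameters
item_fit_msq <- function(X, theta, beta) {
  P  <- plogis(outer(theta, beta, "-"))  # model probabilities
  V  <- P * (1 - P)                      # response variances
  R2 <- (X - P)^2                        # squared residuals
  list(
    outfit = colMeans(R2 / V),           # mean of squared standardized residuals
    infit  = colSums(R2) / colSums(V)    # information-weighted mean square
  )
}

# Simulated data that follow the Rasch model (hypothetical values)
set.seed(42)
theta <- rnorm(200)
beta  <- c(-1, 0, 1)
X <- matrix(rbinom(600, 1, plogis(outer(theta, beta, "-"))), nrow = 200)
fit <- item_fit_msq(X, theta, beta)
```

For data that follow the model, both mean squares should be close to 1; values well above 1 point to more noise than the model expects.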

Nonparametric Goodness-of-Fit Tests. Sampling principle (tetrad transformation):

$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \leftrightarrow \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

Efficient MCMC-based sampling algorithm (RaschSampler; Verhelst, Hatzinger, & Mair, 2007; Verhelst, 2008). Testing approach: Compute the test statistic $t_{obs}$ on the observed 0/1 data matrix $X$ (Ponocny, 2001). Sample 0/1 matrices with the margins of $X$ held fixed and compute the test statistic $t_s$ for each of them. The $t_s$ values form the reference distribution $T_s$. Compute the quantile of $t_{obs}$ in this distribution.
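The sampling principle and the testing approach can be sketched in base R. This is a deliberately naive illustration, not the RaschSampler algorithm, which is far more efficient; `tetrad_step`, `sample_matrices`, the thinning value, and the toy data are assumptions made for this example.

```r
# One tetrad step: pick two rows and two columns; if the 2 x 2 submatrix is
# (0 1 / 1 0) or (1 0 / 0 1), flip it. Row and column sums stay unchanged.
tetrad_step <- function(X) {
  r  <- sample(nrow(X), 2)
  c2 <- sample(ncol(X), 2)
  sub <- X[r, c2]
  if (sub[1, 1] == sub[2, 2] && sub[1, 2] == sub[2, 1] && sub[1, 1] != sub[1, 2]) {
    X[r, c2] <- 1 - sub
  }
  X
}

# Naive sampler: run repeated tetrad steps and keep every thin-th matrix
sample_matrices <- function(X, n_eff = 100, thin = 50) {
  out <- vector("list", n_eff)
  for (s in seq_len(n_eff)) {
    for (k in seq_len(thin)) X <- tetrad_step(X)
    out[[s]] <- X
  }
  out
}

set.seed(123)
X <- matrix(rbinom(60, 1, 0.5), nrow = 12)   # toy 12 x 5 data matrix
sims <- sample_matrices(X)

# Example statistic: number of persons answering items 1 and 2 identically
stat  <- function(M) sum(M[, 1] == M[, 2])
t_obs <- stat(X)
t_s   <- vapply(sims, stat, numeric(1))
p_value <- mean(t_s >= t_obs)                # one-sided Monte Carlo p-value
```

Every sampled matrix has the same row and column sums as the observed one, so the comparison of `t_obs` with the `t_s` values conditions on the sufficient statistics of the Rasch model.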

Use case: Math exams at WU. 20 multiple-choice prototype questions of type text (T), formal (F), or applied (A) to measure the latent construct mathematical ability (n = 9404, k = 20):
interest (T), linear functions (T), quadratic functions (T), duopol (T), arithmetic sequences (T), geometric sequences (T), difference equation (T), linear equation systems (F), applied equations systems (A), applied matrix computations (A), matrix equations (A), I/O analysis (A), simplex 1 (T), simplex 2 (F), exponential functions (F), derivative (F), integral (F), derivative applied (T), optimization 1 (T), optimization 2 (T).
Aim: Determine an item pool that satisfies highest psychometric standards.

[Figure: Bar chart of school types (AHS, AUSL, HAK, HLA, HTL, SONST); y-axis: frequencies. Bar labels as extracted: 3591, 1854, 2161, 793, 712, 293.]
[Figure: Raw score distribution; x-axis: items solved (0 to 20), y-axis: frequencies. Mean: 12.96, median: 13, standard deviation: 3.91.]

R> res.hom <- homals(xhom, ndim = 2, level = "ordinal")
R> plot(res.hom, plot.type = "loadplot", main = "Item Loadings",
+    xlab = "Dimension 1", ylab = "Dimension 2")
[Figure: Item Loadings; x-axis: Dimension 1, y-axis: Dimension 2. Labeled items: simplex1, simplex2, lineq, apl.lineq, io.anal, apl.matr, matreq.]

Rasch Analysis. Model tests: Andersen's LR-test, Wald tests on item level, Martin-Löf test, nonparametric tests. Sample splits: 1000 students (Suarez-Falcon & Glas, 2003). R call:
R> psamp <- sample(1:9404, 1000)
R> Xrasch <- Xmath.all[psamp, 2:21]
R> res.rasch <- RM(Xrasch)

R> Waldtest(res.rasch)
Wald test on item level (z-values):
                 z-statistic  p-value
beta interest          1.810    0.070
beta linear            1.261    0.207
beta quadratic        -0.956    0.339
beta duopol            2.996    0.003
beta arith.seq        -0.513    0.608
beta geo.seq           1.159    0.247
beta diffeq            0.958    0.338
beta lineq             3.776    0.000
beta apl.lineq         1.249    0.212
beta apl.matr         -1.029    0.303
beta matreq           -0.416    0.677
beta io.anal           0.301    0.763
beta simplex1          4.402    0.000
beta simplex2          4.205    0.000
beta expfun           -2.884    0.004
beta diff             -2.494    0.013
beta prim             -2.981    0.003
beta apl.diff         -0.758    0.448
beta opt1             -0.092    0.926
beta opt2              1.370    0.171

R> res.and <- LRtest(res.rasch)
R> res.and
Andersen LR-test:
LR-value: 86.997
Chi-square df: 19
p-value: 0

R> res.loef <- MLoef(res.rasch)
R> res.loef
Martin-Loef-Test (split: median)
LR-value: 152.955
Chi-square df: 99
p-value: 0

Stepwise Item Elimination. The following 7 items are excluded stepwise:
Simplex tasks: simplex1, simplex2.
(Applied) linear equation systems: lineq, apl.lineq.
Applied matrix computations: apl.matr.
Matrix equations: matreq.
I/O analysis: io.anal.
R> elimlab <- c(8, 9, 10, 11, 12, 13, 14)
R> Xrasch1 <- Xrasch[, -elimlab]
R> res.rasch1 <- RM(Xrasch1)
R> res.ppar1 <- person.parameter(res.rasch1)

R> Waldtest(res.rasch1)
Wald test on item level (z-values):
                 z-statistic  p-value
beta interest          2.200    0.028
beta linear            0.147    0.883
beta quadratic         0.069    0.945
beta duopol            2.593    0.010
beta arith.seq         0.933    0.351
beta geo.seq           1.657    0.098
beta diffeq            2.088    0.037
beta expfun           -1.832    0.067
beta diff             -0.327    0.744
beta prim             -1.291    0.197
beta apl.diff         -1.909    0.056
beta opt1             -0.389    0.697
beta opt2              0.778    0.437

R> LRtest(res.rasch1, se = TRUE)
Andersen LR-test:
LR-value: 26.238
Chi-square df: 12
p-value: 0.01

R> LRtest(res.rasch1, splitcr = EDU[psamp])
Andersen LR-test:
LR-value: 74.88
Chi-square df: 60
p-value: 0.093

R> MLoef(res.rasch1)
Martin-Loef-Test (split: median)
LR-value: 44.428
Chi-square df: 41
p-value: 0.329
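The reported p-values can be checked from the printed statistics with base R alone. This is a small sketch, assuming the LR-value is compared with a chi-square distribution and the Wald z-statistic is tested two-sided; the variable names are hypothetical.

```r
# Andersen LR-test for the reduced model: LR = 26.238 on 12 df
p_lr <- pchisq(26.238, df = 12, lower.tail = FALSE)

# Wald z-statistic for item "interest" in the reduced model: z = 2.200
p_wald <- 2 * pnorm(-abs(2.200))

round(c(LR = p_lr, Wald = p_wald), 3)
```

The degrees of freedom are consistent with 13 remaining items: 12 = 13 - 1 for a split into two groups, and 60 = 5 x 12 for the split by school type, which fits six school types.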

[Figure: Graphical Model Check; x-axis: Beta for Group: Raw Scores <= Median, y-axis: Beta for Group: Raw Scores > Median (both from -3 to 3). Plotted items: prim, interest, quadratic, arith.seq, linear, diff, expfun, diffeq, opt1, geo.seq, duopol, apl.diff, opt2.]

Nonparametric Model Test: Subgroup Invariance. Test statistic $T_{10}$ (Ponocny, 2001).
R> rmat <- rsampler(as.matrix(Xrasch1), rsctrl(burn_in = 100, n_eff = 500, seed = 123))
R> eduvec <- EDU[psamp]
R> eduhak <- ifelse(eduvec == "HAK", 1, 0)
R> res.np10 <- NPtest(rmat, method = "T10", splitcr = eduhak)
R> res.np10
Nonparametric RM model test: T10 (global test - subgroup-invariance)
Number of sampled matrices: 500
Split: eduhak
Group 1: n = 755
Group 2: n = 245
one-sided p-value: 0.164

[Figures: Distribution for T10 (histogram; x-axis: T10 values, y-axis: frequencies) and empirical cumulative distribution function (x-axis: T10 values, y-axis: Fn(x)). T10 observed: 161238, Fn(x) = 0.836.]

Nonparametric Model Test: Local Independence. Test statistic $T_1$ (Ponocny, 2001):

$$T_1 = \sum_v \delta_{x_{vi} x_{vj}}$$

R> res.np1 <- NPtest(rmat, method = "T1")
R> res.np1
Nonparametric RM model test: T1 (local dependence - inter-item correlations)
Number of sampled matrices: 500
Number of Item-Pairs tested: 78
Item-Pairs with one-sided p < 0.05
  (2,3)   (2,6)   (3,8)   (8,9)  (8,10)  (8,11)  (8,13)  (9,10)  (9,11)  (9,12)
  0.016   0.014   0.000   0.000   0.000   0.008   0.024   0.000   0.000   0.002
(10,11) (10,12) (10,13) (11,12) (11,13)
  0.000   0.000   0.026   0.000   0.006
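Reading the delta in $T_1$ as a Kronecker delta, the statistic counts the persons who give identical responses to items $i$ and $j$. A minimal base-R sketch of that reading, with a hypothetical helper `t1_stat` and toy data:

```r
# T1 for an item pair (i, j): number of persons with identical responses
# on both items (Kronecker delta summed over persons)
t1_stat <- function(X, i, j) sum(X[, i] == X[, j])

# Toy 0/1 data matrix (hypothetical): 4 persons, 3 items
X <- rbind(c(1, 1, 0),
           c(0, 0, 1),
           c(1, 0, 0),
           c(1, 1, 1))

# Evaluate T1 for every item pair: (1,2), (1,3), (2,3)
item_pairs <- t(combn(ncol(X), 2))
t1 <- apply(item_pairs, 1, function(p) t1_stat(X, p[1], p[2]))

# Number of item pairs for the 13 remaining items
n_pairs <- choose(13, 2)
```

With 13 items there are 78 pairs, which matches the number of item pairs reported in the output above. Each observed value would then be compared with the same statistic computed on the sampled matrices.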

[Figure: Person-Item Map; top panel: person parameter distribution; rows: interest, linear, quadratic, duopol, arith.seq, geo.seq, diffeq, expfun, diff, prim, apl.diff, opt1, opt2; x-axis: latent dimension (-2 to 2).]

Results and Implications. Final subset of 13 items that meet highest psychometric standards (Rasch homogeneous). Now we can: score persons; examine person fit in terms of guessing, carelessness, specific knowledge, etc.; make person/item comparisons on an interval scale; make detailed probabilistic statements regarding items and persons (ICC); use the items for adaptive testing, etc.

Summary, Outlook, References. The eRm package is a flexible tool for Rasch analysis. Next on the list: LLRA wrapper function, mixture distribution Rasch models, one-parameter logistic model (OPLM). eRm package vignette: vignette("eRm"). Selected articles in the Journal of Statistical Software (JSS): http://www.jstatsoft.org
Mair, P., & Hatzinger, R. (2007). Extended Rasch modeling: The eRm package for the application of IRT models in R. JSS, 20(9).
Verhelst, N., Hatzinger, R., & Mair, P. (2007). The Rasch sampler. JSS, 20(4).

Contact Patrick Mair Institute for Statistics and Mathematics WU Vienna University of Economics and Business Augasse 2-6 1090 Vienna Email: patrick.mair@wu.ac.at Website: http://statmath.wu.ac.at/~mair