Bayesian Nonparametric Rasch Modeling: Methods and Software

Similar documents
Bayesian Methods for Testing Axioms of Measurement

36-720: The Rasch Model

A Workshop on Bayesian Nonparametric Regression Analysis

Bayesian Nonparametric Meta-Analysis Model George Karabatsos University of Illinois-Chicago (UIC)

Lesson 7: Item response theory models (part 2)

PIRLS 2016 Achievement Scaling Methodology 1

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence

Semiparametric Generalized Linear Models

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood

Anders Skrondal. Norwegian Institute of Public Health London School of Hygiene and Tropical Medicine. Based on joint work with Sophia Rabe-Hesketh

Principles of Bayesian Inference

New Developments for Extended Rasch Modeling in R

Part 8: GLMs and Hierarchical LMs and GLMs

Summer School in Applied Psychometric Principles. Peterhouse College 13 th to 17 th September 2010

Monte Carlo Simulations for Rasch Model Tests

BAYESIAN MODEL CHECKING STRATEGIES FOR DICHOTOMOUS ITEM RESPONSE THEORY MODELS. Sherwin G. Toribio. A Dissertation

UCLA Department of Statistics Papers

Polytomous Item Explanatory IRT Models with Random Item Effects: An Application to Carbon Cycle Assessment Data

Bayesian linear regression

Principles of Bayesian Inference

Fitting Multidimensional Latent Variable Models using an Efficient Laplace Approximation

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis

Density Estimation. Seungjin Choi

Bayesian nonparametric predictive approaches for causal inference: Regression Discontinuity Methods

Related Concepts: Lecture 9 SEM, Statistical Modeling, AI, and Data Mining. I. Terminology of SEM

Outline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution

Conditional maximum likelihood estimation in polytomous Rasch models using SAS

Bayesian non-parametric model to longitudinally predict churn

Lecture : Probabilistic Machine Learning

Principles of Bayesian Inference

Bayesian estimation of the discrepancy with misspecified parametric models

Psychology 282 Lecture #4 Outline Inferences in SLR

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

A Markov chain Monte Carlo approach to confirmatory item factor analysis. Michael C. Edwards The Ohio State University

The Rasch Model, Additive Conjoint Measurement, and New Models of Probabilistic Measurement Theory

STAT 518 Intro Student Presentation

Introduction To Confirmatory Factor Analysis and Item Response Theory

Wavelet-Based Nonparametric Modeling of Hierarchical Functions in Colon Carcinogenesis

Item Parameter Calibration of LSAT Items Using MCMC Approximation of Bayes Posterior Distributions

On the Construction of Adjacent Categories Latent Trait Models from Binary Variables, Motivating Processes and the Interpretation of Parameters

Bayes methods for categorical data. April 25, 2017

An Equivalency Test for Model Fit. Craig S. Wells. University of Massachusetts Amherst. James. A. Wollack. Ronald C. Serlin

Nonparametric Bayesian Methods (Gaussian Processes)

Mixtures of Rasch Models

Comparing IRT with Other Models

Review: Probabilistic Matrix Factorization. Probabilistic Matrix Factorization (PMF)

Bayesian Analysis of Multivariate Normal Models when Dimensions are Absent

6. Fractional Imputation in Survey Sampling

Lecture 13 Fundamentals of Bayesian Inference

STA 216, GLM, Lecture 16. October 29, 2007

Bayesian Methods for Machine Learning

Approximate Bayesian Computation

Comparison between conditional and marginal maximum likelihood for a class of item response models

An Overview of Item Response Theory. Michael C. Edwards, PhD

Centre for Education Research and Practice

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016

Lecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu

Estimation of Operational Risk Capital Charge under Parameter Uncertainty

Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements

Overview. Multidimensional Item Response Theory. Lecture #12 ICPSR Item Response Theory Workshop. Basics of MIRT Assumptions Models Applications

CPSC 540: Machine Learning

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

Bayesian Regression Linear and Logistic Regression

Making the Most of What We Have: A Practical Application of Multidimensional Item Response Theory in Test Scoring

Contents. 3 Evaluating Manifest Monotonicity Using Bayes Factors Introduction... 44

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

A Fully Nonparametric Modeling Approach to. BNP Binary Regression

Image segmentation combining Markov Random Fields and Dirichlet Processes

Markov Chain Monte Carlo

Bayesian Model Diagnostics and Checking

Plausible Values for Latent Variables Using Mplus

Measuring Social Influence Without Bias

Bagging During Markov Chain Monte Carlo for Smoother Predictions

Development and Calibration of an Item Response Model. that Incorporates Response Time

General structural model Part 2: Categorical variables and beyond. Psychology 588: Covariance structure and factor models

Systematic uncertainties in statistical data analysis for particle physics. DESY Seminar Hamburg, 31 March, 2009

Bayesian model selection: methodology, computation and applications

Probabilistic Index Models

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Algorithms and Code JMASM24: Numerical Computing for Third-Order Power Method Polynomials (Excel)

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

Bayesian Networks in Educational Assessment

Fast Likelihood-Free Inference via Bayesian Optimization

Graphical Models and Kernel Methods

FREQUENTIST BEHAVIOR OF FORMAL BAYESIAN INFERENCE

Walkthrough for Illustrations. Illustration 1

ESTIMATION OF IRT PARAMETERS OVER A SMALL SAMPLE. BOOTSTRAPPING OF THE ITEM RESPONSES. Dimitar Atanasov

Eco517 Fall 2004 C. Sims MIDTERM EXAM

On the Use of Nonparametric ICC Estimation Techniques For Checking Parametric Model Fit

Outline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012)

CTDL-Positive Stable Frailty Model

Bayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples

Lecture 3: Statistical Decision Theory (Part II)

Longitudinal analysis of ordinal data

Bayesian Inference for Regression Parameters

BAYESIAN DECISION THEORY

Basic IRT Concepts, Models, and Assumptions

MID-TERM EXAM ANSWERS. p t + δ t = Rp t 1 + η t (1.1)

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

Transcription:

Bayesian Nonparametric Rasch Modeling: Methods and Software George Karabatsos University of Illinois-Chicago Keynote talk Friday May 2, 2014 (9:15-10am) Ohio River Valley Objective Measurement Seminar (ORVOMS) Supported by NSF-MMS Research Grant SES-1156372 1

Outline of Presentation I. Review Rasch model II. Key feature of model: Interpretability and simplicity. Issues with Rasch modeling in practice Simple models often misfit real data. Difficulties with Rasch model fit analysis. III. A Bayesian nonparametric (BNP) Rasch model as a solution New model provides Rasch analysis without fit statistics, and automatic Rasch analysis. IV. Illustrate the BNP Rasch model on real NAEP data. Compare with ordinary Rasch, 1PL, 2PL, and 3PL models. Demonstrate my free software with new BNP Rasch model. V. Conclusions 2

Rasch (1960) Model Pr[Y ij = y θ i, β j ] = Φ(θ i β j ) y [1 Φ(θ i β j )] 1 y Y ij {y = 0,1} response by examinee i on test item j for i = 1,,n; j = 1,,J. θ i examinee ability; β j item difficulty (real-valued) Φ( ) = Pr[Y* < ] continuous c.d.f.; for Inverse Link. Usually: Logistic(0,1) c.d.f.: Φ( ) = exp( )/[1+exp( )] Alternatively: Normal ogive: Φ( ): Normal(0,1) c.d.f. (Φ( /1.7) approximates the Logistic(0,1)) Or parameterize scale, for normal Φ( / ). Etc. 3

Rasch (1960) Model Pr[Y ij = y θ i, β j ] = Φ(θ i β j ) y [1 Φ(θ i β j )] 1 y Model is very attractive for its interpretability/simplicity: Y ij a function of examinee ability & item difficulty. Under model, total test score, j y ij sufficient statistic for θ i, and total item score i y ij sufficient statistic for β j. Binary (e.g., logistic) GLM regression model with examinee indicator (0,1) and item indicator (0, 1) predictors/covariates. Additional covariates can be added easily. Easily extended to handle ratings Y ij, using GLM ideas. Interpretable/simple models, because they are understandable, are preferred for high-stakes decisions involving examinee measurement, and for making policy decisions. 4

Rasch (1960) Model Pr[Y ij = y θ i, β j ] = Φ(θ i β j ) y [1 Φ(θ i β j )] 1 y Rasch model is attractive for its interpretability/simplicity. Unfortunately, real data often poorly fit with simple models, and fit better with more complex/less interpretable models. Data misfit makes Rasch model less interpretable/meaningful. Data misfit destroys all sufficiency properties of Rasch models. For example, for a 6-item test, with items ordered by difficulty, the θ estimate of item responses 111000 is the same as the θ estimate of item responses 000111. In such a situation, the Rasch model is no longer interpretable. 5

Rasch (1960) Model Rasch model is attractive for its interpretability/simplicity. Unfortunately, real data often poorly fit with simple models, and fit better with more complex/less interpretable models. Fit statistics are often used to identify/remove problematic items from Rasch model (and problematic persons). This practice is not uncontroversial: leads to information loss. Identifying mis-fitting items is time consuming and difficult: An item may or may not appear to misfit, depending on what other test items happen to be included in the model (!) Many large data sets indicate items misfit the Rasch model. Incoherency: Coherency requires that the model represents the analyst s beliefs about the data. But the act of model fit 6 checking actually indicates her/his lack of belief in the model.

Rasch (1960) Model In real psychometric practice, is it possible to exploit the interpretability and simplicity of the Rasch model, while avoiding the pitfalls of model misfit and fit analysis? I propose a Bayesian nonparametric (BNP) Rasch model, which: Can provide accurate estimates of examinee ability and item difficulty parameters, that are robust to (control for) all observed and unobserved covariates/predictors/factors that are excluded from the model; Then, the BNP Rasch model is not wrong or incorrect for the data, and therefore this model: doesn t require data fit analysis/model checking; there is no point in performing a model fit analysis. 7 provides a Rasch analysis without fit statistics.

Bayesian Nonparametric Rasch Model We enlarge the Rasch model: Pr[Y ij = 1 θ i, β j ] = Φ(θ i β j ) to the Bayesian nonparametric (BNP) Rasch model, an infinite mixture Rasch model defined by: Pr[Y ij = 1 ] = Φ({θ i β j + 0 + ₀}/σ)dG ij ( ₀) = k= : Φ({θ i β j + 0 + 0k }/σ)ω ijk with k= : ω ijk = 1. where for examinee i and item j: 0 is the effect of all unobserved covariates/predictors /factors that are excluded from the model; G ij Φ( ) is the mixing distribution, with mixture weights ω ijk = Φ({j (θ i β j + 0 )}/ ) Φ({j 1 (θ i β j + 0 )}/ ) is the Normal c.d.f. 8

The BNP Rasch model: BNP Rasch Model Pr[Y ij = 1 ] = Φ({θ i β j + 0 + ₀}/σ)dG ij ( ₀) = k= : Φ({θ i β j + 0 + 0k }/σ)ω ijk ; ω ijk = Φ({j (θ i β j + 0 )}/ ) Φ({j 1 (θ i β j + 0 )}/ ) is completed by the specification of prior distributions: 0k ~ i.i.d. Normal(0, 2 ) ~ Uniform(0, b ) 0 σ 2 ~ Normal(0, σ 2 v 0 ) (, 1:J ) σ 2 ~ Normal nj (0, σ 2 vi nj ) σ 2 ~ InverseGamma(a 0 /2, a 0 /2) (, ) σ 2 ~ Normal nj (0, σ 2 v I nj+1 ) σ 2 ~ InverseGamma(a 0 /2, a 0 /2) 9

BNP Rasch Model The Bayesian infinite mixture Rasch model has parameters: = ({ 0k } k= :, σ μ, θ,, σ,,, ). It is a Bayesian nonparametric model because it has an infinite number of parameters, allowing for high model flexibility. Following Bayes theorem, the data, D = (y ij ) n J, updates the joint prior density ( ) to a posterior density: ( D) i j Pr[Y ij = y ij ] ( ), with Pr[Y ij = 1 ] given by the mixture model (previous slide), and Pr[Y ij = 0 ] = 1 Pr[Y ij = 1 ]. The posterior ( D), and all posterior functionals of interest, can be estimated via MCMC methods developed by Karabatsos & Walker (2012, Elec J Stat). 10

Easy Extensions of BNP Rasch Model Ordinal item responses k = 0,1,,m, by model specification: Pr[Y ij = k ] = {w(k 1)<y*<w(k)} n(y* θ i β j + 0 + ₀,σ)dG ij ( ₀)dy* given ordinal thresholds: w 1 < w 0 = 0 < w 1 = 1 < w 2 = 2 < < w m =, and n(,σ) the density function of the Normal(,σ) distribut. Continuous-valued item response Y ij, by model specification: f(y ij ) = n(y ij θ i β j + 0 + ₀, σ)dg ij ( ₀). Extra covariates/predictors (x) can be added to the model, beyond examinee and item indicators. E.g., judge indicators, and covariates describing the examinees, items, and/or judges (e.g., SES, item time), to provide a BNP FACETS model. Then we have a more general model for G x ( ₀). 11

Application of BNP Rasch Model to NAEP data Data: 1990 NAEP 6-item reading exam of 4 th and 6 th graders. 75 examinees (randomly). Dichotomous (0,1) item scores. Standardized residual: zˆ [ y Pr( Y 1 ˆ)] ij ij ij Pr( Y ij 1 ˆ)[1 Pr( Y ij 1 ˆ)] Other Rasch and IRT Models Number of Outliers (i.e., item responses where zˆ ij 2 ) out of the total of 450 (=75*6) item responses. Rasch model (JMLE/WINSTEPS) 17 (4%) Rasch/1PL (MMLE/irtoys) 33 (7%) 2PL and 3PL (MMLE/irtoys) 48 (11%) 12

13

14

Select model for data analysis (one of 43 possible choices of models) 15

Select dependent variable (y). Item response data. 16

Select examinee indicator (0,1) and item indicator (-1,1) covariates/predictors. 17

Enter prior parameters of the model. 18

Software now displays actual model with chosen prior parameters, and chosen covariates. 19

Click to Run Analysis: Generate 50K MCMC samples from model s posterior distribution. 20

After end of analysis run (after ~2 min), this text output file is generated and opened automatically. Text display of model. 21

Text display of chosen prior parameters of model. 22

Click to generate additional plots and output tables 23

Convergence Analysis Step #1: Click table to verify that 95% sizes Monte Carlo Confidence Intervals (MCCI) half-widths, for parameter estimates of interest, are sufficiently small (e.g., around.01) Convergence Analysis Step #2: Click to verify that Trace Plots of model parameters of interest display good mixing, i.e., look sufficiently hairy. 24

Convergence Analysis Step #1: Click table to verify that 95% sizes Monte Carlo Confidence Intervals (MCCI) half-widths, for parameter estimates of interest, are sufficiently small (e.g., around.01). ANSWER: As seen below, the 95%MCCI half widths are small. You can make them even smaller by running additional MCMC sampling iterations, over and beyond the 50K MCMC iterations already run. 25

Convergence Analysis Step #2: Click to verify that Trace Plots of model parameters of interest display good mixing, i.e., look sufficiently hairy. ANSWER: Trace plots look hairy and stable, for the ability parameters of the first 2 examinees (top 2 panels), and for all 6 test items (bottom 6 panels). Recall that we have chosen to remove the first 2K burn-in samples, for posterior parameter estimation. 26

After verifying MCMC convergence, you may then generate plots of marginal posterior summaries of model parameters. Such summaries are already provided in greater detail, in the text output files mentioned earlier. 27

Box plot: marginal posterior distribution of ability parameter, for each of the 75 examinees. 28

Box plot: marginal posterior distributions of difficulty parameter, for each of the 6 test items of the NAEP exam. 29

Plot the standardized fit residuals of the BNP Rasch model. 30

Posterior Predictive Model Fit Statistics (from text output file) Stat. 95% MCCIhw R squared = 0.99 0.00 Standardized Residuals of Dependent Variable Responses: Min 5% 10% 25% 50% 75% 90% 95% Max -0.53-0.34-0.25-0.08 0 0.09 0.33 0.25 0.63 (Range of 95% MCCI sizes of residuals: [.00,.00]). Zero outliers under the BNP Rasch model. But then again, as mentioned, model fit analysis is virtually unnecessary for this Rasch model. 31

Conclusions We propose a BNP Rasch model, which retains the interpretability of the ordinary Rasch model, while providing examinee ability and item difficulty estimates that are robust to outliers of the ordinary model. Model accounts for all covariates not included in the model. Therefore, the BNP Rasch model is not wrong. Then, there is virtually no point in performing fit analysis with the model. BNP model provides a Rasch analysis without fit statistics. BNP Rasch model can be easily extended to ordinal or continuous item responses, and can handle extra covariates as in FACETS. Free, user-friendly software is available for the BNP model http://www.uic.edu/~georgek/homepage/bayessoftware.html 32

References Karabatsos, G. (2014). Bayesian Regression: Nonparametric and Parametric Models, Version 2014b [Software]. April 30, University of Illinois-Chicago. Karabatsos, G., & Walker, S.G. (2012). Adaptive-Modal Bayesian Nonparametric Regression. Electronic Journal of Statistics, 6, 2038-2068. Karabatsos, G., & Walker, S.G. (2009). Coherent psychometric modeling with Bayesian nonparametrics. British Journal of Mathematical and Statistical Psychology, 62, 1-20. Karabatsos, G., & Walker, S.G. (2012, to appear). Bayesian nonparametric IRT. In chapter 19, Handbook Of Item Response Theory: Models, Statistical Tools, and Applications, Volume 1 (W.J. van der Linden & R. Hambleton, Eds.). New York: Taylor & Francis. Linacre, J. M. (2014). Winsteps Rasch measurement computer program. Beaverton, Oregon: Winsteps.com Linacre, J. M. (2013). Facets computer program for many-facet Rasch measurement. Beaverton, Oregon: Winsteps.com Partchev, I. (2014). irtoys: Simple interface to the estimation and plotting of IRT models. R package version 0.1.7. Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: 33 Danish Institute for Educational Research.