Mixtures of Rasch Models

Similar documents
Comparison between conditional and marginal maximum likelihood for a class of item response models

Examining Exams Using Rasch Models and Assessment of Measurement Invariance

Speech Recognition Lecture 8: Expectation-Maximization Algorithm, Hidden Markov Models.

Determining the number of components in mixture models for hierarchical data

Confirmatory Factor Analysis: Model comparison, respecification, and more. Psychology 588: Covariance structure and factor models

Statistical Estimation

Latent classes for preference data

Dynamic sequential analysis of careers

Weighted Finite-State Transducers in Computational Biology

Estimating the parameters of hidden binomial trials by the EM algorithm

Monte Carlo Simulations for Rasch Model Tests

Last lecture 1/35. General optimization problems Newton Raphson Fisher scoring Quasi Newton

Likelihood-based inference with missing data under missing-at-random

Fitting Multidimensional Latent Variable Models using an Efficient Laplace Approximation

Mixture models for heterogeneity in ranked data

Uncertainty quantification and visualization for functional random variables

Lecture 6: April 19, 2002

The Expectation Maximization or EM algorithm

Chapter 08: Direct Maximum Likelihood/MAP Estimation and Incomplete Data Problems

Clustering on Unobserved Data using Mixture of Gaussians

Statistical Pattern Recognition

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project

The Expectation Maximization Algorithm

A class of latent marginal models for capture-recapture data with continuous covariates

36-720: The Rasch Model

Variable selection for model-based clustering

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them

Finite Mixture Model Diagnostics Using Resampling Methods

Latent Variable Models and EM algorithm

Statistical Methods. Missing Data snijders/sm.htm. Tom A.B. Snijders. November, University of Oxford 1 / 23

What is Latent Class Analysis. Tarani Chandola

Score-Based Tests of Measurement Invariance with Respect to Continuous and Ordinal Variables

Lecture 6: Graphical Models: Learning

Web-based Supplementary Materials for Multilevel Latent Class Models with Dirichlet Mixing Distribution

Label Switching and Its Simple Solutions for Frequentist Mixture Models

Latent Class Analysis

Week 3: The EM algorithm

Machine Learning for Signal Processing Expectation Maximization Mixture Models. Bhiksha Raj 27 Oct /

Machine Learning Summer School

Probabilistic clustering

More on Unsupervised Learning

Bayesian Nonparametric Rasch Modeling: Methods and Software

Clustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014.

Regression Clustering

The Covariate-Adjusted Frequency Plot for the Rasch Poisson Counts Model

Lesson 7: Item response theory models (part 2)

Naïve Bayes classification

Model-based cluster analysis: a Defence. Gilles Celeux Inria Futurs

6. Fractional Imputation in Survey Sampling

EM-algorithm for motif discovery

Introduction to Machine Learning

CPSC 540: Machine Learning

Lecture 8: Graphical models for Text

Sparse Linear Models (10/7/13)

1 Expectation Maximization

B.1 Optimization of the negative log-likelihood functions

FlexMix: A general framework for finite mixture models and latent class regression in R

Lecture 14 More on structural estimation

Multilevel Mixture with Known Mixing Proportions: Applications to School and Individual Level Overweight and Obesity Data from Birmingham, England

The Regularized EM Algorithm

U-Likelihood and U-Updating Algorithms: Statistical Inference in Latent Variable Models

An Introduction to mixture models

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Walkthrough for Illustrations. Illustration 1

Communications in Statistics - Simulation and Computation. Comparison of EM and SEM Algorithms in Poisson Regression Models: a simulation study

Lecture 6: Gaussian Mixture Models (GMM)

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability

Fractional Imputation in Survey Sampling: A Comparative Review

Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory

Mixture Models and EM

Detection of Uniform and Non-Uniform Differential Item Functioning by Item Focussed Trees

Data Mining Techniques

MH I. Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution

Latent Variable Models and EM Algorithm

Biostat 2065 Analysis of Incomplete Data

Optimization Methods II. EM algorithms.

Application of Item Response Theory Models for Intensive Longitudinal Data

arxiv: v1 [stat.ap] 11 Aug 2014

Statistical Analysis of List Experiments

New Developments for Extended Rasch Modeling in R

Model comparison and selection

EM for ML Estimation

Lecture 4: Probabilistic Learning

Likelihood, MLE & EM for Gaussian Mixture Clustering. Nick Duffield Texas A&M University

Expectation Maximization Mixture Models HMMs

Mixtures of Negative Binomial distributions for modelling overdispersion in RNA-Seq data

Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a

Co-expression analysis of RNA-seq data

Exponential Family and Maximum Likelihood, Gaussian Mixture Models and the EM Algorithm. by Korbinian Schwinger

lcda: Local Classification of Discrete Data by Latent Class Models

Related Concepts: Lecture 9 SEM, Statistical Modeling, AI, and Data Mining. I. Terminology of SEM

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall

Lecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010

November 2002 STA Random Effects Selection in Linear Mixed Models

Generalized Linear Models

Statistical Analysis of the Item Count Technique

CS Lecture 18. Expectation Maximization

Lecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions

Chapter 17: Undirected Graphical Models

Transcription:

Mixtures of Rasch Models
Hannah Frick, Friedrich Leisch, Achim Zeileis, Carolin Strobl
http://www.uibk.ac.at/statistics/

Introduction
Rasch model for measuring latent traits.
Model assumption: item parameter estimates do not depend on the person sample.
This assumption is violated in case of differential item functioning (DIF).
Several approaches to test for DIF: LR tests, Wald tests, Rasch trees, mixture models.
Here: two versions of the mixture model approach.

Rasch Model
Probability for person i to solve item j:
P(Y_{ij} = y_{ij} \mid \theta_i, \beta_j) = \frac{e^{y_{ij} (\theta_i - \beta_j)}}{1 + e^{\theta_i - \beta_j}}
y_{ij}: response by person i to item j
\theta_i: ability of person i
\beta_j: difficulty of item j
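
Below is a minimal sketch of this model equation in plain numpy (illustrative only, not code from a psychometrics package):

```python
import numpy as np

def rasch_prob(theta, beta, y=1):
    """P(Y_ij = y | theta_i, beta_j) = exp(y * (theta - beta)) / (1 + exp(theta - beta))."""
    eta = theta - beta
    return np.exp(y * eta) / (1.0 + np.exp(eta))

# A person of average ability facing an easy and a hard item:
print(rasch_prob(theta=0.0, beta=np.array([-1.0, 2.0])))  # roughly [0.73, 0.12]
```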

ML Estimation
Factorization of the full likelihood on the basis of the scores r_i = \sum_{j=1}^{m} y_{ij}:
L(\theta, \beta) = f(y \mid \theta, \beta) = h(y \mid r, \theta, \beta) \, g(r \mid \theta, \beta) = h(y \mid r, \beta) \, g(r \mid \theta, \beta)
Joint ML: joint estimation of \beta and \theta is inconsistent.
Marginal ML: assume a distribution for \theta and integrate it out in g(r \mid \theta, \beta).
Conditional ML: take g(r) = g(r \mid \theta, \beta) as given, or assume that it does not depend on \theta, \beta (but potentially on other parameters). Hence g(r) is a nuisance term and only h(y \mid r, \beta) needs to be maximized.
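
For the Rasch model the conditional likelihood has the closed form h(y_i \mid r_i, \beta) = \exp(-\sum_j y_{ij} \beta_j) / \gamma_{r_i}(\beta), where \gamma_r is the elementary symmetric function of order r of \epsilon_j = e^{-\beta_j}. A hedged sketch of this computation (function names are illustrative, not from any package):

```python
import numpy as np

def elementary_symmetric(beta):
    """Return gamma_0, ..., gamma_m for eps_j = exp(-beta_j) via the summation recursion."""
    eps = np.exp(-np.asarray(beta, dtype=float))
    gamma = np.zeros(len(eps) + 1)
    gamma[0] = 1.0
    for e in eps:                        # add the items one at a time
        gamma[1:] = gamma[1:] + e * gamma[:-1]
    return gamma

def cond_likelihood(y_i, beta):
    """h(y_i | r_i, beta): probability of the response pattern given its raw score."""
    y_i = np.asarray(y_i, dtype=float)
    r_i = int(round(y_i.sum()))
    return np.exp(-float(y_i @ beta)) / elementary_symmetric(beta)[r_i]

beta = np.array([-1.0, 0.0, 1.0])
print(cond_likelihood([1, 1, 0], beta))  # probability of this pattern among all patterns with r = 2
```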

Mixture Models
Mixture models are a tool to model data with unobserved heterogeneity caused by, e.g., (latent) groups.
Mixture density: f(y) = \sum_{k=1}^{K} \pi_k f_k(y)
The weights \pi_k are the a priori probabilities for the components.
The components f_k are densities or (regression) models.
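
A tiny generic illustration of the weighted-sum idea, with two normal components chosen purely as an example (all values made up):

```python
import numpy as np
from scipy.stats import norm

weights = np.array([0.3, 0.7])                    # a priori component probabilities, sum to 1
means, sds = np.array([-2.0, 1.0]), np.array([1.0, 0.5])

def mixture_density(x):
    # f(x) = sum_k pi_k * f_k(x)
    return sum(w * norm.pdf(x, loc=mu, scale=s) for w, mu, s in zip(weights, means, sds))

print(mixture_density(0.0))
```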

Mixtures of Rasch Models
Mixture of the full likelihoods by Rost (1990):
f(y \mid \pi, \psi, \beta) = \prod_{i=1}^{n} \sum_{k=1}^{K} \pi_k \, \psi_{r_i,k} \, h(y_i \mid r_i, \beta_k), with \psi_{r_i,k} = g_k(r_i)
Mixture of the conditional likelihoods:
f(y \mid \pi, \beta) = \prod_{i=1}^{n} \sum_{k=1}^{K} \pi_k \, h(y_i \mid r_i, \beta_k)
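
A small sketch of the observed-data log-likelihood of the conditional-likelihood mixture, reusing the cond_likelihood() helper sketched above (Y is an n x m binary response matrix, pi the component weights, betas a list of K difficulty vectors; all names are illustrative):

```python
import numpy as np

def mixture_cond_loglik(Y, pi, betas):
    """log f(y | pi, beta) = sum_i log sum_k pi_k * h(y_i | r_i, beta_k)."""
    return float(sum(np.log(sum(p * cond_likelihood(y_i, b) for p, b in zip(pi, betas)))
                     for y_i in Y))
```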

Parameter Estimation
EM algorithm by Dempster, Laird and Rubin (1977).
Group membership is seen as a missing value.
Optimization is done iteratively by alternating estimation of the group memberships (E-step) and the component densities (M-step).
E-step: \hat{p}_{ik} = \frac{\hat{\pi}_k \, h(y_i \mid r_i, \hat{\beta}_k)}{\sum_{g=1}^{K} \hat{\pi}_g \, h(y_i \mid r_i, \hat{\beta}_g)}
M-step: for each component separately, \hat{\beta}_k = \operatorname{argmax}_{\beta_k} \sum_{i=1}^{n} \hat{p}_{ik} \log h(y_i \mid r_i, \beta_k)
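
A hedged sketch of this EM iteration for the conditional-likelihood mixture (illustrative code, not the authors' implementation; cond_likelihood() is the helper sketched earlier, the numerical M-step via BFGS and the centering for identifiability are assumptions of this sketch, and initialization and label-switching issues are ignored):

```python
import numpy as np
from scipy.optimize import minimize

def em_rasch_mixture(Y, K, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    pi = np.full(K, 1.0 / K)                        # component weights pi_k
    betas = [rng.normal(scale=0.1, size=m) for _ in range(K)]

    for _ in range(n_iter):
        # E-step: posterior membership probabilities p_ik
        h = np.array([[cond_likelihood(y_i, b) for b in betas] for y_i in Y])  # n x K
        p = pi * h
        p /= p.sum(axis=1, keepdims=True)

        # M-step: update the weights and, separately per component, the item parameters
        pi = p.mean(axis=0)
        for k in range(K):
            def neg_wll(beta, k=k):                 # negative weighted conditional log-likelihood
                return -sum(p[i, k] * np.log(cond_likelihood(Y[i], beta)) for i in range(n))
            res = minimize(neg_wll, betas[k], method="BFGS")
            betas[k] = res.x - res.x.mean()         # center: beta_k identified only up to a constant
    return pi, betas, p
```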

Number of Components
How can the number of components K be established?
A priori known number of groups in the data.
LR test: the regularity conditions are not fulfilled, so the distribution under H_0 is unknown and a bootstrap is necessary.
Information criteria: AIC, BIC, ICL.
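
An illustrative sketch of the three criteria for one fitted K-component model (logL is the maximized log-likelihood, n_par the number of free parameters, n the sample size, p the n x K posterior matrix from the E-step; the ICL shown is the common ICL-BIC approximation, BIC plus twice the entropy of the posteriors):

```python
import numpy as np

def information_criteria(logL, n_par, n, p):
    aic = -2.0 * logL + 2.0 * n_par
    bic = -2.0 * logL + np.log(n) * n_par
    entropy = -np.sum(p * np.log(np.clip(p, 1e-12, 1.0)))  # entropy of the posterior memberships
    icl = bic + 2.0 * entropy
    return {"AIC": aic, "BIC": bic, "ICL": icl}
```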

Simulation Design
10 items, 1800 people, equal group sizes.
Latent groups in the item and/or person parameters:
Scenario A: \beta_1 = \beta_2 and \theta_1 = \theta_2 (no latent groups)
Scenario B: \beta_1 \neq \beta_2 and \theta_1 = \theta_2 (latent groups in the item parameters only)
Scenario C: \beta_1 \neq \beta_2 and \theta_1 \neq \theta_2 (latent groups in the item and person parameters)
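
One possible way to simulate responses for such a design, sketched here for scenario C (the particular difficulty and ability values are placeholders, not the ones used in the study):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 10, 1800
group = np.repeat([0, 1], n // 2)                       # two equally sized latent groups

beta = np.vstack([np.linspace(-2, 2, m),                # item difficulties in group 1
                  np.linspace(2, -2, m)])               # reversed difficulties in group 2 (DIF)
theta = np.where(group == 0, rng.normal(0.0, 1.0, n),   # abilities, shifted in group 2
                 rng.normal(0.5, 1.0, n))

eta = theta[:, None] - beta[group]                      # n x m matrix of theta_i - beta_j
Y = (rng.random((n, m)) < np.exp(eta) / (1.0 + np.exp(eta))).astype(int)
```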

Item Parameters
A: one latent class (no DIF), \beta_1 = \beta_2. B/C: two latent classes (DIF), \beta_1 \neq \beta_2.
[Figure: item difficulty (from about -2 to 2) plotted against item number (1-10) for both settings.]

Person Parameters
A/B: \theta_1 = \theta_2. C: \theta_1 \neq \theta_2.
[Figure: densities of the abilities \theta_1 and \theta_2 over the range -6 to 6 for both settings.]

Criteria for Goodness of Fit
Number of components.
Rand index: agreement between the true and the estimated partition.
Mean residual sum of squares: agreement between the true and the estimated (item) parameter vector.
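
A brief sketch of the (unadjusted) Rand index mentioned above: the proportion of pairs of observations on which the true and the estimated partition agree (same class vs. different classes):

```python
from itertools import combinations

def rand_index(labels_true, labels_est):
    agree, total = 0, 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_est = labels_est[i] == labels_est[j]
        agree += int(same_true == same_est)       # pair treated consistently by both partitions
        total += 1
    return agree / total

print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))     # 1.0: same partition up to relabeling
```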

No Latent Classes (No DIF), Scenario A
[Figure: how often AIC, BIC, and ICL select 1, 2, or 3 components (counts on a scale from 0 to 500).]

Two Latent Classes (DIF), Scenario B
[Figure: how often AIC, BIC, and ICL select 1, 2, or 3 components (counts on a scale from 0 to 500).]

Latent Structure in Item and Person Parameters (DIF + Ability Differences), Scenario C
[Figure: how often AIC, BIC, and ICL select 1, 2, 3, or 4 components (counts on a scale from 0 to 500).]

Latent Structure in Item and Person Parameters (DIF + Ability Differences)
[Figure: Rand index (accuracy of clustering) in scenario C, shown separately for AIC, BIC, and ICL; axis range roughly 0.75 to 0.95.]

Latent Structure in Item and Person Parameters (DIF + Ability Differences)
[Figure: mean residual SSQ (accuracy of the item parameter estimates) in scenario C on a log scale from about 2 to 50, shown separately for AIC, BIC, and ICL.]

Summary and Outlook
The model is suitable for detecting latent classes with DIF.
The model is also suitable when a latent structure in the person parameters is present.
AIC tends to overestimate the correct number of classes; BIC and ICL work well.
Clustering of the observations works well.
Estimation of the item parameters in the components works reasonably well.
A comparison with Rost's MRM is to follow.

Literature
Arthur Dempster, Nan Laird, and Donald Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1-38, 1977.
Bettina Grün and Friedrich Leisch. FlexMix Version 2: Finite Mixtures with Concomitant Variables and Varying and Constant Parameters. Journal of Statistical Software, 28(4):1-35, 2008.
Georg Rasch. Probabilistic Models for Some Intelligence and Attainment Tests. The University of Chicago Press, 1960.
Jürgen Rost. Rasch Models in Latent Classes: An Integration of Two Approaches to Item Analysis. Applied Psychological Measurement, 14(3):271-282, 1990.
Carolin Strobl. Das Rasch-Modell: Eine verständliche Einführung für Studium und Praxis. Rainer Hampp Verlag, 2010.