Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood


Jonathan Gruhl
March 18, 2010

1 Introduction

Researchers commonly apply item response theory (IRT) models to binary and ordinal items in order to measure an underlying trait using a collection of observed responses to test questions or items. For binary items, a common IRT model is the two-parameter logistic (2PL) form. Assume that we have observed p item responses for n individuals, where a correct response is designated with a 1 and an incorrect response with a 0. Let the matrix Y = {Y_ij : i = 1, ..., n; j = 1, ..., p} denote the set of binary item responses. The 2PL IRT form for the probability of a correct response on item j by participant i, denoted Y_ij = 1, is

    P(Y_ij = 1 | θ_i, a_j, b_j) = 1 / (1 + exp(−a_j(θ_i − b_j))).    (1)

Here a_j and b_j are item-specific parameters, while θ_i is an individual-specific parameter. The parameter b is a measure of the item's difficulty, and θ represents the individual's level of a latent trait. The parameters θ and b are on the same scale: when θ > b, the individual has a greater probability of answering the item correctly than of answering it incorrectly. The item parameter a measures the degree to which an item differentiates among individuals at different levels of the latent trait; as a result, it is referred to as the discrimination parameter. The probability of a correct response is assumed to increase with θ in IRT models, so a is restricted to be positive. Finally, the responses of an individual are assumed to be independent given their ability θ. The above interpretations make the linear form a_j(θ_i − b_j) a useful one. When working with ordinal responses of more than two categories, IRT practitioners commonly rely upon the graded response model (GRM) or the generalized partial credit model (GPCM). We discuss the GPCM briefly. The GPCM

is essentially an adjacent-categories logit model. If we assume that item j has a total of K_j categories, the probability that individual i is credited with item response k_j is

    P(Y_ij = k_j | θ_i, a_j, b_j) = exp( Σ_{h=1}^{k_j} a_j(θ_i − b_jh) ) / [ Σ_{l=1}^{K_j} exp( Σ_{h=1}^{l} a_j(θ_i − b_jh) ) ],    (2)

where b_j is now a vector of location or threshold parameters between categories. It is conventional to set b_j1 = 0 so that there are a total of K_j − 1 free parameters. For instances where the observed response is continuous, or is discrete but with a large number of categories, the manifest observations may be binned in some fashion in order to apply existing IRT models to such data. We are interested in estimating cognitive functioning in individuals using a set of their observed responses to cognitive testing items. As a result, we are primarily focused on estimation of the latent trait. The set of outcomes for each individual may consist of binary, categorical (ordinal), count, right-censored count and continuous outcomes. Thus we require a more general approach than the above. One common approach to latent variable models for more diverse types of outcomes, or mixed outcomes, includes latent variables in generalized linear models. Sammel, Ryan and Legler [4] developed a generalized linear model formulation that included a latent variable among the covariates, with the latent variable itself a function of covariates. The application was to testing results similar to ours. Although the model was formulated to handle outcomes from exponential families in general, the methods were demonstrated for binary and continuous data only. Moustaki and Knott [3] focus on the measurement model, generalizing the results of Sammel, Ryan and Legler to include multiple latent variables and to handle polytomous, Poisson and gamma distributed outcomes as well as binary and normal outcomes.
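To make the two response models above concrete, here is a minimal numerical sketch. The paper's own implementation is in R (see the Appendix); this is an illustrative Python translation, and the function names are ours. The b vector passed to the GPCM function plays the role of (b_j1, ..., b_jK) with b_j1 = 0 by the convention stated above.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a correct response, eq. (1):
    1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def p_gpcm(theta, a, b):
    """GPCM category probabilities for one item, eq. (2).

    b is the length-K vector of thresholds (b[0] fixed at 0 by convention);
    returns a length-K vector of probabilities over categories 1..K.
    """
    # k-th logit is the cumulative sum of a * (theta - b_h) over h = 1..k
    logits = np.cumsum(a * (theta - b))
    probs = np.exp(logits - logits.max())   # subtract max for numerical stability
    return probs / probs.sum()
```

As a sanity check, the 2PL probability is 0.5 exactly when θ = b, increases with θ, and the GPCM probabilities sum to one across categories.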
While the above models relied on maximum likelihood for inference and employed the EM algorithm for estimation, Dunson [1] cast the problem in a Bayesian framework and extended the models to allow for repeated measurements, serial correlation in the latent variables and individual-specific response behavior. More recently, we have applied these generalized latent trait models in an IRT setting and, motivated by items on tests of cognitive functioning, we have extended these methods to allow for right-censored count outcomes as well as time-to-completion outcomes. We will refer to this model as the mixed outcomes IRT model. Of course, we could continue to extend our model to accommodate many more types of outcomes that we may encounter. However, our interest is ultimately in obtaining a good estimate of the latent ability, θ. Specification

of a diverse set of distributions F_1, ..., F_p to model the p mixed outcomes is merely a means to obtaining a good estimate of θ. Ideally, however, we could estimate θ without the need to specify the distributions F_1, ..., F_p. Hoff [2] developed the extended rank likelihood method to estimate the parameters of a Gaussian copula under arbitrary marginal distributions, without the need for any assumptions regarding those margins. We seek to use the extended rank likelihood method in an IRT framework. Section 2 discusses the extended rank likelihood method and how we may use it to estimate latent traits. Section 3 discusses the Bayesian methods by which we obtain estimates from this model. In Section 4, we demonstrate these methods as well as the mixed outcomes IRT model on simulated data. We conclude with discussion.

2 Methods

We start by summarizing the methods of Hoff and then proceed to discuss how we may adapt them to estimate latent abilities given a number of observed responses to items with unknown distributions. Hoff developed the extended rank likelihood method to estimate dependence among multiple outcomes using a Gaussian copula without relying on assumptions regarding the marginal distributions. As above, let i = 1, ..., n index participants and j = 1, ..., p index items. Then y_ij represents the observed response of participant i to item j, with marginal distribution F_j. Thus we may represent y_ij = F_j^{-1}(u_ij), where u_ij is a uniform(0,1) random variable. Similarly, we may write y_ij = F_j^{-1}[Φ(z_ij)], where Φ denotes the standard normal CDF and z_ij is normally distributed. If we assume that the correlation of z_ij with z_ij′ for all 1 ≤ j, j′ ≤ p is specified by the p × p correlation matrix C, then we have the following Gaussian copula sampling model as presented by Hoff,

    z_1, ..., z_n | C ~ i.i.d. N(0, C),    (3)

where z_i is the p-vector of z_ij for participant i.
    y_ij = F_j^{-1}[Φ(z_ij)].    (4)

The ultimate goal here is to estimate C rather than F_1, ..., F_p. Knowledge of the z_ij's would allow us to estimate C; however, these are unobserved. Nonetheless, we do have some information about the z_ij's through the y_ij's: within an item j, y_kj < y_ij implies z_kj < z_ij. If we let Z = (z_1, ..., z_n)^T and Y = (y_1, ..., y_n)^T, then Z ∈ D(Y), where

    D(Y) = {Z ∈ R^{n×p} : max{z_kj : y_kj < y_ij} < z_ij < min{z_kj : y_ij < y_kj}}.    (5)
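The set D(Y) in equation (5) is what the sampler later works with in practice: for each entry z_ij, the ranks within column j of Y pin z_ij between the largest latent value associated with a smaller observation and the smallest latent value associated with a larger one. A minimal Python sketch of these bounds follows; the helper name is ours, and the paper's own code is in R.

```python
import numpy as np

def rank_bounds(Z, Y, i, j):
    """Interval (z_l, z_u) for z_ij implied by the ranks of column j of Y:
    z_l = max{z_kj : y_kj < y_ij},  z_u = min{z_kj : y_ij < y_kj},
    with -inf / +inf when no such k exists. Ties impose no ordering."""
    col_z, col_y = Z[:, j], Y[:, j]
    below = col_z[col_y < col_y[i]]
    above = col_z[col_y > col_y[i]]
    z_l = below.max() if below.size else -np.inf
    z_u = above.min() if above.size else np.inf
    return z_l, z_u
```

Note that tied observations constrain each other's latent values only through their common neighbors, which is what lets the method handle discrete outcomes.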

Using this construction, one can construct a likelihood for C that relies solely on Z, based on the probability

    Pr(Z ∈ D(Y) | C, F_1, ..., F_p) = ∫_{D(Y)} p(Z | C) dZ = Pr(Z ∈ D(Y) | C),    (6)

which does not depend on F_1, ..., F_p. This enables the following decomposition of the density of Y,

    p(Y | C, F_1, ..., F_p) = p(Y, Z ∈ D(Y) | C, F_1, ..., F_p)    (7)
                            = Pr(Z ∈ D(Y) | C) p(Y | Z ∈ D(Y), C, F_1, ..., F_p).    (8)

This decomposition uses the fact that the event Z ∈ D(Y) is conditionally independent of the marginal distributions, as shown above, and that the event Z ∈ D(Y) occurs whenever Y is observed. Thus one is able to estimate the dependence structure of Y through C without any knowledge of, or assumptions about, the marginal distributions. In the context of item response theory modeling, we are not interested in explicitly estimating C. Rather, we are interested in characterizing the interdependencies in multivariate observed responses through a latent variable model. This difference is represented graphically in Figures 1 and 2. In Figure 1, the latent z_ij's are directly correlated, and this is parameterized through C in the Gaussian copula model. In Figure 2, the relations among the latent z_ij's run through a single latent trait θ_i.

[Figure 1: Extended Rank Likelihood Applied To Gaussian Copula. The latent z_i1, ..., z_i5 are mutually correlated, and each z_ij generates the observed y_ij.]

Using an IRT-type model, the data-generating model now takes the form

    θ_i ~ N(0, 1)    (9)
    z_i | a, θ_i ~ N(a θ_i, I)    (10)
    z_1, ..., z_n | a ~ i.i.d. N(0, I + a a^T)    (11)
    y_ij = g_j(z_ij).    (12)
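A quick simulation makes the restriction in equation (11) tangible: generating θ and Z as in equations (9)-(10) and checking that the sample covariance of Z approaches I + a a^T. This is an illustrative Python sketch under assumed values of n, p and a (the paper's code is in R):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 4000, 5
a = np.array([0.8, 1.0, 1.2, 0.9, 1.1])        # positive discriminations (assumed values)

theta = rng.normal(size=n)                      # theta_i ~ N(0, 1), eq. (9)
Z = np.outer(theta, a) + rng.normal(size=(n, p))  # z_i | a, theta_i ~ N(a theta_i, I), eq. (10)

# Marginally, z_1, ..., z_n are i.i.d. N(0, I + a a^T) as in eq. (11),
# so the sample covariance of Z should be close to that matrix.
C_implied = np.eye(p) + np.outer(a, a)
C_sample = np.cov(Z, rowvar=False)

# Any monotone g_j turns z_ij into an observed response, and the extended rank
# likelihood uses only the resulting ranks. Here, an illustrative binning:
Y = np.digitize(Z, bins=[-1.0, 0.0, 1.0])       # y_ij = g_j(z_ij), eq. (12)
```

The point of the check is that marginalizing θ out of equations (9)-(10) yields exactly the restricted copula correlation structure of equation (11).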

[Figure 2: Extended Rank Likelihood Applied To IRT Model. A single latent trait θ_i generates z_i1, ..., z_i5, and each z_ij generates the observed y_ij.]

Here the dependence among the z_i's is restricted to I + a a^T, as opposed to the more general C in the Gaussian copula model. This restriction is based on our assumption that an IRT-type model is appropriate for the data being analyzed. Our likelihood for the IRT model is hence based on Pr(Z ∈ D(Y) | a, θ), with a and θ our parameters of interest. Finally, notice that we have not included the difficulty parameters b in the above formulation. These location parameters are not thought to be identifiable, because we could shift all z_ij's by an arbitrary amount (as long as the amount is the same for each j) and still have a set of values entirely consistent with the rankings of the y_ij's. As we are primarily interested in the estimation of the latent trait, this loss is not critical.

3 Estimation

To estimate a and θ, we rely on Bayesian methods. Specifically, we rely upon Metropolis-Hastings (MH) and Gibbs sampling to obtain draws from the posterior distribution of a and θ. Because each a_j is restricted to be positive, we specify a Lognormal(0, σ_a^2) prior for each. Thus, our complete model is

    a_j ~ Lognormal(0, σ_a^2)
    θ_i ~ N(0, 1)
    z_i | a, θ_i ~ N(a θ_i, I)
    y_ij = g_j(z_ij).

To sample from p(a, θ | Z ∈ D(Y)), we iterate through the following steps, as in Hoff [2] and Scott [5].

1. Draw unobserved Z. For each i and j, sample z_ij from p(z_ij | a, θ, Z_(-i)(-j), Z ∈ D(Y)). More specifically, for each j, and within that for each unique value y of {y_1j, ..., y_nj},

    z_ij ~ TN_(z_l, z_u)(a_j θ_i, 1),    (13)

where TN denotes the truncated normal distribution and z_l, z_u define the lower and upper truncation points,

    z_l = max{z_kj : y_kj < y}    (14)
    z_u = min{z_kj : y_kj > y}.    (15)

2. Draw latent abilities θ. For each i, we can sample directly from the conditional distribution of θ_i,

    θ_i | z_i, a ~ N( a^T z_i / (a^T a + 1), 1 / (a^T a + 1) ).    (16)

3. Draw item parameters a. To sample from p(a | θ, Z ∈ D(Y), σ_a^2), we rely on MH sampling. Proposals for a are generated using the lognormal distribution. We have chosen to sample each a_j individually rather than jointly.

The implementation in R is presented in the Appendix.

4 Simulations

We now demonstrate the above methods on simulated data. We generate 20 item responses for 600 individuals using the mixed outcomes IRT model discussed in the introduction. The 20 item responses were of the types and generating distributions listed in Table 1.

Table 1: Types and associated distributions for the 20 simulated items.

    Item Type             Distribution                             # of Items
    Binary                Bernoulli                                7
    Count                 Poisson                                  3
    Right-Censored Count  Right-Censored Poisson                   3
    Positive Skewed       Lognormal                                3
    Ordinal Categorical   Multinomial (Adjacent Categories/GPCM)   4
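Steps 1 and 2 of the sampler above can be sketched directly; this is an illustrative Python version (the paper's implementation is in R), using an inverse-CDF draw for the truncated normal of equation (13) and the closed-form conditional of equation (16). The MH update for a (step 3) is omitted, and the function names and rng setup are ours.

```python
import numpy as np
from statistics import NormalDist

_std = NormalDist()  # standard normal, provides cdf and inverse cdf

def draw_z(mean, z_l, z_u, rng):
    """Step 1: one draw from N(mean, 1) truncated to (z_l, z_u), eq. (13),
    via the inverse-CDF method."""
    u_lo = _std.cdf(z_l - mean) if np.isfinite(z_l) else 0.0
    u_hi = _std.cdf(z_u - mean) if np.isfinite(z_u) else 1.0
    # clamp away from 0 and 1 so inv_cdf stays defined
    u = min(max(rng.uniform(u_lo, u_hi), 1e-12), 1.0 - 1e-12)
    return mean + _std.inv_cdf(u)

def draw_theta(Z, a, rng):
    """Step 2: theta_i | z_i, a ~ N(a'z_i / (a'a + 1), 1 / (a'a + 1)), eq. (16),
    vectorized over the n rows of Z."""
    prec = a @ a + 1.0
    mean = Z @ a / prec
    return mean + rng.normal(size=Z.shape[0]) / np.sqrt(prec)
```

In a full sampler these two updates would alternate with the MH step for a, with the truncation points of each z_ij recomputed from the current Z and the observed ranks as in equations (14)-(15).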

In Figure 3, we present histograms of the simulated data for two of the items. Figure 3a displays simulated responses for item 12, data that was generated using a right-censored Poisson model; one can see how the right-censored count outcomes exhibit a small ceiling effect. Figure 3b displays simulated responses for item 15, data that was produced using a lognormal distribution. The majority of responses cluster at the low end, with a few responses dispersed among higher values. We truncated the vertical axis of the plot to provide a more granular view of the low-frequency intervals; the first interval includes over 500 responses.

[Figure 3: Histograms Of Item Responses. (a) Item 12. (b) Item 15, where the height of the first bar has been truncated to provide a more granular view of the other intervals; it extends to over 500.]

Having simulated this data, we now examine how well the latent ability parameter can be recovered by different models. We consider several metrics, including RMSE, the average width of the 95% credible interval and the coverage of the 95% credible interval (recognizing that coverage is a frequentist concept). However, because the location and scale of the latent variable are arbitrary and influenced by the choice of prior, it may be more meaningful to consider whether the ranks of the estimated latent traits are consistent with those of the true, data-generating values. Hence we also calculate Spearman's ρ between the estimated and true values, as well as the mean absolute difference in ranks. We consider three different methods of estimating the parameters. First, we use the data-generating IRT model for estimation and naturally expect this to perform best. Next, we use the extended rank likelihood methods discussed above to estimate the ability parameters. While we do not expect this general

approach to perform as well as the data-generating model, we will be curious how close its performance comes. Finally, we will use the data-generating model again, but in a misspecified version where we choose to model the lognormal outcomes as normally distributed. Comparison with the misspecified model will give us some idea of the robustness of the mixed outcomes IRT model and the flexibility of the extended rank likelihood approach. We used 25,000 iterations for each Bayesian sampling scheme, discarding the first half as burn-in. Tuning parameters were selected for all MH steps so that acceptance rates generally fell within an interval of 25% to 50%. Trace plots did not appear to indicate any failure to converge. Table 2 presents the estimation metrics for θ by method. By every metric, the data-generating mixed outcomes IRT model is, not surprisingly, superior. However, the extended rank likelihood method is quite competitive on all metrics, particularly the rank-related ones. The misspecified mixed outcomes model does not fare well in terms of RMSE and coverage, though it performs better on the rank-related metrics. The reason for this is that the simulated data for item 14 contains some very large values generated by the lognormal distribution, skewing the outcomes heavily. When this is modeled with a normal distribution, as in the misspecified model, the results suffer. The large simulated values for item 14 are perhaps not overly realistic but nonetheless provide some notion of the potential cost of misspecification.

Table 2: Latent trait estimation metrics by estimation method.

    Metric                 Data Generating   Extended Rank   Misspecified
    RMSE                   0.22              0.25            0.69
    95% CI Coverage        0.95              0.94            0.72
    Mean 95% CI Width      0.87              0.92            0.82
    Spearman's ρ           0.98              0.97            0.96
    Mean Abs. Rank Diff.   28.47             29.25           36.64

5 Discussion

We have applied the extended rank likelihood method of Hoff to estimate latent abilities as in an item response theory model, but without the need to specify distributions for the observed item responses. In one simulation at least, the results produced by this method were quite favorable. The next step is to apply the extended rank likelihood IRT model to data from the subcortical ischemic

vascular dementia (SIVD) study. A primary goal of the SIVD study was to investigate the contribution of subcortical cerebrovascular disease to declines in cognitive functioning. Ultimately, we would like to relate the estimated latent abilities of participants in the SIVD study to MRI-measured volumes of different brain matter. This extension should be straightforward, as we can build upon the Bayesian model above in hierarchical fashion by specifying

    θ_i ~ N(x_i^T β, 1)    (17)
    β ~ N(0, σ_β^2 I),    (18)

where x_i is a vector of covariates for participant i. A number of areas require further examination. I need to better understand the identifiability of this model. Also, the above model assumes a unidimensional latent trait; we would like to be able to test and/or relax this assumption. Finally, the SIVD study is longitudinal, and we would like to develop the methods here for longitudinal data.

References

[1] D.B. Dunson. Dynamic latent trait models for multidimensional longitudinal data. Journal of the American Statistical Association, 98(463):555-564, 2003.

[2] P.D. Hoff. Extending the rank likelihood for semiparametric copula estimation. Annals of Applied Statistics, 1(1):265-283, 2007.

[3] I. Moustaki and M. Knott. Generalized latent trait models. Psychometrika, 65(3):391-411, 2000.

[4] M.D. Sammel, L.M. Ryan, and J.M. Legler. Latent variable models for mixed discrete and continuous outcomes. Journal of the Royal Statistical Society, Series B (Methodological), pages 667-678, 1997.

[5] J.G. Scott. Nonparametric benchmarking with the extended rank likelihood. 2009.