Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood

Jonathan Gruhl

March 18, 2010

1 Introduction

Researchers commonly apply item response theory (IRT) models to binary and ordinal items in order to measure an underlying trait using a collection of observed responses to test questions or items. For binary items, a common IRT model is the two-parameter logistic (2PL) form. Assume that we have observed p item responses for n individuals, where a correct response is designated with a 1 and an incorrect response with a 0. Let the matrix Y = {Y_ij : i = 1,...,n; j = 1,...,p} denote the set of binary item responses. The 2PL IRT form for the probability of a correct response on item j by participant i, denoted by Y_ij = 1, is the following:

    P(Y_ij = 1 | θ_i, a_j, b_j) = 1 / (1 + exp(-a_j(θ_i - b_j))).    (1)

Here a_j and b_j are item-specific parameters while θ_i is an individual-specific parameter. The parameter b in this framework is a measure of the item's difficulty, and θ represents the individual's level of a latent trait. The parameters θ and b are on the same scale; when θ > b, the individual has a greater probability of answering the item correctly than answering it incorrectly. The item parameter a measures the degree to which an item differentiates among individuals at different levels of the latent trait; as a result, it is referred to as the discrimination parameter. The probability of a correct response is assumed to increase with θ in IRT models, and a is restricted to be positive. Finally, the responses of an individual are assumed to be independent given their ability θ. The above interpretations make the linear form a_j(θ_i - b_j) a useful one.

When working with ordinal responses of more than two categories, IRT practitioners commonly rely upon the graded response model (GRM) or the generalized partial credit model (GPCM). We discuss the GPCM briefly. The GPCM is essentially an adjacent-categories logit model. If we assume that item j has a total of K_j categories, the probability that individual i is credited with item response k_j is

    P(Y_ij = k_j | θ_i, a_j, b_j) = exp( Σ_{h=1}^{k_j} a_j(θ_i - b_jh) ) / Σ_{l=1}^{K_j} exp( Σ_{h=1}^{l} a_j(θ_i - b_jh) ),    (2)

where b_j is now a vector of location or threshold parameters between categories. It is conventional to set b_j1 = 0, so that there are a total of K_j - 1 free threshold parameters. For instances where the observed response is continuous, or is discrete but with a large number of categories, the manifest observations may be binned in some fashion in order to apply existing IRT models to such data.

We are interested in estimating cognitive functioning in individuals using a set of their observed responses to cognitive testing items. As a result, we are primarily focused on estimation of the latent trait. These sets of outcomes for each individual may consist of binary, categorical (ordinal), count, right-censored count and continuous outcomes. Thus we require a more general approach than the above.

One common approach to latent variable models for more diverse types of outcomes, or mixed outcomes, includes latent variables in generalized linear models. Sammel, Ryan and Legler [4] developed a generalized linear model formulation that included a latent variable among the covariates, with the latent variable itself also a function of covariates; the application was to testing results similar to ours. Although the model was formulated to handle outcomes from exponential families in general, the methods were demonstrated for binary and continuous data only. Moustaki and Knott [3] focus on the measurement model, generalizing the results of Sammel and Ryan to include multiple latent variables and to handle polytomous, Poisson and gamma distributed outcomes as well as binary and normal outcomes.
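As a concrete illustration of equations (1) and (2), the sketch below computes 2PL and GPCM response probabilities. The function names and parameter values are illustrative, not part of the original model code:

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a correct response, equation (1)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def p_gpcm(theta, a, b):
    """GPCM category probabilities, equation (2).

    b is the vector of thresholds (b_j1, ..., b_jKj) with b[0] = 0
    by convention; returns a length-K_j probability vector.
    """
    b = np.asarray(b, dtype=float)
    # cumulative sums of a*(theta - b_h) for h = 1..k, as in the numerator
    scores = np.cumsum(a * (theta - b))
    scores -= scores.max()          # subtract max before exp() for stability
    probs = np.exp(scores)
    return probs / probs.sum()
```

Subtracting the maximum cumulative score before exponentiating leaves the probabilities unchanged but avoids numerical overflow for large |a(θ - b)|.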
While the above models relied on maximum likelihood for inference and employed the EM algorithm for estimation, Dunson [1] cast the problem in a Bayesian framework and extended the models to allow for repeated measurements, serial correlation in the latent variables and individual-specific response behavior. More recently, we have applied these generalized latent trait models in an IRT setting and, motivated by items on tests of cognitive functioning, we have extended these methods to allow for right-censored count outcomes as well as time-to-completion outcomes. We will refer to this model as the mixed outcomes IRT model.

Of course, we could continue to extend our model to accommodate many more types of outcomes that we may encounter. However, our interest is ultimately in obtaining a good estimate of the latent ability, θ. Specification
of a diverse set of distributions F_1,...,F_p to model the p mixed outcomes is merely a means to obtaining a good estimate of θ. Ideally, however, we could estimate θ without the need to specify the distributions F_1,...,F_p. Hoff [2] developed the extended rank likelihood method to estimate the parameters of a Gaussian copula for arbitrary marginal distributions, without the need for any assumptions regarding those marginals. We seek to use the extended rank likelihood method in an IRT framework.

Section 2 discusses the extended rank likelihood method and how we may use it to estimate latent traits. Section 3 discusses the Bayesian methods by which we are able to obtain estimates from this model. In Section 4, we demonstrate these methods as well as the mixed outcomes IRT model on simulated data. We conclude with discussion.

2 Methods

We start by summarizing the methods of Hoff and then proceed to discuss how we may adapt them to estimate latent abilities given a number of observed responses to items with unknown distributions. Hoff developed the extended rank likelihood method to estimate dependence among multiple outcomes using a Gaussian copula without relying on assumptions regarding the marginal distributions.

As above, let i = 1,...,n index participants and j = 1,...,p index items. Then y_ij represents the observed response of participant i to item j, with marginal distribution F_j. Thus we may also represent y_ij = F_j^{-1}(u_ij), where u_ij is a uniform(0,1) random variable. Similarly, we might formulate y_ij = F_j^{-1}[Φ(z_ij)], where Φ denotes the standard normal CDF and z_ij is normally distributed. If we assume that the correlation of z_ij with z_ij' for all 1 ≤ j, j' ≤ p is specified by the p × p correlation matrix C, then we have the following Gaussian copula sampling model as presented by Hoff:

    z_1,...,z_n | C ~ i.i.d. N(0, C),    (3)

where z_i is the p-length vector of the z_ij for participant i.
    y_ij = F_j^{-1}[Φ(z_ij)].    (4)

The ultimate goal here is to estimate C rather than F_1,...,F_p. Knowledge of the z_ij's would allow us to estimate C; however, these are unobserved. Nonetheless, we do have some information about the z_ij's through the y_ij's: we know that y_kj < y_ij implies z_kj < z_ij. If we let Z = (z_1,...,z_n)^T and Y = (y_1,...,y_n)^T, then Z ∈ D(Y), where

    D(Y) = {Z ∈ R^{n×p} : max{z_kj : y_kj < y_ij} < z_ij < min{z_kj : y_ij < y_kj}}.    (5)
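As a sketch, the constraint set D(Y) in equation (5) reduces to simple per-entry truncation bounds computed from one column of responses. The helper below is hypothetical; as in the extended rank likelihood, ties in y impose no constraint on each other:

```python
import numpy as np

def rank_bounds(y_col, z_col, i):
    """Lower and upper truncation bounds for z_ij implied by D(Y)
    (equation (5)), for entry i of column j."""
    below = z_col[y_col < y_col[i]]   # rows ranked strictly below y_ij
    above = z_col[y_col > y_col[i]]   # rows ranked strictly above y_ij
    z_l = below.max() if below.size else -np.inf
    z_u = above.min() if above.size else np.inf
    return z_l, z_u
```

Any Z consistent with the ranks of Y satisfies z_l < z_ij < z_u entrywise under these bounds.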
Using this construction, one can construct a likelihood for C relying solely on Z, based on the following probability:

    Pr(Z ∈ D(Y) | C, F_1,...,F_p) = ∫_{D(Y)} p(Z | C) dZ = Pr(Z ∈ D(Y) | C),    (6)

which does not depend on F_1,...,F_p. This enables the following decomposition of the density of Y:

    p(Y | C, F_1,...,F_p) = p(Y, Z ∈ D(Y) | C, F_1,...,F_p)    (7)
                          = Pr(Z ∈ D(Y) | C, F_1,...,F_p) × p(Y | Z ∈ D(Y), C, F_1,...,F_p).    (8)

This decomposition uses the fact that the event Z ∈ D(Y) is conditionally independent of the marginal distributions, as shown above, and that this event occurs whenever Y is observed. Thus one is able to estimate the dependence structure of Y through C without any knowledge of, or assumptions about, the marginal distributions.

In the context of item response theory modeling, we are not interested in explicitly estimating C. However, we are interested in characterizing the interdependencies in multivariate observed responses through a latent variable model. This difference is represented graphically in Figures 1 and 2. In Figure 1, the latent z_ij's are directly correlated, parameterized through C in the Gaussian copula model. In Figure 2, the relation among the latent z_ij's is mediated by a single latent trait θ_i.

[Figure 1: Extended Rank Likelihood Applied To Gaussian Copula]

Using an IRT-type model, the data generating model now takes the form

    θ_i ~ N(0, 1),    (9)
    z_i | a, θ_i ~ N(aθ_i, I),    (10)
    z_1,...,z_n | a ~ i.i.d. N(0, I + aa^T),    (11)
    y_ij = g_j(z_ij).    (12)
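A small simulation sketch of the data-generating model in equations (9)-(12). The number of items, the discrimination values a, and the two monotone link functions g_j (a thresholding link for a binary item, exp() for a positively skewed continuous item) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 600, 5                       # n matches the simulation study; p is illustrative
a = rng.uniform(0.5, 2.0, size=p)   # hypothetical positive discriminations

theta = rng.normal(0.0, 1.0, size=n)                        # equation (9)
Z = theta[:, None] * a[None, :] + rng.normal(size=(n, p))   # equation (10)

# Equation (12): y_ij = g_j(z_ij) for nondecreasing g_j.
y_binary = (Z[:, 0] > 0).astype(int)   # a binary item via thresholding
y_skewed = np.exp(Z[:, 1])             # a positively skewed continuous item

# Marginally over theta, z_i ~ N(0, I + a a^T) as in equation (11).
C = np.eye(p) + np.outer(a, a)
```

Because g_j only needs to be nondecreasing, the ranks of each y column coincide with the ranks of the corresponding z column, which is exactly the information the extended rank likelihood retains.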
[Figure 2: Extended Rank Likelihood Applied To IRT Model]

Here the dependence among the z_i's is restricted to the form I + aa^T, as opposed to the more general C in the Gaussian copula model. This restriction is based on our assumption that an IRT-type model is appropriate for the data being analyzed. Our likelihood for the IRT model is hence based on Pr(Z ∈ D(Y) | a, θ), with a and θ our parameters of interest.

Finally, notice that we have not included the difficulty parameters b in the above formulation. These location parameters are not thought to be identifiable, because we could shift all z_ij's by an arbitrary amount (as long as the amount is the same for each j) and still have a set of values entirely consistent with the rankings of the y_ij's. As we are primarily interested in the estimation of the latent trait, this loss is not critical.

3 Estimation

To estimate a and θ, we rely on Bayesian methods. Specifically, we rely upon Metropolis-Hastings (MH) and Gibbs sampling to obtain draws from the posterior distribution of a and θ. Because each a_j is restricted to be positive, we specify a Lognormal(0, σ_a^2) prior for each. Thus, our complete model is

    a_j ~ Lognormal(0, σ_a^2),
    θ_i ~ N(0, 1),
    z_i | a, θ_i ~ N(aθ_i, I),
    y_ij = g_j(z_ij).

To sample from p(a, θ | Z ∈ D(Y)), we proceed by iterating through the following steps, as in Hoff and in Scott [5].
1. Draw the unobserved Z. For each i and j, sample z_ij from p(z_ij | a, θ, Z_(-i)(-j), Z ∈ D(Y)). More specifically, for each j and, within that, for each y ∈ unique{y_1j,...,y_nj},

    z_ij ~ TN_(z_l, z_u)(a_j θ_i, 1),    (13)

where TN denotes the truncated normal distribution and z_l, z_u define the lower and upper truncation points,

    z_l = max{z_kj : y_kj < y},    (14)
    z_u = min{z_kj : y_kj > y}.    (15)

2. Draw the latent abilities θ. For each i, we can sample directly from the conditional distribution for θ_i as follows:

    θ_i ~ N( a^T z_i / (a^T a + 1), 1 / (a^T a + 1) ).    (16)

3. Draw the item parameters a. To sample from p(a | θ, Z ∈ D(Y), σ_a^2), we rely on MH sampling. Proposals for a are generated using the lognormal distribution. We have chosen to sample each a_j individually rather than jointly.

The implementation in R is presented in the Appendix.

4 Simulations

We now demonstrate the above methods on simulated data. We generate 20 item responses for 600 individuals using the mixed outcomes IRT model discussed in the introduction. The types and generating distributions of the 20 items are listed in Table 1.

Table 1: Types and associated distributions for the 20 simulated items.

    Item Type               Distribution                               # of Items
    Binary                  Bernoulli                                  7
    Count                   Poisson                                    3
    Right-Censored Count    Right-Censored Poisson                     3
    Positive Skewed         Lognormal                                  3
    Ordinal Categorical     Multinomial (Adjacent Categories/GPCM)     4
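The full R implementation appears in the Appendix. As a rough Python sketch, one sweep of steps 1 and 2 of the sampler in Section 3 (the rank-constrained z draws of equations (13)-(15) and the conditional θ draws of equation (16)) might look like the following; the MH update for a is omitted for brevity and the function name is hypothetical:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

def gibbs_sweep(Y, Z, a, theta):
    """One sweep of steps 1 and 2: redraw each z_ij subject to the rank
    constraints of D(Y), then redraw each theta_i from its conditional
    posterior. The MH step for a is omitted."""
    n, p = Y.shape
    for j in range(p):
        for i in range(n):
            below = Z[Y[:, j] < Y[i, j], j]
            above = Z[Y[:, j] > Y[i, j], j]
            z_l = below.max() if below.size else -np.inf
            z_u = above.min() if above.size else np.inf
            mu = a[j] * theta[i]
            # truncated normal draw on (z_l, z_u) via the inverse CDF
            u = rng.uniform(norm.cdf(z_l - mu), norm.cdf(z_u - mu))
            Z[i, j] = mu + norm.ppf(u)
    # equation (16): theta_i | a, z_i ~ N(a'z_i / (a'a + 1), 1 / (a'a + 1))
    prec = a @ a + 1.0
    theta = (Z @ a) / prec + rng.normal(size=n) / np.sqrt(prec)
    return Z, theta
```

Updating the z_ij sequentially preserves the invariant Z ∈ D(Y), since each draw respects the bounds implied by the current values of the other entries.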
In Figure 3, we present histograms of the simulated data for two of the items. Figure 3a displays simulated responses for item 12, data that were generated using a right-censored Poisson model; one can see how the right-censored count outcomes exhibit a small ceiling effect. Figure 3b displays simulated responses for item 15, produced using a lognormal distribution. We see the majority of responses clustered at the low end, with a few responses dispersed among higher values. We truncated the vertical axis of the plot to provide a more granular view of the low-frequency intervals; the first interval includes over 500 responses.

[Figure 3: Histograms Of Item Responses. (a) Item 12. (b) Item 15. Note that the height of the first bar in (b) has been truncated to provide a more granular view of the other intervals; it extends to over 500.]

Having simulated this data, we now examine how well the latent ability parameter can be recovered by different models. We consider a few different metrics, including RMSE, average width of the 95% credible interval, and coverage properties of the 95% credible interval (recognizing that coverage is a frequentist concept). However, because the location and scale of the latent variable are arbitrary and influenced by the choice of prior, it may be more meaningful to consider whether the ranks of the estimated latent traits are consistent with those of the true, data-generating values. Hence we also calculate Spearman's ρ between the estimated and true values, as well as the mean absolute difference in ranks.

We consider three different methods of estimating the parameters. First, we use the data-generating IRT model for estimation and naturally expect this to perform the best. Next, we use the extended rank likelihood methods discussed above to estimate the ability parameters. While we do not expect this general
approach to perform as well as the data-generating model, we will be curious how close the performance is. Finally, we will use the data-generating model again, but a misspecified version in which we model the lognormal outcomes as normally distributed. Comparison with the misspecified model will give us some idea of the robustness of the mixed outcomes IRT model and of the flexibility of the extended rank likelihood approach.

We used 25,000 iterations for each Bayesian sampling scheme, discarding the first half as burn-in. Tuning parameters were selected for all MH steps so that acceptance rates generally fell within an interval of 25% to 50%. Trace plots did not appear to indicate any failure to converge.

Table 2 presents the estimation metrics for θ by method. By every metric, the data-generating mixed outcomes IRT model is, not surprisingly, superior. However, the extended rank likelihood method is quite competitive on all metrics, particularly the rank-related metrics. The misspecified mixed outcomes model does not fare well in terms of RMSE and coverage, though it performs better on the rank-related metrics. The reason for this is that the simulated data for item 14 contain some very large values generated by the lognormal distribution, skewing the outcomes heavily. When this item is modeled with a normal distribution, as in the misspecified model, the results suffer. The large simulated values for item 14 are perhaps not overly realistic, but they nonetheless provide some notion of the potential cost of misspecification.

Table 2: Latent trait estimation metrics by estimation method.

    Metric                  Data Generating    Extended Rank    Misspecified
    RMSE                    0.22               0.25             0.69
    95% CI Coverage         0.95               0.94             0.72
    Mean 95% CI Width       0.87               0.92             0.82
    Spearman's ρ            0.98               0.97             0.96
    Mean Abs. Rank Diff.    28.47              29.25            36.64

5 Discussion

We have applied the extended rank likelihood method of Hoff to estimate latent abilities as in an item response theory model, but without the need to specify distributions for the observed item responses. In one simulation at least, the results produced by this method were quite favorable. The next step is to apply the extended rank likelihood IRT model to data from the subcortical ischemic
vascular dementia (SIVD) study. A primary goal of the SIVD study was to investigate the contribution of subcortical cerebrovascular disease to declines in cognitive functioning. Ultimately, we would like to relate the estimated latent abilities of participants in the SIVD study to MRI-measured volumes of different brain matter. This extension should be straightforward, as we can build upon the Bayesian model above in hierarchical fashion by specifying

    θ_i ~ N(x_i^T β, 1),    (17)
    β ~ N(0, σ_β^2 I),    (18)

where x_i is a vector of covariates for participant i.

A number of areas require further examination. I need to better understand the identifiability of this model. Also, the above model assumes a unidimensional latent trait; we would like to be able to test and/or relax this assumption. Finally, the SIVD study is longitudinal, and we would like to develop the methods here for longitudinal data.

References

[1] D.B. Dunson. Dynamic latent trait models for multidimensional longitudinal data. Journal of the American Statistical Association, 98(463):555-564, 2003.

[2] P.D. Hoff. Extending the rank likelihood for semiparametric copula estimation. Annals of Applied Statistics, 1(1):265-283, 2007.

[3] I. Moustaki and M. Knott. Generalized latent trait models. Psychometrika, 65(3):391-411, 2000.

[4] M.D. Sammel, L.M. Ryan, and J.M. Legler. Latent variable models for mixed discrete and continuous outcomes. Journal of the Royal Statistical Society, Series B (Methodological), pages 667-678, 1997.

[5] J.G. Scott. Nonparametric benchmarking with the extended rank likelihood. 2009.