Making the Most of What We Have: A Practical Application of Multidimensional Item Response Theory in Test Scoring


Journal of Educational and Behavioral Statistics
Fall 2005, Vol. 30, No. 3

Jimmy de la Torre
Rutgers, The State University of New Jersey

Richard J. Patz
R. J. Patz, Inc.

This article proposes a practical method that capitalizes on the availability of information from multiple tests measuring correlated abilities given in a single test administration. By simultaneously estimating different abilities with the use of a hierarchical Bayesian framework, more precise estimates for each ability dimension are obtained. The efficiency of the proposed method is most pronounced when highly correlated abilities are estimated from multiple short tests. Employing Markov chain Monte Carlo techniques allows for straightforward estimation of model parameters.

Keywords: ability estimation, Bayesian estimation, item response theory, Markov chain Monte Carlo, multidimensionality

It is not unusual for several tests measuring different abilities (i.e., a "battery") to be given in one test administration. Although these tests may tap different latent abilities, the abilities are usually not independent of one another. For example, in achievement tests such as the National Assessment of Educational Progress (NAEP) or the California Achievement Tests, these abilities have high positive correlations, typically greater than 0.70 (CTB/McGraw-Hill, 2002; Johnson & Carlson, 1994). However, a common practice in educational measurement is to estimate these abilities independently of each other. This article proposes a more efficient method of estimating these abilities that takes this correlational structure into account. The method uses a hierarchical Bayesian approach to simultaneous estimation of abilities that is based on a simple-structure multidimensional item response theory (IRT) model.
The approach may be applied in a straightforward manner to improve the scoring of test batteries for certain purposes by using the simple structure reflected in the construction of the test battery and the IRT item parameter estimates employed in traditional unidimensional scoring of the component tests.

This research was started during the first author's summer internship at CTB/McGraw-Hill. The authors thank Howard Wainer and two associate editors for their helpful comments and suggestions and CTB/McGraw-Hill for the data used in this study.

One advantage of IRT is its flexibility in incorporating various auxiliary information into the model. Mislevy (1987) has shown that including educational variables (e.g., grade level) in the estimation process can improve the precision of parameter estimates. In this article we exploit, for the purpose of ability estimation, the auxiliary information available from examinees' responses to other tests. Exploiting hierarchical structure to improve the accuracy of parameter estimation is a well-known statistical technique (see, for example, Purcell & Kish, 1979). In the context of test scoring, Wainer et al. (2001) derived an empirical Bayes approach by using test reliabilities and intertest correlations to improve the accuracy of test scores. We show that our hierarchical modeling approach provides estimates very similar to those of Wainer et al. in the case of simple structure and that our approach may be extended more easily to accommodate complex structure and a richer set of auxiliary information.

The primary focus of this article is to investigate whether and under what conditions the simultaneous estimation of abilities from different dimensions yields more accurate ability estimates. In particular, the article examines how the number of dimensions, the number of items in each dimension, and the degree of correlation between abilities affect the accuracy of the estimates. Because the resulting problem involves maximization in high-dimensional space, traditional methods of estimation that rely on derivatives may not be carried out in a straightforward manner. Hence, Markov chain Monte Carlo (MCMC) simulation is used in estimating the abilities. The dimensionality and complexity of the problem can be substantially reduced by estimating the correlation between abilities separately and fixing the correlation in the scoring process (see Segall, 1996), although this is not always feasible and has drawbacks of its own.
Model

To extend the three-parameter logistic model (Lord, 1980) to the multidimensional context, Reckase (1996) used the following generalization:

$$P(X_{ij} = 1 \mid \boldsymbol{\theta}_i, \boldsymbol{\alpha}_j, \beta_j, \gamma_j) = \gamma_j + (1 - \gamma_j)\,\frac{\exp(\boldsymbol{\alpha}_j'\boldsymbol{\theta}_i + \beta_j)}{1 + \exp(\boldsymbol{\alpha}_j'\boldsymbol{\theta}_i + \beta_j)}, \tag{1}$$

where $P(X_{ij} = 1 \mid \boldsymbol{\theta}_i, \boldsymbol{\alpha}_j, \beta_j, \gamma_j)$ is the probability of examinee $i$ responding to item $j$ correctly; $X_{ij}$ is the response of examinee $i$ to item $j$ (0 = incorrect, 1 = correct); $\boldsymbol{\theta}_i$ is the ability vector of the examinee; $\boldsymbol{\alpha}_j$ is the vector of item parameters related to the discrimination power of the item; $\beta_j$ is the parameter related to the difficulty of the item; $\gamma_j$ is the pseudo-guessing parameter of the item; $i = 1, \ldots, I$ (the total number of examinees); and $j = 1, \ldots, J$ (the total number of items).

For this article, a simple structure is assumed (i.e., each item measures one dimension of ability and thus $\boldsymbol{\alpha}_j$ contains only one nonzero element). Under this assumption, the model in Equation 1 can be reexpressed as:

$$P(X_{ij(d)} = 1 \mid \theta_{i(d)}, \alpha_{j(d)}, \beta_{j(d)}, \gamma_{j(d)}) = \gamma_{j(d)} + (1 - \gamma_{j(d)})\,\frac{\exp(\alpha_{j(d)}\theta_{i(d)} + \beta_{j(d)})}{1 + \exp(\alpha_{j(d)}\theta_{i(d)} + \beta_{j(d)})}, \tag{2}$$

where $X_{ij(d)}$ is the response of examinee $i$ to the $j$th item of dimension $d$; $\theta_{i(d)}$ is the $d$th component of the vector $\boldsymbol{\theta}_i$ (i.e., $\boldsymbol{\theta}_i = \{\theta_{i(d)}\}$); $\alpha_{j(d)}$, $\beta_{j(d)}$, and $\gamma_{j(d)}$ are the parameters of the $j$th item of dimension $d$; $d = 1, \ldots, D$ (the number of dimensions); $j(d) = 1(d), \ldots, J(d)$; and $\sum_{d=1}^{D} J(d) = J$.

A graphical representation of the hierarchical structure of the model is given in Figure 1.

FIGURE 1. A directed acyclic graph of the model. (Nodes: $\text{Inv-Wishart}_{\nu_0}(\Lambda_0^{-1})$, $\mu$, $\Sigma$, $\theta_{i(d)}$, $\alpha_{j(d)}$, $\beta_{j(d)}$, $\gamma_{j(d)}$, $X_{ij(d)}$.)
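The reduction from Equation 1 to Equation 2 can be checked numerically. The following is a minimal sketch, not the authors' code: the function names and all parameter values are hypothetical, chosen only to show that an item whose discrimination vector has a single nonzero element gives the same probability under both forms.

```python
import math

def p_m3pl(theta, alpha, beta, gamma):
    """Equation 1: theta and alpha are equal-length sequences (one entry per dimension)."""
    z = sum(a * t for a, t in zip(alpha, theta)) + beta
    return gamma + (1.0 - gamma) / (1.0 + math.exp(-z))

def p_simple(theta_d, alpha_d, beta, gamma):
    """Equation 2: the item loads on a single dimension d."""
    z = alpha_d * theta_d + beta
    return gamma + (1.0 - gamma) / (1.0 + math.exp(-z))

# Hypothetical values, for illustration only.
theta = [0.5, -1.0, 0.2]      # ability vector with D = 3
alpha = [0.0, 1.2, 0.0]       # simple structure: only dimension 2 is measured
beta, gamma = 0.3, 0.2        # intercept-form difficulty and pseudo-guessing

p1 = p_m3pl(theta, alpha, beta, gamma)
p2 = p_simple(theta[1], alpha[1], beta, gamma)
```

Because the zero entries of `alpha` drop out of the inner product, `p1` and `p2` agree, which is why existing unidimensional item calibrations can be reused unchanged.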

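The item response $X_{ij(d)}$ has a likelihood $P_{ij(d)}(\theta_{i(d)})$ given by

$$P_{ij(d)}(\theta_{i(d)}) = P(X_{ij(d)} = 1 \mid \theta_{i(d)}, \alpha_{j(d)}, \beta_{j(d)}, \gamma_{j(d)})^{X_{ij(d)}} \left(1 - P(X_{ij(d)} = 1 \mid \theta_{i(d)}, \alpha_{j(d)}, \beta_{j(d)}, \gamma_{j(d)})\right)^{1 - X_{ij(d)}}. \tag{3}$$

Let $X_i = \{X_{i1(1)}, \ldots, X_{iJ(1)}, \ldots, X_{i1(d)}, \ldots, X_{iJ(d)}, \ldots, X_{i1(D)}, \ldots, X_{iJ(D)}\}$ represent the response vector of examinee $i$. The corresponding likelihood of this vector is

$$P_i(X_i \mid \boldsymbol{\theta}_i) = \prod_{d=1}^{D} \prod_{j(d)=1}^{J(d)} P_{ij(d)}(\theta_{i(d)}). \tag{4}$$

Finally, the likelihood of the data matrix $X$ is given by

$$P(X \mid \boldsymbol{\theta}) = \prod_{i=1}^{I} \prod_{d=1}^{D} \prod_{j(d)=1}^{J(d)} P_{ij(d)}(\theta_{i(d)}). \tag{5}$$

Estimation

Prior, Posterior, and Conditional Distributions

The prior distribution of $\boldsymbol{\theta}_i$ is parametrized as

$$\boldsymbol{\theta}_i \mid \Sigma \sim \text{MVN}(\mathbf{0}, \Sigma), \tag{6}$$

$$\Sigma \sim \text{Inv-Wishart}_{\nu_0}(\Lambda_0^{-1}). \tag{7}$$

Of primary interest is the joint distribution of $\boldsymbol{\theta}_i$ and $\Sigma$. Using the notations $X = \{X_{ij}\}$, $\boldsymbol{\theta} = \{\boldsymbol{\theta}_i\}$, $\boldsymbol{\alpha} = \{\boldsymbol{\alpha}_j\}$, $\boldsymbol{\beta} = \{\beta_j\}$, and $\boldsymbol{\gamma} = \{\gamma_j\}$, this joint posterior can be expressed as

$$P(\boldsymbol{\theta}, \Sigma \mid X, \boldsymbol{\alpha}, \boldsymbol{\beta}, \boldsymbol{\gamma}) \propto P(X \mid \boldsymbol{\theta}, \boldsymbol{\alpha}, \boldsymbol{\beta}, \boldsymbol{\gamma}, \Sigma)\, P(\boldsymbol{\theta} \mid \Sigma)\, P(\Sigma). \tag{8}$$

The posterior distribution in Equation 8 cannot be evaluated in a straightforward manner (i.e., samples cannot be drawn directly from the joint posterior distribution). MCMC simulation is used to draw samples iteratively from the full conditional distributions $\boldsymbol{\theta} \mid X, \boldsymbol{\alpha}, \boldsymbol{\beta}, \boldsymbol{\gamma}, \Sigma$ and $\Sigma \mid X, \boldsymbol{\alpha}, \boldsymbol{\beta}, \boldsymbol{\gamma}, \boldsymbol{\theta}$ (Casella & George, 1992; Gamerman, 1997; Gelman, Carlin, Stern, & Rubin, 1995). For each examinee, the full conditional distribution is

$$P(\boldsymbol{\theta}_i \mid X_i, \boldsymbol{\alpha}, \boldsymbol{\beta}, \boldsymbol{\gamma}, \Sigma) \propto |\Sigma|^{-1/2} \exp\left(-\tfrac{1}{2}\, \boldsymbol{\theta}_i' \Sigma^{-1} \boldsymbol{\theta}_i\right) P_i(X_i \mid \boldsymbol{\theta}_i). \tag{9}$$

Although Equation 9 is not a known distribution, samples can be drawn from this distribution indirectly by using the Metropolis-Hastings algorithm (Chib & Greenberg, 1995; Gilks, Richardson, & Spiegelhalter, 1996; Tierney, 1994). The full conditional distribution of $\Sigma$ is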

$$P(\Sigma \mid X, \boldsymbol{\theta}, \boldsymbol{\alpha}, \boldsymbol{\beta}, \boldsymbol{\gamma}) = P(\Sigma \mid \boldsymbol{\theta}) \propto P(\boldsymbol{\theta} \mid \Sigma)\, P(\Sigma). \tag{10}$$

With the use of the prior distribution and the hyperdistributions given in Equations 6 and 7, the full conditional posterior distribution of $\Sigma$ is an $\text{Inv-Wishart}_{\nu_I}(\Lambda_I^{-1})$, where $\nu_I = \nu_0 + I$ and $\Lambda_I = \Lambda_0 + \sum_{i=1}^{I} \boldsymbol{\theta}_i \boldsymbol{\theta}_i'$. This full conditional distribution is a known distribution and can be sampled directly.

MCMC Algorithm

The following prior parameters were used: $\nu_0 = D + 2$, and the diagonal and off-diagonal elements of $\Lambda_0$ were set to 1 and 0.5, respectively. Below is an outline of the MCMC algorithm used in the estimation of the parameters.

At iteration 0:
(1) Let $\Sigma^{(0)} = \mathbf{I}$ (the identity matrix).
(2) Draw $\boldsymbol{\theta}_i^{(0)} \sim \text{MVN}(\mathbf{0}, \mathbf{I})$.

At iteration $t$:
(1) Draw $\Sigma^{(t)} \mid \boldsymbol{\theta}^{(t-1)}$ from $\text{Inv-Wishart}_{\nu_I}(\{\Lambda_I^{(t-1)}\}^{-1})$, where $\nu_I = \nu_0 + I$ and $\Lambda_I^{(t-1)} = \Lambda_0 + \sum_{i=1}^{I} \boldsymbol{\theta}_i^{(t-1)} \boldsymbol{\theta}_i^{(t-1)\prime}$.
(2) For each examinee $i$, draw the candidate value $\boldsymbol{\theta}_i^{(*)}$ from $\text{MVN}(\boldsymbol{\theta}_i^{(t-1)}, \Sigma_\theta)$, where $\Sigma_\theta$ is the covariance of the proposal distribution, and accept $\boldsymbol{\theta}_i^{(*)}$ with probability

$$p(\boldsymbol{\theta}_i^{(t-1)}, \boldsymbol{\theta}_i^{(*)}) = \min\left\{ \frac{P_i(X_i \mid \boldsymbol{\theta}_i^{(*)})\, P(\boldsymbol{\theta}_i^{(*)} \mid \Sigma^{(t)})}{P_i(X_i \mid \boldsymbol{\theta}_i^{(t-1)})\, P(\boldsymbol{\theta}_i^{(t-1)} \mid \Sigma^{(t)})},\ 1 \right\}. \tag{11}$$

If the candidate is accepted, $\boldsymbol{\theta}_i^{(t)} = \boldsymbol{\theta}_i^{(*)}$; otherwise, $\boldsymbol{\theta}_i^{(t)} = \boldsymbol{\theta}_i^{(t-1)}$.

For the present article, each chain is iterated 10,000 times. The first 2,000 iterations are discarded, and inference is based on the remaining 8,000 iterations.

Two parameters are of interest: the ability vectors $\boldsymbol{\theta}_i$ and the correlation matrix. Both were estimated using the MCMC output. The ability estimate, denoted as the multidimensional expected a posteriori (EAP-M) estimate, was computed as follows:

$$\tilde{\boldsymbol{\theta}} = E(\boldsymbol{\theta} \mid X, \boldsymbol{\alpha}, \boldsymbol{\beta}, \boldsymbol{\gamma}) \approx \frac{1}{8{,}000} \sum_{t=2{,}001}^{10{,}000} \boldsymbol{\theta}^{(t)}. \tag{12}$$

Although not the primary focus of this article, an estimate of the underlying correlational structure between the abilities can be obtained. The covariance matrix was estimated as

$$\tilde{\Sigma} = E(\Sigma \mid X, \boldsymbol{\alpha}, \boldsymbol{\beta}, \boldsymbol{\gamma}) \approx \frac{1}{8{,}000} \sum_{t=2{,}001}^{10{,}000} \Sigma^{(t)}. \tag{13}$$

The estimated covariance was standardized to obtain the correlation estimate $\tilde{\rho}$.

The accuracy of the ability and correlation estimates was gauged by comparing them to the generating parameters. Specifically, for the ability estimates, the Pearson correlation and mean squared error (MSE) were computed to summarize the correspondence between the estimated and the generated abilities, in addition to the precision of the ability estimates as measured by the posterior variance. It should be noted that when the correlation between the abilities is zero or assumed to be zero (i.e., $\rho$ is set to zero in estimating ability), the resulting ability estimates are equivalent to the unidimensional expected a posteriori (EAP-U; Bock & Aitken, 1981) estimates. The additional precision from simultaneous estimation can be obtained by comparing the MSE of the ability estimates when $\rho > 0$ to the MSE of the ability estimates when $\rho = 0$. In particular, the MSE of EAP-U over the MSE of EAP-M is a measure of the relative efficiency of the proposed method compared with the unidimensional approach.

Factors Affecting Multidimensional Ability Estimation: A Simulation Study

In this section we present results of a simulation study that examines the performance of the hierarchical model ability estimates under a variety of realistic configurations. The factors investigated in this article are: (a) the number of abilities, (b) the number of items, and (c) the degree of correlation between the abilities. The numbers of abilities were 2 and 5; the numbers of items were 10, 30, and 50; and the degrees of correlation were 0.00, 0.40, 0.70, and 0.90. The levels of each factor were crossed completely to yield 24 conditions.

The item parameters used in simulating the examinee responses and scoring the examinees were obtained from a pool of 550 nationally standardized mathematics items. Ten items whose mean information function is closest to the mean information function of all the items were selected. Of the 1 billion randomly constructed 10-item tests, the selected test has the minimum mean absolute deviation

$$\int \left| \frac{1}{10} \sum_{j=1}^{10} I_j(\theta) - \frac{1}{550} \sum_{j=1}^{550} I_j(\theta) \right| f(\theta)\, d\theta, \tag{14}$$

where

$$I_j(\theta) = \frac{D^2 a_j^2\, [1 - P_j(\theta)]\, [P_j(\theta) - c_j]^2}{(1 - c_j)^2\, P_j(\theta)}$$

is the item information function (Hambleton & Swaminathan, 1985; Samejima, 1977), and $f(\theta)$ is the standard normal density. Multiples of the 10-item test were used as required by the different conditions.

For each combination, 1,000 examinees were drawn from $\text{MVN}(\mathbf{0}, \Sigma_D)$, where

$$\Sigma_D = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix},$$

and their responses were simulated. The constraint on $\Sigma_D$ retained the structure of the design but did not in any way affect the estimation process.
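The simulation design and the MCMC algorithm outlined above can be sketched in a few functions. This is an illustrative sketch under stated assumptions, not the authors' implementation: the item parameters passed in are hypothetical (the 550-item pool is not available), the same items are reused for every dimension, the proposal covariance is assumed to be `step**2` times the identity, and the inverse-Wishart draw is built from a Wishart draw with NumPy only.

```python
import numpy as np

def p_correct(theta, a, b, c):
    """Equation 2 (intercept form): c + (1 - c) * logistic(a * theta + b)."""
    return c + (1.0 - c) / (1.0 + np.exp(-(a * theta + b)))

def simulate(n_persons, n_dims, rho, a, b, c, rng):
    """Draw abilities from MVN(0, Sigma_D) and simulate 0/1 responses.
    Assumption: the same J items are reused for every dimension."""
    sigma = np.full((n_dims, n_dims), rho) + (1.0 - rho) * np.eye(n_dims)
    theta = rng.multivariate_normal(np.zeros(n_dims), sigma, size=n_persons)
    p = p_correct(theta[:, :, None], a, b, c)          # shape (I, D, J)
    return theta, (rng.random(p.shape) < p).astype(float)

def log_lik(theta, x, a, b, c):
    """Log of Equation 4 for every examinee (theta: I x D, x: I x D x J)."""
    p = np.clip(p_correct(theta[:, :, None], a, b, c), 1e-12, 1 - 1e-12)
    return (x * np.log(p) + (1.0 - x) * np.log1p(-p)).sum(axis=(1, 2))

def draw_sigma(theta, nu0, lam0, rng):
    """Gibbs step: Sigma | theta is inverse-Wishart with nu0 + I degrees of
    freedom and scale Lambda_0 + sum_i theta_i theta_i'."""
    n, d = theta.shape
    chol = np.linalg.cholesky(np.linalg.inv(lam0 + theta.T @ theta))
    z = rng.standard_normal((nu0 + n, d)) @ chol.T     # rows ~ N(0, Lambda_I^{-1})
    return np.linalg.inv(z.T @ z)                      # inverse of a Wishart draw

def eap_m(x, a, b, c, n_iter=10_000, burn_in=2_000, step=0.5, seed=0):
    """Return EAP-M ability estimates and the estimated correlation matrix."""
    rng = np.random.default_rng(seed)
    n, d, _ = x.shape
    nu0 = d + 2
    lam0 = np.full((d, d), 0.5) + 0.5 * np.eye(d)      # 1 on diagonal, 0.5 off
    theta = rng.standard_normal((n, d))                # theta_i^(0) ~ MVN(0, I)
    ll = log_lik(theta, x, a, b, c)
    theta_sum, sigma_sum = np.zeros((n, d)), np.zeros((d, d))
    for t in range(1, n_iter + 1):
        sigma = draw_sigma(theta, nu0, lam0, rng)      # step (1)
        prec = np.linalg.inv(sigma)
        # Step (2): symmetric random-walk proposal, so Equation 11 reduces
        # to the ratio of likelihood times prior.
        cand = theta + step * rng.standard_normal((n, d))
        ll_cand = log_lik(cand, x, a, b, c)
        lp = -0.5 * np.einsum("id,de,ie->i", theta, prec, theta)
        lp_cand = -0.5 * np.einsum("id,de,ie->i", cand, prec, cand)
        accept = np.log(rng.random(n)) < (ll_cand + lp_cand) - (ll + lp)
        theta[accept] = cand[accept]
        ll[accept] = ll_cand[accept]
        if t > burn_in:                                # Equations 12 and 13
            theta_sum += theta
            sigma_sum += sigma
    kept = n_iter - burn_in
    sigma_hat = sigma_sum / kept
    sd = np.sqrt(np.diag(sigma_hat))
    return theta_sum / kept, sigma_hat / np.outer(sd, sd)
```

A call such as `theta, x = simulate(1000, 5, 0.9, a, b, c, np.random.default_rng(1))` followed by `eap_m(x, a, b, c)` mirrors one cell of the design, with `a`, `b`, and `c` being length-J arrays (or a scalar for `c`) supplied by the user. Updating all examinees in one vectorized Metropolis-Hastings step is valid here because the $\boldsymbol{\theta}_i$ are conditionally independent given $\Sigma$.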

Estimates of Correlation

Table 1 gives the MCMC estimates of the correlations. Results demonstrate that the correlations were accurately estimated by using MCMC. In general, additional precision can be expected for the estimates as more items and abilities are considered. However, because the large number of examinees allowed for the accurate estimation of the correlations even when only two abilities and 10 items were considered, the additional information afforded by adding more abilities and items became negligible in the process.

TABLE 1. Estimate of Correlation Between Abilities (by number of abilities, D, and number of items, J).

Ability Estimates

Correlation With True Ability and Posterior Variance

Table 2 lists the correlations between the true ability and the estimated ability. It can be observed that the correlation between the true ability and the estimated ability increases as the correlation between the underlying abilities, the number of items, and the number of abilities increase. As expected, when the underlying abilities are not correlated, the correlation between the true and estimated abilities increases only with the number of items and not with the number of dimensions. With at least 30 items, the abilities can be well estimated even when the underlying abilities are not correlated and, hence, increasing the underlying correlation has only a marginal impact.

TABLE 2. Correlation Between True and Estimated Abilities (by number of abilities, D, and number of items, J).

The largest improvement was observed when five abilities were simultaneously estimated and the correlation between the abilities was 0.90 (i.e., the correlation increased from 0.83 to 0.92). Table 3 shows that the number of abilities, the number of items, and the degree of correlation affect the posterior variance of the estimates in the same way that these factors affect the correlations between the true and estimated abilities. Specifically, an increase in the number of abilities, the degree of correlation between the abilities, and the number of items resulted in more precise estimates.

TABLE 3. Posterior Variance of the Ability Estimates (by number of abilities, D, and number of items, J).

Relative Efficiency

To quantify the amount of improvement attributable to simultaneous estimation, relative efficiency was computed, and the results are presented in Table 4. Relative efficiency was defined here as the MSE of the EAP-U estimates (i.e., $\rho = 0$) over the MSE of the EAP-M estimates. Thus, a ratio greater than 1.00 can be interpreted as the EAP-M having higher efficiency compared to the EAP-U. In addition, the ratio also indicates the factor by which the test length needs to be increased for the EAP-U estimates to have the same precision as the EAP-M estimates obtained with the original test length. When only two abilities were concurrently considered, the efficiency of the EAP-M method was not evident unless the abilities were very highly correlated (i.e., $\rho = 0.90$).

TABLE 4
Mean Squared Error and Relative Efficiency

Number of Abilities   J    ρ = 0.00   ρ = 0.40      ρ = 0.70      ρ = 0.90
D = 2                 10   ( )        0.34 (1.00)   0.29 (1.16)   0.23 (1.44)
                      30   ( )        0.13 (1.10)   0.13 (1.15)   0.11 (1.40)
                      50   ( )        0.09 (1.03)   0.09 (1.08)   0.07 (1.24)
D = 5                 10   ( )        0.29 (1.08)   0.23 (1.36)   0.16 (1.95)
                      30   ( )        0.13 (1.08)   0.12 (1.24)   0.08 (1.69)
                      50   ( )        0.09 (1.02)   0.08 (1.16)   0.06 (1.52)

Note. Cell entries are the MSE with the relative efficiency in parentheses.
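The test-length reading of relative efficiency can be made concrete with a small calculation. The sketch below is only illustrative: the function names are hypothetical, and it takes the relative efficiencies for two abilities at the highest correlation as they appear in Table 4.

```python
def relative_efficiency(mse_u, mse_m):
    """MSE of the EAP-U estimates over MSE of the EAP-M estimates."""
    return mse_u / mse_m

def equivalent_extra_items(n_items, efficiency):
    """Items the EAP-U method would need to add to match EAP-M precision,
    reading the efficiency as a test-length multiplier."""
    return n_items * (efficiency - 1.0)

# (J, relative efficiency) pairs for D = 2 at the highest correlation in Table 4.
two_abilities = [(10, 1.44), (30, 1.40), (50, 1.24)]
extra = [round(equivalent_extra_items(j, re)) for j, re in two_abilities]
```

Here `extra` works out to 4, 12, and 12 additional items for the 10-, 30-, and 50-item tests, respectively.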

Efficiency at this level ranged from 1.24 to 1.44. Depending on the length of the test, this was equivalent to adding 4 to 12 items to the test. When more dimensions were simultaneously used, the efficiency of the EAP-M method was evident for abilities that were reasonably highly correlated (i.e., $\rho \geq 0.70$). Efficiency ranged from 1.16 to 1.95. For 10-item tests where five abilities with $\rho = 0.90$ were simultaneously estimated, the precision of the EAP-M estimates was equivalent to the precision of the EAP-U estimates obtained from tests twice as long. For other conditions, depending on the original test length, the additional precision was equivalent to adding 4 to 26 items to the test. The increase in precision from increasing the number of abilities was less evident when long tests were used. This is consistent with the results discussed earlier, which indicate that improvement is marginal when abilities are already well estimated. Although efficiency may not be as high for long tests, the corresponding number of additional items turned out to be larger.

The simulation study shows that the average posterior variances given in Table 3 are very close to the MSEs of the estimates in Table 4. In real data analysis, where MSE cannot be computed, approximate relative efficiency can be obtained by comparing the average posterior variances of the estimates.

The effects of the underlying correlations between abilities on the estimates for five dimensions and 10 items are shown in Figures 2 through 4. Figure 2 shows that higher underlying correlation resulted in more compact scatter plots along the identity line. Figure 3 shows that the estimation bias for extreme abilities can be substantially reduced by simultaneous estimation when the underlying correlation is high. Finally, Figure 4 shows the dramatic decrease in posterior variance for all values of $\theta$ as the abilities become more correlated.

FIGURE 2. Five dimensions and 10 items: scatter plots of true and estimated abilities. (Panels: $\rho$ = 0.00, 0.40, 0.70, 0.90.)

FIGURE 3. Five dimensions and 10 items: smoothed deviations between true and estimated abilities. (Horizontal axis: $\theta$.)

FIGURE 4. Five dimensions and 10 items: smoothed posterior variances of the ability estimates. (Horizontal axis: $\theta$.)

Augmenting Ability Estimates

Wainer et al. (2001) presented different methods of improving ability estimation by computing the empirical Bayes estimates of abilities on the basis of responses to multiple tests. The methods they described can be used for both number-correct and IRT scores and utilize the multivariate analog of test reliability in regressing the examinee scores. Because their procedure for augmenting IRT scores involved the modal a posteriori (MAP) estimates, the formulas they presented are slightly modified to allow comparison between their method and the method proposed in this article, which gives EAP estimates. For test $d$, the unregressed IRT score for the test is given by

$$\theta_d^{*} = \frac{\tilde{\theta}_d}{\rho_d}, \tag{15}$$

where $\tilde{\theta}_d$ is the EAP-U ability estimate and $\rho_d$ is the test reliability. The reliability of test $d$ is computed as

$$\rho_d = \frac{\text{Var}(\tilde{\theta}_d)}{\text{Var}(\tilde{\theta}_d) + \overline{\text{PVar}}(\tilde{\theta}_d)}, \tag{16}$$

where $\overline{\text{PVar}}(\tilde{\theta}_d)$ is the average posterior variance of ability estimates in test $d$. Let $S^{u}$ be the covariance matrix of the unregressed ability estimates $\boldsymbol{\theta}^{*}$, and define $S^{c} = S^{u} - \mathbf{D}$ to be the covariance matrix corrected for reliability, where $\mathbf{D}$ is a diagonal matrix whose $d$th nonzero entry is $(1 - \rho_d)\, s^{u}_{dd}$. Also define $\bar{\boldsymbol{\theta}}^{*}$ to be the mean vector of unregressed ability estimates. Then the empirical Bayes ability estimate for examinee $i$ is given by

$$\tilde{\boldsymbol{\theta}}_i^{(1)} = \bar{\boldsymbol{\theta}}^{*} + S^{c} (S^{u})^{-1} (\boldsymbol{\theta}_i^{*} - \bar{\boldsymbol{\theta}}^{*}). \tag{17}$$

To compare the two methods of augmenting ability estimates, unidimensional ability estimates were computed for all the conditions in the previous section.
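Equations 15 through 17 amount to a few matrix operations on the unidimensional estimates. The following sketch assumes NumPy and a hypothetical function name, `augment`; it takes arrays of EAP-U estimates and their posterior variances (one row per examinee, one column per test, with at least two tests) and is meant as an illustration of the formulas, not a reproduction of the procedure of Wainer et al.

```python
import numpy as np

def augment(eap_u, post_var):
    """Empirical Bayes augmentation of EAP-U scores.
    eap_u, post_var: arrays of shape (I, D) with D >= 2."""
    var_eap = eap_u.var(axis=0, ddof=1)
    rel = var_eap / (var_eap + post_var.mean(axis=0))   # Equation 16
    unreg = eap_u / rel                                 # Equation 15
    s_u = np.cov(unreg, rowvar=False)                   # covariance of unregressed scores
    s_c = s_u - np.diag((1.0 - rel) * np.diag(s_u))     # corrected for reliability
    mean = unreg.mean(axis=0)
    weights = s_c @ np.linalg.inv(s_u)
    return mean + (unreg - mean) @ weights.T            # Equation 17, row-vector form
```

One quick sanity check of this sketch: when the unregressed scores on different tests are uncorrelated and centered at zero, `weights` reduces to a diagonal matrix of reliabilities, and the function returns the original EAP-U estimates, so no strength is borrowed across tests.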
These unidimensional estimates were transformed to the empirical Bayes estimates, $\tilde{\theta}^{(1)}$, using Equation 17 and compared with the ability estimates obtained using simultaneous estimation, denoted by $\tilde{\theta}^{(0)}$. Comparison was based on the correlation between the two estimates and on the correlation and MSE between the true and estimated abilities. Listed in Table 5 are the summary statistics for the measures across the 24 conditions. These statistics indicate that the two methods provide almost identical estimates. In the worst-case scenario, the correlation between the two estimates is still almost perfect (0.9953), and the differences between the correlation and MSE are not evident until the third decimal place.

TABLE 5. Comparison of Methods of Computing Augmented Ability Estimates. (Columns: $\text{Cor}(\tilde{\theta}^{(0)}, \tilde{\theta}^{(1)})$; absolute difference in $\text{Cor}(\theta, \tilde{\theta})$; absolute difference in MSE. Rows: Min, Q1, Q2, Q3, Max.)

Although the results appear almost identical, the mean correlations with $\theta$ and the average MSEs across the 24 conditions suggest that $\tilde{\theta}^{(0)}$ may be slightly better than $\tilde{\theta}^{(1)}$. The difference between the two estimates can be observed in an example displayed in Figure 5. In this example, five 10-item tests with reliability 0.68 are scored. Compared with the $\tilde{\theta}^{(0)}$ estimates, a greater magnitude of shrinkage can be observed in the $\tilde{\theta}^{(1)}$ estimates for extreme values of $\theta$. The degree of tail shrinkage is seen to depend on the magnitude of the correlation between the abilities. In this case, we see that using the test reliabilities to improve ability estimates causes greater regression to the mean compared to using the covariance matrix as a prior distribution. It should be noted, however, that this difference in shrinkage occurs in areas where only a small proportion of examinees can be found and that, for most practical purposes, Wainer et al.'s method, which is easier to implement, can be used without negative implications.

FIGURE 5. Five dimensions and 10 items: scatter plots of estimated abilities. (Panels: $\rho$ = 0.00, 0.40, 0.70, 0.90; axis label: Empirical Bayes.)

Analysis of a Grade 9 Test Battery

The responses of 2,255 Grade 9 examinees on four content areas, Math (MA; 25 items), Math-Computation (MC; 20 items), Spelling (SP; 20 items), and Social Studies (SS; 25 items), were analyzed. The abilities of each examinee on the four content areas were estimated using the EAP-M and EAP-U methods. The prior distributions discussed in the simulation section were used. Each chain was 25,000 iterations long, with the first 5,000 iterations as burn-in.

Correlation Estimates

The estimates of the correlation between the four abilities are given in Table 6. The highest correlation was between Math and Math-Computation (0.89), whereas the lowest correlation was between Math and Spelling (0.66). The average correlation between the four abilities was 0.75. These results indicated that the data analyzed were close to the simulated condition where $\rho = 0.70$.

Ability Estimates

The posterior variances of the EAP-U and EAP-M estimates and the approximate relative efficiency of the EAP-M method for each content area are given in Table 7. The highest relative efficiency was obtained in estimating the Math ability. The EAP-M estimates were on average 28% more precise than the EAP-U estimates.
This translates to an additional 7 items for the EAP-U method to arrive at the same precision. The lowest efficiency was in estimating the Social Studies ability. Nonetheless, the higher precision using the EAP-M method was equivalent to an additional 5 items. The mean efficiency of 1.22 for four abilities that have an average correlation of 0.75 was reasonable and consistent with the simulation results when one takes into account the additional noise involved in analyzing real data.

TABLE 6. Correlation Estimates for the Grade 9 Test Battery. (Rows: MA, MC, SP; columns: MC, SP, SS.)
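With real data, the approximate relative efficiency is a ratio of average posterior variances rather than of MSEs. A minimal sketch follows; the function name and the posterior variances are hypothetical, and reading "28% more precise" as a relative efficiency of 1.28 is an interpretation rather than something the analysis states explicitly.

```python
def approx_relative_efficiency(post_var_u, post_var_m):
    """Ratio of the average posterior variance under EAP-U to that under EAP-M."""
    mean_u = sum(post_var_u) / len(post_var_u)
    mean_m = sum(post_var_m) / len(post_var_m)
    return mean_u / mean_m

# Hypothetical posterior variances for three examinees (illustration only).
re_demo = approx_relative_efficiency([0.20, 0.25, 0.30], [0.16, 0.20, 0.24])

# Math test: 25 items, assumed relative efficiency of 1.28.
extra_math = 25 * (1.28 - 1.0)
```

Under that reading, `extra_math` rounds to the 7 additional items reported for the 25-item Math test.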

TABLE 7. Posterior Variance and Approximate Relative Efficiency of Ability Estimates for the Grade 9 Test Battery. (Rows: EAP-U, EAP-M, relative efficiency; columns: MA, MC, SP, SS.)

Discussion

The multidimensional approach to simultaneous ability estimation can be viewed as a more general framework for obtaining expected a posteriori estimates of ability. The method gives the same results as the unidimensional approach when abilities are uncorrelated. However, when abilities are correlated, taking the correlation into account can lead to noticeable improvements in ability estimates, especially when there are multiple short tests and the underlying correlation is high. Among several methods of ability estimation, EAP-U has been preferred for its small bias and standard error (Kim & Nicewander, 1993; Thissen & Orlando, 2001). But, as the results of this article have shown, employing simultaneous estimation can further reduce the bias and standard error of the estimates.

In addition to improvements to ability estimates, the hierarchical formulation used in this paper provides a framework that allows for the direct estimation of the correlation between the abilities. This obviates the need for a two-step approach (i.e., estimation of abilities in the first step and estimation of the correlation matrix using the ability estimates in the second step), which leads to biased estimates (Little & Rubin, 1983; Mislevy, 1984; Segall, 1996).

The multidimensional approach should be beneficial in many testing situations. The administration of multiple tests during one sitting is not uncommon, and as Johnson and Carlson (1994) reported, the different abilities measured by these tests are usually highly correlated. Although some of the improvements from using this approach are relatively modest, they can be achieved without much additional cost (i.e., only the estimation process was changed in scoring the same data sets).
In a practical sense, use of this method means that, given a fixed number of items, the score can be made more reliable, or given a desired level of reliability, the number of items can be reduced without loss of accuracy.

The valid use of scores obtained using auxiliary information of any type must be considered carefully. NAEP, for example, uses auxiliary information regarding student and school characteristics to obtain more accurate scores for subpopulations of students. The student-level scores (i.e., "plausible values") computed in the analysis may not be used to characterize individual student proficiency because they depend on the auxiliary demographic variables, which should not inform characterizations of proficiency. In the multidimensional approach proposed in this article, all auxiliary information is directly obtained from test performance, but validity concerns remain. Although the multidimensional scores have favorable statistical properties, they also have a more complex interpretation. These scores may not be desirable when straightforward interpretation of test scores as a summary of test responses within a domain is favored. In any type of competition between students within a domain (e.g., identifying the top 10% of scores on a science achievement test), it would not be appropriate to allow scores on other domains (e.g., mathematics) to affect the scores and rankings. When the consequences for examinees do not depend on comparisons with other examinees, the greater accuracy of the multidimensional scores may be useful. For example, if more accurate score profiles lead to more efficient diagnosis or more precise targeting of instructional resources, then their use could be supported.

Finally, we observe that multidimensional scores may serve to complement rather than to replace traditional scoring of test batteries. Traditional unidimensional scores may be reported at the domain level (e.g., scale scores and their associated norm-referenced and/or criterion-referenced derived scores), and multidimensional scores could be used to inform finer-grained reporting such as skills profiles and objective-level scores. Traditional approaches to this type of fine-grained reporting suffer from insufficient reliability because of the small numbers of items associated with each fine-grained reporting category. This explains the continued reliance on a single, more global composite score (Wainer et al., 2001). Our analysis suggests that a multidimensional approach to scoring could be a promising application given the nature of the test batteries (i.e., multiple short sections that are highly correlated).

It should be noted that the assumption of simple structure does not limit the usefulness of the proposed method.
On the contrary, the assumption makes the method more straightforward to apply: it can be used with existing tests that have already been calibrated, without any changes to the item response models.

The simultaneous estimation procedure discussed in this article is akin to the method employed by Wainer et al. (2001) in that both augment scores on one test by using information from other tests, and, as our analyses show, the two approaches yield very similar results under a variety of testing conditions. However, the two methods differ in some important respects. One distinction is that, although both can be classified as Bayesian procedures, the method discussed by Wainer et al. uses an empirical Bayesian approach whereas the current method uses a fully Bayesian approach. Another is that the method described by Wainer et al. involves elements of classical test theory whereas the method described here is based solely on IRT and its extensions. For example, the former method uses the multivariate analog of reliability in regressing the examinee scores.

Future research might take a variety of directions. First, the flexibility of the IRT formulation should allow other information to be readily incorporated in the model. Aside from responses to other tests, one can also consider information such as academic or demographic variables routinely collected in most testing situations. Second, the present article uses item parameters with known values. The approach can be broadened to include item parameter estimation, as was done by Patz and Junker (1999a, 1999b) in the unidimensional IRT case. Finally, the proposed method can be tried with other item response models, such as the generalized graded unfolding model of Roberts, Donoghue, and Laughlin (2000), and in other testing contexts.

References

Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46,
Casella, G., & George, E. I. (1992). Explaining the Gibbs sampler. The American Statistician, 46,
Chib, S., & Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. The American Statistician, 49,
CTB/McGraw-Hill. (2002). Technical Bulletin 1 of California Achievement Tests Forms C and D. Monterey, CA: Author.
Gamerman, D. (1997). Markov chain Monte Carlo: Stochastic simulation for Bayesian inference. London: Chapman & Hall.
Gelman, A., Carlin, J. B., Stern, H., & Rubin, D. B. (1995). Bayesian data analysis. London: Chapman & Hall.
Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (1996). Introducing Markov chain Monte Carlo. In W. R. Gilks, S. Richardson, & D. J. Spiegelhalter (Eds.), Markov chain Monte Carlo in practice (pp. 1-17). London: Chapman & Hall.
Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston: Kluwer-Nijhoff.
Johnson, E. G., & Carlson, J. (1994). The NAEP 1992 Technical Report (Report No. 23-TR-20). Washington, DC: National Center for Education Statistics.
Kim, J. K., & Nicewander, W. A. (1993). Ability estimation for conventional tests. Psychometrika, 58,
Little, R. J. A., & Rubin, D. B. (1983). On jointly estimating parameters and missing data by maximizing the complete-data likelihood. The American Statistician, 37,
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.
Mislevy, R. J. (1984). Estimating latent distributions. Psychometrika, 49,
Mislevy, R. J. (1987). Exploiting auxiliary information about examinees in the estimation of item parameters. Applied Psychological Measurement, 11,
Patz, R. J., & Junker, B. W. (1999a). A straightforward approach to Markov chain Monte Carlo methods for item response theory. Journal of Educational and Behavioral Statistics, 24,
Patz, R. J., & Junker, B. W. (1999b). Applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses. Journal of Educational and Behavioral Statistics, 24,
Purcell, N. J., & Kish, L. (1979). Estimation for small domains. Biometrics, 35,
Reckase, M. D. (1996). A linear logistic multidimensional model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp ). New York: Springer-Verlag.
Roberts, J. S., Donoghue, J. R., & Laughlin, J. E. (2000). A general model for unfolding unidimensional polytomous responses using item response theory. Applied Psychological Measurement, 24,
Samejima, F. (1977). A use of the information function in tailored testing. Applied Psychological Measurement, 1,

Segall, D. O. (1996). Multidimensional adaptive testing. Psychometrika, 61,
Thissen, D., & Orlando, M. (2001). Item response theory for items scored in two categories. In D. Thissen & H. Wainer (Eds.), Test scoring (pp ). Mahwah, NJ: Erlbaum.
Tierney, L. (1994). Markov chains for exploring posterior distributions (with discussion). Annals of Statistics, 22,
Wainer, H., Vevea, J. L., Camacho, F., Reeve III, B. B., Rosa, K., Nelson, L., Swygert, K. A., & Thissen, D. (2001). Augmented scores: "Borrowing strength" to compute scores based on small numbers of items. In D. Thissen & H. Wainer (Eds.), Test scoring (pp ). Mahwah, NJ: Erlbaum.

Authors

JIMMY DE LA TORRE is Assistant Professor, Department of Educational Psychology at Rutgers Graduate School of Education, 10 Seminary Place, New Brunswick, NJ 08901. His research interests include psychometrics, item response models, and cognitive diagnosis.

RICHARD J. PATZ is President, R. J. Patz, Inc., 1414 Soquel Avenue, Suite 212, Santa Cruz, CA 95062. His research and consulting interests include large-scale assessment design and implementation, and statistical models for item response data.

Manuscript received September 9, 2002
Revision received February 9, 2004
Accepted June 15,


More information

Bayesian Inference for DSGE Models. Lawrence J. Christiano

Bayesian Inference for DSGE Models. Lawrence J. Christiano Bayesian Inference for DSGE Models Lawrence J. Christiano Outline State space-observer form. convenient for model estimation and many other things. Bayesian inference Bayes rule. Monte Carlo integation.

More information

Markov Chain Monte Carlo A Contribution to the Encyclopedia of Environmetrics

Markov Chain Monte Carlo A Contribution to the Encyclopedia of Environmetrics Markov Chain Monte Carlo A Contribution to the Encyclopedia of Environmetrics Galin L. Jones and James P. Hobert Department of Statistics University of Florida May 2000 1 Introduction Realistic statistical

More information

Comparing Multi-dimensional and Uni-dimensional Computer Adaptive Strategies in Psychological and Health Assessment. Jingyu Liu

Comparing Multi-dimensional and Uni-dimensional Computer Adaptive Strategies in Psychological and Health Assessment. Jingyu Liu Comparing Multi-dimensional and Uni-dimensional Computer Adaptive Strategies in Psychological and Health Assessment by Jingyu Liu BS, Beijing Institute of Technology, 1994 MS, University of Texas at San

More information

BAYESIAN MODEL CHECKING STRATEGIES FOR DICHOTOMOUS ITEM RESPONSE THEORY MODELS. Sherwin G. Toribio. A Dissertation

BAYESIAN MODEL CHECKING STRATEGIES FOR DICHOTOMOUS ITEM RESPONSE THEORY MODELS. Sherwin G. Toribio. A Dissertation BAYESIAN MODEL CHECKING STRATEGIES FOR DICHOTOMOUS ITEM RESPONSE THEORY MODELS Sherwin G. Toribio A Dissertation Submitted to the Graduate College of Bowling Green State University in partial fulfillment

More information

Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference

Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference Osnat Stramer 1 and Matthew Bognar 1 Department of Statistics and Actuarial Science, University of

More information

Fitting Multidimensional Latent Variable Models using an Efficient Laplace Approximation

Fitting Multidimensional Latent Variable Models using an Efficient Laplace Approximation Fitting Multidimensional Latent Variable Models using an Efficient Laplace Approximation Dimitris Rizopoulos Department of Biostatistics, Erasmus University Medical Center, the Netherlands d.rizopoulos@erasmusmc.nl

More information

Monte Carlo Integration using Importance Sampling and Gibbs Sampling

Monte Carlo Integration using Importance Sampling and Gibbs Sampling Monte Carlo Integration using Importance Sampling and Gibbs Sampling Wolfgang Hörmann and Josef Leydold Department of Statistics University of Economics and Business Administration Vienna Austria hormannw@boun.edu.tr

More information

Contents. Part I: Fundamentals of Bayesian Inference 1

Contents. Part I: Fundamentals of Bayesian Inference 1 Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian

More information

Equivalency of the DINA Model and a Constrained General Diagnostic Model

Equivalency of the DINA Model and a Constrained General Diagnostic Model Research Report ETS RR 11-37 Equivalency of the DINA Model and a Constrained General Diagnostic Model Matthias von Davier September 2011 Equivalency of the DINA Model and a Constrained General Diagnostic

More information

Markov Chain Monte Carlo Methods

Markov Chain Monte Carlo Methods Markov Chain Monte Carlo Methods John Geweke University of Iowa, USA 2005 Institute on Computational Economics University of Chicago - Argonne National Laboaratories July 22, 2005 The problem p (θ, ω I)

More information

On the Use of Nonparametric ICC Estimation Techniques For Checking Parametric Model Fit

On the Use of Nonparametric ICC Estimation Techniques For Checking Parametric Model Fit On the Use of Nonparametric ICC Estimation Techniques For Checking Parametric Model Fit March 27, 2004 Young-Sun Lee Teachers College, Columbia University James A.Wollack University of Wisconsin Madison

More information

Zita Oravecz, Francis Tuerlinckx, & Joachim Vandekerckhove Department of Psychology. Department of Psychology University of Leuven, Belgium

Zita Oravecz, Francis Tuerlinckx, & Joachim Vandekerckhove Department of Psychology. Department of Psychology University of Leuven, Belgium Bayesian statistical inference for the hierarchical Ornstein-Uhlenbeck model: An online supplement to A hierarchical latent stochastic differential equation model for affective dynamics Zita Oravecz, Francis

More information

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning for for Advanced Topics in California Institute of Technology April 20th, 2017 1 / 50 Table of Contents for 1 2 3 4 2 / 50 History of methods for Enrico Fermi used to calculate incredibly accurate predictions

More information

Bayesian Inference for the Multivariate Normal

Bayesian Inference for the Multivariate Normal Bayesian Inference for the Multivariate Normal Will Penny Wellcome Trust Centre for Neuroimaging, University College, London WC1N 3BG, UK. November 28, 2014 Abstract Bayesian inference for the multivariate

More information