BAYESIAN MODEL CHECKING STRATEGIES FOR DICHOTOMOUS ITEM RESPONSE THEORY MODELS

Sherwin G. Toribio

A Dissertation Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

August 2006

Committee: James H. Albert, Advisor; William H. Redmond, Graduate Faculty Representative; John T. Chen; Craig L. Zirbel
ABSTRACT

James H. Albert, Advisor

Item Response Theory (IRT) models are commonly used in educational and psychological testing. These models are mainly used to assess the latent abilities of examinees and the effectiveness of the test items in measuring this underlying trait. However, model checking in Item Response Theory is still an underdeveloped area. In this dissertation, various model checking strategies for different Item Response models are presented from a Bayesian perspective. In particular, three methods are employed to assess the goodness-of-fit of different IRT models. First, Bayesian residuals and different residual plots are introduced to serve as graphical procedures to check for model fit and to detect outlying items and examinees. Second, the idea of predictive distributions is used to construct reference distributions for different test quantities and discrepancy measures, including the standard deviation of point-biserial correlations, Bock's Pearson-type chi-square index, Yen's Q1 index, the Hosmer-Lemeshow statistic, McKinley and Mills' G2 index, Orlando and Thissen's S-G2 and S-X2 indices, Wright and Stone's W-statistic, and the log-likelihood statistic. The prior, posterior, and partial posterior predictive distributions are discussed and employed. Finally, Bayes factors are used to compare different IRT models in model selection and in the detection of outlying discrimination parameters. Here, different numerical procedures to estimate the Bayes factors for these models are discussed. All of the proposed methods are illustrated using simulated data and Mathematics placement exam data from BGSU.
ACKNOWLEDGMENTS

First of all, I would like to thank Dr. Jim Albert, my advisor, for his constant support and many suggestions throughout this research. I also wish to thank him for the friendship and all the advice that he shared about life in general. I also want to extend my gratitude to the other members of my committee, Dr. John Chen, Dr. Craig Zirbel, and Dr. William Redmond, for their time and advice. I am grateful to the Department of Mathematics and Statistics for all the support and for providing a wonderful research environment. I especially wish to thank Marcia Seubert, Cyndi Patterson, and Mary Busdeker for all their help. The dissertation fellowship was crucial to the completion of this work. I wish to thank my colleagues and friends from BG, Joel, Vhie, Merly, Florence, Dhanuja, Kevin, Mike, Khairul and Shapla, and all the other Pinoys for all the fun and interesting discussions. Finally, I thank my beloved wife, Alie, for all her support, love, and patience, and Simone for bringing all the joy and happiness in our lives during our stay in Bowling Green. Without them this work could never have come to existence.

Sherwin G. Toribio
Bowling Green, Ohio
August 2006
TABLE OF CONTENTS

CHAPTER 1: ITEM RESPONSE THEORY MODELS
  Introduction
  Item Response Curve
  Common IRT Models
    One-Parameter Model
    Two-Parameter Model
    Three-Parameter Model
    Exchangeable IRT Model
  Parameter Estimation
    Likelihood Function
    Joint Maximum Likelihood Estimation
    Bayesian Estimation
    Albert's Gibbs Sampler
  An Example - BGSU Mathematics Placement Exam
  Advantages of the Bayesian Approach

CHAPTER 2: MODEL CHECKING METHODS FOR BINARY AND IRT MODELS
  Introduction
  Residuals
    Classical Residuals
    Bayesian Residuals
  Chi-squared Tests for Goodness-of-fit of IRT Models
    Wright and Panchapakesan Index (WP)
    Bock's Index (B)
    Yen's Index (Q1)
    Hosmer and Lemeshow Index (HL)
    McKinley and Mills Index (G2)
    Orlando and Thissen Indices (S-χ2 and S-G2)
  Discrepancy Measures and Test Quantities
  Predictive Distributions
    Prior Predictive Distribution
    Posterior Predictive Distribution
    Conditional Predictive Distribution
    Partial Posterior Predictive Distribution
  Bayes Factor

CHAPTER 3: OUTLIER DETECTION IN IRT MODELS USING BAYESIAN RESIDUALS
  Introduction
  Detecting Misfitted Items Using IRC Interval Band
  Detecting Guessers
    Examinee Bayesian Residual Plots
    Examinee Bayesian Latent Residual Plots
  Detecting Misfitted Examinees
  Application to Real Data Set

CHAPTER 4: ASSESSING THE GOODNESS-OF-FIT OF IRT MODELS USING PREDICTIVE DISTRIBUTIONS
  Introduction
  Checking the Appropriateness of the One-parameter Probit IRT Model
  Point Biserial Correlation
    Using Prior Predictive
    Using Posterior Predictive
  Item Fit Analysis
    Using Prior Predictive
    Using Posterior Predictive
    Using Partial Posterior Predictive
  Examinee Fit Analysis
    Discrepancy Measures for Person Fit
    Detecting Guessers Using Posterior Predictive
  Application to Real Data Set

CHAPTER 5: BAYESIAN METHODS FOR IRT MODEL SELECTION
  Introduction
  Checking the Beta-Binomial Model Using Bayes Factors
    Beta-Binomial Model
    Bayes Factor
    Laplace Method for Integration
    Estimating the Bayes Factor
    Application to Real Data
  Approximating the Denominator of the Bayes Factor Using Importance Sampling
    Exchangeable IRT Model
    Approximating the One-parameter Model
    Approximating the Two-parameter Model
  IRT Model Comparisons and Model Selection
    Computing the Bayes Factor for IRT Models
    IRT Model Comparison
  Finding Outlying Discrimination Parameters
    Using Bayes Factor
    Using Mixture Prior Density
  Application to Real Data Set

CHAPTER 6: SUMMARY AND CONCLUSIONS

Appendix A: NUMERICAL METHODS
  A.1 Newton-Raphson for IRT Models
  A.2 Markov Chain Monte Carlo (MCMC)
    A.2.1 Metropolis-Hastings
    A.2.2 Gibbs Sampling
    A.2.3 Importance Sampling

Appendix B: MATLAB PROGRAMS
  B.1 Chapter 1 codes
  B.2 Chapter 3 codes
  B.3 Chapter 4 codes
  B.4 Chapter 5 codes

REFERENCES
LIST OF FIGURES

1.1 A typical item response curve.
1.2 Item response curves for 3 different difficulty values.
1.3 Item response curves for 3 different discrimination values.
1.4 Items with high discrimination power have higher chances of distinguishing two examinees with different ability scores than items with low discrimination power.
1.5 Scatterplots of 35 actual item parameters versus their corresponding estimates.
1.6 Scatterplots of 1000 actual ability scores versus their corresponding estimates.
1.7 Scatterplots of 35 actual item parameters versus their corresponding Bayesian estimates.
1.8 Scatterplot of 1000 actual ability scores versus their corresponding Bayesian estimates.
1.9 Summary plot of the JML estimates of the parameters of the 35 items in the BGSU Math placement exam.
1.10 Scatterplot of the JML estimates of the ability scores versus their corresponding exam raw scores.
1.11 Summary plot of the Bayesian estimates of the parameters of the 35 items in the BGSU Math placement exam.
1.12 Scatterplot of the Bayesian estimates of the ability scores versus their corresponding exam raw scores.
1.13 Scatterplots that compare the Bayesian estimates with the JMLE estimates of the item parameters.
1.14 A scatterplot that depicts a strong correlation between the Bayesian and JMLE estimates of the ability scores.
2.1 Classical residual plot.
3.1 A 90% interval band for the fitted item response curves of items 15 and 3 using the Two-parameter IRT model.
3.2 A 90% interval band for the item response curves of items 1 (above) and 26 (below) fitted with the (left) One-parameter IRT model and (right) Two-parameter IRT model.
3.3 Posterior residual plots of items 1 (above) and 26 (below) fitted with the (left) One-parameter IRT model and (right) Two-parameter IRT model.
3.4 Examinee residual plot of someone with ability score θ =
3.5 Examinee residual plots of examinees with ability scores of θ = 1.15 (left) and θ = 2.19 (right).
3.6 Examinee residual plots of examinees with ability scores of θ = 1.22 (left) and θ = 2.2 (right).
3.7 Examinee residual plots of two guessers.
3.8 Examinee latent residual plot of an examinee with ability score θ =
3.9 Examinee latent residual plots of examinees with ability scores of θ = 1.15 (left) and θ = 2.19 (right).
3.10 Examinee latent residual plots of examinees with ability scores of θ = 1.22 (left) and θ = 2.2 (right).
3.11 Examinee latent residual plots of two guessers.
3.12 Histograms of the number of examinees (out of 1000) who scored (left) much too high and (right) much too low.
3.13 Examinee residual and latent residual plots of examinee no.
3.14 Residual and latent residual plots of examinee no. 82 (above) and 854 (below).
3.15 Examinee residual and latent residual plots of examinee no.
3.16 IRC band and posterior residual plot of item
3.17 Item response curves of item 21 (above) and item 3 (below).
4.1 Histogram of 5 simulated values of std(r-pbis) using the prior predictive distribution.
4.2 This histogram of the 1 simulated prior predictive p-values illustrates that the distribution of the prior p-value of std(r-pbis) is close to uniform[0,1].
4.3 Histograms of 1 observed std(r-pbis) when data sets were generated using the (left) two-parameter and (right) one-parameter model.
4.4 Histogram of 5 simulated values of std(r-pbis).
4.5 Histogram of 1 posterior predictive p-values.
4.6 Residual plots of the two guessers, examinee 236 and
4.7 Residual plots of examinee
4.8 Histogram of the 995 non-guessers.
4.9 Histogram of 5 simulated values of std(r-pbis).
4.10 The 90% interval bands for the item response curves of items 11 (upper left), 3 (upper right), 33 (lower left), and 34 (lower right) fitted with the one-parameter IRT model.
4.11 The 90% interval bands for the item response curves of items 14 (left) and 15 (right) fitted with the one-parameter IRT model.
4.12 Latent residual plots of six students marked as potential guessers by the W and L statistics using the posterior predictive distribution.
5.1 Scatterplots of the exact values versus the approximate values of the log-denominator of the Bayes factor.
5.2 Parameter estimates obtained using the exchangeable model compared with the actual values: (left) item difficulty, and (right) ability scores.
5.3 Item parameter and ability score estimates obtained using the exchangeable model compared with the observed data: (left) item difficulty vs. number of correct students, and (right) ability scores vs. students' raw scores.
5.4 Scatterplot of the discrimination estimates obtained using the Exchangeable model and the Two-parameter model.
5.5 Estimates obtained using the two exchangeable models (one with random s_a and one with fixed s_a = .25) compared: (left) item difficulty; (right) ability scores.
5.6 Estimates obtained using the One-parameter model and the exchangeable model with fixed s_a = .1 compared: (left) item difficulty; (right) item discrimination.
5.7 Estimates obtained using the One-parameter model and the exchangeable model with fixed s_a = 1 compared: (left) item difficulty; (right) ability scores.
5.8 Scatterplot of estimates of ability scores obtained using the One-parameter model and the exchangeable model with fixed s_a =
5.9 Histogram of 1 log10 BF of the Exchangeable model (s_a = .25) vs. the (left) Two-parameter model and (right) One-parameter model.
5.10 Values of log10 BF of exchangeable models with varying standard deviations compared to the approximate Two-parameter model. The right plot is a closer look at the peak of the graph.
5.11 Values of log10 BF of exchangeable models with varying standard deviations compared to the approximate One-parameter model. The right plot is a closer look at the peak of the graph.
5.12 (left) Scatterplot of the actual vs. estimated item discrimination parameters. (right) Estimated probability of each item having an outlying discrimination parameter. Note that items 1, 2, and 3 have much bigger probabilities than the rest.
5.13 Values of log10 BF of exchangeable models with varying standard deviations compared to the two-parameter model using the BGSU Math placement data set. The right plot is a closer look at the peak of the graph.
5.14 Histogram of the 1 posterior sample values of µ_a for the BGSU Math placement data using the exchangeable model with s_a =
5.15 Histogram of the 1 posterior sample values of µ_a for the BGSU Math placement data using the exchangeable model with s_a =
LIST OF TABLES

1.1 First and Second Derivatives of Item and Ability Parameters for the Two-Parameter Logistic Model.
1.2 Two extreme questions in the exam.
2.1 Levels of evidence by log10 BF.
4.1 Orlando and Thissen (2000) simulation results: proportion of significant p-values (< .05).
4.2 Percentage of p-values < .05 out of
4.3 Percentage of p-values < .05 out of
4.4 Percentage of significant p-values when the one-parameter probit model is used on items with no guessing parameter (c = 0).
4.5 Percentage of significant p-values when the one-parameter probit model is used on items with guessing parameter value of c =
4.6 Percentage of significant p-values when the two-parameter probit model is used on items with no guessing parameter (c = 0).
4.7 Percentage of significant p-values when the two-parameter probit model is used on items with guessing parameter value of c =
4.8 Percentage of p-values < .05 out of 1 using G2 (pp1 and pp2 represent the one-parameter and two-parameter probit model).
4.9 The 17 misfitted examinees with P_W < .05 (* signifies a guesser).
4.10 The 16 misfitted examinees with P_L < .05 (* signifies a guesser).
4.11 The percentage of P_L and P_W < .05 (* signifies a guesser).
5.1 Twenty simulated observations from the Beta-binomial model.
5.2 Twenty generated binomial observations.
5.3 Range of values of log10 BF.
5.4 Levels of evidence by log10 BF.
5.5 Barry Bonds hitting data from 1986 to
5.6 The log10 BF(M_l,out/M) for each item in the artificial data. Note that the values of log10 BF(M_l,out/M) for items 1, 2, and 3 are all bigger than 3, marking them as items with outlying discrimination parameters.
5.7 The γ̂ for each item represents the likelihood that its discrimination parameter is outlying. Note that the values of γ̂ for items 1, 2, and 3 are all much bigger than the rest, marking them as items with outlying discrimination parameters.
5.8 Bayesian estimates of â_j, log10 BF, and γ for the BGSU Math placement exam.
OVERVIEW

The focus of this dissertation is to discuss the available model diagnostic procedures for Item Response Theory (IRT) models and to propose new methodologies to assess the goodness-of-fit of these models. The first two chapters cover the material needed to understand the different IRT models and some Bayesian ideas which will be utilized later. Chapters 3, 4, and 5 cover the proposed Bayesian methodologies to assess the goodness-of-fit of the IRT models.

In Chapter 1, the different IRT models used in this work are introduced. Classical and Bayesian methods to estimate the parameters of the IRT models are also discussed, including some numerical methods like Newton-Raphson and Gibbs sampling. These methods are illustrated using a Mathematics placement data set from BGSU. The chapter ends with a discussion of the advantages of the Bayesian estimation method over the classical estimation method.

Chapter 2 covers the ideas of classical and Bayesian residuals. The concept of residuals is used to construct different chi-squared indices which are currently used to check the model fit of IRT models. These indices are used later within a Bayesian framework as discrepancy measures. The ideas of predictive distributions and measures of surprise are also discussed in this chapter. These standard Bayesian ideas are useful for constructing reference distributions for different test quantities and discrepancy measures. Another important Bayesian concept that will be employed later is the Bayes factor, which is introduced in the last section of this chapter.
Chapter 3 deals mostly with graphical procedures that can be used to assess the fit of IRT models. These visual diagnostic plots are constructed from Bayesian residuals. The item response curve probability interval band proposed by Albert (1999) is a simple but very useful plot for checking item fit. This plot is described in the first section. Two other diagnostic plots, the examinee Bayesian residual and latent residual plots, are proposed in the second section. These two plots are utilized to check how a particular examinee performed on the test. They may also help detect examinees who were simply guessing their responses. In the third section, a Bayesian procedure to detect examinees who scored much too low or much too high on the exam is proposed based on another Bayesian residual. These Bayesian methods and plots are applied to a real data set in the last section.

In Chapter 4, new quantitative methods are proposed to give objective assessments of the fit of IRT models. In particular, the prior, posterior, and partial posterior predictive distributions are used to construct reference distributions for the standard deviation of the item point-biserial correlations and for eight different discrepancy measures: the six χ2-indices described in Chapter 2 and two more discrepancy measures for person fit. A simulation study is performed to illustrate and compare the effectiveness of these different discrepancy measures and predictive distributions in detecting misfitted items and examinees, as well as overall model misfit. The chapter ends with the application of these predictive methods to a real data set.

In Chapter 5, the Bayes factor is used to illustrate a quantitative method for comparing the goodness-of-fit of different IRT models and for model selection. The first section of this chapter covers different numerical methods that could be used to calculate the Bayes factor.
These methods are then modified and applied to estimate the Bayes factor for IRT models
in later sections. The Bayes factor is used to choose between competing IRT models and to detect outlying discrimination parameters. The effectiveness of this method is illustrated using simulated data. Again, these methods are applied to a real data set in the last section. Finally, the last chapter gives a summary of all the proposed Bayesian methods, along with discussions of their performance in assessing the goodness-of-fit of different IRT models.
CHAPTER 1
ITEM RESPONSE THEORY MODELS

1.1 Introduction

Item Response Theory (IRT) models are commonly used in educational and psychological testing. In these fields of study, researchers are usually interested in measuring an underlying ability of examinees, such as intelligence, mathematical ability, or scholastic aptitude. However, these kinds of quantities cannot be measured directly the way one measures physical attributes like weight or height. In this sense, these underlying abilities are latent traits. One of the main objectives of IRT is to measure the amount of (latent) ability that an examinee possesses. This is usually done using a questionnaire or an examination. It is important that the items used in the questionnaire or test are appropriate to accurately and effectively measure the underlying trait. Consequently, the second main objective of IRT is to study the effectiveness of different test items in measuring a particular underlying trait.

Although the idea of IRT has been around for almost a century, it only became popular in the last two decades. This is mainly due to the extensive computational requirements of IRT methods. Up until the 1980s, Classical Test Theory (CTT) had been the mainstay of psychological and educational test development and test score analysis. The classic book of Gulliksen (1950) is often cited as the defining volume for CTT. Today there are countless achievement, aptitude, and personality tests that were constructed using CTT models and procedures.

However, there are many well-documented shortcomings of the ways in which educational and psychological tests are usually constructed, evaluated, and used within CTT (Hambleton & van der Linden, 1982). For one, the values of commonly used item statistics in test development, such as item difficulty, depend on the particular sample of examinees from which they were obtained. That is, one particular item can be labeled as easy when given to a group of well-prepared students and as difficult when given to a group of unprepared students. For more information about the shortcomings of CTT, see the book by Hambleton and Swaminathan (1985).

By the late 1980s, the power of computers had developed to a point where it allowed people working in measurement theory to employ the more computationally intensive methods of IRT. This new theory is conceptually more powerful than CTT [11]. Based upon items rather than test scores, IRT addresses most of the shortcomings of CTT. In other words, IRT can do all the things that CTT can do and more. An extensive comparison between these two theories is given in the book by Embretson and Reise (2000).

1.2 Item Response Curve

In this dissertation, the latent ability of examinees (usually denoted by θ) is assumed to be continuous and one-dimensional. That means that the performance of an examinee on a particular item of an exam depends only on this one characteristic. Theoretically, the range of this latent variable is from negative infinity to positive infinity, but for most practical purposes it is sufficient to limit this range to between -3 and 3. An examinee with a higher ability score is expected to perform better in answering a particular item in the test than an examinee with a lower ability score.

In the case where items in the test can only be answered either correctly or incorrectly,
let y denote the examinee's response to a particular item, and take y = 1 if the response is correct and y = 0 if incorrect. This is a Bernoulli random variable with success probability p that depends on the latent ability of the examinee. That is, p = Pr(y = 1) = F(θ), where F represents a known function, called the link function. Because p should be an increasing function of θ and should take on values between 0 and 1, a natural class for the function F is provided by the class of cumulative distribution functions, or cdfs. The two most commonly used link functions in IRT models are:

1. Probit link (standard normal cdf):

F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-t^2/2} \, dt, \quad x \in R.

2. Logistic link (standard logistic distribution function):

F(x) = \frac{e^x}{1 + e^x}, \quad x \in R.

Inferences obtained using either link function are essentially the same. Previously, people working with IRT models preferred the logistic link because of its nice properties that simplify the mathematical calculations in parameter estimation. However, with the advancement of computing power and the introduction of Bayesian methods in parameter estimation, the probit link has gained more popularity as it is more natural and easier to implement numerically.

In the IRT model, an examinee with a certain ability level will have a certain probability of answering a particular item correctly. Plotting these probabilities against the corresponding ability scores yields a plot like the one shown in Figure 1.1. This curve is called an Item Response Curve (IRC).

Figure 1.1: A typical item response curve.

1.3 Common IRT Models

One-Parameter Model

The probability that an examinee will answer a particular item in a test correctly should also depend on the characteristics of the item. For example, if item 2 is more difficult than item 1, then the probability that a particular examinee will get item 2 correct should be lower than the probability that he/she gets item 1 correct. Under the assumption that each item in the test can be described using this single difficulty parameter, one could model the probability of correctly answering a particular item in the test by

Pr(y = 1) = F(\theta - b), \quad (1.3.1)
where b represents the difficulty parameter of the item. To see the effect of b on the Item Response Curve (IRC), consider the three different plots given in Figure 1.2 with varying difficulty values.

Figure 1.2: Item response curves for 3 different difficulty values.

Note that b serves as a location parameter. When b takes negative values, the IRC is shifted to the left and the probability that a particular examinee, with a certain ability score θ, correctly answers the item increases. Hence, lower b values correspond to easier items and higher b values correspond to more difficult items.

When the link function F is taken to be the cumulative distribution function of the standard normal distribution (denoted by Φ), this model is known as the One-parameter probit model. But when the link function F is taken as the logistic cumulative distribution function, this model becomes the famous Rasch model (Rasch, 1966):

Pr(Y = 1) = \frac{e^{\theta - b}}{1 + e^{\theta - b}}. \quad (1.3.2)

Two-Parameter Model

Suppose that each item in the exam can be described by two parameters: a discrimination parameter a_j and a difficulty parameter b_j. Then the probability that a particular examinee with latent ability score θ_i correctly answers item j is modeled as

Pr(Y_{ij} = 1 | \theta_i) = F(a_j \theta_i - b_j). \quad (1.3.3)

Again, when the link function F is taken to be the cumulative distribution function of the standard normal distribution, this model is known as the Two-parameter probit model; when F is taken as the logistic cumulative distribution function, the model is called the Two-parameter logit model.

To see the effect of the discrimination parameter a on the item response curve, consider the three different plots shown in Figure 1.3 with varying discrimination values. Note that a serves as a scale parameter that represents the slope of the item response curve. It indicates how well a particular item discriminates between students with different abilities. Take for example two examinees, one with ability score 0 and another with ability score 1. If an item has a discrimination parameter value of 0.5, then the difference in the probabilities of getting the correct answer to this item by these two examinees will be about 0.19 (see Figure 1.4). On the other hand, if an item has a discrimination parameter value of 2, then this difference in probabilities will be about 0.48.
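These two probability differences are easy to verify numerically. The short sketch below is mine (the dissertation's own programs are in Matlab); it evaluates a two-parameter probit IRC, Pr(y = 1) = Φ(aθ − b), at ability scores 0 and 1 for a low- and a high-discrimination item with b = 0:

```python
from math import erf, sqrt

def probit(x):
    # Standard normal cdf, written via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def irc(theta, a, b):
    # Two-parameter probit item response curve: Pr(y = 1) = Phi(a*theta - b).
    return probit(a * theta - b)

for a in (0.5, 2.0):
    diff = irc(1.0, a, 0.0) - irc(0.0, a, 0.0)
    print(f"a = {a}: difference = {diff:.2f}")  # about 0.19 for a = 0.5, 0.48 for a = 2
```

The steeper curve (larger a) separates the two examinees more sharply, which is exactly what "discrimination" means here.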
Figure 1.3: Item response curves for 3 different discrimination values.

Figure 1.4: Items with high discrimination power have higher chances of distinguishing two examinees with different ability scores than items with low discrimination power.
Hence, the item with the higher discrimination parameter value has a better chance of distinguishing the examinee with the higher ability score.

Three-Parameter Model

Sometimes, especially on multiple-choice items, examinees can get the correct answer purely by guessing. To include this guessing behavior in the model, one could model the success probability as

Pr(y_{ij} = 1 | \theta_i) = c_j + (1 - c_j) F(a_j \theta_i - b_j), \quad (1.3.4)

where c_j represents the probability that any examinee will get item j correct by pure guessing. This model is known as the Three-parameter probit model when the standard normal cumulative distribution is used as the link function; when the logistic link function is used, it is called the Three-parameter logit model. The latter model was introduced by Birnbaum.

Exchangeable IRT Model

The one-parameter IRT model assumes that all items in the exam have the same discrimination parameter (usually all equal to one), while the two-parameter IRT model assumes that each item can have a different discrimination parameter value. Some people think that the one-parameter model is too restrictive, while others think that the two-parameter model is over-parameterized. In the Bayesian framework, there is a way to compromise between these two models. This is achieved by considering an exchangeable IRT model in which the item discrimination parameter values are shrunk toward a common value. More details about this model will be discussed in Chapter 5, where it will be used extensively.

1.4 Parameter Estimation

There are two main methods of obtaining estimates for the parameters in the above-mentioned models: classical Joint Maximum Likelihood Estimation (JMLE) and Bayesian estimation. In either case, one has to work with the likelihood function. To facilitate the discussion, these two estimation methods are presented using only the two-parameter IRT model; both procedures can be easily modified to work for the other IRT models.

Likelihood Function

Let y_{i1}, y_{i2}, ..., y_{ik} denote the binary responses of the ith individual to k test items, and let a = (a_1, ..., a_k) and b = (b_1, ..., b_k) be the vectors of item discrimination and difficulty parameters, respectively. Assuming that an individual taking the test answers each item independently (the local independence assumption), the probability of observing the entire sequence of responses of the ith individual is given by

Pr(Y_{i1} = y_{i1}, ..., Y_{ik} = y_{ik} | \theta_i, a, b) = \prod_{j=1}^{k} Pr(Y_{ij} = y_{ij} | \theta_i, a, b)
  = \prod_{j=1}^{k} F(a_j \theta_i - b_j)^{y_{ij}} [1 - F(a_j \theta_i - b_j)]^{1 - y_{ij}}.
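Under local independence, the probability of an examinee's whole response vector is just the product of the per-item terms above. A minimal sketch of this computation (mine, not from the dissertation; the logistic link is used here):

```python
from math import exp

def logistic(x):
    # Standard logistic cdf: F(x) = e^x / (1 + e^x).
    return 1.0 / (1.0 + exp(-x))

def response_prob(y, theta, a, b):
    # Pr(Y_i1 = y_1, ..., Y_ik = y_k | theta, a, b)
    #   = prod_j F(a_j*theta - b_j)^y_j * [1 - F(a_j*theta - b_j)]^(1 - y_j)
    prob = 1.0
    for y_j, a_j, b_j in zip(y, a, b):
        p_j = logistic(a_j * theta - b_j)
        prob *= p_j if y_j == 1 else (1.0 - p_j)
    return prob

# Three items with a = 1, b = 0 and an examinee with theta = 0: each item
# has p = 0.5, so any particular response pattern has probability 0.5^3 = 0.125.
print(response_prob([1, 0, 1], 0.0, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))
```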
Finally, if the responses of each of the n individuals to the test items are assumed to be independent, then the likelihood function for all responses of all individuals is

L(\theta, a, b) = \prod_{i=1}^{n} \prod_{j=1}^{k} F(a_j \theta_i - b_j)^{y_{ij}} [1 - F(a_j \theta_i - b_j)]^{1 - y_{ij}}. \quad (1.4.1)

This function represents the likelihood of obtaining the observed data as a function of the model parameters. Therefore, it is logical to estimate these model parameters by the values that maximize this likelihood function. This is what Maximum Likelihood Estimation (MLE), or Joint Maximum Likelihood Estimation (JMLE), is all about.

Joint Maximum Likelihood Estimation

One of the most common ways of maximizing a likelihood function is to take its partial derivatives with respect to each parameter in the model and set them to zero. Because likelihood functions are most often expressed as products of several density functions, it is usually more convenient to maximize the natural logarithm of the likelihood, ln(L). Since logarithmic functions are increasing on R, the maximum of the likelihood function occurs at the same point as the maximum of the log-likelihood. In the case of the two-parameter IRT model, the log-likelihood is

\ln L = \sum_{i=1}^{n} \sum_{j=1}^{k} \{ y_{ij} \ln(p_{ij}) + (1 - y_{ij}) \ln(1 - p_{ij}) \}, \quad (1.4.2)

where p_{ij} = F(a_j \theta_i - b_j). Taking the partial derivatives with respect to each parameter and setting them to zero yields a system of n + 2k equations in the same number of unknowns. The solutions of this system of equations are the potential maximum likelihood estimates of the model parameters. For this reason, people working with IRT models preferred to use the logistic link, because it simplifies the derivative expressions nicely and greatly facilitates the required calculations.

For the two-parameter logistic IRT model, where p_{ij} = \frac{e^{a_j \theta_i - b_j}}{1 + e^{a_j \theta_i - b_j}}, the first partial derivatives are given by

\partial p_{ij} / \partial a_j = p_{ij} q_{ij} \theta_i, \quad \partial p_{ij} / \partial b_j = -p_{ij} q_{ij}, \quad \partial p_{ij} / \partial \theta_i = p_{ij} q_{ij} a_j,

where q_{ij} = 1 - p_{ij}. Using these partial derivatives, the first and second partial derivatives of the log-likelihood (1.4.2) under the logistic link can be obtained easily; they are summarized in Table 1.1.

  \partial \ln L / \partial a_j            = \sum_{i=1}^{n} \theta_i (y_{ij} - p_{ij})
  \partial \ln L / \partial b_j            = -\sum_{i=1}^{n} (y_{ij} - p_{ij})
  \partial \ln L / \partial \theta_i       = \sum_{j=1}^{k} a_j (y_{ij} - p_{ij})
  \partial^2 \ln L / \partial a_j^2        = -\sum_{i=1}^{n} p_{ij} q_{ij} \theta_i^2
  \partial^2 \ln L / \partial b_j \partial a_j = \sum_{i=1}^{n} p_{ij} q_{ij} \theta_i
  \partial^2 \ln L / \partial b_j^2        = -\sum_{i=1}^{n} p_{ij} q_{ij}

Table 1.1: First and Second Derivatives of Item and Ability Parameters for the Two-Parameter Logistic Model.

However, even with the logistic link, the resulting equations are not linear. Thus, to get the maximum likelihood estimates, one needs to solve these equations numerically. Two
popular numerical methods used for this purpose are the Newton-Raphson algorithm and Fisher's Method of Scoring (see Appendix). Using the mathematical software Matlab, the author has written programs implementing the Newton-Raphson algorithm to estimate the parameters of the two-parameter logistic model. This program is described in full in the Appendix under the name pl2 mle.

To show how close the JMLE estimates are to the actual parameters, a simple simulation was performed in which a data set of 0s and 1s was generated using 1000 simulated ability scores and 35 test items, each with 2 parameters. The 1000 simulated ability scores and the 35 item difficulty parameter values were generated from N(0, 1), while the 35 item discrimination parameters were randomly selected from the possible values {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6}. Once the parameter values were specified, the probability that a given simulated student answers a particular item correctly was computed using the logistic link, producing a 1000 × 35 matrix of probabilities. Finally, this matrix was converted into a matrix of 0s and 1s to simulate a particular exam result.

Using the 1000 × 35 data matrix of simulated responses, the JMLE estimates were obtained using the program pl2 mle. The two scatterplots shown in Figure 1.5 display the relationship between the actual item parameters and their JMLE estimates. The left plot of Figure 1.5 shows a linear pattern of dots lying very close to the line y = x, which illustrates the accuracy of the estimates of the 35 difficulty parameters. The right plot of Figure 1.5 also shows a linear trend, but the dots in this plot are more scattered, revealing lower precision for the estimates of the discrimination parameters of the 35 items. Also, notice that the linear pattern lies slightly above the line y = x, suggesting a positive bias in the estimation of the discrimination parameters by the JMLE. This positive bias was previously noted by Lord
(1983).

Figure 1.5: Scatterplots of 35 actual item parameters versus their corresponding estimates (left: item difficulty, classical approach, r = 0.9965; right: item discrimination, classical approach, r = 0.981).

Figure 1.6: Scatterplot of 1000 actual ability scores versus their corresponding estimates (classical approach, r = 0.8964).

The scatterplot of the actual ability scores versus their estimated values is given in Figure 1.6. Again, notice the linear pattern of dots clustering around the line y = x. The variability of the estimates around this line depends on the number of items in the exam: if there were more items in the exam, these dots would lie closer to the line y = x.
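The data-generating step of this simulation can be sketched as follows. This is an illustrative Python sketch, not the author's Matlab program pl2 mle; the sizes (1000 examinees, 35 items) and parameter distributions are taken from the description above.

```python
# Sketch of the simulation design described above: abilities and
# difficulties drawn from N(0,1), discriminations from a fixed grid,
# and 0/1 responses generated through the logistic link.
import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 35

theta = rng.standard_normal(n)      # simulated ability scores ~ N(0,1)
b = rng.standard_normal(k)          # item difficulty parameters ~ N(0,1)
a = rng.choice([0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6], size=k)

eta = np.outer(theta, a) - b        # n x k matrix of a_j*theta_i - b_j
p = 1.0 / (1.0 + np.exp(-eta))      # logistic link probabilities
y = (rng.random((n, k)) < p).astype(int)   # simulated 0/1 response matrix
```

Running a joint Newton-Raphson routine on y and plotting the estimates against theta, a, and b yields scatterplots of the kind shown in Figures 1.5 and 1.6.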
This plot also reveals the increased variability of the estimates at extreme ability scores.

Bayesian Estimation

In the classical (or frequentist) framework, the parameters in a model are considered fixed quantities. In the Bayesian framework, these parameters, ξ = (ξ_1, ..., ξ_N), are instead considered random variables that follow a certain distribution, π(ξ). Bayesian methodology requires the specification of a prior distribution, π_0(ξ), for each parameter ξ in the model; this represents the prior belief regarding the parameters. After observing the data through the likelihood function, L(data; ξ), the belief about the parameters is modified (or updated) by computing their posterior distributions, π(ξ | data). This is done with the Bayes Rule formula:

\[
P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B)} \propto P(B \mid A)\,P(A),
\]

or, in our terms,

\[
\pi(\xi \mid \text{data}) \propto L(\text{data}; \xi)\,\pi_0(\xi).
\]

Once the posterior distributions of the parameters are obtained, all inferences pertaining to these parameters are based on their respective posterior distributions.

For the Bayesian method of estimation, the probit link is more natural and easier to implement, as will be seen later. For this reason, the Bayesian estimation method is discussed using the two-parameter probit model. Using Bayes rule, the joint posterior distribution is proportional to the product of the likelihood function obtained earlier and the joint prior density of the parameters. That is,

\[
\pi(\theta, a, b \mid \text{data}) \propto \prod_{i=1}^{n}\prod_{j=1}^{k} \Phi(a_j\theta_i - b_j)^{y_{ij}}\,\bigl[1 - \Phi(a_j\theta_i - b_j)\bigr]^{1-y_{ij}} \; \pi_0(\theta, a, b), \qquad (1.4.3)
\]
where Φ is the standard normal cumulative distribution function and π_0(θ, a, b) is the joint prior density of the parameters in the model. It is standard practice to use values of θ_i mostly between −3 and 3; for this reason, a N(0, 1) prior is assigned to θ_i, i = 1, ..., n. This also resolves the nonidentifiability of the parameters in the model. To avoid the problem of unbounded estimates of the item difficulty parameters, b_j is assigned a N(0, s_b) prior, j = 1, ..., k, where s_b < 5. Finally, a N(0, s_a) prior is assigned to a_j, j = 1, ..., k, where s_a is fixed. For simplicity, s_b and s_a were both set to 1 in the actual computations. Combining these prior densities with the likelihood function, the posterior density of the two-parameter IRT model is proportional to

\[
\pi(\theta, a, b \mid \text{data}) \propto L(\theta, a, b) \prod_{i=1}^{n} \phi(\theta_i; 0, 1) \prod_{j=1}^{k} \phi(b_j; 0, s_b)\,\phi(a_j; 0, s_a). \qquad (1.4.4)
\]

As mentioned before, all Bayesian inferences about a parameter are based on its posterior distribution. Consequently, Bayesian analysis requires studying the important parameters through this joint posterior distribution or their corresponding marginal posterior distributions. However, it is quite difficult to study this complicated posterior distribution, or to derive the marginal posterior distributions, analytically. An alternative is to simulate values of the parameters from the joint posterior distribution; inferences about a parameter can then be made using this sample. For example, one could take the average of the sample to serve as an estimate of the posterior mean of the parameter, or construct an approximate 95% probability interval for the parameter. However, drawing a sample from a high-dimensional posterior distribution is not an easy task. Fortunately, there is Gibbs Sampling (Geman and Geman, 1984).
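Although the posterior (1.4.4) cannot be studied analytically, it can at least be evaluated pointwise. The sketch below, an illustration in Python rather than anything from the author's Matlab code, evaluates the log of the unnormalized posterior under the probit link with s_a = s_b = 1 as in the text.

```python
# Log of the unnormalized posterior (1.4.4): probit likelihood plus
# N(0,1) priors on the thetas and N(0, s) priors on the a's and b's.
import numpy as np
from scipy.stats import norm

def log_posterior(theta, a, b, y, s_a=1.0, s_b=1.0):
    eta = np.outer(theta, a) - b                   # a_j*theta_i - b_j
    p = np.clip(norm.cdf(eta), 1e-12, 1 - 1e-12)   # Phi(eta), guarded from 0/1
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    logprior = (norm.logpdf(theta).sum()
                + norm.logpdf(b, scale=s_b).sum()
                + norm.logpdf(a, scale=s_a).sum())
    return loglik + logprior
```

Such a function is also the building block for more general MCMC schemes (e.g. Metropolis-Hastings), which only require the unnormalized log-posterior.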
Gibbs sampling, as discussed in Gelfand and Smith (1990), is a special type of Markov chain Monte Carlo (MCMC) that makes use of the full conditional distributions of sets of parameters. The idea is that, to simulate from f(x, y, z) (the joint distribution of X, Y, and Z), one iteratively draws from the full conditional distributions. That is, from initial values x_0, y_0, and z_0, one draws x_1 from g(x | y_0, z_0), then y_1 from g(y | x_1, z_0), and then z_1 from g(z | x_1, y_1). This constitutes a single iteration of the Gibbs sampler. To simulate m points from f(x, y, z), simply repeat this cycle m + l times, where l is the number of cycles it takes to converge to the desired distribution (also called the burn-in period). The points from the last m cycles can be considered a (dependent) sample drawn from the joint distribution f(x, y, z). Some authors instead repeat the cycle km + l times and keep every kth point among the last km points, in order to reduce the dependence among the sample points.

However, Gibbs sampling assumes that it is possible to simulate from the full conditional distributions. If each full conditional turns out to be a common standard distribution that can be simulated from directly or easily, there is no problem. But if some of these distributions are nonstandard density functions, one may need to employ a more general MCMC algorithm, like the Metropolis-Hastings (MH) algorithm, to obtain a sample from them (see the Appendix for details on the MH algorithm). Sometimes even finding the full conditional distributions from a joint distribution is a problem; for complicated distributions, like our joint posterior given in equation (1.4.4), it can be very challenging.
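The cycle just described can be illustrated on a toy target where the full conditionals are known in closed form: for a standard bivariate normal with correlation ρ, x | y ~ N(ρy, 1 − ρ²) and y | x ~ N(ρx, 1 − ρ²). The Python sketch below implements the burn-in and thinning scheme described above on this toy example; it is purely illustrative and unrelated to the IRT posterior.

```python
# Toy Gibbs sampler with burn-in and thinning, for a bivariate normal
# with correlation rho; both full conditionals are univariate normals.
import numpy as np

def gibbs_bvn(rho, m, burn_in, thin, seed=0):
    rng = np.random.default_rng(seed)
    s = np.sqrt(1.0 - rho**2)
    x = y = 0.0                        # initial values x_0, y_0
    draws = []
    for t in range(burn_in + m * thin):
        x = rng.normal(rho * y, s)     # x | y ~ N(rho*y, 1 - rho^2)
        y = rng.normal(rho * x, s)     # y | x ~ N(rho*x, 1 - rho^2)
        if t >= burn_in and (t - burn_in) % thin == 0:
            draws.append((x, y))       # keep every thin-th post-burn-in point
    return np.array(draws)

sample = gibbs_bvn(rho=0.8, m=2000, burn_in=500, thin=5)
```

With enough retained draws, sample means, variances, and the sample correlation all approach the true values of the target distribution, which is how one checks a Gibbs implementation before trusting it on a model like (1.4.4).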
Albert's Gibbs Sampler

To facilitate the computation of these full conditional distributions, Albert (1992) introduced a latent variable Z_ij that has a normal distribution with mean m_ij = a_jθ_i − b_j and variance 1. This continuous variable serves as the underlying mechanism that generates the responses: the response is positive (y_ij = 1) when Z_ij > 0 and negative (y_ij = 0) when Z_ij < 0. This ingenious idea greatly simplifies the simulation of samples from the conditional posterior distributions, as they turn out to be just variations of the normal distribution. With the introduction of these continuous latent data Z = (Z_11, ..., Z_nk), the joint posterior density of all model parameters is given by

\[
\pi(\theta, Z, a, b \mid \text{data}) \propto \prod_{i=1}^{n}\prod_{j=1}^{k} \bigl[\phi(Z_{ij}; m_{ij}, 1)\, I(Z_{ij}, y_{ij})\bigr] \prod_{i=1}^{n} \phi(\theta_i; 0, 1) \prod_{j=1}^{k} \phi(b_j; 0, s_b)\,\phi(a_j; 0, s_a), \qquad (1.4.5)
\]

where I(z, y) is equal to 1 when {z > 0, y = 1} or {z < 0, y = 0}, and equal to 0 otherwise. To simulate from the joint posterior (1.4.5), the Gibbs sampling procedure can iteratively draw from three sets of conditional probability distributions: g(Z | θ, (a, b), data), g(θ | Z, a, b, data), and g((a, b) | Z, θ, data).

The conditional posterior distribution of Z_ij given (θ_i, a_j, b_j, data) is simply a truncated normal distribution with mean m_ij = a_jθ_i − b_j and variance 1. The truncation is from the left of 0 if the corresponding response is correct (y_ij = 1), and from the right of 0 if it is incorrect (y_ij = 0). The conditional posterior distribution of θ_i given (Z_ij, a_j, b_j, data) is a normal distribution with mean and variance

\[
m_{\theta_i} = \frac{\sum_{j=1}^{k} a_j (Z_{ij} + b_j)}{\sum_{j=1}^{k} a_j^2 + 1}
\qquad\text{and}\qquad
\nu_{\theta_i} = \frac{1}{\sum_{j=1}^{k} a_j^2 + 1}.
\]
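A minimal Python sketch of the first two conditional draws in one Gibbs cycle follows, assuming the truncated-normal and normal forms stated above; the (a_j, b_j) update is omitted, and this is not the author's Matlab implementation.

```python
# One Z-draw and one theta-draw of Albert's sampler. Z_ij is truncated
# normal around m_ij = a_j*theta_i - b_j (left-truncated at 0 if y_ij = 1,
# right-truncated at 0 if y_ij = 0); theta_i | Z is normal with the mean
# and variance given in the text.
import numpy as np
from scipy.stats import truncnorm

def draw_Z(theta, a, b, y, rng):
    m = np.outer(theta, a) - b
    # truncnorm uses bounds on the standard scale, so shift them by -m
    lo = np.where(y == 1, 0.0, -np.inf) - m
    hi = np.where(y == 1, np.inf, 0.0) - m
    return m + truncnorm.rvs(lo, hi, random_state=rng)

def draw_theta(Z, a, b, rng):
    prec = np.sum(a**2) + 1.0                # posterior precision sum a_j^2 + 1
    mean = (Z + b) @ a / prec                # sum_j a_j (Z_ij + b_j) / prec
    return rng.normal(mean, np.sqrt(1.0 / prec))
```

Cycling these draws together with the (a_j, b_j) draw described next yields the full Gibbs sampler for the two-parameter probit model.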
Finally, the conditional posterior distribution of (a_j, b_j) given (Z_ij, θ_i, data) is the multivariate normal distribution with mean

\[
M_j = (X'X + \Sigma^{-1})^{-1}\,(X'Z_j + \Sigma^{-1}\mu)
\]

and covariance matrix

\[
\nu_j = (X'X + \Sigma^{-1})^{-1},
\]

where μ is the vector of prior means of (a_j, b_j) (here 0), Σ = diag(S_a², S_b²) is the prior covariance matrix, Z_j = (Z_1j, ..., Z_nj)', and X is the known covariate matrix with ith row (θ_i, −1). For more details on these conditional posterior distributions, see Albert and Johnson (1999).

To implement Albert's Gibbs sampler on the two-parameter probit IRT model with a burn-in period, the author modified Albert's Matlab program to obtain the program pp2 bay (see Appendix). To see how close the estimates are to the actual parameters, the same simulated parameter values that were used in the previous section were used to generate a data set of 0s and 1s using the probit link. Using the generated 1000 × 35 data matrix of responses, the Bayesian estimates were obtained using the program pp2 bay.

The two scatterplots shown in Figure 1.7 display the relationship between the actual item parameters and their Bayesian estimates. The left plot in Figure 1.7 shows a linear pattern of dots that resembles very closely the corresponding plot obtained earlier using the classical approach, shown in Figure 1.5. The correlation coefficient between the actual difficulty values and their Bayesian estimates was 0.9948, which indicates the accuracy of the Bayesian estimates. The plot on the right of Figure 1.7 also shows a linear pattern of dots that looks similar to the one obtained using the JMLE method, except that the Bayesian item discrimination estimates are better, since they are centered around values close to
Figure 1.7: Scatterplots of 35 actual item parameters versus their corresponding Bayesian estimates (left: item difficulty, r = 0.9948; right: item discrimination, r = 0.9794).

Figure 1.8: Scatterplot of 1000 actual ability scores versus their corresponding Bayesian estimates (r = 0.9557).
the actual discrimination values. Figure 1.8 shows a very nice linear pattern around the line y = x, which illustrates the accuracy of the Bayesian estimates of the ability scores. In addition, the problem of higher variability at extreme values that was observed earlier, when the JMLE method was used, no longer exists.

1.5 An Example - BGSU Mathematics Placement Exam

Every year, the Mathematics and Statistics Department of BGSU administers a placement exam to determine the proficiency of the incoming freshman students. In 2004, there were three different exams (A, B, and C) given to a total of 557 students. Exam A was composed of 35 questions and was given to a total of 1286 students. Data set A contains the results of these 1286 students on each of the 35 items. It is a table of 0s and 1s with a_ij = 1 when the ith student answered the jth item correctly and a_ij = 0 otherwise. To illustrate the kind of results that one gets from the estimation procedures discussed in the previous two sections, those methods were applied to the responses of the 1286 students who took Exam A.

A. Using Joint Maximum Likelihood Estimation

Before the JMLE procedure could be applied to the data set for Exam A, students who got either a score of zero or a perfect score, as well as items that were answered correctly or incorrectly by all students, had to be removed to avoid getting unreasonable results (this issue is discussed in the last section of this chapter). After checking, 11 students (2 zero scores and 9 perfect scores) were removed from the data set. The JMLE procedure was then applied to this slightly smaller data set and the resulting item parameter estimates
More informationSummer School in Applied Psychometric Principles. Peterhouse College 13 th to 17 th September 2010
Summer School in Applied Psychometric Principles Peterhouse College 13 th to 17 th September 2010 1 Two- and three-parameter IRT models. Introducing models for polytomous data. Test information in IRT
More informationOn the Use of Nonparametric ICC Estimation Techniques For Checking Parametric Model Fit
On the Use of Nonparametric ICC Estimation Techniques For Checking Parametric Model Fit March 27, 2004 Young-Sun Lee Teachers College, Columbia University James A.Wollack University of Wisconsin Madison
More informationBayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference
1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Board of Governors or the Federal Reserve System. Bayesian Estimation of DSGE
More informationProbability and Estimation. Alan Moses
Probability and Estimation Alan Moses Random variables and probability A random variable is like a variable in algebra (e.g., y=e x ), but where at least part of the variability is taken to be stochastic.
More informationComparison between conditional and marginal maximum likelihood for a class of item response models
(1/24) Comparison between conditional and marginal maximum likelihood for a class of item response models Francesco Bartolucci, University of Perugia (IT) Silvia Bacci, University of Perugia (IT) Claudia
More informationMarkov Chain Monte Carlo (MCMC) and Model Evaluation. August 15, 2017
Markov Chain Monte Carlo (MCMC) and Model Evaluation August 15, 2017 Frequentist Linking Frequentist and Bayesian Statistics How can we estimate model parameters and what does it imply? Want to find the
More information27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling
10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel
More informationBayesian Multivariate Logistic Regression
Bayesian Multivariate Logistic Regression Sean M. O Brien and David B. Dunson Biostatistics Branch National Institute of Environmental Health Sciences Research Triangle Park, NC 1 Goals Brief review of
More informationBayesian Linear Regression
Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective
More informationUCLA Department of Statistics Papers
UCLA Department of Statistics Papers Title Can Interval-level Scores be Obtained from Binary Responses? Permalink https://escholarship.org/uc/item/6vg0z0m0 Author Peter M. Bentler Publication Date 2011-10-25
More informationExploring Monte Carlo Methods
Exploring Monte Carlo Methods William L Dunn J. Kenneth Shultis AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO ELSEVIER Academic Press Is an imprint
More informationA Marginal Maximum Likelihood Procedure for an IRT Model with Single-Peaked Response Functions
A Marginal Maximum Likelihood Procedure for an IRT Model with Single-Peaked Response Functions Cees A.W. Glas Oksana B. Korobko University of Twente, the Netherlands OMD Progress Report 07-01. Cees A.W.
More informationStudy Notes on the Latent Dirichlet Allocation
Study Notes on the Latent Dirichlet Allocation Xugang Ye 1. Model Framework A word is an element of dictionary {1,,}. A document is represented by a sequence of words: =(,, ), {1,,}. A corpus is a collection
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More informationLatent Trait Reliability
Latent Trait Reliability Lecture #7 ICPSR Item Response Theory Workshop Lecture #7: 1of 66 Lecture Overview Classical Notions of Reliability Reliability with IRT Item and Test Information Functions Concepts
More informationGibbs Sampling in Latent Variable Models #1
Gibbs Sampling in Latent Variable Models #1 Econ 690 Purdue University Outline 1 Data augmentation 2 Probit Model Probit Application A Panel Probit Panel Probit 3 The Tobit Model Example: Female Labor
More informationDensity Estimation. Seungjin Choi
Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationMonte Carlo Simulations for Rasch Model Tests
Monte Carlo Simulations for Rasch Model Tests Patrick Mair Vienna University of Economics Thomas Ledl University of Vienna Abstract: Sources of deviation from model fit in Rasch models can be lack of unidimensionality,
More informationMotivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University
Econ 690 Purdue University In virtually all of the previous lectures, our models have made use of normality assumptions. From a computational point of view, the reason for this assumption is clear: combined
More informationBayesian Inference for DSGE Models. Lawrence J. Christiano
Bayesian Inference for DSGE Models Lawrence J. Christiano Outline State space-observer form. convenient for model estimation and many other things. Preliminaries. Probabilities. Maximum Likelihood. Bayesian
More informationBAYESIAN IRT MODELS INCORPORATING GENERAL AND SPECIFIC ABILITIES
Behaviormetrika Vol.36, No., 2009, 27 48 BAYESIAN IRT MODELS INCORPORATING GENERAL AND SPECIFIC ABILITIES Yanyan Sheng and Christopher K. Wikle IRT-based models with a general ability and several specific
More informationDefault Priors and Effcient Posterior Computation in Bayesian
Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature
More informationUSING BAYESIAN TECHNIQUES WITH ITEM RESPONSE THEORY TO ANALYZE MATHEMATICS TESTS. by MARY MAXWELL
USING BAYESIAN TECHNIQUES WITH ITEM RESPONSE THEORY TO ANALYZE MATHEMATICS TESTS by MARY MAXWELL JIM GLEASON, COMMITTEE CHAIR STAVROS BELBAS ROBERT MOORE SARA TOMEK ZHIJIAN WU A DISSERTATION Submitted
More informationComputer Vision Group Prof. Daniel Cremers. 14. Sampling Methods
Prof. Daniel Cremers 14. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric
More informationDAG models and Markov Chain Monte Carlo methods a short overview
DAG models and Markov Chain Monte Carlo methods a short overview Søren Højsgaard Institute of Genetics and Biotechnology University of Aarhus August 18, 2008 Printed: August 18, 2008 File: DAGMC-Lecture.tex
More informationInferences about Parameters of Trivariate Normal Distribution with Missing Data
Florida International University FIU Digital Commons FIU Electronic Theses and Dissertations University Graduate School 7-5-3 Inferences about Parameters of Trivariate Normal Distribution with Missing
More informationDoctor of Philosophy
MAINTAINING A COMMON ARBITRARY UNIT IN SOCIAL MEASUREMENT STEPHEN HUMPHRY 2005 Submitted in fulfillment of the requirements of the degree of Doctor of Philosophy School of Education, Murdoch University,
More informationLecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis
Lecture 3 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin
More information