Paradoxical Results in Multidimensional Item Response Theory


Giles Hooker, Matthew Finkelman and Armin Schwartzman

Abstract

In multidimensional item response theory (MIRT), it is possible for the estimate of a subject's ability in some dimension to decrease after they have answered a question correctly. This paper investigates how and when this type of paradoxical result can occur. We demonstrate that many response models and statistical estimates can produce paradoxical results and that in the popular class of linearly compensatory models, maximum likelihood estimates are guaranteed to do so. In light of these findings, the appropriateness of multidimensional item response methods for assigning scores in high-stakes testing is called into question.

1 Introduction

Jane and Jill are fast friends who are nonetheless intensely competitive. At the end of high school they each take an entrance exam for a prestigious university. After the exam, they compare notes and discover that they gave the same answers for every question but the last. On checking their materials, it is clear that Jane answered this question correctly, but Jill answered incorrectly. They are therefore very surprised, when the test results are published, to find that Jill passed but Jane did not! Lawsuits ensue. The university maintains that it followed well-established statistical procedures: the questions on the test were designed to simultaneously examine both language and analytic skills, and a multiple-hurdle rule (Segall 2000) based on maximum likelihood estimates of each student's abilities was used to ensure that admitted students were proficient in both. The university had re-checked its calculations many times and was satisfied the correct decision had been made. Jane's lawyers countered that, whatever the statistical correctness of the agency's procedures, it is unreasonable that an examinee should be penalized for getting more questions correct. Could such a situation occur?
***** (2007) demonstrated empirically that it could. On examining an operational data set, they found that, were a passing threshold for the test set injudiciously, nearly 6% of students could move from fail to pass by changing answers from correct to incorrect. The possibility of obtaining a lower score by getting more questions correct, or vice versa, was labeled a paradoxical result and has clear implications for fairness. This is a property of statistical estimates for multidimensional latent abilities, even when the models for student responses appear reasonable. How does this occur? ***** (2007) provide an intuitive explanation. Suppose that each question on the test given to Jane and Jill required both language and analytical skills to answer correctly, and Jane and Jill got some of these correct and some incorrect. The final question, however, was very difficult in terms of analysis, but did not require strong language skills. That Jane got this question correct suggests that her analytical skills must be very good indeed. This being the case, the only explanation for her previous

incorrect answers is that her language skills must be quite low. By contrast, Jill, in getting the final question incorrect, has demonstrated weaker analytic skills and must have relied on stronger language skills to answer previous questions correctly. The estimate of Jane's language ability therefore dipped below the required threshold, while Jill's was pushed upwards; both obtained satisfactory analysis scores. This provides an intuitive explanation for how paradoxical results occur and makes clear that they may not be unreasonable. The authors nonetheless feel that it is better not to put students in the position of second-guessing when their best answer may be harmful to them. This paper provides a mathematical analysis of paradoxical results that began as an attempt to find conditions under which statistical estimation would always avoid them. Our analysis is not encouraging: paradoxical results can occur across a wide range of two-dimensional item response models using a large class of statistical ability estimates. Moreover, in the popular class of linearly compensatory models, every non-separable test has a response sequence for which maximum likelihood estimates (MLEs) of abilities are paradoxical. Indeed, for any ability dimension, almost every answer sequence for every test is paradoxical in the sense that the estimate of ability can either be made to increase by changing a correctly answered item to incorrect, or to decrease by changing an incorrectly answered item to correct, provided that question is chosen appropriately. The results in this paper establish the existence of paradoxical results under general conditions. They imply, for example, that within a broad class of multidimensional models, it is not enough to restrict the parametric form of an item response function in order to avoid paradoxical results. The exact conditions and assumptions are given in Sec. 2.
They essentially reduce to the following sufficient, but not necessary, conditions: all second derivatives of the log of the item response surface should be strictly negative, and either MLEs, or priors with no correlation between abilities, are used to estimate a subject's ability. These conditions can be unrealistic in practice; the first excludes response functions with guessing parameters and the second assumes an absence of correlation between abilities that may be unlikely. Neither condition is necessary for paradoxical results, however, and we investigate the extent to which they may be violated while still producing such results. We also believe that our analysis sheds light on the general mathematical causes of paradoxical results in a way that makes an analysis in specific cases possible. As an example, our study of linearly compensatory models uses the framework developed in the general case to provide sufficient conditions for an individual item to produce a paradoxical result, including for Bayesian estimates with non-independent priors. We note that tests that exhibit simple structure (every question depending on only one ability dimension) violate the first of the above conditions, and these can be guaranteed not to produce paradoxical results in two ability dimensions or when independence priors are used. The paper is structured as follows. We provide a basic framework for multidimensional item response theory in Sec. 2. Some basic theoretical results and constructions are given in Sec. 3. We apply these results to frequentist estimates in Sec. 4 and to Bayesian estimates in Sec. 5. The specific case of linearly compensatory models is explored in Sec. 6. Sec. 7 considers extensions to models not covered by our assumptions, in particular models that involve guessing parameters, non-independent priors, discrete ability spaces and spaces of three or more dimensions; empirically, we show that paradoxical results remain common under these conditions. Sec. 8 examines a limited solution given in terms of enforcing regularity conditions.

2 Definitions and Assumptions

The assumptions made here and in Sec. 3 are used throughout Sections 4, 5 and 6. We discuss the effect of relaxing these assumptions in Sec. 7. Multidimensional item response theory assumes that questions (hereafter referred to as items) may measure more than one latent trait, with θ ∈ R^d indicating a given subject's proficiency in each of d dimensions. In the scenario above, there are dimensions representing language and analytic skills. We assume that all item parameters have been estimated and are defined with respect to fixed dimensions. We further assume that subjects' abilities are modeled as unrestricted elements of R^d. In order to simplify the analysis below, we restrict to abilities indexed in R^2, but state as much as possible in more general terms. Some comments regarding higher-dimensional models are given in Sec. 7. We assume that a subject's answers to items are marked either as correct or incorrect. The item's response function, P, gives the probability of a student getting a particular test item correct as a function of the student's abilities and parameters specific to the item. We will frequently refer to an item as meaning both the specifics of a test question and its response function. Common choices include

Compensatory: P(θ, a_i) = g(a_{i0} + a_i^T θ), where g is a log-concave cumulative distribution function and each a_i is a d-vector with non-negative entries. Most commonly used are the logistic, g(t) = e^t/(1 + e^t) (e.g., Ackerman 1996; Reckase 1985), and the normal ogive, in which g(t) is the cumulative distribution function of a Gaussian random variable (e.g., Bock et al.). See Reckase 1997 for a review, including a history of multidimensional IRT models. The label compensatory is used since a subject's lack of ability in dimension 1 may be made up for by proficiency in dimension 2 and vice versa. We describe the models above as being linearly compensatory; they admit particularly easy analysis.
Non-compensatory: P(θ, a_i) = ∏_{j=1}^d g_j(a_{ij0} + a_{ij} θ_j), in which each g_j is a log-concave cumulative distribution function and a_{ij} > 0 (Bolt and Lall 2003; Whitely 1980). The term non-compensatory is applied here since no degree of proficiency in dimension 2 will overcome a lack of ability in dimension 1.

Our results are more general than these particular models, however, and we set out very general conditions below.

Definition 2.1. We define a test to be a collection of N items with probability functions P_i(θ_1,...,θ_d), i = 1,...,N, for ability parameters θ = (θ_1,...,θ_d).

When θ is used as an argument to a function, it is taken to be shorthand for the vector of arguments (θ_1,...,θ_d), or a sub-vector of these, and we use both notations interchangeably. It makes sense that increasing a subject's ability should not reduce their probability of getting an item correct, hence the following natural restriction:

Definition 2.2. A response function P(θ_1,...,θ_d) is monotone if P is a monotone increasing function of θ_j for each (θ_1,...,θ_{j-1}, θ_{j+1},...,θ_d).

Monotone response functions are considered in Junker and Sijtsma (2000) and Antal (2007); non-monotone response functions can clearly be seen to produce paradoxical results. This is also a concern for multiple-choice models, for example in Thissen and Steinberg (1997). We make the distinction that our results are

specifically based on monotone response functions. These cannot produce paradoxical results in unidimensional IRT, but will do so in MIRT. In addition, we require a somewhat more restrictive assumption about models for a correct response:

Definition 2.3. A response function P(θ) has log negative second derivatives if its second derivatives exist and

∂² log P(θ)/∂θ_j ∂θ_k ≤ 0 and ∂² log(1 − P(θ))/∂θ_j ∂θ_k ≤ 0    (1)

for each j and k (possibly equal), everywhere in R^d.

Straightforward calculation gives that each of the models listed above satisfies this condition. Importantly, P(θ) and 1 − P(θ) are log-concave as functions of each θ_j with the other ability dimensions fixed. In what follows, we will be interested in properties of classes of items; specifically, are there simple restrictions that ensure a test cannot produce paradoxical results? Our main concern here is whether there are families of response functions that can be guaranteed to avoid paradoxical results. For any class of items, our mathematical analysis consists of finding a test comprised of items drawn from that class that will produce a paradoxical result. If such a test exists, then the combination of items within the test becomes important. In order to make this argument formal, we need to define which items can be selected to form a test.

Definition 2.4. A class of two-dimensional items is said to be parametric if the response functions for each item can be written as P(a_1 θ_1, a_2 θ_2, γ) for a_1 ≥ 0, a_2 ≥ 0 and γ finite-dimensional. A parametric class is complete if it contains items for all a_1 ≥ 0 and a_2 ≥ 0.

In particular, a complete class contains an item for which a_1 = 0; that is, θ_1 does not affect the probability of getting the item correct. If a complete parametric class could be shown to avoid paradoxical results, we could safely design tests using items from this class. Sec. 6 removes the assumption of completeness for linearly compensatory models.
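Both the compensatory logistic response function and the condition of Definition 2.3 can be sketched numerically. The finite-difference check below (item parameters and evaluation points are arbitrary illustrations, not from any calibrated test) verifies that the mixed second derivative of log P is negative:

```python
import numpy as np

def compensatory_logistic(theta1, theta2, a=(1.0, 1.0), a0=0.0):
    """Linearly compensatory item: P(theta) = g(a0 + a1*theta1 + a2*theta2),
    with g the logistic function and non-negative loadings a."""
    return 1.0 / (1.0 + np.exp(-(a0 + a[0] * theta1 + a[1] * theta2)))

def log_P(theta1, theta2):
    return np.log(compensatory_logistic(theta1, theta2))

def cross_second_derivative(f, x, y, h=1e-4):
    """Central finite-difference estimate of d^2 f / (dx dy)."""
    return (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h * h)

# Consistent with Definition 2.3, the mixed second derivative of log P
# is negative at every point we evaluate.
for t1 in (-2.0, 0.0, 2.0):
    for t2 in (-1.0, 1.0):
        assert cross_second_derivative(log_P, t1, t2) < 0.0
```

For the logistic, the same conclusion follows analytically: d² log g(t)/dt² = −g(t)(1 − g(t)) < 0, and the chain rule multiplies this by the non-negative product of loadings.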
As noted in the introduction, two-dimensional tests exhibiting simple structure are not subject to paradoxical results, and we restrict to:

Definition 2.5. A class of items with log negative second derivatives is non-simple if it contains at least one item for which the inequalities in (1) are strict for at least one pair (j, k) with j ≠ k.

We now turn to the outcome of administering a test to a subject:

Definition 2.6. For a test, each subject produces a response vector: a series of N binary answers y = (y_1,...,y_N), which are coded 0 or 1 according to whether item i is answered incorrectly or correctly.

Definition 2.7. For any given test, a score is a function mapping from the response vectors to real numbers, S : {0,1}^N → R.

An important class of scores are those defined with respect to some estimate of a subject's ability vector, θ̂. Examples include multiple-hurdle scores (Segall 2000), S(y) = ∏_i I(θ̂_i ≥ c_i), and composite scores (van der Linden 1999), S(y) = ∑_i s_i θ̂_i or I(∑_i s_i θ̂_i ≥ C).
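These two score types can be sketched directly; the cutoffs and weights below are illustrative placeholders, not values from any operational test:

```python
import numpy as np

def multiple_hurdle_score(theta_hat, c):
    """Multiple-hurdle rule: pass (1) only if every estimated ability
    clears its cutoff, i.e. S(y) = prod_i I(theta_hat_i >= c_i)."""
    return float(np.all(theta_hat >= c))

def composite_score(theta_hat, s):
    """Weighted composite S(y) = sum_i s_i * theta_hat_i."""
    return float(np.dot(s, theta_hat))

theta_hat = np.array([0.8, -0.3])
assert multiple_hurdle_score(theta_hat, np.array([0.0, 0.0])) == 0.0  # fails dim 2
assert composite_score(theta_hat, np.array([1.0, 0.0])) == 0.8  # isolates theta_1
```

The weight vector (1, 0) used in the last line is the composite that reduces to the single-coordinate score θ̂_1 studied below.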

Either score can represent nuisance dimensions by setting c_i = −∞ or s_i = 0. Our results will be concerned with the estimate θ̂_1, which can be thought of as a composite score with (s_1, s_2,...,s_d) = (1, 0,...,0). However, paradoxical results for this score can clearly have implications for multiple-hurdle or other composite scores. Specific conditions on when this can happen for composite scores and linearly compensatory models are investigated in Sec. 6. As discussed, one of the properties of scores that we would like to retain is regularity.

Definition 2.8. We define a partial ordering ⪯ on responses by stating y ⪯ z if y_i ≤ z_i for all i and this inequality is strict for some i. A score is regular if y ⪯ z implies S(y) ≤ S(z). A paradoxical result is defined to be the existence of response vectors y and z such that y ⪯ z and S(z) < S(y).

As with items, we define a partial ordering of functions and a monotonicity condition for functionals of them.

Definition 2.9. We define a partial ordering ≺ of unidimensional functions by g ≺ f if g(t) < f(t) for all t ∈ R. A functional T mapping functions to the real line is said to be monotone increasing if g ≺ f implies T[g] < T[f]; it is monotone decreasing if g ≺ f implies T[g] > T[f].

Most unidimensional frequentist and Bayesian estimates are monotone increasing functionals of the derivative of the log likelihood.

3 Basic Results and Constructions

Before proceeding to demonstrate that scores based on statistical estimates of an ability parameter are not regular, we provide some basic notation, a few general results and constructions for the results given below.

3.1 Statistical Constructs

Most statistical estimates may be written as functionals of the log likelihood l(θ; y), under the assumption that responses are conditionally independent across items given subjects. In our proofs, we will use l_k(θ; y) to denote the log likelihood after k items have been administered:

l_k(θ; y) = ∑_{i=1}^k [ y_i log P_i(θ) + (1 − y_i) log(1 − P_i(θ)) ].

l_k(θ; y) is a function only of the first k entries in y, and ignores the following entries. Throughout, l(θ; y) will be taken to mean l_N(θ; y): the log likelihood at the end of the test. We note that the log-negative second derivative condition (Def. 2.3) implies that all the second derivatives of l(θ; y) are negative. Frequentist estimates for θ are most easily expressed in terms of the derivatives of l(θ; y). When using Bayesian methods, we also place a restriction on the priors available:

Definition 3.1. A density µ(θ) is independent and log-concave if µ(θ) = ∏_{j=1}^d µ_j(θ_j), where each µ_j is a log-concave density.

We assume that Bayesian estimates are conducted with a prior in which the traits are modeled as being independent. This is a substantial restriction that ensures that the posterior distribution after k items,

f_k(θ | y) = e^{l_k(θ; y)} µ(θ) / ∫ e^{l_k(θ; y)} µ(θ) dθ,

also has log-negative second derivatives. We discuss relaxing this condition in Sec. 7. We want to emphasize that Def. 3.1 is used to define a statistical estimation procedure rather than to make any statement about the distribution of traits in a population. Our results refer to the behavior of individual statistical estimates and do not depend on the population of subjects. We denote the marginal posterior density of θ_i by f_{k,i}(θ_i | y) and its density conditional on all the other ability parameters by f_{k,i}(θ_i | θ_{−i}, y). The corresponding cumulative distribution functions are written F_{k,i}(θ_i | y) and F_{k,i}(θ_i | θ_{−i}, y).

3.2 Basic Results

We begin with two central lemmas. These describe the behavior of the derivatives of the likelihood with respect to θ_i, keeping the other entries in θ fixed. For simplicity, throughout this section, we take i = 1 and let θ_{−1} = (θ_2,...,θ_d). The results clearly do not depend on this choice of i. We use t as an argument in place of θ_1 when examining a likelihood as a function of θ_1. All proofs are given in Appendix A.

Lemma 3.1. Assume that a test consists of N items which are monotone in d abilities and have log negative second derivatives, at least one of which is strict. If θ_{−1} ⪯ θ′_{−1} (in the partial ordering of Def. 2.8), then

∂l/∂θ_1(t, θ′_{−1}; y) ≺ ∂l/∂θ_1(t, θ_{−1}; y).

Lemma 3.2. Define

θ̃_1(θ_{−1}, c; y) = { t : ∂l/∂θ_1(t, θ_{−1}; y) = c }.

Under the conditions of Lemma 3.1, dθ̃_1/dθ_j < 0 for j = 2,...,d.

For c = 0 this is the MLE for θ_1 with all other parameters held fixed. We can define likelihood ratio confidence intervals using other values of c.
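Lemma 3.2's conclusion can be observed numerically. In the sketch below (hypothetical item parameters), the conditional MLE θ̃_1(θ_2, 0; y) is found by root-finding on the score function, and it decreases when θ_2 is raised:

```python
import numpy as np
from scipy.optimize import brentq

# Two hypothetical logistic items, answered (correct, incorrect);
# both load on both ability dimensions.
A = np.array([[1.0, 1.0], [2.0, 1.0]])
y = np.array([1.0, 0.0])

def dl_dtheta1(t, theta2):
    """Score function d l / d theta_1 at (t, theta2):
    sum_i a_i1 * (y_i - P_i(theta))."""
    P = 1.0 / (1.0 + np.exp(-(A[:, 0] * t + A[:, 1] * theta2)))
    return float(np.sum(A[:, 0] * (y - P)))

# theta_1-tilde(theta2, 0; y): where the score function crosses zero.
t_low  = brentq(dl_dtheta1, -10, 10, args=(0.0,))
t_high = brentq(dl_dtheta1, -10, 10, args=(1.0,))
assert t_high < t_low  # raising theta_2 lowers the conditional MLE of theta_1
```

The score function is strictly decreasing in t (the log likelihood is concave), so each root is unique and `brentq` is guaranteed a sign change on a wide enough bracket.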
We note that since the log likelihood is concave, ∂l/∂θ_1 is monotone decreasing and hence this function is well defined. Lemma 3.2 provides the basic intuition for the mathematical explanation of paradoxical results. As θ_2, say, increases, the value of θ_1 at which ∂l/∂θ_1 crosses zero decreases. This already implies that the MLE for θ_1 conditional on θ_2,...,θ_d decreases as θ_2 increases. Figure 1 provides a visual representation of this intuition using the likelihood for the first subject in the data set analyzed in ***** . In this figure, the contours of the log likelihood form ellipses with a negatively-oriented major axis. This will always be the case when the conditions in Def. 2.3 hold. For Bayesian inference, the analogous result to Lemma 3.1 follows from the same reasoning and is stated without proof:

Figure 1: An illustration of the nature of likelihoods in two-dimensional item response theory, using likelihoods from an example test. Left panel: we plot ∂l(θ_1, θ_2)/∂θ_1 for fixed θ_2* and θ_2+ > θ_2*. The latter is everywhere below the former. We have also given the maximizing values of θ_1 for θ_2* and θ_2+. Right panel: contours of the likelihood and the MLE for θ_1 (solid) plotted as a function of θ_2 (dashed). If θ_2 is increased, the maximizing value for θ_1 must decrease.

Lemma 3.3. Under the conditions of Lemma 3.1, let f(θ | y) be the posterior density based upon an independent and log-concave prior. Let θ_{−1} ⪯ θ′_{−1}. Then

∂/∂θ_1 log f(t, θ′_{−1} | y) ≺ ∂/∂θ_1 log f(t, θ_{−1} | y).

While Bayesian inference can be written in terms of the derivatives of a log posterior density, it is far more convenient to produce an analysis in terms of cumulative distribution functions. The following result demonstrates that the ordering relation is preserved between these two.

Lemma 3.4. Let f_1(t) and f_2(t) be twice-differentiable log-concave densities on the real line. Then

d log f_1/dt ≻ d log f_2/dt implies ∫_{−∞}^t f_1(s) ds ≺ ∫_{−∞}^t f_2(s) ds.

This result demonstrates a monotone relationship between derivatives of log densities and cumulative distribution functions. In particular, letting X and Y correspond to cumulative distribution functions F_1 ≺ F_2, X is said to be stochastically greater than Y, and consequently, for a monotone increasing function g, E g(X) > E g(Y). A discussion of the application of stochastic ordering to IRT is given in van der Linden (1998).

3.3 A Paradoxical Test

In this section, we construct a test that exhibits paradoxical results for a wide range of statistical estimates of θ. We make use of this test throughout the proofs of our results in Sections 4 and 5. Our main message

is that among all non-simple complete parametric classes of items with log-negative second derivatives, it is impossible to use the form of the response functions alone to rule out paradoxical results, and one must therefore consider the particular combination of items in a test. Assume that F is any complete parametric non-simple class of two-dimensional monotone items with log-negative second derivatives (see Definitions 2.2-2.5). Let P_1,...,P_{N−1} be items selected from this family with responses y_{N−1} = (y_1,...,y_{N−1}). We assume that the second derivatives of log P_i are strictly negative for at least one item i and that l_{N−1}(θ; y_{N−1}) has a unique maximum. We take P_N to be such that

P_N(θ_1, θ_2, γ_N) = P_N(0, θ_2, γ_N) = P_N(θ_2, γ_N).

This item is available by the completeness of F. We define response sequences y¹ = (y_{N−1}, 1) and y⁰ = (y_{N−1}, 0), so that y⁰ ⪯ y¹. Each of the results in Sections 4 and 5 shows that a statistical estimate of ability in dimension 1 has θ̂_1(y¹) < θ̂_1(y⁰). There is nothing special about the first dimension or the final item; they have been chosen for notational, and conceptual, convenience. The test constructed here requires the final item to exhibit simple structure. This is not required to produce paradoxical results. The continuity arguments of Sec. 7 demonstrate that the result will continue to hold when the final item is replaced by P_N(εθ_1, θ_2, γ_N) for small ε > 0.

4 Frequentist Estimates in Two-Dimensional MIRT

Many frequentist estimates can be regarded as functionals of the derivative of the log likelihood. Our particular examples are the MLE and the extrema of likelihood ratio confidence intervals. Our purpose in this section is to demonstrate that in any complete parametric class of models there are tests for which these statistics are paradoxical scores. Without loss of generality, we consider statistical estimates for θ_1.

4.1 Maximum Likelihood Estimates

The MLE for θ is given by the vector θ̂_MLE(y) that maximizes l(θ; y), or alternatively that satisfies

∂l/∂θ (θ̂_MLE; y) = 0,

assuming that this exists and is unique. We use θ̂_{1,MLE}(y) to denote the first entry of θ̂_MLE(y).

Theorem 4.1. Let F be a non-simple complete parametric class of monotone two-dimensional response models with log negative second derivatives. There exist a test with response functions P_1,...,P_N selected from F, and response vectors y¹ and y⁰ to this test, that give rise to a paradoxical result.

The technique of this proof is to show that in the test defined in Sec. 3.3, θ̂_{2,MLE} must increase after the final item is answered correctly, so that θ̂_{1,MLE} must decrease by Lemma 3.2. This technique is central to the analysis of the paper. In this proof, we rely on the final item not depending on θ_1. However, in Sec. 6 we demonstrate how the same idea can be applied to analyze more general tests when linearly compensatory models are employed. The formal proof is given in Appendix B.1.

4.2 Profile Likelihood Ratio Confidence Bounds

We can extend paradoxical results for MLEs to confidence bounds on estimates. A general method for constructing confidence intervals is based on the acceptance region of a likelihood ratio test. A profile

likelihood ratio lower confidence bound is the smallest value for θ_1 such that a likelihood ratio test for this value against the MLE would not be rejected:

θ̂_{1,PLB}(y) = min { θ_1 : l(θ_1, θ̂_{2,MLE}(θ_1; y); y) ≥ l(θ̂_MLE; y) − K }.

Here K is a cut-off value. This is usually taken to be a quantile of a χ² random variable.

Theorem 4.2. Under the conditions of Theorem 4.1, there exists a test P_1,...,P_N drawn from F for which θ̂_{1,PLB}(y) is not a regular score.

The proof for this theorem is given in Appendix B.2. The proof for the equivalent unconditional upper profile confidence bound follows the same construction. We note that the use of confidence bounds based on the Fisher information matrix is conspicuously absent from this analysis. This is because the generality of our assumptions does not allow us to exclude local features in the likelihood, which may produce quite different bounds from those obtained by likelihood ratio confidence intervals. In such cases we consider the likelihood ratio confidence bounds to be preferable.

5 Bayesian Estimates in Two-Dimensional MIRT

We begin with the maximum a posteriori (MAP) estimate:

θ̂_MAP = { θ : ∂f(θ | y)/∂θ = 0 }.

As with the MLE, θ̂_{1,MAP}(y) indicates the first element of θ̂_MAP.

Theorem 5.1. Under the conditions of Theorem 4.1, let µ be an independent log-concave prior. There exists a test P_1,...,P_N drawn from F for which θ̂_{1,MAP}(y) is not a regular score.

The proof of this theorem is similar to that of Theorem 4.1 and is omitted. The analogous result to Theorem 4.2 demonstrates that lower or upper bounds based on contours of the log posterior also exhibit paradoxical results.

5.1 Marginal Inference

Many Bayesian estimates for θ_1, including credible intervals, can be expressed in terms of monotone functionals of their marginal cumulative distribution function. The most commonly used of these is the expected a posteriori (EAP) estimate:

θ̂_EAP = ∫_{R^d} θ f(θ | y) dθ.

We can demonstrate paradoxical results for a much more general class of estimates.

Theorem 5.2. Let T be a decreasing monotone functional of cumulative distribution functions and F a non-simple complete parametric class of monotone response models with log-negative second derivatives. There exists a test P_1,...,P_N drawn from F for which T[F_1(θ_1; y)] is not a regular score.

The proof of this theorem is given in Appendix C. It is reasonable to expect any sensible Bayesian estimate to be a monotone decreasing function of the posterior cumulative distribution function; that is, to preserve the stochastic ordering of posterior distributions. The following two corollaries are immediate:

Corollary 5.1. Under the conditions of Theorem 5.2, there exists a test for which the EAP for θ_1 is not a regular score.

Corollary 5.2. Under the conditions of Theorem 5.2, there exists a test for which any upper or lower credible bound for θ_1 is not a regular score.

6 Linearly Compensatory Models

We can refine our analysis somewhat for the class of linearly compensatory models such as the logistic and normal ogive, in which θ enters the model only as a linear combination θᵀa. In this case, we can provide more precise conditions under which paradoxical results may occur without using completeness of the class. In particular, we consider a test using items with linearly compensatory response functions P_1(θᵀa_1),...,P_N(θᵀa_N). We assume that the P_i have log-negative second derivatives with

w_{i1} ≤ d² log P_i(t)/dt² ≤ w_{i2} and w_{i1} ≤ d² log(1 − P_i(t))/dt² ≤ w_{i2}

for all t and some w_{i1} ≤ w_{i2} < 0, with w_{i1} possibly −∞. Let A be the matrix with a_iᵀ on the ith row. We also let W_1 and W_2 be diagonal matrices with ith diagonal elements w_{i1} and w_{i2} respectively. We note that the Hessian of the log likelihood can now be written as AᵀWA for W a diagonal matrix with W_1 ≤ W ≤ W_2.

6.1 Frequentist Inference

Theorem 6.1. Let P_1(θᵀa_1),...,P_{N−1}(θᵀa_{N−1}) be response functions for a test employing linearly compensatory monotone response models for abilities θ ∈ R^d. Let y_{N−1} = (y_1,...,y_{N−1}) be any response sequence such that θ̂_MLE(y_{N−1}) exists and is unique. Let P_N(θᵀb) be a further item in the test and y¹ = (y_{N−1}, 1), y⁰ = (y_{N−1}, 0). A sufficient condition for θ̂_{1,MLE}(y¹) < θ̂_{1,MLE}(y⁰) is

e_1ᵀ (AᵀWA)^{−1} b > 0    (2)

for all diagonal matrices W such that W_1 ≤ W ≤ W_2, where e_1 = (1, 0,...,0)ᵀ ∈ R^d is the first Euclidean basis vector.

The condition in this theorem is not immediately interpretable. It will be simplified below. The mathematical strategy for the proof is as follows: we consider a transformation of ability space θ → ψ so that, without loss of generality, P_N depends only on ψ_d.
From the arguments used to prove Theorem 4.1, we know that the MLE for ψ_d must increase if the final item is answered correctly. We consider the MLE for θ_1 as a function of ψ_d and show that its derivative is negative under condition (2). Since ψ_d increases, the MLE for θ_1 must decrease. We have given the formal details for this proof in Appendix D.1. The requirement that the MLE exist and be unique is a technical necessity. It rules out tests where the item parameters all lie on a subspace of R^d. This can be expected to be unusual in practice, with the notable exception of multidimensional Rasch models. The condition also rules out some answer sequences. Most immediately, all-correct and all-incorrect answer sequences do not have unique MLEs regardless of the item parameters (see, for example, ***** ). Note also that this result excludes tests made up solely of simple-structure items. In this case each row of A has only one non-zero entry and AᵀWA is diagonal and negative definite; (2) is therefore either negative or zero.
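As a concrete illustration, the construction of Sec. 3.3 can be simulated directly. The item parameters below are hypothetical: six logistic items in balanced correct/incorrect pairs (so the MLE exists and is unique), plus a final item loading only on θ_2. Answering that final item correctly lowers both the MLE and the EAP of θ_1:

```python
import numpy as np
from scipy.optimize import minimize

A = np.array([[1., 1.], [1., 1.], [1., 2.], [1., 2.], [2., 1.], [2., 1.],
              [0., 2.]])                       # final item ignores theta_1
y1 = np.array([1., 0., 1., 0., 1., 0., 1.])    # final item answered correctly
y0 = np.array([1., 0., 1., 0., 1., 0., 0.])    # final item answered incorrectly

def neg_ll(theta, y):
    """Negative log likelihood -l(theta; y) for logistic items."""
    P = 1.0 / (1.0 + np.exp(-(A @ theta)))
    return -float(np.sum(y * np.log(P) + (1 - y) * np.log(1 - P)))

def mle(y):
    return minimize(neg_ll, np.zeros(2), args=(y,), method="BFGS").x

def eap_theta1(y, grid=np.linspace(-4, 4, 161)):
    """EAP of theta_1 under independent standard normal priors, by brute
    force on a grid (a sketch, not an efficient estimator)."""
    T1, T2 = np.meshgrid(grid, grid, indexing="ij")
    ll = np.zeros_like(T1)
    for a, yi in zip(A, y):
        P = 1.0 / (1.0 + np.exp(-(a[0] * T1 + a[1] * T2)))
        ll += yi * np.log(P) + (1 - yi) * np.log(1 - P)
    post = np.exp(ll - ll.max()) * np.exp(-(T1 ** 2 + T2 ** 2) / 2)
    return float(np.sum(T1 * post) / np.sum(post))

# The paradox: more correct answers, lower theta_1 estimates.
assert mle(y1)[0] < mle(y0)[0]
assert eap_theta1(y1) < eap_theta1(y0)
```

Flipping the final answer from correct to incorrect raises θ̂_1 even though the total number correct drops, which is exactly the behavior the theorems in this section characterize.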

Corollary 6.1. Let P_1(θᵀa_1),...,P_N(θᵀa_N) be response functions for a test comprised of monotone linearly compensatory items with log negative second derivatives. Let the test be ordered such that

a_{N1}/a_{N2} < a_{i1}/a_{i2}, i = 1,...,N−1.    (3)

Then for any response pattern y_{N−1} = (y_1,...,y_{N−1}) such that θ̂_{1,MLE}(y_{N−1}) is defined and unique, setting y¹ = (y_{N−1}, 1) and y⁰ = (y_{N−1}, 0) gives θ̂_{1,MLE}(y⁰) > θ̂_{1,MLE}(y¹).

The proof of this corollary is quite involved and has been given in Appendix D.2. This result demonstrates that for every test using linearly compensatory models, the MLE exhibits paradoxical results. Moreover, almost all answer sequences produce paradoxical results, either increasing the estimate of θ_1 by getting more items incorrect, or decreasing it by getting more correct. It also provides a rule for finding which item to change: the one with least relative weight on dimension 1. Should a student be able to identify this item, they could be guaranteed to obtain a higher score by answering it incorrectly. We want to emphasize here that the above theorem provides sufficient conditions for an item to produce a paradoxical result; these conditions are not necessary. In addition to the item guaranteed by the corollary, there may be numerous other items that produce paradoxical results. Indeed, any item that satisfies (2) is guaranteed to produce a paradoxical result so long as the answers to the other items produce a unique MLE. However, there may be further items that produce paradoxical results for particular values of W. Similar comments can be made for the results below. As a final application, we can extend our analysis above to composite scores of the form S(y) = αᵀθ̂_MLE(y). The use of such scores has been suggested, for example, in van der Linden 1999.

Corollary 6.2. Under the conditions of Theorem 6.1, let S(y) = αᵀθ̂_MLE(y). A sufficient condition for S(y¹) < S(y⁰) is αᵀ(AᵀWA)^{−1}b > 0 for all diagonal W with W_1 ≤ W ≤ W_2.

The proof for this corollary is given in Sec.
D.3.

6.2 Bayesian Inference

Bayesian analysis for linearly compensatory models follows along similar lines. In particular, we observe that for an independent and log-concave prior, a transformation ψ = B⁻¹θ (equivalently θ = Bψ) gives

∂² log µ(ψ)/∂ψ∂ψᵀ = Bᵀ K(θ) B for K(θ) = ∂² log µ(θ)/∂θ∂θᵀ.

Adding this term to the second derivatives throughout the proof of Theorem 6.1 leads to:

Theorem 6.2. Under the conditions of Theorem 6.1, let µ be an independent and log-concave prior with second derivatives bounded below by the diagonal matrix K_1. A sufficient condition for θ̂_{1,MAP}(y¹) < θ̂_{1,MAP}(y⁰) is

e_1ᵀ (AᵀWA + K)^{−1} b > 0    (4)

for all diagonal W such that W_1 ≤ W ≤ W_2 and all diagonal K with K_1 ≤ K ≤ 0.

We note that this condition no longer admits paradoxical results across all answer sequences for which the estimate is defined. In particular, in the case of a two-dimensional θ and a Gaussian prior with variances σ_1² and σ_2², (4) can be reduced to

b_2 > b_1 ( ∑_{i=1}^{N−1} w_i a_{i2}² − σ_2^{−2} ) / ( ∑_{i=1}^{N−1} w_i a_{i1} a_{i2} ).    (5)

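The two-dimensional threshold in (5) is directly computable. A sketch under the conventions of this section (w_i < 0 are curvature values in the admissible range, σ_2² is the prior variance for θ_2; the item loadings below are arbitrary illustrations, and per Theorem 6.2 the inequality must hold across all admissible w_i):

```python
import numpy as np

def b2_threshold(A, w, b1, sigma2):
    """Threshold from condition (5): a final item (b1, b2) with b2 above this
    value satisfies the sufficient condition for a paradoxical MAP of theta_1.
    Both sums are negative (w_i < 0, loadings non-negative), so the ratio is
    positive."""
    num = np.sum(w * A[:, 1] ** 2) - 1.0 / sigma2 ** 2
    den = np.sum(w * A[:, 0] * A[:, 1])
    return float(b1 * num / den)

A = np.ones((4, 2))      # four items with unit loadings on both dimensions
w = np.full(4, -0.25)    # logistic curvature bound: d^2 log P / dt^2 >= -1/4
assert np.isclose(b2_threshold(A, w, b1=1.0, sigma2=1.0), 2.0)
```

Shrinking sigma2 makes the numerator more negative and the threshold larger, matching the observation that the condition becomes more stringent as σ_2² decreases.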
It is possible that there is no ordering of the answers for which this is true. The condition becomes more stringent as σ_2² decreases. Using a smaller value of σ_2² reduces the variation of θ_2 in the posterior distribution, and we can think of this strategy as shrinking the test towards being a unidimensional test on dimension 1. However, we also note that the condition becomes less stringent as N increases, and (5) approaches (3). Alternatively, reversing the inequality in (5) can be shown to guarantee that θ̂_{1,MAP}(y) behaves regularly at the final item. This could be turned into a general criterion for designing tests. However, doing so would restrict to either short tests, low discrimination parameters or strong priors. The condition (5) also provides conditions for paradoxical marginal inference:

Theorem 6.3. Under the conditions in Theorem 6.2, let the dimension of θ be 2, and T[F_1(θ_1; y)] a decreasing monotone functional of the marginal distribution for θ_1. Let the second derivative of the log prior be bounded below by k. A sufficient condition for T[F_1(θ_1; y¹)] < T[F_1(θ_1; y⁰)] is

b_2 > b_1 ( ∑_{i=1}^{N−1} w_i a_{i2}² + k ) / ( ∑_{i=1}^{N−1} w_i a_{i1} a_{i2} )

for all w_{i1} < w_i < w_{i2}. The proof of this theorem is given in Appendix D.4.

7 Model Extensions

We have made several assumptions in order to simplify our analysis. In particular, the use of log-concave response models and prior densities with no correlation between ability traits is not always realistic in practice. The relaxation of these assumptions significantly complicates the analysis. This section outlines a number of specific model extensions that are commonly used in practice and some ways to extend the mathematical results above.

7.1 Extending Existence Results

7.1.1 Embedding in a Larger Space

The following result is a direct consequence of the continuity of the statistical estimates we have examined on the various parameters that define the test and estimation procedure:

Theorem 7.1.
Let $s(\hat\theta(y); \gamma)$ be a monotone function of an MLE, MAP or EAP of $\theta_1$, or of a profile confidence bound or credible bound for $\theta_1$. Let $\gamma$ be the collection of test parameters, including item parameters and parameters of the prior, such that the prior and item response functions vary continuously with $\gamma$. If $y^0 \le y^1$ and $s(\hat\theta(y^0); \gamma) > s(\hat\theta(y^1); \gamma)$, then for some $\epsilon > 0$,

$$\|\gamma' - \gamma\| < \epsilon \implies s(\hat\theta(y^0); \gamma') > s(\hat\theta(y^1); \gamma').$$

This theorem demonstrates that paradoxical results can occur in more general settings than those covered by our assumptions, so long as our assumptions hold in a subspace of the model. In particular, this covers the following models:

No Item Has Simple Structure: The results in Sections 4 and 5 were predicated on a test in which the final item response function did not depend on $\theta_1$, but at least one of the previous items depended on both $\theta_1$ and $\theta_2$. Such a test is a special case of the set of tests in a complete parametric class of items. Theorem 7.1 implies that paradoxical results can occur when all items load onto both dimensions.

Three or More Ability Dimensions: Our results in Sections 4 and 5 may be viewed as a special case of using a $d$-dimensional ability vector in which no item depends on the final $d-2$ abilities. Theorem 7.1 implies that paradoxical results can occur if the loadings on the final $d-2$ dimensions are sufficiently small. More generally, if the multidimensional analogue of $P_N(\theta)$ in Sec. 3.3 places no weight on $\theta_1$ and $\hat\theta_{i,\mathrm{MLE}}^N(y^1) \ge \hat\theta_{i,\mathrm{MLE}}^N(y^0)$ for $i = 2, \dots, d$, the arguments used above give that $\hat\theta_{1,\mathrm{MLE}}^N(y^1) < \hat\theta_{1,\mathrm{MLE}}^N(y^0)$. If we do not have $\hat\theta_{i,\mathrm{MLE}}^N(y^1) \ge \hat\theta_{i,\mathrm{MLE}}^N(y^0)$, a paradoxical result must hold for some ability dimension other than 1. These ideas can also be extended to our results for Bayesian inference.

Guessing Parameters: A common variant of the item response models examined so far is to introduce guessing parameters of the form $\tilde P_i(\theta) = c_i + (1 - c_i) P_i(\theta)$. This ensures that a subject always has probability of at least $c_i$ of answering the item correctly. These models violate our log-negative second derivative condition unless $c_i = 0$ for all $i$. Theorem 7.1 implies that paradoxical results will occur for some $c_i > 0$.

Non-independence Priors: Bayesian estimates that use priors in which abilities are positively correlated do not always produce log posteriors that satisfy our second-derivative conditions. Letting $\mu(\theta, \rho)$ be such that $\mu(\theta, 0)$ factorizes into log-concave densities, Theorem 7.1 implies that Bayesian estimates using $\rho > 0$ can also exhibit paradoxical results. In longer tests, the influence of the prior will be overwhelmed by the data, whatever correlation is used.

In each of these cases, our results demonstrate that paradoxical results can occur in tests whose parameters are near those defined in the sections above. Exactly how far from these parameters it is still possible to produce paradoxical results will depend on the number of items, the shape of the item response functions and the specific parameters in the test.

7.1.2 Local Conditions

Theorems 4.1 and 5.1
only require the log likelihood or log posterior to have negative second derivatives in a local region around $\hat\theta(y^0)$ and $\hat\theta(y^1)$.

Theorem 7.2. Let $l_N(\theta; y)$ be the log likelihood after $N$ items in a test. Define $\hat\theta_1^{N-1}(\theta_2; y)$ to be the conditional MLE for $\theta_1$ as in (10) and assume this is unique. Assume that the response function of the final item does not depend on $\theta_1$ and let $\hat\theta^N(y^0)$ and $\hat\theta^N(y^1)$ be the MLE estimates when the final item is answered incorrectly and correctly, respectively. A sufficient condition for $\hat\theta_1^N(y^0) > \hat\theta_1^N(y^1)$ is that the second derivatives of $l_{N-1}(\theta; y)$ are negative on the line defined by $(\hat\theta_1^{N-1}(\theta_2; y), \theta_2)$ for $\theta_2$ between $\hat\theta_2^N(y^0)$ and $\hat\theta_2^N(y^1)$.

Proof. This is a direct consequence of the implicit function theorem as used in Theorem 4.1, restricted to the line $(\hat\theta_1^{N-1}(\theta_2), \theta_2)$.

The equivalent result can be shown for MAPs by replacing the log likelihood above with the log posterior. Verifying that points of inflection do not occur near the MLEs or MAPs appears to be analytically difficult, but this result provides an intuitive motivation for expecting paradoxical results to occur in more general settings than we examine. A similar analysis of EAPs is also possible, but complex.
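The mechanism in Theorem 7.2, that the conditional MLE $\hat\theta_1(\theta_2; y)$ traces a decreasing curve so that anything raising $\hat\theta_2$ drags $\hat\theta_1$ down, can be illustrated directly. The sketch below (not from the paper; the logistic items and the mixed response pattern are invented, with all loadings positive) computes the conditional MLE by one-dimensional grid search:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical first N-1 items, each loading on both dimensions.
A = np.array([[1.5, 0.5], [1.2, 0.8], [1.0, 0.4], [0.8, 0.6]])
y = np.array([1, 1, 0, 0])

def cond_mle_theta1(theta2, grid=np.linspace(-4, 4, 2001)):
    """Conditional MLE for theta1 given theta2, by 1-D grid search."""
    ll = np.zeros_like(grid)
    for (a1, a2), yi in zip(A, y):
        p = sigmoid(a1 * grid + a2 * theta2)
        ll += yi * np.log(p) + (1 - yi) * np.log(1 - p)
    return grid[np.argmax(ll)]

curve = [cond_mle_theta1(t2) for t2 in np.linspace(-1.5, 1.5, 7)]
print(curve)
# The conditional MLE falls as theta2 rises: an item that pulls the
# estimate of theta2 upward drags the estimate of theta1 downward.
assert all(a > b for a, b in zip(curve, curve[1:]))
```

Because all loadings are positive, raising $\theta_2$ increases every $P_i$, and the score equation for $\theta_1$ can only be rebalanced by lowering $\theta_1$; the decreasing curve is exactly the line along which Theorem 7.2 requires negative second derivatives.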

7.2 Correlated Priors and Linearly Compensatory Models

In order to refine our analysis in a specific case, we examine linearly compensatory models with Gaussian priors. It is clear from the results in Sec. 6 that if $K$ in Theorem 6.2 is constant, it need not be diagonal, and a general sufficient condition for a paradoxical result is simply (14). In two dimensions, if we set the prior variances to $\sigma_1^2$ and $\sigma_2^2$ with prior correlation $\rho$, condition (15) becomes

$$\frac{a_{N2}}{a_{N1}} > \frac{\sum_{i=1}^{N-1} w_i a_{i2}^2 + \dfrac{1}{(1-\rho^2)\sigma_2^2}}{\sum_{i=1}^{N-1} w_i a_{i1} a_{i2} - \dfrac{\rho}{(1-\rho^2)\sigma_1 \sigma_2}} \qquad (16)$$

and we see that making $\rho$ closer to 1 increases the right hand side of (16), resulting in a smaller set of items that will produce paradoxical results.

7.3 An Empirical Study

By way of empirically studying the prevalence of paradoxical results under the model violations discussed above, we investigated the operational data in ***** (2007). In that paper, a linearly compensatory two-dimensional logistic model incorporating guessing parameters and a standard normal prior was used for a 67-item test of English skills, given to 2500 grade 5 students. Among these students, four pairs exhibited the Jill-and-Jane scenario of the introduction: one member answering every question as well as or better than the other, yet obtaining a worse result on dimension 1. Depending on the consequences of the test, this may be regarded as rare, although it may still be unacceptable in high-stakes settings.

Beyond directly comparing subjects, we can also ask the hypothetical "Would this subject have done better by deliberately answering some items incorrectly, or vice versa?" For every one of these subjects, changing the item with largest relative weight on $\theta_2$ caused $\hat\theta_{1,\mathrm{EAP}}(y)$ to produce paradoxical results. To illustrate the concerns that these results might raise, ***** (2007) investigated setting passing thresholds for these data. They found it was possible to choose a threshold on $\theta_1$ so that 6% of subjects could be moved from fail to pass by getting more questions wrong.
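The "done better by answering incorrectly" hypothetical can be reproduced on a toy test. This is not the operational data above; the item parameters are invented for illustration (four mixed items plus a final item loading mainly on dimension 2), and the EAP for $\theta_1$ is computed by brute-force quadrature over a grid under a standard normal prior:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Invented test: four mixed items plus a final item weighted toward theta2.
A = np.array([[1.5, 0.5]] * 4 + [[0.2, 2.0]])

def eap_theta1(y, grid=np.linspace(-5, 5, 201)):
    """EAP for theta1 by brute-force quadrature under a N(0, I) prior."""
    t1, t2 = np.meshgrid(grid, grid, indexing="ij")
    logpost = -0.5 * (t1**2 + t2**2)
    for (a1, a2), yi in zip(A, y):
        p = sigmoid(a1 * t1 + a2 * t2)
        logpost += yi * np.log(p) + (1 - yi) * np.log(1 - p)
    w = np.exp(logpost - logpost.max())             # unnormalized posterior
    return (t1 * w).sum() / w.sum()

# Flipping the final answer from wrong to right lowers the EAP for theta1:
# this examinee would have scored higher on dimension 1 by missing the item.
e_right = eap_theta1([1, 1, 0, 0, 1])
e_wrong = eap_theta1([1, 1, 0, 0, 0])
print(e_right, e_wrong)
assert e_right < e_wrong
```

This shows that the paradox is not an artifact of mode-based estimation: the full posterior mean for $\theta_1$ also drops after the correct answer.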
***** (2007) also investigated varying the prior correlation $\rho$; while larger values reduced the incidence of paradoxical results, setting $\rho$ as high as 0.8 did not eliminate them entirely.

7.4 Discrete Ability Spaces

Some tests or models, such as cognitive diagnosis models (CDMs; see Haertel 1990, Roussos et al. 2007, and von Davier 2005), classify subjects into one of multiple ordered categories along each dimension. In this context, an assignment is made to one of the categories, and thus the use of MLE and MAP estimation is standard. We illustrate the existence of paradoxical results on a two-dimensional ability space partitioned into the categories "master" and "non-master". Here, items are monotone in the sense of Definition 2.2 if the probability of getting an item correct for a master is greater than the probability of getting an item correct for a non-master.

Table 1 gives numerical values for a hypothetical three-item test that produces a paradoxical result, along with the log likelihood values of the answer sequences (1,0,1) and (1,0,0). We observe that the first two items create a ridge giving highest values to the assignments (non-master, master) and (master, non-master), with the latter being highest. The final item only changes with dimension 2; getting it correct moves the MLE for that dimension from non-master to master. The effect of this is to move the assignment for dimension 1 the other way, creating a paradoxical result. We note that in this example the only way to

achieve a paradoxical result is to move from the (non-master, master) category into (master, non-master) or vice versa. Due to the discrete nature of the estimates, this is less common than paradoxical results for continuous ability spaces.

Table 1: A paradoxical test on a two-dimensional discrete ability space. Left: probability of a correct answer in each class, coded n for non-master and m for master. Right: cumulative log likelihood following a (1,0,1) response pattern. The final column gives the log likelihood after a (1,0,0) response pattern. MLEs are starred. Notice that the category maximizing the likelihood moves from (n,m) to (m,n) when the third item is changed from correct to incorrect. (Numerical entries not recovered in this transcription.)

8 Enforcing Regularity

Our analysis indicates that paradoxical results are endemic to multidimensional item response models. In the case of two-dimensional linearly compensatory models, it is not possible to design a test using frequentist estimates that avoids this problem. This fact suggests a need to review the statistical estimates that we use.

One immediate method for modifying a MAP or MLE is to impose constraints on the estimate. That is, we seek ability estimates for all subjects such that no paradoxical results occur for the observed response sequences. For the case of MAP estimates, we want estimated vectors $\theta_1, \dots, \theta_n$ for $n$ subjects with

$$(\hat\theta_{\mathrm{MAP},1}, \dots, \hat\theta_{\mathrm{MAP},n}) = \operatorname*{argmax}_{\theta_1, \dots, \theta_n} \; l(\theta_1; y_1) + \dots + l(\theta_n; y_n) + \log \mu(\theta_1) + \dots + \log \mu(\theta_n)$$

subject to the constraints

$$\hat\theta_{\mathrm{MAP},j} \le \hat\theta_{\mathrm{MAP},k} \quad \text{if } y_j \le y_k. \qquad (17)$$

The number of these constraints may be reduced by taking account of the transitivity relations

$$\hat\theta_{\mathrm{MAP},i} < \hat\theta_{\mathrm{MAP},j} \ \text{ and } \ \hat\theta_{\mathrm{MAP},j} < \hat\theta_{\mathrm{MAP},k} \implies \hat\theta_{\mathrm{MAP},i} < \hat\theta_{\mathrm{MAP},k}. \qquad (18)$$

Here these inequalities could be enforced just for a single dimension of interest, or for all dimensions, depending on the purpose of the test. Similar estimates could be obtained by defining appropriate objective functions for confidence bounds and Bayesian estimates.
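A minimal sketch of the constrained estimation in (17), assuming a two-subject version of the Jane-and-Jill scenario with invented item parameters. The paper's computations used IPOPT; this sketch instead uses SciPy's SLSQP solver and, as suggested above, enforces the ordering on dimension 1 only:

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

A = np.array([[1.5, 0.5]] * 4 + [[0.2, 2.0]])   # invented item loadings
y_jane = np.array([1, 1, 0, 0, 1])               # extra item correct
y_jill = np.array([1, 1, 0, 0, 0])               # so y_jill <= y_jane

def neg_logpost(theta, y):
    p = sigmoid(A @ theta)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return -(loglik - 0.5 * np.sum(theta**2))    # N(0, I) prior

# Unconstrained MAPs exhibit the paradox on dimension 1.
jane = minimize(lambda t: neg_logpost(t, y_jane), np.zeros(2)).x
jill = minimize(lambda t: neg_logpost(t, y_jill), np.zeros(2)).x
assert jane[0] < jill[0]

# Joint estimation subject to (17) on dimension 1: theta1_jane >= theta1_jill.
# x packs both ability vectors as (theta1_jane, theta2_jane, theta1_jill, theta2_jill).
def joint(x):
    return neg_logpost(x[:2], y_jane) + neg_logpost(x[2:], y_jill)

res = minimize(joint, np.zeros(4), method="SLSQP",
               constraints=[{"type": "ineq", "fun": lambda x: x[0] - x[2]}])
assert res.x[0] >= res.x[2] - 1e-5               # ordering now respected
```

With only one binding constraint the two estimates are pulled together on dimension 1, which is the same behaviour reported for the operational data below.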
In the operational data investigated in ***** (2007), there were 2500 ability vectors to estimate, with 5,354 inequalities of the form (17). Accounting for the transitivity relations (18) reduced the number of constraints to 3,764. We performed the constrained MAP estimation using the IPOPT constrained optimization routines (Wächter and Biegler 2006), starting from the unconstrained EAP estimates. (Footnote: The EAP is used here rather than the MAP because it may be calculated without requiring a non-linear optimization. We have elected to use a Bayesian analysis in order to avoid the lack of identifiability of MLEs studied in *****.) It took 2.5 seconds of

CPU time when no constraints were imposed; this increased to 10.4 CPU seconds when the constraints were imposed for $\theta_1$ only, and to 35.5 CPU seconds when the estimates of both $\theta_1$ and $\theta_2$ were constrained to satisfy (17). The mean squared difference between the unconstrained EAP and MAP estimates was smaller than the mean squared difference between the unconstrained MAP and the MAP constrained to be regular in both $\theta_1$ and $\theta_2$; thus, the effect of enforcing regularity constraints has a somewhat larger magnitude than the choice of estimate for $\theta$. Those subjects who appeared in the four pairs violating the constraints had, on average, twice the squared displacement of the others. When the constraint was imposed on $\theta_2$ only, the mean squared difference was reduced further. Some of this difference is due to the numerical approximations used for constrained optimization within IPOPT.

It appears, then, that imposing constraints to avoid what might be termed observed paradoxical results is computationally feasible for at least moderately sized cohorts and creates only moderate distortion in the statistical estimates of a subject's ability. By way of understanding this, we give the following theorem:

Theorem 8.1. Let $\{\theta_k\}_{k=1}^n$ be a fixed set of finite ability parameters for $n$ subjects. Let $\{\hat\theta_{kN}^u\}_{k=1}^n$ be a set of $\theta$ estimates for each of these $n$ subjects based on $N$ items without constraints. Let $\hat\theta_{kN}^c$, $k = 1, \dots, n$, be the corresponding estimates under the constraints (17). Assume that $N \to \infty$ with $P_j(\theta_k)$ uniformly bounded away from zero and one for all $k = 1, \dots, n$ and $j = 1, \dots, N$. Then $\|\hat\theta_{kN}^c - \hat\theta_{kN}^u\| \to 0$ almost surely.

The proof of this theorem is given in Appendix E. It relies on the fact that observing two subjects with partially ordered response patterns is rare for long tests. In short tests, using Bayesian methods is likely to reduce the incidence of observed non-monotonicity; in long tests it is unlikely that students can be placed in a definite order at all.
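The fact on which Theorem 8.1 relies, that ordered response patterns become rare as tests lengthen, can be seen even in a crude simulation. The sketch below simplifies matters by drawing independent fifty-fifty responses rather than responses from an IRT model; the qualitative conclusion is the same:

```python
import numpy as np

rng = np.random.default_rng(0)

def comparable_fraction(n_subjects, n_items):
    """Fraction of subject pairs whose 0/1 response patterns are ordered."""
    Y = rng.integers(0, 2, size=(n_subjects, n_items))
    count = total = 0
    for j in range(n_subjects):
        for k in range(j + 1, n_subjects):
            total += 1
            d = Y[j] - Y[k]
            if (d >= 0).all() or (d <= 0).all():    # y_j >= y_k or y_j <= y_k
                count += 1
    return count / total

short, long_ = comparable_fraction(200, 5), comparable_fraction(200, 40)
print(short, long_)
assert long_ < short   # ordered pairs become rare as the test lengthens
```

For independent fair-coin responses, the probability that a pair is ordered is $2(3/4)^N - (1/2)^N$, which decays geometrically in $N$; with 40 items almost no pair is comparable, so the constraints in (17) almost never bind.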
Thus, if observed paradoxical results are the only concern, enforcing regularity constraints will remove the few instances of them without unduly distorting the estimated parameters.

The constrained optimization above would avoid the scenario described in the introduction. However, it has the disadvantage of changing the results if more subjects are added. It also does not address the counterfactual "if I had only gotten this question wrong, I would have passed." There is, of course, no reason that we need restrict to only those response patterns seen. One way to avoid these concerns is to estimate parameters for all possible response patterns, rather than just those observed, under all relevant regularity constraints. Estimates for $\theta$ then amount to matching the observed response pattern to our enumeration of all possible responses and assigning the corresponding score. For an $N$-item test with a $d$-dimensional ability space we would need to estimate all $d \, 2^N$ possible ability parameters under $d \sum_{k=0}^{N} \binom{N}{k} (N - k)$ constraints. Doing so is clearly infeasible for all but very short tests.

9 Conclusion

Jane and Jill from our opening story were meant to demonstrate the real-world consequences of paradoxical results. Clearly, any testing procedure in which doing better has adverse consequences has implications for fairness. Theirs is not a purely hypothetical case, nor a case that could occur only under pathological conditions. ***** (2007) found four pairs of subjects that matched their situation in a real-world data set, and a much larger number who could have done better by deliberately answering questions incorrectly; the current paper complements these empirical findings by providing a mathematical explanation for how and when statistical estimation can produce paradoxical results. Our discussion has uncovered sufficient conditions for paradoxical results in multidimensional item response models.
Paradoxical results in statistical estimates for multidimensional item response models are common, and they are ubiquitous in some of the most popular models. This does not make them unreasonable from a statistical standpoint, and it should not preclude the use of multidimensional models in diagnostic testing. However, our results do provide reason to be cautious about their use in high-stakes tests. Our results do not provide an analysis of the extensions cited in Sec. 7 at the level of completeness given in Sec. 6. Further extensions, such as an analysis of item bundles, should also be investigated. Should the user wish both to use statistical methods and to maintain regularity, the constrained estimates discussed in Sec. 8 provide a tractable means of at least removing observable paradoxical results. When the broader counterfactual is of concern, the practitioner may reconsider the use of multidimensional testing.

References

Ackerman, T. (1996). Graphical representation of multidimensional item response theory analyses. Applied Psychological Measurement 20(4).

Antal, T. (2007). On multidimensional item response theory: a coordinate free approach. Electronic Journal of Statistics 1.

Bock, R., R. Gibbons, and E. Muraki (1988). Full-information item factor analysis. Applied Psychological Measurement 12.

Bolt, D. and V. Lall (2003). Estimation of compensatory and noncompensatory multidimensional item response models using Markov chain Monte Carlo. Applied Psychological Measurement 27(6).

Finkelman, M., G. Hooker, and J. Wang. Unidentifiability and lack of monotonicity in the multidimensional three-parameter logistic model. Under review.

Haertel, E. (1990). Continuous and discrete latent structure models of item response data. Psychometrika 55.

Junker, B. and K. Sijtsma (2000). Latent and manifest monotonicity in item response models. Applied Psychological Measurement 24.

Reckase, M. (1985). The difficulty of test items that measure more than one ability. Applied Psychological Measurement 9.

Reckase, M. (1997). The past and future of multidimensional item response theory. Applied Psychological Measurement 21.

Roussos, L., L. DiBello, W. Stout, S. Hartz, R. Henson, and J. Templin (2007). The fusion model skills diagnosis system. Cambridge, UK: Cambridge University Press.

Segall, D. O. (2000). Principles of multidimensional adaptive testing. In W. J. van der Linden and C. A. W. Glas (Eds.), Computerized Adaptive Testing: Theory and Practice. Boston: Kluwer Academic Publishers.

Thissen, D. and L. Steinberg (1997). A response model for multiple choice items. In W. J. van der Linden and R. K. Hambleton (Eds.), Handbook of Modern Item Response Theory. New York: Springer-Verlag.

van der Linden, W. J. (1998). Stochastic order in dichotomous item response models for fixed, adaptive and multidimensional tests. Psychometrika 63(3).

van der Linden, W. J. (1999). Multidimensional adaptive testing with a minimum error-variance criterion. Journal of Educational and Behavioral Statistics 24.

von Davier, M. (2005). A general diagnostic model applied to language testing data. ETS Research Report RR-05-16, Educational Testing Service, Princeton, NJ.


More information

6.867 Machine Learning

6.867 Machine Learning 6.867 Machine Learning Problem set 1 Solutions Thursday, September 19 What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove.

More information

Bayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007

Bayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007 Bayesian inference Fredrik Ronquist and Peter Beerli October 3, 2007 1 Introduction The last few decades has seen a growing interest in Bayesian inference, an alternative approach to statistical inference.

More information

A DISSERTATION SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY. Yu-Feng Chang

A DISSERTATION SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY. Yu-Feng Chang A Restricted Bi-factor Model of Subdomain Relative Strengths and Weaknesses A DISSERTATION SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY Yu-Feng Chang IN PARTIAL FULFILLMENT

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

Bayesian Nonparametric Rasch Modeling: Methods and Software

Bayesian Nonparametric Rasch Modeling: Methods and Software Bayesian Nonparametric Rasch Modeling: Methods and Software George Karabatsos University of Illinois-Chicago Keynote talk Friday May 2, 2014 (9:15-10am) Ohio River Valley Objective Measurement Seminar

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report. Hierarchical Cognitive Diagnostic Analysis: Simulation Study

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report. Hierarchical Cognitive Diagnostic Analysis: Simulation Study Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 38 Hierarchical Cognitive Diagnostic Analysis: Simulation Study Yu-Lan Su, Won-Chan Lee, & Kyong Mi Choi Dec 2013

More information

Statistical Analysis of Q-matrix Based Diagnostic. Classification Models

Statistical Analysis of Q-matrix Based Diagnostic. Classification Models Statistical Analysis of Q-matrix Based Diagnostic Classification Models Yunxiao Chen, Jingchen Liu, Gongjun Xu +, and Zhiliang Ying Columbia University and University of Minnesota + Abstract Diagnostic

More information

Statistics. Lecture 4 August 9, 2000 Frank Porter Caltech. 1. The Fundamentals; Point Estimation. 2. Maximum Likelihood, Least Squares and All That

Statistics. Lecture 4 August 9, 2000 Frank Porter Caltech. 1. The Fundamentals; Point Estimation. 2. Maximum Likelihood, Least Squares and All That Statistics Lecture 4 August 9, 2000 Frank Porter Caltech The plan for these lectures: 1. The Fundamentals; Point Estimation 2. Maximum Likelihood, Least Squares and All That 3. What is a Confidence Interval?

More information

Comparison between conditional and marginal maximum likelihood for a class of item response models

Comparison between conditional and marginal maximum likelihood for a class of item response models (1/24) Comparison between conditional and marginal maximum likelihood for a class of item response models Francesco Bartolucci, University of Perugia (IT) Silvia Bacci, University of Perugia (IT) Claudia

More information

Mathematical Statistics

Mathematical Statistics Mathematical Statistics MAS 713 Chapter 8 Previous lecture: 1 Bayesian Inference 2 Decision theory 3 Bayesian Vs. Frequentist 4 Loss functions 5 Conjugate priors Any questions? Mathematical Statistics

More information

A noninformative Bayesian approach to domain estimation

A noninformative Bayesian approach to domain estimation A noninformative Bayesian approach to domain estimation Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 glen@stat.umn.edu August 2002 Revised July 2003 To appear in Journal

More information

A Bayesian solution for a statistical auditing problem

A Bayesian solution for a statistical auditing problem A Bayesian solution for a statistical auditing problem Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 July 2002 Research supported in part by NSF Grant DMS 9971331 1 SUMMARY

More information

Group Dependence of Some Reliability

Group Dependence of Some Reliability Group Dependence of Some Reliability Indices for astery Tests D. R. Divgi Syracuse University Reliability indices for mastery tests depend not only on true-score variance but also on mean and cutoff scores.

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

Bayesian Analysis for Natural Language Processing Lecture 2

Bayesian Analysis for Natural Language Processing Lecture 2 Bayesian Analysis for Natural Language Processing Lecture 2 Shay Cohen February 4, 2013 Administrativia The class has a mailing list: coms-e6998-11@cs.columbia.edu Need two volunteers for leading a discussion

More information

A RATE FUNCTION APPROACH TO THE COMPUTERIZED ADAPTIVE TESTING FOR COGNITIVE DIAGNOSIS. Jingchen Liu, Zhiliang Ying, and Stephanie Zhang

A RATE FUNCTION APPROACH TO THE COMPUTERIZED ADAPTIVE TESTING FOR COGNITIVE DIAGNOSIS. Jingchen Liu, Zhiliang Ying, and Stephanie Zhang A RATE FUNCTION APPROACH TO THE COMPUTERIZED ADAPTIVE TESTING FOR COGNITIVE DIAGNOSIS Jingchen Liu, Zhiliang Ying, and Stephanie Zhang columbia university June 16, 2013 Correspondence should be sent to

More information

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION VICTOR CHERNOZHUKOV CHRISTIAN HANSEN MICHAEL JANSSON Abstract. We consider asymptotic and finite-sample confidence bounds in instrumental

More information

On the Use of Nonparametric ICC Estimation Techniques For Checking Parametric Model Fit

On the Use of Nonparametric ICC Estimation Techniques For Checking Parametric Model Fit On the Use of Nonparametric ICC Estimation Techniques For Checking Parametric Model Fit March 27, 2004 Young-Sun Lee Teachers College, Columbia University James A.Wollack University of Wisconsin Madison

More information

Nonparametric Online Item Calibration

Nonparametric Online Item Calibration Nonparametric Online Item Calibration Fumiko Samejima University of Tennesee Keynote Address Presented June 7, 2007 Abstract In estimating the operating characteristic (OC) of an item, in contrast to parametric

More information

Item Response Theory (IRT) Analysis of Item Sets

Item Response Theory (IRT) Analysis of Item Sets University of Connecticut DigitalCommons@UConn NERA Conference Proceedings 2011 Northeastern Educational Research Association (NERA) Annual Conference Fall 10-21-2011 Item Response Theory (IRT) Analysis

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

Variational Principal Components

Variational Principal Components Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings

More information

The Robustness of LOGIST and BILOG IRT Estimation Programs to Violations of Local Independence

The Robustness of LOGIST and BILOG IRT Estimation Programs to Violations of Local Independence A C T Research Report Series 87-14 The Robustness of LOGIST and BILOG IRT Estimation Programs to Violations of Local Independence Terry Ackerman September 1987 For additional copies write: ACT Research

More information

Basic IRT Concepts, Models, and Assumptions

Basic IRT Concepts, Models, and Assumptions Basic IRT Concepts, Models, and Assumptions Lecture #2 ICPSR Item Response Theory Workshop Lecture #2: 1of 64 Lecture #2 Overview Background of IRT and how it differs from CFA Creating a scale An introduction

More information

The Difficulty of Test Items That Measure More Than One Ability

The Difficulty of Test Items That Measure More Than One Ability The Difficulty of Test Items That Measure More Than One Ability Mark D. Reckase The American College Testing Program Many test items require more than one ability to obtain a correct response. This article

More information

Diagnostic Classification Models: Psychometric Issues and Statistical Challenges

Diagnostic Classification Models: Psychometric Issues and Statistical Challenges Diagnostic Classification Models: Psychometric Issues and Statistical Challenges Jonathan Templin Department of Educational Psychology The University of Georgia University of South Carolina Talk Talk Overview

More information

1 Overview. 2 Learning from Experts. 2.1 Defining a meaningful benchmark. AM 221: Advanced Optimization Spring 2016

1 Overview. 2 Learning from Experts. 2.1 Defining a meaningful benchmark. AM 221: Advanced Optimization Spring 2016 AM 1: Advanced Optimization Spring 016 Prof. Yaron Singer Lecture 11 March 3rd 1 Overview In this lecture we will introduce the notion of online convex optimization. This is an extremely useful framework

More information

Undirected Graphical Models

Undirected Graphical Models Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional

More information

Whats beyond Concerto: An introduction to the R package catr. Session 4: Overview of polytomous IRT models

Whats beyond Concerto: An introduction to the R package catr. Session 4: Overview of polytomous IRT models Whats beyond Concerto: An introduction to the R package catr Session 4: Overview of polytomous IRT models The Psychometrics Centre, Cambridge, June 10th, 2014 2 Outline: 1. Introduction 2. General notations

More information

Bootstrap Tests: How Many Bootstraps?

Bootstrap Tests: How Many Bootstraps? Bootstrap Tests: How Many Bootstraps? Russell Davidson James G. MacKinnon GREQAM Department of Economics Centre de la Vieille Charité Queen s University 2 rue de la Charité Kingston, Ontario, Canada 13002

More information

Bayesian Networks in Educational Assessment Tutorial

Bayesian Networks in Educational Assessment Tutorial Bayesian Networks in Educational Assessment Tutorial Session V: Refining Bayes Nets with Data Russell Almond, Bob Mislevy, David Williamson and Duanli Yan Unpublished work 2002-2014 ETS 1 Agenda SESSION

More information

Likelihood and p-value functions in the composite likelihood context

Likelihood and p-value functions in the composite likelihood context Likelihood and p-value functions in the composite likelihood context D.A.S. Fraser and N. Reid Department of Statistical Sciences University of Toronto November 19, 2016 Abstract The need for combining

More information

6.867 Machine Learning

6.867 Machine Learning 6.867 Machine Learning Problem set 1 Due Thursday, September 19, in class What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove.

More information

ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS

ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS 1. THE CLASS OF MODELS y t {y s, s < t} p(y t θ t, {y s, s < t}) θ t = θ(s t ) P[S t = i S t 1 = j] = h ij. 2. WHAT S HANDY ABOUT IT Evaluating the

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

MULTIPLE CHOICE QUESTIONS DECISION SCIENCE

MULTIPLE CHOICE QUESTIONS DECISION SCIENCE MULTIPLE CHOICE QUESTIONS DECISION SCIENCE 1. Decision Science approach is a. Multi-disciplinary b. Scientific c. Intuitive 2. For analyzing a problem, decision-makers should study a. Its qualitative aspects

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make

More information

Step-Stress Models and Associated Inference

Step-Stress Models and Associated Inference Department of Mathematics & Statistics Indian Institute of Technology Kanpur August 19, 2014 Outline Accelerated Life Test 1 Accelerated Life Test 2 3 4 5 6 7 Outline Accelerated Life Test 1 Accelerated

More information

Bayesian vs frequentist techniques for the analysis of binary outcome data

Bayesian vs frequentist techniques for the analysis of binary outcome data 1 Bayesian vs frequentist techniques for the analysis of binary outcome data By M. Stapleton Abstract We compare Bayesian and frequentist techniques for analysing binary outcome data. Such data are commonly

More information

A General Overview of Parametric Estimation and Inference Techniques.

A General Overview of Parametric Estimation and Inference Techniques. A General Overview of Parametric Estimation and Inference Techniques. Moulinath Banerjee University of Michigan September 11, 2012 The object of statistical inference is to glean information about an underlying

More information

Computerized Adaptive Testing for Cognitive Diagnosis. Ying Cheng University of Notre Dame

Computerized Adaptive Testing for Cognitive Diagnosis. Ying Cheng University of Notre Dame Computerized Adaptive Testing for Cognitive Diagnosis Ying Cheng University of Notre Dame Presented at the Diagnostic Testing Paper Session, June 3, 2009 Abstract Computerized adaptive testing (CAT) has

More information

DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling

DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling Due: Tuesday, May 10, 2016, at 6pm (Submit via NYU Classes) Instructions: Your answers to the questions below, including

More information

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida Bayesian Statistical Methods Jeff Gill Department of Political Science, University of Florida 234 Anderson Hall, PO Box 117325, Gainesville, FL 32611-7325 Voice: 352-392-0262x272, Fax: 352-392-8127, Email:

More information

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation Patrick Breheny February 8 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/27 Introduction Basic idea Standardization Large-scale testing is, of course, a big area and we could keep talking

More information

Tutorial on Approximate Bayesian Computation

Tutorial on Approximate Bayesian Computation Tutorial on Approximate Bayesian Computation Michael Gutmann https://sites.google.com/site/michaelgutmann University of Helsinki Aalto University Helsinki Institute for Information Technology 16 May 2016

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 23 Comparison of Three IRT Linking Procedures in the Random Groups Equating Design Won-Chan Lee Jae-Chun Ban February

More information

Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn!

Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn! Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn! Questions?! C. Porciani! Estimation & forecasting! 2! Cosmological parameters! A branch of modern cosmological research focuses

More information

Probability and Statistics

Probability and Statistics Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be CHAPTER 4: IT IS ALL ABOUT DATA 4a - 1 CHAPTER 4: IT

More information