PSYCHOMETRIKA VOL. 75, NO. 3, SEPTEMBER 2010

NESTED LOGIT MODELS FOR MULTIPLE-CHOICE ITEM RESPONSE DATA

YOUNGSUK SUH
UNIVERSITY OF TEXAS AT AUSTIN

DANIEL M. BOLT
UNIVERSITY OF WISCONSIN-MADISON

Nested logit item response models for multiple-choice data are presented. Relative to previous models, the new models are suggested to provide a better approximation to multiple-choice items where the application of a solution strategy precedes consideration of response options. In practice, the models also accommodate collapsibility across all distractor categories, making it easier to allow decisions about including distractor information to occur on an item-by-item or application-by-application basis without altering the statistical form of the correct response curves. Marginal maximum likelihood estimation algorithms for the models are presented along with simulation and real data analyses.

Key words: multiple-choice items, multiple-choice models, nested logit models, nominal response model, marginal maximum likelihood estimation, item information, distractor selection information, distractor category collapsibility.

Multiple-choice items are a common form of test item in standardized testing and have been a focus of item response theory (IRT) modeling for decades. A major challenge in building appropriate models for multiple-choice tests is the variety of strategies that can be used in responding to items, and the potential for such strategies to vary depending on the type of test item or test (Hutchinson, 1991). Perhaps the most common IRT approach to modeling multiple-choice item responses is reflected by Bock's Nominal Response Model (NRM; Bock, 1972) and related models such as Thissen and Steinberg's Multiple Choice Model (MCM; Thissen & Steinberg, 1984) and Samejima's Guessing Model (SGM; Samejima, 1979).
Such models portray the item response in a competing-utility framework where each response category is associated with a selection propensity that is a function of the ability measured by the test. More recent models (e.g., Revuelta, 2004, 2005) are based on the same general framework but are designed to possess attractive statistical properties, such as rising distractor selection ratios (Love, 1997). The NRM modeling approach seems most apt for conditions in which the item response is based on a comparative evaluation of all response categories. Consider the following item from a test of English usage:

Example Item 1. Select the one underlined part of the following sentence that must be changed in order to make the sentence grammatically correct.

The average soda can has a tensile strong capable of supporting a weight
(A) (B) (C) (D)
of one hundred kilograms.

Requests for reprints should be sent to Youngsuk Suh, Department of Educational Psychology, University of Texas at Austin, 1 University Station D5800, Austin, TX 78712, USA. yssuh327@gmail.com

2010 The Psychometric Society

An anticipated strategy for such an item entails evaluating each response option and selecting the option that seems to best satisfy the requirement of the stem. In this case, option (B) should emerge as the appropriate selection for a high-ability examinee. By contrast, many multiple-choice item types invoke strategies that involve problem-solving independent of the response categories. Consider, for example, the following item from a test of elementary mathematics:

Example Item 2. A store sells 168 CDs each week. How many CDs does it sell in 24 weeks?
(A) 2196
(B) 3210
(C) 4032
(D) 6118

An expected strategy for this item would entail multiplying 168 by 24, which leads to 4032, and selecting response option (C), with no more than a surface-level evaluation of the other response options as not being correct. Under such a strategy, the distractors are only considered as potential responses if the examinee is unable to solve the item. The process might be viewed as one in which a problem-solving strategy precedes a guessing strategy (see, e.g., Hutchinson, 1991; San Martin, del Pino, & De Boeck, 2006), and where evaluation of response options only occurs when the problem-solving strategy fails.

The purpose of this paper is to present a modeling framework that may provide a better approximation to this latter process. It is further shown that the new models, unlike previous models for multiple-choice data, possess an attractive property of category collapsibility across all distractor options. This property is argued to be of practical value in settings where distractor information may be needed for some applications of the IRT model but not others.
For example, studies of cheating behavior (Wollack, 1997) or appropriateness measurement (Drasgow, Levine, & Williams, 1985) commonly find value in attending to distractor selection information, but are often conducted in the context of tests where items are intended to be scored correct/incorrect, and where applications such as test equating may be more easily handled using traditional binary models. In still other applications, such as when using an IRT model to estimate latent ability, attending to distractor selection may be useful for some items but not others, depending on the ability of the item writer to design distractors whose attractiveness varies in relation to the trait. It would thus appear that IRT models that are consistent in how the correct response option is modeled, whether including or excluding distractor information, could be of considerable practical value.

A potential limitation of multiple-choice models such as the NRM is their lack of a distractor collapsibility property. The decision to model all possible responses under the NRM, for example, implies a multivariate logit with as many categories as response options. Assuming an item with more than two categories, the correct response characteristic curve under the NRM is incompatible with the corresponding curve under a binary logistic model where the item is scored dichotomously (0/1). Figure 1 provides an illustration of the best-fitting NRM and 2-parameter logistic model (2PLM) correct response curves applied to the same five-category test item using the same response data, but where all distractors are scored incorrect in the 2PLM case. (The item and data for this example come from a real data illustration described shortly.) As the example illustrates, the difference between curves can be fairly substantial. A second purpose of this paper is therefore to illustrate the advantages of a model that possesses collapsibility with respect to all distractor categories.
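The incompatibility can be illustrated numerically. In the sketch below (all parameter values are hypothetical, and the divide-by-total form anticipates the NRM of Section 1), the logit of the collapsed correct-response probability is not linear in θ, whereas a 2PLM logit is linear by construction:

```python
import math

def nrm_probs(theta, slopes, intercepts):
    """Category selection probabilities under a divide-by-total (softmax) model."""
    z = [a * theta + c for a, c in zip(slopes, intercepts)]
    m = max(z)  # subtract the max to stabilize the softmax
    ez = [math.exp(v - m) for v in z]
    s = sum(ez)
    return [v / s for v in ez]

# Hypothetical 5-category item; the last category is the correct response.
# Slopes and intercepts each sum to zero (the usual identification constraint).
slopes = [-0.8, -0.5, -0.2, 0.0, 1.5]
intercepts = [0.4, 0.2, 0.0, -0.1, -0.5]

def correct_logit(theta):
    """Logit of the collapsed (correct vs. all-distractor) probability."""
    p = nrm_probs(theta, slopes, intercepts)[4]
    return math.log(p / (1.0 - p))

# A 2PLM logit is exactly linear in theta, so its second difference is zero;
# the collapsed divide-by-total curve has nonzero curvature.
curvature = correct_logit(1.0) - 2 * correct_logit(0.0) + correct_logit(-1.0)
print(abs(curvature) > 1e-6)  # True
```

The nonzero second difference of the collapsed logit is exactly why a separate binary 2PLM fit to correct/incorrect scores cannot reproduce the NRM's correct-response curve.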
In the current paper, we use the models to examine distractor information on an item-by-item basis, and demonstrate the potential to retain or ignore information provided by distractors in a variable way using just one model.

FIGURE 1. Illustration of best-fitting NRM correct response curve and 2PLM curve.

1. Bock's Nominal Response Model

The NRM uses a multivariate logit to model category selection. Assume v = 1, ..., m_i possible response categories for item i. A propensity function Z_iv(θ_j) = ζ_iv + λ_iv θ_j represents the attractiveness of category v as a function of an examinee ability level θ_j using two item category parameters: a slope parameter λ_iv and an intercept parameter ζ_iv. Z_iv(θ_j) is mapped to a probability metric as

$$P_{iv}(\theta_j) = \frac{\exp Z_{iv}(\theta_j)}{\sum_{k=1}^{m_i} \exp Z_{ik}(\theta_j)}. \quad (1)$$

The probability of selecting category v is thus affected not only by the propensity toward v, but also by the propensities toward all other categories, making the NRM a divide-by-total model (Thissen & Steinberg, 1986). To resolve a statistical indeterminacy, for all θ_j we set

$$\sum_{v=1}^{m_i} Z_{iv}(\theta_j) = 0, \quad (2)$$

which also implies $\sum_{v=1}^{m_i} \lambda_{iv} = 0$ and $\sum_{v=1}^{m_i} \zeta_{iv} = 0$, resulting in 2(m_i − 1) free parameters to be estimated per item. (For detailed NRM estimation procedures, see Baker & Kim, 2004.) Despite extensions of the NRM to address issues related to random guessing (e.g., the MCM of Thissen & Steinberg, 1984; the SGM of Samejima, 1979), the NRM generally provides as good a fit to real data as these more complex models (Drasgow, Levine, Tsien, Williams, & Mead, 1995).

2. Nested Logit Models

Nested logit models (NLMs; McFadden, 1981, 1982) provide an alternative to multivariate logit models, and are appropriate for choice settings where selection possesses a hierarchical structure, as when a final choice decision is made through a sequential process. An NLM represents the final choice among a discrete set of choice options conditional upon choices made at

higher levels in the hierarchy. The resulting probability of each discrete choice is modeled as a product of the conditional and unconditional probabilities across levels of the hierarchy. In this paper, we adapt the NLM approach to incorporate latent traits, such as an ability θ in IRT, to provide a competing approach to the NRM for multiple-choice test items. Using the NLM framework, we assume the correct response probability to be formulated by the 2PLM or the 3-parameter logistic model (3PLM), and model distractor selection conditional upon an incorrect response using Bock's NRM. This results in an NLM having two levels: a higher level (level 1) introducing branches that distinguish a correct versus an incorrect response, and a lower level (level 2) introducing branches that distinguish distractors. The response options are consequently separated into two nests, one nest possessing the correct response only, and the second nest possessing all distractors. Formulated in this way, NLMs provide a different portrayal of how the examinee arrives at a correct response; while the NRM emphasizes a comparative evaluation of response options, the NLMs emphasize a solution strategy that occurs independent of evaluating the options. Although the most accurate representation probably lies somewhere in between (see Section 5), the NLM strategy might be expected to provide a better approximation for many multiple-choice items, such as items represented by example item 2.

2.1. 3PL-Nested Logit Model

While we will consider both 2PLM and 3PLM versions of the NLMs described above (denoted 2PL-NLM and 3PL-NLM, respectively), we consider the 3PL-NLM in greater detail, recognizing the 2PL-NLM as a special case. Suppose a multiple-choice test is composed of n items and each item has one correct answer and m_i distractors, or a total of m_i + 1 response categories.
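The two-level structure just described can be sketched in code; the following is an illustrative implementation with hypothetical parameter values (the formal definitions are given below), where setting gamma = 0 yields the 2PL-NLM:

```python
import math

def nlm_category_probs(theta, alpha, beta, gamma, lam, zeta):
    """Response-category probabilities under a 3PL nested logit model:
    a 3PLM for the correct response at level 1, and a conditional
    divide-by-total model over the distractors given an incorrect
    response at level 2.  Setting gamma = 0 gives the 2PL-NLM."""
    # Level 1: probability of a correct response (3PLM).
    p_correct = gamma + (1.0 - gamma) / (1.0 + math.exp(-(beta + alpha * theta)))
    # Level 2: distractor choice conditional on an incorrect response
    # (softmax over the distractor categories only).
    z = [c + a * theta for a, c in zip(lam, zeta)]
    m = max(z)
    ez = [math.exp(v - m) for v in z]
    s = sum(ez)
    p_distractors = [(1.0 - p_correct) * v / s for v in ez]
    return p_correct, p_distractors

# Hypothetical item: distractor slopes/intercepts sum to zero (identification).
p1, pd = nlm_category_probs(theta=0.5, alpha=1.2, beta=-0.3, gamma=0.2,
                            lam=[0.6, -0.1, -0.5], zeta=[0.3, 0.1, -0.4])
print(round(p1 + sum(pd), 10))  # 1.0: the categories partition the outcome space
```

Note that, unlike the NRM, the correct response never enters the softmax denominator; collapsing the distractors simply removes the level-2 factor and leaves the level-1 curve untouched.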
Let U_ij represent the item i response by examinee j (j = 1, ..., N) once keyed for correctness (i.e., U_ij = 1 if correct, 0 if incorrect). Further, let D_ijv denote the item response in an item × examinee × distractor category array such that D_ijv = 1 when examinee j selects distractor category v (v = 1, ..., m_i) of item i, and 0 otherwise. Under the 3PLM, the probability that an examinee of ability θ_j chooses the correct response category on item i is modeled as

$$P(U_{ij}=1 \mid \theta_j) = \gamma_i + (1-\gamma_i)\,\frac{1}{1+\exp[-(\beta_i+\alpha_i\theta_j)]}, \quad (3)$$

where β_i is an intercept parameter, α_i a slope parameter, and γ_i a lower asymptote parameter, also referred to as a pseudo-guessing parameter. The probability that an examinee selects distractor category v is modeled as the product of the probability of an incorrect response and the probability of selecting distractor category v conditional upon an incorrect response:

$$P(U_{ij}=0, D_{ijv}=1 \mid \theta_j) = P(U_{ij}=0 \mid \theta_j)\,P(D_{ijv}=1 \mid U_{ij}=0, \theta_j) = \left\{1-\left[\gamma_i+(1-\gamma_i)\frac{1}{1+\exp[-(\beta_i+\alpha_i\theta_j)]}\right]\right\}\frac{\exp Z_{iv}(\theta_j)}{\sum_{k=1}^{m_i}\exp Z_{ik}(\theta_j)}. \quad (4)$$

As under the NRM, we use Z_iv(θ_j) = ζ_iv + λ_iv θ_j to define a propensity toward each distractor category v, now conditional upon an incorrect response. Unlike the NRM, the denominator in the conditional probability is obtained by summing exp Z_ik(θ_j) across only the distractor categories. Following Bock (1972), an arbitrary linear restriction as in Equation (2) is imposed for the distractor category parameters.

Figure 2 plots item category characteristic curves (ICCCs) for a simulated multiple-choice item with four response categories, where the fourth category represents the correct response. When item responses for the item are scored as binary and analyzed by the 3PLM, the left-side plot represents the characteristic curve for the correct response. For the same item, use of

the 3PL-NLM results in the ICCCs shown to the right.

FIGURE 2. ICCCs for a simulated item under the 3PLM and 3PL-NLM.

It should be noted that the item parameter estimates for α, β, and γ are identical under both models, as the correct response probability is formulated in both instances under the 3PLM and is not informed by the particular distractors selected. Naturally, the 2PL-NLM is also represented by Equations (3) and (4) above, but where γ_i = 0. An appealing feature of the 2PL-NLM is that it contains the same number of parameters as the NRM. Thus, for a given dataset, the two models can be compared with respect to loglikelihood in terms of which provides the better structural representation of the data.

2.2. Item Parameter Estimation via an MML Approach for the 3PL-NLM

Estimation of the 3PL-NLM is possible using a variant of Bock and Aitkin's (1981) marginal maximum likelihood (MML) procedure. Using the U_ij and D_ijv notation above, let U_j denote the correct response vector for examinee j, let D_j represent the response pattern matrix of distractor categories for examinee j, and let [U_j, D_j] denote the complete n × [max(m_i) + 1] item response matrix for examinee j. To simplify the notation in Equations (3) and (4), let P(U_ij = 1 | θ_j) = P_i(θ_j), P(U_ij = 0, D_ijv = 1 | θ_j) = P_iv(θ_j), and P(D_ijv = 1 | U_ij = 0, θ_j) = P_{iv|u=0}(θ_j). Then, assuming local independence, the conditional probability of a response pattern matrix for examinee j given θ_j is the joint probability

$$P([U_j, D_j] \mid \theta_j, \varpi) = \prod_{i=1}^{n}\left[P_i(\theta_j)^{u_{ij}}\prod_{v=1}^{m_i}P_{iv}(\theta_j)^{d_{ijv}}\right] = \prod_{i=1}^{n}\left[P_i(\theta_j)^{u_{ij}}\prod_{v=1}^{m_i}\big(1-P_i(\theta_j)\big)^{d_{ijv}}P_{iv|u=0}(\theta_j)^{d_{ijv}}\right], \quad (5)$$

where ϖ denotes all item parameters. Under Bock and Aitkin's (1981) approach, the marginal probability of the observed response pattern for examinee j is expressed as

$$P([U_j, D_j]) = \int P([U_j, D_j] \mid \theta, \varpi)\,g(\theta \mid \tau)\,d\theta,$$

where g(θ | τ) is a density function with unknown parameters τ. (The j subscript on θ is dropped because θ_j can be seen as a random subject sampled from a population.) When combined across examinees, we write the likelihood as $L = \prod_{j=1}^{N} P([U_j, D_j])$, and the natural logarithm of the likelihood is

$$\log L = \sum_{j=1}^{N} \log P([U_j, D_j]). \quad (6)$$

The total number of estimable item parameters is $\sum_{i=1}^{n}(2m_i+3)$. However, in deriving the likelihood equations, it proves convenient to substitute, following the restriction of Equation (2), a reparameterization of the NRM probability (i.e., P_{iv|u=0}(θ_j)) that reduces the number of parameters by 2n. Following Bock (1972), instead of estimating ζ_v and λ_v (v = 1, ..., m_i), we use parameters η_v and ξ_v (v = 1, ..., m_i − 1) that are defined by difference contrasts of the parameters ζ_v and λ_v. For example, when m_i = 3, the new parameters are defined as

$$\begin{bmatrix}\eta_1 & \xi_1\\ \eta_2 & \xi_2\end{bmatrix} = T\begin{bmatrix}\zeta_1 & \lambda_1\\ \zeta_2 & \lambda_2\\ \zeta_3 & \lambda_3\end{bmatrix} = \begin{bmatrix}1 & -1 & 0\\ 1 & 0 & -1\end{bmatrix}\begin{bmatrix}\zeta_1 & \lambda_1\\ \zeta_2 & \lambda_2\\ \zeta_3 & \lambda_3\end{bmatrix} = \begin{bmatrix}\zeta_1-\zeta_2 & \lambda_1-\lambda_2\\ \zeta_1-\zeta_3 & \lambda_1-\lambda_3\end{bmatrix}, \quad (7)$$

where T is a transformation matrix. The likelihood equations can be derived with respect to these new parameters η_v and ξ_v for the distractor categories, as well as with respect to the item parameters for the correct response category (β, α, and γ). Suppose ω_ih represents an item parameter to be estimated for item i and category h. The likelihood in Equation (6) can be differentiated with respect to ω_ih as

$$\frac{\partial \log L}{\partial \omega_{ih}} = \sum_{j=1}^{N}\big\{P([U_j, D_j])\big\}^{-1}\int\left[\frac{\partial}{\partial\omega_{ih}}\log P([U_j, D_j] \mid \theta, \varpi)\right]P([U_j, D_j] \mid \theta, \varpi)\,g(\theta \mid \tau)\,d\theta,$$

where

$$\log P([U_j, D_j] \mid \theta, \varpi) = \sum_{i=1}^{n}\left\{u_{ij}\log P_i(\theta) + \sum_{v=1}^{m_i} d_{ijv}\big[\log\big(1-P_i(\theta)\big)+\log P_{iv|u=0}(\theta)\big]\right\}. \quad (8)$$
When derived for the correct response category of item i, the first partial derivative of Equation (8) with respect to ω_ih can be written as

$$\frac{\partial}{\partial\omega_{ih}}\log P([U_j, D_j] \mid \theta, \varpi) = \frac{\partial}{\partial\omega_{ih}}\left[u_{ij}\log P_i(\theta) + \sum_{v=1}^{m_i} d_{ijv}\log\big(1-P_i(\theta)\big)\right] = \frac{\partial}{\partial\omega_{ih}}\left[u_{ij}\log P_i(\theta) + (1-u_{ij})\log\big(1-P_i(\theta)\big)\right]. \quad (9)$$

The summation across items in Equation (8) can be eliminated by assuming that the item parameter estimates are independent across items. As shown in Equation (9), the estimation of the

correct category parameters proceeds independent of the distractor category parameters. The first derivative for a distractor category parameter reduces to

$$\frac{\partial}{\partial\omega_{ih}}\log P([U_j, D_j] \mid \theta, \varpi) = \frac{\partial}{\partial\omega_{ih}}\left[\sum_{v=1}^{m_i} d_{ijv}\log P_{iv|u=0}(\theta)\right], \quad (10)$$

implying the distractor category parameters can be estimated independent of the correct category parameters. Estimates of the new parameters η and ξ can then be used to find the values of the estimates of the original and more conventional parameters ζ and λ for each item. For the case in which m_i = 3, using Equation (7) and the constraints (i.e., $\sum_{v=1}^{m_i}\zeta_{iv}=0$ and $\sum_{v=1}^{m_i}\lambda_{iv}=0$) yields

$$\zeta_1 = \frac{\eta_1+\eta_2}{3}, \qquad \zeta_2 = \frac{\eta_2-2\eta_1}{3}, \qquad \zeta_3 = \frac{\eta_1-2\eta_2}{3}, \quad (11)$$

and

$$\lambda_1 = \frac{\xi_1+\xi_2}{3}, \qquad \lambda_2 = \frac{\xi_2-2\xi_1}{3}, \qquad \lambda_3 = \frac{\xi_1-2\xi_2}{3}. \quad (12)$$

An EM estimation algorithm was programmed using FORTRAN (Digital Equipment Corporation, 1997). The quadrature points and weights, and initial values of the item parameters, were chosen using the same default values as in BILOG-MG (Zimowski, Muraki, Mislevy, & Bock, 2003). The convergence criterion for both the Newton-Raphson iterations and the EM algorithm in terms of parameter change was set to , and the maximum number of EM cycles to 200. Additional details on derivations of the likelihood equations and implementation of the EM algorithm (including the procedure for computing the standard errors of the item parameter estimates) and the software can be obtained from the first author.

2.3. Information Functions for the 3PL-NLM

A potential advantage of the NLMs relates to their quantification of item information, and the ease of studying the relative contribution of distractor categories. Due to the distractor collapsibility property, it becomes possible to directly compare the relative amounts of information provided when including versus excluding distractor information using the estimates of just the one model.
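As a rough numerical sketch of the marginal likelihood that the MML/EM machinery of Section 2.2 maximizes, the following uses a crude normal-weighted grid in place of proper quadrature; the items, response patterns, and function names are hypothetical:

```python
import math

def nlm_cat_probs(theta, alpha, beta, gamma, lam, zeta):
    """Correct-response probability and distractor probabilities (NLM)."""
    p1 = gamma + (1 - gamma) / (1 + math.exp(-(beta + alpha * theta)))
    z = [c + a * theta for a, c in zip(lam, zeta)]
    m = max(z)
    ez = [math.exp(v - m) for v in z]
    s = sum(ez)
    return p1, [(1 - p1) * v / s for v in ez]

def marginal_loglik(responses, items, nodes=21):
    """log L = sum_j log  P([U_j, D_j] | theta) g(theta) dtheta, approximated
    on an evenly spaced grid with N(0, 1) weights -- a crude stand-in for the
    quadrature used in BILOG-MG-style MML estimation."""
    thetas = [-4 + 8 * q / (nodes - 1) for q in range(nodes)]
    w = [math.exp(-t * t / 2) for t in thetas]
    wsum = sum(w)
    w = [v / wsum for v in w]  # normalized grid weights
    total = 0.0
    for pattern in responses:           # pattern[i] = 0 (correct) or v = 1..m_i
        marg = 0.0
        for t, wt in zip(thetas, w):
            lik = 1.0
            for resp, (a, b, g, lam, zeta) in zip(pattern, items):
                p1, pd = nlm_cat_probs(t, a, b, g, lam, zeta)
                lik *= p1 if resp == 0 else pd[resp - 1]
            marg += wt * lik
        total += math.log(marg)
    return total

# Two hypothetical 4-category items (gamma = 0, i.e., the 2PL-NLM).
items = [(1.0, 0.0, 0.0, [0.5, 0.0, -0.5], [0.2, 0.0, -0.2]),
         (1.5, -0.5, 0.0, [0.3, 0.0, -0.3], [0.1, 0.0, -0.1])]
print(marginal_loglik([[0, 0], [2, 0], [0, 3]], items) < 0)  # True: a log-probability
```

An EM algorithm alternates between computing posterior weights for each examinee at the grid points (E-step) and maximizing the weighted complete-data likelihood with respect to the item parameters (M-step); the sketch above only evaluates the objective.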
As noted earlier, such a property is not present in the NRM, where the collapsing of distractor categories results in a change to the correct response ICCC that also implies a change in information. Such a feature makes it difficult to compare information under the two different forms of scoring. Information functions are particularly useful for the NLMs as they can be used to quantify the increase in the precision of ability estimates when attending to distractors. Information functions for the 3PL-NLM can be derived as follows. As shown in Equation (5), the conditional probability of a response pattern matrix for examinee j is written as

$$L_j = P([U_j, D_j] \mid \theta_j, \varpi) = \prod_{i=1}^{n}\left[P_i(\theta_j)^{u_{ij}}\prod_{v=1}^{m_i}Q_i(\theta_j)^{d_{ijv}}P_{iv|u=0}(\theta_j)^{d_{ijv}}\right], \quad (13)$$

where

$$P_i(\theta_j) = \gamma_i + (1-\gamma_i)\,\frac{1}{1+\exp[-(\beta_i+\alpha_i\theta_j)]}, \qquad Q_i(\theta_j) = 1 - P_i(\theta_j),$$

and

$$P_{iv|u=0}(\theta_j) = \frac{\exp Z_{iv}(\theta_j)}{\sum_{k=1}^{m_i}\exp Z_{ik}(\theta_j)}.$$

To simplify notation, let P_i(θ_j) = P_ij, Q_i(θ_j) = Q_ij, P_{iv|u=0}(θ_j) = P_{ijv|u=0}, and Z_ik(θ_j) = Z_ijk. Then taking the natural logarithm of the likelihood function for examinee j yields

$$\log L_j = \sum_{i=1}^{n}\left\{u_{ij}\log P_{ij} + \sum_{v=1}^{m_i}d_{ijv}\big[\log Q_{ij} + \log P_{ijv|u=0}\big]\right\}, \quad (14)$$

and the first partial derivative of the log-likelihood with respect to θ_j is

$$\frac{\partial \log L_j}{\partial \theta_j} = \sum_{i=1}^{n}\left\{u_{ij}\,\alpha_i\frac{P^{*}_{ij}Q_{ij}}{P_{ij}} + \sum_{v=1}^{m_i}d_{ijv}\left[-\alpha_i P^{*}_{ij} + \frac{\sum_{k=1}^{m_i}\exp Z_{ijk}\,(\lambda_{iv}-\lambda_{ik})}{S_{ij}}\right]\right\}, \quad (15)$$

where

$$P^{*}_{i}(\theta_j) = \frac{\exp(\beta_i+\alpha_i\theta_j)}{1+\exp(\beta_i+\alpha_i\theta_j)} \quad\text{and}\quad S_{ij} = \sum_{k=1}^{m_i}\exp Z_{ijk}.$$

Then the second partial derivative of the log-likelihood with respect to θ_j is

$$\frac{\partial^2 \log L_j}{\partial \theta_j^2} = \sum_{i=1}^{n}\left\{-u_{ij}\,\alpha_i^2 P^{*}_{ij}Q_{ij}\frac{P_{ij}^2-\gamma_i}{(1-\gamma_i)P_{ij}^2} + \sum_{v=1}^{m_i}d_{ijv}\left[-\alpha_i^2\frac{P^{*}_{ij}Q_{ij}}{1-\gamma_i} + \frac{\partial}{\partial\theta_j}\left(\frac{\sum_{k=1}^{m_i}\exp Z_{ijk}\,(\lambda_{iv}-\lambda_{ik})}{S_{ij}}\right)\right]\right\}, \quad (16)$$

where

$$\frac{\partial}{\partial\theta_j}\left(\frac{\sum_{k=1}^{m_i}\exp Z_{ijk}\,(\lambda_{iv}-\lambda_{ik})}{S_{ij}}\right) = \frac{S_{ij}\sum_{k=1}^{m_i}\exp Z_{ijk}\,\lambda_{ik}(\lambda_{iv}-\lambda_{ik}) - \left[\sum_{k=1}^{m_i}\exp Z_{ijk}\,(\lambda_{iv}-\lambda_{ik})\right]\left[\sum_{k=1}^{m_i}\lambda_{ik}\exp Z_{ijk}\right]}{S_{ij}^2} \quad (17)$$

and Q_i(θ_j) = 1 − P_i(θ_j). This second partial derivative contains observed data values. Following usual practice (Kendall & Stuart, 1967), the u_ij and d_ijv are replaced by their expected values P_ij and (1 − P_ij)P_{ijv|u=0}, respectively, resulting in

$$-E\!\left(\frac{\partial^2 \log L_j}{\partial \theta_j^2}\right) = \sum_{i=1}^{n}\left[P_{ij}I_{iu}(\theta_j) + \sum_{v=1}^{m_i}P_{ijv}I_{iv}(\theta_j)\right], \quad (18)$$

where P_ijv = (1 − P_ij)P_{ijv|u=0}. The test information function is then given by

$$I(\theta_j) = -E\!\left(\frac{\partial^2 \log L_j}{\partial \theta_j^2}\right) = \sum_{i=1}^{n}\left[P_{ij}I_{iu}(\theta_j) + \sum_{v=1}^{m_i}P_{ijv}I_{iv}(\theta_j)\right]. \quad (19)$$
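The decomposition in Equations (18) and (19) can be checked numerically without closed forms, using central differences to approximate the category information I_v = −∂² log P_v/∂θ² for each category (the item below is hypothetical):

```python
import math

def nlm_cat_probs(theta, alpha, beta, gamma, lam, zeta):
    """All category probabilities: index 0 is the correct response."""
    p1 = gamma + (1 - gamma) / (1 + math.exp(-(beta + alpha * theta)))
    z = [c + a * theta for a, c in zip(lam, zeta)]
    m = max(z)
    ez = [math.exp(v - m) for v in z]
    s = sum(ez)
    return [p1] + [(1 - p1) * v / s for v in ez]

def information_shares(theta, item, h=1e-4):
    """Category information shares P_v(theta) * I_v(theta), where
    I_v = -d^2 log P_v / d theta^2 is approximated by central differences
    (a numerical stand-in for the closed forms in Equations (21)-(22))."""
    p0 = nlm_cat_probs(theta, *item)
    pm = nlm_cat_probs(theta - h, *item)
    pp = nlm_cat_probs(theta + h, *item)
    shares = []
    for lo, mid, hi in zip(pm, p0, pp):
        d2 = (math.log(lo) - 2 * math.log(mid) + math.log(hi)) / (h * h)
        shares.append(-mid * d2)
    return shares  # shares[0]: correct category; shares[1:]: distractors

# Hypothetical 3PL-NLM item: alpha, beta, gamma, distractor lambdas, zetas.
item = (1.2, 0.0, 0.2, [0.5, 0.0, -0.5], [0.2, 0.0, -0.2])
s = information_shares(0.0, item)
print(sum(s) > sum(s[1:]) > 0)  # True: distractors add information beyond the correct category
```

The sum of the shares is the item information; comparing sum(s) against the correct-category share alone quantifies the increment obtained by attending to distractors.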

For each item, the item information function is given by

$$I_i(\theta_j) = P_{ij}I_{iu}(\theta_j) + \sum_{v=1}^{m_i}P_{ijv}I_{iv}(\theta_j), \quad (20)$$

where P_ij I_iu(θ_j) is the contribution of the correct response category to item information, and P_ijv I_iv(θ_j) is the contribution of distractor category v. Each of these terms is referred to as the information share of a category (Baker & Kim, 2004; Samejima, 1969, 1972, 1977). Here, the information share of the correct response category is

$$P_{ij}I_{iu}(\theta_j) = \left[\gamma_i+(1-\gamma_i)\frac{1}{1+\exp[-(\beta_i+\alpha_i\theta_j)]}\right]\left[\alpha_i^2\,P^{*}_{ij}Q_{ij}\,\frac{P_{ij}^2-\gamma_i}{(1-\gamma_i)P_{ij}^2}\right], \quad (21)$$

the same as in the traditional 3PLM, while the information share of any distractor category v is

$$P_{ijv}I_{iv}(\theta_j) = \left\{1-\left[\gamma_i+(1-\gamma_i)\frac{1}{1+\exp[-(\beta_i+\alpha_i\theta_j)]}\right]\right\}\frac{\exp Z_{ijv}}{\sum_{k=1}^{m_i}\exp Z_{ijk}}\left[\alpha_i^2\frac{P^{*}_{ij}Q_{ij}}{1-\gamma_i} + \frac{\big[\sum_{k=1}^{m_i}\exp Z_{ijk}(\lambda_{iv}-\lambda_{ik})\big]\big[\sum_{k=1}^{m_i}\lambda_{ik}\exp Z_{ijk}\big] - S_{ij}\sum_{k=1}^{m_i}\exp Z_{ijk}\,\lambda_{ik}(\lambda_{iv}-\lambda_{ik})}{S_{ij}^2}\right], \quad (22)$$

allowing for quantification of the incremental information provided by the distractor categories.

3. Simulation Studies

Simulation studies were conducted to investigate (1) the parameter recovery of the NLMs, (2) the empirical distinguishability of the 2PL-NLM, 3PL-NLM, and NRM, and (3) the statistical performance of the NLMs in testing for and quantifying distractor information.

3.1. Simulation Study Designs

3.1.1. Simulation 1: Parameter Recovery Study. Parameter recovery for the 2PL-NLM and 3PL-NLM was evaluated for varying sample size (1,000 and 5,000 examinees) and test length (10-, 20-, and 50-item tests) conditions. For each combination of conditions, examinee ability parameters were generated as θ ~ Normal(0, 1). Item parameters were generated randomly from the following distributions: α ~ Uniform(0.75, 2) and β ~ Uniform(−2.5, 2.5) for the correct response category, and λ_v ~ Uniform(−2, 2) and ζ_v ~ Uniform(−2, 2) for the distractor categories, followed by the imposition of the constraints $\sum_{v=1}^{m_i}\zeta_{iv}=0$ and $\sum_{v=1}^{m_i}\lambda_{iv}=0$.
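A minimal sketch of this generating process, under the stated parameter distributions, might look as follows (the centering step is one way to impose the sum-to-zero constraints; the function names are our own):

```python
import math
import random

def draw_item_params(m=3, rng=random):
    """Simulation 1 generating distributions: alpha ~ U(0.75, 2),
    beta ~ U(-2.5, 2.5), distractor lambda_v, zeta_v ~ U(-2, 2),
    then centered so each set sums to zero."""
    alpha = rng.uniform(0.75, 2.0)
    beta = rng.uniform(-2.5, 2.5)
    lam = [rng.uniform(-2.0, 2.0) for _ in range(m)]
    zeta = [rng.uniform(-2.0, 2.0) for _ in range(m)]
    lam = [v - sum(lam) / m for v in lam]
    zeta = [v - sum(zeta) / m for v in zeta]
    return alpha, beta, lam, zeta

def simulate_response(theta, alpha, beta, gamma, lam, zeta, rng=random):
    """Draw one NLM response: 0 = correct, v = 1..m selects distractor v."""
    p1 = gamma + (1.0 - gamma) / (1.0 + math.exp(-(beta + alpha * theta)))
    if rng.random() < p1:
        return 0
    z = [c + a * theta for a, c in zip(lam, zeta)]
    mx = max(z)
    ez = [math.exp(v - mx) for v in z]
    u, acc = rng.random() * sum(ez), 0.0
    for v, w in enumerate(ez, start=1):
        acc += w
        if u <= acc:
            return v
    return len(ez)

rng = random.Random(2024)
alpha, beta, lam, zeta = draw_item_params(3, rng)
data = [simulate_response(rng.gauss(0.0, 1.0), alpha, beta, 0.25, lam, zeta, rng)
        for _ in range(1000)]
print(min(data) >= 0 and max(data) <= 3)  # True: responses fall in the 4 categories
```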
Four-category item responses (one correct response and three distractor categories) were generated following either the 2PL-NLM or 3PL-NLM. The same item parameters were applied to generate both the 2PL-NLM and 3PL-NLM data, with γ for the 3PL-NLM set at 0.25 for all items. As has been observed when estimating the 3PLM, the γ parameter is generally difficult to recover without a prior; our intent in using a constant value of 0.25 was based on our desire to match the true parameter with the prior so as to better ascertain the impact of the presence of the guessing parameter on recovery of the other parameters. Thus, the γ parameter was assigned a beta prior with parameters of 5 and 15 during the EM process for item parameter estimation (for detailed procedures, see Baker & Kim, 2004). 100 replications were simulated for each combination of conditions. The accuracy

of item parameter estimates was evaluated with respect to bias (estimated minus true) and root mean squared error (RMSE). For the distractor category parameters, the estimated η_iv and ξ_iv (v = 1, ..., m_i − 1) were converted to ζ_iv and λ_iv (v = 1, ..., m_i). In order to demonstrate ability parameter estimation and the value of attending to distractors under the NLM, Expected a Posteriori (EAP) estimates were obtained under the 2PLM and 2PL-NLM for the 10- and 50-item test length conditions. Response patterns for 5,000 examinees were simulated at each of 13 discrete θ values ranging from −3.4 to 3.4. Bias and RMSEs were then computed at each of the θ levels generated with respect to both the 2PLM and 2PL-NLM. In addition, the test information functions under the two models were evaluated at each of the θ levels.

3.1.2. Simulation 2: Model Comparison Study. To evaluate the empirical distinguishability of the NLMs and NRM, a second simulation study was conducted in which the NLMs and NRM were fit to data generated from each of the three models. The models were compared using several likelihood-based criteria: AIC (Akaike, 1974), BIC (Schwarz, 1978), and CAIC (Bozdogan, 1987). Item response data were simulated using as generating parameters the corresponding item parameter estimates from the 36 items studied in the real data analysis reported in the next section (see Table 5 of Section 4 for the 2PL-NLM estimates[1]). In each case, data were generated for 3,000 examinees as θ ~ Normal(0, 1). 100 datasets were simulated with respect to each of the 2PL-NLM, 3PL-NLM, and NRM. Each of these datasets was then fit using each of the three models, with the AIC, BIC, and CAIC applied to evaluate whether the correct generating model was identified. For the 3PL-NLM, the component of the loglikelihood associated with the prior[2] was removed when calculating the indices.

3.1.3. Simulation 3: Distractor Information Study.
As noted, an important practical benefit of the NLMs is their potential to quantify the contribution of distractor information to the overall information provided by an item. We consider two aspects of this process: (1) testing whether distractors provide incremental information, and (2) quantifying the information provided. In testing for information, we apply a likelihood ratio (LR) test comparing a model in which the distractors are assumed to provide no information (a reduced model) against one in which they do (an augmented model). The reduced model under the 2PL-NLM and 3PL-NLM assumes λ_v = 0 for all distractor categories. The LR test was performed on an item-by-item basis, with the reduced model fit so as to allow distractor information for all items but the studied item. While such a test may be desirable to determine whether distractors provide information, in actual practice greater value may be placed on the quantification of the information provided by distractors. Following Section 2, such a quantification is provided by the estimated information share of the distractor categories, as computed from the item parameter estimates.

In order to evaluate the performance of the LR test and the precision of distractor information estimates, a third simulation was conducted. The first part of the simulation evaluated the Type I error performance of the LR test. Data were generated from each of the 2PL-NLM, 3PL-NLM, and NRM. The generating item parameters were based on those observed as estimates in the real data analysis, but with restrictions imposed to reflect the reduced condition for each of the three models. For the NRM, the reduced condition was simulated by setting λ_1 = ··· = λ_4 = −α/4 across the distractor categories. Data were generated for 3,000 examinees and 36 items.

[1] The 3PL-NLM and NRM item parameter estimates (and their standard errors) are available from the first author upon request.
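The item-level LR test just described can be sketched as follows; the fitted log-likelihood values and the critical value below are hypothetical, with degrees of freedom equal to the number of freed distractor slope parameters:

```python
def lr_test(loglik_full, loglik_reduced, crit):
    """Item-level likelihood ratio test for incremental distractor information.
    The reduced model fixes the studied item's distractor slopes at
    lambda_v = 0; crit is the chi-square critical value at df equal to the
    number of freed slope parameters (m_i - 1 under the sum-to-zero constraint)."""
    G2 = 2.0 * (loglik_full - loglik_reduced)
    return G2, G2 > crit

# Hypothetical fitted log-likelihoods for one 4-category item (df = 2;
# 5.991 is the chi-square(2) critical value at alpha = 0.05).
G2, reject = lr_test(loglik_full=-41210.7, loglik_reduced=-41216.9, crit=5.991)
print(round(G2, 6), reject)
```

Given the Type I error inflation reported below, such a test is best used alongside the estimated distractor information share rather than as the sole basis for including distractors.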
[2] A beta prior with parameters of 4 and 16 was used when estimating the guessing parameters in this case, where each item has five response categories.

TABLE 1. Simulation 1 results: average bias and RMSE for correct category parameters.

TABLE 2. Simulation 1 results: average RMSE for distractor category parameters.

The second part of this simulation evaluated the recovery of the estimated information share of distractor categories. Data were again generated under the three models using the real data item parameter estimates, but now without the restrictions implied by the reduced condition. Recovery was evaluated by comparing the estimated information share against the true information share as calculated from the generating parameters. For the NRM, the information share of categories was calculated using methods described by Baker and Kim (2004).

3.2. Simulation Study Results

3.2.1. Simulation 1: Parameter Recovery Study Results. Bias and RMSEs for the distractor category parameters were collapsed across item categories and items to create an average bias and RMSE for each distractor category parameter type. Similarly, the recovery results for the correct response category parameters were averaged across items. Results are provided in Tables 1 and 2 across all conditions. Bias for the correct response parameters is close to zero under the 2PL-NLM for all conditions, implying no apparent evidence of systematic underestimation or overestimation. (It should be noted that the bias for the distractor parameters is forced to 0 due to the constraint in Equation (2).) Bias is somewhat larger under the 3PL-NLM (i.e., positive for β and negative for α), which may be attributed in part to the greater influence of the priors and to small departures of the distribution of generating parameters from that assumed by the priors. RMSEs for the correct response parameters are smaller than those for the distractor parameters under the 2PL-NLM.
Similar patterns in relation to the effects of conditions are found for the 2PL-NLM and 3PL-NLM. No systematic patterns related to the number of items are apparent. Also, as expected, the

RMSEs for the 3PL-NLM were consistently larger than for the 2PL-NLM. Most importantly, the overall results seen here appear comparable to those previously observed using MML techniques under the NRM (Wollack, Bolt, Cohen, & Lee, 2002), suggesting that the NLMs appear to be at least as good as the NRM in terms of parameter recovery. We further confirmed the consistency of our recovery results in comparison to both the 2PLM and 3PLM by comparing results for our generated data when estimated using BILOG-MG with the same γ prior; results were effectively the same.

TABLE 3. Simulation 1 results: average bias and RMSE for ability parameters.

Bias and RMSEs for θ are provided in Table 3. Based on the test information functions shown in Figure 3, it would appear that the distractors provide their greatest relative increases in information at both low and intermediate levels of θ, which is expected as the distractors are more commonly selected among examinees not of high ability. These results are also supported by the RMSEs of Table 3, where the greatest relative declines in RMSE when moving from the 2PLM to the 2PL-NLM are seen for lower θ levels. Not surprisingly, there is also a reduction in bias under the 2PL-NLM at the extreme low θ levels, again owing to the greater amount of information about θ provided by attending to distractors. The results of Table 3 appear to consistently support the value of attending to distractor information. The weighted average statistics in the final row of Table 3 show that when assuming a Normal(0, 1) distribution for θ, we obtain an approximately 30% decrease in RMSE.

3.2.2. Simulation 2: Model Comparison Study Results. Table 4 shows the number of times out of 100 that each model fit best according to each likelihood criterion for each generating model.
Although there is some tendency for confusion between the 2PL-NLM and 3PL-NLM when the 3PL-NLM is the generating model (particularly under CAIC), the distinction between the NRM and NLMs seems clearer. The sometimes better comparative fit for the 2PL-NLM compared to the 3PL-NLM when the 3PL-NLM is the generating model is perhaps not surprising, as the ultimate value of the pseudo-guessing parameter is often questionable, especially when the majority of items are relatively easy. Overall, it would thus appear that as statistical models, the NLM and NRM approaches not only provide competing structural representations, but also ones that may be statistically distinguishable when applied to actual test data.
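The likelihood-based criteria used in this comparison can be computed directly from each fitted model's maximized log-likelihood; the values below are hypothetical (k = 8 parameters per five-category item under either the 2PL-NLM or the NRM):

```python
import math

def information_criteria(loglik, k, n):
    """AIC, BIC, and CAIC (smaller is better) from a model's maximized
    log-likelihood, number of free parameters k, and sample size n."""
    aic = -2.0 * loglik + 2.0 * k
    bic = -2.0 * loglik + k * math.log(n)
    caic = -2.0 * loglik + k * (math.log(n) + 1.0)
    return aic, bic, caic

# Hypothetical fit: 36 five-category items, 8 parameters each, 3,000 examinees.
aic, bic, caic = information_criteria(loglik=-150000.0, k=36 * 8, n=3000)
print(aic < bic < caic)  # True: BIC and CAIC penalize parameters more heavily
```

Because the 2PL-NLM and NRM carry identical parameter counts, comparing them by any of these criteria reduces to comparing raw log-likelihoods, whereas the 3PL-NLM pays an additional penalty for its guessing parameters.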

FIGURE 3. Test information under the 2PLM and 2PL-NLM.

TABLE 4. Simulation 2 results: frequencies of model selection (generating model × estimated model, NRM / 2PL-NLM / 3PL-NLM, under −2logL, AIC, BIC, and CAIC).

Simulation 3: Distractor Information Study Results. In evaluating the Type I error performance of the LR test, we considered alpha levels of 0.05 and 0.01, as well as a Bonferroni-corrected level of 0.05 (p = 0.05/36 ≈ 0.0014). When using the 2PL-NLM as both the generating and fitted model, we observe clear evidence of Type I error inflation, with rejection rates under the 2PL-NLM of 0.14, 0.04, and 0.01, respectively, averaging across the 36 items. Similarly, when using the 3PL-NLM, the corresponding rejection rates at the first two levels are 0.17 and 0.06. Even greater inflation is observed when using the NRM as the generating model, with rates of 0.25, 0.11, and 0.04, respectively, for the 2PL-NLM, and 0.22, 0.09, and 0.03 for the 3PL-NLM. Overall, there is clearly evidence of Type I error inflation in applying the LR test, and potential for mistaken inferences when relying solely on the LR test as a basis for including distractor information. At the same time, however, we note that in virtually all Type I error cases the estimated distractor information is near 0, even when the NRM is the generating model. As most practitioners can be expected to attend to the amount of distractor information when deciding whether to attend to distractors, we conducted a follow-up study that evaluated the accuracy of the NLMs in recovering the amount of distractor information.
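Recovery accuracy in this follow-up study is summarized by the Pearson correlation and the mean absolute difference (MAD) between true and estimated distractor information. A minimal sketch of these two summaries; the information vectors below are illustrative, not values from the simulation.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mad(x, y):
    """Mean absolute difference between true and estimated values."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

# Hypothetical per-item distractor information, true vs. estimated:
true_info = [0.42, 0.10, 0.31, 0.05, 0.27]
est_info = [0.40, 0.12, 0.33, 0.04, 0.25]
print(round(pearson(true_info, est_info), 2), round(mad(true_info, est_info), 3))
```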

Regardless of whether the 2PL-NLM, 3PL-NLM, or NRM is the generating model, the relative amount of information provided by distractors appears well-recovered. When the 2PL-NLM was the generating model, the correlation between the estimated and true distractor information was 0.97 when fitting the 2PL-NLM and 0.95 when fitting the 3PL-NLM; the mean absolute differences (MADs) were 0.01 and 0.04, respectively. When the 3PL-NLM was the generating model, the respective correlations were 0.93 and 0.98 and the MADs were 0.05 and 0.02, while when the NRM was the generating model, the correlations were still 0.98 and 0.93 and the MADs were 0.02 and 0.09, suggesting that, even in the presence of some model misspecification, recovery appears quite good.

4. Real Data Illustration

Data from a 36-item college-level mathematics placement test (Center for Placement Testing, 1998) were analyzed. For purposes of reporting model estimates and testing for distractor information, 3,000 examinees were randomly selected from a full dataset of 12,800 examinees. Each item contained five response categories (one correct response and four distractor categories). Inspection of the items suggested a response process more consistent with that discussed in relation to example item 2 shown earlier in the paper. That is, most items would appear to be best solved through use of a problem-solving strategy that does not initially consider the response options. Both the 2PL-NLM and 3PL-NLM were investigated as potential competitors to the NRM. The overall −2 loglikelihood for the 2PL-NLM was lower than that for the NRM (when the NRM was fit using the same algorithm). As both models possess the same number of parameters, it would appear that the 2PL-NLM thus provides a better representation of the data. Table 5 displays the 2PL-NLM estimates.
The average standard errors for the distractor category slopes and intercepts were both 0.06, and those for the correct response slope and intercept were 0.05 and 0.04, respectively. To further examine how the NLMs compare to the NRM in terms of model fit, 10 nonoverlapping random samples were drawn from the full dataset, each consisting of 1,000 examinees. Each of the 2PL-NLM, 3PL-NLM, and NRM was fit to the 10 datasets. Table 6 shows the −2 loglikelihoods under each model, as well as the AIC, BIC, and CAIC indices. On the whole, the 2PL-NLM appears to show comparatively better fit than the 3PL-NLM and NRM across all 10 samples. Figure 4 shows plots of information functions for several example items: the item information share of distractor categories 1–4 (Equation (22)) and of the correct response category 5 (Equation (21)), as well as the total item information function (Equation (20)). Apparent from these graphs is the substantial variability across items in the contribution of distractors to overall item information. For example, items 6 and 7 show large amounts of information both in the correct response category and in most of the distractor categories, while items 12 and 32 show very small amounts of information, especially in the distractor categories. As noted, the item information share of the distractor categories can be used as the basis for a decision about whether to collapse distractor categories and score items simply as correct/incorrect. Using the item parameter estimates from the 2PL-NLM, a comparison of item information functions when including versus excluding distractor information can be performed without any revision of the model, as the item parameter estimates for the correct responses under the 2PL-NLM are also 2PLM item parameter estimates. The item information functions under the 2PLM and 2PL-NLM are plotted in Figure 5. Note that for items 6 and 7, the 2PL-NLM provides a larger amount of information at relatively low levels of ability than the 2PLM.
Items 12 and 32 yield almost the same information under the two models, suggesting virtually no practical advantage to considering distractor information.
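Because of the collapsibility property just noted, the 2PLM curve in such a comparison comes directly from the 2PL-NLM's correct-response parameters. A minimal sketch of the 2PL item information and of the percent-increase quantity that such a comparison yields; the parameter values and the distractor-information term are hypothetical.

```python
import math

def p_correct(theta, a, b):
    """2PL correct-response probability in logistic (slope-intercept) form."""
    return 1.0 / (1.0 + math.exp(-(a * theta + b)))

def info_2pl(theta, a, b):
    """Item information of the correct/incorrect score: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def pct_increase(info_2plm, distractor_info):
    """Percent increase in item information from attending to distractors."""
    return 100.0 * distractor_info / info_2plm

theta, a, b = -1.0, 1.2, 0.3   # hypothetical examinee and item
base = info_2pl(theta, a, b)
print(round(base, 3), round(pct_increase(base, 0.05), 1))
```

An item resembling 12 or 32 would simply have a distractor-information term near zero, leaving the 2PLM and 2PL-NLM curves nearly identical.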

TABLE 5. 2PL-NLM parameter estimates, mathematics placement test (columns: item; slopes λ1–λ4 and α; intercepts ζ1–ζ4 and β).

Beyond quantifying item category information, the statistical significance of the information in the distractor categories was evaluated through an LR test for each item using the 2PL-NLM. The results are presented in Table 7. As noted above, the −2logL of the augmented model is the same for all item-level tests. As χ²(df = 3, α = 0.05) = 7.81, we reject the null hypothesis for all items except items 12 and 32. Recalling the inflated Type I error performance of the LR test in Section 3.2.3, it nevertheless appears that on the whole there is evidence of distractor information in the items on this test, as the LR test rejects in 34 out of 36 cases, well beyond the levels of inflation seen in the simulation. However, the quantification of the percentage increase in information, shown in the rightmost column of Table 7, suggests that the increase is less than 12% for half of the items.
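The per-item LR decision described above can be sketched as follows, using the χ²(df = 3) critical value of 7.81 cited in the text; the −2logL values are hypothetical.

```python
# Chi-square critical value at alpha = .05 with df = 3, as cited in the text.
CHI2_CRIT_DF3_05 = 7.81

def lr_statistic(neg2logl_reduced, neg2logl_augmented):
    """LR = difference in -2 log-likelihood between the reduced model
    (distractor categories collapsed) and the augmented model (full NLM)."""
    return neg2logl_reduced - neg2logl_augmented

def distractors_informative(neg2logl_reduced, neg2logl_augmented,
                            critical=CHI2_CRIT_DF3_05):
    """Reject the null of no distractor information if LR exceeds the cutoff."""
    return lr_statistic(neg2logl_reduced, neg2logl_augmented) > critical

# Hypothetical items resembling 6/7 (informative) vs. 12/32 (not):
print(distractors_informative(90210.4, 90150.1))  # large drop -> True
print(distractors_informative(90155.0, 90150.1))  # small drop -> False
```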

TABLE 6. Model selection comparison across 10 samples (−2logL, AIC, BIC, and CAIC for the NRM, 2PL-NLM, and 3PL-NLM in each sample).

FIGURE 4. Item information and information share of categories under the 2PL-NLM.

FIGURE 5. Item information under the 2PLM and 2PL-NLM.

TABLE 7. LR test results and average item information under the 2PLM and 2PL-NLM (columns: item; −2logL reduced; LR; significant; Bonferroni; 2PLM info; 2PL-NLM info; % increase).

5. Alternative Nested Logit Models

Although the NLMs presented in this paper appear to provide a better representation of items such as example item 2 than models such as the NRM, a limitation of the models as representations of the response process is the absence of the correct response category in the second nest. This limitation can be addressed through overlapping nests, where the same response option is present in more than one nest. Such a framework also provides an appealing one in which to better understand the NLM and NRM approaches in relation to each other, as well as potential hybrid approaches. Under an overlapping-nest approach, the correct response category could be present in two nests: (1) the nest associated with the correct solution strategy; and (2) the nest including the distractor categories. An appealing aspect of this modeling framework is that it emphasizes two general ways by which an examinee arrives at the correct response: (1) correct problem solving apart from response category evaluation; and (2) a comparative evaluation of all response options, as might occur with an educated guess. Such a model would be obtained by generalizing Equations (3) and (4) such that the summation in the rightmost bracketed term in Equation (4) would include the correct response category in addition to the distractor categories. The NRM can be viewed as a special case in which β_i = −∞ and α_i = 0, while the 2PL-NLM is a special case in which ζ_iv = −∞ and λ_iv = 0 for the correct response category. The cost of the more general model relative to the 2PL-NLM or NRM, therefore, is the addition of two parameters per item. It remains to be seen whether there are conditions in which this model can be effectively estimated. The work of San Martin et al. (2006) likely provides some insight as to the potential value of these models, although their approach was applied within the framework of a Rasch model and thus did not include either correct response or distractor slope parameters. One setting that may make estimation possible is the presence of distinguishable traits across levels. For example, for some tests it may be reasonable to assume that a distinct trait (e.g., testwiseness) influences selection among response categories at level 2, as compared to the trait (e.g., math ability) that functions at level 1. An example of this possibility is demonstrated by example item 3.

Example Item 3. A tire measures 24 inches in diameter. What is the circumference of the tire in inches? Round your answer to the nearest tenth. (A) 48 (B) (C) 75 (D) 75.4
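The baseline two-level structure described by Equations (3) and (4), which the overlapping-nest extension generalizes, can be sketched as follows, assuming a 2PL for the correct response at level 1 and a nominal (NRM-type) model over the distractors at level 2; all parameter values are hypothetical.

```python
import math

def nlm_probs(theta, a, b, lambdas, zetas):
    """Return (p_correct, [p_distractor_v, ...]) for one item under a
    2PL-NLM-style nested logit: level 1 is a 2PL correct-response curve,
    level 2 distributes the remaining probability over distractors."""
    p_correct = 1.0 / (1.0 + math.exp(-(a * theta + b)))            # level 1
    terms = [math.exp(l * theta + z) for l, z in zip(lambdas, zetas)]
    total = sum(terms)
    p_distractors = [(1.0 - p_correct) * t / total for t in terms]   # level 2
    return p_correct, p_distractors

pc, pd = nlm_probs(theta=0.5, a=1.0, b=-0.2,
                   lambdas=[-0.5, 0.2, -1.0, 0.1],
                   zetas=[0.3, -0.1, 0.4, -0.6])
assert abs(pc + sum(pd) - 1.0) < 1e-12   # category probabilities sum to one
print(round(pc, 3))
```

The collapsibility property follows directly from this factorization: summing the distractor probabilities recovers 1 − p_correct, leaving the 2PL correct-response curve untouched.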

While math ability may determine whether the examinee arrives at the correct response at level 1, a correlated but potentially distinguishable trait may function within level 2. For example, as only two responses, (B) and (D), are reported to the nearest tenth, and (C) and (D) are essentially the same answer differing only in rounding, a testwise respondent could likely ascertain that (D) is the correct response. Other NLMs adapted for still other item types might include additional levels. One such case might involve items for which a solution strategy can be broken down into steps and where distractors are designed to catch misapplication of a particular step. Following the same example item above, for instance, it might be anticipated that the process by which a respondent arrives at (D) as the correct response involves first performing the circumference calculation, π × 24 = 75.398..., and next determining the correct level at which to round. Correct execution of the first step but incorrect execution of the second would be represented by the choice of (C) as the response. While these models and others likely provide a more accurate representation of the response process than the NLMs considered in this paper, they naturally come at the cost of additional model complexity, as well as the loss of the distractor collapsibility property that motivated the 2PL-NLM and 3PL-NLM considered here.

6. Conclusion

Future work with NLMs can address various issues. More applications and direct comparisons against competing models, including models other than the NRM, such as the Nedelsky model (Bechger, Maris, Verstralen, & Verhelst, 2005), are needed. Work on estimation issues related to more complex NLMs, such as models with overlapping nests, may help clarify the potential value of the NLM strategy in other contexts.
Various alternative strategies might be considered, following approaches taken in the discrete choice literature (see, e.g., Train, 2003). For example, some approaches to handling overlapping nests specify a parameter indicating the degree to which a given outcome is a member of each nest. Other generalizations of the nested approach might consider probit as opposed to logit link functions. Additional practical applications of the specific models investigated in this paper may also be of interest. For example, attempts to study differential item functioning (DIF) in multiple-choice items often find value in determining whether particular distractors are responsible for differential functioning of the correct response. Such applications can be studied in a more explicit fashion using the 2PL-NLM and 3PL-NLM as presented in this paper. Still other applications may focus on the use of the models in comparing test items administered under open-ended versus multiple-choice formats. Here again, the consistency in how the correct response category is modeled should allow for more direct assessment of the consequences of adding multiple-choice response options.

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716-723.
Baker, F.B., & Kim, S.-H. (2004). Item response theory: Parameter estimation techniques (2nd ed.). New York: Marcel Dekker.
Bechger, T.M., Maris, G., Verstralen, H.H.F.M., & Verhelst, N.D. (2005). The Nedelsky model for multiple-choice items. In L.A. van der Ark, M.A. Croon, & K. Sijtsma (Eds.), New developments in categorical data analysis for the social and behavioral sciences. Mahwah: Lawrence Erlbaum Associates.
Bock, R.D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29-51.
Bock, R.D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443-459.


More information

arxiv: v1 [stat.ap] 11 Aug 2014

arxiv: v1 [stat.ap] 11 Aug 2014 Noname manuscript No. (will be inserted by the editor) A multilevel finite mixture item response model to cluster examinees and schools Michela Gnaldi Silvia Bacci Francesco Bartolucci arxiv:1408.2319v1

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

Monte Carlo Simulations for Rasch Model Tests

Monte Carlo Simulations for Rasch Model Tests Monte Carlo Simulations for Rasch Model Tests Patrick Mair Vienna University of Economics Thomas Ledl University of Vienna Abstract: Sources of deviation from model fit in Rasch models can be lack of unidimensionality,

More information

A Practitioner s Guide to Generalized Linear Models

A Practitioner s Guide to Generalized Linear Models A Practitioners Guide to Generalized Linear Models Background The classical linear models and most of the minimum bias procedures are special cases of generalized linear models (GLMs). GLMs are more technically

More information

IRT linking methods for the bifactor model: a special case of the two-tier item factor analysis model

IRT linking methods for the bifactor model: a special case of the two-tier item factor analysis model University of Iowa Iowa Research Online Theses and Dissertations Summer 2017 IRT linking methods for the bifactor model: a special case of the two-tier item factor analysis model Kyung Yong Kim University

More information

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models Tihomir Asparouhov 1, Bengt Muthen 2 Muthen & Muthen 1 UCLA 2 Abstract Multilevel analysis often leads to modeling

More information

Comparing IRT with Other Models

Comparing IRT with Other Models Comparing IRT with Other Models Lecture #14 ICPSR Item Response Theory Workshop Lecture #14: 1of 45 Lecture Overview The final set of slides will describe a parallel between IRT and another commonly used

More information

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions Journal of Modern Applied Statistical Methods Volume 8 Issue 1 Article 13 5-1-2009 Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 31 Assessing Equating Results Based on First-order and Second-order Equity Eunjung Lee, Won-Chan Lee, Robert L. Brennan

More information

Equating Subscores Using Total Scaled Scores as an Anchor

Equating Subscores Using Total Scaled Scores as an Anchor Research Report ETS RR 11-07 Equating Subscores Using Total Scaled Scores as an Anchor Gautam Puhan Longjuan Liang March 2011 Equating Subscores Using Total Scaled Scores as an Anchor Gautam Puhan and

More information

Goals. PSCI6000 Maximum Likelihood Estimation Multiple Response Model 1. Multinomial Dependent Variable. Random Utility Model

Goals. PSCI6000 Maximum Likelihood Estimation Multiple Response Model 1. Multinomial Dependent Variable. Random Utility Model Goals PSCI6000 Maximum Likelihood Estimation Multiple Response Model 1 Tetsuya Matsubayashi University of North Texas November 2, 2010 Random utility model Multinomial logit model Conditional logit model

More information

Estimating Integer Parameters in IRT Models for Polytomous Items

Estimating Integer Parameters in IRT Models for Polytomous Items Measurement and Research Department Reports 96-1 Estimating Integer Parameters in IRT Models for Polytomous Items H.H.F.M. Verstralen Measurement and Research Department Reports 96-1 Estimating Integer

More information

Anders Skrondal. Norwegian Institute of Public Health London School of Hygiene and Tropical Medicine. Based on joint work with Sophia Rabe-Hesketh

Anders Skrondal. Norwegian Institute of Public Health London School of Hygiene and Tropical Medicine. Based on joint work with Sophia Rabe-Hesketh Constructing Latent Variable Models using Composite Links Anders Skrondal Norwegian Institute of Public Health London School of Hygiene and Tropical Medicine Based on joint work with Sophia Rabe-Hesketh

More information

When enough is enough: early stopping of biometrics error rate testing

When enough is enough: early stopping of biometrics error rate testing When enough is enough: early stopping of biometrics error rate testing Michael E. Schuckers Department of Mathematics, Computer Science and Statistics St. Lawrence University and Center for Identification

More information

Econometrics Spring School 2016 Econometric Modelling. Lecture 6: Model selection theory and evidence Introduction to Monte Carlo Simulation

Econometrics Spring School 2016 Econometric Modelling. Lecture 6: Model selection theory and evidence Introduction to Monte Carlo Simulation Econometrics Spring School 2016 Econometric Modelling Jurgen A Doornik, David F. Hendry, and Felix Pretis George-Washington University March 2016 Lecture 6: Model selection theory and evidence Introduction

More information

Measurement Invariance (MI) in CFA and Differential Item Functioning (DIF) in IRT/IFA

Measurement Invariance (MI) in CFA and Differential Item Functioning (DIF) in IRT/IFA Topics: Measurement Invariance (MI) in CFA and Differential Item Functioning (DIF) in IRT/IFA What are MI and DIF? Testing measurement invariance in CFA Testing differential item functioning in IRT/IFA

More information

Latent Class Analysis for Models with Error of Measurement Using Log-Linear Models and An Application to Women s Liberation Data

Latent Class Analysis for Models with Error of Measurement Using Log-Linear Models and An Application to Women s Liberation Data Journal of Data Science 9(2011), 43-54 Latent Class Analysis for Models with Error of Measurement Using Log-Linear Models and An Application to Women s Liberation Data Haydar Demirhan Hacettepe University

More information

A multivariate multilevel model for the analysis of TIMMS & PIRLS data

A multivariate multilevel model for the analysis of TIMMS & PIRLS data A multivariate multilevel model for the analysis of TIMMS & PIRLS data European Congress of Methodology July 23-25, 2014 - Utrecht Leonardo Grilli 1, Fulvia Pennoni 2, Carla Rampichini 1, Isabella Romeo

More information

MLMED. User Guide. Nicholas J. Rockwood The Ohio State University Beta Version May, 2017

MLMED. User Guide. Nicholas J. Rockwood The Ohio State University Beta Version May, 2017 MLMED User Guide Nicholas J. Rockwood The Ohio State University rockwood.19@osu.edu Beta Version May, 2017 MLmed is a computational macro for SPSS that simplifies the fitting of multilevel mediation and

More information

A Use of the Information Function in Tailored Testing

A Use of the Information Function in Tailored Testing A Use of the Information Function in Tailored Testing Fumiko Samejima University of Tennessee for indi- Several important and useful implications in latent trait theory, with direct implications vidualized

More information

APPENDICES TO Protest Movements and Citizen Discontent. Appendix A: Question Wordings

APPENDICES TO Protest Movements and Citizen Discontent. Appendix A: Question Wordings APPENDICES TO Protest Movements and Citizen Discontent Appendix A: Question Wordings IDEOLOGY: How would you describe your views on most political matters? Generally do you think of yourself as liberal,

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Signal Detection Theory With Finite Mixture Distributions: Theoretical Developments With Applications to Recognition Memory

Signal Detection Theory With Finite Mixture Distributions: Theoretical Developments With Applications to Recognition Memory Psychological Review Copyright 2002 by the American Psychological Association, Inc. 2002, Vol. 109, No. 4, 710 721 0033-295X/02/$5.00 DOI: 10.1037//0033-295X.109.4.710 Signal Detection Theory With Finite

More information

Applied Psychological Measurement 2001; 25; 283

Applied Psychological Measurement 2001; 25; 283 Applied Psychological Measurement http://apm.sagepub.com The Use of Restricted Latent Class Models for Defining and Testing Nonparametric and Parametric Item Response Theory Models Jeroen K. Vermunt Applied

More information

Online Item Calibration for Q-matrix in CD-CAT

Online Item Calibration for Q-matrix in CD-CAT Online Item Calibration for Q-matrix in CD-CAT Yunxiao Chen, Jingchen Liu, and Zhiliang Ying November 8, 2013 Abstract Item replenishment is important to maintaining a large scale item bank. In this paper

More information

A Note on the Equivalence Between Observed and Expected Information Functions With Polytomous IRT Models

A Note on the Equivalence Between Observed and Expected Information Functions With Polytomous IRT Models Journal of Educational and Behavioral Statistics 2015, Vol. 40, No. 1, pp. 96 105 DOI: 10.3102/1076998614558122 # 2014 AERA. http://jebs.aera.net A Note on the Equivalence Between Observed and Expected

More information

Multidimensional Linking for Tests with Mixed Item Types

Multidimensional Linking for Tests with Mixed Item Types Journal of Educational Measurement Summer 2009, Vol. 46, No. 2, pp. 177 197 Multidimensional Linking for Tests with Mixed Item Types Lihua Yao 1 Defense Manpower Data Center Keith Boughton CTB/McGraw-Hill

More information

Chained Versus Post-Stratification Equating in a Linear Context: An Evaluation Using Empirical Data

Chained Versus Post-Stratification Equating in a Linear Context: An Evaluation Using Empirical Data Research Report Chained Versus Post-Stratification Equating in a Linear Context: An Evaluation Using Empirical Data Gautam Puhan February 2 ETS RR--6 Listening. Learning. Leading. Chained Versus Post-Stratification

More information

Chapter 5. Introduction to Path Analysis. Overview. Correlation and causation. Specification of path models. Types of path models

Chapter 5. Introduction to Path Analysis. Overview. Correlation and causation. Specification of path models. Types of path models Chapter 5 Introduction to Path Analysis Put simply, the basic dilemma in all sciences is that of how much to oversimplify reality. Overview H. M. Blalock Correlation and causation Specification of path

More information

Diversity partitioning without statistical independence of alpha and beta

Diversity partitioning without statistical independence of alpha and beta 1964 Ecology, Vol. 91, No. 7 Ecology, 91(7), 2010, pp. 1964 1969 Ó 2010 by the Ecological Society of America Diversity partitioning without statistical independence of alpha and beta JOSEPH A. VEECH 1,3

More information

The Rasch Poisson Counts Model for Incomplete Data: An Application of the EM Algorithm

The Rasch Poisson Counts Model for Incomplete Data: An Application of the EM Algorithm The Rasch Poisson Counts Model for Incomplete Data: An Application of the EM Algorithm Margo G. H. Jansen University of Groningen Rasch s Poisson counts model is a latent trait model for the situation

More information

Ensemble Rasch Models

Ensemble Rasch Models Ensemble Rasch Models Steven M. Lattanzio II Metamatrics Inc., Durham, NC 27713 email: slattanzio@lexile.com Donald S. Burdick Metamatrics Inc., Durham, NC 27713 email: dburdick@lexile.com A. Jackson Stenner

More information

COWLEY COLLEGE & Area Vocational Technical School

COWLEY COLLEGE & Area Vocational Technical School COWLEY COLLEGE & Area Vocational Technical School COURSE PROCEDURE FOR COLLEGE ALGEBRA WITH REVIEW MTH 4421 5 Credit Hours Student Level: This course is open to students on the college level in the freshman

More information

Nonparametric Online Item Calibration

Nonparametric Online Item Calibration Nonparametric Online Item Calibration Fumiko Samejima University of Tennesee Keynote Address Presented June 7, 2007 Abstract In estimating the operating characteristic (OC) of an item, in contrast to parametric

More information

flexmirt R : Flexible Multilevel Multidimensional Item Analysis and Test Scoring

flexmirt R : Flexible Multilevel Multidimensional Item Analysis and Test Scoring flexmirt R : Flexible Multilevel Multidimensional Item Analysis and Test Scoring User s Manual Version 3.0RC Authored by: Carrie R. Houts, PhD Li Cai, PhD This manual accompanies a Release Candidate version

More information

Logistic Regression and Item Response Theory: Estimation Item and Ability Parameters by Using Logistic Regression in IRT.

Logistic Regression and Item Response Theory: Estimation Item and Ability Parameters by Using Logistic Regression in IRT. Louisiana State University LSU Digital Commons LSU Historical Dissertations and Theses Graduate School 1998 Logistic Regression and Item Response Theory: Estimation Item and Ability Parameters by Using

More information

STA 216, GLM, Lecture 16. October 29, 2007

STA 216, GLM, Lecture 16. October 29, 2007 STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural

More information

An Introduction to Mplus and Path Analysis

An Introduction to Mplus and Path Analysis An Introduction to Mplus and Path Analysis PSYC 943: Fundamentals of Multivariate Modeling Lecture 10: October 30, 2013 PSYC 943: Lecture 10 Today s Lecture Path analysis starting with multivariate regression

More information

Modeling differences in itemposition effects in the PISA 2009 reading assessment within and between schools

Modeling differences in itemposition effects in the PISA 2009 reading assessment within and between schools Modeling differences in itemposition effects in the PISA 2009 reading assessment within and between schools Dries Debeer & Rianne Janssen (University of Leuven) Johannes Hartig & Janine Buchholz (DIPF)

More information

Parametric Identification of Multiplicative Exponential Heteroskedasticity

Parametric Identification of Multiplicative Exponential Heteroskedasticity Parametric Identification of Multiplicative Exponential Heteroskedasticity Alyssa Carlson Department of Economics, Michigan State University East Lansing, MI 48824-1038, United States Dated: October 5,

More information

Computerized Adaptive Testing With Equated Number-Correct Scoring

Computerized Adaptive Testing With Equated Number-Correct Scoring Computerized Adaptive Testing With Equated Number-Correct Scoring Wim J. van der Linden University of Twente A constrained computerized adaptive testing (CAT) algorithm is presented that can be used to

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report. A Multinomial Error Model for Tests with Polytomous Items

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report. A Multinomial Error Model for Tests with Polytomous Items Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 1 for Tests with Polytomous Items Won-Chan Lee January 2 A previous version of this paper was presented at the Annual

More information

Item Response Theory for Scores on Tests Including Polytomous Items with Ordered Responses

Item Response Theory for Scores on Tests Including Polytomous Items with Ordered Responses Item Response Theory for Scores on Tests Including Polytomous Items with Ordered Responses David Thissen, University of North Carolina at Chapel Hill Mary Pommerich, American College Testing Kathleen Billeaud,

More information

Seminar über Statistik FS2008: Model Selection

Seminar über Statistik FS2008: Model Selection Seminar über Statistik FS2008: Model Selection Alessia Fenaroli, Ghazale Jazayeri Monday, April 2, 2008 Introduction Model Choice deals with the comparison of models and the selection of a model. It can

More information

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter

More information

Bayesian Analysis of Latent Variable Models using Mplus

Bayesian Analysis of Latent Variable Models using Mplus Bayesian Analysis of Latent Variable Models using Mplus Tihomir Asparouhov and Bengt Muthén Version 2 June 29, 2010 1 1 Introduction In this paper we describe some of the modeling possibilities that are

More information

The Simplex Method: An Example

The Simplex Method: An Example The Simplex Method: An Example Our first step is to introduce one more new variable, which we denote by z. The variable z is define to be equal to 4x 1 +3x 2. Doing this will allow us to have a unified

More information