IRT linking methods for the bifactor model: a special case of the two-tier item factor analysis model


University of Iowa
Iowa Research Online
Theses and Dissertations
Summer 2017

IRT linking methods for the bifactor model: a special case of the two-tier item factor analysis model

Kyung Yong Kim
University of Iowa

Copyright 2017 Kyung Yong Kim

This dissertation is available at Iowa Research Online.

Recommended Citation
Kim, Kyung Yong. "IRT linking methods for the bifactor model: a special case of the two-tier item factor analysis model." PhD (Doctor of Philosophy) thesis, University of Iowa, 2017.

Part of the Education Commons.

IRT LINKING METHODS FOR THE BIFACTOR MODEL: A SPECIAL CASE OF THE TWO-TIER ITEM FACTOR ANALYSIS MODEL

by

KYUNG YONG KIM

A thesis submitted in partial fulfillment of the requirements for the Doctor of Philosophy degree in Psychological and Quantitative Foundations (Educational Measurement and Statistics) in the Graduate College of The University of Iowa

August 2017

Thesis Supervisor: Professor Won-Chan Lee

Copyright by KYUNG YONG KIM 2017. All Rights Reserved.

Graduate College
The University of Iowa
Iowa City, Iowa

CERTIFICATE OF APPROVAL

This is to certify that the Ph.D. thesis of Kyung Yong Kim has been approved by the Examining Committee for the thesis requirement for the Doctor of Philosophy degree in Psychological and Quantitative Foundations (Educational Measurement and Statistics) at the August 2017 graduation.

Thesis Committee:
Won-Chan Lee, Thesis Supervisor
Robert L. Brennan
Michael J. Kolen
Brandon LeBeau
Aixin Tan

ACKNOWLEDGEMENTS

First, I would like to thank Dr. Lee for his guidance and support as my thesis and academic advisor, and also as my mentor. For the last four years, I have learned an immeasurable amount about psychometrics from him. I would also like to thank Dr. Brennan for not only serving on my thesis committee but also providing me the opportunity to work at CASMA, where I have gained extensive hands-on applied experience. I am also very thankful to Dr. Kolen, Dr. LeBeau, and Dr. Tan for their valuable comments and suggestions, which have led to significant improvements to the dissertation.

I would like to thank my friends in the Educational Measurement and Statistics Program. Because of you, I had a wonderful time here in Iowa City. Also, I would like to thank the members of our study group. I really appreciated the passion and insight you brought to the study group every Saturday morning. I would also like to thank my friends at ACT. It was always a pleasure talking with you and a good opportunity for me to gain practical knowledge and skills from industry experts.

I would not have been able to get through the hardest times if it were not for my wife, Seung Hee. Thank you for supporting me for the past twelve years and bearing with me during my graduate study. Also, many thanks to my family. Your endless support, encouragement, and love made me who I am today. Lastly, many thanks to my new boy, Matthew. You will never know the extent to which you have changed my life.

ABSTRACT

For unidimensional item response theory (UIRT) models, three linking methods, namely the separate, concurrent, and fixed parameter calibration methods, have been developed and widely used in applications such as vertical scaling, differential item functioning, computerized adaptive testing, and equating. By contrast, even though a few studies have compared the separate and concurrent calibration methods for full multidimensional IRT (MIRT) models or applied the concurrent calibration method to vertical scaling using the bifactor model, no study has yet provided technical descriptions of the concurrent and fixed parameter calibration methods for any MIRT model. Thus, the purpose of this dissertation was to extend the concurrent and fixed parameter calibration methods for UIRT models to the two-tier item factor analysis model. In addition, the relative performance of the separate, concurrent, and fixed parameter calibration methods was compared in terms of the recovery of item parameters and the accuracy of IRT observed score equating, using both real and simulated datasets.

The separate, concurrent, and fixed parameter calibration methods all recovered the item parameters well, with the concurrent calibration method performing slightly better than the other two linking methods. Despite the comparable performance of the three linking methods in terms of item parameter recovery, however, some discrepancy was observed between the IRT observed score equating results obtained with the three linking methods. In general, the concurrent calibration method provided equating results with the smallest equating error, whereas the separate calibration method provided equating results with the largest equating error due to the largest standard error of equating. The performance of the fixed parameter calibration method depended on the proportion of common items. When the proportion was 20%, the fixed parameter calibration method provided more biased equating results than the concurrent calibration method because of the underestimated specific slope parameters. However, when the proportion of common items was 40%, the fixed parameter calibration method worked as well as the concurrent calibration method.

PUBLIC ABSTRACT

For unidimensional item response theory (UIRT) models, three linking methods, namely the separate, concurrent, and fixed parameter calibration methods, have been developed and widely used in applications such as vertical scaling, differential item functioning, computerized adaptive testing, and equating. By contrast, several separate calibration methods have been developed for the multidimensional two-parameter logistic model, but the concurrent and fixed parameter calibration methods have been an infrequent topic in the literature. Thus, the main purpose of this dissertation was to extend the concurrent and fixed parameter calibration methods for UIRT models to the two-tier item factor analysis model. In addition, the relative performance of the three linking methods for the two-tier model with a single general dimension (i.e., the bifactor model) was compared in terms of the recovery of item parameters and the accuracy of IRT observed score equating. The simulation results showed that the concurrent calibration method performed better than the separate and fixed parameter calibration methods. The separate calibration method recovered the item parameters reasonably well, but provided slightly less accurate equating results than the concurrent calibration method. With a few exceptions, the performance of the fixed parameter calibration method was comparable to that of the concurrent calibration method.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
GLOSSARY

CHAPTER 1. INTRODUCTION
  Unidimensional IRT
    Indeterminacies
    Linking Methods
  Multidimensional IRT
    Indeterminacies
    Linking Methods
  Two-Tier Item Factor Analysis Model
  Purpose of the Study
  Research Questions

CHAPTER 2. LITERATURE REVIEW
  Unidimensional IRT Linking Methods
    Separate Calibration
    Concurrent Calibration
    Fixed Parameter Calibration
  Comparison of Unidimensional IRT Linking Methods
    Petersen, Cook, and Stocking (1983)
    Kim and Cohen (1998)
    Hanson and Béguin (2002)
    Kim (2006)
    Lee and Ban (2010)
    Kang and Petersen (2011)
  Multidimensional IRT Models
    Multidimensional Three-Parameter Logistic Model
    Two-Tier Item Factor Analysis Model
  Orthogonal Procrustes Problem
    Classical Orthogonal Procrustes Problem
    Generalized Orthogonal Procrustes Problem
  Indeterminacies in Multidimensional IRT
    Translation of the Origin
    Orthogonal Rotation of the Coordinate Axes
    Change of the Units of Measurement
  Multidimensional IRT Linking Methods
    Hirsch (1989)
    Davey, Oshima, and Lee (1996)
    Oshima, Davey, and Lee (2000)
    Li and Lissitz (2000)
    Min (2007)
    Reckase (2009)

CHAPTER 3. METHODOLOGY
  Linking Methods for the Two-Tier Model
    Concurrent Calibration
    Fixed Parameter Calibration
  IRT Observed Score Equating
  Study 1: Real Data Analyses
    Source of Data
    Dimensionality Assessment
    Model Comparison
    Linking Procedures
    IRT Observed Score Equating
  Study 2: Simulated Data Analyses
    Study Factors
    Simulation Procedures
    Evaluation Criteria

CHAPTER 4. RESULTS
  Study 1: Real Data Analysis
    Dimensionality Assessment
    Model Comparison
    IRT Observed Score Equating
  Study 2: Simulated Data Analysis
    Recovery of Transformation Parameters
    Recovery of Item Parameters
    IRT Observed Score Equating
    Comparison Between R and flexmirt

CHAPTER 5. DISCUSSION
  Equating Criterion
  Estimation of the Latent Variable Distribution
  Summary of Results
    Research Question 1
    Research Question 2
    Research Question 3
    Research Question 4
    Research Question 5
  Conclusions
  Limitations and Future Research

REFERENCES

LIST OF TABLES

Table 3.1. Characteristics of the two AP French Language test forms
Table 3.2. Descriptive statistics for the two French Language test forms
Table 3.3. Descriptive statistics for the common items of the two French Language test forms
Table 3.4. Descriptive statistics for the two domains of the French Language test forms
Table 3.5. Summary of the study conditions for the simulated data analyses
Table 3.6. Generating item parameters for the base form
Table 3.7. Generating item parameters for the new form
Table 4.1. Correlation, Cronbach's α, and disattenuated correlation for the two sections of the French Language test forms
Table 4.2. Goodness-of-fit measures for the single factor CFA model
Table 4.3. Goodness-of-fit measures for the bifactor structure CFA model
Table 4.4. Values of the DIMTEST statistics for the two French Language test forms
Table 4.5. Model comparison indices for the 2PL and bifactor IRT models
Table 4.6. Transformation parameter recovery: Values of bias and RMSE for all three distribution conditions
Table 4.7. Item parameter recovery: Values of the overall statistics under the N(0, 1) distribution condition
Table 4.8. Item parameter recovery: Values of the overall statistics under the N(0.5, 1) distribution condition
Table 4.9. Item parameter recovery: Values of the overall statistics under the N(1, 1) distribution condition
Table 4.10. IRT observed score equating: Values of the unweighted AE, ASE, and ARMSE for all three distribution conditions
Table 4.11. IRT observed score equating: Values of the weighted ABIAS, ASE, and ARMSE for all three distribution conditions
Table 4.12. Comparison of item parameter estimates obtained with R and flexmirt using separate calibration
Table 4.13. Comparison of item parameter estimates obtained with R and flexmirt using concurrent calibration
Table 4.14. Comparison of item parameter estimates obtained with R and flexmirt using fixed parameter calibration

LIST OF FIGURES

Figure 4.1. Estimated equating relationships for IRT observed score equating
Figure 4.2. Item parameter recovery: Conditional plots for the N = 2000, CI = 20%, N(0, 1) distribution condition
Figure 4.3. Item parameter recovery: Conditional plots for the N = 5000, CI = 20%, N(0, 1) distribution condition
Figure 4.4. Item parameter recovery: Conditional plots for the N = 2000, CI = 40%, N(0, 1) distribution condition
Figure 4.5. Item parameter recovery: Conditional plots for the N = 5000, CI = 40%, N(0, 1) distribution condition
Figure 4.6. Item parameter recovery: Conditional plots for the N = 2000, CI = 20%, N(0.5, 1) distribution condition
Figure 4.7. Item parameter recovery: Conditional plots for the N = 5000, CI = 20%, N(0.5, 1) distribution condition
Figure 4.8. Item parameter recovery: Conditional plots for the N = 2000, CI = 40%, N(0.5, 1) distribution condition
Figure 4.9. Item parameter recovery: Conditional plots for the N = 5000, CI = 40%, N(0.5, 1) distribution condition
Figure 4.10. Item parameter recovery: Conditional plots for the N = 2000, CI = 20%, N(1, 1) distribution condition
Figure 4.11. Item parameter recovery: Conditional plots for the N = 5000, CI = 20%, N(1, 1) distribution condition
Figure 4.12. Item parameter recovery: Conditional plots for the N = 2000, CI = 40%, N(1, 1) distribution condition
Figure 4.13. Item parameter recovery: Conditional plots for the N = 5000, CI = 40%, N(1, 1) distribution condition
Figure 4.14. IRT observed score equating: Conditional plots for the N = 2000, CI = 20% condition
Figure 4.15. IRT observed score equating: Conditional plots for the N = 5000, CI = 20% condition
Figure 4.16. IRT observed score equating: Conditional plots for the N = 2000, CI = 40% condition
Figure 4.17. IRT observed score equating: Conditional plots for the N = 5000, CI = 40% condition
Figure 4.18. Item parameter estimation: Conditional plots for the N = 2000, CI = 20%, N(0.5, 1) distribution condition when flexmirt was used for item calibration
Figure 4.19. IRT observed score equating: Conditional plots for the N = 2000, CI = 20% condition when equating was conducted using the item parameter estimates provided by flexmirt

GLOSSARY

Acronyms

2PL  Two-parameter logistic
3PL  Three-parameter logistic
exp  Exponential function (base e, approximately 2.718)
IRT  Item response theory
M2PL  Multidimensional two-parameter logistic
M3PL  Multidimensional three-parameter logistic
MIRT  Multidimensional IRT
UIRT  Unidimensional IRT

Arabic Letters

a  Discrimination parameter
a (bold)  Vector of item discrimination parameters
A (bold)  Matrix of item discrimination parameters
A  Slope of a linear equation
b  Difficulty parameter
B  Intercept of a linear equation
c  Pseudo-guessing parameter or scaling parameter
C  Scaling matrix
d  Intercept parameter
d (bold)  Vector of intercept parameters
D  Constant
diag(·)  Diagonal matrix
e_Y(x)  Base form equivalent of new form score x
f  Function symbol
g  Index denoting group
G  Number of groups
i  Index denoting person
I_s  Set of items that load on specific factor s
j  Index denoting item
J  Number of items
k  Index denoting score category
K_j  Number of score categories for item j
l  Index denoting dimension
L  Number of dimensions
m  Index denoting general dimension
M  Number of general dimensions
N  Number of examinees
P_k  Probability of responding to category k
q  Index denoting quadrature point
Q  Number of quadrature points
R (bold)  Rotation matrix
s  Index denoting specific dimension
S  Number of specific dimensions
T  Number of iterations for the EM algorithm or number of replications for the simulation study
ᵀ  Matrix transpose
U (bold)  Orthogonal matrix
u  Response for a specific item
u (bold)  Vector of item responses
V  Common-item set
x  Observed score
X  Quadrature point or new form
Y  Base form

Greek Letters

δ (bold)  Translation vector
Λ  An upper triangular matrix with real and positive diagonal entries
ξ_j  Parameters for item j
Ξ  Parameters for all items
θ  Latent variable
θ (bold)  Vector of coordinates for locating a person in the coordinate system
θ_g (bold)  Vector of coordinates for the general dimension
θ_s (bold)  Vector of coordinates for the specific dimension
μ  Mean
μ (bold)  Mean vector
σ²  Variance
Σ  Covariance matrix of θ (bold)
ω  Quadrature weight for a specific quadrature point
ω (bold)  Vector of quadrature weights
φ  Probability distribution of θ (bold)

CHAPTER 1
INTRODUCTION

In item response theory (IRT), estimating item parameters using the marginal maximum likelihood estimation (MMLE) procedure implemented via the EM algorithm (Bock & Aitkin, 1981; Harwell, Baker, & Zwarts, 1988; Woodruff & Hanson, 1997) or the marginalized Bayesian approach (Harwell & Baker, 1991; Mislevy, 1986) requires specification of the distribution of the person parameter θ. However, specifying the correct distribution is often difficult, if not impossible, because θ is a latent variable. One additional difficulty related to item parameter estimation in IRT is the lack of a standard coordinate system; in other words, there exist several indeterminacies related to the coordinate system. For example, in unidimensional IRT (UIRT), there is no natural origin and unit of measurement for the θ-scale.

For the common-item nonequivalent groups (CINEG) design, two test forms with a set of common items are administered to samples from two different populations. In the context of IRT, two populations are different in the sense that their distributions of θ are different. Nonetheless, when calibrating items in the two test forms using two independent computer runs, because of the two aforementioned reasons, IRT estimation programs arbitrarily set the θ-distribution to a standard normal distribution for each computer run. The consequence of using the same distribution for nonequivalent groups is that the two sets of item parameter estimates obtained from two independent computer runs will be on different coordinate systems. To handle this issue and place all the item parameter estimates on a common coordinate system, a process called linking is required.

Note that the term linking in this dissertation is used to describe explicitly the process of placing IRT parameters on a common coordinate system. For UIRT models, several linking methods (e.g., separate, concurrent, and fixed parameter calibration) have been developed and used widely in applications such as vertical scaling, differential item functioning, computerized adaptive testing (CAT), and equating. On the other hand, for multidimensional IRT (MIRT) models, even though a few studies have compared the separate and concurrent calibration (i.e., multiple-group calibration) methods or applied concurrent calibration to vertical scaling, the literature on MIRT linking is mostly limited to the separate calibration method.

Unidimensional IRT

UIRT assumes that the probability of a correct response to a specific test item depends on a single person parameter θ. In this case, a person's response to an item is determined solely by the θ-value and not by the responses to the other items. This is often referred to as the local or conditional independence assumption. The implication of the local independence assumption is that the probability of observing a person's response string can be expressed as the product of the item response probabilities.

Indeterminacies

The coordinate system in UIRT involves a single coordinate axis, which is often called the θ-scale. Therefore, the indeterminacies in UIRT models are the location of the origin and the unit of measurement. As mentioned earlier, IRT estimation programs deal with these two indeterminacies by setting the θ-distribution to a standard normal

distribution. By doing so, the origin and unit of measurement of the θ-scale are set to 0 (i.e., the mean of the distribution) and 1 (i.e., the standard deviation of the distribution), respectively.

When test forms are administered to two nonequivalent groups, choosing the θ-scale to have a mean of 0 and a standard deviation of 1 for each group causes the item parameter estimates obtained from two separate runs of an estimation program to be on two different scales. However, as will be shown in Chapter 2, the two θ-scales are linearly related, differing only in their origin and unit of measurement (Cook & Eignor, 1991; Lord, 1980). Consequently, estimated item parameters for the two test forms can be placed on a common scale through a linear transformation.

Another commonly used data collection design for equating is the random groups design, which assumes that two groups of examinees taking different test forms are randomly equivalent. A spiraling process is typically used to randomly assign people to alternative test forms. For the random groups design, linking is not required because the θ-values for the groups of examinees that take different test forms are assumed to have the same mean and standard deviation.

Linking Methods

Three linking methods are widely used with UIRT models to obtain a common scale under the CINEG design: separate calibration, concurrent calibration, and fixed parameter calibration.

In separate calibration, item parameters for two different test forms are first estimated using two separate runs of an IRT estimation program. Then, a linear transformation function is obtained with the two sets of estimated parameters for the

common items, and used to place all the item parameter estimates on a common scale. The separate calibration method is attractive because it is conceptually straightforward to understand and easy to apply. Another benefit of the separate calibration method compared to the concurrent and fixed parameter calibration methods (discussed below) is that it is the only linking method among the three that can be used to examine item parameter drift for the common items. This is because the separate calibration method produces two sets of parameter estimates for the common items, whereas the concurrent and fixed parameter calibration methods produce only one set of parameter estimates for the common items (for more details, see Kolen & Brennan, 2014).

In concurrent calibration, item parameters for the two test forms are estimated concurrently with a single run of an IRT estimation program. During the calibration process, instead of fixing the distributions of θ for the two groups to a standard normal distribution, the distributions are estimated simultaneously with the item parameters. As these estimated distributions are on the same θ-scale, the item parameters estimated using these distributions will also be on the same θ-scale. Thus, there is no need for an additional scale transformation procedure in concurrent calibration. This is probably one of the greatest benefits of the concurrent calibration method. However, to use the concurrent calibration method, response data for the two test forms being linked should be available at the time of calibration, which is often not feasible.

Fixed parameter calibration is often used to place the parameter estimates for new or pretest items on the scale of the item pool. The meaning of the term fixed is that the parameter estimates for the anchor items (i.e., common items) in the new form are fixed at their existing values. When used under the CINEG design, the fixed parameter

calibration method also does not require any scale transformation. Instead, the distribution of θ for the new group is estimated with the responses for the common items of the new form and the parameter estimates for the common items of the base form. This estimated distribution of θ will be on the scale of the base form, and therefore, the non-common items in the new form that are calibrated using the estimated distribution will also be on the scale of the base form.

Multidimensional IRT

As stated previously, UIRT assumes that the probability of a correct response to a test item is determined by a single hypothetical construct. However, there are many testing situations that involve more than one construct. One example is a math item that requires both arithmetic computation and algebraic symbol manipulation skills (see Reckase, 2009, p. 80). This type of item requires more than one construct to determine the correct response. Like UIRT, MIRT entails several assumptions:

- The probability of a correct response to a test item is determined by more than one trait, denoted by θ (a vector of latent traits);
- Each element of θ refers to the same trait for all items;
- Item parameters are invariant over samples of examinees;
- The relationship between θ and item responses can be modeled with a mathematical function that is monotonically increasing;
- The mathematical function is smooth, and therefore, derivatives are defined;
- Conditional on θ, item responses are independent (i.e., the local independence assumption).

Two major types of MIRT models exist: (a) compensatory models and (b) partially compensatory models (i.e., non-compensatory models). In compensatory models, the θ-coordinates combine linearly to produce the probability of a correct response to a test item. Therefore, high scores on some dimensions can compensate for low scores on other dimensions. On the other hand, in partially compensatory models, each θ-coordinate is modeled separately, and the probability of a correct response to an item is the product of the individual probabilities. For these models, some compensation occurs, but the amount of compensation is limited because the probability of a correct response to an item cannot exceed the lowest probability in the product (a numerical illustration is given at the end of this section).

Following Reckase (2004), a distinction is made between dimensions of a MIRT model and constructs in this dissertation. Specifically, dimensions refer to the coordinate axes that are used to represent the observed data, whereas constructs refer to the hypothetical cognitive dimensions of interest. The reason for distinguishing these two terms is the rotational indeterminacy of MIRT models (see Chapter 2 for more details). Oftentimes, the coordinate axes of the original coordinate system on which person and item parameters are placed do not represent the constructs of interest well. Therefore, to obtain a better interpretation of the constructs, the original coordinate system is rotated so that the rotated coordinate axes align with the items that are supposed to measure the hypothetical constructs. Because dimensions refer to the coordinate axes of the original coordinate system, they might not necessarily represent the constructs of interest prior to rotation.
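To make the compensatory versus partially compensatory distinction concrete, here is a minimal R sketch; all parameter values are made up for illustration. It computes the probability of a correct response to a two-dimensional item under a compensatory M2PL model and under a partially compensatory alternative, for an examinee who is high on one dimension and low on the other.

```r
# Hypothetical two-dimensional item: discriminations a, intercept d,
# and dimension-wise difficulties b for the partially compensatory version
a <- c(1.2, 0.8)
d <- -0.5
b <- c(0.5, 0.5)

# Compensatory M2PL: the theta-coordinates combine linearly inside one logit
p_comp <- function(theta) 1 / (1 + exp(-(sum(a * theta) + d)))

# Partially compensatory model: product of dimension-wise probabilities
p_part <- function(theta) prod(1 / (1 + exp(-a * (theta - b))))

theta <- c(2, -2)  # high on dimension 1, low on dimension 2
p_comp(theta)      # about .57: the high theta1 compensates for the low theta2
p_part(theta)      # about .10: bounded above by the weakest dimension (about .12)
```

Only the compensatory model lets the strong dimension offset the weak one; the partially compensatory probability can never exceed the smallest dimension-wise probability in the product.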

Indeterminacies

IRT estimation programs typically make three arbitrary decisions when estimating parameters for MIRT models: (a) the origin, (b) the unit of measurement along each coordinate axis, and (c) the orientation of the coordinate axes. The last indeterminacy is often referred to as rotational indeterminacy. To handle scale indeterminacy (i.e., the origin and unit of measurement), many IRT estimation programs set the distribution of θ to a multivariate standard normal distribution. By doing so, the origin of the coordinate system is set to the 0-vector and the unit of measurement along each coordinate axis is set to 1. Finally, the coordinate system is often rotated to simplify the interpretation of the coordinate axes.

For MIRT models, linking is necessary not only for the CINEG design but also for the random groups design. This is because the θ-vectors for groups that are randomly sampled from the same population have the same mean vector and covariance matrix, but the orientation of the coordinate axes might not be the same (Reckase, 2009). In contrast to the CINEG design, the orientation of the coordinate axes resulting from two separate calibrations under the random groups design cannot be easily aligned due to the lack of common items. Linking methods under the random groups design will not be discussed further, because the main interest of this study is linking methods under the CINEG design.

As with UIRT models, selecting a different coordinate system for the parameters of MIRT models does not affect the response probabilities, provided that the item parameters are also properly converted. Therefore, the origin, units of measurement, and orientation of the coordinate axes can be changed based on the particular application.
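As a quick numerical check of this invariance, the following R sketch (hypothetical M2PL parameter values) rotates the person coordinates by an orthogonal matrix and applies the compensating transformation to the discrimination vector; the response probability is unchanged.

```r
# M2PL response probability for a hypothetical two-dimensional item
a <- c(1.2, 0.8); d <- -0.5
theta <- c(0.6, -1.1)
p <- function(a, d, theta) 1 / (1 + exp(-(sum(a * theta) + d)))

# Orthogonal rotation by 30 degrees
phi <- pi / 6
Tmat <- matrix(c(cos(phi), -sin(phi),
                 sin(phi),  cos(phi)), 2, 2, byrow = TRUE)

theta_rot <- as.vector(Tmat %*% theta)  # rotate the person coordinates
a_rot <- as.vector(Tmat %*% a)          # compensating rotation of a; for an
                                        # orthogonal Tmat, (t(Tmat))^{-1} = Tmat

p(a, d, theta)          # original coordinate system
p(a_rot, d, theta_rot)  # rotated coordinate system: identical probability
```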

Linking Methods

Several linking methods have been proposed for MIRT models (Davey, Oshima, & Lee, 1996; Hirsch, 1989; Li & Lissitz, 2000; Min, 2007; Oshima, Davey, & Lee, 2000; Reckase, 2009). These linking methods are all separate calibration methods, and they differ from each other in the way they define and estimate the transformation functions. Separate calibration for MIRT models is more complicated than that for UIRT models because the rotation of the coordinate system must be considered, as well as the shift in the origin and the change of the units of measurement. In contrast to separate calibration, however, concurrent and fixed parameter calibration for MIRT models have been a very infrequent topic in the literature.

Two-Tier Item Factor Analysis Model

The model of interest in this study is the two-tier item factor analysis model proposed by Cai (2010), which is a MIRT model that involves specific (i.e., secondary) factors to handle residual dependence among test items that exists beyond the general (i.e., primary) factors. As discussed by DeMars (2013), the bifactor model (i.e., the two-tier model with a single general factor) may represent nuisance factors, such as background knowledge specific to a testlet or interest level specific to a set of items, or represent specific skills related to a common content domain (e.g., subscales). The structure of the two-tier model is similar to the bifactor model (Cai, Yang, & Hansen, 2011; Gibbons & Hedeker, 1992) and the testlet response theory model (Bradlow, Wainer, & Wang, 1999; Wainer, Bradlow, & Du, 2000; Wainer, Bradlow, & Wang, 2007). However, one major difference is that the two-tier model can have more than one general factor, which may be

correlated among themselves. Thus, the two-tier model can be regarded as a more general model than the bifactor and testlet response theory models. Cai (2010) provided several examples to illustrate the applicability of the two-tier model, such as modeling the reading and mathematics sections of the Program for International Student Assessment (PISA), which consist of testlet-based items, and modeling longitudinal item response data that are obtained by administering the same measurement instrument to the same group of respondents on multiple occasions.

One benefit of the two-tier model is its computational efficiency. Compared to the full MIRT model, the two-tier model involves fewer integrals when computing the marginal likelihood function. For example, assuming M general dimensions and S specific dimensions, the marginal likelihood function for the two-tier model involves no more than (M + 1)-dimensional integration. Choosing Q quadrature points for each dimension, this leads to at most Q^(M+1) function evaluations. On the other hand, if M + S dimensions are used with a full multidimensional IRT model, then an (M + S)-dimensional integral needs to be solved. This requires Q^(M+S) function evaluations, which is generally much larger than Q^(M+1). For instance, with M = 1, S = 10, and Q = 21 quadrature points per dimension, the two-tier model requires at most 21^2 = 441 function evaluations per response pattern, whereas the full model would require 21^11, or roughly 3.5 × 10^14.

Purpose of the Study

The initial presentation of the two-tier model in Cai (2010) was restricted to single-group calibration. Extending Cai's work to include multiple-group calibration would enable the two-tier model to be used in linking, equating, and differential item functioning. Thus, under the CINEG design, the purpose of this dissertation is twofold: (a) to extend the concurrent and fixed parameter calibration methods for UIRT models to

the two-tier model; and (b) to compare the relative performance of the separate, concurrent, and fixed parameter calibration methods for the bifactor model in terms of the recovery of item parameters and the accuracy of IRT observed score equating results. Although the concurrent and fixed parameter calibration methods are described for the more general two-tier model, the reason for comparing these two linking methods and the separate calibration method using the bifactor model is that the bifactor model is more suitable for analyzing or equating assessments that measure a single general (primary) construct. As mentioned earlier, the specific factors in the two-tier and bifactor models are used to explicitly model residual dependencies among a group of items that exist beyond the general factor, and may represent unintended nuisance factors or intended specific skills related to a content domain, such as subscales. More specific research questions are provided below.

Research Questions

Answers to the following research questions will be sought in this study:

1. How do the IRT observed score equating results produced by the bifactor model with the separate, concurrent, and fixed parameter calibration methods compare to those produced by the unidimensional two-parameter logistic (2PL) model with the same three linking methods?

2. How do the three linking methods with the bifactor model perform in terms of the recovery of item parameters and the accuracy of equating for tests with varied numbers of common items?

3. Does the sample size affect the three linking methods with the bifactor model in terms of the recovery of item parameters and the accuracy of equating results?

4. To what extent does group difference impact the recovery of item parameters and the accuracy of equating with the bifactor model?

5. What are the advantages and disadvantages of each linking method with the bifactor model from both a theoretical and practical viewpoint?

The first research question will be addressed using real data, while the other four research questions will be addressed using simulated data.

CHAPTER 2
LITERATURE REVIEW

This chapter consists of five sections and provides a review of UIRT and MIRT linking methods. The first two sections provide a detailed description of three UIRT linking methods, which are the separate, concurrent, and fixed parameter calibration methods, and a summary of previous work on these three linking methods. Then, in the last three sections, the indeterminacies in MIRT models, the orthogonal Procrustes problem, and several MIRT linking methods are presented. Throughout this chapter, it is assumed that the purpose of linking is to place the item parameters of Form X on the coordinate system of Form Y. Note that Form X will be used interchangeably with new form, and Form Y with base form. In addition, the groups that take Forms X and Y will be referred to as the new and base groups, respectively.

Unidimensional IRT Linking Methods

Suppose that there are two θ-scales, one on the scale of Form X and the other on the scale of Form Y. The two θ-scales are linearly related as follows:

\theta_{Yi} = A \theta_{Xi} + B,   (2.1)

where A and B are the slope and intercept of the linear equation, respectively, and θ_{Xi} and θ_{Yi} are respectively the θ-values of examinee i on the scale of Form X and Form Y. Assuming the 3PL model, the item parameters are also related as follows:

a_{Yj} = a_{Xj} / A,   (2.2)

b_{Yj} = A b_{Xj} + B,   (2.3)

and

c_{Yj} = c_{Xj},   (2.4)

where a_{Xj}, b_{Xj}, and c_{Xj} are respectively the item discrimination, item difficulty, and pseudo-guessing parameters for item j on the scale of Form X, and a_{Yj}, b_{Yj}, and c_{Yj} are the same set of parameters for item j on the scale of Form Y. These two sets of parameters produce the same probability of a correct response:

P_1(\theta_{Yi}; a_{Yj}, b_{Yj}, c_{Yj})
  = c_{Yj} + (1 - c_{Yj}) \frac{\exp[D a_{Yj}(\theta_{Yi} - b_{Yj})]}{1 + \exp[D a_{Yj}(\theta_{Yi} - b_{Yj})]}
  = c_{Xj} + (1 - c_{Xj}) \frac{\exp[D (a_{Xj}/A)((A\theta_{Xi} + B) - (A b_{Xj} + B))]}{1 + \exp[D (a_{Xj}/A)((A\theta_{Xi} + B) - (A b_{Xj} + B))]}
  = c_{Xj} + (1 - c_{Xj}) \frac{\exp[D a_{Xj}(\theta_{Xi} - b_{Xj})]}{1 + \exp[D a_{Xj}(\theta_{Xi} - b_{Xj})]}
  = P_1(\theta_{Xi}; a_{Xj}, b_{Xj}, c_{Xj}),   (2.5)

where P_1(θ_{Xi}; a_{Xj}, b_{Xj}, c_{Xj}) and P_1(θ_{Yi}; a_{Yj}, b_{Yj}, c_{Yj}) are the probabilities of examinee i correctly answering item j on the two different scales, Forms X and Y. Note that the subscript k = 1 in P_k is the score category (i.e., a correct response for a dichotomous item).

Separate Calibration

In separate calibration, two sets of parameters for the common items that are on different θ-scales are used to find A and B. For example, using Equations 2.2 and 2.3, A and B can be computed with any common item j as follows:

A = a_{Xj} / a_{Yj},   (2.6)

and

B = b_{Yj} - A b_{Xj}.   (2.7)

However, item parameters are generally unknown, and therefore must be estimated. When using estimates instead of parameters, different choices of a common item can lead to different values of A and B. For this reason, it is better to use all the common items to find the relationship between the two scales when only item parameter estimates are available. From Equations 2.2 and 2.3, it follows that

A = \frac{\mu(a_X)}{\mu(a_Y)}   (2.8)

or

A = \frac{\sigma(b_Y)}{\sigma(b_X)},   (2.9)

and

B = \mu(b_Y) - A \mu(b_X),   (2.10)

where μ(·) and σ(·) are the sample mean and sample standard deviation of the parameter estimates for the common items, respectively. Following Kolen and Brennan (2014), the scale transformation method that uses Equations 2.8 and 2.10 will be referred to as the mean/mean method (Loyd & Hoover, 1980), and the method that uses Equations 2.9 and 2.10 as the mean/sigma method (Marco, 1977).

When using the values of A and B obtained with the item parameter estimates for scale transformation, there is no guarantee that Equations 2.1 through 2.4 hold. That is, the item parameter estimates placed on the scale of Form Y through the linear transformation will likely not be identical to the original item parameter estimates on the scale of Form Y. To differentiate the two sets of item parameter estimates, the original item parameter estimates will be denoted by (a_{Yj}, b_{Yj}, c_{Yj}), and the linearly transformed estimates by (a*_{Yj}, b*_{Yj}, c*_{Yj}).
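As a minimal illustration of Equations 2.8 through 2.10, the following R sketch computes the mean/mean and mean/sigma slopes and the corresponding intercept from a set of made-up common-item estimates; all numbers are hypothetical.

```r
# Hypothetical common-item estimates on the Form X and Form Y scales
a_X <- c(0.9, 1.3, 1.1, 0.7); b_X <- c(-0.8, 0.2, 1.0, -0.3)
a_Y <- c(0.8, 1.2, 1.0, 0.6); b_Y <- c(-0.6, 0.5, 1.4, -0.1)

A_mm <- mean(a_X) / mean(a_Y)         # mean/mean slope (Equation 2.8)
A_ms <- sd(b_Y) / sd(b_X)             # mean/sigma slope (Equation 2.9)
B_ms <- mean(b_Y) - A_ms * mean(b_X)  # intercept (Equation 2.10)

# Place the Form X estimates on the Form Y scale (Equations 2.2 and 2.3)
a_star <- a_X / A_ms
b_star <- A_ms * b_X + B_ms
```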

One shortcoming of the mean/mean and mean/sigma methods is that they do not consider all the item parameter estimates simultaneously (Kolen & Brennan, 2014). To overcome this shortcoming, Haebara (1980) and Stocking and Lord (1983) developed two different scale transformation methods based on characteristic curves. The Haebara method seeks the values of A and B that minimize the sum of the squared differences between the item characteristic curves over all the common items and a selected set of θ-values (i.e., quadrature points); that is,

Hdiff = \sum_{\theta_Y} \sum_{j \in V} \left[ P_1(\theta_Y; a_{Yj}, b_{Yj}, c_{Yj}) - P_1(\theta_Y; a^*_{Yj}, b^*_{Yj}, c^*_{Yj}) \right]^2,   (2.11)

where V denotes the common-item set. The values of A and B that minimize Equation 2.11 can be found by solving

\left( \frac{\partial Hdiff}{\partial A}, \frac{\partial Hdiff}{\partial B} \right)^{T} = \mathbf{0}   (2.12)

numerically using Newton's method (Süli & Mayers, 2006, p. 118).

The Stocking-Lord method is similar to the Haebara method, except that it minimizes the squared difference between the sums of the item characteristic curves over a selected set of θ-values:

SLdiff = \sum_{\theta_Y} \left[ \sum_{j \in V} P_1(\theta_Y; a_{Yj}, b_{Yj}, c_{Yj}) - \sum_{j \in V} P_1(\theta_Y; a^*_{Yj}, b^*_{Yj}, c^*_{Yj}) \right]^2.   (2.13)

Because the sum of the item characteristic curves is the test characteristic curve, the Stocking-Lord method is often described as minimizing the sum of the squared differences between the test characteristic curves. However, it is worth mentioning that the test characteristic curves used for the Stocking-Lord method are computed using only the common items.
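Because neither criterion has a closed-form minimizer, A and B are found numerically. The R sketch below minimizes the Stocking-Lord criterion in Equation 2.13 for the 3PL; the common-item estimates are hypothetical, and optim's default Nelder-Mead search stands in for the Newton's method described above.

```r
# 3PL probability of a correct response
p3pl <- function(theta, a, b, c, D = 1.702) {
  c + (1 - c) / (1 + exp(-D * a * (theta - b)))
}

# Hypothetical common-item estimates on each scale
a_Y <- c(0.8, 1.2, 1.0); b_Y <- c(-0.6, 0.5, 1.4); c_Y <- c(0.20, 0.15, 0.25)
a_X <- c(0.9, 1.3, 1.1); b_X <- c(-0.8, 0.2, 1.0); c_X <- c(0.20, 0.15, 0.25)
theta_q <- seq(-4, 4, length.out = 31)  # theta grid on the Form Y scale

# Stocking-Lord criterion (Equation 2.13): squared gap between the
# common-item test characteristic curves, summed over the theta grid
sl_crit <- function(par) {
  A <- par[1]; B <- par[2]
  tcc_Y <- rowSums(sapply(seq_along(a_Y),
                   function(j) p3pl(theta_q, a_Y[j], b_Y[j], c_Y[j])))
  tcc_X <- rowSums(sapply(seq_along(a_X),
                   function(j) p3pl(theta_q, a_X[j] / A, A * b_X[j] + B, c_X[j])))
  sum((tcc_Y - tcc_X)^2)
}

optim(c(1, 0), sl_crit)$par  # estimated slope A and intercept B
```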

Concurrent Calibration

When calibrating items in two test forms that are administered to two nonequivalent groups, the item parameter estimates that result from two separate runs of an IRT estimation program are not on the same scale because, for each computer run, the distribution of θ is set to a standard normal distribution. In concurrent calibration, this scale problem is handled by estimating the distributions of θ for the two nonequivalent groups and the item parameters for the two test forms simultaneously. Bock and Zimowski (1997) presented a concurrent calibration procedure, which is also well documented in Baker and Kim (2004), that assumes continuous distributions for the two groups and obtains maximum likelihood estimates for the item parameters and distributions of θ. Another procedure that yields essentially the same item parameter estimates as the Bock-Zimowski procedure can be derived by extending the single-group calibration procedure presented in Woodruff and Hanson (1997) to multiple groups. The Woodruff-Hanson procedure, which is presented below, assumes that θ is a discrete variable and uses the EM algorithm (Dempster, Laird, & Rubin, 1977) to estimate the item parameters and distributions of θ.

The EM algorithm consists of two steps: the E-step and the M-step. The expectation of the complete-data log-likelihood is evaluated in the E-step and maximized in the M-step. Because the derivation of the expectation of the complete-data log-likelihood is somewhat tedious, only the final results of the E- and M-steps are provided here. The E-step at iteration t of the EM algorithm consists of computing two quantities, n̄_{gq}^{(t)} and r̄_{gjkq}^{(t)}, for each group g:

\bar{n}_{gq}^{(t)} = \sum_{i=1}^{N_g} f(X_{gq} \mid \mathbf{u}_{gi}, \Xi^{(t)}, \boldsymbol{\omega}_g^{(t)}) = \sum_{i=1}^{N_g} \frac{f(\mathbf{u}_{gi} \mid X_{gq}, \Xi^{(t)})\, \omega_{gq}^{(t)}}{\sum_{q'=1}^{Q} f(\mathbf{u}_{gi} \mid X_{gq'}, \Xi^{(t)})\, \omega_{gq'}^{(t)}}   (2.14)

and

\bar{r}_{gjkq}^{(t)} = \sum_{i=1}^{N_g} \chi_k(u_{gij})\, f(X_{gq} \mid \mathbf{u}_{gi}, \Xi^{(t)}, \boldsymbol{\omega}_g^{(t)}) = \sum_{i=1}^{N_g} \chi_k(u_{gij}) \frac{f(\mathbf{u}_{gi} \mid X_{gq}, \Xi^{(t)})\, \omega_{gq}^{(t)}}{\sum_{q'=1}^{Q} f(\mathbf{u}_{gi} \mid X_{gq'}, \Xi^{(t)})\, \omega_{gq'}^{(t)}},   (2.15)

where k denotes the score category; N_g is the number of examinees in group g; u_{gij} is the item response of examinee i in group g to item j; u_{gi} = (u_{gi1}, ..., u_{giJ}) is the item response vector of examinee i in group g to all J items; χ_k(u_{gij}) = 1 if u_{gij} = k, and 0 otherwise; Ξ^{(t)} is the collection of the provisional item parameter estimates obtained at iteration t - 1 of the EM algorithm; Q is the number of quadrature points; X_{gq} is the qth quadrature point for group g; and ω_{gq}^{(t)} is the quadrature weight for group g corresponding to X_{gq}, estimated at iteration t - 1 of the EM algorithm. In addition,

f(\mathbf{u}_{gi} \mid X_{gq}, \Xi^{(t)}) = \prod_{j=1}^{J} \left[ \prod_{k=1}^{K_j} P_k(X_{gq}; \boldsymbol{\xi}_j^{(t)})^{\chi_k(u_{gij})} \right]^{c_{gj}},   (2.16)

where K_j is the number of score categories for item j; c_{gj} = 1 if group g takes item j, and 0 otherwise; ξ_j^{(t)} is the vector of provisional parameter estimates for item j obtained at iteration t - 1; P_k(X_{gq}; ξ_j^{(t)}) is the probability of responding to category k of item j for an examinee with ability X_{gq}; and the subscript q' in Equations 2.14 and 2.15 is used for the sum of f(u_{gi} | X_{gq'}, Ξ^{(t)}) ω_{gq'}^{(t)} over all possible values of the quadrature points, whereas the subscript q is used to denote a specific quadrature point. The quantity

n̄_{gq}^{(t)} can be conceived as the expected number of examinees in group g with ability X_{gq}, and the quantity r̄_{gjkq}^{(t)} as the expected number of examinees in group g with ability X_{gq} who responded to score category k of item j.

Using n̄_{gq}^{(t)} and r̄_{gjkq}^{(t)} obtained from the E-step, the M-step computes the value of Ξ, denoted as Ξ^{(t+1)}, that maximizes

L(\Xi) = \sum_{g=1}^{G} \sum_{q=1}^{Q} \sum_{i=1}^{N_g} \log f(\mathbf{u}_{gi} \mid X_{gq}, \Xi)\, f(X_{gq} \mid \mathbf{u}_{gi}, \Xi^{(t)}, \boldsymbol{\omega}_g^{(t)}) = \sum_{g=1}^{G} \sum_{q=1}^{Q} \sum_{j=1}^{J} \sum_{k=1}^{K_j} \log P_k(X_{gq}; \boldsymbol{\xi}_j)\, \bar{r}_{gjkq}^{(t)} = \sum_{g=1}^{G} L_g(\Xi),   (2.17)

where G is the number of groups. Equation 2.17 can be maximized by employing an optimization technique such as Newton's method, which requires the first and second derivatives of L(Ξ) with respect to the item parameters. Note that the parameters for each item can be estimated independently because of the local independence assumption of UIRT. Note also that, when differentiating L(Ξ) with respect to any particular item's parameters, only the groups that take that item are involved in the calibration. For example, if only Group 1 takes item j, then

\frac{\partial L(\Xi)}{\partial \boldsymbol{\xi}_j} = \frac{\partial L_1(\Xi)}{\partial \boldsymbol{\xi}_j},   (2.18)

and if Groups 1 and 2 take item j (i.e., a common item), then

\frac{\partial L(\Xi)}{\partial \boldsymbol{\xi}_j} = \frac{\partial L_1(\Xi)}{\partial \boldsymbol{\xi}_j} + \frac{\partial L_2(\Xi)}{\partial \boldsymbol{\xi}_j}.   (2.19)
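As an illustration of this M-step update, the following R sketch maximizes the item-j term of Equation 2.17 for a single dichotomous 2PL item, taking hypothetical expected counts n̄ and r̄ as given from an E-step. For a common item, the counts would simply be summed over the two groups before the update, in line with Equation 2.19.

```r
theta_q <- seq(-4, 4, length.out = 21)  # quadrature points X_q

# Hypothetical E-step output for one item in one group: expected number of
# examinees at each point (n_bar) and expected number correct (r_bar)
n_bar <- 500 * dnorm(theta_q) / sum(dnorm(theta_q))
r_bar <- n_bar * plogis(1.0 * (theta_q - 0.3))  # generated under a = 1, b = 0.3

# Negative of the item-j term of Equation 2.17 for a 2PL item
neg_ll <- function(par) {
  p <- plogis(par[1] * (theta_q - par[2]))
  -sum(r_bar * log(p) + (n_bar - r_bar) * log(1 - p))
}

optim(c(1, 0), neg_ll)$par  # updated (a, b); close to the generating values
```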

The distributions of θ for the groups are also estimated in the M-step. Because θ is treated as a discrete random variable, estimating the distributions for the base and new groups is equivalent to estimating the weights for the Q quadrature points. After some tedious calculation, it can be shown that the estimated weights for the Q quadrature points are

\omega_{gq}^{(t+1)} = \frac{\bar{n}_{gq}^{(t)}}{N_g} = \frac{1}{N_g} \sum_{i=1}^{N_g} f(X_{gq} \mid \mathbf{u}_{gi}, \Xi^{(t)}, \boldsymbol{\omega}_g^{(t)}), \quad q = 1, \ldots, Q.   (2.20)

Equation 2.20 is the same as the expressions presented in Bock and Aitkin (1981) and Mislevy (1984) for single-group calibration. These estimated distributions, which are referred to as the empirical distributions, are on the same θ-scale because the parameters for the common items that are used for the computation of n̄_{gq}^{(t)} are estimated using item responses obtained from all groups. After estimating the distributions of θ in the M-step, the distribution for the base group is linearly transformed to have a specific mean and standard deviation (e.g., a mean of 0 and a standard deviation of 1). The purpose of the linear transformation is to fix the θ-scale. Finally, the same transformation is applied to the distributions for the other groups to maintain the relative locations of all the distributions, and the item parameter estimates are also linearly transformed to maintain the response probabilities. These updated item parameter estimates and distributions of θ are used at the next EM cycle, iteration t + 1, until the EM algorithm converges.

The following are the step-by-step procedures for implementing concurrent calibration:

1. Specify an initial value Ξ^{(0)} and an initial distribution of θ for each group. IRT estimation programs often use a standard normal distribution.
2. Compute n̄_{gq}^{(t)} and r̄_{gjkq}^{(t)} for each group (E-step).
3. Using the values obtained from Step 2, find Ξ^{(t+1)} that maximizes L(Ξ) (M-step).
4. Compute ω_g^{(t+1)} = (ω_{g1}^{(t+1)}, ..., ω_{gQ}^{(t+1)}) for each group (M-step).
5. Linearly transform the quadrature points for the base group so that the mean and standard deviation become 0 and 1, respectively.
6. Apply the same linear transformation to the quadrature points for the other groups. Also, to maintain the response probabilities of the IRT model, linearly transform the item parameter estimates.
7. Using the updated values Ξ^{(t+1)} and ω_g^{(t+1)}, repeat Steps 2 through 6 until the EM algorithm converges.

Many IRT calibration programs, such as BILOG-MG (Zimowski, Muraki, Mislevy, & Bock, 2003), PARSCALE (Muraki & Bock, 2003), and IRT Command Language (ICL; Hanson, 2002), estimate the distribution of θ for each group empirically when conducting concurrent calibration. On the other hand, computer programs such as flexmirt (Cai, 2017) and MULTILOG (Thissen, Chen, & Bock, 2003) by default set the θ-distribution for each group to the normal distribution, and estimate the mean and standard deviation of the distribution. This estimation procedure will be further discussed when describing the concurrent calibration method for the two-tier model.
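The R sketch below makes Steps 2 and 4 concrete for dichotomous 2PL items: each examinee's normalized posterior over the quadrature points (the term summed in Equation 2.14) is averaged to produce the updated weights of Equation 2.20. The parameter values and response data are made up; in a real concurrent calibration, this update would run once per group within each EM cycle.

```r
theta_q <- seq(-4, 4, length.out = 21)  # common quadrature grid
set.seed(1)

# Hypothetical 2PL item parameters and response data for one group
a <- c(1.0, 1.4, 0.8); b <- c(-0.5, 0.0, 0.6)
U <- matrix(rbinom(200 * 3, 1, 0.6), nrow = 200, ncol = 3)

# One E-step/weight update for a group (Equations 2.14 and 2.20)
update_weights <- function(U, a, b, w) {
  P <- plogis(sweep(outer(theta_q, b, "-"), 2, a, "*"))  # Q x J probabilities
  L <- apply(U, 1, function(u) apply(P, 1, function(p) prod(p^u * (1 - p)^(1 - u))))
  post <- t(L) * rep(w, each = nrow(U))  # f(u_i | X_q) * omega_q, per examinee
  post <- post / rowSums(post)           # normalize over quadrature points
  colMeans(post)                         # omega^(t+1) = n_bar_gq / N_g
}

w0 <- dnorm(theta_q) / sum(dnorm(theta_q))  # Step 1: standard normal start
w1 <- update_weights(U, a, b, w0)           # updated empirical weights
```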

Fixed Parameter Calibration

In fixed parameter calibration, the item parameter estimates for the new form are placed on the scale of the base form by fixing the parameters for the common items at their previously estimated values, and calibrating the non-common items using the latent variable distribution estimated with the fixed parameter estimates. This idea can be concretized in various ways, depending on the number of EM cycles and the number of distribution updates. Five fixed parameter estimation methods are well documented in Kim (2006); among the five methods, only the Multiple Prior Weights Updating and Multiple EM Cycles (MWU-MEM) method is presented here because of its better performance. The MWU-MEM method continuously updates the latent variable distribution for the new group and the parameters for the non-common items in the new form until the user-specified convergence criterion of the EM algorithm is met.

To properly implement the MWU-MEM method, the first EM cycle should be conducted slightly differently from the other EM cycles. In the first EM cycle, the function to maximize in the M-step is

L(\Xi_U) = \sum_{q=1}^{Q} \sum_{i=1}^{N} \log f(\mathbf{u}_{iU} \mid X_q, \Xi_U)\, f(X_q \mid \mathbf{u}_{iC}, \Xi_C, \boldsymbol{\omega}^{(0)}),   (2.21)

where Ξ_U denotes the parameters for the non-common (i.e., unique) items in the new form; u_{iU} is examinee i's responses to the non-common items; Ξ_C denotes the parameters for the common items, which are fixed; u_{iC} is examinee i's responses to the common items; and ω^{(0)} is a vector of initial quadrature weights for the new group, which is usually set to the normalized densities of a standard normal distribution. Additionally, the latent variable distribution for the new group is estimated in the M-step as follows:

\omega_q^{(1)} = \frac{1}{N} \sum_{i=1}^{N} f(X_q \mid \mathbf{u}_{iC}, \Xi_C, \boldsymbol{\omega}^{(0)}), \quad q = 1, \ldots, Q.   (2.22)

Then, beginning from the second EM cycle, the function to maximize is

L(\Xi_U) = \sum_{q=1}^{Q} \sum_{i=1}^{N} \log f(\mathbf{u}_{iU} \mid X_q, \Xi_U)\, f(X_q \mid \mathbf{u}_i, \Xi_C, \Xi_U^{(t)}, \boldsymbol{\omega}^{(t)}),   (2.23)

and the latent variable distribution for the new group is estimated as

\omega_q^{(t+1)} = \frac{1}{N} \sum_{i=1}^{N} f(X_q \mid \mathbf{u}_i, \Xi_C, \Xi_U^{(t)}, \boldsymbol{\omega}^{(t)}), \quad q = 1, \ldots, Q.   (2.24)

The difference between the first and the other EM cycles is that only the parameter estimates for the common items are used in the first EM cycle to estimate the latent variable distribution for the new group. This is to place the distribution for the new group on the scale of the base form during the first EM cycle. However, once the distribution is placed on the desired scale, both the non-common and common items are used to estimate the latent variable distribution, because using more items better recovers the distribution. Note that the item parameters that are estimated at each M-step of the EM algorithm using the estimated distribution will also be on the scale of the base form, due to the latent variable distribution being on the scale of the base form.
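The following R sketch mirrors the two weight updates for dichotomous 2PL items (all parameters and data are hypothetical): the first-cycle update of Equation 2.22 uses only the fixed common-item parameters, which anchors the new group's distribution to the base-form scale, while the later update of Equation 2.24 appends the provisional non-common-item estimates.

```r
theta_q <- seq(-4, 4, length.out = 21)
set.seed(2)

# Hypothetical new-form data: columns 1-2 are common items, columns 3-4 are not
U <- matrix(rbinom(200 * 4, 1, 0.55), nrow = 200, ncol = 4)
a_C <- c(1.0, 1.4); b_C <- c(-0.5, 0.0)  # common items, fixed at base-form values
a_U <- c(0.9, 1.1); b_U <- c(0.3, -0.2)  # provisional non-common-item estimates

# Posterior-based weight update given item parameters (a, b) and a data block
new_weights <- function(U, a, b, w) {
  P <- plogis(sweep(outer(theta_q, b, "-"), 2, a, "*"))
  L <- apply(U, 1, function(u) apply(P, 1, function(p) prod(p^u * (1 - p)^(1 - u))))
  post <- t(L) * rep(w, each = nrow(U))
  colMeans(post / rowSums(post))
}

w0 <- dnorm(theta_q) / sum(dnorm(theta_q))  # initial standard normal weights

# First EM cycle (Equation 2.22): common items only
w1 <- new_weights(U[, 1:2], a_C, b_C, w0)

# Second and later cycles (Equation 2.24): common and non-common items together
w2 <- new_weights(U, c(a_C, a_U), c(b_C, b_U), w1)
```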

Comparison of Unidimensional IRT Linking Methods

This section summarizes six studies that compared at least two of the three aforementioned linking methods under the CINEG design. Note that, from each study, only the results that are related to the three linking methods are provided here.

Petersen, Cook, and Stocking (1983)

IRT true score equating results obtained with the separate and concurrent calibration methods were compared under the CINEG design. Six SAT Verbal forms and six SAT Mathematics forms were used in this study. Denoting the six forms of each of the two tests V4, X2, Y3, B3, Y2, and Z5, six separate equatings were conducted in the order V4 → X2 → Y3 → B3 → Y2 → Z5 → V4. For concurrent calibration, successive pairs of test forms were calibrated concurrently; for separate calibration, the same pairs of test forms were calibrated separately and the Stocking-Lord method was used for scale transformation. These two linking methods resulted in item parameter estimates that were on a common scale for each pair of test forms and allowed IRT true score equating. The evaluation criterion was the weighted mean squared difference (WMSD) between the existing and estimated scale-score conversions. Using the computer program LOGIST (Wingersky, Barton, & Lord, 1976), Petersen et al. (1983) found that concurrent calibration produced smaller values of WMSD than separate calibration.

Kim and Cohen (1998)

Separate calibration with the Stocking-Lord method was compared to concurrent calibration under the CINEG design using a simulation study. Four levels of the number of common items (5, 10, 25, and 50) for a test with 50 items were considered in the study. The θ-values for the base group were sampled from a N(0, 1) distribution, and two sets of θ-values for the new group were sampled from N(0, 1) and N(1, 1) distributions. The evaluation criteria were the root mean squared difference (RMSD) between the item parameter estimates and the generating item parameters, and the mean Euclidean distance (MED). Stacking the item discrimination and difficulty parameters in a single vector,

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 41 A Comparative Study of Item Response Theory Item Calibration Methods for the Two Parameter Logistic Model Kyung

More information

Multidimensional item response theory observed score equating methods for mixed-format tests

Multidimensional item response theory observed score equating methods for mixed-format tests University of Iowa Iowa Research Online Theses and Dissertations Summer 2014 Multidimensional item response theory observed score equating methods for mixed-format tests Jaime Leigh Peterson University

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 23 Comparison of Three IRT Linking Procedures in the Random Groups Equating Design Won-Chan Lee Jae-Chun Ban February

More information

The Factor Analytic Method for Item Calibration under Item Response Theory: A Comparison Study Using Simulated Data

The Factor Analytic Method for Item Calibration under Item Response Theory: A Comparison Study Using Simulated Data Int. Statistical Inst.: Proc. 58th World Statistical Congress, 20, Dublin (Session CPS008) p.6049 The Factor Analytic Method for Item Calibration under Item Response Theory: A Comparison Study Using Simulated

More information

COMPARISON OF CONCURRENT AND SEPARATE MULTIDIMENSIONAL IRT LINKING OF ITEM PARAMETERS

COMPARISON OF CONCURRENT AND SEPARATE MULTIDIMENSIONAL IRT LINKING OF ITEM PARAMETERS COMPARISON OF CONCURRENT AND SEPARATE MULTIDIMENSIONAL IRT LINKING OF ITEM PARAMETERS A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY Mayuko Kanada Simon IN PARTIAL

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 24 in Relation to Measurement Error for Mixed Format Tests Jae-Chun Ban Won-Chan Lee February 2007 The authors are

More information

Overview. Multidimensional Item Response Theory. Lecture #12 ICPSR Item Response Theory Workshop. Basics of MIRT Assumptions Models Applications

Overview. Multidimensional Item Response Theory. Lecture #12 ICPSR Item Response Theory Workshop. Basics of MIRT Assumptions Models Applications Multidimensional Item Response Theory Lecture #12 ICPSR Item Response Theory Workshop Lecture #12: 1of 33 Overview Basics of MIRT Assumptions Models Applications Guidance about estimating MIRT Lecture

More information

A DISSERTATION SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY. Yu-Feng Chang

A DISSERTATION SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY. Yu-Feng Chang A Restricted Bi-factor Model of Subdomain Relative Strengths and Weaknesses A DISSERTATION SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY Yu-Feng Chang IN PARTIAL FULFILLMENT

More information

Ability Metric Transformations

Ability Metric Transformations Ability Metric Transformations Involved in Vertical Equating Under Item Response Theory Frank B. Baker University of Wisconsin Madison The metric transformations of the ability scales involved in three

More information

Effectiveness of the hybrid Levine equipercentile and modified frequency estimation equating methods under the common-item nonequivalent groups design

Effectiveness of the hybrid Levine equipercentile and modified frequency estimation equating methods under the common-item nonequivalent groups design University of Iowa Iowa Research Online Theses and Dissertations 27 Effectiveness of the hybrid Levine equipercentile and modified frequency estimation equating methods under the common-item nonequivalent

More information

Multidimensional Linking for Tests with Mixed Item Types

Multidimensional Linking for Tests with Mixed Item Types Journal of Educational Measurement Summer 2009, Vol. 46, No. 2, pp. 177 197 Multidimensional Linking for Tests with Mixed Item Types Lihua Yao 1 Defense Manpower Data Center Keith Boughton CTB/McGraw-Hill

More information

Development and Calibration of an Item Response Model. that Incorporates Response Time

Development and Calibration of an Item Response Model. that Incorporates Response Time Development and Calibration of an Item Response Model that Incorporates Response Time Tianyou Wang and Bradley A. Hanson ACT, Inc. Send correspondence to: Tianyou Wang ACT, Inc P.O. Box 168 Iowa City,

More information

ROBUST SCALE TRNASFORMATION METHODS IN IRT TRUE SCORE EQUATING UNDER COMMON-ITEM NONEQUIVALENT GROUPS DESIGN

ROBUST SCALE TRNASFORMATION METHODS IN IRT TRUE SCORE EQUATING UNDER COMMON-ITEM NONEQUIVALENT GROUPS DESIGN ROBUST SCALE TRNASFORMATION METHODS IN IRT TRUE SCORE EQUATING UNDER COMMON-ITEM NONEQUIVALENT GROUPS DESIGN A Dissertation Presented to the Faculty of the Department of Educational, School and Counseling

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 37 Effects of the Number of Common Items on Equating Precision and Estimates of the Lower Bound to the Number of Common

More information

LINKING IN DEVELOPMENTAL SCALES. Michelle M. Langer. Chapel Hill 2006

LINKING IN DEVELOPMENTAL SCALES. Michelle M. Langer. Chapel Hill 2006 LINKING IN DEVELOPMENTAL SCALES Michelle M. Langer A thesis submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Master

More information

The Discriminating Power of Items That Measure More Than One Dimension

The Discriminating Power of Items That Measure More Than One Dimension The Discriminating Power of Items That Measure More Than One Dimension Mark D. Reckase, American College Testing Robert L. McKinley, Educational Testing Service Determining a correct response to many test

More information

Some Issues In Markov Chain Monte Carlo Estimation For Item Response Theory

Some Issues In Markov Chain Monte Carlo Estimation For Item Response Theory University of South Carolina Scholar Commons Theses and Dissertations 2016 Some Issues In Markov Chain Monte Carlo Estimation For Item Response Theory Han Kil Lee University of South Carolina Follow this

More information

The Robustness of LOGIST and BILOG IRT Estimation Programs to Violations of Local Independence

The Robustness of LOGIST and BILOG IRT Estimation Programs to Violations of Local Independence A C T Research Report Series 87-14 The Robustness of LOGIST and BILOG IRT Estimation Programs to Violations of Local Independence Terry Ackerman September 1987 For additional copies write: ACT Research

More information

Multidimensional Computerized Adaptive Testing in Recovering Reading and Mathematics Abilities

Multidimensional Computerized Adaptive Testing in Recovering Reading and Mathematics Abilities Multidimensional Computerized Adaptive Testing in Recovering Reading and Mathematics Abilities by Yuan H. Li Prince Georges County Public Schools, Maryland William D. Schafer University of Maryland at

More information

Raffaela Wolf, MS, MA. Bachelor of Science, University of Maine, Master of Science, Robert Morris University, 2008

Raffaela Wolf, MS, MA. Bachelor of Science, University of Maine, Master of Science, Robert Morris University, 2008 Assessing the Impact of Characteristics of the Test, Common-items, and Examinees on the Preservation of Equity Properties in Mixed-format Test Equating by Raffaela Wolf, MS, MA Bachelor of Science, University

More information

Choice of Anchor Test in Equating

Choice of Anchor Test in Equating Research Report Choice of Anchor Test in Equating Sandip Sinharay Paul Holland Research & Development November 2006 RR-06-35 Choice of Anchor Test in Equating Sandip Sinharay and Paul Holland ETS, Princeton,

More information

The Difficulty of Test Items That Measure More Than One Ability

The Difficulty of Test Items That Measure More Than One Ability The Difficulty of Test Items That Measure More Than One Ability Mark D. Reckase The American College Testing Program Many test items require more than one ability to obtain a correct response. This article

More information

PIRLS 2016 Achievement Scaling Methodology 1

PIRLS 2016 Achievement Scaling Methodology 1 CHAPTER 11 PIRLS 2016 Achievement Scaling Methodology 1 The PIRLS approach to scaling the achievement data, based on item response theory (IRT) scaling with marginal estimation, was developed originally

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 31 Assessing Equating Results Based on First-order and Second-order Equity Eunjung Lee, Won-Chan Lee, Robert L. Brennan

More information

Item Response Theory (IRT) Analysis of Item Sets

Item Response Theory (IRT) Analysis of Item Sets University of Connecticut DigitalCommons@UConn NERA Conference Proceedings 2011 Northeastern Educational Research Association (NERA) Annual Conference Fall 10-21-2011 Item Response Theory (IRT) Analysis

More information

Lesson 7: Item response theory models (part 2)

Lesson 7: Item response theory models (part 2) Lesson 7: Item response theory models (part 2) Patrícia Martinková Department of Statistical Modelling Institute of Computer Science, Czech Academy of Sciences Institute for Research and Development of

More information

PREDICTING THE DISTRIBUTION OF A GOODNESS-OF-FIT STATISTIC APPROPRIATE FOR USE WITH PERFORMANCE-BASED ASSESSMENTS. Mary A. Hansen

PREDICTING THE DISTRIBUTION OF A GOODNESS-OF-FIT STATISTIC APPROPRIATE FOR USE WITH PERFORMANCE-BASED ASSESSMENTS. Mary A. Hansen PREDICTING THE DISTRIBUTION OF A GOODNESS-OF-FIT STATISTIC APPROPRIATE FOR USE WITH PERFORMANCE-BASED ASSESSMENTS by Mary A. Hansen B.S., Mathematics and Computer Science, California University of PA,

More information

A Markov chain Monte Carlo approach to confirmatory item factor analysis. Michael C. Edwards The Ohio State University

A Markov chain Monte Carlo approach to confirmatory item factor analysis. Michael C. Edwards The Ohio State University A Markov chain Monte Carlo approach to confirmatory item factor analysis Michael C. Edwards The Ohio State University An MCMC approach to CIFA Overview Motivating examples Intro to Item Response Theory

More information

Chained Versus Post-Stratification Equating in a Linear Context: An Evaluation Using Empirical Data

Chained Versus Post-Stratification Equating in a Linear Context: An Evaluation Using Empirical Data Research Report Chained Versus Post-Stratification Equating in a Linear Context: An Evaluation Using Empirical Data Gautam Puhan February 2 ETS RR--6 Listening. Learning. Leading. Chained Versus Post-Stratification

More information

Equating Tests Under The Nominal Response Model Frank B. Baker

Equating Tests Under The Nominal Response Model Frank B. Baker Equating Tests Under The Nominal Response Model Frank B. Baker University of Wisconsin Under item response theory, test equating involves finding the coefficients of a linear transformation of the metric

More information

Logistic Regression and Item Response Theory: Estimation Item and Ability Parameters by Using Logistic Regression in IRT.

Logistic Regression and Item Response Theory: Estimation Item and Ability Parameters by Using Logistic Regression in IRT. Louisiana State University LSU Digital Commons LSU Historical Dissertations and Theses Graduate School 1998 Logistic Regression and Item Response Theory: Estimation Item and Ability Parameters by Using

More information

A Study of Statistical Power and Type I Errors in Testing a Factor Analytic. Model for Group Differences in Regression Intercepts

A Study of Statistical Power and Type I Errors in Testing a Factor Analytic. Model for Group Differences in Regression Intercepts A Study of Statistical Power and Type I Errors in Testing a Factor Analytic Model for Group Differences in Regression Intercepts by Margarita Olivera Aguilar A Thesis Presented in Partial Fulfillment of

More information

Doctor of Philosophy

Doctor of Philosophy MAINTAINING A COMMON ARBITRARY UNIT IN SOCIAL MEASUREMENT STEPHEN HUMPHRY 2005 Submitted in fulfillment of the requirements of the degree of Doctor of Philosophy School of Education, Murdoch University,

More information

The robustness of Rasch true score preequating to violations of model assumptions under equivalent and nonequivalent populations

The robustness of Rasch true score preequating to violations of model assumptions under equivalent and nonequivalent populations University of South Florida Scholar Commons Graduate Theses and Dissertations Graduate School 2008 The robustness of Rasch true score preequating to violations of model assumptions under equivalent and

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report. Hierarchical Cognitive Diagnostic Analysis: Simulation Study

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report. Hierarchical Cognitive Diagnostic Analysis: Simulation Study Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 38 Hierarchical Cognitive Diagnostic Analysis: Simulation Study Yu-Lan Su, Won-Chan Lee, & Kyong Mi Choi Dec 2013

More information

A Quadratic Curve Equating Method to Equate the First Three Moments in Equipercentile Equating

A Quadratic Curve Equating Method to Equate the First Three Moments in Equipercentile Equating A Quadratic Curve Equating Method to Equate the First Three Moments in Equipercentile Equating Tianyou Wang and Michael J. Kolen American College Testing A quadratic curve test equating method for equating

More information

A Marginal Maximum Likelihood Procedure for an IRT Model with Single-Peaked Response Functions

A Marginal Maximum Likelihood Procedure for an IRT Model with Single-Peaked Response Functions A Marginal Maximum Likelihood Procedure for an IRT Model with Single-Peaked Response Functions Cees A.W. Glas Oksana B. Korobko University of Twente, the Netherlands OMD Progress Report 07-01. Cees A.W.

More information

An Overview of Item Response Theory. Michael C. Edwards, PhD

An Overview of Item Response Theory. Michael C. Edwards, PhD An Overview of Item Response Theory Michael C. Edwards, PhD Overview General overview of psychometrics Reliability and validity Different models and approaches Item response theory (IRT) Conceptual framework

More information

GODFREY, KELLY ELIZABETH, Ph.D. A Comparison of Kernel Equating and IRT True Score Equating Methods. (2007) Directed by Dr. Terry A. Ackerman. 181 pp.

GODFREY, KELLY ELIZABETH, Ph.D. A Comparison of Kernel Equating and IRT True Score Equating Methods. (2007) Directed by Dr. Terry A. Ackerman. 181 pp. GODFREY, KELLY ELIZABETH, Ph.D. A Comparison of Kernel Equating and IRT True Score Equating Methods. (7) Directed by Dr. Terry A. Ackerman. 8 pp. This two-part study investigates ) the impact of loglinear

More information

IRT Model Selection Methods for Polytomous Items

IRT Model Selection Methods for Polytomous Items IRT Model Selection Methods for Polytomous Items Taehoon Kang University of Wisconsin-Madison Allan S. Cohen University of Georgia Hyun Jung Sung University of Wisconsin-Madison March 11, 2005 Running

More information

Psychometric Issues in Formative Assessment: Measuring Student Learning Throughout the Academic Year Using Interim Assessments

Psychometric Issues in Formative Assessment: Measuring Student Learning Throughout the Academic Year Using Interim Assessments Psychometric Issues in Formative Assessment: Measuring Student Learning Throughout the Academic Year Using Interim Assessments Jonathan Templin The University of Georgia Neal Kingston and Wenhao Wang University

More information

Experimental designs for multiple responses with different models

Experimental designs for multiple responses with different models Graduate Theses and Dissertations Graduate College 2015 Experimental designs for multiple responses with different models Wilmina Mary Marget Iowa State University Follow this and additional works at:

More information

examples of how different aspects of test information can be displayed graphically to form a profile of a test

examples of how different aspects of test information can be displayed graphically to form a profile of a test Creating a Test Information Profile for a Two-Dimensional Latent Space Terry A. Ackerman University of Illinois In some cognitive testing situations it is believed, despite reporting only a single score,

More information

Introduction to Factor Analysis

Introduction to Factor Analysis to Factor Analysis Lecture 10 August 2, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #10-8/3/2011 Slide 1 of 55 Today s Lecture Factor Analysis Today s Lecture Exploratory

More information

Equating Subscores Using Total Scaled Scores as an Anchor

Equating Subscores Using Total Scaled Scores as an Anchor Research Report ETS RR 11-07 Equating Subscores Using Total Scaled Scores as an Anchor Gautam Puhan Longjuan Liang March 2011 Equating Subscores Using Total Scaled Scores as an Anchor Gautam Puhan and

More information

Basic IRT Concepts, Models, and Assumptions

Basic IRT Concepts, Models, and Assumptions Basic IRT Concepts, Models, and Assumptions Lecture #2 ICPSR Item Response Theory Workshop Lecture #2: 1of 64 Lecture #2 Overview Background of IRT and how it differs from CFA Creating a scale An introduction

More information

Research on Standard Errors of Equating Differences

Research on Standard Errors of Equating Differences Research Report Research on Standard Errors of Equating Differences Tim Moses Wenmin Zhang November 2010 ETS RR-10-25 Listening. Learning. Leading. Research on Standard Errors of Equating Differences Tim

More information

On the Use of Nonparametric ICC Estimation Techniques For Checking Parametric Model Fit

On the Use of Nonparametric ICC Estimation Techniques For Checking Parametric Model Fit On the Use of Nonparametric ICC Estimation Techniques For Checking Parametric Model Fit March 27, 2004 Young-Sun Lee Teachers College, Columbia University James A.Wollack University of Wisconsin Madison

More information

Equivalency of the DINA Model and a Constrained General Diagnostic Model

Equivalency of the DINA Model and a Constrained General Diagnostic Model Research Report ETS RR 11-37 Equivalency of the DINA Model and a Constrained General Diagnostic Model Matthias von Davier September 2011 Equivalency of the DINA Model and a Constrained General Diagnostic

More information

Comparison between conditional and marginal maximum likelihood for a class of item response models

Comparison between conditional and marginal maximum likelihood for a class of item response models (1/24) Comparison between conditional and marginal maximum likelihood for a class of item response models Francesco Bartolucci, University of Perugia (IT) Silvia Bacci, University of Perugia (IT) Claudia

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report. A Multinomial Error Model for Tests with Polytomous Items

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report. A Multinomial Error Model for Tests with Polytomous Items Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 1 for Tests with Polytomous Items Won-Chan Lee January 2 A previous version of this paper was presented at the Annual

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 25 A Analysis of a Large-scale Reading Comprehension Test. Dongmei Li Robert L. Brennan August 2007 Robert L.Brennan

More information

LSAC RESEARCH REPORT SERIES. Law School Admission Council Research Report March 2008

LSAC RESEARCH REPORT SERIES. Law School Admission Council Research Report March 2008 LSAC RESEARCH REPORT SERIES Structural Modeling Using Two-Step MML Procedures Cees A. W. Glas University of Twente, Enschede, The Netherlands Law School Admission Council Research Report 08-07 March 2008

More information

Observed-Score "Equatings"

Observed-Score Equatings Comparison of IRT True-Score and Equipercentile Observed-Score "Equatings" Frederic M. Lord and Marilyn S. Wingersky Educational Testing Service Two methods of equating tests are compared, one using true

More information

ABSTRACT. Yunyun Dai, Doctor of Philosophy, Mixtures of item response theory models have been proposed as a technique to explore

ABSTRACT. Yunyun Dai, Doctor of Philosophy, Mixtures of item response theory models have been proposed as a technique to explore ABSTRACT Title of Document: A MIXTURE RASCH MODEL WITH A COVARIATE: A SIMULATION STUDY VIA BAYESIAN MARKOV CHAIN MONTE CARLO ESTIMATION Yunyun Dai, Doctor of Philosophy, 2009 Directed By: Professor, Robert

More information

Paradoxical Results in Multidimensional Item Response Theory

Paradoxical Results in Multidimensional Item Response Theory UNC, December 6, 2010 Paradoxical Results in Multidimensional Item Response Theory Giles Hooker and Matthew Finkelman UNC, December 6, 2010 1 / 49 Item Response Theory Educational Testing Traditional model

More information

The performance of estimation methods for generalized linear mixed models

The performance of estimation methods for generalized linear mixed models University of Wollongong Research Online University of Wollongong Thesis Collection 1954-2016 University of Wollongong Thesis Collections 2008 The performance of estimation methods for generalized linear

More information

Use of Robust z in Detecting Unstable Items in Item Response Theory Models

Use of Robust z in Detecting Unstable Items in Item Response Theory Models A peer-reviewed electronic journal. Copyright is retained by the first or sole author, who grants right of first publication to the Practical Assessment, Research & Evaluation. Permission is granted to

More information

The application and empirical comparison of item. parameters of Classical Test Theory and Partial Credit. Model of Rasch in performance assessments

The application and empirical comparison of item. parameters of Classical Test Theory and Partial Credit. Model of Rasch in performance assessments The application and empirical comparison of item parameters of Classical Test Theory and Partial Credit Model of Rasch in performance assessments by Paul Moloantoa Mokilane Student no: 31388248 Dissertation

More information

A comparison of two estimation algorithms for Samejima s continuous IRT model

A comparison of two estimation algorithms for Samejima s continuous IRT model Behav Res (2013) 45:54 64 DOI 10.3758/s13428-012-0229-6 A comparison of two estimation algorithms for Samejima s continuous IRT model Cengiz Zopluoglu Published online: 26 June 2012 # Psychonomic Society,

More information

Generalized Linear Models for Non-Normal Data

Generalized Linear Models for Non-Normal Data Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture

More information

A White Paper on Scaling PARCC Assessments: Some Considerations and a Synthetic Data Example

A White Paper on Scaling PARCC Assessments: Some Considerations and a Synthetic Data Example A White Paper on Scaling PARCC Assessments: Some Considerations and a Synthetic Data Example Robert L. Brennan CASMA University of Iowa June 10, 2012 On May 3, 2012, the author made a PowerPoint presentation

More information

Test Equating under the Multiple Choice Model

Test Equating under the Multiple Choice Model Test Equating under the Multiple Choice Model Jee Seon Kim University of Illinois at Urbana Champaign Bradley A. Hanson ACT,Inc. May 12,2000 Abstract This paper presents a characteristic curve procedure

More information

Computerized Adaptive Testing With Equated Number-Correct Scoring

Computerized Adaptive Testing With Equated Number-Correct Scoring Computerized Adaptive Testing With Equated Number-Correct Scoring Wim J. van der Linden University of Twente A constrained computerized adaptive testing (CAT) algorithm is presented that can be used to

More information

Pairwise Parameter Estimation in Rasch Models

Pairwise Parameter Estimation in Rasch Models Pairwise Parameter Estimation in Rasch Models Aeilko H. Zwinderman University of Leiden Rasch model item parameters can be estimated consistently with a pseudo-likelihood method based on comparing responses

More information

Application of Item Response Theory Models for Intensive Longitudinal Data

Application of Item Response Theory Models for Intensive Longitudinal Data Application of Item Response Theory Models for Intensive Longitudinal Data Don Hedeker, Robin Mermelstein, & Brian Flay University of Illinois at Chicago hedeker@uic.edu Models for Intensive Longitudinal

More information

Assessment of fit of item response theory models used in large scale educational survey assessments

Assessment of fit of item response theory models used in large scale educational survey assessments DOI 10.1186/s40536-016-0025-3 RESEARCH Open Access Assessment of fit of item response theory models used in large scale educational survey assessments Peter W. van Rijn 1, Sandip Sinharay 2*, Shelby J.

More information

Online Item Calibration for Q-matrix in CD-CAT

Online Item Calibration for Q-matrix in CD-CAT Online Item Calibration for Q-matrix in CD-CAT Yunxiao Chen, Jingchen Liu, and Zhiliang Ying November 8, 2013 Abstract Item replenishment is important to maintaining a large scale item bank. In this paper

More information

SCORING TESTS WITH DICHOTOMOUS AND POLYTOMOUS ITEMS CIGDEM ALAGOZ. (Under the Direction of Seock-Ho Kim) ABSTRACT

SCORING TESTS WITH DICHOTOMOUS AND POLYTOMOUS ITEMS CIGDEM ALAGOZ. (Under the Direction of Seock-Ho Kim) ABSTRACT SCORING TESTS WITH DICHOTOMOUS AND POLYTOMOUS ITEMS by CIGDEM ALAGOZ (Under the Direction of Seock-Ho Kim) ABSTRACT This study applies item response theory methods to the tests combining multiple-choice

More information

Likelihood and Fairness in Multidimensional Item Response Theory

Likelihood and Fairness in Multidimensional Item Response Theory Likelihood and Fairness in Multidimensional Item Response Theory or What I Thought About On My Holidays Giles Hooker and Matthew Finkelman Cornell University, February 27, 2008 Item Response Theory Educational

More information

Summer School in Applied Psychometric Principles. Peterhouse College 13 th to 17 th September 2010

Summer School in Applied Psychometric Principles. Peterhouse College 13 th to 17 th September 2010 Summer School in Applied Psychometric Principles Peterhouse College 13 th to 17 th September 2010 1 Two- and three-parameter IRT models. Introducing models for polytomous data. Test information in IRT

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information

Use of e-rater in Scoring of the TOEFL ibt Writing Test

Use of e-rater in Scoring of the TOEFL ibt Writing Test Research Report ETS RR 11-25 Use of e-rater in Scoring of the TOEFL ibt Writing Test Shelby J. Haberman June 2011 Use of e-rater in Scoring of the TOEFL ibt Writing Test Shelby J. Haberman ETS, Princeton,

More information

Equating of Subscores and Weighted Averages Under the NEAT Design

Equating of Subscores and Weighted Averages Under the NEAT Design Research Report ETS RR 11-01 Equating of Subscores and Weighted Averages Under the NEAT Design Sandip Sinharay Shelby Haberman January 2011 Equating of Subscores and Weighted Averages Under the NEAT Design

More information

Reduced [tau]_n-factorizations in Z and [tau]_nfactorizations

Reduced [tau]_n-factorizations in Z and [tau]_nfactorizations University of Iowa Iowa Research Online Theses and Dissertations Summer 2013 Reduced [tau]_n-factorizations in Z and [tau]_nfactorizations in N Alina Anca Florescu University of Iowa Copyright 2013 Alina

More information

DIAGNOSTIC MEASUREMENT FROM A STANDARDIZED MATH ACHIEVEMENT TEST USING MULTIDIMENSIONAL LATENT TRAIT MODELS

DIAGNOSTIC MEASUREMENT FROM A STANDARDIZED MATH ACHIEVEMENT TEST USING MULTIDIMENSIONAL LATENT TRAIT MODELS DIAGNOSTIC MEASUREMENT FROM A STANDARDIZED MATH ACHIEVEMENT TEST USING MULTIDIMENSIONAL LATENT TRAIT MODELS A Thesis Presented to The Academic Faculty by HeaWon Jun In Partial Fulfillment of the Requirements

More information

What Rasch did: the mathematical underpinnings of the Rasch model. Alex McKee, PhD. 9th Annual UK Rasch User Group Meeting, 20/03/2015

What Rasch did: the mathematical underpinnings of the Rasch model. Alex McKee, PhD. 9th Annual UK Rasch User Group Meeting, 20/03/2015 What Rasch did: the mathematical underpinnings of the Rasch model. Alex McKee, PhD. 9th Annual UK Rasch User Group Meeting, 20/03/2015 Our course Initial conceptualisation Separation of parameters Specific

More information

Exploratory Factor Analysis and Principal Component Analysis

Exploratory Factor Analysis and Principal Component Analysis Exploratory Factor Analysis and Principal Component Analysis Today s Topics: What are EFA and PCA for? Planning a factor analytic study Analysis steps: Extraction methods How many factors Rotation and

More information

In this lesson, students model filling a rectangular

In this lesson, students model filling a rectangular NATIONAL MATH + SCIENCE INITIATIVE Mathematics Fill It Up, Please Part III Level Algebra or Math at the end of a unit on linear functions Geometry or Math as part of a unit on volume to spiral concepts

More information

ESTIMATION OF IRT PARAMETERS OVER A SMALL SAMPLE. BOOTSTRAPPING OF THE ITEM RESPONSES. Dimitar Atanasov

ESTIMATION OF IRT PARAMETERS OVER A SMALL SAMPLE. BOOTSTRAPPING OF THE ITEM RESPONSES. Dimitar Atanasov Pliska Stud. Math. Bulgar. 19 (2009), 59 68 STUDIA MATHEMATICA BULGARICA ESTIMATION OF IRT PARAMETERS OVER A SMALL SAMPLE. BOOTSTRAPPING OF THE ITEM RESPONSES Dimitar Atanasov Estimation of the parameters

More information

BAYESIAN IRT MODELS INCORPORATING GENERAL AND SPECIFIC ABILITIES

BAYESIAN IRT MODELS INCORPORATING GENERAL AND SPECIFIC ABILITIES Behaviormetrika Vol.36, No., 2009, 27 48 BAYESIAN IRT MODELS INCORPORATING GENERAL AND SPECIFIC ABILITIES Yanyan Sheng and Christopher K. Wikle IRT-based models with a general ability and several specific

More information

Empirical Validation of the Critical Thinking Assessment Test: A Bayesian CFA Approach

Empirical Validation of the Critical Thinking Assessment Test: A Bayesian CFA Approach Empirical Validation of the Critical Thinking Assessment Test: A Bayesian CFA Approach CHI HANG AU & ALLISON AMES, PH.D. 1 Acknowledgement Allison Ames, PhD Jeanne Horst, PhD 2 Overview Features of the

More information

Exploratory Factor Analysis and Principal Component Analysis

Exploratory Factor Analysis and Principal Component Analysis Exploratory Factor Analysis and Principal Component Analysis Today s Topics: What are EFA and PCA for? Planning a factor analytic study Analysis steps: Extraction methods How many factors Rotation and

More information

Because it might not make a big DIF: Assessing differential test functioning

Because it might not make a big DIF: Assessing differential test functioning Because it might not make a big DIF: Assessing differential test functioning David B. Flora R. Philip Chalmers Alyssa Counsell Department of Psychology, Quantitative Methods Area Differential item functioning

More information

USING BAYESIAN TECHNIQUES WITH ITEM RESPONSE THEORY TO ANALYZE MATHEMATICS TESTS. by MARY MAXWELL

USING BAYESIAN TECHNIQUES WITH ITEM RESPONSE THEORY TO ANALYZE MATHEMATICS TESTS. by MARY MAXWELL USING BAYESIAN TECHNIQUES WITH ITEM RESPONSE THEORY TO ANALYZE MATHEMATICS TESTS by MARY MAXWELL JIM GLEASON, COMMITTEE CHAIR STAVROS BELBAS ROBERT MOORE SARA TOMEK ZHIJIAN WU A DISSERTATION Submitted

More information

AN INVESTIGATION OF INVARIANCE PROPERTIES OF ONE, TWO AND THREE PARAMETER LOGISTIC ITEM RESPONSE THEORY MODELS

AN INVESTIGATION OF INVARIANCE PROPERTIES OF ONE, TWO AND THREE PARAMETER LOGISTIC ITEM RESPONSE THEORY MODELS Bulgarian Journal of Science and Education Policy (BJSEP), Volume 11, Number 2, 2017 AN INVESTIGATION OF INVARIANCE PROPERTIES OF ONE, TWO AND THREE PARAMETER LOGISTIC ITEM RESPONSE THEORY MODELS O. A.

More information

Comparing Multi-dimensional and Uni-dimensional Computer Adaptive Strategies in Psychological and Health Assessment. Jingyu Liu

Comparing Multi-dimensional and Uni-dimensional Computer Adaptive Strategies in Psychological and Health Assessment. Jingyu Liu Comparing Multi-dimensional and Uni-dimensional Computer Adaptive Strategies in Psychological and Health Assessment by Jingyu Liu BS, Beijing Institute of Technology, 1994 MS, University of Texas at San

More information

Walkthrough for Illustrations. Illustration 1

Walkthrough for Illustrations. Illustration 1 Tay, L., Meade, A. W., & Cao, M. (in press). An overview and practical guide to IRT measurement equivalence analysis. Organizational Research Methods. doi: 10.1177/1094428114553062 Walkthrough for Illustrations

More information

flexmirt R : Flexible Multilevel Multidimensional Item Analysis and Test Scoring

flexmirt R : Flexible Multilevel Multidimensional Item Analysis and Test Scoring flexmirt R : Flexible Multilevel Multidimensional Item Analysis and Test Scoring User s Manual Version 3.0RC Authored by: Carrie R. Houts, PhD Li Cai, PhD This manual accompanies a Release Candidate version

More information

Introduction to Factor Analysis

Introduction to Factor Analysis to Factor Analysis Lecture 11 November 2, 2005 Multivariate Analysis Lecture #11-11/2/2005 Slide 1 of 58 Today s Lecture Factor Analysis. Today s Lecture Exploratory factor analysis (EFA). Confirmatory

More information

Introduction to Confirmatory Factor Analysis

Introduction to Confirmatory Factor Analysis Introduction to Confirmatory Factor Analysis Multivariate Methods in Education ERSH 8350 Lecture #12 November 16, 2011 ERSH 8350: Lecture 12 Today s Class An Introduction to: Confirmatory Factor Analysis

More information

Classical Test Theory. Basics of Classical Test Theory. Cal State Northridge Psy 320 Andrew Ainsworth, PhD

Classical Test Theory. Basics of Classical Test Theory. Cal State Northridge Psy 320 Andrew Ainsworth, PhD Cal State Northridge Psy 30 Andrew Ainsworth, PhD Basics of Classical Test Theory Theory and Assumptions Types of Reliability Example Classical Test Theory Classical Test Theory (CTT) often called the

More information

2 Bayesian Hierarchical Response Modeling

2 Bayesian Hierarchical Response Modeling 2 Bayesian Hierarchical Response Modeling In the first chapter, an introduction to Bayesian item response modeling was given. The Bayesian methodology requires careful specification of priors since item

More information

Impact of serial correlation structures on random effect misspecification with the linear mixed model.

Impact of serial correlation structures on random effect misspecification with the linear mixed model. Impact of serial correlation structures on random effect misspecification with the linear mixed model. Brandon LeBeau University of Iowa file:///c:/users/bleb/onedrive%20 %20University%20of%20Iowa%201/JournalArticlesInProgress/Diss/Study2/Pres/pres.html#(2)

More information

ABSTRACT. Chair, Dr. Gregory R. Hancock, Department of. interactions as a function of the size of the interaction effect, sample size, the loadings of

ABSTRACT. Chair, Dr. Gregory R. Hancock, Department of. interactions as a function of the size of the interaction effect, sample size, the loadings of ABSTRACT Title of Document: A COMPARISON OF METHODS FOR TESTING FOR INTERACTION EFFECTS IN STRUCTURAL EQUATION MODELING Brandi A. Weiss, Doctor of Philosophy, 00 Directed By: Chair, Dr. Gregory R. Hancock,

More information

IOWA EQUATING SUMMIT

IOWA EQUATING SUMMIT Co-Hosted by ACTNext, ACT, and The University of Iowa Center for Advanced Studies in Measurement and Assessment IOWA EQUATING SUMMIT LINKING COMPARABILITY EQUATING SEPTEMBER 13, 2017 8:30AM - 5PM FERGUSON

More information

NESTED LOGIT MODELS FOR MULTIPLE-CHOICE ITEM RESPONSE DATA UNIVERSITY OF TEXAS AT AUSTIN UNIVERSITY OF WISCONSIN-MADISON

NESTED LOGIT MODELS FOR MULTIPLE-CHOICE ITEM RESPONSE DATA UNIVERSITY OF TEXAS AT AUSTIN UNIVERSITY OF WISCONSIN-MADISON PSYCHOMETRIKA VOL. 75, NO. 3, 454 473 SEPTEMBER 2010 DOI: 10.1007/S11336-010-9163-7 NESTED LOGIT MODELS FOR MULTIPLE-CHOICE ITEM RESPONSE DATA YOUNGSUK SUH UNIVERSITY OF TEXAS AT AUSTIN DANIEL M. BOLT

More information

Agile Mind Mathematics 8 Scope and Sequence, Texas Essential Knowledge and Skills for Mathematics

Agile Mind Mathematics 8 Scope and Sequence, Texas Essential Knowledge and Skills for Mathematics Agile Mind Mathematics 8 Scope and Sequence, 2014-2015 Prior to Grade 8, students have written and interpreted expressions, solved equations and inequalities, explored quantitative relationships between

More information

The Rasch Poisson Counts Model for Incomplete Data: An Application of the EM Algorithm

The Rasch Poisson Counts Model for Incomplete Data: An Application of the EM Algorithm The Rasch Poisson Counts Model for Incomplete Data: An Application of the EM Algorithm Margo G. H. Jansen University of Groningen Rasch s Poisson counts model is a latent trait model for the situation

More information

Package equateirt. June 8, 2018

Package equateirt. June 8, 2018 Type Package Title IRT Equating Methods Imports statmod, stats, utils, mirt Suggests knitr, ltm, rmarkdown, sna Version 2.0-5 Author Package equateirt June 8, 2018 Maintainer

More information