Methods for the Comparison of DIF across Assessments
W. Holmes Finch, Maria Hernandez Finch, Brian F. French, David E. McIntosh


2 Impact of DIF on Assessments Individuals administering assessments must select those with the greatest evidence of validity for a given population. A primary component of validity evidence is the fairness of assessments across subgroups. A major form of fairness evidence is differential item functioning (DIF) analysis.

3 Comparing levels of DIF across assessments Multiple instruments may exist for the same construct (e.g., intelligence, self-efficacy, learning styles, depression). Users want to select the assessment that exhibits the least amount of DIF for subgroups of interest (e.g., ethnicity, SES).

4 Comparing levels of DIF across assessments DIF methods are effective for flagging DIF, but DIF results do not yield easily comparable indices across assessments. Simple comparisons (e.g., the number of DIF items across measures) do not provide information about the magnitude of DIF across assessments.

5 Effect size estimates to compare levels of DIF Study goal: to examine several effect sizes for comparing the collective amount of DIF across two or more psychological assessments. Types of effect sizes: some were developed expressly for this purpose; others already exist and are borrowed from the literature.

6 Effect Sizes We present five different effect sizes, drawn from different areas, ranging from traditional measures (e.g., Cohen's d) to model-based approaches (e.g., item response theory).

7 Random effects IRT model-based DIF An IRT model that incorporates a random effect for group membership and any DIF associated with it:

$$P(\theta_j) = \frac{\exp\left[1.7 a_i (\theta_j - b_i) + G \xi_i\right]}{1 + \exp\left[1.7 a_i (\theta_j - b_i) + G \xi_i\right]}$$

where
$a_i$ = discrimination parameter for item $i$
$b_i$ = difficulty parameter for item $i$
$\theta_j$ = level on the latent trait for examinee $j$
$G$ = group membership, with focal group = 0 and reference group = 1
$\xi_i$ = DIF effect for item $i$.
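As a minimal sketch of this model in code (Python with numpy; the function name `p_correct` and the example parameter values are hypothetical, not from the slides), the response probability follows directly from the logit above:

```python
import numpy as np

def p_correct(theta, a, b, xi, group):
    """2PL response probability with a group DIF effect added to the logit.

    group: 1 for the reference group (G = 1), 0 for the focal group (G = 0).
    The 1.7 scaling constant puts the logistic metric close to the normal ogive.
    """
    logit = 1.7 * a * (theta - b) + group * xi
    return 1.0 / (1.0 + np.exp(-logit))

# Hypothetical item: a = 1.2, b = 0.0, DIF effect xi = 0.4
print(p_correct(theta=0.5, a=1.2, b=0.0, xi=0.4, group=1))  # reference group
print(p_correct(theta=0.5, a=1.2, b=0.0, xi=0.4, group=0))  # focal group (xi term drops out)
```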

8 Random effects IRT model-based DIF When $\xi_i = 0$, no DIF is present for item $i$. The log odds ratio, $\log \alpha_{MHi}$, from the Mantel-Haenszel test of DIF is a good estimator of $\xi_i$ (Camilli & Penfield, 1997). When there is no DIF for any of the items on a scale, the variance of $\xi$ across all items, $\tau^2$, equals 0.

9 Random effects IRT model-based DIF

$$\hat{\tau}^2 = \frac{\sum_{i=1}^{I} \left(\log \alpha_{MHi} - \mu\right)^2 - \sum_{i=1}^{I} S_i^2}{I}$$

where
$\mu$ = mean of $\log \alpha_{MHi}$ across the $I$ items of the measure
$S_i^2$ = sampling variance of $\log \alpha_{MHi}$ for each of the $I$ items of the measure.

The weighted estimator, with $w_i = 1 / S_i^2$, is:

$$\hat{\tau}_w^2 = \frac{\sum_{i=1}^{I} w_i^2 \left(\log \alpha_{MHi} - \mu\right)^2 - \sum_{i=1}^{I} w_i}{\sum_{i=1}^{I} w_i^2}$$
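Both estimators translate directly into code. A sketch, assuming the MH log odds ratios and their estimated sampling variances are already available as arrays (function names are illustrative):

```python
import numpy as np

def tau2(log_or, s2):
    """Unweighted DIF variance estimate across I items."""
    log_or, s2 = np.asarray(log_or, float), np.asarray(s2, float)
    mu = log_or.mean()  # mean of the MH log odds ratios
    return (np.sum((log_or - mu) ** 2) - np.sum(s2)) / len(log_or)

def tau2_weighted(log_or, s2):
    """Weighted DIF variance estimate with w_i = 1 / S_i^2."""
    log_or, s2 = np.asarray(log_or, float), np.asarray(s2, float)
    w = 1.0 / s2
    mu = log_or.mean()  # same mu as above, per the definition on the slide
    return (np.sum(w**2 * (log_or - mu) ** 2) - np.sum(w)) / np.sum(w**2)
```

Computing either statistic for each assessment and differencing the results gives the comparison described on the next slide.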

10 Comparing DIF using the random effects IRT model-based DIF Calculate $\tau^2$ and $\tau_w^2$ for each assessment; these values reflect the total variance in the item responses associated with DIF. Calculate the difference between these values for each pair of assessments. Differences departing from 0 suggest differing levels of DIF in the assessments. Select the assessment with the least amount of collective DIF for the target variable(s) of interest (e.g., ethnicity, SES).

11 Cohen's d for DIF In the context of meta-analysis, the log odds ratio can be easily converted to Cohen's d (Hasselblad & Hedges, 1995). The log odds ratio for each item on a scale, obtained using the MH test for DIF, can be converted to Cohen's d as:

$$d_i = \log \alpha_{MHi} \cdot \frac{\sqrt{3}}{\pi}$$

12 Cohen's d for comparing DIF across assessments Calculate $d_i$ for each item, then calculate the mean across items to obtain $\bar{d}$. Using the absolute values of $d_i$ yields the unsigned effect size, $\bar{d}_u$; using the signed values yields the signed effect size, $\bar{d}_s$.
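A small sketch of the conversion and the two averages (the function names and example values are illustrative):

```python
import numpy as np

def cohens_d_from_mh(log_or):
    """Hasselblad & Hedges (1995): d_i = log(alpha_MHi) * sqrt(3) / pi."""
    return np.asarray(log_or, float) * np.sqrt(3.0) / np.pi

def mean_d(log_or, signed=True):
    """Average item-level d values: signed (d_s) or unsigned (d_u)."""
    d = cohens_d_from_mh(log_or)
    return float(d.mean() if signed else np.abs(d).mean())

# Hypothetical MH log odds ratios for a 5-item scale
lors = [0.10, -0.25, 0.05, 0.30, -0.15]
print(mean_d(lors, signed=True))   # d_s
print(mean_d(lors, signed=False))  # d_u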

13 Cohen's d The mean $\bar{d}$ reflects the average conditional difference in the likelihood of a correct response between the groups of interest. It is on a commonly used and well-understood scale, leading to easy interpretation. To ascertain the relative amount of DIF in the scales: calculate $\bar{d}$ for each instrument, compare values to determine which has the least amount of DIF, and make your assessment selection.

14 Logistic regression $R_\Delta^2$ for comparing DIF across assessments The change in the variance accounted for ($R_\Delta^2$) in the logistic regression model for DIF detection when adding group membership (uniform DIF) and the interaction of group and total score (nonuniform DIF). We propose averaging $R_\Delta^2$ across the items to obtain $\bar{R}_\Delta^2$, a measure of overall DIF in a set of items.
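A sketch of the per-item computation, using statsmodels and Nagelkerke's pseudo-$R^2$ (a common choice in the logistic regression DIF literature; the slides do not specify which pseudo-$R^2$ was used, so treat this as one reasonable implementation rather than the authors' exact procedure):

```python
import numpy as np
import statsmodels.api as sm

def nagelkerke_r2(ll_model, ll_null, n):
    """Nagelkerke pseudo-R^2 from model and intercept-only log-likelihoods."""
    cox_snell = 1.0 - np.exp(2.0 * (ll_null - ll_model) / n)
    return cox_snell / (1.0 - np.exp(2.0 * ll_null / n))

def item_delta_r2(y, total, group):
    """R^2 change from the matching model (total score only) to the full
    DIF model adding group membership and the group-by-total interaction."""
    y = np.asarray(y, float)
    total = np.asarray(total, float)
    group = np.asarray(group, float)
    n = len(y)
    ll_null = sm.Logit(y, np.ones((n, 1))).fit(disp=0).llf
    X_base = sm.add_constant(total)
    X_full = sm.add_constant(np.column_stack([total, group, total * group]))
    ll_base = sm.Logit(y, X_base).fit(disp=0).llf
    ll_full = sm.Logit(y, X_full).fit(disp=0).llf
    return nagelkerke_r2(ll_full, ll_null, n) - nagelkerke_r2(ll_base, ll_null, n)
```

Averaging `item_delta_r2` over the items of a scale yields $\bar{R}_\Delta^2$.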

15 Logistic regression $R_\Delta^2$ for comparing DIF across assessments The $R_\Delta^2$ value shares with Cohen's d the advantage of being on a well-known and easily interpretable scale. For comparing the collective DIF across assessments: calculate $\bar{R}_\Delta^2$ for each assessment, compare, and make your assessment selection.

16 DIF effect size: Steinberg and Thissen The difference in item parameter estimates between two groups can serve as a measure of DIF effect size (Steinberg & Thissen, 2006). This effect size is intuitive and provides easily interpretable results, particularly for measurement professionals. For uniform DIF, the difference in item difficulty parameters serves as the effect size of interest.

17 DIF effect size: Steinberg and Thissen For each item on each assessment, the difference between the reference and focal groups' item difficulty values is calculated, and the mean of these differences is computed. The scale with the lower mean difficulty difference is considered to have the least amount of DIF. Make your assessment selection.
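This procedure is simple enough to state in a few lines (a sketch; it assumes the group-specific difficulty estimates have already been placed on a common scale):

```python
import numpy as np

def st_effect(b_reference, b_focal):
    """Mean reference-minus-focal difference in item difficulty estimates."""
    diffs = np.asarray(b_reference, float) - np.asarray(b_focal, float)
    return float(diffs.mean())
```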

18 SIBTEST SIBTEST is an effective tool for assessing both DIF and differential bundle functioning (DBF). The SIBTEST statistic for uniform DBF in a set of items is calculated as:

$$\hat{\beta}_{Bundle} = \sum_{k=0}^{K} p_k \left(\bar{Y}_{Rk} - \bar{Y}_{Fk}\right)$$

where
$p_k$ = proportion of individuals with matching subtest score $k$
$\bar{Y}_{Rk}$ = adjusted mean score on the bundle for reference group individuals with matching subtest score $k$
$\bar{Y}_{Fk}$ = adjusted mean score on the bundle for focal group individuals with matching subtest score $k$.
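A sketch of the core computation, assuming bundle scores and matching subtest scores are already in hand. Note the simplification: this uses raw stratum means, whereas operational SIBTEST first applies a regression correction to obtain the adjusted means $\bar{Y}_{Rk}$ and $\bar{Y}_{Fk}$:

```python
import numpy as np

def beta_bundle(bundle_ref, match_ref, bundle_focal, match_focal):
    """Uniform DBF statistic: sum over strata k of p_k * (mean_R - mean_F).

    bundle_*: bundle score for each examinee in a group.
    match_*:  matching subtest score for each examinee in that group.
    Uses raw stratum means (no regression correction), so this is a sketch.
    """
    bundle_ref = np.asarray(bundle_ref, float)
    bundle_focal = np.asarray(bundle_focal, float)
    match_ref, match_focal = np.asarray(match_ref), np.asarray(match_focal)
    n_total = len(match_ref) + len(match_focal)
    beta = 0.0
    for k in np.union1d(match_ref, match_focal):
        r = bundle_ref[match_ref == k]
        f = bundle_focal[match_focal == k]
        if len(r) == 0 or len(f) == 0:
            continue  # a stratum needs examinees from both groups
        p_k = (len(r) + len(f)) / n_total
        beta += p_k * (r.mean() - f.mean())
    return beta
```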

19 SIBTEST to compare DIF across assessments $\hat{\beta}_{Bundle}$ is a measure of the difference in conditional weighted mean performance between two groups on the items in a bundle. This statistic is calculated for each measure; the one with the lower value is determined to contain the least amount of overall DIF. Make your assessment selection.

20 Summary of 5 effect size measures for differences in DIF
Difference in $\tau^2$: reflects the difference in the variance in the scales associated with DIF.
Difference in $\bar{d}$: reflects the difference in the average conditional likelihood of a correct response across items on the assessments.
Difference in $\bar{R}_\Delta^2$: reflects the difference in the average conditional proportion of variance in the item responses associated with group membership.
Difference in S-T: reflects the difference in the mean item difficulty differences between the two groups for whom DIF is assessed.
Difference in $\hat{\beta}_{Bundle}$: reflects the difference in the conditional difference in the groups' scores on the assessments.

21 Simulation Study Results: Which one works? When assessments are of the same length, the methods are all equally able to detect which assessment contains more DIF. When assessments have different numbers of items, $\bar{d}$ and $\bar{R}_\Delta^2$ are overly likely to indicate that the shorter assessment contains more DIF. Conclusion: use $\tau^2$, $\tau_w^2$, or $\hat{\beta}_{Bundle}$ to make comparisons regarding the amount of DIF in two or more scales; these were the most accurate across a wide variety of conditions.

22 The current study: purpose Compare the amount of DIF on three separate assessments of intelligence that are commonly used by school psychologists; participants were each given all three measures. Evaluate the various effect sizes with real data. Of particular interest was DIF associated with mother's education level.

23 Method Sample: 200 preschool children (103 females). Age: 4 years 0 months to 5 years 11 months, with a mean (standard deviation) age of months (5.38). 62% (124) were Caucasian. 32% had mothers with a high school education or less.

24 Method The children were administered the Woodcock-Johnson III cognitive assessment battery (WJ-III), the Kaufman Assessment Battery for Children, Second Edition (KABC-II), and the Stanford-Binet Intelligence Scales, Fifth Edition (SBV). All children received all measures; counterbalancing of administration order controlled for order effects.

25 Method Grouping variable: mother's education level (Group 1: high school or less; Group 2: more than high school). Each effect size described previously was calculated for the first 7 items on each subtest of each assessment. The focus was on these 7 items because they are typically administered to all examinees, whereas later items are administered only to higher-performing or older individuals. Subtests were matched into Cattell-Horn-Carroll (CHC) theory-based constructs for comparison purposes.

26 Recommendations for interpreting DIF effect size measures
Cohen's d: Small (0.2-0.5), Medium (0.5-0.8), Large (0.8+)
$\tau^2$: Small (0-0.07), Medium (0.07-0.14), Large (0.14+)
$R_\Delta^2$: Negligible (0-0.035), Moderate (0.035-0.07), Large (0.07+)
SIBTEST: Negligible (0-0.059), Moderate (0.059-0.088), Large (0.088+)
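These guidelines are easy to encode; a hypothetical helper (not part of any standard package) that labels an effect size given ascending cut points:

```python
def label_effect(value, cuts, labels):
    """Return the label for the interval that abs(value) falls in."""
    v = abs(value)
    for cut, label in zip(cuts, labels):
        if v < cut:
            return label
    return labels[-1]

# Cut points taken from the guidelines above
print(label_effect(0.31, [0.2, 0.5, 0.8],
                   ["negligible", "small", "medium", "large"]))          # Cohen's d
print(label_effect(0.05, [0.035, 0.07],
                   ["negligible", "moderate", "large"]))                 # R2 delta
print(label_effect(0.09, [0.059, 0.088],
                   ["negligible", "moderate", "large"]))                 # SIBTEST
```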

27 Results: Fluid Intelligence (* = least amount of DIF for an index)
[Table of $\bar{d}_u$, $\bar{d}_s$, $\bar{R}_\Delta^2$, S-T, $\tau^2$, $\tau_w^2$, and $\hat{\beta}$ values for: SB Nonverbal Quantitative Reasoning (30 items), SB Verbal Quantitative Reasoning (30), SB Verbal Fluid Reasoning (13), KABC Pattern Reasoning (23), WJ Concept Formation (40), WJ Analysis-Synthesis (35)]

28 Results: Crystallized Intelligence (* = least amount of DIF for an index)
[Table of $\bar{d}_u$, $\bar{d}_s$, $\bar{R}_\Delta^2$, S-T, $\tau^2$, $\tau_w^2$, and $\hat{\beta}$ values for: SB Nonverbal Knowledge (30 items), KABC Verbal Knowledge (90), KABC Riddles (51), WJ Verbal Comprehension A (23), WJ Verbal Comprehension B (23), WJ Verbal Comprehension C (15), WJ Verbal Comprehension D (18), WJ General Information (26)]

29 Results: Short Term Memory (* = least amount of DIF for an index)
[Table of $\bar{d}_u$, $\bar{d}_s$, $\bar{R}_\Delta^2$, S-T, $\tau^2$, $\tau_w^2$, and $\hat{\beta}$ values for: SB Nonverbal Working Memory (34 items), SB Verbal Working Memory (15), KABC Number Recall (22), KABC Word Order (27), WJ Numbers Reversed (30), WJ Memory for Words (24)]

30 Results: Visual Processing (* = least amount of DIF for an index)
[Table of $\bar{d}_u$, $\bar{d}_s$, $\bar{R}_\Delta^2$, S-T, $\tau^2$, $\tau_w^2$, and $\hat{\beta}$ values for: SB Verbal Visual Spatial (30 items), SB Nonverbal Visual Spatial (22), KABC Triangles (25), KABC Conceptual Thinking (28), KABC Face Recognition (21), WJ Spatial Relations (33), WJ Picture Recognition (24)]

31 Results: Auditory Processing (* = least amount of DIF for an index)
[Table of $\bar{d}_u$, $\bar{d}_s$, $\bar{R}_\Delta^2$, S-T, $\tau^2$, $\tau_w^2$, and $\hat{\beta}$ values for: WJ Sound Blending (33 items), WJ Auditory Attention (50)]

32 Results: Processing Speed (* = least amount of DIF for an index)
[Table of $\bar{d}_u$, $\bar{d}_s$, $\bar{R}_\Delta^2$, S-T, $\tau^2$, $\tau_w^2$, and $\hat{\beta}$ values for: WJ Visual Matching 1 (26 items), WJ Visual Matching 2 (26), WJ Decision Speed (40)]

33 Conclusions The five effect sizes were useful in identifying specific subtests within each CHC factor that displayed the least amount of DIF with respect to mother's education. School psychologists and others can use this information to select the instrument that will provide the least biased assessment for a target population. These results, in combination with the simulation results, support the use of these effect sizes.

34 Conclusions The impact of DIF can be conceptualized as group differences in conditional item difficulty, conditional probability of a correct response, conditional performance on the set of items as a whole, or the variance in the item responses associated with DIF. The specific results presented here revealed that the KABC-II had either the lowest or next-to-lowest amount of DIF for those CHC domains in which it had tests. With regard to $\tau_w^2$, the KABC-II was generally the preferred measure with this age group when parental education level is the grouping variable, in terms of the amount of DIF. The SBV tended to exhibit the most DIF across CHC domains. Recommendation: use the KABC-II as the base test and supplement with the WJ-III in a cross-battery assessment strategy.

35 Future Directions Investigate the indices under various conditions and for different types of DIF (e.g., non-uniform DIF). Make these indices easy to obtain in the software used for DIF/DBF analysis; a standing issue with new indices is making them easy for practitioners to obtain and understand.
