Methods for the Comparison of DIF across Assessments W. Holmes Finch Maria Hernandez Finch Brian F. French David E. McIntosh
2 Impact of DIF on Assessments Individuals administering assessments must select those with the greatest evidence of validity for a given population. A primary component of validity evidence is the fairness of assessments across subgroups. A major form of fairness evidence is differential item functioning (DIF) analysis.
3 Comparing levels of DIF across assessments Multiple instruments may exist for the same construct (e.g., intelligence, self-efficacy, learning styles, depression). Users want to select the assessment that exhibits the least amount of DIF for subgroups of interest (e.g., ethnicity, SES).
4 Comparing levels of DIF across assessments DIF methods are effective for flagging DIF, but DIF results do not yield easily comparable indices across assessments. Simple comparisons (e.g., the number of DIF items across measures) do not provide information about the magnitude of DIF across assessments.
5 Effect size estimates to compare levels of DIF Study goal: to examine several effect sizes for comparing the collective amount of DIF across two or more psychological assessments. Some of these effect sizes were developed expressly for this purpose; others already exist and are borrowed from the literature.
6 Effect Sizes We present 5 different effect sizes, drawn from different areas: traditional measures (e.g., d) and model-based approaches (e.g., item response theory).
7 Random effects IRT model-based DIF An IRT model that incorporates a random effect for group membership and any DIF associated with it: P(θ) = exp[1.7a_i(θ_j − b_i) + Gξ_i] / (1 + exp[1.7a_i(θ_j − b_i) + Gξ_i]), where a_i = discrimination parameter for item i, b_i = difficulty parameter for item i, θ_j = level on the latent trait for examinee j, G = group membership (focal group = 0, reference group = 1), and ξ_i = DIF effect for item i.
8 Random effects IRT model-based DIF When ξ_i = 0, no DIF is present for item i. The log odds ratio, log α_MHi, from the Mantel-Haenszel test of DIF is a good estimator of ξ_i (Camilli & Penfield, 1997). When there is no DIF for any of the items on a scale, the variance of ξ across all items, τ², equals 0.
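The item characteristic function above can be sketched numerically. The helper below is hypothetical (the function name and arguments are assumptions, not from the slides); it shows how the ξ_i term shifts the response probability for the reference group (G = 1) only.

```python
import math

def p_correct(theta, a, b, xi, group):
    """Probability of a correct response under the 2PL model with a DIF term.

    theta : latent trait level for the examinee
    a, b  : item discrimination and difficulty parameters
    xi    : DIF effect for the item (xi = 0 means no DIF)
    group : 0 = focal group, 1 = reference group
    """
    z = 1.7 * a * (theta - b) + group * xi
    return math.exp(z) / (1.0 + math.exp(z))

# With xi = 0, focal and reference examinees at the same theta have the
# same probability; with xi > 0 the reference group is advantaged.
print(p_correct(0.0, 1.2, 0.0, 0.0, 0))  # 0.5
print(p_correct(0.0, 1.2, 0.0, 0.5, 1))  # > 0.5: conditional advantage from DIF
```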
9 Random effects IRT model-based DIF The unweighted estimator is: τ² = [ Σ_{i=1}^{I} ( (log α_MHi − μ)² − S_i² ) ] / I, where μ = mean of log α_MHi across the I items of the measure and S_i² = variance of log α_MHi for each of the I items of the measure. The weighted estimator is: τ_w² = [ Σ_{i=1}^{I} w_i²(log α_MHi − μ)² − Σ_{i=1}^{I} w_i ] / Σ_{i=1}^{I} w_i², where w_i = 1/S_i².
10 Comparing DIF using the random effects IRT model-based DIF Calculate τ² and τ_w² for each assessment. These values reflect the total variance in the item responses associated with DIF. Calculate the difference in these values for each pair of assessments. Differences departing from 0 suggest varying levels of DIF in the assessments. Select the assessment with the least amount of collective DIF for the target variable(s) of interest (e.g., ethnicity, SES).
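The two estimators can be sketched in a few lines. This is a minimal sketch, assuming the slide's definitions: μ is the unweighted mean of the log odds ratios, and the precision weights are taken as w_i = 1/S_i² (an assumption consistent with the Camilli & Penfield formulation, since the extracted formula is ambiguous).

```python
import numpy as np

def tau_squared(log_or, se):
    """DIF variance estimates from per-item MH log odds ratios and their SEs.

    log_or : log(alpha_MHi) for each of the I items on a scale
    se     : S_i, the standard error of log(alpha_MHi) for each item
    Returns (tau2, tau2_w); negative sample estimates are typically
    truncated at 0 in practice.
    """
    log_or, se = np.asarray(log_or, float), np.asarray(se, float)
    I = len(log_or)
    mu = log_or.mean()                       # mean log odds ratio across items
    tau2 = (((log_or - mu) ** 2) - se ** 2).sum() / I
    w = 1.0 / se ** 2                        # assumed precision weights, w_i = 1/S_i^2
    tau2_w = ((w ** 2 * (log_or - mu) ** 2).sum() - w.sum()) / (w ** 2).sum()
    return tau2, tau2_w
```

To compare two scales, compute both estimates for each and prefer the scale with the smaller values.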
11 Cohen's d for DIF In the context of meta-analysis, the log odds ratio can easily be converted to Cohen's d (Hasselblad & Hedges, 1995). The log odds ratio obtained for each item on a scale using the MH test for DIF can be converted to Cohen's d as: d_i = (√3/π) · log α_MHi
12 Cohen's d for comparing DIF across assessments Calculate d_i for each item, then calculate the mean across items to obtain the scale-level d. Using the absolute values of d_i gives the unsigned effect size, d_u; using the signed values of d_i gives the signed effect size, d_s.
13 Cohen's d The mean d reflects the average conditional difference in the likelihood of a correct response between the groups of interest. It is on a commonly used and well-understood scale, leading to easy interpretation. To ascertain the relative amount of DIF in the scales: calculate the mean d for each instrument, compare values to determine which has the least amount of DIF, and make your assessment selection.
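The conversion and averaging steps above can be sketched as follows (function names are hypothetical illustrations, not from the slides):

```python
import math

def mh_to_d(log_or):
    """Hasselblad-Hedges conversion of one MH log odds ratio to Cohen's d."""
    return log_or * math.sqrt(3) / math.pi

def mean_d(log_ors, signed=False):
    """Scale-level average of item-level d values: d_s if signed, d_u otherwise."""
    ds = [mh_to_d(x) if signed else abs(mh_to_d(x)) for x in log_ors]
    return sum(ds) / len(ds)
```

Note that with DIF in opposite directions across items, d_s can be near zero while d_u is large, so the two convey different information about a scale.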
14 Logistic regression R²Δ for comparing DIF across assessments R²Δ is the change in the variance accounted for in the logistic regression model for DIF detection when adding group membership (uniform DIF) and the interaction of group and total score (nonuniform DIF). We propose averaging R²Δ across the items to obtain a mean R²Δ, a measure of overall DIF in a set of items.
15 Logistic regression R²Δ for comparing DIF across assessments The R²Δ value shares with Cohen's d the advantage of being on a well-known and easily interpretable scale. For comparing the collective DIF across assessments: calculate the mean R²Δ for each assessment, then compare and make your assessment selection.
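A per-item R²Δ computation can be sketched as below. This is a sketch under two assumptions not stated in the slides: Nagelkerke's R² is used as the variance-accounted-for measure (a common choice in this literature), and the logistic models are fit with a plain Newton-Raphson routine rather than a packaged estimator.

```python
import numpy as np

def logit_loglik(X, y, max_iter=25):
    """Fit a logistic regression by Newton-Raphson; return the maximized log-likelihood."""
    X = np.column_stack([np.ones(len(y)), X])   # prepend an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        hess = X.T @ (X * (p * (1 - p))[:, None]) + 1e-10 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, X.T @ (y - p))
    p = np.clip(1.0 / (1.0 + np.exp(-(X @ beta))), 1e-12, 1 - 1e-12)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

def item_delta_r2(item, total, group):
    """Nagelkerke R^2 gain when group and group-x-total terms join the matching model."""
    n = len(item)
    ll_null = logit_loglik(np.empty((n, 0)), item)           # intercept only
    ll_base = logit_loglik(np.column_stack([total]), item)   # matching variable only
    ll_full = logit_loglik(np.column_stack([total, group, total * group]), item)
    def nagelkerke(ll):
        cox_snell = 1.0 - np.exp(2.0 * (ll_null - ll) / n)
        return cox_snell / (1.0 - np.exp(2.0 * ll_null / n))
    return nagelkerke(ll_full) - nagelkerke(ll_base)
```

Averaging `item_delta_r2` over the items of a scale gives the proposed overall index for that scale.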
16 DIF effect size: Steinberg and Thissen The difference in item parameter estimates for two groups can serve as a measure of DIF effect size (Steinberg & Thissen, 2006). This effect size is intuitive and provides easily interpretable results, particularly for measurement professionals. For uniform DIF, the difference in item difficulty parameters serves as the effect size of interest.
17 DIF effect size: Steinberg and Thissen For each item on each assessment, the difference between the reference and focal groups' item difficulty values is calculated. The mean of these differences is then computed. The scale with the lower mean difference in difficulties is considered to have the least amount of DIF. Make your assessment selection.
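A minimal sketch of the mean-difference summary above. Taking absolute differences is an assumption added here (the slides do not specify signed versus unsigned), so that DIF in opposite directions across items does not cancel out.

```python
def steinberg_thissen(b_ref, b_focal):
    """Mean absolute difference in item difficulty (b) estimates between groups.

    b_ref, b_focal : per-item difficulty estimates for the reference and
    focal groups, assumed to be on a common metric. Absolute values are an
    assumption so opposite-direction DIF does not cancel.
    """
    return sum(abs(r - f) for r, f in zip(b_ref, b_focal)) / len(b_ref)
```

Comparing this summary across scales identifies the scale whose difficulties differ least between groups.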
18 SIBTEST SIBTEST is an effective tool for assessing both DIF and differential bundle functioning (DBF). The SIBTEST statistic for uniform DBF in a set of items is calculated as: β_Bundle = Σ_{k=0}^{K} p_k (Ȳ_Rk − Ȳ_Fk), where p_k = proportion of individuals with matching subtest score k, Ȳ_Rk = adjusted mean score for the reference group on the bundle for individuals with matching subtest score k, and Ȳ_Fk = adjusted mean score for the focal group on the bundle for individuals with matching subtest score k.
19 SIBTEST to compare DIF across assessments β_Bundle is a measure of the difference in conditional weighted mean performance between two groups on the items in a bundle. This statistic is calculated for each measure; the one with the lower value is determined to contain the least amount of overall DIF. Make your assessment selection.
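The weighting in β_Bundle can be sketched as below. Note this simplified version uses raw conditional means, whereas operational SIBTEST regression-corrects (adjusts) the conditional means before differencing; the function name and arguments are hypothetical.

```python
import numpy as np

def beta_bundle(match, bundle, group, K):
    """Simplified SIBTEST-style beta for an item bundle.

    match  : matching subtest score per examinee (integers 0..K)
    bundle : score on the studied bundle per examinee
    group  : 1 = reference group, 0 = focal group
    Raw conditional means stand in for SIBTEST's adjusted means.
    """
    match, bundle, group = map(np.asarray, (match, bundle, group))
    beta = 0.0
    for k in range(K + 1):
        at_k = match == k
        ref = at_k & (group == 1)
        foc = at_k & (group == 0)
        if ref.any() and foc.any():
            p_k = at_k.mean()                  # proportion with matching score k
            beta += p_k * (bundle[ref].mean() - bundle[foc].mean())
    return beta
```

Positive values favor the reference group conditionally; a value near 0 indicates little uniform DBF in the bundle.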
20 Summary of 5 effect size measures for differences in DIF
Difference in τ²: Reflects the difference in variance in the scales associated with DIF.
Difference in d: Reflects the difference in the average conditional likelihood of a correct response across items on the assessments.
Difference in R²Δ: Reflects the difference in the average conditional proportion of variance in the item responses associated with group membership.
Difference in S-T: Reflects the difference in the mean item difficulty differences between the two groups for whom DIF is assessed.
Difference in β_Bundle: Reflects the difference in the conditional difference in the groups' scores on the assessments.
21 Simulation Study Results: Which one works? When assessments are of the same length, the methods are all equally able to detect which assessment contains more DIF. When assessments have different numbers of items, d and R²Δ are overly likely to indicate that the shorter assessment contains more DIF. We concluded that τ², τ_w², or β_Bundle should be used to make comparisons regarding the amount of DIF in two or more scales; these were the most accurate across a wide variety of conditions.
22 The current study: purpose Compare the amount of DIF on three separate assessments of intelligence that are commonly used by school psychologists. Participants were each given all 3 measures. Evaluate the various effect sizes with real data. Of particular interest was DIF associated with mother's education level.
23 Method Sample: 200 preschool children (103 females). Age: 4 years 0 months to 5 years 11 months, with a mean (standard deviation) age of months (5.38). 62% (124) were Caucasian 32% had mothers with a high school education or less.
24 Method The children were administered: Woodcock Johnson-III cognitive assessment battery (WJ-III) Kaufman Assessment Battery for Children-Second Edition (KABC-II) Stanford-Binet Intelligence Scales, Fifth Edition (SBV). All children received all measures: Counterbalancing of administration controlled for order effects.
25 Method Grouping variable: mother's education level. Group 1: high school or less; Group 2: more than high school. Each effect size described previously was calculated for the first 7 items on each subtest of each assessment. The focus was on these 7 items because they are typically administered to all examinees, whereas later items are only administered to higher performing or older individuals. Subtests were matched into Cattell-Horn-Carroll (CHC) theory based constructs for comparison purposes.
26 Recommendations for interpreting DIF effect size measures
Cohen's d: Small (0.2-0.5), Medium (0.5-0.8), Large (0.8+)
τ²: Small (0-0.07), Medium (0.07-0.14), Large (0.14+)
R²Δ: Negligible (0-0.035), Moderate (0.035-0.07), Large (0.07+)
SIBTEST: Negligible (0-0.059), Moderate (0.059-0.088), Large (0.088+)
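These interpretation bands can be wrapped in a small lookup helper for labeling many subtest-level values at once. The helper, its name, and its keys are hypothetical conveniences, not from the slides; it classifies on absolute value so negative estimates are handled.

```python
def dif_label(stat, value):
    """Classify a DIF effect size into small/negligible, medium/moderate, or large."""
    bands = {                      # (medium/moderate cutoff, large cutoff)
        "d": (0.5, 0.8),
        "tau2": (0.07, 0.14),
        "r2_delta": (0.035, 0.07),
        "sibtest": (0.059, 0.088),
    }
    medium, large = bands[stat]
    v = abs(value)
    if v >= large:
        return "large"
    if v >= medium:
        return "medium/moderate"
    return "small/negligible"
```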
27 Results: Fluid Intelligence (*least amount of DIF for an index)
Test (Items) | d_u | d_s | R²Δ | S-T | τ² | τ_w² | β
SB Nonverbal Quant Reason (30): *
SB Verbal Quant Reason (30): 0.19*, 0.19*, 0.01*
SB Verbal Fluid Reason (13)
KABC Pattern Reason (23): *, *, 0.13
WJ Concept Formation (40): *
WJ Analysis Synthesis (35): *, *
28 Results: Crystallized Intelligence (*least amount of DIF for an index)
Test (Items) | d_u | d_s | R²Δ | S-T | τ² | τ_w² | β
SB Nonverb Know (30): *
KABC Verb Know (90): *
KABC Riddle (51)
WJ Verbal Comp A (23): 0.08*, 0.08*, *
WJ Verbal Comp B (23): *, *
WJ Verbal Comp C (15): *
WJ Verbal Comp D (18): *
WJ General Info (26): *, *, -0.07
29 Results: Short Term Memory (*least amount of DIF for an index)
Test (Items) | d_u | d_s | R²Δ | S-T | τ² | τ_w² | β
SB Nonverb Work Memory (34): *, -0.02*
SB Verbal Work Memory (15)
KABC Number Recall (22): *
KABC Word Order (27): *, *, 0.10
WJ Numbers Reversed (30): *, *
WJ Memory for Words (24): 0.27*, 0.27*, 0.01*
30 Results: Visual Processing (*least amount of DIF for an index)
Test (Items) | d_u | d_s | R²Δ | S-T | τ² | τ_w² | β
SB Verbal Visual Spatial (30): 0.15*, 0.15*, 0.01*
SB Nonverbal Visual Spatial (22): *, *
KABC Triangles (25): *
KABC Concept Thinking (28): *, *
KABC Face Recognition (21): *, 0.18
WJ Spatial Relations (33): *
WJ Picture Recognition (24)
31 Results: Auditory Processing (*least amount of DIF for an index)
Test (Items) | d_u | d_s | R²Δ | S-T | τ² | τ_w² | β
WJ Sound Blending (33): 0.13*, 0.13*, 0.004*, 0.06*, *
WJ Auditory Attention (50): *, 0.01*, -0.05
32 Results: Processing Speed (*least amount of DIF for an index)
Test (Items) | d_u | d_s | R²Δ | S-T | τ² | τ_w² | β
WJ Visual Matching 1 (26): *, *
WJ Visual Matching 2 (26): *
WJ Decision Speed (40): 0.27*, 0.27*, 0.01*, 0.37*, *, -0.05*
33 Conclusions The 5 effect sizes were useful in identifying specific subtests within each CHC factor that displayed the least amount of DIF with respect to mother's education. School psychologists, and others, can use this information to select the instrument that will provide the least biased assessment for a target population. These results, in combination with the simulation results, support the use of these effect sizes.
34 Conclusions The impact of DIF can be conceptualized as group differences in conditional item difficulty, conditional probability of a correct response, conditional performance on the set of items as a whole, or variance in the item responses associated with DIF. The specific results presented here revealed that the KABC-II had either the lowest or next-to-lowest amount of DIF for those CHC domains in which it had tests. With regard to τ_w², the KABC-II was generally the preferred measure, in terms of the amount of DIF, for this age group when parental education level is the grouping of concern. The SBV tended to exhibit the most DIF across CHC domains. Recommendation: use the KABC-II as the base test and supplement with the WJ-III in a cross-battery assessment strategy.
35 Future Directions Investigate the indices under various conditions and for different types of DIF (e.g., nonuniform DIF). Make these indices easy to obtain in the software used for DIF/DBF analysis. A standing challenge is making new, helpful indices easy for practitioners to obtain and understand.
More informationOn the Construction of Adjacent Categories Latent Trait Models from Binary Variables, Motivating Processes and the Interpretation of Parameters
Gerhard Tutz On the Construction of Adjacent Categories Latent Trait Models from Binary Variables, Motivating Processes and the Interpretation of Parameters Technical Report Number 218, 2018 Department
More informationStatistical and psychometric methods for measurement: Scale development and validation
Statistical and psychometric methods for measurement: Scale development and validation Andrew Ho, Harvard Graduate School of Education The World Bank, Psychometrics Mini Course Washington, DC. June 11,
More informationComparison between conditional and marginal maximum likelihood for a class of item response models
(1/24) Comparison between conditional and marginal maximum likelihood for a class of item response models Francesco Bartolucci, University of Perugia (IT) Silvia Bacci, University of Perugia (IT) Claudia
More informationThe Factor Analytic Method for Item Calibration under Item Response Theory: A Comparison Study Using Simulated Data
Int. Statistical Inst.: Proc. 58th World Statistical Congress, 20, Dublin (Session CPS008) p.6049 The Factor Analytic Method for Item Calibration under Item Response Theory: A Comparison Study Using Simulated
More informationIRT Model Selection Methods for Polytomous Items
IRT Model Selection Methods for Polytomous Items Taehoon Kang University of Wisconsin-Madison Allan S. Cohen University of Georgia Hyun Jung Sung University of Wisconsin-Madison March 11, 2005 Running
More informationMixtures of Rasch Models
Mixtures of Rasch Models Hannah Frick, Friedrich Leisch, Achim Zeileis, Carolin Strobl http://www.uibk.ac.at/statistics/ Introduction Rasch model for measuring latent traits Model assumption: Item parameters
More informationSampling Distributions: Central Limit Theorem
Review for Exam 2 Sampling Distributions: Central Limit Theorem Conceptually, we can break up the theorem into three parts: 1. The mean (µ M ) of a population of sample means (M) is equal to the mean (µ)
More informationPrevious lecture. Single variant association. Use genome-wide SNPs to account for confounding (population substructure)
Previous lecture Single variant association Use genome-wide SNPs to account for confounding (population substructure) Estimation of effect size and winner s curse Meta-Analysis Today s outline P-value
More informationConditional Standard Errors of Measurement for Performance Ratings from Ordinary Least Squares Regression
Conditional SEMs from OLS, 1 Conditional Standard Errors of Measurement for Performance Ratings from Ordinary Least Squares Regression Mark R. Raymond and Irina Grabovsky National Board of Medical Examiners
More informationIntroduction: MLE, MAP, Bayesian reasoning (28/8/13)
STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this
More informationROGER E. MILLSAP INVARIANCE IN MEASUREMENT AND PREDICTION REVISITED. Introduction
PSYCHOMETRIKA VOL. 72, NO. 4, 461 473 DECEMBER 2007 DOI: 10.1007/S11336-007-9039-7 INVARIANCE IN MEASUREMENT AND PREDICTION REVISITED ROGER E. MILLSAP ARIZONA STATE UNIVERSITY Borsboom (Psychometrika,
More informationarxiv: v1 [stat.ap] 11 Aug 2014
Noname manuscript No. (will be inserted by the editor) A multilevel finite mixture item response model to cluster examinees and schools Michela Gnaldi Silvia Bacci Francesco Bartolucci arxiv:1408.2319v1
More informationSummer School in Applied Psychometric Principles. Peterhouse College 13 th to 17 th September 2010
Summer School in Applied Psychometric Principles Peterhouse College 13 th to 17 th September 2010 1 Two- and three-parameter IRT models. Introducing models for polytomous data. Test information in IRT
More informationSCORING TESTS WITH DICHOTOMOUS AND POLYTOMOUS ITEMS CIGDEM ALAGOZ. (Under the Direction of Seock-Ho Kim) ABSTRACT
SCORING TESTS WITH DICHOTOMOUS AND POLYTOMOUS ITEMS by CIGDEM ALAGOZ (Under the Direction of Seock-Ho Kim) ABSTRACT This study applies item response theory methods to the tests combining multiple-choice
More informationAlternative Growth Goals for Students Attending Alternative Education Campuses
Alternative Growth Goals for Students Attending Alternative Education Campuses AN ANALYSIS OF NWEA S MAP ASSESSMENT: TECHNICAL REPORT Jody L. Ernst, Ph.D. Director of Research & Evaluation Colorado League
More informationExponential and Logarithmic Functions. Copyright Cengage Learning. All rights reserved.
3 Exponential and Logarithmic Functions Copyright Cengage Learning. All rights reserved. 3.2 Logarithmic Functions and Their Graphs Copyright Cengage Learning. All rights reserved. What You Should Learn
More informationMachine Learning Linear Classification. Prof. Matteo Matteucci
Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)
More informationAC : GETTING MORE FROM YOUR DATA: APPLICATION OF ITEM RESPONSE THEORY TO THE STATISTICS CONCEPT INVENTORY
AC 2007-1783: GETTING MORE FROM YOUR DATA: APPLICATION OF ITEM RESPONSE THEORY TO THE STATISTICS CONCEPT INVENTORY Kirk Allen, Purdue University Kirk Allen is a post-doctoral researcher in Purdue University's
More informationMeasurement Theory. Reliability. Error Sources. = XY r XX. r XY. r YY
Y -3 - -1 0 1 3 X Y -10-5 0 5 10 X Measurement Theory t & X 1 X X 3 X k Reliability e 1 e e 3 e k 1 The Big Picture Measurement error makes it difficult to identify the true patterns of relationships between
More informationAN INVESTIGATION OF THE ALIGNMENT METHOD FOR DETECTING MEASUREMENT NON- INVARIANCE ACROSS MANY GROUPS WITH DICHOTOMOUS INDICATORS
1 AN INVESTIGATION OF THE ALIGNMENT METHOD FOR DETECTING MEASUREMENT NON- INVARIANCE ACROSS MANY GROUPS WITH DICHOTOMOUS INDICATORS Jessica Flake, Erin Strauts, Betsy McCoach, Jane Rogers, Megan Welsh
More informationIn addition to the interactions reported in the main text, we separately
Experiment 3 In addition to the interactions reported in the main text, we separately examined effects of value on list 1 and effects of value on recall averaged across lists 2-8. For list 1, a 2 x 3 (item
More informationThe robustness of Rasch true score preequating to violations of model assumptions under equivalent and nonequivalent populations
University of South Florida Scholar Commons Graduate Theses and Dissertations Graduate School 2008 The robustness of Rasch true score preequating to violations of model assumptions under equivalent and
More informationExploring geographic knowledge through mapping
Prairie Perspectives 89 Exploring geographic knowledge through mapping Scott Bell, University of Saskatchewan Abstract: Knowledge about the world is expressed in many ways. Sketch mapping has been a dominant
More informationChained Versus Post-Stratification Equating in a Linear Context: An Evaluation Using Empirical Data
Research Report Chained Versus Post-Stratification Equating in a Linear Context: An Evaluation Using Empirical Data Gautam Puhan February 2 ETS RR--6 Listening. Learning. Leading. Chained Versus Post-Stratification
More informationThe t-statistic. Student s t Test
The t-statistic 1 Student s t Test When the population standard deviation is not known, you cannot use a z score hypothesis test Use Student s t test instead Student s t, or t test is, conceptually, very
More informationA Random Effects Model for Effect Sizes
Psychological Bulletin 1983, Vol. 93, No., 3 8-395 Copyright 1983 by the American Psychological Association, Inc. 0033-909/83/930-0388S00.75 A Random Effects Model for Effect Sizes Larry V. Hedges Department
More information