VARIABILITY OF KUDER-RICHARDSON FORMULA 20 RELIABILITY ESTIMATES. T. Anne Cleary, University of Wisconsin, and Robert L. Linn, Educational Testing Service


Research Bulletin RB-68-7

This Bulletin is a draft for interoffice circulation. Corrections and suggestions for revision are solicited. The Bulletin should not be cited as a reference without the specific permission of the authors. It is automatically superseded upon formal publication of the material.

Educational Testing Service, Princeton, New Jersey, February 1968

Abstract

The standard error of a Kuder-Richardson Formula 20 reliability coefficient is derived and two approximations to it are presented. The values from the exact solutions and from the approximations are compared with empirical values.

There are a number of different methods of estimating the reliability of a test; the more commonly used are the parallel-form correlation and internal-consistency measures such as the Kuder-Richardson formula 20 (KR-20) reliability coefficient. Each of these reliability estimates is based on particular assumptions and has a different interpretation; the choice of a reliability estimate in a given situation should be dictated by the interpretation. Nevertheless, it is instructive to consider the standard errors of these indices, that is, the sampling standard deviations of the indices under Type I sampling fluctuation. Type I sampling was used by Lord (1955) to refer to sampling of examinees: the same test is administered to a large number of separate groups of examinees, each group being a random sample from a population of examinees.

Under Type I sampling the standard error of a parallel-test correlation is well known:

$$\text{S.E.}(r_{xx}) = \ldots \qquad (1)$$

[The right-hand side of formula (1), an expression in $\rho_{xx}$ and $N$, is not legible in the transcription.]

where $r_{xx}$ is the observed correlation between two parallel tests, $\rho_{xx}$ is the correlation between the parallel tests in the population, and $N$ is the number of persons in the sample.

Feldt (1965) has derived an approximation to the sampling distribution of $r_{20}$ but does not state the standard error explicitly. Lord (1955) gives an explicit formula for the standard error under Type II sampling (sampling of items) but not under Type I sampling (sampling of persons).

Reliability can be defined as

$$\rho = \frac{\sigma_T^2}{\sigma_X^2},$$

where $\sigma_T^2$ is the variance of the true scores and $\sigma_X^2$ is the variance of the observed scores. In deriving the KR-20 formula from the analysis-of-variance model (Hoyt, 1941; Feldt, 1965), the score of person $p$ ($p = 1, \ldots, N$) on item $i$ ($i = 1, \ldots, K$) is represented as

$$X_{pi} = M + A_i + t_p + e_{pi},$$

where $M$ = grand mean, $A_i$ = the score component due to the difficulty of item $i$, $t_p$ = the item true score for person $p$, and $e_{pi}$ = error for item $i$ and person $p$.

The item errors, $e_{pi}$, are assumed to be normally and independently distributed with zero means and common variance $\sigma_e^2$. The item true score, $t_p$, is assumed to have a normal distribution with variance $\sigma_t^2$. The test true score, $T_p = K t_p$, and test error score, $E_p = \sum_i e_{pi}$, are then normally distributed with variances

$$\sigma_T^2 = K^2 \sigma_t^2 \quad \text{and} \quad \sigma_E^2 = K \sigma_e^2 .$$

Reliability is then defined in terms of item parameters as

$$\rho = \frac{K^2 \sigma_t^2}{K^2 \sigma_t^2 + K \sigma_e^2} = \frac{K \sigma_t^2}{K \sigma_t^2 + \sigma_e^2} .$$

The variance of the true score component, $t_p$, is estimated by

$$\frac{MS_p - MS_{ip}}{K};$$

the variance of the item error score by $MS_{ip}$; and the reliability by

$$r_{20} = \frac{MS_p - MS_{ip}}{MS_p} = 1 - \frac{MS_{ip}}{MS_p},$$

where $MS_{ip}$ is the mean square for the items-by-persons interaction, and $MS_p$ is the mean square for persons. The expected values of the mean squares are

$$E(MS_{ip}) = \sigma_e^2 \quad \text{and} \quad E(MS_p) = \sigma_e^2 + K \sigma_t^2 .$$

The population covariance of $MS_{ip}$ and $MS_p$ is zero. The sampling distributions of the mean squares are known:

$$\frac{(N - 1)\, MS_p}{\sigma_e^2 + K \sigma_t^2}$$

is distributed as chi-square with $(N - 1)$ degrees of freedom, and

$$\frac{(N - 1)(K - 1)\, MS_{ip}}{\sigma_e^2}$$

is distributed as chi-square with $(N - 1)(K - 1)$ degrees of freedom.
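The estimator above can be illustrated with a minimal sketch, assuming NumPy is available. The function names `kr20_from_anova` and `simulate_scores`, the grand mean of 0.5, the item-difficulty spread, and the variance values are all hypothetical choices made for illustration; none of them come from the bulletin.

```python
import numpy as np


def kr20_from_anova(scores):
    """Return (r20, MS_p, MS_ip) for an N x K persons-by-items score matrix.

    r20 = 1 - MS_ip / MS_p, following the analysis-of-variance formulation.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    person_means = x.mean(axis=1)
    item_means = x.mean(axis=0)

    # Sums of squares for the two-way layout with one observation per cell.
    ss_persons = k * np.sum((person_means - grand) ** 2)
    ss_items = n * np.sum((item_means - grand) ** 2)
    ss_total = np.sum((x - grand) ** 2)
    ss_interaction = ss_total - ss_persons - ss_items

    ms_p = ss_persons / (n - 1)
    ms_ip = ss_interaction / ((n - 1) * (k - 1))
    return 1.0 - ms_ip / ms_p, ms_p, ms_ip


def simulate_scores(n, k, var_t, var_e, rng):
    """Draw X_pi = M + A_i + t_p + e_pi under the normal model (illustrative values)."""
    grand_mean = 0.5                                   # hypothetical M
    item_effects = rng.normal(0.0, 0.1, size=k)        # hypothetical A_i
    true_scores = rng.normal(0.0, np.sqrt(var_t), size=n)
    errors = rng.normal(0.0, np.sqrt(var_e), size=(n, k))
    return grand_mean + item_effects[None, :] + true_scores[:, None] + errors


rng = np.random.default_rng(0)
n_persons, n_items, var_t, var_e = 200, 20, 0.05, 0.2
scores = simulate_scores(n_persons, n_items, var_t, var_e, rng)
r20, ms_p, ms_ip = kr20_from_anova(scores)

# Population reliability implied by the item parameters.
rho = n_items * var_t / (n_items * var_t + var_e)
print(f"r20 = {r20:.3f}, population rho = {rho:.3f}")
```

The sketch uses continuous normal scores because that is the model assumed in the derivation; with dichotomous item scores the same mean-square computation applies.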

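Since these two chi-square variates are independent, the ratio

$$\frac{1 - r_{20}}{1 - \rho}$$

is distributed as a central $F$ with $(N - 1)(K - 1)$ and $(N - 1)$ degrees of freedom. The variance of $r_{20}$ can then be written

$$\sigma_{r_{20}}^2 = (1 - \rho)^2 \, \frac{2(N - 1)\left[(N - 3) + (N - 1)(K - 1)\right]}{(K - 1)(N - 3)^2 (N - 5)},$$

and

$$\text{S.E.}(r_{20}) = (1 - \rho) \sqrt{\frac{2(N - 1)\left[(N - 3) + (N - 1)(K - 1)\right]}{(K - 1)(N - 3)^2 (N - 5)}} . \qquad (2)$$

An approximation to this variance is obtained by considering the variance of a ratio of the two chi-square variates. Since the variance of a chi-square distribution is equal to twice the number of degrees of freedom, the sampling variances of the mean squares are:

$$\text{Var}(MS_{ip}) = \frac{2 \sigma_e^4}{(N - 1)(K - 1)} \quad \text{and} \quad \text{Var}(MS_p) = \frac{2 (\sigma_e^2 + K \sigma_t^2)^2}{N - 1} .$$

The variance of the ratio of two random variables, $X_1 / X_2$, is approximately equal to (see Kendall & Stuart, 1958, p. 232):

$$\text{Var}\!\left(\frac{X_1}{X_2}\right) \approx \left[\frac{E(X_1)}{E(X_2)}\right]^2 \left[\frac{\text{Var}(X_1)}{E(X_1)^2} + \frac{\text{Var}(X_2)}{E(X_2)^2} - \frac{2\, \text{cov}(X_1, X_2)}{E(X_1)\, E(X_2)}\right] .$$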
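Formula (2) can be checked numerically against the general variance of a central F variate, as in the following minimal sketch. The function names are hypothetical, and the inputs (population reliability .906, a test length read here as 80 items, and the four sample sizes) are taken from the empirical comparison in the text; the test length in particular is an assumption about a garbled figure.

```python
import math


def f_variance(d1, d2):
    """Variance of a central F(d1, d2) variate; defined only for d2 > 4."""
    if d2 <= 4:
        raise ValueError("variance of F requires more than 4 denominator df")
    return 2.0 * d2 ** 2 * (d1 + d2 - 2) / (d1 * (d2 - 2) ** 2 * (d2 - 4))


def se_r20_exact(rho, n, k):
    """Standard error of r20 from formula (2); requires n > 5 and k > 1."""
    if n <= 5 or k <= 1:
        raise ValueError("formula (2) requires n > 5 and k > 1")
    ratio = (2.0 * (n - 1) * ((n - 3) + (n - 1) * (k - 1))
             / ((k - 1) * (n - 3) ** 2 * (n - 5)))
    return (1.0 - rho) * math.sqrt(ratio)


rho, k = 0.906, 80  # k = 80 is an assumed reading of the test length
for n in (15, 30, 60, 120):
    # Same quantity computed directly from Var(F) with
    # d1 = (N - 1)(K - 1) and d2 = N - 1 degrees of freedom.
    via_f = (1.0 - rho) * math.sqrt(f_variance((n - 1) * (k - 1), n - 1))
    print(n, round(se_r20_exact(rho, n, k), 4), round(via_f, 4))
```

The two columns agree because formula (2) is the F variance with one factor of $(N - 1)$ cancelled.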

The standard error of $r_{20}$ is then:

$$\text{S.E.}(r_{20}) \approx (1 - \rho) \sqrt{\frac{2K}{(N - 1)(K - 1)}} . \qquad (3)$$

A still cruder approximation,

$$\text{S.E.}(r_{20}) \approx (1 - \rho) \sqrt{\frac{2}{N - 1}}, \qquad (4)$$

is obtained by assuming that $K - 1$ is approximately equal to $K$. For tests of typical length, formula (4) will give results similar to those of formula (3).

Baker (1962) conducted an empirical study of the sampling distributions of some common test analysis statistics, including the KR-20 coefficient. Using a population of 747 answer sheets of an 80-item test, Baker drew 200 Type I samples, with replacement, for each of four different sample sizes (N = 15, 30, 60, 120). The resulting standard deviations of the observed sample KR-20 coefficients are reported in Table 1 along with the theoretical standard errors obtained by using formulas (2), (3), and (4). As can be seen, there is fairly close agreement among the sets of values, which improves as the sample size increases.

Insert Table 1 about here

It is of some interest to note that for a parallel-test correlation with the same population value ($\rho_{xx} = .906$) the theoretical standard errors for the four cases in Table 1 are .044, .031, .022, and .016 for an N of 15, 30, 60, and 120 respectively. These values are all higher than the corresponding standard errors of $r_{20}$ except for the smallest N (N = 15). A comparison of formulas (1) and (2) indicates that the standard error of $r_{20}$ will generally be smaller than the standard error of $r_{xx}$ for values of $\rho$, N, and K that are apt to be encountered in practice.

It should be noted that the standard errors given above should not be used for confidence limits. The sampling distribution of the sample reliability coefficient is skewed and it is a biased estimator of the population value.
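The closing remarks on skewness and bias, and the behavior of approximations (3) and (4), can be explored with a small Monte Carlo sketch. It assumes NumPy and relies on the result that $(1 - r_{20})/(1 - \rho)$ follows a central F distribution, so that $r_{20}$ can be drawn as $1 - (1 - \rho)F$. The function names, the seed, the number of replications, and the 80-item reading of the test length are assumptions for illustration; this is not a reproduction of Baker's finite-population study.

```python
import math

import numpy as np


def se_r20_approx3(rho, n, k):
    """Approximate standard error of r20 from formula (3)."""
    return (1.0 - rho) * math.sqrt(2.0 * k / ((n - 1) * (k - 1)))


def se_r20_approx4(rho, n):
    """Cruder approximation from formula (4), treating K - 1 as K."""
    return (1.0 - rho) * math.sqrt(2.0 / (n - 1))


def simulate_r20(rho, n, k, reps, rng):
    """Draw r20 = 1 - (1 - rho) * F with F ~ F((n-1)(k-1), n-1)."""
    f_draws = rng.f((n - 1) * (k - 1), n - 1, size=reps)
    return 1.0 - (1.0 - rho) * f_draws


rng = np.random.default_rng(1968)   # arbitrary seed
rho, k, reps = 0.906, 80, 200_000   # k and reps are illustrative assumptions

for n in (15, 30, 60, 120):
    r = simulate_r20(rho, n, k, reps, rng)
    sd = r.std(ddof=1)                                   # simulated standard error
    bias = r.mean() - rho                                # negative if r20 underestimates rho
    skew = np.mean((r - r.mean()) ** 3) / r.std() ** 3   # negative for a long left tail
    print(f"N={n:4d}  sd={sd:.4f}  (3)={se_r20_approx3(rho, n, k):.4f}  "
          f"(4)={se_r20_approx4(rho, n):.4f}  bias={bias:+.4f}  skew={skew:+.2f}")
```

Under this model the simulated standard deviation estimates the quantity given by formula (2), so comparing it with the (3) and (4) columns shows how the approximations behave as N grows.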

References

Baker, F. B. Empirical determination of sampling distribution of item discrimination indices and a reliability coefficient. Wisconsin: University of Wisconsin, Madison, 1962.

Feldt, L. S. The approximate sampling distribution of Kuder-Richardson reliability coefficient twenty. Psychometrika, 1965, 30.

Hoyt, C. Test reliability estimated by analysis of variance. Psychometrika, 1941, 6.

Kendall, M. G., & Stuart, A. Advanced theory of statistics, Vol. 1. London: Charles Griffin & Co., Ltd., 1958.

Lord, F. M. Sampling fluctuations resulting from the sampling of test items. Psychometrika, 1955, 20, 1-22.

Table 1

Empirical and Theoretical Standard Errors of Kuder-Richardson Formula 20 for an 80-Item Test with $\rho_{20} = .906$

Columns: Sample Size | Empirical Results(a) | Theoretical Results: Formula 2 | Formula 3 | Formula 4

[Table entries are not recoverable from the transcription; a single value, .028, survives without a row or column.]

(a) Empirical results are based on 200 samples from a finite population of 747 test scores (Baker, 1962).


Measurement Error in Nonparametric Item Response Curve Estimation Research Report ETS RR 11-28 Measurement Error in Nonparametric Item Response Curve Estimation Hongwen Guo Sandip Sinharay June 2011 Measurement Error in Nonparametric Item Response Curve Estimation Hongwen

More information

Item Reliability Analysis

Item Reliability Analysis Item Reliability Analysis Revised: 10/11/2017 Summary... 1 Data Input... 4 Analysis Options... 5 Tables and Graphs... 5 Analysis Summary... 6 Matrix Plot... 8 Alpha Plot... 10 Correlation Matrix... 11

More information

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies Kosuke Imai Department of Politics Princeton University November 13, 2013 So far, we have essentially assumed

More information

ESTIMATION OF IRT PARAMETERS OVER A SMALL SAMPLE. BOOTSTRAPPING OF THE ITEM RESPONSES. Dimitar Atanasov

ESTIMATION OF IRT PARAMETERS OVER A SMALL SAMPLE. BOOTSTRAPPING OF THE ITEM RESPONSES. Dimitar Atanasov Pliska Stud. Math. Bulgar. 19 (2009), 59 68 STUDIA MATHEMATICA BULGARICA ESTIMATION OF IRT PARAMETERS OVER A SMALL SAMPLE. BOOTSTRAPPING OF THE ITEM RESPONSES Dimitar Atanasov Estimation of the parameters

More information

Chapter 8 Heteroskedasticity

Chapter 8 Heteroskedasticity Chapter 8 Walter R. Paczkowski Rutgers University Page 1 Chapter Contents 8.1 The Nature of 8. Detecting 8.3 -Consistent Standard Errors 8.4 Generalized Least Squares: Known Form of Variance 8.5 Generalized

More information

A Study of Statistical Power and Type I Errors in Testing a Factor Analytic. Model for Group Differences in Regression Intercepts

A Study of Statistical Power and Type I Errors in Testing a Factor Analytic. Model for Group Differences in Regression Intercepts A Study of Statistical Power and Type I Errors in Testing a Factor Analytic Model for Group Differences in Regression Intercepts by Margarita Olivera Aguilar A Thesis Presented in Partial Fulfillment of

More information

Estimating Measures of Pass-Fail Reliability

Estimating Measures of Pass-Fail Reliability Estimating Measures of Pass-Fail Reliability From Parallel Half-Tests David J. Woodruff and Richard L. Sawyer American College Testing Program Two methods are derived for estimating measures of pass-fail

More information

In this module I again consider compositing. This module follows one entitled, Composites and Formative Indicators. In this module, I deal with a

In this module I again consider compositing. This module follows one entitled, Composites and Formative Indicators. In this module, I deal with a In this module I again consider compositing. This module follows one entitled, Composites and Formative Indicators. In this module, I deal with a special situation where there is an endogenous link that

More information

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report Center for Advanced Studies in Measurement and Assessment CASMA Research Report Number 31 Assessing Equating Results Based on First-order and Second-order Equity Eunjung Lee, Won-Chan Lee, Robert L. Brennan

More information

ELSEVIER FIRST PROOFS CONTRIBUTORS PROOFCHECKING INSTRUCTIONS FOR ENCYCLOPEDIA OF SOCIAL MEASUREMENT

ELSEVIER FIRST PROOFS CONTRIBUTORS PROOFCHECKING INSTRUCTIONS FOR ENCYCLOPEDIA OF SOCIAL MEASUREMENT CONTRIBUTORS PROOFCHECKING INSTRUCTIONS FOR ENCYCLOPEDIA OF SOCIAL MEASUREMENT PROOFREADING The text content and layout of your article is not in final form when you receive proofs. Read proofs for accuracy

More information

Study of the Relationship between Dependent and Independent Variable Groups by Using Canonical Correlation Analysis with Application

Study of the Relationship between Dependent and Independent Variable Groups by Using Canonical Correlation Analysis with Application Modern Applied Science; Vol. 9, No. 8; 2015 ISSN 1913-1844 E-ISSN 1913-1852 Published by Canadian Center of Science and Education Study of the Relationship between Dependent and Independent Variable Groups

More information

Classical Test Theory (CTT) for Assessing Reliability and Validity

Classical Test Theory (CTT) for Assessing Reliability and Validity Classical Test Theory (CTT) for Assessing Reliability and Validity Today s Class: Hand-waving at CTT-based assessments of validity CTT-based assessments of reliability Why alpha doesn t really matter CLP

More information

Basic IRT Concepts, Models, and Assumptions

Basic IRT Concepts, Models, and Assumptions Basic IRT Concepts, Models, and Assumptions Lecture #2 ICPSR Item Response Theory Workshop Lecture #2: 1of 64 Lecture #2 Overview Background of IRT and how it differs from CFA Creating a scale An introduction

More information

Improved General Class of Ratio Type Estimators

Improved General Class of Ratio Type Estimators [Volume 5 issue 8 August 2017] Page No.1790-1796 ISSN :2320-7167 INTERNATIONAL JOURNAL OF MATHEMATICS AND COMPUTER RESEARCH Improved General Class of Ratio Type Estimators 1 Banti Kumar, 2 Manish Sharma,

More information

LAB 2. HYPOTHESIS TESTING IN THE BIOLOGICAL SCIENCES- Part 2

LAB 2. HYPOTHESIS TESTING IN THE BIOLOGICAL SCIENCES- Part 2 LAB 2. HYPOTHESIS TESTING IN THE BIOLOGICAL SCIENCES- Part 2 Data Analysis: The mean egg masses (g) of the two different types of eggs may be exactly the same, in which case you may be tempted to accept

More information

Repeated Measures Analysis of Variance

Repeated Measures Analysis of Variance Repeated Measures Analysis of Variance Review Univariate Analysis of Variance Group A Group B Group C Repeated Measures Analysis of Variance Condition A Condition B Condition C Repeated Measures Analysis

More information

Chapter 13 Correlation

Chapter 13 Correlation Chapter Correlation Page. Pearson correlation coefficient -. Inferential tests on correlation coefficients -9. Correlational assumptions -. on-parametric measures of correlation -5 5. correlational example

More information

[ B R L A U. t L 1-1 E T I. RB-60-l3. S. H. Abdel-Aty

[ B R L A U. t L 1-1 E T I. RB-60-l3. S. H. Abdel-Aty RB-60-l3 ~ [ s [ B A U R L t L 1-1 E T I N TECHNIQUES OF TESTING SJNlLARITY BETWEEN PROFILES S. H. Abdel-Aty This Bulletin is a draft for interoffice circulation. Corrections and suggestions for revision

More information

Item Response Theory and Computerized Adaptive Testing

Item Response Theory and Computerized Adaptive Testing Item Response Theory and Computerized Adaptive Testing Richard C. Gershon, PhD Department of Medical Social Sciences Feinberg School of Medicine Northwestern University gershon@northwestern.edu May 20,

More information

Psy 420 Final Exam Fall 06 Ainsworth. Key Name

Psy 420 Final Exam Fall 06 Ainsworth. Key Name Psy 40 Final Exam Fall 06 Ainsworth Key Name Psy 40 Final A researcher is studying the effect of Yoga, Meditation, Anti-Anxiety Drugs and taking Psy 40 and the anxiety levels of the participants. Twenty

More information

Reporting Subscores: A Survey

Reporting Subscores: A Survey Research Memorandum Reporting s: A Survey Sandip Sinharay Shelby J. Haberman December 2008 ETS RM-08-18 Listening. Learning. Leading. Reporting s: A Survey Sandip Sinharay and Shelby J. Haberman ETS, Princeton,

More information

Computer Science, Informatik 4 Communication and Distributed Systems. Simulation. Discrete-Event System Simulation. Dr.

Computer Science, Informatik 4 Communication and Distributed Systems. Simulation. Discrete-Event System Simulation. Dr. Simulation Discrete-Event System Simulation Chapter 8 Input Modeling Purpose & Overview Input models provide the driving force for a simulation model. The quality of the output is no better than the quality

More information

Testing the Untestable Assumptions of the Chain and Poststratification Equating Methods for the NEAT Design

Testing the Untestable Assumptions of the Chain and Poststratification Equating Methods for the NEAT Design Research Report Testing the Untestable Assumptions of the Chain and Poststratification Equating Methods for the NEAT Design Paul W. Holland Alina A. von Davier Sandip Sinharay Ning Han Research & Development

More information

Using Dice to Introduce Sampling Distributions Written by: Mary Richardson Grand Valley State University

Using Dice to Introduce Sampling Distributions Written by: Mary Richardson Grand Valley State University Using Dice to Introduce Sampling Distributions Written by: Mary Richardson Grand Valley State University richamar@gvsu.edu Overview of Lesson In this activity students explore the properties of the distribution

More information

A White Paper on Scaling PARCC Assessments: Some Considerations and a Synthetic Data Example

A White Paper on Scaling PARCC Assessments: Some Considerations and a Synthetic Data Example A White Paper on Scaling PARCC Assessments: Some Considerations and a Synthetic Data Example Robert L. Brennan CASMA University of Iowa June 10, 2012 On May 3, 2012, the author made a PowerPoint presentation

More information

A Non-parametric bootstrap for multilevel models

A Non-parametric bootstrap for multilevel models A Non-parametric bootstrap for multilevel models By James Carpenter London School of Hygiene and ropical Medicine Harvey Goldstein and Jon asbash Institute of Education 1. Introduction Bootstrapping is

More information

STAT 501 EXAM I NAME Spring 1999

STAT 501 EXAM I NAME Spring 1999 STAT 501 EXAM I NAME Spring 1999 Instructions: You may use only your calculator and the attached tables and formula sheet. You can detach the tables and formula sheet from the rest of this exam. Show your

More information

Equating Subscores Using Total Scaled Scores as an Anchor

Equating Subscores Using Total Scaled Scores as an Anchor Research Report ETS RR 11-07 Equating Subscores Using Total Scaled Scores as an Anchor Gautam Puhan Longjuan Liang March 2011 Equating Subscores Using Total Scaled Scores as an Anchor Gautam Puhan and

More information