Outline. Reliability. Reliability. PSY Oswald

Similar documents
Measurement Theory. Reliability. Error Sources. = XY r XX. r XY. r YY

Concept of Reliability

Classical Test Theory. Basics of Classical Test Theory. Cal State Northridge Psy 320 Andrew Ainsworth, PhD

2/26/2017. This is similar to canonical correlation in some ways. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

Latent Trait Reliability

Classical Test Theory (CTT) for Assessing Reliability and Validity

The Reliability of a Homogeneous Test. Measurement Methods Lecture 12

Lesson 6: Reliability

Item Response Theory and Computerized Adaptive Testing

TESTING AND MEASUREMENT IN PSYCHOLOGY. Merve Denizci Nazlıgül, M.S.

UCLA STAT 233 Statistical Methods in Biomedical Imaging

Class Introduction and Overview; Review of ANOVA, Regression, and Psychological Measurement

Item Reliability Analysis

Classification methods

Instrumentation (cont.) Statistics vs. Parameters. Descriptive Statistics. Types of Numerical Data

Review. A Bernoulli Trial is a very simple experiment:

Model II (or random effects) one-way ANOVA:

Basic IRT Concepts, Models, and Assumptions

Lecture Week 1 Basic Principles of Scientific Research

Inter Item Correlation Matrix (R )

Chapter 6 The Normal Distribution

An Overview of Item Response Theory. Michael C. Edwards, PhD

Q1 Own your learning with flash cards.

Assessment, analysis and interpretation of Patient Reported Outcomes (PROs)

T. Mark Beasley One-Way Repeated Measures ANOVA handout

Measurement Error. Martin Bland. Accuracy and precision. Error. Measurement in Health and Disease. Professor of Health Statistics University of York

Section 4. Test-Level Analyses

Applied Multivariate Analysis

Relation between Chaos Theory and Lewis s Development theory: Common Understanding is never archieved by covariation alone. Hideaki Yanagisawa

Chapter (4) Discrete Probability Distributions Examples

1. How will an increase in the sample size affect the width of the confidence interval?

PSY 305. Module 3. Page Title. Introduction to Hypothesis Testing Z-tests. Five steps in hypothesis testing

A MULTIVARIATE GENERALIZABILITY ANALYSIS OF STUDENT STYLE QUESTIONNAIRE

Grade 6 Social Studies

Overview for Families

Detailed Contents. 1. Science, Society, and Social Work Research The Process and Problems of Social Work Research 27

CRDTS EXAMINER PROFILE SERVICE. Examiner Profile Statistics 2017

COEFFICIENTS ALPHA, BETA, OMEGA, AND THE GLB: COMMENTS ON SIJTSMA WILLIAM REVELLE DEPARTMENT OF PSYCHOLOGY, NORTHWESTERN UNIVERSITY

Test Administration Technical Report

Reliability Theory Classical Model (Model T)

11 CHI-SQUARED Introduction. Objectives. How random are your numbers? After studying this chapter you should

LECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS

Measures of Central Tendency. Mean, Median, and Mode

Factorial Designs. Outline. Definition. Factorial designs. Factorial designs. Higher order interactions

8/04/2011. last lecture: correlation and regression next lecture: standard MR & hierarchical MR (MR = multiple regression)

Estimating Operational Validity Under Incidental Range Restriction: Some Important but Neglected Issues

Chapter 2 The Mean, Variance, Standard Deviation, and Z Scores. Instructor s Summary of Chapter

Package CMC. R topics documented: February 19, 2015

Probability and Samples. Sampling. Point Estimates

CHAPTER 2 BASIC MATHEMATICAL AND MEASUREMENT CONCEPTS

Validation Study: A Case Study of Calculus 1 MATH 1210

Balancing Chemical Equations

Dimension Reduction and Classification Using PCA and Factor. Overview

Inter-Rater Agreement

Comparing IRT with Other Models

The Empirical Rule, z-scores, and the Rare Event Approach

Chapter 3. Measuring data

Hypothesis Testing with Z and T

Clarifying the concepts of reliability, validity and generalizability

Scaling of Variance Space

Introduction to Confirmatory Factor Analysis

Economics 345: Applied Econometrics Section A01 University of Victoria Midterm Examination #2 Version 1 SOLUTIONS Fall 2016 Instructor: Martin Farnham

Lecture 17: Trees and Merge Sort 10:00 AM, Oct 15, 2018

Reporting Subscores: A Survey

POWER ALGEBRA NOTES: QUICK & EASY

Sampling Distributions: Central Limit Theorem

INTRODUCTION TO ANALYSIS OF VARIANCE

Relationships between variables. Visualizing Bivariate Distributions: Scatter Plots

Uni- and Bivariate Power

Sampling and Sample Size. Shawn Cole Harvard Business School

ANCOVA. Lecture 9 Andrew Ainsworth

Finite Math - Fall Section Present Value of an Annuity; Amortization

Group Dependence of Some Reliability

Composite scales and scores

Learning Plan 09. Question 1. Question 2. Question 3. Question 4. What is the difference between the highest and lowest data values in a data set?

Module 7 Practice problem and Homework answers

WELCOME! Lecture 14: Factor Analysis, part I Måns Thulin

The David P. Weikart Center for Youth Program Quality, Bringing together over fifty years of experience and the latest research,

Hypothesis testing: Steps

Deciphering Math Notation. Billy Skorupski Associate Professor, School of Education

Test Administration Technical Report

TABLE 2: Means, standard deviations, skew and kurtosis values for 10-item D-FAW

Chapter 6: Linear Regression With Multiple Regressors

10/31/2012. One-Way ANOVA F-test

Part III Classical and Modern Test Theory

Center for Advanced Studies in Measurement and Assessment. CASMA Research Report

Chapter 9: How can I tell if scores differ between three or more groups? One-way independent measures ANOVA.

Lecture 10: Everything Else

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

Psychology 405: Psychometric Theory Validity

Commutative Property: Change Order Addends can be attached in any order. Multipliers can be magnified in any order.

A Re-Introduction to General Linear Models (GLM)

psychological statistics

Measures of Agreement

Chapter - 5 Reliability, Validity & Norms

Specifying Latent Curve and Other Growth Models Using Mplus. (Revised )

Independent Component (IC) Models: New Extensions of the Multinormal Model

Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA

. =. a i1 x 1 + a i2 x 2 + a in x n = b i. a 11 a 12 a 1n a 21 a 22 a 1n. i1 a i2 a in

UMI INFORMATION TO USERS. In the unlikely. event that the author did not send UMI a complete

Transcription:

PSY 395 - Oswald Outline Concept of What are Constructs? Construct Contamination and Construct Deficiency in General Classical Test Theory Concept of Cars (engines, brakes!) Friends (on time, secrets) Surgeons Tattoo Artists Oh, and psychological measures 1

Concept of Affects Decision-Making in Psychology Selecting airline pilots Placing a child into Head Start Determining whether a client needs treatment for depression What Are Constructs? Constructs Often are unseen but can be measured indirectly (often via many multiplechoice questions): Airline pilots conscientiousness Head Start language skills Clinical treatment depression What Are Constructs? Constructs Endorsing items related to a construct doesn t mean you have a high standing on the construct you must endorse many items Example: Conscientiousness what are some possible items? 2

Construct Contamination and Deficiency Sample measure: Lack of Culture Fit (adapted from Lynness & Thompson, 2000, Journal of Applied Psychology) 1. I feel pressure to fit in or adapt to the situation around me. 2. I feel like I am an outsider. 3. There are few role models I can turn to. 4. I feel like I m held to a higher standard than others. Construct Contamination and Deficiency Sample measure: Lack of Culture Fit (adapted from Lynness & Thompson, 2000, Journal of Applied Psychology) 1. I feel pressure to fit in or adapt to the situation around me. 2. I feel like I am an outsider. 3. There are few role models I can turn to. 4. I feel like I m held to a higher standard than others. 5. I like food. CONSTRUCT CONTAMINATION Construct Contamination and Deficiency Sample measure: Lack of Culture Fit (adapted from Lynness & Thompson, 2000, Journal of Applied Psychology) 1. I feel pressure to fit in or adapt to the situation around me. 2. I feel like I am an outsider. 3. There are few role models I can turn to. CONSTRUCT DEFICIENT 4. I feel like I m held to a higher standard than others. 3

in General Standards of Measurement Given that the measure of a construct is not contaminated and not deficient then we concern ourselves with: Validity (a measure can not be valid if it is not reliable!) in General How consistent are the observed scores obtained by the same person tested at different time points? To what extent does your observed score reflect your true score on the construct measured? Classical Test Theory Classical Test Theory (CTT) X = T + E where X=observed score [the scores you have] T= true score [construct score anything consistent] E = random error score [guessing, fatigue anything inconsistent] 4

Classical Test Theory CTT Assumptions 1. True scores and errors are uncorrelated (independent) 2. Errors across people average to zero 3. Across repeated measurements, a person s average score is equal to his/her true score (i.e., #2 above also applies within people) Classical Test Theory Measures (1, 2, k) People (a, b, N) Error distributions Across people within X 1 measures (and within people across measures ) a b N X 1a = T 1a + E 1a X 1b = T 1b + E 1b X 1N = T 1N + E 1N Assumption If different measures (X 1, X 2, X 3 ) measure the same construct, then for a particular person the measures should have the same true score Classical Test Theory If X = T + E, then since true scores and error scores are uncorrelated: var (X) = var(t) + var(e) Question: What percent of the observed variance is reliable (true-score) variance?? 5

coefficients reflect the proportion of true score variance to observed score variance r var( T ) = var( X ) Therefore reliabilities range from 0.0 (all error variance) to 1.0 (all true-score variance) All Indexes of are Correlation Coefficients r =.23 r =.87 Errors in Measurement What is error?! Depends on how you define it; what variance you decide is irrelevant. 6

Errors in Measurement Some questions to ask: Is the construct stable? Are there multiple dimensions to the construct? (e.g., leadership) Will test-takers be fatigued? Are there practical considerations (is test-retest possible? how are scores being used?) Conceptual Considerations in Measurement Type of Test Speeded Power Type of Group Tested Range-restricted? Generalizable? Different How you estimate reliability depends on what you consider to be error Source of Error Time Content Type of Test-retest Parallel forms Split-half Cronbach s Alpha 7

Different Hey, there s more! Source of Error Time and Content Raters/Observers Type of Parallel forms with time delay Interrater correlation Test-Retest What is it? The correlation between the scores of people who have taken a test twice Why is this useful? What are some possible sources of error variance? practice, fatigue, health, motivation Parallel (or Alternate) Forms What is it? Same vs. different points in time Practical considerations (time/$) Counterbalancing 8

Split-Half What is it? odd/even; first/last What are the sources of error? (content) What are NOT the sources of error? (time) Correcting for Split-Half: Spearman-Brown (SB) Formula r SB k r Correction: = r 1 + ( k 1) r SB > r k = amount by which the test is multiplied (here, it s 2 why?) r = correlation between halves Spearman-Brown (SB) Formula EXAMPLE Example: 100 items, split-half r =.70, what s the estimated reliability of the full test? SB = (2 x.7) / (1 + (2 1) x.70) = 1.4/1.7 =.82 r SB k r 2.70 = = =.82 1 + ( k 1) r 1 + (2 1).70 (note the increase) 9

Coefficient Alpha (α) What is it? (1) average inter-item correlation boosted by the SB formula (where k = # of items) Another way to say this is (2) average of all possible split-half correlations boosted by the SB formula (where k = 2) Coefficient Alpha (α) What are the sources of error? (content) What are NOT the sources of error? (time) Coefficient Alpha (α) k rij α = 1 + ( k 1) r Where you need (a) the # of items and (b) the average inter-item correlation ij 10

r SB = nr 1 + ( n 1) r k rij α = 1 + ( k 1) r ij α Example: Correlations item 1 2 3 4 1 1.00.33.33.50 2.33 1.00.50.33 3.33.50 1.00.25 4.50.33.25 1.00 k rij α = 1 + ( k 1) r ij = (4*.373) / 1+(4-1)*.373 =.70 2 k r ij or α = = (4 2 x.373) / 8.48 =.70 R Note: the average uses the lower diagonal not including the 1.0 values SB and α : Assumptions The formulas are not magical corrections that solve real measurement problems, if the test needs to be developed/revised further. A couple of assumptions of both formulas 1. all items are equal statistically 2. a longer test will generally have higher reliability because it is assumed to be the same in terms of content as its shorter counterpart 11