Measurement Error 1: Consequences of Measurement Error

Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/
Last revised January 2015

Definitions. For two variables, X and Y, the following hold:

Parameter                                                                  Explanation
$E(X) = \mu_X = \sum X_i / N$                                              Expectation, or Mean, of X
$V(X) = E[(X - \mu_X)^2] = \sigma_X^2$                                     Variance of X
$SD(X) = \sqrt{V(X)} = \sigma_X$                                           Standard Deviation of X
$COV(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \sigma_{XY}$                      Covariance of X and Y
$CORR(X, Y) = \sigma_{XY} / (\sigma_X \sigma_Y) = \rho_{XY} = \rho_{YX}$   Correlation of X and Y
$\beta_{YX} = \sigma_{XY} / \sigma_X^2$                                    Slope coefficient for the bivariate regression of Y on X (Y dependent)

Question: Suppose X suffers from random measurement error. That is, the values of X that we observe differ randomly from the true values that we are interested in. For example, we might be interested in income. Since people do not remember their income exactly, reported income will sometimes be higher and sometimes be lower than true income. In such a case, how does random measurement error affect the various statistical measures we are typically interested in? That is, how does unreliability affect our statistical measures and conclusions?

Revised Question: Let us put the question more formally. Let $X = X_t + \varepsilon$, where $\varepsilon$ is a random error term (i.e. it has mean 0 and variance $\sigma_\varepsilon^2$). That is, $X_t$ is the true value of the variable, and X is the flawed measure of the variable that is observed. We want to see how the statistics for the observed variable, X, differ from the statistics for the true variable, $X_t$. When thinking about this question, keep in mind that, because $\varepsilon$ is a random error term, it is independent of all other variables (except itself), e.g. $COV(X_t, \varepsilon) = COV(Y, \varepsilon) = 0$.

Definition of Reliability: The reliability of a variable is defined as:

$REL(X) = \sigma_{X_t}^2 / \sigma_X^2 = \rho_{X X_t}^2$

The first equality says reliability is true variance divided by total variance. The second equality says the reliability of a variable is the squared correlation between the true value of the variable and the observed value that suffers from random measurement error. If there is no random measurement error, reliability = 1.

Some additional rules for expectations. Before answering the question, the following additional rules are helpful. Let A, B, C, and D be random variables. Then,

(1) E(A + B) = E(A) + E(B)
(2) If A and B are independent, V(A + B) = V(A) + V(B)
(3) COV(A + B, C + D) = COV(A,C) + COV(A,D) + COV(B,C) + COV(B,D)
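The second equality in the definition of reliability can be checked with rule (3). The following is a short sketch of that step, writing $X_t$ for the true value and assuming, as stated above, that the error term is uncorrelated with it:

```latex
\begin{align*}
COV(X, X_t) &= COV(X_t + \varepsilon,\ X_t) = V(X_t) + COV(\varepsilon, X_t) = \sigma_{X_t}^2 \\
\rho_{X X_t} &= \frac{\sigma_{X_t}^2}{\sigma_X \, \sigma_{X_t}} = \frac{\sigma_{X_t}}{\sigma_X} \\
\rho_{X X_t}^2 &= \frac{\sigma_{X_t}^2}{\sigma_X^2} = REL(X)
\end{align*}
```

For instance, a true standard deviation of 8 and an error standard deviation of 6 give an observed variance of 64 + 36 = 100, so the reliability is 64/100 = .64 and the correlation between the true and observed values is .8.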

Hypothetical Data. To help illustrate the points that will follow, we create a data set where the true measures (Yt and Xt) have a correlation of .7 with each other but the observed measures (Y and X) both have some degree of random measurement error, and the reliability of both is .64. The way I am constructing the data set, using the corr2data command, there will be no sampling variability, i.e. we can act as though we have the entire population.

. matrix input corr = (1,.7,0,0\.7,1,0,0\0,0,1,0\0,0,0,1)
. matrix input sd = (4,8,3,6)
. matrix input mean = (10,7,0,0)
. corr2data Yt Xt ey ex, corr(corr) sd(sd) mean(mean) n(500)
(obs 500)
. * Create flawed measures with random measurement error
. gen Y = Yt + ey
. gen X = Xt + ex

Effects of Unreliability

A. For the mean:

$E(X) = E(X_t + \varepsilon) = E(X_t) + E(\varepsilon) = E(X_t)$   [Expectations rule 1]

NOTE: Remember, since errors are random, $\varepsilon$ has mean 0.

Implication: Random measurement error does not bias the expected value of a variable, that is, $E(X) = E(X_t)$.

B. For the variance:

$V(X) = V(X_t + \varepsilon) = V(X_t) + V(\varepsilon)$   [Expectations rule 2]

NOTE: Remember, $COV(X_t, \varepsilon) = 0$ because $\varepsilon$ is a random disturbance.

Implication: Random measurement error does result in biased variances. The variance of the observed variable will be greater than the true variance.

A & B illustrated with our hypothetical data. We see that the flawed, observed measures have the same means as the true measures but their variances and standard deviations are larger:

. sum Yt Y Xt X

    Variable |   Obs    Mean   Std. Dev.
-------------+---------------------------
          Yt |   500      10          4
           Y |   500      10          5
          Xt |   500       7          8
           X |   500       7         10
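The means and standard deviations in the summary table can be reproduced by hand from rules (1) and (2). Here is a minimal Python sketch of that arithmetic; the function name `observed_moments` is made up for illustration, and the calculation works with population moments rather than simulated data:

```python
import math

def observed_moments(mean_true, sd_true, sd_error):
    """Population moments of X = X_t + e, where e has mean 0 and is
    independent of X_t (hypothetical helper, not part of the handout)."""
    var_true = sd_true ** 2
    var_error = sd_error ** 2
    mean_obs = mean_true + 0.0       # rule (1): E(X_t + e) = E(X_t) + 0
    var_obs = var_true + var_error   # rule (2): independent variances add
    return {
        "mean": mean_obs,
        "variance": var_obs,
        "sd": math.sqrt(var_obs),
        "reliability": var_true / var_obs,
    }

# Values taken from the matrix input lines: means (10, 7), SDs (4, 8, 3, 6)
y = observed_moments(10, 4, 3)
x = observed_moments(7, 8, 6)
print(y)
print(x)
```

Both observed variables keep their true means, while the standard deviations rise from 4 to 5 and from 8 to 10, which is where the reliability of .64 for each comes from.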

C. For the covariance (we'll let Y stand for the perfectly measured Y variable):

$COV(X, Y) = COV(X_t + \varepsilon, Y) = COV(X_t, Y) + COV(\varepsilon, Y) = COV(X_t, Y)$   [Expectations rule 3]

NOTE: Remember, $COV(\varepsilon, Y) = 0$ because $\varepsilon$ is a random disturbance.

Implication: Covariances are not biased by random measurement error.

C illustrated with our hypothetical data. Random measurement error in X does NOT affect the covariance:

. corr Yt Xt X, cov
(obs=500)

             |       Yt       Xt        X
-------------+---------------------------
          Yt |       16
          Xt |     22.4       64
           X |     22.4       64      100

D. For the correlation:

$\rho_{X_t Y} = \frac{\sigma_{X_t Y}}{\sigma_{X_t} \sigma_Y}, \qquad \rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} = \frac{\sigma_{X_t Y}}{\sigma_X \sigma_Y} = \rho_{X_t Y} \cdot \frac{\sigma_{X_t}}{\sigma_X}$

Since $\sigma_X \ge \sigma_{X_t}$, when X and Y covary positively, $CORR(X, Y) \le CORR(X_t, Y)$.

Implication: Random measurement error produces a downward bias in the bivariate correlation. This is often referred to as attenuation.

D with hypothetical data. The correlation is attenuated by random measurement error:

. corr Yt Xt X
(obs=500)

             |       Yt       Xt        X
-------------+---------------------------
          Yt |   1.0000
          Xt |   0.7000   1.0000
           X |   0.5600   0.8000   1.0000

Note that the correlation between Xt and X is .8 and that the correlation between X and Yt (.56) is only .8 times as large as the correlation between Xt and Yt (.7). Also, the .8 correlation between Xt and X means that the reliability of X is .64.
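The covariance and correlation results in C and D can be checked by hand from the population values used to build the data. The sketch below does that in Python; `error_in_x` is a hypothetical helper name, and the inputs (correlation .7, SDs of 4, 8 and 6) are the ones from the matrix input lines:

```python
import math

def error_in_x(rho_true, sd_y, sd_true_x, sd_error_x):
    """Covariance and correlation of Y with X = X_t + e, where e is
    uncorrelated with both X_t and Y (illustrative helper only)."""
    cov_true = rho_true * sd_y * sd_true_x        # COV(X_t, Y)
    sd_obs_x = math.sqrt(sd_true_x ** 2 + sd_error_x ** 2)
    cov_obs = cov_true                            # rule (3): COV(e, Y) = 0
    rho_obs = cov_obs / (sd_y * sd_obs_x)         # attenuated correlation
    shrink = sd_true_x / sd_obs_x                 # equals CORR(X, X_t)
    return cov_obs, rho_obs, shrink

cov_obs, rho_obs, shrink = error_in_x(0.7, 4, 8, 6)
print(round(cov_obs, 4), round(rho_obs, 4), round(shrink, 4))
```

The covariance stays at 22.4, while the correlation shrinks by the factor .8 to .56, matching the two correlation matrices.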

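E. For $\beta_{YX}$: (Y is perfectly measured, X has random measurement error)

$\beta_{Y X_t} = \frac{\sigma_{X_t Y}}{\sigma_{X_t}^2}, \qquad \beta_{YX} = \frac{\sigma_{XY}}{\sigma_X^2} = \frac{\sigma_{X_t Y}}{\sigma_X^2}$

Thus, when X and Y covary positively, $\beta_{YX} \le \beta_{Y X_t}$.

Implication: Random measurement error in the Independent variable produces a downward bias in the bivariate regression slope coefficient.

E with hypothetical data. In a bivariate regression, random measurement error in X causes the slope coefficient to be attenuated, i.e. smaller in magnitude. First we run the regression between the true measures, and then we run the regression of Yt with the flawed measure X:

. reg Yt Xt

      Source |       SS       df       MS              Number of obs =     500
-------------+------------------------------           F(  1,   498) =  478.47
       Model |  3912.16007     1  3912.16007           Prob > F      =  0.0000
    Residual |  4071.84001   498  8.17638555           R-squared     =  0.4900
-------------+------------------------------           Adj R-squared =  0.4890
       Total |  7984.00008   499  16.0000002           Root MSE      =  2.8594

------------------------------------------------------------------------------
          Yt |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          Xt |        .35   .0160008    21.87   0.000     .3185627    .3814373
       _cons |       7.55    .169994    44.41   0.000     7.216006    7.883994
------------------------------------------------------------------------------

. reg Yt X

      Source |       SS       df       MS              Number of obs =     500
-------------+------------------------------           F(  1,   498) =  227.52
       Model |  2503.78247     1  2503.78247           Prob > F      =  0.0000
    Residual |  5480.21761   498   11.004453           R-squared     =  0.3136
-------------+------------------------------           Adj R-squared =  0.3122
       Total |  7984.00008   499  16.0000002           Root MSE      =  3.3173

------------------------------------------------------------------------------
          Yt |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           X |       .224   .0148503    15.08   0.000     .1948231    .2531769
       _cons |      8.432   .1811488    46.55   0.000      8.07609     8.78791
------------------------------------------------------------------------------

Note that X has a reliability of .64 and the slope coefficient using the flawed X (.224) is only .64 times as large as the slope coefficient using the perfectly measured Xt (.35).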
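The attenuated slope can be worked out directly from the covariance (22.4) and the two variances (64 and 100) reported earlier. A small Python sketch of that calculation follows; the function name `slopes_with_error_in_x` is an assumption made for illustration:

```python
def slopes_with_error_in_x(cov_xy, var_true_x, var_error_x):
    """Bivariate slopes of Y on the true X_t and on the observed
    X = X_t + e (hypothetical helper; population quantities)."""
    var_obs_x = var_true_x + var_error_x     # rule (2)
    beta_true = cov_xy / var_true_x          # slope using X_t
    beta_obs = cov_xy / var_obs_x            # slope using flawed X
    reliability = var_true_x / var_obs_x
    return beta_true, beta_obs, reliability

beta_true, beta_obs, rel_x = slopes_with_error_in_x(22.4, 64, 36)
print(round(beta_true, 4), round(beta_obs, 4), round(rel_x, 4))
```

Because the covariance is unchanged and only the denominator grows, the observed slope equals the true slope times the reliability of X: .35 × .64 = .224.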

F. For $\beta_{YX}$: (Now Y is measured with random error, while X is measured perfectly)

$\beta_{Y_t X} = \frac{\sigma_{X Y_t}}{\sigma_X^2}, \qquad \beta_{YX} = \frac{\sigma_{XY}}{\sigma_X^2} = \frac{\sigma_{X Y_t}}{\sigma_X^2}$

Thus, $\beta_{YX} = \beta_{Y_t X}$.

Implication: Random measurement error in the Dependent variable does not bias the slope coefficient. HOWEVER, it does lead to larger standard errors. Recall that the formula for the standard error of b is

$s_b = \sqrt{\frac{1 - R^2}{N - K - 1}} \cdot \frac{s_Y}{s_X}$

When you have random measurement error in Y, $R^2$ goes down because of the previously noted downward bias. This increases the numerator. Also, the variance of Y goes up, which further increases the standard error.

F with hypothetical data. Random measurement error in Y does not cause the slope coefficient to be biased, but it does cause the standard error for the slope coefficient to be larger and the t value smaller. Again we run the true regression followed by the regression of Y with Xt.

. reg Yt Xt

      Source |       SS       df       MS              Number of obs =     500
-------------+------------------------------           F(  1,   498) =  478.47
       Model |  3912.16007     1  3912.16007           Prob > F      =  0.0000
    Residual |  4071.84001   498  8.17638555           R-squared     =  0.4900
-------------+------------------------------           Adj R-squared =  0.4890
       Total |  7984.00008   499  16.0000002           Root MSE      =  2.8594

------------------------------------------------------------------------------
          Yt |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          Xt |        .35   .0160008    21.87   0.000     .3185627    .3814373
       _cons |       7.55    .169994    44.41   0.000     7.216006    7.883994
------------------------------------------------------------------------------

. reg Y Xt

      Source |       SS       df       MS              Number of obs =     500
-------------+------------------------------           F(  1,   498) =  227.52
       Model |  3912.16001     1  3912.16001           Prob > F      =  0.0000
    Residual |  8562.84011   498   17.194458           R-squared     =  0.3136
-------------+------------------------------           Adj R-squared =  0.3122
       Total |  12475.0001   499  25.0000002           Root MSE      =  4.1466

------------------------------------------------------------------------------
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          Xt |        .35   .0232035    15.08   0.000     .3044111    .3955889
       _cons |       7.55   .2465171    30.63   0.000     7.065658    8.034342
------------------------------------------------------------------------------
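The two slope standard errors in F follow from the standard error formula given above. As a rough check, here is that formula in Python; `slope_se` is a made-up name, and the inputs are the R-squared values and standard deviations from the two regressions:

```python
import math

def slope_se(r_squared, n, sd_y, sd_x, k=1):
    """Standard error of a bivariate slope:
    sqrt((1 - R^2) / (N - K - 1)) * s_Y / s_X  (illustrative helper)."""
    return math.sqrt((1 - r_squared) / (n - k - 1)) * sd_y / sd_x

# True regression (Yt on Xt): R^2 = .49, s_Y = 4, s_X = 8
se_true = slope_se(0.49, 500, 4, 8)
# Error in the dependent variable (Y on Xt): R^2 = .3136, s_Y = 5, s_X = 8
se_noisy = slope_se(0.3136, 500, 5, 8)

# The slope is .35 in both regressions, so only the t value changes
t_true = 0.35 / se_true
t_noisy = 0.35 / se_noisy
print(round(se_true, 7), round(se_noisy, 7))
print(round(t_true, 2), round(t_noisy, 2))
```

Both the lower R-squared and the larger standard deviation of Y push the standard error up, from about .016 to about .023, so the t value falls from roughly 21.9 to roughly 15.1 even though the slope itself is unchanged.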

Additional implications

• When you have more than one independent variable, random measurement error can cause coefficients to be biased either upward or downward. As you add more variables to the model, all you can really be sure of is that, if the variables suffer from random measurement error (and most do), the results will probably be at least a little wrong!

• Reliability is a function of both the total variance and the error variance. True variance is a population characteristic; error variance is a characteristic of the measuring instrument. The fact that reliabilities differ between groups does not necessarily mean that one group is more accurate. It may just mean that there is less true variance in one group than there is in another.

• Comparisons of any sort can be distorted by differential reliability of variables. For example, if comparing effects of two variables, one variable may appear to have a stronger effect simply because it is better measured. If comparing, say, husbands and wives, the spouse who gives more accurate information may appear more influential. For a more detailed discussion of how measurement error can affect group comparisons, see Thomson, Elizabeth and Richard Williams (1982) "Beyond wives' family sociology: a method for analyzing couple data." Journal of Marriage and the Family, Vol. 44, 999-1008.

Dealing with measurement error. For the most part, this is a subject for a research methods class or a more advanced statistics class. I'll toss out a few ideas for now:

• Collect better quality data in the first place. Make questions as clear as possible.

• Measure multiple indicators of concepts. When more than one question measures a concept, it is possible to estimate reliability and to take corrective action. For a more detailed discussion on measuring reliability, see Reliability and Validity Assessment, by Edward G. Carmines and Richard A. Zeller. 1979. Paper # 17 in the Sage Series on Quantitative Applications in the Social Sciences. Beverly Hills, CA: Sage.

• Create scales from multiple indicators of a concept. The scales will generally be more reliable than any single item would be. In SPSS you might use the FACTOR or RELIABILITY commands; in Stata relevant commands include factor and alpha.

• Use advanced techniques, such as LISREL, which let you incorporate multiple indicators of a concept in your model. Ideally, LISREL purges the items of measurement error, hence producing unbiased estimates of structural parameters. In Stata 12+, this can also be done with the sem (Structural Equation Modeling) command.
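To see why a scale built from several indicators tends to be more reliable than a single item, consider a simplified case that goes beyond what the handout states: k items that each equal the same true score plus an independent error with the same variance. Under those assumptions (which real items only approximate), averaging the items divides the error variance by k. The helper name `scale_reliability` below is hypothetical:

```python
def scale_reliability(var_true, var_error, k):
    """Reliability of the mean of k items, each X_t + e_j, with
    independent errors of equal variance (a simplifying assumption)."""
    # By rule (2), the mean of k independent errors has variance var_error / k
    return var_true / (var_true + var_error / k)

# True variance 64 and error variance 36, as for X in the hypothetical data
for k in (1, 2, 4, 8):
    print(k, round(scale_reliability(64, 36, k), 4))
```

With one item the reliability is the familiar .64; with four such items it rises to 64/73, or about .88, which illustrates the gain from combining indicators.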