Measurement Error 1: Consequences of Measurement Error
Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/
Last revised January 2015

Definitions. For two variables, X and Y, the following hold:

    Parameter                                                   Explanation
    $E(X) = \mu_X = \frac{\sum X_i}{N}$                           Expectation, or mean, of X
    $V(X) = E[(X - \mu_X)^2] = \sigma^2_X$                        Variance of X
    $SD(X) = \sqrt{V(X)} = \sigma_X$                              Standard deviation of X
    $COV(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \sigma_{XY}$         Covariance of X and Y
    $CORR(X, Y) = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} = \rho_{XY}$   Correlation of X and Y
    $\beta_{YX} = \frac{\sigma_{XY}}{\sigma^2_X}$                 Slope coefficient for the bivariate regression of Y on X (Y dependent)

Question: Suppose X suffers from random measurement error, that is, the values of X that we observe differ randomly from the true values that we are interested in. For example, we might be interested in income. Since people do not remember their income exactly, reported income will sometimes be higher and sometimes be lower than true income. In such a case, how does random measurement error affect the various statistical measures we are typically interested in? That is, how does unreliability affect our statistical measures and conclusions?

Revised Question: Let us put the question more formally. Let $X = X_t + \varepsilon$, where $\varepsilon$ is a random error term (i.e. has mean 0 and variance $\sigma^2_\varepsilon$). That is, $X_t$ is the true value of the variable, and $X$ is the flawed measure of the variable that is observed. We want to see how the statistics for the observed variable, $X$, differ from the statistics for the true variable, $X_t$. When thinking about this question, keep in mind that, because $\varepsilon$ is a random error term, it is independent of all other variables (except itself), e.g. $COV(X_t, \varepsilon) = COV(Y_t, \varepsilon) = 0$.

Definition of Reliability: The reliability of a variable is defined as:

$$REL(X) = \frac{\sigma^2_{X_t}}{\sigma^2_X} = \rho^2_{X X_t}$$

The first equality says reliability is true variance divided by total variance. The second equality says the reliability of a variable is the squared correlation between the true value of the variable and the observed value that suffers from random measurement error. If there is no random measurement error, reliability = 1.

Some additional rules for expectations.
Before answering the question, the following additional rules are helpful. Let A, B, C, and D be random variables. Then:

    (1) $E(A + B) = E(A) + E(B)$
    (2) If A and B are independent, $V(A + B) = V(A) + V(B)$
    (3) $COV(A + B, C + D) = COV(A, C) + COV(A, D) + COV(B, C) + COV(B, D)$
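These rules are enough to check the second equality in the definition of reliability. The short derivation below is a sketch that uses only $X = X_t + \varepsilon$, $COV(X_t, \varepsilon) = 0$, rule (3), and the fact that the covariance of a variable with itself is its variance:

```latex
\begin{align*}
COV(X, X_t) &= COV(X_t + \varepsilon,\, X_t)
             = V(X_t) + COV(\varepsilon, X_t)
             = \sigma^2_{X_t} \\
\rho_{X X_t} &= \frac{COV(X, X_t)}{\sigma_X \, \sigma_{X_t}}
             = \frac{\sigma^2_{X_t}}{\sigma_X \, \sigma_{X_t}}
             = \frac{\sigma_{X_t}}{\sigma_X} \\
\rho^2_{X X_t} &= \frac{\sigma^2_{X_t}}{\sigma^2_X} = REL(X)
\end{align*}
```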
Hypothetical Data. To help illustrate the points that will follow, we create a data set where the true measures ($Y_t$ and $X_t$) have a correlation of .7 with each other but the observed measures (Y and X) both have some degree of random measurement error, and the reliability of both is .64. The way I am constructing the data set, using the corr2data command, there will be no sampling variability, i.e. we can act as though we have the entire population.

    . matrix input corr = (1,.7,0,0\.7,1,0,0\0,0,1,0\0,0,0,1)
    . matrix input sd = (4,8,3,6)
    . matrix input mean = (10,7,0,0)
    . corr2data Yt Xt ey ex, corr(corr) sd(sd) mean(mean) n(500)
    (obs 500)
    . * Create flawed measures with random measurement error
    . gen Y = Yt + ey
    . gen X = Xt + ex

Effects of Unreliability

A. For the mean:

$$E(X) = E(X_t + \varepsilon) = E(X_t) + E(\varepsilon) = E(X_t)$$   [Expectations rule 1]

NOTE: Remember, since errors are random, $\varepsilon$ has mean 0.

Implication: Random measurement error does not bias the expected value of a variable, that is, $E(X) = E(X_t)$.

B. For the variance:

$$V(X) = V(X_t + \varepsilon) = V(X_t) + V(\varepsilon)$$   [Expectations rule 2]

NOTE: Remember, $COV(X_t, \varepsilon) = 0$ because $\varepsilon$ is a random disturbance.

Implication: Random measurement error does result in biased variances. The variance of the observed variable will be greater than the true variance.

A & B illustrated with our hypothetical data. We see that the flawed, observed measures have the same means as the true measures but their variances and standard deviations are larger:

    . sum Yt Y Xt X

        Variable |       Obs        Mean    Std. Dev.
    -------------+-----------------------------------
              Yt |       500          10           4
               Y |       500          10           5
              Xt |       500           7           8
               X |       500           7          10
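The stated reliability of .64 can be checked directly against both halves of the definition. The commands below are a minimal sketch, assuming the variables Yt and Y created by the corr2data and gen commands above are still in memory; the scalar names v_true and v_obs are made up for this illustration.

```stata
* Sketch only: assumes Yt and Y already exist in memory,
* created by the corr2data and gen commands shown earlier.

* Reliability as true variance divided by total variance
quietly summarize Yt
scalar v_true = r(Var)
quietly summarize Y
scalar v_obs = r(Var)
display "REL(Y), variance ratio:      " v_true / v_obs

* Reliability as the squared correlation of observed and true values
quietly correlate Y Yt
display "REL(Y), squared correlation: " r(rho)^2
```

Since the true variance is 16 and the observed variance is 25, both lines should display a value of about .64. Swapping in Xt and X gives the same check for X (64/100).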
C. For the covariance (we'll let $Y_t$ stand for the perfectly measured Y variable):

$$COV(X, Y_t) = COV(X_t + \varepsilon, Y_t) = COV(X_t, Y_t) + COV(\varepsilon, Y_t) = COV(X_t, Y_t)$$   [Expectations rule 3]

NOTE: Remember, $COV(\varepsilon, Y_t) = 0$ because $\varepsilon$ is a random disturbance.

Implication: Covariances are not biased by random measurement error.

C illustrated with our hypothetical data. Random measurement error in X does NOT affect the covariance:

    . corr Yt Xt X, cov
    (obs=500)

                 |       Yt       Xt        X
    -------------+---------------------------
              Yt |       16
              Xt |     22.4       64
               X |     22.4       64      100

D. For the correlation:

$$\rho_{X_t Y_t} = \frac{\sigma_{X_t Y_t}}{\sigma_{X_t} \sigma_{Y_t}}, \qquad \rho_{X Y_t} = \frac{\sigma_{X Y_t}}{\sigma_X \sigma_{Y_t}} = \frac{\sigma_{X_t Y_t}}{\sigma_X \sigma_{Y_t}}$$

The numerators are equal (result C) but $\sigma_X > \sigma_{X_t}$ (result B). Thus, when $X_t$ and $Y_t$ covary positively, $CORR(X, Y_t) < CORR(X_t, Y_t)$.

Implication: Random measurement error produces a downward bias in the bivariate correlation. This is often referred to as attenuation.

D with hypothetical data. The correlation is attenuated by random measurement error:

    . corr Yt Xt X
    (obs=500)

                 |       Yt       Xt        X
    -------------+---------------------------
              Yt |   1.0000
              Xt |   0.7000   1.0000
               X |   0.5600   0.8000   1.0000

Note that the correlation between $X_t$ and X is .8 and that the correlation between X and $Y_t$ (.56) is only .8 times as large as the correlation between $X_t$ and $Y_t$ (.7). Also, the .8 correlation between $X_t$ and X means that the reliability of X is .64.
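The factor of .8 is the square root of the reliability of X. The worked step below uses only the definitions given earlier. Its last line, for the case where both variables are measured with error, is an extension that the output above does not show; it assumes the two error terms are independent of each other, as they are in the hypothetical data.

```latex
\begin{align*}
\rho_{X Y_t} &= \rho_{X_t Y_t} \, \frac{\sigma_{X_t}}{\sigma_X}
              = \rho_{X_t Y_t} \sqrt{REL(X)}
              = .7 \times \sqrt{.64} = .7 \times .8 = .56 \\
\rho_{X Y}   &= \rho_{X_t Y_t} \sqrt{REL(X)\, REL(Y)}
              = .7 \times .8 \times .8 = .448
\end{align*}
```

The second result can be confirmed from the covariance and standard deviations: 22.4 / (10 × 5) = .448.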
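E. For $\beta_{YX}$: ($Y_t$ is perfectly measured, X has random measurement error)

$$\beta_{Y_t X_t} = \frac{\sigma_{X_t Y_t}}{\sigma^2_{X_t}}, \qquad \beta_{Y_t X} = \frac{\sigma_{X Y_t}}{\sigma^2_X} = \frac{\sigma_{X_t Y_t}}{\sigma^2_X}$$

Thus, when $X_t$ and $Y_t$ covary positively, $\beta_{Y_t X} < \beta_{Y_t X_t}$.

Implication: Random measurement error in the independent variable produces a downward bias in the bivariate regression slope coefficient.

E with hypothetical data. In a bivariate regression, random measurement error in X causes the slope coefficient to be attenuated, i.e. smaller in magnitude. First we run the regression between the true measures, and then we run the regression of $Y_t$ on the flawed measure X:

    . reg Yt Xt

          Source |       SS       df       MS              Number of obs =     500
    -------------+------------------------------           F(  1,   498) =  478.47
           Model |  3912.16007     1  3912.16007           Prob > F      =  0.0000
        Residual |  4071.84001   498  8.17638555           R-squared     =  0.4900
    -------------+------------------------------           Adj R-squared =  0.4890
           Total |  7984.00008   499  16.0000002           Root MSE      =  2.8594

    ------------------------------------------------------------------------------
              Yt |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              Xt |        .35   .0160008    21.87   0.000     .3185627    .3814373
           _cons |       7.55    .169994    44.41   0.000     7.216006    7.883994
    ------------------------------------------------------------------------------

    . reg Yt X

          Source |       SS       df       MS              Number of obs =     500
    -------------+------------------------------           F(  1,   498) =  227.52
           Model |  2503.78247     1  2503.78247           Prob > F      =  0.0000
        Residual |  5480.21761   498   11.004453           R-squared     =  0.3136
    -------------+------------------------------           Adj R-squared =  0.3122
           Total |  7984.00008   499  16.0000002           Root MSE      =  3.3173

    ------------------------------------------------------------------------------
              Yt |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               X |       .224   .0148503    15.08   0.000     .1948231    .2531769
           _cons |      8.432   .1811488    46.55   0.000      8.07609     8.78791
    ------------------------------------------------------------------------------

Note that X has a reliability of .64 and the slope coefficient using the flawed X (.224) is only .64 times as large as the slope coefficient using the perfectly measured $X_t$ (.35).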
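The .64 ratio is not a coincidence: in the bivariate case the attenuated slope is the true slope multiplied by the reliability of X. A short worked version, using the covariance (22.4) and variances (64 and 100) from the hypothetical data:

```latex
\begin{align*}
\beta_{Y_t X} &= \frac{\sigma_{X_t Y_t}}{\sigma^2_X}
               = \frac{\sigma_{X_t Y_t}}{\sigma^2_{X_t}} \cdot \frac{\sigma^2_{X_t}}{\sigma^2_X}
               = \beta_{Y_t X_t} \, REL(X) \\
              &= \frac{22.4}{64} \times \frac{64}{100} = .35 \times .64 = .224
\end{align*}
```

Note the contrast with the correlation, which is attenuated by the square root of the reliability (.8) rather than by the reliability itself (.64).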
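F. For $\beta_{YX}$: (Now Y is measured with random error, while $X_t$ is measured perfectly)

$$\beta_{Y_t X_t} = \frac{\sigma_{X_t Y_t}}{\sigma^2_{X_t}}, \qquad \beta_{Y X_t} = \frac{\sigma_{X_t Y}}{\sigma^2_{X_t}} = \frac{\sigma_{X_t Y_t}}{\sigma^2_{X_t}}$$

Thus, $\beta_{Y X_t} = \beta_{Y_t X_t}$.

Implication: Random measurement error in the dependent variable does not bias the slope coefficient. HOWEVER, it does lead to larger standard errors. Recall that the formula for the standard error of b is

$$s_b = \sqrt{\frac{1 - R^2}{N - K - 1}} \times \frac{s_Y}{s_X}$$

When you have random measurement error in Y, $R^2$ goes down because of the previously noted downward bias in the correlation. This increases the numerator. Also, the variance of Y goes up, which further increases the standard error.

F with hypothetical data. Random measurement error in Y does not cause the slope coefficient to be biased, but it does cause the standard error for the slope coefficient to be larger and the t value smaller. Again we run the true regression, followed by the regression of the flawed Y on $X_t$.

    . reg Yt Xt

          Source |       SS       df       MS              Number of obs =     500
    -------------+------------------------------           F(  1,   498) =  478.47
           Model |  3912.16007     1  3912.16007           Prob > F      =  0.0000
        Residual |  4071.84001   498  8.17638555           R-squared     =  0.4900
    -------------+------------------------------           Adj R-squared =  0.4890
           Total |  7984.00008   499  16.0000002           Root MSE      =  2.8594

    ------------------------------------------------------------------------------
              Yt |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              Xt |        .35   .0160008    21.87   0.000     .3185627    .3814373
           _cons |       7.55    .169994    44.41   0.000     7.216006    7.883994
    ------------------------------------------------------------------------------

    . reg Y Xt

          Source |       SS       df       MS              Number of obs =     500
    -------------+------------------------------           F(  1,   498) =  227.52
           Model |  3912.16001     1  3912.16001           Prob > F      =  0.0000
        Residual |  8562.84011   498   17.194458           R-squared     =  0.3136
    -------------+------------------------------           Adj R-squared =  0.3122
           Total |  12475.0001   499  25.0000002           Root MSE      =  4.1466

    ------------------------------------------------------------------------------
               Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              Xt |        .35   .0232035    15.08   0.000     .3044111    .3955889
           _cons |       7.55   .2465171    30.63   0.000     7.065658    8.034342
    ------------------------------------------------------------------------------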
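Plugging the hypothetical data into the standard error formula reproduces both reported standard errors. This worked check takes N = 500 and K = 1 (one independent variable), with the R-squared values and standard deviations shown in the output above:

```latex
\begin{align*}
\text{True } Y_t \text{ on } X_t: \quad
  s_b &= \sqrt{\frac{1 - .49}{500 - 1 - 1}} \times \frac{4}{8} \approx .0160 \\
\text{Flawed } Y \text{ on } X_t: \quad
  s_b &= \sqrt{\frac{1 - .3136}{500 - 1 - 1}} \times \frac{5}{8} \approx .0232
\end{align*}
```

Both changes push in the same direction: $1 - R^2$ rises from .51 to .6864 and $s_Y$ rises from 4 to 5, so the standard error grows by about 45 percent while the slope stays at .35. That is why the t value falls from 21.87 to 15.08.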
Additional implications

• When you have more than one independent variable, random measurement error can cause coefficients to be biased either upward or downward. As you add more variables to the model, all you can really be sure of is that, if the variables suffer from random measurement error (and most do), the results will probably be at least a little wrong!

• Reliability is a function of both the total variance and the error variance. True variance is a population characteristic; error variance is a characteristic of the measuring instrument.

• The fact that reliabilities differ between groups does not necessarily mean that one group is more accurate. It may just mean that there is less true variance in one group than there is in another.

• Comparisons of any sort can be distorted by differential reliability of variables. For example, if comparing effects of two variables, one variable may appear to have a stronger effect simply because it is better measured. If comparing, say, husbands and wives, the spouse who gives more accurate information may appear more influential. For a more detailed discussion of how measurement error can affect group comparisons, see Thomson, Elizabeth and Richard Williams (1982), "Beyond wives' family sociology: a method for analyzing couple data," Journal of Marriage and the Family 44: 999-1008.

Dealing with measurement error. For the most part, this is a subject for a research methods class or a more advanced statistics class. I'll toss out a few ideas for now:

• Collect better quality data in the first place. Make questions as clear as possible.

• Measure multiple indicators of concepts. When more than one question measures a concept, it is possible to estimate reliability and to take corrective action. For a more detailed discussion on measuring reliability, see Reliability and Validity Assessment, by Edward G. Carmines and Richard A. Zeller. 1979. Paper # 17 in the Sage Series on Quantitative Applications in the Social Sciences. Beverly Hills, CA: Sage.

• Create scales from multiple indicators of a concept. The scales will generally be more reliable than any single item would be. In SPSS you might use the FACTOR or RELIABILITY commands; in Stata relevant commands include factor and alpha.

• Use advanced techniques, such as LISREL, which let you incorporate multiple indicators of a concept in your model. Ideally, LISREL purges the items of measurement error, hence producing unbiased estimates of structural parameters. In Stata 12+, this can also be done with the sem (Structural Equation Modeling) command.
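As a rough illustration of what corrective action can look like in Stata, here are three sketches. The first uses eivreg (errors-in-variables regression), a command this handout does not otherwise discuss, applied to the hypothetical data and assuming the reliability of X is known to be .64. The second and third use the alpha and sem commands named above; the variable names x1, x2, x3, y, xscale and Concept are hypothetical and do not come from the handout's data.

```stata
* Sketch 1: errors-in-variables regression on the hypothetical data.
* Assumes Yt and X exist in memory and that the reliability of X (.64)
* is known in advance, which is rarely true in practice.
eivreg Yt X, reliab(X .64)

* Sketch 2: x1, x2, x3 and y are hypothetical variable names.
* x1-x3 are three indicators of the same concept.
* alpha reports Cronbach's alpha and saves the scale as xscale.
alpha x1 x2 x3, item generate(xscale)
regress y xscale

* Sketch 3 (Stata 12+): treat the concept as a latent variable.
* By default sem treats capitalized names not in the data as latent.
sem (Concept -> x1 x2 x3) (y <- Concept)
```

In the first sketch the corrected slope should come back close to the true value of .35, since .224 / .64 = .35. With real data the reliability has to be estimated, so the correction is only as good as that estimate.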