Internal vs. external validity. External validity. Internal validity

Section 7: Model Assessment

Internal vs. external validity
- Internal validity refers to whether the analysis is valid for the population and sample being studied.
- External validity refers to whether these results can be generalized to other populations: is the population from which the sample is drawn representative of a larger population about which inference is sought?

External validity
- External validity is related to Assumption #0.
  - But in this case, the question is not whether all sample observations follow the same model but rather whether the sample observations follow the same model as the more general population.
  - Or, alternatively, are they drawn from a sub-population whose characteristics would make the coefficients (or specification) different?
- All populations have sub-populations that vary in their characteristics.
- If our sampling process is based on a particular sub-population, we must worry about the generalizability of our results, which is external validity:
  - We can perform an internally valid analysis of an idiosyncratic sub-population that would not generalize to others.
  - Example: Noel's work measuring the value of tree canopy or walkability in Portland. Do results generalize to other cities, or do Portlanders value these characteristics more (or less) than people in other cities?
- There are no direct statistical tests for external validity (unless you have data drawn from a broader population, in which case you probably should have used it to begin with).
  - It is usually a matter of judgment.
  - One way that some people try to assess external validity is to split the sample in half, estimate over one half, then assess the predictions for the other half.
    - If predictions are good, then both halves of the sample may follow the same model.
    - This is useless if both halves of the sample are drawn from a sub-population that is idiosyncratic, though.

Internal validity
- Given the population from which the sample is drawn, are the assumptions underlying the estimators valid?

Omitted variables
- They are always there.

- Omitted variables bias the coefficient estimators for any included variables that are correlated with them.
  - In a strict sense, nearly every econometric regression is biased because of this.
  - What variables are most obviously omitted?
  - What variables in the equation would be correlated with them?
  - How does this omission bias the included coefficients?
- Proxy variables are observable variables that are correlated with unobserved variables that should be included.
  - Proxy variables are legitimate if we are not particularly interested in the effect of the variable for which they proxy.
  - We cannot interpret the coefficient on the proxy directly as the coefficient on the omitted variable.
  - This is OK if the difference between the true variable and the proxy is uncorrelated with the included variables.
- Panel data can help if unobserved variables vary across units but not over time, or over time but not across units.

Misspecification of functional form
- Can use the RESET test to explore whether quadratics are useful.
- If you know what alternative functional forms might be more appropriate, you can test them.

Measurement error (errors-in-variables bias)
- Measurement error in the dependent variable
  - Suppose that the true dependent variable is $Y_i$ but that we instead observe $\tilde{Y}_i = Y_i + \varepsilon_i$, where $\varepsilon_i$ is a random measurement error.
  - The estimated model is then
    $$\tilde{Y}_i = \beta_0 + \beta_1 X_i + (u_i + \varepsilon_i).$$
  - As long as the measurement error in $Y$ ($\varepsilon$) is uncorrelated with $X$, there is no bias in the estimator of $\beta_1$.
  - The SER will be an estimate of the standard deviation of the composite error term $u + \varepsilon$, but otherwise OLS is fine.
- Measurement error in a regressor
  - Suppose that the dependent variable is measured accurately but that we measure $X$ with error: $\tilde{X}_i = X_i + \eta_i$.
  - The estimated model is
    $$Y_i = \beta_0 + \beta_1 \tilde{X}_i + (u_i - \beta_1 \eta_i).$$
  - Because $\eta$ is part of $\tilde{X}$ and therefore correlated with it, the composite error term is now correlated with the actual regressor, meaning that $\hat{\beta}_1$ is biased and inconsistent.
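The two measurement-error cases above can be checked with a small simulation. The sketch below is illustrative only: the seed, sample size, coefficient values, and unit variances are assumptions chosen for the example, not values from these notes, and `ols_slope` is a hypothetical helper written for this sketch.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    x_dev = x - x.mean()
    return float(x_dev @ (y - y.mean()) / (x_dev @ x_dev))


rng = np.random.default_rng(0)   # fixed seed (assumed) for reproducibility
n = 200_000                      # large sample so sampling noise is small
beta0, beta1 = 1.0, 2.0          # assumed true coefficients

x = rng.normal(0.0, 1.0, n)      # true regressor X, variance 1
u = rng.normal(0.0, 1.0, n)      # regression error u
y = beta0 + beta1 * x + u        # true dependent variable Y

eps = rng.normal(0.0, 1.0, n)    # measurement error in Y, independent of X
eta = rng.normal(0.0, 1.0, n)    # measurement error in X, independent of X and u

slope_clean = ols_slope(x, y)          # no measurement error
slope_noisy_y = ols_slope(x, y + eps)  # noisy dependent variable
slope_noisy_x = ols_slope(x + eta, y)  # noisy regressor

print(f"no measurement error: {slope_clean:.3f}")
print(f"error in Y only:      {slope_noisy_y:.3f}")
print(f"error in X only:      {slope_noisy_x:.3f}")
```

Under these assumptions the first two slopes should both land close to the assumed $\beta_1 = 2$, while the slope from the noisy regressor should be pulled toward zero. Because the true regressor and its measurement error were given equal variances here, the attenuated slope should be roughly half of $\beta_1$.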

- If $X$ and $\eta$ are independent and normal, where $\tilde{X} = X + \eta$ is the mismeasured regressor, then
  $$\operatorname{plim} \hat{\beta}_1 = \beta_1 \frac{\sigma_X^2}{\sigma_X^2 + \sigma_\eta^2}.$$
  - The estimator is biased toward zero.
  - If most of the variation in $\tilde{X}$ comes from $X$, then the bias will be small.
  - As the variance of the measurement error grows in relation to the variation in the true variable, the magnitude of the bias increases.
  - As a worst-case limit, if the true $X$ does not vary across our sample of observations and all of the variation in our measure $\tilde{X}$ is random noise, then the expected value of our coefficient is zero.
- The best solution is getting a better measure.
- Alternatives are instrumental variables or direct measurement of the degree of measurement error.
  - For example, if an alternative, precise measure is available for some arguably random sub-sample of observations, then we can calculate the variance of the true variable and the variance of the measurement error and correct the estimate.

Sample selection bias
- Few samples are truly random draws from the full population. Instead, they are draws (random or not) from some sub-population:
  - Many homeless are uncounted in the census.
  - There are no wage data on those who do not work.
  - Polls miss people with unlisted phone numbers.
  - Cross-country regressions are often limited to the countries for which good data are available (which is not a random sample of countries).
- If sample selection is related to $X$, then we have issues of external validity (do the estimates apply to the missed sub-population?) but not internal validity.
  - Results may be valid for the sub-population for which they are estimated.
- If sample selection is related to $Y$ (or, specifically, to $u$), then we are not drawing randomly from the population distribution of the error term (as we assume) and our results will be biased.
- There are methods of coping with sample-selection bias.
  - For example, imputing values for missing wage data to allow inclusion of the full sample.

Simultaneity bias (reverse or bidirectional causality)
- If changes in $Y$ (presumably due to changes in $u$) cause $X$ to change, then $X$ and $u$ will be correlated and OLS estimates will be biased and inconsistent.
- For example, for many years macroeconomists estimated Keynesian consumption functions by OLS:
  $$C_t = \beta_0 + \beta_1 GDP_t + u_t.$$

  - (There are time-series problems with this regression that we will study later.)
  - For now, note that if aggregate demand affects output, then GDP in each year is $C + I + G + NX$, so a positive shock to consumption (a positive $u$) increases GDP.
  - Because the regressor (GDP) is correlated with the error term, OLS estimates of $\beta_1$ were biased and inconsistent. (But they looked good and had ridiculously high $R^2$ values, so they persisted for many years despite the protests of econometricians.)
- The usual correction is to use an instrumental-variables (two-stage least squares) estimator.

Heteroskedasticity and autocorrelation
- Recall that heteroskedasticity causes OLS to be inefficient (relative to WLS), but it is still unbiased and consistent.
  - The classical standard errors will be biased under heteroskedasticity, but we can use White's robust covariance matrix estimator, which we've talked about earlier.
  - Using robust standard errors is the most common correction for heteroskedasticity.
- If error terms of different observations are correlated, then OLS is also inefficient (relative to a corrected GLS estimator), but it is unbiased and consistent.
  - Autocorrelation can be spatial: unmeasured neighborhood characteristics (omitted variables) cause houses that are close together to be more or less valuable.
  - Autocorrelation is ubiquitous in time-series data: this period's error term is nearly always related to last period's. (Unmeasured omitted variables are themselves correlated over time.)
  - Again, standard errors are biased, but White's heteroskedasticity-consistent standard errors do not help here.
  - There are estimated standard errors that are robust to autocorrelation. (Use the hac option in Stata.)
  - Alternatively, one can try to model the autocorrelation and transform the model into one that has no autocorrelation (GLS). Examples include AR(1) models in time series and modeling spatially correlated errors in cross-section models.

Validity in forecasting/prediction
- Regression models may be valid for forecasting even if their coefficients are not unbiased or consistent.
- Suppose that we know that $X$ is measured with error.

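  - We can still use a regression of $Y$ on the noisy measure $\tilde{X}$ to predict the outcome for a particular measured $\tilde{X}$, even though the estimated coefficient is a biased estimator for the effect of the true $X$.
  - That is because we have correctly estimated the relationship between the noisy $\tilde{X}$ and $Y$.
  - We would not get reliable estimates if our prediction question relied on the true $X$ rather than the noisy $\tilde{X}$.
- We often build models with noisy data or proxy variables to get predictions of another variable.
- The biggest question in forecasting is external validity: does the model that applies to the sample you used for estimation also apply to the observation for which you want a forecast?

Measuring prediction error: what is the variance of $Y - \hat{Y}$?
- The forecast, the true outcome, and the forecast error at a given $X$ are
  $$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$$
  $$Y = \beta_0 + \beta_1 X + u$$
  $$Y - \hat{Y} = (\beta_0 - \hat{\beta}_0) + (\beta_1 - \hat{\beta}_1) X + u$$
- Because the forecast error has mean zero and the new observation's error $u$ is uncorrelated with the estimators,
  $$\operatorname{var}(Y - \hat{Y}) = E(Y - \hat{Y})^2 = \operatorname{var}(\hat{\beta}_0) + X^2 \operatorname{var}(\hat{\beta}_1) + 2X \operatorname{cov}(\hat{\beta}_0, \hat{\beta}_1) + \operatorname{var}(u).$$
- For simple regression under homoskedasticity, with $\operatorname{var}(u) = \sigma^2$,
  $$\operatorname{var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad \operatorname{var}(\hat{\beta}_0) = \frac{\sigma^2 \sum_{i=1}^{n} X_i^2}{n \sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad \operatorname{cov}(\hat{\beta}_0, \hat{\beta}_1) = \frac{-\sigma^2 \bar{X}}{\sum_{i=1}^{n} (X_i - \bar{X})^2}.$$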

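- So, substituting the variances and covariance of $\hat{\beta}_0$ and $\hat{\beta}_1$ and using $\sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 + n\bar{X}^2$,
  $$\begin{aligned}
  \operatorname{var}(Y - \hat{Y}) &= \sigma^2 \left[ 1 + \frac{\sum_{i=1}^{n} X_i^2}{n \sum_{i=1}^{n} (X_i - \bar{X})^2} + \frac{X^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} - \frac{2 X \bar{X}}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right] \\
  &= \sigma^2 \left[ 1 + \frac{1}{n} + \frac{\bar{X}^2 + X^2 - 2 X \bar{X}}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right] \\
  &= \sigma^2 \left[ 1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right].
  \end{aligned}$$
- Prediction error is smaller for:
  - Smaller error variance $\sigma^2$
  - Larger sample size $n$ (through both the second and third terms)
  - Greater sample variation in $X$
  - Observations ($X$) closer to the mean $\bar{X}$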
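The closed-form result above can be checked numerically. The sketch below is a minimal illustration, not part of the original notes: the function names, the five sample values, and the error variance are all assumptions made for the example, and $\sigma^2$ is treated as known (in practice it would be replaced by an estimate such as the squared SER).

```python
import numpy as np


def prediction_error_variance(x_sample, x_new, sigma2):
    """var(Y - Yhat) at x_new: sigma^2 * (1 + 1/n + (x_new - xbar)^2 / Sxx)."""
    x_sample = np.asarray(x_sample, dtype=float)
    n = x_sample.size
    x_bar = x_sample.mean()
    s_xx = np.sum((x_sample - x_bar) ** 2)
    return sigma2 * (1.0 + 1.0 / n + (x_new - x_bar) ** 2 / s_xx)


def prediction_error_variance_long(x_sample, x_new, sigma2):
    """Same quantity assembled from var(b0), var(b1), cov(b0, b1), and var(u)."""
    x_sample = np.asarray(x_sample, dtype=float)
    n = x_sample.size
    x_bar = x_sample.mean()
    s_xx = np.sum((x_sample - x_bar) ** 2)
    var_b1 = sigma2 / s_xx
    var_b0 = sigma2 * np.sum(x_sample ** 2) / (n * s_xx)
    cov_b0_b1 = -sigma2 * x_bar / s_xx
    return var_b0 + x_new ** 2 * var_b1 + 2.0 * x_new * cov_b0_b1 + sigma2


# Hypothetical sample of regressor values and an assumed error variance.
x_sample = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sigma2 = 4.0

at_mean = prediction_error_variance(x_sample, 3.0, sigma2)     # X at the sample mean
far_away = prediction_error_variance(x_sample, 10.0, sigma2)   # X far from the mean
far_away_long = prediction_error_variance_long(x_sample, 10.0, sigma2)

print(at_mean, far_away, far_away_long)
```

The two functions should agree, which mirrors the algebra in the derivation, and the variance at the sample mean should be the smaller of the two, in line with the last item in the list above.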