Section 15 Advanced Topics

Arron Fox

Specification searches

Experiments vs. non-experiments

If we can do random controlled experiments, then we don't need to worry about omitted-variables bias, because the regressor of interest (treatment effect) is random and uncorrelated with everything that might be omitted.

Controlled experiments are becoming more common in economics:
- Development projects may choose villages as treatment or control villages.
- Policies can sometimes be separated randomly into treatment or control groups.
- Is it ethical to withhold treatment if we know that it is likely to be beneficial?
- Of course, experimental economics has long put experimental subjects into controlled settings randomly.

Most often, we must use the fallen fruit of natural experiments or observational data. Examples:
- State policy differences, such as the seat-belt law regressions we looked at earlier in the semester.
- Cross-country growth regressions in which countries differ in variables, such as initial per-capita income, that are supposed to affect growth.

In these cases, we must worry about selection and omitted-variable bias. Can we control for the other variables that are correlated with selection into the treatment group (or with the regressor of interest)? If not, our results are biased.

Idealized econometric project
- Theory tells us exactly which variables should be in the regression as controls.
- All regressors are measured accurately.
- We know about any endogeneity issues and can deal with them using instrumental variables.
- We know the appropriate structure of the error term.
- In this case, we need only do one regression to complete the project.

None of these conditions is ever fully realized.
- That's why we have tests for the various regression pathologies.

- That's (one reason) why we have tests for significance of regressors.
- That's why we look at our residuals for clues.
- That's why we usually try linear and log-based models.
- That's why we have to experiment with different lag lengths.

In real research, we must deal with what Leamer calls misspecification error, which, like sampling error, generally causes our results to be imprecise.

Consider the regressions that you ran with the 254,654 Census observations on further fertility of mothers with two children. How much sampling error is there when N = 254,654? If all of our assumptions are correct, then our estimates converge with the square root of N, so the standard errors with this sample are divided by about 500! What are the likely relative magnitudes of sampling error and misspecification error in this exercise?

Null hypotheses and maintained hypotheses

In any statistical test, we make lots of assumptions.
- Some of the assumptions are givens, such as functional form, structure of the error term, IID (or other assumed nature) of the sample, etc. These are the maintained hypotheses that are assumed to be true in the test. We usually assume, as a maintained hypothesis, that we have made no misspecification errors.
- Some of the assumptions are tested. These are the null hypothesis. We are not sure that these are true; in fact, we usually expect to disprove the null hypothesis.

What does a hypothesis test do? It measures the likelihood that such an extreme violation would occur if both the null hypothesis and the maintained hypotheses are true. However, we interpret evidence against this joint set of assumptions as invalidating the null hypothesis, not the maintained hypotheses. In fact, what we have found is evidence that the world is not as the null and maintained hypotheses assume it is.
- This could be due to the null hypothesis being false with the maintained hypotheses true (which is what we always assume).
- Or it could be that the maintained hypothesis (or one part of it) is false and the null hypothesis is true (which is the essence of an invalid test: we have made incorrect assumptions underlying the test).
- Or both could be false (an invalid test that gives the right answer).

By separating the assumptions into null and maintained classes, we artificially define which ones we are going to blame for any failure of the data to conform to the collective set of assumptions. If we do this wrong, then we obviously can draw incorrect conclusions.

Leamer on functional form

With a high-enough order polynomial, we can exactly fit the data! So what is the right function? A, B, or C? We can only answer that question by applying our judgment. All econometric analysis is a combination of calculation and interpretation: this is unavoidable! More so in economics than in hard sciences?

Leamer, pages 36 and 37:


Leamer is a proponent of Bayesian econometrics, which we may study next week if there is interest.
- In a Bayesian model, one specifies a prior distribution for the parameter before beginning the analysis.
- Then the evidence from the data is combined with the prior to calculate a posterior distribution.
- The approach is criticized because you can get nearly any posterior distribution by varying your prior. Shouldn't your results reflect the evidence from the data and not your opinions?
- Leamer's point exactly! Conventionally reported results reflect your opinion as much as a Bayesian posterior does, but you haven't reported how your opinion conditioned your results.
- How do we solve this problem? Page 38:

Angrist & Pischke and Leamer's response

Credibility revolution in econometrics?
- Better and more data
- Fewer distractions: functional form and GLS methods often don't matter
- Better research design
  - Quasi-experimental methods: IV, D-in-D, RD
  - Randomized trials: STAR and development studies
- More transparent discussion of research design

Extreme bounds analysis
- Has not caught on much


- Leamer's pet: test all possible specifications and look at the ones that are least favorable to your preferred hypothesis.

Leamer: Tantalus on the road to Asymptopia
- Econometricians are still too optimistic about how much they know.
- Robust standard errors are not a panacea because we still have inefficient estimators.
- Sensitivity analysis is crucial: show the mapping from assumptions to conclusions!
- Experiments may be problematic in small samples if we don't observe and control for all the confounding variables.
- Interactive confounders are variables that affect the effect of our variable of interest on the dependent variable (needing interaction terms). Leaving these out is problematic even in truly randomized experiments.
- Do data-generating processes really exist? Are they stable? Are people rational enough to yield predictable econometric relationships?

Modern computers and software have made the actual computations of econometrics trivially easy, but the thinking part is just as hard as ever.

This is not going to change! Your generation of econometricians will face ever greater temptation to push the button and get results without thinking about the correct underlying assumptions, then publish them if they look nice.

Lovell's data mining experiment

Stepwise regression
- How does it work?
- Formally available in Stata
- Informally practiced by many econometricians

Lovell's Monte Carlo experiment
- Create a large set of orthogonal regressors that are unrelated to the dependent variable.
- Regress the dependent variable on c of these and choose the best 2 t statistics.
- Table 1 shows results.
- How often would we expect k independent tests to turn up no significant t statistic at the $\alpha$ level? $(1 - \alpha)^k$.
- Lovell's rule of thumb: choosing the best k out of c candidate regressors gives a true significance level of $1 - (1 - \hat{\alpha})^{c/k}$, where $\hat{\alpha}$ is the putative significance level.
- The second experiment uses macroeconomic regressors that are correlated (and nonstationary) and shows that it is almost always possible to find regressors with significant t statistics even when the dependent variable is orthogonal. (There is some spurious-regression effect here also, because his variables are nonstationary.)

Publication bias
- If you look at econometric papers published in journals, most null hypotheses are rejected.
- The papers published that accept the central null hypothesis tend to fail to reject hypotheses that are widely believed to be false.
- Are all economic hypotheses false?

De Long and Lang build a simple model to test:
- Size of test $= \alpha = 0.05 = \Pr[\text{reject true } H_0]$.

- Power of test $= q = 1 - \Pr[\text{accept false } H_0]$.
- Suppose that the true proportion of true null hypotheses is $\pi$.

                Fail to reject                  Reject                      Total
  $H_0$ true    $0.95\,\pi$                     $0.05\,\pi$                 $\pi$
  $H_0$ false   $(1-q)(1-\pi)$                  $q\,(1-\pi)$                $1-\pi$
  Total         $1 - q + (q - 0.05)\,\pi$       $q - (q - 0.05)\,\pi$       $1$

Let a be a test statistic and let f(a) be its marginal significance level (p value).
- Under the null hypothesis, $f(a) \sim U(0,1)$, so $\Pr[f(a) \ge p] = 1 - p$.
- Under the alternative hypothesis, f(a) follows some unknown distribution G, so that $\Pr[f(a) \le p] = G(p)$.
- The share of test statistics that have a p value greater than or equal to p (= share of nulls not rejected at significance level p) should be
  $\Pr[f(a) \ge p] = \pi(1-p) + (1-\pi)\,[1 - G(p)]$.
- Solving for $\pi$ and using $1 - G(p) \ge 0$:
  $\pi = \dfrac{\Pr[f(a) \ge p] - (1-\pi)\,[1 - G(p)]}{1-p} \le \dfrac{\Pr[f(a) \ge p]}{1-p}$.

This gives an upper bound for $\pi$. For example, if $\pi = 1/2$, then $\Pr[f(a) \ge 0.8] \ge 0.5 \times 0.2 = 0.1$: at least 10% of actual p values should be in the range (0.80, 1.00).
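The bound above can be turned into a small calculation. The following is a minimal sketch, not De Long and Lang's own procedure: the function name `max_share_true_nulls` and the sample of p values are hypothetical, chosen only to illustrate the inequality $\pi \le \Pr[f(a) \ge p]/(1-p)$.

```python
def max_share_true_nulls(p_values, p):
    """Upper bound on the share of true nulls implied by the p values at or above p.

    Uses the inequality pi <= Pr[f(a) >= p] / (1 - p); the result is capped at 1.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    if not p_values:
        raise ValueError("need at least one p value")
    share_at_or_above = sum(1 for v in p_values if v >= p) / len(p_values)
    return min(1.0, share_at_or_above / (1.0 - p))


# Hypothetical literature: only 1 of 20 reported p values lies at or above 0.8.
sample = [0.01] * 19 + [0.9]
bound = max_share_true_nulls(sample, 0.8)  # (1/20) / 0.2, about 0.25
```

With so few large p values in this made-up sample, at most about a quarter of the tested nulls could be true, which is the logic behind the argument in the text.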


These are point estimates. They can reject the null hypothesis that $\pi = 1/3$ against the alternative that it is $< 1/3$. Thus, they are quite confident that, from the evidence of the literature, $\pi < 1/3$. Why? They think probably publication bias.

Simulation, Monte Carlo, and bootstrap methods

In a few special cases, with appropriate assumptions, we know the actual distributions of the test statistics that we use. In some additional cases, we can approximate the asymptotic distributions of these statistics. However, most of the time these assumptions are probably dubious.

How much of a problem is this? Are the actual expected values and (especially) standard errors of the distributions that different? We can use simulation to determine the properties of our standard test statistics under the null hypothesis when the assumptions we usually make fail to hold.

Simulation methods
- Monte Carlo analysis is the simulation of the behavior of estimators under controlled conditions that may deviate from the standard assumptions under which they are used.
- Bootstrap methods apply simulation to a specific sample of data, re-running a regression many times with either parametric or non-parametric error terms to estimate the standard deviation of the test statistic under $H_0$ (rather than using the conventional standard error as an estimate).

Generating data for simulations
- Can use actual variables (as Lovell did in his second data-mining experiment with macro variables) or can generate them randomly.
- Error terms are always generated randomly.

Random-number generators
- No computer-generated sequence of numbers is truly random.
- The way these generators work is to begin with a seed, then generate new numbers in the sequence based on calculations such as remainders of division by large prime numbers.
- The same seed implies the same sequence of numbers, so if you want to control the process (especially during debugging) you can get the same sequence again.
- The default seed is usually taken from the seconds of the computer clock or something like that: it will not be the same on repeated execution.
- In Stata: runiform() (or uniform()) draws a random number from (0, 1). rnormal(mean, std) draws from the normal distribution with the given mean and standard deviation. Can also generate a normal variate as invnorm(runiform()).

We generate a random set of e* and use them to compute y* under the null hypothesis about $\beta$, given the values of x, which may be set to sample values, generated randomly, or something else.

Implementing Monte Carlo
- Create repeated samples (how many? 1000? 10000?) of e* and y*.
- For each sample, calculate the test statistic of interest: $\hat{\beta}$, $se(\hat{\beta})$, t, or anything else.
- Accumulate the estimates in a new data set.
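The loop just described (draw e*, build y* under the null, compute the statistic, accumulate) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the course's Stata implementation: the null is $\beta = 0$ with a zero intercept, errors are standard normal, x is a fixed trend, and the names `ols_slope` and `monte_carlo` are hypothetical. The 2.01 cutoff is the approximate two-tailed 5% critical value of a t distribution with 48 degrees of freedom.

```python
import math
import random


def ols_slope(x, y):
    """Return (b, se_b, t) for the slope of a simple regression of y on x with an intercept."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(ssr / (n - 2) / sxx)
    return b, se, b / se


def monte_carlo(reps=2000, n=50, seed=12345):
    """Simulate the slope estimate under the null beta = 0 with fixed x values."""
    rng = random.Random(seed)           # fixed seed, so the run is reproducible
    x = [float(i) for i in range(n)]    # x held fixed across replications
    results = []
    for _ in range(reps):
        e_star = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y_star = e_star                 # under the null, y* = 0 + 0 * x + e*
        results.append(ols_slope(x, y_star))
    return results


results = monte_carlo()
mean_b = sum(r[0] for r in results) / len(results)
reject_rate = sum(1 for r in results if abs(r[2]) > 2.01) / len(results)
```

Because the classical assumptions hold in this setup, `mean_b` should be close to zero and `reject_rate` close to 0.05; replacing the error generator with a non-standard one (for example, a random walk) shows how far the nominal size can drift.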

- Examine the properties of the estimates:
  - Mean, to assess bias
  - Standard deviation, to compare to the estimated standard error
  - Quantiles, to assess critical values or estimate p values for your estimates

Bootstrap standard errors

If the assumptions of OLS are not valid for your sample, you can estimate the standard errors of your OLS estimates by using a bootstrap technique.
- Use your actual x variables, sample size, etc.
- Generate a sample of e* error terms.
  - Can use a normal distribution based on the SEE as the estimate of the standard deviation.
  - Can use re-sampling, assigning random $\hat{u}_i$ values to observation j.
- Calculate the sample of y* values.
- Run the regression of y* on the actual x values.
- Save the estimated coefficients $\hat{\beta}_k$ for the kth replication.
- Repeat K times with different randomly generated error terms.
- Examine the distribution of $\hat{\beta}$ by calculating the standard deviation: this is the bootstrap standard error.
- Look at the 2.5th and 97.5th percentiles of the distribution: these are the critical values for a two-tailed 5% hypothesis test.

Monte Carlo demonstration of Granger-Newbold spurious regression result

Do-file spurious.do:

    program spurious
        drop _all
        set obs 100
        g id=_n
        tsset id
        * Generate y variable as random walk
        g e=rnormal(0,1)
        g y=e if id==1
        replace y=l.y+e if id>1
        * Generate x variable as independent random walk
        g a=rnormal(0,1)
        g x=a if id==1
        replace x=l.x+a if id>1
        * Run regression of y on x
        reg y x
    end

- Show a single replication.
- What can be retrieved? ereturn list shows the available results.
- Command to invoke the simulation:

    simulate b=_b[x] se=_se[x] r2=e(r2), reps(1000) : spurious

- Creates a data set with 1000 observations with variables b, se, r2.
- Can now use summarize, centile, and histogram to look at the behavior of the estimates.
- Contrast dspurious with spurious to see the effect of regression on integrated variables.

Methods for coping with missing data

Missing data problems are very common in econometrics.
- In surveys, some people omit questions or have undecipherable responses.
- In longitudinal surveys, attrition usually occurs.
- In databases compiled for other purposes, they often don't care if some variables are missing: the Reed database is missing class ranks, high-school grades, etc.
- In macro data, sometimes there is a change in how the series is defined.
  - Not exactly missing data, but how do you splice the two series?
  - The best way to splice is to regress overlapping observations and use fitted values for the shorter series.
  - If there are not enough overlapping observations to run a regression, then can use a pivot observation to join the series.
  - Example: price index changing base years. If the value in 2002 is 125 in 1990 dollars and 95 in 2005 dollars, then you can multiply all of the 1990-dollar observations by 95/125 to convert to 2005 dollars.

Key question that informs the missing-data problem: Why are the data missing?
- Missing completely at random (MCAR): Probability that the observation/variable is missing is unrelated to any variable in the analysis.
- Missing at random (MAR): Probability that the observation/variable is missing is unrelated to the missing variable, but may be related to other, observed variables.

- Not missing at random (NMAR): Probability that the observation/variable is missing depends on the true value of that variable.

Methods of dealing with missing data

Complete-case analysis
- This is the default: Stata will simply delete any observations for which one or more variables in the model are missing.
- We lose information by doing this.
- Example: Suppose that we are missing one observation on x out of ten and that the other nine observations give a fitted line $y = \hat{\beta}_0 + \hat{\beta}_1 x$. The missing observation has a y value of 25. By omitting this observation, we are implicitly assuming that its x value lies exactly on the fitted line (x = 2 in this example), so that it would have no residual and would not add to the regression. If the univariate distribution of x in the rest of the sample is such that a value of 2 seems highly unlikely, then we are almost surely missing important information about the relationship by ignoring this observation.
- Complete-case analysis does not lead to bias if missingness does not depend on y. (This is the standard sample-selection problem that we have dealt with before.)

Available-case analysis
- Regression coefficients and standard errors depend only on the sample variances and covariances of the variables.
- Even if y is missing for an observation, if $x_1$ and $x_2$ are available, we can use that observation to contribute to the estimate of the variances of the x variables and to their covariance.
- This seems to use additional information, but it has other problems and is rarely used.
- Because it uses different groups of observations, there is no guarantee that $X'X$ has an inverse, so it may even be impossible to calculate the OLS estimate.

Dummy-variable methods
- $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$, where $X_1$ is complete and $X_2$ has some missing data.
- Let $M_i = 1$ if $X_{2i}$ is missing and $M_i = 0$ otherwise.
- Let $X_{2i}^0 = X_{2i}$ if $M_i = 0$ and $X_{2i}^0 = 0$ if $M_i = 1$.
- $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}^0 + \delta_0 M_i + u_i$ is biased for $\beta_1$: $\beta_1$ picks up the effect of unobserved variation in $X_2$.

- $Y_i = \beta_0 + \delta_0 M_i + \beta_1 X_{1i} + \delta_1 X_{1i} M_i + \beta_2 X_{2i}^0 + u_i$ is unbiased, but is difficult to implement unless the pattern of missingness is block-style.

Imputation methods

If there is an irregular pattern in which several variables have missing observations scattered through the sample (and the same observations do not tend to be missing for all variables), then we have some information about the observations for which a particular variable is missing, based on the observed values of other variables. Imputation methods use the values of the other variables (and the pattern of covariance between the observed and missing variables for the part of the sample for which both are observed) to impute estimates of the missing values.

Unconditional imputation replaces missing values by the means of the variables.
- This leads to bias in the coefficients because the other variables that are correlated with the missing one have to carry extra weight in predicting y for those observations in which the missing X is set to its mean.

Conditional imputation based on other X variables
- Use complete cases to estimate $X_{2i} = \gamma_0 + \gamma_1 X_{1i} + v_i$. Could use an LDV model if appropriate.
- Calculate single imputed values for missing observations as $\hat{X}_{2i} = \hat{\gamma}_0 + \hat{\gamma}_1 X_{1i}$.
- Use the full sample to estimate $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 \hat{X}_{2i} + u_i$.
- This procedure is consistent if data are MCAR.
- Standard errors are problematic because we don't take account of the imputed nature of the data and the error in measurement that results.

Conditional imputation based on other X variables and y
- Can include y in the imputation regression.
- Improves the quality of imputation if the missing X is highly correlated with y.
- Leads to bias in OLS regressions of the filled-in model.

Multiple imputation by chained equations (MICE)
- Instead of replacing the missing observation with a single conditional expectation based on the imputation regression, we construct multiple samples with stochastic imputations: the expected value of the missing X plus a random draw from the error term of the imputation regression, which includes both the X variables and y.

- Use complete cases to estimate $X_{2i} = \gamma_0 + \gamma_1 X_{1i} + \gamma_2 Y_i + v_i$. Note that we can (and must) include y here when we are using random draws from the distribution rather than expected values. Can use LDV methods if the missing variable is a dummy, ordered, censored, etc.
- Calculate m random imputed samples using $\hat{X}_{2ij} = \hat{\gamma}_0 + \hat{\gamma}_1 X_{1i} + \hat{\gamma}_2 Y_i + v_{ij}$, where $v_{ij}$ is a random draw from the estimated distribution of v, usually normal with zero mean and variance equal to the estimated variance of v based on the residuals.
- For each sample j, run the regression using imputed values, $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 \hat{X}_{2ij} + u_i$, and get the estimates $\hat{\beta}_{ij}$ and the squared standard errors $\widehat{\mathrm{var}}(\hat{\beta}_{ij})$.
- Combine the results of the m regressions as follows:

  $\hat{\beta}_i = \dfrac{1}{m} \sum_{j=1}^{m} \hat{\beta}_{ij}$

  $\widehat{\mathrm{var}}(\hat{\beta}_i) = \dfrac{1}{m} \sum_{j=1}^{m} \widehat{\mathrm{var}}(\hat{\beta}_{ij}) + \dfrac{1}{m-1} \sum_{j=1}^{m} (\hat{\beta}_{ij} - \hat{\beta}_i)^2$

- The parameter estimate is just the mean of the estimates for the m imputed samples.
- The variance is the mean of the estimated variances in the m samples, plus the estimated variance of the parameter estimate across the samples. This last term corrects the standard error for the imputation process, adding variance to account for the fact that the m imputations do not all lead to the same answer. Because a highly uncertain imputation process is likely to lead to wide variation in $\hat{\beta}_{ij}$ across samples, this correction to the variance will be high when the imputation process is imprecise.
- Stata 11+ has an implementation of MI models with a dashboard to control the imputation regressions (which can be OLS, probit, tobit, ordered probit, etc.) and the combined regression using the multiple imputations.
- MICE works with MCAR or MAR data.

Can also use ML methods to estimate missing-data models (not going to talk about).
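The combining step for the m imputed-sample regressions is simple arithmetic, sketched below for a single coefficient. This is a minimal illustration with made-up numbers; the function name `combine_imputations` and the `rubin_factor` switch are hypothetical. With the switch off, the code follows the formula as written in these notes; with it on, the between-imputation variance is multiplied by (1 + 1/m), which is the form of Rubin's combining rules usually implemented in software.

```python
import statistics


def combine_imputations(estimates, variances, rubin_factor=False):
    """Combine one coefficient's estimates and squared standard errors across m imputations.

    Returns (combined estimate, combined variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need matching lists from at least two imputations")
    beta = statistics.fmean(estimates)           # mean of the m estimates
    within = statistics.fmean(variances)         # mean of the m estimated variances
    between = statistics.variance(estimates)     # sample variance, divisor m - 1
    scale = 1.0 + 1.0 / m if rubin_factor else 1.0
    return beta, within + scale * between


# Hypothetical results from m = 3 imputed samples
est = [1.0, 2.0, 3.0]
var = [0.5, 0.5, 0.5]
beta, total_var = combine_imputations(est, var)                   # 2.0 and 0.5 + 1.0
_, total_var_rubin = combine_imputations(est, var, rubin_factor=True)
```

In this example the between-imputation term (1.0) is twice the within-imputation term (0.5), so ignoring the imputation uncertainty would understate the variance by a factor of three.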

Models with varying parameters

We have talked a lot about Assumption #0: The same model applies to all observations. What if this is false and the model changes from one set of observations (either over time or cross-sectionally) to another? We can model this by allowing some parameters of the model to vary across observations.

We have considerable experience with simple, deterministic forms of varying parameters:
- Dummy variables allow the constant term to differ for the set of observations for which the dummy is turned on.
- Interaction terms allow the effect of one variable to depend on the magnitude of another (where one or both may be dummies).
- Splitting samples at recognized breakpoints is another strategy.

We now consider models in which the variation in the parameters is at least partially random, especially over time.

Stationary random parameter models
- $Y_t = \beta_t X_t + u_t$, $\beta_t = \beta + \gamma Z_t + v_t$.
- Substituting yields $Y_t = \beta X_t + \gamma X_t Z_t + w_t$, with $w_t = u_t + v_t X_t$.
- Our usual assumptions are that u and v are classical error terms that are uncorrelated with one another. In that case, $\mathrm{var}(w_t) = \sigma_u^2 + X_t^2 \sigma_v^2$, and w is not serially correlated unless u or v is.
- This model is heteroskedastic with a variance proportional to $1 + \lambda X_t^2$, where $\lambda = \sigma_v^2 / \sigma_u^2$.

How to estimate?
- Could use OLS with robust standard errors (did not exist when Maddala wrote his book).
- Maddala suggests ML with

  $\ln L = K - \dfrac{n}{2} \ln \sigma_u^2 - \dfrac{1}{2} \sum_{t=1}^{n} \ln(1 + \lambda X_t^2) - \dfrac{1}{2\sigma_u^2} \sum_{t=1}^{n} \dfrac{(Y_t - \beta X_t - \gamma X_t Z_t)^2}{1 + \lambda X_t^2}$,

  with K an irrelevant constant.
- Can do this with a two-step procedure. For given $\lambda$, the $\beta$ and $\gamma$ that maximize L are the WLS estimators calculated by applying OLS to

  $\dfrac{Y_t}{\sqrt{1 + \lambda X_t^2}} = \beta \dfrac{X_t}{\sqrt{1 + \lambda X_t^2}} + \gamma \dfrac{Z_t X_t}{\sqrt{1 + \lambda X_t^2}} + \dfrac{w_t}{\sqrt{1 + \lambda X_t^2}}$.

- Search over $\lambda$ to find the value that yields the highest L, with $\beta(\lambda)$ and $\gamma(\lambda)$ calculated by WLS/OLS.

Switching regressions: two (or more) regimes with different parameters
- We considered the simple case of this with the Quandt likelihood-ratio (QLR) test when we talked about nonstationarity due to breaks in S&W's Chapter 14.
  - The QLR test statistic is the maximum of the Chow-test F statistic considered over possible breakpoints within the middle 70% (or so) of the sample.
  - S&W's Table 14.6 gives the critical values for the QLR test statistic, which does not follow a standard parametric distribution.
- A more interesting case is where the model can switch back and forth depending on the values of other variables.
  - Example: is the economic response to oil-price increases different from the response to oil-price decreases? One set of parameters when the oil-price change $\Delta P_O$ is positive and a different set when it is negative.
  - This is a simple case because there are no unknown parameters in the switching rule.
- A more interesting case is where the switching rule involves unknown parameters.
  - Suppose that the parameters are in regime 1 ($Y_t = \alpha_1 + \beta_1 X_t + u_t$) when $\gamma_1 Z_1 + \dots + \gamma_k Z_k \le c$ and in regime 2 ($Y_t = \alpha_2 + \beta_2 X_t + u_t$) when $\gamma_1 Z_1 + \dots + \gamma_k Z_k > c$.
  - The error term may also differ between regimes.
  - Can estimate by ML, which is kind of like a regression (to determine the $\alpha$ and $\beta$ parameters) combined with a probit (to determine which regime governs each observation).
- Another model of interest is the single-breakpoint model constraining the function to be continuous over time.
  - Example: fitting a trend line to the log of a variable and allowing the trend growth rate to change at some date without allowing the function to jump at that date.
  - Let $n_0$ be the breakpoint in the sample, so that $Y_t = \alpha_1 + \beta_1 X_t + u_t$ for $1 \le t \le n_0$ and $Y_t = \alpha_2 + \beta_2 X_t + u_t$ for $n_0 < t \le N$.
  - Both regression lines must go through the same point at $n_0$, so we must impose the restriction $\alpha_1 + \beta_1 X_{n_0} = \alpha_2 + \beta_2 X_{n_0}$ on the estimation. This is a simple linear restriction that can be imposed in OLS by the usual means.

Adaptive regression: constant term is a random walk.
- This model was developed before the theory of integrated processes was well understood.

- The model that they propose has issues with an integrated error term (and dependent variable) that are better handled with differencing and (sometimes) cointegration methods.
- Can look at the more interesting model where the slope is a random walk as well.
- Cannot estimate all T parameters $\alpha_t$. Can estimate one of them: the suggestion is to estimate the last one (or one after last).
- For a varying constant term: $\alpha_t = \alpha_{t-1} + v_t$. Let

  $Y_t = \alpha_t + \beta X_t + u_t = \alpha_T + \beta X_t + u_t - \sum_{i=t+1}^{T} v_i$.

- When we write the model in terms of $\alpha_T$, observation T − 1 has additional variance relative to T because of the change in $\alpha$ from T − 1 to T. Observation T − 2 has yet more variance because its $\alpha$ is two changes away from $\alpha_T$.
- Thus, we end up with a WLS estimator that weights the most recent observations most heavily and earlier observations less.
- There will also be correlation between the composite error terms because of the accumulation of the parameter changes.
- This is an intuitively attractive idea for regressions that you are using for forecasting when you don't know if the parameters are stable over time.
  - The most recent observations are the most relevant for the forecast, so we weight them the most heavily.
  - Observations in the distant past are not totally irrelevant, but are less important, so we include them with lower weights.

Another class of models is panel-data models in which each cross-sectional unit has a different parameter value:
- If the varying parameter is the intercept term and the variation is deterministic, then this is the fixed-effects model.
- If the varying parameter is the intercept term and the variation is random, this is the random-effects model.
- If the varying parameter is a slope coefficient and the variation is deterministic, then this is a variation on fixed effects where the unit dummies interact with the variable whose coefficient is changing. We lose a lot of degrees of freedom in this model. In the limiting case of all coefficients varying deterministically across units, we are just doing separate time-series regressions for each unit.
- If the varying parameter is a slope coefficient and the variation is random, then we have a variant of the random-effects model in which the variance of the unit-specific error component for each unit depends on the values of x for that unit.
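Returning to the adaptive-regression model with a random-walk constant: the claim that older observations carry more variance, and that the composite errors are correlated, can be checked by writing out the covariance matrix of the composite error $u_t - \sum_{i=t+1}^{T} v_i$. The sketch below assumes that u and v are serially uncorrelated and uncorrelated with each other, with constant variances; the function name `adaptive_error_cov` and the numerical values are hypothetical.

```python
def adaptive_error_cov(T, sigma_u2, sigma_v2):
    """Covariance matrix of w_t = u_t - sum_{i=t+1}^{T} v_i for t = 1..T.

    Observations t and s share the shocks v_i with i > max(t, s), so
    cov(w_t, w_s) = sigma_v2 * (T - max(t, s)), plus sigma_u2 on the diagonal.
    """
    cov = []
    for t in range(1, T + 1):
        row = []
        for s in range(1, T + 1):
            shared_shocks = T - max(t, s)
            own_noise = sigma_u2 if t == s else 0.0
            row.append(sigma_v2 * shared_shocks + own_noise)
        cov.append(row)
    return cov


cov = adaptive_error_cov(T=4, sigma_u2=1.0, sigma_v2=0.5)
# Inverse-variance weights from the diagonal: largest for the most recent observation
weights = [1.0 / cov[t][t] for t in range(4)]
```

The diagonal-based weights only illustrate the ordering of the observations; because the off-diagonal terms are nonzero, an efficient estimator would use the whole matrix (GLS) rather than these weights alone.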

When to use varying-parameter models?
- Can almost always justify it.
- What do we really gain from modeling the variation in the coefficients rather than putting it in the error term?
- If the variation is systematic, then we have a better understanding of how the effect of x on y depends on Z. This is the essence of interaction terms, and we know that they can be very useful.
- If the variation is random, then we may not gain too much, although the adaptive regression model is appealing, and if there are large variations in x, then we might want to take it into account if the coefficient of x varies randomly.

Duration and hazard-rate models

We have encountered duration problems before: when we considered the censored distribution of unemployment spells in a sample where some are ongoing, for example. In these models, the focus was on what other variables determine the length (duration) of the spell.

The formal analysis of hazard (or survival) models focuses not only on the effects of other variables, but on modeling the probability that a spell will end as a function of its current length.
- Does it become more or less likely that something will happen when it has not happened for a long time?
- Earthquakes, divorce, end of a strike, success in job search, and survival after events are examples.

In hazard analysis, we think of a duration event as a sequence of opportunities to end, with a certain probability (hazard) of ending at each time that may depend on other variables and on the current duration of the event.
- Let T be the spell-length variable with density f(t). (A normal distribution is not appropriate because T must be non-negative.)
- $F(t) = \int_0^t f(s)\,ds = \Pr[T \le t]$ is the probability that the spell is no longer than t.
- $S(t) = 1 - F(t) = \Pr[T \ge t]$ is the survival function: the probability that a spell is at least length t.
- The hazard rate is defined as the probability that the spell ends now, conditional on the fact that it has lasted this long:

  $\lambda(t) = \lim_{\Delta \to 0} \dfrac{\Pr[t \le T \le t + \Delta \mid T \ge t]}{\Delta} = \lim_{\Delta \to 0} \dfrac{F(t + \Delta) - F(t)}{\Delta\, S(t)} = \dfrac{f(t)}{S(t)}$.

- Note the similarity to the inverse Mills ratio.

- $\lambda(t) = -\dfrac{d \ln S(t)}{dt}$, because $f(t) = -S'(t)$.
- The integrated hazard function is $\Lambda(t) = \int_0^t \lambda(s)\,ds = -\ln S(t)$, so $S(t) = e^{-\Lambda(t)}$.
- All of these functions can (obviously) be derived from one another, so f, F, S, $\lambda$, and $\Lambda$ are all equivalent ways to characterize the hazard behavior of the model as a function of current duration t.

Modeling the hazard rate:
- Constant hazard rate: $\lambda(t) = \lambda$, $\ln S(t) = k - \lambda t$, $S(t) = K e^{-\lambda t} = e^{-\lambda t}$ because $S(0) = 1$.
- With a constant hazard rate, $E(t) = 1/\lambda$, so the MLE of $\lambda$ is $1/\bar{t}$.
- Positive or negative duration dependence: Greene's T25.8 and F25.2 show several common choices for nonconstant functions.

- Weibull is a common one because, depending on the parameter p, it can be increasing or decreasing with t.

Estimation of survival models
- We estimate these models by ML:

  $\ln L = \sum_{\text{uncensored observations}} \ln f(t) + \sum_{\text{censored observations}} \ln S(t) = \sum_{\text{uncensored observations}} \ln \lambda(t) + \sum_{\text{all observations}} \ln S(t)$.

Including exogenous variables
- We usually want other variables to condition the survival/hazard functions.
- One common model: $\lambda_i = e^{-X_i \beta}$, replacing the constant $\lambda$ in the Weibull function (or exponential function).
- Note that x must be constant over the spell (such as personal characteristics) or the model becomes more complex. (You would need to know x through the entire spell in order to model different hazards at different moments during the spell.)

Nonparametric models
- What do we mean by nonparametric? No assumption of a specific functional form or probability distribution.

- In the case of hazard models, we use the analog of a frequency distribution: What share of spells that lasted two weeks ended in the third week? What share of spells that lasted three weeks ended in the fourth week? Etc.
- Plot these as a function of duration to get the empirical hazard function.
- Advantages: no distributional assumption; can model unusual shapes.
- Disadvantages: does not invoke smoothness assumptions that may be appropriate; difficult to model effects of other variables.

Quantile regression

We get so used to the basic idea of traditional regression analysis that we sometimes forget important details about what we are doing.
- Standard regression estimates the conditional mean of y as a function of x.
- What about other properties of the conditional distribution of y?
- We sometimes talk about the estimated conditional standard deviation (SEE), but rarely about any other attributes of the distribution.
- If y follows a normal distribution, then we can calculate the whole distribution from the mean and standard deviation.
- If y is not normal, then we generally don't know all the details of the distribution.

There may be much more useful information embodied in the conditional distribution than just the mean. Consider Figure 1 from Koenker & Hallock:
- Provides: quartiles, range, median, and arithmetic and geometric means of CEO compensation for each decile of firm size.
- What would regression give us? The equivalent of a line connecting the means (either arithmetic, or geometric if we used a log function).

This is an example of the kind of expanded view of the conditional distribution that we can get from quantile regression, which looks at how the quantiles of the distribution of the dependent variable depend on the regressor.

Moments and quantiles as minimization problems:
- The unconditional mean is the value of $\mu$ that minimizes $\sum_{i=1}^{n} (y_i - \mu)^2$.
- The unconditional median is the value of m that minimizes $\sum_{i=1}^{n} |y_i - m|$.
- The unconditional $\tau$th quantile is the value of $\xi$ that minimizes $\sum_{i=1}^{n} \rho_\tau(y_i - \xi)$, where $\rho_\tau(x) = \tau x$ if $x \ge 0$ and $\rho_\tau(x) = (\tau - 1)x$ if $x < 0$ is the tilted absolute value function.

Generalizing to the conditional regression situation:
- In standard parametric regression, we let $\mu$ depend on x: $\min_\beta \sum_{i=1}^{n} (y_i - \mu(x_i, \beta))^2$.
- In quantile regression, we let the $\tau$th quantile be a function of x: $\min_\beta \sum_{i=1}^{n} \rho_\tau(y_i - \xi(x_i, \beta))$, which for the linear case is $\min_\beta \sum_{i=1}^{n} \rho_\tau(y_i - x_i \beta)$.
- Because the $\rho_\tau$ function is non-differentiable, we can't use basic calculus methods to solve this, but we can use methods developed for linear programming models to find the minimum pretty efficiently.

Sample output of quantile regression model
- There will be a separate regression for each quantile that we are interested in.
- Figure 3 from Koenker & Hallock:


Food expenditure as a function of income
- OLS regression gives us the dashed line.
- The conditional median of the distribution of food expenditure as a linear function of income is the bold line.
- The other lines are the 0.05, 0.1, 0.25, 0.75, 0.9, and 0.95 quantiles of the distribution as linear functions of income.
- Under standard OLS regression, the distribution of food expenditure conditional on income would be assumed to be normal with mean given by the dashed line and constant variance given by $SEE^2$. (Regressing in log terms would allow the variance to be proportional to x.)

A multivariate example: Figure 4 shows baby birth weight as a function of the mother's variables:
- Note that each variable can have a distinct pattern of effects on different quantiles of the distribution.
- For example, boy babies tend to be larger, but especially at the top end of the distribution. That suggests that the difference is driven more by really big boys than by really little girls. Can't get that nuance out of an OLS regression.
- High-school graduation has an across-the-board effect on all parts of the distribution.

  Note effect of college graduates: Much less likely to have a very small baby (strong effect at low quantiles) but not much more likely to have a very large baby (little effect at upper quantiles)
In Stata: qreg dvar indvars, quantile(0.5) will do the 0.5 quantile.
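The minimization view of quantiles can be checked numerically. The following is a minimal Python sketch, not the Stata routine: it finds an unconditional quantile by searching the data points for the one with the smallest summed tilted-absolute-value loss. The function names and the small sample are assumptions made for illustration only.

```python
def check_loss(u, tau):
    """Tilted absolute value: tau*u for u >= 0, (tau - 1)*u for u < 0."""
    return tau * u if u >= 0 else (tau - 1.0) * u


def total_loss(ys, xi, tau):
    """Sum of check losses of the residuals y_i - xi."""
    return sum(check_loss(y - xi, tau) for y in ys)


def sample_quantile(ys, tau):
    """Return a data point that minimizes the summed check loss.

    The objective is piecewise linear and convex, so a minimum is always
    attained at one of the observations; searching the data points suffices.
    """
    return min(sorted(ys), key=lambda xi: total_loss(ys, xi, tau))


# Hypothetical sample with one large outlier (illustrative numbers only).
y = [2.0, 3.0, 5.0, 7.0, 11.0, 13.0, 100.0]

median = sample_quantile(y, 0.5)   # tau = 0.5 gives the median
upper = sample_quantile(y, 0.9)    # tau = 0.9 gives an upper quantile
mean = sum(y) / len(y)             # minimizer of the squared loss
print(median, upper, mean)
```

With tau = 0.5 the loss is half the sum of absolute deviations, so the minimizer is the median, which the outlier barely moves, whereas the mean is pulled toward it. Quantile regression replaces the single value xi with a function of x and solves the same kind of problem by linear programming rather than by search.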

Example: Reed GPA as dependent variable
  Reed qreg.dta
  Show reg uggpa irdr satm100 satv100 hsgpa female if humfresh
  Show qreg with quantile(0.5) for MAD regression estimator
  Ask: Which of these variables would you expect to impact the top end of the grade distribution more or less than the bottom?
  Show reedqregs.doc for results of various quantiles
  Show Reed qregs.xlsx for diagram (old version) of effects of irdr on quantiles of grades

Regression discontinuity models
Identification is always difficult: endogeneity is always a threat and instruments are rare
  Randomized experiments are probably the best way to avoid endogeneity, but are not always feasible
  Sometimes we can find natural experiments that allow us to effectively control for the things we cannot measure
  Example: we can't control people's genetic structure (yet) but we can examine identical twins
The case that is examined in van der Klaauw (2002): Effect of increased financial aid on probability of enrollment.
  Intuition says that more aid (or lower cost, in general) leads to a higher likelihood of enrollment, other things held constant
  If we could measure all of the factors that go into the enrollment decision, we could estimate this directly
    This would include the complete set of colleges to which the student was admitted and the amount of aid/cost at each
    It would also include all of the relevant characteristics of the student: objective (test scores, high-school grades), subjective (essays, interviews, recommendations), and preferential (student's preferences about location and characteristics of school, experience during campus visit, etc.)
  Obviously, we can never measure all of these, so they go into the error term. If they are correlated with the amount of aid, then the estimated effect of aid will be biased.
  Why would these be correlated with aid?
    Unmeasured factors would increase aid at College X, but probably also increase aid elsewhere
    Those who are offered high aid packages at X may be less likely to come unless we control for the unobservable factors that affect aid at X (in the equation) and aid elsewhere (in the error term), which lead to correlation between aid and the error
    Effect of aid is likely biased downward because of this
  Could we randomize? Would any school be willing to increase aid for a random selection of students to see which ones come? (Perhaps not)
The idea of RD is that we sometimes have arbitrary, discrete thresholds where people who are nearly identical but on opposite sides of the threshold are treated differently.
  This allows us to estimate a treatment effect by comparing the nearly identical people on the two sides of the line; we can think of these as natural experiments
  Examples:
    Cutoff birthdays for school attendance: September 2 babies are a year older when they start school than August 31 babies: does this affect their outcomes?
    Laws sometimes have arbitrary cutoff points: If unemployment insurance lasts 26 weeks, is the likelihood of an unemployed worker taking a job higher in the 27th week than the 26th week?
    van der Klaauw: College X has arbitrary thresholds for awarding discrete levels of aid: are students just above the threshold (who get more aid) more likely to attend than nearly identical students just below the threshold?
Classic RD design is illustrated by the paper's Figure 1:

  The treatment here depends sharply on the value of the fully observable selection variable $S$: people above $\bar{S}$ are in the treatment group and people below are in the control group.
  The gap at the selection value $\bar{S}$ is the effect of crossing the threshold, which could be an unbiased measure of the effect of the treatment variable.
  Econometrically, we estimate the two relationships on both sides (which may or may not have the same slope) and then estimate the treatment effect as
  $$\lim_{S \downarrow \bar{S}} E[y \mid S] - \lim_{S \uparrow \bar{S}} E[y \mid S]$$
  We can estimate this in the simplest case as van der Klaauw's equation (8):
  $$y_i = \beta + \alpha T_i + k(S_i) + \varepsilon_i,$$
  where $T$ is a treatment dummy and $k(S)$ is the general relationship between $y$ and $S$ ignoring the treatment (which could be a linear or nonlinear function, shown as linear here).
If the relationship between T and S is not sharp, then there may be some people close to $\bar{S}$ who are put into the wrong category. This is the fuzzy RD design illustrated by the selection criteria in Figure 2 of the paper:

  In this case, we have to use the predicted T rather than the actual T, and our identification of the treatment effect $\alpha$ becomes: [equation omitted]
Aid offers depend on the thresholds, but also on need for filers who applied for need-based aid:

  Clearly the thresholds are important determinants of the amount of aid offered, especially for non-filers, so this suggests that RD at the thresholds might be a useful way to identify the effect of aid on enrollment.
For filers, the estimated relationship between S and enrollment probability is shown in Figure 7:
  Note the general downward slope of the relationship between S and probability of enrollment: Why?
  Note the jumps at the threshold levels: these are the measured effects of the discrete change in aid associated with crossing a threshold: students on opposite sides of the line are very similar except that one group gets more aid than the other.
Result is a statistically strong and economically significant effect of aid on enrollment:
  The elasticity for filers is 0.86, which is larger than most estimates in the literature obtained by traditional means (though not larger than estimates that draw on additional data such as competing aid offers).
  The elasticity for non-filers is only 0.13, which is consistent with our expectation (and other evidence).
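The sharp RD regression in the simplest case (a treatment dummy plus a linear k(S)) can be sketched on simulated data. This is an illustrative Python sketch only, not van der Klaauw's procedure or data: the cutoff, the true jump, the slope, the noise level, and the helper functions are all assumptions chosen for the example.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Simulated data; every number below is an assumption for illustration.
random.seed(0)
CUTOFF = 50.0       # hypothetical threshold S-bar
TRUE_ALPHA = 2.0    # hypothetical treatment effect (the jump at the cutoff)

rows, y = [], []
for _ in range(2000):
    s = random.uniform(0.0, 100.0)       # selection variable S
    t = 1.0 if s >= CUTOFF else 0.0      # sharp design: T is determined by S
    noise = random.gauss(0.0, 0.5)
    y.append(1.0 + TRUE_ALPHA * t + 0.05 * s + noise)
    rows.append([1.0, t, s - CUTOFF])    # constant, T, linear k(S)

beta_hat, alpha_hat, slope_hat = ols(rows, y)
print(round(alpha_hat, 2), round(slope_hat, 3))
```

The coefficient on the treatment dummy estimates the gap at the cutoff because the linear term absorbs the smooth relationship between y and S. If k(S) were misspecified (for example, linear when the truth is curved), part of the curvature could be attributed to the jump, which is why the functional form of k(S) matters in practice.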

Section 16 Empirical Research Projects

Starting point: Question and data
  Starting point always must be "What question am I trying to answer?"
    For thesis: something you can be interested in for a whole year
    Something that can be answered
  Second consideration: What data are available to help me find the answer?
    Macro data
    Micro data from existing surveys
    Collecting your own data from surveys

Methods
  Once you have the question and the data, you can carefully consider what method you should use
  Nature of dependent variable: continuous, limited?
    Might need to consider LDV models
  What explanatory variables can you measure (and what is omitted)?
  Are there endogeneity concerns?
    If yes, are appropriate instruments available to allow IV estimation?
  Are there other concerns about the error term?
    Heteroskedasticity?
    Autocorrelation?
  Are your data time series, cross section, pooled, or panel?
    Appropriate models for each, including stationarity concerns
  What is the appropriate specification?
    Functional form
    Scaling and/or differencing to make the variables comparable

Estimation, diagnostic testing, re-estimation
  What did you learn from the first regression?
  Are there issues in the residuals or diagnostics based on the coefficients or residuals that suggest that your assumptions are incorrect?
  Look for outliers and consider why they do not fit (Errors in data)
  Can you test the underlying assumptions formally? Are they OK?

Writing the paper
  Introduction
    What is the question?
    How do you go about answering it?
    What do you conclude?
  Theory section
    What does economic theory tell us about the question?
    What variables should be in the regression?
    What considerations does theory suggest about functional form (e.g., CRTS)?
  Literature review
    May come before theory section
    Who else has explored this question and what did they find?
  Methods and data section
    What estimation methods and tests are you proposing to use?
    Why are these methods appropriate?
    What data do you have (and not have)?
    What issues of measurement might be important?
  Results section
    Regression tables with basic description of results
    Text must read as a narrative, referring to tables but not relying on them to tell the story.
  Analysis/interpretation/discussion section
    What do the results mean?
    Are there simulated experiments using your model that would help the reader understand your results?
    How strong are the results?
    Issues of internal and external validity: is it safe to draw conclusions based on your results?
  Conclusion
    What do you conclude from your analysis?
    What additional work remains to be done in future research?
