Rewrap — ECON 4135, November 18, 2011
What should you now know?
1. What is econometrics?
2. Fundamental regression analysis
   1. Bivariate regression
   2. Multivariate regression
   3. Hypothesis testing
   4. Nonlinear regression functions
   5. Internal and external validity
3. Extensions
   1. Panel data
   2. Binary dependent variable
   3. Instrumental variables
   4. Natural experiments
What is econometrics?
Econometrics
1. may quantify relationships whose sign, but not magnitude, theory predicts,
2. may falsify a theoretical model,
3. may identify the structural parameters of a theoretical model.
Bivariate regression
The basic OLS estimator minimizes the sum of squared errors:
β̂ = Σᵢ (yᵢ − ȳ)(xᵢ − x̄) / Σᵢ (xᵢ − x̄)²  →p  cov(x, y) / var(x)
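As a quick sanity check of the slope formula, a minimal sketch on invented toy data (the numbers are not from the course; y lies exactly on the line y = 2x, so the estimate should be exactly 2):

```python
# Toy data on an exact line y = 2x (made up for illustration).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]

xbar = sum(x) / len(x)
ybar = sum(y) / len(y)

# OLS slope: sum of cross-deviations over sum of squared x-deviations.
beta_hat = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
            / sum((xi - xbar) ** 2 for xi in x))
# Intercept follows from the means.
alpha_hat = ybar - beta_hat * xbar
```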
Bivariate regression
The basic measure of the goodness-of-fit of your regression is
R² = ESS/TSS = Σᵢ (ŷᵢ − ȳ)² / Σᵢ (yᵢ − ȳ)² = 1 − SSR/TSS = 1 − Σᵢ (yᵢ − ŷᵢ)² / Σᵢ (yᵢ − ȳ)²
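The two expressions for R² can be verified numerically; this sketch uses arbitrary made-up data and checks that ESS/TSS and 1 − SSR/TSS agree:

```python
# Arbitrary toy data (not from the course).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]

xbar, ybar = sum(x) / 5, sum(y) / 5
b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
     / sum((xi - xbar) ** 2 for xi in x))
a = ybar - b * xbar
yhat = [a + b * xi for xi in x]

tss = sum((yi - ybar) ** 2 for yi in y)          # total sum of squares
ess = sum((yh - ybar) ** 2 for yh in yhat)        # explained sum of squares
ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # residual sum of squares

r2_ess = ess / tss          # first definition
r2_ssr = 1 - ssr / tss      # second definition; should match
```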
Bivariate regression
The basic OLS assumptions are
1. the conditional mean of the error is zero,
2. (Xᵢ, Yᵢ) are independently and identically distributed,
3. large outliers are unlikely.
Bivariate regression
Under the OLS assumptions, the OLS estimator
β̂ = Σᵢ (yᵢ − ȳ)(xᵢ − x̄) / Σᵢ (xᵢ − x̄)²
is a random variable that
1. is unbiased: E(β̂) = β,
2. is consistent: β̂ →p cov(x, y)/var(x) = β,
3. converges to a normal distribution: β̂ →d N(β, σ²_β̂), and
4. is BLUE if u is homoskedastic (Gauss–Markov theorem).
Multivariate regression
Multivariate regression is more complex to calculate, but essentially proceeds as usual (with the obvious reformulation), except:
1. an additional OLS assumption: no perfect multicollinearity
   - no exact linear relationships between regressors, e.g. the dummy variable trap
   - remember to consider the constant in addition to the x-es
2. an (additional) adjusted measure of goodness-of-fit:
   R̄² = 1 − [(n − 1)/(n − k − 1)] · SSR/TSS
3. imperfect multicollinearity between regressors implies that you have less free variation than you expect, and estimates become imprecise.
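The adjusted R̄² penalty is easy to see with assumed numbers (SSR, TSS, n and k below are invented): adding regressors shrinks R̄² relative to R² through the (n − 1)/(n − k − 1) factor.

```python
# Assumed toy quantities, not estimated from data.
n, k = 20, 3
ssr, tss = 36.0, 100.0

r2 = 1 - ssr / tss                               # plain R^2 = 0.64
r2_adj = 1 - (n - 1) / (n - k - 1) * ssr / tss   # penalized: 1 - (19/16)*0.36
```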
Nonlinear regression functions
Nonlinear regression functions can usually be formulated as multivariate regression functions by appropriate transformations of y or x, usually taking logarithms, including polynomials or interacting coefficients:
1. polynomial: yᵢ = α + β₁xᵢ + β₂xᵢ² + β₃xᵢ³ + Wᵢγ + uᵢ
2. logs
   - log-linear: ln yᵢ = α + β₁xᵢ + Wᵢγ + uᵢ
   - linear-log: yᵢ = α + β₁ ln xᵢ + Wᵢγ + uᵢ
   - log-log: ln yᵢ = α + β₁ ln xᵢ + Wᵢγ + uᵢ
   Taking logs is common because it converts the estimated effects to percentages or elasticities.
3. interactions: yᵢ = α + β₁xᵢ + β₂Dᵢ + β₃(xᵢ × Dᵢ) + Wᵢγ + uᵢ
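A sketch of the elasticity interpretation of the log-log case: generating y = 2·x^1.5 exactly (the constant 2 and elasticity 1.5 are made up) and regressing ln y on ln x recovers the elasticity as the slope.

```python
import math

# Exact power-law data y = 2 * x^1.5 (both numbers invented).
beta_true = 1.5
xs = [1.0, 2.0, 4.0, 8.0]
ys = [2.0 * x ** beta_true for x in xs]

# Bivariate OLS of ln y on ln x.
lx = [math.log(x) for x in xs]
ly = [math.log(y) for y in ys]
mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
beta_hat = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
            / sum((a - mx) ** 2 for a in lx))
# The slope is the elasticity: a 1% change in x gives a beta_hat% change in y.
```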
Nonlinear regression functions
Note that the effect of a variable will now often depend on levels and on more than one coefficient, so
1. significance often involves tests of joint hypotheses,
2. effects should be calculated from the predictions (before vs. after).
Hypothesis tests and confidence intervals
Test single hypotheses with the t-test, e.g. if H₀: β = a vs. H₁: β ≠ a, then
t = (β̂ − a)/se(β̂)
should be t-distributed with n − k degrees of freedom.
1. The critical t-value is found in your t-table:
   1. pick your confidence/rejection/significance level,
   2. decide on a one-sided vs. two-sided test.
2. The p-value is the probability, under H₀, of an estimate at least as extreme as the one observed, i.e. the probability of a type I error if you reject H₀.
3. Confidence intervals are calculated from the t-statistic, such that the true β is inside the interval with probability 1 − p:
   CI₁₋ₚ(β) = β̂ ± tₚ · se(β̂)
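A minimal sketch of the t-test and confidence interval mechanics, assuming an invented estimate and standard error, and using the large-sample two-sided 5% critical value 1.96 in place of a t-table lookup:

```python
# Assumed (made-up) regression output.
beta_hat, se = 1.8, 0.6

# t-statistic for H0: beta = 0.
t_stat = (beta_hat - 0.0) / se

# Large-sample 5% two-sided critical value (normal approximation).
t_crit = 1.96

# 95% confidence interval and the test decision.
ci = (beta_hat - t_crit * se, beta_hat + t_crit * se)
reject = abs(t_stat) > t_crit
```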
Hypothesis tests and confidence intervals
Test q joint hypotheses with the F-test, e.g. if H₀: β₁ = β₂ = 0 (q = 2), then under H₀
1. under heteroskedasticity,
   F = (1/2) · (t₁² + t₂² − 2ρ̂_{t₁,t₂} t₁t₂) / (1 − ρ̂²_{t₁,t₂})
   should be F_{q,∞}-distributed in large samples;
2. under homoskedasticity,
   F = [(SSRᵣ − SSRᵤ)/q] / [SSRᵤ/(n − kᵤ − 1)]
   should be F_{q, n−kᵤ−1}-distributed.
Note that for a single hypothesis, F = t².
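The homoskedasticity-only F-statistic is just arithmetic on the two sums of squared residuals; a sketch with assumed (invented) SSRs and sample sizes:

```python
# Assumed toy quantities: restricted and unrestricted SSRs,
# q restrictions, n observations, k regressors in the unrestricted model.
ssr_r, ssr_u = 120.0, 100.0
q, n, k = 2, 103, 5

# F = [(SSR_r - SSR_u)/q] / [SSR_u/(n - k - 1)]
f_stat = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k - 1))
```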
Hypothesis tests and confidence intervals
Alternatively, for single restrictions on multiple coefficients, reformulate the regression to test the hypothesis on a single coefficient, e.g. if H₀: β₁ = β₂ in
Yᵢ = α + β₁x₁ᵢ + β₂x₂ᵢ + uᵢ,
we can always add β₂x₁ᵢ − β₂x₁ᵢ = 0:
Yᵢ = α + (β₁ − β₂)x₁ᵢ + β₂(x₁ᵢ + x₂ᵢ) + uᵢ,
which can be reformulated as
Yᵢ = α + γ₁x₁ᵢ + β₂Wᵢ + uᵢ, where Wᵢ = x₁ᵢ + x₂ᵢ.
H₀ can now be reformulated as γ₁ = 0, which can be tested using a standard t-test.
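The reparameterization is a pure identity, which can be checked numerically; the coefficients and regressor values below are arbitrary:

```python
# Arbitrary coefficients and regressor values (made up).
alpha, b1, b2 = 0.5, 1.2, 0.7
x1, x2 = 3.0, 4.0

# Original parameterization.
y_orig = alpha + b1 * x1 + b2 * x2

# Reparameterized: gamma_1 = beta_1 - beta_2, W = x1 + x2.
g1, w = b1 - b2, x1 + x2
y_reparam = alpha + g1 * x1 + b2 * w
# Identical fitted value, so testing gamma_1 = 0 tests beta_1 = beta_2.
```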
Heteroskedasticity
Homoskedastic errors have constant variance for all values of x; heteroskedastic errors do not.
Internal and external validity
1. Internal validity: can estimates be trusted for the population studied?
2. External validity: can estimates be trusted for populations other than the one studied?
Internal and external validity
Estimates are not internally valid when conditional mean independence fails, E(u | x) ≠ 0 for some x, e.g. when cov(u, x) ≠ 0:
1. omitted variables bias
2. functional form misspecification
3. measurement error
4. sample selection bias
5. simultaneity bias
Internal and external validity
Estimates may not be externally valid due to
1. differences in populations,
2. differences in settings.
Assessing internal and external validity requires judgment and economic reasoning. In the end, the validity of estimates rests on assumptions that cannot be adequately tested using statistics.
Panel data
1. Panel or longitudinal data are data sets that include the same entities (individuals, states) over several time periods.
2. Panel data allow controlling for unobservable factors by comparing changes in y and x over time rather than the levels directly, e.g.
   1. individual fixed effects: control for unobservable factors that vary across individuals but not over time,
   2. year fixed effects: control for unobservable factors that vary over time but not across individuals,
   3. individual and year fixed effects: both.
Panel data
1. Panel data models can be estimated by
   1. including dummies for time periods and entities,
   2. demeaning the data, e.g. subtracting the mean of an entity over time from the observed values (both y and x) and regressing on these,
   3. first-differencing (if T = 2).
2. Estimation and inference proceed as usual, but should take into account correlation between observations from the same entity over time (clustered standard errors).
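A sketch of the demeaning (within) transformation on a toy panel: two entities with different invented fixed effects (10 and −5) and a common slope β = 2. Subtracting entity means from both y and x removes the fixed effects, and the regression on the demeaned data recovers the slope.

```python
# Toy panel: (entity, x, y), with y = fe[entity] + 2*x.
data = [("A", 1.0, 12.0), ("A", 2.0, 14.0),   # y = 10 + 2x
        ("B", 1.0, -3.0), ("B", 3.0, 1.0)]    # y = -5 + 2x

# Per-entity means of x and y.
sums = {}
for e, x, y in data:
    sx, sy, n = sums.get(e, (0.0, 0.0, 0))
    sums[e] = (sx + x, sy + y, n + 1)
means = {e: (sx / n, sy / n) for e, (sx, sy, n) in sums.items()}

# Demean both variables, then run bivariate OLS on the demeaned data.
dem = [(x - means[e][0], y - means[e][1]) for e, x, y in data]
beta_within = (sum(xd * yd for xd, yd in dem)
               / sum(xd ** 2 for xd, _ in dem))
```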
Binary dependent variables
1. Linear probability model (LPM): Dᵢ = α + xᵢβ + εᵢ
   - β is the percentage-point change in the probability of D = 1 from a one-unit change in x.
2. Probit regression: Pr(Dᵢ = 1) = Φ(z) = Φ(α + xᵢβ)
   - β is the change in the z-value from a one-unit change in x.
   - Φ is the cdf of the standard normal distribution.
3. Logit regression: Pr(Dᵢ = 1) = F(α + xᵢβ) = 1 / (1 + exp(−(α + xᵢβ)))
   - β is the change in the log-odds ratio from a one-unit change in x; the log-odds ratio = ln(p/(1 − p)).
   - Similar to probit, but with a logistic distribution in place of the standard normal.
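A sketch contrasting LPM and logit fitted probabilities; the coefficients are invented, not estimated. The LPM index can leave [0, 1], the logit link cannot, and the log-odds of the logit probability is linear in x with slope β:

```python
import math

# Invented coefficients for illustration.
alpha, beta = -2.0, 0.5

def lpm_prob(x):
    # Linear index: can fall below 0 or exceed 1.
    return alpha + beta * x

def logit_prob(x):
    # Logistic cdf of the same index: always strictly inside (0, 1).
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

def log_odds(p):
    return math.log(p / (1 - p))
```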
Binary dependent variables
Comparing methods:
1. Predicted probabilities
   - The LPM is implausible near the end-points, since predicted probabilities can take negative values or values above one.
   - Probit and logit force predicted probabilities to lie on [0, 1].
2. Ease of implementation
   - The LPM is much easier to estimate.
   - Probit and (to a lesser extent) logit require much heavier calculations, but modern computers solve this problem even in samples of several thousand individuals.
3. Ease of interpretation
   - LPM coefficients can be interpreted directly.
   - Probit must be inverted using the standard normal; logit must be inverted using the logistic distribution.
   - Probit and logit coefficients cannot be interpreted without setting the values of all variables.
Instrumental variables
1. IV is useful when we suspect problems with internal validity, i.e. corr(x, u) ≠ 0.
2. IV uses a certain part of the overall variation in x that is hypothesized not to be affected by the validity problems.
3. Specifically, given an instrument z that is correlated with x, corr(z, x) ≠ 0, and not correlated with the error term, corr(z, u) = 0,
4. using the variation in x driven by z reinstates internal validity:
   β_IV = cov(z, y) / cov(z, x)                                 (IV)
        = [cov(z, y)/var(z)] / [cov(z, x)/var(z)] = β_yz / β_xz  (ILS)
        = cov(x̂, y) / var(x̂)                                    (2SLS)
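A sketch of the IV (Wald) estimator cov(z, y)/cov(z, x) on data constructed so that the instrument is exactly uncorrelated with the error while x is not; the true effect is β = 3, and OLS is biased upward because cov(x, u) > 0. All numbers are invented for the illustration.

```python
# Constructed so that cov(z, u) = 0 exactly but cov(x, u) > 0.
z = [0.0, 0.0, 1.0, 1.0]
u = [1.0, -1.0, 1.0, -1.0]                      # error; enters x too (endogeneity)
x = [2 * zi + ui for zi, ui in zip(z, u)]       # first stage: pi = 2
y = [3 * xi + ui for xi, ui in zip(x, u)]       # structural: true beta = 3

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

beta_iv = cov(z, y) / cov(z, x)    # recovers the true effect
beta_ols = cov(x, y) / cov(x, x)   # biased upward by cov(x, u) > 0
```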
Instrumental variables
2SLS estimates in two stages:
1st stage: xᵢ = π₀ + π₁zᵢ + Wᵢγ + vᵢ
2nd stage: yᵢ = α + βx̂ᵢ + Wᵢγ + uᵢ
Note that standard errors should account for both stages: standard errors from the 2nd stage alone are biased downwards.
With k endogenous variables and m instruments,
1. the system can be overidentified (k < m), underidentified (k > m) or just-identified (k = m),
2. we can do overidentification tests using the J-statistic.
Weak instruments have low correlation with the endogenous variable and cause unreliable estimates: rule-of-thumb, first-stage F > 10.
Experiments and quasi-experiments
The causal effect of a treatment Tᵢ for an individual i can be thought of as the difference between the potential outcomes Yᵢ(Tᵢ):
βᵢ = Yᵢ(1) − Yᵢ(0)
We can put this in regression terms:
yᵢ = Yᵢ(1)Tᵢ + Yᵢ(0)(1 − Tᵢ)
   = Yᵢ(0) + (Yᵢ(1) − Yᵢ(0))Tᵢ
   = E[Yᵢ(0)] + (Yᵢ(1) − Yᵢ(0))Tᵢ + (Yᵢ(0) − E[Yᵢ(0)])
   = α + βᵢTᵢ + uᵢ
If the treatment is randomized, then cov(Tᵢ, uᵢ) = 0, and we may estimate an average of βᵢ.
Including covariates/control variables
- isn't necessary, but cannot harm estimation of βᵢ (given no correlation),
- may help reduce the variance of uᵢ, and therefore improve precision.
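Under randomization the regression above reduces to a comparison of group means; a minimal sketch on invented data with a constant treatment effect of 4:

```python
# Invented outcomes; treatment shifts every outcome by exactly 4.
treated = [10.0, 12.0, 14.0]
control = [6.0, 8.0, 10.0]

# With randomized treatment, the difference in means estimates the
# average of beta_i (here the constant effect 4).
ate_hat = sum(treated) / len(treated) - sum(control) / len(control)
```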
Experiments and quasi-experiments
Quasi-experiments are "as-if" experiments that were not intended as experiments: typically a reform, a rule, geographic variation...
Some important quasi-experimental methods:
- Difference-in-differences
- Regression discontinuity
- (Instrumental variables)
Experiments and quasi-experiments
The causal effect of a treatment Tᵢ for an individual i can be thought of as the difference between the potential outcomes Yᵢ(Tᵢ):
βᵢ = Yᵢ(1) − Yᵢ(0)
In heterogeneous populations,
- the average treatment effect (ATE) is the mean effect of the treatment in the population:
  β_ATE = E[Yᵢ(1) − Yᵢ(0)]
- the average treatment effect on the treated (ATT) is the mean effect of treatment in the population that is actually treated:
  β_ATT = E[Yᵢ(1) − Yᵢ(0) | Tᵢ = 1]
- the average treatment effect on the untreated (ATUT) is the mean effect of treatment in the population that is not actually treated:
  β_ATUT = E[Yᵢ(1) − Yᵢ(0) | Tᵢ = 0]
OLS with heterogeneous populations
If the treatment is truly randomized, then we recover the ATE by comparing the outcomes of treated and untreated:
β = E(Yᵢ | T = 1) − E(Yᵢ | T = 0)
  = E(Yᵢ(1) | T = 1) − E(Yᵢ(0) | T = 0)
  = E(Yᵢ(1)) − E(Yᵢ(0))
  = E[Yᵢ(1) − Yᵢ(0)]
Often, we can only recover a local average treatment effect (LATE) without imposing stronger assumptions, e.g.
- diff-in-diff recovers the ATT,
- RD recovers a particular margin.
IV with heterogeneous populations
Remember that 2SLS estimates in two stages, such that
yᵢ = β₀ + β₁ᵢxᵢ + uᵢ
xᵢ = π₀ + π₁ᵢzᵢ + vᵢ
β₁_IV = cov(x̂, y) / var(x̂)
Thus, if π₁ᵢ = 0 for some parts of the population, these individuals are ignored! IV puts most of the weight on individuals for whom z has a large influence on x.
IV with heterogeneous populations
yᵢ = β₀ + β₁ᵢxᵢ + uᵢ
xᵢ = π₀ + π₁ᵢzᵢ + vᵢ
More specifically, assuming β₁ᵢ and π₁ᵢ are distributed independently of (uᵢ, vᵢ, zᵢ), E(uᵢ | zᵢ) = E(vᵢ | zᵢ) = 0, and E(π₁ᵢ) ≠ 0,
β₁_IV →p E(β₁ᵢπ₁ᵢ) / E(π₁ᵢ) = LATE = ATE + cov(β₁ᵢ, π₁ᵢ) / E(π₁ᵢ)
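The weighting in the LATE formula can be illustrated with two invented population groups of equal size, each with its own (β₁ᵢ, π₁ᵢ): the π-weighted average differs from the simple mean of the effects whenever effect size and first-stage strength are correlated.

```python
# Two equally sized groups with made-up (beta_i, pi_i) pairs:
# the group with the smaller effect responds more strongly to z.
groups = [(1.0, 2.0), (3.0, 1.0)]

# Simple mean of effects (the ATE in this toy population).
ate = sum(b for b, p in groups) / len(groups)

# Pi-weighted average E(beta_i * pi_i) / E(pi_i): what IV converges to.
late = sum(b * p for b, p in groups) / sum(p for _, p in groups)
# late < ate here because cov(beta_i, pi_i) < 0.
```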