
Rewrap ECON 4135, November 18, 2011

What should you now know?
1. What is econometrics?
2. Fundamental regression analysis
   1. Bivariate regression
   2. Multivariate regression
   3. Hypothesis testing
   4. Nonlinear regression functions
   5. Internal and external validity
3. Extensions
   1. Panel data
   2. Binary dependent variable
   3. Instrumental variables
   4. Natural experiments

What is econometrics?
Econometrics
1. may quantify relationships for which theory usually predicts only the sign;
2. may falsify a theoretical model;
3. may identify the structural parameters of a theoretical model.

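Bivariate regression
The basic OLS estimator minimizes the sum of squared residuals, $\sum_i (y_i - \alpha - \beta x_i)^2$, which gives $\hat\alpha = \bar y - \hat\beta \bar x$ and
$$\hat\beta = \frac{\sum_i (y_i - \bar y)(x_i - \bar x)}{\sum_i (x_i - \bar x)^2} \xrightarrow{p} \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}.$$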
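The estimator above can be checked numerically. The following is a minimal sketch in Python with NumPy on simulated data (the data-generating process, with α = 1 and β = 2, is an assumption made for illustration). It computes $\hat\beta$ from the deviation formula and compares it with the sample covariance/variance ratio and with NumPy's `polyfit`, which also minimizes the sum of squared residuals.

```python
import numpy as np

def ols_bivariate(x, y):
    """Return (alpha_hat, beta_hat) for y = alpha + beta * x + u by least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    beta_hat = np.sum(dx * dy) / np.sum(dx ** 2)
    alpha_hat = y.mean() - beta_hat * x.mean()
    return alpha_hat, beta_hat

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 1.0 + 2.0 * x + rng.normal(size=500)   # assumed DGP: alpha = 1, beta = 2

alpha_hat, beta_hat = ols_bivariate(x, y)

# Slope = sample covariance / sample variance (the n - 1 factors cancel)
ratio = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# np.polyfit with deg=1 returns [slope, intercept]
slope_pf, intercept_pf = np.polyfit(x, y, 1)
```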

Bivariate regression
The basic measure of the goodness of fit of your regression is
$$R^2 = \frac{ESS}{TSS} = \frac{\sum_i (\hat y_i - \bar y)^2}{\sum_i (y_i - \bar y)^2} = 1 - \frac{SSR}{TSS} = 1 - \frac{\sum_i (y_i - \hat y_i)^2}{\sum_i (y_i - \bar y)^2}.$$
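A short sketch, again on assumed simulated data, computing $R^2$ both ways. The two agree because the regression includes a constant, so that TSS = ESS + SSR; in the bivariate case $R^2$ also equals the squared sample correlation of x and y.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
y = 3.0 - 0.5 * x + rng.normal(scale=2.0, size=200)  # assumed DGP

# Least squares fit with an intercept column
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ coef

tss = np.sum((y - y.mean()) ** 2)      # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
ssr = np.sum((y - y_hat) ** 2)         # sum of squared residuals

r2_ess = ess / tss
r2_ssr = 1 - ssr / tss
```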

Bivariate regression
The basic OLS assumptions are
1. the conditional mean of the error is zero: $E(u_i \mid X_i) = 0$;
2. $(X_i, Y_i)$, $i = 1, \dots, n$, are independently and identically distributed;
3. large outliers are unlikely ($X_i$ and $Y_i$ have finite fourth moments).

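Bivariate regression
Under the OLS assumptions, the OLS estimator
$$\hat\beta = \frac{\sum_i (y_i - \bar y)(x_i - \bar x)}{\sum_i (x_i - \bar x)^2}$$
is a random variable that
1. is unbiased: $E(\hat\beta) = \beta$;
2. is consistent: $\hat\beta \xrightarrow{p} \operatorname{cov}(x, y)/\operatorname{var}(x) = \beta$;
3. is approximately normally distributed in large samples: $\hat\beta \sim N(\beta, \sigma^2_{\hat\beta})$;
4. is BLUE (the best linear unbiased estimator) if, in addition, u is homoskedastic (Gauss-Markov theorem).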
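Unbiasedness and consistency can be illustrated with a small Monte Carlo experiment. This sketch assumes a data-generating process with β = 2 and Student-t errors (chosen only to show that normal errors are not required). The mean of the estimates is close to β at both sample sizes, while their spread shrinks as n grows.

```python
import numpy as np

rng = np.random.default_rng(2)
beta_true = 2.0  # assumed true slope

def slope(x, y):
    """Bivariate OLS slope from the deviation formula."""
    dx = x - x.mean()
    return np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)

def simulate(n, reps=2000):
    """Draw `reps` samples of size n and return the OLS slope from each."""
    est = np.empty(reps)
    for r in range(reps):
        x = rng.normal(size=n)
        u = rng.standard_t(df=5, size=n)  # non-normal errors with E(u | x) = 0
        y = 1.0 + beta_true * x + u
        est[r] = slope(x, y)
    return est

small = simulate(20)    # mean close to beta_true: unbiasedness
large = simulate(2000)  # mean close to beta_true, much smaller spread: consistency
```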

Multivariate regression
Multivariate regression is more complex to calculate, but essentially proceeds as in the bivariate case (with the obvious reformulations), except for:
1. an additional OLS assumption: no perfect multicollinearity, i.e. no exact linear relationship among the regressors (e.g. the dummy variable trap); remember to count the constant as a regressor in addition to the x's;
2. an additional, adjusted measure of goodness of fit, where k is the number of regressors:
$$\bar R^2 = 1 - \frac{n-1}{n-k-1}\,\frac{SSR}{TSS};$$
3. imperfect multicollinearity: when regressors are highly correlated, there is less independent variation than you might expect, and the estimates become imprecise (large standard errors).
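The dummy variable trap can be seen directly from the rank of the regressor matrix. In this sketch the variables `female`, `male` and the helper `adjusted_r2` are hypothetical. A constant plus both dummies gives a matrix with fewer linearly independent columns than columns, so OLS has no unique solution; dropping one category fixes it.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
female = rng.integers(0, 2, size=n)  # hypothetical 0/1 dummy
male = 1 - female                    # its complement
x = rng.normal(size=n)

# Constant + both dummies: constant = female + male exactly (perfect multicollinearity)
X_trap = np.column_stack([np.ones(n), female, male, x])
# Dropping one category restores full column rank
X_ok = np.column_stack([np.ones(n), female, x])

rank_trap = np.linalg.matrix_rank(X_trap)  # 3, although there are 4 columns
rank_ok = np.linalg.matrix_rank(X_ok)      # 3, equal to the number of columns

def adjusted_r2(ssr, tss, n, k):
    """Adjusted R^2 with k regressors, not counting the constant."""
    return 1 - (n - 1) / (n - k - 1) * ssr / tss
```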

Nonlinear regression functions
Nonlinear regression functions can usually be written as multivariate regressions by appropriate transformations of y or x: taking logarithms, including polynomials, or interacting variables.
1. Polynomial: $y_i = \alpha + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + W_i\gamma + u_i$
2. Logs:
   - log-linear: $\ln y_i = \alpha + \beta_1 x_i + W_i\gamma + u_i$
   - linear-log: $y_i = \alpha + \beta_1 \ln x_i + W_i\gamma + u_i$
   - log-log: $\ln y_i = \alpha + \beta_1 \ln x_i + W_i\gamma + u_i$
   Taking logs is common because it converts the estimated effects to (approximate) percentages or elasticities: in the log-linear model, a one-unit change in x is associated with roughly a $100\beta_1$% change in y; in the log-log model, $\beta_1$ is the elasticity of y with respect to x.
3. Interactions: $y_i = \alpha + \beta_1 x_i + \beta_2 D_i + \beta_3 (x_i \times D_i) + W_i\gamma + u_i$
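As an illustration of the log-log case, the sketch below simulates data from an assumed constant-elasticity relationship $y = 5x^{0.3}e^{u}$ and regresses $\ln y$ on $\ln x$. The slope then estimates the elasticity, 0.3 by construction.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
x = rng.uniform(1, 100, size=n)
# Assumed DGP: y = 5 * x**0.3 * exp(u), so the true elasticity is 0.3
y = 5.0 * x ** 0.3 * np.exp(rng.normal(scale=0.2, size=n))

X = np.column_stack([np.ones(n), np.log(x)])
(a_hat, b_hat), *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
# b_hat: a 1% increase in x is associated with roughly a b_hat % increase in y
```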

[Figures omitted: Nonlinear regression functions]

Nonlinear regression functions
Note that the effect of a variable will now often depend on its level (and on the levels of other regressors) and on more than one coefficient, so
1. significance often involves tests of joint hypotheses;
2. effects should be calculated from the predictions (predicted y after vs. before the change in x).
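Calculating an effect from the predictions amounts to evaluating the fitted function before and after the change. The coefficients below are hypothetical, for a quadratic (the polynomial above with $\beta_3 = 0$); the $W_i\gamma$ terms cancel when W is held fixed, so they are omitted. The same one-unit change in x has different effects at different levels of x.

```python
# Hypothetical estimated coefficients for y = a + b1 * x + b2 * x**2
coef = {"a": 1.0, "b1": 2.0, "b2": -0.5}

def predict(x, c=coef):
    """Predicted y at x (W held fixed, so W * gamma is left out)."""
    return c["a"] + c["b1"] * x + c["b2"] * x ** 2

def effect(x_before, x_after, c=coef):
    """Effect of moving x from x_before to x_after: difference in predictions."""
    return predict(x_after, c) - predict(x_before, c)

low = effect(1, 2)   # 3.0 - 2.5 = 0.5
high = effect(3, 4)  # 1.0 - 2.5 = -1.5
```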

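Hypothesis tests and confidence intervals
Test single hypotheses with the t-test: if $H_0: \beta = a$ vs. $H_1: \beta \neq a$, then under $H_0$
$$t = \frac{\hat\beta - a}{se(\hat\beta)}$$
is t-distributed with $n - k - 1$ degrees of freedom (exactly if the errors are homoskedastic and normal; approximately N(0, 1) in large samples).
1. The critical t-value is found in your t-table:
   1. pick your confidence/significance level;
   2. decide on a one-sided vs. a two-sided test.
2. The p-value is the probability, under $H_0$, of obtaining a t-statistic at least as extreme as the one observed. Equivalently, it is the smallest significance level at which you would reject $H_0$, i.e. the probability of a type I error if you reject at exactly that level.
3. Confidence intervals are calculated from the critical t-value, such that the interval contains the true β with probability 1 - p (in repeated samples):
$$CI_{1-p}(\beta) = \hat\beta \pm t_p \, se(\hat\beta),$$
where $t_p$ is the two-sided critical value at significance level p.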
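A minimal sketch of the two-sided t-test, p-value and confidence interval using only the Python standard library. It assumes a large sample, so that the standard normal (`statistics.NormalDist`) replaces the t-distribution; with few degrees of freedom, the t-table critical value should be used instead. The function name `t_test` and the example numbers are hypothetical.

```python
from statistics import NormalDist

def t_test(beta_hat, se, a=0.0, level=0.05):
    """Two-sided large-sample test of H0: beta = a, plus a (1 - level) CI.

    Uses the standard normal approximation to the t-distribution,
    appropriate when n - k - 1 is large.
    """
    z = NormalDist()
    t = (beta_hat - a) / se
    p_value = 2 * (1 - z.cdf(abs(t)))   # P(|T| >= |t|) under H0
    crit = z.inv_cdf(1 - level / 2)     # about 1.96 for level = 0.05
    ci = (beta_hat - crit * se, beta_hat + crit * se)
    return t, p_value, ci, abs(t) > crit

t, p, ci, reject = t_test(beta_hat=2.0, se=0.5)
# t = 4.0, p is about 6.3e-05, ci is about (1.02, 2.98), reject is True
```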

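Hypothesis tests and confidence intervals
Test q joint hypotheses with the F-test; e.g. for $H_0: \beta_1 = \beta_2 = 0$ (q = 2):
1. The heteroskedasticity-robust F-statistic (valid under heteroskedasticity)
$$F = \frac{1}{2}\,\frac{t_1^2 + t_2^2 - 2\hat\rho_{t_1,t_2} t_1 t_2}{1 - \hat\rho_{t_1,t_2}^2}$$
is $F_{q,\infty}$-distributed under $H_0$ in large samples.
2. Under homoskedasticity,
$$F = \frac{(SSR_r - SSR_u)/q}{SSR_u/(n - k_u - 1)}$$
is $F_{q,\,n-k_u-1}$-distributed under $H_0$ (exactly if the errors are also normal), where r and u denote the restricted and unrestricted regressions and $k_u$ is the number of regressors in the unrestricted regression.
Note that for a single hypothesis (q = 1), $F = t^2$.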
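The homoskedasticity-only F-statistic can be computed from the restricted and unrestricted SSRs. This sketch (simulated data under an assumed DGP; the helpers `ols` and `f_homoskedastic` are hypothetical) also confirms numerically that for a single restriction F equals the squared homoskedasticity-only t-statistic.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients, SSR and homoskedasticity-only standard errors."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ssr = resid @ resid
    n, p = X.shape                      # p = k + 1 columns including the constant
    sigma2 = ssr / (n - p)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, ssr, se

def f_homoskedastic(ssr_r, ssr_u, q, n, k_u):
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - k_u - 1))

rng = np.random.default_rng(5)
n = 300
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 0.2 * x1 + rng.normal(size=n)  # assumed DGP (x2 has no effect)

const = np.ones(n)
coef_u, ssr_u, se_u = ols(np.column_stack([const, x1, x2]), y)  # k_u = 2

# Joint test H0: beta_1 = beta_2 = 0 (q = 2): restricted model has only a constant
_, ssr_r2, _ = ols(const[:, None], y)
F_joint = f_homoskedastic(ssr_r2, ssr_u, q=2, n=n, k_u=2)

# Single test H0: beta_2 = 0 (q = 1)
_, ssr_r1, _ = ols(np.column_stack([const, x1]), y)
F_single = f_homoskedastic(ssr_r1, ssr_u, q=1, n=n, k_u=2)
t_beta2 = coef_u[2] / se_u[2]           # F_single equals t_beta2 ** 2
```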

Hypothesis tests and confidence intervals
Alternatively, for a single restriction involving multiple coefficients, reformulate the regression so that the hypothesis concerns a single coefficient. E.g., to test $H_0: \beta_1 = \beta_2$ in
$$Y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i,$$
we can always add $\beta_2 x_{1i} - \beta_2 x_{1i} = 0$:
$$Y_i = \alpha + (\beta_1 - \beta_2) x_{1i} + \beta_2 (x_{1i} + x_{2i}) + u_i,$$
which can be rewritten as
$$Y_i = \alpha + \gamma_1 x_{1i} + \beta_2 W_i + u_i, \quad \text{where } \gamma_1 = \beta_1 - \beta_2 \text{ and } W_i = x_{1i} + x_{2i}.$$
$H_0$ can now be reformulated as $\gamma_1 = 0$, which can be tested using a standard t-test.
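The reformulation can be verified numerically. Regressing y on $x_1$ and $W = x_1 + x_2$ reproduces $\hat\beta_1 - \hat\beta_2$ as the coefficient on $x_1$, because both regressions span the same column space. A sketch on assumed simulated data:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 400
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 0.7 * x1 + 0.4 * x2 + rng.normal(size=n)  # assumed DGP

const = np.ones(n)
# Original regression: y on (1, x1, x2)
b, *_ = np.linalg.lstsq(np.column_stack([const, x1, x2]), y, rcond=None)
# Transformed regression: y on (1, x1, W) with W = x1 + x2
W = x1 + x2
g, *_ = np.linalg.lstsq(np.column_stack([const, x1, W]), y, rcond=None)

gamma1 = g[1]       # coefficient on x1 in the transformed regression
diff = b[1] - b[2]  # beta1_hat - beta2_hat from the original regression
```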

Heteroskedasticity
Homoskedastic errors have constant variance for all values of x, $\operatorname{var}(u_i \mid x_i) = \sigma^2$; heteroskedastic errors do not.
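To make the consequence of heteroskedasticity concrete, the sketch below simulates errors with $\operatorname{var}(u_i \mid x_i) = x_i^2$ (an assumed design) and compares homoskedasticity-only standard errors with heteroskedasticity-robust Eicker-Huber-White (HC0) standard errors. The robust version remains valid when the errors are heteroskedastic; in this design, the homoskedasticity-only standard error of the slope is too small.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000
x = rng.normal(size=n)
u = np.abs(x) * rng.normal(size=n)  # var(u | x) = x^2: heteroskedastic
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
coef = XtX_inv @ X.T @ y
e = y - X @ coef

# Homoskedasticity-only variance: s^2 (X'X)^{-1}
s2 = e @ e / (n - X.shape[1])
se_homo = np.sqrt(np.diag(s2 * XtX_inv))

# Robust (HC0) variance: (X'X)^{-1} [sum_i e_i^2 x_i x_i'] (X'X)^{-1}
meat = X.T @ (X * e[:, None] ** 2)
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
```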

[Figure omitted: Heteroskedasticity]

Internal and external validity
1. Internal validity: can the estimates be trusted for the population studied?
2. External validity: can the estimates be trusted for populations other than the one studied?

Internal and external validity
Estimates are not internally valid when conditional mean independence fails, i.e. $E(u \mid x) \neq 0$ for some x, e.g. when $\operatorname{cov}(u, x) \neq 0$. Common causes are
1. omitted variable bias;
2. functional form misspecification;
3. measurement error;
4. sample selection bias;
5. simultaneity bias.
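Omitted variable bias, the first item above, can be illustrated with a hypothetical wage regression in which "ability" is omitted. Under the assumed DGP, the short-regression slope is close to $\beta_1 + \beta_2\delta$, where δ is the slope from regressing the omitted variable on the included one, rather than to the true $\beta_1 = 0.08$.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100_000
ability = rng.normal(size=n)                    # hypothetical omitted variable
school = 12 + 2 * ability + rng.normal(size=n)  # correlated with ability
wage = 1.0 + 0.08 * school + 0.10 * ability + rng.normal(scale=0.5, size=n)

def slope(x, y):
    dx = x - x.mean()
    return np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)

short = slope(school, wage)      # ability omitted: biased estimate of 0.08
delta = slope(school, ability)   # auxiliary regression of ability on school
predicted = 0.08 + 0.10 * delta  # omitted variable bias formula (about 0.12)
```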

Internal and external validity
Estimates may not be externally valid due to
1. differences in populations;
2. differences in settings.
Assessing internal and external validity requires judgment and economic reasoning. In the end, the validity of estimates rests on assumptions that cannot be adequately tested using statistics.

Panel data
1. Panel (or longitudinal) data are data sets that include the same entities (individuals, states) over several time periods.
2. Panel data allow controlling for unobservable factors by comparing changes in y and x over time rather than their levels, e.g.
   1. individual fixed effects control for unobservable factors that vary across individuals but not over time;
   2. year fixed effects control for unobservable factors that vary over time but not across individuals;
   3. individual and year fixed effects control for both.

Panel data
1. Panel data models can be estimated by
   1. including dummies for time periods and entities;
   2. demeaning the data, i.e. subtracting each entity's mean over time from the observed values (of both y and x) and regressing on the demeaned variables (the within transformation);
   3. first-differencing (equivalent to the above if T = 2).
2. Estimation and inference proceed as usual, but should take into account correlation between observations from the same entity over time (clustered standard errors).
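A sketch of the first two estimation approaches on a simulated balanced panel (the DGP and variable names are assumptions). The coefficient on x from the entity-dummy regression equals the within (demeaned) estimate exactly, while pooled OLS, which ignores the fixed effect, is biased because x is correlated with it. Clustered standard errors are not computed here.

```python
import numpy as np

rng = np.random.default_rng(9)
N, T = 200, 4                                   # entities, periods
entity = np.repeat(np.arange(N), T)
a = rng.normal(size=N)[entity]                  # unobserved entity fixed effect
x = a + rng.normal(size=N * T)                  # x correlated with the fixed effect
y = 1.5 * x + 2.0 * a + rng.normal(size=N * T)  # assumed true beta = 1.5

# (1) Entity dummies (no separate constant, to avoid the dummy trap)
D = (entity[:, None] == np.arange(N)).astype(float)
coef_lsdv, *_ = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)
beta_lsdv = coef_lsdv[0]

# (2) Within transformation: subtract each entity's time mean
def demean(v):
    means = np.bincount(entity, weights=v) / T
    return v - means[entity]

xd, yd = demean(x), demean(y)
beta_within = (xd @ yd) / (xd @ xd)

# (3) Pooled OLS ignoring the fixed effect (biased upwards in this design)
xc = x - x.mean()
beta_pooled = (xc @ (y - y.mean())) / (xc @ xc)
```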

Binary dependent variables
1. Linear probability model (LPM): $D_i = \alpha + x_i\beta + \varepsilon_i$. β is the change in the probability that D = 1 (times 100: in percentage points) from a one-unit change in x.
2. Probit regression: $\Pr(D_i = 1 \mid x_i) = \Phi(z_i) = \Phi(\alpha + x_i\beta)$, where Φ is the cdf of the standard normal distribution. β is the change in the z-value from a one-unit change in x.
3. Logit regression: $\Pr(D_i = 1 \mid x_i) = F(\alpha + x_i\beta) = \frac{1}{1 + \exp(-(\alpha + x_i\beta))}$. β is the change in the log-odds, $\ln(p/(1-p))$, from a one-unit change in x. Logit is similar to probit, but with the logistic distribution in place of the standard normal.
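The three probability models can be written as small functions of the index $\alpha + x\beta$. The coefficients below are hypothetical and chosen only for illustration. The sketch shows that the logit log-odds rise by exactly β per unit of x, that the probit change in probability depends on where x is evaluated, and that an LPM prediction can exceed one.

```python
import math
from statistics import NormalDist

def lpm_prob(x, a, b):
    return a + b * x                             # can fall outside [0, 1]

def probit_prob(x, a, b):
    return NormalDist().cdf(a + b * x)           # Phi(z), z = a + b x

def logit_prob(x, a, b):
    return 1.0 / (1.0 + math.exp(-(a + b * x)))  # logistic cdf

def log_odds(p):
    return math.log(p / (1 - p))

a, b = -1.0, 0.5                  # hypothetical coefficients
p = logit_prob(2.0, a, b)         # index = 0, so p = 0.5

# Logit: one more unit of x raises the log-odds by exactly b
step = log_odds(logit_prob(3.0, a, b)) - log_odds(logit_prob(2.0, a, b))

# Probit: the same one-unit change in x moves the probability by different amounts
dp_low = probit_prob(1.0, a, b) - probit_prob(0.0, a, b)   # about 0.15
dp_mid = probit_prob(2.5, a, b) - probit_prob(1.5, a, b)   # about 0.20

lpm_high = lpm_prob(5.0, 0.1, 0.2)  # hypothetical LPM prediction of 1.1
```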

[Figure omitted: Binary dependent variables]

Binary dependent variables
Comparing methods:
1. Predicted probabilities
   - LPM is implausible near the end-points, since predicted probabilities can be negative or exceed one.
   - Probit and logit force predicted probabilities to lie in [0, 1].
2. Ease of implementation
   - LPM is much easier to estimate (OLS).
   - Probit and (to a lesser extent) logit require much heavier computation (maximum likelihood), but modern computers solve this problem even in samples of several thousand individuals.
3. Ease of interpretation
   - LPM coefficients can be interpreted directly.
   - Probit results must be translated into probabilities using the standard normal cdf, and logit results using the logistic cdf.
   - In probit and logit, the effect of x on the probability cannot be calculated without setting the values of all variables.

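Instrumental variables
1. IV is useful when we suspect problems with internal validity, i.e. $\operatorname{corr}(x, u) \neq 0$.
2. IV uses only the part of the variation in x that is hypothesized not to be affected by the validity problems.
3. Specifically, we need an instrument z that is relevant, i.e. correlated with x: $\operatorname{corr}(z, x) \neq 0$, and exogenous, i.e. uncorrelated with the error term: $\operatorname{corr}(z, u) = 0$.
4. Using only the variation in x driven by z restores internal validity:
$$\hat\beta^{IV} = \frac{\operatorname{cov}(z, y)}{\operatorname{cov}(z, x)} = \frac{\operatorname{cov}(z, y)/\operatorname{var}(z)}{\operatorname{cov}(z, x)/\operatorname{var}(z)} = \frac{\beta_{yz}}{\beta_{xz}} = \frac{\operatorname{cov}(\hat x, y)}{\operatorname{var}(\hat x)},$$
where the first expression is the IV estimator, the ratio $\beta_{yz}/\beta_{xz}$ of the reduced-form to the first-stage coefficient is indirect least squares (ILS), and the last expression is 2SLS, with $\hat x$ the fitted value from regressing x on z.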
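The equivalence of the three expressions can be checked on simulated data with an endogenous regressor (the DGP, with true β = 2, is an assumption). OLS is inconsistent here, while the IV, ILS and 2SLS expressions give numerically identical estimates close to 2.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 50_000
z = rng.normal(size=n)                      # instrument
u = rng.normal(size=n)                      # structural error
x = 0.8 * z + 0.6 * u + rng.normal(size=n)  # endogenous: corr(x, u) != 0
y = 1.0 + 2.0 * x + u                       # assumed true beta = 2

def cov(a, b):
    return np.mean((a - a.mean()) * (b - b.mean()))

beta_ols = cov(x, y) / cov(x, x)            # inconsistent (about 2.3 here)
beta_iv = cov(z, y) / cov(z, x)             # IV
beta_ils = (cov(z, y) / cov(z, z)) / (cov(z, x) / cov(z, z))  # reduced form / first stage

# 2SLS: fitted values from regressing x on z, then regress y on x_hat
pi1 = cov(z, x) / cov(z, z)
x_hat = x.mean() + pi1 * (z - z.mean())
beta_2sls = cov(x_hat, y) / cov(x_hat, x_hat)
```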

Instrumental variables
2SLS estimates in two stages:
- 1st stage: $x_i = \pi_0 + \pi_1 z_i + W_i\gamma + v_i$
- 2nd stage: $y_i = \alpha + \beta\hat x_i + W_i\delta + u_i$
Note that the standard errors should account for both stages: standard errors from running the 2nd stage alone by OLS are incorrect, because its residuals are computed with $\hat x_i$ rather than $x_i$.
With k endogenous variables and m instruments,
1. the system can be overidentified (k < m), underidentified (k > m) or exactly identified (k = m);
2. if it is overidentified, we can test the overidentifying restrictions using the J-statistic.
Weak instruments have low correlation with the endogenous variable and cause unreliable estimates; rule of thumb: the first-stage F-statistic should exceed 10.
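A minimal 2SLS sketch for one endogenous regressor, assuming homoskedastic errors. The key detail is that the residuals used for the standard errors are computed with the actual $x_i$, not $\hat x_i$. It also reports the first-stage F-statistic on the excluded instruments for the rule of thumb. The function `two_sls` and the simulated design are hypothetical, not a library API.

```python
import numpy as np

def two_sls(y, x_endog, Z, W):
    """2SLS for one endogenous regressor with homoskedasticity-only SEs.

    y: (n,) outcome; x_endog: (n,) endogenous regressor;
    Z: (n, m) excluded instruments; W: (n, p) exogenous regressors incl. constant.
    Returns (coefficients [beta, delta...], standard errors, first-stage F).
    """
    n = len(y)
    ZW = np.column_stack([Z, W])
    # First stage: x on instruments and exogenous regressors
    pi, *_ = np.linalg.lstsq(ZW, x_endog, rcond=None)
    x_hat = ZW @ pi
    # Second stage: y on x_hat and W
    X_hat = np.column_stack([x_hat, W])
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    # Correct residuals use the actual x, not x_hat
    X = np.column_stack([x_endog, W])
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X_hat.T @ X_hat)))
    # First-stage F for H0: all excluded-instrument coefficients are zero
    pi_r, *_ = np.linalg.lstsq(W, x_endog, rcond=None)
    ssr_u = np.sum((x_endog - x_hat) ** 2)
    ssr_r = np.sum((x_endog - W @ pi_r) ** 2)
    m = Z.shape[1]
    F_first = ((ssr_r - ssr_u) / m) / (ssr_u / (n - ZW.shape[1]))
    return beta, se, F_first

rng = np.random.default_rng(11)
n = 5000
W = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, 2))                         # two instruments: overidentified
u = rng.normal(size=n)
x = Z @ np.array([0.5, 0.3]) + 0.5 * u + rng.normal(size=n)
y = W @ np.array([1.0, 0.5]) + 2.0 * x + u          # assumed true beta = 2
beta, se, F_first = two_sls(y, x, Z, W)
```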

Experiments and quasi-experiments
The causal effect of a treatment $T_i$ for an individual i can be thought of as the difference between the potential outcomes $Y_i(T_i)$:
$$\beta_i = Y_i(1) - Y_i(0).$$
We can put this in regression terms:
$$y_i = Y_i(1)T_i + Y_i(0)(1 - T_i) = Y_i(0) + (Y_i(1) - Y_i(0))T_i = E[Y_i(0)] + \beta_i T_i + (Y_i(0) - E[Y_i(0)]) = \alpha + \beta_i T_i + u_i.$$
If the treatment is randomized, then $\operatorname{cov}(T_i, u_i) = 0$, and we may estimate an average of $\beta_i$. Including covariates/control variables
- isn't necessary, but cannot harm estimation of the average $\beta_i$ (given that the covariates are uncorrelated with the treatment);
- may help reduce the variance of $u_i$, and therefore improve precision.
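Under randomization, the OLS coefficient on $T_i$ equals the difference in mean outcomes between treated and untreated, which estimates the average of the individual effects $\beta_i$. A sketch with assumed simulated potential outcomes:

```python
import numpy as np

rng = np.random.default_rng(12)
n = 40_000
y0 = rng.normal(10, 2, size=n)         # potential outcome without treatment
beta_i = rng.normal(1.0, 0.5, size=n)  # heterogeneous individual effects
y1 = y0 + beta_i                       # potential outcome with treatment
T = rng.integers(0, 2, size=n)         # randomized treatment
y = y1 * T + y0 * (1 - T)              # observed outcome

X = np.column_stack([np.ones(n), T])
(alpha_hat, beta_hat), *_ = np.linalg.lstsq(X, y, rcond=None)
diff_means = y[T == 1].mean() - y[T == 0].mean()
ate = beta_i.mean()                    # sample average of the individual effects
```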

Experiments and quasi-experiments
Quasi-experiments are "as if" experiments that were not intended as experiments, typically arising from a reform, a rule, or geographic variation. Some important quasi-experimental methods:
- difference-in-differences;
- regression discontinuity;
- (instrumental variables).
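For difference-in-differences, the 2x2 comparison of group means equals the coefficient on the interaction term in a regression on group, period and group-by-period dummies. A sketch on simulated repeated cross-section data (the DGP and the effect size of 1.5 are assumptions):

```python
import numpy as np

rng = np.random.default_rng(13)
n = 4000
treat = rng.integers(0, 2, size=n)  # group that is exposed to the reform
post = rng.integers(0, 2, size=n)   # period after the reform
effect = 1.5                        # assumed true effect on the treated
y = 2.0 + 1.0 * treat + 0.5 * post + effect * treat * post + rng.normal(size=n)

def cell_mean(g, t):
    return y[(treat == g) & (post == t)].mean()

# (change for treated group) - (change for control group)
did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

X = np.column_stack([np.ones(n), treat, post, treat * post])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
did_reg = coef[3]                   # coefficient on the interaction term
```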

[Figures omitted: Experiments and quasi-experiments]

Experiments and quasi-experiments
The causal effect of a treatment $T_i$ for an individual i can be thought of as the difference between the potential outcomes $Y_i(T_i)$:
$$\beta_i = Y_i(1) - Y_i(0).$$
In heterogeneous populations,
- the average treatment effect (ATE) is the mean effect of the treatment in the population: $\beta^{ATE} = E[Y_i(1) - Y_i(0)]$;
- the average treatment effect on the treated (ATT) is the mean effect of treatment in the population that is actually treated: $\beta^{ATT} = E[Y_i(1) - Y_i(0) \mid T_i = 1]$;
- the average treatment effect on the untreated (ATUT) is the mean effect of treatment in the population that is not actually treated: $\beta^{ATUT} = E[Y_i(1) - Y_i(0) \mid T_i = 0]$.
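With hypothetical potential outcomes for a handful of individuals (never jointly observed in practice), the three averages are straightforward to compute. The numbers are invented for illustration; here the treated happen to benefit most, so ATT > ATE > ATUT.

```python
# Hypothetical potential outcomes for six individuals (illustration only)
y1 = [5, 7, 3, 9, 3, 6]   # Y_i(1)
y0 = [3, 4, 2, 5, 3, 4]   # Y_i(0)
T = [1, 1, 0, 1, 0, 0]    # actual treatment status

effects = [a - b for a, b in zip(y1, y0)]  # beta_i = Y_i(1) - Y_i(0)

def mean(v):
    return sum(v) / len(v)

ate = mean(effects)                                     # 2.0
att = mean([e for e, t in zip(effects, T) if t == 1])   # 3.0
atut = mean([e for e, t in zip(effects, T) if t == 0])  # 1.0
```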

OLS with heterogeneous populations
If the treatment is truly randomized, then OLS recovers the ATE. We compare outcomes of the treated and the untreated:
$$\beta = E(Y_i \mid T_i = 1) - E(Y_i \mid T_i = 0) = E(Y_i(1) \mid T_i = 1) - E(Y_i(0) \mid T_i = 0) = E(Y_i(1)) - E(Y_i(0)) = E[Y_i(1) - Y_i(0)],$$
where the third equality uses the independence of $T_i$ and the potential outcomes under randomization.
Without imposing stronger assumptions, we can often only recover a local average treatment effect (LATE), e.g.
- difference-in-differences recovers the ATT;
- regression discontinuity (RD) recovers the effect at a particular margin (the cutoff).
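Without randomization, the simple comparison of means need not recover any of these averages. Adding and subtracting $E[Y_i(0) \mid T_i = 1]$ shows that it equals the ATT plus a selection bias term, $E[Y_i(0) \mid T_i = 1] - E[Y_i(0) \mid T_i = 0]$. A sketch with invented potential outcomes (illustration only):

```python
# Hypothetical potential outcomes; treatment is NOT randomized here
y1 = [5, 7, 3, 9, 3, 6]
y0 = [3, 4, 2, 5, 3, 4]
T = [1, 1, 0, 1, 0, 0]

def mean(v):
    return sum(v) / len(v)

# Observed outcome: Y_i(1) if treated, Y_i(0) otherwise
y = [a if t else b for a, b, t in zip(y1, y0, T)]
naive = (mean([yi for yi, t in zip(y, T) if t])
         - mean([yi for yi, t in zip(y, T) if not t]))          # 7 - 3 = 4.0

att = mean([a - b for a, b, t in zip(y1, y0, T) if t])         # 3.0
selection_bias = (mean([b for b, t in zip(y0, T) if t])
                  - mean([b for b, t in zip(y0, T) if not t]))  # 4 - 3 = 1.0
# naive = att + selection_bias, and differs from the ATE of 2.0
```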

IV with heterogeneous populations
Remember that 2SLS estimates in two stages, such that
$$y_i = \beta_0 + \beta_{1i} x_i + u_i, \qquad x_i = \pi_0 + \pi_{1i} z_i + v_i, \qquad \hat\beta_1^{IV} = \frac{\operatorname{cov}(\hat x, y)}{\operatorname{var}(\hat x)}.$$
Thus, if $\pi_{1i} = 0$ for some parts of the population, these individuals are ignored! IV puts most of the weight on individuals for whom z has a large influence on x.

IV with heterogeneous populations
$$y_i = \beta_0 + \beta_{1i} x_i + u_i, \qquad x_i = \pi_0 + \pi_{1i} z_i + v_i$$
More specifically, assuming $\beta_{1i}$ and $\pi_{1i}$ are distributed independently of $(u_i, v_i, z_i)$, $E(u_i \mid z_i) = E(v_i \mid z_i) = 0$, and $E(\pi_{1i}) \neq 0$,
$$\hat\beta_1^{IV} \xrightarrow{p} \frac{E(\beta_{1i}\pi_{1i})}{E(\pi_{1i})} = \text{LATE} = \text{ATE} + \frac{\operatorname{cov}(\beta_{1i}, \pi_{1i})}{E(\pi_{1i})}.$$
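A simulation sketch of this result, under an assumed design in which half the population does not respond to z ($\pi_{1i} = 0$) and the responders also have larger effects of x on y. IV then converges to the average effect among responders (3 here) rather than the ATE (2 here). The last test line checks the decomposition LATE = ATE + cov/E(π) with sample moments.

```python
import numpy as np

rng = np.random.default_rng(14)
n = 200_000
# Assumed design: half the population is unaffected by z (pi1 = 0)
pi1 = rng.choice([0.0, 1.0], size=n, p=[0.5, 0.5])
# Responders have larger effects: beta1 about 3 if pi1 = 1, about 1 otherwise
beta1 = np.where(pi1 == 1.0, 3.0, 1.0) + rng.normal(scale=0.1, size=n)
z = rng.normal(size=n)
u, v = rng.normal(size=(2, n))
x = 0.5 + pi1 * z + v
y = 1.0 + beta1 * x + u

def cov(a, b):
    return np.mean((a - a.mean()) * (b - b.mean()))

beta_iv = cov(z, y) / cov(z, x)
late = np.mean(beta1 * pi1) / np.mean(pi1)  # E(beta pi) / E(pi), about 3
ate = np.mean(beta1)                        # about 2
```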