Practice 2SLS with Artificial Data Part 1

Similar documents
Suggested Answers Problem set 4 ECON 60303

Simultaneous Equations with Error Components. Mike Bronner Marko Ledic Anja Breitwieser

Problem Set 10: Panel Data

Instrumental Variables

Quantitative Methods Final Exam (2017/1)

****Lab 4, Feb 4: EDA and OLS and WLS

Controlling for Time Invariant Heterogeneity

Warwick Economics Summer School Topics in Microeconometrics Instrumental Variables Estimation

Lecture#12. Instrumental variables regression Causal parameters III

Fixed and Random Effects Models: Vartanian, SW 683

Lecture 3 Linear random intercept models

Testing methodology. It often the case that we try to determine the form of the model on the basis of data

Measurement Error. Often a data set will contain imperfect measures of the data we would ideally like.

Exercices for Applied Econometrics A

The Regression Tool. Yona Rubinstein. July Yona Rubinstein (LSE) The Regression Tool 07/16 1 / 35

Lecture 8: Instrumental Variables Estimation

Lab 07 Introduction to Econometrics

ESTIMATING AVERAGE TREATMENT EFFECTS: REGRESSION DISCONTINUITY DESIGNS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics

Econometrics. 8) Instrumental variables

Handout 11: Measurement Error

Handout 12. Endogeneity & Simultaneous Equation Models

4 Instrumental Variables Single endogenous variable One continuous instrument. 2

4 Instrumental Variables Single endogenous variable One continuous instrument. 2

Binary Dependent Variables

Problem Set 1 ANSWERS

Problem set - Selection and Diff-in-Diff

University of California at Berkeley Fall Introductory Applied Econometrics Final examination. Scores add up to 125 points

Problem Set 5 ANSWERS

An explanation of Two Stage Least Squares

Econometrics II Censoring & Truncation. May 5, 2011

ECON Introductory Econometrics. Lecture 17: Experiments

1 The basics of panel data

Mediation Analysis: OLS vs. SUR vs. 3SLS Note by Hubert Gatignon July 7, 2013, updated November 15, 2013

Autoregressive models with distributed lags (ADL)

Эконометрика, , 4 модуль Семинар Для Группы Э_Б2015_Э_3 Семинарист О.А.Демидова

General Linear Model (Chapter 4)

Problem Set #3-Key. wage Coef. Std. Err. t P> t [95% Conf. Interval]

(a) Briefly discuss the advantage of using panel data in this situation rather than pure crosssections

Specification Error: Omitted and Extraneous Variables

Spatial Regression Models: Identification strategy using STATA TATIANE MENEZES PIMES/UFPE

Longitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 2017, Chicago, Illinois

Final Exam. Question 1 (20 points) 2 (25 points) 3 (30 points) 4 (25 points) 5 (10 points) 6 (40 points) Total (150 points) Bonus question (10)

Please discuss each of the 3 problems on a separate sheet of paper, not just on a separate page!

Dealing With and Understanding Endogeneity

Ecmt 675: Econometrics I

Exam ECON3150/4150: Introductory Econometrics. 18 May 2016; 09:00h-12.00h.

Microeconometrics (PhD) Problem set 2: Dynamic Panel Data Solutions

Empirical Application of Simple Regression (Chapter 2)

Econ 836 Final Exam. 2 w N 2 u N 2. 2 v N

Monday 7 th Febraury 2005

ECON3150/4150 Spring 2016

Question 1 [17 points]: (ch 11)

1. The shoe size of five randomly selected men in the class is 7, 7.5, 6, 6.5 the shoe size of 4 randomly selected women is 6, 5.

Multiple Regression: Inference

CRE METHODS FOR UNBALANCED PANELS Correlated Random Effects Panel Data Models IZA Summer School in Labor Economics May 13-19, 2013 Jeffrey M.

Lab 6 - Simple Regression

Lecture#17. Time series III

Section I. Define or explain the following terms (3 points each) 1. centered vs. uncentered 2 R - 2. Frisch theorem -

Empirical Application of Panel Data Regression

Problem Set #5-Key Sonoma State University Dr. Cuellar Economics 317- Introduction to Econometrics

7 Introduction to Time Series

SOCY5601 Handout 8, Fall DETECTING CURVILINEARITY (continued) CONDITIONAL EFFECTS PLOTS

Econometrics Homework 4 Solutions

Applied Statistics and Econometrics

2.1. Consider the following production function, known in the literature as the transcendental production function (TPF).

Answer all questions from part I. Answer two question from part II.a, and one question from part II.b.

Lecture 4: Multivariate Regression, Part 2

Instrumental Variables, Simultaneous and Systems of Equations

ECON Introductory Econometrics. Lecture 16: Instrumental variables

Lecture 14. More on using dummy variables (deal with seasonality)

Final Exam. 1. Definitions: Briefly Define each of the following terms as they relate to the material covered in class.

Lecture 3: Multivariate Regression

Fortin Econ Econometric Review 1. 1 Panel Data Methods Fixed Effects Dummy Variables Regression... 7

Lab 10 - Binary Variables

Case of single exogenous (iv) variable (with single or multiple mediators) iv à med à dv. = β 0. iv i. med i + α 1

Section Least Squares Regression

sociology 362 regression

Lecture 4: Multivariate Regression, Part 2

Econometrics Homework 1

ECO220Y Simple Regression: Testing the Slope

Nonrecursive Models Highlights Richard Williams, University of Notre Dame, Last revised April 6, 2015

GMM Estimation in Stata

Applied Statistics and Econometrics

ECON 497 Final Exam Page 1 of 12

Practice exam questions

multilevel modeling: concepts, applications and interpretations

Linear Modelling in Stata Session 6: Further Topics in Linear Modelling

ECON Introductory Econometrics. Lecture 7: OLS with Multiple Regressors Hypotheses tests

Econ 371 Problem Set #6 Answer Sheet. deaths per 10,000. The 90% confidence interval for the change in death rate is 1.81 ±

ECON 836 Midterm 2016

Lecture 24: Partial correlation, multiple regression, and correlation

Reply to Manovskii s Discussion on The Limited Macroeconomic Effects of Unemployment Benefit Extensions

Day 2A Instrumental Variables, Two-stage Least Squares and Generalized Method of Moments

. *DEFINITIONS OF ARTIFICIAL DATA SET. mat m=(12,20,0) /*matrix of means of RHS vars: edu, exp, error*/

Control Function and Related Methods: Nonlinear Models

sociology sociology Scatterplots Quantitative Research Methods: Introduction to correlation and regression Age vs Income

Dynamic Panel Data Models

Rockefeller College University at Albany

David M. Drukker Italian Stata Users Group meeting 15 November Executive Director of Econometrics Stata

Correlation and Simple Linear Regression

Transcription:

Practice 2SLS with Artificial Data Part 1 Yona Rubinstein July 2016 Yona Rubinstein (LSE) Practice 2SLS with Artificial Data Part 1 07/16 1 / 16

Practice with Artificial Data In this note we use artificial data to illustrate how to approach a selection problem using 2SLS. The "constructed" data set "MG4A4 2SLS DIET EXAMPLE PART 1.dta" can be found on my website under "Practice 2SLS using Artificial Data". The data contains 1000 individuals each individual observed over 100 periods (days, weeks). We have information regarding their weight and on whether they are dating. Specifically the data contains their (1) permanent weight, (2) change in weight if diet is not taken, (3) whether they are on diet and (iv) whether they received an invitation to date. People are on diet for three reasons: (1) if their weight exceeds 220 pounds; (2) if they gained 5 pounds; (3) if they receive an invitation to date. The latter is exogenous to fluctuations in their current weight. Yona A person Rubinstein (LSE) loses 15 pounds Practice 2SLS while with being Artificial Data on Part diet. 1 07/16 2 / 16

The Model: The Production Function of Peorsons Weight The casual model exhibits the following form: Y it = β 0 + β D D it + U it. (1) The variable Y it is person s i weight (in pounds) in time t and D it is a binary indicator which equals 1 if person i is on diet. The error term (U it ) is a composition of person s i permanent weight (relative to the population mean, that is θ i ) and person s i specific time varying fluctuations to his weight: U it = θ i + ε it. (2) The parameters that I used to impute weights (Y it ) are: Y it = 0 10 D it + U it. (3) Yona Rubinstein (LSE) Practice 2SLS with Artificial Data Part 1 07/16 3 / 16

Selection into Diet - the Simplest Case People are on diet for three reasons: 1 If their weight, without diet, exceeds 220 pounds (100kg): β 0 + θ i + ε it 220. 2 If they gained 5 pounds (or more): ε it >= 5 3 If they receive an invitation to date: Date it = 1 We observe neighter vdate it nor ε it. Yet we observe whether the person is on diet and his weight. We know that person i is on diet if: D it = max [(β 0 + θ i + ε it 220), Date it, ε it ] > 0. (4) Yona Rubinstein (LSE) Practice 2SLS with Artificial Data Part 1 07/16 4 / 16

Data Describe the data set storage display value variable name type format label variable label id byte %8.0g person id number time byte %8.0g PWi float %9.0g Person's permanent weigh Eit float %9.0g episilon it date byte %8.0g shock to date value vdate float %9.0g date PWi/190 Date float %9.0g 1 if date==1 Diet float %9.0g 1 if on diet PWit float %9.0g PWi + Eit 10*Diet Yona Rubinstein (LSE) Practice 2SLS with Artificial Data Part 1 07/16 5 / 16

Data (cont.) Summary statistics > sum ; Variable Obs Mean Std. Dev. Min Ma id 10,000 50.5 28.86751 1 10 time 10,000 50.5 28.86751 1 10 PWi 10,000 189.5 28.86751 140 23 Eit 10,000 1.49e 07 13.81276 60.54 63.4 date 10,000.5004.5000248 0 vdate 10,000.4969684.5217212 1.257895.263157 Date 10,000.5004.5000248 0 Diet 10,000.7144.4517223 0 PWit 10,000 182.356 30.97509 84.08 268.0 Yona Rubinstein (LSE) Practice 2SLS with Artificial Data Part 1 07/16 6 / 16

Estimating the Regression Model We next estimate the model in equation (1) using OLS Y it = b 0 + b D D it + e it. (5) > eststo: reg PWit Diet ; Source SS df MS Number of obs = 10,000 + F(1, 9998) = 256.56 Model 240026.875 1 240026.875 Prob > F = 0.0000 Residual 9353574.21 9,998 935.54453 R squared = 0.0250 + Adj R squared = 0.0249 Total 9593601.08 9,999 959.456054 Root MSE = 30.587 PWit Coef. Std. Err. t P> t [95% Conf. Interval] + Diet 10.84626.6771461 16.02 0.000 9.51892 12.17361 _cons 174.6074.5723387 305.08 0.000 173.4855 175.7293 According to the OLS estimate b OLS D diet leads to a gain of 10 pounds in weight!! Yona Rubinstein (LSE) Practice 2SLS with Artificial Data Part 1 07/16 7 / 16

Estimating the Regression Model Controlling for Fixed Effects Next next turn to estimate the model using controlling for person fixed effects (θ i ): Y it = b 0 + b D D it + θ i + n it. (6) eststo: areg PWit Diet, absorb(id) ; Linear regression, absorbing indicators Number of obs = 10,000 F( 1, 9899) = 106.28 Prob > F = 0.0000 R squared = 0.8353 Adj R squared = 0.8336 Root MSE = 12.6344 PWit Coef. Std. Err. t P> t [95% Conf. Interval] + Diet 2.946265.285791 10.31 0.000 2.386056 3.506473 _cons 180.2512.2400997 750.73 0.000 179.7805 180.7218 + id F(99, 9899) = 491.888 0.000 (100 categories) Accounting for person fixed effects matter! the bias is smaller, yet still large enough (3 rather than 10). Yona Rubinstein (LSE) Practice 2SLS with Artificial Data Part 1 07/16 8 / 16

Estimating the Regression Model using 2SLS Next we turn to estimate the model using 2SLS using the following equations: 1 The first stage model: where V it is the error term. 2 The second tage model: where ˆD it = a0 OLS + a OLS Date it. We present the results in next silde. D it = a 0 + a D Date it + v it, (7) D Y it = b 0 + b D ˆD it + e it, (8) Yona Rubinstein (LSE) Practice 2SLS with Artificial Data Part 1 07/16 9 / 16

First Stage D it = a 0 + a D Date it + v it, (9) > reg Diet Date ; Source SS df MS Number of obs = 10,000 + F(1, 9998) = 6676.90 Model 816.979723 1 816.979723 Prob > F = 0.0000 Residual 1223.34668 9,998.12235914 R squared = 0.4004 + Adj R squared = 0.4004 Total 2040.3264 9,999.204053045 Root MSE =.3498 Diet Coef. Std. Err. t P> t [95% Conf. Interval] + Date.5716573.006996 81.71 0.000.5579438.5853708 _cons.4283427.0049489 86.55 0.000.4186419.4380435 Yona Rubinstein (LSE) Practice 2SLS with Artificial Data Part 1 07/16 10 / 16

Second Stage (without correcting SE). Y it = b 0 + b D ˆD it + e it, (10) > eststo: reg PWit Diethat ; Source SS df MS Number of obs = 10,000 + F(1, 9998) = 68.61 Model 65389.6299 1 65389.6299 Prob > F = 0.0000 Residual 9528211.45 9,998 953.011748 R squared = 0.0068 + Adj R squared = 0.0067 Total 9593601.08 9,999 959.456054 Root MSE = 30.871 PWit Coef. Std. Err. t P> t [95% Conf. Interval] + Diethat 8.94641 1.080049 8.28 0.000 11.06352 6.829296 _cons 188.7473.8310522 227.12 0.000 187.1183 190.3763 Yona Rubinstein (LSE) Practice 2SLS with Artificial Data Part 1 07/16 11 / 16

Estimating the 2SLS using "ivreg" Note that we obtain identical point estimates. The standard errors were corrected to account for using a projected variable ˆD it. Using dates as an instrument allows to correct of selection on "LHS" variable. > eststo: ivreg PWit (Diet=Date) ; Instrumental variables (2SLS) regression Source SS df MS Number of obs = 10,000 + F(1, 9998) = 64.39 Model 559270.763 1 559270.763 Prob > F = 0.0000 Residual 10152871.8 9,998 1015.49028 R squared =. + Adj R squared =. Total 9593601.08 9,999 959.456054 Root MSE = 31.867 PWit Coef. Std. Err. t P> t [95% Conf. Interval] + Diet 8.94641 1.114891 8.02 0.000 11.13182 6.761 _cons 188.7473.8578613 220.02 0.000 187.0657 190.4289 Yona Rubinstein (LSE) Practice 2SLS with Artificial Data Part 1 07/16 12 / 16

Controlling for Fixed Effects We can take advantage of the panel data and further control for omitted person fixed effects. 1 The first stage model controls for person fixed effects (γ i ): where V it is the error term. D it = a 0 + a D Date it + γ i + v it, (11) 2 The second stage model also controls for person fixed effects (δ i ): where ˆD it = a0 OLS + a OLS Date it + γ OLS i We present the results in next silde. Y it = b 0 + b D ˆD it + δ i + e it, (12) D Yona Rubinstein (LSE) Practice 2SLS with Artificial Data Part 1 07/16 13 / 16

Estimating the 2SLS controlling for person fixed effects using "xtivreg" Person fixed effects are part of the causal model. While our instrument provides a source of exogenous variation - controlling for persons fixed effects does not heart. > eststo: xtivreg PWit (Diet=Date) ; G2SLS random effects IV regression Number of obs = 10,000 Group variable: id Number of groups = 100 R sq: Obs per group: within = 0.0106 min = 100 between = 0.3901 avg = 100.0 overall = 0.0250 max = 100 Wald chi2(1) = 385.69 corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000 PWit Coef. Std. Err. z P> z [95% Conf. Interval] + Diet 9.548751.486211 19.64 0.000 10.50171 8.595795 _cons 189.1776 2.624326 72.09 0.000 184.034 194.3212 Yona Rubinstein (LSE) Practice 2SLS with Artificial Data Part 1 07/16 14 / 16

Estimating the Model using OLS, FE and 2SLS OLS FE OLS2ND 2SLS 2SLS w/fe (1) (2) (3) (4) (5) Dependent variable: "Weight" Diet 10.846*** 2.946*** 8.946*** 8.946*** 9.549*** (0.677) (0.286) (1.080) (1.115) (0.486) Constant 175*** 180*** 189*** 189*** 189*** (0.572) (0.240) (0.831) (0.858) (2.624) First Stage: Dependent variable "Diet" Date 0.572*** 0.572*** 0.571*** (0.007) (0.007) (0.007) 0.428*** 0.428*** 0.429*** (0.005) (0.005) (0.005) R square 0.400 0.400 0.390 Observations 10000 10000 10000 10000 10000 Yona Rubinstein (LSE) Practice 2SLS with Artificial Data Part 1 07/16 15 / 16

Take Home Message We can identify the causal impact of a treatment on an outcome of interest accounting for selection into treatment if we have a variable that 1 Is uncorrelated with the unobserved component in the outcome equation (the error term); 2 Does not affect directly the outcome of interest Controlling for subjects fixed effects might eliminate some of the bias but not all as long as subject self-sort into treatment on time varying unobservables. Yona Rubinstein (LSE) Practice 2SLS with Artificial Data Part 1 07/16 16 / 16