Capital humain, développement et migrations: approche macroéconomique (Empirical Analysis - Static Part)


Séminaire d'Analyse Economique III (LECON2486)
Capital humain, développement et migrations: approche macroéconomique (Empirical Analysis - Static Part)
Frédéric Docquier & Sara Salomone, IRES, UCLouvain
February-June 2015

STATA
To carry out the empirical analysis you should be at least a STATA beginner, which means:
- Having access to STATA (the commands used in this course are from STATA 11)
- Being able to open it and recognize its windows and icons
- Being familiar with basic commands (such as list, des, sum, gen...)
- Being able to use and save a dataset
- Knowing how to prepare a do-file
If a command is unknown or unclear, do not hesitate to ask STATA itself with the help command, or to ask me (sara.salomone@uclouvain.be).


The dataset
The dataset at your disposal, a .dta file, is made up of panel and cross-sectional variables. Panel variables (migration stocks, for example) have two dimensions:
1 The geographical one (195 countries), where i refers to each country
2 The year of interest (9 periods from 1970 to 2010, one every 5 years), where t refers to the time dimension
Cross-sectional variables only have dimension 1: they are constant over time for each country (the proportion of religious groups, for example).

The panel dataset
Advantages of panel data with respect to cross-sectional data:
- You can identify dynamic aspects
- You can control for unobserved heterogeneity that is constant over time
- You can address endogeneity issues with internal instruments as well (lagged values of the independent variables)

Data Editor in STATA

Do file in STATA

Panel Data in STATA
The tsset command declares the data in memory to be a panel and allows you to use the time-series lag operator L. In the dataset at your disposal, 1 time lag corresponds to 5 years, since you have data for 1970, 1975, 1980, 1985, 1990, 1995, 2000, 2005 and 2010.
STEPS TO ORGANIZE THE PANEL DATA IN STATA:
    egen year_idbis = group(year)      (numbers the periods 1 to 9, so that STATA treats a five-year span as 1 lag)
    tsset newcountry year_idbis        (declares the data in memory to be a panel)
    gen lagvar = L.var                 (creates a 1-period lagged variable)
    gen lag2var = L2.var               (creates a 2-period lagged variable)
Here var stands for the name of any panel variable in the dataset.
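The steps above can be tried on a small artificial panel before touching the course dataset. The sketch below is only an illustration: the two countries ("AAA", "BBB") and the variable stock are invented, and only the command sequence mirrors the slide.

```stata
* Toy panel: 2 hypothetical countries, 9 five-year periods (1970-2010)
clear
set obs 18
gen str3 country = cond(_n <= 9, "AAA", "BBB")
bysort country: gen year = 1970 + 5*(_n - 1)
gen stock = _n                      // placeholder values, not real data

* Numeric identifiers for country and period
egen newcountry = group(country)
egen year_idbis = group(year)       // 1970 -> 1, 1975 -> 2, ..., 2010 -> 9

* Declare the panel and build lags (1 lag = 5 calendar years)
tsset newcountry year_idbis
gen lag_stock  = L.stock
gen lag2_stock = L2.stock
list country year stock lag_stock lag2_stock, sepby(country)
```

The first observation of each country has a missing lag_stock, and the first two have a missing lag2_stock, because no earlier period exists for them.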

Panel Data description in STATA
- To see whether the panel is balanced or unbalanced: xtdescribe
- To get an idea of the overall (total) variability: xtsum varlist (the output also reports the between and within standard deviations)
- To decompose the total variability into the within, or intra-individual, variability (σ_e) and the between, or inter-individual, variability (σ_u): xtreg varlist, fe
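As a rough illustration of these three commands, the sketch below uses the Grunfeld investment panel that ships with STATA's online example datasets, not the course dataset. The variable names company, year, invest, mvalue and kstock belong to that example file, and webuse needs an internet connection.

```stata
* Example panel shipped with STATA (10 firms, 20 years)
webuse grunfeld, clear
xtset company year

* Balanced or unbalanced?
xtdescribe

* Overall, between and within standard deviations
xtsum invest mvalue kstock

* sigma_u and sigma_e are reported at the bottom of the FE output
xtreg invest mvalue kstock, fe
```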

The model
The causal relationship between y and z can be identified through two models to which instrumental variable techniques are applied:
1 STATIC MODEL:
$$y_{i,t} = \alpha_0 + \alpha_1 \underbrace{z_{i,t}}_{\text{endogenous}} + \gamma x_{i,t} + \varepsilon_{i,t} \qquad (1)$$
2 DYNAMIC MODEL:
$$y_{i,t} = \alpha_0 + \alpha_1 \underbrace{z_{i,t}}_{\text{endogenous}} + \beta \underbrace{y_{i,t-1}}_{\text{endogenous}} + \gamma x_{i,t} + \varepsilon_{i,t} \qquad (2)$$
This course focuses only on static models with endogeneity.

Endogeneity
$z_{i,t}$ is presumably endogenous (i.e. correlated with $\varepsilon_{i,t}$) for at least one of the following reasons:
1 Reverse causality or economic simultaneity: $z_{i,t}$ is generated inside the same economic system that generates $y_{i,t}$
2 Omitted variable: the original model is misspecified because at least one explanatory variable is missing
3 Measurement error: $z_{i,t}$ is measured with error
If this is the case, the OLS estimate of $\alpha_1$ is biased and inconsistent, so an instrumentation strategy needs to be implemented.
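To see where the inconsistency comes from, it may help to write down the probability limit of the OLS slope in the simplest possible case. The expression below assumes a regression of y on z alone (no controls $x_{i,t}$), which is a simplification made here for illustration and not the course model.

```latex
\operatorname{plim}\ \hat{\alpha}_1^{OLS}
  = \alpha_1 + \frac{\operatorname{Cov}(z_{i,t}, \varepsilon_{i,t})}{\operatorname{Var}(z_{i,t})}
```

The second term vanishes only when $z_{i,t}$ is uncorrelated with $\varepsilon_{i,t}$; otherwise the estimate does not converge to $\alpha_1$ however large the sample.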

Instrument validity
An instrument is an external variable (equal neither to $z_{i,t}$ nor to $x_{i,t}$) or an internal one (lagged values of $z_{i,t}$) which has to be:
1 Relevant: correlated with the endogenous variable $z_{i,t}$
2 Exogenous: uncorrelated with the error term $\varepsilon_{i,t}$
Unfortunately, exogeneity cannot be checked statistically. For relevance, several first-stage results and identification statistics should be taken into consideration.

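Instrumentation tests
The instrumentation strategy is supported if:
- The first-stage F-statistic is higher than 10 (as a rule of thumb)
- The weak identification test statistic exceeds the Stock-Yogo critical value (16.38 for a 10% maximal IV size with one endogenous regressor and one instrument)
- The Hausman, or endogeneity, test rejects H_0: $z_{i,t}$ is exogenous, which confirms that $z_{i,t}$ needs to be instrumented
- The Hansen J-test does not reject H_0: the over-identifying restrictions are valid. This test is only available when there are more instruments than endogenous variables.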
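The statistics listed above all appear in the output of the user-written ivreg2 command. The sketch below only shows where to find them: it uses STATA's built-in auto dataset, and the choice of price as outcome, mpg as endogenous regressor, and weight and length as instruments is arbitrary and has no economic meaning. It assumes that ivreg2 and its companion ranktest have been installed from SSC.

```stata
* One-off installation of the user-written commands (uncomment if needed)
* ssc install ivreg2
* ssc install ranktest

sysuse auto, clear

* mpg treated as endogenous, instrumented by weight and length (illustrative only)
* robust   : heteroskedasticity-robust standard errors
* ffirst   : first-stage summary (F-statistic, weak identification statistics)
* endog(.) : endogeneity test of the listed regressor (H0: exogenous)
ivreg2 price (mpg = weight length) foreign, robust ffirst endog(mpg)
```

With two instruments for one endogenous regressor the equation is over-identified, so the Hansen J statistic is also reported at the bottom of the output.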

Static Model: $y_{i,t} = \alpha_0 + \alpha_1 z_{i,t} + \gamma x_{i,t} + \varepsilon_{i,t}$
It captures the long-run causal relationship between $z_{i,t}$ and $y_{i,t}$, where $z_{i,t}$ needs to be instrumented if it is endogenous.
ESTIMATION TECHNIQUES:
1 Pooled OLS (POLS)
2 Fixed effect (FE) estimation
3 Random effect (RE) estimation

POLS properties
Estimation commands:
    regress y z X                                  (without instruments)
    ivreg2 y (z = instr) X, robust ffirst endog(z) (with instruments)
where robust applies the heteroskedasticity correction, ffirst reports the instrument tests, and endog(z) runs the Hausman (endogeneity) test.
This standard multiple regression model implies:
1 A homogeneous behaviour of the different countries in both slope and intercept. This can be checked graphically:
    twoway (scatter y z) (lfit y z)

POLS properties
2 Homoskedasticity (the error variance is constant). The Breusch-Pagan test and a graphical analysis detect violations:
    quietly regress y z X            (runs the estimation without showing the table)
    predict yhat                     (predicts and stores the fitted values of y)
    predict residuals, res           (predicts and stores the residuals)
    estat hettest                    (H_0: constant variance)
    twoway scatter residuals yhat    (a random scatter suggests homoskedasticity)
To correct for heteroskedasticity:
    regress y z X, robust
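The sequence above can be run end to end on STATA's built-in auto dataset. The regression of price on mpg and weight is an arbitrary choice made for this sketch and is not part of the course material.

```stata
sysuse auto, clear

* Estimate quietly, then store fitted values and residuals
quietly regress price mpg weight
predict yhat, xb
predict res, residuals

* Breusch-Pagan test (H0: constant variance)
estat hettest

* Residuals against fitted values: a fan shape points to heteroskedasticity
twoway scatter res yhat, yline(0)

* Same coefficients, heteroskedasticity-robust standard errors
regress price mpg weight, vce(robust)
```

The robust option changes only the standard errors; the point estimates are identical to those of the plain regress.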

POLS properties
3 Errors are serially uncorrelated. To detect serial correlation, perform an LM-type test on the lagged residuals to see whether they are correlated with the contemporaneous ones:
    gen lagres = L.residuals         (creates the first lag of the residuals)
    regress residuals z X lagres     (if the coefficient on lagres is significant at 1%, there is autocorrelation)
    testparm lagres                  (H_0: the coefficient on lagres is 0)
4 Valid model specification. The model should include all the relevant variables and exclude irrelevant ones:
    quietly regress y z X
    estat ovtest                     (H_0: the model has no omitted variables)
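A sketch of checks 3 and 4, again on the Grunfeld example panel rather than the course dataset (invest, mvalue and kstock are variables of that example file; webuse needs an internet connection):

```stata
webuse grunfeld, clear
xtset company year

* Pooled OLS and the RESET test (H0: no omitted variables)
quietly regress invest mvalue kstock
estat ovtest

* Residuals and their first lag within each firm
predict res, residuals
gen lagres = L.res

* Auxiliary regression: is the lagged residual significant?
regress res mvalue kstock lagres
testparm lagres
```

The lag is taken within each firm because the data were declared as a panel with xtset before L. was used.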

From overall to within and between variability
- The restriction implied by the hypothesis of homogeneous behaviour of the different countries, in both slope and intercept, is rarely admissible: different countries in different historical phases may have followed different policies
- The panel structure, which distinguishes within and between variability (in the individual dimension, the temporal dimension, or both), is fully ignored
- A misleading non-linearity may result from POLS (which amounts to treating the data as one large cross-section of n i.i.d. observations)

FE estimation (one way)
- Fixed effect estimation (one way) exploits the within (or intra-country) variability
- The within variability is the effect on $y_{i,t}$ of the deviation of $z_{i,t}$ from its mean over time
- As a result, variables which are constant over time (e.g. language, small-island status, colonial links, latitude/longitude...) are removed
- The individual effect can be correlated with the $z_{i,t}$ and $x_{i,t}$ variables
Estimation commands:
    xtreg y z X, fe vce(robust)                                     (without instruments)
    xtivreg y (z = instr1 instr2 instr3) X, fe vce(robust) first    (with instruments)
where vce(robust) applies the heteroskedasticity correction and first reports the first-stage results used for the instrument tests. The slide writes the commands as xtregress and xtivregress; the STATA command names are xtreg and xtivreg, and the options each accepts depend on the STATA version.
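The reason why time-invariant variables drop out can be made explicit with the within transformation. The notation $\mu_i$ for the country effect and the bars for country means over time are introduced here for illustration; they do not appear on the slides.

```latex
% Static model with a country-specific effect \mu_i
y_{i,t} = \alpha_0 + \alpha_1 z_{i,t} + \gamma x_{i,t} + \mu_i + \varepsilon_{i,t}

% Average over time for each country i
\bar{y}_{i} = \alpha_0 + \alpha_1 \bar{z}_{i} + \gamma \bar{x}_{i} + \mu_i + \bar{\varepsilon}_{i}

% Subtract: \alpha_0, \mu_i and every time-invariant regressor cancel
y_{i,t} - \bar{y}_{i}
  = \alpha_1 (z_{i,t} - \bar{z}_{i})
  + \gamma (x_{i,t} - \bar{x}_{i})
  + (\varepsilon_{i,t} - \bar{\varepsilon}_{i})
```

OLS on the last equation gives the FE estimates of $\alpha_1$ and $\gamma$. Since $\mu_i$ has been differenced out, it may be correlated with the regressors without biasing them.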

Help xtreg

FE estimation (one way) and (two ways)
The same FE estimates can be obtained through a least squares dummy variable (LSDV) model:
1 One way model: includes only one set of dummy variables (country)
2 Two ways model: includes two sets of dummy variables (country and year)
To create country and year dummies in STATA:
    egen country_id = group(country)
    xi i.country, prefix(_C)
    egen year_id = group(year)
    xi i.year, prefix(_Y)
LSDV ESTIMATION (two ways):
1   reg y z X _C* _Y*, robust                                  (without instruments)
2   ivreg2 y (z = instr1 instr2 instr3) X _C* _Y*, robust      (with instruments)
where _C* are the country fixed effects and _Y* the year fixed effects.
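The equivalence between FE and LSDV can be checked directly. The sketch below uses the Grunfeld example panel and factor-variable notation (i.company, i.year, available from STATA 11) instead of the xi prefixes shown above; both choices are made here for brevity and are not taken from the slides.

```stata
webuse grunfeld, clear
xtset company year

* One way: within estimator versus firm dummies
xtreg invest mvalue kstock, fe
regress invest mvalue kstock i.company

* Two ways: add year dummies to both
xtreg invest mvalue kstock i.year, fe
regress invest mvalue kstock i.company i.year
```

Within each pair, the coefficients on mvalue and kstock coincide; only the way the fixed effects are handled (swept out or estimated as dummies) differs.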

RE estimation
- The individual effects are considered random (a plausible assumption when there are many individuals randomly drawn from a large population and the specific nature of the individual heterogeneity is unknown)
- The individual effect is part of the model's error, so it must be uncorrelated with the $z_{i,t}$ and $x_{i,t}$ variables
Estimation commands:
    xtreg y z X, re theta vce(robust)                                           (without instruments)
    xtivreg y (z = instr1 instr2 instr3) X, re theta vce(robust) ffirst endog(z)  (with instruments)
where theta reports the heterogeneity parameter θ. The slide writes the commands as xtregress and xtivregress; the STATA command names are xtreg and xtivreg. Note that ffirst and endog() are options of the user-written ivreg2 family, so check help xtivreg before using them here.

POLS, FE or RE?
- If θ = 1, RE coincides with FE (maximum heterogeneity)
- If θ = 0, RE coincides with POLS (little heterogeneity)
POLS or RE?
    qui xtreg y z X, re
    xttest0            (Breusch-Pagan test for RE, where H_0: no individual heterogeneity)
If H_0 is rejected, RE estimates must be used instead of POLS.
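The two limiting cases follow from the textbook expression for θ in a balanced panel with T periods. The formula is not on the slides and is given here only as a reminder; $\bar{y}_i$ denotes the country mean over time, and $\sigma_e$, $\sigma_u$ are the within and between standard deviations reported by xtreg.

```latex
\theta = 1 - \sqrt{\frac{\sigma_e^2}{\sigma_e^2 + T\,\sigma_u^2}}

% RE is OLS on the quasi-demeaned data
y_{i,t} - \theta\,\bar{y}_{i}
```

When $\sigma_u^2 = 0$ there is no individual heterogeneity, θ = 0, and nothing is subtracted (POLS). When $T\sigma_u^2$ is very large relative to $\sigma_e^2$, θ approaches 1 and the full country mean is subtracted (FE).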

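FE or RE?
Weaknesses of FE:
1 It cannot estimate the effect of time-invariant variables
2 The residuals are autocorrelated
Weaknesses of RE:
1 It assumes that the individual effect is exogenous
2 Heteroskedasticity
Hausman test by hand to select the best option:
    qui xtreg y z X, fe
    est store FE
    qui xtreg y z X, re
    est store RE
    hausman FE RE, sigmamore    (H_0: the difference in coefficients is not systematic)
If H_0 is rejected, the RE assumption is not tenable and FE should be preferred; otherwise RE is consistent and more efficient.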
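The full model-selection sequence, from the Breusch-Pagan test to the Hausman test, can be sketched on the Grunfeld example panel. The dataset and its variables (invest, mvalue, kstock) are assumptions of this example, not the course data, and webuse needs an internet connection.

```stata
webuse grunfeld, clear
xtset company year

* POLS or RE? (H0: no individual heterogeneity)
quietly xtreg invest mvalue kstock, re
xttest0

* FE or RE? Store both sets of estimates, then compare them
quietly xtreg invest mvalue kstock, fe
estimates store FE
quietly xtreg invest mvalue kstock, re
estimates store RE
hausman FE RE, sigmamore
```

In hausman, the first name is the estimator that stays consistent under both hypotheses (FE) and the second is the one that is efficient under H_0 (RE).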