Asymptotic Properties and simulation in gretl

Similar documents
Instrumental Variables

Large Sample Properties & Simulation

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

4 Instrumental Variables Single endogenous variable One continuous instrument. 2

Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE

Econometrics I. Ricardo Mora

Probit Estimation in gretl

Testing Hypothesis after Probit Estimation

4 Instrumental Variables Single endogenous variable One continuous instrument. 2

ECON Introductory Econometrics. Lecture 16: Instrumental variables

Warwick Economics Summer School Topics in Microeconometrics Instrumental Variables Estimation

Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes

8. Instrumental variables regression

Homoskedasticity. Var (u X) = σ 2. (23)

Econometrics Summary Algebraic and Statistical Preliminaries

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables

Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16)

Final Exam. Economics 835: Econometrics. Fall 2010

Multiple Linear Regression

Linear Regression. Junhui Qian. October 27, 2014

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Econometrics Problem Set 11

Lecture 8: Instrumental Variables Estimation

Applied Quantitative Methods II

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Instrumental Variables

1 Motivation for Instrumental Variable (IV) Regression

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012

ECONOMETRICS FIELD EXAM Michigan State University May 9, 2008

Essential of Simple regression

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Econometrics Master in Business and Quantitative Methods

We begin by thinking about population relationships.

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors

Rewrap ECON November 18, () Rewrap ECON 4135 November 18, / 35

Applied Econometrics (QEM)

Applied Statistics and Econometrics. Giuseppe Ragusa Lecture 15: Instrumental Variables

Instrumental Variables, Simultaneous and Systems of Equations

INTRODUCTION TO BASIC LINEAR REGRESSION MODEL

Dealing With Endogeneity

Economics 582 Random Effects Estimation

1 Appendix A: Matrix Algebra

3 Multiple Linear Regression

4.8 Instrumental Variables

dqd: A command for treatment effect estimation under alternative assumptions

Dynamic Panel Data Models

Statistics II. Management Degree Management Statistics IIDegree. Statistics II. 2 nd Sem. 2013/2014. Management Degree. Simple Linear Regression

08 Endogenous Right-Hand-Side Variables. Andrius Buteikis,

Introductory Econometrics

Recitation 1: Regression Review. Christina Patterson

Sample Problems. Note: If you find the following statements true, you should briefly prove them. If you find them false, you should correct them.

Regression and Statistical Inference

Economics 583: Econometric Theory I A Primer on Asymptotics

Maria Elena Bontempi Roberto Golinelli this version: 5 September 2007

Intermediate Econometrics

Review of Econometrics

Applied Econometrics (QEM)

Econometrics. 8) Instrumental variables

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

Simple Linear Regression

Statistical Inference with Regression Analysis

Simultaneous Equations with Error Components. Mike Bronner Marko Ledic Anja Breitwieser

UNIVERSIDAD CARLOS III DE MADRID ECONOMETRICS FINAL EXAM (Type B) 2. This document is self contained. Your are not allowed to use any other material.

Generalized Method of Moments (GMM) Estimation

Econometrics - 30C00200

Applied Economics. Panel Data. Department of Economics Universidad Carlos III de Madrid

Chapter 6: Endogeneity and Instrumental Variables (IV) estimator

Linear models. Linear models are computationally convenient and remain widely used in. applied econometric research

ECON3150/4150 Spring 2015

Multiple Regression Analysis: Inference MULTIPLE REGRESSION ANALYSIS: INFERENCE. Sampling Distributions of OLS Estimators

Link to Paper. The latest iteration can be found at:

Econometrics of Panel Data

The Statistical Property of Ordinary Least Squares

ECO375 Tutorial 8 Instrumental Variables

Econometric Methods and Applications II Chapter 2: Simultaneous equations. Econometric Methods and Applications II, Chapter 2, Slide 1

The Simple Linear Regression Model

Dynamic Panels. Chapter Introduction Autoregressive Model

Applied Economics. Regression with a Binary Dependent Variable. Department of Economics Universidad Carlos III de Madrid

Maximum Likelihood (ML) Estimation

Practice exam questions

Topics in Applied Econometrics and Development - Spring 2014

Econometrics II - EXAM Answer each question in separate sheets in three hours

Applied Statistics and Econometrics

ECON Introductory Econometrics. Lecture 5: OLS with One Regressor: Hypothesis Tests

1. You have data on years of work experience, EXPER, its square, EXPER2, years of education, EDUC, and the log of hourly wages, LWAGE

Specification testing in panel data models estimated by fixed effects with instrumental variables

Business Statistics. Lecture 10: Correlation and Linear Regression

ECO220Y Simple Regression: Testing the Slope

Multivariate Regression Analysis

Endogeneity. Tom Smith

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,

Freeing up the Classical Assumptions. () Introductory Econometrics: Topic 5 1 / 94

x i = 1 yi 2 = 55 with N = 30. Use the above sample information to answer all the following questions. Show explicitly all formulas and calculations.

Heteroskedasticity and Autocorrelation

Lecture 4: Heteroskedasticity

Econometrics 2, Class 1

Simultaneous Equation Models

Short Questions (Do two out of three) 15 points each

Single-Equation GMM: Endogeneity Bias

Handout 11: Measurement Error

Transcription:

Asymptotic Properties and simulation in gretl Quantitative Microeconomics R. Mora Department of Economics Universidad Carlos III de Madrid

Outline 1 Asymptotic Results for OLS 2 3 4 5

Classical Assumptions Gauss-Markov Assumptions: A1: Linearity: y = β + β 1 x 1 +... + β k x k + v A2: Random Sampling A3: Conditional Mean Independence: E[y x] = β 0 + β 1 x 1 +... + β k x k A4: Invertibility of Variance-covariance Matrix A5: Homoskedasticity: Var[v x] = σ 2 Normality A6: Normality: y x N(β 0 + β 1 x 1 +... + β k x k,σ 2 )

Asymptotic Properties for OLS (1/2) Consistency Under Gauss-Markov A1-A4, plim( ˆβ j ) = β j Asymptotic normality (CLT): strong version Under Gauss-Markov A.1 to A.5: n 1 ˆβ /2 j β j ( N(0,1)as n where a σ/a j ² = plim 1 j n i res ˆ ji ² ) Asymptotic eciency Under Gauss-Markov A.1 to A5, OLS is asymptotically ecient in the class of linear estimators

Asymptotic Properties for OLS (2/2) Asymptotic normality (CLT): weak version Under A.1 to A.4: n 1 /2 ( ˆβj β j ) N ( 0, n Avar( ˆβ ) j ) as n but OLS is not longer asymptotically ecient From the CLTs where plim(se( ˆβ j )) = Avar( ˆβ j ) t = ˆβ j β j se( ˆβ N (0,1) as n j)

Suppose that A3 does not hold y = β 0 + β 1 x + u but cov(x, u) 0 OLS is such that consistent with a false property cov ˆ N (x, y ˆβ 0 ˆβ 1 x) = 0 Example: wages = β 0 + β 1 education + u { ˆβ0, ˆβ } 1 is those with higher ability are more likely to go to college and have higher wages: cov(educ, u) 0 ˆβ 1 would overestimate the eect of going to college by the eect of ability on education we want to use in the sample a property which is true for the population

Instruments y = β 0 + β 1 x + u cov(x, u) 0 An instrument z is a variable whose inuence on the dependent variable is only via a control z is relevant in the sense that it correlates with controls: cov(x, z) 0 z is exogenous in the sense that controls capture all its eects on the dependent variable: cov(u, z) = 0 each exogenous control is an instrument of itself

: The Basic Idea y = β 0 + β 1 x + u cov(x, u) 0 (OLS is inconsistent) cov(x, z) 0 (z is relevant) cov(z, u) = 0 (z is exogenous) cov(y,z) = β 1 cov(x, z) β 1 = cov(y,z) cov(x,z) we use in the sample a property which is true for the population ˆβ 1 IV = cov ˆ N(y i,z i ) cov ˆ N (x i,z i )

in the General Case y 1 = β 0 + β 1 z 1 + β 2 y 2 + u cov(y 2, u) 0 z 1 is a set of k 1 exogenous variables: cov(z 1, u) = 0 y 2 is a set of k 2 endogenous variables, but there is an instrument for each endogenous variable in y 2, cov(z 2, u) = 0 the system of k 1 + k 2 + 1 linear equations ( cov ˆ N z 1i, y 1i ˆβ 0 ˆβ 1 z 1i + ˆβ ) 2 y 2i = 0 ( cov ˆ N z 2i, y 1i ˆβ 0 ˆβ 1 z 1i + ˆβ ) 2 y 2i = 0 ( mean ˆ N uniquely identies { ˆβ IV 0 y 1i ˆβ 0 ˆβ 1 z 1i + ˆβ ) 2 y 2i = 0 } IV IV, ˆβ 1, ˆβ 2

2SLS Assumptions Gauss-Markov Assumptions 2SLS1: Linearity: y = β + β 1 x 1 +... + β k x k + v 2SLS2: Random Sampling 2SLS3: Exogeneity: cov(u, z) = 0 2SLS4: Rank condition: (i) there are no perfect linear relations among the instruments. (ii) The invertibility (relevance) condition holds. 2SLS5: Homoskedasticity: var[v z] = σ 2

2SLS Large Sample Results Theorem Under 2SLS1-2SLS4, 2SLS is consistent Theorem Under 2SLS1-2SLS5, 2SLS is asymptotically normal and asymptotically ecient in the class of IV estimators Theorem Under 2SLS1-2SLS4, 2SLS is consistent and asymptotically normal

Some Properties of 2SLS IV standard errors tend to be larger than OLS standard errors the stronger the correlation between z and x, the smaller the IV standard errors getting non-signicant results using IV may simply be a problem of poor instruments

Testing after 2SLS t-tests: under H 0 : β j = 0 t = IV ˆβ j se(ˆβ IV j ) a N(0,1) it is possible to test for multiple linear hypothesis Hausman test for endogeneity H 0 : OLS is consistent a t-test for endogeneity: First step: regress y2 on all z1 and z2 and compute residual ˆv Second step: OLS y1 on z1 and y2 AND ˆv. Under the null, the slope for ˆv should not be signicant The Sargan test tests overidentifying restrictions

A simple example: estimating a demand function a supply and demand system of equations supply function: q = γ 0 + β s p + γx s + u s demand function: q = α 0 + β d p + αx d + u d At equilibrium, q = q(x s, x d, u s, u d ), p = p(x s, x d, u s, u d ) Note that cov(p, u d ) 0 (OLS is inconsistent) identication of β d using a supply shifter cov(x s, p) 0(relevance) (because p is a function of x s ) cov(x s, u d ) = 0(exogeneity) (otherwise, x s supply shifter) is not really a

A Graphical Interpretation of Identication of Demand

A simple Monte Carlo experiment log(wages) = 10 + 0.05 D + u, u N(0,1), D = 1 with prob. 0.3 1 draw N realizations of D 2 draw N realizations of u 3 compute log(wages) 4 OLS log(wages) on D and store β r 1 5 replicate step 1 to 4 R times 6 examine the empirical distribution of β r 1

How do we draw N realizations of D and u? A random number generator is a device designed to generate a sequence of numbers, called pseudo-random numbers, that appear random there are two main methods: 1 using a physical random phenomenon (i.e. sunspots) 2 using a computer the latter type are determined by a shorter initial number given to the computer, known as the seed controlling the seed is useful: it permits replication

Pseudo-random numbers of the uniform many econometric packages provide pseudo-random numbers from the uniform distribution between 0 and 1 uniform values between 0 and 1 can be used to generate random numbers of any desired distribution how? by passing them through the inverse cdf of the desired distribution ( x ) N µ,σ 2 1 generate the uniform U(0,1) : u 2 generate the standard normal N (0,1) : z = Φ 1 (u) 3 compute x = µ + σ u

Multivariate normal pseudo-random numbers Any multivariate normal [ ] x1 N x 2 can be expressed as [ x1 A is such that [ u1 u 2 ] N x 2 ([ µ1 ] [ µ1 = µ 2 [ σ 2 1 σ 12 ([ 0 0 σ 12 σ2 2 ], [ 1 0 0 1 µ 2 ] [ σ 2, 1 σ 12 σ 12 σ2 2 ] [ u1 + A u 2 ] ]) ] = AA T (Cholesky decomposition) ])

Commands to generate random numbers uniform: draws a series of iid values from the uniform distribution normal: draws from the uniform distribution genpois: draws from the poisson distribution randgen: all purpuse random number generator We are going to predominantly use uniform and normal

uniform(#a,#b) generates values from the uniform in the interval (a, b)by default, in the interval (0,1) Example nulldata 500 # "blank" data set with 500 obs. set seed 2703 # sets the seed for replicability genr x = 100 * uniform(-1,1)

normal(#µ,#σ ) generates values from the normal N ( µ,σ 2 ) by default, the N(0, 1) Example 1 genr z = normal(5,2) Example 2: conditional normal distribution genr x1 = 20+5*uniform(-1,1)+1.3*normal() genr u = uniform(-1,1)+3*normal() genr y = 2 + 3 * x1 + 3*u

Sample Covariance of Any Two Variables Suppose that we have two random variables, x 1 and x 2, with the following properties [ ] ([ ] [ ]) x1 0 1 0.5 N, 0 0.5 1 x 2 Suppose that we estimate the covariance with samples of size N = 5,50,500,5000 Can we estimate the statistical properties of the sample covariance of these two variables for each sample size? Can we understand how the asymptotic properties are related with these small sample properties?

Estructura Objetive: simulate a bivariate normal and estimate the small sample properties of the sample covariance 1 Inicialization: sample size, seed, Cholesky 2 Within the loop: 1 Simulation 2 Computation of the sample covariance for dierent samples 3 Store results 3 Recovery of results

The gretl Script # *************** Before the loop ********************* nulldata 5000 set seed 547 matrix S = {1,0.5;0.5,1} matrix A = cholesky(s) #****** open a loop, to be repeated R=500 times ********** loop 500 progressive quiet genr u1 = normal() genr u2 = normal() genr x1 = A[1,1]*u1+A[1,2]*u2 genr x2 = A[2,1]*u1+A[2,2]*u2 smpl 5 random genr cov5 = cov(x1,x2) smpl full smpl 50 random genr cov50 = cov(x1,x2) smpl full smpl 500 random genr cov500 = cov(x1,x2) smpl full genr cov5000 = cov(x1,x2) store myfirstmc.gtd cov5 cov50 cov500 cov5000 endloop #*********** we open the results ************************ open myfirstmc.gtd summary cov* simple

The Monte Carlo Results & the LLN Read datafile /home/ricmora/aaoficin/cursos/miccua/materiales/sesión 3_Tema 1_2_ Propiedades Asintóticas y Simulación en gretl/myfirstmc.gtd periodicity: 1, maxobs: 500 observations range: 1 500 Listing 5 variables: 0) const 1) cov5 2) cov50 3) cov500 4) cov5000? summary cov* simple statistics, using the observations 1 500 Mean Minimum Maximum Std. Dev. cov5 0.50421 0.94961 4.0369 0.56057 cov50 0.49751 0.15717 1.0123 0.15393 cov500 0.50006 0.37126 0.63924 0.047988 cov5000 0.50061 0.45830 0.54222 0.015756

The average is very similar across dierent sample sizes, and it is quite close to the population covariance. Why? The standard deviation gets smaller and smaller as the sample size increases. Why?

The Monte Carlo Results & the CLT 1 0.9 Test statistic for normality: Chi-square(2) = 134.610 [0.0000] cov5 N(0.50421,0.56057) 3.5 3 Test statistic for normality: Chi-square(2) = 7.137 [0.0282] cov50 N(0.49751,0.15393) 0.8 0.7 2.5 Density 0.6 0.5 0.4 Density 2 1.5 0.3 1 0.2 0.1 0.5 0-1 0 1 2 3 4 cov5 0 0 0.2 0.4 0.6 0.8 1 cov50 9 8 Test statistic for normality: Chi-square(2) = 4.691 [0.0958] cov500 N(0.50006,0.047988) 30 Test statistic for normality: Chi-square(2) = 1.397 [0.4973] cov5000 N(0.50061,0.015756) 25 7 6 20 Density 5 4 Density 15 3 10 2 1 5 0 0.35 0.4 0.45 0.5 0.55 0.6 0.65 cov500 0 0.46 0.48 0.5 0.52 0.54 cov5000

under classical assumptions, OLS is consistent and asymptotically normal when a control is likely correlated with the error term, then OLS is inconsistent under general assumptions, 2SLS is consistent and asymptotically normal if we want to estimate the price elasticity in a demand equation, we need a supply shifter within a basic Monte Carlo algorithm we need a random number generator. In gretl, this is very easy