Multivariate Regression: Part I


Topic 1: Multivariate Regression, Part I
ARE/ECN 240A Graduate Econometrics
Professor: Òscar Jordà

Outline of this topic
Statement of the objective: we want to explain the behavior of one variable as a function of other variables.
Typical assumptions and why they are needed.
Three approaches to pursue the objective given the assumptions: Method of Moments, Ordinary Least Squares, and Maximum Likelihood Estimation.

Objective
We continue our evaluation of how to improve schools using the California data. We began by asking whether class size affects test scores, and we have data for both. However, there may be other explanatory factors and/or policy variables (e.g., increasing overall expenditures per student). Here is a summary of the data.

Policy Evaluation: Test Scores and Class Size
The California Test Score Data Set (caschool.dta, a Stata data file) covers all K-6 and K-8 California school districts (n = 420). Variables:
5th grade test scores (Stanford-9 achievement test, combined math and reading), district average (testscr)
Student-teacher ratio = number of students in the district divided by the number of full-time-equivalent teachers (str)
District average parental income, in thousands of dollars (avginc)
Average expenditures per student, in dollars (expn_stu)

A look at the data
Basic statistics (. sum testscr str expn_stu avginc):

Variable   Obs   Mean       Std. Dev.   Min        Max
testscr    420   654.1565   19.05335    605.5555   706.7575
str        420   19.64043   1.891812    14         25.8
expn_stu   420   5312.408   633.9371    3926.07    7711.507
avginc     420   15.31659   7.22589     5.335      55.328

Correlation matrix (. correlate testscr str expn_stu avginc, obs = 420):

           testscr   str       expn_stu   avginc
testscr    1.0000
str        -0.2264   1.0000
expn_stu   0.1913    -0.6200   1.0000
avginc     0.7124    -0.2322   0.3145     1.0000

Statement of the Population Problem
It is natural to postulate that:
testscr_i = constant + f(str_i, expn_stu_i, avginc_i) + error_i,  for i = 1, ..., 420.
A good place to begin is to assume this relation is linear. Using the more general notation we will use in the course:
$y_i = x_{i1}\beta_1 + \cdots + x_{iK}\beta_K + \varepsilon_i, \quad i = 1, \ldots, n$
The left-hand side is the endogenous or dependent variable, the x's are the regressors or explanatory variables, and the $\varepsilon_i$ are the residuals or error terms.

Some Features of the Linear Regression Model
To investigate the problem we collect a random sample of data $\{y_i, x_{i1}, \ldots, x_{iK}\}_{i=1}^n$.
Some vector and matrix notation:
$y = (y_1, \ldots, y_n)'$ is $n \times 1$; $x_j = (x_{1j}, \ldots, x_{nj})'$ for $j = 1, \ldots, K$; $X_i = (x_{i1}, \ldots, x_{iK})$ is $1 \times K$; and $X = (x_1 \;\; \cdots \;\; x_K)$.
We use the convention that $x_1$ contains the constant term. So, in matrix notation, the linear regression model is
$y = X\beta + \varepsilon$

What we want, using statistical concepts
Assuming y and X have a joint distribution, we want to make statements about the conditional mean of y given X. Notice that:

. sum testscr if avginc < 15
testscr:  Obs 255,  Mean 645.1076,  Std. Dev. 15.45333,  Min 605.55,  Max 683.4

. sum testscr if avginc > 40
testscr:  Obs 9,  Mean 695.7333,  Std. Dev. 7.055042,  Min 681.9,  Max 706.75

Mathematically: $m(X) = E(y \mid X) = \int_{-\infty}^{\infty} y\, f(y \mid X)\, dy$
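The same subgroup means can be computed directly in GAUSS; a minimal sketch, assuming testscr and avginc are 420x1 columns already in memory (the names simply mirror the Stata variables):

lo = selif(testscr, avginc .< 15);    /* districts with average income below 15 */
hi = selif(testscr, avginc .> 40);    /* districts with average income above 40 */
meanc(lo)~meanc(hi);                  /* sample analogs of E(y|X) on two slices of X */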

The Regression Error
Given the previous definition, $\varepsilon = y - m(X)$. This implies the following properties for the regression error:
1. $E(\varepsilon \mid X) = 0$
2. $E(\varepsilon) = 0$
3. $E(h(X)'\varepsilon) = 0$ for any function h(·)
4. $E(X'\varepsilon) = 0$
For example, to prove the first property:
$E(\varepsilon \mid X) = E\big((y - m(X)) \mid X\big) = E(y \mid X) - E(m(X) \mid X) = m(X) - m(X) = 0$
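For completeness, the fourth property (and the third, replacing X with h(X)) follows from the first by the law of iterated expectations:

$E(X'\varepsilon) = E\big[E(X'\varepsilon \mid X)\big] = E\big[X'\,E(\varepsilon \mid X)\big] = E[X' \cdot 0] = 0.$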

Prediction: min MSE
The conditional mean has the property that it minimizes the mean squared error (MSE) over any function g(·):
$E(y - g(X))^2 = E(\varepsilon + m(X) - g(X))^2 = E(\varepsilon^2) + 2E\big(\varepsilon(m(X) - g(X))\big) + E(m(X) - g(X))^2$
$= E(\varepsilon^2) + E(m(X) - g(X))^2 \;>\; E(\varepsilon^2) \quad \text{if } m(X) \neq g(X)$
(the cross term vanishes because, by the properties above, $\varepsilon$ is uncorrelated with any function of X). Here I abuse notation to indicate that, e.g., $E(\varepsilon^2) = E(\varepsilon\varepsilon')$.

Conditional Variance
Just as we consider the conditional mean, we may explore how the variance of y varies with X:
$\Sigma(X) = V(y \mid X) = E(\varepsilon\varepsilon' \mid X)$
When the variance is constant, so that
$\Sigma(X) = E(\varepsilon\varepsilon' \mid X) = \sigma^2 I_n$
we say the error term is homoscedastic; otherwise we say it is heteroscedastic.

Normality
If we assume y and X are jointly normally distributed, life gets easy (clearly a strong assumption). That is because we can use the projection formula for the joint normal to obtain the conditional mean of $y_i$ given $X_i$. Here is how: if
$\begin{pmatrix} y_i \\ X_i' \end{pmatrix} \sim N\!\left( \begin{pmatrix} \mu_y \\ \mu_X \end{pmatrix},\; \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right)$
then
$E(y_i \mid X_i) = m(X_i) = \mu_y + \Sigma_{12}\Sigma_{22}^{-1}(X_i' - \mu_X)$
$V(y_i \mid X_i) = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$

Let's make some assumptions
1. Linearity: $y = X\beta + \varepsilon$
2. Full rank: X is an $n \times K$ matrix with rank K
3. Zero conditional mean: $E(\varepsilon \mid X) = \big(E[\varepsilon_1 \mid X], \ldots, E[\varepsilon_n \mid X]\big)' = 0$; hence $E[\varepsilon] = 0$ and $E[y \mid X] = X\beta$
4. Homoscedasticity: $V(\varepsilon \mid X) = \sigma^2 I_n$; hence $V(\varepsilon_i \mid X) = \sigma^2$ and $Cov(\varepsilon_i, \varepsilon_j \mid X) = 0$ for all $i \neq j$
5. Normality: $\varepsilon \mid X \sim N(0, \sigma^2 I_n)$

Checking the assumptions in the data
1. Linearity: could be problematic theoretically. Compare mean test scores across ranges of the student-teacher ratio (from . sum testscr with the corresponding if conditions):

Group (str range)      Obs   Mean       Std. Dev.   Min      Max
str <= 17               36   660.4139   23.03868    618.05   704.3
17 < str <= 20         207   656.6229   18.56458    606.75   706.75
19.8 <= str < 22.8     184   650.3753   17.83076    605.55   694.8
str >= 22.8             19   647.0395   16.94368    622.05   676.85

But testscr(str 14-17) - testscr(str 17-20) = 3.8 and testscr(str 19.8-22.8) - testscr(str >= 22.8) = 3.3, so the drop in mean scores is roughly constant across similar-sized increases in class size, which is at least consistent with linearity.

More Checking
2. X is full rank: this means that one (or more) regressors cannot be exact linear combinations of the others. Easiest is to check the correlation matrix of X:

. correlate str expn_stu avginc  (obs = 420)

           str       expn_stu   avginc
str        1.0000
expn_stu   -0.6200   1.0000
avginc     -0.2322   0.3145     1.0000

Later we will discuss slightly more sophisticated ways of checking this.

Final Checks
Assumptions 3 (residuals have zero conditional mean) and 4 (homoscedasticity) we cannot check just yet. Assumption 5 is normality. This we can check with Jarque-Bera statistics and also by looking at some histogram/density plots.
[Figure: histogram and density estimate of Average Test Score (= (read_scr + math_scr)/2), ranging from about 600 to 700.]

Why do we make these assumptions?
Linearity: not as strict as it sounds. Usual example, a Cobb-Douglas production function:
$Y = A L^{\beta_l} K^{\beta_k} \;\rightarrow\; \log(Y) = \log(A) + \beta_l \log(L) + \beta_k \log(K)$
which fits the linear-in-parameters form $y_i = \beta_1 + x_{i2}\beta_2 + x_{i3}\beta_3 + \varepsilon_i$.
Beyond that, we will discuss what to do with truly nonlinear specifications later. For now, linearity makes derivations very convenient by allowing projection arguments.
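As a quick illustration of linearity in parameters, a toy GAUSS simulation (purely illustrative, not part of the course data) where OLS on the logged variables recovers the Cobb-Douglas elasticities:

/* simulate log output from a Cobb-Douglas technology with A = 2,   */
/* beta_l = 0.6, beta_k = 0.3, and a multiplicative log-normal error */
n = 200;
ll = rndn(n,1);                                /* log labor   */
lk = rndn(n,1);                                /* log capital */
ly = ln(2) + 0.6*ll + 0.3*lk + 0.1*rndn(n,1);  /* log output  */
x = ones(n,1)~ll~lk;
b = inv(x'x)*x'ly;                             /* approximately (ln 2, 0.6, 0.3)' */
b;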

Multicollinearity
X is a full-rank matrix: easy, we cannot really identify the parameters otherwise. An example: suppose 3 regressors such that $x_1 = x_2 + x_3$. Then
$y = x_1\beta_1 + x_2\beta_2 + x_3\beta_3 + \varepsilon$
$y = (x_2 + x_3)\beta_1 + x_2\beta_2 + x_3\beta_3 + \varepsilon$
$y = x_2(\beta_1 + \beta_2) + x_3(\beta_1 + \beta_3) + \varepsilon$
$y = x_2\gamma_2 + x_3\gamma_3 + \varepsilon$
which means that $\beta_1$, $\beta_2$ and $\beta_3$ cannot be separately identified: only $\gamma_2 = \beta_1 + \beta_2$ and $\gamma_3 = \beta_1 + \beta_3$ are. Mechanically, we run into numerical problems because $X'X$ is singular. Exact collinearity is easy to detect, but approximate collinearity can affect regression results as well.
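A small GAUSS simulation of the mechanical problem (not from the course files; variable names are made up), showing that an exactly collinear regressor makes X'X singular:

/* three regressors with x1 an exact linear combination of x2 and x3 */
n = 100;
x2 = rndn(n,1);
x3 = rndn(n,1);
x1 = x2 + x3;
x = x1~x2~x3;
det(x'x);       /* numerically zero: X'X cannot be inverted */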

Conditional mean-zero errors
This is a critical assumption. As we will see, it ensures that the model is properly specified and that the parameter estimates tend to their true values. Reasons why this assumption may not hold in practice have to do with misspecification problems: e.g., omitted variable bias, errors-in-variables, and endogeneity (the latter only really matters when we want to emphasize analysis of causal relations as opposed to simple correlations).

Homoscedasticity
This assumption is often violated. However, it is easy to relax. Relaxing it will not affect the parameter estimates, but it will affect how their standard errors are calculated (i.e., the efficiency of the estimator). In the data, the standard deviation of test scores differs noticeably across the class-size groups (same output as before):

Group (str range)      Obs   Mean       Std. Dev.   Min      Max
str <= 17               36   660.4139   23.03868    618.05   704.3
17 < str <= 20         207   656.6229   18.56458    606.75   706.75
19.8 <= str < 22.8     184   650.3753   17.83076    605.55   694.8
str >= 22.8             19   647.0395   16.94368    622.05   676.85

Normality/Gaussianity
Assuming the data are Gaussian allows us to use well-known projection formulas and to derive finite-sample statistics. However, the data are often not Gaussian. It turns out that the thought experiment of increasing the sample size to infinity allows us to use probability limit theory, under which the estimators will have a normal distribution. Hence the importance of having a random sample.

Random Sample
Let $\{w_i\}_{i=1}^n = \{y_i, X_i\}_{i=1}^n$ be i.i.d. Then
$f(w_1, \ldots, w_n) = f_1(w_1; \theta_1)\cdots f_i(w_i \mid w_{i-1}, \ldots, w_1; \theta_i)\cdots f_n(w_n \mid w_{n-1}, \ldots, w_1; \theta_n) = f(w_1; \theta)\cdots f(w_n; \theta)$
Independence collapses each conditional density in the first line to a marginal, and the identically-distributed assumption lets us write every marginal with the same density f and the same parameter $\theta$ in the second line.
In time series, as long as the amount of dependence is limited, one can relax the independence assumption.

Where are we so far?
We have postulated a population model of how y relates to X:
$y_i = \beta_1 + x_{i2}\beta_2 + \cdots + x_{iK}\beta_K + \varepsilon_i, \quad i = 1, \ldots, n$
and we have a random sample. Now we want to obtain the distribution of the parameter estimates. The mean of this distribution is the parameter estimate, and knowing the distribution is vital for inference:
$\hat\beta \sim D(\beta, \Sigma_\beta)$

Method of Moments
Let's try to figure out how to estimate $\beta$. We will use the method of moments approach first. It relies on the analogy principle: translate a population moment condition into its equivalent sample moment condition (think of the LLN). For example:
$E(y_i - \mu_y) = 0 \;\rightarrow\; \frac{1}{n}\sum_{i=1}^n (y_i - \hat\mu_y) = 0 \;\Rightarrow\; \hat\mu_y = \frac{1}{n}\sum_{i=1}^n y_i$
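In GAUSS the sample analog is immediate; a trivial sketch, assuming y is an n x 1 data vector already in memory:

muhat = meanc(y);            /* sample analog of the moment condition E(y - mu) = 0 */
sumc(y - muhat)/rows(y);     /* the sample moment evaluated at muhat: zero by construction */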

Deriving the MM estimator for linear regression
Recall, one of the key assumptions in the linear regression model is:
$E(\varepsilon \mid X) = 0 \;\rightarrow\; E(X_i'\varepsilon_i) = 0 \;\rightarrow\; \frac{1}{n}\sum_{i=1}^n X_i'\varepsilon_i = \frac{X'\varepsilon}{n} = 0$
with $y = X\beta + \varepsilon$. Hence:
$E(X'\varepsilon) = E\big(X'(y - X\beta)\big) = 0 \;\rightarrow\; \frac{X'y}{n} - \frac{X'X}{n}\beta = 0$
$\hat\beta = (X'X)^{-1}X'y$
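The same logic in GAUSS, assuming y (n x 1) and x (n x K, first column a constant) have been loaded as on the GAUSS code slide further below (bhat and ehat are illustrative names):

bhat = inv(x'x)*x'y;         /* solves the sample moment condition X'(y - X*b)/n = 0 */
ehat = y - x*bhat;           /* implied residuals */
(x'ehat)/rows(x);            /* sample moments: numerically zero by construction */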

Least Squares
[Figure: Linear Regression: Test Scores and Student-to-Teacher Ratio. Scatter plot of Average Test Score (= (read_scr + math_scr)/2) against the Student-to-Teacher Ratio, with fitted values from the regression line.]

Deriving the OLS estimator
Consider the problem of minimizing the distance of the observations from the regression line. Since we care about the distance but not the sign of the error, we could use absolute values: this gives rise to the LAD estimator, but it is not convenient because the objective function is not differentiable. Instead, by squaring the distance, the objective function can be optimized using derivative methods.

Derivation of OLS
Objective:
$\min_\beta S(\beta) = E(\varepsilon_i^2) \;\rightarrow\; \min_\beta \frac{1}{n}\sum_{i=1}^n \varepsilon_i^2 = \frac{1}{n}\sum_{i=1}^n (y_i - X_i\beta)^2$
In matrix algebra:
$\min_\beta S(\beta) = \frac{\varepsilon'\varepsilon}{n} = \frac{(y - X\beta)'(y - X\beta)}{n}$
General result: suppose $f(\beta)$ is a real-valued scalar function of $\beta$. A necessary condition for a local optimum is
$\left.\frac{\partial f}{\partial \beta}\right|_{\beta = \hat\beta} = 0$

Derivation of OLS (cont.)
If the Hessian is positive semidefinite, then $\hat\beta$ is a local minimum. Rules of matrix differentiation:
$\frac{\partial f}{\partial \beta} = \begin{pmatrix} \frac{\partial f}{\partial \beta_1} \\ \vdots \\ \frac{\partial f}{\partial \beta_K} \end{pmatrix}; \qquad \frac{\partial^2 f}{\partial \beta\, \partial \beta'} = \begin{pmatrix} \frac{\partial^2 f}{\partial \beta_1 \partial \beta_1} & \cdots & \frac{\partial^2 f}{\partial \beta_1 \partial \beta_K} \\ \vdots & & \vdots \\ \frac{\partial^2 f}{\partial \beta_K \partial \beta_1} & \cdots & \frac{\partial^2 f}{\partial \beta_K \partial \beta_K} \end{pmatrix}$
$\frac{\partial A\beta}{\partial \beta'} = A; \qquad \frac{\partial \beta'A'}{\partial \beta} = A'; \qquad \frac{\partial \beta'A\beta}{\partial \beta} = (A + A')\beta; \qquad \frac{\partial \beta'A\beta}{\partial \beta'} = \beta'(A' + A)$

Derivation of OLS (cont.)
Recall:
$\min_\beta S(\beta) = \frac{\varepsilon'\varepsilon}{n} = \frac{(y - X\beta)'(y - X\beta)}{n} = \frac{y'y}{n} - \frac{\beta'X'y}{n} - \frac{y'X\beta}{n} + \frac{\beta'X'X\beta}{n}$
Applying the rules of matrix differentiation:
$\frac{\partial S(\beta)}{\partial \beta} = -\frac{X'y}{n} - \frac{X'y}{n} + \frac{(X'X + X'X)\beta}{n} = -2\frac{X'y}{n} + 2\frac{X'X\beta}{n} = 0$
$\hat\beta = (X'X)^{-1}X'y$
$\frac{\partial^2 S(\beta)}{\partial \beta\, \partial \beta'} = 2\frac{X'X}{n}, \quad \text{which is positive definite}$

Remarks
The no-multicollinearity assumption ensures $X'X$ is invertible.
$\hat\varepsilon = y - X\hat\beta = y - X(X'X)^{-1}X'y = My, \quad \text{where } M = I_n - X(X'X)^{-1}X'$
M is both symmetric and idempotent ($M = M'$ and $M = M^2$), and $MX = 0$.
$\hat{y} = y - \hat\varepsilon = (I - M)y = X(X'X)^{-1}X'y = Py$
where P is called the projection matrix.
$X'\hat\varepsilon = X'My = 0$: by construction, the residuals are uncorrelated with the regressors.
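A short GAUSS sketch of these projection identities (continuing with y and x in memory; with n = 420 the n x n matrices are small enough to form explicitly):

n = rows(x);
p = x*inv(x'x)*x';           /* projection matrix P */
m = eye(n) - p;              /* annihilator matrix M = I - P */
yhat = p*y;                  /* fitted values */
ehat = m*y;                  /* residuals */
maxc(abs(x'ehat));           /* ~0: residuals orthogonal to the regressors */
maxc(abs(vec(m*m - m)));     /* ~0: M is idempotent */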

Maximum Likelihood Estimator
Assuming the random sample $\{y_i, X_i\}_{i=1}^n$ is normally distributed, and since the $\hat\beta$ are a linear combination of these, they will have a multivariate Gaussian distribution. Further, we know that the residuals are mean zero, and under the assumption of homoscedasticity their covariance matrix is $\Sigma = \sigma^2 I_n$. The multivariate normal density is
$f(\varepsilon; \theta) = (2\pi)^{-n/2}\, |\Sigma|^{-1/2} \exp\left\{ -\tfrac{1}{2}(\varepsilon - \mu)'\Sigma^{-1}(\varepsilon - \mu) \right\}$

MLE
Taking the log (to construct the log-likelihood function) and using the assumptions of the linear regression model:
$L(\varepsilon; \theta) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\varepsilon'\varepsilon = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)$
Take derivatives with respect to $\beta$ and $\sigma^2$:
$\frac{\partial L}{\partial \beta} = -\frac{1}{2\sigma^2}\left(-2X'y + 2X'X\beta\right) = 0 \;\rightarrow\; \hat\beta = (X'X)^{-1}X'y$
$\frac{\partial L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\hat\varepsilon'\hat\varepsilon = 0 \;\rightarrow\; \hat\sigma^2 = \frac{\hat\varepsilon'\hat\varepsilon}{n}$
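Evaluated at the estimates, these formulas take a few lines of GAUSS (continuing with x, bhat and ehat from the earlier sketch; note the MLE variance divides by n, not n - K):

n = rows(x);
sig2mle = (ehat'ehat)/n;                                          /* MLE of sigma^2 */
logl = -(n/2)*ln(2*pi) - (n/2)*ln(sig2mle) - (ehat'ehat)/(2*sig2mle);
logl;                                                             /* maximized log-likelihood */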

Let's revisit Joint Normality and Linear Regression
Recall: if y and X are jointly normal, then
$\begin{pmatrix} y_i \\ X_i' \end{pmatrix} \sim N\!\left( \begin{pmatrix} \mu_y \\ \mu_X \end{pmatrix},\; \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right)$
$E(y_i \mid X_i) = m(X_i) = \mu_y + \Sigma_{12}\Sigma_{22}^{-1}(X_i' - \mu_X), \qquad V(y_i \mid X_i) = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$
Compare to OLS:
$E(y \mid X) \;\rightarrow\; \hat{y} = X\hat\beta = X(X'X)^{-1}X'y, \qquad \hat{y}_i = X_i \left( \tfrac{1}{n}\sum_{i=1}^n X_i'X_i \right)^{-1} \left( \tfrac{1}{n}\sum_{i=1}^n X_i'y_i \right)$

An example of GAUSS code for OLS
Here is the basic code (a more complete file labeled topic1.prg does more things):

load z[] = topic1.csv;
vars = 4;
z = reshape(z, rows(z)/vars, vars);
n = rows(z);
y = z[.,1];
x = ones(rows(z),1)~z[.,2:cols(z)];
beta = inv(x'x)*x'y;
beta;

Output: 669.74510   -1.3257657   -0.0034947061   1.8943746
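The slide stops at the point estimates. To also reproduce the standard errors and t-statistics shown in the Stata and GAUSS output on the next two slides, a sketch of the usual homoscedastic formulas (an extension of the code above, not part of topic1.prg):

k = cols(x);
e = y - x*beta;              /* residuals */
sig2 = (e'e)/(n - k);        /* s^2 = RSS/(n - K) */
vbeta = sig2*inv(x'x);       /* estimated covariance matrix of beta-hat */
se = sqrt(diag(vbeta));      /* standard errors */
beta~se~(beta./se);          /* coefficient, std. error, t-statistic */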

Some Regression Output
From Stata:
. use "C:\Docs\teaching\140\STATA\caschool.dta"
. reg testscr str expn_stu avginc

Source   |         SS    df          MS       Number of obs =    420
Model    | 79004.2997     3   26334.7666      F(3, 416)     = 149.86
Residual |  73105.294   416    175.73388      Prob > F      = 0.0000
Total    | 152109.594   419   363.030056      R-squared     = 0.5194
                                              Adj R-squared = 0.5159
                                              Root MSE      = 13.256

testscr  |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
str      |  -1.325765    .4368463   -3.03   0.003     -2.184466   -.4670634
expn_stu |  -.0034947    .0013358   -2.62   0.009     -.0061205    -.000869
avginc   |   1.894375    .0945335   20.04   0.000      1.708552    2.080198
_cons    |   669.7451    13.97392   47.93   0.000      642.2768    697.2134

Regression Output from GAUSS
TOPIC 1 OLS EXAMPLE USING GAUSS' BUILT-IN OLS ROUTINE

Valid cases: 420              Dependent variable: Y
Missing cases: 0              Deletion method: None
Total SS: 152109.594          Degrees of freedom: 416
R-squared: 0.519              Rbar-squared: 0.516
Residual SS: 73105.309        Std error of est: 13.256
F(3,416): 149.856             Probability of F: 0.000

Variable    Estimate      Standard Error   t-value     Prob > |t|   Standardized Estimate   Cor with Dep Var
CONSTANT    669.745098    13.973921        47.928215   0.000        ---                     ---
X1          -1.325766     0.436846         -3.034856   0.003        -0.131636               -0.226363
X2          -0.003495     0.001336         -2.616202   0.009        -0.116275               0.191273
X3          1.894375      0.094534         20.039176   0.000        0.718432                0.712431

Measuring Goodness of Fit
Intuition: if the regression is really good, then the residuals will be very close to zero and the predictions of the dependent variable will be close to y most of the time.
R-squared is the standard measure of fit. It is based on comparing the residual variance, or the prediction variance, with the variance of the dependent variable.

R-squared
Recall $\hat{y} = X\hat\beta = X(X'X)^{-1}X'y = Py$ and $y = Py + (I - P)y = Py + My$.
Definition:
$R^2 = \frac{y'Py}{y'y} = 1 - \frac{y'My}{y'y} \in [0, 1]$
where I use the properties $P = P'$, $PP = P$, $M = M'$, and $MM = M$.

Adjusted R-squared
Takes advantage of the different degrees-of-freedom adjustments in computing sample variances:
$\bar{R}^2 = 1 - \frac{\left(\sum_{i=1}^n \hat\varepsilon_i^2\right)/(n - K)}{\left(\sum_{i=1}^n (y_i - \bar{y})^2\right)/(n - 1)}$
Unlike $R^2$, this measure is not guaranteed to lie in $[0, 1]$. It is generally superior, but most programs still report both.
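Finally, both fit measures in GAUSS (continuing with y, x, e, n, and k from the snippets above; the centered total sum of squares is used here, which is what the Stata output reports):

ybar = meanc(y);
tss = (y - ybar)'(y - ybar);               /* centered total sum of squares */
rss = e'e;                                 /* residual sum of squares */
r2 = 1 - rss/tss;                          /* R-squared: 0.5194 in the regression above */
r2adj = 1 - (rss/(n - k))/(tss/(n - 1));   /* adjusted R-squared: 0.5159 */
r2~r2adj;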