Economics 620, Lecture 4: The K-Variable Linear Model I

Consider the system

$$y_1 = \alpha + \beta x_1 + \varepsilon_1$$
$$y_2 = \alpha + \beta x_2 + \varepsilon_2$$
$$\vdots$$
$$y_N = \alpha + \beta x_N + \varepsilon_N$$

or in matrix form $y = X\beta + \varepsilon$, where $y$ is $N \times 1$, $X$ is $N \times 2$, $\beta$ is $2 \times 1$, and $\varepsilon$ is $N \times 1$.

K-Variable Linear Model

$$X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{bmatrix}, \qquad \beta = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}.$$

Good statistical practice requires inclusion of the column of ones.

Consider the general model $y = X\beta + \varepsilon$. Convention: $y$ is $N \times 1$, $X$ is $N \times K$, $\beta$ is $K \times 1$, and $\varepsilon$ is $N \times 1$, with

$$X = \begin{bmatrix} 1 & x_{21} & \cdots & x_{K1} \\ 1 & x_{22} & \cdots & x_{K2} \\ \vdots & \vdots & & \vdots \\ 1 & x_{2N} & \cdots & x_{KN} \end{bmatrix}, \qquad \beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_K \end{bmatrix}.$$
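As a concrete sketch (my illustration, not from the notes), the design matrix with its leading column of ones can be built and the model simulated in NumPy; all names, sizes, and coefficient values here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 100, 3                         # sample size and number of regressors (incl. intercept)
x = rng.normal(size=(N, K - 1))       # K-1 non-constant regressors
X = np.column_stack([np.ones(N), x])  # prepend the column of ones

beta = np.array([1.0, 2.0, -0.5])     # true coefficients (beta_1 is the intercept)
eps = rng.normal(size=N)              # disturbances
y = X @ beta + eps                    # y = X beta + eps

print(X.shape, y.shape)               # (100, 3) (100,)
```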

More on the Linear Model

A typical row looks like:

$$y_i = \beta_1 + \beta_2 x_{2i} + \beta_3 x_{3i} + \cdots + \beta_K x_{Ki} + \varepsilon_i.$$

The Least Squares Method

First assumption: $Ey = X\beta$.

$$S(b) = (y - Xb)'(y - Xb) = y'y - 2b'X'y + b'X'Xb$$

Normal equations: $X'X\hat{\beta} - X'y = 0$. These equations always have a solution. (Clear from geometry to come.) If $X'X$ is invertible, $\hat{\beta} = (X'X)^{-1}X'y$.
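The closed-form solution is easy to check numerically. A minimal sketch on simulated data: the explicit inverse mirrors the formula above, while `np.linalg.solve` (solving the normal equations directly) and `np.linalg.lstsq` are the numerically preferable routes; all three agree here.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=N)

# beta_hat = (X'X)^{-1} X'y  -- the textbook formula
beta_hat = np.linalg.inv(X.T @ X) @ (X.T @ y)

# Numerically safer equivalents
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)      # solve the normal equations
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # least squares directly

assert np.allclose(beta_hat, beta_solve) and np.allclose(beta_hat, beta_lstsq)
print(beta_hat)
```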

More on the Linear Model

Proposition: $\hat{\beta}$ is a minimizer.

Proof: Let $b$ be any other $K$-vector. Then

$$(y - Xb)'(y - Xb) = (y - X\hat{\beta} + X(\hat{\beta} - b))'(y - X\hat{\beta} + X(\hat{\beta} - b)) = (y - X\hat{\beta})'(y - X\hat{\beta}) + (\hat{\beta} - b)'X'X(\hat{\beta} - b) \geq (y - X\hat{\beta})'(y - X\hat{\beta}).$$

(Why?)

Definition: $e = y - X\hat{\beta}$ is the vector of residuals. Note: $Ee = 0$ and $X'e = 0$.

Proposition: The LS estimator is unbiased.

Proof: $E\hat{\beta} = E[(X'X)^{-1}X'y] = E[(X'X)^{-1}X'(X\beta + \varepsilon)] = \beta + (X'X)^{-1}X'E\varepsilon = \beta$.
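Both propositions are easy to verify numerically; a sketch with simulated data (illustrative names and values):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat                     # residual vector

S = lambda b: (y - X @ b) @ (y - X @ b)  # sum of squared residuals S(b)

print(X.T @ e)                           # ~ [0, 0]: residuals orthogonal to X
b_other = beta_hat + rng.normal(size=2)  # any other coefficient vector
print(S(beta_hat) <= S(b_other))         # True: beta_hat minimizes S
```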

Geometry of Least Squares

Consider $y = X\beta + \varepsilon$ with

$$y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}, \qquad X = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.$$

Definition: The space spanned by the matrix $X$ is the vector space which consists of all linear combinations of the column vectors of $X$.

Definition: $X(X'X)^{-1}X'y$ is the orthogonal projection of $y$ onto the space spanned by $X$.

Proposition: $e$ is perpendicular to $X$, i.e., $X'e = 0$.

Proof: $e = y - X\hat{\beta} = y - X(X'X)^{-1}X'y = (I - X(X'X)^{-1}X')y \Rightarrow X'e = (X' - X')y = 0$.

Geometry of Least Squares (cont'd)

Thus the equation $y = X\hat{\beta} + e$ gives $y$ as the sum of a vector in $R[X]$ and a vector in $N[X']$.

Common (friendly) projection matrices:

1. The matrix which projects onto the space orthogonal to the space spanned by $X$ (i.e., onto $N[X']$) is $M = I - X(X'X)^{-1}X'$. Note: $e = My$. If $X$ has full column rank, $M$ has rank $N - K$.

2. The matrix which projects onto the space spanned by $X$ is $I - M = X(X'X)^{-1}X'$. Note: $\hat{y} = y - e = y - My = (I - M)y$. If $X$ has full column rank, $I - M$ has rank $K$.
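For a small simulated design, both projection matrices can be formed explicitly and the decomposition and ranks checked; a sketch (sizes arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 10, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = rng.normal(size=N)

P = X @ np.linalg.inv(X.T @ X) @ X.T   # projects onto R[X]; this is I - M
M = np.eye(N) - P                      # projects onto N[X']

e = M @ y                              # residuals
y_hat = P @ y                          # fitted values

assert np.allclose(X.T @ e, 0)         # e is perpendicular to X
assert np.allclose(y, y_hat + e)       # y = X beta_hat + e
print(np.linalg.matrix_rank(M), np.linalg.matrix_rank(P))  # N - K = 8, K = 2
```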

Example in $R^2$

$$y_i = \beta x_i + \varepsilon_i, \qquad i = 1, 2.$$

[Figure: the vector $y$ in $R^2$, its orthogonal projection $X\hat{\beta}$ onto the line spanned by $x$, and the residual $e$ joining them.]

What happens in the case of singular $X'X$?

Properties of projection matrices

1. Projection matrices are idempotent, e.g., $(I - M)(I - M) = (I - M)$.

Proof: $(I - M)(I - M) = (X(X'X)^{-1}X')(X(X'X)^{-1}X') = X(X'X)^{-1}X' = (I - M)$.

2. Idempotent matrices have eigenvalues equal to zero or one.

Proof: Consider the characteristic equation $Mz = \lambda z \Rightarrow M^2 z = \lambda Mz = \lambda^2 z$. Since $M$ is idempotent, $M^2 z = Mz = \lambda z$. Thus $\lambda^2 z = \lambda z$, which implies that $\lambda$ is either 0 or 1.
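A quick numerical check of both properties on a simulated residual-maker matrix (a sketch; the design is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
N, K = 8, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
M = np.eye(N) - X @ np.linalg.inv(X.T @ X) @ X.T

assert np.allclose(M @ M, M)        # idempotent: M^2 = M
eigvals = np.linalg.eigvalsh(M)     # M is symmetric, so eigvalsh applies
print(np.round(eigvals, 10))        # each eigenvalue is 0 or 1
```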

Properties of projection matrices (cont'd)

3. The number of nonzero eigenvalues of a diagonalizable matrix is equal to its rank, and idempotent matrices are diagonalizable. $\Rightarrow$ For idempotent matrices, trace = rank.

More assumptions for the K-variable linear model:

Second assumption: $V(y) = V(\varepsilon) = \sigma^2 I_N$, where $y$ and $\varepsilon$ are $N$-vectors.

With this assumption, we can obtain the sampling variance of $\hat{\beta}$.

Proposition: $V(\hat{\beta}) = \sigma^2 (X'X)^{-1}$.

Proof: $\hat{\beta} = (X'X)^{-1}X'y$

$= (X'X)^{-1}X'X\beta + (X'X)^{-1}X'\varepsilon$, hence $\hat{\beta} = \beta + (X'X)^{-1}X'\varepsilon$.

(cont'd)

$$V(\hat{\beta}) = E(\hat{\beta} - E\hat{\beta})(\hat{\beta} - E\hat{\beta})' = E[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}] = (X'X)^{-1}X'(E\varepsilon\varepsilon')X(X'X)^{-1} = \sigma^2 (X'X)^{-1}.$$
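Holding $X$ fixed and redrawing $\varepsilon$, the formula can be verified by Monte Carlo; a sketch (sample size, $\sigma$, replication count, and seed are all arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
N, sigma, reps = 50, 1.5, 20000
X = np.column_stack([np.ones(N), rng.normal(size=N)])   # fixed design
beta = np.array([1.0, 2.0])
XtX_inv = np.linalg.inv(X.T @ X)

draws = np.empty((reps, 2))
for r in range(reps):
    y = X @ beta + sigma * rng.normal(size=N)           # fresh disturbances
    draws[r] = np.linalg.solve(X.T @ X, X.T @ y)

print(np.cov(draws.T))       # Monte Carlo variance of beta_hat
print(sigma**2 * XtX_inv)    # theoretical sigma^2 (X'X)^{-1}
```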

Gauss-Markov Theorem: The LS estimator is BLUE (best linear unbiased estimator).

Proof: Consider estimating $c'\beta$ for some $c$. A possible estimator is $c'\hat{\beta}$, with variance $\sigma^2 c'(X'X)^{-1}c$. An alternative linear unbiased estimator is $b = a'y$, with $Eb = a'Ey = a'X\beta$. Since both $c'\hat{\beta}$ and $b$ are unbiased, $a'X = c'$.

Thus $b = a'y = a'(X\beta + \varepsilon) = a'X\beta + a'\varepsilon = c'\beta + a'\varepsilon$. Hence $V(b) = \sigma^2 a'a$.

Now $V(c'\hat{\beta}) = \sigma^2 a'X(X'X)^{-1}X'a$, since $c' = a'X$. So $V(b) - V(c'\hat{\beta}) = \sigma^2 a'Ma \geq 0$, since $M$ is positive semidefinite. Hence $V(b) \geq V(c'\hat{\beta})$.
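One way to see the theorem at work (my illustration, not from the notes) is to compare the OLS slope against another linear unbiased estimator of the slope, the two-point estimator $(y_N - y_1)/(x_N - x_1)$; both are unbiased, but OLS has the smaller variance:

```python
import numpy as np

rng = np.random.default_rng(6)
N, reps = 30, 20000
x = np.linspace(0.0, 1.0, N)
X = np.column_stack([np.ones(N), x])                # fixed design

ols, twopt = np.empty(reps), np.empty(reps)
for r in range(reps):
    y = 1.0 + 2.0 * x + rng.normal(size=N)
    ols[r] = np.linalg.solve(X.T @ X, X.T @ y)[1]   # OLS slope
    twopt[r] = (y[-1] - y[0]) / (x[-1] - x[0])      # two-point linear unbiased slope

print(ols.mean(), twopt.mean())   # both ~ 2 (unbiased)
print(ols.var(), twopt.var())     # OLS variance is smaller (BLUE)
```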

Estimation of $\sigma^2$

Proposition: $s^2 = e'e/(N - K)$ is an unbiased estimator of $\sigma^2$.

Proof: $e = y - X\hat{\beta} = My = M\varepsilon \Rightarrow e'e = \varepsilon'M\varepsilon$.

$$Ee'e = E\varepsilon'M\varepsilon = E\,\mathrm{tr}\,\varepsilon'M\varepsilon \quad \text{(Why?)}$$
$$= \mathrm{tr}\,E\,\varepsilon'M\varepsilon = \mathrm{tr}\,E\,M\varepsilon\varepsilon' \quad \text{(important trick)}$$
$$= \mathrm{tr}\,M\,E\varepsilon\varepsilon' = \sigma^2\,\mathrm{tr}\,M = \sigma^2(N - K)$$

$\Rightarrow s^2 = e'e/(N - K)$ is unbiased for $\sigma^2$.
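The trace argument can be confirmed by simulation: since $e = M\varepsilon$ does not depend on $\beta$, we can average $e'e/(N - K)$ over repeated draws of $\varepsilon$ and compare with $\sigma^2$. A sketch (all values arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
N, K, sigma, reps = 40, 3, 2.0, 20000
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
M = np.eye(N) - X @ np.linalg.inv(X.T @ X) @ X.T

s2 = np.empty(reps)
for r in range(reps):
    eps = sigma * rng.normal(size=N)
    e = M @ eps                  # e = M eps, regardless of beta
    s2[r] = e @ e / (N - K)

print(s2.mean(), sigma**2)       # ~ 4.0 vs 4.0: s^2 is unbiased
print(np.trace(M), N - K)        # tr M = N - K
```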

Fit: Does the Regression Model Explain the Data?

We will need the useful idempotent matrix

$$A = I - 1(1'1)^{-1}1' = I - 11'/N,$$

which sweeps out means. Here $1$ is an $N$-vector of ones. Note that $AM = M$ when $X$ contains a constant term.

Definition: The squared correlation coefficient in the K-variable case is

$$R^2 = \text{(sum of squares due to } X\text{)} / \text{(total sum of squares)} = 1 - (e'e / y'Ay).$$

More Fit

Using $A$, $y'Ay = \sum_{i=1}^{N} (y_i - \bar{y})^2$.

$$y'Ay = (Ay)'(Ay) = (A\hat{y} + Ae)'(A\hat{y} + Ae) = \hat{y}'A\hat{y} + e'Ae \quad \text{since } \hat{y}'e = 0.$$

Thus $y'Ay = \hat{y}'A\hat{y} + e'e$, since $Ae = e$. Scaling yields

$$1 = \frac{\hat{y}'A\hat{y}}{y'Ay} + \frac{e'e}{y'Ay}.$$

What are the two terms of this split-up?

More Fit

$R^2$ gives the fraction of variation explained by $X$:

$$R^2 = 1 - (e'e / y'Ay).$$

Note: The adjusted squared correlation coefficient is given by

$$\bar{R}^2 = 1 - \frac{e'e/(N - K)}{y'Ay/(N - 1)}.$$

(Why might this be preferable?)
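A sketch computing $A$, $R^2$, and the adjusted version on simulated data (all names and values illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
N, K = 60, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat

A = np.eye(N) - np.ones((N, N)) / N   # sweeps out means
tss = y @ A @ y                       # total sum of squares
r2 = 1.0 - (e @ e) / tss
r2_adj = 1.0 - (e @ e / (N - K)) / (tss / (N - 1))
print(r2, r2_adj)
```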

Reporting

Always report characteristics of the sample, i.e., means, standard deviations, anything unusual or surprising, how the data set was collected, and how the sample was selected.

Report $\hat{\beta}$ and standard errors (not t-statistics). The usual format is $\hat{\beta}$ (s.e. of $\hat{\beta}$). Specify $s^2$ or $\hat{\sigma}^2_{ML}$. Report $N$ and $R^2$.

Plots are important. For example, predicted vs. actual values, or predicted and actual values over time in time-series studies, should be presented.

Comments on Linearity

Consider the following argument: Economic functions don't change suddenly. Therefore they are continuous. Thus they are differentiable, and hence nearly linear by Taylor's Theorem. This argument is false (but irrelevant).

[Figure: two panels. Left: $f(x)$ continuous, not differentiable, but well-approximated by a line. Right: $f(x)$ continuous, differentiable, and not well-approximated by a line.]

NOTE ON THE GAUSS-MARKOV THEOREM

Consider estimation of a mean based on an observation $X$. Assume:

$$X \sim F \quad \text{and} \quad \int x\, dF = \mu, \quad \int x^2\, dF = \sigma^2 + \mu^2. \qquad (*)$$

Suppose that the estimator for $\mu$ is $h(X)$. Unbiasedness implies that

$$\int h(x)\, dF = \mu.$$

Theorem: The only function $h$ unbiased for all $F$ satisfying $(*)$ is $h(x) = x$.

Proof: Let $h(x) = x + \Delta(x)$. Then $\int \Delta(x)\, dF = 0$ for all $F$. Suppose, on the contrary, that $\Delta(x) > 0$ on some set $A$. Let $\xi \in A$ and $F = \delta_\xi$ (the distribution assigning point mass to $x = \xi$). Then $(*)$ is satisfied with $\mu = \xi$, and

$$\int h(x)\, d\delta_\xi = \xi + \Delta(\xi) \neq \xi = \mu,$$

which is a contradiction. The argument is the same if $\Delta(x) < 0$. This shows that $\Delta(x) = 0$.

The logic is that if the estimator is nonlinear, we can choose a distribution so that it is biased.