2. OLS Part II

The OLS residuals are orthogonal to the regressors. If the model includes an intercept, the orthogonality of the residuals and regressors gives rise to three results which have limited practical usefulness but are good exercises for understanding the algebra of least squares. Partitioned and partial regression is not treated as seriously as it might be in a graduate course, and is only given a cursory presentation. However, the residual maker matrix $M_i$ is presented; it is used to define $R^2$, and in several other parts of the course.

2.1 Some basic properties of OLS

First, note that the LS residuals are orthogonal to the regressors:

$$X'Xb - X'y = 0 \qquad (\text{normal equations};\ (k \times 1))$$

So,

$$X'(y - Xb) = X'e = 0.$$

Orthogonality implies linear independence (but not vice versa). See "Linearly Independent, Orthogonal, and Uncorrelated Variables" (Rodgers et al., 1984) for the definitions of, and relationships between, the three terms.

If the model includes an intercept term, then one regressor (say, the first column of $X$) is a unit vector. In this case we get some further results.

1. The LS residuals sum to zero.

$$X'e = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_{12} & x_{22} & \cdots & x_{n2} \\ \vdots & & & \vdots \\ x_{1k} & x_{2k} & \cdots & x_{nk} \end{pmatrix} \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix} = \begin{pmatrix} \sum_i e_i \\ \sum_i x_{i2} e_i \\ \vdots \\ \sum_i x_{ik} e_i \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$

From the first element, $\sum_{i=1}^{n} e_i = 0$.
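These properties are easy to verify numerically. Below is a minimal sketch (not part of the original notes) using NumPy on simulated data: it computes $b$ from the normal equations and checks that $X'e = 0$ and that the residuals sum to zero when an intercept is included.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # first column: intercept
beta = np.array([1.0, 0.5, -2.0])
y = X @ beta + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)   # OLS via the normal equations X'Xb = X'y
e = y - X @ b                           # LS residual vector

print(np.allclose(X.T @ e, 0.0))        # X'e = 0: residuals orthogonal to every regressor
print(np.isclose(e.sum(), 0.0))         # residuals sum to zero (intercept included)
```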

2. The fitted regression passes through the sample means.

From the normal equations, $X'y = X'Xb$. Writing out the rows (the first column of $X$ is the unit vector):

$$\begin{pmatrix} \sum_i y_i \\ ? \\ \vdots \\ ? \end{pmatrix} = \begin{pmatrix} n & \sum_i x_{i2} & \cdots & \sum_i x_{ik} \\ ? & ? & \cdots & ? \\ \vdots & & & \vdots \\ ? & ? & \cdots & ? \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_k \end{pmatrix}.$$

From the first row of this vector equation,

$$\sum_i y_i = n b_1 + b_2 \sum_i x_{i2} + \cdots + b_k \sum_i x_{ik},$$

or, dividing through by $n$,

$$\bar{y} = b_1 + b_2 \bar{x}_2 + \cdots + b_k \bar{x}_k.$$

3. The sample mean of the fitted y-values equals the sample mean of the actual y-values.

We have $y_i = x_i'\beta + \varepsilon_i$, and the fitted model gives $y_i = x_i'b + e_i = \hat{y}_i + e_i$. So,

$$\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \hat{y}_i + \sum_{i=1}^{n} e_i, \quad \text{or} \quad \bar{y} = \bar{\hat{y}} + 0 = \bar{\hat{y}}.$$

Note: these last three results use the fact that the model includes an intercept.

2.2 Partitioned and Partial Regression

We can solve for blocks of $b$. Suppose we partition our model into two blocks:

$$y = X_1 \beta_1 + X_2 \beta_2 + \varepsilon, \qquad (n \times 1) = (n \times k_1)(k_1 \times 1) + (n \times k_2)(k_2 \times 1) + (n \times 1).$$

We could solve for the OLS estimator of $\beta_1$ only. When does $b_1 = (X_1'X_1)^{-1}X_1'y$? Solving the first block of the normal equations for $b_1$ gives

$$b_1 = (X_1'X_1)^{-1}X_1'y - (X_1'X_1)^{-1}X_1'X_2 b_2.$$

Above we have a solution for $b_1$ in terms of $b_2$. We can substitute in a similar solution for $b_2$ to obtain a solution for $b_1$ that is a function of only the $X$ and $y$ data. Define:

$$M_2 = I - X_2(X_2'X_2)^{-1}X_2'.$$

$M_2$ is an idempotent matrix: $M_2 M_2 = M_2$ and $M_2' = M_2$, so $M_2'M_2 = M_2$.

Then we can write:

$$b_1 = (X_1'M_2X_1)^{-1}X_1'M_2y.$$

By the symmetry of the problem, we can interchange the 1 and 2 subscripts and get:

$$b_2 = (X_2'M_1X_2)^{-1}X_2'M_1y.$$

So, finally, we can write:

$$b_1 = (X_1^{*\prime}X_1^{*})^{-1}X_1^{*\prime}y_1^{*}, \qquad b_2 = (X_2^{*\prime}X_2^{*})^{-1}X_2^{*\prime}y_2^{*},$$

where $X_1^{*} = M_2X_1$; $X_2^{*} = M_1X_2$; $y_1^{*} = M_2y$; $y_2^{*} = M_1y$.

These results are important for several reasons:

- Historically, for computational reasons. See the Frisch-Waugh-Lovell Theorem.
- In certain situations $b_1$ and $b_2$ may have different properties. This is difficult to show without having separate formulas.
- Having established the residual maker matrix $M_i$, it is used frequently to derive other results.

Why is $M_i$ called a residual maker matrix? (See the sketch below.)
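A small numerical sketch of the partitioned-regression result, again on simulated data (the split of $X$ into $X_1$ and $X_2$ is arbitrary and purely illustrative). The partialled-out regression reproduces the corresponding block of the full OLS coefficient vector, and $M_2 y$ is exactly the residual from regressing $y$ on $X_2$ alone, which is why $M$ is called a residual maker.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)
b = np.linalg.solve(X.T @ X, X.T @ y)                    # full-regression OLS coefficients

X1, X2 = X[:, :2], X[:, 2:]                              # arbitrary split into two blocks
M2 = np.eye(n) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)   # residual maker for X2
X1s, ys = M2 @ X1, M2 @ y                                # X1 and y with X2 partialled out

b1 = np.linalg.solve(X1s.T @ X1s, X1s.T @ ys)
print(np.allclose(b1, b[:2]))                            # True: matches the X1 block of b

# "Residual maker": M2 @ y equals the residual from regressing y on X2 alone.
c2 = np.linalg.solve(X2.T @ X2, X2.T @ y)
print(np.allclose(M2 @ y, y - X2 @ c2))                  # True
```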

2.3 Goodness-of-Fit

One way of measuring the quality of a fitted regression model is the extent to which the model explains the sample variation of $y$. The sample variance of $y$ is $\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$; or, we could just use $\sum_{i=1}^{n}(y_i - \bar{y})^2$ to measure variability.

Our fitted regression model, using LS, gives us

$$y = Xb + e = \hat{y} + e, \qquad \text{where } \hat{y} = Xb = X(X'X)^{-1}X'y.$$

Recall that if the model includes an intercept, then the residuals sum to zero, and $\bar{\hat{y}} = \bar{y}$.

To simplify things, introduce the following matrix:

$$M^0 = I - \tfrac{1}{n}\, i\, i', \qquad \text{where } i = (1, 1, \ldots, 1)' \text{ is } (n \times 1).$$

Note that:

- $M^0$ is an idempotent matrix.
- $M^0 i = 0$.
- $M^0$ transforms the elements of a vector into deviations from the sample mean.
- $y'M^0y = y'M^{0\prime}M^0y = \sum_{i=1}^{n}(y_i - \bar{y})^2$.

Let's check the third of these results:

$$M^0 y = \left\{ \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{pmatrix} - \begin{pmatrix} 1/n & \cdots & 1/n \\ \vdots & & \vdots \\ 1/n & \cdots & 1/n \end{pmatrix} \right\} \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} y_1 - \bar{y} \\ y_2 - \bar{y} \\ \vdots \\ y_n - \bar{y} \end{pmatrix}.$$

Returning to our fitted model, $y = Xb + e = \hat{y} + e$. So we have

$$M^0 y = M^0 \hat{y} + M^0 e = M^0 \hat{y} + e.$$

[$M^0 e = e$, because the residuals sum to zero.] Then

$$y'M^0y = y'M^{0\prime}M^0y = (M^0\hat{y} + e)'(M^0\hat{y} + e) = \hat{y}'M^0\hat{y} + e'e + 2e'M^0\hat{y}.$$

However,

$$e'M^0\hat{y} = e'M^{0\prime}\hat{y} = (M^0 e)'\hat{y} = e'\hat{y} = e'X(X'X)^{-1}X'y = 0.$$

So we have

$$y'M^0y = \hat{y}'M^0\hat{y} + e'e.$$

Recall that $\bar{\hat{y}} = \bar{y}$, so

$$\sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} e_i^2$$

$$\text{SST} = \text{SSR} + \text{SSE}.$$

This lets us define the Coefficient of Determination:

$$R^2 = \frac{\text{SSR}}{\text{SST}} = 1 - \frac{\text{SSE}}{\text{SST}}.$$
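A short numerical check of this decomposition and of the two expressions for $R^2$, again a sketch on simulated data with an intercept in the model:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # intercept included
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b
e = y - y_hat

SST = np.sum((y - y.mean()) ** 2)        # y' M0 y
SSR = np.sum((y_hat - y.mean()) ** 2)    # yhat' M0 yhat (uses mean(yhat) = mean(y))
SSE = np.sum(e ** 2)                     # e'e

print(np.isclose(SST, SSR + SSE))        # True when the model has an intercept
print(SSR / SST, 1.0 - SSE / SST)        # the two equal expressions for R^2
```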

Note: the second equality in the definition of $R^2$ holds only if the model includes an intercept.

$$R^2 = \frac{\text{SSR}}{\text{SST}} \geq 0; \qquad R^2 = 1 - \frac{\text{SSE}}{\text{SST}} \leq 1.$$

So, $0 \leq R^2 \leq 1$. What is the interpretation of the values 0 and 1? Note that $R^2$ is unitless.

What happens if we add any regressor(s) to the model?

$$y = X_1\beta_1 + \varepsilon \qquad [1]$$

Then:

$$y = X_1\beta_1 + X_2\beta_2 + u \qquad [2]$$

(A) Applying LS to [2]: $\min\ (u'u)$; $u = y - X_1b_1 - X_2b_2$.

(B) Applying LS to [1]: $\min\ (e'e)$; $e = y - X_1b_1$.

Problem (B) is just Problem (A) subject to the restriction $\beta_2 = 0$. The minimized value in (A) must be $\leq$ the minimized value in (B). So $u'u \leq e'e$. What does this imply?

- Adding any regressor(s) to the model cannot increase (and typically will decrease) the sum of squared residuals.
- So, adding any regressor(s) to the model cannot decrease (and typically will increase) the value of $R^2$.

This means that $R^2$ is not really a very interesting measure of the quality of the regression model, in terms of explaining the sample variability of the dependent variable. For these reasons, we usually use the adjusted Coefficient of Determination. We modify

$$R^2 = 1 - \frac{e'e}{y'M^0y}$$

to become:

$$\bar{R}^2 = 1 - \frac{e'e/(n-k)}{y'M^0y/(n-1)}.$$

What are we doing here? We're adjusting for degrees of freedom in the numerator and the denominator. Degrees of freedom = the number of independent pieces of information.

- Numerator: $e = y - Xb$. We estimate $k$ parameters from the $n$ data points, so we have $(n - k)$ degrees of freedom associated with the fitted model.
- Denominator: we have constructed $\bar{y}$ from the sample, so we have lost one degree of freedom.

Some points to note:

- It is possible for $\bar{R}^2 < 0$ (even with an intercept in the model).
- $\bar{R}^2$ can increase or decrease when we add regressors. When will it increase (decrease)? In multiple regression, $\bar{R}^2$ will increase (decrease) if a variable is deleted, if and only if the associated t-statistic has absolute value less than (greater than) unity.
- If the model doesn't include an intercept, then $\text{SST} \neq \text{SSR} + \text{SSE}$, and in this case there is no longer any guarantee that $0 \leq R^2 \leq 1$.
- We must be careful comparing $R^2$ and $\bar{R}^2$ values across models. Example:

  (1) $C_i = 0.5 + 0.8\, Y_i$;  $R^2 = 0.90$
  (2) $\log(C_i) = 0.2 + 0.75\, Y_i$;  $R^2 = 0.80$

  The sample variation being explained is in different units in the two models, so the two $R^2$ values are not comparable.
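The following sketch (simulated data; fit_stats is a hypothetical helper defined here, not part of the notes) contrasts $R^2$ and $\bar{R}^2$ when a pure-noise regressor is added: $R^2$ cannot fall, while $\bar{R}^2$ can move in either direction, depending on whether the added variable's t-statistic exceeds one in absolute value.

```python
import numpy as np

def fit_stats(X, y):
    """Return (R^2, adjusted R^2) for an OLS fit of y on X (intercept assumed in X)."""
    n, k = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    sst = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - (e @ e) / sst
    r2_adj = 1.0 - (e @ e / (n - k)) / (sst / (n - 1))
    return r2, r2_adj

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

print(fit_stats(X, y))                              # base model
X_plus = np.column_stack([X, rng.normal(size=n)])   # add a pure-noise regressor
print(fit_stats(X_plus, y))                         # R^2 rises; adjusted R^2 may rise or fall
```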