Algebra of Least Squares

October 19, 2018

Geometry of Least Squares

Recall that our data form a table $[Y \; X]$, where $Y$ collects $n$ observations on the dependent variable and $X$ collects $n$ observations on the $k$-dimensional independent variable:
$$X = \begin{pmatrix} X_{1,1} & X_{1,2} & \cdots & X_{1,k} \\ X_{2,1} & X_{2,2} & \cdots & X_{2,k} \\ \vdots & \vdots & & \vdots \\ X_{n,1} & X_{n,2} & \cdots & X_{n,k} \end{pmatrix}_{n \times k} \qquad \text{and} \qquad Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}_{n \times 1}.$$

We can think of $Y$ and the columns of $X$ as members of the $n$-dimensional Euclidean space $\mathbb{R}^n$. One can define a subspace of $\mathbb{R}^n$ called the column space of the $n \times k$ matrix $X$: the collection of all vectors in $\mathbb{R}^n$ that can be written as linear combinations of the columns of $X$,
$$S(X) = \{ z \in \mathbb{R}^n : z = Xb, \; b = (b_1, b_2, \ldots, b_k)' \in \mathbb{R}^k \}.$$

For two vectors $a, b$ in $\mathbb{R}^n$, the distance between $a$ and $b$ is given by the Euclidean norm of their difference, $\|a - b\| = \sqrt{(a - b)'(a - b)}$. (For a vector $x = (x_1, x_2, \ldots, x_n)'$, the Euclidean norm is defined as $\|x\| = \sqrt{x'x} = \sqrt{\sum_{i=1}^{n} x_i^2}$.) Thus, the least squares problem, minimization of the sum of squared errors $(Y - Xb)'(Y - Xb)$, is to find, out of all elements of $S(X)$, the one closest to $Y$:
$$\min_{y \in S(X)} \|Y - y\|^2.$$

The closest point is found by "dropping a perpendicular". That is, the solution to the least squares problem, $\hat{Y} = X\hat{\beta}$, must be chosen so that the residual vector $\hat{e} = Y - \hat{Y}$ is orthogonal (perpendicular) to each column of $X$:
$$\hat{e}'X = 0.$$

As a result, $\hat{e}$ is orthogonal to every element of $S(X)$. Indeed, if $z \in S(X)$, then there exists $b \in \mathbb{R}^k$ such that $z = Xb$, and $\hat{e}'z = \hat{e}'Xb = 0$. The collection of the elements of $\mathbb{R}^n$ orthogonal to $S(X)$ is called the orthogonal complement of $S(X)$:
$$S^{\perp}(X) = \{ z \in \mathbb{R}^n : z'X = 0 \}.$$
Every element of $S^{\perp}(X)$ is orthogonal to every element of $S(X)$.
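The notes move from this orthogonality condition directly to the explicit formula in the next paragraph; the intermediate algebra, stated here as a standard filled-in step (not spelled out in the original transcription), is
$$\hat{e}'X = (Y - X\hat{\beta})'X = 0 \quad\Longleftrightarrow\quad X'X\hat{\beta} = X'Y \quad\Longleftrightarrow\quad \hat{\beta} = (X'X)^{-1}X'Y,$$
where the last step assumes that $X'X$ is invertible, i.e. that the columns of $X$ are linearly independent.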

The solution to the least squares problem is given by
$$\hat{Y} = X\hat{\beta} = X(X'X)^{-1}X'Y = P_X Y,$$
where $P_X = X(X'X)^{-1}X'$ is called the orthogonal projection matrix. For any vector $z \in \mathbb{R}^n$, $P_X z \in S(X)$. Furthermore, the residual vector will be in $S^{\perp}(X)$:
$$z - P_X z \in S^{\perp}(X). \qquad (1)$$

To show (1), first note that, since the columns of $X$ are in $S(X)$, $P_X X = X(X'X)^{-1}X'X = X$, and, since $P_X$ is a symmetric matrix, $X'P_X = X'$. Now,
$$X'(z - P_X z) = X'z - X'P_X z = X'z - X'z = 0.$$
Thus, by definition, the residual $z - P_X z$ belongs to $S^{\perp}(X)$.

The residuals can be written as
$$\hat{e} = Y - P_X Y = (I_n - P_X)Y = M_X Y,$$
where
$$M_X = I_n - P_X = I_n - X(X'X)^{-1}X'$$
is the projection matrix onto $S^{\perp}(X)$.
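As an illustration (not part of the original notes), here is a minimal numpy sketch on simulated data; all variable names and the data-generating values are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # regressors, intercept included
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)          # simulated dependent variable

# Least squares coefficients, projection matrices, fitted values, residuals
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)        # (X'X)^{-1} X'Y
P_X = X @ np.linalg.inv(X.T @ X) @ X.T              # orthogonal projection onto S(X)
M_X = np.eye(n) - P_X                                # projection onto S_perp(X)
Y_hat = P_X @ Y                                      # fitted values, equals X @ beta_hat
e_hat = M_X @ Y                                      # residuals, equals Y - Y_hat

# Residuals are orthogonal to every column of X (up to floating-point error)
print(np.allclose(X.T @ e_hat, 0))                   # True
print(np.allclose(Y_hat, X @ beta_hat))              # True
```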

The projection matrices $P_X$ and $M_X$ have the following properties:

- $P_X + M_X = I_n$. This implies that for any $z \in \mathbb{R}^n$, $z = P_X z + M_X z$.
- Symmetric: $P_X' = P_X$, $M_X' = M_X$.
- Idempotent: $P_X P_X = P_X$ and $M_X M_X = M_X$:
  $$P_X P_X = X(X'X)^{-1}X'X(X'X)^{-1}X' = X(X'X)^{-1}X' = P_X,$$
  $$M_X M_X = (I_n - P_X)(I_n - P_X) = I_n - 2P_X + P_X P_X = I_n - P_X = M_X.$$
- Orthogonal: $P_X M_X = P_X(I_n - P_X) = P_X - P_X P_X = P_X - P_X = 0$. This property implies that $M_X X = 0$:
  $$M_X X = (I_n - P_X)X = X - P_X X = X - X = 0.$$

Note that, in the above discussion, none of the statistical assumptions, such as $E(e_i \mid X_i) = 0$, have been used. Given data $Y$ and $X$, one can always perform least squares, regardless of what data-generating process stands behind the data. However, one needs a model to discuss the statistical properties of the estimator, such as unbiasedness, etc.
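These identities can be checked numerically as well; a small self-contained sketch (simulated regressors, numpy; not from the original notes):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3
X = rng.normal(size=(n, k))
P_X = X @ np.linalg.inv(X.T @ X) @ X.T
M_X = np.eye(n) - P_X

print(np.allclose(P_X + M_X, np.eye(n)))                             # P_X + M_X = I_n
print(np.allclose(P_X, P_X.T) and np.allclose(M_X, M_X.T))           # symmetry
print(np.allclose(P_X @ P_X, P_X) and np.allclose(M_X @ M_X, M_X))   # idempotency
print(np.allclose(P_X @ M_X, np.zeros((n, n))))                      # P_X M_X = 0
print(np.allclose(M_X @ X, np.zeros((n, k))))                        # M_X X = 0
```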

Partitioned regression

We can partition the matrix of regressors $X$ as $X = [X_1 \; X_2]$ and write the model as
$$Y = X_1\beta_1 + X_2\beta_2 + e,$$
where $X_1$ is an $n \times k_1$ matrix, $X_2$ is $n \times k_2$, $k_1 + k_2 = k$, and $\beta = (\beta_1', \beta_2')'$, where $\beta_1$ and $\beta_2$ are $k_1$- and $k_2$-vectors respectively. Such a decomposition allows one to focus on a group of variables and their corresponding parameters, say $X_1$ and $\beta_1$.

If $\hat{\beta} = (\hat{\beta}_1', \hat{\beta}_2')'$, then one can write the following version of the normal equations (the first-order conditions of least squares), $X'X\hat{\beta} = X'Y$, as
$$\begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix} \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix} = \begin{pmatrix} X_1'Y \\ X_2'Y \end{pmatrix}.$$

One can obtain expressions for $\hat{\beta}_1$ and $\hat{\beta}_2$ by inverting the partitioned matrix on the left-hand side of the equation above. Alternatively, let's define $M_2$ to be the projection matrix onto the space orthogonal to $S(X_2)$:
$$M_2 = I_n - X_2(X_2'X_2)^{-1}X_2'.$$
Then,
$$\hat{\beta}_1 = (X_1'M_2 X_1)^{-1}X_1'M_2 Y. \qquad (2)$$

In order to show that, first write
$$Y = X_1\hat{\beta}_1 + X_2\hat{\beta}_2 + \hat{e}. \qquad (3)$$
Note that, by construction, $\hat{e}$ is orthogonal to the columns of $X$, so $X_1'\hat{e} = 0$ and $X_2'\hat{e} = 0$; moreover, $M_2\hat{e} = \hat{e}$ (since $\hat{e}$ is orthogonal to $X_2$) and $M_2 X_2 = 0$.

Substitute equation (3) into the right-hand side of equation (2):
$$\begin{aligned}
(X_1'M_2 X_1)^{-1}X_1'M_2 Y &= (X_1'M_2 X_1)^{-1}X_1'M_2\big(X_1\hat{\beta}_1 + X_2\hat{\beta}_2 + \hat{e}\big) \\
&= (X_1'M_2 X_1)^{-1}X_1'M_2 X_1\hat{\beta}_1 + (X_1'M_2 X_1)^{-1}X_1'\hat{e} \\
&= \hat{\beta}_1,
\end{aligned}$$
where we used $M_2 X_2 = 0$, $M_2\hat{e} = \hat{e}$, and $X_1'\hat{e} = 0$.

Since $M_2$ is symmetric and idempotent, one can write
$$\hat{\beta}_1 = \big((M_2 X_1)'(M_2 X_1)\big)^{-1}(M_2 X_1)'(M_2 Y) = (\tilde{X}_1'\tilde{X}_1)^{-1}\tilde{X}_1'\tilde{Y},$$
where
$$\tilde{X}_1 = M_2 X_1 = X_1 - X_2(X_2'X_2)^{-1}X_2'X_1 \quad \text{(residuals from the regression of the columns of } X_1 \text{ on } X_2\text{)},$$
$$\tilde{Y} = M_2 Y = Y - X_2(X_2'X_2)^{-1}X_2'Y \quad \text{(residuals from the regression of } Y \text{ on } X_2\text{)}.$$

Thus, to obtain the coefficients on the first $k_1$ regressors, instead of running the full regression with $k_1 + k_2$ regressors, one can regress $Y$ on $X_2$ to obtain the residuals $\tilde{Y}$, regress $X_1$ on $X_2$ to obtain the residuals $\tilde{X}_1$, and then regress $\tilde{Y}$ on $\tilde{X}_1$ to obtain $\hat{\beta}_1$. In other words, $\hat{\beta}_1$ shows the effect of $X_1$ after controlling for $X_2$. Similarly to $\hat{\beta}_1$, one can write
$$\hat{\beta}_2 = (X_2'M_1 X_2)^{-1}X_2'M_1 Y, \qquad \text{where } M_1 = I_n - X_1(X_1'X_1)^{-1}X_1'.$$
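This partialling-out result (often called the Frisch-Waugh-Lovell theorem, a name the notes do not use) can be verified numerically; a minimal sketch on simulated data, with numpy and purely illustrative names:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k1, k2 = 200, 2, 3
X1 = rng.normal(size=(n, k1))
X2 = np.column_stack([np.ones(n), rng.normal(size=(n, k2 - 1))])
Y = X1 @ np.array([1.0, -2.0]) + X2 @ np.array([0.5, 1.0, 0.0]) + rng.normal(size=n)

# Full regression of Y on [X1 X2]; first k1 coefficients belong to X1
X = np.column_stack([X1, X2])
beta_full = np.linalg.lstsq(X, Y, rcond=None)[0]
beta1_full = beta_full[:k1]

# Partialled-out ("residuals on residuals") regression
M2 = np.eye(n) - X2 @ np.linalg.inv(X2.T @ X2) @ X2.T
X1_tilde = M2 @ X1            # residuals of X1 columns on X2
Y_tilde = M2 @ Y              # residuals of Y on X2
beta1_fwl = np.linalg.lstsq(X1_tilde, Y_tilde, rcond=None)[0]

print(np.allclose(beta1_full, beta1_fwl))  # True: both routes give the same coefficients
```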

For example, consider a simple regression
$$Y_i = \beta_1 + \beta_2 X_i + e_i, \qquad i = 1, \ldots, n.$$
Let's define the $n$-vector of ones $1_n = (1, 1, \ldots, 1)'$. In this case, the matrix of regressors is given by
$$[1_n \; X] = \begin{pmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{pmatrix}.$$

Consider
$$M_1 = I_n - 1_n(1_n'1_n)^{-1}1_n' \qquad \text{and} \qquad \hat{\beta}_2 = (X'M_1 X)^{-1}X'M_1 Y.$$
Now, $1_n'1_n = n$. Therefore, $M_1 = I_n - \frac{1}{n}1_n 1_n'$, and
$$M_1 X = X - \frac{1}{n}1_n 1_n'X = X - 1_n\bar{X} = \begin{pmatrix} X_1 - \bar{X} \\ X_2 - \bar{X} \\ \vdots \\ X_n - \bar{X} \end{pmatrix},$$
where $\bar{X}$ is the sample average: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$. Thus, the matrix $M_1$ transforms the vector $X$ into the vector of deviations from the average. We can write
$$\hat{\beta}_2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})Y_i}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}.$$
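A quick numerical confirmation of the deviations-from-means slope formula (again a sketch with simulated data and illustrative names, not from the original notes):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x = rng.normal(size=n)
y = 3.0 + 1.5 * x + rng.normal(size=n)

# Slope from the deviations-from-means formula
beta2_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Same slope from the regression of y on an intercept and x
coef = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0]
print(np.allclose(beta2_hat, coef[1]))  # True
```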

Goodness of fit

Write
$$Y = P_X Y + M_X Y = \hat{Y} + \hat{e},$$
where, by construction,
$$\hat{Y}'\hat{e} = (P_X Y)'(M_X Y) = Y'P_X M_X Y = 0.$$

Suppose that the model contains an intercept, i.e. the first column of $X$ is the vector of ones $1_n$. The total variation in $Y$ is
$$\sum_{i=1}^{n}(Y_i - \bar{Y})^2 = Y'M_1 Y = (\hat{Y} + \hat{e})'M_1(\hat{Y} + \hat{e}) = \hat{Y}'M_1\hat{Y} + \hat{e}'M_1\hat{e} + 2\hat{Y}'M_1\hat{e}.$$

Since the model contains an intercept, $1_n'\hat{e} = 0$, and $M_1\hat{e} = \hat{e}$. Moreover, $\hat{Y}'\hat{e} = 0$, and, therefore,
$$Y'M_1 Y = \hat{Y}'M_1\hat{Y} + \hat{e}'\hat{e}, \qquad \text{or} \qquad \sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \sum_{i=1}^{n}(\hat{Y}_i - \bar{\hat{Y}})^2 + \sum_{i=1}^{n}\hat{e}_i^2.$$

Note that
$$\bar{Y} = \frac{1}{n}1_n'Y = \frac{1}{n}1_n'(\hat{Y} + \hat{e}) = \frac{1}{n}1_n'\hat{Y} = \bar{\hat{Y}}.$$
Hence, the averages of $Y$ and its predicted values $\hat{Y}$ are equal, and we can write
$$\sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n}\hat{e}_i^2, \qquad (4)$$
or
$$TSS = ESS + RSS,$$
where
$$TSS = \sum_{i=1}^{n}(Y_i - \bar{Y})^2 \quad \text{(total sum of squares)},$$

$$ESS = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2 \quad \text{(explained sum of squares)},$$
$$RSS = \sum_{i=1}^{n}\hat{e}_i^2 \quad \text{(residual sum of squares)}.$$

The ratio of the ESS to the TSS is called the coefficient of determination, or $R^2$:
$$R^2 = \frac{\sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2} = 1 - \frac{\sum_{i=1}^{n}\hat{e}_i^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2} = 1 - \frac{\hat{e}'\hat{e}}{Y'M_1 Y}.$$

Properties of $R^2$:

- Bounded between 0 and 1, as implied by decomposition (4). This property does not hold if the model does not have an intercept, and one should not use the above definition of $R^2$ in that case.
- If $R^2 = 1$ then $\hat{e}'\hat{e} = 0$, which can happen only if $Y \in S(X)$, i.e. $Y$ is exactly a linear combination of the columns of $X$.
- $R^2$ increases as more regressors are added. Suppose we have $n$ observations on regressors $Z_1, \ldots, Z_k$ and $W_1, \ldots, W_m$ and the dependent variable $Y$. Consider two regressions: the long regression with all regressors, and the short regression with only $Z_1, \ldots, Z_k$. It can be shown that the $R^2$ of the short regression must be smaller than or equal to the $R^2$ of the long regression.
- $R^2$ shows how much of the sample variation in $Y$ is explained by $X$. However, our objective is to estimate population relationships, not to explain the sample variation. A high $R^2$ is not necessarily an indicator of a good regression model, and a low $R^2$ is not evidence against it.

Since $R^2$ increases with the inclusion of additional regressors, researchers often report instead the adjusted coefficient of determination $\bar{R}^2$:
$$\bar{R}^2 = 1 - (1 - R^2)\frac{n - 1}{n - k} = 1 - \frac{\hat{e}'\hat{e}/(n - k)}{Y'M_1 Y/(n - 1)}.$$
The adjusted coefficient of determination discounts the fit when the number of regressors $k$ is large relative to the number of observations $n$; $\bar{R}^2$ may decrease with $k$. However, there is no strong argument for using such an adjustment.
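A minimal sketch computing these quantities on simulated data (numpy; illustrative names, not from the original notes), confirming decomposition (4) and the equivalence of the two $R^2$ expressions when an intercept is included:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 100, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # intercept included
Y = X @ np.array([1.0, 0.5, -1.0, 0.0]) + rng.normal(size=n)

beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
Y_hat = X @ beta_hat
e_hat = Y - Y_hat

TSS = np.sum((Y - Y.mean()) ** 2)        # total sum of squares
ESS = np.sum((Y_hat - Y.mean()) ** 2)    # explained sum of squares
RSS = np.sum(e_hat ** 2)                 # residual sum of squares

R2 = ESS / TSS                            # equals 1 - RSS/TSS with an intercept
R2_adj = 1 - (1 - R2) * (n - 1) / (n - k)

print(np.isclose(TSS, ESS + RSS))         # True: decomposition (4) holds
print(np.isclose(R2, 1 - RSS / TSS))      # True: both R^2 formulas agree
print(R2, R2_adj)
```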