Orthogonal Projection and Least Squares
Prof. Philip Pennance (http://pennance.us)
Version: December 12, 2016

1. Let $V$ be a vector space. A linear transformation $P : V \to V$ is called a projection if it is idempotent, that is, if $P^2 = P$.

2. Exercise: If $P$ is a projection then so is $Q = I - P$. Moreover, $\operatorname{im} Q = \ker P$ and $\ker Q = \operatorname{im} P$.

3. Let $P : V \to V$ be a projection. Then $V$ is a direct sum: $V = \operatorname{im} P \oplus \ker P$.

   Proof. Let $x \in V$. Then $x = Px + (I - P)x \in \operatorname{im} P + \ker P$. Now let $y \in \operatorname{im} P \cap \ker P$. Since $y \in \operatorname{im} P$ there exists $z \in V$ such that $y = Pz$. Since $y \in \ker P$, $Py = P^2 z = 0$. By idempotency $P^2 z = Pz$ and so $y = 0$.

4. If $W$ is a complete inner product space then a projection $P$ in $W$ is called orthogonal if $\operatorname{im} P$ and $\ker P$ are orthogonal subspaces.

5. Let $P$ be a projection. The following are equivalent:

   (a) $P$ is self-adjoint.
   (b) $P$ is orthogonal.

   Proof: Let $P$ be orthogonal. Since $\operatorname{im} P$ and $\ker P$ are orthogonal,
   $$\langle Px,\, y - Py\rangle = \langle x - Px,\, Py\rangle = 0.$$
   Hence
   $$\langle x, Py\rangle = \langle Px, Py\rangle = \langle Px, y\rangle,$$
   and thus $P$ is self-adjoint. Conversely, if $P$ is self-adjoint,
   $$\langle Px,\, y - Py\rangle = \langle P^2 x,\, y - Py\rangle = \langle Px,\, P(I - P)y\rangle = \langle Px,\, (P - P^2)y\rangle = 0,$$
   so $\operatorname{im} P \perp \ker P$, i.e. $P$ is orthogonal.

6. Claim: Let $W$ be a Hilbert space. If $U$ is a closed (under the norm topology) subspace of $W$, then orthogonal projection on $U$ exists.

   Proof: [From Wikipedia] Let $x \in W$ and define $f(u) = \|x - u\|$, $u \in U$. The infimum of $f$ exists and by completeness $f$ has a minimum at some $u \in U$. Let $Px = u$. Clearly $P^2 = P$. Now let $e = x - Px$. Then for any nonzero vector $u \in U$:
   $$\left\| e - \frac{\langle e, u\rangle}{\|u\|^2}\, u \right\|^2 = \|e\|^2 - \frac{\langle e, u\rangle^2}{\|u\|^2}.$$
   From this it follows that unless $\langle e, u\rangle = 0$, the vector
   $$w = Px + \frac{\langle e, u\rangle}{\|u\|^2}\, u$$
   satisfies $\|x - w\| < \|x - Px\|$, contradicting minimality. Thus, for all $u \in U$ and $x \in W$,
   $$\langle x - Px,\, u\rangle = 0.$$
   In particular, $\langle x - Px,\, Px\rangle = 0$. Also, for any $u \in U$,
   $$\langle (x + y) - P(x + y),\, u\rangle = 0 \quad\text{and}\quad \langle (x - Px) + (y - Py),\, u\rangle = 0.$$
   Subtraction yields $\langle Px + Py - P(x + y),\, u\rangle = 0$. Finally, choosing $u = Px + Py - P(x + y)$ shows that $Px + Py = P(x + y)$. By a similar argument $\lambda Px = P(\lambda x)$ for every scalar $\lambda$. Hence $P$ is linear.
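As a numerical sanity check of items 1-5, the sketch below (our own illustration in numpy; the specific matrices are arbitrary choices, not part of the notes) contrasts an oblique projection, which is idempotent but not self-adjoint, with an orthogonal one:

```python
import numpy as np

# An oblique (non-orthogonal) projection onto the x-axis along the line y = x:
P = np.array([[1.0, -1.0],
              [0.0,  0.0]])
Q = np.eye(2) - P                      # complementary projection (item 2)

assert np.allclose(P @ P, P)           # idempotent: P^2 = P (item 1)
assert np.allclose(Q @ Q, Q)           # Q = I - P is also a projection

# Direct sum V = im P + ker P (item 3): any x splits as Px + (I - P)x
x = np.array([3.0, 2.0])
assert np.allclose(P @ x + Q @ x, x)
assert np.allclose(P @ (Q @ x), 0)     # im Q is contained in ker P

# P is NOT self-adjoint, and indeed im P and ker P are not orthogonal (item 5):
print(np.allclose(P, P.T))             # False
im_P, ker_P = np.array([1.0, 0.0]), np.array([1.0, 1.0])   # spanning vectors
print(im_P @ ker_P)                    # 1.0, not 0

# By contrast, a symmetric idempotent matrix is an orthogonal projection:
a = np.array([[1.0], [1.0]])
P_orth = a @ a.T / (a.T @ a)           # projection onto span{(1, 1)}
assert np.allclose(P_orth @ P_orth, P_orth) and np.allclose(P_orth, P_orth.T)
```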

7. Orthogonal Projection: Special Case. Let $a$ be a nonzero column vector in $\mathbb{R}^n$ with span $C(a) = \{ta : t \in \mathbb{R}\}$, and let $x \in \mathbb{R}^n$. The orthogonal projection of $x$ on $C(a)$ is the unique point $P_a x \in C(a)$ which satisfies
   $$\langle x - P_a x,\, a\rangle = 0.$$
   Writing $P_a x = ta$ and solving for $t$ yields:
   $$P_a x = \frac{\langle x, a\rangle}{\langle a, a\rangle}\, a.$$

8. Exercise: The matrix of $P_a$ relative to the standard basis is:
   $$P_a = a\,(a^T a)^{-1} a^T.$$

9. Example: In statistics, the mean of a data vector $y \in \mathbb{R}^n$ is determined by projection onto the vector $\mathbf{1} = (1, 1, \dots, 1)^T \in \mathbb{R}^n$. Specifically:
   $$P_{\mathbf{1}}(y) = \frac{\langle y, \mathbf{1}\rangle}{\langle \mathbf{1}, \mathbf{1}\rangle}\,\mathbf{1} = \frac{y_1 + \dots + y_n}{n}\,\mathbf{1} = \bar{y}\,\mathbf{1}.$$
   We write $\bar{\mathbf{y}}$ for the mean vector $\bar{y}\,\mathbf{1}$.

10. Claim: Let $V$ be a finite dimensional inner product space and $W$ a proper subspace of $V$ with basis $(e_1, \dots, e_k)$. Let $g$ be the metric matrix $(g_{ij}) = \langle e_i, e_j\rangle$. If $x \in V$ then the orthogonal projection of $x$ on $W$ is given by
    $$P_W x = c_1 e_1 + \dots + c_k e_k, \qquad\text{where}\qquad c = g^{-1}\begin{pmatrix} \langle e_1, x\rangle \\ \vdots \\ \langle e_k, x\rangle \end{pmatrix}.$$
    Proof: Write
    $$x = c_1 e_1 + \dots + c_k e_k + (x - P_W x), \tag{1}$$
    where $x - P_W x \perp W$. Taking inner products with each of the $e_j$,
    $$\sum_{i=1}^{k} \langle e_j, e_i\rangle\, c_i = \langle e_j, x\rangle.$$
    In matrix form,
    $$g c = \begin{pmatrix} \langle e_1, x\rangle \\ \vdots \\ \langle e_k, x\rangle \end{pmatrix}.$$
    Hence
    $$c = g^{-1}\begin{pmatrix} \langle e_1, x\rangle \\ \vdots \\ \langle e_k, x\rangle \end{pmatrix}. \tag{2}$$
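Items 7-10 are easy to verify numerically. The sketch below (our own illustration; the vectors $a$, $x$, $y$, $e_1$, $e_2$ are arbitrary choices) checks the rank-one projection formula, the mean as a projection onto $\mathbf{1}$, and the Gram-matrix formula (2):

```python
import numpy as np

# Item 7: projection of x onto C(a) = span{a}
a = np.array([2.0, 1.0, 2.0])
x = np.array([1.0, 0.0, 4.0])
P_a_x = (x @ a) / (a @ a) * a
assert np.isclose((x - P_a_x) @ a, 0.0)        # residual orthogonal to a

# Item 8: the matrix of P_a is a (a^T a)^{-1} a^T
P_a = np.outer(a, a) / (a @ a)
assert np.allclose(P_a @ x, P_a_x)

# Item 9: projecting y onto 1 = (1, ..., 1)^T gives the mean vector
y = np.array([3.0, 5.0, 10.0])
ones = np.ones(3)
mean_vec = (y @ ones) / (ones @ ones) * ones
assert np.allclose(mean_vec, y.mean() * ones)

# Item 10: projection onto W = span{e_1, e_2} via the metric (Gram) matrix g
E = np.array([[1.0, 1.0, 0.0],            # rows are e_1, e_2 (not orthonormal)
              [1.0, 2.0, 1.0]])
g = E @ E.T                                # g_ij = <e_i, e_j>
c = np.linalg.solve(g, E @ x)              # c = g^{-1} (<e_i, x>), equation (2)
P_W_x = c @ E                              # c_1 e_1 + c_2 e_2
assert np.allclose(E @ (x - P_W_x), 0.0)   # x - P_W x is orthogonal to W
```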

11. Special Cases

    (a) If the basis is orthonormal, then $g$ is the $k \times k$ identity matrix $I$, so that $c_i = \langle e_i, x\rangle$. In this case:
    $$P_W x = \sum_{i=1}^{k} \langle e_i, x\rangle\, e_i.$$

    (b) If the basis is merely orthogonal then $g$ is diagonal and easily inverted to give
    $$c_i = \frac{\langle e_i, x\rangle}{\langle e_i, e_i\rangle}.$$

12. Let $V = \mathbb{R}^m$ and let $W \subseteq V$ be the column space $C(A)$ of an $m \times n$ real matrix $A$ with independent columns $A_1, A_2, \dots, A_n$, and let $y \in \mathbb{R}^m$. Then (2) for projection onto $W$ takes the form
    $$g c = \begin{pmatrix} \langle A_1, y\rangle \\ \vdots \\ \langle A_n, y\rangle \end{pmatrix}.$$
    Since
    $$\begin{pmatrix} \langle A_1, y\rangle \\ \vdots \\ \langle A_n, y\rangle \end{pmatrix} = A^T y \qquad\text{and}\qquad g(i, j) = \langle A_i, A_j\rangle = (A^T A)(i, j),$$
    the projection coefficient vector $c$ is given by:
    $$A^T A c = A^T y.$$
    In statistics, this is called the normal equation. The orthogonal projection of $y$ onto $W = C(A)$ is:
    $$P_W y = \sum c_i A_i = Ac = A (A^T A)^{-1} A^T y.$$
    Hence, projection onto the column space $C(A)$ has matrix
    $$P = A (A^T A)^{-1} A^T. \tag{3}$$

13. Equation (3) can be proven directly using properties of the row and column spaces.

14. Lemma: If the columns $A_i$ of $A$ are independent then the (square) matrix $A^T A$ is invertible.

    Proof: It suffices to show that $A^T A x = 0$ has the unique solution $x = 0$:
    $$A^T A x = 0 \;\Rightarrow\; x^T A^T A x = 0 \;\Rightarrow\; (Ax)^T Ax = 0 \;\Rightarrow\; Ax = \sum x_i A_i = 0.$$
    Since the $A_i$ are independent it must be that $x = 0$.

15. Claim: Let $A \in \mathbb{R}^{m \times n}$ have independent columns. Then the matrix $P = A (A^T A)^{-1} A^T$ determines an orthogonal projection onto $C(A)$.

    Proof: Notice $P^2 = P$, so $P$ is indeed a projection. Since $PA = A$, the subspace $C(A)$ is $P$-invariant. Moreover, if $x \in \mathbb{R}^m$ and $y = (A^T A)^{-1} A^T x$ then $Px = Ay = \sum y_i A_i \in C(A)$, and so $P$ is a projection onto $C(A)$. Since the columns of $A$ are independent, $A$ has a left inverse. From this fact and the preceding lemma we have:
    $$b \in \ker P \iff Pb = 0 \iff A (A^T A)^{-1} A^T b = 0 \iff (A^T A)^{-1} A^T b = 0 \iff A^T b = 0 \iff b \in N(A^T).$$
    But the left nullspace $N(A^T)$ is orthogonal to the column space $C(A)$. Since $\operatorname{im} P$ and $\ker P$ are orthogonal subspaces it follows that the projection $P$ is orthogonal.
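Equation (3) and the claims in items 14-15 can be checked numerically. A minimal sketch, assuming a random $5 \times 3$ matrix $A$ whose columns are (almost surely) independent:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))            # columns independent with probability 1

AtA = A.T @ A                              # invertible by the lemma in item 14
P = A @ np.linalg.solve(AtA, A.T)          # P = A (A^T A)^{-1} A^T, equation (3)

assert np.allclose(P @ P, P)               # idempotent: a projection
assert np.allclose(P, P.T)                 # self-adjoint: an orthogonal projection
assert np.allclose(P @ A, A)               # C(A) is fixed pointwise, so im P = C(A)

# ker P = N(A^T): any b with A^T b = 0 is killed by P
b = rng.standard_normal(5)
b = b - P @ b                              # component of b in C(A)-perp = N(A^T)
assert np.allclose(A.T @ b, 0.0)
assert np.allclose(P @ b, 0.0)
```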

16. Equation (3) can also be obtained geometrically as follows. Let $A \in \mathbb{R}^{m \times n}$ have independent columns, and let the matrix $P$ represent orthogonal projection onto $C(A)$. Let $y \in \mathbb{R}^m$ and $\hat{y} = Py$. Since $\hat{y} \in C(A)$ there exists $\hat{x}$ such that $A\hat{x} = \hat{y}$. By orthogonality $(A\hat{x} - y) \perp C(A)$, and $I - P$ is the orthogonal projection onto $N(A^T)$. Thus if $y \in \mathbb{R}^m$, $\hat{y} = Py$ and $e = y - \hat{y}$, then the residual vector $e$ belongs to the left nullspace.

    [Figure: the decomposition $y = \hat{y} + e$ in $\mathbb{R}^m$, with $\hat{y} \in C(A)$ (dimension $n$) and $e \in N(A^T)$ (dimension $m - n$).]

    But $C(A) \perp N(A^T)$, hence
    $$A^T (A\hat{x} - y) = 0.$$
    Since $A^T A$ is invertible it follows that
    $$\hat{x} = (A^T A)^{-1} A^T y.$$
    Thus the projection of $y$ on $C(A)$ is $\hat{y} = A\hat{x}$, where $\hat{x}$ is the solution of
    $$A^T A \hat{x} = A^T y.$$

17. The action of an arbitrary $m \times n$ real matrix is illustrated in the following diagram (due to Strang).

    [Figure: the row space $R(A)$ (dimension $r$) and the nullspace $N(A)$ (dimension $n - r$) in $\mathbb{R}^n$; a vector $x = x_r + x_n$ with $Ax_r = b$, $Ax_n = 0$, $Ax = b$; the column space $C(A)$ (dimension $r$) and the left nullspace $N(A^T)$ (dimension $m - r$) in $\mathbb{R}^m$.]

18. When $A$ has independent columns, the nullspace $N(A)$ is trivial, and
    $$\mathbb{R}^m = C(A) \oplus N(A^T), \qquad \dim C(A) = r, \quad \dim N(A^T) = m - r.$$

19. Example: Consider the problem of finding the line $y = b + mx$ which best fits the points $(1, 1)$, $(2, 2)$ and $(3, 2)$. An exact fit would require that
    $$b + 1m = 1, \qquad b + 2m = 2, \qquad b + 3m = 2,$$
    or in matrix form:
    $$\underbrace{\begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} b \\ m \end{pmatrix}}_{x} = \underbrace{\begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix}}_{y}.$$
    This system has no solution since $y \notin C(A)$. The normal equation $A^T A \hat{x} = A^T y$ reads
    $$\begin{pmatrix} 3 & 6 \\ 6 & 14 \end{pmatrix} \begin{pmatrix} \hat{b} \\ \hat{m} \end{pmatrix} = \begin{pmatrix} 5 \\ 11 \end{pmatrix}.$$
    Solving for the projection coefficients:
    $$\begin{pmatrix} \hat{b} \\ \hat{m} \end{pmatrix} = \begin{pmatrix} 3 & 6 \\ 6 & 14 \end{pmatrix}^{-1} \begin{pmatrix} 5 \\ 11 \end{pmatrix} = \begin{pmatrix} 2/3 \\ 1/2 \end{pmatrix}.$$
    The projection of $y$ on the column space of $A$ is
    $$\hat{y} = A\hat{x} = \frac{2}{3}\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + \frac{1}{2}\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 7/6 \\ 5/3 \\ 13/6 \end{pmatrix},$$
    or equivalently
    $$\hat{b} + 1\hat{m} = 7/6, \qquad \hat{b} + 2\hat{m} = 5/3, \qquad \hat{b} + 3\hat{m} = 13/6.$$
    It follows that the regression line $\hat{y} = \hat{b} + \hat{m}x$ contains the points $(1, 7/6)$, $(2, 5/3)$, and $(3, 13/6)$. Since the projection $\hat{y}$ is the point in $C(A)$ which minimizes $\|\hat{y} - y\|$, it follows that the sum of the squares of the vertical distances of the data points $(1, 1)$, $(2, 2)$ and $(3, 2)$ from the regression line is minimized. For this reason, the above procedure is known as the method of least squares. The vector
    $$e = y - \hat{y} = \begin{pmatrix} -1/6 \\ 1/3 \\ -1/6 \end{pmatrix}$$
    is called the residual vector and is orthogonal to $C(A)$.
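The worked example in item 19 takes only a few lines of numpy (a sketch; the call to np.linalg.lstsq is our own cross-check, not part of the notes):

```python
import numpy as np

# Fit y = b + m x to the points (1, 1), (2, 2), (3, 2)
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 2.0])

# Normal equation A^T A x_hat = A^T y
x_hat = np.linalg.solve(A.T @ A, A.T @ y)
print(x_hat)                    # [0.6667 0.5], i.e. b_hat = 2/3, m_hat = 1/2

y_hat = A @ x_hat               # projection of y onto C(A): [7/6, 5/3, 13/6]
e = y - y_hat                   # residual vector: [-1/6, 1/3, -1/6]
assert np.allclose(A.T @ e, 0)  # residual is orthogonal to C(A)

# Same answer via the built-in least-squares solver
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(x_ls, x_hat)
```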

20. The Mean. The simplest case of least squares uses a model spanned by the single vector $\mathbf{1} = (1, 1, \dots, 1)^T \in \mathbb{R}^n$. Let $y = (y_1, y_2, \dots, y_n)^T$. As noted previously, the projection of $y$ on the vector $\mathbf{1}$ is just the mean vector $\bar{\mathbf{y}} = \bar{y}\,\mathbf{1}$. This means that $\bar{y}$ is the real number $b \in \mathbb{R}$ which minimizes
    $$\|y - b\,\mathbf{1}\|^2 = \sum (y_i - b)^2.$$
    We can also obtain this result using the normal equations. Let $A = \mathbf{1} = (1, 1, \dots, 1)^T$ and let $b \in \mathbb{R}$ be the unknown. The equation $Ab = y$ will not have a solution unless $y$ belongs to the one dimensional column space $C(A)$. The normal equation gives
    $$A^T A \hat{b} = A^T y \iff n\hat{b} = y_1 + \dots + y_n \iff \hat{b} = \bar{y},$$
    so, as already proven, the projection of $y$ on $C(A)$ is $A\bar{y} = \bar{y}\,\mathbf{1}$.

    Let $v_1 = \mathbf{1} \in \mathbb{R}^n$ and extend to a basis
    $$v_1 = \mathbf{1},\ \overbrace{v_2, \dots, v_n}^{\text{variance}}.$$
    The difference $C_n(y) = y - \bar{\mathbf{y}}$, which lies in the span of $v_2, \dots, v_n$, is called the centering of $y$. The map $C_n$ is just the projection of $y$ onto the subspace perpendicular to $\mathbf{1}$. The random variable
    $$S_n^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1} = \frac{\|Y - \bar{\mathbf{Y}}\|^2}{n - 1}$$
    provides an unbiased estimate of the variance of the random variable $Y$ (of which $y$ is an instance).

    [Figure: the data vector $y$ decomposed into the mean vector $\bar{\mathbf{y}}$ and the centering $C_n(y) = y - \bar{\mathbf{y}}$.]

    Finally we note that, since the projection is orthogonal, $(y - \bar{\mathbf{y}}) \perp \bar{\mathbf{y}}$.
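A short numerical illustration of item 20 (our own sketch with an arbitrary data vector $y$): the projection onto $\mathbf{1}$ recovers the mean, the centering is orthogonal to both $\mathbf{1}$ and the mean vector, and $S_n^2$ matches numpy's unbiased sample variance:

```python
import numpy as np

y = np.array([2.0, 4.0, 9.0, 5.0])
n = len(y)
ones = np.ones(n)

# Projection of y onto span{1} is the mean vector y_bar * 1
mean_vec = (y @ ones) / n * ones
assert np.allclose(mean_vec, y.mean() * ones)

# The centering C_n(y) = y - y_bar lies in the subspace perpendicular to 1
centered = y - mean_vec
assert np.isclose(centered @ ones, 0.0)

# ... and is orthogonal to the mean vector itself
assert np.isclose(centered @ mean_vec, 0.0)

# The unbiased sample variance is ||y - y_bar 1||^2 / (n - 1)
S2 = centered @ centered / (n - 1)
assert np.isclose(S2, np.var(y, ddof=1))
```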

21. Statistical Appendix. Let $Y = (Y_1, Y_2, \dots, Y_n)$ be a vector of independent random variables with means and standard deviations specified by
    $$\mu = (\mu_1, \mu_2, \dots, \mu_n), \qquad \sigma = (\sigma_1, \sigma_2, \dots, \sigma_n),$$
    and let $u \in \mathbb{R}^n$. Then:

    (a) $E(Y \cdot u) = u \cdot \mu$ and $\operatorname{Var}(Y \cdot u) = u^2 \cdot \sigma^2$, where $\sigma^2 = (\sigma_1^2, \sigma_2^2, \dots, \sigma_n^2)$ and $u^2 = (u_1^2, u_2^2, \dots, u_n^2)$.

    (b) If $Y_i \sim (\mu, \sigma^2)$ and $\|u\| = 1$, then $E(Y \cdot u) = (u \cdot \mathbf{1})\mu$ and $\operatorname{Var}(Y \cdot u) = \sigma^2$. Moreover $E(Y \cdot u) = 0$ whenever $u \cdot \mathbf{1} = 0$.

    (c) If $Y_i \sim (\mu, \sigma^2)$, $\|u\| = 1$ and $u \perp \mathbf{1}$, then $E(Y \cdot u)^2 = \sigma^2$.

    (d) If $Y_i \sim (\mu, \sigma^2)$ and $u$, $v$ are unit vectors in $\mathbb{R}^n$, the following are equivalent:
        i. $Y \cdot u$ and $Y \cdot v$ are independent;
        ii. $u \cdot v = 0$.

    (e) If $Y_i \sim N(\mu, \sigma^2)$ and $\|u\| = 1$, then $Y \cdot u$ is also normal with variance $\sigma^2$.

22. Corollary. Let $Y = (Y_1, Y_2, \dots, Y_n) \in \mathbb{R}^n$ be a random vector with $Y_i \sim (\mu, \sigma^2)$. Let
    $$S_n^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1} = \frac{\|Y - \bar{\mathbf{Y}}\|^2}{n - 1}.$$
    Then $E S_n^2 = \sigma^2$.

    Proof. Let $u_1 = \mathbf{1}/\sqrt{n} \in \mathbb{R}^n$ be extended (say, by Gram-Schmidt) to an orthonormal basis $u_1, u_2, \dots, u_n$. By orthonormal expansion:
    $$Y = \sum_{i=1}^{n} P_{u_i} Y = P_{u_1} Y + \sum_{i=2}^{n} P_{u_i} Y = \bar{\mathbf{Y}} + \sum_{i=2}^{n} P_{u_i} Y.$$
    Hence
    $$\|Y - \bar{\mathbf{Y}}\|^2 = \Big\|\sum_{i=2}^{n} P_{u_i} Y\Big\|^2 = \sum_{i=2}^{n} (u_i \cdot Y)^2.$$
    For $i \geq 2$, $\|u_i\| = 1$ and $u_i \perp \mathbf{1}$, and therefore $E(Y \cdot u_i)^2 = \sigma^2$. Since the vectors $u_i$ are orthogonal, the projection coefficients are independent, and summing the $n - 1$ terms gives $E\|Y - \bar{\mathbf{Y}}\|^2 = (n - 1)\sigma^2$. It follows that $E S_n^2 = \sigma^2$.

23. Linear Regression in $\mathbb{R}^n$. For $n \geq 2$ an orthogonal basis is used in which $u_1 = \frac{1}{\sqrt{n}}\mathbf{1}, u_2, \dots, u_p$ are designated as basis vectors for the model space $M$ and the remaining vectors $u_{p+1}, \dots, u_{p+q}$ as a basis for the error space $M^{\perp}$:
    $$\underbrace{u_1 = \tfrac{1}{\sqrt{n}}\mathbf{1},\ u_2, \dots, u_p}_{\text{model space}},\ \overbrace{u_{p+1}, \dots, u_{p+q}}^{\text{variance}}.$$
    Notice that if $u \in M^{\perp}$ and $Y = (Y_1, Y_2, \dots, Y_n) \in \mathbb{R}^n$ is a random vector with $Y_i \sim (\mu, \sigma^2)$, then (as in the proof of the corollary) $E(Y \cdot u)^2 = \sigma^2$, and it follows that
    $$E\left[\frac{\sum_{i=1}^{q} (Y \cdot u_{p+i})^2}{q}\right] = \sigma^2.$$
    In the special case $n = 2$, $q = 1$, and
    $$u_1 = (1, 1)/\sqrt{2}, \qquad u_2 = (-1, 1)/\sqrt{2}.$$
    Then, as advertised:
    $$E\left[\frac{(Y \cdot u_2)^2}{1}\right] = E\left[\frac{(Y_2 - Y_1)^2}{2}\right] = \sigma^2.$$
    For details, please see [2].

    [Figure: Linear Regression in Higher Dimensions (diagram taken from Pruim [3]): the response $y$, the fit $\hat{y}$, the mean vector $\bar{\mathbf{y}}$, the residuals $e = y - \hat{y}$, and the effect $\hat{y} - \bar{\mathbf{y}}$ in the model space.]
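The corollary in item 22 and the $n = 2$ special case in item 23 can be checked by simulation. A sketch, assuming normally distributed $Y_i$ and arbitrarily chosen $\mu$, $\sigma$, sample size, and number of trials:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, trials = 3.0, 2.0, 5, 200_000

# Many samples Y = (Y_1, ..., Y_n) with Y_i iid N(mu, sigma^2)
Y = rng.normal(mu, sigma, size=(trials, n))

# Corollary (item 22): E S_n^2 = sigma^2, where S_n^2 = ||Y - Ybar 1||^2 / (n - 1)
Ybar = Y.mean(axis=1, keepdims=True)
S2 = ((Y - Ybar) ** 2).sum(axis=1) / (n - 1)
print(S2.mean())                         # approximately sigma^2 = 4.0

# Item 23, special case n = 2, q = 1 with u_2 = (-1, 1)/sqrt(2):
Y2 = rng.normal(mu, sigma, size=(trials, 2))
u2 = np.array([-1.0, 1.0]) / np.sqrt(2)
print(((Y2 @ u2) ** 2).mean())           # E[(Y_2 - Y_1)^2 / 2], also close to 4.0
```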

Sources

1. Gilbert Strang, Introduction to Linear Algebra.
2. David Saville and Graham R. Wood, Statistical Methods: The Geometric Approach (Springer Texts in Statistics).
3. Randall Pruim, Foundations and Applications of Statistics: An Introduction Using R (Pure and Applied Undergraduate Texts), whose color scheme we have followed.