A Primer in Econometric Theory


Lecture 1: Vector Spaces
John Stachurski
Lectures by Akshay Shanker
May 5, 2017

Overview

Linear algebra is an important foundation for mathematics and, in particular, for econometrics:

- performing basic arithmetic on data
- solving linear equations using data
- advanced operations such as quadratic minimisation

Focus of this chapter:

1. vector spaces: linear operations, norms, linear subspaces, linear independence, bases, etc.
2. the orthogonal projection theorem

Vector Space

The symbol $\mathbb{R}^N$ represents the set of all vectors of length $N$, or $N$-vectors.

An $N$-vector $x$ is a tuple of $N$ real numbers, $x = (x_1, \ldots, x_N)$, where $x_n \in \mathbb{R}$ for each $n$.

We can also write $x$ vertically, like so:

$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{pmatrix}$$

Figure: Three vectors in $\mathbb{R}^2$: $(2, 4)$, $(-3, 3)$ and $(-4, -3.5)$

The vector of ones will be denoted $\mathbf{1}$:

$$\mathbf{1} := \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}$$

The vector of zeros will be denoted $\mathbf{0}$:

$$\mathbf{0} := \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}$$

Linear Operations

Two fundamental algebraic operations:

1. Vector addition
2. Scalar multiplication

1. The sum of $x \in \mathbb{R}^N$ and $y \in \mathbb{R}^N$ is defined by

$$x + y := \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{pmatrix} + \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix} = \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_N + y_N \end{pmatrix}$$

Example 1:

$$\begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} + \begin{pmatrix} 2 \\ 4 \\ 6 \\ 8 \end{pmatrix} = \begin{pmatrix} 3 \\ 6 \\ 9 \\ 12 \end{pmatrix}$$

Example 2:

$$\begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \\ 4 \\ 5 \end{pmatrix}$$

Figure: Vector addition

2. The scalar product of $\alpha \in \mathbb{R}$ and $x \in \mathbb{R}^N$ is defined by

$$\alpha x := \alpha \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{pmatrix} = \begin{pmatrix} \alpha x_1 \\ \alpha x_2 \\ \vdots \\ \alpha x_N \end{pmatrix}$$

Example 1:

$$0.5 \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 0.5 \\ 1.0 \\ 1.5 \\ 2.0 \end{pmatrix}$$

Example 2:

$$-1 \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} = \begin{pmatrix} -1 \\ -2 \\ -3 \\ -4 \end{pmatrix}$$

Figure: Scalar multiplication

Subtraction is performed element by element, analogous to addition:

$$x - y := \begin{pmatrix} x_1 - y_1 \\ x_2 - y_2 \\ \vdots \\ x_N - y_N \end{pmatrix}$$

The definition can also be given in terms of addition and scalar multiplication: $x - y := x + (-1)y$

Figure: Difference between vectors

Inner Product

The inner product of two vectors $x$ and $y$ in $\mathbb{R}^N$ is denoted by $\langle x, y \rangle$ and defined as the sum of the products of their elements:

$$\langle x, y \rangle := \sum_{n=1}^N x_n y_n$$
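
The inner product is easy to compute numerically. The following is a minimal sketch (not part of ET); the array values are arbitrary.

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([4.0, 5.0, 6.0])

    # Sum of elementwise products, as in the definition
    inner_manual = np.sum(x * y)

    # NumPy's built-in inner product gives the same number
    inner_np = np.inner(x, y)

    print(inner_manual, inner_np)   # both 32.0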

Fact. (2.1.1) For any $\alpha, \beta \in \mathbb{R}$ and any $x, y, z \in \mathbb{R}^N$, the following statements are true:

1. $\langle x, y \rangle = \langle y, x \rangle$,
2. $\langle \alpha x, \beta y \rangle = \alpha \beta \langle x, y \rangle$, and
3. $\langle x, \alpha y + \beta z \rangle = \alpha \langle x, y \rangle + \beta \langle x, z \rangle$.

These properties are easy to check using the definitions of scalar multiplication and the inner product. For example, to show 2., pick any $\alpha, \beta \in \mathbb{R}$ and any $x, y \in \mathbb{R}^N$:

$$\langle \alpha x, \beta y \rangle = \sum_{n=1}^N \alpha x_n \beta y_n = \alpha \beta \sum_{n=1}^N x_n y_n = \alpha \beta \langle x, y \rangle$$

Norms and Distance

The (Euclidean) norm of $x \in \mathbb{R}^N$ is defined as

$$\|x\| := \sqrt{\langle x, x \rangle}$$

Interpretation:

- $\|x\|$ represents the length of $x$
- $\|x - y\|$ represents the distance between $x$ and $y$

Fact. (2.1.2) For any $\alpha \in \mathbb{R}$ and any $x, y \in \mathbb{R}^N$, the following statements are true:

1. $\|x\| \geq 0$ and $\|x\| = 0$ if and only if $x = 0$,
2. $\|\alpha x\| = |\alpha| \|x\|$,
3. $\|x + y\| \leq \|x\| + \|y\|$ (triangle inequality), and
4. $|\langle x, y \rangle| \leq \|x\| \|y\|$ (Cauchy-Schwarz inequality).

Properties 1. and 2. are straightforward to prove (exercise). Property 4. is addressed in ET exercise 3.5.33.

To show property 3., use the properties of the inner product in fact 2.1.1:

$$\|x + y\|^2 = \langle x + y, x + y \rangle = \langle x, x \rangle + 2\langle x, y \rangle + \langle y, y \rangle \leq \langle x, x \rangle + 2|\langle x, y \rangle| + \langle y, y \rangle$$

Applying the Cauchy-Schwarz inequality gives

$$\|x + y\|^2 \leq (\|x\| + \|y\|)^2$$

Taking the square root gives the triangle inequality.
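
A quick numerical sanity check of the Cauchy-Schwarz and triangle inequalities; this sketch is added for illustration and the vectors are arbitrary.

    import numpy as np

    x = np.array([1.0, -2.0, 3.0])
    y = np.array([4.0, 0.5, -1.0])

    norm = np.linalg.norm            # Euclidean norm

    # Cauchy-Schwarz: |<x, y>| <= ||x|| ||y||
    print(abs(np.inner(x, y)) <= norm(x) * norm(y))   # True

    # Triangle inequality: ||x + y|| <= ||x|| + ||y||
    print(norm(x + y) <= norm(x) + norm(y))           # True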

A linear combination of vectors $x_1, \ldots, x_K$ in $\mathbb{R}^N$ is a vector

$$y = \sum_{k=1}^K \alpha_k x_k = \alpha_1 x_1 + \cdots + \alpha_K x_K$$

where $\alpha_1, \ldots, \alpha_K$ are scalars.

Example.

$$0.5 \begin{pmatrix} 6.0 \\ 2.0 \\ 8.0 \end{pmatrix} + 3.0 \begin{pmatrix} 0 \\ 1.0 \\ -1.0 \end{pmatrix} = \begin{pmatrix} 3.0 \\ 4.0 \\ 1.0 \end{pmatrix}$$
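
Checking the example numerically; a small sketch using the values above (with the sign of the last entry of the second vector reconstructed from the stated result).

    import numpy as np

    x1 = np.array([6.0, 2.0, 8.0])
    x2 = np.array([0.0, 1.0, -1.0])

    # Linear combination y = 0.5 * x1 + 3.0 * x2
    y = 0.5 * x1 + 3.0 * x2
    print(y)   # [3. 4. 1.]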

Figure: Four linear combinations $y = \alpha_1 x_1 + \alpha_2 x_2$ of the vectors $x_1, x_2$, for different choices of $(\alpha_1, \alpha_2)$

Span

Let $X \subset \mathbb{R}^N$ be any nonempty set. The set of all possible linear combinations of elements of $X$ is called the span of $X$, denoted by $\mathrm{span}(X)$.

For finite $X := \{x_1, \ldots, x_K\}$ the span can be expressed as

$$\mathrm{span}(X) := \left\{ \text{all } \sum_{k=1}^K \alpha_k x_k \text{ such that } (\alpha_1, \ldots, \alpha_K) \in \mathbb{R}^K \right\}$$

Example. The four vectors labeled $y$ in the previous figure lie in the span of $X = \{x_1, x_2\}$.

Can any vector in $\mathbb{R}^2$ be created as a linear combination of $x_1, x_2$? The answer is affirmative. We'll prove this in 2.1.5.

Example. Let $X = \{\mathbf{1}\} \subset \mathbb{R}^2$, where $\mathbf{1} := (1, 1)$.

The span of $X$ is all vectors of the form

$$\alpha \mathbf{1} = \begin{pmatrix} \alpha \\ \alpha \end{pmatrix} \quad \text{with} \quad \alpha \in \mathbb{R}$$

This constitutes a line in the plane that passes through

- the vector $\mathbf{1}$ (set $\alpha = 1$)
- the origin $\mathbf{0}$ (set $\alpha = 0$)

Figure: The span of $\mathbf{1} := (1, 1)$ in $\mathbb{R}^2$


Example. Let $x_1 = (3, 4, 2)$ and let $x_2 = (3, 4, 0.4)$.

By definition, the span is all vectors of the form

$$y = \alpha \begin{pmatrix} 3 \\ 4 \\ 2 \end{pmatrix} + \beta \begin{pmatrix} 3 \\ 4 \\ 0.4 \end{pmatrix} \quad \text{where} \quad \alpha, \beta \in \mathbb{R}$$

This is a plane that passes through

- the vector $x_1$
- the vector $x_2$
- the origin $\mathbf{0}$

Figure: Span of $x_1, x_2$

Example. Consider the vectors $\{e_1, \ldots, e_N\} \subset \mathbb{R}^N$, where

$$e_1 := \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad e_2 := \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad e_N := \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}$$

That is, $e_n$ has all zeros except for a 1 as the $n$-th element. The vectors $e_1, \ldots, e_N$ are called the canonical basis vectors of $\mathbb{R}^N$.

Figure: Canonical basis vectors in $\mathbb{R}^2$: $e_1 = (1, 0)$, $e_2 = (0, 1)$ and $y = y_1 e_1 + y_2 e_2$

Example. (cont.) The span of $\{e_1, \ldots, e_N\}$ is equal to all of $\mathbb{R}^N$.

Proof for $N = 2$: Pick any $y \in \mathbb{R}^2$. We have

$$y := \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} y_1 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ y_2 \end{pmatrix} = y_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix} + y_2 \begin{pmatrix} 0 \\ 1 \end{pmatrix} = y_1 e_1 + y_2 e_2$$

Thus, $y \in \mathrm{span}\{e_1, e_2\}$. Since $y$ was arbitrary, we have shown $\mathrm{span}\{e_1, e_2\} = \mathbb{R}^2$.

Example. Consider the set

$$P := \{(x_1, x_2, 0) \in \mathbb{R}^3 : x_1, x_2 \in \mathbb{R}\}$$

Graphically, $P$ is a flat plane in $\mathbb{R}^3$, where the height coordinate equals 0.

Example. (cont.) If $e_1 = (1, 0, 0)$ and $e_2 = (0, 1, 0)$, then $\mathrm{span}\{e_1, e_2\} = P$.

To verify the claim, let $x = (x_1, x_2, 0)$ be any element of $P$. We can write $x$ as

$$x = \begin{pmatrix} x_1 \\ x_2 \\ 0 \end{pmatrix} = x_1 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + x_2 \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = x_1 e_1 + x_2 e_2$$

In other words, $P \subset \mathrm{span}\{e_1, e_2\}$. Conversely, we have $\mathrm{span}\{e_1, e_2\} \subset P$ (why?)

Figure: $\mathrm{span}\{e_1, e_2\} = P$

Fact. (2.1.3) If $X$ and $Y$ are nonempty subsets of $\mathbb{R}^N$ and $X \subset Y$, then $\mathrm{span}(X) \subset \mathrm{span}(Y)$.

Proof. Pick any nonempty $X \subset Y \subset \mathbb{R}^N$ and let $z \in \mathrm{span}(X)$. We have

$$z = \sum_{k=1}^K \alpha_k x_k \quad \text{for some } x_1, \ldots, x_K \in X, \; \alpha_1, \ldots, \alpha_K \in \mathbb{R}$$

Proof. (cont.) Since $X \subset Y$, each $x_k$ is also in $Y$, giving us

$$z = \sum_{k=1}^K \alpha_k x_k \quad \text{for some } x_1, \ldots, x_K \in Y, \; \alpha_1, \ldots, \alpha_K \in \mathbb{R}$$

Hence $z \in \mathrm{span}(Y)$.

Linear Independence

Important applied questions:

- When is a matrix invertible?
- When do regression arguments suffer from collinearity?
- When does a set of linear equations have a solution?

All of these questions are closely related to linear independence.

Definition. A nonempty collection of vectors $X := \{x_1, \ldots, x_K\} \subset \mathbb{R}^N$ is called linearly independent if

$$\sum_{k=1}^K \alpha_k x_k = 0 \implies \alpha_1 = \cdots = \alpha_K = 0$$

Informally, linearly independent sets span large spaces.

Example. Consider the two vectors $x_1 = (1.2, 1.1)$ and $x_2 = (-2.2, 1.4)$.

Suppose $\alpha_1$ and $\alpha_2$ are scalars with

$$\alpha_1 \begin{pmatrix} 1.2 \\ 1.1 \end{pmatrix} + \alpha_2 \begin{pmatrix} -2.2 \\ 1.4 \end{pmatrix} = 0$$

This translates to a linear, two-equation system where the unknowns are $\alpha_1$ and $\alpha_2$. The only solution is $\alpha_1 = \alpha_2 = 0$, so $\{x_1, x_2\}$ is linearly independent.

Figure: The lines $\alpha_2 = (1.2/2.2)\alpha_1$ and $\alpha_2 = -(1.1/1.4)\alpha_1$ intersect only at $\alpha_1 = \alpha_2 = 0$
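
A numerical way to confirm independence (an illustrative sketch, using the sign on $x_2$ as reconstructed above): stack the vectors as columns and check that the matrix has full column rank.

    import numpy as np

    x1 = np.array([1.2, 1.1])
    x2 = np.array([-2.2, 1.4])

    A = np.column_stack([x1, x2])

    # Full column rank (= 2) means the only solution of A @ a = 0 is a = 0,
    # i.e. {x1, x2} is linearly independent
    print(np.linalg.matrix_rank(A) == 2)   # True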

Example. The set of canonical basis vectors $\{e_1, \ldots, e_N\}$ is linearly independent in $\mathbb{R}^N$.

To see this, let $\alpha_1, \ldots, \alpha_N$ be coefficients such that $\sum_{k=1}^N \alpha_k e_k = 0$. We have

$$\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_N \end{pmatrix} = \sum_{k=1}^N \alpha_k e_k = 0 = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$

In particular, $\alpha_k = 0$ for all $k$. Hence $\{e_1, \ldots, e_N\}$ is linearly independent.

Theorem. (2.1.1) Let $X := \{x_1, \ldots, x_K\} \subset \mathbb{R}^N$. For $K > 1$, the following statements are equivalent:

1. $X$ is linearly independent
2. $X_0$ a proper subset of $X$ $\implies$ $\mathrm{span}\, X_0$ a proper subset of $\mathrm{span}\, X$
3. No vector in $X$ can be written as a linear combination of the others

The proof is an exercise. See ET ex. 2.4.15 and its solution.

Example. Dropping any of the canonical basis vectors reduces the span.

Consider the $N = 2$ case. We know $\mathrm{span}\{e_1, e_2\} = \mathbb{R}^2$, and removing either element of $\{e_1, e_2\}$ reduces the span to a line.

However, let $x_1 = (1, 0)$ and $x_2 = (-2, 0)$. The pair is not linearly independent, since $x_2 = -2 x_1$:

- dropping either vector does not change the span; the span remains the horizontal axis
- we have $x_2 = -2 x_1$, which means that each vector can be written as a linear combination of the other

Fact. (2.1.4) If $X := \{x_1, \ldots, x_K\}$ is linearly independent, then

1. every subset of $X$ is linearly independent,
2. $X$ does not contain $\mathbf{0}$, and
3. $X \cup \{x\}$ is linearly independent for all $x \in \mathbb{R}^N$ such that $x \notin \mathrm{span}\, X$.

The proof is a solved exercise (ex. 2.4.16 in ET).

Linear Independence and Uniqueness

Linear independence is the key condition for existence and uniqueness of solutions to systems of linear equations.

Theorem. (2.1.2) Let $X := \{x_1, \ldots, x_K\}$ be any collection of vectors in $\mathbb{R}^N$. The following statements are equivalent:

1. $X$ is linearly independent
2. For each $y \in \mathbb{R}^N$ there exists at most one set of scalars $\alpha_1, \ldots, \alpha_K$ such that

$$y = \alpha_1 x_1 + \cdots + \alpha_K x_K \tag{1}$$

Proof. (1. $\implies$ 2.) Let $X$ be linearly independent and pick any $y$. Suppose, by way of contradiction, that (1) holds for more than one set of scalars: we have $\alpha_1, \ldots, \alpha_K$ and $\beta_1, \ldots, \beta_K$ such that

$$y = \sum_{k=1}^K \alpha_k x_k = \sum_{k=1}^K \beta_k x_k$$

Then $\sum_{k=1}^K (\alpha_k - \beta_k) x_k = 0$, and linear independence gives $\alpha_k = \beta_k$ for all $k$, a contradiction.

Proof. (2. $\implies$ 1.) If 2. holds, then there exists at most one set of scalars such that

$$0 = \sum_{k=1}^K \alpha_k x_k$$

Because $\alpha_1 = \cdots = \alpha_K = 0$ has this property, no nonzero scalars yield $0 = \sum_{k=1}^K \alpha_k x_k$. We can then conclude that $X$ is linearly independent, by the definition of linear independence.
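
To make the uniqueness statement concrete, here is a small sketch (not from ET): with linearly independent columns, the coefficients representing a given $y$ can be recovered by solving a linear system, and the solution is unique.

    import numpy as np

    x1 = np.array([1.2, 1.1])
    x2 = np.array([-2.2, 1.4])
    A = np.column_stack([x1, x2])      # linearly independent columns

    y = np.array([1.0, 3.0])           # an arbitrary target vector

    # The unique scalars (alpha_1, alpha_2) with y = alpha_1 x1 + alpha_2 x2
    alpha = np.linalg.solve(A, y)
    print(np.allclose(A @ alpha, y))   # True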

Linear Subspaces

A nonempty subset $S$ of $\mathbb{R}^N$ is called a linear subspace (or just subspace) of $\mathbb{R}^N$ if

$$x, y \in S \text{ and } \alpha, \beta \in \mathbb{R} \implies \alpha x + \beta y \in S$$

In other words, $S \subset \mathbb{R}^N$ is closed under vector addition and scalar multiplication.

Example. If $X$ is any nonempty subset of $\mathbb{R}^N$, then $\mathrm{span}\, X$ is a linear subspace of $\mathbb{R}^N$.

Example. $\mathbb{R}^N$ is a linear subspace of $\mathbb{R}^N$.

Example. Given any $a \in \mathbb{R}^N$, the set $A := \{x \in \mathbb{R}^N : \langle a, x \rangle = 0\}$ is a linear subspace of $\mathbb{R}^N$.

To see this, let $x, y \in A$, let $\alpha, \beta \in \mathbb{R}$ and define $z := \alpha x + \beta y$. We have

$$\langle a, z \rangle = \langle a, \alpha x + \beta y \rangle = \alpha \langle a, x \rangle + \beta \langle a, y \rangle = 0 + 0 = 0$$

Hence $z \in A$.

Fact. (2.1.5) If $S$ is a linear subspace of $\mathbb{R}^N$, then

1. $\mathbf{0} \in S$,
2. $X \subset S \implies \mathrm{span}\, X \subset S$, and
3. $\mathrm{span}\, S = S$.

Theorem. (2.1.3) Let $S$ be a linear subspace of $\mathbb{R}^N$. If $S$ is spanned by $K$ vectors, then any linearly independent subset of $S$ has at most $K$ vectors.

Example. Recall that the canonical basis vectors $\{e_1, e_2\}$ span $\mathbb{R}^2$. As such, from theorem 2.1.3, the three vectors below are linearly dependent.

Figure: The three vectors $(2, 4)$, $(-3, 3)$ and $(-4, -3.5)$ in $\mathbb{R}^2$

Example. The plane $P := \{(x_1, x_2, 0) \in \mathbb{R}^3 : x_1, x_2 \in \mathbb{R}\}$ from example 2.1.5 in ET can be spanned by two vectors. By theorem 2.1.3, the three vectors in the figure below are linearly dependent.

Figure: Three vectors $x_1$, $x_2$, $x_3$ lying in the plane $P$

Bases and Dimension

Theorem. (2.1.4) Let $X := \{x_1, \ldots, x_N\}$ be any $N$ vectors in $\mathbb{R}^N$. The following statements are equivalent:

1. $\mathrm{span}\, X = \mathbb{R}^N$
2. $X$ is linearly independent

For a proof, see page 22 in ET.

Let $S$ be a linear subspace of $\mathbb{R}^N$ and let $B \subset S$. The set $B$ is called a basis of $S$ if

1. $B$ spans $S$ and
2. $B$ is linearly independent

The plural of basis is bases.

From theorem 2.1.2, when $B$ is a basis of $S$, each point in $S$ has exactly one representation as a linear combination of elements of $B$.

From theorem 2.1.4, any $N$ linearly independent vectors in $\mathbb{R}^N$ form a basis of $\mathbb{R}^N$.

Example. Recall the plane from the example above:

$$P := \{(x_1, x_2, 0) \in \mathbb{R}^3 : x_1, x_2 \in \mathbb{R}\}$$

We showed $\mathrm{span}\{e_1, e_2\} = P$ for

$$e_1 := \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad e_2 := \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$$

Moreover, $\{e_1, e_2\}$ is linearly independent (why?). Hence $\{e_1, e_2\}$ is a basis for $P$.

Theorem. (2.1.5) If $S$ is a linear subspace of $\mathbb{R}^N$ distinct from $\{\mathbf{0}\}$, then

1. $S$ has at least one basis, and
2. every basis of $S$ has the same number of elements.

If $S$ is a linear subspace of $\mathbb{R}^N$, then the common number identified in theorem 2.1.5 is called the dimension of $S$, written $\dim S$.

Example. For $P := \{(x_1, x_2, 0) \in \mathbb{R}^3 : x_1, x_2 \in \mathbb{R}\}$, $\dim P = 2$ because

$$e_1 := \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad e_2 := \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$$

is a basis with two elements.

Example. A line $\{\alpha x \in \mathbb{R}^N : \alpha \in \mathbb{R}\}$ through the origin (with $x \neq 0$) is one dimensional.

In $\mathbb{R}^N$ the singleton subspace $\{\mathbf{0}\}$ is said to have zero dimension.

If we take a set of $K$ vectors, how large will its span be in terms of dimension?

Theorem. (2.1.6) If $X := \{x_1, \ldots, x_K\} \subset \mathbb{R}^N$, then

1. $\dim \mathrm{span}\, X \leq K$, and
2. $\dim \mathrm{span}\, X = K$ if and only if $X$ is linearly independent.

For a proof, see exercise 2.4.19 in ET.
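
Numerically, $\dim \mathrm{span}\, X$ equals the rank of the matrix whose columns are the vectors in $X$. The sketch below is added for illustration, with arbitrary vectors chosen so that one of them is a combination of the other two.

    import numpy as np

    # Three vectors in R^3, the third a combination of the first two
    x1 = np.array([1.0, 0.0, 2.0])
    x2 = np.array([0.0, 1.0, 1.0])
    x3 = x1 + 2 * x2

    X = np.column_stack([x1, x2, x3])

    # dim span X = rank; here 2 < K = 3, so X is linearly dependent
    print(np.linalg.matrix_rank(X))   # 2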

Fact. (2.1.6) The following statements are true:

1. Let $S$ and $S'$ be $K$-dimensional linear subspaces of $\mathbb{R}^N$. If $S \subset S'$, then $S = S'$.
2. If $S$ is an $M$-dimensional linear subspace of $\mathbb{R}^N$ and $M < N$, then $S \neq \mathbb{R}^N$.

Linear Maps

A function $T : \mathbb{R}^K \to \mathbb{R}^N$ is called linear if

$$T(\alpha x + \beta y) = \alpha T x + \beta T y \quad \text{for all } x, y \in \mathbb{R}^K, \; \alpha, \beta \in \mathbb{R}$$

Notation:

- Linear functions are often written with upper case letters
- We typically omit parentheses around arguments when convenient

Example. $T : \mathbb{R} \to \mathbb{R}$ defined by $Tx = 2x$ is linear.

To see this, take any $\alpha, \beta, x, y$ in $\mathbb{R}$ and observe

$$T(\alpha x + \beta y) = 2(\alpha x + \beta y) = \alpha 2x + \beta 2y = \alpha T x + \beta T y$$

Example. The function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^2$ is nonlinear.

To see this, set $\alpha = \beta = x = y = 1$. We then have $f(\alpha x + \beta y) = f(2) = 4$. However, $\alpha f(x) + \beta f(y) = 1 + 1 = 2$.

Remark: Thinking of linear functions as those whose graph is a straight line is not correct.

Example. The function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = 1 + 2x$ is nonlinear.

Take $\alpha = \beta = x = y = 1$. We then have $f(\alpha x + \beta y) = f(2) = 5$. However, $\alpha f(x) + \beta f(y) = 3 + 3 = 6$.

This kind of function is called an affine function.
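
The failure of linearity is easy to see numerically. The snippet below is an illustrative sketch (not from ET); is_linear_at is just an ad-hoc helper that evaluates both sides of the definition at the single test point used above.

    def is_linear_at(f, x, y, alpha, beta):
        """Check f(alpha*x + beta*y) == alpha*f(x) + beta*f(y) at one test point."""
        return f(alpha * x + beta * y) == alpha * f(x) + beta * f(y)

    print(is_linear_at(lambda x: 2 * x, 1, 1, 1, 1))      # True  (linear)
    print(is_linear_at(lambda x: x ** 2, 1, 1, 1, 1))     # False (nonlinear)
    print(is_linear_at(lambda x: 1 + 2 * x, 1, 1, 1, 1))  # False (affine, not linear)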

By definition, if $T$ is linear, then the exchange of order in

$$T\left[ \sum_{k=1}^K \alpha_k x_k \right] = \sum_{k=1}^K \alpha_k T x_k$$

is valid whenever $K = 2$. An inductive argument extends this to arbitrary $K$.

Fact. (2.1.7) If $T : \mathbb{R}^K \to \mathbb{R}^N$ is a linear map, then $\mathrm{rng}(T) = \mathrm{span}(V)$, where $V := \{Te_1, \ldots, Te_K\}$ and $e_k$ is the $k$-th canonical basis vector in $\mathbb{R}^K$.

Proof. Any $x \in \mathbb{R}^K$ can be expressed as $\sum_{k=1}^K \alpha_k e_k$. Hence $\mathrm{rng}(T)$ is the set of all points of the form

$$Tx = T\left[ \sum_{k=1}^K \alpha_k e_k \right] = \sum_{k=1}^K \alpha_k T e_k$$

as we vary $\alpha_1, \ldots, \alpha_K$ over all combinations. This coincides with the definition of $\mathrm{span}(V)$.
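
For a linear map represented by a matrix $A$ (so $Tx = Ax$), the vectors $Te_k$ are simply the columns of $A$, so fact 2.1.7 says the range is the column space. A minimal sketch with an arbitrary matrix:

    import numpy as np

    A = np.array([[1.0, 0.0, 2.0],
                  [0.0, 1.0, 1.0]])     # T : R^3 -> R^2, Tx = A @ x

    e1, e2, e3 = np.eye(3)              # canonical basis vectors of R^3

    # T e_k is the k-th column of A
    print(np.allclose(A @ e1, A[:, 0]))   # True
    print(np.allclose(A @ e3, A[:, 2]))   # True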

The null space or kernel of a linear map $T : \mathbb{R}^K \to \mathbb{R}^N$ is

$$\ker(T) := \{x \in \mathbb{R}^K : Tx = 0\}$$

Fact. (2.1.7) If $T : \mathbb{R}^K \to \mathbb{R}^N$ is a linear map, then $\mathrm{rng}\, T = \mathrm{span}\, V$, where $V := \{Te_1, \ldots, Te_K\}$.

Proofs are straightforward (complete as an exercise).

Linear Independence and Bijections

Many scientific and practical problems are inverse problems:

- we observe outcomes but not what caused them
- how can we work backwards from outcomes to causes?

Examples:

- what consumer preferences generated observed market behavior?
- what kinds of expectations led to a given shift in exchange rates?

Loosely, we can express an inverse problem as

$$\underbrace{F(x)}_{\text{model}} = \underbrace{y}_{\text{outcome}}$$

- what $x$ led to outcome $y$?
- does this problem have a solution? is it unique?

Answers depend on whether $F$ is one-to-one, onto, etc. The best case is a bijection, but other situations also arise.

Theorem. (2.1.7) If $T$ is a linear function from $\mathbb{R}^N$ to $\mathbb{R}^N$, then all of the following are equivalent:

1. $T$ is a bijection.
2. $T$ is onto.
3. $T$ is one-to-one.
4. $\ker T = \{\mathbf{0}\}$.
5. $V := \{Te_1, \ldots, Te_N\}$ is linearly independent.
6. $V := \{Te_1, \ldots, Te_N\}$ forms a basis of $\mathbb{R}^N$.

See exercise 2.4.21 in ET for the proof.

If any one of these conditions is true, then $T$ is called nonsingular. Otherwise $T$ is called singular.

Figure: The case of $N = 1$: nonsingular ($Tx = \alpha x$ with $\alpha = 0.2$) and singular ($Tx = \alpha x$ with $\alpha = 0$)

If $T$ is nonsingular then, being a bijection, it must have an inverse function $T^{-1}$ that is also a bijection (fact 15.2.1 on page 410).

Fact. (2.1.9) If $T : \mathbb{R}^N \to \mathbb{R}^N$ is nonsingular, then so is $T^{-1}$. For a proof, see ex. 2.4.20.

Maps Across Different Dimensions

Remember that the results above apply to maps from $\mathbb{R}^N$ to $\mathbb{R}^N$. Things change when we look at linear maps across dimensions.

The general rules for linear maps are:

- maps from lower to higher dimensions cannot be onto
- maps from higher to lower dimensions cannot be one-to-one
- in either case they cannot be bijections

Theorem. (2.1.8) For a linear map $T$ from $\mathbb{R}^K$ to $\mathbb{R}^N$, the following statements are true:

1. If $K < N$, then $T$ is not onto.
2. If $K > N$, then $T$ is not one-to-one.

Proof. (part 1) Let $K < N$ and let $T : \mathbb{R}^K \to \mathbb{R}^N$ be linear. Letting $V := \{Te_1, \ldots, Te_K\}$, we have

$$\dim(\mathrm{rng}(T)) = \dim(\mathrm{span}(V)) \leq K < N \implies \mathrm{rng}(T) \neq \mathbb{R}^N$$

Hence $T$ is not onto.

Proof. (part 2) Suppose to the contrary that $T$ is one-to-one. Let $\alpha_1, \ldots, \alpha_K$ be a collection of scalars such that

$$\alpha_1 Te_1 + \cdots + \alpha_K Te_K = 0$$

Then

$$T(\alpha_1 e_1 + \cdots + \alpha_K e_K) = 0 \quad \text{(by linearity)}$$
$$\alpha_1 e_1 + \cdots + \alpha_K e_K = 0 \quad \text{(since } \ker(T) = \{0\}\text{)}$$
$$\alpha_1 = \cdots = \alpha_K = 0 \quad \text{(by independence of } \{e_1, \ldots, e_K\}\text{)}$$

We have shown that $\{Te_1, \ldots, Te_K\}$ is linearly independent. But then $\mathbb{R}^N$ contains a linearly independent set with $K > N$ vectors: contradiction.

Example. The cost function $c(k, \ell) = rk + w\ell$ maps $\mathbb{R}^2$ into $\mathbb{R}$, so it cannot be one-to-one.

Figure: Contour lines of the cost function $c(k, \ell) = rk + w\ell$ in the $(k, \ell)$ plane

Orthogonal Vectors and Projections

A core concept in the course is orthogonality, not just of vectors, but of random variables.

Let $x$ and $z$ be vectors in $\mathbb{R}^N$. If $\langle x, z \rangle = 0$, then we call $x$ and $z$ orthogonal, and write $x \perp z$.

In $\mathbb{R}^2$, orthogonal means perpendicular.

Figure: $x \perp z$

Let $S$ be a linear subspace. We say that $x$ is orthogonal to $S$ if $x \perp z$ for all $z \in S$, and write $x \perp S$.

Figure: $x \perp S$

Fact. (2.2.1) (Pythagorean law) If $\{z_1, \ldots, z_K\}$ is an orthogonal set, then

$$\|z_1 + \cdots + z_K\|^2 = \|z_1\|^2 + \cdots + \|z_K\|^2$$

The proof is an exercise.

Fact. (2.2.2) If $O \subset \mathbb{R}^N$ is an orthogonal set and $\mathbf{0} \notin O$, then $O$ is linearly independent.

An orthogonal set $O \subset \mathbb{R}^N$ is called an orthonormal set if $\|u\| = 1$ for all $u \in O$.

An orthonormal set spanning a linear subspace $S$ of $\mathbb{R}^N$ is called an orthonormal basis of $S$. An example of an orthonormal basis for all of $\mathbb{R}^N$ is the canonical basis $\{e_1, \ldots, e_N\}$.

Fact. (2.2.3) If $\{u_1, \ldots, u_K\}$ is an orthonormal set and $x \in \mathrm{span}\{u_1, \ldots, u_K\}$, then

$$x = \sum_{k=1}^K \langle x, u_k \rangle u_k$$
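
A quick check of fact 2.2.3; this is an illustrative sketch, and the orthonormal pair below is just a rotation of the canonical basis of $\mathbb{R}^2$.

    import numpy as np

    # An orthonormal set in R^2 (canonical basis rotated by 30 degrees)
    t = np.pi / 6
    u1 = np.array([np.cos(t), np.sin(t)])
    u2 = np.array([-np.sin(t), np.cos(t)])

    x = 2.0 * u1 - 3.0 * u2          # some point in span{u1, u2} = R^2

    # Reconstruction x = <x,u1> u1 + <x,u2> u2
    x_rec = np.inner(x, u1) * u1 + np.inner(x, u2) * u2
    print(np.allclose(x, x_rec))     # True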

Given $S \subset \mathbb{R}^N$, the orthogonal complement of $S$ is

$$S^\perp := \{x \in \mathbb{R}^N : x \perp S\}$$

Fact. (2.2.4) For any nonempty $S \subset \mathbb{R}^N$, the set $S^\perp$ is a linear subspace of $\mathbb{R}^N$.

Proof. If $x, y \in S^\perp$ and $\alpha, \beta \in \mathbb{R}$, then $\alpha x + \beta y \in S^\perp$ because, for any $z \in S$,

$$\langle \alpha x + \beta y, z \rangle = \alpha \langle x, z \rangle + \beta \langle y, z \rangle = \alpha \cdot 0 + \beta \cdot 0 = 0$$

Fact. (2.2.5) For $S \subset \mathbb{R}^N$, we have $S \cap S^\perp \subset \{\mathbf{0}\}$.

Figure: The orthogonal complement $S^\perp$ of $S$ in $\mathbb{R}^2$

The Orthogonal Projection Theorem

Problem: Given $y \in \mathbb{R}^N$ and a subspace $S$, find the closest element of $S$ to $y$.

Formally, solve

$$\hat{y} := \operatorname*{argmin}_{z \in S} \|y - z\| \tag{2}$$

Existence and uniqueness of the solution are not immediately obvious. The orthogonal projection theorem tells us that $\hat{y}$ always exists and is unique, and also provides a useful characterization.

Theorem. (2.2.1) [Orthogonal Projection Theorem I] Let $y \in \mathbb{R}^N$ and let $S$ be any nonempty linear subspace of $\mathbb{R}^N$. The following statements are true:

1. The optimization problem (2) has exactly one solution.
2. $\hat{y} \in \mathbb{R}^N$ solves (2) if and only if $\hat{y} \in S$ and $y - \hat{y} \perp S$.

The unique solution $\hat{y}$ is called the orthogonal projection of $y$ onto $S$.

Figure: Orthogonal projection of $y$ onto $S$

Proof. (sufficiency of 2.) Let $y \in \mathbb{R}^N$ and let $S$ be a linear subspace of $\mathbb{R}^N$. Let $\hat{y}$ be a vector in $S$ satisfying $y - \hat{y} \perp S$, and let $z$ be any point in $S$. We have

$$\|y - z\|^2 = \|(y - \hat{y}) + (\hat{y} - z)\|^2 = \|y - \hat{y}\|^2 + \|\hat{y} - z\|^2$$

The second equality follows from $y - \hat{y} \perp S$ and the Pythagorean law. Since $z$ was an arbitrary point in $S$, we have $\|y - z\| \geq \|y - \hat{y}\|$ for all $z \in S$.

Example. Let $y \in \mathbb{R}^N$ and let $\mathbf{1} \in \mathbb{R}^N$ be the vector of ones. Let $S$ be the set of constant vectors in $\mathbb{R}^N$, i.e. $S$ is the span of $\{\mathbf{1}\}$.

The orthogonal projection of $y$ onto $S$ is $\hat{y} := \bar{y}\mathbf{1}$, where $\bar{y} := \frac{1}{N}\sum_{n=1}^N y_n$.

Clearly, $\hat{y} \in S$. To show that $y - \hat{y}$ is orthogonal to $S$, we need to check $\langle y - \hat{y}, \mathbf{1} \rangle = 0$ (see ex. 2.4.14 on page 36). This is true because

$$\langle y - \hat{y}, \mathbf{1} \rangle = \langle y, \mathbf{1} \rangle - \langle \hat{y}, \mathbf{1} \rangle = \sum_{n=1}^N y_n - \bar{y}\langle \mathbf{1}, \mathbf{1} \rangle = 0$$
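
A small numerical sketch of this example (the values are arbitrary): the projection onto the span of $\mathbf{1}$ replaces each entry of $y$ with the sample mean, and the residual is orthogonal to $\mathbf{1}$.

    import numpy as np

    y = np.array([1.0, 4.0, 2.0, 5.0])
    ones = np.ones_like(y)

    y_hat = y.mean() * ones            # projection of y onto span{1}

    # Residual is orthogonal to the vector of ones
    print(np.isclose(np.inner(y - y_hat, ones), 0.0))   # True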

Holding the subspace $S$ fixed, we have a functional relationship from $y$ to its orthogonal projection $\hat{y} \in S$. This is a well-defined function from $\mathbb{R}^N$ to $\mathbb{R}^N$, typically denoted by $P$:

- $P(y)$ or $Py$ represents $\hat{y}$
- $P$ is called the orthogonal projection mapping onto $S$, and we write $P = \mathrm{proj}\, S$

Figure: Orthogonal projection under $P$

Theorem. (2.2.2) [Orthogonal Projection Theorem II] Let $S$ be any linear subspace of $\mathbb{R}^N$, and let $P = \mathrm{proj}\, S$. The following statements are true:

1. $P$ is a linear function.

Moreover, for any $y \in \mathbb{R}^N$, we have

2. $Py \in S$,
3. $y - Py \perp S$,
4. $\|y\|^2 = \|Py\|^2 + \|y - Py\|^2$,
5. $\|Py\| \leq \|y\|$,
6. $Py = y$ if and only if $y \in S$, and
7. $Py = 0$ if and only if $y \perp S$.

For a discussion of the proof, see page 31 and exercise 2.4.29.

The following is a fundamental result.

Fact. (2.2.6) If $\{u_1, \ldots, u_K\}$ is an orthonormal basis for $S$, then, for each $y \in \mathbb{R}^N$,

$$Py = \sum_{k=1}^K \langle y, u_k \rangle u_k \tag{3}$$

Proof. First, the right-hand side of (3) lies in $S$, since it is a linear combination of vectors spanning $S$.

Next, we know $y - Py \perp S$ if and only if $y - Py \perp u_j$ for each $u_j$ in the basis set (exercise 2.4.14). For any $u_j$, the following holds:

$$\langle y - Py, u_j \rangle = \langle y, u_j \rangle - \sum_{k=1}^K \langle y, u_k \rangle \langle u_k, u_j \rangle = \langle y, u_j \rangle - \langle y, u_j \rangle = 0$$

This confirms $y - Py \perp S$.
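
Fact 2.2.6 translates directly into code. The sketch below is illustrative: it projects an arbitrary $y \in \mathbb{R}^3$ onto the plane spanned by the first two canonical basis vectors using formula (3).

    import numpy as np

    u1 = np.array([1.0, 0.0, 0.0])
    u2 = np.array([0.0, 1.0, 0.0])     # orthonormal basis of the plane P
    y = np.array([2.0, -1.0, 3.0])

    # Py = <y,u1> u1 + <y,u2> u2
    Py = np.inner(y, u1) * u1 + np.inner(y, u2) * u2
    print(Py)                                            # [ 2. -1.  0.]
    print(np.inner(y - Py, u1), np.inner(y - Py, u2))    # 0.0 0.0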

Fact. (2.2.7) Let $S_i$ be a linear subspace of $\mathbb{R}^N$ for $i = 1, 2$ and let $P_i = \mathrm{proj}\, S_i$. If $S_1 \subset S_2$, then

$$P_1 P_2 y = P_2 P_1 y = P_1 y \quad \text{for all } y \in \mathbb{R}^N$$

The Residual Projection

Project $y$ onto $S$, where $S$ is a linear subspace of $\mathbb{R}^N$. The closest point to $y$ in $S$ is $\hat{y} := Py$, where $P = \mathrm{proj}\, S$.

Unless $y$ was already in $S$, some error $y - Py$ remains. Introduce the operator $M$ that takes $y \in \mathbb{R}^N$ and returns the residual:

$$M := I - P \tag{4}$$

where $I$ is the identity mapping on $\mathbb{R}^N$.

For any $y$ we have $My = Iy - Py = y - Py$.

In regression analysis, $M$ shows up as a matrix called the annihilator. We refer to $M$ as the residual projection.

Example. Recall that the projection of $y \in \mathbb{R}^N$ onto $\mathrm{span}\{\mathbf{1}\}$ is $\bar{y}\mathbf{1}$. The residual projection is

$$M_c y := y - \bar{y}\mathbf{1}$$

the vector of errors obtained when the elements of a vector are predicted by its sample mean.

Fact. (2.2.8) Let $S$ be a linear subspace of $\mathbb{R}^N$, let $P = \mathrm{proj}\, S$, and let $M$ be the residual projection as defined in (4). The following statements are true:

1. $M = \mathrm{proj}\, S^\perp$
2. $y = Py + My$ for any $y \in \mathbb{R}^N$
3. $Py \perp My$ for any $y \in \mathbb{R}^N$
4. $My = 0$ if and only if $y \in S$
5. $PM = MP = 0$

Figure: The residual projection

If $S_1$ and $S_2$ are two subspaces of $\mathbb{R}^N$ with $S_1 \subset S_2$, then $S_2^\perp \subset S_1^\perp$. The result in fact 2.2.7 is reversed for $M$:

Fact. (2.2.9) Let $S_1$ and $S_2$ be two subspaces of $\mathbb{R}^N$ and let $y \in \mathbb{R}^N$. Let $M_1$ and $M_2$ be the residual projections associated with $S_1$ and $S_2$ respectively. If $S_1 \subset S_2$, then

$$M_1 M_2 y = M_2 M_1 y = M_2 y$$

Gram-Schmidt Orthogonalization

Recall that we showed every orthogonal subset of $\mathbb{R}^N$ not containing $\mathbf{0}$ is linearly independent (fact 2.2.2). Here is an (important) partial converse.

Theorem. (2.2.3) For each linearly independent set $\{b_1, \ldots, b_K\} \subset \mathbb{R}^N$, there exists an orthonormal set $\{u_1, \ldots, u_K\}$ with

$$\mathrm{span}\{b_1, \ldots, b_k\} = \mathrm{span}\{u_1, \ldots, u_k\} \quad \text{for } k = 1, \ldots, K$$

Formal proofs are solved as exercises 2.4.34 to 2.4.36.

The proof provides an important algorithm for generating the orthonormal set $\{u_1, \ldots, u_K\}$. The first step is to construct orthogonal sets $\{v_1, \ldots, v_k\}$ with span identical to $\{b_1, \ldots, b_k\}$ for each $k$.

The construction of $\{v_1, \ldots, v_K\}$ uses the Gram-Schmidt orthogonalization procedure: For each $k = 1, \ldots, K$, let

1. $B_k := \mathrm{span}\{b_1, \ldots, b_k\}$,
2. $P_k := \mathrm{proj}\, B_k$ and $M_k := \mathrm{proj}\, B_k^\perp$,
3. $v_k := M_{k-1} b_k$, where $M_0$ is the identity mapping, and
4. $V_k := \mathrm{span}\{v_1, \ldots, v_k\}$.

In step 3. we map each successive element $b_k$ into a subspace orthogonal to the subspace generated by $b_1, \ldots, b_{k-1}$.

To complete the argument, define $u_k$ by $u_k := v_k / \|v_k\|$. The set of vectors $\{u_1, \ldots, u_k\}$ is orthonormal with span equal to $V_k$.
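
As an illustrative sketch (not from ET), the procedure above translates into a few lines of NumPy: at each step the new vector is the residual of $b_k$ after projecting onto the span of the previous $v$'s, and everything is normalized at the end. The function name and test matrix are arbitrary choices for this example.

    import numpy as np

    def gram_schmidt(B):
        """Columns of B are linearly independent vectors b_1, ..., b_K.
        Returns a matrix whose columns u_1, ..., u_K are orthonormal with
        span{u_1,...,u_k} = span{b_1,...,b_k} for each k."""
        K = B.shape[1]
        V = np.zeros_like(B, dtype=float)
        for k in range(K):
            v = B[:, k].astype(float)
            for j in range(k):
                u = V[:, j]
                v = v - np.inner(v, u) / np.inner(u, u) * u   # v_k = M_{k-1} b_k
            V[:, k] = v
        return V / np.linalg.norm(V, axis=0)                  # u_k = v_k / ||v_k||

    B = np.array([[1.0, 1.0],
                  [0.0, 1.0],
                  [1.0, 0.0]])
    U = gram_schmidt(B)
    print(np.allclose(U.T @ U, np.eye(2)))   # True: columns are orthonormal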