MATH 167: APPLIED LINEAR ALGEBRA Least-Squares October 30, 2014

Least Squares

We do a series of experiments, collecting data. We wish to see patterns!! We expect the output b to be a linear function of the input t, b = α + βt, but we need to determine α, β. At different times t_i we measure a quantity b_i. EXAMPLE: A police officer wants to clock the speed of a vehicle using measurements of its relative distance. Assuming the vehicle travels at constant speed, the formula is linear, but the measurements contain errors. At t = t_i, the error between the measured value b_i and the value predicted by the function is e_i = b_i - (α + βt_i). We can write this as e = b - Ax, where x = (α, β). Here e is the error vector, b is the data vector, and A is an m × 2 matrix whose i-th row is (1, t_i). We seek the line that minimizes the total squared error, i.e., the Euclidean norm ||e|| = √(e_1^2 + ... + e_m^2). GOAL: Given an m × n matrix A and an m-vector b, find x that minimizes ||b - Ax||. We assume m ≥ n.
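A minimal NumPy sketch of this setup; the times and measurements below are hypothetical sample data, used only to illustrate the computation.

```python
import numpy as np

# Hypothetical measurements: times t_i and observed values b_i.
t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Build the m x 2 matrix A whose rows are (1, t_i), so Ax = alpha + beta*t.
A = np.column_stack([np.ones_like(t), t])

# np.linalg.lstsq minimizes ||b - Ax|| in the Euclidean norm.
(alpha, beta), residual, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print(alpha, beta)                                       # fitted intercept and slope
print(np.linalg.norm(b - A @ np.array([alpha, beta])))   # total error ||e||
```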

Distance and projection are closely related to each other!!! Fundamental question: if we have a subspace S, is there a formula for the projection p of a vector b onto that subspace? Think of b as data from experiments: b is not in S because of measurement error, and its projection p is the best choice to replace b. This is the key idea of LEAST SQUARES for regression analysis. Let us learn how to do this projection for a line! Project b onto the line L spanned by the vector a (PICTURE!). The projection of the vector b onto the line in the direction of a is p = (a^T b / a^T a) a.
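A quick numerical sketch of the projection formula, with two arbitrarily chosen vectors:

```python
import numpy as np

# Project b onto the line spanned by a: p = (a^T b / a^T a) a.
a = np.array([1.0, 2.0, 2.0])
b = np.array([3.0, 1.0, 0.0])

p = (a @ b) / (a @ a) * a
e = b - p
print(p)        # the projection of b onto the line through a
print(a @ e)    # ~0: the error b - p is perpendicular to a
```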

Note: ||b - Ax|| is the distance from b to the point Ax, which is an element of the column space! Key point: the optimal solution is the x that minimizes that distance! Theorem The smallest error vector e = b - Ax must be perpendicular to the column space (picture!). Thus for each column a_i we have a_i^T (b - Ax) = 0, and in matrix notation A^T (b - Ax) = 0. This gives the normal equations A^T A x = A^T b. Theorem The best estimate is x = (A^T A)^{-1} A^T b, and its projection is p = A (A^T A)^{-1} A^T b. Lemma A^T A is a symmetric matrix, and A^T A has the same nullspace as A. Why? If x ∈ N(A), then clearly A^T A x = 0. Conversely, if A^T A x = 0, then x^T A^T A x = ||Ax||^2 = 0, thus Ax = 0. Corollary If A has independent columns, then A^T A is square, symmetric and invertible.
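A small NumPy sketch of the normal equations; the matrix A and vector b here are random placeholders, assumed to have independent columns.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))   # random 6x3 matrix (almost surely independent columns)
b = rng.standard_normal(6)

# Normal equations: A^T A x = A^T b.
xhat = np.linalg.solve(A.T @ A, A.T @ b)

# The error e = b - A xhat is perpendicular to every column of A.
print(A.T @ (b - A @ xhat))             # ~ (0, 0, 0)

# Projection of b onto the column space: p = A (A^T A)^{-1} A^T b.
P = A @ np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(P @ b, A @ xhat))     # p equals A xhat
print(np.allclose(P @ P, P))            # P is a projection: P^2 = P
```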

Example Consider the problem Ax = b with

A = [  1   2   0
       3  -1   1
      -1   2   1
       1  -1  -2
       2   1  -1 ],    b^T = (1, 0, -1, 2, 2).

We can see that there is no EXACT solution to Ax = b, so use the NORMAL EQUATIONS!

A^T A = [ 16  -2  -2
          -2  11   2
          -2   2   7 ],    A^T b = (8, 0, -7)^T.

Solving A^T A x = A^T b we get the least-squares solution x ≈ (0.4119, 0.2482, -0.9532)^T with error ||b - Ax|| ≈ 0.1799.
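A sketch that recomputes this example numerically (np.linalg.lstsq would return the same solution):

```python
import numpy as np

A = np.array([[ 1,  2,  0],
              [ 3, -1,  1],
              [-1,  2,  1],
              [ 1, -1, -2],
              [ 2,  1, -1]], dtype=float)
b = np.array([1, 0, -1, 2, 2], dtype=float)

xhat = np.linalg.solve(A.T @ A, A.T @ b)   # normal equations
print(xhat)                                # approx (0.4119, 0.2482, -0.9532)
print(np.linalg.norm(b - A @ xhat))        # approx 0.1799
```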

Example A sample of lead-210 gave the following radioactivity data at the given times (time in days). Can YOU predict how long it will take until one percent of the original amount remains?

time (days):   0     4     8    10    14    18
mass (mg):    10   8.8   7.8   7.3   6.4   6.4

A linear model does not work. The material decays exponentially, m(t) = m_0 e^{βt}, where m_0 is the initial radioactive material and β the decay rate. Taking logarithms, y(t) = log(m(t)) = log(m_0) + βt. Thus we can now use linear least squares to fit the logarithms y_i = log(m_i) of the radioactive mass data. In this case we have

A^T = [ 1   1   1    1    1    1
        0   4   8   10   14   18 ],    b^T ≈ (2.3026, 2.1748, 2.0541, 1.9879, 1.8563, 1.8563).

Thus A^T A = [  6   54
               54  700 ]. Solving the normal equations we get log(m_0) = 2.277327661 and β = -0.0265191683. Thus the fitted initial amount is e^{2.2773} ≈ 9.75 mg, essentially the measured 10 mg, and after about 174 days (log(100)/0.02652 ≈ 173.7) less than one percent of the original radioactive material remains. There is nothing special about polynomials or exponential functions in this application. We can approximate by any linear combination of prescribed functions h_1(t), h_2(t), ..., h_n(t): we receive data y_i at time t_i and the matrix A has entries A_ij = h_j(t_i). The least-squares method can also be applied to situations where not all observations are trusted the same way! Now the error is (b - Ax)^T C (b - Ax) for a weighting matrix C. The weighted least-squares problem is solved by the new equations A^T C A x = A^T C b, and x = (A^T C A)^{-1} A^T C b.
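A NumPy sketch reproducing this fit from the data table above:

```python
import numpy as np

t = np.array([0, 4, 8, 10, 14, 18], dtype=float)
m = np.array([10, 8.8, 7.8, 7.3, 6.4, 6.4])

# Fit log(m) = log(m0) + beta*t by linear least squares.
A = np.column_stack([np.ones_like(t), t])
y = np.log(m)
logm0, beta = np.linalg.solve(A.T @ A, A.T @ y)

print(logm0, beta)           # approx 2.2773 and -0.02652
print(np.exp(logm0))         # fitted initial amount, approx 9.75 mg
print(np.log(100) / -beta)   # days until 1% remains, approx 173.7
```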

Review of Orthogonal Vectors and Subspaces.

In real life vector spaces come with additional METRIC properties!! We have notions of distance and angles!! You are familiar with the Euclidean vector space R^n: since kindergarten you know that the distance between two vectors x = (x_1, ..., x_n) and y = (y_1, ..., y_n) is given by dist(x, y) = √((x_1 - y_1)^2 + (x_2 - y_2)^2 + ... + (x_n - y_n)^2). We say vectors x, y are perpendicular when they make a 90 degree angle. When that happens the triangle they define is a right triangle! (WHY?) Lemma Two vectors x, y in R^n are perpendicular if and only if x_1 y_1 + ... + x_n y_n = x^T y = 0. When this last equation holds we say x, y are orthogonal. Orthogonal bases: a basis u_1, ..., u_n of V is orthogonal if <u_i, u_j> = 0 for all i ≠ j. Lemma If v_1, v_2, ..., v_k are orthogonal (and nonzero) then they are linearly independent.
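A tiny numerical check of the distance formula and the perpendicularity criterion, with two arbitrarily chosen vectors:

```python
import numpy as np

x = np.array([1.0, 2.0, -1.0, 3.0])
y = np.array([2.0, 1.0,  1.0, -1.0])

print(np.linalg.norm(x - y))   # Euclidean distance between x and y
print(x @ y)                   # 1*2 + 2*1 + (-1)*1 + 3*(-1) = 0, so x is perpendicular to y
```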

The Orthogonality of Subspaces Definition We say two subspaces V, W of R^n are orthogonal if for every u ∈ V and w ∈ W we have u^T w = 0. Can you see a way to detect when two subspaces are orthogonal?? Through their bases! Theorem: The row space and the nullspace are orthogonal. Similarly the column space is orthogonal to the left nullspace. Proof: For y ∈ N(A^T), each entry of A^T y is the dot product of a column of A with y:

A^T y = [ (column 1 of A) · y ]   [ 0 ]
        [          ...        ] = [ : ]
        [ (column n of A) · y ]   [ 0 ],

so every column of A is perpendicular to every y ∈ N(A^T). The same argument applied to A^T shows every row of A is perpendicular to every x ∈ N(A).

There is a stronger relation: for a subspace V of R^n, the set of all vectors orthogonal to V is the orthogonal complement of V, denoted V^⊥. Warning Spaces can be orthogonal without being complements! Exercise Let W be a subspace; its orthogonal complement W^⊥ is a subspace, and W ∩ W^⊥ = {0}. Exercise If V ⊆ W are subspaces, then W^⊥ ⊆ V^⊥. Theorem (Fundamental theorem, part II) C(A^T)^⊥ = N(A) and N(A)^⊥ = C(A^T). Why? Proof: The first equation is easy because x is orthogonal to all vectors of the row space ⇔ x is orthogonal to each of the rows ⇔ x ∈ N(A). The other equality follows from the exercises. Corollary Given an m × n matrix A, the nullspace is the orthogonal complement of the row space in R^n. Similarly, the left nullspace is the orthogonal complement of the column space inside R^m. WHY is this such a big deal?

Theorem Given an m × n matrix A, every vector x in R^n can be written in a unique way as x = x_n + x_r, where x_n is in the nullspace and x_r is in the row space of A. Proof: Pick x_n to be the orthogonal projection of x onto N(A) and x_r the orthogonal projection onto C(A^T); since these two subspaces are orthogonal complements of each other, x is the sum of the two projections. But why is the decomposition unique? If x_n + x_r = x_n' + x_r', then x_n - x_n' = x_r' - x_r. This vector lies in both N(A) and C(A^T), so it must be the zero vector because N(A) is orthogonal to C(A^T). This has a beautiful consequence: every matrix A, when we think of it as a linear map, transforms its row space onto its column space!!!
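A NumPy sketch of this splitting. It uses the pseudoinverse, np.linalg.pinv, as a convenient shortcut for projecting onto the row space (the pseudoinverse is not covered in these notes); the matrix and vector are arbitrary examples.

```python
import numpy as np

A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 1.]])   # a 2x4 matrix, so N(A) has dimension 2
x = np.array([1., 2., 3., 4.])

# Project x onto the row space C(A^T); the remainder lies in N(A).
x_r = np.linalg.pinv(A) @ (A @ x)
x_n = x - x_r

print(np.allclose(A @ x_n, 0))      # x_n lies in the nullspace of A
print(np.allclose(x_r @ x_n, 0))    # the two pieces are orthogonal
print(np.allclose(A @ x, A @ x_r))  # A only "sees" the row-space part of x
```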

An important picture

Orthogonal Bases and Gram-Schmidt

A basis u_1, ..., u_n of a vector space V is orthonormal if it is orthogonal and each vector has unit length. Observation If the vectors u_1, ..., u_n form an orthogonal basis, then their normalizations u_i / ||u_i|| form an orthonormal basis. Examples Of course the standard unit vectors are orthonormal. Consider the vector space of all quadratic polynomials p(x) = a + bx + cx^2, with the L^2 inner product given by integration: <p, q> = ∫_0^1 p(x) q(x) dx. The standard monomials 1, x, x^2 form a basis, but they do not form an orthogonal basis: <1, x> = 1/2, <1, x^2> = 1/3, <x, x^2> = 1/4. An orthonormal basis is given by u_1(x) = 1, u_2(x) = √3 (2x - 1), u_3(x) = √5 (6x^2 - 6x + 1).
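A short SymPy check of these inner products and of the orthonormal basis:

```python
import sympy as sp

x = sp.symbols('x')
u1, u2, u3 = 1, sp.sqrt(3)*(2*x - 1), sp.sqrt(5)*(6*x**2 - 6*x + 1)

def ip(p, q):
    # L^2 inner product on [0, 1]
    return sp.integrate(p*q, (x, 0, 1))

print(ip(1, x), ip(1, x**2), ip(x, x**2))   # 1/2, 1/3, 1/4: monomials not orthogonal
print(ip(u1, u1), ip(u2, u2), ip(u3, u3))   # 1, 1, 1
print(ip(u1, u2), ip(u1, u3), ip(u2, u3))   # 0, 0, 0
```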

Why do we care about orthonormal bases? Theorem Let u_1, ..., u_n be an orthonormal basis for an inner product space V. Then one can write any element v ∈ V as a linear combination v = c_1 u_1 + ... + c_n u_n, where c_i = <v, u_i> for i = 1, ..., n. Moreover the norm is ||v|| = √(c_1^2 + ... + c_n^2). Example Let us rewrite the vector v = (1, 1, 1)^T in terms of the orthonormal basis

u_1 = (1/√6, 2/√6, -1/√6)^T,  u_2 = (0, 1/√5, 2/√5)^T,  u_3 = (5/√30, -2/√30, 1/√30)^T.

Computing the dot products: v^T u_1 = 2/√6, v^T u_2 = 3/√5, and v^T u_3 = 4/√30. Thus v = (2/√6) u_1 + (3/√5) u_2 + (4/√30) u_3. Challenge: figure out the same kind of formulas if the vectors are just orthogonal!!!
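A NumPy check of this expansion:

```python
import numpy as np

u1 = np.array([1, 2, -1]) / np.sqrt(6)
u2 = np.array([0, 1,  2]) / np.sqrt(5)
u3 = np.array([5, -2, 1]) / np.sqrt(30)
v  = np.array([1.0, 1.0, 1.0])

c = np.array([v @ u1, v @ u2, v @ u3])
print(c)                                                  # 2/sqrt(6), 3/sqrt(5), 4/sqrt(30)
print(np.allclose(c[0]*u1 + c[1]*u2 + c[2]*u3, v))        # the combination recovers v
print(np.isclose(np.linalg.norm(v), np.sqrt(np.sum(c**2))))  # norms agree
```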

A key reason to like matrices with orthonormal columns: the least-squares equations are even nicer!!! Lemma If Q is a rectangular matrix with orthonormal columns, then the normal equations simplify because Q^T Q = I: Q^T Q x = Q^T b simplifies to x = Q^T b. The projection matrix simplifies too: Q (Q^T Q)^{-1} Q^T = Q I Q^T = Q Q^T. Thus the projection point is p = Q Q^T b, that is, p = (q_1^T b) q_1 + (q_2^T b) q_2 + ... + (q_n^T b) q_n. So how do we compute orthogonal/orthonormal bases for a space?? We use the GRAM-SCHMIDT ALGORITHM. Input: linearly independent vectors a_1, ..., a_n. Output: orthonormal vectors q_1, ..., q_n.
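A small sketch with a matrix Q whose two columns are orthonormal (the vectors are arbitrary examples):

```python
import numpy as np

# Q has orthonormal columns: two orthonormal vectors in R^3.
Q = np.column_stack([np.array([1, 2, 2]) / 3.0,
                     np.array([2, 1, -2]) / 3.0])
b = np.array([3.0, 0.0, 3.0])

print(np.allclose(Q.T @ Q, np.eye(2)))   # Q^T Q = I
xhat = Q.T @ b                           # least-squares solution, no system to solve
p = Q @ xhat                             # projection of b, p = Q Q^T b
print(xhat, p)
```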

So how do we compute orthogonal/orthonormal bases for a space?? We use the GRAM-SCHMIDT ALGORITHM. Here is the geometric idea: subtract from each new vector its projections onto the directions already constructed (picture).

Input: linearly independent vectors a_1, ..., a_n. Output: orthogonal vectors q_1, ..., q_n.

Step 1: q_1 = a_1
Step 2: q_2 = a_2 - (a_2^T q_1 / q_1^T q_1) q_1
Step 3: q_3 = a_3 - (a_3^T q_1 / q_1^T q_1) q_1 - (a_3^T q_2 / q_2^T q_2) q_2
Step 4: q_4 = a_4 - (a_4^T q_1 / q_1^T q_1) q_1 - (a_4^T q_2 / q_2^T q_2) q_2 - (a_4^T q_3 / q_3^T q_3) q_3
...
Step j: q_j = a_j - (a_j^T q_1 / q_1^T q_1) q_1 - (a_j^T q_2 / q_2^T q_2) q_2 - ... - (a_j^T q_{j-1} / q_{j-1}^T q_{j-1}) q_{j-1}

At the end NORMALIZE all vectors if you wish to have unit vectors!! (DIVIDE BY LENGTH).
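A minimal Python sketch of the algorithm above (classical Gram-Schmidt with no safeguards against roundoff; the demo vectors at the end are arbitrary):

```python
import numpy as np

def gram_schmidt(vectors, normalize=True):
    """Classical Gram-Schmidt on a list of linearly independent vectors."""
    qs = []
    for a in vectors:
        q = a.astype(float)
        for prev in qs:
            q = q - (a @ prev) / (prev @ prev) * prev   # subtract projection onto prev
        qs.append(q)
    if normalize:
        qs = [q / np.linalg.norm(q) for q in qs]        # divide by length at the end
    return qs

# usage on three arbitrary independent vectors in R^3
a1, a2, a3 = np.array([1., 1., 0.]), np.array([1., 0., 1.]), np.array([0., 1., 1.])
for q in gram_schmidt([a1, a2, a3]):
    print(q)
```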

EXAMPLE Consider the subspace W spanned by (1, 2, 0, 1), (1, 0, 0, 1) and (1, 1, 0, 0). Find an orthonormal basis for the space W. ANSWER: (1/√6, 2/√6, 0, 1/√6), (1/√3, -1/√3, 0, 1/√3), (1/√2, 0, 0, -1/√2).

In this way, the original basis vectors a_1, ..., a_n can be written in a triangular way! If q_1, q_2, ..., q_n are the orthogonal Gram-Schmidt vectors, just think of r_ij = a_j^T q_i:

a_1 = r_11 (q_1 / q_1^T q_1)
a_2 = r_12 (q_1 / q_1^T q_1) + r_22 (q_2 / q_2^T q_2)
a_3 = r_13 (q_1 / q_1^T q_1) + r_23 (q_2 / q_2^T q_2) + r_33 (q_3 / q_3^T q_3)
...
a_n = r_1n (q_1 / q_1^T q_1) + r_2n (q_2 / q_2^T q_2) + ... + r_nn (q_n / q_n^T q_n),

where r_ij = a_j^T q_i. Write these equations in matrix form: once the q_i are normalized to unit length (so q_i^T q_i = 1), we obtain A = QR, where A = (a_1 ... a_n), Q = (q_1 q_2 ... q_n) and R = (r_ij).

Theorem (QR decomposition) An m × n matrix A with independent columns can be factored as A = QR, where the columns of Q are orthonormal and R is upper triangular and invertible. NOTE: A and Q have the same column space. The simplest way to compute the QR decomposition: 1 Use Gram-Schmidt to get the orthonormal vectors q_i. 2 The matrix Q has columns q_1, ..., q_n. 3 The matrix R is filled with the dot products r_ij = a_j^T q_i (zero below the diagonal). Key point: we now have two matrix decompositions, LU and QR. They are both useful for different reasons!! One is good for solving equations, the other is good for least-squares.
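A sketch using NumPy's built-in QR on the earlier 5 × 3 example; note that np.linalg.qr may flip the signs of some columns of Q (and rows of R) relative to a hand computation.

```python
import numpy as np

A = np.array([[ 1.,  2.,  0.],
              [ 3., -1.,  1.],
              [-1.,  2.,  1.],
              [ 1., -1., -2.],
              [ 2.,  1., -1.]])
b = np.array([1., 0., -1., 2., 2.])

Q, R = np.linalg.qr(A)                    # reduced QR: Q is 5x3, R is 3x3
print(np.allclose(Q.T @ Q, np.eye(3)))    # orthonormal columns
print(np.allclose(np.triu(R), R))         # R is upper triangular
print(np.allclose(Q @ R, A))              # A = QR

# Least squares via QR: solve R x = Q^T b (same solution as the normal equations).
xhat = np.linalg.solve(R, Q.T @ b)
print(xhat)                               # approx (0.4119, 0.2482, -0.9532)
```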