COMP 558 lecture 18 Nov. 15, 2010

Least squares

We have seen several least squares problems thus far, and we will see more in the upcoming lectures. For this reason it is good to have a more general picture of these problems and how to solve them.

Version 1: Given an $m \times n$ matrix $A$, where $m > n$, find a unit length vector $x$ that minimizes $\|Ax\|$. Here, and in the rest of this lecture, the norm is the $L_2$ norm. Note that minimizing the $L_2$ norm is equivalent to minimizing the sum of squares of the elements of the vector $Ax$, since the $L_2$ norm is just the square root of that sum of squares. The reason we restrict the minimization to $x$ of unit length is that, without this restriction, the minimum is achieved at $x = 0$, which is uninteresting.

We can solve this constrained least squares problem using Lagrange multipliers, by finding the $x$ that minimizes

$$x^T A^T A x + \lambda (x^T x - 1). \qquad (*)$$

Taking the derivative of (*) with respect to $\lambda$ and setting it to 0 enforces that $x$ has unit length. Taking derivatives with respect to the components of $x$ and setting them to 0 gives

$$A^T A x + \lambda x = 0,$$

which says that $x$ is an eigenvector of $A^T A$. So which eigenvector gives the least value of $x^T A^T A x$? Since $x^T A^T A x = \|Ax\|^2 \ge 0$ for every $x$, the eigenvalues of $A^T A$ are all greater than or equal to 0. Thus, to minimize $\|Ax\|$, we take the (unit) eigenvector of $A^T A$ with the smallest eigenvalue, which, as just noted, is non-negative. (A numerical sketch of both versions is given at the end of this section.)

Version 2: Given an $m \times n$ matrix $A$ with $m > n$, and given an $m$-vector $b$, minimize $\|Ax - b\|$. If $b = 0$ then we have the same problem as above, so let's assume $b \ne 0$. We don't need Lagrange multipliers in this case, since the trivial solution $x = 0$ is no longer a minimizer, so there is nothing to avoid. Instead, we expand

$$\|Ax - b\|^2 = (Ax - b)^T (Ax - b) = x^T A^T A x - 2 b^T A x + b^T b$$

and then take partial derivatives with respect to the components of $x$ and set them to 0. This gives

$$2 A^T A x - 2 A^T b = 0,$$

or

$$A^T A x = A^T b \qquad (1)$$

which are called the normal equations. Assume the columns of the $m \times n$ matrix $A$ have full rank $n$, i.e. they are linearly independent, so that $A^T A$ is invertible. (Not so obvious, but true.) Then the solution is

$$x = (A^T A)^{-1} A^T b.$$
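
Here is a small numerical sketch of both versions in Python/NumPy. It is not part of the original notes: the matrix $A$ and vector $b$ are made-up random examples, chosen only to show the recipe. Version 1 takes the unit eigenvector of $A^T A$ with the smallest eigenvalue; version 2 solves the normal equations.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((8, 3))   # an example m x n matrix with m > n
    b = rng.standard_normal(8)        # an example m-vector

    # Version 1: minimize ||Ax|| over unit vectors x.
    # eigh returns the eigenvalues of the symmetric matrix A^T A in ascending order,
    # so the first eigenvector is the minimizer.
    eigvals, eigvecs = np.linalg.eigh(A.T @ A)
    x1 = eigvecs[:, 0]
    print(np.linalg.norm(A @ x1))     # the minimum value of ||Ax||

    # Version 2: minimize ||Ax - b|| by solving the normal equations A^T A x = A^T b.
    x2 = np.linalg.solve(A.T @ A, A.T @ b)
    # Sanity checks: the residual Ax - b is orthogonal to the column space of A,
    # and np.linalg.lstsq solves the same problem (more stably, via the SVD).
    print(np.allclose(A.T @ (A @ x2 - b), 0))
    print(np.allclose(x2, np.linalg.lstsq(A, b, rcond=None)[0]))

Solving the normal equations directly is fine for well-conditioned problems; library routines such as lstsq avoid forming $A^T A$ explicitly, since that squares the condition number.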

What is the geometric interpretation of Eq. (1)? Since $b$ is an $m$-dimensional vector and $A$ is an $m \times n$ matrix, we can uniquely write $b$ as a sum of a vector in the column space of $A$ and a vector in the space orthogonal to the column space of $A$. To minimize $\|Ax - b\|$, by definition we find the $x$ such that the distance from $Ax$ to $b$ is as small as possible. This is done by choosing $x$ such that $Ax$ is the component of $b$ that lies in the column space of $A$, that is, $Ax$ is the orthogonal projection of $b$ onto the column space of $A$. Note that if $b$ already belonged to the column space of $A$ then the error $Ax - b$ would be 0 and there would be an exact solution.

Pseudoinverse of A

In the above problem, we define the pseudoinverse of $A$ to be the $n \times m$ matrix

$$A^+ \equiv (A^T A)^{-1} A^T.$$

Recall we are assuming $m > n$, so that $A$ maps from a lower dimensional space to a higher dimensional space. [1] Note that $A^+ A = I$. Moreover, $A A^+ = A (A^T A)^{-1} A^T$, and $A A^+$ projects any vector $b \in R^m$ onto the column space of $A$, that is, it removes from $b$ the component that is perpendicular to the column space of $A$. This is illustrated in the figure below. Note that the plane in the figure on the right contains the origin, since it includes $Ax = 0$ for $x = 0$.

[Figure: two panels. On the left, the $n \times m$ pseudoinverse $A^+$ maps $b$ to $x = A^+ b$; on the right, the $m \times n$ matrix $A$ maps $x$ to $Ax$, the projection of $b$ onto the column space of $A$, drawn as a plane through the origin.]

The pseudoinverse maps in the reverse direction of $A$, namely it maps $b$ in an $m$-dimensional space to an $x$ in an $n$-dimensional space. Rather than inverting the mapping $A$, however, it only inverts the component of $b$ that belongs to the column space of $A$, in the sense that $A^+ A = I$. It ignores the component of $b$ that is orthogonal to the column space of $A$.

[1] Note that if $m = n$ (and $A$ is invertible) then $A^+ = A^{-1}$.
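
As a quick check of these claims, here is a sketch with a made-up tall matrix (again not from the original notes). It builds $A^+ = (A^T A)^{-1} A^T$ directly, verifies that $A^+ A = I$, and verifies that $A A^+$ acts as the orthogonal projection onto the column space of $A$. NumPy's np.linalg.pinv, which computes the same pseudoinverse via the SVD, is used only for comparison.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((6, 2))          # m > n; full column rank with probability 1
    b = rng.standard_normal(6)

    A_plus = np.linalg.inv(A.T @ A) @ A.T    # the n x m pseudoinverse
    print(np.allclose(A_plus @ A, np.eye(2)))       # A+ A = I (the n x n identity)
    print(np.allclose(A_plus, np.linalg.pinv(A)))   # agrees with NumPy's pinv

    # A A+ b is the orthogonal projection of b onto the column space of A:
    # the leftover part b - A A+ b is perpendicular to every column of A.
    p = A @ A_plus @ b
    print(np.allclose(A.T @ (b - p), 0))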

Non-linear least squares (Gauss-Newton method)

Let's look at a common application of version 2 above. Suppose we have $m$ differentiable functions $f_i(x)$ of $x \in R^n$, which together define a map $f : R^n \to R^m$. The problem we consider now is: given an initial value $x_0 \in R^n$, find a nearby $x$ that minimizes $\|f(x)\|$. Note that this problem is not well defined, in the sense that "nearby" is not well defined. Nonetheless, it is worth considering problems for which one can seek to improve the solution from some initial $x_0$.

Consider a linear approximation of the $m$-vector of functions $f(x)$ in the neighborhood of $x_0$, and try to minimize

$$\| f(x_0) + \nabla f(x_0)\,(x - x_0) \|,$$

where the Jacobian $\nabla f$ is evaluated at $x_0$; it is an $m \times n$ matrix. Notice that if you square this function then you get a quadratic that increases as any component of $x$ goes to infinity. So there is a unique minimum of this quadratic, and that is what we want to find. This linearized minimization problem is of the form we saw above in version 2, with $A = \nabla f(x_0)$ and $b = \nabla f(x_0)\, x_0 - f(x_0)$. However, since we are using a linear approximation to the $f(x)$ functions, we do not expect the result to minimize $\|f(x)\|$ exactly. Instead, we iterate

$$x^{(k+1)} \leftarrow x^{(k)} + \Delta x,$$

where $\Delta x$ is the step obtained by solving the linearized problem at $x^{(k)}$. At each step, we evaluate the Jacobian at the new point $x^{(k)}$. (A small numerical sketch of this iteration follows the examples below.)

Examples

To fit a line $y = mx + b$ to a set of points $(x_i, y_i)$, we found the $m$ and $b$ that minimize $\sum_i (y_i - m x_i - b)^2$. This is a linear least squares problem (version 2).

To fit a line $x \cos\theta + y \sin\theta = r$ to a set of points $(x_i, y_i)$, we found the $\theta$ and $r$ that minimize $\sum_i (a x_i + b y_i - r)^2$, subject to the constraint $a^2 + b^2 = 1$, where $a = \cos\theta$ and $b = \sin\theta$. As we saw in lecture 15, this reduces to solving for the $(a, b)$ that minimize $\sum_i (a (x_i - \bar{x}) + b (y_i - \bar{y}))^2$, which is the version 1 least squares problem. Once we have $\theta$, we can solve for $r$ (recall lecture 15), namely $r = \bar{x} \cos\theta + \bar{y} \sin\theta$. (A numerical sketch of this fit also appears below.)

Vanishing point detection (lecture 15) is also of the form of version 2.

Recall the image registration problem where we wanted the $(h_x, h_y)$ that minimizes

$$\sum_{(x, y) \in Ngd(x_0, y_0)} \{ I(x + h_x, y + h_y) - J(x, y) \}^2.$$

This is a non-linear least squares problem, since the $(h_x, h_y)$ are parameters of $I(x + h_x, y + h_y)$. To solve the problem, we took a first order Taylor series expansion of $I(x + h_x, y + h_y)$ and thereby turned it into a linear least squares problem. Our initial estimate of $(h_x, h_y)$ was $(0, 0)$. We then iterated to try to find a better solution. Note that we required $A$ to be of rank 2, so that $A^T A$ is invertible. This condition says that the second moment matrix $M$ needs to be invertible.
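
Here is a small sketch of the Gauss-Newton iteration described above. Everything in it is invented for illustration: the model $a e^{-ct}$, the synthetic data, the starting guess, and the hand-computed Jacobian. The update at each step solves the version 2 (normal equations) problem written in terms of the step $\Delta x = x - x^{(k)}$, i.e. $J^T J \,\Delta x = -J^T f$.

    import numpy as np

    # Synthetic data from y = a * exp(-c * t) with a = 2.0, c = 0.5 (made-up example).
    t = np.linspace(0.0, 4.0, 20)
    y = 2.0 * np.exp(-0.5 * t)

    def residuals(p):
        a, c = p
        return a * np.exp(-c * t) - y             # f(p), an m-vector

    def jacobian(p):
        a, c = p
        e = np.exp(-c * t)
        return np.column_stack((e, -a * t * e))   # m x 2 matrix of partial derivatives

    p = np.array([1.0, 0.6])                      # initial guess x0, not too far from the answer
    for _ in range(10):
        J = jacobian(p)                           # A = Jacobian evaluated at the current point
        r = residuals(p)
        dp = np.linalg.solve(J.T @ J, -J.T @ r)   # linearized step: J^T J (dp) = -J^T f
        p = p + dp

    print(p)                                      # should be close to (2.0, 0.5)

The starting guess matters here: plain Gauss-Newton has no step-size control, so from a poor initial guess a step can overshoot and the iteration can diverge.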

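The second example above (fitting the line $x \cos\theta + y \sin\theta = r$) can be carried out in a few lines. The sketch below uses made-up points near the line $x + y = 1$ and is only an illustration of the version 1 recipe, not code from the course.

    import numpy as np

    # Made-up points near the line x + y = 1, i.e. theta = 45 degrees, r = 1/sqrt(2).
    rng = np.random.default_rng(2)
    x = rng.uniform(-2.0, 2.0, 50)
    y = (1.0 - x) + 0.01 * rng.standard_normal(50)

    # Version 1: minimize sum_i (a (x_i - xbar) + b (y_i - ybar))^2 subject to a^2 + b^2 = 1.
    D = np.column_stack((x - x.mean(), y - y.mean()))   # centered data, one point per row
    eigvals, eigvecs = np.linalg.eigh(D.T @ D)
    a, b = eigvecs[:, 0]                     # unit eigenvector with the smallest eigenvalue
    r = a * x.mean() + b * y.mean()          # r = xbar cos(theta) + ybar sin(theta)
    print(np.degrees(np.arctan2(b, a)), r)   # about 45 degrees and 0.707, up to an overall sign
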
SVD (Singular Value Decomposition)

In the version 1 least squares problem, we needed to find the eigenvector of $A^T A$ that has the smallest eigenvalue. In the following, we use the eigenvectors and eigenvalues of $A^T A$ to decompose $A$ into a product of simple matrices. This singular value decomposition is a very heavily used tool in data analysis in many fields. Strangely, it is not taught in most introductory linear algebra courses. For this reason, I give you the derivation.

Let $A$ be an $m \times n$ matrix with $m \ge n$. The first step is to note that the $n \times n$ matrix $A^T A$ is symmetric and positive semi-definite. (It is obvious that $A^T A$ is symmetric, and it is easy to see that the eigenvalues of $A^T A$ are non-negative since $x^T A^T A x = \|Ax\|^2 \ge 0$ for any real $x$.) It follows from basic linear algebra that $A^T A$ has an orthonormal set of eigenvectors.

[Why? The claim is that any symmetric real valued matrix $M$ has an orthogonal (and hence orthonormal) set of eigenvectors. Here is why. First, if we have two eigenvectors $v_1$ and $v_2$ with distinct eigenvalues $\lambda_1$ and $\lambda_2$, then $v_1^T M v_2 = v_1^T M^T v_2$ because $M$ is symmetric, and so $\lambda_1 v_1^T v_2 = \lambda_2 v_1^T v_2$; since we are assuming $\lambda_1 \ne \lambda_2$, it follows that $v_1^T v_2 = 0$. Second, if we have two eigenvectors $v_1$ and $v_2$ with the same eigenvalue, then we can perform Gram-Schmidt orthogonalization to make the two eigenvectors orthogonal. Note that I have not shown that if $M$ is $n \times n$ then there must be $n$ linearly independent eigenvectors. I should fill that in one day.]

Let $V$ be an $n \times n$ matrix whose columns are the orthonormal eigenvectors of $A^T A$. Since the eigenvalues are non-negative, we can write them [2] as $\sigma_i^2$, that is, $\sigma_i$ is the square root of the $i$-th eigenvalue; we take $\sigma_i \ge 0$, i.e. the non-negative square root. Define $\Sigma$ to be an $n \times n$ diagonal matrix with values $\Sigma_{ii} = \sigma_i$ on the diagonal. The elements $\sigma_i$ are called the singular values of $A$. I emphasize: they are the square roots of the eigenvalues of $A^T A$. Also note that $\Sigma^2$ is an $n \times n$ diagonal matrix, and

$$A^T A V = V \Sigma^2.$$

Next define $\tilde{U}$ ($m \times n$) as follows:

$$\tilde{U}_{m \times n} \equiv A_{m \times n} V_{n \times n}. \qquad (2)$$

Then

$$\tilde{U}^T \tilde{U} = V^T A^T A V = V^T V \Sigma^2 = \Sigma^2.$$

Thus, the $n$ columns of $\tilde{U}$ are orthogonal, and the $i$-th column has length $\sigma_i$. We now normalize the columns of $\tilde{U}$ by defining an $m \times n$ matrix $U$ whose columns are orthonormal (length 1), so that $U \Sigma = \tilde{U}$. Substituting into Eq. (2) and right multiplying by $V^T$ gives us

$$A_{m \times n} = U_{m \times n} \Sigma_{n \times n} V^T_{n \times n}.$$

Thus, any $m \times n$ matrix $A$ can be decomposed into $A = U \Sigma V^T$, where $U$ is $m \times n$, $\Sigma$ is an $n \times n$ diagonal matrix, and $V$ is $n \times n$. A similar construction can be given when $m < n$.

[2] Forgive me for using the symbol $\sigma$ yet again, but $\sigma$ is always used in the SVD.
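
The derivation above translates directly into code. The sketch below (a made-up random matrix, not from the notes) diagonalizes $A^T A$, takes the square roots of its eigenvalues as the singular values, forms $\tilde{U} = AV$, normalizes its columns to get $U$, and checks that $A = U \Sigma V^T$. It assumes $A$ has full column rank, so no $\sigma_i$ is zero and the normalization is safe.

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((5, 3))            # m x n with m >= n, full column rank

    # Orthonormal eigenvectors and (non-negative) eigenvalues of A^T A.
    eigvals, V = np.linalg.eigh(A.T @ A)
    order = np.argsort(eigvals)[::-1]          # reorder so the singular values come out
    eigvals, V = eigvals[order], V[:, order]   # in the usual descending order

    sigma = np.sqrt(eigvals)                   # singular values of A
    Sigma = np.diag(sigma)                     # n x n diagonal matrix

    U_tilde = A @ V                            # columns are orthogonal with lengths sigma_i
    U = U_tilde / sigma                        # normalize: divide column i by sigma_i

    print(np.allclose(A, U @ Sigma @ V.T))     # A = U Sigma V^T
    print(np.allclose(U.T @ U, np.eye(3)))     # the columns of U are orthonormal

In practice one calls a library SVD routine rather than forming $A^T A$, which loses accuracy for small singular values; the construction here is only meant to mirror the derivation.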

One often defines the singular value decomposition of $A$ slightly differently than this, namely one defines $U$ to be $m \times m$, by adding $m - n$ more orthonormal columns. One also needs to add $m - n$ rows of 0's to $\Sigma$ to make it $m \times n$, giving

$$A_{m \times n} = U_{m \times m} \Sigma_{m \times n} V^T_{n \times n}.$$

For our purposes there is no important difference between these two decompositions.

Finally, note that Matlab has a function svd which computes the singular value decomposition. Since the columns of $V$ are the eigenvectors of $A^T A$ and the squares of the singular values are its eigenvalues, one can use svd(A) to obtain the eigenvectors and eigenvalues of $A^T A$.
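
The same two conventions exist in NumPy, through the full_matrices flag of np.linalg.svd. The sketch below (a quick check with a random example, not part of the original notes) shows both shapes and confirms the relationship to $A^T A$ stated above: the squared singular values are its eigenvalues, and the columns of $V$ are its eigenvectors.

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((5, 3))

    # "Thin" SVD: U is m x n, as constructed in these notes.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    print(U.shape, s.shape, Vt.shape)          # (5, 3) (3,) (3, 3)

    # "Full" SVD: U is m x m, and Sigma is padded with m - n rows of zeros.
    U_full, s_full, Vt_full = np.linalg.svd(A, full_matrices=True)
    print(U_full.shape)                        # (5, 5)

    # The squared singular values are the eigenvalues of A^T A
    # (both sorted in descending order here), and the rows of Vt are its eigenvectors.
    eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
    print(np.allclose(s**2, eigvals))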