Bindel, Fall 2009 Matrix Computations (CS 6210) Week 8: Friday, Oct 17

Logistics

1. HW 3 errata: in Problem 1, I meant to say $p_i < i$, not that $p_i$ is strictly ascending; my apologies. You would want $p_i > i$ if you were simply forming the matrices and handing them off to Matlab's sparse solvers. I also mixed up the definition of $A$ slightly. An updated version of the homework has been posted.

2. HW 3 is due on Wednesday, Oct 1. The midterm will be handed out on the same day.

Constrained least squares

We've worked for a while on the problem of minimizing $\|Ax - b\|^2$. What about doing the same minimization in the presence of constraints? For equality constraints, we can use the method of Lagrange multipliers. For example, to minimize
\[
  \frac{1}{2} \|Ax - b\|^2 \quad \text{subject to} \quad Bx = d,
\]
we would look for stationary points of the Lagrangian
\[
  L(x, \lambda) = \frac{1}{2} \langle Ax - b, Ax - b \rangle + \lambda^T (Bx - d).
\]
If we take derivatives in the directions $\delta x$ and $\delta \lambda$, we have
\[
  \delta L
  = \langle A \, \delta x, Ax - b \rangle + \lambda^T B \, \delta x + \delta \lambda^T (Bx - d)
  = \begin{bmatrix} \delta x \\ \delta \lambda \end{bmatrix}^T
    \left(
      \begin{bmatrix} A^T A & B^T \\ B & 0 \end{bmatrix}
      \begin{bmatrix} x \\ \lambda \end{bmatrix}
      -
      \begin{bmatrix} A^T b \\ d \end{bmatrix}
    \right)
  = \begin{bmatrix} \delta x \\ \delta \lambda \end{bmatrix}^T
    \begin{bmatrix} \nabla_x L \\ \nabla_\lambda L \end{bmatrix}.
\]
Setting the gradient equal to zero (or, equivalently, setting the directional derivative to zero in all directions) gives us the constrained normal equations
\[
  \begin{bmatrix} A^T A & B^T \\ B & 0 \end{bmatrix}
  \begin{bmatrix} x \\ \lambda \end{bmatrix}
  =
  \begin{bmatrix} A^T b \\ d \end{bmatrix}.
\]
Of course, in the case of linear equality constraints, we could also simply find a basis in which some components of $x$ are completely constrained and the remaining components are completely free (this is done in the book).
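As a concrete illustration (not from the original notes), here is a minimal Matlab sketch that forms the constrained normal equations for made-up data and solves the resulting block system with backslash; the variable names (K, rhs, lambda) are my own.

```matlab
% Minimal sketch (not from the notes): solve min 0.5*||A*x-b||^2 s.t. B*x = d
% by forming the constrained normal equations (KKT system).  Example data
% are made up.
A = randn(10, 4);  b = randn(10, 1);   % overdetermined least squares data
B = [1 1 1 1];     d = 1;              % one equality constraint: sum(x) = 1
n = size(A, 2);  p = size(B, 1);
K   = [A'*A, B'; B, zeros(p)];         % KKT matrix (symmetric indefinite)
rhs = [A'*b; d];
z   = K \ rhs;
x      = z(1:n);                       % constrained minimizer
lambda = z(n+1:end);                   % Lagrange multiplier(s)
% Check: B*x should equal d up to roundoff.
```

Forming $A^T A$ squares the conditioning of $A$, so for ill-conditioned problems one would prefer a QR-based or null-space approach; the sketch simply mirrors the block system derived above.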

Orthogonal Procrustes

Not all least squares problems involve individual vectors. We can have more exotic problems in which the unknown is a linear transformation, too. For example, consider the following alignment problem: suppose we are given two sets of coordinates for $m$ points in $n$-dimensional space, arranged into the rows of $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{m \times n}$. Suppose the two matrices are (approximately) related by a rigid motion that leaves the origin fixed; how can we recover that transformation? That is, we want to find an orthogonal $W \in O(n)$ that minimizes $\|AW - B\|_F$.

This is a constrained least squares problem in disguise. Define the Frobenius inner product
\[
  \langle X, Y \rangle_F = \operatorname{tr}(X^T Y) = \sum_{i,j} x_{ij} y_{ij},
\]
and note that $\|X\|_F^2 = \langle X, X \rangle_F$. Then to minimize
\[
  \frac{1}{2} \|AW - B\|_F^2 \quad \text{subject to} \quad W^T W = I,
\]
we look for points where
\[
  L(W) = \frac{1}{2} \langle AW - B, AW - B \rangle_F
\]
has zero derivative in the directions tangent to the constraint $W^T W = I$. We could do this with Lagrange multipliers, but in this case it is simpler to work with the constraints directly. When $W$ is orthogonal, we have
\[
  L(W) = \frac{1}{2} \left( \|AW\|_F^2 + \|B\|_F^2 \right) - \langle AW, B \rangle_F
       = \frac{1}{2} \left( \|A\|_F^2 + \|B\|_F^2 \right) - \langle AW, B \rangle_F,
\]
so minimizing $L$ is equivalent to maximizing $\langle AW, B \rangle_F$.

Now, note that if we differentiate the constraint $W^T W = I$, we find $\delta W^T W + W^T \delta W = 0$; that is, $\delta W = W S$ where $S$ is a skew-symmetric matrix ($S = -S^T$). So, the constrained stationary point should satisfy
\[
  \delta L = -\langle A W S, B \rangle_F = 0
\]
for all skew-symmetric $S$.
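The claim that tangent directions at an orthogonal $W_0$ have the form $W_0 S$ with $S$ skew-symmetric can be checked numerically. Here is a minimal Matlab sketch (my own illustration, not from the notes) that generates a curve of orthogonal matrices $W(t) = W_0 \exp(tS)$ and compares a finite difference against $W_0 S$.

```matlab
% Minimal sketch (not from the notes): curves of orthogonal matrices through
% W0 have tangent vectors of the form W0*S with S skew-symmetric.
n = 4;
[W0, R0] = qr(randn(n));           % a random orthogonal starting point (R0 unused)
S  = randn(n);  S = S - S';        % a random skew-symmetric direction
t  = 1e-6;
W  = W0*expm(t*S);                 % W(t) = W0*expm(t*S) is orthogonal for all t
norm(W'*W - eye(n))                % stays orthogonal (up to roundoff)
norm((W - W0)/t - W0*S)            % finite difference agrees with W0*S up to O(t)
```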

Now, note that
\[
  \langle A W S, B \rangle_F = \operatorname{tr}(S^T W^T A^T B) = \langle S, W^T A^T B \rangle_F.
\]
This says that $W^T A^T B$ should lie in the subspace orthogonal to the space of skew-symmetric matrices, which is the same as saying that $W^T A^T B$ should be symmetric. Now write a polar decomposition of $A^T B$,
\[
  A^T B = Q H,
\]
where $Q$ is orthogonal and $H$ is symmetric and positive definite (assuming $A^T B$ has full rank). Then we need an orthogonal matrix $W$ such that $W^T Q H$ is symmetric. There are two natural choices, $W = \pm Q$. The choice $W = Q$ maximizes $\langle AW, B \rangle_F = \operatorname{tr}(W^T A^T B)$. Note that the polar decomposition of $A^T B$ can be computed through the singular value decomposition:
\[
  A^T B = U \Sigma V^T = (U V^T)(V \Sigma V^T) = Q H.
\]

There is another argument (used in the book) that the polar factor $Q$ is the right matrix to maximize $\langle AW, B \rangle_F = \operatorname{tr}(W^T A^T B)$. Recall that for any matrices $X$ and $Y$ such that $XY$ is square, $\operatorname{tr}(XY) = \operatorname{tr}(YX)$. Therefore, if we write $A^T B = U \Sigma V^T$, we have
\[
  \operatorname{tr}(W^T A^T B) = \operatorname{tr}(W^T U \Sigma V^T) = \operatorname{tr}(V^T W^T U \Sigma) = \operatorname{tr}(Z \Sigma),
\]
where $Z = V^T W^T U$ is again orthogonal. Note that $\operatorname{tr}(Z \Sigma) = \sum_i \sigma_i z_{ii}$ is greatest when $z_{ii} = 1$ for each $i$ (since the entries of an orthogonal matrix are bounded in magnitude by one). Therefore, the trace is maximized when $Z = I$, which again corresponds to $W = U V^T = Q$.

While the SVD argument is slick, I actually like the argument in terms of constrained stationary points better. It brings in the relationship between orthogonal matrices and skew-symmetric matrices, and it invokes again the idea of decomposing a vector space into orthogonal subspaces (in this case, the symmetric and the skew-symmetric matrices). It also gives me an excuse to introduce the polar factorization of a matrix.

This least squares problem of finding a best-fitting orthogonal matrix is sometimes called the orthogonal Procrustes problem. It is named in honor of the Greek legend of Procrustes, who had a bed on which he would either stretch guests or amputate them in order to make them fit perfectly.
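Here is a minimal Matlab sketch (my own illustration, not from the notes) of the SVD recipe $W = UV^T$: the made-up $B$ is a rotation of $A$ plus a little noise, and the recovered $W$ should be close to the original transformation.

```matlab
% Minimal sketch (not from the notes): orthogonal Procrustes via the SVD of
% A'*B.  Example data are made up: B is an orthogonally transformed copy of A
% plus noise.
n = 3;  m = 50;
[Q0, R0] = qr(randn(n));               % a "true" orthogonal transformation (R0 unused)
A = randn(m, n);
B = A*Q0 + 1e-3*randn(m, n);           % noisy transformed copy
[U, S, V] = svd(A'*B);                 % A'*B = U*S*V'
W = U*V';                              % polar factor of A'*B; the minimizer
relerr = norm(W - Q0, 'fro')/norm(Q0, 'fro')   % should be small
```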

Rank-deficient problems

Suppose $A$ is not full rank. In this case, we have
\[
  A = U \Sigma V^T
\]
where
\[
  \Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r, 0, \ldots, 0).
\]
There are now multiple minimizers of $\|Ax - b\|$, and we need some additional condition to make a unique choice. A standard choice is the $x$ with smallest norm, which satisfies
\[
  x = A^\dagger b = V \Sigma^\dagger U^T b,
\]
where
\[
  \Sigma^\dagger = \operatorname{diag}(\sigma_1^{-1}, \sigma_2^{-1}, \ldots, \sigma_r^{-1}, 0, \ldots, 0).
\]
The matrix $A^\dagger$ is the Moore-Penrose pseudoinverse.

The Moore-Penrose pseudoinverse is discontinuous precisely at the rank-deficient matrices. This is not good news for computation. On the other hand, if we have a nearly rank-deficient problem in which $\sigma_{r+1}, \ldots, \sigma_n \ll \sigma_r$, then we might want to perturb to the rank-deficient case and apply the Moore-Penrose pseudoinverse. That is, we use the truncated SVD of $A$ to approximately solve the minimization problem.

One common source of rank-deficient or nearly rank-deficient problems is fitting problems in which some of the explanatory factors (represented by columns of $A$) are highly correlated with other factors. In such cases, we might prefer an (approximate) minimizer of $\|Ax - b\|$ that does not use redundant factors; that is, we might want $x$ to have only $r$ nonzeros, where $r$ is the effective rank. A conventional choice of the $r$ columns is via QR with column pivoting: that is, we write
\[
  A \Pi = Q R,
\]
where the column permutation $\Pi$ is chosen so that the diagonal elements of $R$ appear in order of descending magnitude. If there is a sharp drop from $\sigma_r$ to $\sigma_{r+1}$, then the trailing $(n-r)$-by-$(n-r)$ submatrix of $R$ is likely to be small (on the order of $\sigma_{r+1}$ in size), and discarding it corresponds to a small perturbation of $A$. Pivoted QR is not foolproof; there are nearly singular matrices for which the diagonal elements of $R$ never get very small. But pivoted QR does a good job of revealing the rank, and of constructing sparse approximate minimizers, for many practical problems.
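Here is a minimal Matlab sketch (my own illustration, not from the notes) contrasting the two approaches on a made-up nearly rank-deficient problem; the truncation tolerance is chosen by hand for this example.

```matlab
% Minimal sketch (not from the notes): least squares with a nearly
% rank-deficient matrix, solved with a truncated SVD and with pivoted QR.
% Example data are made up: the third column of A is nearly the sum of the
% first two, so the effective rank is r = 2.
m = 20;
A = randn(m, 2);
A = [A, A(:,1) + A(:,2) + 1e-10*randn(m, 1)];
b = randn(m, 1);
n = size(A, 2);

% Truncated SVD: drop singular values below a (hand-chosen) tolerance and
% apply the pseudoinverse of the truncated matrix.
[U, S, V] = svd(A, 'econ');
s   = diag(S);
tol = 1e-8 * s(1);                     % truncation tolerance for this example
r   = sum(s > tol);                    % effective numerical rank (here r = 2)
x_tsvd = V(:,1:r) * ((U(:,1:r)'*b) ./ s(1:r));

% Pivoted QR: keep only the r columns selected by the pivoting, so that the
% approximate minimizer has just r nonzeros.
[Q, R, P] = qr(A);                     % column pivoting: A*P = Q*R
y = zeros(n, 1);
y(1:r) = R(1:r,1:r) \ (Q(:,1:r)'*b);   % least squares over the chosen columns
x_pqr = P*y;

[norm(A*x_tsvd - b), norm(A*x_pqr - b)]   % residuals should be comparable
```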

Underdetermined problems

So far we have talked about overdetermined problems in which $A$ has more rows than columns ($m > n$). Sometimes there is also a call to solve underdetermined problems ($m < n$). Assuming that $A$ has full row rank, an underdetermined problem will have an $(n - m)$-dimensional space of solutions. We can use QR or SVD decompositions to find the minimum $\ell^2$-norm solution to an underdetermined problem; if we write the economy decompositions
\[
  A^T = QR = U \Sigma V^T,
\]
then
\[
  x = Q R^{-T} b = U \Sigma^{-1} V^T b
\]
is the minimal $\ell^2$-norm solution to $Ax = b$.

In many circumstances, though, we may be interested in some other solution. For example, we may want a sparse solution with as few nonzeros as possible; this is often the case for applications in compressive sensing. It turns out that the sparsest solution often minimizes the $\ell^1$ norm as well. Needless to say, the matrix decompositions we have discussed do not provide such a simple approach to finding the minimal $\ell^1$-norm solution to an underdetermined system.
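Here is a minimal Matlab sketch (my own illustration, not from the notes) of the two minimum-norm formulas on made-up data.

```matlab
% Minimal sketch (not from the notes): minimum 2-norm solution of an
% underdetermined system Ax = b via an economy QR of A', compared with the
% SVD-based formula.  Example data are made up; A has full row rank.
m = 4;  n = 10;
A = randn(m, n);  b = randn(m, 1);

[Q, R] = qr(A', 0);            % economy QR: A' = Q*R, Q is n-by-m
x_qr  = Q * (R' \ b);          % x = Q * R^{-T} * b

[U, S, V] = svd(A', 'econ');   % economy SVD of A': A' = U*S*V'
x_svd = U * (S \ (V'*b));      % x = U * S^{-1} * V' * b

% Both give the same minimum-norm solution; note that A\b would instead
% return a "basic" solution with at most m nonzeros.
[norm(A*x_qr - b), norm(x_qr - x_svd)]
```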