MATH 612 Computational methods for equation solving and function minimization Week # 2

MATH 612 Computational methods for equation solving and function minimization, Week # 2
Instructor: Francisco-Javier "Pancho" Sayas
Spring 2014, University of Delaware

Plan for this week

- Discuss any problems you couldn't solve from Lectures 1 and 2 (Lectures are the chapters of the book).
- Read Lectures 3, 4, and 5.
- I will talk about Lectures 3, 4, and 5, but will not cover everything in the book, and especially not in the same order. I will respect the notation, though.
- The first HW assignment will be given. It includes problems from Lectures 1-4.

Warning: Typically, I'll keep on updating, correcting, and modifying the slides until the end of each week.

NORMS

Norms for vectors

A norm in C^n is a map ‖·‖ : C^n → R satisfying
- Positivity: ‖x‖ ≥ 0, and ‖x‖ = 0 if and only if x = 0
- The triangle inequality: ‖x + y‖ ≤ ‖x‖ + ‖y‖
- Homogeneity: ‖αx‖ = |α| ‖x‖
These properties hold for arbitrary x, y ∈ C^n and α ∈ C.

Important examples:
‖x‖_p = ( ∑_{j=1}^n |x_j|^p )^{1/p},  1 ≤ p < ∞,    ‖x‖_∞ = max_{j=1,...,n} |x_j|.
The values p = 1 and p = 2 are the most commonly used.

Other norms

If W is an invertible matrix, then ‖Wx‖_p, 1 ≤ p ≤ ∞, are also norms (W as in weight). Think for instance of a diagonal matrix W = diag(w_1, ..., w_n): with p = 2 we get
( ∑_{j=1}^n |w_j|^2 |x_j|^2 )^{1/2},
and with p = 1,
∑_{j=1}^n |w_j| |x_j|.

Several MATLAB formulas

To compute the 2-norm you can use all of these strategies:

>> x=[3 1 2 5 6]'; % column vector
>> sqrt(sum(abs(x).^2))
ans = 8.660254037844387
>> sqrt(dot(x,x)); % results omitted
>> sqrt(x'*x); % only works if x is a column vector
>> norm(x,2);
>> norm(x); % the 2-norm is the default norm

P.S. If there's a Matlab function for something, use it (unless instructed to DIY).

Continues...

>> x=[3 1 2 5 6]'; % column vector
>> sum(abs(x))
ans = 17
>> norm(x,1)
ans = 17
>> max(abs(x))
ans = 6
>> norm(x,inf)
ans = 6

A three line experiment: Find numerical evidence (using a single vector) that the p-norm of a vector is a continuous function of the parameter p, with the correct limit as p → ∞. (One possible sketch is shown below.)
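One possible three-line experiment; the vector and the range of values of p are arbitrary choices, not the required solution:

>> x = [3 1 2 5 6]';                                  % any fixed vector
>> p = 1:0.25:30;                                     % a sweep of values of p
>> plot(p, arrayfun(@(q) norm(x,q), p), p, norm(x,inf)+0*p, '--')

The p-norm curve decreases continuously toward the dashed constant line ‖x‖_∞ = 6.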

The shape of the p-balls

The sets {x ∈ R^2 : ‖x‖_p ≤ 1} have very different shapes: for p = 2 a circle, for p = ∞ a square parallel to the axes, and for p = 1 a square parallel to the main diagonals.

Note that ‖x‖_∞ ≤ ‖x‖_2 ≤ ‖x‖_1. (Can you prove this? Can you see why this proves that, for the same radius, the ∞-balls are larger than the 2-balls, and both are larger than the 1-balls?)
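If you want to see the three shapes, a quick plot along these lines works; the parametrization of directions by an angle is just one convenient choice:

>> t = linspace(0,2*pi,400); d = [cos(t); sin(t)];    % unit directions in R^2
>> r1 = 1./sum(abs(d));                               % boundary of the 1-ball: d/||d||_1
>> ri = 1./max(abs(d));                               % boundary of the inf-ball: d/||d||_inf
>> plot(r1.*d(1,:), r1.*d(2,:), d(1,:), d(2,:), ri.*d(1,:), ri.*d(2,:))
>> axis equal                                         % diamond, circle, square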

Operator (or induced) norms for matrices

For any A ∈ C^{m×n} and arbitrary p we can define the norm
‖A‖_(p) = sup_{0 ≠ x ∈ C^n} ‖Ax‖_p / ‖x‖_p = sup_{x ∈ C^n, ‖x‖_p = 1} ‖Ax‖_p.
(The supremum is actually a maximum.) These are actually norms, valid in the spaces of matrices of all sizes. They additionally satisfy:
1. By definition... ‖Ax‖_p ≤ ‖A‖_(p) ‖x‖_p
2. By a simple argument... ‖AB‖_(p) ≤ ‖A‖_(p) ‖B‖_(p)
3. By a simpler argument... ‖I‖_(p) = 1.

The Frobenius norm
‖A‖_F = ( ∑_{i=1}^m ∑_{j=1}^n |a_ij|^2 )^{1/2} = ( trace(A^*A) )^{1/2}
satisfies the second property but not the third.

Computing the norms

The Frobenius norm is easy to compute (it's just a formula). There are formulas for ‖A‖_(1) and ‖A‖_(∞). The computation of ‖A‖_(2) will be shown later this week. (It requires computing one eigenvalue.)

Formulas (the proofs are in the book):
‖A‖_(1) = max_j ‖a_j‖_1, where a_j are the columns of A
‖A‖_(∞) = max_i ‖a_i^*‖_1, where a_i^* are the rows of A
Note that ‖A‖_(∞) = ‖A^*‖_(1).

More Matlab...

>> A=[-1 -2 3;4 -5 6]
A =
    -1    -2     3
     4    -5     6
>> norm(A,1)
ans = 9
>> max(sum(abs(A))) % sum adds by columns (default)
ans = 9
>> norm(A,inf)
ans = 15
>> max(sum(abs(A')))
ans = 15
>> max(sum(abs(A),2))
ans = 15

Now you think

- Prove that if Q is unitary, then ‖QA‖_F = ‖A‖_F and ‖QA‖_(2) = ‖A‖_(2).
- What is ‖I‖_F? (Hint: it depends on the dimension.)
- What is computed with the following Matlab line?
  sum(abs(A(:).^2))
  More importantly, how?
- Figure out what the Matlab command for the Frobenius norm of a matrix is. (A numerical check for the first and last items is sketched below.)
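One way to check the unitary invariance numerically, which also reveals the Frobenius-norm command; the random sizes are arbitrary:

>> A = randn(5,3) + 1i*randn(5,3);                 % an arbitrary complex matrix
>> [Q,~] = qr(randn(5) + 1i*randn(5));             % Q is unitary
>> [norm(A,'fro'), norm(Q*A,'fro')]                % norm(.,'fro') is the Frobenius norm
>> [norm(A,2), norm(Q*A,2)]                        % both pairs should agree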

The SVD

Important announcement

This is one of the most important concepts in mathematics you will ever learn. It is used, with different names, in all kinds of areas of mathematics, signal processing, statistics,...

SVD stands for singular value decomposition.

This lecture is quite theoretical. You'll need to be patient and alert!

The matrix A^*A

Let A be a complex m × n matrix. Consider the n × n matrix A^*A.
- It is Hermitian ((A^*A)^* = A^*A). Therefore its eigenvalues are real, it is diagonalizable, and its eigenvectors can be chosen to build an orthonormal basis of C^n (of R^n if A is real).
- It is positive semidefinite:
  x^*(A^*Ax) = (x^*A^*)(Ax) = (Ax)^*(Ax) = ‖Ax‖^2 ≥ 0  for all x ∈ C^n.
  Therefore its eigenvalues are non-negative (real) numbers.
- Its nullspace is the same as the one for A: Ax = 0 ⟺ A^*Ax = 0. (See the previous item for the proof.) Therefore, by the rank-nullity theorem, A^*A has the same rank as A.
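A quick numerical sanity check of these three properties; the random matrix is an arbitrary example:

>> A = randn(5,3) + 1i*randn(5,3);     % an arbitrary complex matrix
>> B = A'*A;                           % A' is the conjugate transpose in MATLAB
>> norm(B - B','fro')                  % Hermitian: zero up to rounding
>> min(real(eig(B)))                   % eigenvalues are non-negative (up to rounding)
>> [rank(A), rank(B)]                  % same rank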

Applying the spectral theorem

We organize the eigenvalues of A^*A in decreasing order:
σ_1^2 ≥ σ_2^2 ≥ ... ≥ σ_r^2 > σ_{r+1}^2 = ... = σ_n^2 = 0.
We choose an orthonormal basis of eigenvectors:
A^*A v_j = σ_j^2 v_j,   null(A) = null(A^*A) = ⟨v_{r+1}, ..., v_n⟩.
We then consider the following vectors:
u_j = (1/σ_j) A v_j,   j = 1, ..., r.

Elementary, my dear Watson

The vectors u_j are orthonormal:
u_i^* u_j = ((1/σ_i) A v_i)^* ((1/σ_j) A v_j) = (1/(σ_i σ_j)) v_i^* A^*A v_j = (σ_j^2/(σ_i σ_j)) v_i^* v_j = δ_ij,
and they span the range of A. Moreover, for all x,
Ax = A( ∑_{j=1}^n (v_j^* x) v_j ) = ∑_{j=1}^n (v_j^* x) A v_j = ∑_{j=1}^r σ_j (v_j^* x) u_j,
using that x = ∑_{j=1}^n (v_j^* x) v_j and that A v_j = 0 for j > r.

The reduced SVD

The reduced SVD is the matrix form of the equality Ax = ∑_{j=1}^r σ_j (v_j^* x) u_j:
A = Û Σ̂ V̂^*
- V̂ is the n × r matrix whose columns are the orthonormal vectors {v_1, ..., v_r}.
- Σ̂ is the r × r diagonal matrix whose diagonal entries are the positive numbers σ_1 ≥ σ_2 ≥ ... ≥ σ_r > 0.
- Û is the m × r matrix whose columns are the orthonormal vectors {u_1, ..., u_r}.

Terminology

Ax = ∑_{j=1}^r σ_j (v_j^* x) u_j,   A = Û Σ̂ V̂^*
- The numbers σ_1 ≥ σ_2 ≥ ... ≥ σ_r > 0 are the singular values of A. (r is the rank of A.)
- The orthonormal vectors {v_1, ..., v_r} are the right singular vectors. (They are the eigenvectors of A^*A, except those corresponding to the zero eigenvalue.)
- The orthonormal vectors {u_1, ..., u_r} are the left singular vectors.

Bear with me

There's a lot of information encoded in these expressions:
Ax = ∑_{j=1}^r σ_j (v_j^* x) u_j,   A = Û Σ̂ V̂^*.
Forget how we got it. Assume that we have just been told that the SVD exists. (Remember the rules on the three matrices.) Then:
A^* y = ∑_{j=1}^r σ_j (u_j^* y) v_j,      A^* = V̂ Σ̂ Û^*
A^*A x = ∑_{j=1}^r σ_j^2 (v_j^* x) v_j,   A^*A = V̂ Σ̂^2 V̂^*
AA^* y = ∑_{j=1}^r σ_j^2 (u_j^* y) u_j,   AA^* = Û Σ̂^2 Û^*

All of this is true: prove it

Let A = Û Σ̂ V̂^*, where the column vectors of Û and V̂ are orthonormal, and where Σ̂ is an r × r positive diagonal matrix with entries in non-increasing order. Then:
- the rank of A is r,
- A^*A v_j = σ_j^2 v_j,   AA^* u_j = σ_j^2 u_j,
- A v_j = σ_j u_j,   A^* u_j = σ_j v_j,
- ⟨v_1, ..., v_r⟩ = null(A)^⟂ = range(A^*),
- ⟨u_1, ..., u_r⟩ = range(A) = null(A^*)^⟂.

A geometric/information approach

Let A be an n × n real invertible matrix, with SVD¹ A = UΣV^*. By construction, the matrices U and V are unitary and can be taken to be real. Let x ∈ R^n be such that
‖x‖_2^2 = 1 = x_1^2 + ... + x_n^2.
We then decompose (c = V^*x):
x = c_1 v_1 + ... + c_n v_n = (v_1^* x) v_1 + ... + (v_n^* x) v_n,
where ‖c‖_2^2 = c_1^2 + ... + c_n^2 = 1.

¹ We eliminate the hats, because in this case the reduced SVD is equal to the full SVD.

A geometric/information approach (2)

Then
Ax = σ_1 c_1 u_1 + ... + σ_n c_n u_n = y_1 u_1 + ... + y_n u_n,
where (y_1/σ_1)^2 + ... + (y_n/σ_n)^2 = 1. In other words...

Let A be an invertible matrix with singular values σ_1 ≥ σ_2 ≥ ... ≥ σ_n > 0. Then there exists an orthogonal reference frame in R^n such that the image of the unit ball by A is the hyperellipsoid
(y_1/σ_1)^2 + ... + (y_n/σ_n)^2 = 1.
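For n = 2 you can see the hyperellipsoid directly; a minimal sketch, with an arbitrary invertible matrix as example:

>> A = [3 1; 1 2];                         % any invertible 2x2 matrix
>> t = linspace(0,2*pi,200);
>> X = [cos(t); sin(t)];                   % the unit circle
>> Y = A*X;                                % its image: an ellipse
>> plot(X(1,:), X(2,:), Y(1,:), Y(2,:)), axis equal
>> svd(A)'                                 % the semiaxes of the ellipse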

The 2-norm of a matrix

We go back to the general case: Ax = ∑_{j=1}^r σ_j (v_j^* x) u_j, A = Û Σ̂ V̂^*. Then
‖Ax‖_2^2 = ∑_{j=1}^r σ_j^2 |v_j^* x|^2 ≤ σ_1^2 ∑_{j=1}^r |v_j^* x|^2 ≤ σ_1^2 ∑_{j=1}^n |v_j^* x|^2 = σ_1^2 ‖x‖_2^2    (σ_1 ≥ σ_j for all j).
This proves (how?) that
‖A‖_(2) = sup_{0 ≠ x ∈ C^n} ‖Ax‖_2 / ‖x‖_2 = σ_1 = max singular value.
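Numerically, this identity is what MATLAB's norm(A,2) relies on; a one-line check with an arbitrary random matrix:

>> A = randn(6,4);
>> [norm(A,2), max(svd(A))]     % the same number: the largest singular value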

The full SVD

Again, for a general m × n matrix A,
Ax = ∑_{j=1}^r σ_j (v_j^* x) u_j,   A = Û Σ̂ V̂^*.
Now:
- Build an orthonormal basis of C^n, {v_1, ..., v_r, v_{r+1}, ..., v_n}, by completing the right singular vectors with orthonormal vectors (from null(A)).
- Build an orthonormal basis of C^m, {u_1, ..., u_r, u_{r+1}, ..., u_m}, by completing the left singular vectors with orthonormal vectors (from null(A^*)).
This is the same as adding n − r columns to the right of V̂ and m − r columns to the right of Û, making the resulting matrices (V and U) unitary.

The full SVD (2)

We then create an m × n matrix (same size as A)
Σ = [ Σ̂  0
      0   0 ]
by adding n − r columns to the right and m − r rows to the bottom of Σ̂. It is simple to see that
A = UΣV^* = Û Σ̂ V̂^*.
Note the sizes in the reduced decomposition and in the full decomposition:
reduced: (m × r)(r × r)(r × n),   full: (m × m)(m × n)(n × n).

The pseudoinverse

Recall the reduced SVD

Every m × n matrix A can be decomposed in the form A = Û Σ̂ V̂^*, where
- Û is m × r with orthonormal columns,
- Σ̂ is r × r diagonal with positive entries σ_1 ≥ σ_2 ≥ ... ≥ σ_r > 0,
- V̂ is n × r with orthonormal columns.
The MATLAB command [U,S,V]=svd(A) returns the full SVD.
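If you want the reduced (rank-r) SVD described above, one option is to trim the full factors returned by svd; a sketch with an arbitrary rank-deficient example:

>> A = randn(5,2)*randn(2,4);                 % a random 5x4 matrix of rank 2
>> [U,S,V] = svd(A);                          % full SVD: 5x5, 5x4, 4x4
>> r = rank(A);
>> Uh = U(:,1:r); Sh = S(1:r,1:r); Vh = V(:,1:r);
>> norm(A - Uh*Sh*Vh')                        % zero up to rounding errors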

The (Moore-Penrose) pseudoinverse

If A = Û Σ̂ V̂^*, we define A^+ = V̂ Σ̂^{-1} Û^*. Note the following:
- If A is invertible, then A^+ = A^{-1}. Why?
- For a general matrix, x = A^+ b ⟹ A^*A x = A^* b (we'll talk about least squares in due time).
- If Ax = b has a unique solution (A has full rank by columns and b is in the range of A), then x = A^+ b.
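A sketch of how the definition translates into MATLAB, compared against the built-in pinv; the example matrix is arbitrary and rank deficient:

>> A = randn(5,2)*randn(2,4);                               % rank 2
>> [U,S,V] = svd(A); r = rank(A);
>> Ap = V(:,1:r) * diag(1./diag(S(1:r,1:r))) * U(:,1:r)';   % V_hat * Sigma_hat^{-1} * U_hat'
>> norm(Ap - pinv(A))                                       % should be tiny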

Wasting our time?

Let A = [1 1 1; 1 1 1] (a 2 × 3 matrix). Can we compute its SVD? Can we do it without going through the diagonalization of A^*A?
A^*A = [2 2 2; 2 2 2; 2 2 2]
Note that A has rank one. This is good news because Σ̂ is a 1 × 1 matrix. Now we cheat!

Wasting our time? (2)

First,
null(A) = ⟨ (1, −1, 0), (1, 0, −1) ⟩,
so we can choose
V̂ = [1/√3; 1/√3; 1/√3].
Then σ_1 u_1 = A v_1 with ‖u_1‖_2 = 1, which gives
Û = [1/√2; 1/√2],   Σ̂ = [√6].

Wasting our time? (... and 3)

This is the reduced SVD:
[1 1 1; 1 1 1] = [1/√2; 1/√2] [√6] [1/√3 1/√3 1/√3]
and this is the pseudoinverse:
A^+ = [1/√3; 1/√3; 1/√3] [1/√6] [1/√2 1/√2] = [1/6 1/6; 1/6 1/6; 1/6 1/6]
Matlab computes the pseudoinverse with the command pinv. You have a couple of little challenging SVDs to compute by hand in the book.

Fast now! Give me another SVD for A.
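A quick check of this hand computation in MATLAB:

>> A = [1 1 1; 1 1 1];
>> svd(A)'         % sqrt(6) = 2.4495 and 0
>> pinv(A)         % every entry equals 1/6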

Low-rank approximation

Ways to look at the SVD

The SVD
A = Û Σ̂ V̂^*,   Σ̂ = diag(σ_1, ..., σ_r),
can be read as
A = ∑_{j=1}^r σ_j u_j v_j^*.
Follow me now... We have written A, which is a matrix of rank r, as the sum of r matrices of rank 1. (Careful with this: if you add matrices of rank one you might end up with a matrix of lower rank. The orthogonality of the vectors plays a key role here.)

A compressed version of A

Let
A_k := ∑_{j=1}^k σ_j u_j v_j^* = Û_k Σ̂_k V̂_k^*.
Then A_k is a matrix of rank k. Its first k singular values, left singular vectors, and right singular vectors coincide with those of A.

Now this is quite surprising...
A − A_k = ∑_{j=k+1}^r σ_j u_j v_j^*,
so the singular values of A − A_k are σ_{k+1} ≥ ... ≥ σ_r, and ‖A − A_k‖_(2) = σ_{k+1}.
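A numerical confirmation that ‖A − A_k‖_(2) = σ_{k+1}; the random matrix and the value of k are arbitrary:

>> A = randn(6,5);
>> [U,S,V] = svd(A); s = diag(S);
>> k = 2;
>> Ak = U(:,1:k)*S(1:k,1:k)*V(:,1:k)';      % rank-k truncation
>> [norm(A - Ak, 2), s(k+1)]                % the two numbers coincide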

The low-rank approximation

In the book you have a proof (not complicated) of the following fact:
If B has rank k < r, then ‖A − B‖_(2) ≥ σ_{k+1},
so we have found the best approximation of A in the spectral norm ‖·‖_(2) by matrices of rank k or less. Now you are going to code this.

What to do

Create a function B=lowrank(A,R) such that, given any matrix A and a number 0 < R < 1 (a rate), it
- computes its SVD (let Matlab do this, but remember that you get the full SVD),
- finds the lowest k such that σ_1^2 + ... + σ_k^2 ≥ R (σ_1^2 + ... + σ_r^2),
- then compresses A to rank k.
For the code, it might be convenient to create a vector with the squares of the diagonal elements of Σ.

Sneak peek

Compute the full SVD of A (U, Σ, V)
D = vector with the squares of the diagonal of Σ
energy = sum of elements of D
c = 0
for k = 1 : length of D
    c = c + D(k)
    if c ≥ R*energy
        leave the loop
    end
end
keep the first k columns of U
keep the first k columns of V
keep the upper k × k block of Σ
build the low-rank approximation
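A minimal MATLAB sketch filling in this pseudocode; treat it as one possible starting point for your own lowrank, not as the official solution:

function B = lowrank(A,R)
% LOWRANK  Compress A to the lowest rank k that keeps a fraction R
% of the total energy sigma_1^2 + ... + sigma_r^2.
[U,S,V] = svd(A);                    % full SVD
D = diag(S).^2;                      % squares of the singular values
energy = sum(D);
c = 0;
for k = 1:length(D)
    c = c + D(k);
    if c >= R*energy                 % lowest k with sigma_1^2+...+sigma_k^2 >= R*energy
        break
    end
end
B = U(:,1:k)*S(1:k,1:k)*V(:,1:k)';   % rank-k approximation of A
end

For instance, B = lowrank(A,0.99) keeps 99% of the energy of A.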