IV. Matrix Approximation using Least-Squares


The SVD and Matrix Approximation

We begin with the following fundamental question. Let A be an $M\times N$ matrix with rank R. What is the closest matrix to A that has rank r? (We will assume that $r < R$, as for $r = R$ the answer is easy, and for $R < r \leq \min(M, N)$ the question is not well-posed.)

As before, we will use the Frobenius norm to measure the distance between two matrices:
$$\|A - B\|_F^2 = \sum_{m=1}^{M}\sum_{n=1}^{N} |A[m,n] - B[m,n]|^2.$$
Recall that $\|X\|_F^2$ is also equal to the sum of the squares of the singular values of X. We can now formulate our problem as
$$\underset{X}{\text{minimize}}\ \|A - X\|_F^2 \quad\text{subject to}\quad \operatorname{rank}(X) = r. \qquad (1)$$
The functional above is standard least-squares, but the constraint set (the set of all $M\times N$ matrices that have rank r) is a complicated entity. Nevertheless, as with many things in this class, the SVD reveals the solution immediately.

Low-rank approximation. Let A be a matrix with SVD
$$A = U\Sigma V^T = \sum_{p=1}^{R} \sigma_p u_p v_p^T.$$

Then (1) is solved simply by truncating the SVD:
$$\hat{X} = \sum_{p=1}^{r} \sigma_p u_p v_p^T = U_r \Sigma_r V_r^T,$$
where $U_r$ contains the first r columns of U, $V_r$ contains the first r columns of V, and $\Sigma_r$ is the first r rows and r columns of $\Sigma$.

The framed result above, known as the Eckart-Young theorem, is an immediate consequence of the following lemma, which we will actually use again later in this set of notes.

Subspace Approximation Lemma. For fixed A with SVD $A = U\Sigma V^T$, the optimization program
$$\underset{Q:\,M\times r,\ \Theta:\,r\times N}{\text{minimize}}\ \|A - Q\Theta\|_F^2 \quad\text{subject to}\quad Q^T Q = I, \qquad (2)$$
has solution $\hat{Q} = U_r$, $\hat{\Theta} = U_r^T A$, where $U_r = [\,u_1\ u_2\ \cdots\ u_r\,]$ contains the first r columns of U.

We prove this lemma in the technical details section at the end of the notes. To see how it implies the Eckart-Young theorem, we can interpret the search over $M\times r$ matrices Q with orthonormal columns as a search over all possible column spaces of dimension r. Then the search over $\Theta$ finds the best linear combinations in that column space to approximate the columns of A.

Since any rank-r matrix can be represented this way, the optimization program (2) is equivalent to (1); if $\hat{Q}, \hat{\Theta}$ solve (2), then $\hat{A} = \hat{Q}\hat{\Theta}$ solves (1). Also note that
$$\hat{\Theta} = U_r^T U\Sigma V^T = [\,I\ \ 0\,]\,\Sigma V^T,$$
where I is the $r\times r$ identity matrix, and 0 is an $r\times(R-r)$ matrix of zeros. This matrix of zeros has the same effect as removing all but the first r terms along the diagonal of $\Sigma$ and all but the first r rows of $V^T$. Thus
$$\hat{Q}\hat{\Theta} = U_r\,[\,I\ \ 0\,]\,\Sigma V^T = U_r \Sigma_r V_r^T.$$

What is the error between A and its best rank-r approximation $\hat{A}$? Well,
$$A - \hat{A} = \sum_{p=r+1}^{R} \sigma_p u_p v_p^T,$$
and so the error matrix has singular values $\sigma_{r+1},\dots,\sigma_R$. Since the Frobenius norm (squared) can be calculated by summing the squares of the singular values,
$$\|A - \hat{A}\|_F^2 = \sum_{p=r+1}^{R} \sigma_p^2.$$

In what follows, we use this low-rank matrix approximation result to develop two fundamental tools: total least-squares, and principal components analysis.
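Before moving on, here is a minimal numpy sketch of the low-rank approximation result; the function name best_rank_r_approx and the random test matrix are illustrative choices, not part of the notes.

```python
import numpy as np

def best_rank_r_approx(A, r):
    """Best rank-r approximation of A in the Frobenius norm,
    obtained by truncating the SVD (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U @ diag(s) @ Vt
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

# Check that the approximation error matches the sum of the tail singular values squared.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
r = 2
A_hat = best_rank_r_approx(A, r)
s = np.linalg.svd(A, compute_uv=False)
print(np.linalg.norm(A - A_hat, "fro")**2)   # equals sum of sigma_p^2 for p > r ...
print(np.sum(s[r:]**2))                      # ... up to floating-point error
```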

Total Least-Squares

Our fundamental approach thus far to solving $y \approx Ax$ is to optimize
$$\underset{x}{\text{minimize}}\ \|y - Ax\|_2^2.$$
Thought of another way, if we can't find an x such that $y = Ax$ exactly, we are looking for the smallest possible perturbation we could add to y so that there is an exact solution. Mathematically, the standard least-squares program above is equivalent to solving
$$\underset{\Delta y,\,x}{\text{minimize}}\ \|\Delta y\|_2^2 \quad\text{subject to}\quad (y + \Delta y) = Ax.$$
This reformulation makes it clear that least-squares implicitly assumes that all of the error (i.e. all of the reasons we can't find an exact solution) lies in the measured data y. But what if the entries of A are also subject to error? That is, how can we account for modeling error as well as measurement error?

Total least-squares (TLS) is a framework for doing exactly this in a principled manner. TLS finds the smallest perturbations $\Delta y, \Delta A$ such that
$$(y + \Delta y) = (A + \Delta A)x$$
has an exact solution. It does this by solving
$$\underset{\Delta A,\,\Delta y,\,x}{\text{minimize}}\ \|\Delta A\|_F^2 + \|\Delta y\|_2^2 \quad\text{subject to}\quad (y + \Delta y) = (A + \Delta A)x.$$

Example: 1D linear regression

Say we are given a set of points $(a_1, y_1), (a_2, y_2), \dots, (a_M, y_M)$.

Suppose that the goal is to find the best line that fits these points. (For simplicity, we will only consider lines that pass through the origin.) That is, we are looking for the slope x such that the $a_m x$ are as close to the $y_m$ as possible.

The standard least-squares framework models this problem as follows. We observe $y_m = a_m x + \text{noise}$, or in matrix form,
$$y = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_M \end{bmatrix} x + \text{noise}.$$
The solution is of course
$$\hat{x} = (A^T A)^{-1} A^T y = \frac{\sum_{m=1}^{M} a_m y_m}{\sum_{m=1}^{M} a_m^2}.$$
This solution minimizes the size of the residual
$$\|r\|_2^2 = \|y - Ax\|_2^2 = \sum_{m=1}^{M} |y_m - a_m x|^2.$$
Geometrically, we are choosing the slope that minimizes the sum of the squares of the vertical distances of the points to the line we choose to approximate them:

[Figure: points in the plane with the vertical distances to the fitted line.]
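As a concrete illustration of the closed-form least-squares slope above, here is a short numpy sketch; the true slope 0.7 and the noise level are made up for the example.

```python
import numpy as np

# Least-squares slope of a line through the origin: x_hat = sum(a*y) / sum(a^2).
rng = np.random.default_rng(1)
a = np.linspace(1.0, 5.0, 50)
y = 0.7 * a + 0.1 * rng.standard_normal(a.size)   # noisy observations of y = 0.7 a

x_hat = np.sum(a * y) / np.sum(a * a)
print(x_hat)                                      # close to the true slope 0.7

# The same estimate via the general formula (A^T A)^{-1} A^T y:
A = a.reshape(-1, 1)
print(np.linalg.lstsq(A, y, rcond=None)[0][0])
```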

In contrast, the TLS estimate (which we will see how to compute below) minimizes the distance in the plane from the points to the line we choose:

[Figure: the same points with their distances in the plane to the fitted line.]

This distance includes changes in both the $a_m$ and the $y_m$.

Solving TLS

We will assume that A is an $M\times N$ matrix, with $M > N$, and $\operatorname{rank}(A) = N$ (i.e. A is overdetermined with full column rank). The problem only really makes sense if $\operatorname{rank}(A) < M$, otherwise there is always an exact solution. By being careful with the details, the method we present here can also be extended to the case where $\operatorname{rank}(A) < N < M$, but I will leave it to you to fill in those gaps.

We want to find $\Delta A, \Delta y, x$ such that
$$(y + \Delta y) = (A + \Delta A)x,$$
for $\Delta y, \Delta A$ of minimal size. Rewrite this as
$$(A + \Delta A)x - (y + \Delta y) = 0$$
$$[\,A + \Delta A \ \ \ y + \Delta y\,]\begin{bmatrix} x \\ -1 \end{bmatrix} = 0$$
$$(C + \Delta)\begin{bmatrix} x \\ -1 \end{bmatrix} = 0,$$
where
$$C = [\,A\ \ y\,], \qquad \Delta = [\,\Delta A\ \ \Delta y\,].$$
Note that both C and $\Delta$ are $M\times(N+1)$ matrices.

The progression of equations above says that we are looking for a $\Delta$ (of minimal size) such that there is a vector $\begin{bmatrix} x \\ -1 \end{bmatrix}$ in the nullspace of $C + \Delta$. Since
$$v \in \operatorname{Null}(C + \Delta) \ \Rightarrow\ \alpha v \in \operatorname{Null}(C + \Delta) \ \text{for all } \alpha \in \mathbb{R},$$
and x is arbitrary, we are really just asking that $C + \Delta$ has a nullspace; as long as there is at least one vector in the nullspace whose last entry is nonzero, we can find a vector of the required form just by normalizing.

In short, this means that our task is to find $\Delta$ such that the $M\times(N+1)$ matrix $C + \Delta$ is rank deficient, that is, $\operatorname{rank}(C + \Delta) < N + 1$. Put another way, we want to solve the optimization program
$$\underset{\Delta}{\text{minimize}}\ \|\Delta\|_F^2 \quad\text{subject to}\quad \operatorname{rank}(C + \Delta) = N.$$
Making the substitution $X = C + \Delta$, this is equivalent to solving
$$\underset{X}{\text{minimize}}\ \|C - X\|_F^2 \quad\text{subject to}\quad \operatorname{rank}(X) = N,$$
and then taking $\hat{\Delta} = \hat{X} - C$. This is a low-rank approximation problem (or at least a lower-rank approximation problem), and we now know exactly how to solve it. Take the SVD of C,
$$C = W\Gamma Z^T = \sum_{n=1}^{N+1} \gamma_n w_n z_n^T,$$
and create $\hat{X}$ by leaving out the last term in the sum above (if C has fewer than N+1 non-zero singular values, then it is already rank deficient, and we can take $\hat{X} = C$, $\hat{\Delta} = 0$):
$$\hat{X} = \sum_{n=1}^{N} \gamma_n w_n z_n^T.$$
Then
$$\hat{\Delta} = \hat{X} - C = -\gamma_{N+1}\, w_{N+1} z_{N+1}^T.$$

Now we are ready to construct the actual estimate $\hat{x}$. Recall that we want a vector such that
$$(C + \hat{\Delta})\begin{bmatrix} x \\ -1 \end{bmatrix} = 0, \quad\text{meaning}\quad \hat{X}\begin{bmatrix} x \\ -1 \end{bmatrix} = 0.$$
The null space of $\hat{X}$ is (by construction) simply the span of $z_{N+1}$, meaning we need to find a scalar $\alpha$ such that
$$\begin{bmatrix} x \\ -1 \end{bmatrix} = \alpha\, z_{N+1}.$$
Thus we can take
$$\hat{x}_{\text{TLS}} = -\frac{1}{z_{N+1}[N+1]} \begin{bmatrix} z_{N+1}[1] \\ z_{N+1}[2] \\ \vdots \\ z_{N+1}[N] \end{bmatrix}.$$
If it happens that $z_{N+1}[N+1] = 0$, this means $\Delta y = 0$, and we would need an x such that $(A + \Delta A)x = y$. Such an x may or may not exist (and probably doesn't), so in this case there is no TLS solution.
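The construction above translates directly into a few lines of numpy. This is a sketch under the assumptions of this section (smallest singular value unique, last entry of $z_{N+1}$ nonzero); the function name tls_solve and the synthetic test data are illustrative, not from the notes.

```python
import numpy as np

def tls_solve(A, y):
    """Total least-squares estimate: take the SVD of C = [A y] and build x_hat
    from the right singular vector z_{N+1} for the smallest singular value."""
    C = np.column_stack([A, y])
    _, _, Zt = np.linalg.svd(C)     # rows of Zt are the right singular vectors z_n^T
    z = Zt[-1, :]                   # z_{N+1}: smallest singular value
    if np.isclose(z[-1], 0.0):
        raise ValueError("z_{N+1}[N+1] = 0: no TLS solution of the required form")
    return -z[:-1] / z[-1]

# Compare with ordinary least squares when both A and y are noisy.
rng = np.random.default_rng(2)
M, N = 100, 2
x_true = np.array([1.0, -0.5])
A_clean = rng.standard_normal((M, N))
A = A_clean + 0.05 * rng.standard_normal((M, N))       # modeling error in A
y = A_clean @ x_true + 0.05 * rng.standard_normal(M)   # measurement error in y

print(np.linalg.lstsq(A, y, rcond=None)[0])   # ordinary least squares
print(tls_solve(A, y))                        # total least squares
```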

In the special case where the smallest singular value of $C = [\,A\ \ y\,]$ is not unique, i.e.
$$\gamma_1 \geq \gamma_2 \geq \cdots \geq \gamma_q > \gamma_{q+1} = \gamma_{q+2} = \cdots = \gamma_{N+1}$$
for some $q < N$, then the TLS solution may not be unique. We take
$$\tilde{Z} = [\,z_{q+1}\ z_{q+2}\ \cdots\ z_{N+1}\,],$$
and try to find a vector in the span that has the right form; any vector x such that
$$\begin{bmatrix} x \\ -1 \end{bmatrix} \in \operatorname{Span}(\{z_{q+1},\dots,z_{N+1}\})$$
is equally good. All we need is a $\beta$ such that the last entry of $\tilde{Z}\beta$ is equal to $-1$.

Principal Components Analysis

Principal Components Analysis (PCA) is a standard technique for dimensionality reduction of data sets. It is a way to automatically find simplifying linear relationships in the data. It is used everywhere in signal processing, machine learning, and statistics, with applications including data compression, pattern recognition, and factor analysis.

There are two ways to think about PCA. The first is statistical: we are trying to find a transform that is carefully tuned to the (second-order) statistics of the data. The second is geometrical: given a set of vectors, we are trying to find a subspace of a certain dimension that comes closest to containing this set.

The Karhunen-Loeve Transform

The Karhunen-Loeve (KL) transform is an orthobasis that is tailored to the statistics of a class of random vectors.

Suppose that $x \in \mathbb{R}^D$ is random and has mean and covariance
$$\operatorname{E}[x] = 0, \qquad \operatorname{E}[xx^T] = R.$$
(Modifying this discussion to vectors that are not zero-mean is straightforward.) Then the KL transform (or the KL basis) is simply the eigenvector basis V of $R = V\Lambda V^T$:
$$x = \sum_{n=1}^{D} \alpha_n v_n, \qquad \alpha_n = \langle x, v_n\rangle.$$
This transform has the property that if we want to truncate the sum above (i.e. compress the vector by using fewer than D numbers to represent it), we get an error that is optimal in the mean-square-error sense.

Let's set this problem up carefully. We want to find a subspace T of dimension K such that when we project x onto T, we lose as little of x (in expectation) as possible. We want to solve
$$\underset{T}{\text{minimize}}\ \operatorname{E}\Big[\min_{t\in T} \|x - t\|_2^2\Big] \quad\text{subject to}\quad \dim(T) = K.$$
For a fixed T, we know how to solve the inner optimization program if we have an orthobasis, so we can re-write the above as a search over sets of K orthonormal vectors in $\mathbb{R}^D$:
$$\underset{Q:\,D\times K}{\text{minimize}}\ \operatorname{E}\big[\|x - QQ^T x\|_2^2\big] \quad\text{subject to}\quad Q^T Q = I.$$

Now notice that
$$\operatorname{E}\big[\|x - QQ^T x\|_2^2\big] = \operatorname{E}\big[\|(I - QQ^T)x\|_2^2\big]$$
$$= \operatorname{E}\big[\operatorname{trace}\big((I - QQ^T)\,xx^T\,(I - QQ^T)\big)\big]$$
$$= \operatorname{trace}\big((I - QQ^T)\,\operatorname{E}[xx^T]\,(I - QQ^T)\big)$$
$$= \operatorname{trace}\big((I - QQ^T)\,R\,(I - QQ^T)\big),$$
where in the second step above we have used the fact that for any vector v, $\|v\|_2^2 = \operatorname{trace}(vv^T)$. Now notice that
$$\operatorname{trace}\big((I - QQ^T)R(I - QQ^T)\big) = \operatorname{trace}(R) - 2\operatorname{trace}(QQ^T R) + \operatorname{trace}(QQ^T R\, QQ^T).$$
We now apply three facts: $\operatorname{trace}(R)$ does not depend on Q; $\operatorname{trace}(QQ^T R\, QQ^T) = \operatorname{trace}(Q^T R Q\, Q^T Q) = \operatorname{trace}(Q^T R Q)$, since $Q^T Q = I$; and $\operatorname{trace}(QQ^T R) = \operatorname{trace}(Q^T R Q)$. The objective thus reduces to $\operatorname{trace}(R) - \operatorname{trace}(Q^T R Q)$, so minimizing it is equivalent to the program
$$\underset{W:\,D\times K}{\text{maximize}}\ \operatorname{trace}(W^T R W) \quad\text{subject to}\quad W^T W = I.$$
In the Technical Details section below, we show that this expression is maximized by taking $Q = [\,v_1\ v_2\ \cdots\ v_K\,]$, where the $v_k$ are the K eigenvectors of R corresponding to the K largest eigenvalues.

Moral: The best (in terms of mean-squared error) way to get a K-term approximation of random data is to transform into the orthobasis formed by the eigenvectors of the covariance matrix, and then truncate the coefficients to K terms.

This set of eigenvectors V is called the KL transform. In some sense, $v_1,\dots,v_K$ are the K most important features of x; they are completely determined by the covariance matrix R.

Examples in $\mathbb{R}^2$:

[Figures showing examples in $\mathbb{R}^2$ omitted.]
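As a small numerical illustration of the statement above (not one of the original examples), the sketch below builds a covariance matrix, forms the KL basis from its eigenvectors, and keeps the top-K coefficients of one sample; the dimensions and random seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
D, K = 5, 2

B = rng.standard_normal((D, D))
R = B @ B.T                                  # a positive definite covariance matrix
lam, V = np.linalg.eigh(R)                   # eigenvalues ascending, orthonormal columns in V
order = np.argsort(lam)[::-1]                # reorder so eigenvalues are descending
lam, V = lam[order], V[:, order]

x = np.linalg.cholesky(R) @ rng.standard_normal(D)   # one sample with covariance R

alpha = V.T @ x                              # KL coefficients alpha_n = <x, v_n>
x_K = V[:, :K] @ alpha[:K]                   # best K-term approximation in the KL basis
print(np.linalg.norm(x - x_K)**2)            # squared error; its expectation is sum(lam[K:])
```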


PCA on observed data

A very similar procedure to the above solves a common geometrical problem. Suppose that I have a bunch of data points $x_1, x_2,\dots,x_N \in \mathbb{R}^D$, and I want to find the K-dimensional affine space (subspace plus offset) that comes closest to containing them.

Example. We don't even need to think of the data as random here; they are just points that we want to fit with a hyperplane. Here is a picture from Chapter 14 of Hastie, Tibshirani, and Friedman's Elements of Statistical Learning:

[Figure from Elements of Statistical Learning omitted.]

Our goal is to find an offset $\mu \in \mathbb{R}^D$ and a matrix Q with orthonormal columns such that
$$x_n \approx \mu + Q\theta_n \quad\text{for all } n = 1,\dots,N,$$
for some $\theta_n \in \mathbb{R}^K$.

We cast this as the following optimization problem. Given $x_1,\dots,x_N$, solve
$$\underset{\mu,\,Q,\,\{\theta_n\}}{\text{minimize}}\ \sum_{n=1}^{N} \|x_n - \mu - Q\theta_n\|_2^2 \quad\text{subject to}\quad Q^T Q = I.$$
If we fix $\mu$ and Q, then by arguments very similar to those we have made before, the optimal $\theta_n$ are given by
$$\hat{\theta}_n = Q^T(x_n - \mu).$$
This means our objective reduces to solving
$$\underset{\mu,\,Q}{\text{minimize}}\ \sum_{n=1}^{N} \|(I - QQ^T)(x_n - \mu)\|_2^2 \quad\text{subject to}\quad Q^T Q = I.$$
The offset $\mu$ is unconstrained; if we again fix Q, we can solve for the optimal $\mu$ by taking a gradient and setting it equal to zero:
$$\nabla_\mu\left(\sum_{n=1}^{N} \|(I - QQ^T)(x_n - \mu)\|_2^2\right) = -2\sum_{n=1}^{N}(I - QQ^T)(x_n - \mu) = -2(I - QQ^T)\left(\sum_{n=1}^{N} x_n - N\mu\right).$$
We can make the gradient zero by taking the offset $\mu$ to be the sample mean (the average of all the observed vectors):
$$\hat{\mu} = \frac{1}{N}\sum_{n=1}^{N} x_n.$$

All that remains is solving for Q. We have
$$\underset{Q:\,D\times K}{\text{minimize}}\ \sum_{n=1}^{N} \|(I - QQ^T)(x_n - \hat{\mu})\|_2^2 \quad\text{subject to}\quad Q^T Q = I.$$
Again, using an argument that perfectly parallels that in the Technical Details section below, this program is solved by forming
$$S = \sum_{n=1}^{N} (x_n - \hat{\mu})(x_n - \hat{\mu})^T,$$
taking an eigenvalue decomposition $S = W\Lambda W^T$, and then taking
$$Q = [\,w_1\ w_2\ \cdots\ w_K\,],$$
where $w_1,\dots,w_K$ are the eigenvectors of S corresponding to the K largest eigenvalues.

So even though we posed this problem as being purely geometrical, the answer parallels the statistical KL transform: we simply replace the true covariance matrix R with the sample covariance $N^{-1}S$.
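Putting the last two steps together, here is a short sketch of PCA on observed data points; the helper name pca_fit and the synthetic data (points near a random 2-dimensional affine space) are illustrative assumptions.

```python
import numpy as np

def pca_fit(X, K):
    """Fit a K-dimensional affine space to the rows of X (each row is one x_n):
    offset = sample mean, Q = top-K eigenvectors of the scatter matrix S."""
    mu = X.mean(axis=0)
    Xc = X - mu
    S = Xc.T @ Xc                              # S = sum_n (x_n - mu)(x_n - mu)^T
    lam, W = np.linalg.eigh(S)                 # eigenvalues in ascending order
    Q = W[:, np.argsort(lam)[::-1][:K]]        # eigenvectors for the K largest eigenvalues
    return mu, Q

rng = np.random.default_rng(4)
N, D, K = 200, 6, 2
basis = np.linalg.qr(rng.standard_normal((D, K)))[0]   # a random K-dimensional subspace
X = rng.standard_normal((N, K)) @ basis.T + 1.0        # shift by an all-ones offset
X += 0.01 * rng.standard_normal((N, D))                # small noise off the affine space

mu, Q = pca_fit(X, K)
resid = (X - mu) - (X - mu) @ Q @ Q.T          # residual after projecting onto the fit
print(np.linalg.norm(resid, "fro"))            # small: the points lie near the affine space
```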

Technical Details: Subspace Approx. Lemma

We prove the subspace approximation lemma stated earlier in these notes. First, with Q fixed, we can break the optimization over $\Theta$ into a series of least-squares problems. Let $a_1,\dots,a_N$ be the columns of A, and $\theta_1,\dots,\theta_N$ be the columns of $\Theta$. Then
$$\underset{\Theta}{\text{minimize}}\ \|A - Q\Theta\|_F^2$$
is exactly the same as
$$\underset{\theta_1,\dots,\theta_N}{\text{minimize}}\ \sum_{n=1}^{N} \|a_n - Q\theta_n\|_2^2.$$
The above is our classic closest point problem, and is optimized by taking $\theta_n = Q^T a_n$ (since the columns of Q are orthonormal). Thus we can write the original problem (2) as
$$\underset{Q:\,M\times r}{\text{minimize}}\ \sum_{n=1}^{N} \|a_n - QQ^T a_n\|_2^2 \quad\text{subject to}\quad Q^T Q = I,$$
and then take $\hat{\Theta} = \hat{Q}^T A$. Expanding the functional and using the fact that $(I - QQ^T)^2 = (I - QQ^T)$, we have
$$\sum_{n=1}^{N} \|a_n - QQ^T a_n\|_2^2 = \sum_{n=1}^{N} a_n^T(I - QQ^T)a_n = \sum_{n=1}^{N} \|a_n\|_2^2 - \sum_{n=1}^{N} a_n^T QQ^T a_n.$$

Since the first term does not depend on Q, our optimization program is equivalent to
$$\underset{Q:\,M\times r}{\text{maximize}}\ \sum_{n=1}^{N} a_n^T QQ^T a_n \quad\text{subject to}\quad Q^T Q = I.$$
Now recall that for any vector v, $\langle v, v\rangle = \operatorname{trace}(vv^T)$. Thus
$$\sum_{n=1}^{N} a_n^T QQ^T a_n = \sum_{n=1}^{N}\operatorname{trace}(Q^T a_n a_n^T Q) = \operatorname{trace}\left(Q^T\left(\sum_{n=1}^{N} a_n a_n^T\right)Q\right) = \operatorname{trace}\big(Q^T(AA^T)Q\big).$$
The matrix $AA^T$ has eigenvalue decomposition $AA^T = U\Sigma^2 U^T$, where U and $\Sigma$ come from the SVD of A (we will take U to be $M\times M$, possibly adding zeros down the diagonal of $\Sigma^2$). Now
$$\operatorname{trace}\big(Q^T(AA^T)Q\big) = \operatorname{trace}\big(Q^T U\Sigma^2 U^T Q\big) = \operatorname{trace}\big(W^T\Sigma^2 W\big),$$
where $W = U^T Q$. Notice that W also has orthonormal columns, as $W^T W = Q^T UU^T Q = Q^T Q = I$. Thus our optimization program has become
$$\underset{W:\,M\times r}{\text{maximize}}\ \operatorname{trace}(W^T\Sigma^2 W) \quad\text{subject to}\quad W^T W = I.$$

After we solve this, we can take any $\hat{Q}$ such that $\hat{W} = U^T\hat{Q}$. This last optimization program is equivalent to a simple linear program that is solvable by inspection. Let $w_1,\dots,w_r$ be the columns of W. Then
$$\operatorname{trace}(W^T\Sigma^2 W) = \sum_{p=1}^{r} w_p^T\Sigma^2 w_p = \sum_{p=1}^{r}\sum_{m=1}^{M} w_p[m]^2\,\sigma_m^2 = \sum_{m=1}^{M} h[m]\,\sigma_m^2, \quad\text{where}\quad h[m] = \sum_{p=1}^{r} w_p[m]^2$$
is the sum of the squares of the entries in the m-th row of W. Since the sum of the squares of every column of W is one, the sum of the squares of all the entries in W must be r, and so
$$\sum_{m=1}^{M} h[m] = r.$$
It is clear that $h[m]$ is non-negative, but it is also true that $h[m] \leq 1$. Here is why: since the columns of W are orthonormal, they can be considered as part of an orthonormal basis for $\mathbb{R}^M$. That is, there is an $M\times(M-r)$ matrix $W_0$ such that the $M\times M$ matrix $[\,W\ W_0\,]$ has both orthonormal columns and orthonormal rows; thus the sum of the squares of each row is equal to one, and so the sum of the squares of the first r entries of a row cannot be larger than one.

22 Thus the maximum value trace(w T Σ 2 W ) can take is given by the linear program maximize h R M M h[m]σ 2 m m=1 subject to M h[m] = r, 0 h[m] 1. m=1 We can intuit the answer to this program. Since all of the σm 2 and all of the h[m] are positive, we want to have as much weight as possible assigned to the largest singular values. Since the weights are constrained to be less than 1, this simply means we max out the first r terms; the solution to the program above is ĥ[m] = { 1, m = 1,..., r 0, m = r + 1,..., M. This means that the sum of the squares of the first r rows in Ŵ are equal to one, while the rest are zero. There might be many such matrices that fit this bill, but one of them is [ I Ŵ =, 0] where above, I is the r r identity matrix, and 0 is a (M r) r matrix of all zeros. It is easy to see that choosing ˆQ = [ u 1 u 2 u r ] satisfies [ ] I U T ˆQ =. 0 21
