IV. Matrix Approximation using Least-Squares
- Corey Green
The SVD and Matrix Approximation

We begin with the following fundamental question. Let $A$ be an $M \times N$ matrix with rank $R$. What is the closest matrix to $A$ that has rank $r$? (We will assume that $r < R$, as for $r = R$ the answer is easy, and for $R < r \le \min(M, N)$ the question is not well-posed.) As before, we will use the Frobenius norm to measure the distance between two matrices:

$$\|A - B\|_F^2 = \sum_{m=1}^{M} \sum_{n=1}^{N} |A[m,n] - B[m,n]|^2.$$

Recall that $\|X\|_F^2$ is also equal to the sum of the squares of the singular values of $X$. We can now formulate our problem as

$$\underset{X}{\text{minimize}} \ \|A - X\|_F^2 \quad \text{subject to} \quad \operatorname{rank}(X) = r. \tag{1}$$

The functional above is standard least-squares, but the constraint set (the set of all $M \times N$ matrices that have a rank of $r$) is a complicated entity. Nevertheless, as with many things in this class, the SVD reveals the solution immediately.

Low-rank approximation. Let $A$ be a matrix with SVD

$$A = U \Sigma V^T = \sum_{p=1}^{R} \sigma_p u_p v_p^T.$$
Then (1) is solved simply by truncating the SVD:

$$\hat{X} = \sum_{p=1}^{r} \sigma_p u_p v_p^T = U_r \Sigma_r V_r^T,$$

where $U_r$ contains the first $r$ columns of $U$, $V_r$ contains the first $r$ columns of $V$, and $\Sigma_r$ is the first $r$ columns and $r$ rows of $\Sigma$.

The framed result above, known as the Eckart-Young theorem, is an immediate consequence of the following lemma, which we will actually use again later in this set of notes.

Subspace Approximation Lemma. For fixed $A$ with SVD $A = U \Sigma V^T$, the optimization program

$$\underset{Q:\, M \times r,\ \Theta:\, r \times N}{\text{minimize}} \ \|A - Q\Theta\|_F^2 \quad \text{subject to} \quad Q^T Q = I, \tag{2}$$

has solution

$$\hat{Q} = U_r, \qquad \hat{\Theta} = U_r^T A,$$

where $U_r = [\, u_1 \ u_2 \ \cdots \ u_r \,]$ contains the first $r$ columns of $U$.

We prove this lemma in the technical details section at the end of the notes. To see how it implies the Eckart-Young theorem, we can interpret the search over $M \times r$ matrices $Q$ with orthonormal columns as a search over all possible column spaces of dimension $r$. Then the search over $\Theta$ finds the best linear combinations in a column space
to approximate the columns of $A$. Since any rank-$r$ matrix can be represented this way, the optimization program (2) is equivalent to (1); if $\hat{Q}, \hat{\Theta}$ solve (2), then $\hat{A} = \hat{Q}\hat{\Theta}$ solves (1).

Also note that

$$\hat{\Theta} = U_r^T U \Sigma V^T = [\, I \ \ 0 \,]\, \Sigma V^T,$$

where $I$ is the $r \times r$ identity matrix, and $0$ is an $r \times (R - r)$ matrix of zeros. This matrix of all zeros has the same effect as removing all but the first $r$ terms along the diagonal of $\Sigma$ and all but the first $r$ rows of $V^T$. Thus

$$\hat{Q}\hat{\Theta} = U_r [\, I \ \ 0 \,]\, \Sigma V^T = U_r \Sigma_r V_r^T.$$

What is the error between $A$ and its best rank-$r$ approximation $\hat{A}$? Well,

$$A - \hat{A} = \sum_{p=r+1}^{R} \sigma_p u_p v_p^T,$$

and so the error matrix has singular values $\sigma_{r+1}, \ldots, \sigma_R$. Since the Frobenius norm (squared) can be calculated by summing the squares of the singular values,

$$\|A - \hat{A}\|_F^2 = \sum_{p=r+1}^{R} \sigma_p^2.$$

In what follows, we use this low-rank matrix approximation result to develop two fundamental tools: total least-squares, and principal components analysis.
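As a quick numerical illustration of the truncation result and the error formula above, here is a minimal NumPy sketch. The helper name `best_rank_r`, the random test matrix, and the seed are illustrative assumptions, not part of the notes.

```python
import numpy as np

def best_rank_r(A, r):
    """Truncate the SVD of A to its r leading terms (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # U_r Sigma_r V_r^T: keep the first r singular triples
    A_hat = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
    return A_hat, s

rng = np.random.default_rng(0)       # arbitrary seed
A = rng.standard_normal((6, 4))      # made-up test matrix
r = 2
A_hat, s = best_rank_r(A, r)

# Squared Frobenius error versus the sum of the discarded sigma_p^2
err_sq = np.linalg.norm(A - A_hat, "fro") ** 2
tail_sq = np.sum(s[r:] ** 2)

# Any other rank-r matrix should do no better; B is one arbitrary competitor
B = rng.standard_normal((6, r)) @ rng.standard_normal((r, 4))
err_B_sq = np.linalg.norm(A - B, "fro") ** 2
```

Under these assumptions `err_sq` and `tail_sq` should agree to rounding error, and `err_B_sq` should be at least as large as `err_sq`.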
Total Least-Squares

Our fundamental approach thus far to solving $y \approx Ax$ is to optimize

$$\underset{x}{\text{minimize}} \ \|y - Ax\|_2^2.$$

Thought of another way, if we can't find an $x$ such that $y = Ax$ exactly, we are looking for the smallest possible perturbation we could add to $y$ so that there is an exact solution. Mathematically, the standard least-squares program above is equivalent to solving

$$\underset{\Delta y,\, x}{\text{minimize}} \ \|\Delta y\|_2^2 \quad \text{subject to} \quad (y + \Delta y) = Ax.$$

This reformulation makes it clear that least-squares implicitly assumes that all of the error (i.e. all of the reasons we can't find an exact solution) lies in the measured data $y$. But what if the entries of $A$ are also subject to error? That is, how can we account for modeling error as well as measurement error?

Total least-squares (TLS) is a framework for doing exactly this in a principled manner. TLS finds the smallest perturbations $\Delta y, \Delta A$ such that

$$(y + \Delta y) = (A + \Delta A)x$$

has an exact solution. It does this by solving

$$\underset{\Delta A,\, \Delta y,\, x}{\text{minimize}} \ \|\Delta A\|_F^2 + \|\Delta y\|_2^2 \quad \text{subject to} \quad (y + \Delta y) = (A + \Delta A)x.$$

Example: 1D linear regression

Say we are given a set of points $(a_1, y_1), (a_2, y_2), \ldots, (a_M, y_M)$.
Suppose that the goal is to find the best line that fits these points. (For simplicity, we will only consider lines that pass through the origin.) That is, we are looking for the slope $x$ such that the $a_m x$ are as close to the $y_m$ as possible.

The standard least-squares framework models this problem as follows. We observe

$$y_m = a_m x + \text{noise},$$

or in matrix form,

$$y = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_M \end{bmatrix} x + \text{noise}.$$

The solution is of course

$$\hat{x} = (A^T A)^{-1} A^T y = \frac{\sum_{m=1}^{M} a_m y_m}{\sum_{m=1}^{M} a_m^2}.$$

This solution minimizes the size of the residual

$$\|r\|_2^2 = \|y - Ax\|_2^2 = \sum_{m=1}^{M} |y_m - a_m x|^2.$$

Geometrically, we are choosing the slope that minimizes the sum of the squares of the vertical distances of the points to the line we choose to approximate them.
In contrast, the TLS estimate (which we will see how to compute below) minimizes the distance in the plane of the points to the line we choose. This distance includes changes in both the $a_m$ and $y_m$.
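The closed-form least-squares slope and the two notions of distance contrasted above can be sketched as follows. The function names and the synthetic data are assumptions made for illustration; the perpendicular distance uses the standard formula $|y_m - a_m x| / \sqrt{1 + x^2}$ for a line through the origin with slope $x$.

```python
import numpy as np

def ls_slope(a, y):
    """Closed-form least-squares slope for a line through the origin."""
    return np.dot(a, y) / np.dot(a, a)

def vertical_sq(a, y, x):
    """Sum of squared vertical distances from the points to the line y = a*x."""
    return np.sum((y - a * x) ** 2)

def perpendicular_sq(a, y, x):
    """Sum of squared distances in the plane from the points to the line y = a*x."""
    return np.sum((y - a * x) ** 2) / (1.0 + x ** 2)

# Made-up data: a noisy line with slope 0.7 (illustrative only)
rng = np.random.default_rng(1)
a = rng.uniform(-1.0, 1.0, 50)
y = 0.7 * a + 0.1 * rng.standard_normal(50)

x_ls = ls_slope(a, y)
```

Nudging `x_ls` in either direction should not decrease `vertical_sq`, whereas `perpendicular_sq` is in general minimized at a different slope, which is the one TLS looks for.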
Solving TLS

We will assume that $A$ is an $M \times N$ matrix, with $M > N$, and $\operatorname{rank}(A) = N$ (i.e. $A$ is overdetermined with full column rank). The problem only really makes sense if $\operatorname{rank}(A) < M$, otherwise there is always an exact solution. By being careful with the details, the method we present here can also be extended to the case where $\operatorname{rank}(A) < N < M$, but I will leave it to you to fill in those gaps.

We want to find $\Delta A, \Delta y, x$ such that

$$(y + \Delta y) = (A + \Delta A)x,$$

for $\Delta y, \Delta A$ of minimal size. Rewrite this as

$$(A + \Delta A)x - (y + \Delta y) = 0$$

$$[\, A + \Delta A \ \ \ y + \Delta y \,] \begin{bmatrix} x \\ -1 \end{bmatrix} = 0$$

$$(C + \Delta) \begin{bmatrix} x \\ -1 \end{bmatrix} = 0,$$

where

$$C = [\, A \ \ y \,], \qquad \Delta = [\, \Delta A \ \ \Delta y \,].$$

Note that both $C$ and $\Delta$ are $M \times (N+1)$ matrices. The result of the progression of equations above says that we are looking for a $\Delta$ (of minimal size) such that there is a vector $\begin{bmatrix} x \\ -1 \end{bmatrix}$ in the nullspace of $C + \Delta$. Since

$$v \in \operatorname{Null}(C + \Delta) \ \Rightarrow \ \alpha v \in \operatorname{Null}(C + \Delta) \quad \text{for all } \alpha \in \mathbb{R},$$

and $x$ is arbitrary, we are really just asking that $C + \Delta$ has a nontrivial nullspace; as long as there is at least one vector in the nullspace
whose last entry is nonzero, we can find a vector of the required form just by normalizing.

In short, this means that our task is to find $\Delta$ such that the $M \times (N+1)$ matrix $C + \Delta$ is rank deficient, that is, $\operatorname{rank}(C + \Delta) < N + 1$. Put another way, we want to solve the optimization program

$$\underset{\Delta}{\text{minimize}} \ \|\Delta\|_F^2 \quad \text{subject to} \quad \operatorname{rank}(C + \Delta) = N.$$

Making the substitution $X = C + \Delta$, this is equivalent to solving

$$\underset{X}{\text{minimize}} \ \|C - X\|_F^2 \quad \text{subject to} \quad \operatorname{rank}(X) = N,$$

and then taking $\hat{\Delta} = \hat{X} - C$. This is a low-rank approximation problem (or at least a lower-rank approximation problem), and we now know exactly how to solve it. Take the SVD of $C$,

$$C = W \Gamma Z^T = \sum_{n=1}^{N+1} \gamma_n w_n z_n^T,$$

and create $\hat{X}$ by leaving out the last term in the sum above:

$$\hat{X} = \sum_{n=1}^{N} \gamma_n w_n z_n^T.$$

(If $C$ has fewer than $N + 1$ non-zero singular values, then it is already rank deficient, and we can take $\hat{X} = C$, so that $\hat{\Delta} = 0$.) Then

$$\hat{\Delta} = \hat{X} - C = -\gamma_{N+1} w_{N+1} z_{N+1}^T.$$
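The construction so far (stack $C = [\,A \ \ y\,]$, drop the last SVD term, then normalize a nullspace vector so that its last entry is $-1$) can be sketched in NumPy as below. The function name `tls`, the synthetic data, and the tolerance used to detect a zero last entry are assumptions for illustration only.

```python
import numpy as np

def tls(A, y):
    """Total least-squares sketch: returns (x, Delta) with (C + Delta) [x; -1] = 0."""
    M, N = A.shape
    C = np.column_stack([A, y])                      # C = [A y], M x (N+1)
    W, gamma, Zt = np.linalg.svd(C, full_matrices=False)
    z = Zt[-1, :]                                    # z_{N+1}: smallest singular value
    Delta = -gamma[-1] * np.outer(W[:, -1], z)       # minimal perturbation of C
    if np.isclose(z[-1], 0.0):                       # tolerance is an arbitrary choice
        raise ValueError("last entry of z_{N+1} is zero; cannot normalize")
    x = -z[:N] / z[-1]                               # scale so the last entry is -1
    return x, Delta

# Made-up problem: errors in both A and y
rng = np.random.default_rng(5)
x_true = np.array([1.0, -2.0, 0.5])
A_clean = rng.standard_normal((20, 3))
A = A_clean + 0.01 * rng.standard_normal((20, 3))
y = A_clean @ x_true + 0.01 * rng.standard_normal(20)

x_tls, Delta = tls(A, y)
C = np.column_stack([A, y])
residual = (C + Delta) @ np.append(x_tls, -1.0)      # should vanish up to rounding
```

The Frobenius norm of `Delta` should equal the smallest singular value of `C`, matching the expression for $\hat{\Delta}$ above.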
Now we are ready to construct the actual estimate $\hat{x}$. Recall that we want a vector such that

$$(C + \hat{\Delta}) \begin{bmatrix} x \\ -1 \end{bmatrix} = 0, \quad \text{meaning} \quad \hat{X} \begin{bmatrix} x \\ -1 \end{bmatrix} = 0.$$

The null space of $\hat{X}$ is (by construction) simply the span of $z_{N+1}$, meaning we need to find a scalar $\alpha$ such that

$$\begin{bmatrix} x \\ -1 \end{bmatrix} = \alpha\, z_{N+1}.$$

Thus we can take

$$\hat{x}_{\mathrm{TLS}} = \frac{-1}{z_{N+1}[N+1]} \begin{bmatrix} z_{N+1}[1] \\ z_{N+1}[2] \\ \vdots \\ z_{N+1}[N] \end{bmatrix}.$$

If it happens that $z_{N+1}[N+1] = 0$, this means $\Delta y = 0$, and we would need an $x$ such that $(A + \Delta A)x = y$. Such an $x$ may or may not exist (and probably doesn't), so in this case there is no TLS solution.

In the special case where the smallest singular value of $C = [\, A \ \ y \,]$ is not unique, i.e.

$$\gamma_1 \ge \gamma_2 \ge \cdots \ge \gamma_q > \gamma_{q+1} = \gamma_{q+2} = \cdots = \gamma_{N+1},$$

for some $q < N$, then the TLS solution may not be unique. We take

$$Z = [\, z_{q+1} \ \ z_{q+2} \ \ \cdots \ \ z_{N+1} \,],$$
and try to find a vector in the span that has the right form; any vector $x$ such that

$$\begin{bmatrix} x \\ -1 \end{bmatrix} \in \operatorname{Span}(\{z_{q+1}, \ldots, z_{N+1}\})$$

is equally good. All we need is a $\beta$ such that the last entry of $Z\beta$ is equal to $-1$.

Principal Components Analysis

Principal Components Analysis (PCA) is a standard technique for dimensionality reduction of data sets. It is a way to automatically find simplifying linear relationships in the data. It is used everywhere in signal processing, machine learning, and statistics, with applications including data compression, pattern recognition, and factor analysis.

There are two ways to think about PCA. The first is statistical: we are trying to find a transform that is carefully tuned to the (second-order) statistics of the data. The second is geometrical: given a set of vectors, we are trying to find a subspace of a certain dimension that comes closest to containing this set.

The Karhunen-Loeve Transform

The Karhunen-Loeve (KL) transform is an orthobasis that is tailored to the statistics of a class of random vectors. Suppose that $x \in \mathbb{R}^D$
is random and has mean and covariance

$$\mathrm{E}[x] = 0, \qquad \mathrm{E}[xx^T] = R.$$

(Modifying this discussion to vectors that are not zero-mean is straightforward.) Then the KL transform (or the KL basis) is simply the eigenvector matrix $V$ of $R = V \Lambda V^T$:

$$x = \sum_{n=1}^{D} \alpha_n v_n, \qquad \alpha_n = \langle x, v_n \rangle.$$

This transform has the property that if we want to truncate the sum above (i.e. compress the vector by using fewer than $D$ numbers to represent it), we get an error that is optimal in the mean-square error sense.

Let's set this problem up carefully. We want to find a subspace $T$ of dimension $K$ such that when we project $x$ onto $T$, we lose as little of $x$ (in expectation) as possible. We want to solve

$$\underset{T}{\text{minimize}} \ \mathrm{E}\Big[ \min_{t \in T} \|x - t\|_2^2 \Big] \quad \text{subject to} \quad \dim(T) = K.$$

For a fixed $T$, we know how to solve the inner optimization program if we have an orthobasis, so we can re-write the above as a search over sets of $K$ orthonormal vectors in $\mathbb{R}^D$:

$$\underset{Q:\, D \times K}{\text{minimize}} \ \mathrm{E}\big[ \|x - QQ^T x\|_2^2 \big] \quad \text{subject to} \quad Q^T Q = I.$$
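Now notice that

$$\begin{aligned}
\mathrm{E}\big[ \|x - QQ^T x\|_2^2 \big] &= \mathrm{E}\big[ \|(I - QQ^T)x\|_2^2 \big] \\
&= \mathrm{E}\big[ \operatorname{trace}\big((I - QQ^T)\, xx^T (I - QQ^T)\big) \big] \\
&= \operatorname{trace}\big((I - QQ^T)\, \mathrm{E}[xx^T]\, (I - QQ^T)\big) \\
&= \operatorname{trace}\big((I - QQ^T)\, R\, (I - QQ^T)\big),
\end{aligned}$$

where in the second step above we have used the fact that for any vector $v$, $\|v\|_2^2 = \operatorname{trace}(vv^T)$. Now notice that

$$\operatorname{trace}\big((I - QQ^T) R (I - QQ^T)\big) = \operatorname{trace}(R) - 2 \operatorname{trace}(QQ^T R) + \operatorname{trace}(QQ^T R\, QQ^T).$$

We now apply three facts: $\operatorname{trace}(R)$ does not depend on $Q$; $\operatorname{trace}(QQ^T R\, QQ^T) = \operatorname{trace}(Q^T R Q\, Q^T Q) = \operatorname{trace}(Q^T R Q)$, by the cyclic property of the trace and $Q^T Q = I$; and $\operatorname{trace}(QQ^T R) = \operatorname{trace}(Q^T R Q)$. The objective is therefore $\operatorname{trace}(R) - \operatorname{trace}(Q^T R Q)$, so minimizing it is the same as solving the equivalent program

$$\underset{W:\, D \times K}{\text{maximize}} \ \operatorname{trace}(W^T R W) \quad \text{subject to} \quad W^T W = I.$$

In the Technical Details section below, we show that this expression is maximized by taking

$$Q = [\, v_1 \ \ v_2 \ \ \cdots \ \ v_K \,],$$

where the $v_k$ are the $K$ eigenvectors of $R$ corresponding to the $K$ largest eigenvalues.

Moral: The best (in terms of mean-squared error) way to get a $K$ term approximation of random data is to transform into the orthobasis formed by the eigenvectors of the covariance matrix, then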
truncate the coefficients to $K$ terms. This set of eigenvectors $V$ is called the KL transform.

In some sense, the $v_1, \ldots, v_K$ are the $K$ most important features of $x$; they are completely determined by the covariance matrix $R$.

[Figures: examples in $\mathbb{R}^2$.]
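To make the moral above concrete, the following sketch builds a KL basis from a covariance matrix and evaluates the expected truncation error $\operatorname{trace}((I - QQ^T) R (I - QQ^T))$. The covariance matrix, the function names, and the random competitor `Q_rand` are illustrative assumptions.

```python
import numpy as np

def kl_basis(R, K):
    """Return the K leading eigenvectors of R and all eigenvalues, largest first."""
    evals, V = np.linalg.eigh(R)            # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]         # reorder to descending
    return V[:, order[:K]], evals[order]

def expected_error(R, Q):
    """E||x - Q Q^T x||^2 for zero-mean x with covariance R."""
    P = np.eye(R.shape[0]) - Q @ Q.T
    return np.trace(P @ R @ P)

rng = np.random.default_rng(2)
G = rng.standard_normal((5, 5))
R = G @ G.T                                  # made-up covariance matrix
K = 2

Q, evals = kl_basis(R, K)
err_kl = expected_error(R, Q)

# An arbitrary orthonormal competitor for comparison
Q_rand, _ = np.linalg.qr(rng.standard_normal((5, K)))
err_rand = expected_error(R, Q_rand)
```

Here `err_kl` should equal the sum of the discarded eigenvalues `evals[K:]`, and `err_rand` should be no smaller than `err_kl`.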
PCA on observed data

A very similar procedure to the above solves a common geometrical problem. Suppose that I have a bunch of data points $x_1, x_2, \ldots, x_N \in \mathbb{R}^D$, and I want to find the $K$-dimensional affine space (subspace plus offset) that comes closest to containing them. We don't even need to think of the data as random here; they are just points that we want to fit with a hyperplane.

[Figure: example picture, pulled from Chapter 14 of Hastie, Tibshirani, and Friedman's Elements of Statistical Learning.]

Our goal is to find an offset $\mu \in \mathbb{R}^D$ and a matrix $Q$ with orthonormal columns such that

$$x_n \approx \mu + Q\theta_n, \quad \text{for all } n = 1, \ldots, N,$$
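for some $\theta_n \in \mathbb{R}^K$. We cast this as the following optimization problem. Given $x_1, \ldots, x_N$, solve

$$\underset{\mu,\, Q,\, \{\theta_n\}}{\text{minimize}} \ \sum_{n=1}^{N} \|x_n - \mu - Q\theta_n\|_2^2 \quad \text{subject to} \quad Q^T Q = I.$$

If we fix $\mu$ and $Q$, then by arguments very similar to those we have made before, the optimal $\theta_n$ are given by

$$\hat{\theta}_n = Q^T (x_n - \mu).$$

This means our objective reduces to solving

$$\underset{\mu,\, Q}{\text{minimize}} \ \sum_{n=1}^{N} \|(I - QQ^T)(x_n - \mu)\|_2^2 \quad \text{subject to} \quad Q^T Q = I.$$

The offset $\mu$ is unconstrained; if we again fix $Q$, we can solve for the optimal $\mu$ by taking a gradient and setting it equal to zero:

$$\begin{aligned}
\nabla_\mu \left( \sum_{n=1}^{N} \|(I - QQ^T)(x_n - \mu)\|_2^2 \right) &= -2 \sum_{n=1}^{N} (I - QQ^T)(x_n - \mu) \\
&= -2 (I - QQ^T) \left( \left( \sum_{n=1}^{N} x_n \right) - N\mu \right).
\end{aligned}$$

We can make the gradient zero by taking the offset $\mu$ to be the sample mean (average of all the observed vectors):

$$\hat{\mu} = \frac{1}{N} \sum_{n=1}^{N} x_n.$$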
All that remains is solving for $Q$. We have

$$\underset{Q:\, D \times K}{\text{minimize}} \ \sum_{n=1}^{N} \|(I - QQ^T)(x_n - \hat{\mu})\|_2^2 \quad \text{subject to} \quad Q^T Q = I.$$

Again, using an argument that perfectly parallels that in the Technical Details section below, this program is solved by forming

$$S = \sum_{n=1}^{N} (x_n - \hat{\mu})(x_n - \hat{\mu})^T,$$

taking an eigenvalue decomposition $S = W \Lambda W^T$, and then taking

$$Q = [\, w_1 \ \ w_2 \ \ \cdots \ \ w_K \,],$$

where $w_1, \ldots, w_K$ are the eigenvectors of $S$ corresponding to the $K$ largest eigenvalues.

So even though we posed this problem as being purely geometrical, the answer parallels the statistical KL transform; we simply replace the true covariance matrix $R$ with the sample covariance $N^{-1} S$.
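The three steps above (sample mean, scatter matrix $S$, leading eigenvectors) can be sketched as follows. Note one assumption that differs from the notation in the notes: the data points are stored as the rows of an $N \times D$ array rather than as columns. The function name `pca_fit` and the synthetic data are also illustrative.

```python
import numpy as np

def pca_fit(X, K):
    """Fit a K-dimensional affine subspace to the rows of X (N x D)."""
    mu = X.mean(axis=0)                     # sample mean, the optimal offset
    Xc = X - mu                             # centered data
    S = Xc.T @ Xc                           # S = sum_n (x_n - mu)(x_n - mu)^T
    evals, W = np.linalg.eigh(S)            # ascending eigenvalues
    Q = W[:, ::-1][:, :K]                   # eigenvectors of the K largest
    theta = Xc @ Q                          # row n holds theta_n = Q^T (x_n - mu)
    return mu, Q, theta

# Made-up data: points close to a line in R^3 that does not pass through the origin
rng = np.random.default_rng(3)
direction = np.array([1.0, 2.0, 2.0]) / 3.0     # unit vector
offset = np.array([1.0, -1.0, 0.5])
t = rng.standard_normal(200)
X = offset + np.outer(t, direction) + 0.01 * rng.standard_normal((200, 3))

mu, Q, theta = pca_fit(X, K=1)
X_fit = mu + theta @ Q.T                        # x_n is approximated by mu + Q theta_n
mean_sq_resid = np.mean(np.sum((X - X_fit) ** 2, axis=1))
```

With this data the single column of `Q` should align (up to sign) with `direction`, and `mean_sq_resid` should be on the order of the added noise.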
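Technical Details: Subspace Approx. Lemma

We prove the subspace approximation lemma stated earlier in these notes. First, with $Q$ fixed, we can break the optimization over $\Theta$ into a series of least-squares problems. Let $a_1, \ldots, a_N$ be the columns of $A$, and $\theta_1, \ldots, \theta_N$ be the columns of $\Theta$. Then

$$\underset{\Theta}{\text{minimize}} \ \|A - Q\Theta\|_F^2$$

is exactly the same as

$$\underset{\theta_1, \ldots, \theta_N}{\text{minimize}} \ \sum_{n=1}^{N} \|a_n - Q\theta_n\|_2^2.$$

The above is our classic closest point problem, and is optimized by taking $\theta_n = Q^T a_n$ (since the columns of $Q$ are orthonormal). Thus we can write the original problem (2) as

$$\underset{Q:\, M \times r}{\text{minimize}} \ \sum_{n=1}^{N} \|a_n - QQ^T a_n\|_2^2 \quad \text{subject to} \quad Q^T Q = I,$$

and then take $\hat{\Theta} = \hat{Q}^T A$.

Expanding the functional and using the fact that $(I - QQ^T)^2 = (I - QQ^T)$, we have

$$\begin{aligned}
\sum_{n=1}^{N} \|a_n - QQ^T a_n\|_2^2 &= \sum_{n=1}^{N} a_n^T (I - QQ^T) a_n \\
&= \sum_{n=1}^{N} \|a_n\|_2^2 - a_n^T QQ^T a_n.
\end{aligned}$$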
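Since the first term does not depend on $Q$, our optimization program is equivalent to

$$\underset{Q:\, M \times r}{\text{maximize}} \ \sum_{n=1}^{N} a_n^T QQ^T a_n \quad \text{subject to} \quad Q^T Q = I.$$

Now recall that for any vector $v$, $\langle v, v \rangle = \operatorname{trace}(vv^T)$. Thus

$$\begin{aligned}
\sum_{n=1}^{N} a_n^T QQ^T a_n &= \sum_{n=1}^{N} \operatorname{trace}(Q^T a_n a_n^T Q) \\
&= \operatorname{trace}\left( Q^T \left( \sum_{n=1}^{N} a_n a_n^T \right) Q \right) \\
&= \operatorname{trace}\left( Q^T (AA^T) Q \right).
\end{aligned}$$

The matrix $AA^T$ has eigenvalue decomposition

$$AA^T = U \Sigma^2 U^T,$$

where $U$ and $\Sigma$ come from the SVD of $A$ (we will take $U$ to be $M \times M$, possibly adding zeros down the diagonal of $\Sigma^2$). Now

$$\operatorname{trace}\left( Q^T (AA^T) Q \right) = \operatorname{trace}\left( Q^T U \Sigma^2 U^T Q \right) = \operatorname{trace}\left( W^T \Sigma^2 W \right),$$

where $W = U^T Q$. Notice that $W$ also has orthonormal columns, as $W^T W = Q^T U U^T Q = Q^T Q = I$. Thus our optimization program has become

$$\underset{W:\, M \times r}{\text{maximize}} \ \operatorname{trace}(W^T \Sigma^2 W) \quad \text{subject to} \quad W^T W = I.$$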
After we solve this, we can take any $\hat{Q}$ such that $\hat{W} = U^T \hat{Q}$.

This last optimization program is equivalent to a simple linear program that is solvable by inspection. Let $w_1, \ldots, w_r$ be the columns of $W$. Then

$$\begin{aligned}
\operatorname{trace}(W^T \Sigma^2 W) &= \sum_{p=1}^{r} w_p^T \Sigma^2 w_p \\
&= \sum_{p=1}^{r} \sum_{m=1}^{M} |w_p[m]|^2 \sigma_m^2 \\
&= \sum_{m=1}^{M} h[m]\, \sigma_m^2, \quad \text{where } h[m] = \sum_{p=1}^{r} |w_p[m]|^2.
\end{aligned}$$

Notice that

$$h[m] = \sum_{p=1}^{r} |W[m, p]|^2$$

is a sum of the squares of a row of $W$. Since the sum of the squares of every column of $W$ is one, the sum of the squares of every entry in $W$ must be $r$, and so

$$\sum_{m=1}^{M} h[m] = r.$$

It is clear that $h[m]$ is non-negative, but it is also true that $h[m] \le 1$. Here is why: since the columns of $W$ are orthonormal, they can be considered as part of an orthonormal basis for $\mathbb{R}^M$. That is, there is an $M \times (M - r)$ matrix $W_0$ such that the $M \times M$ matrix $[\, W \ \ W_0 \,]$ has both orthonormal columns and orthonormal rows; thus the sum of the squares of each row is equal to one. Thus the sum of the squares of the first $r$ entries cannot be larger than this.
Thus the maximum value $\operatorname{trace}(W^T \Sigma^2 W)$ can take is given by the linear program

$$\underset{h \in \mathbb{R}^M}{\text{maximize}} \ \sum_{m=1}^{M} h[m]\, \sigma_m^2 \quad \text{subject to} \quad \sum_{m=1}^{M} h[m] = r, \quad 0 \le h[m] \le 1.$$

We can intuit the answer to this program. Since all of the $\sigma_m^2$ and all of the $h[m]$ are non-negative, we want to have as much weight as possible assigned to the largest singular values. Since the weights are constrained to be at most 1, this simply means we max out the first $r$ terms; the solution to the program above is

$$\hat{h}[m] = \begin{cases} 1, & m = 1, \ldots, r \\ 0, & m = r + 1, \ldots, M. \end{cases}$$

This means that the squares of the entries in each of the first $r$ rows of $\hat{W}$ sum to one, while the remaining rows are zero. There might be many such matrices that fit this bill, but one of them is

$$\hat{W} = \begin{bmatrix} I \\ 0 \end{bmatrix},$$

where above, $I$ is the $r \times r$ identity matrix, and $0$ is an $(M - r) \times r$ matrix of all zeros. It is easy to see that choosing

$$\hat{Q} = [\, u_1 \ \ u_2 \ \ \cdots \ \ u_r \,]$$

satisfies

$$U^T \hat{Q} = \begin{bmatrix} I \\ 0 \end{bmatrix}.$$
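The chain of identities in this proof can be checked numerically. The sketch below draws an arbitrary orthonormal $Q$, forms $W = U^T Q$ and the row weights $h[m]$, and compares the objective against the choice $\hat{Q} = U_r$. The matrix sizes, seed, and variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
M, N, r = 6, 5, 2
A = rng.standard_normal((M, N))                    # made-up test matrix

U, s, _ = np.linalg.svd(A, full_matrices=True)     # U is M x M
sigma_sq = np.zeros(M)
sigma_sq[: len(s)] = s ** 2                        # pad Sigma^2 with zeros

def objective(Q):
    """trace(Q^T A A^T Q), the quantity being maximized."""
    return np.trace(Q.T @ A @ A.T @ Q)

# An arbitrary feasible point: r orthonormal columns in R^M
Q_rand, _ = np.linalg.qr(rng.standard_normal((M, r)))
W = U.T @ Q_rand
h = np.sum(W ** 2, axis=1)                         # h[m]: squared row sums of W

# The claimed maximizer
Q_hat = U[:, :r]
```

For any such `Q_rand`, the weights `h` should sum to `r` and lie in $[0, 1]$, `objective(Q_rand)` should equal `h @ sigma_sq`, and `objective(Q_hat)` should equal the sum of the `r` largest $\sigma_m^2$.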
Homework 1 Yuan Yao September 18, 2011 1. Singular Value Decomposition: The goal of this exercise is to refresh your memory about the singular value decomposition and matrix norms. A good reference to
More informationLinear Algebra Primer
Linear Algebra Primer David Doria daviddoria@gmail.com Wednesday 3 rd December, 2008 Contents Why is it called Linear Algebra? 4 2 What is a Matrix? 4 2. Input and Output.....................................
More informationChapter 6 - Orthogonality
Chapter 6 - Orthogonality Maggie Myers Robert A. van de Geijn The University of Texas at Austin Orthogonality Fall 2009 http://z.cs.utexas.edu/wiki/pla.wiki/ 1 Orthogonal Vectors and Subspaces http://z.cs.utexas.edu/wiki/pla.wiki/
More informationSignal Analysis. Principal Component Analysis
Multi dimensional Signal Analysis Lecture 2E Principal Component Analysis Subspace representation Note! Given avector space V of dimension N a scalar product defined by G 0 a subspace U of dimension M
More information1 Linearity and Linear Systems
Mathematical Tools for Neuroscience (NEU 34) Princeton University, Spring 26 Jonathan Pillow Lecture 7-8 notes: Linear systems & SVD Linearity and Linear Systems Linear system is a kind of mapping f( x)
More informationSingular Value Decompsition
Singular Value Decompsition Massoud Malek One of the most useful results from linear algebra, is a matrix decomposition known as the singular value decomposition It has many useful applications in almost
More informationLinear Algebra Fundamentals
Linear Algebra Fundamentals It can be argued that all of linear algebra can be understood using the four fundamental subspaces associated with a matrix. Because they form the foundation on which we later
More informationMaximum variance formulation
12.1. Principal Component Analysis 561 Figure 12.2 Principal component analysis seeks a space of lower dimensionality, known as the principal subspace and denoted by the magenta line, such that the orthogonal
More informationMATH36001 Generalized Inverses and the SVD 2015
MATH36001 Generalized Inverses and the SVD 201 1 Generalized Inverses of Matrices A matrix has an inverse only if it is square and nonsingular. However there are theoretical and practical applications
More informationPrincipal Component Analysis
CSci 5525: Machine Learning Dec 3, 2008 The Main Idea Given a dataset X = {x 1,..., x N } The Main Idea Given a dataset X = {x 1,..., x N } Find a low-dimensional linear projection The Main Idea Given
More informationProblem # Max points possible Actual score Total 120
FINAL EXAMINATION - MATH 2121, FALL 2017. Name: ID#: Email: Lecture & Tutorial: Problem # Max points possible Actual score 1 15 2 15 3 10 4 15 5 15 6 15 7 10 8 10 9 15 Total 120 You have 180 minutes to
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 5: Numerical Linear Algebra Cho-Jui Hsieh UC Davis April 20, 2017 Linear Algebra Background Vectors A vector has a direction and a magnitude
More informationThe Singular Value Decomposition
The Singular Value Decomposition Philippe B. Laval KSU Fall 2015 Philippe B. Laval (KSU) SVD Fall 2015 1 / 13 Review of Key Concepts We review some key definitions and results about matrices that will
More informationSingular Value Decomposition. 1 Singular Value Decomposition and the Four Fundamental Subspaces
Singular Value Decomposition This handout is a review of some basic concepts in linear algebra For a detailed introduction, consult a linear algebra text Linear lgebra and its pplications by Gilbert Strang
More informationσ 11 σ 22 σ pp 0 with p = min(n, m) The σ ii s are the singular values. Notation change σ ii A 1 σ 2
HE SINGULAR VALUE DECOMPOSIION he SVD existence - properties. Pseudo-inverses and the SVD Use of SVD for least-squares problems Applications of the SVD he Singular Value Decomposition (SVD) heorem For
More informationLinear Algebra, part 3 QR and SVD
Linear Algebra, part 3 QR and SVD Anna-Karin Tornberg Mathematical Models, Analysis and Simulation Fall semester, 2012 Going back to least squares (Section 1.4 from Strang, now also see section 5.2). We
More information1. Background: The SVD and the best basis (questions selected from Ch. 6- Can you fill in the exercises?)
Math 35 Exam Review SOLUTIONS Overview In this third of the course we focused on linear learning algorithms to model data. summarize: To. Background: The SVD and the best basis (questions selected from
More informationVectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1. x 2. x =
Linear Algebra Review Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1 x x = 2. x n Vectors of up to three dimensions are easy to diagram.
More informationUNIT 6: The singular value decomposition.
UNIT 6: The singular value decomposition. María Barbero Liñán Universidad Carlos III de Madrid Bachelor in Statistics and Business Mathematical methods II 2011-2012 A square matrix is symmetric if A T
More informationbe a Householder matrix. Then prove the followings H = I 2 uut Hu = (I 2 uu u T u )u = u 2 uut u
MATH 434/534 Theoretical Assignment 7 Solution Chapter 7 (71) Let H = I 2uuT Hu = u (ii) Hv = v if = 0 be a Householder matrix Then prove the followings H = I 2 uut Hu = (I 2 uu )u = u 2 uut u = u 2u =
More informationIntroduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin
1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)
More informationGlossary of Linear Algebra Terms. Prepared by Vince Zaccone For Campus Learning Assistance Services at UCSB
Glossary of Linear Algebra Terms Basis (for a subspace) A linearly independent set of vectors that spans the space Basic Variable A variable in a linear system that corresponds to a pivot column in the
More informationLinear Algebra, Summer 2011, pt. 2
Linear Algebra, Summer 2, pt. 2 June 8, 2 Contents Inverses. 2 Vector Spaces. 3 2. Examples of vector spaces..................... 3 2.2 The column space......................... 6 2.3 The null space...........................
More informationThe Singular Value Decomposition and Least Squares Problems
The Singular Value Decomposition and Least Squares Problems Tom Lyche Centre of Mathematics for Applications, Department of Informatics, University of Oslo September 27, 2009 Applications of SVD solving
More informationThe Singular Value Decomposition (SVD) and Principal Component Analysis (PCA)
Chapter 5 The Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) 5.1 Basics of SVD 5.1.1 Review of Key Concepts We review some key definitions and results about matrices that will
More information(v, w) = arccos( < v, w >
MA322 Sathaye Notes on Inner Products Notes on Chapter 6 Inner product. Given a real vector space V, an inner product is defined to be a bilinear map F : V V R such that the following holds: For all v
More information1 Principal Components Analysis
Lecture 3 and 4 Sept. 18 and Sept.20-2006 Data Visualization STAT 442 / 890, CM 462 Lecture: Ali Ghodsi 1 Principal Components Analysis Principal components analysis (PCA) is a very popular technique for
More informationPrincipal Component Analysis
Principal Component Analysis Laurenz Wiskott Institute for Theoretical Biology Humboldt-University Berlin Invalidenstraße 43 D-10115 Berlin, Germany 11 March 2004 1 Intuition Problem Statement Experimental
More informationLinear Algebra, part 3. Going back to least squares. Mathematical Models, Analysis and Simulation = 0. a T 1 e. a T n e. Anna-Karin Tornberg
Linear Algebra, part 3 Anna-Karin Tornberg Mathematical Models, Analysis and Simulation Fall semester, 2010 Going back to least squares (Sections 1.7 and 2.3 from Strang). We know from before: The vector
More informationA PRIMER ON SESQUILINEAR FORMS
A PRIMER ON SESQUILINEAR FORMS BRIAN OSSERMAN This is an alternative presentation of most of the material from 8., 8.2, 8.3, 8.4, 8.5 and 8.8 of Artin s book. Any terminology (such as sesquilinear form
More informationMATH 581D FINAL EXAM Autumn December 12, 2016
MATH 58D FINAL EXAM Autumn 206 December 2, 206 NAME: SIGNATURE: Instructions: there are 6 problems on the final. Aim for solving 4 problems, but do as much as you can. Partial credit will be given on all
More informationEECS 275 Matrix Computation
EECS 275 Matrix Computation Ming-Hsuan Yang Electrical Engineering and Computer Science University of California at Merced Merced, CA 95344 http://faculty.ucmerced.edu/mhyang Lecture 6 1 / 22 Overview
More informationSingular Value Decomposition
Singular Value Decomposition (Com S 477/577 Notes Yan-Bin Jia Sep, 7 Introduction Now comes a highlight of linear algebra. Any real m n matrix can be factored as A = UΣV T where U is an m m orthogonal
More informationA Brief Outline of Math 355
A Brief Outline of Math 355 Lecture 1 The geometry of linear equations; elimination with matrices A system of m linear equations with n unknowns can be thought of geometrically as m hyperplanes intersecting
More informationVector and Matrix Norms. Vector and Matrix Norms
Vector and Matrix Norms Vector Space Algebra Matrix Algebra: We let x x and A A, where, if x is an element of an abstract vector space n, and A = A: n m, then x is a complex column vector of length n whose
More information