The Singular Value Decomposition


We are interested in more than just symmetric positive definite (sym+def) matrices, but the eigenvalue decompositions discussed in the last section of notes will play a major role in solving general systems of equations

    y = Ax,   y ∈ R^M,   A an M × N matrix,   x ∈ R^N.

We have seen that a symmetric positive definite matrix can be decomposed as A = VΛV^T, where V is an orthogonal matrix (V^T V = VV^T = I) whose columns are the eigenvectors of A, and Λ is a diagonal matrix containing the eigenvalues of A. Because both orthogonal and diagonal matrices are trivial to invert, this eigenvalue decomposition makes it very easy to solve systems of equations y = Ax and to analyze the stability of these solutions.

The singular value decomposition (SVD) takes apart an arbitrary M × N matrix A in a similar manner. The SVD of a real-valued M × N matrix A with rank[1] R is

    A = UΣV^T,

where

1. U is an M × R matrix

       U = [u_1 u_2 ... u_R],

   whose columns u_m ∈ R^M are orthonormal. Note that while U^T U = I, in general UU^T ≠ I when R < M. The columns of U are an orthobasis for the range space of A.

[1] Recall that the rank of a matrix is the number of linearly independent columns (which is always equal to the number of linearly independent rows).

2. V is an N × R matrix

       V = [v_1 v_2 ... v_R],

   whose columns v_n ∈ R^N are orthonormal. Again, while V^T V = I, in general VV^T ≠ I when R < N. The columns of V are an orthobasis for the range space of A^T (recall that Range(A^T) consists of everything which is orthogonal to the null space of A).

3. Σ is an R × R diagonal matrix with positive entries:

       Σ = diag(σ_1, σ_2, ..., σ_R).

   We call the σ_r the singular values of A. By convention, we will order them such that σ_1 ≥ σ_2 ≥ ... ≥ σ_R.

4. The v_1, ..., v_R are eigenvectors of the positive semi-definite matrix A^T A. Note that

       A^T A = VΣU^T UΣV^T = VΣ^2 V^T,

   and so the singular values σ_1, ..., σ_R are the square roots of the non-zero eigenvalues of A^T A.

5. Similarly, AA^T = UΣ^2 U^T, and so the u_1, ..., u_R are eigenvectors of the positive semi-definite matrix AA^T. Since the non-zero eigenvalues of A^T A and AA^T are the same, the σ_r are also the square roots of the non-zero eigenvalues of AA^T.

The rank R is the dimension of the space spanned by the columns of A; this is the same as the dimension of the space spanned by the rows. Thus R ≤ min(M, N). We say A is full rank if R = min(M, N).

As before, we will oftentimes find it useful to write the SVD as the sum of R rank-1 matrices:

    A = UΣV^T = ∑_{r=1}^{R} σ_r u_r v_r^T.

When A is overdetermined (M > N), U is a tall M × R matrix, while Σ (R × R) and V^T (R × N) are comparatively small. When A is underdetermined (M < N), V^T is a wide R × N matrix, while U (M × R) and Σ are comparatively small. When A is square and full rank (M = N = R), U, Σ = diag(σ_1, ..., σ_N), and V^T are all N × N.
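These properties are easy to explore numerically. The sketch below assumes NumPy is available; the test matrix, the seed, and the variable names are ours, chosen only for illustration. It builds a rank-deficient matrix, keeps the R nonzero singular values, and checks the orthonormality and rank-1-sum properties listed above.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, R = 8, 5, 3

# Build an M x N matrix of rank R as a product of two random factors.
A = rng.standard_normal((M, R)) @ rng.standard_normal((R, N))

# full_matrices=False gives the "economy" SVD; keep only the R nonzero singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U, s, Vt = U[:, :R], s[:R], Vt[:R, :]

print(np.linalg.matrix_rank(A))             # R
print(np.allclose(U.T @ U, np.eye(R)))      # columns of U are orthonormal
print(np.allclose(Vt @ Vt.T, np.eye(R)))    # columns of V are orthonormal
print(np.allclose(A, U @ np.diag(s) @ Vt))  # A = U Sigma V^T

# A as the sum of R rank-1 matrices sigma_r u_r v_r^T.
A_sum = sum(s[r] * np.outer(U[:, r], Vt[r, :]) for r in range(R))
print(np.allclose(A, A_sum))
```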

Technical Details: Existence of the SVD

In this section we will prove that any M × N matrix A with rank(A) = R can be written as A = UΣV^T, where U, Σ, and V have the five properties listed at the beginning of the last section.

Since A^T A is symmetric positive semi-definite, we can write

    A^T A = ∑_{n=1}^{N} λ_n v_n v_n^T,

where the v_n are orthonormal and the λ_n are real and non-negative. Since rank(A) = R, we also have rank(A^T A) = R, and so λ_1, ..., λ_R above are all strictly positive and λ_{R+1} = ... = λ_N = 0. Set

    u_m = (1/√λ_m) A v_m,  for m = 1, ..., R,   U = [u_1 ... u_R].

Notice that these u_m are orthonormal, as

    <u_m, u_l> = (1/√(λ_m λ_l)) v_l^T A^T A v_m = (λ_m/√(λ_m λ_l)) v_l^T v_m = { 1, m = l;  0, m ≠ l }.

These u_m also happen to be eigenvectors of AA^T, as

    AA^T u_m = (1/√λ_m) A A^T A v_m = (λ_m/√λ_m) A v_m = λ_m u_m.

Now let u_{R+1}, ..., u_M be an orthobasis for the null space of U^T; concatenating these two sets into u_1, ..., u_M forms an orthobasis for all of R^M.

Let V = [v_1 v_2 ... v_R]. In addition, let

    V_0 = [v_{R+1} v_{R+2} ... v_N],   V_full = [V V_0],

and

    U_0 = [u_{R+1} u_{R+2} ... u_M],   U_full = [U U_0].

It should be clear that V_full is an N × N orthonormal matrix and U_full is an M × M orthonormal matrix. Consider the M × N matrix U_full^T A V_full; the entry in the m-th row and n-th column of this matrix is

    (U_full^T A V_full)[m, n] = u_m^T A v_n = { √λ_n u_m^T u_n,  n = 1, ..., R;   0,  n = R+1, ..., N }
                              = { √λ_n,  m = n = 1, ..., R;   0,  otherwise }.

Thus

    U_full^T A V_full = Σ_full,   where   Σ_full[m, n] = { √λ_n,  m = n = 1, ..., R;   0,  otherwise }.

Since U_full U_full^T = I and V_full V_full^T = I, we have

    A = U_full Σ_full V_full^T.

Since Σ_full is non-zero only in the first R locations along its main diagonal, the above reduces to

    A = UΣV^T,   Σ = diag(√λ_1, √λ_2, ..., √λ_R).
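The construction in this proof can be mirrored numerically: take an eigendecomposition of A^T A, keep the strictly positive eigenvalues, and form u_m = A v_m / √λ_m. A minimal sketch, assuming NumPy (the test matrix and names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, R = 7, 5, 3
A = rng.standard_normal((M, R)) @ rng.standard_normal((R, N))

# Eigendecomposition of the symmetric PSD matrix A^T A (eigenvalues come back sorted ascending).
lam, V_all = np.linalg.eigh(A.T @ A)
keep = lam > 1e-10 * lam.max()          # the R strictly positive eigenvalues
lam, V = lam[keep], V_all[:, keep]

U = A @ V / np.sqrt(lam)                # u_m = A v_m / sqrt(lambda_m), column by column
Sigma = np.diag(np.sqrt(lam))           # singular values are square roots of the eigenvalues

print(np.allclose(U.T @ U, np.eye(R)))      # the constructed u_m are orthonormal
print(np.allclose(A, U @ Sigma @ V.T))      # and A = U Sigma V^T
print(np.allclose(A @ A.T @ U, U * lam))    # the u_m are eigenvectors of A A^T
```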

The Least-Squares Problem

We can use the SVD to solve the general system of linear equations y = Ax, where y ∈ R^M, x ∈ R^N, and A is an M × N matrix. Given y, we want to find x in such a way that

1. when there is a unique solution, we return it;
2. when there is no solution, we return something reasonable;
3. when there are an infinite number of solutions, we choose one to return in a smart way.

The least-squares framework revolves around finding an x that minimizes the length of the residual r = y − Ax. That is, we want to solve the optimization problem

    minimize_{x ∈ R^N} ||y − Ax||_2^2,    (1)

where ||·||_2 is the standard Euclidean norm. We will see that the SVD of A,

    A = UΣV^T,    (2)

plays a pivotal role in solving this problem. To start, note that we can write any x ∈ R^N as

    x = Vα + V_0 α_0.    (3)

Here, V is the N × R matrix appearing in the SVD (2), and V_0 is an N × (N − R) matrix whose columns are orthogonal to one another and to the columns of V. We have the relations

    V^T V = I,   V_0^T V_0 = I,   V^T V_0 = 0.

You can think of V_0 as an orthobasis for the null space of A. Of course, V_0 is not unique, as there are many orthobases for Null(A), but any such set of vectors will serve our purposes here.

The decomposition (3) is possible since Range(A^T) and Null(A) partition R^N for any M × N matrix A. Taking

    α = V^T x,   α_0 = V_0^T x,

we see that (3) holds since

    Vα + V_0 α_0 = VV^T x + V_0 V_0^T x = (VV^T + V_0 V_0^T) x = x,

where we have made use of the fact that VV^T + V_0 V_0^T = I, because VV^T and V_0 V_0^T are ortho-projectors onto complementary subspaces[2] of R^N. So we can solve for x ∈ R^N by solving for the pair α ∈ R^R, α_0 ∈ R^{N−R}.

Similarly, we can decompose y as

    y = Uβ + U_0 β_0,    (4)

where U is the M × R matrix from the SVD, and U_0 is an M × (M − R) complementary orthogonal basis.

[2] Subspaces S_1 and S_2 are complementary in R^N if S_1 ⊥ S_2 (everything in S_1 is orthogonal to everything in S_2) and S_1 ⊕ S_2 = R^N. You can think of S_1, S_2 as a partition of R^N into two orthogonal subspaces.

Again, U^T U = I, U_0^T U_0 = I, and U^T U_0 = 0, and we can think of U_0 as an orthogonal basis for everything in R^M that is not in the range of A. As before, we can calculate the decomposition above using

    β = U^T y,   β_0 = U_0^T y.

Using the decompositions (2), (3), and (4) for A, x, and y, we can write the residual r = y − Ax as

    r = Uβ + U_0 β_0 − UΣV^T(Vα + V_0 α_0)
      = Uβ + U_0 β_0 − UΣα    (since V^T V = I and V^T V_0 = 0)
      = U_0 β_0 + U(β − Σα).

We want to choose the α that minimizes the energy of r:

    ||r||_2^2 = <U_0 β_0 + U(β − Σα), U_0 β_0 + U(β − Σα)>
              = <U_0 β_0, U_0 β_0> + 2<U_0 β_0, U(β − Σα)> + <U(β − Σα), U(β − Σα)>
              = ||β_0||_2^2 + ||β − Σα||_2^2,

where the last equality comes from the facts that U_0^T U_0 = I, U^T U = I, and U^T U_0 = 0.

We have no control over ||β_0||_2^2, since it is determined entirely by our observations y. Therefore, our problem has been reduced to finding the α that minimizes the second term ||β − Σα||_2^2 above, which is non-negative. We can make it zero (i.e. as small as possible) by taking

    α̂ = Σ^{-1} β.

Finally, the x which minimizes the residual (and so solves (1)) is

    x̂ = V α̂ = VΣ^{-1} β = VΣ^{-1}U^T y.    (5)

Thus we can calculate the solution to (1) simply by applying the linear operator VΣ^{-1}U^T to the input data y.

There are two interesting facts about the solution x̂ in (5):

1. When y ∈ span({u_1, ..., u_R}), we have β_0 = U_0^T y = 0, and so the residual r = 0. In this case, there is at least one exact solution, and the one we choose satisfies A x̂ = y.

2. Note that if R < N, then the solution is not unique. In this case, V_0 has at least one column, and any part of a vector x in the range of V_0 is not seen by A, since

    A V_0 α_0 = UΣV^T V_0 α_0 = 0    (since V^T V_0 = 0).

As such, x = x̂ + V_0 α_0 for any α_0 ∈ R^{N−R} will have exactly the same residual, since Ax = A x̂. In this case, our solution x̂ is the solution with smallest norm, since

    ||x||_2^2 = <x̂ + V_0 α_0, x̂ + V_0 α_0>
              = <x̂, x̂> + 2<x̂, V_0 α_0> + <V_0 α_0, V_0 α_0>
              = ||x̂||_2^2 + 2<VΣ^{-1}U^T y, V_0 α_0> + ||α_0||_2^2    (since V_0^T V_0 = I)
              = ||x̂||_2^2 + ||α_0||_2^2    (since V^T V_0 = 0),

which is minimized by taking α_0 = 0.
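Equation (5) and the minimum-norm property can be checked against a library solver. A minimal NumPy sketch, using a rank-deficient matrix of our own choosing so that the solution set is genuinely non-unique:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, R = 6, 8, 3                       # underdetermined and rank deficient
A = rng.standard_normal((M, R)) @ rng.standard_normal((R, N))
y = rng.standard_normal(M)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U, s, Vt = U[:, :R], s[:R], Vt[:R, :]

# x_hat = V Sigma^{-1} U^T y, the minimum-norm least-squares solution (5).
x_hat = Vt.T @ ((U.T @ y) / s)

# NumPy's least-squares and pseudo-inverse routines return the same vector.
x_lstsq = np.linalg.lstsq(A, y, rcond=None)[0]
print(np.allclose(x_hat, x_lstsq))
print(np.allclose(x_hat, np.linalg.pinv(A) @ y))
```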

To summarize, x̂ = VΣ^{-1}U^T y has the desired properties stated at the beginning of this module, since

1. when y = Ax has a unique exact solution, it must be x̂;
2. when an exact solution is not available, x̂ is the solution to (1);
3. when there are an infinite number of minimizers to (1), x̂ is the one with smallest norm.

Because the matrix VΣ^{-1}U^T gives us such an elegant solution to this problem, we give it a special name: the pseudo-inverse.

The Pseudo-Inverse

The pseudo-inverse of a matrix A with singular value decomposition (SVD) A = UΣV^T is

    A† = VΣ^{-1}U^T.    (6)

Other names for A† include the natural inverse, the Lanczos inverse, and the Moore-Penrose inverse. Given an observation y, taking x̂ = A† y gives us the least-squares solution to y = Ax.

The pseudo-inverse A† always exists, since every matrix with rank R has an SVD A = UΣV^T with Σ an R × R diagonal matrix with Σ[r, r] > 0.

When A is full rank (R = min(M, N)), we can calculate the pseudo-inverse without using the SVD. There are three cases:

When A is square and invertible (R = M = N), then A† = A^{-1}. This is easy to check: here A = UΣV^T where both U and V are N × N, and since in this case VV^T = V^T V = I and UU^T = U^T U = I,

    A† A = VΣ^{-1}U^T UΣV^T = VΣ^{-1}ΣV^T = VV^T = I.

Similarly, AA† = I, and so A† is both a left and a right inverse of A, and thus A† = A^{-1}.

When A has more rows than columns and has full column rank (R = N ≤ M), then A^T A is invertible, and

    A† = (A^T A)^{-1} A^T.    (7)

This type of A is tall and skinny, and its columns are linearly independent. To verify equation (7), recall that

    A^T A = VΣU^T UΣV^T = VΣ^2 V^T,

and so

    (A^T A)^{-1} A^T = VΣ^{-2}V^T VΣU^T = VΣ^{-1}U^T,

which is exactly the content of (6).

When A has more columns than rows and has full row rank (R = M ≤ N), then AA^T is invertible, and

    A† = A^T (AA^T)^{-1}.    (8)

This occurs when A is short and fat, and its rows are linearly independent. To verify equation (8), recall that

    AA^T = UΣV^T VΣU^T = UΣ^2 U^T,

and so

    A^T (AA^T)^{-1} = VΣU^T UΣ^{-2} U^T = VΣ^{-1}U^T,

which again is exactly (6).
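For full-rank A, formulas (7) and (8) are easy to verify numerically against the SVD expression (6). A hedged sketch, assuming NumPy (the test matrices are ours; np.linalg.pinv computes VΣ^{-1}U^T):

```python
import numpy as np

rng = np.random.default_rng(3)

# Tall, full column rank: pinv(A) = (A^T A)^{-1} A^T, equation (7).
A = rng.standard_normal((8, 4))
print(np.allclose(np.linalg.inv(A.T @ A) @ A.T, np.linalg.pinv(A)))

# Short and fat, full row rank: pinv(B) = B^T (B B^T)^{-1}, equation (8).
B = rng.standard_normal((4, 8))
print(np.allclose(B.T @ np.linalg.inv(B @ B.T), np.linalg.pinv(B)))

# Square and invertible: the pseudo-inverse is the ordinary inverse.
C = rng.standard_normal((5, 5))
print(np.allclose(np.linalg.pinv(C), np.linalg.inv(C)))
```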

13 than A. To find the best left-inverse, we look for the matrix which minimizes min H R N M HA I 2 F. (9) Here, F is the Frobenius norm, defined for an N M matrix Q as the sum of the squares of the entries: 3 Q 2 F = M n=1 N Q[m, n] 2 n=1 With (9), we are finding H such that HA is as close to the identity as possible in the least-squares sense. The pseudo-inverse A minimizes (9). To see this, recognize (see the exercise below) that the solution Ĥ to (9) must obey AA T Ĥ T = A. (10) We can see that this is indeed true for Ĥ = A : AA T A T = UΣV T V ΣU T UΣ 1 V T = UΣV T = A. So there is no N M matrix that is closer to being a left inverse than A. 3 It is also true that Q 2 F is the sum of the squares of the singular values of Q: Q 2 F = λ λ 2 p. This is something that you will prove on the next homework. 48

Right inverse. If we re-apply A to our solution x̂ = A† y, we would like the result to be as close as possible to our observations y. That is, we would like AA† to be as close to the identity as possible. Again, achieving this goal exactly is not always possible, especially if A has more rows than columns. But we can attempt to find the best right inverse, in the least-squares sense, by solving

    minimize_{H ∈ R^{N×M}} ||AH − I||_F^2.    (11)

The solution Ĥ to (11) (see the exercise below) must obey

    A^T A Ĥ = A^T.    (12)

Again, we show that A† satisfies (12), and hence is a minimizer of (11):

    A^T A A† = VΣ^2 V^T VΣ^{-1}U^T = VΣU^T = A^T.

Moral: A† = VΣ^{-1}U^T is as close (in the least-squares sense) to an inverse of A as you could possibly have.

Exercise:

1. Show that the minimizer Ĥ to (9) must obey (10). Do this by using the fact that the derivative of the functional ||HA − I||_F^2 with respect to an entry H[k, l] of H must obey

    ∂||HA − I||_F^2 / ∂H[k, l] = 0,   for all 1 ≤ k ≤ N, 1 ≤ l ≤ M,

at a solution to (9). Do the same for (11) and (12).
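The optimality conditions (10) and (12) can also be spot-checked numerically (this does not replace the derivation asked for in the exercise). A small sketch, assuming NumPy, with a wide test matrix of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 9))          # wide, so no exact left inverse exists
A_pinv = np.linalg.pinv(A)

# Condition (10): A A^T H^T = A for the best left inverse H.
print(np.allclose(A @ A.T @ A_pinv.T, A))
# Condition (12): A^T A H = A^T for the best right inverse H.
print(np.allclose(A.T @ A @ A_pinv, A.T))
```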

Stability Analysis of the Pseudo-Inverse

We have seen that if we make indirect observations y ∈ R^M of an unknown vector x_0 ∈ R^N through an M × N matrix A,

    y = A x_0,

then applying the pseudo-inverse of A gives us the least-squares estimate of x_0:

    x̂_ls = A† y = VΣ^{-1}U^T y,

where A = UΣV^T is the singular value decomposition (SVD) of A.

We will now discuss what happens if our measurements contain noise. The analysis here will be very similar to when we looked at the stability of solving square sym+def systems, and in fact this is one of the main reasons we introduced the SVD. Suppose we observe

    y = A x_0 + e,

where e ∈ R^M is an unknown perturbation. Say that we again apply the pseudo-inverse to y in an attempt to recover x_0:

    x̂_ls = A† y = A† A x_0 + A† e.

What effect does the presence of the noise vector e have on our estimate of x_0? We answer this question by comparing x̂_ls to the reconstruction we would obtain if we used standard least-squares on perfectly noise-free observations y_clean = A x_0.

This noise-free reconstruction can be written as

    x_pinv = A† y_clean = A† A x_0 = VΣ^{-1}U^T UΣV^T x_0 = VV^T x_0 = ∑_{r=1}^{R} <x_0, v_r> v_r.

The vector x_pinv is the orthogonal projection of x_0 onto the row space (everything orthogonal to the null space) of A. If A has full column rank (R = N), then x_pinv = x_0. If not, then the application of A destroys the part of x_0 that is not in x_pinv, and so we only attempt to recover the visible components. In some sense, x_pinv contains all of the components of x_0 that A does not completely remove, and has them preserved perfectly.

The reconstruction error (relative to x_pinv) is

    ||x̂_ls − x_pinv||_2^2 = ||A† e||_2^2 = ||VΣ^{-1}U^T e||_2^2.    (13)

Now suppose for a moment that the error has unit norm, ||e||_2^2 = 1. Then the worst case for (13) is given by

    maximize_{e ∈ R^M} ||VΣ^{-1}U^T e||_2^2   subject to   ||e||_2 = 1.

Since the columns of U are orthonormal, ||U^T e||_2^2 ≤ ||e||_2^2, and the above is equivalent to

    max_{β ∈ R^R : ||β||_2 = 1} ||VΣ^{-1} β||_2^2.    (14)

Also, for any vector z ∈ R^R, we have

    ||Vz||_2^2 = <Vz, Vz> = <z, V^T V z> = <z, z> = ||z||_2^2,

since the columns of V are orthonormal. So we can simplify (14) to

    maximize_{β ∈ R^R} ||Σ^{-1} β||_2^2   subject to   ||β||_2 = 1.

The worst-case β (you should verify this at home) will have a 1 in the entry corresponding to the largest entry of Σ^{-1}, and will be zero everywhere else. Thus

    max_{β ∈ R^R : ||β||_2 = 1} ||Σ^{-1} β||_2^2 = max_{r=1,...,R} 1/σ_r^2 = 1/σ_R^2.

(Recall that by convention, we order the singular values so that σ_1 ≥ σ_2 ≥ ... ≥ σ_R.) Returning to the reconstruction error (13), we now see that

    ||x̂_ls − x_pinv||_2^2 = ||VΣ^{-1}U^T e||_2^2 ≤ ||e||_2^2 / σ_R^2.

Since U is an M × R matrix, it is possible when R < M that the reconstruction error is zero. This happens when e is orthogonal to every column of U, i.e. U^T e = 0. Putting this together with the work above means

    0 ≤ ||U^T e||_2^2 / σ_1^2 ≤ ||x̂_ls − x_pinv||_2^2 ≤ ||U^T e||_2^2 / σ_R^2 ≤ ||e||_2^2 / σ_R^2.

Notice that if σ_R is small, the worst-case reconstruction error can be very bad.

We can also relate the average-case error to the singular values. Say that e is additive Gaussian white noise; that is, each entry e[m] is a random variable independent of all the other entries, and distributed e[m] ~ Normal(0, ν^2). Then, as we have argued before, the average measurement error is E[||e||_2^2] = M ν^2, and the average reconstruction error[4] is

    E[||A† e||_2^2] = ν^2 trace((A†)^T A†)
                    = ν^2 (1/σ_1^2 + 1/σ_2^2 + ... + 1/σ_R^2)
                    = (1/M)(1/σ_1^2 + 1/σ_2^2 + ... + 1/σ_R^2) E[||e||_2^2].

Again, if σ_R is tiny, 1/σ_R^2 will dominate the sum above, and the average reconstruction error will be quite large.

Exercise: Let D be a diagonal R × R matrix whose diagonal elements are positive. Show that the maximizer β̂ of

    maximize_{β ∈ R^R} ||Dβ||_2^2   subject to   ||β||_2 = 1

has a 1 in the entry corresponding to the largest diagonal element of D, and is 0 elsewhere.

[4] We are using the fact that if e is a vector of iid Gaussian random variables, e ~ Normal(0, ν^2 I), then for any matrix M, E[||Me||_2^2] = ν^2 trace(M^T M). We will argue this carefully as part of the next homework.
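Both the worst-case factor 1/σ_R^2 and the average factor involving the sum of the 1/σ_r^2 show up clearly in simulation. A sketch, assuming NumPy; the matrix (built with one deliberately tiny singular value), the noise level, and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(5)
M, N = 20, 5
nu = 1e-3                                   # noise standard deviation

# Build A = U diag(sigma) V^T with one tiny singular value.
U, _ = np.linalg.qr(rng.standard_normal((M, N)))
V, _ = np.linalg.qr(rng.standard_normal((N, N)))
sigma = np.array([1.0, 0.8, 0.5, 0.1, 1e-4])
A = U @ np.diag(sigma) @ V.T
A_pinv = np.linalg.pinv(A)

x0 = rng.standard_normal(N)
errs = []
for _ in range(2000):
    e = nu * rng.standard_normal(M)
    x_ls = A_pinv @ (A @ x0 + e)
    errs.append(np.sum((x_ls - x0) ** 2))   # here x_pinv = x0, since A has full column rank

print(np.mean(errs))                        # empirical average reconstruction error
print(nu**2 * np.sum(1.0 / sigma**2))       # predicted nu^2 (1/sigma_1^2 + ... + 1/sigma_R^2)
```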

Stable Reconstruction with the Truncated SVD

We have seen that if A has very small singular values and we apply the pseudo-inverse in the presence of noise, the results can be disastrous. But it doesn't have to be this way. There are several ways to stabilize the pseudo-inverse. We start by discussing the simplest one, where we simply cut out the part of the reconstruction which is causing the problems.

As before, we are given noisy indirect observations of a vector x through an M × N matrix A:

    y = Ax + e.    (15)

The matrix A has SVD A = UΣV^T and pseudo-inverse A† = VΣ^{-1}U^T. We can rewrite A as a sum of rank-1 matrices:

    A = ∑_{r=1}^{R} σ_r u_r v_r^T,

where R is the rank of A, the σ_r are the singular values, and u_r ∈ R^M and v_r ∈ R^N are columns of U and V, respectively. Similarly, we can write the pseudo-inverse as

    A† = ∑_{r=1}^{R} (1/σ_r) v_r u_r^T.

Given y as above, we can write the least-squares estimate of x from the noisy measurements as

    x̂_ls = A† y = ∑_{r=1}^{R} (1/σ_r) <y, u_r> v_r.    (16)

As we can see (and have seen before), if any one of the σ_r is very small, the least-squares reconstruction can be a disaster. A simple way to avoid this is to truncate the sum (16), leaving out the terms where σ_r is too small (1/σ_r is too big). Exactly how many terms to keep depends a great deal on the application, as there are competing interests: on the one hand, we want to ensure that each of the σ_r we include has an inverse of reasonable size; on the other, we want the reconstruction to be accurate (i.e. not to deviate from the noiseless least-squares solution by too much).

We form an approximation A' to A by taking

    A' = ∑_{r=1}^{R'} σ_r u_r v_r^T,

for some R' < R. Again, our final answer will depend on which R' we use, and choosing R' is oftentimes something of an art. It is clear that the approximation A' has rank R'. Note that the pseudo-inverse of A' is also a truncated sum:

    (A')† = ∑_{r=1}^{R'} (1/σ_r) v_r u_r^T.

Given noisy data y as in (15), we reconstruct x by applying the truncated pseudo-inverse to y:

    x̂_trunc = (A')† y = ∑_{r=1}^{R'} (1/σ_r) <y, u_r> v_r.

How good is this reconstruction? To answer this question, we will compare it to the noiseless least-squares reconstruction x_pinv = A† y_clean, where y_clean = Ax are noiseless measurements of x. The difference between these two is the reconstruction error (relative to x_pinv):

    x̂_trunc − x_pinv = (A')† y − A† A x
                      = (A')† A x + (A')† e − A† A x
                      = ((A')† − A†) A x + (A')† e.

Proceeding further, we can write the matrix (A')† − A† as

    (A')† − A† = − ∑_{r=R'+1}^{R} (1/σ_r) v_r u_r^T,

and so the first term in the reconstruction error can be written as

    ((A')† − A†) A x = − ∑_{r=R'+1}^{R} (1/σ_r) <Ax, u_r> v_r
                     = − ∑_{r=R'+1}^{R} (1/σ_r) ∑_{j=1}^{R} σ_j <x, v_j> <u_j, u_r> v_r
                     = − ∑_{r=R'+1}^{R} <x, v_r> v_r    (since <u_r, u_j> = 0 unless j = r).

The second term in the reconstruction error can also be expanded against the v_r:

    (A')† e = ∑_{r=1}^{R'} (1/σ_r) <e, u_r> v_r.

Combining these expressions, the reconstruction error can be written as

    x̂_trunc − x_pinv = ∑_{r=1}^{R'} (1/σ_r) <e, u_r> v_r − ∑_{k=R'+1}^{R} <x, v_k> v_k
                      = Noise error + Approximation error,

where the first sum is the noise error and the second is the approximation error. Since the v_r are mutually orthogonal, and the two sums run over disjoint index sets, the noise error and the approximation error will be orthogonal. Also,

    ||x̂_trunc − x_pinv||_2^2 = ||Noise error||_2^2 + ||Approximation error||_2^2
                             = ∑_{r=1}^{R'} (1/σ_r^2) <e, u_r>^2 + ∑_{r=R'+1}^{R} <x, v_r>^2.

The reconstruction error, then, is signal dependent and will depend on how much of the vector x is concentrated in the subspace spanned by v_{R'+1}, ..., v_R. We will lose everything in this subspace; if it contains a significant part of x, then there is not much least-squares can do for you.

The worst-case noise error occurs when e is aligned with u_{R'}:

    ||Noise error||_2^2 = ∑_{r=1}^{R'} (1/σ_r^2) <e, u_r>^2 ≤ (1/σ_{R'}^2) ||e||_2^2.

As seen before, if the error e is random, this bound is a bit pessimistic. Specifically, if each entry of e is an independent identically distributed Normal random variable with mean zero and variance ν^2, then the expected noise error in the reconstruction will be

    E[||Noise error||_2^2] = (1/M)(1/σ_1^2 + 1/σ_2^2 + ... + 1/σ_{R'}^2) E[||e||_2^2].
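The split into noise error and approximation error can be reproduced numerically by truncating the SVD sum at R' terms. A sketch under the same kind of synthetic setup as before (NumPy assumed; the matrix, the choice R' = 4, and all names are ours):

```python
import numpy as np

rng = np.random.default_rng(6)
M, N = 20, 5
U, _ = np.linalg.qr(rng.standard_normal((M, N)))
V, _ = np.linalg.qr(rng.standard_normal((N, N)))
sigma = np.array([1.0, 0.8, 0.5, 0.1, 1e-4])
A = U @ np.diag(sigma) @ V.T

x0 = rng.standard_normal(N)
e = 1e-3 * rng.standard_normal(M)
y = A @ x0 + e

Rp = 4                                        # keep the 4 largest singular values (R' = 4)
x_trunc = sum((1.0 / sigma[r]) * (U[:, r] @ y) * V[:, r] for r in range(Rp))
x_pinv = np.linalg.pinv(A) @ (A @ x0)         # noise-free least-squares reconstruction

noise_err = sum((U[:, r] @ e) ** 2 / sigma[r] ** 2 for r in range(Rp))
approx_err = sum((x0 @ V[:, r]) ** 2 for r in range(Rp, N))
print(np.sum((x_trunc - x_pinv) ** 2))        # total squared reconstruction error ...
print(noise_err + approx_err)                 # ... equals noise error + approximation error
```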

Stable Reconstruction using Tikhonov Regularization

Tikhonov[5] regularization is another way to stabilize the least-squares recovery. It has the nice features that: 1) it can be interpreted using optimization, and 2) it can be computed without direct knowledge of the SVD of A.

Recall that we motivated the pseudo-inverse by showing that x̂_ls = A† y is a solution to

    minimize_{x ∈ R^N} ||y − Ax||_2^2.    (17)

When A has full column rank, x̂_ls is the unique solution; otherwise it is the solution with smallest energy. When A has full column rank but has singular values which are very small, huge variations in x (in directions of the singular vectors v_k corresponding to the tiny σ_k) can have very little effect on the residual ||y − Ax||_2^2. As such, the solution to (17) can have wildly inaccurate components in the presence of even mild noise.

One way to counteract this problem is to modify (17) with a regularization term that penalizes the size of the solution ||x||_2^2 as well as the residual error ||y − Ax||_2^2:

    minimize_{x ∈ R^N} ||y − Ax||_2^2 + δ ||x||_2^2.    (18)

The parameter δ > 0 gives us a trade-off between accuracy and regularization.

[5] Andrey Tikhonov was a 20th-century Russian mathematician.

We want to choose δ small enough so that the residual for the solution of (18) is close to that of (17), and large enough so that the problem is well-conditioned.

Just as with (17), which is solved by applying the pseudo-inverse to y, we can write the solution to (18) in closed form. To see this, recall that we can decompose any x ∈ R^N as

    x = Vα + V_0 α_0,

where V is the N × R matrix (with orthonormal columns) used in the SVD of A, and V_0 is an N × (N − R) matrix whose columns are an orthogonal basis for the null space of A. This means that the columns of V_0 are orthogonal to each other and to all of the columns of V. Similarly, we can decompose y as

    y = Uβ + U_0 β_0,

where U is the M × R matrix used in the SVD of A, and the columns of U_0 are an orthogonal basis for the left null space of A (everything in R^M that is not in the range of A). For any x, we can write

    y − Ax = Uβ + U_0 β_0 − UΣV^T(Vα + V_0 α_0) = U(β − Σα) + U_0 β_0.

Since the columns of U are orthonormal, U^T U = I, and also U_0^T U_0 = I and U^T U_0 = 0, we have

    ||y − Ax||_2^2 = <U(β − Σα) + U_0 β_0, U(β − Σα) + U_0 β_0> = ||β − Σα||_2^2 + ||β_0||_2^2,

and

    ||x||_2^2 = ||α||_2^2 + ||α_0||_2^2.

Using these facts, we can write the functional in (18) as

    ||y − Ax||_2^2 + δ ||x||_2^2 = ||β − Σα||_2^2 + ||β_0||_2^2 + δ ||α||_2^2 + δ ||α_0||_2^2.    (19)

We want to choose the α and α_0 that minimize (19). It is clear that, just as in the standard least-squares problem, we need α_0 = 0. The part of the functional that depends on α can be rewritten as

    ||β − Σα||_2^2 + δ ||α||_2^2 = ∑_{k=1}^{R} (β[k] − σ_k α[k])^2 + δ α[k]^2.    (20)

We can minimize this sum simply by minimizing each term independently. Since

    d/dα[k] [ (β[k] − σ_k α[k])^2 + δ α[k]^2 ] = −2 σ_k β[k] + 2 σ_k^2 α[k] + 2 δ α[k],

we need

    α[k] = (σ_k / (σ_k^2 + δ)) β[k].

Putting this back in vector form, (20) is minimized by

    α̂_tik = (Σ^2 + δI)^{-1} Σ β,

and so the minimizer of (18) is

    x̂_tik = V α̂_tik = V (Σ^2 + δI)^{-1} Σ U^T y.    (21)
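The closed form (21) can be checked numerically. One convenient cross-check (our own device, not from these notes) is that (18) is an ordinary least-squares problem for the stacked matrix [A; √δ I] and stacked data [y; 0]; a NumPy sketch with sizes and names of our choosing:

```python
import numpy as np

rng = np.random.default_rng(7)
M, N, delta = 12, 6, 0.05
A = rng.standard_normal((M, N))
y = rng.standard_normal(M)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Damped multipliers s_r / (s_r^2 + delta), i.e. equation (21).
x_tik = Vt.T @ ((s / (s**2 + delta)) * (U.T @ y))

# Cross-check: minimize ||y - Ax||^2 + delta ||x||^2 as a stacked least-squares problem.
A_aug = np.vstack([A, np.sqrt(delta) * np.eye(N)])
y_aug = np.concatenate([y, np.zeros(N)])
x_check = np.linalg.lstsq(A_aug, y_aug, rcond=None)[0]
print(np.allclose(x_tik, x_check))
```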

We can get a better feel for what Tikhonov regularization is doing by comparing it directly to the pseudo-inverse. The least-squares reconstruction x̂_ls can be written as

    x̂_ls = VΣ^{-1}U^T y = ∑_{r=1}^{R} (1/σ_r) <y, u_r> v_r,

while the Tikhonov reconstruction x̂_tik derived above is

    x̂_tik = ∑_{r=1}^{R} (σ_r / (σ_r^2 + δ)) <y, u_r> v_r.    (22)

Notice that when σ_r is much larger than √δ,

    σ_r / (σ_r^2 + δ) ≈ 1/σ_r,

but when σ_r is much smaller than √δ,

    σ_r / (σ_r^2 + δ) ≈ 0.

Thus the Tikhonov reconstruction modifies the important parts (components where the σ_r are large) of the pseudo-inverse very little, while ensuring that the unimportant parts (components where the σ_r are small) affect the solution only by a very small amount. This damping of the singular values is illustrated below.

[Figure: the damped multipliers σ_r/(σ_r^2 + δ) plotted against σ_r for several values of δ, together with the least-squares multiplier 1/σ_r.]

Above, we see the damped multipliers σ_r/(σ_r^2 + δ) versus σ_r for δ = 0.1 (blue), δ = 0.05 (red), and δ = 0.01 (green). The black dotted line is 1/σ_r, the least-squares multiplier. Notice that for large σ_r (σ_r > 2√δ, say), the damping has almost no effect.

This damping makes the Tikhonov reconstruction exceptionally stable; large multipliers never appear in the reconstruction (22). In fact, it is easy to check that

    σ_r / (σ_r^2 + δ) ≤ 1/(2√δ),

no matter the value of σ_r.

Tikhonov Error Analysis

Given noisy observations y = A x_0 + e, how well will Tikhonov regularization work? The answer to this question depends on multiple factors, including the choice of δ, the nature of the perturbation e, and how well x_0 can be approximated using a linear combination of the singular vectors v_r corresponding to the large (relative to δ) singular values. Since a closed-form expression for the solution to (18) exists, we can quantify these trade-offs precisely.

We compare the Tikhonov reconstruction to the reconstruction we would obtain if we used standard least-squares on perfectly noise-free observations y_clean = A x_0.

This noise-free reconstruction can be written as

    x_pinv = A† y_clean = A† A x_0 = VΣ^{-1}U^T UΣV^T x_0 = VV^T x_0 = ∑_{r=1}^{R} <x_0, v_r> v_r.

The vector x_pinv is the orthogonal projection of x_0 onto the row space (everything orthogonal to the null space) of A. If A has full column rank, then x_pinv = x_0. If not, then the application of A destroys the part of x_0 that is not in x_pinv, and so we only attempt to recover the visible components. In some sense, x_pinv contains all of the components of x_0 that we could ever hope to recover, and has them preserved perfectly.

The Tikhonov regularized solution is given by

    x̂_tik = ∑_{r=1}^{R} (σ_r / (σ_r^2 + δ)) <y, u_r> v_r
          = ∑_{r=1}^{R} (σ_r / (σ_r^2 + δ)) <e, u_r> v_r + ∑_{r=1}^{R} (σ_r / (σ_r^2 + δ)) <A x_0, u_r> v_r
          = ∑_{r=1}^{R} (σ_r / (σ_r^2 + δ)) <e, u_r> v_r + ∑_{r=1}^{R} (σ_r^2 / (σ_r^2 + δ)) <x_0, v_r> v_r.

The reconstruction error, relative to the best possible reconstruction x_pinv, is then

    x̂_tik − x_pinv = ∑_{r=1}^{R} (σ_r / (σ_r^2 + δ)) <e, u_r> v_r + ∑_{r=1}^{R} (σ_r^2/(σ_r^2 + δ) − 1) <x_0, v_r> v_r
                    = ∑_{r=1}^{R} (σ_r / (σ_r^2 + δ)) <e, u_r> v_r − ∑_{r=1}^{R} (δ / (σ_r^2 + δ)) <x_0, v_r> v_r
                    = Noise error + Approximation error,

where the first sum is the noise error and the second is the approximation error. The approximation error is signal dependent, and depends on δ. Since the v_r are orthonormal,

    ||Approximation error||_2^2 = ∑_{r=1}^{R} (δ^2 / (σ_r^2 + δ)^2) <x_0, v_r>^2.

Note that for the components much smaller than δ, i.e. σ_r^2 << δ,

    δ^2 / (σ_r^2 + δ)^2 ≈ 1,

so this portion of the approximation error will be about the same as if we had simply truncated these components. For the large components, σ_r^2 >> δ,

    δ^2 / (σ_r^2 + δ)^2 ≈ δ^2 / σ_r^4,

and so this portion of the approximation error will be very small.

For the noise error energy, we have

    ||Noise error||_2^2 = ∑_{r=1}^{R} (σ_r / (σ_r^2 + δ))^2 <e, u_r>^2 ≤ (1/(4δ)) ∑_{r=1}^{R} <e, u_r>^2 ≤ (1/(4δ)) ||e||_2^2.

The worst-case error is more or less determined by the choice of δ. The regularization makes the effective condition number of A about 1/(2√δ); no matter how small the smallest singular value is, the noise energy will not increase by more than a factor of 1/(4δ) during the reconstruction process.

As usual, the average-case error is less pessimistic. If each entry of e is an independent identically distributed Normal random variable with mean zero and variance ν^2, then the expected noise error in the reconstruction will be

    E[||Noise error||_2^2] = ∑_{r=1}^{R} (σ_r / (σ_r^2 + δ))^2 E[<e, u_r>^2]
                           = ν^2 ∑_{r=1}^{R} σ_r^2 / (σ_r^2 + δ)^2
                           = (1/M) ( ∑_{r=1}^{R} σ_r^2 / (σ_r^2 + δ)^2 ) E[||e||_2^2].    (23)

Note that

    σ_r^2 / (σ_r^2 + δ)^2 ≤ min(1/σ_r^2, 1/(4δ)),

so we can think of the error in (23) as an average of the 1/σ_r^2, with the large values simply replaced by 1/(4δ).

A Closed Form Expression

Tikhonov regularization is in some sense very similar to the truncated SVD, but with one significant advantage: we do not need to explicitly calculate the SVD to solve (18). Indeed, the solution to (18) can be written as

    x̂_tik = (A^T A + δI)^{-1} A^T y.    (24)

To see that the expression above is equivalent to (21), note that we can write A^T A as

    A^T A = VΣ^2 V^T = V_full Σ̃^2 V_full^T,

where V_full = [V V_0] is the N × N orthogonal matrix from before, and the N × N diagonal matrix Σ̃ is simply Σ padded with zeros:

    Σ̃ = [Σ 0; 0 0].

The verification of (24) is now straightforward:

    (A^T A + δI)^{-1} A^T y = (V_full Σ̃^2 V_full^T + δI)^{-1} VΣU^T y
                            = V_full (Σ̃^2 + δI)^{-1} V_full^T VΣU^T y
                            = V_full [(Σ^2 + δI)^{-1} ΣU^T y; 0]
                            = V (Σ^2 + δI)^{-1} ΣU^T y
                            = x̂_tik.

The expression (24) holds for all M, N, and R. We will leave it as an exercise to show that

    (A^T A + δI)^{-1} A^T y = A^T (AA^T + δI)^{-1} y.
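Both (24) and the exercise identity are straightforward to confirm numerically. A sketch, assuming NumPy (the sizes and names are ours):

```python
import numpy as np

rng = np.random.default_rng(8)
M, N, delta = 10, 15, 0.1
A = rng.standard_normal((M, N))
y = rng.standard_normal(M)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ ((s / (s**2 + delta)) * (U.T @ y))                   # equation (21)/(22)

x_primal = np.linalg.solve(A.T @ A + delta * np.eye(N), A.T @ y)    # equation (24)
x_dual = A.T @ np.linalg.solve(A @ A.T + delta * np.eye(M), y)      # exercise identity

print(np.allclose(x_svd, x_primal), np.allclose(x_primal, x_dual))
```

The second form inverts an M × M matrix rather than an N × N one, which is the cheaper of the two when M < N.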

The importance of not needing to explicitly compute the SVD is significant when we are solving large problems. When A is large (M, N > 10^5, say) it may be expensive or even impossible to construct the SVD and compute with it explicitly. However, if A has special structure (if it is sparse, for example), then it may take many fewer than MN operations to compute a matrix-vector product Ax. In these situations, a matrix-free iterative algorithm can be used to perform the inverse required in (24). A prominent example of such an algorithm is conjugate gradients, which we will see later in this course.

Weighted Least Squares

Standard least-squares tries to fit a vector x to a set of measurements y by solving

    minimize_{x ∈ R^N} ||y − Ax||_2^2.

Now, what if some of the measurements are more reliable than others? Or, what if the errors are closely correlated between measurements? There is a systematic way to treat both of these cases using weighted least-squares. Instead of minimizing the energy in the residual,

    ||r||_2^2 = ||y − Ax||_2^2,

we will minimize

    ||Wr||_2^2 = ||Wy − WAx||_2^2,

for some M × M weighting matrix W.

When W is a diagonal matrix,

    W = diag(w_11, w_22, ..., w_MM),

then the error we are minimizing looks like

    ||Wr||_2^2 = w_11^2 r[1]^2 + w_22^2 r[2]^2 + ... + w_MM^2 r[M]^2.

By adjusting the w_mm, we can penalize some of the components of the error more than others. By adding off-diagonal terms, we can account for correlations in the error (we will explore this further later in these notes).

Solving

    minimize_{x ∈ R^N} ||Wr||_2^2 = minimize_{x ∈ R^N} ||Wy − WAx||_2^2

is simple: we use least-squares with WA as the matrix and Wy as the observations,

    x̂_wls = (WA)† Wy,

where (WA)† is the pseudo-inverse of WA. For the rest of this section, we will assume that M ≥ N (meaning that there are at least as many observations as unknowns) and that A has full column rank. This allows us to write

    x̂_wls = (A^T W^T W A)^{-1} A^T W^T W y.

Example: We measure a patient's pulse 3 times, and record

    y[1] = 70,   y[2] = 80,   y[3] = 120.

In this case, we can take

    A = [1 1 1]^T.

What is the least-squares estimate for the pulse rate x_0?

Now say that we were in a hurry when the third measurement was made, so we would like to weight it less than the others. What is the weighted least-squares estimate when

    W = diag(1, 1, w_33)?

What about the particular case when w_33 = 1/2?
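A quick numerical pass over this example, assuming NumPy; the code is a sketch of the formulas above, not part of the notes:

```python
import numpy as np

y = np.array([70.0, 80.0, 120.0])
A = np.ones((3, 1))                      # three measurements of a single pulse rate

# Ordinary least squares: the sample mean.
x_ls = np.linalg.pinv(A) @ y
print(x_ls)                              # about 90

# Weighted least squares with the third measurement down-weighted by w_33 = 1/2.
W = np.diag([1.0, 1.0, 0.5])
x_wls = np.linalg.pinv(W @ A) @ (W @ y)
print(x_wls)                             # (70 + 80 + 0.25 * 120) / 2.25 = 80
```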

IV. Matrix Approximation using Least-Squares

IV. Matrix Approximation using Least-Squares IV. Matrix Approximation using Least-Squares The SVD and Matrix Approximation We begin with the following fundamental question. Let A be an M N matrix with rank R. What is the closest matrix to A that

More information

Linear Systems. Carlo Tomasi. June 12, r = rank(a) b range(a) n r solutions

Linear Systems. Carlo Tomasi. June 12, r = rank(a) b range(a) n r solutions Linear Systems Carlo Tomasi June, 08 Section characterizes the existence and multiplicity of the solutions of a linear system in terms of the four fundamental spaces associated with the system s matrix

More information

THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR

THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR 1. Definition Existence Theorem 1. Assume that A R m n. Then there exist orthogonal matrices U R m m V R n n, values σ 1 σ 2... σ p 0 with p = min{m, n},

More information

Linear Systems. Carlo Tomasi

Linear Systems. Carlo Tomasi Linear Systems Carlo Tomasi Section 1 characterizes the existence and multiplicity of the solutions of a linear system in terms of the four fundamental spaces associated with the system s matrix and of

More information

EE731 Lecture Notes: Matrix Computations for Signal Processing

EE731 Lecture Notes: Matrix Computations for Signal Processing EE731 Lecture Notes: Matrix Computations for Signal Processing James P. Reilly c Department of Electrical and Computer Engineering McMaster University October 17, 005 Lecture 3 3 he Singular Value Decomposition

More information

Solutions to Final Practice Problems Written by Victoria Kala Last updated 12/5/2015

Solutions to Final Practice Problems Written by Victoria Kala Last updated 12/5/2015 Solutions to Final Practice Problems Written by Victoria Kala vtkala@math.ucsb.edu Last updated /5/05 Answers This page contains answers only. See the following pages for detailed solutions. (. (a x. See

More information

Singular Value Decomposition

Singular Value Decomposition Chapter 6 Singular Value Decomposition In Chapter 5, we derived a number of algorithms for computing the eigenvalues and eigenvectors of matrices A R n n. Having developed this machinery, we complete our

More information

. = V c = V [x]v (5.1) c 1. c k

. = V c = V [x]v (5.1) c 1. c k Chapter 5 Linear Algebra It can be argued that all of linear algebra can be understood using the four fundamental subspaces associated with a matrix Because they form the foundation on which we later work,

More information

Lecture notes: Applied linear algebra Part 1. Version 2

Lecture notes: Applied linear algebra Part 1. Version 2 Lecture notes: Applied linear algebra Part 1. Version 2 Michael Karow Berlin University of Technology karow@math.tu-berlin.de October 2, 2008 1 Notation, basic notions and facts 1.1 Subspaces, range and

More information

DS-GA 1002 Lecture notes 10 November 23, Linear models

DS-GA 1002 Lecture notes 10 November 23, Linear models DS-GA 2 Lecture notes November 23, 2 Linear functions Linear models A linear model encodes the assumption that two quantities are linearly related. Mathematically, this is characterized using linear functions.

More information

σ 11 σ 22 σ pp 0 with p = min(n, m) The σ ii s are the singular values. Notation change σ ii A 1 σ 2

σ 11 σ 22 σ pp 0 with p = min(n, m) The σ ii s are the singular values. Notation change σ ii A 1 σ 2 HE SINGULAR VALUE DECOMPOSIION he SVD existence - properties. Pseudo-inverses and the SVD Use of SVD for least-squares problems Applications of the SVD he Singular Value Decomposition (SVD) heorem For

More information

MATH36001 Generalized Inverses and the SVD 2015

MATH36001 Generalized Inverses and the SVD 2015 MATH36001 Generalized Inverses and the SVD 201 1 Generalized Inverses of Matrices A matrix has an inverse only if it is square and nonsingular. However there are theoretical and practical applications

More information

ECE 275A Homework #3 Solutions

ECE 275A Homework #3 Solutions ECE 75A Homework #3 Solutions. Proof of (a). Obviously Ax = 0 y, Ax = 0 for all y. To show sufficiency, note that if y, Ax = 0 for all y, then it must certainly be true for the particular value of y =

More information

Singular Value Decomposition

Singular Value Decomposition Singular Value Decomposition (Com S 477/577 Notes Yan-Bin Jia Sep, 7 Introduction Now comes a highlight of linear algebra. Any real m n matrix can be factored as A = UΣV T where U is an m m orthogonal

More information

Singular Value Decomposition

Singular Value Decomposition Singular Value Decomposition Motivatation The diagonalization theorem play a part in many interesting applications. Unfortunately not all matrices can be factored as A = PDP However a factorization A =

More information

Linear Algebra Primer

Linear Algebra Primer Linear Algebra Primer David Doria daviddoria@gmail.com Wednesday 3 rd December, 2008 Contents Why is it called Linear Algebra? 4 2 What is a Matrix? 4 2. Input and Output.....................................

More information

Notes on singular value decomposition for Math 54. Recall that if A is a symmetric n n matrix, then A has real eigenvalues A = P DP 1 A = P DP T.

Notes on singular value decomposition for Math 54. Recall that if A is a symmetric n n matrix, then A has real eigenvalues A = P DP 1 A = P DP T. Notes on singular value decomposition for Math 54 Recall that if A is a symmetric n n matrix, then A has real eigenvalues λ 1,, λ n (possibly repeated), and R n has an orthonormal basis v 1,, v n, where

More information

Maths for Signals and Systems Linear Algebra in Engineering

Maths for Signals and Systems Linear Algebra in Engineering Maths for Signals and Systems Linear Algebra in Engineering Lectures 13 15, Tuesday 8 th and Friday 11 th November 016 DR TANIA STATHAKI READER (ASSOCIATE PROFFESOR) IN SIGNAL PROCESSING IMPERIAL COLLEGE

More information

Singular Value Decomposition

Singular Value Decomposition Chapter 5 Singular Value Decomposition We now reach an important Chapter in this course concerned with the Singular Value Decomposition of a matrix A. SVD, as it is commonly referred to, is one of the

More information

The Singular Value Decomposition

The Singular Value Decomposition The Singular Value Decomposition An Important topic in NLA Radu Tiberiu Trîmbiţaş Babeş-Bolyai University February 23, 2009 Radu Tiberiu Trîmbiţaş ( Babeş-Bolyai University)The Singular Value Decomposition

More information

MATH 612 Computational methods for equation solving and function minimization Week # 2

MATH 612 Computational methods for equation solving and function minimization Week # 2 MATH 612 Computational methods for equation solving and function minimization Week # 2 Instructor: Francisco-Javier Pancho Sayas Spring 2014 University of Delaware FJS MATH 612 1 / 38 Plan for this week

More information

(a) If A is a 3 by 4 matrix, what does this tell us about its nullspace? Solution: dim N(A) 1, since rank(a) 3. Ax =

(a) If A is a 3 by 4 matrix, what does this tell us about its nullspace? Solution: dim N(A) 1, since rank(a) 3. Ax = . (5 points) (a) If A is a 3 by 4 matrix, what does this tell us about its nullspace? dim N(A), since rank(a) 3. (b) If we also know that Ax = has no solution, what do we know about the rank of A? C(A)

More information

Numerical Methods I Singular Value Decomposition

Numerical Methods I Singular Value Decomposition Numerical Methods I Singular Value Decomposition Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 9th, 2014 A. Donev (Courant Institute)

More information

Vector Spaces, Orthogonality, and Linear Least Squares

Vector Spaces, Orthogonality, and Linear Least Squares Week Vector Spaces, Orthogonality, and Linear Least Squares. Opening Remarks.. Visualizing Planes, Lines, and Solutions Consider the following system of linear equations from the opener for Week 9: χ χ

More information

Designing Information Devices and Systems II

Designing Information Devices and Systems II EECS 16B Fall 2016 Designing Information Devices and Systems II Linear Algebra Notes Introduction In this set of notes, we will derive the linear least squares equation, study the properties symmetric

More information

Linear Algebra Fundamentals

Linear Algebra Fundamentals Linear Algebra Fundamentals It can be argued that all of linear algebra can be understood using the four fundamental subspaces associated with a matrix. Because they form the foundation on which we later

More information

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A =

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = 30 MATHEMATICS REVIEW G A.1.1 Matrices and Vectors Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = a 11 a 12... a 1N a 21 a 22... a 2N...... a M1 a M2... a MN A matrix can

More information

Applied Mathematics 205. Unit II: Numerical Linear Algebra. Lecturer: Dr. David Knezevic

Applied Mathematics 205. Unit II: Numerical Linear Algebra. Lecturer: Dr. David Knezevic Applied Mathematics 205 Unit II: Numerical Linear Algebra Lecturer: Dr. David Knezevic Unit II: Numerical Linear Algebra Chapter II.3: QR Factorization, SVD 2 / 66 QR Factorization 3 / 66 QR Factorization

More information

The Singular Value Decomposition and Least Squares Problems

The Singular Value Decomposition and Least Squares Problems The Singular Value Decomposition and Least Squares Problems Tom Lyche Centre of Mathematics for Applications, Department of Informatics, University of Oslo September 27, 2009 Applications of SVD solving

More information

Bindel, Fall 2009 Matrix Computations (CS 6210) Week 8: Friday, Oct 17

Bindel, Fall 2009 Matrix Computations (CS 6210) Week 8: Friday, Oct 17 Logistics Week 8: Friday, Oct 17 1. HW 3 errata: in Problem 1, I meant to say p i < i, not that p i is strictly ascending my apologies. You would want p i > i if you were simply forming the matrices and

More information

Linear Algebra, part 3. Going back to least squares. Mathematical Models, Analysis and Simulation = 0. a T 1 e. a T n e. Anna-Karin Tornberg

Linear Algebra, part 3. Going back to least squares. Mathematical Models, Analysis and Simulation = 0. a T 1 e. a T n e. Anna-Karin Tornberg Linear Algebra, part 3 Anna-Karin Tornberg Mathematical Models, Analysis and Simulation Fall semester, 2010 Going back to least squares (Sections 1.7 and 2.3 from Strang). We know from before: The vector

More information

2. Linear algebra. matrices and vectors. linear equations. range and nullspace of matrices. function of vectors, gradient and Hessian

2. Linear algebra. matrices and vectors. linear equations. range and nullspace of matrices. function of vectors, gradient and Hessian FE661 - Statistical Methods for Financial Engineering 2. Linear algebra Jitkomut Songsiri matrices and vectors linear equations range and nullspace of matrices function of vectors, gradient and Hessian

More information

1 Linearity and Linear Systems

1 Linearity and Linear Systems Mathematical Tools for Neuroscience (NEU 34) Princeton University, Spring 26 Jonathan Pillow Lecture 7-8 notes: Linear systems & SVD Linearity and Linear Systems Linear system is a kind of mapping f( x)

More information

Review problems for MA 54, Fall 2004.

Review problems for MA 54, Fall 2004. Review problems for MA 54, Fall 2004. Below are the review problems for the final. They are mostly homework problems, or very similar. If you are comfortable doing these problems, you should be fine on

More information

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2017 LECTURE 5

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2017 LECTURE 5 STAT 39: MATHEMATICAL COMPUTATIONS I FALL 17 LECTURE 5 1 existence of svd Theorem 1 (Existence of SVD) Every matrix has a singular value decomposition (condensed version) Proof Let A C m n and for simplicity

More information

Stat 159/259: Linear Algebra Notes

Stat 159/259: Linear Algebra Notes Stat 159/259: Linear Algebra Notes Jarrod Millman November 16, 2015 Abstract These notes assume you ve taken a semester of undergraduate linear algebra. In particular, I assume you are familiar with the

More information

14 Singular Value Decomposition

14 Singular Value Decomposition 14 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing

More information

Introduction to Numerical Linear Algebra II

Introduction to Numerical Linear Algebra II Introduction to Numerical Linear Algebra II Petros Drineas These slides were prepared by Ilse Ipsen for the 2015 Gene Golub SIAM Summer School on RandNLA 1 / 49 Overview We will cover this material in

More information

Chapter 3 Transformations

Chapter 3 Transformations Chapter 3 Transformations An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Linear Transformations A function is called a linear transformation if 1. for every and 2. for every If we fix the bases

More information

Glossary of Linear Algebra Terms. Prepared by Vince Zaccone For Campus Learning Assistance Services at UCSB

Glossary of Linear Algebra Terms. Prepared by Vince Zaccone For Campus Learning Assistance Services at UCSB Glossary of Linear Algebra Terms Basis (for a subspace) A linearly independent set of vectors that spans the space Basic Variable A variable in a linear system that corresponds to a pivot column in the

More information

UNIT 6: The singular value decomposition.

UNIT 6: The singular value decomposition. UNIT 6: The singular value decomposition. María Barbero Liñán Universidad Carlos III de Madrid Bachelor in Statistics and Business Mathematical methods II 2011-2012 A square matrix is symmetric if A T

More information

Math 4A Notes. Written by Victoria Kala Last updated June 11, 2017

Math 4A Notes. Written by Victoria Kala Last updated June 11, 2017 Math 4A Notes Written by Victoria Kala vtkala@math.ucsb.edu Last updated June 11, 2017 Systems of Linear Equations A linear equation is an equation that can be written in the form a 1 x 1 + a 2 x 2 +...

More information

The Singular Value Decomposition

The Singular Value Decomposition The Singular Value Decomposition Philippe B. Laval KSU Fall 2015 Philippe B. Laval (KSU) SVD Fall 2015 1 / 13 Review of Key Concepts We review some key definitions and results about matrices that will

More information

MIT Final Exam Solutions, Spring 2017

MIT Final Exam Solutions, Spring 2017 MIT 8.6 Final Exam Solutions, Spring 7 Problem : For some real matrix A, the following vectors form a basis for its column space and null space: C(A) = span,, N(A) = span,,. (a) What is the size m n of

More information

Singular Value Decomposition (SVD)

Singular Value Decomposition (SVD) School of Computing National University of Singapore CS CS524 Theoretical Foundations of Multimedia More Linear Algebra Singular Value Decomposition (SVD) The highpoint of linear algebra Gilbert Strang

More information

MATH 20F: LINEAR ALGEBRA LECTURE B00 (T. KEMP)

MATH 20F: LINEAR ALGEBRA LECTURE B00 (T. KEMP) MATH 20F: LINEAR ALGEBRA LECTURE B00 (T KEMP) Definition 01 If T (x) = Ax is a linear transformation from R n to R m then Nul (T ) = {x R n : T (x) = 0} = Nul (A) Ran (T ) = {Ax R m : x R n } = {b R m

More information

Proposition 42. Let M be an m n matrix. Then (32) N (M M)=N (M) (33) R(MM )=R(M)

Proposition 42. Let M be an m n matrix. Then (32) N (M M)=N (M) (33) R(MM )=R(M) RODICA D. COSTIN. Singular Value Decomposition.1. Rectangular matrices. For rectangular matrices M the notions of eigenvalue/vector cannot be defined. However, the products MM and/or M M (which are square,

More information

Orthogonality. 6.1 Orthogonal Vectors and Subspaces. Chapter 6

Orthogonality. 6.1 Orthogonal Vectors and Subspaces. Chapter 6 Chapter 6 Orthogonality 6.1 Orthogonal Vectors and Subspaces Recall that if nonzero vectors x, y R n are linearly independent then the subspace of all vectors αx + βy, α, β R (the space spanned by x and

More information

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.

More information

Parallel Singular Value Decomposition. Jiaxing Tan

Parallel Singular Value Decomposition. Jiaxing Tan Parallel Singular Value Decomposition Jiaxing Tan Outline What is SVD? How to calculate SVD? How to parallelize SVD? Future Work What is SVD? Matrix Decomposition Eigen Decomposition A (non-zero) vector

More information

Least squares: the big idea

Least squares: the big idea Notes for 2016-02-22 Least squares: the big idea Least squares problems are a special sort of minimization problem. Suppose A R m n where m > n. In general, we cannot solve the overdetermined system Ax

More information

Singular Value Decomposition

Singular Value Decomposition Singular Value Decomposition CS 205A: Mathematical Methods for Robotics, Vision, and Graphics Doug James (and Justin Solomon) CS 205A: Mathematical Methods Singular Value Decomposition 1 / 35 Understanding

More information

B553 Lecture 5: Matrix Algebra Review

B553 Lecture 5: Matrix Algebra Review B553 Lecture 5: Matrix Algebra Review Kris Hauser January 19, 2012 We have seen in prior lectures how vectors represent points in R n and gradients of functions. Matrices represent linear transformations

More information

MODULE 8 Topics: Null space, range, column space, row space and rank of a matrix

MODULE 8 Topics: Null space, range, column space, row space and rank of a matrix MODULE 8 Topics: Null space, range, column space, row space and rank of a matrix Definition: Let L : V 1 V 2 be a linear operator. The null space N (L) of L is the subspace of V 1 defined by N (L) = {x

More information

Linear Algebra, part 3 QR and SVD

Linear Algebra, part 3 QR and SVD Linear Algebra, part 3 QR and SVD Anna-Karin Tornberg Mathematical Models, Analysis and Simulation Fall semester, 2012 Going back to least squares (Section 1.4 from Strang, now also see section 5.2). We

More information

CS 143 Linear Algebra Review

CS 143 Linear Algebra Review CS 143 Linear Algebra Review Stefan Roth September 29, 2003 Introductory Remarks This review does not aim at mathematical rigor very much, but instead at ease of understanding and conciseness. Please see

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 1: Course Overview & Matrix-Vector Multiplication Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 20 Outline 1 Course

More information

Vector and Matrix Norms. Vector and Matrix Norms

Vector and Matrix Norms. Vector and Matrix Norms Vector and Matrix Norms Vector Space Algebra Matrix Algebra: We let x x and A A, where, if x is an element of an abstract vector space n, and A = A: n m, then x is a complex column vector of length n whose

More information

Notes on Eigenvalues, Singular Values and QR

Notes on Eigenvalues, Singular Values and QR Notes on Eigenvalues, Singular Values and QR Michael Overton, Numerical Computing, Spring 2017 March 30, 2017 1 Eigenvalues Everyone who has studied linear algebra knows the definition: given a square

More information

GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil

GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis Massimiliano Pontil 1 Today s plan SVD and principal component analysis (PCA) Connection

More information

Problem set 5: SVD, Orthogonal projections, etc.

Problem set 5: SVD, Orthogonal projections, etc. Problem set 5: SVD, Orthogonal projections, etc. February 21, 2017 1 SVD 1. Work out again the SVD theorem done in the class: If A is a real m n matrix then here exist orthogonal matrices such that where

More information

The Singular Value Decomposition (SVD) and Principal Component Analysis (PCA)

The Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) Chapter 5 The Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) 5.1 Basics of SVD 5.1.1 Review of Key Concepts We review some key definitions and results about matrices that will

More information

Final Review Written by Victoria Kala SH 6432u Office Hours R 12:30 1:30pm Last Updated 11/30/2015

Final Review Written by Victoria Kala SH 6432u Office Hours R 12:30 1:30pm Last Updated 11/30/2015 Final Review Written by Victoria Kala vtkala@mathucsbedu SH 6432u Office Hours R 12:30 1:30pm Last Updated 11/30/2015 Summary This review contains notes on sections 44 47, 51 53, 61, 62, 65 For your final,

More information

Principal Component Analysis

Principal Component Analysis Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used

More information

be a Householder matrix. Then prove the followings H = I 2 uut Hu = (I 2 uu u T u )u = u 2 uut u

be a Householder matrix. Then prove the followings H = I 2 uut Hu = (I 2 uu u T u )u = u 2 uut u MATH 434/534 Theoretical Assignment 7 Solution Chapter 7 (71) Let H = I 2uuT Hu = u (ii) Hv = v if = 0 be a Householder matrix Then prove the followings H = I 2 uut Hu = (I 2 uu )u = u 2 uut u = u 2u =

More information

1. Background: The SVD and the best basis (questions selected from Ch. 6- Can you fill in the exercises?)

1. Background: The SVD and the best basis (questions selected from Ch. 6- Can you fill in the exercises?) Math 35 Exam Review SOLUTIONS Overview In this third of the course we focused on linear learning algorithms to model data. summarize: To. Background: The SVD and the best basis (questions selected from

More information

7 Principal Component Analysis

7 Principal Component Analysis 7 Principal Component Analysis This topic will build a series of techniques to deal with high-dimensional data. Unlike regression problems, our goal is not to predict a value (the y-coordinate), it is

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

The University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013.

The University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013. The University of Texas at Austin Department of Electrical and Computer Engineering EE381V: Large Scale Learning Spring 2013 Assignment Two Caramanis/Sanghavi Due: Tuesday, Feb. 19, 2013. Computational

More information

Chapter 6 - Orthogonality

Chapter 6 - Orthogonality Chapter 6 - Orthogonality Maggie Myers Robert A. van de Geijn The University of Texas at Austin Orthogonality Fall 2009 http://z.cs.utexas.edu/wiki/pla.wiki/ 1 Orthogonal Vectors and Subspaces http://z.cs.utexas.edu/wiki/pla.wiki/

More information

AM 205: lecture 8. Last time: Cholesky factorization, QR factorization Today: how to compute the QR factorization, the Singular Value Decomposition

AM 205: lecture 8. Last time: Cholesky factorization, QR factorization Today: how to compute the QR factorization, the Singular Value Decomposition AM 205: lecture 8 Last time: Cholesky factorization, QR factorization Today: how to compute the QR factorization, the Singular Value Decomposition QR Factorization A matrix A R m n, m n, can be factorized

More information

MATH 2331 Linear Algebra. Section 2.1 Matrix Operations. Definition: A : m n, B : n p. Example: Compute AB, if possible.

MATH 2331 Linear Algebra. Section 2.1 Matrix Operations. Definition: A : m n, B : n p. Example: Compute AB, if possible. MATH 2331 Linear Algebra Section 2.1 Matrix Operations Definition: A : m n, B : n p ( 1 2 p ) ( 1 2 p ) AB = A b b b = Ab Ab Ab Example: Compute AB, if possible. 1 Row-column rule: i-j-th entry of AB:

More information

Matrix decompositions

Matrix decompositions Matrix decompositions Zdeněk Dvořák May 19, 2015 Lemma 1 (Schur decomposition). If A is a symmetric real matrix, then there exists an orthogonal matrix Q and a diagonal matrix D such that A = QDQ T. The

More information

Problem # Max points possible Actual score Total 120

Problem # Max points possible Actual score Total 120 FINAL EXAMINATION - MATH 2121, FALL 2017. Name: ID#: Email: Lecture & Tutorial: Problem # Max points possible Actual score 1 15 2 15 3 10 4 15 5 15 6 15 7 10 8 10 9 15 Total 120 You have 180 minutes to

More information

Dimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas

Dimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas Dimensionality Reduction: PCA Nicholas Ruozzi University of Texas at Dallas Eigenvalues λ is an eigenvalue of a matrix A R n n if the linear system Ax = λx has at least one non-zero solution If Ax = λx

More information

2. Every linear system with the same number of equations as unknowns has a unique solution.

2. Every linear system with the same number of equations as unknowns has a unique solution. 1. For matrices A, B, C, A + B = A + C if and only if A = B. 2. Every linear system with the same number of equations as unknowns has a unique solution. 3. Every linear system with the same number of equations

More information

Math Final December 2006 C. Robinson

Math Final December 2006 C. Robinson Math 285-1 Final December 2006 C. Robinson 2 5 8 5 1 2 0-1 0 1. (21 Points) The matrix A = 1 2 2 3 1 8 3 2 6 has the reduced echelon form U = 0 0 1 2 0 0 0 0 0 1. 2 6 1 0 0 0 0 0 a. Find a basis for the

More information

Pseudoinverse & Orthogonal Projection Operators

Pseudoinverse & Orthogonal Projection Operators Pseudoinverse & Orthogonal Projection Operators ECE 174 Linear & Nonlinear Optimization Ken Kreutz-Delgado ECE Department, UC San Diego Ken Kreutz-Delgado (UC San Diego) ECE 174 Fall 2016 1 / 48 The Four

More information

Linear Least Squares. Using SVD Decomposition.

Linear Least Squares. Using SVD Decomposition. Linear Least Squares. Using SVD Decomposition. Dmitriy Leykekhman Spring 2011 Goals SVD-decomposition. Solving LLS with SVD-decomposition. D. Leykekhman Linear Least Squares 1 SVD Decomposition. For any

More information

Solutions to Review Problems for Chapter 6 ( ), 7.1

Solutions to Review Problems for Chapter 6 ( ), 7.1 Solutions to Review Problems for Chapter (-, 7 The Final Exam is on Thursday, June,, : AM : AM at NESBITT Final Exam Breakdown Sections % -,7-9,- - % -9,-,7,-,-7 - % -, 7 - % Let u u and v Let x x x x,

More information

Singular Value Decomposition and Principal Component Analysis (PCA) I

Singular Value Decomposition and Principal Component Analysis (PCA) I Singular Value Decomposition and Principal Component Analysis (PCA) I Prof Ned Wingreen MOL 40/50 Microarray review Data per array: 0000 genes, I (green) i,i (red) i 000 000+ data points! The expression

More information

Large Scale Data Analysis Using Deep Learning

Large Scale Data Analysis Using Deep Learning Large Scale Data Analysis Using Deep Learning Linear Algebra U Kang Seoul National University U Kang 1 In This Lecture Overview of linear algebra (but, not a comprehensive survey) Focused on the subset

More information

1. Addition: To every pair of vectors x, y X corresponds an element x + y X such that the commutative and associative properties hold

1. Addition: To every pair of vectors x, y X corresponds an element x + y X such that the commutative and associative properties hold Appendix B Y Mathematical Refresher This appendix presents mathematical concepts we use in developing our main arguments in the text of this book. This appendix can be read in the order in which it appears,

More information

Math 224, Fall 2007 Exam 3 Thursday, December 6, 2007

Math 224, Fall 2007 Exam 3 Thursday, December 6, 2007 Math 224, Fall 2007 Exam 3 Thursday, December 6, 2007 You have 1 hour and 20 minutes. No notes, books, or other references. You are permitted to use Maple during this exam, but you must start with a blank

More information

Cheat Sheet for MATH461

Cheat Sheet for MATH461 Cheat Sheet for MATH46 Here is the stuff you really need to remember for the exams Linear systems Ax = b Problem: We consider a linear system of m equations for n unknowns x,,x n : For a given matrix A

More information

Numerical Methods. Elena loli Piccolomini. Civil Engeneering. piccolom. Metodi Numerici M p. 1/??

Numerical Methods. Elena loli Piccolomini. Civil Engeneering.  piccolom. Metodi Numerici M p. 1/?? Metodi Numerici M p. 1/?? Numerical Methods Elena loli Piccolomini Civil Engeneering http://www.dm.unibo.it/ piccolom elena.loli@unibo.it Metodi Numerici M p. 2/?? Least Squares Data Fitting Measurement

More information

2. Review of Linear Algebra

2. Review of Linear Algebra 2. Review of Linear Algebra ECE 83, Spring 217 In this course we will represent signals as vectors and operators (e.g., filters, transforms, etc) as matrices. This lecture reviews basic concepts from linear

More information

Notes on Solving Linear Least-Squares Problems

Notes on Solving Linear Least-Squares Problems Notes on Solving Linear Least-Squares Problems Robert A. van de Geijn The University of Texas at Austin Austin, TX 7871 October 1, 14 NOTE: I have not thoroughly proof-read these notes!!! 1 Motivation

More information

Background Mathematics (2/2) 1. David Barber

Background Mathematics (2/2) 1. David Barber Background Mathematics (2/2) 1 David Barber University College London Modified by Samson Cheung (sccheung@ieee.org) 1 These slides accompany the book Bayesian Reasoning and Machine Learning. The book and

More information

Math 407: Linear Optimization

Math 407: Linear Optimization Math 407: Linear Optimization Lecture 16: The Linear Least Squares Problem II Math Dept, University of Washington February 28, 2018 Lecture 16: The Linear Least Squares Problem II (Math Dept, University

More information

Pseudoinverse & Moore-Penrose Conditions

Pseudoinverse & Moore-Penrose Conditions ECE 275AB Lecture 7 Fall 2008 V1.0 c K. Kreutz-Delgado, UC San Diego p. 1/1 Lecture 7 ECE 275A Pseudoinverse & Moore-Penrose Conditions ECE 275AB Lecture 7 Fall 2008 V1.0 c K. Kreutz-Delgado, UC San Diego

More information

Introduction to Compressed Sensing

Introduction to Compressed Sensing Introduction to Compressed Sensing Alejandro Parada, Gonzalo Arce University of Delaware August 25, 2016 Motivation: Classical Sampling 1 Motivation: Classical Sampling Issues Some applications Radar Spectral

More information

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for 1 A cautionary tale Notes for 2016-10-05 You have been dropped on a desert island with a laptop with a magic battery of infinite life, a MATLAB license, and a complete lack of knowledge of basic geometry.

More information

STA141C: Big Data & High Performance Statistical Computing

STA141C: Big Data & High Performance Statistical Computing STA141C: Big Data & High Performance Statistical Computing Numerical Linear Algebra Background Cho-Jui Hsieh UC Davis May 15, 2018 Linear Algebra Background Vectors A vector has a direction and a magnitude

More information

Singular Value Decompsition

Singular Value Decompsition Singular Value Decompsition Massoud Malek One of the most useful results from linear algebra, is a matrix decomposition known as the singular value decomposition It has many useful applications in almost

More information

Properties of Matrices and Operations on Matrices

Properties of Matrices and Operations on Matrices Properties of Matrices and Operations on Matrices A common data structure for statistical analysis is a rectangular array or matris. Rows represent individual observational units, or just observations,

More information

Linear Algebra - Part II

Linear Algebra - Part II Linear Algebra - Part II Projection, Eigendecomposition, SVD (Adapted from Sargur Srihari s slides) Brief Review from Part 1 Symmetric Matrix: A = A T Orthogonal Matrix: A T A = AA T = I and A 1 = A T

More information

Solution of Linear Equations

Solution of Linear Equations Solution of Linear Equations (Com S 477/577 Notes) Yan-Bin Jia Sep 7, 07 We have discussed general methods for solving arbitrary equations, and looked at the special class of polynomial equations A subclass

More information

Matrix Algebra for Engineers Jeffrey R. Chasnov

Matrix Algebra for Engineers Jeffrey R. Chasnov Matrix Algebra for Engineers Jeffrey R. Chasnov The Hong Kong University of Science and Technology The Hong Kong University of Science and Technology Department of Mathematics Clear Water Bay, Kowloon

More information

Computational math: Assignment 1

Computational math: Assignment 1 Computational math: Assignment 1 Thanks Ting Gao for her Latex file 11 Let B be a 4 4 matrix to which we apply the following operations: 1double column 1, halve row 3, 3add row 3 to row 1, 4interchange

More information