THE SINGULAR VALUE DECOMPOSITION

MARKUS GRASMAIR

Date: November 2016, revised in October 2017.

1. Definition and Existence

Theorem 1. Assume that $A \in \mathbb{R}^{m \times n}$. Then there exist orthogonal matrices $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$, and values $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_p \ge 0$ with $p = \min\{m,n\}$, such that
\[
A = U \Sigma V^T,
\]
where $\Sigma \in \mathbb{R}^{m \times n}$ is a diagonal matrix with diagonal entries $\sigma_1, \dots, \sigma_p$, that is,
\[
\Sigma = \begin{pmatrix}
\sigma_1 & & & 0 & \cdots & 0\\
& \ddots & & \vdots & & \vdots\\
0 & & \sigma_m & 0 & \cdots & 0
\end{pmatrix}
\quad\text{in case } m \le n,
\qquad
\Sigma = \begin{pmatrix}
\sigma_1 & & 0\\
& \ddots & \\
0 & & \sigma_n\\
0 & \cdots & 0\\
\vdots & & \vdots\\
0 & \cdots & 0
\end{pmatrix}
\quad\text{in case } m \ge n.
\]
The values $\sigma_k$ are uniquely determined by $A$ and are called the singular values of $A$.

Idea of proof. We assume without loss of generality that $A \neq 0$, since otherwise the assertion is trivial (we may choose $\Sigma = 0$ and arbitrary orthogonal matrices $U$ and $V$). Moreover, we note that the decomposition $A = U\Sigma V^T$ is equivalent to the statement that $U^T A V$ is a diagonal matrix with non-negative, decreasing diagonal entries. We now recall that the 2-norm of the matrix $A$ is defined as
\[
(1)\qquad \|A\|_2 := \max_{\|x\|_2 = 1} \|Ax\|_2.
\]
Let $x \in \mathbb{R}^n$ be any vector with $\|x\|_2 = 1$ at which the maximum in (1) is attained. Define moreover
\[
\sigma_1 := \|A\|_2 = \|Ax\|_2 \qquad\text{and}\qquad y := \frac{Ax}{\|Ax\|_2} = \frac{Ax}{\sigma_1}.
\]
Complete $x$ to an orthonormal basis $V = (x\ v_2\ \dots\ v_n)$ of $\mathbb{R}^n$, and $y$ to an orthonormal basis $U = (y\ u_2\ \dots\ u_m)$ of $\mathbb{R}^m$. Then the product $U^T A V$ is of the form
\[
U^T A V = \begin{pmatrix} \sigma_1 & w^T \\ 0 & \hat{A} \end{pmatrix} =: B
\]
for some vector $w \in \mathbb{R}^{n-1}$ and a matrix $\hat{A} \in \mathbb{R}^{(m-1)\times(n-1)}$. Now let
\[
z = \begin{pmatrix} \sigma_1 \\ w \end{pmatrix}.
\]
Then
\[
Bz = \begin{pmatrix} \sigma_1^2 + w^T w \\ \hat{A}w \end{pmatrix}
\]
and therefore
\[
\|Bz\|_2^2 = (\sigma_1^2 + w^T w)^2 + \|\hat{A}w\|_2^2 \ge (\sigma_1^2 + w^T w)^2 = (\sigma_1^2 + w^T w)\,\|z\|_2^2 \ge \sigma_1^2 \|z\|_2^2,
\]
with equality holding only if $w = 0$. However, since $U$ and $V$ are orthogonal, we have $\|B\|_2 = \|A\|_2 = \sigma_1$ and therefore
\[
\|Bz\|_2^2 \le \sigma_1^2 \|z\|_2^2.
\]
Therefore $w = 0$ and we have
\[
U^T A V = \begin{pmatrix} \sigma_1 & 0 \\ 0 & \hat{A} \end{pmatrix}.
\]
Using induction over $p$ (or: applying the same idea to $\hat{A}$), we arrive at the claimed decomposition. Note here that the numbers $\sigma_k$ are indeed decreasing, as $\sigma_1 = \|A\|_2 \ge \|\hat{A}\|_2 = \sigma_2$.

Remark 2. While this theorem contains a constructive proof of the existence of the SVD, it is not that useful if one actually wants to compute an SVD, either analytically or numerically. Better methods for the analytic computation will be discussed below in Section 2, while the numerical computation will be discussed later in this class.
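In practice one therefore relies on library routines. The following NumPy sketch (not part of the original notes; it assumes NumPy and uses a random test matrix) illustrates the statement of Theorem 1 numerically: np.linalg.svd returns $U$, the singular values in decreasing order, and $V^T$, and the product $U\Sigma V^T$ reproduces $A$ up to rounding errors.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 3))                  # example matrix with m = 5, n = 3

    U, s, Vt = np.linalg.svd(A)                      # full SVD: U is 5x5, Vt is 3x3
    Sigma = np.zeros(A.shape)
    Sigma[:len(s), :len(s)] = np.diag(s)             # embed the singular values into a 5x3 matrix

    assert np.all(np.diff(s) <= 0) and np.all(s >= 0)    # sigma_1 >= ... >= sigma_p >= 0
    assert np.allclose(U @ U.T, np.eye(5))               # U orthogonal
    assert np.allclose(Vt @ Vt.T, np.eye(3))             # V orthogonal
    assert np.allclose(U @ Sigma @ Vt, A)                # A = U Sigma V^T

    # the reduced decomposition discussed next is obtained with full_matrices=False:
    U_hat, s_hat, Vt_hat = np.linalg.svd(A, full_matrices=False)   # U_hat is 5x3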

From the previous theorem it follows that we can write
\[
A = U\,\Sigma\,V^T, \qquad A \in \mathbb{R}^{m\times n},\ U \in \mathbb{R}^{m\times m},\ \Sigma \in \mathbb{R}^{m\times n},\ V \in \mathbb{R}^{n\times n}.
\]
However, in particular if $m$ and $n$ are very different, this decomposition of $A$ contains lots of zero columns or rows in $\Sigma$, which make the last columns of either $U$ or $V$ redundant. It can thus be an advantage to use instead a reduced singular value decomposition, either of the form
\[
A = \hat{U}\hat{\Sigma}V^T \quad\text{in case } m > n, \qquad\text{with}\quad
\hat{\Sigma} = \begin{pmatrix} \sigma_1 & & 0\\ & \ddots & \\ 0 & & \sigma_n \end{pmatrix}
\quad\text{and}\quad
\hat{U} = (u_1\ u_2\ \dots\ u_n) \in \mathbb{R}^{m\times n},
\]
or of the form
\[
A = U\hat{\Sigma}\hat{V}^T \quad\text{in case } m < n, \qquad\text{with}\quad
\hat{\Sigma} = \begin{pmatrix} \sigma_1 & & 0\\ & \ddots & \\ 0 & & \sigma_m \end{pmatrix}
\quad\text{and}\quad
\hat{V} = (v_1\ v_2\ \dots\ v_m) \in \mathbb{R}^{n\times m}.
\]
That is, we reduce the rectangular matrix $\Sigma$ to a square matrix containing the singular values, and remove all the redundant columns from either $U$ or $V$, depending on which is the bigger matrix. In the following, we will always use the reduced singular value decomposition, and simply write this reduced decomposition as $A = U\Sigma V^T$. However, it is always necessary to keep in mind that one of the matrices $U$ and $V$ will be rectangular.

2. Interpretation of the SVD

Assume that $A \in \mathbb{R}^{m\times n}$ has the singular value decomposition $A = U\Sigma V^T$. As a consequence, we have
\[
A^T = V\Sigma^T U^T = V\Sigma U^T,
\]
which is a singular value decomposition of $A^T$. In particular, this implies that $A$ and $A^T$ have the same singular values, which in turn implies that $\|A\|_2 = \|A^T\|_2$.

Next we note that the matrices $U^T U$ and $V^T V$ are always the identity matrices on the respective spaces. That is, if $m \ge n$ we have $U \in \mathbb{R}^{m\times n}$, $V \in \mathbb{R}^{n\times n}$, and
\[
U^T U = I_{n\times n} = V^T V,
\]
whereas if $n \ge m$ we have $U \in \mathbb{R}^{m\times m}$, $V \in \mathbb{R}^{n\times m}$, and
\[
U^T U = I_{m\times m} = V^T V.
\]
Thus,
\[
A^T A = V\Sigma U^T U\Sigma V^T = V\Sigma^2 V^T \qquad\text{and}\qquad AA^T = U\Sigma V^T V\Sigma U^T = U\Sigma^2 U^T.
\]
This shows that the values $\sigma_1^2, \dots, \sigma_r^2$ are exactly the non-zero eigenvalues of both of the matrices $A^T A$ and $AA^T$, with corresponding eigenvectors $v_k$ (for $A^T A$) and $u_k$ (for $AA^T$), respectively:

Lemma 3. The non-zero singular values of $A \in \mathbb{R}^{m\times n}$ are precisely the square roots of the non-zero eigenvalues of either of the positive semi-definite matrices $A^T A$ and $AA^T$.

For symmetric matrices, this immediately implies the following connection between singular values and eigenvalues:

Corollary 4. If $A = A^T \in \mathbb{R}^{n\times n}$ is a symmetric matrix, then its singular values are the absolute values of its eigenvalues. If $A$ is symmetric positive definite, then its singular values and eigenvalues coincide.

The results above allow us to compute the singular value decomposition of a matrix $A$ by computing an eigenvalue decomposition of either $A^T A$ or $AA^T$ (depending on which of these is easier to compute, that is, which of these is the smaller matrix); Example 5 below carries this out by hand.
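Before working through Example 5 by hand, here is a purely illustrative NumPy check of Lemma 3 (not part of the notes): the squared singular values agree with the non-zero eigenvalues of $A^T A$ and of $AA^T$.

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((4, 6))

    s = np.linalg.svd(A, compute_uv=False)         # singular values, decreasing
    lam1 = np.linalg.eigvalsh(A.T @ A)[::-1]       # eigenvalues of A^T A, decreasing
    lam2 = np.linalg.eigvalsh(A @ A.T)[::-1]       # eigenvalues of A A^T, decreasing

    # the non-zero eigenvalues coincide and equal the squared singular values
    assert np.allclose(s ** 2, lam2)               # A A^T is 4x4: all of its eigenvalues
    assert np.allclose(s ** 2, lam1[:len(s)])      # the 4 largest eigenvalues of A^T A
    assert np.allclose(lam1[len(s):], 0.0)         # the remaining eigenvalues of A^T A vanish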

Example 5. Consider the matrix
\[
A = \begin{pmatrix} 1 & c \\ 0 & 1 \end{pmatrix}
\]
with $c \neq 0$. This matrix has the single eigenvalue $\lambda_1 = 1$ of geometric multiplicity one, with corresponding eigenvector $(1, 0)^T$. A possible Jordan decomposition of $A$ reads
\[
A = \begin{pmatrix} c & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} 1/c & 0 \\ 0 & 1 \end{pmatrix}.
\]
Next we will compute a singular value decomposition of $A$. To that end, we will first compute the eigenvalues and eigenvectors of $AA^T$ (doing the computations with $A^T A$ would work just as well). We have
\[
AA^T = \begin{pmatrix} 1 + c^2 & c \\ c & 1 \end{pmatrix}
\]
with eigenvalues
\[
\lambda_{1,2} = 1 + \frac{c^2}{2} \pm \sqrt{c^2 + \frac{c^4}{4}}.
\]
As a consequence, the singular values of $A$ are $\sqrt{\lambda_1}$ and $\sqrt{\lambda_2}$, or
\[
\sigma_1 = \sqrt{1 + \frac{c^2}{2} + \sqrt{c^2 + \frac{c^4}{4}}}
\qquad\text{and}\qquad
\sigma_2 = \sqrt{1 + \frac{c^2}{2} - \sqrt{c^2 + \frac{c^4}{4}}}.
\]
In the particular case $c = 8/3$ (which noticeably simplifies all the calculations) we have $\sigma_1 = \sqrt{\lambda_1} = 3$ and $\sigma_2 = \sqrt{\lambda_2} = 1/3$. Moreover, the eigenvectors of $AA^T$ corresponding to $\lambda_1$ and $\lambda_2$ are
\[
u_1 = \frac{1}{\sqrt{10}}\begin{pmatrix} 3 \\ 1 \end{pmatrix}
\qquad\text{and}\qquad
u_2 = \frac{1}{\sqrt{10}}\begin{pmatrix} 1 \\ -3 \end{pmatrix}.
\]
Thus we can write
\[
AA^T = U\Sigma^2 U^T
= \frac{1}{\sqrt{10}}\begin{pmatrix} 3 & 1 \\ 1 & -3 \end{pmatrix}
\begin{pmatrix} 9 & 0 \\ 0 & 1/9 \end{pmatrix}
\frac{1}{\sqrt{10}}\begin{pmatrix} 3 & 1 \\ 1 & -3 \end{pmatrix}.
\]
Moreover, the matrix $U$ in this decomposition of $AA^T$ can be chosen to be precisely the matrix $U$ in the singular value decomposition $A = U\Sigma V^T$ of $A$. Now the equation $A = U\Sigma V^T$ implies that
\[
V = A^T U \Sigma^{-1} = \frac{1}{\sqrt{10}}\begin{pmatrix} 1 & 3 \\ 3 & -1 \end{pmatrix}.
\]
We therefore obtain the singular value decomposition
\[
\begin{pmatrix} 1 & 8/3 \\ 0 & 1 \end{pmatrix}
= \frac{1}{\sqrt{10}}\begin{pmatrix} 3 & 1 \\ 1 & -3 \end{pmatrix}
\begin{pmatrix} 3 & 0 \\ 0 & 1/3 \end{pmatrix}
\frac{1}{\sqrt{10}}\begin{pmatrix} 1 & 3 \\ 3 & -1 \end{pmatrix}.
\]

Remark 6. We note that the singular value decomposition allows for a useful geometric interpretation of linear mappings: In the case $m = n$, the mappings $U$ and $V$ are orthogonal, and thus either rotations or reflections. In particular, the mappings $U$ and $V$ leave the unit sphere unchanged. On the other hand, the mapping $\Sigma$ is diagonal and therefore transforms the unit sphere into an ellipsoid with semi-axes of lengths $\sigma_1, \dots, \sigma_n$ parallel to the coordinate axes. In total, the mapping $A = U\Sigma V^T$ transforms the unit sphere into an ellipsoid with semi-axes of lengths given by the singular values, parallel to $u_1, \dots, u_n$. In the case $n > m$, the situation is similar, although the application of $V^T$ will be the composition of a rotation/reflection with a projection onto a lower-dimensional space, while for $n < m$ the result will be an $n$-dimensional ellipsoid embedded in $\mathbb{R}^m$.
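The hand computation in Example 5 is easy to cross-check numerically. The following snippet (illustrative only, not part of the notes) verifies the singular values for $c = 8/3$; note that NumPy may pick different signs for the columns of $U$ and $V$, which is why the last check uses the eigenvector relation rather than comparing entries directly.

    import numpy as np

    c = 8.0 / 3.0
    A = np.array([[1.0, c],
                  [0.0, 1.0]])

    U, s, Vt = np.linalg.svd(A)
    assert np.allclose(s, [3.0, 1.0 / 3.0])           # sigma_1 = 3, sigma_2 = 1/3
    assert np.allclose(U @ np.diag(s) @ Vt, A)        # A = U Sigma V^T
    # the columns of U are eigenvectors of A A^T (up to sign):
    assert np.allclose((A @ A.T) @ U, U @ np.diag(s ** 2))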

Remark 7. Another interpretation of the singular value decomposition can be obtained by considering it as a solution of an optimisation problem (similarly as in the proof of Theorem 1). Indeed, consider the problem
\[
(2)\qquad \max\ \tfrac{1}{2}\|Ax\|_2^2 \qquad\text{subject to}\qquad \tfrac{1}{2}\|x\|_2^2 = \tfrac{1}{2}.
\]
In order to solve this optimisation problem, we introduce its Lagrangian
\[
L(x, \lambda) = \tfrac{1}{2}\|Ax\|_2^2 - \tfrac{\lambda}{2}\bigl(\|x\|_2^2 - 1\bigr).
\]
The possible candidates for the (primal-dual) solutions of (2) are then the pairs $(x, \lambda) \in \mathbb{R}^n \times \mathbb{R}$ satisfying the KKT conditions $\nabla_x L(x, \lambda) = 0$ and $\|x\|_2 = 1$, that is, the pairs $(x, \lambda)$ satisfying the equations
\[
A^T A x - \lambda x = 0 \qquad\text{and}\qquad \|x\|_2 = 1.
\]
The solutions of these equations are precisely the eigenvector and eigenvalue pairs for the matrix $A^T A$. Similarly, if we consider the problem
\[
(3)\qquad \max\ \tfrac{1}{2}\|A^T x\|_2^2 \qquad\text{subject to}\qquad \tfrac{1}{2}\|x\|_2^2 = \tfrac{1}{2},
\]
then we obtain as solutions precisely the eigenvector and eigenvalue pairs for the matrix $AA^T$. Since the singular values of $A$ are exactly the square roots of the eigenvalues of $A^T A$ and $AA^T$, and the singular vectors are the corresponding eigenvectors, this allows us to interpret singular values and singular vectors as KKT points (or critical points) of the optimisation problems (2) and (3).

3. Matrix Properties via the SVD

An alternative way of formulating the singular value decomposition is to write
\[
(4)\qquad A = \sum_{k=1}^{p} \sigma_k u_k v_k^T,
\]
where $p = \min\{m, n\}$, and $u_k$ and $v_k$ denote the $k$-th columns of $U$ and $V$, respectively. In particular, we obtain with this notation that
\[
Ax = \sum_{k=1}^{p} \sigma_k (v_k, x)\, u_k
\]
for every $x \in \mathbb{R}^n$. Now let
\[
r := \max\{k : \sigma_k > 0\}.
\]
That is, we have
\[
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0 = \sigma_{r+1} = \dots = \sigma_p.
\]
Then the decomposition (4) shortens to
\[
(5)\qquad A = \sum_{k=1}^{r} \sigma_k u_k v_k^T,
\]
and we have
\[
Ax = \sum_{k=1}^{r} \sigma_k (v_k, x)\, u_k.
\]
Since the vectors $u_k$ are linearly independent, we immediately obtain the following:
- We have $\operatorname{rank} A = r$.
- The range of $A$ is $\operatorname{Ran} A = \operatorname{span}\{u_1, \dots, u_r\}$.
- The kernel of $A$ is $\ker A = \operatorname{span}\{v_{r+1}, \dots, v_n\}$.
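In floating point arithmetic the condition $\sigma_k > 0$ has to be replaced by a tolerance, but otherwise the rank, the range, and the kernel can be read off a computed SVD exactly as above. A small sketch (the function name and tolerance choice are mine, not from the notes):

    import numpy as np

    def rank_range_kernel(A, rtol=1e-12):
        """Numerical rank, orthonormal basis of Ran A, orthonormal basis of ker A."""
        U, s, Vt = np.linalg.svd(A)
        r = int(np.sum(s > rtol * s[0]))     # number of singular values treated as non-zero
        return r, U[:, :r], Vt[r:, :].T      # u_1..u_r span Ran A, v_{r+1}..v_n span ker A

    # rank-2 example: the third column is the sum of the first two
    A = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 1.0, 2.0]])
    r, ran, ker = rank_range_kernel(A)
    assert r == 2 and np.allclose(A @ ker, 0.0)   # kernel vectors are mapped to zero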

Additionally, we have seen in the proof of the existence of the singular value decomposition that $\|A\|_2 = \sigma_1$, and it is possible to show that
\[
\|A\|_F = \bigl(\sigma_1^2 + \dots + \sigma_r^2\bigr)^{1/2}.
\]
Here $\|A\|_F$ denotes the Frobenius norm of $A$, given by
\[
\|A\|_F = \Bigl(\sum_{i,j} a_{ij}^2\Bigr)^{1/2}.
\]
Moreover, truncating the expansion (5) results in the best possible low rank approximations of $A$, as the following theorem shows:

Theorem 8. If $A$ has the singular value decomposition $A = U\Sigma V^T$, then the matrix
\[
A_k := \sum_{j=1}^{k} \sigma_j u_j v_j^T
\]
with $1 \le k \le p$ solves the optimisation problems
\[
\min_{\operatorname{rank}(B) \le k} \|A - B\|_2 \qquad\text{and}\qquad \min_{\operatorname{rank}(B) \le k} \|A - B\|_F.
\]
Moreover,
\[
\|A - A_k\|_2 = \sigma_{k+1} \qquad\text{and}\qquad \|A - A_k\|_F = \Bigl(\sum_{j=k+1}^{r} \sigma_j^2\Bigr)^{1/2}.
\]
In other words, the first terms of the singular value decomposition provide the best low rank approximations of the matrix $A$, both with respect to the 2-norm and with respect to the Frobenius norm.
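A quick numerical illustration of Theorem 8 (a sketch assuming NumPy, not part of the notes): the truncated sum $A_k$ has rank $k$, its 2-norm error equals $\sigma_{k+1}$, and its Frobenius error equals the tail sum of the squared singular values.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((8, 6))
    U, s, Vt = np.linalg.svd(A, full_matrices=False)     # reduced SVD

    k = 3
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]          # best rank-k approximation

    assert np.linalg.matrix_rank(A_k) == k
    assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])                           # sigma_{k+1}
    assert np.isclose(np.linalg.norm(A - A_k, 'fro'), np.sqrt(np.sum(s[k:] ** 2)))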

4. Pseudoinverses

Assume now that $A \in \mathbb{R}^{m\times n}$ has the singular value decomposition $A = U\Sigma V^T$, that $m > n$, and that $A$ has full rank, that is, $\operatorname{rank} A = n$. Then the matrix $\Sigma \in \mathbb{R}^{n\times n}$ is invertible and we can define
\[
A^\dagger := V\Sigma^{-1}U^T \in \mathbb{R}^{n\times m},
\]
the pseudo-inverse (or Moore-Penrose inverse) of $A$. Note that
\[
A^\dagger A = V\Sigma^{-1}U^T U\Sigma V^T = VV^T = I_{n\times n},
\]
whereas
\[
AA^\dagger = U\Sigma V^T V\Sigma^{-1}U^T = UU^T
\]
is the orthogonal projection onto the range of $A$.

Lemma 9. If $m > n$, $\operatorname{rank} A = n$, and $b \in \mathbb{R}^m$, then $A^\dagger b$ is the solution of the least-squares problem
\[
(6)\qquad \min_{x \in \mathbb{R}^n} \|Ax - b\|_2^2.
\]

Proof. Since the matrix $\Sigma V^T \in \mathbb{R}^{n\times n}$ is invertible with inverse $V\Sigma^{-1}$, the vector $x$ solves (6) if and only if we have $x = V\Sigma^{-1}y$, where $y$ solves the optimisation problem
\[
\min_{y \in \mathbb{R}^n} \|Uy - b\|_2^2.
\]
That is, $Uy$ is simply the orthogonal projection of $b$ onto the range of $U$, which means that $Uy = UU^T b$. Since $U \in \mathbb{R}^{m\times n}$ with $n < m$ is injective, it follows that $y = U^T b$. Thus $x = V\Sigma^{-1}y = V\Sigma^{-1}U^T b = A^\dagger b$ is the unique solution of (6).

Now assume that $m < n$, but that $A \in \mathbb{R}^{m\times n}$ still has full rank (that is, $\operatorname{rank} A = m$). Then we can again define $A^\dagger := V\Sigma^{-1}U^T$, as $\Sigma \in \mathbb{R}^{m\times m}$ is invertible. However, in this situation we have
\[
AA^\dagger = UU^T = I_{m\times m},
\]
whereas $A^\dagger A = VV^T$ is the projection onto $\operatorname{Ran} V = (\ker A)^\perp$.

Lemma 10. If $m < n$, $\operatorname{rank} A = m$, and $b \in \mathbb{R}^m$, then $A^\dagger b$ solves the optimisation problem
\[
(7)\qquad \min_{x} \|x\|_2^2 \qquad\text{subject to}\qquad Ax = b.
\]

Proof. First we note that $x := A^\dagger b$ satisfies $Ax = AA^\dagger b = b$, which means that $x$ is indeed admissible for (7). Now assume that $y$ is another vector satisfying $Ay = b$. Then $A^\dagger A y = A^\dagger b = x$ and therefore $\|A^\dagger A y\|_2 = \|x\|_2$. On the other hand, $A^\dagger A = VV^T$ is an orthogonal projection and therefore $\|A^\dagger A y\|_2 \le \|y\|_2$ with equality only if $A^\dagger A y = y$. This shows that indeed $x$ is the unique solution of (7).

Now we note that in both situations discussed above we could alternatively write
\[
A^\dagger = \sum_{k=1}^{p} \sigma_k^{-1} v_k u_k^T = \sum_{k=1}^{r} \sigma_k^{-1} v_k u_k^T.
\]
This last definition, however, is still meaningful if the matrix $A$ does not have full rank. That is, for arbitrary $A \in \mathbb{R}^{m\times n}$ of rank $r$ we can define the pseudoinverse
\[
A^\dagger := \sum_{k=1}^{r} \sigma_k^{-1} v_k u_k^T \in \mathbb{R}^{n\times m}.
\]
In this case, usually neither $AA^\dagger$ nor $A^\dagger A$ is an identity matrix, but $A^\dagger$ still retains some semblance of an inverse of $A$, as the following result shows:

Lemma 11. For every $A \in \mathbb{R}^{m\times n}$ the following identities hold:
\[
(A^\dagger)^\dagger = A, \qquad A^\dagger A A^\dagger = A^\dagger, \qquad A A^\dagger A = A.
\]
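To connect Lemmas 9-11 with standard library routines, the sketch below (my own code and naming, assuming NumPy) builds $A^\dagger$ directly from a reduced SVD and compares it with np.linalg.pinv and with the least-squares solver np.linalg.lstsq.

    import numpy as np

    def pinv_from_svd(A, rtol=1e-12):
        """Pseudoinverse A_dagger = sum_k sigma_k^{-1} v_k u_k^T, inverting only sigma_k > 0."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        s_inv = np.divide(1.0, s, out=np.zeros_like(s), where=s > rtol * s[0])
        return Vt.T @ np.diag(s_inv) @ U.T

    rng = np.random.default_rng(2)
    A = rng.standard_normal((7, 4))          # m > n, full rank (with probability one)
    b = rng.standard_normal(7)

    A_dag = pinv_from_svd(A)
    assert np.allclose(A_dag, np.linalg.pinv(A))
    # Lemma 9: A_dagger b solves the least-squares problem min ||Ax - b||_2
    assert np.allclose(A_dag @ b, np.linalg.lstsq(A, b, rcond=None)[0])
    # Lemma 11: the Moore-Penrose identities
    assert np.allclose(A_dag @ A @ A_dag, A_dag)
    assert np.allclose(A @ A_dag @ A, A)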

Moreover, the vector $A^\dagger b$ comes as close to solving the linear system $Ax = b$ as one can reasonably hope for:

Theorem 12. If $A \in \mathbb{R}^{m\times n}$ and $b \in \mathbb{R}^m$, then $x := A^\dagger b$ solves the bilevel optimisation problem
\[
\min_x \|x\|_2^2 \qquad\text{such that } x \text{ solves}\qquad \min_{\hat{x}} \|A\hat{x} - b\|_2^2.
\]

In other words, application of the pseudoinverse $A^\dagger$ to $b$ selects from all least squares solutions of the equation $Ax = b$ the one with the smallest norm.

5. Truncated SVD and the L-curve method

Let now $A \in \mathbb{R}^{m\times n}$ with pseudoinverse $A^\dagger \in \mathbb{R}^{n\times m}$, and assume that $b \in \mathbb{R}^m$. Denote moreover by $x^\dagger := A^\dagger b$ the true solution of the equation $Ax = b$. In many practical applications, we have the problem that the right hand side of this equation is subject to measurement errors, and that instead of the true data $b$ we measure some noisy data $b^\delta = b + n^\delta$ for some noise $n^\delta \in \mathbb{R}^m$ of size $\|n^\delta\|_2 \le \delta$. If we use the pseudoinverse for solving the noisy system $Ax = b^\delta$, we then obtain a noisy solution
\[
x^\delta = A^\dagger b^\delta = A^\dagger(b + n^\delta) = x^\dagger + A^\dagger n^\delta,
\]
that is, the error we obtain is
\[
x^\delta - x^\dagger = A^\dagger n^\delta.
\]
Specifically, the worst case error is
\[
\max_{\|n^\delta\|_2 \le \delta} \|x^\delta - x^\dagger\|_2 = \max_{\|n^\delta\|_2 \le \delta} \|A^\dagger n^\delta\|_2 = \frac{\delta}{\sigma_r},
\]
with $\sigma_r$ being the smallest positive singular value of $A$. In case the matrix $A$ has small non-zero singular values, this means that the error in the solution might be several orders of magnitude larger than the error in the data. Additionally, the mapping $b \mapsto A^\dagger b$ will be very sensitive with respect to small variations in $b$. That is, small changes in the measurements might lead to huge changes in the supposed solution of the system.

As a possible remedy, we may truncate the singular value decomposition of the matrix $A$: Given $\varepsilon > 0$ we define
\[
A_\varepsilon := \sum_{k : \sigma_k \ge \varepsilon} \sigma_k u_k v_k^T
\qquad\text{and}\qquad
A_\varepsilon^\dagger := \sum_{k : \sigma_k \ge \varepsilon} \sigma_k^{-1} v_k u_k^T.
\]
That is, we ignore all the singular values of $A$ that are below the threshold $\varepsilon$. Using this truncated matrix for solving the noisy system $Ax = b^\delta$, we obtain a regularised solution
\[
x_\varepsilon^\delta := A_\varepsilon^\dagger b^\delta.
\]
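A minimal sketch of the truncated SVD just described (assuming NumPy; the function name is mine): all singular values below the threshold are discarded before the system is solved.

    import numpy as np

    def tsvd_solve(A, b, eps):
        """Return the regularised solution A_eps^dagger b, keeping only sigma_k >= eps."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        keep = s >= eps
        coeff = (U[:, keep].T @ b) / s[keep]     # sigma_k^{-1} <u_k, b> for the kept indices
        return Vt[keep, :].T @ coeff             # sum over kept k of coeff_k * v_k

Typical usage would be x_eps = tsvd_solve(A, b_noisy, eps) for a threshold eps chosen, for instance, by the L-curve method discussed next.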

In order to estimate the quality of this regularised solution, we note that the worst case error is
\[
\max_{\|n^\delta\|_2 \le \delta} \|x_\varepsilon^\delta - x^\dagger\|_2
= \max_{\|n^\delta\|_2 \le \delta} \|A_\varepsilon^\dagger (b + n^\delta) - x^\dagger\|_2
\le \|A_\varepsilon^\dagger b - x^\dagger\|_2 + \max_{\|n^\delta\|_2 \le \delta} \|A_\varepsilon^\dagger n^\delta\|_2
= \|A_\varepsilon^\dagger b - x^\dagger\|_2 + \frac{\delta}{\min_{k : \sigma_k \ge \varepsilon} \sigma_k}
\le \|A_\varepsilon^\dagger b - x^\dagger\|_2 + \frac{\delta}{\varepsilon}.
\]
That is, the error in the solution splits into a regularised data error given by $\delta/\varepsilon$ and a regularisation error given by $\|A_\varepsilon^\dagger b - x^\dagger\|_2$.

Now we note that we can write
\[
x^\dagger = \sum_{k=1}^{r} (x^\dagger, v_k)\, v_k = \sum_{k=1}^{r} \frac{(u_k, b)}{\sigma_k}\, v_k =: \sum_{k=1}^{r} x_k^\dagger v_k.
\]
That is, the $x_k^\dagger$ are the coefficients of $x^\dagger$ with respect to the orthonormal basis $v_1, \dots, v_r$ of $\operatorname{Ran} A^\dagger$. With this notation we have
\[
x_\varepsilon := A_\varepsilon^\dagger b = \sum_{k : \sigma_k \ge \varepsilon} x_k^\dagger v_k
\]
and therefore
\[
\|A_\varepsilon^\dagger b - x^\dagger\|_2 = \|x_\varepsilon - x^\dagger\|_2 = \Bigl(\sum_{k : \sigma_k < \varepsilon} (x_k^\dagger)^2\Bigr)^{1/2}.
\]
In other words, the regularisation error is exactly the norm of the coefficients of $x^\dagger$ (with respect to $V$) corresponding to singular values smaller than $\varepsilon$.

Apart from the actual computation of the SVD of $A$, a major challenge in this regularised inversion is the choice of the parameter $\varepsilon$. In the following we will briefly discuss one heuristic method, which is called the L-curve method. As a first step, we note that, instead of choosing the parameter $\varepsilon$, we may as well choose the number of singular values that are used in the truncated SVD of $A$. That is, we denote for $k = 1, \dots, r$
\[
A_k^\dagger := \sum_{j=1}^{k} \sigma_j^{-1} v_j u_j^T
\qquad\text{and}\qquad
x^{(k)} := A_k^\dagger b^\delta = \sum_{j=1}^{k} \sigma_j^{-1} (u_j, b^\delta)\, v_j.
\]
Then
\[
\|x^{(k)}\|_2^2 = \sum_{j=1}^{k} \sigma_j^{-2} (u_j, b^\delta)^2,
\]
which implies that the mapping $k \mapsto \|x^{(k)}\|_2$ is increasing in $k$. At the same time,
\[
Ax^{(k)} - b^\delta = \sum_{j=1}^{k} (u_j, b^\delta)\, u_j - b^\delta,
\]
and therefore the mapping $k \mapsto \|Ax^{(k)} - b^\delta\|_2$ is decreasing.
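Both mappings can be evaluated in a single pass once the SVD is available. The helper below (my own naming, assuming NumPy) returns the points that make up the L-curve; plotting them on log-log axes, for instance with matplotlib, produces plots of the kind shown in Figure 3.

    import numpy as np

    def l_curve_points(A, b, rtol=1e-12):
        """Solution norms ||x^(k)|| and residual norms ||A x^(k) - b|| for k = 1, ..., r."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        r = int(np.sum(s > rtol * s[0]))          # number of singular values treated as non-zero
        beta = U.T @ b                            # coefficients <u_j, b>
        sol_norms, res_norms = [], []
        x = np.zeros(A.shape[1])
        for k in range(r):
            x = x + (beta[k] / s[k]) * Vt[k, :]   # add the k-th term sigma_k^{-1} <u_k,b> v_k
            sol_norms.append(np.linalg.norm(x))
            res_norms.append(np.linalg.norm(A @ x - b))
        return np.array(sol_norms), np.array(res_norms)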

An often observed behaviour of these mappings is that for small $k$ the data fit $\|Ax^{(k)} - b^\delta\|_2$ improves significantly with $k$ for relatively small increases of $\|x^{(k)}\|_2$. However, once the true solution $x^\dagger$ is sufficiently well reconstructed, further increases in $k$ will mainly reconstruct the noise. Therefore the norm $\|x^{(k)}\|_2$ will increase significantly with a comparably small decrease of the residual $\|Ax^{(k)} - b^\delta\|_2$. The best reconstructions should therefore be obtained for parameters $k$ for which neither $\|x^{(k)}\|_2$ nor $\|Ax^{(k)} - b^\delta\|_2$ shows large variations in $k$. Now consider a log-log-plot of these two mappings, that is, plot the curve given by the points
\[
\bigl(\log \|x^{(k)}\|_2,\ \log \|Ax^{(k)} - b^\delta\|_2\bigr) \in \mathbb{R}^2
\]
for $k = 1, \dots, r$. Following the considerations above, one would expect that these points follow an L-shape, and the corner of the L marks those parameters $k$ for which the most plausible solutions can be expected to be obtained.

Example 13 (Discrete deconvolution). Let $k \colon \mathbb{R} \to \mathbb{R}$ be some bounded and integrable function. Given a function $g \colon [0,1] \to \mathbb{R}$, we want to find a function $f \colon [0,1] \to \mathbb{R}$ such that $k * f = g$, that is,
\[
\int_0^1 f(y)\, k(x - y)\, dy = g(x) \qquad\text{for all } x \in [0,1].
\]
We discretise this equation by using the midpoint rule (the choice of the quadrature rule does not fundamentally change the results) and obtain the system of equations
\[
\frac{1}{n} \sum_{j=1}^{n} k(x_i - x_j)\, f_j = g_i
\]
with $x_j = (j - 1/2)/n$, $j = 1, \dots, n$, $f_j \approx f(x_j)$, and $g_i := g(x_i)$.

Consider in particular the case $k(x) = e^{-x^2/2}$. That is, the function $k$ is, up to scaling, the standard Gaussian kernel of variance 1. In this case, it turns out that already with $n = 100$ the resulting system of equations is sufficiently ill-posed as to yield useless results, even if the only error comes from rounding errors due to computations with only double precision. We assume in the following that the true solution is $f^\dagger(x) = x^2(1 - x^2)$ and that we are given exact (that is, exact up to machine precision) data $g^\dagger = k * f^\dagger$. As can be seen in Figure 1, the unregularised solution of the equation $k * f = g^\dagger$ is essentially useless. However, using a truncated singular value decomposition, one may obtain an almost perfect reconstruction of $f^\dagger$. In Figure 1 the reconstructions $f^{(k)}$ using $k = 8$, $k = 33$, and $k = 50$ singular values are shown. For $k = 8$ and $k = 33$, the reconstructions capture the actual shape of $f^\dagger$ reasonably well, and for values of $k$ between 9 and 32 the reconstructions become visually indistinguishable from $f^\dagger$.

In the case of noisy data $g^\delta = k * f^\dagger + n^\delta$ the situation is even worse, and it is only possible to obtain reasonable reconstructions with a very small number of singular values. As an example, consider the situation shown in Figure 2. Here, using $k = 9$ singular values for the reconstruction provides a reasonable result, in which the error is only slightly larger than the measurement error. For $k = 15$, however, the solution is dominated by large oscillations and essentially useless. In Figure 3, the L-curves for both the noise-free and the noisy case are shown. The former predicts that the best reconstructions in the noise-free case can be found with $k \approx 30$, whereas the latter predicts good reconstructions for $k \approx 9$. In both cases, these predictions match reality surprisingly well.
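The discretised problem of Example 13 can be set up in a few lines. The sketch below is my own code, not the author's script; the truncation level is chosen near the corner of the L-curve reported in Figure 3, and the exact error values will depend on the rounding noise of the particular machine.

    import numpy as np

    n = 100
    x = (np.arange(1, n + 1) - 0.5) / n                       # midpoints x_j = (j - 1/2)/n
    K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2) / n     # K_ij = k(x_i - x_j) / n
    f_true = x ** 2 * (1 - x ** 2)                            # true solution f(x) = x^2 (1 - x^2)
    g = K @ f_true                                            # data, exact up to rounding

    # unregularised solve: expected to be useless (cf. Figure 1)
    f_naive = np.linalg.solve(K, g)

    # truncated SVD with k = 30 singular values, near the corner of the L-curve (Figure 3)
    U, s, Vt = np.linalg.svd(K)
    k = 30
    f_tsvd = Vt[:k, :].T @ ((U[:, :k].T @ g) / s[:k])

    print(np.linalg.norm(f_naive - f_true))    # typically huge
    print(np.linalg.norm(f_tsvd - f_true))     # small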

Figure 1. First row, left: true solution $f^\dagger(x) = x^2(1 - x^2)$ for $0 \le x \le 1$; first row, right: given noise-free data $g^\dagger = k * f^\dagger$ with $k$ being a Gaussian kernel of variance 1; second row, left: solution of the discretised equation $k * f = g^\dagger$ without any regularisation; second row, right: regularised solution using TSVD with 8 singular values; last row, left: regularised solution using TSVD with 33 singular values; last row, right: regularised solution using TSVD with 50 singular values. In the second and last rows, the red curve always shows the true solution $f^\dagger$, while the blue curve shows the reconstruction $f^{(k)}$.

6. Regularisation

In the following, we will briefly discuss an alternative interpretation of the truncated SVD, which allows for the definition of more general families of regularisation methods.

Figure 2. First row, left: true solution $f^\dagger(x) = x^2(1 - x^2)$ for $0 \le x \le 1$; first row, right: the blue curve shows the noise-free data $g^\dagger = k * f^\dagger$, and the red curve shows noisy data $g^\delta = g^\dagger + n^\delta$ with $n^\delta$ being a realisation of a vector of i.i.d. Gaussian random variables with standard deviation 0.002; last row, left: regularised solution from noisy data using TSVD with 9 singular values; last row, right: regularised solution from noisy data using TSVD with 15 singular values. The red curve shows the true solution $f^\dagger$, while the blue curve shows the reconstruction $f^{(k)}$.

Figure 3. L-curves for the numerical examples depicted in Figures 1 and 2. The left figure shows the L-curve in the noise-free case. Here the corner of the L is formed by the reconstructions with 28 to 31 singular values. The right figure shows the L-curve in the noisy case. Here the corner of the L is formed by the reconstruction with 9 singular values.

We start by writing
\[
A^\dagger = V \begin{pmatrix}
\sigma_1^{-1} & & & & & \\
& \ddots & & & & \\
& & \sigma_r^{-1} & & & \\
& & & 0 & & \\
& & & & \ddots & \\
& & & & & 0
\end{pmatrix} U^T.
\]
Define now the function
\[
f \colon \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}, \qquad
f(s) = \begin{cases} 1/s & \text{if } s > 0,\\ 0 & \text{if } s = 0,\end{cases}
\]
and let
\[
f(\Sigma) = \begin{pmatrix} f(\sigma_1) & & 0\\ & \ddots & \\ 0 & & f(\sigma_p) \end{pmatrix}.
\]
Then we can equivalently write
\[
A^\dagger = V f(\Sigma) U^T.
\]
Moreover, setting
\[
f_\varepsilon(s) := \begin{cases} 1/s & \text{if } s \ge \varepsilon,\\ 0 & \text{if } s < \varepsilon,\end{cases}
\]
we obtain that
\[
A_\varepsilon^\dagger = V f_\varepsilon(\Sigma) U^T.
\]
That is, the approximation $A_\varepsilon^\dagger$ to the pseudoinverse of $A$ can be obtained by approximating the function $f$ (which essentially is the mapping $s \mapsto 1/s$) by means of the bounded function $f_\varepsilon$. Because of the boundedness of $f_\varepsilon$, the approximation is more stable than $A^\dagger$ itself, but the increased stability comes at the cost of a potential decrease in accuracy in case of a low noise level.

This interpretation of the truncated singular value decomposition allows us to consider alternative regularisation methods by using different approximations of $f$. Amongst the most important are:

- Lavrentiev regularisation, defined by the function
\[
g_\lambda(s) := \begin{cases} \dfrac{1}{\lambda + s} & \text{if } s > 0,\\[1ex] 0 & \text{if } s = 0.\end{cases}
\]
Instead of truncating the function $1/s$, we shift it to the left and thus get rid of the singularity at zero. This method is particularly interesting if $A \in \mathbb{R}^{n\times n}$ is a positive definite square matrix, in which case $V g_\lambda(\Sigma) U^T = (\lambda I + A)^{-1}$. That is, the regularised solution $x_\lambda$ of $Ax = b$ can be found by solving the system $(\lambda I + A)x = b$.

- Tikhonov regularisation, defined by the function
\[
h_\alpha(s) := \frac{s}{s^2 + \alpha}.
\]
In this case, $V h_\alpha(\Sigma) U^T = (\alpha I_{n\times n} + A^T A)^{-1} A^T$. That is, the regularised solution $x_\alpha$ can be found by solving the system $(\alpha I + A^T A)x = A^T b$.
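This filter viewpoint translates directly into code: any bounded approximation of $s \mapsto 1/s$ yields a regularised inverse $V f(\Sigma) U^T$. The sketch below (my own code and naming, assuming NumPy) implements the three filters discussed above and checks that the Tikhonov filter agrees with solving $(\alpha I + A^T A)x = A^T b$.

    import numpy as np

    def filtered_solve(A, b, filt):
        """Apply the regularised inverse V f(Sigma) U^T to b, for a filter f acting on sigma."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        return Vt.T @ (filt(s) * (U.T @ b))

    def tsvd_filter(eps):                 # f_eps(s) = 1/s for s >= eps, 0 otherwise
        return lambda s: np.where(s >= eps, 1.0 / np.maximum(s, eps), 0.0)

    def lavrentiev_filter(lam):           # g_lambda(s) = 1/(lambda + s), ignoring the s = 0 case
        return lambda s: 1.0 / (lam + s)

    def tikhonov_filter(alpha):           # h_alpha(s) = s / (s^2 + alpha)
        return lambda s: s / (s ** 2 + alpha)

    rng = np.random.default_rng(3)
    A = rng.standard_normal((6, 4))
    b = rng.standard_normal(6)
    alpha = 1e-2

    x_filter = filtered_solve(A, b, tikhonov_filter(alpha))
    x_system = np.linalg.solve(alpha * np.eye(4) + A.T @ A, A.T @ b)
    assert np.allclose(x_filter, x_system)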

Department of Mathematical Sciences, Norwegian University of Science and Technology, 7491 Trondheim, Norway

E-mail address: markus.grasmair@ntnu.no