MATH 3795 Lecture 10. Regularized Linear Least Squares.
Dmitriy Leykekhman, Fall 2008

Goals: Understanding regularization.

D. Leykekhman - MATH 3795 Introduction to Computational Mathematics, Linear Least Squares
Review.

From the last lecture: consider the linear least squares problem

    min_{x ∈ R^n} ||Ax − b||_2^2.

Let A = UΣV^T be the singular value decomposition of A ∈ R^{m×n} with singular values

    σ_1 ≥ ... ≥ σ_r > σ_{r+1} = ... = σ_{min{m,n}} = 0.

The minimum norm solution is

    x = Σ_{i=1}^r (u_i^T b / σ_i) v_i.

If even one singular value σ_i is small, then small perturbations in b can lead to large errors in the solution.
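The minimum norm formula above can be checked numerically. The following is a small NumPy sketch (Python rather than the MATLAB used later in the lecture); the 3×2 matrix and all variable names are made up for illustration.

```python
import numpy as np

# Minimum norm least squares solution via the SVD:
# x = sum_{i=1}^r (u_i^T b / sigma_i) v_i.
# Hypothetical example with one tiny singular value (1e-6).
A = np.array([[1.0, 0.0],
              [0.0, 1e-6],
              [0.0, 0.0]])
b = np.array([1.0, 2e-6, 5.0])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-12))                      # numerical rank
x = sum((U[:, i] @ b) / s[i] * Vt[i] for i in range(r))

# Agrees with NumPy's built-in minimum norm solver.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_ref))   # True
```

Note that the component of b along the tiny singular value is divided by σ_2 = 1e-6, which is exactly the sensitivity the slide warns about.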
Regularized Linear Least Squares Problems.

If σ_1/σ_r ≫ 1, then it might be useful to consider the regularized linear least squares problem (Tikhonov regularization)

    min_{x ∈ R^n} (1/2)||Ax − b||_2^2 + (λ/2)||x||_2^2.

Here λ > 0 is the regularization parameter. The regularization parameter λ > 0 is not known a priori and has to be determined based on the problem data. See later.

Observe that

    min_x (1/2)||Ax − b||_2^2 + (λ/2)||x||_2^2 = min_x (1/2) || [A; √λ I] x − [b; 0] ||_2^2.
Regularized Linear Least Squares Problems.

Thus

    min_x (1/2)||Ax − b||_2^2 + (λ/2)||x||_2^2 = min_x (1/2) || [A; √λ I] x − [b; 0] ||_2^2.   (1)

For λ > 0 the matrix [A; √λ I] ∈ R^{(m+n)×n} always has full rank n. Hence, for λ > 0, the regularized linear least squares problem (1) has a unique solution.

The normal equations corresponding to (1) are given by

    [A; √λ I]^T [A; √λ I] x = (A^T A + λI) x = A^T b = [A; √λ I]^T [b; 0].
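The equivalence between the stacked formulation (1) and the normal equations (A^T A + λI)x = A^T b can be verified directly. A NumPy sketch with a made-up 5×3 random matrix:

```python
import numpy as np

# Tikhonov problem as an ordinary LLS for the stacked matrix [A; sqrt(lam) I].
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
lam = 0.1

# Stacked formulation: minimize || [A; sqrt(lam) I] x - [b; 0] ||_2
A_aug = np.vstack([A, np.sqrt(lam) * np.eye(3)])
b_aug = np.concatenate([b, np.zeros(3)])
x_aug, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)

# Normal equations: (A^T A + lam I) x = A^T b
x_ne = np.linalg.solve(A.T @ A + lam * np.eye(3), A.T @ b)

print(np.allclose(x_aug, x_ne))   # True
```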
Regularized Linear Least Squares Problems.

SVD decomposition: A = UΣV^T, where U ∈ R^{m×m} and V ∈ R^{n×n} are orthogonal matrices and Σ ∈ R^{m×n} is a diagonal matrix with diagonal entries

    σ_1 ≥ ... ≥ σ_r > σ_{r+1} = ... = σ_{min{m,n}} = 0.

Thus the normal equations for (1) can be written as

    (A^T A + λI) x_λ = A^T b,
    (V Σ^T U^T U Σ V^T + λ V V^T) x_λ = V Σ^T U^T b,

using U^T U = I and I = V V^T. Rearranging terms we obtain

    V (Σ^T Σ + λI) V^T x_λ = V Σ^T U^T b;

multiplying both sides by V^T from the left and setting z = V^T x_λ we get

    (Σ^T Σ + λI) z = Σ^T U^T b.
Regularized Linear Least Squares Problems.

The normal equation corresponding to (1),

    (A^T A + λI) x_λ = A^T b,

is equivalent to

    (Σ^T Σ + λI) z = Σ^T U^T b,

where Σ^T Σ + λI is diagonal and z = V^T x_λ. Solution:

    z_i = σ_i (u_i^T b) / (σ_i^2 + λ),   i = 1, ..., r,
    z_i = 0,                             i = r+1, ..., n.

Since x_λ = V z = Σ_{i=1}^n z_i v_i, the solution of the regularized linear least squares problem (1) is given by

    x_λ = Σ_{i=1}^r σ_i (u_i^T b) / (σ_i^2 + λ) v_i.
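The closed-form SVD solution above can be cross-checked against the normal equations. A NumPy sketch with a made-up 6×4 example:

```python
import numpy as np

# Tikhonov solution from the SVD:
# x_lambda = sum_i sigma_i (u_i^T b) / (sigma_i^2 + lambda) v_i
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
b = rng.standard_normal(6)
lam = 0.5

U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ (s * (U.T @ b) / (s**2 + lam))

# Cross-check against the normal equations (A^T A + lam I) x = A^T b.
x_ne = np.linalg.solve(A.T @ A + lam * np.eye(4), A.T @ b)
print(np.allclose(x_svd, x_ne))   # True
```

Once the SVD is computed, x_λ for each new λ costs only a few vector operations, which is why the MATLAB experiment later in the lecture reuses one SVD for all values of λ.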
Regularized Linear Least Squares Problems.

Note that

    lim_{λ→0} x_λ = lim_{λ→0} Σ_{i=1}^r σ_i (u_i^T b)/(σ_i^2 + λ) v_i = Σ_{i=1}^r (u_i^T b / σ_i) v_i = x,

i.e., the solution of the regularized LLS problem (1) converges to the minimum norm solution of the LLS problem as λ goes to zero.

The representation

    x_λ = Σ_{i=1}^r σ_i (u_i^T b)/(σ_i^2 + λ) v_i

of the solution of the regularized LLS also reveals the regularizing property of adding the term (λ/2)||x||_2^2 to the (ordinary) least squares functional. We have that

    σ_i (u_i^T b)/(σ_i^2 + λ) ≈ 0               if 0 ≤ σ_i ≪ √λ,
    σ_i (u_i^T b)/(σ_i^2 + λ) ≈ u_i^T b / σ_i   if σ_i ≫ √λ.
Regularized Linear Least Squares Problems.

Hence, adding (λ/2)||x||_2^2 to the (ordinary) least squares functional acts as a filter. Contributions from singular values which are large relative to the regularization parameter λ are left (almost) unchanged, whereas contributions from small singular values are (almost) eliminated.

This raises an important question: how should λ be chosen?
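The filtering can be made explicit: relative to the unregularized coefficient u_i^T b / σ_i, the Tikhonov coefficient is scaled by the filter factor f_i = σ_i^2 / (σ_i^2 + λ). A small NumPy sketch with made-up singular values:

```python
import numpy as np

# Tikhonov filter factors f_i = sigma_i^2 / (sigma_i^2 + lambda).
# With lam = 1e-4 the transition happens near sigma = sqrt(lam) = 1e-2.
lam = 1e-4
sigmas = np.array([1.0, 1e-1, 1e-2, 1e-3, 1e-4])
f = sigmas**2 / (sigmas**2 + lam)
print(f)
# sigma >> sqrt(lam): f close to 1 (contribution kept)
# sigma == sqrt(lam): f = 1/2
# sigma << sqrt(lam): f close to 0 (contribution damped)
```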
Regularized Linear Least Squares Problems.

Suppose that the data are b = b_ex + δb. We want to compute the minimum norm solution of the (ordinary) LLS with unperturbed data b_ex,

    x_ex = Σ_{i=1}^r (u_i^T b_ex / σ_i) v_i,

but we can only compute with b = b_ex + δb; we don't know b_ex.

The solution of the regularized least squares problem is

    x_λ = Σ_{i=1}^r ( σ_i (u_i^T b_ex)/(σ_i^2 + λ) + σ_i (u_i^T δb)/(σ_i^2 + λ) ) v_i.
Regularized Linear Least Squares Problems.

We observed that

    Σ_{i=1}^r σ_i (u_i^T b_ex)/(σ_i^2 + λ) v_i → x_ex   as λ → 0.

On the other hand,

    σ_i (u_i^T δb)/(σ_i^2 + λ) ≈ 0                if 0 ≤ σ_i ≪ √λ,
    σ_i (u_i^T δb)/(σ_i^2 + λ) ≈ u_i^T δb / σ_i   if σ_i ≫ √λ,

which suggests choosing λ sufficiently large to ensure that errors δb in the data are not magnified by small singular values.
Regularized Linear Least Squares Problems.

% Compute A
t = 10.^(0:-1:-10)';
A = [ones(size(t)) t t.^2 t.^3 t.^4 t.^5];
% compute exact data
xex = ones(6,1);
bex = A*xex;
% data perturbation of 0.1%
deltab = 0.001*(0.5-rand(size(bex))).*bex;
b = bex + deltab;
% compute economy-size SVD of A
[U,S,V] = svd(A,0);
sigma = diag(S);
% solve regularized LLS for different lambda
for i = 0:7
    lambda(i+1) = 10^(-i);
    xlambda = V*(sigma.*(U'*b)./(sigma.^2 + lambda(i+1)));
    err(i+1) = norm(xlambda - xex);
end
loglog(lambda,err,'*');
xlabel('\lambda'); ylabel('||x^{ex} - x_{\lambda}||_2');
Regularized Linear Least Squares Problem.

[Figure: the error ||x_ex − x_λ||_2 for different values of λ (log-log scale).]

For this example λ ≈ 10^{-3} is a good choice for the regularization parameter λ. However, we could only create this figure with knowledge of the desired solution x_ex. How can we determine a λ ≥ 0 so that ||x_ex − x_λ||_2 is small without knowledge of x_ex? One approach is the Morozov discrepancy principle.
Regularized Linear Least Squares Problem.

Suppose b = b_ex + δb. We do not know the perturbation δb, but we assume that we know its size ||δb||. Suppose the unknown desired solution x_ex satisfies A x_ex = b_ex. Hence,

    ||A x_ex − b|| = ||A x_ex − b_ex − δb|| = ||δb||.

Since the exact solution satisfies ||A x_ex − b|| = ||δb||, we want to find a regularization parameter λ ≥ 0 such that the solution x_λ of the regularized least squares problem satisfies

    ||A x_λ − b|| = ||δb||.

This is Morozov's discrepancy principle.
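A sketch of how the discrepancy principle might be applied numerically, assuming the noise level ||δb||_2 is known: scan a range of λ values and keep the largest one whose residual has dropped to the noise level. The polynomial test matrix and all names below are made up for illustration (a NumPy analogue of the lecture's MATLAB setup).

```python
import numpy as np

# Ill-conditioned polynomial fitting problem with known noise level.
rng = np.random.default_rng(2)
t = np.linspace(0, 1, 20)
A = np.vander(t, 6, increasing=True)     # columns 1, t, ..., t^5
x_ex = np.ones(6)
b_ex = A @ x_ex
db = 1e-3 * rng.standard_normal(20)      # perturbation with known norm
b = b_ex + db

U, s, Vt = np.linalg.svd(A, full_matrices=False)
def residual(lam):
    x_lam = Vt.T @ (s * (U.T @ b) / (s**2 + lam))
    return np.linalg.norm(A @ x_lam - b)

# Scan lambda from large to small; the residual decreases monotonically.
lams = 10.0 ** np.arange(0, -13, -1)
res = np.array([residual(lam) for lam in lams])
# Largest lambda whose residual is at (or below) the noise level ||db||.
lam_star = lams[np.argmax(res <= np.linalg.norm(db))]
print(lam_star)
```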
Regularized Linear Least Squares Problem.

Let's see how this works for the previous MATLAB example.

[Figure: the error ||x_ex − x_λ||_2 for different values of λ (log-log scale).]
[Figure: the residual ||A x_λ − b||_2 and ||δb||_2 for different values of λ (log-log scale).]
Regularized Linear Least Squares Problem.

Morozov's discrepancy principle: find λ ≥ 0 such that

    ||A x_λ − b|| = ||δb||.

To compute ||A x_λ − b|| for a given λ ≥ 0, we need to solve the regularized linear least squares problem

    min_x (1/2)||Ax − b||_2^2 + (λ/2)||x||_2^2 = min_x (1/2) || [A; √λ I] x − [b; 0] ||_2^2

to get x_λ, and then we have to compute ||A x_λ − b||.

Let f(λ) = ||A x_λ − b|| − ||δb||. Finding λ ≥ 0 such that f(λ) = 0 is a root-finding problem. We will discuss in the future how to solve such problems. In this case f maps a scalar λ to a scalar f(λ), but each evaluation of f requires the solution of a regularized LLS problem and can be rather expensive.
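As a preview of the root-finding methods treated later, here is one possible sketch: since the residual ||A x_λ − b||_2 increases monotonically with λ, f(λ) = 0 can be solved by bisection. The setup reuses the hypothetical polynomial example; bisecting on a log scale is our own choice, made because λ spans many orders of magnitude.

```python
import numpy as np

# Solve f(lam) = ||A x_lam - b|| - ||db|| = 0 by bisection.
rng = np.random.default_rng(3)
t = np.linspace(0, 1, 20)
A = np.vander(t, 6, increasing=True)
b_ex = A @ np.ones(6)
db = 1e-3 * rng.standard_normal(20)
b = b_ex + db

U, s, Vt = np.linalg.svd(A, full_matrices=False)
def f(lam):
    # Each evaluation of f solves one regularized LLS problem (cheap here,
    # because the SVD of A is precomputed).
    x_lam = Vt.T @ (s * (U.T @ b) / (s**2 + lam))
    return np.linalg.norm(A @ x_lam - b) - np.linalg.norm(db)

lo, hi = 1e-14, 1.0            # bracket: f(lo) < 0 < f(hi)
for _ in range(60):
    mid = np.sqrt(lo * hi)     # bisect on a log scale
    lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
print(hi)                      # approximate root of f
```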