Applied Numerical Linear Algebra. Lecture 8
Perturbation Theory for the Least Squares Problem

When $A$ is not square, we define its condition number with respect to the 2-norm to be $\kappa_2(A) \equiv \sigma_{\max}(A)/\sigma_{\min}(A)$. This reduces to the usual condition number when $A$ is square. The next theorem justifies this definition.
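As a small illustration of this definition, the following sketch computes $\kappa_2(A)$ for a rectangular matrix from its singular values. The helper name `cond2` and the test matrix are assumptions made for this example only.

```python
import numpy as np

def cond2(A):
    """2-norm condition number sigma_max / sigma_min of a full-rank matrix."""
    s = np.linalg.svd(A, compute_uv=False)  # singular values in descending order
    return s[0] / s[-1]

# Hypothetical 3-by-2 example with singular values 2 and 1.
A = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [0.0, 0.0]])
kappa = cond2(A)
print(kappa)
```

For a square nonsingular matrix the same function returns the usual 2-norm condition number.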
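THEOREM 3.4. Suppose that $A$ is $m$-by-$n$ with $m \ge n$ and has full rank. Suppose that $x$ minimizes $\|Ax - b\|_2$. Let $r = Ax - b$ be the residual. Let $\tilde{x}$ minimize $\|(A + \delta A)\tilde{x} - (b + \delta b)\|_2$. Assume

$$\epsilon \equiv \max\left( \frac{\|\delta A\|_2}{\|A\|_2}, \frac{\|\delta b\|_2}{\|b\|_2} \right) < \frac{1}{\kappa_2(A)} = \frac{\sigma_{\min}(A)}{\sigma_{\max}(A)}.$$

Then

$$\frac{\|\tilde{x} - x\|_2}{\|x\|_2} \le \epsilon \left\{ \frac{2\kappa_2(A)}{\cos\theta} + \tan\theta \cdot \kappa_2^2(A) \right\} + O(\epsilon^2) \equiv \epsilon \cdot \kappa_{LS} + O(\epsilon^2),$$

where $\sin\theta = \|r\|_2 / \|b\|_2$. In other words, $\theta$ is the angle between the vectors $b$ and $Ax$ and measures whether the residual norm $\|r\|_2$ is large (near $\|b\|_2$) or small (near 0). $\kappa_{LS}$ is the condition number for the least squares problem.

Sketch of Proof. Expand $\tilde{x} = ((A + \delta A)^T (A + \delta A))^{-1} (A + \delta A)^T (b + \delta b)$ in powers of $\delta A$ and $\delta b$, and throw away all but the linear terms in $\delta A$ and $\delta b$.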
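The bound can be checked numerically. The sketch below is only an illustration under assumed data: a random 20-by-5 matrix, a random right-hand side, and random perturbations scaled so that $\epsilon = 10^{-6}$. It computes $\kappa_{LS}$ from the formula in the theorem and compares $\epsilon \cdot \kappa_{LS}$ with the observed relative change in the solution.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, illustrative data only
m, n = 20, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

x = np.linalg.lstsq(A, b, rcond=None)[0]
r = A @ x - b                            # residual

s = np.linalg.svd(A, compute_uv=False)
kappa = s[0] / s[-1]                     # kappa_2(A)
sin_t = np.linalg.norm(r) / np.linalg.norm(b)
cos_t = np.sqrt(1.0 - sin_t**2)
kappa_ls = 2.0 * kappa / cos_t + (sin_t / cos_t) * kappa**2

# Perturbations scaled so that ||dA||_2/||A||_2 = ||db||_2/||b||_2 = eps.
eps = 1e-6
dA = rng.standard_normal((m, n))
dA *= eps * s[0] / np.linalg.norm(dA, 2)
db = rng.standard_normal(m)
db *= eps * np.linalg.norm(b) / np.linalg.norm(db)

x_pert = np.linalg.lstsq(A + dA, b + db, rcond=None)[0]
rel_err = np.linalg.norm(x_pert - x) / np.linalg.norm(x)
bound = eps * kappa_ls                   # first-order bound from the theorem
print(rel_err, bound)
```

The observed error is typically well below the bound, since the theorem describes the worst case over all perturbations of the given size.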
Householder Transformations

A Householder transformation (or reflection) is a matrix of the form $P = I - 2uu^T$ where $\|u\|_2 = 1$. It is easy to see that $P = P^T$ and $PP^T = (I - 2uu^T)(I - 2uu^T) = I - 4uu^T + 4uu^Tuu^T = I$, so $P$ is a symmetric, orthogonal matrix. It is called a reflection because $Px$ is the reflection of $x$ in the plane through 0 perpendicular to $u$.
Given a vector $x$, it is easy to find a Householder reflection $P = I - 2uu^T$ to zero out all but the first entry of $x$: $Px = [c, 0, \ldots, 0]^T = c \cdot e_1$. We do this as follows. Write $Px = x - 2u(u^Tx) = c \cdot e_1$ so that $u = \frac{1}{2(u^Tx)}(x - ce_1)$, i.e., $u$ is a linear combination of $x$ and $e_1$. Since $\|x\|_2 = \|Px\|_2 = |c|$, $u$ must be parallel to the vector $\tilde{u} = x \pm \|x\|_2 e_1$, and so $u = \tilde{u}/\|\tilde{u}\|_2$. One can verify that either choice of sign yields a $u$ satisfying $Px = ce_1$, as long as $\tilde{u} \ne 0$. We will use $\tilde{u} = x + \mathrm{sign}(x_1)\|x\|_2 e_1$, since this means that there is no cancellation in computing the first component of $\tilde{u}$. In summary, we get

$$\tilde{u} = \begin{bmatrix} x_1 + \mathrm{sign}(x_1) \cdot \|x\|_2 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \quad \text{with} \quad u = \frac{\tilde{u}}{\|\tilde{u}\|_2}.$$

We write this as $u = \mathrm{House}(x)$. (In practice, we can store $\tilde{u}$ instead of $u$ to save the work of computing $u$, and use the formula $P = I - (2/\|\tilde{u}\|_2^2)\tilde{u}\tilde{u}^T$ instead of $P = I - 2uu^T$.)
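A minimal sketch of $\mathrm{House}(x)$ in Python follows. The function name `house`, the convention sign(0) = +1, and the sample vector are assumptions made for this illustration and are not part of the lecture.

```python
import numpy as np

def house(x):
    """Return a unit vector u such that (I - 2 u u^T) x = c e_1."""
    x = np.asarray(x, dtype=float)
    u = x.copy()
    sign = 1.0 if x[0] >= 0 else -1.0    # assumption: treat sign(0) as +1
    u[0] += sign * np.linalg.norm(x)     # same sign, so no cancellation
    norm_u = np.linalg.norm(u)
    if norm_u == 0.0:
        raise ValueError("x is the zero vector; no reflection is defined")
    return u / norm_u

x = np.array([3.0, 4.0, 0.0, 12.0])      # ||x||_2 = 13
u = house(x)
Px = x - 2.0 * u * (u @ x)               # apply P without forming the matrix
print(Px)
```

With this sign choice $c = -\mathrm{sign}(x_1)\|x\|_2$, so for the sample vector $Px$ equals $-13\,e_1$ up to rounding.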
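EXAMPLE 3.5. We show how to compute the QR decomposition of a 5-by-4 matrix $A$ using Householder transformations. This example will make the pattern for general $m$-by-$n$ matrices evident. In the matrices below, $P_i$ is an orthogonal matrix, x denotes a generic nonzero entry, and o denotes a zero entry.

1. Choose $P_1$ so

$$A_1 \equiv P_1 A = \begin{bmatrix} x & x & x & x \\ o & x & x & x \\ o & x & x & x \\ o & x & x & x \\ o & x & x & x \end{bmatrix}.$$

2. Choose $P_2 = \begin{bmatrix} 1 & 0 \\ 0 & P_2' \end{bmatrix}$ so

$$A_2 \equiv P_2 A_1 = \begin{bmatrix} x & x & x & x \\ o & x & x & x \\ o & o & x & x \\ o & o & x & x \\ o & o & x & x \end{bmatrix}.$$

3. Choose $P_3 = \begin{bmatrix} 1 & & 0 \\ & 1 & \\ 0 & & P_3' \end{bmatrix}$ so

$$A_3 \equiv P_3 A_2 = \begin{bmatrix} x & x & x & x \\ o & x & x & x \\ o & o & x & x \\ o & o & o & x \\ o & o & o & x \end{bmatrix}.$$

4. Choose $P_4 = \begin{bmatrix} 1 & & & 0 \\ & 1 & & \\ & & 1 & \\ 0 & & & P_4' \end{bmatrix}$ so

$$A_4 \equiv P_4 A_3 = \begin{bmatrix} x & x & x & x \\ o & x & x & x \\ o & o & x & x \\ o & o & o & x \\ o & o & o & o \end{bmatrix}.$$

Here, we have chosen a Householder matrix $P_i'$ to zero out the subdiagonal entries in column $i$; this does not disturb the zeros already introduced in previous columns. Let us call the final 5-by-4 upper triangular matrix $\tilde{R} \equiv A_4$. Then $A = P_1^T P_2^T P_3^T P_4^T \tilde{R} = QR$, where $Q$ is the first four columns of $P_1^T P_2^T P_3^T P_4^T = P_1 P_2 P_3 P_4$ (since all $P_i$ are symmetric) and $R$ is the first four rows of $\tilde{R}$.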
Here is the general algorithm for QR decomposition using Householder transformations.

ALGORITHM 3.2. QR factorization using Householder reflections:

    for i = 1 to min(m - 1, n)
        u_i = House(A(i : m, i))
        P_i' = I - 2 u_i u_i^T
        A(i : m, i : n) = P_i' A(i : m, i : n)
    end for
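The algorithm can be sketched in Python as follows. This is an illustrative version, not the lecture's code: the names `house`, `householder_qr` and `form_q`, the 0-based loop, and the choice to return the reflector vectors in a list are all assumptions of this example. The reflections are applied as rank-one updates, so no $P_i$ is formed explicitly.

```python
import numpy as np

def house(x):
    """Unit vector u with (I - 2 u u^T) x = c e_1 (zero vector if x = 0)."""
    u = np.array(x, dtype=float)
    u[0] += (1.0 if u[0] >= 0 else -1.0) * np.linalg.norm(u)
    nrm = np.linalg.norm(u)
    return u / nrm if nrm > 0 else u     # u = 0 means P = I

def householder_qr(A):
    """Return the list of reflector vectors u_i and the m-by-n triangular factor."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    U = []
    for i in range(min(m - 1, n)):
        u = house(A[i:, i])
        # A(i:m, i:n) = A(i:m, i:n) - 2 u (u^T A(i:m, i:n))
        A[i:, i:] -= 2.0 * np.outer(u, u @ A[i:, i:])
        U.append(u)
    return U, np.triu(A)                 # triu clears rounding noise below the diagonal

def form_q(U, m):
    """Accumulate Q = P_1 P_2 ... P_t as an m-by-m matrix."""
    Q = np.eye(m)
    for i in reversed(range(len(U))):
        Q[i:, :] -= 2.0 * np.outer(U[i], U[i] @ Q[i:, :])
    return Q

rng = np.random.default_rng(1)           # illustrative 5-by-4 test matrix
A = rng.standard_normal((5, 4))
U, R_tilde = householder_qr(A)
Q_full = form_q(U, 5)
Q, R = Q_full[:, :4], R_tilde[:4, :]     # thin factors: A = Q R
print(np.linalg.norm(Q @ R - A))
```

Keeping $Q$ in factored form (the list `U`) is usually enough; `form_q` is included only to check the factorization.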
Here are some more implementation details. We never need to form $P_i$ explicitly but just multiply

$$(I - 2u_iu_i^T)A(i:m, i:n) = A(i:m, i:n) - 2u_i(u_i^T A(i:m, i:n)),$$

which costs less. To store $P_i$, we need only $u_i$, or $\tilde{u}_i$ and $\|\tilde{u}_i\|$. These can be stored in column $i$ of $A$; in fact it need not be changed! Thus QR can be overwritten on $A$, where $Q$ is stored in factored form $P_1, \ldots, P_{n-1}$, and $P_i$ is stored as $\tilde{u}_i$ below the diagonal in column $i$ of $A$. (We need an extra array of length $n$ for the top entry of $\tilde{u}_i$, since the diagonal entry is occupied by $R_{ii}$.)

To solve the least squares problem $\min \|Ax - b\|_2$ using $A = QR$, we need to compute $Q^Tb$. This is done as follows: $Q^Tb = P_n P_{n-1} \cdots P_1 b$, so we need only keep multiplying $b$ by $P_1, P_2, \ldots, P_n$:

    for i = 1 to n
        γ = -2 · u_i^T b(i : m)
        b(i : m) = b(i : m) + γ u_i
    end for
The cost is $n$ dot products $\gamma = -2 \cdot u_i^T b$ and $n$ saxpys $b + \gamma u_i$. The cost of computing $A = QR$ this way is $2n^2m - \frac{2}{3}n^3$, and the subsequent cost of solving the least squares problem given QR is just an additional $O(mn)$.

The LAPACK routine for solving the least squares problem using QR is sgels. Just as Gaussian elimination can be reorganized to use matrix-matrix multiplication and other Level 3 BLAS, the same can be done for the QR decomposition. In Matlab, if the $m$-by-$n$ matrix $A$ has more rows than columns and $b$ is $m$ by 1, A \ b solves the least squares problem. The QR decomposition itself is also available via [Q, R] = qr(A).
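The following Python sketch puts the pieces together for a full-rank least squares problem: each reflection is applied to $A$ and to $b$ in the same loop (one dot product and one saxpy per step for $b$), and the triangular system is then solved. The function name `lstsq_householder` and the sample data are assumptions for illustration; `np.linalg.solve` is used for brevity and does not exploit the triangular structure.

```python
import numpy as np

def lstsq_householder(A, b):
    """Solve min ||Ax - b||_2 for full-rank A (m >= n); return x and ||r||_2."""
    R = np.array(A, dtype=float)
    c = np.array(b, dtype=float)
    m, n = R.shape
    for i in range(min(m - 1, n)):
        u = R[i:, i].copy()
        u[0] += (1.0 if u[0] >= 0 else -1.0) * np.linalg.norm(u)
        nrm = np.linalg.norm(u)
        if nrm == 0.0:
            continue                      # column already zero: P_i = I
        u /= nrm
        R[i:, i:] -= 2.0 * np.outer(u, u @ R[i:, i:])
        gamma = -2.0 * (u @ c[i:])        # dot product
        c[i:] += gamma * u                # saxpy
    # c now holds Q^T b; its first n entries determine x, the rest the residual.
    x = np.linalg.solve(np.triu(R[:n, :n]), c[:n])
    return x, np.linalg.norm(c[n:])

# Hypothetical data: fit a straight line through four points.
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])
x, res = lstsq_householder(A, b)
print(x, res)
```

The returned residual norm comes for free from the trailing entries of $Q^Tb$, without forming $Ax - b$.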
QR decomposition using Householder reflections

We can use Householder reflections to calculate the QR factorization of an $m$-by-$n$ matrix $A$ with $m \ge n$. Let $x$ be an arbitrary real $m$-dimensional column vector of $A$ such that $\|x\| = |\alpha|$ for a scalar $\alpha$. If the algorithm is implemented using floating-point arithmetic, then $\alpha$ should get the opposite sign as the $k$-th coordinate of $x$, where $x_k$ is to be the pivot coordinate after which all entries are 0 in matrix $A$'s final upper triangular form, to avoid loss of significance. In the complex case, set $\alpha = -e^{i \arg x_k}\|x\|$ (Stoer and Bulirsch 2002, p. 225) and substitute transposition by conjugate transposition in the construction of $Q$ below.

Then, where $e_1$ is the vector $(1, 0, \ldots, 0)^T$, $\|\cdot\|$ is the Euclidean norm and $I$ is an $m$-by-$m$ identity matrix, set

$$u = x - \alpha e_1, \qquad v = \frac{u}{\|u\|}, \qquad Q = I - 2vv^T.$$

In the case of complex $A$ set

$$Q = I - (1 + w)vv^H, \quad \text{where } w = x^Hv / v^Hx$$

and where $x^H$ is the conjugate transpose (transjugate) of $x$. $Q$ is an $m$-by-$m$ Householder matrix and $Qx = (\alpha, 0, \ldots, 0)^T$.
This can be used to gradually transform an $m$-by-$n$ matrix $A$ to upper triangular form. First, we multiply $A$ with the Householder matrix $Q_1$ we obtain when we choose the first matrix column for $x$. This results in a matrix $Q_1A$ with zeros in the left column (except for the first row):

$$Q_1 A = \begin{bmatrix} \alpha_1 & \star & \cdots & \star \\ 0 & & & \\ \vdots & & A' & \\ 0 & & & \end{bmatrix}.$$

This can be repeated for $A'$ (obtained from $Q_1A$ by deleting the first row and first column), resulting in a Householder matrix $Q_2'$. Note that $Q_2'$ is smaller than $Q_1$. Since we want it really to operate on $Q_1A$ instead of $A'$ we need to expand it to the upper left, filling in a 1, or in general:

$$Q_k = \begin{pmatrix} I_{k-1} & 0 \\ 0 & Q_k' \end{pmatrix}.$$

After $t$ iterations of this process, $t = \min(m - 1, n)$,

$$R = Q_t \cdots Q_2 Q_1 A$$

is an upper triangular matrix. So, with $Q = Q_1^T Q_2^T \cdots Q_t^T$, $A = QR$ is a QR decomposition of $A$.
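Example: how to compute a QR decomposition using Householder reflections.

Let us calculate the decomposition of

$$A = \begin{pmatrix} 12 & -51 & 4 \\ 6 & 167 & -68 \\ -4 & 24 & -41 \end{pmatrix}.$$

First, we need to find a reflection that transforms the first column of matrix $A$, vector $a_1 = (12, 6, -4)^T$, to $\|a_1\| e_1 = (14, 0, 0)^T$. Now, $u = x - \alpha e_1$, and $v = u/\|u\|$.

Here, $\alpha = 14$ and $x = a_1 = (12, 6, -4)^T$. Therefore

$$u = (-2, 6, -4)^T = (2)(-1, 3, -2)^T \quad \text{and} \quad v = \frac{1}{\sqrt{14}}(-1, 3, -2)^T,$$

and then

$$Q_1 = I - \frac{2}{\sqrt{14}\sqrt{14}} \begin{pmatrix} -1 \\ 3 \\ -2 \end{pmatrix} \begin{pmatrix} -1 & 3 & -2 \end{pmatrix} = I - \frac{1}{7} \begin{pmatrix} 1 & -3 & 2 \\ -3 & 9 & -6 \\ 2 & -6 & 4 \end{pmatrix} = \begin{pmatrix} 6/7 & 3/7 & -2/7 \\ 3/7 & -2/7 & 6/7 \\ -2/7 & 6/7 & 3/7 \end{pmatrix}.$$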
Now observe:

$$Q_1 A = \begin{pmatrix} 14 & 21 & -14 \\ 0 & -49 & -14 \\ 0 & 168 & -77 \end{pmatrix},$$

so we already have almost a triangular matrix. We only need to zero the (3, 2) entry. Take the (1, 1) minor, and then apply the process again to

$$A' = M_{11} = \begin{pmatrix} -49 & -14 \\ 168 & -77 \end{pmatrix}.$$

By the same method as above, we obtain the matrix of the Householder transformation

$$Q_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -7/25 & 24/25 \\ 0 & 24/25 & 7/25 \end{pmatrix}$$

after performing a direct sum with 1 to make sure the next step in the process works properly.
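Now, we find

$$Q = Q_1^T Q_2^T = \begin{pmatrix} 6/7 & -69/175 & 58/175 \\ 3/7 & 158/175 & -6/175 \\ -2/7 & 6/35 & 33/35 \end{pmatrix}.$$

Then

$$Q = Q_1^T Q_2^T = \begin{pmatrix} 0.8571 & -0.3943 & 0.3314 \\ 0.4286 & 0.9029 & -0.0343 \\ -0.2857 & 0.1714 & 0.9429 \end{pmatrix}, \qquad R = Q_2 Q_1 A = Q^T A = \begin{pmatrix} 14 & 21 & -14 \\ 0 & 175 & -70 \\ 0 & 0 & -35 \end{pmatrix}.$$

The matrix $Q$ is orthogonal and $R$ is upper triangular, so $A = QR$ is the required QR-decomposition.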
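The worked example can be reproduced numerically. In the sketch below the helper `reflector` is a name introduced for this illustration; it builds $Q = I - 2vv^T$ from $v = (x - \alpha e_1)/\|x - \alpha e_1\|$, with $\alpha = \|x\|$ as in the hand calculation ($\alpha = 14$ for the first column, $\alpha = 175$ for the first column of the minor).

```python
import numpy as np

A = np.array([[12.0, -51.0, 4.0],
              [6.0, 167.0, -68.0],
              [-4.0, 24.0, -41.0]])

def reflector(x, alpha):
    """Householder matrix I - 2 v v^T with v = (x - alpha e_1) / ||x - alpha e_1||."""
    u = np.array(x, dtype=float)
    u[0] -= alpha
    v = u / np.linalg.norm(u)
    return np.eye(len(u)) - 2.0 * np.outer(v, v)

Q1 = reflector(A[:, 0], 14.0)                 # maps (12, 6, -4) to (14, 0, 0)
A1 = Q1 @ A
Q2 = np.eye(3)
Q2[1:, 1:] = reflector(A1[1:, 1], 175.0)      # acts on the (1, 1) minor
R = Q2 @ A1
Q = Q1.T @ Q2.T
print(np.round(R, 10))
print(np.round(Q, 4))
```

Choosing $\alpha$ with the same sign as the pivot, as done here to match the hand calculation, keeps the numbers simple; in floating-point code the opposite sign is the safer choice.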
Tridiagonalization using Householder transformation

This procedure is taken from the book: Numerical Analysis, Burden and Faires, 8th Edition. In the first step, to form the Householder matrix we need to determine $\alpha$ and $r$, which are given by:

$$\alpha = -\mathrm{sgn}(a_{21}) \sqrt{\sum_{j=2}^{n} a_{j1}^2}; \qquad r = \sqrt{\frac{1}{2}(\alpha^2 - a_{21}\alpha)}.$$

From $\alpha$ and $r$, construct the vector $v^{(1)} = (v_1, v_2, \ldots, v_n)^T$, where

$$v_1 = 0, \qquad v_2 = \frac{a_{21} - \alpha}{2r}, \qquad v_k = \frac{a_{k1}}{2r} \ \text{ for each } k = 3, 4, \ldots, n.$$

Then compute

$$P_1 = I - 2v^{(1)}(v^{(1)})^T$$

and obtain the matrix $A^{(1)}$ as

$$A^{(1)} = P_1 A P_1.$$

Having found $P_1$ and computed $A^{(1)}$, the process is repeated for $k = 2, 3, \ldots, n - 2$ as follows:

$$\alpha = -\mathrm{sgn}(a^k_{k+1,k}) \sqrt{\sum_{j=k+1}^{n} (a^k_{jk})^2}; \qquad r = \sqrt{\frac{1}{2}(\alpha^2 - a^k_{k+1,k}\alpha)};$$

$$v^k_1 = v^k_2 = \cdots = v^k_k = 0; \qquad v^k_{k+1} = \frac{a^k_{k+1,k} - \alpha}{2r}; \qquad v^k_j = \frac{a^k_{jk}}{2r} \ \text{ for } j = k + 2, k + 3, \ldots, n;$$

$$P_k = I - 2v^{(k)}(v^{(k)})^T; \qquad A^{(k+1)} = P_k A^{(k)} P_k.$$
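The procedure above can be sketched in Python as follows. The function name `householder_tridiagonalize`, the 0-based indexing, the convention sgn(0) = +1, and the symmetric test matrix are assumptions of this illustration; forming each $P_k$ as a dense matrix is done for clarity, not efficiency.

```python
import numpy as np

def householder_tridiagonalize(A):
    """Reduce a symmetric matrix to a similar tridiagonal matrix."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for k in range(n - 2):                   # 0-based step index
        x = A[k + 1:, k]                     # entries below the diagonal in column k
        sgn = 1.0 if x[0] >= 0 else -1.0     # assumption: sgn(0) = +1
        alpha = -sgn * np.linalg.norm(x)
        r = np.sqrt(0.5 * (alpha**2 - x[0] * alpha))
        if r == 0.0:
            continue                         # column is already zero below the diagonal
        v = np.zeros(n)
        v[k + 1] = (x[0] - alpha) / (2.0 * r)
        v[k + 2:] = x[1:] / (2.0 * r)
        P = np.eye(n) - 2.0 * np.outer(v, v)
        A = P @ A @ P                        # similarity transform
    return A

# Illustrative symmetric 4-by-4 matrix.
A = np.array([[4.0, 1.0, -2.0, 2.0],
              [1.0, 2.0, 0.0, 1.0],
              [-2.0, 0.0, 3.0, -2.0],
              [2.0, 1.0, -2.0, -1.0]])
T = householder_tridiagonalize(A)
print(np.round(T, 4))
```

Because every step is a similarity transform with an orthogonal matrix, `T` has the same eigenvalues as `A`, which gives a simple sanity check.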
Example 1

In this example, the given matrix $A$ is transformed to the similar tridiagonal matrix $A_1$ by using the Householder method. We have

$$A = \begin{pmatrix} 5 & 1 & 0 \\ 1 & 6 & 3 \\ 0 & 3 & 7 \end{pmatrix}.$$
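Steps:

1. First compute $\alpha$ as

$$\alpha = -\mathrm{sgn}(a_{21}) \sqrt{\sum_{j=2}^{n} a_{j1}^2} = -\sqrt{a_{21}^2 + a_{31}^2} = -\sqrt{1^2 + 0^2} = -1.$$

2. Using $\alpha$ we find $r$ as

$$r = \sqrt{\frac{1}{2}(\alpha^2 - a_{21}\alpha)} = \sqrt{\frac{1}{2}((-1)^2 - 1 \cdot (-1))} = 1.$$

3. From $\alpha$ and $r$, construct the vector $v^{(1)} = (v_1, v_2, \ldots, v_n)^T$, where $v_1 = 0$, $v_2 = \frac{a_{21} - \alpha}{2r}$, and $v_k = \frac{a_{k1}}{2r}$ for each $k = 3, 4, \ldots, n$. To do that we compute:

$$v_1 = 0, \qquad v_2 = \frac{a_{21} - \alpha}{2r} = \frac{1 - (-1)}{2 \cdot 1} = 1, \qquad v_3 = \frac{a_{31}}{2r} = 0,$$

and we have

$$v^{(1)} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}.$$

Then compute the matrix $P_1 = I - 2v^{(1)}(v^{(1)})^T$:

$$P_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

After that we can obtain the matrix $A^{(1)}$ as

$$A^{(1)} = P_1 A P_1 = \begin{pmatrix} 5 & -1 & 0 \\ -1 & 6 & -3 \\ 0 & -3 & 7 \end{pmatrix}.$$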
Example 2

In this example, the given matrix $A$ is transformed to the similar tridiagonal matrix $A_2$ by using the Householder method. We have

$$A = \begin{pmatrix} 4 & 1 & -2 & 2 \\ 1 & 2 & 0 & 1 \\ -2 & 0 & 3 & -2 \\ 2 & 1 & -2 & -1 \end{pmatrix}.$$
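Steps:

1. First compute $\alpha$ as

$$\alpha = -\mathrm{sgn}(a_{21}) \sqrt{\sum_{j=2}^{n} a_{j1}^2} = (-1)\sqrt{a_{21}^2 + a_{31}^2 + a_{41}^2} = -1 \cdot \sqrt{1^2 + (-2)^2 + 2^2} = (-1)\sqrt{1 + 4 + 4} = -\sqrt{9} = -3.$$

2. Using $\alpha$ we find $r$ as

$$r = \sqrt{\frac{1}{2}(\alpha^2 - a_{21}\alpha)} = \sqrt{\frac{1}{2}((-3)^2 - 1 \cdot (-3))} = \sqrt{6}.$$

3. From $\alpha$ and $r$, construct the vector $v^{(1)} = (v_1, v_2, \ldots, v_n)^T$, where $v_1 = 0$, $v_2 = \frac{a_{21} - \alpha}{2r}$, and $v_k = \frac{a_{k1}}{2r}$ for each $k = 3, 4, \ldots, n$. To do that we compute:

$$v_1 = 0, \qquad v_2 = \frac{a_{21} - \alpha}{2r} = \frac{1 - (-3)}{2\sqrt{6}} = \frac{2}{\sqrt{6}}, \qquad v_3 = \frac{a_{31}}{2r} = \frac{-2}{2\sqrt{6}} = -\frac{1}{\sqrt{6}}, \qquad v_4 = \frac{a_{41}}{2r} = \frac{2}{2\sqrt{6}} = \frac{1}{\sqrt{6}},$$

and we have

$$v^{(1)} = \begin{pmatrix} 0 \\ 2/\sqrt{6} \\ -1/\sqrt{6} \\ 1/\sqrt{6} \end{pmatrix}.$$

Then compute the matrix $P_1$:

$$P_1 = I - 2v^{(1)}(v^{(1)})^T = I - 2 \begin{pmatrix} 0 \\ 2/\sqrt{6} \\ -1/\sqrt{6} \\ 1/\sqrt{6} \end{pmatrix} \begin{pmatrix} 0 & 2/\sqrt{6} & -1/\sqrt{6} & 1/\sqrt{6} \end{pmatrix},$$

so the first Householder matrix is

$$P_1 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1/3 & 2/3 & -2/3 \\ 0 & 2/3 & 2/3 & 1/3 \\ 0 & -2/3 & 1/3 & 2/3 \end{pmatrix}.$$

After that we can obtain the matrix $A^{(1)} = A_1$ as

$$A_1 = P_1 A P_1 = \begin{pmatrix} 4 & -3 & 0 & 0 \\ -3 & 10/3 & 1 & 4/3 \\ 0 & 1 & 5/3 & -4/3 \\ 0 & 4/3 & -4/3 & -1 \end{pmatrix}.$$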
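We then use $A_1$ to form

$$P_2 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -3/5 & -4/5 \\ 0 & 0 & -4/5 & 3/5 \end{pmatrix}, \qquad A_2 = P_2 A_1 P_2 = \begin{pmatrix} 4 & -3 & 0 & 0 \\ -3 & 10/3 & -5/3 & 0 \\ 0 & -5/3 & -33/25 & 68/75 \\ 0 & 0 & 68/75 & 149/75 \end{pmatrix}.$$

As we can see, the final result is a tridiagonal symmetric matrix which is similar to the original one. The process finished after 2 steps.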
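The slides state $P_2$ without showing the intermediate quantities. As a sketch of the omitted step, applying the same formulas with $k = 2$ to the entries $a_{32} = 1$ and $a_{42} = 4/3$ of $A_1$ gives the following (the derivation is reconstructed here from the general formulas and is not part of the original slides):

```latex
\begin{aligned}
\alpha &= -\mathrm{sgn}(a_{32})\sqrt{a_{32}^2 + a_{42}^2}
        = -\sqrt{1 + \tfrac{16}{9}} = -\tfrac{5}{3}, \\
r &= \sqrt{\tfrac{1}{2}\left(\alpha^2 - a_{32}\alpha\right)}
   = \sqrt{\tfrac{1}{2}\left(\tfrac{25}{9} + \tfrac{5}{3}\right)}
   = \sqrt{\tfrac{20}{9}} = \tfrac{2\sqrt{5}}{3}, \\
v_1 &= v_2 = 0, \qquad
v_3 = \frac{a_{32} - \alpha}{2r} = \frac{8/3}{4\sqrt{5}/3} = \frac{2}{\sqrt{5}}, \qquad
v_4 = \frac{a_{42}}{2r} = \frac{4/3}{4\sqrt{5}/3} = \frac{1}{\sqrt{5}}, \\
P_2 &= I - 2v^{(2)}(v^{(2)})^T
     = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\
       0 & 0 & 1 - \tfrac{8}{5} & -\tfrac{4}{5} \\
       0 & 0 & -\tfrac{4}{5} & 1 - \tfrac{2}{5} \end{pmatrix}
     = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\
       0 & 0 & -\tfrac{3}{5} & -\tfrac{4}{5} \\
       0 & 0 & -\tfrac{4}{5} & \tfrac{3}{5} \end{pmatrix}.
\end{aligned}
```

The new subdiagonal entry in column 2 is $\alpha = -5/3$, which matches the $(3, 2)$ entry of $A_2$.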
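Givens Rotation

A Givens rotation is represented by a matrix of the form

$$G(i, j, \theta) = \begin{pmatrix} 1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & & \vdots & & \vdots \\ 0 & \cdots & c & \cdots & -s & \cdots & 0 \\ \vdots & & \vdots & \ddots & \vdots & & \vdots \\ 0 & \cdots & s & \cdots & c & \cdots & 0 \\ \vdots & & \vdots & & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & \cdots & 0 & \cdots & 1 \end{pmatrix}$$

where $c = \cos(\theta)$ and $s = \sin(\theta)$ appear at the intersections of the $i$-th and $j$-th rows and columns.

That is, the non-zero elements of the Givens matrix are given by:

(1) $g_{kk} = 1$ for $k \ne i, j$
(2) $g_{ii} = c$
(3) $g_{jj} = c$
(4) $g_{ji} = -s$
(5) $g_{ij} = s$ for $i > j$

(the sign of the sine switches for $j > i$).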
Givens Rotation

The product $G(i, j, \theta)x$ represents a counterclockwise rotation of the vector $x$ in the $(i, j)$ plane of $\theta$ radians, hence the name Givens rotation. When a Givens rotation matrix $G$ multiplies another matrix, $A$, from the left, $GA$, only rows $i$ and $j$ of $A$ are affected. Thus we restrict attention to the following problem. Given $a$ and $b$, find $c = \cos\theta$ and $s = \sin\theta$ such that

$$\begin{bmatrix} c & -s \\ s & c \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} r \\ 0 \end{bmatrix}.$$

Explicit calculation of $\theta$ is rarely necessary or desirable. Instead we directly seek $c$, $s$, and $r$. An obvious solution would be

(6) $r = \sqrt{a^2 + b^2}$
(7) $c = a/r$
(8) $s = -b/r$.
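These formulas translate directly into a few lines of Python. The function name `givens` and the handling of the case $a = b = 0$ (returning the identity rotation) are assumptions of this sketch; `math.hypot` is used for $r$ because it avoids overflow in $a^2 + b^2$ for large inputs.

```python
import math

def givens(a, b):
    """Return (c, s, r) such that [[c, -s], [s, c]] @ [a, b] = [r, 0]."""
    r = math.hypot(a, b)        # sqrt(a^2 + b^2) without intermediate overflow
    if r == 0.0:
        return 1.0, 0.0, 0.0    # assumption: use the identity when a = b = 0
    return a / r, -b / r, r

c, s, r = givens(6.0, 5.0)
top = c * 6.0 - s * 5.0         # first component of the rotated vector
bottom = s * 6.0 + c * 5.0      # second component, zero up to rounding
print(c, s, r, top, bottom)
```

For $a = 6$, $b = 5$ this gives $r = \sqrt{61} \approx 7.8102$, with $c \approx 0.7682$ and $s \approx -0.6402$.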
Example

Given the following 3x3 matrix, perform two iterations of the Givens rotation to bring the matrix to an upper triangular matrix.

$$A = \begin{pmatrix} 6 & 5 & 0 \\ 5 & 1 & 4 \\ 0 & 4 & 3 \end{pmatrix}$$

In order to form the desired matrix, we must zero elements (2,1) and (3,2). We first select element (2,1) to zero, using a rotation matrix of:

$$G_1 = \begin{pmatrix} c & -s & 0 \\ s & c & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
We have the following matrix multiplication:

$$\begin{pmatrix} c & -s & 0 \\ s & c & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 6 & 5 & 0 \\ 5 & 1 & 4 \\ 0 & 4 & 3 \end{pmatrix}$$

where:

(9) $r = \sqrt{6^2 + 5^2} = 7.8102$
(10) $c = 6/r = 0.7682$
(11) $s = -5/r = -0.6402$

Plugging in these values for $c$ and $s$ and performing the matrix multiplication above yields a new $A$ of:

$$A = \begin{pmatrix} 7.8102 & 4.4813 & 2.5607 \\ 0 & -2.4327 & 3.0729 \\ 0 & 4 & 3 \end{pmatrix}$$

We now want to zero element (3,2) to finish off the process. Using the same idea as before, we have a rotation matrix of:

$$G_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & c & -s \\ 0 & s & c \end{pmatrix}$$
We are presented with the following matrix multiplication:

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & c & -s \\ 0 & s & c \end{pmatrix} \begin{pmatrix} 7.8102 & 4.4813 & 2.5607 \\ 0 & -2.4327 & 3.0729 \\ 0 & 4 & 3 \end{pmatrix}$$

where:

(12) $r = \sqrt{(-2.4327)^2 + 4^2} = 4.6817$
(13) $c = -2.4327/r = -0.5196$
(14) $s = -4/r = -0.8544$

Plugging in these values for $c$ and $s$ and performing the multiplications gives us a new matrix of:

$$R = \begin{pmatrix} 7.8102 & 4.4813 & 2.5607 \\ 0 & 4.6817 & 0.9664 \\ 0 & 0 & -4.1843 \end{pmatrix}$$
Calculating the QR decomposition

This new matrix $R$ is the upper triangular matrix needed to perform an iteration of the QR decomposition. $Q$ is now formed using the transpose of the rotation matrices in the following manner:

$$Q = G_1^T G_2^T$$

Performing this matrix multiplication yields:

$$Q = \begin{pmatrix} 0.7682 & 0.3327 & 0.5470 \\ 0.6402 & -0.3992 & -0.6564 \\ 0 & 0.8544 & -0.5196 \end{pmatrix}$$
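The two rotations of this example can be reproduced with a short script. The helper `givens_matrix` and its 0-based row indices are assumptions introduced for this sketch; it places $c$, $-s$, $s$, $c$ in rows and columns $i < j$ following the convention $[c, -s; s, c]$ with $c = a/r$ and $s = -b/r$.

```python
import numpy as np

def givens_matrix(n, i, j, a, b):
    """n-by-n rotation on rows i < j (0-based) that maps the pair (a, b) to (r, 0)."""
    r = np.hypot(a, b)
    c, s = a / r, -b / r
    G = np.eye(n)
    G[i, i] = c
    G[j, j] = c
    G[i, j] = -s
    G[j, i] = s
    return G

A = np.array([[6.0, 5.0, 0.0],
              [5.0, 1.0, 4.0],
              [0.0, 4.0, 3.0]])

G1 = givens_matrix(3, 0, 1, A[0, 0], A[1, 0])    # zero element (2,1)
A1 = G1 @ A
G2 = givens_matrix(3, 1, 2, A1[1, 1], A1[2, 1])  # zero element (3,2)
R = G2 @ A1
Q = G1.T @ G2.T
print(np.round(R, 4))
print(np.round(Q, 4))
```

Only two rotations are needed here because the (3,1) entry of $A$ is already zero; a general dense matrix needs one rotation per subdiagonal entry.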
Rank-deficient Least Squares Problems

Proposition 3.1. Let $A$ be $m$ by $n$ with $m \ge n$ and rank $A = r < n$. Then there is an $(n - r)$-dimensional set of vectors that minimize $\|Ax - b\|_2$.

Proof. Let $Az = 0$. Then if $x$ minimizes $\|Ax - b\|_2$, $x + z$ also minimizes $\|A(x + z) - b\|_2$, because $A(x + z) = Ax$. The null space of $A$ has dimension $n - r$, so this means that the least-squares solution is not unique.
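A tiny numerical illustration of this non-uniqueness follows. The rank-1 matrix, the right-hand side and the null vector `z` are made-up data for this sketch.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [1.0, 1.0]])                     # rank 1 < n = 2
b = np.array([1.0, 2.0, 4.0])

x = np.linalg.lstsq(A, b, rcond=None)[0]       # one particular minimizer
z = np.array([1.0, -1.0])                      # null vector: A @ z = 0

res_x = np.linalg.norm(A @ x - b)
res_xz = np.linalg.norm(A @ (x + 3.0 * z) - b) # shifted along the null space
print(res_x, res_xz)
```

Both residual norms agree, so every vector $x + tz$ is a minimizer.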
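Proposition 3.2. Let $\sigma_{\min} = \sigma_{\min}(A) > 0$ be the smallest singular value of $A$. Then

1. If $x$ minimizes $\|Ax - b\|_2$, then $\|x\|_2 \ge |u_n^T b| / \sigma_{\min}$, where $u_n$ is the last column of $U$ in the SVD $A = U\Sigma V^T$.
2. Changing $b$ to $b + \delta b$ can change $x$ to $x + \delta x$, where $\|\delta x\|_2$ is as large as $\|\delta b\|_2 / \sigma_{\min}$. In other words, when $\sigma_{\min}$ is small the solution is very ill-conditioned.

Proof. 1. Using the SVD of $A$ we can write $Ax = b$ as $U\Sigma V^T x = b$, and formally $x = (U\Sigma V^T)^{-1} b = V\Sigma^{-1}U^T b$, since $U^TU = I$ and $VV^T = I$. The matrix $A^+ = V\Sigma^{-1}U^T$ is the Moore-Penrose pseudoinverse of $A$, and the least squares solution is $x = V\Sigma^{-1}U^T b = A^+ b$. Since $V$ is orthogonal,

$$\|x\|_2 = \|\Sigma^{-1}U^T b\|_2 \ge |(\Sigma^{-1}U^T b)_n| = \frac{|u_n^T b|}{\sigma_{\min}}.$$

2. We have

$$\|x + \delta x\|_2 = \|\Sigma^{-1}U^T(b + \delta b)\|_2 \ge |(\Sigma^{-1}U^T(b + \delta b))_n| = \frac{|u_n^T(b + \delta b)|}{\sigma_{\min}} = \left| \frac{u_n^T b}{\sigma_{\min}} + \frac{u_n^T \delta b}{\sigma_{\min}} \right|.$$

Choose $\delta b$ parallel to $u_n$. Then $|u_n^T \delta b| = \|\delta b\|_2$, and applying part 1 to $\delta x = A^+ \delta b$ gives

$$\|\delta x\|_2 \ge \frac{\|\delta b\|_2}{\sigma_{\min}}.$$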
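The worst case in part 2 is easy to observe numerically. In the sketch below the nearly rank-deficient matrix (with $\sigma_{\min} = 10^{-6}$), the right-hand side, and the perturbation size $10^{-3}$ are illustrative assumptions.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1e-6],
              [0.0, 0.0]])                      # singular values 1 and 1e-6
U, s, Vt = np.linalg.svd(A, full_matrices=False)
u_n = U[:, -1]                                  # left singular vector for sigma_min

b = np.array([1.0, 0.0, 1.0])
db = 1e-3 * u_n                                 # perturbation parallel to u_n

x = np.linalg.lstsq(A, b, rcond=None)[0]
x_pert = np.linalg.lstsq(A, b + db, rcond=None)[0]

dx_norm = np.linalg.norm(x_pert - x)
predicted = np.linalg.norm(db) / s[-1]          # ||db||_2 / sigma_min
print(dx_norm, predicted)
```

A perturbation of size $10^{-3}$ in $b$ moves the solution by about $10^{3}$, as the proposition predicts.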
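Proposition 3.3. When $A$ is exactly singular, the $x$ that minimize $\|Ax - b\|_2$ can be characterized as follows. Let $A = U\Sigma V^T$ have rank $r < n$. Write the SVD of $A$ as

$$A = [U_1, U_2] \begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix} [V_1, V_2]^T = U_1 \Sigma_1 V_1^T.$$

Here, $\Sigma_1$ is $r \times r$ and nonsingular, and $U_1$ and $V_1$ have $r$ columns. Let $\sigma = \sigma_{\min}(\Sigma_1)$. Then

1. All solutions $x$ can be written as $x = V_1 \Sigma_1^{-1} U_1^T b + V_2 z$, where $z$ is an arbitrary vector.
2. The solution $x$ has minimal norm $\|x\|_2$ when $z = 0$. Then $x = V_1 \Sigma_1^{-1} U_1^T b$ and $\|x\|_2 \le \|b\|_2 / \sigma$.
3. Changing $b$ to $b + \delta b$ can change the minimal norm solution $x$ by at most $\|\delta b\|_2 / \sigma$.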
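Proof. Choose $\tilde{U}$ so that $[U, \tilde{U}] = [U_1, U_2, \tilde{U}]$ is an $m \times m$ orthogonal matrix. Then

$$\begin{aligned} \|Ax - b\|_2^2 &= \|[U_1, U_2, \tilde{U}]^T (Ax - b)\|_2^2 \\ &= \|[U_1, U_2, \tilde{U}]^T (U_1 \Sigma_1 V_1^T x - b)\|_2^2 \\ &= \left\| \begin{bmatrix} I_{r \times r} \\ 0_{(n-r) \times r} \\ 0_{(m-n) \times r} \end{bmatrix} \Sigma_1 V_1^T x - [U_1, U_2, \tilde{U}]^T b \right\|_2^2 \\ &= \left\| \begin{bmatrix} \Sigma_1 V_1^T x - U_1^T b \\ -U_2^T b \\ -\tilde{U}^T b \end{bmatrix} \right\|_2^2 \\ &= \|\Sigma_1 V_1^T x - U_1^T b\|_2^2 + \|U_2^T b\|_2^2 + \|\tilde{U}^T b\|_2^2. \end{aligned}$$

Then $\|Ax - b\|_2$ is minimized when $\Sigma_1 V_1^T x - U_1^T b = 0$, that is, when $\Sigma_1 V_1^T x = U_1^T b$, or $x = V_1 \Sigma_1^{-1} U_1^T b + V_2 z$, since $V_1^T V_2 z = 0$ for all $z$.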
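The characterization in Proposition 3.3 can be sketched with a truncated SVD. The function name `min_norm_lstsq`, the relative rank tolerance `tol`, and the rank-1 test matrix are assumptions made for this illustration.

```python
import numpy as np

def min_norm_lstsq(A, b, tol=1e-12):
    """Return the minimum-norm minimizer x0, a null-space basis V2, and sigma."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))          # numerical rank (assumed tolerance)
    U1, S1, V1 = U[:, :r], s[:r], Vt[:r, :].T
    V2 = Vt[r:, :].T                         # columns span the null space of A
    x0 = V1 @ ((U1.T @ b) / S1)              # V1 Sigma1^{-1} U1^T b
    return x0, V2, S1[-1]

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])                   # rank 1, n = 2
b = np.array([1.0, 1.0, 1.0])

x0, V2, sigma = min_norm_lstsq(A, b)
x1 = x0 + V2 @ np.array([2.0])               # another minimizer, z = 2

print(x0, np.linalg.norm(A @ x0 - b), np.linalg.norm(A @ x1 - b))
print(np.linalg.norm(x0), np.linalg.norm(x1))
```

Both vectors give the same residual, and `x0` (the case $z = 0$) has the smaller norm, bounded by $\|b\|_2 / \sigma$.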