Chapter 7. Lanczos Methods. 7.1 The Lanczos Algorithm
In this chapter we develop the Lanczos method, a technique that is applicable to large, sparse, symmetric eigenproblems. The method involves tridiagonalizing the given matrix A. However, unlike the Householder approach, no intermediate (and full) submatrices are generated. Equally important, information about A's extremal eigenvalues tends to emerge long before the tridiagonalization is complete. This makes the Lanczos algorithm particularly useful in situations where a few of A's largest or smallest eigenvalues are desired.

7.1 The Lanczos Algorithm

Suppose A ∈ R^{n×n} is large, sparse and symmetric. There exists an orthogonal matrix Q which transforms A to a tridiagonal matrix T:

    Q^T A Q = T  (tridiagonal).                                        (7.1.1)

Remark 7.1.1
(a) Such a Q can be generated by Householder transformations or Givens rotations.
(b) For almost all A (i.e., whenever all eigenvalues are distinct) and almost any q_1 ∈ R^n with ||q_1||_2 = 1, there exists an orthogonal matrix Q with first column q_1 satisfying (7.1.1). Moreover, q_1 determines T uniquely up to the signs of the columns of Q (that is, we may multiply each column by -1).

For x ∈ R^n let

    K[x, A, m] = [x, Ax, A^2 x, ..., A^{m-1} x] ∈ R^{n×m}.             (7.1.2)

K[x, A, m] is called a Krylov matrix. Let

    K(x, A, m) = Range(K[x, A, m]) = span(x, Ax, ..., A^{m-1} x).      (7.1.3)

K(x, A, m) is called the Krylov subspace generated by K[x, A, m].

Remark 7.1.2 For each H ∈ C^{n×m} (or R^{n×m}, m ≤ n) with rank(H) = m, there exist Q ∈ C^{n×m} (or R^{n×m}) with Q*Q = I_m and an upper triangular R ∈ C^{m×m} (or R^{m×m}) such that

    H = QR.                                                            (7.1.4)
Q is uniquely determined if we require all r_ii > 0.

Theorem 7.1.1 Let A be symmetric (Hermitian), let 1 ≤ m ≤ n be given, and suppose dim K(x, A, m) = m. Then:
(a) If

    K[x, A, m] = Q_m R_m                                               (7.1.5)

is a QR factorization, then Q_m* A Q_m = T_m is an m×m tridiagonal matrix and satisfies

    A Q_m = Q_m T_m + r_m e_m^T,    Q_m* r_m = 0.                      (7.1.6)

(b) Let ||x||_2 = 1. If Q_m ∈ C^{n×m} has first column x, satisfies Q_m* Q_m = I_m and

    A Q_m = Q_m T_m + r_m e_m^T,

where T_m is tridiagonal, then

    K[x, A, m] = [x, Ax, ..., A^{m-1} x] = Q_m [e_1, T_m e_1, ..., T_m^{m-1} e_1]   (7.1.7)

is a QR factorization of K[x, A, m].

Proof: (a) Since

    A K(x, A, j) ⊂ K(x, A, j+1),   j < m,                              (7.1.8)

from (7.1.5) we have

    span(q_1, ..., q_i) = K(x, A, i),   i ≤ m.                         (7.1.9)

So we have

    q_{i+1} ⊥ K(x, A, i) ⊇ A K(x, A, i-1) = A(span(q_1, ..., q_{i-1}))

by (7.1.8). This implies

    q_{i+1}* A q_j = 0,   j = 1, ..., i-1,   i+1 ≤ m.

That is,

    (Q_m* A Q_m)_{ij} = (T_m)_{ij} = q_i* A q_j = 0   for i > j+1.

So T_m is upper Hessenberg and hence tridiagonal (since T_m is Hermitian). It remains to show (7.1.6). Since [x, Ax, ..., A^{m-1} x] = Q_m R_m and

    A K[x, A, m] = K[x, A, m] S + A^m x e_m^T,

where S is the m×m down-shift matrix with ones on the subdiagonal and zeros elsewhere,
we have

    A Q_m R_m = Q_m R_m S + Q_m Q_m* A^m x e_m^T + (I - Q_m Q_m*) A^m x e_m^T.

Then, since e_m^T R_m^{-1} = γ e_m^T with γ = (R_m^{-1})_{mm},

    A Q_m = Q_m [R_m S + Q_m* A^m x e_m^T] R_m^{-1} + (I - Q_m Q_m*) A^m x e_m^T R_m^{-1}
          = Q_m [R_m S R_m^{-1} + γ Q_m* A^m x e_m^T] + γ (I - Q_m Q_m*) A^m x e_m^T
          = Q_m H_m + r_m e_m^T,

where r_m = γ (I - Q_m Q_m*) A^m x, so that Q_m* r_m = 0, and H_m = R_m S R_m^{-1} + γ Q_m* A^m x e_m^T is an upper Hessenberg matrix. But Q_m* A Q_m = H_m is Hermitian, so H_m = T_m is tridiagonal.

(b) We check (7.1.7). The first columns coincide: x = Q_m e_1. Suppose the i-th columns are equal, i.e., A^{i-1} x = Q_m T_m^{i-1} e_1. Then

    A^i x = A Q_m T_m^{i-1} e_1 = (Q_m T_m + r_m e_m^T) T_m^{i-1} e_1 = Q_m T_m^i e_1 + r_m e_m^T T_m^{i-1} e_1.

But e_m^T T_m^{i-1} e_1 = 0 for i < m, since T_m is tridiagonal. Therefore A^i x = Q_m T_m^i e_1, so the (i+1)-th columns are equal. It is clear that [e_1, T_m e_1, ..., T_m^{m-1} e_1] is an upper triangular matrix.

Theorem 7.1.1' If x = q_1 with ||q_1||_2 = 1 satisfies rank(K[x, A, n]) = n (that is, {x, Ax, ..., A^{n-1} x} are linearly independent), then there exists a unitary matrix Q with first column q_1 such that Q* A Q = T is tridiagonal.

Proof: Existence follows from Theorem 7.1.1(a) with m = n: Q_n = Q is unitary and AQ = QT.
Uniqueness: let Q* A Q = T, Q'* A Q' = T' and Q e_1 = Q' e_1. From the QR factorizations

    K[q_1, A, n] = QR = Q'R'

it follows that Q' = QD and R' = D* R, where D = diag(ε_1, ..., ε_n) with |ε_i| = 1. Substituting Q' = QD gives

    (QD)* A (QD) = D* Q* A Q D = D* T D = tridiagonal.
So Q is unique up to multiplying its columns by factors ε with |ε| = 1.

In the following paragraphs we investigate the Lanczos algorithm for the real case, i.e., A ∈ R^{n×n}: how to find an orthogonal matrix Q = (q_1, ..., q_n) with Q^T Q = I_n such that Q^T A Q = T is tridiagonal, Q being almost uniquely determined. Let

    AQ = QT,                                                           (7.1.10)

where Q = [q_1, ..., q_n] and T is the tridiagonal matrix with diagonal α_1, ..., α_n and sub/superdiagonal β_1, ..., β_{n-1}. Comparing the j-th columns of (7.1.10) gives

    A q_j = β_{j-1} q_{j-1} + α_j q_j + β_j q_{j+1}                    (7.1.11)

for j = 1, ..., n, with β_0 = β_n = 0. Multiplying (7.1.11) by q_j^T we obtain

    q_j^T A q_j = α_j.                                                 (7.1.12)

Define r_j = (A - α_j I) q_j - β_{j-1} q_{j-1}. Then

    r_j = β_j q_{j+1}   with   β_j = ±||r_j||_2,                       (7.1.13)

and if β_j ≠ 0 then

    q_{j+1} = r_j / β_j.                                               (7.1.14)

So we can determine the unknowns α_j, β_j, q_j in the order

    q_1, α_1, r_1, β_1, q_2, α_2, r_2, β_2, q_3, ....

These formulas define the Lanczos iteration:

    j = 0, r_0 = q_1, β_0 = 1, q_0 = 0;
    Do while (β_j ≠ 0)
        q_{j+1} = r_j / β_j,  j := j + 1,
        α_j = q_j^T A q_j,
        r_j = (A - α_j I) q_j - β_{j-1} q_{j-1},
        β_j = ||r_j||_2.                                               (7.1.15)

There is no loss of generality in choosing the β_j to be positive. The q_j are called Lanczos vectors. With careful overwriting and use of the formula α_j = q_j^T (A q_j - β_{j-1} q_{j-1}), the whole process can be implemented with only a pair of n-vectors.
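As a concrete sketch of the iteration (7.1.15): the NumPy routine below (the name `lanczos` and the diagonal test matrix are choices of this demo, not from the text) runs m steps and returns Q_m together with the tridiagonal T_m. With a well-separated extremal eigenvalue, the largest Ritz value is already accurate long before the tridiagonalization is complete.

```python
import numpy as np

def lanczos(A, q1, m):
    """Plain Lanczos iteration (7.1.15), without reorthogonalization.

    Returns Q_m (columns are Lanczos vectors) and the tridiagonal T_m.
    """
    n = A.shape[0]
    Q = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m)                # beta[j] holds beta_{j+1} of the text
    q = q1 / np.linalg.norm(q1)
    q_prev = np.zeros(n)
    b_prev = 0.0
    for j in range(m):
        Q[:, j] = q
        r = A @ q - b_prev * q_prev
        alpha[j] = q @ r              # alpha_j = q_j^T (A q_j - beta_{j-1} q_{j-1})
        r -= alpha[j] * q
        beta[j] = np.linalg.norm(r)
        if beta[j] == 0.0:            # invariant subspace found (the welcome event)
            m = j + 1
            break
        q_prev, q, b_prev = q, r / beta[j], beta[j]
    T = np.diag(alpha[:m]) + np.diag(beta[:m-1], 1) + np.diag(beta[:m-1], -1)
    return Q[:, :m], T

# Illustrative test matrix: eigenvalues 1, ..., 299 and a well-separated 1000.
rng = np.random.default_rng(0)
A = np.diag(np.r_[np.arange(1.0, 300.0), 1000.0])
Q, T = lanczos(A, rng.standard_normal(300), 20)
theta = np.linalg.eigvalsh(T)
print(theta[-1])                      # extremal Ritz value after only 20 of 300 steps
```

Only the matrix-vector product `A @ q` touches A, so a sparse operator can be substituted without changing the iteration.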
Algorithm 7.1.1 (Lanczos Algorithm) Given a symmetric A ∈ R^{n×n} and w ∈ R^n having unit 2-norm, the following algorithm computes a j×j symmetric tridiagonal matrix T_j with the property that σ(T_j) ⊂ σ(A) upon termination. The diagonal and subdiagonal elements of T_j are stored in α_1, ..., α_j and β_1, ..., β_{j-1} respectively.

    v_i := 0 (i = 1, ..., n), β_0 := 1, j := 0;
    Do while (β_j ≠ 0)
        if (j ≠ 0), then
            for i = 1, ..., n,
                t := w_i,  w_i := v_i / β_j,  v_i := -β_j t
            end for
        end if
        v := Aw + v,  j := j + 1,  α_j := w^T v,  v := v - α_j w,  β_j := ||v||_2.

Remark 7.1.3
(a) If the sparsity of A is exploited and only kn flops are involved in each product Aw (k ≪ n), then each Lanczos step requires about (4 + k)n flops to execute.
(b) The iteration stops before complete tridiagonalization if q_1 is contained in a proper invariant subspace. From the iteration (7.1.15) we have

    A(q_1, ..., q_m) = (q_1, ..., q_m) T_m + (0, ..., 0, β_m q_{m+1}),

where the last term is r_m e_m^T. This implies that β_m = 0 if and only if r_m = 0, and in that case

    A(q_1, ..., q_m) = (q_1, ..., q_m) T_m.

That is, Range(q_1, ..., q_m) = Range(K[q_1, A, m]) is an invariant subspace of A, and the eigenvalues of T_m are eigenvalues of A.

Theorem 7.1.2 Let A be symmetric and let q_1 be a given vector with ||q_1||_2 = 1. The Lanczos iteration (7.1.15) runs until j = m, where m = rank K[q_1, A, n]. Moreover, for j = 1, ..., m we have

    A Q_j = Q_j T_j + r_j e_j^T                                        (7.1.16)

where T_j is the j×j tridiagonal matrix with diagonal α_1, ..., α_j and sub/superdiagonal β_1, ..., β_{j-1}, and Q_j = [q_1, ..., q_j]
has orthonormal columns satisfying Range(Q_j) = K(q_1, A, j).

Proof: By induction on j. Suppose the iteration has produced Q_j = [q_1, ..., q_j] such that Range(Q_j) = K(q_1, A, j) and Q_j^T Q_j = I_j. It is easy to see from (7.1.15) that (7.1.16) holds. Thus

    Q_j^T A Q_j = T_j + Q_j^T r_j e_j^T.

Since α_i = q_i^T A q_i for i = 1, ..., j and

    q_{i+1}^T A q_i = q_{i+1}^T (β_i q_{i+1} + α_i q_i + β_{i-1} q_{i-1}) = q_{i+1}^T (β_i q_{i+1}) = β_i

for i = 1, ..., j-1, we have Q_j^T A Q_j = T_j. Consequently Q_j^T r_j = 0. If r_j ≠ 0, then q_{j+1} = r_j / ||r_j||_2 is orthogonal to q_1, ..., q_j and

    q_{j+1} ∈ span{A q_j, q_j, q_{j-1}} ⊂ K(q_1, A, j+1).

Thus Q_{j+1}^T Q_{j+1} = I_{j+1} and Range(Q_{j+1}) = K(q_1, A, j+1). On the other hand, if r_j = 0, then A Q_j = Q_j T_j. This says that Range(Q_j) = K(q_1, A, j) is invariant. From this we conclude that j = m = dim K(q_1, A, n).

Encountering a zero β_j in the Lanczos iteration is a welcome event in that it signals the computation of an exact invariant subspace. However, an exactly zero or even small β_j is rare in practice. Consequently, other explanations for the convergence of T_j's eigenvalues must be sought.

Theorem 7.1.3 Suppose that j steps of the Lanczos algorithm have been performed and that

    S_j^T T_j S_j = diag(θ_1, ..., θ_j)

is the Schur decomposition of the tridiagonal matrix T_j. If Y_j ∈ R^{n×j} is defined by

    Y_j = [y_1, ..., y_j] = Q_j S_j,

then for i = 1, ..., j we have

    ||A y_i - θ_i y_i||_2 = |β_j| |s_{ji}|,

where S_j = (s_pq).

Proof: Post-multiplying (7.1.16) by S_j gives

    A Y_j = Y_j diag(θ_1, ..., θ_j) + r_j e_j^T S_j,

i.e.,

    A y_i = θ_i y_i + r_j (e_j^T S_j e_i),   i = 1, ..., j.

The proof is complete by taking norms and recalling ||r_j||_2 = |β_j|.

Remark 7.1.4 The theorem provides computable error bounds for T_j's eigenvalues:

    min_{μ ∈ σ(A)} |θ_i - μ| ≤ |β_j| |s_{ji}|,   i = 1, ..., j.
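The residual identity of Theorem 7.1.3 is easy to check numerically: after j Lanczos steps, the Ritz residual norms ||A y_i - θ_i y_i||_2 equal β_j |s_{ji}| up to roundoff. A minimal self-contained sketch, with an arbitrary random symmetric test matrix (an assumption of this demo):

```python
import numpy as np

rng = np.random.default_rng(1)
n, j = 12, 6
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                      # symmetric test matrix

# j steps of the Lanczos iteration (7.1.15)
Q = np.zeros((n, j))
alpha, beta = np.zeros(j), np.zeros(j)
q = rng.standard_normal(n)
q /= np.linalg.norm(q)
q_prev, b_prev = np.zeros(n), 0.0
for k in range(j):
    Q[:, k] = q
    r = A @ q - b_prev * q_prev
    alpha[k] = q @ r
    r -= alpha[k] * q
    beta[k] = np.linalg.norm(r)
    q_prev, q, b_prev = q, r / beta[k], beta[k]
T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)

theta, S = np.linalg.eigh(T)           # T_j S = S diag(theta_1, ..., theta_j)
Y = Q @ S                              # Ritz vectors y_i = Q_j s_i
res = np.linalg.norm(A @ Y - Y * theta, axis=0)   # ||A y_i - theta_i y_i||_2
bound = beta[-1] * np.abs(S[-1, :])               # beta_j |s_{ji}|
print(res)
print(bound)
```

So the bottom row of S_j, which is available at no extra cost, tells how good each Ritz pair is without ever forming A y_i.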
7 71 The Lanczos Algorithm 267 Note that in section 10 the (θ i, y i ) are Ritz pairs for the subspace R(Q j ) If we use the Lanczos method to compute AQ j = Q j T j + r j e T j and set E = τww T where τ = ±1 and w = aq j + br j, then it can be shown that (A + E)Q j = Q j (T j + τa 2 e j e T j ) + (1 + τab)r j e T j If 0 = 1 + τab, then the eigenvalues of the tridiagonal matrix T j = T j + τa 2 e j e T j are also eigenvalues of A+E We may then conclude from theorem 612 that the interval [λ i (T j ), λ i 1 (T j )] where i = 2,, j, each contains an eigenvalue of A + E Suppose we have an approximate eigenvalue λ of A One possibility is to choose τa 2 so that det( T j λi j ) = (α j + τa 2 λ)p j 1 ( λ) β 2 j 1p j 2 ( λ) = 0, where the polynomial p i (x) = det(t i xi i ) can be evaluated at λ using (53) The following theorems are known as the Kaniel-Paige theory for the estimation of eigenvalues which obtained via the Lanczos algorithm Theorem 714 Let A be n n symmetric matrix with eigenvalues λ 1 λ n and corresponding orthonormal eigenvectors z 1,, z n If θ 1 θ j are the eigenvalues of T j obtained after j steps of the Lanczos iteration, then λ 1 θ 1 λ 1 (λ 1 λ n ) tan (φ 1 ) 2 [c j 1 (1 + 2ρ 1 )] 2, where cos φ 1 = q T 1 z 1, ρ 1 = (λ 1 λ 2 )/(λ 2 λ n ) and c j 1 is the Chebychev polynomial of degree j 1 Proof: From Courant-Fischer theorem we have θ 1 = max y 0 y T T j y y T y = max y 0 (Q j y) T A(Q j y) (Q j y) T (Q j y) = max w T Aw 0 w K(q 1,A,j) w T w Since λ 1 is the maximum of w T Aw/w T w over all nonzero w, it follows that λ 1 θ 1 To obtain the lower bound for θ 1, note that θ 1 = max p P j 1 q T 1 p(a)ap(a)q 1 q T 1 p(a) 2 q 1, where P j 1 is the set of all j 1 degree polynomials If q 1 = n d i z i then q T 1 p(a)ap(a)q 1 q T 1 p(a) 2 q 1 = n d2 i p(λ i ) 2 λ i n d2 i p(λ i) 2
8 268 Chapter 7 Lanczos Methods n i=2 λ 1 (λ 1 λ n ) d2 i p(λ i ) 2 d 2 1p(λ 1 ) 2 + n i=2 d2 i p(λ i) 2 We can make the lower bound tight by selecting a polynomial p(x) that is large at x = λ 1 in comparison to its value at the remaining eigenvalues Set p(x) = c j 1 [ x λ n λ 2 λ n ], where c j 1 (z) is the (j 1)-th Chebychev polynomial generated by c j (z) = 2zc j 1 (z) c j 2 (z), c 0 = 1, c 1 = z These polynomials are bounded by unity on [-1,1] It follows that p(λ i ) is bounded by unity for i = 2,, n while p(λ 1 ) = c j 1 (1 + 2ρ 1 ) Thus, θ 1 λ 1 (λ 1 λ n ) (1 d2 1) d c 2 j 1 (1 + 2ρ 1) The desired lower bound is obtained by noting that tan (φ 1 ) 2 = (1 d 2 1)/d 2 1 Corollary 715 Using the same notation as Theorem 714 λ n θ j λ n + (λ 1 λ n ) tan 2 (φ n ) c 2 j 1 (1 + 2ρ, n) where ρ n = (λ n 1 λ n )/(λ 1 λ n 1 ) and cos (φ n ) = q T 1 z n Proof: Apply Theorem 714 with A replaced by A Example 711 L j 1 R j 1 = ( λ 2 λ 1 ) 2(j 1) 1 [C j 1 (2 λ 1 λ 2 1)] 1 2 [C j 1 (1 + 2ρ 1 )] 2 power method λ 1 /λ 2 j=5 j= / / L j 1 /R j / / L j 1 /R j 1 Rounding errors greatly affect the behavior of algorithm 711, the Lanczos iteration The basic difficulty is caused by loss of orthogonality among the Lanczos vectors To avoid these difficulties we can reorthogonalize the Lanczos vectors
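When the spectrum is known, the Kaniel-Paige bound of Theorem 7.1.4 can be evaluated directly. The sketch below (the diagonal test matrix and sizes are assumptions of this demo) computes λ_1 - θ_1 after j steps and compares it against (λ_1 - λ_n) tan²(φ_1) / c_{j-1}(1 + 2ρ_1)²:

```python
import numpy as np

rng = np.random.default_rng(2)
n, j = 50, 8
lam = np.arange(1.0, n + 1)            # spectrum of A = diag(1, ..., 50)
A = np.diag(lam)
l1, l2, ln = lam[-1], lam[-2], lam[0]  # lambda_1 > lambda_2 > ... > lambda_n

q1 = rng.standard_normal(n)
q1 /= np.linalg.norm(q1)

# j steps of the Lanczos iteration starting from q1
alpha, beta = np.zeros(j), np.zeros(j - 1)
q, q_prev, b_prev = q1.copy(), np.zeros(n), 0.0
for k in range(j):
    r = A @ q - b_prev * q_prev
    alpha[k] = q @ r
    r -= alpha[k] * q
    if k < j - 1:
        beta[k] = np.linalg.norm(r)
        q_prev, q, b_prev = q, r / beta[k], beta[k]
T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
theta1 = np.linalg.eigvalsh(T)[-1]     # largest Ritz value theta_1

# Kaniel-Paige: l1 - theta1 <= (l1 - ln) tan^2(phi_1) / c_{j-1}(1 + 2 rho_1)^2
cos_phi1 = abs(q1[-1])                 # q1^T z1; z1 = e_n since lambda_1 = lam[-1]
tan2_phi1 = (1.0 - cos_phi1**2) / cos_phi1**2
rho1 = (l1 - l2) / (l2 - ln)
cheb = np.cosh((j - 1) * np.arccosh(1.0 + 2.0 * rho1))   # c_{j-1}(1 + 2 rho_1)
bound = (l1 - ln) * tan2_phi1 / cheb**2
print(l1 - theta1, bound)
```

For x > 1 the Chebyshev value c_{j-1}(x) = cosh((j-1) arccosh x) grows rapidly with j, which is what makes the bound shrink so quickly with the number of Lanczos steps.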
9 71 The Lanczos Algorithm Reorthogonalization Since let AQ j = Q j T j + r j e T j, AQ j Q j T j = r j e T j + F j (7117) I Q T j Q j = C T j + j + C j, (7118) where C j is strictly upper triangular and j is diagonal (For simplicity, suppose (C j ) i,i+1 = 0 and i = 0) Definition 711 θ i and y i Q j s i are called Ritz value and Ritz vector, respectively, if T j s i = θ i s i Let Θ j diag(θ 1,, θ j ) = S T j T j S j where S j = [ s 1 s j ] Theorem 716 (Paige Theorem) Assume that (a) S j and Θ j are exact! (Since j n) (b) local orthogonality is maintained ( ie q T i+1q i = 0, i = 1,, j 1, r T j q j = 0, and (C j ) i,i+1 = 0 ) Let Then F T j Q j Q T j F j = K j K T j, j T j T j j N j N T j, G j = S T j (K j + N j )S j (r ik ) (a) y T i q j+1 = r ii /β ji, where y i = Q j s i, β ji = β j s ji (b) For i k, (θ i θ k )y T i y k = r ii ( s jk s ji ) r kk ( s ji s jk ) (r ik r ki ) (7119) Proof: Multiplied (7117) from left by Q T j, we get which implies that Subtracted (7120) from (7121), we have Q T j AQ j Q T j Q j T j = Q T j r j e T j + Q T j F j, (7120) Q T j A T Q j T j Q T j Q j = e j r T j Q j + F T j Q j (7121) (Q T j γ j )e T j e j (Q T j γ j ) T = (C T j T j T j C T j ) + (C j T j T j C j ) + ( j T j T j j ) + F T j Q j Q j F T j = (C T j T j T j C T j ) + (C j T j T j C j ) + (N j N T j ) + (K j K T j )
10 270 Chapter 7 Lanczos Methods This implies that Thus, (Q T j r j )e T j = C j T j T j C j + N j + K j which implies that y T i q j+1 β ji = s T i (Q T j r j )e T j s i = s T i (C j T j T j C j )s i + s T i (N j + K j )s i = (s T i C j s i )θ i θ i (s T i C j s i ) + r ii, y T i q j+1 = r ii β ji Similarly, (7119) can be obtained by multiplying (7120) from left and right by s T i s i, respectively and Remark 715 Since y T i q j+1 = r ii β ji = { O(esp), if βji = O(1), (not converge!) O(1), if β ji = O(esp), (converge for (θ j, y j )) we have that q T j+1y i = O(1) when the algorithm converges, ie, q j+1 is not orthogonal to < Q j > where Q j s i = y i (i) Full Reorthogonalization by MGS: Orthogonalize q j+1 to all q 1,, q j by q j+1 := q j+1 j (qj+1q T i )q i If we incorporate the Householder computations into the Lanczos process, we can produce Lanczos vectors that are orthogonal to working accuracy: r 0 := q 1 (given unit vector) Determine P 0 = I 2v 0 v T 0 /v T 0 v 0 so that P 0 r 0 = e 1 ; α 1 := q T 1 Aq 1 ; Do j = 1,, n 1, r j := (A α j )q j β j 1 q j 1 (β 0 q 0 0), w := (P j 1 P 0 )r j, Determine P j = I 2v j v T j /v T j v j such that P j w = (w 1,, w j, β j, 0,, 0) T, q j+1 := (P 0 P j )e j+1, α j+1 := q T j+1aq j+1 This is the complete reorthogonalization Lanczos scheme
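The complete-reorthogonalization idea in (i) can also be realized with modified Gram-Schmidt instead of Householder reflections. A sketch follows (the routine name, the two-pass MGS sweep, and the test matrix are choices of this demo, not of the text); the point is that ||I - Q_j^T Q_j|| stays at working accuracy even after many steps on a matrix whose rapidly converging Ritz pairs would otherwise destroy orthogonality:

```python
import numpy as np

def lanczos_full_reorth(A, q1, m):
    """Lanczos with full reorthogonalization: each residual is re-orthogonalized
    against all previous Lanczos vectors (MGS, two passes) before normalization."""
    n = A.shape[0]
    Q = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m - 1)
    Q[:, 0] = q1 / np.linalg.norm(q1)
    for k in range(m):
        r = A @ Q[:, k]
        alpha[k] = Q[:, k] @ r
        r -= alpha[k] * Q[:, k]
        if k > 0:
            r -= beta[k - 1] * Q[:, k - 1]
        for _ in range(2):                 # "twice is enough" MGS sweep
            for i in range(k + 1):
                r -= (Q[:, i] @ r) * Q[:, i]
        if k < m - 1:
            beta[k] = np.linalg.norm(r)
            Q[:, k + 1] = r / beta[k]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return Q, T

rng = np.random.default_rng(3)
n, m = 400, 60
A = np.diag(np.arange(1.0, n + 1) ** 2)    # wide spectrum: fast Ritz convergence,
q1 = rng.standard_normal(n)                # which is what causes orthogonality loss
Q, T = lanczos_full_reorth(A, q1, m)
ortho = np.linalg.norm(np.eye(m) - Q.T @ Q)
print(ortho)                                # remains at working accuracy
```

The price is O(jn) extra work per step and storage of all Lanczos vectors, which is exactly why the selective and partial variants below exist.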
11 71 The Lanczos Algorithm 271 (ii) Selective Reorthogonalization by MGS If β ji = O( eps), (θ j, y j ) good Ritz pair Do q j+1 q 1,, q j Else not to do Reorthogonalization (iii) Restart after m-steps (Do full Reorthogonalization) (iv) Partial Reorthogonalization Do reorthogonalization with previous (eg k = 5) Lanczos vectors {q 1,, q k } For details see the books: Parlett: Symmetric Eigenvalue problem (1980) pp257 Golub & Van Loan: Matrix computation (1981) pp332 To (7119): The duplicate pairs can occur! i k, (θ i θ k ) yi T y }{{} k = O(esp) O(1), if y i = y k Q i Q k How to avoid the duplicate pairs? The answer is using the implicit Restart Lanczos algorithm: Let be a Lanczos decomposition AQ j = Q j T j + r j e T j In principle, we can keep expanding the Lanczos decomposition until the Ritz pairs have converged Unfortunately, it is limited by the amount of memory to storage of Q j Restarted the Lanczos process once j becomes so large that we cannot store Q j Implicitly restarting method Choose a new starting vector for the underlying Krylov sequence A natural choice would be a linear combination of Ritz vectors that we are interested in 712 Filter polynomials Assume A has a complete system of eigenpairs (λ i, x i ) and we are interested in the first k of these eigenpairs Expand u 1 in the form k n u 1 = γ i x i + γ i x i If p is any polynomial, we have p(a)u 1 = k γ i p(λ i )x i + i=k+1 n γ i p(λ i )x i i=k+1
12 272 Chapter 7 Lanczos Methods Choose p so that the values p(λ i ) (i = k+1,, n) are small compared to the values p(λ i ) (i = 1,, k) Then p(a)u 1 is rich in the components of the x i that we want and deficient in the ones that we do not want p is called a filter polynomial Suppose we have Ritz values µ 1,, µ m and µ k+1,, µ m are not interesting Then take p(t) = (t µ k+1 ) (t µ m ) 713 Implicitly restarted algorithm Let AQ m = Q m T m + β m q m+1 e T m (7122) be a Lanczos decomposition with order m Choose a filter polynomial p of degree m k and use the implicit restarting process to reduce the decomposition to a decomposition A Q k = Q k Tk + β k q k+1 e T k of order k with starting vector p(a)u 1 Let ν 1,, ν m be eigenvalues of T m and suppose that ν 1,, ν m k correspond to the part of the spectrum we are not interested in Then take The starting vector p(a)u 1 is equal to p(t) = (t ν 1 )(t ν 2 ) (t ν m k ) p(a)u 1 = (A ν m k I) (A ν 2 I)(A ν 1 I)u 1 = (A ν m k I) [ [(A ν 2 I) [(A ν 1 I)u 1 ]]] In the first, we construct a Lanczos decomposition with starting vector (A ν 1 I)u 1 From (7122), we have where (A ν 1 I)Q m = Q m (T m ν 1 I) + β m q m+1 e T m (7123) = Q m U 1 R 1 + β m q m+1 e T m, T m ν 1 I = U 1 R 1 is the QR factorization of T m κ 1 I Postmultiplying by U 1, we get It implies that where (Q (1) m (A ν 1 I)(Q m U 1 ) = (Q m U 1 )(R 1 U 1 ) + β m q m+1 (e T mu 1 ) AQ (1) m = Q (1) m T m (1) + β m q m+1 b (1)T m+1, Q (1) m = Q m U 1, T (1) m = R 1 U 1 + ν 1 I, b (1)T m+1 = e T mu 1 : one step of single shifted QR algorithm)
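One single-shift QR step of the kind used above is easy to verify numerically: for a tridiagonal T_m and shift ν, factor T_m - νI = U_1 R_1 and form T_m^(1) = R_1 U_1 + νI. The sketch below (a random unreduced tridiagonal matrix is an assumption of the demo) checks that T_m^(1) is similar to T_m, is again tridiagonal up to roundoff, and that e_m^T U_1 has only its last two components nonzero:

```python
import numpy as np

rng = np.random.default_rng(4)
m = 8
a = rng.standard_normal(m)
b = np.abs(rng.standard_normal(m - 1)) + 0.1      # beta_k != 0: T is unreduced
T = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)

nu = np.linalg.eigvalsh(T)[0]      # shift at an (unwanted) Ritz value
U, R = np.linalg.qr(T - nu * np.eye(m))
T1 = R @ U + nu * np.eye(m)        # one shifted QR step: T1 = U^T T U

# similarity: the spectrum is unchanged
ev_gap = np.linalg.norm(np.linalg.eigvalsh(T1) - np.linalg.eigvalsh(T))
# T1 is again (numerically) symmetric tridiagonal
off = np.linalg.norm(np.tril(T1, -2)) + np.linalg.norm(np.triu(T1, 2))
# b_{m+1}^T = e_m^T U_1: only the last two components are nonzero,
# because U_1 inherits the upper Hessenberg form of T - nu I
last_row = U[-1]
print(ev_gap, off, np.abs(last_row[:-2]).max())
```

Because the shift is an eigenvalue of T_m, the last subdiagonal entry of T_m^(1) is driven to roundoff level, which is the deflation the implicit restart exploits.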
13 71 The Lanczos Algorithm 273 Remark 716 Q (1) m is orthonormal By the definition of T (1) m, we get U 1 T (1) m U T 1 = U 1 (R 1 U 1 + ν 1 I)U T 1 = U 1 R 1 + ν 1 I = T m (7124) Therefore, ν 1, ν 2,, ν m are also eigenvalues of T (1) m Since T m is tridiagonal and U 1 is the Q-factor of the QR factorization of T m ν 1 I, it implies that U 1 and T m (1) are upper Hessenberg From (7124), T m (1) is symmetric Therefore, T m (1) is also tridiagonal The vector b (1)T m+1 = e T mu 1 has the form [ b (1)T m+1 = 0 0 U (1) m 1,m U m,m (1) ie, only the last two components of b (1) m+1 are nonzero For on postmultiplying (7123) by e 1, we get (A ν 1 I)q 1 = (A ν 1 I)(Q m e 1 ) = Q (1) m R 1 e 1 = r (1) 11 q (1) 1 ] ; Since T m is unreduced, r (1) 11 is nonzero Therefore, the first column of Q (1) m multiple of (A κ 1 I)q 1 is a Repeating this process with ν 2,, ν m k, the result will be a Krylov decomposition AQ (m k) m with the following properties = Q (m k) m T m (m k) + β m q m+1 b (m k)t m+1 i Q (m k) m ii T (m k) m is orthonormal is tridiagonal iii The first k 1 components of b (m k)t m+1 are zero iv The first column of Q (m k) m is a multiple of (A ν 1 I) (A ν m k I)q 1 Corollary 711 Let ν 1,, ν m be eigenvalues of T m If the implicitly restarted QR step is performed with shifts ν 1,, ν m k, then the matrix T m (m k) has the form [ ] T (m k) m = T (m k) kk T (m k) k,m k 0 T (m k) k+1,k+1 where T (m k) k+1,k+1 is an upper triangular matrix with Ritz value ν 1,, ν m k on its diagonal,
14 274 Chapter 7 Lanczos Methods Therefore, the first k columns of the decomposition can be written in the form where Q (m k) k AQ (m k) k = Q (m k) k T (m k) kk + t k+1,k q (m k) k+1 e T k + β k u mk q m+1 e T k, consists of the first k columns of Q (m k) m submatrix of order k of T m (m k) we set then, T (m k) kk is the leading principal, and u km is from the matrix U = U 1 U m k Hence if Q k = Q (m k) k, T k = T (m k) kk, β k = t k+1,k q (m k) k+1 + β k u mk q m+1 2, 1 q k+1 = β k (t k+1,kq (m k) k+1 + β k u mk q m+1 ), A Q k = Q k Tk + β k q k+1 e T k is a Lanczos decomposition whose starting vector is proportional to (A ν 1 I) (A ν m k I)q 1 Avoid any matrix-vector multiplications in forming the new starting vector Get its Lanczos decomposition of order k for free For large n the major cost will be in computing QU 72 Approximation from a subspace Assume that A is symmetric and {(α i, z i )} n be eigenpairs of A with α 1 α 2 α n Define ρ(x) = ρ(x, A) = xt Ax x T x Algorithm 721 (Rayleigh-Ritz-Quotient procedure) Give a subspace S (m) = span{q} with Q T Q = I m ; Set H := ρ(q) = Q T AQ; Compute the p ( m) eigenpairs of H, which are of interest, say Hg i = θ i g i for i = 1,, p; Compute Ritz vectors y i = Qg i, for i = 1,, p; Check Ay i θ i y i 2 T ol, for i = 1,, p By the minimax characterization of eigenvalues, we have α j = λ j (A) = min max ρ(f, A) F j R n f F j
15 72 Approximation from a subspace 275 Define β j = min max ρ(g, A), for j m G j S m g Gj Since G j S m and S (m) = span{q}, it implies that G j = Q G j for some G j R m Therefore, β j = min G j R m max s G j ρ(s, H) = λ j (H) θ j, for j = 1,, m For any m by m matrix B, there is associated a residual matrix R(B) AQ QB Theorem 721 For given orthonormal n by m matrix Q, for all m by m matrix B Proof: Since R(H) R(B) R(B) R(B) = Q A 2 Q B (Q AQ) (Q AQ)B + B B = Q A 2 Q H 2 + (H B) (H B) = R(H) R(H) + (H B) (H B) and (H B) (H B) is positive semidefinite, it implies that R(B) 2 R(H) 2 Since we have that which implies that Hg i = θ i g i, for i = 1,, m, Q T AQg i = θ i g i, QQ T A(Qg i ) = θ i (Qg i ) Let y i = Qg i Then QQ T y i = Q(Q T Q)g i = y i Take P Q = QQ T which is a projection on span{q} Then which implies that ie, r i = Ay i θ i y i S m = span{q} (QQ T )Ay i = θ i (QQ T )y i, P Q (Ay i θ i y i ) = 0,
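Algorithm 7.2.1 and Theorem 7.2.1 can be illustrated together. The sketch below (the random A and random subspace are assumptions of the demo) forms H = Q^T A Q, checks that the Ritz residuals A y_i - θ_i y_i are orthogonal to span{Q}, and checks that B = H minimizes ||AQ - QB||_F:

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 30, 5
M = rng.standard_normal((n, n))
A = (M + M.T) / 2

Q, _ = np.linalg.qr(rng.standard_normal((n, m)))   # orthonormal basis of S^(m)
H = Q.T @ A @ Q                                    # rho(Q) in the text
theta, G = np.linalg.eigh(H)                       # H g_i = theta_i g_i
Y = Q @ G                                          # Ritz vectors y_i = Q g_i

# Ritz residuals are orthogonal to the subspace: P_Q (A y_i - theta_i y_i) = 0
R_ritz = A @ Y - Y * theta
proj = np.linalg.norm(Q.T @ R_ritz)

# Theorem 7.2.1: H minimizes ||AQ - QB||_F over all m x m matrices B, and
# ||AQ - QB||_F^2 = ||AQ - QH||_F^2 + ||H - B||_F^2 for any B
res_H = np.linalg.norm(A @ Q - Q @ H)
B = H + 0.1 * rng.standard_normal((m, m))          # any perturbation does worse
res_B = np.linalg.norm(A @ Q - Q @ B)
print(proj, res_H, res_B)
```

The Pythagorean identity in the comment is exactly the argument in the proof of Theorem 7.2.1, since AQ - QH has columns orthogonal to Range(Q).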
16 276 Chapter 7 Lanczos Methods Theorem 721 Let θ j for j = 1,, m be eigenvalues of H = Q T AQ Then there is α j σ(a) such that Furthermore, θ j α j R 2 = AQ QH 2, for j = 1,, m m (θ j α j ) 2 2 R 2 F j=1 Proof: See the detail in Chapter 11 of The symmetric eigenvalue problem, Parlett(1981) Theorem 722 Let y be a unit vector θ = ρ(y), α be an eigenvalue of A closest to θ and z be its normalized eigenvector Let r = min λi α λ i (A) θ and ψ = (y, z) Then where r(y) = Ay θy θ α r(y) 2 /r, sin ψ r(y) /r, Proof: Decompose y = z cos ψ + w sin ψ with z T w = 0 and w 2 = 1 Hence r(y) = z(α θ) cos ψ + (A θ)w sin ψ Since Az = αz and z T w = 0, we have z T (A θ)w = 0 and so r(y) 2 2 = (α θ) 2 cos 2 ψ + (A θ)w 2 2 sin 2 ψ (A θ)w 2 2 sin 2 ψ (7225) Let w = α i α ξ iz i Then (A θ)w 2 2 = w T (A θ)(a θ)w = α i α (α i θ) 2 ξ 2 i r 2 ( α i α ξ 2 i ) = r 2 Therefore, Since r(y) y, we have sin ψ r(y) 2 r which implies that From above equation, we get 0 = y T r(y) = (α θ) cos 2 ψ + w T (A θ)w sin 2 ψ, sin 2 ψ = 1 k + 1 = k cos2 ψ sin 2 ψ = wt (A θ)w θ α α θ w T (A α)w, cos2 ψ = k k + 1 = wt (A θ)w w T (A α)w (7226)
17 72 Approximation from a subspace 277 Substituting (7226) into (7225), r(y) 2 2 can be rewritten as r(y) 2 2 = (θ α)w T (A α)(a θ)w/w T (A α)w (7227) By assumption there are no eigenvalues of A separating α and θ Thus (A αi)(a θi) is positive definite and so w T (A α)(a θ)w = α i α α i α α i θ ξ 2 i r α i α α i α ξ 2 i r α i α(α i α)ξ 2 i = r w T (A α)w (7228) Substituting (7228) into (7227), the theorem s first inequality appears 100 years old and still alive : Eigenvalue problems Hank / G Gloub / Van der Vorst / A priori bounds for interior Ritz approximations Given subspace S m = span{q}, let {(θ i, y i )} m be Ritz pairs of H = Q T AQ and Az i = α i z i, i = 1,, n Lemma 723 For each j m for any unit s S m satisfying s T z i = 0, i = 1,, j 1 Then where ψ i = (y i, z i ) j 1 α j θ j ρ(s) + (α 1 θ i ) sin 2 ψ i (7229) j 1 ρ(s) + (α 1 α i ) sin 2 ψ i, Proof: Take j 1 s = t + r i y i, where t y i for i = 1,, j 1 and s 2 = 1 Assumption s T z i = 0 for i = 1,, j 1 and lead to y i z i cos ψ i 2 2 = (y i z i cos ψ i ) T (y i z i cos ψ i ) = 1 cos 2 ψ i cos 2 ψ i + cos 2 ψ i = 1 cos 2 ψ i = sin 2 ψ i r i = s T y i = s T (y i z i cos ψ) s 2 sin ψ
18 278 Chapter 7 Lanczos Methods Let (θ i, g i ) for i = 1,, m be eigenpairs of symmetric H with g T i g k = 0 for i k and y i = Qg i Then Combining (7230) with t T Ay i = 0, we get Thus 0 = g T i (Q T AQ)g k = y T i Ay k for i k (7230) j 1 ρ(s) = t T At + (yi T Ay i )ri 2 j 1 ρ(s) α 1 = t T (A α 1 )t + (θ i α 1 )ri 2 tt (A α 1 )t t T t j 1 + (θ i α 1) ri 2 j 1 ρ(t) α 1 + (θ i α 1 ) sin 2 ϕ i Note that ρ(t) θ j = min{ρ(u); u S (m), u y i, i < j} Therefore, the second inequality in (7229) appears Let ϕ ij = (z i, y j ) for i = 1,, n and j = 1,, m Then ϕ ii = ϕ i and n i=j+1 Proof: Since y T j y i = 0 for i j and we have From (7231), which implies that y j = n z i cos ϕ ij (7231) cos ϕ ij sin ϕ i (7232) j 1 cos 2 ϕ ij = sin 2 ϕ j cos 2 ϕ ij (7233) (y i cos ϕ i z i ) T (y i cos ϕ i z i ) = sin 2 ϕ i, cos ϕ ij = y T j z i = y T j (y i cos ϕ i z i ) y j 2 y i cos ϕ i z i 2 sin ϕ i 1 = (y j, y j ) = n cos 2 ϕ ij j 1 sin 2 ϕ j = 1 cos 2 ϕ jj = cos 2 ϕ ij + n cos 2 ϕ ij (7234) i=j+1
19 73 Krylov subspace 279 Lemma 724 For each j = 1,, m, Proof: By (7231), sin ϕ j [(θ j α j )+ j 1 (α j+1 α i ) sin 2 ϕ i ]/(α j+1 α j ) (7235) ρ(y j, A α j I) = θ j α j = n (α i α j ) cos 2 ϕ ij It implies that = j 1 θ j α j + (α j α i ) cos 2 ϕ ij n (α i α j ) cos 2 ϕ ij i=j+1 (α j+1 α j ) n i=j+1 cos 2 ϕ ij j 1 = (α j+1 α j )(sin 2 ϕ j cos 2 ϕ ij ) (from (7235)) Solve sin 2 ϕ j and use (9310) to obtain inequality (7235) Explanation: By Lemmas 723 and 724, we have j = 1 : θ 1 ρ(s), s T z 1 = 0 (Lemma 723) j = 1 : sin 2 ϕ 1 θ 1 α 1 α 2 α 1 ρ(s) α 1 α 2 α 1, s T z 1 = 0 (Lemma 724) j = 2 : θ 2 ρ(s) + (α 1 α 1 ) sin 2 ϕ 1 ρ(s) + (α 1 α 1 ) ρ(ξ) α 1 α 2 α 1, s T z 1 = s T z 2 = 0, ξ T z 1 = 0 (Lemma 723) j = 2 : sin 2 ϕ 2 (Lemma724) (θ 2 α 2 ) + (α 3 α 1 ) sin 2 ϕ 1 α 3 α 2 j=1,j=2 [ρ(s) + (α 1 α 1 )( ρ(t) α 1 α 2 α 1 ) α 2 ] + α 3 α 1 α 3 α 2 ( ρ(t) α 1 α 2 α 1 ) 73 Krylov subspace Definition 731 Given a nonzero vector f, K m (f) = [f, Af,, A m 1 f] is called Krylov matrix and S m = K m (f) = f, Af,, A m 1 f is called Krylov subspace which are created by Lanczos if A is symmetric or Arnoldi if A is unsymmetric
20 280 Chapter 7 Lanczos Methods Lemma 731 Let {(θ i, y i )} m be Ritz pairs of K m (f) If ω is a polynomial with degree m 1 (ie, ω P m 1 ), then ω(a)f y k if and only if ω(θ k ) = 0, k = 1,, m Proof: Let where π(ξ) P m 2 Thus and exercise! Define Corollary 732 µ(ξ) ω(ξ) = (ξ θ k )π(ξ), π(a)f K m (f) y T k ω(a)f = y T k (A θ k )π(a)f = r T k π(a)f = 0 ( r k Q = K m (f)) m (ξ θ i ) and π k (ξ) µ(ξ) (ξ θ k ) y k = π k(a)f π k (A)f Proof: Since π k (θ i ) = 0 for θ i θ k, from Lemma 731, π k (A)f y i, i k Thus, π k (A)f // y k and then y k = π k(a)f π k (A)f Lemma 733 Let h be the normalized projection of f orthogonal to Z j, Z j span(z 1,, z j ) For each π P m 1 and each j m, [ ] sin (f, Z j 2 ) π(a)h ρ(π(a)f, A α j I) (α n α j ) (7336) cos (f, Z j ) π(α j ) Proof: Let ψ = (f, Z j ) = cos 1 f Z j and let g be the normalized projection of f onto Z j so that Since Z j is invariant under A, f = g cos ψ + h sin ψ s π(a)f = π(a)g cos ψ + π(a)h sin ψ, where π(a)g Z j and π(a)h (Z j ) A little calculation yields ρ(s, A α j I) = g (A α j I)π 2 (A)g cos 2 ψ + h (A α j I)π 2 (A)h sin 2 ψ π(a)f 2 (7337) The eigenvalues of A are labeled so that α 1 α 2 α n and
21 73 Krylov subspace 281 (a) v (A α j I)v 0 for all v Z j, in particular, v = π(a)g; (b) w (A α j I)w (α n α j ) w 2 for all w (Z j ), in particular, w = π(a)h Used (a) and (b) to simplify (7337), it becomes [ ] 2 π(a)h sin ψ ρ(s, A α j I) (α n α j ) π(a)f The proof is completed by using s 2 = π(a)f 2 = n π 2 (α i ) cos 2 (f, z i ) π 2 (α j ) cos 2 (f, z j ) 731 The Error Bound of Kaniel and Saad The error bounds come from choosing π P m 1 in Lemma 733 such that (i) π(α j ) is large, while π(a)h is small as possible, and (ii) ρ(s, A α j I) 0 where s = π(a)f To (i): Note that π(a)h 2 = n i=j+1 π2 (α i ) cos 2 (f, z j ) n i=j+1 cos2 (f, z j ) max i>j π2 (α i ) max π 2 (τ) τ [α j+1,α n] Chebychev polynomial solves min π P n j max τ [αj+1,α n ] π 2 (τ) To (ii): The following facts are known: (a) 0 θ j α j, (Cauchy interlace Theorem) (b) θ j α j ρ(s, A α j I), if s y i, for all i < j, (By minimax Theorem) (c) θ j α j ρ(s, A α j I)+ j 1 (α n α i ) sin 2 (y i, z i ), if s z i, for all i < j (Lemma 723) Theorem 734 (Saad) Let θ 1 θ m be the Ritz values from K m (f) (by Lanczos or Arnoldi) and let (α i, z i ) be the eigenpairs of A For j = 1,, m, and 0 θ j α j (α n α j ) where r = (α j α j+1 )/(α j+1 α n ) [ sin (f, Z j ) j 1 k=1 ( θ k α n θ k α j ) cos (f, Z j )T m j (1 + 2r) tan (z j, K m ) sin (f, Zj ) j 1 k=1 ( α k α n α k α j ) cos (f, Z j )T m j (1 + 2r), ] 2
22 282 Chapter 7 Lanczos Methods Proof: Apply Lemmas 733 and 731 To ensure (b), it requires s y i for i = 1,, j 1 By Lemma 731, we construct By Lemma 733 for this π(ξ) : π(ξ) = (ξ θ 1 ) (ξ θ j 1 ) π(ξ), π P m j π(a)h π(α j ) = (A θ 1) (A θ j 1 ) π(a)h (α j θ 1 ) (α j θ j 1 ) π(α j ) j 1 α n α k π(τ) α n θ k max τ [α j+1,α j ] π(α j ) k=1 j 1 α n α k π(τ) α j α k min max π Pm j j π(α j ) k=1 j 1 α n α k 1 α j α k T m j (1 + 2r) (7338) k=1 since h Z j On combining (b), Lemma 733 and (7338), the first of the results is obtained To prove the second inequality: π is chosen to satisfy π(α i ) = 0 for i = 1,, j 1 so that Therefore, s = π(a)f = z j π(α j ) cos (f, z j ) + π(a)h sin ψ tan (s, z j ) = sin (f, Zj ) π(a)h, cos (f, z j ) π(α j ) where π(ξ) = (ξ α 1 ) (ξ α j 1 ) π(ξ) with π(ξ) P m j The proof is completed by choosing π by Chebychev polynomial as above Theorem 735 Let θ m θ 1 be Royleigh-Ritz values of K m (f) and Az j = α j z j for j = n,, 1 with α n α 1, then and 0 α j θ j (α j α 1 ) [ sin (f, Z j ) 1 k= j+1 ( α n θ k cos (f, z j )T m j (1 + 2r) [ tan(z j, K m ) sin (f, 1 Z j ) cos (f, z j ) T m j (1 + 2r) where r = (α j 1 α j )/(α n α j 1 ) k= j+1 ( α k α n α k α j ) ]2 α k θ j ) ] 2,,
23 74 Applications to linear Systems and Least Squares 283 Theorem 736 (Kaniel) The Rayleigh-Ritz (θ j, y j ) from K m (f) to (α j, z j ) satisfy and 0 θ j α j (α n α j ) [ sin (f, Z j ) j 1 k=1 ( α k α n ] 2 α k α j ) cos (f, z j )T m j (1 + 2r) j 1 + (α n α k ) sin 2 (y k, z k ) k=1 sin 2 (y j, z j ) (θ j α j ) + j 1 k=1 (α j+1 α k ) sin 2 (y k, z k ) α j+1 α j, where r = (α j α j+1 )/(α j+1 α n ) 74 Applications to linear Systems and Least Squares 741 Symmetric Positive Definite System Recall: Let A be symmetric positive definite and Ax = b Then x minimizes the functional φ(x) = 1 2 xt Ax b T x (741) An approximate minimizer of φ can be regarded as an approximate solution to Ax = b One way to produce a sequence {x j } that converges to x is to generate a sequence of orthonormal vectors {q j } and to let x j minimize φ over span{q 1,, q j }, where j = 1,, n Let Q j = [q 1,, q j ] Since x span{q 1,, q j } φ(x) = 1 2 yt (Q T j AQ j )y y T (Q T j b) for some y R j, it follows that where x j = Q j y j, (742) (Q T j AQ j )y j = Q T j b (743) Note that Ax n = b We now consider how this approach to solving Ax = b can be made effective when A is large and sparse There are two hurdles to overcome: (i) the linear system (743) must be easily solved; (ii) we must be able to compute x j without having to refer to q 1,, q j explicitly as (742) suggests To (i): we use Lanczos algorithm algorithm 711 to generate the q i After j steps we obtain AQ j = Q j T j + r j e T j, (744)
24 284 Chapter 7 Lanczos Methods where α 1 β 1 0 T j = Q T β j AQ j = 1 α 2 βj 1 and T jy j = Q T j b (745) 0 β j 1 α j With this approach, (743) becomes a symmetric positive definite tridiagonal system which may be solved by LDL T Cholesky decomposition, ie, where L j = µ µ j 1 Compared the entries of (746), we get Note that we need only calculate T j = L j D j L T j, (746) 1 0 d 1 0 and D j = 0 0 d j d 1 = α 1, for i = 2,, j, µ i = β i 1 /d i 1, d i = α i β i 1 µ i (747) µ j = β j 1 /d j 1 d j = α j β j 1 µ j (748) in order to obtain L j and D j from L j 1 and D j 1 To (ii): Trick: we define C j = [c 1,, c j ] R n j and p j R j by the equations and observe that It follows from (749) that and therefore C j L T j = Q j, L j D j p j = Q T j b x j = Q j T 1 j Q T j b = Q j (L j D j L T j ) 1 Q T j b = C j p j [c 1, µ 2 c 1 + c 2,, µ j c j 1 + c j ] = [q 1,, q j ], C j = [C j 1, c j ], c j = q j µ j c j 1 If we set p j = [ρ 1,, ρ j ] T in L j D j p j = Q T j b, then that equation becomes ρ 1 q1 T b [ ρ 2 q2 T b L j 1 D j µ j d j 1 d j ] = ρ j 1 ρ j q T j 1b q T j b (749)
Since L_{j-1} D_{j-1} p_{j-1} = Q_{j-1}^T b, it follows that

    p_j = [p_{j-1}; ρ_j],   ρ_j = (q_j^T b - μ_j d_{j-1} ρ_{j-1}) / d_j,

and thus

    x_j = C_j p_j = C_{j-1} p_{j-1} + ρ_j c_j = x_{j-1} + ρ_j c_j.

This is precisely the kind of recursive formula for x_j that we need. Together with (7.4.8) and (7.4.9) it enables us to make the transition from (q_{j-1}, c_{j-1}, x_{j-1}) to (q_j, c_j, x_j) with a minimal amount of work and storage.

A further simplification results if we set q_1 = b/β_0, where β_0 = ||b||_2. For this choice of Lanczos starting vector we see that q_i^T b = 0 for i = 2, 3, ..., so that ρ_j = -μ_j d_{j-1} ρ_{j-1}/d_j for j ≥ 2, with ρ_1 = β_0/α_1. It follows from (7.4.4) that

    A x_j = A Q_j y_j = Q_j T_j y_j + r_j e_j^T y_j = Q_j Q_j^T b + r_j e_j^T y_j = b + r_j e_j^T y_j.

Thus, if β_j = ||r_j||_2 = 0 in the Lanczos iteration, then A x_j = b. Moreover, since ||A x_j - b||_2 = β_j |e_j^T y_j|, the iteration provides estimates of the current residual.

Algorithm 7.4.1 Given b ∈ R^n and a symmetric positive definite A ∈ R^{n×n}, the following algorithm computes x ∈ R^n such that Ax = b.

    β_0 = ||b||_2, q_1 = b/β_0, α_1 = q_1^T A q_1, d_1 = α_1, c_1 = q_1, ρ_1 = β_0/α_1, x_1 = b/α_1;
    For j = 1, ..., n-1,
        r_j = (A - α_j I) q_j - β_{j-1} q_{j-1}   (β_0 q_0 ≡ 0),
        β_j = ||r_j||_2,
        if β_j = 0 then
            set x = x_j and stop;
        else
            q_{j+1} = r_j / β_j,
            α_{j+1} = q_{j+1}^T A q_{j+1},
            μ_{j+1} = β_j / d_j,
            d_{j+1} = α_{j+1} - μ_{j+1} β_j,
            ρ_{j+1} = -μ_{j+1} d_j ρ_j / d_{j+1},
            c_{j+1} = q_{j+1} - μ_{j+1} c_j,
            x_{j+1} = x_j + ρ_{j+1} c_{j+1},
        end if
    end for
    x = x_n.

This algorithm requires one matrix-vector multiplication and 5n flops per iteration.
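A sketch of Algorithm 7.4.1 in NumPy (the routine name, the stopping tolerance on β_j, and the random SPD test matrix are additions of this demo; the minus sign in the ρ recursion follows from q_{j+1}^T b = 0 when q_1 = b/||b||_2):

```python
import numpy as np

def lanczos_spd_solve(A, b, tol=1e-12, maxit=None):
    """Solve SPD Ax = b via the Lanczos process with an LDL^T factorization
    of T_j, updating x_j = x_{j-1} + rho_j c_j (a sketch of Algorithm 7.4.1)."""
    n = len(b)
    maxit = n if maxit is None else maxit
    beta0 = np.linalg.norm(b)
    q = b / beta0                    # q_1
    Aq = A @ q
    alpha = q @ Aq                   # alpha_1
    d = alpha                        # d_1
    c = q.copy()                     # c_1
    rho = beta0 / alpha              # rho_1, so that x_1 = rho_1 c_1 = b / alpha_1
    x = rho * c
    q_prev, b_prev = np.zeros(n), 0.0
    for _ in range(maxit - 1):
        r = Aq - alpha * q - b_prev * q_prev
        beta = np.linalg.norm(r)
        if beta <= tol * beta0:      # practical stopping criterion (added here)
            break
        q_prev, q = q, r / beta
        Aq = A @ q
        alpha_next = q @ Aq
        mu = beta / d                # mu_{j+1} = beta_j / d_j
        d_next = alpha_next - mu * beta
        rho = -mu * d * rho / d_next # rho_{j+1} = -mu_{j+1} d_j rho_j / d_{j+1}
        c = q - mu * c               # c_{j+1} = q_{j+1} - mu_{j+1} c_j
        x = x + rho * c
        alpha, d, b_prev = alpha_next, d_next, beta
    return x

rng = np.random.default_rng(6)
n = 60
W = rng.standard_normal((n, n))
A = np.eye(n) + W @ W.T / n          # SPD with modest condition number
b = rng.standard_normal(n)
x = lanczos_spd_solve(A, b)
print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```

Only a handful of n-vectors are kept, matching the "minimal work and storage" claim; the LDL^T quantities d_j, μ_j are updated by the scalar recursions (7.4.8).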
7.4.2 Symmetric Indefinite Systems

A key feature in the above development is the idea of computing the LDL^T (Cholesky-type) decomposition of the tridiagonal T_j. Unfortunately, this is potentially unstable if A, and consequently T_j, is not positive definite. Paige and Saunders (1975) developed a recursion for x_j based on an LQ decomposition of T_j instead. At the j-th step of the iteration we have Givens rotations J_1, ..., J_{j-1} such that

    T_j J_1 ··· J_{j-1} = L_j,

where L_j is lower triangular with bandwidth three: diagonal d_1, ..., d_j, subdiagonal e_2, ..., e_j and second subdiagonal f_3, ..., f_j. Note that with this factorization, x_j is given by

    x_j = Q_j y_j = Q_j T_j^{-1} Q_j^T b = W_j s_j,

where W_j ∈ R^{n×j} and s_j ∈ R^j are defined by

    W_j = Q_j J_1 ··· J_{j-1}  and  L_j s_j = Q_j^T b.

Scrutiny of these equations enables one to develop a formula for computing x_j from x_{j-1} and an easily computed multiple of w_j, the last column of W_j.

7.4.3 Connection of Algorithm 7.4.1 and the CG Method

Let x_j^L denote the iterates generated by Algorithm 7.4.1 and x_i^CG the iterates generated by the CG method with x_0^CG = 0. Since r_0^CG = b − A x_0 = b = p_1^CG, we have

    x_1^CG = α_1 p_1 = (b^T b / b^T A b) b = x_1^L.

Claim: x_i^CG = x_i^L for i = 1, 2, ....

(a) CG method (a variant version):

    x_0 = 0, r_0 = b.
    For k = 1, ..., n,
        if r_{k-1} = 0 then set x = x_{k-1} and quit
        else
            β_k = r_{k-1}^T r_{k-1} / r_{k-2}^T r_{k-2}  (β_1 ≡ 0),
            p_k = r_{k-1} + β_k p_{k-1}  (p_1 ≡ r_0),
            α_k = r_{k-1}^T r_{k-1} / p_k^T A p_k,
            x_k = x_{k-1} + α_k p_k,
            r_k = r_{k-1} − α_k A p_k.
        end if
    end for
    x = x_n.  (7.4.10)
Define R_k = [r_0, ..., r_{k-1}] ∈ R^{n×k} and let B_k be the unit upper bidiagonal matrix with superdiagonal entries −β_2, ..., −β_k. From p_j = r_{j-1} + β_j p_{j-1} (j = 2, ..., k) and p_1 = r_0, it follows that R_k = P_k B_k. Since the columns of P_k = [p_1, ..., p_k] are A-conjugate, we see that

    R_k^T A R_k = B_k^T diag(p_1^T A p_1, ..., p_k^T A p_k) B_k

is tridiagonal. Since span{p_1, ..., p_k} = span{r_0, ..., r_{k-1}} = span{b, Ab, ..., A^{k-1} b} and r_0, ..., r_{k-1} are mutually orthogonal, it follows that if Δ_k = diag(β_0, ..., β_{k-1}) with β_i = ||r_i||_2, then the columns of R_k Δ_k^{-1} form an orthonormal basis for span{b, Ab, ..., A^{k-1} b}. Consequently the columns of this matrix are essentially the Lanczos vectors of Algorithm 7.4.1, i.e.,

    q_i^L = ± r_{i-1}^CG / β_{i-1}  (i = 1, ..., k).

Moreover,

    T_k = Δ_k^{-1} B_k^T diag(p_i^T A p_i) B_k Δ_k^{-1}.

The diagonal and subdiagonal of this matrix involve quantities that are readily available during the conjugate gradient iteration. Thus, we can obtain good estimates of A's extremal eigenvalues (and condition number) as we generate the x_k in (7.4.10).

Similarly, p_i^CG = c_i^L × constant. To show that the c_i^L are A-orthogonal: since C_j L_j^T = Q_j, i.e. C_j = Q_j L_j^{-T}, it follows that

    C_j^T A C_j = L_j^{-1} Q_j^T A Q_j L_j^{-T} = L_j^{-1} L_j D_j L_j^T L_j^{-T} = D_j.

So {c_i}_{i=1}^j are A-orthogonal.

(b) It is well known that x_j^CG minimizes the functional φ(x) = (1/2) x^T A x − b^T x over the subspace span{r_0, A r_0, ..., A^{j-1} r_0}, and x_j^L minimizes φ(x) over the subspace span{q_1, ..., q_j}. We also know that K[q_1, A, j] = Q_j R_j, which implies K(q_1, A, j) = span{q_1, ..., q_j}. But q_1 = b/||b||_2 and r_0 = b, so

    span{r_0, A r_0, ..., A^{j-1} r_0} = K(q_1, A, j) = span{q_1, ..., q_j};

therefore x_j^CG = x_j^L.
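The CG variant (7.4.10) above is easy to transcribe; the following NumPy sketch (names ours) follows the text's indexing, with β_1 = 0 and p_1 = r_0:

```python
import numpy as np

def cg_variant(A, b, tol=1e-12):
    # CG in the form (7.4.10): beta_1 = 0 and p_1 = r_0.
    n = len(b)
    x = np.zeros(n)
    r = b.copy()
    p = np.zeros(n)
    rs_old = None
    for _ in range(n):
        rs = r @ r
        if np.sqrt(rs) < tol:
            break
        p = r if rs_old is None else r + (rs / rs_old) * p
        a = rs / (p @ A @ p)
        x = x + a * p
        r = r - a * (A @ p)
        rs_old = rs
    return x

A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x_cg = cg_variant(A, b)
# the first iterate equals (b^T b / b^T A b) b, matching x_1^L above
assert np.allclose(A @ x_cg, b)
```

By the equivalence argued in this subsection, these iterates coincide (in exact arithmetic) with those of Algorithm 7.4.1.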
7.4.4 Bidiagonalization and the SVD

Suppose U^T A V = B is the bidiagonalization of A ∈ R^{m×n}, where

    U = [u_1, ..., u_m],  U^T U = I_m,  V = [v_1, ..., v_n],  V^T V = I_n,  (7.4.11)

and B ∈ R^{m×n} is upper bidiagonal with diagonal α_1, ..., α_n and superdiagonal β_1, ..., β_{n-1}.  (7.4.12)

Recall that this decomposition serves as a front end for the SVD algorithm. Unfortunately, if A is large and sparse, then we can expect large, dense submatrices to arise during the Householder bidiagonalization. It would be nice to develop a method for computing B directly, without any orthogonal update of the matrix A.

We compare columns in the equations AV = UB and A^T U = V B^T:

    A v_j = α_j u_j + β_{j-1} u_{j-1}  (β_0 u_0 ≡ 0),
    A^T u_j = α_j v_j + β_j v_{j+1}  (β_n v_{n+1} ≡ 0),

for j = 1, ..., n. Define

    r_j = A v_j − β_{j-1} u_{j-1}  and  p_j = A^T u_j − α_j v_j.

We may conclude that

    α_j = ±||r_j||_2,  u_j = r_j/α_j,  β_j = ±||p_j||_2,  v_{j+1} = p_j/β_j.

These equations define the Lanczos method for bidiagonalizing a rectangular matrix (Paige (1974)):

    Given v_1 ∈ R^n with unit 2-norm,
    r_1 = A v_1, α_1 = ||r_1||_2.
    For j = 1, ..., n,
        If α_j = 0 then stop;
        else
            u_j = r_j/α_j,
            p_j = A^T u_j − α_j v_j,
            β_j = ||p_j||_2.
            If β_j = 0 then stop;
            else
                v_{j+1} = p_j/β_j,
                r_{j+1} = A v_{j+1} − β_j u_j,
                α_{j+1} = ||r_{j+1}||_2.
            end if
        end if
    end for  (7.4.13)
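A direct transcription of (7.4.13) in NumPy (our sketch; generic input, no breakdown handling beyond early exit) confirms the relation A v_j = α_j u_j + β_{j-1} u_{j-1}, i.e. U_k^T A V_k = B_k:

```python
import numpy as np

def lanczos_bidiag(A, v1, k):
    # Lanczos bidiagonalization (7.4.13): builds U_k, V_{k+1} and the
    # bidiagonal entries alpha_j, beta_j column by column.
    m, n = A.shape
    U = np.zeros((m, k))
    V = np.zeros((n, k + 1))
    alphas = np.zeros(k)
    betas = np.zeros(k)
    V[:, 0] = v1 / np.linalg.norm(v1)
    r = A @ V[:, 0]
    for j in range(k):
        alphas[j] = np.linalg.norm(r)
        if alphas[j] == 0:
            break
        U[:, j] = r / alphas[j]
        p = A.T @ U[:, j] - alphas[j] * V[:, j]
        betas[j] = np.linalg.norm(p)
        if betas[j] == 0:
            break
        V[:, j + 1] = p / betas[j]
        r = A @ V[:, j + 1] - betas[j] * U[:, j]
    return U, V, alphas, betas

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
U, V, a, bt = lanczos_bidiag(A, rng.standard_normal(4), 4)
B = np.diag(a) + np.diag(bt[:-1], 1)   # upper bidiagonal
assert np.allclose(U.T @ A @ V[:, :4], B, atol=1e-8)
assert np.allclose(U.T @ U, np.eye(4), atol=1e-8)
```

Only matrix-vector products with A and A^T are needed, which is the point for large sparse A.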
It is essentially equivalent to applying the Lanczos tridiagonalization scheme to the symmetric matrix

    C = [ 0    A ]
        [ A^T  0 ].

We know that

    λ_i(C) = σ_i(A) = −λ_{n+m−i+1}(C)  for i = 1, ..., n.

Because of this, the large singular values of the bidiagonal matrix B_j (upper bidiagonal with diagonal α_1, ..., α_j and superdiagonal β_1, ..., β_{j-1}) tend to be very good approximations to the large singular values of A.

7.4.5 Least Squares Problems

As detailed in Chapter III, the full-rank LS problem min ||Ax − b||_2 can be solved by the bidiagonalization (7.4.11)-(7.4.12). In particular,

    x_LS = V y_LS = Σ_{i=1}^n a_i v_i,

where y = (a_1, ..., a_n)^T solves the bidiagonal system By = (u_1^T b, ..., u_n^T b)^T.

Disadvantage: Note that because B is upper bidiagonal, we cannot solve for y until the bidiagonalization is complete. We are required to save the vectors v_1, ..., v_n, an unhappy circumstance if n is very large.

Modification: It can be accomplished more favorably if A is reduced to lower bidiagonal form:

    U^T A V = B,  m ≥ n + 1,

where B is lower bidiagonal with diagonal α_1, ..., α_n and subdiagonal β_1, ..., β_n, V = [v_1, ..., v_n] and U = [u_1, ..., u_m]. It is straightforward to develop a Lanczos procedure which is very similar to (7.4.13). Let V_j = [v_1, ..., v_j], U_j = [u_1, ..., u_j], and let B_j ∈ R^{(j+1)×j} be lower bidiagonal with diagonal α_1, ..., α_j and subdiagonal β_1, ..., β_j, and consider minimizing ||Ax − b||_2 over all vectors of the form x = V_j y, y ∈ R^j. Since

    ||A V_j y − b||_2^2 = ||B_j y − U_{j+1}^T b||_2^2 + Σ_{i=j+2}^m (u_i^T b)^2,
it follows that x_j = V_j y_j is the minimizer of the LS problem over span{V_j}, where y_j minimizes the (j + 1) × j LS problem

    min_y ||B_j y − U_{j+1}^T b||_2.

Since B_j is lower bidiagonal, it is easy to compute Givens rotations J_1, ..., J_j such that

    J_j ··· J_1 B_j = [ R_j ]
                      [ 0   ]

with R_j upper bidiagonal. Let

    J_j ··· J_1 U_{j+1}^T b = [ d_j ]
                              [ u   ].

Then

    ||B_j y − U_{j+1}^T b||_2 = || J_j ··· J_1 B_j y − J_j ··· J_1 U_{j+1}^T b ||_2 = || [ R_j y − d_j ; −u ] ||_2.

So y_j = R_j^{-1} d_j and

    x_j = V_j y_j = V_j R_j^{-1} d_j = W_j d_j,  where  W_j = V_j R_j^{-1}.

Writing W_j = (W_{j-1}, w_j), we have

    w_j = (v_j − w_{j-1} r_{j-1,j}) / r_{jj},

where r_{j-1,j} and r_{jj} are elements of R_j, and R_j can be computed from R_{j-1}. Similarly, d_j = [d_{j-1}^T, δ_j]^T, and x_j can be obtained from x_{j-1}:

    x_j = W_j d_j = (W_{j-1}, w_j) [ d_{j-1} ; δ_j ] = W_{j-1} d_{j-1} + δ_j w_j,

thus

    x_j = x_{j-1} + δ_j w_j.

For details see Paige-Saunders (1978).

7.4.6 Error Estimation of Least Squares Problems

Continuity of A^+: consider the map R^{m×n} → R^{n×m} defined by A ↦ A^+.

Lemma 7.4.2 If {A_i} converges to A and rank(A_i) = rank(A) = n, then {A_i^+} also converges to A^+.

Proof: Since lim_{i→∞} A_i^T A_i = A^T A is nonsingular,

    A_i^+ = (A_i^T A_i)^{-1} A_i^T → (A^T A)^{-1} A^T = A^+.

Example 7.4.1 Let

    A_ε = [ ε  0 ]
          [ 0  ε ]
          [ 0  0 ]   with ε > 0,   and   A_0 = 0 ∈ R^{3×2},  so that rank(A_0) < 2.

Then A_ε → A_0 as ε → 0, but

    A_ε^+ = [ 1/ε   0   0 ]
            [ 0    1/ε  0 ]

diverges as ε → 0, whereas A_0^+ = 0 ∈ R^{2×3}.
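Example 7.4.1 (whose matrices we have reconstructed from context, so treat the exact entries as our reading of the example) can be checked numerically with numpy.linalg.pinv:

```python
import numpy as np

# A_eps -> A_0 entrywise, yet pinv(A_eps) diverges, because the rank
# drops in the limit: rank(A_eps) = 2 for eps > 0, but rank(A_0) = 0.
A0 = np.zeros((3, 2))
for eps in [1e-2, 1e-4, 1e-6]:
    Aeps = np.array([[eps, 0.0], [0.0, eps], [0.0, 0.0]])
    P = np.linalg.pinv(Aeps)
    assert np.allclose(P, np.array([[1/eps, 0, 0], [0, 1/eps, 0]]))
assert np.allclose(np.linalg.pinv(A0), np.zeros((2, 3)))
```

This is the discontinuity that Lemma 7.4.2 rules out under the rank assumption.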
Theorem 7.4.3 Let A, B ∈ R^{m×n}. Then

    ||B^+ − A^+||_F ≤ √2 ||A − B||_F max{ ||A^+||_2^2, ||B^+||_2^2 }.

(Without proof.)

Remark 7.4.1 It does not follow that A → B implies A^+ → B^+, because A^+ can diverge to ∞; see Example 7.4.1.

Theorem 7.4.4 If rank(A) = rank(B), then

    ||A^+ − B^+||_F ≤ μ ||A^+||_2 ||B^+||_2 ||A − B||_F,

where μ = √2 if rank(A) < min(m, n) and μ = 1 if rank(A) = min(m, n).

Pseudo-inverse of A: A^+ is the unique solution of the equations

    A^+ A A^+ = A^+,  (A A^+)^* = A A^+,  A A^+ A = A,  (A^+ A)^* = A^+ A.

P_A = A A^+ is Hermitian and idempotent with R(P_A) = R(A): P_A is the orthogonal projection onto R(A). Similarly, R_A = A^+ A is the orthogonal projection onto R(A^*). Furthermore,

    ρ_LS^2 = ||b − A A^+ b||_2^2 = ||(I − A A^+) b||_2^2.

Lemma 7.4.1 (Banach Lemma)

    ||B^{-1} − A^{-1}|| ≤ ||A − B|| ||A^{-1}|| ||B^{-1}||.

Proof: Writing B = A + δA, from

    ((A + δA)^{-1} − A^{-1})(A + δA) = I − A^{-1}(A + δA) = −A^{-1} δA

the lemma follows immediately.

Theorem 7.4.5 (i) The product P_B P_A^⊥ can be written in the form

    P_B P_A^⊥ = (B^+)^* R_B E^* P_A^⊥,

where P_A^⊥ = I − P_A and E = B − A. Thus ||P_B P_A^⊥|| ≤ ||B^+||_2 ||E||.

(ii) If rank(A) = rank(B), then ||P_B − P_A|| ≤ min{ ||B^+||_2, ||A^+||_2 } ||E||.

Proof:

    P_B P_A^⊥ = (P_B)^* P_A^⊥ = (B^+)^* B^* P_A^⊥ = (B^+)^* (A + E)^* P_A^⊥
              = (B^+)^* E^* P_A^⊥  (since A^* P_A^⊥ = 0)
              = (B^+)^* R_B E^* P_A^⊥  (using (B^+)^* = (B^+)^* R_B),

and the bound follows since ||R_B|| ≤ 1 and ||P_A^⊥|| ≤ 1. Part (ii) follows from the relation between ||P_B − P_A||, ||P_B P_A^⊥|| and ||P_B^⊥ P_A|| when rank(A) = rank(B). Exercise! (Use the C-S decomposition.)
Theorem 7.4.6 It holds that

    B^+ − A^+ = −B^+ P_B E R_A A^+ + B^+ P_B P_A^⊥ − R_B^⊥ R_A A^+ =: F_1 + F_2 + F_3.

Proof: Substituting P_B = B B^+, E = B − A, R_A = A^+ A, P_A^⊥ = I − A A^+ and R_B^⊥ = I − B^+ B, the right-hand side becomes

    −B^+ B B^+ (B − A) A^+ A A^+ + B^+ B B^+ (I − A A^+) − (I − B^+ B) A^+ A A^+
    = −B^+ (B − A) A^+ + B^+ (I − A A^+) − (I − B^+ B) A^+
    = B^+ − A^+.

Theorem 7.4.7 If B = A + E, then

    ||B^+ − A^+||_F ≤ √2 ||E||_F max{ ||A^+||_2^2, ||B^+||_2^2 }.

Proof: Suppose rank(B) ≤ rank(A). The column spaces of F_1 and F_2 are orthogonal to the column space of F_3 (since (I − B^+ B) B^+ = 0). Hence

    ||B^+ − A^+||_F^2 = ||F_1 + F_2||_F^2 + ||F_3||_F^2.

Since F_1 + F_2 = B^+ (−P_B E A^+ P_A + P_B P_A^⊥), we have

    ||F_1 + F_2||_F^2 ≤ ||B^+||_2^2 ( ||P_B E A^+ P_A||_F^2 + ||P_B P_A^⊥||_F^2 ).

By Theorems 7.4.5 and 7.4.6 it follows that

    ||P_B E A^+ P_A||_F^2 + ||P_B P_A^⊥||_F^2 ≤ ||P_B E A^+||_F^2 + ||P_B^⊥ P_A||_F^2
        = ||P_B E A^+||_F^2 + ||P_B^⊥ E A^+||_F^2 = ||E A^+||_F^2 ≤ ||E||_F^2 ||A^+||_2^2,

where P_B^⊥ P_A = −P_B^⊥ E A^+ because P_B^⊥ B = 0 gives P_B^⊥ A = −P_B^⊥ E. Thus

    ||F_1 + F_2||_F ≤ ||A^+||_2 ||B^+||_2 ||E||_F.

By Theorem 7.4.6 we have

    ||F_3||_F ≤ ||A^+||_2 ||R_B^⊥ R_A||_F = ||A^+||_2 ||R_A R_B^⊥||_F = ||A^+||_2 ||A^+ E R_B^⊥||_F ≤ ||A^+||_2^2 ||E||_F.

The final bound is symmetric in A and B, so it also holds when rank(B) ≥ rank(A).

Theorem 7.4.8 If rank(A) = rank(B), then

    ||B^+ − A^+||_F ≤ √2 ||A^+||_2 ||B^+||_2 ||E||_F

(see Wedin (1973)). From the above we have

    ||B^+ − A^+||_F / ||B^+||_2 ≤ √2 κ_2(A) ||E||_F / ||A||_2.

This bound implies that as E approaches zero, the relative error in B^+ approaches zero, which further implies that B^+ approaches A^+.

Corollary 7.4.9 lim_{B→A} B^+ = A^+ if and only if rank(A) = rank(B) as B approaches A. (See Stewart 1977.)
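Theorem 7.4.8 can be spot-checked numerically; the sketch below (ours) draws a random full-column-rank A and a small perturbation E, so that rank(A + E) = rank(A) and even the sharper constant μ = 1 applies, which the √2 bound certainly covers:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))           # full column rank (generic)
E = 1e-3 * rng.standard_normal((5, 3))    # small, so rank(A + E) = rank(A)
B = A + E
lhs = np.linalg.norm(np.linalg.pinv(B) - np.linalg.pinv(A), 'fro')
rhs = (np.sqrt(2) * np.linalg.norm(np.linalg.pinv(A), 2)
       * np.linalg.norm(np.linalg.pinv(B), 2) * np.linalg.norm(E, 'fro'))
assert lhs <= rhs
```

Of course a numerical check on one random instance is no proof; it merely illustrates the scale of the bound.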
7.4.7 Perturbation of Solutions of the Least Squares Problem

We first state two corollaries of the SVD theorem.

Theorem 7.4.10 (SVD) If A ∈ R^{m×n}, then there exist orthogonal matrices U = [u_1, ..., u_m] ∈ R^{m×m} and V = [v_1, ..., v_n] ∈ R^{n×n} such that

    U^T A V = diag(σ_1, ..., σ_p),  p = min(m, n),  σ_1 ≥ σ_2 ≥ ··· ≥ σ_p ≥ 0.

Corollary 7.4.11 If the SVD is given by Theorem 7.4.10 and σ_1 ≥ ··· ≥ σ_r > σ_{r+1} = ··· = σ_p = 0, then

(a) rank(A) = r.
(b) N(A) = span{v_{r+1}, ..., v_n}.
(c) Range(A) = span{u_1, ..., u_r}.
(d) A = Σ_{i=1}^r σ_i u_i v_i^T = U_r Σ_r V_r^T, where U_r = [u_1, ..., u_r], V_r = [v_1, ..., v_r] and Σ_r = diag(σ_1, ..., σ_r).
(e) ||A||_F^2 = σ_1^2 + ··· + σ_r^2.
(f) ||A||_2 = σ_1.

Proof: exercise!

Corollary 7.4.12 Let the SVD of A ∈ R^{m×n} be given by Theorem 7.4.10. If k < r = rank(A) and A_k = Σ_{i=1}^k σ_i u_i v_i^T, then

    min_{rank(X)=k, X ∈ R^{m×n}} ||A − X||_2 = ||A − A_k||_2 = σ_{k+1}.  (7.4.14)

Proof: Let X ∈ R^{m×n} with rank(X) = k, and let τ_1 ≥ ··· ≥ τ_n ≥ 0 be the singular values of X. Since A = X + (A − X) and τ_{k+1} = 0, we get

    σ_{k+1} = |σ_{k+1} − τ_{k+1}| ≤ ||A − X||_2.

For the matrix A_k = U Σ̃ V^T with Σ̃ = diag(σ_1, ..., σ_k, 0, ..., 0), we have

    ||A − A_k||_2 = ||U (Σ − Σ̃) V^T||_2 = ||Σ − Σ̃||_2 = σ_{k+1}.

LS problem: ||Ax − b||_2 = min!  x_LS = A^+ b.
Perturbed LS problem: ||(A + E) y − (b + f)||_2 = min!  y = (A + E)^+ (b + f).

Lemma 7.4.13 Let A, E ∈ R^{m×n} and rank(A) = r.

(a) If rank(A + E) > r, then ||(A + E)^+||_2 ≥ 1 / ||E||_2.

(b) If rank(A + E) ≤ r and ||A^+||_2 ||E||_2 < 1, then rank(A + E) = r and

    ||(A + E)^+||_2 ≤ ||A^+||_2 / (1 − ||A^+||_2 ||E||_2).
Proof: Let τ_1 ≥ ··· ≥ τ_n be the singular values of A + E.

To (a): If τ_k is the smallest nonzero singular value of A + E, then k ≥ r + 1 because rank(A + E) > r. By Corollary 7.4.6 we have

    ||E||_2 = ||(A + E) − A||_2 ≥ τ_{r+1} ≥ τ_k,

and therefore ||(A + E)^+||_2 = 1/τ_k ≥ 1/||E||_2.

To (b): Let σ_1 ≥ ··· ≥ σ_n be the singular values of A. Then σ_r ≠ 0 because rank(A) = r, and ||A^+||_2 = 1/σ_r. Since ||A^+||_2 ||E||_2 < 1, we have ||E||_2 < σ_r, and then by Corollary 7.4.6 it must hold that rank(A + E) ≥ r, so rank(A + E) = r. By Weyl's theorem (Theorem 6.1.5) we have τ_r ≥ σ_r − ||E||_2, and here σ_r − ||E||_2 > 0, so one obtains

    ||(A + E)^+||_2 = 1/τ_r ≤ 1/(σ_r − ||E||_2) = ||A^+||_2 / (1 − ||A^+||_2 ||E||_2).

Lemma 7.4.14 Let A, E ∈ R^{m×n}, b, f ∈ R^m, x = A^+ b, y = (A + E)^+ (b + f) and r = b − Ax. Then

    y − x = [ −(A + E)^+ E A^+ + (A + E)^+ (I − A A^+) − (I − (A + E)^+ (A + E)) A^+ ] b + (A + E)^+ f
          = −(A + E)^+ E x + (A + E)^+ ((A + E)^+)^T E^T r + (I − (A + E)^+ (A + E)) E^T (A^+)^T x + (A + E)^+ f.

Proof: y − x = [(A + E)^+ − A^+] b + (A + E)^+ f, and for (A + E)^+ − A^+ one has the decomposition

    (A + E)^+ − A^+ = −(A + E)^+ E A^+ + (A + E)^+ (I − A A^+) − (I − (A + E)^+ (A + E)) A^+,

which is verified by multiplying out as in the proof of Theorem 7.4.6. Let C := A + E. Applying the Penrose equations to C, we obtain

    C^+ = C^+ C C^+ = C^+ (C^+)^T C^T

and

    A^T (I − A A^+) = A^T − A^T A A^+ = A^T − A^T (A^+)^T A^T = A^T − (A A^+ A)^T = 0,

also A^+ = A^T (A^+)^T A^+ and (I − C^+ C) C^T = 0. Hence

    C^+ (I − A A^+) = C^+ (C^+)^T C^T (I − A A^+) = C^+ (C^+)^T (A + E)^T (I − A A^+) = C^+ (C^+)^T E^T (I − A A^+)

and

    (I − C^+ C) A^+ = (I − C^+ C) A^T (A^+)^T A^+ = (I − C^+ C)(C − E)^T (A^+)^T A^+ = −(I − C^+ C) E^T (A^+)^T A^+.

If we substitute this into the second and third terms of the decomposition, then with r = (I − A A^+) b and x = A^+ b we have the result:

    y − x = [ −C^+ E A^+ + C^+ (C^+)^T E^T (I − A A^+) + (I − C^+ C) E^T (A^+)^T A^+ ] b + C^+ f
          = −C^+ E x + C^+ (C^+)^T E^T r + (I − C^+ C) E^T (A^+)^T x + C^+ f.
Theorem 7.4.15 Let A, E ∈ R^{m×n}, b, f ∈ R^m, x = A^+ b ≠ 0, y = (A + E)^+ (b + f) and r = b − Ax. If rank(A + E) ≤ rank(A) and ||A^+||_2 ||E||_2 < 1, then

    ||y − x||_2 / ||x||_2 ≤ ( ||A||_2 ||A^+||_2 / (1 − ||A^+||_2 ||E||_2) )
        × [ 2 ||E||_2/||A||_2 + ( ||A^+||_2 / (1 − ||A^+||_2 ||E||_2) ) ||E||_2 ||r||_2 / (||A||_2 ||x||_2) + ||f||_2 / (||A||_2 ||x||_2) ].

Proof: From Lemma 7.4.14 it follows that

    ||y − x||_2 ≤ ||(A + E)^+||_2 [ ||E||_2 ||x||_2 + ||(A + E)^+||_2 ||E||_2 ||r||_2 + ||f||_2 ]
        + ||I − (A + E)^+ (A + E)||_2 ||E||_2 ||A^+||_2 ||x||_2.

Since I − (A + E)^+ (A + E) is symmetric and idempotent, ||I − (A + E)^+ (A + E)||_2 = 1 if (A + E)^+ (A + E) ≠ I. Together with the estimate of Lemma 7.4.13(b), ||(A + E)^+||_2 ≤ ||A^+||_2 / (1 − ||A^+||_2 ||E||_2), we obtain

    ||y − x||_2 ≤ ( ||A^+||_2 / (1 − ||A^+||_2 ||E||_2) ) [ 2 ||E||_2 ||x||_2 + ( ||A^+||_2 / (1 − ||A^+||_2 ||E||_2) ) ||E||_2 ||r||_2 + ||f||_2 ],

and dividing by ||x||_2 (and factoring out ||A||_2) gives the stated bound.

7.5 Unsymmetric Lanczos Method

Suppose A ∈ R^{n×n} and that a nonsingular matrix X exists such that

    X^{-1} A X = T,

where T is tridiagonal with diagonal α_1, ..., α_n, superdiagonal γ_1, ..., γ_{n-1} and subdiagonal β_1, ..., β_{n-1}. Let X = [x_1, ..., x_n] and X^{-T} = Y = [y_1, ..., y_n]. Comparing columns in AX = XT and A^T Y = Y T^T, we find that

    A x_j = γ_{j-1} x_{j-1} + α_j x_j + β_j x_{j+1}  (γ_0 x_0 ≡ 0)

and

    A^T y_j = β_{j-1} y_{j-1} + α_j y_j + γ_j y_{j+1}  (β_0 y_0 ≡ 0)

for j = 1, ..., n − 1. These equations together with Y^T X = I_n imply α_j = y_j^T A x_j and

    β_j x_{j+1} = r_j := (A − α_j I) x_j − γ_{j-1} x_{j-1},
    γ_j y_{j+1} = p_j := (A − α_j I)^T y_j − β_{j-1} y_{j-1}.  (7.5.1)

There is some flexibility in choosing the scale factors β_j and γ_j. A canonical choice is to set β_j = ||r_j||_2 and γ_j = x_{j+1}^T p_j, giving:
Algorithm 7.5.1 (Biorthogonalization method of Lanczos) Given x_1, y_1 ∈ R^n with x_1^T x_1 = y_1^T x_1 = 1.

    For j = 1, ..., n − 1,
        α_j = y_j^T A x_j,
        r_j = (A − α_j I) x_j − γ_{j-1} x_{j-1}  (γ_0 x_0 ≡ 0),
        β_j = ||r_j||_2.
        If β_j > 0 then
            x_{j+1} = r_j / β_j,
            p_j = (A − α_j I)^T y_j − β_{j-1} y_{j-1}  (β_0 y_0 ≡ 0),
            γ_j = x_{j+1}^T p_j,
        else stop;
        If γ_j ≠ 0 then y_{j+1} = p_j / γ_j else stop;
    end for
    α_n = y_n^T A x_n.  (7.5.2)

Define X_j = [x_1, ..., x_j], Y_j = [y_1, ..., y_j] and T_j to be the leading j × j principal submatrix of T. It is easy to verify that

    A X_j = X_j T_j + r_j e_j^T,  A^T Y_j = Y_j T_j^T + p_j e_j^T.  (7.5.3)

Remark 7.5.1 (i) p_j^T r_j = β_j γ_j y_{j+1}^T x_{j+1} = β_j γ_j, from (7.5.1).

(ii) Breakdown of algorithm (7.5.2) occurs if p_j^T r_j = 0:
(a) r_j = 0 (so β_j = 0): then R(X_j) is an invariant subspace of A (by (7.5.3)).
(b) p_j = 0 (so γ_j = 0): then R(Y_j) is an invariant subspace of A^T (by (7.5.3)).
(c) p_j^T r_j = 0 but p_j ≠ 0 and r_j ≠ 0: then (7.5.2) breaks down, and we must begin the algorithm again with new starting vectors.

(iii) If p_j^T r_j is very small, then γ_j or β_j is small. Hence y_{j+1} or x_{j+1} is large, so algorithm (7.5.2) is unstable.

Definition 7.5.1 An upper Hessenberg matrix H = (h_ij) is called unreducible if h_{i+1,i} ≠ 0 for i = 1, ..., n − 1 (that is, all subdiagonal entries are nonzero). A tridiagonal matrix T = (t_ij) is called unreducible if t_{i,i-1} ≠ 0 for i = 2, ..., n and t_{i,i+1} ≠ 0 for i = 1, ..., n − 1.

Theorem 7.5.2 Let A ∈ R^{n×n}. Then

(i) If x_1 ≠ 0 is such that K[x_1, A, n] = [x_1, A x_1, ..., A^{n-1} x_1] is nonsingular, and if X is a nonsingular matrix such that K[x_1, A, n] = XR, where R is an upper triangular matrix, then H = X^{-1} A X is an unreducible upper Hessenberg matrix.

(ii) Let X be a nonsingular matrix with first column x_1. If H = X^{-1} A X is an upper Hessenberg matrix, then

    K[x_1, A, n] = X K[e_1, H, n] ≡ XR,

where R is an upper triangular matrix. Furthermore, if H is unreducible, then R is nonsingular.
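A NumPy sketch of Algorithm 7.5.1 (ours; it normalizes y_1 so that y_1^T x_1 = 1, assumes no breakdown, and uses a mildly nonsymmetric, well-conditioned test matrix to keep the recurrence stable) verifies the biorthogonality Y_j^T X_j = I and the tridiagonal form Y_j^T A X_j = T_j of (7.5.3):

```python
import numpy as np

def unsym_lanczos(A, x1, y1, k):
    # Sketch of the two-sided Lanczos process (7.5.2), no breakdown handling.
    # Scale factors: beta_j = ||r_j||_2, gamma_j = x_{j+1}^T p_j.
    n = A.shape[0]
    X = np.zeros((n, k)); Y = np.zeros((n, k))
    alph = np.zeros(k); bet = np.zeros(k); gam = np.zeros(k)
    X[:, 0] = x1 / np.linalg.norm(x1)
    Y[:, 0] = y1 / (y1 @ X[:, 0])      # enforce y_1^T x_1 = 1
    x_old = np.zeros(n); y_old = np.zeros(n)
    beta_old = gamma_old = 0.0
    for j in range(k):
        alph[j] = Y[:, j] @ A @ X[:, j]
        if j == k - 1:
            break
        r = A @ X[:, j] - alph[j] * X[:, j] - gamma_old * x_old
        bet[j] = np.linalg.norm(r)
        x_new = r / bet[j]
        p = A.T @ Y[:, j] - alph[j] * Y[:, j] - beta_old * y_old
        gam[j] = x_new @ p
        x_old, y_old = X[:, j], Y[:, j]
        X[:, j + 1] = x_new
        Y[:, j + 1] = p / gam[j]
        beta_old, gamma_old = bet[j], gam[j]
    return X, Y, alph, bet, gam

rng = np.random.default_rng(2)
# well-separated diagonal keeps gamma_j away from zero in this demo
A = np.diag(np.arange(1.0, 7.0)) + 0.1 * rng.standard_normal((6, 6))
X, Y, al, be, ga = unsym_lanczos(A, rng.standard_normal(6),
                                 rng.standard_normal(6), 4)
T4 = np.diag(al) + np.diag(be[:3], -1) + np.diag(ga[:3], 1)
assert np.allclose(Y.T @ X, np.eye(4), atol=1e-6)
assert np.allclose(Y.T @ A @ X, T4, atol=1e-6)
```

As Remark 7.5.1(iii) warns, a small p_j^T r_j would make γ_j tiny and y_{j+1} huge; a production code needs look-ahead or restarting, which this sketch omits.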
(iii) If H = X^{-1} A X and H̃ = Y^{-1} A Y, where H and H̃ are both upper Hessenberg matrices, H is unreducible, and the first columns x_1 and y_1 of X and Y, respectively, are linearly dependent, then J = X^{-1} Y is an upper triangular matrix and H̃ = J^{-1} H J.

Proof: ad (i): Since x_1, A x_1, ..., A^{n-1} x_1 are linearly independent, A^n x_1 is a linear combination of {x_1, A x_1, ..., A^{n-1} x_1}, i.e., there exist c_0, ..., c_{n-1} such that

    A^n x_1 = Σ_{i=0}^{n-1} c_i A^i x_1.

Let C be the companion matrix

    C = [e_2, e_3, ..., e_n, c],  c = (c_0, c_1, ..., c_{n-1})^T,

i.e., C has ones on the subdiagonal, the vector c as its last column, and zeros elsewhere. Then we have

    K[x_1, A, n] C = [A x_1, A^2 x_1, ..., A^{n-1} x_1, A^n x_1] = A K[x_1, A, n],

that is, XRC = AXR. We then have X^{-1} A X = R C R^{-1} = H, an unreducible upper Hessenberg matrix.

ad (ii): From A = X H X^{-1} it follows that A^i x_1 = X H^i X^{-1} x_1 = X H^i e_1. Then

    K[x_1, A, n] = [x_1, A x_1, ..., A^{n-1} x_1] = [X e_1, X H e_1, ..., X H^{n-1} e_1] = X [e_1, H e_1, ..., H^{n-1} e_1].

If H is upper Hessenberg, then R = [e_1, H e_1, ..., H^{n-1} e_1] is upper triangular. If H is an unreducible upper Hessenberg matrix, then R is nonsingular, since r_11 = 1, r_22 = h_21, r_33 = h_21 h_32, and so on.

ad (iii): Let y_1 = λ x_1. Applying (ii) to H, it follows that K[x_1, A, n] = X R_1. Applying (ii) to H̃, we also have K[y_1, A, n] = Y R_2. Here R_1 and R_2 are upper triangular. Since y_1 = λ x_1, we get λ K[x_1, A, n] = λ X R_1 = Y R_2. Since H is unreducible, R_1 is nonsingular by (ii); hence R_2 is nonsingular and X^{-1} Y = λ R_1 R_2^{-1} = J is upper triangular. So

    H̃ = Y^{-1} A Y = (Y^{-1} X)(X^{-1} A X)(X^{-1} Y) = J^{-1} H J.

Theorem 7.5.3 Let A ∈ R^{n×n} and x, y ∈ R^n with K[x, A, n] and K[y, A^T, n] nonsingular. Then

(i) If B = K[y, A^T, n]^T K[x, A, n] = (y^T A^{i+j-2} x)_{i,j=1,...,n} has a decomposition B = L D L^T, where L is lower triangular with l_ii = 1 and D is diagonal (that is, all principal determinants of B are nonzero), and if X = K[x, A, n] L^{-T}, then T = X^{-1} A X is an unreducible tridiagonal matrix.
More informationEigenvalue Problems CHAPTER 1 : PRELIMINARIES
Eigenvalue Problems CHAPTER 1 : PRELIMINARIES Heinrich Voss voss@tu-harburg.de Hamburg University of Technology Institute of Mathematics TUHH Heinrich Voss Preliminaries Eigenvalue problems 2012 1 / 14
More informationIterative methods for Linear System of Equations. Joint Advanced Student School (JASS-2009)
Iterative methods for Linear System of Equations Joint Advanced Student School (JASS-2009) Course #2: Numerical Simulation - from Models to Software Introduction In numerical simulation, Partial Differential
More informationApplied Mathematics 205. Unit II: Numerical Linear Algebra. Lecturer: Dr. David Knezevic
Applied Mathematics 205 Unit II: Numerical Linear Algebra Lecturer: Dr. David Knezevic Unit II: Numerical Linear Algebra Chapter II.3: QR Factorization, SVD 2 / 66 QR Factorization 3 / 66 QR Factorization
More informationWe will discuss matrix diagonalization algorithms in Numerical Recipes in the context of the eigenvalue problem in quantum mechanics, m A n = λ m
Eigensystems We will discuss matrix diagonalization algorithms in umerical Recipes in the context of the eigenvalue problem in quantum mechanics, A n = λ n n, (1) where A is a real, symmetric Hamiltonian
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra)
AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 16: Reduction to Hessenberg and Tridiagonal Forms; Rayleigh Quotient Iteration Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical
More informationComputational Methods. Eigenvalues and Singular Values
Computational Methods Eigenvalues and Singular Values Manfred Huber 2010 1 Eigenvalues and Singular Values Eigenvalues and singular values describe important aspects of transformations and of data relations
More informationIntroduction to Iterative Solvers of Linear Systems
Introduction to Iterative Solvers of Linear Systems SFB Training Event January 2012 Prof. Dr. Andreas Frommer Typeset by Lukas Krämer, Simon-Wolfgang Mages and Rudolf Rödl 1 Classes of Matrices and their
More informationA HARMONIC RESTARTED ARNOLDI ALGORITHM FOR CALCULATING EIGENVALUES AND DETERMINING MULTIPLICITY
A HARMONIC RESTARTED ARNOLDI ALGORITHM FOR CALCULATING EIGENVALUES AND DETERMINING MULTIPLICITY RONALD B. MORGAN AND MIN ZENG Abstract. A restarted Arnoldi algorithm is given that computes eigenvalues
More informationTopics. The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems
Topics The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems What about non-spd systems? Methods requiring small history Methods requiring large history Summary of solvers 1 / 52 Conjugate
More informationApproximating the matrix exponential of an advection-diffusion operator using the incomplete orthogonalization method
Approximating the matrix exponential of an advection-diffusion operator using the incomplete orthogonalization method Antti Koskela KTH Royal Institute of Technology, Lindstedtvägen 25, 10044 Stockholm,
More informationOn the influence of eigenvalues on Bi-CG residual norms
On the influence of eigenvalues on Bi-CG residual norms Jurjen Duintjer Tebbens Institute of Computer Science Academy of Sciences of the Czech Republic duintjertebbens@cs.cas.cz Gérard Meurant 30, rue
More informationM.A. Botchev. September 5, 2014
Rome-Moscow school of Matrix Methods and Applied Linear Algebra 2014 A short introduction to Krylov subspaces for linear systems, matrix functions and inexact Newton methods. Plan and exercises. M.A. Botchev
More informationNumerical Solution of Linear Eigenvalue Problems
Numerical Solution of Linear Eigenvalue Problems Jessica Bosch and Chen Greif Abstract We review numerical methods for computing eigenvalues of matrices We start by considering the computation of the dominant
More informationMATH36001 Generalized Inverses and the SVD 2015
MATH36001 Generalized Inverses and the SVD 201 1 Generalized Inverses of Matrices A matrix has an inverse only if it is square and nonsingular. However there are theoretical and practical applications
More informationKrylov Space Methods. Nonstationary sounds good. Radu Trîmbiţaş ( Babeş-Bolyai University) Krylov Space Methods 1 / 17
Krylov Space Methods Nonstationary sounds good Radu Trîmbiţaş Babeş-Bolyai University Radu Trîmbiţaş ( Babeş-Bolyai University) Krylov Space Methods 1 / 17 Introduction These methods are used both to solve
More informationA short course on: Preconditioned Krylov subspace methods. Yousef Saad University of Minnesota Dept. of Computer Science and Engineering
A short course on: Preconditioned Krylov subspace methods Yousef Saad University of Minnesota Dept. of Computer Science and Engineering Universite du Littoral, Jan 19-3, 25 Outline Part 1 Introd., discretization
More informationMatrix Factorization and Analysis
Chapter 7 Matrix Factorization and Analysis Matrix factorizations are an important part of the practice and analysis of signal processing. They are at the heart of many signal-processing algorithms. Their
More informationComputing Eigenvalues and/or Eigenvectors;Part 2, The Power method and QR-algorithm
Computing Eigenvalues and/or Eigenvectors;Part 2, The Power method and QR-algorithm Tom Lyche Centre of Mathematics for Applications, Department of Informatics, University of Oslo November 19, 2010 Today
More informationThis can be accomplished by left matrix multiplication as follows: I
1 Numerical Linear Algebra 11 The LU Factorization Recall from linear algebra that Gaussian elimination is a method for solving linear systems of the form Ax = b, where A R m n and bran(a) In this method
More informationCourse Notes: Week 1
Course Notes: Week 1 Math 270C: Applied Numerical Linear Algebra 1 Lecture 1: Introduction (3/28/11) We will focus on iterative methods for solving linear systems of equations (and some discussion of eigenvalues
More informationKrylov Subspaces. Lab 1. The Arnoldi Iteration
Lab 1 Krylov Subspaces Lab Objective: Discuss simple Krylov Subspace Methods for finding eigenvalues and show some interesting applications. One of the biggest difficulties in computational linear algebra
More informationEigenvalues and eigenvectors
Chapter 6 Eigenvalues and eigenvectors An eigenvalue of a square matrix represents the linear operator as a scaling of the associated eigenvector, and the action of certain matrices on general vectors
More informationProperties of Matrices and Operations on Matrices
Properties of Matrices and Operations on Matrices A common data structure for statistical analysis is a rectangular array or matris. Rows represent individual observational units, or just observations,
More informationQR-decomposition. The QR-decomposition of an n k matrix A, k n, is an n n unitary matrix Q and an n k upper triangular matrix R for which A = QR
QR-decomposition The QR-decomposition of an n k matrix A, k n, is an n n unitary matrix Q and an n k upper triangular matrix R for which In Matlab A = QR [Q,R]=qr(A); Note. The QR-decomposition is unique
More informationG1110 & 852G1 Numerical Linear Algebra
The University of Sussex Department of Mathematics G & 85G Numerical Linear Algebra Lecture Notes Autumn Term Kerstin Hesse (w aw S w a w w (w aw H(wa = (w aw + w Figure : Geometric explanation of the
More informationAPPLIED NUMERICAL LINEAR ALGEBRA
APPLIED NUMERICAL LINEAR ALGEBRA James W. Demmel University of California Berkeley, California Society for Industrial and Applied Mathematics Philadelphia Contents Preface 1 Introduction 1 1.1 Basic Notation
More informationSparse matrix methods in quantum chemistry Post-doctorale cursus quantumchemie Han-sur-Lesse, Belgium
Sparse matrix methods in quantum chemistry Post-doctorale cursus quantumchemie Han-sur-Lesse, Belgium Dr. ir. Gerrit C. Groenenboom 14 june - 18 june, 1993 Last revised: May 11, 2005 Contents 1 Introduction
More informationLecture 4 Orthonormal vectors and QR factorization
Orthonormal vectors and QR factorization 4 1 Lecture 4 Orthonormal vectors and QR factorization EE263 Autumn 2004 orthonormal vectors Gram-Schmidt procedure, QR factorization orthogonal decomposition induced
More informationNumerical Methods I Eigenvalue Problems
Numerical Methods I Eigenvalue Problems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 2nd, 2014 A. Donev (Courant Institute) Lecture
More informationLeast-Squares Systems and The QR factorization
Least-Squares Systems and The QR factorization Orthogonality Least-squares systems. The Gram-Schmidt and Modified Gram-Schmidt processes. The Householder QR and the Givens QR. Orthogonality The Gram-Schmidt
More informationThe Singular Value Decomposition and Least Squares Problems
The Singular Value Decomposition and Least Squares Problems Tom Lyche Centre of Mathematics for Applications, Department of Informatics, University of Oslo September 27, 2009 Applications of SVD solving
More informationMAT 610: Numerical Linear Algebra. James V. Lambers
MAT 610: Numerical Linear Algebra James V Lambers January 16, 2017 2 Contents 1 Matrix Multiplication Problems 7 11 Introduction 7 111 Systems of Linear Equations 7 112 The Eigenvalue Problem 8 12 Basic
More informationbe a Householder matrix. Then prove the followings H = I 2 uut Hu = (I 2 uu u T u )u = u 2 uut u
MATH 434/534 Theoretical Assignment 7 Solution Chapter 7 (71) Let H = I 2uuT Hu = u (ii) Hv = v if = 0 be a Householder matrix Then prove the followings H = I 2 uut Hu = (I 2 uu )u = u 2 uut u = u 2u =
More informationThe QR Factorization
The QR Factorization How to Make Matrices Nicer Radu Trîmbiţaş Babeş-Bolyai University March 11, 2009 Radu Trîmbiţaş ( Babeş-Bolyai University) The QR Factorization March 11, 2009 1 / 25 Projectors A projector
More information5.3 The Power Method Approximation of the Eigenvalue of Largest Module
192 5 Approximation of Eigenvalues and Eigenvectors 5.3 The Power Method The power method is very good at approximating the extremal eigenvalues of the matrix, that is, the eigenvalues having largest and
More informationLecture 9: Krylov Subspace Methods. 2 Derivation of the Conjugate Gradient Algorithm
CS 622 Data-Sparse Matrix Computations September 19, 217 Lecture 9: Krylov Subspace Methods Lecturer: Anil Damle Scribes: David Eriksson, Marc Aurele Gilles, Ariah Klages-Mundt, Sophia Novitzky 1 Introduction
More informationKey words. conjugate gradients, normwise backward error, incremental norm estimation.
Proceedings of ALGORITMY 2016 pp. 323 332 ON ERROR ESTIMATION IN THE CONJUGATE GRADIENT METHOD: NORMWISE BACKWARD ERROR PETR TICHÝ Abstract. Using an idea of Duff and Vömel [BIT, 42 (2002), pp. 300 322
More informationON ORTHOGONAL REDUCTION TO HESSENBERG FORM WITH SMALL BANDWIDTH
ON ORTHOGONAL REDUCTION TO HESSENBERG FORM WITH SMALL BANDWIDTH V. FABER, J. LIESEN, AND P. TICHÝ Abstract. Numerous algorithms in numerical linear algebra are based on the reduction of a given matrix
More informationPreconditioned inverse iteration and shift-invert Arnoldi method
Preconditioned inverse iteration and shift-invert Arnoldi method Melina Freitag Department of Mathematical Sciences University of Bath CSC Seminar Max-Planck-Institute for Dynamics of Complex Technical
More informationFEM and sparse linear system solving
FEM & sparse linear system solving, Lecture 9, Nov 19, 2017 1/36 Lecture 9, Nov 17, 2017: Krylov space methods http://people.inf.ethz.ch/arbenz/fem17 Peter Arbenz Computer Science Department, ETH Zürich
More informationSTAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 9
STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 9 1. qr and complete orthogonal factorization poor man s svd can solve many problems on the svd list using either of these factorizations but they
More informationEECS 275 Matrix Computation
EECS 275 Matrix Computation Ming-Hsuan Yang Electrical Engineering and Computer Science University of California at Merced Merced, CA 95344 http://faculty.ucmerced.edu/mhyang Lecture 17 1 / 26 Overview
More informationIntroduction to Numerical Linear Algebra II
Introduction to Numerical Linear Algebra II Petros Drineas These slides were prepared by Ilse Ipsen for the 2015 Gene Golub SIAM Summer School on RandNLA 1 / 49 Overview We will cover this material in
More informationArnoldi Methods in SLEPc
Scalable Library for Eigenvalue Problem Computations SLEPc Technical Report STR-4 Available at http://slepc.upv.es Arnoldi Methods in SLEPc V. Hernández J. E. Román A. Tomás V. Vidal Last update: October,
More information