The Lanczos and conjugate gradient algorithms Gérard MEURANT October, 2008
1 The Lanczos algorithm
2 The Lanczos algorithm in finite precision
3 The nonsymmetric Lanczos algorithm
4 The Golub Kahan bidiagonalization algorithm
5 The block Lanczos algorithm
6 The conjugate gradient algorithm
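The Lanczos algorithm

Let $A$ be a real symmetric matrix of order $n$. The Lanczos algorithm constructs an orthogonal basis of a Krylov subspace spanned by the columns of
\[ K_k = \left( v, \; Av, \; \dots, \; A^{k-1} v \right). \]

Gram-Schmidt orthogonalization (Arnoldi): set $v^1 = v$ and, for $j = 1, 2, \dots$,
\[
\begin{aligned}
h_{i,j} &= (A v^j, v^i), \quad i = 1, \dots, j, \\
\tilde v^j &= A v^j - \sum_{i=1}^{j} h_{i,j} v^i, \\
h_{j+1,j} &= \| \tilde v^j \|, \quad \text{if } h_{j+1,j} = 0 \text{ then stop}, \\
v^{j+1} &= \frac{\tilde v^j}{h_{j+1,j}}.
\end{aligned}
\]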
In matrix form,
\[ A V_k = V_k H_k + h_{k+1,k} \, v^{k+1} (e^k)^T, \]
where $H_k$ is an upper Hessenberg matrix with elements $h_{i,j}$. Note that $h_{i,j} = 0$ for $j = 1, \dots, i-2$, $i > 2$, and
\[ H_k = V_k^T A V_k. \]
If $A$ is symmetric, $H_k$ is symmetric and therefore tridiagonal: $H_k = J_k$.

We also have $A V_n = V_n J_n$ if no $\tilde v^j$ is zero before step $n$, since $\tilde v^{n+1} = 0$: it is a vector orthogonal to a set of $n$ orthogonal vectors in a space of dimension $n$. Otherwise there exists an $m < n$ for which $A V_m = V_m J_m$, and the algorithm has found an invariant subspace of $A$, the eigenvalues of $J_m$ being eigenvalues of $A$.
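Since $J_k$ is tridiagonal, the orthogonalization reduces to a three-term recurrence. Starting from a vector $v$, set
\[ v^1 = \frac{v}{\|v\|}, \qquad \alpha_1 = (A v^1, v^1), \qquad \tilde v^2 = A v^1 - \alpha_1 v^1, \]
and then, for $k = 2, 3, \dots$,
\[
\begin{aligned}
\eta_{k-1} &= \| \tilde v^k \|, \\
v^k &= \frac{\tilde v^k}{\eta_{k-1}}, \\
\alpha_k &= (v^k, A v^k) = (v^k)^T A v^k, \\
\tilde v^{k+1} &= A v^k - \alpha_k v^k - \eta_{k-1} v^{k-1}.
\end{aligned}
\]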
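The three-term recurrence above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the matrix is stored as a dense list of rows, no reorthogonalization is performed, and the names `dot`, `matvec` and `lanczos` are hypothetical helpers, not part of any library.

```python
import math

def dot(x, y):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    """Product of a dense matrix (list of rows) with a vector."""
    return [dot(row, x) for row in A]

def lanczos(A, v, k):
    """Run k steps of the three-term Lanczos recurrence on a symmetric A.

    Returns the basis vectors v^1..v^k, the diagonal alpha_1..alpha_k and
    the off-diagonal eta_1..eta_{k-1} of the Jacobi matrix J_k.
    The run stops early if an invariant subspace is found (eta = 0).
    """
    nrm = math.sqrt(dot(v, v))
    V = [[x / nrm for x in v]]                 # v^1 = v / ||v||
    alphas, etas = [], []
    for j in range(k):
        w = matvec(A, V[j])
        alpha = dot(V[j], w)                   # alpha_k = (v^k, A v^k)
        alphas.append(alpha)
        if j == k - 1:
            break
        # unnormalized next vector: A v^k - alpha_k v^k - eta_{k-1} v^{k-1}
        w = [wi - alpha * vi for wi, vi in zip(w, V[j])]
        if j > 0:
            w = [wi - etas[j - 1] * vi for wi, vi in zip(w, V[j - 1])]
        eta = math.sqrt(dot(w, w))             # eta_k = norm of that vector
        if eta == 0.0:                         # invariant subspace found
            break
        etas.append(eta)
        V.append([wi / eta for wi in w])
    return V, alphas, etas

if __name__ == "__main__":
    A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
    V, alphas, etas = lanczos(A, [1.0, 0.0, 0.0], 3)
    print(alphas, etas)
```

For a tridiagonal matrix with nonzero off-diagonal entries and the first unit vector as starting vector, the recurrence reproduces the diagonal of the matrix and the absolute values of its off-diagonal entries, which makes a convenient sanity check.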
A variant of the Lanczos algorithm has been proposed by Chris Paige to improve the local orthogonality in finite precision computations:
\[
\begin{aligned}
\alpha_k &= (v^k)^T \left( A v^k - \eta_{k-1} v^{k-1} \right), \\
\tilde v^{k+1} &= \left( A v^k - \eta_{k-1} v^{k-1} \right) - \alpha_k v^k.
\end{aligned}
\]
Since we can suppose that $\eta_i \neq 0$, the tridiagonal Jacobi matrix $J_k$ has real and simple eigenvalues, which we denote by $\theta_j^{(k)}$. They are known as the Ritz values and are the approximations of the eigenvalues of $A$ given by the Lanczos algorithm.
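As an illustration of how the Ritz values could be obtained from the coefficients $\alpha_i$, $\eta_i$, here is a minimal sketch that uses bisection with a Sturm-type eigenvalue count. This is one standard technique for symmetric tridiagonal matrices, not necessarily the one used by the author, and the names `count_below` and `ritz_values` are hypothetical.

```python
def count_below(alphas, etas, x):
    """Number of eigenvalues of J_k smaller than x.

    Uses the signs of the pivots of the LDL^T factorization of J_k - x I
    (Sylvester's law of inertia).
    """
    count = 0
    d = 1.0
    for i, a in enumerate(alphas):
        off = etas[i - 1] ** 2 if i > 0 else 0.0
        d = a - x - off / d
        if d == 0.0:
            d = 1e-300          # tiny perturbation avoids dividing by zero next
        if d < 0.0:
            count += 1
    return count

def ritz_values(alphas, etas, tol=1e-13):
    """Eigenvalues of the Jacobi matrix J_k in increasing order, by bisection."""
    k = len(alphas)
    # Gershgorin discs give an interval containing all eigenvalues
    radius = [(abs(etas[i - 1]) if i > 0 else 0.0) +
              (abs(etas[i]) if i < k - 1 else 0.0) for i in range(k)]
    lo = min(a - r for a, r in zip(alphas, radius))
    hi = max(a + r for a, r in zip(alphas, radius))
    thetas = []
    for j in range(k):          # locate the (j+1)-th smallest eigenvalue
        a, b = lo, hi
        while b - a > tol * max(1.0, abs(a), abs(b)):
            mid = 0.5 * (a + b)
            if count_below(alphas, etas, mid) > j:
                b = mid
            else:
                a = mid
        thetas.append(0.5 * (a + b))
    return thetas

if __name__ == "__main__":
    print(ritz_values([2.0, 2.0], [1.0]))
```

Bisection is slow compared with QR-type methods but needs only the two coefficient lists, and it makes the fact that the eigenvalues are real and simple easy to observe.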
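Theorem
Let $\chi_k(\lambda)$ be the determinant of $J_k - \lambda I$ (a polynomial of degree $k$ which is monic up to the sign $(-1)^k$). Then
\[ v^k = p_k(A) v^1, \qquad p_k(\lambda) = (-1)^{k-1} \frac{\chi_{k-1}(\lambda)}{\eta_1 \cdots \eta_{k-1}}. \]
The polynomials $p_k$, of degree $k-1$, are called the normalized Lanczos polynomials.

The polynomials $p_k$ satisfy a scalar three-term recurrence
\[ \eta_k \, p_{k+1}(\lambda) = (\lambda - \alpha_k) \, p_k(\lambda) - \eta_{k-1} \, p_{k-1}(\lambda), \quad k = 1, 2, \dots \]
with initial conditions $p_0 \equiv 0$, $p_1 \equiv 1$.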
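As a short worked check of the theorem, the first two steps of the recurrence can be compared with the determinant formula. Only the symbols already defined above are used.

```latex
% k = 1 in the recurrence (p_0 = 0, p_1 = 1):
\eta_1 p_2(\lambda) = (\lambda - \alpha_1)
\quad\Longrightarrow\quad
p_2(\lambda) = \frac{\lambda - \alpha_1}{\eta_1}
             = (-1)^{1} \frac{\chi_1(\lambda)}{\eta_1},
\qquad \chi_1(\lambda) = \alpha_1 - \lambda .

% k = 2 in the recurrence:
\eta_2 p_3(\lambda) = (\lambda - \alpha_2) p_2(\lambda) - \eta_1 p_1(\lambda)
\quad\Longrightarrow\quad
p_3(\lambda) = \frac{(\lambda - \alpha_1)(\lambda - \alpha_2) - \eta_1^2}{\eta_1 \eta_2}
             = (-1)^{2} \frac{\chi_2(\lambda)}{\eta_1 \eta_2},

% since
\chi_2(\lambda) = \det \begin{pmatrix} \alpha_1 - \lambda & \eta_1 \\ \eta_1 & \alpha_2 - \lambda \end{pmatrix}
                = (\alpha_1 - \lambda)(\alpha_2 - \lambda) - \eta_1^2 .
```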
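Theorem
Consider the Lanczos vectors $v^k$. There exists a measure $\alpha$ such that
\[ (v^k, v^l) = \langle p_k, p_l \rangle = \int_a^b p_k(\lambda) \, p_l(\lambda) \, d\alpha(\lambda), \]
where $a \le \lambda_1 = \lambda_{\min}$ and $b \ge \lambda_n = \lambda_{\max}$, $\lambda_{\min}$ and $\lambda_{\max}$ being the smallest and largest eigenvalues of $A$.

Proof. Let $A = Q \Lambda Q^T$ be the spectral decomposition of $A$. Since the vectors $v^j$ are orthonormal and $p_k(A) = Q \, p_k(\Lambda) \, Q^T$, we have
\[
\begin{aligned}
(v^k, v^l) &= (v^1)^T p_k(A)^T p_l(A) v^1 \\
&= (v^1)^T Q \, p_k(\Lambda) \, Q^T Q \, p_l(\Lambda) \, Q^T v^1 \\
&= (v^1)^T Q \, p_k(\Lambda) \, p_l(\Lambda) \, Q^T v^1 \\
&= \sum_{j=1}^{n} p_k(\lambda_j) \, p_l(\lambda_j) \, [\hat v_j]^2,
\end{aligned}
\]
where $\hat v = Q^T v^1$.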
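The last sum can be written as an integral for a measure $\alpha$ which is piecewise constant:
\[
\alpha(\lambda) =
\begin{cases}
0 & \text{if } \lambda < \lambda_1, \\
\sum_{j=1}^{i} [\hat v_j]^2 & \text{if } \lambda_i \le \lambda < \lambda_{i+1}, \\
\sum_{j=1}^{n} [\hat v_j]^2 & \text{if } \lambda_n \le \lambda.
\end{cases}
\]
The measure $\alpha$ has a finite number of points of increase at the (unknown) eigenvalues of $A$.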
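To make the discrete measure concrete, the following sketch works directly with nodes $\lambda_j$ and weights $[\hat v_j]^2$ (for a diagonal matrix, $Q = I$ and $\hat v = v^1$). It evaluates the step function $\alpha$, builds the values of $p_1, \dots, p_k$ at the nodes with the three-term recurrence, and lets one check their orthonormality. The function names are hypothetical, and the weights are assumed to sum to one.

```python
import math

def inner(w, f, g):
    """Discrete inner product <f, g> = sum_j w_j f(lambda_j) g(lambda_j)."""
    return sum(wj * fj * gj for wj, fj, gj in zip(w, f, g))

def alpha_measure(lam, w, x):
    """Piecewise constant measure: sum of the weights at nodes lambda_j <= x."""
    return sum(wj for lj, wj in zip(lam, w) if lj <= x)

def lanczos_polynomials(lam, w, k):
    """Values of p_1..p_k at the nodes lam, using
    eta_k p_{k+1} = (lambda - alpha_k) p_k - eta_{k-1} p_{k-1}."""
    n = len(lam)
    P = [[1.0] * n]                       # p_1 = 1
    prev, eta_prev = [0.0] * n, 0.0       # p_0 = 0
    for _ in range(k - 1):
        cur = P[-1]
        # alpha_k = <lambda p_k, p_k>
        alpha = inner(w, [l * c for l, c in zip(lam, cur)], cur)
        nxt = [(l - alpha) * c - eta_prev * p
               for l, c, p in zip(lam, cur, prev)]
        eta = math.sqrt(inner(w, nxt, nxt))   # eta_k normalizes p_{k+1}
        P.append([x / eta for x in nxt])
        prev, eta_prev = cur, eta
    return P

if __name__ == "__main__":
    lam = [1.0, 2.0, 3.0, 4.0, 5.0]
    w = [0.2] * 5
    P = lanczos_polynomials(lam, w, 4)
    print([[round(inner(w, P[i], P[j]), 12) for j in range(4)] for i in range(4)])
```

This is the same computation as running the Lanczos algorithm on the diagonal matrix of the nodes, seen from the side of the polynomials rather than the vectors.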
The Lanczos algorithm can be used to solve linear systems $Ax = c$ when $A$ is symmetric and $c$ is a given vector.

Let $x^0$ be a given starting vector and $r^0 = c - A x^0$ be the corresponding residual. Let $v = v^1 = r^0 / \|r^0\|$ and
\[ x^k = x^0 + V_k y^k. \]
We request the residual $r^k = c - A x^k$ to be orthogonal to the Krylov subspace of dimension $k$:
\[ V_k^T r^k = V_k^T c - V_k^T A x^0 - V_k^T A V_k y^k = V_k^T r^0 - J_k y^k = 0. \]
But $r^0 = \|r^0\| v^1$ and $V_k^T r^0 = \|r^0\| e^1$, so that
\[ J_k y^k = \|r^0\| e^1. \]
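The derivation above translates into a short procedure: run the Lanczos recurrence from $r^0$, solve the tridiagonal system $J_k y^k = \|r^0\| e^1$, and form $x^k = x^0 + V_k y^k$. The sketch below assumes a symmetric positive definite $A$ stored as a dense list of rows (so that Gaussian elimination without pivoting on $J_k$ is safe) and a nonzero initial residual; `lanczos_solve` is a hypothetical name.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    return [dot(row, x) for row in A]

def lanczos_solve(A, c, x0, k):
    """Approximate solution x^k = x^0 + V_k y^k with J_k y^k = ||r^0|| e^1."""
    r0 = [ci - yi for ci, yi in zip(c, matvec(A, x0))]
    nr0 = math.sqrt(dot(r0, r0))
    V = [[ri / nr0 for ri in r0]]              # v^1 = r^0 / ||r^0||
    alphas, etas = [], []
    for j in range(k):                         # Lanczos three-term recurrence
        w = matvec(A, V[j])
        alpha = dot(V[j], w)
        alphas.append(alpha)
        if j == k - 1:
            break
        w = [wi - alpha * vi for wi, vi in zip(w, V[j])]
        if j > 0:
            w = [wi - etas[j - 1] * vi for wi, vi in zip(w, V[j - 1])]
        eta = math.sqrt(dot(w, w))
        if eta == 0.0:                         # invariant subspace: stop early
            break
        etas.append(eta)
        V.append([wi / eta for wi in w])
    m = len(alphas)
    # forward elimination on the tridiagonal system J_m y = ||r^0|| e^1
    d = alphas[:]
    rhs = [nr0] + [0.0] * (m - 1)
    for i in range(1, m):
        mult = etas[i - 1] / d[i - 1]
        d[i] -= mult * etas[i - 1]
        rhs[i] -= mult * rhs[i - 1]
    # back substitution
    y = [0.0] * m
    y[m - 1] = rhs[m - 1] / d[m - 1]
    for i in range(m - 2, -1, -1):
        y[i] = (rhs[i] - etas[i] * y[i + 1]) / d[i]
    # x^k = x^0 + V_k y^k
    x = list(x0)
    for yj, vj in zip(y, V):
        x = [xi + yj * vi for xi, vi in zip(x, vj)]
    return x

if __name__ == "__main__":
    A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
    print(lanczos_solve(A, [0.0, 0.0, 4.0], [0.0, 0.0, 0.0], 3))
```

Note that this sketch stores all basis vectors, which the conjugate gradient formulation avoids.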
The Lanczos algorithm in finite precision arithmetic

It has been well known since Lanczos that the basis vectors $v^k$ may lose their orthogonality. Moreover, multiple copies of the already converged Ritz values appear again and again.

Consider an example devised by Z. Strakoš: a diagonal matrix with elements
\[ \lambda_i = \lambda_1 + \left( \frac{i-1}{n-1} \right) (\lambda_n - \lambda_1) \, \rho^{\,n-i}, \quad i = 1, \dots, n. \]
We choose $n = 30$, $\lambda_1 = 0.1$, $\lambda_n = 100$, $\rho = 0.9$.
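For reference, the diagonal of the Strakoš matrix can be generated directly from the formula. The function name `strakos_eigenvalues` is hypothetical; the default parameters are the ones quoted above.

```python
def strakos_eigenvalues(n=30, lam_1=0.1, lam_n=100.0, rho=0.9):
    """Diagonal entries lambda_1..lambda_n of the Strakos test matrix."""
    return [lam_1 + (i - 1) / (n - 1) * (lam_n - lam_1) * rho ** (n - i)
            for i in range(1, n + 1)]

if __name__ == "__main__":
    lam = strakos_eigenvalues()
    print(lam[0], lam[-1], len(lam))
```

The factor $\rho^{\,n-i}$ clusters the eigenvalues near $\lambda_1$ and leaves the largest ones well separated.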
[Figure: $\log_{10}(|\tilde V_{30}^T \tilde V_{30}|)$ for the Strakos30 matrix]
In this example the first Ritz value to converge is the largest one, $\lambda_n$. Then $v^k_n = p_k(\lambda_n) v^1_n$ must converge to zero (in exact arithmetic). What happens?

[Figure: Strakos30, $\log_{10}(|v^k_{30}|)$ with (dashed) and without (solid) reorthogonalization, and their difference (dotted)]
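The loss of orthogonality can be observed with a small experiment. The sketch below runs the plain Lanczos recurrence on the Strakos30 diagonal matrix and reports the largest off-diagonal entry of $|V_k^T V_k|$. The starting vector with equal components is an assumption (the slides do not state which one was used), and the function names are hypothetical.

```python
import math

def strakos(n=30, lam_1=0.1, lam_n=100.0, rho=0.9):
    """Diagonal of the Strakos matrix."""
    return [lam_1 + (i - 1) / (n - 1) * (lam_n - lam_1) * rho ** (n - i)
            for i in range(1, n + 1)]

def lanczos_vectors_diag(lam, k):
    """First k Lanczos vectors for A = diag(lam), without reorthogonalization.

    Assumption: the starting vector has all components equal.
    """
    n = len(lam)
    V = [[1.0 / math.sqrt(n)] * n]
    eta_prev = 0.0
    for j in range(k - 1):
        v = V[j]
        w = [l * x for l, x in zip(lam, v)]            # A v^k for diagonal A
        alpha = sum(a * b for a, b in zip(v, w))
        w = [wi - alpha * vi for wi, vi in zip(w, v)]
        if j > 0:
            w = [wi - eta_prev * ui for wi, ui in zip(w, V[j - 1])]
        eta_prev = math.sqrt(sum(x * x for x in w))
        V.append([x / eta_prev for x in w])
    return V

def orthogonality_loss(V):
    """Largest |(v^i, v^j)| over all pairs i != j."""
    return max(abs(sum(a * b for a, b in zip(V[i], V[j])))
               for i in range(len(V)) for j in range(i))

if __name__ == "__main__":
    lam = strakos()
    for k in (5, 10, 20, 30):
        print(k, orthogonality_loss(lanczos_vectors_diag(lam, k)))
```

In double precision one would expect the printed values to stay near machine precision for small $k$ and to grow as Ritz values converge, in line with the figure; the exact numbers depend on the starting vector and on the arithmetic.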
More iterations

[Figure: Strakos30, $\log_{10}(|v^k_{30}|)$ with (dashed) and without (solid) reorthogonalization]

Distances to the largest eigenvalue of $A$

[Figure: Strakos30, $\log_{10}(|v^k_{30}|)$ and the distances to the 10 largest Ritz values]
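This behavior can be studied by looking at perturbed scalar three-term recurrences.

Theorem
Let $j$ be given and $p_{j,k}$ be the polynomial determined by
\[ p_{j,j-1} \equiv 0, \qquad p_{j,j} \equiv 1, \]
\[ \eta_{k+1} \, p_{j,k+1}(\lambda) = (\lambda - \alpha_k) \, p_{j,k}(\lambda) - \eta_k \, p_{j,k-1}(\lambda), \quad k = j, \dots \]
Then the computed Lanczos vector is
\[ \tilde v^{k+1} = p_{1,k+1}(A) \, \tilde v^1 + \sum_{l=1}^{k} p_{l+1,k+1}(A) \, \frac{f^l}{\eta_{l+1}}. \]
Here $\tilde v^k$ denotes the computed vectors and $f^l$ the perturbation (local rounding error) term of step $l$ in the recurrence.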
Note that the first term $\check v^{k+1} = p_{1,k+1}(A) \, \tilde v^1$ is different from what we have in exact arithmetic, since the coefficients of the polynomial are the ones computed in finite precision.

Proposition
The associated polynomial $p_{j,k}$, $k \ge j$, is given by
\[ p_{j,k}(\lambda) = (-1)^{k-j} \frac{\chi_{j,k-1}(\lambda)}{\eta_{j+1} \cdots \eta_k}, \]
where $\chi_{j,k}(\lambda)$ is the determinant of $J_{j,k} - \lambda I$, $J_{j,k}$ being the tridiagonal matrix obtained from the coefficients of the second order recurrence from step $j$ to step $k$, that is, discarding the first $j-1$ rows and columns of $J_k$.
The nonsymmetric Lanczos algorithm

When the matrix $A$ is not symmetric we cannot generally construct a vector $v^{k+1}$ orthogonal to all the previous basis vectors by only using the two previous vectors $v^k$ and $v^{k-1}$.

Construct bi-orthogonal sequences using $A^T$: choose two starting vectors $v^1$ and $\tilde v^1$ with $(v^1, \tilde v^1) \neq 0$, normalized such that $(v^1, \tilde v^1) = 1$. We set $v^0 = \tilde v^0 = 0$. Then, for $k = 1, 2, \dots$,
\[
\begin{aligned}
z^k &= A v^k - \omega_k v^k - \eta_{k-1} v^{k-1}, \\
w^k &= A^T \tilde v^k - \omega_k \tilde v^k - \tilde\eta_{k-1} \tilde v^{k-1}, \\
\omega_k &= (\tilde v^k, A v^k), \qquad \eta_k \tilde\eta_k = (z^k, w^k), \\
v^{k+1} &= \frac{z^k}{\tilde\eta_k}, \qquad \tilde v^{k+1} = \frac{w^k}{\eta_k}.
\end{aligned}
\]
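Let
\[ V_k = [v^1 \cdots v^k], \qquad \tilde V_k = [\tilde v^1 \cdots \tilde v^k] \]
and
\[
J_k = \begin{pmatrix}
\omega_1 & \eta_1 & & & \\
\tilde\eta_1 & \omega_2 & \eta_2 & & \\
 & \ddots & \ddots & \ddots & \\
 & & \tilde\eta_{k-2} & \omega_{k-1} & \eta_{k-1} \\
 & & & \tilde\eta_{k-1} & \omega_k
\end{pmatrix}.
\]
Then, in matrix form,
\[
\begin{aligned}
A V_k &= V_k J_k + \tilde\eta_k \, v^{k+1} (e^k)^T, \\
A^T \tilde V_k &= \tilde V_k J_k^T + \eta_k \, \tilde v^{k+1} (e^k)^T.
\end{aligned}
\]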
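Theorem
If the nonsymmetric Lanczos algorithm does not break down with $\eta_k \tilde\eta_k$ being zero, the algorithm yields biorthogonal vectors such that
\[ (\tilde v^i, v^j) = 0, \quad i \neq j, \quad i, j = 1, 2, \dots \]
The vectors $v^1, \dots, v^k$ span $K_k(A, v^1)$ and $\tilde v^1, \dots, \tilde v^k$ span $K_k(A^T, \tilde v^1)$. The two sequences of vectors can be written as
\[ v^k = p_k(A) v^1, \qquad \tilde v^k = \tilde p_k(A^T) \tilde v^1, \]
where $p_k$ and $\tilde p_k$ are polynomials of degree $k-1$ satisfying
\[
\begin{aligned}
\tilde\eta_k \, p_{k+1} &= (\lambda - \omega_k) \, p_k - \eta_{k-1} \, p_{k-1}, \\
\eta_k \, \tilde p_{k+1} &= (\lambda - \omega_k) \, \tilde p_k - \tilde\eta_{k-1} \, \tilde p_{k-1}.
\end{aligned}
\]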
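The two coupled recurrences can be sketched as follows. The scaling $\eta_k = \sqrt{|(z^k, w^k)|}$, $\tilde\eta_k = \mathrm{sgn}[(z^k, w^k)] \, \eta_k$ is one possible choice, and the sketch simply raises an error on breakdown instead of using look-ahead. The matrix is a dense list of rows, and `nonsym_lanczos` is a hypothetical name.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    return [dot(row, x) for row in A]

def nonsym_lanczos(A, v, vt, k):
    """k vectors of each biorthogonal sequence for a square matrix A.

    Returns V, Vt and the coefficients omega, eta (upper diagonal of J_k)
    and eta_t (lower diagonal of J_k).
    """
    At = [list(col) for col in zip(*A)]        # transpose of A
    s = dot(v, vt)                             # must be nonzero
    V = [list(v)]
    Vt = [[x / s for x in vt]]                 # scale so that (v^1, v~^1) = 1
    omegas, etas, etas_t = [], [], []
    for j in range(k):
        Av = matvec(A, V[j])
        Atv = matvec(At, Vt[j])
        omega = dot(Vt[j], Av)                 # omega_k = (v~^k, A v^k)
        omegas.append(omega)
        if j == k - 1:
            break
        z = [a - omega * x for a, x in zip(Av, V[j])]
        w = [a - omega * x for a, x in zip(Atv, Vt[j])]
        if j > 0:
            z = [zi - etas[j - 1] * x for zi, x in zip(z, V[j - 1])]
            w = [wi - etas_t[j - 1] * x for wi, x in zip(w, Vt[j - 1])]
        zw = dot(z, w)
        if zw == 0.0:
            raise ArithmeticError("breakdown: (z^k, w^k) = 0")
        eta = math.sqrt(abs(zw))               # eta_k >= 0
        eta_t = math.copysign(eta, zw)         # eta~_k = sgn[(z^k, w^k)] eta_k
        etas.append(eta)
        etas_t.append(eta_t)
        V.append([zi / eta_t for zi in z])     # v^{k+1} = z^k / eta~_k
        Vt.append([wi / eta for wi in w])      # v~^{k+1} = w^k / eta_k
    return V, Vt, omegas, etas, etas_t

if __name__ == "__main__":
    A = [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [1.0, 0.0, 4.0]]
    V, Vt, omegas, etas, etas_t = nonsym_lanczos(A, [1.0, 1.0, 1.0], [1.0, 1.0, 1.0], 3)
    print([[round(dot(Vt[i], V[j]), 12) for j in range(3)] for i in range(3)])
```

With the same matrix and the first unit vector for both starting vectors, the first step already gives $(z^1, w^1) = 0$ with $z^1 \neq 0$ and $w^1 \neq 0$, so a breakdown is not an exotic event.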
The algorithm breaks down if at some step we have $(z^k, w^k) = 0$. Either

a) $z^k = 0$ and/or $w^k = 0$. If $z^k = 0$ we can compute the eigenvalues or the solution of the linear system $Ax = c$. If $z^k \neq 0$ and $w^k = 0$, the only way to deal with this situation is to restart the algorithm.

b) The more dramatic situation ("serious breakdown") is when $(z^k, w^k) = 0$ with $z^k \neq 0$ and $w^k \neq 0$. One then needs to use look-ahead strategies or to restart.
For our purposes we will use the nonsymmetric Lanczos algorithm with a symmetric matrix!

We can choose
\[ \eta_k = \pm \tilde\eta_k = \pm \sqrt{|(z^k, w^k)|}, \]
with, for instance, $\eta_k \ge 0$ and $\tilde\eta_k = \mathrm{sgn}[(z^k, w^k)] \, \eta_k$. Then
\[ \tilde p_k = \pm p_k. \]
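The Golub Kahan bidiagonalization algorithm

Useful when the matrix is $A^T A$, e.g. when solving $A^T A x = c$.

The first algorithm (LB1) reduces $A$ to upper bidiagonal form. Let $q^0 = c / \|c\|$, $r^0 = A q^0$, $\delta_1 = \|r^0\|$, $p^0 = r^0 / \delta_1$; then, for $k = 1, 2, \dots$,
\[
\begin{aligned}
u^k &= A^T p^{k-1} - \delta_k q^{k-1}, \\
\gamma_k &= \|u^k\|, \\
q^k &= u^k / \gamma_k, \\
r^k &= A q^k - \gamma_k p^{k-1}, \\
\delta_{k+1} &= \|r^k\|, \\
p^k &= r^k / \delta_{k+1}.
\end{aligned}
\]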
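If
\[ P_k = \left( p^0 \cdots p^{k-1} \right), \qquad Q_k = \left( q^0 \cdots q^{k-1} \right) \]
and
\[
B_k = \begin{pmatrix}
\delta_1 & \gamma_1 & & \\
 & \ddots & \ddots & \\
 & & \delta_{k-1} & \gamma_{k-1} \\
 & & & \delta_k
\end{pmatrix},
\]
then $P_k$ and $Q_k$, which have orthonormal columns, satisfy the equations
\[
\begin{aligned}
A Q_k &= P_k B_k, \\
A^T P_k &= Q_k B_k^T + \gamma_k \, q^k (e^k)^T,
\end{aligned}
\]
and
\[ A^T A Q_k = Q_k B_k^T B_k + \gamma_k \delta_k \, q^k (e^k)^T. \]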
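The LB1 recurrence and the relation $A Q_k = P_k B_k$ can be checked with a short sketch. It assumes a dense rectangular matrix stored as a list of rows and no breakdown (all $\gamma_k$ and $\delta_k$ nonzero); `golub_kahan_lb1` is a hypothetical name.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    return [dot(row, x) for row in A]

def norm(x):
    return math.sqrt(dot(x, x))

def golub_kahan_lb1(A, c, k):
    """k steps of the LB1 bidiagonalization.

    Returns P = [p^0..p^{k-1}], Q = [q^0..q^{k-1}], the diagonal
    delta_1..delta_k and the superdiagonal gamma_1..gamma_{k-1} of B_k.
    """
    At = [list(col) for col in zip(*A)]
    nc = norm(c)
    q = [x / nc for x in c]                    # q^0 = c / ||c||
    r = matvec(A, q)                           # r^0 = A q^0
    delta = norm(r)                            # delta_1
    p = [x / delta for x in r]                 # p^0
    P, Q, deltas, gammas = [p], [q], [delta], []
    for _ in range(1, k):
        # u^k = A^T p^{k-1} - delta_k q^{k-1}
        u = [a - delta * x for a, x in zip(matvec(At, p), q)]
        gamma = norm(u)
        q = [x / gamma for x in u]
        # r^k = A q^k - gamma_k p^{k-1}
        r = [a - gamma * x for a, x in zip(matvec(A, q), p)]
        delta = norm(r)
        p = [x / delta for x in r]
        P.append(p)
        Q.append(q)
        deltas.append(delta)
        gammas.append(gamma)
    return P, Q, deltas, gammas

if __name__ == "__main__":
    A = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [4.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
    P, Q, deltas, gammas = golub_kahan_lb1(A, [1.0, 0.0, 0.0], 3)
    print(deltas, gammas)
```

Only products with $A$ and $A^T$ are needed, so $A^T A$ is never formed explicitly.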
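The second algorithm (LB2) reduces $A$ to lower bidiagonal form. Let $p^0 = c / \|c\|$, $u^0 = A^T p^0$, $\gamma_1 = \|u^0\|$, $q^0 = u^0 / \gamma_1$, $r^1 = A q^0 - \gamma_1 p^0$, $\delta_1 = \|r^1\|$, $p^1 = r^1 / \delta_1$; then, for $k = 2, 3, \dots$,
\[
\begin{aligned}
u^{k-1} &= A^T p^{k-1} - \delta_{k-1} q^{k-2}, \\
\gamma_k &= \|u^{k-1}\|, \\
q^{k-1} &= u^{k-1} / \gamma_k, \\
r^k &= A q^{k-1} - \gamma_k p^{k-1}, \\
\delta_k &= \|r^k\|, \\
p^k &= r^k / \delta_k.
\end{aligned}
\]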
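If
\[ P_{k+1} = \left( p^0 \cdots p^k \right), \qquad Q_k = \left( q^0 \cdots q^{k-1} \right) \]
and
\[
C_k = \begin{pmatrix}
\gamma_1 & & & \\
\delta_1 & \gamma_2 & & \\
 & \ddots & \ddots & \\
 & & \delta_{k-1} & \gamma_k \\
 & & & \delta_k
\end{pmatrix},
\]
a $(k+1) \times k$ matrix, then $P_{k+1}$ and $Q_k$, which have orthonormal columns, satisfy the equations
\[
\begin{aligned}
A Q_k &= P_{k+1} C_k, \\
A^T P_{k+1} &= Q_k C_k^T + \gamma_{k+1} \, q^k (e^{k+1})^T.
\end{aligned}
\]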
Of course, by eliminating $P_{k+1}$ in these equations we obtain
\[ A^T A Q_k = Q_k C_k^T C_k + \gamma_{k+1} \delta_k \, q^k (e^k)^T \]
and
\[ C_k^T C_k = B_k^T B_k = J_k. \]
$B_k$ is the (upper triangular) Cholesky factor of $J_k$ and of $C_k^T C_k$.
The block Lanczos algorithm

See Golub and Underwood. We consider only $2 \times 2$ blocks.

Let $X_0$ be an $n \times 2$ given matrix, such that $X_0^T X_0 = I_2$. Let $X_{-1} = 0$ be an $n \times 2$ matrix. Then, for $k = 1, 2, \dots$,
\[
\begin{aligned}
\Omega_k &= X_{k-1}^T A X_{k-1}, \\
R_k &= A X_{k-1} - X_{k-1} \Omega_k - X_{k-2} \Gamma_{k-1}^T, \\
X_k \Gamma_k &= R_k.
\end{aligned}
\]
The last step is the QR decomposition of $R_k$ such that $X_k$ is $n \times 2$ with $X_k^T X_k = I_2$.

We obtain a block tridiagonal matrix.
The matrix $R_k$ may be rank deficient, and in that case $\Gamma_k$ is singular. One of the columns of $X_k$ can then be chosen arbitrarily. To complete the algorithm, we choose this column to be orthogonal to the previous block vectors $X_j$.

The block Lanczos algorithm generates a sequence of matrices such that
\[ X_j^T X_i = \delta_{ij} I_2. \]
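A minimal sketch of the block recurrence with $2 \times 2$ blocks follows. Each block $X_k$ is stored as a list of its two columns, the QR step is done by Gram-Schmidt on the two columns of $R_k$, and the rank deficient case discussed above is deliberately not handled. The name `block_lanczos` is hypothetical.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    return [dot(row, x) for row in A]

def block_lanczos(A, X0, steps):
    """Blocks X_0, ..., X_steps, each stored as a list of two columns.

    Assumes A symmetric, X0 with orthonormal columns and R_k of full rank.
    """
    n = len(A)
    X_prev = [[0.0] * n, [0.0] * n]            # X_{-1} = 0
    G_prev = [[0.0, 0.0], [0.0, 0.0]]          # placeholder, multiplied by X_{-1}
    X = [list(col) for col in X0]
    blocks = [X]
    for _ in range(steps):
        AX = [matvec(A, col) for col in X]
        # Omega_k = X_{k-1}^T A X_{k-1}
        Om = [[dot(X[i], AX[j]) for j in range(2)] for i in range(2)]
        # R_k = A X_{k-1} - X_{k-1} Omega_k - X_{k-2} Gamma_{k-1}^T
        R = []
        for j in range(2):
            col = AX[j][:]
            for i in range(2):
                col = [c - Om[i][j] * x - G_prev[j][i] * xp
                       for c, x, xp in zip(col, X[i], X_prev[i])]
            R.append(col)
        # QR of R_k by Gram-Schmidt: R_k = X_k Gamma_k, Gamma_k upper triangular
        g11 = math.sqrt(dot(R[0], R[0]))
        q1 = [x / g11 for x in R[0]]
        g12 = dot(q1, R[1])
        t = [x - g12 * y for x, y in zip(R[1], q1)]
        g22 = math.sqrt(dot(t, t))
        q2 = [x / g22 for x in t]
        X_prev, G_prev, X = X, [[g11, g12], [0.0, g22]], [q1, q2]
        blocks.append(X)
    return blocks

if __name__ == "__main__":
    n = 6
    A = [[float(i + 1) if i == j else 0.0 for j in range(n)] for i in range(n)]
    s = 1.0 / math.sqrt(n)
    X0 = [[s] * n, [s if i % 2 == 0 else -s for i in range(n)]]
    cols = [c for X in block_lanczos(A, X0, 2) for c in X]
    print(max(abs(dot(cols[i], cols[j])) for i in range(6) for j in range(i)))
```

In the usage example the three blocks together form six mutually orthonormal columns, so the printed value should be at rounding level.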
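Proposition
\[ X_i = \sum_{k=0}^{i} A^k X_0 \, C_k^{(i)}, \]
where $C_k^{(i)}$ are $2 \times 2$ matrices.

Theorem
The matrix valued polynomials $p_k$ satisfy
\[ p_k(\lambda) \Gamma_k = \lambda \, p_{k-1}(\lambda) - p_{k-1}(\lambda) \Omega_k - p_{k-2}(\lambda) \Gamma_{k-1}^T, \]
\[ p_{-1}(\lambda) \equiv 0, \qquad p_0(\lambda) \equiv I_2, \]
where $\lambda$ is a scalar and
\[ p_k(\lambda) = \sum_{j=0}^{k} \lambda^j X_0 \, C_j^{(k)}. \]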
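\[ \lambda \, [p_0(\lambda), \dots, p_{N-1}(\lambda)] = [p_0(\lambda), \dots, p_{N-1}(\lambda)] \, J_N + [0, \dots, 0, \; p_N(\lambda) \Gamma_N] \]
and, as $P(\lambda) = [p_0(\lambda), \dots, p_{N-1}(\lambda)]^T$,
\[ J_N P(\lambda) = \lambda P(\lambda) - [0, \dots, 0, \; p_N(\lambda) \Gamma_N]^T, \]
where $J_N$ is block tridiagonal.

Theorem
Considering the matrices $X_k$, there exists a matrix measure $\alpha$ such that
\[ X_i^T X_j = \int_a^b p_i(\lambda)^T \, d\alpha(\lambda) \, p_j(\lambda) = \delta_{ij} I_2, \]
where $a \le \lambda_1 = \lambda_{\min}$ and $b \ge \lambda_n = \lambda_{\max}$.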
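Proof.
\[
\begin{aligned}
\delta_{ij} I_2 = X_i^T X_j
&= \left( \sum_{k=0}^{i} (C_k^{(i)})^T X_0^T A^k \right) \left( \sum_{l=0}^{j} A^l X_0 \, C_l^{(j)} \right) \\
&= \sum_{k,l} (C_k^{(i)})^T X_0^T Q \Lambda^{k+l} Q^T X_0 \, C_l^{(j)} \\
&= \sum_{k,l} (C_k^{(i)})^T \hat X \Lambda^{k+l} \hat X^T C_l^{(j)} \\
&= \sum_{k,l} (C_k^{(i)})^T \left( \sum_{m=1}^{n} \lambda_m^{k+l} \hat X_m \hat X_m^T \right) C_l^{(j)} \\
&= \sum_{m=1}^{n} \left( \sum_{k} \lambda_m^k (C_k^{(i)})^T \right) \hat X_m \hat X_m^T \left( \sum_{l} \lambda_m^l C_l^{(j)} \right),
\end{aligned}
\]
where $\hat X_m$ are the columns of $\hat X = X_0^T Q$, which is a $2 \times n$ matrix.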
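Hence
\[ X_i^T X_j = \sum_{m=1}^{n} p_i(\lambda_m)^T \hat X_m \hat X_m^T \, p_j(\lambda_m). \]
The sum in the right hand side can be written as an integral for a $2 \times 2$ matrix measure
\[
\alpha(\lambda) =
\begin{cases}
0 & \text{if } \lambda < \lambda_1, \\
\sum_{j=1}^{i} \hat X_j \hat X_j^T & \text{if } \lambda_i \le \lambda < \lambda_{i+1}, \\
\sum_{j=1}^{n} \hat X_j \hat X_j^T & \text{if } \lambda_n \le \lambda.
\end{cases}
\]
Then
\[ X_i^T X_j = \int_a^b p_i(\lambda)^T \, d\alpha(\lambda) \, p_j(\lambda). \]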
The conjugate gradient algorithm

The conjugate gradient (CG) algorithm is an iterative method to solve linear systems $Ax = c$ where the matrix $A$ is symmetric positive definite (Hestenes and Stiefel 1952). It can be obtained from the Lanczos algorithm by using the LU factorization of $J_k$.

Starting from a given $x^0$ and $r^0 = c - A x^0$: for $k = 0, 1, \dots$ until convergence do
\[
\begin{aligned}
\beta_k &= \frac{(r^k, r^k)}{(r^{k-1}, r^{k-1})}, \quad \beta_0 = 0, \\
p^k &= r^k + \beta_k p^{k-1}, \\
\gamma_k &= \frac{(r^k, r^k)}{(A p^k, p^k)}, \\
x^{k+1} &= x^k + \gamma_k p^k, \\
r^{k+1} &= r^k - \gamma_k A p^k.
\end{aligned}
\]
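The CG iteration above can be written almost line by line in Python. The sketch below assumes a symmetric positive definite matrix stored as a dense list of rows and stops when the residual norm falls below an absolute tolerance; the stopping rule, the default iteration cap and the name `conjugate_gradient` are assumptions made for the illustration.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    return [dot(row, x) for row in A]

def conjugate_gradient(A, c, x0, tol=1e-12, max_iter=None):
    """Returns the approximate solution and the number of iterations performed."""
    if max_iter is None:
        max_iter = 10 * len(c)
    x = list(x0)
    r = [ci - yi for ci, yi in zip(c, matvec(A, x))]     # r^0 = c - A x^0
    p = [0.0] * len(c)
    rr_old = 1.0
    for k in range(max_iter):
        rr = dot(r, r)
        if math.sqrt(rr) <= tol:                         # converged
            return x, k
        beta = rr / rr_old if k > 0 else 0.0             # beta_0 = 0
        p = [ri + beta * pi for ri, pi in zip(r, p)]     # p^k = r^k + beta_k p^{k-1}
        Ap = matvec(A, p)
        gamma = rr / dot(Ap, p)                          # gamma_k
        x = [xi + gamma * pi for xi, pi in zip(x, p)]    # x^{k+1}
        r = [ri - gamma * ai for ri, ai in zip(r, Ap)]   # r^{k+1}
        rr_old = rr
    return x, max_iter

if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    x, its = conjugate_gradient(A, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
    print(x, its)
```

Only one matrix-vector product per iteration is needed, and no basis vectors have to be stored, in contrast with the Lanczos formulation of the solver.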
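In exact arithmetic the residuals $r^k$ are orthogonal and
\[ v^{k+1} = (-1)^k \frac{r^k}{\|r^k\|}. \]
Moreover
\[ \alpha_k = \frac{1}{\gamma_{k-1}} + \frac{\beta_{k-1}}{\gamma_{k-2}}, \quad \beta_0 = 0, \quad \gamma_{-1} = 1, \]
\[ \eta_k = \frac{\sqrt{\beta_k}}{\gamma_{k-1}}. \]
The iterates are given by
\[ x^{k+1} = x^0 + s_k(A) r^0, \]
where $s_k$ is a polynomial of degree $k$.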
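The relations between the CG coefficients and the Lanczos coefficients can be checked numerically. The sketch below records $\gamma_k$ and $\beta_k$ during a few CG steps and assembles $\alpha_k$ and $\eta_k$ from them; the function names are hypothetical and the matrix is assumed symmetric positive definite.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    return [dot(row, x) for row in A]

def cg_coefficients(A, c, x0, steps):
    """Lists gammas[k] = gamma_k and betas[k] = beta_k for k = 0..steps-1."""
    r = [ci - yi for ci, yi in zip(c, matvec(A, x0))]
    p = [0.0] * len(c)
    rr_old = 1.0
    gammas, betas = [], []
    for k in range(steps):
        rr = dot(r, r)
        beta = rr / rr_old if k > 0 else 0.0
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        Ap = matvec(A, p)
        gamma = rr / dot(Ap, p)
        r = [ri - gamma * ai for ri, ai in zip(r, Ap)]
        rr_old = rr
        gammas.append(gamma)
        betas.append(beta)
    return gammas, betas

def jacobi_from_cg(gammas, betas):
    """alpha_1..alpha_m and eta_1..eta_{m-1} of J_m from the CG coefficients."""
    m = len(gammas)
    alphas = []
    for k in range(1, m + 1):
        a = 1.0 / gammas[k - 1]                 # 1 / gamma_{k-1}
        if k >= 2:
            a += betas[k - 1] / gammas[k - 2]   # + beta_{k-1} / gamma_{k-2}
        alphas.append(a)
    etas = [math.sqrt(betas[k]) / gammas[k - 1] for k in range(1, m)]
    return alphas, etas

if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    gammas, betas = cg_coefficients(A, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0], 2)
    print(jacobi_from_cg(gammas, betas))
```

For instance, $\alpha_1 = 1/\gamma_0$ is the Rayleigh quotient of $A$ at $r^0$, which is exactly $(A v^1, v^1)$ for $v^1 = r^0 / \|r^0\|$.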
Let $\|\epsilon^k\|_A = (A \epsilon^k, \epsilon^k)^{1/2}$ be the $A$-norm of the error $\epsilon^k = x - x^k$.

Theorem
Consider all the iterative methods that can be written as
\[ x^{k+1} = x^0 + q_k(A) r^0, \qquad x^0 \text{ given}, \quad r^0 = c - A x^0, \]
where $q_k$ is a polynomial of degree $k$. Of all these methods, CG is the one which minimizes $\|\epsilon^k\|_A$ at each iteration.
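As a consequence:

Theorem
\[ \|\epsilon^{k+1}\|_A^2 \le \max_{1 \le i \le n} \left( t_{k+1}(\lambda_i) \right)^2 \|\epsilon^0\|_A^2 \]
for all polynomials $t_{k+1}$ of degree $k+1$ such that $t_{k+1}(0) = 1$.

Theorem
\[ \|\epsilon^k\|_A \le 2 \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^k \|\epsilon^0\|_A, \]
where $\kappa = \lambda_n / \lambda_1$ is the condition number of $A$.

This bound is usually overly pessimistic.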
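The condition number bound is easy to evaluate. The sketch below computes the factor multiplying $\|\epsilon^0\|_A$ and the smallest $k$ for which that factor drops below a given tolerance; the function names are hypothetical and the tolerance is assumed to be positive.

```python
import math

def cg_error_bound(kappa, k):
    """Factor 2 ((sqrt(kappa) - 1) / (sqrt(kappa) + 1))^k of the bound."""
    s = math.sqrt(kappa)
    return 2.0 * ((s - 1.0) / (s + 1.0)) ** k

def iterations_for(kappa, tol):
    """Smallest k for which the bound factor is at most tol (tol > 0)."""
    k = 0
    while cg_error_bound(kappa, k) > tol:
        k += 1
    return k

if __name__ == "__main__":
    # the Strakos30 example has kappa = 100 / 0.1 = 1000
    print(iterations_for(1000.0, 1e-10))
```

Since the bound depends only on $\kappa$ and not on the distribution of the eigenvalues, it cannot reflect the faster convergence that CG often shows in practice.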
CG convergence in variable precision

[Figure: CG, Strakos30, $n = 30$: $\log_{10}(\|r^k\|)$ versus the iteration number; curves labelled stand, reorth, 8, 32, 64 and 128]
W.E. Arnoldi, The principle of minimized iterations in the solution of the matrix eigenvalue problem, Quarterly of Appl. Math., v 9, (1951), pp 17-29
G.H. Golub and C. Van Loan, Matrix Computations, Third Edition, Johns Hopkins University Press, (1996)
G.H. Golub and R. Underwood, The block Lanczos method for computing eigenvalues, in Mathematical Software III, J. Rice Ed., (1977), pp 361-377
M.R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Nat. Bur. Stand., v 49 n 6, (1952), pp 409-436
C. Lanczos, An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, J. Res. Nat. Bur. Standards, v 45, (1950), pp 255-282
C. Lanczos, Solution of systems of linear equations by minimized iterations, J. Res. Nat. Bur. Standards, v 49, (1952), pp 33-53
G. Meurant, Computer solution of large linear systems, North Holland, (1999)
G. Meurant, The Lanczos and Conjugate Gradient algorithms, from theory to finite precision computations, SIAM, (2006)
G. Meurant and Z. Strakoš, The Lanczos and conjugate gradient algorithms in finite precision arithmetic, Acta Numerica, (2006)
C.C. Paige, The computation of eigenvalues and eigenvectors of very large sparse matrices, Ph.D. thesis, University of London, (1971)
Z. Strakoš, On the real convergence rate of the conjugate gradient method, Linear Alg. Appl., v 154-156, (1991), pp 535-549