Efficient Estimation of the A-norm of the Error in the Preconditioned Conjugate Gradient Method

Efficient Estimation of the A-norm of the Error in the Preconditioned Conjugate Gradient Method Zdeněk Strakoš and Petr Tichý Institute of Computer Science AS CR, Technical University of Berlin. International Conference On Preconditioning, May 19-21, 2005. Emory University, Atlanta, Georgia, U.S.A.

A system of linear algebraic equations Consider a system of linear algebraic equations Ax = b where A R n n is symmetric positive definite, b R n. Discretization of elliptic partial differential equations: Energy norm. The conjugate gradient method minimizes at the jth step the energy norm of the error on the given j-dimensional Krylov subspace. Z. Strakoš and P. Tichý 2

The conjugate gradient method (CG) Given x 0 R n, r 0 = b Ax 0. CG computes a sequence of iterates x j, so that where x x j A = x j x 0 + K j (A, r 0 ) min x u A, u x 0 +K j (A,r 0 ) K j (A, r 0 ) span {r 0,Ar 0,,A j 1 r 0 }, x x j A ( (x x j ),A(x x j ) ) 1 2. Z. Strakoš and P. Tichý 3

CG Hestenes and Stiefel (1952) given x 0, r 0 = b Ax 0, p 0 = r 0, for j = 0, 1, 2,... γ j = (r j, r j ) (p j,ap j ) x j+1 = x j + γ j p j r j+1 = r j γ j Ap j δ j+1 = (r j+1, r j+1 ) (r j, r j ) p j+1 = r j+1 + δ j+1 p j Z. Strakoš and P. Tichý 4

Preconditioned Conjugate Gradients (PCG) The CG-iterates are thought of beeing applied to Âˆx = ˆb. We consider symmetric preconditioning Change of variables Â = L 1 AL T, ˆb = L 1 b. M LL T, γ j ˆγ j, δ j ˆδ j, x j L T ˆx j, r j L ˆr j, s j M 1 r j, p j L T ˆp j. The preconditioner M is chosen so that a linear system with the matrix M is easy to solve while the matrix L 1 AL T should ensure fast convergence of CG. Z. Strakoš and P. Tichý 5

Algorithm of PCG given x 0, r 0 = b Ax 0, s 0 = M 1 r 0, p 0 = s 0, for j = 0, 1, 2,... γ j = (r j, s j ) (p j,ap j ) x j+1 = x j + γ j p j r j+1 = r j γ j Ap j s j+1 = M 1 r j+1 δ j+1 = (r j+1, s j+1 ) (r j, s j ) p j+1 = s j+1 + δ j+1 p j Z. Strakoš and P. Tichý 6

How to measure quality of approximation?... it depends on a problem. using residual information, normwise backward error, relative residual norm. using error estimates, estimate of the A-norm of the error, estimate of the Euclidean norm of the error. If the system is well-conditioned - it does not matter. Z. Strakoš and P. Tichý 7

A message from history Using of the residual vector r j as a measure of the goodness of the estimate x j is not reliable [HeSt-52, p. 410]. The function (x x j,a(x x j )) can be used as a measure of the goodness of x j as an estimate of x [HeSt-52, p. 413]. Z. Strakoš and P. Tichý 8

Various convergence characteristics 10 4 Example using [GuSt-00], n = 48. 10 2 10 0 10 2 10 4 10 6 10 8 10 10 10 12 10 14 normwise relative backward error relative euclidean norm of the error relative residual norm relative A norm of the error 0 5 10 15 20 25 30 35 40 45 50 Z. Strakoš and P. Tichý 9

Outline 1. CG and Gauss Quadrature 2. Construction of estimates in CG and PCG 3. Estimates in finite precision arithmetic 4. Rounding error analysis 5. Numerical experiments 6. Conclusions Z. Strakoš and P. Tichý 10

1. CG and Gauss Quadrature Z. Strakoš and P. Tichý 11

CG and Gauss Quadrature At any iteration step j, CG (implicitly) determines weights and nodes of the j-point Gauss quadrature ξ ζ f(λ) dω(λ) = j i=1 ω (j) i f(θ (j) i ) + R j (f). For f(λ) λ 1 the formula takes the form x x 0 2 A r 0 2 = j-th Gauss quadrature + x x j 2 A r 0 2. This formula was a base for CG error estimation in [DaGoNa-78, GoFi-93, GoMe-94, GoSt-94, GoMe-97,... ]. Z. Strakoš and P. Tichý 12

Equivalent formulas Continued fractions [GoMe-94, GoSt-94, GoMe-97] r 0 2 C n = r 0 2 C j + x x j 2 A, C n, C j... continued fractions corresponding to ω(λ) and ω (j) (λ). Warnick [Wa-00] r T 0 (x x 0 ) = r T 0 (x j x 0 ) + x x j 2 A. Hestenes and Stiefel [HeSt-52, De-93, StTi-02] x x 0 2 A = j 1 i=0 γ i r i 2 + x x j 2 A. The last formula is derived purely algebraically! Z. Strakoš and P. Tichý 13

Hestenes and Stiefel formula (derivation) Using local orthogonality between r i+1 and p i, Then x x i 2 A x x i+1 2 A = γ i r i 2. x x 0 2 A x x j 2 A = = j 1 i=0 j 1 i=0 ( x xi 2 A x x i+1 2 A) γ i r i 2. The approach to derivation of this formula is very important for its understanding in finite precision arithmetic. Z. Strakoš and P. Tichý 14

Local A-orthogonality Standard derivation of this formula uses global A-orthogonality among direction vectors [AxKa-01, p. 274], [Ar-04, p. 8]. A local A-orthogonality and Pythagorean theorem should be used instead: Z. Strakoš and P. Tichý 15

2. Construction of estimates in CG and PCG Z. Strakoš and P. Tichý 16

Construction of estimate in CG Idea: Consider, for example, x x j 2 A = r 0 2 [C n C j ]. Run d extra steps. Subtracting identity for x x j+d 2 A gives x x j 2 A = r 0 2 [C j+d C j ] }{{} EST 2 + x x j+d 2 A. When x x j 2 A x x j+d 2 A, we have a tight (lower) bound [GoSt-94, GoMe-97]. Z. Strakoš and P. Tichý 17

Mathematically equivalent estimates Continued fractions [GoSt-94, GoMe-97] η j,d = r 0 2 [C j+d C j ], Warnick [Wa-00] µ j,d = r T 0 (x j+d x j ), Hestenes and Stiefel [HeSt-52] ν j,d = j+d 1 i=j γ i r i 2. Z. Strakoš and P. Tichý 18

Construction of estimate in PCG The A-norm of the error can be estimated similarly as in ordinary CG. Extension of the Gauss Quadrature formulas based on continued fractions was published in [Me-99]. Extension of the HS estimate: use the HS formula for Âˆx = ˆb and substitution Â = L 1 AL T, ˆx j = L T x j, ˆγ i = γ i, ˆr i = L 1 r i [De-93, AxKa-01, StTi-04, Ar-04] ˆx ˆx j 2 Â }{{} x x j 2 A = j+d 1 i=j ˆγ i ˆr i 2 Â }{{} γ i (r i, s i ) 2 + ˆx ˆx j+d 2. Â }{{} x x j+d 2 A In many problems it is convenient to use a stopping criterion that relates the relative A-norm of the error to a discretization error, see [Ar-04]. Z. Strakoš and P. Tichý 19

Estimating the relative A-norm of the error To estimate the relative A-norm of the error we use the identities x x j 2 A = ν j,d + x x j+d 2 A, x 2 A = ν 0,j+d + 2b T x 0 x 0 2 A }{{} ξ j+d 2 + x x j+d A. Define j,d ν j,d ξ j+d. If x A x x 0 A then j,d > 0 and j,d = x x j 2 A x x j+d 2 A x 2 A x x j+d 2 A x x j 2 A x 2 A. Z. Strakoš and P. Tichý 20

3. Estimates in finite precision arithmetic Z. Strakoš and P. Tichý 21

CG behavior in finite precision arithmetic orthogonality is lost, convergence is delayed! 10 0 (E) x x j A (FP) x x j A 10 2 (E) I V T j V j F (FP) I V T j V j F 10 4 10 6 10 8 10 10 10 12 10 14 10 16 0 20 40 60 80 100 120 Z. Strakoš and P. Tichý 22

Estimates need not work The identity x x j 2 A = EST 2 + x x j+d 2 A need not hold during the finite precision CG computations. An example: µ j,d = r T 0 (x j+d x j ) does not work! 10 0 x x j A µ j,d 1/2 I V T j V j F 10 2 10 4 10 6 10 8 10 10 10 12 10 14 10 16 0 20 40 60 80 100 120 Z. Strakoš and P. Tichý 23

4. Rounding error analysis Z. Strakoš and P. Tichý 24

Rounding error analysis Without a proper rounding error analysis, there is no justification that the proposed estimates will work in finite precision arithmetic. Do the estimates give good information in practical computations? estimate CG PCG η j,d (continued fractions) yes yes? [GoSt-94, GoMe-97, Me-99] µ j,d (Warnick) no no [StTi-02] ν j,d (Hestenes and Stiefel) yes yes [StTi-02, StTi-04] Based on [GrSt-92], [Gr-89], ε limit. Z. Strakoš and P. Tichý 25

Hestenes and Stiefel estimate (CG) [StTi-02]: Rounding error analysis based on - detailed proof of preserving local orthogonality in CG, - results [Pa-71, Pa-76, Pa-80], [Gr-89, Gr-97]. Theorem: Let ε κ(a) 1. Then the CG approximate solutions computed in finite precision arithmetic satisfy x x j 2 A x x j+d 2 A = ν j,d + x x j A E j,d + O(ε 2 ), E j,d ( κ(a)) ε x x 0 A. Main result: Until x x j A reaches a level close to ε x x 0 A, the estimate ν j,d must work. Z. Strakoš and P. Tichý 26

Hestenes and Stiefel estimate (PCG) [StTi-04]: Analysis based on: - rounding error analysis from [StTi-02], - solving of Ms j+1 = r j+1 enjoys perfect normwise backward stability [Hi-96, p. 206]. Similar result as for CG: Until x x j A reaches a level close to ε x x 0 A, the estimate ν j,d = j+d 1 i=j γ i (r i, s i ) must work. Z. Strakoš and P. Tichý 27

5. Numerical Experiments Z. Strakoš and P. Tichý 28

Estimating the A-norm of the error P. Benner: Large-Scale Control Problems, optimal cooling of steel profiles, PCG, κ(a) = 9.7e + 04, n = 5177, d = 4, L = cholinc(a, 0). 10 0 10 5 10 10 10 15 10 20 true residual norm normwise backward error A norm of the error estimate 0 50 100 150 200 250 300 350 Z. Strakoš and P. Tichý 29

Estimating the A-norm of the error R. Kouhia: Cylindrical shell (Matrix Market), matrix s3dkt3m2, PCG, κ(a) = 3.62e + 11, n = 90499, d = 200, L = cholinc(a, 0). 10 2 10 3 10 4 10 5 10 6 10 7 10 8 10 9 10 10 10 11 10 12 true residual norm normwise backward error A norm of the error estimate 0 500 1000 1500 2000 2500 3000 Z. Strakoš and P. Tichý 30

Estimating the relative A-norm of the error R. Kouhia: Cylindrical shell (Matrix Market), matrix s3dkt3m2, PCG, κ(a) = 3.62e + 11, n = 90499, d = 200, L = cholinc(a, 0). 10 1 10 0 10 1 10 2 d = 200 d = 80 d = 10 d = 1 10 3 10 4 10 5 10 6 10 7 10 8 relative A norm of the error estimate 10 9 0 500 1000 1500 2000 2500 3000 Z. Strakoš and P. Tichý 31

Estimating the relative A-norm of the error R. Kouhia: Cylindrical shell (Matrix Market), matrix s3rmt3m3, PCG, κ(a) = 2.40e + 10, n = 5357, d = 50, L = cholinc(a, 1e 5). 10 0 10 2 10 4 10 6 10 8 10 10 10 12 relative A norm of the error numerically stable estimate numerically unstable estimate 0 100 200 300 400 500 600 700 Z. Strakoš and P. Tichý 32

Reconstruction of the convergence curve R. Kouhia: Cylindrical shell (Matrix Market), matrix s3dkt3m2, PCG, κ(a) = 3.62e + 11, n = 90499, d = 100, L = cholinc(a, 0). 10 0 10 1 10 2 10 3 relative A norm of the error estimate (d=100) 0 500 1000 1500 2000 2500 Z. Strakoš and P. Tichý 33

6. Conclusions Various formulas (based on Gauss quadrature) are mathematically equivalent to the formulas present (but somehow hidden) in the original Hestenes and Stiefel paper. Hestenes and Stiefel estimate is very simple, it can be computed almost for free and it has been proved numerically stable. and 1/2 j,d We suggest the estimates ν 1/2 j,d software realizations of the CG and PCG methods. to be incorporated into any The estimates are tight if the A-norm of the error reasonably decreases. Open problem: The adaptive choice of the parameter d. Z. Strakoš and P. Tichý 34

Thank you for your attention! More details can be found in Strakoš, Z. and Tichý, P., On error estimation in the Conjugate Gradient method and why it works in finite precision computations, Electron. Trans. Numer. Anal. (ETNA), Volume 13, pp. 56-80, 2002. Strakoš, Z. and Tichý, P., Simple estimation of the A-norm of the error in the Preconditioned Conjugate Gradients, submitted to BIT Numerical Mathematics, 2004. http://www.cs.cas.cz/ strakos, http://www.cs.cas.cz/ tichy Z. Strakoš and P. Tichý 35