Efficient Estimation of the A-norm of the Error in the Preconditioned Conjugate Gradient Method


Zdeněk Strakoš and Petr Tichý
Institute of Computer Science AS CR, Technical University of Berlin

Emory University, February 2006

A system of linear algebraic equations

Consider a system of linear algebraic equations

    Ax = b,

where A ∈ R^{n×n} is symmetric positive definite and b ∈ R^n. Such systems arise, e.g., from the discretization of elliptic self-adjoint partial differential equations, where the energy norm is the natural measure of the error. The conjugate gradient method minimizes at the j-th step the energy norm of the error over the given j-dimensional Krylov subspace.

The conjugate gradient method (CG)

Given x_0 ∈ R^n, r_0 = b − Ax_0, CG computes a sequence of iterates x_j ∈ x_0 + K_j(A, r_0) such that

    ‖x − x_j‖_A = min_{u ∈ x_0 + K_j(A, r_0)} ‖x − u‖_A,

where

    K_j(A, r_0) ≡ span{r_0, Ar_0, …, A^{j−1} r_0},    ‖x − x_j‖_A ≡ ((x − x_j), A(x − x_j))^{1/2}.

CG, Hestenes and Stiefel (1952)

given x_0, r_0 = b − Ax_0, p_0 = r_0,
for j = 0, 1, 2, …

    γ_j = (r_j, r_j) / (p_j, Ap_j)
    x_{j+1} = x_j + γ_j p_j
    r_{j+1} = r_j − γ_j Ap_j
    δ_{j+1} = (r_{j+1}, r_{j+1}) / (r_j, r_j)
    p_{j+1} = r_{j+1} + δ_{j+1} p_j
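For concreteness, a minimal NumPy sketch of these recurrences (the function name, stopping test, and iteration cap are illustrative additions, not part of the slide):

    import numpy as np

    def cg(A, b, x0, tol=1e-10, maxit=1000):
        """Conjugate gradients, following the Hestenes-Stiefel recurrences above."""
        x = x0.astype(float).copy()
        r = b - A @ x
        p = r.copy()
        rr = r @ r                       # (r_j, r_j)
        for _ in range(maxit):
            Ap = A @ p
            gamma = rr / (p @ Ap)        # gamma_j = (r_j, r_j) / (p_j, A p_j)
            x += gamma * p               # x_{j+1} = x_j + gamma_j p_j
            r -= gamma * Ap              # r_{j+1} = r_j - gamma_j A p_j
            rr_new = r @ r
            if np.sqrt(rr_new) <= tol * np.linalg.norm(b):
                break
            delta = rr_new / rr          # delta_{j+1} = (r_{j+1}, r_{j+1}) / (r_j, r_j)
            p = r + delta * p            # p_{j+1} = r_{j+1} + delta_{j+1} p_j
            rr = rr_new
        return x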

Preconditioned Conjugate Gradients (PCG)

The CG iterates are thought of as being applied to the transformed system Â x̂ = b̂. We consider symmetric preconditioning with M ≡ LL^T and the change of variables

    Â = L^{−1} A L^{−T},  b̂ = L^{−1} b,
    γ_j ≡ γ̂_j,  δ_j ≡ δ̂_j,  x_j ≡ L^{−T} x̂_j,  r_j ≡ L r̂_j,  s_j ≡ M^{−1} r_j,  p_j ≡ L^{−T} p̂_j.

The preconditioner M is chosen so that a linear system with the matrix M is easy to solve, while the matrix L^{−1} A L^{−T} should ensure fast convergence of CG.

Algorithm of PCG

given x_0, r_0 = b − Ax_0, s_0 = M^{−1} r_0, p_0 = s_0,
for j = 0, 1, 2, …

    γ_j = (r_j, s_j) / (p_j, Ap_j)
    x_{j+1} = x_j + γ_j p_j
    r_{j+1} = r_j − γ_j Ap_j
    s_{j+1} = M^{−1} r_{j+1}
    δ_{j+1} = (r_{j+1}, s_{j+1}) / (r_j, s_j)
    p_{j+1} = s_{j+1} + δ_{j+1} p_j
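A matching NumPy sketch, under the assumption that the preconditioner is supplied as a callable M_solve applying M^{−1} (e.g., two triangular solves with L and L^T); names and tolerances are again illustrative:

    import numpy as np

    def pcg(A, b, M_solve, x0, tol=1e-10, maxit=1000):
        """Preconditioned CG; M_solve(v) returns M^{-1} v."""
        x = x0.astype(float).copy()
        r = b - A @ x
        s = M_solve(r)                   # s_0 = M^{-1} r_0
        p = s.copy()
        rs = r @ s                       # (r_j, s_j)
        for _ in range(maxit):
            Ap = A @ p
            gamma = rs / (p @ Ap)        # gamma_j = (r_j, s_j) / (p_j, A p_j)
            x += gamma * p
            r -= gamma * Ap
            if np.linalg.norm(r) <= tol * np.linalg.norm(b):
                break
            s = M_solve(r)               # s_{j+1} = M^{-1} r_{j+1}
            rs_new = r @ s               # (r_{j+1}, s_{j+1})
            p = s + (rs_new / rs) * p    # p_{j+1} = s_{j+1} + delta_{j+1} p_j
            rs = rs_new
        return x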

When to stop? Should we bother?

[Figure: "fair comparison of the direct and iterative principles" — accuracy of the approximate solution versus computational cost, with curves for the iterative principle and the direct principle and a line marking the accuracy needed.]

How to measure the quality of an approximation?

Measuring accuracy... it depends on the problem:

using residual information:
    normwise backward error,
    relative residual norm;

using error estimates:
    estimate of the A-norm of the error,
    estimate of the Euclidean norm of the error.

If the original system is well-conditioned, it does not matter.
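For reference (a standard definition, following Rigal and Gaches, not spelled out on the slide): the normwise relative backward error of an approximation x_j is

\[
  \beta(x_j) \;=\; \frac{\|b - A x_j\|}{\|A\|\,\|x_j\| + \|b\|},
\]

the size of the smallest relative perturbations of A and b for which x_j exactly solves the perturbed system.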

A message from history

Using the residual vector r_j as a measure of the goodness of the estimate x_j is not reliable [HeSt-52, p. 410]. The function (x − x_j, A(x − x_j)) can be used as a measure of the goodness of x_j as an estimate of x [HeSt-52, p. 413].

Various convergence characteristics

Example using [GuSt-00], n = 48.

[Figure: the normwise relative backward error, the relative Euclidean norm of the error, the relative residual norm, and the relative A-norm of the error, plotted against iteration number 0–50.]

Outline

1. CG and Gauss Quadrature
2. Construction of estimates in CG and PCG
3. Estimates in finite precision arithmetic
4. Rounding error analysis
5. Numerical experiments
6. Conclusions

1. CG and Gauss Quadrature

CG and Gauss Quadrature

CG applied to Ax = b with initial guess x_0 is the matrix formulation of the Lanczos process,

    T_j y_j = ‖r_0‖ e_1,    x_j = x_0 + Q_j y_j,

which in turn corresponds to the j-point Gauss quadrature approximation

    ∫_ζ^ξ f(λ) dω(λ) ≈ Σ_{i=1}^{j} ω_i^{(j)} f(θ_i^{(j)}),

with the quadrature distribution function ω^{(j)} approximating ω(λ).

CG and Gauss Quadrature

At any iteration step j, CG represents the matrix formulation of the j-point Gauss quadrature of the Riemann–Stieltjes integral determined by A and r_0,

    ∫_ζ^ξ f(λ) dω(λ) = Σ_{i=1}^{j} ω_i^{(j)} f(θ_i^{(j)}) + R_j(f).

For f(λ) ≡ λ^{−1} the formula takes the form

    ‖x − x_0‖_A² / ‖r_0‖² = j-th Gauss quadrature + ‖x − x_j‖_A² / ‖r_0‖².

This was a basis for CG error estimation in [DaGoNa-78, GoFi-93, GoMe-94, GoSt-94, GoMe-97, …].
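A short verification of the left-hand side, added here for completeness: since A(x − x_0) = r_0 and ω(λ) is the spectral distribution function determined by A and r_0/‖r_0‖ (weights equal to the squared components of r_0/‖r_0‖ in the eigenvector basis of A),

\[
\int_\zeta^\xi \lambda^{-1}\, d\omega(\lambda)
  = \frac{r_0^T A^{-1} r_0}{\|r_0\|^2}
  = \frac{(x - x_0)^T A (x - x_0)}{\|r_0\|^2}
  = \frac{\|x - x_0\|_A^2}{\|r_0\|^2}.
\]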

Equivalent formulas

Continued fractions [GoMe-94, GoSt-94, GoMe-97]:

    ‖r_0‖² C_n = ‖r_0‖² C_j + ‖x − x_j‖_A².

Warnick [Wa-00]:

    r_0^T (x − x_0) = r_0^T (x_j − x_0) + ‖x − x_j‖_A².

Hestenes and Stiefel [HeSt-52, De-93, StTi-02]:

    ‖x − x_0‖_A² = Σ_{i=0}^{j−1} γ_i ‖r_i‖² + ‖x − x_j‖_A².

The last formula is derived purely algebraically!

Local A-orthogonality

The standard derivation of this formula uses global A-orthogonality among the direction vectors [AxKa-01, p. 274], [Ar-04, p. 8]. It follows, however, from local A-orthogonality and the Pythagorean theorem alone.

Hestenes and Stiefel formula (derivation)

Using local orthogonality between r_{i+1} and p_i,

    ‖x − x_i‖_A² − ‖x − x_{i+1}‖_A² = γ_i ‖r_i‖².

Then

    ‖x − x_0‖_A² − ‖x − x_j‖_A² = Σ_{i=0}^{j−1} (‖x − x_i‖_A² − ‖x − x_{i+1}‖_A²) = Σ_{i=0}^{j−1} γ_i ‖r_i‖².

The approach to the derivation of this formula is very important for its justification in finite precision arithmetic.
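Spelling out the local step (a short verification added here; it uses only (p_i, r_{i+1}) = 0 and the definition of γ_i): since x − x_i = (x − x_{i+1}) + γ_i p_i and A(x − x_{i+1}) = r_{i+1},

\[
\begin{aligned}
\|x - x_i\|_A^2
  &= \|x - x_{i+1}\|_A^2 + 2\gamma_i\, p_i^T A(x - x_{i+1}) + \gamma_i^2\,(p_i, A p_i) \\
  &= \|x - x_{i+1}\|_A^2 + 2\gamma_i\,(p_i, r_{i+1}) + \gamma_i\,(r_i, r_i)
   = \|x - x_{i+1}\|_A^2 + \gamma_i \|r_i\|^2 ,
\end{aligned}
\]

using γ_i (p_i, A p_i) = (r_i, r_i) and (p_i, r_{i+1}) = 0.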

2. Construction of estimates in CG and PCG

Construction of the estimate in CG

Idea: consider, for example,

    ‖x − x_j‖_A² = ‖r_0‖² [C_n − C_j].

Run d extra steps. Subtracting the identity for ‖x − x_{j+d}‖_A² gives

    ‖x − x_j‖_A² = ‖r_0‖² [C_{j+d} − C_j] + ‖x − x_{j+d}‖_A²,

where the first term on the right-hand side is EST². When ‖x − x_j‖_A² ≫ ‖x − x_{j+d}‖_A², we have a tight (lower) bound [GoSt-94, GoMe-97].

Mathematically equivalent estimates

Continued fractions [GoSt-94, GoMe-97]:

    η_{j,d} = ‖r_0‖² [C_{j+d} − C_j].

Warnick [Wa-00]:

    μ_{j,d} = r_0^T (x_{j+d} − x_j).

Hestenes and Stiefel [HeSt-52]:

    ν_{j,d} = Σ_{i=j}^{j+d−1} γ_i ‖r_i‖².

Construction of the estimate in PCG

The A-norm of the error can be estimated similarly as in ordinary CG. An extension of the Gauss quadrature formulas based on continued fractions was published in [Me-99]. Extension of the HS estimate: use the HS formula for Â x̂ = b̂ and the substitution Â = L^{−1} A L^{−T}, x̂_j = L^T x_j, γ̂_i = γ_i, r̂_i = L^{−1} r_i [De-93, AxKa-01, StTi-04, Ar-04]. Since ‖x̂ − x̂_j‖_Â = ‖x − x_j‖_A and γ̂_i ‖r̂_i‖² = γ_i (r_i, s_i), this gives

    ‖x − x_j‖_A² = Σ_{i=j}^{j+d−1} γ_i (r_i, s_i) + ‖x − x_{j+d}‖_A².

In many problems it is convenient to use a stopping criterion that relates the relative A-norm of the error to the discretization error, see [Ar-04]. A code sketch of accumulating this estimate during the iteration follows below.
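A minimal sketch of PCG with the delayed Hestenes–Stiefel lower bound ν_{j,d}; the function signature, the deque-based window, and the default d are illustrative assumptions, not from the slides:

    import numpy as np
    from collections import deque

    def pcg_with_estimate(A, b, M_solve, x0, d=4, tol=1e-8, maxit=1000):
        """PCG with the lower bound nu_{j,d} = sum_{i=j}^{j+d-1} gamma_i (r_i, s_i)
        for ||x - x_j||_A^2; each estimate is available d steps after step j."""
        x = x0.astype(float).copy()
        r = b - A @ x
        s = M_solve(r)
        p = s.copy()
        rs = r @ s
        window = deque()                 # the last d terms gamma_i (r_i, s_i)
        estimates = []                   # pairs (j, estimate of ||x - x_j||_A)
        for j in range(maxit):
            Ap = A @ p
            gamma = rs / (p @ Ap)
            window.append(gamma * rs)    # gamma_j (r_j, s_j)
            if len(window) > d:
                window.popleft()
            if j + 1 >= d:
                # nu_{j+1-d, d}: lower bound for the error d steps back
                estimates.append((j + 1 - d, np.sqrt(sum(window))))
            x += gamma * p
            r -= gamma * Ap
            if np.linalg.norm(r) <= tol * np.linalg.norm(b):
                break
            s = M_solve(r)
            rs_new = r @ s
            p = s + (rs_new / rs) * p
            rs = rs_new
        return x, estimates

With M_solve = lambda v: v (i.e., M = I) the same sketch yields the plain CG estimate ν_{j,d} = Σ γ_i ‖r_i‖².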

Estimating the relative A-norm of the error

To estimate the relative A-norm of the error we use the identities

    ‖x − x_j‖_A² = ν_{j,d} + ‖x − x_{j+d}‖_A²,
    ‖x‖_A² = ν_{0,j+d} + 2 b^T x_0 − ‖x_0‖_A² + ‖x − x_{j+d}‖_A²,

and denote ξ_{j+d} ≡ ν_{0,j+d} + 2 b^T x_0 − ‖x_0‖_A². Define

    Δ_{j,d} ≡ ν_{j,d} / ξ_{j+d}.

If ‖x‖_A > ‖x − x_0‖_A, then Δ_{j,d} > 0 and

    Δ_{j,d} = (‖x − x_j‖_A² − ‖x − x_{j+d}‖_A²) / (‖x‖_A² − ‖x − x_{j+d}‖_A²) ≈ ‖x − x_j‖_A² / ‖x‖_A².
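The second identity is quick to verify (a step added here for completeness; it uses A(x − x_0) = r_0 = b − Ax_0 and the Hestenes–Stiefel formula applied from step 0 to step j + d):

\[
\begin{aligned}
\|x\|_A^2
 &= \|x_0\|_A^2 + 2\,x_0^T A (x - x_0) + \|x - x_0\|_A^2 \\
 &= \|x_0\|_A^2 + 2\,x_0^T (b - A x_0) + \|x - x_0\|_A^2 \\
 &= 2\,b^T x_0 - \|x_0\|_A^2 + \nu_{0,j+d} + \|x - x_{j+d}\|_A^2 .
\end{aligned}
\]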

3. Estimates in finite precision arithmetic

CG behavior in finite precision arithmetic

Orthogonality is lost, convergence is delayed!

[Figure: ‖x − x_j‖_A and the loss of orthogonality ‖I − V_j^T V_j‖_F for exact (E) and finite precision (FP) computations, plotted against iteration number.]

A principal, often overlooked, problem

Are estimates derived assuming exact arithmetic applicable in finite precision computations?

[Figure: "sophisticated error estimates and effects of rounding errors" — the norm of the error which is to be estimated, for the finite precision computation and for the exact arithmetic computation, plotted against iteration number; the two curves differ substantially.]

Estimates need not work

The identity

    ‖x − x_j‖_A² = EST² + ‖x − x_{j+d}‖_A²

need not hold during finite precision CG computations. An example: μ_{j,d} = r_0^T (x_{j+d} − x_j) does not work!

[Figure: ‖x − x_j‖_A, the estimate μ_{j,d}^{1/2}, and the loss of orthogonality ‖I − V_j^T V_j‖_F, plotted against iteration number.]

4. Rounding error analysis

Rounding error analysis

Without a proper rounding error analysis there is no justification whatsoever that the proposed estimates work in finite precision arithmetic. Do the estimates give good information in practical computations?

    estimate                          CG     PCG
    η_{j,d} (continued fractions)     yes    yes?    [GoSt-94, GoMe-97, Me-99]
    μ_{j,d} (Warnick)                 no     no      [StTi-02]
    ν_{j,d} (Hestenes and Stiefel)    yes    yes     [StTi-02, StTi-04]

Based on [GrSt-92], [Gr-89]; valid up to the ε limit (the maximal attainable accuracy).

Hestenes and Stiefel estimate (CG)

[StTi-02]: rounding error analysis based on

- detailed investigation of preserving local orthogonality in CG,
- the results of [Pa-71, Pa-76, Pa-80], [Gr-89, Gr-97].

Theorem. Let ε κ(A) ≪ 1. Then the CG approximate solutions computed in finite precision arithmetic satisfy

    ‖x − x_j‖_A² − ‖x − x_{j+d}‖_A² = ν_{j,d} + ‖x − x_j‖_A E_{j,d} + O(ε²),

    E_{j,d} = O(√κ(A)) ε ‖x − x_0‖_A.

Hestenes and Stiefel estimate (PCG)

[StTi-04]: analysis based on

- the rounding error analysis from [StTi-02],
- the fact that solving Ms_{j+1} = r_{j+1} enjoys perfect normwise backward stability [Hi-96, p. 206].

A similar result as for CG: until ‖x − x_j‖_A reaches a level close to ε ‖x − x_0‖_A, the estimate

    ν_{j,d} = Σ_{i=j}^{j+d−1} γ_i (r_i, s_i)

must work.

5. Numerical Experiments

Estimating the A-norm of the error

P. Benner: large-scale control problems, optimal cooling of steel profiles. PCG, κ(A) = 9.7e+04, n = 5177, d = 4, L = cholinc(A, 0).

[Figure: the true residual norm, the normwise backward error, the A-norm of the error, and its estimate, plotted against iteration number.]

Estimating the A-norm of the error

R. Kouhia: cylindrical shell (Matrix Market), matrix s3dkt3m2. PCG, κ(A) = 3.62e+11, n = 90499, d = 200, L = cholinc(A, 0).

[Figure: the true residual norm, the normwise backward error, the A-norm of the error, and its estimate, plotted against iteration number.]

Estimating the relative A-norm of the error

R. Kouhia: cylindrical shell (Matrix Market), matrix s3dkt3m2. PCG, κ(A) = 3.62e+11, n = 90499, d = 200, L = cholinc(A, 0).

[Figure: the relative A-norm of the error and its estimate for d = 200, 80, 10, and 1.]

Estimating the relative A-norm of the error

R. Kouhia: cylindrical shell (Matrix Market), matrix s3rmt3m3. PCG, κ(A) = 2.40e+10, n = 5357, d = 50, L = cholinc(A, 1e−5).

[Figure: the relative A-norm of the error together with a numerically stable estimate and a numerically unstable estimate.]

Reconstruction of the convergence curve

R. Kouhia: cylindrical shell (Matrix Market), matrix s3dkt3m2. PCG, κ(A) = 3.62e+11, n = 90499, d = 100, L = cholinc(A, 0).

[Figure: the relative A-norm of the error and its estimate (d = 100); the estimate reconstructs the convergence curve.]

6. Conclusions

Various formulas (based on Gauss quadrature) are mathematically equivalent to formulas present (but somewhat hidden) in the original Hestenes and Stiefel paper.

The Hestenes and Stiefel estimate is very simple, it can be computed almost for free, and it has been proved numerically stable.

We suggest incorporating the estimates ν_{j,d}^{1/2} and Δ_{j,d}^{1/2} into any software realizations of the CG and PCG methods.

The estimates are tight if the A-norm of the error reasonably decreases.

Open problem: the adaptive choice of the parameter d.

More details can be found in:

Strakoš, Z. and Tichý, P., On error estimation in the conjugate gradient method and why it works in finite precision computations, Electron. Trans. Numer. Anal. (ETNA), 13, pp. 56–80 (2002).

Strakoš, Z. and Tichý, P., Simple estimation of the A-norm of the error in the preconditioned conjugate gradients, BIT Numerical Mathematics, 45, pp. 789–817 (2005).

Meurant, G. and Strakoš, Z., The Lanczos and conjugate gradient methods in finite precision arithmetic, Acta Numerica, 15 (to appear in 2006).

http://www.cs.cas.cz/~strakos, http://www.cs.cas.cz/~tichy

Thank you for your kind attention!