Iterative solvers for saddle point algebraic linear systems: tools of the trade V. Simoncini Dipartimento di Matematica, Università di Bologna and CIRSA, Ravenna, Italy valeria@dm.unibo.it 1
The problem A BT B C u v = f g A = A T, C = C T 0 Mx = b with M large, real indefinite symmetric matrix Computational Fluid Dynamics (Elman, Silvester, Wathen 2005) Elasticity problems Mixed (FE) formulations of II and IV order elliptic PDEs Linearly Constrained Programs Linear Regression in Statistics Image restoration and registration... Survey: Benzi, Golub and Liesen, Acta Num 2005 2
Iterative solver. Convergence considerations. Mx = b M is symmetric and indefinite MINRES (Paige & Saunders, 75) x k x 0 +K k (M,r 0 ), s.t. min b Mx k r k = b Mx k, k = 0,1,..., x 0 starting guess If spec(m) [ a, b] [c,d], with b a = d c, then ( ) k ad bc b Mx 2k 2 b Mx 0 ad+ bc Note: more general but less tractable bounds available 3
Iterative solver. Convergence considerations. Mx = b M is symmetric and indefinite MINRES (Paige & Saunders, 75) x k x 0 +K k (M,r 0 ), s.t. min b Mx k r k = b Mx k, k = 0,1,..., x 0 starting guess If spec(m) [ a, b] [c,d], with b a = d c, then ( ) k ad bc b Mx 2k 2 b Mx 0 ad+ bc Note: more general but less tractable bounds available 4
Features of MINRES Residual minimizing solver for indefinite linear systems Short-term recurrence (possibly with Lanczos recurrence) Positive definite preconditioner, but not necessarily factorized (not as LL T ) Outline Harmonic Ritz values and superlinear convergence Enhancing MINRES convergence Estimating the Saddle-point problem inf-sup constant and stopping criteria 5
Features of MINRES Residual minimizing solver for indefinite linear systems Short-term recurrence (possibly with Lanczos recurrence) Positive definite preconditioner, but not necessarily factorized (not as LL T ) Outline Harmonic Ritz values and superlinear convergence Enhancing MINRES convergence Estimating the Saddle-point problem inf-sup constant and stopping criteria 6
Harmonic Ritz values K k (M,r 0 ) = span{r 0,Mr 0,...,M k 1 r 0 } (x 0 = 0) x k = φ k 1 (M)r 0 K k (M,r 0 ), φ k 1 polyn. of deg k 1 Therefore r k = r 0 Mx k = ϕ k (M)r 0, ϕ k polyn. of deg k,ϕ k (0) = 1 Harmonic Ritz values: roots of ϕ m (residual polynomial) Remark: Harmonic Ritz values approximate eigenvalues of M (Paige, Parlett & van der Vorst, 95) 7
Harmonic Ritz values K k (M,r 0 ) = span{r 0,Mr 0,...,M k 1 r 0 } (x 0 = 0) x k = φ k 1 (M)r 0 K k (M,r 0 ), φ k 1 polyn. of deg k 1 Therefore r k = r 0 Mx k = ϕ k (M)r 0, ϕ k polyn. of deg k,ϕ k (0) = 1 Harmonic Ritz values: roots of ϕ k (residual polynomial) Remark: Harmonic Ritz values approximate eigenvalues of M (Paige, Parlett & van der Vorst, 95) 8
Typical convergence pattern 60 40 Harmonic Ritz values 20 0 20 40 60 0 10 20 30 40 50 60 70 number of MINRES iterations Harmonic Ritz values as iterations proceed 9
Typical convergence pattern 60 40 Harmonic Ritz values 20 0 20 40 60 0 10 20 30 40 50 60 70 number of MINRES iterations Harmonic Ritz values as iterations proceed 10
Typical convergence pattern 60 40 Harmonic Ritz values 20 0 20 40 60 0 10 20 30 40 50 60 70 number of MINRES iterations Harmonic Ritz values as iterations proceed 11
Typical convergence pattern 60 40 Harmonic Ritz values 20 0 20 40 60 0 10 20 30 40 50 60 70 number of MINRES iterations Harmonic Ritz values as iterations proceed 12
Typical convergence pattern 60 40 Harmonic Ritz values 20 0 20 40 60 0 10 20 30 40 50 60 70 number of MINRES iterations Harmonic Ritz values as iterations proceed 13
Typical convergence pattern 60 40 Harmonic Ritz values 20 0 20 40 60 0 10 20 30 40 50 60 70 number of MINRES iterations Harmonic Ritz values as iterations proceed 14
Typical convergence pattern 60 40 Harmonic Ritz values 20 0 20 40 60 0 10 20 30 40 50 60 70 number of MINRES iterations Harmonic Ritz values as iterations proceed 15
Typical convergence pattern 60 40 Harmonic Ritz values 20 0 20 40 60 0 10 20 30 40 50 60 70 number of MINRES iterations Harmonic Ritz values as iterations proceed 16
Typical convergence pattern 60 40 Harmonic Ritz values 20 0 20 40 60 0 10 20 30 40 50 60 70 number of MINRES iterations Harmonic Ritz values as iterations proceed 17
Typical convergence pattern 60 40 Harmonic Ritz values 20 0 20 40 60 0 10 20 30 40 50 60 70 number of MINRES iterations Harmonic Ritz values as iterations proceed 18
Typical convergence pattern 60 40 Harmonic Ritz values 20 0 20 40 60 0 10 20 30 40 50 60 70 number of MINRES iterations Harmonic Ritz values as iterations proceed 19
Typical convergence pattern 60 40 Harmonic Ritz values 20 0 20 40 60 0 10 20 30 40 50 60 70 number of MINRES iterations Harmonic Ritz values as iterations proceed 20
Typical convergence pattern 60 40 Harmonic Ritz values 20 0 20 40 60 0 10 20 30 40 50 60 70 number of MINRES iterations Harmonic Ritz values as iterations proceed 21
Typical convergence pattern 60 40 Harmonic Ritz values 20 0 20 40 60 0 10 20 30 40 50 60 70 number of MINRES iterations Harmonic Ritz values as iterations proceed 22
Typical convergence pattern 60 40 Harmonic Ritz values 20 0 20 40 60 0 10 20 30 40 50 60 70 number of MINRES iterations Harmonic Ritz values as iterations proceed 23
Superlinear convergence 10 0 10 1 residual norm 10 2 10 3 10 4 10 5 10 6 10 7 0 10 20 30 40 50 60 70 number of iterations 24
Superlinear convergence Generalization of CG well-known result (van der Sluis & van der Vorst 86) MINRES: (van der Vorst and Vuik 93, van der Vorst 03) (λ i,z i ) eigenpairs of M TH: Assume r k = r 0 +s with r 0 z 1 Let r j be the MINRES residual after j iterations with r 0. Then r k+j F k r j, where F k = max k 2 θ (k) 1 λ 1 θ (k) 1 λ k λ 1 λ k and θ (k) 1 is the harmonic Ritz value closest to λ 1 in K k (A,r 0 ). (for a proof, Simoncini & Szyld 11, unpublished) 25
Superlinear convergence. A simple example. M = M δ M + δ = 10 3, spec(m ) R,spec(M + ) R + both well clustered b = 1 26
Superlinear convergence. An experiment. M, b data with one tiny negative eigenvalue 10 2 10 1 10 0 10 0 10 1 min i θ k (i) δ residual norm 10 2 10 4 min i θ k (i) δ 10 2 10 3 10 4 10 6 10 5 10 8 20 40 60 80 100 120 number of iterations 10 6 0 10 20 30 40 50 60 70 80 number of iterations M 1,b 1 data with negative eigenvalue closest to zero removed 27
Superlinear convergence. An experiment. M, b data with one tiny negative eigenvalue 10 2 10 1 10 0 MINRES(M, b) r j 10 0 10 1 min i θ k (i) δ residual norm 10 2 10 4 MINRES(M 1, b 1 ) min i θ k (i) δ 10 2 10 3 10 4 10 6 10 5 10 8 20 40 60 80 100 120 number of iterations 10 6 0 10 20 30 40 50 60 70 80 number of iterations M 1,b 1 data with negative eigenvalue closest to zero removed 28
Enhancing MINRES convergence for Saddle Point Problems Spectral properties M = A BT B O 0 < λ n λ 1 eigs of A 0 < σ m σ 1 sing. vals of B spec(m) subset of (Rusten & Winther 1992) [ 1 2 (λ n λ 2 n +4σ1 2), 1 ] 2 (λ 1 λ 2 1 +4σ2 m) [ λ n, 1 ] 2 (λ 1 + λ 21 +4σ21 ) More results under different hypotheses 29
Block diagonal Preconditioner P ideal = A 0 0 BA 1 B T MINRES converges in at most 3 its. A more practical choice: P = Ã 0 spd. Ã A S BA 1 B T 0 S spectrum in [ a, b] [c,d], a,b,c,d > 0 Robust preconditioners for PDE systems (Survey, Mardal & Winther, 11) 30
Block diagonal Preconditioner P ideal = A 0 0 BA 1 B T MINRES converges in at most 3 its. A more practical choice: P = Ã 0 spd. Ã A S BA 1 B T 0 S spectrum in [ a, b] [c,d], a,b,c,d > 0 Robust preconditioners for PDE systems (Survey, Mardal & Winther, 11) 31
A quasi-optimal approximate Schur complement (joint with M. Olshanskii) S BA 1 B T For certain operators, S is quasi-optimal: spec(ba 1 B T S 1 ) well clustered except for few eigenvalues O Note: well clustered eigs mesh-independent 32
A quasi-optimal approximate Schur complement (joint with M. Olshanskii) S BA 1 B T For certain operators, S is quasi-optimal: spec(ba 1 B T S 1 ) well clustered except for few eigenvalues O Often: well clustered eigs also mesh/parameter-independent 33
The role of S Claim: The presence of outliers in BA 1 B T S 1 is accurately inherited by the preconditioned matrix MP 1 O spec(ba 1 B T S 1 ) O spec(mp 1 ) (for a proof, see Olshanskii & Simoncini, SIMAX 10) 34
Side effects These spectral peculiarities have an effect on the preconditioned problem MP 1... and influence the convergence of MINRES on MP 1 Can we eliminate this influence? 35
Effect on MINRES convergence 10 0 10 2 10 4 10 6 0 20 40 60 80 100 number of iterations Stokes-type problem 36
Eliminating the stagnation phase: Deflated MINRES Y = [y 1,...,y s ]: approximate eigenbasis of M * Approximation space: Augmented Lanczos sequence v j+1 span{y,v 1,v 2,...,v j }, v j+1 = 1 obtained by standard Lanczos method with coeff.matrix G := M MY(Y T MY) 1 Y T M * MINRES method: r j =ˆb Mû j GK j (G,v 1 ) û j obtained with a short-term recurrence 37
http://www.stanford.edu/group/sol/software/minres.html Augmented MINRES: Given M, b, maxit, tol, P, and Y with orthonormal columns u = Y(Y T MY) 1 Y T b starting approx, r = b Mu, r 1 = r, y = P 1 r, etc while (i < maxit) i = i+1 v = y/β; y = Mv MY(Y T MY) 1 Y T Mv if i 2, y = y (β/β 0 )r 1 α = v T y, y = y r 2 α/β r 1 = r 2, r 2 = y y = P 1 r 2 β 0 = β, β = r2 Ty e 0 = e, δ = c d+sα ḡ = s d cα e = sβ d = cβ γ = [ḡ, β] c = ḡ/γ, s = β/γ, φ = c φ, φ = s φ w 1 = w 2, w 2 = w w = (v e 0 w 1 δw 2 )γ 1 g = Y(Y T MY) 1 Y T Mwφ u = u g +φw ζ = χ 1 /γ, χ 1 = χ 2 δz, χ 2 = eζ Check preconditioned residual norm ( φ) for convergence end 38
Stokes-type problem with variable viscosity in Ω R d divν(x)du+ p = f in Ω, div u = 0 in Ω, u = 0 on Ω, with 0 < ν min ν(x) ν max <. (Here, ν(x) = 2µ+ τ s ε 2 + Du(x) 2) u : velocity vector field p : pressure Du = 1 2 ( u+ T u) rate of deformation tensor Prec. S: pressure mass matrix wrto weighted product (ν 1, ) L2 (Ω) (BA 1 B)S 1 well clustered spectrum, Except for few small negative eigenvalues (Grinevich & Olshanskii, 09) 39
Stokes-type problem with variable viscosity in Ω R d divν(x)du+ p = f in Ω, div u = 0 in Ω, u = 0 on Ω, with 0 < ν min ν(x) ν max <. (Here, ν(x) = 2µ+ τ s ε 2 + Du(x) 2) u : velocity vector field p : pressure Du = 1 2 ( u+ T u) rate of deformation tensor Prec. S: pressure mass matrix wrto weighted product (ν 1, ) L2 (Ω) (BA 1 B T )S 1 well clustered spectrum, except for few small positive eigenvalues (Grinevich & Olshanskii, 09) 40
Bercovier-Engelman regularized model of Bingham viscoplastic fluid * One zero pressure mode (eigvec easy to approx) * One small eigenvalue of precon d Schur Complement rough eigvec approximation : {ũ 2, p 2 } T {u 2,p 2 } T p 2 = 0 if 1 2 τ s y 1 2 +τ s, 1 if 0 y < 1 2 τ s, 1 if 1 y > 1 2 +τ s, and ũ 2 = Ã 1 B T p 2 41
Original and Deflated convergence histories 10 0 MINRES vs. Deflated MINRES, δ=1e 3 non deflated deflated 10 2 10 4 10 6 0 20 40 60 80 100 120 iteration number à = IC(A,δ),δ = 10 3 (same with δ = 0) 42
A stopping criterion for Stokes mixed approximation (joint with D. Silvester) M = A BT B 0, P = Ã 0 0 S Ã A, S BA 1 B T Natural norm for the problem: energy norm in the u-space and p-space, x x k Pideal For stable discretization, heuristic relation between error and residual: x x k Pideal 2 γ 2 b Mx k P 1 ideal 2 γ 2 b Mx k P 1 < tol γ inf-sup constant: γ 2 qt BA 1 B T q q T Qq, q 0 43
A stopping criterion for Stokes mixed approximation (joint with D. Silvester) M = A BT B 0, P = Ã 0 0 S Ã A, S BA 1 B T Natural norm for the problem: energy norm in the u-space and p-space, x x k Pideal For stable discretization, heuristic relation between error and residual: x x k Pideal 2 γ 2 b Mx k P 1 ideal 2 γ 2 b Mx k P 1 < tol γ inf-sup constant: γ 2 qt BA 1 B T q q T Qq, q 0 44
Estimating the inf-sup constant For the preconditioned problem (Elman etal, 05): λ 1 2 (δ δ 2 +4δγ 2 ) δ λ + with δ = λ min (AÃ 1 ). If these bounds are tight (equalities), then γ 2 = λ2 λ λ + λ + In practice, adaptive estimate with Harmonic Ritz values: γ 2 γk 2 = (θ(k) ) 2 θ (k) θ (k) +, kth MINRES iteration θ (k) + Corresponding analogous results for Potential flow problem 45
Estimating the inf-sup constant For the preconditioned problem (Elman etal, 05): λ 1 2 (δ δ 2 +4δγ 2 ) δ λ + with δ = λ min (AÃ 1 ). If these bounds are tight (equalities), then γ 2 = λ2 λ λ + λ + In practice, adaptive estimate with Harmonic Ritz values: γ 2 γk 2 = (θ(k) ) 2 θ (k) θ (k) +, kth MINRES iteration θ (k) + Corresponding analogous results for Potential flow problem 46
IFISS Problem: smooth colliding flow with quartic polyn velocity soln k γ k λ k ց λ + 4 0.9807 1.0212 5 0.9340 1.0065 6 0.9147 1.0024 7 0.6689 0.9896 8 0.6618 0.9887 Optimal gamma=0.2162 9 0.3581 0.9705 10 0.3502 0.9671 11 0.2992 0.9305 12 0.2917 0.9218 13 0.2684 0.9071 14 0.2576 0.9031 15 0.2468 0.9000 16 0.2328 0.8973 17 0.2320 0.8971 18 0.2239 0.8945 19 0.2238 0.8945 20 0.2177 0.8904 21 0.2174 0.8897 47
Another IFISS example. Flow over a step. Optimal γ e k : error at iteration k r k : residual at iteration k 10 2 nv=1 10 2 nv=4 10 0 10 0 10 2 10 2 10 4 2/γ 2 r k M 10 4 2/γ 2 r k M e k E e k E 10 6 r k M 10 6 r k M 0 10 20 30 40 iteration number k 0 10 20 30 40 iteration number k E = P ideal, M = P 1 48
Another IFISS example. Flow over a step. Adaptive γ k e k : error at iteration k r k : residual at iteration k 10 2 nv=1 10 2 nv=4 10 0 10 0 10 2 10 2 10 4 2/γ 2 k r k M 10 4 2/γ 2 k r k M e k E e k E 10 6 r k M 10 6 r k M 0 10 20 30 40 iteration number k 0 10 20 30 40 iteration number k E = P ideal, M = P 1 49
Conclusions MINRES is rich in information to be exploited Adaptive problem-related stopping criteria available References: * Maxim A. Olshanskii and V. Simoncini Acquired clustering properties and solution of certain saddle point systems. SIMAX, 2010. * David J. Silvester and V. Simoncini An Optimal Iterative Solver for Symmetric Indefinite Systems stemming from Mixed Approximation. ACM TOMS, 2011. * V. Simoncini and Daniel B. Szyld, unpublished, 2011. 50