CG and MINRES: An empirical comparison
Prequel to LSQR and LSMR: Two least-squares solvers
David Fong and Michael Saunders
Institute for Computational and Mathematical Engineering, Stanford University
5th International Conference on High Performance Scientific Computing, March 5-9, 2012, Hanoi, Vietnam
Abstract
For iterative solution of symmetric systems $Ax = b$, the conjugate gradient method (CG) is commonly used when $A$ is positive definite, while the minimal residual method (MINRES) is typically reserved for indefinite systems. We investigate the sequence of solutions generated by each method and suggest that even if $A$ is positive definite, MINRES may be preferable to CG if the iterations are to be terminated early. The classic symmetric positive-definite system comes from the full-rank least-squares (LS) problem $\min \|Ax - b\|$. Specialization of CG and MINRES to the associated normal equation $A^T A x = A^T b$ leads to LSQR and LSMR respectively. We include numerical comparisons of these two LS solvers because they motivated this retrospective study of CG versus MINRES.
1 CG and MINRES
  The Lanczos Process
  Properties
  Backward Errors
2 LSQR and LSMR
3 LSMR Derivation
  Golub-Kahan bidiagonalization
  Properties
4 LSMR Experiments
  Backward Errors
5 Summary
Part I: CG and MINRES
Iterative algorithms for $Ax = b$, $A = A^T$, based on the Lanczos process
Krylov-subspace methods: $x_k = V_k y_k$
Lanczos process (summary)
$$\beta_1 v_1 = b, \qquad A V_k = V_{k+1} H_k, \qquad V_k = \begin{pmatrix} v_1 & v_2 & \cdots & v_k \end{pmatrix}$$
$$T_k = \begin{pmatrix} \alpha_1 & \beta_2 & & \\ \beta_2 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_k \\ & & \beta_k & \alpha_k \end{pmatrix}, \qquad H_k = \begin{pmatrix} T_k \\ \beta_{k+1} e_k^T \end{pmatrix}$$
$$r_k = b - A x_k = \beta_1 v_1 - A V_k y_k = V_{k+1}(\beta_1 e_1 - H_k y_k), \qquad \text{Aim: } \beta_1 e_1 \approx H_k y_k$$
Two subproblems:
  CG: $T_k y_k = \beta_1 e_1$, then $x_k = V_k y_k$
  MINRES: $\min \|H_k y_k - \beta_1 e_1\|$, then $x_k = V_k y_k$
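For concreteness, here is a minimal MATLAB sketch of the Lanczos recurrence that produces these quantities, assuming a symmetric $A$ and no reorthogonalization; the function and variable names are illustrative, not from the talk.

    % Lanczos process: beta_1 v_1 = b, A V_k = V_{k+1} H_k (no reorthogonalization).
    function [V, alpha, beta] = lanczos_sketch(A, b, kmax)
      n = length(b);
      V = zeros(n, kmax+1);  alpha = zeros(kmax, 1);  beta = zeros(kmax+1, 1);
      beta(1) = norm(b);  V(:,1) = b / beta(1);
      for k = 1:kmax
        w = A*V(:,k);
        if k > 1, w = w - beta(k)*V(:,k-1); end
        alpha(k) = V(:,k)'*w;
        w = w - alpha(k)*V(:,k);
        beta(k+1) = norm(w);
        if beta(k+1) == 0, break; end          % invariant subspace reached
        V(:,k+1) = w / beta(k+1);
      end
    end
    % T_k = diag(alpha(1:k)) + diag(beta(2:k),1) + diag(beta(2:k),-1),
    % and H_k = [T_k; beta(k+1)*e_k'] as displayed above.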
Common practice for $Ax = b$, $A = A^T$:
  $A$ positive definite: use CG
  $A$ indefinite: use MINRES
Experiment: CG vs MINRES on $A \succ 0$ (a small stand-in version is sketched after this slide).
Hestenes and Stiefel (1952) proposed both CG and CR for $A \succ 0$ and proved many properties.
CR $\equiv$ MINRES when $A \succ 0$. Both minimize $\|r_k\| = \|b - A x_k\|$ in the Krylov subspace.
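A small version of the experiment can be run with MATLAB's built-in pcg and minres; the tridiagonal test matrix below is a stand-in, not one of the collection problems used in the talk.

    % Compare CG and MINRES on the same SPD system via the backward error ||r||/||x||.
    n = 1000;
    A = gallery('tridiag', n, -1, 2, -1);      % SPD model problem
    b = ones(n, 1);
    tol = 1e-10;  maxit = 500;
    [xcg, ~, ~, itcg] = pcg(A, b, tol, maxit);
    [xmr, ~, ~, itmr] = minres(A, b, tol, maxit);
    fprintf('CG:     %4d its, ||r||/||x|| = %.2e\n', itcg, norm(b - A*xcg)/norm(xcg));
    fprintf('MINRES: %4d its, ||r||/||x|| = %.2e\n', itmr, norm(b - A*xmr)/norm(xmr));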
Theoretical properties for $Ax = b$, $A \succ 0$

                            CG               CR (MINRES)
  $\|x - x_k\|$             HS 1952          HS 1952
  $\|x - x_k\|_A$           HS 1952          HS 1952
  $\|x_k\|$                 Steihaug 1983    Fong 2012

  For CR (MINRES) only:
  $\|r_k\|$                 HS 1952
  $\|r_k\| / \|x_k\|$       Fong 2012

Each listed quantity is monotonic in $k$ (the errors and residual measures decrease, $\|x_k\|$ increases); the citation indicates where the property was proved.
Backward error for square systems $Ax = b$
An approximate solution $x_k$ is acceptable iff there exist $E$, $f$ such that
$$(A + E) x_k = b + f, \qquad \|E\| \le \alpha \|A\|, \qquad \|f\| \le \beta \|b\|$$
Smallest perturbations $E$, $f$ (Titley-Peloquin 2010), with $\psi = \alpha \|A\| \|x_k\| + \beta \|b\|$:
$$E = \frac{\alpha \|A\|}{\psi \, \|x_k\|} \, r_k x_k^T, \qquad f = -\frac{\beta \|b\|}{\psi} \, r_k, \qquad \frac{\|E\|}{\|A\|} = \frac{\alpha \|r_k\|}{\psi}, \qquad \frac{\|f\|}{\|b\|} = \frac{\beta \|r_k\|}{\psi}$$
Stopping rule: $\|r_k\| \le \alpha \|A\| \|x_k\| + \beta \|b\|$
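In code, the stopping rule amounts to a single inequality; a minimal sketch follows, assuming some estimate of $\|A\|$ (normest here) is acceptable. The function name is illustrative.

    % Accept xk once ||r_k|| <= alpha*||A||*||x_k|| + beta*||b||,
    % i.e. (A+E) xk = b+f holds with ||E|| <= alpha*||A|| and ||f|| <= beta*||b||.
    function stop = backward_error_stop(A, b, xk, alpha, beta)
      rk   = b - A*xk;
      stop = norm(rk) <= alpha*normest(A)*norm(xk) + beta*norm(b);
    end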
Backward error for square systems, $\beta = 0$
$$E^{(k)} = \frac{r_k x_k^T}{\|x_k\|^2}, \qquad (A + E^{(k)}) x_k = b, \qquad \|E^{(k)}\| = \frac{\|r_k\|}{\|x_k\|}$$
Data: Tim Davis's sparse matrix collection
Real, symmetric positive-definite examples that include $b$
Plot $\log_{10} \|E^{(k)}\|$ for CG and MINRES
Backward error of CG vs MINRES on $A \succ 0$
[Figure: four panels plotting $\log(\|r\|/\|x\|)$ against iteration count for CG and MINRES on Simon/raefsky4 (19779 x 19779), Cannizzo/sts4098 (4098 x 4098), Schenk_AFE/af_shell8 (504855 x 504855), and BenElechi/BenElechi1 (245874 x 245874).]
Part II: LSQR and LSMR
LSQR $\equiv$ CG on the normal equation $A^T A x = A^T b$
LSMR $\equiv$ MINRES on the normal equation $A^T A x = A^T b$
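The equivalence can be tried directly in MATLAB: pcg applied to the normal equations through a function handle is "CG on $A^T A x = A^T b$" (in practice LSQR is preferred numerically, since it avoids explicitly squaring the conditioning). The test matrix below is a stand-in.

    m = 500;  n = 200;
    A = sprandn(m, n, 0.05);  b = randn(m, 1);
    AtA   = @(x) A'*(A*x);                   % operator form of A'*A, never formed explicitly
    x_ne  = pcg(AtA, A'*b, 1e-10, 2*n);      % CG on the normal equations
    x_lsq = lsqr(A, b, 1e-10, 2*n);          % built-in LSQR
    fprintf('relative difference: %.2e\n', norm(x_ne - x_lsq)/norm(x_lsq));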
What problems do LSQR and LSMR solve?
  solve $Ax = b$
  $\min \|Ax - b\|_2$
  $\min \|x\|$ s.t. $Ax = b$
  $\min \left\| \begin{pmatrix} A \\ \lambda I \end{pmatrix} x - \begin{pmatrix} b \\ 0 \end{pmatrix} \right\|_2$
Properties:
  $A$ is rectangular ($m \times n$) and often sparse
  $A$ can be an operator ($\Rightarrow$ allows preconditioning)
  $Av$, $A^T u$ plus $O(m + n)$ operations per iteration
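The damped problem in the last bullet can be handed to any LS solver by stacking the damping rows explicitly; a sketch with MATLAB's built-in lsqr follows. (The original LSQR and LSMR codes take the damping parameter as an input, which avoids forming the stacked matrix; the stand-in data below is only for illustration.)

    m = 300;  n = 100;  lambda = 1e-2;
    A = sprandn(m, n, 0.1);  b = randn(m, 1);
    Abar = [A; lambda*speye(n)];             % stacked damped least-squares problem
    bbar = [b; zeros(n, 1)];
    x_damped = lsqr(Abar, bbar, 1e-10, 4*n);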
Why invent another algorithm?
Reason one: CG vs MINRES
Reason two: monotone convergence of the residuals $\|r_k\|$ and $\|A^T r_k\|$
Measures of convergence for $\min \|Ax - b\|$:
$$r_k = b - A x_k, \qquad \|r_k\| \to \|\hat r\|, \qquad \|A^T r_k\| \to 0$$
[Figure: LPnetlib problem lp_fit1p (1677 x 627, nnz 9868); $\|r\|$ and $\log \|A^T r\|$ against iteration count for LSQR and LSMR.]
LSMR Derivation
Golub-Kahan bidiagonalization
Given $A$ ($m \times n$) and $b$ ($m \times 1$)
Direct bidiagonalization:
$$U^T \begin{pmatrix} b & A \end{pmatrix} \begin{pmatrix} 1 & \\ & V \end{pmatrix} = \begin{pmatrix} \beta_1 e_1 & B \end{pmatrix}, \qquad \text{i.e.} \qquad \begin{pmatrix} b & AV \end{pmatrix} = U \begin{pmatrix} \beta_1 e_1 & B \end{pmatrix}$$
Iterative bidiagonalization: Bidiag($A$, $b$)
Half a page in the 1965 Golub-Kahan SVD paper
Golub-Kahan bidiagonalization (2)
$$b = U_{k+1} (\beta_1 e_1), \qquad A V_k = U_{k+1} B_k, \qquad A^T U_k = V_k B_k^T \begin{pmatrix} I_k \\ 0 \end{pmatrix}$$
where
$$B_k = \begin{pmatrix} \alpha_1 & & & \\ \beta_2 & \alpha_2 & & \\ & \ddots & \ddots & \\ & & \beta_k & \alpha_k \\ & & & \beta_{k+1} \end{pmatrix}, \qquad U_k = \begin{pmatrix} u_1 & \cdots & u_k \end{pmatrix}, \qquad V_k = \begin{pmatrix} v_1 & \cdots & v_k \end{pmatrix}$$
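A minimal MATLAB sketch of Bidiag($A$, $b$), following the recurrences summarized above (no safeguards against breakdown; the names are illustrative):

    % beta_1 u_1 = b, alpha_1 v_1 = A'u_1, then for k = 1, 2, ...
    %   beta_{k+1}  u_{k+1} = A v_k     - alpha_k    u_k
    %   alpha_{k+1} v_{k+1} = A'u_{k+1} - beta_{k+1} v_k
    function [U, V, alpha, beta] = bidiag_sketch(A, b, kmax)
      [m, n] = size(A);
      U = zeros(m, kmax+1);  V = zeros(n, kmax+1);
      alpha = zeros(kmax+1, 1);  beta = zeros(kmax+1, 1);
      beta(1)  = norm(b);           U(:,1) = b / beta(1);
      alpha(1) = norm(A'*U(:,1));   V(:,1) = (A'*U(:,1)) / alpha(1);
      for k = 1:kmax
        u = A*V(:,k) - alpha(k)*U(:,k);
        beta(k+1)  = norm(u);   U(:,k+1) = u / beta(k+1);
        v = A'*U(:,k+1) - beta(k+1)*V(:,k);
        alpha(k+1) = norm(v);   V(:,k+1) = v / alpha(k+1);
      end
    end
    % B_k is the (k+1) x k lower bidiagonal matrix with alpha(1:k) on the
    % diagonal and beta(2:k+1) on the subdiagonal, as displayed above.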
Golub-Kahan bidiagonalization (3)
$V_k$ spans the Krylov subspace:
$$\mathrm{span}\{v_1, \ldots, v_k\} = \mathrm{span}\{A^T b, (A^T A) A^T b, \ldots, (A^T A)^{k-1} A^T b\}$$
Define $x_k = V_k y_k$. Subproblems to solve:
$$\min_{y_k} \|r_k\| = \min_{y_k} \|\beta_1 e_1 - B_k y_k\| \qquad \text{(LSQR)}$$
$$\min_{y_k} \|A^T r_k\| = \min_{y_k} \left\| \bar\beta_1 e_1 - \begin{pmatrix} B_k^T B_k \\ \bar\beta_{k+1} e_k^T \end{pmatrix} y_k \right\| \qquad \text{(LSMR)}$$
where $r_k = b - A x_k$ and $\bar\beta_k = \alpha_k \beta_k$.
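Using the bidiag_sketch above, the LSQR subproblem can be checked on a small dense problem by forming $B_k$ and solving the $(k+1) \times k$ least-squares system with backslash; LSQR and LSMR themselves never refactorize, updating $y_k$ and $x_k$ with short recurrences. A sketch under those assumptions:

    m = 200;  n = 50;  k = 10;
    A = randn(m, n);  b = randn(m, 1);
    [U, V, alpha, beta] = bidiag_sketch(A, b, k);   % sketch from the previous slide
    Bk = zeros(k+1, k);
    for j = 1:k
      Bk(j, j)   = alpha(j);                        % diagonal
      Bk(j+1, j) = beta(j+1);                       % subdiagonal
    end
    e1 = zeros(k+1, 1);  e1(1) = 1;
    yk = Bk \ (beta(1)*e1);                         % LSQR subproblem: min ||beta_1 e_1 - B_k y||
    xk = V(:, 1:k) * yk;
    % The LSMR subproblem replaces B_k by [Bk'*Bk; betabar*e_k'] and beta(1)*e1
    % by alpha(1)*beta(1)*e1, with betabar = alpha(k+1)*beta(k+1).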
Computational and storage requirements

                                 Storage                                  Work
                                 m           n                            m    n
  MINRES on $A^TAx = A^Tb$       $Av_1$      $x, v_1, v_2, w_1, w_2$           8
  LSQR                           $Av, u$     $x, v, w$                    3    5
  LSMR                           $Av, u$     $x, v, h, \bar h$            3    6

where $h_k$, $\bar h_k$ are scalar multiples of $w_k$, $\bar w_k$.
Theoretical properties for $\min \|Ax - b\|$

                              LSQR             LSMR
  $\|x - x_k\|$               HS 1952          HS 1952
  $\|r - r_k\|$               HS 1952          HS 1952
  $\|x_k\|$                   Steihaug 1983    Fong 2012
  $\|r_k\|$                   PS 1982          Fong 2012

  $\|A^T r_k\|$                                FS 2011
  $\|A^T r_k\| / \|r_k\|$     mostly           mostly

As before, each cited quantity is monotonic (errors and residual measures decrease, $\|x_k\|$ increases); the last row is monotonic only "mostly" in practice.
LSMR Experiments
Overdetermined systems
Test data: Tim Davis, University of Florida Sparse Matrix Collection
LPnetlib: Linear Programming Problems
$A = (\text{Problem.A})^T$ (transposed, so the systems are overdetermined), $b = \text{Problem.c}$ (127 problems)
Solve $\min \|Ax - b\|_2$ with LSQR and LSMR
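A sketch of one solver run, under stated assumptions: the stand-in matrix below replaces an LPnetlib problem, and an lsmr.m implementation (for example from the authors' software page) is assumed to be on the path with a call signature like lsmr(A, b, lambda, atol, btol, conlim, itnlim). Check the local help text, since that signature is an assumption here, not something taken from the talk.

    m = 2000;  n = 400;
    A = sprandn(m, n, 0.02);  b = randn(m, 1);
    x1 = lsqr(A, b, 1e-10, 4*n);                        % built-in LSQR
    x2 = lsmr(A, b, 0, 1e-10, 1e-10, 1e8, 4*n);         % assumed lsmr.m signature
    fprintf('||A''r||: LSQR %.2e, LSMR %.2e\n', ...
            norm(A'*(b - A*x1)), norm(A'*(b - A*x2)));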
Backward error estimates
$$A^T A \hat x = A^T b, \quad \hat r = b - A \hat x \qquad \text{(exact)}$$
$$(A + E_i)^T (A + E_i) x = (A + E_i)^T b, \quad r = b - A x \qquad \text{(any } x\text{)}$$
Two estimates given by Stewart (1975 and 1977):
$$E_1 = \frac{e x^T}{\|x\|^2}, \qquad E_2 = \frac{r r^T A}{\|r\|^2}, \qquad \|E_1\| = \frac{\|e\|}{\|x\|}, \qquad \|E_2\| = \frac{\|A^T r\|}{\|r\|}, \qquad e = \hat r - r$$
$\|E_2\|$ is computable during the iterations.
Theorem: $\|E_2^{\mathrm{LSMR}}\| \le \|E_2^{\mathrm{LSQR}}\|$
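Both estimates are a few lines of MATLAB; a sketch follows, computing the exact LS solution by a dense solve purely so that $E_1$ can be evaluated (only $\|E_2\|$ is available during an iterative run). The function name is illustrative.

    function [normE1, normE2] = stewart_estimates(A, b, x)
      r    = b - A*x;
      xhat = full(A) \ b;               % exact LS solution (small problems only)
      rhat = b - A*xhat;
      e    = rhat - r;
      normE1 = norm(e) / norm(x);       % ||E_1||
      normE2 = norm(A'*r) / norm(r);    % ||E_2||, computable during the iterations
    end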
$\log_{10} \|E_2\|$ for LSQR and LSMR (typical)
[Figure: lp_pilot_ja (2267 x 940, nnz 14977); $\log \|E_2\|$ against iteration count for LSQR and LSMR.]
$\log_{10} \|E_2\|$ for LSQR and LSMR (rare)
[Figure: lp_sc205 (317 x 205, nnz 665); $\log \|E_2\|$ against iteration count for LSQR and LSMR.]
Backward error - optimal $\mu(x)$
$$\min \|E\| \quad \text{s.t.} \quad (A + E)^T (A + E) x = (A + E)^T b$$
Exact $\mu(x)$ (Waldén, Karlson & Sun 1995; Higham 2002):
$$\mu(x) = \sigma_{\min}(C), \qquad C = \begin{pmatrix} A & \dfrac{\|r\|}{\|x\|} \left( I - \dfrac{r r^T}{\|r\|^2} \right) \end{pmatrix}$$
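The exact formula is easy to evaluate on small problems; a dense-SVD sketch follows (the matrix $C$ is $m \times (n+m)$, so this is for checking only, not for use inside a solver). The function name is illustrative.

    function mu = optimal_backward_error(A, b, x)
      m = size(A, 1);
      r = b - A*x;
      C = [full(A), (norm(r)/norm(x)) * (eye(m) - (r*r')/norm(r)^2)];
      mu = min(svd(C));                 % sigma_min(C)
    end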
Backward error - optimal $\mu(x)$
Cheaper estimate $\tilde\mu(x)$ (Grcar, Saunders & Su 2007):
$$K = \begin{pmatrix} A \\ \dfrac{\|r\|}{\|x\|} I \end{pmatrix}, \qquad v = \begin{pmatrix} r \\ 0 \end{pmatrix}, \qquad \min_y \|K y - v\|, \qquad \tilde\mu(x) = \frac{\|K y\|}{\|x\|}$$

    % Sparse-QR computation of the estimate (the slide's MATLAB snippet, lightly cleaned up):
    r   = b - A*x;
    n   = size(A, 2);
    p   = colamd(A);                   % column ordering for a sparser QR factor
    eta = norm(r)/norm(x);
    K   = [A(:,p); eta*speye(n)];
    v   = [r; zeros(n,1)];
    [c, R] = qr(K, v, 0);              % economy QR with c = Q'*v
    mutilde = norm(c)/norm(x);         % ||K*y|| = ||c|| for the LS solution y
Backward errors for LSQR (typical)
[Figure: lp_cre_a (7248 x 3516, nnz 18168); $\log$ of backward error ($E_2$, $E_1$, optimal) against iteration count.]
Backward errors for LSQR (rare)
[Figure: lp_pilot (4860 x 1441, nnz 44375); $\log$ of backward error ($E_2$, $E_1$, optimal) against iteration count.]
Backward errors for LSMR (typical)
[Figure: lp_ken_11 (21349 x 14694, nnz 49058); $\log$ of backward error ($E_2$, $E_1$, optimal) against iteration count.]
Backward errors for LSMR (rare)
[Figure: lp_ship12l (5533 x 1151, nnz 16276); $\log$ of backward error ($E_2$, $E_1$, optimal) against iteration count.]
For LSMR, $E_2 \approx$ optimal backward error almost always
Typical: $E_2 \approx \mu(x)$. Rare: $E_1 \approx \mu(x)$.
[Figure: two panels, lp_ken_11 (21349 x 14694) and lp_ship12l (5533 x 1151); $\log$ of backward error ($E_2$, $E_1$, optimal) for LSMR against iteration count.]
Optimal backward errors $\mu(x)$
Seem monotonic for LSMR; usually not for LSQR
[Figure: optimal backward error against iteration count for LSQR and LSMR; lp_scfxm3 (typical for both) and lp_cre_c (rare for LSQR, typical for LSMR).]
Optimal backward errors
$\mu(x^{\mathrm{LSMR}}) \le \mu(x^{\mathrm{LSQR}})$ almost always
[Figure: optimal backward error against iteration count for LSQR and LSMR; lp_pilot (typical) and lp_standgub (rare).]
Errors in $x_k$
$\|x_k^{\mathrm{LSQR}} - x\| \le \|x_k^{\mathrm{LSMR}} - x\|$ seems true; $\|x_k - x\|$ appears monotonically decreasing for both LSQR and LSMR
[Figure: $\log \|x_k - x^*\|$ against iteration count for LSQR and LSMR; lp_ship12l and lp_pds_02.]
Square consistent systems $Ax = b$
Backward error: $\|r_k\| / \|x_k\|$
LSQR slightly faster than LSMR in most cases
[Figure: $\log(\|r\|/\|x\|)$ against iteration count for LSQR and LSMR; Hamm/hcircuit (105676 x 105676) and IBM_EDA/trans5 (116835 x 116835).]
Underdetermined systems
Infinitely many solutions of $Ax = b$; unique solution of $\min \|x\|$ s.t. $Ax = b$
Theorem: LSQR and LSMR both return the minimum-norm solution
[Figure: $\log(\|r\|/\|x\|)$ against iteration count for LSQR and LSMR; U_lp_pds_10 (16558 x 49932) and U_lp_israel (174 x 316).]
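The theorem is easy to check numerically against the pseudoinverse on a small consistent underdetermined system; a sketch with the built-in lsqr follows (the same check applies to an lsmr.m if one is on the path).

    m = 60;  n = 200;
    A = randn(m, n);  b = randn(m, 1);        % consistent: b lies in range(A)
    xmn = pinv(A) * b;                        % minimum-norm solution
    xq  = lsqr(A, b, 1e-12, 500);
    fprintf('||x_lsqr - x_minnorm|| = %.2e\n', norm(xq - xmn));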
Summary
Theoretical properties for $Ax = b$, $A \succ 0$
CG and MINRES: $\|x - x_k\|$, $\|x - x_k\|_A$, $\|x_k\|$
MINRES: $\|r_k\|$, $\|r_k\| / \|x_k\|$, $\|r_k\| / (\alpha \|A\| \|x_k\| + \beta \|b\|)$
For MINRES, backward errors are monotonic $\Rightarrow$ safe to stop early
Theoretical properties for $\min \|Ax - b\|$
LSQR and LSMR: $\|x - x_k\|$, $\|r - r_k\|$, $\|r_k\|$, $\|x_k\|$; $x_k \to$ min-length $x$ if $\mathrm{rank}(A) < n$
LSMR: $\|A^T r_k\|$; $\|A^T r_k\| / \|r_k\|$ almost always; optimal backward error almost always
For LSMR, optimal backward errors seem monotonic $\Rightarrow$ safe to stop early
References:
LSMR: An iterative algorithm for sparse least-squares problems. David Fong and Michael Saunders, SISC 2011.
CG versus MINRES: An empirical comparison. David Fong and Michael Saunders, SQU Journal for Science 2012.
Kindest thanks: Georg Bock and colleagues; Phan Thanh An and colleagues.