CG and MINRES: An empirical comparison
School of Engineering
CG and MINRES: An empirical comparison
Prequel to LSQR and LSMR: Two least-squares solvers

David Fong and Michael Saunders
Institute for Computational and Mathematical Engineering, Stanford University

5th International Conference on High Performance Scientific Computing
March 5-9, 2012, Hanoi, Vietnam
Abstract

For iterative solution of symmetric systems $Ax = b$, the conjugate gradient method (CG) is commonly used when $A$ is positive definite, while the minimal residual method (MINRES) is typically reserved for indefinite systems. We investigate the sequence of solutions generated by each method and suggest that even if $A$ is positive definite, MINRES may be preferable to CG if iterations are to be terminated early.

The classic symmetric positive-definite system comes from the full-rank least-squares (LS) problem $\min \|Ax - b\|$. Specialization of CG and MINRES to the associated normal equation $A^TAx = A^Tb$ leads to LSQR and LSMR respectively. We include numerical comparisons of these two LS solvers because they motivated this retrospective study of CG versus MINRES.
Outline
1. CG and MINRES: the Lanczos process; properties; backward errors
2. LSQR and LSMR
3. LSMR derivation: Golub-Kahan bidiagonalization; properties
4. LSMR experiments: backward errors
5. Summary
Part I: CG and MINRES
Iterative algorithms for $Ax = b$, $A = A^T$, based on the Lanczos process.
Krylov-subspace methods: $x_k = V_k y_k$.
Lanczos process (summary)

$\beta_1 v_1 = b$, $\quad V_k = (v_1 \; v_2 \; \cdots \; v_k)$, $\quad AV_k = V_{k+1} H_k$

$T_k = \begin{pmatrix} \alpha_1 & \beta_2 & & \\ \beta_2 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_k \\ & & \beta_k & \alpha_k \end{pmatrix}, \qquad H_k = \begin{pmatrix} T_k \\ \beta_{k+1} e_k^T \end{pmatrix}$

$r_k = b - Ax_k = \beta_1 v_1 - AV_k y_k = V_{k+1}(\beta_1 e_1 - H_k y_k)$, $\quad$ Aim: $\beta_1 e_1 \approx H_k y_k$

Two subproblems:
  CG: $\quad T_k y_k = \beta_1 e_1$, $\quad x_k = V_k y_k$
  MINRES: $\quad \min \|H_k y_k - \beta_1 e_1\|$, $\quad x_k = V_k y_k$
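For concreteness, here is a minimal MATLAB sketch of the Lanczos recurrence that generates $V_k$ and $T_k$; the function name and the dense storage of $V$ are our choices, not part of the talk, and no reorthogonalization is attempted.

  % Lanczos sketch: beta(1)*v_1 = b, then for j = 1,2,...
  %   w = A*v_j - beta(j)*v_{j-1},  alpha(j) = v_j'*w,
  %   w = w - alpha(j)*v_j,         beta(j+1)*v_{j+1} = w.
  function [V, T] = lanczos(A, b, k)
    n = length(b);
    V = zeros(n, k);  alpha = zeros(k, 1);  beta = zeros(k+1, 1);
    beta(1) = norm(b);  V(:,1) = b / beta(1);
    vprev = zeros(n, 1);
    for j = 1:k
      w = A*V(:,j) - beta(j)*vprev;
      alpha(j) = V(:,j)'*w;
      w = w - alpha(j)*V(:,j);
      beta(j+1) = norm(w);           % breakdown if zero (invariant subspace found)
      if j < k, V(:,j+1) = w / beta(j+1); end
      vprev = V(:,j);
    end
    T = diag(alpha) + diag(beta(2:k), 1) + diag(beta(2:k), -1);  % tridiagonal T_k
  end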
Common practice for $Ax = b$, $A = A^T$:
  $A$ positive definite: use CG
  $A$ indefinite: use MINRES

Experiment: CG vs MINRES on $A \succ 0$.

Hestenes and Stiefel (1952) proposed both CG and CR for $A \succ 0$ and proved many properties. CR $\equiv$ MINRES when $A \succ 0$, and both minimize $\|r_k\| = \|b - Ax_k\|$ in the Krylov subspace.
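The experiment is easy to reproduce with MATLAB's built-in solvers; a sketch on a random SPD matrix (sprandsym here stands in for the talk's SuiteSparse test problems):

  % Compare CG and MINRES residual histories on an SPD system.
  n = 500;
  A = sprandsym(n, 0.01, 0.1, 1);       % random SPD test matrix, cond ~ 10
  b = randn(n, 1);
  tol = 1e-10;  maxit = n;
  [~,~,~,~, rvCG] = pcg(A, b, tol, maxit);      % residual norms, iterations 0..k
  [~,~,~,~, rvMR] = minres(A, b, tol, maxit);
  semilogy(0:length(rvCG)-1, rvCG, 0:length(rvMR)-1, rvMR)
  legend('CG', 'MINRES'), xlabel('iteration k'), ylabel('||r_k||')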
Theoretical properties for $Ax = b$, $A \succ 0$ (monotonic quantities, with the reference proving each):

  quantity             CG              CR (MINRES)
  $\|x^* - x_k\|$      HS 1952         HS 1952
  $\|x^* - x_k\|_A$    HS 1952         HS 1952
  $\|x_k\|$            Steihaug 1983   Fong 2012

For CR (MINRES) only:
  $\|r_k\|$            HS 1952
  $\|r_k\|/\|x_k\|$    Fong 2012

(The error norms decrease monotonically; $\|x_k\|$ increases monotonically.)
Backward error for square systems $Ax = b$

An approximate solution $x_k$ is acceptable iff there exist $E$, $f$ such that
  $(A + E)x_k = b + f$, $\quad \|E\| \le \alpha\|A\|$, $\quad \|f\| \le \beta\|b\|$

Smallest such perturbations (Titley-Peloquin 2010), writing $\psi = \alpha\|A\|\|x_k\| + \beta\|b\|$:
  $E = \dfrac{\alpha\|A\|}{\psi}\,\dfrac{r_k x_k^T}{\|x_k\|}$, $\quad f = -\dfrac{\beta\|b\|}{\psi}\,r_k$, $\quad \dfrac{\|E\|}{\|A\|} = \dfrac{\alpha\|r_k\|}{\psi}$, $\quad \dfrac{\|f\|}{\|b\|} = \dfrac{\beta\|r_k\|}{\psi}$

Stopping rule: $\|r_k\| \le \alpha\|A\|\|x_k\| + \beta\|b\|$
Backward error for square systems, $\beta = 0$:
  $E^{(k)} = \dfrac{r_k x_k^T}{\|x_k\|^2}$, $\quad (A + E^{(k)})x_k = b$, $\quad \|E^{(k)}\| = \dfrac{\|r_k\|}{\|x_k\|}$

Data: Tim Davis's sparse matrix collection; real, symmetric positive-definite examples that include $b$.
Plot $\log_{10}\|E^{(k)}\|$ for CG and MINRES.
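Since $\|E^{(k)}\| = \|r_k\|/\|x_k\|$ needs $x_k$ at every step, a hand-rolled CG loop is the simplest way to record it; a sketch (standard CG, variable names ours):

  % Track the backward error ||r_k|| / ||x_k|| along CG iterations.
  function be = cg_backward_error(A, b, kmax)
    x = zeros(size(b));  r = b;  p = r;
    rho = r'*r;  be = zeros(kmax, 1);
    for k = 1:kmax
      q = A*p;
      alpha = rho / (p'*q);
      x = x + alpha*p;
      r = r - alpha*q;
      rhonew = r'*r;
      p = r + (rhonew/rho)*p;
      rho = rhonew;
      be(k) = norm(r) / norm(x);   % ||E^(k)||, the beta = 0 backward error
    end
  end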
[Figures: backward error $\log_{10}(\|r\|/\|x\|)$ vs iteration count, CG vs MINRES, on four SPD matrices: Simon/raefsky4 (19779x19779), Cannizzo/sts4098 (4098x4098, nnz 72356), Schenk_AFE/af_shell8 (504855x504855), BenElechi/BenElechi1 (245874x245874).]
Part II: LSQR and LSMR
  LSQR $\equiv$ CG on $A^TAx = A^Tb$
  LSMR $\equiv$ MINRES on $A^TAx = A^Tb$
What problems do LSQR and LSMR solve?
  solve $Ax = b$
  $\min \|Ax - b\|_2$
  $\min \|x\|$ subject to $Ax = b$
  $\min \left\| \begin{pmatrix} A \\ \lambda I \end{pmatrix} x - \begin{pmatrix} b \\ 0 \end{pmatrix} \right\|_2$

Properties:
  $A$ is rectangular ($m \times n$) and often sparse
  $A$ can be an operator (allows preconditioning)
  $Av$, $A^Tu$ plus $O(m + n)$ operations per iteration
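Because the solvers touch $A$ only through the products $Av$ and $A^Tu$, MATLAB's lsqr accepts a function handle in place of a matrix. A sketch of the damped problem $\min \|(A; \lambda I)x - (b; 0)\|$ solved this way; the value of lambda and the helper name damped_op are assumptions for illustration:

  % Damped least squares min || [A; lambda*I]*x - [b; 0] || via an operator.
  lambda = 1e-3;                               % assumed damping parameter
  m = size(A, 1);  n = size(A, 2);
  Abar = @(x, flag) damped_op(A, lambda, m, x, flag);
  bbar = [b; zeros(n, 1)];
  x = lsqr(Abar, bbar, 1e-8, 500);

  function y = damped_op(A, lambda, m, x, flag)
    if strcmp(flag, 'notransp')                % y = [A; lambda*I] * x
      y = [A*x; lambda*x];
    else                                       % y = A'*u1 + lambda*u2 for u = [u1; u2]
      y = A'*x(1:m) + lambda*x(m+1:end);
    end
  end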
Why invent another algorithm?
  Reason one: CG vs MINRES.
  Reason two: monotone convergence of the residual norms $\|r_k\|$ and $\|A^Tr_k\|$.
Measure of convergence for $\min \|Ax - b\|$: with $r_k = b - Ax_k$,
  $\|r_k\| \to \|\hat r\|$ $\quad$ and $\quad \|A^Tr_k\| \to 0$

[Figures: $\|r\|$ and $\log \|A^Tr\|$ vs iteration count on lp_fit1p (1677x627, nnz 9868), first for LSQR alone, then for LSQR and LSMR together.]
LSMR Derivation
Golub-Kahan bidiagonalization
Given $A$ ($m \times n$) and $b$ ($m \times 1$).

Direct bidiagonalization:
  $U^T \begin{pmatrix} b & A \end{pmatrix} \begin{pmatrix} 1 & \\ & V \end{pmatrix} = \begin{pmatrix} \beta_1 e_1 & B \end{pmatrix}$, $\quad$ i.e. $\begin{pmatrix} b & AV \end{pmatrix} = U \begin{pmatrix} \beta_1 e_1 & B \end{pmatrix}$

Iterative bidiagonalization: Bidiag($A$, $b$), half a page in the 1965 Golub-Kahan SVD paper.
Golub-Kahan bidiagonalization (2)
  $b = U_{k+1}(\beta_1 e_1)$, $\quad AV_k = U_{k+1} B_k$, $\quad A^TU_k = V_k B_k^T \begin{pmatrix} I_k \\ 0 \end{pmatrix}$
where
  $B_k = \begin{pmatrix} \alpha_1 & & \\ \beta_2 & \alpha_2 & \\ & \ddots & \ddots \\ & & \beta_k & \alpha_k \\ & & & \beta_{k+1} \end{pmatrix}$, $\quad U_k = (u_1 \; \cdots \; u_k)$, $\quad V_k = (v_1 \; \cdots \; v_k)$
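A minimal MATLAB sketch of Bidiag($A$, $b$); again the function name and storage choices are ours:

  % Golub-Kahan recurrence:
  %   beta(1)*u_1 = b,                          alpha(1)*v_1 = A'*u_1,
  %   beta(j+1)*u_{j+1} = A*v_j - alpha(j)*u_j,
  %   alpha(j+1)*v_{j+1} = A'*u_{j+1} - beta(j+1)*v_j.
  function [U, V, B] = bidiag_gk(A, b, k)
    [m, n] = size(A);
    U = zeros(m, k+1);  V = zeros(n, k);
    alpha = zeros(k, 1);  beta = zeros(k+1, 1);
    beta(1) = norm(b);  U(:,1) = b / beta(1);
    for j = 1:k
      if j == 1
        v = A'*U(:,1);
      else
        v = A'*U(:,j) - beta(j)*V(:,j-1);
      end
      alpha(j) = norm(v);  V(:,j) = v / alpha(j);
      u = A*V(:,j) - alpha(j)*U(:,j);
      beta(j+1) = norm(u);  U(:,j+1) = u / beta(j+1);
    end
    B = [diag(alpha) + diag(beta(2:k), -1); zeros(1, k)];  % lower bidiagonal B_k,
    B(k+1, k) = beta(k+1);                                 % size (k+1) x k
  end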
Golub-Kahan bidiagonalization (3)
$V_k$ spans the Krylov subspace:
  $\mathrm{span}\{v_1, \ldots, v_k\} = \mathrm{span}\{A^Tb, (A^TA)A^Tb, \ldots, (A^TA)^{k-1}A^Tb\}$

Define $x_k = V_k y_k$. Subproblems to solve, where $r_k = b - Ax_k$ and $\bar\beta_k = \alpha_k \beta_k$:

  LSQR: $\quad \min_{y_k} \|r_k\| = \min_{y_k} \|\beta_1 e_1 - B_k y_k\|$
  LSMR: $\quad \min_{y_k} \|A^Tr_k\| = \min_{y_k} \left\| \bar\beta_1 e_1 - \begin{pmatrix} B_k^TB_k \\ \bar\beta_{k+1} e_k^T \end{pmatrix} y_k \right\|$
Computational and storage requirements (per iteration; $m$- and $n$-vectors):

                               Storage (m)   Storage (n)               Work (m)   Work (n)
  MINRES on $A^TAx = A^Tb$     $Av$          $x, v_1, v_2, w_1, w_2$   1          8
  LSQR                         $Av$, $u$     $x, v, w$                 3          5
  LSMR                         $Av$, $u$     $x, v, h, \bar h$         3          6

where $h_k$, $\bar h_k$ are scalar multiples of $w_k$, $\bar w_k$.
Theoretical properties for $\min \|Ax - b\|$ (monotonic quantities):

  quantity             LSQR            LSMR
  $\|x^* - x_k\|$      HS 1952         HS 1952
  $\|\hat r - r_k\|$   HS 1952         HS 1952
  $\|x_k\|$            Steihaug 1983   Fong 2012
  $\|r_k\|$            PS 1982         Fong 2012

  quantity                LSQR            LSMR
  $\|A^Tr_k\|$            not monotonic   FS 2011
  $\|A^Tr_k\|/\|r_k\|$    not monotonic   mostly
LSMR Experiments
Overdetermined systems

Test data: Tim Davis, University of Florida Sparse Matrix Collection; LPnetlib: linear programming problems.
  $A$ = (Problem.A)$^T$, $\quad b$ = Problem.c $\quad$ (127 problems)

Solve $\min \|Ax - b\|_2$ with LSQR and LSMR.
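A sketch of fetching one of these problems through the ssget interface to the collection and running LSQR on it; note that in current SuiteSparse releases the LP cost vector lives under Problem.aux (lsmr.m itself is distributed separately via the Stanford SOL software page):

  % Fetch an LPnetlib problem and solve min ||Ax - b|| with LSQR.
  Problem = ssget('LPnetlib/lp_fit1p');   % requires the ssget package
  A = Problem.A';                         % transpose: overdetermined, 1677 x 627
  b = Problem.aux.c;                      % cost vector ("Problem.c" on the slide)
  [x, flag, relres, iter] = lsqr(A, b, 1e-10, 1000);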
Backward error estimates
  $A^TA\hat x = A^Tb$, $\quad \hat r = b - A\hat x$ $\quad$ (exact)
  $(A + E_i)^T(A + E_i)x = (A + E_i)^Tb$, $\quad r = b - Ax$ $\quad$ (any $x$)

Two estimates given by Stewart (1975 and 1977):
  $E_1 = \dfrac{e x^T}{\|x\|^2}$, $\quad E_2 = \dfrac{r r^T A}{\|r\|^2}$, $\quad \|E_1\| = \dfrac{\|e\|}{\|x\|}$, $\quad \|E_2\| = \dfrac{\|A^Tr\|}{\|r\|}$, $\quad e = \hat r - r$ (computable)

Theorem: $\|E_2^{\mathrm{LSMR}}\| \le \|E_2^{\mathrm{LSQR}}\|$
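Both estimates are one-liners once the residuals are at hand; a sketch, where xhat is assumed to come from an exact (direct) solve on a small test problem:

  % Stewart's backward-error estimates for an approximate LS solution x.
  r    = b - A*x;
  rhat = b - A*xhat;                 % xhat: exact LS solution (assumed available)
  e    = rhat - r;
  E1norm = norm(e) / norm(x);        % ||E_1|| = ||e||    / ||x||
  E2norm = norm(A'*r) / norm(r);     % ||E_2|| = ||A'*r|| / ||r||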
[Figures: $\log_{10}\|E_2\|$ vs iteration count for LSQR and LSMR. Typical case: lp_pilot_ja (2267x940, nnz 14977). Rare case: lp_sc205 (317x205, nnz 665).]
Backward error: optimal $\mu(x)$
  $\mu(x) = \min_E \|E\|$ $\;$ s.t. $\;$ $(A + E)^T(A + E)x = (A + E)^Tb$

Exact $\mu(x)$ (Waldén, Karlson & Sun 1995; Higham 2002):
  $\mu(x) = \sigma_{\min}(C)$, $\quad C = \begin{pmatrix} A & \dfrac{\|r\|}{\|x\|}\left(I - \dfrac{rr^T}{\|r\|^2}\right) \end{pmatrix}$

Cheaper estimate $\tilde\mu(x)$ (Grcar, Saunders & Su 2007):
  $K = \begin{pmatrix} A \\ \dfrac{\|r\|}{\|x\|} I \end{pmatrix}$, $\quad v = \begin{pmatrix} r \\ 0 \end{pmatrix}$, $\quad \min_y \|Ky - v\|$, $\quad \tilde\mu(x) = \dfrac{\|Ky\|}{\|x\|}$

In MATLAB:

  r = b - A*x;
  n = size(A, 2);
  eta = norm(r) / norm(x);
  p = colamd(A);                 % sparse column ordering for the QR factorization
  K = [A(:,p); eta*speye(n)];
  v = [r; zeros(n,1)];
  [c, R] = qr(K, v, 0);          % economy QR: c = Q'*v, so ||K*y|| = ||c||
  mutilde = norm(c) / norm(x);
[Figures: backward-error estimates $E_2$, $E_1$, and optimal $\mu(x)$ along LSQR iterations. Typical case: lp_cre_a (7248x3516, nnz 18168). Rare case: lp_pilot (4860x1441, nnz 44375).]
[Figures: the same three quantities along LSMR iterations. Typical case: lp_ken_11 (21349x14694, nnz 49058). Rare case: lp_ship12l (5533x1151, nnz 16276).]
For LSMR, $E_2 \approx$ optimal backward error almost always. Typical: $E_2 \approx \mu(x)$ (lp_ken_11). Rare: $E_1 \approx \mu(x)$ (lp_ship12l). [Figures repeat the two LSMR panels above.]
Optimal backward errors $\mu(x)$: seem monotonic for LSMR, usually not for LSQR. [Figures: typical behavior for both on lp_scfxm3 (1800x990, nnz 8206); rare for LSQR, typical for LSMR, on lp_cre_c (6411x3068, nnz 15977).]
Optimal backward errors: $\mu(x^{\mathrm{LSMR}}) \le \mu(x^{\mathrm{LSQR}})$ almost always. [Figures: typical case lp_pilot (4860x1441, nnz 44375); rare case lp_standgub (1383x361, nnz 3338).]
Errors in $x_k$: $\|x_k^{\mathrm{LSQR}} - x^*\| \le \|x_k^{\mathrm{LSMR}} - x^*\|$ seems true, and $x_k \to x^*$ for both LSQR and LSMR. [Figures: lp_ship12l (5533x1151, nnz 16276) and lp_pds_02 (7716x2953, nnz 16571).]
Square consistent systems $Ax = b$. Backward error: $\|r_k\|/\|x_k\|$. LSQR is slightly faster than LSMR in most cases. [Figures: Hamm/hcircuit (105676x105676, nnz 513072) and IBM_EDA/trans5 (116835x116835, nnz 749800).]
Underdetermined systems: $Ax = b$ has infinitely many solutions; $\min \|x\|$ s.t. $Ax = b$ has a unique solution.
Theorem: LSQR and LSMR both return the minimum-norm solution.
[Figures: U_lp_pds_10 (16558x49932, nnz 107605) and U_lp_israel (174x316, nnz 2443).]
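The theorem is easy to check numerically, since the minimum-norm solution is also the pseudoinverse solution; a sketch on a random consistent system:

  % LSQR (from x0 = 0) matches the minimum-norm solution pinv(A)*b.
  m = 50;  n = 200;
  A = randn(m, n);  b = randn(m, 1);     % underdetermined, full row rank w.h.p.
  x_lsqr = lsqr(A, b, 1e-12, 500);
  x_min  = pinv(A) * b;                  % minimum-norm solution
  norm(x_lsqr - x_min) / norm(x_min)     % should be near machine precision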
Summary
Theoretical properties for $Ax = b$, $A \succ 0$:
  Monotonic for both CG and MINRES: $\|x^* - x_k\|$, $\quad \|x^* - x_k\|_A$, $\quad \|x_k\|$
  Monotonic for MINRES: $\|r_k\|$, $\quad \|r_k\|/\|x_k\|$, $\quad \|r_k\|/(\alpha\|A\|\|x_k\| + \beta\|b\|)$

For MINRES, backward errors are monotonic: safe to stop early.
Theoretical properties for $\min \|Ax - b\|$:
  Monotonic for both LSQR and LSMR: $\|x^* - x_k\|$, $\quad \|\hat r - r_k\|$, $\quad \|r_k\|$, $\quad \|x_k\|$; $\quad x_k \to$ min-length $x^*$ if rank($A$) $< n$
  LSMR: $\|A^Tr_k\|$ monotonic; $\quad \|A^Tr_k\|/\|r_k\|$ almost always; $\quad \approx$ optimal BE almost always

For LSMR, optimal backward errors seem monotonic: safe to stop early.
References
  LSMR: An iterative algorithm for sparse least-squares problems. David Fong and Michael Saunders, SISC 2011.
  CG versus MINRES: An empirical comparison. David Fong and Michael Saunders, SQU Journal for Science 2012.

Kindest thanks: Georg Bock and colleagues; Phan Thanh An and colleagues.