CG and MINRES: An empirical comparison


CG and MINRES: An empirical comparison
Prequel to LSQR and LSMR: Two least-squares solvers

David Fong and Michael Saunders
Institute for Computational and Mathematical Engineering, School of Engineering, Stanford University

5th International Conference on High Performance Scientific Computing
March 5-9, 2012, Hanoi, Vietnam

2/43 Abstract

For iterative solution of symmetric systems Ax = b, the conjugate gradient method (CG) is commonly used when A is positive definite, while the minimal residual method (MINRES) is typically reserved for indefinite systems. We investigate the sequence of solutions generated by each method and suggest that even if A is positive definite, MINRES may be preferable to CG if iterations are to be terminated early. The classic symmetric positive-definite system comes from the full-rank least-squares (LS) problem min ||Ax - b||. Specialization of CG and MINRES to the associated normal equation A^T A x = A^T b leads to LSQR and LSMR respectively. We include numerical comparisons of these two LS solvers because they motivated this retrospective study of CG versus MINRES.

3/43 Outline

1. CG and MINRES: the Lanczos process, properties, backward errors
2. LSQR and LSMR
3. LSMR derivation: Golub-Kahan bidiagonalization, properties
4. LSMR experiments: backward errors
5. Summary

4/43 Part I: CG and MINRES

Iterative algorithms for Ax = b, A = A^T, based on the Lanczos process.
Krylov-subspace methods: x_k = V_k y_k

5/43 Lanczos process (summary)

β_1 v_1 = b,   V_k = ( v_1  v_2  ...  v_k )

T_k = symmetric tridiagonal matrix with diagonal (α_1, ..., α_k) and off-diagonal (β_2, ..., β_k)

H_k = [ T_k ; β_{k+1} e_k^T ]   ((k+1) x k)

AV_k = V_{k+1} H_k

r_k = b - A x_k = β_1 v_1 - A V_k y_k = V_{k+1} (β_1 e_1 - H_k y_k),   Aim: make β_1 e_1 - H_k y_k small

Two subproblems:
  CG:      solve T_k y_k = β_1 e_1,              x_k = V_k y_k
  MINRES:  solve min || H_k y_k - β_1 e_1 ||,    x_k = V_k y_k
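As an illustration of the Lanczos process summarized above (a minimal sketch, not the presenters' code), the following MATLAB fragment builds V_k, T_k, H_k for a small symmetric matrix and checks AV_k = V_{k+1} H_k. Loss of orthogonality among the v_j is ignored, since the three-term recurrence itself guarantees the relation to rounding error.

% Minimal Lanczos sketch (illustrative only).
n = 100; k = 20;
A = sprandsym(n, 0.1) + n*speye(n);          % symmetric positive definite test matrix
b = randn(n, 1);
V = zeros(n, k+1); alpha = zeros(k,1); beta = zeros(k+1,1);
beta(1) = norm(b); V(:,1) = b / beta(1);     % beta_1 v_1 = b
for j = 1:k
    w = A*V(:,j);
    if j > 1, w = w - beta(j)*V(:,j-1); end
    alpha(j) = V(:,j)'*w;
    w = w - alpha(j)*V(:,j);
    beta(j+1) = norm(w);
    V(:,j+1) = w / beta(j+1);
end
T = diag(alpha) + diag(beta(2:k),1) + diag(beta(2:k),-1);   % T_k
H = [T; beta(k+1)*[zeros(1,k-1) 1]];                        % H_k = [T_k; beta_{k+1} e_k^T]
disp(norm(A*V(:,1:k) - V*H))    % small: A V_k = V_{k+1} H_k holds to rounding error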

6/43 Common practice

Ax = b,  A = A^T
  A positive definite:  use CG
  A indefinite:         use MINRES

Experiment: CG vs MINRES on A ≻ 0

Hestenes and Stiefel (1952) proposed both CG and CR for A ≻ 0 and proved many properties.
CR ≡ MINRES when A ≻ 0.
Both CR and MINRES minimize ||r_k|| = ||b - A x_k|| over the Krylov subspace.

7/43 Theoretical properties for Ax = b, A ≻ 0

Quantities that are monotonic, with the source of the proof:

                       CG              CR (MINRES)
  ||x - x_k||          HS 1952         HS 1952
  ||x - x_k||_A        HS 1952         HS 1952
  ||x_k||              Steihaug 1983   Fong 2012

CR (MINRES) only:
  ||r_k||              HS 1952
  ||r_k|| / ||x_k||    Fong 2012

8/43 Backward error for square systems Ax = b

An approximate solution x_k is acceptable iff there exist E, f such that
  (A + E) x_k = b + f,   ||E|| ≤ α ||A||,   ||f|| ≤ β ||b||

Smallest perturbations E, f (Titley-Peloquin 2010), with ψ = α ||A|| ||x_k|| + β ||b||:
  E = (α ||A|| / (ψ ||x_k||)) r_k x_k^T,      ||E|| / ||A|| = α ||r_k|| / ψ
  f = -(β ||b|| / ψ) r_k,                      ||f|| / ||b|| = β ||r_k|| / ψ

Stopping rule:  ||r_k|| ≤ ψ = α ||A|| ||x_k|| + β ||b||
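A minimal sketch of this stopping test in MATLAB (my illustration, not the authors' code); normA would in practice be a cheap estimate such as normest(A), or an incrementally updated estimate.

% Accept x_k iff ||r_k|| <= psi = alpha*||A||*||x_k|| + beta*||b||  (illustrative sketch).
% Example call: stop = backward_error_test(b - A*x, x, normest(A), norm(b), 1e-8, 1e-8)
function stop = backward_error_test(r, x, normA, normb, alpha, beta)
    psi  = alpha*normA*norm(x) + beta*normb;
    stop = (norm(r) <= psi);
end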

9/43 Backward error for square systems, β = 0

  E^(k) = (r_k x_k^T) / ||x_k||^2,   (A + E^(k)) x_k = b,   ||E^(k)|| = ||r_k|| / ||x_k||

Data: Tim Davis's sparse matrix collection
Real, symmetric positive definite examples that include b
Plot log_10 ||E^(k)|| for CG and MINRES

10/43 Backward error of CG vs MINRES on A ≻ 0

[Four plots of log10(||r||/||x||) vs iteration count for CG and MINRES on SPD test matrices: Simon/raefsky4 (n = 19779, nnz = 1316789), Cannizzo/sts4098 (n = 4098, nnz = 72356), Schenk_AFE/af_shell8 (n = 504855, nnz = 17579155), BenElechi/BenElechi1 (n = 245874, nnz = 13150496).]
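A minimal sketch of this kind of comparison using MATLAB's built-in pcg and minres (my own setup, not the presenters' experiment scripts); for simplicity it uses a synthetic SPD matrix and plots the residual norms ||r_k|| rather than ||r_k||/||x_k||, since pcg and minres do not return the iterate history.

% CG vs MINRES residual histories on an SPD system (illustrative sketch).
n = 2000;
A = sprandsym(n, 5e-3, 1e-4, 1);     % random SPD matrix, condition number about 1e4
b = randn(n, 1);
tol = 1e-10; maxit = 500;
[~,~,~,~,resvecCG] = pcg(A, b, tol, maxit);      % CG residual norms per iteration
[~,~,~,~,resvecMR] = minres(A, b, tol, maxit);   % MINRES residual norms per iteration
semilogy(0:numel(resvecCG)-1, resvecCG, 'b', 0:numel(resvecMR)-1, resvecMR, 'r');
legend('CG','MINRES'); xlabel('iteration k'); ylabel('||r_k||');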

11/43 Part II: LSQR and LSMR

LSQR ≡ CG on A^T A x = A^T b
LSMR ≡ MINRES on A^T A x = A^T b

12/43 What problems do LSQR and LSMR solve?

  solve Ax = b
  min ||Ax - b||_2
  min ||x||  s.t.  Ax = b
  min || [A; λI] x - [b; 0] ||_2

Properties:
  A is rectangular (m x n) and often sparse
  A can be an operator (which allows preconditioning)
  Work per iteration: products Av and A^T u plus O(m + n) operations
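Because A may be available only as an operator, here is a minimal sketch of calling MATLAB's built-in lsqr with a function handle (my illustration; the explicit sparse matrix inside the handle merely stands in for a matrix-free product).

% LSQR with A supplied as a function handle (illustrative sketch).
m = 500; n = 200;
A = sprandn(m, n, 0.05) + [speye(n); sparse(m-n, n)];   % full-column-rank test matrix
b = randn(m, 1);
afun = @(x, flag) applyA(A, x, flag);                   % operator interface expected by lsqr
[x, flag, relres, iter] = lsqr(afun, b, 1e-8, 200);
fprintf('flag = %d, relres = %.2e, iterations = %d\n', flag, relres, iter);

function y = applyA(A, x, flag)
    % Return A*x when flag is 'notransp' and A'*x when flag is 'transp'.
    if strcmp(flag, 'notransp'), y = A*x; else, y = A'*x; end
end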

13/43 Why invent another algorithm?

14/43 Reason one CG vs MINRES

15/43 Reason two: monotone convergence of ||r_k|| and ||A^T r_k||

16/43 min ||Ax - b||: measures of convergence

r_k = b - A x_k.  Want ||r_k|| → ||r̂|| (the optimal residual norm) and ||A^T r_k|| → 0.

[Plots of ||r_k|| and log10 ||A^T r_k|| vs iteration count on lp_fit1p (1677 x 627, nnz = 9868), first for LSQR alone, then for LSQR and LSMR.]

17/43 LSMR Derivation

18/43 Golub-Kahan bidiagonalization

Given A (m x n) and b (m x 1).

Direct bidiagonalization:
  U^T ( b  A ) diag(1, V) = U^T ( b  AV ) = ( β_1 e_1   B )

Iterative bidiagonalization: Bidiag(A, b)
Half a page in the 1965 Golub-Kahan SVD paper

19/43 Golub-Kahan bidiagonalization (2)

  b = U_{k+1} (β_1 e_1)
  A V_k = U_{k+1} B_k
  A^T U_k = V_k B_k^T [ I_k ; 0 ]

where B_k is the (k+1) x k lower bidiagonal matrix with diagonal (α_1, ..., α_k) and subdiagonal (β_2, ..., β_{k+1}),
U_k = ( u_1 ... u_k ),  V_k = ( v_1 ... v_k )
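A minimal MATLAB sketch of Bidiag(A, b) as summarized above (illustrative, not the authors' implementation); it builds U_{k+1}, V_k, B_k, checks A V_k = U_{k+1} B_k, and then solves the LSQR subproblem min ||β_1 e_1 - B_k y|| directly with backslash. Loss of orthogonality is ignored.

% Golub-Kahan bidiagonalization Bidiag(A, b) (illustrative sketch).
m = 300; n = 120; k = 30;
A = sprandn(m, n, 0.05); b = randn(m, 1);
U = zeros(m, k+1); V = zeros(n, k); alpha = zeros(k,1); beta = zeros(k+1,1);
beta(1) = norm(b); U(:,1) = b / beta(1);              % beta_1 u_1 = b
for j = 1:k
    v = A'*U(:,j); if j > 1, v = v - beta(j)*V(:,j-1); end
    alpha(j) = norm(v); V(:,j) = v / alpha(j);        % alpha_j v_j = A'u_j - beta_j v_{j-1}
    u = A*V(:,j) - alpha(j)*U(:,j);
    beta(j+1) = norm(u); U(:,j+1) = u / beta(j+1);    % beta_{j+1} u_{j+1} = A v_j - alpha_j u_j
end
B = zeros(k+1, k);                                    % lower bidiagonal B_k
for j = 1:k, B(j,j) = alpha(j); B(j+1,j) = beta(j+1); end
disp(norm(A*V - U*B, 'fro'))                          % small: A V_k = U_{k+1} B_k
y  = B \ [beta(1); zeros(k,1)];                       % LSQR subproblem: min ||beta_1 e_1 - B_k y||
xk = V*y;                                             % x_k = V_k y_k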

20/43 Golub-Kahan bidiagonalization (3)

V_k spans the Krylov subspace:
  span{v_1, ..., v_k} = span{A^T b, (A^T A) A^T b, ..., (A^T A)^{k-1} A^T b}

Define x_k = V_k y_k. Subproblems to solve:

  LSQR:  min_{y_k} ||r_k||      = min_{y_k} || β_1 e_1 - B_k y_k ||
  LSMR:  min_{y_k} ||A^T r_k||  = min_{y_k} || β̄_1 e_1 - [ B_k^T B_k ; β̄_{k+1} e_k^T ] y_k ||

where r_k = b - A x_k and β̄_k = α_k β_k.

21/43 Computational and storage requirements

                               Storage (m-vectors)    Storage (n-vectors)        Work (m)   Work (n)
  MINRES on A^T A x = A^T b    Av_1                   x, v_1, v_2, w_1, w_2                 8
  LSQR                         Av, u                  x, v, w                    3          5
  LSMR                         Av, u                  x, v, h, h̄                3          6

where h_k, h̄_k are scalar multiples of w_k, w̄_k.

22/43 Theoretical properties for min ||Ax - b||

Quantities that are monotonic, with the source of the proof:

                       LSQR            LSMR
  ||x - x_k||          HS 1952         HS 1952
  ||r - r_k||          HS 1952         HS 1952
  ||x_k||              Steihaug 1983   Fong 2012
  ||r_k||              PS 1982         Fong 2012

LSMR only:
  ||A^T r_k||              FS 2011
  ||A^T r_k|| / ||r_k||    mostly (observed)

23/43 LSMR Experiments

24/43 Overdetermined systems

Test data: Tim Davis, University of Florida Sparse Matrix Collection
LPnetlib: Linear Programming Problems (127 problems)
  A = (Problem.A)^T,  b = Problem.c
(the transpose makes each LP constraint matrix tall, so the system is overdetermined)

Solve min ||Ax - b||_2 with LSQR and LSMR
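A sketch of this experimental setup (my reconstruction; it assumes the SuiteSparse Matrix Collection MATLAB interface ssget is installed, and uses the built-in lsqr in place of the authors' LSQR and LSMR codes).

% Load one LPnetlib problem and run a least-squares solver on it (illustrative sketch).
Problem = ssget('LPnetlib/lp_fit1p');   % one of the 127 LPnetlib problems
A = Problem.A';                         % transpose so that the system is overdetermined
b = Problem.aux.c;                      % LP cost vector as right-hand side (the slide writes Problem.c)
[x, flag, relres, iter] = lsqr(A, b, 1e-10, 1000);
r = b - A*x;
fprintf('||r|| = %.3e, ||A''*r|| = %.3e, iterations = %d\n', norm(r), norm(A'*r), iter);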

25/43 Backward error estimates

  A^T A x̂ = A^T b,   r̂ = b - A x̂     (exact LS solution)
  (A + E_i)^T (A + E_i) x = (A + E_i)^T b,   r = b - A x     (any x)

Two estimates given by Stewart (1975 and 1977):
  E_1 = (e x^T) / ||x||^2,      ||E_1|| = ||e|| / ||x||,      where e = r̂ - r
  E_2 = (r r^T A) / ||r||^2,    ||E_2|| = ||A^T r|| / ||r||   (computable without x̂)

Theorem:  ||E_2^LSMR|| ≤ ||E_2^LSQR||
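A small MATLAB sketch of these two estimates (my illustration); the exact solution needed for E_1 is obtained here with backslash, which is only feasible for modest test problems.

% Stewart's backward-error estimates E1, E2 at an approximate LS solution x (illustrative).
m = 400; n = 150;
A = sprandn(m, n, 0.05) + [speye(n); sparse(m-n, n)];
b = randn(m, 1);
xhat = A \ b;  rhat = b - A*xhat;     % "exact" LS solution and residual (sparse QR)
[x, ~] = lsqr(A, b, 1e-4, 50);        % a deliberately rough approximate solution
r  = b - A*x;
e  = rhat - r;
E1 = norm(e) / norm(x);               % ||E1|| = ||e|| / ||x||   (needs rhat)
E2 = norm(A'*r) / norm(r);            % ||E2|| = ||A'r|| / ||r|| (computable)
fprintf('||E1|| = %.3e, ||E2|| = %.3e\n', E1, E2);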

26/43 log_10 ||E_2|| for LSQR and LSMR (typical)

[Plot of log10(E_2) vs iteration count for LSQR and LSMR on lp_pilot_ja (2267 x 940, nnz = 14977).]

27/43 log_10 ||E_2|| for LSQR and LSMR (rare)

[Plot of log10(E_2) vs iteration count for LSQR and LSMR on lp_sc205 (317 x 205, nnz = 665).]

28/43 Backward error - optimal µ(x)

  µ(x) = min ||E||  s.t.  (A + E)^T (A + E) x = (A + E)^T b

Exact µ(x) (Waldén, Karlson & Sun 1995; Higham 2002):
  C = [ A   (||r|| / ||x||) (I - r r^T / ||r||^2) ],     µ(x) = σ_min(C)
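For small problems this formula can be evaluated directly with a dense SVD; a minimal sketch following the slide's formula (my illustration):

% Optimal backward error mu(x) via the Walden-Karlson-Sun formula (dense, small problems only).
m = 60; n = 25;
A = randn(m, n); b = randn(m, 1);
[x, ~] = lsqr(A, b, 1e-3, 10);            % some approximate LS solution
r = b - A*x;
eta = norm(r) / norm(x);
C = [A, eta*(eye(m) - (r*r')/norm(r)^2)]; % C = [A  (||r||/||x||)(I - rr^T/||r||^2)]
mu = min(svd(C));                         % mu(x) = sigma_min(C)
fprintf('mu(x) = %.3e\n', mu);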

29/43 Backward error - optimal µ(x)

  µ(x) = min ||E||  s.t.  (A + E)^T (A + E) x = (A + E)^T b

Cheaper estimate µ̃(x) (Grcar, Saunders & Su 2007):
  K = [ A ; (||r|| / ||x||) I ],   v = [ r ; 0 ],
  ỹ = argmin_y ||K y - v||,        µ̃(x) = ||K ỹ|| / ||x||

MATLAB code (sparse QR):
  r = b - A*x;
  n = size(A,2);
  eta = norm(r)/norm(x);
  p = colamd(A);
  K = [A(:,p); eta*speye(n)];
  v = [r; zeros(n,1)];
  [c,R] = qr(K,v,0);
  mutilde = norm(c)/norm(x);
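A remark on the code above: the colamd permutation only reduces fill in the sparse QR factor R; it does not change the computed estimate, because reordering the columns of K leaves its column space unchanged. With the economy-size call [c,R] = qr(K,v,0), c = Q^T v, so norm(c) equals ||K ỹ|| directly.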

30/43 Backward errors for LSQR (typical)

[Plot of log10(backward error) vs iteration count for LSQR on lp_cre_a (7248 x 3516, nnz = 18168), showing E_2, E_1, and the optimal backward error.]

31/43 Backward errors for LSQR (rare)

[Same plot for lp_pilot (4860 x 1441, nnz = 44375).]

32/43 Backward errors for LSMR (typical)

[Plot of log10(backward error) vs iteration count for LSMR on lp_ken_11 (21349 x 14694, nnz = 49058), showing E_2, E_1, and the optimal backward error.]

33/43 Backward errors for LSMR (rare)

[Same plot for lp_ship12l (5533 x 1151, nnz = 16276).]

34/43 For LSMR, E_2 ≈ optimal backward error almost always

Typical: E_2 ≈ µ(x).  Rare: E_1 ≈ µ(x).

[Two plots of log10(backward error) vs iteration count for LSMR, showing E_2, E_1, and the optimal backward error: lp_ken_11 (21349 x 14694, nnz = 49058) and lp_ship12l (5533 x 1151, nnz = 16276).]

35/43 Optimal backward errors µ(x)

Seem monotonic for LSMR; usually not for LSQR.

[Two plots of log10(optimal backward error) vs iteration count for LSQR and LSMR: lp_scfxm3 (1800 x 990, nnz = 8206), typical for LSQR and LSMR; lp_cre_c (6411 x 3068, nnz = 15977), rare for LSQR, typical for LSMR.]

36/43 Optimal backward errors

µ(x_k^LSMR) ≤ µ(x_k^LSQR) almost always.

[Two plots of log10(optimal backward error) vs iteration count for LSQR and LSMR: lp_pilot (4860 x 1441, nnz = 44375), typical; lp_standgub (1383 x 361, nnz = 3338), rare.]

37/43 Errors in x_k

||x_k^LSQR - x*|| ≤ ||x_k^LSMR - x*|| appears to hold in the experiments.
||x_k - x*|| decreases monotonically for both LSQR and LSMR.

[Two plots of log10 ||x_k - x*|| vs iteration count for LSQR and LSMR: lp_ship12l (5533 x 1151, nnz = 16276), lp_pds_02 (7716 x 2953, nnz = 16571).]

38/43 Square consistent systems Ax = b

Backward error: ||r_k|| / ||x_k||
LSQR slightly faster than LSMR in most cases.

[Two plots of log10(||r||/||x||) vs iteration count for LSQR and LSMR: Hamm/hcircuit (105676 x 105676, nnz = 513072), IBM_EDA/trans5 (116835 x 116835, nnz = 749800).]

39/43 Underdetermined systems

Ax = b has infinitely many solutions; min ||x|| s.t. Ax = b has a unique solution.

Theorem: LSQR and LSMR both return the minimum-norm solution.

[Two plots of log10(||r||/||x||) vs iteration count for LSQR and LSMR: U_lp_pds_10 (16558 x 49932, nnz = 107605), U_lp_israel (174 x 316, nnz = 2443).]
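A small sketch checking this behaviour numerically with MATLAB's built-in lsqr (my illustration; the built-in solver plays the role of LSQR here, and the minimum-norm solution is obtained from the pseudoinverse for comparison).

% LSQR on an underdetermined consistent system returns the minimum-norm solution (illustrative).
m = 80; n = 200;
A = randn(m, n);                        % full row rank with probability 1
b = randn(m, 1);                        % consistent: any b lies in range(A)
[x_lsqr, ~] = lsqr(A, b, 1e-12, 500);   % started from x0 = 0, iterates stay in range(A^T)
x_min = pinv(A) * b;                    % minimum-norm solution for reference
fprintf('||A*x - b|| = %.2e, ||x_lsqr - x_min|| = %.2e\n', norm(A*x_lsqr - b), norm(x_lsqr - x_min));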

40/43 Summary

41/43 Theoretical properties for Ax = b, A ≻ 0

CG and MINRES (monotonic):
  ||x - x_k||,  ||x - x_k||_A,  ||x_k||

MINRES only (monotonic):
  ||r_k||,  ||r_k|| / ||x_k||,  ||r_k|| / (α ||A|| ||x_k|| + β ||b||)

For MINRES, backward errors are monotonic, so it is safe to stop early.

42/43 Theoretical properties for min ||Ax - b||

LSQR and LSMR (monotonic):
  ||x - x_k||,  ||r - r_k||,  ||r_k||,  ||x_k||
  x_k → the minimum-length solution if rank(A) < n

LSMR only (monotonic):
  ||A^T r_k||
  ||A^T r_k|| / ||r_k|| (almost always)
  optimal backward error (almost always)

For LSMR, optimal backward errors seem monotonic, so it is safe to stop early.

43/43 References

LSMR: An iterative algorithm for sparse least-squares problems. David Fong and Michael Saunders, SISC, 2011.
CG versus MINRES: An empirical comparison. David Fong and Michael Saunders, SQU Journal for Science, 2012.

Kindest thanks: Georg Bock and colleagues; Phan Thanh An and colleagues.