Error Bounds for Iterative Refinement in Three Precisions


Error Bounds for Iterative Refinement in Three Precisions
Erin C. Carson, New York University
Nicholas J. Higham, University of Manchester
SIAM Annual Meeting, Portland, Oregon, July 13, 2018

Hardware Support for Multiprecision Computation

Use of low precision in machine learning has driven the emergence of low-precision capabilities in hardware:
- Half precision (FP16) defined as a storage format in the 2008 IEEE standard
- ARM NEON: SIMD architecture, instructions for 8x16-bit, 4x32-bit, 2x64-bit operations
- AMD Radeon Instinct MI25 GPU, 2017: single: 12.3 TFLOPS, half: 24.6 TFLOPS
- NVIDIA Tesla P100, 2016: native ISA support for 16-bit FP arithmetic
- NVIDIA Tesla V100, 2017: tensor cores for half precision; 4x4 matrix multiply in one clock cycle; double: 7 TFLOPS, half+tensor: 112 TFLOPS (16x!)
- Google's Tensor Processing Unit (TPU): quantizes 32-bit FP computations into 8-bit integer arithmetic
- Aurora exascale supercomputer (2021): expected extensive support for reduced-precision arithmetic (32/16/8-bit)

Iterative Refinement for Ax = b

Iterative refinement is a well-established method for improving an approximate solution to Ax = b, where A is n x n and nonsingular and u is the unit roundoff:

    Solve Ax_0 = b by LU factorization
    for i = 0 : maxit
        r_i = b - A x_i
        Solve A d_i = r_i via d_i = U^{-1}(L^{-1} r_i)
        x_{i+1} = x_i + d_i

The classical variants differ only in the precisions used for each step:

- "Traditional" (high-precision residual computation): factorization, solve, and update in precision u; residual computed in precision u^2. [Wilkinson, 1948] (fixed point), [Moler, 1967] (floating point)
- "Fixed-precision": every step, including the residual, in precision u. [Jankowski and Woźniakowski, 1977], [Skeel, 1980], [Higham, 1991]
- "Low-precision factorization": factorization in precision u^{1/2}; residual, solve, and update in precision u. [Langou et al., 2006], [Arioli and Duff, 2009], [Hogg and Scott, 2010], [Abdelfattah et al., 2016]
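To make this concrete, here is a minimal MATLAB sketch of the low-precision-factorization variant, with single as the factorization precision and double as the working and residual precision; the test matrix, iteration cap, and stopping test are illustrative choices, not from the talk.

    % Low-precision-factorization IR: factorize in single (u_f),
    % iterate in double (u = u_r = u_f^2, roughly).
    n = 100;
    A = randn(n); b = randn(n, 1);
    [L, U, p] = lu(single(A), 'vector');     % factorization in low precision u_f
    x = double(U) \ (double(L) \ b(p));      % initial solve using the LU factors
    for i = 1:10
        r = b - A*x;                         % residual in working precision u
        d = double(U) \ (double(L) \ r(p));  % correction via the low-precision factors
        x = x + d;                           % update in working precision u
        if norm(b - A*x, inf) <= 1e-14*(norm(b, inf) + norm(A, inf)*norm(x, inf))
            break                            % backward error has reached ~u level
        end
    end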

Iterative Refinement in 3 Precisions

Existing analyses support at most two precisions. Can we combine the performance benefits of low-precision-factorization IR with the accuracy of traditional IR?

3-precision iterative refinement [C. and Higham, SIAM SISC 40(2), 2018]: u_f = factorization precision, u = working precision, u_r = residual precision, with u_f >= u >= u_r.

The new analysis generalizes the existing types of IR:
- Traditional: u_f = u, u_r = u^2
- Fixed precision: u_f = u = u_r
- Low-precision factorization: u_f^2 = u = u_r
(and improves upon the existing analyses in some cases)

It also enables new types of IR: (half, single, double), (half, single, quad), (half, double, quad), etc.

Key Analysis Innovations I

Obtain tighter upper bounds. The typical bound used in analyses is

$\|A(x - x_i)\|_\infty \le \|A\|_\infty \|x - x_i\|_\infty$.

Instead, define $\mu_i$ by

$\|A(x - x_i)\|_\infty = \mu_i \|A\|_\infty \|x - x_i\|_\infty$.

For a stable refinement scheme, in the early stages we expect

$\|r_i\| \approx u \|A\| \|x_i\| \ll \|A\| \|x - x_i\|$, so $\mu_i \ll 1$,

but close to convergence,

$\|r_i\| \approx \|A\| \|x - x_i\|$, so $\mu_i \approx 1$.

Key Analysis Innovations I (continued)

In the 2-norm, write $\|r_i\|_2 = \mu_i^{(2)} \|A\|_2 \|x - x_i\|_2$. With the SVD $A = U \Sigma V^T$,

$x - x_i = V \Sigma^{-1} U^T r_i = \sum_{j=1}^{n} \frac{u_j^T r_i}{\sigma_j} v_j$,

and keeping only the k smallest singular values,

$\|x - x_i\|_2 \ge \left( \sum_{j=n+1-k}^{n} \frac{(u_j^T r_i)^2}{\sigma_j^2} \right)^{1/2} \ge \frac{1}{\sigma_{n+1-k}} \left( \sum_{j=n+1-k}^{n} (u_j^T r_i)^2 \right)^{1/2} = \frac{\|P_k r_i\|_2}{\sigma_{n+1-k}}$,

where $P_k = U_k U_k^T$ and $U_k = [u_{n+1-k}, \dots, u_n]$. Hence

$\mu_i^{(2)} \le \frac{\|r_i\|_2}{\|P_k r_i\|_2} \cdot \frac{\sigma_{n+1-k}}{\sigma_1}$,

so $\mu_i^{(2)} \ll 1$ if $r_i$ contains a significant component in span$(U_k)$ for any k such that $\sigma_{n+1-k} \approx \sigma_n$.

We expect $\mu_i \ll 1$ when $r_i$ is "typical", i.e., contains sizeable components in the direction of each left singular vector. In that case $x - x_i$ is not "typical": it contains large components in the right singular vectors corresponding to small singular values of A.

Wilkinson (1977), in a comment in an unpublished manuscript: $\mu_i^{(2)}$ increases with i.
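Both regimes are easy to observe numerically. A minimal MATLAB sketch with an illustrative test matrix (not from the talk); since x is known by construction, $\mu_i^{(2)}$ can be computed directly:

    % Track mu_i^(2) = ||r_i||_2 / (||A||_2 ||x - x_i||_2) across refinement
    % steps: typically << 1 early on, growing toward 1 as x_i converges.
    n = 50;
    A = gallery('randsvd', n, 1e8);     % ill-conditioned random test matrix
    x = randn(n, 1); b = A*x;           % exact solution known by construction
    [L, U, p] = lu(single(A), 'vector');
    xi = double(U) \ (double(L) \ b(p));
    for i = 1:5
        r = b - A*xi;
        mu = norm(r) / (norm(A) * norm(x - xi));
        fprintf('iteration %d: mu = %.2e\n', i, mu);
        d = double(U) \ (double(L) \ r(p));
        xi = xi + d;
    end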

Key Analysis Innovations II

Allow for a general solver. Let $u_s$ be the effective precision of the solve, with $u \le u_s \le u_f$. (Example: for the LU solve, $u_s = u_f$.)

Assume the computed solution $\hat{d}_i$ to $A d_i = r_i$ satisfies:

1. $\hat{d}_i = (I + u_s E_i) d_i$ with $u_s \|E_i\|_\infty < 1$: the normwise relative forward error is bounded by a multiple of $u_s$ and is less than 1. For the LU solve, $u_s \|E_i\|_\infty \lesssim 3n u_f \, \| |A^{-1}| |\hat{L}| |\hat{U}| \|_\infty$.

2. $\|r_i - A \hat{d}_i\|_\infty \le u_s (c_1 \|A\|_\infty \|\hat{d}_i\|_\infty + c_2 \|r_i\|_\infty)$: the normwise relative backward error is at most $\max(c_1, c_2)\, u_s$. For the LU solve, $\max(c_1, c_2)\, u_s \lesssim 3n u_f \, \| |\hat{L}| |\hat{U}| \|_\infty / \|A\|_\infty$.

3. $|r_i - A \hat{d}_i| \le u_s G_i |\hat{d}_i|$: the componentwise relative backward error is bounded by a multiple of $u_s$. For the LU solve, $u_s G_i \lesssim 3n u_f |\hat{L}| |\hat{U}|$.

$E_i$, $c_1$, $c_2$, and $G_i$ depend on A, $r_i$, n, and $u_s$. (A numerical check of assumption 2 follows below.)
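For the LU solve these assumptions can be checked directly. A minimal MATLAB sketch of assumption 2, with an illustrative random system (not from the talk):

    % With u_s = u_f (single-precision LU), the normwise relative backward
    % error of the computed correction should be a modest multiple of u_f.
    n = 100;
    A = randn(n); r = randn(n, 1);
    [L, U, p] = lu(single(A), 'vector');
    d = double(U) \ (double(L) \ r(p));   % computed solution of A d = r
    bwerr = norm(r - A*d, inf) / (norm(A, inf)*norm(d, inf) + norm(r, inf));
    fprintf('backward error = %.2e, u_f = %.2e\n', bwerr, eps('single'));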

Forward Error for IR3

Three precisions: $u_f$ = factorization precision, $u$ = working precision, $u_r$ = residual computation precision. Define

$\kappa_\infty(A) = \|A^{-1}\|_\infty \|A\|_\infty$, $\quad \mathrm{cond}(A) = \| |A^{-1}| |A| \|_\infty$, $\quad \mathrm{cond}(A, x) = \| |A^{-1}| |A| |x| \|_\infty / \|x\|_\infty$.

Theorem [C. and Higham, SISC 40(2), 2018]. For IR in precisions $u_f \ge u \ge u_r$ and effective solve precision $u_s$, if

$\phi_i \equiv 2 u_s \min(\mathrm{cond}(A), \kappa_\infty(A) \mu_i) + u_s \|E_i\|_\infty$

is sufficiently less than 1, then the forward error is reduced on the i-th iteration by a factor approximately $\phi_i$ until an iterate $x_i$ is produced for which

$\frac{\|x - x_i\|_\infty}{\|x\|_\infty} \lesssim 4 N u_r \, \mathrm{cond}(A, x) + u$,

where N is the maximum number of nonzeros per row in A.

The analogous traditional bound is $\phi_i \approx 3 n u_f \kappa_\infty(A)$.
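As a worked instance of how the summary table below is produced, take (u_f, u, u_r) = (half, single, double) with the LU solve ($u_s = u_f$) and a dense matrix (N = n); dropping the dimension-dependent constants,

$\phi_i \approx u_f \, \kappa_\infty(A) \ll 1 \iff \kappa_\infty(A) \ll u_f^{-1} \approx 10^4, \qquad \frac{\|x - x_i\|_\infty}{\|x\|_\infty} \lesssim u_r \, \mathrm{cond}(A, x) + u \approx u \approx 10^{-8}$,

since $u_r \, \mathrm{cond}(A, x) = 10^{-16} \, \mathrm{cond}(A, x) \le u$ whenever $\mathrm{cond}(A, x) \le 10^8$.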

Normwise Backward Error for IR3

Theorem [C. and Higham, SISC 40(2), 2018]. For IR in precisions $u_f \ge u \ge u_r$ and effective solve precision $u_s$, if

$\phi_i \equiv (c_1 \kappa_\infty(A) + c_2) u_s$

is sufficiently less than 1, then the residual is reduced on the i-th iteration by a factor approximately $\phi_i$ until an iterate $x_i$ is produced for which

$\|b - A x_i\|_\infty \lesssim N u \, (\|b\|_\infty + \|A\|_\infty \|x_i\|_\infty)$,

where N is the maximum number of nonzeros per row in A.

IR3: Summary

Standard (LU-based) IR in three precisions (u_s = u_f). Unit roundoffs: half ~ 10^-4, single ~ 10^-8, double ~ 10^-16, quad ~ 10^-34.

              u_f   u   u_r   max κ∞(A)   backward error (norm / comp)   forward error
  LP fact.    H     S   S     10^4        10^-8  / 10^-8                 cond(A,x) * 10^-8
  New         H     S   D     10^4        10^-8  / 10^-8                 10^-8
  LP fact.    H     D   D     10^4        10^-16 / 10^-16                cond(A,x) * 10^-16
  New         H     D   Q     10^4        10^-16 / 10^-16                10^-16
  Fixed       S     S   S     10^8        10^-8  / 10^-8                 cond(A,x) * 10^-8
  Trad.       S     S   D     10^8        10^-8  / 10^-8                 10^-8
  LP fact.    S     D   D     10^8        10^-16 / 10^-16                cond(A,x) * 10^-16
  New         S     D   Q     10^8        10^-16 / 10^-16                10^-16

Benefit of IR3 vs. "LP fact.": no cond(A,x) term in the forward error.
Benefit of IR3 vs. traditional IR: as long as κ∞(A) <= 10^4, a lower-precision factorization can be used with no loss of accuracy!

Numerical experiments: A = gallery('randsvd', 100, 1e9, 2), b = randn(100,1), with κ∞(A) ≈ 2e10 and cond(A,x) ≈ 5e9.

[Convergence plots, not reproduced here: standard (LU-based) IR with (u_f, u, u_r) = (single, double, double), (single, double, quad), and (double, double, quad).]
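The test problem is easy to reproduce; a minimal MATLAB sketch (the seed is an illustrative choice, and mode 2 of randsvd gives one small singular value, so the exact condition numbers are seed-dependent):

    % Test problem from the experiments: randsvd matrix with 2-norm
    % condition number 1e9 (mode 2) and a random right-hand side.
    rng(1);                                  % illustrative seed
    A = gallery('randsvd', 100, 1e9, 2);
    b = randn(100, 1);
    x = A \ b;                               % reference solution
    kappa_inf = norm(inv(A), inf) * norm(A, inf)                        % talk reports ~2e10
    cond_A_x  = norm(abs(inv(A)) * abs(A) * abs(x), inf) / norm(x, inf) % talk reports ~5e9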

GMRES-Based Iterative Refinement

Observation [Rump, 1990]: if $\hat{L}$ and $\hat{U}$ are the computed LU factors of A in precision $u_f$, then

$\kappa(\hat{U}^{-1} \hat{L}^{-1} A) \approx 1 + \kappa(A) u_f$,

even if $\kappa(A) u_f \gtrsim 1$.

GMRES-IR [C. and Higham, SISC 39(6), 2017]: to compute the updates $d_i$, apply GMRES to the preconditioned system $\tilde{A} d_i = \tilde{r}_i$, where $\tilde{A} = \hat{U}^{-1} \hat{L}^{-1} A$ and $\tilde{r}_i = \hat{U}^{-1} \hat{L}^{-1} r_i$:

    Solve Ax_0 = b by LU factorization
    for i = 0 : maxit
        r_i = b - A x_i
        Solve A d_i = r_i via GMRES on Ã d_i = r̃_i    (u_s = u)
        x_{i+1} = x_i + d_i
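A minimal MATLAB sketch of this loop, using the built-in gmres with the low-precision LU factors as a left preconditioner; the tolerance and iteration caps are illustrative choices (the codes from the talk are at https://github.com/eccarson/ir3):

    % GMRES-IR sketch: a single-precision LU factorization preconditions
    % GMRES, which computes each correction d_i in double precision.
    n = 100;
    A = gallery('randsvd', n, 1e9, 2); b = randn(n, 1);
    [L, U, P] = lu(single(A));               % low-precision factorization, P*A = L*U
    L = double(L); U = double(U); P = double(P);
    x = U \ (L \ (P*b));                     % initial solve
    for i = 1:10
        r = b - A*x;                         % residual (quad in the talk; double here)
        % Left preconditioner M = M1*M2 = (P'*L)*U ~ A, so GMRES effectively
        % iterates on U^{-1} L^{-1} P A d = U^{-1} L^{-1} P r.
        [d, flag] = gmres(A, r, [], 1e-8, n, P'*L, U);
        x = x + d;                           % update in working precision
    end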

A = gallery('randsvd', 100, 1e9, 2), b = randn(100,1), with κ∞(A) ≈ 2e10, cond(A,x) ≈ 5e9, and κ∞(Ã) ≈ 2e4.

[Convergence plot, not reproduced here: GMRES-IR with u_f: single, u: double, u_r: quad.]

GMRES-IR: Summary

Benefits of GMRES-IR:

              u_f   u   u_r   max κ∞(A)   backward error (norm / comp)   forward error
  LU-IR       H     S   D     10^4        10^-8  / 10^-8                 10^-8
  GMRES-IR    H     S   D     10^8        10^-8  / 10^-8                 10^-8
  LU-IR       S     D   Q     10^8        10^-16 / 10^-16                10^-16
  GMRES-IR    S     D   Q     10^16       10^-16 / 10^-16                10^-16
  LU-IR       H     D   Q     10^4        10^-16 / 10^-16                10^-16
  GMRES-IR    H     D   Q     10^12       10^-16 / 10^-16                10^-16

With GMRES-IR, the lower-precision factorization works for larger κ∞(A): the constraint becomes κ∞(A) <= u^{-1/2} u_f^{-1}. So if κ∞(A) <= 10^12 (for half/double/quad), a lower-precision factorization can be used with no loss of accuracy!

Try IR3! MATLAB codes available at: https://github.com/eccarson/ir3

Performance results? Stay tuned for the next talk by Azzam Haidar on the 3-precision approach on an NVIDIA V100.

Comments and Caveats

Convergence tolerance τ for GMRES?
- Smaller τ: more GMRES iterations, but potentially fewer refinement steps.
- Larger τ: fewer GMRES iterations, but potentially more refinement steps.

Convergence rate of GMRES?
- If A is ill conditioned and the LU factorization is performed in very low precision, the factors can make a poor preconditioner: e.g., if Ã still has a cluster of eigenvalues near the origin, GMRES can stagnate until the n-th iteration, regardless of κ∞(Ã) [Liesen and Tichý, 2004].
- Potential remedies: deflation, Krylov subspace recycling, or an additional preconditioner.

Depending on the conditioning of A, applying Ã to a vector must be done accurately (in precision u^2) in each GMRES iteration.

Why GMRES? For theoretical purposes: an existing analysis and proof of backward stability [Paige, Rozložník, Strakoš, 2006]. In practice, use any solver you want!

Thank You!

erinc@cims.nyu.edu
math.nyu.edu/~erinc