Dense LU factorization and its error analysis

Dense LU factorization and its error analysis
Laura Grigori
INRIA and LJLL, UPMC
February 2016

Plan
- Basics of floating point arithmetic and stability analysis
  (notation, results and proofs taken from [N.J.Higham, 2002])
- Direct methods of factorization
  - LU factorization
  - Error analysis of LU factorization - main results
  - Block LU factorization


Norms and other notation

  ||A||_F   = sqrt( sum_{i=1..n} sum_{j=1..n} |a_ij|^2 )
  ||A||_2   = sigma_max(A)
  ||A||_inf = max_{1<=i<=n} sum_{j=1..n} |a_ij|
  ||A||_1   = max_{1<=j<=n} sum_{i=1..n} |a_ij|

Inequalities |x| <= |y| and |A| <= |B| hold componentwise.
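
The four norms above can be checked on a small example; the helper names below are ours (not from the slides), plain Python with no libraries:

```python
import math

def frobenius_norm(A):
    # ||A||_F: square root of the sum of squared entries
    return math.sqrt(sum(a * a for row in A for a in row))

def inf_norm(A):
    # ||A||_inf: maximum absolute row sum
    return max(sum(abs(a) for a in row) for row in A)

def one_norm(A):
    # ||A||_1: maximum absolute column sum
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

A = [[1.0, -2.0], [3.0, 4.0]]
print(frobenius_norm(A))  # sqrt(30) ~ 5.477
print(inf_norm(A))        # 7.0  (row 2: 3 + 4)
print(one_norm(A))        # 6.0  (column 2: 2 + 4)
```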

Floating point arithmetic

The machine precision, or unit roundoff, is u:
- the maximum relative error for a given rounding procedure;
- u is of order 10^-8 in single precision and 2^-53 ~ 1.1 x 10^-16 in double precision.
Another common definition: the smallest number that, added to one, gives a result different from one.

The evaluation of the basic arithmetic operations +, -, *, / in floating point satisfies
  fl(x op y) = (x op y)(1 + delta),   |delta| <= u
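
A quick illustration (not from the slides) of u = 2^-53 for IEEE double precision, which is what Python floats use, assuming round-to-nearest arithmetic:

```python
u = 2.0 ** -53  # unit roundoff of IEEE double precision

# Adding u to 1 rounds back to 1 (halfway case, round-to-even),
# while adding 2u reaches the next representable number.
print(1.0 + u == 1.0)       # True
print(1.0 + 2 * u > 1.0)    # True

# 0.1, 0.2 and 0.3 are not exactly representable, so each operation
# carries a relative error bounded by u, as in the model fl(x op y):
print(0.1 + 0.2 == 0.3)                  # False
print(abs((0.1 + 0.2) - 0.3) < 4 * u)    # True
```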

Relative error

Given a real number x and its approximation x̂, the absolute error and the relative error are
  E_abs(x̂) = |x - x̂|,   E_rel(x̂) = |x - x̂| / |x|        (1)
The relative error is scale independent.

Some examples, which outline the difference with counting correct significant digits:
  x = 1.00000, x̂ = 1.00499, E_rel(x̂) = 4.99 x 10^-3
  x = 9.00000, x̂ = 8.99899, E_rel(x̂) = 1.12 x 10^-4

When x is a vector, the componentwise relative error is max_i |x_i - x̂_i| / |x_i|.
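
The two numeric examples above can be recomputed directly; the helper names are ours:

```python
def rel_err(x, xhat):
    # E_rel from (1)
    return abs(x - xhat) / abs(x)

print(rel_err(1.00000, 1.00499))  # ~4.99e-3
print(rel_err(9.00000, 8.99899))  # ~1.12e-4

def comp_rel_err(x, xhat):
    # componentwise relative error for vectors
    return max(abs(a - b) / abs(a) for a, b in zip(x, xhat))

print(comp_rel_err([1.0, 9.0], [1.00499, 8.99899]))  # ~4.99e-3
```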

Backward and forward errors

Consider y = f(x), a scalar function of a real scalar variable, and ŷ its approximation.
Ideally we would like the forward error to satisfy E_rel(ŷ) ~ u.
Instead we focus on the backward error: for what set of data have we solved the problem? That is, we look for
  min |Δx| such that ŷ = f(x + Δx)

Condition number

Assume f is twice continuously differentiable; then
  ŷ - y = f(x + Δx) - f(x) = f'(x) Δx + (f''(x + τ Δx) / 2!) (Δx)^2,   τ in (0, 1)
  (ŷ - y) / y = ( x f'(x) / f(x) ) (Δx / x) + O((Δx)^2)
The condition number is
  c(x) = | x f'(x) / f(x) |

Rule of thumb: when consistently defined, we have
  forward error <~ condition number x backward error

Preliminaries

Lemma (Lemma 3.1 in [N.J.Higham, 2002])
If |delta_i| <= u and rho_i = ±1 for i = 1:n, and nu < 1, then
  prod_{i=1..n} (1 + delta_i)^{rho_i} = 1 + theta_n,   |theta_n| <= nu / (1 - nu) =: gamma_n

Other notation:
  gamma-tilde_n = cnu / (1 - cnu)

Inner product in floating point arithmetic

Consider computing s_n = x^T y, with evaluation from left to right. Writing each rounding as a factor 1 ± delta, with |delta| <= u:
  ŝ_1 = fl(x_1 y_1) = x_1 y_1 (1 ± delta)
  ŝ_2 = fl(ŝ_1 + x_2 y_2) = (ŝ_1 + x_2 y_2 (1 ± delta))(1 ± delta)
      = x_1 y_1 (1 ± delta)^2 + x_2 y_2 (1 ± delta)^2
  ...
  ŝ_n = x_1 y_1 (1 ± delta)^n + x_2 y_2 (1 ± delta)^n + x_3 y_3 (1 ± delta)^{n-1} + ... + x_n y_n (1 ± delta)^2

After applying the previous lemma, we obtain
  ŝ_n = x_1 y_1 (1 + theta_n) + x_2 y_2 (1 + theta'_n) + ... + x_n y_n (1 + theta_2)

Inner product in FP arithmetic - error bounds

From
  ŝ_n = x_1 y_1 (1 + theta_n) + x_2 y_2 (1 + theta'_n) + ... + x_n y_n (1 + theta_2)
we obtain the following backward and forward error bounds:
  fl(x^T y) = (x + Δx)^T y = x^T (y + Δy),   |Δx| <= gamma_n |x|,   |Δy| <= gamma_n |y|
  |x^T y - fl(x^T y)| <= gamma_n sum_{i=1..n} |x_i y_i| = gamma_n |x|^T |y|

High relative accuracy is obtained when computing x^T x.
There is no guarantee of high accuracy when |x^T y| << |x|^T |y|.
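
The forward bound can be checked empirically (a sketch, not from the slides) by comparing a left-to-right floating-point dot product against an exact rational reference built with the stdlib `fractions` module:

```python
from fractions import Fraction

x = [0.1 * (i + 1) for i in range(8)]
y = [1.0 / (i + 2) for i in range(8)]
n = len(x)

# Left-to-right floating-point evaluation of x^T y.
s_float = 0.0
for xi, yi in zip(x, y):
    s_float += xi * yi

# Exact value: the float inputs are exact binary rationals.
s_exact = sum(Fraction(xi) * Fraction(yi) for xi, yi in zip(x, y))

u = 2.0 ** -53
gamma_n = n * u / (1 - n * u)
err = abs(float(s_exact) - s_float)
bound = gamma_n * sum(abs(xi * yi) for xi, yi in zip(x, y))
print(err <= bound)  # True: |x^T y - fl(x^T y)| <= gamma_n |x|^T |y|
```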


Algebra of the LU factorization

LU factorization: compute the factorization PA = LU.

Example: given the matrix
      [ 3  1  3 ]
  A = [ 6  7  3 ]
      [ 9 12  3 ]
let
       [  1       ]          [ 3  1  3 ]
  M1 = [ -2  1    ],  M1 A = [ 0  5 -3 ]
       [ -3  0  1 ]          [ 0  9 -6 ]
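
As a quick sanity check (plain Python; the `matmul` helper is ours, not from the slides), the elimination matrix M1 indeed zeroes the first column of A below the pivot:

```python
def matmul(X, Y):
    # naive dense matrix product of lists-of-rows
    n, m, p = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

A  = [[3, 1, 3], [6, 7, 3], [9, 12, 3]]
M1 = [[1, 0, 0], [-2, 1, 0], [-3, 0, 1]]
print(matmul(M1, A))  # [[3, 1, 3], [0, 5, -3], [0, 9, -6]]
```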

Algebra of the LU factorization

In general,
  A^(k+1) = M_k A^(k),
where
        [ I_{k-1}                    ]
  M_k = [          1                 ]  = I - m_k e_k^T,   M_k^{-1} = I + m_k e_k^T,
        [         -m_{k+1,k}  1      ]
        [             :          ..  ]
        [         -m_{n,k}         1 ]
e_k is the k-th unit vector, and e_i^T m_k = 0 for i <= k.

The factorization can be written as
  M_{n-1} ... M_1 A = A^(n) = U

Algebra of the LU factorization

We obtain
  A = M_1^{-1} ... M_{n-1}^{-1} U
    = (I + m_1 e_1^T) ... (I + m_{n-1} e_{n-1}^T) U
    = ( I + sum_{i=1..n-1} m_i e_i^T ) U
      [ 1                  ]
    = [ m_21  1            ] U  = LU
      [  :          ..     ]
      [ m_n1  m_n2  ...  1 ]

The need for pivoting

For stability, avoid division by small diagonal elements. For example,
      [ 0  3  3 ]
  A = [ 3  1  3 ]                                        (2)
      [ 6  2  3 ]
has an LU factorization if we permute the rows of matrix A:
       [ 6  2  3 ]   [ 1        ]   [ 6  2  3   ]
  PA = [ 0  3  3 ] = [ 0   1    ] . [ 0  3  3   ]        (3)
       [ 3  1  3 ]   [ 0.5 0  1 ]   [ 0  0  1.5 ]
Partial pivoting allows us to bound the multipliers, |m_ik| <= 1, and hence the entries of L by 1.
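
The pivoted factorization above can be verified exactly: taking the rows of A in the order (3, 1, 2), the product LU reproduces PA (the `matmul` helper is ours):

```python
def matmul(X, Y):
    # naive dense matrix product of lists-of-rows
    n, m, p = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

A = [[0, 3, 3], [3, 1, 3], [6, 2, 3]]
PA = [A[i] for i in (2, 0, 1)]          # rows 3, 1, 2 of A
L = [[1, 0, 0], [0, 1, 0], [0.5, 0, 1]]
U = [[6, 2, 3], [0, 3, 3], [0, 0, 1.5]]
print(matmul(L, U) == PA)  # True
```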

Existence of the LU factorization

Theorem. Given a full rank matrix A of size m x n, m >= n, the matrix A can be decomposed as A = PLU, where P is a permutation matrix of size m x m, L is a unit lower triangular matrix of size m x n, and U is a nonsingular upper triangular matrix of size n x n.

Proof (simpler proof for the square case): since A is full rank, there is a permutation P_1 such that the (1,1) entry a_11 of P_1 A is nonzero. Write the factorization as
          [ a_11  A_12 ]   [ 1           0 ]   [ a_11  A_12 ]
  P_1 A = [ A_21  A_22 ] = [ A_21/a_11   I ] . [ 0     S    ],
where S = A_22 - a_11^{-1} A_21 A_12. Since det(A) ≠ 0, det(S) ≠ 0. Continue the proof by induction on S.

Solving Ax = b by using Gaussian elimination

Composed of 4 steps:
1. Factor A = PLU, (2/3)n^3 flops
2. Compute P^T b to solve LUx = P^T b
3. Forward substitution: solve Ly = P^T b, n^2 flops
4. Backward substitution: solve Ux = y, n^2 flops
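
Steps 3 and 4 can be sketched as plain-Python triangular solves (helper names are ours); here they are applied to the factors computed in the pivoting example, with b chosen so that the exact solution is x = (1, 1, 1):

```python
def forward_sub(L, b):
    # Solve L y = b with L unit lower triangular, ~n^2 flops.
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    return y

def backward_sub(U, y):
    # Solve U x = y with U upper triangular, ~n^2 flops.
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

L = [[1, 0, 0], [0, 1, 0], [0.5, 0, 1]]
U = [[6, 2, 3], [0, 3, 3], [0, 0, 1.5]]
b = [11.0, 6.0, 7.0]          # = L U (1, 1, 1)^T
x = backward_sub(U, forward_sub(L, b))
print(x)  # [1.0, 1.0, 1.0]
```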

Algorithm to compute the LU factorization

Algorithm for computing the in-place LU factorization of a matrix of size n x n; #flops = 2n^3/3.
1:  for k = 1:n-1 do
2:    Let a_ik be the element of maximum magnitude in A(k:n, k)
3:    Permute row i and row k
4:    A(k+1:n, k) = A(k+1:n, k) / a_kk
5:    for i = k+1:n do
6:      for j = k+1:n do
7:        a_ij = a_ij - a_ik * a_kj
8:      end for
9:    end for
10: end for
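
A direct Python transcription of this in-place GEPP loop (a sketch; the function name and the returned row-permutation list are our choices):

```python
def lu_partial_pivoting(A):
    # In-place LU with partial pivoting; on return, A holds the
    # multipliers of L strictly below the diagonal and U on/above it.
    n = len(A)
    perm = list(range(n))
    for k in range(n - 1):
        # Pivot: row with the largest |a_ik| in column k.
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]               # multiplier, stored in place
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return perm

A = [[0.0, 3.0, 3.0], [3.0, 1.0, 3.0], [6.0, 2.0, 3.0]]
perm = lu_partial_pivoting(A)
print(perm)  # [2, 0, 1]
print(A)     # [[6.0, 2.0, 3.0], [0.0, 3.0, 3.0], [0.5, 0.0, 1.5]]
```

The result matches the factors shown in the pivoting example: U in the upper triangle and the multipliers 0.5 and 0 below it.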

Algorithm to compute the LU factorization

Left-looking approach, pivoting ignored, A of size m x n; #flops = n^2 m - n^3/3.
1: for k = 1:n do
2:   for j = k:n do
3:     u_kj = a_kj - sum_{i=1..k-1} l_ki u_ij
4:   end for
5:   for i = k+1:m do
6:     l_ik = ( a_ik - sum_{j=1..k-1} l_ij u_jk ) / u_kk
7:   end for
8: end for
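
The left-looking pseudocode above translates almost line for line into Python (a sketch, pivoting ignored as on the slide; the function name is ours), here run on the 3x3 example matrix from the elimination slides:

```python
def lu_left_looking(A, m, n):
    # Returns explicit unit lower triangular L (m x n) and upper
    # triangular U (n x n), computed column by column.
    L = [[0.0] * n for _ in range(m)]
    U = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for j in range(k, n):
            U[k][j] = A[k][j] - sum(L[k][i] * U[i][j] for i in range(k))
        L[k][k] = 1.0
        for i in range(k + 1, m):
            L[i][k] = (A[i][k] - sum(L[i][j] * U[j][k] for j in range(k))) / U[k][k]
    return L, U

A = [[3.0, 1.0, 3.0], [6.0, 7.0, 3.0], [9.0, 12.0, 3.0]]
L, U = lu_left_looking(A, 3, 3)
print(U[0], U[1])  # [3.0, 1.0, 3.0] [0.0, 5.0, -3.0]
```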

Error analysis of the LU factorization

Given that the first k-1 columns of L and the first k-1 rows of U were computed, we have
  a_kj = l_k1 u_1j + ... + l_{k,k-1} u_{k-1,j} + u_kj,   j = k:n
  a_ik = l_i1 u_1k + ... + l_ik u_kk,                    i = k+1:m
The computed elements of L̂ and Û satisfy
  | a_kj - sum_{i=1..k} l̂_ki û_ij | <= gamma_k sum_{i=1..k} |l̂_ki| |û_ij|,   j >= k,
  | a_ik - sum_{j=1..k} l̂_ij û_jk | <= gamma_k sum_{j=1..k} |l̂_ij| |û_jk|,   i > k.

Error analysis of the LU factorization (continued)

Theorem (Theorem 9.3 in [N.J.Higham, 2002])
Let A in R^{m x n}, m >= n, and let L̂ in R^{m x n} and Û in R^{n x n} be its computed LU factors obtained by Gaussian elimination (assuming GE did not fail). Then
  L̂ Û = A + ΔA,   |ΔA| <= gamma_n |L̂| |Û|.

Theorem (Theorem 9.4 in [N.J.Higham, 2002])
Let A in R^{n x n} and let x̂ be the computed solution to Ax = b obtained by using the computed LU factors of A from Gaussian elimination. Then
  (A + ΔA) x̂ = b,   |ΔA| <= gamma_{3n} |L̂| |Û|.

Error analysis of Ax = b

Theorem (Theorem 9.4 in [N.J.Higham, 2002], continued)
  (A + ΔA) x̂ = b,   |ΔA| <= gamma_{3n} |L̂| |Û|.

Proof. We have the following:
  L̂ Û = A + ΔA_1,     |ΔA_1| <= gamma_n |L̂| |Û|,
  (L̂ + ΔL) ŷ = b,     |ΔL| <= gamma_n |L̂|,
  (Û + ΔU) x̂ = ŷ,     |ΔU| <= gamma_n |Û|.
Thus
  b = (L̂ + ΔL)(Û + ΔU) x̂ = (A + ΔA_1 + L̂ ΔU + ΔL Û + ΔL ΔU) x̂ = (A + ΔA) x̂,
where
  |ΔA| <= (3 gamma_n + gamma_n^2) |L̂| |Û| <= gamma_{3n} |L̂| |Û|.

Wilkinson's backward error stability result

The growth factor g_W is defined as
  g_W = max_{i,j,k} |a_ij^(k)| / max_{i,j} |a_ij|
Note that |u_ij| = |a_ij^(i)| <= g_W max_{i,j} |a_ij|.

Theorem (Wilkinson's backward error stability result; see also [N.J.Higham, 2002] for more details)
Let A in R^{n x n} and let x̂ be the computed solution of Ax = b obtained by using GEPP. Then
  (A + ΔA) x̂ = b,   ||ΔA||_inf <= n^2 gamma_{3n} g_W(n) ||A||_inf.

The growth factor

The LU factorization is backward stable if the growth factor is small (grows linearly with n).
For partial pivoting, the growth factor satisfies g(n) <= 2^{n-1}, and this bound is attainable. In practice it is on the order of n^{2/3} to n^{1/2}.

The growth factor is exponential for the Wilkinson matrix
                 [  1   0   0  ...  0  1 ]
                 [ -1   1   0  ...  0  1 ]
  A = diag(±1) . [ -1  -1   1  ...  0  1 ]
                 [  :   :   :   ..  :  : ]
                 [ -1  -1  -1  ... -1  1 ]
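
A small experiment (a sketch; helper names are ours) reproduces the 2^{n-1} growth: on the Wilkinson matrix, partial pivoting performs no row exchanges, and each elimination step doubles the entries of the last column:

```python
def wilkinson(n):
    # 1 on the diagonal and in the last column, -1 strictly below the diagonal.
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 1.0
        A[i][n - 1] = 1.0
        for j in range(i):
            A[i][j] = -1.0
    return A

def growth_factor(A):
    # Run GE without row exchanges (on this matrix the pivot is already
    # maximal in its column) and track the largest entry seen.
    n = len(A)
    max0 = max(abs(a) for row in A for a in row)
    g = max0
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
        g = max(g, max(abs(a) for row in A for a in row))
    return g / max0

print(growth_factor(wilkinson(5)))  # 16.0, i.e. 2**(5-1)
```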

Experimental results for special matrices

Several error bounds for GEPP: the normwise backward error η and the componentwise backward error w (with r = b - Ax̂):
  η = ||r||_1 / ( ||A||_1 ||x̂||_1 + ||b||_1 )
  w = max_i |r_i| / ( |A| |x̂| + |b| )_i

  matrix    cond(A,2)  g_W     ||L||_1  cond(U,1)  ||PA-LU||_F/||A||_F  η        w_b
  hadamard  1.0E+0     4.1E+3  4.1E+3   5.3E+5     0.0E+0               3.3E-16  4.6E-15
  randsvd   6.7E+7     4.7E+0  9.9E+2   1.4E+10    5.6E-15              3.4E-16  2.0E-15
  chebvand  3.8E+19    2.0E+2  2.2E+3   4.8E+22    5.1E-14              3.3E-17  2.6E-16
  frank     1.7E+20    1.0E+0  2.0E+0   1.9E+30    2.2E-18              4.9E-27  1.2E-23
  hilb      8.0E+21    1.0E+0  3.1E+3   2.2E+22    2.2E-16              5.5E-19  2.0E-17

Block formulation of the LU factorization

Partition the matrix A of size m x n as
      [ A_11  A_12 ]
  A = [ A_21  A_22 ]
where A_11 is of size b x b, A_21 is of size (m-b) x b, A_12 is of size b x (n-b), and A_22 is of size (m-b) x (n-b).

Block LU algebra: the first iteration computes the factorization
            [ L_11          ]   [ I_b       ]   [ U_11  U_12    ]
  P_1^T A = [ L_21  I_{m-b} ] . [       A^1 ] . [       I_{n-b} ]
The algorithm continues recursively on the trailing matrix A^1.

Block LU factorization - the algorithm

1. Compute the LU factorization with partial pivoting of the first block column:
      [ A_11 ]   [ L_11 ]
  P_1 [ A_21 ] = [ L_21 ] U_11
2. Pivot by applying the permutation matrix P_1^T on the entire matrix: Ā = P_1^T A.
3. Solve the triangular system L_11 U_12 = Ā_12.
4. Update the trailing matrix: A^1 = Ā_22 - L_21 U_12.
5. Compute recursively the block LU factorization of A^1.
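
A compact sketch of this blocked algorithm for a square matrix, with pivoting omitted for brevity (step 1 would normally use GEPP on the panel); the function names are ours. On a matrix whose leading minors are nonsingular, the blocked and unblocked variants produce the same packed LU factors:

```python
def lu_unblocked(A):
    # In-place LU without pivoting (reference for comparison).
    n = len(A)
    for k in range(n - 1):
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]

def lu_blocked(A, b):
    # In-place blocked LU without pivoting, panel width b.
    n = len(A)
    for s in range(0, n, b):
        e = min(s + b, n)
        # Step 1: factor the panel A(s:n, s:e).
        for k in range(s, e):
            for i in range(k + 1, n):
                A[i][k] /= A[k][k]
                for j in range(k + 1, e):
                    A[i][j] -= A[i][k] * A[k][j]
        # Step 3: block row of U, solve L11 U12 = A12 (forward substitution).
        for j in range(e, n):
            for k in range(s, e):
                for i in range(k + 1, e):
                    A[i][j] -= A[i][k] * A[k][j]
        # Step 4: trailing matrix update A22 -= L21 U12.
        for i in range(e, n):
            for j in range(e, n):
                A[i][j] -= sum(A[i][k] * A[k][j] for k in range(s, e))

A1 = [[4.0, 3.0, 2.0, 1.0], [8.0, 10.0, 6.0, 2.0],
      [4.0, 7.0, 9.0, 3.0], [8.0, 8.0, 10.0, 11.0]]
A2 = [row[:] for row in A1]
lu_unblocked(A1)
lu_blocked(A2, 2)
print(A1 == A2)  # True
```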

LU Factorization as in ScaLAPACK

LU factorization on a P = P_r x P_c grid of processors.
For ib = 1 to n-1 step b
  A^(ib) = A(ib:n, ib:n)
  1. Compute panel factorization: find the pivot in each column, swap rows.
  2. Apply all row permutations: broadcast pivot information along the rows; swap rows at left and right.
  3. Compute block row of U: broadcast right the diagonal block of L of the current panel.
  4. Update trailing matrix: broadcast right the block column of L; broadcast down the block row of U.

Cost of LU Factorization in ScaLAPACK

LU factorization on a P = P_r x P_c grid of processors.
For ib = 1 to n-1 step b
  A^(ib) = A(ib:n, ib:n)
  1. Compute panel factorization: #messages = O( n log2 P_r )
  2. Apply all row permutations: #messages = O( (n/b)(log2 P_r + log2 P_c) )
  3. Compute block row of U: #messages = O( (n/b) log2 P_c )
  4. Update trailing matrix: #messages = O( (n/b)(log2 P_r + log2 P_c) )

Cost of parallel block LU

Consider a sqrt(P) x sqrt(P) grid and block size b. The cost is approximately
  gamma ( 2n^3 / (3P) + n^2 b / sqrt(P) )
  + beta ( n^2 log P / sqrt(P) )
  + alpha ( 1.5 n log P + 3.5 (n/b) log P ).

Acknowledgement

The stability analysis results presented are from [N.J.Higham, 2002]. Some of the examples are taken from [Golub and Van Loan, 1996].

References (1)

Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations (3rd ed.). Johns Hopkins University Press, Baltimore, MD, USA.

Higham, N. J. (2002). Accuracy and Stability of Numerical Algorithms. SIAM, second edition.

Schreiber, R. and Van Loan, C. (1989). A storage-efficient WY representation for products of Householder transformations. SIAM J. Sci. Stat. Comput., 10(1):53-57.