CS 542G: Conditioning, BLAS, LU Factorization

Robert Bridson

September 22, 2008

1 Why some RBF Kernel Functions Fail

We derived some sensible RBF kernel functions, like φ(r) = r^2 log r, from basic principles such as minimizing curvature. The fact that we were seeking an interpolating function which minimizes a non-negative quantity (the integral of the squared Laplacian) gave us some confidence that we would at least get a solution, even if it might not be unique. This is not a proof, of course, but it is a good step on the way to one.

However, other arbitrary choices of RBF kernel, such as φ(r) = r^2, which doesn't look much different, can fail dramatically. In the r^2 case, it's because we are unwittingly choosing a basis function which is a quadratic polynomial. (With odd powers like r or r^3 we don't get polynomials, since r is actually the norm of x - x_i, or in 1D the absolute value |x - x_i|, not a linear term; with quadratics or other even powers that norm or absolute value makes no difference: |x - x_i|^2 = (x - x_i)^2.) Thus we are accidentally restricting the interpolant f(x) to be a quadratic polynomial, i.e. to come from a small-dimensional space (the space of quadratic polynomials in 1D is 3-dimensional), independent of n. This means the linear system we have to solve can't have rank greater than this small number (in 1D, rank three) no matter how big n is: once n exceeds that dimension, the linear system is singular and in general has no solution.

2 The Condition Number

Last time we arrived at a quantifiable measure of how well-posed a linear system Ax = b is: the condition number κ(A) of the matrix. No matter which algorithm is chosen, relative errors in computing A and/or b, not to mention floating point rounding errors during the solution, may be amplified by this much. Practically speaking, with double-precision floating point numbers, this puts an upper bound of about 10^12 on κ(A), give or take an order of magnitude or two, beyond which the problem is hopeless to solve as stated.
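To see the failure concretely, here is a minimal NumPy sketch (not part of the original notes; the 1D centers are arbitrary) that builds the kernel matrix A[i, j] = φ(|x_i - x_j|) for φ(r) = r^2 and inspects its rank and condition number:

    import numpy as np

    n = 20
    x = np.linspace(0.0, 1.0, n)              # arbitrary 1D centers
    A = (x[:, None] - x[None, :]) ** 2        # A[i, j] = |x_i - x_j|^2

    print(np.linalg.matrix_rank(A))           # typically 3: the dimension of 1D quadratics
    print(np.linalg.cond(A))                  # enormous: the matrix is numerically singular

The rank stalls at three no matter how large n is, and the reported condition number is around the reciprocal of machine epsilon or worse, which is exactly the regime discussed next.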

Furthermore, if software reports a condition number of 10^16 or more (i.e. near the reciprocal of machine epsilon for double-precision floating point) it's highly likely the matrix is truly singular, and the notion of solving the linear system must be considered very carefully; more on this later. The above example of using φ(r) = r^2 for RBFs, if you try it out numerically, illustrates this point: in double-precision floating point a package like MATLAB will estimate the condition number as greater than 10^16 but not infinite, simply because round-off error perturbs the matrix back to (barely) non-singular.

Incidentally, you may have noticed already a possible cause for concern with the condition number: from its definition κ(A) = ||A|| ||A^-1|| it's not clear how easy it is to compute. Finding the inverse of a matrix is at least as hard as solving a linear system with the matrix. If we want to use the 2-norm, potentially difficult eigenvalue problems must be solved; it's a fairly simple exercise to show that

    κ_2(A) = sqrt(λ_max(A^T A)) / sqrt(λ_min(A^T A)),

i.e. the ratio of the square roots of the maximum and minimum eigenvalues of the matrix A^T A (we'll discuss this in much greater detail later). In fact, for the most part in practical computation the condition number is never calculated exactly. Instead, it may be computed exactly for small model problems, numerically estimated through various means for large matrices, or theoretically bounded through knowledge of how the matrix was produced.

3 Software for Basic Linear Algebra Operations

Before finally getting to algorithms for solving a linear system, now is a great time for an aside about software for doing linear algebra. We'll visit several libraries throughout this course as they become relevant. The first one to know about is the BLAS, short for Basic Linear Algebra Subprograms, which is a standardized API for simple matrix and vector operations with many implementations available. See www.netlib.org/blas for the official definition, example source code, papers and more.

The BLAS originally arose out of readability concerns: FORTRAN, for a long time the standard language of scientific computing, was in its earlier incarnations restricted to six characters for identifiers, leading to cryptic function names such as XERBLA or DGEMV. Different scientific computing codes had their own, equally incomprehensible, names for common operations such as taking the dot-product of two vectors, and while the update to FORTRAN allowing readable identifiers would come much later (in Fortran 90), at least the community could standardize on one set of cryptic function names for these operations.

Later, the big advantage of the BLAS became high performance more than readability: as architectures grew more complex, and more sophisticated implementation was required to achieve high performance, it became useful to let vendors do the hard work of optimizing BLAS routines that anyone could access through the API.
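To get a rough feel for the performance gap (an illustrative sketch, not from the notes; absolute timings depend entirely on your machine and on which BLAS your NumPy links against), compare a naive triple-loop matrix multiply with NumPy's @ operator, which dispatches to an optimized BLAS GEMM:

    import time
    import numpy as np

    def naive_matmul(A, B):
        """Textbook triple loop: one multiply-add at a time, no blocking."""
        n, p = A.shape
        m = B.shape[1]
        C = np.zeros((n, m))
        for i in range(n):
            for j in range(m):
                s = 0.0
                for k in range(p):
                    s += A[i, k] * B[k, j]
                C[i, j] = s
        return C

    n = 200
    A = np.random.rand(n, n)
    B = np.random.rand(n, n)

    t0 = time.perf_counter(); C1 = naive_matmul(A, B); t1 = time.perf_counter()
    t2 = time.perf_counter(); C2 = A @ B;              t3 = time.perf_counter()
    print("naive: %.3f s   BLAS-backed: %.6f s" % (t1 - t0, t3 - t2))
    print("max difference:", np.abs(C1 - C2).max())

Interpreter overhead exaggerates the gap here, but even a straightforward compiled loop typically loses by a wide margin to a tuned BLAS, for the cache and vectorization reasons discussed in this section.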

Today's vendors (Intel, AMD, Apple, NVIDIA, Sun, etc.) do supply such optimized implementations; there are also free versions such as ATLAS, or the Goto BLAS (currently the fastest on several machines). To varying degrees these exploit all the things necessary for good performance: vectorization, cache prefetching, careful blocking around cache line and page sizes, multithreading, etc. They are also well tested and should handle difficult cases properly.

The BLAS is divided into three levels, which originally came in that chronological order:

- Level 1 deals with vector operations that should run in O(n) time (for vectors of length n): taking the norm of a vector, computing dot-products, adding two vectors, etc.

- Level 2 deals with matrix-vector operations that should run in O(n^2) time (for n × n matrices): e.g. multiplying a matrix times a vector, or solving a lower-triangular or upper-triangular linear system.

- Level 3 deals with matrix-matrix operations that run in O(n^3) time: e.g. multiplying two matrices together.

Only the simplest operations are supported (for example, solving a general system of equations is not part of the BLAS), but they do cover many types of matrices, reflected in the naming convention of the routines. For example, the name of the Level 2 routine DGEMV can be deconstructed as follows:

- The type prefix D indicates double-precision floating point real numbers (S indicates single-precision reals, C single-precision complex numbers, and Z double-precision complex numbers).

- The character pair GE indicates a general matrix is being used (other possibilities include symmetric matrices, symmetric matrices packed to use just over half the storage, banded matrices, triangular matrices, ...).

- The final MV identifies this as a matrix-vector multiply.

There are similar routines such as SGEMV, DSYMV and DTRMV, to name a few variations. Further options include different strides for storing matrices, specifying that the transpose of the matrix be used, and more.

The BLAS are callable from FORTRAN 77, which makes them a little baroque when used from C, for example: FORTRAN arguments are always passed by reference (i.e. as a pointer), including scalars such as the dimension of a vector, and depending on the platform the function names, when called from C, may need to be lowercase with an underscore appended as a suffix, like dgemv_.
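As an aside (not something the original notes use), SciPy exposes thin wrappers around whatever BLAS it was built against, which is a painless way to experiment with the naming convention:

    import numpy as np
    from scipy.linalg import blas

    A = np.asfortranarray(np.random.rand(4, 3))   # column-major storage, the FORTRAN convention
    x = np.random.rand(3)

    # dgemv: D = double precision, GE = general matrix, MV = matrix-vector multiply.
    # It computes alpha * A @ x (plus beta * y if a y vector is supplied).
    y = blas.dgemv(1.0, A, x)
    print(np.allclose(y, A @ x))                  # True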

There is an official C version of the API, the CBLAS, but it's not as widely supported. Personally I write my own user-friendly wrappers in C++, with readable names and with the options I'm not in the habit of using filled in with default values.

The bottom line about the BLAS is that many scientific computing programs can be structured so that most of the time is spent doing these primitive linear algebra operations, so it really pays to use the optimized and correct versions other people have spent their careers implementing. It's easy to write your own code to multiply two matrices, but particularly if done naïvely it's quite possible it will be an order of magnitude slower, or worse, than a good BLAS implementation for matrices of a decent size.

In fact, there should be a further bias: the higher levels of the BLAS afford more opportunities for optimization. If possible, you can get much higher performance by expressing algorithms in terms of the Level 3 BLAS, even though in principle these could be reduced to calls to Level 2, which in turn could call the Level 1 routines. Level 2 and especially Level 1 are typically memory-bound: most numbers in the input get loaded into a register and used exactly once, meaning that even with cache prefetching and the like the CPU will often be waiting for memory rather than doing useful work. In Level 3, on average, a number in the input will be used O(n) times, and it may be possible to block the computation so that a number gets used many times per memory access, giving much higher performance.

4 Gaussian Elimination

Assuming a linear system is adequately well-conditioned to permit numerical solution, how do we solve it? There are many, many algorithms and variants of algorithms for solving linear systems, each with their own assumptions, weaknesses and strengths. For example, if the matrix is symmetric, or if it's sparse (i.e. most of its entries are zero, as opposed to a dense matrix where most or all entries could be nonzero), or if you already have a very good guess at the solution of the system, there are algorithms which can exploit those facts. Our first example of a linear system, derived from RBF interpolation, was in fact symmetric, but we'll hold off on exploiting that. For now, let's just assume all we know about the matrix is that it's well-conditioned: it might be unsymmetric, dense, etc., and we have no useful guess at what the solution is.

The simplest algorithm that works in this case is a version of Gaussian Elimination, or row reduction, which you should have seen in earlier linear algebra courses. It works by subtracting multiples of equations (rows of the matrix together with their associated entries in the right-hand side) from other equations, to eventually reduce the system to upper-triangular form, which can be solved with back-substitution.
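For concreteness, here is a bare-bones sketch of that procedure in NumPy (my own illustration, not production code: it does no pivoting and assumes every pivot it encounters is nonzero):

    import numpy as np

    def gauss_solve(A, b):
        """Row reduction to upper-triangular form, then back-substitution.
        No pivoting; assumes all pivots are nonzero. For illustration only."""
        A = A.astype(float).copy()
        b = b.astype(float).copy()
        n = len(b)
        # Forward elimination: zero out the entries below the diagonal, column by column.
        for j in range(n - 1):
            for i in range(j + 1, n):
                m = A[i, j] / A[j, j]          # multiplier for this row
                A[i, j:] -= m * A[j, j:]
                b[i] -= m * b[j]
        # Back-substitution on the resulting upper-triangular system.
        x = np.zeros(n)
        for i in reversed(range(n)):
            x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
        return x

The worked example that follows carries out exactly these operations by hand on a 3 × 3 system.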

As an example, let's solve the following 3 × 3 linear system Ax = b:

    [  3   -2    5 ] [ x_1 ]   [  1 ]
    [ -6    2   -9 ] [ x_2 ] = [ -3 ]
    [ 12  -10   28 ] [ x_3 ]   [ -4 ]

We first try to zero out the entries below the diagonal in the first column, starting with the (2, 1) entry, by adding twice the first equation to the second:

    [  3   -2    5 ] [ x_1 ]   [  1 ]
    [  0   -2    1 ] [ x_2 ] = [ -1 ]
    [ 12  -10   28 ] [ x_3 ]   [ -4 ]

We zero out the (3, 1) entry by subtracting four times the first equation from the third:

    [  3   -2    5 ] [ x_1 ]   [  1 ]
    [  0   -2    1 ] [ x_2 ] = [ -1 ]
    [  0   -2    8 ] [ x_3 ]   [ -8 ]

Finally we zero out the (3, 2) entry, below the diagonal in the second column, by subtracting the updated second equation from the updated third:

    [  3   -2    5 ] [ x_1 ]   [  1 ]
    [  0   -2    1 ] [ x_2 ] = [ -1 ]
    [  0    0    7 ] [ x_3 ]   [ -7 ]

This is now an upper triangular system, which can be solved by backwards substitution starting from the last equation, 7x_3 = -7, which gives x_3 = -1. This is substituted into the second equation, -2x_2 + x_3 = -1, which gives x_2 = 0. Finally both are substituted into the first equation, 3x_1 - 2x_2 + 5x_3 = 1, to give x_1 = 2.

This form of the procedure can be formalized as a general algorithm, but it turns out there are significant advantages to be gained by reworking it into a different form. We won't go through the details, but you can show that the action of the elementary row operations (subtracting a multiple of one equation from another) can be assembled into the action of the inverse of a lower triangular matrix L with unit diagonal (i.e. all ones along the diagonal). The row reduction is equivalent to:

    L^-1 A x = L^-1 b
    U x = L^-1 b

Here U = L^-1 A is the upper triangular matrix we were left with. In other words, we have implicitly factored the matrix as A = LU, with L unit lower triangular and U upper triangular.
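A quick check of the example (a small sketch, not part of the original notes): the multipliers -2, 4 and 1 used above drop into the strictly lower triangle of a unit lower triangular L, and the rows we ended with form U.

    import numpy as np

    A = np.array([[ 3.0,  -2.0,   5.0],
                  [-6.0,   2.0,  -9.0],
                  [12.0, -10.0,  28.0]])
    b = np.array([1.0, -3.0, -4.0])

    L = np.array([[ 1.0, 0.0, 0.0],     # multipliers below the unit diagonal
                  [-2.0, 1.0, 0.0],
                  [ 4.0, 1.0, 1.0]])
    U = np.array([[ 3.0, -2.0, 5.0],    # the upper triangular system we reduced to
                  [ 0.0, -2.0, 1.0],
                  [ 0.0,  0.0, 7.0]])

    print(np.allclose(L @ U, A))        # True: A = LU
    print(np.linalg.solve(A, b))        # [ 2.  0. -1.], matching the hand computation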

5 LU Factorization

Expressing Gaussian elimination as an LU factorization offers several advantages. The first concerns the solution procedure: to solve Ax = b we break it into three steps:

- Factor A = LU.

- Find c = L^-1 b, i.e. solve Lc = b, which can be done with forward substitution.

- Solve Ux = c with backwards substitution.

It turns out the first step takes O(n^3) time, but the following two steps are much cheaper at O(n^2) time each. If we have several systems to solve that use the same matrix A, we can amortize the cost of factorization over them for huge savings. More important, even when there's only one system to solve, is that the factorization A = LU allows for higher-performance versions of the algorithm that exploit BLAS Level 3, and it also allows for generally rather good estimates of the condition number of A.

5.1 Block Matrices

The tool we need to get at a good algorithm is block-partitioning: breaking up a matrix into submatrices, or blocks. For example:

    [  3   -2    5 ]
    [ -6    2   -9 ]   =   [ A_11  A_12 ]
    [ 12  -10   28 ]       [ A_21  A_22 ]

Here the blocks are:

    A_11 = [  3  -2 ]      A_12 = [  5 ]
           [ -6   2 ]             [ -9 ]

    A_21 = [ 12  -10 ]     A_22 = [ 28 ]

For the moment, we'll restrict our blocks to those resulting from partitioning the columns and the rows of the matrix in the same way, as we did here.
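Before turning to the block algorithms, here is a hedged sketch of the factor-once, solve-many pattern using SciPy's LAPACK-backed lu_factor and lu_solve (which also perform row pivoting, a robustness refinement not covered in these notes):

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    rng = np.random.default_rng(0)
    A = rng.standard_normal((500, 500))

    lu, piv = lu_factor(A)                 # O(n^3), done once

    for _ in range(10):                    # each solve reuses the factorization at O(n^2)
        b = rng.standard_normal(500)
        x = lu_solve((lu, piv), b)
        assert np.allclose(A @ x, b)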

5.2 LU in a 2 × 2 Block Structure

We can now write down the A = LU factorization in block form:

    [ A_11  A_12 ]   [ L_11    0   ] [ U_11  U_12 ]
    [ A_21  A_22 ] = [ L_21  L_22  ] [   0   U_22 ]

Notice the zero blocks in L and U, resulting from the fact that they are triangular matrices. Multiplying out the blocks in LU gives:

    [ A_11  A_12 ]   [ L_11 U_11    L_11 U_12             ]
    [ A_21  A_22 ] = [ L_21 U_11    L_21 U_12 + L_22 U_22 ]

Peeling this apart gives us four matrix equations:

    A_11 = L_11 U_11
    A_12 = L_11 U_12
    A_21 = L_21 U_11
    A_22 = L_21 U_12 + L_22 U_22

Different algorithms for LU can easily be derived from these equations (or slight variations thereof) just by choosing different block partitions. For example, a classic divide-and-conquer approach would be to split A roughly in half along the rows and along the columns:

- If A is 1 × 1, the factorization is trivially L = 1, U = A, and you can return.

- Otherwise, split A roughly in half along the rows and along the columns, then recursively factor A_11 = L_11 U_11,

- solve the block triangular system L_11 U_12 = A_12 for U_12,

- solve the block triangular system L_21 U_11 = A_21 for L_21 (which you may feel more comfortable writing as U_11^T L_21^T = A_21^T),

- form Â = A_22 - L_21 U_12, and recursively find the factorization Â = L_22 U_22.

This has significant performance advantages over classic row reduction, since the middle three steps (solving the block triangular systems and forming Â), which is where all the work actually gets done, are BLAS Level 3 operations.
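A direct transcription of this recursion into NumPy/SciPy might look like the following sketch (my own illustration: no pivoting, and it assumes every leading principal submatrix is nonsingular). The two triangular solves and the A_22 - L_21 U_12 update are exactly the pieces a real implementation would hand to Level 3 BLAS.

    import numpy as np
    from scipy.linalg import solve_triangular

    def block_lu(A):
        """Recursive divide-and-conquer LU without pivoting; illustration only."""
        n = A.shape[0]
        if n == 1:
            return np.ones((1, 1)), A.copy()
        h = n // 2
        A11, A12 = A[:h, :h], A[:h, h:]
        A21, A22 = A[h:, :h], A[h:, h:]

        L11, U11 = block_lu(A11)                                          # A_11 = L_11 U_11
        U12 = solve_triangular(L11, A12, lower=True, unit_diagonal=True)  # L_11 U_12 = A_12
        L21 = solve_triangular(U11.T, A21.T, lower=True).T                # L_21 U_11 = A_21
        L22, U22 = block_lu(A22 - L21 @ U12)                              # factor Â = L_22 U_22

        L = np.block([[L11, np.zeros((h, n - h))], [L21, L22]])
        U = np.block([[U11, U12], [np.zeros((n - h, h)), U22]])
        return L, U

    A = np.random.rand(8, 8) + 8 * np.eye(8)   # diagonally dominant, so no pivoting needed
    L, U = block_lu(A)
    print(np.allclose(L @ U, A))               # True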

What amounts to classic row reduction can also be formulated this way, by partitioning A so that A_11 is a 1 × 1 block and A_22 is an (n-1) × (n-1) block. In this case, the algorithm looks like:

- Set L_11 = 1 and U_11 = A_11; stop if A is only 1 × 1.

- Copy U_12 = A_12 and set L_21 = A_21 / U_11.

- Form Â = A_22 - L_21 U_12, and proceed with the LU factorization of Â.

Note that in the last step Â is formed by subtracting the outer product of the column vector L_21 and the row vector U_12, which can easily be seen to be a rank-one matrix (in fact, every rank-one matrix can be expressed as an outer product of two vectors). This operation is often termed an outer product update or rank-one update as a result. It can be interpreted in this case as subtracting multiples of the top row of A off the remaining rows, to zero out the entries below the diagonal (as needed for U), which is classic row reduction.

This variation of LU is also termed right-looking, since after finishing with a column of L we never deal with anything to the left, but instead update all of the matrix to the right. (To be a bit more accurate, maybe this should be called right-and-down-looking, since the same is true for the rows of U, but that term hasn't caught on.)
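Written as code, this loop might look like the following unpivoted sketch (again my own illustration, run here on the 3 × 3 example from Section 4); each pass performs one rank-one update of the trailing submatrix:

    import numpy as np

    def right_looking_lu(A):
        """Right-looking LU via repeated outer-product (rank-one) updates.
        No pivoting; assumes every pivot encountered is nonzero."""
        A = A.astype(float).copy()
        n = A.shape[0]
        L = np.eye(n)
        U = np.zeros((n, n))
        for k in range(n):
            U[k, k:] = A[k, k:]                  # this step's U_11 and U_12
            L[k+1:, k] = A[k+1:, k] / A[k, k]    # L_21 = A_21 / U_11
            # Rank-one update of the trailing submatrix: Â = A_22 - L_21 U_12
            A[k+1:, k+1:] -= np.outer(L[k+1:, k], U[k, k+1:])
        return L, U

    A = np.array([[ 3.0,  -2.0,   5.0],
                  [-6.0,   2.0,  -9.0],
                  [12.0, -10.0,  28.0]])
    L, U = right_looking_lu(A)
    print(L)                        # multipliers -2, 4, 1 below the unit diagonal
    print(U)                        # the upper triangular factor from the worked example
    print(np.allclose(L @ U, A))    # True

Blocked variants of this same right-looking idea, which defer and batch the update so it can be done as a matrix-matrix product, are how library implementations keep most of the work in Level 3 BLAS.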