Notes on Conditioning

Robert A. van de Geijn
The University of Texas
Austin, TX 78712
October 6, 2014

NOTE: I have not thoroughly proof-read these notes!!!

1 Motivation

Correctness in the presence of error (e.g., when floating point computations are performed) takes on a different meaning. For many problems for which computers are used, there is one correct answer, and we expect that answer to be computed by our program. The problem is that, as we will see later, most real numbers cannot be stored exactly in a computer memory. They are stored as approximations, floating point numbers, instead. Hence storing them and/or computing with them inherently incurs error.

Naively, we would like to be able to define a program that computes with floating point numbers as being correct if it computes an answer that is close to the exact answer. Unfortunately, some problems that are computed this way have the property that a small change in the input yields a large change in the output. Surely we cannot blame the program for not computing an answer close to the exact answer in this case: the mere act of storing the input data as a floating point number may cause a completely different output, even if all computation is exact. We will later define stability to be a property of a program; it is what takes the place of correctness.

In this note, we instead focus on when a problem is a good problem, meaning that in exact arithmetic a small change in the input will always cause at most a small change in the output, or a bad problem if a small change may yield a large change in the output. A good problem will be called well-conditioned. A bad problem will be called ill-conditioned.

Notice that "small" and "large" are vague. To some degree, norms help us measure size. To some degree, small and large will be in the eyes of the beholder (in other words, situation dependent).

2 Notation

Throughout this note, we will talk about small changes (perturbations) to scalars, vectors, and matrices. To denote these, we attach a "delta" to the symbol for a scalar, vector, or matrix:

- A small change to scalar α ∈ C will be denoted by δα ∈ C;
- A small change to vector x ∈ C^n will be denoted by δx ∈ C^n; and
- A small change to matrix A ∈ C^{m×n} will be denoted by δA ∈ C^{m×n}.

Notice that the delta touches the α, x, and A, so that, for example, δx is not mistaken for δ · x (a scalar δ times the vector x).
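As a concrete preview of the "bad problem" discussed in the next section, here is a minimal NumPy sketch (not part of the original notes; the nearly singular 2 × 2 matrix is an arbitrary example) in which a tiny change to the input produces a large change in the output:

    import numpy as np

    # A nearly singular 2x2 system: a tiny change in the right-hand side
    # produces an O(1) change in the solution.
    A = np.array([[1.0, 1.0],
                  [1.0, 1.0001]])
    y = np.array([2.0, 2.0001])              # exact solution is x = (1, 1)
    y_perturbed = y + np.array([0.0, 1e-4])  # small change in the input

    x = np.linalg.solve(A, y)
    x_perturbed = np.linalg.solve(A, y_perturbed)

    print(x)                   # [1. 1.]
    print(x_perturbed)         # roughly [0. 2.]: a large change in the output
    print(np.linalg.cond(A))   # about 4e4: the matrix is ill-conditioned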

3 The Prototypical Example: Solving a Linear System

Assume that A ∈ R^{n×n} is nonsingular and x, y ∈ R^n with Ax = y. The problem here is the function that computes x from y and A. Let us assume that no error is introduced in the matrix A when it is stored, but that in the process of storing y a small error is introduced: δy ∈ R^n, so that now y + δy is stored. The question becomes by how much the solution x changes as a function of δy. In particular, we would like to quantify how a relative change in the right-hand side y (‖δy‖/‖y‖ in some norm) translates to a relative change in the solution x (‖δx‖/‖x‖). It turns out that we will need to compute norms of matrices, using the norm induced by the vector norm that we use.

Since Ax = y, if we use a consistent (induced) matrix norm,

    ‖y‖ = ‖Ax‖ ≤ ‖A‖ ‖x‖   or, equivalently,   1/‖x‖ ≤ ‖A‖ / ‖y‖.    (1)

Also,

    A(x + δx) = y + δy
    Ax        = y

implies that A δx = δy, so that δx = A^{-1} δy. Hence

    ‖δx‖ = ‖A^{-1} δy‖ ≤ ‖A^{-1}‖ ‖δy‖.    (2)

Combining (1) and (2) we conclude that

    ‖δx‖/‖x‖ ≤ ‖A‖ ‖A^{-1}‖ ‖δy‖/‖y‖.

What does this mean? It means that the relative error in the solution is at worst the relative error in the right-hand side, amplified by ‖A‖ ‖A^{-1}‖. So, if that quantity is small and the relative error in the right-hand side is small and exact arithmetic is used, then one is guaranteed a solution with a relatively small error. The quantity

    κ(A) = ‖A‖ ‖A^{-1}‖

is called the condition number of nonsingular matrix A (associated with norm ‖·‖).

Are we overestimating by how much the relative error can be amplified? The answer to this is no. For every nonsingular matrix A, there exist a right-hand side y and a perturbation δy such that, if A(x + δx) = y + δy,

    ‖δx‖/‖x‖ = ‖A‖ ‖A^{-1}‖ ‖δy‖/‖y‖.

In order for this equality to hold, we need to find y and δy such that

    ‖y‖/‖x‖ = ‖Ax‖/‖x‖ = ‖A‖

and

    ‖δx‖/‖δy‖ = ‖A^{-1} δy‖/‖δy‖ = ‖A^{-1}‖.

In other words, x can be chosen as a vector that maximizes ‖Ax‖/‖x‖ and δy should maximize ‖A^{-1} δy‖/‖δy‖. The vector y is then chosen as y = Ax.
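The bound ‖δx‖/‖x‖ ≤ κ(A) ‖δy‖/‖y‖ is easy to check numerically. A minimal NumPy sketch (not part of the original notes; the test matrix and perturbation are random) in the 2-norm:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    A = rng.standard_normal((n, n))
    kappa = np.linalg.cond(A)            # kappa(A) = ||A|| ||A^{-1}|| in the 2-norm

    x = rng.standard_normal(n)
    y = A @ x
    delta_y = 1e-8 * rng.standard_normal(n)        # small perturbation of y
    delta_x = np.linalg.solve(A, y + delta_y) - x

    rel_err_x = np.linalg.norm(delta_x) / np.linalg.norm(x)
    rel_err_y = np.linalg.norm(delta_y) / np.linalg.norm(y)
    print(rel_err_x <= kappa * rel_err_y)   # True (up to roundoff): the bound holds
    print(rel_err_x / rel_err_y, kappa)     # observed amplification vs. worst case

For a random δy the observed amplification is usually well below κ(A); the worst-case choices described above are constructed in the sketch after the 2-norm discussion below.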

What if we use the 2-norm? For this norm,

    κ_2(A) = ‖A‖_2 ‖A^{-1}‖_2 = σ_0 / σ_{n-1}.

So, the ratio between the largest and smallest singular values determines whether a matrix is well-conditioned or ill-conditioned.

To show for what vectors the maximal magnification is attained, consider the SVD

    A = U Σ V^H = ( u_0 u_1 ··· u_{n-1} ) diag(σ_0, σ_1, ..., σ_{n-1}) ( v_0 v_1 ··· v_{n-1} )^H.

Recall that

- ‖A‖_2 = σ_0, v_0 is the vector that maximizes ‖Az‖_2 over ‖z‖_2 = 1, and A v_0 = σ_0 u_0;
- ‖A^{-1}‖_2 = 1/σ_{n-1}, u_{n-1} is the vector that maximizes ‖A^{-1} z‖_2 over ‖z‖_2 = 1, and A v_{n-1} = σ_{n-1} u_{n-1}.

Now, take y = σ_0 u_0. Then Ax = y is solved by x = v_0. Take δy = β σ_{n-1} u_{n-1}. Then A δx = δy is solved by δx = β v_{n-1}. Now,

    ‖δy‖_2 / ‖y‖_2 = |β| σ_{n-1} / σ_0   and   ‖δx‖_2 / ‖x‖_2 = |β|.

Hence

    ‖δx‖_2 / ‖x‖_2 = (σ_0 / σ_{n-1}) ‖δy‖_2 / ‖y‖_2.

This is depicted in Figure 1 for n = 2.

The SVD can be used to show that A maps the unit ball to an ellipsoid. The singular values are the lengths of the various axes of the ellipsoid. The condition number thus captures the eccentricity of the ellipsoid: the ratio between the lengths of the largest and smallest axes. This is also illustrated in Figure 1.

Number of accurate digits

Notice that for scalars δψ and ψ,

    log_10( |ψ| / |δψ| ) = log_10(|ψ|) - log_10(|δψ|)

roughly equals the number of leading decimal digits of ψ + δψ that are accurate, relative to ψ. For example, if ψ = 32.512 and δψ = 0.02, then ψ + δψ = 32.532, which agrees with ψ in the three leading digits. Now, log_10(32.512) - log_10(0.02) ≈ 1.5 - (-1.7) = 3.2.

Now, if

    ‖δx‖/‖x‖ = κ(A) ‖δy‖/‖y‖,

then

    log_10(‖δx‖) - log_10(‖x‖) = log_10(κ(A)) + log_10(‖δy‖) - log_10(‖y‖)

so that

    log_10(‖x‖) - log_10(‖δx‖) = [ log_10(‖y‖) - log_10(‖δy‖) ] - log_10(κ(A)).

In other words, if there were k digits of accuracy in the right-hand side, then it is possible that (due only to the condition number of A) there are only k - log_10(κ(A)) digits of accuracy in the solution. If we start with only 8 digits of accuracy and κ(A) = 10^5, we may only get 3 digits of accuracy. If κ(A) ≥ 10^8, we may not get any digits of accuracy...

Exercise 1. Show that, for a consistent matrix norm, κ(A) ≥ 1.

We conclude from this that we can generally only expect as much relative accuracy in the solution as we had in the right-hand side.
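A minimal NumPy sketch (not from the original notes; random test matrix) that constructs the worst-case y and δy exactly as above and verifies that the amplification equals σ_0/σ_{n-1}, together with the log_10(κ) rule of thumb:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 5
    A = rng.standard_normal((n, n))

    U, s, Vh = np.linalg.svd(A)          # singular values s[0] >= ... >= s[n-1]
    kappa2 = s[0] / s[-1]                # kappa_2(A) = sigma_0 / sigma_{n-1}

    # Worst-case choices from the text: y = sigma_0 u_0 (so x = v_0),
    # delta_y = beta sigma_{n-1} u_{n-1} (so delta_x = beta v_{n-1}).
    x = Vh[0]                            # v_0
    y = s[0] * U[:, 0]                   # sigma_0 u_0
    beta = 1e-8
    delta_y = beta * s[-1] * U[:, -1]    # beta sigma_{n-1} u_{n-1}
    delta_x = np.linalg.solve(A, delta_y)

    amplification = (np.linalg.norm(delta_x) / np.linalg.norm(x)) / \
                    (np.linalg.norm(delta_y) / np.linalg.norm(y))
    print(amplification, kappa2)         # equal (up to roundoff)

    # Rule of thumb: roughly log10(kappa_2(A)) digits of accuracy can be lost.
    print(np.log10(kappa2))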

[Figure 1 omitted: left panel shows R^2, the domain of A; right panel shows R^2, the codomain of A.]

Figure 1: Illustration of choices of vectors y and δy that result in ‖δx‖_2/‖x‖_2 = (σ_0/σ_{n-1}) ‖δy‖_2/‖y‖_2. Because of the eccentricity of the ellipse, the relatively small change δy relative to y is amplified into a relatively large change δx relative to x. On the right, we see that ‖δy‖_2/‖y‖_2 = |β| σ_{n-1}/σ_0 (since ‖u_0‖_2 = ‖u_{n-1}‖_2 = 1). On the left, we see that ‖δx‖_2/‖x‖_2 = |β| (since ‖v_0‖_2 = ‖v_{n-1}‖_2 = 1).

Alternative exposition

Note: the below links conditioning of matrices to the relative condition number of a more general function. For a more thorough treatment, you may want to read Lecture 12 of Trefethen and Bau. That book discusses the subject in much more generality than is needed for our discussion of linear algebra. Thus, if this alternative exposition baffles you, just skip it!

Let f : R^n → R^m be a continuous function such that f(y) = x. Let ‖·‖ be a vector norm. Consider, for y ≠ 0,

    κ_f(y) ≡ lim_{δ→0} sup_{‖δy‖ ≤ δ} ( ‖f(y + δy) - f(y)‖ / ‖f(y)‖ ) / ( ‖δy‖ / ‖y‖ ).

Letting f(y + δy) = x + δx, we find that

    κ_f(y) ≡ lim_{δ→0} sup_{‖δy‖ ≤ δ} ( ‖δx‖ / ‖x‖ ) / ( ‖δy‖ / ‖y‖ ).

(Obviously, if δy = 0 or y = 0 or f(y) = 0, things get a bit hairy, so let's not allow that.)

Roughly speaking, κ_f(y) equals the maximum by which an infinitesimally small relative error in y is magnified into a relative error in f(y). This can be considered the relative condition number of function f. A large relative condition number means a small relative error in the input (y) can be magnified into a large relative error in the output (x = f(y)). This is bad, since small errors will invariably occur.

Now, if f(y) = x is the function that returns x where Ax = y for a nonsingular matrix A ∈ C^{n×n}, then via an argument similar to what we did earlier in this section we find that κ_f(y) ≤ κ(A) = ‖A‖ ‖A^{-1}‖, the condition number of matrix A:

    κ_f(y) = lim_{δ→0} sup_{‖δy‖ ≤ δ} ( ‖f(y + δy) - f(y)‖ / ‖f(y)‖ ) / ( ‖δy‖ / ‖y‖ )

           = lim_{δ→0} sup_{‖δy‖ ≤ δ} ( ‖A^{-1}(y + δy) - A^{-1}y‖ / ‖A^{-1}y‖ ) / ( ‖δy‖ / ‖y‖ )
           = lim_{δ→0} sup_{‖δy‖ ≤ δ} ( ‖A^{-1}δy‖ / ‖δy‖ ) ( ‖y‖ / ‖A^{-1}y‖ )
           = ( max_{δz ≠ 0} ‖A^{-1}δz‖ / ‖δz‖ ) ( ‖Ax‖ / ‖x‖ )
           ≤ ( max_{δz ≠ 0} ‖A^{-1}δz‖ / ‖δz‖ ) ( max_{x ≠ 0} ‖Ax‖ / ‖x‖ )
           = ‖A^{-1}‖ ‖A‖,

where ‖·‖ denotes both the vector norm and the matrix norm it induces.

4 Condition Number of a Rectangular Matrix

Given A ∈ C^{m×n} with linearly independent columns and y ∈ C^m, consider the linear least-squares (LLS) problem

    ‖Ax - y‖_2 = min_w ‖Aw - y‖_2    (3)

and the perturbed problem

    ‖A(x + δx) - (y + δy)‖_2 = min_{w + δw} ‖A(w + δw) - (y + δy)‖_2.    (4)

We will again bound by how much the relative error in y is amplified.

Figure 2: Linear least-squares problem ‖Ax - y‖_2 = min_v ‖Av - y‖_2. Vector z is the projection of y onto C(A).

Notice that the solutions to (3) and (4) respectively satisfy

    A^H A x        = A^H y
    A^H A (x + δx) = A^H (y + δy)

so that A^H A δx = A^H δy (subtracting the first equation from the second) and hence

    ‖δx‖_2 = ‖(A^H A)^{-1} A^H δy‖_2 ≤ ‖(A^H A)^{-1} A^H‖_2 ‖δy‖_2.

Now, let z = A (A^H A)^{-1} A^H y be the projection of y onto C(A) and let θ be the angle between z and y. Let us assume that y is not orthogonal to C(A), so that z ≠ 0. Then cos θ = ‖z‖_2 / ‖y‖_2, so that

    cos θ ‖y‖_2 = ‖z‖_2 = ‖Ax‖_2 ≤ ‖A‖_2 ‖x‖_2

and hence

    1/‖x‖_2 ≤ ‖A‖_2 / (cos θ ‖y‖_2).

We conclude that

    ‖δx‖_2/‖x‖_2 ≤ ( ‖A‖_2 ‖(A^H A)^{-1} A^H‖_2 / cos θ ) ‖δy‖_2/‖y‖_2 = ( 1/cos θ ) ( σ_0/σ_{n-1} ) ‖δy‖_2/‖y‖_2,

where σ_0 and σ_{n-1} are (respectively) the largest and smallest singular values of A, because of the following result:

Exercise 2. If A has linearly independent columns, show that ‖(A^H A)^{-1} A^H‖_2 = 1/σ_{n-1}, where σ_{n-1} equals the smallest singular value of A. Hint: Use the SVD of A.

The condition number of A ∈ C^{m×n} with linearly independent columns is κ_2(A) = σ_0 / σ_{n-1}.

Notice the effect of the cos θ. When y is almost perpendicular to C(A), then its projection z is small and cos θ is small. Hence a small relative change in y can be greatly amplified. This makes sense: if y is almost perpendicular to C(A), then x ≈ 0, and any small δy ∈ C(A) can yield a relatively large change δx.
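A minimal NumPy sketch (not from the original notes; random test matrix, my own construction) that checks Exercise 2 numerically and illustrates how small cos θ becomes when y is nearly perpendicular to C(A):

    import numpy as np

    rng = np.random.default_rng(2)
    m, n = 10, 4
    A = rng.standard_normal((m, n))      # linearly independent columns (generically)

    # Exercise 2: ||(A^H A)^{-1} A^H||_2 = 1 / sigma_{n-1}.
    s = np.linalg.svd(A, compute_uv=False)
    M = np.linalg.inv(A.conj().T @ A) @ A.conj().T
    print(np.linalg.norm(M, 2), 1.0 / s[-1])   # equal (up to roundoff)

    # Effect of cos(theta): build a y that is almost perpendicular to C(A).
    P = A @ M                                  # orthogonal projector onto C(A)
    z = A @ rng.standard_normal(n)             # a vector in C(A)
    w = rng.standard_normal(m)
    w = w - P @ w                              # component orthogonal to C(A)
    y = z + 100.0 * np.linalg.norm(z) * w / np.linalg.norm(w)
    cos_theta = np.linalg.norm(P @ y) / np.linalg.norm(y)
    print(cos_theta)                           # small, so small relative changes in y
                                               # can be greatly amplified in x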

5 Why Using the Method of Normal Equations Could be Bad

Exercise 3. Let A have linearly independent columns. Show that κ_2(A^H A) = κ_2(A)^2.

Exercise 4. Let A ∈ C^{n×n} have linearly independent columns. Show that Ax = y if and only if A^H A x = A^H y. Reason that if the method of normal equations is used to solve Ax = y, then the condition number of the matrix is unnecessarily squared.

Let A ∈ C^{m×n} have linearly independent columns. If one uses the Method of Normal Equations to solve the linear least-squares problem min_x ‖Ax - y‖_2, one ends up solving the square linear system A^H A x = A^H y. Now, κ_2(A^H A) = κ_2(A)^2. Hence, using this method squares the condition number of the matrix being used.

6 Why Multiplication with Unitary Matrices is a Good Thing

Next, consider the computation C = AB, where A ∈ C^{m×m} is nonsingular and B, δB, C, δC ∈ C^{m×n}. Then

    C + δC = A (B + δB)
    C      = A B

so that δC = A δB. Thus,

    ‖δC‖_2 = ‖A δB‖_2 ≤ ‖A‖_2 ‖δB‖_2.

Also, B = A^{-1} C, so that

    ‖B‖_2 = ‖A^{-1} C‖_2 ≤ ‖A^{-1}‖_2 ‖C‖_2

and hence

    1/‖C‖_2 ≤ ‖A^{-1}‖_2 / ‖B‖_2.

Thus,

    ‖δC‖_2/‖C‖_2 ≤ ‖A‖_2 ‖A^{-1}‖_2 ‖δB‖_2/‖B‖_2 = κ_2(A) ‖δB‖_2/‖B‖_2.

This means that the relative error in matrix C = AB is at most κ_2(A) times greater than the relative error in B.

The following exercise gives us a hint as to why algorithms that cast computation in terms of multiplication by unitary matrices avoid the buildup of error:

Exercise 5. Let U ∈ C^{n×n} be unitary. Show that κ_2(U) = 1.

This means that the relative error in matrix C = UB is no greater than the relative error in B when U is unitary.
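A minimal NumPy check of Exercises 3 and 5 (not from the original notes; the test matrices are random): forming the normal equations squares the condition number, while an orthogonal factor from a QR factorization has condition number 1 and therefore does not amplify relative error.

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((10, 4))

    # Exercise 3: kappa_2(A^H A) = kappa_2(A)^2.
    print(np.linalg.cond(A.conj().T @ A), np.linalg.cond(A) ** 2)   # equal

    # Exercise 5: unitary (here, real orthogonal) matrices have kappa_2 = 1,
    # so multiplying by Q from a QR factorization does not build up error.
    Q, R = np.linalg.qr(rng.standard_normal((6, 6)))
    print(np.linalg.cond(Q))                                        # 1.0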