Numerical Linear Algebra Notes

Brian Bockelman

October 11, 2006

1 Linear Algebra Background

Definition 1.0.1. An inner product on $F^n$ ($\mathbb{R}^n$ or $\mathbb{C}^n$) is a function $\langle \cdot, \cdot \rangle : F^n \times F^n \to F$ satisfying, for all $u, v, w \in F^n$ and $c \in F$:

1. $\langle v, v \rangle \ge 0$, with equality iff $v = 0$
2. $\langle u, v + w \rangle = \langle u, v \rangle + \langle u, w \rangle$
3. $\langle u, v \rangle = \overline{\langle v, u \rangle}$
4. $\langle u, cv \rangle = c\,\langle u, v \rangle$

A norm on $F^n$ is a function $\|\cdot\| : F^n \to \mathbb{R}$ such that for all $u, v \in F^n$ and $c \in F$:

1. $\|u\| \ge 0$, with equality iff $u = 0$
2. $\|cu\| = |c|\,\|u\|$
3. $\|u + v\| \le \|u\| + \|v\|$

Theorem 1.0.2 (CBS inequality).
$$|\langle u, v \rangle|^2 \le \langle u, u \rangle\,\langle v, v \rangle$$

Examples:

1. (Inner products) Let $H \in \mathbb{C}^{n,n}$ be Hermitian ($H^* = H$) and positive definite (i.e., $v^* H v \ge 0$ for all $v \in \mathbb{C}^n$, with equality iff $v = 0$; this is equivalent to all eigenvalues of $H$ being positive). If $H$ is real, we call it symmetric positive definite (SPD). Define $\langle u, v \rangle = u^* H v$. It is a simple exercise to show that this satisfies the requirements of an inner product.

2. (Special example of the above) Let $A$ be a nonsingular (invertible) $n \times n$ matrix over $\mathbb{C}$. Define $H = A^* A$. Note
$$(A^* A)^* = A^* (A^*)^* = A^* A.$$
Also, if $v \ne 0$, $v \in \mathbb{C}^n$, then
$$v^* H v = v^* A^* A v = (Av)^* (Av) = \|Av\|_2^2.$$
Since $A$ is invertible, $N(A) = \{0\}$, so $v \ne 0 \Rightarrow Av \ne 0 \Rightarrow \|Av\|_2^2 > 0$.

3. Norms:

(a) Induced norm (from an inner product $\langle \cdot, \cdot \rangle$): $\|v\| = \sqrt{\langle v, v \rangle}$. Exercise: verify that the norm laws hold. This allows us to restate CBS as
$$|\langle u, v \rangle| \le \|u\|\,\|v\|,$$
where $\|\cdot\|$ is induced from the inner product.

(b) $p$-norms. Let $p \ge 1$ be real and $v = (v_1, \dots, v_n)$. Then
$$\|v\|_p = \big(|v_1|^p + \cdots + |v_n|^p\big)^{1/p}, \qquad \|v\|_\infty = \lim_{p \to \infty} \|v\|_p = \max\{|v_1|, \dots, |v_n|\}.$$
Important norms:
$p = 1$: $\|v\|_1 = |v_1| + \cdots + |v_n|$
$p = 2$: $\|v\|_2 = \big(|v_1|^2 + \cdots + |v_n|^2\big)^{1/2}$
$p = \infty$: $\|v\|_\infty = \max\{|v_1|, \dots, |v_n|\}$

Example: $v = (2,\, 3+i,\, 4,\, i) \in \mathbb{C}^4$. Then
$$\|v\|_1 = 7 + \sqrt{10}, \qquad \|v\|_2 = \sqrt{31}, \qquad \|v\|_\infty = 4.$$

(c) Matrix norms: $\mathbb{C}^{m,n}$ or $\mathbb{R}^{m,n}$ is a vector space in its own right, so anything satisfying the norm laws works, e.g. the Frobenius norm. $\mathrm{vec}(A)$ is the vector resulting from stacking the columns of $A$ in order, so any vector norm gives us a matrix norm. In particular,
$$\|A\|_F = \|\mathrm{vec}(A)\|_2.$$
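A quick numerical check of the example above (a minimal MATLAB sketch; the matrix used for the Frobenius check is arbitrary and only for illustration):

% Vector norms of v = (2, 3+i, 4, i); compare with the hand computation.
v = [2; 3+1i; 4; 1i];
norm(v, 1)                       % 7 + sqrt(10) ~= 10.1623
norm(v, 2)                       % sqrt(31)     ~=  5.5678
norm(v, inf)                     % 4

% Frobenius norm as the 2-norm of the stacked columns vec(A).
A = [1 2; 3 4];
norm(A, 'fro') - norm(A(:), 2)   % ~ 0: ||A||_F = ||vec(A)||_2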

Key Fact: Operator norm. Let $\|\cdot\|$ be a norm on the vector spaces $F^m$ and $F^n$, where $A$ is $m \times n$. Define the operator norm to be
$$\|A\| = \max_{x \ne 0} \frac{\|Ax\|}{\|x\|} = \max_{\|x\| = 1} \|Ax\|.$$
This is a norm (exercise).

Fact: Let $A$ be $m \times n$ and use the vector $p$-norms. Write $A = [a_1, \dots, a_n]$ (columns). Then
$$\|A\|_1 = \max\{\|a_1\|_1, \dots, \|a_n\|_1\}, \qquad \|A\|_\infty = \|A^T\|_1 = \max\{\|r_1\|_1, \dots, \|r_m\|_1\},$$
where $A^T = [r_1, \dots, r_m]$, i.e. the $r_i$ are the rows of $A$.

Key Property: A matrix norm is multiplicative if $\|AB\| \le \|A\|\,\|B\|$.

1. Every linearly independent set in a finite dimensional vector space can be enlarged to a basis.
2. Every orthonormal set in a finite dimensional inner product space can be enlarged to an orthonormal basis.
3. Let $A$ be $m \times n$, and let $U$, $V$ be unitary matrices, $m \times m$ and $n \times n$ respectively. Then
$$\|UAV\|_F = \|A\|_F, \qquad \|UAV\|_2 = \|A\|_2.$$
This needs the elementary fact that for $v \in \mathbb{C}^n$ and $V$ an $n \times n$ unitary matrix, $\|Vv\|_2 = \|v\|_2$.

2 Factorizations

2.1 Schur Factorization

Theorem 2.1.1 (Schur Triangularization Theorem). Let $A$ be an $n \times n$ complex matrix. Then there exists a unitary matrix $U$ such that
$$U^* A U = T$$
where $T$ is an upper triangular matrix.

Proof. Proceed by induction on $n$. The case $n = 1$ is trivial. Suppose the theorem is true for all sizes less than $n$. Let $A$ be an $n \times n$ matrix, $n > 1$. Compute an eigenvalue $\lambda$ of $A$ and a corresponding eigenvector $u$ of unit length. Next, use the key fact that the orthonormal set $\{u_1 = u\}$ can be expanded to an orthonormal basis $u_1, \dots, u_n$ of $\mathbb{C}^n$. Form
$$U_1 = [u_1, u_2, \dots, u_n].$$
This is unitary. Calculate
$$U_1^* A U_1 = \begin{bmatrix} u_1^* \\ \vdots \\ u_n^* \end{bmatrix} \begin{bmatrix} \lambda u_1 & \cdots \end{bmatrix} = \begin{bmatrix} u_1^* \lambda u_1 & * \\ \vdots & A_1 \\ u_n^* \lambda u_1 & \end{bmatrix} = \begin{bmatrix} \lambda & * \\ 0 & A_1 \end{bmatrix},$$
where the zeros in the first column come from the orthonormality of the $u_i$ and $A_1$ is $(n-1) \times (n-1)$. By induction, there exists a unitary matrix $U_2$ such that $U_2^* A_1 U_2$ is upper triangular. Then form
$$U_3 = \begin{bmatrix} 1 & 0 \\ 0 & U_2 \end{bmatrix}.$$
Finally, form $U = U_1 U_3$. Then $U^* A U$ is upper triangular. $\square$

Applications

Theorem 2.1.2 (Principal Axes Theorem). If $A$ is Hermitian, then there exists a unitary $U$ such that $U^* A U$ is diagonal, and the eigenvalues of $A$ are real. If $A$ is real, then $U$ can be chosen to be orthogonal.

Proof. Apply Schur: $U^* A U = T$ with $T$ upper triangular for some unitary $U$. But
$$T^* = (U^* A U)^* = U^* A^* U = U^* A U = T.$$

Thus $T^* = T$, so $T$ is both upper and lower triangular; hence $T$ is a diagonal matrix. Moreover,
$$T = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix} = T^* = \begin{bmatrix} \bar{\lambda}_1 & & 0 \\ & \ddots & \\ 0 & & \bar{\lambda}_n \end{bmatrix},$$
so each $\lambda_i = \bar{\lambda}_i$ is real. $\square$

2.2 Singular Value Decomposition

Theorem 2.2.1 (Singular Value Decomposition (SVD)). Let $A$ be an $m \times n$ matrix, $A \in \mathbb{C}^{m,n}$. Then there exist unitary matrices $U$ and $V$ (which can be chosen real if $A$ is real) such that
$$U^* A V = \Sigma = \begin{bmatrix} \sigma_1 & & \\ & \sigma_2 & \\ & & \ddots \end{bmatrix},$$
where $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0$ with $p = \min\{m, n\}$. Moreover, $\sigma_1, \dots, \sigma_p$ are uniquely determined by $A$.

Notation: The $\sigma_i$'s are the singular values of $A$. The $u_i$'s are the left singular vectors, and the $v_i$'s are the right singular vectors of $A$.

Proof. Let $B = A^* A$. Then $B$ is Hermitian. We claim that $B$ is positive semidefinite:
$$x^* B x = x^* A^* A x = (Ax)^* (Ax) = \|Ax\|^2 \ge 0.$$
Hence the eigenvalues are also nonnegative: if $e \ne 0$ is an eigenvector for $\lambda$, then
$$0 \le e^* B e = e^* \lambda e = \lambda \|e\|_2^2, \qquad \|e\|_2^2 > 0.$$
Since the eigenvalues are nonnegative, write them as squares of reals and order them:
$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0.$$
We can diagonalize $B$ (Principal Axes Theorem) to obtain, for some unitary $n \times n$ matrix $V$,
$$V^* B V = \begin{bmatrix} \sigma_1^2 & & 0 \\ & \ddots & \\ 0 & & \sigma_n^2 \end{bmatrix}.$$
Note that
$$\mathrm{rank}(A^* A) \le \mathrm{rank}\, A = r \le \min\{m, n\}.$$
Hence we conclude that $\mathrm{rank}\, B \le r$, and so $\mathrm{rank}(V^* B V) \le r$. But
$$V^* B V = \begin{bmatrix} \sigma_1^2 & & 0 \\ & \ddots & \\ 0 & & \sigma_n^2 \end{bmatrix},$$

so $\sigma_j = 0$ for $j > r$ and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r \ge 0$. Now let
$$u_i = \frac{1}{\sigma_i} A v_i, \qquad i = 1, \dots, q,$$
where $q$ is the number of nonzero $\sigma_i$. Then
$$u_i^* u_j = \left(\frac{1}{\sigma_i} A v_i\right)^* \left(\frac{1}{\sigma_j} A v_j\right) = \frac{1}{\sigma_i \sigma_j} v_i^* B v_j = \frac{1}{\sigma_i \sigma_j} v_i^* \sigma_j^2 v_j = \frac{\sigma_j^2}{\sigma_i \sigma_j} v_i^* v_j = \begin{cases} 0, & i \ne j \\ 1, & i = j \end{cases}$$
Thus $u_1, \dots, u_q$ form an orthonormal set. Fill $u_1, \dots, u_q$ out to an orthonormal basis $u_1, \dots, u_m$ of $\mathbb{C}^m$ and set
$$U = [u_1, \dots, u_m].$$
Then $U$ is unitary. Moreover,
$$U^* A V = \begin{bmatrix} u_1^* \\ \vdots \\ u_m^* \end{bmatrix} [A v_1, A v_2, \dots, A v_n] = [u_i^* A v_j]_{m,n}.$$
But
$$u_i^* A v_j = \begin{cases} 0, & j > q \\ \sigma_i, & j \le q, \ i = j \\ 0, & j \le q, \ i \ne j \end{cases}$$
Reasons: If $j > q$, then $\sigma_j = 0$, so $B v_j = \sigma_j^2 v_j = 0$; therefore
$$v_j^* A^* A v_j = 0 \ \Rightarrow\ \|A v_j\|_2^2 = 0 \ \Rightarrow\ A v_j = 0.$$
If $j \le q$ and $i \ne j$, then $A v_j = \sigma_j u_j$ and orthonormality gives $u_i^* A v_j = \sigma_j u_i^* u_j = 0$. Finally, if $i = j \le q$, then
$$u_i^* A v_i = \frac{1}{\sigma_i} (A v_i)^* A v_i = \frac{1}{\sigma_i} v_i^* A^* A v_i = \frac{1}{\sigma_i} v_i^* \sigma_i^2 v_i = \sigma_i v_i^* v_i = \sigma_i.$$
Finally, $\mathrm{rank}(A) = \mathrm{rank}(U^* A V) = \mathrm{rank}(\Sigma) = q$, so $q = r$. $\square$
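The construction in the proof can be mirrored numerically. A minimal MATLAB sketch (the test matrix is arbitrary): the singular values are recovered as square roots of the eigenvalues of $B = A^*A$, and $U^* A V$ comes out (numerically) diagonal.

% An arbitrary 2x3 example, so m = 2, n = 3, rank 2.
A = [3 1 1; -1 3 1];

[V, D] = eig(A'*A);                 % B = A'*A is Hermitian positive semidefinite
[d, idx] = sort(diag(D), 'descend');
V = V(:, idx);
sigma = sqrt(max(d, 0));            % ordered sigma_1 >= sigma_2 >= ... (trailing ones vanish)

[U, S, W] = svd(A);                 % MATLAB's SVD, for comparison
diag(S)' - sigma(1:2)'              % ~ 0: the sigma_i are uniquely determined by A
norm(U'*A*W - S)                    % ~ eps: U'*A*W is (numerically) diagonal
norm(A, 2) - max(sigma)             % ~ 0: ||A||_2 = sigma_1 (cf. Section 2.3 below)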

Tuesday, September 5, 2006:

Note that $U^* A V = \Sigma$ means
$$A = U \Sigma V^* = [u_1, \dots, u_m] \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_p \end{bmatrix} \begin{bmatrix} v_1^* \\ \vdots \\ v_n^* \end{bmatrix} = \sum_{j=1}^{p} \sigma_j u_j v_j^*.$$

If $m \ge n$, so that $n = p$, we have
$$A = [u_1, \dots, u_n]\, \Sigma_p \begin{bmatrix} v_1^* \\ \vdots \\ v_n^* \end{bmatrix} = \bar{U} \Sigma_p V^*$$
(note $\bar{U}$ is not unitary, as it is not square), where $\Sigma_p = \mathrm{diag}\{\sigma_1, \dots, \sigma_p\}$. This is the reduced form of the SVD.

Even further, if $r = \mathrm{rank}(A)$, then $r = \mathrm{rank}(U \Sigma V^*) = \mathrm{rank}(\Sigma)$. Hence $\sigma_j = 0$ for $j > r$, so we can write
$$A = [u_1, \dots, u_r]\, \Sigma_r \begin{bmatrix} v_1^* \\ \vdots \\ v_r^* \end{bmatrix} = \bar{U} \Sigma_r \bar{V}^*.$$
This is customarily called the compact form of the SVD.

2.3 Applications of SVD

We have the following results:

1. $\mathrm{rank}(A) = r$, where $\sigma_r \ne 0$ and $\sigma_{r+1} = 0$.

2. To solve $Ax = b$ stably: form
$$U^* A V (V^* x) = U^* b, \quad \text{i.e.} \quad \Sigma y = c,$$
and set $y_i = c_i / \sigma_i$ for $i = 1, \dots, r$ and $y_i = 0$ for $i > r$ (then $x = V y$). This gives a solution; if $\mathrm{rank}(A) = n$ it gives the unique solution.

3. A square matrix $A$ is invertible iff all singular values are nonzero.

4. $\|A\|_2 = \sigma_1$.

Proof.
$$\|A\|_2 = \|U \Sigma V^*\|_2 = \sup_{\|x\|_2 = 1} \|U(\Sigma V^* x)\|_2 = \sup_{\|x\|_2 = 1} \|\Sigma V^* x\|_2 = \sup_{\|x\|_2 = 1} \|\Sigma y\|_2, \qquad y := V^* x.$$
Note that $\|y\|_2 = \|V^* x\|_2 = \|x\|_2 = 1$. Then
$$\|A\|_2 = \sup_{\|y\|_2 = 1} \|\Sigma y\|_2 = \sigma_1. \qquad \square$$

5. $\|A^{-1}\|_2 = 1/\sigma_n$.

Proof. Note that if $A$ is square, $n \times n$, and invertible,
$$U^* A V = \Sigma = \begin{bmatrix} \sigma_1 & & 0 \\ & \ddots & \\ 0 & & \sigma_n \end{bmatrix}.$$
Then
$$\begin{bmatrix} \sigma_1^{-1} & & 0 \\ & \ddots & \\ 0 & & \sigma_n^{-1} \end{bmatrix} = (U^* A V)^{-1} = V^{-1} A^{-1} (U^*)^{-1} = V^* A^{-1} U.$$
Multiply by permutation matrices to get the $\sigma_j^{-1}$ in the desired (decreasing) order, and notice that a product of unitary matrices is unitary. So, by part 4, $\|A^{-1}\|_2 = 1/\sigma_n$. $\square$

Note: if $A$ is a square, invertible matrix, then
$$\mathrm{cond}_2(A) = \|A\|_2\,\|A^{-1}\|_2 = \frac{\sigma_1}{\sigma_n}.$$

6. $\mathrm{Range}\, A = C(A) = \mathrm{span}\{u_1, u_2, \dots, u_r\}$, where $r = \mathrm{rank}(A)$.

Proof. Remember
$$A = \sum_{k=1}^{r} \sigma_k u_k v_k^*.$$
Then
$$\mathrm{range}(A) = C(A) = \mathrm{span}(\{a_i\}) = \{Ax \mid x \in F^n\}.$$
Thus,
$$\mathrm{range}(A) = \left\{ \left( \sum_{k=1}^{r} \sigma_k u_k v_k^* \right) x \;\middle|\; x \in F^n \right\} = \left\{ \sum_{k=1}^{r} \sigma_k (v_k^* x)\, u_k \;\middle|\; x \in F^n \right\} = \left\{ \sum_{k=1}^{r} y_k u_k \;\middle|\; y \in F^r \right\} = \mathrm{span}\{u_1, \dots, u_r\}. \qquad \square$$

7. $\mathrm{null}(A) = N(A) = \mathrm{span}\{v_{r+1}, \dots, v_n\}$.

8. Let $A_k = \sum_{j=1}^{k} \sigma_j u_j v_j^*$. Then, for $k < r$,
$$\|A - A_k\|_2 = \inf_{B \in \mathbb{C}^{m \times n},\ \mathrm{rank}(B) = k} \|A - B\|_2 = \sigma_{k+1}.$$

3 QR Factorization and Least Squares

3.1 Motivation

We have three points of view.

1. Geometric point of view: (pictures).

2. Analytic view: Define a linear operator $T : \mathbb{C}^m \to \mathbb{C}^m$ and suppose $T$ has the projection property $T^2 = T$. Let
$$U = \mathrm{range}\, T = \{T(x) : x \in \mathbb{C}^m\}, \qquad V = \mathrm{null}(T) = \{x : T(x) = 0\}.$$
One shows that $U + V = \mathbb{C}^m$ and $U \cap V = \{0\}$.

Thursday, September 7, 2006:

3. Matrix view:

Definition 3.1.1. A projector is a $P \in \mathbb{C}^{m,m}$ such that $P^2 = P$.

Fact: Let $P$ be a projector. Let $U = \mathrm{range}\, P = C(P)$ and let $V = \mathrm{null}\, P = N(P)$. Then:
$I - P$ is a projector; $\mathrm{null}\, P = \mathrm{range}(I - P)$; and $\mathbb{C}^m = U \oplus V$.

3.2 Orthogonal Projections

Definition 3.2.1. A projector $P$ is an orthogonal projector if $v^* u = 0$ for all $v \in N(P)$, $u \in C(P)$.

Fact: A projector $P$ is orthogonal iff $P^* = P$.

Proof. If $P$ is orthogonal, then for all $v \in N(P) = C(I - P)$ and $u \in C(P)$ we have $v^* u = 0$. Write $u = Px$ and $v = (I - P)y$, where $x$ and $y$ are arbitrary. Then
$$v^* u = \big((I - P)y\big)^* P x = y^* (I - P)^* P x = 0.$$
So every entry of $(I - P)^* P$ is $0$, i.e. $(I - P)^* P = 0$, which gives $P = P^* P$. Therefore $P^* = (P^* P)^* = P^* P = P$.

Conversely, if $P^* = P$, then with $u = Px$, $v = (I - P)y$,
$$v^* u = y^* (I - P)^* P x = y^* (P - P^2) x = y^* \, 0 \, x = 0. \qquad \square$$

How to construct an orthogonal projection onto $U$ (the range) along $V$ (the nullspace):

Method 1 (derived from the normal equations): Let $U$ be a subspace of $\mathbb{C}^m$ and suppose $\dim U = n < m$. Let $a_1, a_2, \dots, a_n$ be a basis of $U$, and let $A$ be the full column rank matrix
$$A = [a_1, \dots, a_n].$$

(Remark: for any $Ax = b$, the normal equations $A^* A x = A^* b$ always have solutions.) In our case, we can show that $\mathrm{null}(A^* A) = \{0\}$. Hence $A^* A$ is invertible, so the unique solution to the normal equations is
$$x = (A^* A)^{-1} A^* b.$$
Form
$$P = A (A^* A)^{-1} A^*.$$
Then
$$P^2 = A (A^* A)^{-1} A^* A (A^* A)^{-1} A^* = A\, I\, (A^* A)^{-1} A^* = P,$$
and
$$P^* = \big(A (A^* A)^{-1} A^*\big)^* = A \big((A^* A)^*\big)^{-1} A^* = A (A^* A)^{-1} A^* = P.$$

Method 2: Supply an orthonormal basis $u_1, \dots, u_n$ of $U \subseteq \mathbb{C}^m$. Define
$$P = u_1 u_1^* + \cdots + u_n u_n^*.$$
Note that
$$u_i u_i^*\, u_j u_j^* = 0 \quad (i \ne j), \qquad u_i u_i^*\, u_i u_i^* = u_i \cdot 1 \cdot u_i^* = u_i u_i^*.$$
Thus, using these facts, it can be shown that $P^2 = P$. Finally, observe that $P^* = P$ and
$$P x = u_1 (u_1^* x) + \cdots + u_n (u_n^* x) = c_1 u_1 + \cdots + c_n u_n.$$
Take $x = u_i$ and get $P x = u_i$, so $\mathrm{range}(P) = \mathrm{span}(u_1, \dots, u_n)$.

Tuesday, September 19, 2006:

Definition 3.2.2. The Householder transform defined by $v \ne 0$ is
$$H_v := I - 2\,\frac{v v^*}{v^* v}.$$
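Before developing the Householder transform further, here is a minimal MATLAB sketch of the two projector constructions above (Method 1 via the normal equations, Method 2 via an orthonormal basis obtained here from an economy QR factorization). The random test matrix is arbitrary; it has full column rank with probability 1.

m = 6; n = 3;
A = randn(m, n);              % columns a_1,...,a_n: a basis of U

% Method 1: P = A (A*A)^{-1} A*  (without forming the inverse explicitly).
P1 = A / (A'*A) * A';

% Method 2: orthonormal basis u_1,...,u_n of U, then P = sum_i u_i u_i*.
[Q, ~] = qr(A, 0);            % economy QR: columns of Q span U orthonormally
P2 = Q * Q';

norm(P1 - P2)                 % ~ eps: the two constructions agree
norm(P1*P1 - P1)              % ~ eps: P^2 = P
norm(P1' - P1)                % ~ eps: P* = P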

Note that $H_v$ is a unitary Hermitian operator.

Key Idea: We want to map a vector $x$ to a vector $y$ stably. The best choice would be $Ux = y$ with $U$ unitary; however, we must then have $\|x\|_2 = \|y\|_2$. Try $H_v$, where $v = x - y$.

Fact: If $x, y$ are real and of the same length, then $(x - y) \perp (x + y)$.

In general,
$$H_v v = \left(I - 2\,\frac{v v^*}{v^* v}\right) v = v - \frac{2 v (v^* v)}{v^* v} = -v.$$
Hence $H_v(x - y) = y - x$. Also, in general, if $w \perp v$, then $H_v w = w$, so
$$H_v(x + y) = x + y.$$
Adding these together,
$$H_v x = y, \qquad H_v y = H_v^2 x = x.$$

Big Idea: Use the Householder transform to make as many zeros as possible:
$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \longmapsto \begin{bmatrix} \pm\|x\| \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \pm z.$$
We prefer $y = \big(-\mathrm{sign}(x_1)\,\|x\|,\ 0, \dots, 0\big)$, to avoid cancellation.

We can now take Householder transforms and use them to go along the columns of a matrix $A$ to make it upper triangular. For example, we can arrange
$$(H_{v_3} H_{v_2} H_{v_1}) A = R,$$
where $R$ is upper triangular. Then, taking $Q^* = H_{v_3} H_{v_2} H_{v_1}$ (an upper triangularization method):
$$Q^* A = R \quad \Longrightarrow \quad A = QR.$$

Algorithm 10.1. Input $A$, $m$, $n$. Let $R = A$, $p = \min\{m, n\}$.
for $k = 1 : p$
  $x = R_{k:m,\,k}$
  $x(1) = x(1) + \mathrm{sign}(x(1))\,\|x\|_2$

  $v_k = x / \|x\|_2$
  $R_{k:m,\,k:n} = R_{k:m,\,k:n} - 2 v_k (v_k^* R_{k:m,\,k:n})$
end
return $v_1, \dots, v_p$, $R$.

Flop count. The work per pass at $k = 1$ is about $2mn + mn + mn = 4mn$ (4 flops per entry). Total work:
$$4\big(mn + (m-1)(n-1) + \cdots + (m - p + 1)(n - p + 1)\big).$$
For $m \ge n$ (so $p = n$), this ends up being approximately $2mn^2 - \frac{2}{3}n^3$. Total work:
$$2mn^2 - \tfrac{2}{3}n^3, \ m > n; \qquad \tfrac{4}{3}n^3, \ m = n; \qquad 2m^2 n - \tfrac{2}{3}m^3, \ m < n.$$

Algorithm 10.2 (Implicit calculation of $Q^* b$). Input the $v_i$'s and $b$:
for $k = 1 : n$
  $b_{k:m} = b_{k:m} - 2 v_k (v_k^* b_{k:m})$
end
return $b$.

Algorithm 10.3 (Implicit calculation of $Q x$). Input the $v_i$'s and $x$:
for $k = n : -1 : 1$
  $x_{k:m} = x_{k:m} - 2 v_k (v_k^* x_{k:m})$
end
return $x$.

(A runnable MATLAB sketch of these algorithms appears below, after the comparison of least-squares methods.)

4 Least Squares

The problem: solve $Ax = b$ when there may not be a solution. We want a least squares solution that minimizes
$$\|b - Ax\|_2.$$
There is a solution. Let $A = [a_1, \dots, a_n]$. Force
$$a_i^* (b - Ax) = 0, \quad \text{i.e.} \quad A^* A x = A^* b;$$
these always have solutions. The solutions to these equations are called the least squares solutions to $Ax = b$.

Ex 1: Linear regression: variables $x$ and $y$ are theoretically related by the linear equation
$$y = ax + b.$$

We estimate $a$, $b$ given data pairs $(x_i, y_i)$. So we have the following system:
$$\begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}.$$

Example: Interpolating 11 data points by a polynomial using a Vandermonde matrix (in MATLAB, using backslash):

x = (-5:5), y = [0; 0; 0; 1; 1; 1; zeros(5, 1)], A = vander(x), c = A\y

Example (normal equations in general): Suppose we have an inner product $\langle \cdot, \cdot \rangle$ on a finite dimensional vector space $V$ with basis $v_1, \dots, v_n$. To approximate an element $v \in V$ using $v_1, \dots, v_k$, $k < n$, simply write
$$v = c_1 v_1 + \cdots + c_k v_k.$$
This may well have no solution. To find the best solution (minimize the residual $r$), force $\langle v_j, r \rangle = 0$ for $j = 1, \dots, k$. This leads to the following system:
$$\begin{aligned} \langle v_1, v_1 \rangle c_1 + \langle v_1, v_2 \rangle c_2 + \cdots + \langle v_1, v_k \rangle c_k &= \langle v_1, v \rangle \\ &\ \vdots \\ \langle v_k, v_1 \rangle c_1 + \langle v_k, v_2 \rangle c_2 + \cdots + \langle v_k, v_k \rangle c_k &= \langle v_k, v \rangle \end{aligned}$$
i.e. $A c = b$.

Example: $V = C[0,1]$, $v_i = x^i$, $i = 1, \dots, k$, using the inner product
$$\langle f(x), g(x) \rangle = \int_0^1 f(x) g(x)\, dx.$$
What results is the Hilbert matrix,
$$\langle x^i, x^j \rangle = \frac{1}{i + j + 1},$$
which has a very bad condition number.

Remark: In most practical problems, the $A$ of $Ax = b$ has full column rank. Thus $A$ is $m \times n$, $m \ge n$, and $\mathrm{rank}(A) = n$. One checks that $A^* A$ is an $n \times n$ matrix of rank $n$. So $A^* A$ is invertible and the normal equations have a unique solution. Do not use the inverse of $A^* A$!

Rather, use something like a Cholesky factorization. First, write
$$A^* A = R^* R,$$
where $R$ is upper triangular (possible since $A^* A$ is SPD). Now write
$$R^* R x = A^* b,$$
solve $R^* y = A^* b$ for $y$, then solve $R x = y$ for $x$. The cost here is $mn^2 + \frac{1}{3}n^3$. If we did plain old Gaussian elimination on the normal equations, the cost would be $mn^2 + \frac{2}{3}n^3$.

The next method is to use the QR factorization of $A$: $A = QR$. Now solve
$$R x = Q^* b$$
for $x$. The cost is $2mn^2 - \frac{2}{3}n^3$. The reason we should use this is:
$$\|b - Ax\|_2 = \|b - QRx\| = \|Q(Q^* b - Rx)\| = \|Q^* b - Rx\|,$$
and with
$$Q^* A = \begin{bmatrix} \hat{R} \\ 0 \end{bmatrix}, \qquad Q^* b = \begin{bmatrix} b_1 \\ y \end{bmatrix},$$
we get $\|Q^* b - Rx\|^2 = \|b_1 - \hat{R}x\|^2 + \|y\|^2$, which is minimized by solving $\hat{R} x = b_1$.

The last method is to use the reduced SVD: solve $\Sigma w = \hat{U}^* b$, then set $x = V w$. The cost is $2mn^2 + 11n^3$. The reason for using this is stability and the ability to solve rank-deficient problems. QR can't do this without serious modification, and GE / Cholesky can't handle rank deficiency at all.

Tuesday, September 26, 2006:

5 Conditioning and Condition Numbers

Fundamental question: Given a calculation of $f(x)$, how sensitive is $f(x)$ to changes in $x$? This is really a mathematical sensitivity. The problem is: rather than compute the intended $f(x)$, we might compute $f(x + \delta x)$.
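Before turning to conditioning, here is the MATLAB sketch promised after Algorithms 10.1-10.3: a bare-bones Householder QR used to solve a least-squares problem, compared against MATLAB's backslash. It is only an illustrative sketch under simple assumptions (no pivoting, no handling of rank deficiency), the random test problem is arbitrary, and the function names (demo_house_ls, house_qr, apply_Qstar, sign_) are ad hoc.

function demo_house_ls
% Least-squares solve via Householder QR (Algorithms 10.1 and 10.2).
m = 50; n = 5;
A = randn(m, n); b = randn(m, 1);

[V, R] = house_qr(A);              % Q stored implicitly via the reflectors v_k
c = apply_Qstar(V, b);             % c = Q'*b
x = R(1:n, 1:n) \ c(1:n);          % solve the top n-by-n triangular system
disp(norm(x - A\b))                % small (roundoff level): agrees with backslash
end

function [V, R] = house_qr(A)
% Algorithm 10.1: Householder triangularization; on return R = Q'*A.
[m, n] = size(A); p = min(m, n);
R = A; V = zeros(m, p);
for k = 1:p
    x = R(k:m, k);
    x(1) = x(1) + sign_(x(1)) * norm(x);   % reflect onto -sign(x_1)*||x||*e_1
    v = x / norm(x);
    R(k:m, k:n) = R(k:m, k:n) - 2 * v * (v' * R(k:m, k:n));
    V(k:m, k) = v;
end
end

function b = apply_Qstar(V, b)
% Algorithm 10.2: apply Q' to b implicitly, using the stored reflectors.
[m, p] = size(V);
for k = 1:p
    v = V(k:m, k);
    b(k:m) = b(k:m) - 2 * v * (v' * b(k:m));
end
end

function s = sign_(a)
% Sign convention with sign_(0) = 1, so the reflection never degenerates.
if a >= 0, s = 1; else, s = -1; end
end

Algorithm 10.3 (applying $Q$ rather than $Q^*$) is the same loop as apply_Qstar, run in the reverse order $k = p, \dots, 1$.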

5.1 Absolute condition number

One measure is the (absolute) condition number. Set $\delta f = f(x + \delta x) - f(x)$. Take
$$\hat{\kappa} = \hat{\kappa}(x) = \lim_{\delta \to 0} \sup_{\|\delta x\| \le \delta} \frac{\|\delta f\|}{\|\delta x\|}.$$

Example: $f(x) = 4x^2$.
$$\hat{\kappa} = \lim_{\delta \to 0} \sup_{|\delta x| \le \delta} \frac{|f(x + \delta x) - f(x)|}{|\delta x|} = |f'(x)| = 8|x|.$$
This is odd: we want $x^2$ to be stable.

Note: For smooth multivariate
$$f(x) = \begin{bmatrix} f_1(x_1, \dots, x_n) \\ \vdots \\ f_m(x_1, \dots, x_n) \end{bmatrix},$$
the Jacobian of $f$,
$$J_f(x) = \left[ \frac{\partial f_i}{\partial x_j} \right]_{m,n},$$
plays the part of the derivative, in the sense that $\delta f \approx J_f(x)\, \delta x$; i.e.,
$$\lim_{\|\delta x\| \to 0} \frac{\|\delta f - J_f(x)\, \delta x\|}{\|\delta x\|} = 0.$$
So,
$$\hat{\kappa} = \lim_{\delta \to 0} \sup_{\|\delta x\| \le \delta} \frac{\|J_f(x)\, \delta x\|}{\|\delta x\|} = \|J_f(x)\|.$$

Example 2: $f(x) = x_1 - x_2$. The condition number here is $2$, which suggests that subtraction is a stable operation (numerically, this is not true!). Obviously, we have the wrong idea.
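The "wrong idea" shows up immediately in floating point arithmetic: the absolute error of a subtraction is small, but the relative error of the result can be enormous when the operands nearly cancel. A small MATLAB illustration (the particular numbers are arbitrary):

% Subtracting nearly equal numbers: tiny absolute error, large relative error.
t = 1e-12;
d = (1 + t) - 1;              % computed difference; the exact answer is t
abs(d - t)                    % tiny: of order eps (~1e-16)
abs(d - t) / t                % large: roughly 1e-4 here, so about 12 digits are lost

This is exactly the effect that the relative condition number of the next subsection captures.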

5.2 Relative Condition Number

Need $x, f(x) \ne 0$:
$$\kappa := \lim_{\delta \to 0} \sup_{\|\delta x\| \le \delta} \frac{\|\delta f\| / \|f\|}{\|\delta x\| / \|x\|}.$$

Example 1: $f(x) = 4x^2$:
$$\kappa = \frac{|f'(x)|\,|x|}{|f|} = \frac{8|x| \cdot |x|}{4x^2} = 2.$$

Example 2: $f(x) = x_1 - x_2$:
$$\kappa = \frac{\|J_f(x)\|\,\|x\|}{\|f\|} = \frac{2 \max\{|x_1|, |x_2|\}}{|x_1 - x_2|},$$
which blows up when $x_1 \approx x_2$.

Heuristic: If the condition number of the problem is $\kappa$, expect to lose $\log_{10} \kappa$ digits of accuracy.

Reason: If $\frac{\|\delta x\|}{\|x\|} \approx 10^{-\beta}$, then, as $\delta x \to 0$,
$$\frac{\|\delta f\|}{\|f\|} \approx \kappa\, \frac{\|\delta x\|}{\|x\|} \approx 10^{\log_{10} \kappa}\, 10^{-\beta} = 10^{\log_{10} \kappa - \beta}.$$

5.3 Examples

5.3.1 Wilkinson's Polynomial

We define
$$p(x) = \prod_{j=1}^{20} (x - j) = a_0 + \cdots + a_{19} x^{19} + x^{20}.$$
The condition number of the root $\lambda = 15$ is $5.1 \times 10^{13}$. This is the "perfidious polynomial," as Wilkinson calls it.

Theorem 5.3.1. The condition number of computing $b = Ax = f(x)$ is
$$\kappa = \frac{\|A\|\,\|x\|}{\|b\|} \le \|A\|\,\|A^{-1}\| =: \kappa(A).$$

Proof.
$$\frac{\|f(x + \delta x) - f(x)\| / \|f(x)\|}{\|\delta x\| / \|x\|} = \frac{\|A(x + \delta x) - Ax\|}{\|Ax\|} \cdot \frac{\|x\|}{\|\delta x\|} = \frac{\|A \delta x\|}{\|\delta x\|} \cdot \frac{\|x\|}{\|b\|} \le \frac{\|A\|\,\|x\|}{\|b\|}.$$
Also, $Ax = b$ implies $x = A^{-1} b$, so
$$\|x\| = \|A^{-1} b\| \le \|A^{-1}\|\,\|b\|.$$
So,
$$\kappa \le \frac{\|A\|\,\|x\|}{\|b\|} \le \|A\|\,\|A^{-1}\|.$$
These inequalities are frequently nearly equalities; they can be exact equalities for certain choices of $b$ and $\delta x$. $\square$

Fact: $\kappa_2(A) = \dfrac{\sigma_1}{\sigma_n}$.

Theorem 5.3.2. Let $b$ be fixed, and let $x$ be a solution to $Ax = b$. Let $f(A) = A^{-1} b$. Then $\kappa_f \le \kappa(A)$.

Theorem 5.3.3 (Perturbation Theorem). Suppose an invertible matrix $A$ satisfies $Ax = b$. Suppose $\delta A$ and $\delta b$ are given and $\delta x$ satisfies
$$(A + \delta A)(x + \delta x) = b + \delta b.$$
Set $B = A^{-1} \delta A$ and suppose $\beta = \|B\| < 1$. Then
$$\frac{\|\delta x\|}{\|x\|} \le \frac{\kappa(A)}{1 - \beta} \left\{ \frac{\|\delta A\|}{\|A\|} + \frac{\|\delta b\|}{\|b\|} \right\}.$$
This provides an estimate of the relative error of the solution.

Proof. Subtract $Ax = b$ from $(A + \delta A)(x + \delta x) = b + \delta b$:
$$A \delta x + \delta A\, x + \delta A\, \delta x = \delta b$$
$$(A + \delta A)\, \delta x = \delta b - \delta A\, x$$
$$A(I + A^{-1} \delta A)\, \delta x = \delta b - \delta A\, x$$
$$A(I + B)\, \delta x = \delta b - \delta A\, x$$

By the Banach lemma (note $\|B\| < 1$, so $I + B$ is invertible),
$$\delta x = (I + B)^{-1} A^{-1} \{\delta b - \delta A\, x\},$$
$$\|\delta x\| \le \|(I + B)^{-1}\|\,\|A^{-1}\| \big\{ \|\delta b\| + \|\delta A\|\,\|x\| \big\} \le \frac{1}{1 - \beta}\,\|A^{-1}\|\,\|A\| \left\{ \frac{\|\delta b\|}{\|A\|\,\|x\|} + \frac{\|\delta A\|}{\|A\|} \right\} \|x\|,$$
so
$$\frac{\|\delta x\|}{\|x\|} \le \frac{\kappa(A)}{1 - \beta} \left\{ \frac{\|\delta b\|}{\|A\|\,\|x\|} + \frac{\|\delta A\|}{\|A\|} \right\} \le \frac{\kappa(A)}{1 - \beta} \left\{ \frac{\|\delta b\|}{\|b\|} + \frac{\|\delta A\|}{\|A\|} \right\}. \qquad \square$$

6 Floating Point Analysis

Thursday, September 28, 2006:

Ref: "What Every Computer Scientist Should Know About Floating-Point Arithmetic," David Goldberg, 1991, ACM Computing Surveys.

While we are used to working with $\mathbb{R}$ or $\mathbb{C}$, on a computer we are limited to an approximation of these. We say
$$x \mapsto \mathrm{fl}(x) = \pm \frac{m}{\beta^t}\, \beta^e.$$
Here $0 \le m < \beta^t$, $m \in \mathbb{Z}$, and $a \le e \le b$, where

$\beta$ — the base of our representation
$m/\beta^t$ — the mantissa of $x$
$e$ — the exponent of $x$
$t$ — the precision of our representation

Example: the IEEE double precision standard: 1 bit for the sign, 11 bits for the exponent, and 52 bits for the mantissa.

So the floating point universe is finite. We'll idealize our field a bit by removing the bounds on the exponents, so that we have a countably infinite, self-similar set of floating point numbers. This avoids overflow and underflow.

Machine Epsilon. The fundamental accuracy of a floating point approximation is measured by
$$\epsilon_{\text{machine}} = \tfrac{1}{2} \beta^{1-t}.$$

(This is eps in Matlab.) It is the measure of the gaps between floating point numbers. A reasonable expectation is that $\mathrm{fl}(x)$ approximates $x$ via rounding, i.e.
$$|x - \mathrm{fl}(x)| \le \epsilon_{\text{machine}}\, |x| \qquad \text{(Rounding Axiom, RA).}$$

Note: if we start with
$$x = 0.d_1 d_2 \cdots d_t d_{t+1} \cdots \times \beta^e,$$
then
$$\beta^{t-e} x = d_1 d_2 \cdots d_t \,.\, d_{t+1} \cdots$$
Thus
$$d_1 d_2 \cdots d_t \le \beta^{t-e} x \le d_1 d_2 \cdots d_t + 1.$$
Now, if we choose the left or the right hand side, whichever is closer to $\beta^{t-e} x$, we get
$$|x \beta^{t-e} - \mathrm{fl}(x) \beta^{t-e}| \le \tfrac{1}{2},$$
so
$$|x - \mathrm{fl}(x)| \le \tfrac{1}{2} \beta^{e-t} = \tfrac{1}{2} \beta^{1-t} \beta^{e-1}.$$
But $\beta|x| \ge \beta^e$, i.e. $|x| \ge \beta^{e-1}$, so
$$|x - \mathrm{fl}(x)| \le \tfrac{1}{2} \beta^{1-t} |x|.$$
Thus $|x - \mathrm{fl}(x)| \le \epsilon_{\text{machine}} |x|$.

Remark: Some machines have a different $\epsilon_{\text{machine}}$. In particular, if one deals with complex numbers, one has to enlarge the $\epsilon_{\text{machine}}$ of the rounding axiom by a factor of $2^{3/2}$. So, in base 2,
$$\epsilon_{\text{machine}} = \tfrac{1}{2} \beta^{1-t} = \beta^{-t} = 2^{-t}.$$

Fundamental Axiom of Floating Point Arithmetic (FAFPA): Let $x, y \in F$. Let $+$ stand for any of the four basic arithmetic operations, and let $\oplus$ be the corresponding machine operation. Require:
$$|(x + y) - (x \oplus y)| \le \epsilon_{\text{machine}}\, |x + y|$$

(considerations must be made for $x + y$ to be nonzero).

Problems that occur on real machines: Consider the system with $\beta = 10$, $t = 5$, $-70 \le e \le 70$.

1. $10^{-40} \otimes 10^{-40}$: this gives $10^{-80}$ and underflow, which is typically OK.

2. $10^{40} \otimes 10^{40} / 10^{60}$: evaluated left to right, this causes an overflow; right to left, it is OK.

3. $10^{40} \otimes 10^{40} \otimes 10^{-60}$: can overflow or not, depending on how it is grouped.

4. $x = 5/7$, $y = 0.71425$. Then $\mathrm{fl}(x) \ominus \mathrm{fl}(y) = 0.00003$, whereas the correct value is $0.34714 \times 10^{-4}$; the error is $0.4714 \times 10^{-5}$ and the relative error is about $0.136$. The error is larger than it should be; this is because we started with an $x$ which is not a floating point number. So we should avoid subtracting nearly equal real numbers.

Classic Example: Solve $x^2 + bx + c = 0$. The quadratic formula can cause catastrophic cancellation, so we reorganize the calculation. Since
$$x^2 + bx + c = (x - r_1)(x - r_2),$$
note $r_1 r_2 = c$. So calculate
$$r_1 = \frac{-b - \mathrm{sgn}(b)\sqrt{b^2 - 4c}}{2}, \qquad r_2 = \frac{c}{r_1}.$$

7 Stability

We have a problem: calculate $f(x)$; on the machine we actually compute $\hat{f}(x)$. We want
$$\frac{\|f(x) - \hat{f}(x)\|}{\|f(x)\|} = O(\epsilon_{\text{machine}}).$$
This is true independent of the norm used, as all norms on a finite dimensional space are equivalent. If we can prove this, we call the algorithm accurate. One example of this is approximating $x$ with $\mathrm{fl}(x)$: the Rounding Axiom tells us this is an accurate algorithm.

Thursday, October 5, 2006:
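As an aside before continuing, here is a minimal MATLAB sketch of the reorganized quadratic computation above, compared with the textbook formula on a polynomial with one tiny root (the coefficients b = 1e8, c = 1 are an arbitrary illustration):

% Roots of x^2 + b*x + c with b = 1e8, c = 1: approximately -1e8 and -1e-8.
b = 1e8; c = 1;

% Naive formula: the small root suffers catastrophic cancellation.
r_naive = (-b + sqrt(b^2 - 4*c)) / 2    % few (possibly zero) correct digits

% Reorganized computation: no cancellation.
r1 = (-b - sign(b)*sqrt(b^2 - 4*c)) / 2;
r2 = c / r1                             % ~ -1.0000e-08, accurate to roughly full precision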

Note that multiplication (of scalars) is accurate:
$$\hat{f}(x, y) = \mathrm{fl}(x) \otimes \mathrm{fl}(y) = \big(x(1 + \epsilon_1)\, y(1 + \epsilon_2)\big)(1 + \epsilon_3) = x y\,(1 + O(\epsilon_{\text{machine}})) = f(x, y)(1 + O(\epsilon)).$$
So
$$\hat{f}(x, y) - f(x, y) = f(x, y)\, O(\epsilon), \qquad \frac{|\hat{f}(x, y) - f(x, y)|}{|f(x, y)|} = O(\epsilon).$$

For the outer product, $f(x, y) = x y^*$, which is the matrix whose $(i, j)$ entry satisfies
$$x_i \bar{y}_j \longmapsto \mathrm{fl}(x_i) \otimes \mathrm{fl}(y_j) = x_i \bar{y}_j\, (1 + O(\epsilon)).$$
So, entrywise, the calculation is accurate. Using any desired norm, we can also show the matrix as a whole is accurate. But this is not backward stable: $\hat{f}(x, y)$ is just an outer product with random perturbations in each entry, and we cannot expect the result to be rank one, whereas $f(\hat{x}, \hat{y}) = \hat{x}\hat{y}^*$ is rank one.

Now consider inner products. Problem: $f(x, y) = x^* y$. Algorithm: compute $\hat{f}(x, y)$ on a computer satisfying RA and FAFPA. Here $x = (x_1, \dots, x_n)$, $y = (y_1, \dots, y_n)$, and
$$\hat{s}_1 = \mathrm{fl}(\bar{x}_1) \otimes \mathrm{fl}(y_1) = \bar{x}_1 y_1 (1 + \epsilon_1)(1 + \epsilon_2)(1 + \mu_1),$$
$$\hat{s}_2 = \hat{s}_1 \oplus \big(\mathrm{fl}(\bar{x}_2) \otimes \mathrm{fl}(y_2)\big) = \big(\bar{x}_1 y_1 (1 + e_{21}) + \bar{x}_2 y_2 (1 + e_{22})\big)(1 + \mu_2).$$
Eventually, you get
$$\hat{s}_n = \bar{x}_1 y_1 (1 + e_{n1}) + \bar{x}_2 y_2 (1 + e_{n2}) + \cdots + \bar{x}_n y_n (1 + e_{nn}).$$
Finally, set $\hat{x}_i = x_i$ (so $\hat{x} = x$), $\hat{y}_i = y_i (1 + e_{n,i})$, and $\hat{y} = [\hat{y}_1, \dots, \hat{y}_n]$. So the computed value is
$$\hat{f}(x, y) = \hat{s}_n = x^* \hat{y} = f(x, \hat{y}),$$
where $\|y - \hat{y}\| = \|y\|\, O(\epsilon)$. So we have backward stability. Unfortunately, this algorithm is not accurate:
$$\begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = x_1 - x_2,$$
and subtraction is not accurate.

8 Stability of the Householder Triangularization

Caution regarding the stability of vector or matrix calculations, e.g. inner products: a term like $\bar{x}_1 \tilde{y}_1 (1 + \epsilon_{n1})$ arises where, along the way,
$$1 + \epsilon_{n,1} = (1 + \mu_1) \cdots (1 + \mu_n) = (1 + \mu)^n = 1 + n\mu + O(\mu^2).$$
So, in general, our order constants $C$ may be of order $n$.

Problem: Solve for $x$ in $Ax = b$. The condition number of this problem is $\kappa = \kappa(A)$.

Algorithm 16.1 (solve $Ax = b$ by QR):
1. Factor $A = QR$ into orthogonal $Q$ and upper triangular $R$. (We actually find $\tilde{Q}$ and $\tilde{R}$.)
2. Compute $y = Q^* b$. (Actually, compute $\tilde{y} = \mathrm{fl}(\tilde{Q}^* b)$.)
3. Solve $Rx = y$ to get the solution $x = R^{-1} y$. (Actually, compute $\tilde{x} = \mathrm{fl}(\tilde{R}^{-1} \tilde{y})$.)

The necessary backward stability facts:

1. The computed $\tilde{Q}$ and $\tilde{R}$ of $A = QR$ by Householder reflections satisfy
$$\tilde{Q}\tilde{R} = A + \delta A \quad \text{with} \quad \frac{\|\delta A\|}{\|A\|} = O(\epsilon).$$
This is exactly backward stability of $[Q, R] = f(A)$.

2. If $\tilde{Q}$ is orthogonal and $b$ is a vector, and we compute $y = \tilde{Q}^* b$, then there exists $\delta \tilde{Q}$ such that
$$(\tilde{Q} + \delta \tilde{Q})\, \tilde{y} = b \quad \text{where} \quad \frac{\|\delta \tilde{Q}\|}{\|\tilde{Q}\|} = O(\epsilon).$$
This is backward stability for $f(Q) = Q^* b$.

3. If $\tilde{R}$ is nonsingular and upper triangular, then the solution $\tilde{x} = \tilde{R}^{-1} \tilde{y}$ computed by back-substitution satisfies
$$(\tilde{R} + \delta \tilde{R})\, \tilde{x} = \tilde{y} \quad \text{for some } \delta \tilde{R} \text{ with } \frac{\|\delta \tilde{R}\|}{\|\tilde{R}\|} = O(\epsilon).$$
This is just backward stability of back-substitution.

Remark about Fact 1: Suppose $Q_1 = H_{v_1}$. Then
$$Q_1 A = H_{v_1} A = \left(I - 2\,\frac{v_1 v_1^*}{v_1^* v_1}\right) A = A - \frac{2 v_1 (v_1^* A)}{v_1^* v_1}.$$
We can use the backward stability of inner products, $\otimes$, and $\oplus$ to show that this algorithm is backward stable. We want the theorem:

Theorem 8.0.4. Algorithm 16.1 is backward stable, in the sense that the computed $\tilde{x}$ satisfies
$$(A + \Delta A)\tilde{x} = b \quad \text{for some } \Delta A \text{ with } \frac{\|\Delta A\|}{\|A\|} = O(\epsilon).$$

Proof. From Fact 2, $b = (\tilde{Q} + \delta \tilde{Q})\tilde{y}$. From Fact 3,
$$b = (\tilde{Q} + \delta \tilde{Q})(\tilde{R} + \delta \tilde{R})\tilde{x}.$$
From Fact 1, we get
$$b = (\tilde{Q}\tilde{R} + \delta \tilde{Q}\,\tilde{R} + \tilde{Q}\,\delta \tilde{R} + \delta \tilde{Q}\,\delta \tilde{R})\tilde{x} = \big(A + (\delta A + \text{stuff})\big)\tilde{x},$$
where
$$\Delta A = \delta A + \delta \tilde{Q}\,\tilde{R} + \tilde{Q}\,\delta \tilde{R} + \delta \tilde{Q}\,\delta \tilde{R}.$$
So we need to check, by the triangle inequality, that each part of $\Delta A$ is $O(\epsilon)\,\|A\|$. Certainly $\frac{\|\delta A\|}{\|A\|} = O(\epsilon)$ by Fact 1. Now, $\tilde{Q}\tilde{R} = A + \delta A$, so $\tilde{R} = \tilde{Q}^*(A + \delta A)$. Then
$$\frac{\|\tilde{R}\|}{\|A\|} \le \frac{\|\tilde{Q}^*\|\,\big(\|A\| + \|\delta A\|\big)}{\|A\|} \le 1 \cdot (1 + O(\epsilon)).$$
So, for the second term,
$$\frac{\|\delta \tilde{Q}\,\tilde{R}\|}{\|A\|} \le \frac{\|\tilde{R}\|}{\|A\|}\,\|\delta \tilde{Q}\| = (1 + O(\epsilon))\, O(\epsilon) = O(\epsilon).$$
For the third term,
$$\frac{\|\tilde{Q}\,\delta \tilde{R}\|}{\|A\|} \le \frac{\|\delta \tilde{R}\|}{\|\tilde{R}\|} \cdot \frac{\|\tilde{Q}\|\,\|\tilde{R}\|}{\|A\|} = O(\epsilon) \cdot 1 \cdot (1 + O(\epsilon)) = O(\epsilon).$$
Finally,
$$\frac{\|\delta \tilde{Q}\,\delta \tilde{R}\|}{\|A\|} \le \|\delta \tilde{Q}\| \cdot \frac{\|\delta \tilde{R}\|}{\|\tilde{R}\|} \cdot \frac{\|\tilde{R}\|}{\|A\|} = O(\epsilon)\, O(\epsilon)\, (1 + O(\epsilon)) = O(\epsilon).$$

Now we appeal to the forward error estimate theorem (Theorem 5.3.3) and the fact that the condition number of the problem $Ax = b$ is $\kappa(A)$ to obtain:

Theorem 8.0.5. The computed $\tilde{x}$ of Algorithm 16.1 satisfies
$$\frac{\|\tilde{x} - x\|}{\|x\|} = O(\kappa(A)\, \epsilon).$$

9 Stability of Backsolving

Problem: Given a nonsingular upper triangular $R = [r_{ij}]_{m,m}$ and $b = [b_i]_m$, solve for $x = [x_i]$ in $Rx = b$.

Algorithm 17.1:
for $j = m : -1 : 1$
  $x_j = \dfrac{1}{r_{jj}} \left( b_j - \displaystyle\sum_{k=j+1}^{m} r_{jk} x_k \right)$
end

(A runnable MATLAB version of this algorithm appears at the end of these notes.)

The flop count for this is
$$\sum_{k=1}^{m} \big[ 2 + (m - (k+1) + 1) \cdot 2 \big] \approx m^2.$$

Theorem 9.0.6. Algorithm 17.1 applied to a system of floating point numbers is backward stable, in the sense that the computed $\tilde{x}$ satisfies
$$(R + \delta R)\tilde{x} = b \quad \text{with} \quad \frac{\|\delta R\|}{\|R\|} = O(\epsilon).$$

Proof. We do this for $m = 3$:
$$\begin{aligned} r_{11} x_1 + r_{12} x_2 + r_{13} x_3 &= b_1 \\ r_{22} x_2 + r_{23} x_3 &= b_2 \\ r_{33} x_3 &= b_3 \end{aligned}$$
Ideally,
$$x_3 = \frac{b_3}{r_{33}}, \qquad x_2 = \frac{1}{r_{22}}\big(b_2 - r_{23} x_3\big), \qquad x_1 = \frac{1}{r_{11}}\big(b_1 - (r_{12} x_2 + r_{13} x_3)\big).$$

Instead, what happens is
$$\tilde{x}_3 = \mathrm{fl}\!\left(\frac{b_3}{r_{33}}\right) = \frac{b_3}{r_{33}}(1 + \epsilon), \qquad |\epsilon| \le \epsilon_{\text{machine}}.$$
Write $1 + \epsilon = \dfrac{1}{1 + \epsilon_1}$, which implies
$$\epsilon = \frac{1}{1 + \epsilon_1} - 1 = \frac{-\epsilon_1}{1 + \epsilon_1} = -\epsilon_1 (1 - \epsilon_1 + \epsilon_1^2 - \cdots) = -\epsilon_1 + O(\epsilon_1^2).$$
Here we can say $|\epsilon_1| \le \epsilon_{\text{machine}} + O(\epsilon_{\text{machine}}^2)$. Thus
$$\tilde{x}_3 = \frac{b_3}{r_{33}(1 + \epsilon_1)} = \frac{b_3}{\tilde{r}_{33}}, \qquad \frac{|\tilde{r}_{33} - r_{33}|}{|r_{33}|} \le \epsilon_{\text{machine}} + O(\epsilon_{\text{machine}}^2).$$
That is, $\tilde{r}_{33}\,\tilde{x}_3 = b_3$ exactly. For $\tilde{x}_2$, calculate
$$\tilde{x}_2 = \mathrm{fl}\!\left(\frac{b_2 - r_{23}\tilde{x}_3}{r_{22}}\right) = \frac{1}{r_{22}}\big(b_2 - r_{23}\tilde{x}_3 (1 + \epsilon_1)\big)(1 + \epsilon_2)(1 + \epsilon_3),$$
where $\epsilon_1$ comes from the multiplication, $\epsilon_2$ from the subtraction, and $\epsilon_3$ from the division. But
$$(1 + \epsilon_2)(1 + \epsilon_3) = 1 + \epsilon_2 + \epsilon_3 + O(\epsilon_{\text{machine}}^2) = 1 + \mu,$$
and we can write
$$1 + \mu = \frac{1}{1 + \epsilon_4}, \qquad |\epsilon_4| \le 2\epsilon_{\text{machine}} + O(\epsilon_{\text{machine}}^2).$$
So we have
$$\tilde{x}_2 = \frac{b_2 - r_{23}(1 + \epsilon_1)\tilde{x}_3}{r_{22}(1 + \epsilon_4)} = \frac{b_2 - \tilde{r}_{23}\tilde{x}_3}{\tilde{r}_{22}},$$
where
$$\frac{|\tilde{r}_{23} - r_{23}|}{|r_{23}|} \le \epsilon_{\text{machine}} + O(\epsilon_{\text{machine}}^2), \qquad \frac{|\tilde{r}_{22} - r_{22}|}{|r_{22}|} \le 2\epsilon_{\text{machine}} + O(\epsilon_{\text{machine}}^2).$$

We continue on like this to get the error terms for $\tilde{x}_1$ (and, in general, for an $m \times m$ system). So we have
$$\frac{\|\tilde{R} - R\|}{\|R\|} \le m\,\epsilon_{\text{machine}} + O(\epsilon_{\text{machine}}^2),$$
i.e. $(R + \delta R)\tilde{x} = b$ with $\|\delta R\| / \|R\| = O(\epsilon_{\text{machine}})$. This shows backward stability in any norm (a different norm just changes the order constants). $\square$
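To close, here is the minimal MATLAB sketch of Algorithm 17.1 promised above, together with a rough numerical check of the bounds just discussed. The triangular test system is an arbitrary random one, and the function names (backsolve_demo, backsolve) are ad hoc.

function backsolve_demo
% Back substitution (Algorithm 17.1) and a rough check of its stability.
m = 200;
R = triu(randn(m)) + m*eye(m);   % random, safely nonsingular upper triangular
x_true = randn(m, 1);
b = R * x_true;                  % right-hand side with (nearly) known solution

x = backsolve(R, b);

% Relative residual: backward stability suggests this is O(eps_machine).
disp(norm(b - R*x) / (norm(R) * norm(x)))

% Forward error: expected to be roughly of order cond(R)*eps
% (cf. the Perturbation Theorem 5.3.3).
disp(norm(x - x_true) / norm(x_true))
disp(cond(R) * eps)
end

function x = backsolve(R, b)
% Algorithm 17.1: solve R*x = b for nonsingular upper triangular R.
m = length(b);
x = zeros(m, 1);
for j = m:-1:1
    x(j) = (b(j) - R(j, j+1:m) * x(j+1:m)) / R(j, j);
end
end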