MATH2071: LAB #5: Norms, Errors and Condition Numbers


1 Introduction

Contents: Introduction; Vector Norms; Matrix Norms; Compatible Matrix Norms; More on the Spectral Radius; Types of Errors; Condition Numbers; Exercises 1 through 7.

The objects we work with in linear systems are vectors and matrices. In order to make statements about the size of these objects, and about the errors we make in solutions, we want to be able to describe the sizes of vectors and matrices, which we do by using norms. We then need to consider whether we can bound the size of the product of a matrix and a vector, given that we know the size of the two factors. In order for this to happen, we will need to use matrix and vector norms that are compatible. These kinds of bounds will become very important in error analysis.

We will then consider the notions of forward error and backward error in a linear algebra computation. From the definitions of norms and errors, we can define the condition number of a matrix, which will give us an objective way of measuring how "bad" a matrix is, and how many digits of accuracy we can expect when solving a particular linear system.

This lab will take two sessions. You may find it convenient to print the pdf version of this lab rather than the web page itself.

2 Vector Norms

A vector norm assigns a size to a vector, in such a way that scalar multiples do what we expect, and the triangle inequality is satisfied. There are three common vector norms in n dimensions:

  The L1 vector norm:

      ||x||_1 = \sum_{i=1}^{n} |x_i|

  The L2 (or "Euclidean") vector norm:

      ||x||_2 = \sqrt{ \sum_{i=1}^{n} x_i^2 }

  The L-infinity vector norm:

      ||x||_\infty = \max_{i=1,...,n} |x_i|

To compute the norm of a vector x in Matlab:

  ||x||_1      = norm(x,1);
  ||x||_2      = norm(x,2) = norm(x);
  ||x||_\infty = norm(x,inf)

(Recall that inf is the Matlab symbol corresponding to ∞.)
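For example, here is a quick check of these commands on an arbitrary vector (not one of the exercise vectors below); the expected values follow directly from the definitions above.

  x = [ 3; -4; 0 ];

  norm(x,1)      % |3| + |-4| + |0|         = 7
  norm(x)        % sqrt(3^2 + (-4)^2 + 0^2) = 5
  norm(x,inf)    % max(|3|, |-4|, |0|)      = 4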

Exercise 1: For each of the following vectors:

  x1 = [ 1; 2; 3 ]
  x2 = [ 1; 0; 0 ]
  x3 = [ 1; 1; 1 ]

compute the vector norms, using the appropriate Matlab commands. Be sure your answers are reasonable.

        L1           L2           L Infinity
  x1    ----------   ----------   ----------
  x2    ----------   ----------   ----------
  x3    ----------   ----------   ----------

3 Matrix Norms

A matrix norm assigns a size to a matrix, again in such a way that scalar multiples do what we expect, and the triangle inequality is satisfied. However, what's more important is that we want to be able to mix matrix and vector norms in various computations. So we are going to be very interested in whether a matrix norm is compatible with a particular vector norm, that is, when it is safe to say:

      ||Ax|| <= ||A|| ||x||

There are five common matrix norms:

  The L1 or "max column sum" matrix norm:

      ||A||_1 = \max_{j=1,...,n} \sum_{i=1}^{n} |A_{i,j}|

  The L2 matrix norm:

      ||A||_2 = \max_{j=1,...,n} \sqrt{\lambda_j}   where \lambda_j is a (necessarily real) eigenvalue of A^T A, or

      ||A||_2 = \max_{j=1,...,n} \mu_j              where \mu_j is a singular value of A;

  The L-infinity or "max row sum" matrix norm:

      ||A||_\infty = \max_{i=1,...,n} \sum_{j=1}^{n} |A_{i,j}|

  The Frobenius matrix norm:

      ||A||_F = \sqrt{ \sum_{i,j=1,...,n} |A_{i,j}|^2 }

  The spectral matrix norm:

      ||A||_{spec} = \max_{i=1,...,n} |\lambda_i|

  (only defined for a square matrix), where \lambda_i is a (possibly complex) eigenvalue of A.
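To make the "max column sum" and "max row sum" descriptions concrete, here is a small worked illustration (an arbitrary 2 by 2 matrix, not one used in the exercises). For

      A = [ 1 -2
            3  4 ]

the column sums of absolute values are 4 and 6, and the row sums are 3 and 7, so

      ||A||_1      = \max(4, 6) = 6     (largest column sum)
      ||A||_\infty = \max(3, 7) = 7     (largest row sum)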

To compute the norm of a matrix A in Matlab:

  ||A||_1      = norm(A,1);
  ||A||_2      = norm(A,2) = norm(A);
  ||A||_\infty = norm(A,inf);
  ||A||_F      = norm(A,'fro')
  ||A||_{spec} = (see below)

4 Compatible Matrix Norms

One way to define a matrix norm is to do so in terms of a particular vector norm. We use the formula:

      ||A|| = \max_{x \ne 0} ||Ax|| / ||x||

(It would be more precise to use "sup" rather than "max" here, as Atkinson does, but the surface of a sphere is a compact set, so the supremum is attained, and the maximum is correct.) A matrix norm defined in this way is said to be "vector-bound" to the given vector norm.

The most interesting and useful property a matrix norm can have is when we can use it to bound certain expressions involving a matrix-vector product. We want to be able to say the following:

      ||Ax|| <= ||A|| ||x||                                          (1)

but this expression is not true for an arbitrarily chosen pair of matrix and vector norms. If it is true, then the two are compatible.

If a matrix norm is vector-bound to a particular vector norm, then the two norms are guaranteed to be compatible. Thus, for any vector norm, there is always at least one matrix norm that we can use. But that vector-bound matrix norm is not always the only choice. In particular, the L2 matrix norm is difficult (expensive) to compute, but there is a simple alternative. Note that:

  The L1, L2 and L-infinity matrix norms can be shown to be vector-bound to the corresponding vector norms and hence are guaranteed to be compatible with them;

  The Frobenius matrix norm is not vector-bound to the L2 vector norm, but is compatible with it; the Frobenius norm is much easier to compute than the L2 matrix norm.

  The spectral matrix norm is not vector-bound to any vector norm, but it "almost" is. This norm is useful because we often want to think about the behavior of a matrix as being determined by its largest eigenvalue, and it often is. But there is no vector norm for which it is always true that

      ||Ax|| <= ||A||_{spec} ||x||
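The vector-bound definition can be explored numerically: for any vector x, the ratio ||Ax||/||x|| never exceeds the vector-bound matrix norm, and some vectors come close to achieving it. Here is a minimal sketch using an arbitrary matrix and random trial vectors (not part of any exercise):

  A = [ 2 1
        0 3 ];                    % arbitrary example matrix
  worst = 0;
  for trial = 1 : 1000
    x = randn(2,1);               % random trial vector
    worst = max( worst, norm(A*x,1) / norm(x,1) );
  end
  worst                           % never exceeds the next value
  norm(A,1)                       % vector-bound L1 matrix norm, here = 4

The same experiment with norm(...,2) or norm(...,inf) illustrates the L2 and L-infinity cases.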

Exercise 2: Consider each of the following column vectors:

  x1 = [ 1; 2; 3 ]
  x2 = [ 1; 0; 0 ]
  x3 = [ 1; 1; 1 ]

For the same matrix A you used above:

  A = [ 4  1  1
        0 -2  2
        0  5 -4 ]

verify that the compatibility condition holds by comparing the values of ||A|| that you computed in the previous exercise with the ratios ||Ax||/||x||. The final column refers to satisfaction of the compatibility relationship (1).

  Matrix   Vector   norm(A*x1)/   norm(A*x2)/   norm(A*x3)/
  norm     norm     norm(x1)      norm(x2)      norm(x3)      norm(A)   OK?
   1        1       ----------    ----------    ----------    -------   ---
   2        2       ----------    ----------    ----------    -------   ---
   fro      2       ----------    ----------    ----------    -------   ---
   inf      inf     ----------    ----------    ----------    -------   ---

5 More on the Spectral Radius

In the text above there is a statement that the spectral radius is not vector-bound with any norm, but that it "almost" is. In this section we will see by example what this means.

In the first place, let's try to see why it isn't vector-bound to the L2 norm. Consider the following false "proof". Given a matrix A, for any vector x, break it into a sum of eigenvectors of A as

      x = \sum_i x_i e_i

where the e_i are the eigenvectors of A, normalized to unit length. Then, for \lambda_i the eigenvalues of A,

      ||Ax||_2^2 = || \sum_i \lambda_i x_i e_i ||_2^2
                <= \max_i |\lambda_i|^2 \sum_i |x_i|^2
                 = ||A||_{spec}^2 ||x||_2^2

and taking square roots completes the "proof".

Why is this proof false? The error is in the statement that all vectors can be expanded as sums of orthonormal eigenvectors. If you knew, for example, that A were positive definite and symmetric, then the eigenvectors of A would form an orthonormal basis and the proof would be correct. Hence the term "almost". On the other hand, this observation provides the counterexample.

Exercise 3: Consider the following matrix

  A = [ 0.5 1   0   0   0   0   0
        0   0.5 1   0   0   0   0
        0   0   0.5 1   0   0   0
        0   0   0   0.5 1   0   0
        0   0   0   0   0.5 1   0
        0   0   0   0   0   0.5 1
        0   0   0   0   0   0   0.5 ]

You should recognize this as a Jordan block. Any matrix can be decomposed into several such blocks by a change of basis.

(a) Use the Matlab routine [V,D]=eig(A) to get the eigenvalues (D) and eigenvectors (V) of A. How many linearly independent eigenvectors are there?

(b) You just computed the eigenvalues of A. What is the spectral norm of A?

(c) For the vector x=[1;1;1;1;1;1;1], what are ||x||_2, ||Ax||_2, and ||A||_{spec} ||x||_2?

(d) Is Equation (1) satisfied?

It is a well-known fact that if the spectral radius of a matrix A is smaller than 1.0 then

      \lim_{n -> \infty} A^n = 0.

Exercise 4:

(a) For x=[1;1;1;1;1;1;1] compute and plot ||A^k x||_2 for k = 0, 1, ..., 40. Please include this plot with your summary.

(b) What is the smallest value of k > 2 for which ||A^k x||_2 <= ||Ax||_2?

(c) What is \max_{0 <= k <= 40} ||A^k x||_2?

6 Types of Errors

A natural assumption to make is that the term "error" always refers to the difference between the computed and exact answers. We are going to have to discuss several kinds of error, so let's refer to this first error as "solution error" or "forward error". Suppose we want to solve a linear system of the form Ax = b (with exact solution x) and we computed xapprox. We define the solution error as ||xapprox - x||.

Sometimes the solution error is not possible to compute, and we would like a substitute whose behavior is acceptably close to that of the solution error. One example would be during an iterative solution process, where we would like to monitor the progress of the iteration but we do not yet have the solution and cannot compute the solution error. In this case, we are interested in the "residual error" or "backward error", which is defined by

      ||A xapprox - b|| = ||bapprox - b||

where, for convenience, we have defined the variable bapprox to equal A xapprox. Another way of looking at the residual error is to see that it is telling us the difference between the right hand side that would "work" for xapprox and the right hand side we actually have.

If we think of the right hand side as being a target, and our solution procedure as determining how we should aim an arrow so that we hit this target, then

  The solution error is telling us how badly we aimed our arrow;

  The residual error is telling us how much we would have to move the target in order for our badly aimed arrow to be a bullseye.

There are problems for which the solution error is huge and the residual error tiny, and all the other possible combinations also occur.

Exercise 5: For each of the following cases, compute the Euclidean norm (two norm) of the solution error and residual error and characterize each as large or small. (For the purpose of this exercise, take "large" to mean greater than 1 and "small" to mean smaller than 0.01.) In each case, you must find the true solution first (pay attention! in each case you can guess the true solution, xtrue), then compare it with the approximate solution xapprox.

(a) A=[1,1;1,(1-1.e-12)],   b=[0;0],  xapprox=[1;-1]
(b) A=[1,1;1,(1-1.e-12)],   b=[1;1],  xapprox=[1.00001;0]
(c) A=[1,1;1,(1-1.e-12)],   b=[1;1],  xapprox=[100;100]
(d) A=[1.e+12,-1.e+12;1,1], b=[0;2],  xapprox=[1.001;1]

  Case   Residual   large/small   xtrue   Error   large/small
   1     --------   -----------   -----   -----   -----------
   2     --------   -----------   -----   -----   -----------
   3     --------   -----------   -----   -----   -----------
   4     --------   -----------   -----   -----   -----------
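The computational pattern for each case is the same. Here is a minimal sketch on a made-up 2 by 2 system (not one of the cases above), just to show the two norms being computed:

  A = [ 2 0
        0 3 ];
  b = [ 2; 3 ];
  xtrue   = [ 1; 1 ];            % exact solution of A*xtrue = b
  xapprox = [ 1.1; 0.9 ];        % some approximate solution

  solution_error = norm( xapprox - xtrue )    % forward error
  residual_error = norm( A*xapprox - b )      % backward error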

The norms of the matrix and its inverse exert some limits on the relationship between the forward and backward errors. Assuming we have compatible norms:

      ||xapprox - x|| = ||A^{-1} A (xapprox - x)|| <= ||A^{-1}|| ||bapprox - b||

and

      ||A xapprox - b|| = ||A xapprox - A x|| <= ||A|| ||xapprox - x||

Put another way,

      (solution error) <= ||A^{-1}|| (residual error)
      (residual error) <= ||A|| (solution error)

Often, it is useful to consider the size of an error relative to the true quantity. This quantity is sometimes multiplied by 100 and expressed as a percentage. If the true solution is x and we computed xapprox = x + \Delta x, the relative solution error is defined as

      (relative solution error) = ||xapprox - x|| / ||x|| = ||\Delta x|| / ||x||

Given the computed solution xapprox, we know that it satisfies the equation A xapprox = bapprox. If we write bapprox = b + \Delta b, then we can define the relative residual error as:

      (relative residual error) = ||bapprox - b|| / ||b|| = ||\Delta b|| / ||b||

These quantities depend on the vector norm used; they cannot be defined in cases where the divisor is zero, and they are problematic when the divisor is small.

Actually, relative quantities are important beyond being easier to understand. Consider the boundary value problem

      -u'' = (\pi^2 / 4) \sin(\pi x / 2)
      u(0) = 0                                                        (2)
      u(1) = 1

This problem has the exact solution u = \sin(\pi x / 2). In the following exercise you will be computing the solution for various mesh sizes and using vector norms to compute the solution error. You will see why relative errors are most convenient for error computation.

Exercise 6:

(a) Make a copy of your rope_bvp.m file from last lab, or download a copy of mine. Change its name to bvp.m and modify it so that it corresponds to the boundary value problem (2) above. Do not forget to change the comments.

(b) As a test, solve the system for npts=10, plot the solution, and compare it with sin(pi*x/2). They should be quite close. You do not need to send me the plot.

(c) Use code such as the following to compute solution errors for various mesh sizes, and fill in the following table. Recall that the exact solution is \sin(\pi x / 2). (Note: you may want to put this code into a script m-file.)

  sizes=[10 20 40 80 160 320 640];
  for k=1:7
    [x,y]=bvp(sizes(k));
    error(k,1)=norm(y-sin(pi*x'/2));
    relative_error(k,1)=error(k,1)/norm(sin(pi*x/2));
  end
  disp([error(1:6)./error(2:7),...
        relative_error(1:6)./relative_error(2:7)])

            Euclidean norm
            error ratio      relative error ratio
   10/ 20   ------------     --------------------
   20/ 40   ------------     --------------------
   40/ 80   ------------     --------------------
   80/160   ------------     --------------------
  160/320   ------------     --------------------
  320/640   ------------     --------------------

(d) Repeat the above for the L1 vector norm. (A sketch of the change needed appears just after this exercise.)

            1 vector norm
            error ratio      relative error ratio
   10/ 20   ------------     --------------------
   20/ 40   ------------     --------------------
   40/ 80   ------------     --------------------
   80/160   ------------     --------------------
  160/320   ------------     --------------------
  320/640   ------------     --------------------

(e) Repeat the above for the L-infinity vector norm.

            Infinity vector norm
            error ratio      relative error ratio
   10/ 20   ------------     --------------------
   20/ 40   ------------     --------------------
   40/ 80   ------------     --------------------
   80/160   ------------     --------------------
  160/320   ------------     --------------------
  320/640   ------------     --------------------

(f) The method used to solve this boundary value problem converges as O(h^2). Which of the above calculations yields this rate?
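For parts (d) and (e), the only modification to the loop in part (c) is the norm selector passed to norm; a minimal sketch, assuming the same bvp.m interface as above:

  % L1-norm version of the two error lines (part (d));
  % replace 1 by inf for the L-infinity version (part (e)).
  error(k,1)=norm(y-sin(pi*x'/2),1);
  relative_error(k,1)=error(k,1)/norm(sin(pi*x/2),1);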

7 Condition Numbers

Given a square matrix A, the L2 condition number k_2(A) is defined as:

      k_2(A) = ||A||_2 ||A^{-1}||_2                                   (3)

if the inverse of A exists. If the inverse does not exist, then we say that the condition number is infinite. Similar definitions apply for k_1(A) and k_\infty(A).

Matlab provides three functions for computing condition numbers: cond, condest, and rcond. cond computes the condition number according to Equation (3), and can use the one norm, the two norm, the infinity norm or the Frobenius norm. This calculation can be expensive, but it is accurate. condest computes an estimate of the condition number using the one norm. rcond uses a different method to estimate the "reciprocal condition number", defined as

      rcond(A) = 1/k_1(A)   if A is nonsingular
      rcond(A) = 0          if A is singular

So as a matrix goes singular, rcond(A) goes to zero in a way similar to the determinant.

We won't worry about the fact that the condition number is somewhat expensive to compute, since it requires computing the inverse or (possibly) the singular value decomposition (a topic to be studied later). Instead, we'll concentrate on what it's good for. We already know that it's supposed to give us some idea of how singular the matrix is. But its real role is in error estimation for the linear system problem.

We suppose that we are really interested in solving the linear system Ax = b, but that the right hand side we give to the computer has a small error or "perturbation" in it. We might denote this perturbed right hand side as b + \Delta b. We can then assume that our solution will be slightly perturbed, so that we are justified in writing the system as

      A(x + \Delta x) = b + \Delta b

The question is, if \Delta b is really small, can we expect that \Delta x is small? Can we actually guarantee such a limit?

If we are content to look at the relative errors, and if the norm used to define k(A) is compatible with the vector norm used, it is fairly easy (Atkinson, Section 8.4) to show that:

      ||\Delta x|| / ||x|| <= k(A) ||\Delta b|| / ||b||

You can see that we would like the condition number to be as small as possible. (It turns out that the condition number is bounded below. What is the smallest possible value of the condition number?) In particular, since we have about 14 digits of accuracy in Matlab, if a matrix has a condition number of 10^{14}, or rcond(A) of 10^{-14}, then an error in the last significant digit of any element of the right hand side has the potential to make all of the digits of the solution wrong!
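The bound above is easy to observe numerically. Here is a minimal sketch on a made-up, nearly singular system (not part of Exercise 7 below):

  A  = [ 1 1; 1 1.0001 ];          % nearly singular, so cond(A) is large
  b  = [ 2; 2.0001 ];              % exact solution is x = [1;1]
  x  = A \ b;
  db = 1.e-8 * [ 1; -1 ];          % tiny perturbation of the right hand side
  dx = A \ ( b + db ) - x;         % resulting change in the solution

  norm(dx)/norm(x)                 % relative change in the solution
  cond(A)*norm(db)/norm(b)         % the bound k(A)*||db||/||b||

Even though the relative change in b is only about 5e-9, the relative change in x comes out near 2e-4, essentially as large as the bound allows, because cond(A) is about 4e4 for this matrix.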

In the exercises below we will have occasion to use a special matrix called the Frank matrix. Row i of the n x n Frank matrix has the formula:

      A_{i,j} = 0           for j <= i-2
      A_{i,j} = n + 1 - i   for j  = i-1
      A_{i,j} = n + 1 - j   for j >= i

The Frank matrix for n = 5 looks like:

      5 4 3 2 1
      4 4 3 2 1
      0 3 3 2 1
      0 0 2 2 1
      0 0 0 1 1

The determinant of the Frank matrix is 1, but it is difficult to compute numerically. This matrix has a special form called Hessenberg form, wherein all elements below the first subdiagonal are zero. Matlab provides the Frank matrix in its gallery of matrices, gallery('frank',n), but we will use an m-file frank.m. You can find more information about the Frank matrix from the Matrix Market, and the references therein.

Exercise 7: To see how the condition number can warn you about loss of accuracy, let's try solving the problem Ax = b, for x=ones(n,1), and with A being the Frank matrix. Compute

  b=A*x;
  xsolved=A\b;
  difference=xsolved-x;

Of course, xsolved would be the same as x if there were no arithmetic rounding errors, but there are rounding errors, so difference is not zero. This loss of accuracy is the point of the exercise.

For convenience, let's use Matlab's cond(A) for the L2 condition number, and assume that the relative error in b is eps, the machine precision. (Recall that Matlab understands the name eps to indicate the smallest number you can add to 1 and get a result different from 1.) We expect the relative solution error to be bounded by eps*cond(A). Since cond uses the Euclidean norm by default, use the Euclidean norm in constructing the table.

  Matrix Size   cond(A)       eps*cond(A)   ||difference||/||x||
       6        ----------    -----------   --------------------
      12        ----------    -----------   --------------------
      18        ----------    -----------   --------------------
      24        ----------    -----------   --------------------

Your second and third columns should be roughly comparable in size (within a factor of 1000 or so).
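The lab refers to an m-file frank.m but does not list it; a minimal sketch of such a function, written directly from the row formula given above, might look like this (the file name and interface are assumptions):

  function A = frank ( n )
  % frank.m (sketch): return the n x n Frank matrix
  %   A(i,j) = 0       for j <= i-2
  %   A(i,j) = n+1-i   for j  = i-1
  %   A(i,j) = n+1-j   for j >= i
    A = zeros(n,n);
    for i = 1 : n
      if i > 1
        A(i,i-1) = n + 1 - i;     % single nonzero subdiagonal entry
      end
      for j = i : n
        A(i,j) = n + 1 - j;       % upper triangle, including the diagonal
      end
    end
  end

You can check such a sketch against the n = 5 example above, or against gallery('frank',5).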