Stability of the Gram-Schmidt process

Orthogonal projection

We learned in multivariable calculus (or physics or elementary linear algebra) that if q is a unit vector and v is any vector, then the orthogonal projection of v onto span{q} is v∥ = ⟨v, q⟩ q, and the orthogonal complement is v⊥ = v − v∥.

[Figure: the vector v, the unit vector q, the projection ⟨v, q⟩ q along q, and the complement v − ⟨v, q⟩ q.]

More generally, if Q is a matrix with orthonormal columns, then the orthogonal projection of v onto the column space of Q is QQ*v, and its orthogonal complement is v − QQ*v = (I − QQ*)v. Note that if Q is a square matrix with orthonormal columns then it is orthogonal; in general, however, this is not true.

The operator P = QQ* is called an orthogonal projector. A projector satisfies the relation P² = P. The complementary projector is P⊥ = I − P. It is also a projector, and P⊥P = PP⊥ = 0. To say that a projector is orthogonal is to say that its column space and nullspace are orthogonal, or equivalently that the column spaces of P and its complement are orthogonal. This is also equivalent to the condition that P* = P. This is because, for any matrix A, the orthogonal complement of the column space of A is the same thing as the nullspace of A*. Beware of a possible confusion in the terminology: an orthogonal projector is rarely an orthogonal matrix.

Classical method

Suppose A = [a_1 ... a_n] is a matrix with linearly independent columns. We will use orthogonal projectors to construct a matrix Q = [q_1 ... q_n] satisfying the following conditions:

1. Q*Q = I.
2. For all j, span{a_1, ..., a_j} = span{q_1, ..., q_j}.

From the second condition we see that A = QR, where R is upper triangular. Our algorithm will also ensure that the diagonal entries of R are positive. With these three conditions, Q and R are uniquely determined.

Here is the classical algorithm:

1. Set r_11 = ‖a_1‖ and q_1 = a_1/r_11.
2. Given q_1, ..., q_{j-1}, construct q_j and r_j as follows:
   (a) Let P be the orthogonal projector onto span{q_1, ..., q_{j-1}}.
   (b) Let r_1j, ..., r_{j-1,j} be the coordinates of P a_j (that is, r_ij = ⟨a_j, q_i⟩).
   (c) Let v = P⊥ a_j.
   (d) Let r_jj = ‖v‖.
   (e) Finally, let q_j = v/r_jj.
3. Repeat step 2 for j = 2, ..., n.

(We are using the 2-norm throughout.)

This produces the so-called reduced QR-decomposition; Q has the same size as A, and R is square. The so-called full QR-decomposition completes Q to a square, orthogonal matrix and adds all-zero rows to R so that we still have QR = A. The full form of the QR-decomposition is used less often than the reduced form.

Also, if A does not have linearly independent columns then we can still construct a QR-decomposition. When we come to a column which is linearly dependent on the previous columns, then v = 0 in step 2c, and so we merely skip steps 2d and 2e. If A has rank r, then this produces an m × r matrix Q with orthonormal columns and an upper echelon r × n matrix R with positive pivot entries.
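The classical algorithm can be sketched in code. The course's clgs() is an Octave function; the following NumPy version is only an illustrative sketch under that naming assumption, not the course code:

```python
import numpy as np

def clgs(A):
    """Classical Gram-Schmidt: reduced QR of a matrix with independent columns."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        # Step 2b: coordinates of the projection P a_j onto span{q_1, ..., q_{j-1}}.
        R[:j, j] = Q[:, :j].T @ A[:, j]
        # Step 2c: the orthogonal complement v = a_j - P a_j.
        v = A[:, j] - Q[:, :j] @ R[:j, j]
        # Steps 2d-2e: record the norm and normalize.
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R
```

In exact arithmetic Q has orthonormal columns, R is upper triangular with positive diagonal, and QR = A; the point of this lesson is that in floating point the orthogonality of Q can degrade badly.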
Stable method

The classical Gram-Schmidt method is useful conceptually but is not numerically stable, as we shall see. There is a simple way to fix the algorithm so as to restore stability: rather than adjusting the current column to be orthogonal to the previous columns, adjust the remaining columns to be orthogonal to the current one! Said another way, rather than produce R column by column, produce it row by row. Here is the modified Gram-Schmidt algorithm, which we will also refer to as the stable Gram-Schmidt algorithm:

1. Set Q = A.
2. Let r_jj = ‖q_j‖.
3. Replace q_j by q_j/r_jj.
4. Make q_{j+1}, ..., q_n orthogonal to q_j, as follows:
   (a) Let r_jk = ⟨q_k, q_j⟩, for k = j+1, ..., n.
   (b) Replace q_k by q_k − r_jk q_j, for k = j+1, ..., n.
5. Repeat steps 2, 3, and 4 for j = 1, ..., n.

To appreciate the difference in stability between the two methods, exercises 1 and 2 (below) ask you to run these algorithms by hand on a small, nearly singular, 2 × 2 example. You may use a calculator to do the arithmetic, but round off each computation to 5 decimal places. For this same matrix, exercise 3 asks you to make another check on the stability, called the Householder Test. Do this in Matlab or Octave, not by hand. Finally, exercise 4 asks you to determine the efficiency of these two algorithms by counting the total number of floating point operations (flops) for each. As before, you should probably make three separate tallies for each: additions and subtractions; multiplications; divisions.

Stability comparison

Let us explore the stability of the three functions clgs(), stgs(), and qr() (which we will explain in a later lesson) using large matrices. We will generate an 8 × 8 matrix with random Q and with an R whose diagonal entries decrease exponentially. To do this we use a factorization we will learn more about later, the singular value decomposition, or svd. This has the form USV*, where U and V are orthogonal and S is diagonal with positive entries.
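As a companion to the classical version, the modified (stable) algorithm described above can also be sketched in NumPy (again an illustrative sketch under an assumed name, not the course's Octave stgs()):

```python
import numpy as np

def stgs(A):
    """Modified (stable) Gram-Schmidt: build R row by row."""
    Q = np.array(A, dtype=float)              # Step 1: Q = A (working copy).
    m, n = Q.shape
    R = np.zeros((n, n))
    for j in range(n):
        R[j, j] = np.linalg.norm(Q[:, j])     # Step 2.
        Q[:, j] /= R[j, j]                    # Step 3.
        # Step 4: make q_{j+1}, ..., q_n orthogonal to q_j,
        # recording row j of R in the process.
        R[j, j+1:] = Q[:, j+1:].T @ Q[:, j]
        Q[:, j+1:] -= np.outer(Q[:, j], R[j, j+1:])
    return Q, R
```

On well-conditioned input the two sketches agree; the difference shows up on nearly singular matrices like the ones in the experiment below.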
[U,X] = qr(randn(8));         % randn() uses the normal distribution.
[V,X] = qr(randn(8));
S = diag(2.^(-10:-10:-80));   % .^ means element-wise exponentiation.
A = U*S*V';

We have chosen U and V randomly, and S to have diagonal entries 2^-10, ..., 2^-80. When we find the QR-decomposition of A, these will not be the exact values of the diagonal of R, but they will be of the same order of magnitude. In particular, at least some of these values are below the machine accuracy, and so will be lost to roundoff error. How many? Let's compute the QR-decomposition in three ways, collect the computed diagonal of R into three vectors x, y, and z, then plot the logarithms (base 2) of the three together:

[Qc,Rc] = clgs(A);            % Here we are only interested in the R, but
[Qs,Rs] = stgs(A);            % you might also check norm(Q'*Q-eye(8)).
[Q,R] = qr(A,0);              % The parameter 0 requests the reduced form.
x = sum(tril(Rc));            % This is a way to collect the diagonal entries
y = sum(tril(Rs));            % into a vector -- can you find a better way?
z = sum(abs(tril(R)));        % qr() does not guarantee a positive diagonal.
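For readers following along in Python rather than Octave, the same experiment can be sketched with NumPy; here np.diag replaces the sum(tril(...)) trick, and this is a hedged sketch of the setup, not the course script:

```python
import numpy as np

rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((8, 8)))   # random orthogonal U
V, _ = np.linalg.qr(rng.standard_normal((8, 8)))   # random orthogonal V
S = np.diag(2.0 ** -(10.0 * np.arange(1, 9)))      # diagonal 2^-10, ..., 2^-80
A = U @ S @ V.T

Q, R = np.linalg.qr(A)     # reduced form; the diagonal of R need not be positive,
d = np.abs(np.diag(R))     # so take absolute values before comparing magnitudes
```

The larger entries of d track 2^-10, 2^-20, ... in order of magnitude, while the smallest are swamped by roundoff, which is exactly the effect the plot below displays.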
plot(log2(x), '@;clgs();', ...   % The expression '@;clgs();' is a formatting
     log2(y), '@;stgs();', ...   % command: it says to plot points (rather than
     log2(z), '@;qr();')         % lines, say) and label these as clgs().

[Figure: log2 of the computed diagonal of R for clgs(), stgs(), and qr(), plotted against the index 1 through 8; the vertical axis runs from -10 down to -60.]

Note that clgs() becomes lost in roundoff at about 2^-28, which is well before machine accuracy should be a problem. (ε_machine = 2^-56 = (2^-28)².) However, both stgs() and qr() remain stable down to the limits of machine accuracy. Can they be distinguished in another way, perhaps by the Householder Test?

Application: Legendre polynomials

There are many applications of the QR-decomposition, for example to the solution of linear systems. (See exercise 5.) In this section we explore an application of the Gram-Schmidt algorithm itself. The goal of the algorithm is to take a linearly independent set and produce from it an orthonormal set with the same span. All we need to run the algorithm is a vector space with an inner product. One important, infinite-dimensional example is the space of all continuous functions on [-1, 1], with the inner product defined as follows:

   ⟨f, g⟩ = (1/2) ∫_{-1}^{1} f(t) g(t) dt.

(The factor of 1/2 makes the constant 1 into a unit vector.) Thus we may speak of two functions being orthogonal on [-1, 1]. For example, sin(πt) and cos(πt) are orthogonal on [-1, 1]. (Check this!)

The Legendre polynomials are the orthogonal polynomials produced by applying the Gram-Schmidt process to the standard monomials t^0, t^1, t^2, t^3, .... Exercise 6 asks you to determine exact formulas for the first four Legendre polynomials, q_0, ..., q_3. Here we approximate these by discretizing the interval [-1, 1]; that is, we take a large number (257 in this example) of equally spaced points from that interval, form a matrix A by evaluating the monomials t^0, ..., t^3 at these points, then compute the QR-decomposition of A. Note that A consists of four columns of a Vandermonde matrix. Finally we plot the four functions q_i.
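The orthogonality claim for sin(πt) and cos(πt) can be checked numerically with a Riemann sum (a quick NumPy sketch; the exact integral is 0 because sin(πt)cos(πt) = ½ sin(2πt) is an odd function):

```python
import numpy as np

# Approximate <f, g> = (1/2) * integral over [-1, 1] of f(t) g(t) dt
# by a Riemann sum on a fine grid.
t = np.linspace(-1.0, 1.0, 100001)
dt = t[1] - t[0]
inner = 0.5 * np.sum(np.sin(np.pi * t) * np.cos(np.pi * t)) * dt
# inner is ~0 up to roundoff, confirming orthogonality on [-1, 1].
```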
t = (-128:128)'/128;          % We could also use t = linspace(-1,1,257)';
A = [];
for j = 0:3                   % We construct A by hand using element-wise
  A(:,j+1) = t.^j;            % operations, because it would be a waste of
end                           % memory to produce all 257 columns with vander().
[Q,R] = qr(A,0);
plot(Q)                       % This plots the columns as distinct functions.
[Figure: the four columns of Q plotted as functions on [-1, 1]; the values lie roughly between -0.2 and 0.2.]

In essence we are using Riemann sums to approximate the integrals, and so we need to rescale by Δt. To take into account any roundoff errors, let's rescale all of these polynomials so that q_i(1) = 1.

for j = 1:4
  Q(:,j) = Q(:,j)/Q(257,j);
end
plot(Q)

[Figure: the rescaled polynomials, now with q_i(1) = 1; the values lie between -1 and 1.]
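The same discretization, including the final rescaling, can be sketched in NumPy (an illustrative sketch; column i plays the role of the i-th Legendre polynomial sampled on the grid):

```python
import numpy as np

t = np.arange(-128, 129) / 128.0                 # 257 equally spaced points in [-1, 1]
A = np.column_stack([t ** j for j in range(4)])  # columns t^0, t^1, t^2, t^3
Q0, R = np.linalg.qr(A)                          # discretized Gram-Schmidt
Q = Q0 / Q0[-1, :]                               # rescale each column so q_i(1) = 1
```

The columns of Q0 are orthonormal in the discrete inner product, and after rescaling the even-index columns are even functions and the odd-index columns odd, just as the exact Legendre polynomials are.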
Homework problems: Final version due Friday, 17 March

Do five of the following six problems. In problems 1, 2, and 3 (but not 4, 5, or 6), let

   A = [ 0.70000  0.70711
         0.70001  0.70711 ]

1. For A as above, compute [Q, R] = clgs(A) using 5-place floating-point arithmetic. Is Q orthogonal? Is QR = A?

2. For A as above, compute [Q, R] = stgs(A) using 5-place floating-point arithmetic. Is Q orthogonal? Is QR = A?

3. For A as above, use Matlab or Octave to compute the QR-decomposition three ways: clgs(), stgs(), and qr(). For each of these, check ‖Q*Q − I‖. (This is called the Householder Test.) What should this value be? What do you find?

4. Carefully count flops for clgs() and stgs(). As before, you should probably make three separate tallies for each: additions and subtractions; multiplications; divisions. Which is faster?

5. Suppose you are given the (exact!) QR-factorization of a matrix A. (Not necessarily the one above!) Describe an efficient method for solving Ax = b. How many flops does this method require? Is it backward stable?

6. Find explicit formulas for the Legendre polynomials q_0, ..., q_3. Use both the classical and the stable algorithms, but do not bother to keep track of R. Verify that the q_i you find are orthonormal. Which algorithm is easier to use in this context?