THE APPLICATION OF EIGENPAIR STABILITY TO BLOCK DIAGONALIZATION


SIAM J. NUMER. ANAL.
Vol. 34, No. 3, pp. 1255-1268, June 1997
(c) 1997 Society for Industrial and Applied Mathematics

THE APPLICATION OF EIGENPAIR STABILITY TO BLOCK DIAGONALIZATION

NILOTPAL GHOSH, WILLIAM W. HAGER, AND PURANDAR SARMAH

Abstract. An algorithm presented in Hager [Comput. Math. Appl., 14 (1987)] for diagonalizing a matrix is generalized to a block matrix setting. It is shown that the resulting algorithm is locally quadratically convergent. A global convergence proof is given for matrices with separated eigenvalues and with relatively small off-diagonal elements. Numerical examples along with comparisons to the QR method are presented.

Key words. block diagonalization, eigenpair stability, eigenvalues, eigenvectors

AMS subject classifications. 65F05, 65F15, 65F35

Received by the editors January 4, 1995; accepted for publication (in revised form) September 22. This research was supported by U.S. Army Research Office contract DAAL03-89-G-0082 and by the National Science Foundation. Affiliations: Department of Mathematics, Catonsville Community College, Baltimore, MD, and Visiting Professor, Department of Mathematics, University of Florida, Gainesville, FL (N. Ghosh); Department of Mathematics, University of Florida, Gainesville, FL (hager@math.ufl.edu, puran@swamp.bellcore.com).

1. Introduction. Let $A$ denote an $n \times n$ block matrix; that is, the $(i,j)$ element $A_{ij}$ of $A$ is itself a matrix of dimension $n_i \times n_j$, where $\sum_{i=1}^{r} n_i = n$ for some $r \le n$. In this paper, we develop and analyze an algorithm for computing a block diagonalization $X \Lambda X^{-1}$ of $A$, assuming one exists. Here $\Lambda$ is a block diagonal matrix with diagonal blocks $\Lambda_i$, $i = 1$ to $r$, and $X$ is an invertible matrix whose $i$th block of columns is denoted by $X_i$. Hence, the equation $A = X \Lambda X^{-1}$ is equivalent to the relation $A X_i = X_i \Lambda_i$, $i = 1$ to $r$.

The algorithm developed in this paper is based on a stability result, Proposition 1, for a perturbation $A(\varepsilon) X(\varepsilon) = X(\varepsilon) \Lambda(\varepsilon)$ of the original eigenequation. We show that if the spectra of $\Lambda_i$ and $\Lambda_j$ are disjoint for each $i \ne j$, then there exist continuously differentiable solutions $X(\varepsilon)$ and $\Lambda(\varepsilon)$ to the perturbed equation. After differentiating the perturbed equation and applying Taylor's theorem, we obtain the following algorithm (throughout the paper, the subscript $k$ denotes the iteration number while the subscripts $i$ and $j$ denote elements or submatrices of larger matrices).

BLOCK DIAGONALIZATION ALGORITHM. If $X_k \Lambda_k X_k^{-1}$ is the current approximate diagonalization of $A$, then

(1) $\Lambda_{k+1} = \operatorname{diag} X_k^{-1} A X_k$ and $X_{k+1} = X_k (I + D)$,

where

(2) $\operatorname{diag} D = 0$ and $D \Lambda_{k+1} - \Lambda_{k+1} D = \operatorname{off} X_k^{-1} A X_k$.

The notations diag and off above are defined in the following way: given a block matrix $B$, $\operatorname{diag} B$ denotes the block diagonal matrix whose diagonal blocks coincide with the diagonal blocks of $B$, and $\operatorname{off} B$ denotes the matrix that coincides with $B$ except for the diagonal blocks, which are replaced by blocks of zeros.
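In code, one pass of (1)-(2) amounts to a similarity transform, one small Sylvester solve per off-diagonal block of $D$, and a matrix multiplication. The following Python sketch is a minimal illustration of a single step; the function name, the `sizes` description of the block partition, and the use of scipy.linalg.solve_sylvester are implementation choices made here, not taken from the paper.

    import numpy as np
    from scipy.linalg import solve_sylvester

    def block_diag_step(A, X, sizes):
        """One step of (1)-(2): Lambda_{k+1} = diag(X^{-1} A X), X_{k+1} = X (I + D)."""
        n = A.shape[0]
        edges = np.concatenate(([0], np.cumsum(sizes)))   # block boundaries
        M = np.linalg.solve(X, A @ X)                     # M = X^{-1} A X
        Lam = np.zeros_like(M)
        D = np.zeros_like(M)
        r = len(sizes)
        for i in range(r):
            si = slice(edges[i], edges[i + 1])
            Lam[si, si] = M[si, si]                       # diagonal blocks of Lambda_{k+1}
        for i in range(r):
            si = slice(edges[i], edges[i + 1])
            for j in range(r):
                if i == j:
                    continue
                sj = slice(edges[j], edges[j + 1])
                # off-diagonal block of (2): D_ij Lambda_j - Lambda_i D_ij = M_ij,
                # a small Sylvester equation, solvable when the spectra are disjoint
                D[si, sj] = solve_sylvester(-M[si, si], M[sj, sj], M[si, sj])
        return Lam, X @ (np.eye(n) + D)

With $1 \times 1$ blocks, the Sylvester solves reduce to the scalar divisions written out in (36) of section 4.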

In section 3 we show that if the spectra of $\Lambda_i$ and $\Lambda_j$ are disjoint for $i \ne j$, then this algorithm is locally quadratically convergent. Moreover, in the case of $1 \times 1$ diagonal blocks, we establish a global convergence result in section 4 when the diagonal elements of $A$ are separated and the off-diagonal elements are relatively small.

Some related Newton-type methods for computing or refining invariant subspaces are presented in [2], [4], [5], [7], and [15]. The work of Anselone and Rall [2] justifies the application of Newton's method to the system

(3) $Ax - \lambda x = 0$, $y^T x = 1$,

where the vector $x$ and the scalar $\lambda$ are the unknowns and $y$ is a normalization vector. This scheme, which typically converges locally quadratically for a simple eigenvalue, involves the inversion of a matrix of the following form in each iteration:

$\begin{bmatrix} A - \lambda_k I & x_k \\ y^T & 0 \end{bmatrix}$.
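A single Newton step on (3) only requires forming and inverting this bordered matrix. A minimal sketch under the normalization $y^T x = 1$ (the function name and dense solve are illustrative choices made here; [7] instead exploits Hessenberg structure and Givens rotations):

    import numpy as np

    def newton_eigenpair_step(A, x, lam, y):
        """One Newton step on F(x, lam) = (A x - lam x, y^T x - 1) = 0."""
        n = A.shape[0]
        J = np.zeros((n + 1, n + 1))
        J[:n, :n] = A - lam * np.eye(n)    # derivative of A x - lam x in x
        J[:n, n] = -x                      # derivative of A x - lam x in lam
        J[n, :n] = y                       # derivative of y^T x - 1 in x
        F = np.append(A @ x - lam * x, y @ x - 1.0)
        step = np.linalg.solve(J, -F)
        return x + step[:n], lam + step[n]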

In [4] Chatelin examines the problem of approximating an invariant subspace spanned by the columns of some matrix $Z$. She applies Newton's method to the equation

(4) $AZ - Z(Y^T A Z) = 0$,

where $Y$ is a normalization matrix. Observe that at a solution to (4), $Y^T Z = I$ assuming $Y^T A Z$ is invertible. The condition $Y^T Z = I$ is the generalization of the condition $y^T x = 1$ in (3). Chatelin obtains a local quadratic convergence result for Newton's method applied to (4) assuming that zero is not an eigenvalue of $A$ and that the number of eigenvalues of $A$, counting multiplicities, associated with the columns of $Z$ is equal to the number of columns of $Z$. Each iteration of this scheme involves inverting a linear operator $L$, where $L$ acting on a matrix $M$ is defined by

$L(M) = (I - ZY^T)AM - M(Y^T A Z)$.

In [7] Dongarra, Moler, and Wilkinson also consider Newton's method applied to (3); however, they focus on the special case where all the components of $y$ are zero except for one component, which is set to one. In [7] the authors consider both the modified Newton's method and the standard Newton scheme. In the modified Newton's method (see [12]), the Jacobian is evaluated at some given point, and in each Newton iteration this fixed Jacobian is used instead of the Jacobian at the new iterate. Hence, each iteration of the modified scheme involves inverting a matrix of the form

$\begin{bmatrix} A - \lambda I & x \\ y^T & 0 \end{bmatrix}$,

where $\lambda$ and $x$ are fixed approximations to an eigenpair. Generalizations of this idea for invariant subspaces are also presented in [7]. In [5] Demmel examines the schemes [4] and [7], as well as a scheme that he attributes to Stewart [15], and he shows that in some sense each of these schemes tries to solve an underlying Riccati equation.

In comparing the schemes of Anselone and Rall, Chatelin, and Dongarra, Moler, and Wilkinson with the scheme proposed in this paper, we make the following observation: each iteration of any of these schemes essentially involves the inversion of a matrix. With the block diagonalization scheme, one inversion yields a first-order correction to all the eigenvectors and eigenvalues, while one inversion with any of the other methods yields a first-order correction to an invariant subspace. On the other hand, with these other schemes one only needs an approximation to an invariant subspace to get an improved approximation, while the scheme in this paper requires an approximation to a block diagonalization to get an improved approximation.

For another comparison between the block algorithm and Newton's method applied to (3), suppose that an approximate block diagonalization has been determined and we wish to obtain a first-order correction. For simplicity, suppose that all the eigenvalues are simple and separated so that Newton's method applied to (3) converges nicely. With the block algorithm, the flop count for one step is essentially $\frac{10}{3}n^3$: there are $n^3$ flops involved in multiplying $A$ and $X_k$, $\frac{4}{3}n^3$ flops to compute $X_k^{-1}(AX_k)$, and $n^3$ flops to compute $X_k(I + D)$. Now consider Newton's method applied to (3); to be specific, let us consider the choice of Dongarra, Moler, and Wilkinson for $y$: each component is zero except for one component. To implement the Newton approach, let us initially reduce $A$ to upper Hessenberg form by an orthogonal similarity transformation; without this reduction, the Newton approach would require on the order of $n^4$ flops. The flop count (see [10]) for this reduction is about $\frac{10}{3}n^3$ flops. As described in [7], the Newton system can be solved using a series of Givens rotations, followed by an LU factorization. The flop count for this process in the worst case is $3n^2$, and since there are $n$ eigenvectors to update, the total flop count is $3n^3$. Finally, we need to expend $n^3$ flops to multiply each eigenvector by the orthogonal factor to obtain an eigenvector of the original matrix. The total flop count for the Newton approach is $\frac{22}{3}n^3$ flops. Hence, the block algorithm appears to be preferable relative to a flop count; in addition, the block algorithm is much easier to implement, especially when the blocks are larger than $1 \times 1$.

2. The algorithm. The algorithm developed in this paper is based on a stability result for the block diagonalization of a perturbation of the original matrix $A$.

PROPOSITION 1. Let $A(\varepsilon)$ denote an $n \times n$ matrix which is continuously differentiable at $\varepsilon = 0$ and which has the property $A = A(0)$. If for each $i \ne j$ the spectra of $\Lambda_i$ and $\Lambda_j$ are disjoint, then there exist continuously differentiable functions $X(\varepsilon)$ and $\Lambda(\varepsilon)$, where $\Lambda(\varepsilon)$ is block diagonal, $X(0) = X$, $\Lambda(0) = \Lambda$, and for each $\varepsilon$ near 0 we have

(5) $A(\varepsilon)X(\varepsilon) = X(\varepsilon)\Lambda(\varepsilon)$.

Proof. Our proof of this result is based on the implicit function theorem. Instead of evaluating $X(\varepsilon)$ directly, we introduce an auxiliary matrix $C(\varepsilon)$ which is chosen such that $X(\varepsilon) = XC(\varepsilon)$. Since $X$ is invertible, there is a one-to-one correspondence between $C(\varepsilon)$ and $X(\varepsilon)$. It turns out that there are an infinite number of solutions to (5) with the desired properties. In order to achieve uniqueness, we impose the following constraints: $C_{jj}(\varepsilon) = I$ for each $j$. With these side constraints, (5) takes the form

(6) $A(\varepsilon)XC(\varepsilon) = XC(\varepsilon)\Lambda(\varepsilon)$, $C_{jj}(\varepsilon) = I$ for each $j$.

Abstractly, (6) has the form $F(\varepsilon, C(\varepsilon), \Lambda(\varepsilon)) = 0$, where the solutions $C(\varepsilon)$ and $\Lambda(\varepsilon)$ depend on $\varepsilon$ and where the number of equations is equal to the number of unknown elements of $C(\varepsilon)$ and $\Lambda(\varepsilon)$. At $\varepsilon = 0$, there is the trivial solution $C(0) = I$ and $\Lambda(0) = \Lambda$. By the implicit function theorem (see [1, Thm. 13.7] or [8, Thm. 10.8]), there exists a continuously differentiable solution for $\varepsilon$ near zero if the derivative operator of the system (6) with respect to $C$ and $\Lambda$ is invertible at $\varepsilon = 0$. Establishing invertibility of the derivative is equivalent to showing that the only solution $(\delta C, \delta\Lambda)$, with $\delta\Lambda$ block diagonal, to the following linear system is $\delta C = 0$ and $\delta\Lambda = 0$:

$\frac{\partial F}{\partial C}(0, C(0), \Lambda(0))\,\delta C + \frac{\partial F}{\partial \Lambda}(0, C(0), \Lambda(0))\,\delta\Lambda = \frac{\partial F}{\partial C}(0, I, \Lambda)\,\delta C + \frac{\partial F}{\partial \Lambda}(0, I, \Lambda)\,\delta\Lambda = 0$.

In the context of (6), this reduces to

(7) $AX\delta C = X\delta C\,\Lambda + X\delta\Lambda$, $\delta C_{jj} = 0$, $j = 1$ to $r$.

Premultiplying (7) by $X^{-1}$ yields

(8) $\Lambda\,\delta C = \delta C\,\Lambda + \delta\Lambda$, $\delta C_{jj} = 0$, $j = 1$ to $r$.

Since $\Lambda$ and $\delta\Lambda$ are block diagonal and $\delta C_{jj} = 0$, we deduce that $\delta\Lambda = 0$. For $i \ne j$, (8) implies that

(9) $\Lambda_i\,\delta C_{ij} - \delta C_{ij}\,\Lambda_j = 0$.

Referring to [13, sect. 12.5, Thm. 1], we see that the unique solution to (9) is $\delta C_{ij} = 0$ when the spectra of $\Lambda_i$ and $\Lambda_j$ are disjoint. Since $\delta C = \delta\Lambda = 0$, we conclude that the derivative of (6) with respect to $C$ and $\Lambda$ at $\varepsilon = 0$ is invertible. The implicit function theorem completes the proof.

Proposition 1 is surprising since $\Lambda(\varepsilon)$ is differentiable even though its eigenvalues may be nondifferentiable. To evaluate $X'(0)$ and $\Lambda'(0)$, we differentiate (6) to obtain the relation

$A'(0)X + AXC'(0) = XC'(0)\Lambda + X\Lambda'(0)$, $C'_{jj}(0) = 0$, $j = 1$ to $r$.

Premultiplying by $X^{-1}$ yields

(10) $X^{-1}A'(0)X + \Lambda C'(0) = C'(0)\Lambda + \Lambda'(0)$, $C'_{jj}(0) = 0$, $j = 1$ to $r$.

If $Z^T$ denotes $X^{-1}$, (10) is equivalent to the following relations:

(11) $\Lambda_i'(0) = Z_i^T A'(0) X_i$, $i = 1$ to $r$,

and

(12) $C'_{ij}(0)\Lambda_j - \Lambda_i C'_{ij}(0) = Z_i^T A'(0) X_j$ for $i \ne j$, $C'_{jj}(0) = 0$, $j = 1$ to $r$.

With the notation diag and off of section 1, (11) and (12) can be expressed

(13) $\Lambda'(0) = \operatorname{diag} X^{-1}A'(0)X$, $\operatorname{diag} C'(0) = 0$, $C'(0)\Lambda - \Lambda C'(0) = \operatorname{off} X^{-1}A'(0)X$.

After obtaining $C'(0)$ satisfying (13), we have $X'(0) = XC'(0)$. When the spectra of $\Lambda_i$ and $\Lambda_j$ are disjoint, (12) can be solved in the following way (see [3] or [10, p. 387] for details): replace $\Lambda_i$ and $\Lambda_j$ by their Schur decompositions and obtain an equivalent equation with upper triangular matrices in place of $\Lambda_i$ and $\Lambda_j$. This upper triangular system can be solved directly by a substitution procedure.
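The substitution procedure just described can be sketched as follows; this is a dense illustration of the Bartels-Stewart idea [3] for the small equation (12), with the complex Schur form used so that both reduced blocks are genuinely triangular (the function name and the column-by-column ordering are choices made here):

    import numpy as np
    from scipy.linalg import schur

    def solve_small_sylvester(Li, Lj, R):
        """Solve W Lj - Li W = R by Schur reduction and column substitution."""
        Ui, Qi = schur(Li, output='complex')    # Li = Qi Ui Qi^H, Ui upper triangular
        Uj, Qj = schur(Lj, output='complex')
        F = Qi.conj().T @ R @ Qj                # transformed right-hand side
        Y = np.zeros_like(F)
        ni, nj = F.shape
        for col in range(nj):
            rhs = F[:, col] - Y[:, :col] @ Uj[:col, col]
            # (Uj[col, col] I - Ui) is upper triangular, hence directly solvable
            Y[:, col] = np.linalg.solve(Uj[col, col] * np.eye(ni) - Ui, rhs)
        # for real input data the solution is real up to roundoff
        return (Qi @ Y @ Qj.conj().T).real

In practice a library routine such as scipy.linalg.solve_sylvester packages the same reduction.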

As in [11], we can utilize these formulas for $X'(0)$ and $\Lambda'(0)$ in an iterative algorithm for block diagonalizing a matrix. In particular, given a matrix $A$ and an approximate block diagonalization $X\Lambda X^{-1}$, let us consider the matrix

$A(\varepsilon) = X\Lambda X^{-1} + \varepsilon(A - X\Lambda X^{-1})$.

Observe that $A(1) = A$. Suppose that the spectra of $\Lambda_i$ and $\Lambda_j$ are disjoint for $i \ne j$ and that for $\varepsilon$ between zero and one, the continuously differentiable functions $X(\varepsilon)$ and $\Lambda(\varepsilon)$ of Proposition 1 exist. Evaluating the first-order Taylor expansions of $X(\varepsilon)$ and $\Lambda(\varepsilon)$ around $\varepsilon = 0$ and putting $\varepsilon = 1$, we obtain the following first-order approximations:

$\Lambda(1) \approx \Lambda(0) + \Lambda'(0) = \Lambda + \operatorname{diag}(X^{-1}(A - X\Lambda X^{-1})X) = \operatorname{diag} X^{-1}AX$,

and

$X(1) \approx X(0) + X'(0) = X + XC'(0) = X(I + C'(0))$,

where $C'(0)$ satisfies (13). Based on these expansions, we arrive at the algorithm (1) and (2), where $D$ corresponds to the matrix $C'(0)$ above. As another variation of (2), the matrix $\Lambda_{k+1}$ can be replaced by the previous iterate $\Lambda_k$ in the evaluation of $D$; that is, the matrix $D$ satisfies

(14) $\operatorname{diag} D = 0$ and $D\Lambda_k - \Lambda_k D = \operatorname{off} X_k^{-1}AX_k$.

It turns out that with the modified formula (14) for $D$, the block diagonalization algorithm is 2-step quadratically convergent, while the original formula (2) yields a 1-step quadratically convergent algorithm.
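The 1-step quadratic behavior is easy to observe numerically. A small demonstration, reusing the block_diag_step sketch from section 1 with $1 \times 1$ blocks; the test matrix, seed, and perturbation size are illustrative assumptions:

    import numpy as np

    # assumes block_diag_step from the earlier sketch is in scope
    rng = np.random.default_rng(0)
    n = 6
    A = np.diag(np.arange(1.0, n + 1)) + 0.05 * rng.standard_normal((n, n))
    X = np.eye(n)
    for k in range(5):
        M = np.linalg.solve(X, A @ X)
        err = np.linalg.norm(M - np.diag(np.diag(M)), np.inf)
        print(k, err)                 # the off-norm should shrink roughly quadratically
        _, X = block_diag_step(A, X, sizes=[1] * n)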

3. Local quadratic convergence. This section analyzes the local convergence of the block diagonalization algorithm, while the next section analyzes one situation where convergence is guaranteed for the starting guess $X_0 = I$. Throughout this analysis, $\|\cdot\|$ denotes any matrix norm with the following properties:

P1. $\|AB\| \le \|A\|\,\|B\|$ for each $A$ and $B$.
P2. $\|A\| \le \|B\|$ whenever $|a_{ij}| \le |b_{ij}|$ for each $i$ and $j$.
P3. $\|I\| = 1$.

Below, subscripts $i$ and $j$ are used to denote elements of matrices while subscripts $k$, $l$, and $m$ are used to denote the iteration number. Finally, let $B_\rho(X)$ denote the ball with center $X$ and radius $\rho$:

$B_\rho(X) = \{Y : \|Y - X\| \le \rho\}$.

Recall that there is an eigenspace associated with each eigenvalue of a matrix. At best, the block diagonalization algorithm converges to elements of the eigenspace associated with a block diagonalization of $A$. Given a block diagonalization $X\Lambda X^{-1}$ of $A$, let $S(X)$ denote the set of all matrices of the form $XD$ for some invertible block diagonal matrix $D$.

THEOREM 1. Suppose that $A = X\Lambda X^{-1}$, where $\Lambda$ is block diagonal and the spectra of $\Lambda_i$ and $\Lambda_j$ are disjoint for all $i \ne j$. Then for all $X_0$ in a neighborhood of $X$, the iterates $X_k$ and $\Lambda_k$ generated by the block diagonalization algorithm approach limits $X_\infty$ and $\Lambda_\infty$ such that $A = X_\infty \Lambda_\infty X_\infty^{-1}$, where $X_\infty$ lies in $S(X)$ and where the root convergence order is at least two. That is, there exists a constant $C$, independent of $k$, such that

$\|X_k - X_\infty\| + \|\Lambda_k - \Lambda_\infty\| \le C\,2^{-2^k}$.

Proof. Let $F_\Lambda(Z)$ denote the linear operator defined by $F_\Lambda(Z) = \operatorname{off}(Z\Lambda - \Lambda Z)$, where the domain and range are the set of $n \times n$ block matrices with diagonal blocks equal to zero. As noted previously, $F_\Lambda$ is an invertible linear operator since the spectra of $\Lambda_i$ and $\Lambda_j$ are disjoint for all $i \ne j$. Since $F_\Lambda$ is a continuous function of $\Lambda$, there exist constants $\gamma$ and $\sigma$ such that $\|F_M^{-1}\| \le \gamma$ for every block diagonal $M \in B_\sigma(\Lambda)$. Choose $\rho$ small enough that $Y$ is invertible whenever $Y \in B_\rho(X)$, and let $\beta$ be defined by

$\beta = \max\{\|Y^{-1}\| : Y \in B_\rho(X)\}$.

Decrease $\rho$ further if necessary to ensure that

(15) $\rho\beta \le 1$ and $\|\Lambda - Y^{-1}AY\| + \beta\rho\,\|Y^{-1}AY\| \le \sigma$ for every $Y \in B_\rho(X)$.

We will show that there exist a constant $c$ and a $Y_{k+1} \in S(X)$ such that

(16) $\|Y_{k+1} - Y_k\| \le c\,\|X_k - Y_k\|$ and $\|Y_{k+1} - X_{k+1}\| \le c\,\|X_k - Y_k\|^2$,

where $c$ is independent of $X_k$ and $Y_k$ in $B_\rho(X)$, and where $X_{k+1}$ is generated by the block diagonalization algorithm. Assuming, for the moment, the existence of the constant $c$ in (16), the proof is completed in the following way: define $Y_0 = X$ and choose $X_0$ close enough to $X$ such that

(17) $\|X_0 - X\| \le \min\{1/(4c),\ \rho/(4c),\ \rho/4\}$.

Since $Y_0 = X$, both $X_0$ and $Y_0$ lie in $B_\rho(X)$. Proceeding by induction, suppose that $X_1, Y_1, \dots, X_k, Y_k$ all lie in $B_\rho(X)$. We use (16) to show that $X_{k+1}$ and $Y_{k+1}$ also lie in $B_\rho(X)$. The second inequality in (16) leads to the relation

$c\,\|Y_k - X_k\| \le (c\,\|X_0 - X\|)^{2^k}$ or $\|Y_k - X_k\| \le \|X_0 - X\|\,(c\,\|X_0 - X\|)^{2^k - 1}$.

Combining this with the bound (17), we have

(18) $\|Y_k - X_k\| \le \|X_0 - X\|\,4^{1-2^k} = 4\,\|X_0 - X\|\,4^{-2^k} \le \rho\,4^{-2^k}$.

Utilizing the first inequality in (16) and (18), and the bound in (17), we get

$\|Y_{k+1} - X\| \le \|Y_{k+1} - Y_k\| + \|Y_k - X\| \le c\,\|X_k - Y_k\| + \|Y_k - X\| \le 4c\,4^{-2^k}\|X_0 - X\| + \|Y_k - X\| \le \rho\,4^{-2^k} + \|Y_k - X\|$.

Repeated application of this inequality yields

$\|Y_{k+1} - X\| \le \rho \sum_{l=0}^{k} 4^{-2^l} < \rho/2$.

Thus, $Y_{k+1} \in B_{\rho/2}(X)$. By (18), with $k$ replaced by $k+1$, we have

$\|Y_{k+1} - X_{k+1}\| \le \rho\,4^{-2^{k+1}} \le \rho/2$.

Hence, $X_{k+1}$ lies in $B_\rho(X)$ and the induction step is complete. By the first inequality in (16) and by (18) we have, for any $k \ge l$,

(19) $\|Y_k - Y_l\| \le \sum_{m=l}^{k-1} \|Y_{m+1} - Y_m\| \le c \sum_{m=l}^{k-1} \|Y_m - X_m\| \le c\rho \sum_{m=l}^{k-1} 4^{-2^m} \le 2c\rho\,4^{-2^l}$.

Thus, for $X_0$ sufficiently close to $X$, the $Y_k$ form a Cauchy sequence approaching some limit $X_\infty$, and by (18) the $X_k$ approach $X_\infty$ as well. The triangle inequality

$\|X_k - X_\infty\| \le \|X_k - Y_k\| + \|Y_k - X_\infty\|$

combined with (18) and (19) completes the proof of quadratic convergence for the $X_k$. Since $\Lambda_{k+1} = \operatorname{diag} X_k^{-1}AX_k$, we conclude that the $\Lambda_k$ approach $\Lambda_\infty = \operatorname{diag} X_\infty^{-1}AX_\infty$ at the same rate that the $X_k$ approach $X_\infty$.

To establish (16), we proceed in the following way. Let $E$ be the matrix chosen so that $X_k = Y_k(I + E)$; that is, $E = Y_k^{-1}X_k - I = Y_k^{-1}(X_k - Y_k)$. By (15) and (18),

(20) $\|E\| = \|Y_k^{-1}(X_k - Y_k)\| \le \|Y_k^{-1}\|\,\|X_k - Y_k\| \le \beta\,\|X_k - Y_k\| \le \beta\rho/4$.

It follows that $I + E$ is invertible and

(21) $\|(I + E)^{-1}\| \le 2$.

Defining $M_k = Y_k^{-1}AY_k$, which is block diagonal since $Y_k \in S(X)$, observe that

(22) $X_k^{-1}AX_k = (I+E)^{-1}Y_k^{-1}AY_k(I+E) = [I - E + E^2(I+E)^{-1}]\,M_k\,(I+E) = M_k - EM_k + M_kE - EM_kE + E^2(I+E)^{-1}M_k(I+E) = M_k - EM_k + M_kE + L$,

where

$L = E^2(I+E)^{-1}M_k(I+E) - EM_kE$.

By (15) with $Y = Y_k$ we have

(23) $\|\Lambda - M_k\| = \|\Lambda - Y_k^{-1}AY_k\| \le \sigma$,

and combining this with (21) we have

(24) $\|L\| \le \|E\|^2\,\dfrac{2\|M_k\|}{1 - \|E\|} \le 4(\sigma + \|\Lambda\|)\,\|E\|^2$.

Defining $\bar\Lambda = \Lambda_{k+1} = \operatorname{diag} X_k^{-1}AX_k$ and referring to (22), the matrix $D$ involved in the evaluation of $X_{k+1}$ satisfies

$\operatorname{off}(D\bar\Lambda - \bar\Lambda D) = D\bar\Lambda - \bar\Lambda D = \operatorname{off} X_k^{-1}AX_k = \operatorname{off}(M_kE - EM_k + L)$,

which implies that

(25) $\operatorname{off}((D+E)\bar\Lambda - \bar\Lambda(D+E)) = \operatorname{off}((M_k - \bar\Lambda)E - E(M_k - \bar\Lambda) + L)$.

By (21), (22), (23), and the next-to-last inequality in (24) we have

(26) $\|\bar\Lambda - M_k\| = \|\operatorname{diag} X_k^{-1}AX_k - M_k\| \le \|X_k^{-1}AX_k - M_k\| = \|M_kE - EM_k + L\| \le 2\|M_k\|\,\|E\| + \|L\| \le \dfrac{2\|M_k\|\,\|E\|}{1-\|E\|} \le 4(\sigma + \|\Lambda\|)\,\|E\|$,

from which it follows that

(27) $\|\operatorname{off}((M_k - \bar\Lambda)E - E(M_k - \bar\Lambda) + L)\| \le \|(M_k - \bar\Lambda)E - E(M_k - \bar\Lambda)\| + \|L\| \le 2\|E\|\,\|\bar\Lambda - M_k\| + \|L\| \le 12(\sigma + \|\Lambda\|)\,\|E\|^2$.

By (26) we have

(28) $\|\Lambda - \bar\Lambda\| \le \|\Lambda - M_k\| + \|M_k - \bar\Lambda\| \le \dfrac{2\|M_k\|\,\|E\|}{1-\|E\|} + \|M_k - \Lambda\|$.

Referring to (20), we see that

(29) $\|E\| \le 1/2$ and $\|E\| \le \beta\rho/4$.

Combining (28) and (29) gives

$\|\Lambda - \bar\Lambda\| \le \|M_k - \Lambda\| + \beta\rho\,\|M_k\|$.

By (15), $\|\Lambda - \bar\Lambda\| \le \sigma$, which ensures that $\|F_{\bar\Lambda}^{-1}\| \le \gamma$. This bound in conjunction with (25) and (27) implies that

(30) $\|\operatorname{off}(D + E)\| \le 12\gamma(\sigma + \|\Lambda\|)\,\|E\|^2$.

Let $G$ be defined by $G = \operatorname{off}(D + E)$. Substituting $D = \operatorname{off} D = G - \operatorname{off} E$ in (1) and utilizing the identity $X_k = Y_k(I + E)$ yields

$X_{k+1} = X_k(I + D) = Y_k(I + E)(I + G - \operatorname{off} E)$.

Hence, we have

(31) $X_{k+1} = Y_k(I + \operatorname{diag} E) + Y_k(I + E)G - Y_k\,E\,\operatorname{off} E$.

Defining $Y_{k+1} = Y_k(I + \operatorname{diag} E)$, observe that

(32) $\|Y_{k+1} - Y_k\| \le \|Y_k\|\,\|\operatorname{diag} E\| \le (\rho + \|X\|)\,\|E\|$,

since $Y_k$ lies in $B_\rho(X)$. Referring to (20), $\|E\| \le \beta\,\|X_k - Y_k\|$, and combining this with (32) gives

(33) $\|Y_{k+1} - Y_k\| \le \beta(\|X\| + \rho)\,\|X_k - Y_k\|$,

which yields the first inequality in (16). Moreover, by (31), we have

$\|X_{k+1} - Y_{k+1}\| \le \|Y_k\|\,(\|I + E\|\,\|G\| + \|E\|^2) \le (\rho + \|X\|)(\|I + E\|\,\|G\| + \|E\|^2)$.

Combining this with the bound for $G = \operatorname{off}(D + E)$ in (30) and the relation $\|I + E\| \le 1 + \|E\| \le 3/2$ from (29) gives

(34) $\|X_{k+1} - Y_{k+1}\| \le (\rho + \|X\|)\,(18\gamma(\sigma + \|\Lambda\|) + 1)\,\|E\|^2 \le \beta^2(\rho + \|X\|)\,(18\gamma(\sigma + \|\Lambda\|) + 1)\,\|X_k - Y_k\|^2$.

The constant $c$ in (16) is the maximum of the constants in (33) and (34).

4. A priori convergence. This section analyzes one situation where the block diagonalization algorithm is guaranteed to converge. Define $A_0 = A$ and for $k > 0$, let $A_k$ denote $X_k^{-1}AX_k$. With this notation, the block diagonalization algorithm, starting from $X_0 = I$, can be expressed in the following way:

(35) $\Lambda_{k+1} = \operatorname{diag} A_k$, $A_{k+1} = (I + D_k)^{-1}A_k(I + D_k)$, $X_{k+1} = X_k(I + D_k)$,

where $\operatorname{diag} D_k = 0$ and $D_k\Lambda_{k+1} - \Lambda_{k+1}D_k = \operatorname{off} A_k$.

Throughout this section, we assume that the diagonal blocks of $A$ are all $1 \times 1$. In addition to the three properties imposed for the norm in section 3, we assume the following:

P4. $\|\operatorname{diag} A\| = \max\{|A_{ii}| : i = 1 \text{ to } n\}$ for each $A$.

Let $\sigma(A)$ be the minimum separation of diagonal elements defined by

$\sigma(A) = \min_{i \ne j} |A_{jj} - A_{ii}|$.

In the case of $1 \times 1$ blocks, the equation for $D_k$ can be solved explicitly. In particular, we have

(36) $D_{k,ij} = \dfrac{A_{k,ij}}{A_{k,jj} - A_{k,ii}}$ for all $i \ne j$.

Taking absolute values yields $|D_{k,ij}| \le |A_{k,ij}|/\sigma(A_k)$. It follows that

(37) $\|D_k\| \le \dfrac{\|\operatorname{off} A_k\|}{\sigma(A_k)}$.
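In code, one pass of (35)-(36) is a pair of elementwise operations followed by a similarity transform; a minimal sketch (the function name and the placeholder handling of the diagonal of the divisor are choices made here):

    import numpy as np

    def scalar_step(Ak):
        """One step of (35): A_{k+1} = (I + D)^{-1} A_k (I + D), with D from (36)."""
        d = np.diag(Ak)
        denom = d[None, :] - d[:, None]      # denom[i, j] = A_jj - A_ii
        np.fill_diagonal(denom, 1.0)         # dummy value; the diagonal of D is zero
        D = Ak / denom
        np.fill_diagonal(D, 0.0)
        B = np.eye(len(d)) + D
        return np.linalg.solve(B, Ak @ B)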

These observations are the basis for the following theorem.

THEOREM 2. If $\|\operatorname{off} A\| < \sigma(A)(\sqrt{3} - 1)/2$, then $A$ is diagonalizable, and the iterates $\Lambda_k$ and $X_k$ generated by the block diagonalization algorithm, with the starting guess $X_0 = I$, approach limits $\Lambda_\infty$ and $X_\infty$, respectively, where $A = X_\infty\Lambda_\infty X_\infty^{-1}$.
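The hypothesis of Theorem 2 is straightforward to test. A small sketch using the $\infty$-norm, which satisfies P1-P4 (the function name is illustrative):

    import numpy as np

    def theorem2_condition(A):
        """Return (alpha, ok): alpha = ||off A|| / sigma(A), and whether the bound holds."""
        offA = A - np.diag(np.diag(A))
        d = np.diag(A)
        gaps = np.abs(d[:, None] - d[None, :])
        sigma = gaps[~np.eye(len(d), dtype=bool)].min()   # minimum diagonal separation
        alpha = np.linalg.norm(offA, np.inf) / sigma
        return alpha, alpha < (np.sqrt(3) - 1) / 2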

Proof. Let $\alpha$ denote the ratio $\|\operatorname{off} A\|/\sigma(A)$, which is less than or equal to $(\sqrt{3} - 1)/2$ by assumption. Hence, the inequality

(38) $\|\operatorname{off} A_k\| \le \alpha\,\sigma(A_k)$

holds trivially at $k = 0$. Proceeding by induction, assume that (38) holds at iteration $k$. By (37) and (38),

(39) $\|D_k\| \le \dfrac{\|\operatorname{off} A_k\|}{\sigma(A_k)} \le \alpha$.

Letting $B_k$ denote $I + D_k$, it follows that

(40) $\|B_k^{-1}\| = \|(I + D_k)^{-1}\| \le \dfrac{1}{1 - \|D_k\|} \le \dfrac{1}{1 - \alpha}$.

By (35) we have

$A_{k+1} - \Lambda_{k+1} = B_k^{-1}A_kB_k - \Lambda_{k+1} = B_k^{-1}(\operatorname{off} A_k + \Lambda_{k+1})B_k - \Lambda_{k+1} = B_k^{-1}\operatorname{off} A_k\,B_k + B_k^{-1}(\Lambda_{k+1}B_k - B_k\Lambda_{k+1}) = B_k^{-1}\operatorname{off} A_k\,B_k - B_k^{-1}\operatorname{off} A_k = B_k^{-1}\operatorname{off} A_k\,(B_k - I) = B_k^{-1}\operatorname{off} A_k\,D_k$.

Combining this with (38), (39), and (40) yields

(41) $\|\operatorname{off} A_{k+1}\| \le \|A_{k+1} - \Lambda_{k+1}\| \le \|B_k^{-1}\|\,\|\operatorname{off} A_k\|\,\|D_k\| \le \dfrac{\alpha}{1-\alpha}\,\|\operatorname{off} A_k\| \le \dfrac{\alpha^2}{1-\alpha}\,\sigma(A_k)$.

Also note that

(42) $\|\operatorname{diag}(A_{k+1} - A_k)\| = \|\operatorname{diag} A_{k+1} - \Lambda_{k+1}\| \le \|A_{k+1} - \Lambda_{k+1}\| \le \dfrac{\alpha}{1-\alpha}\,\|\operatorname{off} A_k\| \le \dfrac{\alpha^2}{1-\alpha}\,\sigma(A_k)$.

Applying the diag operator to the identity $A_{k+1} = A_k + (A_{k+1} - A_k)$ and referring to (42), we have

(43) $\sigma(A_{k+1}) \ge \sigma(A_k) - 2\,\|\operatorname{diag}(A_{k+1} - A_k)\| \ge \sigma(A_k) - \dfrac{2\alpha^2}{1-\alpha}\,\sigma(A_k) = \dfrac{1 - \alpha - 2\alpha^2}{1-\alpha}\,\sigma(A_k)$.

Combining this with (41) gives

$\dfrac{\|\operatorname{off} A_{k+1}\|}{\sigma(A_{k+1})} \le \dfrac{\alpha^2}{1 - \alpha - 2\alpha^2} \le \alpha$

for $\alpha \le (\sqrt{3} - 1)/2$. This completes the induction step, and (38) holds for all $k$.

Observe that in (41) we establish the relation

$\|\operatorname{off} A_{k+1}\| \le \dfrac{\alpha}{1-\alpha}\,\|\operatorname{off} A_k\|$.

Repeated application of this inequality gives

(44) $\|\operatorname{off} A_k\| \le \left(\dfrac{\alpha}{1-\alpha}\right)^k \|\operatorname{off} A\|$.

Since $\alpha < 1/2$, the off-diagonal elements of $A_k$ all tend to zero. Also, by (42) and (44),

(45) $\|\operatorname{diag} A_{k+1} - \operatorname{diag} A_k\| \le \left(\dfrac{\alpha}{1-\alpha}\right)^{k+1} \|\operatorname{off} A\|$.

Thus, the diagonal of $A_k$ forms a Cauchy sequence approaching some limit $\Lambda_\infty$, while the off-diagonal elements tend to zero. We now show that no two diagonal elements of $\Lambda_\infty$ are equal. By the first inequality in (43) and by (45) we have

$\sigma(A_{k+1}) \ge \sigma(A_k) - 2\left(\dfrac{\alpha}{1-\alpha}\right)^{k+1} \|\operatorname{off} A\| = \sigma(A_k) - 2\alpha\left(\dfrac{\alpha}{1-\alpha}\right)^{k+1} \sigma(A)$.

Repeated application of this inequality yields

$\sigma(A_k) \ge \sigma(A)\left(1 - 2\alpha\sum_{i=1}^{k}\left(\dfrac{\alpha}{1-\alpha}\right)^i\right) \ge \sigma(A)\left(1 - 2\alpha\sum_{i=1}^{\infty}\left(\dfrac{\alpha}{1-\alpha}\right)^i\right) = \sigma(A)\left(1 - 2\alpha\,\dfrac{\alpha/(1-\alpha)}{1 - \alpha/(1-\alpha)}\right) = \sigma(A)\,\dfrac{1 - 2\alpha - 2\alpha^2}{1 - 2\alpha}$.

Since $1 - 2\alpha - 2\alpha^2 > 0$ for $\alpha < (\sqrt{3} - 1)/2$, $\sigma(A_k)$ is uniformly bounded away from zero. Combining the lower bound for $\sigma(A_k)$ with (37) and (44) yields

(46) $\|D_k\| \le c\,r^k$, where $r = \dfrac{\alpha}{1-\alpha}$ and $c = \dfrac{\|\operatorname{off} A\|\,(1 - 2\alpha)}{\sigma(A)(1 - 2\alpha - 2\alpha^2)} = \dfrac{\alpha(1 - 2\alpha)}{1 - 2\alpha - 2\alpha^2}$.

By the equation for $X_{k+1}$,

$\|X_{k+1}\| = \|X_k(I + D_k)\| \le (1 + \|D_k\|)\,\|X_k\| \le (1 + c\,r^k)\,\|X_k\| \le \|X_0\| \prod_{l=0}^{k}(1 + c\,r^l)$.

Since

$\sum_{l=0}^{\infty} \log(1 + c\,r^l) \le \dfrac{c}{1-r}$,

$\|X_k\|$ is uniformly bounded by some constant $\beta$ independent of $k$ (see [1, p. 209, Thm. 8.55]). The estimate

$\|X_{k+1} - X_k\| = \|X_kD_k\| \le \|X_k\|\,\|D_k\| \le \beta c\,r^k$

implies that the $X_k$ form a Cauchy sequence that approaches some limit $X_\infty$.

Since $\|D_k\| < 1$ for each $k$ by (39), the $X_k$ are invertible and

(47) $\|X_{k+1}^{-1}\| \le \|(I + D_k)^{-1}\|\,\|X_k^{-1}\| \le \dfrac{\|X_k^{-1}\|}{1 - c\,r^k}$.

The relation $1/(1 - c\,r^k) \le 1 + 2c\,r^k$ for $k$ sufficiently large, along with (47), shows that the sequence $\|X_k^{-1}\|$ is uniformly bounded. Thus, $X_\infty$ is invertible and $X_k^{-1}$ approaches $X_\infty^{-1}$; moreover, $\Lambda_{k+1} = \operatorname{diag} X_k^{-1}AX_k$ approaches $X_\infty^{-1}AX_\infty = \Lambda_\infty$. This completes the proof.

During the proof of Theorem 2, it was shown that no two diagonal elements of $\Lambda_\infty$ are equal. Hence, Theorem 1 implies that the convergence in Theorem 2 is locally quadratic.

5. Numerical experiments. To illustrate Theorems 1 and 2, we consider the matrix $A$ defined by $A_{ij} = 3^{-|i-j|}$ for $i \ne j$ and $A_{ii} = i$. We apply the iteration (35), stopping when $\|\operatorname{off} A_k\|_\infty = \|\operatorname{off} X_k^{-1}AX_k\|_\infty$ falls below a small fixed tolerance. Here $\|\cdot\|_\infty$ denotes the matrix $\infty$-norm (maximum absolute row sum). By Gerschgorin's theorem, the diagonal elements of $A_k$ approximate the eigenvalues of $A$ with error at most $\|\operatorname{off} A_k\|_\infty$. The number of iterations needed for convergence appears in Table 1 for various values of the dimension $n$ of $A$. In Table 2 we show how the error depends on the iteration number $k$ for $n = 10$. Observe that the convergence appears to be at least quadratic, as the theory predicts.

TABLE 1. Iterations needed for convergence versus the dimension n (entries not preserved in the transcription).

TABLE 2. Error $\|\operatorname{off} A_k\|_\infty$ versus the iteration number $k$ for $n = 10$ (entries not preserved in the transcription).
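A sketch of this experiment follows; the off-diagonal decay rate $3^{-|i-j|}$ of the test matrix and the tolerance 1e-12 are assumed values here, since both are only partially recoverable from the transcription.

    import numpy as np

    def test_matrix(n):
        i, j = np.indices((n, n))
        A = 3.0 ** (-np.abs(i - j))          # assumed off-diagonal decay
        np.fill_diagonal(A, np.arange(1, n + 1, dtype=float))
        return A

    def block_diagonalize(A, tol=1e-12, itmax=50):
        """Iteration (35) with X_0 = I; stop when ||off A_k||_inf <= tol."""
        n = A.shape[0]
        X, Ak, k = np.eye(n), A.copy(), 0
        for k in range(itmax):
            off = Ak - np.diag(np.diag(Ak))
            if np.linalg.norm(off, np.inf) <= tol:
                break
            d = np.diag(Ak)
            denom = d[None, :] - d[:, None]
            np.fill_diagonal(denom, 1.0)
            D = off / denom                   # formula (36)
            B = np.eye(n) + D
            Ak = np.linalg.solve(B, Ak @ B)   # A_{k+1} = B^{-1} A_k B
            X = X @ B
        return np.diag(Ak), X, k

    eigs, X, iters = block_diagonalize(test_matrix(10))
    print(iters, eigs)   # the diagonal approximates the eigenvalues (Gerschgorin)

The stopping test measures exactly the Gerschgorin bound mentioned above, so the returned diagonal entries carry an a posteriori eigenvalue error bound of at most the tolerance.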

In Table 3 we compare the execution times (in seconds on a Sun Sparc 20 workstation, compiled with f77 -O) for the block diagonalization algorithm with the corresponding times of both Dongarra's SICEDR [6] (an implementation of the Newton algorithm contained in ACM algorithm 589) and the QR algorithm as implemented in EISPACK [9], [14]. Recall that the flop count for one iteration of the block algorithm is $\frac{10}{3}n^3$ while the Newton approach requires about $\frac{22}{3}n^3$ flops. As a rough estimate, a diagonalization computed by the QR method requires $9n^3$ flops: about $5n^3$ flops to reduce the matrix to Hessenberg form and to form the orthogonal matrix used in the reduction, $2n^3$ flops to update the orthogonal matrix during the reduction to Schur form, $n^3$ flops to reduce the Hessenberg matrix to triangular form, and $n^3$ flops to diagonalize the triangular matrix and multiply by the orthogonal factor. Hence, three iterations of the block algorithm are comparable to the QR algorithm, which agrees with the observed computing times in Table 3. Observe, though, that the Newton approach was somewhat slower than predicted by the flop counts. Moreover, some of the Newton iterates converged to the same eigenvalue; hence, one did not actually achieve a diagonalization of the starting matrix but rather a refined approximation to most of the eigenpairs.

TABLE 3. Execution times in seconds for the block algorithm, the QR algorithm, and SICEDR for various n (entries not preserved in the transcription).

As both the flop counts and the execution times in Table 3 indicate, one or two iterations of the block algorithm execute quicker than the QR algorithm. Hence, on a serial computer, the block algorithm is only faster than the QR algorithm when we have a good starting guess for the block diagonalization, for example, when a matrix is modified incrementally and rediagonalized after each change. On the other hand, the block algorithm is much better suited to parallel computing than the QR method, since matrix multiplication and matrix inversion are readily implemented in a parallel computing environment. In comparing code size, the block algorithm was implemented in 120 lines of Fortran code, SICEDR contains 1000 statements, and the QR code contains 1370 statements (excluding comment and matrix generation statements).

In the next sequence of experiments we take a matrix $A$ whose elements are randomly distributed between zero and one, and we use the QR method to obtain the real Schur decomposition $A = QUQ^T$, where $Q$ is orthogonal and $U$ is quasi-upper triangular. We then perturb $A$ by a matrix $E$ whose elements are randomly distributed on the interval $[0, \varepsilon]$. In Table 4 we compare the number of iterations for the block diagonalization algorithm with the number of iterations for the QR method applied to the matrix $Q^T(A + E)Q = U + Q^TEQ$. The starting $X_0$ in the block algorithm corresponds to the invariant subspaces of $A$, where a pair of real vectors is used to span each complex conjugate pair of eigenvectors. With this choice for $X_0$, the block algorithm could be implemented in real arithmetic.

TABLE 4. Iterations for the block algorithm and the QR method as a function of the perturbation size $\varepsilon$ (entries not preserved in the transcription).

Since the number of QR iterations needed to obtain the original Schur decomposition was 132, it appears that the number of iterations for the QR method applied to a small perturbation of an upper Hessenberg matrix is essentially the same as the number of iterations required for the original random matrix. On the other hand, with the block diagonalization algorithm, the number of iterations decreases as the perturbation decreases.

REFERENCES

[1] T. M. APOSTOL, Mathematical Analysis, 2nd ed., Addison-Wesley, Reading, MA.
[2] P. M. ANSELONE AND L. B. RALL, The solution of characteristic value-vector problems by Newton's method, Numer. Math., 11 (1968).
[3] R. H. BARTELS AND G. W. STEWART, Solution of the equation AX + XB = C, Comm. ACM, 15 (1972).
[4] F. CHATELIN, Simultaneous Newton's iteration for the eigenproblem, Comput. Suppl., 5 (1984).
[5] J. W. DEMMEL, Three methods for refining estimates of invariant subspaces, Computing, 38 (1987).
[6] J. J. DONGARRA, Algorithm 589 SICEDR: A FORTRAN subroutine for improving the accuracy of computed matrix eigenvalues, ACM Trans. Math. Software, 8 (1982).
[7] J. J. DONGARRA, C. B. MOLER, AND J. H. WILKINSON, Improving the accuracy of computed eigenvalues and eigenvectors, SIAM J. Numer. Anal., 20 (1983).
[8] L. FLATTO, Advanced Calculus, Waverly Press, Baltimore, MD.
[9] B. S. GARBOW, J. M. BOYLE, J. J. DONGARRA, AND C. B. MOLER, Matrix Eigensystem Routines - EISPACK Guide Extension, Springer-Verlag, Berlin.
[10] G. H. GOLUB AND C. F. VAN LOAN, Matrix Computations, 2nd ed., The Johns Hopkins University Press, Baltimore, MD.
[11] W. W. HAGER, Bidiagonalization and diagonalization, Comput. Math. Appl., 14 (1987).
[12] W. W. HAGER, Applied Numerical Linear Algebra, Prentice Hall, Englewood Cliffs, NJ.
[13] P. LANCASTER AND M. TISMENETSKY, The Theory of Matrices, Academic Press, Orlando, FL.
[14] B. T. SMITH, J. M. BOYLE, J. J. DONGARRA, B. S. GARBOW, Y. IKEBE, V. C. KLEMA, AND C. B. MOLER, Matrix Eigensystem Routines - EISPACK Guide, Springer-Verlag, Berlin.
[15] G. W. STEWART, Error and perturbation bounds for subspaces associated with certain eigenvalue problems, SIAM Rev., 15 (1973).


The Lanczos and conjugate gradient algorithms The Lanczos and conjugate gradient algorithms Gérard MEURANT October, 2008 1 The Lanczos algorithm 2 The Lanczos algorithm in finite precision 3 The nonsymmetric Lanczos algorithm 4 The Golub Kahan bidiagonalization

More information

The restarted QR-algorithm for eigenvalue computation of structured matrices

The restarted QR-algorithm for eigenvalue computation of structured matrices Journal of Computational and Applied Mathematics 149 (2002) 415 422 www.elsevier.com/locate/cam The restarted QR-algorithm for eigenvalue computation of structured matrices Daniela Calvetti a; 1, Sun-Mi

More information

Foundations of Matrix Analysis

Foundations of Matrix Analysis 1 Foundations of Matrix Analysis In this chapter we recall the basic elements of linear algebra which will be employed in the remainder of the text For most of the proofs as well as for the details, the

More information

BOUNDS OF MODULUS OF EIGENVALUES BASED ON STEIN EQUATION

BOUNDS OF MODULUS OF EIGENVALUES BASED ON STEIN EQUATION K Y BERNETIKA VOLUM E 46 ( 2010), NUMBER 4, P AGES 655 664 BOUNDS OF MODULUS OF EIGENVALUES BASED ON STEIN EQUATION Guang-Da Hu and Qiao Zhu This paper is concerned with bounds of eigenvalues of a complex

More information

Study Guide for Linear Algebra Exam 2

Study Guide for Linear Algebra Exam 2 Study Guide for Linear Algebra Exam 2 Term Vector Space Definition A Vector Space is a nonempty set V of objects, on which are defined two operations, called addition and multiplication by scalars (real

More information

Interval solutions for interval algebraic equations

Interval solutions for interval algebraic equations Mathematics and Computers in Simulation 66 (2004) 207 217 Interval solutions for interval algebraic equations B.T. Polyak, S.A. Nazin Institute of Control Sciences, Russian Academy of Sciences, 65 Profsoyuznaya

More information

MATH 240 Spring, Chapter 1: Linear Equations and Matrices

MATH 240 Spring, Chapter 1: Linear Equations and Matrices MATH 240 Spring, 2006 Chapter Summaries for Kolman / Hill, Elementary Linear Algebra, 8th Ed. Sections 1.1 1.6, 2.1 2.2, 3.2 3.8, 4.3 4.5, 5.1 5.3, 5.5, 6.1 6.5, 7.1 7.2, 7.4 DEFINITIONS Chapter 1: Linear

More information

Matrix functions that preserve the strong Perron- Frobenius property

Matrix functions that preserve the strong Perron- Frobenius property Electronic Journal of Linear Algebra Volume 30 Volume 30 (2015) Article 18 2015 Matrix functions that preserve the strong Perron- Frobenius property Pietro Paparella University of Washington, pietrop@uw.edu

More information

Course Notes: Week 1

Course Notes: Week 1 Course Notes: Week 1 Math 270C: Applied Numerical Linear Algebra 1 Lecture 1: Introduction (3/28/11) We will focus on iterative methods for solving linear systems of equations (and some discussion of eigenvalues

More information

Non-stationary extremal eigenvalue approximations in iterative solutions of linear systems and estimators for relative error

Non-stationary extremal eigenvalue approximations in iterative solutions of linear systems and estimators for relative error on-stationary extremal eigenvalue approximations in iterative solutions of linear systems and estimators for relative error Divya Anand Subba and Murugesan Venatapathi* Supercomputer Education and Research

More information

Numerical Linear Algebra

Numerical Linear Algebra Numerical Linear Algebra The two principal problems in linear algebra are: Linear system Given an n n matrix A and an n-vector b, determine x IR n such that A x = b Eigenvalue problem Given an n n matrix

More information