SIAM J. MATRIX ANAL. APPL. Vol. 25, No. 4. c 2004 Society for Industrial and Applied Mathematics

CONVERGENCE OF RESTARTED KRYLOV SUBSPACES TO INVARIANT SUBSPACES

CHRISTOPHER BEATTIE, MARK EMBREE, AND JOHN ROSSI

Abstract. The performance of Krylov subspace eigenvalue algorithms for large matrices can be measured by the angle between a desired invariant subspace and the Krylov subspace. We develop general bounds for this convergence that include the effects of polynomial restarting and impose no restrictions concerning the diagonalizability of the matrix or its degree of nonnormality. Associated with a desired set of eigenvalues is a maximum reachable invariant subspace that can be developed from the given starting vector. Convergence for this distinguished subspace is bounded in terms involving a polynomial approximation problem. Elementary results from potential theory lead to convergence rate estimates and suggest restarting strategies based on optimal approximation points (e.g., Leja or Chebyshev points); exact shifts are evaluated within this framework. Computational examples illustrate the utility of these results. Origins of superlinear effects are also described.

Key words. Krylov subspace methods, Arnoldi algorithm, Lanczos algorithm, polynomial restarts, invariant subspaces, eigenvalues, pseudospectra, perturbation theory, potential theory, Zolotarev-type polynomial approximation problems

AMS subject classifications. 15A18, 15A42, 31A15, 41A25, 65F15

1. Setting. Let $A$ be an $n \times n$ complex matrix with $N \le n$ distinct eigenvalues $\{\lambda_j\}_{j=1}^N$ with corresponding eigenvectors $\{u_j\}_{j=1}^N$. (We do not label multiple eigenvalues separately and make no assertion regarding the uniqueness of the $u_j$.) Each distinct eigenvalue $\lambda_j$ has geometric multiplicity $n_j$ and algebraic multiplicity $m_j$ (so that $1 \le n_j \le m_j$ and $\sum_{j=1}^N m_j = n$). We aim to compute an invariant subspace associated with $L$ of these eigenvalues, which for brevity we call the good eigenvalues, labeled $\{\lambda_1, \lambda_2, \ldots, \lambda_L\}$. We intend to use a Krylov subspace algorithm to approximate this invariant subspace, possibly with the aid of restarts as described below. The remaining $N - L$ eigenvalues, the bad eigenvalues, are not of interest, and we wish to avoid the excessive expense involved in inadvertently calculating the subspaces associated with them.

The class of algorithms considered here draws eigenvector approximations from Krylov subspaces generated by the starting vector $v_1 \in \mathbb{C}^n$,

$K_l(A, v_1) = \mathrm{span}\{v_1, Av_1, \ldots, A^{l-1} v_1\}.$

Such algorithms, including the Arnoldi and biorthogonal Lanczos methods reviewed in section 1.1, differ in their mechanisms for generating a basis for $K_l(A, v_1)$ and selecting approximate eigenvectors from this Krylov subspace. Though these approximate eigenvectors are obvious objects of study, their convergence can be greatly complicated by eigenvalue multiplicity and defectiveness; see [21]. The bounds developed in

Received by the editors November 21, 2001; accepted for publication (in revised form) by Z. Strakoš June 9, 2003; published electronically July 14, 2004.

Department of Mathematics, Virginia Polytechnic Institute and State University, Blacksburg, VA (beattie@math.vt.edu, rossi@math.vt.edu).

Oxford University Computing Laboratory, Wolfson Building, Parks Road, Oxford OX1 3QD, UK. Current address: Department of Computational and Applied Mathematics, Rice University, 6100 Main Street MS 134, Houston, TX (embree@caam.rice.edu).
The research of this author was supported in part by UK Engineering and Physical Sciences Research Council Grant GR/M

the following sections avoid these difficulties by instead studying convergence of the Krylov subspace to an invariant subspace associated with the good eigenvalues as the dimension of the Krylov subspace is increased.

Given two subspaces $W$ and $V$ of $\mathbb{C}^n$, the extent to which $V$ approximates $W$ is measured (asymmetrically) by the containment gap (or just gap), defined as

$\delta(W, V) = \sup_{x \in W} \inf_{y \in V} \frac{\|y - x\|}{\|x\|} = \sin(\vartheta_{\max}).$

Here $\vartheta_{\max}$ is the largest canonical angle between $W$ and a closest subspace $\widehat{V}$ of $V$ having dimension equal to $\dim W$. (Throughout, $\|\cdot\|$ denotes the vector 2-norm and the matrix norm it induces.) Notice that if $\dim V < \dim W$, then $\delta(W, V) = 1$, while $\delta(W, V) = 0$ if and only if $W \subseteq V$. The gap can be expressed directly as the norm of a composition of projections: if $\Pi_W$ and $\Pi_V$ denote orthogonal projections onto $W$ and $V$, respectively, then $\delta(W, V) = \|(I - \Pi_V)\Pi_W\|$ (see, e.g., Chatelin [7, sect. 1.4]).

The objective of this paper, then, is to measure the gap between Krylov subspaces and an $m$-dimensional invariant subspace $U$ of $A$ associated with the good eigenvalues. We explore how quickly $\delta(U, K_l(A, v_1))$ can be driven to zero as $l$ is increased, reflecting the speed of convergence, and how this behavior is influenced by the distribution of eigenvalues and the nonnormality of $A$. Note that $\delta(U, K_l(A, v_1)) = 1$ when $l < m$. For $l \ge m$, our bounds ultimately take the form

(1.1) $\delta(U, K_l(A, v_1)) \ \le\ C_0\, C_1\, C_2 \min_{\phi \in P_{l-m}} \frac{\max\{|\phi(z)| : z \in \Omega_{bad}\}}{\min\{|\phi(z)| : z \in \Omega_{good}\}},$

where $P_k$ denotes the set of polynomials of degree $k$ or less, and $\Omega_{good}$ and $\Omega_{bad}$ are disjoint compact subsets of $\mathbb{C}$ containing the good and bad eigenvalues, respectively. The constant $C_0$ reflects nonnormal coupling between the good and bad invariant subspaces, while $C_2$ reflects nonnormality within those two subspaces. The constant $C_1$ principally describes the effect of starting vector bias, though it, too, is influenced by nonnormality.
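The containment gap defined above is straightforward to evaluate numerically from the identity $\delta(W, V) = \|(I - \Pi_V)\Pi_W\|$. The following sketch is our own illustration (not code from the paper; NumPy is assumed) and computes the gap from bases of the two subspaces:

```python
import numpy as np

def containment_gap(W, V):
    """Containment gap delta(W, V) = ||(I - Pi_V) Pi_W|| in the 2-norm.

    W, V: matrices whose (linearly independent) columns span the subspaces.
    When dim V < dim W the formula automatically returns 1.
    """
    QW, _ = np.linalg.qr(W)                    # orthonormal basis for W
    QV, _ = np.linalg.qr(V)                    # orthonormal basis for V
    R = QW - QV @ (QV.conj().T @ QW)           # (I - Pi_V) applied to the basis of W
    return np.linalg.norm(R, 2)                # largest singular value = sin(theta_max)
```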

In section 2 we identify the subspace $U$, which in common situations will be the entire invariant subspace of $A$ associated with the good eigenvalues, but will be smaller when $A$ is derogatory or the starting vector $v_1$ is deficient. The basic bound (1.1) is derived in section 3. Section 4 addresses the polynomial approximation problem embedded in (1.1), describing those factors that determine linear convergence rates or that lead to superlinear effects. Section 5 analyzes the constants $C_1$ and $C_2$, and section 6 provides computational examples illustrating the bounds.

Since it becomes prohibitively expensive to construct and store a good basis for $K_l(A, v_1)$ when the dimension of $A$ is large, practical algorithms typically limit the maximum dimension of the Krylov subspace to some $p \ll n$. If satisfactory estimates cannot be extracted from $K_p(A, v_1)$, then the algorithm is restarted by replacing $v_1$ with some new $v \in K_p(A, v_1)$ that is, one hopes, enriched in the component lying in the subspace $U$. Since this $v$ is chosen from the Krylov subspace, we can write $v = \psi(A) v_1$ for some polynomial $\psi$ with $\deg(\psi) < p$. Our bounds also apply to this situation, and ideas from potential theory, outlined in section 4, motivate particular choices for the polynomial $\psi$.

The results presented here complement and extend earlier convergence theory, beginning with Saad's bound on the gap between a single eigenvector and the Krylov subspace for a matrix with simple eigenvalues [32]. Jia generalized this result to invariant subspaces associated with a single eigenvalue of a defective matrix, but these bounds involve the Jordan form of $A$ and derivatives of approximating polynomials [20]. Simoncini uses pseudospectra to describe block-Arnoldi convergence for defective matrices [37]. Interpreting restarted algorithms in terms of subspace iteration, Lehoucq developed an invariant subspace convergence theory incorporating results from Watkins and Elsner [25]. Calvetti, Reichel, and Sorensen studied single eigenvector convergence for Hermitian matrices using elements of potential theory [6]. A key feature of our approach is its applicability to general invariant subspaces, which may be better conditioned than individual eigenvectors (see, e.g., [39, Chap. V]). Notably, we estimate convergence rates for defective matrices without introducing any special choice of basis and without requiring knowledge of the Jordan form or any related similarity transformation.

Finally, we note that other measures of convergence may be more appealing in certain situations. Alternatives include Ritz values [20, 24], although convergence behavior can be obscure for matrices that are defective (or nearly so). The subspace residual is computationally attractive because it does not require a priori knowledge of the good invariant subspace. This measure can be related to gap convergence [17, 38].

1.1. Algorithmic context. Suppose $V$ is an $n \times n$ unitary matrix that reduces $A$ to upper Hessenberg form; i.e., $V^* A V = H$ for some upper Hessenberg matrix $H$. For any index $1 \le l \le n$, let $H_l$ denote the $l$th principal submatrix of $H$:

$H_l = \begin{pmatrix} h_{11} & h_{12} & \cdots & h_{1l} \\ \beta_2 & h_{22} & \cdots & h_{2l} \\ & \ddots & \ddots & \vdots \\ & & \beta_l & h_{ll} \end{pmatrix} \in \mathbb{C}^{l \times l}.$

The Arnoldi method [2, 32] builds up the matrices $H$ and $V$ one column at a time starting with the unit vector $v_1 \in \mathbb{C}^n$, although the process is typically stopped well before completion, with $l \ll n$. The algorithm only accesses $A$ through matrix-vector products, making this approach attractive when $A$ is large and sparse. Different choices for $v_1$ produce distinct outcomes for $H_l$. The defining recurrence may be derived from the fundamental relation

$A V_l = V_l H_l + \beta_{l+1} v_{l+1} e_l^*,$

where $e_l$ is the $l$th column of the $l \times l$ identity matrix. The $l$th column of $H_l$ is determined so as to force $v_{l+1}$ to be orthogonal to the columns of $V_l$, and $\beta_{l+1}$ then is determined so that $\|v_{l+1}\| = 1$. Provided $H_l$ is unreduced, the columns of $V_l$ constitute an orthonormal basis for the order-$l$ Krylov subspace $K_l(A, v_1) = \mathrm{span}\{v_1, Av_1, \ldots, A^{l-1}v_1\}$. Since $V_l^* A V_l = H_l$, the matrix $H_l$ is a Ritz-Galerkin approximation of $A$ on this subspace, as described by Saad [33]. The eigenvalues of $H_l$ are called Ritz values and will, in many circumstances, be reasonable approximations to some of the eigenvalues of $A$. An eigenvector of $H_l$ associated with a given Ritz value $\theta_j$ can be used to construct an eigenvector approximation for $A$. Indeed, if $H_l y_j = \theta_j y_j$, then the Ritz vector $\hat{u}_j = V_l y_j$ yields the residual

$\|A \hat{u}_j - \theta_j \hat{u}_j\| = |\beta_{l+1}|\, |e_l^* y_j|.$

When $|\beta_{l+1}| \ll 1$, the columns of $V_l$ nearly span an invariant subspace of $A$. Small residuals more often arise from negligible trailing entries of the vector $y_j$, indicating that the most recent Krylov direction contributed negligibly to the Ritz vector $\hat{u}_j$.
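For concreteness, here is a minimal Arnoldi sketch matching the recurrence $A V_l = V_l H_l + \beta_{l+1} v_{l+1} e_l^*$ above; it is our own illustration, without the reorthogonalization or breakdown handling a production code would need:

```python
import numpy as np

def arnoldi(A, v1, l):
    """l steps of the Arnoldi recurrence A V_l = V_l H_l + beta_{l+1} v_{l+1} e_l^*.

    Returns V (n x (l+1)) with orthonormal columns spanning K_{l+1}(A, v1) and
    H ((l+1) x l) upper Hessenberg; H[:l, :l] is the Ritz-Galerkin matrix H_l.
    """
    n = len(v1)
    V = np.zeros((n, l + 1), dtype=complex)
    H = np.zeros((l + 1, l), dtype=complex)
    V[:, 0] = v1 / np.linalg.norm(v1)
    for k in range(l):
        w = A @ V[:, k]                          # one matrix-vector product
        for j in range(k + 1):                   # modified Gram-Schmidt
            H[j, k] = V[:, j].conj() @ w
            w = w - H[j, k] * V[:, j]
        H[k + 1, k] = np.linalg.norm(w)          # subdiagonal entry (a beta in the paper)
        V[:, k + 1] = w / H[k + 1, k]
    return V, H

# Ritz values and a Ritz vector residual, as in the text:
# theta, Y = np.linalg.eig(H[:l, :l]); residual_j = abs(H[l, l-1]) * abs(Y[-1, j])
```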

Biorthogonal Lanczos methods have similar characteristics despite important differences both in conception and implementation; see, e.g., [4]. In particular, different bases for $K_l(A, v_1)$ are generated, and the associated Ritz values can differ considerably from those produced by the Arnoldi algorithm, even though the projection subspace $K_l(A, v_1)$ remains the same. Our focus here avoids the complications of Ritz value convergence and remains fixed on how well a good invariant subspace $U$ is captured by $K_l(A, v_1)$, without regard to how a basis for $K_l(A, v_1)$ has been generated.

1.2. Polynomial restarts. The first $p$ steps of the Arnoldi or biorthogonal Lanczos recurrence require $p$ matrix-vector products of the form $A v_k$, plus $O(np^2)$ floating point operations for (bi)orthogonalization. For very large $n$ and very sparse $A$ (say, with a maximum number of nonzero entries per row very much smaller than $n$), the cost of orthogonalization will rapidly dominate as $p$ grows. Polynomial restarting is one general approach to alleviating this prohibitive expense. At the end of $p+1$ steps of the recurrence, one selects some best vector $v_1^+ \in K_{p+1}(A, v_1)$ and restarts the recurrence from the beginning using $v_1^+$. Different restart strategies differ essentially in how they attempt to condense the progress made in the last $p+1$ steps into the vector $v_1^+$. Since any vector in $K_{p+1}(A, v_1)$ can be represented as $\psi_p(A) v_1$ for some polynomial $\psi_p$ of degree $p$ or less, a restart of this type can be expressed as

(1.2) $v_1^+ \leftarrow \psi_p(A)\, v_1.$

If subsequent restarts occur (relabeling $v_1^+$ as $v_1^{(1)}$), then

$v_1^{(1)} \leftarrow \psi_p^{[1]}(A)\, v_1$ (first restart),
$v_1^{(2)} \leftarrow \psi_p^{[2]}(A)\, v_1^{(1)}$ (second restart),
$\quad\vdots$
$v_1^{(\nu)} \leftarrow \psi_p^{[\nu]}(A)\, v_1^{(\nu-1)}$ ($\nu$th restart).

We collect the effect of the restarts into a single aggregate polynomial of degree $\nu p$:

(1.3) $v_1^{(\nu)} \leftarrow \Psi_{\nu p}(A)\, v_1$, where $\Psi_{\nu p}(\lambda) = \prod_{k=1}^{\nu} \psi_p^{[k]}(\lambda)$

is called the filter polynomial. Evidently, the restart vectors should retain and amplify components of the good invariant subspace while damping and eventually purging components of the bad invariant subspace. One obvious way of encouraging such a trend is to choose the polynomial $\Psi_{\nu p}(\lambda)$ to be as large as possible when evaluated on the good eigenvalues while being as small as possible on the bad eigenvalues. If the bad eigenvalues are situated within a known compact set $\Omega_{bad}$ (not containing any good eigenvalues), Chebyshev polynomials associated with $\Omega_{bad}$ are often a reasonable choice. When integrated with the Arnoldi algorithm, this results in the Arnoldi-Chebyshev method [34] (cf. [18]). This Chebyshev strategy requires either a priori or adaptively generated knowledge of $\Omega_{bad}$, a drawback.
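A restart of the form (1.2) can be applied explicitly by accumulating the linear factors of $\psi_p$. The sketch below is our own illustration; the choice of shifts is left to the caller (for instance Chebyshev or Leja points for $\Omega_{bad}$, or the exact shifts discussed next), and practical implementations apply such shifts implicitly rather than as written here:

```python
import numpy as np

def polynomial_restart(A, v1, shifts):
    """Apply the restart filter psi_p(A) v1 with psi_p(z) = prod_k (z - mu_k).

    A minimal explicit sketch of (1.2); the shifts mu_k are supplied by the caller.
    """
    v = np.array(v1, dtype=complex)
    for mu in shifts:
        v = A @ v - mu * v           # one factor (A - mu I) per shift
        v /= np.linalg.norm(v)       # rescale to avoid over/underflow
    return v

# nu successive restarts accumulate into the filter Psi_{nu p}(A) v1 of (1.3):
# for _ in range(nu): v1 = polynomial_restart(A, v1, shifts_for_this_restart)
```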

Sorensen identified an alternative approach, called exact shifts, that has proved extremely successful in practice. The filter polynomial $\Psi_{\nu p}$ is automatically constructed using Ritz eigenvalue estimates. Before each new restart of the Arnoldi method, one computes the eigenvalues of $H_l$ and sorts the resulting $l = k + p$ Ritz values into two disjoint sets $S_{good}$ and $S_{bad}$. The $p$ Ritz values in the set $S_{bad}$ are used to define the restart polynomial $\psi_p(\lambda) = \prod_{j=k+1}^{k+p} (\lambda - \theta_j)$. Morgan discovered a remarkable consequence of this restart strategy: the updated Krylov subspace $K_l(A, v_1^+)$, generated by the new starting vector $v_1^+$ in (1.2) using exact shifts, satisfies

$K_l(A, v_1^+) = \mathrm{span}\{\hat{u}_1, \hat{u}_2, \ldots, \hat{u}_k, A\hat{u}_j, A^2\hat{u}_j, \ldots, A^p\hat{u}_j\}$

for each index $j = 1, 2, \ldots, k$ [27]. Thus, Sorensen's exact shifts will provide, in the stage following a restart, a subspace containing every possible Krylov subspace of dimension $p$ that could be obtained with a starting vector that was a linear combination of the good Ritz vectors (cf. [32]). Furthermore, Sorensen showed how to apply shifts implicitly, regenerating the Krylov subspace $K_l(A, v_1^+)$ with only $p$ matrix-vector products in a numerically stable way. Analogous features can be verified for the restarted biorthogonal Lanczos method using exact shifts to build polynomial filters. Such a strategy has been explored in [16, 9].

Assume now that an Arnoldi or biorthogonal Lanczos process has proceeded $l$ steps past the last of $\nu$ restarts, each of which (for the sake of simplicity) has the same order $p$. In the $j$th restart ($1 \le j \le \nu$), we use shifts $\{\mu_{jk}\}_{k=1}^p$. Define

$\Psi_{\nu p}(\lambda) = \prod_{j=1}^{\nu} \prod_{k=1}^{p} (\lambda - \mu_{jk})$

to be the aggregate restart polynomial after $\nu$ restarts. An iteration without restarts will have $p = \nu = 0$ and $\Psi_{\nu p}(\lambda) = 1$. Let $K_\tau(A, v_1^{(\nu)})$ denote the Krylov subspace of order $\tau$ generated by the starting vector $v_1^{(\nu)}$ that is obtained after $\nu$ restarts. The following basic result follows immediately from the observation that $v_1^{(\nu)} = \Psi_{\nu p}(A) v_1$.

Lemma 1.1. For all $\tau \ge 0$, $K_\tau(A, v_1^{(\nu)}) = \Psi_{\nu p}(A)\, K_\tau(A, v_1)$.

2. Reachable invariant subspaces. If the good eigenvalues are all simple, then the associated invariant subspace is uniquely determined as the span of the good eigenvectors. However, if some of these eigenvalues are multiple, there could be a variety of associated invariant subspaces. Nonetheless, single-vector Krylov eigenvalue algorithms with polynomial restarts are capable of revealing only one of the many possible invariant subspaces for any given initial vector. Before developing convergence bounds, we first characterize this distinguished invariant subspace precisely.

Let $M$ be the cyclic subspace generated by the initial starting vector $v_1$,

$M = \mathrm{span}\{v_1, Av_1, A^2 v_1, \ldots\}.$

$M$ is evidently an invariant subspace of $A$, and $s \equiv \dim(M) \le n$. Since any invariant subspace of $A$ that contains $v_1$ must also contain $A^\tau v_1$, $M$ is the smallest invariant subspace of $A$ that contains $v_1$. The $s$ vectors of the Krylov sequence $\{v_1, Av_1, \ldots, A^{s-1}v_1\}$ are linearly independent, and thus constitute a basis for $M$.

Recall that a linear transformation is nonderogatory if each eigenvalue has geometric multiplicity equal to 1; i.e., each distinct eigenvalue has precisely one eigenvector associated with it, determined up to scaling. Define $A_M$ to be the restriction of $A$ to $M$. The following result is well known; see, e.g., [1], [13, Chap. VII].

Lemma 2.1. $A_M$ is nonderogatory, and $K_\tau(A, v_1^{(\nu)}) = K_\tau(A_M, v_1^{(\nu)}) \subseteq M$.

Define $\alpha_j$ to be the ascent (or index) of the eigenvalue $\lambda_j$, i.e., the minimum positive integer $\alpha$ such that $\mathrm{Ker}(A - \lambda_j)^{\alpha} = \mathrm{Ker}(A - \lambda_j)^{\alpha+1}$. This $\alpha_j$ is the maximum dimension of the $n_j$ different Jordan blocks associated with $\lambda_j$, and $\mathrm{Ker}(A - \lambda_j)^{\alpha_j}$ then is the span of all generalized eigenvectors associated with $\lambda_j$.

The spectral projection onto each subspace $\mathrm{Ker}(A - \lambda_j)^{\alpha_j}$ can be constructed in the following coordinate-free manner; see, e.g., [23, sect. I.5.3]. For each eigenvalue $\lambda_j$, $1 \le j \le N$, let $\Gamma_j$ be some positively oriented Jordan curve in $\mathbb{C}$ containing $\lambda_j$ in its interior and all other eigenvalues in its exterior. The spectral projection is defined as

$P_j \equiv \frac{1}{2\pi i} \int_{\Gamma_j} (z - A)^{-1}\, dz.$

$P_j$ is a projection onto the span of all generalized eigenvectors associated with $\lambda_j$. In particular, $P_j v_1$ will be a generalized eigenvector associated with $\lambda_j$ and will generate a cyclic subspace $K_{\alpha_j}(A, P_j v_1) \subseteq \mathrm{Ker}(A - \lambda_j)^{\alpha_j}$.

Let $\hat{\alpha}_j$ be the minimum index $\alpha$ so that $K_{\alpha}(A, P_j v_1) = K_{\alpha+1}(A, P_j v_1)$. This $\hat{\alpha}_j$ is called the ascent with respect to $v_1$ of the eigenvalue $\lambda_j$. Notice that $1 \le \hat{\alpha}_j \le \alpha_j$ and $K_{\hat{\alpha}_j}(A, P_j v_1)$ is the smallest invariant subspace of $A$ that contains $P_j v_1$. Furthermore, $P_j v_1$ is a generalized eigenvector of grade $\hat{\alpha}_j$ associated with $\lambda_j$, and $\hat{\alpha}_j < \alpha_j$ only if $v_1$ is deficient in all generalized eigenvectors of maximal grade $\alpha_j$ associated with $\lambda_j$.

Define spectral projections $P_{good}$ and $P_{bad}$, having ranges that are the maximal invariant subspaces associated with the good and bad eigenvalues, respectively, as

$P_{good} = \sum_{j=1}^{L} P_j$ and $P_{bad} = \sum_{j=L+1}^{N} P_j.$

Note that $P_{good} + P_{bad} = I$. The following result, Lemma 2.2, characterizes $M$. The first statement, included for comparison, is well known; the second is also understood, though we are unaware of its explicit appearance in the literature. Related issues are discussed in [1], [13, Chap. VII].

Lemma 2.2. $\mathbb{C}^n = \bigoplus_{j=1}^{N} \mathrm{Ker}(A - \lambda_j)^{\alpha_j}$ with $\sum_{j=1}^{N} \alpha_j \le n$, and $M = \bigoplus_{j=1}^{N} K_{\hat{\alpha}_j}(A, P_j v_1)$ with $\sum_{j=1}^{N} \hat{\alpha}_j = \dim M$.

Proof. Since $\sum_{j=1}^{N} P_j = I$, any $x \in \mathbb{C}^n$ can be written as $x = Ix = \sum_{j=1}^{N} P_j x$, which shows that $\mathbb{C}^n \subseteq \sum_{j=1}^{N} \mathrm{Ker}(A - \lambda_j)^{\alpha_j}$. The reverse inclusion is trivial. For the second statement, use $\sum_{j=1}^{N} P_j = I$ to get, for any integer $\tau > 0$,

$v_1 = \sum_{j=1}^{N} P_j v_1, \quad A v_1 = \sum_{j=1}^{N} A P_j v_1, \quad \ldots, \quad A^{\tau} v_1 = \sum_{j=1}^{N} A^{\tau} P_j v_1.$

Thus, for each integer $\tau > 0$, $K_{\tau}(A, v_1) \subseteq \sum_{j=1}^{N} K_{\hat{\alpha}_j}(A, P_j v_1)$, and, in particular, for $\tau$ sufficiently large this yields $M \subseteq \sum_{j=1}^{N} K_{\hat{\alpha}_j}(A, P_j v_1)$. To show the reverse inclusion, note that for every $j = 1, \ldots, N$ there is a polynomial $p_j$ such that $p_j(A) = P_j$. (This polynomial interpolates at eigenvalues: $p_j(\lambda_j) = 1$, $p_j$ has $\alpha_j - 1$ zero derivatives at $\lambda_j$, and $p_j(\lambda_k) = 0$ for $\lambda_k \ne \lambda_j$; see, e.g., [19, sect. 6.1].) Thus for any $x \in \sum_{j=1}^{N} K_{\hat{\alpha}_j}(A, P_j v_1)$, one can write

$x = \sum_{j=1}^{N} g_j(A) P_j v_1 = \sum_{j=1}^{N} g_j(A) p_j(A) v_1 \in M$

for polynomials $g_j$ with degree not exceeding $\hat{\alpha}_j - 1$. Thus $\sum_{j=1}^{N} K_{\hat{\alpha}_j}(A, P_j v_1) \subseteq M$, and so $M = \sum_{j=1}^{N} K_{\hat{\alpha}_j}(A, P_j v_1)$.
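The spectral projector $P_j$ defined above can be realized numerically by discretizing the contour integral. In the sketch below, which is our own illustration, the contour $\Gamma_j$ is assumed to be a circle enclosing only $\lambda_j$, and the trapezoidal rule is our choice of quadrature:

```python
import numpy as np

def spectral_projector(A, center, radius, nodes=200):
    """P_j = (1 / 2 pi i) * integral over Gamma_j of (z - A)^{-1} dz.

    Gamma_j is the circle |z - center| = radius, assumed to enclose lambda_j and no
    other eigenvalue; the periodic trapezoidal rule converges geometrically here.
    """
    n = A.shape[0]
    I = np.eye(n)
    P = np.zeros((n, n), dtype=complex)
    for t in 2 * np.pi * (np.arange(nodes) + 0.5) / nodes:
        z = center + radius * np.exp(1j * t)
        dz = 1j * radius * np.exp(1j * t)             # dz/dt along the circle
        P += np.linalg.solve(z * I - A, I) * dz
    return P / (1j * nodes)                            # (2 pi / nodes) / (2 pi i) factor
```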

Let $X_{good}$ and $X_{bad}$ be the invariant subspaces of $A$ associated with the good and bad eigenvalues, respectively. Then define

$U_{good} \equiv M \cap X_{good}$ and $U_{bad} \equiv M \cap X_{bad}.$

The following lemma develops a representation for $U_{good}$ and $U_{bad}$; it shows that $U_{good}$ is the maximum reachable invariant subspace associated with the good eigenvalues that can be obtained from a Krylov subspace algorithm started with $v_1$. "Maximum reachable invariant subspace" means that any invariant subspace $U$ associated with the good eigenvalues and strictly larger than $U_{good}$ is unreachable: the angle between $U$ and any computable subspace generated from $v_1$ is bounded away from zero independent of $l$, $p$, $\nu$, and the choice of filter shifts $\{\mu_{jk}\}$.

Lemma 2.3. $U_{good} = \bigoplus_{j=1}^{L} K_{\hat{\alpha}_j}(A, P_j v_1)$, with $\dim U_{good} = \sum_{j=1}^{L} \hat{\alpha}_j \equiv m$, and $U_{bad} = \bigoplus_{j=L+1}^{N} K_{\hat{\alpha}_j}(A, P_j v_1)$, with $\dim U_{bad} = \sum_{j=L+1}^{N} \hat{\alpha}_j = s - m$. Furthermore, for any subspace $U$ of $X_{good}$ that properly contains $U_{good}$, i.e., $U_{good} \subsetneq U \subseteq X_{good}$, convergence in gap cannot occur: for all integers $l \ge 1$,

$\delta(U, K_l(A, v_1^{(\nu)})) \ \ge\ \frac{1}{\|P_{good}\|} \ >\ 0.$

Proof. Since $K_{\hat{\alpha}_j}(A, P_j v_1) \subseteq \mathrm{Ker}(A - \lambda_j)^{\alpha_j}$, Lemma 2.2 leads to $M \cap X_{good} = \bigoplus_{j=1}^{L} K_{\hat{\alpha}_j}(A, P_j v_1)$. Furthermore, $\dim K_{\hat{\alpha}_j}(A, P_j v_1) = \hat{\alpha}_j$ implies that $\dim U_{good} = m$ as defined above. The analogous results for $U_{bad}$ follow similarly.

Note that $X_{bad} = \bigoplus_{j=L+1}^{N} \mathrm{Ker}(A - \lambda_j)^{\alpha_j}$, so, for all $l \ge 0$, $K_l(A, v_1^{(\nu)}) \subseteq M \subseteq U_{good} \oplus X_{bad}$. Thus any $v \in K_l(A, v_1^{(\nu)})$ can be decomposed as $v = w_1 + w_2$ for some $w_1 \in U_{good}$ and $w_2 \in X_{bad}$. When $U_{good}$ is a proper subspace of $U$, there exists an $x \in U$ so that $x \perp U_{good}$ and $\|x\| = 1$. Note that $\|x - w_1\| \ge \|x\| = 1$, and that $x - w_1 \in X_{good}$. Now,

$\min_{v \in K_l(A, v_1^{(\nu)})} \|v - x\| \ \ge\ \min_{w_1 \in U_{good},\, w_2 \in X_{bad}} \|w_1 + w_2 - x\| \ =\ \min_{w_1 \in U_{good},\, w_2 \in X_{bad}} \|w_2 - (x - w_1)\|$

$\ \ge\ \min_{y \in X_{good},\, \|y\| \ge 1,\ w_2 \in X_{bad}} \|w_2 - y\| \ \ge\ \min_{y \in X_{good},\, \|y\| \ge 1,\ w_2 \in X_{bad}} \frac{\|P_{good}(w_2 - y)\|}{\|P_{good}\|} \ =\ \min_{\|y\| \ge 1} \frac{\|y\|}{\|P_{good}\|} \ \ge\ \frac{1}{\|P_{good}\|}.$

Thus,

$\delta(U, K_l(A, v_1^{(\nu)})) = \max_{x \in U} \min_{v \in K_l(A, v_1^{(\nu)})} \frac{\|v - x\|}{\|x\|} \ \ge\ \frac{1}{\|P_{good}\|}.$

This means that we have no hope of capturing any invariant subspace that contains a (generalized) eigenspace associated with multiple Jordan blocks, at least when using

a single-vector iteration in exact arithmetic. On the other hand, convergence can occur to the good invariant subspace $U_{good}$, with a rate that depends on properties of $A$, $v_1$, and the choice of filter shifts $\{\mu_{jk}\}$, as we shall see.

Almost every vector in an invariant subspace is a generalized eigenvector of maximal grade, and so almost every starting vector will capture maximally defective Jordan blocks. While easily acknowledged, this fact can have perplexing consequences for the casual Arnoldi or biorthogonal Lanczos user, since eigenvectors of other Jordan blocks may be unexpectedly washed out. Suppose $A$ is defined as

$A = \begin{pmatrix} 1 & & & & \\ 1 & 1 & & & \\ & & 1 & & \\ & & 1 & 1 & \\ & & & 1 & 1 \end{pmatrix}.$

$A$ is in Jordan canonical form with the single eigenvalue $\lambda = 1$. Let $e_j$ denote the $j$th column of the $5 \times 5$ identity matrix. Then $e_2$ and $e_5$ are eigenvectors of $A$, $e_1$ and $e_4$ are generalized eigenvectors of grade 1 associated with the $2 \times 2$ and $3 \times 3$ Jordan blocks, and $e_3$ is a generalized eigenvector of grade 2 associated with the $3 \times 3$ block. For arbitrary $\beta \in \mathbb{C}$, the vector $v_1 = [\,1\ \ \beta\ \ 1\ \ 1\ \ 1\,]^T$ generates a cyclic subspace spanned by the first three vectors in the Krylov sequence: $v_1$, $Av_1$, and $A^2 v_1$. By choosing $\beta$ to be large, we can give the starting vector $v_1$ an arbitrarily large component in the direction of $e_2$, the eigenvector associated with the $2 \times 2$ Jordan block. Defining

$M = [\,v_1,\ A v_1,\ A^2 v_1\,]$ and $\widehat{H} = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & -3 \\ 0 & 1 & 3 \end{pmatrix},$

a simple calculation reveals $AM = M\widehat{H}$. The Jordan form of $\widehat{H}$ is easy to calculate as follows:

(2.1) $R^{-1} \widehat{H} R = \begin{pmatrix} 1 & & \\ 1 & 1 & \\ & 1 & 1 \end{pmatrix}$, where one may take $R = \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{pmatrix}.$

The cyclic subspace generated by the single vector $v_1$ has captured a three-dimensional invariant subspace, associated with the maximally defective $3 \times 3$ Jordan block. But this subspace is not the expected $\mathrm{span}\{e_3, e_4, e_5\}$. Using the change of basis defined by $R$ in (2.1), one may calculate $A(MR) = (MR)(R^{-1}\widehat{H}R)$, which is

$A \begin{pmatrix} 1 & 0 & 0 \\ \beta & 1 & 0 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ \beta & 1 & 0 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & & \\ 1 & 1 & \\ & 1 & 1 \end{pmatrix}.$

Note that $e_5$ alone is revealed as the eigenvector associated with the eigenvalue 1; $e_2$ has been washed out in spite of $v_1$ having an arbitrarily large component in that direction. Indeed, the eigenvector $e_2$ (and so any subspace containing it) is unreachable from any starting vector $v_1$ for which $e_3^* v_1 \ne 0$. In this example, $v_1$ itself emerges as a generalized eigenvector of grade 2. Note that every vector $v$ in $\mathbb{C}^5$ with $e_3^* v \ne 0$ is a generalized eigenvector of grade 2 associated with the eigenvalue 1.
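A short numerical confirmation of this example (our own check; $\beta = 100$ is an arbitrary illustrative value, and containment_gap refers to the sketch in section 1):

```python
import numpy as np

# Numerical check of the 5 x 5 example above.
A = np.array([[1., 0, 0, 0, 0],
              [1., 1, 0, 0, 0],
              [0., 0, 1, 0, 0],
              [0., 0, 1, 1, 0],
              [0., 0, 0, 1, 1]])
beta = 100.0
v1 = np.array([1.0, beta, 1, 1, 1])

K = np.column_stack([np.linalg.matrix_power(A, j) @ v1 for j in range(5)])
print(np.linalg.matrix_rank(K))          # 3: the cyclic subspace M is 3-dimensional

# e5 lies in M (gap ~ 0) while e2 does not (gap > 0), despite the dominant
# e2 component of v1.
e2, e5 = np.eye(5)[:, [1]], np.eye(5)[:, [4]]
M = K[:, :3]
print(containment_gap(e5, M), containment_gap(e2, M))
```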

We close this section with a computational example that both confirms the gap stagnation lower bound for derogatory matrices given in Lemma 2.3 and illustrates other convergence properties explored in future sections. Consider two matrices $A_1$ and $A_2$, each of dimension $n = 150$ with eigenvalues spaced uniformly in the interval $[0, 1]$. In both cases, all the eigenvalues are simple except for the single good eigenvalue $\lambda = 1$, which has algebraic multiplicity 5. In the first case, the geometric multiplicity also equals 5, so the matrix is diagonalizable but derogatory. In the second case, there is only one eigenvector associated with $\lambda = 1$, so it is defective but not derogatory. Both matrices are constructed so that $\|P_{good}\|$ takes the same (large) value.

[Fig. 2.1. Plot of $\delta(X_{good}, K_l(A, v_1))$ against the Krylov subspace dimension $l$ for the diagonalizable-but-derogatory and defective-but-not-derogatory cases, with the stagnation level $1/\|P_{good}\|$ marked. The Krylov subspace can never capture $X_{good}$ when this subspace is associated with a derogatory eigenvalue; convergence is possible, however, when the associated eigenvalues are defective but not derogatory, as described by Lemma 2.3.]

Figure 2.1 illustrates the gap convergence of the Krylov subspace to the invariant subspace $X_{good}$ associated with $\lambda = 1$. The starting vector $v_1$ has $1/\sqrt{n}$ in each component; no restarting is used here. Convergence cannot begin until the fifth iteration, when the Krylov subspace dimension matches the dimension of $X_{good}$. This initial period of stagnation is followed by a sublinear phase of convergence leading to a second stagnation period. This is the end of the story for the derogatory case, but for the nonderogatory case the second stagnation period is transient and the convergence rate eventually settles toward a nearly linear rate. In fact, this rate improves slightly over the final iterations shown here, yielding so-called superlinear convergence, the subject of section 4.3. These convergence phases resemble those observed for the GMRES iteration, as described by Nevanlinna [28].
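The construction of $A_1$ and $A_2$ is not fully specified above, so the following sketch only aims to reproduce the qualitative behavior of Figure 2.1; the block-triangular form and the coupling block $B$ that inflates $\|P_{good}\|$ are our own assumptions, and arnoldi and containment_gap refer to the earlier sketches:

```python
import numpy as np

n, mult = 150, 5
bad = np.linspace(0, 1, n - mult + 1)[:-1]     # simple bad eigenvalues in [0, 1)
B = 10.0 * np.ones((mult, n - mult))           # ad hoc coupling -> large ||P_good||

def build(jordan):
    G = np.eye(mult)                            # good block: lambda = 1, multiplicity 5
    if jordan:                                  # defective, nonderogatory case
        G += np.diag(np.ones(mult - 1), -1)
    A = np.zeros((n, n))
    A[:mult, :mult] = G
    A[:mult, mult:] = B
    A[mult:, mult:] = np.diag(bad)
    return A

v1 = np.ones(n) / np.sqrt(n)
Xgood = np.eye(n)[:, :mult]                     # invariant subspace for lambda = 1
for jordan in (False, True):
    A = build(jordan)
    V, _ = arnoldi(A, v1, 30)
    gaps = [containment_gap(Xgood, V[:, :l]) for l in range(1, 31)]
    print("defective" if jordan else "derogatory", np.round(gaps, 3))
```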

3. Basic estimates. Since all reachable subspaces are contained in $M$ and $A_M$ is nonderogatory, henceforth we assume without loss of generality that $A$ itself is nonderogatory, so that $n = \dim M$, and $v_1$ is not deficient in any generalized eigenvector of maximal grade. To summarize the current situation: $A$ is an $n \times n$ matrix with $N \le n$ distinct eigenvalues $\{\lambda_j\}_{j=1}^N$, each having geometric multiplicity 1 and algebraic multiplicity $m_j$, so that $\sum_{j=1}^N m_j = n$. We seek $L$ ($1 \le L < N$) of these eigenvalues $\{\lambda_1, \lambda_2, \ldots, \lambda_L\}$ (the good eigenvalues) together with the corresponding (maximal) invariant subspace $U_{good}$ of dimension $m = \sum_{j=1}^L m_j$, which is now the net algebraic multiplicity of the good eigenvalues since $A$ is nonderogatory.

We begin by establishing two lemmas that are used to develop a bound for the gap in terms of a polynomial approximation problem in the subsequent theorems.

Lemma 3.1. Given subspaces $U, V \subseteq \mathbb{C}^n$, suppose $\hat{u} \in U$ (with $\|\hat{u}\| = 1$) and $\hat{v} \in V$ satisfy

$\delta(U, V) = \max_{u \in U} \min_{v \in V} \frac{\|u - v\|}{\|u\|} = \|\hat{u} - \hat{v}\|.$

Then $\hat{u} - \hat{v} \perp V$ and $\hat{u} - \hat{v} - \delta(U, V)^2\, \hat{u} \perp U$.

Proof. The first assertion is a fundamental property of least squares approximation. To show the second, consider an arbitrary unit vector $u \in U$ and take $\epsilon > 0$. Letting $\Pi_V$ denote the orthogonal projection onto $V$, the optimality of $\hat{u}$ and $\hat{v}$ implies

$\|\hat{u} - \hat{v}\|^2 \ \ge\ \frac{\|(I - \Pi_V)(\hat{u} + \epsilon u)\|^2}{\|\hat{u} + \epsilon u\|^2}.$

Expanding this inequality, noting $\hat{v} = \Pi_V \hat{u}$, and using the first assertion gives

$\delta(U, V)^2 \left(1 + 2\epsilon\, \mathrm{Re}(\hat{u}^* u) + \epsilon^2\right) \ \ge\ \delta(U, V)^2 + 2\epsilon\, \mathrm{Re}\big((\hat{u} - \hat{v})^* u\big) + \epsilon^2 \|(I - \Pi_V) u\|^2.$

Collecting terms quadratic in $\epsilon$ on the left-hand side,

$\epsilon^2 \left(\delta(U, V)^2 - \|(I - \Pi_V) u\|^2\right) \ \ge\ 2\epsilon\, \mathrm{Re}\big((\hat{u} - \hat{v} - \delta(U, V)^2 \hat{u})^* u\big).$

Note that the left-hand side must be nonnegative. Repeating the above argument with $u$ multiplied by a complex scalar of unit modulus, we can replace the right-hand side with $2\epsilon\, |(\hat{u} - \hat{v} - \delta(U, V)^2 \hat{u})^* u|$. Thus for any unit vector $u \in U$,

$\epsilon \left(\delta(U, V)^2 - \|(I - \Pi_V) u\|^2\right) \ \ge\ 2\, |(\hat{u} - \hat{v} - \delta(U, V)^2 \hat{u})^* u| \ \ge\ 0.$

Taking $\epsilon \to 0$, we conclude that $\hat{u} - \hat{v} - \delta(U, V)^2 \hat{u}$ is orthogonal to every $u \in U$.

As the gap between subspaces closes ($\delta(U, V) \to 0$), $\hat{u} - \hat{v}$ becomes almost orthogonal to $U$ in the sense that the projection of $\hat{u} - \hat{v}$ onto $U$ has norm $\delta(U, V)^2$.
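A quick numerical check of Lemma 3.1 (our own illustration with random subspaces; it uses the fact, not spelled out in the text, that the optimal pair $(\hat{u}, \hat{v})$ can be read off from the leading singular triplet of $(I - \Pi_V)Q_U$, where $Q_U$ is an orthonormal basis for $U$):

```python
import numpy as np

rng = np.random.default_rng(0)
n, dimU, dimV = 12, 3, 5
QU, _ = np.linalg.qr(rng.standard_normal((n, dimU)))
QV, _ = np.linalg.qr(rng.standard_normal((n, dimV)))

B = QU - QV @ (QV.T @ QU)             # (I - Pi_V) restricted to a basis of U
_, s, Yt = np.linalg.svd(B)
delta = s[0]                           # containment gap delta(U, V)
u_hat = QU @ Yt[0]                     # maximizing unit vector in U
v_hat = QV @ (QV.T @ u_hat)            # its best approximation from V

# u_hat - v_hat is orthogonal to V, and its projection onto U is delta^2 * u_hat:
print(np.linalg.norm(QV.T @ (u_hat - v_hat)))                             # ~ 0
print(np.linalg.norm(QU @ (QU.T @ (u_hat - v_hat)) - delta**2 * u_hat))   # ~ 0
```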

Lemma 3.2. Let $P_{m-1}$ denote the space of polynomials of degree $m-1$ or less. The mapping $\imath: P_{m-1} \to U_{good}$ defined by

(3.1) $\imath(\psi) = \psi(A) P_{good} v_1$

is an isomorphism between $P_{m-1}$ and $U_{good}$. Furthermore, there exist positive constants $c_1$ and $c_2$ so that

(3.2) $c_1 \|\psi\|_{P_{m-1}} \ \le\ \|\psi(A) P_{good} v_1\| \ \le\ c_2 \|\psi\|_{P_{m-1}},$

uniformly for all $\psi \in P_{m-1}$, for any fixed norm $\|\cdot\|_{P_{m-1}}$ defined on the space $P_{m-1}$.

Proof. $\imath$ is clearly linear. To see that $\imath$ maps $P_{m-1}$ onto $U_{good}$, observe that for any given $y \in U_{good}$, there exist polynomials $\{g_j(\lambda)\}_{j=1}^L$ with $\deg(g_j) \le m_j - 1$ such that

$y = \sum_{j=1}^{L} g_j(A) P_j v_1.$

The $L$ polynomials $\{g_j\}_{j=1}^L$ provide $L$ separate slices of a single polynomial that can be recovered by (generalized) Hermite interpolation. Let $\psi$ be a polynomial that interpolates $g_j$ and its derivatives at $\lambda_j$:

$\psi^{(k)}(\lambda_j) = g_j^{(k)}(\lambda_j)$

for $k = 0, 1, \ldots, m_j - 1$ and $j = 1, 2, \ldots, L$. Theorem VIII.3.16 of [11] leads us first to observe that $\psi(A) P_j = g_j(A) P_j$ for each $j = 1, \ldots, L$. Then, since $\deg(\psi) \le \sum_{j=1}^{L} m_j - 1 = m - 1$, we have from (3.1) that

$y = \sum_{j=1}^{L} \psi(A) P_j v_1 = \psi(A) P_{good} v_1 = \imath(\psi).$

Since $\dim(P_{m-1}) = \dim(U_{good})$, $\mathrm{nullity}(\imath) = 0$ and $\imath$ is bijective from $P_{m-1}$ to $U_{good}$. The last statement is an immediate consequence of the fact that linear bijections between finite-dimensional spaces are bounded linear transformations with bounded inverses.

Theorem 3.3. Suppose that $A$ and $v_1$ satisfy the assumptions of this section, and that none of the filter shifts $\{\mu_{jk}\}$ coincides with any of the good eigenvalues $\{\lambda_j\}_{j=1}^L$. For all indices $l \ge m$, the gap between the good invariant subspace $U_{good}$ and the Krylov subspace of order $l$, $K_l(A, v_1^{(\nu)})$, generated from the $\nu$-fold restarted vector $v_1^{(\nu)}$, satisfies

$\delta(U_{good}, K_l(A, v_1^{(\nu)})) \ \le\ C_0 \max_{\psi \in P_{m-1}} \min_{\phi \in P_{l-m}} \frac{\|\phi(A)\psi(A)\Psi_{\nu p}(A) P_{bad} v_1\|}{\|\phi(A)\psi(A)\Psi_{\nu p}(A) P_{good} v_1\|},$

where $C_0 \equiv 1$ if $U_{good} \perp U_{bad}$ and $C_0 \equiv \sqrt{2}$ otherwise.

Proof. First, suppose $U_{good} \perp U_{bad}$. This implies that $P_{good}$ and $P_{bad}$ are orthogonal projections, that $U_{good}$ is an invariant subspace for both $\Psi_{\nu p}(A)$ and $[\Psi_{\nu p}(A)]^*$, and, as we will see, that $\delta(U_{good}, K_l(A, v_1^{(\nu)})) < 1$. Indeed, suppose instead that $\delta(U_{good}, K_l(A, v_1^{(\nu)})) = 1$. Then there is a vector $\hat{u} \in U_{good}$ with $\|\hat{u}\| = 1$ such that $\hat{u} \perp K_l(A, v_1^{(\nu)})$. Define $\hat{y} \equiv [\Psi_{\nu p}(A)]^* \hat{u} \in U_{good}$, and note that by Lemma 3.2 there exists a polynomial $\hat{\psi} \in P_{m-1}$ such that $\hat{y} = \hat{\psi}(A) P_{good} v_1$. Now, for each $j = 1, 2, \ldots, l$, we have

$0 = \langle \hat{u}, A^{j-1} v_1^{(\nu)} \rangle = \langle \hat{u}, A^{j-1} \Psi_{\nu p}(A) v_1 \rangle = \langle \hat{y}, A^{j-1} P_{good} v_1 \rangle = \langle \hat{\psi}(A) P_{good} v_1,\ A^{j-1} P_{good} v_1 \rangle.$

Since $l \ge m$, this implies first that $\hat{\psi}(A) P_{good} v_1 = 0$ and then $\hat{u} = 0$. (Recall that $[\Psi_{\nu p}(A)]^*$ is bijective on $U_{good}$ since $\Psi_{\nu p}$ has no roots in common with the good eigenvalues.) But $\hat{u}$ was given to be a unit vector, so it must be that $\delta(U_{good}, K_l(A, v_1^{(\nu)})) < 1$.

There are optimal vectors $\hat{v} \in K_l(A, v_1^{(\nu)})$ and $\hat{x} \in U_{good}$ with $\|\hat{x}\| = 1$ so that

(3.3) $\delta(U_{good}, K_l(A, v_1^{(\nu)})) = \max_{x \in U_{good}} \min_{v \in K_l(A, v_1^{(\nu)})} \frac{\|v - x\|}{\|x\|} = \|\hat{v} - \hat{x}\|.$

Since $\delta(U_{good}, K_l(A, v_1^{(\nu)})) < 1$, it must be that $\hat{v} \ne 0$. Furthermore, optimality of $\hat{v}$ means $\hat{v} - \hat{x} \perp K_l(A, v_1^{(\nu)})$ (viz., Lemma 3.1) and, in particular, $\hat{v}^*(\hat{v} - \hat{x}) = 0$. So $\hat{v} \ne 0$ implies $\hat{v} \notin U_{bad}$. There is a polynomial $\pi_{l-1} \in P_{l-1}$ such that $\hat{v} = \pi_{l-1}(A) v_1^{(\nu)} = \pi_{l-1}(A) \Psi_{\nu p}(A) v_1$. Define $Q \equiv U_{good} \cap \mathrm{Ker}(\pi_{l-1}(A))$ and let $\tilde{q}$ be the minimum (monic) annihilating polynomial for $Q$.^1 Evidently, $\pi_{l-1}$ must contain $\tilde{q}$ as a factor.

^1 That is, $\tilde{q}$ is the minimum degree monic polynomial such that $\tilde{q}(A) r = 0$ for all $r \in Q$.

Since $\hat{v} \notin U_{bad}$, $\pi_{l-1}$ cannot be an annihilating polynomial for $U_{good}$, so $Q \ne U_{good}$ and $\deg(\tilde{q}) \le m - 1$. One may factor $\pi_{l-1}$ as the product of a polynomial $\phi$ of degree $l - m$ and a polynomial $q$ of degree $m - 1$ containing $\tilde{q}$ as a factor,

$\pi_{l-1}(\lambda) = \phi(\lambda)\, q(\lambda).$

Observing that $U_{good}$ is invariant for both $\phi(A)$ and $\phi(A)^*$, we may decompose $\hat{x}$ as $\hat{x} = \phi(A)\hat{y} + \hat{n}$ for some $\hat{y} \in U_{good}$ and some $\hat{n} \in \mathrm{Ker}(\phi(A)^*) \cap U_{good}$. Notice that

$\hat{v}^* \phi(A)\hat{y} = \hat{v}^* \hat{x} = \hat{v}^* \hat{v} > 0,$

so $\phi(A)\hat{y} \ne 0$. However, we will see that it must happen that $\hat{n} = 0$. Indeed, Lemma 3.1 shows that if $z \in U_{good}$ is orthogonal to $\hat{x}$, $\hat{x}^* z = 0$, then $\hat{v}^* z = 0$ as well. In particular, for

$z = \|\hat{n}\|^2\, \phi(A)\hat{y} - \|\phi(A)\hat{y}\|^2\, \hat{n}$

we have $\hat{x}^* z = 0$. Since $\mathrm{Ker}(\phi(A)^*) = \mathrm{Ran}(\phi(A))^{\perp}$ implies $\hat{v}^* \hat{n} = 0$, we have

$0 = \hat{v}^* z = \|\hat{n}\|^2\, \hat{v}^* \phi(A)\hat{y}.$

We have already seen that $\hat{v}^* \phi(A)\hat{y} > 0$, and so $\hat{n} = 0$. Thus we can safely exclude from the maximization in (3.3) all $x \in U_{good}$ except for those vectors having the special form $x = \phi(A) y$ for $y \in U_{good}$ and $\phi$ as defined above.

We can now begin our process of bounding the gap. Note that

(3.4) $\delta(U_{good}, K_l(A, v_1^{(\nu)})) = \max_{x \in U_{good}} \min_{v \in K_l(A, v_1^{(\nu)})} \frac{\|v - x\|}{\|x\|} = \max_{x \in U_{good}} \min_{\phi \in P_{l-m},\ q \in P_{m-1}} \frac{\|\Psi_{\nu p}(A)\phi(A)q(A)v_1 - x\|}{\|x\|} = \max_{y \in U_{good}} \min_{\phi \in P_{l-m},\ q \in P_{m-1}} \frac{\|\Psi_{\nu p}(A)\phi(A)[q(A)v_1 - y]\|}{\|\Psi_{\nu p}(A)\phi(A)y\|},$

where we are able to justify the substitution $x = \Psi_{\nu p}(A)\phi(A)y$ since $\Psi_{\nu p}(A)$ is an invertible map of $U_{good}$ to itself. Now, by Lemma 3.2, $y \in U_{good}$ can be represented as $y = \psi(A) P_{good} v_1$ for some $\psi \in P_{m-1}$. Since $I = P_{bad} + P_{good}$, one finds $\psi(A) v_1 - y = \psi(A) P_{bad} v_1$. Continuing with (3.4), assign $q \equiv \psi \in P_{m-1}$. Then

$\delta(U_{good}, K_l(A, v_1^{(\nu)})) \ \le\ \max_{\substack{y \in U_{good} \\ (y = \psi(A)P_{good}v_1)}} \min_{\phi \in P_{l-m}} \frac{\|\Psi_{\nu p}(A)\phi(A)[\psi(A)v_1 - y]\|}{\|\Psi_{\nu p}(A)\phi(A)y\|} = \max_{\psi \in P_{m-1}} \min_{\phi \in P_{l-m}} \frac{\|\Psi_{\nu p}(A)\phi(A)\psi(A)P_{bad}v_1\|}{\|\Psi_{\nu p}(A)\phi(A)\psi(A)P_{good}v_1\|},$

as required, concluding the proof when $U_{good} \perp U_{bad}$.

In case $U_{good}$ and $U_{bad}$ are not orthogonal subspaces, we introduce a new inner product on $\mathbb{C}^n$ with respect to which they are orthogonal. For any $u, v \in \mathbb{C}^n$, define

$\langle u, v \rangle_* \ \equiv\ \langle P_{good} u, P_{good} v \rangle + \langle P_{bad} u, P_{bad} v \rangle,$

and define the gap with respect to the new norm $\|\cdot\|_* = \sqrt{\langle \cdot, \cdot \rangle_*}$ to be

$\delta_*(W, V) = \sup_{x \in W} \inf_{y \in V} \frac{\|y - x\|_*}{\|x\|_*}.$

Notice that for any vector $w \in \mathbb{C}^n$,

$\|w\|^2 = \|P_{good} w + P_{bad} w\|^2 \ \le\ 2\left(\|P_{good} w\|^2 + \|P_{bad} w\|^2\right) = 2\|w\|_*^2,$

$\|P_{good} w\|_* = \|P_{good} w\|$, and $\|P_{bad} w\|_* = \|P_{bad} w\|$. In particular, for any $x \in U_{good}$ and $y \in \mathbb{C}^n$ these relationships directly imply

$\frac{\|y - x\|}{\|x\|} \ \le\ \sqrt{2}\, \frac{\|y - x\|_*}{\|x\|_*},$

and so $\delta(U_{good}, K_l(A, v_1^{(\nu)})) \le \sqrt{2}\, \delta_*(U_{good}, K_l(A, v_1^{(\nu)}))$. Since $U_{good}$ and $U_{bad}$ are orthogonal in this new inner product, we can apply the previous argument to conclude^2

$\delta(U_{good}, K_l(A, v_1^{(\nu)})) \ \le\ \sqrt{2} \max_{\psi \in P_{m-1}} \min_{\phi \in P_{l-m}} \frac{\|\phi(A)\psi(A)\Psi_{\nu p}(A) P_{bad} v_1\|_*}{\|\phi(A)\psi(A)\Psi_{\nu p}(A) P_{good} v_1\|_*} = \sqrt{2} \max_{\psi \in P_{m-1}} \min_{\phi \in P_{l-m}} \frac{\|\phi(A)\psi(A)\Psi_{\nu p}(A) P_{bad} v_1\|}{\|\phi(A)\psi(A)\Psi_{\nu p}(A) P_{good} v_1\|}.$

^2 A more precise value of $C_0$, lying between 1 and $\sqrt{2}$ and depending on $\|I - 2P_{good}\|$, can be found; however, the marginal improvement in the final bound would not appear to merit the substantial complexity added.

If $N$ is a square matrix with an invariant subspace $V$, define

$\|N\|_V \ \equiv\ \max_{v \in V} \frac{\|N v\|}{\|v\|} \ =\ \|N \Pi_V\|,$

where $\Pi_V$ here denotes the orthogonal projection onto $V$.

Theorem 3.4. Suppose $A$, $v_1$, and the shifts $\{\mu_{jk}\}$ satisfy the conditions of Theorem 3.3. Then for $l \ge m$,

$\delta(U_{good}, K_l(A, v_1^{(\nu)})) \ \le\ C_0\, C_1 \min_{\phi \in P_{l-m}} \left\| [\phi(A)\Psi_{\nu p}(A)]^{-1} \right\|_{U_{good}} \left\| \phi(A)\Psi_{\nu p}(A) \right\|_{U_{bad}},$

where $C_0$ is as defined in Theorem 3.3 and

(3.5) $C_1 \ \equiv\ \max_{\psi \in P_{m-1}} \frac{\|\psi(A) P_{bad} v_1\|}{\|\psi(A) P_{good} v_1\|}$

is a constant independent of $l$, $\nu$, $p$, or the filter shifts $\{\mu_{jk}\}$.

Proof. Let $\Pi_{good}$ and $\Pi_{bad}$ denote the orthogonal projections onto $U_{good}$ and $U_{bad}$, respectively. Then

$\|\Psi_{\nu p}(A)\phi(A) P_{bad}\psi(A) v_1\| = \|\Psi_{\nu p}(A)\phi(A)\Pi_{bad} P_{bad}\psi(A) v_1\| \ \le\ \|\Psi_{\nu p}(A)\phi(A)\Pi_{bad}\|\, \|P_{bad}\psi(A) v_1\|,$

and, assuming for the moment that $\phi(A)$ is invertible,

$\|P_{good}\psi(A) v_1\| = \left\| [\Psi_{\nu p}(A)\phi(A)]^{-1}\Pi_{good}\, P_{good}\Psi_{\nu p}(A)\phi(A)\psi(A) v_1 \right\| \ \le\ \left\| [\Psi_{\nu p}(A)\phi(A)]^{-1}\Pi_{good} \right\| \|P_{good}\Psi_{\nu p}(A)\phi(A)\psi(A) v_1\|.$

Hence,

$\frac{\|\Psi_{\nu p}(A)\phi(A) P_{bad}\psi(A) v_1\|}{\|\Psi_{\nu p}(A)\phi(A) P_{good}\psi(A) v_1\|} \ \le\ \left\| [\Psi_{\nu p}(A)\phi(A)]^{-1} \right\|_{U_{good}} \left\| \Psi_{\nu p}(A)\phi(A) \right\|_{U_{bad}} \frac{\|\psi(A) P_{bad} v_1\|}{\|\psi(A) P_{good} v_1\|}.$

Minimizing with respect to $\phi$ and maximizing with respect to $\psi$ yields the conclusion, provided the expression for $C_1$ is finite. This is assured since, as an immediate consequence of (3.2), $\|\psi(A) P_{good} v_1\| = 0$ can occur only when $\psi = 0$.

It is instructive to consider the situation where we seek only a single good eigenvalue, $\lambda_1$, which is simple. In this case $m = \dim U_{good} = 1$; the conclusion of Theorem 3.3 may be stated as

$\delta(U_{good}, K_l(A, v_1^{(\nu)})) \ \le\ C_0\, C_1 \min_{\phi \in P_{l-1}} \frac{\|\phi(A)\Psi_{\nu p}(A) w\|}{|\phi(\lambda_1)\Psi_{\nu p}(\lambda_1)|},$

where $w = P_{bad} v_1 / \|P_{bad} v_1\|$ and $C_1 = \|P_{bad} v_1\| / \|P_{good} v_1\|$. Elementary geometric considerations yield an alternate expression for $C_1$ in terms of $\|P_{good}\|$ and the quantities $\Theta(U_{good}, v_1)$ and $\Theta(U_{bad}, v_1)$, the smallest angles between $v_1$ and the subspaces $U_{good}$ and $U_{bad}$, respectively. This special case is stated as Proposition 2.1 of [18];^3 see also Saad's single eigenvalue convergence theory [32].

Our next step is to reduce the conclusion of Theorem 3.4 to an approximation problem in the complex plane. Let $U$ be an invariant subspace of $A$ associated with a compact subset $\Omega \subset \mathbb{C}$ (that is, $\Omega$ contains only those eigenvalues of $A$ associated with $U$ and no others). Define $\kappa(\Omega)$ as the smallest constant for which the inequality

(3.6) $\|f(A)\|_U \ \le\ \kappa(\Omega) \max_{z \in \Omega} |f(z)|$

holds uniformly over all $f \in H(\Omega)$, where $H(\Omega)$ denotes the functions analytic on $\Omega$.^4 Evidently, the value of the constant $\kappa(\Omega)$ depends on the particular choice of $\Omega$ (a set containing, in any case, those eigenvalues of $A$ associated with $U$). The following properties of $\kappa(\Omega)$ are shared by the generalized Kreiss constant $K(\Omega)$ of Toh and Trefethen [41] (defined for $U = \mathbb{C}^n$).

- $\kappa(\Omega)$ is monotone decreasing with respect to set inclusion on $\Omega$. Indeed, if $\Omega_1 \subseteq \Omega_2$, then for each function $f$ analytic on $\Omega_2$,

$\frac{\|f(A)\|_U}{\max\{|f(z)| : z \in \Omega_1\}} \ \ge\ \frac{\|f(A)\|_U}{\max\{|f(z)| : z \in \Omega_2\}}.$

Thus, $\Omega_1 \subseteq \Omega_2$ implies $\kappa(\Omega_1) \ge \kappa(\Omega_2)$.

- Since the constant functions are always among the available analytic functions on $\Omega$, $\kappa(\Omega) \ge 1$.

- If $A$ is normal, $\kappa(\Omega) = 1$. Indeed, if $A$ is normal and $\Sigma$ denotes the set of eigenvalues of $A$ associated with the invariant subspace $U$, then

$1 \ \le\ \kappa(\Omega) = \sup_{f \in H(\Omega)} \frac{\|f(A)\|_U}{\max\{|f(z)| : z \in \Omega\}} = \sup_{f \in H(\Omega)} \frac{\max\{|f(\lambda)| : \lambda \in \Sigma\}}{\max\{|f(z)| : z \in \Omega\}} \ \le\ 1.$

^3 [18] contains an error amounting to the tacit assumption that $P_{good}$ is an orthogonal projection, which is true only if $U_{good} \perp U_{bad}$. Thus the results coincide only in this special case (note $C_0 = 1$).

^4 For given $k \ge 1$, the sets $\Omega$ that (i) contain all eigenvalues of $A$ and (ii) satisfy $\kappa(\Omega) \le k$ are called $k$-spectral sets and figure prominently in the dilation theory of operators [29].

- If any eigenvalue associated with the invariant subspace $U$ is defective, then some choices of $\Omega$ will not yield a finite value for $\kappa(\Omega)$. For example, let

$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$

and take $U = \mathbb{C}^2$ as an invariant subspace associated with the defective eigenvalue $\lambda = 0$. If $\Omega$ consists of the single point $\{0\}$ and $f(z) = z$, then evidently $\|f(A)\|_U = 1$ but $\max_{z \in \Omega} |f(z)| = 0$. So no finite value of $\kappa(\Omega)$ is possible (see [31, p. 440]). More generally, if $\Omega$ is the spectrum of a defective matrix $A$, then the monic polynomial consisting of a single linear factor for each distinct eigenvalue of $A$ is zero on $\Omega$ but cannot annihilate $A$, as it has lower degree than the minimum polynomial of $A$.

We now use $\kappa$ to adapt Theorem 3.4 into a more approachable approximation problem. In particular, if $\Omega_{good}$ is a compact subset of $\mathbb{C}$ containing all the good eigenvalues of $A$ but none of the bad, then

$\left\| [\phi(A)\Psi_{\nu p}(A)]^{-1} \right\|_{U_{good}} \ \le\ \kappa(\Omega_{good}) \max\{\, |[\phi(z)\Psi_{\nu p}(z)]^{-1}| : z \in \Omega_{good} \,\} = \frac{\kappa(\Omega_{good})}{\min\{\, |\phi(z)\Psi_{\nu p}(z)| : z \in \Omega_{good} \,\}}.$

Applying a similar bound to $\|\phi(A)\Psi_{\nu p}(A)\|_{U_{bad}}$, we obtain the following result, the centerpiece of our development.

Theorem 3.5. Suppose $A$ and $v_1$ satisfy the conditions of Theorem 3.3. Let $\Omega_{good}$ and $\Omega_{bad}$ be disjoint compact subsets of $\mathbb{C}$ that contain, respectively, the good and bad eigenvalues of $A$, and suppose that none of the filter shifts $\{\mu_{jk}\}$ lies in $\Omega_{good}$. Then, for $l \ge m$,

$\delta(U_{good}, K_l(A, v_1^{(\nu)})) \ \le\ C_0\, C_1\, C_2 \min_{\phi \in P_{l-m}} \frac{\max\{|\Psi_{\nu p}(z)\phi(z)| : z \in \Omega_{bad}\}}{\min\{|\Psi_{\nu p}(z)\phi(z)| : z \in \Omega_{good}\}},$

where $C_0$ and $C_1$ are the constants introduced in Theorems 3.3 and 3.4, respectively, and $C_2 \equiv \kappa(\Omega_{good})\,\kappa(\Omega_{bad})$.

Evidently, Theorem 3.5 can be implemented with a variety of choices for $\Omega_{good}$ and $\Omega_{bad}$, which affect both the polynomial approximation problem and the constant $C_2$ (considered in section 5.3). The polynomial approximation problem, classified as Zolotarev-type, is discussed in detail in the next section. Similar problems arise in calculating optimal ADI parameters [26].

4. The polynomial approximation problem. Theorem 3.5 suggests the gap between a Krylov subspace and an invariant subspace will converge to zero at a rate determined by how small polynomials of increasing degree can become on $\Omega_{bad}$ while maintaining a minimal uniform magnitude on $\Omega_{good}$. How can this manifest as a linear convergence rate? Consider the ansatz

$\min_{\phi \in P_l} \frac{\max\{|\phi(w)| : w \in \Omega_{bad}\}}{\min\{|\phi(z)| : z \in \Omega_{good}\}} = r^l$

for some $0 < r \le 1$. Pick a fixed $\phi \in P_l$, say, with exact degree $l$. Then

(4.1) $\log\left( \frac{\max\{|\phi(w)| : w \in \Omega_{bad}\}}{\min\{|\phi(z)| : z \in \Omega_{good}\}} \right) \ \le\ l \log(r).$

Introducing

$U_\phi(z, \Omega_{bad}) \ \equiv\ \frac{1}{l} \log\left( \frac{|\phi(z)|}{\max\{|\phi(w)| : w \in \Omega_{bad}\}} \right),$

(4.1) is equivalent to

$\min_{z \in \Omega_{good}} U_\phi(z, \Omega_{bad}) \ \ge\ -\log(r).$

Evidently, the size of $r$ will be related to how large $U_\phi(z, \Omega_{bad})$ can be made uniformly throughout $\Omega_{good}$; larger $U_\phi$ values allow smaller $r$ (faster rates). $U_\phi(z, \Omega_{bad})$ has the following properties: $U_\phi(z, \Omega_{bad})$ is harmonic at $z$ where $\phi(z) \ne 0$; $U_\phi(z, \Omega_{bad}) = \log|z| + c + o(1)$ for a finite constant $c$ as $z \to \infty$; and $U_\phi(z, \Omega_{bad}) \le 0$ for all $z \in \Omega_{bad}$.

Potential theory provides a natural setting for studying such approximation problems. It is central to the analysis of iterative methods for solving linear systems (see, e.g., [26] for ADI methods and [10, 28] for Krylov subspace methods), and has been used by Calvetti, Reichel, and Sorensen to analyze the Hermitian Lanczos algorithm with restarts [6]. We apply similar techniques here to study $U_\phi(z, \Omega_{bad})$.

4.1. Potential theory background. Let $D \subset \mathbb{C}$ be a compact set whose complement, $\mathbb{C} \setminus D$, is a connected Dirichlet region.^5 The Green's function of $\mathbb{C} \setminus D$ with pole at infinity is defined as that function $g[z, D]$ that satisfies the following properties: (i) $g$ is harmonic in $\mathbb{C} \setminus D$; (ii) $g[z, D] = \log|z| + $ finite constant $+\, o(1)$ as $z \to \infty$; (iii) $\lim_{z \to \hat{z}} g[z, D] = 0$ for all $\hat{z} \in \partial D$; (iv) $g[z, D] > 0$ for all $z \in \mathbb{C} \setminus D$. Note that property (iv) can be deduced from (i), (ii), the fact that (ii) implies that $g > 0$ for all sufficiently large $|z|$, and the maximum principle for harmonic functions. The maximum principle also shows that $g[z, D]$ is the only function satisfying (i)-(iv).

^5 See [8, sect. X.4]. For our purposes here, this can be taken to mean a set having a piecewise smooth boundary with no isolated points; the effect of isolated points is addressed in section 4.3.

Example 4.1. If $\mathbb{C} \setminus D$ is simply connected, one is assured (from the Riemann mapping theorem; see, e.g., [8, sect. VII.4]) of the existence of a function $F(z)$ that maps $\mathbb{C} \setminus D$ conformally onto the exterior of the closed unit disk, $\mathbb{C} \setminus B_1 = \{z : |z| > 1\}$, such that $F(\infty) = \infty$. Such an $F$ must behave asymptotically as $\alpha z + O(1)$ as $z \to \infty$ for some constant $\alpha$, since it must remain one-to-one in any neighborhood of $\infty$. Since $\log|z|$ is harmonic for any $z \ne 0$, one may check that $u(z) = \log|F(z)|$ is also harmonic in $z$ wherever $F(z) \ne 0$, that $u(\infty) = \infty$, and that $u(z) \to 0$ as $z$ approaches $\partial D$ from $\mathbb{C} \setminus D$ (where $|F(z)| \to 1$). Thus, $\log|F(z)|$ is the Green's function with pole at infinity for $\mathbb{C} \setminus D$. Evidently, $\lim_{z \to \infty} (u(z) - \log|z|) = \log|\alpha|$. Notice that $\log|z|$ itself is the Green's function with pole at infinity for $\mathbb{C} \setminus B_1$.

Even for more complicated compact sets $D$, the condition that $g[z, D]$ is harmonic everywhere outside $D$ with a pole at $\infty$ restricts the rate of growth of $g[z, D]$ near $\infty$. Loosely speaking, as $|z|$ becomes very large, the compact set $D$ becomes less and less distinguishable from a disk centered at 0 (say, with radius $\gamma$), and so $g[z, D]$ becomes less and less distinguishable from $g[z, B_\gamma] = \log|z/\gamma| = \log|z| - \log\gamma$, which is the Green's function with pole at infinity for the complement of $B_\gamma = \{z : |z| \le \gamma\}$. Indeed, from property (ii), $g[z, D]$ has growth at infinity satisfying

(4.2) $\lim_{z \to \infty} \left( g[z, D] - \log|z| \right) = -\log\gamma$

for some constant $\gamma > 0$ known as the logarithmic capacity of the set $D$. This $\gamma$ can be thought of as the effective radius of $D$ in the sense we have just described.

Example 4.2. Suppose $\Phi_l(z)$ is a monic polynomial of degree $l$ and let

$D_\epsilon(\Phi_l) = \{ z \in \mathbb{C} : |\Phi_l(z)| \le \epsilon \}$

be a family of regions whose boundaries are the $\epsilon$-lemniscates of $\Phi_l(z)$. $D_\epsilon(\Phi_l)$ is compact for each $\epsilon > 0$, though it need not be a connected region. With an easy calculation one may verify that $D_\epsilon(\Phi_l)$ has the Green's function (cf. [36, p. 164])

$g[z, D_\epsilon(\Phi_l)] = \frac{1}{l} \log\left( \frac{|\Phi_l(z)|}{\epsilon} \right).$

Equipped with the Green's function $g[z, D]$, we return to the analysis of the function $U_\phi(z, D)$ describing the error in our approximation problem. The following result is a simplified version of the Bernstein-Walsh lemma (see [36, sect. III.2]).

Proposition 4.3. Let $D$ be a compact set with piecewise smooth boundary $\partial D$. Suppose $u$ is harmonic outside $D$ and that $u(z) \le 0$ for $z \in \partial D$. If $u(z) = \log|z| + c + o(1)$ for some constant $c$ as $z \to \infty$, then $u(z) \le g[z, D]$. In particular, if $\phi(z)$ is any polynomial of degree $l$, then for each $z \in \mathbb{C} \setminus D$,

(4.3) $U_\phi(z, D) = \frac{1}{l} \log\left( \frac{|\phi(z)|}{\max\{|\phi(w)| : w \in D\}} \right) \ \le\ g[z, D].$

For certain special choices of $D = \Omega_{bad}$, the polynomial approximation problem of Theorem 3.5 can be solved exactly.

Theorem 4.4. Suppose $\Phi_l(z)$ is a monic polynomial of degree $l$. Let $\Omega_{bad} = D_\epsilon(\Phi_l)$ be an associated $\epsilon$-lemniscatic set as defined in Example 4.2, and suppose $\Omega_{good}$ is a compact subset of $\mathbb{C}$ such that $\Omega_{good} \cap D_\epsilon(\Phi_l) = \emptyset$. Then

$\min_{\phi \in P_l} \frac{\max\{|\phi(w)| : w \in \Omega_{bad}\}}{\min\{|\phi(z)| : z \in \Omega_{good}\}} = \frac{\epsilon}{\min\{|\Phi_l(z)| : z \in \Omega_{good}\}}.$

Proof. Using the Green's function for $D_\epsilon(\Phi_l)$ described in Example 4.2, we can rearrange (4.3) to show that for any $\phi \in P_l$,

$\frac{|\phi(z)|}{\max\{|\phi(w)| : w \in D_\epsilon(\Phi_l)\}} \ \le\ \frac{|\Phi_l(z)|}{\epsilon}$

holds for all $z \in \Omega_{good}$. Equality is attained for every $z \in \mathbb{C}$ whenever $\phi = \Phi_l$. Minimizing over $z \in \Omega_{good}$ and then maximizing over $\phi \in P_l$ yields

(4.4) $\max_{\phi \in P_l} \frac{\min\{|\phi(z)| : z \in \Omega_{good}\}}{\max\{|\phi(w)| : w \in D_\epsilon(\Phi_l)\}} \ \le\ \frac{\min\{|\Phi_l(z)| : z \in \Omega_{good}\}}{\epsilon}.$

In fact, equality must hold in (4.4), since $\phi = \Phi_l$ is included in the class of functions over which the maximization occurs. The conclusion then follows by taking the reciprocal of both sides.
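Example 4.1 identifies $g[z, D]$ with $\log|F(z)|$ when a conformal map $F$ is available. For the special case of $D$ a real interval, $F$ is an inverse Joukowski map, and the quantity $\min\{g[z, \Omega_{bad}] : z \in \Omega_{good}\}$, which governs the asymptotic rate in the next theorem, can be evaluated in closed form. The sketch below is our own illustration; the interval $[0, 0.99]$ and the good point $z = 1$ are arbitrary choices, loosely modeled on the example of Figure 2.1:

```python
import numpy as np

def green_interval(z, a, b):
    """Green's function g[z, [a,b]] with pole at infinity for the complement of the
    real interval [a, b]: g = log|F(z)| with F an inverse Joukowski map."""
    w = (2 * np.asarray(z, dtype=complex) - (a + b)) / (b - a)
    F = w + np.sqrt(w - 1) * np.sqrt(w + 1)     # branch with |F| >= 1 off the cut
    F = np.where(np.abs(F) >= 1, F, 1 / F)
    return np.log(np.abs(F))

# Predicted asymptotic linear rate for Omega_bad = [0, 0.99], good eigenvalue z = 1:
rate = np.exp(-green_interval(1.0, 0.0, 0.99))
print(rate)     # gap ~ C * rate**l, ignoring superlinear effects
```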

More general choices for $D = \Omega_{bad}$ will not typically yield exactly solvable polynomial approximation problems, at least for fixed (finite) polynomial degree. However, the following asymptotic result holds as the polynomial degree increases.

Theorem 4.5. Let $\Omega_{bad}$ and $\Omega_{good}$ be two disjoint compact sets in the complex plane such that $\mathbb{C} \setminus \Omega_{bad}$ is a Dirichlet region. Then

(4.5) $\lim_{l \to \infty} \left( \min_{\phi \in P_l} \frac{\max\{|\phi(w)| : w \in \Omega_{bad}\}}{\min\{|\phi(z)| : z \in \Omega_{good}\}} \right)^{1/l} = e^{-\min\{g[z, \Omega_{bad}] : z \in \Omega_{good}\}},$

where $g[z, \Omega_{bad}]$ is the Green's function of $\mathbb{C} \setminus \Omega_{bad}$ with pole at infinity.

Proof. The theorem is proved in [26, p. 236], where the left-hand side of (4.5) is referred to as the $(l, 0)$ Zolotarev number. We give here a brief indication of the proof to support later discussion. Inequality (4.3) can be manipulated to yield

$\left( \frac{|\phi_l(z)|}{\max\{|\phi_l(w)| : w \in \Omega_{bad}\}} \right)^{1/l} \ \le\ e^{g[z, \Omega_{bad}]},$

which in turn implies

$\left( \frac{\max\{|\phi_l(w)| : w \in \Omega_{bad}\}}{\min\{|\phi_l(z)| : z \in \Omega_{good}\}} \right)^{1/l} \ \ge\ e^{-\min\{g[z, \Omega_{bad}] : z \in \Omega_{good}\}}.$

Furthermore, one may construct polynomials $L_k$ that have as their zeros points distributed on the boundary $\partial\Omega_{bad}$, the Leja points $\{\mu_1, \mu_2, \ldots, \mu_k\}$, defined recursively so that

$\mu_{k+1} = \arg\max \left\{ \prod_{j=1}^{k} |z - \mu_j| : z \in \Omega_{bad} \right\};$

see [36, sect. V.1]. This sequence of Leja polynomials satisfies asymptotic optimality,

(4.6) $\lim_{k \to \infty} \left( \frac{|L_k(z)|}{\max\{|L_k(w)| : w \in \Omega_{bad}\}} \right)^{1/k} = e^{g[z, \Omega_{bad}]}$

for each $z \in \mathbb{C} \setminus \Omega_{bad}$. Convergence is uniform on compact subsets of $\mathbb{C} \setminus \Omega_{bad}$. Thus we can reverse the order of the limit with respect to polynomial degree and minimization with respect to $z \in \Omega_{good}$, then take reciprocals to find

(4.7) $\lim_{k \to \infty} \left( \frac{\max\{|L_k(w)| : w \in \Omega_{bad}\}}{\min\{|L_k(z)| : z \in \Omega_{good}\}} \right)^{1/k} = e^{-\min\{g[z, \Omega_{bad}] : z \in \Omega_{good}\}}.$

Since

$\left( \frac{\max\{|L_l(w)| : w \in \Omega_{bad}\}}{\min\{|L_l(z)| : z \in \Omega_{good}\}} \right)^{1/l} \ \ge\ \min_{\phi \in P_l} \left( \frac{\max\{|\phi(w)| : w \in \Omega_{bad}\}}{\min\{|\phi(z)| : z \in \Omega_{good}\}} \right)^{1/l} \ \ge\ e^{-\min\{g[z, \Omega_{bad}] : z \in \Omega_{good}\}},$

equality must hold throughout in the limit, and thus (4.5) holds.

In the context of Example 4.1, where $F(z)$ was a conformal map taking the exterior of $\Omega_{bad}$ to the exterior of the closed unit disk with $F(\infty) = \infty$, Theorem 4.5 reduces to (cf. [10, Thm. 2])

$\lim_{l \to \infty} \min_{\phi \in P_l} \left( \frac{\max\{|\phi(w)| : w \in \Omega_{bad}\}}{\min\{|\phi(z)| : z \in \Omega_{good}\}} \right)^{1/l} = \max_{z \in \Omega_{good}} \frac{1}{|F(z)|}.$
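The Leja points used in the proof admit a simple greedy computation once $\partial\Omega_{bad}$ is discretized. The sketch below is our own illustration; the discretization, the choice of first point, and the use of logarithms to avoid overflow are our assumptions:

```python
import numpy as np

def leja_points(boundary, k):
    """First k Leja points on a discretized boundary of Omega_bad.

    boundary: 1-D complex array of candidate boundary points.
    Greedy recursion: mu_{j+1} maximizes prod_i |z - mu_i| over the candidates.
    """
    pts = [boundary[np.argmax(np.abs(boundary))]]        # a convenient first point
    logprod = np.log(np.abs(boundary - pts[0]) + 1e-300)
    for _ in range(k - 1):
        nxt = boundary[np.argmax(logprod)]
        pts.append(nxt)
        logprod += np.log(np.abs(boundary - nxt) + 1e-300)
    return np.array(pts)

# Example: Leja shifts for a bad region enclosed by the circle |z - 0.5| = 0.5,
# usable as restart shifts mu_jk in the filter polynomial Psi_{nu p}.
circle = 0.5 + 0.5 * np.exp(2j * np.pi * np.linspace(0, 1, 500, endpoint=False))
print(leja_points(circle, 8))
```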

4.2. Effective restart strategies. The usual goal in constructing a restart strategy is to limit the size of the Krylov subspace (restricting the maximum degree of the polynomial $\phi$) without degrading the asymptotic convergence rate. Demonstrating equality in (4.5) pivoted on the construction of an optimal family of polynomials, in this case Leja polynomials. There are other possibilities, however. Fekete polynomials are the usual choice for the construction in Theorem 4.5; see [36, sect. III.1]. Chebyshev polynomials and Faber polynomials offer familiar alternatives. (For Hermitian matrices, a practical Leja shift strategy has been developed by Baglama, Calvetti, and Reichel [3] and Calvetti, Reichel, and Sorensen [6]. Heuveline and Sadkane advocate numerical conformal mapping to determine Faber polynomials for restarting non-Hermitian iterations [18].) Once some optimal family of polynomials is known that solves (4.5), effective restart strategies become evident.

Theorem 4.6. Let $\Omega_{good}$ and $\Omega_{bad}$ be two disjoint compact sets in the complex plane containing, respectively, the good and bad eigenvalues of $A$, and such that $\mathbb{C} \setminus \Omega_{bad}$ is a Dirichlet region. Suppose that $\Psi_{\nu p}(z)$ is the aggregate restart polynomial representing $\nu$ restarts, each of order $p$.

(a) If polynomial restarts are performed using roots of optimal polynomials for $\Omega_{bad}$ (i.e., the $\Psi_{\nu p}(z)$ are optimal polynomials of degree $\nu p$), then

(4.8) $\lim_{\nu \to \infty} \left( \min_{\phi \in P_l} \frac{\max\{|\Psi_{\nu p}(w)\phi(w)| : w \in \Omega_{bad}\}}{\min\{|\Psi_{\nu p}(z)\phi(z)| : z \in \Omega_{good}\}} \right)^{\frac{1}{\nu p + l}} = e^{-\min\{g[z, \Omega_{bad}] : z \in \Omega_{good}\}},$

where $g[z, \Omega_{bad}]$ is the Green's function of $\mathbb{C} \setminus \Omega_{bad}$ with pole at infinity.

(b) If the boundary of $\Omega_{bad}$ is a lemniscate of $\Psi_{\nu p}\Phi_l$,

$\Omega_{bad} = D_\epsilon(\Psi_{\nu p}\Phi_l) = \{ z \in \mathbb{C} : |\Psi_{\nu p}(z)\Phi_l(z)| \le \epsilon \},$

for some degree-$l$ monic polynomial $\Phi_l$ and some $\epsilon > 0$, then

$\min_{\phi \in P_l} \frac{\max\{|\Psi_{\nu p}(w)\phi(w)| : w \in \Omega_{bad}\}}{\min\{|\Psi_{\nu p}(z)\phi(z)| : z \in \Omega_{good}\}} = \frac{\epsilon}{\min\{|\Psi_{\nu p}(z)\Phi_l(z)| : z \in \Omega_{good}\}}.$

Proof. Part (b) follows immediately from Theorem 4.4. Part (a) can be seen by observing that since $\Psi_{\nu p}(z)$ is an asymptotically optimal family for $\Omega_{bad}$,

$\frac{\max\{|\Psi_{\nu p}(w)| : w \in \Omega_{bad}\}}{\min\{|\Psi_{\nu p}(z)| : z \in \Omega_{good}\}} \ \ge\ \min_{\phi \in P_l} \left( \frac{\max\{|\Psi_{\nu p}(w)\phi(w)| : w \in \Omega_{bad}\}}{\min\{|\Psi_{\nu p}(z)\phi(z)| : z \in \Omega_{good}\}} \right) \ \ge\ \left( e^{-\min\{g[z, \Omega_{bad}] : z \in \Omega_{good}\}} \right)^{\nu p + l}.$

Now, fixing $p$ and $l$, the conclusion follows from (4.7) by following the subsequence generated by $\nu = 1, 2, \ldots$.

Recall that the desired effect of the restart polynomial is to retain the rapid convergence rate of the full (unrestarted) Krylov subspace without requiring the dimension $l$ to grow without bound. We have seen here that restarting with optimal polynomials for $\Omega_{bad}$ recovers the expected linear convergence rate for $\Omega_{bad}$ (presuming one can identify this set, not a trivial matter in practice). Still, the unrestarted process may take advantage of the discrete nature of the spectrum, accelerating convergence beyond the expected linear rate. Designing a restart strategy that yields similar behavior is more elaborate.

4.3. Superlinear effects from assimilation of bad eigenvalues. In a variety of situations, the gap appears to converge superlinearly. True superlinear convergence is an asymptotic phenomenon that has a nontrivial meaning only for nonterminating iterations. Thus one must be cautious about describing superlinear effects relating to (unrestarted) Krylov subspaces, since $U_{good}$ is eventually completely captured by the Krylov subspace, as discussed in section 2. Here our point of view follows that of [46, 48], showing the estimated gap may be bounded by a family of linearly converging


More information

Eigenvalue Problems. Eigenvalue problems occur in many areas of science and engineering, such as structural analysis

Eigenvalue Problems. Eigenvalue problems occur in many areas of science and engineering, such as structural analysis Eigenvalue Problems Eigenvalue problems occur in many areas of science and engineering, such as structural analysis Eigenvalues also important in analyzing numerical methods Theory and algorithms apply

More information

The Lanczos and conjugate gradient algorithms

The Lanczos and conjugate gradient algorithms The Lanczos and conjugate gradient algorithms Gérard MEURANT October, 2008 1 The Lanczos algorithm 2 The Lanczos algorithm in finite precision 3 The nonsymmetric Lanczos algorithm 4 The Golub Kahan bidiagonalization

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 4 Eigenvalue Problems Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction

More information

PERTURBED ARNOLDI FOR COMPUTING MULTIPLE EIGENVALUES

PERTURBED ARNOLDI FOR COMPUTING MULTIPLE EIGENVALUES 1 PERTURBED ARNOLDI FOR COMPUTING MULTIPLE EIGENVALUES MARK EMBREE, THOMAS H. GIBSON, KEVIN MENDOZA, AND RONALD B. MORGAN Abstract. fill in abstract Key words. eigenvalues, multiple eigenvalues, Arnoldi,

More information

The following definition is fundamental.

The following definition is fundamental. 1. Some Basics from Linear Algebra With these notes, I will try and clarify certain topics that I only quickly mention in class. First and foremost, I will assume that you are familiar with many basic

More information

Charles University Faculty of Mathematics and Physics DOCTORAL THESIS. Krylov subspace approximations in linear algebraic problems

Charles University Faculty of Mathematics and Physics DOCTORAL THESIS. Krylov subspace approximations in linear algebraic problems Charles University Faculty of Mathematics and Physics DOCTORAL THESIS Iveta Hnětynková Krylov subspace approximations in linear algebraic problems Department of Numerical Mathematics Supervisor: Doc. RNDr.

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 19: More on Arnoldi Iteration; Lanczos Iteration Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 17 Outline 1

More information

NORMS ON SPACE OF MATRICES

NORMS ON SPACE OF MATRICES NORMS ON SPACE OF MATRICES. Operator Norms on Space of linear maps Let A be an n n real matrix and x 0 be a vector in R n. We would like to use the Picard iteration method to solve for the following system

More information

Computation of eigenvalues and singular values Recall that your solutions to these questions will not be collected or evaluated.

Computation of eigenvalues and singular values Recall that your solutions to these questions will not be collected or evaluated. Math 504, Homework 5 Computation of eigenvalues and singular values Recall that your solutions to these questions will not be collected or evaluated 1 Find the eigenvalues and the associated eigenspaces

More information

LARGE SPARSE EIGENVALUE PROBLEMS. General Tools for Solving Large Eigen-Problems

LARGE SPARSE EIGENVALUE PROBLEMS. General Tools for Solving Large Eigen-Problems LARGE SPARSE EIGENVALUE PROBLEMS Projection methods The subspace iteration Krylov subspace methods: Arnoldi and Lanczos Golub-Kahan-Lanczos bidiagonalization General Tools for Solving Large Eigen-Problems

More information

Solution of eigenvalue problems. Subspace iteration, The symmetric Lanczos algorithm. Harmonic Ritz values, Jacobi-Davidson s method

Solution of eigenvalue problems. Subspace iteration, The symmetric Lanczos algorithm. Harmonic Ritz values, Jacobi-Davidson s method Solution of eigenvalue problems Introduction motivation Projection methods for eigenvalue problems Subspace iteration, The symmetric Lanczos algorithm Nonsymmetric Lanczos procedure; Implicit restarts

More information

NONCOMMUTATIVE POLYNOMIAL EQUATIONS. Edward S. Letzter. Introduction

NONCOMMUTATIVE POLYNOMIAL EQUATIONS. Edward S. Letzter. Introduction NONCOMMUTATIVE POLYNOMIAL EQUATIONS Edward S Letzter Introduction My aim in these notes is twofold: First, to briefly review some linear algebra Second, to provide you with some new tools and techniques

More information

ALGEBRA QUALIFYING EXAM PROBLEMS LINEAR ALGEBRA

ALGEBRA QUALIFYING EXAM PROBLEMS LINEAR ALGEBRA ALGEBRA QUALIFYING EXAM PROBLEMS LINEAR ALGEBRA Kent State University Department of Mathematical Sciences Compiled and Maintained by Donald L. White Version: August 29, 2017 CONTENTS LINEAR ALGEBRA AND

More information

LARGE SPARSE EIGENVALUE PROBLEMS

LARGE SPARSE EIGENVALUE PROBLEMS LARGE SPARSE EIGENVALUE PROBLEMS Projection methods The subspace iteration Krylov subspace methods: Arnoldi and Lanczos Golub-Kahan-Lanczos bidiagonalization 14-1 General Tools for Solving Large Eigen-Problems

More information

Characterization of half-radial matrices

Characterization of half-radial matrices Characterization of half-radial matrices Iveta Hnětynková, Petr Tichý Faculty of Mathematics and Physics, Charles University, Sokolovská 83, Prague 8, Czech Republic Abstract Numerical radius r(a) is the

More information

ANY FINITE CONVERGENCE CURVE IS POSSIBLE IN THE INITIAL ITERATIONS OF RESTARTED FOM

ANY FINITE CONVERGENCE CURVE IS POSSIBLE IN THE INITIAL ITERATIONS OF RESTARTED FOM Electronic Transactions on Numerical Analysis. Volume 45, pp. 133 145, 2016. Copyright c 2016,. ISSN 1068 9613. ETNA ANY FINITE CONVERGENCE CURVE IS POSSIBLE IN THE INITIAL ITERATIONS OF RESTARTED FOM

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

Topics in linear algebra

Topics in linear algebra Chapter 6 Topics in linear algebra 6.1 Change of basis I want to remind you of one of the basic ideas in linear algebra: change of basis. Let F be a field, V and W be finite dimensional vector spaces over

More information

Math 504 (Fall 2011) 1. (*) Consider the matrices

Math 504 (Fall 2011) 1. (*) Consider the matrices Math 504 (Fall 2011) Instructor: Emre Mengi Study Guide for Weeks 11-14 This homework concerns the following topics. Basic definitions and facts about eigenvalues and eigenvectors (Trefethen&Bau, Lecture

More information

Approximating the matrix exponential of an advection-diffusion operator using the incomplete orthogonalization method

Approximating the matrix exponential of an advection-diffusion operator using the incomplete orthogonalization method Approximating the matrix exponential of an advection-diffusion operator using the incomplete orthogonalization method Antti Koskela KTH Royal Institute of Technology, Lindstedtvägen 25, 10044 Stockholm,

More information

Matrices, Moments and Quadrature, cont d

Matrices, Moments and Quadrature, cont d Jim Lambers CME 335 Spring Quarter 2010-11 Lecture 4 Notes Matrices, Moments and Quadrature, cont d Estimation of the Regularization Parameter Consider the least squares problem of finding x such that

More information

THE MINIMAL POLYNOMIAL AND SOME APPLICATIONS

THE MINIMAL POLYNOMIAL AND SOME APPLICATIONS THE MINIMAL POLYNOMIAL AND SOME APPLICATIONS KEITH CONRAD. Introduction The easiest matrices to compute with are the diagonal ones. The sum and product of diagonal matrices can be computed componentwise

More information

On Solving Large Algebraic. Riccati Matrix Equations

On Solving Large Algebraic. Riccati Matrix Equations International Mathematical Forum, 5, 2010, no. 33, 1637-1644 On Solving Large Algebraic Riccati Matrix Equations Amer Kaabi Department of Basic Science Khoramshahr Marine Science and Technology University

More information

Applied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic

Applied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic Applied Mathematics 205 Unit V: Eigenvalue Problems Lecturer: Dr. David Knezevic Unit V: Eigenvalue Problems Chapter V.4: Krylov Subspace Methods 2 / 51 Krylov Subspace Methods In this chapter we give

More information

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.

More information

LINEAR ALGEBRA SUMMARY SHEET.

LINEAR ALGEBRA SUMMARY SHEET. LINEAR ALGEBRA SUMMARY SHEET RADON ROSBOROUGH https://intuitiveexplanationscom/linear-algebra-summary-sheet/ This document is a concise collection of many of the important theorems of linear algebra, organized

More information

MATH Linear Algebra

MATH Linear Algebra MATH 304 - Linear Algebra In the previous note we learned an important algorithm to produce orthogonal sequences of vectors called the Gramm-Schmidt orthogonalization process. Gramm-Schmidt orthogonalization

More information

Matrix functions and their approximation. Krylov subspaces

Matrix functions and their approximation. Krylov subspaces [ 1 / 31 ] University of Cyprus Matrix functions and their approximation using Krylov subspaces Matrixfunktionen und ihre Approximation in Krylov-Unterräumen Stefan Güttel stefan@guettel.com Nicosia, 24th

More information

MAT2342 : Introduction to Applied Linear Algebra Mike Newman, fall Projections. introduction

MAT2342 : Introduction to Applied Linear Algebra Mike Newman, fall Projections. introduction MAT4 : Introduction to Applied Linear Algebra Mike Newman fall 7 9. Projections introduction One reason to consider projections is to understand approximate solutions to linear systems. A common example

More information

Preconditioned inverse iteration and shift-invert Arnoldi method

Preconditioned inverse iteration and shift-invert Arnoldi method Preconditioned inverse iteration and shift-invert Arnoldi method Melina Freitag Department of Mathematical Sciences University of Bath CSC Seminar Max-Planck-Institute for Dynamics of Complex Technical

More information

Part III. 10 Topological Space Basics. Topological Spaces

Part III. 10 Topological Space Basics. Topological Spaces Part III 10 Topological Space Basics Topological Spaces Using the metric space results above as motivation we will axiomatize the notion of being an open set to more general settings. Definition 10.1.

More information

The Hilbert Space of Random Variables

The Hilbert Space of Random Variables The Hilbert Space of Random Variables Electrical Engineering 126 (UC Berkeley) Spring 2018 1 Outline Fix a probability space and consider the set H := {X : X is a real-valued random variable with E[X 2

More information

Elementary linear algebra

Elementary linear algebra Chapter 1 Elementary linear algebra 1.1 Vector spaces Vector spaces owe their importance to the fact that so many models arising in the solutions of specific problems turn out to be vector spaces. The

More information

Krylov Space Methods. Nonstationary sounds good. Radu Trîmbiţaş ( Babeş-Bolyai University) Krylov Space Methods 1 / 17

Krylov Space Methods. Nonstationary sounds good. Radu Trîmbiţaş ( Babeş-Bolyai University) Krylov Space Methods 1 / 17 Krylov Space Methods Nonstationary sounds good Radu Trîmbiţaş Babeş-Bolyai University Radu Trîmbiţaş ( Babeş-Bolyai University) Krylov Space Methods 1 / 17 Introduction These methods are used both to solve

More information

1. What is the determinant of the following matrix? a 1 a 2 4a 3 2a 2 b 1 b 2 4b 3 2b c 1. = 4, then det

1. What is the determinant of the following matrix? a 1 a 2 4a 3 2a 2 b 1 b 2 4b 3 2b c 1. = 4, then det What is the determinant of the following matrix? 3 4 3 4 3 4 4 3 A 0 B 8 C 55 D 0 E 60 If det a a a 3 b b b 3 c c c 3 = 4, then det a a 4a 3 a b b 4b 3 b c c c 3 c = A 8 B 6 C 4 D E 3 Let A be an n n matrix

More information

Bare-bones outline of eigenvalue theory and the Jordan canonical form

Bare-bones outline of eigenvalue theory and the Jordan canonical form Bare-bones outline of eigenvalue theory and the Jordan canonical form April 3, 2007 N.B.: You should also consult the text/class notes for worked examples. Let F be a field, let V be a finite-dimensional

More information

LINEAR ALGEBRA BOOT CAMP WEEK 2: LINEAR OPERATORS

LINEAR ALGEBRA BOOT CAMP WEEK 2: LINEAR OPERATORS LINEAR ALGEBRA BOOT CAMP WEEK 2: LINEAR OPERATORS Unless otherwise stated, all vector spaces in this worksheet are finite dimensional and the scalar field F has characteristic zero. The following are facts

More information

In English, this means that if we travel on a straight line between any two points in C, then we never leave C.

In English, this means that if we travel on a straight line between any two points in C, then we never leave C. Convex sets In this section, we will be introduced to some of the mathematical fundamentals of convex sets. In order to motivate some of the definitions, we will look at the closest point problem from

More information

Lecture 9: Krylov Subspace Methods. 2 Derivation of the Conjugate Gradient Algorithm

Lecture 9: Krylov Subspace Methods. 2 Derivation of the Conjugate Gradient Algorithm CS 622 Data-Sparse Matrix Computations September 19, 217 Lecture 9: Krylov Subspace Methods Lecturer: Anil Damle Scribes: David Eriksson, Marc Aurele Gilles, Ariah Klages-Mundt, Sophia Novitzky 1 Introduction

More information

GRE Subject test preparation Spring 2016 Topic: Abstract Algebra, Linear Algebra, Number Theory.

GRE Subject test preparation Spring 2016 Topic: Abstract Algebra, Linear Algebra, Number Theory. GRE Subject test preparation Spring 2016 Topic: Abstract Algebra, Linear Algebra, Number Theory. Linear Algebra Standard matrix manipulation to compute the kernel, intersection of subspaces, column spaces,

More information

Chapter 7. Canonical Forms. 7.1 Eigenvalues and Eigenvectors

Chapter 7. Canonical Forms. 7.1 Eigenvalues and Eigenvectors Chapter 7 Canonical Forms 7.1 Eigenvalues and Eigenvectors Definition 7.1.1. Let V be a vector space over the field F and let T be a linear operator on V. An eigenvalue of T is a scalar λ F such that there

More information

Alternative correction equations in the Jacobi-Davidson method

Alternative correction equations in the Jacobi-Davidson method Chapter 2 Alternative correction equations in the Jacobi-Davidson method Menno Genseberger and Gerard Sleijpen Abstract The correction equation in the Jacobi-Davidson method is effective in a subspace

More information

Balanced Truncation 1

Balanced Truncation 1 Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.242, Fall 2004: MODEL REDUCTION Balanced Truncation This lecture introduces balanced truncation for LTI

More information

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces. Math 350 Fall 2011 Notes about inner product spaces In this notes we state and prove some important properties of inner product spaces. First, recall the dot product on R n : if x, y R n, say x = (x 1,...,

More information

1. General Vector Spaces

1. General Vector Spaces 1.1. Vector space axioms. 1. General Vector Spaces Definition 1.1. Let V be a nonempty set of objects on which the operations of addition and scalar multiplication are defined. By addition we mean a rule

More information

MATH SOLUTIONS TO PRACTICE MIDTERM LECTURE 1, SUMMER Given vector spaces V and W, V W is the vector space given by

MATH SOLUTIONS TO PRACTICE MIDTERM LECTURE 1, SUMMER Given vector spaces V and W, V W is the vector space given by MATH 110 - SOLUTIONS TO PRACTICE MIDTERM LECTURE 1, SUMMER 2009 GSI: SANTIAGO CAÑEZ 1. Given vector spaces V and W, V W is the vector space given by V W = {(v, w) v V and w W }, with addition and scalar

More information

A PRIMER ON SESQUILINEAR FORMS

A PRIMER ON SESQUILINEAR FORMS A PRIMER ON SESQUILINEAR FORMS BRIAN OSSERMAN This is an alternative presentation of most of the material from 8., 8.2, 8.3, 8.4, 8.5 and 8.8 of Artin s book. Any terminology (such as sesquilinear form

More information

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University PCA with random noise Van Ha Vu Department of Mathematics Yale University An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical

More information

Key words. conjugate gradients, normwise backward error, incremental norm estimation.

Key words. conjugate gradients, normwise backward error, incremental norm estimation. Proceedings of ALGORITMY 2016 pp. 323 332 ON ERROR ESTIMATION IN THE CONJUGATE GRADIENT METHOD: NORMWISE BACKWARD ERROR PETR TICHÝ Abstract. Using an idea of Duff and Vömel [BIT, 42 (2002), pp. 300 322

More information

5 Measure theory II. (or. lim. Prove the proposition. 5. For fixed F A and φ M define the restriction of φ on F by writing.

5 Measure theory II. (or. lim. Prove the proposition. 5. For fixed F A and φ M define the restriction of φ on F by writing. 5 Measure theory II 1. Charges (signed measures). Let (Ω, A) be a σ -algebra. A map φ: A R is called a charge, (or signed measure or σ -additive set function) if φ = φ(a j ) (5.1) A j for any disjoint

More information

MATHEMATICS 217 NOTES

MATHEMATICS 217 NOTES MATHEMATICS 27 NOTES PART I THE JORDAN CANONICAL FORM The characteristic polynomial of an n n matrix A is the polynomial χ A (λ) = det(λi A), a monic polynomial of degree n; a monic polynomial in the variable

More information

The Jordan canonical form

The Jordan canonical form The Jordan canonical form Francisco Javier Sayas University of Delaware November 22, 213 The contents of these notes have been translated and slightly modified from a previous version in Spanish. Part

More information

AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES

AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES JOEL A. TROPP Abstract. We present an elementary proof that the spectral radius of a matrix A may be obtained using the formula ρ(a) lim

More information

1 Conjugate gradients

1 Conjugate gradients Notes for 2016-11-18 1 Conjugate gradients We now turn to the method of conjugate gradients (CG), perhaps the best known of the Krylov subspace solvers. The CG iteration can be characterized as the iteration

More information

Linear Algebra 1. M.T.Nair Department of Mathematics, IIT Madras. and in that case x is called an eigenvector of T corresponding to the eigenvalue λ.

Linear Algebra 1. M.T.Nair Department of Mathematics, IIT Madras. and in that case x is called an eigenvector of T corresponding to the eigenvalue λ. Linear Algebra 1 M.T.Nair Department of Mathematics, IIT Madras 1 Eigenvalues and Eigenvectors 1.1 Definition and Examples Definition 1.1. Let V be a vector space (over a field F) and T : V V be a linear

More information

MULTICENTRIC CALCULUS AND THE RIESZ PROJECTION

MULTICENTRIC CALCULUS AND THE RIESZ PROJECTION JOURNAL OF NUMERICAL ANALYSIS AND APPROXIMATION THEORY J. Numer. Anal. Approx. Theory, vol. 44 (2015) no. 2, pp. 127 145 ictp.acad.ro/jnaat MULTICENTRIC CALCULUS AND THE RIESZ PROJECTION DIANA APETREI

More information

A linear algebra proof of the fundamental theorem of algebra

A linear algebra proof of the fundamental theorem of algebra A linear algebra proof of the fundamental theorem of algebra Andrés E. Caicedo May 18, 2010 Abstract We present a recent proof due to Harm Derksen, that any linear operator in a complex finite dimensional

More information

The Cyclic Decomposition of a Nilpotent Operator

The Cyclic Decomposition of a Nilpotent Operator The Cyclic Decomposition of a Nilpotent Operator 1 Introduction. J.H. Shapiro Suppose T is a linear transformation on a vector space V. Recall Exercise #3 of Chapter 8 of our text, which we restate here

More information

A linear algebra proof of the fundamental theorem of algebra

A linear algebra proof of the fundamental theorem of algebra A linear algebra proof of the fundamental theorem of algebra Andrés E. Caicedo May 18, 2010 Abstract We present a recent proof due to Harm Derksen, that any linear operator in a complex finite dimensional

More information

What is A + B? What is A B? What is AB? What is BA? What is A 2? and B = QUESTION 2. What is the reduced row echelon matrix of A =

What is A + B? What is A B? What is AB? What is BA? What is A 2? and B = QUESTION 2. What is the reduced row echelon matrix of A = STUDENT S COMPANIONS IN BASIC MATH: THE ELEVENTH Matrix Reloaded by Block Buster Presumably you know the first part of matrix story, including its basic operations (addition and multiplication) and row

More information

A HARMONIC RESTARTED ARNOLDI ALGORITHM FOR CALCULATING EIGENVALUES AND DETERMINING MULTIPLICITY

A HARMONIC RESTARTED ARNOLDI ALGORITHM FOR CALCULATING EIGENVALUES AND DETERMINING MULTIPLICITY A HARMONIC RESTARTED ARNOLDI ALGORITHM FOR CALCULATING EIGENVALUES AND DETERMINING MULTIPLICITY RONALD B. MORGAN AND MIN ZENG Abstract. A restarted Arnoldi algorithm is given that computes eigenvalues

More information

Estimates for probabilities of independent events and infinite series

Estimates for probabilities of independent events and infinite series Estimates for probabilities of independent events and infinite series Jürgen Grahl and Shahar evo September 9, 06 arxiv:609.0894v [math.pr] 8 Sep 06 Abstract This paper deals with finite or infinite sequences

More information

The quadratic eigenvalue problem (QEP) is to find scalars λ and nonzero vectors u satisfying

The quadratic eigenvalue problem (QEP) is to find scalars λ and nonzero vectors u satisfying I.2 Quadratic Eigenvalue Problems 1 Introduction The quadratic eigenvalue problem QEP is to find scalars λ and nonzero vectors u satisfying where Qλx = 0, 1.1 Qλ = λ 2 M + λd + K, M, D and K are given

More information

4.8 Arnoldi Iteration, Krylov Subspaces and GMRES

4.8 Arnoldi Iteration, Krylov Subspaces and GMRES 48 Arnoldi Iteration, Krylov Subspaces and GMRES We start with the problem of using a similarity transformation to convert an n n matrix A to upper Hessenberg form H, ie, A = QHQ, (30) with an appropriate

More information

Numerical Methods I Eigenvalue Problems

Numerical Methods I Eigenvalue Problems Numerical Methods I Eigenvalue Problems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 2nd, 2014 A. Donev (Courant Institute) Lecture

More information

EIGENVALUE PROBLEMS. EIGENVALUE PROBLEMS p. 1/4

EIGENVALUE PROBLEMS. EIGENVALUE PROBLEMS p. 1/4 EIGENVALUE PROBLEMS EIGENVALUE PROBLEMS p. 1/4 EIGENVALUE PROBLEMS p. 2/4 Eigenvalues and eigenvectors Let A C n n. Suppose Ax = λx, x 0, then x is a (right) eigenvector of A, corresponding to the eigenvalue

More information

IMPORTANT DEFINITIONS AND THEOREMS REFERENCE SHEET

IMPORTANT DEFINITIONS AND THEOREMS REFERENCE SHEET IMPORTANT DEFINITIONS AND THEOREMS REFERENCE SHEET This is a (not quite comprehensive) list of definitions and theorems given in Math 1553. Pay particular attention to the ones in red. Study Tip For each

More information

Algorithms that use the Arnoldi Basis

Algorithms that use the Arnoldi Basis AMSC 600 /CMSC 760 Advanced Linear Numerical Analysis Fall 2007 Arnoldi Methods Dianne P. O Leary c 2006, 2007 Algorithms that use the Arnoldi Basis Reference: Chapter 6 of Saad The Arnoldi Basis How to

More information

Frame Diagonalization of Matrices

Frame Diagonalization of Matrices Frame Diagonalization of Matrices Fumiko Futamura Mathematics and Computer Science Department Southwestern University 00 E University Ave Georgetown, Texas 78626 U.S.A. Phone: + (52) 863-98 Fax: + (52)

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 18 Outline

More information

Chapter 2 Spectra of Finite Graphs

Chapter 2 Spectra of Finite Graphs Chapter 2 Spectra of Finite Graphs 2.1 Characteristic Polynomials Let G = (V, E) be a finite graph on n = V vertices. Numbering the vertices, we write down its adjacency matrix in an explicit form of n

More information

Linear Algebra, Summer 2011, pt. 2

Linear Algebra, Summer 2011, pt. 2 Linear Algebra, Summer 2, pt. 2 June 8, 2 Contents Inverses. 2 Vector Spaces. 3 2. Examples of vector spaces..................... 3 2.2 The column space......................... 6 2.3 The null space...........................

More information

Math 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination

Math 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination Math 0, Winter 07 Final Exam Review Chapter. Matrices and Gaussian Elimination { x + x =,. Different forms of a system of linear equations. Example: The x + 4x = 4. [ ] [ ] [ ] vector form (or the column

More information

Study Guide for Linear Algebra Exam 2

Study Guide for Linear Algebra Exam 2 Study Guide for Linear Algebra Exam 2 Term Vector Space Definition A Vector Space is a nonempty set V of objects, on which are defined two operations, called addition and multiplication by scalars (real

More information

LINEAR ALGEBRA BOOT CAMP WEEK 4: THE SPECTRAL THEOREM

LINEAR ALGEBRA BOOT CAMP WEEK 4: THE SPECTRAL THEOREM LINEAR ALGEBRA BOOT CAMP WEEK 4: THE SPECTRAL THEOREM Unless otherwise stated, all vector spaces in this worksheet are finite dimensional and the scalar field F is R or C. Definition 1. A linear operator

More information

Generalized eigenspaces

Generalized eigenspaces Generalized eigenspaces November 30, 2012 Contents 1 Introduction 1 2 Polynomials 2 3 Calculating the characteristic polynomial 5 4 Projections 7 5 Generalized eigenvalues 10 6 Eigenpolynomials 15 1 Introduction

More information