
115 Nonlinear Eigenvalue Problems

Heinrich Voss, Hamburg University of Technology

115.1 Basic Properties
115.2 Analytic matrix functions
115.3 Variational Characterization of Eigenvalues
115.4 General Rayleigh Functionals
115.5 Methods for dense eigenvalue problems
115.6 Iterative projection methods
115.7 Methods using invariant pairs
115.8 The infinite Arnoldi method
References

This chapter considers the nonlinear eigenvalue problem of finding a parameter $\lambda$ such that the linear system

  $T(\lambda)x = 0$   (115.1)

has a nontrivial solution $x$, where $T(\cdot): D \to \mathbb{C}^{n\times n}$ is a family of matrices depending on a complex parameter $\lambda \in D \subseteq \mathbb{C}$. It generalizes the linear eigenvalue problem $Ax = \lambda x$, $A \in \mathbb{C}^{n\times n}$, where $T(\lambda) = \lambda I - A$, and the generalized linear eigenvalue problem, where $T(\lambda) = \lambda B - A$, $A, B \in \mathbb{C}^{n\times n}$.

Nonlinear eigenvalue problems $T(\lambda)x = 0$ arise in a variety of applications in science and engineering, such as the dynamic analysis of structures, vibrations of fluid-solid structures, the electronic behavior of quantum dots, and delay eigenvalue problems, to name just a few. Due to its wide range of applications, the quadratic eigenvalue problem $T(\lambda)x = \lambda^2 Mx + \lambda Cx + Kx = 0$ is of particular interest, but polynomial, rational, and more general eigenvalue problems appear as well.

A standard approach for investigating or numerically solving polynomial eigenvalue problems is linearization, where the original problem is transformed into a generalized linear eigenvalue problem with the same spectrum. Details on linearization and structure preservation are discussed in Chapter 102, Matrix Polynomials. This chapter is concerned with the general nonlinear eigenvalue problem, which in general cannot be linearized. Unlike for linear and polynomial eigenvalue problems, there may exist infinitely many eigenvalues. In practice, however, one is usually interested only in a few eigenvalues close to a target value or a line in the complex plane.

If $T$ is linear, then $T(\lambda) = T(0) + \lambda T'(0)$ has the form of a generalized eigenvalue problem, and in the general case linearization gives the approximation $T(\lambda) = T(0) + \lambda T'(0) + O(\lambda^2)$, which is again a generalized linear eigenvalue problem. Hence, it is not surprising that the (elementwise) derivative $T'(\lambda)$ of $T(\lambda)$ plays an important role in the analysis of nonlinear eigenvalue problems. We tacitly assume throughout this chapter that whenever a derivative $T'(\hat\lambda)$ appears, $T$ is analytic in a neighborhood of $\hat\lambda$, or, in the real case $T: D \to \mathbb{R}^{n\times n}$, $D \subseteq \mathbb{R}$, that $T$ is differentiable in a neighborhood of $\hat\lambda$. $\|\cdot\|$ always denotes the Euclidean and spectral norm, respectively, and we use the notation $[x; y] := [x^T, y^T]^T$ for column vectors.
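The linearization just mentioned is easy to carry out numerically for the quadratic problem. The following minimal Python sketch (with made-up coefficient matrices $M$, $C$, $K$; the first companion form used here is only one of several possible linearizations) recovers the $2n$ eigenvalues from a generalized linear eigenvalue problem:

import numpy as np
from scipy.linalg import eig

# Hypothetical coefficient matrices of T(lam) = lam^2 M + lam C + K.
M = np.eye(2)
C = np.array([[3.0, 1.0], [1.0, 3.0]])
K = np.array([[5.0, 2.0], [2.0, 5.0]])

n = M.shape[0]
# First companion linearization A z = lam B z with z = [x; lam x].
A = np.block([[np.zeros((n, n)), np.eye(n)], [-K, -C]])
B = np.block([[np.eye(n), np.zeros((n, n))], [np.zeros((n, n)), M]])

lam, Z = eig(A, B)          # the 2n eigenvalues of the quadratic problem
x = Z[:n, 0]                # upper block of a pencil eigenvector
print(np.linalg.norm((lam[0]**2 * M + lam[0] * C + K) @ x))   # ~ 0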

115.1 Basic Properties

This section presents basic properties of the nonlinear eigenvalue problem (115.1).

Definitions:

As for a linear eigenvalue problem, $\hat\lambda \in D$ is called an eigenvalue of $T(\cdot)$ if $T(\hat\lambda)x = 0$ has a nontrivial solution $\hat x \neq 0$. Then $\hat x$ is called a corresponding eigenvector or right eigenvector, and $(\hat\lambda, \hat x)$ is called an eigenpair of $T(\cdot)$.

Any nontrivial solution $\hat y \neq 0$ of the adjoint equation $T(\hat\lambda)^* y = 0$ is called a left eigenvector of $T(\cdot)$, and the vector-scalar-vector triplet $(\hat y, \hat\lambda, \hat x)$ is called an eigentriplet of $T(\cdot)$.

The eigenvalue problem (115.1) is regular if $\det T(\lambda) \not\equiv 0$; otherwise it is called singular. The spectrum $\sigma(T(\cdot))$ of $T(\cdot)$ is the set of all eigenvalues of $T(\cdot)$.

An eigenvalue $\hat\lambda$ of $T(\cdot)$ has algebraic multiplicity $k$ if $\frac{d^l}{d\lambda^l}\det(T(\lambda))\big|_{\lambda=\hat\lambda} = 0$ for $l = 0, \dots, k-1$ and $\frac{d^k}{d\lambda^k}\det(T(\lambda))\big|_{\lambda=\hat\lambda} \neq 0$. An eigenvalue $\hat\lambda$ is simple if its algebraic multiplicity is one.

The geometric multiplicity of an eigenvalue $\hat\lambda$ is the dimension of the kernel $\ker(T(\hat\lambda))$ of $T(\hat\lambda)$. An eigenvalue $\hat\lambda$ is called semi-simple if its algebraic and geometric multiplicity coincide.

$T(\cdot): J \to \mathbb{R}^{n\times n}$ is real symmetric if $T(\lambda) = T(\lambda)^T$ for every $\lambda \in J \subseteq \mathbb{R}$. $T(\cdot): D \to \mathbb{C}^{n\times n}$ is complex symmetric if $T(\lambda) = T(\lambda)^T$ for every $\lambda \in D$. $T(\cdot): D \to \mathbb{C}^{n\times n}$ is Hermitian if $D$ is symmetric with respect to the real line and $T(\lambda)^* = T(\bar\lambda)$ for every $\lambda \in D$.

Facts:

1. For $A \in \mathbb{C}^{n\times n}$ and $T(\lambda) = \lambda I - A$, the terms eigenvalue, (left and right) eigenvector, eigenpair, eigentriplet, spectrum, algebraic and geometric multiplicity, and semi-simple have their standard meaning.

2. For linear eigenvalue problems, eigenvectors corresponding to distinct eigenvalues are linearly independent, which is not the case for nonlinear eigenvalue problems (cf. Example 1); left and right eigenvectors corresponding to distinct eigenvalues are orthogonal, which does not hold for nonlinear eigenproblems (cf. Example 2); and the algebraic multiplicities of the eigenvalues sum up to the dimension of the problem, whereas for nonlinear problems there may exist an infinite number of eigenvalues (cf. Example 2) and an eigenvalue may have any algebraic multiplicity (cf. Example 3).

3. [Sch08] If $\hat\lambda$ is an algebraically simple eigenvalue of $T(\cdot)$, then $\hat\lambda$ is geometrically simple.

4. [Neu85, Sch08] Let $(\hat y, \hat\lambda, \hat x)$ be an eigentriplet of $T(\cdot)$. Then $\hat\lambda$ is algebraically simple if and only if $\hat\lambda$ is geometrically simple and $\hat y^* T'(\hat\lambda)\hat x \neq 0$.

5. [Sch08] Let $D \subseteq \mathbb{C}$ and $E \subseteq \mathbb{C}^d$ be open sets. Let $T: D \times E \to \mathbb{C}^{n\times n}$ be continuously differentiable, and let $\hat\lambda$ be a simple eigenvalue of $T(\cdot, 0)$ with right and left eigenvectors $\hat x$ and $\hat y$ of unit norm. Then the first order perturbation expansion at $\hat\lambda$ reads

  $\lambda(\varepsilon) - \hat\lambda = -\frac{1}{\hat y^* T'(\hat\lambda, 0)\hat x} \sum_{j=1}^{d} \varepsilon_j\, \hat y^* T_{\varepsilon_j}(\hat\lambda, 0)\hat x + o(\|\varepsilon\|)$.

The normwise condition number for $\hat\lambda$ is given by

  $\kappa(\hat\lambda) = \limsup_{\varepsilon \to 0} \frac{|\lambda(\varepsilon) - \hat\lambda|}{\|\varepsilon\|} = \frac{1}{|\hat y^* T'(\hat\lambda, 0)\hat x|} \Big( \sum_{j=1}^{d} |\hat y^* T_{\varepsilon_j}(\hat\lambda, 0)\hat x|^2 \Big)^{1/2}$.

6. [Sch08] Let $(\hat y, \hat\lambda, \hat x)$ be an eigentriplet of $T(\cdot)$ with simple eigenvalue $\hat\lambda$. Then for sufficiently small $|\hat\lambda - \lambda|$,

  $T(\lambda)^{-1} = \frac{1}{\lambda - \hat\lambda}\, \frac{\hat x \hat y^*}{\hat y^* T'(\hat\lambda)\hat x} + O(1)$.

7. [Neu85] Let $\hat\lambda$ be a simple eigenvalue of $T(\cdot)$, and let $\hat x$ be a right eigenvector normalized such that $e^* \hat x = 1$ for some vector $e$. Then the matrix $B := T(\hat\lambda) + T'(\hat\lambda)\hat x e^*$ is nonsingular.

8. If $T(\cdot)$ is real symmetric and $\lambda$ is a real eigenvalue, then left and right eigenvectors corresponding to $\lambda$ coincide.

9. If $T(\cdot)$ is complex symmetric and $x$ is a right eigenvector, then $\bar x$ is a left eigenvector corresponding to the same eigenvalue.

10. If $T(\cdot)$ is Hermitian, then eigenvalues are real (and left and right eigenvectors corresponding to $\lambda$ coincide) or they come in pairs, i.e., if $(y, \lambda, x)$ is an eigentriplet of $T(\cdot)$, then so is $(x, \bar\lambda, y)$.

Examples:

1. For the quadratic eigenvalue problem $T(\lambda)x = 0$ with

  $T(\lambda) := \begin{bmatrix} 0 & -1 \\ 2 & 2 \end{bmatrix} + \lambda \begin{bmatrix} 7 & 5 \\ -3 & -3 \end{bmatrix} + \lambda^2 \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$   (115.2)

the distinct eigenvalues $\lambda = 1$ and $\lambda = 2$ share the eigenvector $[1; -2]$.

2. Let $T(\lambda)x := \begin{bmatrix} e^{i\lambda^2} & 1 \\ 1 & 1 \end{bmatrix} x = 0$. Then $T(\lambda)x = 0$ has a countable set of eigenvalues $\sqrt{2k\pi}$, $k \in \mathbb{N} \cup \{0\}$. $\hat\lambda = 0$ is an algebraically double and geometrically simple eigenvalue with left and right eigenvectors $\hat x = \hat y = [1; -1]$, and $\hat y^* T'(0)\hat x = 0$. Every $\hat\lambda_k = \sqrt{2k\pi}$, $k \neq 0$, is algebraically and geometrically simple with the same eigenvectors $\hat x$, $\hat y$ as before, and $\hat y^* T'(\hat\lambda_k)\hat x = 2\sqrt{2k\pi}\, i \neq 0$.

3. $T(\lambda) = (\lambda^k)$, $k \in \mathbb{N}$, has the eigenvalue $\hat\lambda = 0$ with algebraic multiplicity $k$.
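Facts 3 and 4 translate into a practical numerical test for algebraic simplicity: extract approximate right and left eigenvectors from the singular vectors of $T(\hat\lambda)$ and check $\hat y^* T'(\hat\lambda)\hat x \neq 0$. A small sketch for the matrix function of Example 2 (the eigenvalue $\sqrt{2\pi}$ is taken from the example):

import numpy as np

T  = lambda lam: np.array([[np.exp(1j * lam**2), 1.0], [1.0, 1.0]])
dT = lambda lam: np.array([[2j * lam * np.exp(1j * lam**2), 0.0], [0.0, 0.0]])

lam = np.sqrt(2 * np.pi)            # an eigenvalue of T, cf. Example 2
U, s, Vh = np.linalg.svd(T(lam))
x = Vh[-1].conj()                   # right singular vector to sigma_min
y = U[:, -1]                        # left singular vector to sigma_min
print(s[-1])                        # ~ 0: (lam, x) is an eigenpair
print(y.conj() @ dT(lam) @ x)       # != 0: lam is algebraically simple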

115.2 Analytic matrix functions

In this section we consider the eigenvalue problem (115.1), where $T(\cdot): D \to \mathbb{C}^{n\times n}$ is a regular matrix function which is analytic in a neighborhood of an eigenvalue $\hat\lambda$.

Definitions:

A sequence of vectors $x_0, x_1, \dots, x_{r-1}$ is called a Jordan chain (of length $r$) corresponding to $\hat\lambda$ if $x_0 \neq 0$ and

  $\sum_{k=0}^{l} \frac{1}{k!} \frac{d^k T(\lambda)}{d\lambda^k}\Big|_{\lambda=\hat\lambda} x_{l-k} = 0$ for $l = 0, \dots, r-1$.

$x_0$ is an eigenvector, and $x_1, \dots, x_{r-1}$ are generalized eigenvectors.

Let $x_0$ be an eigenvector corresponding to an eigenvalue $\hat\lambda$. The maximal length of a Jordan chain that starts with $x_0$ is called the multiplicity of $x_0$.

An eigenvalue $\hat\lambda$ is said to be normal if it is a discrete point in $\sigma(T(\cdot))$ and the multiplicity of each corresponding eigenvector is finite.

An analytic function $x: D \to \mathbb{C}^n$ is called a root function of $T(\cdot)$ at $\hat\lambda \in D$ if $T(\hat\lambda)x(\hat\lambda) = 0$ and $x(\hat\lambda) \neq 0$. The multiplicity of $\hat\lambda$ as a zero of $T(\lambda)x(\lambda)$ is called the multiplicity of $x(\cdot)$.

The rank of an eigenvector $x_0$ is the maximum of the multiplicities of all root functions $x(\cdot)$ such that $x(\hat\lambda) = x_0$. A root function $x(\cdot)$ is called a maximal root function if the multiplicity of $x(\cdot)$ is equal to the rank of $x_0 := x(\hat\lambda)$.

Let $x_0^{(1)} \in \ker T(\hat\lambda)$ be an eigenvector with maximal rank, and let $x^{(1)}(\lambda) = \sum_{j=0}^{\infty} x_j^{(1)} (\lambda - \hat\lambda)^j$ be a maximal root function such that $x^{(1)}(\hat\lambda) = x_0^{(1)}$. Suppose that the root functions $x^{(k)}(\lambda) = \sum_{j=0}^{\infty} x_j^{(k)} (\lambda - \hat\lambda)^j$, $k = 1, \dots, i-1$, are already constructed, and let $x_0^{(i)}$ be an eigenvector with maximal rank in some direct complement to the linear span of the vectors $x_0^{(1)}, \dots, x_0^{(i-1)}$ in $\ker T(\hat\lambda)$. Let $x^{(i)}(\lambda) = \sum_{j=0}^{\infty} x_j^{(i)} (\lambda - \hat\lambda)^j$ be a maximal root function such that $x^{(i)}(\hat\lambda) = x_0^{(i)}$. Then the ordered set

  $x_0^{(1)}, \dots, x_{r_1-1}^{(1)},\ x_0^{(2)}, \dots, x_{r_2-1}^{(2)},\ \dots,\ x_0^{(k)}, \dots, x_{r_k-1}^{(k)}$,

where $k = \dim \ker T(\hat\lambda)$ and $r_j = \operatorname{rank} x_0^{(j)}$, is called a canonical set of Jordan chains, and the ordered set $x^{(1)}(\lambda), \dots, x^{(k)}(\lambda)$ is called a canonical system of root functions.

Let $X \in \mathbb{C}^{n\times\alpha}$ contain in its columns the vectors of a canonical set of Jordan chains, and let $J = \operatorname{diag}(J_1, \dots, J_k)$, where $J_j$ is a Jordan block of size $r_j \times r_j$ corresponding to $\hat\lambda$. Then the pair $(X, J)$ is called a Jordan pair.

Let $x^{(1)}(\lambda), \dots, x^{(k)}(\lambda)$ be a canonical system of root functions at $\hat\lambda$, and let $x^{(k+1)}, \dots, x^{(n)} \in \mathbb{C}^n$ be such that $x^{(1)}(\hat\lambda), \dots, x^{(k)}(\hat\lambda), x^{(k+1)}, \dots, x^{(n)}$ is a basis of $\mathbb{C}^n$. Then the system $x^{(1)}(\lambda), \dots, x^{(k)}(\lambda), x^{(k+1)}, \dots, x^{(n)}$ is called an extended canonical system of root functions. To the constant functions $x^{(k+1)}, \dots, x^{(n)} \in \mathbb{C}^n$ (which are not root functions in the strict sense of the definition) the multiplicity 0 is assigned.

Let $\hat\lambda$ be an eigenvalue of $T(\cdot)$, and let $\Phi(\cdot)$ be an analytic matrix function such that its columns form an extended canonical system of root functions of $T(\cdot)$ at $\hat\lambda$. Then (cf. [GKS93]) in a neighborhood of $\hat\lambda$,

  $T(\lambda)\Phi(\lambda) = P(\lambda)D(\lambda)$,   (115.3)

where $D(\lambda)$ is a diagonal matrix with diagonal entries $(\lambda - \hat\lambda)^{\kappa_1}, \dots, (\lambda - \hat\lambda)^{\kappa_n}$ and $P(\cdot)$ is a matrix function analytic at $\hat\lambda$ such that $\det P(\hat\lambda) \neq 0$. Furthermore, the exponents $\kappa_1, \dots, \kappa_n$ are the multiplicities of the columns of $\Phi(\cdot)$, also called partial multiplicities of $T(\cdot)$ at $\hat\lambda$. (115.3) is the local Smith form of $T(\cdot)$ in a neighborhood of $\hat\lambda$.

A pair of matrices $(Y, Z) \in \mathbb{C}^{n\times p} \times \mathbb{C}^{p\times p}$ is a regular pair if for some integer $l \geq 1$

  $\operatorname{rank} \begin{bmatrix} Y \\ YZ \\ \vdots \\ YZ^{l-1} \end{bmatrix} = p$.

The number $p$ is called the order of the regular pair $(Y, Z)$.

Facts:

The following facts, for which no specific reference is given, can be found in [GLR82, GR81].

1. In contrast to linear eigenvalue problems, the vectors in a Jordan chain need not be linearly independent. Even the zero vector is admissible as a generalized eigenvector.

2. Let $x(\cdot)$ be a root function at $\hat\lambda$, and let $x^{(j)}$ denote the $j$th derivative of $x$. Then the vectors $x_j := x^{(j)}(\hat\lambda)/j!$, $j = 0, \dots, r-1$, form a Jordan chain at $\hat\lambda$, where $r$ denotes the multiplicity of $x(\cdot)$.

3. The multiplicity of a root function at $\hat\lambda$ (and hence the rank of an eigenvector) is at most the algebraic multiplicity of $\hat\lambda$.

4. The numbers $r_1 \geq \dots \geq r_k$ in a Jordan pair are uniquely determined.

5. The number $\alpha := r_1 + \dots + r_k$ is the algebraic multiplicity of the eigenvalue $\hat\lambda$.

6. [GKS93] Let $y_1, \dots, y_l: D \to \mathbb{C}^n$ be a set of root functions at $\hat\lambda$ with multiplicities $s_1 \geq \dots \geq s_l$ such that $y_1(\hat\lambda), \dots, y_l(\hat\lambda) \in \ker T(\hat\lambda)$ are linearly independent. If the root functions $x_1, \dots, x_k$ define a canonical set of Jordan chains of $T(\cdot)$ at $\hat\lambda$ with multiplicities $r_1 \geq \dots \geq r_k$, then $l \leq k$ and $s_i \leq r_i$ for $i = 1, \dots, l$. Moreover, $y_1, \dots, y_l$ define a canonical set of Jordan chains of $T(\cdot)$ at $\hat\lambda$ if and only if $l = k$ and $s_j = r_j$ for $j = 1, \dots, l$.

7. Let $S(\cdot)$ be an analytic matrix function with $\det S(\hat\lambda) \neq 0$. Then $x_0, \dots, x_k$ is a Jordan chain of $T(\cdot)S(\cdot)$ corresponding to $\hat\lambda$ if and only if the vectors $y_0, \dots, y_k$ given by

  $y_j = \sum_{i=0}^{j} \frac{1}{i!} S^{(i)}(\hat\lambda)\, x_{j-i}$,  $j = 0, \dots, k$,

form a Jordan chain of $T(\cdot)$ corresponding to $\hat\lambda$.

8. For $S$ as in the last fact, the Jordan chains of $T(\cdot)$ coincide with those of $S(\cdot)T(\cdot)$ corresponding to the same $\hat\lambda$.

9. Two regular analytic matrix functions $T_1(\cdot)$ and $T_2(\cdot)$ have a common Jordan pair at $\hat\lambda$ if and only if $T_2(\lambda)T_1^{-1}(\lambda)$ is analytic and invertible at $\hat\lambda$.

10. [GKS93] Let $T(\cdot)$, $\Phi(\cdot)$, $D(\cdot)$, and $P(\cdot)$ be regular $n\times n$ matrix functions, analytic at $\hat\lambda$, such that $T(\lambda)\Phi(\lambda) = P(\lambda)D(\lambda)$ in a neighborhood of $\hat\lambda$. Assume that $\Phi(\hat\lambda)$ is invertible and that $D(\cdot)$ is a diagonal matrix polynomial with diagonal entries $(\lambda-\hat\lambda)^{\kappa_1}, \dots, (\lambda-\hat\lambda)^{\kappa_n}$, where $\kappa_1 \geq \dots \geq \kappa_n$. Then the following three conditions are equivalent: (i) the columns of $\Phi(\cdot)$ form an extended canonical system of root functions of $T(\cdot)$ at $\hat\lambda$ with partial multiplicities $\kappa_1, \dots, \kappa_n$; (ii) $\det P(\hat\lambda) \neq 0$; (iii) $\sum_{j=1}^{n} \kappa_j$ is the algebraic multiplicity of $\hat\lambda$.

11. [GKS93] Let $x(\lambda) = \sum_{j=0}^{\infty} (\lambda - \hat\lambda)^j x_j$ be an analytic $\mathbb{C}^n$-vector function with $x_0 \neq 0$, and set $X := [x_0, \dots, x_{p-1}]$. Then $x(\cdot)$ is a root function of $T(\cdot)$ at $\hat\lambda$ of multiplicity at least $p$ if and only if $T(\lambda)X(\lambda I - J_{\hat\lambda,p})^{-1}$ is an $n\times p$ analytic matrix function. Here $J_{\hat\lambda,p}$ denotes a $p\times p$ Jordan block with eigenvalue $\hat\lambda$.

12. [AST09] $T(\cdot)$ admits a representation $P(\lambda)T(\lambda)Q(\lambda) = D(\lambda)$, where $P(\cdot)$ and $Q(\cdot)$ are regular analytic matrix functions with constant nonzero determinants, and $D(\lambda) = \operatorname{diag}\{d_j(\lambda)\}_{j=1,\dots,n}$ is a diagonal matrix of analytic functions such that $d_j(\lambda)/d_{j-1}(\lambda)$ is analytic for $j = 2, 3, \dots, n$. This representation is also called a local Smith form.

13. [AST09] With the representation in the last fact, if $q_j(\lambda)$ is the $j$th column of $Q$ and $\hat\lambda$ is a zero of $d_j(\cdot)$, then $(\hat\lambda, q_j(\hat\lambda))$ is an eigenpair of $T(\cdot)$.

14. The nonzero partial multiplicities $\kappa_j$ in the local Smith form of $T(\cdot)$ at $\hat\lambda$ coincide with the lengths $r_1 \geq \dots \geq r_k$ of the Jordan chains in a canonical set.

15. A Jordan pair $(X, J)$ of $T(\cdot)$ at an eigenvalue $\hat\lambda$ is regular.

16. [GR81] Let $\hat\lambda$ be an eigenvalue of $T(\cdot)$ with algebraic multiplicity $\alpha$, and let $(Y, Z) \in \mathbb{C}^{n\times\alpha} \times \mathbb{C}^{\alpha\times\alpha}$ be a pair of matrices such that $\sigma(Z) = \{\hat\lambda\}$. $(Y, Z)$ is similar to a Jordan pair $(X, J)$ (i.e., $Y = XS$ and $Z = S^{-1}JS$ for some invertible matrix $S$) if and only if $(Y, Z)$ is regular and

  $\sum_{j=0}^{\infty} T_j\, Y (Z - \hat\lambda I)^j = 0$,

where $T_j = \frac{1}{j!} T^{(j)}(\hat\lambda)$ (note that only a finite number of terms on the left-hand side is different from zero, because $\sigma(Z) = \{\hat\lambda\}$).

17. [HL99] Suppose that $A(\lambda)$ and $B(\lambda)$ are analytic matrix-valued functions such that $A(\hat\lambda)$ and $B(\hat\lambda)$ are nonsingular. Then the partial multiplicities of the eigenvalue $\hat\lambda$ of $T(\lambda)$ and of $\tilde T(\lambda) := B(\lambda)T(\lambda)A(\lambda)$ coincide.

18. [HL99] Suppose that a matrix-valued function $T(\lambda, \tau)$ depends analytically on $\lambda$ and continuously on $\tau$, and that $\hat\lambda$ is an eigenvalue of $T(\cdot, 0)$ of algebraic multiplicity $\alpha$. Then there exists a neighborhood $O$ of $\hat\lambda$ such that, for all $\tau$ sufficiently close to the origin, there are exactly $\alpha$ eigenvalues (counted with algebraic multiplicities) of the matrix-valued function $T(\cdot, \tau)$ in $O$.

Examples:

1. [GLR82] For $T(\lambda) = \begin{bmatrix} \lambda^2 & \lambda \\ 0 & \lambda^2 \end{bmatrix}$ we have $\det T(\lambda) = \lambda^4$, and hence $\hat\lambda = 0$ is an eigenvalue of $T(\cdot)$ with algebraic multiplicity 4 and geometric multiplicity 2. For an eigenvector $x_0 = [x_{01}; x_{02}]$, the first generalized eigenvector $x_1$ satisfies $T'(0)x_0 + T(0)x_1 = 0$, i.e., $\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} x_0 = 0$. Hence $x_1$ exists if and only if $x_{02} = 0$, and $x_1$ can be taken completely arbitrary. For a second generalized eigenvector $x_2$ we have $\frac{1}{2}T''(0)x_0 + T'(0)x_1 + T(0)x_2 = 0$, i.e., $x_0 + \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} x_1 = 0$, i.e., $x_{01} = -x_{12}$, and if this equation is satisfied, $x_2$ can be chosen arbitrarily. The condition for the third generalized eigenvector $x_3$ reads $x_1 + \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} x_2 = 0$, which implies $x_{12} = 0$ and is therefore contradictory.

To summarize, the length of a Jordan chain cannot exceed 3. Jordan chains of length 1 are $x_0$, $x_0 \neq 0$; Jordan chains of length 2 are $x_0 = [x_{01}; 0]$, $x_1$ with $x_{01} \neq 0$ and $x_1$ arbitrary; and Jordan chains of length 3 are $x_0 = [x_{01}; 0]$, $x_1 = [x_{11}; -x_{01}]$, $x_2$, where $x_{01} \neq 0$ and $x_{11}$ and $x_2$ are arbitrary. One example of a canonical system of Jordan chains is $x_0^{(1)} = [1; 0]$, $x_1^{(1)} = [0; -1]$, $x_2^{(1)} = [1; 1]$, $x_0^{(2)} = [0; 1]$.

$T(0) = 0$ implies that $x(\cdot)$ is a root function at $\hat\lambda = 0$ whenever $x_1$ and $x_2$ are analytic and $x(0) \neq 0$. From $T(\lambda)x(\lambda) = [\lambda^2 x_1(\lambda) + \lambda x_2(\lambda);\ \lambda^2 x_2(\lambda)]$ one obtains that $x$ has at least the multiplicity 2 if $x_2(0) = 0$, that the multiplicity is 3 if $x_2(\lambda) = -\lambda x_1(\lambda)$, and that a higher multiplicity is not possible. In the latter case one obtains a Jordan chain as $[x_1(0); 0]$, $[x_1'(0); -x_1(0)]$, $[x_1''(0)/2; -x_1'(0)]$.

2. For the quadratic eigenvalue problem in (115.2), $\det T(\lambda) = \lambda^4 - \lambda^3 - 3\lambda^2 + \lambda + 2$. Hence, $\hat\lambda = -1$ is an eigenvalue with algebraic multiplicity 2 and geometric multiplicity 1. From

  $T(-1)x_0 = \begin{bmatrix} -6 & -6 \\ 6 & 6 \end{bmatrix} x_0 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$,  $T(-1)x_1 + T'(-1)x_0 = \begin{bmatrix} -6 & -6 \\ 6 & 6 \end{bmatrix} x_1 + \begin{bmatrix} 5 & 5 \\ -5 & -5 \end{bmatrix} x_0 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$

it follows that $x = [1; -1]$ is an eigenvector corresponding to $\hat\lambda$, and a generalized eigenvector as well. Then for $X = \begin{bmatrix} 1 & 1 \\ -1 & -1 \end{bmatrix}$ and $J = \begin{bmatrix} -1 & 1 \\ 0 & -1 \end{bmatrix}$ the pair $(X, J)$ is a regular pair of order 2, namely the Jordan pair corresponding to $\hat\lambda = -1$.
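The Jordan chain conditions in the definition above can be checked mechanically from the Taylor coefficients $T_k = T^{(k)}(\hat\lambda)/k!$. A small sketch verifying the chain of length 3 from Example 1:

import numpy as np

# Taylor coefficients of T(lam) = [[lam^2, lam], [0, lam^2]] at lam = 0.
T0 = np.zeros((2, 2))
T1 = np.array([[0.0, 1.0], [0.0, 0.0]])   # T'(0)
T2 = np.eye(2)                            # T''(0)/2
Tk = [T0, T1, T2]

chain = [np.array([1.0, 0.0]),            # x0
         np.array([0.0, -1.0]),           # x1
         np.array([1.0, 1.0])]            # x2

for l in range(3):                        # sum_{k=0}^{l} T_k x_{l-k} = 0
    r = sum(Tk[k] @ chain[l - k] for k in range(l + 1))
    print(l, np.linalg.norm(r))           # all three residuals vanish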

115.3 Variational Characterization of Eigenvalues

Variational characterizations of eigenvalues are very powerful tools when studying selfadjoint linear operators on a Hilbert space. Many things can be easily proved using these characterizations, for example, bounds for eigenvalues, comparison theorems, interlacing results, and monotonicity of eigenvalues, to name just a few. This section presents similar results for nonlinear eigenvalue problems. A minmax characterization was first proved in [Duf55] for overdamped quadratic eigenproblems, generalized in [Rog64] to general overdamped problems, and in [VW82] to non-overdamped problems. Although the characterizations also hold for infinite dimensional problems [Had68, VW82], the presentation here is restricted to the finite dimensional case.

We assume in this whole section that $J \subseteq \mathbb{R}$ is an open interval (which may be unbounded), and we consider a family of Hermitian matrices $T: J \to \mathbb{C}^{n\times n}$ depending continuously on the parameter $\lambda \in J$, such that the following two conditions are satisfied:

(i) For every $x \in \mathbb{C}^n$, $x \neq 0$, the real equation

  $f(\lambda; x) := x^* T(\lambda)x = 0$   (115.4)

has at most one solution $\lambda =: p(x)$ in $J$. Then (115.4) implicitly defines a (nonlinear) functional $p$ on some domain $D(p)$.

(ii) $(\lambda - p(x))\, f(\lambda; x) > 0$ for every $x \in D(p)$ and every $\lambda \in J$, $\lambda \neq p(x)$.   (115.5)

Definitions:

The functional $p: D(p) \to J$ is called the Rayleigh functional. If $D(p) = \mathbb{C}^n \setminus \{0\}$, then the problem $T(\lambda)x = 0$ is called overdamped.

An eigenvalue $\hat\lambda \in J$ of $T(\cdot)$ is a $j$th eigenvalue if $\mu = 0$ is the $j$th largest eigenvalue of the matrix $T(\hat\lambda)$.

Facts:

In this subsection we denote by $S_j$ the set of all $j$ dimensional subspaces of $\mathbb{C}^n$. The following facts, for which no specific reference is given, can be found in [Had68, VW82, Vos09].

1. $D(p)$ is an open set in $\mathbb{C}^n$.

2. $p(\alpha x) = p(x)$ for every $x \in D(p)$ and every $\alpha \in \mathbb{C} \setminus \{0\}$.

3. If $T(\cdot)$ is differentiable in a neighborhood of an eigenvalue $\hat\lambda$ and $\hat x^* T'(\hat\lambda)\hat x \neq 0$ for a corresponding eigenvector $\hat x$, then $\hat x$ is a stationary point of $p$, i.e., $p(\hat x + h) - p(\hat x) = o(\|h\|)$. In the real case $T: J \to \mathbb{R}^{n\times n}$, $J \subseteq \mathbb{R}$, we have $\nabla p(\hat x) = 0$.

4. For every $j \in \{1, \dots, n\}$ there is at most one $j$th eigenvalue of $T(\cdot)$.

5. $T(\cdot)$ has at most $n$ eigenvalues in $J$.

6. [Rog64] If $T(\cdot)$ is overdamped, then $T(\cdot)$ has exactly $n$ eigenvalues in $J$.

7. If $\lambda_j := \inf_{V \in S_j,\, V \cap D(p) \neq \emptyset}\ \sup_{x \in V \cap D(p)} p(x) \in J$, then $\lambda_j$ is a $j$th eigenvalue of $T(\cdot)$.

8. (minmax characterization) If $\lambda_j \in J$ is a $j$th eigenvalue of $T(\cdot)$, then

  $\lambda_j = \min_{V \in S_j,\, V \cap D(p) \neq \emptyset}\ \max_{x \in V \cap D(p)} p(x)$.

The minimum is attained for an invariant subspace of the matrix $T(\lambda_j)$ corresponding to its $j$ largest eigenvalues, and the maximum is attained for some $x \in \ker T(\lambda_j)$.

9. Let $\lambda_1 := \inf_{x \in D(p)} p(x) \in J$ and $\lambda_j \in J$ for some $j \in \{1, \dots, n\}$. Then for every $k \in \{1, \dots, j\}$ there exists $U_k \in S_k$ with $U_k \subseteq D(p) \cup \{0\}$ and $\lambda_k = \max_{x \in U_k,\, x \neq 0} p(x)$. Hence,

  $\lambda_k = \min_{V \in S_k,\, V \subseteq D(p) \cup \{0\}}\ \max_{x \in V,\, x \neq 0} p(x) \in J$ for $k = 1, \dots, j$.

10. [Vos03] (maxmin characterization) Assume that there exists a $j$th eigenvalue $\lambda_j \in J$. Then

  $\lambda_j = \max_{V \in S_{j-1},\, V^\perp \cap D(p) \neq \emptyset}\ \inf_{x \in V^\perp \cap D(p)} p(x)$.

The maximum is attained for every invariant subspace of $T(\lambda_j)$ corresponding to its $j-1$ largest eigenvalues.

11. Let $\lambda_j \in J$ be a $j$th eigenvalue of $T(\cdot)$ and $\lambda \in J$, and denote by $\mu_j(\lambda) := \max_{V \in S_j} \min_{x \in V,\, x \neq 0} x^* T(\lambda)x / (x^* x)$ the $j$th largest eigenvalue of $T(\lambda)$. Then $\lambda < \lambda_j$, $\lambda = \lambda_j$, or $\lambda > \lambda_j$ if and only if $\mu_j(\lambda) < 0$, $\mu_j(\lambda) = 0$, or $\mu_j(\lambda) > 0$, respectively.

12. (orthogonality) [Had68] Let $T(\cdot)$ be differentiable in $J$. Then eigenvectors can be chosen orthogonal with respect to the generalized scalar product

  $[x, y] := \begin{cases} y^* \dfrac{T(p(x)) - T(p(y))}{p(x) - p(y)}\, x, & \text{if } p(x) \neq p(y), \\ y^*\, T'(p(x))\, x, & \text{if } p(x) = p(y), \end{cases}$

which is symmetric and homogeneous, but in general not bilinear. If $T(\cdot)$ is differentiable and condition (ii) is strengthened to $x^* T'(p(x))x > 0$ for every $x \in D(p)$, then $[\cdot,\cdot]$ is definite, i.e., $[x, x] > 0$ for every $x \in D(p)$.

13. (Rayleigh's principle) Assume that $J$ contains $\lambda_1, \dots, \lambda_{j-1}$, where $\lambda_k$ is a $k$th eigenvalue of $T(\cdot)$, and let $x_k$, $k = 1, \dots, j-1$, be corresponding eigenvectors. If $\lambda_j := \inf\{p(x) : x \in D(p),\ [x, x_k] = 0,\ k = 1, \dots, j-1\} \in J$, then $\lambda_j$ is a $j$th eigenvalue of $T(\cdot)$.

14. (Sylvester's law; overdamped case) Assume that $T(\cdot)$ is overdamped. For $\sigma \in J$ let $(\pi, \nu, \delta)$ be the inertia of $T(\sigma)$. Then $T(\cdot)$ has $\pi$ eigenvalues that are smaller than $\sigma$ and $\nu$ eigenvalues that exceed $\sigma$, and if $\delta \neq 0$, then $\sigma$ is an eigenvalue of multiplicity $\delta$.

15. (Sylvester's law; extreme eigenvalues) Assume that $T(\mu)$ is negative definite for some $\mu \in J$, and for $\sigma > \mu$ let $(\pi, \nu, \delta)$ be the inertia of $T(\sigma)$. Then $T(\cdot)$ has exactly $\pi$ eigenvalues in $J$ that are smaller than $\sigma$.

16. (Sylvester's law; general case) Let $\mu \in J$, and assume that for every $r$ dimensional subspace $V$ with $V \cap D(p) \neq \emptyset$ there exists $x \in V \cap D(p)$ with $p(x) > \mu$. For $\sigma \in J$, $\sigma > \mu$, let $(\pi, \nu, \delta)$ be the inertia of $T(\sigma)$. Then for $j = r, \dots, \pi$ there exists a $j$th eigenvalue $\lambda_j$ of $T(\cdot)$ in $[\mu, \sigma)$.
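Facts 14-16 are the basis of "slicing the spectrum" strategies: eigenvalues are counted by inertias, which require only a symmetric factorization of $T(\sigma)$. A sketch for the overdamped case (Fact 14), with a made-up overdamped quadratic pencil; here the counting is done on the interval $(\gamma_-, 0)$ that carries the Rayleigh functional $p_+$ (cf. Example 1 below), and it is cross-checked against the eigenvalues obtained by linearization:

import numpy as np

A = np.eye(3)
B = 10.0 * np.eye(3)
C = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]])

def inertia(M, tol=1e-10):
    w = np.linalg.eigvalsh(M)
    return (np.sum(w > tol), np.sum(w < -tol), np.sum(np.abs(w) <= tol))

sigma = -0.35
pi_, nu_, delta_ = inertia(sigma**2 * A + sigma * B + C)

# cross-check: since A = I, the eigenvalues solve lam z = L z
n = A.shape[0]
L = np.block([[np.zeros((n, n)), np.eye(n)], [-C, -B]])
lam = np.sort(np.linalg.eigvals(L).real)
p_plus = lam[n:]                      # the n eigenvalues in (gamma_-, 0)
print(pi_, np.sum(p_plus < sigma))    # the two counts agree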

Examples:

1. [Duf55] The quadratic pencil $Q(\lambda) := \lambda^2 A + \lambda B + C$ with positive definite $A, B, C \in \mathbb{C}^{n\times n}$ is overdamped if and only if

  $d(x) := (x^* Bx)^2 - 4(x^* Ax)(x^* Cx) > 0$ for every $x \in \mathbb{C}^n \setminus \{0\}$.

For $x \neq 0$ the quadratic equation $x^* Q(\lambda)x = 0$ has the two real solutions $p_\pm(x) = \big(-x^* Bx \pm \sqrt{d(x)}\big)/(2x^* Ax)$, and $\gamma_- := \sup_{x\neq 0} p_-(x) < \gamma_+ := \inf_{x\neq 0} p_+(x)$. $Q(\cdot)$ has $n$ eigenvalues in $(-\infty, \gamma_+)$, which are minmax values of $p_-$, and $n$ eigenvalues in $(\gamma_-, 0)$, which are minmax values of $p_+$.

2. Assume that $Q(\cdot)$ as in the last example is not necessarily overdamped, and let $\operatorname{in}(Q(\sigma)) = (\pi, \nu, \delta)$ denote the inertia of $Q(\sigma)$. If $\sigma < \gamma_+ := \inf_{x\neq 0}\{p_+(x) : p_+(x) \in \mathbb{R}\}$, then $Q(\cdot)$ has exactly $\nu$ eigenvalues in $(-\infty, \sigma)$, and if $\sigma > \gamma_- := \sup_{x\neq 0}\{p_-(x) : p_-(x) \in \mathbb{R}\}$, then $Q(\cdot)$ has $\nu$ eigenvalues in $(\sigma, 0)$. If $\mu_{\min}$ and $\mu_{\max}$ are the minimal and maximal eigenvalues of $Cx = \mu Ax$, then $-\sqrt{\mu_{\max}} \leq \gamma_+$ and $\gamma_- \leq -\sqrt{\mu_{\min}}$. If $\kappa_{\min}$ and $\kappa_{\max}$ are the minimal and maximal eigenvalues of $Cx = \kappa Bx$, respectively, then $-2\kappa_{\max} \leq \gamma_+$ and $\gamma_- \leq -2\kappa_{\min}$.

115.4 General Rayleigh Functionals

Whereas Section 115.3 presupposes the existence and uniqueness of a Rayleigh functional for problems allowing for a variational characterization, this section extends its definition to more general eigenproblems. It collects results on the existence and approximation properties of a Rayleigh functional in a vicinity of eigenvectors corresponding to algebraically simple eigenvalues. The material in this section is mostly taken from [Sch08, SS10].

Definitions:

Let $T: D \to \mathbb{C}^{n\times n}$ be a matrix valued mapping which is analytic, or which is differentiable with Lipschitz continuous derivative in the real case. Let $(\hat\lambda, \hat x)$ be an eigenpair of $T(\cdot)$, and define the neighborhoods $B(\hat\lambda, \tau) := \{\lambda \in \mathbb{C} : |\lambda - \hat\lambda| < \tau\}$ and $K_\varepsilon(\hat x) := \{x \in \mathbb{C}^n : \angle(\operatorname{Span}\{x\}, \operatorname{Span}\{\hat x\}) \leq \varepsilon\}$ of $\hat\lambda$ and $\hat x$, respectively.

$p: K_\varepsilon(\hat x) \to B(\hat\lambda, \tau)$ is a (one-sided) Rayleigh functional if the following conditions hold:

(i) $p(\alpha x) = p(x)$ for every $\alpha \in \mathbb{C}$, $\alpha \neq 0$;
(ii) $x^* T(p(x))x = 0$ for every $x \in K_\varepsilon(\hat x)$;
(iii) $x^* T'(p(x))x \neq 0$ for every $x \in K_\varepsilon(\hat x)$.

Let $(\hat y, \hat\lambda, \hat x)$ be an eigentriplet of $T(\cdot)$. $p: K_\varepsilon(\hat x) \times K_\varepsilon(\hat y) \to B(\hat\lambda, \tau)$ is a two-sided Rayleigh functional if the following conditions hold for every $x \in K_\varepsilon(\hat x)$ and $y \in K_\varepsilon(\hat y)$:

(i) $p(\alpha x, \beta y) = p(x, y)$ for every $\alpha, \beta \in \mathbb{C} \setminus \{0\}$;
(ii) $y^* T(p(x, y))x = 0$;
(iii) $y^* T'(p(x, y))x \neq 0$.

The generalized Rayleigh quotient (which was introduced in [Lan61] only for polynomial eigenvalue problems) is defined as

  $p_L: K_\varepsilon(\hat y) \times B(\hat\lambda, \tau) \times K_\varepsilon(\hat x) \to B(\hat\lambda, \tau)$,  $p_L(y, \lambda, x) := \lambda - \frac{y^* T(\lambda)x}{y^* T'(\lambda)x}$.

Facts:

The following facts can be found in [Sch08, SS10].

1. Let $(\hat y, \hat\lambda, \hat x)$ be an eigentriplet of $T(\cdot)$ with $\|\hat x\| = \|\hat y\| = 1$, and assume that $\hat y^* T'(\hat\lambda)\hat x \neq 0$. Then there exist $\varepsilon > 0$ and $\tau > 0$ such that the two-sided Rayleigh functional is defined in $K_\varepsilon(\hat x) \times K_\varepsilon(\hat y)$, and

  $|p(x, y) - \hat\lambda| \leq \frac{8}{3}\, \frac{\|T(\hat\lambda)\|}{|\hat y^* T'(\hat\lambda)\hat x|}\, \tan\xi\, \tan\eta$,

where $\xi := \angle(\operatorname{Span}\{x\}, \operatorname{Span}\{\hat x\})$ and $\eta := \angle(\operatorname{Span}\{y\}, \operatorname{Span}\{\hat y\})$.

2. Under the conditions of Fact 1, let $\xi < \pi/3$ and $\eta < \pi/3$. Then

  $|p(x, y) - \hat\lambda| \leq \frac{32}{3}\, \frac{\|T(\hat\lambda)\|}{|\hat y^* T'(\hat\lambda)\hat x|}\, \|x - \hat x\|\, \|y - \hat y\|$.

3. Under the conditions of Fact 1, the two-sided Rayleigh functional is stationary at $(\hat x, \hat y)$, i.e., $p(\hat x + s, \hat y + t) - \hat\lambda = O((\|s\| + \|t\|)^2)$.

4. Let $(\hat\lambda, \hat x)$ be an eigenpair of $T(\cdot)$ with $\|\hat x\| = 1$ and $\hat x^* T'(\hat\lambda)\hat x \neq 0$, and suppose that $T(\hat\lambda)^* = T(\hat\lambda)$. Then there exist $\varepsilon > 0$ and $\tau > 0$ such that the one-sided Rayleigh functional $p(\cdot)$ is defined in $K_\varepsilon(\hat x)$, and

  $|p(x) - \hat\lambda| \leq \frac{8}{3}\, \frac{\|T(\hat\lambda)\|}{|\hat x^* T'(\hat\lambda)\hat x|}\, \tan^2\xi$,  where $\xi := \angle(\operatorname{Span}\{x\}, \operatorname{Span}\{\hat x\})$.

5. Let $(\hat\lambda, \hat x)$ be an eigenpair of $T(\cdot)$ with $\|\hat x\| = 1$ and $\hat x^* T'(\hat\lambda)\hat x \neq 0$. Then there exist $\varepsilon > 0$ and $\tau > 0$ such that the one-sided Rayleigh functional $p(\cdot)$ is defined in $K_\varepsilon(\hat x)$, and

  $|p(x) - \hat\lambda| \leq \frac{10}{3}\, \frac{\|T(\hat\lambda)\|}{|\hat x^* T'(\hat\lambda)\hat x|}\, \tan\xi$,  where $\xi := \angle(\operatorname{Span}\{x\}, \operatorname{Span}\{\hat x\})$.

6. Let $\hat x$ be a right eigenvector of $T(\cdot)$ corresponding to $\hat\lambda$ with $\hat x^* T'(\hat\lambda)\hat x \neq 0$. The one-sided Rayleigh functional $p$ is stationary at $\hat x$ only if $\hat x$ is also a left eigenvector.

7. The generalized Rayleigh quotient $p_L$ is obtained when applying Newton's method to the equation defining the two-sided Rayleigh functional for fixed $x$ and $y$.

8. [Lan61] Let $(\hat y, \hat\lambda, \hat x)$ be an eigentriplet of $T(\cdot)$ with $\hat y^* T'(\hat\lambda)\hat x \neq 0$. Then the generalized Rayleigh quotient $p_L$ is stationary at $(\hat y, \hat\lambda, \hat x)$.

9. Under the conditions of Fact 1, the generalized Rayleigh quotient $p_L$ is defined for all $\lambda \in B(\hat\lambda, \tau)$ and $(x, y) \in K_\varepsilon(\hat x) \times K_\varepsilon(\hat y)$, and

  $|p_L(y, \lambda, x) - \hat\lambda| \leq \frac{4\|T(\hat\lambda)\|}{|\hat y^* T'(\hat\lambda)\hat x|}\, \tan\xi\, \tan\eta + \frac{2L}{|\hat y^* T'(\hat\lambda)\hat x|}\, \frac{|\lambda - \hat\lambda|^2}{\cos\xi\, \cos\eta}$,

where $L$ denotes the Lipschitz constant of $T'(\cdot)$.

115.5 Methods for dense eigenvalue problems

The size of the eigenvalue problems that can be treated with the numerical methods considered in this section is limited to a few thousand, depending on the available storage capacity. Moreover, these methods require several factorizations of varying matrices to approximate one eigenvalue, and they are therefore not appropriate for large and sparse problems. However, they are needed to solve the projected eigenproblems in most of the iterative projection methods for sparse problems.

For general nonlinear eigenvalue problems, the classical approach is to formulate the eigenvalue problem as a system of nonlinear equations and to use variations of Newton's method or the inverse iteration method. Thus, these methods are local and therefore not guaranteed to converge, but as for linear eigenvalue problems their basin of convergence can be enlarged using homotopy methods [DP01] or trust region strategies [YMW07].

Facts:

1. [Kub70] Let $T(\lambda)P(\lambda) = Q(\lambda)R(\lambda)$ be the QR factorization of $T(\lambda)$, where $P(\lambda)$ is a permutation matrix which is chosen such that the diagonal elements $r_{jj}(\lambda)$ of $R(\lambda)$ are decreasing in magnitude, i.e., $|r_{11}(\lambda)| \geq |r_{22}(\lambda)| \geq \dots \geq |r_{nn}(\lambda)|$. Then $\lambda$ is an eigenvalue of $T(\cdot)$ if and only if $r_{nn}(\lambda) = 0$. Applying Newton's method to this equation, one obtains the iteration

  $\lambda_{k+1} = \lambda_k - \frac{1}{e_n^* Q(\lambda_k)^* T'(\lambda_k) P(\lambda_k) R(\lambda_k)^{-1} e_n}$

for approximations to an eigenvalue of problem $T(\lambda)x = 0$, where $e_n$ denotes the $n$th unit vector. Approximations to left and right eigenvectors can be obtained from $y_k = Q(\lambda_k)e_n$ and $x_k = P(\lambda_k)R(\lambda_k)^{-1}e_n$. However, this relatively simple idea is not efficient, since it computes eigenvalues one at a time and needs several $O(n^3)$ factorizations per eigenvalue. It is, however, useful in the context of iterative refinement of computed eigenvalues and eigenvectors.

2. [AR68] Applying Newton's method to the nonlinear system

  $F(x, \lambda) := \begin{pmatrix} T(\lambda)x \\ v^* x - 1 \end{pmatrix} = 0$,

where $v \in \mathbb{C}^n$ is suitably chosen, one obtains the inverse iteration given in Algorithm 1. Being a variant of Newton's method, it converges locally and quadratically for simple eigenpairs.

Algorithm 1: Inverse iteration
Require: Initial pair $(\lambda_0, x_0)$ and normalization vector $v$ with $v^* x_0 = 1$
1: for $k = 0, 1, 2, \dots$ until convergence do
2:   solve $T(\lambda_k)u_{k+1} = T'(\lambda_k)x_k$ for $u_{k+1}$
3:   $\lambda_{k+1} \leftarrow \lambda_k - (v^* x_k)/(v^* u_{k+1})$
4:   normalize $x_{k+1} \leftarrow u_{k+1}/(v^* u_{k+1})$
5: end for

3. If $T(\cdot)$ is Hermitian such that the general conditions of Section 115.3 are satisfied, one obtains the Rayleigh functional iteration if the update of $\lambda_{k+1}$ in step 3 of Algorithm 1 is replaced with $\lambda_{k+1} \leftarrow p(u_{k+1})$. This method converges locally and cubically for simple eigenpairs [Rot89].

4. [Lan61, Ruh73] Replacing the vector $v$ in the normalization step of inverse iteration for a general matrix function $T(\cdot)$ with $v_k = T(\lambda_k)^* y_k$, where $y_k$ is an approximation to a left eigenvector, the update for $\lambda$ becomes

  $\lambda_{k+1} \leftarrow \lambda_k - \frac{y_k^* T(\lambda_k)x_k}{y_k^* T'(\lambda_k)x_k}$,

which is the generalized Rayleigh quotient $p_L$.

5. [Sch08] For general $T(\cdot)$ and simple eigentriplets $(\hat y, \hat\lambda, \hat x)$, cubic convergence is also achieved by the two-sided Rayleigh functional iteration in Algorithm 2. If the linear system in step 2 is solved by factorizing $T(\lambda_k)$, then, taking the conjugate transpose, the factorization can be reused for the system in step 3. So the cost of one iteration step is similar to the one of the one-sided Rayleigh functional iteration.

Algorithm 2: Two-sided Rayleigh functional iteration
Require: Initial triplet $(y_0, \lambda_0, x_0)$ with $x_0^* x_0 = y_0^* y_0 = 1$
1: for $k = 0, 1, 2, \dots$ until convergence do
2:   solve $T(\lambda_k)u_{k+1} = T'(\lambda_k)x_k$ for $u_{k+1}$; $x_{k+1} \leftarrow u_{k+1}/\|u_{k+1}\|$
3:   solve $T(\lambda_k)^* v_{k+1} = T'(\lambda_k)^* y_k$ for $v_{k+1}$; $y_{k+1} \leftarrow v_{k+1}/\|v_{k+1}\|$
4:   solve $y_{k+1}^* T(\lambda_{k+1})x_{k+1} = 0$ for $\lambda_{k+1}$
5: end for
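Algorithm 1 translates almost line by line into code. A dense-matrix sketch (the stopping criterion and the test problem, the quadratic pencil from the inertia sketch above, are ad hoc choices):

import numpy as np

def inverse_iteration(T, dT, lam, x, v, maxit=20, tol=1e-12):
    # Algorithm 1 for a general matrix function T with derivative dT;
    # v is the normalization vector, (lam, x) the initial pair.
    x = x / (v.conj() @ x)
    for _ in range(maxit):
        u = np.linalg.solve(T(lam), dT(lam) @ x)
        lam = lam - (v.conj() @ x) / (v.conj() @ u)
        x = u / (v.conj() @ u)
        if np.linalg.norm(T(lam) @ x) <= tol * np.linalg.norm(x):
            break
    return lam, x

B = 10.0 * np.eye(3)
C = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]])
T  = lambda lam: lam**2 * np.eye(3) + lam * B + C
dT = lambda lam: 2 * lam * np.eye(3) + B
lam, x = inverse_iteration(T, dT, -0.4, np.ones(3), np.ones(3))
print(lam, np.linalg.norm(T(lam) @ x))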

6. [Neu85] The cost for solving a linear system with a varying matrix in each iteration step is avoided in the residual inverse iteration in Algorithm 3, where the matrix $T(\lambda_0)$ is fixed during the whole iteration (or at least for several steps).

Algorithm 3: Residual inverse iteration
Require: Initial pair $(\lambda_0, x_0)$ and normalization vector $w$ with $w^* x_0 = 1$
1: for $k = 0, 1, 2, \dots$ until convergence do
2:   solve $w^* T(\lambda_0)^{-1} T(\lambda_{k+1})x_k = 0$ for $\lambda_{k+1}$
3:   solve $T(\lambda_0)u_k = T(\lambda_{k+1})x_k$ for $u_k$
4:   set $v_{k+1} \leftarrow x_k - u_k$ and normalize $x_{k+1} \leftarrow v_{k+1}/(w^* v_{k+1})$
5: end for

If $T(\cdot)$ is Hermitian and $\hat\lambda \in \mathbb{R}$, then the convergence can be improved by determining $\lambda_{k+1}$ in step 2 via the Rayleigh functional, i.e., solving $x_k^* T(\lambda_{k+1})x_k = 0$ for $\lambda_{k+1}$. If $T(\cdot)$ is twice continuously differentiable and $\hat\lambda$ is algebraically simple, then the residual inverse iteration converges for all $(\lambda_0, x_0)$ sufficiently close to $(\hat\lambda, \hat x)$, and

  $\frac{\|x_{k+1} - \hat x\|}{\|x_k - \hat x\|} = O(|\lambda_0 - \hat\lambda|)$ and $|\lambda_{k+1} - \hat\lambda| = O(\|x_k - \hat x\|^t)$,

where $t = 2$ in the Hermitian case if $\lambda_{k+1}$ is updated via the Rayleigh functional, and $t = 1$ in the general case.

7. [Ruh73] The first order approximation $T(\lambda + \sigma)x = T(\lambda)x + \sigma T'(\lambda)x + o(|\sigma|)$ suggests the method of successive linear problems in Algorithm 4, which also converges quadratically for simple eigenvalues.

Algorithm 4: Method of successive linear problems
Require: Initial approximation $\lambda_0$
1: for $k = 0, 1, \dots$ until convergence do
2:   solve the linear eigenproblem $T(\lambda_k)u = \theta T'(\lambda_k)u$
3:   choose an eigenvalue $\theta$ smallest in modulus
4:   $\lambda_{k+1} \leftarrow \lambda_k - \theta$
5: end for

If $\hat\lambda$ is a semi-simple eigenvalue, $u$ converges to a right eigenvector $\hat x$. If $\hat y$ is a left eigenvector corresponding to $\hat\lambda$ such that $\hat y^* T'(\hat\lambda)\hat x \neq 0$ (which is guaranteed for a simple eigenvalue), then the convergence factor is given by (cf. [Jar12])

  $c := \lim_{k\to\infty} \frac{\lambda_{k+1} - \hat\lambda}{(\lambda_k - \hat\lambda)^2} = \frac{\hat y^* T''(\hat\lambda)\hat x}{2\, \hat y^* T'(\hat\lambda)\hat x}$.
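A corresponding sketch of Algorithm 3 for Hermitian problems, with T and dT as in the previous sketch. Two simplifications are assumptions of this sketch, not part of the algorithm: the eigenvalue update in step 2 is realized via the Rayleigh functional and solved by a scalar Newton iteration, and the normalization uses the Euclidean norm instead of the vector $w$:

import numpy as np
from scipy.optimize import newton
from scipy.linalg import lu_factor, lu_solve

def residual_inverse_iteration(T, dT, lam0, x, maxit=50, tol=1e-12):
    lu = lu_factor(T(lam0))       # factorize T(lam0) once and reuse it
    lam = lam0
    x = x / np.linalg.norm(x)
    for _ in range(maxit):
        # step 2: solve x* T(lam) x = 0 for the new eigenvalue approximation
        lam = newton(lambda t: (x.conj() @ T(t) @ x).real, lam,
                     fprime=lambda t: (x.conj() @ dT(t) @ x).real)
        # steps 3-4: one solve with the fixed matrix T(lam0)
        u = lu_solve(lu, T(lam) @ x)
        x = x - u
        x = x / np.linalg.norm(x)
        if np.linalg.norm(T(lam) @ x) <= tol:
            break
    return lam, x

lam, x = residual_inverse_iteration(T, dT, -0.45, np.ones(3))
print(lam, np.linalg.norm(T(lam) @ x))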

8. [Wer70] If the nonlinear eigenvalue problem allows for a variational characterization of its eigenvalues, then the safeguarded iteration, which aims at a particular eigenvalue, is a natural choice.

Algorithm 5: Safeguarded iteration for determining an $m$th eigenvalue
Require: Approximation $\lambda_0$ to an $m$th eigenvalue
1: for $k = 0, 1, \dots$ until convergence do
2:   determine an eigenvector $x_k$ corresponding to the $m$th largest eigenvalue of $T(\lambda_k)$
3:   solve $x_k^* T(\lambda_{k+1})x_k = 0$ for $\lambda_{k+1}$
4: end for

Under the conditions of Section 115.3, the safeguarded iteration has the following properties [NV10]:

(i) If $\hat\lambda_1 := \inf_{x \in D(p)} p(x) \in J$ and $x_0 \in D(p)$, then the safeguarded iteration with $m = 1$ converges globally to $\hat\lambda_1$.
(ii) If $T(\cdot)$ is continuously differentiable and $\hat\lambda_m$ is a simple eigenvalue, then the safeguarded iteration converges locally and quadratically to $\hat\lambda_m$.
(iii) Let $T(\cdot)$ be twice continuously differentiable and $T'(\hat\lambda_m)$ be positive definite. If $x_k$ in step 2 is chosen to be an eigenvector corresponding to the $m$th largest eigenvalue of the generalized eigenvalue problem $T(\lambda_k)x = \mu T'(\lambda_k)x$, then the convergence is even cubic.

9. [SX11] For higher dimensions $n$ it is too costly to solve the occurring linear systems exactly. Szyld and Xue [SX11] studied inexact versions of inverse iteration and residual inverse iteration and proved that the same order of convergence can be achieved as for the exact methods if the respective linear systems are solved sufficiently accurately.
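A sketch of Algorithm 5 for a Hermitian family $T$. The eigenvectors of $T(\lambda_k)$ are computed densely with eigh, and the Rayleigh functional equation in step 3 is again solved by a scalar Newton iteration, which is assumed to pick up the root in $J$; both are implementation choices of this sketch:

import numpy as np
from scipy.optimize import newton

def safeguarded_iteration(T, dT, lam0, m, maxit=50, tol=1e-12):
    lam = lam0
    for _ in range(maxit):
        w, Q = np.linalg.eigh(T(lam))        # eigenvalues in ascending order
        x = Q[:, -m]                         # eigenvector to m-th largest
        lam_new = newton(lambda t: (x.conj() @ T(t) @ x).real, lam,
                         fprime=lambda t: (x.conj() @ dT(t) @ x).real)
        if abs(lam_new - lam) <= tol * max(1.0, abs(lam_new)):
            return lam_new, x
        lam = lam_new
    return lam, x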

115.6 Iterative projection methods

For sparse linear eigenvalue problems $Ax = \lambda x$, iterative projection methods like the Lanczos, Arnoldi, rational Krylov, or Jacobi-Davidson method are very efficient. Here the dimension of the eigenproblem is reduced by projecting it to a subspace of much smaller dimension, and the reduced problem is solved by a fast technique for dense problems. The subspaces are expanded in the course of the algorithm in an iterative way with the aim that some of the eigenvalues of the reduced matrix become good approximations to some of the wanted eigenvalues of the given large matrix.

Two types of iterative projection methods are in use: methods which expand the subspaces independently of the eigenpair of the projected problem and which take advantage of a normal form of $A$, like the Arnoldi, Lanczos, and rational Krylov methods, and methods which aim at a particular eigenpair and choose the expansion such that it has a high approximation potential for a wanted eigenvector, like the Jacobi-Davidson method. For general nonlinear eigenproblems a normal form does not exist. Therefore, generalizations of iterative projection methods to general nonlinear eigenproblems always have to be of the second type. There are essentially two such methods: the Jacobi-Davidson method (and its two-sided version), which is based on inverse iteration, and the nonlinear Arnoldi method, which is based on residual inverse iteration.

Jacobi-Davidson method

Assume that we are given a search space $\mathcal{V}$ and a matrix $V$ with orthonormal columns containing a basis of $\mathcal{V}$. Let $(\theta, y)$ be an eigenpair of the projected problem $V^* T(\lambda)V y = 0$, and let $x = Vy$ be the corresponding Ritz vector. A direction with high approximation potential is given by inverse iteration, $v = T(\theta)^{-1} T'(\theta)x$; however, replacing $v$ with an inexact solution of the linear system $T(\theta)v = T'(\theta)x$ will spoil the favorable approximation properties of inverse iteration.

Actually, we are not interested in the direction $v$ but in an expansion of $\mathcal{V}$ which contains $v$, and for every $\alpha \neq 0$ the vector $t = x + \alpha v$ is as qualified as $v$. It was shown in [Vos07] that the most robust expansion of this type is obtained if $x$ and $t := x + \alpha v$ are orthogonal, and it is easily seen that this $t$ solves the so-called correction equation

  $\Big(I - \frac{T'(\theta)x x^*}{x^* T'(\theta)x}\Big)\, T(\theta)\, \Big(I - \frac{x x^*}{x^* x}\Big)\, t = -T(\theta)x$,  $t \perp x$.

The resulting iterative projection method is called the Jacobi-Davidson method, a template of which is given in Algorithm 6.

Algorithm 6: Nonlinear Jacobi-Davidson method
Require: Initial basis $V$, $V^* V = I$; $m = 1$
1: determine preconditioner $K \approx T(\sigma)^{-1}$, $\sigma$ close to first wanted eigenvalue
2: while $m \leq$ number of wanted eigenvalues do
3:   compute an approximation $\theta$ to the $m$th wanted eigenvalue and corresponding eigenvector $y$ of the projected problem $T_V(\theta)y := V^* T(\theta)V y = 0$
4:   determine the Ritz vector $u = Vy$ and the residual $r = T(\theta)u$
5:   if $\|r\|/\|u\| < \epsilon$ then
6:     accept approximate eigenpair $(\lambda_m, x_m) := (\theta, u)$; increase $m \leftarrow m + 1$
7:     reduce search space $V$ if indicated
8:     determine new preconditioner $K \approx T(\lambda_m)^{-1}$ if necessary
9:     choose approximation $(\theta, u)$ to next eigenpair
10:    compute residual $r = T(\theta)u$
11:  end if
12:  find approximate solution of correction equation

  $\Big(I - \frac{T'(\theta)u u^*}{u^* T'(\theta)u}\Big)\, T(\theta)\, \Big(I - \frac{u u^*}{u^* u}\Big)\, t = -r$,  $t \perp u$   (115.6)

  (by a preconditioned Krylov solver, e.g.)
13:  orthogonalize $t \leftarrow t - VV^* t$, $v \leftarrow t/\|t\|$, and expand subspace $V \leftarrow [V, v]$
14:  update projected problem
15: end while

Facts:

1. The Jacobi-Davidson method was introduced for polynomial eigenproblems in [SBF96] and studied for general nonlinear eigenvalue problems in [BV04, Vos07a].

2. As in the linear case, the correction equation (115.6) does not have to be solved exactly to maintain fast convergence; usually a few steps of a Krylov subspace solver with an appropriate preconditioner suffice to obtain a good expansion direction of the search space.

3. In the correction equation (115.6) the operator $T(\theta)$ is restricted to map the subspace $u^\perp$ into itself. Hence, if $K^{-1} \approx T(\theta)$ is a preconditioner of $T(\theta)$, then a preconditioner for an iterative solver of (115.6) should be modified correspondingly to

  $\tilde K := \Big(I - \frac{T'(\theta)u u^*}{u^* T'(\theta)u}\Big)\, K^{-1}\, \Big(I - \frac{u u^*}{u^* u}\Big)$.

With left-preconditioning, equation (115.6) becomes

  $\tilde K^{-1} \tilde T(\theta)\, t = -\tilde K^{-1} r$, $t \perp u$, where $\tilde T(\theta) := \Big(I - \frac{T'(\theta)u u^*}{u^* T'(\theta)u}\Big)\, T(\theta)\, \Big(I - \frac{u u^*}{u^* u}\Big)$.

Taking into account the projectors in the preconditioner, i.e., using $\tilde K$ instead of $K^{-1}$ in a preconditioned Krylov solver, raises the cost only slightly. In every step one has to solve one linear system $\tilde K w = y$, and initializing the solver requires only one additional solve.

4. In step 1 of Algorithm 6, any preinformation, such as a small number of known approximate eigenvectors of problem (115.1) corresponding to eigenvalues close to $\sigma$, or of eigenvectors of a contiguous problem, can and should be used. If no information on eigenvectors is at hand, and if one is interested in eigenvalues close to the parameter $\sigma \in D$, one can choose an initial vector at random, execute a few Arnoldi steps for the linear eigenproblem $T(\sigma)u = \theta u$ or $T(\sigma)u = \theta T'(\sigma)u$, and choose the eigenvector corresponding to the smallest eigenvalue in modulus, or a small number of Schur vectors, as initial basis of the search space. Starting with a random vector without this preprocessing usually yields a value $\theta$ in step 3 which is far away from $\sigma$ and averts convergence.

5. As the subspaces expand in the course of the algorithm, the increasing storage and the computational cost for solving the projected eigenvalue problems may make it necessary to restart the algorithm and purge some of the basis vectors. Since a restart destroys information on the eigenvectors, and particularly on the one the method is just aiming at, the method is restarted only if an eigenvector has just converged. An obvious way to restart is to determine a Ritz pair $(\mu, u)$ from the projection to the current search space $\operatorname{span}(V)$ approximating an eigenpair wanted next, and to restart the Jacobi-Davidson method with this single vector $u$. However, this may discard too much valuable information contained in $\operatorname{span}(V)$ and may slow down the speed of convergence too much. Therefore, thick restarts with subspaces spanned by the Ritz vector $u$ and a small number of eigenvector approximations obtained in previous steps which correspond to eigenvalues closest to $\mu$ are preferable.

6. A crucial point in iterative methods for general nonlinear eigenvalue problems when approximating more than one eigenvalue is to inhibit the method from converging to the same eigenvalue repeatedly. For linear eigenvalue problems, locking of already converged eigenvectors can be achieved using an incomplete Schur factorization. For nonlinear problems allowing for a variational characterization of their eigenvalues, one can determine the eigenpairs one after another by solving the projected problems by safeguarded iteration [BV04]. For general nonlinear eigenproblems a locking procedure based on invariant pairs was introduced in [Eff12] (cf. Section 115.7).

7. Often the matrix function $T(\cdot)$ is given in the form $T(\lambda) := \sum_{j=1}^{m} f_j(\lambda)A_j$, where $f_j: \Omega \to \mathbb{C}$ are continuous functions and $A_j \in \mathbb{C}^{n\times n}$ are fixed matrices. Then the projected problem can be updated easily by appending one row and column to each of the projected matrices $V^* A_j V$, as in the sketch below.
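A sketch of this update (the storage scheme, a Python list of projected matrices, is an assumption of the sketch; the projected problem is then $T_V(\lambda)y = \sum_j f_j(\lambda)(V^* A_j V)y = 0$):

import numpy as np

def expand_projection(Aj_proj, Aj, V, v):
    # Append one row and column to each projected matrix V* A_j V;
    # V holds the old basis vectors, v the new (orthonormalized) direction.
    new = []
    for Ap, A in zip(Aj_proj, Aj):
        col = (V.conj().T @ (A @ v)).reshape(-1, 1)   # new column
        row = (v.conj() @ (A @ V)).reshape(1, -1)     # new row
        corner = np.array([[v.conj() @ (A @ v)]])
        new.append(np.block([[Ap, col], [row, corner]]))
    return new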

Two-sided Jacobi-Davidson method

In Algorithm 6, approximations to an eigenvalue are obtained in step 3 from a Galerkin projection of $T(\lambda)x = 0$ to the search space $\operatorname{Span}(V)$ for right eigenvectors. Computing a left search space as well, with a correction equation for left eigenvectors, and applying a Petrov-Galerkin projection, one arrives at the two-sided Jacobi-Davidson method in Algorithm 7 (where only the computation of one eigentriplet is considered).

Algorithm 7: Two-sided Jacobi-Davidson method
Require: Initial bases $U$ with $U^* U = I$ and $V$ with $V^* V = I$
1: while not converged do
2:   solve $V^* T(\theta)U c = 0$ and $U^* T(\theta)^* V d = 0$ for $(\theta, c, d)$
3:   determine Ritz vectors $u = Uc$ and $v = Vd$ and residuals $r_u = T(\theta)u$, $r_v = T(\theta)^* v$
4:   if $\min(\|r_u\|/\|u\|, \|r_v\|/\|v\|) < \epsilon$ then
5:     accept approximate eigentriplet $(v, \theta, u)$; STOP
6:   end if
7:   solve (approximately) the correction equations

  $\Big(I - \frac{T'(\theta)u v^*}{v^* T'(\theta)u}\Big)\, T(\theta)\, \Big(I - \frac{u u^*}{u^* u}\Big)\, s = -r_u$, $s \perp u$, and
  $\Big(I - \frac{T'(\theta)^* v u^*}{u^* T'(\theta)^* v}\Big)\, T(\theta)^*\, \Big(I - \frac{v v^*}{v^* v}\Big)\, t = -r_v$, $t \perp v$

8:   orthogonalize $s \leftarrow s - UU^* s$, $s \leftarrow s/\|s\|$, and expand the left search space $U \leftarrow [U, s]$
9:   orthogonalize $t \leftarrow t - VV^* t$, $t \leftarrow t/\|t\|$, and expand the right search space $V \leftarrow [V, t]$
10: end while

Facts:

8. [Sch08] $\theta$ as computed in step 2 is the value of the two-sided Rayleigh functional at $(u, v)$, and one may therefore expect local cubic convergence for simple eigenvalues.

9. [HS03] The correction equations in step 7 of Algorithm 7 can be replaced with

  $\Big(I - \frac{T'(\theta)u v^*}{v^* T'(\theta)u}\Big)\, T(\theta)\, \Big(I - \frac{T'(\theta)u v^*}{v^* T'(\theta)u}\Big)\, s = -r_u$, $s \perp u$, and
  $\Big(I - \frac{T'(\theta)^* v u^*}{u^* T'(\theta)^* v}\Big)\, T(\theta)^*\, \Big(I - \frac{T'(\theta)^* v u^*}{u^* T'(\theta)^* v}\Big)\, t = -r_v$, $t \perp v$.

This variant was suggested in [HS03] for linear eigenvalue problems, and its generalization to the nonlinear problem is obvious. Since again $\theta$ is the value of the two-sided Rayleigh functional, the convergence should also be cubic.

10. [SS06] Replacing the correction equations with

  $(I - vv^*)\, T(\theta)\, (I - uu^*)\, s = -r_u$, $s \perp u$, and $(I - uu^*)\, T(\theta)^*\, (I - vv^*)\, t = -r_v$, $t \perp v$,

one obtains the primal-dual Jacobi-Davidson method, which was shown to be quadratically convergent.

Nonlinear Arnoldi method

Expanding the current search space $\mathcal{V}$ by the direction $\hat v = x - T(\sigma)^{-1} T(\theta)x$, as suggested by residual inverse iteration, generates similar robustness problems as for inverse iteration: if $\hat v$ is close to the desired eigenvector, then an inexact evaluation of $\hat v$ spoils the favorable approximation properties of residual inverse iteration.

Similarly as in the Jacobi-Davidson method, one could replace $\hat v$ by $z := x + \alpha\hat v$, where $\alpha$ is chosen such that $x^* z = 0$, and one could determine an approximation to $z$ by solving a correction equation. However, since the new search direction is orthonormalized against the previous search space $V$, and since $x$ is contained in $V$, we may choose the new direction $v = T(\sigma)^{-1} T(\theta)x$ as well. This direction satisfies the orthogonality condition $x^* v = 0$ at least in the limit as $\theta$ approaches a simple eigenvalue $\hat\lambda$ (cf. [Vos07]), i.e.,

  $\lim_{\theta \to \hat\lambda} x^* T(\sigma)^{-1} T(\theta)x = 0$.

A template for the preconditioned nonlinear Arnoldi method with restarts and varying preconditioner is just like Algorithm 6; only step 12 has to be replaced with $t = Kr$.

Facts:

11. The general remarks about the initial approximation to the eigenvector, restarts, and locking following the Jacobi-Davidson method apply to the nonlinear Arnoldi method as well.

12. Since the residual inverse iteration with fixed pole $\sigma$ converges (at least) linearly with a contraction rate $O(|\sigma - \lambda_m|)$, it is reasonable to update the preconditioner if the convergence (measured by the quotient of the last two residual norms before convergence) has become too slow.

13. The nonlinear Arnoldi method was introduced for quadratic eigenvalue problems in [Mee01] and for general nonlinear eigenvalue problems in [Vos04].

14. [LBL10] studies a variant that avoids complex arithmetic by augmenting the search space by two vectors, the real and imaginary parts of the expansion $t = Kr$.
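A bare-bones sketch of the resulting method for one eigenpair. The hook solve_projected stands for a dense solver from Section 115.5 applied to the projected problem and is a user-supplied assumption of this sketch:

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def nonlinear_arnoldi(T, solve_projected, x0, sigma, tol=1e-10, maxit=100):
    # solve_projected(TV) must return an eigenpair (theta, y) of
    # the projected problem TV(theta) y = V* T(theta) V y = 0.
    K = lu_factor(T(sigma))                  # preconditioner K ~ T(sigma)^{-1}
    V = (x0 / np.linalg.norm(x0)).reshape(-1, 1)
    for _ in range(maxit):
        theta, y = solve_projected(lambda lam: V.conj().T @ T(lam) @ V)
        u = V @ y
        r = T(theta) @ u
        if np.linalg.norm(r) <= tol * np.linalg.norm(u):
            return theta, u
        t = lu_solve(K, r)                   # expansion t = K r
        t = t - V @ (V.conj().T @ t)         # orthogonalize against V
        V = np.hstack([V, (t / np.linalg.norm(t)).reshape(-1, 1)])
    return theta, u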

115.7 Methods using invariant pairs

One of the most important problems when determining more than one eigenpair of a nonlinear eigenvalue problem is to prevent the method from determining the same pair repeatedly. Jordan chains are conceptually elegant but unstable under perturbations. More robust concepts for computing several eigenvalues along with the corresponding (generalized) eigenvectors were introduced only recently and are based on invariant pairs [Kre09, BT09]. It is convenient to consider the nonlinear eigenvalue problem in the following form:

  $T(\lambda)x := \sum_{j=1}^{m} f_j(\lambda) A_j x = 0$,   (115.7)

where $f_j: \Omega \to \mathbb{C}$ are analytic functions and $A_j \in \mathbb{C}^{n\times n}$ are fixed matrices.

Definitions:

Let the eigenvalues of $S \in \mathbb{C}^{k\times k}$ be contained in $\Omega$ and let $X \in \mathbb{C}^{n\times k}$. Then $(X, S)$ is called an invariant pair of the nonlinear eigenvalue problem (115.7) if

  $\sum_{j=1}^{m} A_j X f_j(S) = 0$.

A pair $(X, S) \in \mathbb{C}^{n\times k} \times \mathbb{C}^{k\times k}$ is minimal if there is $l \in \mathbb{N}$ such that the matrix

  $V_l(X, S) := \begin{bmatrix} X \\ XS \\ \vdots \\ XS^{l-1} \end{bmatrix}$

has rank $k$. The smallest such $l$ is called the minimality index of $(X, S)$.

An invariant pair $(X, S)$ is called simple if $(X, S)$ is minimal and the algebraic multiplicities of the eigenvalues of $S$ are identical to the ones of the corresponding eigenvalues of $T(\cdot)$.

Facts:

The following facts, for which no specific reference is given, can be found in [Kre09].

1. Let $(X, S)$ be a minimal invariant pair of (115.7). Then the eigenvalues of $S$ are eigenvalues of $T(\cdot)$.

2. By the Cayley-Hamilton theorem, the minimality index of a minimal pair cannot exceed $k$.

3. [BK11] For a regular matrix polynomial of degree $m$, the minimality index of a minimal invariant pair cannot exceed $m$.

4. [Eff12] Let $p_0, \dots, p_{l-1}$ be a basis for the polynomials of degree less than $l$. Then the pair $(X, S)$ is minimal with minimality index at most $l$ if and only if

  $V_l^p(X, S) := \begin{bmatrix} X p_0(S) \\ \vdots \\ X p_{l-1}(S) \end{bmatrix}$

has full column rank.

5. [BK11] If $V_l(X, S)$ has rank $\tilde k < k$, then there is a minimal pair $(\tilde X, \tilde S) \in \mathbb{C}^{n\times\tilde k} \times \mathbb{C}^{\tilde k\times\tilde k}$ such that $\operatorname{Span}(\tilde X) = \operatorname{Span}(X)$ and $\operatorname{Span}(V_l(\tilde X, \tilde S)) = \operatorname{Span}(V_l(X, S))$.

6. If $(X, S)$ is a minimal invariant pair, then $(XZ, Z^{-1}SZ)$ is also a minimal invariant pair for every invertible matrix $Z \in \mathbb{C}^{k\times k}$.

7. Let $(X, S)$ be a minimal invariant pair, and let $p_j \in \Pi_k$ be the Hermite interpolating polynomials of $f_j$ at the spectrum of $S$. Then $(X, S)$ is a minimal invariant pair of $P(\lambda)x := \sum_{j=1}^{m} p_j(\lambda)A_j x = 0$.

8. Let $(\lambda_j, x_j)$, $j = 1, \dots, k$, be eigenpairs of $T(\cdot)$ with $\lambda_i \neq \lambda_j$ for $i \neq j$. Then the invariant pair $(X, S) := ([x_1, \dots, x_k], \operatorname{diag}(\lambda_1, \dots, \lambda_k))$ is minimal.
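Both defining properties are straightforward to test numerically. A sketch (for general analytic $f_j$, the matrix functions $f_j(S)$ could be evaluated with scipy.linalg.funm; matrix_rank hides the rank decision behind a default tolerance, which is an assumption of this sketch):

import numpy as np

def is_invariant_pair(Aj, fj, X, S, tol=1e-10):
    # Residual test sum_j A_j X f_j(S) = 0; fj are matrix functions of S.
    R = sum(A @ X @ f(S) for A, f in zip(Aj, fj))
    return np.linalg.norm(R) <= tol

def minimality_index(X, S, lmax=None):
    # Smallest l with rank [X; XS; ...; XS^(l-1)] = k, or None;
    # by Fact 2 the index of a minimal pair is at most k.
    k = S.shape[0]
    lmax = lmax or k
    V = np.zeros((0, k))
    W = X.copy()
    for l in range(1, lmax + 1):
        V = np.vstack([V, W])
        if np.linalg.matrix_rank(V) == k:
            return l
        W = W @ S
    return None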

9. Consider the nonlinear matrix operator

  $\mathbb{T}: \mathbb{C}^{n\times k} \times \mathbb{C}^{k\times k}_\Omega \to \mathbb{C}^{n\times k}$,  $(X, S) \mapsto \sum_{j=1}^{m} A_j X f_j(S)$,   (115.8)

where $\mathbb{C}^{k\times k}_\Omega$ denotes the set of $k\times k$ matrices with eigenvalues in $\Omega$. An invariant pair $(X, S)$ satisfies $\mathbb{T}(X, S) = 0$, but this relation is not sufficient to characterize $(X, S)$. To define a scaling condition, choose $l$ such that the matrix $V_l(X, S)$ has rank $k$, and define the partition

  $W = \begin{bmatrix} W_0 \\ W_1 \\ \vdots \\ W_{l-1} \end{bmatrix} := V_l(X, S)\, \big(V_l(X, S)^* V_l(X, S)\big)^{-1} \in \mathbb{C}^{nl\times k}$

with $W_j \in \mathbb{C}^{n\times k}$. Then $\mathbb{V}(X, S) = 0$ for the operator

  $\mathbb{V}: \mathbb{C}^{n\times k} \times \mathbb{C}^{k\times k}_\Omega \to \mathbb{C}^{k\times k}$,  $\mathbb{V}(X, S) := W^* V_l(X, S) - I_k$.

If $(X, S)$ is a minimal invariant pair for the nonlinear eigenvalue problem $T(\cdot)x = 0$, then $(X, S)$ is simple if and only if the linear matrix operator

  $\mathbb{L}: \mathbb{C}^{n\times k} \times \mathbb{C}^{k\times k} \to \mathbb{C}^{n\times k} \times \mathbb{C}^{k\times k}$,  $(\Delta X, \Delta S) \mapsto \big(D\mathbb{T}(\Delta X, \Delta S), D\mathbb{V}(\Delta X, \Delta S)\big)$,

is invertible, where $D\mathbb{T}$ and $D\mathbb{V}$ denote the Fréchet derivatives of $\mathbb{T}$ and $\mathbb{V}$, respectively.

10. [Kre09] The last fact motivates applying Newton's method to the system $\mathbb{T}(X, S) = 0$, $\mathbb{V}(X, S) = 0$, which can be written as

  $(X_{p+1}, S_{p+1}) = (X_p, S_p) - \mathbb{L}^{-1}\big(\mathbb{T}(X_p, S_p), \mathbb{V}(X_p, S_p)\big)$,

where $\mathbb{L} = (D\mathbb{T}, D\mathbb{V})$ is the Jacobian of the system. The Fréchet derivatives are given by

  $D\mathbb{T}(\Delta X, \Delta S) = \mathbb{T}(\Delta X, S) + \sum_{j=1}^{m} A_j X\, [Df_j(S)](\Delta S)$,
  $D\mathbb{V}(\Delta X, \Delta S) = W_0^* \Delta X + \sum_{j=1}^{l-1} W_j^* \big(\Delta X S^j + X\, [DS^j](\Delta S)\big)$.

Algorithm 8: Newton's method for computing invariant pairs
Require: Initial pair $(X_0, S_0) \in \mathbb{C}^{n\times k} \times \mathbb{C}^{k\times k}$ such that $V_l(X_0, S_0)^* V_l(X_0, S_0) = I_k$
1: $p \leftarrow 0$, $W \leftarrow V_l(X_0, S_0)$
2: repeat
3:   $\operatorname{Res} \leftarrow \mathbb{T}(X_p, S_p)$
4:   solve the linear matrix equation $\mathbb{L}(\Delta X, \Delta S) = (\operatorname{Res}, O)$
5:   $\tilde X_{p+1} \leftarrow X_p - \Delta X$, $\tilde S_{p+1} \leftarrow S_p - \Delta S$
6:   compute the compact QR decomposition $V_l(\tilde X_{p+1}, \tilde S_{p+1}) = WR$
7:   $X_{p+1} \leftarrow \tilde X_{p+1} R^{-1}$, $S_{p+1} \leftarrow R \tilde S_{p+1} R^{-1}$
8: until convergence

11. [Bey12, BEK11]

  $\mathbb{T}(X, S) = \frac{1}{2\pi i} \oint_\Gamma T(z)\, X\, (zI - S)^{-1}\, dz$,

where $\Gamma$ is a contour (i.e., a simply closed curve) in $\Omega$ containing the spectrum of $S$ in its interior.

12. [BEK11]

  $D\mathbb{T}(X, S)(\Delta X, \Delta S) = \frac{1}{2\pi i} \oint_\Gamma T(z)\, \big(\Delta X + X (zI - S)^{-1} \Delta S\big)\, (zI - S)^{-1}\, dz$.
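Fact 11 suggests evaluating $\mathbb{T}(X, S)$ by numerical quadrature when the $f_j$ are inconvenient to evaluate at matrix arguments. A sketch using the trapezoidal rule on a circular contour (the center and radius must be chosen so that the circle encloses the spectrum of $S$ and lies in $\Omega$; these are inputs assumed by the sketch):

import numpy as np

def T_op_contour(Tfun, X, S, center, radius, nquad=64):
    # Evaluate T(X, S) = (1/(2 pi i)) oint_Gamma Tfun(z) X (zI - S)^{-1} dz
    # by the trapezoidal rule on a circle enclosing sigma(S).
    n, k = X.shape
    out = np.zeros((n, k), dtype=complex)
    for j in range(nquad):
        phi = 2 * np.pi * j / nquad
        z = center + radius * np.exp(1j * phi)
        dz = 1j * radius * np.exp(1j * phi) * (2 * np.pi / nquad)
        out += Tfun(z) @ X @ np.linalg.inv(z * np.eye(k) - S) * dz
    return out / (2j * np.pi)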

13. [Eff12] Let $(X, S)$ be a minimal (index $l$) invariant pair of $T(\cdot)$, and consider the augmented analytic matrix function $\hat T: \Omega \to \mathbb{C}^{(n+k)\times(n+k)}$,

  $\hat T(\mu) \begin{bmatrix} y \\ v \end{bmatrix} = \begin{bmatrix} \mathbb{T}\Big([X, y], \begin{bmatrix} S & v \\ 0 & \mu \end{bmatrix}\Big)\, e_{k+1} \\ \big[V^p_{l+1}(X, S)\big]^*\, V^p_{l+1}\Big([X, y], \begin{bmatrix} S & v \\ 0 & \mu \end{bmatrix}\Big)\, e_{k+1} \end{bmatrix}$,

with $\mathbb{T}$ as in Fact 11, $V^p_{l+1}$ analogous to Fact 4, and $e_{k+1} = (0, \dots, 0, 1)^T \in \mathbb{R}^{k+1}$. If $([\,Y; V\,], M)$ is a minimal invariant pair of $\hat T(\cdot)$, then $\Big([X, Y], \begin{bmatrix} S & V \\ 0 & M \end{bmatrix}\Big)$ is a minimal invariant pair of $T(\cdot)$. Conversely, for any minimal invariant pair $\Big([X, Y], \begin{bmatrix} S & V \\ 0 & M \end{bmatrix}\Big)$ of $T(\cdot)$ there exists a unique $F$ such that $([\,Y - XF;\ V - (SF - FM)\,], M)$ is a minimal invariant pair of $\hat T(\cdot)$.

14. The previous fact shows that working with $\hat T(\cdot)$ deflates the minimal invariant pair $(X, S)$ from $T(\cdot)$.

15. [Eff12] Effenberger combined the deflation in Fact 13 with the Jacobi-Davidson method to determine several eigenpairs of a nonlinear eigenvalue problem one after another in a safe way.

16. [GKS93] The pair $(X, S)$ is minimal if and only if $\begin{bmatrix} \lambda I - S \\ X \end{bmatrix}$ has full column rank for every $\lambda \in \mathbb{C}$ (or, equivalently, for every eigenvalue $\lambda$ of $S$).

17. [GKS93] Let $\hat\lambda$ be an eigenvalue of $T(\cdot)$ and $X := [x_0, \dots, x_{k-1}] \in \mathbb{C}^{n\times k}$ with $x_0 \neq 0$. Then $x_0, \dots, x_{k-1}$ is a Jordan chain at $\hat\lambda$ if and only if $(X, J_k(\hat\lambda))$ is an invariant pair of $T(\cdot)$, where $J_k(\hat\lambda)$ denotes a $k\times k$ Jordan block corresponding to $\hat\lambda$.

18. [BEK11] Let $\hat\lambda$ be an eigenvalue of $T(\cdot)$ and consider a matrix $X = [X^{(1)}, \dots, X^{(p)}]$, $X^{(i)} = [x_0^{(i)}, \dots, x_{m_i-1}^{(i)}]$, with $x_0^{(i)} \neq 0$. Then $x_0^{(i)}, \dots, x_{m_i-1}^{(i)}$ is a Jordan chain for every $i = 1, \dots, p$ if and only if $(X, J)$ with $J := \operatorname{diag}(J_{m_1}(\hat\lambda), \dots, J_{m_p}(\hat\lambda))$ is an invariant pair of $T(\cdot)$. Moreover, $(X, J)$ is minimal if and only if $x_0^{(1)}, \dots, x_0^{(p)}$ are linearly independent.

19. [SX12] Suppose that $(X, S)$ is a simple invariant pair of (115.7), $\hat\lambda$ an eigenvalue of $S$, and $J = Z^{-1}SZ$ the Jordan canonical form of $S$. Assume that $J$ has $m$ Jordan blocks corresponding to $\hat\lambda$, each of size $k_i \times k_i$, $1 \leq i \leq m$. Then there are exactly $m$ Jordan chains of $T(\cdot)$ corresponding to $\hat\lambda$, the length of the $i$th one is $k_i$, and the geometric multiplicity of $\hat\lambda$ is $m$. This fact demonstrates that the spectral structure of an eigenvalue $\hat\lambda$ of a matrix function $T(\cdot)$, including the algebraic, partial, and geometric multiplicities together with all Jordan chains, is completely represented in a simple invariant pair $(X, S)$ for which $\hat\lambda$ is an eigenvalue of $S$.

Examples:

1. For the quadratic eigenvalue problem (115.2) with eigenvalue $\hat\lambda = -1$ and eigenvector $x = [1; -1]$, the pair $(X, S) := (x, \hat\lambda)$ is a minimal invariant pair with minimality index 1, which is not simple, because the algebraic multiplicity of $\hat\lambda$ is 2 as an eigenvalue of $T(\lambda)x = 0$ and only 1 as an eigenvalue of $S$. The Jordan pair $(X_1, S_1)$ with $X_1 = \begin{bmatrix} 1 & 1 \\ -1 & -1 \end{bmatrix}$ and $S_1 = \begin{bmatrix} -1 & 1 \\ 0 & -1 \end{bmatrix}$ is a minimal invariant pair with minimality index 2, which is simple, and the same is true for the pair $(X_2, S_2)$ with $X_2 = \begin{bmatrix} 1 & 1 \\ -2 & -2 \end{bmatrix}$ and $S_2 = \operatorname{diag}(1, 2)$, and for $(X_3, S_3)$ with $X_3 := [X_1, X_2]$ and $S_3 := \operatorname{diag}(S_1, S_2)$.

115.8 The infinite Arnoldi method

Let $T: \Omega \to \mathbb{C}^{n\times n}$ be analytic on a neighborhood $\Omega$ of the origin, and assume that $\lambda = 0$ is not an eigenvalue of $T(\cdot)$. To determine eigenvalues close to 0, [JMM12] use the equivalence


More information

1. Introduction. In this paper we consider the large and sparse eigenvalue problem. Ax = λx (1.1) T (λ)x = 0 (1.2)

1. Introduction. In this paper we consider the large and sparse eigenvalue problem. Ax = λx (1.1) T (λ)x = 0 (1.2) A NEW JUSTIFICATION OF THE JACOBI DAVIDSON METHOD FOR LARGE EIGENPROBLEMS HEINRICH VOSS Abstract. The Jacobi Davidson method is known to converge at least quadratically if the correction equation is solved

More information

Efficient computation of transfer function dominant poles of large second-order dynamical systems

Efficient computation of transfer function dominant poles of large second-order dynamical systems Chapter 6 Efficient computation of transfer function dominant poles of large second-order dynamical systems Abstract This chapter presents a new algorithm for the computation of dominant poles of transfer

More information

Index. for generalized eigenvalue problem, butterfly form, 211

Index. for generalized eigenvalue problem, butterfly form, 211 Index ad hoc shifts, 165 aggressive early deflation, 205 207 algebraic multiplicity, 35 algebraic Riccati equation, 100 Arnoldi process, 372 block, 418 Hamiltonian skew symmetric, 420 implicitly restarted,

More information

MATH 583A REVIEW SESSION #1

MATH 583A REVIEW SESSION #1 MATH 583A REVIEW SESSION #1 BOJAN DURICKOVIC 1. Vector Spaces Very quick review of the basic linear algebra concepts (see any linear algebra textbook): (finite dimensional) vector space (or linear space),

More information

ECS231 Handout Subspace projection methods for Solving Large-Scale Eigenvalue Problems. Part I: Review of basic theory of eigenvalue problems

ECS231 Handout Subspace projection methods for Solving Large-Scale Eigenvalue Problems. Part I: Review of basic theory of eigenvalue problems ECS231 Handout Subspace projection methods for Solving Large-Scale Eigenvalue Problems Part I: Review of basic theory of eigenvalue problems 1. Let A C n n. (a) A scalar λ is an eigenvalue of an n n A

More information

Solution of eigenvalue problems. Subspace iteration, The symmetric Lanczos algorithm. Harmonic Ritz values, Jacobi-Davidson s method

Solution of eigenvalue problems. Subspace iteration, The symmetric Lanczos algorithm. Harmonic Ritz values, Jacobi-Davidson s method Solution of eigenvalue problems Introduction motivation Projection methods for eigenvalue problems Subspace iteration, The symmetric Lanczos algorithm Nonsymmetric Lanczos procedure; Implicit restarts

More information

EIGENVALUE PROBLEMS. EIGENVALUE PROBLEMS p. 1/4

EIGENVALUE PROBLEMS. EIGENVALUE PROBLEMS p. 1/4 EIGENVALUE PROBLEMS EIGENVALUE PROBLEMS p. 1/4 EIGENVALUE PROBLEMS p. 2/4 Eigenvalues and eigenvectors Let A C n n. Suppose Ax = λx, x 0, then x is a (right) eigenvector of A, corresponding to the eigenvalue

More information

Eigenvalues and Eigenvectors

Eigenvalues and Eigenvectors Chapter 1 Eigenvalues and Eigenvectors Among problems in numerical linear algebra, the determination of the eigenvalues and eigenvectors of matrices is second in importance only to the solution of linear

More information

EIGENVALUE PROBLEMS. Background on eigenvalues/ eigenvectors / decompositions. Perturbation analysis, condition numbers..

EIGENVALUE PROBLEMS. Background on eigenvalues/ eigenvectors / decompositions. Perturbation analysis, condition numbers.. EIGENVALUE PROBLEMS Background on eigenvalues/ eigenvectors / decompositions Perturbation analysis, condition numbers.. Power method The QR algorithm Practical QR algorithms: use of Hessenberg form and

More information

The quadratic eigenvalue problem (QEP) is to find scalars λ and nonzero vectors u satisfying

The quadratic eigenvalue problem (QEP) is to find scalars λ and nonzero vectors u satisfying I.2 Quadratic Eigenvalue Problems 1 Introduction The quadratic eigenvalue problem QEP is to find scalars λ and nonzero vectors u satisfying where Qλx = 0, 1.1 Qλ = λ 2 M + λd + K, M, D and K are given

More information

Nonlinear Eigenvalue Problems: An Introduction

Nonlinear Eigenvalue Problems: An Introduction Nonlinear Eigenvalue Problems: An Introduction Cedric Effenberger Seminar for Applied Mathematics ETH Zurich Pro*Doc Workshop Disentis, August 18 21, 2010 Cedric Effenberger (SAM, ETHZ) NLEVPs: An Introduction

More information

Matrix Algorithms. Volume II: Eigensystems. G. W. Stewart H1HJ1L. University of Maryland College Park, Maryland

Matrix Algorithms. Volume II: Eigensystems. G. W. Stewart H1HJ1L. University of Maryland College Park, Maryland Matrix Algorithms Volume II: Eigensystems G. W. Stewart University of Maryland College Park, Maryland H1HJ1L Society for Industrial and Applied Mathematics Philadelphia CONTENTS Algorithms Preface xv xvii

More information

Preconditioned inverse iteration and shift-invert Arnoldi method

Preconditioned inverse iteration and shift-invert Arnoldi method Preconditioned inverse iteration and shift-invert Arnoldi method Melina Freitag Department of Mathematical Sciences University of Bath CSC Seminar Max-Planck-Institute for Dynamics of Complex Technical

More information

Lecture 7. Econ August 18

Lecture 7. Econ August 18 Lecture 7 Econ 2001 2015 August 18 Lecture 7 Outline First, the theorem of the maximum, an amazing result about continuity in optimization problems. Then, we start linear algebra, mostly looking at familiar

More information

Numerical Methods for Solving Large Scale Eigenvalue Problems

Numerical Methods for Solving Large Scale Eigenvalue Problems Peter Arbenz Computer Science Department, ETH Zürich E-mail: arbenz@inf.ethz.ch arge scale eigenvalue problems, Lecture 2, February 28, 2018 1/46 Numerical Methods for Solving Large Scale Eigenvalue Problems

More information

Last Time. Social Network Graphs Betweenness. Graph Laplacian. Girvan-Newman Algorithm. Spectral Bisection

Last Time. Social Network Graphs Betweenness. Graph Laplacian. Girvan-Newman Algorithm. Spectral Bisection Eigenvalue Problems Last Time Social Network Graphs Betweenness Girvan-Newman Algorithm Graph Laplacian Spectral Bisection λ 2, w 2 Today Small deviation into eigenvalue problems Formulation Standard eigenvalue

More information

Elementary linear algebra

Elementary linear algebra Chapter 1 Elementary linear algebra 1.1 Vector spaces Vector spaces owe their importance to the fact that so many models arising in the solutions of specific problems turn out to be vector spaces. The

More information

Chapter 3 Transformations

Chapter 3 Transformations Chapter 3 Transformations An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Linear Transformations A function is called a linear transformation if 1. for every and 2. for every If we fix the bases

More information

ON SYLVESTER S LAW OF INERTIA FOR NONLINEAR EIGENVALUE PROBLEMS

ON SYLVESTER S LAW OF INERTIA FOR NONLINEAR EIGENVALUE PROBLEMS ON SYLVESTER S LAW OF INERTIA FOR NONLINEAR EIGENVALUE PROBLEMS ALEKSANDRA KOSTIĆ AND HEINRICH VOSS Key words. eigenvalue, variational characterization, principle, Sylvester s law of inertia AMS subject

More information

Lecture notes: Applied linear algebra Part 2. Version 1

Lecture notes: Applied linear algebra Part 2. Version 1 Lecture notes: Applied linear algebra Part 2. Version 1 Michael Karow Berlin University of Technology karow@math.tu-berlin.de October 2, 2008 First, some exercises: xercise 0.1 (2 Points) Another least

More information

MAT Linear Algebra Collection of sample exams

MAT Linear Algebra Collection of sample exams MAT 342 - Linear Algebra Collection of sample exams A-x. (0 pts Give the precise definition of the row echelon form. 2. ( 0 pts After performing row reductions on the augmented matrix for a certain system

More information

Numerical Solution of Linear Eigenvalue Problems

Numerical Solution of Linear Eigenvalue Problems Numerical Solution of Linear Eigenvalue Problems Jessica Bosch and Chen Greif Abstract We review numerical methods for computing eigenvalues of matrices We start by considering the computation of the dominant

More information

The Lanczos and conjugate gradient algorithms

The Lanczos and conjugate gradient algorithms The Lanczos and conjugate gradient algorithms Gérard MEURANT October, 2008 1 The Lanczos algorithm 2 The Lanczos algorithm in finite precision 3 The nonsymmetric Lanczos algorithm 4 The Golub Kahan bidiagonalization

More information

LARGE SPARSE EIGENVALUE PROBLEMS

LARGE SPARSE EIGENVALUE PROBLEMS LARGE SPARSE EIGENVALUE PROBLEMS Projection methods The subspace iteration Krylov subspace methods: Arnoldi and Lanczos Golub-Kahan-Lanczos bidiagonalization 14-1 General Tools for Solving Large Eigen-Problems

More information

Computation of eigenvalues and singular values Recall that your solutions to these questions will not be collected or evaluated.

Computation of eigenvalues and singular values Recall that your solutions to these questions will not be collected or evaluated. Math 504, Homework 5 Computation of eigenvalues and singular values Recall that your solutions to these questions will not be collected or evaluated 1 Find the eigenvalues and the associated eigenspaces

More information

University of Colorado at Denver Mathematics Department Applied Linear Algebra Preliminary Exam With Solutions 16 January 2009, 10:00 am 2:00 pm

University of Colorado at Denver Mathematics Department Applied Linear Algebra Preliminary Exam With Solutions 16 January 2009, 10:00 am 2:00 pm University of Colorado at Denver Mathematics Department Applied Linear Algebra Preliminary Exam With Solutions 16 January 2009, 10:00 am 2:00 pm Name: The proctor will let you read the following conditions

More information

Ir O D = D = ( ) Section 2.6 Example 1. (Bottom of page 119) dim(v ) = dim(l(v, W )) = dim(v ) dim(f ) = dim(v )

Ir O D = D = ( ) Section 2.6 Example 1. (Bottom of page 119) dim(v ) = dim(l(v, W )) = dim(v ) dim(f ) = dim(v ) Section 3.2 Theorem 3.6. Let A be an m n matrix of rank r. Then r m, r n, and, by means of a finite number of elementary row and column operations, A can be transformed into the matrix ( ) Ir O D = 1 O

More information

DELFT UNIVERSITY OF TECHNOLOGY

DELFT UNIVERSITY OF TECHNOLOGY DELFT UNIVERSITY OF TECHNOLOGY REPORT -09 Computational and Sensitivity Aspects of Eigenvalue-Based Methods for the Large-Scale Trust-Region Subproblem Marielba Rojas, Bjørn H. Fotland, and Trond Steihaug

More information

Eigenvalue and Eigenvector Problems

Eigenvalue and Eigenvector Problems Eigenvalue and Eigenvector Problems An attempt to introduce eigenproblems Radu Trîmbiţaş Babeş-Bolyai University April 8, 2009 Radu Trîmbiţaş ( Babeş-Bolyai University) Eigenvalue and Eigenvector Problems

More information

Robust successive computation of eigenpairs for nonlinear eigenvalue problems

Robust successive computation of eigenpairs for nonlinear eigenvalue problems Robust successive computation of eigenpairs for nonlinear eigenvalue problems C. Effenberger July 21, 2012 Newton-based methods are well-established techniques for solving nonlinear eigenvalue problems.

More information

Nonlinear Eigenvalue Problems: A Challenge for Modern Eigenvalue Methods

Nonlinear Eigenvalue Problems: A Challenge for Modern Eigenvalue Methods Nonlinear Eigenvalue Problems: A Challenge for Modern Eigenvalue Methods Volker Mehrmann Heinrich Voss November 29, 2004 Abstract We discuss the state of the art in numerical solution methods for large

More information

The German word eigen is cognate with the Old English word āgen, which became owen in Middle English and own in modern English.

The German word eigen is cognate with the Old English word āgen, which became owen in Middle English and own in modern English. Chapter 4 EIGENVALUE PROBLEM The German word eigen is cognate with the Old English word āgen, which became owen in Middle English and own in modern English. 4.1 Mathematics 4.2 Reduction to Upper Hessenberg

More information

1. Introduction. We consider nonlinear eigenvalue problems of the form

1. Introduction. We consider nonlinear eigenvalue problems of the form ROBUST SUCCESSIVE COMPUTATION OF EIGENPAIRS FOR NONLINEAR EIGENVALUE PROBLEMS C. EFFENBERGER Abstract. Newton-based methods are well-established techniques for solving nonlinear eigenvalue problems. If

More information

Perturbation theory for eigenvalues of Hermitian pencils. Christian Mehl Institut für Mathematik TU Berlin, Germany. 9th Elgersburg Workshop

Perturbation theory for eigenvalues of Hermitian pencils. Christian Mehl Institut für Mathematik TU Berlin, Germany. 9th Elgersburg Workshop Perturbation theory for eigenvalues of Hermitian pencils Christian Mehl Institut für Mathematik TU Berlin, Germany 9th Elgersburg Workshop Elgersburg, March 3, 2014 joint work with Shreemayee Bora, Michael

More information

Then x 1,..., x n is a basis as desired. Indeed, it suffices to verify that it spans V, since n = dim(v ). We may write any v V as r

Then x 1,..., x n is a basis as desired. Indeed, it suffices to verify that it spans V, since n = dim(v ). We may write any v V as r Practice final solutions. I did not include definitions which you can find in Axler or in the course notes. These solutions are on the terse side, but would be acceptable in the final. However, if you

More information

LARGE SPARSE EIGENVALUE PROBLEMS. General Tools for Solving Large Eigen-Problems

LARGE SPARSE EIGENVALUE PROBLEMS. General Tools for Solving Large Eigen-Problems LARGE SPARSE EIGENVALUE PROBLEMS Projection methods The subspace iteration Krylov subspace methods: Arnoldi and Lanczos Golub-Kahan-Lanczos bidiagonalization General Tools for Solving Large Eigen-Problems

More information

Review of some mathematical tools

Review of some mathematical tools MATHEMATICAL FOUNDATIONS OF SIGNAL PROCESSING Fall 2016 Benjamín Béjar Haro, Mihailo Kolundžija, Reza Parhizkar, Adam Scholefield Teaching assistants: Golnoosh Elhami, Hanjie Pan Review of some mathematical

More information

Foundations of Matrix Analysis

Foundations of Matrix Analysis 1 Foundations of Matrix Analysis In this chapter we recall the basic elements of linear algebra which will be employed in the remainder of the text For most of the proofs as well as for the details, the

More information

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces. Math 350 Fall 2011 Notes about inner product spaces In this notes we state and prove some important properties of inner product spaces. First, recall the dot product on R n : if x, y R n, say x = (x 1,...,

More information

YORK UNIVERSITY. Faculty of Science Department of Mathematics and Statistics MATH M Test #1. July 11, 2013 Solutions

YORK UNIVERSITY. Faculty of Science Department of Mathematics and Statistics MATH M Test #1. July 11, 2013 Solutions YORK UNIVERSITY Faculty of Science Department of Mathematics and Statistics MATH 222 3. M Test # July, 23 Solutions. For each statement indicate whether it is always TRUE or sometimes FALSE. Note: For

More information

Mathematical foundations - linear algebra

Mathematical foundations - linear algebra Mathematical foundations - linear algebra Andrea Passerini passerini@disi.unitn.it Machine Learning Vector space Definition (over reals) A set X is called a vector space over IR if addition and scalar

More information

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space.

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space. Chapter 1 Preliminaries The purpose of this chapter is to provide some basic background information. Linear Space Hilbert Space Basic Principles 1 2 Preliminaries Linear Space The notion of linear space

More information

av 1 x 2 + 4y 2 + xy + 4z 2 = 16.

av 1 x 2 + 4y 2 + xy + 4z 2 = 16. 74 85 Eigenanalysis The subject of eigenanalysis seeks to find a coordinate system, in which the solution to an applied problem has a simple expression Therefore, eigenanalysis might be called the method

More information

Iterative methods for Linear System

Iterative methods for Linear System Iterative methods for Linear System JASS 2009 Student: Rishi Patil Advisor: Prof. Thomas Huckle Outline Basics: Matrices and their properties Eigenvalues, Condition Number Iterative Methods Direct and

More information

Lecture Notes for Inf-Mat 3350/4350, Tom Lyche

Lecture Notes for Inf-Mat 3350/4350, Tom Lyche Lecture Notes for Inf-Mat 3350/4350, 2007 Tom Lyche August 5, 2007 2 Contents Preface vii I A Review of Linear Algebra 1 1 Introduction 3 1.1 Notation............................... 3 2 Vectors 5 2.1 Vector

More information

Math 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination

Math 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination Math 0, Winter 07 Final Exam Review Chapter. Matrices and Gaussian Elimination { x + x =,. Different forms of a system of linear equations. Example: The x + 4x = 4. [ ] [ ] [ ] vector form (or the column

More information

The Eigenvalue Problem: Perturbation Theory

The Eigenvalue Problem: Perturbation Theory Jim Lambers MAT 610 Summer Session 2009-10 Lecture 13 Notes These notes correspond to Sections 7.2 and 8.1 in the text. The Eigenvalue Problem: Perturbation Theory The Unsymmetric Eigenvalue Problem Just

More information

Introduction to Iterative Solvers of Linear Systems

Introduction to Iterative Solvers of Linear Systems Introduction to Iterative Solvers of Linear Systems SFB Training Event January 2012 Prof. Dr. Andreas Frommer Typeset by Lukas Krämer, Simon-Wolfgang Mages and Rudolf Rödl 1 Classes of Matrices and their

More information

Linear Algebra. Workbook

Linear Algebra. Workbook Linear Algebra Workbook Paul Yiu Department of Mathematics Florida Atlantic University Last Update: November 21 Student: Fall 2011 Checklist Name: A B C D E F F G H I J 1 2 3 4 5 6 7 8 9 10 xxx xxx xxx

More information

Bare-bones outline of eigenvalue theory and the Jordan canonical form

Bare-bones outline of eigenvalue theory and the Jordan canonical form Bare-bones outline of eigenvalue theory and the Jordan canonical form April 3, 2007 N.B.: You should also consult the text/class notes for worked examples. Let F be a field, let V be a finite-dimensional

More information

33AH, WINTER 2018: STUDY GUIDE FOR FINAL EXAM

33AH, WINTER 2018: STUDY GUIDE FOR FINAL EXAM 33AH, WINTER 2018: STUDY GUIDE FOR FINAL EXAM (UPDATED MARCH 17, 2018) The final exam will be cumulative, with a bit more weight on more recent material. This outline covers the what we ve done since the

More information

Chap 3. Linear Algebra

Chap 3. Linear Algebra Chap 3. Linear Algebra Outlines 1. Introduction 2. Basis, Representation, and Orthonormalization 3. Linear Algebraic Equations 4. Similarity Transformation 5. Diagonal Form and Jordan Form 6. Functions

More information

Lecture notes: Applied linear algebra Part 1. Version 2

Lecture notes: Applied linear algebra Part 1. Version 2 Lecture notes: Applied linear algebra Part 1. Version 2 Michael Karow Berlin University of Technology karow@math.tu-berlin.de October 2, 2008 1 Notation, basic notions and facts 1.1 Subspaces, range and

More information

NORMS ON SPACE OF MATRICES

NORMS ON SPACE OF MATRICES NORMS ON SPACE OF MATRICES. Operator Norms on Space of linear maps Let A be an n n real matrix and x 0 be a vector in R n. We would like to use the Picard iteration method to solve for the following system

More information

IMPORTANT DEFINITIONS AND THEOREMS REFERENCE SHEET

IMPORTANT DEFINITIONS AND THEOREMS REFERENCE SHEET IMPORTANT DEFINITIONS AND THEOREMS REFERENCE SHEET This is a (not quite comprehensive) list of definitions and theorems given in Math 1553. Pay particular attention to the ones in red. Study Tip For each

More information

The following definition is fundamental.

The following definition is fundamental. 1. Some Basics from Linear Algebra With these notes, I will try and clarify certain topics that I only quickly mention in class. First and foremost, I will assume that you are familiar with many basic

More information

Math 504 (Fall 2011) 1. (*) Consider the matrices

Math 504 (Fall 2011) 1. (*) Consider the matrices Math 504 (Fall 2011) Instructor: Emre Mengi Study Guide for Weeks 11-14 This homework concerns the following topics. Basic definitions and facts about eigenvalues and eigenvectors (Trefethen&Bau, Lecture

More information

MATHEMATICS 217 NOTES

MATHEMATICS 217 NOTES MATHEMATICS 27 NOTES PART I THE JORDAN CANONICAL FORM The characteristic polynomial of an n n matrix A is the polynomial χ A (λ) = det(λi A), a monic polynomial of degree n; a monic polynomial in the variable

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

IMPORTANT DEFINITIONS AND THEOREMS REFERENCE SHEET

IMPORTANT DEFINITIONS AND THEOREMS REFERENCE SHEET IMPORTANT DEFINITIONS AND THEOREMS REFERENCE SHEET This is a (not quite comprehensive) list of definitions and theorems given in Math 1553. Pay particular attention to the ones in red. Study Tip For each

More information

Analysis Preliminary Exam Workshop: Hilbert Spaces

Analysis Preliminary Exam Workshop: Hilbert Spaces Analysis Preliminary Exam Workshop: Hilbert Spaces 1. Hilbert spaces A Hilbert space H is a complete real or complex inner product space. Consider complex Hilbert spaces for definiteness. If (, ) : H H

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 18 Outline

More information

CHAPTER 5 : NONLINEAR EIGENVALUE PROBLEMS

CHAPTER 5 : NONLINEAR EIGENVALUE PROBLEMS CHAPTER 5 : NONLINEAR EIGENVALUE PROBLEMS Heinrich Voss voss@tu-harburg.de Hamburg University of Technology Institute of Mathematics TUHH Heinrich Voss Nonlinear eigenvalue problems Eigenvalue problems

More information

MICHIEL E. HOCHSTENBACH

MICHIEL E. HOCHSTENBACH VARIATIONS ON HARMONIC RAYLEIGH RITZ FOR STANDARD AND GENERALIZED EIGENPROBLEMS MICHIEL E. HOCHSTENBACH Abstract. We present several variations on the harmonic Rayleigh Ritz method. First, we introduce

More information

Math 4A Notes. Written by Victoria Kala Last updated June 11, 2017

Math 4A Notes. Written by Victoria Kala Last updated June 11, 2017 Math 4A Notes Written by Victoria Kala vtkala@math.ucsb.edu Last updated June 11, 2017 Systems of Linear Equations A linear equation is an equation that can be written in the form a 1 x 1 + a 2 x 2 +...

More information

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms (February 24, 2017) 08a. Operators on Hilbert spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/real/notes 2016-17/08a-ops

More information

Numerical Methods - Numerical Linear Algebra

Numerical Methods - Numerical Linear Algebra Numerical Methods - Numerical Linear Algebra Y. K. Goh Universiti Tunku Abdul Rahman 2013 Y. K. Goh (UTAR) Numerical Methods - Numerical Linear Algebra I 2013 1 / 62 Outline 1 Motivation 2 Solving Linear

More information

Krylov subspace projection methods

Krylov subspace projection methods I.1.(a) Krylov subspace projection methods Orthogonal projection technique : framework Let A be an n n complex matrix and K be an m-dimensional subspace of C n. An orthogonal projection technique seeks

More information

Online Exercises for Linear Algebra XM511

Online Exercises for Linear Algebra XM511 This document lists the online exercises for XM511. The section ( ) numbers refer to the textbook. TYPE I are True/False. Lecture 02 ( 1.1) Online Exercises for Linear Algebra XM511 1) The matrix [3 2

More information

Arnoldi Methods in SLEPc

Arnoldi Methods in SLEPc Scalable Library for Eigenvalue Problem Computations SLEPc Technical Report STR-4 Available at http://slepc.upv.es Arnoldi Methods in SLEPc V. Hernández J. E. Román A. Tomás V. Vidal Last update: October,

More information

Chapter 7. Canonical Forms. 7.1 Eigenvalues and Eigenvectors

Chapter 7. Canonical Forms. 7.1 Eigenvalues and Eigenvectors Chapter 7 Canonical Forms 7.1 Eigenvalues and Eigenvectors Definition 7.1.1. Let V be a vector space over the field F and let T be a linear operator on V. An eigenvalue of T is a scalar λ F such that there

More information

Eigenvalues, Eigenvectors, and Diagonalization

Eigenvalues, Eigenvectors, and Diagonalization Week12 Eigenvalues, Eigenvectors, and Diagonalization 12.1 Opening Remarks 12.1.1 Predicting the Weather, Again Let us revisit the example from Week 4, in which we had a simple model for predicting the

More information

Normed & Inner Product Vector Spaces

Normed & Inner Product Vector Spaces Normed & Inner Product Vector Spaces ECE 174 Introduction to Linear & Nonlinear Optimization Ken Kreutz-Delgado ECE Department, UC San Diego Ken Kreutz-Delgado (UC San Diego) ECE 174 Fall 2016 1 / 27 Normed

More information

Alternative correction equations in the Jacobi-Davidson method

Alternative correction equations in the Jacobi-Davidson method Chapter 2 Alternative correction equations in the Jacobi-Davidson method Menno Genseberger and Gerard Sleijpen Abstract The correction equation in the Jacobi-Davidson method is effective in a subspace

More information

EIGENVALUE PROBLEMS (EVP)

EIGENVALUE PROBLEMS (EVP) EIGENVALUE PROBLEMS (EVP) (Golub & Van Loan: Chaps 7-8; Watkins: Chaps 5-7) X.-W Chang and C. C. Paige PART I. EVP THEORY EIGENVALUES AND EIGENVECTORS Let A C n n. Suppose Ax = λx with x 0, then x is a

More information

Lecture 7: Positive Semidefinite Matrices

Lecture 7: Positive Semidefinite Matrices Lecture 7: Positive Semidefinite Matrices Rajat Mittal IIT Kanpur The main aim of this lecture note is to prepare your background for semidefinite programming. We have already seen some linear algebra.

More information

Two-sided Eigenvalue Algorithms for Modal Approximation

Two-sided Eigenvalue Algorithms for Modal Approximation Two-sided Eigenvalue Algorithms for Modal Approximation Master s thesis submitted to Faculty of Mathematics at Chemnitz University of Technology presented by: Supervisor: Advisor: B.sc. Patrick Kürschner

More information