Chapter 3

Newton methods

3.1 Linear and nonlinear eigenvalue problems

When solving linear eigenvalue problems we want to find values λ ∈ C such that λI − A is singular. Here A ∈ F^{n×n} is a given real or complex matrix. Equivalently, we want to find values λ ∈ C such that there is a nontrivial (nonzero) vector x that satisfies

(3.1)    (A − λI)x = 0    ⟺    Ax = λx.

In the linear eigenvalue problem (3.1) the eigenvalue λ appears linearly. However, as the unknown λ is multiplied with the unknown vector x, the problem is in fact nonlinear. We have n+1 unknowns λ, x_1, …, x_n that are not uniquely determined by the n equations in (3.1). We have noticed earlier that the length of the eigenvector x is not determined. Therefore, we add to (3.1) a further equation that fixes it. The straightforward condition is

(3.2)    ‖x‖² = x*x = 1,

which determines x up to a complex scalar of modulus 1, in the real case up to the sign ±1. Another condition to normalize x is to request that

(3.3)    c^T x = 1

for some nonzero vector c. Eq. (3.3) is linear in x and thus simpler. However, the combined equations (3.1) and (3.3) are nonlinear anyway. Furthermore, c must be chosen such that it has a strong component in the (unknown) direction of the sought eigenvector. This requires some knowledge about the solution.

In nonlinear eigenvalue problems we have to find values λ ∈ C such that

(3.4)    A(λ)x = 0,

where A(λ) is a matrix whose elements depend on λ in a nonlinear way. An example is a matrix polynomial,

(3.5)    A(λ) = Σ_{k=0}^{d} λ^k A_k,    A_k ∈ F^{n×n}.

The linear eigenvalue problem (3.1) is the special case with d = 1, A(λ) = A_0 + λA_1, A_0 = A, A_1 = −I.
Quadratic eigenvalue problems are matrix polynomials in which λ appears squared [6],

(3.6)    Ax + λKx + λ²Mx = 0.

Matrix polynomials can be linearized, i.e., they can be transformed into a linear eigenvalue problem of size dn, where d is the degree of the polynomial and n is the size of the matrix. Thus, the quadratic eigenvalue problem (3.6) can be transformed into a linear eigenvalue problem of size 2n. Setting y = λx we get

    [ A  O ] [ x ]       [ −K  −M ] [ x ]
    [ O  I ] [ y ]  =  λ [  I   O ] [ y ]

or

    [ A  K ] [ x ]       [ O  −M ] [ x ]
    [ O  I ] [ y ]  =  λ [ I   O ] [ y ].

Notice that other linearizations are possible [2, 6]. Notice also the relation with the transformation of higher order ODEs to systems of first order ODEs [5, p. 478].

Instead of looking at the nonlinear system (3.1) (complemented with (3.3) or (3.2)) we may look at the nonlinear scalar equation

(3.7)    f(λ) := det A(λ) = 0

and apply some zero finder. Here the question arises how to compute f(λ) and in particular f'(λ) = (d/dλ) det A(λ). This is what we discuss next.

3.2 Zeros of the determinant

We first consider the computation of eigenvalues, and subsequently eigenvectors, by means of computing zeros of the determinant det(A(λ)). Gaussian elimination with partial pivoting (GEPP) applied to A(λ) provides the decomposition

(3.8)    P(λ)A(λ) = L(λ)U(λ),

where P(λ) is the permutation matrix due to partial pivoting, L(λ) is a lower unit triangular matrix, and U(λ) is an upper triangular matrix. From well-known properties of the determinant function, equation (3.8) gives

    det P(λ) · det A(λ) = det L(λ) · det U(λ).

Taking the particular structures of the factors in (3.8) into account, we get

(3.9)    f(λ) = det A(λ) = ±1 · ∏_{i=1}^{n} u_{ii}(λ).

The derivative of det A(λ) is

(3.10)   f'(λ) = ± Σ_{i=1}^{n} u'_{ii}(λ) ∏_{j≠i} u_{jj}(λ)
               = ± ( Σ_{i=1}^{n} u'_{ii}(λ)/u_{ii}(λ) ) ∏_{j=1}^{n} u_{jj}(λ)
               = ( Σ_{i=1}^{n} u'_{ii}(λ)/u_{ii}(λ) ) f(λ).

How can we compute the derivatives u'_{ii} of the diagonal elements of U(λ)?
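As a concrete illustration of (3.8)–(3.9), f(λ) can be evaluated as the signed product of the pivots produced by GEPP. The following Python sketch is our own (the function name and test matrix are assumptions, not from the text); the derivative f'(λ) is the subject of the next section.

```python
import numpy as np

def det_via_gepp(A):
    """Evaluate det(A) as the signed product of the pivots u_ii of
    Gaussian elimination with partial pivoting, cf. (3.8)-(3.9)."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    sign = 1.0
    for k in range(n - 1):
        p = k + np.argmax(np.abs(U[k:, k]))   # partial pivoting
        if p != k:
            U[[k, p]] = U[[p, k]]             # a row swap flips the sign
            sign = -sign
        for i in range(k + 1, n):
            U[i, k:] -= (U[i, k] / U[k, k]) * U[k, k:]
    return sign * np.prod(np.diag(U))

# f(lam) = det(A - lam*I) for a small standard eigenvalue problem
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
f = lambda lam: det_via_gepp(A - lam * np.eye(3))
print(f(0.0))   # det(A) = 18, up to roundoff
```

Each evaluation of f(λ) costs one full factorization, which motivates the cheaper Hessenberg-based approach discussed below.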
3.2.1 Algorithmic differentiation

A clever way to compute derivatives of a function is by algorithmic differentiation, see e.g. [1]. Here we assume that we have an algorithm available that computes the value f(λ) of a function f, given the input argument λ. By algorithmic differentiation a new algorithm is obtained that computes, besides f(λ), the derivative f'(λ).

The idea is easily explained by means of the Horner scheme to evaluate polynomials. Let

    f(z) = Σ_{i=0}^{n} c_i z^i

be a polynomial of degree n. f(z) can be written in the form

    f(z) = c_0 + z(c_1 + z(c_2 + ⋯ + z(c_n)⋯)),

which gives rise to the recurrence

    p_n := c_n,
    p_i := z p_{i+1} + c_i,    i = n−1, n−2, …, 0,
    f(z) := p_0.

Note that each of the p_i can be considered as a function (polynomial) in z. We use the above recurrence to determine the derivatives dp_i:

    dp_n := 0,    p_n := c_n,
    dp_i := p_{i+1} + z dp_{i+1},    p_i := z p_{i+1} + c_i,    i = n−1, n−2, …, 0,
    f'(z) := dp_0,    f(z) := p_0.

Here, to each assignment in the original algorithm the assignment for its derivative is prepended.

We can proceed in a similar fashion for computing det A(λ). We need to be able to compute the derivatives a'_{ij}, though. Then we can differentiate each single assignment in the GEPP algorithm. If we restrict ourselves to the standard eigenvalue problem Ax = λx, then A(λ) = A − λI and a'_{ij} = −δ_{ij}, with δ_{ij} the Kronecker delta.

3.2.2 Hyman's algorithm

In a Newton iteration we have to compute the determinant for possibly many values λ. Using the factorization (3.8) leads to computational costs of (2/3)n³ flops (floating point operations) for each factorization, i.e., per iteration step. If this algorithm were used to compute all eigenvalues, an excessive number of flops would be required. Can we do better?

The strategy is to transform A by a similarity transformation into a Hessenberg matrix, i.e., a matrix H whose entries below the first subdiagonal are zero,

    h_{ij} = 0,    i > j+1.
Any matrix A can be transformed into a similar Hessenberg matrix H by means of a sequence of elementary unitary matrices called Householder transformations. The details are given in Section 4.3.
Let S*AS = H, where S is unitary. S is the product of the just mentioned Householder transformations. Then

    Ax = λx    ⟺    Hy = λy,    x = Sy.

So A and H have equal eigenvalues (A and H are similar) and the eigenvectors are related by S.

We now assume that H is unreduced, i.e., h_{i+1,i} ≠ 0 for all i. Otherwise we can split Hx = λx into smaller problems. Let λ be an eigenvalue of H and

(3.11)    (H − λI)x = 0,

i.e., x is an eigenvector of H associated with the eigenvalue λ. Then the last component of x cannot be zero, x_n ≠ 0. The proof is by contradiction. Assume x_n = 0. Then (for n = 4)

    [ h_11−λ   h_12     h_13     h_14   ] [ x_1 ]   [ 0 ]
    [ h_21     h_22−λ   h_23     h_24   ] [ x_2 ] = [ 0 ]
    [          h_32     h_33−λ   h_34   ] [ x_3 ]   [ 0 ]
    [                   h_43     h_44−λ ] [ 0   ]   [ 0 ].

The last equation reads h_{n,n−1} x_{n−1} + (h_{nn}−λ) · 0 = 0, from which x_{n−1} = 0 follows, since we assumed that H is unreduced, i.e., that h_{n,n−1} ≠ 0. In exactly the same way we obtain x_{n−2} = 0, …, x_1 = 0. But the zero vector cannot be an eigenvector. Therefore, x_n cannot be zero. Without loss of generality we can set x_n = 1.

We continue to expose the procedure for problem size n = 4. If λ is an eigenvalue, then there are x_i, 1 ≤ i < n, such that

(3.12)    [ h_11−λ   h_12     h_13     h_14   ] [ x_1 ]   [ 0 ]
          [ h_21     h_22−λ   h_23     h_24   ] [ x_2 ] = [ 0 ]
          [          h_32     h_33−λ   h_34   ] [ x_3 ]   [ 0 ]
          [                   h_43     h_44−λ ] [ 1   ]   [ 0 ].

If λ is not an eigenvalue, then we determine the x_i such that

(3.13)    [ h_11−λ   h_12     h_13     h_14   ] [ x_1 ]   [ p(λ) ]
          [ h_21     h_22−λ   h_23     h_24   ] [ x_2 ] = [ 0    ]
          [          h_32     h_33−λ   h_34   ] [ x_3 ]   [ 0    ]
          [                   h_43     h_44−λ ] [ 1   ]   [ 0    ].

We determine the n−1 numbers x_{n−1}, x_{n−2}, …, x_1 by

    x_i = −(1/h_{i+1,i}) ( (h_{i+1,i+1}−λ) x_{i+1} + h_{i+1,i+2} x_{i+2} + ⋯ + h_{i+1,n} x_n ),    i = n−1, …, 1.

The x_i are functions of λ; in fact, x_i ∈ P_{n−i}. The first equation in (3.13) gives

(3.14)    (h_{1,1}−λ) x_1 + h_{1,2} x_2 + ⋯ + h_{1,n} x_n = p(λ).
Eq. (3.13) is obtained by looking at the last column of

    [ h_11−λ   h_12     h_13     h_14   ] [ 1  0  0  x_1 ]   [ h_11−λ   h_12     h_13     p(λ) ]
    [ h_21     h_22−λ   h_23     h_24   ] [ 0  1  0  x_2 ] = [ h_21     h_22−λ   h_23     0    ]
    [          h_32     h_33−λ   h_34   ] [ 0  0  1  x_3 ]   [          h_32     h_33−λ   0    ]
    [                   h_43     h_44−λ ] [ 0  0  0  1   ]   [                   h_43     0    ].

Taking determinants yields

    det(H − λI) = ( (−1)^{n−1} ∏_{i=1}^{n−1} h_{i+1,i} ) p(λ) = c⁻¹ p(λ).

So, p(λ) is a constant multiple of the determinant of H − λI. Therefore, we can solve p(λ) = 0 instead of det(H − λI) = 0. Since the quantities x_1, x_2, …, x_{n−1}, and thus p(λ), are differentiable functions of λ, we can get p'(λ) by algorithmic differentiation. For i = n−1, …, 1 we have

    x'_i = −(1/h_{i+1,i}) ( −x_{i+1} + (h_{i+1,i+1}−λ) x'_{i+1} + h_{i+1,i+2} x'_{i+2} + ⋯ + h_{i+1,n} x'_n ).

Finally,

    c f'(λ) = p'(λ) = −x_1 + (h_{1,1}−λ) x'_1 + h_{1,2} x'_2 + ⋯ + h_{1,n} x'_n.

Algorithm 3.1 implements Hyman's algorithm; it returns p(λ) and p'(λ) for a given input parameter λ [7].

Algorithm 3.1 Hyman's algorithm
 1: Choose a value λ.
 2: x_n := 1; dx_n := 0;
 3: for i = n−1 downto 1 do
 4:     s = (λ − h_{i+1,i+1}) x_{i+1};  ds = x_{i+1} + (λ − h_{i+1,i+1}) dx_{i+1};
 5:     for j = i+2 to n do
 6:         s = s − h_{i+1,j} x_j;  ds = ds − h_{i+1,j} dx_j;
 7:     end for
 8:     x_i = s/h_{i+1,i};  dx_i = ds/h_{i+1,i};
 9: end for
10: s = −(λ − h_{1,1}) x_1;  ds = −x_1 − (λ − h_{1,1}) dx_1;
11: for i = 2 to n do
12:     s = s + h_{1,i} x_i;  ds = ds + h_{1,i} dx_i;
13: end for
14: p(λ) := s;  p'(λ) := ds;

This algorithm computes p(λ) = c · det(H − λI) and its derivative p'(λ) of a Hessenberg matrix H in O(n²) operations. Inside a Newton iteration the new iterate is obtained by

    λ_{k+1} = λ_k − p(λ_k)/p'(λ_k),    k = 0, 1, …
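Algorithm 3.1 translates almost line by line into code. The following Python transcription is our own sketch (0-based indexing, and the 2×2 test matrix is an assumption for illustration), combined with the Newton update λ_{k+1} = λ_k − p(λ_k)/p'(λ_k):

```python
import numpy as np

def hyman(H, lam):
    """Return p(lam), p'(lam) and the vector x for an unreduced
    Hessenberg matrix H (0-based transcription of Algorithm 3.1)."""
    n = H.shape[0]
    x = np.zeros(n, dtype=complex)       # complex so that complex shifts work
    dx = np.zeros(n, dtype=complex)
    x[n - 1] = 1.0                       # x_n := 1, dx_n := 0
    for i in range(n - 2, -1, -1):       # i = n-1 downto 1 (1-based)
        s = (lam - H[i + 1, i + 1]) * x[i + 1]
        ds = x[i + 1] + (lam - H[i + 1, i + 1]) * dx[i + 1]
        for j in range(i + 2, n):
            s -= H[i + 1, j] * x[j]
            ds -= H[i + 1, j] * dx[j]
        x[i] = s / H[i + 1, i]           # requires h_{i+1,i} != 0 (unreduced)
        dx[i] = ds / H[i + 1, i]
    s = (H[0, 0] - lam) * x[0]           # first row yields p(lam)
    ds = -x[0] + (H[0, 0] - lam) * dx[0]
    for i in range(1, n):
        s += H[0, i] * x[i]
        ds += H[0, i] * dx[i]
    return s, ds, x

def newton_hyman(H, lam, tol=1e-12, maxit=50):
    """Newton iteration on p(lam) = 0; O(n^2) work per step."""
    for _ in range(maxit):
        p, dp, x = hyman(H, lam)
        if abs(p) < tol:
            break
        lam -= p / dp
    return lam, x

H = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 1 and 3
lam, x = newton_hyman(H, 0.5)
print(lam)                               # converges to the eigenvalue 1 of H
```

The returned vector x = (x_1, …, x_{n−1}, 1)^T is simultaneously an approximate eigenvector, cf. the remark below on eigenvector recovery.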
The factor c cancels. An initial guess λ_0 has to be chosen to start the iteration. It is clear that a good guess reduces the iteration count of the Newton method. The iteration is considered converged if f(λ_k) ≈ 0. The vector x = (x_1, x_2, …, x_{n−1}, 1)^T is then a good approximation of the corresponding eigenvector.

Remark 3.1. Higher order derivatives of f can be computed in an analogous fashion. Higher order zero finders (e.g. Laguerre's zero finder) are then applicable [3].

3.2.3 Computing multiple zeros

If we have found a zero z of f(x) = 0 and want to compute another one, we want to avoid recomputing the already found z. We can explicitly deflate the zero by defining a new function

(3.15)    f_1(x) := f(x)/(x − z),

and apply our method of choice to f_1. This procedure can in particular be done with polynomials. The coefficients of f_1 are, however, very sensitive to inaccuracies in z. We can proceed similarly for multiple zeros z_1, …, z_m. Explicit deflation is not recommended and often not feasible, since f is in general not given explicitly.

For the reciprocal Newton correction of f_1 in (3.15) we get

    f'_1(x)/f_1(x) = ( f'(x)/(x−z) − f(x)/(x−z)² ) / ( f(x)/(x−z) ) = f'(x)/f(x) − 1/(x−z).

Then a Newton correction becomes

(3.16)    x_{k+1} = x_k − 1 / ( f'(x_k)/f(x_k) − 1/(x_k − z) ),

and similarly for multiple zeros z_1, …, z_m. Working with (3.16) is called implicit deflation. Here, f is not modified. In this way errors in z are not propagated to f_1.

3.3 Newton methods for the constrained matrix problem

We consider the nonlinear eigenvalue problem (3.4) equipped with the normalization condition (3.3),

(3.17)    T(λ)x = 0,    c^T x = 1,

where c is some given vector. At a solution (x, λ), x ≠ 0, the matrix T(λ) is singular. Note that x is defined only up to a (nonzero) multiplicative factor. c^T x = 1 is just one way to normalize x; another one would be ‖x‖₂ = 1, cf. the next section.

Solving (3.17) is equivalent to finding a zero of the nonlinear function f(x, λ),

(3.18)    f(x, λ) = [ T(λ)x     ]   [ 0 ]
                    [ c^T x − 1 ] = [ 0 ].
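A small numerical illustration of implicit deflation (3.16) may help; the cubic and its zeros below are our own choice for the demonstration. With the zero z = 1 of f(x) = (x−1)(x−2)(x−4) already found, the modified Newton correction avoids reconverging to 1 even when started close to it, and f itself is never modified.

```python
def f(x):  return (x - 1.0) * (x - 2.0) * (x - 4.0)
def df(x): return 3.0 * x**2 - 14.0 * x + 14.0   # f(x) = x^3 - 7x^2 + 14x - 8

def newton_implicit_deflation(x, zeros, tol=1e-12, maxit=50):
    """Newton iteration (3.16): the reciprocal correction f'/f is reduced
    by 1/(x - z) for every zero z found so far; f itself stays untouched."""
    for _ in range(maxit):
        fx = f(x)
        if abs(fx) < tol:
            break
        corr = df(x) / fx - sum(1.0 / (x - z) for z in zeros)
        x -= 1.0 / corr
    return x

# Plain Newton started at 1.2 would fall back into the known zero z = 1;
# the deflated iteration converges to the next zero instead.
print(newton_implicit_deflation(1.2, [1.0]))   # converges to 2
```

Since f'/f − 1/(x−z) equals f'_1/f_1 exactly, the iteration behaves like Newton applied to the deflated function without ever forming its coefficients.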
To apply Newton's zero finding method to (3.18) we need the Jacobian J of f,

(3.19)    J(x,λ) ≡ ∂f(x,λ)/∂(x,λ) = [ T(λ)   T'(λ)x ]
                                     [ c^T    0      ].

Here, T'(λ) denotes the (elementwise) derivative of T with respect to λ. Then, a step of Newton's iteration is given by

(3.20)    [ x_{k+1} ]   [ x_k ]
          [ λ_{k+1} ] = [ λ_k ] − J(x_k, λ_k)⁻¹ f(x_k, λ_k),

or, with the abbreviations T_k := T(λ_k) and T'_k := T'(λ_k), an equation for the Newton corrections is obtained,

(3.21)    [ T_k    T'_k x_k ] [ x_{k+1} − x_k ]     [ T_k x_k     ]
          [ c^T    0        ] [ λ_{k+1} − λ_k ] = − [ c^T x_k − 1 ].

If x_k is normalized, c^T x_k = 1, then the second equation in (3.21) yields

(3.22)    c^T (x_{k+1} − x_k) = 0    ⟹    c^T x_{k+1} = 1.

The first equation in (3.21) gives

    T_k (x_{k+1} − x_k) + (λ_{k+1} − λ_k) T'_k x_k = −T_k x_k    ⟹    T_k x_{k+1} = −(λ_{k+1} − λ_k) T'_k x_k.

We introduce the auxiliary vector u_{k+1} by

(3.23)    T_k u_{k+1} = T'_k x_k.

Note that

(3.24)    x_{k+1} = −(λ_{k+1} − λ_k) u_{k+1}.

So u_{k+1} points in the right direction; it just needs to be normalized. Premultiplying (3.24) by c^T and using (3.22) gives

    1 = c^T x_{k+1} = −(λ_{k+1} − λ_k) c^T u_{k+1},

or

(3.25)    λ_{k+1} = λ_k − 1/(c^T u_{k+1}).

In summary, we get the procedure stated in Algorithm 3.2. If the linear eigenvalue problem Ax = λx is solved in this way, then T(λ) = A − λI and T'(λ)x = −x. In each iteration step a linear system has to be solved, which requires the factorization of a matrix.

We now change the way we normalize x. Problem (3.17) becomes

(3.26)    T(λ)x = 0,    ‖x‖₂ = 1,

with the corresponding nonlinear system of equations

(3.27)    f(x, λ) = [ T(λ)x        ]   [ 0 ]
                    [ ½(x^T x − 1) ] = [ 0 ].
Algorithm 3.2 Newton iteration for solving (3.18)
1: Choose a starting vector x_0 ∈ R^n with c^T x_0 = 1. Set k := 0.
2: repeat
3:     Solve T(λ_k) u_{k+1} = T'(λ_k) x_k for u_{k+1};    (3.23)
4:     μ_k := c^T u_{k+1};
5:     x_{k+1} := u_{k+1}/μ_k;    (Normalize u_{k+1})
6:     λ_{k+1} := λ_k − 1/μ_k;    (3.25)
7:     k := k + 1;
8: until some convergence criterion is satisfied

The Jacobian now is

(3.28)    J(x,λ) ≡ ∂f(x,λ)/∂(x,λ) = [ T(λ)   T'(λ)x ]
                                     [ x^T    0      ].

The Newton step (3.20) is changed into

(3.29)    [ T_k     T'_k x_k ] [ x_{k+1} − x_k ]     [ T_k x_k          ]
          [ x_k^T   0        ] [ λ_{k+1} − λ_k ] = − [ ½(x_k^T x_k − 1) ].

If x_k is normalized, ‖x_k‖ = 1, then the second equation in (3.29) gives

(3.30)    x_k^T (x_{k+1} − x_k) = 0    ⟹    x_k^T x_{k+1} = 1.

The correction Δx_k := x_{k+1} − x_k is orthogonal to the current approximation. The first equation in (3.29) is identical with the first equation in (3.21). Again, we employ the auxiliary vector u_{k+1} defined in (3.23). Premultiplying (3.24) by x_k^T and using (3.30) gives

    1 = x_k^T x_{k+1} = −(λ_{k+1} − λ_k) x_k^T u_{k+1},

or

(3.31)    λ_{k+1} = λ_k − 1/(x_k^T u_{k+1}).

The next iterate x_{k+1} is obtained by normalizing u_{k+1},

(3.32)    x_{k+1} = u_{k+1}/‖u_{k+1}‖.

Algorithm 3.3 Newton iteration for solving (3.27)
1: Choose a starting vector x_0 ∈ R^n with ‖x_0‖ = 1. Set k := 0.
2: repeat
3:     Solve T(λ_k) u_{k+1} = T'(λ_k) x_k for u_{k+1};    (3.23)
4:     μ_k := x_k^T u_{k+1};
5:     λ_{k+1} := λ_k − 1/μ_k;    (3.31)
6:     x_{k+1} := u_{k+1}/‖u_{k+1}‖;    (Normalize u_{k+1})
7:     k := k + 1;
8: until some convergence criterion is satisfied
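As a minimal sketch (our own, not from the text), Algorithm 3.3 specializes nicely to the linear eigenvalue problem, where T(λ) = A − λI and T'(λ) = −I; the matrix and starting values below are assumptions chosen for illustration:

```python
import numpy as np

def newton_evp(A, lam, x, tol=1e-10, maxit=50):
    """Algorithm 3.3 for T(lam) = A - lam*I, so T'(lam) = -I.
    One linear system with T(lam_k) is solved per step, cf. (3.23)."""
    x = x / np.linalg.norm(x)
    for _ in range(maxit):
        # T_k u_{k+1} = T'_k x_k = -x_k
        u = np.linalg.solve(A - lam * np.eye(A.shape[0]), -x)
        mu = x @ u                       # mu_k = x_k^T u_{k+1}
        lam = lam - 1.0 / mu             # (3.31)
        x = u / np.linalg.norm(u)        # (3.32)
        if np.linalg.norm(A @ x - lam * x) < tol:
            break
    return lam, x

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])              # eigenvalues 1 and 3
lam, x = newton_evp(A, 0.8, np.array([1.0, 0.0]))
print(lam)                              # converges to the nearby eigenvalue 1
```

In this special case the step is an inverse-iteration-like solve with the shifted matrix, with the eigenvalue update λ_{k+1} = λ_k − 1/μ_k supplied by the Newton derivation above.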
3.4 Successive linear approximations

Ruhe [4] suggested the following method, which is not derived as a Newton method. It is based on an expansion of T(λ) at some approximate eigenvalue λ_k,

(3.33)    T(λ)x ≈ (T(λ_k) − ϑ T'(λ_k)) x = 0,    λ = λ_k − ϑ.

Equation (3.33) is a generalized eigenvalue problem with eigenvalue ϑ. If λ_k is a good approximation of an eigenvalue, then it is straightforward to compute the eigenvalue ϑ smallest in modulus of

(3.34)    T(λ_k)x = ϑ T'(λ_k)x

and to update λ_k by λ_{k+1} = λ_k − ϑ.

Algorithm 3.4 Algorithm of successive linear problems
1: Start with an approximation λ_1 of an eigenvalue of T(λ).
2: for k = 1, 2, … do
3:     Solve the linear eigenvalue problem T(λ_k)u = ϑ T'(λ_k)u.
4:     Choose an eigenvalue ϑ smallest in modulus.
5:     λ_{k+1} := λ_k − ϑ;
6:     if ϑ is small enough then
7:         Return with eigenvalue λ_{k+1} and associated eigenvector u/‖u‖.
8:     end if
9: end for

Remark 3.2. If T is twice continuously differentiable, and λ is an eigenvalue of problem (3.4) such that T(λ) is singular and 0 is an algebraically simple eigenvalue of T'(λ)⁻¹T(λ), then the method in Algorithm 3.4 converges quadratically towards λ.

Bibliography

[1] P. Arbenz and W. Gander, Solving nonlinear eigenvalue problems by algorithmic differentiation, Computing, 36 (1986), pp. 205–215.

[2] D. S. Mackey, N. Mackey, C. Mehl, and V. Mehrmann, Structured polynomial eigenvalue problems: Good vibrations from good linearizations, SIAM J. Matrix Anal. Appl., 28 (2006), pp. 1029–1051.

[3] B. Parlett, Laguerre's method applied to the matrix eigenvalue problem, Math. Comp., 18 (1964), pp. 464–485.

[4] A. Ruhe, Algorithms for the nonlinear eigenvalue problem, SIAM J. Numer. Anal., 10 (1973), pp. 674–689.

[5] G. Strang, Introduction to Applied Mathematics, Wellesley-Cambridge Press, Wellesley, 1986.

[6] F. Tisseur and K. Meerbergen, The quadratic eigenvalue problem, SIAM Rev., 43 (2001), pp. 235–286.

[7] R. C. Ward, The QR algorithm and Hyman's method on vector computers, Math. Comp., 30 (1976), pp. 132–142.