Functions of a Matrix and Krylov Matrices
Hongguo Xu¹

Abstract. For a given nonderogatory matrix $A$, formulas are given for functions of $A$ in terms of Krylov matrices of $A$. Relations between the coefficients of a polynomial of $A$ and the generating vector of a Krylov matrix of $A$ are provided. With the formulas, linear transformations between Krylov matrices and functions of $A$ are introduced, and associated algebraic properties are derived. Hessenberg reduction forms are revisited, equipped with appropriate inner products, and related properties and matrix factorizations are given.

Keywords: Krylov matrix, Krylov subspace, function of a matrix, polynomial of a matrix, Hessenberg matrix, companion matrix

AMS subject classification: 15A21, 65F15

1 Introduction

Let $A \in \mathbb{C}^{n,n}$. The Krylov matrix of $A$ generated by a vector $b \in \mathbb{C}^n$ is given by $[\, b \;\; Ab \;\; \cdots \;\; A^{n-1} b \,] \in \mathbb{C}^{n,n}$. Given a scalar function $f(t)$ that is well defined on the spectrum of $A$, one defines a matrix $f(A) \in \mathbb{C}^{n,n}$, usually called a function of $A$; see, e.g., [10, 11]. Both functions of a matrix and Krylov matrices play a fundamental role in matrix computations. They are key tools for understanding and developing numerical methods for solving eigenvalue problems and systems of linear equations, including the QR algorithm and Krylov subspace methods; see, e.g., [8, 9, 22]. Functions of a matrix arise in a variety of applications, and the development of numerical algorithms for them remains a challenging topic; see the recently published book [10] for details. In this paper we investigate Krylov matrices and functions of a matrix. We focus on the situation where the associated matrix $A$ is nonderogatory, i.e., the geometric multiplicity of every eigenvalue of $A$ is one. Based on a simple observation, we provide formulas that express a function of $A$ in terms of Krylov matrices and vice versa. We use these formulas to study the relations and properties of these two objects.
Krylov matrices and functions of a matrix have been studied extensively over the past several decades. Still, it seems that their behavior has not been fully understood. The goal of this study is to interpret the existing properties from a new angle and to provide new insight that may be useful for the development of numerical methods.

¹ Department of Mathematics, University of Kansas, Lawrence, KS 66045, USA. Email address: xu@math.ku.edu. Partially supported by the Senior Visiting Scholar Fund of Fudan University Key Laboratory and the University of Kansas General Research Fund allocation #. Part of the work was done while this author was visiting the School of Mathematics, Fudan University, whose hospitality is gratefully acknowledged.
The paper is organized as follows. In Section 2 we give definitions of functions of a matrix and of Krylov matrices, together with some basic properties needed to derive the main results. In Section 3 we show relations between functions of a matrix and Krylov matrices by providing explicit formulas. In Section 4 we interpret the relations in terms of linear transformations and subspaces. In Section 5 we study Hessenberg reduction forms and derive some related properties and matrix factorizations. In Section 6 we give conclusions.

The spectrum of $A$ is denoted by $\lambda(A)$. $\|\cdot\|$ stands for both the Euclidean norm of a vector and the spectral norm of a matrix. $I_n$ is the $n \times n$ identity matrix, and $e_j$ is the $j$th column of $I_n$. $N_r$ is the $r \times r$ nilpotent matrix with $1$ on the superdiagonal and $0$ elsewhere, and $N_r(\lambda) = \lambda I_r + N_r$. A square matrix is called unreduced upper Hessenberg if it is upper Hessenberg with nonzero subdiagonal elements. $\mathcal{P}_m$ denotes the space of polynomials of degree no greater than $m$.

2 Functions of a matrix and Krylov matrices

In this paper we only consider functions defined as follows. Let $A$ be a square matrix with Jordan canonical form
$$Z^{-1} A Z = \operatorname{diag}\big( N_{r_{1,1}}(\lambda_1), \ldots, N_{r_{1,s_1}}(\lambda_1), \ldots, N_{r_{\eta,1}}(\lambda_\eta), \ldots, N_{r_{\eta,s_\eta}}(\lambda_\eta) \big),$$
where $\lambda_1, \ldots, \lambda_\eta \in \lambda(A)$ are distinct. Let $f(t)$ be a scalar function. If for each $\lambda_i$, $f(\lambda_i)$ and the derivatives $f^{(k)}(\lambda_i)$ ($k = 1, \ldots, \max_{1 \le j \le s_i} r_{i,j} - 1$) are defined, we define
$$f(A) := Z \operatorname{diag}\big( f(N_{r_{1,1}}(\lambda_1)), \ldots, f(N_{r_{1,s_1}}(\lambda_1)), \ldots, f(N_{r_{\eta,1}}(\lambda_\eta)), \ldots, f(N_{r_{\eta,s_\eta}}(\lambda_\eta)) \big) Z^{-1},$$
where
$$f(N_{r_{i,j}}(\lambda_i)) = \begin{bmatrix} f(\lambda_i) & \dfrac{f'(\lambda_i)}{1!} & \cdots & \dfrac{f^{(r_{i,j}-1)}(\lambda_i)}{(r_{i,j}-1)!} \\ & \ddots & \ddots & \vdots \\ & & \ddots & \dfrac{f'(\lambda_i)}{1!} \\ & & & f(\lambda_i) \end{bmatrix}.$$
For a scalar polynomial $p(t) = \sum_{j=0}^{m} \alpha_j t^j \in \mathcal{P}_m$ we simply have $p(A) = \sum_{j=0}^{m} \alpha_j A^j$. We provide below some basic properties of functions of a matrix.

Proposition 2.1 ([10, 11]) Suppose that $\mu$ is the degree of the minimal polynomial of $A$.
For any function $f(t)$ such that $f(A)$ is defined, there exists a unique polynomial $p(t) \in \mathcal{P}_{\mu-1}$ such that $f(A) = p(A)$. The unique polynomial $p(t)$ can be constructed by Lagrange-Hermite interpolation with $p^{(k)}(\lambda_i) = f^{(k)}(\lambda_i)$ for $k = 0, 1, \ldots, \max_{1 \le j \le s_i} r_{i,j} - 1$ and $i = 1, 2, \ldots, \eta$.

Proposition 2.2 ([10, 11]) (i) $A f(A) = f(A) A$. (ii) $f(X^{-1} A X) = X^{-1} f(A) X$.
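Both properties of Proposition 2.2 are easy to confirm numerically. The following NumPy sketch is an illustration we add (not part of the paper); the cubic $f(t) = t^3 - 2t + 1$ and the random matrices are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))          # a generic X is nonsingular

# f(t) = t^3 - 2t + 1, so f(M) = M^3 - 2M + I
def f(M):
    return np.linalg.matrix_power(M, 3) - 2 * M + np.eye(n)

# (i) A f(A) = f(A) A
print(np.allclose(A @ f(A), f(A) @ A))                      # True

# (ii) f(X^{-1} A X) = X^{-1} f(A) X
Xinv = np.linalg.inv(X)
print(np.allclose(f(Xinv @ A @ X), Xinv @ f(A) @ X))
```

For a polynomial $f$ both identities follow term by term from $A \cdot A^j = A^j \cdot A$ and $(X^{-1} A X)^j = X^{-1} A^j X$.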
The Schur-Parlett algorithm for computing $f(A)$ is based on these two properties; see [16, 17, 12, 3] and [10, Ch. 4, 9]. The properties will be used frequently in the rest of the paper.

Suppose that $A \in \mathbb{C}^{n,n}$ and $b \in \mathbb{C}^n$. We define the Krylov matrix
$$K_{n,m}(A, b) = [\, b \;\; Ab \;\; \cdots \;\; A^{m-1} b \,] \in \mathbb{C}^{n,m}.$$
When $m = n$, we will simply use the notation $K_n(A, b)$ or $K(A, b)$.

A polynomial of degree no greater than $m-1$ is characterized uniquely by its $m$ coefficients. We use the following polynomial notation to emphasize the coefficients.

Definition 2.3 For $x = [x_1, \ldots, x_m]^T \in \mathbb{C}^m$,
$$p(t; x) := x_1 + x_2 t + \cdots + x_m t^{m-1} \in \mathcal{P}_{m-1}.$$
It is obvious that
$$p(A; x) b = K_{n,m}(A, b) x, \qquad x \in \mathbb{C}^m. \tag{1}$$
So $K_{n,m}(A, b) x = 0$ if and only if $p(A; x) b = 0$.

The minimal polynomial of $b$ with respect to $A$ is a nonzero polynomial $p(t)$ of the lowest degree such that $p(A) b = 0$ [22]. Let $\nu$ be the degree of this minimal polynomial $p(t)$. Then, based on (1), $\operatorname{rank} K_{n,m}(A, b) = \min\{m, \nu\}$. More precisely, $b, Ab, \ldots, A^{\nu-1} b$ are linearly independent, and for any $k \ge \nu$, $A^k b$ can be expressed as a linear combination of $b, Ab, \ldots, A^{\nu-1} b$ [19, Ch. VI].

Proposition 2.4 Suppose $\operatorname{rank} K_{n,m}(A, b) = r$. Then there exists a nonsingular matrix $X = [X_1, X_2] \in \mathbb{C}^{n,n}$ with $X_1 \in \mathbb{C}^{n,r}$ and $\operatorname{range} X_1 = \operatorname{range} K_{n,r}(A, b)$ such that
$$K_{n,m}(A, b) = X_1 [\, R_{11} \;\; R_{12} \,], \tag{2}$$
where $R_{11} \in \mathbb{C}^{r,r}$ is nonsingular, and
$$X^{-1} A X = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}, \qquad X^{-1} b = \begin{bmatrix} b_1 \\ 0 \end{bmatrix} \tag{3}$$
with $A_{11} \in \mathbb{C}^{r,r}$ and $b_1 \in \mathbb{C}^r$. Moreover, $R_{11}$ is upper triangular if and only if $A_{11}$ is unreduced upper Hessenberg and $X^{-1} b = \gamma e_1$.

Proof. The case $b = 0$ is trivial, so we only consider $b \ne 0$. Since $\operatorname{rank} K_{n,m}(A, b) = r$, based on the above arguments,
$$K_{n,m}(A, b) = K_{n,r}(A, b) [\, I_r \;\; T \,]$$
for some $T \in \mathbb{C}^{r,m-r}$. If $X_1 \in \mathbb{C}^{n,r}$ satisfies $\operatorname{range} X_1 = \operatorname{range} K_{n,r}(A, b)$, then $X_1 = K_{n,r}(A, b) Z$ for some nonsingular matrix $Z \in \mathbb{C}^{r,r}$. So we have (2) with $[\, R_{11} \;\; R_{12} \,] = Z^{-1} [\, I_r \;\; T \,]$.
Clearly, $b = X_1 b_1$ with $b_1 = Z^{-1} e_1$. Because $A^r b$ is a linear combination of $b, Ab, \ldots, A^{r-1} b$, we have
$$A K_{n,r}(A, b) = K_{n,r}(A, b) C_r, \qquad C_r = \begin{bmatrix} 0 & & & c_{1,r} \\ 1 & \ddots & & c_{2,r} \\ & \ddots & 0 & \vdots \\ & & 1 & c_{r,r} \end{bmatrix}.$$
Then $A X_1 = X_1 A_{11}$ with $A_{11} = Z^{-1} C_r Z$. So for a nonsingular matrix $X = [X_1, X_2]$ we have (3).

Suppose that $R_{11}$ in (2) is upper triangular. Then $Z = R_{11}^{-1}$ is also upper triangular, so $A_{11}$ is unreduced upper Hessenberg. Because $b_1 = Z^{-1} e_1 = R_{11} e_1 = r_{11} e_1$, we have $X^{-1} b = r_{11} e_1 =: \gamma e_1$. Conversely, if $A_{11}$ is unreduced upper Hessenberg and $X^{-1} b = \gamma e_1$, then using (3),
$$K_{n,r}(A, b) = X \left( \gamma \begin{bmatrix} e_1 & A_{11} e_1 & \cdots & A_{11}^{r-1} e_1 \end{bmatrix} \right) =: [\, X_1 \;\; X_2 \,] \begin{bmatrix} R_{11} \\ 0 \end{bmatrix} = X_1 R_{11},$$
and it is straightforward to show that $R_{11}$ is nonsingular and upper triangular.

We now turn to a square Krylov matrix $K(A, b)$ ($m = n$). Suppose that the characteristic polynomial of $A$ is
$$\det(\lambda I - A) =: \lambda^n - c_n \lambda^{n-1} - \cdots - c_2 \lambda - c_1.$$
We define the companion matrix¹ of $A$ as
$$C = \begin{bmatrix} 0 & & & c_1 \\ 1 & \ddots & & c_2 \\ & \ddots & 0 & \vdots \\ & & 1 & c_n \end{bmatrix}. \tag{4}$$

Proposition 2.5 $A X = X C$ if and only if $X = K(A, b)$ for some $b \in \mathbb{C}^n$.

Proof. For sufficiency, it is straightforward to show $A K(A, b) = K(A, b) C$ for any $b \in \mathbb{C}^n$, using the Cayley-Hamilton theorem. For necessity, $X = K(A, b)$ follows simply by comparing the columns of the matrices $A X$ and $X C$ with $b = X e_1$.

The rank of $K(A, b)$ is $\nu$, the degree of the minimal polynomial of $b$ with respect to $A$, which is no greater than the degree of the minimal polynomial of $A$. For $K(A, b)$ to be nonsingular, it is necessary that the minimal polynomial of $A$ coincide with its characteristic polynomial, or equivalently, that $A$ be nonderogatory, i.e., that the geometric multiplicity of every eigenvalue be one [6, 7]. Still, the nonsingularity of $K(A, b)$ depends on the vector $b$. There are numerous equivalent conditions based on canonical forms [1] and on controllability from linear system theory [13, 18, 4, 5]. We list a few of them in the following proposition.

Proposition 2.6 Suppose $A \in \mathbb{C}^{n,n}$ and $b \in \mathbb{C}^n$. The following statements are equivalent.
¹ Usually the transpose of $C$ is also called a companion matrix of $A$. In this paper the companion matrix of $A$ always refers to $C$.
(a) $K(A, b)$ is nonsingular.
(b) $y^* b \ne 0$ for any vector $y \in \mathbb{C}^n$ satisfying $y^* A = \lambda y^*$ for some $\lambda \in \mathbb{C}$.
(c) $\operatorname{rank} [\, A - \lambda I \;\; b \,] = n$ for all $\lambda \in \mathbb{C}$.
(d) $b \ne 0$ and there exists a nonsingular (or unitary) $X$ such that $X^{-1} b = \gamma e_1$ and $X^{-1} A X$ is unreduced upper Hessenberg.
(e) The only polynomial $p(t; x) \in \mathcal{P}_{n-1}$ that satisfies $p(A; x) b = 0$ is $p(t; 0) \equiv 0$.

Proof. (a), (b), (c) are just three equivalent conditions for $(A, b)$ to be controllable [13, 18, 4, 5]. The equivalence between (a) and (d) is from Proposition 2.4. The equivalence between (a) and (e) can be shown by using (1).

Generically, a square matrix $A$ is nonderogatory. When $A$ is nonderogatory, the left eigenvector space of each eigenvalue is one-dimensional. So the set of vectors $b$ satisfying Proposition 2.6 (b) is dense in $\mathbb{C}^n$, and the Krylov matrix $K(A, b)$ is generically nonsingular.

When $A$ is derogatory, it is impossible for $A$ to be similar to its companion matrix. Instead, $A$ has a Frobenius form [22],
$$X^{-1} A X = \operatorname{diag}(C_1, \ldots, C_q),$$
where $C_1, \ldots, C_q$ are in companion matrix form, and the characteristic polynomial of each $C_j$ divides the characteristic polynomials of $C_1, \ldots, C_{j-1}$. Suppose the size of $C_j$ is $n_j \times n_j$ for $j = 1, \ldots, q$. Then the similarity matrix $X$ can be expressed as
$$X = [\, K_{n,n_1}(A, b_1), \ldots, K_{n,n_q}(A, b_q) \,]$$
for some $b_1, \ldots, b_q \in \mathbb{C}^n$, which generalizes the result in Proposition 2.5. In this paper, however, we focus on the nonderogatory case only, although some results can be generalized to the derogatory case by using the above observation.

3 Relations between Krylov matrices and functions of a matrix

The formulation of a function of $A$ in terms of Krylov matrices of $A$ is based on the following simple observation. For any $A \in \mathbb{C}^{n,n}$ and $b \in \mathbb{C}^n$, using the fact $f(A) A = A f(A)$ (Proposition 2.2), we have
$$f(A) K(A, b) = K(A, d), \qquad d = f(A) b. \tag{5}$$
We first use this fact to show relations between polynomials $p(A; x)$ and Krylov matrices.

Theorem 3.1 For any $x \in \mathbb{C}^n$,
$$K(A, d) = p(A; x) K(A, b), \qquad d = K(A, b) x. \tag{6}$$
If $K(A, b)$ is nonsingular (so $A$ is necessarily nonderogatory), then for any $d \in \mathbb{C}^n$,
$$p(A; x) = K(A, d) K(A, b)^{-1}, \qquad x = K(A, b)^{-1} d, \tag{7}$$
and in this case (6) can also be expressed as
$$K(A, d) = K(A, b)\, p(C; x), \qquad d = K(A, b) x, \tag{8}$$
where $C$ is the companion matrix of $A$.

Proof. Formula (6) follows simply from (5) with $f(t) = p(t; x)$, and (1). Formula (7) is simply from (6) and the nonsingularity of $K(A, b)$. Formula (8) follows from
$$p(A; x) K(A, b) = p(K(A, b)\, C\, K(A, b)^{-1}; x)\, K(A, b) = K(A, b)\, p(C; x),$$
based on Proposition 2.2 (ii) and Proposition 2.5.

We now consider a general function $f(t)$, for which we have the following results.

Theorem 3.2 Suppose that $K(A, b)$ is nonsingular and $C$ is the companion matrix of $A$. Let $f(t)$ be a scalar function and $\tau \in \mathbb{C}$ such that $f(\tau A)$ is defined. Then
$$f(\tau A) = K(A, d(\tau)) K(A, b)^{-1}, \qquad d(\tau) = f(\tau A) b, \tag{9}$$
and
$$f(\tau A) = p(A; x(\tau)), \qquad x(\tau) = K(A, b)^{-1} f(\tau A) b = f(\tau C) e_1. \tag{10}$$
Also, when $\tau \ne 0$,
$$f(\tau A) = p(\tau A; y(\tau)), \qquad y(\tau) = K(\tau A, b)^{-1} f(\tau A) b = f(C(\tau)) e_1, \tag{11}$$
where
$$C(\tau) = \begin{bmatrix} 0 & & & \tau^n c_1 \\ 1 & \ddots & & \tau^{n-1} c_2 \\ & \ddots & 0 & \vdots \\ & & 1 & \tau c_n \end{bmatrix} \tag{12}$$
is the companion matrix of $\tau A$.

Proof. Because $f(\tau A)$ and $A^i$ ($i = 0, \ldots, n-1$) commute,
$$f(\tau A) = f(\tau A) K(A, b) K(A, b)^{-1} = K(A, f(\tau A) b) K(A, b)^{-1} = K(A, d(\tau)) K(A, b)^{-1}.$$
By (9) and (7), $f(\tau A) = K(A, d(\tau)) K(A, b)^{-1} = p(A; x(\tau))$, where $x(\tau) = K(A, b)^{-1} d(\tau) = K(A, b)^{-1} f(\tau A) b = f(\tau C) e_1$. For (11), define $D = \operatorname{diag}(1, \tau, \ldots, \tau^{n-1})$, which is nonsingular. From the simple relation
$$K(\tau A, b) = K(A, b) D, \tag{13}$$
$K(\tau A, b)$ is also nonsingular. By applying (5) with $A$ replaced by $\tau A$, we get
$$f(\tau A) = K(\tau A, d(\tau)) K(\tau A, b)^{-1}, \qquad d(\tau) = f(\tau A) b.$$
Again, using (7) we have $f(\tau A) = p(\tau A; y(\tau))$, where
$$y(\tau) = K(\tau A, b)^{-1} d(\tau) = K(\tau A, b)^{-1} f(\tau A) b = f(C(\tau)) e_1,$$
and $C(\tau)$ is the companion matrix of $\tau A$. It remains to derive the formula for $C(\tau)$. By using (13) we have
$$C(\tau) = K(\tau A, b)^{-1} (\tau A) K(\tau A, b) = D^{-1} K(A, b)^{-1} (\tau A) K(A, b) D = \tau D^{-1} C D,$$
which has the form (12).

Note that when $\tau = 0$, (11) may not hold, since the left-hand side only requires $f(0)$ to be defined, while on the right-hand side $f(C(0))$ has to be defined. Even if (11) holds, it usually does not give a polynomial corresponding to $f(0)$ of minimal degree. This is because $y(0) = f(C(0)) e_1$ may not be a scalar multiple of $e_1$, resulting in a polynomial $p(t; y(0))$ of degree greater than $0$. Note also that when $\tau = 1$, (10) and (11) are identical.

The following results follow directly from Theorem 3.2.

Theorem 3.3 Suppose that $K(A, b)$ is nonsingular and $C$ is the companion matrix of $A$. Let $f(t)$ be a scalar function such that $f(A)$ is defined. Then
$$f(A) = K(A, d) K(A, b)^{-1}, \qquad d = f(A) b, \tag{14}$$
and
$$f(A) = p(A; x), \qquad x = K(A, b)^{-1} f(A) b = f(C) e_1. \tag{15}$$
If in addition $f(A)$ is nonsingular, then
$$[f(A)]^{-1} = p(A; y), \qquad y = K(A, d)^{-1} b. \tag{16}$$

Proof. The first part is from Theorem 3.2 with $\tau = 1$. So we only need to prove (16). If $f(A)$ is nonsingular, then from (14), $K(A, d)$ is also nonsingular. So $[f(A)]^{-1} = K(A, b) K(A, d)^{-1}$, and (16) follows from (15).

Formula (15) not only restates the result given in Proposition 2.1 in the nonderogatory case, i.e., $f(A) \in \mathcal{P}_{n-1}(A)$, but also provides an explicit formula for the polynomial $p(t; x)$. Formula (16) shows that the same properties hold for the inverse of $f(A)$. Formula (14) holds true for all $f(A)$. When $f(t)$ is a rational function, we have the following additional formula.

Theorem 3.4 Suppose that $K(A, b)$ is nonsingular and $r(t) = p(t)/q(t)$ is a rational function with $q(A)$ nonsingular.
Then
$$r(A) = K(A, d_1) K(A, d_2)^{-1}, \qquad d_1 = p(A) b, \quad d_2 = q(A) b, \tag{17}$$
and
$$r(A) = p(A; x), \qquad x = K(A, d_2)^{-1} p(A) b. \tag{18}$$
Proof. Since $K(A, b)$ is nonsingular,
$$r(A) = p(A) q(A)^{-1} = p(A) K(A, b) K(A, b)^{-1} q(A)^{-1} = K(A, p(A) b) K(A, q(A) b)^{-1},$$
which gives (17). The relations in (18) are from (17) and (7).

Remark 3.5 Computing a Krylov matrix $K(A, b)$ has the same cost as a matrix-matrix multiplication. So if $f(A) b$ is available, computing $f(A)$ with (14) or (17) requires two matrix-matrix multiplications and one matrix equation solve. In general, computing the vector $f(A) b$ is far from trivial, but it is straightforward when $f(t)$ is a polynomial or a rational function. So this approach may have advantages in symbolic or exact arithmetic computations. For numerical computations, however, it is well known that a Krylov matrix is usually ill-conditioned. A method that uses (14) or (17) directly may be numerically unstable.

The above formulations may be used to derive some interesting results. For instance, let $f(t) = e^t$. Then from Theorem 3.2, we have
$$e^{\tau A} = K(A, d(\tau)) K(A, b)^{-1}, \qquad d(\tau) = e^{\tau A} b.$$
This shows that the fundamental matrix of the linear system $dx/d\tau = A x$ is completely determined by the solution of the initial value problem $dx/d\tau = A x$, $x(0) = b$.

At the end of this section, we consider a slight generalization of Krylov matrices. Let $g_j(t) = p(t; \gamma_j) \in \mathcal{P}_{n-1}$, $\gamma_j \in \mathbb{C}^n$, for $j = 1, \ldots, n$. Define
$$G(A, b) = [\, g_1(A) b \;\; g_2(A) b \;\; \cdots \;\; g_n(A) b \,]. \tag{19}$$
By using (1),
$$g_j(A) b = K(A, b) \gamma_j$$
for $j = 1, \ldots, n$. Define $\Gamma = [\, \gamma_1 \;\; \cdots \;\; \gamma_n \,]$. Then
$$G(A, b) = K(A, b) \Gamma. \tag{20}$$
Clearly, for any $\Gamma \in \mathbb{C}^{n,n}$ a matrix $G(A, b)$ can be generated by using (20). When $K(A, b)$ is nonsingular, this defines an isomorphism from $\Gamma$ to $G(A, b)$. Note also that $G(A, b)$ is nonsingular if and only if both $K(A, b)$ and $\Gamma$ are nonsingular.

Corollary 3.6 Suppose that $G(A, b)$ defined in (19) with $g_1(t), \ldots, g_n(t) \in \mathcal{P}_{n-1}$ is nonsingular. Let $f(t)$ be a scalar function and $\tau \in \mathbb{C}$ a scalar such that $f(\tau A)$ is defined. Then
$$f(\tau A) = G(A, d(\tau)) G(A, b)^{-1}, \qquad d(\tau) = f(\tau A) b. \tag{21}$$

Proof. The proof is trivial.

Remark 3.7 All the results established in this section apply to matrices and vectors defined over any field, as long as $f(\tau A)$ is defined and satisfies $f(\tau A) A = A f(\tau A)$.
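The formulas of this section can be checked numerically. The following NumPy sketch is an illustration we add (not part of the paper); the shift $3I$ and the rational function $r(t) = (t^2 + 1)/(t + 4)$ are arbitrary choices made to keep the matrices involved safely nonsingular for this random instance.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n)) + 3 * np.eye(n)   # shift keeps A and q(A) nonsingular
b = rng.standard_normal(n)
I = np.eye(n)

def krylov(M, v):
    # K(M, v) = [v, Mv, ..., M^{n-1} v]
    return np.column_stack([np.linalg.matrix_power(M, j) @ v for j in range(n)])

K = krylov(A, b)

# Theorem 3.3, (14) with f(t) = 1/t: f(A) = A^{-1}, and d = f(A) b solves A d = b.
d = np.linalg.solve(A, b)
F = krylov(A, d) @ np.linalg.inv(K)
print(np.allclose(F, np.linalg.inv(A), atol=1e-6))

# Theorem 3.4, (17) with r(t) = (t^2 + 1)/(t + 4):
pA = A @ A + I
qA = A + 4 * I
rA = pA @ np.linalg.inv(qA)
lhs = krylov(A, pA @ b) @ np.linalg.inv(krylov(A, qA @ b))
print(np.allclose(lhs, rA, atol=1e-6))
```

As Remark 3.5 warns, this is for illustration only: for larger $n$ the Krylov matrices become severely ill-conditioned and the explicit inverses lose accuracy.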
4 Connections to subspaces and linear transformations

In this section, we interpret Krylov matrices and polynomials of a matrix in terms of linear transformations. For any vectors $b_1, b_2 \in \mathbb{C}^n$ and scalars $\alpha, \beta$, we have
$$K(A, \alpha b_1 + \beta b_2) = \alpha K(A, b_1) + \beta K(A, b_2).$$
So the matrix $A$ introduces a linear transformation $\mathbb{C}^n \to \mathbb{C}^{n,n}$ defined by $b \mapsto K(A, b)$. The range of the transformation is the set of Krylov matrices of $A$:
$$\mathcal{K}(A) = \{ K(A, b) \mid b \in \mathbb{C}^n \},$$
which is a subspace of $\mathbb{C}^{n,n}$. Clearly, $\dim \mathcal{K}(A) = n$.

Let $\mathcal{L}(\mathcal{K}(A))$ be the space of linear operators on $\mathcal{K}(A)$. It has dimension $n^2$. Suppose $\mathcal{T} \in \mathcal{L}(\mathcal{K}(A))$ and $T \in \mathbb{C}^{n,n}$ is its matrix with respect to the basis $\{K(A, e_j)\}_{j=1}^{n}$. Then
$$\mathcal{T} K(A, b) = K(A, T b), \qquad b \in \mathbb{C}^n.$$
So we may identify $\mathcal{L}(\mathcal{K}(A))$ with $\mathbb{C}^{n,n}$ based on the above relation. Now consider a subspace of $\mathcal{L}(\mathcal{K}(A))$ defined by
$$\mathcal{L}_c(\mathcal{K}(A)) = \{ \mathcal{T} \mid \mathcal{T} K(A, b) = K(A, T b) = T K(A, b), \; b \in \mathbb{C}^n \}.$$
Define
$$\mathcal{P}_{n-1}(A) = \{ p(A; x) \mid x \in \mathbb{C}^n \}.$$
From (6), $p(A; x) \in \mathcal{L}_c(\mathcal{K}(A))$ for any $x \in \mathbb{C}^n$. Hence $\mathcal{P}_{n-1}(A) \subseteq \mathcal{L}_c(\mathcal{K}(A))$.

Theorem 4.1 Suppose $A \in \mathbb{C}^{n,n}$. Then $\mathcal{P}_{n-1}(A) = \mathcal{L}_c(\mathcal{K}(A))$ if and only if $A$ is nonderogatory.

Proof. For any $\mathcal{T} \in \mathcal{L}_c(\mathcal{K}(A))$, the corresponding matrix $T$ satisfies $K(A, T b) = T K(A, b)$ for all $b \in \mathbb{C}^n$ if and only if $T A = A T$. Without distinguishing $\mathcal{T}$ from its matrix $T$, we have
$$\mathcal{L}_c(\mathcal{K}(A)) = \{ T \mid T A = A T, \; T \in \mathbb{C}^{n,n} \},$$
which is called the centralizer of $A$ [11, p. 275]. With this connection, the equivalence follows from [11].

Because $\mathcal{K}(A)$ and $\mathbb{C}^n$ are isomorphic, when $A$ is nonderogatory, $\mathcal{P}_{n-1}(A)$ and $\mathbb{C}^n$ are also isomorphic. So $\mathcal{P}_{n-1}(A)$ and $\mathcal{K}(A)$ are isomorphic. Then $\mathcal{L}(\mathcal{P}_{n-1}(A), \mathcal{K}(A))$, the space of linear transformations from $\mathcal{P}_{n-1}(A)$ to $\mathcal{K}(A)$, has dimension $n^2$, and it is isomorphic to $\mathbb{C}^{n,n}$. For any $S \in \mathbb{C}^{n,n}$ we may introduce $\mathcal{S} \in \mathcal{L}(\mathcal{P}_{n-1}(A), \mathcal{K}(A))$ defined by
$$\mathcal{S}\, p(A; x) = K(A, S x), \qquad x \in \mathbb{C}^n.$$
(Again, $S$ is considered as the matrix of $\mathcal{S}$ with respect to the bases $\{A^j\}_{j=0}^{n-1}$ and $\{K(A, e_j)\}_{j=1}^{n}$.) Define
$$\mathcal{L}_c(\mathcal{P}_{n-1}(A), \mathcal{K}(A)) = \{ \mathcal{S} \mid \mathcal{S}\, p(A; x) = K(A, S x) = p(A; x) S, \; x \in \mathbb{C}^n \} \subseteq \mathcal{L}(\mathcal{P}_{n-1}(A), \mathcal{K}(A)).$$
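The identities (1) and (6), on which these transformations are built, can be verified directly. The following NumPy sketch is an illustration we add (not part of the paper); matrices and vectors are arbitrary random choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
x = rng.standard_normal(n)

# K(A, b) = [b, Ab, ..., A^{n-1} b]
K = np.column_stack([np.linalg.matrix_power(A, j) @ b for j in range(n)])
# p(A; x) = x_1 I + x_2 A + ... + x_n A^{n-1}
pAx = sum(x[j] * np.linalg.matrix_power(A, j) for j in range(n))

# (1): p(A; x) b = K(A, b) x
print(np.allclose(pAx @ b, K @ x))     # True

# (6): K(A, d) = p(A; x) K(A, b) with d = K(A, b) x
d = K @ x
Kd = np.column_stack([np.linalg.matrix_power(A, j) @ d for j in range(n)])
print(np.allclose(Kd, pAx @ K))        # True
```

The second check is exactly the statement that the operator $b \mapsto d = p(A; x) b$ acts on $\mathcal{K}(A)$ as left multiplication by $p(A; x)$.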
Theorem 4.2 Suppose $A$ is nonderogatory. Then $\mathcal{K}(A) = \mathcal{L}_c(\mathcal{P}_{n-1}(A), \mathcal{K}(A))$.

Proof. Formula (6) shows that for any $b \in \mathbb{C}^n$, the linear transformation $\mathcal{S}$ corresponding to $S := K(A, b)$ is in $\mathcal{L}_c(\mathcal{P}_{n-1}(A), \mathcal{K}(A))$. So if we do not distinguish $\mathcal{S}$ from $S$, we have $\mathcal{K}(A) \subseteq \mathcal{L}_c(\mathcal{P}_{n-1}(A), \mathcal{K}(A))$. On the other hand, for each $\mathcal{S} \in \mathcal{L}_c(\mathcal{P}_{n-1}(A), \mathcal{K}(A))$, the corresponding matrix $S$ satisfies $K(A, S x) e_j = p(A; x) S e_j$ for $j = 1, \ldots, n$. Using these relations and (1), we have
$$A^{j-1} S x = K(A, S x) e_j = p(A; x) S e_j = K(A, S e_j) x, \qquad x \in \mathbb{C}^n,$$
which implies
$$A^{j-1} S = K(A, S e_j) \tag{22}$$
for $j = 1, \ldots, n$. Let $b = S e_1$. Then setting $j = 1$ in (22) we have $S = K(A, b)$, with which (22) holds for $j = 2, \ldots, n$. We have thus shown $\mathcal{L}_c(\mathcal{P}_{n-1}(A), \mathcal{K}(A)) \subseteq \mathcal{K}(A)$. Therefore, the two spaces are the same.

Theorems 4.1 and 4.2 show that when $A$ is nonderogatory, a linear operator on $\mathcal{K}(A)$ is just a polynomial $p(A; x)$, and a linear transformation from $\mathcal{P}_{n-1}(A)$ to $\mathcal{K}(A)$ is just a Krylov matrix $K(A, b)$, both described by (6). A common technique for generating a new Krylov matrix from $K(A, b)$ is to choose a new initial vector $d = p(A; x) b$ for an appropriate polynomial $p(t; x)$. This technique is widely used in the QR algorithm and Krylov subspace methods [19, 8, 9, 20]. Theorem 4.1 shows that, in order to obtain $K(A, d)$ expressed as $T K(A, b)$, this is the only way when $A$ is nonderogatory.

For any $b$ such that $K(A, b)$ is nonsingular, the corresponding $\mathcal{S}_b \in \mathcal{L}_c(\mathcal{P}_{n-1}(A), \mathcal{K}(A))$ is an isomorphism of $\mathcal{P}_{n-1}(A)$ onto $\mathcal{K}(A)$ defined by
$$\mathcal{S}_b\, p(A; x) = K(A, d) = p(A; x) K(A, b), \qquad d = K(A, b) x, \quad x \in \mathbb{C}^n.$$
Its inverse is
$$\mathcal{S}_b^{-1} K(A, d) = p(A; x) = K(A, d) K(A, b)^{-1}, \qquad x = K(A, b)^{-1} d, \quad d \in \mathbb{C}^n, \tag{23}$$
which is just (7). Using $\mathcal{L}_c(\mathcal{K}(A))$ and the isomorphisms of $\mathcal{K}(A)$ and $\mathcal{L}_c(\mathcal{P}_{n-1}(A), \mathcal{K}(A))$, we are also able to define a subspace of linear operators on $\mathcal{P}_{n-1}(A)$. Let $\mathcal{S}_{b_1}, \mathcal{S}_{b_2} \in \mathcal{L}_c(\mathcal{P}_{n-1}(A), \mathcal{K}(A))$ be invertible.
Define
$$\mathcal{L}_c(\mathcal{P}_{n-1}(A)) = \left\{ \mathcal{W} \;\middle|\; \mathcal{W} p(A; x) = \mathcal{S}_{b_2}^{-1} \mathcal{T} \mathcal{S}_{b_1} p(A; x) = p(A; y), \;\; \mathcal{T} = p(A; z) \in \mathcal{L}_c(\mathcal{K}(A)), \;\; y = K(A, b_2)^{-1} p(A; z) K(A, b_1) x, \;\; x \in \mathbb{C}^n \right\}.$$
Clearly, $\mathcal{L}_c(\mathcal{P}_{n-1}(A))$ is isomorphic to $\mathcal{L}_c(\mathcal{K}(A)) = \mathcal{P}_{n-1}(A)$. So its dimension is $n$.

When $K(A, b)$ is nonsingular, by Proposition 2.4 we have a Hessenberg reduction form
$$Q^* A Q = H, \qquad Q^* b = \gamma e_1, \quad \gamma = \|b\|, \tag{24}$$
where $Q$ is unitary and $H$ is unreduced upper Hessenberg. The matrices $Q$ and $H$ can be computed by the Arnoldi process.
Arnoldi process.
  Input: matrix $A$ and vector $b$.
  Output: unitary matrix $Q = [q_1, \ldots, q_n]$ and upper Hessenberg matrix $H = [h_{ij}]$.
  Choose $q_1 = b / \|b\|$.
  For $k = 1, \ldots, n$:
      $h_{jk} = q_j^* A q_k$, $\quad j = 1, \ldots, k$
      $h_{k+1,k} = \| A q_k - h_{1k} q_1 - \cdots - h_{kk} q_k \|$
      $q_{k+1} = (A q_k - h_{1k} q_1 - \cdots - h_{kk} q_k) / h_{k+1,k}$
  End

In practice, one uses the modified Arnoldi process [8, Sec. 9.4] or the numerically stable Hessenberg reduction method with Householder transformations [8, Sec. 7.4]. Note that with the above Arnoldi process $\gamma = \|b\|$ and all the subdiagonal elements of $H$ are positive.

With the Hessenberg reduction form (24), by Proposition 2.4, one has the QR factorization
$$K(A, b) = Q R, \tag{25}$$
where $R = K(H, \gamma e_1)$ is nonsingular and upper triangular with $r_{kk} = \gamma \prod_{j=1}^{k-1} h_{j+1,j}$ for $k = 1, \ldots, n$. From (25),
$$Q = K(A, b) R^{-1} =: [\, g_1(A) b \;\; g_2(A) b \;\; \cdots \;\; g_n(A) b \,], \tag{26}$$
and it is easily verified that $g_j(t) \in \mathcal{P}_{n-1}$ and $\deg g_j(t) = j - 1$ for $j = 1, \ldots, n$. So $Q$ is a generalized Krylov matrix of the form (19). In fact, the polynomials $g_j(t)$ have the following properties.

Theorem 4.3 The polynomials $g_1(t), \ldots, g_n(t)$ satisfy
$$g_1(t) = \frac{1}{\gamma}, \qquad g_j(t) = \frac{1}{\gamma \prod_{k=2}^{j} h_{k,k-1}} \det(t I - H_{j-1}), \quad j = 2, \ldots, n, \tag{27}$$
where $H_k$ is the leading principal $k \times k$ submatrix of $H$ given in (24). Also,
$$[\, g_1(t), g_2(t), \ldots, g_n(t) \,] = [\, 1, t, \ldots, t^{n-1} \,] R^{-1}. \tag{28}$$

Proof. From the Arnoldi process it is not difficult to obtain the recurrence
$$g_1(t) = \frac{1}{\gamma}, \qquad g_{j+1}(t) = \frac{1}{h_{j+1,j}} \big( t g_j(t) - h_{1j} g_1(t) - \cdots - h_{jj} g_j(t) \big), \quad j = 1, \ldots, n-1.$$
We now prove (27) by induction. When $j = 2$,
$$g_2(t) = \frac{1}{h_{21}} \big( t g_1(t) - h_{11} g_1(t) \big) = \frac{1}{\gamma h_{21}} (t - h_{11}) = \frac{1}{\gamma h_{21}} \det(t I_1 - H_1).$$
So (27) holds for $j = 2$.
Assume (27) is true for $1, \ldots, j$. Expanding $\det(t I - H_j)$ along the last column we get
$$\det(t I - H_j) = (t - h_{jj}) \det(t I - H_{j-1}) - h_{j-1,j} h_{j,j-1} \det(t I - H_{j-2}) - h_{j-2,j} \left( \prod_{k=j-1}^{j} h_{k,k-1} \right) \det(t I - H_{j-3}) - \cdots - h_{1j} \prod_{k=2}^{j} h_{k,k-1}.$$
Dividing both sides by $\gamma \prod_{k=2}^{j+1} h_{k,k-1}$ and using the induction assumption, we have
$$\frac{1}{\gamma \prod_{k=2}^{j+1} h_{k,k-1}} \det(t I - H_j) = \frac{1}{h_{j+1,j}} \big( (t - h_{jj}) g_j(t) - h_{j-1,j} g_{j-1}(t) - \cdots - h_{1j} g_1(t) \big) = g_{j+1}(t).$$
So (27) holds also for $j + 1$. The relation (28) is simply from (26).

Since $K(A, b)$ is nonsingular, one may introduce the following inner product on $\mathcal{P}_{n-1}(A)$:
$$\langle p(A; x), p(A; y) \rangle_b = b^* p(A; x)^* p(A; y) b = x^* K(A, b)^* K(A, b) y, \qquad p(A; x), p(A; y) \in \mathcal{P}_{n-1}(A).$$
The last relation is due to (1). With this inner product we define the norm
$$\| p(A; x) \|_b = \langle p(A; x), p(A; x) \rangle_b^{1/2} = \| p(A; x) b \| = \| K(A, b) x \|.$$
Then the matrices $g_1(A), \ldots, g_n(A)$ that determine $Q$ in (26) are orthonormal; they can be viewed as generated from $I, A, \ldots, A^{n-1} \in \mathcal{P}_{n-1}(A)$ by applying the Gram-Schmidt process with respect to the above inner product [2]. So $g_1(A), \ldots, g_n(A)$ form an orthonormal basis for $\mathcal{P}_{n-1}(A)$. Also, the polynomials $g_1(t), \ldots, g_n(t)$ form an orthonormal basis for $\mathcal{P}_{n-1}$ with respect to the inner product
$$\langle p(t; x), p(t; y) \rangle_{A,b} := \langle p(A; x), p(A; y) \rangle_b, \qquad p(t; x), p(t; y) \in \mathcal{P}_{n-1},$$
which can be interpreted as being generated by applying the Gram-Schmidt process to $1, t, \ldots, t^{n-1}$.

Because $\mathcal{K}(A)$ and $\mathcal{P}_{n-1}(A)$ are isomorphic, if $K(A, b)$ is nonsingular, using the isomorphism $\mathcal{S}_b^{-1}$ defined in (23), an inner product on $\mathcal{K}(A)$ can be induced from $\langle \cdot, \cdot \rangle_b$ via
$$\langle K(A, u), K(A, v) \rangle = \langle \mathcal{S}_b^{-1} K(A, u), \mathcal{S}_b^{-1} K(A, v) \rangle_b = (K(A, b)^{-1} u)^* (K(A, b)^* K(A, b)) (K(A, b)^{-1} v) = u^* v,$$
which is just the standard inner product on $\mathbb{C}^n$.

5 More properties related to Hessenberg reductions

Given a nonsingular matrix $W$, in a similar way one can determine a $W$-unitary matrix $X$, i.e., $X^* W^* W X = I$, such that
$$X^{-1} A X = \hat{H}, \qquad X^{-1} b = \hat{\gamma} e_1, \tag{29}$$
where $\hat{H}$ is unreduced upper Hessenberg (by Proposition 2.4). The matrix $X$ can be obtained by applying the Arnoldi process; the only difference is that the columns of $X$ are made $W$-orthonormal. By Proposition 2.4,
$$K(A, b) = X \hat{R}, \qquad \hat{R} = K(\hat{H}, \hat{\gamma} e_1).$$
So we also have
$$X = K(A, b) \hat{R}^{-1} = [\, \hat{g}_1(A) b \;\; \hat{g}_2(A) b \;\; \cdots \;\; \hat{g}_n(A) b \,],$$
where $\hat{g}_j(t) \in \mathcal{P}_{n-1}$ with $\deg \hat{g}_j(t) = j - 1$, and $\hat{g}_1(A), \ldots, \hat{g}_n(A)$ form an orthonormal basis for $\mathcal{P}_{n-1}(A)$ with respect to the generalized inner product
$$\langle p(A; x), p(A; y) \rangle_{W,b} = b^* p(A; x)^* W^* W p(A; y) b = x^* K(A, b)^* W^* W K(A, b) y. \tag{30}$$

Theorem 5.1 Suppose $K(A, b)$ is nonsingular. Let $W$ be nonsingular, let $X$ and $\hat{H}$ satisfy (29), and let $Q$, $H$ satisfy (24). Define $\tilde{Q} = W X$ and $T = \hat{R} R^{-1} = K(\hat{H}, \hat{\gamma} e_1) K(H, \gamma e_1)^{-1}$. Then $\tilde{Q}$ is unitary, $T$ is upper triangular, and
$$T = \hat{\gamma} [\, g_1(\hat{H}) e_1, \ldots, g_n(\hat{H}) e_1 \,] = \gamma^{-1} [\, \hat{g}_1(H) e_1, \ldots, \hat{g}_n(H) e_1 \,]^{-1}, \qquad Q = X T, \quad W = \tilde{Q} T Q^*, \quad H = T^{-1} \hat{H} T.$$

Proof. It is obvious that $\tilde{Q} = W X$ is unitary and $T$ is upper triangular. As in (28), we have $[\hat{g}_1(t), \ldots, \hat{g}_n(t)] = [1, t, \ldots, t^{n-1}] \hat{R}^{-1}$. Then
$$T = \hat{R} R^{-1} = \hat{\gamma} [\, g_1(\hat{H}) e_1, \ldots, g_n(\hat{H}) e_1 \,] = \gamma^{-1} [\, \hat{g}_1(H) e_1, \ldots, \hat{g}_n(H) e_1 \,]^{-1} = (R \hat{R}^{-1})^{-1},$$
which is upper triangular, and from
$$[g_1(t), \ldots, g_n(t)] = [1, t, \ldots, t^{n-1}] R^{-1} = [1, t, \ldots, t^{n-1}] \hat{R}^{-1} (\hat{R} R^{-1}) = [\hat{g}_1(t), \ldots, \hat{g}_n(t)]\, T,$$
we have $Q = X T$. Then $W = \tilde{Q} X^{-1} = \tilde{Q} T Q^*$, and from (24) and (29) we have $H = T^{-1} \hat{H} T$.

This theorem shows the relations between the Hessenberg reduction forms (24) and (29). In fact, with $\hat{A} = W A W^{-1}$ and $\hat{b} = W b$, (29) can be rewritten as
$$\tilde{Q}^* \hat{A} \tilde{Q} = \hat{H}, \qquad \tilde{Q}^* \hat{b} = \hat{\gamma} e_1, \quad \hat{\gamma} = \|\hat{b}\|.$$
So (29) is the same as (24) but with $Q, A, b$ replaced by $\tilde{Q}, \hat{A}, \hat{b}$. The next result shows that for any sequence of $n$ polynomials in $\mathcal{P}_{n-1}$ with degrees in increasing order, a unitary matrix can be constructed that reduces $W A W^{-1}$ and $W b$ to a Hessenberg reduction form for an appropriate $W$.
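Returning to the basic reduction, the Arnoldi process and the factorizations (24) and (25) can be realized in a few lines. The sketch below is an illustration we add (not part of the paper); it uses modified Gram-Schmidt, which the text notes is the practical variant, and assumes $K(A, b)$ is nonsingular so the process runs to completion.

```python
import numpy as np

def arnoldi(A, b):
    # Full-length Arnoldi with modified Gram-Schmidt:
    # Q orthogonal, H upper Hessenberg, Q^T A Q = H, Q^T b = ||b|| e_1.
    n = A.shape[0]
    Q = np.zeros((n, n))
    H = np.zeros((n, n))
    Q[:, 0] = b / np.linalg.norm(b)
    for k in range(n):
        w = A @ Q[:, k]
        for j in range(k + 1):
            H[j, k] = Q[:, j] @ w
            w = w - H[j, k] * Q[:, j]
        if k + 1 < n:
            H[k + 1, k] = np.linalg.norm(w)
            Q[:, k + 1] = w / H[k + 1, k]
    return Q, H

rng = np.random.default_rng(4)
n = 6
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
Q, H = arnoldi(A, b)
gamma = np.linalg.norm(b)

K = np.column_stack([np.linalg.matrix_power(A, j) @ b for j in range(n)])
# R = K(H, gamma * e_1)
R = np.column_stack([np.linalg.matrix_power(H, j) @ (gamma * np.eye(n)[:, 0])
                     for j in range(n)])

print(np.allclose(Q.T @ Q, np.eye(n), atol=1e-8))   # Q orthogonal
print(np.allclose(Q.T @ A @ Q, H, atol=1e-6))       # reduction (24)
print(np.allclose(K, Q @ R, atol=1e-6))             # QR factorization (25)
print(np.allclose(R, np.triu(R)))                   # R upper triangular  # True
```

Since $H$ is upper Hessenberg, $H^{j} e_1$ has nonzeros only in its first $j+1$ entries, which is exactly why $R = K(H, \gamma e_1)$ comes out upper triangular.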
Theorem 5.2 Suppose that $K(A, b)$ is nonsingular and $p(t; r_1), \ldots, p(t; r_n) \in \mathcal{P}_{n-1}$ with $\deg p(t; r_j) = j - 1$ for $j = 1, \ldots, n$. There exists a nonsingular matrix $W$ such that, with $\hat{A} = W A W^{-1}$ and $\hat{b} = W b$, the matrix
$$\tilde{Q} = [\, p(\hat{A}; r_1) \hat{b}, \ldots, p(\hat{A}; r_n) \hat{b} \,]$$
is unitary and satisfies
$$\tilde{Q}^* \hat{A} \tilde{Q} = \hat{H}, \qquad \tilde{Q}^* \hat{b} = \hat{\gamma} e_1, \quad \hat{\gamma} = \|\hat{b}\|,$$
where $\hat{H}$ is unreduced upper Hessenberg.

Proof. Let $R = [r_1, \ldots, r_n]$ and $X = [\, p(A; r_1) b, \ldots, p(A; r_n) b \,]$. By the assumptions, $R$ is upper triangular and nonsingular, and $X = K(A, b) R$ is nonsingular. From $A K(A, b) = K(A, b) C$, where $C$ is the companion matrix of $A$, we have
$$X^{-1} A X = R^{-1} C R =: \hat{H}, \qquad X^{-1} b = \hat{\gamma} e_1,$$
where $\hat{H}$ is unreduced upper Hessenberg and $\hat{\gamma}$ is a scalar. Let $X = W^{-1} \tilde{Q}$, where $W$ is a nonsingular matrix and $\tilde{Q}$ is unitary. Such a factorization always exists, for instance, an RQ factorization. Then, with this $W$ and the corresponding $\hat{A}$, $\hat{b}$, we have
$$\tilde{Q} = W X = W [\, p(A; r_1) b, \ldots, p(A; r_n) b \,] = [\, p(\hat{A}; r_1) \hat{b}, \ldots, p(\hat{A}; r_n) \hat{b} \,],$$
$$\tilde{Q}^* \hat{A} \tilde{Q} = X^{-1} W^{-1} (W A W^{-1}) W X = X^{-1} A X = \hat{H},$$
and $\tilde{Q}^* \hat{b} = X^{-1} W^{-1} W b = X^{-1} b = \hat{\gamma} e_1$. Obviously, $\hat{\gamma} = \|\tilde{Q}^* \hat{b}\| = \|\hat{b}\|$.

We now use the Hessenberg reduction form (24) to give a factorization of $f(A)$.

Theorem 5.3 Suppose that $K(A, b)$ is nonsingular and $A$, $b$ have the forms in (24). Then for any $f(t)$ such that $f(A)$ is defined,
$$f(A) = Q K(H, d) K(H, e_1)^{-1} Q^*, \qquad d = f(H) e_1. \tag{31}$$
Moreover, let $\tilde{Q} = [\tilde{Q}_1, \tilde{Q}_2]$ be unitary with $\tilde{Q}_1 \in \mathbb{C}^{n,r}$ such that
$$H \tilde{Q} = \tilde{Q} \tilde{H}, \qquad \tilde{H} = \begin{bmatrix} \tilde{H}_{11} & \tilde{H}_{12} \\ 0 & \tilde{H}_{22} \end{bmatrix}, \qquad \tilde{Q}^* d = \tilde{\gamma} e_1, \quad \tilde{\gamma} = \|d\|,$$
where $\tilde{H}_{11} \in \mathbb{C}^{r,r}$ is an $r \times r$ unreduced upper Hessenberg matrix. Then $\operatorname{rank} f(A) = r$ and
$$f(A) = (Q \tilde{Q}_1) \tilde{R} Q^*, \tag{32}$$
where $\tilde{R} = \tilde{\gamma} K_{r,n}(\tilde{H}_{11}, e_1) K(H, e_1)^{-1}$ is upper triangular.
Proof. Since $A = Q H Q^*$, we have $f(A) = Q f(H) Q^*$. Applying (14) to $f(H)$ with $A = H$ and $b = e_1$,
$$f(H) = K(H, d) K(H, e_1)^{-1}, \qquad d = f(H) e_1.$$
So we have (31). The factorization (32) follows by applying Proposition 2.4 to $K(H, d)$ in (31).

Corollary 5.4 Suppose $K(A, b)$ is nonsingular. If $\operatorname{rank} f(A) = r < n$, then all the eigenvalues of $\tilde{H}_{22}$ are roots of $f(t) = 0$ (counting multiplicity), and $\operatorname{range}(Q \tilde{Q}_2) = \operatorname{null}([f(A)]^*)$.

Proof. From (32),
$$(Q \tilde{Q})^* f(A) (Q \tilde{Q}) = \begin{bmatrix} \tilde{R} \\ 0 \end{bmatrix} \tilde{Q}.$$
On the other hand, by using the Hessenberg reduction forms we have
$$(Q \tilde{Q})^* f(A) (Q \tilde{Q}) = f(\tilde{H}) = \begin{bmatrix} f(\tilde{H}_{11}) & * \\ 0 & f(\tilde{H}_{22}) \end{bmatrix}.$$
So we have $f(\tilde{H}_{22}) = 0$ and $(Q \tilde{Q}_2)^* f(A) = 0$. Clearly, all the eigenvalues of $\tilde{H}_{22}$ are roots of $f(t)$. Since $\operatorname{rank} f(A) = r$, we have $\operatorname{range}(Q \tilde{Q}_2) = \operatorname{null}([f(A)]^*)$.

When $f(A)$ is nonsingular, (32) can also be derived by an orthogonalization argument using the weighted inner product (30) with $W = f(A)$. In this case, $(W A W^{-1}, W b)$ becomes $(A, d)$. From Theorem 5.3, the matrices and scalar in (29) are $X = Q \tilde{Q}$, $\hat{H} = \tilde{H}$, $\hat{\gamma} = \tilde{\gamma} \gamma$. By Theorem 5.1, $f(A)$ has the URV decomposition [21]
$$f(A) = Q \tilde{Q} T Q^*, \qquad T = K(\tilde{H}, \tilde{\gamma} \gamma e_1) K(H, \gamma e_1)^{-1} = \tilde{\gamma} K(\tilde{H}, e_1) K(H, e_1)^{-1} = \tilde{R}.$$
Formula (32) is more general, since it holds also when $f(A)$ is singular.

Remark 5.5 Using $A f(A) = f(A) A$, the matrix $\tilde{R}$ in (32) satisfies $\tilde{H}_{11} \tilde{R} = \tilde{R} H$. So $\tilde{R} = [\tilde{r}_1, \ldots, \tilde{r}_n]$ can be computed column by column with the recurrence
$$\tilde{r}_1 = \tilde{\gamma} e_1, \qquad \tilde{r}_{k+1} = \big( \tilde{H}_{11} \tilde{r}_k - h_{1k} \tilde{r}_1 - \cdots - h_{kk} \tilde{r}_k \big) / h_{k+1,k}, \quad k = 1, \ldots, n-1,$$
which is the Arnoldi recurrence for the $q_k$, but with $A$ replaced by $\tilde{H}_{11}$. More generally, using $H f(H) = f(H) H$, one may use the same recurrence to compute $f(H)$, provided $f(H) e_1$ is given. This approach was mentioned in [14, 15] for $f(t) = e^t$.

The next result shows how the unitary matrix in the Hessenberg reduction is related to $Q$ when it is generated by another vector.
Theorem 5.6 Suppose that $K(A, b)$ is nonsingular and $A$, $b$ have the forms in (24). Let $d \in \mathbb{C}^n$ and let $\hat{Q} = [\hat{Q}_1, \hat{Q}_2]$ be unitary with $\hat{Q}_1 \in \mathbb{C}^{n,r}$ satisfying
$$A \hat{Q} = \hat{Q} \hat{H}, \qquad \hat{H} = \begin{bmatrix} \hat{H}_{11} & \hat{H}_{12} \\ 0 & \hat{H}_{22} \end{bmatrix}, \qquad \hat{Q}^* d = \hat{\gamma} e_1, \quad \hat{\gamma} = \|d\|,$$
where $\hat{H}_{11} \in \mathbb{C}^{r,r}$ is unreduced upper Hessenberg. Partition
$$Q = [Q_1, Q_2], \qquad H = \begin{bmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{bmatrix}, \qquad Q_1 \in \mathbb{C}^{n,r}, \quad H_{11} \in \mathbb{C}^{r,r}.$$
Then
$$\hat{Q}_1 K_{r,n}(\hat{H}_{11}, \hat{\gamma} e_1) = p(A; x)\, Q\, K(H, \gamma e_1), \tag{33}$$
and
$$\hat{Q}_1 T_r = p(A; x) Q_1, \qquad x = K(A, b)^{-1} d, \quad T_r = K_r(\hat{H}_{11}, \hat{\gamma} e_1) K_r(H_{11}, \gamma e_1)^{-1}.$$

Proof. Because $x = K(A, b)^{-1} d$, by (6),
$$K(A, d) = p(A; x) K(A, b).$$
By the Hessenberg reduction forms we have
$$K(A, d) = \hat{Q}_1 K_{r,n}(\hat{H}_{11}, \hat{\gamma} e_1), \qquad K(A, b) = Q K(H, \gamma e_1).$$
So we have (33). The second equation follows by equating the first $r$ columns in (33) to get
$$\hat{Q}_1 K_r(\hat{H}_{11}, \hat{\gamma} e_1) = p(A; x) Q_1 K_r(H_{11}, \gamma e_1),$$
and by using the fact that $K_r(H_{11}, \gamma e_1)$ is nonsingular.

The relation (33) shows that the unitary matrix corresponding to $d$ in the Hessenberg reduction is just the unitary factor of the QR factorization of $p(A; x) Q$ for an appropriate polynomial $p(t; x)$.

Any nonsingular Krylov matrix $K(A, b)$ has a QR factorization (25) with $R$ nonsingular. If $Q = I$, then from (24) $A$ has to be unreduced upper Hessenberg and $b = \gamma e_1$. In this case (32) becomes a QR factorization of $f(A)$. If further $K(A, b) = I$, then $b = e_1$ and $A$ is the companion matrix $C$ defined in (4). In this case, $f(A)$ and $K(A, b)$ have simpler relations.

Corollary 5.7 Suppose $C \in \mathbb{C}^{n,n}$ is a companion matrix defined in (4). Then
$$p(C; d) = K(C, d), \qquad d \in \mathbb{C}^n.$$

Proof. This follows simply from Theorem 3.1 with $A = C$ and $b = e_1$, and $K(C, e_1) = I_n$. If we choose $d$ as $c = [c_1, \ldots, c_n]^T$, the last column of $C$, then by the Cayley-Hamilton theorem, $C^n = p(C; c) = K(C, c)$.
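The companion-matrix identities are easy to check numerically. The following NumPy sketch is an illustration we add (not part of the paper); building $C$ from the characteristic polynomial via `np.poly` is our choice of construction.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
A = rng.standard_normal((n, n))

# det(lambda I - A) = lambda^n - c_n lambda^{n-1} - ... - c_1.
# np.poly returns the monic coefficients [1, a_{n-1}, ..., a_0] of
# lambda^n + a_{n-1} lambda^{n-1} + ... + a_0, so [c_1, ..., c_n] = -a reversed.
a = np.poly(A)
c = -a[1:][::-1]

C = np.zeros((n, n))                      # companion matrix (4)
C[1:, :-1] = np.eye(n - 1)                # subdiagonal of ones
C[:, -1] = c                              # last column [c_1, ..., c_n]^T

def krylov(M, v):
    return np.column_stack([np.linalg.matrix_power(M, j) @ v for j in range(n)])

# Corollary 5.7: p(C; d) = K(C, d) for any d, since K(C, e_1) = I.
d = rng.standard_normal(n)
pCd = sum(d[j] * np.linalg.matrix_power(C, j) for j in range(n))
print(np.allclose(pCd, krylov(C, d), atol=1e-6))

# With d = c, Cayley-Hamilton gives C^n = p(C; c) = K(C, c).
print(np.allclose(np.linalg.matrix_power(C, n), krylov(C, c), atol=1e-6))
```

The first identity holds because $C^{j} e_1 = e_{j+1}$ for $j < n$, so the columns of $K(C, e_1)$ are exactly $e_1, \ldots, e_n$.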
Corollary 5.8 Suppose $C$ is the companion matrix (4) and $f(t)$ is a scalar function such that $f(C)$ is defined. Then
$$f(C) = K(C, d) = p(C; d), \qquad d = f(C) e_1.$$

Proof. It follows from Theorem 3.3 and Corollary 5.7.

If in (25) $R = I$, or equivalently $K(A, b) = Q$, then $Q^* K(A, b) = K(Q^* A Q, Q^* b) = I$. This implies $b = Q e_1$ and $A = Q C Q^*$. In this case, we have the simple formula
$$f(A) = K(A, d) Q^*, \qquad d = f(A) b.$$
Note that this is one of the situations where GMRES stagnates when it is used to solve associated linear systems or eigenvalue problems [9, Sec. 3.2]. Note also that when $W = K(A, b)^{-1}$ in the inner product (30), $(W A W^{-1}, W b)$ becomes $(C, e_1)$. So for any $A$, $b$ such that $K(A, b)$ is nonsingular, the above situation occurs (with $Q = I$) if we consider the orthonormal polynomials in $\mathcal{P}_{n-1}$ with respect to $\langle \cdot, \cdot \rangle_{K(A,b)^{-1}, b}$.

6 Conclusions

Starting from a simple observation, we derived formulas showing relations between functions of a matrix and Krylov matrices. By introducing subspaces and linear transformations, we interpreted the relations at an abstract level. We provided several properties of Hessenberg reductions that can be used to understand some common techniques in Krylov subspace methods and eigenvalue algorithms. How to use these results to improve existing methods and to develop new methods requires further work.

Acknowledgment The author thanks the referees for their comments and suggestions.

References

[1] M. Arioli, V. Pták, and Z. Strakoš. Krylov sequences of maximal length and convergence of GMRES. BIT, 38(4).
[2] D. Calvetti, S.-M. Kim, and L. Reichel. Quadrature rules based on the Arnoldi process. SIAM J. Matrix Anal. Appl., 26(3).
[3] P. I. Davies and N. J. Higham. A Schur-Parlett algorithm for computing matrix functions. SIAM J. Matrix Anal. Appl., 25(2).
[4] R. Eising. The distance between a system and the set of uncontrollable systems. Proc. MTNS, Beer-Sheva.
[5] R. Eising. Between controllable and uncontrollable. Systems Control Lett., 4(5).
[6] F. R. Gantmacher. Theory of Matrices. Vol. 1. Chelsea, New York.
[7] F. R. Gantmacher. Theory of Matrices. Vol. 2. Chelsea, New York.
[8] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, 3rd edition.
[9] A. Greenbaum. Iterative Methods for Solving Linear Systems. SIAM, Philadelphia, PA.
[10] N. J. Higham. Functions of Matrices: Theory and Computation. SIAM, Philadelphia, PA.
[11] R. A. Horn and C. R. Johnson. Topics in Matrix Analysis. Cambridge University Press, Cambridge.
[12] B. Kågström. Numerical computation of matrix functions. Report UMIN-58.77, Department of Information Processing, University of Umeå, Sweden.
[13] R. E. Kalman. On the general theory of control systems. In Proc. 1st IFAC Congr., vol. 1, Butterworth, London.
[14] C. B. Moler and C. F. Van Loan. Nineteen dubious ways to compute the exponential of a matrix. SIAM Rev., 20.
[15] C. B. Moler and C. F. Van Loan. Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM Rev., 45:3-49.
[16] B. N. Parlett. Computation of functions of triangular matrices. Memorandum ERL-M481, Electronics Research Laboratory, College of Engineering, UC Berkeley.
[17] B. N. Parlett. A recurrence among the elements of functions of triangular matrices. Linear Algebra Appl., 14.
[18] H. H. Rosenbrock. State-Space and Multivariable Theory. Nelson, London.
[19] Y. Saad. Numerical Methods for Large Eigenvalue Problems. Manchester University Press, Manchester, UK.
[20] Y. Saad. Iterative Methods for Sparse Linear Systems. SIAM, Philadelphia, PA, 2nd edition.
[21] G. W. Stewart. Updating a rank-revealing URV decomposition in parallel. Parallel Computing, 20.
[22] J. H. Wilkinson. The Algebraic Eigenvalue Problem. Oxford University Press, Oxford.