Eigenvalue Estimation with the Rayleigh-Ritz and Lanczos methods


Eigenvalue Estimation with the Rayleigh-Ritz and Lanczos Methods

Ivo Panayotov
Department of Mathematics and Statistics, McGill University, Montréal, Québec, Canada
August 2010

A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements of the degree of Doctor of Philosophy.

Copyright © Ivo Panayotov, 2010


Abstract

In this thesis we study two different problems related to eigenvalue error bounds. In the first part of the thesis, we examine a conjecture of Knyazev and Argentati [SIAM J. Matrix Anal. Appl., 29 (2006)] bounding the difference between the Ritz values of a Hermitian matrix A for two subspaces, one of which is A-invariant. We provide a proof of a slightly weaker version of the conjecture and discuss the recently published full proof. Moreover, we give implications of the now proven bound and examine how it compares to a classical bound in the same context. In the second part of the thesis, we derive some properties of complex Hessenberg matrices and consider the relevant normal-matrix cases of these in order to re-examine the lengths of the Ritz vectors in the rounding error analysis of the Lanczos process for tridiagonalizing certain normal matrices. This question has already been studied for the real symmetric case, but part of that analysis has never been published in a scientific journal, and for that case we give new theory. For the more general normal-matrix cases we develop applicable theory, including some new tight bounds.


Résumé

In this thesis we study two different problems related to eigenvalue error bounds. In the first part of the thesis, we examine a conjecture of Knyazev and Argentati [SIAM J. Matrix Anal. Appl., 29 (2006)] bounding the difference between the Ritz values of a Hermitian matrix A corresponding to two subspaces, one of which is A-invariant. We obtain a proof of a slightly weaker version of the conjecture and discuss the recently published full proof. Moreover, we derive some implications of the now proven bound and compare it to a classical bound in the same context. In the second part of the thesis, we derive some properties of complex Hessenberg matrices and examine the relevant normal cases of these in order to re-examine the lengths of the Ritz vectors in the error analysis of the Lanczos process for tridiagonalizing certain normal matrices. This question has already been studied for the real symmetric case, but part of that analysis has never been published in a scientific journal; for that case we present new theory. For the more general case of normal matrices, we develop applicable theory, with new tight bounds.


Acknowledgments

I would like to thank my supervisor Chris Paige for his continued guidance, support, and encouragement throughout my doctoral studies. I am grateful for the incredible amount of time he spent with me, both academically and otherwise. I would also like to thank my supervisor Xiao-Wen Chang. Although I did not work with him directly, I learned a lot from his two superb courses in matrix computations, and from his sharp and always to-the-point questions. I express my deepest gratitude to both for taking me on as their student five years ago and for the great commitment that they have shown. I am grateful to David Airapetyan for taking the time to discuss my research problems and for helping me clarify my own direction, to Felicia Magpantay for enthusiastically performing Matlab tests with me on my own problems, and to Layan El Hajj, Svetla Vassileva and Jeremy Macdonald for fruitful discussions. My thanks go to all my friends in the mathematics department for making my stay at McGill particularly pleasant. I am very fortunate and grateful to have received funding from an FQRNT of Québec scholarship and Chris Paige's NSERC of Canada grant OGP, making it possible for me to attend conferences and to focus on my studies without financial worries.


Table of Contents

Abstract
Résumé
Acknowledgments
Introduction
1 Majorization Bounds for Ritz Values of Hermitian Matrices
  1.1 Ritz Values in Eigenvalue Approximation
  1.2 Classical Bounds for Ritz Values
  1.3 Definitions and Prerequisites
    1.3.1 Notation
    1.3.2 Angles between Subspaces
    1.3.3 Majorization
  1.4 Majorization Bounds for Ritz Values
  1.5 Discussion
  1.6 Full Proof of the Main Conjecture
  1.7 Normwise Implications
  1.8 Majorization versus Classical Bounds
  1.9 Concluding Remarks
2 Hessenberg Matrix Properties
  2.1 Notation
  2.2 Hessenberg Matrix Properties
  2.3 A Side Implication
  2.4 Concluding Remarks
3 Ritz Vectors in the Lanczos Process
  3.1 The Lanczos Process
  3.2 Implementation of the Lanczos Process
  3.3 Approximating Eigenvalues from the Lanczos Process
  3.4 The Lanczos Process in Finite Precision
  3.5 Our Current Approach for Studying z_m
  3.6 Bound for an Isolated Ritz Value
  3.7 Divided Differences
  3.8 Bound for a Cluster of Ritz Values
  3.9 Comparing the Old and New Approaches
  3.10 Concluding Remarks
Conclusion

Introduction

Objectives of the Research

Eigenvalue problems appear in many applications. For example, frequencies of vibration in mechanical systems can be found by solving eigenproblems, while the energy levels of a system in quantum mechanics are the eigenvalues of the Hamiltonian. Eigenvalue methods are used today in these and many other applications, including spectral data clustering and internet search engines. Eigenvalues cannot be computed exactly except in trivial cases, so they are usually approximated numerically. A posteriori and a priori eigenvalue error bounds describe the quality of eigenvalue approximations, and such error bounds are a classical and important topic in matrix analysis. A posteriori bounds are based on information readily available when running an algorithm, e.g., the eigenvector residual. They are used for practical estimation of, or bounds on, the eigenvalue error from the algorithm. A priori bounds are based on information not readily available during an algorithm run. Nevertheless, they help our understanding of algorithms, and, as Wilkinson [59, p. 166] pointed out, a priori bounds are "of great value in assessing the relative performance of algorithms." Thus a priori bounds are important in both theory and practice. In this thesis we examine two different questions related to eigenvalues: the first is an a priori eigenvalue error bound, and the second is related to an a posteriori eigenvalue error bound.

In the first problem, we examine a conjecture of Knyazev and Argentati [32] on the difference between Ritz values of a Hermitian matrix A for two subspaces, one of which is A-invariant. This conjecture generalizes a classical one-dimensional bound of Ruhe [53] to multidimensional subspaces using majorization. We show here, and in [2], that the conjectured bound holds under some additional assumptions, and that a slightly weaker version of it holds in the general case. We also present recent work of Knyazev and Argentati [33] proving the conjectured bound in all cases. We then examine some of the consequences of the conjecture, now a theorem, and compare the new majorization bound to a multidimensional version of the classical one, see [49].

In the second problem, we develop properties of Hessenberg matrices, some of which we believe are new and useful for the analysis of algorithms. We then use these properties to re-examine the lengths of the Ritz vectors in the finite precision Lanczos process for the Hermitian eigenvalue problem, an important question in the rounding error analysis of this process. Our analysis is also valid for the Lanczos process adapted to skew-Hermitian matrices and to normal matrices with collinear eigenvalues, that is, eigenvalues which lie on a line segment in $\mathbb{C}$, see [50].

Our thesis is outlined as follows. In Chapter 1 we examine the conjecture posed by Knyazev and Argentati in [32], in Chapter 2 we develop properties of Hessenberg matrices, and in Chapter 3 we provide the new analysis for the lengths of the Ritz vectors in the Lanczos process for Hermitian, skew-Hermitian, and normal matrices with collinear eigenvalues.

A Brief Review and Introduction to the Literature

Here we mention some background, together with some key literature. The full relevance of the literature to the present work is given in the main text.

Our first major topic is the examination of a conjecture posed by Knyazev and Argentati in [32], which generalizes a classical one-dimensional result on the Rayleigh quotient by Ruhe [53]. This conjecture of Knyazev and Argentati provides a link between two important topics in linear algebra: majorization, and the Rayleigh-Ritz method for the Hermitian eigenvalue problem.

Majorization is a classical topic in theoretical linear algebra and is particularly important in matrix perturbation theory. Majorization (weak or strong) is a type of inequality (comparison) relation between two real vectors which is defined differently from the usual componentwise or normwise inequality. Majorization inequalities arise naturally, e.g., when describing the spectrum or singular values of sums or products of matrices. They are important because in many circumstances these inequalities describe the eigenvalue or singular value relationships between matrices much more precisely than is achievable with componentwise or normwise inequalities. Majorization is a well developed field in theoretical linear algebra, see, e.g., [7, 41, 24].

The Rayleigh-Ritz method for the eigenvalue problem is a classical topic in computational linear algebra. Although it is customary to say the Rayleigh-Ritz "method", it is not an actual algorithm in itself; it is more a technique for obtaining eigenvalue approximations when some algorithm, such as some iteration, produces approximations to eigenvectors. The Rayleigh-Ritz approach generalizes the Rayleigh quotient. If $A$ is an $n \times n$ Hermitian matrix and $y$ is a unit vector, presumably obtained from some algorithm, we can obtain an approximation to an eigenvalue of $A$ based on this trial vector $y$ by forming the Rayleigh quotient $y^HAy$. If $y$ is close to an eigenvector, the approximation will be good, as we shall see, coinciding exactly with an eigenvalue of $A$ if $y$ is an eigenvector. The idea behind the Rayleigh-Ritz method is essentially the same, but now applied to subspaces. Suppose $\mathcal{Y}$ is a $k$-dimensional subspace of $\mathbb{C}^n$ with an orthonormal basis given by the columns of a matrix $Y$, presumably obtained from some algorithm; then one can obtain approximations to a $k$-block of eigenvalues of $A$ by computing the eigenvalues of $Y^HAY$.
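To make the above description concrete, here is a minimal numerical sketch (our own illustration, not part of the thesis) of the Rayleigh quotient and the Rayleigh-Ritz procedure. NumPy is assumed, and the matrix and subspace are arbitrary stand-ins for quantities an iterative algorithm would supply.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 3

# A random Hermitian test matrix (stand-in for the operator A).
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (M + M.conj().T) / 2

# Rayleigh quotient for a single unit trial vector y.
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y /= np.linalg.norm(y)
rayleigh_quotient = (y.conj() @ A @ y).real      # real because A is Hermitian

# Rayleigh-Ritz: orthonormal basis Y of a k-dimensional trial subspace;
# the Ritz values are the eigenvalues of the k-by-k matrix Y^H A Y.
Z = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
Y, _ = np.linalg.qr(Z)                           # columns of Y are orthonormal
ritz_values = np.linalg.eigvalsh(Y.conj().T @ A @ Y)

print(rayleigh_quotient, ritz_values)
```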

There is a vast literature on Rayleigh-Ritz eigenvalue methods and error bounds; see, e.g., [52, Chapters 11–13], [58, Chapters 3–5], and [35, Chapter 4].

Despite the fact that majorization and the Rayleigh-Ritz method are two very important topics, the former in theoretical and the latter in computational linear algebra, there are not many links between them. We are aware of very few instances where majorization has been used in the context of the Rayleigh-Ritz method. The first such instance that we are aware of is the celebrated work of Davis and Kahan [11] to bound eigenvalue errors a posteriori. Recently, Knyazev and Argentati [31, 32] have established what appear to be the first a priori eigenvalue error bounds using majorization in the context of the Rayleigh-Ritz method, stating, in [32], the conjecture which constitutes the first major part of our work.

Our second major topic is a derivation of Hessenberg matrix properties and an application of these in the re-examination of the lengths of the Ritz vectors in the rounding error analysis of the Lanczos process. Hessenberg matrices are important in computational linear algebra because they are almost upper triangular and are obtained in intermediate steps of many algorithms for the eigenvalue problem, such as QR, Lanczos, Arnoldi, etc., see [17, 7, 9]. Recently Zemke [62] introduced polynomial vectors associated with a Hessenberg matrix which interpolate the eigenvectors and which appear to be very useful in the analysis of the properties of such matrices. In [63] Zemke uses these polynomial vectors in an analysis of perturbed Krylov subspace methods. Here we provide further use of these polynomial vectors in the context of a rounding error analysis of the Lanczos process for the eigenvalue problem, see [50].

We now briefly introduce the Hermitian Lanczos process. Let $A$ be an $n \times n$ Hermitian matrix. At step $k$, the Lanczos process for tridiagonalizing a Hermitian matrix (see, e.g., [17, Chapter 9]) in theory produces

$V_k \in \mathbb{C}^{n\times k}$, $T_k \in \mathbb{R}^{k\times k}$, $v_{k+1} \in \mathbb{C}^n$, and $\beta_{k+1} \in \mathbb{R}$ such that
$$AV_k = V_k T_k + v_{k+1}\beta_{k+1}e_k^T,$$
where $[V_k, v_{k+1}]$ has orthonormal columns, $\beta_j > 0$ for $j = 1,\dots,k$, $e_k^T$ is the $k$-th row of the identity matrix, and $T_k$ is real symmetric. The above process was introduced by Cornelius Lanczos in [36] for solving eigenvalue problems, and later in [37] for solving linear systems of equations. The idea is that the matrix $T_k$ above is the restriction of the operator $A$ to the subspace of $\mathbb{C}^n$ spanned by the columns of $V_k$. In the context of the eigenvalue problem for $A$, since $T_k$ has a very simple tridiagonal structure, one may solve for its eigenvalues very quickly and reliably, see, e.g., [52], [17, 8], and use the results to approximate the eigenvalues of $A$. If the range of $V_k$ is an approximate invariant subspace of $A$, the eigenvalues of $T_k$ will be good approximations to some of those of $A$. The Lanczos process applied to the eigenvalue problem is a particular instance of a Rayleigh-Ritz method since $T_k = V_k^H A V_k$. This algorithm is particularly useful for large sparse matrix computations since $T_k$ can be computed very efficiently from a three-term recurrence using matrix-vector products, which are performed quickly owing to the sparsity of $A$.

In exact arithmetic $V_k$ has orthonormal columns; in finite precision arithmetic, however, the columns of $V_k$ can quickly lose orthogonality. For this reason, together with the advent of a backward stable tridiagonalization algorithm produced by Wallace Givens in 1954 [15], the Lanczos algorithm was dismissed soon after its appearance. It was brought back to life by the work of Paige in the seventies, who performed a rounding error analysis of the real symmetric Lanczos process in [44], and later in [45, 46, 47], and showed that despite its departure from theory the algorithm is nevertheless extremely accurate, and very useful for finding eigenvalues and eigenvectors of large sparse symmetric matrices.
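For concreteness, here is a minimal sketch of the three-term Lanczos recurrence described above (our own NumPy illustration, not code from the thesis). Reorthogonalization is deliberately omitted, which is exactly why the computed columns of $V_k$ lose orthogonality in finite precision.

```python
import numpy as np

def lanczos(A, v1, k):
    """Plain Hermitian Lanczos: returns V_k (n x k) and the entries of T_k.

    No reorthogonalization is performed, so in floating point the columns
    of V slowly lose orthogonality, as discussed in the text.
    """
    n = A.shape[0]
    V = np.zeros((n, k + 1), dtype=complex)
    alpha = np.zeros(k)            # diagonal of T_k (real for Hermitian A)
    beta = np.zeros(k + 1)         # off-diagonal entries; beta[0] is unused
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(k):
        w = A @ V[:, j]
        if j > 0:
            w -= beta[j] * V[:, j - 1]
        alpha[j] = np.real(np.vdot(V[:, j], w))   # v_j^H (A v_j - beta_j v_{j-1})
        w -= alpha[j] * V[:, j]
        beta[j + 1] = np.linalg.norm(w)
        if beta[j + 1] == 0:       # exact invariant subspace found; stop early
            return V[:, :j + 1], alpha[:j + 1], beta[1:j + 1]
        V[:, j + 1] = w / beta[j + 1]
    return V[:, :k], alpha, beta[1:k]

# Example: Ritz values (eigenvalues of T_k) approximate extreme eigenvalues of A.
rng = np.random.default_rng(1)
n, k = 100, 20
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                                   # real symmetric test matrix
V, alpha, beta = lanczos(A.astype(complex), rng.standard_normal(n).astype(complex), k)
T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
print(np.sort(np.linalg.eigvalsh(T))[-3:], np.sort(np.linalg.eigvalsh(A))[-3:])
```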

Later, the understanding of the rounding error behaviour of this Lanczos process was developed further in the works of Parlett, Greenbaum, Strakoš, and others; see, for example, [52, 18, 19, 57, 26, 60, 61]. For an overview of this algorithm, its history, as well as an extensive bibliography, we refer the reader to the book by Meurant [42]. Variants of the Lanczos process that will be dealt with here are also applicable to skew-Hermitian matrices and to normal matrices with collinear eigenvalues, see [14, 39]. Ideas here might also be applicable to some variant of the Lanczos unsymmetric matrix tridiagonalization process in [36]; see also, for example, [59, 35–40]. They could possibly be used to develop the work initiated by Bai [5] on the rounding error analysis of this unsymmetric process.

Contributions of the Author

In the research for this thesis, the author has worked very closely with Chris Paige, one of his supervisors; thus the work in this thesis is essentially joint work, since most of the results were developed after many discussions and exchanges of ideas. Nevertheless, here we attempt to describe, to the best of our knowledge, the original results to which the author's contributions were particularly significant.

In Chapter 1 we examine a conjecture of Knyazev and Argentati [32] on the difference between Ritz values of a Hermitian matrix A for two subspaces, one of which is A-invariant. The author made significant contributions towards the proofs of Theorem 1.14 and its Corollary 1.15, showing that a slightly weaker version of the conjecture holds in the general case. Moreover, the author contributed largely to the examples in Section 1.5 demonstrating that, on the one hand, the conjectured bound is sharp, and on the other, that our approach cannot be improved to prove the full conjecture in all cases. Concerning the implications of the conjectured bound, now a theorem, the author contributed largely to the proofs of Corollary 1.21, which provides normwise implications, and Corollary 1.23, which compares the conjectured

bound to a multidimensional version of the classical one. The work on the proof of the slightly weaker version of the conjecture is original and was published in [2]. Notice that this paper is joint work with two other coauthors: Argentati and Knyazev. Initially the author of this thesis and his supervisor had submitted independent work on the proof of the conjecture, and naturally enough, Knyazev and Argentati were among the referees. They communicated to the editor that they were also working on the topic, so we suggested to the editor and to them that we rewrite the paper together. The outcome of our combined ideas and interaction is [2]. The later work on the consequences of the conjecture and its comparison to the classical bound is also original, and was published in [49].

In Chapters 2 and 3 we derive properties of Hessenberg matrices, and then use these to re-examine the lengths of the Ritz vectors in the finite precision Lanczos process for tridiagonalizing certain normal matrices. In this part the author contributed jointly to some of the lemmas in Chapter 2, although it is difficult to assert their originality or importance; the results presented in these lemmas are mostly used as tools for later analysis. In Chapter 3, the author contributed somewhat towards Theorem 3.3, which is original and which is the basic building block of our analysis; however, the statement of this theorem in its current form is due to Chris Paige. The author realized that Lemma 3.5 was the key tool for obtaining the later bounds, and made significant contributions towards Theorems 3.6, 3.7, 3.8, and 3.13, as well as their Corollaries 3.9 and 3.15, which provide new analysis and bounds for the lengths of the Ritz vectors in the Lanczos process for Hermitian matrices. In particular, he realized the relationship with divided differences, see Section 3.7, and their effectiveness in handling the case of a cluster of eigenvalues. With this insight he developed all the new theory for handling such clusters, leading to the entirety of Section 3.8. The bounds derived are essentially the same as the already existing bounds for the real symmetric application of the Lanczos process, but there is some

hope that they might be improved in the future. However, the analysis provided is more straightforward, hopefully simpler, and allows for a comprehensive treatment of the case of a cluster of Ritz values, which is the part of the old analysis that has never been published in a scientific journal. Our new approach is also more general, in that it applies directly to other versions of the Lanczos process, namely those for Hermitian, skew-Hermitian, and normal matrices with collinear eigenvalues. The theorems and analysis in Chapters 2 and 3 (except where stated otherwise) are original work submitted for publication in [50].

Chapter 1

Majorization Bounds for Ritz Values of Hermitian Matrices

1.1 Ritz Values in Eigenvalue Approximation

The Rayleigh-Ritz method is a classical technique in numerical linear algebra for approximating eigenvalues of Hermitian matrices. Although we use the word "method" in the description, it is not an actual algorithm in itself, but rather a technique for obtaining eigenvalue approximations when some algorithm produces approximations to eigenvectors. Let $A$ be an $n \times n$ Hermitian matrix and suppose we are given a unit vector $y$, presumably obtained from some algorithm. We can approximate an eigenvalue of $A$ based on this trial vector $y$ from the Rayleigh quotient $y^HAy$. The Rayleigh-Ritz method is simply the Rayleigh quotient applied to subspaces. If $\mathcal{Y}$ is a $k$-dimensional subspace of $\mathbb{C}^n$ with an orthonormal basis given by the columns of a matrix $Y$, presumably obtained from some algorithm, the Rayleigh-Ritz method approximates $k$ eigenvalues of $A$ by the eigenvalues of $Y^HAY$. These are also called Ritz values of $A$ corresponding to the trial subspace $\mathcal{Y}$. If $\mathcal{Y}$ is $A$-invariant, the Ritz values are exact eigenvalues of $A$. In this chapter we examine a conjecture posed in [32] relating to the quality of the eigenvalue approximations

of the Rayleigh-Ritz method when the trial subspace is close to $A$-invariant. This conjecture generalizes a classical one-dimensional result on the Rayleigh quotient by Ruhe [53].

1.2 Classical Bounds for Ritz Values

Let $x, y \in \mathbb{C}^n$ with $\|x\| = \|y\| = 1$, where $\|\cdot\|$ denotes the usual vector (or induced matrix) two-norm, and let $A = A^H \in \mathbb{C}^{n\times n}$. Write
$$\mathrm{spr}(A) \equiv \lambda_{\max}(A) - \lambda_{\min}(A), \qquad \theta(x,y) \equiv \arccos|x^Hy| \in [0, \pi/2], \qquad (1.1)$$
$\theta(x,y)$ being the acute angle between $x$ and $y$. Here $\mathrm{spr}(A)$ denotes the spread of the eigenvalues of $A$, i.e., the length of the smallest interval containing the eigenvalues of $A$; these are all real because $A$ is Hermitian. If we know $x$, $y$, and $\mathrm{spr}(A)$, we may bound the difference in the Rayleigh quotients using $\theta(x,y)$ as follows.

Theorem 1.1 ([31, Theorem 1]). Let $x, y \in \mathbb{C}^n$, $\|x\| = \|y\| = 1$, $A = A^H \in \mathbb{C}^{n\times n}$, and let $\theta(x,y)$ be as in (1.1). Then
$$|x^HAx - y^HAy| \le \mathrm{spr}(A)\,\sin\theta(x,y). \qquad (1.2)$$

Proof. [31, Theorem 1] gave a proof in $\mathbb{R}^n$, saying (1.2) also holds in $\mathbb{C}^n$; we give a proof in $\mathbb{C}^n$. Clearly (1.2) holds if $x^Hy = 0$, so from now on assume $x^Hy \neq 0$. The values of the left and right hand sides of (1.2) are unchanged when shifting $A$ to $A + \gamma I$, so we may assume without loss of generality that the spectrum of $A$ is centered at zero, i.e., $\|A\| = \lambda_{\max}(A)$ and $\mathrm{spr}(A) = 2\|A\|$. Also (1.1) and (1.2) are unaltered if $x$ is multiplied by a scalar $\alpha$ with $|\alpha| = 1$, so without loss of generality we can replace $x$ by $x(x^Hy)/|x^Hy|$, giving for the new $x$, $y^Hx = x^Hy = \cos\theta(x,y) > 0$.

Now $x^HAx - y^HAy$ is real and $x^HAy - y^HAx$ is purely imaginary, so that
$$|x^HAx - y^HAy|^2 \le |x^HAx + x^HAy - y^HAx - y^HAy|^2 = |(x-y)^HA(x+y)|^2 \le \|A\|^2\,\|x-y\|^2\,\|x+y\|^2$$
$$= \frac{\mathrm{spr}(A)^2}{4}\,[2 - (x^Hy + y^Hx)][2 + (x^Hy + y^Hx)] = \mathrm{spr}(A)^2\,(1 - \cos^2\theta(x,y)) = \mathrm{spr}(A)^2\sin^2\theta(x,y).$$
Taking square roots completes the proof.

The above result relates the difference of Rayleigh quotients to the angle between the (arbitrary) unit vectors $x$ and $y$. An important special case, both theoretically and practically, occurs when one of the vectors, say $x$, is an eigenvector of $A$ and $y$ is an approximation to $x$, often obtained from a numerical method. In this case, $x^HAx$ is an eigenvalue of $A$ and $y^HAy$ is an approximation to that eigenvalue. The classic result that motivates our research is the following: the Rayleigh quotient approximates an eigenvalue of a Hermitian matrix with accuracy proportional to the square of the eigenvector approximation error. The following result was proven by Ruhe [53].

Theorem 1.2 ([53, p. 146]). With the notation in Theorem 1.1, if $Ax = x\lambda$ then
$$|\lambda - y^HAy| = |x^HAx - y^HAy| \le \mathrm{spr}(A)\,\sin^2\theta(x,y). \qquad (1.3)$$

Proof. We give the proof in [2, p. 551]. Here $x^HAx = \lambda$. Let $y = u + v$ where $u \in \mathrm{span}\{x\}$ and $v \in (\mathrm{span}\{x\})^\perp$. Then $(A - \lambda I)u = 0$ and $\|v\| = \sin\theta(x,y)$, giving
$$|x^HAx - y^HAy| = |\lambda - y^HAy| = |y^H(A - \lambda I)y| = |v^H(A - \lambda I)v| \qquad (1.4)$$
$$\le \|A - \lambda I\|\,\|v\|^2 = \|A - \lambda I\|\,\sin^2\theta(x,y) \le \mathrm{spr}(A)\,\sin^2\theta(x,y).$$

Remark 1.1. The fact that $\lambda \equiv x^HAx$ is an eigenvalue of $A$ is used in the third equality in (1.4). If $\lambda$ is not an eigenvalue of $A$ the proof fails, because other non-zero terms appear in the expansion of $y^H(A - \lambda I)y$.
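The quadratic accuracy in (1.3) is easy to observe numerically. The following sketch (our own illustration, not from the thesis) perturbs an eigenvector by an angle θ and checks both bounds (1.2) and (1.3); the test matrix is an arbitrary stand-in.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                       # Hermitian (real symmetric) test matrix
evals, evecs = np.linalg.eigh(A)
spr = evals[-1] - evals[0]              # spread of the spectrum

x = evecs[:, 0]                         # an exact unit eigenvector with eigenvalue lam
lam = evals[0]
p = rng.standard_normal(n)
p -= (x @ p) * x                        # direction orthogonal to x
p /= np.linalg.norm(p)

for theta in [1e-1, 1e-2, 1e-3]:
    y = np.cos(theta) * x + np.sin(theta) * p    # unit vector at angle theta from x
    err = abs(lam - y @ A @ y)                   # Rayleigh quotient error
    # err stays below spr*sin(theta) as in (1.2), and below spr*sin(theta)^2 as in (1.3)
    print(theta, err, spr * np.sin(theta), spr * np.sin(theta) ** 2)
```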

Suppose that $\sin\theta(x,y)$ is our measure of the error between the approximate eigenvector $y$ and the true eigenvector $x$. Then from (1.3) the eigenvalue approximation error $|\lambda - y^HAy|$ is at worst proportional to the square of the eigenvector approximation error. Therefore (1.3) is a big improvement over the more general (1.2). It is important to realize that these bounds depend on the theoretical quantity $\theta(x,y)$, which will not usually be computed, and so these are a priori results. Such results help our understanding rather than produce computationally useful a posteriori results. As Wilkinson [59, p. 166] pointed out, a priori bounds are "of great value in assessing the relative performance of algorithms." Thus while (1.3) is very interesting in its own right, depending on $\sin^2\theta(x,y)$ rather than $\sin\theta(x,y)$, it could also be useful for assessing the performance of algorithms that iterate vectors $y$ approximating $x$, in order to also approximate $x^HAx$.

Now suppose an algorithm produces a succession of $k$-dimensional subspaces $\mathcal{Y}^{(j)}$ approximating an invariant subspace $\mathcal{X}$ of $A$. For example, the block Lanczos algorithm of Golub and Underwood [16] is a Krylov subspace method which does this. In what ways can we generalize (1.3) to subspaces $\mathcal{X}$ and $\mathcal{Y}$ with $\dim\mathcal{X} = \dim\mathcal{Y} = k > 1$? In [32] Knyazev and Argentati proved the following theorem generalizing (1.2) to the multidimensional setting.

Theorem 1.3 ([32, Theorem 4.2]). Let $\mathcal{X}$, $\mathcal{Y}$ be subspaces of $\mathbb{C}^n$ having the same dimension $k$, with orthonormal bases given by the columns of the matrices $X$ and $Y$ respectively, and let $A \in \mathbb{C}^{n\times n}$ be a Hermitian matrix. Then
$$|\lambda(X^HAX) - \lambda(Y^HAY)| \prec_w \mathrm{spr}(A)\,\sin\theta(\mathcal{X},\mathcal{Y}). \qquad (1.5)$$
Here $\prec_w$ denotes the weak submajorization relation, $\theta(\mathcal{X},\mathcal{Y})$ denotes the vector

of principal angles between the subspaces $\mathcal{X}$ and $\mathcal{Y}$, and $\lambda(X^HAX)$ and $\lambda(Y^HAY)$ denote the vectors of eigenvalues (taken in non-increasing order) of $X^HAX$ and $Y^HAY$. These concepts will be explained in Section 1.3. Moreover, in the case where $\mathcal{X}$ is $A$-invariant, Knyazev and Argentati conjectured that "it is natural to expect a much better bound that involves the square of $\sin\theta(\mathcal{X},\mathcal{Y})$," further indicating that majorization results of this kind are apparently not known in the literature, see [32, p. 27]. In light of the classical result (1.3) we make the conjecture precise as follows:

Conjecture 1.1 ([2, Conjecture 3.1]). Let $\mathcal{X}$, $\mathcal{Y}$ be subspaces of $\mathbb{C}^n$ having the same dimension $k$, with orthonormal bases given by the columns of the matrices $X$ and $Y$ respectively. Also, let $A \in \mathbb{C}^{n\times n}$ be a Hermitian matrix, and let $\mathcal{X}$ be $A$-invariant. Then
$$|\lambda(X^HAX) - \lambda(Y^HAY)| \prec_w \mathrm{spr}(A)\,\sin^2\theta(\mathcal{X},\mathcal{Y}). \qquad (1.6)$$

Relations (1.5) and (1.6) are the respective higher dimensional analogues of the Rayleigh quotient error bound (1.2) and the classical (1.3), as we will shortly see. In Section 1.4 we provide the following partial answer to Conjecture 1.1:

Theorem 1.4 ([2, Theorem 3.1, Corollary 3.3]). Let $\mathcal{X}$, $\mathcal{Y}$ be subspaces of $\mathbb{C}^n$ having the same dimension $k$, with orthonormal bases given by the columns of the matrices $X$ and $Y$ respectively. Let $A \in \mathbb{C}^{n\times n}$ be a Hermitian matrix, and let $\mathcal{X}$ be $A$-invariant. Then
$$|\lambda(X^HAX) - \lambda(Y^HAY)| \prec_w \mathrm{spr}(A)\left(\sin^2\theta(\mathcal{X},\mathcal{Y}) + \tfrac{1}{2}\sin^4\theta(\mathcal{X},\mathcal{Y})\right). \qquad (1.7)$$
Moreover, if the $A$-invariant subspace $\mathcal{X}$ corresponds to the set of $k$ largest or smallest eigenvalues of $A$, or if all of the eigenvalues of $A$ corresponding to $\mathcal{X}$ lie between (and possibly include) one extreme eigenvalue of $A$ and the midpoint of $A$'s spectrum, then
$$|\lambda(X^HAX) - \lambda(Y^HAY)| \prec_w \mathrm{spr}(A)\,\sin^2\theta(\mathcal{X},\mathcal{Y}). \qquad (1.8)$$
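As an informal numerical illustration of the conjectured bound (1.6) (our own sketch, not part of the thesis), the following code builds an A-invariant subspace, perturbs it, computes the principal angles from the singular values of X^H Y, and checks the weak submajorization by comparing partial sums of the sorted vectors.

```python
import numpy as np

def weakly_submajorized(x, y):
    """Check x ≺_w y using partial sums of the decreasingly sorted entries."""
    xs = np.sort(x)[::-1]
    ys = np.sort(y)[::-1]
    return np.all(np.cumsum(xs) <= np.cumsum(ys) + 1e-12)

rng = np.random.default_rng(3)
n, k = 12, 4
M = rng.standard_normal((n, n))
A = (M + M.T) / 2
evals, evecs = np.linalg.eigh(A)
spr = evals[-1] - evals[0]

X = evecs[:, :k]                                              # A-invariant: exact eigenvectors
Y, _ = np.linalg.qr(X + 0.05 * rng.standard_normal((n, k)))   # perturbed trial subspace

cosines = np.clip(np.linalg.svd(X.T @ Y, compute_uv=False), 0.0, 1.0)
sin2 = 1.0 - cosines**2                                       # sin^2 of the principal angles

ritz_X = np.sort(np.linalg.eigvalsh(X.T @ A @ X))[::-1]
ritz_Y = np.sort(np.linalg.eigvalsh(Y.T @ A @ Y))[::-1]
diff = np.abs(ritz_X - ritz_Y)

print(weakly_submajorized(diff, spr * sin2))                  # expected: True
```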

Theorem 1.4 is a slightly weaker result than Conjecture 1.1. In numerical analysis we are mainly interested in these results as the angles become small, and then there is minimal difference between the right hand sides of (1.7) and (1.6), so proving the full Conjecture 1.1 is largely of mathematical interest. Although all the numerical tests we did leading to [2] suggested that (1.6) was true in all cases, we were unable to prove the full conjecture. Recently Knyazev and Argentati succeeded in proving (1.6) in all cases, see [33]. Their proof uses many of the same techniques we develop here, although the exact application is a little more sophisticated. Their proof is not part of our research, but we nevertheless present it, in Section 1.6, for completeness.

1.3 Definitions and Prerequisites

We introduce the definitions and tools we need, together with some mild motivation. We do not provide proofs for most of the results in this section; instead we refer the reader to some of the relevant literature.

1.3.1 Notation

For a real vector $x = [x_1,\dots,x_n]^T$, we use $x^\downarrow = [x_1^\downarrow,\dots,x_n^\downarrow]^T$ to denote $x$ with its elements rearranged in non-increasing order, while $x^\uparrow = [x_1^\uparrow,\dots,x_n^\uparrow]^T$ denotes $x$ with its elements rearranged in non-decreasing order. We use $|x|$ to denote the vector of absolute values of the components of $x$. We use the symbol $\le$ to compare real vectors componentwise. For real vectors $x$ and $y$ the expression $x \prec y$ means that $x$ is majorized by $y$, while $x \prec_w y$ means that $x$ is weakly submajorized by $y$, and $x \prec^w y$ means that $x$ is weakly supermajorized by $y$, see Section 1.3.3. We consider the Euclidean space $\mathbb{C}^n$ of column vectors equipped with the standard scalar product $x^Hy$ and the norm $\|x\| \equiv \sqrt{x^Hx}$. We use the same notation $\|A\|$ for

the induced matrix norm of a complex matrix $A \in \mathbb{C}^{n\times n}$. $\mathcal{X} = \mathcal{R}(X) \subseteq \mathbb{C}^n$ means the subspace $\mathcal{X}$ is equal to the range of the matrix $X$ with $n$ rows. The unit matrix is $I$, and the zero matrix (not necessarily square) is $0$, while $e \equiv [1,\dots,1]^T$. We use $\mathcal{H}(n)$ to denote the set of $n\times n$ Hermitian matrices and $\mathcal{U}(n)$ to denote the set of $n\times n$ unitary matrices in the set $\mathbb{C}^{n\times n}$ of all $n\times n$ complex matrices. For a vector $x$, $\mathrm{diag}(x)$ denotes the square matrix with $x$ along its main diagonal and zeros elsewhere; similarly, for a square matrix $B$, $\mathrm{diagof}(B)$ denotes the matrix $B$ with its off-diagonal elements set to zero, while $\mathrm{offdiag}(B) \equiv B - \mathrm{diagof}(B)$. We write $\lambda(A) \equiv \lambda^\downarrow(A)$ for the vector of eigenvalues of $A \in \mathcal{H}(n)$ arranged in descending order, and we write $\sigma(B) \equiv \sigma^\downarrow(B)$ for the vector of singular values of $B$ arranged in descending order. Individual eigenvalues and singular values are denoted by $\lambda_i(A)$ and $\sigma_i(B)$, respectively, so, e.g., $\mathrm{spr}(A) = \lambda_1(A) - \lambda_n(A)$ and $\sigma_1(B) = \|B\|$. Let subspaces $\mathcal{X}$ and $\mathcal{Y} \subseteq \mathbb{C}^n$ have the same dimension, with orthonormal bases given by the columns of the matrices $X$ and $Y$, respectively. We denote the vector of principal angles between $\mathcal{X}$ and $\mathcal{Y}$ arranged in descending order by $\theta(\mathcal{X},\mathcal{Y}) \equiv \theta^\downarrow(\mathcal{X},\mathcal{Y})$ and define it by using $\cos\theta(\mathcal{X},\mathcal{Y}) = \sigma^\uparrow(X^HY)$, see, e.g., [8], [17]. We clarify this concept in the next subsection.
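To keep this notation straight when experimenting numerically, here is a small helper sketch (our own, purely illustrative, assuming NumPy) mapping a few of the symbols above to code.

```python
import numpy as np

def down(x):      # x↓ : entries in non-increasing order
    return np.sort(x)[::-1]

def up(x):        # x↑ : entries in non-decreasing order
    return np.sort(x)

def diagof(B):    # diagof(B): off-diagonal entries set to zero
    return np.diag(np.diag(B))

def offdiag(B):   # offdiag(B) = B - diagof(B)
    return B - diagof(B)

def lam(A):       # λ(A): eigenvalues of Hermitian A, descending
    return down(np.linalg.eigvalsh(A))

def sig(B):       # σ(B): singular values, descending
    return np.linalg.svd(B, compute_uv=False)

def spr(A):       # spr(A) = λ_1(A) - λ_n(A)
    e = lam(A)
    return e[0] - e[-1]
```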

1.3.2 Angles between Subspaces

Let $x, y \in \mathbb{C}^n$, $\|x\| = \|y\| = 1$. In (1.1) we defined the acute angle between $x$ and $y$ via the cosine function, using their inner product. The acute angle provides a measure for the relative positioning of two unit vectors independently of any underlying basis. This notion of relative positioning can also be extended to multidimensional subspaces. Let $\mathcal{X}$, $\mathcal{Y}$ be $k$-dimensional subspaces of $\mathbb{C}^n$. One may define a vector $\theta(\mathcal{X},\mathcal{Y})$ of $k$ angles completely describing the relative position between these two subspaces as follows, see [8] and [17]. Let
$$\cos\theta_k(\mathcal{X},\mathcal{Y}) \equiv \max_{x\in\mathcal{X},\; y\in\mathcal{Y}} |x^Hy|, \qquad \|x\| = \|y\| = 1. \qquad (1.9)$$
This defines the smallest angle $\theta_k$ between $\mathcal{X}$ and $\mathcal{Y}$ (giving the largest cosine). In particular, if $\mathcal{X}$ and $\mathcal{Y}$ share a common non-zero vector then $\theta_k(\mathcal{X},\mathcal{Y}) = 0$. Here we use max rather than sup because the unit ball is compact in these finite dimensional subspaces, so the above maximum is achieved for some $x_k \in \mathcal{X}$ and $y_k \in \mathcal{Y}$. Now remove $x_k$ from $\mathcal{X}$ by considering the orthogonal complement of $x_k$ in $\mathcal{X}$, and do the same for $y_k$ in $\mathcal{Y}$. Repeat the definition (1.9) for the $(k-1)$-dimensional subspaces $\{x \in \mathcal{X} : x \perp x_k\}$ and $\{y \in \mathcal{Y} : y \perp y_k\}$, and then keep going in the same fashion until reaching empty spaces. After completion, the above procedure recursively defines the $k$ principal angles
$$0 \le \theta_k(\mathcal{X},\mathcal{Y}) \le \dots \le \theta_2(\mathcal{X},\mathcal{Y}) \le \theta_1(\mathcal{X},\mathcal{Y}) \le \frac{\pi}{2}$$
between the subspaces $\mathcal{X}$ and $\mathcal{Y}$. The vectors $\{x_1, x_2,\dots,x_k\}$ and $\{y_1, y_2,\dots,y_k\}$ are called principal vectors between the two subspaces. In short we have
$$\cos\theta_j(\mathcal{X},\mathcal{Y}) \equiv \max_{x\in\mathcal{X},\; y\in\mathcal{Y}} |x^Hy|, \qquad j = k, k-1,\dots,1, \qquad \text{where} \qquad (1.10)$$
$$\|x\| = \|y\| = 1, \qquad x^Hx_i = 0, \quad y^Hy_i = 0, \quad i = k, k-1,\dots,j+1.$$
The angles between subspaces are constructed from smallest to largest. Although the construction (1.10) appears slightly awkward, indexed backwards, it is convenient for us to order the angles from largest to smallest, i.e., $\theta(\mathcal{X},\mathcal{Y}) \equiv \theta^\downarrow(\mathcal{X},\mathcal{Y})$. In practice one is usually more interested in the larger than the smaller angles, since the larger angles are the ones which give a better idea of how far away the subspaces are from each other. The largest angle $\theta_1(\mathcal{X},\mathcal{Y})$ is usually called the gap between $\mathcal{X}$ and $\mathcal{Y}$ and is sometimes used as a measure of the distance between $\mathcal{X}$ and $\mathcal{Y}$. Another

widely used measure of distance between $\mathcal{X}$ and $\mathcal{Y}$ is $\sin\theta_1(\mathcal{X},\mathcal{Y})$, the sine of the gap. Of course the full relative positioning between the subspaces is described not only by the gap, but by the complete vector of angles between the subspaces. One may define principal angles even for subspaces of unequal dimensions in exactly the same fashion; in that case the number of angles corresponds to the dimension of the smaller of the two subspaces. Here we only deal with subspaces $\mathcal{X}$ and $\mathcal{Y}$ of equal dimension. In this case the principal vectors form orthonormal bases for $\mathcal{X}$ and $\mathcal{Y}$. These vectors are by no means unique; in fact one has many choices of sets of principal vectors. In particular, multiplying a principal vector by a unit-length scalar produces another principal vector, without affecting the other principal vectors in the set.

We now show that the angles between subspaces correspond to singular values. Let $\tilde X$ and $\tilde Y$ be any two orthonormal bases for the $k$-dimensional subspaces $\mathcal{X}$ and $\mathcal{Y}$ respectively. We can take unitary matrices $U$ and $V$ so that $U^H\tilde X^H\tilde Y V = \mathrm{diag}(\sigma_k,\dots,\sigma_1)$, where the singular values are written backwards, i.e., from smallest to largest as we go down the main diagonal. Let $X = \tilde X U$, $Y = \tilde Y V$; then the columns of $X$ and $Y$ are also orthonormal bases for $\mathcal{X}$ and $\mathcal{Y}$. Moreover, the columns of $X$ and $Y$ satisfy all the conditions of (1.10), showing that they can be taken as principal vectors between the subspaces $\mathcal{X}$ and $\mathcal{Y}$. This also shows that the cosines of the principal angles between $\mathcal{X}$ and $\mathcal{Y}$ are precisely the singular values of the matrix $\tilde X^H\tilde Y$. Note that these singular values are always the same regardless of the initial choice of bases $\tilde X$ and $\tilde Y$; that is, the angles depend on the subspaces but not on the choice of bases. Generally we have
$$\cos\theta(\mathcal{X},\mathcal{Y}) = \sigma^\uparrow(X^HY), \qquad (1.11)$$
for any orthonormal bases $X$, $Y$ of $\mathcal{X}$, $\mathcal{Y}$.
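The singular value characterization (1.11) is also how principal angles are computed in practice. Here is a small illustrative sketch (our own, not from the thesis); if SciPy is available, scipy.linalg.subspace_angles performs the same computation more robustly.

```python
import numpy as np

def principal_angles(X, Y):
    """Principal angles (descending) between the ranges of X and Y, via (1.11).

    X, Y: matrices with orthonormal columns spanning the two subspaces.
    """
    s = np.linalg.svd(X.conj().T @ Y, compute_uv=False)   # cosines, descending
    s = np.clip(s, -1.0, 1.0)                             # guard against rounding
    return np.sort(np.arccos(s))[::-1]                    # angles, descending

# Example: two 2-dimensional subspaces of R^4 sharing one direction,
# so the smallest principal angle is 0 and the gap is pi/2.
X = np.linalg.qr(np.array([[1., 0.], [0., 1.], [0., 0.], [0., 0.]]))[0]
Y = np.linalg.qr(np.array([[1., 0.], [0., 0.], [0., 1.], [0., 0.]]))[0]
print(principal_angles(X, Y))   # roughly [pi/2, 0]
```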

Moreover, the columns of $X$ and $Y$ can be chosen so that
$$C \equiv \mathrm{diag}(\cos\theta(\mathcal{X},\mathcal{Y})) = X^HY. \qquad (1.12)$$
Later we will often choose bases $X$, $Y$ for $\mathcal{X}$, $\mathcal{Y}$ as in (1.12).

1.3.3 Majorization

Majorization inequalities are comparison relations between real vectors. They appear naturally, e.g., when describing the spectrum or singular values of sums and products of matrices. Majorization is a well developed theoretical field, applied extensively in matrix analysis, see, e.g., [7, 24, 41]. Here we briefly introduce the subject and state a few important theorems that we will use later. With the notation in [7], we say that $x \in \mathbb{R}^n$ is weakly submajorized by $y \in \mathbb{R}^n$, written $x \prec_w y$, if
$$\sum_{i=1}^k x_i^\downarrow \le \sum_{i=1}^k y_i^\downarrow, \qquad k = 1, 2,\dots,n, \qquad (1.13)$$
while $x$ is weakly supermajorized by $y$, written $x \prec^w y$, if
$$\sum_{i=1}^k x_i^\uparrow \ge \sum_{i=1}^k y_i^\uparrow, \qquad k = 1, 2,\dots,n. \qquad (1.14)$$
Finally, $x$ is (strongly) majorized by $y$, written $x \prec y$, if (1.13) holds together with
$$\sum_{i=1}^n x_i = \sum_{i=1}^n y_i. \qquad (1.15)$$
From the above definitions, one can show directly that
$$x \prec y \iff \left\{ x \prec_w y, \ \ \sum_{i=1}^n x_i = \sum_{i=1}^n y_i \right\}, \qquad x \prec y \iff \{x \prec_w y, \ x \prec^w y\}.$$

Remark 1.2. We rarely use weak supermajorization from now on. We sometimes use the term majorization to describe all of the above majorization relations generally, but usually for $\prec$, emphasizing with "strong" whenever needed, whereas we use the term weak majorization for $\prec_w$, often omitting the "sub". We only use the precise "weak supermajorization" in the context of $\prec^w$.

The linear inequalities in the various majorization relations define convex sets in $\mathbb{R}^n$. Geometrically, $x \prec y$ if and only if the vector $x$ is in the convex hull of all vectors obtained by permuting the coordinates of $y$, see, e.g., [7, Theorem II.1.10]. If $x \prec_w y$ then $x$ is in a certain convex set depending on $y$, but in this case the description is a little more complicated; in particular this convex set is not bounded. However, if $x, y \ge 0$ then the corresponding set becomes a bounded convex polygon.

Majorization relations are important because they often provide an intermediate alternative to componentwise and normwise inequalities, and are often more precise than the other two. In many contexts we would like to compare vectors $x, y \in \mathbb{R}^n$ in some fashion, but a normwise inequality does not provide much information about how the components of $x$ and $y$ compare with each other, and is often weaker than necessary, whereas componentwise inequalities such as $x \le y$ or $y \le x$ may simply be false. In such cases, comparing $x$ and $y$ via majorization provides a viable alternative. In particular, when $x$ and $y$ are positive vectors, a case which often occurs in error estimation, comparing them via majorization is an intermediate alternative to componentwise and normwise comparison, see (1.22).

Strong $\prec$ and weak $\prec_w$ majorization relations only share some properties with the usual inequality relation, so one should deal with them carefully. For example, both $\prec$ and $\prec_w$ are reflexive and transitive, but $x \prec y$ and $y \prec x$ only implies that $x$ and $y$ are equal up to permutation; it does not imply that $x = y$, e.g., $x = (1, 0)^T$, $y = (0, 1)^T$. Similarly, $x \prec y$ does not imply the intuitive $x + z \prec y + z$, as is seen in the example $x = (0, 0, 0)^T$, $y = (2, -1, -1)^T$, $z = (-2, 0, 0)^T$. So we must be particularly

careful of the ordering when we combine results. It can be seen from (1.13) and (1.15) that $x + u \prec x^\downarrow + u^\downarrow$, e.g., [7, Corollary II.4.3], and this is part of the very useful result
$$\{x \prec_w y\}\ \&\ \{u \prec_w v\} \implies x + u \prec x^\downarrow + u^\downarrow \prec_w y^\downarrow + v^\downarrow, \qquad (1.16)$$
where this also holds with $\prec_w$ replaced by $\prec$. Here are some other basic majorization and related results that we will use later:
$$x \le y \implies x^\downarrow \le y^\downarrow \implies x \prec_w y; \qquad (1.17)$$
$$A \in \mathcal{H}(n) \implies |\lambda(\pm A)|^\downarrow = \sigma(A); \qquad (1.18)$$
$$|x \pm y| \prec_w |x|^\downarrow + |y|^\downarrow, \quad \text{since from (1.16)}\quad |x \pm y| \le |x| + |y| \prec |x|^\downarrow + |y|^\downarrow; \qquad (1.19)$$
$$x \prec y \implies |x| \prec_w |y|; \quad \text{see, e.g., [7, Example II.3.5].} \qquad (1.20)$$
Many inequality relations between eigenvalues and singular values are succinctly expressed as majorization or weak majorization relations; we use the following theorems later on. The proofs of these theorems are for the most part non-trivial, so we do not present them here; instead we refer the reader to the relevant literature.

Theorem 1.5 (Lidskii [38], see also, e.g., [7, p. 69]). Let $A, B \in \mathcal{H}(n)$. Then $\lambda(A) - \lambda(B) \prec \lambda(A - B)$.

Theorem 1.6 (Schur's Theorem, see, e.g., [7, p. 35]). Let $A \in \mathcal{H}(n)$. Then $\mathrm{diagof}(A)\,e \prec \lambda(A)$.

Theorem 1.7 (see, e.g., [41, Chapter 9, G.1.d], [24, Corollary 3.4.3]). If $A, B \in \mathbb{C}^{n\times n}$, then $\sigma(A \pm B) \prec_w \sigma(A) + \sigma(B)$.

Theorem 1.7 extends to the case of three or more matrices, since $\sigma(\cdot) = \sigma(\cdot)^\downarrow$ gives, via (1.16) and Theorem 1.7, $\sigma(A \pm B \pm C) \prec_w \sigma(A \pm B) + \sigma(C) \prec_w \sigma(A) + \sigma(B) + \sigma(C)$.
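As a quick sanity check of Theorems 1.5–1.7 (our own illustration, not from the thesis), the following sketch verifies the stated majorization relations numerically for random Hermitian test matrices.

```python
import numpy as np

def majorized(x, y, weak=False, tol=1e-10):
    """Check x ≺ y (or x ≺_w y if weak=True) via the partial-sum definitions."""
    xs, ys = np.sort(x)[::-1], np.sort(y)[::-1]
    ok = np.all(np.cumsum(xs) <= np.cumsum(ys) + tol)
    if weak:
        return ok
    return ok and abs(xs.sum() - ys.sum()) <= tol

def lam(A):   # eigenvalues of a Hermitian matrix, descending
    return np.sort(np.linalg.eigvalsh(A))[::-1]

def sig(B):   # singular values, descending
    return np.linalg.svd(B, compute_uv=False)

rng = np.random.default_rng(4)
n = 7
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
B = rng.standard_normal((n, n)); B = (B + B.T) / 2

print(majorized(lam(A) - lam(B), lam(A - B)))               # Theorem 1.5 (Lidskii)
print(majorized(np.diag(A), lam(A)))                         # Theorem 1.6 (Schur)
print(majorized(sig(A + B), sig(A) + sig(B), weak=True))     # Theorem 1.7
```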

Theorem 1.8 (Weyl's Monotonicity Theorem, see, e.g., [7, Corollary III.2.3]). Let $A, H \in \mathcal{H}(n)$, where $H$ has non-negative eigenvalues. Then
$$\lambda(A) \le \lambda(A + H). \qquad (1.21)$$

Theorem 1.9 (see, e.g., [24], [7, p. 75]). $\sigma(AB) \le \|A\|\,\sigma(B)$ and $\sigma(AB) \le \|B\|\,\sigma(A)$ for arbitrary matrices $A$ and $B$ such that $AB$ exists.

Theorem 1.10 (see, e.g., [24]). $\sigma(AB) \prec_w \sigma(A)\,\sigma(B)$ for arbitrary matrices $A$ and $B$ such that $AB$ exists.

Remark 1.3. Notice that in the previous theorem we have the product of two vectors $\sigma(A)\,\sigma(B)$. From now on we adopt the convention that a product of vectors used in majorization is performed componentwise. Also, in Theorems 1.9 and 1.10 for rectangular matrices we may need to operate with nonnegative vectors of different lengths. A standard agreement in this case is to add zeros at the end of the shorter vector to match the sizes needed for componentwise arithmetic operations and comparisons. This agreement only makes sense because the components of $\sigma(\cdot)$ are nonnegative, so the extra zeros do not change the ordering. We use this agreement in later proofs.

Finally we state two theorems which show that majorization inequalities imply a wide variety of other inequalities, and in particular a wide variety of normwise inequalities.

Theorem 1.11 (see, e.g., [41, Proposition 4.B.1]). Let $x, y \in \mathbb{R}^k$. The inequality
$$\sum_{i=1}^k g(x_i) \le \sum_{i=1}^k g(y_i)$$

holds for all continuous convex functions $g : \mathbb{R} \to \mathbb{R}$ if and only if $x \prec y$.

Theorem 1.12 (see, e.g., [41, Proposition 4.B.2]). Let $x, y \in \mathbb{R}^k$. The inequality
$$\sum_{i=1}^k g(x_i) \le \sum_{i=1}^k g(y_i)$$
holds for all continuous increasing convex functions $g : \mathbb{R} \to \mathbb{R}$ if and only if $x \prec_w y$. Similarly, it holds for all continuous decreasing convex functions $g$ if and only if $x \prec^w y$.

Theorem 1.12 has the following important implication.

Corollary 1.13. Let $x, y \in \mathbb{R}^k$ have positive components. Then, using the standard definition of a $p$-norm, see, e.g., [7, p. 84], for any $p \in [1, \infty]$,
$$x \le y \implies x \prec_w y \implies \|x\|_p \le \|y\|_p. \qquad (1.22)$$

Proof. The first implication is just the same as (1.17). To prove the second, let $x$ and $y$ be nonnegative real vectors with $x \prec_w y$, and for $p \in [1, \infty)$ let $g(t) = t^p$, which is continuous, increasing, and convex on $[0, \infty)$. By Theorem 1.12 it follows that
$$\|x\|_p^p = \sum_{i=1}^k x_i^p \le \sum_{i=1}^k y_i^p = \|y\|_p^p.$$
Taking $p$-th roots gives the second implication for $p \in [1, \infty)$. The case $p = \infty$ holds because for any vector $u$, $\|u\|_p \to \|u\|_\infty$ as $p \to \infty$, and non-strict inequalities are preserved in the limit.

The implications in (1.22) hold only in the forward direction. For example, the converse of the first implication is broken with $x = (1, 1)^T$, $y = (2, 0)^T$. For the converse of the second implication, with $p \in [1, \infty)$, we can take $x = (1, 0)^T$, $y = \left( \left(\tfrac{2}{3}\right)^{1/p}, \left(\tfrac{2}{3}\right)^{1/p} \right)^T$, showing that $\|x\|_p = 1 < \left(\tfrac{4}{3}\right)^{1/p} = \|y\|_p$ but that $x_1 = 1 > \left(\tfrac{2}{3}\right)^{1/p} = y_1$, so $x \not\prec_w y$. For $p = \infty$, take $x = (3, 2)^T$, $y = (4, 0)^T$, giving $\|x\|_\infty = 3 < 4 = \|y\|_\infty$ but $x_1 + x_2 = 5 > 4 = y_1 + y_2$, so again $x \not\prec_w y$.

Corollary 1.13 is particularly important where the vector $x$ represents an approximation (positive) error and $y$ is some positive estimate of $x$. It says that for such bounds, the relation $x \prec_w y$ is an intermediate step between a componentwise and a normwise inequality. Numerical analysts like obtaining componentwise inequalities

because they give very precise information about the error vectors. On the other hand, a componentwise inequality using a particular estimate $y$ is not always possible to achieve; in fact the relations $x \le y$ and $x^\downarrow \le y^\downarrow$ are often false. In those cases, numerical analysts usually settle for a normwise relation like $\|x\|_p \le \|y\|_p$ for some $p$, usually $p = 1, 2$ or $\infty$, which is much weaker. With majorization, a numerical analyst has an alternative intermediate way of bounding errors, which is weaker than bounding errors componentwise, but stronger and more precise than bounding errors normwise.

We now give an example demonstrating that the classical bound (1.3) cannot be extended to multidimensional subspaces via a standard inequality relation. Consider
$$A = \;\; , \qquad X = \;\; , \qquad Y = \;\; , \qquad \mathcal{X} = \mathcal{R}(X), \quad \mathcal{Y} = \mathcal{R}(Y).$$
Here $A$ is Hermitian, $X$ and $Y$ form orthonormal bases for $\mathcal{X}$ and $\mathcal{Y}$ respectively, and $\mathcal{X}$ is $A$-invariant. From (1.11), and by calculating the eigenvalues of $A$, we have
$$\cos\theta(\mathcal{X},\mathcal{Y}) = \sigma^\uparrow(X^HY) = \begin{bmatrix}0\\1\end{bmatrix}, \qquad \sin\theta(\mathcal{X},\mathcal{Y}) = \sin^2\theta(\mathcal{X},\mathcal{Y}) = \begin{bmatrix}1\\0\end{bmatrix}, \qquad \mathrm{spr}(A) = 1.$$
Moreover, calculating the Ritz values and taking the absolute value of the difference gives
$$|\lambda(X^HAX) - \lambda(Y^HAY)| \not\le \begin{bmatrix}1\\0\end{bmatrix} = \mathrm{spr}(A)\,\sin^2\theta(\mathcal{X},\mathcal{Y}).$$
Hence the conjectured bound (1.6) simply does not hold if $\prec_w$ is replaced by $\le$. This example shows even more: even if we replace $\mathrm{spr}(A)$ by any other positive

constant, the inequality relation above would still break. This means that generalizing (1.3) to the multidimensional setting using $\sin^2\theta(\mathcal{X},\mathcal{Y})$ cannot be done via a standard inequality relation, so a generalization via weak majorization as in Conjecture 1.1 is a good alternative. In particular, as suggested by (1.22), generalizing (1.3) using weak majorization is stronger than a direct attempt to generalize the bound using an inequality with $p$-norms.

1.4 Majorization Bounds for Ritz Values

We now have all of the tools needed to prove our main results Theorem 1.14, Theorem 1.17, and their corollaries, which essentially establish (1.7) and (1.8), as well as some other related statements. In Section 1.5 we give an example demonstrating that the conjectured bound (1.6) is sharp, that is, equality can be reached. Our numerical tests suggested that (1.6) holds in all cases; however, in Section 1.5 we also show that the very first step in our proof of Theorem 1.14 does not allow us to prove the full statement (1.6), so a different approach is needed to show (1.6) in all cases. A full proof of (1.6) has recently been established by Knyazev and Argentati in [33]. In Section 1.6 we provide this proof with notation and slight modifications adapted to our current discussion.

Theorem 1.14 ([2, Theorem 3.2]). Let $\mathcal{X}$, $\mathcal{Y}$ be subspaces of $\mathbb{C}^n$ having the same dimension $k$, with orthonormal bases given by the columns of the matrices $X$ and $Y$ respectively. Let $A \in \mathbb{C}^{n\times n}$ be a Hermitian matrix, and let $\mathcal{X}$ be $A$-invariant. Then
$$|\lambda(X^HAX) - \lambda(Y^HAY)| \prec_w \mathrm{spr}(A)\left[e - \cos\theta(\mathcal{X},\mathcal{Y}) + \tfrac{1}{2}\sin^2\theta(\mathcal{X},\mathcal{Y})\right]. \qquad (1.23)$$

Proof. If $\tilde X$ and $X$ are any two orthonormal bases of $\mathcal{X}$ then $\tilde X = XU$ for some unitary matrix $U$, so $\lambda(X^HAX) = \lambda(\tilde X^HA\tilde X)$; in general, the Ritz values corresponding to a subspace are the same regardless of the choice of basis.

Choose $X = [x_1, x_2,\dots,x_k]$ and $Y = [y_1, y_2,\dots,y_k]$ as in (1.12), so that $C \equiv X^HY$ is real, square, and diagonal, with the diagonal entries in increasing order. Therefore,
$$C \equiv X^HY = \mathrm{diag}(\cos\theta(\mathcal{X},\mathcal{Y})). \qquad (1.24)$$
We arbitrarily complete $X$ and $Y$ to unitary matrices $[X, X_\perp]$ and $[Y, Y_\perp] \in \mathcal{U}(n)$, respectively, and consider the $2\times 2$ block partition of their unitary product $[X, X_\perp]^H[Y, Y_\perp]$. By construction of $X$ and $Y$, its $k\times k$ upper left block is $C$. We denote its $(n-k)\times k$ lower left block by $S \equiv X_\perp^HY$. We obtain
$$[X, X_\perp]^H[Y, Y_\perp] = \begin{bmatrix} X^HY & X^HY_\perp \\ X_\perp^HY & X_\perp^HY_\perp \end{bmatrix} = \begin{bmatrix} C & X^HY_\perp \\ S & X_\perp^HY_\perp \end{bmatrix}.$$
Since $[X, X_\perp]^H[Y, Y_\perp]$ is unitary, the entries $C$ and $S$ of its first block column satisfy
$$C^2 + S^HS = I. \qquad (1.25)$$
Hence $S^HS$ is diagonal, $S^HS = \mathrm{diag}(\sigma^2(S))$, and $\lambda(S^HS) = \lambda(I - C^2) = e - \cos^2\theta(\mathcal{X},\mathcal{Y}) = \sin^2\theta(\mathcal{X},\mathcal{Y})$, where $e$ is the vector of ones, so the vectors of singular values $\sigma(C)$ and $\sigma(S)$ are closely connected, and we derive from this that
$$\sin\theta(\mathcal{X},\mathcal{Y})^T = [\sigma(S)^T, 0,\dots,0], \qquad (1.26)$$
where $\max\{2k - n, 0\}$ zeros are added on the right-hand side to match the number $k$ of angles in the vector $\theta(\mathcal{X},\mathcal{Y})$ with the number $\min\{k, n-k\}$ of singular values in the vector $\sigma(S)$. Since $\mathcal{X}$ is $A$-invariant and $[X, X_\perp]$ is unitary, then
$$[X, X_\perp]^HA[X, X_\perp] = \begin{bmatrix} A_{11} & 0 \\ 0 & A_{22} \end{bmatrix}, \qquad A = [X, X_\perp]\begin{bmatrix} A_{11} & 0 \\ 0 & A_{22} \end{bmatrix}[X, X_\perp]^H,$$

where
$$X^HAX \equiv A_{11} \in \mathcal{H}(k), \qquad X_\perp^HAX_\perp \equiv A_{22} \in \mathcal{H}(n-k). \qquad (1.27)$$
Now from $Y^H[X, X_\perp] = [C^H, S^H] = [C, S^H]$, it follows that
$$Y^HAY = Y^H[X, X_\perp]\begin{bmatrix} A_{11} & 0 \\ 0 & A_{22} \end{bmatrix}[X, X_\perp]^HY = CA_{11}C + S^HA_{22}S. \qquad (1.28)$$
The expression we want to bound now takes the form
$$\lambda(X^HAX) - \lambda(Y^HAY) = \lambda(A_{11}) - \lambda(CA_{11}C + S^HA_{22}S)$$
$$= \lambda(A_{11}) - \lambda(CA_{11}C) + \lambda(CA_{11}C) - \lambda(CA_{11}C + S^HA_{22}S) \qquad (1.29)$$
$$\prec \lambda(A_{11} - CA_{11}C) + \lambda(-S^HA_{22}S), \qquad (1.30)$$
where the last line used Lidskii's Theorem 1.5 twice, then (1.16), remembering that $\lambda(\cdot) \equiv \lambda^\downarrow(\cdot)$, see Section 1.3.1. Next (1.18), Theorems 1.10 and 1.9, and (1.26) give
$$|\lambda(-S^HA_{22}S)| = \sigma(S^HA_{22}S) \prec_w \sigma(S^H)\,\sigma(A_{22}S) \le \|A_{22}\|\,\sin^2\theta(\mathcal{X},\mathcal{Y}). \qquad (1.31)$$
This bounds the last term in (1.30).

Remark 1.4. These results are also applicable to the proof of Theorem 1.17, so we will refer to the above material again. These two proofs differ in the way the $\lambda(A_{11}) - \lambda(CA_{11}C)$ term is bounded in (1.29). Also, in our later proof of Theorem 1.20, we use the same material up to equation (1.28). We will not reconstruct the definitions of $C$, $S$, $A_{11}$, and $A_{22}$, but will refer to those given here.

The second term in (1.30) is bounded by (1.31); we now bound the first term. To do so we use the identity
$$A_{11} - CA_{11}C = (I - C)A_{11} + CA_{11}(I - C), \qquad (1.32)$$

together with (see Theorem 1.9 and (1.24))
$$\sigma((I-C)A_{11}) \le \|A_{11}\|\,\sigma(I-C) = \|A_{11}\|\,(e - \cos\theta(\mathcal{X},\mathcal{Y})), \qquad (1.33)$$
and (see also Theorem 1.10)
$$\sigma(CA_{11}(I-C)) \prec_w \sigma(C)\,\sigma(A_{11}(I-C)) \le \sigma(A_{11}(I-C)) \le \|A_{11}\|\,\sigma(I-C) = \|A_{11}\|\,(e - \cos\theta(\mathcal{X},\mathcal{Y})). \qquad (1.34)$$
Discarding the first $C$ in $\sigma(CA_{11}(I-C))$ is no real loss; see Section 1.5. Using (1.32) and applying (1.18), Theorem 1.7, and (1.16) and (1.17) with (1.33) and (1.34), gives
$$|\lambda(A_{11} - CA_{11}C)| = \sigma((I-C)A_{11} + CA_{11}(I-C)) \prec_w \sigma((I-C)A_{11}) + \sigma(CA_{11}(I-C)) \prec_w 2\|A_{11}\|\,(e - \cos\theta(\mathcal{X},\mathcal{Y})). \qquad (1.35)$$
This bounds the first term on the right of (1.30).

We now combine our bounds. Apply (1.20) followed by (1.19) to (1.30), then use (1.35) and (1.31) with (1.16), together with $\|A_{11}\|, \|A_{22}\| \le \|A\|$, to obtain
$$|\lambda(X^HAX) - \lambda(Y^HAY)| \prec_w |\lambda(A_{11} - CA_{11}C) + \lambda(-S^HA_{22}S)|$$
$$\prec_w |\lambda(A_{11} - CA_{11}C)|^\downarrow + |\lambda(-S^HA_{22}S)|^\downarrow$$
$$\prec_w 2\|A_{11}\|\,(e - \cos\theta(\mathcal{X},\mathcal{Y})) + \|A_{22}\|\,\sin^2\theta(\mathcal{X},\mathcal{Y})$$
$$\le \|A\|\left[\,2(e - \cos\theta(\mathcal{X},\mathcal{Y})) + \sin^2\theta(\mathcal{X},\mathcal{Y})\,\right]. \qquad (1.36)$$
Our final step is to replace $\|A\|$ by an expression involving $\mathrm{spr}(A)$. Observe that the difference between Ritz values is invariant under any shift of $A$ by $\alpha I$, $\alpha \in \mathbb{R}$, so we shift $A$ in a way that minimizes $\|A\|$. This occurs when $0$ is exactly in the middle of the spectrum, in which case $\|A\| = \mathrm{spr}(A)/2$. Combining this observation with (1.36) (and remembering (1.17)) completes the proof of (1.23).


More information

Numerical Methods I Eigenvalue Problems

Numerical Methods I Eigenvalue Problems Numerical Methods I Eigenvalue Problems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 2nd, 2014 A. Donev (Courant Institute) Lecture

More information

Repeated Eigenvalues and Symmetric Matrices

Repeated Eigenvalues and Symmetric Matrices Repeated Eigenvalues and Symmetric Matrices. Introduction In this Section we further develop the theory of eigenvalues and eigenvectors in two distinct directions. Firstly we look at matrices where one

More information

RITZ VALUE BOUNDS THAT EXPLOIT QUASI-SPARSITY

RITZ VALUE BOUNDS THAT EXPLOIT QUASI-SPARSITY RITZ VALUE BOUNDS THAT EXPLOIT QUASI-SPARSITY ILSE C.F. IPSEN Abstract. Absolute and relative perturbation bounds for Ritz values of complex square matrices are presented. The bounds exploit quasi-sparsity

More information

MATH 423 Linear Algebra II Lecture 33: Diagonalization of normal operators.

MATH 423 Linear Algebra II Lecture 33: Diagonalization of normal operators. MATH 423 Linear Algebra II Lecture 33: Diagonalization of normal operators. Adjoint operator and adjoint matrix Given a linear operator L on an inner product space V, the adjoint of L is a transformation

More information

G1110 & 852G1 Numerical Linear Algebra

G1110 & 852G1 Numerical Linear Algebra The University of Sussex Department of Mathematics G & 85G Numerical Linear Algebra Lecture Notes Autumn Term Kerstin Hesse (w aw S w a w w (w aw H(wa = (w aw + w Figure : Geometric explanation of the

More information

On prescribing Ritz values and GMRES residual norms generated by Arnoldi processes

On prescribing Ritz values and GMRES residual norms generated by Arnoldi processes On prescribing Ritz values and GMRES residual norms generated by Arnoldi processes Jurjen Duintjer Tebbens Institute of Computer Science Academy of Sciences of the Czech Republic joint work with Gérard

More information

October 25, 2013 INNER PRODUCT SPACES

October 25, 2013 INNER PRODUCT SPACES October 25, 2013 INNER PRODUCT SPACES RODICA D. COSTIN Contents 1. Inner product 2 1.1. Inner product 2 1.2. Inner product spaces 4 2. Orthogonal bases 5 2.1. Existence of an orthogonal basis 7 2.2. Orthogonal

More information

On the influence of eigenvalues on Bi-CG residual norms

On the influence of eigenvalues on Bi-CG residual norms On the influence of eigenvalues on Bi-CG residual norms Jurjen Duintjer Tebbens Institute of Computer Science Academy of Sciences of the Czech Republic duintjertebbens@cs.cas.cz Gérard Meurant 30, rue

More information

EECS 275 Matrix Computation

EECS 275 Matrix Computation EECS 275 Matrix Computation Ming-Hsuan Yang Electrical Engineering and Computer Science University of California at Merced Merced, CA 95344 http://faculty.ucmerced.edu/mhyang Lecture 17 1 / 26 Overview

More information

In English, this means that if we travel on a straight line between any two points in C, then we never leave C.

In English, this means that if we travel on a straight line between any two points in C, then we never leave C. Convex sets In this section, we will be introduced to some of the mathematical fundamentals of convex sets. In order to motivate some of the definitions, we will look at the closest point problem from

More information

Computation of eigenvalues and singular values Recall that your solutions to these questions will not be collected or evaluated.

Computation of eigenvalues and singular values Recall that your solutions to these questions will not be collected or evaluated. Math 504, Homework 5 Computation of eigenvalues and singular values Recall that your solutions to these questions will not be collected or evaluated 1 Find the eigenvalues and the associated eigenspaces

More information

6.4 Krylov Subspaces and Conjugate Gradients

6.4 Krylov Subspaces and Conjugate Gradients 6.4 Krylov Subspaces and Conjugate Gradients Our original equation is Ax = b. The preconditioned equation is P Ax = P b. When we write P, we never intend that an inverse will be explicitly computed. P

More information

MA201: Further Mathematical Methods (Linear Algebra) 2002

MA201: Further Mathematical Methods (Linear Algebra) 2002 MA201: Further Mathematical Methods (Linear Algebra) 2002 General Information Teaching This course involves two types of teaching session that you should be attending: Lectures This is a half unit course

More information

Characterization of half-radial matrices

Characterization of half-radial matrices Characterization of half-radial matrices Iveta Hnětynková, Petr Tichý Faculty of Mathematics and Physics, Charles University, Sokolovská 83, Prague 8, Czech Republic Abstract Numerical radius r(a) is the

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

Iterative methods for symmetric eigenvalue problems

Iterative methods for symmetric eigenvalue problems s Iterative s for symmetric eigenvalue problems, PhD McMaster University School of Computational Engineering and Science February 11, 2008 s 1 The power and its variants Inverse power Rayleigh quotient

More information

Applied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic

Applied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic Applied Mathematics 205 Unit V: Eigenvalue Problems Lecturer: Dr. David Knezevic Unit V: Eigenvalue Problems Chapter V.4: Krylov Subspace Methods 2 / 51 Krylov Subspace Methods In this chapter we give

More information

The following definition is fundamental.

The following definition is fundamental. 1. Some Basics from Linear Algebra With these notes, I will try and clarify certain topics that I only quickly mention in class. First and foremost, I will assume that you are familiar with many basic

More information

Solution of eigenvalue problems. Subspace iteration, The symmetric Lanczos algorithm. Harmonic Ritz values, Jacobi-Davidson s method

Solution of eigenvalue problems. Subspace iteration, The symmetric Lanczos algorithm. Harmonic Ritz values, Jacobi-Davidson s method Solution of eigenvalue problems Introduction motivation Projection methods for eigenvalue problems Subspace iteration, The symmetric Lanczos algorithm Nonsymmetric Lanczos procedure; Implicit restarts

More information

The Eigenvalue Problem: Perturbation Theory

The Eigenvalue Problem: Perturbation Theory Jim Lambers MAT 610 Summer Session 2009-10 Lecture 13 Notes These notes correspond to Sections 7.2 and 8.1 in the text. The Eigenvalue Problem: Perturbation Theory The Unsymmetric Eigenvalue Problem Just

More information

A HARMONIC RESTARTED ARNOLDI ALGORITHM FOR CALCULATING EIGENVALUES AND DETERMINING MULTIPLICITY

A HARMONIC RESTARTED ARNOLDI ALGORITHM FOR CALCULATING EIGENVALUES AND DETERMINING MULTIPLICITY A HARMONIC RESTARTED ARNOLDI ALGORITHM FOR CALCULATING EIGENVALUES AND DETERMINING MULTIPLICITY RONALD B. MORGAN AND MIN ZENG Abstract. A restarted Arnoldi algorithm is given that computes eigenvalues

More information

Key words. conjugate gradients, normwise backward error, incremental norm estimation.

Key words. conjugate gradients, normwise backward error, incremental norm estimation. Proceedings of ALGORITMY 2016 pp. 323 332 ON ERROR ESTIMATION IN THE CONJUGATE GRADIENT METHOD: NORMWISE BACKWARD ERROR PETR TICHÝ Abstract. Using an idea of Duff and Vömel [BIT, 42 (2002), pp. 300 322

More information

MAJORIZATION FOR CHANGES IN ANGLES BETWEEN SUBSPACES, RITZ VALUES, AND GRAPH LAPLACIAN SPECTRA

MAJORIZATION FOR CHANGES IN ANGLES BETWEEN SUBSPACES, RITZ VALUES, AND GRAPH LAPLACIAN SPECTRA SIAM J. MATRIX ANAL. APPL. Vol.?, No.?, pp.?? c 2006 Society for Industrial and Applied Mathematics MAJORIZATION FOR CHANGES IN ANGLES BETWEEN SUBSPACES, RITZ VALUES, AND GRAPH LAPLACIAN SPECTRA ANDREW

More information

HOMEWORK PROBLEMS FROM STRANG S LINEAR ALGEBRA AND ITS APPLICATIONS (4TH EDITION)

HOMEWORK PROBLEMS FROM STRANG S LINEAR ALGEBRA AND ITS APPLICATIONS (4TH EDITION) HOMEWORK PROBLEMS FROM STRANG S LINEAR ALGEBRA AND ITS APPLICATIONS (4TH EDITION) PROFESSOR STEVEN MILLER: BROWN UNIVERSITY: SPRING 2007 1. CHAPTER 1: MATRICES AND GAUSSIAN ELIMINATION Page 9, # 3: Describe

More information

Index. for generalized eigenvalue problem, butterfly form, 211

Index. for generalized eigenvalue problem, butterfly form, 211 Index ad hoc shifts, 165 aggressive early deflation, 205 207 algebraic multiplicity, 35 algebraic Riccati equation, 100 Arnoldi process, 372 block, 418 Hamiltonian skew symmetric, 420 implicitly restarted,

More information

Eigenvalue Problems CHAPTER 1 : PRELIMINARIES

Eigenvalue Problems CHAPTER 1 : PRELIMINARIES Eigenvalue Problems CHAPTER 1 : PRELIMINARIES Heinrich Voss voss@tu-harburg.de Hamburg University of Technology Institute of Mathematics TUHH Heinrich Voss Preliminaries Eigenvalue problems 2012 1 / 14

More information

QR-decomposition. The QR-decomposition of an n k matrix A, k n, is an n n unitary matrix Q and an n k upper triangular matrix R for which A = QR

QR-decomposition. The QR-decomposition of an n k matrix A, k n, is an n n unitary matrix Q and an n k upper triangular matrix R for which A = QR QR-decomposition The QR-decomposition of an n k matrix A, k n, is an n n unitary matrix Q and an n k upper triangular matrix R for which In Matlab A = QR [Q,R]=qr(A); Note. The QR-decomposition is unique

More information

A PRIMER ON SESQUILINEAR FORMS

A PRIMER ON SESQUILINEAR FORMS A PRIMER ON SESQUILINEAR FORMS BRIAN OSSERMAN This is an alternative presentation of most of the material from 8., 8.2, 8.3, 8.4, 8.5 and 8.8 of Artin s book. Any terminology (such as sesquilinear form

More information

Notes on Eigenvalues, Singular Values and QR

Notes on Eigenvalues, Singular Values and QR Notes on Eigenvalues, Singular Values and QR Michael Overton, Numerical Computing, Spring 2017 March 30, 2017 1 Eigenvalues Everyone who has studied linear algebra knows the definition: given a square

More information

Singular Value Decomposition

Singular Value Decomposition Chapter 5 Singular Value Decomposition We now reach an important Chapter in this course concerned with the Singular Value Decomposition of a matrix A. SVD, as it is commonly referred to, is one of the

More information

Solution of eigenvalue problems. Subspace iteration, The symmetric Lanczos algorithm. Harmonic Ritz values, Jacobi-Davidson s method

Solution of eigenvalue problems. Subspace iteration, The symmetric Lanczos algorithm. Harmonic Ritz values, Jacobi-Davidson s method Solution of eigenvalue problems Introduction motivation Projection methods for eigenvalue problems Subspace iteration, The symmetric Lanczos algorithm Nonsymmetric Lanczos procedure; Implicit restarts

More information

Introduction. Chapter One

Introduction. Chapter One Chapter One Introduction The aim of this book is to describe and explain the beautiful mathematical relationships between matrices, moments, orthogonal polynomials, quadrature rules and the Lanczos and

More information

Linear Algebra Review

Linear Algebra Review Chapter 1 Linear Algebra Review It is assumed that you have had a course in linear algebra, and are familiar with matrix multiplication, eigenvectors, etc. I will review some of these terms here, but quite

More information

Eigenvalues and Eigenvectors

Eigenvalues and Eigenvectors /88 Chia-Ping Chen Department of Computer Science and Engineering National Sun Yat-sen University Linear Algebra Eigenvalue Problem /88 Eigenvalue Equation By definition, the eigenvalue equation for matrix

More information

642:550, Summer 2004, Supplement 6 The Perron-Frobenius Theorem. Summer 2004

642:550, Summer 2004, Supplement 6 The Perron-Frobenius Theorem. Summer 2004 642:550, Summer 2004, Supplement 6 The Perron-Frobenius Theorem. Summer 2004 Introduction Square matrices whose entries are all nonnegative have special properties. This was mentioned briefly in Section

More information

Arnoldi Methods in SLEPc

Arnoldi Methods in SLEPc Scalable Library for Eigenvalue Problem Computations SLEPc Technical Report STR-4 Available at http://slepc.upv.es Arnoldi Methods in SLEPc V. Hernández J. E. Román A. Tomás V. Vidal Last update: October,

More information

Notes on Linear Algebra and Matrix Theory

Notes on Linear Algebra and Matrix Theory Massimo Franceschet featuring Enrico Bozzo Scalar product The scalar product (a.k.a. dot product or inner product) of two real vectors x = (x 1,..., x n ) and y = (y 1,..., y n ) is not a vector but a

More information

Math Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88

Math Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88 Math Camp 2010 Lecture 4: Linear Algebra Xiao Yu Wang MIT Aug 2010 Xiao Yu Wang (MIT) Math Camp 2010 08/10 1 / 88 Linear Algebra Game Plan Vector Spaces Linear Transformations and Matrices Determinant

More information

Throughout these notes we assume V, W are finite dimensional inner product spaces over C.

Throughout these notes we assume V, W are finite dimensional inner product spaces over C. Math 342 - Linear Algebra II Notes Throughout these notes we assume V, W are finite dimensional inner product spaces over C 1 Upper Triangular Representation Proposition: Let T L(V ) There exists an orthonormal

More information

On the Ritz values of normal matrices

On the Ritz values of normal matrices On the Ritz values of normal matrices Zvonimir Bujanović Faculty of Science Department of Mathematics University of Zagreb June 13, 2011 ApplMath11 7th Conference on Applied Mathematics and Scientific

More information

Diagonalizing Matrices

Diagonalizing Matrices Diagonalizing Matrices Massoud Malek A A Let A = A k be an n n non-singular matrix and let B = A = [B, B,, B k,, B n ] Then A n A B = A A 0 0 A k [B, B,, B k,, B n ] = 0 0 = I n 0 A n Notice that A i B

More information

LINEAR ALGEBRA BOOT CAMP WEEK 4: THE SPECTRAL THEOREM

LINEAR ALGEBRA BOOT CAMP WEEK 4: THE SPECTRAL THEOREM LINEAR ALGEBRA BOOT CAMP WEEK 4: THE SPECTRAL THEOREM Unless otherwise stated, all vector spaces in this worksheet are finite dimensional and the scalar field F is R or C. Definition 1. A linear operator

More information

Numerical Methods - Numerical Linear Algebra

Numerical Methods - Numerical Linear Algebra Numerical Methods - Numerical Linear Algebra Y. K. Goh Universiti Tunku Abdul Rahman 2013 Y. K. Goh (UTAR) Numerical Methods - Numerical Linear Algebra I 2013 1 / 62 Outline 1 Motivation 2 Solving Linear

More information

A Note on Eigenvalues of Perturbed Hermitian Matrices

A Note on Eigenvalues of Perturbed Hermitian Matrices A Note on Eigenvalues of Perturbed Hermitian Matrices Chi-Kwong Li Ren-Cang Li July 2004 Let ( H1 E A = E H 2 Abstract and à = ( H1 H 2 be Hermitian matrices with eigenvalues λ 1 λ k and λ 1 λ k, respectively.

More information

Spanning and Independence Properties of Finite Frames

Spanning and Independence Properties of Finite Frames Chapter 1 Spanning and Independence Properties of Finite Frames Peter G. Casazza and Darrin Speegle Abstract The fundamental notion of frame theory is redundancy. It is this property which makes frames

More information

Mathematical foundations - linear algebra

Mathematical foundations - linear algebra Mathematical foundations - linear algebra Andrea Passerini passerini@disi.unitn.it Machine Learning Vector space Definition (over reals) A set X is called a vector space over IR if addition and scalar

More information

Block Bidiagonal Decomposition and Least Squares Problems

Block Bidiagonal Decomposition and Least Squares Problems Block Bidiagonal Decomposition and Least Squares Problems Åke Björck Department of Mathematics Linköping University Perspectives in Numerical Analysis, Helsinki, May 27 29, 2008 Outline Bidiagonal Decomposition

More information

MAJORIZATION FOR CHANGES IN ANGLES BETWEEN SUBSPACES, RITZ VALUES, AND GRAPH LAPLACIAN SPECTRA

MAJORIZATION FOR CHANGES IN ANGLES BETWEEN SUBSPACES, RITZ VALUES, AND GRAPH LAPLACIAN SPECTRA SIAM J. MATRIX ANAL. APPL. Vol. 29, No. 1, pp. 15 32 c 2006 Society for Industrial and Applied Mathematics MAJORIZATION FOR CHANGES IN ANGLES BETWEEN SUBSPACES, RITZ VALUES, AND GRAPH LAPLACIAN SPECTRA

More information

Krylov Subspace Methods that Are Based on the Minimization of the Residual

Krylov Subspace Methods that Are Based on the Minimization of the Residual Chapter 5 Krylov Subspace Methods that Are Based on the Minimization of the Residual Remark 51 Goal he goal of these methods consists in determining x k x 0 +K k r 0,A such that the corresponding Euclidean

More information

4.8 Arnoldi Iteration, Krylov Subspaces and GMRES

4.8 Arnoldi Iteration, Krylov Subspaces and GMRES 48 Arnoldi Iteration, Krylov Subspaces and GMRES We start with the problem of using a similarity transformation to convert an n n matrix A to upper Hessenberg form H, ie, A = QHQ, (30) with an appropriate

More information

22.4. Numerical Determination of Eigenvalues and Eigenvectors. Introduction. Prerequisites. Learning Outcomes

22.4. Numerical Determination of Eigenvalues and Eigenvectors. Introduction. Prerequisites. Learning Outcomes Numerical Determination of Eigenvalues and Eigenvectors 22.4 Introduction In Section 22. it was shown how to obtain eigenvalues and eigenvectors for low order matrices, 2 2 and. This involved firstly solving

More information

MTH Linear Algebra. Study Guide. Dr. Tony Yee Department of Mathematics and Information Technology The Hong Kong Institute of Education

MTH Linear Algebra. Study Guide. Dr. Tony Yee Department of Mathematics and Information Technology The Hong Kong Institute of Education MTH 3 Linear Algebra Study Guide Dr. Tony Yee Department of Mathematics and Information Technology The Hong Kong Institute of Education June 3, ii Contents Table of Contents iii Matrix Algebra. Real Life

More information

Chapter 6. Algebraic eigenvalue problems Introduction Introduction 113. Das also war des Pudels Kern!

Chapter 6. Algebraic eigenvalue problems Introduction Introduction 113. Das also war des Pudels Kern! 6.0. Introduction 113 Chapter 6 Algebraic eigenvalue problems Das also war des Pudels Kern! GOETHE. 6.0. Introduction Determination of eigenvalues and eigenvectors of matrices is one of the most important

More information

Review problems for MA 54, Fall 2004.

Review problems for MA 54, Fall 2004. Review problems for MA 54, Fall 2004. Below are the review problems for the final. They are mostly homework problems, or very similar. If you are comfortable doing these problems, you should be fine on

More information

Using the Karush-Kuhn-Tucker Conditions to Analyze the Convergence Rate of Preconditioned Eigenvalue Solvers

Using the Karush-Kuhn-Tucker Conditions to Analyze the Convergence Rate of Preconditioned Eigenvalue Solvers Using the Karush-Kuhn-Tucker Conditions to Analyze the Convergence Rate of Preconditioned Eigenvalue Solvers Merico Argentati University of Colorado Denver Joint work with Andrew V. Knyazev, Klaus Neymeyr

More information

Direct methods for symmetric eigenvalue problems

Direct methods for symmetric eigenvalue problems Direct methods for symmetric eigenvalue problems, PhD McMaster University School of Computational Engineering and Science February 4, 2008 1 Theoretical background Posing the question Perturbation theory

More information

Alternative correction equations in the Jacobi-Davidson method

Alternative correction equations in the Jacobi-Davidson method Chapter 2 Alternative correction equations in the Jacobi-Davidson method Menno Genseberger and Gerard Sleijpen Abstract The correction equation in the Jacobi-Davidson method is effective in a subspace

More information

Lecture Notes for Inf-Mat 3350/4350, Tom Lyche

Lecture Notes for Inf-Mat 3350/4350, Tom Lyche Lecture Notes for Inf-Mat 3350/4350, 2007 Tom Lyche August 5, 2007 2 Contents Preface vii I A Review of Linear Algebra 1 1 Introduction 3 1.1 Notation............................... 3 2 Vectors 5 2.1 Vector

More information

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.

More information

Large-scale eigenvalue problems

Large-scale eigenvalue problems ELE 538B: Mathematics of High-Dimensional Data Large-scale eigenvalue problems Yuxin Chen Princeton University, Fall 208 Outline Power method Lanczos algorithm Eigenvalue problems 4-2 Eigendecomposition

More information

Lecture notes: Applied linear algebra Part 1. Version 2

Lecture notes: Applied linear algebra Part 1. Version 2 Lecture notes: Applied linear algebra Part 1. Version 2 Michael Karow Berlin University of Technology karow@math.tu-berlin.de October 2, 2008 1 Notation, basic notions and facts 1.1 Subspaces, range and

More information

6 Inner Product Spaces

6 Inner Product Spaces Lectures 16,17,18 6 Inner Product Spaces 6.1 Basic Definition Parallelogram law, the ability to measure angle between two vectors and in particular, the concept of perpendicularity make the euclidean space

More information

Numerical Linear Algebra Homework Assignment - Week 2

Numerical Linear Algebra Homework Assignment - Week 2 Numerical Linear Algebra Homework Assignment - Week 2 Đoàn Trần Nguyên Tùng Student ID: 1411352 8th October 2016 Exercise 2.1: Show that if a matrix A is both triangular and unitary, then it is diagonal.

More information

On the Perturbation of the Q-factor of the QR Factorization

On the Perturbation of the Q-factor of the QR Factorization NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Linear Algebra Appl. ; :1 6 [Version: /9/18 v1.] On the Perturbation of the Q-factor of the QR Factorization X.-W. Chang McGill University, School of Comptuer

More information

Scientific Computing with Case Studies SIAM Press, Lecture Notes for Unit VII Sparse Matrix

Scientific Computing with Case Studies SIAM Press, Lecture Notes for Unit VII Sparse Matrix Scientific Computing with Case Studies SIAM Press, 2009 http://www.cs.umd.edu/users/oleary/sccswebpage Lecture Notes for Unit VII Sparse Matrix Computations Part 1: Direct Methods Dianne P. O Leary c 2008

More information

NEW ESTIMATES FOR RITZ VECTORS

NEW ESTIMATES FOR RITZ VECTORS MATHEMATICS OF COMPUTATION Volume 66, Number 219, July 1997, Pages 985 995 S 0025-5718(97)00855-7 NEW ESTIMATES FOR RITZ VECTORS ANDREW V. KNYAZEV Abstract. The following estimate for the Rayleigh Ritz

More information

Principal Angles Between Subspaces and Their Tangents

Principal Angles Between Subspaces and Their Tangents MITSUBISI ELECTRIC RESEARC LABORATORIES http://wwwmerlcom Principal Angles Between Subspaces and Their Tangents Knyazev, AV; Zhu, P TR2012-058 September 2012 Abstract Principal angles between subspaces

More information

MATRICES ARE SIMILAR TO TRIANGULAR MATRICES

MATRICES ARE SIMILAR TO TRIANGULAR MATRICES MATRICES ARE SIMILAR TO TRIANGULAR MATRICES 1 Complex matrices Recall that the complex numbers are given by a + ib where a and b are real and i is the imaginary unity, ie, i 2 = 1 In what we describe below,

More information

Rigid Geometric Transformations

Rigid Geometric Transformations Rigid Geometric Transformations Carlo Tomasi This note is a quick refresher of the geometry of rigid transformations in three-dimensional space, expressed in Cartesian coordinates. 1 Cartesian Coordinates

More information

Block Lanczos Tridiagonalization of Complex Symmetric Matrices

Block Lanczos Tridiagonalization of Complex Symmetric Matrices Block Lanczos Tridiagonalization of Complex Symmetric Matrices Sanzheng Qiao, Guohong Liu, Wei Xu Department of Computing and Software, McMaster University, Hamilton, Ontario L8S 4L7 ABSTRACT The classic

More information

4.6 Bases and Dimension

4.6 Bases and Dimension 46 Bases and Dimension 281 40 (a) Show that {1,x,x 2,x 3 } is linearly independent on every interval (b) If f k (x) = x k for k = 0, 1,,n, show that {f 0,f 1,,f n } is linearly independent on every interval

More information