
SIAM J. MATRIX ANAL. APPL.    (c) 1998 Society for Industrial and Applied Mathematics
Vol. 19, No. 4, pp. 956-982, October 1998    009

RELATIVE PERTURBATION THEORY: I. EIGENVALUE AND SINGULAR VALUE VARIATIONS

REN-CANG LI†

Abstract. The classical perturbation theory for Hermitian matrix eigenvalue and singular value problems provides bounds on the absolute differences between approximate eigenvalues (singular values) and the true eigenvalues (singular values) of a matrix. These bounds may be bad news for small eigenvalues (singular values), which thereby suffer worse relative uncertainty than large ones. However, there are situations where even small eigenvalues are determined to high relative accuracy by the data, much more accurately than the classical perturbation theory would indicate. In this paper, we study how eigenvalues of a Hermitian matrix A change when it is perturbed to Ã = D^*AD, where D is close to a unitary matrix, and how singular values of a (nonsquare) matrix B change when it is perturbed to B̃ = D_1^*BD_2, where D_1 and D_2 are nearly unitary. It is proved that under these kinds of perturbations small eigenvalues (singular values) suffer relative changes no worse than large eigenvalues (singular values). Many well-known perturbation theorems, including the Hoffman-Wielandt and Weyl-Lidskii theorems, are extended.

Key words. multiplicative perturbation, relative perturbation theory, relative distance, eigenvalue, singular value, graded matrix

AMS subject classifications. 15A18, 15A42, 65F15, 65F35, 65G99

PII. S08954798969849X

1. Introduction. The classical perturbation theory for Hermitian matrix eigenvalue problems provides bounds on the absolute differences |λ̃ - λ| between approximate eigenvalues λ̃ and the true eigenvalues λ of a Hermitian matrix A. When λ̃ is computed using standard numerical software, the bounds on |λ̃ - λ| are typically only moderately bigger than ɛ‖A‖_2 [15, 33, 40], where ɛ is the rounding error threshold characteristic of the computer's arithmetic. These bounds are bad news for small eigenvalues, which thereby suffer worse relative uncertainty than large ones. Generally, the classical error bounds are best possible if perturbations are arbitrary. However, there are situations where perturbations have special structures and, under these special perturbations, even small eigenvalues (singular values) are determined to high relative accuracy by the data, much more accurately than the classical perturbation theory would indicate. A relative perturbation theory is then called for to exploit the situations for better bounds on the relative differences between λ̃ and λ.

* Received by the editors February 1996; accepted for publication (in revised form) by R. Bhatia June 3, 1997; published electronically July 7, 1998. A preliminary version of this paper appeared as Technical Report UCB//CSD-94-855, Computer Science Division, Department of EECS, University of California at Berkeley, 1994, and also appeared as LAPACK working note 85 (revised January 1996), available online at http://www.netlib.org/lapack/lawns/lawn84.ps. This research was supported in part by Argonne National Laboratory under grant 05540, by the University of Tennessee through the Advanced Research Projects Agency under contract DAAL03-91-C-0047, by the National Science Foundation under grant ASC-9005933, by National Science Infrastructure grants CDA-87788 and CDA-9401156, and by a Householder Fellowship in Scientific Computing at Oak Ridge National Laboratory, supported by the Applied Mathematical Sciences Research Program, Office of Energy Research, United States Department of Energy contract DE-AC05-96OR464 with Lockheed Martin Energy Research Corp.
http://www.siam.org/journals/simax/19-4/9849.html
† Mathematical Science Section, Oak Ridge National Laboratory, P.O. Box 2008, Bldg. 6012, Oak Ridge, TN 37831-6367. Present address: Department of Mathematics, University of Kentucky, Lexington, KY 40506 (rcli@cs.uky.edu).

956
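Before the formal development, the point of the multiplicative model can be seen in a few lines of NumPy. The sketch below is purely illustrative and not part of the paper's argument: the grading, the perturbation sizes, and all variable names are ours, and a dense eigensolver is used only as a reference.

import numpy as np

rng = np.random.default_rng(0)
n = 4

# A graded positive definite matrix A = S H S with a well-conditioned H.
S = np.diag(10.0 ** -np.arange(n))                     # grading: 1, 1e-1, 1e-2, 1e-3
H = rng.standard_normal((n, n)); H = H @ H.T + n * np.eye(n)
A = S @ H @ S                                          # eigenvalues spread over ~6 orders of magnitude

# Multiplicative perturbation A_mult = D^T A D with D close to the identity (hence to a unitary matrix).
D = np.eye(n) + 1e-7 * rng.standard_normal((n, n))
A_mult = D.T @ A @ D

# Additive perturbation of comparable norm.
E = rng.standard_normal((n, n)); E = (E + E.T) / 2
E *= np.linalg.norm(A_mult - A, 2) / np.linalg.norm(E, 2)
A_add = A + E

lam = np.linalg.eigvalsh(A)                            # ascending order
rel_err = lambda M: np.max(np.abs(np.linalg.eigvalsh(M) - lam) / np.abs(lam))
print("multiplicative:", rel_err(A_mult))              # stays near 1e-7 for every eigenvalue
print("additive      :", rel_err(A_add))               # can be of order 1 for the smallest eigenvalues

Both perturbations have about the same norm, yet only the multiplicative one preserves the relative accuracy of the tiny eigenvalues; this is the phenomenon the theory below quantifies.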

RELATIVE PERTURBATION THEORY I    957

The development of such a theory goes back to Kahan [20] and is becoming a very active area of research [1, 6, 7, 8, 9, 11, 1, 14, 16, 10, 8, 34]. In this paper, we develop a theory by a unifying treatment that sharpens some existing bounds and covers many previously studied cases. We shall deal with perturbations that have multiplicative structures; namely, perturbations to unperturbed matrices are realized by multiplying the unperturbed ones by matrices that are nearly unitary. To be exact, our theorems only require those multiplying matrices to be nonsingular, but our bounds are interesting only when they are close to some unitary matrices. For Hermitian eigenvalue problems, we shall assume that A is perturbed to Ã = D^*AD, where D is nonsingular; and for singular value problems we shall consider that B is perturbed to B̃ = D_1^*BD_2, where D_1 and D_2 are nonsingular. It is proved that these kinds of perturbations introduce no bigger uncertainty to small eigenvalues (in magnitude) and small singular values than they would to large ones. Although special, these perturbations cover componentwise relative perturbations of entries of symmetric tridiagonal matrices with zero diagonal [8, 20] and componentwise relative perturbations of entries of bidiagonal and biacyclic matrices [1, 7, 8]. More realistically, perturbations of graded nonnegative Hermitian matrices [9, 8] and perturbations of graded matrices of singular value problems [9, 8] can be transformed to take the form of multiplicative perturbations, as will be seen from later proofs.

Additive perturbations are the most general in the sense that if A is perturbed to Ã, the only possible known information is on some norm of ΔA := Ã - A. Such perturbations, no matter how small, may not guarantee relative accuracy in eigenvalues (singular values) of the matrix under consideration. For example, when A is singular, Ã can be made nonsingular no matter how small a norm of ΔA is; thus some zero eigenvalues are perturbed to nonzero ones and therefore lose their relative accuracy completely. (Retaining any relative accuracy of a zero at all ends up not changing it.)

The rest of this paper is organized as follows. Section 2 defines two kinds of relative distances, ϱ_p (1 ≤ p ≤ ∞) and χ, and Appendices A and B present proofs of some crucial properties of ϱ_p and χ needed in this paper. We devote two sections to presenting and discussing our main theorems: section 3 for relative perturbation theorems for Hermitian matrix eigenvalue problems and section 4 for relative perturbation theorems for singular value problems. Long proofs of our main theorems are postponed to sections 5 and 6. Section 7 briefly discusses how our relative perturbation theorems can be applied to generalized eigenvalue problems and generalized singular value problems.

Notation. We shall adopt the following convention: capital letters denote unperturbed matrices and capital letters with tildes denote their perturbed matrices. For example, X is perturbed to X̃. Throughout the paper, capital letters are for matrices, lowercase Latin letters for column vectors or scalars, and lowercase Greek letters for scalars. Also,
C^{m×n}: the set of m×n complex matrices, and C^m = C^{m×1};
R^{m×n}: the set of m×n real matrices, and R^m = R^{m×1};
U_n: the set of n×n unitary matrices;
0_{m,n}: the m×n zero matrix (we may simply write 0 instead);
I_n: the n×n identity matrix (we may simply write I instead);

958    REN-CANG LI

X^*: the conjugate transpose of a matrix X;
λ(X): the set of the eigenvalues of X, counted according to their algebraic multiplicities;
σ(X): the set of the singular values of X, counted according to their algebraic multiplicities;
σ_min(X): the smallest singular value of X ∈ C^{m×n};
σ_max(X): the largest singular value of X ∈ C^{m×n};
‖X‖_2: the spectral norm of X, i.e., σ_max(X);
‖X‖_F: the Frobenius norm of X, i.e., (Σ_{i,j} |x_{ij}|^2)^{1/2}, where X = (x_{ij}).

2. Relative distances. Classically, the relative error in α̃ = α(1 + δ) as an approximation to α is measured by

(2.1)    |δ| = relative error in α̃ = |α̃ - α| / |α|.

When |δ| ≤ ɛ, we say that the relative perturbation to α is at most ɛ (see, e.g., [8]). Such a measurement lacks mathematical properties upon which a nice relative perturbation theory can be built; for example, it lacks symmetry between α and α̃ and thus it cannot be a metric. Nonetheless, it is good enough and is convenient to use for measuring correct digits in numerical approximations.

Our new relative distances have better mathematical properties, such as symmetry in the arguments. Topologically they are all equivalent to the classical δ-measurement defined by (2.1). The p-relative distance between α, α̃ ∈ C is defined as

(2.2)    ϱ_p(α, α̃) := |α - α̃| / (|α|^p + |α̃|^p)^{1/p}    for 1 ≤ p ≤ ∞.

We define, for convenience, 0/0 := 0. ϱ_2 has been used by Deift et al. [6] to define relative gaps. Another relative distance that is of interest to us is

(2.3)    χ(α, α̃) := |α - α̃| / (|α α̃|)^{1/2}.

This χ-distance has been used by Barlow and Demmel [1] and Demmel and Veselić [9] to define relative gaps between the spectra of two matrices.

Appendix B will show that ϱ_p (1 ≤ p ≤ ∞) is indeed a metric on R; see also Li [24]. We suspect that ϱ_p is a metric on C also, but we cannot give a proof at this point. Unfortunately, χ violates the triangle inequality and thus cannot be a metric. In fact, one can prove that χ(α, γ) > χ(α, β) + χ(β, γ) for α < β < γ; see Lemma 6.1. We refer the reader to Li [24] for a detailed study of the two relative distances. Here, only properties that are most relevant to our relative perturbation theory will be presented, and those proofs that require little work and seem to be straightforward are omitted. Complicated proofs will be given in Appendix A.

Proposition 2.1 (see [24]). Let α, α̃ ∈ R.
1. For 0 ≤ ɛ < 1, |α̃/α - 1| ≤ ɛ implies

(2.4)    ϱ_p(α, α̃) ≤ ɛ / (1 + (1 - ɛ)^p)^{1/p},

(2.5)    χ(α, α̃) ≤ ɛ / (1 - ɛ)^{1/2}.

RELATIVE PERTURBATION THEORY I 959 For 0 ɛ<1, { α ϱ p α, α ɛ max α 1 6, α For 0 ɛ<, 7 { α χα, α ɛ max α 1 3 Asymptotically,, α } α 1 } α 1 1/p ɛ 1 ɛ ɛ + 1+ ɛ ɛ 4 lim α α ϱ p α, α α = 1/p α 1 and χα, α lim α α α =1 α 1 Thus 4, 6, 5, and 7 are at least asymptotically sharp The following proposition establishes a relation between ϱ p and χ Proposition see [4] For α, α C, ϱ p α, α 1/p χα, α, and the equality holds if and only if α = α Next we ask what are the best one-one pairings between two sets of n real numbers? Such a question will become important later in this paper when we try to pair the eigenvalues or the singular values of one matrix to those of another Proposition 3 see [4] Let {α 1,α,,α n } and { α 1, α,, α n } be two sets of n real numbers ordered in descending order, ie, 8 α 1 α α n, α 1 α α n We have for p =1, max ϱ 1α i, α i = min max ϱ 1 i n τ 1α i, α τi 1 i n For p>1, if in addition all α i s and α j s are nonnegative, 9 max ϱ pα i, α i = min max ϱ 1 i n τ pα i, α τi 1 i n Both minimizations are taken over all permutations τ of {1,,, n} Proofs of this proposition and Proposition 4 below are given in Appendix A Remark 1 Equation 9 of Proposition 3 may fail if not all the α i s and α j s are of the same sign A counterexample is as follows: n = and Then for p>1, α 1 =1>α = and α 1 =4> α = max {ϱ p α 1, α 1,ϱ p α, α } = ϱ p α, α = 1 1/p 6 > p p +4 = ϱ pα p, α 1 = max {ϱ p α 1, α,ϱ p α, α 1 }
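The two relative distances of (2.2) and (2.3) are easy to experiment with. The following sketch is ours, not the paper's (the function names rho_p and chi are hypothetical); it spot-checks Proposition 2.2, the metric property of ϱ_p established in Appendix B, and the failure of the triangle inequality for χ.

import numpy as np

def rho_p(a, b, p=2):
    """p-relative distance (2.2); by the paper's convention 0/0 = 0."""
    num = abs(a - b)
    return 0.0 if num == 0 else num / (abs(a) ** p + abs(b) ** p) ** (1.0 / p)

def chi(a, b):
    """chi-distance (2.3)."""
    num = abs(a - b)
    if num == 0:
        return 0.0
    den = np.sqrt(abs(a * b))
    return np.inf if den == 0 else num / den

rng = np.random.default_rng(1)
x, y, z = (rng.standard_normal(200) for _ in range(3))

# Proposition 2.2: rho_p(a, b) <= 2^(-1/p) * chi(a, b); checked here for p = 2.
assert all(rho_p(a, b) <= 2 ** -0.5 * chi(a, b) + 1e-15 for a, b in zip(x, y))

# Theorem B.1: rho_p is a metric on R, so the triangle inequality holds.
assert all(rho_p(a, c) <= rho_p(a, b) + rho_p(b, c) + 1e-15 for a, b, c in zip(x, y, z))

# chi is not a metric: chi(1, 9) = 8/3 exceeds chi(1, 4) + chi(4, 9) = 3/2 + 5/6.
print(chi(1.0, 9.0), ">", chi(1.0, 4.0) + chi(4.0, 9.0))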

960 REN-CANG LI Remark Given two sets of α i s and α j s ordered as in 8, generally, 10 n [ϱ p α i, α i ] min τ n [ ϱp α i, α τi ], even if all α i, α j > 0 Here is a counterexample: n =, α 1 >α 1 = α 1 / > α >α > 0, where α is sufficiently close to 0, and α is sufficiently close to α 1 which is fixed Since, as α 0 + and α α 1, [ϱ p α 1, α ] +[ϱ p α, α 1 ] 1, [ϱ p α 1, α 1 ] +[ϱ p α, α ] 1 p p +1 +1, 10 must fail for some α 1 >α 1 = α 1 / > α >α > 0 Proposition 4 see [4] Let {α 1,,α n } and { α 1,, α n } be two sets of n positive numbers ordered as in 8 Then 11 1 max χα i, α i = min max χα 1 i n τ i, α τi, 1 i n n n [χα i, α i ] [ = min χαi, α τi ], τ where the minimization is taken over all permutations τ of {1,,,n} Remark 3 Both 11 and 1 of Proposition 4 may fail if the α i s and α j s are not all of the same sign A counterexample for 11 is that n = and α 1 =1>α = 1 and α 1 => α = 1 4, for which { max {χα 1, α 1,χα, α } = max 1/ }, 5/ =5/ > 3/ { = max 3/, 3/ } = max {χα 1, α,χα, α 1 } A counterexample for 1 is that n = and α 1 =1>α = and α 1 => α =1, for which [χα 1, α 1 ] +[χα, α ] = 1/ + 3/ =5 > 4=0 + 4/ 4 =[χα1, α ] +[χα, α 1 ]
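Propositions 2.3 and 2.4 say that for two families already sorted in descending order the identity pairing is optimal. A brute-force check over all permutations (our construction; only the 4-element case is tried) confirms the χ statement (2.11); replacing chi by ϱ_p gives the analogous check of (2.9) on nonnegative data.

import itertools
import numpy as np

def chi(a, b):
    d = abs(a - b)
    return 0.0 if d == 0 else d / np.sqrt(abs(a * b))

rng = np.random.default_rng(2)
for _ in range(200):
    alpha   = np.sort(rng.uniform(0.1, 10.0, 4))[::-1]    # descending, all positive
    alpha_t = np.sort(rng.uniform(0.1, 10.0, 4))[::-1]

    identity_cost = max(chi(a, b) for a, b in zip(alpha, alpha_t))
    best_cost = min(
        max(chi(a, alpha_t[j]) for a, j in zip(alpha, perm))
        for perm in itertools.permutations(range(4))
    )
    assert identity_cost <= best_cost + 1e-12              # the sorted pairing is already optimal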

RELATIVE PERTURBATION THEORY I 961 3 Relative perturbation theorems for Hermitian matrix eigenvalue problems Throughout the section, A, à C n n are Hermitian and one is a perturbation of the other Denote their eigenvalues by 31 λa ={λ 1,,λ n } and λã ={ λ 1,, λ n } ordered so that 3 λ 1 λ λ n, λ1 λ λ n Theorem 31 Let A and à = D AD be two n n Hermitian matrices with eigenvalues 31 ordered as in 3, where D is nonsingular Then 1 there is a permutation τ of {1,,,n} such that n [ ϱ λ i, λ ] 33 τi I Σ d F + I Σ 1 d F, where Σ d is diagonal and its diagonal entries are D s singular values if, in addition, A is nonnegative definite, 1 then 34 35 n max χλ i, λ i D D 1, 1 i n [ χλ i, λ i ] D D 1 F A proof of Theorem 31 will be given in section 5 A corollary of 33 is 33a n [ ϱ λ i, λ ] τi I D F + I D 1 F by a well-known absolute perturbation theorem for singular values; see 47 On the other hand, 33a leads to 33 as well by considering Ud AU d and Vd ÃV d = Σ d Ud AU dσ d instead, where 36 D = U d Σ d Vd is D s singular value decomposition SVD [15, p 71] It is also possible to relate the right-hand sides of 34 and 35 to the singular values of D, since for every unitarily invariant norm, D D 1 = V d Σ d Σ 1 d U d = Σ d Σ 1 d 1 Then à must be nonnegative definite as well In this we follow Mirsky [30], Stewart and Sun [35], and Bhatia [3] That a norm is unitarily invariant on C m n means that it also satisfies, besides the usual properties of any norm, 1 UY V = Y, for any U U m, and V U n; Y = Y, for any Y C m n with ranky =1 Two unitarily invariant norms most frequently used are the spectral norm and the Frobenius norm F Let be a unitarily invariant norm on some matrix space The following inequalities [35, p 80] will be employed later in this paper: WY W Y and YZ Y Z
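A quick numerical check of the second half of Theorem 3.1, reading the right-hand sides of (3.4) and (3.5) as ‖D^* - D^{-1}‖_2 and ‖D^* - D^{-1}‖_F (which is how the surrounding discussion relates them to Σ_d). The test matrices below are random choices of ours, in NumPy; for complex data D.T would be replaced by D.conj().T.

import numpy as np

rng = np.random.default_rng(3)
n = 5

G = rng.standard_normal((n, n))
A = G @ G.T                                              # nonnegative definite (real symmetric)
D = np.eye(n) + 0.05 * rng.standard_normal((n, n))       # nonsingular, close to a unitary matrix
A_t = D.T @ A @ D                                        # multiplicative perturbation

lam, lam_t = np.linalg.eigvalsh(A), np.linalg.eigvalsh(A_t)   # both ascending: same pairing as (3.2)
chi_vals = np.abs(lam - lam_t) / np.sqrt(np.abs(lam * lam_t))

dep_2 = np.linalg.norm(D.T - np.linalg.inv(D), 2)        # D's departure from a unitary matrix
dep_F = np.linalg.norm(D.T - np.linalg.inv(D), 'fro')
print(np.max(chi_vals), "<=", dep_2)                     # inequality (3.4)
print(np.sqrt(chi_vals @ chi_vals), "<=", dep_F)         # inequality (3.5)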

96 REN-CANG LI The earliest relative perturbation result for eigenvalue problems goes back to a theorem due to Ostrowski [3] see also [18, pp 4 5], though he did not interpret his theorem in the way we do now Ostrowski proved that for two n n Hermitian matrices A and à = D AD with eigenvalues 31 ordered as in 3, where D is nonsingular, we have 37 σ min D λ i λ i σ max D λ i for i n Inequalities 37 immediately imply a relative perturbation bound λ i λ i max I D D 1 i n λ i This result of Ostrowski s is independent of 34 Both may be attainable for the scalar case n = 1 or for the case when A and D are diagonal Our bounds 33 and 35 are the first of their kind Roughly speaking, the classical perturbation theory for Hermitian matrix eigenvalue problems establishes one uniform bound for all differences λ i λ i regardless of magnitudes of λ i s In this regard, we have the following Let both A and à be Hermitian No special form of à is assumed Then for any unitarily invariant norm, 38 diagλ 1 λ 1,,λ n λ n A à There is a long history associated with this inequality; see Bhatia [3] for details Theorem 31 extends 38 to the relative perturbation theory for = and F Two main differences between Theorem 31 and 38 are as follows 1 Inequality 38 bounds the absolute differences λ i λ i It is in fact the best possible as far as arbitrary perturbations are concerned However, it may overestimate the differences λ j λ j too much for eigenvalues λ j of much smaller magnitudes than A when perturbations have special structures such as multiplicative perturbations, for which it is possible that A à is larger than λ j λ j by many orders of magnitudes while, on the other hand, D D I Theorem 31 exploits fully multiplicative perturbation structures by bounding directly the relative differences χλ i, λ i orϱ λ i, λ i in terms of D s departures from unitary matrices D D 1 and I Σ d F + I Σ 1 d F Thus, all eigenvalues of the same as or much smaller magnitudes than A alike provably suffer small uncertainty as long as D s departures from unitary matrices are small Such arguments more or less apply to our other relative perturbation theorems in this paper in comparison to their counterparts in the classical absolute perturbation theory In Theorem 31, the perturbation to A is rather restrictive but is applicable to a more realistic situation when scaled A is much better conditioned In Theorem 3, S is a scaling matrix, often highly graded and diagonal in practice, though the theorem does not assume this Theorem 3 Let A = S HS and à = S HS be two n n nonnegative definite Hermitian matrices with eigenvalues 31 ordered as in 3, and let H = H H

If H 1 H < 1, then RELATIVE PERTURBATION THEORY I 963 39 310 311 31 max χλ i, λ i D D 1, 1 i n H 1 H 1 H 1 H, n [ χλ i, λ i ] D D 1 F, H 1 H F 1 H 1 H, where D = I + H 1/ HH 1/ 1/ Proof Rewrite A and à as A = S HS =H 1/ S H 1/ S, à = S H 1/ I + H 1/ HH 1/ H 1/ S = I + H 1/ HH 1/ 1/ H 1/ S I + H 1/ HH 1/ 1/ H 1/ S Set B def = H 1/ S and B def = I + H 1/ HH 1/ 1/ H 1/ S, then A = B B and à = B B We have B = DB, where D = I + H 1/ HH 1/ 1/ Notice that λa =λb B=λBB and λã =λ B B =λ B B, and B B = DBB D Applying Theorem 31 to BB and B B yields both 39 and 311 Inequalities 310 and 31 follow from the fact that for any Hermitian matrix E with E < 1 and for any unitarily invariant norm, I + E 1/ I + E 1/ I + E 1/ E E 1 E Inequality 310 can also be derived from the following bound essentially due to Demmel and Veselić [9] see also Mathias [8] Let the conditions of Theorem 3 hold Then 313 λ i λ i max H 1 H 1 i n λ i To see how 313 leads to 310, we notice that 3 χλ i, λ i = λ i λ i λ i λ i λ i λ i D 1 λ i λ i by Ostrowski s theorem 37 and that D 1 1/ 1 H 1 H Remark 31 Li [4] also considered extending Theorem 31 to diagonalizable matrices under multiplicative perturbations But the bounds obtained in a recent paper [6] are better Both Li [4] and Eisenstat and Ipsen [13] extended the classical Bauer Fike theorem [] 3 λ i = 0 if and only if λi = 0, since A and à have the same number of zero eigenvalues, if any So we only need to consider those i such that λ i 0
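Theorem 3.2 is the graded situation: only H is perturbed, and the bounds involve H and ΔH but not the (possibly extreme) scaling S. A sketch with hypothetical sizes of ours: we form D = (I + H^{-1/2} ΔH H^{-1/2})^{1/2} exactly as in the proof and check the χ bounds through it; the grading is kept moderate only so that the dense reference eigensolver itself remains trustworthy.

import numpy as np

def sym_sqrt(M):
    """Square root of a symmetric positive definite matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(w)) @ V.T

rng = np.random.default_rng(4)
n = 4
S = np.diag(10.0 ** np.arange(n))                          # grading matrix: 1, 10, 100, 1000
H = rng.standard_normal((n, n)); H = H @ H.T + n * np.eye(n)       # well conditioned, positive definite
dH = 1e-4 * rng.standard_normal((n, n)); dH = (dH + dH.T) / 2      # small symmetric change in H only

A, A_t = S @ H @ S, S @ (H + dH) @ S
lam, lam_t = np.linalg.eigvalsh(A), np.linalg.eigvalsh(A_t)
chi_vals = np.abs(lam - lam_t) / np.sqrt(lam * lam_t)

Hs_inv = np.linalg.inv(sym_sqrt(H))
D = sym_sqrt(np.eye(n) + Hs_inv @ dH @ Hs_inv)             # D from the proof of Theorem 3.2
print(np.max(chi_vals), "<=", np.linalg.norm(D - np.linalg.inv(D), 2))                   # cf. (3.9)
print(np.sqrt(chi_vals @ chi_vals), "<=", np.linalg.norm(D - np.linalg.inv(D), 'fro'))   # cf. (3.11)

Since this D is symmetric, its departure from a unitary matrix ‖D^* - D^{-1}‖ reduces to ‖D - D^{-1}‖, which is what the code evaluates.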

964 REN-CANG LI 4 Relative perturbation theorems for singular value problems Throughout the section, B, B C m n and one is a perturbation of the other We shall assume, without loss of generality, that m n in this section Denote their singular values by 41 σb ={σ 1,,σ n } and σ B ={ σ 1,, σ n } ordered so that 4 σ 1 σ σ n 0, σ 1 σ σ n 0 Theorem 41 Let B and B = D1BD be two m n matrices with singular values 41 ordered as in 4, where D 1 and D are square and nonsingular If D1 D1 1 D D 1 < 3, then 43 44 max χσ i, σ i 1 1 i n n [χσ i, σ i ] 1 D 1 D 1 1 + D D 1 1 1 3 D 1 D 1 1 D D 1 D 1 D 1 1 F + D D 1 F 1 1 3 D 1 D 1 1 D D 1 A proof of Theorem 41 will be given in section 6 The restriction D1 D1 1 D D 1 < 3, though mild, is unpleasant But we argue that neither this restriction nor the factor 1 1 3 D 1 D1 1 D D 1 plays any visible role for any applications where one might expect that perturbing B to B = D1BD retains any significant digits of B s singular values Our arguments go as follows 1 For the ease of explanation, consider the case when B and D j are diagonal In order for each of B s singular values to have at least one significant decimal digit the same as that of the corresponding B s, it is necessary that 4, 1 45 09 σ min D j σ max D j 105 which imply that Dj D 1 j 0, and thus the factor 1 1 1 3 D 1 D1 1 D D 1 101 In fact, the restriction D1 D1 1 D D 1 < 3 is satisfied and the factor is almost 1 even for D j s singular values being fairly away from 1 It can be seen that Dj D 1 j 1if0618 5 1 5+1 σ min D j σ max D j 1618, under which circumstances the unpleasant factor is 1 1 1 3 D 1 D1 1 D D 1 3/31 103 4 This is for the worse case in the sense that if 45 is violated, then there are D j s such that some of the B s singular values retain no significant decimal digits at all under the perturbations
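Theorem 4.1 can be exercised the same way. In the sketch below (random choices of ours, in NumPy), D_1 and D_2 are so close to the identity that the correction factor in the denominator of (4.3) is essentially 1, so the check is insensitive to how the constant in that factor is read; we use 1/32, which is what the proof in section 6 suggests.

import numpy as np

rng = np.random.default_rng(5)
m, n = 7, 5
B = rng.standard_normal((m, n)) @ np.diag(10.0 ** -np.arange(n))   # graded columns: widely spread singular values
D1 = np.eye(m) + 1e-5 * rng.standard_normal((m, m))
D2 = np.eye(n) + 1e-5 * rng.standard_normal((n, n))
B_t = D1.T @ B @ D2                                                # two-sided multiplicative perturbation

s   = np.linalg.svd(B,   compute_uv=False)                         # descending order
s_t = np.linalg.svd(B_t, compute_uv=False)
chi_vals = np.abs(s - s_t) / np.sqrt(s * s_t)

d1 = np.linalg.norm(D1.T - np.linalg.inv(D1), 2)
d2 = np.linalg.norm(D2.T - np.linalg.inv(D2), 2)
bound = 0.5 * (d1 + d2) / (1.0 - d1 * d2 / 32.0)                    # (4.3), as we read it
print(np.max(chi_vals), "<=", bound)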

3 In applications where D j D 1 j RELATIVE PERTURBATION THEORY I 965 1, the quantity D 1 D 1 1 D D 1 is of second order Then the restriction and the factor act as if they were not there Even more in some applications, as in Corollary 4, one of the D j s is I for which the restriction and the factor disappear completely Eisenstat and Ipsen [1] obtained the following result which is essentially a consequence of Ostrowski s theorem see inequalities 37 and which can also be seen from known inequalities for singular values of a product of two matrices: 5 Let the conditions of Theorem 41, except D 1 D 1 1 D D 1 < 3, hold We have 46 σ min D 1 σ min D σ i σ i σ max D 1 σ max D σ i for 1 i n Inequalities 46 imply immediately the following relative perturbation bound: σ i σ i max max{ 1 σ min D 1 σ min D, 1 σ max D 1 σ max D } 1 i n σ i The classical perturbation theory for singular value problems establishes one uniform bound for all differences σ i σ i, regardless of magnitudes of σ i s The following theorem was established by Mirsky [30], based on results from Lidskii [7] and Wielandt [39] For any unitarily invariant norm, we have 47 diagσ 1 σ 1,,σ n σ n B B No special form of B is assumed A possible application of Theorem 41 is related to deflation in computing SVD of a bidiagonal matrix For more details, the reader is referred to [6, 8, 1, 9] Corollary 4 Assume, in Theorem 41, that one of D 1 and D is the identity matrix and the other takes the form I X D =, I where X is a matrix of suitable dimensions Then 48 49 Proof Notice that D D 1 = max χσ i, σ i 1 1 i n X, n [χσ i, σ i ] 1 X F I X I I X I = X X, 5 Arranging the singular values of a matrix in the decreasing order, we have see, eg, [19] the ith singular value of XY the ith singular value of X Y

966 REN-CANG LI and thus D D 1 = X and D D 1 F = X F 410 Eisenstat and Ipsen [1] showed that σ i σ i X σ i, or equivalently σ i 1 σ i X Our inequality 48 is sharper by roughly a factor of 1/, as long as X is small As a matter of fact, it follows from 48 and Proposition 1 that if X < 4, then σ i 1 σ i X + 1+ X X = X X + O 4 16 4 Our inequality 49 is the first of its kind Theorem 43 Let B and B = D1BD be two m n matrices with singular values 41 ordered as in 4, where D 1 and D are square and nonsingular Then max ϱ pσ i, σ i 1 411 D 1 i n 1+1/p 1 D1 1 + D D 1, n [ϱ p σ i, σ i ] 1 41 D 1+1/p 1 D1 1 F + D D 1 F A straightforward combination of Proposition and Theorem 41 will lead to bounds that are slightly weaker than those in Theorem 43 by a factor of 1 1 1 3 D 1 D1 1 D D 1 A proof of Theorem 43 will be given in section 6 Again we shall now consider a more realistic situation when scaled B is much better conditioned In Theorem 44 below, S is a scaling matrix, often highly graded and diagonal in practice, though the theorem does not assume this Theorem 44 Let B = GS and B = GS be two n n matrices with singular values 41 ordered as in 4, where G and G are nonsingular, and let G = G G If G G 1 < 1, then max χσ i, σ i 1 I + GG 1 I + GG 1 1 413, 1 i n 1 G 1 G 414 1+ 1 G 1, G n [χσ i, σ i ] 1 I + GG 1 I + GG 1 1 415 F, 416 Proof Write 1+ 1 G 1 G F 1 G 1 G 417 B =G + GS =I + GG 1 GS = DB,

RELATIVE PERTURBATION THEORY I 967 where D = I + GG 1 Now, applying Theorem 41 to B and B = DB yields both 413 and 415 We notice that I + E I + E 1 = I + E 1 i E i = E + E + E 1 i E i 1, i=0 where E = GG 1 and E G 1 G < 1; therefore, for any unitarily invariant norm, i= 418 419 I + E I + E 1 E + E + E E i E + E = + E E E 1 E 1 1+ E 1 E An application of 419 for and F completes the proof Equation 417 also makes 46 applicable and leads to the following 40 Let the conditions of Theorem 44 hold We have σ i σ i max G 1 G 1 i n σ i This inequality also follows from [10, Theorem 11] Inequality 414 can actually be derived from 40 as follows Notice that χσ i, σ i = σ i σ i σi σ i σ i D 1 1/, σ i σ i σ i and that D 1 1/ 1 1 1+ 1 G 1 G 1 1 G 1 G Remark 41 When GG 1 is nearly skew Hermitian, 413 and 415 lead to bounds that are much better than 414 and 416 This can be seen from 418: Under the conditions of Theorem 44, we have GG 1 max χσ + G G i, σ i 1 i n GG 1 + GG 1 GG 1 1 GG 1, n [χσ i, σ i ] GG 1 + G G F GG 1 + GG 1 GG 1 F F 1 GG 1 Nowif GG 1 is nearly skew Hermitian, then χσ i, σ i =o GG 1 ; moreover, GG 1 + G G = O GG 1 χσi, σ i =O GG 1
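Theorem 4.4 is the singular value analogue of Theorem 3.2: B = GS with a possibly extreme scaling S, and only the G factor is perturbed. The bound is driven by D = I + ΔG G^{-1} from (4.17) and does not involve S. A short check with sizes of ours, in NumPy:

import numpy as np

rng = np.random.default_rng(6)
n = 5
S = np.diag(10.0 ** np.arange(n))                 # hypothetical, highly graded scaling
G = rng.standard_normal((n, n))                   # nonsingular with probability 1
dG = 1e-6 * rng.standard_normal((n, n))           # the relative change lives entirely in the G factor
B, B_t = G @ S, (G + dG) @ S

s   = np.linalg.svd(B,   compute_uv=False)
s_t = np.linalg.svd(B_t, compute_uv=False)
chi_vals = np.abs(s - s_t) / np.sqrt(s * s_t)

D = np.eye(n) + dG @ np.linalg.inv(G)             # B_t = D B, cf. (4.17)
print(np.max(chi_vals), "<=", 0.5 * np.linalg.norm(D.T - np.linalg.inv(D), 2))   # cf. (4.13)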

968 REN-CANG LI Remark 4 Theorem 44 can be extended to nonsquare matrices Assume B = GS and B = GS are m n m n; S is a scaling matrix and both G and G are m n; G has full column rank Let G =G G 1 G be the pseudo-inverse of G Notice that G G = I We have B = GS =G + GS =I + GG GS =I + GG B DB Now, apply Theorem 41 to B and B = DB 5 Proof of Theorem 31 We need a little preparation first A matrix Z = z ij R n n is doubly stochastic if all z ij 0 and n n z ik = z kj = 1 k=1 k=1 for i, j =1,,,n Using a Birkhoff theorem [4] see also [18, pp 57 58] and the technique of Hoffman and Wielandt [17] see also [35, p 190], we can prove the following Lemma 51 Let Z =z ij be an n n doubly stochastic matrix, and let M = m ij C n n Then there exists a permutation τ of {1,,,n} such that n i, j=1 m ij z ij n m iτi For X C m n, we introduce the following notation for a k l submatrix of X =x ij : 51 x i1j 1 x i1j x i1j l i1 i X k def x ij 1 x ij x ij l = j 1 j l, x ik j 1 x ik j x ik j l where 1 i 1 < <i k n and 1 j 1 < <j l n The following lemma is due to Li [, pp 07 08] Lemma 5 see Li [] Suppose that X C n n is nonsingular, and 1 i 1 < <i k n and 1 j 1 < <j l n, and k + l>n Then X i1 i k X 1 1 j 1 j l Moreover, if X is unitary, then X i1 i k =1 j 1 j l Proof of Theorem 31 We shall prove 33 first Due to the argument we made right after Theorem 31, it suffices for us to prove 33a Let the eigen decompositions of A and à be A = UΛU and à = Ũ ΛŨ,

RELATIVE PERTURBATION THEORY I 969 where U and Ũ are unitary and Λ = diagλ 1,λ,,λ n and Λ = diag λ 1, λ,, λ n Notice that A à = A D AD = A AD + AD D AD = AI D+D Ià Pre- and postmultiply the equations by U and Ũ, respectively, to get 5 ΛU Ũ U Ũ Λ =ΛU I DŨ + U D IŨ Λ Set Q def = U Ũ =q ij, E def = U I DŨ =e ij, Ẽ def = U D IŨ =ẽ ij Then 5 reads ΛQ Q Λ =ΛE +Ẽ Λ, or componentwise λ i q ij q ij λj = λ i e ij +ẽ ij λj, so λ i λ j q ij = λ i e ij + ẽ ij λj λ i + λ j e ij + ẽ ij, which yields 6 [ϱ λ i, λ j ] q ij e ij + ẽ ij Hence n i, j=1 [ ϱ λ i, λ j ] qij U I DŨ F + U D IŨ F = I D F + D I F The matrix q ij n n is a doubly stochastic matrix The above inequality and Lemma 51 imply that n [ ϱ λ i, λ τi ] I D F + D I F for some permutation τ of {1,,,n} This is 33a We now prove 34 and 35 Suppose that A is nonnegative definite There is a matrix B C n n such that A = B B With this B, à = D AD = D B BD = B B, where B = BD Let SVDs of B and B be B = UΛ 1/ V and B = Ũ Λ 1/ Ṽ, where Λ 1/ = diag λ 1, λ,, λ n and Λ 1/ = diag λ 1, λ,, λ n In what follows, we actually work with BB and B B, rather than A = B B and à = B B themselves We have B B BB = BD B BD 1 B = BD D 1 B Pre- and postmultiply the above equations by Ũ and U, respectively, to get 53 ΛŨ U Ũ UΛ= Λ 1/ Ṽ D D 1 V Λ 1/ Write Q def = Ũ U =q ij Equation 53 implies D D 1 F = Ṽ D D 1 V F = n i, j=1 λ i λ j λi λ j q ij 6 This inequality still holds even if λ i = λj = 0 because of our convention 0/0 = 0; see section

970 REN-CANG LI Since q ij n n is a doubly stochastic matrix, an application of Lemma 51 and Proposition 4 concludes the proof of 35 To confirm 34, let k be the index such that η def = max 1 i n χλ i, λ i =χλ k, λ k If η = 0, no proof is necessary Assume η>0 Also assume, without loss of generality, that Partition U, V, Ũ, Ṽ as follows: λ k > λ k 0 U = k n k U 1 U,V = k n k V 1 V, Ũ = k 1 n k+1 Ũ 1 Ũ, Ṽ = k 1 n k+1 Ṽ 1 Ṽ, and write Λ = diagλ 1, Λ and Λ = diag Λ 1, Λ, where Λ 1 R k k and Λ 1 R k 1 k 1 It follows from 53 that Λ Ũ U 1 Ũ U 1 Λ 1 = Postmultiply this equation by Λ 1 1 to get 1/ Λ Ṽ D D 1 V 1 Λ 1/ 1 54 Λ Ũ U 1 Λ 1 1 Ũ U 1 = 1/ Λ Ṽ D D 1 V 1 Λ 1/ 1 Lemma 5 implies that Ũ U 1 = 1 since Ũ U 1 is an n k +1 k submatrix of unitary Ũ U and k +n k +1=n +1>n Bearing in mind that Λ = λ k = Λ 1/ and Λ 1 1 =1/λ k = Λ 1/ 1,wehave 1 λ k = Ũ U 1 Λ Ũ U 1 Λ 1 1 λ k Ũ U 1 Λ Ũ U 1 Λ 1 1 Ũ U 1 Λ Ũ U 1 Λ 1 1 = Λ 1/ Ṽ D D 1 V 1 Λ 1/ by 54 Λ 1/ Ṽ D D 1 V 1 Λ 1/ λ k = Ṽ D D 1 V 1 λ k λ k D D 1, λ k an immediate consequence of which is 34 1 1 6 Proofs of Theorems 41 and 43 We need the following lemma regarding the relative distance χ Lemma 61 1 If 0 α β β α, then χα, α χβ, β

RELATIVE PERTURBATION THEORY I 971 If α, α 0, then χα, α χα, α 3 For α, β, γ 0, we have 61 χα, γ χα, β+χβ,γ+ 1 χα, βχβ,γχα, γ 8 Thus if χα, βχβ,γ < 8 also, then χα, γ χα, β+χβ,γ 1 1 8χα, βχβ,γ Proof To prove the first inequality, we notice that function 1 x x is monotonically decreasing for 0 x 1, and that 0 α/ α β/ β 1 Thus χα, α = 1 α/ α 1 α/ α β/ β as was to be shown If α, α 0, then β/ β = χβ, β, χα, α α + α =χα, α = χα, α α + α χα, α α α =χα, α, α α α α α α which confirms the second inequality For the third inequality 61, without loss of generality, we may assume 0 α γ Nowifβ α or γ β, we have by the first inequality { χβ,γ χα, β+χβ,γ, if β α, χα, γ χα, β χα, β+χβ,γ, if γ β, so 61 holds Consider the case 0 α β γ It can be verified that χα, γ =χα, β+χβ,γ+χ α, βχ β, γχ α, γ Inequality 61 follows by applying the second inequality Proofs of Theorems 41 and 43 Set B = BD and denote its singular values by σ 1 σ σ n Apply Theorem 31 to B B and B B = D B BD to get max 1 i n χσ i, σ i D D 1 and n [χσi, σ i ] D D 1 F Now apply the second inequality of Lemma 61 to obtain max χσ i, σ i 1 1 i n D D 1 and n 6 [χσ i, σ i ] 1 D D 1 F Similarly for B = BD and B = D1BD = D1 B, wehave max χ σ i, σ i 1 1 i n D 1 D1 1 and n 63 [χ σ i, σ i ] 1 D 1 D1 1 F

97 REN-CANG LI The first inequalities in 6 and 63, and the assumptions of Theorem 41, imply χσ i, σ i χ σ i, σ i 1 4 D 1 D 1 1 D D 1 < 1 4 3=8 By Lemma 61, we have χσ i, σ i χσ i, σ i +χ σ i, σ i 1 1 8 χσ i, σ i χ σ i, σ i 1 D1 D1 1 + D D 1 1 1 3 D 1 D 1 1 D D 1, n [χσ i, σ i ] n [ ] χσi, σ i +χ σ i, σ i 1 1 8 χσ i, σ i χ σ i, σ i n n [χσ i, σ i ] + [χ σ i, σ i ] 1 1 8 max 1 i n χσ i, σ i χ σ i, σ i 1 D1 D1 1 F + D D 1 F 1 1 3 D 1 D 1 1 D D 1, as expected This completes the proof of Theorem 41 To prove Theorem 43, we notice that ϱ p σ i, σ i ϱ p σ i, σ i +ϱ p σ i, σ i ϱ p is a metric on R 1/p χσ i, σ i + 1/p χ σ i, σ i by Proposition 1 1/p D D 1 + D1 D1 1 by 6 and 63 and n [ϱ p σ i, σ i ] n [ϱ p σ i, σ i +ϱ p σ i, σ i ] ϱ p is a metric on R n [ϱ p σ i, σ i ] + n [ϱ p σ i, σ i ] 1/p n [χσ i, σ i ] + 1/p n [χ σ i, σ i ] 1 1/p D D 1 F + D1 D1 1 F These inequalities complete the proof of Theorem 43 by Proposition by 6 and 63 7 Generalized eigenvalue problems and generalized singular value problems In this section, we discuss perturbations for scaled generalized eigenvalue problems and scaled generalized singular value problems As we shall see, the results in previous sections, as well as those in Li [5], can be applied to derive relative perturbation bounds for these problems
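As a sanity check of the mechanism used in this section (not of any one theorem), the following sketch forms the reduced Hermitian matrix H_2^{-1/2} S_2^{-*} (S_1^* H_1 S_1) S_2^{-1} H_2^{-1/2} for the pencil setting detailed just below, perturbs H_1 and H_2 only, and observes that the generalized eigenvalues move by a relative amount comparable to the perturbations in H_1, H_2, regardless of the gradings S_1, S_2. All sizes are ours, and the gradings are kept moderate so that the dense reference eigensolver is itself reliable.

import numpy as np

def inv_sqrt(M):
    """Inverse square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(M)
    return (V / np.sqrt(w)) @ V.T

def pencil_eigs(H1, H2, S1, S2):
    """Eigenvalues of S1^T H1 S1 - lambda S2^T H2 S2 via the reduction of this section."""
    W = inv_sqrt(H2) @ np.linalg.inv(S2).T              # H2^{-1/2} S2^{-T}
    return np.linalg.eigvalsh(W @ (S1.T @ H1 @ S1) @ W.T)

rng = np.random.default_rng(7)
n = 5
S1, S2 = np.diag(2.0 ** np.arange(n)), np.diag(2.0 ** -np.arange(n))
H1 = rng.standard_normal((n, n)); H1 = H1 @ H1.T + n * np.eye(n)
H2 = rng.standard_normal((n, n)); H2 = H2 @ H2.T + n * np.eye(n)
dH1 = 1e-6 * rng.standard_normal((n, n)); dH1 = (dH1 + dH1.T) / 2
dH2 = 1e-6 * rng.standard_normal((n, n)); dH2 = (dH2 + dH2.T) / 2

lam   = pencil_eigs(H1, H2, S1, S2)
lam_t = pencil_eigs(H1 + dH1, H2 + dH2, S1, S2)
print(np.max(np.abs(lam - lam_t) / np.sqrt(np.abs(lam * lam_t))))   # small, comparable to dH1, dH2 relative to H1, H2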

RELATIVE PERTURBATION THEORY I 973 The generalized eigenvalue problem: A 1 λa S1H 1 S 1 λsh S and Ã1 λã S1 H 1 S 1 λs H S, where H 1 and H are positive definite; H 1 j H j H j < 1 for j =1, ; S 1 and S are some square matrices and one of them is nonsingular 7 The generalized singular value problem: {B 1,B } {G 1 S 1,G S } and { B 1, B } { G 1 S 1, G S }, where G 1 and G are nonsingular; G 1 j G j G j < 1 for j =1, ; S 1 and S are some square matrices and one of them is nonsingular For the scaled generalized eigenvalue problem just mentioned, without loss of generality, we consider the case when S is nonsingular Then the generalized eigenvalue problem for A 1 λa S1H 1 S 1 λsh S is equivalent to the standard eigenvalue problem for A def = H 1/ S S 1H 1 S 1 S 1 H 1/, and the generalized eigenvalue problem for Ã1 λã S 1 H 1 S 1 λs H S is equivalent to the standard eigenvalue problem for where D = D à def = D H 1/ S S 1 H 1 S 1 S 1 H 1/ D, def 1/ = I + H 1/ H H 1/ def and H = H H So, bounding relative distances between the eigenvalues of A 1 λa and those of à 1 λã is transformed to bounding relative distances between the eigenvalues of A and those of à The latter can be accomplished in two steps: 1 Bounding relative distances between the eigenvalues of A and those of  def = D H 1/ S S 1H 1 S 1 S 1 H 1/ D = D AD Bounding relative distances between the eigenvalues of  and those of à Denote and order the eigenvalues of A, Â, and à as λ 1 λ n, λ1 λ n, and λ 1 λ n Set D 1 = D 1 def 1/ = I + H 1/ 1 H 1 H 1/ def 1 and H 1 = H 1 H 1 By Theorem 31 on A and  = D AD, Theorem 3 on  = X H 1 X, and à = X H1 X, where X = S 1 S 1 H 1/ D,wehave 71 and 7 χλ i, λ i D D 1 and χ λ i, λ i D 1 D 1 1 n [ χλ i, λ ] i D D 1 F and n [ χ λ i, λ ] i D1 D1 1 F 7 When S is singular, both pencils will have the same number of the eigenvalue + For convenience, we define the relative differences by any measure introduced in section to be 0

974 REN-CANG LI By Lemma 61, we have that if D 1 D 1 1 D D 1 < 8, then χλ i, λ i χλ i, λ i +χ λ i, λ i 1 1 8 χλ i, λ i χ λ i, λ i D D 1 + D 1 D1 1 1 1 8 D 1 D1 1 D D 1 and [ n [ χλ i, λ ] n χλ i, i λ i +χ λ i, λ ] i 1 1 8 χλ i, λ i χ λ i, λ i n [ χλ i, λ ] n [ i + χ λ i, λ ] i 1 1 8 max χλ i, λ i χ λ i, λ i 1 i n D D 1 F + D 1 D1 1 F 1 1 8 D 1 D1 1 D D 1 Notice also that for j =1, and for any unitarily invariant norm, D j D 1 j H 1 j H j 1 H 1 j H j So we have proved the following Theorem 71 Let A 1 λa S1H 1 S 1 λsh S and Ã1 λã S1 H 1 S 1 λs H S, where H 1 and H are n n, positive definite, and H 1 j H j H j < 1 for j =1, S 1 and S are some square matrices and one of them is nonsingular Let the generalized eigenvalues of A 1 λa and Ã1 λã be If θ 1 θ H 1 H < 8, then n λ 1 λ n and λ1 λ n max χλ i, λ i θ 1 H 1 + θ H 1 i n 1 1 8 θ, 1θ H 1 H [ χλ i, λ ] θ 1 H 1 F + θ H F i 1 1 8 θ, 1θ H 1 H / def where θ j = H 1 j 1 H 1 j H j for j =1, On the other hand, from 71, 7, and Proposition, we get ϱ p λ i, λ i 1/p D D 1 and ϱ p λ i, λ i 1/p D 1 D 1 1 and n [ ϱ p λ i, λ ] i 1/p D D 1 F and n [ ϱ p λ i, λ ] i 1/p D 1 D1 1 F

RELATIVE PERTURBATION THEORY I 975 Since ϱ p is a metric on R, wehave ϱ p λ i, λ i ϱ p λ i, λ i +ϱ p λ i, λ i 1/p D D 1 + D 1 D1 1 and n [ ϱ p λ i, λ i ] n [ ϱ p λ i, λ i +ϱ p λ i, λ ] i n [ ϱ p λ i, λ i ] + n [ ϱ p λ i, λ ] i 1/p D D 1 F + D 1 D1 1 F Theorem 7 Let all conditions of Theorem 71, except D 1 D1 1 D D 1 < 8, which is no longer necessary, hold Then n max ϱ pλ i, λ i 1/p θ 1 H 1 + θ H, 1 i n [ ϱ p λ i, λ i ] 1/p θ 1 H 1 F + θ H F As to the scaled generalized singular value problem mentioned above, we shall consider instead its corresponding generalized eigenvalue problem [1, 36, 37] for 73 S 1G 1G 1 S 1 λs G G S and S 1 G 1 G 1 S 1 λs G G S Theorem 73 Let {B 1,B } {G 1 S 1,G S } and { B 1, B } { G 1 S 1, G S }, where G 1 and G are n n and nonsingular; G 1 j G j G j < 1 for j =1, ; S 1 and S are some square matrices and one of them is nonsingular Let the generalized singular values of {B 1,B } and { B 1, B } be σ 1 σ n and σ 1 σ n If δ 1 δ < 3, where δ jt = I + G j G 1 j I + Gj G 1 1 t j for j =1, and t =, F, then max χσ i, σ i 1 1 i n n [χσ i, σ i ] 1 It can be proved that for j =1, and t =, F, Gj G 1 j + G j G j t δ jt G j G 1 1+ 1 1 G 1 j G j δ 1 + δ 1 1 3 δ 1δ, δ 1F + δ F 1 1 3 δ 1δ 1 j + G jg j t 1 G j G 1 G 1 j G j t j G j G 1 j t

976 REN-CANG LI Proof Consider the case when S is nonsingular The case when S 1 is nonsingular can be handled analogously By 73, we know that the singular values of B def = G 1 S 1 S 1 G 1 def and B = G 1 S 1 S 1 G 1 are σ 1 σ n and σ 1 σ n, respectively Set D 1 = I + G 1 G 1 1, G 1 = G 1 G 1, and D = I + G G 1, G = G G ; then B = D 1 BD 1 By Theorem 41, we have max χσ i, σ i 1 1 i n n [χσ i, σ i ] 1 D 1 D 1 1 + D D 1 1 3 D 1 D 1 1 D, D D 1 D 1 1 F + D D F 1 1 3 D 1 D 1 1 D, D as were to be shown By the first half of the proof of Theorem 73 and by Theorem 43, we can prove the following Theorem 74 Let all conditions of Theorem 73, except δ 1 δ < 3, which is no longer necessary, hold Then max ϱ pσ i, σ i 1 1 i n δ 1+1/p 1 + δ, n [ϱ p σ i, σ i ] 1 δ 1+1/p 1F + δ F 8 Conclusions We have developed a relative perturbation theory for eigenvalue and singular value variations under multiplicative perturbations In the theory, extensions of the celebrated Hoffman Wielandt and Weyl Lidskii theorems from the classical perturbation theory are made Our extensions use two kinds of relative distance: ϱ p and χ Topologically, these new relative distances are equivalent to the classical measurement 1 for relative accuracy, but the new distances have better mathematical properties It is proved that ϱ p is indeed a metric on R while χ is not Often it is the case that perturbation bounds using χ are sharper than bounds using ϱ p Our unifying treatment in this paper covers many previously studied cases and yields bounds that are at least as sharp as existing ones Our results are applicable to the computations of sharp error bounds in the Demmel Kahan QR [8] algorithm and the Fernando Parlett implementation of the Rutishauser QD algorithm [14]; see Li [3] Previous approaches to building a relative perturbation theory are more or less along the lines of using the min-max principle for Hermitian matrix eigenvalue problems Our approach in this paper, however, is through deriving the perturbation equations 5 and 53 A major advantage of this new approach is that these perturbation equations will lead to the successful extensions in [5] of Davis Kahan sin θ theorems [5] and Wedin sin θ theorems [38]

RELATIVE PERTURBATION THEORY I 977 Appendix A Proofs of Propositions 3 and 4 Lemma A1 Let α, β, α, β R If α β β α, then ϱ 1 α, α ϱ 1 β, β If α β β α and β β 0, then ϱ p α, α ϱ p β, β for p>1, and it is strict if either α<β or β < α holds Proof We consider function fξ defined by fξ def = 1 ξ, where 1 ξ 1 p 1+ ξ p When p =1, { 1, for 1 ξ 0, fξ = 1+ξ 1, for 0 ξ 1, so fξ decreases monotonically and decreases strictly monotonically for 0 ξ 1 We are about to prove that when p > 1, function fξ so defined is strictly monotonically decreasing This is true if p = When 1 <p<, set hξ def =[fξ] p and gξ def =[f ξ] p Since, for 0 <ξ<1, h ξ = p1 ξp 1 1 + ξ p 1 1 + ξ p < 0 and g ξ = p1 + ξp 1 1 ξ p 1 1 + ξ p > 0, for 0 <ξ<1, hξ is strictly monotonically decreasing and gξ is strictly monotonically increasing Thus function fξ is strictly monotonically decreasing for p > 1 There are four cases to deal with Assume that at least one of α β and β α is strict 1 0 α β β α, then 0 α/ α <β/ β 1; thus ϱ p α, α =fα/ α >fβ/ β =ϱ p β, β α 0 β β α or α β β 0 α; then ϱ p α, α 1 ϱ p β, β It is easy to verify that the equalities in the two inequality signs cannot be satisfied simultaneously 3 α β 0 β α Only p = 1 shall be considered: ϱ 1 α, α =1=ϱ 1 β, β 4 α β β α 0, then 0 α/α < β/β 1; thus ϱ p α, α =f α/α >f β/β =ϱ p β, β The proof is completed Remark A1 In Lemma A1, assumption β β 0 for the case p>1 is essential A counterexample is the following: let ξ>ζ>0, and let α = ζ β = ζ < β = ζ< α <ξ Then ϱ p α, α = ξ + ζ p ξp + ζ p < 1 1/p = ϱ p β, β

978 REN-CANG LI Proof of Proposition 3 For any permutation τ of {1,,,n}, the idea of our proof is to construct n + 1 permutations τ j such that τ 0 = τ, τ n = identity permutation, and for j =0, 1,,,n 1, max ϱ pα i, α 1 i n τji max ϱ pα i, α 1 i n τj+1i The construction of these τ j s goes as follows Set τ 0 = τ Given τ j,ifτ j j+1 = j+1, set τ j+1 = τ j ; otherwise, define τ j i, τ j+1 i = j +1, for i = j +1, τ j j +1, for τ 1 j j +1 i j +1, for i = τ 1 j j +1 In this latter case, τ j and τ j+1 differ only at two indices as shown in the following picture notice that τ 1 j j +1>j+ 1 and τ j j +1>j+ 1: α τ 1 j+1 α j+1 j τ j τ j+1 τ j+1 3 τ j α τjj+1 α j+1 With Lemma A1, it is easy to prove that { } max ϱ p α j+1, α τjj+1, ϱ p α τ 1 j+1, α j+1 j { } max ϱ p α j+1, α j+1,ϱ p α τ 1 j+1, α τ j jj+1 Thus τ j s so constructed have the desired properties A proof of Proposition 4 can be given analogously with the help of the first inequality of Lemma 61 and the following lemma Lemma A Let α 1 α > 0 and α 1 α > 0 Then [χα 1, α 1 ] +[χα, α ] [χα 1, α ] +[χα, α 1 ], and the equality holds if and only if either α 1 = α or α 1 = α Proof It can be verified that α 1 α 1 + α α α α 1 α 1 α α 1 α 1 α α α α 1 α 1 α = α 1 α α 1 α α 1 α + α 1 α 0, α 1 α 1 α α and the equality holds if and only if either α 1 = α or α 1 = α Appendix B ϱ p is a metric on R Throughout this appendix, we will be working with real numbers The definition of ϱ p immediately implies that 1 ϱ p α, α 0; and ϱ p α, α = 0 if and only if α = α

RELATIVE PERTURBATION THEORY I 979 ϱ p α, α =ϱ p α, α So it remains to show that ϱ p satisfies the triangle inequality B1 ϱ p α, γ ϱ p α, β+ϱ p β,γ for α, β, γ R to conclude that the following holds Theorem B1 ϱ p is a metric on R We strongly conjecture that ϱ p is a metric on C Unfortunately, we are unable to prove it at this point Since ϱ p is symmetric with respect to its two arguments, we may assume, without loss of generality, that from now on B α γ There are three possible positions for β: B3 β α or α<β γ or γ<β The hardest part of our proof is to show that B1 holds for the second position of β in B3 We state it in the following lemma whose proof is postponed to the end of this section Lemma B Inequality B1 holds for α β γ, and the equality holds if and only if β = α or β = γ With this lemma, we are now ready to prove B1 Proof of B1 The proof is divided into two different cases The case αγ 0 Lemma B says that B1 is true if α β γ If either β<αor γ<β, by Lemma A1, we have { ϱp α, β ϱ ϱ p α, γ p α, β+ϱ p β,γ, if γ β, ϱ p β,γ ϱ p α, β+ϱ p β,γ, if β α The case αγ < 0 We may assume α<0 and γ>0 see B Consider the three possible positions B3 for β 1 β α<0 In this subcase, 1/α 1/β < 0 < 1/γ By Lemma B, we have ϱ p α, γ =ϱ p 1/α, 1/γ ϱ p 1/α, 1/β+ϱ p 1/β, 1/γ =ϱ p α, β+ϱ p β,γ α β γ This subcase has been taken care of by Lemma B 3 0 <γ β In this subcase, 1/α < 0 < 1/β 1/γ The rest is the same as in subcase 1 above The proof is completed Proof of Lemma B Since both swapping α and γ and multiplying α, β, γ all by 1 lose no generality, we may further assume that B4 α α γ Inequality B1 clearly holds if one of α, β, γ is zero or if β = α, β = γ, orα = γ So from now on we assume α<β<γ and α 0,β 0,γ 0

980 REN-CANG LI For 1 p<, ϱ p α, γ = γ α γp + α = γ β + β α = p γp + α p p p γ β γp + α + p p β α p γp + α p γ β = p γp + β + β α p p β p + α p 1 +γ β γp + α 1 p γp + β p p 1 +β α γp + α 1 p α p + β p = ϱ p α, β+ϱ p β,γ+h, p p p where h = γ β β p α p p γp + α p p γ p + β p + β α β p γ p p γp + α p p α p + β p p γp + β p p γ p + α p β p α p p α p + β p p γ p + α p β p γ p The second factors of the two summands in h are always nonnegative Now if α< β α γ, then β p α p 0 and β p γ p < 0, and thus h < 0 Hence ϱ p α, γ <ϱ p α, β+ϱ p β,γ Consider now α <β<γ Then γ ββ α 1 h = p γp + α p p γp + β βp α p p γp + β p p γ p + α p p β α β p α p 1 p α p + β γp β p p p γ β α p + β p p γ p + α p β p γ p < 0 The last inequality is true because p γ p + β p > p α p + β p p 1 γp +β p < 1 and 0 < βp α p γp β p β α γ β, p γp + β 0 < p γ p + α p p γ p + β p γ p + α p α p + β p p γ p + α p α p + β p γ p + α p p α p +β p by Lemma B3, since for 1 <p<, fx =x p is convex and gx = p x is concave So we also have ϱ p α, γ <ϱ p α, β+ϱ p β,γ for α <β<γ The proof for the case p< is completed When p =, B4 and α<β<γimply γ > max{ α, β } So ϱ α, γ = γ α γ = γ β γ = γ β + β α γ γ β α + +β α max{ α, β } <ϱ α, β+ϱ β,γ, 1 γ 1 max{ α, β }

RELATIVE PERTURBATION THEORY I    981

as was to be shown.
Lemma B.3. Suppose the functions f(x) and g(x) are defined on the interval [a, b], and suppose f(x) is convex and g(x) concave. Let x, y, z ∈ [a, b] and x < y < z. Then

(f(y) - f(x))/(y - x) ≤ (f(z) - f(y))/(z - y)    and    (g(y) - g(x))/(y - x) ≥ (g(z) - g(y))/(z - y).

A proof of this lemma can be found in most mathematical analysis books; see, e.g., [31].

Acknowledgments. I thank Professor W. Kahan for his consistent encouragement and support, Professor J. Demmel for helpful discussions on open problems in this research area, and Professor B. N. Parlett for drawing my attention to Ostrowski's theorem. Thanks also go to Professor I. C. F. Ipsen for sending me the report [13]. Professor R. Bhatia's and the referees' constructive comments, which improved the presentation considerably, are greatly appreciated.

REFERENCES

[1] J. Barlow and J. Demmel, Computing accurate eigensystems of scaled diagonally dominant matrices, SIAM J. Numer. Anal., 27 (1990), pp. 762-791.
[2] F. L. Bauer and C. T. Fike, Norms and exclusion theorems, Numer. Math., 2 (1960), pp. 137-141.
[3] R. Bhatia, Matrix Analysis, Graduate Texts in Mathematics 169, Springer-Verlag, New York, 1996.
[4] G. D. Birkhoff, Tres observaciones sobre el algebra lineal, Univ. Nac. de Tucuman Rev., Ser. A, 5 (1946), pp. 147-151.
[5] C. Davis and W. Kahan, The rotation of eigenvectors by a perturbation. III, SIAM J. Numer. Anal., 7 (1970), pp. 1-46.
[6] P. Deift, J. Demmel, L.-C. Li, and C. Tomei, The bidiagonal singular value decomposition and Hamiltonian mechanics, SIAM J. Numer. Anal., 28 (1991), pp. 1463-1516.
[7] J. Demmel and W. Gragg, On computing accurate singular values and eigenvalues of matrices with acyclic graphs, Linear Algebra Appl., 185 (1993), pp. 203-217.
[8] J. Demmel and W. Kahan, Accurate singular values of bidiagonal matrices, SIAM J. Sci. Statist. Comput., 11 (1990), pp. 873-912.
[9] J. Demmel and K. Veselić, Jacobi's method is more accurate than QR, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 1204-1245.
[10] G. DiLena, R. I. Peluso, and G. Piazza, Results on the relative perturbation of the singular values of a matrix, BIT, 33 (1993), pp. 647-653.
[11] S. C. Eisenstat and I. C. F. Ipsen, Relative perturbation bounds for eigenspaces and singular vector subspaces, in Proceedings of the Fifth SIAM Conference on Applied Linear Algebra, J. G. Lewis, ed., SIAM, Philadelphia, PA, 1994, pp. 62-66.
[12] S. C. Eisenstat and I. C. F. Ipsen, Relative perturbation techniques for singular value problems, SIAM J. Numer. Anal., 32 (1995), pp. 1972-1988.
[13] S. C. Eisenstat and I. C. F. Ipsen, Relative Perturbation Results for Eigenvalues and Eigenvectors of Diagonalisable Matrices, Technical Report CRSC-TR96-6, Department of Mathematics, North Carolina State University, Raleigh, NC, 1996.
[14] K. V. Fernando and B. N. Parlett, Accurate singular values and differential qd algorithms, Numer. Math., 67 (1994), pp. 191-229.
[15] G. H. Golub and C. F. Van Loan, Matrix Computations, 2nd ed., Johns Hopkins University Press, Baltimore, MD, 1989.
[16] M. Gu and S. C. Eisenstat, Relative Perturbation Theory for Eigenproblems, Research Report YALEU/DCS/RR-934, Department of Computer Science, Yale University, New Haven, CT, 1993.
[17] A. J. Hoffman and H. W. Wielandt, The variation of the spectrum of a normal matrix, Duke Math. J., 20 (1953), pp. 37-39.
[18] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, Cambridge, 1985.

982    REN-CANG LI

[19] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis, Cambridge University Press, Cambridge, 1991.
[20] W. Kahan, Accurate Eigenvalues of a Symmetric Tridiagonal Matrix, Technical Report CS41, Computer Science Department, Stanford University, Stanford, CA, 1966 (revised June 1968).
[21] R.-C. Li, Bounds on perturbations of generalized singular values and of associated subspaces, SIAM J. Matrix Anal. Appl., 14 (1993), pp. 195-234.
[22] R.-C. Li, Norms of certain matrices with applications to variations of the spectra of matrices and matrix pencils, Linear Algebra Appl., 182 (1993), pp. 199-234.
[23] R.-C. Li, On Deflating Bidiagonal Matrices, manuscript, Department of Mathematics, University of California, Berkeley, CA, 1994.
[24] R.-C. Li, Relative Perturbation Theory: I. Eigenvalue and Singular Value Variations, Technical Report UCB//CSD-94-855, Computer Science Division, Department of EECS, University of California at Berkeley, 1994; LAPACK working note 85 (revised January 1996), available online at http://www.netlib.org/lapack/lawns/lawn84.ps.
[25] R.-C. Li, Relative Perturbation Theory: II. Eigenspace and Singular Subspace Variations, Technical Report UCB//CSD-94-856, Computer Science Division, Department of EECS, University of California at Berkeley, 1994; LAPACK working note 85 (revised January 1996 and April 1996), available online at http://www.netlib.org/lapack/lawns/lawn85.ps.
[26] R.-C. Li, Relative perturbation theory: III. More bounds on eigenvalue variation, Linear Algebra Appl., 266 (1996), pp. 337-345.
[27] V. B. Lidskii, The proper values of the sum and product of symmetric matrices, Dokl. Akad. Nauk SSSR, 75 (1950), pp. 769-772 (in Russian). Translation by C. Benster available from the National Translation Center of the Library of Congress.
[28] R. Mathias, Spectral perturbation bounds for positive definite matrices, SIAM J. Matrix Anal. Appl., 18 (1997), pp. 959-980.
[29] R. Mathias and G. W. Stewart, A block QR algorithm and the singular value decomposition, Linear Algebra Appl., 182 (1993), pp. 91-100.
[30] L. Mirsky, Symmetric gauge functions and unitarily invariant norms, Quart. J. Math., 11 (1960), pp. 50-59.
[31] D. S. Mitrinovic, Analytic Inequalities, Springer-Verlag, New York, 1970.
[32] A. M. Ostrowski, A quantitative formulation of Sylvester's law of inertia, Proc. Nat. Acad. Sci. USA, 45 (1959), pp. 740-744.
[33] B. N. Parlett, The Symmetric Eigenvalue Problem, Prentice-Hall, Englewood Cliffs, NJ, 1980.
[34] I. Slapničar, Accurate Symmetric Eigenreduction by a Jacobi Method, Ph.D. thesis, Fernuniversität Gesamthochschule Hagen, Fachbereich Mathematik, 1992.
[35] G. W. Stewart and J.-G. Sun, Matrix Perturbation Theory, Academic Press, Boston, 1990.
[36] J.-G. Sun, Perturbation analysis for the generalized singular value decomposition, SIAM J. Numer. Anal., 20 (1983), pp. 611-625.
[37] C. F. Van Loan, Generalizing the singular value decomposition, SIAM J. Numer. Anal., 13 (1976), pp. 76-83.
[38] P.-Å. Wedin, Perturbation bounds in connection with singular value decomposition, BIT, 12 (1972), pp. 99-111.
[39] H. Wielandt, An extremum property of sums of eigenvalues, Proc. Amer. Math. Soc., 6 (1955), pp. 106-110.
[40] J. H. Wilkinson, The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, 1965.