© 2010 Society for Industrial and Applied Mathematics


SIAM J. MATRIX ANAL. APPL., Vol. 31, No. 5, pp. 2841–2859

RIGOROUS PERTURBATION BOUNDS OF SOME MATRIX FACTORIZATIONS

X.-W. CHANG AND D. STEHLÉ

Abstract. This article presents rigorous normwise perturbation bounds for the Cholesky, LU, and QR factorizations with normwise or componentwise perturbations in the given matrix. The considered componentwise perturbations have the form of backward rounding errors for the standard factorization algorithms. The approach used is a combination of the classic and refined matrix equation approaches. Each of the new rigorous perturbation bounds is a small constant multiple of the corresponding first-order perturbation bound obtained by the refined matrix equation approach in the literature and can be estimated efficiently. These new bounds can be much tighter than the existing rigorous bounds obtained by the classic matrix equation approach, while the conditions for the former to hold are almost as moderate as the conditions for the latter to hold.

Key words. perturbation analysis, normwise perturbation, componentwise perturbation, Cholesky factorization, LU factorization, QR factorization, column and row scaling, first-order bounds, rigorous bounds

AMS subject classifications. 15A23, 65F35

DOI. 10.1137/090778535

1. Introduction. Let A be a given matrix and have a factorization

(1.1)    A = BC.

Suppose that A is perturbed to A + ΔA, where a normwise or componentwise bound on ΔA is known. Let the same factorization for A + ΔA be

(1.2)    A + ΔA = (B + ΔB)(C + ΔC).

The aim of a perturbation analysis is to assess the effects of ΔA on ΔB and ΔC. In the analysis, normwise or componentwise bounds on ΔB and ΔC are derived.

The perturbation theory of matrix factorizations has been extensively studied. The following table summarizes the relevant works on perturbation bounds of the Cholesky, LU, and QR factorizations which are known to the authors.

P  B  | Cholesky                    | LU                         | QR
N  FN | [2], [8], [18], [19], [20]  | [2], [6], [12], [18], [19] | [2], [9], [18], [20], [23]
N  RN | [8], [12], [13], [17], [20] | [1], [21]                  | [17], [20], [23]
C  FN | [4]                         | [5]                        | [7], [25]
C  RN | [3], [13]                   | —                          | [7], [10]
C  FC | [3]                         | [5]                        | [7]
C  RC | [12], [21], [22]            | [21], [22]                 | [22]

Received by the editors November 30, 2009; accepted for publication (in revised form) by R.-C. Li September 9, 2010; published electronically November 4, 2010. http://www.siam.org/journals/simax/31-5/77853.html

School of Computer Science, McGill University, Montreal, QC, H3A 2A7, Canada (chang@cs.mcgill.ca). The work of this author was supported by NSERC of Canada grant RGPIN17191-07.

CNRS, Laboratoire LIP (U. Lyon, CNRS, ENS de Lyon, INRIA, UCBL), École Normale Supérieure de Lyon, 46 Allée d'Italie, 69364 Lyon Cedex 07, France (damien.stehle@ens-lyon.fr).

In the first column, P stands for the type of perturbation in the matrix to be factorized, and N and C stand for normwise perturbation and componentwise perturbation, respectively; in the second column, B stands for perturbation bound of the factor; FN, RN, FC, and RC stand for first-order normwise perturbation bound, rigorous normwise perturbation bound, first-order componentwise perturbation bound, and rigorous componentwise perturbation bound, respectively. In the present article, we call a bound rigorous if it does not neglect any higher-order terms as a first-order bound does: under appropriate assumptions, it always holds true.

Two types of approaches are often used to derive normwise perturbation bounds. One is the matrix-vector equation approach, and the other is the matrix equation approach; see [3]. Here we give a brief explanation of these two approaches in the context of first-order analysis. From (1.1) and (1.2) we have, by dropping the second-order term,

(1.3)    ΔA ≈ B·ΔC + ΔB·C.

The basic idea of the matrix-vector equation approach is to write the approximate matrix equation (1.3) as a matrix-vector equation by using the special structures and properties of the involved matrices, and then obtain vector-type expressions for ΔB and ΔC, from which normwise bounds on ΔB and ΔC can be derived. The approach can be extended to obtain rigorous bounds. This approach usually leads to sharp bounds, but the bounds (first-order or rigorous) are expensive to estimate, and the conditions for the rigorous bounds to hold are often too restrictive and complicated.

The matrix equation approach comes in two flavors. The classic matrix equation approach keeps (1.3) in the matrix-matrix form and derives bounds on ΔB and ΔC. The approach can be extended to obtain rigorous bounds. The bounds (first-order or rigorous) can be efficiently estimated, and the conditions for the rigorous bounds to hold are less restrictive and simpler. But the bounds are usually not tight. The refined matrix equation approach additionally uses row or column scaling techniques. It has been mainly used to derive first-order bounds, which numerical experiments showed are often good approximations to the sharp first-order bounds derived by the matrix-vector equation approach.

It is often unclear whether a first-order bound is a good approximate bound, as the ignored higher-order terms may dominate the true perturbation (see, e.g., Remark 5.1). Furthermore, in some applications rigorous bounds are needed in order to certify the accuracy of computations; see, e.g., [10, 15] for an application with the QR factorization and [16] for an application with the Cholesky factorization.

The present article aims at providing tight rigorous perturbation bounds for the Cholesky, LU, and QR factorizations, which can be efficiently estimated in O(n²) flops, where n is the number of columns of the matrix to be factorized. Additionally, the conditions for the bounds to hold are simple and moderate. We consider both normwise and componentwise perturbations in the matrix to be factorized. The componentwise perturbations have the form of backward errors resulting from standard factorization algorithms. In [10] we obtained such a rigorous bound for the R-factor of the QR factorization under a componentwise perturbation which has the form of backward rounding errors of standard QR factorization algorithms. The approach used in the latter work is actually a combination of the classic and refined matrix equation approaches. We will use a similar approach in this article.

The rest of this article is organized as follows. In section 2, we introduce notation and give some basics that will be necessary for the following three sections. Sections 3, 4, and 5 are devoted to the Cholesky, LU, and QR factorizations, respectively. Finally a summary is given in section 6.
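The general setup (1.1)-(1.2) can be made concrete with a small numerical sketch (ours, not the paper's; the matrix size and the 1e-8 perturbation scale are arbitrary choices), using the Cholesky case B = Rᵀ, C = R that section 3 analyzes:

```python
import numpy as np

# Illustrative sketch (not from the paper): factor A and A + dA,
# then measure the induced perturbation dR in the Cholesky factor.
rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # symmetric positive definite
R = np.linalg.cholesky(A).T          # upper triangular, A = R^T R

dA = 1e-8 * (M + M.T)                # small symmetric normwise perturbation
R1 = np.linalg.cholesky(A + dA).T    # factor of the perturbed matrix
dR = R1 - R                          # perturbation of the factor

# The relative change in R is the quantity that perturbation bounds control.
rel = np.linalg.norm(dR, 'fro') / np.linalg.norm(R, 2)
print(rel)
```

A perturbation analysis predicts how large `rel` can be in terms of ‖ΔA‖ and condition numbers of A and R, without recomputing the factorization.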

2. Notation and basics. For a matrix X ∈ R^{n×n}, we use X(i,:) and X(:,j) to denote its ith row and jth column, respectively, and use X_k to denote its k×k leading principal submatrix. We define a lower triangular matrix and two upper triangular matrices associated with X ∈ R^{n×n} as follows:

(2.1)    slt(X) = (s_ij),  s_ij = x_ij if i > j, and s_ij = 0 otherwise,
(2.2)    ut(X) = X − slt(X),
(2.3)    up(X) = (s_ij),  s_ij = x_ij if i < j,  s_ij = x_ij/2 if i = j, and s_ij = 0 otherwise.

For any absolute matrix norm (i.e., ‖A‖ = ‖ |A| ‖ for any A), we have

(2.4)    ‖slt(X)‖ ≤ ‖X‖,  ‖ut(X)‖ ≤ ‖X‖,  ‖up(X)‖ ≤ ‖X‖.

Let D_n denote the set of all real n×n positive definite diagonal matrices. We will use the following properties, which hold for any D ∈ D_n:

(2.5)    slt(DX) = D·slt(X),  ut(XD) = ut(X)·D,  up(XD) = up(X)·D.

It can be verified that if Xᵀ = X, then

(2.6)    ‖up(X)‖_F ≤ (1/√2) ‖X‖_F.

It is proved in [9, Lemma 5.1] that, for any D = diag(δ₁, …, δ_n) ∈ D_n,

(2.7)    ‖up(X) + D⁻¹·up(Xᵀ)·D‖_F ≤ ρ_D ‖X‖_F,    ρ_D = [1 + max_{1≤i<j≤n} (δ_j/δ_i)²]^{1/2}.

For any matrix X ∈ R^{m×n} and any consistent matrix norm ‖·‖_ν, we define

κ_ν(X) = ‖X‖_ν ‖X†‖_ν,    cond_ν(X) = ‖ |X†| |X| ‖_ν,

where X† is the Moore–Penrose pseudo-inverse of X. The following well-known results are due to van der Sluis [24].

Lemma 2.1. Let S, T ∈ R^{n×n} with S nonsingular, and define D_rp = diag(‖S⁻¹(i,:)‖_p), D_cp = diag(‖S⁻¹(:,j)‖_p), p = 1, 2. Then

(2.8)    ‖TS⁻¹‖_∞ ≤ ‖T·D_r1‖_∞ ‖D_r1⁻¹ S⁻¹‖_∞ = min_{D∈D_n} ‖TD‖_∞ ‖D⁻¹S⁻¹‖_∞,
(2.9)    ‖S⁻¹T‖₁ ≤ ‖S⁻¹ D_c1⁻¹‖₁ ‖D_c1 T‖₁ = min_{D∈D_n} ‖S⁻¹D⁻¹‖₁ ‖DT‖₁,
(2.10)   ‖T·D_r2‖₂ ‖D_r2⁻¹ S⁻¹‖₂ ≤ √n · inf_{D∈D_n} ‖TD‖₂ ‖D⁻¹S⁻¹‖₂,
(2.11)   ‖S⁻¹ D_c2⁻¹‖₂ ‖D_c2 T‖₂ ≤ √n · inf_{D∈D_n} ‖S⁻¹D⁻¹‖₂ ‖DT‖₂.
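The operators slt, ut, and up and the properties (2.4)-(2.6) are easy to spot-check numerically. The following sketch (our own helper functions, not from the paper) also verifies the identity up(X + Xᵀ) = X for upper triangular X, which is the key step behind relations like (3.7):

```python
import numpy as np

# slt: strictly lower triangular part of X
def slt(X):
    return np.tril(X, -1)

# ut: upper triangular part, so that slt(X) + ut(X) = X
def ut(X):
    return X - slt(X)

# up: strict upper part plus half of the diagonal
def up(X):
    return np.triu(X, 1) + 0.5 * np.diag(np.diag(X))

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 5))
fro = lambda M: np.linalg.norm(M, 'fro')

ok_24 = fro(slt(X)) <= fro(X) and fro(ut(X)) <= fro(X) and fro(up(X)) <= fro(X)  # (2.4)

S = X + X.T                                    # symmetric
ok_26 = fro(up(S)) <= fro(S) / np.sqrt(2)      # (2.6)

U = np.triu(X)                                 # upper triangular
ok_id = np.allclose(up(U + U.T), U)            # up(U + U^T) = U

print(ok_24, ok_26, ok_id)
```

The last identity is what lets one solve a symmetric equation of the form XᵀRᵀ + RX-type for an upper triangular unknown by applying up(·) to both sides.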

This lemma indicates that if one wants to estimate the rightmost sides of (2.8)-(2.11), one can select appropriate scaling matrices and estimate the norms of the scaled matrices T and S⁻¹. If both S⁻¹ and T are available, or if only S is available but S is triangular and T = DS in (2.8) and (2.10) or T = SD in (2.9) and (2.11) for a known D ∈ D_n, then the above estimations can be done by norm estimators in O(n²) flops; see, e.g., [14, Chap. 15]. These results can be used to estimate the perturbation bounds to be presented.

Finally, we give the following basic result, which will be used in later sections many times.

Lemma 2.2. Let a, b > 0, and let c be a continuous function of a parameter t ∈ [0,1] such that b² − 4a·c(t) > 0 holds for all t. Suppose that a continuous function x(t) satisfies the quadratic inequality

a·x(t)² − b·x(t) + c(t) ≥ 0.

If c(0) = x(0) = 0, then

x(1) ≤ (1/(2a)) (b − √(b² − 4a·c(1))).

Proof. The two roots of a·x(t)² − b·x(t) + c(t) = 0 are

x₁(t) = (1/(2a)) (b − √(b² − 4a·c(t))),    x₂(t) = (1/(2a)) (b + √(b² − 4a·c(t))).

Notice that x₁(t) < x₂(t) and both are continuous. Since a·x(t)² − b·x(t) + c(t) ≥ 0, we have either x(t) ≤ x₁(t) or x(t) ≥ x₂(t). But x(t) is continuous and x(0) = c(0) = 0, so that x(0) = x₁(0) < x₂(0), and therefore we must have x(t) ≤ x₁(t) for all t.

3. Cholesky factorization. We first present rigorous perturbation bounds for the Cholesky factor when the given symmetric positive definite matrix has a general normwise perturbation.

Theorem 3.1. Let A ∈ R^{n×n} be symmetric positive definite with the Cholesky factorization A = RᵀR, where R ∈ R^{n×n} is upper triangular with positive diagonal entries, and let ΔA ∈ R^{n×n} be symmetric. If

(3.1)    κ₂(A) ‖ΔA‖_F/‖A‖₂ < 1/2,

then A + ΔA has the unique Cholesky factorization

(3.2)    A + ΔA = (R + ΔR)ᵀ(R + ΔR),

where

(3.3)    ‖ΔR‖_F/‖R‖₂ ≤ √2 κ₂(R) [inf_{D∈D_n} κ₂(D⁻¹R)] (‖ΔA‖_F/‖A‖₂) / (√2 − 1 + √(1 − 2κ₂(A)‖ΔA‖_F/‖A‖₂))
(3.4)                 ≤ (2 + √2) κ₂(R) [inf_{D∈D_n} κ₂(D⁻¹R)] ‖ΔA‖_F/‖A‖₂.

Proof. From the condition (3.1), ‖A⁻¹ΔA‖₂ ≤ κ₂(A)‖ΔA‖₂/‖A‖₂ < 1. Thus the matrix A + tΔA for t ∈ [0,1] is symmetric positive definite and has the unique Cholesky factorization

(3.5)    A + tΔA = (R + ΔR(t))ᵀ(R + ΔR(t)),

which, with ΔR(1) = ΔR, leads to (3.2). Notice that ΔR(t) is a continuous function of t. From (3.5) we obtain

(3.6)    R⁻ᵀΔR(t)ᵀ + ΔR(t)R⁻¹ = t·R⁻ᵀΔA·R⁻¹ − R⁻ᵀΔR(t)ᵀΔR(t)R⁻¹.

As ΔR(t)R⁻¹ is upper triangular, it follows from (2.3) that

(3.7)    ΔR(t)R⁻¹ = up(t·R⁻ᵀΔA·R⁻¹ − R⁻ᵀΔR(t)ᵀΔR(t)R⁻¹).

Taking the Frobenius norm on both sides of (3.7) and using the inequality (2.6) and the fact that ‖A⁻¹‖₂ = ‖R⁻¹‖₂², we obtain

(3.8)    ‖ΔR(t)R⁻¹‖_F ≤ (1/√2) ‖t·R⁻ᵀΔA·R⁻¹ − R⁻ᵀΔR(t)ᵀΔR(t)R⁻¹‖_F
(3.9)                 ≤ (1/√2) (t‖A⁻¹‖₂‖ΔA‖_F + ‖ΔR(t)R⁻¹‖_F²).

Therefore, as the assumption (3.1) guarantees that the condition of Lemma 2.2 holds, we have by Lemma 2.2 that

(3.10)   ‖ΔR·R⁻¹‖_F ≤ (1/√2) (1 − √(1 − 2‖A⁻¹‖₂‖ΔA‖_F)).

Taking t = 1 in (3.7), multiplying both sides by a diagonal D ∈ D_n from the right, and using the fact that up(XD) = up(X)·D (see (2.5)), we have

(3.11)   ΔR·R⁻¹D = up(R⁻ᵀΔA·R⁻¹D − R⁻ᵀΔRᵀΔR·R⁻¹D).

Taking the Frobenius norm on both sides of (3.11) and using ‖up(X)‖_F ≤ ‖X‖_F (see (2.4)), we obtain

‖ΔR·R⁻¹D‖_F ≤ ‖R⁻¹‖₂ ‖R⁻¹D‖₂ ‖ΔA‖_F + ‖ΔR·R⁻¹‖_F ‖ΔR·R⁻¹D‖_F.

Then it follows by using (3.10) that

‖ΔR·R⁻¹D‖_F ≤ √2 ‖R⁻¹‖₂ ‖R⁻¹D‖₂ ‖ΔA‖_F / (√2 − 1 + √(1 − 2‖A⁻¹‖₂‖ΔA‖_F)).

Therefore,

‖ΔR‖_F ≤ ‖ΔR·R⁻¹D‖_F ‖D⁻¹R‖₂ ≤ √2 ‖R⁻¹‖₂ ‖R⁻¹D‖₂ ‖D⁻¹R‖₂ ‖ΔA‖_F / (√2 − 1 + √(1 − 2‖A⁻¹‖₂‖ΔA‖_F)).

Since D ∈ D_n is arbitrary and ‖A‖₂ = ‖R‖₂², we have (3.3) and then (3.4).

Now we make some remarks to show the relations between the new results and existing results in the literature.

Remark 3.1. In [8], the following first-order perturbation bound, which can be estimated in O(n²) flops, was derived:

(3.12)   ‖ΔR‖_F/‖R‖₂ ≤ (1/√2) [κ₂(R) inf_{D∈D_n} κ₂(D⁻¹R)] ‖ΔA‖_F/‖A‖₂ + O((‖ΔA‖_F/‖A‖₂)²).

Note that the difference between this first-order bound and the rigorous bound (3.4) is a factor of 2 + 2√2. Numerical experiments indicated that (3.12) is a good approximation to the optimal first-order bound derived by the matrix-vector equation approach in [8]:

‖ΔR‖_F/‖R‖₂ ≤ κ_C(A) ‖ΔA‖_F/‖A‖₂ + O((‖ΔA‖_F/‖A‖₂)²),

where

(3.13)   (1/√2) κ₂(R) ≤ κ_C(A) ≤ (1/√2) κ₂(R) inf_{D∈D_n} κ₂(D⁻¹R).

The expression of κ_C(A) involves an n(n+1)/2 × n(n+1)/2 lower triangular matrix defined by the entries of R. The best known method to estimate it requires O(n³) flops; see [8, Remark 2.6]. If the standard symmetric pivoting strategy is used in computing the Cholesky factorization, the quantity inf_{D∈D_n} κ₂(D⁻¹R) is bounded by a function of n; see [8, sections 4 and 5].

Remark 3.2. One of the rigorous bounds derived by the classic matrix equation approach presented in [20] is as follows:

(3.14)   ‖ΔR‖_F/‖R‖₂ ≤ √2 κ₂(A) (‖ΔA‖_F/‖A‖₂) / (1 + √(1 − 2κ₂(A)‖ΔA‖_F/‖A‖₂))

under the same condition as (3.1). If we take D = I in (3.3), we obtain

(3.15)   ‖ΔR‖_F/‖R‖₂ ≤ √2 κ₂(A) (‖ΔA‖_F/‖A‖₂) / (√2 − 1 + √(1 − 2κ₂(A)‖ΔA‖_F/‖A‖₂)).

Comparing (3.15) with (3.14), we observe that the new rigorous bound (3.3) is at most √2 + 1 times as large as (3.14). But κ₂(R) inf_{D∈D_n} κ₂(D⁻¹R) can be much smaller than κ₂(A) when R has bad row scaling. For example, for R = diag(1, γ) with large γ > 0, κ₂(R)·κ₂(D⁻¹R) = Θ(γ) with D = diag(1, γ), and κ₂(A) = Θ(γ²). Thus the bound (3.3) can be much tighter than (3.14).

Remark 3.3. In [8, Theorem 2.9], the following rigorous perturbation bound was derived by the matrix-vector equation approach:

(3.16)   ‖ΔR‖_F/‖R‖₂ ≤ 2 κ_C(A) ‖ΔA‖_F/‖A‖₂.

By (3.13), the new bound (3.4) is not as tight as this bound, but no numerical experiment has indicated that the former can be significantly larger than the latter; see Remark 3.1. As we mentioned in Remark 3.1, it is more expensive to estimate the latter than the former. A more serious problem with (3.16) is that the condition for it to hold given in [8, Theorem 2.9] can be as bad as κ_C(A)² ‖ΔA‖_F/‖A‖₂ < 1/4. This is much more constraining than the condition (3.1) if inf_{D∈D_n} κ₂(D⁻¹R) is not bounded by a constant; see (3.13). For example, for

R = [1 γ; 0 1]

with large γ > 0, κ_C(A)² = Θ(γ⁶) and κ₂(A) = Θ(γ⁴).
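The scaling effect discussed in Remark 3.2 can be checked directly. In this sketch (ours; the value γ = 10⁴ is an arbitrary choice), row equilibration with D = diag(1, γ) makes κ₂(D⁻¹R) = 1, so κ₂(R)·κ₂(D⁻¹R) grows like γ while κ₂(A) grows like γ²:

```python
import numpy as np

# 2-norm condition number via singular values
def kappa2(M):
    s = np.linalg.svd(M, compute_uv=False)
    return s[0] / s[-1]

gamma = 1e4
R = np.diag([1.0, gamma])            # badly row-scaled Cholesky factor
A = R.T @ R
D = np.diag([1.0, gamma])
scaled = np.linalg.inv(D) @ R        # D^{-1} R = I here, so kappa2 = 1

print(kappa2(R) * kappa2(scaled))    # grows like gamma
print(kappa2(A))                     # grows like gamma^2
```

This is exactly the gap between a bound driven by κ₂(R)·inf_D κ₂(D⁻¹R) and one driven by κ₂(A).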

Remark 3.4. In [3, Theorem 2.8], the following rigorous bound was derived by the refined matrix equation approach:

(3.17)   ‖ΔR‖_F/‖R‖₂ ≤ 2 κ₂(R) κ₂(D⁻¹R) ‖ΔA‖_F/‖A‖₂

under the condition

(3.18)   ‖R‖₂² ‖R⁻¹D‖₂ ‖D⁻¹R⁻ᵀ‖₂ ‖ΔA‖_F/‖A‖₂ < 1/4

for any D ∈ D_n. Notice that ‖R‖₂²‖R⁻¹D‖₂‖D⁻¹R⁻ᵀ‖₂ ≥ ‖R‖₂²‖R⁻¹R⁻ᵀ‖₂ = ‖A‖₂‖A⁻¹‖₂ = κ₂(A). Thus the condition (3.18) is not only more complicated but also more constraining than the condition (3.1). If we want to make the bound (3.17) similar to the new bound (3.4), then we may minimize κ₂(D⁻¹R) over the set D_n. But the optimal choice of D may make the condition (3.18) much more constraining than the condition (3.1). Here is an example. Let

R = [1 γ γ²; 0 γ γ²; 0 0 γ]

with large γ > 0. By (2.10), D_r = diag(√(1+γ²+γ⁴), √(γ²+γ⁴), γ) is an approximately optimal D. It is easy to verify that ‖R‖₂²‖R⁻¹D_r‖₂‖D_r⁻¹R⁻ᵀ‖₂ = Θ(γ⁵) and κ₂(A) = Θ(γ⁴). Thus the former can be arbitrarily larger than the latter.

In the following we present rigorous perturbation bounds for the Cholesky factor when the perturbation ΔA has the form we could expect of the backward error in A resulting from a standard Cholesky factorization algorithm (see [11] and [14, section 10.1]).

Theorem 3.2. Let A ∈ R^{n×n} be symmetric positive definite with the Cholesky factorization A = RᵀR, and let A = D_c H D_c with D_c = diag(a₁₁^{1/2}, …, a_nn^{1/2}). Let ΔA ∈ R^{n×n} be symmetric such that |ΔA| ≤ ε·ddᵀ for some constant ε and d = [a₁₁^{1/2}, …, a_nn^{1/2}]ᵀ. If

(3.19)   n ‖H⁻¹‖₂ ε < 1/2,

then A + ΔA has the unique Cholesky factorization

(3.20)   A + ΔA = (R + ΔR)ᵀ(R + ΔR),

where

(3.21)   ‖ΔR‖_F/‖R‖₂ ≤ √2 n ‖D_c R⁻¹‖₂ inf_{D∈D_n} (‖D_c R⁻¹D‖₂ ‖D⁻¹R‖₂) ε / (‖R‖₂ (√2 − 1 + √(1 − 2n‖H⁻¹‖₂ε)))
(3.22)                ≤ (2 + √2) n ‖D_c R⁻¹‖₂ inf_{D∈D_n} (‖D_c R⁻¹D‖₂ ‖D⁻¹R‖₂) ε / ‖R‖₂.

Proof. In the proof we will use the following fact:

‖D_c⁻¹ ΔA D_c⁻¹‖_F ≤ ε ‖D_c⁻¹ ddᵀ D_c⁻¹‖_F = ε ‖eeᵀ‖_F = nε,

where e = [1, …, 1]ᵀ. Note that the spectral radius of A⁻¹ΔA satisfies

ρ(A⁻¹ΔA) = ρ(D_c⁻¹H⁻¹D_c⁻¹ΔA) = ρ(H⁻¹·D_c⁻¹ΔA·D_c⁻¹) ≤ ‖H⁻¹‖₂ ‖D_c⁻¹ΔAD_c⁻¹‖_F ≤ n‖H⁻¹‖₂ε < 1.

Thus the matrix A + tΔA for t ∈ [0,1] is symmetric positive definite and has the unique Cholesky factorization (3.5), which, with ΔR(1) = ΔR, leads to (3.20). From (3.7) we obtain

(3.23)   ΔR(t)R⁻¹ = up(t·R⁻ᵀD_c (D_c⁻¹ΔAD_c⁻¹) D_c R⁻¹ − R⁻ᵀΔR(t)ᵀΔR(t)R⁻¹).

Then, using (2.6) and the fact that ‖H⁻¹‖₂ = ‖D_c R⁻¹‖₂², we obtain

‖ΔR(t)R⁻¹‖_F ≤ (1/√2) (t·n‖H⁻¹‖₂ε + ‖ΔR(t)R⁻¹‖_F²).

Therefore, as the assumption (3.19) guarantees that the condition of Lemma 2.2 holds, we have by Lemma 2.2 that

(3.24)   ‖ΔR·R⁻¹‖_F ≤ (1/√2) (1 − √(1 − 2n‖H⁻¹‖₂ε)).

Taking t = 1 in (3.23), multiplying both sides by a diagonal D ∈ D_n from the right, and then taking the Frobenius norm, we obtain

‖ΔR·R⁻¹D‖_F ≤ n ‖D_c R⁻¹‖₂ ‖D_c R⁻¹D‖₂ ε + ‖ΔR·R⁻¹‖_F ‖ΔR·R⁻¹D‖_F.

Then, using (3.24), we obtain

‖ΔR·R⁻¹D‖_F ≤ √2 n ‖D_c R⁻¹‖₂ ‖D_c R⁻¹D‖₂ ε / (√2 − 1 + √(1 − 2n‖H⁻¹‖₂ε)).

This, combined with the inequality ‖ΔR‖_F ≤ ‖ΔR·R⁻¹D‖_F ‖D⁻¹R‖₂, leads to (3.21) and then (3.22).

In the following we make some remarks, which are analogous to Remarks 3.1-3.4.

Remark 3.5. In [4] the following first-order perturbation bound, which can be estimated in O(n²) flops, was presented:

‖ΔR‖_F/‖R‖₂ ≤ (1/√2) n ‖D_c R⁻¹‖₂ inf_{D∈D_n} (‖D_c R⁻¹D‖₂ ‖D⁻¹R‖₂) ε/‖R‖₂ + O(ε²).

Note that the difference between the above first-order bound and the rigorous bound (3.22) is a factor of 2 + 2√2 (cf. Remark 3.1). Numerical experiments indicated that the above first-order bound is often a reasonable approximation to the nearly optimal first-order bound derived by the matrix-vector equation approach in [4]:

‖ΔR‖_F/‖R‖₂ ≤ χ_C(A) ε + O(ε²),

where the first inequality below was proved in [3, Remark 3.5]:

(3.25)   √n a_nn^{1/2} ‖A‖₂^{−1/2} ‖H⁻¹‖₂^{1/2} ≤ χ_C(A) ≤ n ‖D_c R⁻¹‖₂ inf_{D∈D_n} (‖D_c R⁻¹D‖₂ ‖D⁻¹R‖₂)/‖R‖₂.

The expression of χ_C(A) involves an n(n+1)/2 × n(n+1)/2 lower triangular matrix defined by the entries of RD_c⁻¹, and the best known estimator of χ_C(A) requires O(n³) flops. Here we would like to point out that an example given in [3, Remark 3.9] shows that

in the second inequality in (3.25) the right-hand side can be arbitrarily larger than the left-hand side, although numerical tests have shown that usually the former is a reasonable approximation to the latter. If the standard symmetric pivoting strategy is used in computing the Cholesky factorization, the quantity inf_{D∈D_n} (‖D_c R⁻¹D‖₂ ‖D⁻¹R‖₂)/‖R‖₂ is bounded by a function of n; see [3, Theorem 3.8 and section 3.4].

Remark 3.6. In [13] rigorous bounds on ‖ΔR·R⁻¹‖_F and ‖ΔR‖_F/‖R‖₂ were derived. The bound on ‖ΔR·R⁻¹‖_F, which was credited to Ji-guang Sun, is identical to (3.24) under the identical condition (3.19). As mentioned in [13], the bound on ‖ΔR‖_F/‖R‖₂ can be obtained by using ‖ΔR‖_F ≤ ‖ΔR·R⁻¹‖_F ‖R‖₂, leading to

(3.26)   ‖ΔR‖_F/‖R‖₂ ≤ (1/√2) (1 − √(1 − 2n‖H⁻¹‖₂ε)) = √2 n‖H⁻¹‖₂ε / (1 + √(1 − 2n‖H⁻¹‖₂ε)).

If we take D = I in the bound in (3.21), we obtain

‖ΔR‖_F/‖R‖₂ ≤ √2 n‖H⁻¹‖₂ε / (√2 − 1 + √(1 − 2n‖H⁻¹‖₂ε)).

Thus the bound in (3.21) is at most √2 + 1 times as large as the bound in (3.26). But ‖D_c R⁻¹‖₂ inf_{D∈D_n} (‖D_c R⁻¹D‖₂ ‖D⁻¹R‖₂)/‖R‖₂ can be much smaller than ‖H⁻¹‖₂. For example, for

R = [γ γ; 0 1]

with large γ > 0, ‖D_c R⁻¹‖₂ ‖D_c R⁻¹D‖₂ ‖D⁻¹R‖₂/‖R‖₂ = Θ(γ) with D = diag(γ, 1), and ‖H⁻¹‖₂ = Θ(γ²). Thus the bound (3.21) can be much tighter than the bound (3.26).

Remark 3.7. In [3, Theorem 3.9], the following rigorous perturbation bound was derived by the matrix-vector equation approach:

(3.27)   ‖ΔR‖_F/‖R‖₂ ≤ 2 χ_C(A) ε

under the condition (see [3, Theorem 3.9, Remark 3.4])

(3.28)   (‖A‖₂/(n min_i a_ii)) χ_C(A) ε < 1/4.

By the second inequality in (3.25), the new bound (3.22) is not as tight as (3.27). But, as we mentioned in Remark 3.5, estimating the latter is more expensive than estimating the former. A more serious problem is that the condition (3.28) can be much more constraining than the condition (3.19). In fact, by the first inequality in (3.25) and ‖A‖₂ ≥ a_nn, we have

(‖A‖₂/(n min_i a_ii)) χ_C(A) ≥ a_nn^{1/2} ‖A‖₂^{1/2} ‖H⁻¹‖₂^{1/2}/(√n min_i a_ii) ≥ (a_nn/(√n min_i a_ii)) ‖H⁻¹‖₂^{1/2}.

Thus, if a_nn is much larger than min_i a_ii, then (3.28) is much more constraining than (3.19).

Remark 3.8. In [3, Theorem 3.10], the following rigorous bound was derived by the refined matrix equation approach:

(3.29)   ‖ΔR‖_F/‖R‖₂ ≤ 2 n ‖D_c R⁻¹‖₂ ‖D_c R⁻¹D‖₂ ‖D⁻¹R‖₂ ε / ‖R‖₂

under the condition

(3.30)   n ‖D_c R⁻¹D‖₂ ‖D⁻¹R⁻ᵀD_c‖₂ ε < 1/4

for any D ∈ D_n. Notice that ‖D_c R⁻¹D‖₂ ‖D⁻¹R⁻ᵀD_c‖₂ ≥ ‖D_c R⁻¹R⁻ᵀD_c‖₂ = ‖H⁻¹‖₂. Thus the condition (3.30) is not only more complicated but also more constraining than the condition (3.19). If we want to make the bound (3.29) similar to the new bound (3.22), then we may minimize ‖D_c R⁻¹D‖₂ ‖D⁻¹R‖₂ over the set D_n. But the optimal choice of D may make the condition (3.30) much more constraining than the condition (3.19). Here is an example. Let

R = [1 1; 0 γ]

with a large γ > 0. By (2.10), D_r = diag(√2, γ) is an approximately optimal D. It is easy to verify that ‖D_c R⁻¹D_r‖₂ ‖D_r⁻¹R⁻ᵀD_c‖₂ = Θ(γ) and ‖H⁻¹‖₂ = Θ(1). Thus the former can be arbitrarily larger than the latter.

4. LU factorization. We first present rigorous perturbation bounds for the LU factors when the given matrix has a general normwise perturbation.

Theorem 4.1. Let A ∈ R^{n×n} have nonsingular leading principal submatrices with the LU factorization A = LU, where L ∈ R^{n×n} is unit lower triangular and U ∈ R^{n×n} is upper triangular, and let ΔA ∈ R^{n×n} be a small perturbation in A. If

(4.1)    ‖L⁻¹‖₂ ‖U⁻¹‖₂ ‖ΔA‖_F < 1/4,

then A + ΔA has the unique LU factorization

(4.2)    A + ΔA = (L + ΔL)(U + ΔU),

where

(4.3)    ‖ΔL‖_F/‖L‖_F ≤ 2 inf_{D_L∈D_n} κ₂(L·D_L⁻¹) ‖U_{n−1}⁻¹‖₂ ‖ΔA‖_F / (‖L‖_F (1 + √(1 − 4‖L⁻¹‖₂‖U⁻¹‖₂‖ΔA‖_F)))
(4.4)                 ≤ 2 inf_{D_L∈D_n} κ₂(L·D_L⁻¹) ‖U_{n−1}⁻¹‖₂ ‖ΔA‖_F / ‖L‖_F,

(4.5)    ‖ΔU‖_F/‖U‖_F ≤ 2 ‖L⁻¹‖₂ inf_{D_U∈D_n} κ₂(D_U⁻¹U) ‖ΔA‖_F / (‖U‖_F (1 + √(1 − 4‖L⁻¹‖₂‖U⁻¹‖₂‖ΔA‖_F)))
(4.6)                 ≤ 2 ‖L⁻¹‖₂ inf_{D_U∈D_n} κ₂(D_U⁻¹U) ‖ΔA‖_F / ‖U‖_F.

Proof. With the condition (4.1), we have for 1 ≤ k ≤ n,

‖A_k⁻¹‖₂ ‖ΔA_k‖₂ ≤ ‖U_k⁻¹‖₂ ‖L_k⁻¹‖₂ ‖ΔA_k‖_F ≤ ‖L⁻¹‖₂ ‖U⁻¹‖₂ ‖ΔA‖_F < 1.

Thus A_k + tΔA_k for t ∈ [0,1] is nonsingular. In other words, all the leading principal submatrices of A + tΔA are nonsingular. Thus the matrix A + tΔA has a unique LU factorization

(4.7)    A + tΔA = (L + ΔL(t))(U + ΔU(t)),

which, with ΔL(1) = ΔL and ΔU(1) = ΔU, leads to (4.2). From (4.7), we obtain

(4.8)    L⁻¹ΔL(t) + ΔU(t)U⁻¹ = t·L⁻¹ΔA·U⁻¹ − L⁻¹ΔL(t)ΔU(t)U⁻¹.

Notice that L⁻¹ΔL(t) is strictly lower triangular and ΔU(t)U⁻¹ is upper triangular. Taking the Frobenius norm on both sides of (4.8), we obtain

(4.9)    ‖L⁻¹ΔL(t) + ΔU(t)U⁻¹‖_F ≤ t ‖L⁻¹‖₂ ‖U⁻¹‖₂ ‖ΔA‖_F + ‖L⁻¹ΔL(t)‖_F ‖ΔU(t)U⁻¹‖_F.

Let x(t) = max(‖L⁻¹ΔL(t)‖_F, ‖ΔU(t)U⁻¹‖_F). Then we have

x(t) ≤ ‖L⁻¹ΔL(t) + ΔU(t)U⁻¹‖_F,    ‖L⁻¹ΔL(t)‖_F ‖ΔU(t)U⁻¹‖_F ≤ x(t)².

Thus, from (4.9) it follows that x(t)² − x(t) + t‖L⁻¹‖₂‖U⁻¹‖₂‖ΔA‖_F ≥ 0. The assumption (4.1) ensures that the condition of Lemma 2.2 is satisfied. Therefore, by Lemma 2.2 we obtain

(4.10)   max(‖L⁻¹ΔL‖_F, ‖ΔU·U⁻¹‖_F) ≤ (1/2) (1 − √(1 − 4‖L⁻¹‖₂‖U⁻¹‖₂‖ΔA‖_F)).

We now derive perturbation bounds for the L-factor. Let U = [U_{n−1} u; 0 u_nn]. From (4.8) with t = 1 it follows that

(4.11)   L⁻¹ΔL = slt(L⁻¹ΔA [U_{n−1}⁻¹ −U_{n−1}⁻¹u/u_nn; 0 1/u_nn]) − slt(L⁻¹ΔL·ΔU·U⁻¹)
                = slt(L⁻¹ΔA [U_{n−1}⁻¹ 0; 0 0]) − slt(L⁻¹ΔL·ΔU·U⁻¹).

Multiplying both sides of (4.11) from the left by a diagonal D_L ∈ D_n and taking the Frobenius norm, we obtain

‖D_L L⁻¹ΔL‖_F ≤ ‖D_L L⁻¹‖₂ ‖U_{n−1}⁻¹‖₂ ‖ΔA‖_F + ‖D_L L⁻¹ΔL‖_F ‖ΔU·U⁻¹‖_F.

Using (4.10), we have

‖D_L L⁻¹ΔL‖_F ≤ 2 ‖D_L L⁻¹‖₂ ‖U_{n−1}⁻¹‖₂ ‖ΔA‖_F / (1 + √(1 − 4‖L⁻¹‖₂‖U⁻¹‖₂‖ΔA‖_F)).

Combining the inequality ‖ΔL‖_F ≤ ‖L·D_L⁻¹‖₂ ‖D_L L⁻¹ΔL‖_F with the above inequality leads to (4.3) and then (4.4).

Now we derive perturbation bounds for the U-factor. From (4.8) with t = 1,

(4.12)   ΔU·U⁻¹ = ut(L⁻¹ΔA·U⁻¹) − ut(L⁻¹ΔL·ΔU·U⁻¹).

Multiplying both sides of (4.12) from the right by a diagonal D_U ∈ D_n and taking the Frobenius norm, we obtain

‖ΔU·U⁻¹D_U‖_F ≤ ‖L⁻¹‖₂ ‖U⁻¹D_U‖₂ ‖ΔA‖_F + ‖L⁻¹ΔL‖_F ‖ΔU·U⁻¹D_U‖_F.

Using (4.10), we have

‖ΔU·U⁻¹D_U‖_F ≤ 2 ‖L⁻¹‖₂ ‖U⁻¹D_U‖₂ ‖ΔA‖_F / (1 + √(1 − 4‖L⁻¹‖₂‖U⁻¹‖₂‖ΔA‖_F)).

Therefore, with the inequality ‖ΔU‖_F ≤ ‖ΔU·U⁻¹D_U‖_F ‖D_U⁻¹U‖₂, we can obtain (4.5) and (4.6).
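A quick numerical sanity check of the mechanism behind Theorem 4.1 (our sketch, with an ad hoc no-pivot LU and arbitrarily chosen sizes, not the paper's experiments): for a small perturbation, max(‖L⁻¹ΔL‖_F, ‖ΔU·U⁻¹‖_F) should stay below the Lemma 2.2 root appearing in (4.10):

```python
import numpy as np

# LU factorization without pivoting (Doolittle elimination).
def lu_nopivot(A):
    n = A.shape[0]
    L, U = np.eye(n), A.astype(float).copy()
    for k in range(n - 1):
        m = U[k+1:, k] / U[k, k]
        L[k+1:, k] = m
        U[k+1:, k:] -= np.outer(m, U[k, k:])
    return L, np.triu(U)

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)   # keeps pivots safe
L, U = lu_nopivot(A)

dA = 1e-9 * rng.standard_normal((n, n))           # small perturbation
L1, U1 = lu_nopivot(A + dA)
dL, dU = L1 - L, U1 - U

theta = (np.linalg.norm(np.linalg.inv(L), 2) * np.linalg.norm(np.linalg.inv(U), 2)
         * np.linalg.norm(dA, 'fro'))
root = (1 - np.sqrt(1 - 4 * theta)) / 2           # right-hand side of (4.10)
x = max(np.linalg.norm(np.linalg.inv(L) @ dL, 'fro'),
        np.linalg.norm(dU @ np.linalg.inv(U), 'fro'))
print(x <= root)
```

The condition (4.1) is satisfied here because theta is tiny, so the inequality is expected to hold with room to spare.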

Remark 4.1. In [6] the following first-order perturbation bounds, which can be estimated in O(n²) flops, were presented:

‖ΔL‖_F/‖L‖_F ≤ inf_{D_L∈D_n} κ₂(L·D_L⁻¹) ‖U_{n−1}⁻¹‖₂ ‖ΔA‖_F/‖L‖_F + O((‖ΔA‖_F/‖A‖_F)²),
‖ΔU‖_F/‖U‖_F ≤ ‖L⁻¹‖₂ inf_{D_U∈D_n} κ₂(D_U⁻¹U) ‖ΔA‖_F/‖U‖_F + O((‖ΔA‖_F/‖A‖_F)²).

Note that the difference between the above first-order bound for the L-factor and the rigorous bound (4.4) is a factor of 2. The same holds for the U-factor as well. Numerical experiments have indicated that the above first-order bounds are good approximations to the corresponding optimal first-order bounds derived by the matrix-vector equation approach in [6]:

‖ΔL‖_F/‖L‖_F ≤ κ_L(A) ‖ΔA‖_F/‖A‖_F + O((‖ΔA‖_F/‖A‖_F)²),
‖ΔU‖_F/‖U‖_F ≤ κ_U(A) ‖ΔA‖_F/‖A‖_F + O((‖ΔA‖_F/‖A‖_F)²),

where

(4.13)   κ_L(A) ≤ inf_{D_L∈D_n} κ₂(L·D_L⁻¹) ‖U_{n−1}⁻¹‖₂ ‖A‖_F/‖L‖_F,
(4.14)   κ_U(A) ≤ ‖L⁻¹‖₂ inf_{D_U∈D_n} κ₂(D_U⁻¹U) ‖A‖_F/‖U‖_F.

The expressions of κ_L(A) and κ_U(A) involve an n² × n² matrix defined by the entries of L and U and are expensive to estimate. To see how partial pivoting and complete pivoting affect the bounds in (4.13) and (4.14), we refer to [6, sections 4 and 5].

Remark 4.2. In [1] the following rigorous bounds were presented:

(4.15)   ‖ΔL‖_F ≤ ‖L‖₂ ‖L⁻¹ΔA·U⁻¹‖_F / (1 − ‖L⁻¹ΔA·U⁻¹‖₂),    ‖ΔU‖_F ≤ ‖U‖₂ ‖L⁻¹ΔA·U⁻¹‖_F / (1 − ‖L⁻¹ΔA·U⁻¹‖₂)

under the condition that ‖L⁻¹ΔA·U⁻¹‖₂ < 1. If we know only ‖ΔA‖_F or ‖ΔA‖₂ rather than ‖L⁻¹ΔA·U⁻¹‖_F (this is often the case), then the tightest bounds we can derive from (4.15) are as follows:

‖ΔL‖_F/‖L‖_F ≤ κ₂(L) ‖U⁻¹‖₂ ‖ΔA‖_F / (‖L‖_F (1 − ‖L⁻¹‖₂‖U⁻¹‖₂‖ΔA‖₂)),
‖ΔU‖_F/‖U‖_F ≤ κ₂(U) ‖L⁻¹‖₂ ‖ΔA‖_F / (‖U‖_F (1 − ‖L⁻¹‖₂‖U⁻¹‖₂‖ΔA‖₂)),

where we assume ‖L⁻¹‖₂‖U⁻¹‖₂‖ΔA‖₂ < 1, which is a little less restrictive than (4.1). A comparison of these two bounds with (4.3) and (4.5) shows that the former can be much larger than the latter when L has bad column scaling or ‖U⁻¹‖₂ is much larger than ‖U_{n−1}⁻¹‖₂ (for the L-factor), and when U has bad row scaling (for the U-factor).

If Gaussian elimination is used for computing the LU factorization of A and runs to completion, then the computed LU factors L̃ and Ũ satisfy

(4.16)   A + ΔA = L̃Ũ,    |ΔA| ≤ ε |L̃| |Ũ|,

where ε = nu/(1 − nu) with u being the unit roundoff; see, for example, [14, Theorem 9.3].
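The backward error form (4.16) can be observed in practice by factoring in low precision and evaluating the residual in higher precision (a sketch under our own choices: float32 arithmetic, a diagonally dominant test matrix, and a homemade no-pivot elimination):

```python
import numpy as np

# No-pivot LU in the precision of the input (here float32).
def lu_nopivot(A):
    n = A.shape[0]
    L, U = np.eye(n, dtype=A.dtype), A.copy()
    for k in range(n - 1):
        m = U[k+1:, k] / U[k, k]
        L[k+1:, k] = m
        U[k+1:, k:] -= np.outer(m, U[k, k:])
    return L, np.triu(U)

rng = np.random.default_rng(3)
n = 6
A = (rng.standard_normal((n, n)) + n * np.eye(n)).astype(np.float32)
L, U = lu_nopivot(A)

# Residual dA = L~ U~ - A, evaluated in float64 so it is essentially exact.
dA = L.astype(np.float64) @ U.astype(np.float64) - A.astype(np.float64)
u = 2.0 ** -24                                  # float32 unit roundoff
eps = n * u / (1 - n * u)                       # eps = n*u/(1 - n*u) as in (4.16)
bound = eps * np.abs(L).astype(np.float64) @ np.abs(U).astype(np.float64)
print(bool(np.all(np.abs(dA) <= bound)))
```

Note that the bound is componentwise and involves |L̃||Ũ|, not |A|, which is why the perturbation analysis of Theorem 4.2 below is phrased in terms of the factors of A + ΔA.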

In the following theorem we will consider a perturbation ΔA which has the same form as in (4.16). The perturbation bounds will involve the LU factors of A + ΔA, unlike the other perturbation bounds given in this paper, which involve the factors of A. The reason is that the bound on ΔA in (4.16) involves the LU factors of A + ΔA. The perturbation bounds will use a consistent absolute matrix norm (e.g., the 1-norm, ∞-norm, and F-norm), unlike the other bounds given in this paper, which use the F-norm or 2-norm.

Theorem 4.2. Suppose that ΔA ∈ R^{n×n} is a perturbation in A ∈ R^{n×n} and A + ΔA has nonsingular leading principal submatrices with the LU factorization satisfying (4.16). Let ‖·‖ denote a consistent absolute matrix norm. If

(4.17)   cond(L̃) cond(Ũ) ε < 1/4,

then A has the unique LU factorization A = LU. Let ΔL = L̃ − L and ΔU = Ũ − U. Then

(4.18)   ‖ΔL‖/‖L̃‖ ≤ 2 inf_{D_L∈D_n} (‖L̃·D_L⁻¹‖ ‖D_L|L̃⁻¹||L̃|‖) cond(Ũ_{n−1}) ε / (‖L̃‖ (1 + √(1 − 4cond(L̃)cond(Ũ)ε)))
(4.19)               ≤ 2 inf_{D_L∈D_n} (‖L̃·D_L⁻¹‖ ‖D_L|L̃⁻¹||L̃|‖) cond(Ũ_{n−1}) ε / ‖L̃‖,

(4.20)   ‖ΔU‖/‖Ũ‖ ≤ 2 inf_{D_U∈D_n} (‖|Ũ||Ũ⁻¹|D_U‖ ‖D_U⁻¹Ũ‖) cond(L̃) ε / (‖Ũ‖ (1 + √(1 − 4cond(L̃)cond(Ũ)ε)))
(4.21)               ≤ 2 inf_{D_U∈D_n} (‖|Ũ||Ũ⁻¹|D_U‖ ‖D_U⁻¹Ũ‖) cond(L̃) ε / ‖Ũ‖.

Proof. The proof is similar to the proof of Theorem 4.1, and we mainly reverse the roles of A and A + ΔA. Using the bound on ΔA in (4.16) and (4.17), we have for 1 ≤ k ≤ n,

(4.22)   ‖L̃_k⁻¹ ΔA_k Ũ_k⁻¹‖ ≤ ‖ |L̃_k⁻¹||L̃_k| |Ũ_k||Ũ_k⁻¹| ‖ ε ≤ cond(L̃) cond(Ũ) ε < 1.

For t ∈ [0,1], A_k + ΔA_k − tΔA_k = L̃_kŨ_k − tΔA_k = L̃_k [I − t L̃_k⁻¹ΔA_kŨ_k⁻¹] Ũ_k. Thus, by (4.22), the matrix A_k + ΔA_k − tΔA_k is nonsingular. Therefore A + ΔA − tΔA has the unique LU factorization

(4.23)   A + ΔA − tΔA = (L̃ − ΔL(t))(Ũ − ΔU(t)),

which, with ΔL(1) = ΔL and ΔU(1) = ΔU, gives the LU factorization A = LU. From (4.23), we obtain

(4.24)   L̃⁻¹ΔL(t) + ΔU(t)Ũ⁻¹ = t·L̃⁻¹ΔA·Ũ⁻¹ + L̃⁻¹ΔL(t)ΔU(t)Ũ⁻¹,

where L̃⁻¹ΔL(t) is strictly lower triangular and ΔU(t)Ũ⁻¹ is upper triangular. Taking the consistent absolute matrix norm on both sides of (4.24) and using the bound on ΔA in (4.16), we obtain

(4.25)   ‖L̃⁻¹ΔL(t) + ΔU(t)Ũ⁻¹‖ ≤ t·cond(L̃)cond(Ũ)ε + ‖L̃⁻¹ΔL(t)‖ ‖ΔU(t)Ũ⁻¹‖.

Let x(t) = max(‖L̃^{-1}ΔL(t)‖, ‖ΔU(t)Ũ^{-1}‖). Then we have

x(t) ≤ ‖L̃^{-1}ΔL(t) + ΔU(t)Ũ^{-1}‖,  ‖L̃^{-1}ΔL(t)‖ ‖ΔU(t)Ũ^{-1}‖ ≤ x(t)².

Thus, from (4.25) it follows that x(t)² − x(t) + t cond(L̃) cond(Ũ) ε ≥ 0. The assumption (4.17) ensures that the condition of Lemma 2.2 is satisfied. Therefore, by Lemma 2.2,

(4.26) max(‖L̃^{-1}ΔL‖, ‖ΔUŨ^{-1}‖) ≤ (1 − √(1 − 4 cond(L̃) cond(Ũ) ε))/2.

We now derive perturbation bounds for the L-factor. Let

Ũ = [ Ũ_n   ũ  ]
    [ 0   ũ_nn ].

Similarly to (4.11), from (4.24) with t = 1 we have

(4.27) L̃^{-1}ΔL = slt( L̃^{-1}ΔA [ Ũ_n^{-1} 0; 0 0 ] ) + slt( L̃^{-1}ΔL ΔU Ũ^{-1} ).

Then, with the bound on ΔA in (4.16), from (4.27) we obtain that, for any D_L ∈ D_n,

‖D_L^{-1}L̃^{-1}ΔL‖ ≤ ‖D_L^{-1}|L̃^{-1}||L̃|‖ cond(Ũ_n) ε + ‖D_L^{-1}L̃^{-1}ΔL‖ ‖ΔUŨ^{-1}‖.

Therefore, using (4.26), we have

‖D_L^{-1}L̃^{-1}ΔL‖ ≤ 2 ‖D_L^{-1}|L̃^{-1}||L̃|‖ cond(Ũ_n) ε / (1 + √(1 − 4 cond(L̃) cond(Ũ) ε)).

Combining the inequality ‖ΔL‖ ≤ ‖L̃D_L‖ ‖D_L^{-1}L̃^{-1}ΔL‖ and the above inequality leads to (4.18) and then (4.19).

Now we derive perturbation bounds for the U-factor. From (4.24) with t = 1, it follows that

ΔUŨ^{-1} = ut( L̃^{-1}ΔAŨ^{-1} ) + ut( L̃^{-1}ΔL ΔU Ũ^{-1} ).

Then, for any D_U ∈ D_n, with (4.16) we obtain

‖ΔUŨ^{-1}D_U‖ ≤ cond(L̃) ‖|Ũ||Ũ^{-1}|D_U‖ ε + ‖L̃^{-1}ΔL‖ ‖ΔUŨ^{-1}D_U‖.

Therefore, using (4.26), we have

‖ΔUŨ^{-1}D_U‖ ≤ 2 cond(L̃) ‖|Ũ||Ũ^{-1}|D_U‖ ε / (1 + √(1 − 4 cond(L̃) cond(Ũ) ε)).

Combining the inequality ‖ΔU‖ ≤ ‖ΔUŨ^{-1}D_U‖ ‖D_U^{-1}Ũ‖ and the above inequality, we obtain (4.20) and then (4.21).

Remark 4.3. For some choices of norms, we can remove the scaling matrices in Theorem 4.2. From Lemma 2.1 we observe that if we take the 1-norm or ∞-norm, the bounds (4.18), (4.19), (4.20), and (4.21) can be written, for p = 1, ∞, as

‖ΔL‖_p/‖L̃‖_p ≤ (2/(1 + √(1 − 4 cond_p(L̃) cond_p(Ũ) ε))) (‖|L̃||L̃^{-1}||L̃|‖_p/‖L̃‖_p) cond_p(Ũ_n) ε ≤ 2 (‖|L̃||L̃^{-1}||L̃|‖_p/‖L̃‖_p) cond_p(Ũ_n) ε,

‖ΔU‖_p/‖Ũ‖_p ≤ (2/(1 + √(1 − 4 cond_p(L̃) cond_p(Ũ) ε))) (‖|Ũ||Ũ^{-1}||Ũ|‖_p/‖Ũ‖_p) cond_p(L̃) ε ≤ 2 (‖|Ũ||Ũ^{-1}||Ũ|‖_p/‖Ũ‖_p) cond_p(L̃) ε

under the condition cond_p(L̃) cond_p(Ũ) ε < 1/4.

In [5] the following first-order bounds were derived, with ‖·‖ being a consistent absolute matrix norm:

(4.28) ‖ΔL‖/‖L̃‖ ≤ (‖|L̃||L̃^{-1}||L̃|‖/‖L̃‖) cond(Ũ_n) ε + O(ε²),

(4.29) ‖ΔU‖/‖Ũ‖ ≤ (‖|Ũ||Ũ^{-1}||Ũ|‖/‖Ũ‖) cond(L̃) ε + O(ε²).

We can see the obvious relation between these first-order bounds and the rigorous bounds derived above when the 1-norm and ∞-norm are used. For estimation of the perturbation bounds, we refer to [5]. We would like to point out that, to our knowledge, there are no optimal or nearly optimal first-order bounds, nor rigorous bounds, derived by the matrix-vector equation approach in the literature. To see how partial pivoting, rook pivoting, and complete pivoting affect the first-order bounds in (4.28) and (4.29), we refer to [5, section 4].

5. QR factorization. In this section we consider the perturbation of the R-factor of the QR factorization of A. As we do not have any new result concerning the Q-factor, we will not consider it. First we present rigorous perturbation bounds when the given matrix A has a general normwise perturbation.

Theorem 5.1. Let A ∈ R^{m×n} be of full column rank with the QR factorization A = QR, where Q ∈ R^{m×n} has orthonormal columns and R ∈ R^{n×n} is upper triangular with positive diagonal entries. If the perturbation matrix ΔA ∈ R^{m×n} satisfies

(5.1) κ₂(A) ‖ΔA‖_F/‖A‖₂ < √(3/2) − 1,

then A + ΔA has a unique QR factorization

(5.2) A + ΔA = (Q + ΔQ)(R + ΔR),

where, with ρ_D defined in (2.7),

(5.3) ‖ΔR‖_F/‖R‖₂ ≤ 2√2 inf_{D∈D_n} ρ_D κ_D(R) (‖Q^TΔA‖_F/‖A‖₂ + κ₂(A)‖ΔA‖_F²/‖A‖₂²) / (1 + √(1 − 4κ₂(A)‖ΔA‖_F/‖A‖₂ − 2κ₂²(A)‖ΔA‖_F²/‖A‖₂²))

(5.4)          ≤ 2√3 inf_{D∈D_n} ρ_D κ_D(R) (‖ΔA‖_F/‖A‖₂) / (1 + √(1 − 4κ₂(A)‖ΔA‖_F/‖A‖₂ − 2κ₂²(A)‖ΔA‖_F²/‖A‖₂²))

(5.5)          ≤ (√6 + 2√3) inf_{D∈D_n} ρ_D κ_D(R) ‖ΔA‖_F/‖A‖₂.

Proof. Notice that for any t ∈ [0, 1], Q^T(A + tΔA) = R(I + tR^{-1}Q^TΔA) = R(I + tA^†ΔA), and ‖A^†ΔA‖₂ < 1 by (5.1). Thus Q^T(A + tΔA) is nonsingular, and then A + tΔA has full column rank and has the unique QR factorization

(5.6) A + tΔA = (Q + ΔQ(t))(R + ΔR(t)),

which, with ΔQ(1) = ΔQ and ΔR(1) = ΔR, gives (5.2). From (5.6), we obtain

R^TΔR(t) + ΔR(t)^T R = tR^TQ^TΔA + tΔA^TQR + t²ΔA^TΔA − ΔR(t)^TΔR(t).

Multiplying the above by R^{-T} from the left and R^{-1} from the right, we obtain

R^{-T}ΔR(t)^T + ΔR(t)R^{-1} = tQ^TΔAR^{-1} + tR^{-T}ΔA^TQ + R^{-T}(t²ΔA^TΔA − ΔR(t)^TΔR(t))R^{-1}.

Since ΔR(t)R^{-1} is upper triangular, it follows that

(5.7) ΔR(t)R^{-1} = up[ tQ^TΔAR^{-1} + tR^{-T}ΔA^TQ + R^{-T}(t²ΔA^TΔA − ΔR(t)^TΔR(t))R^{-1} ].

Thus, by (2.6), the quantity ‖ΔR(t)R^{-1}‖_F satisfies

‖ΔR(t)R^{-1}‖_F ≤ √2 t ‖R^{-1}‖₂‖Q^TΔA‖_F + (1/√2)(t²‖R^{-1}‖₂²‖ΔA‖_F² + ‖ΔR(t)R^{-1}‖_F²).

It can easily be verified that 1 − 4t‖R^{-1}‖₂‖Q^TΔA‖_F − 2t²‖R^{-1}‖₂²‖ΔA‖_F² > 0 when ‖R^{-1}‖₂‖ΔA‖_F < √(3/2) − 1, which is equivalent to the condition (5.1). The condition of Lemma 2.2 is thus satisfied, and we can apply it, with x(t) = ‖ΔR(t)R^{-1}‖_F, to get

(5.8) ‖ΔRR^{-1}‖_F ≤ (1/√2)(1 − √(1 − 4‖R^{-1}‖₂‖Q^TΔA‖_F − 2‖R^{-1}‖₂²‖ΔA‖_F²)).

For any D ∈ D_n, we have from (5.7) with t = 1 that

(5.9) ΔRR^{-1}D = up[ Q^TΔAR^{-1}D + (R^{-T}ΔA^TQ)D ] + up[ R^{-T}(ΔA^TΔA − ΔR^TΔR)R^{-1}D ].

Then, by (2.7), it follows that

‖ΔRR^{-1}D‖_F ≤ ρ_D (√2 ‖Q^TΔA‖_F ‖R^{-1}D‖₂ + (1/√2) ‖R^{-1}‖₂‖ΔA‖_F² ‖R^{-1}D‖₂ + (1/√2) ‖ΔRR^{-1}‖_F ‖ΔRR^{-1}D‖_F).

Therefore, using (5.8) and the fact that ρ_D ≥ 1 (see (2.7)), we obtain

(5.10) ‖ΔRR^{-1}D‖_F ≤ 2√2 ρ_D ‖R^{-1}D‖₂ (‖Q^TΔA‖_F + ‖R^{-1}‖₂‖ΔA‖_F²) / (1 + √(1 − 4‖R^{-1}‖₂‖ΔA‖_F − 2‖R^{-1}‖₂²‖ΔA‖_F²)).

Combining the inequality ‖ΔR‖_F ≤ ‖ΔRR^{-1}D‖_F ‖D^{-1}R‖₂ and the above inequality, we obtain (5.3). Since ‖Q^TΔA‖_F ≤ ‖ΔA‖_F and (5.1) holds, (5.4) follows from (5.3). Then (5.5) is obtained.

Remark 5.1. In [9] the following first-order bound was derived by the refined matrix equation approach:

(5.11) ‖ΔR‖_F/‖R‖₂ ≤ inf_{D∈D_n} ρ_D κ_D(R) ‖Q^TΔA‖_F/‖A‖₂ + O(‖ΔA‖_F²/‖A‖₂²).

Some practical choices of D were given in [9] to estimate the above bound. This first-order bound (5.11) has some similarity to (5.3). But if Q^TΔA = 0 (i.e., ΔA lies in the orthogonal complement of the range of A), then this first-order bound becomes useless, whereas the rigorous bound (5.3) clearly shows how sensitive R is to the perturbation ΔA. Numerical experiments have indicated that this first-order bound is

a good approximation to the optimal first-order bound derived by the matrix-vector equation approach in [9]:

‖ΔR‖_F/‖R‖₂ ≤ κ_R(A) ‖Q^TΔA‖_F/‖A‖₂ + O(‖ΔA‖_F²/‖A‖₂²),

where

κ_R(A) ≤ inf_{D∈D_n} ρ_D κ_D(R).

The expression of κ_R(A) involves an n(n+1)/2 × n(n+1)/2 lower triangular matrix and an n(n+1)/2 × mn matrix defined by the entries of R, and it is expensive to estimate. If the standard column pivoting strategy is used in computing the QR factorization, the quantity inf_{D∈D_n} ρ_D κ_D(R) can be bounded by a function of n; see [9, sections 5 and 6].

Remark 5.2. The following rigorous bound was derived in [20] by the classic matrix equation approach:

(5.12) ‖ΔR‖_F/‖R‖₂ ≤ √2 κ₂(A) (‖ΔA‖_F/‖A‖₂) / (1 − κ₂(A)‖ΔA‖₂/‖A‖₂),

under the condition κ₂(A)‖ΔA‖₂/‖A‖₂ < 1, which is a little less restrictive than (5.1). Note that if D = I, then ρ_D κ_D(R) = κ₂(A). If R has bad row scaling and the 2-norm of its rows decreases from top to bottom, then inf_{D∈D_n} ρ_D κ_D(R) can be much smaller than κ₂(A). For example, for R = diag(γ, 1) with large γ, we have ρ_D κ_D(R) = Θ(1) with D = diag(γ, 1), while κ₂(A) = κ₂(R) = Θ(γ). Thus the new rigorous bounds can be much tighter than (5.12). Here we would like to point out that, to our knowledge, there are no rigorous bounds derived by the matrix-vector equation approach in the literature.

For a componentwise perturbation ΔA of the form of the backward error we could expect from a standard QR factorization algorithm, the analysis has been done in [10]. For completeness, we give the result here, without a proof.

Theorem 5.2. Let A ∈ R^{m×n} be of full column rank with the QR factorization A = QR, where Q ∈ R^{m×n} has orthonormal columns and R ∈ R^{n×n} is upper triangular with positive diagonal entries. Let ΔA ∈ R^{m×n} be a perturbation matrix in A such that

(5.13) |ΔA| ≤ εC|A|,  C ∈ R^{m×m},  0 ≤ c_ij ≤ 1,  ε a small constant.

If

(5.14) cond₂(R) ε < (√(3/2) − 1)/(mn),

then A + ΔA has a unique QR factorization

(5.15) A + ΔA = (Q + ΔQ)(R + ΔR),

where, with ρ_D defined in (2.7),

(5.16) ‖ΔR‖_F/‖R‖₂ ≤ 6mn^{1/2} inf_{D∈D_n} ρ_D (‖|R||R^{-1}|D‖₂ ‖D^{-1}R‖₂/‖R‖₂) ε.
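The diag(γ, 1) example of Remark 5.2 is easy to reproduce. A minimal sketch (not from the paper; it assumes NumPy and assumes κ_D(R) denotes the scaled condition number ‖R^{-1}D‖₂‖D^{-1}R‖₂, with ρ_D staying O(1) for this D):

```python
import numpy as np

gamma = 1e8
R = np.diag([gamma, 1.0])   # bad row scaling: row norms decrease from top to bottom
D = np.diag([gamma, 1.0])   # the scaling matrix suggested in Remark 5.2

kappa2 = np.linalg.cond(R)  # kappa_2(R) = Theta(gamma): huge
kappaD = (np.linalg.norm(np.linalg.inv(R) @ D, 2) *
          np.linalg.norm(np.linalg.inv(D) @ R, 2))   # Theta(1): both factors are I here
print(kappa2 >= 1e7, kappaD <= 10)
```

So the classic bound (5.12), driven by κ₂(A), is off by a factor Θ(γ) on this example, while the D-scaled quantity in (5.3)–(5.5) stays moderate.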

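The normwise R-factor bound of Theorem 5.1 can also be exercised end to end. The sketch below is an illustration, not the authors' code: it assumes NumPy, specializes to D = I (for which ρ_D κ_D(R) = κ₂(A)), and checks condition (5.1) and the rigorous bound (5.5) on a random well-conditioned example. The helper `r_factor` is hypothetical; it only enforces the positive-diagonal normalization assumed in the theorem.

```python
import numpy as np

def r_factor(M):
    """R-factor of the thin QR, with its diagonal flipped to be positive."""
    R = np.linalg.qr(M)[1]
    s = np.sign(np.diag(R))
    s[s == 0] = 1.0
    return s[:, None] * R   # A = QR = (Q S)(S R) with S = diag(s), S^2 = I

rng = np.random.default_rng(2)
m, n = 8, 5
A = rng.standard_normal((m, n))
dA = 1e-8 * rng.standard_normal((m, n))

R = r_factor(A)
dR = r_factor(A + dA) - R

eta = np.linalg.cond(A) * np.linalg.norm(dA, 'fro') / np.linalg.norm(A, 2)
print(eta < np.sqrt(1.5) - 1)                      # condition (5.1) comfortably holds
ratio = np.linalg.norm(dR, 'fro') / np.linalg.norm(R, 2)
print(ratio <= (np.sqrt(6) + 2 * np.sqrt(3)) * eta)  # bound (5.5), D = I
```

For a tiny perturbation like this one, the observed ratio is typically well below the rigorous constant, as expected of a worst-case bound.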
The assumption (5.13) on the perturbation ΔA can essentially handle two special cases. First, there is a small relative componentwise perturbation in A; i.e., |ΔA| ≤ ε|A|. Second, there is a small relative columnwise perturbation in A; i.e., ‖ΔA(:, j)‖₂ ≤ ε‖A(:, j)‖₂ for 1 ≤ j ≤ n; see, e.g., [7, section 2]. The second case may arise when ΔA is the backward error of the QR factorization computed by a standard algorithm; see [14, Chap. 19]. For practical choices of D to estimate the bound (5.16), we refer to [7, 10].

6. Summary. We have presented new rigorous normwise perturbation bounds for the Cholesky, LU, and QR factorizations, with normwise and componentwise perturbations in the given matrix, by using a hybrid of the classic and refined matrix equation approaches. Each of the new rigorous perturbation bounds is a small constant multiple of the corresponding first-order perturbation bound obtained by the refined matrix equation approach in the literature, and each can be estimated efficiently. These new bounds can be much tighter than the existing rigorous bounds obtained by the classic matrix equation approach, while the conditions for the former to hold are almost as moderate as the conditions for the latter to hold.

Acknowledgments. The authors thank Gilles Villard for early discussions on this work. They are grateful to the referees for their very helpful suggestions, which improved the presentation of the paper.

REFERENCES

[1] A. Barrlund, Perturbation bounds for the LDL^H and the LU factorizations, BIT, 31 (1991), pp. 358–363.
[2] R. Bhatia, Matrix factorizations and their perturbations, Linear Algebra Appl., 197/198 (1994), pp. 245–276.
[3] X.-W. Chang, Perturbation Analysis of Some Matrix Factorizations, Ph.D. thesis, School of Computer Science, McGill University, Montreal, Canada, February 1997.
[4] X.-W. Chang, Perturbation analyses for the Cholesky factorization with backward rounding errors, in Proceedings of the Workshop on Scientific Computing, Hong Kong, 1997, pp. 180–187.
[5] X.-W. Chang, Some features of Gaussian
elimination with rook pivoting, BIT, 42 (2002), pp. 66–83.
[6] X.-W. Chang and C. C. Paige, On the sensitivity of the LU factorization, BIT, 38 (1998), pp. 486–501.
[7] X.-W. Chang and C. C. Paige, Componentwise perturbation analyses for the QR factorization, Numer. Math., 88 (2001), pp. 319–345.
[8] X.-W. Chang, C. C. Paige, and G. W. Stewart, New perturbation analyses for the Cholesky factorization, IMA J. Numer. Anal., 16 (1996), pp. 457–484.
[9] X.-W. Chang, C. C. Paige, and G. W. Stewart, Perturbation analyses for the QR factorization, SIAM J. Matrix Anal. Appl., 18 (1997), pp. 775–791.
[10] X.-W. Chang, D. Stehlé, and G. Villard, Perturbation analysis of the QR factor R in the context of LLL lattice basis reduction, 5 pages, submitted. Available at http://perso.ens-lyon.fr/damien.stehle/qrperturb.html
[11] J. W. Demmel, On floating point errors in Cholesky, Technical Report CS-89-87, Department of Computer Science, University of Tennessee, Knoxville, TN, 1989, 6 pages. LAPACK Working Note 14.
[12] F. M. Dopico and J. M. Molera, Perturbation theory for factorizations of LU type through series expansions, SIAM J. Matrix Anal. Appl., 27 (2005), pp. 561–581.
[13] Z. Drmač, M. Omladič, and K. Veselić, On the perturbation of the Cholesky factorization, SIAM J. Matrix Anal. Appl., 15 (1994), pp. 1319–1332.
[14] N. J. Higham, Accuracy and Stability of Numerical Algorithms, 2nd ed., Society for Industrial and Applied Mathematics, Philadelphia, PA, 2002.
[15] I. Morel, G. Villard, and D. Stehlé, H-LLL: Using Householder inside LLL, in Proceedings of ISSAC 2009, Seoul, Korea, 2009, pp. 271–278.

[16] P. Q. Nguyen and D. Stehlé, An LLL algorithm with quadratic complexity, SIAM J. Comput., 39 (2009), pp. 874–903.
[17] G. W. Stewart, Perturbation bounds for the QR factorization of a matrix, SIAM J. Numer. Anal., 14 (1977), pp. 509–518.
[18] G. W. Stewart, On the perturbation of LU, Cholesky, and QR factorizations, SIAM J. Matrix Anal. Appl., 14 (1993), pp. 1141–1146.
[19] G. W. Stewart, On the perturbation of LU and Cholesky factors, IMA J. Numer. Anal., 17 (1997), pp. 1–6.
[20] J.-G. Sun, Perturbation bounds for the Cholesky and QR factorizations, BIT, 31 (1991), pp. 341–352.
[21] J.-G. Sun, Rounding-error and perturbation bounds for the Cholesky and LDL^T factorizations, Linear Algebra Appl., 173 (1992), pp. 77–97.
[22] J.-G. Sun, Componentwise perturbation bounds for some matrix decompositions, BIT, 32 (1992), pp. 702–714.
[23] J.-G. Sun, On the perturbation bounds for the QR factorization, Linear Algebra Appl., 215 (1995), pp. 95–112.
[24] A. van der Sluis, Condition numbers and equilibration of matrices, Numer. Math., 14 (1969), pp. 14–23.
[25] H. Zha, A componentwise perturbation analysis of the QR decomposition, SIAM J. Matrix Anal. Appl., 14 (1993), pp. 1124–1131.