arxiv:cs.na/ v2 14 Oct 2003

Size: px

Start display at page:

Download "arxiv:cs.na/ v2 14 Oct 2003"

Luke Clark
5 years ago
Views:

1 arxiv:cs.na/0300 v 4 Oct 003 Smoothed Analysis of the Condition Numbers and Growth Factors of Matrices Arvind Sankar Department of Mathematics Massachusetts Institute of Technology Shang-Hua Teng Department of Computer Science Boston University and Akamai Technologies Inc. October 4, 003 Abstract Daniel A. Spielman Department of Mathematics Massachusetts Institute of Technology Let A be any matrix and let A be a slight random perturbation of A. We prove that it is unlikely that A has large condition number. Using this result, we prove it is unlikely that A has large growth factor under Gaussian elimination without pivoting. By combining these results, we bound the smoothed precision needed by Gaussian elimination without pivoting. Our results improve the average-case analysis of Gaussian elimination without pivoting performed by Yeung and Chan (SIAM J. Matrix Anal. Appl., 997). Partially supported by NSF grant CCR-0487 Partially supported by an Alfred P. Sloan Foundation Fellowship, and NSF grant CCR-0487 Partially supported by an Alfred P. Sloan Foundation Fellowship, and NSF grant CCR

2 Introduction Spielman and Teng [ST0], introduced the smoothed analysis of algorithms as a means of explaining the success of algorithms and heuristics that could not be well understood through traditional worst-case and average-case analyses. Smoothed analysis is a hybrid of worst-case and average-case analyses in which one measures the maximum over inputs of the expected value of a function on slight random perturbations of that input. For example, the smoothed complexity of an algorithm is the maximum over its inputs of the expected running time of the algorithm under slight perturbations of that input. If an algorithm has low smoothed complexity and its inputs are subject to noise, then it is unlikely that one will encounter an input on which the algorithm performs poorly. (See also the Smoothed Analysis Homepage [Smo]) Smoothed analysis is motivated by the existence of algorithms and heuristics that are known to work well in practice, but which are known to have poor worst-case performance. Average-case analysis was introduced in an attempt to explain the success of such heuristics. However, averagecase analyses are often unsatisfying as the random inputs they consider may bare little resemblance to the inputs actually encountered in practice. Smoothed analysis attempts to overcome this objection by proving a bound that holds in every neighborhood of inputs. In this paper, we prove that perturbations of arbitrary matrices are unlikely to have large condition numbers or large growth factors under Gaussian Elimination without pivoting. As a consequence, we conclude that the smoothed precision necessary for Gaussian elimination is logarithmic. We obtain similar results for perturbations that affect only the non-zero and diagonal entries of symmetric matrices. We hope that these results will be a first step toward a smoothed analysis of Gaussian elimination with partial pivoting an algorithm that is widely used in practice but known to have poor worst-case performance. In the rest of this section, we recall the definitions of the condition numbers and growth factors of matrices, review prior work on their average-case analysis, and formally state our results. In Section 3, we perform a smoothed analysis of the condition number of a matrix. In Section 4, we use the results of Section 3 to obtain a smoothed analysis of the growth factors of Gaussian elimination without pivoting. In Section 5, we combine these results to obtain a smoothed bound on the precision needed by Gaussian elimination without pivoting. Definitions of zero-preserving perturbations and our results on perturbations that only affect the non-zero and diagonal entries of symmetric matrices appear in Section 6. In the conclusions, we mention how our results may be extended to larger families of perturbations, present some counter-examples, and suggest future directions for research. Other conjectures and open questions appear in the body of the paper.. Condition numbers and growth factors For a square matrix A, the condition number of A is defined by κ(a) = A A,

3 where we recall A = Ñ Ü x = Ax. The condition number measures how much the solution to a system Ax = b changes as one makes slight changes to A and b. A consequence is that if ones solves the linear system using fewer than ÐÓ (κ(a)) bits of precision, one is likely to obtain a result far from a solution. The simplest and most implemented method of solving linear systems is Gaussian elimination. Natural implementations of Gaussian elimination use O ( n 3) arithmetic operations to solve a system of n linear equations in n variables. If the coefficients of these equations are specified using b bits, in the worst case it suffices to perform the elimination using O(bn) bits of precision [GLS9]. This high precision may be necessary because the elimination may produce large intermediate entries. However, in practice one usually obtains accurate answers using much less precision. We show O(b + ÐÓ n) digits of precision usually suffice. In fact, it is rare to find an implementation of Gaussian elimination that uses anything more than double precision, and high-precision solvers are rarely used or needed in practice [TB97, TS90] (for example, LAPACK uses 64 bits [ABB + 99]). Since Wilkinson s seminal work [Wil6], it has been understood that it suffices to carry out Gaussian elimination with b + ÐÓ (5nκ(A) L U / A + 3) bits of accuracy to obtain a solution that is accurate to b bits. In this formula, L and U are the LU-decomposition of A; that is, U is the upper-triangular matrix and L is the lower-triangular matrix with s on the diagonal for which A = LU.. Prior work The average-case behaviors of the condition numbers and growth factors of matrices have been studied both analytically and experimentally. In his paper, The probability that a numerical analysis problem is difficult, Demmel [Dem88] proved that it is unlikely that a Gaussian random matrix centered at the origin has large condition number. Demmel s bounds on the condition number were improved by Edelman [Ede88]. Average-case analysis of growth factors began with the experimental work of Trefethen and Schreiber [TS90], who found that Gaussian random matrices rarely have large growth factors under partial or full pivoting. Yeung and Chan [YC97] studied the growth factors of Gaussian elimination without pivoting. They define ρ U and ρ L by ρ U (A) = U / A, and ρ L (A) = L, where LU = A is the LU-factorization of A obtained without pivoting. They prove Theorem. (Yeung-Chan). Ì Ö Ü Ø ÓÒ Ø ÒØ c > 0 Ò 0 < b < Ù Ø Ø G Ò n n Ñ ØÖ Ü Ó Ò Ô Ò ÒØ Ù Ò Ö Ò ÓÑ Ú Ö Ð Ó Ú Ö Ò Ò Ñ Ò 0 Ò LU = G

4 Ø ÄÍ¹ ØÓÖ Þ Ø ÓÒ Ó G Ø Ò [ρ L (G) > x] [ρ U (G) > x] cn3 x, Ò ( Ñ Ò cn 7/ x, n ) + cn5/ x + b n. As it is generally believed that partial pivoting is better than no pivoting, their result provides some intuition for the experimental results of Trefethen and Schreiber demonstrating that random matrices rarely have large growth factors under partial pivoting. However, we note that it is difficult to make this intuition rigorous as there are matrices A for which no pivoting has L Ñ Ü U Ñ Ü / A Ñ Ü = while partial pivoting has growth factor n. (See also [Hig90]) The running times of many numerical algorithms depend on the condition numbers of their inputs. For example, the number of iterations taken by the method of conjugate gradients is the square root of the condition number. Similarly, the running times of interior-points methods can be bounded in terms of condition numbers [Ren95]. Blum [Blu89] suggested that a complexity theory of numerical algorithms should be parameterized by the condition number of an input in addition to the input size. Smale [Sma97] proposed a complexity theory of numerical algorithms in which one:. proves a bound on the running time of an algorithm solving a problem in terms of its condition number, and then. proves that it is unlikely that a random problem instance has large condition number. This program is analogous to the average-case complexity of Theoretical Computer Science..3 Our results To better model the inputs that occur in practice, we propose replacing step of Smale s program with. prove that for every input instance it is unlikely that a slight random perturbation of that instance has large condition number. That is, we propose to bound the smoothed value of the condition number. Our first result in this program is presented in Section 3, where we improve upon Demmel s [Dem88] and Edelman s [Ede88] results to show that a slight Gaussian perturbation of an arbitrary matrix is unlikely to have large condition number. In particular, we let A be an arbitrary n n matrix of norm at most n and we bound the probability that κ( A + G) is large, where G is a matrix of independent Gaussian random variables of variance σ and mean 0. We bound this probability in terms of σ and n. In contrast with the average-case analysis of Demmel and Edelman, our analysis can 3

5 be interpreted as demonstrating that if there is a little bit of imprecision or noise in the entries of a matrix, then it is unlikely it is ill-conditioned. On the other hand, Edelman [Ede9] writes of random matrices: What is a mistake is to psychologically link a random matrix with the intuitive notion of a typical matrix or the vague concept of any old matrix. ½ The reader might also be interested in recent work on the smoothed analysis of the condition numbers of linear programs [BD0, DST0]. In Section 4, we use results from Section 3 to perform a smoothed analysis of the growth factors of Gaussian elimination without pivoting. If one specializes our results to perturbations of the zero matrix, then one obtains a bound ρ U that improves the bound obtained by Yeung and Chan by a factor of n and which agrees with their experimental observations. The result obtained for ρ L is comparable to that obtained by Yeung and Chan. However, while Yeung and Chan compute the density functions of the distribution of the elements in L and U, such precise estimates are not immediately available in our model. As a result, the techniques we develop are applicable to a wide variety of models of perturbations beyond the Gaussian. For example, one could use our techniques to obtain results of a similar nature if G were a matrix of random variables chosen uniformly in [,]. We comment further upon this in the conclusions section of the paper. The less effect a perturbation has, the more meaningful the results of smoothed analysis are. As many matrices encountered in practice are sparse or have structure, it would be best to consider perturbations that respect their sparsity pattern or structure. Our first result in this direction appears in Section 6, in which we consider the condition numbers and growth factors of perturbations of symmetric matrices that only alter their non-zero and diagonal elements. We prove results similar to those proved for dense perturbations of arbitrary matrices. Notation Definition.. ÓÖ Ñ ØÖ Ü A Û Ö ÐÐ Ø Ø A Ñ Ü = Ñ Ü A i,j, i,j A = A = Ñ Ü x Ax / x, A = Ñ Ü x Ax / x For a collection of vectors a,...,a k, we let Span(a,...,a k ) denote the vector space spanned by these vectors. For integers a b, we let a : b denote the set of integers {x: a x b}. For a matrix A we let A a:b,c:d denote the submatrix of A indexed by rows in a : b and columns in c : d. In Section 4, we will use the following properties of matrix norms and vector norms. ½ We thank Alan Edelman for suggesting the name smoothed analysis. 4

6 Proposition. (vector norms). ÓÖ ÒÝ Ú ØÓÖ a Ò R n a / n a a º Proposition.3 ( A ). Û Ö a,...,a n Ö Ø ÖÓÛ Ó Aº A = Ñ Ü a i, i 3 Smoothed analysis of the condition number of a matrix In his paper, The probability that a numerical analysis problem is difficult, Demmel [Dem88] proved that it is unlikely that a Gaussian random matrix centered at the origin has large condition number. Demmel s bounds on the condition number were improved by Edelman [Ede88]. In this section, we present the smoothed analogue of this bound. That is, we show that for every matrix it is unlikely that a slight perturbation of that matrix has large condition number. For more information on the condition number of a matrix, we refer the reader to one of [GL83, TB97, Dem97]. As bounds on the norm of a random matrix are standard, we focus on the norm of the inverse. Recall that / A = Ñ Ò x Ax / x. The first step in the proof is to bound the probability that A v is small for a fixed unit vector v. This result is also used later (in Section 4.) in studying the growth factor. Using this result and an averaging argument, we then bound the probability that A is large. Lemma 3. (Projection of A ). Ä Ø A Ò Ö ØÖ ÖÝ ÕÙ Ö Ñ ØÖ Ü Ò R n n Ò A Ñ ØÖ Ü Ó Ò Ô Ò ÒØ Ù Ò Ö Ò ÓÑ Ú Ö Ð ÒØ Ö Ø A Ó Ú Ö Ò σ º Ä Ø v Ò Ö ØÖ ÖÝ ÙÒ Ø Ú ØÓÖº Ì Ò ] A [ v > x < π xσ Proof. First observe that by multiplying A by an orthogonal matrix, we may assume that v = e. In this case, A v = (A, ) :, the length of the first column of A. The first column of A, by the definition of the matrix inverse, is a vector orthogonal to A :n,:, i.e., every row but the first. Also, it has inner product with the first row. Hence its length is the reciprocal of the length of the projection of the first row onto the subspace orthogonal to the rest of the rows. This projection is a -dimensional Gaussian random variable of variance σ, and the probability that it is smaller than /x in absolute value is at most /x σ e t /σ dt π π xσ, which completes the proof. /x 5

7 Theorem 3. (Smallest singular value). Ä Ø A Ò Ö ØÖ ÖÝ ÕÙ Ö Ñ ØÖ Ü Ò R n n Ò A Ñ ØÖ Ü Ó Ò Ô Ò ÒØ Ù Ò Ö Ò ÓÑ Ú Ö Ð ÒØ Ö Ø A Ó Ú Ö Ò σ º Ì Ò [ ] A n x.35 xσ Proof. We apply Lemma 3. to a uniformly distributed random unit vector v and obtain [ ] A v x A,v π xσ Now let u be the unit vector such that A u = A (this is unique with probability ). From the inequality A v A u,v, we have that for any c > 0, So, A A,v [ ] A x [ A v x ] c n [ A x A,v [ ] A x = A ] c and u v n [ ] c u v. A,v n [ A,v A v x c [ n] A,v u v c ] n n π xσ [ c A,v u v c ] (by (3.)) n n π xσ [ ], (by Lemma B.) c g g c where g is a standard normal variable. Choosing c = 0.57, and evaluating the error function numerically, we get [ ] A n x.35 A xσ. (3.) Theorem 3.3 (Condition number). Ä Ø A Ò n n Ñ ØÖ Ü Ø Ý Ò A n Ò Ð Ø A Ñ ØÖ Ü Ó Ò Ô Ò ÒØ Ù Ò Ö Ò ÓÑ Ú Ö Ð ÒØ Ö Ø A Ó Ú Ö Ò σ º Ì Ò ( 9.4n + ) ÐÓ (x)/n [κ(a) x]. xσ 6

8 Proof. As observed by Davidson and Szarek [DS0, Theorem II.7], one can apply inequality (.4) of [LT9] to show that for all k 0, [ A A n + k ] e k /. We rephrase this bound as [ A A n + ] ÐÓ (/ǫ) ǫ, for all ǫ. By assumption, A n; so, [ A n + ] ÐÓ (/ǫ) ǫ. From the result of Theorem 3., we have [ A.35 ] n ǫ. ǫσ Combining these two bounds, we find [ A A 4.7n +.35 ] n ÐÓ (/ǫ) ǫ. ǫσ We would like to express this probability in the form of [ A ] A x, for x. By substituting x = 4.7n +.35 n ÐÓ (/ǫ), ǫσ we observe that for ǫ = which holds here, since σ. Therefore, we conclude ( 4.7n +.35 ) n ÐÓ (/ǫ) xσ [ ] A A x ( 9.4n + ) ÐÓ (x)/n σ ( 9.4n + ) ÐÓ (x)/n xσ ( 9.4n + ) ÐÓ (x)/n xσ. 7

9 We also conjecture that the + ÐÓ (x)/n term should be unnecessary because those matrices for which A is large are less likely to have A large as well. Conjecture. Ä Ø A n n Ñ ØÖ Ü Ø Ý Ò A Ñ Ü Ò Ð Ø A Ñ ØÖ Ü Ó Ò ¹ Ô Ò ÒØ Ù Ò Ö Ò ÓÑ Ú Ö Ð ÒØ Ö Ø A Ó Ú Ö Ò σ º Ì Ò [κ(a) x] O(n/xσ). 4 Growth Factor of Gaussian Elimination without Pivoting We now turn to proving a bound on the growth factor. With probability, none of the diagonal entries that occur during elimination will be 0. So, in the spirit of Yeung and Chan, we analyze the growth factor of Gaussian elimination without pivoting. When we specialize our smoothed analyses to the case A = 0, we improve the bounds of Yeung and Chan by a factor of n. Our improved bound on ρ U agrees with their experimental analyses. 4. Growth in U We recall that ρ U (A) = U U i,: = Ñ Ü, A i A and so we need to bound the l -norm of each row of U. We denote the upper triangular segment of the kth row of U by u = U k,k:n, and observe that u can be obtained from the formula: where u = a T b T C D (4.) a T = A k,k:n b T = A k,:k C = A :k,:k D = A :k,k:n. This expression for u follows immediately from ( ) ( ) ( ) C D L:k,:k 0 U:k,:k U A :k,: = = :k,k:n L k,:k 0 u b T a T In this section, we give two bounds on ρ U (A). The first will have a better dependence on σ, and second will have a better dependence on n. It is the later bound, Theorem 4.3, that agrees with the experiments of Yeung and Chan [YC97] when specialized to the average-case. 8

10 4.. First bound Theorem 4. (First bound on ρ U (A)). Ä Ø A Ò n n Ñ ØÖ Ü Ø Ý Ò A Ò Ð Ø A Ñ ØÖ Ü Ó Ò Ô Ò ÒØ Ù Ò Ö Ò ÓÑ Ú Ö Ð ÒØ Ö Ø A Ó Ú Ö Ò σ º Ì Ò [ρ U (A) > + x] n(n + ). π xσ Proof. From (4.), u = a T b T C D a T + b T C D a T + b T C D (as D = D T ) A ( + b T C ) (4.) We now bound the probability b T C is large. By Proposition., b T C k b T C. Therefore, [ [ b T C b > x] T C > x/ k (k )σ k ] + k b,c b,c π xσ π xσ, where the second inequality follows from Lemma 4. below and the last inequality follows from the assumption σ. We now apply a union bound over the n rows of U to obtain n k [ρ U (A) > + x] π xσ n(n + ). π xσ k= Lemma 4.. Ä Ø C Ò Ö ØÖ ÖÝ ÕÙ Ö Ñ ØÖ Ü Ò R d d Ò C Ö Ò ÓÑ Ñ ØÖ Ü Ó Ò Ô Ò¹ ÒØ Ù Ò Ú Ö Ð Ó Ú Ö Ò σ ÒØ Ö Ø Cº Ä Ø b Ú ØÓÖ Ò R d Ù Ø Ø b Ò Ð Ø b Ö Ò ÓÑ Ù Ò Ú ØÓÖ Ó Ú Ö Ò σ ÒØ Ö Ø bº Ì Ò [ b T C x] b,c π σ d + Proof. Let ^b be the unit vector in the direction of b. By applying Lemma 3., we obtain for all b, [ [ b T C > x] = ^b T C > x ] C C b π xσ b. xσ 9

11 Therefore, we have [ [ [ ] b T C > x] ] b = b T C > x b,c C π xσ b[ b ]. ] It is known [KJ8, p. 77] that b [ b σ d+ b. As [X] [X ] for every positive random variable X, we have b [ b ] σ d + b σ d Second Bound for ρ U (A) In this section, we establish an upper bound on ρ U (A) which dominates the bound in Theorem 4. for σ n 3/. If we specialize the parameters in this bound to A = 0 and σ =, we improve the average-case bound proved by Yeung and Chan [YC97] by a factor of n. Moreover, the resulting bound agrees with their experimental results. Theorem 4.3 (Second bound on ρ U (A)). Ä Ø A Ò n n Ñ ØÖ Ü Ø Ý Ò A Ò Ð Ø A Ñ ØÖ Ü Ó Ò Ô Ò ÒØ Ù Ò Ö Ò ÓÑ Ú Ö Ð ÒØ Ö Ø A Ó Ú Ö Ò σ º ÓÖ n [ρ U (A) > + x] π x ( 3 n3/ + n σ ) n σ Proof. We will first consider the case k n. By (4.) and Proposition., we have u a + b T C D a + k b T C D. Therefore, for all k n, u a + k b T C D A A + k b T C D A + k b T C D A n,:. We now observe that for fixed b and C, (b T C )D is a Gaussian random vector of variance b T C σ centered at (b T C ) D, where D is the center of D. We have D A, by the assumptions of the theorem; so, b T C D b T C D b T C. 0

12 Thus, if we let t = (b T C D)/ b T C, then for any fixed b and C, t is a Gaussian random vector of variance σ centered at a point of norm at most. We also have [ ] [ b T C D b x T C t x]. b,c,d It follows from Lemma 4. that [ b T C x] b,c Hence, we may apply Corollary C.5 to show [ b T C t x] b,c,t = b,c,t σ (k ) + π xσ σ (k ) + σ (n k + ) + π xσ Note that A n,: is a Gaussian random vector in R n of variance σ. As A n,: is independent of b, C and D, we can again apply Lemma C.4 to show k b [ T C D ] k σ x (k ) + σ [ ] (n k + ) + A n,: π xσ A n,: ) k ( + nσ π xσ nσ, by Lemma A.4. From the proof of Theorem 4., we have that for k = n n [ u / A > + x] π xσ. (4.3) Applying a union bound over the choices for k, we obtain [ρ U (A) > + x] = n ) k ( + nσ π xσ nσ + n π xσ k= ( n + n ) σ n + π 3 x π xσ ( π x 3 n3/ + n σ + 4 ) n 3 σ

13 4. Growth in L Let L be the lower-triangular part of the LU-factorization of A. We have L (k+):n,k = A (k ) (k+):n,k /A(k ) k,k, where we let A (k) denote the matrix remaining after the first k columns have been eliminated. We will show that it is unlikely that L (k+):n,k is large by proving that it is unlikely that is large while A (k ) is small. A (k ) (k+):n,k k,k Theorem 4.4 (ρ L (A)). Ä Ø A Ò n¹ Ý¹n Ñ ØÖ Ü ÓÖ Û A Ò Ð Ø A Ñ ØÖ Ü Ó Ò Ô Ò ÒØ Ù Ò Ö Ò ÓÑ Ú Ö Ð ÒØ Ö Ø A Ó Ú Ö Ò σ º Ì Ò ( n [ρ L (A) > x] π x σ + ) ÐÓ n + π ÐÓ n Proof. We have L (k+):n,k = A(k ) (k+):n,k A (k ) k,k = A (k+):n,k A (k+):n,:(k ) A :(k ),:(k ) A :(k ),k A k,k A k,:(k ) A :(k ),:(k ) A :(k ),k = A (k+):n,k A (k+):n,:(k ) v A k,k A k,:(k ) v, where we let v = A :(k ),:(k ) A :(k ),k. Since A, and all the terms A(k+):n,k, A (k+):n,:(k ), A k,k, A k,:(k ) and v are independent, we can apply Lemma 4.5 to show that ] [ L (k+):n,k > x π x ( σ + ÐÓ n + ) π ÐÓ n The theorem now follows by applying a union bound over the n choices for k and observing that L is at most n times the largest entry in L. Lemma 4.5 (Vector Ratio). Ä Ø a Ù Ò Ö Ò ÓÑ Ú Ö Ð Ó Ú Ö Ò σ Û Ø Ñ Ò a Ó ÓÐÙØ Ú ÐÙ Ø ÑÓ Ø b Ù Ò Ö Ò ÓÑ d¹ú ØÓÖ Ó Ú Ö Ò σ ÒØ Ö Ø ÔÓ ÒØ b Ó ÒÓÖÑ Ø ÑÓ Ø x Ù Ò Ö Ò ÓÑ n¹ú ØÓÖ Ó Ú Ö Ò σ ÒØ Ö Ø ÔÓ ÒØ Ó ÒÓÖÑ Ø ÑÓ Ø

14 Y Ù Ò Ö Ò ÓÑ n¹ Ý¹d Ñ ØÖ Ü Ó Ú Ö Ò σ ÒØ Ö Ø Ñ ØÖ Ü Ó ÒÓÖÑ Ø ÑÓ Ø Ò Ð Ø v Ò Ö ØÖ ÖÝ d¹ú ØÓÖº Á a b x Ò Y Ö Ò Ô Ò ÒØ Ò σ Ø Ò ] x + Yv [ a + b T > x v π x ( σ + ÐÓ n + ) π ÐÓ n Proof. We begin by observing that a + b T v and each component of x + Yv is a Gaussian random variable of variance σ (+ v ) whose mean has absolute value at most + v, and that all these variables are independent. By Lemma A.3, ( )( ) [ x + Yv ] + v + σ ( + v ) ÐÓ n +. π ÐÓ n On the other hand, Lemma A. implies [ ] a + b T v > x. (4.4) π xσ + v Thus, we can apply Corollary C.4 to show [ x + Yv a + b T v ] + v + > x π ( σ ) ( + v ÐÓ n + xσ + v ( ) ( σ + v ÐÓ n + = + v π x + σ + v σ + v ( π x σ + ) ÐÓ n + π ÐÓ n πðó n ) πðó n ) 5 Smoothed Analysis of Gaussian Elimination We now combine the results from the previous sections to bound the smoothed precision needed to obtain b-bit answers using Gaussian elimination without pivoting. 3

15 Theorem 5. (Smoothed precision of Gaussian elimination). ÓÖ n > e 4 Ð Ø A Ò n¹ Ý¹n Ñ ØÖ Ü ÓÖ Û A Ò Ð Ø A Ñ ØÖ Ü Ó Ò Ô Ò ÒØ Ù Ò Ö Ò ÓÑ Ú Ö Ð ÒØ Ö Ø A Ó Ú Ö Ò σ /4º Ì Ò Ø ÜÔ Ø ÒÙÑ Ö Ó Ø Ó ÔÖ ÓÒ Ò ÖÝ ØÓ ÓÐÚ Ax = b ØÓ b Ø Ó ÙÖ Ý Ù Ò Ù Ò Ð Ñ Ò Ø ÓÒ Û Ø ÓÙØ Ô ÚÓØ Ò Ø ÑÓ Ø b + 7 ( ) ÐÓ n + 3 ÐÓ + ÐÓ ( + nσ) + ÐÓ σ ÐÓ n + ÐÓ n Proof. By Wilkinson s theorem, we need the machine precision, ǫ mach, to satisfy 5 b nρ L (A)ρ U (A)κ(A)ǫ mach =.33 + b + ÐÓ n + ÐÓ (ρ L (A)) + ÐÓ (ρ U (A)) + ÐÓ (κ(a)) ÐÓ (/ǫ mach ). We will apply Lemma C.6 to bound these log-terms. For any matrix A satisfying A, Theorem 4. implies ( ) [ÐÓ ρ U (A)] ÐÓ n + ÐÓ + 0., σ and Theorem 4.4 implies [ÐÓ ρ L (A)] ÐÓ n + ÐÓ ( σ + ÐÓ n ( + )) +.6 ÐÓ n using σ and n > e4, Theorem 3. implies ÐÓ n + ÐÓ ( σ [ ] A ÐÓ ) + ÐÓ ÐÓ n + ÐÓ n +.6 ÐÓ n + ÐÓ ( σ ) +.68, and, [ÐÓ ( A )] ÐÓ ( + nσ) follows from the well-known fact that the expectation of A A is at most nσ (c.f., [Seg00]) and that [ÐÓ (X)] ÐÓ [X] for every positive random variable X. Thus, the expected number of digits of precision needed is at most b + 7 ( ) ÐÓ n + 3 ÐÓ + ÐÓ ( + nσ) + ÐÓ σ ÐÓ n + ÐÓ n The following conjecture would further improve the coefficient of ÐÓ (/σ). 4

16 Conjecture. Ä Ø A n¹ Ý¹n Ñ ØÖ Ü ÓÖ Û A Ò Ð Ø A Ñ ØÖ Ü Ó Ò Ô Ò ÒØ Ù Ò Ö Ò ÓÑ Ú Ö Ð ÒØ Ö Ø A Ó Ú Ö Ò σ º Ì Ò ÓÖ ÓÑ ÓÒ Ø ÒØ c Ò c º [ρ L (A)ρ U (A)κ(A) > x] nc ÐÓ c (x), xσ 6 Zero-preserving perturbations of symmetric matrices with diagonals Many matrices that occur in practice are symmetric and sparse. Moreover, many matrix algorithms take advantage of this structure. Thus, it is natural to study the smoothed analysis of algorithms under perturbations that respect symmetry and non-zero structure. In this section, we study the condition numbers and growth factors of Gaussian elimination without pivoting of symmetric matrices under perturbations that only alter their diagonal and non-zero entries. Definition 6. (Zero-preserving perturbations). Ä Ø T Ñ ØÖ Üº Ï Ò Þ ÖÓ¹ ÔÖ ÖÚ Ò Ô ÖØÙÖ Ø ÓÒ Ó T Ó Ú Ö Ò σ ØÓ Ø Ñ ØÖ Ü T Ó Ø Ò Ý Ò Ò Ô Ò ÒØ Ù Ò Ö Ò ÓÑ Ú Ö Ð Ó Ñ Ò ¼ Ò Ú Ö Ò σ ØÓ Ø ÒÓÒ¹Þ ÖÓ ÒØÖ Ó Tº In the lemmas and theorems of this section, when we express a symmetric matrix A as T+D+T T, we mean that T is lower-triangular with zeros on the diagonal and D is a diagonal matrix. By making a zero-preserving perturbation to T, we preserve the symmetry of the matrix. The main results of this section are that the smoothed condition number and growth factors of symmetric matrices under zero-preserving perturbations to T and diagonal perturbations to D have distributions similar those proved in Sections 3 and 4 for dense matrices under dense perturbations. 6. Bounding the condition number We begin by recalling that the singular values and vectors of symmetric matrices are the eigenvalues and eigenvectors. Lemma 6.. Ä Ø A = T + D + T T Ò Ö ØÖ ÖÝ n¹ Ý¹n ÝÑÑ ØÖ Ñ ØÖ Üº Ä Ø T Þ ÖÓ¹ ÔÖ ÖÚ Ò Ô ÖØÙÖ Ø ÓÒ Ó T Ó Ú Ö Ò σ Ð Ø G D ÓÒ Ð Ñ ØÖ Ü Ó Ù Ò Ö Ò ÓÑ Ú Ö Ð Ó Ú Ö Ò σ Ò Ñ Ò ¼ Ò Ð Ø D = D + G D º Ì Ò ÓÖ A = T + D + T T Proof. [ ] (T + D + T T ) x [ ] A x Ñ Ü T 5 π n3/ /xσ. [ ] ((T + D + T T ) + G D ) x. G D

17 The proof now follow from Lemma 6.3, taking T + D + T T as the base matrix. Lemma 6.3. Ä Ø A Ò Ö ØÖ ÖÝ n¹ Ý¹n ÝÑÑ ØÖ Ñ ØÖ Ü Ð Ø G D ÓÒ Ð Ñ ØÖ Ü Ó Ù Ò Ö Ò ÓÑ Ú Ö Ð Ó Ú Ö Ò σ Ò Ñ Ò ¼ Ò Ð Ø A = A + G D º Ì Ò [ ] A x π n3/ /xσ. Proof. Let x,...,x n be the diagonal entries of G D, and let g = n n x i, and i= y i = x i g. Then, [ ] ( A + G D ) x y,...,y n,g [ ] ( = A + diag(y,...,y n ) + gi) x y,...,y n,g [ ] ( Ñ Ü A + diag(y,...,y n ) + gi) x. y,...,y n g The proof now follows from Proposition 6.4 and Lemma 6.5. Proposition 6.4. Ä Ø x,...,x n Ò Ô Ò ÒØ Ù Ò Ö Ò ÓÑ Ú Ö Ð Ó Ú Ö Ò σ Û Ø Ñ Ò a,...,a n Ö Ô Ø Ú ÐÝº Ä Ø g = n n x i, Ò i= y i = x i g. Ì Ò g Ù Ò Ö Ò ÓÑ Ú Ö Ð Ó Ú Ö Ò σ /n Û Ø Ñ Ò (/n) a i Ò Ô Ò ÒØ Ó y,...,y n º Lemma 6.5. Ä Ø A Ò Ö ØÖ ÖÝ n¹ Ý¹n ÝÑÑ ØÖ Ñ ØÖ Ü Ò Ð Ø g Ù Ò Ö Ò ÓÑ Ú Ö Ð Ó Ñ Ò 0 Ò Ú Ö Ò σ /nº Ä Ø A = A + giº Ì Ò [ ] A x π n3/ /xσ. Proof. Let λ,...,λ n be the eigenvalues of A. Then, ( A + gi) = Ñ Ò λ i + g. i 6

18 By Lemma A., [ λ i g < ǫ] < [ ] Ñ Ò λ i g < ǫ < i nǫ/σ; so, π π n3/ ǫ/σ. As in Section 3, we can now prove: Theorem 6.6 (Condition number of symmetric matrices). Ä Ø A = T + D + T T Ò Ö ØÖ ÖÝ n¹ Ý¹n ÝÑÑ ØÖ Ñ ØÖ Ü Ø Ý Ò A º Ä Ø σ Ð Ø T Þ ÖÓ¹ÔÖ ÖÚ Ò Ô ÖØÙÖ Ø ÓÒ Ó T Ó Ú Ö Ò σ Ð Ø G D ÓÒ Ð Ñ ØÖ Ü Ó Ù Ò Ö Ò ÓÑ Ú Ö Ð Ó Ú Ö Ò σ Ò Ñ Ò 0 Ò Ð Ø D = D + G D º Ì Ò ÓÖ A = T + D + T T n [κ(a) x] 4 π xσ ( + ) ÐÓ (x)/n Proof. As in the proof of Theorem 3.3, we can apply the techniques used in the proof of [DS0, Theorem II.7], to show [ A A ] d + k < e k /. The rest of the proof follows the outline of the proof of Theorem 3.3, using Lemma 6. instead of Theorem Bounding entries in U In this section, we will prove: Theorem 6.7 (ρ U (A) of symmetric matrices). Ä Ø A = T + D + T T Ò Ö ØÖ ÖÝ n¹ Ý¹n ÝÑÑ ØÖ Ñ ØÖ Ü Ø Ý Ò A º Ä Ø σ Ð Ø T Þ ÖÓ¹ÔÖ ÖÚ Ò Ô ÖØÙÖ Ø ÓÒ Ó T Ó Ú Ö Ò σ Ð Ø G D ÓÒ Ð Ñ ØÖ Ü Ó Ù Ò Ö Ò ÓÑ Ú Ö Ð Ó Ú Ö Ò σ Ò Ñ Ò 0 Ò Ð Ø D = D + G D º Ì Ò ÓÖ A = T + D + T T [ρ U (A) > + x] n 7/ 7 π xσ Proof. We proceed as in the proof of Theorem 4., where we derived (4.) U k,k:n + A k,:k A A :k,:k + k A k,:k A :k,:k + A k A k,:k :k,:k 7

19 Hence [ Uk,k:n A ] [ A > + x A k,:k (k ) [ A k,:k ] π xσ + jσ (k ) π xσ where j is the number of non-zeros in A k,:k, k(k ). π xσ :k,:k > Applying a union bound over k, n [ρ U (A) > x] k(k ) π xσ k= n 7/ 7 π xσ. ] x k, by Lemmas 6. and C.4, 6.3 Bounding entries in L As in Section 4., we derive a bound on the growth factor of L. As before, we will show that it is unlikely that A (k ) j,k is large while A (k ) k,k is small. However, our techniques must differ from those used in Section 4., as the proof in that section made critical use of the independence of A k,:(k ) and A :(k ),k. Theorem 6.8 (ρ L (A) of symmetric matrices). Ä Ø σ º Ä Ø A = T + D+ T T Ò Ö ØÖ ÖÝ n¹ Ý¹n ÝÑÑ ØÖ Ñ ØÖ Ü Ø Ý Ò A º Ä Ø T Þ ÖÓ¹ÔÖ ÖÚ Ò Ô ÖØÙÖ Ø ÓÒ Ó T Ó Ú Ö Ò σ Ð Ø G D ÓÒ Ð Ñ ØÖ Ü Ó Ù Ò Ö Ò ÓÑ Ú Ö Ð Ó Ú Ö Ò σ Ò Ñ Ò 0 Ò Ð Ø D = D + G D º Ì Ò ÓÖ A = T + D + T T Proof. Using Lemma 6.9, we obtain for all k [ρ L (A) > x] 3.n4 xσ ÐÓ 3/( e π/xσ ). [ j > k : L j,k > x] [ L(k+):n,k > x ] 3.n xσ ÐÓ 3/( e π/xσ ). 8

20 Applying a union bound over the choices for k, we then have [ j,k : L j,k > x] 3.n3 xσ ÐÓ 3/( e π/xσ ). The result now follows from the fact that L is at most n times the largest entry in L. Lemma 6.9. ÍÒ Ö Ø ÓÒ Ø ÓÒ Ó Ì ÓÖ Ñ º Proof. We recall that [ L(k+):n,k > x ] 3.n xσ ÐÓ 3/( e π/xσ ). L k+:n,k = A k+:n,k A k+:n,:k A :k,:k A :k,k A k,k A k,:k A :k,:k A :k,k Because of the symmetry of A, A k,:k is the same as A :k,k, so we can no longer use the proof that worked in Section 4.. Instead we will bound the tails of the numerator and denominator separately. Consider the numerator first. Setting v = A :k,:k A :k,k, the numerator can be written A k+:n,:k ( v T A k+:n,:k,a :k,:k ). We will now prove [ ) v T A k+:n,:k( ] ( n ( + σ ) ÐÓ (xσ)) + n > x. (6.) π xσ It suffices to prove this for all x for which the right-hand side is less than, so in particular it suffices to consider x for which x + σ, (6.) ÐÓ (xσ) and xσ. We divide this probability accordingly to a parameter c, which we will set so that c cσ = ÐÓ (xσ). We have A k+:n,:k,a :k,:k A :(k ),:k + A k+:n,:k [ v T ) A k+:n,:k( ] > x [( ) ] v T > cx [ ( ) v T A k+:n,:k > ( ) v T ( ) v T c < cx] (6.3) (6.4) 9

21 ( ) Once v is fixed, each component of A v T k+:n,:k is a Gaussian random vector of variance ( + v )σ ( + v ) σ and mean at most A ( ) ( ) v T k+:n,:k v T. So, ( v T ) A k+:n,:k > ( v T ) c implies some term in the numerator is more than /c standard deviations from its mean, and we can therefore apply Lemma A. and a union bound to derive (6.4) π ne ( c cσ ) c cσ n π xσ ÐÓ (xσ). To bound (6.4), we note that Lemma 6. and Corollary C.5 imply [ ] A A :k,:k A n :k,k > y :(k ),:k π yσ, and so So, A :(k ),:k [( ) v T [ A > cx] A :(k ),:k n π (cx )σ π :k,:k A :k,k n ( + σ ÐÓ (xσ)), by (6.). xσ ( ) n (6.) π xσ ÐÓ (xσ) + n ( + σ ÐÓ (xσ)) xσ ( n ( + σ ) ÐÓ (xσ)) + n, π xσ ] > cx by the assumption xσ, which proves (6.). As for the denominator, we note that A k,k is a independent of all the other terms, and hence [ ] Ak,k A k,:k A :k,:k A :k,k < /x π xσ, 0

22 by Lemma A.. Applying Corollary C.3 with ( ) α = n + n π β = 4n σ γ = π π to combine this inequality with (6.), we derive the bound ( (( πxσ n + n ) ) σ/3 n + n ÐÓ 3/( )) π/xσ n ( πxσ )( σ/3 ÐÓ 3/( ) ) π/xσ + as σ. 7 Conclusions and open problems 7. Generality of results 3.n xσ ÐÓ 3/( e π/xσ ), With the exception of the proof of Theorem 3., the only properties of Gaussian random vectors that we used in Sections 3 and 4 are. there is a constant c for which the probability that a Gaussian random vector has distance less than ǫ to a hyperplane is at most cǫ, and. it is exponentially unlikely that a Gaussian random vector lies far from its mean. Moreover, a result similar to Theorem 3. but with an extra factor of d could be proved using just fact. In fact, results of a character similar to ours would still hold if the second condition were reduced to a polynomial probability. Many other families of perturbations share these properties. For example, similar results would hold if we let A = A + U, where U is a matrix of variables independently uniformly chosen in [ σ,σ], or if A = A + S, where the columns of S are chosen uniformly among those vectors of norm at most σ.

23 7. Counter-Examples The results of sections 3 and 4 do not extend to zero-preserving perturbations. For example, the following matrix remains ill-conditioned under zero-preserving perturbations A symmetric matrix that remains ill-conditioned under zero-preserving perturbations that do not alter the diagonal can be obtained by locating the above matrix in the upper-right quadrant, and its transpose in the lower-left quadrant: The following matrix maintains large growth factor under zero-preserving perturbations, regardless of whether partial pivoting or no pivoting is used. 7.3 Open Problems Questions that naturally follow from this work are: What is the probability that the perturbation of an arbitrary matrix has large growth factors under Gaussian elimination with partial pivoting.

24 Can zero-preserving perturbations of symmetric matrices have large growth factors under partial pivoting? Can zero-preserving perturbations of arbitrary matrices have large growth factors under complete pivoting? For the first question, we point out that experimental data of Trefethen and Bau[TB97, p. 68] suggest that the probability that the perturbation of an arbitrary matrix has large growth factor under partial pivoting may be exponentially smaller than without pivoting. This leads us to conjecture: Conjecture 3. Ä Ø A Ò n¹ Ý¹n Ñ ØÖ Ü ÓÖ Û A Ò Ð Ø A Ñ ØÖ Ü Ó Ò¹ Ô Ò ÒØ Ù Ò Ö Ò ÓÑ Ú Ö Ð ÒØ Ö Ø A Ó Ú Ö Ò σ º Ä Ø U Ø ÙÔÔ Ö¹ØÖ Ò ÙÐ Ö Ñ ØÖ Ü Ó Ø Ò ÖÓÑ Ø ÄÍ¹ ØÓÖ Þ Ø ÓÒ Ó A Û Ø Ô ÖØ Ð Ô ÚÓØ Ò º Ì Ö Ü Ø ÓÐÙØ ÓÒ Ø ÒØ k k Ò α ÓÖ Û [ U Ñ Ü / A Ñ Ü > x + ] n k e αxk σ Finally, we ask whether similar analyses can be performed for other algorithms of Numerical Analysis. One might start by extending Smale s program by analyzing the smoothed values of other condition numbers. 8 Acknowledgments We thank Alan Edelman for suggesting we examine growth factors, and for his continuing support of our efforts. We thank Juan Cuesta and Mario Wschebor for pointing out some mistakes in an early draft of this paper. References [ABB + 99] E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen. LAPACK Users Guide, Third Edition. SIAM, Philadelphia, 999. [AS64] [BD0] Milton Abramowitz and Irene A. Stegun, editors. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, volume 55 of Applied mathematics series. U. S. Department of Commerce, Washington, DC, USA, 964. Tenth printing, with corrections (December 97). Avrim Blum and John Dunagan. Smoothed analysis of the perceptron algorithm for linear programming. In SODA 0, pages , 00. 3

25 [Blu89] [Dem88] Lenore Blum. Lectures on a theory of computation and complexity over the reals (or an arbitrary ring). In Erica Jen, editor, The Proceedings of the 989 Complex Systems Summer School, Santa Fe, New Mexico, volume, pages 47, June 989. James Demmel. The probability that a numerical analysis problem is difficult. Math. Comp., pages , 988. [Dem97] James Demmel. Applied Numerical Linear Algebra. SIAM, 997. [DS0] K. R. Davidson and S. J. Szarek. Handbook on the Geometry of Banach spaces, chapter Local operator theory, random matrices, and Banach spaces, pages Elsevier Science, 00. [DST0] John Dunagan, Daniel A. Spielman, and Shang-Hua Teng. Smoothed analysis of Renegar s condition number for linear programming. Available at spielman/smoothedanalysis, 00. [Ede88] [Ede9] [GL83] [GLS9] Alan Edelman. Eigenvalues and condition numbers of random matrices. SIAM J. Matrix Anal. Appl., 9(4): , 988. Alan Edelman. Eigenvalue roulette and random test matrices. In Marc S. Moonen, Gene H. Golub, and Bart L. R. De Moor, editors, Linear Algebra for Large Scale and Real-Time Applications, NATO ASI Series, pages Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins Series in the Mathematical Sciences. The Johns Hopkins University Press and North Oxford Academic, Baltimore, MD, USA and Oxford, England, 983. Martin Grotschel, Laszlo Lovasz, and Alexander Schrijver. Geometric Algorithms and Combinatorial Optimization. Springer-Verlag, 99. [Hig90] Nick Higham. How accurate is Gaussian elimination? In Numerical Analysis 989, Proceedings of the 3th Dundee Conference, volume 8 of Pitman Research Notes in Mathematics, pages 37 54, 990. [JKB95] [KJ8] [LT9] N. Johnson, S. Kotz, and N. Balakrishnan. Continuous Univariate Distributions, volume. Wiley-Interscience, 995. Samuel Kotz and Norman L. Johnson, editors. Encyclopedia of Statistical Sciences, volume 6. John Wiley & Sons, 98. Michel Ledoux and Michel Talagrand. Probability in Banach Spaces. Springer-Verlag, 99. 4

26 [Ren95] [Seg00] J. Renegar. Incorporating condition measures into the complexity theory of linear programming. SIAM J. Optim., 5(3):506 54, 995. Yoav Seginer. The expected norm of random matrices. Combinatorics, Probability and Computing, 9:49 66, 000. [Sma97] Steve Smale. Complexity theory and numerical analysis. Acta Numerica, pages 53 55, 997. [Smo] [ST0] spielman/smoothedanalysis. Daniel Spielman and Shang-Hua Teng. Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. In Proceedings of the 33rd Annual ACM Symposium on the Theory of Computing (STOC 0), pages , 00. [TB97] L. N. Trefethen and D. Bau. Numerical Linear Algebra. SIAM, Philadelphia, PA, 997. [TS90] [Wil6] [YC97] Lloyd N. Trefethen and Robert S. Schreiber. Average-case stability of Gaussian elimination. SIAM Journal on Matrix Analysis and Applications, (3): , 990. J. H. Wilkinson. Error analysis of direct methods of matrix inversion. J. Assoc. Comput. Mach., 8:6 330, 96. Man-Chung Yeung and Tony F. Chan. Probabilistic analysis of Gaussian elimination without pivoting. SIAM J. Matrix Anal. Appl., 8():499 57, 997. A Gaussian random variables We recall that the probability density function of a d-dimensional Gaussian random vector with covariance matrix σ I d and mean µ is given by n(µ,σ I d )(x) = (πσ ) d/e σ dist(x,µ) Lemma A.. Ä Ø x ÙÒ Ú Ö Ø Ù Ò Ö Ò ÓÑ Ú Ö Ð ØÖ ÙØ N(0,)º Ì Ò ÓÖ ÐÐ k > º Proof. We have [x k] e k. π k [x k] = π k e x dx 5

27 putting t = x, e t π k k dt = e k. π k Lemma A.. Ä Ø x d¹ Ñ Ò ÓÒ Ð Ù Ò Ö Ò ÓÑ Ú ØÓÖ Ó Ú Ö Ò σ Ò Ð Ø H ÝÔ ÖÔÐ Ò º Ì Ò [dist (x,h) ǫ] /πǫ/σ. Lemma A.3. Ä Ø g,...,g n Ù Ò Ö Ò ÓÑ Ú Ö Ð Ó Ñ Ò ¼ Ò Ú Ö Ò º Ì Ò [ ] Ñ Ü g i ÐÓ n +. i π ÐÓ n Proof. applying Lemma A., [ ] Ñ Ü g i i = [ t=0 ÐÓ n t=0 Ñ Ü g i t i dt + ] dt ÐÓ n n [ g t] dt ÐÓ n + n e t dt ÐÓ n π t ÐÓ n + n ÐÓ n π e t dt ÐÓ n ÐÓ n) ÐÓ n + n e ( ÐÓ n π ÐÓ n = ÐÓ n + π ÐÓ n Lemma A.4 (Expectation of reciprocal of the L norm of a Gaussian vector). Ä Ø a Ò n¹ Ñ Ò ÓÒ Ð Ù Ò Ö Ò ÓÑ Ú ØÓÖ Ó Ú Ö Ò σ ÓÖ n º Ì Ò [ ] a nσ 6

28 Proof. Let a = (a,...,a n ). Without loss of generality, we assume σ =. For general σ, we can simply scale the bound by the factor /σ. It is also clear that the expectation of / a is maximized if the mean of a is zero, so we will make this assumption. Recall that the Laplace transform of a positive random variable X is defined by [ L[X](t) = X e tx] and the expectation of the reciprocal of a random variable is simply the integral of its Laplace transform. Let X be the absolute value of a standard normal random variable. The Laplace transform of X is given by L[X](t) = e tx e x dx π 0 = π e t e (x+t) dx 0 = π e t e x dx t ( ) = e t t Ö. We now set a constant c =.4 and set α to satisfy ( ) c/π = e c/π (c/π) Ö. α ) As e Ö ( t t is convex, we have the upper bound ( ) e t t Ö t α, for 0 t c/π. For t > c/π, we apply the upper bound ( ) e t t Ö π t. 7

29 We now have for n. [ ] = a 0 c/π 0 ( e t Ö (t/ )) n dt α n + + n, ( α) t n dt + (/c) (n )/ π n c/π ( ) n dt π t B Random point on sphere Lemma B.. Ä Ø u,...,u d ÙÒ Ø Ú ØÓÖ Ó Ò ÙÒ ÓÖÑÐÝ Ø Ö Ò ÓÑ Ò R d º Ì Ò ÓÖ c [ ] c u [ g c ], d Û Ö g Ù Ò Ö Ò ÓÑ Ú Ö Ð Ó Ú Ö Ò Ò Ñ Ò 0º Proof. We may obtain a random unit vector by choosing d Gaussian random variables of variance and mean 0, x,...,x d, and setting u i = x i. x + + x d We have We now note that [ u c ] [ x = d x + + c ] x d d (d )x = [ x + x d (d )x [ x + c x d t d = (d )x x + x d ] (d )c d c ], since c. 8

30 is the random variable distributed according to the t-distribution with d degrees of freedom. The lemma now follows from the fact (c.f. [JKB95, Chapter 8, Section ] or [AS64, 6.7.5]) that, for c > 0, [ t d > c ] [ g > c ], and that the distributions of t d and g are symmetric about the origin. C Combination Lemma Lemma C.. Ä Ø A Ò B ØÛÓ ÔÓ Ø Ú Ö Ò ÓÑ Ú Ö Ð º ÙÑ ½º [A x] f(x)º ¾º [B x A] g(x)º Û Ö g ÑÓÒÓØÓÒ ÐÐÝ Ö Ò Ò Ð Ñ x g(x) = 0º Ì Ò [AB x] 0 ( x f ( g t) (t))dt Proof. Let µ A denote the probability measure associated with A. We have integrating by parts, [AB x] = = = [B x/t A] dµ A(t) B ( x g dµ A (t) t) [A t] d ( x ) dt g dt t f(t) d ( x ) dt g dt t ( x f ( g t) (t))dt Corollary C. (linear-linear). Ä Ø A Ò B ØÛÓ ÔÓ Ø Ú Ö Ò ÓÑ Ú Ö Ð º ÙÑ ½º [A x] α x Ò ¾º [B x A] β x 9

31 ÓÖ ÓÑ α,β > 0º Ì Ò [AB x] αβ x ( ( )) x + ÐÒ αβ Proof. As the probability of an event can be at most, ( α ) [A x] Ñ Ò x, = f(x), and ( ) β [B x] Ñ Ò x, = g(x). Applying Lemma C. while observing g (t) = 0 for t [0,β], and f(x/t) = for t x/α, we obtain [AB x] β 0 = αβ x = αβ x αt x 0dt + x/α β ( + ÐÒ x/α β dt t + αβ x ( x αβ αt x )). β t dt + x/α β t dt Corollary C.3. Ä Ø A Ò B ØÛÓ ÔÓ Ø Ú Ö Ò ÓÑ Ú Ö Ð º ÙÑ ( ) ½º [A x] Ñ Ò º ¾º [B x A] γ xσ º, α+β ÐÒxσ σx ÓÖ ÓÑ α Ò β,γ,σ > 0º Ì Ò Proof. Define f and g by [AB x] αγ xσ f(x) = g(x) = Applying Lemma C. while observing ( ( β + 3α + )ln 3/ ( xσ { for x α σ α+β ÐÒxσ xσ for x > α σ { for x γ σ γ xσ for x > γ σ 30 γ )).

32 g (t) = 0 for t [ 0, γ σ], and f(x/t) = for t xσ/α, we obtain [AB x] = xσ/α γ/σ xσ/α γ/σ (substituting s = ÐÒ(xσ/t), t = xσe s ) α + β ÐÒ(xσ/t) xσ/t γ t σ dt + α + β ÐÒ(xσ/t) γ αγ xσ dt + t xσ xσ/α γ σt dt as α. = ÐÒα α + βs γ ÐÒ(xσ /γ) xσ xσe s xσ( se s )ds + αγ xσ ÐÒ(xσ /γ) = γ xσ = αγ xσ αγ xσ ÐÒα ( + ÐÒ ( + s(α + βs)ds + αγ xσ ( ) xσ + β ( xσ (ÐÒ 3/ αγ 3α γ ( ) ( )) β xσ 3α + ln 3/, γ ) )) ÐÒ 3/ α Lemma C.4 (linear-bounded expectation). Ä Ø A B Ò C ÔÓ Ø Ú Ö Ò ÓÑ Ú Ö Ð Ù Ø Ø [A x] α x, ÓÖ ÓÑ α > 0 Ò Ì Ò A, [B x A] [C x]. [AB x] α x [C]. Proof. Let g(x) be the distribution function of C. By Lemma C., we have ( ) αt [AB x] ( ( g) (t)) dt x = α x 0 0 = α x [C]. t(g (t)) dt 3

33 Corollary C.5 (linear-chi). Ä Ø A ÔÓ Ø Ú Ö Ò ÓÑ Ú Ö Ð Ù Ø Ø ½º [A x] α x º ÓÖ ÓÑ α > 0º ÓÖ Ú ÖÝ A Ð Ø b d¹ Ñ Ò ÓÒ Ð Ù Ò Ö Ò ÓÑ Ú ØÓÖ ÔÓ ÐÝ Ô Ò Ò ÙÔÓÒ Aµ Ó Ú Ö Ò Ø ÑÓ Ø σ ÒØ Ö Ø ÔÓ ÒØ Ó ÒÓÖÑ Ø ÑÓ Ø k Ò Ð Ø B = b º Ì Ò [AB x] α σ d + k x Proof. As [B] [B ], and it is known [KJ8, p. 77] that the expected value of B the non-central χ -distribution with non-centrality parameter b is d + b, the corollary follows from Lemma C.4. Lemma C.6 (Linear to log). Ä Ø A ÔÓ Ø Ú Ö Ò ÓÑ Ú Ö Ð º ÙÑ A [A x] α x, ÓÖ ÓÑ α º Ì Ò A [ÐÓ A] ÐÓ α +. Proof. A [ÐÓ A] = = x=0 A ÐÓ α x=0 [ÐÓ A x]dx = dx + x=ðó α x=0 Ñ Ò(, α e x)dx αe x dx = ÐÓ α +. 3

arxiv:cs/ v4 [cs.na] 21 Nov 2005

arxiv:cs/ v4 [cs.na] 21 Nov 2005 arxiv:cs/0300v4 [cs.na] Nov 005 Smoothed Analysis of the Condition Numbers and Growth Factors of Matrices Arvind Sankar Department of Mathematics Massachusetts Institute of Technology Shang-Hua Teng Department