18.409 The Behavior of Algorithms in Practice    2/14/2
Lecturer: Dan Spielman    Lecture 3
Scribe: Arvind Sankar

1 Largest singular value

In order to bound the condition number, we need an upper bound on the largest singular value in addition to the lower bound on the smallest that we derived last class. Since the largest singular value of $A + G$ can be bounded by
\[ \sigma_n(A + G) = \|A + G\| \le \|A\| + \|G\| \]
and we can't really do much about $\|A\|$, the important thing to do is bound $\|G\|$. To start off with a weak but easy bound, we use the following simple lemma.

Lemma 1. If $a_i$ denote the columns of the matrix $A$, then
\[ \max_i \|a_i\| \le \|A\| \le \sqrt{d}\, \max_i \|a_i\| \]

Proof. If $e_i$ denotes the vector with 1 in the $i$th component but 0's everywhere else, then $A e_i = a_i$. Hence the left-hand inequality is clear. For the other inequality, let $x$ be a unit vector and write
\[ Ax = A\Big(\sum_i x_i e_i\Big) = \sum_i x_i a_i \]
Therefore
\[ \|Ax\| \le \sum_i |x_i|\, \|a_i\| \]
Applying Cauchy-Schwarz and using the fact that $\|x\| = 1$, we get
\[ \|Ax\| \le \sqrt{\sum_i x_i^2}\, \sqrt{\sum_i \|a_i\|^2} \le \sqrt{d \max_i \|a_i\|^2} = \sqrt{d}\, \max_i \|a_i\| \]
which is what we want.

If $g$ is a vector of $d$ independent Gaussian random variables with mean 0 and variance 1, then $\|g\|^2$ is distributed according to the $\chi^2$ distribution with $d$ degrees of freedom, which has density function
\[ \frac{x^{d/2-1} e^{-x/2}}{\Gamma(d/2)\, 2^{d/2}} \]
We need the following bound on how large a $\chi^2$ random variable can be.
Lemma 2. If $X$ is a random variable distributed according to the $\chi^2$ distribution with $d$ degrees of freedom, then
\[ \Pr\{X \ge kd\} \le k^{d/2-1} e^{-d(k-1)/2} \]

By Lemma 1, $\|G\| \ge kd$ implies $\max_i \|g_i\| \ge k\sqrt{d}$, where $g_i$ are the columns of $G$. Applying lemma 2 (with $k^2$ in place of $k$) to each $\|g_i\|^2$ and using the union bound, we get
\[ \Pr\{\|G\| \ge kd\} \le d\, k^{d-2} e^{-d(k^2-1)/2} \]

2 A sharper bound using nets

The bound above is unsatisfying: for any fixed unit vector $x$, the vector $Gx$ is a Gaussian random vector, and so its length should be about $\sqrt{d}$ on average. This section shows how to use this idea to get a bound on $\|G\|$ that grows as $\sqrt{d}$ rather than as $d$.

Let $S^{d-1}$ denote the $(d-1)$-dimensional unit sphere (the boundary of the unit ball in $d$ dimensions).

Definition 1. A $\lambda$-net on $S^{d-1}$ is a collection of points $\{x_1, x_2, \ldots, x_n\}$ such that for any $x \in S^{d-1}$,
\[ \min_i \|x - x_i\| \le \lambda \]

We will use only 1-nets, and the following lemma claims that they need not be too large.

Lemma 3. For $d \ge 2$, there exists a 1-net with at most $2^d (d-1)$ points.

Using this lemma, we can prove the following bound on $\|G\|$:

Lemma 4. If $G$ is a matrix of standard normal variables, then
\[ \Pr\{\|G\| \ge 2k\sqrt{d}\} \le 2^d (d-1)\, k^{d-2} e^{-d(k^2-1)/2} \]
(This lemma appears with a slightly different bound as lemma 2.8 on pg. 907 of [Sza90].)

Proof. Let $N$ be the 1-net given by lemma 3. Let $G = U \Sigma V^T$ be the singular value decomposition of $G$, and let $u_i$ and $v_i$ be the columns of $U$ and $V$ respectively. By definition of the net, there exists a vector $x \in N$ such that
\[ \|v_n - x\| \le 1 \]
Since $v_n$ and $x$ are unit vectors, $\|v_n - x\|^2 = 2 - 2\langle v_n, x \rangle$, so this is equivalent to
\[ \langle v_n, x \rangle \ge \frac{1}{2} \]
Expanding $x$ in the basis $v_i$, we obtain
\[ x = \sum_i x_i v_i \]
with $x_n \ge 1/2$. Hence
\[ \|Gx\| = \Big\|\sum_i x_i G v_i\Big\| = \Big\|\sum_i x_i \sigma_i u_i\Big\| \ge x_n \sigma_n \ge \|G\|/2 \]
Hence $\|G\| \ge 2k\sqrt{d}$ implies that there exists $x \in N$ such that $\|Gx\| \ge k\sqrt{d}$. By the union bound and lemma 2, we obtain
\[ \Pr\{\|G\| \ge 2k\sqrt{d}\} \le |N|\, k^{d-2} e^{-d(k^2-1)/2} \]
which is the stated result.

3 Gaussian elimination

In the next couple of lectures, we will use the results we have proved to analyze Gaussian elimination. Briefly, Gaussian elimination solves a system $Ax = b$ by performing row and column operations on $A$ to reduce it to an upper triangular matrix, which can then be easily solved. Theoretically, one can view this process as factoring $A$ into a product of a lower triangular matrix representing the row operations performed (actually, their inverses), and an upper triangular matrix representing the result of these operations. This is called the LU factorization of $A$.

There are three pivoting strategies one can use while performing this algorithm (pivoting is the process of permuting rows and/or columns before doing the elimination).

1. No pivoting: Just what it says. This can be done only if we never run into zeros on the diagonal. This is easy to analyze.

2. Partial pivoting: Here only row permutations are permitted. The strategy is to bring the largest entry in the column we are considering onto the diagonal. The LU factorization now actually has to be written as $LU = PA$, where $P$ is a permutation matrix representing the row permutations performed. Partial pivoting guarantees that no entry in $L$ can exceed 1 in absolute value.

3. Complete pivoting: Here both row and column permutations are permitted, and the strategy is to move the largest entry in the part of the matrix that we have not yet processed to the diagonal. The factorization now looks like $LU = PAQ$, where $P$ and $Q$ are permutation matrices.
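The partial pivoting strategy in the list above can be sketched in a few lines. This is a minimal pure-Python illustration and not code from the lecture: the function name `lu_partial_pivoting` and the choice to return the permutation $P$ as an index list `perm` (row $i$ of $PA$ is row `perm[i]` of $A$) are assumptions made for this example.

```python
def lu_partial_pivoting(A):
    """Return (perm, L, U) with L*U = P*A, where row i of P*A is row perm[i] of A.

    Hypothetical helper for illustration; A is a square list of lists.
    """
    n = len(A)
    U = [[float(v) for v in row] for row in A]  # working copy, reduced to U
    L = [[0.0] * n for _ in range(n)]
    perm = list(range(n))
    for j in range(n):
        # Partial pivoting: among rows j..n-1, pick the entry of column j
        # that is largest in absolute value and move it onto the diagonal.
        p = max(range(j, n), key=lambda i: abs(U[i][j]))
        if U[p][j] == 0.0:
            raise ValueError("matrix is singular")
        if p != j:
            U[j], U[p] = U[p], U[j]
            L[j], L[p] = L[p], L[j]  # carry along multipliers computed so far
            perm[j], perm[p] = perm[p], perm[j]
        L[j][j] = 1.0
        for i in range(j + 1, n):
            m = U[i][j] / U[j][j]  # |m| <= 1 because the pivot is the largest entry
            L[i][j] = m
            for k in range(j, n):
                U[i][k] -= m * U[j][k]
    return perm, L, U


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]]
perm, L, U = lu_partial_pivoting(A)
print("row order:", perm)
print("largest |L| entry:", max(abs(v) for row in L for v in row))
```

Because each pivot is the largest entry in its column, every multiplier stored in `L` has absolute value at most 1, which is the guarantee stated for partial pivoting. Nothing comparable constrains the entries of `U`.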
Wilkinson showed that if $\hat{L}$, $\hat{U}$ and $\hat{x}$ represent the computed values of $L$, $U$ and $x$ in floating point to an accuracy of $\epsilon$, then
\[ (A + \delta A)\hat{x} = b \]
with $\delta A$ such that
\[ \|\delta A\| \le d\epsilon\,(3\|A\| + 5\|L\|\,\|U\|) \]

Matlab uses partial pivoting, and it can be shown that there exist matrices $A$ for which partial pivoting fails, in the sense that $U$ becomes exponentially large (in $d$). This leads to a total loss of precision unless at least $d$ bits are used to store intermediate results. Wilkinson also showed that for complete pivoting,
\[ \frac{\|U\|}{\|A\|} \le d^{\frac{1}{2} \lg d} \]
which means that the number of bits required is only $\lg^2 d$ in the worst case. However, complete pivoting is much more expensive in floating point than partial pivoting, which seems to work quite well in practice. One of the goals of this class is to understand why. In the next couple of lectures, we will show in fact that no pivoting does well most of the time.

4 Proof of technical lemmas

For completeness, we give the proofs of lemmas 2 and 3.

Proof of lemma 2. We have
\[ \Pr\{X \ge kd\} = \frac{1}{\Gamma(d/2)\, 2^{d/2}} \int_{kd}^{\infty} x^{d/2-1} e^{-x/2}\, dx = \frac{1}{\Gamma(d/2)\, 2^{d/2}} \int_{d}^{\infty} (x + (k-1)d)^{d/2-1} e^{-(k-1)d/2 - x/2}\, dx \]
Using $x + (k-1)d \le kx$, which holds for $x \ge d$ when $k \ge 1$, we get
\[ \Pr\{X \ge kd\} \le \frac{k^{d/2-1} e^{-(k-1)d/2}}{\Gamma(d/2)\, 2^{d/2}} \int_{d}^{\infty} x^{d/2-1} e^{-x/2}\, dx \le k^{d/2-1} e^{-(k-1)d/2} \]
and we are done.

Proof of lemma 3. Let $N$ be a maximal set of points on the unit sphere such that the great circle distance between any two points in $N$ is at least $\pi/3$. Then $N$ will be a 1-net, because if $u$ were a unit vector such that no vector in $N$ is within distance 1 of $u$, then there would be no point of $N$ within great circle distance $\pi/3$ of $u$, so $u$ could be added to $N$.
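The maximality argument above suggests a greedy construction, sketched below under stated assumptions. The sketch is not from the lecture: it scans a finite random sample of the sphere rather than the whole sphere, so the set it returns is a 1-net only for that sample, and the names `random_unit_vector` and `greedy_separated_set` are hypothetical. It uses the fact that two unit vectors at great circle distance $\theta$ are at Euclidean distance $2\sin(\theta/2)$, so great circle distance at least $\pi/3$ is the same as Euclidean distance at least 1.

```python
import math
import random


def random_unit_vector(d, rng):
    # Normalizing a standard Gaussian vector gives a uniform point on the sphere.
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in g))
        if norm > 0.0:
            return [t / norm for t in g]


def greedy_separated_set(candidates):
    """Keep a candidate iff it is at Euclidean distance >= 1 from every kept point.

    Every rejected candidate is therefore within distance 1 of some kept point.
    """
    net = []
    for u in candidates:
        if all(math.dist(u, x) >= 1.0 for x in net):
            net.append(u)
    return net


rng = random.Random(0)  # fixed seed so the run is reproducible
d = 3
candidates = [random_unit_vector(d, rng) for _ in range(2000)]
net = greedy_separated_set(candidates)
print(len(net), "points kept; lemma 3 bound:", 2 ** d * (d - 1))
```

Any set produced this way has pairwise great circle distance at least $\pi/3$, so the counting argument in the rest of the proof applies to it and its size cannot exceed the bound of lemma 3 for $d = 3$.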
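To see that $|N| \le (d-1)2^d$, observe that the sets
\[ B(x, \pi/6) = \{u \in S^{d-1} : d(u, x) \le \pi/6\}, \quad x \in N \]
are disjoint. A lower bound on the $(d-1)$-dimensional volume of each $B(x, \pi/6)$ is given by the volume of the $(d-1)$-dimensional ball of radius $\sin(\pi/6) = 1/2$. If $S_{d-1}$ denotes the volume of $S^{d-1}$ and $V_d$ the volume of the unit ball in $d$ dimensions, then
\[ V_d = \frac{2\pi^{d/2}}{d\,\Gamma(d/2)} \quad \text{and} \quad S_{d-1} = \frac{2\pi^{d/2}}{\Gamma(d/2)} \]
Hence
\[ |N| \le 2^{d-1}\, \frac{S_{d-1}}{V_{d-1}} = 2^{d-1} (d-1) \sqrt{\pi}\, \frac{\Gamma((d-1)/2)}{\Gamma(d/2)} \le 2^d (d-1) \]
The last inequality uses $\sqrt{\pi}\, \Gamma((d-1)/2)/\Gamma(d/2) \le 2$, which holds for $d \ge 3$. For $d = 2$ the lemma can be checked directly: three equally spaced points on the circle form a 1-net.

A somewhat tighter bound can be obtained by using the fact that
\[ \frac{\Gamma((d-1)/2)}{\Gamma(d/2)} \sim \sqrt{\frac{2}{d}} \quad \text{as } d \to \infty \]

References

[Sza90] Stanislaw J. Szarek, Spaces with large distance to $\ell_\infty^n$ and random matrices, American Journal of Mathematics 112 (1990), no. 6, 899-942.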