HANSON-WRIGHT INEQUALITY AND SUB-GAUSSIAN CONCENTRATION

MARK RUDELSON AND ROMAN VERSHYNIN

Abstract. In this expository note, we give a modern proof of the Hanson-Wright inequality for quadratic forms in sub-gaussian random variables. We deduce a useful concentration inequality for sub-gaussian random vectors. Two examples are given to illustrate these results: a concentration of distances between random vectors and subspaces, and a bound on the norms of products of random and deterministic matrices.

1. Hanson-Wright inequality

The Hanson-Wright inequality is a general concentration result for quadratic forms in sub-gaussian random variables. A version of this theorem was first proved in [2, 9], however with one weak point mentioned in Remark 1.2. In this article we give a modern proof of the Hanson-Wright inequality, which automatically fixes the original weak point. We then deduce a useful concentration inequality for sub-gaussian random vectors, and illustrate it with two applications.

Our arguments use standard tools of high-dimensional probability. The reader unfamiliar with them may benefit from consulting the tutorial [8]. Still, we will recall the basic notions where possible. A random variable $\xi$ is called sub-gaussian if its distribution is dominated by that of a normal random variable. One can express this by the growth of the moments $\mathbb{E} |\xi|^p = O(p)^{p/2}$ as $p \to \infty$. This can be quantitatively captured by the sub-gaussian norm of $\xi$, which is defined as
\[
\|\xi\|_{\psi_2} = \sup_{p \ge 1} p^{-1/2} \, (\mathbb{E} |\xi|^p)^{1/p};
\]
$\xi$ is sub-gaussian whenever this quantity is finite. One can similarly define sub-exponential random variables, by setting $\|\xi\|_{\psi_1} = \sup_{p \ge 1} p^{-1} (\mathbb{E} |\xi|^p)^{1/p}$. For an $m \times n$ matrix $A = (a_{ij})$, recall that the operator norm of $A$ is $\|A\| = \max_{x \ne 0} \|Ax\|_2 / \|x\|_2$ and the Hilbert-Schmidt (or Frobenius) norm of $A$ is $\|A\|_{HS} = (\sum_{i,j} |a_{ij}|^2)^{1/2}$. Throughout the paper, $C, C_1, c, c_1, \ldots$ denote positive absolute constants.

Theorem 1.1 (Hanson-Wright inequality). Let $X = (X_1, \ldots, X_n) \in \mathbb{R}^n$ be a random vector with independent components $X_i$ which satisfy $\mathbb{E} X_i = 0$ and $\|X_i\|_{\psi_2} \le K$. Let $A$ be an $n \times n$ matrix. Then, for every $t \ge 0$,
\[
\mathbb{P} \big\{ |X^T A X - \mathbb{E}\, X^T A X| > t \big\}
\le 2 \exp \Big[ - c \min \Big( \frac{t^2}{K^4 \|A\|_{HS}^2}, \, \frac{t}{K^2 \|A\|} \Big) \Big].
\]
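Before turning to the proof, the reader may find it helpful to see the inequality in action. The following Python sketch is an illustration added in this note, not part of the original argument; the dimension $n = 50$, the random seed, the Gaussian choice of $A$, and the Rademacher choice of $X$ are arbitrary assumptions. It checks that $\mathbb{E}\, X^T A X = \operatorname{tr}(A)$ for unit-variance coordinates, and that the typical fluctuation of the quadratic form is of order $\|A\|_{HS}$, the scale governing the sub-gaussian regime of the bound.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n))

# For a mean-zero vector X with unit-variance coordinates,
# E[X^T A X] = sum_i a_ii E[X_i^2] = trace(A).
expected = np.trace(A)

# Rademacher coordinates: mean zero, variance one, sub-gaussian with K = O(1).
samples = []
for _ in range(20000):
    X = rng.choice([-1.0, 1.0], size=n)
    samples.append(X @ A @ X)
samples = np.array(samples)

hs = np.linalg.norm(A, 'fro')   # Hilbert-Schmidt norm ||A||_HS
op = np.linalg.norm(A, 2)       # operator norm ||A||
dev = samples - expected        # fluctuation X^T A X - E[X^T A X]

print(samples.mean(), expected, dev.std(), hs, op)
```

In such a run the empirical mean of $X^T A X$ matches $\operatorname{tr}(A)$ and the standard deviation of the fluctuation is comparable to $\|A\|_{HS}$, consistent with sub-gaussian tails at scale $\|A\|_{HS}$ for moderate $t$.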
Date: June 1, 2013.
M. R. was partially supported by NSF grant DMS 1161372. R. V. was partially supported by NSF grants DMS 1001829 and 1265782.
Remark 1.2 (Relation to the original Hanson-Wright inequality). Improving upon an earlier result of Hanson and Wright [2], Wright [9] established a slightly weaker version of Theorem 1.1. Instead of the operator norm $\|A\| = \|(a_{ij})\|$, both papers had $\|(|a_{ij}|)\|$ in the right side. The latter norm can be much larger than the norm of $A$, and it is often less easy to compute. This weak point went unnoticed in several later applications of the Hanson-Wright inequality; however, it was clear to experts that it could be fixed.

Proof. By replacing $X$ with $X/K$ we can assume without loss of generality that $K = 1$. Let us first estimate
\[
p := \mathbb{P} \big\{ X^T A X - \mathbb{E}\, X^T A X > t \big\}.
\]
Let $A = (a_{ij})_{i,j=1}^n$. By independence and zero mean of $X_i$, we can represent
\[
X^T A X - \mathbb{E}\, X^T A X
= \sum_{i,j} a_{ij} X_i X_j - \sum_i a_{ii}\, \mathbb{E} X_i^2
= \sum_i a_{ii} (X_i^2 - \mathbb{E} X_i^2) + \sum_{i,j:\, i \ne j} a_{ij} X_i X_j.
\]
The problem reduces to estimating the diagonal and off-diagonal sums:
\[
p \le \mathbb{P} \Big\{ \sum_i a_{ii} (X_i^2 - \mathbb{E} X_i^2) > t/2 \Big\}
+ \mathbb{P} \Big\{ \sum_{i,j:\, i \ne j} a_{ij} X_i X_j > t/2 \Big\} =: p_1 + p_2.
\]

Step 1: diagonal sum. Note that $X_i^2 - \mathbb{E} X_i^2$ are independent mean-zero sub-exponential random variables, and
\[
\|X_i^2 - \mathbb{E} X_i^2\|_{\psi_1} \le 2 \|X_i^2\|_{\psi_1} \le 4 \|X_i\|_{\psi_2}^2 \le 4.
\]
These standard bounds can be found in [8, Remark 5.18 and Lemma 5.14]. Then we can use a Bernstein-type inequality (see [8, Proposition 5.16]) and obtain
\[
p_1 \le \exp \Big[ - c \min \Big( \frac{t^2}{\sum_i a_{ii}^2}, \, \frac{t}{\max_i |a_{ii}|} \Big) \Big]
\le \exp \Big[ - c \min \Big( \frac{t^2}{\|A\|_{HS}^2}, \, \frac{t}{\|A\|} \Big) \Big]. \tag{1.1}
\]

Step 2: decoupling. It remains to bound the off-diagonal sum
\[
S := \sum_{i,j:\, i \ne j} a_{ij} X_i X_j.
\]
The argument will be based on estimating the moment generating function of $S$ by decoupling and reduction to normal random variables. Let $\lambda > 0$ be a parameter whose value we will determine later. By Chebyshev's inequality, we have
\[
p_2 = \mathbb{P} \{ S > t/2 \} = \mathbb{P} \{ \lambda S > \lambda t/2 \}
\le \exp(-\lambda t/2)\, \mathbb{E} \exp(\lambda S). \tag{1.2}
\]
Consider independent Bernoulli random variables $\delta_i \in \{0, 1\}$ with $\mathbb{E} \delta_i = 1/2$. Since $\mathbb{E}\, \delta_i (1 - \delta_j)$ equals $1/4$ for $i \ne j$ and $0$ for $i = j$, we have $S = 4\, \mathbb{E}_\delta S_\delta$, where
\[
S_\delta = \sum_{i,j} \delta_i (1 - \delta_j) a_{ij} X_i X_j.
\]
Here $\mathbb{E}_\delta$ denotes the expectation with respect to $\delta = (\delta_1, \ldots, \delta_n)$. Jensen's inequality yields
\[
\mathbb{E} \exp(\lambda S) \le \mathbb{E}_{X, \delta} \exp(4 \lambda S_\delta) \tag{1.3}
\]
where $\mathbb{E}_{X, \delta}$ denotes expectation with respect to both $X$ and $\delta$. Consider the set of indices $\Lambda_\delta = \{ i \in [n] : \delta_i = 1 \}$ and express
\[
S_\delta = \sum_{i \in \Lambda_\delta, \, j \in \Lambda_\delta^c} a_{ij} X_i X_j
= \sum_{j \in \Lambda_\delta^c} X_j \Big( \sum_{i \in \Lambda_\delta} a_{ij} X_i \Big).
\]
Now we condition on $\delta$ and $(X_i)_{i \in \Lambda_\delta}$. Then $S_\delta$ is a linear combination of mean-zero sub-gaussian random variables $X_j$, $j \in \Lambda_\delta^c$, with fixed coefficients. It follows that the conditional distribution of $S_\delta$ is sub-gaussian, and its sub-gaussian norm is bounded by the $\ell_2$-norm of the coefficient vector (see e.g. [8, Lemma 5.9]). Specifically,
\[
\|S_\delta\|_{\psi_2} \le C \sigma_\delta, \quad \text{where} \quad
\sigma_\delta^2 := \sum_{j \in \Lambda_\delta^c} \Big( \sum_{i \in \Lambda_\delta} a_{ij} X_i \Big)^2.
\]
Next, we use a standard estimate of the moment generating function of sub-gaussian random variables, see [8, Lemma 5.5]. It yields
\[
\mathbb{E}_{(X_j)_{j \in \Lambda_\delta^c}} \exp(4 \lambda S_\delta)
\le \exp(C \lambda^2 \|S_\delta\|_{\psi_2}^2) \le \exp(C' \lambda^2 \sigma_\delta^2).
\]
Taking expectations of both sides with respect to $(X_i)_{i \in \Lambda_\delta}$, we obtain
\[
\mathbb{E}_X \exp(4 \lambda S_\delta) \le \mathbb{E}_X \exp(C' \lambda^2 \sigma_\delta^2) =: E_\delta. \tag{1.4}
\]
Recall that this estimate holds for every fixed $\delta$. It remains to estimate $E_\delta$.

Step 3: reduction to normal random variables. Consider $g = (g_1, \ldots, g_n)$ where $g_i$ are independent $N(0,1)$ random variables. The rotation invariance of the normal distribution implies that for each fixed $\delta$ and $X$, we have
\[
Z := \sum_{j \in \Lambda_\delta^c} g_j \Big( \sum_{i \in \Lambda_\delta} a_{ij} X_i \Big) \sim N(0, \sigma_\delta^2).
\]
By the formula for the moment generating function of the normal distribution, we have $\mathbb{E}_g \exp(sZ) = \exp(s^2 \sigma_\delta^2 / 2)$. Comparing this with the formula defining $E_\delta$ in (1.4), we find that the two expressions are similar. Choosing $s = C_1 \lambda$, we can match the two expressions as follows:
\[
E_\delta = \mathbb{E}_{X, g} \exp(C_1 \lambda Z), \quad \text{where } C_1 = (2C')^{1/2}.
\]
Rearranging the terms, we can write $Z = \sum_{i \in \Lambda_\delta} X_i \big( \sum_{j \in \Lambda_\delta^c} a_{ij} g_j \big)$. Then we can bound the moment generating function of $Z$ in the same way we bounded the moment generating function of $S_\delta$ in Step 2, only now relying on the sub-gaussian properties of $X_i$, $i \in \Lambda_\delta$. We obtain
\[
E_\delta \le \mathbb{E}_g \exp \Big[ C_2 \lambda^2 \sum_{i \in \Lambda_\delta} \Big( \sum_{j \in \Lambda_\delta^c} a_{ij} g_j \Big)^2 \Big].
\]
To express this more compactly, let $P_\delta$ denote the coordinate projection (restriction) of $\mathbb{R}^n$ onto $\mathbb{R}^{\Lambda_\delta}$, and define the matrix $A_\delta = P_\delta A (I - P_\delta)$. Then what we obtained is
\[
E_\delta \le \mathbb{E}_g \exp \big( C_2 \lambda^2 \|A_\delta g\|_2^2 \big).
\]
Recall that this bound holds for each fixed $\delta$. We have removed the original random variables $X$ from the problem, so it now becomes a problem about normal random variables $g$.

Step 4: calculation for normal random variables. By the rotation invariance of the distribution of $g$, the random variable $\|A_\delta g\|_2^2$ is distributed identically with $\sum_i s_i^2 g_i^2$, where $s_i$ denote the singular values of $A_\delta$. Hence by independence,
\[
E_\delta \le \mathbb{E}_g \exp \Big( C_2 \lambda^2 \sum_i s_i^2 g_i^2 \Big)
= \prod_i \mathbb{E}_g \exp \big( C_2 \lambda^2 s_i^2 g_i^2 \big).
\]
Note that each $g_i^2$ has the chi-squared distribution with one degree of freedom, whose moment generating function is $\mathbb{E} \exp(t g_i^2) = (1 - 2t)^{-1/2}$ for $t < 1/2$. Therefore
\[
E_\delta \le \prod_i \big( 1 - 2 C_2 \lambda^2 s_i^2 \big)^{-1/2}
\quad \text{provided} \quad \max_i 2 C_2 \lambda^2 s_i^2 < 1/2.
\]
Using the numeric inequality $(1 - z)^{-1/2} \le e^z$, which is valid for all $0 \le z \le 1/2$, we can simplify this as follows:
\[
E_\delta \le \prod_i \exp \big( C_3 \lambda^2 s_i^2 \big) = \exp \Big( C_3 \lambda^2 \sum_i s_i^2 \Big)
\quad \text{provided} \quad \max_i C_3 \lambda^2 s_i^2 \le 1/2.
\]
Since $\max_i s_i = \|A_\delta\| \le \|A\|$ and $\sum_i s_i^2 = \|A_\delta\|_{HS}^2 \le \|A\|_{HS}^2$, we have proved the following:
\[
E_\delta \le \exp \big( C_3 \lambda^2 \|A\|_{HS}^2 \big) \quad \text{for} \quad \lambda \le c_0 / \|A\|.
\]
This is a uniform bound for all $\delta$. Now we take the expectation with respect to $\delta$. Recalling (1.3) and (1.4), we obtain the following estimate on the moment generating function of $S$:
\[
\mathbb{E} \exp(\lambda S) \le \mathbb{E}_\delta\, E_\delta \le \exp \big( C_3 \lambda^2 \|A\|_{HS}^2 \big)
\quad \text{for} \quad \lambda \le c_0 / \|A\|.
\]

Step 5: conclusion. Putting this estimate into the exponential Chebyshev's inequality (1.2), we obtain
\[
p_2 \le \exp \big( -\lambda t/2 + C_3 \lambda^2 \|A\|_{HS}^2 \big) \quad \text{for} \quad \lambda \le c_0 / \|A\|.
\]
Optimizing over $\lambda$, we conclude that
\[
p_2 \le \exp \Big[ - c \min \Big( \frac{t^2}{\|A\|_{HS}^2}, \, \frac{t}{\|A\|} \Big) \Big] =: p(A, t).
\]
Now we combine with a similar estimate (1.1) for $p_1$ and obtain $p \le p_1 + p_2 \le 2 p(A, t)$. Repeating the argument for $-A$ instead of $A$, we get
\[
\mathbb{P} \big\{ X^T A X - \mathbb{E}\, X^T A X < -t \big\} \le 2 p(A, t).
\]
Combining the two events, we obtain
\[
\mathbb{P} \big\{ |X^T A X - \mathbb{E}\, X^T A X| > t \big\} \le 4 p(A, t).
\]
Finally, one can reduce the factor 4 to 2 by adjusting the constant $c$ in $p(A, t)$. The proof is complete.
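Two ingredients of the proof above lend themselves to a quick numerical sanity check: the decoupling identity $S = 4\, \mathbb{E}_\delta S_\delta$ from Step 2, and the chi-squared moment generating function together with the numeric inequality $(1 - z)^{-1/2} \le e^z$ from Step 4. The following Python sketch is an illustration added in this note, not part of the argument; the dimension $n = 6$, the seed, and the Gaussian data are arbitrary choices made so that all $2^n$ values of $\delta$ can be enumerated exactly.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))
X = rng.standard_normal(n)

# (1) Decoupling identity from Step 2: S = 4 E_delta S_delta, checked
# exactly by enumerating all 2^n Bernoulli vectors delta.
S = X @ A @ X - np.sum(np.diag(A) * X**2)   # off-diagonal sum over i != j
total = 0.0
for delta in itertools.product([0, 1], repeat=n):
    d = np.array(delta, dtype=float)
    weight = np.outer(d, 1.0 - d)           # delta_i (1 - delta_j)
    total += np.sum(weight * A * np.outer(X, X))
E_S_delta = total / 2**n

# (2) Chi-squared MGF from Step 4: E exp(t g^2) = (1 - 2t)^(-1/2), t < 1/2.
g = rng.standard_normal(1_000_000)
t = 0.2
mgf_mc = np.mean(np.exp(t * g**2))          # Monte Carlo estimate
mgf_exact = (1.0 - 2.0 * t) ** -0.5

# (3) Numeric inequality (1 - z)^(-1/2) <= e^z, valid for 0 <= z <= 1/2.
zs = np.linspace(0.0, 0.5, 1001)
gap = np.exp(zs) - (1.0 - zs) ** -0.5       # should be nonnegative throughout

print(S, 4 * E_S_delta, mgf_mc, mgf_exact, gap.min())
```

The decoupling check is exact up to floating-point rounding, while the chi-squared check is a Monte Carlo estimate and agrees with the closed form only up to sampling error.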
2. Sub-gaussian concentration

The Hanson-Wright inequality has a useful consequence, a concentration inequality for random vectors with independent sub-gaussian coordinates.

Theorem 2.1 (Sub-gaussian concentration). Let $A$ be a fixed $m \times n$ matrix. Consider a random vector $X = (X_1, \ldots, X_n)$ where $X_i$ are independent random variables satisfying $\mathbb{E} X_i = 0$, $\mathbb{E} X_i^2 = 1$ and $\|X_i\|_{\psi_2} \le K$. Then for any $t \ge 0$, we have
\[
\mathbb{P} \big\{ \big| \|AX\|_2 - \|A\|_{HS} \big| > t \big\}
\le 2 \exp \Big( - \frac{c t^2}{K^4 \|A\|^2} \Big).
\]

Remark 2.2. The conclusion of Theorem 2.1 can be alternatively formulated as follows: the random variable $Z = \|AX\|_2 - \|A\|_{HS}$ is sub-gaussian, and $\|Z\|_{\psi_2} \le C K^2 \|A\|$.

Remark 2.3. A few special cases of Theorem 2.1 can be easily deduced from classical concentration inequalities. For Gaussian random variables $X_i$, this result is a standard consequence of Gaussian concentration, see e.g. [4]. For bounded random variables $X_i$, it can be deduced in a similar way from Talagrand's concentration inequality for convex Lipschitz functions [5], see [6, Theorem 2.1.13]. For more general random variables, one can find versions of Theorem 2.1 with varying degrees of generality scattered in the literature (e.g. the appendix of [1]). However, we were unable to find Theorem 2.1 in the existing literature.

Proof. Let us apply the Hanson-Wright inequality, Theorem 1.1, for the matrix $Q = A^T A$. Since $X^T Q X = \|AX\|_2^2$, we have $\mathbb{E}\, X^T Q X = \|A\|_{HS}^2$. Also, note that since all $X_i$ have unit variance, we have $K \ge 2^{-1/2}$. Thus we obtain for any $u \ge 0$ that
\[
\mathbb{P} \big\{ \big| \|AX\|_2^2 - \|A\|_{HS}^2 \big| > u \big\}
\le 2 \exp \Big[ - \frac{c}{K^4} \min \Big( \frac{u^2}{\|A^T A\|_{HS}^2}, \, \frac{u}{\|A^T A\|} \Big) \Big].
\]
Let $\varepsilon \ge 0$ be arbitrary, and let us use this estimate for $u = \varepsilon \|A\|_{HS}^2$. Since $\|A^T A\|_{HS} \le \|A\| \, \|A\|_{HS}$ and $\|A^T A\| = \|A\|^2$, it follows that
\[
\mathbb{P} \big\{ \big| \|AX\|_2^2 - \|A\|_{HS}^2 \big| > \varepsilon \|A\|_{HS}^2 \big\}
\le 2 \exp \Big[ - c \min(\varepsilon, \varepsilon^2) \cdot \frac{\|A\|_{HS}^2}{K^4 \|A\|^2} \Big]. \tag{2.1}
\]
Now let $\delta \ge 0$ be arbitrary; we shall use this inequality for $\varepsilon = \max(\delta, \delta^2)$. Observe that the (likely) event $\big| \|AX\|_2^2 - \|A\|_{HS}^2 \big| \le \varepsilon \|A\|_{HS}^2$ implies the event $\big| \|AX\|_2 - \|A\|_{HS} \big| \le \delta \|A\|_{HS}$. This can be seen by dividing both sides of the inequalities by $\|A\|_{HS}^2$ and $\|A\|_{HS}$ respectively, and using the numeric bound $\max(|z - 1|, |z - 1|^2) \le |z^2 - 1|$, which is valid for all $z \ge 0$. Using this observation along with the identity $\min(\varepsilon, \varepsilon^2) = \delta^2$, we deduce from (2.1) that
\[
\mathbb{P} \big\{ \big| \|AX\|_2 - \|A\|_{HS} \big| > \delta \|A\|_{HS} \big\}
\le 2 \exp \Big( - \frac{c \delta^2 \|A\|_{HS}^2}{K^4 \|A\|^2} \Big).
\]
Setting $\delta = t / \|A\|_{HS}$, we obtain the desired inequality.

2.1. Small ball probabilities.
Using a standard symmetrization argument, we can deduce from Theorem 2.1 some bounds on small ball probabilities. The following result is due to R. Latała et al. [3, Theorem 2.5].
Corollary 2.4 (Small ball probabilities). Let $A$ be a fixed $m \times n$ matrix. Consider a random vector $X = (X_1, \ldots, X_n)$ where $X_i$ are independent random variables satisfying $\mathbb{E} X_i = 0$, $\mathbb{E} X_i^2 = 1$ and $\|X_i\|_{\psi_2} \le K$. Then for every $y \in \mathbb{R}^m$ we have
\[
\mathbb{P} \Big\{ \|AX - y\|_2 < \tfrac{1}{2} \|A\|_{HS} \Big\}
\le 2 \exp \Big( - \frac{c \|A\|_{HS}^2}{K^4 \|A\|^2} \Big).
\]

Remark 2.5. Informally, Corollary 2.4 states that the small ball probability decays exponentially in the stable rank $r(A) = \|A\|_{HS}^2 / \|A\|^2$.

Proof. Let $X'$ denote an independent copy of the random vector $X$. Denote
\[
p = \mathbb{P} \big\{ \|AX - y\|_2 < \tfrac{1}{2} \|A\|_{HS} \big\}.
\]
Using independence and the triangle inequality, we have
\[
p^2 = \mathbb{P} \big\{ \|AX - y\|_2 < \tfrac{1}{2} \|A\|_{HS}, \ \|AX' - y\|_2 < \tfrac{1}{2} \|A\|_{HS} \big\}
\le \mathbb{P} \big\{ \|A(X - X')\|_2 < \|A\|_{HS} \big\}. \tag{2.2}
\]
The components of the random vector $X - X'$ have mean zero, variances bounded below by 2 and sub-gaussian norms bounded above by $2K$. Thus we can apply Theorem 2.1 for $\frac{1}{\sqrt{2}}(X - X')$ and conclude that
\[
\mathbb{P} \big\{ \|A(X - X')\|_2 < \sqrt{2}\, (\|A\|_{HS} - t) \big\}
\le 2 \exp \Big( - \frac{c t^2}{K^4 \|A\|^2} \Big), \quad t \ge 0.
\]
Using this with $t = (1 - \frac{1}{\sqrt{2}}) \|A\|_{HS}$, we obtain the desired bound for (2.2).

The following consequence of Corollary 2.4 is even more informative. It states that $\|AX - y\|_2 \gtrsim \|A\|_{HS} + \|y\|_2$ with high probability.

Corollary 2.6 (Small ball probabilities, improved). Let $A$ be a fixed $m \times n$ matrix. Consider a random vector $X = (X_1, \ldots, X_n)$ where $X_i$ are independent random variables satisfying $\mathbb{E} X_i = 0$, $\mathbb{E} X_i^2 = 1$ and $\|X_i\|_{\psi_2} \le K$. Then for every $y \in \mathbb{R}^m$ we have
\[
\mathbb{P} \Big\{ \|AX - y\|_2 < \tfrac{1}{6} \big( \|A\|_{HS} + \|y\|_2 \big) \Big\}
\le 4 \exp \Big( - \frac{c \|A\|_{HS}^2}{K^4 \|A\|^2} \Big).
\]

Proof. Denote $h := \|A\|_{HS}$. Combining the conclusions of Theorem 2.1 and Corollary 2.4, we obtain that with probability at least $1 - 4 \exp(-c h^2 / (K^4 \|A\|^2))$, the following two estimates hold simultaneously:
\[
\|AX\|_2 \le \tfrac{3}{2} h \quad \text{and} \quad \|AX - y\|_2 \ge \tfrac{1}{2} h. \tag{2.3}
\]
Suppose this event occurs. Then by the triangle inequality,
\[
\|AX - y\|_2 \ge \|y\|_2 - \|AX\|_2 \ge \|y\|_2 - \tfrac{3}{2} h.
\]
Combining this with the second inequality in (2.3), we obtain that
\[
\|AX - y\|_2 \ge \max \Big( \tfrac{1}{2} h, \ \|y\|_2 - \tfrac{3}{2} h \Big) \ge \tfrac{1}{6} \big( h + \|y\|_2 \big).
\]
The proof is complete.
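The two phenomena of this section can be illustrated numerically. The following Python sketch is added in this note and is not taken from the paper; the dimensions $m = 40$, $n = 200$, the seed, the $1/\sqrt{n}$ scaling of a Gaussian matrix, and the Rademacher choice of $X$ are arbitrary assumptions. It checks that $\|AX\|_2$ fluctuates around $\|A\|_{HS}$ at the much smaller scale $\|A\|$ (Theorem 2.1), and that the small ball event of Corollary 2.4 (with $y = 0$) essentially never occurs when the stable rank is large.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 40, 200
A = rng.standard_normal((m, n)) / np.sqrt(n)

hs = np.linalg.norm(A, 'fro')    # ||A||_HS
op = np.linalg.norm(A, 2)        # operator norm ||A||
stable_rank = hs**2 / op**2      # r(A) from Remark 2.5

# X with independent Rademacher coordinates: mean 0, variance 1, K = O(1).
norms = []
for _ in range(5000):
    X = rng.choice([-1.0, 1.0], size=n)
    norms.append(np.linalg.norm(A @ X))
norms = np.array(norms)

dev = np.abs(norms - hs)                     # | ||AX||_2 - ||A||_HS |
small_ball_frac = np.mean(norms < 0.5 * hs)  # event of Corollary 2.4, y = 0

print(hs, op, stable_rank, dev.mean(), small_ball_frac)
```

In such a run the typical deviation is of order $\|A\|$ rather than $\|A\|_{HS}$, and the small ball fraction is (essentially) zero, consistent with a bound decaying exponentially in the stable rank.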
3. Two applications

Concentration results like Theorem 2.1 have many useful consequences. We include two applications in this article; the reader will certainly find more.

The first application is a concentration of the distance from a random vector to a fixed subspace. For random vectors with bounded components, one can find a similar result in [6, Corollary 2.1.19], where it was deduced from Talagrand's concentration inequality.

Corollary 3.1 (Distance between a random vector and a subspace). Let $E$ be a subspace of $\mathbb{R}^n$ of dimension $d$. Consider a random vector $X = (X_1, \ldots, X_n)$ where $X_i$ are independent random variables satisfying $\mathbb{E} X_i = 0$, $\mathbb{E} X_i^2 = 1$ and $\|X_i\|_{\psi_2} \le K$. Then for any $t \ge 0$, we have
\[
\mathbb{P} \big\{ \big| d(X, E) - \sqrt{n - d} \big| > t \big\} \le 2 \exp(-c t^2 / K^4).
\]

Proof. The conclusion follows from Theorem 2.1 for $A = P_{E^\perp}$, the orthogonal projection onto $E^\perp$. Indeed, $d(X, E) = \|P_{E^\perp} X\|_2$, $\|P_{E^\perp}\|_{HS} = \sqrt{\dim E^\perp} = \sqrt{n - d}$ and $\|P_{E^\perp}\| = 1$.

Our second application of Theorem 2.1 is for operator norms of random matrices. The result essentially states that an $m \times n$ matrix $BG$ obtained as a product of a deterministic matrix $B$ and a random matrix $G$ with independent sub-gaussian entries satisfies $\|BG\| \lesssim \|B\|_{HS} + \sqrt{n}\, \|B\|$ with high probability. For random matrices with heavy-tailed rather than sub-gaussian components, this problem was studied in [7].

Theorem 3.2 (Norms of random matrices). Let $B$ be a fixed $m \times N$ matrix, and let $G$ be an $N \times n$ random matrix with independent entries that satisfy $\mathbb{E} G_{ij} = 0$, $\mathbb{E} G_{ij}^2 = 1$ and $\|G_{ij}\|_{\psi_2} \le K$. Then for any $s, t \ge 1$ we have
\[
\mathbb{P} \big\{ \|BG\| > C K^2 (s \|B\|_{HS} + t \sqrt{n}\, \|B\|) \big\} \le 2 \exp(-s^2 r - t^2 n).
\]
Here $r = \|B\|_{HS}^2 / \|B\|^2$ is the stable rank of $B$.

Proof. We need to bound $\|BGx\|_2$ uniformly for all $x \in S^{n-1}$. Let us first fix $x \in S^{n-1}$. By concatenating the rows of $G$, we can view $G$ as a long vector in $\mathbb{R}^{Nn}$. Consider the linear operator $T : \ell_2^{Nn} \to \ell_2^m$ defined as $T(G) = BGx$, and let us apply Theorem 2.1 for $T(G)$. To this end, it is not difficult to see that the Hilbert-Schmidt norm of $T$ equals $\|B\|_{HS}$ and the operator norm of $T$ is at most $\|B\|$. (The latter follows from $\|BGx\|_2 \le \|B\| \|G\| \|x\|_2 \le \|B\| \|G\|_{HS}$, and from the fact that $\|G\|_{HS}$ is the Euclidean norm of $G$ as a vector in $\ell_2^{Nn}$.) Then for any $u \ge 0$, we have
\[
\mathbb{P} \big\{ \|BGx\|_2 > \|B\|_{HS} + u \big\} \le 2 \exp \Big( - \frac{c u^2}{K^4 \|B\|^2} \Big).
\]
The last part of the proof is a standard covering argument. Let $\mathcal{N}$ be a $(1/2)$-net of $S^{n-1}$ in the Euclidean metric. We can choose this net so that $|\mathcal{N}| \le 5^n$, see [8, Lemma 5.2]. By a union bound, with probability at least
\[
1 - 5^n \cdot 2 \exp \Big( - \frac{c u^2}{K^4 \|B\|^2} \Big), \tag{3.1}
\]
every $x \in \mathcal{N}$ satisfies $\|BGx\|_2 \le \|B\|_{HS} + u$. On this event, the approximation lemma (see [8, Lemma 5.2]) implies that every $x \in S^{n-1}$ satisfies $\|BGx\|_2 \le 2(\|B\|_{HS} + u)$. It remains to choose $u = C K^2 (s \|B\|_{HS} + t \sqrt{n}\, \|B\|)$ with a sufficiently large absolute constant $C$ in order to make the probability bound (3.1) smaller than $2 \exp(-s^2 r - t^2 n)$. This completes the proof.

Remark 3.3. A couple of special cases in Theorem 3.2 are worth mentioning. If $B = P$ is a projection in $\mathbb{R}^N$ of rank $r$, then
\[
\mathbb{P} \big\{ \|PG\| > C K^2 (s \sqrt{r} + t \sqrt{n}) \big\} \le 2 \exp(-s^2 r - t^2 n).
\]
The same holds if $B = P$ is an $r \times N$ matrix such that $P P^T = I_r$. In particular, if $B = I_N$ we obtain
\[
\mathbb{P} \big\{ \|G\| > C K^2 (s \sqrt{N} + t \sqrt{n}) \big\} \le 2 \exp(-s^2 N - t^2 n).
\]

3.1. Complexification. We formulated the results in Sections 2 and 3 for real matrices and real-valued random variables. Using a standard complexification trick, one can easily obtain complex versions of these results. Let us show how to complexify Theorem 2.1; the other applications follow from it.

Suppose $A$ is a complex matrix while $X$ is a real-valued random vector as before. Then we can apply Theorem 2.1 for the real $2m \times n$ matrix
\[
\tilde{A} := \begin{bmatrix} \operatorname{Re} A \\ \operatorname{Im} A \end{bmatrix}.
\]
Note that $\|\tilde{A} X\|_2 = \|AX\|_2$, $\|\tilde{A}\|_{HS} = \|A\|_{HS}$ and $\|\tilde{A}\| \le \|A\|$. Then the conclusion of Theorem 2.1 follows for $A$.

Suppose now that both $A$ and $X$ are complex. Let us assume that the components $X_i$ have independent real and imaginary parts, such that $\mathbb{E} \operatorname{Re} X_i = 0$, $\mathbb{E} (\operatorname{Re} X_i)^2 = 1$, $\|\operatorname{Re} X_i\|_{\psi_2} \le K$, and similarly for $\operatorname{Im} X_i$. Then we can apply Theorem 2.1 for the real $2m \times 2n$ matrix
\[
\bar{A} := \begin{bmatrix} \operatorname{Re} A & -\operatorname{Im} A \\ \operatorname{Im} A & \operatorname{Re} A \end{bmatrix}
\quad \text{and vector} \quad \bar{X} = (\operatorname{Re} X, \operatorname{Im} X) \in \mathbb{R}^{2n}.
\]
Note that $\|\bar{A} \bar{X}\|_2 = \|AX\|_2$, $\|\bar{A}\|_{HS} = \sqrt{2}\, \|A\|_{HS}$ and $\|\bar{A}\| = \|A\|$. Then the conclusion of Theorem 2.1 follows for $A$.

References

[1] L. Erdős, H.-T. Yau, J. Yin, Bulk universality for generalized Wigner matrices, Probability Theory and Related Fields 154 (2012), 341–407.
[2] D. L. Hanson, F. T. Wright, A bound on tail probabilities for quadratic forms in independent random variables, Ann. Math. Statist. 42 (1971), 1079–1083.
[3] R. Latała, P. Mankiewicz, K. Oleszkiewicz, N. Tomczak-Jaegermann, Banach-Mazur distances and projections on random subgaussian polytopes, Discrete Comput. Geom. 38 (2007), 29–50.
[4] M.
Ledoux, The concentration of measure phenomenon. Mathematical Surveys and Monographs, 89. Providence: American Mathematical Society, 2005.
[5] M. Talagrand, Concentration of measure and isoperimetric inequalities in product spaces, IHES Publ. Math. No. 81 (1995), 73–205.
[6] T. Tao, Topics in random matrix theory. American Mathematical Society, 2012.
[7] R. Vershynin, Spectral norm of products of random and deterministic matrices, Probability Theory and Related Fields 150 (2011), 471–509.
[8] R. Vershynin, Introduction to the non-asymptotic analysis of random matrices. Compressed sensing, 210–268, Cambridge Univ. Press, Cambridge, 2012.
[9] F. T. Wright, A bound on tail probabilities for quadratic forms in independent random variables whose distributions are not necessarily symmetric, Ann. Probability 1 (1973), 1068–1070.

Department of Mathematics, University of Michigan, 530 Church St., Ann Arbor, MI 48109, U.S.A.
E-mail address: {rudelson, romanv}@umich.edu