HANSON-WRIGHT INEQUALITY AND SUB-GAUSSIAN CONCENTRATION


MARK RUDELSON AND ROMAN VERSHYNIN

Abstract. In this expository note, we give a modern proof of the Hanson-Wright inequality for quadratic forms in sub-gaussian random variables. We deduce a useful concentration inequality for sub-gaussian random vectors. Two examples are given to illustrate these results: a concentration of distances between random vectors and subspaces, and a bound on the norms of products of random and deterministic matrices.

1. Hanson-Wright inequality

The Hanson-Wright inequality is a general concentration result for quadratic forms in sub-gaussian random variables. A version of this theorem was first proved in [9, 19], however with one weak point mentioned in Remark 1.2. In this article we give a modern proof of the Hanson-Wright inequality, which automatically fixes the original weak point. We then deduce a useful concentration inequality for sub-gaussian random vectors, and illustrate it with two applications.

Our arguments use standard tools of high-dimensional probability. The reader unfamiliar with them may benefit from consulting the tutorial [18]. Still, we will recall the basic notions where possible. A random variable $\xi$ is called sub-gaussian if its distribution is dominated by that of a normal random variable. This can be expressed by requiring that $E \exp(\xi^2/K^2) \le 2$ for some $K > 0$; the infimum of such $K$ is traditionally called the sub-gaussian or $\psi_2$ norm of $\xi$. This turns the set of sub-gaussian random variables into the Orlicz space with the Orlicz function $\psi_2(t) = \exp(t^2) - 1$. A number of other equivalent definitions are used in the literature. In particular, $\xi$ is sub-gaussian if and only if $E |\xi|^p = O(p)^{p/2}$ as $p \to \infty$, so we can redefine the sub-gaussian norm of $\xi$ as
$$\|\xi\|_{\psi_2} = \sup_{p \ge 1} p^{-1/2} \big( E |\xi|^p \big)^{1/p}.$$
One can show that $\|\xi\|_{\psi_2}$ defined this way is within an absolute constant factor from the infimum of $K > 0$ mentioned above, see [18, Section 5.2.3]. One can similarly define sub-exponential random variables, i.e. by requiring that
$$\|\xi\|_{\psi_1} = \sup_{p \ge 1} p^{-1} \big( E |\xi|^p \big)^{1/p} < \infty.$$
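The moment characterization of the $\psi_2$ norm lends itself to a quick numerical sanity check. The following sketch (an illustration we add here; the truncation at $p = 10$ and the sample size are arbitrary choices) estimates $\sup_{p \ge 1} p^{-1/2} (E|\xi|^p)^{1/p}$ by Monte Carlo for two sub-gaussian distributions.

```python
import numpy as np

def psi2_norm_estimate(samples, p_max=10):
    """Estimate the moment-based sub-gaussian norm
    sup_{p >= 1} p^{-1/2} (E|xi|^p)^{1/p} from i.i.d. samples,
    truncating the supremum at p = p_max."""
    ps = np.arange(1, p_max + 1)
    moments = [np.mean(np.abs(samples) ** p) ** (1.0 / p) for p in ps]
    return max(m / np.sqrt(p) for p, m in zip(ps, moments))

rng = np.random.default_rng(0)
g = rng.standard_normal(200_000)         # N(0,1): sub-gaussian
b = rng.choice([-1.0, 1.0], 200_000)     # symmetric Bernoulli: sub-gaussian

# Both estimates are O(1): roughly 0.8 for N(0,1) (the supremum is
# attained at p = 1), and exactly 1.0 for the symmetric Bernoulli.
print(psi2_norm_estimate(g), psi2_norm_estimate(b))
```

For a heavy-tailed distribution (say, exponential), the same estimate keeps growing as `p_max` increases, reflecting the fact that $\|\xi\|_{\psi_2} = \infty$.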
For an $m \times n$ matrix $A = (a_{ij})$, recall that the operator norm of $A$ is $\|A\| = \max_{x \ne 0} \|Ax\|_2 / \|x\|_2$, and the Hilbert-Schmidt (or Frobenius) norm of $A$ is $\|A\|_{HS} = \big( \sum_{i,j} |a_{ij}|^2 \big)^{1/2}$. Throughout the paper, $C, C_1, c, c_1, \dots$ denote positive absolute constants.

Date: October 21, 2013. M. R. was partially supported by NSF grant DMS 1161372. R. V. was partially supported by NSF grants DMS 1001829 and 1265782.

Theorem 1.1 (Hanson-Wright inequality). Let $X = (X_1, \dots, X_n) \in \mathbb{R}^n$ be a random vector with independent components $X_i$ which satisfy $E X_i = 0$ and $\|X_i\|_{\psi_2} \le K$. Let $A$ be an $n \times n$ matrix. Then, for every $t \ge 0$,
$$P\big\{ |X^T A X - E X^T A X| > t \big\} \le 2 \exp\Big[ -c \min\Big( \frac{t^2}{K^4 \|A\|_{HS}^2}, \frac{t}{K^2 \|A\|} \Big) \Big].$$

Remark 1.2 (Related results). One of the aims of this note is to give a simple and self-contained proof of the Hanson-Wright inequality using only the standard toolkit of large deviation theory. Several partial results and alternative proofs are scattered in the literature. Improving upon an earlier result of Hanson and Wright [9], Wright [19] established a slightly weaker version of Theorem 1.1. Instead of the operator norm $\|A\|$ of $A = (a_{ij})$, both papers had the norm $\|(|a_{ij}|)\|$ on the right side. The latter norm can be much larger than the norm of $A$, and it is often less easy to compute. This weak point went unnoticed in several later applications of the Hanson-Wright inequality; however, it was clear to experts that it could be fixed.

A proof for the case where $X_1, \dots, X_n$ are independent symmetric Bernoulli random variables appears in the lecture notes of Nelson [14]. The moment inequality which essentially implies the result of [14] can also be found in [6]. A different approach to the Hanson-Wright inequality, due to Rauhut and Tropp, can be found in [8, Proposition 8.13]. It is presented for diagonal-free matrices (however, this assumption can be removed by treating the diagonal separately, as is done below), and for independent symmetric Bernoulli random variables (but the proof can be extended to sub-gaussian random variables).

An upper bound for $P\{ X^T A X - E X^T A X > t \}$, which is equivalent to what appears in the Hanson-Wright inequality, can be found in [10]. However, the assumptions in [10] are somewhat different. On the one hand, it is assumed that the matrix $A$ is positive-semidefinite, while in our result $A$ can be arbitrary. On the other hand, a weaker assumption is placed on the random vector $X = (X_1, \dots, X_n)$.
Instead of assuming that the coordinates of $X$ are independent sub-gaussian random variables, it is assumed in [10] that the marginals of $X$ are uniformly sub-gaussian, i.e., that $\sup_{y \in S^{n-1}} \|\langle X, y \rangle\|_{\psi_2} \le K$.

The paper [3] contains an alternative short proof of the Hanson-Wright inequality due to Latala for diagonal-free matrices. Like in the proof below, Latala's argument uses decoupling of the order 2 chaos. However, unlike the current paper, which uses a simple decoupling argument of Bourgain [2], his proof uses a more general and more difficult decoupling theorem for U-statistics due to de la Peña and Montgomery-Smith [5]. For an extensive discussion of modern decoupling methods, see [4].

Large deviation inequalities for polynomials of higher degree, which extend the Hanson-Wright type inequalities, have been obtained by Latala [11] and recently by Adamczak and Wolff [1].

Proof of Theorem 1.1. By replacing $X$ with $X/K$ we can assume without loss of generality that $K = 1$. Let us first estimate
$$p := P\big\{ X^T A X - E X^T A X > t \big\}.$$

Let $A = (a_{ij})_{i,j=1}^n$. By independence and zero mean of $X_i$, we can represent
$$X^T A X - E X^T A X = \sum_{i,j} a_{ij} X_i X_j - \sum_i a_{ii} E X_i^2 = \sum_i a_{ii} \big( X_i^2 - E X_i^2 \big) + \sum_{i,j:\, i \ne j} a_{ij} X_i X_j.$$
The problem reduces to estimating the diagonal and off-diagonal sums:
$$p \le P\Big\{ \sum_i a_{ii} \big( X_i^2 - E X_i^2 \big) > t/2 \Big\} + P\Big\{ \sum_{i,j:\, i \ne j} a_{ij} X_i X_j > t/2 \Big\} =: p_1 + p_2.$$

Step 1: diagonal sum. Note that $X_i^2 - E X_i^2$ are independent mean-zero sub-exponential random variables, and
$$\|X_i^2 - E X_i^2\|_{\psi_1} \le 2 \|X_i^2\|_{\psi_1} \le 4 \|X_i\|_{\psi_2}^2 \le 4K^2.$$
These standard bounds can be found in [18, Remark 5.18 and Lemma 5.14]. Then we can use a Bernstein-type inequality (see [18, Proposition 5.16]) and obtain
$$p_1 \le \exp\Big[ -c \min\Big( \frac{t^2}{\sum_i a_{ii}^2}, \frac{t}{\max_i |a_{ii}|} \Big) \Big] \le \exp\Big[ -c \min\Big( \frac{t^2}{\|A\|_{HS}^2}, \frac{t}{\|A\|} \Big) \Big]. \quad (1.1)$$

Step 2: decoupling. It remains to bound the off-diagonal sum
$$S := \sum_{i,j:\, i \ne j} a_{ij} X_i X_j.$$
The argument will be based on estimating the moment generating function of $S$ by decoupling and reduction to normal random variables. Let $\lambda > 0$ be a parameter whose value we will determine later. By Chebyshev's inequality, we have
$$p_2 = P\{ S > t/2 \} = P\{ \lambda S > \lambda t/2 \} \le \exp(-\lambda t/2) \, E \exp(\lambda S). \quad (1.2)$$
Consider independent Bernoulli random variables $\delta_i \in \{0, 1\}$ with $E \delta_i = 1/2$. Since $E \delta_i (1 - \delta_j)$ equals $1/4$ for $i \ne j$ and $0$ for $i = j$, we have $S = 4 \, E_\delta S_\delta$, where
$$S_\delta = \sum_{i,j} \delta_i (1 - \delta_j) a_{ij} X_i X_j.$$
Here $E_\delta$ denotes the expectation with respect to $\delta = (\delta_1, \dots, \delta_n)$. Jensen's inequality yields
$$E \exp(\lambda S) \le E_{X, \delta} \exp(4 \lambda S_\delta), \quad (1.3)$$
where $E_{X, \delta}$ denotes expectation with respect to both $X$ and $\delta$. Consider the set of indices $\Lambda_\delta = \{ i \in [n] : \delta_i = 1 \}$ and express
$$S_\delta = \sum_{i \in \Lambda_\delta, \, j \in \Lambda_\delta^c} a_{ij} X_i X_j = \sum_{j \in \Lambda_\delta^c} X_j \Big( \sum_{i \in \Lambda_\delta} a_{ij} X_i \Big).$$
Now we condition on $\delta$ and $(X_i)_{i \in \Lambda_\delta}$. Then $S_\delta$ is a linear combination of mean-zero sub-gaussian random variables $X_j$, $j \in \Lambda_\delta^c$, with fixed coefficients. It follows that the conditional distribution of $S_\delta$ is sub-gaussian, and its sub-gaussian norm

is bounded by the $\ell_2$-norm of the coefficient vector (see e.g. [18, Lemma 5.9]). Specifically,
$$\|S_\delta\|_{\psi_2} \le C \sigma_\delta, \quad \text{where} \quad \sigma_\delta^2 := \sum_{j \in \Lambda_\delta^c} \Big( \sum_{i \in \Lambda_\delta} a_{ij} X_i \Big)^2.$$
Next, we use a standard estimate of the moment generating function of centered sub-gaussian random variables, see [18, Lemma 5.5]. It yields
$$E_{(X_j)_{j \in \Lambda_\delta^c}} \exp(4 \lambda S_\delta) \le \exp\big( C \lambda^2 \|S_\delta\|_{\psi_2}^2 \big) \le \exp\big( C' \lambda^2 \sigma_\delta^2 \big).$$
Taking expectations of both sides with respect to $(X_i)_{i \in \Lambda_\delta}$, we obtain
$$E_X \exp(4 \lambda S_\delta) \le E_X \exp\big( C' \lambda^2 \sigma_\delta^2 \big) =: E_\delta. \quad (1.4)$$
Recall that this estimate holds for every fixed $\delta$. It remains to estimate $E_\delta$.

Step 3: reduction to normal random variables. Consider $g = (g_1, \dots, g_n)$, where $g_i$ are independent $N(0, 1)$ random variables. The rotation invariance of the normal distribution implies that for each fixed $\delta$ and $X$, we have
$$Z := \sum_{j \in \Lambda_\delta^c} g_j \Big( \sum_{i \in \Lambda_\delta} a_{ij} X_i \Big) \sim N(0, \sigma_\delta^2).$$
By the formula for the moment generating function of the normal distribution, we have $E_g \exp(sZ) = \exp(s^2 \sigma_\delta^2 / 2)$. Comparing this with the formula defining $E_\delta$ in (1.4), we find that the two expressions are somewhat similar. Choosing $s = (2C')^{1/2} \lambda$, we can match the two expressions as follows:
$$E_\delta = E_{X, g} \exp(C_1 \lambda Z), \quad \text{where} \quad C_1 = (2C')^{1/2}.$$
Rearranging the terms, we can write $Z = \sum_{i \in \Lambda_\delta} X_i \big( \sum_{j \in \Lambda_\delta^c} a_{ij} g_j \big)$. Then we can bound the moment generating function of $Z$ in the same way we bounded the moment generating function of $S_\delta$ in Step 2, only now relying on the sub-gaussian properties of $X_i$, $i \in \Lambda_\delta$. We obtain
$$E_\delta \le E_g \exp\Big[ C_2 \lambda^2 \sum_{i \in \Lambda_\delta} \Big( \sum_{j \in \Lambda_\delta^c} a_{ij} g_j \Big)^2 \Big].$$
To express this more compactly, let $P_\delta$ denote the coordinate projection (restriction) of $\mathbb{R}^n$ onto $\mathbb{R}^{\Lambda_\delta}$, and define the matrix $A_\delta = P_\delta A (I - P_\delta)$. Then what we obtained is
$$E_\delta \le E_g \exp\big( C_2 \lambda^2 \|A_\delta g\|_2^2 \big).$$
Recall that this bound holds for each fixed $\delta$. We have removed the original random variables $X$ from the problem, so it now becomes a problem about normal random variables $g$.

Step 4: calculation for normal random variables. By the rotation invariance of the distribution of $g$, the random variable $\|A_\delta g\|_2^2$ is distributed identically with $\sum_i s_i^2 g_i^2$, where $s_i$ denote the singular values of $A_\delta$. Hence by independence,
$$E_\delta \le E_g \exp\Big( C_2 \lambda^2 \sum_i s_i^2 g_i^2 \Big) = \prod_i E_g \exp\big( C_2 \lambda^2 s_i^2 g_i^2 \big).$$

Note that each $g_i^2$ has the chi-squared distribution with one degree of freedom, whose moment generating function is $E \exp(t g_i^2) = (1 - 2t)^{-1/2}$ for $t < 1/2$. Therefore
$$E_\delta \le \prod_i \big( 1 - 2 C_2 \lambda^2 s_i^2 \big)^{-1/2}, \quad \text{provided} \quad \max_i 2 C_2 \lambda^2 s_i^2 < 1/2.$$
Using the numeric inequality $(1 - z)^{-1/2} \le e^z$, which is valid for all $0 \le z \le 1/2$, we can simplify this as follows:
$$E_\delta \le \prod_i \exp\big( C_3 \lambda^2 s_i^2 \big) = \exp\Big( C_3 \lambda^2 \sum_i s_i^2 \Big), \quad \text{provided} \quad \max_i C_3 \lambda^2 s_i^2 < 1/2.$$
Since $\max_i s_i = \|A_\delta\| \le \|A\|$ and $\sum_i s_i^2 = \|A_\delta\|_{HS}^2 \le \|A\|_{HS}^2$, we have proved the following:
$$E_\delta \le \exp\big( C_3 \lambda^2 \|A\|_{HS}^2 \big) \quad \text{for} \quad \lambda \le c_0 / \|A\|.$$
This is a uniform bound for all $\delta$. Now we take the expectation with respect to $\delta$. Recalling (1.3) and (1.4), we obtain the following estimate on the moment generating function of $S$:
$$E \exp(\lambda S) \le E_\delta [E_\delta] \le \exp\big( C_3 \lambda^2 \|A\|_{HS}^2 \big) \quad \text{for} \quad \lambda \le c_0 / \|A\|.$$

Step 5: conclusion. Putting this estimate into the exponential Chebyshev inequality (1.2), we obtain
$$p_2 \le \exp\big( -\lambda t/2 + C_3 \lambda^2 \|A\|_{HS}^2 \big) \quad \text{for} \quad \lambda \le c_0 / \|A\|.$$
Optimizing over $\lambda$, we conclude that
$$p_2 \le \exp\Big[ -c \min\Big( \frac{t^2}{\|A\|_{HS}^2}, \frac{t}{\|A\|} \Big) \Big] =: p(A, t).$$
Now we combine this with the similar estimate (1.1) for $p_1$ and obtain $p \le p_1 + p_2 \le 2 p(A, t)$. Repeating the argument for $-A$ instead of $A$, we get $P\{ X^T A X - E X^T A X < -t \} \le 2 p(A, t)$. Combining the two events, we obtain $P\{ |X^T A X - E X^T A X| > t \} \le 4 p(A, t)$. Finally, one can reduce the factor 4 to 2 by adjusting the constant $c$ in $p(A, t)$. The proof is complete. □

2. Sub-gaussian concentration

The Hanson-Wright inequality has a useful consequence: a concentration inequality for random vectors with independent sub-gaussian coordinates.

Theorem 2.1 (Sub-gaussian concentration). Let $A$ be a fixed $m \times n$ matrix. Consider a random vector $X = (X_1, \dots, X_n)$ where $X_i$ are independent random variables satisfying $E X_i = 0$, $E X_i^2 = 1$ and $\|X_i\|_{\psi_2} \le K$. Then for any $t \ge 0$, we have
$$P\big\{ \big| \|AX\|_2 - \|A\|_{HS} \big| > t \big\} \le 2 \exp\Big( -\frac{c t^2}{K^4 \|A\|^2} \Big).$$

Remark 2.2. The conclusion of Theorem 2.1 can be alternatively formulated as follows: the random variable $Z = \|AX\|_2 - \|A\|_{HS}$ is sub-gaussian, and $\|Z\|_{\psi_2} \le C K^2 \|A\|$.
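Theorem 2.1 invites a simple simulation. The sketch below (our illustrative addition; the dimensions, the Rademacher choice for the $X_i$, and the number of trials are arbitrary) checks that the fluctuations of $\|AX\|_2$ around $\|A\|_{HS}$ are on the scale of the operator norm $\|A\|$, which is much smaller than $\|A\|_{HS}$ when the stable rank is large.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, trials = 50, 200, 2000

A = rng.standard_normal((m, n))          # a fixed (deterministic) matrix
hs = np.linalg.norm(A, "fro")            # Hilbert-Schmidt norm ||A||_HS
op = np.linalg.norm(A, 2)                # operator norm ||A||

# X has independent mean-zero, unit-variance sub-gaussian (Rademacher) entries.
X = rng.choice([-1.0, 1.0], size=(trials, n))
norms = np.linalg.norm(X @ A.T, axis=1)  # ||AX||_2 for each trial

# Theorem 2.1: deviations |‖AX‖₂ − ‖A‖_HS| are of order ||A||, not ||A||_HS.
dev = np.abs(norms - hs)
print(dev.mean() / op, dev.mean() / hs)
```

The first printed ratio is $O(1)$ while the second is small, consistent with the sub-gaussian tail $2\exp(-ct^2/(K^4\|A\|^2))$.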

Remark 2.3. A few special cases of Theorem 2.1 can be easily deduced from classical concentration inequalities. For Gaussian random variables $X_i$, this result is a standard consequence of Gaussian concentration, see e.g. [13]. For bounded random variables $X_i$, it can be deduced in a similar way from Talagrand's concentration for convex Lipschitz functions [15], see [16, Theorem 2.1.13]. For more general random variables, one can find versions of Theorem 2.1 with varying degrees of generality scattered in the literature (e.g. the appendix of [7]). However, we were unable to find Theorem 2.1 in the existing literature.

Proof. Let us apply the Hanson-Wright inequality, Theorem 1.1, to the matrix $Q = A^T A$. Since $X^T Q X = \|AX\|_2^2$, we have $E X^T Q X = \|A\|_{HS}^2$. Also, note that since all $X_i$ have unit variance, we have $K \ge 2^{-1/2}$, so $K^2$ and $K^4$ differ by at most an absolute constant factor. Thus we obtain for any $u \ge 0$ that
$$P\big\{ \big| \|AX\|_2^2 - \|A\|_{HS}^2 \big| > u \big\} \le 2 \exp\Big[ -\frac{c}{K^4} \min\Big( \frac{u^2}{\|A^T A\|_{HS}^2}, \frac{u}{\|A^T A\|} \Big) \Big].$$
Let $\varepsilon \ge 0$ be arbitrary, and let us use this estimate for $u = \varepsilon \|A\|_{HS}^2$. Since $\|A^T A\|_{HS} \le \|A\| \|A\|_{HS}$ and $\|A^T A\| = \|A\|^2$, it follows that
$$P\big\{ \big| \|AX\|_2^2 - \|A\|_{HS}^2 \big| > \varepsilon \|A\|_{HS}^2 \big\} \le 2 \exp\Big( -c \min(\varepsilon^2, \varepsilon) \, \frac{\|A\|_{HS}^2}{K^4 \|A\|^2} \Big). \quad (2.1)$$
Now let $\delta \ge 0$ be arbitrary; we shall use this inequality for $\varepsilon = \max(\delta, \delta^2)$. Observe that the (likely) event $\big| \|AX\|_2^2 - \|A\|_{HS}^2 \big| \le \varepsilon \|A\|_{HS}^2$ implies the event $\big| \|AX\|_2 - \|A\|_{HS} \big| \le \delta \|A\|_{HS}$. This can be seen by dividing both sides of the inequalities by $\|A\|_{HS}^2$ and $\|A\|_{HS}$ respectively, and using the numeric bound $\max(|z - 1|, |z - 1|^2) \le |z^2 - 1|$, which is valid for all $z \ge 0$. Using this observation along with the identity $\min(\varepsilon^2, \varepsilon) = \delta^2$, we deduce from (2.1) that
$$P\big\{ \big| \|AX\|_2 - \|A\|_{HS} \big| > \delta \|A\|_{HS} \big\} \le 2 \exp\Big( -\frac{c \delta^2 \|A\|_{HS}^2}{K^4 \|A\|^2} \Big).$$
Setting $\delta = t / \|A\|_{HS}$, we obtain the desired inequality. □

2.1. Small ball probabilities. Using a standard symmetrization argument, we can deduce from Theorem 2.1 some bounds on small ball probabilities. The following result is due to Latala et al. [12, Theorem 2.5].

Corollary 2.4 (Small ball probabilities). Let $A$ be a fixed $m \times n$ matrix. Consider a random vector $X = (X_1, \dots, X_n)$ where $X_i$ are independent random variables satisfying $E X_i = 0$, $E X_i^2 = 1$ and $\|X_i\|_{\psi_2} \le K$. Then for every $y \in \mathbb{R}^m$ we have
$$P\Big\{ \|AX - y\|_2 < \tfrac{1}{2} \|A\|_{HS} \Big\} \le 2 \exp\Big( -\frac{c \|A\|_{HS}^2}{K^4 \|A\|^2} \Big).$$

Remark 2.5.
Informally, Corollary.4 states that the small ball probablty decays exponentally n the stable rank ra) = A / A. Proof. Let X denote an ndependent copy of the random vector X. Denote p = P AX y < 1 A }. Usng ndependence and trangle nequalty, we have p = P AX y < 1 A, AX y < 1 } A P AX X ) < A }..)

The components of the random vector $X - X'$ have mean zero, variances bounded below by 2, and sub-gaussian norms bounded above by $2K$. Thus we can apply Theorem 2.1 to $2^{-1/2}(X - X')$ and conclude that
$$P\big\{ \|A(X - X')\|_2 < 2^{1/2} \big( \|A\|_{HS} - t \big) \big\} \le 2 \exp\Big( -\frac{c t^2}{K^4 \|A\|^2} \Big), \quad t \ge 0.$$
Using this with $t = (1 - 2^{-1/2}) \|A\|_{HS}$, we obtain the desired bound for (2.2). □

The following consequence of Corollary 2.4 is even more informative. It states that $\|AX - y\|_2 \gtrsim \|A\|_{HS} + \|y\|_2$ with high probability.

Corollary 2.6 (Small ball probabilities, improved). Let $A$ be a fixed $m \times n$ matrix. Consider a random vector $X = (X_1, \dots, X_n)$ where $X_i$ are independent random variables satisfying $E X_i = 0$, $E X_i^2 = 1$ and $\|X_i\|_{\psi_2} \le K$. Then for every $y \in \mathbb{R}^m$ we have
$$P\Big\{ \|AX - y\|_2 < \tfrac{1}{6} \big( \|A\|_{HS} + \|y\|_2 \big) \Big\} \le 2 \exp\Big( -\frac{c \|A\|_{HS}^2}{K^4 \|A\|^2} \Big).$$

Proof. Denote $h := \|A\|_{HS}$. Combining the conclusions of Theorem 2.1 and Corollary 2.4, we obtain that with probability at least $1 - 4 \exp\big( -c h^2 / (K^4 \|A\|^2) \big)$, the following two estimates hold simultaneously:
$$\|AX\|_2 \le \tfrac{3}{2} h \quad \text{and} \quad \|AX - y\|_2 \ge \tfrac{1}{2} h. \quad (2.3)$$
Suppose this event occurs. Then by the triangle inequality,
$$\|AX - y\|_2 \ge \|y\|_2 - \|AX\|_2 \ge \|y\|_2 - \tfrac{3}{2} h.$$
Combining this with the second inequality in (2.3), we obtain
$$\|AX - y\|_2 \ge \max\Big( \tfrac{1}{2} h, \ \|y\|_2 - \tfrac{3}{2} h \Big) \ge \tfrac{1}{6} \big( h + \|y\|_2 \big).$$
The proof is complete. □

3. Two applications

Concentration results like Theorem 2.1 have many useful consequences. We include two applications in this article; the reader will certainly find more. The first application is a concentration of the distance from a random vector to a fixed subspace. For random vectors with bounded components, one can find a similar result in [16, Corollary 2.1.19], where it was deduced from Talagrand's concentration inequality.

Corollary 3.1 (Distance between a random vector and a subspace). Let $E$ be a subspace of $\mathbb{R}^n$ of dimension $d$. Consider a random vector $X = (X_1, \dots, X_n)$ where $X_i$ are independent random variables satisfying $E X_i = 0$, $E X_i^2 = 1$ and $\|X_i\|_{\psi_2} \le K$. Then for any $t \ge 0$, we have
$$P\big\{ \big| d(X, E) - \sqrt{n - d} \big| > t \big\} \le 2 \exp\big( -c t^2 / K^4 \big).$$

Proof. The conclusion follows from Theorem 2.1 applied to $A = P_{E^\perp}$, the orthogonal projection onto $E^\perp$. Indeed, $d(X, E) = \|P_{E^\perp} X\|_2$, $\|P_{E^\perp}\|_{HS} = \sqrt{\dim E^\perp} = \sqrt{n - d}$ and $\|P_{E^\perp}\| = 1$. □
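Corollary 3.1 is easy to visualize numerically. In the sketch below (an illustration we add; the dimensions and the Rademacher choice for the coordinates are arbitrary), the distance from a random vector to a random $d$-dimensional subspace clusters tightly around $\sqrt{n - d}$, with fluctuations of constant order.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, trials = 400, 100, 1000

# A d-dimensional subspace E of R^n: orthonormal basis via QR of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.standard_normal((n, d)))

# X has independent mean-zero, unit-variance sub-gaussian (Rademacher) entries.
X = rng.choice([-1.0, 1.0], size=(trials, n))

# d(X, E)^2 = ||X||^2 - ||P_E X||^2 by the Pythagorean theorem.
proj_sq = np.sum((X @ Q) ** 2, axis=1)
dist = np.sqrt(np.maximum(np.sum(X**2, axis=1) - proj_sq, 0.0))

# Corollary 3.1: d(X, E) concentrates around sqrt(n - d) with O(1) fluctuations.
print(dist.mean(), np.sqrt(n - d), dist.std())
```

Note that the spread of `dist` stays $O(1)$ as $n$ and $d$ grow, even though $\sqrt{n - d}$ itself grows.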

Our second application of Theorem 2.1 is to operator norms of random matrices. The result essentially states that an $m \times n$ matrix $BG$ obtained as a product of a deterministic matrix $B$ and a random matrix $G$ with independent sub-gaussian entries satisfies $\|BG\| \lesssim \|B\|_{HS} + \sqrt{n} \|B\|$ with high probability. For random matrices with heavy-tailed rather than sub-gaussian components, this problem was studied in [17].

Theorem 3.2 (Norms of random matrices). Let $B$ be a fixed $m \times N$ matrix, and let $G$ be an $N \times n$ random matrix with independent entries that satisfy $E G_{ij} = 0$, $E G_{ij}^2 = 1$ and $\|G_{ij}\|_{\psi_2} \le K$. Then for any $s, t \ge 1$ we have
$$P\big\{ \|BG\| > C K^2 \big( s \|B\|_{HS} + t \sqrt{n} \|B\| \big) \big\} \le 2 \exp\big( -s^2 r - t^2 n \big).$$
Here $r = \|B\|_{HS}^2 / \|B\|^2$ is the stable rank of $B$.

Proof. We need to bound $\|BGx\|_2$ uniformly for all $x \in S^{n-1}$. Let us first fix $x \in S^{n-1}$. By concatenating the rows of $G$, we can view $G$ as a long vector in $\ell_2^{Nn}$. Consider the linear operator $T : \ell_2^{Nn} \to \ell_2^m$ defined as $T(G) = BGx$, and let us apply Theorem 2.1 to $T(G)$. To this end, it is not difficult to see that the Hilbert-Schmidt norm of $T$ equals $\|B\|_{HS}$ and the operator norm of $T$ is at most $\|B\|$. The latter follows from $\|BGx\|_2 \le \|B\| \|G\| \|x\|_2 \le \|B\| \|G\|_{HS}$, and from the fact that $\|G\|_{HS}$ is the Euclidean norm of $G$ as a vector in $\ell_2^{Nn}$. Then for any $u \ge 0$, we have
$$P\big\{ \|BGx\|_2 > \|B\|_{HS} + u \big\} \le 2 \exp\Big( -\frac{c u^2}{K^4 \|B\|^2} \Big).$$
The last part of the proof is a standard covering argument. Let $\mathcal{N}$ be a $(1/2)$-net of $S^{n-1}$ in the Euclidean metric. We can choose this net so that $|\mathcal{N}| \le 5^n$, see [18, Lemma 5.2]. By a union bound, with probability at least
$$1 - 5^n \cdot 2 \exp\Big( -\frac{c u^2}{K^4 \|B\|^2} \Big), \quad (3.1)$$
every $x \in \mathcal{N}$ satisfies $\|BGx\|_2 \le \|B\|_{HS} + u$. On this event, the approximation lemma (see [18, Lemma 5.2]) implies that every $x \in S^{n-1}$ satisfies $\|BGx\|_2 \le 2 (\|B\|_{HS} + u)$. It remains to choose $u = C K^2 (s \|B\|_{HS} + t \sqrt{n} \|B\|)$ with a sufficiently large absolute constant $C$ in order to make the failure probability in (3.1) smaller than $2 \exp(-s^2 r - t^2 n)$. This completes the proof. □

Remark 3.3. A couple of special cases in Theorem 3.2 are worth mentioning. If $B = P$ is a projection in $\mathbb{R}^N$ of rank $r$, then
$$P\big\{ \|PG\| > C K^2 \big( s \sqrt{r} + t \sqrt{n} \big) \big\} \le 2 \exp\big( -s^2 r - t^2 n \big).$$
The same holds if $B = P$ is an $r \times N$ matrix such that $P P^T = I_r$.
In particular, if $B = I_N$ we obtain
$$P\big\{ \|G\| > C K^2 \big( s \sqrt{N} + t \sqrt{n} \big) \big\} \le 2 \exp\big( -s^2 N - t^2 n \big).$$
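The special case $B = I_N$ is the classical bound $\|G\| \lesssim \sqrt{N} + \sqrt{n}$ for a random matrix with independent sub-gaussian entries, which is easy to check numerically (a sketch we add for illustration; the dimensions and the Rademacher entries are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
N, n = 400, 100

# G with independent mean-zero, unit-variance sub-gaussian (Rademacher) entries.
G = rng.choice([-1.0, 1.0], size=(N, n))
op = np.linalg.norm(G, 2)  # operator norm ||G|| (largest singular value)

# Remark 3.3 with B = I_N: ||G|| is of order sqrt(N) + sqrt(n) with high probability.
print(op, np.sqrt(N) + np.sqrt(n))
```

For comparison, $\|G\|_{HS} = \sqrt{Nn}$ is much larger; the theorem captures the fact that no single direction accumulates more than a $\sqrt{N} + \sqrt{n}$ share of that mass.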

3.1. Complexification. We formulated the results in Sections 2 and 3 for real matrices and real-valued random variables. Using a standard complexification trick, one can easily obtain complex versions of these results. Let us show how to complexify Theorem 2.1; the other applications follow from it.

Suppose $A$ is a complex matrix while $X$ is a real-valued random vector as before. Then we can apply Theorem 2.1 to the real $2m \times n$ matrix $\tilde{A} := \begin{bmatrix} \operatorname{Re} A \\ \operatorname{Im} A \end{bmatrix}$. Note that $\|\tilde{A} X\|_2 = \|AX\|_2$, $\|\tilde{A}\|_{HS} = \|A\|_{HS}$ and $\|\tilde{A}\| = \|A\|$. Then the conclusion of Theorem 2.1 follows for $A$.

Suppose now that both $A$ and $X$ are complex. Let us assume that the components $X_i$ have independent real and imaginary parts, such that $E \operatorname{Re} X_i = 0$, $E (\operatorname{Re} X_i)^2 = 1/2$, $\|\operatorname{Re} X_i\|_{\psi_2} \le K$, and similarly for $\operatorname{Im} X_i$. Then we can apply Theorem 2.1 to the real $2m \times 2n$ matrix $\bar{A} := \begin{bmatrix} \operatorname{Re} A & -\operatorname{Im} A \\ \operatorname{Im} A & \operatorname{Re} A \end{bmatrix}$ and the vector $\bar{X} = \sqrt{2} \, (\operatorname{Re} X, \operatorname{Im} X) \in \mathbb{R}^{2n}$. Note that $\|\bar{A} \bar{X}\|_2 = \sqrt{2} \|AX\|_2$, $\|\bar{A}\|_{HS} = \sqrt{2} \|A\|_{HS}$ and $\|\bar{A}\| = \|A\|$. Then the conclusion of Theorem 2.1 follows for $A$.

References

[1] R. Adamczak, P. Wolff, Concentration inequalities for non-Lipschitz functions with bounded derivatives of higher order, http://arxiv.org/abs/1304.1826
[2] J. Bourgain, Random points in isotropic convex sets. In: Convex geometric analysis (Berkeley, CA, 1996), Math. Sci. Res. Inst. Publ., Vol. 34, 53-58, Cambridge Univ. Press, Cambridge (1999).
[3] F. Barthe, E. Milman, Transference principles for log-Sobolev and spectral-gap with applications to conservative spin systems, http://arxiv.org/abs/1202.5318
[4] V. H. de la Peña, E. Giné, Decoupling. From dependence to independence. Randomly stopped processes. U-statistics and processes. Martingales and beyond. Probability and its Applications (New York). Springer-Verlag, New York, 1999.
[5] V. H. de la Peña, S. J. Montgomery-Smith, Decoupling inequalities for the tail probabilities of multivariate U-statistics, Ann. Probab. 23 (1995), no. 2, 806-816.
[6] I. Diakonikolas, D. M. Kane, J. Nelson, Bounded independence fools degree-2 threshold functions, Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS 2010), Las Vegas, NV, October 23-26, 2010.
[7] L. Erdős, H.-T. Yau, J. Yin, Bulk universality for generalized Wigner matrices, Probability Theory and Related Fields 154 (2012), 341-407.
[8] S. Foucart, H. Rauhut, A Mathematical Introduction to Compressive Sensing. Applied and Numerical Harmonic Analysis. Birkhäuser, 2013.
[9] D. L. Hanson, F. T. Wright, A bound on tail probabilities for quadratic forms in independent random variables, Ann. Math. Statist. 42 (1971), 1079-1083.
[10] D. Hsu, S. Kakade, T. Zhang, A tail inequality for quadratic forms of subgaussian random vectors, Electron. Commun. Probab. 17 (2012), no. 52, 1-6.
[11] R. Latala, Estimates of moments and tails of Gaussian chaoses, Ann. Probab. 34 (2006), no. 6, 2315-2331.
[12] R. Latala, P. Mankiewicz, K. Oleszkiewicz, N. Tomczak-Jaegermann, Banach-Mazur distances and projections on random subgaussian polytopes, Discrete Comput. Geom. 38 (2007), 29-50.
[13] M. Ledoux, The concentration of measure phenomenon. Mathematical Surveys and Monographs, 89. Providence: American Mathematical Society, 2005.
[14] J. Nelson, Johnson-Lindenstrauss notes, http://web.mit.edu/minilek/www/jl_notes.pdf
[15] M. Talagrand, Concentration of measure and isoperimetric inequalities in product spaces, IHES Publ. Math. No. 81 (1995), 73-205.

[16] T. Tao, Topics in random matrix theory. Graduate Studies in Mathematics, 132. American Mathematical Society, Providence, RI, 2012.
[17] R. Vershynin, Spectral norm of products of random and deterministic matrices, Probability Theory and Related Fields 150 (2011), 471-509.
[18] R. Vershynin, Introduction to the non-asymptotic analysis of random matrices. In: Compressed Sensing, 210-268, Cambridge Univ. Press, Cambridge, 2012.
[19] F. T. Wright, A bound on tail probabilities for quadratic forms in independent random variables whose distributions are not necessarily symmetric, Ann. Probability 1 (1973), 1068-1070.

Department of Mathematics, University of Michigan, 530 Church St., Ann Arbor, MI 48109, U.S.A.
E-mail address: {rudelson, romanv}@umich.edu