HANSON-WRIGHT INEQUALITY AND SUB-GAUSSIAN CONCENTRATION


MARK RUDELSON AND ROMAN VERSHYNIN

Abstract. In this expository note, we give a modern proof of the Hanson-Wright inequality for quadratic forms in sub-gaussian random variables. We deduce a useful concentration inequality for sub-gaussian random vectors. Two examples are given to illustrate these results: a concentration of distances between random vectors and subspaces, and a bound on the norms of products of random and deterministic matrices.

1. Hanson-Wright inequality

The Hanson-Wright inequality is a general concentration result for quadratic forms in sub-gaussian random variables. A version of this theorem was first proved in [9, 19], however with one weak point mentioned in Remark 1.2. In this article we give a modern proof of the Hanson-Wright inequality, which automatically fixes the original weak point. We then deduce a useful concentration inequality for sub-gaussian random vectors, and illustrate it with two applications.

Our arguments use standard tools of high-dimensional probability. The reader unfamiliar with them may benefit from consulting the tutorial [18]. Still, we will recall the basic notions where possible. A random variable $\xi$ is called sub-gaussian if its distribution is dominated by that of a normal random variable. This can be expressed by requiring that $E \exp(\xi^2/K^2) \le 2$ for some $K > 0$; the infimum of such $K$ is traditionally called the sub-gaussian or $\psi_2$ norm of $\xi$. This turns the set of sub-gaussian random variables into the Orlicz space with the Orlicz function $\psi_2(t) = \exp(t^2) - 1$. A number of other equivalent definitions are used in the literature. In particular, $\xi$ is sub-gaussian if and only if $E |\xi|^p = O(p)^{p/2}$ as $p \to \infty$, so we can redefine the sub-gaussian norm of $\xi$ as
$$\|\xi\|_{\psi_2} = \sup_{p \ge 1} p^{-1/2} \big( E |\xi|^p \big)^{1/p}.$$
One can show that $\|\xi\|_{\psi_2}$ defined this way is within an absolute constant factor from the infimum of $K > 0$ mentioned above, see [18, Section 5.2.3]. One can similarly define sub-exponential random variables, i.e. by requiring that
$$\|\xi\|_{\psi_1} = \sup_{p \ge 1} p^{-1} \big( E |\xi|^p \big)^{1/p} < \infty.$$
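The moment characterization of the $\psi_2$ norm lends itself to a quick numerical sanity check. The following sketch (an illustration we add here; the truncation at $p = 10$ and the sample size are arbitrary choices) estimates $\sup_{p \ge 1} p^{-1/2} (E|\xi|^p)^{1/p}$ by Monte Carlo for two sub-gaussian distributions.

```python
import numpy as np

def psi2_norm_estimate(samples, p_max=10):
    """Estimate the moment-based sub-gaussian norm
    sup_{p >= 1} p^{-1/2} (E|xi|^p)^{1/p} from i.i.d. samples,
    truncating the supremum at p = p_max."""
    ps = np.arange(1, p_max + 1)
    moments = [np.mean(np.abs(samples) ** p) ** (1.0 / p) for p in ps]
    return max(m / np.sqrt(p) for p, m in zip(ps, moments))

rng = np.random.default_rng(0)
g = rng.standard_normal(200_000)         # N(0,1): sub-gaussian
b = rng.choice([-1.0, 1.0], 200_000)     # symmetric Bernoulli: sub-gaussian

# Both estimates are O(1): roughly 0.8 for N(0,1) (the supremum is
# attained at p = 1), and exactly 1.0 for the symmetric Bernoulli.
print(psi2_norm_estimate(g), psi2_norm_estimate(b))
```

For a heavy-tailed distribution (say, exponential), the same estimate keeps growing as `p_max` increases, reflecting the fact that $\|\xi\|_{\psi_2} = \infty$.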
For an $m \times n$ matrix $A = (a_{ij})$, recall that the operator norm of $A$ is $\|A\| = \max_{x \ne 0} \|Ax\|_2 / \|x\|_2$, and the Hilbert-Schmidt (or Frobenius) norm of $A$ is $\|A\|_{HS} = \big( \sum_{i,j} |a_{ij}|^2 \big)^{1/2}$. Throughout the paper, $C, C_1, c, c_1, \dots$ denote positive absolute constants.

Date: October 21, 2013. M. R. was partially supported by NSF grant DMS 1161372. R. V. was partially supported by NSF grants DMS 1001829 and 1265782.

Theorem 1.1 (Hanson-Wright inequality). Let $X = (X_1, \dots, X_n) \in \mathbb{R}^n$ be a random vector with independent components $X_i$ which satisfy $E X_i = 0$ and $\|X_i\|_{\psi_2} \le K$. Let $A$ be an $n \times n$ matrix. Then, for every $t \ge 0$,
$$P\big\{ |X^T A X - E X^T A X| > t \big\} \le 2 \exp\Big[ -c \min\Big( \frac{t^2}{K^4 \|A\|_{HS}^2}, \frac{t}{K^2 \|A\|} \Big) \Big].$$

Remark 1.2 (Related results). One of the aims of this note is to give a simple and self-contained proof of the Hanson-Wright inequality using only the standard toolkit of large deviation theory. Several partial results and alternative proofs are scattered in the literature. Improving upon an earlier result of Hanson and Wright [9], Wright [19] established a slightly weaker version of Theorem 1.1. Instead of the operator norm $\|A\|$ of $A = (a_{ij})$, both papers had the norm $\|(|a_{ij}|)\|$ on the right side. The latter norm can be much larger than the norm of $A$, and it is often less easy to compute. This weak point went unnoticed in several later applications of the Hanson-Wright inequality; however, it was clear to experts that it could be fixed.

A proof for the case where $X_1, \dots, X_n$ are independent symmetric Bernoulli random variables appears in the lecture notes of Nelson [14]. The moment inequality which essentially implies the result of [14] can also be found in [6]. A different approach to the Hanson-Wright inequality, due to Rauhut and Tropp, can be found in [8, Proposition 8.13]. It is presented for diagonal-free matrices (however, this assumption can be removed by treating the diagonal separately, as is done below), and for independent symmetric Bernoulli random variables (but the proof can be extended to sub-gaussian random variables).

An upper bound for $P\{ X^T A X - E X^T A X > t \}$, which is equivalent to what appears in the Hanson-Wright inequality, can be found in [10]. However, the assumptions in [10] are somewhat different. On the one hand, it is assumed that the matrix $A$ is positive-semidefinite, while in our result $A$ can be arbitrary. On the other hand, a weaker assumption is placed on the random vector $X = (X_1, \dots, X_n)$.
Instead of assuming that the coordinates of $X$ are independent sub-gaussian random variables, it is assumed in [10] that the marginals of $X$ are uniformly sub-gaussian, i.e., that $\sup_{y \in S^{n-1}} \|\langle X, y \rangle\|_{\psi_2} \le K$.

The paper [3] contains an alternative short proof of the Hanson-Wright inequality due to Latala for diagonal-free matrices. Like in the proof below, Latala's argument uses decoupling of the order 2 chaos. However, unlike the current paper, which uses a simple decoupling argument of Bourgain [2], his proof uses a more general and more difficult decoupling theorem for U-statistics due to de la Peña and Montgomery-Smith [5]. For an extensive discussion of modern decoupling methods, see [4].

Large deviation inequalities for polynomials of higher degree, which extend the Hanson-Wright type inequalities, have been obtained by Latala [11] and recently by Adamczak and Wolff [1].

Proof of Theorem 1.1. By replacing $X$ with $X/K$ we can assume without loss of generality that $K = 1$. Let us first estimate
$$p := P\big\{ X^T A X - E X^T A X > t \big\}.$$

Let $A = (a_{ij})_{i,j=1}^n$. By independence and zero mean of $X_i$, we can represent
$$X^T A X - E X^T A X = \sum_{i,j} a_{ij} X_i X_j - \sum_i a_{ii} E X_i^2 = \sum_i a_{ii} \big( X_i^2 - E X_i^2 \big) + \sum_{i,j:\, i \ne j} a_{ij} X_i X_j.$$
The problem reduces to estimating the diagonal and off-diagonal sums:
$$p \le P\Big\{ \sum_i a_{ii} \big( X_i^2 - E X_i^2 \big) > t/2 \Big\} + P\Big\{ \sum_{i,j:\, i \ne j} a_{ij} X_i X_j > t/2 \Big\} =: p_1 + p_2.$$

Step 1: diagonal sum. Note that $X_i^2 - E X_i^2$ are independent mean-zero sub-exponential random variables, and
$$\|X_i^2 - E X_i^2\|_{\psi_1} \le 2 \|X_i^2\|_{\psi_1} \le 4 \|X_i\|_{\psi_2}^2 \le 4K^2.$$
These standard bounds can be found in [18, Remark 5.18 and Lemma 5.14]. Then we can use a Bernstein-type inequality (see [18, Proposition 5.16]) and obtain
$$p_1 \le \exp\Big[ -c \min\Big( \frac{t^2}{\sum_i a_{ii}^2}, \frac{t}{\max_i |a_{ii}|} \Big) \Big] \le \exp\Big[ -c \min\Big( \frac{t^2}{\|A\|_{HS}^2}, \frac{t}{\|A\|} \Big) \Big]. \quad (1.1)$$

Step 2: decoupling. It remains to bound the off-diagonal sum
$$S := \sum_{i,j:\, i \ne j} a_{ij} X_i X_j.$$
The argument will be based on estimating the moment generating function of $S$ by decoupling and reduction to normal random variables. Let $\lambda > 0$ be a parameter whose value we will determine later. By Chebyshev's inequality, we have
$$p_2 = P\{ S > t/2 \} = P\{ \lambda S > \lambda t/2 \} \le \exp(-\lambda t/2) \, E \exp(\lambda S). \quad (1.2)$$
Consider independent Bernoulli random variables $\delta_i \in \{0, 1\}$ with $E \delta_i = 1/2$. Since $E \delta_i (1 - \delta_j)$ equals $1/4$ for $i \ne j$ and $0$ for $i = j$, we have $S = 4 \, E_\delta S_\delta$, where
$$S_\delta = \sum_{i,j} \delta_i (1 - \delta_j) a_{ij} X_i X_j.$$
Here $E_\delta$ denotes the expectation with respect to $\delta = (\delta_1, \dots, \delta_n)$. Jensen's inequality yields
$$E \exp(\lambda S) \le E_{X, \delta} \exp(4 \lambda S_\delta), \quad (1.3)$$
where $E_{X, \delta}$ denotes expectation with respect to both $X$ and $\delta$. Consider the set of indices $\Lambda_\delta = \{ i \in [n] : \delta_i = 1 \}$ and express
$$S_\delta = \sum_{i \in \Lambda_\delta, \, j \in \Lambda_\delta^c} a_{ij} X_i X_j = \sum_{j \in \Lambda_\delta^c} X_j \Big( \sum_{i \in \Lambda_\delta} a_{ij} X_i \Big).$$
Now we condition on $\delta$ and $(X_i)_{i \in \Lambda_\delta}$. Then $S_\delta$ is a linear combination of mean-zero sub-gaussian random variables $X_j$, $j \in \Lambda_\delta^c$, with fixed coefficients. It follows that the conditional distribution of $S_\delta$ is sub-gaussian, and its sub-gaussian norm

is bounded by the $\ell_2$-norm of the coefficient vector (see e.g. [18, Lemma 5.9]). Specifically,
$$\|S_\delta\|_{\psi_2} \le C \sigma_\delta, \quad \text{where} \quad \sigma_\delta^2 := \sum_{j \in \Lambda_\delta^c} \Big( \sum_{i \in \Lambda_\delta} a_{ij} X_i \Big)^2.$$
Next, we use a standard estimate of the moment generating function of centered sub-gaussian random variables, see [18, Lemma 5.5]. It yields
$$E_{(X_j)_{j \in \Lambda_\delta^c}} \exp(4 \lambda S_\delta) \le \exp\big( C \lambda^2 \|S_\delta\|_{\psi_2}^2 \big) \le \exp\big( C' \lambda^2 \sigma_\delta^2 \big).$$
Taking expectations of both sides with respect to $(X_i)_{i \in \Lambda_\delta}$, we obtain
$$E_X \exp(4 \lambda S_\delta) \le E_X \exp\big( C' \lambda^2 \sigma_\delta^2 \big) =: E_\delta. \quad (1.4)$$
Recall that this estimate holds for every fixed $\delta$. It remains to estimate $E_\delta$.

Step 3: reduction to normal random variables. Consider $g = (g_1, \dots, g_n)$, where $g_i$ are independent $N(0, 1)$ random variables. The rotation invariance of the normal distribution implies that for each fixed $\delta$ and $X$, we have
$$Z := \sum_{j \in \Lambda_\delta^c} g_j \Big( \sum_{i \in \Lambda_\delta} a_{ij} X_i \Big) \sim N(0, \sigma_\delta^2).$$
By the formula for the moment generating function of the normal distribution, we have $E_g \exp(sZ) = \exp(s^2 \sigma_\delta^2 / 2)$. Comparing this with the formula defining $E_\delta$ in (1.4), we find that the two expressions are somewhat similar. Choosing $s = (2C')^{1/2} \lambda$, we can match the two expressions as follows:
$$E_\delta = E_{X, g} \exp(C_1 \lambda Z), \quad \text{where} \quad C_1 = (2C')^{1/2}.$$
Rearranging the terms, we can write $Z = \sum_{i \in \Lambda_\delta} X_i \big( \sum_{j \in \Lambda_\delta^c} a_{ij} g_j \big)$. Then we can bound the moment generating function of $Z$ in the same way we bounded the moment generating function of $S_\delta$ in Step 2, only now relying on the sub-gaussian properties of $X_i$, $i \in \Lambda_\delta$. We obtain
$$E_\delta \le E_g \exp\Big[ C_2 \lambda^2 \sum_{i \in \Lambda_\delta} \Big( \sum_{j \in \Lambda_\delta^c} a_{ij} g_j \Big)^2 \Big].$$
To express this more compactly, let $P_\delta$ denote the coordinate projection (restriction) of $\mathbb{R}^n$ onto $\mathbb{R}^{\Lambda_\delta}$, and define the matrix $A_\delta = P_\delta A (I - P_\delta)$. Then what we obtained is
$$E_\delta \le E_g \exp\big( C_2 \lambda^2 \|A_\delta g\|_2^2 \big).$$
Recall that this bound holds for each fixed $\delta$. We have removed the original random variables $X$ from the problem, so it now becomes a problem about normal random variables $g$.

Step 4: calculation for normal random variables. By the rotation invariance of the distribution of $g$, the random variable $\|A_\delta g\|_2^2$ is distributed identically with $\sum_i s_i^2 g_i^2$, where $s_i$ denote the singular values of $A_\delta$. Hence by independence,
$$E_\delta \le E_g \exp\Big( C_2 \lambda^2 \sum_i s_i^2 g_i^2 \Big) = \prod_i E_g \exp\big( C_2 \lambda^2 s_i^2 g_i^2 \big).$$

Note that each $g_i^2$ has the chi-squared distribution with one degree of freedom, whose moment generating function is $E \exp(t g_i^2) = (1 - 2t)^{-1/2}$ for $t < 1/2$. Therefore
$$E_\delta \le \prod_i \big( 1 - 2 C_2 \lambda^2 s_i^2 \big)^{-1/2}, \quad \text{provided} \quad \max_i 2 C_2 \lambda^2 s_i^2 < 1/2.$$
Using the numeric inequality $(1 - z)^{-1/2} \le e^z$, which is valid for all $0 \le z \le 1/2$, we can simplify this as follows:
$$E_\delta \le \prod_i \exp\big( C_3 \lambda^2 s_i^2 \big) = \exp\Big( C_3 \lambda^2 \sum_i s_i^2 \Big), \quad \text{provided} \quad \max_i C_3 \lambda^2 s_i^2 < 1/2.$$
Since $\max_i s_i = \|A_\delta\| \le \|A\|$ and $\sum_i s_i^2 = \|A_\delta\|_{HS}^2 \le \|A\|_{HS}^2$, we have proved the following:
$$E_\delta \le \exp\big( C_3 \lambda^2 \|A\|_{HS}^2 \big) \quad \text{for} \quad \lambda \le c_0 / \|A\|.$$
This is a uniform bound for all $\delta$. Now we take the expectation with respect to $\delta$. Recalling (1.3) and (1.4), we obtain the following estimate on the moment generating function of $S$:
$$E \exp(\lambda S) \le E_\delta [E_\delta] \le \exp\big( C_3 \lambda^2 \|A\|_{HS}^2 \big) \quad \text{for} \quad \lambda \le c_0 / \|A\|.$$

Step 5: conclusion. Putting this estimate into the exponential Chebyshev inequality (1.2), we obtain
$$p_2 \le \exp\big( -\lambda t/2 + C_3 \lambda^2 \|A\|_{HS}^2 \big) \quad \text{for} \quad \lambda \le c_0 / \|A\|.$$
Optimizing over $\lambda$, we conclude that
$$p_2 \le \exp\Big[ -c \min\Big( \frac{t^2}{\|A\|_{HS}^2}, \frac{t}{\|A\|} \Big) \Big] =: p(A, t).$$
Now we combine this with the similar estimate (1.1) for $p_1$ and obtain $p \le p_1 + p_2 \le 2 p(A, t)$. Repeating the argument for $-A$ instead of $A$, we get $P\{ X^T A X - E X^T A X < -t \} \le 2 p(A, t)$. Combining the two events, we obtain $P\{ |X^T A X - E X^T A X| > t \} \le 4 p(A, t)$. Finally, one can reduce the factor 4 to 2 by adjusting the constant $c$ in $p(A, t)$. The proof is complete. □

2. Sub-gaussian concentration

The Hanson-Wright inequality has a useful consequence: a concentration inequality for random vectors with independent sub-gaussian coordinates.

Theorem 2.1 (Sub-gaussian concentration). Let $A$ be a fixed $m \times n$ matrix. Consider a random vector $X = (X_1, \dots, X_n)$ where $X_i$ are independent random variables satisfying $E X_i = 0$, $E X_i^2 = 1$ and $\|X_i\|_{\psi_2} \le K$. Then for any $t \ge 0$, we have
$$P\big\{ \big| \|AX\|_2 - \|A\|_{HS} \big| > t \big\} \le 2 \exp\Big( -\frac{c t^2}{K^4 \|A\|^2} \Big).$$

Remark 2.2. The conclusion of Theorem 2.1 can be alternatively formulated as follows: the random variable $Z = \|AX\|_2 - \|A\|_{HS}$ is sub-gaussian, and $\|Z\|_{\psi_2} \le C K^2 \|A\|$.
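Theorem 2.1 invites a simple simulation. The sketch below (our illustrative addition; the dimensions, the Rademacher choice for the $X_i$, and the number of trials are arbitrary) checks that the fluctuations of $\|AX\|_2$ around $\|A\|_{HS}$ are on the scale of the operator norm $\|A\|$, which is much smaller than $\|A\|_{HS}$ when the stable rank is large.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, trials = 50, 200, 2000

A = rng.standard_normal((m, n))          # a fixed (deterministic) matrix
hs = np.linalg.norm(A, "fro")            # Hilbert-Schmidt norm ||A||_HS
op = np.linalg.norm(A, 2)                # operator norm ||A||

# X has independent mean-zero, unit-variance sub-gaussian (Rademacher) entries.
X = rng.choice([-1.0, 1.0], size=(trials, n))
norms = np.linalg.norm(X @ A.T, axis=1)  # ||AX||_2 for each trial

# Theorem 2.1: deviations |‖AX‖₂ − ‖A‖_HS| are of order ||A||, not ||A||_HS.
dev = np.abs(norms - hs)
print(dev.mean() / op, dev.mean() / hs)
```

The first printed ratio is $O(1)$ while the second is small, consistent with the sub-gaussian tail $2\exp(-ct^2/(K^4\|A\|^2))$.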

Remark 2.3. A few special cases of Theorem 2.1 can be easily deduced from classical concentration inequalities. For Gaussian random variables $X_i$, this result is a standard consequence of Gaussian concentration, see e.g. [13]. For bounded random variables $X_i$, it can be deduced in a similar way from Talagrand's concentration for convex Lipschitz functions [15], see [16, Theorem 2.1.13]. For more general random variables, one can find versions of Theorem 2.1 with varying degrees of generality scattered in the literature (e.g. the appendix of [7]). However, we were unable to find Theorem 2.1 in the existing literature.

Proof. Let us apply the Hanson-Wright inequality, Theorem 1.1, to the matrix $Q = A^T A$. Since $X^T Q X = \|AX\|_2^2$, we have $E X^T Q X = \|A\|_{HS}^2$. Also, note that since all $X_i$ have unit variance, we have $K \ge 2^{-1/2}$, so $K^2$ and $K^4$ differ by at most an absolute constant factor. Thus we obtain for any $u \ge 0$ that
$$P\big\{ \big| \|AX\|_2^2 - \|A\|_{HS}^2 \big| > u \big\} \le 2 \exp\Big[ -\frac{c}{K^4} \min\Big( \frac{u^2}{\|A^T A\|_{HS}^2}, \frac{u}{\|A^T A\|} \Big) \Big].$$
Let $\varepsilon \ge 0$ be arbitrary, and let us use this estimate for $u = \varepsilon \|A\|_{HS}^2$. Since $\|A^T A\|_{HS} \le \|A\| \|A\|_{HS}$ and $\|A^T A\| = \|A\|^2$, it follows that
$$P\big\{ \big| \|AX\|_2^2 - \|A\|_{HS}^2 \big| > \varepsilon \|A\|_{HS}^2 \big\} \le 2 \exp\Big( -c \min(\varepsilon^2, \varepsilon) \, \frac{\|A\|_{HS}^2}{K^4 \|A\|^2} \Big). \quad (2.1)$$
Now let $\delta \ge 0$ be arbitrary; we shall use this inequality for $\varepsilon = \max(\delta, \delta^2)$. Observe that the (likely) event $\big| \|AX\|_2^2 - \|A\|_{HS}^2 \big| \le \varepsilon \|A\|_{HS}^2$ implies the event $\big| \|AX\|_2 - \|A\|_{HS} \big| \le \delta \|A\|_{HS}$. This can be seen by dividing both sides of the inequalities by $\|A\|_{HS}^2$ and $\|A\|_{HS}$ respectively, and using the numeric bound $\max(|z - 1|, |z - 1|^2) \le |z^2 - 1|$, which is valid for all $z \ge 0$. Using this observation along with the identity $\min(\varepsilon^2, \varepsilon) = \delta^2$, we deduce from (2.1) that
$$P\big\{ \big| \|AX\|_2 - \|A\|_{HS} \big| > \delta \|A\|_{HS} \big\} \le 2 \exp\Big( -\frac{c \delta^2 \|A\|_{HS}^2}{K^4 \|A\|^2} \Big).$$
Setting $\delta = t / \|A\|_{HS}$, we obtain the desired inequality. □

2.1. Small ball probabilities. Using a standard symmetrization argument, we can deduce from Theorem 2.1 some bounds on small ball probabilities. The following result is due to Latala et al. [12, Theorem 2.5].

Corollary 2.4 (Small ball probabilities). Let $A$ be a fixed $m \times n$ matrix. Consider a random vector $X = (X_1, \dots, X_n)$ where $X_i$ are independent random variables satisfying $E X_i = 0$, $E X_i^2 = 1$ and $\|X_i\|_{\psi_2} \le K$. Then for every $y \in \mathbb{R}^m$ we have
$$P\Big\{ \|AX - y\|_2 < \tfrac{1}{2} \|A\|_{HS} \Big\} \le 2 \exp\Big( -\frac{c \|A\|_{HS}^2}{K^4 \|A\|^2} \Big).$$

Remark 2.5.
Informally, Corollary.4 states that the small ball probablty decays exponentally n the stable rank ra) = A / A. Proof. Let X denote an ndependent copy of the random vector X. Denote p = P AX y < 1 A }. Usng ndependence and trangle nequalty, we have p = P AX y < 1 A, AX y < 1 } A P AX X ) < A }..)

The components of the random vector $X - X'$ have mean zero, variances bounded below by 2, and sub-gaussian norms bounded above by $2K$. Thus we can apply Theorem 2.1 to $2^{-1/2}(X - X')$ and conclude that
$$P\big\{ \|A(X - X')\|_2 < 2^{1/2} \big( \|A\|_{HS} - t \big) \big\} \le 2 \exp\Big( -\frac{c t^2}{K^4 \|A\|^2} \Big), \quad t \ge 0.$$
Using this with $t = (1 - 2^{-1/2}) \|A\|_{HS}$, we obtain the desired bound for (2.2). □

The following consequence of Corollary 2.4 is even more informative. It states that $\|AX - y\|_2 \gtrsim \|A\|_{HS} + \|y\|_2$ with high probability.

Corollary 2.6 (Small ball probabilities, improved). Let $A$ be a fixed $m \times n$ matrix. Consider a random vector $X = (X_1, \dots, X_n)$ where $X_i$ are independent random variables satisfying $E X_i = 0$, $E X_i^2 = 1$ and $\|X_i\|_{\psi_2} \le K$. Then for every $y \in \mathbb{R}^m$ we have
$$P\Big\{ \|AX - y\|_2 < \tfrac{1}{6} \big( \|A\|_{HS} + \|y\|_2 \big) \Big\} \le 2 \exp\Big( -\frac{c \|A\|_{HS}^2}{K^4 \|A\|^2} \Big).$$

Proof. Denote $h := \|A\|_{HS}$. Combining the conclusions of Theorem 2.1 and Corollary 2.4, we obtain that with probability at least $1 - 4 \exp\big( -c h^2 / (K^4 \|A\|^2) \big)$, the following two estimates hold simultaneously:
$$\|AX\|_2 \le \tfrac{3}{2} h \quad \text{and} \quad \|AX - y\|_2 \ge \tfrac{1}{2} h. \quad (2.3)$$
Suppose this event occurs. Then by the triangle inequality,
$$\|AX - y\|_2 \ge \|y\|_2 - \|AX\|_2 \ge \|y\|_2 - \tfrac{3}{2} h.$$
Combining this with the second inequality in (2.3), we obtain
$$\|AX - y\|_2 \ge \max\Big( \tfrac{1}{2} h, \ \|y\|_2 - \tfrac{3}{2} h \Big) \ge \tfrac{1}{6} \big( h + \|y\|_2 \big).$$
The proof is complete. □

3. Two applications

Concentration results like Theorem 2.1 have many useful consequences. We include two applications in this article; the reader will certainly find more. The first application is a concentration of the distance from a random vector to a fixed subspace. For random vectors with bounded components, one can find a similar result in [16, Corollary 2.1.19], where it was deduced from Talagrand's concentration inequality.

Corollary 3.1 (Distance between a random vector and a subspace). Let $E$ be a subspace of $\mathbb{R}^n$ of dimension $d$. Consider a random vector $X = (X_1, \dots, X_n)$ where $X_i$ are independent random variables satisfying $E X_i = 0$, $E X_i^2 = 1$ and $\|X_i\|_{\psi_2} \le K$. Then for any $t \ge 0$, we have
$$P\big\{ \big| d(X, E) - \sqrt{n - d} \big| > t \big\} \le 2 \exp\big( -c t^2 / K^4 \big).$$

Proof. The conclusion follows from Theorem 2.1 applied to $A = P_{E^\perp}$, the orthogonal projection onto $E^\perp$. Indeed, $d(X, E) = \|P_{E^\perp} X\|_2$, $\|P_{E^\perp}\|_{HS} = \sqrt{\dim E^\perp} = \sqrt{n - d}$ and $\|P_{E^\perp}\| = 1$. □
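Corollary 3.1 is easy to visualize numerically. In the sketch below (an illustration we add; the dimensions and the Rademacher choice for the coordinates are arbitrary), the distance from a random vector to a random $d$-dimensional subspace clusters tightly around $\sqrt{n - d}$, with fluctuations of constant order.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, trials = 400, 100, 1000

# A d-dimensional subspace E of R^n: orthonormal basis via QR of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.standard_normal((n, d)))

# X has independent mean-zero, unit-variance sub-gaussian (Rademacher) entries.
X = rng.choice([-1.0, 1.0], size=(trials, n))

# d(X, E)^2 = ||X||^2 - ||P_E X||^2 by the Pythagorean theorem.
proj_sq = np.sum((X @ Q) ** 2, axis=1)
dist = np.sqrt(np.maximum(np.sum(X**2, axis=1) - proj_sq, 0.0))

# Corollary 3.1: d(X, E) concentrates around sqrt(n - d) with O(1) fluctuations.
print(dist.mean(), np.sqrt(n - d), dist.std())
```

Note that the spread of `dist` stays $O(1)$ as $n$ and $d$ grow, even though $\sqrt{n - d}$ itself grows.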

Our second application of Theorem 2.1 is to operator norms of random matrices. The result essentially states that an $m \times n$ matrix $BG$ obtained as a product of a deterministic matrix $B$ and a random matrix $G$ with independent sub-gaussian entries satisfies $\|BG\| \lesssim \|B\|_{HS} + \sqrt{n} \|B\|$ with high probability. For random matrices with heavy-tailed rather than sub-gaussian components, this problem was studied in [17].

Theorem 3.2 (Norms of random matrices). Let $B$ be a fixed $m \times N$ matrix, and let $G$ be an $N \times n$ random matrix with independent entries that satisfy $E G_{ij} = 0$, $E G_{ij}^2 = 1$ and $\|G_{ij}\|_{\psi_2} \le K$. Then for any $s, t \ge 1$ we have
$$P\big\{ \|BG\| > C K^2 \big( s \|B\|_{HS} + t \sqrt{n} \|B\| \big) \big\} \le 2 \exp\big( -s^2 r - t^2 n \big).$$
Here $r = \|B\|_{HS}^2 / \|B\|^2$ is the stable rank of $B$.

Proof. We need to bound $\|BGx\|_2$ uniformly for all $x \in S^{n-1}$. Let us first fix $x \in S^{n-1}$. By concatenating the rows of $G$, we can view $G$ as a long vector in $\ell_2^{Nn}$. Consider the linear operator $T : \ell_2^{Nn} \to \ell_2^m$ defined as $T(G) = BGx$, and let us apply Theorem 2.1 to $T(G)$. To this end, it is not difficult to see that the Hilbert-Schmidt norm of $T$ equals $\|B\|_{HS}$ and the operator norm of $T$ is at most $\|B\|$. The latter follows from $\|BGx\|_2 \le \|B\| \|G\| \|x\|_2 \le \|B\| \|G\|_{HS}$, and from the fact that $\|G\|_{HS}$ is the Euclidean norm of $G$ as a vector in $\ell_2^{Nn}$. Then for any $u \ge 0$, we have
$$P\big\{ \|BGx\|_2 > \|B\|_{HS} + u \big\} \le 2 \exp\Big( -\frac{c u^2}{K^4 \|B\|^2} \Big).$$
The last part of the proof is a standard covering argument. Let $\mathcal{N}$ be a $(1/2)$-net of $S^{n-1}$ in the Euclidean metric. We can choose this net so that $|\mathcal{N}| \le 5^n$, see [18, Lemma 5.2]. By a union bound, with probability at least
$$1 - 5^n \cdot 2 \exp\Big( -\frac{c u^2}{K^4 \|B\|^2} \Big), \quad (3.1)$$
every $x \in \mathcal{N}$ satisfies $\|BGx\|_2 \le \|B\|_{HS} + u$. On this event, the approximation lemma (see [18, Lemma 5.2]) implies that every $x \in S^{n-1}$ satisfies $\|BGx\|_2 \le 2 (\|B\|_{HS} + u)$. It remains to choose $u = C K^2 (s \|B\|_{HS} + t \sqrt{n} \|B\|)$ with a sufficiently large absolute constant $C$ in order to make the failure probability in (3.1) smaller than $2 \exp(-s^2 r - t^2 n)$. This completes the proof. □

Remark 3.3. A couple of special cases in Theorem 3.2 are worth mentioning. If $B = P$ is a projection in $\mathbb{R}^N$ of rank $r$, then
$$P\big\{ \|PG\| > C K^2 \big( s \sqrt{r} + t \sqrt{n} \big) \big\} \le 2 \exp\big( -s^2 r - t^2 n \big).$$
The same holds if $B = P$ is an $r \times N$ matrix such that $P P^T = I_r$.
In particular, if $B = I_N$ we obtain
$$P\big\{ \|G\| > C K^2 \big( s \sqrt{N} + t \sqrt{n} \big) \big\} \le 2 \exp\big( -s^2 N - t^2 n \big).$$
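The special case $B = I_N$ is the classical bound $\|G\| \lesssim \sqrt{N} + \sqrt{n}$ for a random matrix with independent sub-gaussian entries, which is easy to check numerically (a sketch we add for illustration; the dimensions and the Rademacher entries are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
N, n = 400, 100

# G with independent mean-zero, unit-variance sub-gaussian (Rademacher) entries.
G = rng.choice([-1.0, 1.0], size=(N, n))
op = np.linalg.norm(G, 2)  # operator norm ||G|| (largest singular value)

# Remark 3.3 with B = I_N: ||G|| is of order sqrt(N) + sqrt(n) with high probability.
print(op, np.sqrt(N) + np.sqrt(n))
```

For comparison, $\|G\|_{HS} = \sqrt{Nn}$ is much larger; the theorem captures the fact that no single direction accumulates more than a $\sqrt{N} + \sqrt{n}$ share of that mass.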

3.1. Complexification. We formulated the results in Sections 2 and 3 for real matrices and real-valued random variables. Using a standard complexification trick, one can easily obtain complex versions of these results. Let us show how to complexify Theorem 2.1; the other applications follow from it.

Suppose $A$ is a complex matrix while $X$ is a real-valued random vector as before. Then we can apply Theorem 2.1 to the real $2m \times n$ matrix $\tilde{A} := \begin{bmatrix} \operatorname{Re} A \\ \operatorname{Im} A \end{bmatrix}$. Note that $\|\tilde{A} X\|_2 = \|AX\|_2$, $\|\tilde{A}\|_{HS} = \|A\|_{HS}$ and $\|\tilde{A}\| = \|A\|$. Then the conclusion of Theorem 2.1 follows for $A$.

Suppose now that both $A$ and $X$ are complex. Let us assume that the components $X_i$ have independent real and imaginary parts, such that $E \operatorname{Re} X_i = 0$, $E (\operatorname{Re} X_i)^2 = 1/2$, $\|\operatorname{Re} X_i\|_{\psi_2} \le K$, and similarly for $\operatorname{Im} X_i$. Then we can apply Theorem 2.1 to the real $2m \times 2n$ matrix $\bar{A} := \begin{bmatrix} \operatorname{Re} A & -\operatorname{Im} A \\ \operatorname{Im} A & \operatorname{Re} A \end{bmatrix}$ and the vector $\bar{X} = \sqrt{2} \, (\operatorname{Re} X, \operatorname{Im} X) \in \mathbb{R}^{2n}$. Note that $\|\bar{A} \bar{X}\|_2 = \sqrt{2} \|AX\|_2$, $\|\bar{A}\|_{HS} = \sqrt{2} \|A\|_{HS}$ and $\|\bar{A}\| = \|A\|$. Then the conclusion of Theorem 2.1 follows for $A$.

References

[1] R. Adamczak, P. Wolff, Concentration inequalities for non-Lipschitz functions with bounded derivatives of higher order, http://arxiv.org/abs/1304.1826
[2] J. Bourgain, Random points in isotropic convex sets. In: Convex geometric analysis (Berkeley, CA, 1996), Math. Sci. Res. Inst. Publ., Vol. 34, 53-58, Cambridge Univ. Press, Cambridge (1999).
[3] F. Barthe, E. Milman, Transference principles for log-Sobolev and spectral-gap with applications to conservative spin systems, http://arxiv.org/abs/1202.5318
[4] V. H. de la Peña, E. Giné, Decoupling. From dependence to independence. Randomly stopped processes. U-statistics and processes. Martingales and beyond. Probability and its Applications (New York). Springer-Verlag, New York, 1999.
[5] V. H. de la Peña, S. J. Montgomery-Smith, Decoupling inequalities for the tail probabilities of multivariate U-statistics, Ann. Probab. 23 (1995), no. 2, 806-816.
[6] I. Diakonikolas, D. M. Kane, J. Nelson, Bounded independence fools degree-2 threshold functions, Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS 2010), Las Vegas, NV, October 23-26, 2010.
[7] L. Erdős, H.-T. Yau, J. Yin, Bulk universality for generalized Wigner matrices, Probability Theory and Related Fields 154 (2012), 341-407.
[8] S. Foucart, H. Rauhut, A Mathematical Introduction to Compressive Sensing. Applied and Numerical Harmonic Analysis. Birkhäuser, 2013.
[9] D. L. Hanson, F. T. Wright, A bound on tail probabilities for quadratic forms in independent random variables, Ann. Math. Statist. 42 (1971), 1079-1083.
[10] D. Hsu, S. Kakade, T. Zhang, A tail inequality for quadratic forms of subgaussian random vectors, Electron. Commun. Probab. 17 (2012), no. 52, 1-6.
[11] R. Latala, Estimates of moments and tails of Gaussian chaoses, Ann. Probab. 34 (2006), no. 6, 2315-2331.
[12] R. Latala, P. Mankiewicz, K. Oleszkiewicz, N. Tomczak-Jaegermann, Banach-Mazur distances and projections on random subgaussian polytopes, Discrete Comput. Geom. 38 (2007), 29-50.
[13] M. Ledoux, The concentration of measure phenomenon. Mathematical Surveys and Monographs, 89. Providence: American Mathematical Society, 2005.
[14] J. Nelson, Johnson-Lindenstrauss notes, http://web.mit.edu/minilek/www/jl_notes.pdf
[15] M. Talagrand, Concentration of measure and isoperimetric inequalities in product spaces, IHES Publ. Math. No. 81 (1995), 73-205.

[16] T. Tao, Topics in random matrix theory. Graduate Studies in Mathematics, 132. American Mathematical Society, Providence, RI, 2012.
[17] R. Vershynin, Spectral norm of products of random and deterministic matrices, Probability Theory and Related Fields 150 (2011), 471-509.
[18] R. Vershynin, Introduction to the non-asymptotic analysis of random matrices. In: Compressed Sensing, 210-268, Cambridge Univ. Press, Cambridge, 2012.
[19] F. T. Wright, A bound on tail probabilities for quadratic forms in independent random variables whose distributions are not necessarily symmetric, Ann. Probability 1 (1973), 1068-1070.

Department of Mathematics, University of Michigan, 530 Church St., Ann Arbor, MI 48109, U.S.A.
E-mail address: {rudelson, romanv}@umich.edu