HANSON-WRIGHT INEQUALITY AND SUB-GAUSSIAN CONCENTRATION


MARK RUDELSON AND ROMAN VERSHYNIN

Abstract. In this expository note, we give a modern proof of the Hanson-Wright inequality for quadratic forms in sub-gaussian random variables. We deduce a useful concentration inequality for sub-gaussian random vectors. Two examples are given to illustrate these results: a concentration of distances between random vectors and subspaces, and a bound on the norms of products of random and deterministic matrices.

1. Hanson-Wright inequality

The Hanson-Wright inequality is a general concentration result for quadratic forms in sub-gaussian random variables. A version of this theorem was first proved in [2, 9], however with one weak point mentioned in Remark 1.2. In this article we give a modern proof of the Hanson-Wright inequality, which automatically fixes the original weak point. We then deduce a useful concentration inequality for sub-gaussian random vectors, and illustrate it with two applications.

Our arguments use standard tools of high-dimensional probability. The reader unfamiliar with them may benefit from consulting the tutorial [8]. Still, we will recall the basic notions where possible. A random variable ξ is called sub-gaussian if its distribution is dominated by that of a normal random variable. One can express this by the growth of the moments E|ξ|^p = O(p)^{p/2} as p → ∞. This can be quantitatively captured by the sub-gaussian norm of ξ, which is defined as

    ‖ξ‖_{ψ₂} = sup_{p ≥ 1} p^{−1/2} (E|ξ|^p)^{1/p};

ξ is sub-gaussian whenever this quantity is finite. One can similarly define sub-exponential random variables, by setting ‖ξ‖_{ψ₁} = sup_{p ≥ 1} p^{−1} (E|ξ|^p)^{1/p}.

For an m × n matrix A = (a_{ij}), recall that the operator norm of A is ‖A‖ = max_{x ≠ 0} ‖Ax‖₂/‖x‖₂ and the Hilbert-Schmidt (or Frobenius) norm of A is ‖A‖_{HS} = (Σ_{i,j} |a_{ij}|²)^{1/2}. Throughout the paper, C, C₁, c, c₁, ... denote positive absolute constants.

Theorem 1.1 (Hanson-Wright inequality). Let X = (X₁, ..., X_n) ∈ ℝⁿ be a random vector with independent components X_i which satisfy E X_i = 0 and ‖X_i‖_{ψ₂} ≤ K. Let A be an n × n matrix. Then, for every t ≥ 0,

    P{ |Xᵀ A X − E Xᵀ A X| > t } ≤ 2 exp[ −c min( t² / (K⁴ ‖A‖_{HS}²), t / (K² ‖A‖) ) ].
Date: June 12, 2013. M. R. was partially supported by NSF grant DMS 1161372. R. V. was partially supported by NSF grants DMS 1001829 and 1265782.
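To make Theorem 1.1 concrete, here is a small Monte Carlo sketch (ours, not part of the paper; all parameters are illustrative). Rademacher variables (±1 with probability 1/2) are sub-gaussian with K of order 1, and for them E XᵀAX = tr A exactly, so deviations of the quadratic form beyond a few multiples of ‖A‖_HS should be rare, as the theorem predicts.

```python
import numpy as np

# Monte Carlo sketch of Theorem 1.1 (an illustration, not a proof):
# for X with independent mean-zero sub-gaussian components, the
# quadratic form X^T A X concentrates around E X^T A X.  For
# Rademacher components, E X^T A X = tr(A).
rng = np.random.default_rng(0)
n, trials = 50, 20_000
A = rng.standard_normal((n, n))               # a fixed deterministic matrix

X = rng.choice([-1.0, 1.0], size=(trials, n)) # Rademacher components
quad = np.einsum('ti,ij,tj->t', X, A, X)      # X^T A X, one value per trial

mean_quad = np.trace(A)                       # exact value of E X^T A X
hs = np.linalg.norm(A, 'fro')                 # ||A||_HS

# Empirical tail at t = 3 ||A||_HS; Hanson-Wright bounds it by
# 2 exp(-c min(t^2/||A||_HS^2, t/||A||)), a small constant here.
t = 3 * hs
tail = np.mean(np.abs(quad - mean_quad) > t)
print(f"empirical mean {quad.mean():.2f} vs tr(A) = {mean_quad:.2f}")
print(f"empirical P(|X^T A X - tr A| > 3||A||_HS) = {tail:.4f}")
```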

Remark 1.2 (Relation to the original Hanson-Wright inequality). Improving upon an earlier result of Hanson and Wright [2], Wright [9] established a slightly weaker version of Theorem 1.1. Instead of ‖A‖ = ‖(a_{ij})‖, both papers had ‖(|a_{ij}|)‖ in the right side. The latter norm can be much larger than the norm of A, and it is often less easy to compute. This weak point went unnoticed in several later applications of the Hanson-Wright inequality; however, it was clear to experts that it could be fixed.

Proof. By replacing X with X/K we can assume without loss of generality that K = 1. Let us first estimate

    p := P{ Xᵀ A X − E Xᵀ A X > t }.

Let A = (a_{ij})_{i,j=1}ⁿ. By independence and zero mean of the X_i, we can represent

    Xᵀ A X − E Xᵀ A X = Σ_{i,j} a_{ij} X_i X_j − Σ_i a_{ii} E X_i² = Σ_i a_{ii}(X_i² − E X_i²) + Σ_{i,j: i≠j} a_{ij} X_i X_j.

The problem reduces to estimating the diagonal and off-diagonal sums:

    p ≤ P{ Σ_i a_{ii}(X_i² − E X_i²) > t/2 } + P{ Σ_{i,j: i≠j} a_{ij} X_i X_j > t/2 } =: p₁ + p₂.

Step 1: diagonal sum. Note that the X_i² − E X_i² are independent mean-zero sub-exponential random variables, and

    ‖X_i² − E X_i²‖_{ψ₁} ≤ 2‖X_i²‖_{ψ₁} ≤ 4‖X_i‖_{ψ₂}² ≤ 4.

These standard bounds can be found in [8, Remark 5.18 and Lemma 5.14]. Then we can use a Bernstein-type inequality (see [8, Proposition 5.16]) and obtain

    p₁ ≤ exp[ −c min( t²/Σ_i a_{ii}², t/max_i |a_{ii}| ) ] ≤ exp[ −c min( t²/‖A‖_{HS}², t/‖A‖ ) ].   (1.1)

Step 2: decoupling. It remains to bound the off-diagonal sum

    S := Σ_{i,j: i≠j} a_{ij} X_i X_j.

The argument will be based on estimating the moment generating function of S by decoupling and reduction to normal random variables. Let λ > 0 be a parameter whose value we will determine later. By Chebyshev's inequality, we have

    p₂ = P{ S > t/2 } = P{ λS > λt/2 } ≤ exp(−λt/2) E exp(λS).   (1.2)

Consider independent Bernoulli random variables δ_i ∈ {0, 1} with E δ_i = 1/2. Since E δ_i(1 − δ_j) equals 1/4 for i ≠ j and 0 for i = j, we have S = 4 E_δ S_δ, where

    S_δ = Σ_{i,j: i≠j} δ_i(1 − δ_j) a_{ij} X_i X_j.

Here E_δ denotes the expectation with respect to δ = (δ₁, ..., δ_n). Jensen's inequality yields

    E exp(λS) ≤ E_{X,δ} exp(4λ S_δ)   (1.3)

where E_{X,δ} denotes expectation with respect to both X and δ. Consider the set of indices Λ_δ = { i ∈ [n] : δ_i = 1 } and express

    S_δ = Σ_{i ∈ Λ_δ, j ∈ Λ_δᶜ} a_{ij} X_i X_j = Σ_{j ∈ Λ_δᶜ} X_j ( Σ_{i ∈ Λ_δ} a_{ij} X_i ).

Now we condition on δ and (X_i)_{i ∈ Λ_δ}. Then S_δ is a linear combination of the mean-zero sub-gaussian random variables X_j, j ∈ Λ_δᶜ, with fixed coefficients. It follows that the conditional distribution of S_δ is sub-gaussian, and its sub-gaussian norm is bounded by the ℓ₂-norm of the coefficient vector (see e.g. [8, Lemma 5.9]). Specifically,

    ‖S_δ‖_{ψ₂} ≤ C σ_δ, where σ_δ² := Σ_{j ∈ Λ_δᶜ} ( Σ_{i ∈ Λ_δ} a_{ij} X_i )².

Next, we use a standard estimate of the moment generating function of sub-gaussian random variables, see [8, Lemma 5.5]. It yields

    E_{(X_j)_{j ∈ Λ_δᶜ}} exp(4λ S_δ) ≤ exp( Cλ² ‖S_δ‖_{ψ₂}² ) ≤ exp( C′λ² σ_δ² ).

Taking expectations of both sides with respect to (X_i)_{i ∈ Λ_δ}, we obtain

    E_X exp(4λ S_δ) ≤ E_X exp( C′λ² σ_δ² ) =: E_δ.   (1.4)

Recall that this estimate holds for every fixed δ. It remains to estimate E_δ.

Step 3: reduction to normal random variables. Consider g = (g₁, ..., g_n), where the g_i are independent N(0, 1) random variables. The rotation invariance of the normal distribution implies that for each fixed δ and X, we have

    Z := Σ_{j ∈ Λ_δᶜ} g_j ( Σ_{i ∈ Λ_δ} a_{ij} X_i ) ∼ N(0, σ_δ²).

By the formula for the moment generating function of the normal distribution, we have E_g exp(sZ) = exp(s²σ_δ²/2). Comparing this with the formula defining E_δ in (1.4), we find that the two expressions are similar. Choosing s = C₁λ with C₁ = (2C′)^{1/2}, we can match the two expressions as follows:

    E_δ = E_{X,g} exp(C₁λ Z).

Rearranging the terms, we can write Z = Σ_{i ∈ Λ_δ} X_i ( Σ_{j ∈ Λ_δᶜ} a_{ij} g_j ). Then we can bound the moment generating function of Z in the same way we bounded the moment generating function of S_δ in Step 2, only now relying on the sub-gaussian properties of X_i, i ∈ Λ_δ. We obtain

    E_δ ≤ E_g exp[ C₂λ² Σ_{i ∈ Λ_δ} ( Σ_{j ∈ Λ_δᶜ} a_{ij} g_j )² ].

To express this more compactly, let P_δ denote the coordinate projection (restriction) of ℝⁿ onto ℝ^{Λ_δ}, and define the matrix A_δ = P_δ A (I − P_δ). Then what we obtained is

    E_δ ≤ E_g exp( C₂λ² ‖A_δ g‖₂² ).

Recall that this bound holds for each fixed δ. We have removed the original random variables X from the problem, so it now becomes a problem about normal random variables g.

Step 4: calculation for normal random variables. By the rotation invariance of the distribution of g, the random variable ‖A_δ g‖₂² is distributed identically with Σ_i s_i² g_i², where the s_i denote the singular values of A_δ. Hence by independence,

    E_δ ≤ E_g exp( C₂λ² Σ_i s_i² g_i² ) = Π_i E_g exp( C₂λ² s_i² g_i² ).

Note that each g_i² has the chi-squared distribution with one degree of freedom, whose moment generating function is E exp(t g_i²) = (1 − 2t)^{−1/2} for t < 1/2. Therefore

    E_δ ≤ Π_i ( 1 − 2C₂λ² s_i² )^{−1/2}, provided max_i 2C₂λ² s_i² < 1/2.

Using the numeric inequality (1 − z)^{−1/2} ≤ e^z, which is valid for all 0 ≤ z ≤ 1/2, we can simplify this as follows:

    E_δ ≤ Π_i exp( C₃λ² s_i² ) = exp( C₃λ² Σ_i s_i² ), provided max_i C₃λ² s_i² < 1/2.

Since max_i s_i = ‖A_δ‖ ≤ ‖A‖ and Σ_i s_i² = ‖A_δ‖_{HS}² ≤ ‖A‖_{HS}², we have proved the following:

    E_δ ≤ exp( C₃λ² ‖A‖_{HS}² ) for λ ≤ c₀/‖A‖.

This is a uniform bound for all δ. Now we take the expectation with respect to δ. Recalling (1.3) and (1.4), we obtain the following estimate on the moment generating function of S:

    E exp(λS) ≤ exp( C₃λ² ‖A‖_{HS}² ) for λ ≤ c₀/‖A‖.

Step 5: conclusion. Putting this estimate into the exponential Chebyshev inequality (1.2), we obtain

    p₂ ≤ exp( −λt/2 + C₃λ² ‖A‖_{HS}² ) for λ ≤ c₀/‖A‖.

Optimizing over λ, we conclude that

    p₂ ≤ exp[ −c min( t²/‖A‖_{HS}², t/‖A‖ ) ] =: p(A, t).

Now we combine with the similar estimate (1.1) for p₁ and obtain p = p₁ + p₂ ≤ 2p(A, t). Repeating the argument for −A instead of A, we get

    P{ Xᵀ A X − E Xᵀ A X < −t } ≤ 2p(A, t).

Combining the two events, we obtain

    P{ |Xᵀ A X − E Xᵀ A X| > t } ≤ 4p(A, t).

Finally, one can reduce the factor 4 to 2 by adjusting the constant c in p(A, t). The proof is complete.
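The decoupling identity S = 4 E_δ S_δ used in Step 2 is exact, not merely an estimate, and for a small matrix it can be checked by enumerating all 2ⁿ values of δ. The sketch below (ours, with arbitrary test data) does exactly that.

```python
import itertools
import numpy as np

# Exact check of the decoupling identity from Step 2: for independent
# Bernoulli(1/2) variables delta_i, E_delta[delta_i (1 - delta_j)] is
# 1/4 for i != j and 0 for i = j, hence
#     S = sum_{i != j} a_ij x_i x_j = 4 E_delta S_delta,
# where S_delta = sum_{i,j} delta_i (1 - delta_j) a_ij x_i x_j.
rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

off = A - np.diag(np.diag(A))        # off-diagonal part of A
S = x @ off @ x                      # sum_{i != j} a_ij x_i x_j

S_delta_avg = 0.0
for delta in itertools.product([0, 1], repeat=n):
    d = np.array(delta, dtype=float)
    # S_delta = sum_{i,j} delta_i (1 - delta_j) a_ij x_i x_j
    S_delta_avg += (d * x) @ A @ ((1 - d) * x)
S_delta_avg /= 2 ** n                # expectation over uniform delta

print(S, 4 * S_delta_avg)            # the two values agree
```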

2. Sub-gaussian concentration

The Hanson-Wright inequality has a useful consequence, a concentration inequality for random vectors with independent sub-gaussian coordinates.

Theorem 2.1 (Sub-gaussian concentration). Let A be a fixed m × n matrix. Consider a random vector X = (X₁, ..., X_n) where the X_i are independent random variables satisfying E X_i = 0, E X_i² = 1 and ‖X_i‖_{ψ₂} ≤ K. Then for any t ≥ 0, we have

    P{ |‖AX‖₂ − ‖A‖_{HS}| > t } ≤ 2 exp( −ct² / (K⁴‖A‖²) ).

Remark 2.2. The conclusion of Theorem 2.1 can be alternatively formulated as follows: the random variable Z = ‖AX‖₂ − ‖A‖_{HS} is sub-gaussian, and ‖Z‖_{ψ₂} ≤ CK²‖A‖.

Remark 2.3. A few special cases of Theorem 2.1 can be easily deduced from classical concentration inequalities. For Gaussian random variables X_i, this result is a standard consequence of Gaussian concentration, see e.g. [4]. For bounded random variables X_i, it can be deduced in a similar way from Talagrand's concentration inequality for convex Lipschitz functions [5], see [6, Theorem 2.1.13]. For more general random variables, one can find versions of Theorem 2.1 with varying degrees of generality scattered in the literature (e.g. the appendix of [1]). However, we were unable to find Theorem 2.1 in the existing literature.

Proof. Let us apply the Hanson-Wright inequality, Theorem 1.1, to the matrix Q = AᵀA. Since XᵀQX = ‖AX‖₂², we have E XᵀQX = ‖A‖_{HS}². Also note that, since all X_i have unit variance, we have K ≥ 2^{−1/2}. Thus we obtain for any u ≥ 0 that

    P{ |‖AX‖₂² − ‖A‖_{HS}²| > u } ≤ 2 exp[ −(c/K⁴) min( u²/‖AᵀA‖_{HS}², u/‖AᵀA‖ ) ].

Let ε ≥ 0 be arbitrary, and let us use this estimate with u = ε‖A‖_{HS}². Since ‖AᵀA‖ = ‖A‖² and ‖AᵀA‖_{HS} ≤ ‖A‖ ‖A‖_{HS}, it follows that

    P{ |‖AX‖₂² − ‖A‖_{HS}²| > ε‖A‖_{HS}² } ≤ 2 exp[ −(c/K⁴) min(ε, ε²) ‖A‖_{HS}²/‖A‖² ].   (2.1)

Now let δ ≥ 0 be arbitrary; we shall use this inequality for ε = max(δ, δ²). Observe that the (likely) event |‖AX‖₂² − ‖A‖_{HS}²| ≤ ε‖A‖_{HS}² implies the event |‖AX‖₂ − ‖A‖_{HS}| ≤ δ‖A‖_{HS}. This can be seen by dividing both sides of the inequalities by ‖A‖_{HS}² and ‖A‖_{HS} respectively, and using the numeric bound max(|z − 1|, |z − 1|²) ≤ |z² − 1|, which is valid for all z ≥ 0. Using this observation along with the identity min(ε, ε²) = δ², we deduce from (2.1) that

    P{ |‖AX‖₂ − ‖A‖_{HS}| > δ‖A‖_{HS} } ≤ 2 exp( −cδ² ‖A‖_{HS}² / (K⁴‖A‖²) ).

Setting δ = t/‖A‖_{HS}, we obtain the desired inequality.

2.1. Small ball probabilities.
Using a standard symmetrization argument, we can deduce from Theorem 2.1 some bounds on small ball probabilities. The following result is due to R. Latała et al. [3, Theorem 2.5].

Corollary 2.4 (Small ball probabilities). Let A be a fixed m × n matrix. Consider a random vector X = (X₁, ..., X_n) where the X_i are independent random variables satisfying E X_i = 0, E X_i² = 1 and ‖X_i‖_{ψ₂} ≤ K. Then for every y ∈ ℝᵐ we have

    P{ ‖AX − y‖₂ < ‖A‖_{HS}/2 } ≤ 2 exp( −c‖A‖_{HS}² / (K⁴‖A‖²) ).

Remark 2.5. Informally, Corollary 2.4 states that the small ball probability decays exponentially in the stable rank r(A) = ‖A‖_{HS}²/‖A‖².

Proof. Let X′ denote an independent copy of the random vector X. Denote p = P{ ‖AX − y‖₂ < ‖A‖_{HS}/2 }. Using independence and the triangle inequality, we have

    p² = P{ ‖AX − y‖₂ < ‖A‖_{HS}/2, ‖AX′ − y‖₂ < ‖A‖_{HS}/2 } ≤ P{ ‖A(X − X′)‖₂ < ‖A‖_{HS} }.   (2.2)

The components of the random vector X − X′ have mean zero, variance 2, and sub-gaussian norms bounded by 2K. Thus we can apply Theorem 2.1 to 2^{−1/2}(X − X′) and conclude that

    P{ ‖A(X − X′)‖₂ < 2^{1/2}(‖A‖_{HS} − t) } ≤ 2 exp( −ct²/(K⁴‖A‖²) ), t ≥ 0.

Using this with t = (1 − 2^{−1/2})‖A‖_{HS}, we obtain the desired bound for (2.2).

The following consequence of Corollary 2.4 is even more informative. It states that ‖AX − y‖₂ ≳ ‖A‖_{HS} + ‖y‖₂ with high probability.

Corollary 2.6 (Small ball probabilities, improved). Let A be a fixed m × n matrix. Consider a random vector X = (X₁, ..., X_n) where the X_i are independent random variables satisfying E X_i = 0, E X_i² = 1 and ‖X_i‖_{ψ₂} ≤ K. Then for every y ∈ ℝᵐ we have

    P{ ‖AX − y‖₂ < (‖A‖_{HS} + ‖y‖₂)/6 } ≤ 2 exp( −c‖A‖_{HS}² / (K⁴‖A‖²) ).

Proof. Denote h := ‖A‖_{HS}. Combining the conclusions of Theorem 2.1 and Corollary 2.4, we obtain that with probability at least 1 − 4 exp( −ch²/(K⁴‖A‖²) ), the following two estimates hold simultaneously:

    ‖AX‖₂ ≤ (3/2)h and ‖AX − y‖₂ ≥ h/2.   (2.3)

Suppose this event occurs. Then by the triangle inequality,

    ‖AX − y‖₂ ≥ ‖y‖₂ − ‖AX‖₂ ≥ ‖y‖₂ − (3/2)h.

Combining this with the second inequality in (2.3), we obtain that

    ‖AX − y‖₂ ≥ max( h/2, ‖y‖₂ − (3/2)h ) ≥ (h + ‖y‖₂)/6.

The proof is complete.
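As a numerical illustration of Theorem 2.1 and Remark 2.5 (our sketch, not part of the paper; the parameters are arbitrary): with independent standard normal coordinates, ‖AX‖₂ should rarely deviate from ‖A‖_{HS} by more than a few multiples of ‖A‖.

```python
import numpy as np

# Illustration of Theorem 2.1: for a fixed matrix A and a random vector
# X with independent mean-zero, unit-variance sub-gaussian coordinates,
# ||AX||_2 concentrates around ||A||_HS at scale ||A||.  Standard normal
# coordinates give K of order 1.  The stable rank r(A) = ||A||_HS^2/||A||^2
# from Remark 2.5 is printed for reference.
rng = np.random.default_rng(2)
m, n, trials = 40, 60, 10_000
A = rng.standard_normal((m, n))

hs = np.linalg.norm(A, 'fro')            # ||A||_HS
op = np.linalg.norm(A, 2)                # operator norm ||A||
stable_rank = hs ** 2 / op ** 2

X = rng.standard_normal((trials, n))
norms = np.linalg.norm(X @ A.T, axis=1)  # ||AX||_2 for each trial

# Deviations | ||AX||_2 - ||A||_HS | beyond 3 ||A|| should be rare.
frac_far = np.mean(np.abs(norms - hs) > 3 * op)
print(f"stable rank {stable_rank:.1f}, "
      f"P(| ||AX|| - ||A||_HS | > 3 ||A||) = {frac_far:.4f}")
```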

3. Two applications

Concentration results like Theorem 2.1 have many useful consequences. We include two applications in this article; the reader will certainly find more.

The first application is a concentration of the distance from a random vector to a fixed subspace. For random vectors with bounded components, one can find a similar result in [6, Corollary 2.1.19], where it was deduced from Talagrand's concentration inequality.

Corollary 3.1 (Distance between a random vector and a subspace). Let E be a subspace of ℝⁿ of dimension d. Consider a random vector X = (X₁, ..., X_n) where the X_i are independent random variables satisfying E X_i = 0, E X_i² = 1 and ‖X_i‖_{ψ₂} ≤ K. Then for any t ≥ 0, we have

    P{ |d(X, E) − (n − d)^{1/2}| > t } ≤ 2 exp( −ct²/K⁴ ).

Proof. The conclusion follows from Theorem 2.1 applied to A = P_{E⊥}, the orthogonal projection onto E⊥. Indeed, d(X, E) = ‖P_{E⊥} X‖₂, ‖P_{E⊥}‖_{HS} = (dim E⊥)^{1/2} = (n − d)^{1/2} and ‖P_{E⊥}‖ = 1.

Our second application of Theorem 2.1 is for operator norms of random matrices. The result essentially states that an m × n matrix BG obtained as a product of a deterministic matrix B and a random matrix G with independent sub-gaussian entries satisfies ‖BG‖ ≲ ‖B‖_{HS} + n^{1/2} ‖B‖ with high probability. For random matrices with heavy-tailed rather than sub-gaussian components, this problem was studied in [7].

Theorem 3.2 (Norms of random matrices). Let B be a fixed m × N matrix, and let G be an N × n random matrix with independent entries that satisfy E G_{ij} = 0, E G_{ij}² = 1 and ‖G_{ij}‖_{ψ₂} ≤ K. Then for any s, t ≥ 1 we have

    P{ ‖BG‖ > CK²( s‖B‖_{HS} + t n^{1/2} ‖B‖ ) } ≤ 2 exp( −s²r − t²n ).

Here r = ‖B‖_{HS}²/‖B‖² is the stable rank of B.

Proof. We need to bound ‖BGx‖₂ uniformly over all x ∈ S^{n−1}. Let us first fix x ∈ S^{n−1}. By concatenating the rows of G, we can view G as a long vector in ℝ^{Nn}. Consider the linear operator T : ℓ₂^{Nn} → ℓ₂^m defined by T(G) = BGx, and let us apply Theorem 2.1 to T(G). To this end, it is not difficult to see that the Hilbert-Schmidt norm of T equals ‖B‖_{HS} and the operator norm of T is at most ‖B‖. The latter follows from ‖BGx‖₂ ≤ ‖B‖ ‖G‖ ‖x‖₂ ≤ ‖B‖ ‖G‖_{HS}, and from the fact that ‖G‖_{HS} is the Euclidean norm of G as a vector in ℓ₂^{Nn}. Then for any u ≥ 0, we have

    P{ ‖BGx‖₂ > ‖B‖_{HS} + u } ≤ 2 exp( −cu²/(K⁴‖B‖²) ).
The last part of the proof is a standard covering argument. Let N be a (1/2)-net of S^{n−1} in the Euclidean metric. We can choose this net so that |N| ≤ 5ⁿ, see [8, Lemma 5.2]. By a union bound, with probability at least

    1 − 5ⁿ · 2 exp( −cu²/(K⁴‖B‖²) ),   (3.1)

every x ∈ N satisfies ‖BGx‖₂ ≤ ‖B‖_{HS} + u. On this event, the approximation lemma (see [8, Lemma 5.2]) implies that every x ∈ S^{n−1} satisfies ‖BGx‖₂ ≤ 2(‖B‖_{HS} + u). It remains to choose u = CK²( s‖B‖_{HS} + t n^{1/2} ‖B‖ ) with a sufficiently large absolute constant C in order to make the probability bound (3.1) smaller than 2 exp( −s²r − t²n ). This completes the proof.

Remark 3.3. A couple of special cases in Theorem 3.2 are worth mentioning. If B = P is a projection in ℝᴺ of rank r, then

    P{ ‖PG‖ > CK²( s r^{1/2} + t n^{1/2} ) } ≤ 2 exp( −s²r − t²n ).

The same holds if B = P is an r × N matrix such that PPᵀ = I_r. In particular, if B = I_N we obtain

    P{ ‖G‖ > CK²( s N^{1/2} + t n^{1/2} ) } ≤ 2 exp( −s²N − t²n ).

3.1. Complexification. We formulated the results in Sections 2 and 3 for real matrices and real-valued random variables. Using a standard complexification trick, one can easily obtain complex versions of these results. Let us show how to complexify Theorem 2.1; the other applications follow from it.

Suppose A is a complex matrix while X is a real-valued random vector as before. Then we can apply Theorem 2.1 to the real 2m × n matrix

    Ã := [ Re A ; Im A ].

Note that ‖ÃX‖₂ = ‖AX‖₂, ‖Ã‖_{HS} = ‖A‖_{HS} and ‖Ã‖ ≤ ‖A‖. Then the conclusion of Theorem 2.1 follows for A.

Suppose now that both A and X are complex. Let us assume that the components X_i have independent real and imaginary parts such that E Re X_i = 0, E(Re X_i)² = 1, ‖Re X_i‖_{ψ₂} ≤ K, and similarly for Im X_i. Then we can apply Theorem 2.1 to the real 2m × 2n matrix

    Ā := [ Re A  −Im A ; Im A  Re A ]

and the vector X̄ = (Re X, Im X) ∈ ℝ^{2n}. Note that ‖ĀX̄‖₂ = ‖AX‖₂, ‖Ā‖_{HS} = 2^{1/2}‖A‖_{HS} and ‖Ā‖ = ‖A‖. Then the conclusion of Theorem 2.1 follows for A.

References

[1] L. Erdős, H.-T. Yau, J. Yin, Bulk universality for generalized Wigner matrices, Probab. Theory Related Fields 154 (2012), 341–407.
[2] D. L. Hanson, F. T. Wright, A bound on tail probabilities for quadratic forms in independent random variables, Ann. Math. Statist. 42 (1971), 1079–1083.
[3] R. Latała, P. Mankiewicz, K. Oleszkiewicz, N. Tomczak-Jaegermann, Banach-Mazur distances and projections on random subgaussian polytopes, Discrete Comput. Geom. 38 (2007), 29–50.
[4] M. Ledoux, The concentration of measure phenomenon. Mathematical Surveys and Monographs 89, Providence: American Mathematical Society, 2005.
[5] M. Talagrand, Concentration of measure and isoperimetric inequalities in product spaces, IHES Publ. Math. 81 (1995), 73–205.
[6] T. Tao, Topics in random matrix theory. American Mathematical Society, 2012.
[7] R. Vershynin, Spectral norm of products of random and deterministic matrices, Probab. Theory Related Fields 150 (2011), 471–509.
[8] R. Vershynin, Introduction to the non-asymptotic analysis of random matrices. In: Compressed Sensing, 210–268, Cambridge Univ. Press, Cambridge, 2012.

[9] F. T. Wright, A bound on tail probabilities for quadratic forms in independent random variables whose distributions are not necessarily symmetric, Ann. Probability 1 (1973), 1068–1070.

Department of Mathematics, University of Michigan, 530 Church St., Ann Arbor, MI 48109, U.S.A.
E-mail address: {rudelson, romanv}@umich.edu