arxiv: v3 [math.pr] 1 Nov 2007


The Circular Law for Random Matrices

F. Götze
Faculty of Mathematics, University of Bielefeld, Germany

A. Tikhomirov
Faculty of Mathematics and Mechanics, Sankt-Peterburg State University, S.-Peterburg, Russia

February 2, 2008

Abstract

We consider the joint distribution of real and imaginary parts of eigenvalues of random matrices with independent real entries with mean zero and unit variance. We prove the convergence of this distribution to the uniform distribution on the unit disc without assumptions on the existence of a density for the distribution of entries. We assume that the entries have a finite moment of order larger than two and consider the case of sparse matrices. The results are based on previous work of Bai, Rudelson and the authors extending results to a larger class of sparse matrices.

1 Introduction

Let $X_{jk}$, $1 \le j,k < \infty$, be complex random variables with $E X_{jk} = 0$ and $E|X_{jk}|^2 = 1$. For a fixed $n \ge 1$, denote by $\lambda_1,\dots,\lambda_n$ the eigenvalues of the $n \times n$ matrix
\[ X = (X_n(j,k))_{j,k=1}^n, \qquad X_n(j,k) = \frac{1}{\sqrt n} X_{jk}, \quad \text{for } 1 \le j,k \le n, \tag{1.1} \]
and define its empirical spectral distribution function by
\[ G_n(x,y) = \frac1n \sum_{j=1}^n I_{\{\mathrm{Re}\{\lambda_j\} \le x,\ \mathrm{Im}\{\lambda_j\} \le y\}}, \tag{1.2} \]
where $I_{\{B\}}$ denotes the indicator of an event $B$. We investigate the convergence of the expected spectral distribution function $E G_n(x,y)$ to the distribution function $G(x,y)$ of the uniform distribution over the unit disc in $\mathbb{R}^2$. The main result of our paper is the following

Partially supported by RFBF grant N a, by RF grant of the leading scientific schools NSh
Partially supported by DFG Project G0-420/5-

Theorem 1.1. Let $X_{jk}$ be independent random variables with
\[ E X_{jk} = 0, \qquad E|X_{jk}|^2 = 1, \qquad \text{and} \qquad E|X_{jk}|^2 \varphi(X_{jk}) \le \kappa, \]
where $\varphi(x) = (\ln(1+|x|))^{19+\eta}$, for some $\eta > 0$. Then $E G_n(x,y)$ converges weakly to the distribution function $G(x,y)$ as $n \to \infty$.

We shall prove the same result for the following class of sparse matrices. Let $\varepsilon_{jk}$, $j,k = 1,\dots,n$, denote Bernoulli random variables which are independent in aggregate and independent of $(X_{jk})_{j,k=1}^n$, with $p_n := \Pr\{\varepsilon_{jk} = 1\}$. Consider the matrix
\[ X^{(\varepsilon)} = \frac{1}{\sqrt{n p_n}} (\varepsilon_{jk} X_{jk})_{j,k=1}^n. \]
Let $\lambda_1^{(\varepsilon)},\dots,\lambda_n^{(\varepsilon)}$ denote the (complex) eigenvalues of the matrix $X^{(\varepsilon)}$ and denote by $G_n^{(\varepsilon)}(x,y)$ the empirical spectral distribution function of the matrix $X^{(\varepsilon)}$, i.e.
\[ G_n^{(\varepsilon)}(x,y) := \frac1n \sum_{j=1}^n I_{\{\mathrm{Re}\{\lambda_j^{(\varepsilon)}\} \le x,\ \mathrm{Im}\{\lambda_j^{(\varepsilon)}\} \le y\}}. \tag{1.3} \]

Theorem 1.2. Let $X_{jk}$ be independent random variables with
\[ E X_{jk} = 0, \qquad E|X_{jk}|^2 = 1, \qquad \text{and} \qquad E|X_{jk}|^2 \varphi(X_{jk}) \le \kappa, \]
where $\varphi(x) = (\ln(1+|x|))^{19+\eta}$, for some $\eta > 0$. Assume that $p_n^{-1} = O(n^{1-\theta})$ for some $1 \ge \theta > 0$. Then $E G_n^{(\varepsilon)}(x,y)$ converges weakly to the distribution function $G(x,y)$ as $n \to \infty$.

Remark 1.3. The crucial problem in the proofs of Theorems 1.1 and 1.2 is to bound the smallest singular values $s_n(z)$ of the shifted matrices $X - zI$ and $X^{(\varepsilon)} - zI$. These bounds are based on the results obtained by Rudelson and Vershynin in [21]. In our preprint [10] we used the corresponding results of Rudelson [20] to prove the circular law in the case of i.i.d. sub-Gaussian random variables. In fact, the results in [10] imply the circular law for i.i.d. random variables with $E|X_{jk}|^4 \le \kappa_4 < \infty$, in view of the fact (explicitly stated by Rudelson in [20]) that in his results the sub-Gaussian condition is needed only for the proof of $\Pr\{\|X\| > K\} \le C \exp\{-cn\}$. Restricting oneself to the set $\Omega_n(z) = \{s_n(z) \ge c n^{-3};\ \|X\| \le K\}$ for the investigation of the smallest singular values, the bound $\Pr\{\Omega_n^{(c)}\} \le c n^{-1/2}$ follows from the results of Rudelson [20] without the assumption of sub-Gaussian tails for the matrix $X$.
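The normalization of the sparse ensemble in Theorem 1.2 can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the entries $X_{jk}$ are taken to be Rademacher signs (one admissible choice with mean zero and unit variance), and the names `sparse_matrix` and `normalized_trace` are hypothetical helpers, not part of the paper. It checks that $\frac1n \mathrm{Tr}\, X^{(\varepsilon)} (X^{(\varepsilon)})^*$ is close to 1, which is what the factor $(n p_n)^{-1/2}$ is designed to achieve.

```python
import random


def sparse_matrix(n, p, rng):
    """Sample X^(eps) = (n p)^(-1/2) (eps_jk X_jk), 1 <= j,k <= n.

    eps_jk ~ Bernoulli(p); X_jk = +-1 (Rademacher) is an illustrative
    choice with mean zero and unit variance, not the general assumption.
    """
    scale = (n * p) ** -0.5
    return [[scale * rng.choice((-1.0, 1.0)) if rng.random() < p else 0.0
             for _ in range(n)]
            for _ in range(n)]


def normalized_trace(matrix):
    """Return (1/n) Tr(X X^T), i.e. the mean squared row norm."""
    return sum(x * x for row in matrix for x in row) / len(matrix)


rng = random.Random(0)
n, theta = 400, 0.5
p_n = n ** (theta - 1.0)            # so that 1/p_n = n^(1 - theta)
X_eps = sparse_matrix(n, p_n, rng)
trace_value = normalized_trace(X_eps)   # close to 1 for large n
print(f"p_n = {p_n:.3f}, (1/n) Tr(X X^T) = {trace_value:.3f}")
```

With $\theta = 1/2$ only about $n^{3/2}$ of the $n^2$ entries are non-zero, yet the average squared row norm stays near 1.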
A similar result has been proved by Pan and Zhou in [15] based on results of Rudelson and Vershynin [21] and Bai and Silverstein [3]. The circular law under a less restrictive moment condition, of order larger than 2 only, and comparable sparsity assumptions was proved independently by T. Tao and V. Vu in [25], based on the results of [26] in connection with the multivariate Littlewood-Offord problem. The approach in this paper, though, is based on the fruitful idea of Rudelson and Vershynin to characterize the vectors leading to small singular values of matrices with independent entries via compressible and incompressible vectors, see [21], Section 3.2,

p. 5. For the approximation of the distribution of singular values of $X - zI$ we use a scheme different from the approach used in Bai [1].

The investigation of the convergence of the spectral distribution functions of real or complex (non-symmetric and non-Hermitian) random matrices with independent entries has a long history. Ginibre in 1965, [7], studied the real, complex and quaternion matrices with i.i.d. Gaussian entries. He derived the joint density for the distribution of eigenvalues of the matrix. Applying Ginibre's formula, Mehta in 1967, [17], determined the density of the expected spectral distribution function of a random matrix with Gaussian entries with independent real and imaginary parts and deduced the circular law. Pastur suggested in 1973 the circular law for the general case (see [8], p. 64). Using the Ginibre results, Edelman in 1997, [5], proved the circular law for matrices with i.i.d. Gaussian entries. Rider proved in [24] and [23] results about the spectral radius and about linear statistics of eigenvalues of non-Hermitian matrices with Gaussian entries. Girko in 1984, [6], investigated the circular law for general matrices with independent entries assuming that the distributions of the entries have densities. As pointed out by Bai [1], Girko's proof had serious gaps. Bai in [1] gave a proof of the circular law for random matrices with independent entries assuming that the entries had bounded densities and finite sixth moments. His result does not cover the case of the Wigner ensemble and in particular ensembles of matrices with Rademacher entries. These ensembles are of some interest in various applications, see e.g. [27]. Girko's [6] approach, using families of spectra of Hermitian matrices for a characterization of the circular law based on the so-called V-transform, was fruitful for all later work. See, for example, Girko's Lemma in [1]. In fact, Girko [6] was the first who used the logarithmic potential to prove the circular law.
We shall outline his approach using logarithmic potential theory. Let $\xi$ denote a random variable uniformly distributed over the unit disc and independent of the matrix $X$. For any $r > 0$, consider the matrix
\[ X(r) = X - r\xi I, \]
where $I$ denotes the identity matrix of order $n$. Let $\mu_n^{(r)}$ (resp. $\mu_n$) be the empirical spectral measure of the matrix $X(r)$ (resp. $X$), defined on the complex plane as the empirical measure of the set of eigenvalues of the matrix. We define the logarithmic potential of the expected spectral measure $E\mu_n^{(r)}(ds,dt)$ as
\[ U_n^{(r)}(z) = -\frac1n E \log|\det(X(r) - zI)| = -\frac1n \sum_{j=1}^n E \log|\lambda_j - z - r\xi|, \]
where $\lambda_1,\dots,\lambda_n$ are the eigenvalues of the matrix $X$. Note that the expected spectral measure $E\mu_n^{(r)}$ is the convolution of the measure $E\mu_n$ and the uniform distribution on the disc of radius $r$ (see Lemma 6.4 in the Appendix for details).

Lemma 1.1. Assume that the sequence $E\mu_n^{(r)}$ converges weakly to a measure $\mu$ as $n \to \infty$ and $r \to 0$. Then
\[ \mu = \lim_{n\to\infty} E\mu_n. \tag{1.4} \]

Proof. Let $J$ be a random variable which is uniformly distributed on the set $\{1,\dots,n\}$ and independent of the matrix $X$. We may represent the measure $E\mu_n^{(r)}$ as the distribution of a random variable $\lambda_J + r\xi$, where $\lambda_J$ and $\xi$ are independent. Computing the characteristic function of this measure and passing first to the limit with respect to $n \to \infty$ and then with respect to $r \to 0$ (see also Lemma 6.5 in the Appendix), we conclude the result.

Now we may fix $r > 0$ and consider the measures $E\mu_n^{(r)}$. They have bounded densities. Assume that the measures $E\mu_n$ have supports in a fixed compact set and that $E\mu_n$ converges weakly to a measure $\mu$. Applying Theorem 6.9 (Lower Envelope Theorem) from [6], p. 73 (see also Subsection 6.1 in the Appendix), we obtain that under these assumptions
\[ \liminf_{n\to\infty} U_n^{(r)}(z) = U^{(r)}(z), \tag{1.5} \]
quasi-everywhere in $\mathbb{C}$ (for the definition of quasi-everywhere see for example [6], p. 24 and Subsection 6.1 in the Appendix). Here $U^{(r)}(z)$ denotes the logarithmic potential of the measure $\mu^{(r)}$, which is the convolution of the measure $\mu$ and of the uniform distribution on the disc of radius $r$. Furthermore, note that $U^{(r)}(z)$ may be represented as
\[ U^{(r)}(z_0) = \frac{2}{r^2} \int_0^r v\, L(\mu; z_0, v)\, dv, \]
where
\[ L(\mu; z_0, v) = \frac{1}{2\pi} \int_{-\pi}^{\pi} U_\mu(z_0 + v \exp\{i\theta\})\, d\theta. \tag{1.6} \]
Applying Theorem 1.2 in [6], p. 84 (Theorem 6.2 in Subsection 6.1 in the Appendix), we get
\[ \lim_{r\to 0} U_\mu^{(r)}(z) = U_\mu(z). \]
Let $s_1(X) \ge \dots \ge s_n(X)$ denote the singular values of the matrix $X$. Since $\frac1n E\, \mathrm{Tr}(XX^*) = 1$, the sequence of measures $E\mu_n$ is weakly relatively compact. These results imply that for any $\eta > 0$ we may restrict the measures $E\mu_n$ to some compact set $K_\eta$ such that $\sup_n E\mu_n(K_\eta^{(c)}) < \eta$. Moreover, Lemma 6.2 implies the existence of a compact $K$ such that $\limsup_{n\to\infty} E\mu_n(K^{(c)}) = 0$. If we take some subsequence of the sequence of restricted measures $E\mu_n$ which converges to some measure $\mu$, then
\[ \liminf_{n\to\infty} U_{\mu_n}^{(r)}(z) = U_\mu^{(r)}(z), \quad r > 0, \qquad \text{and} \qquad \lim_{r\to 0} U_\mu^{(r)}(z) = U_\mu(z). \]
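The smoothing step can be made concrete for the limiting measure itself. The sketch below is an illustration, not part of the argument. It assumes the sign convention $U_\mu(z) = -\int \log|z - \zeta|\, \mu(d\zeta)$ and uses the standard closed form for the uniform distribution on the unit disc, $U_\mu(z) = (1 - |z|^2)/2$ for $|z| \le 1$ and $U_\mu(z) = -\log|z|$ for $|z| > 1$. Averaging this over a disc of radius $r$ around $0$, as in (1.6), gives $U_\mu^{(r)}(0) = 1/2 - r^2/4$, which tends to $U_\mu(0) = 1/2$ as $r \to 0$. The function names are hypothetical.

```python
import cmath
import math
import random


def uniform_disc(rng, radius=1.0):
    """Draw one point uniformly from the disc of the given radius."""
    return radius * math.sqrt(rng.random()) * cmath.exp(2j * math.pi * rng.random())


def potential_uniform_disc(z):
    """Closed form of U(z) = -E log|z - lambda| for lambda uniform on the unit disc."""
    a = abs(z)
    return (1.0 - a * a) / 2.0 if a <= 1.0 else -math.log(a)


def smoothed_potential_mc(z, r, samples, rng):
    """Monte Carlo estimate of U^(r)(z) = -E log|z - (lambda + r*xi)|."""
    total = 0.0
    for _ in range(samples):
        point = uniform_disc(rng) + uniform_disc(rng, r)
        total -= math.log(abs(z - point))
    return total / samples


rng = random.Random(1)
z0 = 0.3 + 0.2j
plain_estimate = smoothed_potential_mc(z0, 0.0, 100_000, rng)
smoothed_estimate = smoothed_potential_mc(0.0, 0.5, 100_000, rng)
print(plain_estimate, potential_uniform_disc(z0))   # both near 0.435
print(smoothed_estimate, 0.5 - 0.5 ** 2 / 4)        # both near 0.4375
```

The logarithmic singularity is integrable in the plane, so the plain sample mean is a usable estimator here.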
If we prove that $\liminf_{n\to\infty} U_{\mu_n}^{(r)}(z)$ exists and $U_\mu(z)$ is equal to the logarithmic potential corresponding to the uniform distribution on the unit disc, then the sequence of measures $E\mu_n$ converges weakly to the uniform distribution on the unit disc. Moreover, it is enough to prove that for some sequence $r = r(n) \to 0$,
\[ \lim_{n\to\infty} U_{\mu_n}^{(r)}(z) = U_\mu(z). \]
Furthermore, let $s_1^{(\varepsilon)}(z,r) \ge \dots \ge s_n^{(\varepsilon)}(z,r)$ denote the singular values of the matrix $X^{(\varepsilon)}(z,r) = X^{(\varepsilon)}(r) - zI$. We shall investigate the logarithmic potential $U_{\mu_n}^{(r)}(z)$. Using elementary properties of singular values (see for instance Lemma 3.3 [8], p.35), we

may represent the function $U_{\mu_n}^{(r)}(z)$ as follows:
\[ U_{\mu_n}^{(r)}(z) = -\frac1n \sum_{j=1}^n E \log s_j^{(\varepsilon)}(z,r) = -\frac12 \int_0^\infty \log x\, \nu_n^{(\varepsilon)}(dx, z, r), \]
where $\nu_n^{(\varepsilon)}(\cdot, z, r)$ denotes the expected spectral measure of the matrix $H_n^{(\varepsilon)}(z,r) = (X^{(\varepsilon)}(r) - zI)(X^{(\varepsilon)}(r) - zI)^*$, which is the expectation of the counting measure of the set of eigenvalues of the matrix $H_n^{(\varepsilon)}(z,r)$.

In Section 2 we investigate convergence of the measure $\nu_n^{(\varepsilon)}(\cdot, z) = \nu_n^{(\varepsilon)}(\cdot, z, 0)$. In Section 3 we study the properties of the limit measures $\nu(\cdot, z)$. But the crucial problem for the proof of the circular law is the so-called "regularization of potential" problem. We solve this problem using bounds for the minimal singular values of the matrices $X^{(\varepsilon)}(z) := X^{(\varepsilon)} - zI$, based on techniques developed in Rudelson [20] and Rudelson and Vershynin [21]. These bounds are given in Section 4 and in the Appendix, Subsection.2. In Section 5 we give the proof of the main Theorem. In the Appendix we collect precise statements of relevant results from potential theory and some auxiliary inequalities for the resolvent matrices.

In what follows we shall denote by $C$ and $c$ or $\alpha, \beta, \delta, \rho, \eta$ (without indices) some general absolute constants which may change from line to line. To specify a constant we shall use subindices. By $I_A$ we shall denote the indicator of an event $A$. For any matrix $G$ we denote the Frobenius norm by $\|G\|_2$ and the operator norm by $\|G\|$.

Acknowledgment. The authors would like to thank Terence Tao for drawing our attention to a gap in a first version of the paper. The authors would like to thank Dmitry Timushev for careful reading of the manuscript.

2 Convergence of $\nu_n^{(\varepsilon)}(\cdot, z)$

Denote by $F_n^{(\varepsilon)}(x,z)$ the distribution function of the measure $\nu_n^{(\varepsilon)}(\cdot, z)$,
\[ F_n^{(\varepsilon)}(x,z) = \frac1n \sum_{j=1}^n E\, I_{\{(s_j^{(\varepsilon)}(z))^2 < x\}}, \]
where $s_1^{(\varepsilon)}(z) \ge \dots \ge s_n^{(\varepsilon)}(z) \ge 0$ denote the singular values of the matrix $X^{(\varepsilon)}(z) = X^{(\varepsilon)} - zI$. For a positive random variable $\xi$ and a Rademacher random variable (r.v.) $\kappa$ consider the transformed r.v. $\widetilde\xi = \kappa\sqrt{\xi}$.
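If $\zeta$ has distribution function $F_n^{(\varepsilon)}(x,z)$, the variable $\widetilde\zeta$ has distribution function $\widetilde F_n^{(\varepsilon)}(x,z)$, given by
\[ \widetilde F_n^{(\varepsilon)}(x,z) = \frac12 \bigl(1 + \mathrm{sgn}\{x\}\, F_n^{(\varepsilon)}(x^2, z)\bigr) \]
for all real $x$. Note that this induces a one-to-one correspondence between the respective measures $\nu_n^{(\varepsilon)}(\cdot, z)$ and $\widetilde\nu_n^{(\varepsilon)}(\cdot, z)$. The limit distribution function of $F_n^{(\varepsilon)}(x,z)$ as $n \to \infty$,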

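is denoted by $F(\cdot, z)$. The corresponding symmetrization $\widetilde F(x,z)$ is the limit of $\widetilde F_n^{(\varepsilon)}(x,z)$ as $n \to \infty$. We have
\[ \sup_x |F_n^{(\varepsilon)}(x,z) - F(x,z)| = 2 \sup_x |\widetilde F_n^{(\varepsilon)}(x,z) - \widetilde F(x,z)|. \]
Denote by $s_n^{(\varepsilon)}(\alpha,z)$ (resp. $s(\alpha,z)$) and $S_n^{(\varepsilon)}(\alpha,z)$ (resp. $S(\alpha,z)$) the Stieltjes transforms of the measures $\nu_n^{(\varepsilon)}(\cdot,z)$ (resp. $\nu(\cdot,z)$) and $\widetilde\nu_n^{(\varepsilon)}(\cdot,z)$ (resp. $\widetilde\nu(\cdot,z)$) correspondingly. Then we have
\[ S_n^{(\varepsilon)}(\alpha,z) = \alpha\, s_n^{(\varepsilon)}(\alpha^2, z), \qquad S(\alpha,z) = \alpha\, s(\alpha^2, z). \]

Remark 2.1. As is shown in Bai [1], the measure $\nu(\cdot,z)$ has a density $p(x,z)$ with bounded support. More precisely, $p(x,z) \le C \max\{1, 1/\sqrt{x}\}$. Thus the measure $\widetilde\nu(\cdot,z)$ has bounded support and bounded density $\widetilde p(x,z) = |x|\, p(x^2, z)$.

Theorem 2.2. Let $E X_{jk} = 0$, $E|X_{jk}|^2 = 1$. Assume that for some function $\varphi(x) > 0$, such that $\varphi(x) \to \infty$ as $x \to \infty$ and such that the function $x/\varphi(x)$ is non-decreasing, we have
\[ \kappa := \max_{1 \le j,k < \infty} E|X_{jk}|^2 \varphi(X_{jk}) < \infty. \tag{2.1} \]
Then
\[ \sup_x |F_n^{(\varepsilon)}(x,z) - F(x,z)| \le C\kappa\, (\varphi(\sqrt{n p_n}))^{-1/6}. \tag{2.2} \]

Corollary 2.1. Let $E X_{jk} = 0$, $E|X_{jk}|^2 = 1$, and
\[ \kappa = \max_{1 \le j,k < \infty} E|X_{jk}|^3 < \infty. \tag{2.3} \]
Then
\[ \sup_x |F_n^{(\varepsilon)}(x,z) - F(x,z)| \le C (n p_n)^{-1/12}. \tag{2.4} \]

Proof. To bound the distance between the distribution functions $\widetilde F_n^{(\varepsilon)}(x,z)$ and $\widetilde F(x,z)$ we investigate the distance between their Stieltjes transforms. Introduce the Hermitian $2n \times 2n$ matrix
\[ W = \begin{pmatrix} O_n & X^{(\varepsilon)} - zI \\ (X^{(\varepsilon)} - zI)^* & O_n \end{pmatrix}, \]
where $O_n$ denotes the $n \times n$ matrix with zero entries. From Schur's complement formula (see for example [2], Ch. 08, p. 2) it follows that, for $\alpha = u + iv$, $v > 0$,
\[ (W - \alpha I_{2n})^{-1} = \begin{pmatrix} \alpha \bigl(X^{(\varepsilon)}(z)(X^{(\varepsilon)}(z))^* - \alpha^2 I\bigr)^{-1} & X^{(\varepsilon)}(z) \bigl(X^{(\varepsilon)}(z)(X^{(\varepsilon)}(z))^* - \alpha^2 I\bigr)^{-1} \\ \bigl((X^{(\varepsilon)}(z))^* X^{(\varepsilon)}(z) - \alpha^2 I\bigr)^{-1} (X^{(\varepsilon)}(z))^* & \alpha \bigl((X^{(\varepsilon)}(z))^* X^{(\varepsilon)}(z) - \alpha^2 I\bigr)^{-1} \end{pmatrix} \tag{2.5} \]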

7 where X (ε) (z) = X (ε) zi and I 2n denotes the unit matrix of order 2n. By definition of S n (ε) (α,z), we have S n (ε) (α,z) = 2n E Tr(W αi 2n). Set R(α,z) := (R j,k (α,z)) 2n j,k= = (W αi 2n). It is easy to check that We may rewrite this equality as + αs (ε) n (α,z) = 2n np n z 2n We introduce the notations + αs (ε) n (α,z) = 2n E TrWR(α,z). j,k= E (ε jk X jk R k+n,j (α,z) + ε jk X jk R j+n,k (α,z)) E R j,j+n (α,z) z 2n A = (X (ε) (z)(x (ε) (z)) α 2 I), E R j+n,j (α,z). (2.6) B = X (ε) (z)a, C = ((X (ε) (z)) X (ε) (z) α 2 I), D = C(X (ε) (z)). With these notations we rewrite equality (2.5) as follows ( ) R(α,z) = (W αi 2n ) αa B =. (2.7) D αc Equalities (2.7) and (2.6) together imply + αs (ε) n (α,z) = 2n np n j,k= E (ε jk X jk R k+n,j (α,z) + ε jk X jk R j,k+n (α,z)) z 2n E TrD z E TrB. (2.8) 2n In the what follows we shall use a simple resolvent equality. For two matrices U and V let R U = (U αi), R U+V = (U + V αi), then R U+V = R U R U VR U+V. Let {e,...e 2n } denote the canonical orthonormal basis in R 2n. Let W (jk) denote the matrix is obtained from W by replacing the both entries X j,k and X j,k by 0. In our notation we may write W = W (jk) + npn ε jk X jk e j e T k+n + npn ε jk X jk e k+n e T j. (2.9) 7
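The resolvent equality used above reads $R_{U+V} = R_U - R_U V R_{U+V}$ with $R_U = (U - \alpha I)^{-1}$; it follows from $R_U - R_{U+V} = R_U\,[(U + V - \alpha I) - (U - \alpha I)]\,R_{U+V}$. A quick numerical sanity check on a $2 \times 2$ Hermitian example is sketched below. The matrices `U`, `V` and the helper names are arbitrary illustrative choices, not objects from the paper. The check also confirms the elementary bound $|R_{jk}| \le 1/v$ for Hermitian matrices, which is the kind of resolvent property used repeatedly in this section.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def subtract(a, b):
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]


def resolvent(m, alpha):
    """Return (m - alpha I)^(-1) for a 2x2 matrix via the adjugate formula."""
    a, b = m[0][0] - alpha, m[0][1]
    c, d = m[1][0], m[1][1] - alpha
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Arbitrary Hermitian test matrices and a spectral parameter with v = 0.5.
U = [[0.3, 1.0 - 0.5j], [1.0 + 0.5j, -0.7]]
V = [[0.0, 0.2], [0.2, 0.0]]
alpha = 0.1 + 0.5j

R_U = resolvent(U, alpha)
R_UV = resolvent(add(U, V), alpha)
rhs = subtract(R_U, matmul(matmul(R_U, V), R_UV))

max_error = max(abs(R_UV[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
max_entry = max(abs(R_UV[i][j]) for i in range(2) for j in range(2))
print(max_error, max_entry)   # error at rounding level, entries bounded by 1/v = 2
```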

8 Using this representation and the resolvent equality, we get R = R (j,k) ε jk X jk R (j,k) e j e T k+n R ε jk X jk R (j,k) e k+n e T j R. (2.0) npn npn Here and in the what follows we omit the arguments α and z in the notation of resolvent matrices. For any vector a, let a T denote the transposed vector a. Applying the resolvent equality again, we obtain R = R (j,k) npn ε jk X jk R (j,k) e j e T k+n R(j,k) npn ε jk X jk R (j,k) e k+n e T j R(j,k) + T (jk), (2.) where T (jk) = npn ε jk X jk R (j,k) e j e T k+n (R(j,k) R) + npn ε jk X jk R (j,k) e j e T k+n (R(j,k) R) + npn ε jk (X jk )R (j,k) e k+n e T j (R(j,k) R) + npn ε jk X jk R (j,k) e k+n e T j (R (j,k) R). (2.2) This implies R j,k+n = R (j,k) j,k+n ε jk X jk R (j,k) j,j npn R (j,k) k+n,k+n ε jk X jk (R (j,k) npn j,k+n )2 + T (j,k) j,k+n, R k+n,j = R (j,k) k+n,j ε jk X jk R (j,k) npn k+n,j R(j,k) j,k+n ε jk X jk R (j,k) npn k+n,k+n R(j,k) j,j + T (j,k) k+n,j. (2.3) Applying these notations to the equality (2.8) and taking into account that X jk and R (jk) are independent, we get + αs (ε) n (α,z) + z 2n TrD + z 2n TrB = n 2 p n 2n 2 p n 2n np n j,k= j,k= E ε jk R (j,k) j,j R (j,k) k+n,k+n E ε jk X jk 2 E (R (j,k) j,k+n )2 j,k= E (ε jk X jk T (j,k) k+n,j + ε jkx jk T (j,k) j,k+n ). (2.4) 8

9 From (2.0) it follows immediately that for any p,q =,...,2n, j,k =,... n, R p,p R p,p (j,k) Cε jk X jk ( R jk pj R k+n,p + R jk npn p,k+n R jp ). (2.5) Since n m,l= R m,l 2 n/v 2 and n m,l= Rjk m,l 2 n/v 2, equality (2.3) implies n 2 j,k= E R (j,k) j,k+n 2 C nv4. (2.6) By definition (2.2) of T (j,k), applying standard resolvent properties, we obtain the following bounds, for any z = u + iv, v > 0, n np n j,k= E ε jk X jk T (j,k) j,k+n Cκ v 3 ϕ( np n ). (2.7) For the proof of this inequality see Lemma 6.3 in the Appendix. Using the last inequalities we obtain, that for v > 0 E R jj R k+n,k+n n n n 2 E R (jk) jj R (jk) k+n,k+n k= k= C n 2 E ε jk X jk ( R (jk) jj R k+n,j + R (jk) np n v j,k+n R jj ) k= C nv4. (2.8) Since n n R jj = n n k= R k+n,k+n = 2nTrR(α,z), we obtain n 2 k= E R (jk) jj R (jk) k+n,k+n E ( 2n TrR(α,z))2 C nv4. (2.9) Note that for any Hermitian random matrix W with independent entries on and above the diagonal we have E n TrR(α,z) E n TrR(α,z) 2 C nv2. (2.20) The proof of this inequality is easy and due to a martingale type expansion already used by Girko. Inequalities (2.9) and (2.20) together imply that for v > 0 n 2 E R (jk) jj R (jk) k+n,k+n (S(ε) n (α,z)) 2 C nv4. (2.2) k= 9

10 Denote by r(α,z) some generic function with r(α,z) not necessary the same from line to line. We may now rewrite equality (2.8) as follows + αs n (ε) (α,z) + (S n (ε) (α,z)) 2 = z 2n E TrD z 2n E TrB + r(α,z) v 3 ϕ( np n ). (2.22) where v > cϕ( np n )/n. We now investigate the functions T(α,z) = n E TrB and V (α,z) = ned. Since the arguments for both functions are similar we provide it for the first one only. By definition of the matrix B, we have TrB = npn j,k= According to equality (2.7), we have ε jk X j,k (X (ε) (z)(x (ε) (z)) α 2 ) ) kj ztra. TrB = α np n ε jk X j,k R kj ztra. j,k= Using the resolvent equality (2.0) and Lemma 6.3, we get, for v > cϕ( np n )/n T(α,z) = αn 2 Similar to (2.2) we obtain n 2 j,k= j,k= E R (jk) k,k+n R(jk) jj E R (jk) jj R (jk) k,k+n V (α,z)s(ε) n (α,z) Inequalities (2.23) and (2.24) together imply, for v > cϕ( np n )/n, Analogously we get T(α,z) = zs(ε) n (α, z) α + S (ε) n (α,z) + z Cκr(α,z) α S(ε) n (α,z) + v 3 ϕ( np n ). (2.23) C nv4. (2.24) Cκr(α,z) ϕ( np n )v 3 α + S n (ε) (α,z). (2.25) V (α,z) = zs(ε) n (α, z) α + S n (ε) (α,z) + θ C ϕ( np n )v 3 α + S n (ε) (α,z). (2.26) Inserting (2.25) and (2.26) in (2.4), we get (S (ε) n (α,z))2 + αs (ε) n (α,z) + z 2 S n (ε) (α, z) 0 α + S (ε) n (α,z) = δ n(z), (2.27)

11 where Cκ δ n (α,z) ϕ( np n )v 3 S n (ε) (α,z) + α. or equivalently ( ) 2 ) S n (ε) (α,z) α + S n (ε) (α,z) + (α + S n ( (ε) α,z) where δ n (α,z) = θ Cκr(α,z) ϕ( np n)v 3. We may rewrite the last equation as where S (ε) n (α,z) = z 2 S (ε) n (α,z) = δ n (α,z), (2.28) α + S n (ε) (α, z) (α + S n (ε) (α,z)) 2 z + δ n (α,z), (2.29) 2 δ n (α,z) δ n (α,z) = (α + S n (ε) (2.30) (α,z)) 2 z 2. Furthermore, we prove the following simple Lemma. Lemma 2.2. Let α = u + iv, v > 0. Let S(α,z) satisfy the equation and Im{S(α, z)} > 0. Then the following inequality holds. α + S(α,z) S(α,z) = (α + S(α,z)) 2 z 2. (2.3) S(α,z) 2 z 2 S(α,z) 2 α + S(α,z) 2 v v +. Proof. For α = u + iv with v > 0, the Stieltjes transform S(α,z) satisfies the following equation α + S(α,z) S(α,z) = (α + S(α,z)) 2 z 2. (2.32) Comparing the imaginary parts of both sides of this equation, we get Im{α + S(α,z)} = Im{α + S(α,z)} α + S(α,z) 2 + z 2 (α + S(α,z)) 2 z 2 + v. (2.33) 2 Equations (2.3) and (2.33) together imply Im{α + S(α,z)} ( α + S(α,z) 2 + z 2 ) (α + S(α,z)) 2 z 2 2 = v. (2.34) Since v > 0 and Im{α + S(α,z)} > 0, it follows that α + S(α,z) 2 + z 2 (α + S(α,z)) 2 z 2 2 > 0.

12 In particular we have S(α,z). Inequality (2.34) and the last remark together imply The proof is completed. α + S(α,z) 2 + z 2 (α + S(α,z)) 2 z 2 2 = v Im{α + S(α,z)} To compare the functions S(α,z) and S n (α,z) we prove Lemma 2.3. Let Then the following inequality holds Proof. By the assumption, we have δ n (α,z) v 2. α + S(ε) n (α,z) 2 + z 2 (α + S n (ε) (α,z)) 2 z 2 v 2 4. Im{ δ n (α,z) + α} > v 2. Repeating the arguments of Lemma 2.2 completes the proof. v v +. The next Lemma give us a bound for the distance between the Stieltjes transforms S(α,z) and S (ε) n (α,z). Lemma 2.4. Let Then S (ε) δ n (α,z) v 8. n (α,z) S(α,z) 4 δ n (α,z). v Proof. Note that S(α,z) and S (ε) n (α,z) satisfy the equations α + S(α,z) S(α,z) = (α + S(α,z) 2 z 2 (2.35) and S n (ε) (α,z) = α + S n (ε) (α, z) (α + S n (ε) (α,z) 2 z + δ n (α,z) (2.36) 2 respectively. These equations together imply S(α,z) S (ε) n (α,z) = (α + S (ε) n (α,z))(α + S(α,z)) + z 2 ((α + S(α,z) 2 z 2 )((α + S (ε) n (α,z) 2 z 2 ) (S(α,z) S (ε) n (α,z)) + δ n (α,z). (2.37) 2

13 Applying inequality ab 2 (a2 + b 2 ), we get (α + S n (ε) (α,z))(α + S(α,z)) + z 2 ((α + S(α,z) 2 z 2 )((α + S n (ε) (α,z) 2 z 2 ( ) α + S n(α,z) 2 + z 2 2 (α + S n (ε) (α,z)) 2 z ( α + S(α,z) 2 + z 2 ) 2 (α + S(α,z)) 2 z 2 2. The last inequality and Lemmas 2.2 and 2.3 together imply (α + S n (ε) (α,z))(α + S(α,z)) + z 2 ((α + S(α,z) 2 z 2 )((α + S n (ε) (α,z) 2 z 2 ) v 4. This completes the proof of the Lemma. To bound the distance between the distribution function F n (x,z) and the distribution function F(x,z) corresponding the Stieltjes transforms S n (α,z) and S(α,z) we use Corollary 2.3 from [9]. In the next lemma we give an integral bound for the distance between the Stieltjes transforms S(α,z) and S (ε) n (α,z). Lemma 2.5. For v v 0 (n) = c(ϕ( np n )) 6 the inequality holds. S(α,z) S (ε) n (α,z) du C( + z 2 )κ ϕ( np n )v 7. Proof. Note that (α + s (ε n (α,z)) 2 z 2 α + s (ε n (α,z) z ) α + s (ε n (α,z) + z ) v 2. (2.38) It follows from here that δ n (α,z) C v 5 ϕ( np n) and δ n (α,z) v/8 for v c(ϕ( np n )) /6. Lemma 2.4 implies that it is enough to prove inequality where γ n = δ n (α,z) du Cγ n, C v 6 ϕ( np n). By definition of δ(α,z), we have δ n (α,z) du cκ v 3 ϕ( np n ) du (α + S n (ε) (α,z)) 2 z 2. (2.39) 3

14 Furthermore, the representation (2.29) implies that (α + S n (ε) (α,z)) 2 z 2 S (ε) n (α, z) α + S n (ε) (α,z) + Note that, according to the relation (2.27), δn (α,z) α + S n (ε) (α,z). (2.40) α + S n (ε) (α,z) z 2 (ε) Sn (α, z) α + S n (ε) (α,z) + 2 S(ε) n (α,z) + δ n (α,z) α + S n (ε) (α,z). (2.4) 2 This inequality implies S n (ε) (α, z) α + S n (ε) (α,z) du C( + z 2 ) v 2 S n (ε) (α,z) 2 du+ It follows from the relation (2.27), for v > c(ϕ( np n )) 6, that δ n (α,z) δ n (α,z) S n (ε) (α, z) α + S n (ε) (α,z) du. (2.42) Cκ (ϕ( < /2. (2.43) np n ))v4 The last two inequalities together imply that for sufficiently large n and v > c(ϕ( np n )) 6, S n (ε) (α, z) α + S n (ε) (α,z) du C( + z 2 ) v 2 S n (ε) (α,z) 2 du C( + z 2 ) v 3. (2.44) The inequalities (2.4), (2.39), and the definition of δ n (α,z) together imply If we choose v such that δ n (α,z) du C( + z 2 ) v 6 ϕ( np n ) + Cκ v 4 ϕ( np n ) Cκ v 4 ϕ( np n) < 2 we obtain δ n (α,z) du. (2.45) δ n (α,z) du C( + z 2 ) ϕ( np n )v 6. (2.46) In Section 3 we show that the measure ν(,z) has bounded support and bounded density for any z. To bound the distance between the distribution functions F n (ε) (x,z) and F(x,z) we may apply Corollary 3.2 from [9] (see also Lemma 6.6 in the Appendix). We take V = and v 0 = C(ϕ( np n )) 6. Then Lemmas 2.2 and 2.3 together imply sup F n (ε) (x,z) F(x,z) C(ϕ( np n )) 6. (2.47) x 4
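The limiting equation of Lemma 2.2 can be explored numerically. The sketch below assumes that, with signs restored, equation (2.31) reads $S = -(\alpha + S)/((\alpha + S)^2 - |z|^2)$, and that the inequality of Lemma 2.2 is $1 - |S|^2 - |z|^2 |S|^2/|\alpha + S|^2 \ge v/(v+1)$; both forms are reconstructions from context. Writing the right-hand side as $-\frac12[(w - |z|)^{-1} + (w + |z|)^{-1}]$ with $w = \alpha + S$ shows that the map preserves the upper half-plane and has derivative bounded by $(\mathrm{Im}\, w)^{-2}$, so plain fixed-point iteration converges when $v \ge 1$. For $z = 0$ the equation reduces to $S = -1/(\alpha + S)$, the semicircle law. The function names are hypothetical.

```python
def solve_limit_stieltjes(alpha, abs_z, iterations=500):
    """Iterate S <- -(alpha + S) / ((alpha + S)^2 - |z|^2) starting from S = i.

    Intended for Im(alpha) >= 1, where the map is a contraction; smaller
    imaginary parts may need a damped or continuation scheme.
    """
    s = 1j
    for _ in range(iterations):
        w = alpha + s
        s = -w / (w * w - abs_z ** 2)
    return s


def residual(s, alpha, abs_z):
    """Defect of the limiting equation at s."""
    w = alpha + s
    return s + w / (w * w - abs_z ** 2)


def lemma_margin(s, alpha, abs_z):
    """Left-hand side 1 - |S|^2 - |z|^2 |S|^2 / |alpha + S|^2 of Lemma 2.2."""
    w = alpha + s
    return 1.0 - abs(s) ** 2 - abs_z ** 2 * abs(s) ** 2 / abs(w) ** 2


semicircle_value = solve_limit_stieltjes(1j, 0.0)      # equals i (sqrt(5) - 1) / 2
alpha, abs_z = 0.3 + 1.0j, 1.5
s = solve_limit_stieltjes(alpha, abs_z)
v = alpha.imag
print(semicircle_value, s, lemma_margin(s, alpha, abs_z), v / (v + 1.0))
```

For the exact solution the margin equals $v/\mathrm{Im}\{\alpha + S\}$, which is the content of (2.34); the bound $v/(v+1)$ then follows from $\mathrm{Im}\, S \le |S| \le 1$.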

3 Properties of the measure $\widetilde\nu(\cdot, z)$

In this Section we investigate the properties of the measure $\widetilde\nu(\cdot,z)$. At first note that there exists a solution $S(\alpha,z)$ of the equation
\[ S(\alpha,z) = -\frac{S(\alpha,z) + \alpha}{(S(\alpha,z) + \alpha)^2 - |z|^2} \tag{3.1} \]
such that, for $v > 0$, $\mathrm{Im}\{S(\alpha,z)\} \ge 0$ and $S(\alpha,z)$ is an analytic function in the upper half-plane $\alpha = u + iv$, $v > 0$. This follows from the relative compactness of the sequence of analytic functions $S_n(\alpha,z)$, $n \in \mathbb{N}$. From (2.35) it follows immediately that
\[ |S(\alpha,z)| \le 1. \tag{3.2} \]
Set $y = S(x,z) + x$ and consider the equation (2.35) on the real line,
\[ y = -\frac{y}{y^2 - |z|^2} + x, \tag{3.3} \]
or
\[ y^3 - x y^2 + (1 - |z|^2)\, y + x |z|^2 = 0. \tag{3.4} \]
Set
\[ x_1^2 = \frac{8|z|^4 + 20|z|^2 - 1 + \sqrt{(1 + 8|z|^2)^3}}{8|z|^2}, \qquad x_2^2 = \frac{8|z|^4 + 20|z|^2 - 1 - \sqrt{(1 + 8|z|^2)^3}}{8|z|^2}. \tag{3.5} \]
It is straightforward to check that $\sqrt{3(1 - |z|^2)} \le |x_1|$ for $|z| \le 1$, and that $x_2^2 < 0$ for $|z| < 1$, $x_2^2 = 0$ for $|z| = 1$, and $x_2^2 > 0$ for $|z| > 1$.

Lemma 3.1. In the case $|z| \le 1$ equation (3.4) has one real root for $|x| \le x_1$ and three real roots for $|x| > x_1$. In the case $|z| > 1$ equation (3.4) has one real root for $x_2 \le |x| \le x_1$ and has three real roots for $|x| \le x_2$ or for $|x| \ge x_1$.

Proof. Set
\[ L(y) := y^3 - x y^2 + (1 - |z|^2)\, y + x |z|^2. \]
We consider the roots of the equation
\[ L'(y) = 3y^2 - 2xy + (1 - |z|^2) = 0. \tag{3.6} \]
The roots of this equation are
\[ y_{1,2} = \frac{x \pm \sqrt{x^2 - 3(1 - |z|^2)}}{3}. \]

16 This implies that, for z and for x 3( z 2 ), the equation (3.4) has one real root. Furthermore, direct calculations show that L(y )L(y 2 ) = 27 ( 4 z 2 x 4 + (8 z z 2 )x 2 + 4( z 2 ) 3). Solving the equation L(y )L(y 2 ) = 0 with respect to x, we get for z and 3( z 2 ) x x and for z and x > L(y )L(y 2 ) 0, 20+8 z (+8 z 2 ) z 2 L(y )L(y 2 ) < 0, These relations imply that for z the function L(y) has three real roots for x x and one real root for x < x. Consider the case z > now. In this case y,2 are real for all x and x 2 2 > 0. Note that for x x 2 and for x x and L(y )L(y 2 ) 0 L(y )L(y 2 ) > 0 for x 2 < x < x. These implies that for z > and for x 2 < x < x the function L(y) has one real root and for x x 2 or for x x the function L(y) has three real roots. The Lemma is proved. Remark 3.. From Lemma 3. it follows that the measure ν(x,z) has a density p(x,z) and p(x,z), for all x and z; for z, if x x then p(x,z) = 0; for z, if x x or x x 2 then p(x,z) = 0; p(x, z) > 0 otherwise. The next lemma is an analogue of Lemma 4.4 in Bai []. Lemma 3.2. The following equality ( ) log xν(dx, z) s holds. 0 6 = R{g(x,z)} (3.7) 2
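The root count in Lemma 3.1 can be cross-checked with the discriminant of the cubic. The sketch below assumes that equation (3.4), with signs restored, reads $y^3 - x y^2 + (1 - |z|^2) y + x |z|^2 = 0$. For this cubic the standard discriminant is $\Delta = 4|z|^2 x^4 + (1 - 20|z|^2 - 8|z|^4) x^2 - 4(1 - |z|^2)^3$, and $\Delta > 0$ holds exactly when there are three distinct real roots. The zeros of $\Delta$ in $x^2$ are $(8|z|^4 + 20|z|^2 - 1 \pm (1 + 8|z|^2)^{3/2})/(8|z|^2)$, which is the form assumed here for $x_1^2$ and $x_2^2$. The function names are hypothetical, and `edge_squares` requires $|z| > 0$.

```python
import math


def discriminant(x, abs_z):
    """Discriminant of y^3 - x y^2 + (1 - |z|^2) y + x |z|^2 in the variable y."""
    a = abs_z ** 2
    return 4 * a * x ** 4 + (1 - 20 * a - 8 * a * a) * x * x - 4 * (1 - a) ** 3


def real_root_count(x, abs_z):
    """Return 3 if the cubic has three distinct real roots, otherwise 1.

    The boundary case of a repeated root (discriminant zero) is not
    distinguished and is reported as 1.
    """
    return 3 if discriminant(x, abs_z) > 0 else 1


def edge_squares(abs_z):
    """Return (x_1^2, x_2^2), the zeros of the discriminant in x^2; needs |z| > 0."""
    a = abs_z ** 2
    base = 8 * a * a + 20 * a - 1
    root = (1 + 8 * a) ** 1.5
    return (base + root) / (8 * a), (base - root) / (8 * a)


inside = edge_squares(0.5)    # |z| < 1: x_2^2 is negative
outside = edge_squares(1.5)   # |z| > 1: a gap |x| <= x_2 opens around the origin
print(inside, outside)
print([real_root_count(x, 0.5) for x in (2.0, 2.5)])        # [1, 3]
print([real_root_count(x, 1.5) for x in (0.1, 1.0, 3.5)])   # [3, 1, 3]
```

One real root means a pair of complex conjugate roots, which is the regime in which the density of Remark 3.1 is positive.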

17 Proof. Following Bai [] Lemma 4.4, we consider We have I(C) := C 0 y(x) dx. (3.8) s y 3 + 2xy 2 + x 2 y z 2 y + y + x = 0. (3.9) Taking the derivatives with respect to x and s correspondingly, we get y ( 3y 2 + 4xy + ( z 2 + x 2 ) ) = 2y(x + y) (3.0) x and y ( 3y 2 + 4xy + ( z 2 + x 2 ) ) = 2sy. (3.) s These equalities together imply From equation (3.9) it follows that y s = 2sy y + 2y(x + y) x. (3.2) + 2y(y + x) = ± + 4 z 2 y 2. (3.3) Using the results of Remark 3., it is straightforward to check that for z + 2y(y + x) = + 4 z 2 y 2 (3.4) and for z > there exists a number x 0 such that + 4 z 2 y 2 = 0. Furthermore, we have for x 0 x 0 + 2y(y + x) = + 4 z 2 y 2 (3.5) and for x < x 0 we obtain Using these equalities, we get For z, we have 0 C y 0 s dx = C 0 C + 2y(y + x) = + 4 z 2 y 2. (3.6) y 0 s dx = C 2sy y + 4 z 2 y 2 x dx = 2sy y dx. (3.7) + 2y(x + y) x s ( + 4 z 4 z 2 2 y 2 ( C) + ) + 4 z 2 ( z 2 ). (3.8) 7

18 In the limit C, we get, for z, 0 For z >, we have 0 y s dx = 0 x 0 2sy y x0 + 4 z 2 y 2 x dx Similar to Bai [] (equality (4.39)) we have 0 C y(x)dx = After differentiation we get s 0 0 C = ln C + = ln C + y(x)dx = 0 0 ln uν(du,z) = s y s dx = s 2. (3.9) C 0 0 2sy y + 4 z 2 y 2 x dx = u + x ν(du,z)dx [ln(u + C) ln u]ν(du,z) s 2 z 2. (3.20) ln( + u C )ν(du,z) ln uν(du, z). (3.2) Relations (3.9) (3.22) together imply the result. 0 ln( + u C )ν(du,z) 0 0 C y(x)dx. (3.22) s 4 The smallest singular value Let X (ε) = npn (ε jk X jk ) n j,k= be an n n matrix with independent entries ε jkx jk, j,k =,...,n. Assume that E X jk = 0 and E Xjk 2 = and ε jk denote Bernoulli random variables with p n = Pr{ε jk = }, j,k =,...,n. Denote by s (ε) (z)... s(ε) n (z) the singular values of the matrix X (ε) (z) := X (ε) zi. In this Section we prove a bound for the minimal singular value of the matrices X (ε) (z). We prove the following result. Theorem 4.. Let X jk be independent random complex variables with E X jk = 0 and E X jk 2 =, which are uniformly integrable, i.e. max j,k E X jk 2 I { Xjk >M} 0 as M 0. (4.) Let ε jk, j,k =,...,n be independent Bernoulli random variables with p n := Pr{ε jk = }. Assume that ε jk are independent from X jk in aggregate. Let p n = O(n θ ) for some 0 < θ. Let K. Then there exist constants c,c,b > 0 depending on θ and K such that for any z C and positive ε we have Pr{s (ε) n (z) ε/nb ; s (ε) (z) Kn p n } exp{ cp n n} + C ln n npn. (4.2) 8
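Some lower bound on $n p_n$ is unavoidable in a result of this type: if a whole row of $(\varepsilon_{jk})$ vanishes, then $X^{(\varepsilon)}$ is singular. A single row is zero with probability $(1 - p_n)^n$, and the rows are independent, so the probability that at least one row is zero equals $1 - (1 - (1 - p_n)^n)^n$. The sketch below evaluates this expression; the function name `zero_row_probability` is a hypothetical helper.

```python
import math


def zero_row_probability(n, p):
    """Probability that an n x n Bernoulli(p) matrix has at least one all-zero row."""
    return 1.0 - (1.0 - (1.0 - p) ** n) ** n


n = 1000
at_threshold = zero_row_probability(n, math.log(n) / n)      # n p_n = ln n
above_threshold = zero_row_probability(n, 5 * math.log(n) / n)
dense = zero_row_probability(n, 0.5)
print(at_threshold, above_threshold, dense)
```

At $n p_n = \ln n$ the probability stays bounded away from zero (roughly $1 - e^{-1}$), while a constant factor above that threshold already makes it negligible.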

Remark 4.2. Let $X_{jk}$ be i.i.d. random variables with $E X_{jk} = 0$ and $E|X_{jk}|^2 = 1$. Then the condition (4.1) holds.

Remark 4.3. Consider the event $A$ that there exists at least one row with zero entries only. Its probability is given by
\[ \Pr\{A\} \ge 1 - (1 - (1 - p_n)^n)^n. \tag{4.3} \]
Simple calculations show that if $n p_n \le \ln n$ for all $n \ge 1$, then
\[ \Pr\{A\} \ge \delta > 0. \tag{4.4} \]
Hence in the case $n p_n \le \ln n$ and $n p_n \to \infty$ we have no invertibility with positive probability.

Remark 4.4. The proof of Theorem 4.1 uses ideas of Rudelson and Vershynin [21] to classify, with high probability, vectors $x$ in the $(n-1)$-dimensional unit sphere $S^{(n-1)}$ such that $\|X^{(\varepsilon)}(z)x\|_2$ is extremely small into two classes, called compressible and incompressible vectors. We develop our approach for shifted sparse and normalized matrices $X^{(\varepsilon)}(z)$. The generalization to the case of complex sparse and shifted matrices $X^{(\varepsilon)}(z)$ is straightforward. For details see for example the paper of Götze and Tikhomirov [10] and the proof of Lemma 4.1 below.

Remark 4.5. We can relax the condition $p_n^{-1} = O(n^{1-\theta})$ to $p_n^{-1} = o(n/\ln^2 n)$. The quantity $B$ in Theorem 4.1 should be of order $\ln n$ in this case. See Remark 4.10 for details.

Lemma 4.1. Let $x = (x_1,\dots,x_n) \in S^{(n-1)}$ be a fixed unit vector and $X^{(\varepsilon)}(z)$ be a matrix as in Theorem 4.1. Then there exist some positive absolute constants $\gamma_0$ and $c_0$ such that for any $0 < \tau \le \gamma_0$
\[ \Pr\{\|X^{(\varepsilon)}(z)x\|_2 \le \tau\} \le \exp\{-c_0 n p_n\} \vee \exp\{-c_0 n\}, \tag{4.5} \]
where $x \vee y$ denotes the larger of $x$ and $y$.

Proof of Lemma 4.1. Recall that $E X_{ij} = 0$ and $E|X_{ij}|^2 = 1$. Assume first that $X_{ij}$ are real independent r.v.'s with mean zero and variance at least 1. Let $X_{ij}^{(\varepsilon)} = X_{ij}\varepsilon_{ij}$ with independent Bernoulli variables $\varepsilon_{ij}$ which are independent of $X_{ij}$ in aggregate, and let $z = 0$. Assume also that $x$ is a real vector. Then
\[ \|X^{(\varepsilon)}x\|_2^2 = \frac{1}{n p_n} \sum_{j=1}^n \Bigl| \sum_{k=1}^n x_k X_{jk}\varepsilon_{jk} \Bigr|^2 =: \frac{1}{n p_n} \sum_{j=1}^n \zeta_j^2. \tag{4.6} \]
By Chebyshev's inequality we have
\[ \Pr\Bigl\{\sum_{j=1}^n \zeta_j^2 < \tau^2 n p_n\Bigr\} = \Pr\Bigl\{\frac{\tau^2 n p_n}{2} - \frac12 \sum_{j=1}^n \zeta_j^2 > 0\Bigr\} \le \exp\{n p_n \tau^2 t^2/2\} \prod_{j=1}^n E \exp\{-t^2 \zeta_j^2/2\}. \tag{4.7} \]

20 Using e t2 /2 = E exp{itξ} where ξ is a standard Gaussian random variable, we obtain Pr{ ζj 2 < τ 2 np n } exp{np n τ 2 t 2 /2} n E ξj n k= E εjk X jk exp{itξ j x k ε jk X jk }, (4.8) where ξ j, j =,...,n denote i.i.d. standard Gaussian r.v. s and E Z denotes expectation with respect to Z conditional on all other r. v. s. For every α, x [0, ] and ρ (0, ) the following inequality holds ( β ρ αx + α x β β, (4.9) α) (see [4], inequality (3.7)). Take α = Pr{ ξ j C } for some absolute positive constant C which will be chosen later. Then it follows from (4.8) that Pr{ ζj 2 < τ2 np n } exp{np n τ 2 t 2 /2} ( ( n n ) ) α E ξ j E εjk X jk exp{itξ j x k ε jk X jk } ξ j C + α. (4.0) Furthermore, we note that k= E εjk X jk exp{itξ j x k ε jk X jk } exp{ 2 ( E ε jk X jk exp{itξ j x k ε jk X jk } 2 )} ( exp{ p n ( p n )( Ref jk (tx k ξ j )) + p ) n 2 ( f jk(tx k ξ j ) 2 ) }, (4.) where f jk (u) = E exp{iux jk }. Assuming (4.), choose a constant M > 0 such that supe X jk 2 I { Xjk >M} /2. (4.2) jk Since cos x /24x 2 for x, conditioning on the event ξ j C, we get for 0 < t /(MC ) Ref jk (tx k ξ j ) = E Xjk ( cos(tx k X jk ξ j )) 24 t2 x 2 k ξ2 j E X jk 2 I { Xjk M}, (4.3) and similarly f jk (tx k ξ j ) 2 = E Xjk ( cos(tx k Xjk ξ j )) 24 t2 x 2 k ξ2 je X jk 2 I { Xjk M} (4.4) It follows from (4.) for 0 < t < /(MC ) and for some constant c > 0 E εjk X jk exp{itξ j x k ε jk X jk } exp{ cp n t 2 x 2 k ξ2 j }. (4.5) 20

21 This implies that conditionally on ξ j C and for 0 < t /(MC ) n k= E εjk X jk exp{itξ j x k ε jk X jk } exp{ cp n t 2 ξj 2 }. (4.6) Let Φ 0 (x) := 2Φ(x), x > 0 where Φ(x) denotes the standard Gaussian distribution function. It is straightforward to show that ( ) E ξj exp{ cpn t 2 ξj 2 } ξ j C ) (C + 2t 2 cp n Applying Taylor s formula, we obtain Φ 0 ( C + 2ct 2 p n ) Φ 0 (C ) = + 2ct 2 p n Φ 0 Φ 0 Φ 0 (C ) ( ) = + + 2t 2 cp n (C ( + ) + 2ct 2 p n Φ 0 (C ) Using that for 0 < y < 8 we have y/4 + y y/2 and Φ 0 Φ 0 (C ), we get Φ 0 ( C + 2ct 2 p n ) Φ 0 (C ). (4.7). (4.8) ( C ( + ) + 2t 2 p n c + ct 2 p n Φ 0 (C ) Φ 0 (C ). (4.9) We may choose C large enough such that following inequalities hold E ξj ( exp{ cp n t 2 ξ 2 j } ξj C ) + ct2 p n /8 + ct 2 p n /4 exp{ ct 2 p n /24} (4.20) for all t /(MC ) < 8. Inequalities (4.8), (4.9), (4.), (4.20) together imply that for any β (0,) Pr { n ζj 2 < τ 2 } ( np n exp{npn τ 2 t 2 /2} exp{ cβnt 2 p n /24} + ( ) nβ β β ). (4.2) α Without loss of generality we may take C sufficiently large, such that α 4/5 and choose β = 2/5. Then we obtain Pr{ ( ζj 2 < τ 2 np n } exp{np n τ 2 t 2 /2} exp{ ct 2 np n /60} + 2 ( )2n 3 ). (4.22) 2

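For $\tau < \sqrt{c/60}$ we conclude from here that for $|t| \le 1/(MC_1)$
\[
\Pr\Big\{\sum_{j=1}^n |\zeta_j|^2 < \tau^2 n p_n\Big\} \le \exp\{-ct^2 n p_n/120\}. \tag{4.23}
\]
Inequality (4.23) implies that inequality (4.5) holds with some positive constant $c_0 > 0$. This concludes the proof in the real case.

Consider now the general case. Let $X_{jk} = \xi_{jk} + i\eta_{jk}$ with $i = \sqrt{-1}$, $\mathbf{E}|X_{jk}|^2 = 1$, and let $x_k = u_k + iv_k$ and $z = u + iv$. In this notation we have
\[
\Pr\{\|(X^{(\varepsilon)} - zI)x\|_2 \le \tau\} \le \exp\{\tau^2 n p_n t^2/2\} \min\Big\{ \prod_{j=1}^n \mathbf{E}\exp\Big\{-t^2\Big|\sum_{k=1}^n (\xi_{jk}u_k - \eta_{jk}v_k)\varepsilon_{jk} - \sqrt{np_n}\,(uu_j - vv_j)\Big|^2/2\Big\},\ \prod_{j=1}^n \mathbf{E}\exp\Big\{-t^2\Big|\sum_{k=1}^n (\xi_{jk}v_k + \eta_{jk}u_k)\varepsilon_{jk} - \sqrt{np_n}\,(vu_j + uv_j)\Big|^2/2\Big\} \Big\}. \tag{4.24}
\]
Note that for $x = (x_1,\dots,x_n) \in S^{(n-1)}$ (the unit sphere in $\mathbb{C}^n$) and for any set $A \subset \{1,\dots,n\}$
\[
\max\Big\{\sum_{k \in A} |x_k|^2,\ \sum_{k \in A^c} |x_k|^2\Big\} \ge 1/2. \tag{4.25}
\]
For any $j = 1,\dots,n$ we introduce the set $A_j$ as follows:
\[
A_j := \{k \in \{1,\dots,n\} : \mathbf{E}|\xi_{jk}u_k - \eta_{jk}v_k|^2 \ge |x_k|^2/2\}. \tag{4.26}
\]
It is straightforward to check that for any $k \notin A_j$
\[
\mathbf{E}|\eta_{jk}u_k + \xi_{jk}v_k|^2 \ge |x_k|^2/2. \tag{4.27}
\]
According to inequality (4.25), for any $j = 1,\dots,n$, there exists a set $B_j$ such that
\[
\sum_{k \in B_j} |x_k|^2 \ge 1/2 \tag{4.28}
\]
and for any $k \in B_j$
\[
\mathbf{E}|\xi_{jk}u_k - \eta_{jk}v_k|^2 \ge |x_k|^2/2, \tag{4.29}
\]
or
\[
\mathbf{E}|\eta_{jk}u_k + \xi_{jk}v_k|^2 \ge |x_k|^2/2. \tag{4.30}
\]
Introduce the following random variables for any $j, k = 1,\dots,n$:
\[
\zeta_{jk} := \xi_{jk}u_k - \eta_{jk}v_k, \tag{4.31}
\]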

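and
\[
\widetilde\zeta_{jk} := \eta_{jk}u_k + \xi_{jk}v_k. \tag{4.32}
\]
The inequalities (4.29) and (4.30) together imply that one of the following two inequalities
\[
\operatorname{card}\big\{j : \text{for any } k \in B_j,\ \mathbf{E}|\zeta_{jk}|^2 \ge |x_k|^2/2\big\} \ge n/2 \tag{4.33}
\]
or
\[
\operatorname{card}\big\{j : \text{for any } k \in B_j,\ \mathbf{E}|\widetilde\zeta_{jk}|^2 \ge |x_k|^2/2\big\} \ge n/2 \tag{4.34}
\]
holds. If (4.33) holds we shall bound the first term on the right hand side of (4.24). In the other case we shall bound the second term. In what follows we may repeat the arguments leading to inequalities (4.10)–(4.16). Thus the Lemma is proved.

For any $q_n \in (0,1)$ and $K > 0$ to be chosen later we define $K_n := Kn\sqrt{p_n}$, $\widehat q_n := q_n/(\ln(2/p_n)\ln K_n)$ and $\widehat p_n := p_n/(\ln(2/p_n)\ln K_n)$. Without loss of generality we shall assume that
\[
\ln K_n/|\ln\gamma_0| \ge 1 \quad\text{and}\quad \ln K_n > 1. \tag{4.35}
\]

Proposition 4.6. Assume there exist an absolute constant $c > 0$ and values $\gamma_n, q_n \in (0,1)$ such that for any $x \in \mathcal{C} \subset S^{(n-1)}$
\[
\Pr\{\|X^{(\varepsilon)}(z)x\|_2 \le \gamma_n \text{ and } \|X^{(\varepsilon)}(z)\| \le K_n\} \le \exp\{-cnq_n\} \tag{4.36}
\]
holds. Then there exists a constant $\delta_0 > 0$ depending on $K$ and $c$ only such that, for $k < \delta_0 n \widehat q_n$,
\[
\Pr\Big\{\inf_{x \in S^{(k-1)} \cap \mathcal{C}} \|X^{(\varepsilon)}(z)x\|_2 \le \gamma_n/2 \text{ and } \|X^{(\varepsilon)}(z)\| \le K_n\Big\} \le \exp\{-cnq_n/8\}.
\]

Proof. Let $\eta > 0$ be chosen later. There exists an $\eta$-net $\mathcal{N}$ in $S^{(k-1)} \cap \mathcal{C}$ of cardinality $|\mathcal{N}| \le (3/\eta)^{2k}$ (see e.g. Lemma 3.4 in [20]). By condition (4.36), we have for $\tau \le \gamma_n$
\[
\Pr\{\text{there exists } x \in \mathcal{N} : \|X^{(\varepsilon)}(z)x\|_2 < \tau \text{ and } \|X^{(\varepsilon)}(z)\| \le K_n\} \le \Big(\frac{3}{\eta}\Big)^{2k} \exp\{-cnq_n\}. \tag{4.37}
\]
Let $V$ be the event that $\|X^{(\varepsilon)}(z)\| \le K_n$ and $\|X^{(\varepsilon)}(z)y\|_2 \le \frac12\tau$ for some point $y \in S^{(k-1)} \cap \mathcal{C}$. Assume that $V$ occurs and choose a point $x \in \mathcal{N}$ such that $\|y - x\|_2 \le \eta$. Then
\[
\|X^{(\varepsilon)}(z)x\|_2 \le \|X^{(\varepsilon)}(z)y\|_2 + \|X^{(\varepsilon)}(z)\|\,\|x - y\|_2 \le \tfrac12\tau + K_n\eta = \tau \tag{4.38}
\]
if we set $\eta = \tau/(2K_n)$. Hence,
\[
\Pr(V) \le \Big(\Big(\frac{3}{\eta}\Big)^{2\delta_0/(\ln K_n \ln(2/p_n))} \exp\{-c\}\Big)^{nq_n}. \tag{4.39}
\]
Note that under assumption (4.35) we have
\[
\frac{2\ln(3/\eta)}{\ln 2\,\ln K_n} \le 10. \tag{4.40}
\]
Choosing $\delta_0 = \frac{c}{80}$ and $\tau = \gamma_n$, we conclude the proof.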

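Following Rudelson and Vershynin [2], we shall partition the unit sphere $S^{(n-1)}$ into the two sets of so-called compressible and incompressible vectors and we will show the invertibility of $X$ on each set separately.

Definition 4.7. Let $\delta, \rho \in (0,1)$. A vector $x \in \mathbb{R}^n$ is called sparse if $|\operatorname{supp}(x)| \le \delta n$. A vector $x \in S^{(n-1)}$ is called compressible if $x$ is within Euclidean distance $\rho$ from the set of all sparse vectors. A vector $x \in S^{(n-1)}$ is called incompressible if it is not compressible. The sets of sparse, compressible and incompressible vectors depending on $\delta$ and $\rho$ will be denoted by
\[
\mathrm{Sparse}(\delta),\quad \mathrm{Comp}(\delta,\rho),\quad \mathrm{Incomp}(\delta,\rho), \tag{4.41}
\]
respectively.

Lemma 4.2. Let $X^{(\varepsilon)}(z)$ be a random matrix as in Theorem 1.2, and let $K_n = Kn\sqrt{p_n}$ with a constant $K$. Assume there exist an absolute constant $c > 0$ and values $\gamma_n, q_n \in (0,1)$ such that for any $x \in \mathcal{C} \subset S^{(n-1)}$
\[
\Pr\{\|X^{(\varepsilon)}(z)x\|_2 \le \gamma_n \text{ and } \|X^{(\varepsilon)}(z)\| \le K_n\} \le \exp\{-cnq_n\} \tag{4.42}
\]
holds. Then there exist $\delta_1, c_1$ that depend on $K$ and $c$ only, such that
\[
\Pr\Big\{\inf_{x \in \mathrm{Comp}(\delta_1 \widehat q_n, \rho_n) \cap \mathcal{C}} \|X^{(\varepsilon)}(z)x\|_2 \le \gamma_n \text{ and } \|X^{(\varepsilon)}(z)\| \le K_n\Big\} \le \exp\{-c_1 nq_n\}, \tag{4.43}
\]
where $\rho_n := \gamma_n/(4K_n)$.

Proof. At first we estimate the invertibility for sparse vectors. Let $k = [\delta_1 n \widehat q_n]$ with some positive constant $\delta_1$ which will be chosen later. According to Proposition 4.6, for any $\delta_1 \le \delta_0$ and for any $\tau \le \gamma_n/2$, we have the following inequality:
\[
\Pr\Big\{\inf_{x \in \mathrm{Sparse}(\delta_1 \widehat p_n) \cap \mathcal{C}} \|X^{(\varepsilon)}(z)x\|_2 \le \tau \text{ and } \|X^{(\varepsilon)}(z)\| \le K_n\Big\}
= \Pr\Big\{\text{there exists } \sigma,\ |\sigma| = k : \inf_{x \in \mathbb{R}^{\sigma} \cap \mathcal{C},\ \|x\|_2 = 1} \|X^{(\varepsilon)}(z)x\|_2 \le \tau \text{ and } \|X^{(\varepsilon)}(z)\| \le K_n\Big\}
\le \binom{n}{k} \exp\{-c_0 nq_n/8\}.
\]
Using Stirling's formula, we get for some absolute positive constant $C$
\[
\frac1n \ln\binom{n}{k} \le -C\delta_1 \widehat q_n \ln(\delta_1 \widehat q_n). \tag{4.44}
\]
We may choose $\delta_1$ small enough that
\[
\frac1n \ln\binom{n}{k} \le c_0 q_n/16. \tag{4.45}
\]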

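Thus we get
\[
\Pr\Big\{\inf_{x \in \mathrm{Sparse}(\delta_1 \widehat p_n) \cap \mathcal{C}} \|X^{(\varepsilon)}(z)x\|_2 \le \tau \text{ and } \|X^{(\varepsilon)}(z)\| \le K_n\Big\} \le \exp\{-c_1 nq_n\}. \tag{4.46}
\]
Choose $\widetilde\rho := \widetilde\gamma := \gamma_n/4$. Let $V$ be the event that $\|X^{(\varepsilon)}(z)\| \le K_n$ and $\|X^{(\varepsilon)}(z)y\|_2 \le \widetilde\gamma$ for some point $y \in \mathrm{Comp}(\delta_1 \widehat p_n, \widetilde\rho K_n^{-1})$. Assume that $V$ occurs and choose a point $x \in \mathrm{Sparse}(\delta_1 \widehat p_n)$ such that $\|y - x\|_2 \le \widetilde\rho K_n^{-1}$. Then
\[
\|X^{(\varepsilon)}(z)x\|_2 \le \|X^{(\varepsilon)}(z)y\|_2 + \|X^{(\varepsilon)}(z)\|\,\|x - y\|_2 \le \widetilde\gamma + \widetilde\rho = \gamma_n/2. \tag{4.47}
\]
Hence,
\[
\Pr(V) \le \exp\Big\{-\frac{c_0}{8}\, nq_n\Big\}. \tag{4.48}
\]
Thus the Lemma is proved.

Lemma 4.3. Let $\delta, \rho \in (0,1)$. Let $x \in \mathrm{Incomp}(\delta,\rho)$. Then there exists a set $\sigma(x) \subset \{1,\dots,n\}$ of cardinality $|\sigma(x)| \ge \frac12 n\delta$ such that
\[
\sum_{k \in \sigma(x)} |x_k|^2 \ge \frac12\rho^2 \tag{4.49}
\]
and
\[
\frac{\rho}{\sqrt{2n}} \le |x_k| \le \frac{1}{\sqrt{n\delta/2}}, \quad\text{for any } k \in \sigma(x), \tag{4.50}
\]
which we shall call the spread set of $x$ henceforth.

Proof. See [2], p. 6, proof of Lemma 3.4. For the reader's convenience we repeat this proof here. Consider the subsets of $\{1,\dots,n\}$ defined by
\[
\sigma_1(x) := \Big\{k : |x_k| \le \frac{1}{\sqrt{\delta n/2}}\Big\}, \qquad \sigma_2(x) := \Big\{k : |x_k| \ge \frac{\rho}{\sqrt{2n}}\Big\}, \tag{4.51}
\]
and put $\sigma(x) = \sigma_1(x) \cap \sigma_2(x)$. Denote by $P_{\sigma(x)}$ the orthogonal projection onto $\mathbb{R}^{\sigma(x)}$ in $\mathbb{R}^n$. By Chebyshev's inequality $|\sigma_1(x)^c| \le \delta n/2$. Then $y := P_{\sigma_1(x)^c}x \in \mathrm{Sparse}(\delta)$, so the incompressibility of $x$ implies that $\|P_{\sigma_1(x)}x\|_2 = \|x - y\|_2 > \rho$. By the definition of $\sigma_2(x)$, we have $\|P_{\sigma_2(x)^c}x\|_2^2 \le n \cdot \frac{\rho^2}{2n} = \rho^2/2$. Hence
\[
\|P_{\sigma(x)}x\|_2^2 \ge \|P_{\sigma_1(x)}x\|_2^2 - \|P_{\sigma_2(x)^c}x\|_2^2 \ge \rho^2/2. \tag{4.52}
\]
Thus the Lemma is proved.

Remark 4.8. If $x \in \mathrm{Incomp}(\delta p_n, \rho)$ then there exists a set $\sigma(x)$ with cardinality $|\sigma(x)| \ge \frac12 n\delta p_n$ such that
\[
\frac{\rho}{\sqrt{2n}} \le |x_k| \le \frac{1}{\sqrt{n\delta p_n/2}} \tag{4.53}
\]
and
\[
\|P_{\sigma(x)}x\|_2^2 \ge \frac12\rho^2. \tag{4.54}
\]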
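The notions of compressible vector and spread set used above can be made concrete in a few lines. The sketch below is illustrative only: the function names are hypothetical, and it relies on the standard fact that the Euclidean distance from $x$ to the set of vectors with at most $\lfloor\delta n\rfloor$ nonzero coordinates equals the norm of $x$ after its $\lfloor\delta n\rfloor$ largest-modulus coordinates are removed.

```python
import math

def dist_to_sparse(x, delta):
    """Distance from x to the vectors supported on at most floor(delta*n) coordinates."""
    keep = int(math.floor(delta * len(x)))
    mags = sorted((abs(v) for v in x), reverse=True)
    # Dropping the `keep` largest coordinates leaves the unavoidable residual.
    return math.sqrt(sum(m * m for m in mags[keep:]))

def is_compressible(x, delta, rho):
    """True if the unit vector x lies within distance rho of the sparse vectors."""
    return dist_to_sparse(x, delta) <= rho

def spread_set(x, delta, rho):
    """Indices k with rho/sqrt(2n) <= |x_k| <= 1/sqrt(delta*n/2)."""
    n = len(x)
    lo = rho / math.sqrt(2.0 * n)
    hi = 1.0 / math.sqrt(delta * n / 2.0)
    return [k for k, v in enumerate(x) if lo <= abs(v) <= hi]

flat = [0.25] * 16            # unit vector with all mass spread evenly
e1 = [1.0] + [0.0] * 15       # unit vector concentrated on one coordinate
```

With $\delta = 0.25$ and $\rho = 0.5$, the flat vector is incompressible and every index belongs to its spread set, while the coordinate vector `e1` is compressible and has an empty spread set.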

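Let $Q(\eta) = \sup_{jk}\sup_{u \in \mathbb{C}} \Pr\{|X_{jk} - u| \le \eta\}$. Introduce the maximal concentration function of the weighted sums of the rows of the matrix $(X_{jk})_{j,k=1}^n$,
\[
p_x(\eta) = \max_{j \in \{1,\dots,n\}} \sup_{u \in \mathbb{C}} \Pr\Big\{\Big|\sum_{k=1}^n X_{jk}\varepsilon_{jk}x_k - u\Big| \le \eta\Big\}. \tag{4.55}
\]
We shall now bound this concentration function and prove a tensorization lemma for incompressible vectors.

Lemma 4.4. Let $\delta_n$ and $\rho_n$ be some functions of $n$ such that $\rho_n, \delta_n \in (0,1)$. Let $\eta_0$ and $r_0$ be as in Lemma 6.7. Let $x \in \mathrm{Incomp}(\delta_n,\rho_n)$. Then there exist positive constants $r_1$ and $r_2$ depending on $r_0$ such that for any $0 < \eta \le \eta_0$ we have
\[
p_x(\eta\rho_n/\sqrt{2n}) \le 1 - r_2\delta_n np_n \tag{4.56}
\]
for $n\delta_n p_n \le 1/3$ and
\[
p_x(\eta\rho_n/\sqrt{2n}) \le r_1 < 1 \tag{4.57}
\]
for $n\delta_n p_n > 1/3$.

Proof. Put $m = n\delta_n$. We have
\[
\sup_u \Pr\Big\{\Big|\sum_{k=1}^m X_{jk}\varepsilon_{jk}x_k - u\Big| \le \eta\rho_n/\sqrt{2n}\Big\} \le \Pr\Big\{\sum_{k=1}^m \varepsilon_{jk} = 0\Big\} + \Pr\Big\{\Big|\sum_{k=1}^m X_{jk}\varepsilon_{jk}x_k - u\Big| \le \eta\rho_n/\sqrt{2n};\ \sum_{k=1}^m \varepsilon_{jk} \ge 1\Big\}. \tag{4.58}
\]
Introduce $\sigma(x) := \{k \in \{1,\dots,n\} : \rho_n/\sqrt{2n} \le |x_k| \le 1/\sqrt{m/2}\}$. Since $x \in \mathrm{Incomp}(\delta_n,\rho_n)$, the cardinality of $\sigma(x)$ is at least $m/2$. Using that the concentration function of a sum of independent random variables is less than the concentration function of its summands, we obtain
\[
\sup_u \Pr\Big\{\Big|\sum_{k=1}^m X_{jk}\varepsilon_{jk}x_k - u\Big| \le \eta\rho_n/\sqrt{2n}\Big\} \le (1 - p_n)^m + Q(\eta)\big(1 - (1 - p_n)^m\big). \tag{4.59}
\]
According to Lemma 6.7 in the Appendix, for any $\eta \le \eta_0$ we have $Q(\eta) \le r_0 < 1$. Assume that $mp_n > 1/3$. Then we have
\[
\sup_u \Pr\Big\{\Big|\sum_{k=1}^m X_{jk}\varepsilon_{jk}x_k - u\Big| \le \eta\rho_n/\sqrt{2n}\Big\} \le r_0 + (1 - r_0)e^{-mp_n} \le 1 - (1 - e^{-1/3})(1 - r_0) =: r_1 < 1. \tag{4.60}
\]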

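If $mp_n \le 1/3$ then $1 - (1 - p_n)^m \ge mp_n/3$ and
\[
\sup_u \Pr\Big\{\Big|\sum_{k=1}^m X_{jk}\varepsilon_{jk}x_k - u\Big| \le \eta\rho_n/\sqrt{2n}\Big\} \le 1 - (1 - r_0)mp_n/3 =: 1 - r_2 mp_n. \tag{4.61}
\]
The Lemma is proved.

Now we state a tensorization lemma.

Lemma 4.5. Let $\zeta_1,\dots,\zeta_n$ be independent non-negative random variables. Assume that
\[
\Pr\{\zeta_j \le \lambda_n\} \le 1 - q_n \tag{4.62}
\]
for some positive $q_n \in (0,1)$ and $\lambda_n > 0$. Then there exist positive absolute constants $K_1$ and $K_2$ such that
\[
\Pr\Big\{\sum_{j=1}^n \zeta_j^2 \le K_1^2 nq_n\lambda_n^2\Big\} \le \exp\{-K_2 nq_n\}. \tag{4.63}
\]

Proof. We repeat the proof of Lemma 4.4 in [3]. Let $t = K_1\sqrt{q_n}\,\lambda_n$. For any $\tau > 0$ we have
\[
\Pr\Big\{\sum_{j=1}^n \zeta_j^2 \le nt^2\Big\} \le e^{n\tau} \prod_{j=1}^n \mathbf{E}\exp\{-\tau\zeta_j^2/t^2\}. \tag{4.64}
\]
Furthermore,
\[
\mathbf{E}\exp\{-\tau\zeta_j^2/t^2\} = \int_0^1 \Pr\{\exp\{-\tau\zeta_j^2/t^2\} > s\}\,ds = \int_0^1 \Pr\{1/s > \exp\{\tau\zeta_j^2/t^2\}\}\,ds \le \int_0^{\exp\{-\tau\lambda_n^2/t^2\}} ds + \int_{\exp\{-\tau\lambda_n^2/t^2\}}^1 (1 - q_n)\,ds = 1 - q_n\big(1 - \exp\{-\tau\lambda_n^2/t^2\}\big) = 1 - q_n\big(1 - \exp\{-\tau/(K_1^2 q_n)\}\big). \tag{4.65}
\]
Choosing $\tau := q_n/4$ and $K_1^2 := \frac{1}{4\ln 2}$, we get
\[
\Pr\Big\{\sum_{j=1}^n \zeta_j^2 \le nt^2\Big\} \le \exp\{-nq_n/12\}. \tag{4.66}
\]
Thus the Lemma is proved.

Recall that we assumed $p_n^{-1} = O(n^{1-\theta})$, $\theta > 0$. For this fixed $\theta$ consider $L := [\frac{1}{\theta}]$. Hence by definition $p_{n,l} := (np_n)^{l-1}p_n \to 0$ as $n \to \infty$ for $l = 1,\dots,L$, and $\limsup_{n\to\infty}(np_n)^L p_n > 0$. We put $p_{n,L+1} := 1$. We shall assume that $n$ is large enough such that $(np_n)^L p_n \ge q > 0$ for some constant $q > 0$. Starting with a decomposition of $\mathcal{C}_0 := S^{(n-1)}$ into compressible vectors $x$ in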

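$\widehat{\mathcal{C}}_1 := \mathcal{C}_0 \cap \mathrm{Comp}(\delta_1 p_{n,1}, \rho_{n,1})$, where $p_{n,1} = p_n$, $\rho_{n,1} = \gamma_0/(4K_n)$, and the constants $\gamma_0$ and $\delta_1$ are as in Lemma 4.1 and Lemma 4.2 respectively. Then Lemma 4.1 implies inequality (4.42) with $q_n$ replaced by $p_n$ and $\gamma_n$ replaced by $\gamma_0$. Hence, using Lemma 4.2, one obtains the claim for the subset of vectors $\widehat{\mathcal{C}}_1$. The remaining vectors $x$ in $\mathcal{C}_0$ lie in $\mathcal{C}_1 := \mathrm{Incomp}(\delta_1 p_{n,1}, \rho_{n,1})$. According to Lemmas 4.4 and 4.5 we again have inequality (4.42) for these vectors, but with new parameters $q_n = np_n\delta_1 p_{n,1}$ and $\gamma_n = c\rho_{n,1}\sqrt{\delta_1 p_{n,1}}$. Thus we may again subdivide the vectors in $\mathcal{C}_1$ into the vectors within distance $\rho_{n,2}$ from the sparse ones, i.e. $\widehat{\mathcal{C}}_2 := \mathcal{C}_1 \cap \mathrm{Comp}(\delta_2 p_{n,2}, \rho_{n,2})$, and the remaining ones, i.e. $\mathcal{C}_2 := \mathcal{C}_1 \cap \mathrm{Incomp}(\delta_2 p_{n,2}, \rho_{n,2})$. Iterating this procedure $L$ times we arrive at the incompressible set $\mathcal{C}_L$ of vectors $x$ where Lemmas 4.4, 4.5 and Proposition 4.6 yield the required bound of order $\exp\{-\delta n\}$, for a sufficiently small absolute constant $\delta > 0$. Summarizing, we will determine iteratively constants $\delta_l$, $\rho_{n,l}$, for $l = 1,\dots,L$, and the following sets of vectors:
\[
\mathcal{C}_l := \bigcap_{i=1}^{l} \mathrm{Incomp}(\delta_i p_{n,i}, \rho_{n,i}) \tag{4.67}
\]
and
\[
\widehat{\mathcal{C}}_l := \mathcal{C}_{l-1} \cap \mathrm{Comp}(\delta_l p_{n,l}, \rho_{n,l}) \quad\text{with } \mathcal{C}_0 = S^{(n-1)}. \tag{4.68}
\]
Note that
\[
S^{(n-1)} = \bigcup_{l=1}^{L} \widehat{\mathcal{C}}_l \cup \mathcal{C}_L. \tag{4.69}
\]
The main bounds to carry out this procedure are given in the following Lemmas 4.6 and 4.7.

Lemma 4.6. Let $\delta_n, \rho_n \in (0,1)$ and let $x \in \mathrm{Incomp}(\delta_n,\rho_n)$ and $X^{(\varepsilon)}(z)$ be a matrix as in Theorem 4.1. Then there exist some positive constants $c_1$ and $c_2$ depending on $K$, $r_0$, $\eta_0$ such that for any $0 < \tau \le \gamma_n$
\[
\Pr\{\|X^{(\varepsilon)}(z)x\|_2 \le \tau\} \le \exp\{-c_1 n((p_n n\delta_n) \wedge 1)\} \tag{4.70}
\]
with
\[
\gamma_n := c_2\rho_n\sqrt{\delta_n}, \tag{4.71}
\]
where $a \wedge b$ denotes the minimum of $a$ and $b$.

Proof. Assume at first that $n\delta_n p_n \le 1/3$. According to Lemma 4.4, we have, for any $j = 1,\dots,n$,
\[
\sup_{u \in \mathbb{C}} \Pr\Big\{\Big|\sum_{k=1}^n X_{jk}\varepsilon_{jk}x_k - u\Big| \le \eta_0\rho_n/\sqrt{2n}\Big\} \le 1 - r_2\delta_n np_n. \tag{4.72}
\]
Applying Lemma 4.5 with $q_n = r_2\delta_n np_n$, we get
\[
\Pr\{\|X^{(\varepsilon)}(z)x\|_2 \le \gamma_n/2 \text{ and } \|X^{(\varepsilon)}(z)\| \le K_n\} \le \exp\{-cn\delta_n np_n\}. \tag{4.73}
\]
Consider now the case $n\delta_n p_n > 1/3$. According to Lemma 4.4, we have
\[
\sup_{u \in \mathbb{C}} \Pr\Big\{\Big|\sum_{k=1}^n X_{jk}\varepsilon_{jk}x_k - u\Big| \le \eta_0\rho_n/\sqrt{2n}\Big\} \le r_1. \tag{4.74}
\]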

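Applying Lemma 4.5 with $q_n = 1 - r_1$, we get
\[
\Pr\{\|X^{(\varepsilon)}(z)x\|_2 \le \gamma_n/2 \text{ and } \|X^{(\varepsilon)}(z)\| \le K_n\} \le \exp\{-cn\}. \tag{4.75}
\]
This completes the proof of the Lemma.

Lemma 4.7. For $l = 2,\dots,L$ assume that $\delta_i, \rho_{n,i}$ have been already determined for $i = 1,\dots,l-1$. Then there exist absolute constants $\widehat c_l > 0$, $c_l > 0$ and $\delta_l > 0$ such that
\[
\Pr\Big\{\inf_{x \in \widehat{\mathcal{C}}_l} \|X^{(\varepsilon)}(z)x\|_2 \le \gamma_{n,l} \text{ and } \|X^{(\varepsilon)}(z)\| \le K_n\Big\} \le \exp\{-c_l n(((np_n)^{l-1}p_n) \wedge 1)\}, \tag{4.76}
\]
with $\gamma_{n,l}$ defined by
\[
\gamma_{n,l} = \widehat c_l\,\rho_{n,l-1}\sqrt{\delta_{l-1}p_{n,l-1}}, \tag{4.77}
\]
and $\rho_{n,l}$ defined by
\[
\rho_{n,l} := \gamma_{n,l}/(4K_n), \tag{4.78}
\]
where $\widehat{\mathcal{C}}_l := \mathcal{C}_{l-1} \cap \mathrm{Comp}(\delta_l p_{n,l}, \rho_{n,l})$.

Remark 4.9. There exists some absolute constant $c > 0$ such that
\[
\gamma_{n,l} \ge cn^{-L/2} \quad\text{and}\quad \rho_{n,l} \ge cn^{-(L+3)/2}. \tag{4.79}
\]

Proof of the Remark. Note that $p_{n,l} = O(n^{-1+l\theta})$. This implies that γ n,l = L(L )θ ρ n,o(n 2 +L( θ) ) (4.80). According to Lemmas 4.1 and 4.2, we have ρ n = O(n 3 θ 2 ). After simple calculations we get γ n,l = O(nL/2 ). (4.81)

Proof of Lemma 4.7. To prove this Lemma we may use arguments similar to those in the proofs of Lemmas 2.6 and 3.3 in [2]. From $x \in \mathcal{C}_{l-1}$ it follows that $x \in \mathrm{Incomp}(\delta_{l-1}p_{n,l-1}, \rho_{n,l-1})$. Applying Lemma 4.6 with $\delta_n = \delta_{l-1}p_{n,l-1}$ and $\rho_n = \rho_{n,l-1}$, we get
\[
\Pr\{\|X^{(\varepsilon)}(z)x\|_2 \le \gamma_{n,l} \text{ and } \|X^{(\varepsilon)}(z)\| \le K_n\} \le \exp\{-c_1 n((np_n p_{n,l-1}) \wedge 1)\} \tag{4.82}
\]
with
\[
\gamma_{n,l} = c_2\rho_{n,l-1}\sqrt{\delta_{l-1}p_{n,l-1}}. \tag{4.83}
\]
Inequality (4.82) and Lemma 4.2 together imply
\[
\Pr\Big\{\inf_{x \in \widehat{\mathcal{C}}_l} \|X^{(\varepsilon)}(z)x\|_2 \le \gamma_{n,l} \text{ and } \|X^{(\varepsilon)}(z)\| \le K_n\Big\} \le \exp\{-c_1 n p_{n,l}\} \tag{4.84}
\]
with $\delta_l$ defined in Lemma 4.2 and
\[
\rho_{n,l} := \gamma_{n,l}/(4K_n). \tag{4.85}
\]
Thus the Lemma is proved.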
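The iterative decomposition of the sphere into the sets $\widehat{\mathcal{C}}_l$ and $\mathcal{C}_L$ amounts to asking, for a given unit vector, at which level it first becomes compressible. The following sketch illustrates that bookkeeping under stated assumptions: `fractions[l-1]` stands in for the sparsity level $\delta_l p_{n,l}$ and `radii[l-1]` for $\rho_{n,l}$; both lists and all function names are hypothetical inputs chosen for illustration.

```python
import math

def dist_to_sparse(x, fraction):
    """Distance from x to the vectors with at most floor(fraction*n) nonzero entries."""
    keep = int(math.floor(fraction * len(x)))
    mags = sorted((abs(v) for v in x), reverse=True)
    return math.sqrt(sum(m * m for m in mags[keep:]))

def first_compressible_level(x, fractions, radii):
    """Return the smallest level l (1-based) at which x is compressible,
    i.e. x belongs to the l-th compressible piece; return None if x is
    incompressible at every level (x belongs to the final incompressible set)."""
    for level, (fraction, radius) in enumerate(zip(fractions, radii), start=1):
        if dist_to_sparse(x, fraction) <= radius:
            return level
    return None

fractions = [0.125, 0.5, 0.9]          # increasing sparsity levels (assumed values)
flat = [0.25] * 16                     # evenly spread unit vector
block = [0.5] * 4 + [0.0] * 12         # unit vector supported on 4 coordinates
e1 = [1.0] + [0.0] * 15                # coordinate vector
```

Each unit vector receives exactly one label, which mirrors the fact that the sets $\widehat{\mathcal{C}}_1,\dots,\widehat{\mathcal{C}}_L$ and $\mathcal{C}_L$ cover the sphere.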

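The next Lemma gives an estimate of small ball probabilities adapted to our case.

Lemma 4.8. Let $x \in \mathrm{Incomp}(\delta, \rho_{n,l})$. Let $X_1,\dots,X_n$ be random variables with zero mean and variance at least 1. Assume that the following condition holds:
\[
L(M) := \max_{n}\max_{1 \le k \le n} \mathbf{E}|X_k|^2 I\{|X_k| > M\} \to 0 \quad\text{as } M \to \infty. \tag{4.86}
\]
Then there exists some constant $C > 0$ depending on $\delta$ such that for every $\varepsilon > 0$
\[
p_x(\varepsilon\rho_{n,l}/\sqrt{2n}) := \sup_v \Pr\Big\{\Big|\sum_{k=1}^n x_k\varepsilon_k X_k - v\Big| \le \varepsilon\rho_{n,l}/\sqrt{2n}\Big\} \le \frac{C\sqrt{\ln n}}{\sqrt{np_n}}. \tag{4.87}
\]

Proof. Put $L := [-\log_2(\rho_{n,l}\sqrt{2\delta})]$. Note that
\[
\frac{\rho_{n,l}}{\sqrt{2n}} \le \frac{1}{2^{L+1/2}\sqrt{n\delta}} \le \frac{2\rho_{n,l}}{\sqrt{2n}}. \tag{4.88}
\]
According to Remark 4.9, we have $\rho_{n,l} \ge cn^{-L}$. This implies $L \le C\ln n$. Let $\sigma(x)$ denote the spread set of the vector $x$, i.e.
\[
\sigma(x) := \Big\{k : \frac{\rho_{n,l}}{\sqrt{2n}} \le |x_k| \le \frac{\sqrt2}{\sqrt{n\delta}}\Big\}. \tag{4.89}
\]
By Lemma 4.3, we have
\[
|\sigma(x)| \ge n\delta/2. \tag{4.90}
\]
We divide the spread interval of the vector $x$ into $L + 2$ intervals $\Delta_l$, $l = 0,\dots,L+1$, by
\[
\Delta_0 := \Big\{k : \frac{\rho_{n,l}}{\sqrt{2n}} \le |x_k| \le \frac{1}{2^{L+1/2}\sqrt{n\delta}}\Big\}, \tag{4.91}
\]
\[
\Delta_l := \Big\{k : \frac{\sqrt2}{2^{l}\sqrt{n\delta}} < |x_k| \le \frac{\sqrt2}{2^{l-1}\sqrt{n\delta}}\Big\}, \quad l = 1,\dots,L+1. \tag{4.92}
\]
Note that there exists an $l_0 \in \{0,\dots,L+1\}$ such that
\[
|\Delta_{l_0}| \ge n\delta/(2(L+2)) \ge Cn/\ln n. \tag{4.93}
\]
Let $y = P_{\Delta_{l_0}}x$. Put $a_l := \min_{k \in \Delta_l}|x_k|$ and $b_l := \max_{k \in \Delta_l}|x_k|$. Choose a constant $M$ such that $L(M) \le 1/2$. By the properties of concentration functions, we have
\[
p_x(\varepsilon\rho_{n,l}/\sqrt{2n}) \le p_y(\varepsilon\rho_{n,l}/\sqrt{2n}) \le p_y(Mb_{l_0}). \tag{4.94}
\]
By definition of $\Delta_{l_0}$, we have
\[
\sum_{k \in \Delta_{l_0}} |x_k|^2 \ge a_{l_0}^2|\Delta_{l_0}| \ge \frac{\rho_{n,l}^2}{2n}|\Delta_{l_0}|, \tag{4.95}
\]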

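and
\[
a_{l_0}^{-1}b_{l_0} \le 2. \tag{4.96}
\]
Define
\[
D(\xi,\lambda) = \lambda^{-2}\,\mathbf{E}|\xi|^2 I\{|\xi| < \lambda\} \tag{4.97}
\]
and introduce for a random variable $\xi$ the symmetrization $\widetilde\xi := \xi - \xi'$, where $\xi'$ denotes an independent copy of $\xi$. Put $\xi_k := x_k\varepsilon_k X_k$. We use the following inequality for a concentration function of a sum of independent random variables:
\[
p_y(Mb_{l_0}) \le CMb_{l_0}\Big(\sum_{k \in \Delta_{l_0}} \lambda_k^2 D(\widetilde\xi_k\varepsilon_k;\lambda_k)\Big)^{-1/2} \tag{4.98}
\]
with $\lambda_k \le Mb_{l_0}$. See Petrov [22], p. 43, Theorem 3. Put $\lambda_k = M|x_k|$. It is straightforward to check that
\[
\sum_{k \in \Delta_{l_0}} \lambda_k^2 D(\widetilde\xi_k\varepsilon_k;\lambda_k) \ge p_n\sum_{k \in \Delta_{l_0}} |x_k|^2\big(\mathbf{E}|X_k|^2 - L(M)\big). \tag{4.99}
\]
This implies
\[
\sum_{k \in \Delta_{l_0}} \lambda_k^2 D(\widetilde\xi_k\varepsilon_k;\lambda_k) \ge \frac{p_n}{2}\sum_{k \in \Delta_{l_0}} |x_k|^2 \ge \frac{p_n}{2}|\Delta_{l_0}|\,a_{l_0}^2. \tag{4.100}
\]
Combining this inequality with (4.98) and (4.94) we obtain
\[
p_x(\varepsilon\rho_{n,l}/\sqrt{2n}) \le \frac{CMb_{l_0}}{\sqrt{|\Delta_{l_0}|p_n}\,a_{l_0}} \le \frac{CM}{\sqrt{|\Delta_{l_0}|p_n}} \le \frac{C\sqrt{\ln n}}{\sqrt{np_n}}. \tag{4.101}
\]
The last relation concludes the proof.

Invertibility for the incompressible vectors via distance.

Lemma 4.9. Let $X_1, X_2, \dots, X_n$ denote the columns of $\sqrt{np_n}\,X^{(\varepsilon)}(z)$, and let $H_k$ denote the span of all column vectors except the $k$-th. Then for every $\delta, \rho \in (0,1)$ and every $\eta > 0$ one has
\[
\Pr\Big\{\inf_{x \in \mathcal{C}_L} \|X^{(\varepsilon)}(z)x\|_2 < \eta(\rho_{n,l}/\sqrt n)^2/\sqrt{np_n}\Big\} \le \frac{1}{n\delta_L}\sum_{k=1}^n \Pr\{\operatorname{dist}(X_k, H_k) < \eta\rho_{n,l}/\sqrt n\}.
\]

Proof. Note that
\[
\Pr\Big\{\inf_{x \in \mathcal{C}_L} \|X^{(\varepsilon)}(z)x\|_2 < \eta(\rho_{n,l}/\sqrt n)^2/\sqrt{np_n}\Big\} \le \Pr\Big\{\inf_{x \in \mathrm{Incomp}(\delta_L,\rho_{n,l})} \|X^{(\varepsilon)}(z)x\|_2 < \eta(\rho_{n,l}/\sqrt n)^2/\sqrt{np_n}\Big\}. \tag{4.102}
\]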

For the upper bound of the r.h.s. of (4.102) see [2], proof of Lemma 3.5. For the reader's convenience we repeat this proof. Introduce the matrix $G := \sqrt{np_n}\,X^{(\varepsilon)}(z)$. Recall that $X_1,\dots,X_n$ denote the column vectors of the matrix $G$ and $H_k$ denotes the span of all column vectors except the $k$-th. Writing $Gx = \sum_{k=1}^n x_k X_k$, we have
\[
\|Gx\|_2 \ge \max_{k=1,\dots,n} \operatorname{dist}(x_k X_k, H_k) = \max_{k=1,\dots,n} |x_k|\operatorname{dist}(X_k, H_k). \tag{4.103}
\]
Put
\[
p_k := \Pr\{\operatorname{dist}(X_k, H_k) < \eta\rho_{n,l}/\sqrt n\}. \tag{4.104}
\]
Then
\[
\mathbf{E}\,\big|\{k : \operatorname{dist}(X_k, H_k) < \eta\rho_{n,l}/\sqrt n\}\big| = \sum_{k=1}^n p_k. \tag{4.105}
\]
Denote by $U$ the event that the set $\sigma_1 := \{k : \operatorname{dist}(X_k, H_k) \ge \eta\rho_{n,l}/\sqrt n\}$ contains more than $(1 - \delta_L)n$ elements. Then by Chebyshev's inequality
\[
\Pr\{U^c\} \le \frac{1}{n\delta_L}\sum_{k=1}^n p_k. \tag{4.106}
\]
On the other hand, for every incompressible vector $x$, the set $\sigma_2(x) := \{k : |x_k| \ge \rho_{n,l}/\sqrt n\}$ contains at least $n\delta_L$ elements. (Otherwise, since $\|P_{\sigma_2(x)^c}x\|_2 \le \rho_{n,l}$, we have $\|x - y\|_2 \le \rho_{n,l}$ for the sparse vector $y := P_{\sigma_2(x)}x$, which would contradict the incompressibility of $x$.) Assume that the event $U$ occurs. Fix any incompressible vector $x$. Then $|\sigma_1| + |\sigma_2(x)| > (1 - \delta_L)n + n\delta_L = n$, so the sets $\sigma_1$ and $\sigma_2(x)$ have nonempty intersection. Let $k \in \sigma_1 \cap \sigma_2(x)$. Then by (4.103) and by the definitions of the sets $\sigma_1$ and $\sigma_2(x)$, we have
\[
\|Gx\|_2 \ge |x_k|\operatorname{dist}(X_k, H_k) \ge \eta(\rho_{n,l}n^{-1/2})^2. \tag{4.107}
\]
Summarizing, we have shown that
\[
\Pr\Big\{\inf_{x \in \mathrm{Incomp}(\delta_L,\rho_{n,l})} \|Gx\|_2 \le \eta(\rho_{n,l}n^{-1/2})^2\Big\} \le \Pr\{U^c\} \le \frac{1}{n\delta_L}\sum_{k=1}^n p_k. \tag{4.108}
\]
This completes the proof.

We now reformulate Lemma 3.6 from [2]. Let $X^*$ be any unit vector orthogonal to $X_1,\dots,X_{n-1}$. Consider the subspace $H_n = \operatorname{span}(X_1,\dots,X_{n-1})$.

Lemma 4.10. Let $\delta_l, \rho_l, c_l$, $l = 1,\dots,L$, be as in Lemma 4.2 and $\delta_L, \rho_L, c_L$ as in Lemma 4.7. Then there exists an absolute constant $\widehat c_L > 0$ such that
\[
\Pr\{X^* \notin \mathcal{C}_L \text{ and } \|X^{(\varepsilon)}(z)\| \le K_n\} \le \exp\{-\widehat c_L np_n\}. \tag{4.109}
\]
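The key deterministic step in the proof of Lemma 4.9 is that the norm of $Gx$ is at least $|x_k|$ times the distance from the $k$-th column to the span of the other columns, for every $k$. The sketch below checks this on small real matrices; it is an illustration only, with hypothetical helper names, columns stored as Python lists, and distances computed by Gram-Schmidt orthogonalization.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def dist_to_span(v, others):
    """Distance from v to span(others), via Gram-Schmidt on the other columns."""
    basis = []
    for w in others:
        r = list(w)
        for b in basis:
            c = dot(r, b)
            r = [ri - c * bi for ri, bi in zip(r, b)]
        nr = norm(r)
        if nr > 1e-12:                      # skip numerically dependent columns
            basis.append([ri / nr for ri in r])
    r = list(v)
    for b in basis:
        c = dot(r, b)
        r = [ri - c * bi for ri, bi in zip(r, b)]
    return norm(r)

def apply_columns(cols, x):
    """Compute G x = sum_k x_k X_k for a matrix G given by its columns X_k."""
    rows = len(cols[0])
    return [sum(x[k] * cols[k][i] for k in range(len(cols))) for i in range(rows)]

def distance_lower_bound(cols, x):
    """max_k |x_k| * dist(X_k, H_k), where H_k is the span of the other columns."""
    return max(
        abs(x[k]) * dist_to_span(cols[k], cols[:k] + cols[k + 1:])
        for k in range(len(cols))
    )

cols = [[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 3.0]]
x = [0.6, 0.0, 0.8]
```

For orthogonal columns the bound can be attained: with columns $(3,0)$ and $(0,4)$ and $x = (1,0)$, both sides equal 3.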

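Proof. Note that
\[
S^{(n-1)} = \bigcup_{l=1}^{L} \widehat{\mathcal{C}}_l \cup \mathcal{C}_L. \tag{4.110}
\]
The event $\{X^* \notin \mathcal{C}_L \text{ and } \|X^{(\varepsilon)}(z)\| \le K_n\}$ implies that the event
\[
E := \Big\{\inf_{x \in \bigcup_{l=1}^{L} \widehat{\mathcal{C}}_l :\ \|x\|_2 = 1} \|X^{(\varepsilon)}(z)x\|_2 \le c \text{ and } \|X^{(\varepsilon)}(z)\| \le K_n\Big\} \tag{4.111}
\]
occurs for any positive $c$. This implies, for $c > 0$,
\[
\Pr\{X^* \notin \mathcal{C}_L \text{ and } \|X^{(\varepsilon)}(z)\| \le K_n\} \tag{4.112}
\]
\[
\le \sum_{l=1}^{L} \Pr\Big\{\inf_{x \in \widehat{\mathcal{C}}_l :\ \|x\|_2 = 1} \|X^{(\varepsilon)}(z)x\|_2 \le c \text{ and } \|X^{(\varepsilon)}(z)\| \le K_n\Big\}. \tag{4.113}
\]
Now choose $c := \min\{\gamma_{n,l},\ l = 1,\dots,L\}$. Applying Lemma 4.7 proves the claim.

Lemma 4.11. Let $X^{(\varepsilon)}(z)$ be a random matrix as in Theorem 1.2. Let $X_1,\dots,X_n$ denote the column vectors of the matrix $\sqrt{np_n}\,X^{(\varepsilon)}(z)$, and consider the subspace $H_n = \operatorname{span}(X_1,\dots,X_{n-1})$. Let $K_n = Kn\sqrt{p_n}$. Then we have
\[
\Pr\{\operatorname{dist}(X_n, H_n) < \rho_{n,l}/\sqrt n \text{ and } \|X^{(\varepsilon)}(z)\| \le K_n\} \le \frac{C\sqrt{\ln n}}{\sqrt{np_n}}. \tag{4.114}
\]

Proof. We repeat Rudelson and Vershynin's proof of Lemma 3.8 in [2]. Let $X^*$ be any unit vector orthogonal to $X_1, X_2, \dots, X_{n-1}$. We can choose $X^*$ so that it is a random vector that depends on $X_1, X_2, \dots, X_{n-1}$ only and is independent of $X_n$. We have $\operatorname{dist}(X_n, H_n) \ge |\langle X_n, X^*\rangle|$. We denote the probability with respect to $X_n$ by $\Pr_n$ and the expectation with respect to $X_1,\dots,X_{n-1}$ by $\mathbf{E}_{1,\dots,n-1}$. Then
\[
\Pr\{\operatorname{dist}(X_n, H_n) < \rho_{n,l}/\sqrt n \text{ and } \|X^{(\varepsilon)}(z)\| \le K_n\} \le \mathbf{E}_{1,\dots,n-1}\Pr\nolimits_n\{|\langle X^*, X_n\rangle| \le \rho_{n,l}/\sqrt n \text{ and } X^* \in \mathcal{C}_L\} + \Pr\{X^* \notin \mathcal{C}_L \text{ and } \|X^{(\varepsilon)}(z)\| \le K_n\}. \tag{4.115}
\]
According to Lemma 4.10, the second term on the right hand side of the last inequality is less than $\exp\{-\widehat c_L n\}$. Since the vectors $X^* = (a_1,\dots,a_n) \in S^{(n-1)}$ and $X_n = (\varepsilon_1\xi_1,\dots,\varepsilon_n\xi_n)$ are independent, we may use small ball probability estimates. We have
\[
S = \langle X_n, X^*\rangle = \sum_{k=1}^n a_k\varepsilon_k\xi_k.
\]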
