Non-convex Optimization for Linear System with Pregaussian Matrices and Recovery from Multiple Measurements

Yang Liu

Non-convex Optimization for Linear System with Pregaussian Matrices and Recovery from Multiple Measurements

by Yang Liu

Under the direction of Ming-Jun Lai

Abstract

The extremal singular values of random matrices in the $\ell_2$-norm, including Gaussian random matrices, Bernoulli random matrices, subgaussian random matrices, etc., have attracted major research interest in recent years. In this thesis, we study the $q$-singular values, defined in terms of the $\ell_q$-quasinorm, of pregaussian random matrices. We give the upper tail probability estimate on the largest $q$-singular value of pregaussian random matrices for $0 < q \le 1$, and also the lower tail probability estimate. In particular, these estimates show that the largest $q$-singular value is of order $m^{1/q}$ with high probability for pregaussian random matrices of size $m$ by $m$. Moreover, we also give probabilistic estimates for the smallest $q$-singular value of pregaussian random matrices. In addition, we present some results on the largest $p$-singular value for $p > 1$, as well as some numerical-experimental results. Compressed sensing, a technique for recovering sparse signals, has also been an active research topic recently. The extremal singular values of random matrices have applications in compressed sensing, mainly because the restricted isometry constant of sensing matrices depends on them. We prove that pregaussian random matrices with $m$ much less than $N$ but much larger than $N^{q/2}$ have the $q$-modified restricted isometry property for $0 < q \le 1$ with overwhelming probability. As a result, we show that every sparse vector can be recovered

as a solution to the $\ell_q$-minimization problem with overwhelming probability if $m$ is much less than $N$ but much larger than $N^{q/2}$. In compressed sensing, we also show that the real and complex null space properties (NSP) are equivalent for sparse recovery by $\ell_q$-minimization, and more generally for the NSP for joint-sparse recovery from multiple measurements via $\ell_q$-minimization. These results answer the open questions raised by Foucart and Gribonval. We also extend Berg and Friedlander's theorem on the NSP for recovery from multiple measurements. As a consequence of the equivalence on the NSP and of the extension, we give a necessary and sufficient condition for the uniqueness of the solution to the multiple-measurement-vector non-convex optimization problem.

Index words: Optimization, Random Matrices, Sparse Recovery, Null Space Property

Non-convex Optimization for Linear System with Pregaussian Matrices and Recovery from Multiple Measurements

by Yang Liu

B.S., Guangzhou University, China, 2003
M.S., Sun Yat-sen University, China, 2006

A Thesis Submitted to the Graduate Faculty of The University of Georgia in Partial Fulfillment of the Requirements for the Degree

Doctor of Philosophy

Athens, Georgia

2010

2010 Yang Liu
All Rights Reserved

Non-convex Optimization for Linear System with Pregaussian Matrices and Recovery from Multiple Measurements

by Yang Liu

Approved:

Major Professor: Ming-Jun Lai

Committee: Edward Azoff
Alexander Petukhov
Andrew Sornborger
Jingzhi Tie

Electronic Version Approved:

Maureen Grasso
Dean of the Graduate School
The University of Georgia
August 2010

Dedication

To my parents, Ye-Ren Liu and Guo-E Long, to my sister Fen Liu, and to my brother Zhen Liu.

Acknowledgments

First of all, I want to give the greatest thanks to my advisor, Prof. Ming-Jun Lai, for his guidance, encouragement, and support. He is such a great advisor who has inspired me so much in my mathematical research and my career. Without him, this thesis would never have been possible. His teaching has made the research subjects interesting to me, and he has motivated me to do cutting-edge research, particularly to work on open problems in the area. I would like to express my utmost gratitude to him for the wonderful advisory work on my completion of the thesis and for helping me to achieve my career goals. I am very grateful to Prof. Edward Azoff and Prof. Jingzhi Tie for their very helpful advice in the course of my doctoral study at the University of Georgia, especially the valuable suggestions and helpful discussions concerning parts of the work in the thesis, for their generosity with their time, and furthermore for helping me in shaping my career. Moreover, I am also very thankful to Prof. Alexander Petukhov and Prof. Andrew Sornborger for serving on my advisory committee and for stimulating my interest in the applied side of mathematics. In addition, thanks to many other faculty members in the Department of Mathematics for their support through my years here. My sincere thanks also go to my friends here, who were always there to help whenever I needed them, including Jennifer Belton, Qianying Hong, Tin Kong, Leopold Matamba, Jie Yu, and many others. It is a big blessing in my life to have these precious friends. There are also many friends out there who have made my life in Athens more enjoyable. I wish to thank them for their help in my life as well, which I think is also very important to my academic life here. Last but not least, I am deeply indebted to my family. They have supported me through my studies and life, and I could never have come this far without their constant

support, encouragement, and patience. Their endless love has been the source of my courage and strength to overcome difficulties and to pursue my dreams.

Preface

The work in this thesis is mainly on optimization and random matrices. The thesis is organized in the following way: Chapter 1 gives an introduction to the thesis; the following three chapters are devoted to the study of generalized singular values of pregaussian matrices, including Chapter 2 on the largest $q$-singular value of pregaussian random matrices for $0 < q \le 1$, Chapter 3 on the smallest $q$-singular value of pregaussian random matrices for $0 < q \le 1$, and additionally Chapter 4 on the largest $p$-singular value of pregaussian random matrices for $p > 1$; in Chapter 5, we study the $q$-modified restricted isometry property for $0 < q \le 1$; and lastly, Chapter 6 is on the null space property for recovery from multiple measurements via $\ell_q$-minimization.

Contents

Acknowledgments
Preface
List of Figures
List of Tables

1 Introduction
2 Probabilistic Estimate on the Largest q-singular Values of Pregaussian Random Matrices
  2.1 Preliminaries
  2.2 Linear Combination of Pregaussian Random Variables
  2.3 The Largest q-Singular Value of Pregaussian Random Matrices
3 Probabilistic Estimate on the Smallest q-singular Values of Pregaussian Random Matrices
  3.1 Singular Value of Pregaussian Random Matrices in l_2
  3.2 The q-singular Values of an m x n Matrix
  3.3 The Smallest q-singular Value of Random Matrices
  3.4 Upper Tail Probability of the Smallest q-singular Value
4 The Probabilistic Estimates on the Largest p-singular Value of Pregaussian Random Matrices for p > 1

  4.1 Introduction
  4.2 Lower Tail Probability of the Largest p-singular Value for p > 1
  4.3 Upper Tail Probability of the Largest p-singular Value for p > 1
5 Modified Restricted Isometry Property and Sparse Recovery
  5.1 Modified Restricted Isometry Property
  5.2 q-modified Isometry Property
  5.3 On the Sparse Recovery
6 Null Space Property for Recovery from Multiple Measurements via l_q-minimization
  6.1 Introduction
  6.2 Null Space Property for Recovery via l_q-minimization
  6.3 Real versus Complex Null Space Properties for l_q
  6.4 On Recovery from Multiple Measurements
  6.5 The Null Space Property for Recovery from Multiple Measurements

Bibliography

Appendix
A Examples on the Duality
B Non-convex Geometry Under Linear Map
C Some Numerical Experiments on the p-singular Value for p > 1 and the q-singular Value for 0 < q <= 1 of Random Matrices

List of Figures

A.1 Minimization of the maximum of |y_1| and |y_1 + y_2|
B.1 Parallelograms
B.2 The graph of the difference function f(theta)
C.1 Largest 2-singular value of Gaussian random matrices
C.2 Largest 1-singular value of Gaussian random matrices: Experiment 1
C.3 Largest 1-singular value of Gaussian random matrices: Experiment 2
C.4 Largest 1-singular value of Gaussian random matrices: Experiment 3
C.5 Largest q-singular value of Gaussian random matrices
C.6 Largest 1/2-singular value of Gaussian random matrices
C.7 Largest 1/2-singular value of Gaussian random matrices
C.8 Largest 1/2-singular value of rectangular Gaussian random matrices

List of Tables

3.1 Results obtained on the probabilistic estimates on the largest and smallest q-singular values of pregaussian random square matrices for 0 < q <= 1

Chapter 1

Introduction

The largest singular value and the smallest singular value of random matrices in the $\ell_2$-norm, including Gaussian random matrices, Bernoulli random matrices, subgaussian random matrices, etc., have attracted major research interest in recent years, and have applications in compressed sensing, a technique for recovering sparse or compressible signals. For instance, [Soshnikov and Fyodorov 2002, [50]] and [Soshnikov 2005, [51]] studied the largest singular value of random matrices, and [Rudelson and Vershynin 2008, [43]], [Rudelson and Vershynin 2008, [44]], [Tao and Vu 2009, [59]], among others, studied the smallest singular values.

In the study of the asymptotic behavior of the eigenvalues of symmetric random matrices, the Wigner symmetric matrix is a typical example; its upper (or lower) diagonal entries are independent random variables with uniformly bounded moments. Wigner proved in [Wigner 1958, [62]] that the normalized eigenvalues are asymptotically distributed according to the semicircular distribution. Precisely, let $A$ be a symmetric Gaussian random matrix of size $n \times n$ whose upper diagonal entries are independent and identically distributed copies of the standard Gaussian random variable; then the probability density function of the eigenvalues of $\frac{1}{\sqrt{n}} A$ is asymptotically
$$p(x) := \begin{cases} \frac{1}{2\pi} \sqrt{4 - x^2} & \text{for } |x| \le 2, \\ 0 & \text{for } |x| > 2, \end{cases} \tag{1.1}$$
as the size $n$ goes to infinity. This is the well-known Wigner semicircle law. More generally, for a random matrix whose entries are independent and identically distributed (i.i.d.) copies

of a complex random variable with mean 0 and variance 1, Tao and Vu showed in [Tao and Vu 2008, [56]] and [Tao and Vu 2010, [60]] that the distribution of the eigenvalues of $\frac{1}{\sqrt{n}} A$ converges to the uniform distribution on the unit disk as $n$ goes to infinity, and that this holds not only for random matrices with real entries but also for complex entries. Their result generalized Girko's circular law in [Girko 1984, [27]] and solved the circular law conjecture, open since the 1950s, that the empirical distribution of the eigenvalues converges to the uniform distribution over the unit disk as $n$ tends to infinity; see also [4].

For random matrices whose entries are i.i.d. random variables satisfying certain moment conditions, the largest singular value was studied in [Geman 1980, [26]] and [Yin, Bai, and Krishnaiah 1988, [65]]. Furthermore, the distribution of the eigenvalues of Wishart matrices, $W_{N,n} = AA^{*}$, where $A = A_{N,n}$ is an $N \times n$ Gaussian random matrix, was studied in [Soshnikov 2002, [50]]. More generally, Seginer in [Seginer 2000, [47]] compared the Euclidean operator norm of a random matrix with i.i.d. mean zero entries to the Euclidean norms of its rows and columns. Later, Latala in [Latala 2005, [34]] gave an upper bound on the expectation (or average value) of the largest singular value, namely the norm, of any random matrix whose entries are independent mean zero random variables with uniformly bounded fourth moment.

The condition number, which is the ratio of the largest singular value to the smallest singular value of a matrix, is critical to the stability of linear systems. In [Edelman 1988, [17]], the distribution of the condition number of Gaussian random matrices was investigated, in particular through numerical experiments. As a typical example of subgaussian random matrices, the invertibility of Bernoulli random matrices has also been studied. In [Tao and Vu 2007, [55]], the probability of Bernoulli random matrices being singular is shown to be at most $o(1)^n$, where $n$ is the size of the matrices. Their result shows that the probability that the smallest singular value of Bernoulli random matrices is zero is exponentially small as $n$ tends to infinity.
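As a quick illustration of the invertibility phenomenon just discussed, the following minimal sketch (not part of the thesis; it assumes NumPy, and the sizes, trial counts, and tolerance are arbitrary illustrative choices) samples random $\pm 1$ matrices, counts how often they are numerically singular, and records the typical size of the smallest $\ell_2$ singular value.

```python
import numpy as np

# Monte Carlo sketch: random +/-1 (Bernoulli) matrices are rarely singular,
# and their smallest l2 singular value is typically of order n^(-1/2).
rng = np.random.default_rng(0)
for n in (20, 50, 100):
    trials = 200
    singular = 0
    smallest = []
    for _ in range(trials):
        A = rng.choice([-1.0, 1.0], size=(n, n))
        s = np.linalg.svd(A, compute_uv=False)
        smallest.append(s[-1])
        if s[-1] < 1e-10:          # numerically singular sample
            singular += 1
    print(n, singular / trials, np.median(smallest) * np.sqrt(n))
```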

The recent studies of the smallest singular value have also been motivated, to a large extent, by some open questions and conjectures. In [Spielman and Teng 2002, [52]], the following conjecture was proposed at the International Congress of Mathematicians in 2002.

Conjecture (Spielman-Teng). Let $\xi$ be a Bernoulli random variable, in other words, $P(\xi = 1) = P(\xi = -1) = \frac{1}{2}$, and let $M_n(\xi)$ be the $n \times n$ matrix whose entries are i.i.d. copies of $\xi$. Then
$$P\left( s_n(M_n(\xi)) \le \frac{t}{\sqrt{n}} \right) \le t + c^n \tag{1.2}$$
for all $t > 0$ and some $0 < c < 1$.

In the breakthrough work on the estimate of the smallest singular value, [Rudelson and Vershynin 2008, [44]], Rudelson and Vershynin obtained the upper tail probabilistic estimate on the smallest singular value in the $\ell_2$-norm for square matrices of centered random variables with unit variance and appropriate moment assumptions. In particular, they proved the Spielman-Teng conjecture up to a constant. The lower tail probabilistic estimate on the smallest singular value in the $\ell_2$-norm for square matrices was obtained in [Rudelson and Vershynin 2008, [43]]. These results have shown that the smallest singular value of $n \times n$ subgaussian random matrices is of order $n^{-\frac{1}{2}}$ with high probability for large $n$. In a more explicit way, the distribution of the smallest singular value of random matrices was given in [Tao and Vu 2009, [59]] by using property testing from combinatorics and theoretical computer science. Pregaussian matrices were used to recover sparse images in [Foucart and Lai 2010, [21]]. Very recently, Rudelson and Vershynin gave a comprehensive survey on the extreme singular values of random matrices in [Rudelson and Vershynin 2010, [46]].

In this thesis, we will study probabilistic estimates on the largest $q$-singular value of pregaussian matrices, defined in terms of the $\ell_q$-quasinorm as
$$s_1^q(A) := \sup_{x \in \mathbb{R}^N,\, x \ne 0} \frac{\|Ax\|_q}{\|x\|_q}, \tag{1.3}$$
and on the smallest $q$-singular value, defined accordingly (see [49] for instance), as they have become important to the $\ell_q$-approach in compressed sensing because of the restricted isometry property (see [6] and [21]) that will be studied in later chapters.
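To make definition (1.3) concrete, here is a small numerical sketch (our illustration, not from the thesis, assuming NumPy; the function names are ours). It lower-bounds $s_1^q(A)$ by a crude random search and compares the result with the largest $\ell_q$-quasinorm of a column, which for $0 < q \le 1$ equals $s_1^q(A)$ exactly (Lemma 2.3.1 in Chapter 2).

```python
import numpy as np

def lq_quasinorm(x, q):
    """(sum_i |x_i|^q)^(1/q); a norm for q >= 1, only a quasinorm for 0 < q < 1."""
    return np.sum(np.abs(x) ** q) ** (1.0 / q)

def largest_q_singular_value_lb(A, q, n_samples=5000, seed=0):
    """Crude Monte Carlo lower bound for s_1^q(A) = sup_x ||Ax||_q / ||x||_q."""
    rng = np.random.default_rng(seed)
    _, N = A.shape
    best = 0.0
    for _ in range(n_samples):
        x = rng.standard_normal(N)
        best = max(best, lq_quasinorm(A @ x, q) / lq_quasinorm(x, q))
    return best

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))
q = 0.5
print(largest_q_singular_value_lb(A, q))
# For 0 < q <= 1 the supremum is attained at a standard basis vector, so it
# equals the largest l_q quasinorm of a column of A:
print(max(lq_quasinorm(A[:, j], q) for j in range(A.shape[1])))
```

The random search only gives a lower bound; the column formula is what makes the case $0 < q \le 1$ computationally easy.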

Before proceeding, we would like to mention some notation that we will use in this thesis. $X = O(Y)$ denotes $X < CY$ for some constant $C > 0$. As is conventional in probability theory, $P(E)$ denotes the probability that the event $E$ occurs, and $E(X)$ denotes the expectation of the random variable $X$.

It is well known that the classic singular value is defined in terms of the $\ell_2$-norm, so a natural question is what happens if one defines the singular value by the $\ell_q$-quasinorm for $0 < q \le 1$ or by the $\ell_p$-norm for $p > 1$. The next three chapters of the thesis will be devoted to the study of generalized singular values of pregaussian matrices, including Chapter 2 on the largest $q$-singular value of pregaussian random matrices for $0 < q \le 1$, Chapter 3 on the smallest $q$-singular value of pregaussian random matrices for $0 < q \le 1$, and additionally Chapter 4 on the largest $p$-singular value of pregaussian random matrices for $p > 1$. Our main results include tail probability estimates on these generalized singular values of pregaussian matrices.

There have been some remarkable results by other researchers on the largest singular values of random matrices in the $\ell_2$-norm. For example, Geman in [26] and Yin, Bai, and Krishnaiah in [65] showed that the largest singular value of random matrices of size $m \times N$ with independent entries of mean 0 and variance 1 tends to $\sqrt{m} + \sqrt{N}$ almost surely. However, we want to study the singular values of random matrices in terms of the $\ell_q$-quasinorm for $0 < q \le 1$, because there are some advantages to using the general $\ell_q$-quasinorm to study the singular values of random matrices, as suggested in [Foucart and Lai 2009, [20]] and [Foucart and Lai 2010, [21]].

In this thesis, we study the $q$-singular values, defined in terms of the $\ell_q$-quasinorm, of random matrices whose entries are independent and identically distributed copies of a pregaussian random variable. We obtain the decay of the upper tail probability of the largest $q$-singular value $s_1^q$ for all $0 < q \le 1$, defined in terms of the non-convex $\ell_q$-quasinorm, as the number of rows of the matrices becomes very large. This result is stated as follows.

Theorem (Upper tail probability of the largest $q$-singular value, $0 < q \le 1$). Let $\xi$ be a pregaussian random variable normalized to have variance 1 and let $A$ be an $m \times N$ matrix

with i.i.d. copies of $\xi$ in its entries. Then for every $0 < q \le 1$,
$$P\left( s_1^q(A) \ge C m^{\frac{1}{q}} \right) \le \exp(-C' m) \tag{1.4}$$
for some $C, C' > 0$ only dependent on the pregaussian random variable $\xi$.

We have also obtained the lower tail probability of the largest $q$-singular value $s_1^q$ for all $0 < q \le 1$, based on a linear bound for a partial binomial expansion.

Theorem (Lower tail probability of the largest $q$-singular value, $0 < q \le 1$). Let $\xi$ be a pregaussian random variable normalized to have variance 1 and let $A$ be an $m \times N$ matrix with i.i.d. copies of $\xi$ in its entries. Then for every $0 < q \le 1$ and any $\varepsilon > 0$, there exists $K > 0$ such that
$$P\left( s_1^q(A) \le K m^{\frac{1}{q}} \right) \le \varepsilon \tag{1.5}$$
in which $K$ only depends on $q$, $\varepsilon$, and the pregaussian random variable $\xi$.

In particular, these estimates show that $s_1^q(A)$ is of order $m^{\frac{1}{q}}$ with high probability for an $m \times N$ pregaussian random matrix $A$.

On the smallest singular value of random matrices, Rudelson and Vershynin first showed the following result in [43].

Theorem (Rudelson-Vershynin). Let $A$ be a matrix of size $n \times n$ whose entries are independent random variables with variance 1 and bounded fourth moment. Then for any $\delta > 0$, there exist $\epsilon > 0$ and an integer $n_0 > 0$ such that
$$P\left( s_n(A) \le \epsilon n^{-\frac{1}{2}} \right) \le \delta$$
for all $n \ge n_0$.

Later, they proved the following theorem in [44].

Theorem (Rudelson-Vershynin). Let $A$ be an $n \times n$ matrix whose entries are i.i.d. centered random variables with unit variance and fourth moment bounded by $B$. Then, for

every $\delta > 0$ there exist $K > 0$ and $n_0$ which depend polynomially only on $\delta$ and $B$, such that
$$P\left( s_n(A) > K n^{-\frac{1}{2}} \right) \le \delta$$
for all $n \ge n_0$.

For the smallest $q$-singular value of an $n \times n$ pregaussian random matrix, we have obtained the following estimate on the lower tail probability in this thesis.

Theorem (Lower tail probability of the smallest $q$-singular value, $0 < q \le 1$). Given any $0 < q \le 1$, let $\xi$ be a pregaussian random variable with variance 1 and let $A$ be an $n \times n$ matrix with i.i.d. copies of $\xi$ in its entries. Then for any $\varepsilon > 0$, there exists some $\gamma > 0$ such that
$$P\left( s_n^q(A) < \gamma n^{-\frac{1}{q}} \right) < \varepsilon, \tag{1.6}$$
where $\gamma$ only depends on $q$, $\varepsilon$, and the pregaussian random variable $\xi$.

On the upper tail probability of the smallest $q$-singular value, we have the following.

Theorem (Upper tail probabilistic estimate on the smallest $q$-singular value). Given any $0 < q \le 1$, let $\xi$ be a pregaussian random variable with variance 1 and let $A$ be an $n \times n$ matrix with i.i.d. copies of $\xi$ in its entries. Then for any $K > e$, there exist some $C > 0$, $0 < c < 1$, and $\alpha > 0$, only dependent on the pregaussian random variable $\xi$ and $q$, such that
$$P\left( s_n^q(A) > K n^{-\frac{1}{2}} \right) \le \frac{C (\ln K)^{\alpha}}{K^{\alpha}} + c^n. \tag{1.7}$$
In particular, for any $\varepsilon > 0$, there exist some $K > 0$ and $n_0$ such that
$$P\left( s_n^q(A) > K n^{-\frac{1}{2}} \right) < \varepsilon \tag{1.8}$$
for all $n \ge n_0$.

However, we strongly believe that the probabilistic estimate on the smallest $q$-singular value in (1.8) may be improved, and we conjecture that
$$P\left( s_n^q(A) > K n^{-\frac{1}{q}} \right) < \varepsilon \tag{1.9}$$

under the assumptions of the preceding theorem.

While extending the $q$-singular value for $0 < q \le 1$ to the $p$-singular value for $p > 1$, we would like to investigate the probabilistic analysis of the $p$-singular value for $p > 1$ as well. In [36], the lower tail probability of the $\ell_p$-norm of a sequence of independent, centered Gaussian random variables was estimated. Very recently, for random matrices, the extremal singular values defined in terms of $\ell_2$ were studied in [43], [44], [45], [59], etc., as we mentioned earlier. However, the $p$-singular value, defined in terms of the $\ell_p$-norm for $p > 1$, will be the main topic of Chapter 4 of this thesis. In this respect, we have the following two theorems.

Theorem (Lower tail probability of the largest $p$-singular value, $p > 1$). Let $\xi$ be a pregaussian random variable normalized to have variance 1 and let $A$ be an $m \times N$ matrix with i.i.d. copies of $\xi$ in its entries. Then for every $p > 1$ and any $\varepsilon > 0$, there exists $\gamma > 0$ such that
$$P\left( s_1^p(A) \le \gamma m^{\frac{1}{p}} \right) \le \varepsilon \tag{1.10}$$
in which $\gamma$ only depends on $p$, $\varepsilon$, and the pregaussian random variable $\xi$.

The duality theorem on the largest $p$-singular value allows us to extend these results from $1 < p < 2$ to $p > 2$ and vice versa, and thus we are able to obtain probabilistic estimates on the growth rate of the largest $p$-singular value for $p \ge 1$, including $p = \infty$, as the size of the random matrices grows.

Random matrices are often used as the sensing matrices in the optimization problem for compressed sensing. Foucart and Lai defined the new quasinorm
$$\|x\|_{f,q} := \left( \int_{\mathbb{R}^N} \Big| \sum_{i=1}^{N} t_i x_i \Big|^q f(t_1) \cdots f(t_N)\, dt_1 \cdots dt_N \right)^{\frac{1}{q}} \tag{1.11}$$
for any pregaussian probability density function $f$ and $q > 0$, and introduced the $q$-modified restricted isometry property, $0 < q \le 1$, based on this quasinorm. The extreme singular values of random matrices, including the largest singular value and the smallest singular

value, become important for the restricted isometry property introduced in [6] and the generalized restricted isometry property defined in [21], called the $q$-modified restricted isometry property. In Chapter 5 of this thesis, we study the $q$-modified restricted isometry property, $0 < q \le 1$. We prove that pregaussian random matrices, whose entries are independent and identically distributed pregaussian random variables (see e.g. [10]), with $N^{q/2} \ll m \ll N$, have the $q$-modified restricted isometry property with overwhelming probability. As a result, we show that every sparse vector can be recovered as a solution to the $\ell_q$-minimization problem with overwhelming probability if $N^{q/2} \ll m \ll N$ for $0 < q \le 1$.

Theorem. Suppose that $A$ is an $m \times N$ matrix whose entries are independent and identically distributed copies of a symmetric pregaussian random variable with probability density function $f$. Then
$$P\left( (1-\varepsilon)^{\frac{1}{q}} m^{\frac{1}{q}} \|x\|_{f,q} \le \|Ax\|_q \le (1+\varepsilon)^{\frac{1}{q}} m^{\frac{1}{q}} \|x\|_{f,q} \right) \ge 1 - 2e^{-\kappa m N^{-q/2} \varepsilon^2} \tag{1.12}$$
for any $0 < \varepsilon < 1$ and $0 < q \le 1$, and some $\kappa > 0$ dependent on $f$ and $q$.

Chapter 6 of this thesis is devoted to the null space property in compressed sensing, especially for the multiple measurement vector (MMV) problem. In compressed sensing, we want to recover a sparse or compressible signal via solving the minimization problem
$$\min_{x \in \mathbb{R}^N} \|x\|_0 \quad \text{subject to } Ax = b, \tag{1.13}$$
in which $\|x\|_0$ is the number of non-zero entries of the vector $x$, namely the sparsity of $x$. Since $\|x\|_0$ can be approximated by $\|x\|_q^q$, one can use the $\ell_q$-approach (see [24], [33], and [20]), which considers the following $\ell_q$-minimization problem with $0 < q \le 1$:
$$\min_{x \in \mathbb{R}^N} \|x\|_q \quad \text{subject to } Ax = b. \tag{1.14}$$
Foucart and Lai presented numerical experimental results which indicate that the $\ell_q$-method performs better than other available methods, and gave a sufficient condition on the matrix of an underdetermined linear system which guarantees the solution of the system

with minimal $\ell_q$-quasinorm (cf. [20]). The $\ell_q$-method in unconstrained minimization can also be used to generate the sparse solution to an underdetermined linear system; see [33]. These have motivated us to consider the null space property for $\ell_q$-minimization, as it characterizes the uniqueness of the solution to the $\ell_q$-minimization problem.

Instead of a single measurement vector, multiple measurement vectors have been used in many fields of technology, for instance, neuromagnetic imaging [15] and communication channels (see e.g. [14]), as pointed out in [8]. In general situations, precisely, the multiple-measurement-vector (MMV) problem is
$$\min \|X\|_{q,p} \quad \text{subject to } AX = B, \tag{1.15}$$
for an $m \times N$ matrix $A$ and an $m \times r$ matrix $B$, in which
$$\|X\|_{q,p} := \left( \sum_{j=1}^{N} \|X_j\|_p^q \right)^{\frac{1}{q}}, \tag{1.16}$$
where $X_j$ is the $j$-th row of $X$, for $p > 0$ and $q > 0$.

In the recent literature, there have been many significant studies on the MMV problem. Berg and Friedlander in [8] studied joint-sparse recovery from multiple measurements through the MMV problem, as an extension of the single-measurement-vector (SMV) problem. They implemented the ReMBo (Reduce MMV and Boost) algorithm combined with $\ell_1$-minimization, introduced and also performed in [39] and [40], based on the reduction to the SMV problem. Other researchers have also studied algorithmic approaches to solve the MMV convex optimization problem, that is, the MMV problem (1.15) with $q \ge 1$, for the recovery from multiple measurements. For instance, Malioutov, Cetin, and Willsky in [38] used an interior point implementation to solve the MMV problem with $p = 2$ and $q = 1$, in which (1.16) is neither linear nor quadratic, efficiently via second-order cone (SOC) programming. Tropp in [61] developed another algorithmic approach, called convex relaxation, by replacing the $\ell_0$-quasinorm by the $\ell_1$-norm, and under certain conditions it provides good solutions to the MMV problem for $p = \infty$ and $q = 1$. For $q = 1$ and general $p \ge 1$, Chen and Huo in [12] showed that

the orthogonal matching pursuit for MMV (OMPMMV) can find the sparsest solution to the MMV problem with computational efficiency. In addition to the above studies, Cotter, Rao, and others in [15] extended the matching pursuit (MP) and the Focal Underdetermined System Solver (FOCUSS), introduced earlier in [22] for tomographic source reconstruction in neural electromagnetic inverse problems and designed to obtain sparse solutions by successively solving quadratic optimization problems (cf. [64]), to solve the MMV problem for $p \le 1$ and $q = 2$.

For the SMV non-convex $\ell_q$-minimization problem (5.49), Foucart and Lai in [20] compared the numerical experimental results of their $\ell_q$-algorithm with other algorithms, such as the regularized orthogonal matching pursuit, $\ell_1$-minimization, and reweighted $\ell_1$-minimization, and the comparisons showed that the $\ell_q$-method performs better. However, we will study the multiple-measurement-vector (MMV) non-convex optimization problem, that is,
$$\min \|X\|_{q,p} \quad \text{subject to } AX = B, \tag{1.17}$$
in which $0 < q \le 1$.

The null space property has been used to quantify the error of approximations (Cohen, Dahmen, and DeVore 2009, [13]), and it also guarantees exact recovery. The $\ell_1$ null space property is a sufficient condition for unique recovery through $\ell_1$-minimization. Foucart and Gribonval in [Foucart and Gribonval 2009, [19]] proved that the real null space property and the complex null space property are equivalent for the sparse recovery achieved by $\ell_1$-minimization. They considered the convex optimization problem
$$\min_{z \in \mathbb{R}^N} \|z\|_1 \quad \text{subject to } Az = y,$$
in which $A$ is a real-valued measurement matrix. Since a real-valued measurement matrix is also a complex-valued one, they proved the following result.

Theorem (Foucart-Gribonval). For a measurement matrix $A \in \mathbb{R}^{m \times N}$ and a set $S \subset \{1, 2, \ldots, N\}$, the real null space property, that
$$\sum_{j \in S} |u_j| < \sum_{j \in S^c} |u_j| \tag{1.19}$$
for all $(u_1, u_2, \ldots, u_N)^T \in \ker_{\mathbb{R}} A \setminus \{0\}$, is equivalent to the complex null space property, in view of $\ker_{\mathbb{C}} A = \ker_{\mathbb{R}} A + i \ker_{\mathbb{R}} A$, that
$$\sum_{j \in S} \sqrt{u_j^2 + v_j^2} < \sum_{j \in S^c} \sqrt{u_j^2 + v_j^2} \tag{1.20}$$
for all $(u_1, u_2, \ldots, u_N)^T \in \ker_{\mathbb{R}} A$ and $(v_1, v_2, \ldots, v_N)^T \in \ker_{\mathbb{R}} A$ with either of the vectors non-zero.

Replacing the $\ell_1$-norm by the $\ell_q$-quasinorm in the $\ell_1$ null space property, the resulting condition is called the $\ell_q$ null space property, and it becomes the sufficient condition for unique recovery through solving the $\ell_q$-minimization problem. On the other hand, the well-known null space property (cf. [16] and [24]) for the standard $\ell_1$ minimization in the setting of a single measurement vector (SMV) has been extended to the setting of multiple measurement vectors (MMV). In the study of these aspects, we show that the real null space property and the complex null space property are equivalent for the sparse recovery achieved by $\ell_q$-minimization, and more generally for the joint-sparse recovery from multiple measurements, which answers the open questions raised in [19].

Berg and Friedlander in [8] showed the null space property for recovery from multiple measurements via $\ell_1$-minimization. In [8], the following result is proved.

Theorem (Berg-Friedlander). Let $A$ be a real matrix of size $m \times N$ and let $S \subset \{1, 2, \ldots, N\}$ be a fixed index set. Denote by $S^c$ the complement of $S$ in $\{1, 2, \ldots, N\}$. Let $\|\cdot\|$ be any norm. Then all $x_k$ with support in $S$ for $k = 1, \ldots, r$ can be uniquely recovered using
$$\min_{x_k \in \mathbb{R}^N,\, k = 1, \ldots, r} \left\{ \sum_{j=1}^{N} \left\| \left( x_{1,j}, \ldots, x_{r,j} \right) \right\| \;:\; Ax_k = b_k,\ k = 1, \ldots, r \right\} \tag{1.21}$$

if and only if all vectors $(u_1, \ldots, u_r) \in (\mathcal{N}(A))^r \setminus \{(0, 0, \ldots, 0)\}$ satisfy
$$\sum_{j \in S} \left\| \left( u_{1,j}, \ldots, u_{r,j} \right) \right\| < \sum_{j \in S^c} \left\| \left( u_{1,j}, \ldots, u_{r,j} \right) \right\|, \tag{1.22}$$
where $\mathcal{N}(A)$ stands for the null space of $A$.

We extend their theorem on the null space property from $\ell_1$ to $\ell_q$, $0 < q \le 1$, for recovery from multiple measurements. As a consequence of the extension, we give a necessary and sufficient condition for the uniqueness of the solution to the multiple-measurement-vector (MMV) non-convex optimization problem. Precisely, we prove the following.

Theorem (Null space property for MMV recovery via $\ell_q$-minimization). Let $A$ be an $m \times N$ matrix and let $S \subset \{1, 2, \ldots, N\}$ be an index set. Then for any $0 < q \le 1$, all $X_0 \in \mathbb{R}^{N \times r}$ with the support of the rows of $X_0$ contained in $S$ can be uniquely recovered via solving the MMV non-convex optimization problem (1.17) for $p = 2$ if and only if $\|u_S\|_q < \|u_{S^c}\|_q$ for all $u \in \ker A \setminus \{0\}$.

The above theorem characterizes the uniqueness of the solution to the MMV non-convex optimization problem (1.17), and it simplifies the algorithmic check of the uniqueness of the solution to, or the exact recovery through, the MMV non-convex optimization problem. In fact, we consider a joint recovery from multiple measurement vectors via
$$\min \left\{ \sum_{j=1}^{N} \left( \sqrt{x_{1,j}^2 + \cdots + x_{r,j}^2} \right)^{q} \;:\; Ax_1 = b_1, \ldots, Ax_r = b_r \right\} \tag{1.23}$$
for a given $0 < q \le 1$, where $x_k = (x_{k,1}, \ldots, x_{k,N})^T \in \mathbb{R}^N$ for all $k = 1, \ldots, r$; this is (1.17) for $p = 2$. Written as equivalent conditions, the theorem on exact recovery that we mainly prove is the following.

Theorem. Let $A$ be a real matrix of size $m \times N$ and let $S \subset \{1, 2, \ldots, N\}$ be a fixed index set. Fix $q \in (0, 1]$ and $r \ge 1$. Then the following conditions are equivalent:

(a) All $x_k$ with support in $S$ for $k = 1, \ldots, r$ can be uniquely recovered using (1.23);

(b) All vectors $(u_1, \ldots, u_r) \in (\mathcal{N}(A))^r \setminus \{(0, 0, \ldots, 0)\}$ satisfy
$$\sum_{j \in S} \left( \sqrt{u_{1,j}^2 + \cdots + u_{r,j}^2} \right)^{q} < \sum_{j \in S^c} \left( \sqrt{u_{1,j}^2 + \cdots + u_{r,j}^2} \right)^{q}; \tag{1.24}$$

(c) All vectors $z \in \mathcal{N}(A)$ with $z \ne 0$, where $z = (z_1, \ldots, z_N)^T \in \mathbb{R}^N$, satisfy
$$\sum_{j \in S} |z_j|^{q} < \sum_{j \in S^c} |z_j|^{q}. \tag{1.25}$$

Our theorem can be applied to develop algorithms, as condition (c) significantly reduces the complexity of verifying condition (b) for unique recovery from multiple measurement vectors using (1.23).
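As an illustration of how condition (c) might be exploited in practice, the following sketch (our own, not from the thesis, assuming NumPy; a random search over the null space can only refute the property, never certify it) looks for a null-space vector violating the $\ell_q$ null space property on a given index set $S$.

```python
import numpy as np

def nsp_q_violation(A, S, q, n_samples=50000, seed=0):
    """Search for z in ker(A)\{0} with sum_{j in S}|z_j|^q >= sum_{j not in S}|z_j|^q.
    Returns a violating vector if one is found, else None (random search is only
    a refutation heuristic, not a certificate of the null space property)."""
    rng = np.random.default_rng(seed)
    m, N = A.shape
    _, _, Vt = np.linalg.svd(A)
    kernel = Vt[np.linalg.matrix_rank(A):].T      # orthonormal basis of ker(A), N x dim
    Sc = [j for j in range(N) if j not in S]
    for _ in range(n_samples):
        z = kernel @ rng.standard_normal(kernel.shape[1])
        if np.sum(np.abs(z[S]) ** q) >= np.sum(np.abs(z[Sc]) ** q):
            return z
    return None

rng = np.random.default_rng(2)
A = rng.standard_normal((20, 40))
S = [0, 1, 2]                     # candidate support set
print(nsp_q_violation(A, S, q=0.5) is None)
```

A genuine certificate of condition (c) still requires a non-convex optimization over $\ker A$, but only for a single vector $z$ rather than for $r$ vectors at once, which is exactly the reduction from (b) to (c) noted above.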

27 Chapter 2 Probabilistic Estimate on the Largest -singular Values of Pregaussian Random Matrices 2.1 Preliminaries In this section, we are going to give an introduction on some basics on the Gaussian random variable, subgaussian random variable, and pregaussian random variable. There are many references on these standard terminologies, such as [9], [10], [21], and [23]. The Gaussian random variable is well-known as a basic random variable widely used in the probability and statistics studies, and is also an important random variable because of the central limit theorem in probability theory. A symmetric Gaussian random variable X obeys the standard normal distribution, p X x = 1 2πσ e x2 2σ where σ 2 is the variance, and its moment generating function is M X t = E e tx = e tx p X xdx = e σ2 t Subgaussian random variable is more general than Gaussian random variable. Definition Subgaussian random variable. A random variable X is called subgaussian, if its moment generating function satisfies M X t e σ2 t for all t R and some σ 0. 14

28 15 A typical example of subgaussian random variable is the centered Bernoulli random variable. Example Bernoulli random variable. A Bernoulli random variable X with mean 0 and variance 1 is a discrete random variable with probability mass function 1 p X x =, x = , x = 1. 2 The moment generating function of X M X t = E e tx = 1 2 e t + e t = k=0 t 2k 2k! t 2k k! = et2, 2.5 k=0 so Bernoulli random variable X is subgaussian. More generally, pregaussian random variable is defined as Definition Pregaussian random variable. A random variable X is called Pregaussian if there exists some T > 0 such that the moment generating function of X satisfies M X t e σ2 t for all t [ T, T ] and some σ 0. Pregaussian random variable can be also characterized by the growth rate of its moments. Proposition Pregaussian random variable and growth rate of moments, [10]. A random variable X is pregaussian if and only if E X 2k 2k!λ 2k 2.7 for some constant λ > 0. Let us also see an example for pregaussian random variable.

29 16 Example Laplacian random variable. A Laplacian random variable X with mean 0 has its probability distribution function p X x = 1 2λ exp x λ for some λ > 0, and E X 2k = Γ 2k + 1 λ 2k = 2k!λ 2k. 2.8 The Bernstein ineuality is useful in obtaining estimates on the tail probability of the sum of independent variables, provided the growth rate of the moments of the variables are bounded appropriately. Theorem Bernstein s ineuality, [10]. Let X 1,, X n be independent random variables. Suppose that EX i = 0 and E X k i EX2 i 2 Hk 2 k!, 2.9 for every positive integer k and some H > 0, i = 1,, n. Then for all t > 0. P { } n X i t i=1 < 2 exp t 2 2 th + n i=1 EX2 i, 2.10 Remark In particular, let X 1,, X n be independent pregaussian random variables, then they satisfy the assumptions of Bernstein ineuality. Indeed, by E X k i E X 2k i 1 2, Proposition implies 2.9, and using the Taylor expansion of the exponential function in E e tx i, one can deduce EXi = 0, i = 1,, n. 2.2 Linear Combination of Pregaussian random variables Let s consider the linear combination of pregaussian random variables in this section. Lemma Linear combination of pregaussian random variables. Let a ij, i = 1, 2,, m and j = 1, 2,, N, be pregaussian random variables and η i = N j=1 a ijx j, then η i are pregaussian random variables for i = 1, 2,, m.

30 17 Proof. Since a ij are pregaussian random variables, then Ea ij = 0 and there are constants λ ij > 0 such that E a ij k k!λ k ij for i = 1, 2,, m and j = 1, 2,, N, then Eη i = N x j Ea ij = j=1 and using the convexity of the function x x k for positive integer k, we have E η i k = E N j=1 a k ijx j x j k x k 1 E N j=1 a x ij 1 x k N xj 1 j=1 E a x ij k Thus the expectations of the pregaussian random variables a ij give that E η i k x k 1 for all integers k 1. N j=1 k x j k!λ k ij = x k 1 x k! max λ ij = k! j 1 k x 1 max λ ij 2.13 j Note that the product of pregaussian random variables may not be pregaussian. For instance, let s consider the suare of a pregaussian random variable. Let ξ i := η 2 i. However, ξ i is not necessary to be pregaussian. For example, let η i be Laplace random variables with probability density functions p ηi η i = 1 exp η i λ i λ i η i is pregaussian, as we know from the previous example. However, the k-th moment of ξ i E ξ i k = E η i 2k = Γ 2k + 1 λ 2k i = 2k! λ 2 k i It has a growth pattern different from k!λ k for any λ > 0, as k!λ k = o 2k! λ 2 i k when k goes to, that one can see by Stirling s formula in one way. In another way, the momentgenerating function of ξ i is E exp tξ i = E exp tη 2 i = 1 exp tηi 2 η i dη i = λ i λ i

31 18 for any t > 0. By the definition or the property of pregaussian random variable, we know that ξ i is not pregaussian, as its moments grow faster than the moments of any pregaussian random variable. 2.3 The Largest -Singular Value of Pregaussian random matrices Largest -singular Values and Their Properties One can define the largest -singular value by { } s Ax 1 A := sup : x R N with x x for an m N matrix A. Firstly, one can derive the following Lemma For 1, 2.17 defines a norm on the space of m N matrices and max a j 1 j N s 1 A N 1 max a j 1 j N, 2.18 in which a j, j = 1, 2,, N, are the column vectors of A. For 0 < < 1, 2.17 defines a uasinorm on the space of m N matrices and s 1 A + B s 1 A + s 1 B for any m N matrices A and B, moreover, s 1 A = max 1 j N a j. Proof. It follows that s 1 A, 1, defines a norm from the known norm l for 1. Moreover, by the Minkowski ineuality and discrete Hölder ineuality, when > 2, Ax = N j=1 x ja j N j=1 x j a j x p N j=1 a j 1 x p N 1 max1 j N a j x N 1 p max1 j N a j 2.19 for all x R N, in which 1 p = 1, i.e. p =, and likewise, when 1 < 2, Ax N j=1 x j a j x 1 max a j 1 j N x N 1 p max a j 1 j N 2.20

32 19 for all x R N. Thus Ax x N 1 max a j 1 j N 2.21 for 1, which yields s 1 A N 1 max 1 j N a j. Choosing x to be the standard basis vectors of R N gives us max 1 j N a j s 1 A. The latter part of the claim follows from the fact that l for 0 < < 1 is a uasinorm and x + y x + y for all x, y RN if 0 < < 1. Similarly to the case when 1, we have for 0 < < 1, Ax N j=1 x j a j x max 1 j N a j, 2.22 which implies s 1 A max 1 j N a j. On the other hand, we also have max 1 j N a j s 1 A. Thus s 1 A = max 1 j N a j in the case of 0 < < 1. Remark In particular, if = 1, 2.18 implies that s 1 1 A = max j a j 1, but if =, 2.18 implies max 1 j N a j s 1 A N max 1 j N a j. To have a better estimate on s 1, we need to consider each component N j=1 x ja i,j, where a i,j m N := A, in the column vector N j=1 x ja j. In this way, N Ax = max x j a i,j max i x j max 1 j N j=1 1 i m j=1 N a i,j = x max 1 i m N a i,j, j= and one can choose a vector x 0 consisting of ±1 such that Ax 0 = x 0, so s 1 A = max i N j=1 a i,j. The example of s 1 suggests that it would be more advantageous to use row vectors instead of column vectors in estimating s 1, when is very large. On the other hand, we can see that s 1 1 A = s 1 A T. In general, for the relation between s 1 and s p 1, 1 p + 1 following lemma on the general rectangular matrix. Lemma For any 1 and m N matrix A, = 1, 1, one can deduce the s 1 A = s p 1 A T 2.24

33 20 in which 1 p + 1 = 1. Proof. By the discrete Hölder ineuality, we know that if 1 p + 1 = 1 then Ax, y Ax 2.25 for all x R N and y R m with y p = 1, and furthermore the euality holds for some y 0 with y 0 p = 1. Thus Ax = By the definition of the largest -singular value, sup Ax, y y R m, y p =1 s 1 A = sup x R N, x =1 Ax = sup x R N, x =1 sup y R m, y p =1 Ax, y In the same way, we have s p 1 A T = sup sup y R m, y p =1 x R N, x =1 Ax, y Finally, using Ax, y = A T y, x and exchanging taking the supremums, one can get s 1 A = s p 1 A T. Remark In the study of the distribution of the largest -singular value of suare random matrices when 2, the above lemma allows us to use the the distribution of the largest singular value s p 1 for 1 p Some Estimates on the Upper Tail Probability of the Largest - singular Value The distribution of the singular values of random matrices has been an interesting topic in recent years, especially for the two extremal singular values, the largest one and the smallest one, for Gaussian random, subgaussian random matrices, etc, see for instance, [48], [44], [59]. Here we are going to study the largest -singular value for m N matrix A are i.i.d.

34 21 copies of a symmetric pregaussian random variable as m and N are large. Theorem tells us that where x f, := Ax P 1 + ε 1 1 m 2e κmn 2 ε 2, 2.29 x f, N t i x i f t 1 f t N dt 1 dt N i=1 1, 2.30 for any 0 < ε < 1 and 0 < 1, and some κ > 0 dependent of f and. To get an estimate on the largest -singular value, one needs to use the finite cover of the unit sphere in l and the union bound for probability. First we need to convert the probabilities involving x f, into those containing x instead. Lemma Given 0 < 1, for every 0 < ε < 1, Ax P C ε 1 1 m 2e κmn 2 ε x for some κ > 0 dependent of f and. Proof. By Lemma and 5.40, we know x f, C 1 x 2 C 1 x, 2.32 which gives It follows that Ax P x f, 1 + ε 1 1 m Ax C x f, = P 1 P Ax x C 1 Ax x Ax 1 + ε 1 1 x m C ε 1 1 m, 2.34 and then by 2.29 we obtain Ax P C ε 1 1 m 2e κmn 2 ε 2, 2.35 x where C = 2σ π Γ +1 2 see 5.40.

35 22 The cardinality of a finite cover of the unit sphere in l for 1 and more generally any norm on R N are estimates in [41], and for l and more generally f, are given in [21]. It allows us to derive the following Theorem Suppose that A is an m N matrix whose entries are i.i.d. copies of a pregaussian random variable with an even probability density function f. Then for any given 0 < 1, the tail probability for the largest -singular value P s 1 A C ε 1 1 m 2.36 exponentially decays for any ε > 0, provided that m N Proof. By Lemma 2.2 in [21], there exists a finite set U S, in which S is the unit ball in R N in l -norm such that for all x S and min x u u U ε 2.37 cardu ε N Then for any x S there is some u 0 U, such that x u 0 ε. Therefore which yields s 1 A = sup x S Ax sup u U Au + ε s 1, A, 2.39 s 1 A 1 ε 1 supu U Au Therefore by 2.38, Lemma 2.3.5, and the union probability, P s 1 A C 1 1+ε 1 m 1 1 ε Psup u U Au C ε 1 1 m u U P Au C ε 1 m But ε N 2N e ε, thus P s 1 A C ε 1 ε ε N 2e κmn 2 ε 2. 1 m 1 2 exp κmn 2 ε 2 + 2N ε

36 23 Thus if m N 1+ 2 then P s 1 A C ε 1 ε 1 m 1 = P s 1, A C ε 0 m for ε 0 > 0 exponentially decays. Remark In [21] it is shown that for 0 < 1 3, P Ax x f, 1 + ε 1 m 1 2e κmε 2, 2.44 by the same method of using the finite covering of the unit ball of l, we have P s 1 A C ε 1 1 m 2e cmε for some c > 0 dependent of f and for 0 < Estimate on the Upper Tail Probability of the Largest -singular Value in General To obtain estimates on the upper tail probability of the largest -singular value for general 0 < 1, let us start with the random matrices which are Bernoulli ensemble. Theorem singular value of Bernoulli ensemble. Let ξ be the Bernoulli random variable normalized to have mean 0 and variance 1, and A be an m N matrix with i.i.d. copies of ξ in its entries, then s 1 A = m 1 for all 0 < 1. Proof. That is because we know from Lemma that s 1 A = max j a j for 0 < 1 and we have a j = m 1 for all j since every entry of A is Bernoulli with mean 0 and variance 1. More generally, for pregaussian random variables, we need to consider the distribution or the probability properties of m i=1 a ij 1 when aij are i.i.d. copies of a pregaussian random variable. The following theorem on = 1 follows from the Bernstein ineuality and the estimate of the probability of pregaussian random variables on the domain outside of the m-dimensional cross polytopes, which are also the balls in the l 1 -norm on R m.

37 24 Theorem Upper tail probability of the largest 1-singular value. Let ξ be a pregaussian random variable normalized to have variance 1 and A be an m m matrix with i.i.d. copies of ξ in its entries, then P s 1 1 A Cm exp C m 2.46 for some C, C > 0 only dependent on the pregaussian random variable ξ. Proof. Since a ij are i.i.d. copies of the pregaussian random variable ξ, then Ea ij = 0 and there exists some λ > 0, such that E a ij k k!λ k for all k. Since a ij has variance 1, then Ea 2 ij = 1. Therefore E Ea a k 2 ij ij 2 Hk 2 k! 2.47 for H := 2λ 3 and all k 2. By the Bernstein ineuality, we know P n a ij t i=1 2 exp for all t > 0. In particular, when t = Cm, n P a ij Cm j=1 t 2 2 m + th 2 exp in which a condition on C will be determined later. On the other hand, by Lemma 2.3.1, s 1 1 A = max a j 1 = j = 2 exp C2 m 4Cλ t 2 2 m + 2tλ , 2.49 m a ij for some j 0. Furthermore, for any t > 0, by the probability of the union, m m P a ij0 t ɛ i a ij0 t But a ij0 i=1 i=1 ɛ 1,,ɛ m { 1,1} m P has the same pregaussian properties as a ij0, precisely, E a ij0 = 0, E a ij0 k i=1 k!λ k. Thus we have P s 1 1 A Cm m P m i=1 a ij 0 Cm 2 m 1 mp n j=1 a ij 0 Cm 2 m exp C2 m + ln m 4Cλ 3 +2 C = exp 2 ln 2e m. 4Cλ

38 To have an exponential decay for the probability P s 1 C 1 A Cm, we reuire that 2 4Cλ 3 +2 ln 2e > 0, for which Finally, choosing C = C > 2λ 3 ln 2e + C2 ln 2e, we get Cλ ln 2e 2 λ ln 2e The previous theorem allows us to estimate the largest -singular value for 0 < < 1. Theorem Upper tail probability of the largest -singular value, 0 < < 1. Let ξ be a pregaussian random variable normalized to have variance 1 and A be an m m matrix with i.i.d. copies of ξ in its entries, then for any 0 < < 1, P s 1 A Cm 1 exp C m 2.54 for some C, C > 0 only dependent on the pregaussian random variable ξ. Proof. By the discrete Hölder ineuality, m a j = m a ij m 1 a ij = m 1 a j i=1 for any 0 < < 1, which implies a j m 1 1 a j 1. It follows that s 1 A = max j i=1 a j m 1 1 max a j j 1 = m 1 1 s 1 1 A From 2.46, we have for some C, C > 0. P s 1 A Cm 1 P m 1 1 s 1 1 A Cm 1 = P s 1 1 A Cm exp C m The Lower Tail Probability of the -singular Value For the lower tail probability of s 1 A, we know about it for the Bernoulli ensemble from Theorem 2.3.8, and can derive the exponential decay for P s 1 A C 1 1 ε 1 1 m 2.58

39 26 for random matrices whose entries are independent and identically-distributed copies of a pregaussian random variable for any ε > 0 and 0 < < 1 3 by using finite covering of the unit sphere in l and moreover the decay for 0 < < 1 under some conditions analogue to the ones Theorem that we imposed. But to attain a lower tail probability similar to Theorem for all 0 < < 1, the method we used in the proof of Theorem becomes somewhat infeasible. However, we can still give similar estimates on the lower tail probability with some different techniues. Lemma Linear bound for partial binomial expansion with even integer power. For every positive integer n, for all x [0, 1]. 2n k=n+1 2n k x k 1 x 2n k 8x 2.59 Proof. For every x [ 1, 1], we have 8 2n k=n+1 2n x k 1 x 2n k k 2n k=0 2n k x k 1 x 2n k = 1 8x then But for x [ 0, 1 8], let f x := 2n k=n+1 2n k x k 1 x 2n k 2.61 f x + f 1 x + 2n k x n 1 x n = 1, 2.62 and in particular, f 1 = n 2n k=n+1 2n k By De Moivre-Stirling s formula see e.g. [18] and furthermore the estimate in [42], n! = n n 2πn e λ n, 2.64 e

40 27 1 in which < λ 12n+1 n < 1. Therefore, 12n 2n n = 2π 2n 2n e 2n e λ 2n 2πn n e n e λn 2 = 4n πn e λ 2n 2λ n n πn e 1 36n 24n12n+1 4n πn. Since 2n k 2n n for n + 1 k 2n, then f x 2n 2n n x k 1 x 2n k 2n 2n n x k n 2n n x n k=n+1 k=n+1 for all x [0, 1]. Using 2.65, we then have Now let g x := 4 n n π xn, then Thus if x [ 0, 1 8], f x 4 n n π xn ln g x = n ln 4x ln n π ln g x 1 2 ln n n ln ln π n n ln Hence, we obtain 0. f x xg x x 8x 2.70 for all x [ 0, 1 8]. Remark The coefficient in the linear bound can be definitely improved, because one can give sharper estimates for 2.69 on an interval with the right endpoint larger than 1 8 and thus the coefficient in the linear bound will be less than 8. But for our purpose of using it to estimate the probabilities later, the linear bound obtained will be sufficient.
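The linear bound above is easy to sanity-check numerically. The following sketch (illustrative only, assuming NumPy and SciPy; not part of the thesis) evaluates the partial binomial sum as a binomial tail probability and verifies that it stays below $8x$ on a grid of $x$ values.

```python
import numpy as np
from scipy.stats import binom

# Spot-check of the linear bound: for every n,
#   sum_{k=n+1}^{2n} C(2n,k) x^k (1-x)^(2n-k) <= 8x   for x in [0,1].
# The left-hand side is P(Bin(2n, x) >= n+1) = binom.sf(n, 2n, x).
xs = np.linspace(0.0, 1.0, 1001)
worst_ratio = 0.0
for n in range(1, 40):
    tail = binom.sf(n, 2 * n, xs)          # P(Bin(2n, x) > n)
    worst_ratio = max(worst_ratio, float(np.max(tail[1:] / (8 * xs[1:]))))
print(worst_ratio <= 1.0)                  # expected: True
```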

41 28 Considering odd integers together, we have in general Lemma Linear bound for partial binomial expansion. For every positive integer n, n n k x k 1 x n k 8x 2.71 k= n 2 +1 for all x [0, 1]. To prove this lemma, we can modify the proof of Lemma Proof. It suffices to show that for every positive integer n, for all x [0, 1]. 2n+1 k=n+1 2n + 1 k x k 1 x 2n k+1 8x 2.72 Same as 2.60, 2n+1 k=n+1 2n + 1 k x k 1 x 2n k+1 1 8x for all x [ 1 8, 1]. Let f 1 x := 2n+1 k=n+1 2n + 1 k x k 1 x 2n k+1, then f 1 x n 2n + 1 x n+1 n 2n + 2 n + 1 n + 1 x n for x [0, 1], similar to Taking advantage of 2.65, we have 2n + 2 n n+1 π n It follows that f 1 x 4n+1 n π n + 1 x n

42 29 Let g 1 x := 4n+1 n x n, then πn+1 ln g 1 x = n + 1 ln 4 + n ln x + ln n 1 2 ln n ln π n ln 4x ln n ln ln π 2.77 n ln 4x + 1n + ln 4 1 ln π. 2 2 Thus if x [ 0, 1 8], ln g 1 x 1 2 n n ln 2 + ln ln π ln ln π 2.78 ln 8. So for all x [ 0, 1 8], and that completes the proof. The above lemma can be applied to estimate probabilities. f 1 x = xg 1 x 8x 2.79 Lemma Suppose ξ 1, ξ 2,, ξ n are i.i.d copies of a random variable ξ, then for any ε > 0, P n i=1 ξ i nε 8P ξ ε Proof. First, we have the relation on the probability events that { } n ξ 1, ξ n : ξ i nε 2 is contained in i= n { ξ1, ξ n : ξ i1 ε, ξ ik ε, ξik+1 > ε, ξin > ε } := E 2.82 k= n 2 +1 where {i 1, i 2,, i k } is a subset of {1, 2, n} and {i k+1,, i n } is its complement. Let x = P ξ 1 ε, then by the union probability, n P E = n x k 1 x n k, 2.83 k k= n 2 +1

43 30 and applying Lemma , we have P E 8x = 8P ξ 1 ε Since the event 2.81 is contained in the event 2.82, n P ξ i nε P E 8P ξ 1 ε i=1 Now we are ready to give a lower tail probability for the 1-singular values. Theorem Lower tail probability of the largest 1-singular value. Let ξ be a pregaussian random variable normalized to have variance 1 and A be an m N matrix with i.i.d. copies of ξ in its entries, then for any ε > 0, there exists K > 0 such that P s 1 1 A Km ε 2.86 in which K only depends on ε and the pregaussian random variable ξ. Proof. Since a ij is pregaussian with variance 1, then for any ε > 0, there is some δ > 0, such that But we know P a ij δ ε s 1 1 A = m a ij for some j 0. Therefore, by Lemma , P s 1 1 A δ m 2 m = P a ij0 mδ 8P a ij δ ε Thus letting K = δ, we obtain For general 0 < < 1, we have i=1 i=1

44 31 Theorem Lower tail probability of the largest -singular value. Let ξ be a pregaussian random variable normalized to have variance 1 and A be an m N matrix with i.i.d. copies of ξ in its entries, then for any ε > 0, there exists K > 0 such that P s 1 A Km 1 ε 2.90 in which K only depends on, ε and the pregaussian random variable ξ. Proof. We can use the same method used in the proof of Theorem First, for any ε > 0, there is some δ > 0, such that P a ij δ ε Therefore, By Lemma 2.3.1, for some j 0. Then by Lemma , 1 P s δ 1 1 A m = P 2 s 1 A = m Thus let K = 1 δ, then 2.90 follows. 2 i=1 m i=1 a ij a ij0 mδ 8P a ij δ ε Remark From Theorem and Theorem , we know that s 1 A m 1 in probability for m m pregaussian random matrix A.
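The remark above can also be observed numerically. The sketch below (our illustration, assuming NumPy; the Laplace distribution, normalized to variance 1, is used as a convenient pregaussian example) computes the largest $q$-singular value as the largest column $\ell_q$-quasinorm for square matrices of growing size and shows that the ratio to $m^{1/q}$ stays roughly constant.

```python
import numpy as np

# For 0 < q < 1, s_1^q(A) = max_j ||a_j||_q, and it grows like m^(1/q).
rng = np.random.default_rng(4)
q = 0.5
for m in (50, 100, 200, 400):
    A = rng.laplace(scale=1 / np.sqrt(2), size=(m, m))   # variance-1 Laplace entries
    s1q = np.max(np.sum(np.abs(A) ** q, axis=0)) ** (1.0 / q)
    print(m, s1q / m ** (1.0 / q))          # approximately constant in m
```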

45 Chapter 3 Probabilistic Estimate on the Smallest -singular Values of Pregaussian Random Matrices In this section, we will derive some probability tail bounds for the smallest singular value for pregaussian random matrices in l 2 -norm first, and then we ll make some generalizations to the smallest singular value for pregaussian random matrices l. 3.1 Singular Value of Pregaussian Random Matrices in l 2 As we know, pregaussian random variable is a more general type of random variables than subgaussian random variable. A subgaussian random variable ξ has its moment generating function Ee tξ = e Ot2 3.1 for all t,. By relaxing the condition on the domain of t, one can define a pregaussian random variable. Specifically, 3.1 holds on a bounded and and centered interval for a pregaussian random variable. As pointed out by Buldygin and Kozachenko in [10], pregaussian random variable has the following property Lemma If ξ is a pregaussian random variable, then for any p > 0, E ξ p <, and Eξ = 0. Proof. By the definition of pregaussian random variable, Ee tξ = e Ot2 for all t [ T, T ] for some T > 0, then we have Ee T ξ Ee T ξ + Ee T ξ = 2e OT 2 <,

46 33 Let n := p + 1. Then applying Hölder s ineuality we have E ξ p E ξ p n p p n E1 n n p n p n = E ξ n p n 3.3 But it follows from 3.2 that E ξ n n! Ee T ξ <. So E ξ p < for any p > 0. T { n } Let L = inf c > 0 : Ee tξ e cλ2, t [ T, T ], then Ee tξ e Lt2 3.4 for all t [ T, T ]. However, d dt t=0ee tξ = Eξ gives that Ee tξ = 1 + teξ + oλ 3.5 for small λ, and by the Taylor expansion, e Lt2 = 1 + Lt 2 + ot Thus 3.4 implies Dividing 3.7 by t > 0 and taking t 0 +, we get Eξ = 0. teξ + ot Lt 2 + ot Thus ξ has mean 0, and the moment bound condition is satisfied sufficiently for pregaussian random variable ξ. In [59], Tao and Vu proved the following theorem on the universality for the least singular value, Theorem Tao-Vu. Let ξ be R-normalized, and suppose E ξ C 0 < for some sufficiently large absolute constant C 0. Then for all t > 0, we have Pnσ n M n ξ 2 t = t x 2 x x e x/2+ dx + On c 3.8 where c > 0 is an absolute constant. The implied constants in the O notation depend on E ξ C 0 but are uniform in t.
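For Gaussian matrices, one can see this limit law in simulation. The following sketch (illustrative only, assuming NumPy; not from the thesis) estimates $P(\sqrt{n}\, s_n > t)$ by Monte Carlo and compares it with $e^{-t^2/2 - t}$, the closed form of the limiting survival probability that appears in the corollary below.

```python
import numpy as np

# Monte Carlo check of the limiting law for the smallest singular value of an
# n x n Gaussian matrix: P(sqrt(n) * s_n > t) should approach exp(-t^2/2 - t).
rng = np.random.default_rng(6)
n, trials, t = 200, 500, 1.0
hits = 0
for _ in range(trials):
    A = rng.standard_normal((n, n))
    if np.sqrt(n) * np.linalg.svd(A, compute_uv=False)[-1] > t:
        hits += 1
print(hits / trials, np.exp(-t**2 / 2 - t))   # both should be roughly 0.22
```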

47 34 By Lemma 3.1.1, we know E ξ C 0 < if ξ is pregausssian. Additionally normalizing ξ to have variance 1, by the theorem of the distribution of smallest singular values, we have the following Corollary Let ξ be pregaussian random variable with variance 1 and M n ξ be n n matrix whose entries are i.i.d. copies of ξ. Then Ps n M n ξ > t n = e t2 /2 t + On c 3.9 for all t > 0 and some absolute constant c > The -singular Values of an m n Matrix Now let s consider in general the k-th -singular value first. The k-th -singular value is defined in [49] as Definition The k-th -singular value of an m n matrix A, { } } s k {sup A := inf Ax : x V \ {0} : V R n, dim V n k x Remark Since the norm of A restricted on a subspace is less or eual to the norm of A restricted on another subspace which contains the first subspace, Also, it is easy to see that s k A = inf sup V R n, dimv =n k+1 x V, x =1 Ax s 1 A s 2 A s minm,n A The two extremal -singular values, s 1, and s minm,n, are of special interest in various studies, for which the largest -singular value has been discussed earlier and the smallest -singular value will be studied in the following sections. If m n, then the n-th -singular value is the smallest -singular value, which can also be expressed in another way.

48 35 Lemma Let A be a m n matrix with m n, then the smallest -singular value { } Ax s n A = inf : x R n with x x Proof. By the definition { { s Ax } } n A = inf sup : x V \ {0} : V R n, dim V 1 x { { Av } } inf sup : v V \ {0} : V = span x : x R n \ {0} v { Ax } = inf : x R n with x 0. x 3.14 We also know the infimum can be achieved by considering the unit l -sphere in the finite dimensional space, and so the claim follows. Remark In particular, if A is an n n invertible matrix, then { s Ax } n A = inf : x R n with x 0 x 1 = { A 1 x sup x :x R n with x 0 } 3.15 = 1 s 1 A 1. In compressed sensing, we usually have a matrix with dimension m N, m N, in the underdetermined system. So let us make a remark on the singular value of a m n matrix when m < n. Remark If m < n, in some literature, for instance [49], it is put that k runs from 1 through n. In fact, in this case, by the definition of k-th -singular value we know that s k,2 = 0 for k = m+1,, n because there are n m linearly independent vectors in the null space of the m n matrix with m < n, which can span an n k + 1-dimensional subspace of R n for k = m + 1,, n. On the other hand, the diagonal matrix in the singular value decomposition of an m < n matrix with m < n in l 2 does not have the diagonal entries s k,2, k = m + 1,, n, so here we assume that the smallest -singular value is the m-th -th singular value but not the n-th -th singular value for the case m < n. There are some properties on the smallest singular value s minm,n,, which are given in the following lemmas.

49 36 Lemma Full rank. An m n matrix A has full rank, if and only if s minm,n A > 0. Proof. If m n, by Lemma 3.2.3, we have But S := s minm,n { } x R n : x = 1 A = s n A { Ax } = inf : x R n with x 0 x { } = inf Ax : x = is compact and the map x Ax is continuous, then Ax achieves its minimum at some x 0 S. Thus because A has rank n in this case. s minm,n A = Ax 0 > If m < n, then the null space of A, denoted as N, has dimension n m since A has rank m in this case. Therefore for any V R n with dim V n m + 1, we have dim N + dim V n n m + n m + 1 = n + 1 > dim R n By the inclusion-exclusion principle, we know that there a vector x V V with x V = 1 such that x V N V S and so Ax V > 0. Hence by the definition of the m-th singular value, s minm,n A = s m A { } inf Ax V : V R n, dim V n m + 1 } inf { Ax : x N S > since N S is compact. For the converse, assume that the m n matrix A has rank not larger than min m, n 1, then the null space N has dimension not less than n min m, n + 1. Hence { } Ax sup : x V \ {0} = x and dim V n min m, n + 1 when V s minm,n A = 0. So if s minm,n A > 0 then A has full rank. is the null space. Thus by the definition

50 37 Furthermore, in matrix approximation theory, the singular values of a matrix are related to the matrices of a certain rank closest to it. Theorem Schmidt-Mirsky, [2]. Let s 2 k A be the k-th singular value of an m N matrix A in l 2. Then We have shown that max j a j min A B 2 = s 2 k+1 A rankb=k s 1 A n 1 max j a j in Lemma for the largest -singular value for an m n matrix A. Nevertheless, the smallest -singular value does not rely much on a single column or row of the matrix, though we have the very rudimentary estimate s minm,n A min j a j since one can just choose x to be the basis vectors of R N such that Ax = min j a j and then choose the subspace in R n spanned by x and other m n linearly independent null vectors if m n. Analogue to Lemma 2.3.3, for the smallest -singular value, we have Lemma For any 1 and n n matrix A, in which 1 p + 1 = 1. s p n Proof. We consider the following two cases. One case is s p n A = s n A T 3.22 A = 0. By Lemma 3.2.6, we know that A is not full rank, and then so A T = 0. does A T. Applying Lemma again, s n The other case is s p n A > 0, then A is invertible, by Remark and Lemma 2.3.3, we know the claim holds. An immediate corollary of this lemma for rectangular matrices is Corollary For any 1 and m n matrix A which has n m columns if m n or m n rows if m n are zeroes, in which 1 p + 1 = 1. s p minm,n A = s minm,n A T 3.23

51 38 Proof. The claim will follow from the lemma, by using the natural embedding of R minm,n into R maxm,n and the natural projection from R maxm,n onto R minm,n. Indeed, without loss of generality, we assume that m n and the last min m, n = m columns are zeroes. Let A 1 be the submatrix of A formed by the first m columns. Because of the compactness of the sphere of unit l p -norm and Lemma 3.2.8, there are some x R m with x 0 p = 1 and y R m with y 0 p = 1 such that A 1 x 0 p = inf A 1x p = x R m, x p =1 inf y R m, y =1 A T 1 y = A T 1 y Now let x 0 R n be the extended vector whose first m components are those of x and other components are zeroes, and R m be the subspace of R n consists of the vectors whose first m components are zeroes, and take V 0 := span x 0 R m. Since for any ζ R m, A x 0 + ζ p x 0 + ζ p = A x 0 p x 0 + ζ p A x 0 p x 0 p = A x 0 p = A 1 x 0 p, 3.25 then sup x V0, x p =1 Ax p = A 1 x 0 p. For any V R n with dim V = n m + 1, there is some x V with x p = 1 whose last n m components are zeroes, because of the dimensions, then we know sup Ax p Ax p = A 1 x p A 1 x 0 p 3.26 x V, x p =1 where x is the vector in R m whose m components are the first m components of x. Thus s p m A = inf sup V R m,dimv =n m+1 x V, x p =1 Ax p = A x 0 p On the other hand, A T y = A T 1 y for any y R m with y = 1. Hence s m A T = inf y R m, y =1 A T y = by 3.24, and then the claim follows. inf y R m, y =1 A T 1 y = A T 1 y 0 = A 1 x 0 p, 3.28 For general rectangular matrices, we have the duality property for all 2-singular values.

Theorem (Adjoint). For any $m \times n$ matrix $A$,
$s^{(2)}_k(A) = s^{(2)}_k(A^T)$   (3.29)
for $k = 1, 2, \dots, \min(m,n)$.

Proof. Without loss of generality, we assume $m \le n$. First suppose $A$ is diagonal with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m \ge 0$ on its diagonal. Let $\{e_i : i = 1, 2, \dots, n\}$ be the standard basis of $\mathbb{R}^n$, and let $V_0$ be the subspace spanned by $\{e_i : i = k, k+1, \dots, n\}$. For any $v := \sum_{i=k}^{n} t_i e_i \in V_0$ with $v \ne 0$,
$\|Av\|_2^2 / \|v\|_2^2 = \dfrac{\sum_{i=k}^{m} \lambda_i^2 t_i^2}{\sum_{i=k}^{n} t_i^2} \le \dfrac{\sum_{i=k}^{m} \lambda_k^2 t_i^2}{\sum_{i=k}^{n} t_i^2} = \lambda_k^2 \, \dfrac{\sum_{i=k}^{m} t_i^2}{\sum_{i=k}^{n} t_i^2} \le \lambda_k^2.$   (3.30)
In particular, $\|Ae_k\|_2 / \|e_k\|_2 = \lambda_k$. So we have
$\sup_{v \in V_0,\ \|v\|_2 = 1} \|Av\|_2 = \lambda_k.$   (3.31)
Let $V_1$ be the subspace spanned by $\{e_i : i = 1, 2, \dots, k\}$. Then for any $(n-k+1)$-dimensional subspace $V$ of $\mathbb{R}^n$, there exists a non-zero vector $v \in V_1 \cap V$, since $\dim V + \dim V_1 = n + 1$. Thus there are $t_i \in \mathbb{R}$, $i = 1, 2, \dots, k$, not all zero, such that $v = \sum_{i=1}^{k} t_i e_i$, and we have
$\sup_{v \in V,\ \|v\|_2 = 1} \|Av\|_2 \ge \|Av\|_2 / \|v\|_2 = \left( \dfrac{\sum_{i=1}^{k} \lambda_i^2 t_i^2}{\sum_{i=1}^{k} t_i^2} \right)^{1/2} \ge \left( \dfrac{\sum_{i=1}^{k} \lambda_k^2 t_i^2}{\sum_{i=1}^{k} t_i^2} \right)^{1/2} = \lambda_k.$   (3.32)
So $s^{(2)}_k(A) = \lambda_k$. Similarly, $s^{(2)}_k(A^T) = \lambda_k$, since $A^T$ is also diagonal with $\lambda_1, \lambda_2, \dots, \lambda_m$ on its diagonal. Hence $s^{(2)}_k(A) = s^{(2)}_k(A^T)$ for $k = 1, 2, \dots, \min(m,n)$ if $A$ is diagonal.

For a general matrix $A$ of size $m \times n$, let $A = U \Lambda V^T$ be the singular value decomposition of $A$. Replacing all the subspaces in the above argument by their images under the orthogonal matrix $V$, the claim follows, since the orthogonal transformation $U$ preserves the $\ell_2$-norm of every vector.
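Since the $\ell_2$ singular values of a matrix and of its transpose come from the same singular value decomposition, the adjoint theorem is easy to confirm numerically. A minimal check (ours, using NumPy's SVD routine):

import numpy as np

# The l_2 singular values of A and A^T coincide for k = 1, ..., min(m, n).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 7))
s_A = np.linalg.svd(A, compute_uv=False)     # singular values of A, in decreasing order
s_At = np.linalg.svd(A.T, compute_uv=False)  # singular values of A^T
print(np.allclose(s_A, s_At))                # True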

53 40 For general p 1 and general rectangular matrices, we have the following duality theorem. Theorem Duality. For any p 1 and m n matrix A, s p minm,n A = s minm,n A T 3.33 in which 1 p + 1 = 1. Proof. Without loss of generality, we assume m < n, since we have proved the case of m = n in Lemma If s p minm,n A = 0, then s minm,n A T = 0 by Lemma If s p minm,n A 0, then A is of full rank, by Lemma Assume p > 1 first. Since the spaces we are considering are finite dimensional, there exists some w 0 R m with w 0 = 1, such that By Hölder s ineuality, A T w 0 = inf w R m, w =1 A T w A T w 0 A T w 0, v 3.35 for all v R n with v p = 1, and there is some v 0 R n with v 0 p = 1 such that A T w 0 = A T w 0, v Now take V 0 to be the direct sum of span {v 0 } and ker A, that is an n m + 1-dimensional subspace in R n since ker A is n m-dimensional, as A has full rank. Thus it follows from 3.35and 3.36 that for any z ker A, A T w 0, v 0 A T v 0 + z w 0, = v 0 + z p But we also know 1 v 0 + z p A T w 0, v 0 + z A T w 0, v 0 + z = w 0, A v 0 + z = w 0, Av 0 = A T w 0, v 0, 3.38

54 41 thus in other words, v 0 + z p 1. Therefore we have A T w 0, v 0 1 v 0 + z p A T w 0, v 0, 3.39 A v 0 + z p v 0 + z p Av 0 p 3.40 when considering the action of A on V 0, and then sup v V0, v p =1 Av = Av 0 p. Next, we are going to show Av 0 p = A T w 0. Since we know that w 0, Av 0 = A T w 0, v 0 = A T w 0 from 3.36, then it is sufficient to show that for any w R m with w = 1, w 0, Av 0 w, Av 0, that is to show A T w 0, v 0 A T w, v Let S 1 := { } w R m : w = be the unit sphere in l -norm in R m, A S T w 0 := {v A T R m : v = } A T w be the sphere of radius A T w 0 in l -norm in A T R m, and A T S 1 be the image of S 1 by map A T. Since inf w R m, w =1 A T w achieves its infimum at w 0, then the surfaces A T S 1 and S A T w 0 are tangent at the point A T w 0 in A T R m and let P be the A same tangent plane of dimension m 1 to the surfaces A T S and S T w 0 at A T w 0 in A T R m. In fact, by 3.36, v 0 is the gradient of the function f u := u, u R n, at { } the point u = A T w 0. Therefore, v 0 is orthogonal to P. Let B 1 := w R m : w 1 be the unit ball in l -norm in R m, which is convex, as 1, then its image by the linear map A T, A T B 1, is also convex. Therefore A T B 1 will be on the same side of P in A T R m. Hence for any w R m with w = 1, if A T w, v 0 > 0 then there is some λ 1 such that λw P, as v 0 is orthogonal to P, and it follows that A T w, v 0 A T λw, v 0 = A T w 0, v 0 + A T λw w 0, v 0 = A T w 0, v

55 42 as λw w 0 in P is orthogonal to v 0 ; if A T w, v 0 0 it is obvious that A T w 0, v 0 A T w, v 0, since A T w 0, v 0 = A T w 0 0. Thus 3.41 holds for all w R m with w = 1. This proves that s p m A s m A T On the other hand, there is a surface S of dimension m in R n such that for any z ker A and v S, v + z p v p. We know that there is some v 0 S with v 0 p = 1 such that Av 0 p = inf Av p, 3.46 v S, v p =1 and for any n m + 1-dimensional subspace V in R n, there is some v V S with v p = 1, because dim V + dim S = n + 1 and S is centrally symmetric, and then sup Av p Av p Av 0 p v V, v p =1 Thus s p m A = Av 0 p. Similar to the previous argument, by Hölder s ineuality, Av 0 p w, Av for all w R n with w p = 1, and there is some w 0 R 2 with w 0 = 1 such that Av 0 p = w 0, Av Thus for the opposite direction of the ineuality in 3.45, we just need to show A T w 0 = Av 0 p, that is to show w 0, Av 0 w 0, Av 3.50 for all v R n with v p = 1. But then we can use a very similar argument with the one for 3.41 to show this. For p = 1, by the continuity of l p at p = 1 and as 3.33 holds for p > 1, we know that s 1 minm,n A = s minm,n A T. That completes the proof. Remark For some computable examples on this theorem, see Appendix A.

3.3 The Smallest q-singular Value of Random Matrices

If $0 < q < 1$, we can still define the $k$-th $q$-singular value by (3.10), and the results above on the smallest $q$-singular value still hold in finite-dimensional vector spaces. In [43] and [58], estimates were given on the smallest singular value, defined by the usual $\ell_2$-norm, of square random matrices and of the sum of a random matrix and a deterministic matrix. In this section, we study probabilistic estimates of the smallest $q$-singular value of rectangular random matrices and of square random matrices.

For rectangular matrices, we are going to give the following estimate on the smallest $q$-singular value.

Theorem. Given any $0 < q \le 1$, let $\xi$ be a pregaussian random variable with variance 1 and let $A$ be an $m \times n$ matrix with i.i.d. copies of $\xi$ as its entries. Then for any $\varepsilon > 0$ there exist some $\gamma > 0$, $c > 0$ and $r \in (0,1)$, depending on $q$ and $\varepsilon$, such that if $n < rm$, then
$P\left( s^{(q)}_n(A) < \gamma m^{1/q} \right) < e^{-cm}.$   (3.51)

To prove the above theorem, it suffices to show the following lemma, by considering the probability of a union.

Lemma. Given any $0 < q \le 1$, let $A$ be an $m \times n$ pregaussian random matrix. Then there exist some $c, \lambda_1 \in (0,1)$ such that for each $v$ on the unit sphere of the $\ell_q$-quasinorm in $\mathbb{R}^n$,
$P\left( \|Av\|_q < \lambda_1 m^{1/q} \right) \le c^m.$   (3.52)

First let us establish the following.

Lemma. Given any $0 < q \le 1$, for any $\xi_1, \dots, \xi_n$ which are i.i.d. copies of a pregaussian random variable with variance 1, there exist $c, \lambda \in (0,1)$ such that
$P\left( \sum_{k=1}^{n} |\xi_k|^q < \lambda n \right) \le c^n.$   (3.53)

57 44 Proof. Same as the beginning of the proof for Theorem , for pregaussian random variables ξ 1,, ξ n with variance 1, we know that there exists some δ > 0, such that ε 0 := P ξ k δ < 1 2e 3.54 for k = 1, 2,, n. Then using the Riemann Stieltjes integral for expectation, we have E exp ξ k λ = exp 0 ξ k λ dp ξ k t δ 0 dp ξ k t + δ exp t λ dp ξk t = ε 0 + δ exp t λ dp ξk t Choose λ > 0 to be small enough, such that exp t exp λ δ < λ ε ε for t δ. Therefore, it follows that E exp ξ k ε 0 ε 0 + λ 2 1 ε 0 Finally, applying Markov s ineuality, we obtain δ dp ξ k t ε 0 + ε 0 2 = 3 2 ε 0 < 3 4e P n k=1 ξ k < λn = P exp n 1 n λ k=1 ξ k > 1 n k=1 ξ k E exp n 1 λ = e n n k=1 E exp c n ξ k λ for c := 3 2 ε 0 e < 3 4. The property of linear combination of pregaussian random variables will allow us to obtain the probabilistic estimate on Av for the pregaussian ensemble A. Indeed, the combination of Lemma and the lemma on linear combination of pregaussian random variables, Lemma yields Lemma Using the estimate on the covering number of the unit sphere in l -uasinorm by ε-net [21] and the probabilistic union bound, we are going to prove Theorem

58 45 Proof. Let the unit sphere in l -uasinorm S n be covered by ε-net N, we know from [21], that N n ε By the probabilistic union bound, P v N such that Av < λ 1 m n c m ε On the other hand, by Theorem , we know that for some C > 0. P s 1 A Km 1 1 exp C m 3.61 Now assume the event that Av < 1 2 λ 1m 1 for some v S n occurs. Therefore, there exists some v N such that v v ε, and by the triangular ineuality, Av A v v + Av s 1 A v v + Av K mε λ 1m 3.62 = λ 1m if we set ε = λ 1. It follows that 2K P s 1 A Km 1 and Av < λ 1 2 m 1 for some v S n P Av < λ 1 m But the event s n A < γm 1, where γ := 1 2 λ 1, implies Av < 1 2 λ 1m 1 for some v S n, therefore P s 1 A Km 1 and s n A < γm 1 = P Av < λ 1 m n c m 3.64 ε 1 n m + 2 m c ε by Choosing an appropriate r < 1, we have that if n < rm, P s 1 A Km 1 and s A < γm 1 e c 1m n 3.65

59 46 for some c 1 > 0. Finally, the claim that P s n A < γm 1 e cm 3.66 for some c > 0 follows from 3.65 and Now let us consider the suare random matrices whose entries are independent and identically-distributed copies of a pregaussian random variable. Theorem Given any 0 < 1, and let ξ be the pregaussian random variable with variance 1 and A be an n n matrix with i.i.d. copies of ξ in its entries. Then for any ε > 0 and 0 < 1, there exist some K > 0 and c > 0 dependent on and ε, such that P s n A < εn 1 < Cε + Cα n + P A > Kn where α 0, 1 and C > 0 depend only on the pregaussian random variable and K. To prove the above theorem, that is to estimate Av for all v R n, we decompose S n 1 into the set of compressible vectors and the set of incompressible vectors, generalized from the approach used in [43], [58], for instance. Definition Compressible and incompressible vectors in S n 1. Given any ρ, λ 0, 1. Let Comp λ, ρ be the set of vectors v S n 1 such that there is a vector v with v 0 λn satisfying v v ρ. The set of incompressible vector is defined as Incomp λ, ρ := S n 1 \ Comp λ, ρ Now using the decomposition, we have P s n A < εn 1 P inf v compλ,ρ Av < εn 1 +P inf v Incompλ,ρ Av < εn 1, 3.69 and in the following we are going to consider each term in the right hand side.

60 47 For the first term on compressible vectors, we can apply Lemma and use the union bound to get the probabilistic estimate on Av with v Comp λ, ρ, as P inf Av < 1 v comp εn P inf Av < εn λ,ρ v comp λ,ρ Therefore the first term actually decays exponentially for large n. However, for incompressible vectors, we first consider dis X j, H j, which denotes the distance between column X j of an n n random matrix A and the span of other columns H j := span X 1,, X j 1, X j+1,, X n, and obtain a generalized lemma for l -uasinorm from the counterpart in [43], which allows us to transform the probabilistic estimate on Av for v Incomp λ, ρ to the probabilistic estimate on the average of the distances dist X j, H j, j = 1, 2,, n. Lemma Let A be an n n random matrix with columns X 1,, X n, and H j := span X 1,, X j 1, X j+1,, X n. Then for any ρ, λ 0, 1 and ε > 0, one has P inf Av < 1 v Incomp ερn < 1 n P dist X j, H j < ε, 3.71 λ,ρ λn in which dist is the distance defined by the l -uasinorm. Proof. For every v Incomp λ, ρ, by Definition 3.3.5, there are at least λn components of v, v j, satisfying v j ρn 1, because otherwise, v would be within l -distance ρ of the sparse vector, the restriction of v on the components v j satisfying v j ρn 1 with sparsity { } less than λn, and thus v would be compressible. Thus if we let I 1 v := j : v j ρn 1, then I 1 v λn. Next, let I 2 A := {j : dist X j, H j < ε} and E be the event that the cardinality of I 2 A, I 2 A λn. Applying Markov s ineuality, we have j=1 P E = P {I 2 A : I 2 A, ε λn} 1 λn E I 2 A = 1 λn E {j : dist X j, H j < ε} = 1 λn n j=1 P dist X j, H j < ε. 3.72

61 48 Since E c is the event that {j : dist X j, H j ε} > 1 λ n 3.73 for random matrix A, thus if E c occurs, then for every v Incomp λ, ρ, I 1 v + I 2 A > λn + 1 λ n = n Hence there is some j 0 I 1 v I 2 A. So we have Av dist Av, H j0 = dist v j0 X j0, H j0 = v j0 dist X j0, H j0 ερn Contra-positively, if the events Av < ερn 1 P inf Av < v Incomp ερn λ,ρ 1 P E 1 λn occurs then E also occurs. Thus n P dist X j, H j < ε j=1 Remark dist defined by the l -uasinorm is different from the usual dist defined by the l 2 -uasinorm, because simply in a right triangle the hypotenuse as a vector can be shorter than the leg as a vector in l -uasinorm. Nevertheless, dist X j, H j dist X j, H j because 2, therefore one can take the advantage of the estimate on P dist X j, H j < ε to obtain the estimate on P dist X j, H j < ε. Theorem Distance bound, [43]. Let A be a random matrix whose entries are independent variables with variance at least 1 and fourth moment bounded by B. Let K 1. Then for every ε > 0, P dist X j, H j < ε and A Kn 1 2 C ε + α n 3.77 where α 0, 1 and C > 0 depend only on B and K.

62 49 The above theorem implies that P dist X j, H j < ε P dist X j, H j < ε C ε + α n + P A Kn Combining 3.69 and applying Lemma 3.3.6, we now reach the desired ineuality in Theorem Furthermore, since A is pregaussian, using a standard concentration bound we know that for every ε > 0, there exists some K > 0 such that P A Kn 1 2 < ε. Thus, we have proved the following Theorem Lower tail probabilistic estimate on the smallest -singular value. Given any 0 < 1, and let ξ be the pregaussian random variable with variance 1 and A be an n n matrix with i.i.d. copies of ξ in its entries. Then for any ε > 0, there exists some γ > 0 such that P s n A < γn 1 < ε, 3.79 where γ only depends on, ε and the pregaussian random variable ξ. 3.4 Upper Tail Probability of the Smallest -singular Value In this section, we want to obtain the estimate on the upper tail probability of the smallest -singular value of an n n pregaussian random matrix. Theorem Upper tail probabilistic estimate on the smallest -singular value. Given any 0 < 1, and let ξ be the pregaussian random variable with variance 1 and A be an n n matrix with i.i.d. copies of ξ in its entries. Then for any K > e, there exist some C > 0, 0 < c < 1, and α > 0 only dependent on pregaussian random variable ξ,, such that P s n A > Kn 1 2 In particular, for any ε > 0, there exist some K > 0 and n 0, such that for all n n 0. C ln Kα K α + c n P s n A > Kn 1 2 < ε 3.81

63 50 Proof. From the previous section and by Lemma 3.2.6, the n n pregaussian random matrix A is invertible with very high probability. Therefore, we have P s n A ut ε n 1 2 and thus it suffices to show P v n u, A 1 v ε t n 1 P v n u, A 1 v ε t n 1 for some v R n, 3.82 for some v R n 1 ε Now we choose v = X j π j X j, where X j is the j-th column vector of A and π j is the projection onto the hyperplane H j := span X 1,, X j 1, X j+1,, X n, to have a small probability for both the event A 1 v ε n 1 t and the event v u in Using the result established in [44], we can easily estimate the probability of the event that A 1 v ε t n 1 occurs. Since A 1 v A 1 v 2, we know that P A 1 v ε n 1 t P A 1 v 2 ε n 1 t = P A 1 v 2 ε n t 2p 4ε, t, n 2 where p ε, t, n := C 1 ε + e c 1t 2 + e c 2n for some c 1, c 2, C 1 > It remains to show that v u in 3.82 has a small probability, and that will be shown in the next lemma. In fact, for the normal vector to the hyperplane H j, v = X j π j X j, we have Lemma For every u > 0, one has P X j π j X j un C 2 e c3u + C 3 n c for some c 3, c 4, C 2, C 3 > 0 and any j = 1, 2,, n. Proof. Without loss of generality, assume j = 1. Let a 1, a 2,, a n := X 1 π 1 X 1, normal to the hyperplane H 1, and ξ 1, ξ 2,, ξ n := X 1.

64 51 Applying the Bessy-Esseen theorem see for instance [54], we know that P X j π j X j 2 u n i=1 = P a iξ i u = P g u + O n c 3.86 n i=1 a2 i for some c > 0, in which g is a standard normal random variable. By the discrete Hölder ineuality, X j π j X j n 1 X j π j X j 1 n Xj π j X j 2, 3.87 and then P X j π j X j n u P n Xj π j X j 2 n u = P X j π j X j 2 u Therefore it follows from 3.86 that for some c 3, c 4, C 2, C 3 > 0. P X j π j X j un P g u + O n c = 2 2π u e 1 2 x2 dx + O n c C 2 e c 3u + C 3 n c Now we can choose u = t = ln M, where M > 1, and ε = 1, then 3.82, 3.84, and M 3.85 imply that P s n A > M ln M n 1 2 C M + α cn 3.90 for some C > 0, 0 < c < 1, and α > 0. Then let K := M ln M, thus we have P s n A > Kn 1 2 C ln Mα K α + c n C ln M ln Mα K α + c n = C ln Kα K α + c n 3.91 if M e, that reuires K > e. Thus, we have completed the proof of Theorem Lastly, we make a table to summarize the results of the probabilistic estimates on the largest and smallest -Singular values of pregaussian random suare matrices for 0 < 1 see Table 3.1.

Table 3.1: Results obtained on the probabilistic estimates on the largest and smallest q-singular values of pregaussian random square matrices for 0 < q <= 1.

  n x n matrix   | Order of largest q-singular value | Order of smallest q-singular value
  Upper tail     | $n^{1/q}$                         | $n^{1/q - 1/2}$ ($n^{1/q - 1}$ conjectured)
  Lower tail     | $n^{1/q}$                         | $n^{-1/q}$
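A quick simulation in the spirit of Table 3.1 (our own, not from the thesis): for $n \times n$ Gaussian matrices, the largest $q$-singular value should grow like $n^{1/q}$. We use the column bound $\max_j \|a_j\|_q \le s^{(q)}_1(A)$ noted earlier in this chapter as a computable proxy and check that its ratio to $n^{1/q}$ stabilizes.

import numpy as np

q = 0.5
rng = np.random.default_rng(0)
for n in (50, 100, 200, 400):
    A = rng.standard_normal((n, n))
    max_col_q = ((np.abs(A) ** q).sum(axis=0).max()) ** (1.0 / q)  # max_j ||a_j||_q
    print(n, max_col_q / n ** (1.0 / q))  # roughly constant in n, consistent with order n^{1/q}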

Chapter 4

The Probabilistic Estimates on the Largest p-singular Value of Pregaussian Random Matrices for p > 1

4.1 Introduction

The largest and smallest $q$-singular values of pregaussian random matrices for $0 < q \le 1$ have been studied in [32]. Similarly to the $q$-singular value for $0 < q \le 1$, we can define the largest $p$-singular value for $p > 1$.

Definition. For an $m \times N$ matrix $A$, the largest $p$-singular value of $A$, denoted $s^{(p)}_1(A)$, is defined as
$s^{(p)}_1(A) := \sup\left\{ \|Ax\|_p / \|x\|_p : x \in \mathbb{R}^N \text{ with } x \ne 0 \right\}$   (4.1)
for given $p > 1$.

4.2 Lower Tail Probability of the Largest p-singular Value for p > 1

In [32] we have established the following.

Lemma 4.2.1 (Linear bound for a partial binomial expansion). For every positive integer $n$,
$\sum_{k = \lfloor n/2 \rfloor + 1}^{n} \binom{n}{k} x^k (1-x)^{n-k} \le 8x$   (4.2)
for all $x \in [0,1]$.

The above lemma can be applied to estimate probabilities.

Lemma 4.2.2. Suppose $\xi_1, \xi_2, \dots, \xi_n$ are i.i.d. copies of a random variable $\xi$. Then for any $\varepsilon > 0$,
$P\left( \sum_{i=1}^{n} |\xi_i|^p \le \tfrac{n\varepsilon}{2} \right) \le 8\, P\left( |\xi_1|^p \le \varepsilon \right)$   (4.3)
for any given $p > 1$.

Proof. Given $p > 1$, the event
$\left\{ (\xi_1, \dots, \xi_n) : \sum_{i=1}^{n} |\xi_i|^p \le \tfrac{n\varepsilon}{2} \right\}$   (4.4)
is contained in
$E := \bigcup_{k = \lfloor n/2 \rfloor + 1}^{n} \left\{ (\xi_1, \dots, \xi_n) : |\xi_{i_1}|^p \le \varepsilon, \dots, |\xi_{i_k}|^p \le \varepsilon,\ |\xi_{i_{k+1}}|^p > \varepsilon, \dots, |\xi_{i_n}|^p > \varepsilon \right\},$   (4.5)
where $\{i_1, i_2, \dots, i_k\}$ runs over the subsets of $\{1, 2, \dots, n\}$ of cardinality $k$ and $\{i_{k+1}, \dots, i_n\}$ is its complement; indeed, if the sum in (4.4) is at most $n\varepsilon/2$, then fewer than $n/2$ of the $|\xi_i|^p$ can exceed $\varepsilon$. Let $x = P(|\xi_1|^p \le \varepsilon)$; then, summing over the disjoint events,
$P(E) = \sum_{k = \lfloor n/2 \rfloor + 1}^{n} \binom{n}{k} x^k (1-x)^{n-k},$   (4.6)
and applying Lemma 4.2.1, we have
$P(E) \le 8x = 8\, P\left( |\xi_1|^p \le \varepsilon \right).$   (4.7)
Since the event (4.4) is contained in the event (4.5),
$P\left( \sum_{i=1}^{n} |\xi_i|^p \le \tfrac{n\varepsilon}{2} \right) \le P(E) \le 8\, P\left( |\xi_1|^p \le \varepsilon \right).$   (4.8)

To estimate the lower tail probability of the largest $p$-singular value, we have:

Theorem (Lower tail probability of the largest p-singular value, p > 1). Let $\xi$ be a pregaussian random variable normalized to have variance 1 and let $A$ be an $m \times N$ matrix with i.i.d. copies of $\xi$ as its entries. Then for every $p > 1$ and any $\varepsilon > 0$ there exists $\gamma > 0$ such that
$P\left( s^{(p)}_1(A) \le \gamma m^{1/p} \right) \le \varepsilon,$   (4.9)
in which $\gamma$ depends only on $p$, $\varepsilon$ and the pregaussian random variable $\xi$.

68 55 Proof. Since a ij is pregaussian with variance 1, then any ε > 0, there is some δ > 0, such that But we know P a ij p δ ε s p 1 A m i=1 a ij p 1 p 4.11 for all j, because by the definition of the largest p-singular value 4.1.1, choosing x to be the standard basis vectors of R N gives us max j m i=1 a ij p 1 p s p 1 A. Therefore, by Lemma 4.2.2, P s p 1 A 1 δ 2 p m 1 p P Thus let γ = 1 δ p, then 4.9 follows. 2 m i=1 a ij0 p mδ 8P a ij p δ ε Upper tail probability of the largest p-singular value for p > 1 For the upper tail probability of the largest p-singular value, p > 1, we can derive the following lemma first by using the Minkowski ineuality and discrete Hölder ineuality. Lemma For p 1, 2.17 defines a norm on the space of m N matrices and max a j j p s p 1 A N p 1 p in which a j, j = 1, 2,, N, are the column vectors of A. max a j j p, 4.13 Applying the above lemma, an estimate we can derive easily is the following Theorem Upper tail probability of the largest p-singular value of Bernoulli random matrices, p > 1. Let ξ be a Bernoulli random variable normalized to have variance 1 and A be an m N matrix with i.i.d. copies of ξ in its entries, then m 1 p s p 1 A m 1 p N p 1 p 4.14 For the more general rectangular matrices, we have

69 56 Theorem Upper tail probability of the largest p-singular value of rectangular matrices, 1 < p 2. Let ξ be a pregaussian variable normalized to have variance 1 and A is an m N matrix with i.i.d. copies of ξ in its entries, then for every 1 < p 2 and any ε > 0, there exists K > 0 such that P s p 1 A K m 1 1 p + m p N 2 ε 4.15 where K only depends on p, ε and the pregaussian variable ξ. Proof. By the discrete Hölder ineuality and the definition of the largest p-singular value, s p Ax p m 1 p 1 2 Ax 2 1 A = sup sup x R N, x 0 x p x R N, x 0 x 2 We also know that there exists K > 0 such that P s 2 1 A K m N 2 = m 1 p s 1 A ε Therefore, we have P s p 1 A K m 1 1 p + m p N 2 P s 2 1 A K m N 2 ε Using the duality lemma on the largest p-singular value, we have Theorem Lower tail probability of the largest p-singular value of rectangular matrices, p > 2. Let ξ be a pregaussian random variable normalized to have variance 1 and A be an m N matrix with i.i.d. copies of ξ in its entries, then for every p > 2 and any ε > 0, there exists γ > 0 such that P s p 1 A γm p 1 p ε 4.19 in which γ only depends on p, ε and the pregaussian random variable ξ. Also, we have the upper tail probability of the largest p-singular value of rectangular matrices for p > 2.

70 57 Theorem Upper tail probability of the largest p-singular value of rectangular matrices, p > 2. Let ξ be a pregaussian variable normalized to have variance 1 and A is an m N matrix with i.i.d. copies of ξ in its entries, then for every p > 2 and any ε > 0, there exists K > 0 such that P s p 1 A K N p 1 p + m 1 p 2 2 N 2p ε 4.20 where K only depends on p, ε and the pregaussian variable ξ. Remark In particular, for p =, s 1 A is approximately O n. For some numerical experiments, see Appendix C.
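The remark above points to numerical experiments in Appendix C, which are not reproduced here. As a stand-alone sketch (ours), one can at least tabulate the column-norm bracket (4.13), $\max_j \|a_j\|_p \le s^{(p)}_1(A) \le N^{(p-1)/p} \max_j \|a_j\|_p$, for Gaussian matrices and compare it with the $m^{1/p}$ scale from (4.9) and (4.14):

import numpy as np

p = 3.0
rng = np.random.default_rng(0)
for (m, N) in ((100, 100), (200, 200), (100, 400)):
    A = rng.standard_normal((m, N))
    lower = ((np.abs(A) ** p).sum(axis=0).max()) ** (1.0 / p)  # max_j ||a_j||_p <= s_1^{(p)}(A)
    upper = N ** ((p - 1.0) / p) * lower                       # the other side of (4.13)
    print(m, N, round(lower, 2), round(upper, 2), round(m ** (1.0 / p), 2))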

Chapter 5

Modified Restricted Isometry Property and Sparse Recovery

5.1 Modified Restricted Isometry Property

For an integer $s \le n$, the restricted isometry constant $\delta_s(A)$ is the smallest number $\delta$ which satisfies
$(1-\delta)\|x\|_2^2 \le \|Ax\|_2^2 \le (1+\delta)\|x\|_2^2$
for all $x \in \mathbb{R}^n$ with $|\mathrm{supp}(x)| \le s$. Equivalently, the inequality
$1 - \delta \le s_{\min}(A_S)^2 \le s_{\max}(A_S)^2 \le 1 + \delta$
holds for every $m \times s$ submatrix $A_S$. In an earlier version of [21], Foucart and Lai defined the so-called $s$-th $q$-modified restricted isometry property,
$(1-\delta)\|x\|_{f,q} \le \|Ax\|_q \le (1+\delta)\|x\|_{f,q},$   (5.1)
for all $x \in \mathbb{R}^N$ with $\|x\|_0 < s$. On the other hand, in [11] the $q$-restricted isometry property,
$(1-\delta_s)\|x\|_2 \le \|Ax\|_q \le (1+\delta_s)\|x\|_2$   (5.2)
for all $x \in \mathbb{R}^N$ with $\|x\|_0 < s$, was defined. But in order to study sparse recovery via $\ell_q$-minimization, we want to use $\|x\|_q$ instead of $\|x\|_{f,q}$, so we define:

Definition 5.1.1. An $m \times N$ matrix $A$ is said to have the $s$-th $q$-restricted isometry property if
$(1-\delta_s)\|x\|_q \le \|Ax\|_q \le (1+\delta_s)\|x\|_q$   (5.3)
for all $x \in \mathbb{R}^N$ with $\|x\|_0 < s$.
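The constant $\delta_s$ of Definition 5.1.1 is defined through a supremum over all $s$-sparse vectors and is not practical to compute exactly. A simple Monte Carlo sketch (ours; the sampling scheme and the matrix normalization are arbitrary illustrative choices) produces a lower bound on it by sampling random $s$-sparse vectors of unit $\ell_q$-quasinorm and recording the worst deviation of $\|Ax\|_q$ from 1:

import numpy as np

def lq_quasinorm(x, q):
    return np.sum(np.abs(x) ** q) ** (1.0 / q)

def qrip_lower_bound(A, s, q, trials=5000, seed=0):
    # Monte Carlo lower bound on the s-th q-restricted isometry constant of
    # Definition 5.1.1: sampling can only exhibit violations, never certify the property.
    rng = np.random.default_rng(seed)
    m, N = A.shape
    worst = 0.0
    for _ in range(trials):
        support = rng.choice(N, size=s, replace=False)
        x = np.zeros(N)
        x[support] = rng.standard_normal(s)
        x /= lq_quasinorm(x, q)
        worst = max(worst, abs(lq_quasinorm(A @ x, q) - 1.0))
    return worst

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, N, s, q = 64, 256, 5, 0.8
    A = rng.standard_normal((m, N)) / m ** (1.0 / q)   # an ad hoc normalization of ours
    print(qrip_lower_bound(A, s, q))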

In an earlier version of [21], Foucart and Lai also introduced the new quasinorm
$\|x\|_{f,q} := \left( \int_{\mathbb{R}^N} \Big| \sum_{i=1}^{N} t_i x_i \Big|^q f(t_1) \cdots f(t_N) \, dt_1 \cdots dt_N \right)^{1/q}$   (5.4)
for any pregaussian probability density function $f$ and $q > 0$, and showed that, for any $x \in \mathbb{R}^N$ and $0 < q \le 1$, the probability that $x$ satisfies the $q$-modified restricted isometry property obeys
$P\left( (1-\varepsilon)\, m \|x\|_{f,q}^q \le \|Ax\|_q^q \le (1+\varepsilon)\, m \|x\|_{f,q}^q \right) \ge 1 - 2e^{-\kappa m \varepsilon^2}$   (5.5)
for some $\kappa > 0$ independent of $q$ and $f$. However, if we consider the case $q = 1$ for the sake of clarity (the result for general $0 < q \le 1$ is presented in a later section), we have the following.

Theorem 5.1.2. Suppose that $A$ is an $m \times N$ matrix whose entries are independent and identically distributed copies of a symmetric pregaussian random variable with probability density function $f$. Then
$P\left( (1-\varepsilon)\, m \|x\|_{f,1} \le \|Ax\|_1 \le (1+\varepsilon)\, m \|x\|_{f,1} \right) \ge 1 - 2e^{-\kappa (m/\sqrt{N}) \varepsilon^2}$   (5.6)
for any $0 < \varepsilon < 1$ and some $\kappa > 0$ depending only on $f$.

First of all, it turns out that the quasinorm (5.4) has the following properties.

Lemma 5.1.3. For any $x \in \mathbb{R}^N$ and any even probability density function $f$:

1. $\|x\|_{f,2} = \sigma \|x\|_2$, in which $\sigma^2$ is the variance of the even probability density function $f$;

2. $\|x\|_{f,q'} \le \|x\|_{f,q}$ for any $0 < q' \le q$;

3. $\|x\|_{f,q} \le N^{1/q} \sigma_q^{1/q} \|x\|_p$, in which $1/p + 1/q = 1$ for $p, q > 1$ and $\sigma_q$ is the $q$-th absolute moment of the probability density function $f$,
$\sigma_q := \int_{\mathbb{R}} |t|^q f(t) \, dt;$   (5.7)

73 60 4. x f, x σ 1,p, in which σ,p := N t t N p f t1 f t N dt 1 dt N 5.8 i=1 and 1 p + 1 = 1 for p, 1. Proof. By the definition, x f,2 = 1 N i=1 t 2 2 ix i f t1 f t N dt 1 dt N = N i=1 x2 i t 2 i = N + N i,j=1,i j x ix j t i t j f t 1 f t N dt 1 dt N t2 2 i f t 1 f t N dt 1 dt N i=1 x2 i 5.9 = σ x 2 since f is even, thus we obtain 1. For 2, we can just use Hölder s ineuality in the probability measure associated with the density function, x f, = = N N i=1 t ix i i=1 t ix i f t1 f t N dt 1 dt N 1 f t 1 f t N dt 1 dt N f t 1 f t N dt 1 dt 1 N N i=1 t ix i f t 1 f t N dt 1 dt N = x f,. Again, by the discrete Hölder ineuality, x f, = N i=1 t 1 ix i f t1 f t N dt 1 dt N N i=1 t i N i=1 x i p 1 p f t1 f t N dt 1 dt N = x 1 N p i=1 t i f t 1 f t N dt 1 dt N = N 1 σ 1 x p, 5.11 which yields 3.

74 61 By the same token, 4 is obtained as follows, x f, = N i=1 t 1 ix i f t1 f t N dt 1 dt N N i=1 x i N i=1 t i p 1 p f t1 f t N dt 1 dt N = x N i=1 t i p 1 p f t1 f t N dt 1 dt N = x σ 1,p In addition, the following useful lemmas were established in an earlier version of [21] by Foucart and Lai. Lemma [21]. ]For any x R N and 1, x f, x 1 σ 1. Proof. By Jensen s ineuality and the convexity of the function u u for 1, x f, = N i=1 t 1 ix i f t1 f t N dt 1 dt N = x 1 N i=1 t x i 1 i x f t1 f t N dt 1 dt N 1 x 1 N i=1 = x 1 σ 1. x i 1 t x 1 i f t 1 f t N dt 1 dt N 5.13 Now, let us compare x f, and x 2. Lemma Comparison with l 2, [21]. There exist c, C > 0 such that c x 2 x f, C x for all x R N. Now let s prove Theorem Proof. [Proof of Theorem 5.1.2] Let X i := N j=1 a ijx j and ξ i := X i EX i, then m ξ i = Ax 1 m x f, i=1

75 62 because x f,1 = N t i x i f t 1 f t N dt 1 dt N = EX i i=1 Obviously Eξ i = 0 and Eξi 2 = EXi 2 EX i 2 = x 2 f,2 x 2 f,1 := ν 2 < 5.17 since E a ij k k!λ k for some λ > 0. Moreover, but by Lemma we have Therefore by Lemma again. Eξ k i = E X i EX i k = E X j i k k 1 k j EX j i j x k j f,1, 5.18 j=0 x j 1 σ j j!λ x 1 j Eξ i k k k j=0 j j! λ x 1 j x k j f,1 k! k k j=0 j λ x 1 j x 1 σ 1 k j Now let H := 2λ x 1, then Eξ k i k! k j=0 k j = k! 2λ x 1 k λ x 1 j λ x 1 k j 5.20 k!h k. Applying the Bernstein ineuality see for instance [10] and [30], we have t 2 P Ax 1 m x f,1 > t 2 exp th + mν 2 Then we can choose t = εm x f,1, thus P Ax 1 m x f,1 > εm x f,1 2 exp = 2 exp ε 2 m 2 x 2 f,1 2εm x f,1 H+mν 2 ε2 m x 2 f,1 2ε x f,1 H+ν By Lemma 5.1.3, x f,2 = σ x 2 in which σ 2 is the variance of the even probability density function f, and x f,2 x f,1, and by Lemma 2.3 in [21], x f,1 c 1 x 2 = σ 1 π minγ 3 2, Γ1 x 2 = 2 2 σ 1 x 2, 5.23

76 63 in which σ 1 is the first absolute moment of the probability density function f, hence 2 exp ε2 m x 2 f,1 ε 2ε x f,1 2 exp 2 m x 2 f,1 H+ν 2 2ε x f,2 H+ x 2 f,2 ε = 2 exp 2 m x 2 f,1 2εσ x 2 H+σ 2 x 2 2 ε 2 exp 2 mσ1 2 x 2 2 4εσ x 2 H+σ 2 x 2 2 = 2 exp ε2 mσ1 2 x 2. 4σεH+σ x Furthermore 2 exp ε2 mσ1 2 x 2 4σεH+σ x 2 = 2 exp 2 exp = 2 exp ε 2 mσ1 2 x 2 4σ2ελ x 1 +σ x 2 ε 2 mσ1 2 x 2 4σ2ελ N x 2 +σ x 2 ε2 mσ1 2 4σ2ελ N+σ Thus, the combination of 5.22, 5.24 and 5.25 yields σ1 P Ax 1 m x f,1 > εm x f,1 2 exp 2 4σ2ελ+σ/ N ε2 m 2 exp σ2 1 ε2 m 4σ2λ+σ N = 2e κ m N ε 2, N 5.26 where κ := σ 2 1 4σ2λ+σ dependent only of f, as desired. We can see from 5.26 that the additional assumption N m exponential decay, so we have the following = o1 will give us the Corollary If N 1 2 m N, then we have the exponential decay for the tail probability for some some κ > 0 dependent of f. P Ax 1 m x f,1 > εm x f,1 2e κε and the following corollary on suare matrices

77 64 Corollary If A is an n n matrix whose entries are independent and identicallydistributed copies of a symmetric pregaussian random variable with probability density function f. Then P1 εn x f,1 Ax εn x f,1 1 2e κ nε for any 0 < ε < 1 and some κ > 0 dependent of f modified Isometry Property We can actually prove generally for 0 < 1. Theorem Suppose that A is an m N matrix whose entries are independent and identically-distributed copies of a symmetric pregaussian random variable with probability density function f. Then P 1 ε m x f, Ax 1 + ε m x f, 1 2e κmn 2 ε for any 0 < ε < 1 and 0 < 1, and some κ > 0 dependent of f and. Proof. Let X i := n j=1 a ijx j and ξi := X i EX i. Since x f, = N t i x i f t 1 f t N dt 1 dt N = EX i i=1 then Obviously Eξ i = 0 and m ξ i = Ax m x f, i=1 Eξi 2 = EXi 2 EX i 2 = x 2 2 f,2 x f, := ν2 < 5.32 because E a ij k k!λ k for some λ > 0. Moreover, Eξ k i = E X i EX i k = k k 1 k j EX j i j x k j f,, 5.33 j=0

78 65 but by 2 of Lemma and Lemma we have E X j i = x j j f,j x f,j x j 1 σ j j! λ x 1 j j!λ x 1 j 5.34 Therefore Eξ k i k k j=0 j j! λ x 1 j x k j f, k! k j=0 k j = k! k j=0 k j k! k j=0 k j = k! 2 λ x 1 k λ x 1 j x k j f,1 λ x 1 j x 1 σ 1 k j λ x 1 j λ x 1 k j 5.35 by 2 of Lemma and Lemma again. Now let H := 2 λ x 1,then Eξ i k k!h k. Applying the Bernstein ineuality, we have Ax P m x f, > t 2 exp t 2 2 th + mν Then we can choose t = εm x f,, thus Ax P m x f, > εm x f, 2 exp = 2 exp ε 2 m 2 x 2 f, 2εm x f, H+mν2 ε2 m x 2 f, 2ε x f, H+ν By Lemma 5.1.3, x f,2 = σ x 2 in which σ 2 is the variance of the even probability density function f, and x f,2 x f,1, and by Lemma 2.3 in [21], c x 2 x f, C x 2, 5.38 in which and c = 2 2 σ min Γ, Γ π 2 2 = 2 2 σ Γ π C = 2 2 σ σ + 1 max Γ, Γ = Γ π 2 2 π 2 3 = σ

79 66 since Γ again, +1 < Γ for 0 < 1 see for example [35]. Hence by 1 and 2 of Lemma 2 2 exp ε2 m x 2 f, 2ε x f, H+ν2 2 exp 2 exp = 2 exp 2 exp = 2 exp ε 2 m x 2 f, 2ε x f, H+ x 2 f,2 ε 2 m x 2 f, 2ε x f, H+ x 2 f,2 ε 2 m x 2 f, 2ε x f, H+σ2 x 2 2 ε 2 mc 2 x 2 2 2εC x 2 H+σ2 x 2 2 ε2 mc 2 x 2 2εC H+σ 2 x Furthermore, 2 exp ε2 mc 2 x 2 2εC H+σ 2 x 2 = 2 exp 2 exp = 2 exp ε 2 mc 2 x 2 22εC λ x 1 +σ 2 x 2 ε 2 mc 2 x 2 22εC λ N +σ 2 22εC λ N x 2 +σ 2 x 2 ε 2 mc Finally, the combination of 5.37, 5.41 and 5.42 yields Ax P m x f, > εm x f, 2 exp c2 mn 2 ε 2 2 2εC λ +σ 2 /N 2 c 2 exp 2 mn 22C λ +σ 2 2 ε 2 = 2e κmn 2 ε 2, 5.43 where κ := Also, we have c 2 22C λ +σ 2 dependent only of f, and the claim follows. Corollary If N 2 m N, 0 < 1, then we have the exponential decay for the tail probability Ax P m x f, > εm x f, 2e κε for some some κ > 0 dependent of f and. and the following corollary on suare matrices

80 67 Corollary modified RIP for suare matrices. If A is an n n matrix whose entries are independent and identically-distributed copies of a symmetric pregaussian random variable with probability density function f. Then P 1 ε m x f, Ax 1 + ε m 2 x κn 2 ε f, 1 2e for any 0 < ε < 1 and some κ > 0 dependent of f and, 0 < On the Sparse Recovery Now, following the idea in [5] and that in [21] generalized by Foucart and Lai, which is using the estimate of cardinality of the finite cover of one can show S f, := { x R N : x f, = 1 }, 5.46 Theorem Suppose that the entries of the m N matrix A are independent copies of a pregaussian random variable with an even probability density function f. Given 0 < 1 and 0 < ɛ < 1, there exist constants k 1, k 2, and k 3 depending on f and such that Ax P m x f, ɛm x f, for all x R N with x s e κ 1mN 2 ɛ 2 provided that k 2 s ɛ 2 + k 3 s ɛ 2 ln en s N 2 < m N. Proof. We follow the proof of Theorem 4.1 of [21], with modified condition on m. By the cardinality estimate, union probability and Theorem 5.2.1, one has Ax P m x f, ɛm x f, for all x RN with x s t N k e k 1mN 2 ɛ 2 ɛ 2 N k exp k 1mN 2 ɛ 2 2 exp 4 + 8s ɛ k 1mN 2 ɛ 2 + 8s + s ln en 4 ɛ s, 5.48 and then the claim follows if m > s s k 2 + k ɛ 2 3 ln en ɛ 2 s N 2 for k 1 = κ, k 8 2 = 64 and κ κ 1 = 8. κ

In compressed sensing, the $\ell_q$-minimization problem with $0 < q \le 1$ is the following:
$\text{minimize}_{z \in \mathbb{R}^N}\ \|z\|_q \quad \text{subject to } Az = y,$   (5.49)
which can be used to reconstruct a sparse vector, namely the solution to the minimization
$\text{minimize}_{z \in \mathbb{R}^N}\ \|z\|_0 \quad \text{subject to } Az = y.$   (5.50)
Using Theorem 5.1 of [21], we are able to obtain sparse recovery through pregaussian compression matrices.

Theorem. Suppose that the entries of the $m \times N$ matrix $A$ are independent copies of a pregaussian random variable with an even probability density function. If $0 < q \le 1$, then the probability that every $s$-sparse vector $x \in \mathbb{R}^N$ is recovered as a solution to the optimization problem (5.49) exceeds $1 - \exp(-c_1 m N^{-q/2})$, provided that $c\, s \ln(N/s)\, N^{q/2} < m \le N$, in which $c$ and $c_1$ depend on the pregaussian distribution and $q$.
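Problem (5.49) is non-convex for $q < 1$; the thesis points to an algorithm for it in [20] and to the unconstrained variant (6.1) of [33]. Purely as an illustration, and not as either of those algorithms, the following sketch applies a generic iteratively reweighted least-squares heuristic to (5.49); all names and parameter choices here are ours.

import numpy as np

def irls_lq(A, y, q=0.5, iters=60, eps=1.0):
    # Generic IRLS heuristic for: minimize ||z||_q^q subject to Az = y  (problem (5.49)).
    # Each iteration solves a weighted minimum-norm problem
    #   z = W A^T (A W A^T)^{-1} y   with   W = diag((z_i^2 + eps)^{1 - q/2}),
    # and eps is gradually decreased.
    z = np.linalg.lstsq(A, y, rcond=None)[0]          # start from the least-squares solution
    for _ in range(iters):
        w = (z ** 2 + eps) ** (1.0 - q / 2.0)
        AW = A * w                                    # A @ diag(w)
        z = w * (A.T @ np.linalg.solve(AW @ A.T, y))
        eps = max(eps / 10.0, 1e-12)
    return z

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, N, s = 30, 80, 4
    A = rng.standard_normal((m, N))
    x = np.zeros(N)
    x[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)
    z = irls_lq(A, A @ x, q=0.5)
    print(np.max(np.abs(z - x)))   # typically close to machine precision in this regime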

Chapter 6

Null Space Property for Recovery from Multiple Measurements via $\ell_q$-minimization

6.1 Introduction

In recent years, compressed sensing, a technique for recovering sparse or compressible signals, has attracted much interest. The methods of compressed sensing include the convex $\ell_1$ relaxation method and the non-convex $\ell_q$-method, $0 < q < 1$. Regarding the $\ell_q$-method, Foucart and Lai in [20] presented numerical experimental results indicating that the $\ell_q$-method performs better than other available methods, together with a sufficient condition on the matrix of an underdetermined linear system which guarantees that the sparsest solution of the system is found as the solution of minimal $\ell_q$-quasinorm. The $\ell_q$-method in unconstrained minimization can also be used to generate sparse solutions to underdetermined linear systems. In data fitting, Tikhonov regularization is a method for obtaining a regularized solution of an underdetermined linear system in the least-squares sense; it considers an unconstrained optimization problem instead of a constrained one. Recently, Lai and Wang in [33] gave an iterative algorithm that generates sparse solutions to underdetermined linear systems by defining the unconstrained $\ell_q$-minimization
$\arg\min_{x \in \mathbb{R}^N}\ \|x\|_q^q + \frac{1}{2\lambda} \|Ax - b\|_2^2$   (6.1)
for $0 < q \le 1$ and using $\Gamma$-convergence to obtain a minimizer of the unconstrained $\ell_0$-minimization through the cluster points of minimizers of the unconstrained $\ell_q$-minimization.

To achieve exact recovery of a single vector via $\ell_q$-minimization, the sensing matrix needs to satisfy the $\ell_q$ null space property, as we will see in the next section. In the multiple measurement vector problem (MMV), that is, given a set of $r$ measurements
$A x^{(k)} = b^{(k)} \quad \text{for } k = 1, \dots, r,$   (6.2)
one seeks the vectors $x^{(k)}$ which are jointly sparse, i.e., have non-zero entries at the same locations; this problem arises in biomedical engineering, for instance in neuromagnetic imaging. One may use the multiple-measurement-vector (MMV) non-convex optimization problem
$\text{minimize}\ \|X\|_{q,p} \quad \text{subject to } AX = B,$   (6.3)
in which $A$, $X$ and $B$ are matrices, and
$\|X\|_{q,p} := \Big( \sum_{j=1}^{N} \|X^j\|_p^q \Big)^{1/q},$   (6.4)
where $X^j$ is the $j$-th row of $X$, for $0 < q \le 1$. We can find a condition for exact recovery similar to the $\ell_q$ null space property for the single measurement vector problem (SMV). In the case $r = 2$, $p = 2$ and $q = 1$, Foucart and Gribonval in [19] proved that the real null space property and the complex null space property are equivalent for the sparse recovery achieved by $\ell_1$-minimization. In the following sections, we start with the case $r = 2$ and prove the equivalence for $p = 2$ and $0 < q \le 1$, and then present our results for general $r$.

6.2 Null Space Property for Recovery via $\ell_q$-minimization

For the minimization problem (5.50), the uniqueness of the solution is related to a parameter of the matrix $A$, $\mathrm{spark}(A)$, which is defined as the cardinality of the smallest set of linearly dependent columns of $A$ (see [16]) and has the obvious property that $\mathrm{spark}(A) \le \mathrm{rank}(A) + 1$. To be precise, if $\|z\|_0 < \frac{1}{2}\mathrm{spark}(A)$, then $z$ can be uniquely recovered by solving the $\ell_0$-minimization (5.50).

As for the minimization problem (5.49), one can introduce the so-called null space property of $A$. The null space property has also been used in quantifying the error of approximations; see e.g. [13]. The null space property which guarantees sparse recovery is essentially the following theorem of Gribonval and Nielsen [28], which they also generalized to a general $f$-norm (not necessarily a norm on a Hilbert space) in [29].

Proposition (Restricted null space property). Let $S \subseteq \{1, 2, \dots, N\}$ be a fixed index set. Then a vector $z$ with $\mathrm{supp}(z) \subseteq S$ can be uniquely recovered from $Az = b$ using $\ell_q$-minimization (5.49) if for all non-zero $v$ in the null space of $A$,
$\|v_S\|_q < \|v_{S^c}\|_q,$   (6.5)
in which $S^c$ is the complement of $S$ in $\{1, 2, \dots, N\}$.

For clarity we give a concise proof here.

Proof. For any non-zero $x$ in the null space of $A$, since $z_S = (z+x)_S - x_S$,
$\|z_S\|_q^q \le \|x_S\|_q^q + \|(z+x)_S\|_q^q.$   (6.6)
By the assumption $\|x_S\|_q < \|x_{S^c}\|_q$, we then have
$\|z_S\|_q^q < \|(z+x)_S\|_q^q + \|x_{S^c}\|_q^q.$   (6.7)
But we also know that $z$ vanishes on $S^c$, so $x_{S^c} = (z+x)_{S^c}$, and thus
$\|z\|_q^q = \|z_S\|_q^q < \|(z+x)_S\|_q^q + \|(z+x)_{S^c}\|_q^q = \|z+x\|_q^q,$   (6.8)
and so $z \in \mathbb{R}^N$ is the unique solution to the minimization problem (5.49).
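Condition (6.5) quantifies over every non-zero vector of the null space, so it cannot be certified by sampling; sampling can, however, exhibit violations. The following sketch (ours) draws random vectors from $\ker A$ and reports whether a violation of (6.5) was found for a given index set $S$:

import numpy as np
from scipy.linalg import null_space

def nsp_violation_found(A, S, q, trials=20000, seed=0):
    # Search ker A for a vector v with ||v_S||_q >= ||v_{S^c}||_q, i.e. a violation of (6.5).
    # Comparing the q-th power sums is equivalent to comparing the l_q-quasinorms.
    rng = np.random.default_rng(seed)
    V = null_space(A)                                  # orthonormal basis of ker A
    Sc = np.setdiff1d(np.arange(A.shape[1]), S)
    for _ in range(trials):
        v = V @ rng.standard_normal(V.shape[1])
        if np.sum(np.abs(v[S]) ** q) >= np.sum(np.abs(v[Sc]) ** q):
            return True
    return False

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((8, 20))
    print(nsp_violation_found(A, np.array([0, 1]), q=0.5))  # small S: usually no violation found
    print(nsp_violation_found(A, np.arange(12), q=0.5))     # large S: violations appear quickly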

In [29], Gribonval and Nielsen also discussed the negated cases of (6.5) in general. Here we include a version in $\ell_q$ for our topic.

Proposition. Let $\mathcal{N}$ be the null space of $A$ and let $S$ be any subset of $\{1, 2, \dots, N\}$.

1. If $\|x_S\|_q > \|x_{S^c}\|_q$ for some $x \in \mathcal{N}$, then there exist some $z$ and $z'$ such that $Az = Az'$, $\mathrm{supp}(z) \subseteq S$ and $\|z'\|_q < \|z\|_q$.

2. If $\|x_S\|_q = \|x_{S^c}\|_q$ for some non-zero $x \in \mathcal{N}$, then there exist some $z$ and $z'$ with $z \ne z'$ such that $Az = Az'$, $\mathrm{supp}(z) \subseteq S$ and $\|z'\|_q = \|z\|_q$.

Proof. For 1, without loss of generality we can assume $S = \{1, 2, \dots, s\}$. Take $z$ to be the vector whose components in $S$ are the components of $x$ restricted to $S$ and whose other components are zeroes, and $z'$ to be the vector whose components in $S^c$ are the negatives of the components of $x$ restricted to $S^c$ and whose other components are zeroes; then $z - z' = x$. It follows that
$Az - Az' = A(z - z') = Ax = 0,$   (6.9)
the support of $z$ is in $S$, and
$\|z'\|_q = \|x_{S^c}\|_q < \|x_S\|_q = \|z\|_q.$   (6.10)
For 2, one can simply change the '<' in (6.10) to '='; the other steps are the same as in 1.

Remark. This proposition tells us that not every vector supported on $S$ can be uniquely recovered by solving the $\ell_q$-minimization (5.49) if $\|x_S\|_q \ge \|x_{S^c}\|_q$ for some non-zero $x \in \mathcal{N}$. Therefore, if all the vectors supported on $S$ can be uniquely recovered, then $A$ satisfies the null space property on $S$. Precisely, one has the following equivalence.

Proposition. Let $S \subseteq \{1, 2, \dots, N\}$ be a fixed index set. Then all the vectors supported on $S$ can be uniquely recovered from $Az = b$ using $\ell_1$-minimization (5.49) if and only if for all non-zero $v$ in the null space of $A$,
$\|v_S\|_1 < \|v_{S^c}\|_1.$   (6.11)

To give a concrete sense of how $\|v_S\|_1 < \|v_{S^c}\|_1$ can fail even though a vector supported on $S$ is uniquely recovered, we give an example.

Example. Consider the $\ell_1$-minimization problem
$\text{minimize}_{z \in \mathbb{R}^N}\ \|z\|_1 \quad \text{subject to } Az = y,$   (6.12)
in which $A$ is a $2 \times 3$ matrix ($N = 3$) whose null space is spanned by $(1, -1, -1)^T$, and $y$ is chosen so that the solution set of $Az = y$ is
$\left\{ (t,\ 2 - t,\ 1 - t)^T : t \in \mathbb{R} \right\}.$   (6.15)
To solve the minimization problem, we can use linear programming, to which the $\ell_1$-minimization is equivalent; see [7]. On the other hand, we can also obtain the solution directly: the function $f(t) := |t| + |2 - t| + |1 - t|$, that is, the $\ell_1$-norm of the solutions, achieves its minimum at $t = 1$. So the minimization problem has the unique solution $z = (1, 1, 0)^T$ with support $S = \{1, 2\}$. However, the non-zero null vector of $A$, $x := (1, -1, -1)^T$, does not satisfy $\|x_S\|_1 < \|x_{S^c}\|_1$.
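In the example, any $2 \times 3$ matrix $A$ with $\ker A = \mathrm{span}\{(1,-1,-1)^T\}$ and $y = A(0,2,1)^T$ gives exactly the solution set (6.15). The sketch below (ours) fixes one such choice of $A$ and solves (6.12) as a linear program with the standard splitting $z = z^+ - z^-$, confirming the unique minimizer $(1,1,0)^T$:

import numpy as np
from scipy.optimize import linprog

# One concrete matrix consistent with the example: its null space is span{(1, -1, -1)^T}.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
y = A @ np.array([0.0, 2.0, 1.0])          # solution set of Az = y is {(t, 2-t, 1-t)^T}

# l_1-minimization as a linear program: z = zp - zm with zp, zm >= 0, minimize sum(zp + zm).
n = A.shape[1]
res = linprog(c=np.ones(2 * n),
              A_eq=np.hstack([A, -A]),
              b_eq=y,
              bounds=[(0, None)] * (2 * n))
z = res.x[:n] - res.x[n:]
print(np.round(z, 6))                      # [1. 1. 0.], the minimizer at t = 1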

6.3 Real versus Complex Null Space Properties for $\ell_q$

In this section, we consider the open question whether, for any $0 < q \le 1$, the real null space property
$\|x_S\|_q < \|x_{S^c}\|_q$   (6.16)
of the proposition above, or the so-called stable null space property
$\|x_S\|_q \le \rho\, \|x_{S^c}\|_q \quad \text{for some } 0 < \rho < 1,$   (6.17)
for all $x \in \mathcal{N} \setminus \{0\}$ (in the following, as in [19], we write $\preceq$ to denote either the relation in (6.16) or the one in (6.17)), for the $\ell_q$-minimization (5.49), is equivalent to the complex version, in other words to
$\sum_{j \in S} \left( v_j^2 + w_j^2 \right)^{q/2} \preceq \sum_{j \in S^c} \left( v_j^2 + w_j^2 \right)^{q/2},$   (6.18)
that is,
$\|z_S\|_q \preceq \|z_{S^c}\|_q$   (6.19)
for $z = v + \sqrt{-1}\, w$, for all $(v, w) \in \mathcal{N}^2 \setminus \{(0,0)\}$. This question was raised by Foucart and Gribonval in [19].

Remark. In finite-dimensional vector spaces the null space property and the stable null space property are equivalent, because the intersection of $\mathcal{N}$ with the unit sphere is compact, so the function $x \mapsto \|x_S\|_q / \|x_{S^c}\|_q$ achieves its maximum on it, which is strictly less than 1. But in infinite-dimensional vector spaces, for instance Hilbert spaces, the null space property does not necessarily imply the stable null space property.

Theorem 6.3.2 (Comparison theorem). Let $S \subseteq \{1, 2, \dots, N\}$ be an index set with $|S| = s$. Given $0 < q \le 1$ and a matrix $B \in \mathbb{R}^{2 \times N}$ with columns $c_1, c_2, \dots, c_N \in \mathbb{R}^2$, if
$\|((x, y)B)_S\|_q \preceq \|((x, y)B)_{S^c}\|_q$   (6.20)
for all $(x, y) \in \mathbb{R}^2 \setminus \{0\}$, then
$\sum_{k \in S} \|c_k\|_2^q \preceq \sum_{k \in S^c} \|c_k\|_2^q.$   (6.21)

88 75 Proof. Let B =: b i,j 2 N, and without loss of generality we can assume S := {1, 2,, s}. We show 6.21 holds for s = 1 first and then for s 2. If s = 1, by the assumption, b 1,1 x + b 2,1 y = x, yb S p x, yb S c p = N b 1,j x + b 2,j y 6.22 j=2 for all x, y R 2 \ {0, 0}. Choosing x = b 1,1 b 2 1,1 +b 2 2,j and y = b 2,1 b 2 1,j +b 2 2,1 and applying the Cauchy-Schwarz ineuality, we have b 2 1,1 + b 2 2,1 N j=2 N j=2 1 b 2 1,1 +b 2 2,1 b 2 1,j + b2 2,j from 6.22, thus the claim 6.21 for s = 1 follows. For the case when s 2, we have s b 1,j x + b 2,j y j=1 N j=s+1 for all x, y R 2 \ {0} from the assumption. Let v j := S 1 is the unit circle, j = 1, 2,, N, then s j=1 b 2 1,j + b2 2,j vj, ξ N j=s+1 b 1,j b 1,1 + b 2,j b 2, b 1,j x + b 2,j y 6.24 b 1,j b 2 1,j +b 2 2,j, b 1,j b 2 1,j +b 2 2,j S 1 where b 2 1,j + b2 2,j vj, ξ 6.25 for all unit vector ξ S 1 particularly. Taking the integral of 6.25 on S 1, we have s j=1 b 2 1,j + b2 2,j S 1 v j, ξ dξ N j=s+1 b 2 1,j + b2 2,j S 1 v j, ξ dξ Note that S 1, ξ dξ is a rotation invariant function from the perspective of integral geometry cf. [1], [3] and [37]. That is, S 1 v j, ξ dξ is constant independent of j. In fact we have Lemma Given any integer r 2. For any > 0, v, ξ dξ = C S r 1 for all v S r 1, where C > 0 is a constant dependent only on p.

89 76 Proof. Let U be an orthogonal transformation of R r. Then for any v S r 1, the sphere of the unit ball in R r, we have U v, ξ = v, U 1 ξ 6.27 for all ξ S r 1. Also, we know that S r 1 = {U v : U O r}, 6.28 where O r denotes the set of all r r orthogonal matrices. By change of variables and using the fact that det U 1 = 1, we get S r 1 U v, ξ dξ = S r 1 v, U 1 ξ dξ = S r 1 v, U 1 ξ du 1 ξ 6.29 = S r 1 v, ξ dξ for all U Or. Thus we see that S r 1 v, ξ dξ C for some C > 0 and for all v S r 1. Therefore So the claim follows. s j=1 b 2 1,j + N b2 2,j b 2 1,j 2,j + b j=s+1 An immediate conseuence of the above theorem is the following theorem on the euivalence of real null space property and complex null space property. Theorem The real null space property u S u S c 6.31 for all u ker A \ {0} is euivalent to the complex null space property j S for all v, w ker A 2 \ {0, 0} and all 0 < 1. vj 2 + w2 j vj 2 + w2 j, 6.32 j S c

90 77 Proof. From 6.32 to 6.31, it is obvious. For the converse, letting we have B = v 1 v 2 v N w 1 w 2 w N, 6.33 x, yb S x, yb S c 6.34 by Then applying Theorem 6.3.2, we obtain that c k 2 c k k S c where c 1, c 2,, c N are the columns of B. That yields k S Remark By Remark B.0.16, we know that Corollary Theorem also holds for > 1. The application of the result in Theorem in compressed sensing is that we can recover a sparse vector in complex space via solving two real l -minimization problems instead of a complex l -minimization one, and an algorithm for the real l -minimization problem 5.49 is given in [20]. 6.4 On Recovery from Multiple Measurements We also want to consider another open problem that Foucart and Gribonval proposed in [19], which is whether the null space property 6.31 is euivalent to the so called mixed l 1,2 null space property j S u 21,j + + u2n,j j Sc u 2 1,j + + u2 n,j, 6.36 for the joint recovery from multiple measurements via minimize N i=1 z 2 1,j + + z2 n,j subject to Az 1 = Ax 1,, Az n = Ax n 6.37

see [8], in which $x^1, \dots, x^n$ is an $n$-tuple of vectors in $\mathbb{R}^N$. Based on the answer to the problem on the complex null space above, we are able to solve the problem concerning $n$-tuples as well. More generally, we provide an affirmative answer for the mixed $\ell_{q,2}$ null space property
$\sum_{j \in S} \left( u_{1,j}^2 + \dots + u_{n,j}^2 \right)^{q/2} \preceq \sum_{j \in S^c} \left( u_{1,j}^2 + \dots + u_{n,j}^2 \right)^{q/2},$   (6.38)
for the joint recovery from multiple measurements via
$\text{minimize}\ \sum_{j=1}^{N} \left( z_{1,j}^2 + \dots + z_{n,j}^2 \right)^{q/2} \quad \text{subject to } Az^1 = Ax^1, \dots, Az^n = Ax^n$   (6.39)
(see [8]) for all $0 < q \le 1$.

Theorem. The null space property
$\|u_S\|_q \preceq \|u_{S^c}\|_q$   (6.40)
for all $u \in \ker A \setminus \{0\}$ is equivalent to the mixed $\ell_{q,2}$ null space property (6.38) for all $n \ge 3$ and all $0 < q \le 1$.

Proof. It is trivial to deduce the null space property (6.40) from the mixed $\ell_{q,2}$ null space property (6.38). For the converse, we can modify the proof of Theorem 6.3.2 to obtain a generalized comparison theorem for any $B \in \mathbb{R}^{n \times N}$, $n \ge 3$, and $0 < q \le 1$. Specifically, we will have
$\sum_{j \in S} \left( b_{1,j}^2 + \dots + b_{n,j}^2 \right)^{q/2} |\langle v_j, \xi \rangle|^q \preceq \sum_{j \in S^c} \left( b_{1,j}^2 + \dots + b_{n,j}^2 \right)^{q/2} |\langle v_j, \xi \rangle|^q$   (6.41)
for $v_j := (b_{1,j}, \dots, b_{n,j}) / (b_{1,j}^2 + \dots + b_{n,j}^2)^{1/2}$ and all vectors $\xi$ in the unit $(n-1)$-sphere $S^{n-1}$. Analogously to the case $n = 2$, taking the integral of (6.41) over $S^{n-1}$, we have
$\sum_{j \in S} \left( b_{1,j}^2 + \dots + b_{n,j}^2 \right)^{q/2} \int_{S^{n-1}} |\langle v_j, \xi \rangle|^q \, d\xi \preceq \sum_{j \in S^c} \left( b_{1,j}^2 + \dots + b_{n,j}^2 \right)^{q/2} \int_{S^{n-1}} |\langle v_j, \xi \rangle|^q \, d\xi.$   (6.42)

Using the fact that $\int_{S^{n-1}} |\langle v, \xi \rangle|^q \, d\xi$ is a rotation-invariant function of $v$ from the perspective of integral geometry (see for instance [1], [3] and [37]), we get
$\sum_{j \in S} \left( b_{1,j}^2 + \dots + b_{n,j}^2 \right)^{q/2} \preceq \sum_{j \in S^c} \left( b_{1,j}^2 + \dots + b_{n,j}^2 \right)^{q/2}.$   (6.43)

Joint-sparse recovery from multiple measurements can also be achieved by mixed $\ell_{q,2}$-minimization for $0 < q \le 1$, for which one can see [20] and [8]. The above theorem may allow us to consider $\ell_q$-minimization instead, which simplifies the computation required for the mixed $\ell_{q,2}$-minimization. Furthermore, multiple measurements have been used in many fields of technology, for instance neuromagnetic imaging [15] and communication channels (see e.g. [14]), as pointed out in [8], so our results may have real applications in these fields.

6.5 The Null Space Property for Recovery from Multiple Measurements

In this section, we study the MMV non-convex optimization problem
$\text{minimize}\ \|X\|_{q,p} \quad \text{subject to } AX = B \quad \text{with } 0 < q \le 1.$   (6.44)
In particular, when $p = 2$, (6.44) becomes the mixed $\ell_{q,2}$ problem (6.39). On the null space property of the matrix $A$ for joint recovery from multiple measurements via (6.39), in which $p = 2$ and $0 < q \le 1$, we have:

Theorem 6.5.1. Let $A$ be an $m \times N$ matrix and $S \subseteq \{1, 2, \dots, N\}$ be an index set. Then for any $0 < q \le 1$, all $X_0 \in \mathbb{R}^{N \times r}$ with the support of the rows of $X_0$ contained in $S$ can be uniquely recovered from $AX = B$ using (6.44) for $p = 2$ if and only if for all $Z$ with columns $Z_k \in \ker A \setminus \{0\}$, $k = 1, 2, \dots, r$,
$\|Z_S\|_{q,2} < \|Z_{S^c}\|_{q,2},$   (6.45)
in which
$\|Z_S\|_{q,2} := \Big( \sum_{j \in S} \|Z^j\|_2^q \Big)^{1/q} \quad \text{and} \quad \|Z_{S^c}\|_{q,2} := \Big( \sum_{j \in S^c} \|Z^j\|_2^q \Big)^{1/q}.$   (6.46)

Proof. Assume that (6.45) holds for all $Z$ with columns $Z_k \in \ker A \setminus \{0\}$, and that $X \in \mathbb{R}^{N \times r}$, with the support of its rows contained in $S$, is a solution to $AX = B$. Then for any such $Z$ and $0 < q \le 1$,
$\sum_{j \in S} \|X^j\|_2^q \le \sum_{j \in S} \|Z^j\|_2^q + \sum_{j \in S} \|(X+Z)^j\|_2^q.$   (6.47)

From the assumption, $\sum_{j \in S} \|Z^j\|_2^q < \sum_{j \in S^c} \|Z^j\|_2^q$, we thus have
$\sum_{j \in S} \|X^j\|_2^q < \sum_{j \in S^c} \|Z^j\|_2^q + \sum_{j \in S} \|(X+Z)^j\|_2^q.$   (6.48)
But the rows of $X$ are supported in $S$, so $Z^j = (X+Z)^j$ for $j \in S^c$, and hence
$\sum_{j=1}^{N} \|X^j\|_2^q = \sum_{j \in S} \|X^j\|_2^q < \sum_{j \in S^c} \|(X+Z)^j\|_2^q + \sum_{j \in S} \|(X+Z)^j\|_2^q = \sum_{j=1}^{N} \|(X+Z)^j\|_2^q,$   (6.49)
and so $X \in \mathbb{R}^{N \times r}$ is the unique solution to the minimization problem (6.44) for $p = 2$ and $0 < q \le 1$.

For the converse, assume that there is some $Z$ with columns $Z_k \in \ker A \setminus \{0\}$ such that $\|Z_S\|_{q,2} \ge \|Z_{S^c}\|_{q,2}$. We can choose $X \in \mathbb{R}^{N \times r}$ such that the rows of $X$ restricted to $S$ equal those of $Z$, and the remaining rows of $X$ are zeroes. Therefore
$AX = AX - AZ = A(X - Z)$
and
$\|X\|_{q,2} = \|X_S\|_{q,2} = \|Z_S\|_{q,2} \ge \|Z_{S^c}\|_{q,2} = \|X - Z\|_{q,2},$
which contradicts the uniqueness of the recovery.

Corollary. Let $A$ be an $m \times N$ matrix and $S \subseteq \{1, 2, \dots, N\}$ be an index set. Then for any $p \ge 1$ and $0 < q \le 1$, all $X_0 \in \mathbb{R}^{N \times r}$ with the support of the rows of $X_0$ contained in $S$ can be uniquely recovered from $AX = B$ using (6.44) if and only if for all $Z$ with columns $Z_k \in \ker A \setminus \{0\}$, $k = 1, 2, \dots, r$,
$\|Z_S\|_{q,p} < \|Z_{S^c}\|_{q,p}.$   (6.50)

Proof. One can simply change the $\ell_2$-norm to the $\ell_p$-norm, $p \ge 1$, in the proof of Theorem 6.5.1, and that gives a proof of the claim.

Furthermore, we have:

Theorem. Let $A$ be an $m \times N$ matrix and $S \subseteq \{1, 2, \dots, N\}$ be an index set. Then for any $0 < q \le 1$, all $X_0 \in \mathbb{R}^{N \times r}$ with the support of the rows of $X_0$ contained in $S$ can be uniquely recovered by solving the MMV non-convex optimization problem (6.44) for $p = 2$ if and only if
$\|u_S\|_q < \|u_{S^c}\|_q$   (6.51)
for all $u \in \ker A \setminus \{0\}$.

Proof. This is a consequence of combining the equivalence theorem for the mixed $\ell_{q,2}$ null space property in Section 6.4 with Theorem 6.5.1.
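The equivalence results of this chapter ultimately rest on the integral-geometric fact used for (6.42) and (6.43), and for the two-row case in Section 6.3: the integral of $|\langle v, \xi \rangle|^q$ over the unit sphere does not depend on the unit vector $v$. A quick Monte Carlo confirmation (ours; averages over the sphere are proportional to the surface integrals, so constancy of one implies constancy of the other):

import numpy as np

def sphere_average(v, q, samples=200000, seed=0):
    # Monte Carlo estimate of the average of |<v, xi>|^q over the unit sphere in R^n,
    # using the fact that normalized Gaussian vectors are uniformly distributed on it.
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal((samples, v.size))
    xi /= np.linalg.norm(xi, axis=1, keepdims=True)
    return np.mean(np.abs(xi @ v) ** q)

if __name__ == "__main__":
    q, n = 0.5, 4
    rng = np.random.default_rng(1)
    for _ in range(3):
        v = rng.standard_normal(n)
        v /= np.linalg.norm(v)
        print(sphere_average(v, q))   # the three estimates agree up to Monte Carlo error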

95 Bibliography [1] Alesker, S., Continuous rotation invariant valuations on convex sets, Annals of Mathematics, , [3] Alesker, S. and J. Bernstein, Range characterization of the cosine transform on higher Grassmannians, Advances in Mathematics,Volume 184, Issue 2, 1 June 2004, Pages [2] Antoulas, A. C., Approximation of large-scale dynamical systems, Series: Advances in Design and Control No. 6, Society for Industrial and Applied Mathematic, [4] Bai, Z. D., Circular law, Ann. Probab. Volume 25, Number , [5] Baraniuk, R., M. Davenport, and R. DeVore, M. Wakin, A simple proof of the restricted isometry property for random matrices, Constructive Approximation, 2008, [8] Berg, E. and M. P. Friedlander, Joint-sparse recovery from multiple measurements, arxiv: v1, [7] Bloomfield, P., W. L. Steiger, Least absolute deviations: theory, applications, and algorithms, Volume 6 of Progress in probability and statistics, Birkhäuser, [9] Buldygin, V. V., Sub-Gaussian random variables, Ukrainian Mathematical Journal, Volume 32, Number 6, November, 1980, [10] Buldygin, V. V. and I. U. V. Kozachenko, Metric characterization of random variables and random processes, Translations of mathematical monographs, Vol. 188, Published by AMS Bookstore,

96 83 [6] Candes, E. and T. Tao, Decoding by linear programming, IEEE Transactions on Information Theory, Vol. 51, No. 12. December 2005, pp [11] Chartrand, R. and V. Staneva, Restricted isometry properties and nonconvex compressive sensing, Inverse Problems, Volume 24, Number 3, 2008, doi: / /24/3/ [12] Chen, J. and X. Huo, Theoretical results on sparse represenations of multiplemeasurement vectors, IEEE Transactions on Signal Processing, 2006, : [13] Cohen, A. and W. Dahmen, R. DeVore, Compressed sensing and best k-term approximation, J. Amer. Math. Soc , [14] Cotter, S. F. and B. D. Rao, Sparse channel estimation via matching pursuit with application to eualization, IEEE Transactions on Communications, 503, March 2002, [15] Cotter, S. F., B. D. Rao, K. Engang, and K. Kreutz-Delgado, Sparse solutions to linear inverse problems with multiple measurement vectors, IEEE Transactions on Signal Processing, 53: , July [16] Donoho, D. L. and Elad, M., Optimally sparse representation in general nonorthogonal dictionaries via l1 minimization, Proceedings of the National Academy of Sciences, 2003, doi: /pnas [17] Edelman, A., Eigenvalues and condition numbers of random matrices, SIAM J. Matrix Anal. Appl , [18] Fisher, A., C. Dickson, and W. Boynge, The mathematical theory of probabilities and its application to freuency curves and statistical methods, New York, Macmillan Co., 1922.

97 84 [19] Foucart, S. and R. Gribonval, Real vs. complex null space properties for sparse vector recovery, Submitted, [20] Foucart, S. and M. J. Lai, Sparsest Solutions of Underdetermined Linear Systems via l minimization for 0 < 1, Applied and Computational Harmonic Analysis, Volume 26, Issue 3, May 2009, Pages [21] Foucart, S. and M. J. Lai, Sparse Recovery with Pre-Gaussian Random Matrices, accepted for publication by Stud. Math., [22] Gorodnitskya, I. F., J. S. Georgeb and B. D. Raoa, Neuromagnetic source imaging with FOCUSS: a recursive weighted minimum norm algorithm, Electroencephalography and Clinical Neurophysiology Volume 95, Issue 4, October 1995, Pages [23] Garling, D. J. H., Ineualities : a journey into linear analysis, Cambridge University Press, [24] Gribonval, R. and M. Nielsen, Highly sparse representations from dictionaries are uniue and independent of the sparseness measure, Appl. Comput. Harmon. Anal [25] Golub, G. H. and C. F. V. Loan, Matrix computations Johns Hopkins Studies in Mathematical Sciences, The Johns Hopkins University Press, [26] Geman, S., A limit theorem for the norm of random matrices, Ann. Probab , [27] Girko, V. L., "Circular Law." Theory Probab. Appl. 29, , [28] Gribonval, R. and M. Nielsen, Sparse decompositions in unions of bases, IEEE Trans. Inform. Theory

98 85 [29] Gribonval, R. and M. Nielsen, Highly sparse representations from dictionaries are uniue and independent of the sparseness measurestar, Applied and Computational Harmonic Analysis Volume 22, Issue 3, May 2007, Pages [30] Lai, M. J., On sparse solution of underdetermined linear systems, Journal of Concrete and Applicable Mathematics, 82010, [31] Lai, M. J. and Y. Liu, The Null Space Property for Sparse Recovery from Multiple Measurement Vectors, submitted to Applied and Computational Harmonic Analysis, [32] Lai, M. J. and Y. Liu, The Probabilistic Estimates on the Largest and Smallest - Singular Values of Pre-Gaussian Random Matrices, submitted to Advances in Mathematics, [33] Lai, M. J. and J. Wang, An unconstrained l -minimization for sparse solution of under determined linear systems, submitted, [34] Latala, R., Some estimates of norms of random matrices, Proc. Amer. Math. Soc , [35] Li, X. and C. P. Chen, Ineualities for the Gamma Function, vol. 8, iss. 1, Journal of Ineualities in Pure and Applied Mathematics, [36] Li, W. V., On the lower tail of Gaussian measures on $l_p$, Progress in Probability, Vol. 30, , [37] Lonke, Y., Derivatives of the L p -cosine transform, Advances in Mathematics, Volume 176, Issue 2, 25 June 2003, Pages [38] Malioutov, D., M. Cetin, and A. S. Willsky, A sparse signal reconstruction perspective for source localization with sensor arrays. IEEE Transactions on Signal Processing, 538: , August 2005.

99 86 [39] Mishali, M. and Y. C. Eldar, The ReMBo Algorithm: Accelerated Recovery of Jointly Sparse Vectors, 16th European Signal Processing Conference EUSIPCO 2008, Lausanne, Switzerland, August 25-29, [40] Mishali M. and Y. C. Eldar, Reduce and boost: Recovering arbitrary sets of jointly sparse vectors. IEEE Transactions on Signal Processing, 5610: , October [41] Pisier, G., The volume of convex bodies and Banach space geometry, Cambridge University Press, [42] Robbins, H. A., Remark of Stirling s Formula, Amer. Math. Monthly 62, 26-29, [43] Rudelson, M. and R. Vershynin, The Littlewood Offord problem and invertibility of random matrices, Advances in Mathematics, Volume 218, Issue 2, 1 June 2008, Pages [44] Rudelson, M. and R. Vershynin, The least singular value of a random suare matrix is on 1/2, Comptes rendus-mathématiue, Volume 346, Issues 15-16, August 2008, Pages [45] Rudelson, M. and R. Vershynin, Smallest singular value of a random rectangular matrix, Communications on Pure and Applied Mathematics, 2009 Communications on Pure and Applied Mathematics, Volume 62 Issue 12, Pages , [46] Rudelson, M. and R. Vershynin, Non-asymptotic theory of random matrices: extreme singular values, Proceedings of the International Congress of Mathematicians, Hyderabad, India, [47] Seginer, Y., The expected norm of random matrices, Combin. Probab. Comput , [48] Shen, J., On the singular values of Gaussian random matrices, Linear Algebra and its Applications, Volume 326, Issues 1-3, 15 March 2001, Pages 1-14.

100 87 [49] Siahaan, H. B., S. Weiland, and A. A. Stoorvogel, Optimal approximation of linear operators: a singular value decomposition approach, In D.S. Gilliam, J. Rosenthal Eds., Proceedings 15th International Symposium on Mathematical Theory of Networks and Systems MNTS 02, Notre Dame, Session FM3. pp , Univ. Notre Dame, [50] Soshnikov, A., A note on universality of the distribution of the largest eigenvalues in certain sample covariance matrices, J. Statist. Phys , no. 5-6, [51] Soshnikov, A. and Y. V. Fyodorov, On the largest singular values of random matrices with independent Cauchy entries, J. Math. Phys. 46, ; doi: / pages. [52] Spielman, D. and S. H. Teng, Smoothed analysis of algorithms. Proceedings of the International Congress of Mathematicians, Vol. I Beijing, 2002, , Higher Ed. Press, Beijing, [54] Stroock, D., Probability theory. An analytic view, Cambridge University Press, Cambridge, [53] Szarek, S. J., Spaces with large distance to l n and random matrices, American Journal of Mathematics , no. 6, [55] Tao, T. and V. Vu, On the singularity probability of random Bernoulli matrices, J. Amer. Math. Soc. 20, [56] Tao, T. and V. Vu, Random Matrices: The circular Law, Communications in Contemporary Mathematics, , [57] Tao, T. and V. Vu, On the permanent of random Bernoulli matrices, Advances in Mathematics, Volume 220, Issue 3, 15 February 2009, Pages

[58] Tao, T. and V. Vu, Smooth analysis of the condition number and the least singular value, Proceedings of the 12th International Workshop and 13th International Workshop on Approximation, 2009.
[59] Tao, T. and V. Vu, Random matrices: the distribution of the smallest singular values, Geometric and Functional Analysis, Volume 20, Number 1, June 2010.
[60] Tao, T., V. Vu, and M. Krishnapur, Random matrices: Universality of ESDs and the circular law, preprint, arXiv [math.PR], to appear in the Annals of Probability.
[61] Tropp, J. A., Algorithms for simultaneous sparse approximation. Part II: Convex relaxation, Signal Processing, 86.
[62] Wigner, E., On the Distribution of the Roots of Certain Symmetric Matrices, Ann. of Math. 67 (1958).
[63] Wojtaszczyk, P., Banach Spaces for Analysts, Cambridge University Press.
[64] Ye, J. C., S. Tak, Y. Han, and H. W. Park, Projection reconstruction MR imaging using FOCUSS, Magn. Reson. Med., 57(4), April 2007.
[65] Yin, Y. Q., Z. D. Bai, and P. R. Krishnaiah, On the limit of the largest eigenvalue of the large-dimensional sample covariance matrix, Probab. Theory Related Fields.

Appendix A
Examples on the Duality

Example A.0.4. $A=\begin{pmatrix}1&0&1\\0&1&1\end{pmatrix}$, $m=2$, $n=3$, and $p\ge1$.

Let $y=\begin{pmatrix}y_1\\y_2\end{pmatrix}$; then $A^Ty=\begin{pmatrix}y_1\\y_2\\y_1+y_2\end{pmatrix}$. Therefore,
$$\|A^Ty\|_q\ge\|y\|_q,\tag{A.1}$$
and equality in (A.1) holds when $y_1=-y_2$. So we have
$$s_2^q(A^T)=\inf_{V\subset\mathbb{R}^2,\,\dim V=1}\ \sup_{y\in V,\,\|y\|_q=1}\|A^Ty\|_q=\inf_{y\in\mathbb{R}^2,\,\|y\|_q=1}\|A^Ty\|_q=1,\tag{A.2}$$
by (A.1) and its equality case.

On the other hand, let $x=\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}$; then $Ax=\begin{pmatrix}x_1+x_3\\x_2+x_3\end{pmatrix}$. For any $V\subset\mathbb{R}^3$ with $\dim V=2$, there is a vector $\begin{pmatrix}x_1\\x_2\\0\end{pmatrix}$ in $V$ whose third coordinate is zero and $\left\|\begin{pmatrix}x_1\\x_2\end{pmatrix}\right\|_p=1$.

Therefore,
$$\|A|_V\|_p\ge\left\|A\begin{pmatrix}x_1\\x_2\\0\end{pmatrix}\right\|_p=\left\|\begin{pmatrix}x_1\\x_2\end{pmatrix}\right\|_p=1.\tag{A.3}$$
Choose the particular $V_0:=\mathrm{span}\left\{\begin{pmatrix}1\\0\\0\end{pmatrix},\begin{pmatrix}0\\1\\0\end{pmatrix}\right\}$; then $\|A|_{V_0}\|_p=1$. So
$$s_2^p(A)=\inf_{V\subset\mathbb{R}^3,\,\dim V=2}\ \sup_{x\in V,\,\|x\|_p=1}\|Ax\|_p=\inf_{V\subset\mathbb{R}^3,\,\dim V=2}\|A|_V\|_p=1.\tag{A.4}$$
Thus, we conclude that for any $p\ge1$,
$$s^p_{\min(m,n)}(A)=s^q_{\min(m,n)}(A^T),\tag{A.5}$$
where $\frac1p+\frac1q=1$, in this example.

Example A.0.5. $A=\begin{pmatrix}1&1&1\\0&1&-2\end{pmatrix}$, $m=2$, $n=3$, and $p=1$.

Let $y=\begin{pmatrix}y_1\\y_2\end{pmatrix}$; then $A^Ty=\begin{pmatrix}y_1\\y_1+y_2\\y_1-2y_2\end{pmatrix}$. Therefore,
$$s_2^\infty(A^T)=\inf_{V\subset\mathbb{R}^2,\,\dim V=1}\ \sup_{y\in V,\,\|y\|_\infty=1}\|A^Ty\|_\infty=\inf_{y\in\mathbb{R}^2,\,\|y\|_\infty=1}\|A^Ty\|_\infty=\inf_{(y_1,y_2)\in\mathbb{R}^2,\,\max(|y_1|,|y_2|)=1}\max\big(|y_1|,|y_1+y_2|,|y_1-2y_2|\big).\tag{A.6}$$
Now discuss it in two cases:

1. If $|y_1|=1$ and $|y_2|\le1$, then
$$\max\big(|y_1|,|y_1+y_2|,|y_1-2y_2|\big)\ge|y_1|=1.\tag{A.7}$$

2. If $|y_2|=1$ and $|y_1|\le1$, then
$$\max\big(|y_1|,|y_1+y_2|,|y_1-2y_2|\big)\ge|y_1-2y_2|\ge2|y_2|-|y_1|\ge2-1=1.\tag{A.8}$$
Thus $\max\big(|y_1|,|y_1+y_2|,|y_1-2y_2|\big)\ge1$ for all $(y_1,y_2)\in\mathbb{R}^2$ with $\max(|y_1|,|y_2|)=1$. Hence
$$\inf_{(y_1,y_2)\in\mathbb{R}^2,\,\max(|y_1|,|y_2|)=1}\max\big(|y_1|,|y_1+y_2|,|y_1-2y_2|\big)\ge1.\tag{A.9}$$
Particularly, at $(y_1,y_2)=(1,0)$ we have
$$\big[\max\big(|y_1|,|y_1+y_2|,|y_1-2y_2|\big)\big]_{y_1=1,\,y_2=0}=1.\tag{A.10}$$
Thus
$$\inf_{(y_1,y_2)\in\mathbb{R}^2,\,\max(|y_1|,|y_2|)=1}\max\big(|y_1|,|y_1+y_2|,|y_1-2y_2|\big)=1,$$
in other words, $s_2^\infty(A^T)=1$.

On the other hand, let $x=\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}$; then $Ax=\begin{pmatrix}x_1+x_2+x_3\\x_2-2x_3\end{pmatrix}$. Therefore,
$$s_2^1(A)=\inf_{V\subset\mathbb{R}^3,\,\dim V=2}\ \sup_{x\in V,\,\|x\|_1=1}\|Ax\|_1=\inf_{V\subset\mathbb{R}^3,\,\dim V=2}\ \sup_{x\in V,\,|x_1|+|x_2|+|x_3|=1}\big(|x_1+x_2+x_3|+|x_2-2x_3|\big)=\inf_{V\subset\mathbb{R}^3,\,\dim V=2}\ \sup_{x\in V\setminus\{0\}}\frac{|x_1+x_2+x_3|+|x_2-2x_3|}{|x_1|+|x_2|+|x_3|}.\tag{A.11}$$
Let $F(V):=\sup_{x\in V\setminus\{0\}}\frac{|x_1+x_2+x_3|+|x_2-2x_3|}{|x_1|+|x_2|+|x_3|}$; then $s_2^1(A)=\inf_{V\subset\mathbb{R}^3,\,\dim V=2}F(V)$. For the particular $V_0=\left\{\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}:x_2=2x_3\right\}$, we know
$$\sup_{x\in V_0\setminus\{0\}}\frac{|x_1+x_2+x_3|+|x_2-2x_3|}{|x_1|+|x_2|+|x_3|}=\sup_{x\in V_0\setminus\{0\}}\frac{|x_1+x_2+x_3|}{|x_1|+|x_2|+|x_3|}=1,\tag{A.12}$$
and it follows that $s_2^1(A)\le1$. For a generic $V$, which is a plane passing through the origin in $\mathbb{R}^3$, we can assume
$$V=\left\{\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}\in\mathbb{R}^3:ax_1+bx_2+cx_3=0\right\}\tag{A.13}$$

where $a$, $b$ and $c$ are not all zeroes. Now we discuss (A.11) in three cases:

1. If $c\ne0$, then $x_3=-\frac ac x_1-\frac bc x_2=\alpha x_1+\beta x_2$ for $\alpha:=-\frac ac$ and $\beta:=-\frac bc$. Thus we have
$$F(V)=\sup_{(x_1,x_2)\in\mathbb{R}^2\setminus\{(0,0)\}}\frac{|(1+\alpha)x_1+(1+\beta)x_2|+|(1-2\beta)x_2-2\alpha x_1|}{|x_1|+|x_2|+|\alpha x_1+\beta x_2|}\ge\left[\frac{|(1+\alpha)x_1+(1+\beta)x_2|+|(1-2\beta)x_2-2\alpha x_1|}{|x_1|+|x_2|+|\alpha x_1+\beta x_2|}\right]_{x_1=1,\,x_2=0}=\frac{|1+\alpha|+2|\alpha|}{1+|\alpha|}.\tag{A.14}$$
It is not hard to see that $\frac{|1+\alpha|+2|\alpha|}{1+|\alpha|}\ge1$ for all $\alpha\in\mathbb{R}$. So $F(V)\ge1$ for all $V$ in which $c\ne0$.

2. If $b\ne0$, then $x_2=-\frac ab x_1-\frac cb x_3=\alpha x_1+\gamma x_3$ for $\alpha:=-\frac ab$ and $\gamma:=-\frac cb$. Therefore,
$$F(V)=\sup_{(x_1,x_3)\in\mathbb{R}^2\setminus\{(0,0)\}}\frac{|(1+\alpha)x_1+(1+\gamma)x_3|+|\alpha x_1+(\gamma-2)x_3|}{|x_1|+|\alpha x_1+\gamma x_3|+|x_3|}\ge\left[\frac{|(1+\alpha)x_1+(1+\gamma)x_3|+|\alpha x_1+(\gamma-2)x_3|}{|x_1|+|\alpha x_1+\gamma x_3|+|x_3|}\right]_{x_1=0,\,x_3=1}=\frac{|1+\gamma|+|\gamma-2|}{1+|\gamma|}.\tag{A.15}$$
It is not hard to see that $\frac{|1+\gamma|+|\gamma-2|}{1+|\gamma|}\ge1$ for all $\gamma\in\mathbb{R}$. So $F(V)\ge1$ for all $V$ in which $b\ne0$.

3. If $a\ne0$, then $x_1=-\frac ba x_2-\frac ca x_3=\beta x_2+\gamma x_3$ for $\beta:=-\frac ba$ and $\gamma:=-\frac ca$. Therefore,
$$F(V)=\sup_{(x_2,x_3)\in\mathbb{R}^2\setminus\{(0,0)\}}\frac{|(1+\beta)x_2+(1+\gamma)x_3|+|x_2-2x_3|}{|\beta x_2+\gamma x_3|+|x_2|+|x_3|}\ge\left[\frac{|(1+\beta)x_2+(1+\gamma)x_3|+|x_2-2x_3|}{|\beta x_2+\gamma x_3|+|x_2|+|x_3|}\right]_{x_2=0,\,x_3=1}=\frac{|1+\gamma|+2}{|\gamma|+1}.\tag{A.16}$$
It is not hard to see that $\frac{|1+\gamma|+2}{|\gamma|+1}\ge1$ for all $\gamma\in\mathbb{R}$. So $F(V)\ge1$ for all $V$ in which $a\ne0$.

So we have shown that
$$\sup_{x\in V\setminus\{0\}}\frac{|x_1+x_2+x_3|+|x_2-2x_3|}{|x_1|+|x_2|+|x_3|}\ge1\tag{A.17}$$
for all $V\subset\mathbb{R}^3$ with $\dim V=2$. Hence $s_2^1(A)=1$ in (A.11).

Finally, we conclude that
$$s^1_{\min(m,n)}(A)=s^\infty_{\min(m,n)}(A^T)\tag{A.18}$$
in this example.

Remark A.0.6. For $p=2$, using the singular value decomposition, we get $s^2_{\min(m,n)}(A)=s^2_{\min(m,n)}(A^T)=\sqrt{4-\sqrt2}$ for this example.

Let us give another example.

Example A.0.7. $A=\begin{pmatrix}1&1&0\\0&1&0\end{pmatrix}$, $m=2$, $n=3$, and $p=1$.

Let $y=\begin{pmatrix}y_1\\y_2\end{pmatrix}$; then $A^Ty=\begin{pmatrix}y_1\\y_1+y_2\\0\end{pmatrix}$. Therefore,
$$s_2^\infty(A^T)=\inf_{(y_1,y_2)\in\mathbb{R}^2,\,\max(|y_1|,|y_2|)=1}\max\big(|y_1|,|y_1+y_2|,0\big)=\inf_{(y_1,y_2)\in\mathbb{R}^2,\,\max(|y_1|,|y_2|)=1}\max\big(|y_1|,|y_1+y_2|\big).\tag{A.19}$$
For any $(y_1,y_2)\in\mathbb{R}^2$ with $\max(|y_1|,|y_2|)=1$, $|y_1+y_2|$ is the distance between the origin and the $y_1$-intercept of the line passing through $(y_1,y_2)$ with slope $-1$, and obviously $|y_1|$ is the horizontal ($y_1$-directional) distance between the origin and $(y_1,y_2)$; see Figure A.1. Minimizing the maximum of these two distances, we can see from Figure A.1 that $\max(|y_1|,|y_1+y_2|)$ achieves its minimum at $(y_1,y_2)=(-\frac12,1)$ on the boundary of the square $\{(y_1,y_2)\in\mathbb{R}^2:\max(|y_1|,|y_2|)=1\}$. Thus we have
$$s_2^\infty(A^T)=\big[\max\big(|y_1|,|y_1+y_2|\big)\big]_{y_1=-\frac12,\,y_2=1}=\frac12.\tag{A.20}$$

Figure A.1: Minimization of the maximum of $|y_1|$ and $|y_1+y_2|$

On the other hand, let $x=\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}$; then $Ax=\begin{pmatrix}x_1+x_2\\x_2\end{pmatrix}$. Therefore,
$$s_2^1(A)=\inf_{V\subset\mathbb{R}^3,\,\dim V=2}\ \sup_{x\in V,\,\|x\|_1=1}\|Ax\|_1=\inf_{V\subset\mathbb{R}^3,\,\dim V=2}\ \sup_{x\in V,\,|x_1|+|x_2|+|x_3|=1}\big(|x_1+x_2|+|x_2|\big)=\inf_{V\subset\mathbb{R}^3,\,\dim V=2}\ \sup_{x\in V\setminus\{0\}}\frac{|x_1+x_2|+|x_2|}{|x_1|+|x_2|+|x_3|}.\tag{A.21}$$
Let $F(V):=\sup_{x\in V\setminus\{0\}}\frac{|x_1+x_2|+|x_2|}{|x_1|+|x_2|+|x_3|}$; then $s_2^1(A)=\inf_{V\subset\mathbb{R}^3,\,\dim V=2}F(V)$. For a generic $V\subset\mathbb{R}^3$ with $\dim V=2$, we assume
$$V=\left\{\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}\in\mathbb{R}^3:ax_1+bx_2+cx_3=0\right\}\tag{A.22}$$
where $a$, $b$ and $c$ are not all zeroes. Now we discuss (A.21) in three cases:

1. If $c\ne0$, then $x_3=-\frac ac x_1-\frac bc x_2=\alpha x_1+\beta x_2$ for $\alpha:=-\frac ac$ and $\beta:=-\frac bc$. Thus, when $\alpha$ and $\beta$ are not both zero, we have
$$F(V)=\sup_{(x_1,x_2)\in\mathbb{R}^2\setminus\{(0,0)\}}\frac{|x_1+x_2|+|x_2|}{|x_1|+|x_2|+|\alpha x_1+\beta x_2|}\ge\left[\frac{|x_1+x_2|+|x_2|}{|x_1|+|x_2|+|\alpha x_1+\beta x_2|}\right]_{x_1=\beta,\,x_2=-\alpha}=\frac{|\beta-\alpha|+|\alpha|}{|\alpha|+|\beta|}=\frac{2\big(|\beta-\alpha|+|\alpha|\big)}{2\big(|\alpha|+|\beta|\big)}\ge\frac{|\beta|+|\alpha|}{2\big(|\alpha|+|\beta|\big)}=\frac12,\tag{A.23}$$
and when $\alpha$ and $\beta$ are both zero, obviously
$$F(V)=\sup_{(x_1,x_2)\in\mathbb{R}^2\setminus\{(0,0)\}}\frac{|x_1+x_2|+|x_2|}{|x_1|+|x_2|}\ge\left[\frac{|x_1+x_2|+|x_2|}{|x_1|+|x_2|}\right]_{x_2=0}=1.\tag{A.24}$$
So $F(V)\ge\frac12$ for all $V$ in which $c\ne0$.

2. If $b\ne0$, then $x_2=-\frac ab x_1-\frac cb x_3=\alpha x_1+\gamma x_3$ for $\alpha:=-\frac ab$ and $\gamma:=-\frac cb$, and then $F(V)$ can be written as
$$F(V)=\sup_{(x_1,x_3)\in\mathbb{R}^2\setminus\{(0,0)\}}\frac{|(1+\alpha)x_1+\gamma x_3|+|\alpha x_1+\gamma x_3|}{|x_1|+|\alpha x_1+\gamma x_3|+|x_3|}.\tag{A.25}$$
By the triangle inequality,
$$|x_1|+|(1+\alpha)x_1+\gamma x_3|+|\alpha x_1+\gamma x_3|\le2\big(|(1+\alpha)x_1+\gamma x_3|+|\alpha x_1+\gamma x_3|\big),\tag{A.26}$$
and therefore,
$$F(V)\ge\sup_{(x_1,x_3)\in\mathbb{R}^2\setminus\{(0,0)\}}\frac{|x_1|+|(1+\alpha)x_1+\gamma x_3|+|\alpha x_1+\gamma x_3|}{2\big(|x_1|+|\alpha x_1+\gamma x_3|+|x_3|\big)}.\tag{A.27}$$
When $x_1=1+\gamma$ and $x_3=-(1+\alpha)$, we have
$$(1+\alpha)x_1+\gamma x_3=(1+\alpha)(1+\gamma)-\gamma(1+\alpha)=1+\alpha,\qquad\text{so that}\qquad|(1+\alpha)x_1+\gamma x_3|=|1+\alpha|=|x_3|.\tag{A.28}$$

Finally, from (A.27) we have
$$F(V)\ge\frac12\left[\frac{|x_1|+|(1+\alpha)x_1+\gamma x_3|+|\alpha x_1+\gamma x_3|}{|x_1|+|\alpha x_1+\gamma x_3|+|x_3|}\right]_{x_1=1+\gamma,\,x_3=-(1+\alpha)}=\frac12\tag{A.29}$$
for all $V$ in which $b\ne0$.

3. If $a\ne0$, then $x_1=-\frac ba x_2-\frac ca x_3=\beta x_2+\gamma x_3$ for $\beta:=-\frac ba$ and $\gamma:=-\frac ca$. Therefore,
$$F(V)=\sup_{(x_2,x_3)\in\mathbb{R}^2\setminus\{(0,0)\}}\frac{|(1+\beta)x_2+\gamma x_3|+|x_2|}{|\beta x_2+\gamma x_3|+|x_2|+|x_3|}\ge\left[\frac{|(1+\beta)x_2+\gamma x_3|+|x_2|}{|\beta x_2+\gamma x_3|+|x_2|+|x_3|}\right]_{x_3=0}=\frac{|1+\beta|+1}{|\beta|+1}\ge\frac{|1+\beta|+|\beta|+1}{2\big(|\beta|+1\big)}\ge\frac12.\tag{A.30}$$
So $F(V)\ge\frac12$ for all $V$ in which $a\ne0$.

So we have shown that
$$\sup_{x\in V\setminus\{0\}}\frac{|x_1+x_2|+|x_2|}{|x_1|+|x_2|+|x_3|}\ge\frac12\tag{A.31}$$
for all $V\subset\mathbb{R}^3$ with $\dim V=2$. However, for the particular
$$V_0:=\left\{\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}\in\mathbb{R}^3:x_1=-x_2\right\},\tag{A.32}$$
we know
$$\sup_{x\in V_0\setminus\{0\}}\frac{|x_1+x_2|+|x_2|}{|x_1|+|x_2|+|x_3|}=\frac12,\tag{A.33}$$
and thus $F(V)=\sup_{x\in V\setminus\{0\}}\frac{|x_1+x_2|+|x_2|}{|x_1|+|x_2|+|x_3|}$ attains its infimum at $V_0$. Hence $s_2^1(A)=\frac12$. Finally, we conclude that
$$s^1_{\min(m,n)}(A)=s^\infty_{\min(m,n)}(A^T)\tag{A.34}$$
holds for this example.

Lastly, let us also look at an example of a 2 by 4 matrix.

Remark A.0.8. For $p=2$, using the singular value decomposition, we get $s^2_{\min(m,n)}(A)=s^2_{\min(m,n)}(A^T)=\frac{\sqrt5-1}{2}$ for this example.

Example A.0.9. $A=\begin{pmatrix}1&1&1&1\\0&1&-2&-1\end{pmatrix}$, $m=2$, $n=4$, and $p=1$.

Let $y=\begin{pmatrix}y_1\\y_2\end{pmatrix}$; then $A^Ty=\begin{pmatrix}y_1\\y_1+y_2\\y_1-2y_2\\y_1-y_2\end{pmatrix}$. Therefore,
$$s_2^\infty(A^T)=\inf_{V\subset\mathbb{R}^2,\,\dim V=1}\ \sup_{y\in V,\,\|y\|_\infty=1}\|A^Ty\|_\infty=\inf_{y\in\mathbb{R}^2,\,\|y\|_\infty=1}\|A^Ty\|_\infty=\inf_{(y_1,y_2)\in\mathbb{R}^2,\,\max(|y_1|,|y_2|)=1}\max\big(|y_1|,|y_1+y_2|,|y_1-2y_2|,|y_1-y_2|\big).\tag{A.35}$$
By Example A.0.5,
$$\inf_{(y_1,y_2)\in\mathbb{R}^2,\,\max(|y_1|,|y_2|)=1}\max\big(|y_1|,|y_1+y_2|,|y_1-2y_2|,|y_1-y_2|\big)\ge\inf_{(y_1,y_2)\in\mathbb{R}^2,\,\max(|y_1|,|y_2|)=1}\max\big(|y_1|,|y_1+y_2|,|y_1-2y_2|\big)=1.\tag{A.36}$$
Thus
$$\inf_{(y_1,y_2)\in\mathbb{R}^2,\,\max(|y_1|,|y_2|)=1}\max\big(|y_1|,|y_1+y_2|,|y_1-2y_2|,|y_1-y_2|\big)\ge1.\tag{A.37}$$
Particularly, at $(y_1,y_2)=(1,0)$ we have
$$\big[\max\big(|y_1|,|y_1+y_2|,|y_1-2y_2|,|y_1-y_2|\big)\big]_{y_1=1,\,y_2=0}=1.\tag{A.38}$$
Thus
$$\inf_{(y_1,y_2)\in\mathbb{R}^2,\,\max(|y_1|,|y_2|)=1}\max\big(|y_1|,|y_1+y_2|,|y_1-2y_2|,|y_1-y_2|\big)=1,$$
in other words, $s_2^\infty(A^T)=1$.

On the other hand, let $x=\begin{pmatrix}x_1\\x_2\\x_3\\x_4\end{pmatrix}$; then $Ax=\begin{pmatrix}x_1+x_2+x_3+x_4\\x_2-2x_3-x_4\end{pmatrix}$. Therefore,
$$s_2^1(A)=\inf_{V\subset\mathbb{R}^4,\,\dim V=3}\ \sup_{x\in V,\,\|x\|_1=1}\|Ax\|_1=\inf_{V\subset\mathbb{R}^4,\,\dim V=3}\ \sup_{x\in V,\,|x_1|+|x_2|+|x_3|+|x_4|=1}\big(|x_1+x_2+x_3+x_4|+|x_2-2x_3-x_4|\big)=\inf_{V\subset\mathbb{R}^4,\,\dim V=3}\ \sup_{x\in V\setminus\{0\}}\frac{|x_1+x_2+x_3+x_4|+|x_2-2x_3-x_4|}{|x_1|+|x_2|+|x_3|+|x_4|}.\tag{A.39}$$
Let $F(V):=\sup_{x\in V\setminus\{0\}}\frac{|x_1+x_2+x_3+x_4|+|x_2-2x_3-x_4|}{|x_1|+|x_2|+|x_3|+|x_4|}$; then $s_2^1(A)=\inf_{V\subset\mathbb{R}^4,\,\dim V=3}F(V)$. For the particular $V_0=\left\{\begin{pmatrix}x_1\\x_2\\x_3\\x_4\end{pmatrix}:x_2=2x_3+x_4\right\}$, we know
$$\sup_{x\in V_0\setminus\{0\}}\frac{|x_1+x_2+x_3+x_4|+|x_2-2x_3-x_4|}{|x_1|+|x_2|+|x_3|+|x_4|}=\sup_{x\in V_0\setminus\{0\}}\frac{|x_1+x_2+x_3+x_4|}{|x_1|+|x_2|+|x_3|+|x_4|}=1,\tag{A.40}$$
and it follows that $s_2^1(A)\le1$. For a generic $V$, which is a 3-dimensional subspace of $\mathbb{R}^4$, we can assume
$$V=\left\{\begin{pmatrix}x_1\\x_2\\x_3\\x_4\end{pmatrix}\in\mathbb{R}^4:ax_1+bx_2+cx_3+dx_4=0\right\}\tag{A.41}$$
where $a$, $b$, $c$ and $d$ are not all zeroes. Now we discuss (A.39) in four cases:

1. If $a\ne0$, then $x_1=-\frac ba x_2-\frac ca x_3-\frac da x_4=\beta x_2+\gamma x_3+\delta x_4$ for $\beta:=-\frac ba$, $\gamma:=-\frac ca$ and $\delta:=-\frac da$. Therefore,
$$F(V)=\sup_{(x_2,x_3,x_4)\in\mathbb{R}^3\setminus\{0\}}\frac{|(1+\beta)x_2+(1+\gamma)x_3+(1+\delta)x_4|+|x_2-2x_3-x_4|}{|\beta x_2+\gamma x_3+\delta x_4|+|x_2|+|x_3|+|x_4|}\ \ge\ \frac{|1+\gamma|+2}{|\gamma|+1},\tag{A.42}$$
the last inequality coming from evaluating at $(x_2,x_3,x_4)=(0,1,0)$. It is not hard to see that $\frac{|1+\gamma|+2}{|\gamma|+1}\ge1$ for all $\gamma\in\mathbb{R}$. So $F(V)\ge1$ for all $V$ in which $a\ne0$.

2. If $b\ne0$, then $x_2=-\frac ab x_1-\frac cb x_3-\frac db x_4=\alpha x_1+\gamma x_3+\delta x_4$ for $\alpha:=-\frac ab$, $\gamma:=-\frac cb$ and $\delta:=-\frac db$. Therefore,
$$F(V)=\sup_{(x_1,x_3,x_4)\in\mathbb{R}^3\setminus\{0\}}\frac{|(1+\alpha)x_1+(1+\gamma)x_3+(1+\delta)x_4|+|\alpha x_1+(\gamma-2)x_3+(\delta-1)x_4|}{|x_1|+|\alpha x_1+\gamma x_3+\delta x_4|+|x_3|+|x_4|}\ \ge\ \frac{|1+\gamma|+|\gamma-2|}{1+|\gamma|},\tag{A.43}$$
by evaluating at $(x_1,x_3,x_4)=(0,1,0)$. It is not hard to see that $\frac{|1+\gamma|+|\gamma-2|}{1+|\gamma|}\ge1$ for all $\gamma\in\mathbb{R}$. So $F(V)\ge1$ for all $V$ in which $b\ne0$.

3. If $c\ne0$, then $x_3=-\frac ac x_1-\frac bc x_2-\frac dc x_4=\alpha x_1+\beta x_2+\delta x_4$ for $\alpha:=-\frac ac$, $\beta:=-\frac bc$ and $\delta:=-\frac dc$. Thus we have
$$F(V)=\sup_{(x_1,x_2,x_4)\in\mathbb{R}^3\setminus\{0\}}\frac{|(1+\alpha)x_1+(1+\beta)x_2+(1+\delta)x_4|+|(1-2\beta)x_2-2\alpha x_1-(2\delta+1)x_4|}{|x_1|+|x_2|+|\alpha x_1+\beta x_2+\delta x_4|+|x_4|}\ \ge\ \frac{|1+\alpha|+2|\alpha|}{1+|\alpha|},\tag{A.44}$$
by evaluating at $(x_1,x_2,x_4)=(1,0,0)$. It is not hard to see that $\frac{|1+\alpha|+2|\alpha|}{1+|\alpha|}\ge1$ for all $\alpha\in\mathbb{R}$. So $F(V)\ge1$ for all $V$ in which $c\ne0$.

4. If $d\ne0$, then $x_4=-\frac ad x_1-\frac bd x_2-\frac cd x_3=\alpha x_1+\beta x_2+\gamma x_3$ for $\alpha:=-\frac ad$, $\beta:=-\frac bd$ and $\gamma:=-\frac cd$. Thus we have
$$F(V)=\sup_{(x_1,x_2,x_3)\in\mathbb{R}^3\setminus\{0\}}\frac{|(1+\alpha)x_1+(1+\beta)x_2+(1+\gamma)x_3|+|(1-\beta)x_2-\alpha x_1-(2+\gamma)x_3|}{|x_1|+|x_2|+|x_3|+|\alpha x_1+\beta x_2+\gamma x_3|}\ \ge\ \frac{|1+\beta|+|1-\beta|}{1+|\beta|},\tag{A.45}$$
by evaluating at $(x_1,x_2,x_3)=(0,1,0)$.

It is not hard to see that $\frac{|1+\beta|+|1-\beta|}{1+|\beta|}\ge1$ for all $\beta\in\mathbb{R}$. So $F(V)\ge1$ for all $V$ in which $d\ne0$.

So we have shown that
$$\sup_{x\in V\setminus\{0\}}\frac{|x_1+x_2+x_3+x_4|+|x_2-2x_3-x_4|}{|x_1|+|x_2|+|x_3|+|x_4|}\ge1\tag{A.46}$$
for all $V\subset\mathbb{R}^4$ with $\dim V=3$. Hence $s_2^1(A)=1$ in (A.39). Finally, we conclude that
$$s^1_{\min(m,n)}(A)=s^\infty_{\min(m,n)}(A^T)\tag{A.47}$$
in this example.

Remark A.0.10. For $p=2$, using the singular value decomposition, we get $s^2_{\min(m,n)}(A)=s^2_{\min(m,n)}(A^T)=\sqrt{5-\sqrt5}$ for this example.

Remark A.0.11. The above are some computable examples of the duality. However, if we change $p=1$ to some finite number $p>1$ with $p\ne2$ for a general rectangular matrix $A$, then obtaining the exact values of the smallest $p$-singular value of $A$ and the smallest $q$-singular value of $A^T$ can be computationally hard, due to the potential difficulty in finding the exact subspace at which the infimum is achieved; but one may show that they turn out to be equal in an indirect way.
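As a quick sanity check of the $p=2$ values quoted in Remarks A.0.6, A.0.8 and A.0.10, one can compute the smallest 2-singular values of the example matrices and of their transposes with an off-the-shelf SVD. The following is a minimal sketch, not part of the thesis; it assumes NumPy and uses the matrices as reconstructed in the examples above.

# Check the p = 2 values in Remarks A.0.6, A.0.8 and A.0.10: for p = 2 the
# smallest 2-singular value of A equals that of A^T and is given by the
# ordinary SVD.  The matrices are those of Examples A.0.5, A.0.7 and A.0.9.
import numpy as np

examples = {
    "A.0.5": np.array([[1.0, 1.0, 1.0], [0.0, 1.0, -2.0]]),
    "A.0.7": np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]),
    "A.0.9": np.array([[1.0, 1.0, 1.0, 1.0], [0.0, 1.0, -2.0, -1.0]]),
}
expected = {
    "A.0.5": np.sqrt(4.0 - np.sqrt(2.0)),   # Remark A.0.6
    "A.0.7": (np.sqrt(5.0) - 1.0) / 2.0,    # Remark A.0.8
    "A.0.9": np.sqrt(5.0 - np.sqrt(5.0)),   # Remark A.0.10
}

for name, A in examples.items():
    # np.linalg.svd returns singular values in decreasing order; the last one
    # is the smallest, and it is the same for A and for A^T.
    s_min_A = np.linalg.svd(A, compute_uv=False)[-1]
    s_min_At = np.linalg.svd(A.T, compute_uv=False)[-1]
    print(name, s_min_A, s_min_At, expected[name])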

Appendix B
Non-convex Geometry Under Linear Map

In this appendix, we would like to present some work on non-convex geometry under a linear map, as it is closely related to the $\ell_q$-null space property of matrices for $0<q<1$. First of all, let us introduce the generalized $q$-perimeter of a polygon.

Definition B.0.12 ($q$-perimeter). The $q$-perimeter of a polygon $P$ with edges $e_1,e_2,\dots,e_m$, denoted by $\mathrm{Perimeter}_q(P)$, is defined as
$$\mathrm{Perimeter}_q(P):=\sum_{k=1}^m\big(\mathrm{length}(e_k)\big)^q.\tag{B.1}$$
We have the following lemma for the generalized $q$-perimeter of a polygon under a linear map.

Lemma B.0.13 ($q$-perimeter of a polygon under a linear map). For any $2\times n$ matrix $M$ with columns $c_1,c_2,\dots,c_n\in\mathbb{R}^2$, we have
$$\mathrm{Perimeter}_q\big(M[-1,1]^n\big)=2^{q+1}\sum_{k=1}^n\|c_k\|_2^q.\tag{B.2}$$
Proof. By the singular value decomposition, there exist a $2\times2$ orthogonal matrix $U$, an $n\times n$ orthogonal matrix $V$ and a $2\times n$ diagonal matrix $\Lambda$ with non-negative entries $\lambda_1$ and $\lambda_2$ on its main diagonal, such that $M=U\Lambda V^T$. Let $M':=\Lambda V^T$, and let us first show that if the claim holds for $M'$ then it is true for $M=UM'$. Now suppose
$$\mathrm{Perimeter}_q\big(M'[-1,1]^n\big)=2^{q+1}\sum_{k=1}^n\|c_k'\|_2^q\tag{B.3}$$

where $c_1',c_2',\dots,c_n'$ are the columns of $M'$. Since an orthogonal transformation preserves the geometric features, including the lengths of the edges of a polygon, we have
$$\mathrm{Perimeter}_q\big(M[-1,1]^n\big)=\mathrm{Perimeter}_q\big(M'[-1,1]^n\big).\tag{B.4}$$
Moreover, we also have $\|c_k\|_2=\|Uc_k'\|_2=\|c_k'\|_2$. So (B.2) holds for $M=U\Lambda V^T$ if it is true for $M'=\Lambda V^T$.

Next, we will show that (B.3) is true for $M'=\Lambda V^T$. For the orthogonal matrix $V^T=:(V_{i,j})_{n\times n}$, we know that $V^T$ is a composition of permutations, reflections and/or rotations, hence $V^T[-1,1]^n$ is also an $n$-cube, obtained by a rigid-body motion of the $n$-cube $[-1,1]^n$. Write $V^T=(V_1,V_2,\dots,V_n)$, in which $V_j$ is the $j$-th column of $V^T$ for $j=1,2,\dots,n$. To get the $q$-perimeter of a polygon, we just need to find its vertices. The set of vertices of the $n$-cube $[-1,1]^n$ is
$$\{(\varepsilon_1,\varepsilon_2,\dots,\varepsilon_n)\in\mathbb{R}^n:\varepsilon_j=\pm1,\ j=1,2,\dots,n\},\tag{B.5}$$
and then that of the $n$-cube $V^T[-1,1]^n$ is
$$\left\{\sum_{j=1}^n\varepsilon_jV_j\in\mathbb{R}^n:\varepsilon_j=\pm1,\ j=1,2,\dots,n\right\}.\tag{B.6}$$
Therefore, $\Lambda V^T[-1,1]^n$ is a polygon with vertices
$$\left\{\left(\lambda_1\sum_{j=1}^n\varepsilon_jV_{1,j},\ \lambda_2\sum_{j=1}^n\varepsilon_jV_{2,j}\right)\in\mathbb{R}^2:\varepsilon_j=\pm1,\ j=1,2,\dots,n\right\}.\tag{B.7}$$
Summing the $q$-th powers of the lengths of the edges between adjacent vertices, in which only one component of the $\varepsilon$'s differs, we have
$$\mathrm{Perimeter}_q\big(\Lambda V^T[-1,1]^n\big)=2\sum_{j=1}^n\left(\sqrt{(2\lambda_1V_{1,j})^2+(2\lambda_2V_{2,j})^2}\right)^q=2^{q+1}\sum_{j=1}^n\left(\sqrt{(\lambda_1V_{1,j})^2+(\lambda_2V_{2,j})^2}\right)^q.\tag{B.8}$$
On the other hand, if $c_1',c_2',\dots,c_n'$ are the columns of $\Lambda V^T$, then
$$\sum_{k=1}^n\|c_k'\|_2^q=\sum_{j=1}^n\left\|\begin{pmatrix}\lambda_1V_{1,j}\\\lambda_2V_{2,j}\end{pmatrix}\right\|_2^q=\sum_{j=1}^n\left(\sqrt{(\lambda_1V_{1,j})^2+(\lambda_2V_{2,j})^2}\right)^q.\tag{B.9}$$
Thus we have finished the proof. □
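Formula (B.2) is also easy to test numerically: build the zonogon $M[-1,1]^n$ as the convex hull of the images of the $2^n$ sign vectors and sum the $q$-th powers of its edge lengths. The sketch below is not from the thesis; it assumes NumPy and SciPy (scipy.spatial.ConvexHull returns the hull vertices of a planar point set in counterclockwise order), and the values of $n$ and $q$ are chosen only for illustration.

# Numerical check of (B.2): Perimeter_q(M[-1,1]^n) = 2^(q+1) * sum_k ||c_k||_2^q.
import itertools
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)
n, q = 5, 0.5
M = rng.standard_normal((2, n))                 # a random 2 x n matrix

# Images under M of all 2^n vertices (sign patterns) of the cube [-1,1]^n.
signs = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
points = signs @ M.T                            # shape (2^n, 2)

hull = ConvexHull(points)
verts = points[hull.vertices]                   # polygon vertices, counterclockwise
edges = np.roll(verts, -1, axis=0) - verts      # edge vectors of the polygon
perimeter_q = np.sum(np.linalg.norm(edges, axis=1) ** q)

formula = 2.0 ** (q + 1) * np.sum(np.linalg.norm(M, axis=0) ** q)
print(perimeter_q, formula)                     # the two values should agree

For Gaussian columns no two columns are parallel almost surely, so the zonogon has exactly $2n$ edges, each a translate of $\pm2c_k$, which is exactly what (B.2) encodes.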

Remark B.0.14. Lemma B.0.13 also holds for $q>1$. In the case when $n=1$, $M[-1,1]^n$ is a line segment, but we view it as a digon, which is degenerate in $\mathbb{R}^2$, and then $\mathrm{Perimeter}_q\big(M[-1,1]^n\big)$ is $2$ times the $q$-th power of the length of the line segment.

This lemma and Theorem ... imply the following

Corollary B.0.15. Given $0<q\le1$ and $B\in\mathbb{R}^{2\times N}$, suppose
$$\|(x,y)B_S\|_q\le\|(x,y)B_{S^c}\|_q\tag{B.10}$$
for all $(x,y)\in\mathbb{R}^2\setminus\{0\}$ and some $S\subset\{1,2,\dots,N\}$ with $|S|=s$. Then
$$\mathrm{Perimeter}_q\big(B_S[-1,1]^s\big)\le\mathrm{Perimeter}_q\big(B_{S^c}[-1,1]^{N-s}\big).\tag{B.11}$$

Remark B.0.16. In [19], it was shown that
$$B_S[-1,1]^s\subset B_{S^c}[-1,1]^{N-s}\tag{B.12}$$
holds if
$$\|(x,y)B_S\|_1\le\|(x,y)B_{S^c}\|_1\tag{B.13}$$
for all $(x,y)\in\mathbb{R}^2\setminus\{(0,0)\}$, for which dual spaces and the Hahn-Banach theorem were used in the proof. In fact, more generally, (B.12) holds if
$$\|(x,y)B_S\|_q\le\|(x,y)B_{S^c}\|_q\tag{B.14}$$
for some $q\ge1$ and all $(x,y)\in\mathbb{R}^2\setminus\{(0,0)\}$, because of the convexity of the function $t\mapsto t^q$ when $q\ge1$. The containment relation (B.12) between convex polygons then allows one to compare their perimeters. However, this method becomes infeasible for the case $0<q<1$ for the following reasons.

Firstly, (B.12) may not hold if (B.14) holds for some $0<q<1$ and all $(x,y)\in\mathbb{R}^2\setminus\{(0,0)\}$. An example is $q=\frac12$,
$$B=\begin{pmatrix}1601&500&\ast\\1600&720&900\end{pmatrix}\tag{B.15}$$

and $S=\{1\}$. In this example,
$$\|(x,y)B_S\|_{1/2}<\|(x,y)B_{S^c}\|_{1/2}\tag{B.16}$$
for all $(x,y)\in\mathbb{R}^2\setminus\{(0,0)\}$, which will be verified later, but the point $B_S\cdot1=(1601,1600)^T$ in $B_S[-1,1]$ is not contained in the parallelogram $B_{S^c}[-1,1]^2$.

Secondly, for a given $0<q<1$, (6.21) may not hold even if (B.12) holds, due to the non-convexity of the function $t\mapsto t^q$ when $0<q<1$. An example of this is the following. Let $B$ be a $2\times4$ matrix such that $B_{S^c}[-1,1]^2$ is the parallelogram $ABCD$ and $B_S[-1,1]^2$ is the parallelogram $AECF$, with the lengths of the edges prescribed in Figure B.1, and let $q=\frac12$. The $\frac12$-perimeter of parallelogram $AECF$ is 30, while the $\frac12$-perimeter of parallelogram $ABCD$ is smaller; so the $q$-perimeter of parallelogram $AECF$ is larger than the $q$-perimeter of parallelogram $ABCD$ for $q=\frac12$, although parallelogram $AECF$ is contained in parallelogram $ABCD$.

Figure B.1: Parallelograms (edges labeled 16 and 100)

Now let us verify the claim that if
$$B=\begin{pmatrix}1601&500&\ast\\1600&720&900\end{pmatrix},\tag{B.17}$$
then
$$\|(x,y)B_S\|_{1/2}<\|(x,y)B_{S^c}\|_{1/2}\tag{B.18}$$
for $S=\{1\}$ and all $(x,y)\in\mathbb{R}^2\setminus\{(0,0)\}$. It suffices to show
$$|1601x+1600y|^{1/2}<|500x+720y|^{1/2}+|\ast\,x+900y|^{1/2}\tag{B.19}$$

for all $(x,y)\in\mathbb{R}^2\setminus\{(0,0)\}$, which can be written as
$$|1601\cos\theta+1600\sin\theta|^{1/2}<|500\cos\theta+720\sin\theta|^{1/2}+|\ast\cos\theta+900\sin\theta|^{1/2}\tag{B.20}$$
for all $\theta\in[0,2\pi]$. Indeed, (B.20) holds for all $\theta\in[0,2\pi]$, because, graphing the difference function
$$f(\theta):=|500\cos\theta+720\sin\theta|^{1/2}+|\ast\cos\theta+900\sin\theta|^{1/2}-|1601\cos\theta+1600\sin\theta|^{1/2}\tag{B.21}$$
using Mathematica, we can see from Figure B.2 that $f(\theta)$ is always positive for $\theta\in[0,2\pi]$. So (B.18) holds for $S=\{1\}$ and all $(x,y)\in\mathbb{R}^2\setminus\{(0,0)\}$.

Figure B.2: The graph of the difference function $f(\theta)$
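The same positivity check can be run numerically instead of with Mathematica. The sketch below is not from the thesis and assumes NumPy; note that the entry of $B$ marked $\ast$ above is a gap in this copy of the document, so the value B13 below is only a stand-in and must be replaced by the entry from the original thesis before the output says anything about (B.18).

# Numerical check of the sign of the difference function f(theta) in (B.21).
# CAUTION: B13 is a stand-in for the entry of B that is missing here (marked
# with * in (B.15)); replace it with the value from the original thesis.
import numpy as np

B13 = 1100.0                                   # stand-in value, not from the thesis
B = np.array([[1601.0, 500.0, B13],
              [1600.0, 720.0, 900.0]])
S = [0]                                        # S = {1} (first column), 0-based here
Sc = [j for j in range(B.shape[1]) if j not in S]

theta = np.linspace(0.0, 2.0 * np.pi, 200000)
xy = np.stack([np.cos(theta), np.sin(theta)])  # unit vectors (x, y) for every theta

def sum_sqrt(cols, xy):
    # sum over the chosen columns b_k of |(x, y) . b_k|^(1/2), for every theta
    return np.sum(np.sqrt(np.abs(cols.T @ xy)), axis=0)

f = sum_sqrt(B[:, Sc], xy) - sum_sqrt(B[:, S], xy)
print("min of f over the grid:", f.min())      # (B.18) needs this to be positive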

Appendix C
Some Numerical Experiments on the $p$-Singular Value for $p>1$ and the $q$-Singular Value for $0<q\le1$ of Random Matrices

In this appendix, we show the results of some numerical experiments on the $p$-singular value for $p>1$ and the $q$-singular value for $0<q\le1$ of random matrices.

For $p=2$, we plot the largest 2-singular value of Gaussian random matrices of size $n\times n$, where $n$ runs from 1 through 100. See Figure C.1. The graph shows that the largest 2-singular value is $O(\sqrt n)$.

For $p=1$, in the first numerical experiment we plot the largest 1-singular value of Gaussian random matrices of size $n\times n$, where $n$ runs from 1 through 100. See Figure C.2. The graph shows that the largest 1-singular value is $O(n)$, as estimated in Theorem ..., Theorem ..., and Theorem ... as well.

In the second numerical experiment for $p=1$, we plot the largest 1-singular value of Gaussian random matrices of size $n\times n$, where $n$ runs from 1 through 200. See Figure C.3.

Figure C.1: Largest 2-singular value of Gaussian random matrices
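The $p=2$ experiment is easy to reproduce with any linear-algebra library, since the largest 2-singular value is the ordinary largest singular value. The following is a minimal sketch, not the thesis's code that produced Figure C.1; it assumes NumPy and Matplotlib, and the dashed $2\sqrt n$ curve is only the standard asymptotic reference for Gaussian matrices, which may differ from the normalization used in the thesis.

# Largest 2-singular value of an n x n Gaussian random matrix versus n.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
ns = np.arange(1, 101)
largest = [np.linalg.svd(rng.standard_normal((n, n)), compute_uv=False)[0] for n in ns]

plt.plot(ns, largest, label="largest 2-singular value")
plt.plot(ns, 2.0 * np.sqrt(ns), "--", label="2*sqrt(n) reference")
plt.xlabel("n")
plt.legend()
plt.show()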

Figure C.2: Largest 1-singular value of Gaussian random matrices: Experiment 1

Figure C.3: Largest 1-singular value of Gaussian random matrices: Experiment 2

The graph shows that the largest 1-singular value is $O(n)$, as estimated in Theorem ..., Theorem ..., and Theorem ... as well.

In the third experiment for $p=1$, we plot the largest 1-singular value of Gaussian random matrices of size $n\times n$, where $n$ runs from 1 through 400. See Figure C.4. The graph shows that the largest 1-singular value is $O(n)$, as estimated in Theorem ..., Theorem ..., and Theorem ... as well.

For $p=\infty$, we plot the largest $\infty$-singular value of Gaussian random matrices of size $n\times n$, where $n$ runs from 1 through 500. See Figure C.5. This graph shows that the largest $\infty$-singular value is $O(n)$, as estimated in Theorem ... and Theorem ....

Figure C.4: Largest 1-singular value of Gaussian random matrices: Experiment 3

Figure C.5: Largest $\infty$-singular value of Gaussian random matrices

For $p=\frac13$, we plot the largest $\frac13$-singular value of Gaussian random matrices of size $n\times n$, where $n$ runs from 1 through 500. See Figure C.6. This graph shows that the largest $\frac13$-singular value is approximately $O(n^3)$, as estimated in Theorem ... and Theorem ....

For $p=\frac14$, we plot the largest $\frac14$-singular value of Gaussian random matrices of size $n\times n$, where $n$ runs from 1 through 300. See Figure C.7. This graph shows that the largest $\frac14$-singular value is approximately $O(n^4)$, as estimated in Theorem ... and Theorem ....

For rectangular matrices, we also plot the largest $\frac14$-singular value of Gaussian random matrices of size $m\times n$, where $m$ and $n$ run from 1 through 100. See Figure C.8.

This graph shows that the largest $\frac14$-singular value is approximately $O(m^4)$, as estimated in Theorem ... and Theorem ....

Figure C.6: Largest $\frac13$-singular value of Gaussian random matrices

Figure C.7: Largest $\frac14$-singular value of Gaussian random matrices

Figure C.8: Largest $\frac14$-singular value of rectangular Gaussian random matrices
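The experiments for $0<q\le1$ can be reproduced cheaply. Assuming, as elsewhere in the thesis, that the largest $q$-singular value is the quantity $\sup\{\|Ax\|_q:\|x\|_q=1\}$, the $q$-triangle inequality shows that this supremum is attained at a coordinate vector, so it equals the largest $\ell_q$-quasinorm of a column. The following minimal sketch, not the thesis's code that produced Figures C.5-C.8, assumes NumPy and Matplotlib and illustrates the roughly $n^{1/q}$ growth for $q=\frac14$.

# Largest q-singular value of an n x n Gaussian matrix for q = 1/4.
# For 0 < q <= 1, ||Ax||_q^q <= sum_j |x_j|^q ||A e_j||_q^q, so the supremum of
# ||Ax||_q over ||x||_q = 1 is the largest l_q-quasinorm of a column of A.
import numpy as np
import matplotlib.pyplot as plt

def largest_q_singular_value(A, q):
    # max over columns j of (sum_i |A_ij|^q)^(1/q)
    return np.max(np.sum(np.abs(A) ** q, axis=0) ** (1.0 / q))

rng = np.random.default_rng(0)
q = 0.25
ns = np.arange(1, 301)
vals = np.array([largest_q_singular_value(rng.standard_normal((n, n)), q) for n in ns])

ref = vals[-1] * (ns / float(ns[-1])) ** (1.0 / q)   # n^4 curve rescaled to the last point
plt.plot(ns, vals, label="largest 1/4-singular value")
plt.plot(ns, ref, "--", label="n^4 reference (rescaled)")
plt.xlabel("n")
plt.legend()
plt.show()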
