Sparsity Lower Bounds for Dimensionality Reducing Maps

Size: px
Start display at page:

Download "Sparsity Lower Bounds for Dimensionality Reducing Maps"

Transcription

1 Sparsity Lower Bounds for Dimensionality Reducing Maps Jelani Nelson Huy L. Nguy ên November 5, 01 Abstract We give near-tight lower bounds for the sparsity required in several dimensionality reducing linear maps. First, consider the Johnson-Lindenstrauss (JL) lemma which states that for any set of n vectors in R d there is a matrix A R m d with m = O(ε log n) such that mapping by A preserves pairwise Euclidean distances of these n vectors up to a 1±ε factor. We show that there exists a set of n vectors such that any such matrix A with at most s non-zero entries per column must have s = Ω(ε 1 log n/ log(1/ε)) as long as m < O(n/ log(1/ε)). This bound improves the lower bound of Ω(min{ε, ε 1 log m d}) by [Dasgupta-Kumar-Sarlós, STOC 010], which only held against the stronger property of distributional JL, and only against a certain restricted class of distributions. Meanwhile our lower bound is against the JL lemma itself, with no restrictions. Our lower bound matches the sparse Johnson-Lindenstrauss upper bound of [Kane- Nelson, SODA 01] up to an O(log(1/ε)) factor. Next, we show that any m n matrix with the k-restricted isometry property (RIP) with constant distortion must have at least Ω(k log(n/k)) non-zeroes per column if m = O(k log(n/k)), the optimal number of rows of RIP matrices, and k < n/ polylog n. This improves the previous lower bound of Ω(min{k, n/m}) by [Chandar, 010] and shows that for virtually all k it is impossible to have a sparse RIP matrix with an optimal number of rows. Both lower bounds above also offer a tradeoff between sparsity and the number of rows. Lastly, we show that any oblivious distribution over subspace embedding matrices with 1 non-zero per column and preserving distances in a d dimensional-subspace up to a constant factor must have at least Ω(d ) rows. This matches one of the upper bounds in [Nelson-Nguy ên, 01] and shows the impossibility of obtaining the best of both of constructions in that work, namely 1 non-zero per column and Õ(d) rows. 1 Introduction The last decade has witnessed a burgeoning interest in algorithms for large-scale data. A common feature in many of these works is the exploitation of data sparsity to achieve algorithmic efficiency, for example to have running times proportional to the actual complexity of the data rather than the dimension of the ambient space it lives in. This approach has found applications in compressed sensing [CT05,Don06], dimension reduction [BOR10,DKS10,KN10,KN1,WDL + 09], and numerical linear algebra [CW1, MM1, MP1, NN1]. Given the success of these algorithms, it is important to understand their limitations. Until now, for most of these problems it is not known how far one Institute for Advanced Study. minilek@ias.edu. Supported by NSF CCF and NSF DMS Princeton University. hlnguyen@princeton.edu. Supported in part by NSF CCF and a Gordon Wu fellowship. 1

2 can reduce the running time on sparse inputs. In this work we make a step towards understanding the performance of algorithms for sparse data and show several tight lower bounds. In this work we provide three main contributions. We give near-optimal or optimal sparsity lower bounds for Johnson-Lindenstrauss transforms, matrices satisfying the restricted isometry property for use in compressed sensing, and subspace embeddings used in numerical linear algebra. These three contributions are discussed in Section 1.1, Section 1., and Section 1.3, respectively. 1.1 Johnson-Lindenstrauss The following lemma, due to Johnson and Lindenstrauss [JL84], has been used widely in many areas of computer science to reduce data dimension. Theorem 1 (Johnson-Lindenstrauss (JL) lemma [JL84]). For any 0 < ε < 1/ and any x 1,..., x n in R d, there exists A R m d with m = O(ε log n) such that for all i, j [n] 1, Ax i Ax j = (1 ± ε) x i x j. Typically one uses the lemma in algorithm design by mapping some instance of a high-dimensional computational geometry problem to a lower dimension. The running time to solve the instance then becomes the time needed for the lower-dimensional problem, plus the time to perform the matrixvector multiplications Ax i ; see [Ind01,Vem04] for further discussion. This latter step highlights the importance of having a JL matrix supporting fast matrix-vector multiplication. The original proofs of the JL lemma took A to be a random dense matrix, e.g. with i.i.d. Gaussian, Rademacher, or even subgaussian entries [Ach03, AV06, DG03, FM88, IM98, JL84, Mat08]. The time to compute Ax then becomes O(m x 0 ), where x has x 0 d non-zero entries. A beautiful work of Ailon and Chazelle [AC09] described a construction of a JL matrix A supporting matrix-vector multiplication in time O(d log d + m 3 ), also with m = O(ε log n). This was improved to O(d log d + m +γ ) [AL09] with the same m for any constant γ > 0, or to O(d log d) with m = O(ε log n log 4 d) [AL11, KW11]. Thus if ε log n d one can obtain nearly-linear O(d log d) embedding time with the same target dimension m as the original JL lemma, or one can also obtain nearly-linear time for any setting of ε, n by increasing m slightly by polylog d factors. While the previous paragraph may seem to present the end of the story, in fact note that the nearly-linear O(d log d) embedding time is actually much worse than the original O(m x 0 ) time of dense JL matrices when x 0 is very small, i.e. when x is sparse. Indeed, in several applications we expect x to be sparse. Consider the bag of words model in information retrieval: in for example an spam collaborative filtering system for Yahoo! Mail [WDL + 09], each is treated as a d-dimensional vector where d is the size of the lexicon. The ith entry of the vector is some weighted count of the number of occurrences of word i (frequent words like the should be weighted less heavily). A machine learning algorithm is employed to learn a spam classifier, which involves dot products of vectors with some learned classifier vector, and JL dimensionality reduction is used to speed up the repeated dot products that are computed during training. Note that in this scenario we expect x to be sparse since most s do not contain nearly every word in the lexicon. An even starker scenario is the turnstile streaming model, where the vectors x may receive coordinate-wise updates in a data stream. 
In this case maintaining Ax in a stream given some update of the form add v to x i requires adding vae i to the compression Ax stored in memory. Since e i = 1, we would not like to spend O(d log d) per streaming update. 1 Here and throughout this paper, [n] denotes the set {1,..., n}.

3 The intuition behind all the works [AC09, AL09, AL11, KW11] to obtain O(d log d) embedding time was as follows. Picking A to be a scaled sampling matrix (where each row has a 1 in a random location) gives the correct expectation for Ax, but the variance may be too high. Indeed, the variance is high exactly when x is sparse; consider the extreme case where x 0 = 1 so that sampling is not even expected to see the non-zero coordinate unless m d. These works then all essentially proceed by randomly preconditioning x to ensure that x is very well-spread (i.e. far from sparse) with high probability, so that sampling works, and thus fundamentally cannot take advantage of input sparsity. One way of obtaining faster matrix-vector multiplication for sparse inputs is to have sparse JL matrices A. Indeed, if A has at most s non-zero entries per column then Ax can be computed in O(s x 0 + m) time. A line of work [Ach03,Mat08,DKS10,BOR10,KN10,KN1] investigated the value s achievable in a JL matrix, culminating in [KN1] showing that it is possible to simultaneously have m = O(ε log n) and s = O(ε 1 log n). Such a sparse JL transform thus speeds up embeddings by a factor of roughly 1/ε without increasing the target dimension. Our Contribution I: We show that for any n and any ε = Ω(1/ n), there exists a set of n vectors x 1,..., x n R n such that any JL matrix for this set of vectors with m = O(ε log n) rows requires column sparsity s = Ω(ε 1 log n/ log(1/ε)) as long as m = O(n/ log(1/ε)). Thus the sparse JL transforms of [KN1] achieve optimal sparsity up to an O(log(1/ε)) factor. In fact this lower bound on s continues to hold even if m = O(ε c log n) for any positive constant c. Note that if m = n one can simply take A to be the identity matrix which achieves s = 1, and thus the restriction m = O(n/ log(1/ε)) is nearly optimal. Also note that we can assume ε = Ω(1/ n) since otherwise m = Ω(n) is required in any JL matrix [Alo09], and thus the m = O(n/ log(1/ε)) restriction is no worse than requiring m = O(n/ log n). Furthermore if all the entries of A are required to be equal in magnitude, our lower bound holds as long as m n/10. Before our work, only a restricted lower bound of s = Ω(min{1/ε, ε 1 log m d}) had been shown [DKS10]. In fact this lower bound only applied to the distributional JL problem, a much stronger guarantee where one wants to design a distribution over m d matrices such that any fixed vector x has Ax = (1 ± ε) x with probability 1 δ over the choice of A. Indeed any distributional JL construction yields the JL lemma by setting δ = 1/n and union bounding over all the x i x j difference vectors. Thus, aside from the weaker lower bound on s, [DKS10] only provided a lower bound against this stronger guarantee, and furthermore only for a certain restricted class of distributions that made certain independence assumptions amongst matrix entries, and also assumed certain bounds on the sum of fourth moments of matrix entries in each row. It was shown by Alon [Alo09] that m = Ω(ε log n/ log(1/ε)) is required for the set of points {0, e 1,..., e n } and d = n as long as 1/ε < n/. Here e i is the ith standard basis vector. Simple manipulations show that, when appropriately scaled, any JL matrix A for this set of vectors is O(ε)-incoherent, in the sense that all its columns v 1,..., v n have unit l norm and the dot products v i, v j between pairs of columns are all at most O(ε) in magnitude. 
We study this exact same hard input to the JL lemma; what we show is that any such matrix A must have column sparsity s = Ω(ε 1 log n/ log(1/ε)). In some sense our lower bound can be viewed as a generalization of the Singleton bound for error-correcting codes in a certain parameter regime. The Singleton bound states that for any set of n codewords with block length t, alphabet size q, and relative distance r, it must be that n q t r+1. If the code has relative distance 1 ε then t r εt, so that if t 1/ε the Singleton bound implies t = Ω(ε 1 log n/ log q). The connection to incoherent matrices (and thus the JL 3

4 lemma), observed in [Alo09], is the following. For any such code {C 1,..., C n }, form a matrix A R m n with m = qt. The rows are partitioned into t chunks each of size q. In the ith column of A, in the jth chunk we put a 1/ t in the row of that chunk corresponding to the symbol (C i ) j, and we put zeroes everywhere else in that column. All columns then have l norm 1, and the code having relative distance 1 ε implies that all pairs of columns have dot products at most ε. The Singleton bound thus implies that any incoherent matrix formed from codes in this way has t = Ω(ε 1 log n/ log q). Note the column sparsity of A is t, and thus this matches our lower bound for q poly(1/ε). Our sparsity lower bound thus recovers this Singleton-like bound, without the requirement that the matrix takes this special structure of being formed from a code in the manner described above. One reason this is perhaps surprising is that incoherent matrices from codes have all nonnegative entries; our lower bound thus implies that the use of negative entries cannot be exploited to obtain sparser incoherent matrices. 1. Compressed sensing and the restricted isometry property Another object of interest are matrices satisfying the restricted isometry property (RIP). Such matrices are widely used in compressed sensing. Definition ( [CT05, CRT06b, Can08]). For any integer k > 0, a matrix A is said to have the k-restricted isometry property with distortion δ k if (1 δ k ) x Ax (1 + δ k) x for all x with x 0 k. The goal of the area of compressed sensing is to take few nonadaptive linear measurements of a vector x R n to allow for later recovery from those measurements. That is to say, if those measurements are organized as the rows of some matrix A R m n, we would like to recover x from Ax. Furthermore, we would like do so with m n so that Ax is a compressed representation of x. Of course if m < n we cannot recover all vectors x R n with any meaningful guarantee, since then A will have a non-trivial kernel, and x, x + y are indistinguishable for y ker(a). Compressed sensing literature has typically focused on the case of x being sparse [CRT06a, Don06], in which case a recovery algorithm could hope to recover x by finding the sparsest x such that A x = Ax. The works [Can08, CRT06b, CT05] show that if A satisfies the k-rip with distortion δ k < 1, and if x is k-sparse, then given Ax there is a polynomial-time solvable linear program to recover x. In fact for any x, not necessarily sparse, the linear program recovers a vector x satisfying x x O(1/ k) inf x z 1, z 0 k known as the l /l 1 guarantee. That is, the recovery error depends on the l 1 norm of the best k-sparse approximation z to x. It is known [BIPW10,GG84,Kaš77] that any matrix A allowing for the l /l 1 guarantee simultaneously for all vectors x, and thus RIP matrices, must have m = Ω(k log(n/k)) rows. For completeness we give a proof of the new stronger lower bound m = Ω(log 1 (1/δ k )(δ 1 k k log(n/k) + δ k k)) in Section 5, though we remark here that current uses of RIP all take δ k = Θ(1). Although the recovery x of x can be found in polynomial time as mentioned above, this polynomial is quite large as the algorithm involves solving a linear program with n variables and m constraints. This downside has led researchers to design alternative measurement and/or recovery schemes which allow for much faster sparse recovery, sometimes even at the cost of obtaining a recovery guarantee weaker than l /l 1 recovery for the sake of algorithmic performance. Many 4

5 of these schemes are iterative, such as CoSaMP [NT09], Expander Matching Pursuit [IR08], and several others [BI09, BIR08, BD08, DTDlS1, Fou11, GK09, NV09, NV10, TG07], and several of their running times depend on the product of the number of iterations and the time required to multiply by A or A (here A denotes the conjugate transpose of A). Several of these algorithms furthermore apply A, A to vectors which are themselves sparse. Thus, recovery time is improved significantly in the case that A is sparse. Previously the only known lower bound for column sparsity s for an RIP matrix with an optimal m = Θ(k log(n/k)) number of rows was s = Ω(min{k, n/m}) [Cha10]. Note that if an RIP construction existed matching the [Cha10] column sparsity lower bound, application to a k-sparse vector would take time O(min{k, nk/m}), which is always o(n) and can be very fast for small k. Furthermore, in several applications of compressed sensing m is very close to n, in which case an Ω(n/m) lower bound on column sparsity does not rule out very sparse RIP matrices. For example, in applications of compressed sensing to magnetic resonance imaging, [LDP07] recommended setting the number of measurements m to be between 5-10% of n to obtain good performance for recovery of brain and angiogram images. We remark that one could also obtain speedup by using structured RIP matrices, such as those obtained by sampling rows of the discrete Fourier matrix [CT06], though such constructions require matrix-vector multiplication time Θ(n log n) independent of input sparsity. Another upside of sparse RIP matrices is that they allow faster algorithms for encoding x Ax. If A has s non-zeroes per column and x receives, for example, turnstile streaming updates, then the compression Ax can be maintained on the fly in O(s) time per update (assuming the non-zero entries of any column of A can be recovered in O(s) time). Our Contribution II: We show as long as k < n/ polylog n, any k-rip matrix with distortion O(1) and m = Θ(k log(n/k)) rows with s non-zero entries per column must have s = Ω(k log(n/k)). That is, RIP matrices with the optimal number of rows must be dense for almost the full range of k up to n. This lower bound strongly rules out any hope for faster recovery and compression algorithms for compressed sensing by using sparse RIP matrices as mentioned above. We note that any sparsity lower bound should fail as k approaches n since the n n identity matrix trivially satisfies k-rip for any k and has column sparsity 1. Thus, our lower bound holds for almost the full range of parameters for k. 1.3 Oblivious Subspace Embeddings The last problem we consider is the oblivious subspace embedding (OSE) problem. Here one aims to design a distribution D over m n matrices A such that for any d-dimensional subspace W R n, P A D ( x W Ax (1 ± ε) x ) > /3. Sarlós showed in [Sar06] that OSE s are useful for approximate least squares regression and low rank approximation, and they have also been shown useful for approximating statistical leverage scores [DMIMW1], an important concept in statistics and machine learning. See [CW1] for an overview of several applications of OSE s. To give more details of how OSE s are typically used, consider the example of solving an overconstrained least-squares regression problem, where one must compute argmin x Sx b for some S R n d. By overconstrained we mean n > d, and really one should imagine n d in what 5

6 follows. There is a closed form solution for the minimizing vector x, which requires computing the Moore-Penrose pseudoinverse of S. The total running time is O(nd ω 1 ), where ω is the exponent of square matrix multiplication. Now suppose we are only interested in finding some x so that S x b (1 + ε) argmin x Sx b. Then it suffices to have a matrix A such that Az = (1 ± O(ε)) z for all z in the subspace spanned by b and the columns of A, in which case we could obtain such an x by solving the new least squares regression problem of computing argmin x AS x Ab. If A has m rows, the new running time is the sum of three terms: (1) the time to compute Ab, () the time to compute AS, and (3) the O(md ω 1 ) time required to solve the new least-squares problem. It turns out it is possible to obtain such an A with m = O(d/ε ) by choosing, for example, a matrix with independent Gaussian entries (see e.g. [Gor88,KM05]), but then computing AS takes time Ω(nd ω 1 ), providing no benefit. The work of Sarlós picked A with special structure so that AS can be computed in time O(nd log n), namely by using the Fast Johnson-Lindenstrauss Transform of [AC09] (see also [Tro11]). Unfortunately the time is O(nd log n) even for sparse matrices S, and several applications require solving numerical linear algebra problems on sparse matrix inputs. For example in the Netflix matrix where rows are users and columns are movies, and S i,j is some rating score, S is very sparse since most users rate only a tiny fraction of all movies [ZWSP08]. If nnz(s) denotes the number of non-zero entries of S, we would like running times closer to O(nnz(S)) than O(nd log n) to multiply A by S. Such a running time would be possible, for example, if A only had s = O(1) non-zero entries per column. In a recent and surprising work, Clarkson and Woodruff [CW1] gave an OSE with m = poly(d/ε) and s = 1, thus providing fast numerical linear algebra algorithms for sparse matrices. For example, the running time for least-squares regression becomes O(nnz(A) + poly(d/ε)). The dependence on d, ε was improved in [NN1] to m = O(d /ε ). The work [NN1] also showed how to obtain m = O(d 1+γ /ε ), s = O(1/ε) for any constant γ > 0 (the constant in the big-oh depends polynomially on 1/γ), or m = (d polylog d)/ε, s = (polylog d)/ε. It is thus natural to ask whether one can obtain the best of both worlds: can there be an OSE with m d/ε and s = 1? Our Contribution III: In this work we show that any OSE such that all matrices in its support have m rows and s = 1 non-zero entries per column must have m = Ω(d ) if n d. Thus for constant ε and large n, the upper bound of [NN1] is optimal. 1.4 Organization In Section we prove our lower bound for the sparsity required in JL matrices. In Section 3 we give our sparsity lower bound for RIP matrices, and in Section 4 we give our lower bound on the number of rows for OSE s having sparsity 1. In Section 5 we give a lower bound involving δ k on the number of rows in an RIP matrix, and in Section 6 we state an open problem. JL Sparsity Lower Bound Define an ε-incoherent matrix A R m n as any matrix whose columns have unit l norm, and such that every pair of columns has dot product at most ε in magnitude. A simple observation 6

7 of [Alo09] is that any JL matrix A for the set of vectors {0, e 1,..., e n } R n, when its columns are scaled by their l norms, must be O(ε)-incoherent. In this section, we consider an ε-incoherent matrix A R m n with at most s non-zero entries per column. We show a lower bound on s in terms of ε, n, m. In particular if m = O(ε log n) is the number of rows guaranteed by the JL lemma, we show that s = Ω(ε 1 log n/ log(1/ε)) as long as m < n/ polylog n. In fact if all the entries in A are either 0 or equal in magnitude, we show that the lower bound even holds up to m < n/10. In Section.1 we give the lower bound on s in the case that all entries in A are in {0, 1/ s, 1/ s}. In Section. we give our lower bound without making any assumption on the magnitudes of entries in A. Before proceeding further, we prove a couple lemmas used throughout this section, and also later in this paper. Throughout this section A is always an ε-incoherent matrix. Lemma 3. For any x ε, A cannot have any row with at least 5/x entries greater than x, nor can it have any row with at least 1/x entries less than x. Proof. For the sake of contradiction, suppose A did have such a row, say the jth row. Suppose A j,i1,..., A j,in > x for some x ε, where N 5/x (the case where they are each less than x is argued identically). Let v i denote the ith column of A. Let u i be v i but with the jth coordinate replaced with 0. Then for any k 1, k [N] u ik1, u ik v ik1, v ik x ε x x/. Thus we have N 0 j=1 u ij N xn(n 1)/4, and rearranging gives the contradiction 1/x (N 1)/4 > 1/x. Lemma 4. Let s, q, r be positive reals with q/r and s q/e. Then if s ln(q/s) r it must be the case that s = Ω(r/ ln(q/r)). Proof. Define the function f(s) = s ln(q/s). Then f (s) = ln(q/(es)) is increasing for s q/e. Then since q/r, for s = cr ln(q/r) for constant c > 0 we have the equality s ln(q/s) = cr/ ln(q/r) ln((q/r) ln(q/r)) = (c+o q/r (1))r ln(q/r), where the o q/r (1) term goes to zero as q/r. Thus for c sufficiently small we have that the c + o q/r (1) term must be less than 1, so in order to have f(s) r, since f is increasing we must have s = Ω(r/ ln(q/r))..1 Sign matrices In this section we consider the case that all entries of A are either 0 or ±1/ s and show a lower bound on s in this case. Lemma 5. Suppose m < n/10 and all entries of A are in {0, 1/ s, 1/ s}. Then s 1/(ε). Proof. For the sake of contradiction suppose s < 1/(ε). There are ns non-zero entries in A and thus at least ns/ of these entries have the same sign by the pigeonhole principle; wlog let us say 1/ s appears at least ns/ times. Then again by pigeonhole some row j of A has N = ns/(m) values that are 1/ s. The claim now follows by Lemma 3 with x = 1/ s. We now show how to improve the bound to the desired form. 7

8 Theorem 6. Suppose m < n/10 and all entries of A are in {0, 1/ s, 1/ s}. Ω(ε 1 log n/ log(m/ log n)). Then s Proof. We know s 1/(ε) by Lemma 5. Let t = εs 1. Every v i has ( s t) subsets of size t of nonzero coordinates. Thus by pigeonhole there exists a set of t rows i 1,..., i t and N = n ( ) s t /( t ( ) m t ) columns v j1,..., v jn such that for each row all entries in those columns are 1/ s in magnitude and have the same sign (the signs may vary across rows). Letting u j be v j but with those t coordinates set to 0, we have u jk1, u jk = v jk1, v jk t/s ε t/s t/(s). Thus we have so that rearranging gives N 0 u jk N tn(n 1)/(4s) k=1 ( ( n s t s t(n 1)/4 = (t/4) t( ) m 1 t ) ) (t/4) (n(s/(em)) t 1). Suppose s < cε 1 log n/ log(em/n) for some small constant c so that n(s/(em)) t. Then Thus s (tn/8) (s/(em)) t. εn 4 = tn 8s ( ) em t. s Taking the natural logarithm of both sides gives ( ) em s ln 1 ( εn s ε ln 4 Define q = em, r = ε 1 ln(εn/4)/. Then s q/e, since s m. By [Alo09] we must have m = Ω(ε log n/ log(1/ε)), so q/r for ε smaller than some fixed constant. Thus by Lemma 4 we have s = Ω(r/ ln(q/r)). The theorem follows since log(εm/ log n) = Θ(m/ log n) since m = Ω(ε log n/ log(1/ε)) [Alo09]. Corollary 7. Suppose m poly(1/ε) log n < n/10 and all entries of A are in {0, 1/ s, 1/ s}. Then s Ω(ε 1 log n/ log(1/ε)).. General matrices We now consider arbitrary sparse and nearly orthogonal matrices A R m n. That is, we no longer require the non-zero entries of A to be 1/ s in magnitude. Lemma 8. Suppose m < n/(0 ln(1/ε)). Then s 1/(4ε). ). 8

9 Proof. For the sake of contradiction suppose s < 1/(4ε). We know by Lemma 3 that for any x ε, no row of A can have more than 5/x entries of value at least x in magnitude and of the same sign. Define S i = {j : A i,j ε}. Let S+ i be the subset of indices j in S i with A i,j > 0, and define Si = S i \S i +. Let X denote the square of a random positive value from S+ i. Then j S + i 1 1 A i,j = S i + EX = S+ i P (X > x) dx ε S i x dx = ε S+ i + 5 ln(1/ε). 0 ε By analogously bounding the sum of squares of entries in Si, we have that the sum of squares of entries at least ε in magnitude is never more than ε S i +10 ln(1/ε) in the ith row of A, for any i. Thus the total sum of squares of all entries in the matrix less than ε in magnitude is at most ε(ns i S i ). Meanwhile the sum of all other entries is at most ε( i S i )+10m ln(1/ε). Thus the sum of squares of all entries in the matrix is at most εns+10m ln(1/ε) < n/+10m ln(1/ε), by our assumption on s. This quantity must be n, since every column of A has unit l norm. However for our stated value of m this is impossible since 10m ln(1/ε) < n/, a contradiction. We now show how to obtain the extra factor of log n/ log(1/ε) in the lower bound. Lemma 9. Let 0 < ε < 1/. Suppose v 1,..., v n R m each have v = 1 and v 0 s, and furthermore v i, v j ε for i j. Then for any t [s] with t/s > Cε, we must have s t(n 1)/(C) with n N = t( )( m (s+t) ), C = /(1 1/ ). t t Proof. We label each vector v i by its t-type, defined in the following way. The t-type of a vector v i is the set of locations of the t largest coordinates in magnitude, as well as the signs of those coordinates, together with a rounding of those top t coordinates so that their squares round to the nearest integer multiple of 1/(s). In the rounding, values halfway between two multiples are rounded arbitrarily; say downward, to be concrete. Note that the amount of l mass contained in the top t coordinates of any v i after such a rounding is at most 1 + t/(s), and thus the number of roundings possible is at most the number of ways to write a positive integer in [s + t] as a sum of t positive integers, which is ( ) s+t t. Thus the total number of possible t-types is at most t( )( m (s+t) ) ( t t ( m ) t choices of the largest t coordinates, t choices of their signs, and ( ) (s+t) t choices for how they round). Thus by the pigeonhole principle, there exist N vectors v i1,..., v in each with the same t-type such that N n/( t( m t )( (s+t) t ) ). Now for these vectors v i1,..., v in, let S [n] of size t be the set of the largest coordinates (in magnitude) in each v ij. Define u ij = (v ij ) [n]\s ; that is, we zero out the coordinates in S. Then for j k [N], u ij, u ik = v ij, v ik r S(v ij ) r (v ik ) r ε r S(v ij ) r ((v ij ) r ± 1/ s) ε r S ( (v ij ) r (v ij ) r / ) s 9

10 ε (v ij ) S + t/(s) (v ij ) S ( ε 1 1 ) t/s. (1) The last inequality used that (v ij ) S t/s. Also we pick t to ensure t/s > ε/(1 1/ ) so that the right hand side of Eq. (1) is less than ((1 1/ )/)t/s = Ct/s. The penultimate inequality follows by Cauchy-Schwarz. Thus we have N N u ij = u ij + u ij, u ik j=1 j=1 j k N C(t/s)N(N 1)/ () However we also have j u i j 0, which implies s C(N 1)t/ by rearranging Eq. (). Theorem 10. There is some fixed 0 < ε 0 1/ so that the following holds. Let 1/ n < ε < ε 0. Suppose v 1,..., v n R m each have v = 1 and v 0 s, and furthermore v i, v j ε for i j. Then s Ω(ε 1 log n/ log(m/ log n)) as long as m < O(n/ ln(1/ε)). Proof. By Lemma 8, 4εs 1. Set t = 7εs so that Lemma 9 applies. Then by Lemma 9, as long as t( )( m (s+t) ) t t n/, 7εn = tn ( )( ) m (s + t) s 4C t t t ( 8e ) 7εs m 4C 49ε, s where C is as in Lemma 9. Taking the natural logarithm on both sides, ( 8e ) m ln(7εn/(4c)) (7εs) ln 49ε s In other words, s ln(7εn/(4c)) ( ). 7ε ln 8e m 49ε s Define r = ln(7εn/(4c))/(7ε), q = 8e m/(49ε ). Thus we have s ln(q/s) r. We have that s q/e is always the case for ε < 1/ since then q/e m and we have that s m. Also note for ε smaller than some constant we have that q/r > since m = Ω(log n) by [Alo09]. Thus by Lemma 4 we have s Ω(r/ ln(q/r)). Using that ln(εn) = Θ(log n) since ε > 1/ n, and that t( )( m (s+t) ) t t (8e m/(49ε s)) n/ for our setting of t when s = o(ε 1 log n/ log(m/(ε 1 log n))) gives s = Ω(ε 1 log n/ log(ε 1 m/ log n)). Since m = Ω(ε log n/ log(1/ε)) [Alo09], this is equivalent to our lower bound in the theorem statement. Corollary 11. Let ε, m, s be as in Theorem 10. m poly(1/ε) log n < O(n/ ln(1/ε)). Then s = Ω(ε 1 log n/ log(1/ε)) as long as Remark 1. From Theorem 10, we can deduce that for constant ε, in order for the sparsity s to be a constant independent of n, it must be the case that m = n Ω(1). This fact rules out very sparse mappings even when we significantly increase the target dimension. 10

11 3 RIP Sparsity Lower Bound Consider a k-rip matrix A R m n with distortion δ k where each column has at most s non-zero entries. We will show for δ k = Θ(1) that s cannot be very small when m has the optimal number of rows Θ(k log(n/k)). Theorem 13. Assume k, δ k < δ for some fixed universal small constant δ > 0, m < n/(64 log 3 n). Then we must have s = Ω(min{k log(n/k)/ log(m/(k log(n/k))), m}). Proof. Assume for the sake of contradiction that s < min{k log(n/k)/(64 log(m/s)), m/64}. Consider the ith column of A for some fixed i. By k-rip, the l norm of each column of A is at least 1 δ k > 1/, so the sum of squares of entries greater than 1/( s) in magnitude is at least 1/4. Therefore, there exists a scale 1 t log s such that the number of entries of absolute value greater than or equal to (t 3)/ / s is at least t 1 s/t. To see this, let S be the set of rows j such that A j,i 1/( s). For the sake of contradiction, suppose that every scale 1 t log s has strictly fewer than t 1 s/t values that are at least (t 3)/ / s in magnitude (note this also implies S < s/4). Let X be the square of a random element of S. Then A j,i = S EX = S j S 0 P (X > x) dx < /4s P (X > x) dx < t=1 t 8s s t+1 t < 1 4, a contradiction. Let a pattern at scale t be a subset of size u = max{ 4 t s/k, 1} of [m] along with u signs. There are ( ) t 1 s/t u patterns P where A v,i t 3 /s for all v P and the signs of A v,i match the signs of P. There are u( m u) possible patterns at scale t. By an averaging argument, there exists a scale t, and a pattern P such that the number of columns of A with this pattern is at least z = n ( ) t 1 s/t u /((log s) u ( m u) ). Consider cases. Case 1 (z k): Pick an arbitrary set of k such columns. Consider the vector v with k ones at locations corresponding to those columns and zeroes everywhere else. We have v = k and for each j P, we have (Av) j k t 3 /s. Thus, Av uk t 3 /s k. This contradicts the assumption that Av (1 + δ k) v. Case (z < k): Consider the vector v with z ones at locations corresponding to those columns and zeroes everywhere else. We have v = z and for each j P, we have (Av) j z t 3 /s. Consider subcases. Case.1 (u = 1): Then z = n t s/t (log s)m, so Av z t 3 /s 5 n/t z z. (3) (log s)m This contradicts the assumption that Av (1 + δ k) v. 11

12 Case. (u = 4 t s/k):. We have z = n( ) t 1 s/t u (log s) u( ) m u n ( s ) u log s t t+ em n log s (log(m/s)+log e+t++ log t)4 t s/k n log s (log(m/s)+log e+t++ log t)4 t log(n/k)/(64 log(m/s)) (4) n log s (k/n)1/4 (5) k. Eq. (4) follows from s < k log(n/k)/(64 log(m/s)). Eq. (5) follows from the fact that f(t) = (log(m/s) + log e + t + + log t) t is monotonically decreasing for t 1. Indeed, f (t) = t ( ln (log(m/s) + log e + + t + log t) + t ln + 1) ( t 9 ln t ln + ) t ln Eq. (6) follows since k < n/ log 4/3 n < n/ log 4/3 s, which holds since k m n/(64 log 3 n). This contradicts the assumption of Case that z < k. Thus we have s min{k log(n/k)/(64 log(m/s)), m/64} as desired. If s m/64 we are done. Otherwise we have s k log(n/k)/(64 log(m/s)). Define q = m, r = k log(n/k)/64. Thus we have s log(q/s) r. We have q/r for δ k smaller than some constant by Theorem 0, and we have s < q/e = m/e since we assume we are in the case s < m/64. Thus by Lemma 4 we have s = Ω(r/ ln(q/r)), which completes the proof of the theorem. Corollary 14. When k, δ k < δ for some universal constant δ > 0, and the number of rows m = Θ(k log(n/k)) < n/(3 log 3 n), we must have s = Ω(k log(n/k)). Remark 15. The restriction m = O(n/ log 3 n) in Theorem 13 was relevant in Eq. (3). Note the choice of t in the proof was just so that t 1/t converges. We could instead have chosen t 1+γ and obtained a qualitatively similar result, but with the slightly milder restriction m = O(n/ log +γ n), where γ > 0 can be chosen as an arbitrary constant. 4 Oblivious Subspace Embedding Sparsity Lower Bound In this section, we show a lower bound on the dimension of very sparse OSE s. Theorem 16. Consider d at least a large enough constant and n d. Any OSE with matrices A in its support having m rows and at most 1 non-zero entry per column such that with probability at least 1/5, the lengths of all vectors in a fixed subspace of dimension d of R n are preserved up to a factor, must have m d /14. (6) 1

13 Proof. Assume for the sake of contradiction that m < d /14. By Yao s minimax principle, we only need to show there exists a distribution over subspaces such that any fixed matrix A with column sparsity 1 and too few rows would fail to preserve lengths of vectors in the subspace with probability more than 4/5. Consider the uniform distribution over subspaces spanned by d standard basis vectors in R n : e i1, e i,..., e id with i 1,..., i d {1,..., n}. Let a(i) be the row of the non-zero entry in column i of A and b(j) be the number of non-zeroes in row j. We say i collides with j if a(i) = a(j). Let the n 10m. set of heavy rows be the set of rows j such that b(j) If we pick i 1,..., i d one by one. Conditioned on i 1,..., i t 1, the probability that a(i t ) is heavy is at least 9 10 d n 4 5. Therefore, by a Chernoff bound, with probability at least 9/10, the number of indices i t such that a(i t ) are heavy is at least 3d/4. We will show that conditioned on the number of such i t being at least 3d/4, with probability at least 9/10, two such indices collide. Let j 1,..., j 3d/4 be indices with b(a(j t )) n 10m. Conditioned on a(j 1 ),..., a(j t 1 ), the probability that j t does not collide with any previous index is at most t 1 1 b(a(j u ))/(n t + 1) + (t 1)/(n t + 1) e t 1 u=1 b(a(ju))/n+(t 1)/n e (t 1)/(10m)+(t 1)/n. u=1 Thus, the probability that no collision occurs is at most e ( (3d/4) /(40m))+((3d/4) /n) < 1/10. In other words, collision occurs with probability at least 9/10. When collision occurs, the number of non-zero entries of AM, where M is the matrix whose columns are e i1,..., e id, is at most d 1 so it has rank at most d 1. Therefore, with probability at least 4/5, A maps some non-zero vector in the subspace to the zero vector (any vector Mx for x ker(am)) and fails to preserve the length of all vectors in the subspace. 5 Lower Bound on Number of Rows for RIP Matrices In this section we show a lower bound on the number of rows of any k-rip matrix with distortion δ k. First we need the following form of the Chernoff bound. Theorem 17 (Chernoff bound). Let X 1,..., X n be independent random variables each at most K in magnitude almost surely, and with n i=1 EX i = µ and Var [ n i=1 X i ] = σ. Then [ ] n λ > 0, Pr X i µ > λσ < C max {e cλ, (λk/σ) cλσ/k} i=1 for some absolute constants c, C > 0. This form of the Chernoff bound can then be used to show the existence of a large errorcorrecting code with high relative distance. Lemma 18. For any 0 < ε 1/ and integers k, n with 1 k εn/, there exists a q-ary code with q = n/k and block length k of relative distance 1 ε, and with size at least min {e C ε n, e C εk log( εn k ) } for some absolute constant C > 0. 13

14 Proof. We take a random code. That is, pick N = min {e Cεn εn Cεk, e log( k ) } codewords with alphabet size q = n/k and block length k, with replacement. Now, look at two of these randomly chosen codewords. For i = 1,..., k, let X i be an indicator random variable for the event that the ith symbol is equal in the two codewords. Then X = k i=1 X i is the number of positions at which these two codewords agree, and EX = k /n εk/ and Var [ X ] k /n. Thus by the Chernoff bound, P ( X > εk) < C max {e cεn εn cεk, e log( k ) }. Therefore by a union bound, a random multiset of N codewords has relative distance 1 ε with positive probability (in which case it must also clearly be not just a multiset, but a set). Before proving the main theorem of this section, we also need the following theorem of Alon [Alo09]. Theorem 19 (Alon [Alo09]). Let x 1,..., x N R n be such that x i = 1 for all i, and x i, x j ε for all i j, where 1/ n < ε < 1/. Then n = Ω(ε log N/ log(1/ε)). Theorem 0. For any 0 < δ k 1/ and integers k, n with 1 k δ k n/, any k-rip matrix with distortion δ k must have Ω (min{n/ log(1/δ k ), (k/(δ k log(1/δ k ))) log(n/k)}) rows. Proof. Let C 1,..., C N be a code as in Lemma 18 with block length n/(k/) and alphabet size k/ with { ( N min e Cδ k n, e Cδ δk )} n kk log k. Consider a set of vectors y 1,..., y N in R n defined as follows. For j = 0,..., k/ 1, we define (y i ) jn/k+(ci ) j = /k, and all other coordinates of y i are 0. Then we have i y i = 1, and also 0 y i, y j δ k for all i j, and thus δ k y i y j. Since y i is k/-sparse and y i y j is k-sparse for all i, j, we have for any k-rip matrix A with distortion δ k i Ay i = 1 ± δ k, i j Ay i Ay j = (1 ± δ k ) ( ± δ k ) = ± 9δ k. Thus if we define x 1,..., x N by x i = Ay i / Ay i, then the x i satisfy the requirements of Theorem 19 with inner products at most O(δ k ) in magnitude. The lower bound on the number of rows of A then follows. It is also possible to obtain a lower bound on the number of rows of A in Theorem 0 of the form Ω(δ k k/ log(1/δ k)). This is because a theorem of [KW11] shows that any such RIP matrix with k = Θ(log n), when its column signs are flipped randomly, is a JL matrix for any set of n points with high probability. We then know from Theorem 19 that a JL matrix must have m = Ω(δ k log n/ log(1/δ k )) rows, which is Ω(δ k k/ log(1/δ k)). Corollary 1. Suppose 1/ n δ k 1/ and A R m n is a k-rip matrix with distortion δ k. Then m = Ω(log 1 (1/δ k ) min{k log(n/k)/δ k + k/δ k, n}). 14

15 6 Future Directions For several applications the JL lemma is used as a black box to obtain dimensionality-reducing linear maps for other problems. For example, applying the JL lemma with distortion O(δ k ) on a certain net with N = O ( n k) O(1/δk ) k vectors yields a k-rip matrix with distortion δ k [BDDW08]. Note in this case, for constant δ k, the number of rows one obtains is the optimal Θ(log N) = Θ(k log(n/k)). Applying the distributional JL lemma with distortion O(ε) to a certain net of size O(d) yields an OSE with m = O(d/ε ) rows to preserve d-dimensional subspaces (see [CW1, Fact 10], based on [AHK06]). Applying the JL lemma in this black-box way using the sparse JL matrices of [KN1] yields a factor-ε improvement in sparsity over using a random dense JL construction, with for example random Gaussian entries. However, some examples have shown that it is possible to do much better by not using the JL lemma statement as a black box, but rather by analyzing the sparsity required from the constructions in [KN1] from scratch for the problem at hand. For example, the work [NN1] showed that one can have column sparsity O(1/ε) with m = O(d 1+γ /ε ) rows in an OSE for any γ > 0, which is much better than the column sparsity O(d/ε) that is obtained by using the sparse JL theorem as a black box. We thus pose the following open problem in the realm of understanding sparse embedding matrices better. Let D be an OSNAP distribution [NN1] over R m n with column sparsity s. The class of OSNAP distributions includes both of the sparse JL distributions in [KN1], and more generally an OSNAP distribution is characterized by the following three properties where A is a random matrix drawn from D: All entries of A are in {0, 1/ s, 1/ s}. We write A i,j = δ i,j σ i,j / s where δ i,j is an indicator random variable for the event A i,j 0, and the σ i,j are independent uniform ±1 r.v. s. For any j [n], m i=1 δ i,j = s with probability 1. For any S [m] [n], E (i,j) S δ i,j (s/m) S. Given a set of vectors V R n, what is the tradeoff between the number of rows m and the column sparsity s required for a random matrix A drawn from an OSNAP distribution to preserve all l norms of vectors v V up to 1 ± ε simultaneously, with positive probability, as a function of the geometry of V? We are motivated to ask this question by a result of [KM05], which states that for a set of vectors V R n all of unit l norm, a matrix with random subgaussian entries preserves all l norms of vectors in V up to 1 ± ε as long as the number of rows m satisfies ( m Cε E g sup g, x ), (7) x V where g R n has independent Gaussian entries of mean 0 and variance 1. The bound on m in [KM05] is actually stated as Cε (γ (V, )) where γ is the γ functional, but this is equivalent to Eq. (7) up to a constant factor; see [Tal05] for details. Note Eq. (7) easily implies the m = O(d/ε ) bound for OSE s by letting V be the unit sphere in any d-dimensional subspace, and also implies m = O(δ k k log(n/k)) suffices for RIP matrices by letting V be the set of all k-sparse vectors of unit norm. Note that the resolution of this question will not just be in terms of the γ functional. In particular, for constant δ k we see that m, s = Θ((γ (V )) ) is necessary and sufficient when V is the 15

16 set of all unit norm k-sparse vectors. Even increasing m to Θ((γ (V )) +γ ) does not decrease the lower bound on s by much. Meanwhile for V a unit sphere of a d-dimensional subspace, we can simultaneously have m = O((γ (V )) +γ /ε ), and s = O(1/ε) not depending on γ (V ) at all. References [AC09] Nir Ailon and Bernard Chazelle. The Fast Johnson Lindenstrauss transform and approximate nearest neighbors. SIAM J. Comput., 39(1):30 3, 009. [Ach03] [AHK06] [AL09] [AL11] Dimitris Achlioptas. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. J. Comput. Syst. Sci., 66(4): , 003. Sanjeev Arora, Elad Hazan, and Satyen Kale. A fast random sampling algorithm for sparsifying matrices. In Proceedings of the 10th International Workshop on Randomization and Computation (RANDOM), pages 7 79, 006. Nir Ailon and Edo Liberty. Fast dimension reduction using Rademacher series on dual BCH codes. Discrete Comput. Geom., 4(4): , 009. Nir Ailon and Edo Liberty. Almost optimal unrestricted fast Johnson-Lindenstrauss transform. In Proceedings of the nd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages , 011. [Alo09] Noga Alon. Perturbed identity matrices have high rank: Proof and applications. Combinatorics, Probability & Computing, 18(1-):3 15, 009. [AV06] [BD08] [BDDW08] [BI09] [BIPW10] [BIR08] Rosa I. Arriaga and Santosh Vempala. An algorithmic theory of learning: Robust concepts and random projection. Machine Learning, 63():161 18, 006. Thomas Blumensath and Mike E. Davies. Iterative hard thresholding for compressed sensing. J. Fourier Anal. Appl., 14:69 654, 008. Richard Baraniuk, Mark Davenport, Ronald DeVore, and Michael Wakin. A simple proof of the restricted isometry property for random matrices. Constr. Approx., 8:53 63, 008. Radu Berinde and Piotr Indyk. Sequential sparse matching pursuit. In Proceedings of the 47th Annual Allerton Conference on Communication, Control, and Computing, pages 36 43, 009. Khanh Do Ba, Piotr Indyk, Eric Price, and David P. Woodruff. Lower bounds for sparse recovery. In Proceedings of the 1st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages , 010. Radu Berinde, Piotr Indyk, and Milan Ružic. Practical near-optimal sparse recovery in the L1 norm. In Proceedings of the 46th Annual Allerton Conference on Communication, Control, and Computing, pages ,

17 [BOR10] Vladimir Braverman, Rafail Ostrovsky, and Yuval Rabani. Rademacher chaos, random Eulerian graphs and the sparse Johnson-Lindenstrauss transform. CoRR, abs/ , 010. [Can08] Emmanuel J. Candès. The restricted isometry property and its implications for compressed sensing. C. R. Acad. Sci. Paris, 346:589 59, 008. [Cha10] Venkat B. Chandar. Sparse Graph Codes for Compression, Sensing, and Secrecy. PhD thesis, Massachusetts Institute of Technology, 010. [CRT06a] [CRT06b] Emmanuel J. Candès, Justin Romberg, and Terence Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory, (5): , 006. Emmanuel J. Candès, Justin Romberg, and Terence Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8), 006. [CT05] Emmanuel J. Candès and Terence Tao. Decoding by linear programming. IEEE Trans. Inf. Theory, 51(1): , 005. [CT06] [CW1] [DG03] [DKS10] Emmanuel J. Candès and Terence Tao. Near-optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inf. Theory, 5: , 006. Kenneth L. Clarkson and David P. Woodruff. Low rank approximation and regression in input sparsity time. CoRR, abs/ v, 01. Sanjoy Dasgupta and Anupam Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Struct. Algorithms, (1):60 65, 003. Anirban Dasgupta, Ravi Kumar, and Tamás Sarlós. A sparse Johnson-Lindenstrauss transform. In Proceedings of the 4nd ACM Symposium on Theory of Computing (STOC), pages , 010. [DMIMW1] Petros Drineas, Malik Magdon-Ismail, Michael Mahoney, and David Woodruff. Fast approximation of matrix coherence and statistical leverage. In Proceedings of the 9th International Conference on Machine Learning (ICML), 01. [Don06] David L. Donoho. Compressed sensing. IEEE Trans. Inf. Theory, 5(4): , 006. [DTDlS1] David L. Donoho, Yaakov Tsaig, Iddo Drori, and Jean luc Starck. Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit. IEEE Trans. Inf. Theory, 58: , 01. [FM88] Peter Frankl and Hiroshi Maehara. The Johnson-Lindenstrauss lemma and the sphericity of some graphs. J. Comb. Theory. Ser. B, 44(3):355 36, [Fou11] Simon Foucart. Hard thresholding pursuit: an algorithm for compressive sensing. SIAM J. Numer. Anal., 49(6): ,

18 [GG84] [GK09] Andrej Y. Garnaev and Efim D. Gluskin. On the widths of the Euclidean ball. Soviet Mathematics Doklady, 30:00 03, Rahul Garg and Rohit Khandekar. Gradient descent with sparsification: an iterative algorithm for sparse recovery with restricted isometry property. In Proceedings of the 6th Annual International Conference on Machine Learning (ICML), pages , 009. [Gor88] Yehoram Gordon. On Milman s inequality and random subspaces which escape through a mesh in R n. Geometric Aspects of Functional Analysis, pages , [IM98] [Ind01] [IR08] [JL84] [Kaš77] [KM05] [KN10] [KN1] [KW11] [LDP07] [Mat08] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the 30th ACM Symposium on Theory of Computing (STOC), pages , Piotr Indyk. Algorithmic applications of low-distortion geometric embeddings. In Proceedings of the 4nd Annual Symposium on Foundations of Computer Science (FOCS), pages 10 33, 001. Piotr Indyk and Milan Ružic. Near-optimal sparse recovery in the L1 norm. In Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages , 008. William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 6:189 06, Boris Sergeevich Kašin. The widths of certain finite-dimensional sets and classes of smooth functions. Izv. Akad. Nauk SSSR Ser. Mat., 41(): , 478, Bo az Klartag and Shahar Mendelson. Empirical processes and random projections. J. Funct. Anal., 5(1):9 45, 005. Daniel M. Kane and Jelani Nelson. A derandomized sparse Johnson-Lindenstrauss transform. CoRR, abs/ , 010. Daniel M. Kane and Jelani Nelson. Sparser Johnson-Lindenstrauss transforms. In SODA, pages , 01. Felix Krahmer and Rachel Ward. New and improved Johnson-Lindenstrauss embeddings via the Restricted Isometry Property. SIAM J. Math. Anal., 43(3): , 011. Michael Lustig, David Donoho, and John M. Pauly. Sparse MRI: The application of compressed sensing for rapid MR Imaging. Magnetic Resonance in Medicine, 58: , 007. Jirí Matousek. On variants of the Johnson-Lindenstrauss lemma. Random Struct. Algorithms, 33():14 156,

Dimensionality Reduction Notes 3

Dimensionality Reduction Notes 3 Dimensionality Reduction Notes 3 Jelani Nelson minilek@seas.harvard.edu August 13, 2015 1 Gordon s theorem Let T be a finite subset of some normed vector space with norm X. We say that a sequence T 0 T

More information

Sparse Johnson-Lindenstrauss Transforms

Sparse Johnson-Lindenstrauss Transforms Sparse Johnson-Lindenstrauss Transforms Jelani Nelson MIT May 24, 211 joint work with Daniel Kane (Harvard) Metric Johnson-Lindenstrauss lemma Metric JL (MJL) Lemma, 1984 Every set of n points in Euclidean

More information

Sparser Johnson-Lindenstrauss Transforms

Sparser Johnson-Lindenstrauss Transforms Sparser Johnson-Lindenstrauss Transforms Jelani Nelson Princeton February 16, 212 joint work with Daniel Kane (Stanford) Random Projections x R d, d huge store y = Sx, where S is a k d matrix (compression)

More information

Optimal compression of approximate Euclidean distances

Optimal compression of approximate Euclidean distances Optimal compression of approximate Euclidean distances Noga Alon 1 Bo az Klartag 2 Abstract Let X be a set of n points of norm at most 1 in the Euclidean space R k, and suppose ε > 0. An ε-distance sketch

More information

MAT 585: Johnson-Lindenstrauss, Group testing, and Compressed Sensing

MAT 585: Johnson-Lindenstrauss, Group testing, and Compressed Sensing MAT 585: Johnson-Lindenstrauss, Group testing, and Compressed Sensing Afonso S. Bandeira April 9, 2015 1 The Johnson-Lindenstrauss Lemma Suppose one has n points, X = {x 1,..., x n }, in R d with d very

More information

Sketching as a Tool for Numerical Linear Algebra

Sketching as a Tool for Numerical Linear Algebra Sketching as a Tool for Numerical Linear Algebra David P. Woodruff presented by Sepehr Assadi o(n) Big Data Reading Group University of Pennsylvania February, 2015 Sepehr Assadi (Penn) Sketching for Numerical

More information

The Johnson-Lindenstrauss Lemma Is Optimal for Linear Dimensionality Reduction

The Johnson-Lindenstrauss Lemma Is Optimal for Linear Dimensionality Reduction The Johnson-Lindenstrauss Lemma Is Optimal for Linear Dimensionality Reduction The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters

More information

16 Embeddings of the Euclidean metric

16 Embeddings of the Euclidean metric 16 Embeddings of the Euclidean metric In today s lecture, we will consider how well we can embed n points in the Euclidean metric (l 2 ) into other l p metrics. More formally, we ask the following question.

More information

CS 229r: Algorithms for Big Data Fall Lecture 17 10/28

CS 229r: Algorithms for Big Data Fall Lecture 17 10/28 CS 229r: Algorithms for Big Data Fall 2015 Prof. Jelani Nelson Lecture 17 10/28 Scribe: Morris Yau 1 Overview In the last lecture we defined subspace embeddings a subspace embedding is a linear transformation

More information

Randomized Algorithms

Randomized Algorithms Randomized Algorithms Saniv Kumar, Google Research, NY EECS-6898, Columbia University - Fall, 010 Saniv Kumar 9/13/010 EECS6898 Large Scale Machine Learning 1 Curse of Dimensionality Gaussian Mixture Models

More information

CS 229r: Algorithms for Big Data Fall Lecture 19 Nov 5

CS 229r: Algorithms for Big Data Fall Lecture 19 Nov 5 CS 229r: Algorithms for Big Data Fall 215 Prof. Jelani Nelson Lecture 19 Nov 5 Scribe: Abdul Wasay 1 Overview In the last lecture, we started discussing the problem of compressed sensing where we are given
