Optimality of the Johnson-Lindenstrauss Lemma


Kasper Green Larsen*        Jelani Nelson†

September 7, 2016

* Aarhus University. larsen@cs.au.dk. Supported by Center for Massive Data Algorithmics, a Center of the Danish National Research Foundation, grant DNRF84, a Villum Young Investigator Grant and an AUFF Starting Grant.
† Harvard University. minilek@seas.harvard.edu. Supported by NSF CAREER award CCF, NSF grant IIS, ONR Young Investigator award N, and a Google Faculty Research Award.

Abstract

For any integers d, n ≥ 2 and 1/(min{n, d})^{0.4999} < ε < 1, we show the existence of a set of n vectors X ⊂ ℝ^d such that any embedding f : X → ℝ^m satisfying

∀x, y ∈ X,  (1 − ε)‖x − y‖_2^2 ≤ ‖f(x) − f(y)‖_2^2 ≤ (1 + ε)‖x − y‖_2^2

must have

m = Ω(ε^{-2} lg n).

This lower bound matches the upper bound given by the Johnson-Lindenstrauss lemma [JL84]. Furthermore, our lower bound holds for nearly the full range of ε of interest, since there is always an isometric embedding into dimension min{d, n} (either the identity map, or projection onto span(X)). Previously such a lower bound was only known to hold against linear maps f, and not for such a wide range of parameters ε, n, d [LN16]. The best previously known lower bound for general f was m = Ω(ε^{-2} lg n / lg(1/ε)) [Wel74, Alo03], which is suboptimal for any ε = o(1).

1 Introduction

In modern algorithm design, data is often high-dimensional, and one seeks to first pre-process the data via some dimensionality reduction scheme that preserves geometry in a way that is acceptable for particular applications. The lower-dimensional embedded data has the benefit of requiring less storage, less communication bandwidth to be transmitted over a network, and less time to be analyzed by later algorithms. Such schemes have been applied to good effect in a diverse range of areas, such as streaming algorithms [Mut05], numerical linear algebra [Woo14], compressed sensing [CRT06, Don06], graph sparsification [SS11], clustering [BZMD15, CEM+15], nearest neighbor search [HIM12], and many others.

A cornerstone dimensionality reduction result is the following Johnson-Lindenstrauss (JL) lemma [JL84].

Theorem 1 (JL lemma). Let X ⊂ ℝ^d be any set of size n, and let ε ∈ (0, 1/2) be arbitrary. Then there exists a map f : X → ℝ^m for some m = O(ε^{-2} lg n) such that

∀x, y ∈ X,  (1 − ε)‖x − y‖_2^2 ≤ ‖f(x) − f(y)‖_2^2 ≤ (1 + ε)‖x − y‖_2^2.    (1)

Even though the JL lemma has found applications in a plethora of different fields over the past three decades, its optimality has still not been settled. In the original paper by Johnson and Lindenstrauss [JL84], it was proved that for ε smaller than some universal constant ε_0, there exist n-point sets X ⊂ ℝ^n for which any embedding f : X → ℝ^m providing (1) must have m = Ω(lg n). This was later improved by Alon [Alo03], who showed the existence of an n-point set X ⊂ ℝ^n such that any f providing (1) must have m = Ω(min{n, ε^{-2} lg n / lg(1/ε)}). This lower bound can also be obtained from the Welch bound [Wel74], which states ε^{2k} ≥ (1/(n − 1)) · (n/(m+k−1 choose k) − 1) for any positive integer k, by choosing 2k = lg n / lg(1/ε). The lower bound can also be extended to hold for any n ≤ e^{cε^2 d} for some constant c > 0. This bound falls short of the JL lemma for any ε = o(1).
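As a quick numerical illustration of guarantee (1) (indicative only: it uses a Gaussian random matrix, one standard way of realizing such an f, and the constant 16 in the choice of m is merely an example):

import numpy as np

# Indicative check of guarantee (1) for a Gaussian random linear map.
rng = np.random.default_rng(0)
n, d, eps = 200, 1000, 0.25
m = int(np.ceil(16 * np.log(n) / eps**2))        # m = O(eps^{-2} lg n); constant is illustrative

X = rng.standard_normal((n, d))                  # an arbitrary n-point set in R^d
Pi = rng.standard_normal((m, d)) / np.sqrt(m)    # E ||Pi u||_2^2 = ||u||_2^2 for every u
Y = X @ Pi.T                                     # embedded points f(x) = Pi x

worst = 0.0                                      # largest relative distortion of squared distances
for i in range(n):
    for j in range(i + 1, n):
        orig = np.sum((X[i] - X[j]) ** 2)
        emb = np.sum((Y[i] - Y[j]) ** 2)
        worst = max(worst, abs(emb / orig - 1.0))
print(f"m = {m}, worst relative distortion = {worst:.3f} (guarantee (1) asks for at most eps = {eps})")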

Our Contribution: In this paper, we finally settle the optimality of the JL lemma. Furthermore, we do so for almost the full range of ε.

Theorem 2. For any integers n, d ≥ 2 and ε ∈ (lg^{0.5001} n / √(min{n, d}), 1), there exists a set of points X ⊂ ℝ^d of size n, such that any map f : X → ℝ^m providing the guarantee (1) must have m = Ω(ε^{-2} lg(ε^2 n)).

Here it is worth mentioning that the JL lemma can be used to give an upper bound of m = O(min{n, d, ε^{-2} lg n}), where the d term is obvious (the identity map) and the n term follows by projecting onto the n-dimensional subspace spanned by X. Thus the requirement ε > √(lg n / min{n, d}) is necessary for such a lower bound to apply, up to the lg^{0.0001} n factor.

It is worth mentioning that the arguments in previous work [Wel74, Alo03, LN16] all produced hard point sets P which were incoherent: every x ∈ P had unit ℓ_2 norm, and for all x ≠ y ∈ P one had |⟨x, y⟩| = O(ε) (to be more precise, the argument in [LN16] produced a random point set which was hard to embed into low dimension with high probability, but was also incoherent with high probability; the arguments in [Wel74, Alo03] necessarily required P to be incoherent). Unfortunately though, it is known that for ε < 2^{-ω(√(lg n))}, any such incoherent point set admits an embedding with m = o(ε^{-2} lg n) satisfying (1), beating the guarantee of the JL lemma. The construction is based on Reed-Solomon codes (see for example [NNW14]). Thus proving Theorem 2 requires a very different construction of a hard point set when compared with previous work.

1.1 Related Results

Prior to our work, a result of the authors [LN16] showed an m = Ω(ε^{-2} lg n) bound in the restricted setting where f must be linear. This left open the possibility that the JL lemma could be improved upon by making use of nonlinear embeddings. Indeed, as mentioned above, even the hard instance of [LN16] enjoys the existence of a nonlinear embedding into m = o(ε^{-2} lg n) dimensions for ε < 2^{-ω(√(lg n))}. Furthermore, that result only provided hard instances with n ≤ poly(d), and furthermore n had to be sufficiently large (at least Ω(d^{1+γ}/ε^2) for a fixed constant γ > 0).

Also related is the so-called distributional JL (DJL) lemma. The original proof of the JL lemma in [JL84] is via random projection, i.e. one picks a uniformly random rotation U, then defines f(x) to be the projection of Ux onto its first m coordinates, scaled by √(d/m) in order to have the correct squared Euclidean norm in expectation. Note that this construction of f is both linear and oblivious to the data set X. Indeed, all known proofs of the JL lemma proceed by instantiating distributions D_{ε,δ} satisfying the guarantee of the below distributional JL (DJL) lemma.

Lemma 1 (Distributional JL (DJL) lemma). For any integer d ≥ 1 and any 0 < ε, δ < 1/2, there exists a distribution D_{ε,δ} over m × d real matrices for some m ≲ ε^{-2} lg(1/δ) such that

∀u ∈ ℝ^d,  P_{Π ∼ D_{ε,δ}}( | ‖Πu‖_2^2 − ‖u‖_2^2 | > ε‖u‖_2^2 ) < δ.    (2)

One then proves the JL lemma by proving the DJL lemma with δ < 1/(n choose 2), then performing a union bound over all u ∈ {x − y : x, y ∈ X} to argue that Π preserves the norms of all such difference vectors simultaneously with positive probability. It is known that the DJL lemma is tight [JW13, KMN11]; namely any distribution D_{ε,δ} over m × d matrices satisfying (2) must have m = Ω(min{d, ε^{-2} lg(1/δ)}). Note though that, prior to our current work, it may have been possible to improve upon the JL lemma by avoiding the DJL lemma. Our main result implies that, unfortunately, this is not the case: obtaining (1) via the DJL lemma is optimal.
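As a small numerical illustration of the DJL guarantee (2) (indicative only: it fixes Π to have i.i.d. N(0, 1/m) entries, in which case ‖Πu‖_2^2 for a unit vector u has the law χ^2_m/m), one can watch the failure probability δ decay exponentially in mε^2, matching m ≈ ε^{-2} lg(1/δ):

import numpy as np

# For Pi with i.i.d. N(0, 1/m) entries and a fixed unit vector u, ||Pi u||_2^2
# is distributed as chi^2_m / m, so we can sample it directly.
rng = np.random.default_rng(1)
eps, trials = 0.2, 200_000
for m in (25, 50, 100, 200, 400):
    norms = rng.chisquare(m, size=trials) / m            # samples of ||Pi u||_2^2
    delta_hat = np.mean(np.abs(norms - 1.0) > eps)       # empirical P(| ||Pi u||^2 - 1 | > eps)
    print(f"m = {m:4d}   estimated delta = {delta_hat:.5f}")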

2 Proof Overview

In the following, we give a high-level introduction to the main ideas in our proof. The proof goes via a counting argument. More specifically, we construct a large family 𝒫 = {P_1, P_2, ...} of very different sets of n points in ℝ^d. We then assume all point sets in 𝒫 can be embedded into ℝ^m while preserving all pairwise distances to within (1 ± ε). Letting f_1(P_1), f_2(P_2), ... denote the embedded point sets, we then argue that our choice of 𝒫 ensures that any two f_i(P_i) and f_j(P_j) must be very different. If m is too low, this is impossible, as there are not enough sufficiently different point sets in ℝ^m.

In greater detail, the point sets in 𝒫 are chosen as follows: Let e_1, ..., e_d denote the standard unit vectors in ℝ^d. For now, assume that d = n/lg(1/ε) and ε ∈ (0, 1). For any set S ⊆ [d] of k = ε^{-2}/c_0^2 indices, define a vector y_S := Σ_{j∈S} e_j/√k. Here c_0 is a sufficiently large constant. A vector y_S has the property that ⟨y_S, e_j⟩ = 0 if j ∉ S and ⟨y_S, e_j⟩ = c_0 ε if j ∈ S. The crucial property here is that there is a gap of c_0 ε between the inner products depending on whether or not j ∈ S.

Now if f is a mapping to ℝ^m that satisfies the JL-property (1) for P = {0, e_1, ..., e_d, y_S}, then first off, we can assume f(0) = 0 since pairwise distances are translation invariant. From this it follows that f must preserve norms of the vectors x ∈ P to within (1 ± ε), since

(1 − ε)‖x‖_2^2 = (1 − ε)‖x − 0‖_2^2 ≤ ‖f(x) − f(0)‖_2^2 = ‖f(x)‖_2^2 ≤ (1 + ε)‖x − 0‖_2^2 = (1 + ε)‖x‖_2^2.

We then have that f must preserve inner products ⟨e_j, y_S⟩ up to an additive O(ε). This can be seen from the following calculation, where v ± X denotes the interval [v − X, v + X]:

‖f(e_j) − f(y_S)‖_2^2 = ‖f(e_j)‖_2^2 + ‖f(y_S)‖_2^2 − 2⟨f(e_j), f(y_S)⟩
⟹ 2⟨f(e_j), f(y_S)⟩ ∈ (1 ± ε)‖e_j‖_2^2 + (1 ± ε)‖y_S‖_2^2 − (1 ± ε)‖e_j − y_S‖_2^2
⟹ 2⟨f(e_j), f(y_S)⟩ ∈ 2⟨e_j, y_S⟩ ± ε(‖e_j‖_2^2 + ‖y_S‖_2^2 + ‖e_j − y_S‖_2^2)
⟹ ⟨f(e_j), f(y_S)⟩ ∈ ⟨e_j, y_S⟩ ± 4ε.

This means that after applying f, there remains a gap of (c_0 − 8)ε = Ω(ε) between ⟨f(e_j), f(y_S)⟩ depending on whether or not j ∈ S.

With this observation, we are ready to describe the point sets in 𝒫. Let Q = n − d − 1. For every choice of Q sets S_1, ..., S_Q ⊆ [d] of k indices each, we add a point set P to 𝒫. The point set P is simply {0, e_1, ..., e_d, y_{S_1}, ..., y_{S_Q}}. This gives us a family 𝒫 of size (d choose k)^Q. If we look at JL embeddings f_1(P_1), f_2(P_2), ... for all of these point sets, then intuitively these embeddings have to be quite different. This is true since f_i(P_i) uniquely determines P_i, simply by computing all inner products between the f_i(e_j)'s and the f_i(y_{S_ℓ})'s. The problem we now face is that there are infinitely many sets of n points in ℝ^m that one can embed to. We thus need to discretize ℝ^m in a careful manner and argue that there are not enough n-sized sets of points in this discretization to uniquely embed each P_i when m is too low.

Encoding Argument. To give a formal proof that there are not enough ways to embed the point sets in 𝒫 into ℝ^m when m is low, we give an encoding argument. More specifically, we assume that it is possible to embed every point set in 𝒫 into ℝ^m while preserving pairwise distances to within (1 ± ε). We then present an algorithm that, based on this assumption, can take any point set P_i ∈ 𝒫 and encode it into a bit string of length O(nm). The encoding guarantees that P_i can be uniquely recovered from the encoding. The encoding algorithm thus effectively defines an injective mapping g from 𝒫 to {0, 1}^{O(nm)}. Since g is injective, we must have |𝒫| ≤ 2^{O(nm)}. But |𝒫| = (d choose k)^Q ≥ (ε^2 n/lg(1/ε))^{ε^{-2}(n − n/lg(1/ε) − 1)/c_0^2}, and we can conclude m = Ω(ε^{-2} lg(ε^2 n/lg(1/ε))). For ε > 1/n^{0.4999}, this is m = Ω(ε^{-2} lg n).
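As a small numerical illustration of why an embedded copy of such a point set determines S (indicative only: the map f below is a Gaussian JL projection standing in for an arbitrary f with f(0) = 0 satisfying (1), and all parameters are merely examples):

import numpy as np

# Build P = {0, e_1, ..., e_d, y_S}, embed it, and recover S from the embedded
# inner products <f(e_j), f(y_S)> by thresholding at c_0*eps/2.
rng = np.random.default_rng(2)
eps, c0, d = 0.05, 4, 500
k = round(1 / (c0 * eps) ** 2)                    # k = eps^{-2}/c_0^2, here 25
S = set(rng.choice(d, size=k, replace=False).tolist())

y_S = np.zeros(d)
y_S[list(S)] = 1 / np.sqrt(k)                     # <e_j, y_S> = c_0*eps for j in S, else 0

m = 4000                                          # roughly eps^{-2} lg d for these parameters
Pi = rng.standard_normal((m, d)) / np.sqrt(m)     # Gaussian JL map; linear, so f(0) = 0

fy = Pi @ y_S                                     # f(y_S)
inner = Pi.T @ fy                                 # inner[j] = <f(e_j), f(y_S)>
S_recovered = set(np.nonzero(inner > c0 * eps / 2)[0].tolist())
print("recovered S correctly:", S_recovered == S)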
The difficult part is to design an encoding algorithm that yields an encoding of size O(nm) bits. A natural first attempt would go as follows: Recall that any JL-embedding f_i for a point set P_i ∈ 𝒫 must preserve the gaps in the ⟨f_i(e_j), f_i(y_{S_ℓ})⟩'s depending on whether or not j ∈ S_ℓ. This follows simply from preserving distances to within a factor (1 ± ε). If we can give an encoding that allows us to recover approximations f̂_i(e_j) of f_i(e_j) and f̂_i(y_{S_ℓ}) of f_i(y_{S_ℓ}) such that ‖f̂_i(e_j) − f_i(e_j)‖_2 ≤ ε and ‖f̂_i(y_{S_ℓ}) − f_i(y_{S_ℓ})‖_2 ≤ ε, then by the triangle inequality, the distance ‖f̂_i(e_j) − f̂_i(y_{S_ℓ})‖_2^2 is also a (1 + O(ε)) approximation to ‖e_j − y_{S_ℓ}‖_2^2, and the gap between inner products would be preserved. To encode sufficiently good approximations f̂_i(e_j) and f̂_i(y_{S_ℓ}), one could do as follows: Since norms are roughly preserved by f_i, we must have ‖f_i(e_j)‖_2^2, ‖f_i(y_{S_ℓ})‖_2^2 ≤ 1 + ε. Letting B_2^m denote the ℓ_2 unit ball in ℝ^m, we could choose some fixed covering C_2 of (1 + ε)B_2^m with translated copies of εB_2^m. Since f_i(e_j), f_i(y_{S_ℓ}) ∈ (1 + ε)B_2^m, we can find translations c_2(f_i(e_j)) + εB_2^m and c_2(f_i(y_{S_ℓ})) + εB_2^m of εB_2^m in C_2 such that these balls contain f_i(e_j) and f_i(y_{S_ℓ}) respectively. Letting f̂_i(e_j) = c_2(f_i(e_j)) and f̂_i(y_{S_ℓ}) = c_2(f_i(y_{S_ℓ})) be the centers of these balls, we can encode an approximation of f_i(e_j) and f_i(y_{S_ℓ}) using lg|C_2| bits by specifying indices into C_2. Unfortunately, covering (1 + ε)B_2^m by εB_2^m needs |C_2| = 2^{Ω(m lg(1/ε))}, since the volume ratio between (1 + ε)B_2^m and εB_2^m is (1/ε)^{Ω(m)}. The lg(1/ε) factor loss leaves us with a lower bound on m of no more than m = Ω(ε^{-2} lg(ε^2 n/lg(1/ε))/lg(1/ε)), roughly recovering the lower bound of Alon [Alo03] by a different argument.

The key idea to reduce the length of the encoding to O(nm) is as follows: First observe that we chose d = n/lg(1/ε). Thus we can spend up to O(m lg(1/ε)) bits encoding each of the f_i(e_j)'s. Thus we simply encode approximations f̂_i(e_j) by specifying indices into a covering C_2 of (1 + ε)B_2^m by εB_2^m as outlined above. For the f_i(y_{S_ℓ})'s, we have to be more careful. First, we define the d × m matrix A having the f̂_i(e_j) as rows. Note that this matrix can be reconstructed from the part of the encoding specifying the f̂_i(e_j)'s. Now observe that the j-th coordinate of Af_i(y_{S_ℓ}) is within O(ε) of ⟨e_j, y_{S_ℓ}⟩. The coordinates of Af_i(y_{S_ℓ}) thus determine S_ℓ. We therefore seek to encode Af_i(y_{S_ℓ}) efficiently. To encode Af_i(y_{S_ℓ}), first note that ‖Af_i(y_{S_ℓ})‖_∞ = O(ε). If W denotes the (at most m-dimensional) subspace spanned by the columns of A, we also have that Af_i(y_{S_ℓ}) ∈ W. Now define the convex body T := B_∞^d ∩ W, where B_∞^d denotes the ℓ_∞ unit cube in ℝ^d. Then Af_i(y_{S_ℓ}) ∈ O(ε) · T. Now recall that there is a gap of Ω(ε) between the inner products ⟨f̂_i(e_j), f_i(y_{S_ℓ})⟩ depending on whether j ∈ S_ℓ or not. Letting c_1 be a constant such that the gap is more than 2c_1 ε, this implies that if we approximate Af_i(y_{S_ℓ}) by a point f̂_i(y_{S_ℓ}) such that (f̂_i(y_{S_ℓ}) − Af_i(y_{S_ℓ})) ∈ c_1 ε · B_∞^d, then the coordinates of f̂_i(y_{S_ℓ}) still uniquely determine the indices j ∈ S_ℓ. Exploiting that Af_i(y_{S_ℓ}) ∈ O(ε) · T, we therefore create a covering C_∞ of O(ε) · T by translated copies of c_1 ε · T and approximate Af_i(y_{S_ℓ}) by a convex body in C_∞ containing it. The crucial property of this construction is that the volume ratio between O(ε) · T and c_1 ε · T is only 2^{O(m)}, and we can have |C_∞| = 2^{O(m)}. Specifying indices into C_∞ thus costs only O(m) bits, and we have obtained the desired encoding algorithm.
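As a rough tally of the resulting bit budget (the constants here are only indicative):

% Cost of the two-part encoding, using d = n/lg(1/eps) and Q = n - d - 1 < n:
\begin{align*}
\underbrace{d \cdot \lg|C_2|}_{\text{the } \hat f_i(e_j)\text{'s}}
  \;+\; \underbrace{Q \cdot \lg|C_\infty|}_{\text{the } A f_i(y_{S_\ell})\text{'s}}
  \;=\; \frac{n}{\lg(1/\varepsilon)} \cdot O\!\big(m \lg(1/\varepsilon)\big) \;+\; Q \cdot O(m)
  \;=\; O(nm).
\end{align*}

Indexing all n points into C_2 instead would cost O(nm lg(1/ε)) bits, which is exactly the lg(1/ε) loss of the first attempt.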
Handling Small d. In the proof sketch above, we assumed d = n/lg(1/ε). As d went into |𝒫|, the above argument falls apart when d is much smaller than n, and would only yield a lower bound of m = Ω(ε^{-2} lg(ε^2 d/lg(1/ε))). Fortunately, we do not have to use completely orthogonal vectors e_1, ..., e_d in our construction. What we really need is a base set of vectors x_1, ..., x_∆ and many subsets S of k indices in [∆], such that there is a gap of Ω(ε) between the inner products ⟨x_j, Σ_{i∈S} x_i/√k⟩ depending on whether or not j ∈ S. To construct such x_1, ..., x_∆ when d is small, we argue probabilistically: We pick x_1, ..., x_∆ as uniform random standard gaussians scaled by a factor 1/√d, i.e. each coordinate of each x_i is independently N(0, 1/d) distributed. We show that with non-zero probability, at least half of all k-sized sets S of indices in [∆] have the desired property of sufficiently large gaps in the inner products. The proof of this is given in Section 4. Here we also want to remark that it would be more natural to require that all k-sized sets of indices S have the gap property. Unfortunately, this would require us to union bound over (∆ choose k) different subsets, and we would obtain a weaker lower bound for small d.
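As a small numerical illustration of this probabilistic construction (indicative only: the parameters below are arbitrary examples, and the check is over a few random subsets rather than all of them):

import numpy as np

# Draw N vectors with i.i.d. N(0, 1/d) coordinates and check the two quantities
# that k-wise mu-incoherence (Definition 1 below) controls.
rng = np.random.default_rng(3)
d, N, k, mu = 4000, 300, 25, 0.1

X = rng.standard_normal((N, d)) / np.sqrt(d)

norm_dev = np.abs(np.sum(X * X, axis=1) - 1.0).max()
print(f"max | ||x_j||^2 - 1 | = {norm_dev:.4f}  (want at most mu = {mu})")

worst = 0.0
for _ in range(20):                                   # a few random k-subsets S
    S = rng.choice(N, size=k, replace=False)
    x_S = X[S].sum(axis=0) / np.sqrt(k)               # sum_{i in S} x_i / sqrt(k)
    for j in range(N):
        rest = x_S - X[j] / np.sqrt(k) if j in S else x_S   # drop x_j's own term when j in S
        worst = max(worst, abs(X[j] @ rest))
print(f"max |<x_j, sum_(i in S, i != j) x_i/sqrt(k)>| = {worst:.4f}  (want at most mu = {mu})")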

3 Preliminaries on Covering Convex Bodies

We here state a standard result on covering numbers. The proof is via a volume comparison argument; see for example [Pis89, Equation (5.7)].

Lemma 2. Let E be an m-dimensional normed space, and let B_E denote its unit ball. For any 0 < ε < 1, one can cover B_E using at most 2^{m lg(1+2/ε)} translated copies of εB_E.

Corollary 1. Let T be an origin symmetric convex body in ℝ^m. For any 0 < ε < 1, one can cover T using at most 2^{m lg(1+2/ε)} translated copies of εT.

Proof. The Minkowski functional of an origin symmetric convex body T, when restricted to the subspace spanned by the vectors in T, is a norm for which T is the unit ball (see e.g. [Tho96, Proposition 1.1.8]). It thus follows from Lemma 2 that T can be covered using at most 2^{m lg(1+2/ε)} translated copies of εT.

In the remainder of the paper, we often use the notation B_p^d to denote the unit ℓ_p ball in ℝ^d.

4 Nearly Orthogonal Vectors

In the proof overview in Section 2, we argued towards the end that we need to show the existence of a large set of base vectors x_1, ..., x_∆ and many k-sized sets of indices S ⊆ [∆], such that there is a gap between ⟨x_i, Σ_{j∈S} x_j/√k⟩ depending on whether or not i ∈ S. Below we formally define such collections of base vectors and state two lemmas regarding how large sets we can construct and what properties such sets have. We defer the proofs of the lemmas to Section 6.

Definition 1. Let X = {x_1, ..., x_N} be a set of vectors in ℝ^d and F a collection of k-sized subsets of [N]. For any 0 < µ < 1 and integer k ≥ 1, we say that (X, F) is k-wise µ-incoherent if for every vector x_j ∈ X, the following holds:

1. | ‖x_j‖_2^2 − 1 | ≤ µ.
2. For every S ∈ F, it holds that |⟨x_j, Σ_{i∈S: i≠j} x_i/√k⟩| ≤ µ.

Our first step is to show the existence of a large (X, F) that is k-wise µ-incoherent:

Lemma 3. For any 0 < µ < 1, any 1 ≤ N ≤ max{d, e^{O(µ^2 d)}} and any integer 1 ≤ k ≤ N, there exists (X, F) with X ⊂ ℝ^d and |X| = N, such that (X, F) is k-wise µ-incoherent and |F| ≥ (N choose k)/2.

This lemma is proved in Section 6. The following property of k-wise µ-incoherent pairs (X, F) plays a crucial role in our lower bound proof:

Lemma 4. Let (X, F) be k-wise µ-incoherent for some 0 < µ < 1 and k ≥ 1. Let S ∈ F and define y = Σ_{i∈S} x_i/√k. Then y satisfies:

1. ‖y‖_2^2 ≤ 1 + (√k + 1)µ.
2. For a vector x_j ∈ X such that j ∉ S, we have |⟨y, x_j⟩| ≤ µ.
3. For a vector x_j ∈ X such that j ∈ S, we have (1 − µ)/√k − µ ≤ ⟨y, x_j⟩ ≤ (1 + µ)/√k + µ.

The proof of this lemma can also be found in Section 6.

5 Lower Bound Proof

The goal of this section is to prove Theorem 2:

Restatement of Theorem 2. For any integers n, d ≥ 2 and ε ∈ (lg^{0.5001} n / √(min{n, d}), 1), there exists a set of points X ⊂ ℝ^d of size n, such that any map f : X → ℝ^m providing the guarantee (1) must have m = Ω(ε^{-2} lg(ε^2 n)).

We assume throughout the proof that ε ∈ (lg^{0.5001} n / √(min{n, d}), 1) as required in the theorem. We now describe a collection of point sets 𝒫 = {P_1, ...} for which at least one point set P_i ∈ 𝒫 cannot be embedded into o(ε^{-2} lg(ε^2 n)) dimensions while preserving all pairwise distances to within a factor (1 ± ε). Let µ = ε/2, k = ⌈ε^{-2}/2^{20}⌉ and ∆ = n/lg(1/ε). For this choice of µ, k and ∆, and using ε > lg^{0.5001} n / √(min{n, d}), we have that k < ∆/2 and e^{Ω(µ^2 d)} = ω(n). Hence it follows from Lemma 3 that there exists (X, F) that is k-wise µ-incoherent with |X| = ∆ and |F| ≥ (∆ choose k)/2.

For every sequence of Q = n − ∆ − 1 sets S_1, ..., S_Q ∈ F, we add a point set P to 𝒫. For sets S_1, ..., S_Q we construct P as follows: First we add the zero-vector and X to P. Let x_1, ..., x_∆ denote the vectors in X. The remaining Q vectors are denoted y_1, ..., y_Q and are defined as y_i = Σ_{j∈S_i} x_j/√k. The point set P is thus the ordered sequence of points {0, x_1, ..., x_∆, y_1, ..., y_Q}. This concludes the description of the hard point sets. Observe that |𝒫| ≥ ((∆ choose k)/2)^Q.

Also, Lemma 4 implies the following for our choice of k and µ:

Corollary 2. Let (X, F) be ⌈ε^{-2}/2^{20}⌉-wise (ε/2)-incoherent. Let S ∈ F and define y = Σ_{i∈S} x_i/√k. Then y satisfies:

1. ‖y‖_2^2 ≤ 2.
2. For a vector x_j ∈ X such that j ∉ S, we have |⟨y, x_j⟩| ≤ ε/2.
3. For a vector x_j ∈ X such that j ∈ S, we have 2^8 ε ≤ ⟨y, x_j⟩ ≤ 2^{11} ε.

Proof. The first item follows since 1 + (√k + 1)µ ≤ 1 + 2^{-11} + ε/2 < 2. The second item follows by the choice of µ and item 2 from Lemma 4. For the third item: (1 − µ)/√k − µ ≥ 2^{10} ε − 2^9 ε^2 − ε/2 ≥ 2^8 ε, and (1 + µ)/√k + µ ≤ 2^{10} ε + 2^9 ε^2 + ε/2 ≤ 2^{11} ε.

Our lower bound follows via an encoding argument. More specifically, we assume that for every set P ⊂ ℝ^d of n points, there exists an embedding f_P : P → ℝ^m satisfying:

∀x, y ∈ P:  (1 − ε)‖x − y‖_2^2 ≤ ‖f_P(x) − f_P(y)‖_2^2 ≤ (1 + ε)‖x − y‖_2^2    (3)

Under this assumption, we show how to encode and decode a set of vectors P_i ∈ 𝒫 using O(nm) bits. Since |𝒫| ≥ ((∆ choose k)/2)^Q, any encoding that allows unique recovery of the vector sets in 𝒫 must necessarily use lg|𝒫| ≥ Q(lg(∆ choose k) − 1) ≥ Q(k lg(∆/k) − 1) bits on at least one point set P_i ∈ 𝒫. This will establish a lower bound on m.

Encoding Algorithm. First, the encoder and decoder agree on a covering C_2 of 2B_2^m by translated copies of εB_2^m. Lemma 2 guarantees that there exists such a covering C_2 with |C_2| ≤ 2^{m lg(1+4/ε)}. Now, given a set of vectors/points P ∈ 𝒫, where P = {0, x_1, ..., x_∆, y_1, ..., y_Q}, the encoder first applies f_P to all points in P. Henceforth we abbreviate f_P as f. Since pairwise distances are invariant under translation, we assume wlog. that f(0) = 0. Since f satisfies (3), we must have

∀x_i ∈ P : ‖f(x_i) − f(0)‖_2^2 ≤ (1 + ε)‖x_i‖_2^2
⟹ ∀x_i ∈ P : ‖f(x_i)‖_2^2 ≤ (1 + ε)‖x_i‖_2^2 ≤ (1 + ε)(1 + ε/2) ≤ 4
⟹ ∀x_i ∈ P : ‖f(x_i)‖_2 ≤ 2.

Take each x_i ∈ P in turn, find a ball in C_2 containing f(x_i), and let c_2(x_i) denote that ball's center. Write down c_2(x_i) as an index into C_2. This costs ∆m lg(1 + 4/ε) bits when summed over all x_i. Next, let A denote the ∆ × m matrix having one row per x_i, where the i-th row is c_2(x_i). Let W denote the subspace of ℝ^∆ spanned by the columns of A. We have dim(W) ≤ m. Define T as the convex body T := B_∞^∆ ∩ W. That is, T is the intersection of the subspace W with the ∆-dimensional ℓ_∞ unit ball B_∞^∆. Now let C_∞ be a minimum cardinality covering of (2^{12} ε)T by translated copies of εT, computed by any deterministic procedure that depends only on T. Since T is origin symmetric, by Corollary 1 it follows that |C_∞| ≤ 2^{m lg(1+2^{13})}. To encode the vectors y_1, ..., y_Q, we make use of the following lemma, whose proof we give in Section 5.1:

Lemma 5. For every x_j and y_i in P, we have |⟨c_2(x_j), f(y_i)⟩ − ⟨x_j, y_i⟩| ≤ 10ε.

From Lemma 5 and Corollary 2, it follows that |⟨c_2(x_j), f(y_i)⟩| ≤ 10ε + 2^{11} ε < 2^{12} ε for every x_j and y_i in P. Since the j-th coordinate of Af(y_i) equals ⟨c_2(x_j), f(y_i)⟩, it follows that Af(y_i) ∈ (2^{12} ε)T. Using this fact, we encode each y_i by finding some vector c_∞(y_i) such that c_∞(y_i) + εT is a convex shape in the covering C_∞ and Af(y_i) ∈ c_∞(y_i) + εT. We write down c_∞(y_i) as an index into C_∞. This costs a total of Qm lg(1 + 2^{13}) < 14Qm bits over all y_i. We now describe our decoding algorithm.

Decoding Algorithm. To recover P = {0, x_1, ..., x_∆, y_1, ..., y_Q} from the above encoding, we only have to recover y_1, ..., y_Q, as {0, x_1, ..., x_∆} is the same for all P ∈ 𝒫. We first reconstruct the matrix A. We can do this since C_2 was chosen independently of P, and thus from the indices encoded into C_2 we recover c_2(x_i) for i = 1, ..., ∆. These are the rows of A. Then given A, we know T. Knowing T, we compute C_∞, since it was constructed via a deterministic procedure depending only on T. This finally allows us to recover c_∞(y_1), ..., c_∞(y_Q). What remains is to recover y_1, ..., y_Q. Since y_i is uniquely determined from the set S_i ⊆ {1, ..., ∆} of k indices, we focus on recovering this set of indices for each y_i. For i = 1, ..., Q, recall that Af(y_i) is in c_∞(y_i) + εT. Observe now that:

Af(y_i) ∈ c_∞(y_i) + εT ⟹ Af(y_i) − c_∞(y_i) ∈ εT ⟹ ‖Af(y_i) − c_∞(y_i)‖_∞ ≤ ε.

But the j-th coordinate of Af(y_i) is ⟨c_2(x_j), f(y_i)⟩. We combine the above with Lemma 5 to deduce |(c_∞(y_i))_j − ⟨x_j, y_i⟩| ≤ 11ε for all j. From Corollary 2, it follows that |(c_∞(y_i))_j| < 12ε for j ∉ S_i and (c_∞(y_i))_j > 2^7 ε for j ∈ S_i. We finally conclude that the set S_i, and thus y_i, is uniquely determined from c_∞(y_i).

Analysis. We finally analyse the size of the encoding produced by the above procedure and derive a lower bound on m. Recall that the encoding procedure produces a total of ∆m lg(1 + 4/ε) + 14Qm ≤ ∆m lg(5/ε) + 14Qm < ∆m lg(1/ε) + 17nm bits. We chose ∆ = n/lg(1/ε), and the number of bits is thus no more than 18nm. But |𝒫| ≥ ((∆ choose k)/2)^Q ≥ (∆/(2k))^{kQ} = (∆/(2k))^{k(n−∆−1)} ≥ (∆/(2k))^{kn/2}. We therefore must have 18nm ≥ (kn/2) lg(∆/(2k)), i.e.

m = Ω(ε^{-2} lg(ε^2 n/lg(1/ε))).

Since we assume ε > lg^{0.5001} n/√n, this can be simplified to

m = Ω(ε^{-2} lg(ε^2 n)).

This concludes the proof of Theorem 2.
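In display form, the final calculation above reads as follows (with the constants already fixed in the analysis, and the factor 2^{19} coming from ∆/(2k) absorbed into the Ω):

\begin{align*}
18nm \;\ge\; \frac{kn}{2}\,\lg\frac{\Delta}{2k}
\quad\Longrightarrow\quad
m \;\ge\; \frac{k}{36}\,\lg\frac{\Delta}{2k}
  \;=\; \frac{\varepsilon^{-2}}{36\cdot 2^{20}}\,
        \lg\!\left(2^{19}\,\frac{\varepsilon^{2}n}{\lg(1/\varepsilon)}\right)
  \;=\; \Omega\!\left(\varepsilon^{-2}\lg\frac{\varepsilon^{2}n}{\lg(1/\varepsilon)}\right),
\end{align*}

and ε > lg^{0.5001} n/√n guarantees that lg(ε^2 n/lg(1/ε)) = Ω(lg(ε^2 n)).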

5.1 Proof of Lemma 5

In this section, we prove the lemma:

Restatement of Lemma 5. For every x_j and y_i in P, we have |⟨c_2(x_j), f(y_i)⟩ − ⟨x_j, y_i⟩| ≤ 10ε.

Proof. First note that:

⟨c_2(x_j), f(y_i)⟩ = ⟨c_2(x_j) − f(x_j) + f(x_j), f(y_i)⟩
                  = ⟨f(x_j), f(y_i)⟩ + ⟨c_2(x_j) − f(x_j), f(y_i)⟩
                  ∈ ⟨f(x_j), f(y_i)⟩ ± ‖c_2(x_j) − f(x_j)‖_2 · ‖f(y_i)‖_2.

Since C_2 was a covering with translated copies of εB_2^m, we have ‖c_2(x_j) − f(x_j)‖_2 ≤ ε. Furthermore, since f satisfies (3), we have from Corollary 2 and 0 ∈ P that ‖f(y_i)‖_2^2 ≤ (1 + ε)‖y_i‖_2^2 ≤ 2(1 + ε), hence ‖f(y_i)‖_2 ≤ 2. We thus have:

⟨c_2(x_j), f(y_i)⟩ ∈ ⟨f(x_j), f(y_i)⟩ ± 2ε.    (4)

To bound ⟨f(x_j), f(y_i)⟩, observe that

‖f(x_j) − f(y_i)‖_2^2 = ‖f(x_j)‖_2^2 + ‖f(y_i)‖_2^2 − 2⟨f(x_j), f(y_i)⟩.

This implies that

2⟨f(x_j), f(y_i)⟩ ∈ ‖x_j‖_2^2 (1 ± ε) + ‖y_i‖_2^2 (1 ± ε) − ‖x_j − y_i‖_2^2 (1 ± ε)
                 ⊆ 2⟨x_j, y_i⟩ ± ε(‖x_j‖_2^2 + ‖y_i‖_2^2 + ‖x_j − y_i‖_2^2).

That is, we have

⟨f(x_j), f(y_i)⟩ ∈ ⟨x_j, y_i⟩ ± 2ε(‖x_j‖_2^2 + ‖y_i‖_2^2).

Using Corollary 2, we have

⟨f(x_j), f(y_i)⟩ ∈ ⟨x_j, y_i⟩ ± 2ε((1 + ε/2) + 2) ⊆ ⟨x_j, y_i⟩ ± 8ε.

Inserting this in (4), we obtain

⟨c_2(x_j), f(y_i)⟩ ∈ ⟨x_j, y_i⟩ ± 10ε.

6 Proofs of Lemmas on Nearly Orthogonal Vectors

In the following, we prove the lemmas from Section 4.

Restatement of Lemma 3. For any 0 < µ < 1, any 1 ≤ N ≤ max{d, e^{O(µ^2 d)}} and any integer 1 ≤ k ≤ N, there exists (X, F) with X ⊂ ℝ^d and |X| = N, such that (X, F) is k-wise µ-incoherent and |F| ≥ (N choose k)/2.

Proof. When d is the maximum in the bound N ≤ max{d, e^{O(µ^2 d)}}, the lemma follows by setting X = {e_1, ..., e_N} and letting F be the collection of all k-sized subsets of [N]. For the other case, we prove the lemma by letting X be a set of N independent gaussian vectors in ℝ^d, each with identity covariance matrix, scaled by a factor 1/√d, i.e. each coordinate of each vector is independently N(0, 1/d) distributed. We then show that as long as µ, N and k satisfy the requirements of the lemma, then with non-zero probability, we can find a collection F of at least (N choose k)/2 k-sized subsets of [N] which makes (X, F) k-wise µ-incoherent.

Let x_1, x_2, ..., x_N denote the vectors in X and consider any fixed x_i. First note that ‖x_i‖_2^2 ∼ (1/d)χ^2_d. We thus get from tail bounds on the chi-squared distribution that P(| ‖x_i‖_2^2 − 1 | > µ) < e^{−Ω(µ^2 d)}. Let E_i denote the event | ‖x_i‖_2^2 − 1 | ≤ µ.

Consider now any subset S ⊆ [N] with |S| ≤ k. Let x_S = Σ_{j∈S} x_j/√k. For any fixed vector y ∈ ℝ^d, we have

⟨y, x_S⟩ = Σ_{i=1}^{d} y_i · (Σ_{j∈S} (x_j)_i)/√k.

For every i, the quantity Σ_{j∈S} (x_j)_i/√k is N(0, |S|/(dk)) distributed, and these are independent across the different i's. Thus ⟨y, x_S⟩ ∼ N(0, ‖y‖_2^2 |S|/(dk)). Therefore, if y is a fixed vector with | ‖y‖_2^2 − 1 | ≤ µ, then

P(|⟨y, x_S⟩| > µ) < e^{−Ω(µ^2 dk/(|S| · ‖y‖_2^2))} = e^{−Ω(µ^2 d)}.

Now fix an x_i and a set S ⊆ [N] with |S| = k. Observe that if we condition on the event E_i, all vectors x_h with h ∈ S \ {i} are still independent standard gaussians scaled by 1/√d. We thus have:

P(|⟨x_i, Σ_{j∈S: j≠i} x_j/√k⟩| > µ | E_i) < e^{−Ω(µ^2 d)}.

We say that S fails if there is some x_i ∈ X such that either

1. | ‖x_i‖_2^2 − 1 | > µ, or
2. |⟨x_i, Σ_{j∈S: j≠i} x_j/√k⟩| > µ.

By a union bound, the first event happens with probability at most Ne^{−Ω(µ^2 d)}. For the second event, consider a fixed x_i. Then

P(|⟨x_i, Σ_{j∈S: j≠i} x_j/√k⟩| > µ) ≤ P(¬E_i) + P(|⟨x_i, Σ_{j∈S: j≠i} x_j/√k⟩| > µ | E_i) ≤ e^{−Ω(µ^2 d)} + e^{−Ω(µ^2 d)}.

Thus we can again union bound over all x_i, and conclude that S fails with probability at most Ne^{−Ω(µ^2 d)}. As long as N ≤ e^{γµ^2 d} for a sufficiently small constant γ > 0, this probability is less than 1/2. Therefore the expected number of sets S that do not fail is at least (N choose k)/2. This means that there must exist a choice of X for which there are at least (N choose k)/2 sets S that do not fail. This shows the existence of the claimed (X, F), since as long as at least one S does not fail, (X, F) also satisfies | ‖x_i‖_2^2 − 1 | ≤ µ for all x_i ∈ X.

Restatement of Lemma 4. Let X = {x_1, ..., x_N} and let (X, F) be k-wise µ-incoherent for some 0 < µ < 1 and k ≥ 1. Let S ∈ F and define y = Σ_{j∈S} x_j/√k. Then y satisfies:

1. ‖y‖_2^2 ≤ 1 + (√k + 1)µ.
2. For a vector x_j ∈ X such that j ∉ S, we have |⟨y, x_j⟩| ≤ µ.
3. For a vector x_j ∈ X such that j ∈ S, we have (1 − µ)/√k − µ ≤ ⟨y, x_j⟩ ≤ (1 + µ)/√k + µ.

Proof. For the first item, observe that

‖y‖_2^2 = Σ_{i∈S} Σ_{j∈S} ⟨x_i, x_j⟩/k
        = Σ_{i∈S} ‖x_i‖_2^2/k + Σ_{i∈S} Σ_{j∈S: j≠i} ⟨x_i, x_j⟩/k
        ≤ 1 + µ + (1/√k) Σ_{i∈S} |⟨x_i, Σ_{j∈S: j≠i} x_j/√k⟩|
        ≤ 1 + µ + √k µ.

For the second item, let x_j ∈ X with j ∉ S. Then

|⟨y, x_j⟩| = |⟨x_j, Σ_{i∈S} x_i/√k⟩| ≤ µ.

For the third item, let j ∈ S. Then

⟨y, x_j⟩ = ⟨x_j, Σ_{i∈S} x_i/√k⟩ = ‖x_j‖_2^2/√k + ⟨x_j, Σ_{i∈S: i≠j} x_i/√k⟩.

This shows that ⟨y, x_j⟩ ≥ (1 − µ)/√k − µ, and also that ⟨y, x_j⟩ ≤ (1 + µ)/√k + µ.

References

[Alo03] Noga Alon. Problems and results in extremal combinatorics I. Discrete Mathematics, 273(1-3):31-53, 2003.

[BZMD15] Christos Boutsidis, Anastasios Zouzias, Michael W. Mahoney, and Petros Drineas. Randomized dimensionality reduction for k-means clustering. IEEE Transactions on Information Theory, 61(2), 2015.

[CEM+15] Michael B. Cohen, Sam Elder, Cameron Musco, Christopher Musco, and Mădălina Persu. Dimensionality reduction for k-means clustering and low rank approximation. In Proceedings of the 47th ACM Symposium on Theory of Computing (STOC), 2015.

[CRT06] Emmanuel Candès, Justin Romberg, and Terence Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory, 52(2):489-509, 2006.

[Don06] David Donoho. Compressed sensing. IEEE Trans. Inf. Theory, 52(4):1289-1306, 2006.

[HIM12] Sariel Har-Peled, Piotr Indyk, and Rajeev Motwani. Approximate nearest neighbor: Towards removing the curse of dimensionality. Theory of Computing, 8(1):321-350, 2012.

[JL84] William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26:189-206, 1984.

[JW13] T. S. Jayram and David P. Woodruff. Optimal bounds for Johnson-Lindenstrauss transforms and streaming problems with subconstant error. ACM Transactions on Algorithms, 9(3):26, 2013.

[KMN11] Daniel M. Kane, Raghu Meka, and Jelani Nelson. Almost optimal explicit Johnson-Lindenstrauss families. In Proceedings of the 15th International Workshop on Randomization and Computation (RANDOM), 2011.

[LN16] Kasper Green Larsen and Jelani Nelson. The Johnson-Lindenstrauss lemma is optimal for linear dimensionality reduction. In Proceedings of the 43rd International Colloquium on Automata, Languages and Programming (ICALP), 2016.

[Mut05] S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Computer Science, 1(2), 2005.

[NNW14] Jelani Nelson, Huy L. Nguyễn, and David P. Woodruff. On deterministic sketching and streaming for sparse recovery and norm estimation. Linear Algebra and its Applications, Special Issue on Sparse Approximate Solution of Linear Systems, 441, 2014.

[Pis89] Gilles Pisier. The volume of convex bodies and Banach space geometry, volume 94 of Cambridge Tracts in Mathematics. Cambridge University Press, 1989.

[SS11] Daniel A. Spielman and Nikhil Srivastava. Graph sparsification by effective resistances. SIAM J. Comput., 40(6):1913-1926, 2011.

[Tho96] Anthony C. Thompson. Minkowski Geometry. Encyclopedia of Mathematics and Its Applications. Cambridge University Press, 1996.

[Wel74] Lloyd R. Welch. Lower bounds on the maximum cross correlation of signals. IEEE Transactions on Information Theory, 20, May 1974.

[Woo14] David P. Woodruff. Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science, 10(1-2):1-157, 2014.


More information

Near Ideal Behavior of a Modified Elastic Net Algorithm in Compressed Sensing

Near Ideal Behavior of a Modified Elastic Net Algorithm in Compressed Sensing Near Ideal Behavior of a Modified Elastic Net Algorithm in Compressed Sensing M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas M.Vidyasagar@utdallas.edu www.utdallas.edu/ m.vidyasagar

More information

INDUSTRIAL MATHEMATICS INSTITUTE. B.S. Kashin and V.N. Temlyakov. IMI Preprint Series. Department of Mathematics University of South Carolina

INDUSTRIAL MATHEMATICS INSTITUTE. B.S. Kashin and V.N. Temlyakov. IMI Preprint Series. Department of Mathematics University of South Carolina INDUSTRIAL MATHEMATICS INSTITUTE 2007:08 A remark on compressed sensing B.S. Kashin and V.N. Temlyakov IMI Preprint Series Department of Mathematics University of South Carolina A remark on compressed

More information

sublinear time low-rank approximation of positive semidefinite matrices Cameron Musco (MIT) and David P. Woodru (CMU)

sublinear time low-rank approximation of positive semidefinite matrices Cameron Musco (MIT) and David P. Woodru (CMU) sublinear time low-rank approximation of positive semidefinite matrices Cameron Musco (MIT) and David P. Woodru (CMU) 0 overview Our Contributions: 1 overview Our Contributions: A near optimal low-rank

More information

Compressed Sensing and Sparse Recovery

Compressed Sensing and Sparse Recovery ELE 538B: Sparsity, Structure and Inference Compressed Sensing and Sparse Recovery Yuxin Chen Princeton University, Spring 217 Outline Restricted isometry property (RIP) A RIPless theory Compressed sensing

More information

Linear-algebraic pseudorandomness: Subspace Designs & Dimension Expanders

Linear-algebraic pseudorandomness: Subspace Designs & Dimension Expanders Linear-algebraic pseudorandomness: Subspace Designs & Dimension Expanders Venkatesan Guruswami Carnegie Mellon University Simons workshop on Proving and Using Pseudorandomness March 8, 2017 Based on a

More information

Asymptotically optimal induced universal graphs

Asymptotically optimal induced universal graphs Asymptotically optimal induced universal graphs Noga Alon Abstract We prove that the minimum number of vertices of a graph that contains every graph on vertices as an induced subgraph is (1+o(1))2 ( 1)/2.

More information

Optimisation Combinatoire et Convexe.

Optimisation Combinatoire et Convexe. Optimisation Combinatoire et Convexe. Low complexity models, l 1 penalties. A. d Aspremont. M1 ENS. 1/36 Today Sparsity, low complexity models. l 1 -recovery results: three approaches. Extensions: matrix

More information

High Dimensional Geometry, Curse of Dimensionality, Dimension Reduction

High Dimensional Geometry, Curse of Dimensionality, Dimension Reduction Chapter 11 High Dimensional Geometry, Curse of Dimensionality, Dimension Reduction High-dimensional vectors are ubiquitous in applications (gene expression data, set of movies watched by Netflix customer,

More information

Similarity searching, or how to find your neighbors efficiently

Similarity searching, or how to find your neighbors efficiently Similarity searching, or how to find your neighbors efficiently Robert Krauthgamer Weizmann Institute of Science CS Research Day for Prospective Students May 1, 2009 Background Geometric spaces and techniques

More information

Information-Theoretic Limits of Matrix Completion

Information-Theoretic Limits of Matrix Completion Information-Theoretic Limits of Matrix Completion Erwin Riegler, David Stotz, and Helmut Bölcskei Dept. IT & EE, ETH Zurich, Switzerland Email: {eriegler, dstotz, boelcskei}@nari.ee.ethz.ch Abstract We

More information

Notes taken by Costis Georgiou revised by Hamed Hatami

Notes taken by Costis Georgiou revised by Hamed Hatami CSC414 - Metric Embeddings Lecture 6: Reductions that preserve volumes and distance to affine spaces & Lower bound techniques for distortion when embedding into l Notes taken by Costis Georgiou revised

More information

Near-Optimal Algorithms for Maximum Constraint Satisfaction Problems

Near-Optimal Algorithms for Maximum Constraint Satisfaction Problems Near-Optimal Algorithms for Maximum Constraint Satisfaction Problems Moses Charikar Konstantin Makarychev Yury Makarychev Princeton University Abstract In this paper we present approximation algorithms

More information

Lecture 01 August 31, 2017

Lecture 01 August 31, 2017 Sketching Algorithms for Big Data Fall 2017 Prof. Jelani Nelson Lecture 01 August 31, 2017 Scribe: Vinh-Kha Le 1 Overview In this lecture, we overviewed the six main topics covered in the course, reviewed

More information

The concentration of the chromatic number of random graphs

The concentration of the chromatic number of random graphs The concentration of the chromatic number of random graphs Noga Alon Michael Krivelevich Abstract We prove that for every constant δ > 0 the chromatic number of the random graph G(n, p) with p = n 1/2

More information