Sparser Johnson-Lindenstrauss Transforms
1 Sparser Johnson-Lindenstrauss Transforms. Jelani Nelson, Princeton, February 16, 2012. Joint work with Daniel Kane (Stanford).
5 Random Projections
x ∈ R^d, d huge; store y = Sx, where S is a k × d matrix (compression).
- compressed sensing (recover x from y when x is (near-)sparse)
- group testing (as above, but Sx is Boolean matrix-vector multiplication)
- recover properties of x (entropy, heavy hitters, ...)
- approximate norm preservation (want ‖y‖ ≈ ‖x‖)
- motif discovery (slightly different; randomly project discrete x onto a subset of its coordinates) [Buhler-Tompa]
In many of these applications, a random S is either required or obtains better parameters than deterministic constructions.
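As an aside (not part of the slides), the simplest instantiation of such an S — a dense matrix with i.i.d. Gaussian entries — can be sketched in a few lines; the dimensions and seed below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 10_000, 400               # huge ambient dimension, small target dimension
x = rng.standard_normal(d)
x /= np.linalg.norm(x)           # unit-norm input vector

# Dense JL matrix: i.i.d. N(0, 1/k) entries, so E[||Sx||^2] = ||x||^2 = 1.
S = rng.standard_normal((k, d)) / np.sqrt(k)

y = S @ x                        # compressed representation: k numbers instead of d
distortion = abs(np.linalg.norm(y) ** 2 - 1.0)
print(distortion)                # typically on the order of sqrt(2/k)
```

The point of the rest of the talk is that this S is dense, so applying it is expensive; the constructions below make S structured or sparse.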
7 Metric Johnson-Lindenstrauss lemma
Metric JL (MJL) Lemma, 1984: Every set of n points in Euclidean space can be embedded into O(ε⁻² log n)-dimensional Euclidean space so that all pairwise distances are preserved up to a 1 ± ε factor.
Uses:
- Speed up geometric algorithms by first reducing the dimension of the input [Indyk-Motwani, 1998], [Indyk, 2001]
- Low-memory streaming algorithms for linear algebra problems [Sarlós, 2006], [LWMRT, 2007], [Clarkson-Woodruff, 2009]
- Essentially equivalent to RIP matrices from compressed sensing [Baraniuk et al., 2008], [Krahmer-Ward, 2011] (used for recovery of sparse signals)
10 How to prove the JL lemma
Distributional JL (DJL) lemma: For any 0 < ε, δ < 1/2 there exists a distribution D_{ε,δ} on k × d matrices with k = O(ε⁻² log(1/δ)) so that for any x of unit norm,
Pr_{S ~ D_{ε,δ}}[ |‖Sx‖₂² − 1| > ε ] < δ.
Proof of MJL: Set δ = 1/n² in DJL and let x be the (normalized) difference vector of some pair of points. Union bound over the (n choose 2) pairs.
Theorem (Alon, 2003): For every n, there exists a set of n points requiring target dimension k = Ω((ε⁻² / log(1/ε)) log n).
Theorem (Jayram-Woodruff, 2011; Kane-Meka-N., 2011): For DJL, k = Θ(ε⁻² log(1/δ)) is optimal.
12 Proving the JL lemma
Older proofs:
- [Johnson-Lindenstrauss, 1984], [Frankl-Maehara, 1988]: Random rotation, then projection onto the first k coordinates.
- [Indyk-Motwani, 1998], [Dasgupta-Gupta, 2003]: Random matrix with independent Gaussian entries.
- [Achlioptas, 2001]: Independent ±1 entries.
- [Clarkson-Woodruff, 2009]: O(log(1/δ))-wise independent ±1 entries.
- [Arriaga-Vempala, 1999], [Matousek, 2008]: Independent entries having mean 0, variance 1/k, and subgaussian tails.
Downside: Performing the embedding is dense matrix-vector multiplication, taking O(k · ‖x‖₀) time.
14 Fast JL Transforms
- [Ailon-Chazelle, 2006]: x ↦ PHDx, O(d log d + k³) time. P is a random sparse matrix, H is Hadamard, D has random ±1 entries on the diagonal.
- [Ailon-Liberty, 2008]: O(d log k + k²) time, also based on the fast Hadamard transform.
- [Ailon-Liberty, 2011] and [Krahmer-Ward, 2011]: O(d log d) for MJL, but with suboptimal k = O(ε⁻² log n log⁴ d).
Downside: Slow to embed sparse vectors: running time is Ω(min{k · ‖x‖₀, d log d}).
15 Where Do Sparse Vectors Show Up?
- Document as bag of words: x_i = number of occurrences of word i. Compare documents using cosine similarity. d = lexicon size; most documents aren't dictionaries.
- Network traffic: x_{i,j} = #bytes sent from i to j. d = 2^64 (2^256 in IPv6); most servers don't talk to each other.
- User ratings: x_i is a user's score for movie i on Netflix. d = #movies; most people haven't rated all movies.
- Streaming: x receives a stream of updates of the form "add v to x_i". Maintaining Sx requires calculating v · Se_i.
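The streaming point can be made concrete: if S has s nonzeros per column, processing an update "add v to x_i" touches only s sketch counters. A minimal sketch of this (my illustrative code, with h and σ stored as arrays rather than generated from hash functions, and arbitrary parameters):

```python
import numpy as np

rng = np.random.default_rng(5)
d, s, blocks = 1_000, 8, 32
k = s * blocks                       # target dimension

# Per-coordinate hash locations and signs for each of the s blocks.
h = rng.integers(0, blocks, size=(s, d))
sigma = rng.choice([-1.0, 1.0], size=(s, d))

y = np.zeros(k)                      # the maintained sketch Sx
x = np.zeros(d)                      # the implicit vector, kept only for checking

def update(i, v):
    """Apply the stream update 'add v to x_i': touches exactly s entries of y."""
    x[i] += v
    for r in range(s):
        y[r * blocks + h[r, i]] += v * sigma[r, i] / np.sqrt(s)

for i, v in [(3, 1.0), (17, -2.5), (3, 0.5), (999, 4.0)]:
    update(i, v)
# y now equals Sx without ever materializing S or x densely per update.
```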
17 Sparse JL transforms
One way to embed sparse vectors faster: use sparse matrices.
Let s = #non-zero entries per column of the embedding matrix (so embedding time is s · ‖x‖₀).

reference                    | value of s              | type
[JL84], [FM88], [IM98], ... | k ≈ 4ε⁻² log(1/δ)       | dense
[Achlioptas01]               | k/3                     | sparse Bernoulli
[WDALS09]                    | no proof                | hashing
[DKS10]                      | Õ(ε⁻¹ log³(1/δ))        | hashing
[KN10a], [BOR10]             | Õ(ε⁻¹ log²(1/δ))        |
[KN12]                       | O(ε⁻¹ log(1/δ))         | hashing (random codes)
19 Other related work
- CountSketch of [Charikar-Chen-Farach-Colton] gives s = O(log(1/δ)) (see [Thorup-Zhang]).
- Can recover (1 ± ε)‖x‖₂² from Sx, but not as ‖Sx‖₂² (not an embedding into ℓ₂).
- Not applicable in certain situations, e.g. in some nearest neighbor data structures, and when learning classifiers over projected vectors via stochastic gradient descent.
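The distinction matters: a CountSketch-style estimator recovers the norm through a median over repetitions, not as the norm of the sketch itself. A minimal numerical sketch of that estimator (my code, not from the talk; all parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d, buckets, reps = 5_000, 256, 11

x = np.zeros(d)
x[rng.choice(d, size=50, replace=False)] = rng.standard_normal(50)  # sparse input

estimates = []
for _ in range(reps):
    h = rng.integers(0, buckets, size=d)       # bucket for each coordinate
    sigma = rng.choice([-1.0, 1.0], size=d)    # random sign for each coordinate
    counters = np.zeros(buckets)
    np.add.at(counters, h, sigma * x)          # one repetition's sketch of x
    estimates.append(np.sum(counters**2))      # unbiased estimate of ||x||_2^2

estimate = np.median(estimates)                # median tames the heavy tail
true = np.sum(x**2)
print(abs(estimate - true) / true)             # small relative error
```

Because the guarantee lives in the median, the map x ↦ sketch is not itself a (1 ± ε) embedding into ℓ₂, which is exactly the limitation the slide notes.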
22 Sparse JL Constructions
- [DKS, 2010]: s = Θ(ε⁻¹ log²(1/δ))
- [this work]: s = Θ(ε⁻¹ log(1/δ)) (graph construction)
- [this work]: s = Θ(ε⁻¹ log(1/δ)) (block construction, blocks of height k/s)
23 Sparse JL Constructions (in matrix form)
[Figure: two k × d matrices — one with s nonzeros scattered in each column, one with exactly one nonzero per contiguous block of k/s rows in each column. Each black cell is ±1/√s at random.]

24 Sparse JL Constructions (nicknames)
[Figure: the same two pictures.] Left: the graph construction. Right: the block construction (blocks of height k/s).
27 Sparse JL notation (block construction)
Let h(j, r) and σ(j, r) be the random hash location and random sign for the copy of x_j in the rth block.
(Sx)_i = (1/√s) · Σ_{h(j,r)=i} x_j σ(j, r)
‖Sx‖₂² = ‖x‖₂² + (1/s) Σ_{r=1}^{s} Σ_{i≠j} x_i x_j σ(i, r) σ(j, r) 1_{h(i,r)=h(j,r)}
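A minimal sketch of the block construction in this notation (illustrative code with assumed parameters; h and σ are drawn fully at random here rather than from hash functions or a code):

```python
import numpy as np

rng = np.random.default_rng(2)
d, s, blocks = 2_000, 16, 64          # s blocks, each with k/s = 64 rows
k = s * blocks                        # target dimension k = 1024

x = rng.standard_normal(d)
x /= np.linalg.norm(x)                # unit-norm input

h = rng.integers(0, blocks, size=(s, d))        # h(j, r): row within block r
sigma = rng.choice([-1.0, 1.0], size=(s, d))    # sigma(j, r): random sign

# (Sx)_i = (1/sqrt(s)) * sum over {j : h(j,r) = i} of x_j * sigma(j,r)
y = np.zeros(k)
for r in range(s):
    np.add.at(y, r * blocks + h[r], sigma[r] * x)
y /= np.sqrt(s)

print(abs(np.linalg.norm(y) ** 2 - 1.0))  # small distortion, as the slide's identity predicts
```

Embedding time is s operations per nonzero of x; it is written densely above only for clarity.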
29 Sparse JL via Codes
Graph construction: a constant-weight binary code of weight s (each column's support pattern is a codeword).
Block construction: a code over a q-ary alphabet, q = k/s (the rth symbol of codeword j is h(j, r)).
Thm: It suffices that the code have distance s − O(s²/k) (block construction; 2s − O(s²/k) for the graph construction).
31 Analysis (block construction)
Let η_{i,j,r} indicate whether i and j collide in the rth block.
‖Sx‖₂² = ‖x‖₂² + Z, where Z = (1/s) Σ_r Z_r and Z_r = Σ_{i≠j} x_i x_j σ(i, r) σ(j, r) η_{i,j,r}.
Z is a quadratic form in σ, so apply known moment bounds for quadratic forms.
32 Analysis
Theorem (Hanson-Wright, 1971): Let σ_1, ..., σ_n be independent ±1 and let B ∈ R^{n×n} be symmetric. For λ > 0,
Pr[ |σᵀBσ − trace(B)| > λ ] < e^(−C · min{λ²/‖B‖_F², λ/‖B‖₂}).
Reminder: ‖B‖_F = sqrt(Σ_{i,j} B_{i,j}²); ‖B‖₂ is the largest magnitude of an eigenvalue of B.
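A quick empirical illustration of this concentration (my sketch; the choice of B and the trial count are arbitrary): for a fixed symmetric B, sampled values of σᵀBσ cluster around trace(B) at scale ‖B‖_F.

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 200, 2_000

A = rng.standard_normal((n, n))
B = (A + A.T) / (2 * np.sqrt(n))     # a fixed symmetric matrix

# Sample the quadratic form sigma^T B sigma over random sign vectors sigma.
sigmas = rng.choice([-1.0, 1.0], size=(trials, n))
values = np.sum((sigmas @ B) * sigmas, axis=1)

deviation = values - np.trace(B)
frob = np.linalg.norm(B, 'fro')
# Hanson-Wright predicts tails decaying at scale ||B||_F (and ||B||_2 further out),
# so deviations beyond a few multiples of ||B||_F should be rare.
print(np.mean(np.abs(deviation) > 3 * frob))   # small fraction of trials
```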
34 Analysis
Z = (1/s) Σ_{r=1}^{s} Σ_{i≠j} x_i x_j σ(i, r) σ(j, r) η_{i,j,r} = σᵀBσ,
where σ is the concatenation of the s sign vectors and B = (1/s) · diag(B_1, ..., B_s) is block diagonal with (B_r)_{i,j} = x_i x_j η_{i,j,r}.
37 Frobenius norm bound
B = (1/s) · diag(B_1, ..., B_s), with (B_r)_{i,j} = x_i x_j η_{i,j,r}.
‖B‖_F² = (1/s²) Σ_{i≠j} x_i² x_j² · (#times i, j collide) < O(1/k) · ‖x‖₂⁴ = O(1/k) (good code!)
42 Operator norm bound
Write B_r = (1/s)(S_r − D), where S_r is the matrix with x_i² on the diagonal and x_i x_j η_{i,j,r} off the diagonal, and D = diag(x_1², ..., x_d²).
- ‖D‖₂ = ‖x‖_∞² ≤ 1
- S_r = Σ_{i=1}^{k/s} u_{r,i} u_{r,i}ᵀ, and these u_{r,i} give the eigenvectors of S_r (u_{r,i} is the projection of x onto the coordinates hashing to i in the rth block)
- ‖S_r‖₂ = max_i ‖u_{r,i}‖₂² ≤ ‖x‖₂² = 1
- ‖B_r‖₂ ≤ (1/s) max{‖S_r‖₂, ‖D‖₂} ≤ 1/s (both S_r and D are positive semidefinite)
44 Wrapping up the analysis
‖B‖_F² = O(1/k), ‖B‖₂ ≤ 1/s. Apply Hanson-Wright:
Pr[ |Z| > ε ] < e^(−C · min{ε²k, εs}),
so k = Ω(log(1/δ)/ε²) and s = Ω(log(1/δ)/ε) suffice. QED
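Spelling out the substitution as a worked step (constants absorbed into C and C′):

```latex
\Pr\bigl[\,|Z| > \varepsilon\,\bigr]
  \;<\; \exp\!\left(-C\,\min\!\left\{\frac{\varepsilon^2}{\|B\|_F^2},\;
        \frac{\varepsilon}{\|B\|_2}\right\}\right)
  \;\le\; \exp\!\left(-C'\,\min\{\varepsilon^2 k,\; \varepsilon s\}\right),
```

which is at most δ once ε²k and εs are both Ω(log(1/δ)), i.e. exactly when k = Ω(ε⁻² log(1/δ)) and s = Ω(ε⁻¹ log(1/δ)).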
48 Code-based Construction: Caveat
Need a sufficiently good code: each pair of codewords should agree on O(s²/k) coordinates.
Can get this with a random code by Chernoff + union bound over pairs, but then need s²/k ≳ log(d/δ), i.e.
s ≳ √(k log(d/δ)) = Ω(ε⁻¹ √(log(d/δ) log(1/δ))).
Slightly better: can assume d = O(ε⁻²/δ) by first embedding into this dimension with s = 1 (analysis: Chebyshev's inequality). Then can get away with s = O(ε⁻¹ √(log(1/(εδ)) log(1/δ))).
Can we avoid the loss incurred by this union bound?
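The s = 1 first-level embedding mentioned above is a single signed hashing. A small illustrative check (my code; constants in the intermediate dimension are assumed) that it preserves the norm to within ε with constant probability, as Chebyshev guarantees:

```python
import numpy as np

rng = np.random.default_rng(4)
d, eps, delta = 50_000, 0.25, 0.05
m = int(4 / (eps**2 * delta))        # intermediate dimension, O(1/(eps^2 * delta))

x = rng.standard_normal(d)
x /= np.linalg.norm(x)               # unit-norm input

# s = 1: every coordinate is hashed exactly once, with a random sign.
h = rng.integers(0, m, size=d)
sigma = rng.choice([-1.0, 1.0], size=d)
y = np.zeros(m)
np.add.at(y, h, sigma * x)

# Var(||y||^2) <= 2/m, so Chebyshev gives | ||y||^2 - 1 | <= eps w.p. >= 1 - delta.
print(abs(np.sum(y**2) - 1.0))
```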
52 Improving the Construction
Pick h at random. Analysis: directly bound the ℓ = log(1/δ)th moment of the error term Z, then apply Markov's inequality:
Pr[ |Z| > ε ] < ε^(−ℓ) · E[Z^ℓ]
(conditioning on the event "h gives a code" is too demanding).
Z = (1/s) Σ_{r=1}^{s} Σ_{i≠j} x_i x_j σ(i, r) σ(j, r) η_{i,j,r}, i.e. Z = (1/s) Σ_{r=1}^{s} Z_r.
E_{h,σ}[Z^ℓ] = (1/s^ℓ) Σ_{r_1 < ... < r_n} Σ_{t_1, ..., t_n > 1, Σ_i t_i = ℓ} ( ℓ choose t_1, ..., t_n ) Π_{i=1}^{n} E_{h,σ}[Z_{r_i}^{t_i}]
Bound the tth moment of any Z_r, then get the ℓth moment bound for Z by plugging into the above.
55 Bounding E[Z_r^t]
Z_r = Σ_{i≠j} x_i x_j σ(i, r) σ(j, r) η_{i,j,r}
Monomials appearing in the expansion of Z_r^t are in correspondence with directed multigraphs, e.g. (x_1 x_2)(x_3 x_4)(x_3 x_8)(x_4 x_8)(x_2 x_1) corresponds to a multigraph on vertices {1, 2, 3, 4, 8} with one edge per factor.
A monomial contributes to the expectation iff all degrees are even (otherwise the signs average to zero).
Analysis: Group the monomials appearing in Z_r^t according to graph isomorphism class, then do some combinatorics. Our analysis is tight up to a constant factor.
60 Bounding E[Z_r^t]
Let m = #connected components, v = #vertices, and d_u = degree of vertex u in the multigraph G.
E_{h,σ}[Z_r^t] = Σ_{G ∈ G_t} Σ_{i_1≠j_1, ..., i_t≠j_t : f((i_u, j_u)_{u=1}^t) = G} ( Π_{u=1}^{t} x_{i_u} x_{j_u} ) · E[ Π_{u=1}^{t} η_{i_u, j_u, r} ]
= Σ_{G ∈ G_t} (s/k)^{v−m} Σ_{i_1≠j_1, ..., i_t≠j_t : f((i_u, j_u)_{u=1}^t) = G} Π_{u=1}^{t} x_{i_u} x_{j_u}
(the collision events force the vertices of each connected component into a single hash location, so E[Π η] = (s/k)^{v−m}).
The inner sum over labelings is then bounded using the multinomial coefficient ( t choose d_1/2, ..., d_v/2 ) together with ‖x‖₂ = 1, and grouping graphs by (v, m) gives a bound of the form 2^{O(t)} Σ_{v,m} (s/k)^{v−m}, with factors of t^t, v^v, and Π_u d_u^{d_u} arising from the combinatorics.
61 Bounding E[Z_r^t]
In the end, can show
E[Z_r^t] ≤ 2^{O(t)} · { s/k if t < log(k/s); (t / log(k/s))^t otherwise }.
Plug this into the expression for E[Z^ℓ]. QED
63 Open Problems
- OPEN: Devise a distribution which can be sampled using few random bits. Current record: O(log d + log(1/ε) log(1/δ) + log(1/δ) log log(1/δ)) [Kane-Meka-N., 2011]. Existential: O(log d + log(1/δ)).
- OPEN: Prove a tight lower bound on the achievable sparsity in a JL distribution.
- OPEN: Can we have a JL matrix such that we can multiply by any k × k submatrix in k · polylog(d) time? (ultimate goal)
- OPEN: Embed any vector in Õ(d) time into the optimal k.
More informationNonlinear Estimators and Tail Bounds for Dimension Reduction in l 1 Using Cauchy Random Projections
Journal of Machine Learning Research 007 Submitted 0/06; Published Nonlinear Estimators and Tail Bounds for Dimension Reduction in l Using Cauchy Random Projections Ping Li Department of Statistical Science
More informationSimilarity searching, or how to find your neighbors efficiently
Similarity searching, or how to find your neighbors efficiently Robert Krauthgamer Weizmann Institute of Science CS Research Day for Prospective Students May 1, 2009 Background Geometric spaces and techniques
More informationAcceleration of Randomized Kaczmarz Method
Acceleration of Randomized Kaczmarz Method Deanna Needell [Joint work with Y. Eldar] Stanford University BIRS Banff, March 2011 Problem Background Setup Setup Let Ax = b be an overdetermined consistent
More informationCSCI8980 Algorithmic Techniques for Big Data September 12, Lecture 2
CSCI8980 Algorithmic Techniques for Big Data September, 03 Dr. Barna Saha Lecture Scribe: Matt Nohelty Overview We continue our discussion on data streaming models where streams of elements are coming
More informationYale University Department of Computer Science
Yale University Department of Computer Science Fast Dimension Reduction Using Rademacher Series on Dual BCH Codes Nir Ailon 1 Edo Liberty 2 YALEU/DCS/TR-1385 July 2007 1 Institute for Advanced Study, Princeton
More informationFast Moment Estimation in Data Streams in Optimal Space
Fast Moment Estimation in Data Streams in Optimal Space Daniel M. Kane Jelani Nelson Ely Porat David P. Woodruff Abstract We give a space-optimal algorithm with update time O(log 2 (1/ε) log log(1/ε))
More informationarxiv: v4 [cs.ds] 5 Apr 2013
Low Rank Approximation and Regression in Input Sparsity Time Kenneth L. Clarkson IBM Almaden David P. Woodruff IBM Almaden April 8, 2013 arxiv:1207.6365v4 [cs.ds] 5 Apr 2013 Abstract We design a new distribution
More informationA Unified Theorem on SDP Rank Reduction. yyye
SDP Rank Reduction Yinyu Ye, EURO XXII 1 A Unified Theorem on SDP Rank Reduction Yinyu Ye Department of Management Science and Engineering and Institute of Computational and Mathematical Engineering Stanford
More information8.1 Concentration inequality for Gaussian random matrix (cont d)
MGMT 69: Topics in High-dimensional Data Analysis Falll 26 Lecture 8: Spectral clustering and Laplacian matrices Lecturer: Jiaming Xu Scribe: Hyun-Ju Oh and Taotao He, October 4, 26 Outline Concentration
More informationNoisy Streaming PCA. Noting g t = x t x t, rearranging and dividing both sides by 2η we get
Supplementary Material A. Auxillary Lemmas Lemma A. Lemma. Shalev-Shwartz & Ben-David,. Any update of the form P t+ = Π C P t ηg t, 3 for an arbitrary sequence of matrices g, g,..., g, projection Π C onto
More informationNew ways of dimension reduction? Cutting data sets into small pieces
New ways of dimension reduction? Cutting data sets into small pieces Roman Vershynin University of Michigan, Department of Mathematics Statistical Machine Learning Ann Arbor, June 5, 2012 Joint work with
More informationFully Understanding the Hashing Trick
Fully Understanding the Hashing Trick Lior Kamma, Aarhus University Joint work with Casper Freksen and Kasper Green Larsen. Recommendation and Classification PG-13 Comic Book Super Hero Sci Fi Adventure
More informationFast Random Projections using Lean Walsh Transforms Yale University Technical report #1390
Fast Random Projections using Lean Walsh Transforms Yale University Technical report #1390 Edo Liberty Nir Ailon Amit Singer Abstract We present a k d random projection matrix that is applicable to vectors
More informationGraph Sparsification I : Effective Resistance Sampling
Graph Sparsification I : Effective Resistance Sampling Nikhil Srivastava Microsoft Research India Simons Institute, August 26 2014 Graphs G G=(V,E,w) undirected V = n w: E R + Sparsification Approximate
More informationApproximate Spectral Clustering via Randomized Sketching
Approximate Spectral Clustering via Randomized Sketching Christos Boutsidis Yahoo! Labs, New York Joint work with Alex Gittens (Ebay), Anju Kambadur (IBM) The big picture: sketch and solve Tradeoff: Speed
More informationSome notes on streaming algorithms continued
U.C. Berkeley CS170: Algorithms Handout LN-11-9 Christos Papadimitriou & Luca Trevisan November 9, 016 Some notes on streaming algorithms continued Today we complete our quick review of streaming algorithms.
More informationCompressive Sensing with Random Matrices
Compressive Sensing with Random Matrices Lucas Connell University of Georgia 9 November 017 Lucas Connell (University of Georgia) Compressive Sensing with Random Matrices 9 November 017 1 / 18 Overview
More informationSparse Fourier Transform (lecture 1)
1 / 73 Sparse Fourier Transform (lecture 1) Michael Kapralov 1 1 IBM Watson EPFL St. Petersburg CS Club November 2015 2 / 73 Given x C n, compute the Discrete Fourier Transform (DFT) of x: x i = 1 x n
More informationRandomized Numerical Linear Algebra: Review and Progresses
ized ized SVD ized : Review and Progresses Zhihua Department of Computer Science and Engineering Shanghai Jiao Tong University The 12th China Workshop on Machine Learning and Applications Xi an, November
More informationA fast randomized algorithm for approximating an SVD of a matrix
A fast randomized algorithm for approximating an SVD of a matrix Joint work with Franco Woolfe, Edo Liberty, and Vladimir Rokhlin Mark Tygert Program in Applied Mathematics Yale University Place July 17,
More informationOptimality of the Johnson-Lindenstrauss lemma
58th Annual IEEE Symposium on Foundations of Computer Science Optimality of the Johnson-Lindenstrauss lemma Kasper Green Larsen Computer Science Department Aarhus University Aarhus, Denmark larsen@cs.au.dk
More informationsublinear time low-rank approximation of positive semidefinite matrices Cameron Musco (MIT) and David P. Woodru (CMU)
sublinear time low-rank approximation of positive semidefinite matrices Cameron Musco (MIT) and David P. Woodru (CMU) 0 overview Our Contributions: 1 overview Our Contributions: A near optimal low-rank
More informationFast Dimension Reduction Using Rademacher Series on Dual BCH Codes
DOI 10.1007/s005-008-9110-x Fast Dimension Reduction Using Rademacher Series on Dual BCH Codes Nir Ailon Edo Liberty Received: 8 November 2007 / Revised: 9 June 2008 / Accepted: September 2008 Springer
More informationSupremum of simple stochastic processes
Subspace embeddings Daniel Hsu COMS 4772 1 Supremum of simple stochastic processes 2 Recap: JL lemma JL lemma. For any ε (0, 1/2), point set S R d of cardinality 16 ln n S = n, and k N such that k, there
More informationNew constructions of RIP matrices with fast multiplication and fewer rows
New constructions of RIP matrices with fast multiplication and fewer rows Jelani Nelson Eric Price Mary Wootters July 8, 203 Abstract In this paper, we present novel constructions of matrices with the
More informationCOMS E F15. Lecture 22: Linearity Testing Sparse Fourier Transform
COMS E6998-9 F15 Lecture 22: Linearity Testing Sparse Fourier Transform 1 Administrivia, Plan Thu: no class. Happy Thanksgiving! Tue, Dec 1 st : Sergei Vassilvitskii (Google Research) on MapReduce model
More informationCS 229r: Algorithms for Big Data Fall Lecture 19 Nov 5
CS 229r: Algorithms for Big Data Fall 215 Prof. Jelani Nelson Lecture 19 Nov 5 Scribe: Abdul Wasay 1 Overview In the last lecture, we started discussing the problem of compressed sensing where we are given
More informationLecture 2 Sept. 8, 2015
CS 9r: Algorithms for Big Data Fall 5 Prof. Jelani Nelson Lecture Sept. 8, 5 Scribe: Jeffrey Ling Probability Recap Chebyshev: P ( X EX > λ) < V ar[x] λ Chernoff: For X,..., X n independent in [, ],
More informationLecture 4 February 2nd, 2017
CS 224: Advanced Algorithms Spring 2017 Prof. Jelani Nelson Lecture 4 February 2nd, 2017 Scribe: Rohil Prasad 1 Overview In the last lecture we covered topics in hashing, including load balancing, k-wise
More informationNearest Neighbor Preserving Embeddings
Nearest Neighbor Preserving Embeddings Piotr Indyk MIT Assaf Naor Microsoft Research Abstract In this paper we introduce the notion of nearest neighbor preserving embeddings. These are randomized embeddings
More informationImproved Bounds on the Dot Product under Random Projection and Random Sign Projection
Improved Bounds on the Dot Product under Random Projection and Random Sign Projection Ata Kabán School of Computer Science The University of Birmingham Birmingham B15 2TT, UK http://www.cs.bham.ac.uk/
More informationDeclaring Independence via the Sketching of Sketches. Until August Hire Me!
Declaring Independence via the Sketching of Sketches Piotr Indyk Andrew McGregor Massachusetts Institute of Technology University of California, San Diego Until August 08 -- Hire Me! The Problem The Problem
More informationTutorial: Sparse Recovery Using Sparse Matrices. Piotr Indyk MIT
Tutorial: Sparse Recovery Using Sparse Matrices Piotr Indyk MIT Problem Formulation (approximation theory, learning Fourier coeffs, linear sketching, finite rate of innovation, compressed sensing...) Setup:
More informationPolynomial Representations of Threshold Functions and Algorithmic Applications. Joint with Josh Alman (Stanford) and Timothy M.
Polynomial Representations of Threshold Functions and Algorithmic Applications Ryan Williams Stanford Joint with Josh Alman (Stanford) and Timothy M. Chan (Waterloo) Outline The Context: Polynomial Representations,
More informationLecture 18 Nov 3rd, 2015
CS 229r: Algorithms for Big Data Fall 2015 Prof. Jelani Nelson Lecture 18 Nov 3rd, 2015 Scribe: Jefferson Lee 1 Overview Low-rank approximation, Compression Sensing 2 Last Time We looked at three different
More information