Faster Johnson-Lindenstrauss style reductions
1 Faster Johnson-Lindenstrauss style reductions. Aditya Menon, August 23, 2007.
2 Outline
1. Introduction: Dimensionality reduction; The Johnson-Lindenstrauss Lemma; Speeding up computation
2. The Fast Johnson-Lindenstrauss Transform: Sparser projections; Trouble with sparse vectors?; Summary
3. Bounding the mapping: The Walsh-Hadamard transform; Error-correcting codes; Putting it together
4. References
3 Introduction / Dimensionality reduction: Distances
For high-dimensional vector data, it is of interest to have a notion of distance between two vectors. Recall that the $\ell_p$ norm of a vector $x$ is $\|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p}$. The $\ell_2$ norm corresponds to the standard Euclidean norm of a vector. The $\ell_\infty$ norm is the maximal absolute value of any component: $\|x\|_\infty = \max_i |x_i|$.
4 Introduction / Dimensionality reduction: Dimensionality reduction
Problem: suppose we're given an input vector $x \in \mathbb{R}^d$. We want to reduce the dimensionality of $x$ to some $k < d$, while preserving the $\ell_p$ norm. We can think of this as a metric embedding problem: can we embed $\ell_p^d$ into $\ell_p^k$? Formally, we have the following problem: given an $x \in \mathbb{R}^d$ and some parameters $p, \epsilon$, can we find a $y \in \mathbb{R}^k$, for some $k = f(\epsilon)$, so that $(1-\epsilon)\|x\|_p \le \|y\|_p \le (1+\epsilon)\|x\|_p$?
5 Introduction / The Johnson-Lindenstrauss Lemma: The Johnson-Lindenstrauss Lemma
The Johnson-Lindenstrauss Lemma [5] is the archetypal result for $\ell_2$ dimensionality reduction. It tells us that for $n$ points, there is an $\epsilon$-embedding of $\ell_2^d$ into $\ell_2^{O(\log n/\epsilon^2)}$.
Theorem. Suppose $u_1, \ldots, u_n \in \mathbb{R}^d$. Then, for $\epsilon > 0$ and $k = O(\log n/\epsilon^2)$, there is a mapping $f : \mathbb{R}^d \to \mathbb{R}^k$ so that
$(\forall i, j)\quad (1-\epsilon)\|u_i - u_j\|_2 \le \|f(u_i) - f(u_j)\|_2 \le (1+\epsilon)\|u_i - u_j\|_2.$
6 Introduction / The Johnson-Lindenstrauss Lemma: Johnson-Lindenstrauss in practice
The proof of the Johnson-Lindenstrauss lemma is non-constructive (unfortunately!). In practice, we use the probabilistic method to do a Johnson-Lindenstrauss style reduction: we insert randomness at the cost of an exact guarantee, so the guarantee becomes probabilistic.
7 Introduction / The Johnson-Lindenstrauss Lemma: Johnson-Lindenstrauss in practice
Theorem (standard version). Suppose $u_1, \ldots, u_n \in \mathbb{R}^d$. Then, for $\epsilon > 0$ and $k = O(\beta \log n/\epsilon^2)$, the mapping $f(u_i) = \frac{1}{\sqrt{k}} u_i R$, where $R$ is a $d \times k$ matrix of i.i.d. Gaussian variables, satisfies, with probability at least $1 - \frac{1}{n^\beta}$,
$(\forall i, j)\quad (1-\epsilon)\|u_i - u_j\|_2 \le \|f(u_i) - f(u_j)\|_2 \le (1+\epsilon)\|u_i - u_j\|_2.$
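A minimal numpy sketch of this mapping; the function name jl_gaussian and the leading constant 4 in the choice of k are illustrative assumptions, not part of the stated theorem.

```python
import numpy as np

def jl_gaussian(X, eps=0.1, beta=1.0, seed=None):
    """Project the rows of X (n x d) via a scaled i.i.d. Gaussian matrix."""
    n, d = X.shape
    # k = O(beta * log n / eps^2); the constant 4 is an arbitrary illustrative choice
    k = int(np.ceil(4 * beta * np.log(n) / eps ** 2))
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((d, k))   # d x k matrix of i.i.d. N(0, 1) entries
    return (X @ R) / np.sqrt(k)       # row i is f(u_i) = u_i R / sqrt(k)
```

With high probability, all pairwise distances between the output rows are within a $(1 \pm \epsilon)$ factor of the originals.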
8 Introduction / Speeding up computation: Achlioptas' improvement
Achlioptas [1] gave an even simpler matrix construction:
$R_{ij} = \sqrt{3} \times \begin{cases} +1 & \text{with probability } 1/6 \\ 0 & \text{with probability } 2/3 \\ -1 & \text{with probability } 1/6 \end{cases}$
This is two-thirds sparse, and simpler to construct than a Gaussian matrix, with no loss in accuracy!
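A sketch of the same pipeline with Achlioptas' sparse matrix; the $\sqrt{3}$ rescaling keeps the entries unit-variance, and the $1/\sqrt{k}$ output scaling matches the Gaussian version above.

```python
import numpy as np

def jl_achlioptas(X, k, seed=None):
    """JL projection with the database-friendly {+1, 0, -1} matrix."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Entries: sqrt(3) * (+1 w.p. 1/6, 0 w.p. 2/3, -1 w.p. 1/6)
    R = np.sqrt(3) * rng.choice([1.0, 0.0, -1.0], size=(d, k), p=[1/6, 2/3, 1/6])
    return (X @ R) / np.sqrt(k)
```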
9 Introduction / Speeding up computation: A question
Two-thirds sparsity is a good speedup in practice, but the density is still O(dk), so computing the mapping is still an O(dk) operation asymptotically. Let
$\mathcal{A} = \{A : \text{for all unit } x \in \mathbb{R}^d, \text{ with v.h.p., } (1-\epsilon) \le \|Ax\|_2 \le (1+\epsilon)\}$
Question: for which $A \in \mathcal{A}$ can $Ax$ be computed quicker than O(dk)?
10 Introduction / Speeding up computation: The answer?
We look at two approaches that allow for quicker computation. First is the Fast Johnson-Lindenstrauss transform, based on a Fourier transform. Next is the Ailon-Liberty transform, based on a Fourier transform and error-correcting codes!
11 The Fast Johnson-Lindenstrauss Transform
Ailon and Chazelle [2] proposed the Fast Johnson-Lindenstrauss transform. It can speed up $\ell_2$ reduction from O(dk) to (roughly) O(d log d). How? Make the projection matrix even sparser; we need some tricks to solve the problems associated with this. Let's reverse engineer the construction...
12 The Fast Johnson-Lindenstrauss Transform / Sparser projections: Sparser projection matrix
Use the projection matrix
$P_{ij} = \begin{cases} N\left(0, \frac{1}{q}\right) & \text{with probability } q \\ 0 & \text{with probability } 1-q \end{cases}$
where $q = \min\left\{\Theta\left(\frac{\log^2 n}{d}\right), 1\right\}$.
The density of the matrix is $O\left(\frac{1}{\epsilon^2} \min\{\log^3 n, d \log n\}\right)$. In practice, this is typically significantly sparser than Achlioptas' matrix.
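A sketch of drawing this matrix, assuming (for illustration) a hidden constant of 1 inside the $\Theta(\cdot)$ defining q:

```python
import numpy as np

def sparse_gaussian_projection(d, k, n, seed=None):
    """Draw the k x d matrix P: N(0, 1/q) entries w.p. q, zero otherwise."""
    rng = np.random.default_rng(seed)
    q = min(np.log(n) ** 2 / d, 1.0)      # q = min(Theta(log^2 n / d), 1)
    support = rng.random((k, d)) < q      # Bernoulli(q) mask of nonzero entries
    values = rng.normal(0.0, np.sqrt(1.0 / q), size=(k, d))
    return np.where(support, values, 0.0)
```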
13 The Fast Johnson-Lindenstrauss Transform / Trouble with sparse vectors?: What do we lose?
We can follow standard concentration-proof methods, but we end up needing to assume that x is bounded, namely that its information is spread out. We fail on vectors like $x = (1, 0, \ldots, 0)$; i.e. sparse data and a sparse projection don't mix well. So are we forced to choose between generality and usefulness? Not if we try to insert randomness...
15 The Fast Johnson-Lindenstrauss Transform / Trouble with sparse vectors?: A clever idea
Can we randomly transform x so that $\|\Phi(x)\|_2 = \|x\|_2$, and $\|\Phi(x)\|_\infty$ is bounded with v.h.p.?
Answer: yes! Use a Fourier transform, $\Phi = F$. It is distance preserving, and it has an uncertainty principle: a signal and its Fourier transform cannot both be concentrated. Use the FFT to give an O(d log d) random mapping. Details on the specifics in the next section...
17 The Fast Johnson-Lindenstrauss Transform / Trouble with sparse vectors?: Applying a Fourier transform
The Fourier transform will guarantee that a concentrated input is spread out: if $x$ is supported on only a few components, then $\|Fx\|_\infty = o(1)$. But now we will be in trouble if the input is uniformly distributed! To deal with this, do a random sign change: $x' = Dx$, where $D$ is a random diagonal $\pm 1$ matrix. Now we get a guarantee of spread with high probability, so the random Fourier transform gives us back generality.
18 The Fast Johnson-Lindenstrauss Transform / Trouble with sparse vectors?: Random sign change
The sign change mapping $Dx$ will give us
$x' = \begin{bmatrix} d_1 x_1 \\ d_2 x_2 \\ \vdots \\ d_d x_d \end{bmatrix} = \begin{bmatrix} \pm x_1 \\ \pm x_2 \\ \vdots \\ \pm x_d \end{bmatrix}$
where each $\pm$ is attained with equal probability. This is clearly norm preserving.
19 The Fast Johnson-Lindenstrauss Transform / Trouble with sparse vectors?: Putting it together
So, we compute the mapping $f : x \mapsto PF(Dx)$. The runtime will be
$O\left(d \log d + \min\left\{\frac{d \log n}{\epsilon^2}, \frac{\log^3 n}{\epsilon^2}\right\}\right)$
Under some loose conditions, the runtime is $O(\max\{d \log d, k^3\})$. If $k \in [\Omega(\log d), O(\sqrt{d})]$, this is quicker than the O(dk) simple mapping. In practice, the upper bound is reasonable; the lower bound might not be, though.
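A sketch of the whole mapping, using the Walsh-Hadamard matrix as the Fourier step F (applied here as a dense multiply for brevity; the fast O(d log d) version appears later) and scaled so that $\mathbb{E}\|f(x)\|^2 = \|x\|^2$. The constants are illustrative, and d must be a power of 2.

```python
import numpy as np
from scipy.linalg import hadamard

def fjlt(x, k, n, seed=None):
    """Sketch of f : x -> P F (D x) for a single vector x of length d."""
    d = x.shape[0]
    rng = np.random.default_rng(seed)
    Dx = rng.choice([-1.0, 1.0], size=d) * x    # random sign change D
    HDx = hadamard(d) @ Dx / np.sqrt(d)         # norm-preserving Fourier step F
    q = min(np.log(n) ** 2 / d, 1.0)            # sparsity parameter
    support = rng.random((k, d)) < q
    P = np.where(support, rng.normal(0.0, np.sqrt(1.0 / q), (k, d)), 0.0)
    return P @ HDx / np.sqrt(k)                 # scaling gives E||f(x)||^2 = ||x||^2
```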
20 The Fast Johnson-Lindenstrauss Transform / Summary: Summary
We tried increasing sparsity with disregard for generality, then used randomization to get back generality (probabilistically). The key ingredient was a Fourier transform, with a randomization step first.
21 Ailon and Liberty [3] improved the runtime from O(d log d) to O(d log k), for $k = O(d^{1/2-\delta})$, $\delta > 0$.
Idea: sparsity isn't the only way to speed up computation time. We can also speed up the runtime when the projection matrix has a special structure. So find a matrix with a convenient structure which will satisfy the JL property.
22 Operator norm
We need something called the operator norm in our analysis. The operator norm of a transformation matrix A is
$\|A\|_{p \to q} = \sup_{\|x\|_p = 1} \|Ax\|_q$
i.e. the maximal q-norm of the transformation of unit $\ell_p$-norm points. A fact we will need to employ:
$\|A\|_{p_1 \to p_2} = \|A^T\|_{q_2 \to q_1}, \quad \text{where } \frac{1}{p_1} + \frac{1}{q_1} = 1 \text{ and } \frac{1}{p_2} + \frac{1}{q_2} = 1.$
23 Reverse engineering
Let's say the mapping is a matrix multiplication. In particular, say we have a mapping of the form $f : x \mapsto BDx$, where B is some k × d matrix with unit columns, and D is a diagonal matrix whose entries are randomly ±1 (doing a random sign change again). Now we just need to see what properties B must satisfy in order for $\|BDx\|_2 \approx \|x\|_2$.
24 Bounding the mapping: Bounding the mapping
It is easy to see that
$BDx = \begin{bmatrix} B_{11} d_1 x_1 + \cdots + B_{1d} d_d x_d \\ \vdots \\ B_{k1} d_1 x_1 + \cdots + B_{kd} d_d x_d \end{bmatrix}$
Write this as $BDx = Mz$, where $M^{(i)} = x_i B^{(i)}$ (the i-th column of M is $x_i$ times the i-th column of B) and $z = [d_1 \, \ldots \, d_d]^T$. There is a special name for a vector like Mz...
25 Bounding the mapping: Rademacher series
Definition. If M is an arbitrary k × d real matrix, and $z \in \mathbb{R}^d$ is such that
$z_i = \begin{cases} +1 & \text{with probability } 1/2 \\ -1 & \text{with probability } 1/2 \end{cases}$
then Mz is called a Rademacher random variable. This is a vector whose entries are arbitrary sums/differences of the entries in the rows of M. Such a variable is interesting because of a powerful theorem...
26 Bounding the mapping: Talagrand's theorem
Theorem. Suppose M, z are as above. Let $Z = \|Mz\|_p$, and let
$\sigma = \|M\|_{2 \to p}, \qquad \mu = \mathrm{median}(Z).$
Then $\Pr[|Z - \mu| > t] \le 4 e^{-t^2/8\sigma^2}$ (see [6]).
Here σ (the "deviation") is the maximal p-norm of the image of the Euclidean unit sphere. The theorem says that the norm of a Rademacher variable is sharply concentrated about its median.
27 Bounding the mapping: Implications for us
Our mapping, BDx, has given us a Rademacher random variable. We know that we can apply Talagrand's theorem to get a concentration result. So, all we need to do is find out what the median and deviation are...
28 Bounding the mapping: Deviation
Let $Y = \|BDx\|_2 = \|Mz\|_2$. The deviation is
$\sigma = \sup_{\|y\|_2 = 1} \|y^T M\|_2 = \sup_{\|y\|_2 = 1} \left(\sum_{i=1}^d x_i^2 (y^T B^{(i)})^2\right)^{1/2} \le \|x\|_4 \sup_{\|y\|_2 = 1} \left(\sum_{i=1}^d (y^T B^{(i)})^4\right)^{1/4}$
by Cauchy-Schwarz, and the last supremum is exactly $\|B^T\|_{2 \to 4}$, so $\sigma \le \|x\|_4 \|B^T\|_{2 \to 4}$.
29 Bounding the mapping: What do we need?
So, $\sigma \le \|x\|_4 \|B^T\|_{2 \to 4}$. Fact: $|1 - \mu| \le 32\sigma$. We can combine these to get
$\Pr[|Y - 1| > t] \le c_0 e^{-c_1 t^2 / (\|x\|_4^2 \|B^T\|_{2 \to 4}^2)}$
Result: we need to control both $\|x\|_4$ and $\|B^T\|_{2 \to 4}$, i.e. we want them both to be small. If we manage this, we've got our concentration bound.
31 Bounding the mapping: The two ingredients
To get the concentration bound, we need to ensure that $\|x\|_4$ and $\|B^T\|_{2 \to 4}$ are sufficiently small.
How to control $\|x\|_4$? Use repeated Fourier/Walsh-Hadamard transforms.
How to control $\|B^T\|_{2 \to 4}$? Use error-correcting codes.
34 The Walsh-Hadamard transform: Controlling $\|x\|_4$
Problem: the input x is adversarial, so how do we make $\|x\|_4$ small?
Solution: use an isometric mapping Φ, with a guarantee that $\|\Phi x\|_4$ is small with very high probability.
Problem: what is such a Φ?
(Final!) Solution: back to the Fourier transform!
38 The Walsh-Hadamard transform: The Discrete Fourier transform
The discrete Fourier transform of $\{a_0, a_1, \ldots, a_{N-1}\}$ is
$\hat{a}_k = \sum_{n=0}^{N-1} a_n e^{2\pi i k n/N} = \sum_{n=0}^{N-1} a_n \left(e^{2\pi i k/N}\right)^n$
We can think of it as a polynomial evaluation: if $P(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{N-1} x^{N-1}$, then $\hat{a}_k = P\left(e^{2\pi i k/N}\right)$.
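A quick numerical check of this polynomial-evaluation view. Note that numpy's FFT uses the exponent $-2\pi i k n/N$, so we evaluate at the conjugate roots of unity:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])            # coefficients a_0, ..., a_{N-1}
N = len(a)
w = np.exp(-2j * np.pi * np.arange(N) / N)    # one evaluation point per index k
# np.polyval expects the highest-degree coefficient first, hence a[::-1]
poly_eval = np.array([np.polyval(a[::-1], wk) for wk in w])
assert np.allclose(poly_eval, np.fft.fft(a))  # P(w^k) matches the FFT output
```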
39 The Walsh-Hadamard transform: The finite-field Fourier transform
Notice that $\omega = e^{2\pi i/N}$ satisfies $\omega^N = 1$: it is a primitive N-th root of 1. The transform $\hat{a}_k = P(\omega^k)$ makes sense for any primitive root ω.
40 The Walsh-Hadamard transform: The multi-dimensional Fourier transform
We can also consider the transform of multi-dimensional data.
1-D case: $\hat{a}_k = \sum_{n=0}^{N-1} a_n \omega^{kn}$
$\upsilon$-D case: if $n = (n_1, \ldots, n_\upsilon)$ and $k = (k_1, \ldots, k_\upsilon)$, then $\hat{a}_k = \sum_{n_1, \ldots, n_\upsilon = 0}^{N-1} a_n \omega^{k \cdot n}$
41 The Walsh-Hadamard transform: The Walsh-Hadamard transform
Consider the case N = 2, ω = -1 [7]:
$\hat{a}_{k_1, k_2} = \sum_{n_1, n_2 = 0}^{1} a_{n_1, n_2} (-1)^{k_1 n_1 + k_2 n_2}$
This is called the Walsh-Hadamard transform. Intuition: instead of using sinusoidal basis functions, use square-wave functions; the square waves are called Walsh functions. Why not the standard discrete FT? We use a technical property of the Walsh-Hadamard transform matrix...
42 The Walsh-Hadamard transform: Fourier transform on the binary hyper-cube
Suppose we work with $\mathbb{F}_2 = \{0, 1\}$. We can encode the Fourier transform with the Walsh-Hadamard matrix $H_d$,
$H_d(i, j) = \frac{1}{\sqrt{d}} (-1)^{\langle i-1, j-1 \rangle}$
where $\langle i, j \rangle$ is the dot-product of i and j as expressed in binary.
Fact: $H_d = \frac{1}{\sqrt{2}} \begin{bmatrix} H_{d/2} & H_{d/2} \\ H_{d/2} & -H_{d/2} \end{bmatrix}$
Corollary: we can compute $H_d x$ in O(d log d) time.
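A sketch of the resulting O(d log d) algorithm, mirroring the recursive block structure above in iterative form (d must be a power of 2):

```python
import numpy as np

def fwht(x):
    """Normalized fast Walsh-Hadamard transform: computes H_d x in O(d log d)."""
    y = np.asarray(x, dtype=float).copy()
    d = len(y)
    h = 1
    while h < d:
        # Butterfly step for one level of the [[H, H], [H, -H]] recursion
        for i in range(0, d, 2 * h):
            a = y[i:i + h].copy()
            b = y[i + h:i + 2 * h].copy()
            y[i:i + h] = a + b
            y[i + h:i + 2 * h] = a - b
        h *= 2
    return y / np.sqrt(d)  # one 1/sqrt(2) per level makes the transform orthonormal
```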
43 The Walsh-Hadamard transform: Example of a Hadamard matrix
When d = 4, we get
$H_4 = \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}$
Note that the entries (up to normalization) are always ±1.
44 The Walsh-Hadamard transform: Fourier again?
Let $\Phi : x \mapsto H_d D_0 x$, with $D_0$ as before a random diagonal ±1 matrix. We already know that it will preserve the $\ell_2$ norm. But is $\|\Phi(x)\|_4$ small?
Answer: yes, by another application of Talagrand's theorem!
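Reusing the fwht sketch from above, a quick check that Φ spreads out the worst-case sparse input while preserving its length:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1024
x = np.zeros(d)
x[0] = 1.0                            # the problematic input (1, 0, ..., 0)
D0 = rng.choice([-1.0, 1.0], size=d)  # random diagonal sign change
y = fwht(D0 * x)                      # Phi(x) = H_d D_0 x
print(np.linalg.norm(y))              # l2 norm preserved: 1.0
print(np.linalg.norm(y, ord=4))       # l4 norm drops to d**(-1/4) ~ 0.177
```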
46 The Walsh-Hadamard transform: Towards Talagrand
We need σ, µ for Talagrand's theorem. Write $\Phi(x) = Mz$ as before, where $M^{(i)} = x_i H^{(i)}$. Estimate the deviation:
$\sigma = \|M\|_{2 \to 4} = \|M^T\|_{4/3 \to 2} \quad \text{(from the earlier fact)}$
$= \sup_{\|y\|_{4/3} = 1} \left(\sum_i x_i^2 (y^T H^{(i)})^2\right)^{1/2} \le \|x\|_4 \sup_{\|y\|_{4/3} = 1} \left(\sum_i (y^T H^{(i)})^4\right)^{1/4} = \|x\|_4 \|H\|_{4/3 \to 4}$
by Cauchy-Schwarz.
47 The Walsh-Hadamard transform: Some magic
We now employ the following theorem [4].
Theorem (Hausdorff-Young). For any $p \in [1, 2]$, if H is the (normalized) Hadamard matrix and $\frac{1}{p} + \frac{1}{q} = 1$, then
$\|H\|_{p \to q} \le d^{\frac{1}{2} - \frac{1}{p}}$
As a result, for $p = \frac{4}{3}$,
$\sigma \le \|x\|_4 \cdot d^{-1/4}$
Further, we have the following fact (see [3] for the proof!): $\mu = O\left(\frac{1}{d^{1/4}}\right)$.
48 The Walsh-Hadamard transform: Getting the desired result
With the above σ, µ, an application of Talagrand, along with the assumption $k = O(d^{1/2-\delta})$, reveals
$\|HD_0 x\|_4 \le c_0 d^{-1/4} + c_1 d^{-\delta/2} \|x\|_4$
If we compose the mapping,
$\|HD_1(HD_0 x)\|_4 \le c_0 d^{-1/4} + c_0 c_1 d^{-1/4 - \delta/2} + c_1^2 d^{-\delta} \|x\|_4$
If we repeat this $r = \frac{1}{2\delta}$ times,
$\|HD_{r-1} HD_{r-2} \cdots HD_0 x\|_4 = O\left(d^{-1/4}\right)$
49 The Walsh-Hadamard transform: Our resultant transform
To control $\|x\|_4$, use the composed transform
$\Phi^{(r)} : x \mapsto HD_{r-1} HD_{r-2} \cdots HD_0 x$
We manage to preserve $\|x\|_2$, and contract $\|\Phi^{(r)} x\|_4$. The runtime is $O\left(\frac{d \log d}{\delta}\right)$.
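A sketch of the composed transform, again reusing the fwht helper; each round draws a fresh diagonal sign matrix:

```python
import numpy as np

def phi_r(x, r, seed=None):
    """Composed transform Phi^(r) : x -> H D_{r-1} ... H D_0 x."""
    rng = np.random.default_rng(seed)
    y = np.asarray(x, dtype=float)
    for _ in range(r):                 # r ~ 1/(2*delta) rounds
        signs = rng.choice([-1.0, 1.0], size=len(y))
        y = fwht(signs * y)            # one H D_i application, O(d log d)
    return y
```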
50 Error-correcting codes: Error-correcting codes
The Hadamard matrix also has a connection to error-correcting codes. Such codes look to represent one's message in such a way that it can be decoded correctly even if there are some errors during transmission. Suppose we want to send out a message to a decoder in a way that allows for at most d errors, i.e. we can recover from d or fewer errors in the transmission. Fact: by choosing our code-words from the matrix $H_{2d}$, we can correct up to d errors.
51 Error-correcting codes: Code matrix
An m × d matrix A is called a code matrix if
$A = \sqrt{\frac{d}{m}} \begin{bmatrix} H_d(i_1, :) \\ H_d(i_2, :) \\ \vdots \\ H_d(i_m, :) \end{bmatrix}$
i.e. we pick out only m of the d rows of the Hadamard matrix.
52 Error-correcting codes: Independence in codes
A code matrix is called a-wise independent if, for any a rows and any pattern of signs, exactly $d/2^a$ of the columns agree with the pattern in those a places. Independence is very useful for us:
Theorem. Suppose B is a k × d, 4-wise independent code matrix. Then
$\|B^T\|_{2 \to 4} = O\left(\frac{d^{1/4}}{\sqrt{k}}\right)$
53 Error-correcting codes: Proof of theorem
Recall that we need to bound $\|B^T\|_{2 \to 4} = \sup_{\|y\|_2 = 1} \|y^T B\|_4$. Consider:
$\|y^T B\|_4^4 = d \, \mathbb{E}\left[(y^T B_{(j)})^4\right] = \frac{d}{k^2} \sum_{i_1, i_2, i_3, i_4} \mathbb{E}\left[y_{i_1} y_{i_2} y_{i_3} y_{i_4} b_{i_1} b_{i_2} b_{i_3} b_{i_4}\right] = \frac{d}{k^2} \left(3\|y\|_2^4 - 2\|y\|_4^4\right) \le \frac{3d}{k^2}$
Consequently,
$\|B^T\|_{2 \to 4} \le \frac{(3d)^{1/4}}{\sqrt{k}}$
54 Error-correcting codes: Making our matrix
We're set if we can get a k × d, 4-wise independent code matrix. Problem: how do we make such a matrix?
Fact: there exists a 4-wise independent code matrix of size k × BCH(k), where BCH(k) = Θ(k²). It is called the BCH code matrix. Which is good, because...
Fact: by padding and copy-pasting, we retain independence. In particular, we can construct a k × d matrix from a k × BCH(k) matrix (see the sketch below):
$B = [\underbrace{B_{BCH} \; B_{BCH} \; \ldots \; B_{BCH}}_{d/BCH(k) \text{ copies}}]$
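A sketch of just the padding/copy-paste step; constructing the 4-wise independent base matrix from a dual BCH code is not reproduced here, so B_bch is assumed given:

```python
import numpy as np

def pad_and_copy(B_bch, d):
    """Tile a k x BCH(k) code matrix into a k x d matrix by concatenation."""
    k, m = B_bch.shape
    assert d % m == 0, "assumes the block width BCH(k) divides d"
    return np.tile(B_bch, d // m)      # B = [B_bch B_bch ... B_bch]
```

Tiling copies whole columns, so the columns stay unit-norm and the independence property of the base matrix is retained.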
57 Error-correcting codes: Time to apply the matrix
What is the time to compute the mapping $x \mapsto Bx$? We have to do $\frac{d}{BCH(k)}$ mappings $B_{BCH} x_{BCH}$, one per block of x. Each such mapping can be done via a Walsh-Hadamard transform, by the construction of BCH codes, and so takes time $O(BCH(k) \log BCH(k))$. The total runtime is therefore O(d log k).
58 Putting it together: Merging results
Use the randomized Fourier transform to keep $\|x\|_4$ small: O(d log d) time. Use the error-correcting code matrix to keep $\|B^T\|_{2 \to 4}$ small: O(d log k) time.
Result: we get the concentration bound!
59 Putting it together: Runtime
The runtime is still going to be O(d log d). Question: can we speed up the computation of $\Phi^{(r)}$?
Answer: yes - use the same block idea as with the error-correcting codes. Some rather technical calculation reveals that this will still work.
61 Putting it together: Blocked transform
Choose $\beta = BCH(k) \cdot k^\delta = \Theta(k^{2+\delta})$. Let
$H' = \begin{bmatrix} H_1 & & & \\ & H_2 & & \\ & & \ddots & \\ & & & H_{d/\beta} \end{bmatrix}$
where each $H_i$ is a Walsh-Hadamard matrix of size β × β.
Fact: the above mapping can replace $\Phi^{(r)}$. The mapping $H'Dx$ can be computed in time O(d log k), so our total runtime is O(d log k).
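A sketch of applying the block-diagonal transform, one β-point FWHT per block (assumes β is a power of 2 dividing len(x), and reuses the fwht helper from above):

```python
import numpy as np

def blocked_transform(x, beta, seed=None):
    """Apply an independent sign flip and a beta-point FWHT to each block."""
    rng = np.random.default_rng(seed)
    y = np.asarray(x, dtype=float).copy()
    for i in range(0, len(y), beta):   # d/beta blocks
        signs = rng.choice([-1.0, 1.0], size=beta)
        y[i:i + beta] = fwht(signs * y[i:i + beta])
    return y                           # O(d log beta) = O(d log k) total
```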
62 Putting it together: A tabular comparison
Runtimes of the three approaches (standard JL, Fast JLT, and Ailon-Liberty), ordered fastest to slowest in each regime (from [3]):
- $k = o(\log d)$: AL, JL, FJLT
- $k \in [\omega(\log d), o(\mathrm{poly}(d))]$: AL, FJLT, JL
- $k \in [\Omega(\mathrm{poly}(d)), o((d \log d)^{1/3})]$: AL and FJLT (tied), JL
- $k \in [\omega((d \log d)^{1/3}), O(d^{1/2-\delta})]$: AL, FJLT, JL
63 Putting it together: Conclusion
$\ell_2$ dimensionality reduction is based on the Johnson-Lindenstrauss lemma. The standard approach takes O(dk) time to perform the reduction. By sparsifying, and compensating with a randomized Fourier transform, we can reduce the runtime to roughly O(d log d) via the Fast Johnson-Lindenstrauss transform [2]. By using error-correcting codes and a randomized Fourier transform, we can reduce the runtime to roughly O(d log k) via Ailon and Liberty's transform [3].
Open questions: can one extend this to $k = O(d^{1-\delta})$? To $k = \Omega(d)$?
64 References
[1] Achlioptas, D. Database-friendly random projections. In PODS '01: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (New York, NY, USA, 2001), ACM Press.
[2] Ailon, N., and Chazelle, B. Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform. In STOC '06: Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing (New York, NY, USA, 2006), ACM Press.
[3] Ailon, N., and Liberty, E. Fast dimension reduction using Rademacher series on dual BCH codes. Tech. Rep. TR07-070, Electronic Colloquium on Computational Complexity, 2007.
[4] Bergh, J., and Löfström, J. Interpolation Spaces. Springer-Verlag, 1976.
[5] Johnson, W., and Lindenstrauss, J. Extensions of Lipschitz mappings into a Hilbert space. In Conference in Modern Analysis and Probability (Providence, RI, USA, 1984), American Mathematical Society.
[6] Ledoux, M., and Talagrand, M. Probability in Banach Spaces: Isoperimetry and Processes. Springer, 1991.
[7] Massey, J. L. Design and analysis of block ciphers. dabcmay2000f3.pdf, May 2000. Presentation.