Fast Dimension Reduction


1 Fast Dimension Reduction
Nir Ailon (Google Research) and Edo Liberty (Yale University)

2 Introduction
Lemma (Johnson, Lindenstrauss (1984)): A random projection $\Psi$ preserves all $\binom{n}{2}$ distances up to distortion $\varepsilon$ with constant probability if
$$k = \Omega\left(\frac{\log(n)}{\varepsilon^2}\right).$$
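The lemma can be tried out numerically. The following is a minimal sketch, assuming a dense i.i.d. Gaussian matrix as the random projection; the constant 8 in the choice of k, the sizes n and d, and the helper names are illustrative assumptions, not values from the talk.

```python
import numpy as np

def gaussian_projection(X, k, rng):
    """Project the rows of X (n x d) to k dimensions with an i.i.d. Gaussian matrix."""
    d = X.shape[1]
    Psi = rng.standard_normal((k, d)) / np.sqrt(k)  # scaled so E||Psi x||^2 = ||x||^2
    return X @ Psi.T

def pairwise_distances(X):
    """All n-choose-2 Euclidean distances between the rows of X."""
    diffs = X[:, None, :] - X[None, :, :]
    D = np.linalg.norm(diffs, axis=2)
    i, j = np.triu_indices(X.shape[0], k=1)
    return D[i, j]

def max_distortion(X, Y):
    """Largest relative change of any pairwise distance after projection."""
    ratios = pairwise_distances(Y) / pairwise_distances(X)
    return np.max(np.abs(ratios - 1.0))

rng = np.random.default_rng(0)
n, d, eps = 50, 1000, 0.5
k = int(np.ceil(8 * np.log(n) / eps**2))  # the constant 8 is an illustrative choice
X = rng.standard_normal((n, d))
Y = gaussian_projection(X, k, rng)
worst = max_distortion(X, Y)
print(f"k = {k}, worst pairwise distortion = {worst:.3f}")
```

Note that k depends only on n and the distortion, not on the original dimension d.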

3 Applications
This idea is extremely useful in:
- Approximate nearest neighbor searches
- Linear embedding / dimensionality reduction
- Matrix rank-k approximation
- $\ell_p$ regression
- Compressed sensing
and the list continues...

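4 JL Property definition
$\Pr[\text{distortion}] \le 1/n^2$ is enough for embedding an $n$-point metric (by a union bound over the $\binom{n}{2}$ pairs).
Definition: A distribution $\mathcal{D}$ over $k \times d$ matrices exhibits the JL property if, for $\|x\|_2 = 1$ and $0 < \varepsilon < 1/2$,
$$\Pr_{\Psi \sim \mathcal{D}}\left[\, \left| \|\Psi x\|_2 - 1 \right| > \varepsilon \,\right] \le c_1 e^{-c_2 k \varepsilon^2}.$$
Taking $k = c \log(n)/\varepsilon^2$ then gives $\Pr[\text{distortion}] \le 1/n^2$.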

5 Unstructured constructions
Other JL distributions, with $\Psi(i,j)$ being i.i.d. random variables:
- Frankl and Maehara 1987
- Indyk and Motwani 1998: $\Psi(i,j) \sim N(0,1)$
- DasGupta and Gupta 1999
- Achlioptas 2003: $\Psi(i,j) \in \{0, -1, 1\}$
- Matousek 2006: $\Psi(i,j)$ symmetrically subgaussian distributed
However, these matrices are:
- Slow to apply: $O(kd)$
- Heavy to store: $O(kd)$
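As an illustration of a dense construction and its $O(kd)$ cost, here is a minimal sketch of a sign matrix in the style of Achlioptas. The probabilities (1/6, 2/3, 1/6) and the $\sqrt{3/k}$ scaling are the commonly cited choice and are stated here as an assumption, since the slide lists only the value set; the function name is hypothetical.

```python
import numpy as np

def achlioptas_matrix(k, d, rng):
    """k x d matrix with i.i.d. entries in {+1, 0, -1} drawn with
    probabilities (1/6, 2/3, 1/6), scaled so that E||Psi x||^2 = ||x||^2."""
    vals = rng.choice([1.0, 0.0, -1.0], size=(k, d), p=[1 / 6, 2 / 3, 1 / 6])
    return np.sqrt(3.0 / k) * vals

rng = np.random.default_rng(1)
k, d = 200, 4096
Psi = achlioptas_matrix(k, d, rng)

x = rng.standard_normal(d)
x /= np.linalg.norm(x)          # unit vector, as in the JL property definition

y = Psi @ x                     # dense product: about k*d multiply-adds
norm_y = np.linalg.norm(y)
storage_entries = Psi.size      # k*d stored entries
print(f"||Psi x||_2 = {norm_y:.3f}, stored entries = {storage_entries}")
```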

6 Fast Johnson Lindenstrauss Transform
The first JL distribution for efficiently applicable matrices.
Lemma (Ailon, Chazelle (2006)): The distribution $\Psi = PHD$ exhibits the JL property, and requires:
- $O(d \log(d) + k^3)$ operations to apply
- $O(d + k^3 \log(d))$ to store
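The slide does not spell out the three factors. The sketch below assumes the usual reading: D is a diagonal matrix of random signs, H is the normalized Walsh-Hadamard transform, and P is a sparse matrix whose entries are Gaussian with probability q and zero otherwise. The density q = 0.1, the sizes, and the function names are illustrative assumptions. P is stored densely here for clarity, so this sketch does not reproduce the running time claimed in the lemma.

```python
import numpy as np

def fwht(v):
    """Normalized fast Walsh-Hadamard transform; len(v) must be a power of two."""
    v = np.array(v, dtype=float)
    d = v.size
    h = 1
    while h < d:
        v = v.reshape(-1, 2, h)                      # blocks of size 2h, split in halves
        v = np.stack([v[:, 0, :] + v[:, 1, :],
                      v[:, 0, :] - v[:, 1, :]], axis=1)
        h *= 2
    return v.reshape(d) / np.sqrt(d)

def fjlt(x, k, q, rng):
    """Apply a PHD-style transform to x (assumed structure, see lead-in)."""
    d = x.size
    signs = rng.choice([-1.0, 1.0], size=d)          # diagonal of D
    u = fwht(signs * x)                              # H D x in O(d log d) operations
    mask = rng.random((k, d)) < q                    # sparsity pattern of P
    P = np.where(mask, rng.standard_normal((k, d)) / np.sqrt(q), 0.0)
    return (P @ u) / np.sqrt(k)                      # scaled so E||y||^2 = ||x||^2

rng = np.random.default_rng(7)
x = np.zeros(1024)
x[0] = 1.0                                           # a maximally "spiky" unit vector
y = fjlt(x, k=256, q=0.1, rng=rng)
print(f"||PHDx||_2 = {np.linalg.norm(y):.3f}")
```

The role of HD is to spread the mass of x over all coordinates, so that a sparse P still sees a representative sample of the vector.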

7 Our motivation
The FJLT algorithm's running time is $O(d \log(d) + k^3)$.
1. When the target dimension is very small, $k = O(\log(d))$, the running time is $\Omega(kd)$.
2. When the target dimension is large, $k = \Omega((d \log(d))^{1/3})$, the running time is dominated by $k^3$.

8 This work, Faster Johnson Lindenstrauss Transform
Lemma (Ailon, Liberty (2008)): The distribution $\Psi = BD\Phi$ exhibits the JL property for $k = O(d^{1/2-\delta})$ and $\delta > 0$, and requires:
- $O(d \log(k))$ operations to apply
- $O(d)$ to store

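9 Performance summary
For each range of k, the algorithms are listed from slow (left) to fast (right).
k in $o(\log d)$: FJLT | JL | This work
k in $\omega(\log d)$ and $o(\mathrm{poly}(d))$: JL | FJLT | This work
k in $\Omega(\mathrm{poly}(d))$ and $o((d \log d)^{1/3})$: JL | This work, FJLT
k in $\omega((d \log d)^{1/3})$ and $O(d^{1/2-\delta})$: JL | FJLT | This work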
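The orderings in this summary can be sanity-checked by plugging numbers into the stated running times. The sketch below sets every hidden constant to 1, which is an assumption made only for illustration; real constants could shift the crossover points. The function names are hypothetical.

```python
import math

def nominal_costs(d, k):
    """Operation counts from the slides with all constants set to 1 (an assumption)."""
    return {
        "JL": k * d,                              # dense matrix-vector product
        "FJLT": d * math.log2(d) + k**3,          # O(d log d + k^3)
        "This work": d * math.log2(k),            # O(d log k)
    }

def slow_to_fast(d, k):
    """Algorithm names sorted from the largest nominal cost to the smallest."""
    costs = nominal_costs(d, k)
    return sorted(costs, key=costs.get, reverse=True)

d = 2**20
for k in (5, 500):
    print(k, slow_to_fast(d, k))
```

With d = 2^20, k = 5 falls in the first row of the summary and k = 500 in the last row.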

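10 Measure concentration
The randomness is only in $D$. Writing $BDx = Ms$, where $M = B\,\mathrm{diag}(x)$ and $s$ is the vector of random signs on the diagonal of $D$: how does $\|Ms\|_2$ concentrate around 1? (Here $\|x\|_2 = 1$.)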

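11 Sufficient conditions
Lemma (Talagrand): Denote by $\mu$ the median of $\|Ms\|_2$ and by $\sigma = \|M\|_{2\to 2}$ the spectral norm of $M$. Then
$$\Pr\left[\, \left| \|Ms\|_2 - \mu \right| > t \,\right] \le 4 e^{-t^2/8\sigma^2}.$$
By substitution we get the JL property for $BD$,
$$\Pr\left(\left| \|BDx\|_2 - 1 \right| > \varepsilon\right) \le c_1 e^{-c_2 k \varepsilon^2},$$
if:
- The columns of $B$ are normalized to 1.
- $\sigma = \|M\|_{2\to 2} = O(k^{-1/2})$.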
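The tail bound can be compared against a simulation. This is a minimal sketch in which M is an arbitrary fixed matrix and s is a vector of independent random signs; both choices, the sizes, and the use of the empirical median in place of the true median are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
k, d, trials = 20, 200, 5000

M = rng.standard_normal((k, d)) / np.sqrt(k * d)   # arbitrary fixed matrix (illustrative)
sigma = np.linalg.norm(M, 2)                       # spectral norm ||M||_{2->2}

S = rng.choice([-1.0, 1.0], size=(trials, d))      # each row is one random sign vector s
norms = np.linalg.norm(S @ M.T, axis=1)            # ||M s||_2 for every trial
mu = np.median(norms)                              # empirical median

def empirical_tail(t):
    """Fraction of trials with | ||Ms||_2 - mu | > t."""
    return float(np.mean(np.abs(norms - mu) > t))

def talagrand_bound(t):
    """Right-hand side 4 exp(-t^2 / (8 sigma^2)) of the lemma."""
    return float(4.0 * np.exp(-t**2 / (8.0 * sigma**2)))

ts = sigma * np.array([1.0, 2.0, 4.0, 6.0])
tails = [empirical_tail(t) for t in ts]
bounds = [talagrand_bound(t) for t in ts]
for t, e, b in zip(ts, tails, bounds):
    print(f"t = {t:.3f}: empirical tail {e:.4f}, bound {b:.4f}")
```

The bound is only informative once t is a few multiples of $\sigma$, which is why a small spectral norm is the quantity to control.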

12 Breaking down $\|M\|_{2\to 2}$
Using the definition of the spectral norm and Cauchy-Schwarz:
$$\|M\|_{2\to 2} \le \|B^T\|_{2\to 4}\, \|x\|_4.$$
The $\ell_2 \to \ell_4$ operator norm is defined as:
$$\|B^T\|_{2\to 4} = \max_{\|z\|_2 = 1} \|B^T z\|_4.$$
We seek a matrix $B$ with a low $\ell_2 \to \ell_4$ operator norm $\|B^T\|_{2\to 4}$ which is fast to apply.
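The inequality can be checked numerically under the assumption that $M = B\,\mathrm{diag}(x)$, which is the reading that makes it a Cauchy-Schwarz consequence: for any unit $y$, $\|y^T M\|_2^2 = \sum_j (y^T B^{(j)})^2 x_j^2 \le \|B^T y\|_4^2 \|x\|_4^2$. The sketch evaluates this at the top left singular vector of M, where the left side equals the spectral norm. The random sign matrix B and the sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
k, d = 8, 64

B = rng.choice([-1.0, 1.0], size=(k, d)) / np.sqrt(k)   # columns have unit norm
x = rng.standard_normal(d)
x /= np.linalg.norm(x)

M = B * x                                  # scales column j by x_j, i.e. B @ diag(x)
spectral = np.linalg.norm(M, 2)            # ||M||_{2->2}
x4 = np.sum(x**4) ** 0.25                  # ||x||_4

# At the top left singular vector y of M we have ||y^T M||_2 = ||M||_{2->2}.
U, _, _ = np.linalg.svd(M)
y = U[:, 0]
bty4 = np.sum((B.T @ y) ** 4) ** 0.25      # ||B^T y||_4, at most ||B^T||_{2->4}

print(f"||M|| = {spectral:.4f} <= ||B^T y||_4 ||x||_4 = {bty4 * x4:.4f}")
```

Since $\|B^T y\|_4$ is itself at most the operator norm $\|B^T\|_{2\to 4}$, the printed right-hand side is a lower estimate of the bound on the slide.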

13 Our choice for B
Lemma: $B$ is obtained from $k$ rows of a $\Theta(k^2) \times \Theta(k^2)$ Hadamard matrix. $B$ can be applied in $O(d \log(k))$ operations.
Lemma: Let $B$ be a 4-wise independent matrix (the columns of $B$ are the sample space). Then
$$\|B^T\|_{2\to 4} = O(d^{1/4} k^{-1/2}).$$

14 Controlling $\|x\|_4$
Since $\|B^T\|_{2\to 4} = O(d^{1/4} k^{-1/2})$, our goal
$$\|B^T\|_{2\to 4}\, \|x\|_4 = O(k^{-1/2})$$
is achieved if $\|x\|_4 = O(d^{-1/4})$.
Problem: $\|x\|_4$ might be as high as 1!
Solution: Preprocess $x$ using a random isometry $\Phi$ such that:
- $\|\Phi x\|_4 = O(d^{-1/4})$ w.h.p.
- $\Phi$ can be applied in $O(d \log(k))$ operations
- $\Phi$ requires $O(d)$ bits to store
Lemma: Such matrix distributions exist for $k = O(d^{1/2-\delta})$.
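To see why a random isometry helps, consider the worst case $x = e_1$. The sketch below uses a single randomized Hadamard step (random signs followed by a normalized Walsh-Hadamard matrix) as a stand-in for $\Phi$. This is an assumption: the talk does not give its construction of $\Phi$, and the dense matrix built here costs $O(d^2)$ to apply rather than $O(d \log k)$.

```python
import numpy as np

def hadamard(d):
    """Normalized d x d Walsh-Hadamard matrix (d a power of two), via Kronecker products."""
    H = np.array([[1.0]])
    H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
    while H.shape[0] < d:
        H = np.kron(H2, H)
    return H / np.sqrt(d)

def norm4(v):
    """The l4 norm of a vector."""
    return float(np.sum(np.abs(v) ** 4) ** 0.25)

rng = np.random.default_rng(4)
d = 256
x = np.zeros(d)
x[0] = 1.0                                   # worst case: ||x||_4 = 1

signs = rng.choice([-1.0, 1.0], size=d)
Phi = hadamard(d) * signs                    # H @ diag(signs): an isometry
z = Phi @ x

print(f"||x||_4 = {norm4(x):.3f}, ||Phi x||_4 = {norm4(z):.3f}, d^(-1/4) = {d ** -0.25:.3f}")
```

For this particular x every coordinate of $\Phi x$ has magnitude $d^{-1/2}$, so the $\ell_4$ norm drops from 1 to exactly $d^{-1/4}$ while the $\ell_2$ norm is unchanged.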

15 Recap and summary
Any matrix distribution $BD\Phi$ exhibits the JL property if:
- The columns of $B$ are normalized to 1.
- $\Phi$ is an isometry.
- $\|B^T\|_{2\to 4}\, \|\Phi x\|_4 \le O(k^{-1/2})$ w.h.p.
Our result: For $k = O(d^{1/2-\delta})$ there exist $B$ and $\Phi$ such that:
- $BD\Phi$ requires $O(d \log(k))$ operations to apply
- $BD\Phi$ requires $O(d)$ bits to store

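16 Future work and open questions
For each range of k, the algorithms are listed from slow (left) to fast (right). The slide adds a further column headed $O(d)$, which has no entries in any row.
k in $o(\log d)$: FJLT | JL | This work
k in $\omega(\log d)$ and $o(\mathrm{poly}(d))$: JL | FJLT | This work
k in $\Omega(\mathrm{poly}(d))$ and $o((d \log d)^{1/3})$: JL | This work, FJLT
k in $\omega((d \log d)^{1/3})$ and $O(d^{1/2-\delta})$: JL | FJLT | This work
k in $\omega(d^{1/2-\delta})$ and $\le d$: JL, FJLT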

17 Thank you

18 4-wise independence
Definition: $B$ is 4-wise independent if there are $d/16$ columns $j$ for which
$$\left(B^{(j)}_{i_1}, B^{(j)}_{i_2}, B^{(j)}_{i_3}, B^{(j)}_{i_4}\right) = (b_1, b_2, b_3, b_4)$$
for any choice of $1 \le i_1 < i_2 < i_3 < i_4 \le k$ and $(b_1, b_2, b_3, b_4) \in \{+1, -1\}^4$.
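The definition translates directly into a brute-force check. The sketch below is illustrative only: the function name is hypothetical, and the test matrix is the full sample space of all $2^k$ sign vectors (which is trivially 4-wise independent), not the code matrix used in the talk.

```python
import itertools
import numpy as np

def is_four_wise_independent(B):
    """True if, for every 4 rows of the +-1 matrix B, each of the 16 sign
    patterns appears in exactly d/16 columns."""
    k, d = B.shape
    if d % 16 != 0:
        return False
    for rows in itertools.combinations(range(k), 4):
        bits = (B[list(rows), :] > 0).astype(int)              # 4 x d matrix of 0/1
        codes = bits[0] * 8 + bits[1] * 4 + bits[2] * 2 + bits[3]
        counts = np.bincount(codes, minlength=16)              # occurrences per pattern
        if not np.all(counts == d // 16):
            return False
    return True

k = 5
# Columns are all 2^k sign vectors: every pattern on 4 rows appears 2^k / 16 times.
B_full = np.array(list(itertools.product([-1, 1], repeat=k))).T
B_bad = B_full.copy()
B_bad[1] = B_bad[0]                                            # two identical rows

print(is_four_wise_independent(B_full), is_four_wise_independent(B_bad))
```

The check runs over all $\binom{k}{4}$ row subsets, so it is meant for small examples only.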

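19 Bounding $\|B^T\|_{2\to 4}$
Lemma: Given a $k \times d$ 4-wise independent code matrix $B$,
$$\|B^T\|_{2\to 4} = O(d^{1/4} k^{-1/2}). \quad (1)$$
Proof. For $y \in \ell_2^k$, $\|y\| = 1$,
$$\|y^T B\|_4^4 = d\, \mathbb{E}_{j \in [d]}\left[(y^T B^{(j)})^4\right] = d k^{-2} \sum_{i_1, i_2, i_3, i_4 = 1}^{k} \mathbb{E}_{b_1, b_2, b_3, b_4}\left[y_{i_1} y_{i_2} y_{i_3} y_{i_4} b_1 b_2 b_3 b_4\right] = d k^{-2}\left(3\|y\|_2^4 - 2\|y\|_4^4\right) \le 3 d k^{-2}. \quad (2)$$
Taking fourth roots gives (1).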
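The key step of the proof is the fourth-moment identity $\mathbb{E}[(\sum_i y_i b_i)^4] = 3\|y\|_2^4 - 2\|y\|_4^4$ for 4-wise independent signs. The sketch below verifies it by exhaustive enumeration, assuming fully independent signs (which are in particular 4-wise independent) and taking B to be the full sample space with columns scaled to unit norm. The size k = 6 is an illustrative choice.

```python
import itertools
import numpy as np

k = 6
rng = np.random.default_rng(5)
y = rng.standard_normal(k)
y /= np.linalg.norm(y)                                   # unit vector in l_2^k

# All 2^k sign vectors: averaging over them is the exact expectation.
signs = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))
fourth_moment = float(np.mean((signs @ y) ** 4))
closed_form = float(3 * np.sum(y**2) ** 2 - 2 * np.sum(y**4))

# Matching matrix B: columns are the sign vectors scaled to unit norm.
d = signs.shape[0]
B = signs.T / np.sqrt(k)                                 # k x d
lhs = float(np.sum((y @ B) ** 4))                        # ||y^T B||_4^4
rhs = d / k**2 * closed_form

print(f"E[(y.b)^4] = {fourth_moment:.6f}, 3||y||_2^4 - 2||y||_4^4 = {closed_form:.6f}")
print(f"||y^T B||_4^4 = {lhs:.6f} <= 3 d / k^2 = {3 * d / k**2:.6f}")
```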

20 Trimming the Hadamard transform
Applying $k$ rows from a Hadamard matrix $H$ is like applying $PH$, where $P$ contains only $k$ entries which are 1 and the rest are zero.
$$P H_d z = \begin{pmatrix} P_1 & P_2 \end{pmatrix} \begin{pmatrix} H_{d/2} & H_{d/2} \\ H_{d/2} & -H_{d/2} \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = P_1 H_{d/2}(z_1 + z_2) + P_2 H_{d/2}(z_1 - z_2)$$
This gives the relation $T(d, k) = T(d/2, k_1) + T(d/2, k_2) + d$. Since $T(d, 1) = d$, by induction
$$T(d, k) \le 2d \log(k + 1) = O(d \log(k)). \quad (3)$$
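The recursion above can be written out directly. This is a minimal sketch using the unnormalized Sylvester ordering of the Hadamard matrix; the function names and the dictionary return type are illustrative choices. A branch of the recursion is skipped whenever none of the requested rows falls into it, which is what produces the $T(d, k)$ recurrence.

```python
import numpy as np

def hadamard(d):
    """Unnormalized d x d Walsh-Hadamard matrix (d a power of two), for reference."""
    H = np.array([[1.0]])
    H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
    while H.shape[0] < d:
        H = np.kron(H2, H)
    return H

def trimmed_hadamard(z, rows):
    """Return {r: (H_d z)[r] for r in rows} without computing the other coordinates."""
    d = z.size
    if d == 1:
        return {0: z[0]} if rows else {}
    half = d // 2
    z1, z2 = z[:half], z[half:]
    top = [r for r in rows if r < half]               # rows selected by P_1
    bottom = [r - half for r in rows if r >= half]    # rows selected by P_2
    out = {}
    if top:                                           # P_1 H_{d/2} (z_1 + z_2)
        out.update(trimmed_hadamard(z1 + z2, top))
    if bottom:                                        # P_2 H_{d/2} (z_1 - z_2)
        sub = trimmed_hadamard(z1 - z2, bottom)
        out.update({r + half: v for r, v in sub.items()})
    return out

rng = np.random.default_rng(6)
d = 64
z = rng.standard_normal(d)
rows = [3, 17, 40]
partial = trimmed_hadamard(z, rows)
full = hadamard(d) @ z
print({r: round(float(v), 4) for r, v in partial.items()})
```

With a single requested row, only one branch is followed at each level, so the work is $d/2 + d/4 + \dots$, in line with $T(d, 1) = d$.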
