Simple and Deterministic Matrix Sketches

Size: px

Start display at page:

Download "Simple and Deterministic Matrix Sketches"

Ilene Boyd
6 years ago
Views:

1 Simple and Deterministic Matrix Sketches Edo Liberty + ongoing work with: Mina Ghashami, Jeff Philips and David Woodruff. Edo Liberty: Simple and Deterministic Matrix Sketches 1 / 41

2 Data Matrices Often our data is represented by a matrix. Edo Liberty: Simple and Deterministic Matrix Sketches 2 / 41

3 Data Matrices which is often too large to work with on a single machine... Data Columns Rows d n sparse Textual Documents Words > yes Actions Users Types > 10 8 yes Visual Images Pixels, SIFT > 10 9 no Audio Songs, tracks Frequencies > 10 9 no ML Examples Features > 10 5 no Financial Prices Items, Stocks > 10 6 no We think of A R d n as n column vectors in R d and typically n d. Edo Liberty: Simple and Deterministic Matrix Sketches 3 / 41

4 Streaming Matrices Sometimes, we cannot store the entire matrix at all. Edo Liberty: Simple and Deterministic Matrix Sketches 4 / 41

5 Streaming Matrices Example: can we compute AA T from a stream of columns A i? (enough for PCA for example). Edo Liberty: Simple and Deterministic Matrix Sketches 5 / 41

6 Streaming Matrices Example: can we compute AA T from a stream of columns A i? (enough for PCA for example). n AA T = A i A T i Naïve solution i=1 Compute AA T in time O(nd 2 ) and space O(d 2 ). Think about 1Mp images, d = This solution requires operations per update and 1T space. Edo Liberty: Simple and Deterministic Matrix Sketches 5 / 41

7 Matrix Approximation Matrix sketching or approximation Efficiently compute a concisely representable matrix B such that B A or BB T AA T Working with B instead of A is often good enough. Dimension reduction Signal denoising Classification Regression Clustering Approximate matrix multiplication Reconstruction Recommendation... Edo Liberty: Simple and Deterministic Matrix Sketches 6 / 41

8 Matrix Approximation Column subset selection algorithms Paper Space Time Bound FKV04 O(k 4 /ε 6 max(k 4, ε 2 )) O(k 5 /ε 6 max(k 4, ε 2 )) P, εa DV06 #C = O(k/ε + k 2 log k) O(nnz(A)(k/ε + k 2 log k)+ P, εr O(n(k/ε + k 2 log k)) (n + d)(k 2 /ε 2 + k 3 log(k/ε) + k 4 log 2 k)) DKM06 #C = O(1/ε 2 ) O((n + 1/ε 2 )/ε 4 + nnz(a)) P, εl 2 LinearTimeSVD O((n + 1/ε 2 )/ε 4 ) #C = O(k/ε 2 ) O((k/ε 2 ) 2 (n + k/ε 2 ) + nnz(a)) P, εa O((k/ε 2 )(n + k/ε 2 )) DKM06 #C+R = O(1/ε 4 ) O((1/ε 12 + nk/ε 4 + nnz(a)) P, εl 2 ConstantTimeSVD O(1/ε 12 + nk/ε 4 ) #C+R = O(k 2 /ε 4 ) O(k 6 /ε 12 + nk 3 /ε 4 + nnz(a)) P, εa O(k 6 /ε 12 + nk 3 /ε 4 ) DMM08 #C =O(k 2 /ε 2 ) O(nd 2 ) C, εr CUR #R = O(k 4 /ε 6 ) MD09 #C = O(k log k/ε 2 ) O(nd 2 ) P O(k log k/ε 2 ), εr ColumnSelect O(nk log k/ε 2 ) BDM11 #C = 2k/ε(1 + o(1)) O((ndk + dk 3 )ε 2/3 ) P 2k/ε(1+o(1)), εr [Relative Errors for Deterministic Low-Rank Matrix Approximations, Ghashami, Phillips 2013] nnz A B 2 ε A 2 Sparsification and entry sampling Paper Space Time Bound AM07 ρn/ε 2 + n polylog(n) nnz ρn/ε 2 + nnz n polylog(n) A B 2 ε A 2 AHK06 (ñnz n/ε 2 ) 1/2 nnz(ñnz n/ε 2 ) 1/2 A B 2 ε A 2 DZ11 ρn log(n)/ε 2 nnz ρn log(n)/ε 2 A B 2 ε A 2 AKL13 ñ ρ log(n)/ε 2 + (ρ log(n) ñnz /ε 2 ) 1/2 [Near-optimal Distributions for Data Matrix Sampling, Achlioptas, Karnin, Liberty, 2013] Edo Liberty: Simple and Deterministic Matrix Sketches 7 / 41

9 Matrix Approximation Linear subspace embedding sketches Paper Space Time Bound DKM06 #R = O(1/ε 2 ) O((d + 1/ε 2 )/ε 4 + nnz(a)) P, εl 2 LinearTimeSVD O((d + 1/ε 2 )/ε 4 ) #R = O(k/ε 2 ) O((k/ε 2 ) 2 (d + k/ε 2 ) + nnz(a)) P, εa O((k/ε 2 ) 2 (d + k/ε 2 )) Sar06 #R = O(k/ε + k log k) O(nnz(A)(k/ε + k log k) + d(k/ε + P O(k/ε+k log k), εr turnstile O(d(k/ε + k log k)) k log k) 2 )) CW09 #R = O(k/ε) O(nd 2 + (ndk/ε)) P O(k/ε), εr CW09 O((n + d)(k/ε)) O(nd 2 + (ndk/ε)) C, εr CW09 O((k/ε 2 )(n + d/ε 2 )) O(n(k/ε 2 ) 2 + nd(k/ε 2 ) + nd 2 ) C, εr Deterministic sketching algorithms Paper Space Time Bound FSS13 O((k/ε) log n) n((k/ε) log n) O(1) P 2 k/ε, εr Lib13 #R = O(ρ/ε) O(ndρ/ε) P O(ρ/ε), εl 2 O(dρ/ε) GP13 #R = k/ε + k O(dk/ε) O(ndk/ε) P, εr [Relative Errors for Deterministic Low-Rank Matrix Approximations, Ghashami, Phillips 2013] Edo Liberty: Simple and Deterministic Matrix Sketches 8 / 41

10 Frequent Directions Goal: Efficiently maintain a matrix B with only l = 2/ε columns s.t. AA T BB T 2 ε A 2 f Intuition: Extend Frequent-items [Finding repeated elements, Misra, Gries, 1982.] [Frequency estimation of internet packet streams with limited space, Demaine, Lopez-Ortiz, Munro, 2002] [A simple algorithm for finding frequent elements in streams and bags, Karp, Shenker, Papadimitriou, 2003] [Efficient Computation of Frequent and Top-k Elements in Data Streams, Metwally, Agrawal, Abbadi, 2006] (An algorithm so good it was invented 4 times.) Edo Liberty: Simple and Deterministic Matrix Sketches 9 / 41

11 Frequent Items Obtain the frequency f (i) of each item in the stream of items Edo Liberty: Simple and Deterministic Matrix Sketches 10 / 41

12 Frequent Items With d counters it s easy but not good enough (IP addresses, queries...) Edo Liberty: Simple and Deterministic Matrix Sketches 11 / 41

13 Frequent Items (Misra-Gries) Lets keep less than a fixed number of counters l. Edo Liberty: Simple and Deterministic Matrix Sketches 12 / 41

14 Frequent Items If an item has a counter we add 1 to that counter. Edo Liberty: Simple and Deterministic Matrix Sketches 13 / 41

15 Frequent Items Otherwise, we create a new counter for it and set it to 1 Edo Liberty: Simple and Deterministic Matrix Sketches 14 / 41

16 Frequent Items But now we do not have less than l counters. Edo Liberty: Simple and Deterministic Matrix Sketches 15 / 41

17 Frequent Items Let δ be the median counter value at time t Edo Liberty: Simple and Deterministic Matrix Sketches 16 / 41

18 Frequent Items Decrease all counters by δ (or set to zero if less than δ) Edo Liberty: Simple and Deterministic Matrix Sketches 17 / 41

19 Frequent Items And continue... Edo Liberty: Simple and Deterministic Matrix Sketches 18 / 41

20 Frequent Items The approximated counts are f Edo Liberty: Simple and Deterministic Matrix Sketches 19 / 41

21 Frequent Items We increase the count by only 1 for each item appearance. f (i) f (i) Because we decrease each counter by at most δ t at time t f (i) f (i) t δ t Calculating the total approximated frequencies: 0 f (i) i t Setting l = 2/ε yields 1 (l/2) δ t = n (l/2) δ t 2n/l t f (i) f (i) εn t δ t Edo Liberty: Simple and Deterministic Matrix Sketches 20 / 41

22 Frequent Directions We keep a sketch of at most l columns Edo Liberty: Simple and Deterministic Matrix Sketches 21 / 41

23 Frequent Directions We maintain the invariant that some columns are empty (zero valued) Edo Liberty: Simple and Deterministic Matrix Sketches 22 / 41

24 Frequent Directions Input vectors are simply stored in empty columns Edo Liberty: Simple and Deterministic Matrix Sketches 23 / 41

25 Frequent Directions Input vectors are simply stored in empty columns Edo Liberty: Simple and Deterministic Matrix Sketches 24 / 41

26 Frequent Directions When the sketch is full we need to zero out some columns... Edo Liberty: Simple and Deterministic Matrix Sketches 25 / 41

27 Frequent Directions Using the SVD we compute B = USV T and set B new = US Edo Liberty: Simple and Deterministic Matrix Sketches 26 / 41

28 Frequent Directions Note that BB T = B new B T new so we don t lose anything Edo Liberty: Simple and Deterministic Matrix Sketches 27 / 41

29 Frequent Directions The columns of B are now orthogonal and in decreasing magnitude order Edo Liberty: Simple and Deterministic Matrix Sketches 28 / 41

30 Frequent Directions Let δ = B l/2 2 Edo Liberty: Simple and Deterministic Matrix Sketches 29 / 41

31 Frequent Directions Reduce column l 2 2-norms by δ (or nullify if less than δ) Edo Liberty: Simple and Deterministic Matrix Sketches 30 / 41

32 Frequent Directions Start aggregating columns again... Edo Liberty: Simple and Deterministic Matrix Sketches 31 / 41

33 Frequent Directions Input: l, A R d n B all zeros matrix R d l for i [n] do Insert A i into a zero valued column of B if B has no zero valued colums then [U, Σ, V ] SVD(B) δ σl/2 2 ˇΣ max(σ 2 I l δ, 0) B U ˇΣ # At least half the columns of B are zero. Return: B Edo Liberty: Simple and Deterministic Matrix Sketches 32 / 41

34 Bounding the error We first bound AA T BB T sup xa 2 xb 2 = sup x =1 x =1 t=1 = sup = x =1 t=1 n [ x, A t 2 + xb t 1 2 xb t 2 ] n [ xc t 2 xb t 2 ] n C t T C t B t T B t x 2 t=1 n t=1 δ t Which gives: AA T BB T n t=1 δ t Edo Liberty: Simple and Deterministic Matrix Sketches 33 / 41

35 Bounding the error We compute the Frobenius norm of the final sketch. 0 B 2 f = = = Which gives: n [ B t 2 f Bt 1 2 f ] t=1 n [( C t 2 f Bt 1 2 f ) ( C t 2 f Bt 2 f )] t=1 n A t 2 tr(c t T C t B t T B t ) t=1 A 2 f (l/2) n t=1 δ t n δ t 2 A 2 f /l t=1 Edo Liberty: Simple and Deterministic Matrix Sketches 34 / 41

36 Bounding the error We saw that: and that: Setting l = 2/ε yields AA T BB T n t=1 n δ t 2 A 2 f /l t=1 δ t AA T BB T ε A 2 f. The two proofs are (maybe unsurprisingly) very similar... Edo Liberty: Simple and Deterministic Matrix Sketches 35 / 41

37 Experiments AA T BB T as a function of the sketch size l Sketch accuracy 20,000 18,000 16,000 14,000 12,000 10,000 8,000 6,000 Naive Sampling Hashing Random ProjecBons Frequent DirecBons Bound Frequent DirecBons Brute Force 4,000 2, Number of rows in sketch Synthetic input matrix with linearly decaying singular values Edo Liberty: Simple and Deterministic Matrix Sketches 36 / 41

38 Experiments Running time in second as a function of n (x-axis) and d (y-axis) Running 'me in seconds ,000 20,000 30,000 40,000 50,000 60,000 70,000 80,000 90,000 Number of rows in matrix The running time scales linearly in n, d and l as expected. Edo Liberty: Simple and Deterministic Matrix Sketches 37 / ,

39 More results Theorem For any matrix A R n d, FrequentDirections guaranties that: AA T BB T 2 A A k 2 F /(l k). This holds also for all k < l including k = 0. For example, for l = k + 1/ε we use O(dk + d/ε) space and have. AA T BB T 2 ε A A k 2 F Theorem This is space optimal. Any streaming algorithm with this guarantee must use Ω(dk + d/ε) space. Edo Liberty: Simple and Deterministic Matrix Sketches 38 / 41

40 More results Theorem Let B R l d be the sketch produced by FrequentDirections. For any k < l it holds that A π k B (A) 2 F (1 + k l k ) A A k 2 F. For example, for l = k/ε we use O(dk/ε) space and have. A π k B (A) 2 F (1 + ε) A A k 2 F Theorem This is space optimal. Any streaming algorithm with this guarantee must use Ω(dk/ε) space. Edo Liberty: Simple and Deterministic Matrix Sketches 39 / 41

41 More results Theorem There exists a variant of FrequentDirections with the same guaranties and space complexity whose running time is Õ(l 2 n + l nnz(a)) Here, Õ( ) suppresses logarithmic factors and numerical convergence dependencies. The power method requires Õ(1) iterations to converge. Conjecture This is not optimal. Edo Liberty: Simple and Deterministic Matrix Sketches 40 / 41

42 Thanks Edo Liberty: Simple and Deterministic Matrix Sketches 41 / 41

to be more efficient on enormous scale, in a stream, or in distributed settings.

to be more efficient on enormous scale, in a stream, or in distributed settings. 16 Matrix Sketching The singular value decomposition (SVD) can be interpreted as finding the most dominant directions in an (n d) matrix A (or n points in R d ). Typically n > d. It is typically easy to