1. Online Principal Component Analysis
Boutsidis, Garber, Karnin, Liberty
Presented by Zohar Karnin, November 23, 2014
2. Data Matrix
Often, data is represented as a huge matrix. Sometimes, we can't store the entire matrix.
Yahoo Labs
3. Principal Component Analysis
Often, we require a low-rank approximation of a matrix A: recommender systems, images, LSA, ...
The approximation is used to save space and, often, to clean up noise.
4. Column by Column Stream
Data arrives column by column (column = item), and we're seeing the items one at a time.
5-7. The Formal Stream Setup
Observe x_1 ∈ R^d, output y_1 ∈ R^k
...
Observe x_t ∈ R^d, output y_t ∈ R^k
8. The Formal Stream Setup
Cost = min_Ψ Σ_t ‖x_t − Ψ y_t‖²
s.t. Ψ is an embedding from R^k to R^d: ‖Ψ y_i − Ψ y_j‖ = ‖y_i − y_j‖ for all i, j.
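The setup above can be sketched numerically. The following is a minimal illustration (my own construction, not from the slides): Ψ is taken to be a matrix with orthonormal columns, which makes it an isometric embedding of R^k into R^d, and y_t = Ψ^T x_t is one natural choice of output.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 20, 3, 100

# An isometric embedding R^k -> R^d: a d x k matrix with orthonormal columns,
# so ||Psi y_i - Psi y_j|| = ||y_i - y_j|| for all y_i, y_j.
Psi, _ = np.linalg.qr(rng.standard_normal((d, k)))

X = rng.standard_normal((d, n))   # columns x_t arrive one at a time
Y = Psi.T @ X                     # one simple choice of outputs: y_t = Psi^T x_t

# Cost = sum_t ||x_t - Psi y_t||^2 = ||X - Psi Y||_F^2
cost = np.linalg.norm(X - Psi @ Y, 'fro') ** 2

# Isometry check on one pair of outputs:
lhs = np.linalg.norm(Psi @ Y[:, 0] - Psi @ Y[:, 1])
rhs = np.linalg.norm(Y[:, 0] - Y[:, 1])
```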
9-13. The Cost Function
Y = output matrix, X = input matrix; Ψ Y = embedding of Y into the same space as X.
Error matrix: R = X − Ψ Y
Frobenius error: ‖R‖_F² = Σ_ij (X_ij − (Ψ Y)_ij)² (the MSE, up to normalization)
Spectral error: ‖R‖₂ = max_{‖v‖=1} ‖v^T X − v^T (Ψ Y)‖
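Both error measures are easy to compute from the residual matrix; a small sketch (variable names mine) comparing them on a stand-in residual:

```python
import numpy as np

# Stand-in residual R = X - Psi Y; in this sketch we simply draw it at random.
rng = np.random.default_rng(1)
d, n = 15, 40
R = rng.standard_normal((d, n))

frob_err = np.linalg.norm(R, 'fro') ** 2   # Frobenius error: sum_ij R_ij^2
spec_err = np.linalg.norm(R, 2)            # spectral error: top singular value

# The spectral norm bounds the error along any single unit direction v:
v = rng.standard_normal(d)
v /= np.linalg.norm(v)
err_along_v = np.linalg.norm(v @ R)
```

Note the relationship this makes concrete: the spectral error controls every single direction, while the Frobenius error aggregates over all of them, so ‖R‖₂² ≤ ‖R‖_F².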
14. Secondary Costs: Computational Resources
Run time: number of operations required per observed column.
Memory.
15-17. Previous Works
Regret minimization setting [WK 07], [NKW 13]: at time t, before observing x_t, predict U_t, a projection matrix onto a k-dimensional subspace. The loss is ‖x_t − U_t x_t‖². Each U_t can be completely different.
Stochastic setting [ACS 13], [MCJ 13], [BDF 13]: the x_t are drawn i.i.d. from some distribution. Objective: find, as quickly as possible, a U minimizing E[‖x_t − U x_t‖²].
Reconstruction matrix (not an embedding) [CW 09]: min_Φ Σ_t ‖x_t − Φ y_t‖², s.t. Φ is an arbitrary linear transformation from R^k to R^d.
18-23. Results
X = d × n matrix whose columns are observed, k << d.
X_k = best rank-k approximation of X (top k directions).
OPT = ‖X − X_k‖_F²
Theorem 1: Given ‖X‖_F, k, ε: Error = OPT + ε‖X‖_F²; memory, target dimension, and processing time per column = O(k/ε²).
Theorem 2: Given only k, ε: Error = OPT + ε‖X‖_F²; memory, target dimension, and processing time per column = O(k/ε³).
24-26. The Operator Norm Cost Function
Y = output matrix [y_1, ..., y_n]
Cost = ‖X − Ψ Y‖_F². Interpretation: mean square error.
Noise vs. signal: possibly ‖X − X_k‖_F² ≫ ‖X_k‖_F², but ‖X − X_k‖₂ ≤ ‖X_k‖₂.
Alternative cost: ‖X − Ψ Y‖₂. Interpretation: bounds, for every unit vector v, ‖v^T X − v^T Ψ Y‖.
27. Results
Theorem 3 [under construction]: Given ‖X‖₂, ‖X − X_k‖₂, k, ε: operator norm error = OPT_operator + ε‖X‖₂; target dimension = O(k/ε).
28-36. Algorithm
Maintain U: R^d → R^ℓ. Directions are only added, never removed (for now).
r = tolerable error radius = ‖X‖_F / ℓ^{1/2}
[Figure: error ellipsoid of the residuals; when the ellipsoid grows past the radius r in some direction, add that vector u_1 to U.]
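A simplified reading of this rule in code (an illustrative sketch under my own assumptions, not the paper's exact algorithm): keep the exact residual covariance, and whenever its top eigenvalue exceeds the squared radius, promote the corresponding eigenvector into U. The real implementation replaces the exact covariance with a small sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
d, ell, n = 10, 5, 200
X = rng.standard_normal((d, n)) * 0.1
X[0, :] += 3.0                       # one strong direction the rule should pick up

r2 = np.linalg.norm(X, 'fro') ** 2 / ell   # squared error radius ||X||_F^2 / ell
U = np.zeros((d, 0))                 # directions only get added, never removed

C = np.zeros((d, d))                 # exact residual covariance R R^T (for clarity)
for t in range(n):
    x = X[:, t]
    r = x - U @ (U.T @ x)            # residual after projecting on current directions
    C += np.outer(r, r)
    w, V = np.linalg.eigh(C)         # eigenvalues in ascending order
    if w[-1] > r2:                   # residual energy exceeds the tolerable radius
        u = V[:, -1]
        U = np.column_stack([U, u])  # add the offending direction to U
        C -= w[-1] * np.outer(u, u)  # its past error is now accounted for

num_directions = U.shape[1]
```

Each addition removes more than ‖X‖_F²/ℓ of residual energy from C, which is the source of the "at most ℓ directions" bound on the following analysis slides.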
37-39. Analysis: Target Dimension
r = tolerable error radius = ‖X‖_F / ℓ^{1/2}
Target dimension = number of vectors added to U.
Observation: adding a vector to U requires ‖X‖_F² / ℓ weight from ‖X‖_F².
⇒ number of vectors added to U ≤ ℓ.
40-41. Analysis: Cost
Y = output matrix; error matrix R = X − U_n Y.
[Figure: error ellipsoid with semi-axes r_1, r_2.]
Operator norm cost = ‖R‖₂ = max{r_1, r_2}
Frobenius cost = ‖R‖_F² = r_1² + r_2²
Statements:
‖R‖₂ ≤ r = ‖X‖_F / ℓ^{1/2}
‖R‖_F² ≤ (loss from X_k) + (loss from X − X_k) ≤ ‖X‖_F² (k/ℓ)^{1/2} + ‖X − X_k‖_F²
42-44. Implementation: Memory and Run-time Complexity
r_t = x_t − U_t x_t, R = [r_1, r_2, ..., r_t]
The straightforward version requires maintaining R R^T: update time and memory = d².
Instead: maintain a d × ℓ matrix Z such that Z Z^T ≈ R R^T, with ‖Z Z^T − R R^T‖₂ ≤ ‖R‖_F² / ℓ [Lib 12].
Update time and memory = dℓ.
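A didactic version of such a sketch (a Frequent Directions style construction; the function name, the 2ℓ buffer size, and the test data are my choices, not the paper's code) can be written as:

```python
import numpy as np

def frequent_directions(columns, ell, d):
    """Maintain Z with Z Z^T ~ R R^T using O(d * ell) memory.
    Guarantee: ||Z Z^T - R R^T||_2 <= ||R||_F^2 / ell."""
    B = np.zeros((d, 2 * ell))
    nz = 0                               # number of occupied columns
    for x in columns:
        if nz == 2 * ell:                # buffer full: shrink to ell columns
            U, s, _ = np.linalg.svd(B, full_matrices=False)
            delta = s[ell] ** 2          # squared (ell+1)-th singular value
            s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            B = np.hstack([U[:, :ell] * s[:ell], np.zeros((d, ell))])
            nz = ell
        B[:, nz] = x
        nz += 1
    return B

rng = np.random.default_rng(3)
d, n, ell = 8, 300, 4
R = rng.standard_normal((d, n))
Z = frequent_directions(R.T, ell, d)     # iterate over the columns of R

err = np.linalg.norm(Z @ Z.T - R @ R.T, 2)
bound = np.linalg.norm(R, 'fro') ** 2 / ell
```

The guarantee is deterministic: each shrink removes at least (ℓ+1)·δ of Frobenius mass while perturbing the covariance by at most δ in spectral norm, so the total spectral error stays below ‖R‖_F²/ℓ.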
45-52. Implementation: Unknown Horizon
Error radius parameter = ‖X‖_F / ℓ^{1/2}
Def: X_t = [x_1, ..., x_t]
Idea: use the growing radius parameter ‖X_t‖_F / ℓ^{1/2}.
Thm: works as before, but target dimension = ℓ log(n).
Divide time into epochs: in each epoch, N ≤ ‖X_t‖_F² ≤ 2N. At most ℓ directions are added in each epoch.
Idea 2: if a direction u becomes weak (‖u^T X_t‖ ≤ ‖X_t‖_F / ℓ^{1/2}), remove it.
Thm: works as before, with target dimension = ℓ/ε.
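The ℓ log(n) target dimension comes from the doubling argument above: the Frobenius mass can only double logarithmically many times, and each doubling epoch contributes at most ℓ directions. A small numeric illustration of the epoch count (entirely my own construction):

```python
import math

import numpy as np

rng = np.random.default_rng(4)
# ||X_t||_F^2 as t grows: a nondecreasing sequence of accumulated column mass.
norms_sq = np.cumsum(rng.random(10_000) + 0.5)

# Count epochs: a new epoch starts whenever the accumulated mass has doubled.
epochs = 0
threshold = norms_sq[0]
for m in norms_sq:
    if m > 2 * threshold:
        epochs += 1
        threshold = m

# The epoch count is bounded by log2 of the total mass ratio.
log_bound = math.ceil(math.log2(norms_sq[-1] / norms_sq[0]))
```

Since the threshold at least doubles at every epoch boundary, the number of epochs can never exceed log2 of the ratio between final and initial mass, which is the log(n)-type factor in the theorem.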
53. Conclusions and Open Questions
We obtain error = OPT + ε‖X‖_F² with target dimension O(k/ε³).
Open: Can we reduce the dependence on ε? Improve to OPT(1+ε)? Lower bound? (Currently the same as for an arbitrary reconstruction matrix.)
Obtain an approximation of the form OPT + ε‖X − X_k‖_F²?
54. Thank you!
More informationBANACH SPACES WITH THE 2-SUMMING PROPERTY. A. Arias, T. Figiel, W. B. Johnson and G. Schechtman
BANACH SPACES WITH THE 2-SUMMING PROPERTY A. Arias, T. Figiel, W. B. Johnson and G. Schechtman Abstract. A Banach space X has the 2-summing property if the norm of every linear operator from X to a Hilbert
More informationLinear Algebra and Eigenproblems
Appendix A A Linear Algebra and Eigenproblems A working knowledge of linear algebra is key to understanding many of the issues raised in this work. In particular, many of the discussions of the details
More informationQALGO workshop, Riga. 1 / 26. Quantum algorithms for linear algebra.
QALGO workshop, Riga. 1 / 26 Quantum algorithms for linear algebra., Center for Quantum Technologies and Nanyang Technological University, Singapore. September 22, 2015 QALGO workshop, Riga. 2 / 26 Overview
More informationManifold Coarse Graining for Online Semi-supervised Learning
for Online Semi-supervised Learning Mehrdad Farajtabar, Amirreza Shaban, Hamid R. Rabiee, Mohammad H. Rohban Digital Media Lab, Department of Computer Engineering, Sharif University of Technology, Tehran,
More informationThe Nyström Extension and Spectral Methods in Learning
Introduction Main Results Simulation Studies Summary The Nyström Extension and Spectral Methods in Learning New bounds and algorithms for high-dimensional data sets Patrick J. Wolfe (joint work with Mohamed-Ali
More informationECS130 Scientific Computing. Lecture 1: Introduction. Monday, January 7, 10:00 10:50 am
ECS130 Scientific Computing Lecture 1: Introduction Monday, January 7, 10:00 10:50 am About Course: ECS130 Scientific Computing Professor: Zhaojun Bai Webpage: http://web.cs.ucdavis.edu/~bai/ecs130/ Today
More informationCourse Summary Math 211
Course Summary Math 211 table of contents I. Functions of several variables. II. R n. III. Derivatives. IV. Taylor s Theorem. V. Differential Geometry. VI. Applications. 1. Best affine approximations.
More informationMODULE 8 Topics: Null space, range, column space, row space and rank of a matrix
MODULE 8 Topics: Null space, range, column space, row space and rank of a matrix Definition: Let L : V 1 V 2 be a linear operator. The null space N (L) of L is the subspace of V 1 defined by N (L) = {x
More informationSolving Corrupted Quadratic Equations, Provably
Solving Corrupted Quadratic Equations, Provably Yuejie Chi London Workshop on Sparse Signal Processing September 206 Acknowledgement Joint work with Yuanxin Li (OSU), Huishuai Zhuang (Syracuse) and Yingbin
More informationApproximating the Best Linear Unbiased Estimator of Non-Gaussian Signals with Gaussian Noise
IEICE Transactions on Information and Systems, vol.e91-d, no.5, pp.1577-1580, 2008. 1 Approximating the Best Linear Unbiased Estimator of Non-Gaussian Signals with Gaussian Noise Masashi Sugiyama (sugi@cs.titech.ac.jp)
More informationRandomized algorithms for the approximation of matrices
Randomized algorithms for the approximation of matrices Luis Rademacher The Ohio State University Computer Science and Engineering (joint work with Amit Deshpande, Santosh Vempala, Grant Wang) Two topics
More informationCS249: ADVANCED DATA MINING
CS249: ADVANCED DATA MINING Graph and Network Instructor: Yizhou Sun yzsun@cs.ucla.edu May 31, 2017 Methods Learnt Classification Clustering Vector Data Text Data Recommender System Decision Tree; Naïve
More informationStatistical Machine Learning, Part I. Regression 2
Statistical Machine Learning, Part I Regression 2 mcuturi@i.kyoto-u.ac.jp SML-2015 1 Last Week Regression: highlight a functional relationship between a predicted variable and predictors SML-2015 2 Last
More informationOptimal Stochastic Strongly Convex Optimization with a Logarithmic Number of Projections
Optimal Stochastic Strongly Convex Optimization with a Logarithmic Number of Projections Jianhui Chen 1, Tianbao Yang 2, Qihang Lin 2, Lijun Zhang 3, and Yi Chang 4 July 18, 2016 Yahoo Research 1, The
More information