Online Principal Component Analysis
Boutsidis, Garber, Karnin, Liberty
PRESENTED BY Zohar Karnin, November 3, 2014
Data Matrix
Often, data is represented as a huge matrix. Sometimes, we can't store the entire matrix.
Yahoo labs
Principal Component Analysis
Often, we require a low-rank approximation of a matrix A: recommender systems, images, LSA, ...
The approximation is used to save space and, often, to clean up noise.
[Figure: A approximated as a sum of rank-one components.]
Column by Column Stream
Data arrives column by column (column = item) and we're seeing the items one at a time.
The Formal Stream Setup
Observe x_1 ∈ R^d, output y_1 ∈ R^k
...
Observe x_t ∈ R^d, output y_t ∈ R^k
The Formal Stream Setup
Cost = min Σ_t ‖x_t − Φ y_t‖²
s.t. Φ is an isometric embedding from R^k to R^d: ‖y_i − y_j‖ = ‖Φ y_i − Φ y_j‖
[Figure: input matrix X and output matrix Y.]
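As a toy illustration of the embedding constraint (a hypothetical sketch, not from the talk): any matrix Φ with orthonormal columns is an isometric embedding of R^k into R^d, so it preserves all pairwise distances between outputs, which is exactly the ‖y_i − y_j‖ = ‖Φ y_i − Φ y_j‖ condition above.

```python
import numpy as np

# Toy check that an orthonormal-column Phi is an isometric embedding
# R^k -> R^d: it preserves pairwise distances between outputs.
rng = np.random.default_rng(0)
d, k = 5, 2
Phi, _ = np.linalg.qr(rng.standard_normal((d, k)))  # Phi^T Phi = I_k
y_i, y_j = rng.standard_normal(k), rng.standard_normal(k)
assert np.isclose(np.linalg.norm(y_i - y_j),
                  np.linalg.norm(Phi @ y_i - Phi @ y_j))
```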
The Cost Function
Input X, output Y; Φ Y = embedding of Y into the same space as X
R = X − Φ Y   (error matrix)
Frobenius error = ‖R‖²_F = Σ_ij (X_ij − (Φ Y)_ij)² ∝ MSE
Spectral error = ‖R‖₂ = max_{‖v‖=1} ‖vᵀ X − vᵀ (Φ Y)‖
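A small numpy sketch of the two error measures (illustrative, with hypothetical dimensions), using the offline SVD solution, for which both errors are known in closed form:

```python
import numpy as np

# Illustrative example: Frobenius vs. spectral error of a rank-k
# reconstruction Phi @ Y. For the offline SVD optimum, the Frobenius
# error is the root of the sum of squared tail singular values and the
# spectral error is the (k+1)-st singular value.
rng = np.random.default_rng(0)
d, n, k = 6, 10, 2
X = rng.standard_normal((d, n))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
Phi = U[:, :k]                 # isometric embedding R^k -> R^d
Y = Phi.T @ X                  # outputs y_t stacked as columns

R = X - Phi @ Y                # error matrix
frob_err = np.linalg.norm(R, 'fro')
spec_err = np.linalg.norm(R, 2)

assert np.isclose(frob_err, np.sqrt(np.sum(s[k:] ** 2)))
assert np.isclose(spec_err, s[k])
```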
Secondary Costs: Computational Resources
Run time: number of operations required per observed column
Memory
Previous Works
Regret minimization setting [WK 07], [NKW 13]: at time t, before observing x_t, predict U_t, a projection matrix onto a k-dim subspace. The loss is ‖x_t − U_t x_t‖². Each U_t can be completely different.
Stochastic setting [ACS 13], [MCJ 13], [BDF 13]: the x_t are drawn i.i.d. from some distribution. Objective: find, as quickly as possible, a U minimizing E[‖x_t − U x_t‖²].
Reconstruction matrix (not an embedding) [CW 09]: min Σ_t ‖x_t − Ψ y_t‖², s.t. Ψ is an arbitrary linear transformation from R^k to R^d.
Results
X = d × n matrix whose columns are observed
k ≪ d
X_k = best rank-k approximation of X (top k directions)
OPT = ‖X − X_k‖_F
Theorem 1: Given ‖X‖_F, k, ε: Error = OPT + ε‖X‖_F; memory, target dimension, and processing time per column = O(k/ε²)
Theorem 2: Given k, ε: Error = OPT + ε‖X‖_F; memory, target dimension, and processing time per column = O(k/ε³)
The Operator Norm Cost Function
Y = output matrix [y_1, ..., y_n]
Cost = ‖X − Φ Y‖²_F. Interpretation: mean squared error; noise/signal ratio = ‖X − X_k‖²_F / ‖X_k‖²_F
It can happen that ‖X − X_k‖_F ≫ ‖X_k‖_F while ‖X − X_k‖₂ ≤ ‖X_k‖₂
Alternative cost: ‖X − Φ Y‖₂. Interpretation: bounds, for every unit vector v, ‖vᵀ X − vᵀ Φ Y‖
Results
Theorem 3 [under construction]: Given ‖X‖₂, ‖X − X_k‖₂, k, ε: operator norm error = OPT_operator + ε‖X‖₂; target dimension = O(k/ε)
Algorithm
Maintain U: R^d → R^ℓ. Directions are only added, never removed (for now).
r = tolerable error radius = ‖X‖_F / ℓ^{1/2}
[Figure: residuals accumulate into an error ellipsoid inside the ball of radius r; when the ellipsoid escapes the ball, its longest axis u_1 is added to U.]
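The direction-adding rule can be sketched as follows (a minimal illustration, assuming ‖X‖_F is known up front; the function name and data layout are hypothetical): residuals are accumulated into a covariance matrix, and whenever the error ellipsoid's longest axis exceeds the radius r, that axis is added to U and its weight is projected out of the accumulated error.

```python
import numpy as np

# Minimal sketch of the direction-adding rule, assuming ||X||_F is known
# in advance. Residual covariance C tracks the error ellipsoid; when its
# top eigenvalue exceeds r^2 = ||X||_F^2 / ell, the top eigenvector (the
# ellipsoid's longest axis) is added to U and projected out of C.
def online_pca(columns, frob_norm_X, ell):
    d = len(columns[0])
    r2 = frob_norm_X ** 2 / ell        # squared error radius
    U = np.zeros((0, d))               # rows: directions added so far
    C = np.zeros((d, d))               # residual covariance ~ R R^T
    outputs = []
    for x in columns:
        resid = x - U.T @ (U @ x)      # part of x_t outside span(U)
        C += np.outer(resid, resid)
        w, V = np.linalg.eigh(C)
        if w[-1] > r2:                 # ellipsoid escapes the r-ball
            u = V[:, -1]
            U = np.vstack([U, u])
            P = np.eye(d) - np.outer(u, u)
            C = P @ C @ P              # remove u's weight from the error
        outputs.append(U @ x)          # y_t
    return U, outputs
```

Each addition removes more than ‖X‖²_F/ℓ of accumulated residual weight, and the total residual weight is at most ‖X‖²_F, which is the target-dimension argument of the next slide.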
Analysis: Target Dimension
r = tolerable error radius = ‖X‖_F / ℓ^{1/2}
Target dimension = number of vectors added to U
Obs: adding a vector to U requires ‖X‖²_F / ℓ weight from ‖X‖²_F
⇒ at most ℓ vectors are added to U
Analysis: Cost
Y = output matrix; R = error matrix = X − U_nᵀ Y
[Figure: error ellipsoid with axes r_1 ≥ r_2 inside the ball of radius r = ‖X‖_F / ℓ^{1/2}.]
Operator norm cost = ‖R‖₂ = max{r_1, r_2}
Cost = ‖R‖²_F = r_1² + r_2²
Statements:
‖R‖₂ ≤ r = ‖X‖_F / ℓ^{1/2}
‖R‖_F ≤ (loss from X_k) + (loss from X − X_k) ≤ ‖X‖_F (k/ℓ)^{1/2} + ‖X − X_k‖_F
Implementation: Memory and Run-time Complexity
r_t = x_t − U_tᵀ U_t x_t,  R = [r_1, r_2, ..., r_t]
The straightforward version maintains R Rᵀ: update time and memory = d²
Instead: maintain a d × ℓ matrix Z such that Z Zᵀ ≈ R Rᵀ with ‖Z Zᵀ − R Rᵀ‖₂ < ‖R‖²_F / ℓ [Lib 13]
Update time and memory = dℓ
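Maintaining Z Zᵀ ≈ R Rᵀ can be sketched with a frequent-directions style update (an illustrative implementation, not the talk's exact code; the 2ℓ buffer and the shrink step are standard choices, and d ≥ ℓ is assumed):

```python
import numpy as np

# Frequent-directions style sketch: maintain Z (d x ~ell) with
# Z Z^T ~ R R^T and ||Z Z^T - R R^T||_2 <= ||R||_F^2 / ell, using
# O(d * ell) memory instead of the d x d matrix R R^T.
# Assumes d >= ell. Residual columns r_t are fed one at a time.
def frequent_directions(residuals, ell):
    d = len(residuals[0])
    B = np.zeros((2 * ell, d))         # sketch buffer, rows are vectors
    nxt = 0                            # next free row
    for r in residuals:
        if nxt == 2 * ell:             # buffer full: shrink to ell rows
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            s = np.sqrt(np.maximum(s ** 2 - s[ell - 1] ** 2, 0.0))
            B = np.zeros((2 * ell, d))
            B[: len(s)] = s[:, None] * Vt
            nxt = ell
        B[nxt] = r
        nxt += 1
    return B.T                         # Z, with Z Z^T ~ R R^T
```

With Z in hand, the algorithm's eigen-decomposition of R Rᵀ can be replaced by one of Z Zᵀ, giving the dℓ update time and memory claimed above.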
Implementation: Unknown Horizon
The error radius parameter ‖X‖_F / ℓ^{1/2} requires knowing ‖X‖_F in advance.
Def: X_t = [x_1, ..., x_t]
Idea: use the growing radius parameter ‖X_t‖_F / ℓ^{1/2}
Thm: works as before, but target dimension = ℓ log(n)
Divide time into epochs; within an epoch, N ≤ ‖X_t‖²_F ≤ 2N, so at most ℓ directions are added in each epoch.
Idea 2: if a direction u becomes weak (‖uᵀ X_t‖ ≤ ‖X_t‖_F / ℓ^{1/2}), remove it.
Thm: works as before, with target dimension = ℓ/ε
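The growing-radius idea amounts to a one-line change to the fixed-radius rule (an illustrative sketch only; the epoch bookkeeping and the removal rule of Idea 2 are omitted):

```python
import numpy as np

# Unknown-horizon variant (illustrative): replace the fixed radius
# ||X||_F / sqrt(ell) by the running value ||X_t||_F / sqrt(ell). The
# radius only grows, so earlier additions remain consistent; the epoch
# argument bounds additions by roughly ell per doubling of ||X_t||_F^2.
def online_pca_unknown_horizon(columns, ell):
    d = len(columns[0])
    U = np.zeros((0, d))               # rows: directions added so far
    C = np.zeros((d, d))               # residual covariance
    sq_norm = 0.0                      # running ||X_t||_F^2
    for x in columns:
        sq_norm += x @ x
        r2 = sq_norm / ell             # growing squared radius
        resid = x - U.T @ (U @ x)
        C += np.outer(resid, resid)
        w, V = np.linalg.eigh(C)
        if w[-1] > r2:                 # ellipsoid escapes current ball
            u = V[:, -1]
            U = np.vstack([U, u])
            P = np.eye(d) - np.outer(u, u)
            C = P @ C @ P
    return U, C, r2
```

Since r2 only grows, the invariant "top eigenvalue of C ≤ r2" is preserved after every column, exactly as in the known-horizon case.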
Conclusions and Open Questions
We obtain error = OPT + ε‖X‖_F with target dimension O(k/ε³).
Can we reduce the dependence on ε?
Improve to OPT(1 + ε)?
Lower bound? (Currently the same as for an arbitrary reconstruction matrix.)
Obtain an approximation of the form OPT + ε‖X − X_k‖_F?
Thank you!