Online Principal Component Analysis. Boutsidis, Garber, Karnin, Liberty. Originally presented August 25, 2013; presented by Zohar Karnin on November 23, 2014.

Online Principal Component Analysis. Boutsidis, Garber, Karnin, Liberty. Presented by Zohar Karnin, November 23, 2014.

Data Matrix. Often, data is represented as a huge matrix. Sometimes, we can't store the entire matrix.

Principal Component Analysis. Often, we require a low-rank approximation of a matrix A: recommender systems, images, LSA, etc. The approximation is used to save space and, often, to clean up noise. [Figure: A expressed as a sum of rank-one components.]

Column-by-Column Stream. Data arrives column by column: each column is an item, and we see the items one at a time.

The Formal Stream Setup. Observe x_1 ∈ ℝ^d, output y_1 ∈ ℝ^k; ... ; observe x_t ∈ ℝ^d, output y_t ∈ ℝ^k.

The Formal Stream Setup. Cost = min_Φ Σ_t ‖x_t − Φ y_t‖², where Φ is an embedding from ℝ^k to ℝ^d, i.e. ‖Φy_i − Φy_j‖ = ‖y_i − y_j‖; X is the input matrix and Y the output matrix.
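To make the setup concrete, here is a minimal Python/NumPy sketch of the interface (the function name and the choice Φ = U are mine, not the paper's): each column x_t is observed, y_t ∈ ℝ^k must be emitted immediately, and the cost is measured against a distance-preserving embedding, here a fixed matrix U with orthonormal columns.

```python
import numpy as np

def stream_and_cost(X, U):
    """X: d x n matrix whose columns x_t arrive one at a time.
    U: d x k matrix with orthonormal columns; the embedding Phi is taken to be U.
    Returns the streamed outputs Y (k x n) and the cost sum_t ||x_t - Phi y_t||^2."""
    d, n = X.shape
    k = U.shape[1]
    Y = np.zeros((k, n))
    cost = 0.0
    for t in range(n):                        # observe x_t ...
        x_t = X[:, t]
        y_t = U.T @ x_t                       # ... and immediately output y_t in R^k
        Y[:, t] = y_t
        cost += np.sum((x_t - U @ y_t) ** 2)  # ||x_t - Phi y_t||^2 with Phi = U
    return Y, cost
```

The algorithm of the talk builds its directions online rather than using a fixed U; this snippet only fixes the objective being charged.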

The Cost Function. [Diagram: input matrix X (d × n), output matrix Y (k × n); ΦY is the embedding of Y into the same space as X.]
Error matrix: R = X − ΦY.
Frobenius error: ‖R‖_F² = Σ_ij (X_ij − (ΦY)_ij)², i.e. the MSE.
Spectral error: ‖R‖_2 = max_{‖v‖=1} ‖v^T X − v^T ΦY‖.

Secondary Costs: Computational Resources. Run time: number of operations required per observed column. Memory.

Previous Works.
Regret minimization setting [WK 07], [NKW 13]: at time t, before observing x_t, predict U_t, a projection matrix onto a k-dimensional subspace; the loss is ‖x_t − U_t x_t‖². Each U_t can be completely different.
Stochastic setting [ACS 13], [MCJ 13], [BDF 13]: the x_t are drawn i.i.d. from some distribution; the objective is to find U as quickly as possible minimizing E[‖x_t − U x_t‖²].
Reconstruction matrix (not an embedding) [CW 09]: min_Φ Σ_t ‖x_t − Φ y_t‖², s.t. Φ is an arbitrary linear transformation from ℝ^k to ℝ^d.

Results.
X = d × n matrix whose columns are observed; k ≪ d.
X_k = best rank-k approximation of X (top k directions); OPT = ‖X − X_k‖_F².
Theorem 1: Given ‖X‖_F, k, ε: error = OPT + ε‖X‖_F²; memory, target dimension, and processing time per column = O(k/ε²).
Theorem 2: Given k, ε: error = OPT + ε‖X‖_F²; memory, target dimension, and processing time per column = O(k/ε³).

The Operator Norm Cost Function.
Y = output matrix [y_1, ..., y_n].
Cost = ‖X − ΦY‖_F². Interpretation: mean square error. Noise: ‖X − X_k‖_F²; signal: ‖X_k‖_F². It can happen that ‖X − X_k‖_F² ≫ ‖X_k‖_F², but ‖X − X_k‖_2 ≤ ‖X_k‖_2.
Alternative cost: ‖X − ΦY‖_2. Interpretation: bounds, for every unit vector v, ‖v^T X − v^T ΦY‖.
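The noise/signal asymmetry above is easy to check numerically. The example below is mine (not from the slides): one strong direction plus many nearly-as-strong ones makes the Frobenius tail ‖X − X_k‖_F much larger than ‖X_k‖_F, while the spectral tail can never exceed ‖X_k‖_2.

```python
import numpy as np

d, k = 200, 1
sigma = np.concatenate([[1.0], 0.9 * np.ones(d - 1)])        # singular values of X
X = np.diag(sigma)                                            # a diagonal X suffices
Xk = np.diag(np.concatenate([sigma[:k], np.zeros(d - k)]))    # best rank-k approximation

print(np.linalg.norm(X - Xk, 'fro'), ">>", np.linalg.norm(Xk, 'fro'))  # ~12.7 >> 1.0
print(np.linalg.norm(X - Xk, 2), "<=", np.linalg.norm(Xk, 2))          # 0.9 <= 1.0
```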

Results. Theorem 3 [under construction]: Given ‖X‖_2, ‖X − X_k‖_2, k, ε: operator norm error = OPT_operator + ε‖X‖_2; target dimension = O(k/ε).

Algorithm.
Maintain U: ℝ^d → ℝ^ℓ. Directions are only added, never removed (for now).
r = tolerable error radius = ‖X‖_F / √ℓ.
[Figure sequence: the error ellipsoid of the residuals grows as columns arrive; once it reaches the tolerable radius r, the corresponding vector u_1 is added to U.]
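A rough Python/NumPy sketch of this add-directions-only scheme. The class name, the residual-covariance bookkeeping, and the deflation step are my assumptions rather than the paper's pseudocode, and this is the "straightforward" O(d²)-memory version; the memory-efficient variant is discussed in the implementation slides below.

```python
import numpy as np

class OnlinePCASketch:
    """Sketch of the add-a-direction-when-the-residual-gets-too-heavy idea."""

    def __init__(self, d, ell, frob_norm_X):
        self.ell = ell
        self.U = np.zeros((d, 0))            # directions added so far (never removed)
        self.C = np.zeros((d, d))            # covariance of the residuals seen so far
        self.r2 = frob_norm_X ** 2 / ell     # tolerable squared error radius ||X||_F^2 / ell

    def step(self, x):
        y = self.U.T @ x                     # output: projection on the kept directions
        resid = x - self.U @ y               # part of x not explained by U
        self.C += np.outer(resid, resid)
        # If the residual "error ellipsoid" exceeds the tolerable radius in some
        # direction, add that direction to U.
        w, V = np.linalg.eigh(self.C)
        if w[-1] > self.r2:
            u = V[:, -1]
            u -= self.U @ (self.U.T @ u)     # re-orthogonalize against U for safety
            u /= np.linalg.norm(u)
            self.U = np.hstack([self.U, u[:, None]])
            self.C -= w[-1] * np.outer(V[:, -1], V[:, -1])  # remove the captured mass
        out = np.zeros(self.ell)             # pad y_t to a fixed length ell
        out[: y.shape[0]] = y
        return out
```

By the target-dimension analysis on the next slide, at most ℓ directions ever get added, so the padding to length ℓ is safe.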

Analysis: Target Dimension.
r = tolerable error radius = ‖X‖_F / √ℓ. Target dimension = number of vectors added to U.
Observation: adding a vector to U requires ‖X‖_F²/ℓ weight from ‖X‖_F² ⇒ the number of vectors added to U is at most ℓ.

Analysis: Cost. [Figure: error ellipsoid with semi-axes r_1, r_2.]
Y = output matrix; R = error matrix = X − U_n^T Y.
Operator norm cost = ‖R‖_2 = max{r_1, r_2}; cost = ‖R‖_F² = r_1² + r_2².

Analysis: Cost.
r = tolerable error radius = ‖X‖_F / √ℓ. Y = output matrix; R = error matrix = X − U_n^T Y.
Statements: ‖R‖_2² ≤ r² = ‖X‖_F²/ℓ, and ‖R‖_F ≤ (loss from X_k) + (loss from X − X_k) ≤ ‖X‖_F √(k/ℓ) + ‖X − X_k‖_F.

Implementation: Memory and Run-time Complexity.
r_t = x_t − U_t x_t; R = [r_1, r_2, ..., r_t].
The straightforward version requires maintaining RR^T: update time and memory requirements = d².
Instead: maintain a d × ℓ matrix Z such that ZZ^T ≈ RR^T, with ‖ZZ^T − RR^T‖_2 < ‖R‖_F²/ℓ [Lib 12]. Update time and memory requirements = dℓ.
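A minimal sketch of a Frequent-Directions-style update in the spirit of [Lib 12]. The function name and the exact shrinkage rule are my choices; published variants differ in constants and in how often they shrink, so this illustrates the idea of maintaining ZZ^T ≈ RR^T rather than reproducing the paper's routine.

```python
import numpy as np

def fold_residual(Z, r):
    """Fold a new residual column r into the d x ell sketch Z so that Z @ Z.T
    keeps approximating R @ R.T (simplified Frequent-Directions-style step)."""
    d, ell = Z.shape                              # assumes d > ell
    empty = np.where(np.linalg.norm(Z, axis=0) == 0)[0]
    if empty.size > 0:                            # a free column is available: store r
        Z = Z.copy()
        Z[:, empty[0]] = r
        return Z
    # Sketch is full: SVD of [Z, r], shrink all squared singular values by the
    # smallest one, and drop the (now zero) trailing column.
    U, s, _ = np.linalg.svd(np.column_stack([Z, r]), full_matrices=False)
    s_shrunk = np.sqrt(np.maximum(s ** 2 - s[-1] ** 2, 0.0))
    return (U * s_shrunk)[:, :ell]
```

Each residual r_t produced by the algorithm is folded into Z this way, so the d × d matrix RR^T is never stored.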

Implementation: Unknown Horizon.
The error radius parameter ‖X‖_F / √ℓ requires knowing ‖X‖_F in advance. Define X_t = [x_1, ..., x_t].
Idea: use the growing radius parameter ‖X_t‖_F / √ℓ.
Thm: works as before, but target dimension = ℓ log(n). Divide time into epochs; within each epoch, N ≤ ‖X_t‖_F² ≤ 2N, and at most ℓ directions are added per epoch.
Idea 2: if a direction u becomes weak (‖u^T X_t‖ ≤ ‖X_t‖_F / √ℓ), remove it.
Thm: works as before, with target dimension = ℓ/ε.
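A hedged sketch of the growing-radius bookkeeping from this slide: the threshold is recomputed from the running ‖X_t‖_F², and directions whose captured mass falls below it are dropped (Idea 2). How new directions are added is unchanged from the earlier sketch and omitted here, and the exact per-direction quantity tracked is my assumption.

```python
import numpy as np

def unknown_horizon_step(U, captured, frob_sq, x, ell):
    """One step of the growing-radius bookkeeping (a reconstruction, not the
    paper's exact rule). U: d x m kept directions; captured[i] tracks
    ||u_i^T X_t||^2 so far; frob_sq tracks ||X_t||_F^2."""
    frob_sq += float(x @ x)              # grow ||X_t||_F^2 with the new column
    y = U.T @ x
    captured = captured + y ** 2         # per-direction captured mass
    threshold = frob_sq / ell            # squared radius ||X_t||_F^2 / ell
    keep = captured >= threshold         # "Idea 2": drop directions that became weak
    return U[:, keep], captured[keep], frob_sq, y
```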

Conclusions and Open Questions.
We obtain error = OPT + ε‖X‖_F² with target dimension O(k/ε³). Can we reduce the dependence on ε? Improve to (1 + ε)·OPT? Lower bound? (Currently the same as for an arbitrary reconstruction matrix.) Obtain an approximation of OPT + ε‖X − X_k‖².

Thank you!