Simple and Deterministic Matrix Sketches

1 Simple and Deterministic Matrix Sketches Edo Liberty + ongoing work with: Mina Ghashami, Jeff Phillips and David Woodruff. Edo Liberty: Simple and Deterministic Matrix Sketches 1 / 41

2 Data Matrices Often our data is represented by a matrix.

3 Data Matrices which is often too large to work with on a single machine...

Data | Columns | Rows | d | n | sparse
Textual | Documents | Words | | > | yes
Actions | Users | Types | | $> 10^8$ | yes
Visual | Images | Pixels, SIFT | | $> 10^9$ | no
Audio | Songs, tracks | Frequencies | | $> 10^9$ | no
ML | Examples | Features | | $> 10^5$ | no
Financial | Prices | Items, Stocks | | $> 10^6$ | no

We think of $A \in \mathbb{R}^{d \times n}$ as $n$ column vectors in $\mathbb{R}^d$, and typically $n \gg d$.

4 Streaming Matrices Sometimes, we cannot store the entire matrix at all.

6 Streaming Matrices Example: can we compute $AA^T$ from a stream of columns $A_i$? (enough for PCA for example).
$$AA^T = \sum_{i=1}^{n} A_i A_i^T$$
Naïve solution: compute $AA^T$ in time $O(nd^2)$ and space $O(d^2)$. Think about 1Mp images, $d = 10^6$. This solution requires $10^{12}$ operations per update and 1T space.
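
The naïve streaming solution can be sketched in a few lines. This is only an illustration under assumptions: the helper name `streaming_gram`, the random test matrix, and its sizes are made up here and do not come from the slides. It accumulates one outer product per incoming column, which is where the $O(d^2)$ work per update and $O(d^2)$ space come from.

```python
import numpy as np

def streaming_gram(columns, d):
    """Accumulate A A^T one column at a time.

    Uses O(d^2) memory and O(d^2) work per column (hypothetical helper).
    """
    gram = np.zeros((d, d))
    for a in columns:
        gram += np.outer(a, a)  # rank-one update A_i A_i^T
    return gram

# Small synthetic example (sizes chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
d, n = 5, 200
A = rng.standard_normal((d, n))
gram = streaming_gram((A[:, i] for i in range(n)), d)
print(np.allclose(gram, A @ A.T))
```

For $d$ in the millions the `gram` array alone is out of reach, which is the motivation for keeping a small sketch instead.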

7 Matrix Approximation
Matrix sketching or approximation: Efficiently compute a concisely representable matrix $B$ such that $B \approx A$ or $BB^T \approx AA^T$.
Working with $B$ instead of $A$ is often good enough:
Dimension reduction
Signal denoising
Classification
Regression
Clustering
Approximate matrix multiplication
Reconstruction
Recommendation
...

8 Matrix Approximation

Column subset selection algorithms

Paper | Space | Time | Bound
FKV04 | $O(k^4/\epsilon^6 \max(k^4, \epsilon^{-2}))$ | $O(k^5/\epsilon^6 \max(k^4, \epsilon^{-2}))$ | P, $\epsilon_a$
DV06 | #C = $O(k/\epsilon + k^2 \log k)$; $O(n(k/\epsilon + k^2 \log k))$ | $O(\mathrm{nnz}(A)(k/\epsilon + k^2 \log k) + (n + d)(k^2/\epsilon^2 + k^3 \log(k/\epsilon) + k^4 \log^2 k))$ | P, $\epsilon_r$
DKM06 (LinearTimeSVD) | #C = $O(1/\epsilon^2)$; $O((n + 1/\epsilon^2)/\epsilon^4)$ | $O((n + 1/\epsilon^2)/\epsilon^4 + \mathrm{nnz}(A))$ | P, $\epsilon_{L_2}$
DKM06 (LinearTimeSVD) | #C = $O(k/\epsilon^2)$; $O((k/\epsilon^2)(n + k/\epsilon^2))$ | $O((k/\epsilon^2)^2 (n + k/\epsilon^2) + \mathrm{nnz}(A))$ | P, $\epsilon_a$
DKM06 (ConstantTimeSVD) | #C+R = $O(1/\epsilon^4)$; $O(1/\epsilon^{12} + nk/\epsilon^4)$ | $O(1/\epsilon^{12} + nk/\epsilon^4 + \mathrm{nnz}(A))$ | P, $\epsilon_{L_2}$
DKM06 (ConstantTimeSVD) | #C+R = $O(k^2/\epsilon^4)$; $O(k^6/\epsilon^{12} + nk^3/\epsilon^4)$ | $O(k^6/\epsilon^{12} + nk^3/\epsilon^4 + \mathrm{nnz}(A))$ | P, $\epsilon_a$
DMM08 (CUR) | #C = $O(k^2/\epsilon^2)$; #R = $O(k^4/\epsilon^6)$ | $O(nd^2)$ | C, $\epsilon_r$
MD09 (ColumnSelect) | #C = $O(k \log k/\epsilon^2)$; $O(nk \log k/\epsilon^2)$ | $O(nd^2)$ | $P_{O(k \log k/\epsilon^2)}$, $\epsilon_r$
BDM11 | #C = $2k/\epsilon\,(1 + o(1))$ | $O((ndk + dk^3)\epsilon^{-2/3})$ | $P_{2k/\epsilon(1+o(1))}$, $\epsilon_r$

[Relative Errors for Deterministic Low-Rank Matrix Approximations, Ghashami, Phillips 2013]

Sparsification and entry sampling

Paper | Space | Time | Bound
AM07 | $\rho n/\epsilon^2 + n\,\mathrm{polylog}(n)$ | nnz | $\|A - B\|_2 \le \epsilon \|A\|_2$
AHK06 | $(\tilde{\mathrm{nnz}}\, n/\epsilon^2)^{1/2}$ | nnz | $\|A - B\|_2 \le \epsilon \|A\|_2$
DZ11 | $\rho n \log(n)/\epsilon^2$ | nnz | $\|A - B\|_2 \le \epsilon \|A\|_2$
AKL13 | $\tilde{n} \rho \log(n)/\epsilon^2 + (\rho \log(n)\, \tilde{\mathrm{nnz}}/\epsilon^2)^{1/2}$ | |

[Near-optimal Distributions for Data Matrix Sampling, Achlioptas, Karnin, Liberty, 2013]

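9 Matrix Approximation

Linear subspace embedding sketches

Paper | Space | Time | Bound
DKM06 (LinearTimeSVD) | #R = $O(1/\epsilon^2)$; $O((d + 1/\epsilon^2)/\epsilon^4)$ | $O((d + 1/\epsilon^2)/\epsilon^4 + \mathrm{nnz}(A))$ | P, $\epsilon_{L_2}$
DKM06 (LinearTimeSVD) | #R = $O(k/\epsilon^2)$; $O((k/\epsilon^2)^2 (d + k/\epsilon^2))$ | $O((k/\epsilon^2)^2 (d + k/\epsilon^2) + \mathrm{nnz}(A))$ | P, $\epsilon_a$
Sar06 (turnstile) | #R = $O(k/\epsilon + k \log k)$; $O(d(k/\epsilon + k \log k))$ | $O(\mathrm{nnz}(A)(k/\epsilon + k \log k) + d(k/\epsilon + k \log k)^2)$ | $P_{O(k/\epsilon + k \log k)}$, $\epsilon_r$
CW09 | #R = $O(k/\epsilon)$ | $O(nd^2 + (ndk/\epsilon))$ | $P_{O(k/\epsilon)}$, $\epsilon_r$
CW09 | $O((n + d)(k/\epsilon))$ | $O(nd^2 + (ndk/\epsilon))$ | C, $\epsilon_r$
CW09 | $O((k/\epsilon^2)(n + d/\epsilon^2))$ | $O(n(k/\epsilon^2)^2 + nd(k/\epsilon^2) + nd^2)$ | C, $\epsilon_r$

Deterministic sketching algorithms

Paper | Space | Time | Bound
FSS13 | $O((k/\epsilon) \log n)$ | $n((k/\epsilon) \log n)^{O(1)}$ | $P_{2k/\epsilon}$, $\epsilon_r$
Lib13 | #R = $O(\rho/\epsilon)$; $O(d\rho/\epsilon)$ | $O(nd\rho/\epsilon)$ | $P_{O(\rho/\epsilon)}$, $\epsilon_{L_2}$
GP13 | #R = $k/\epsilon + k$; $O(dk/\epsilon)$ | $O(ndk/\epsilon)$ | P, $\epsilon_r$

[Relative Errors for Deterministic Low-Rank Matrix Approximations, Ghashami, Phillips 2013]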

10 Frequent Directions
Goal: Efficiently maintain a matrix $B$ with only $\ell = 2/\epsilon$ columns s.t. $\|AA^T - BB^T\|_2 \le \epsilon \|A\|_F^2$
Intuition: Extend Frequent-items
[Finding repeated elements, Misra, Gries, 1982]
[Frequency estimation of internet packet streams with limited space, Demaine, Lopez-Ortiz, Munro, 2002]
[A simple algorithm for finding frequent elements in streams and bags, Karp, Shenker, Papadimitriou, 2003]
[Efficient Computation of Frequent and Top-k Elements in Data Streams, Metwally, Agrawal, Abbadi, 2006]
(An algorithm so good it was invented 4 times.)

11 Frequent Items Obtain the frequency $f(i)$ of each item in the stream of items

12 Frequent Items With $d$ counters it's easy but not good enough (IP addresses, queries...)

13 Frequent Items (Misra-Gries) Let's keep fewer than a fixed number of counters $\ell$.

14 Frequent Items If an item has a counter we add 1 to that counter.

15 Frequent Items Otherwise, we create a new counter for it and set it to 1

16 Frequent Items But now we do not have fewer than $\ell$ counters.

17 Frequent Items Let $\delta$ be the median counter value at time $t$

18 Frequent Items Decrease all counters by $\delta$ (or set to zero if less than $\delta$)

19 Frequent Items And continue...

20 Frequent Items The approximated counts are $f'$

21 Frequent Items
We increase the count by only 1 for each item appearance:
$$f'(i) \le f(i)$$
Because we decrease each counter by at most $\delta_t$ at time $t$:
$$f'(i) \ge f(i) - \sum_t \delta_t$$
Calculating the total approximated frequencies:
$$0 \le \sum_i f'(i) \le \sum_t \left[1 - (\ell/2)\,\delta_t\right] = n - (\ell/2) \sum_t \delta_t$$
$$\sum_t \delta_t \le 2n/\ell$$
Setting $\ell = 2/\epsilon$ yields
$$|f(i) - f'(i)| \le \epsilon n$$
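
The Frequent Items procedure (fewer than $\ell$ counters, shrink by the median counter value) can be written as a short sketch. The function name `frequent_items`, the toy stream, and the choice of the upper median for even $\ell$ are assumptions made for this illustration, not details taken from the slides. The returned `total_shift` is the running sum $\sum_t \delta_t$, so the bound above can be checked directly.

```python
from collections import Counter

def frequent_items(stream, ell):
    """Approximate counts while keeping fewer than `ell` counters (hypothetical helper).

    Returns the approximate counts f' and the sum of all shrink amounts delta_t.
    """
    counters = {}
    total_shift = 0
    for item in stream:
        counters[item] = counters.get(item, 0) + 1
        if len(counters) >= ell:  # no longer fewer than ell counters
            # Median counter value (upper median when ell is even: an assumption).
            delta = sorted(counters.values())[ell // 2]
            total_shift += delta
            # Decrease every counter by delta and drop those that reach zero or below.
            counters = {k: v - delta for k, v in counters.items() if v > delta}
    return counters, total_shift

stream = ["a", "b", "a", "c", "a", "d", "b", "a", "e", "a", "b", "f"]
ell = 4
approx, total_shift = frequent_items(stream, ell)
exact = Counter(stream)
n = len(stream)

for item, count in exact.items():
    error = count - approx.get(item, 0)
    print(item, count, approx.get(item, 0), error)
print("sum of deltas:", total_shift, "bound 2n/ell:", 2 * n / ell)
```

Every per-item error stays between 0 and `total_shift`, and `total_shift` itself stays below $2n/\ell$, which is the content of the derivation.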

22 Frequent Directions We keep a sketch of at most $\ell$ columns

23 Frequent Directions We maintain the invariant that some columns are empty (zero valued)

24 Frequent Directions Input vectors are simply stored in empty columns

26 Frequent Directions When the sketch is full we need to zero out some columns...

27 Frequent Directions Using the SVD we compute $B = USV^T$ and set $B_{new} = US$

28 Frequent Directions Note that $BB^T = B_{new} B_{new}^T$ so we don't lose anything
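
The claim that replacing $B$ by $US$ loses nothing can be checked numerically. The following is a minimal sketch on a random matrix; the sizes and variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d, ell = 8, 4
B = rng.standard_normal((d, ell))

# Thin SVD: U is d x ell, s holds the singular values in decreasing order.
U, s, Vt = np.linalg.svd(B, full_matrices=False)
B_new = U * s  # same as U @ np.diag(s): column j of U scaled by s[j]

# B B^T is unchanged because V is orthogonal and cancels out.
print(np.allclose(B @ B.T, B_new @ B_new.T))

# The columns of B_new are orthogonal, with squared norms s**2 in decreasing order.
column_gram = B_new.T @ B_new
print(np.allclose(column_gram, np.diag(s ** 2)))
```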

29 Frequent Directions The columns of $B$ are now orthogonal and in decreasing magnitude order

30 Frequent Directions Let $\delta = \|B_{\ell/2}\|^2$

31 Frequent Directions Reduce the squared 2-norm of each column by $\delta$ (or nullify the column if it is less than $\delta$)

32 Frequent Directions Start aggregating columns again...

33 Frequent Directions
Input: $\ell$, $A \in \mathbb{R}^{d \times n}$
$B \leftarrow$ all zeros matrix $\in \mathbb{R}^{d \times \ell}$
for $i \in [n]$ do
    Insert $A_i$ into a zero valued column of $B$
    if $B$ has no zero valued columns then
        $[U, \Sigma, V] \leftarrow \mathrm{SVD}(B)$
        $\delta \leftarrow \sigma_{\ell/2}^2$
        $\check{\Sigma} \leftarrow \sqrt{\max(\Sigma^2 - I_\ell \delta, 0)}$
        $B \leftarrow U \check{\Sigma}$    # At least half the columns of B are zero.
Return: $B$
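
A runnable sketch of the pseudocode above, using NumPy. It is an illustration under stated assumptions rather than a reference implementation: the function name `frequent_directions`, the `next_free` bookkeeping, and the restriction to even $\ell$ with $2 \le \ell \le d$ are choices made here so that the thin SVD returns exactly $\ell$ singular values.

```python
import numpy as np

def frequent_directions(columns, d, ell):
    """Sketch a stream of d-dimensional columns into a d x ell matrix B.

    Assumes ell is even and 2 <= ell <= d (assumptions of this sketch).
    """
    B = np.zeros((d, ell))
    next_free = 0  # columns next_free .. ell-1 are zero valued
    for a in columns:
        B[:, next_free] = a
        next_free += 1
        if next_free == ell:  # B has no zero valued columns
            U, s, _ = np.linalg.svd(B, full_matrices=False)
            delta = s[ell // 2 - 1] ** 2  # sigma_{ell/2}^2 (1-indexed)
            s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            B = U * s_shrunk              # scale column j of U by s_shrunk[j]
            next_free = ell // 2 - 1      # columns from this index on are zero
    return B

# Synthetic check (sizes chosen arbitrarily for illustration).
rng = np.random.default_rng(2)
d, n, ell = 20, 500, 10
A = rng.standard_normal((d, n))
B = frequent_directions((A[:, i] for i in range(n)), d, ell)

err = np.linalg.norm(A @ A.T - B @ B.T, 2)       # spectral norm
bound = 2 * np.linalg.norm(A, "fro") ** 2 / ell  # 2 ||A||_F^2 / ell
print(f"error = {err:.1f}, bound = {bound:.1f}")
```

After each shrink the columns from index $\ell/2 - 1$ onward are exactly zero, so the sketch refills them before the next SVD. That amortizes one SVD over roughly $\ell/2$ insertions.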

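34 Bounding the error
We first bound $\|AA^T - BB^T\|$. Here $B^t$ is the sketch after processing column $A_t$, $C^t$ is the sketch after inserting $A_t$ but before shrinking, and $\delta_t = 0$ when no shrinking occurs at step $t$.
$$\begin{aligned}
\|AA^T - BB^T\| &= \sup_{\|x\|=1} \left( \|xA\|^2 - \|xB\|^2 \right) \\
&= \sup_{\|x\|=1} \sum_{t=1}^{n} \left[ \langle x, A_t \rangle^2 + \|xB^{t-1}\|^2 - \|xB^t\|^2 \right] \\
&= \sup_{\|x\|=1} \sum_{t=1}^{n} \left[ \|xC^t\|^2 - \|xB^t\|^2 \right] \\
&\le \sum_{t=1}^{n} \left\| C^t (C^t)^T - B^t (B^t)^T \right\| \cdot \|x\|^2 \\
&\le \sum_{t=1}^{n} \delta_t
\end{aligned}$$
Which gives:
$$\|AA^T - BB^T\| \le \sum_{t=1}^{n} \delta_t$$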

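35 Bounding the error
We compute the Frobenius norm of the final sketch.
$$\begin{aligned}
0 \le \|B\|_F^2 &= \sum_{t=1}^{n} \left[ \|B^t\|_F^2 - \|B^{t-1}\|_F^2 \right] \\
&= \sum_{t=1}^{n} \left[ \left( \|C^t\|_F^2 - \|B^{t-1}\|_F^2 \right) - \left( \|C^t\|_F^2 - \|B^t\|_F^2 \right) \right] \\
&= \sum_{t=1}^{n} \left[ \|A_t\|^2 - \mathrm{tr}\left( C^t (C^t)^T - B^t (B^t)^T \right) \right] \\
&\le \|A\|_F^2 - (\ell/2) \sum_{t=1}^{n} \delta_t
\end{aligned}$$
Which gives:
$$\sum_{t=1}^{n} \delta_t \le 2\|A\|_F^2 / \ell$$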

36 Bounding the error
We saw that:
$$\|AA^T - BB^T\| \le \sum_{t=1}^{n} \delta_t$$
and that:
$$\sum_{t=1}^{n} \delta_t \le 2\|A\|_F^2 / \ell$$
Setting $\ell = 2/\epsilon$ yields
$$\|AA^T - BB^T\| \le \epsilon \|A\|_F^2.$$
The two proofs are (maybe unsurprisingly) very similar...

37 Experiments $\|AA^T - BB^T\|$ as a function of the sketch size $\ell$. [Plot: sketch accuracy against number of rows in sketch, for Naive, Sampling, Hashing, Random Projections, Frequent Directions Bound, Frequent Directions, and Brute Force.] Synthetic input matrix with linearly decaying singular values.

38 Experiments Running time in seconds as a function of $n$ (x-axis) and $d$ (y-axis). [Plot: running time in seconds against number of rows in matrix.] The running time scales linearly in $n$, $d$ and $\ell$ as expected.

39 More results
Theorem: For any matrix $A \in \mathbb{R}^{n \times d}$, FrequentDirections guarantees that
$$\|A^T A - B^T B\|_2 \le \|A - A_k\|_F^2 / (\ell - k).$$
This holds for all $k < \ell$, including $k = 0$.
For example, for $\ell = k + 1/\epsilon$ we use $O(dk + d/\epsilon)$ space and have
$$\|A^T A - B^T B\|_2 \le \epsilon \|A - A_k\|_F^2.$$
Theorem: This is space optimal. Any streaming algorithm with this guarantee must use $\Omega(dk + d/\epsilon)$ space.

40 More results
Theorem: Let $B \in \mathbb{R}^{\ell \times d}$ be the sketch produced by FrequentDirections. For any $k < \ell$ it holds that
$$\|A - \pi_B^k(A)\|_F^2 \le \left(1 + \frac{k}{\ell - k}\right) \|A - A_k\|_F^2.$$
For example, for $\ell = k/\epsilon$ we use $O(dk/\epsilon)$ space and have
$$\|A - \pi_B^k(A)\|_F^2 \le (1 + \epsilon) \|A - A_k\|_F^2.$$
Theorem: This is space optimal. Any streaming algorithm with this guarantee must use $\Omega(dk/\epsilon)$ space.
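
Both relative-error guarantees can be checked numerically on a small example. The sketch below works under loud assumptions. It uses the row convention of these theorems ($A \in \mathbb{R}^{n \times d}$, $B \in \mathbb{R}^{\ell \times d}$, covariance error measured on $A^T A - B^T B$). It also assumes the variant that shrinks by the smallest squared singular value after every inserted row, instead of $\sigma_{\ell/2}^2$ as in the pseudocode earlier in the talk, and it takes $\pi_B^k(A)$ to be the projection of the rows of $A$ onto the top $k$ right singular vectors of $B$. The function name, test matrix and sizes are made up for illustration.

```python
import numpy as np

def frequent_directions_rows(A, ell):
    """Row-streaming sketch B (ell x d); assumes ell <= d.

    Assumed variant: shrink by the smallest squared singular value each step.
    """
    n, d = A.shape
    B = np.zeros((ell, d))
    for a in A:
        B[-1] = a  # the last row is zero at this point
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        delta = s[-1] ** 2
        shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        B = shrunk[:, None] * Vt  # last row becomes zero again
    return B

rng = np.random.default_rng(3)
n, d, ell, k = 300, 30, 10, 3
# Approximately rank-5 matrix plus noise (arbitrary synthetic input).
A = rng.standard_normal((n, 5)) @ rng.standard_normal((5, d))
A += 0.1 * rng.standard_normal((n, d))
B = frequent_directions_rows(A, ell)

sigma = np.linalg.svd(A, compute_uv=False)
tail = np.sum(sigma[k:] ** 2)  # ||A - A_k||_F^2

cov_err = np.linalg.norm(A.T @ A - B.T @ B, 2)
Vk = np.linalg.svd(B, full_matrices=False)[2][:k].T  # top-k right singular vectors of B
proj_err = np.linalg.norm(A - A @ Vk @ Vk.T, "fro") ** 2

print(cov_err, "<=", tail / (ell - k))
print(proj_err, "<=", (1 + k / (ell - k)) * tail)
```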

41 More results
Theorem: There exists a variant of FrequentDirections with the same guarantees and space complexity whose running time is $\tilde{O}(\ell^2 n + \ell \cdot \mathrm{nnz}(A))$.
Here, $\tilde{O}(\cdot)$ suppresses logarithmic factors and numerical convergence dependencies. The power method requires $\tilde{O}(1)$ iterations to converge.
Conjecture: This is not optimal.

42 Thanks

to be more efficient on enormous scale, in a stream, or in distributed settings.

to be more efficient on enormous scale, in a stream, or in distributed settings. 16 Matrix Sketching The singular value decomposition (SVD) can be interpreted as finding the most dominant directions in an (n d) matrix A (or n points in R d ). Typically n > d. It is typically easy to

More information

4 Frequent Directions

4 Frequent Directions 4 Frequent Directions Edo Liberty[3] discovered a strong connection between matrix sketching and frequent items problems. In FREQUENTITEMS problem, we are given a stream S = hs 1,s 2,...,s n i of n items

More information

An Optimal Algorithm for l 1 -Heavy Hitters in Insertion Streams and Related Problems

An Optimal Algorithm for l 1 -Heavy Hitters in Insertion Streams and Related Problems An Optimal Algorithm for l 1 -Heavy Hitters in Insertion Streams and Related Problems Arnab Bhattacharyya, Palash Dey, and David P. Woodruff Indian Institute of Science, Bangalore {arnabb,palash}@csa.iisc.ernet.in

More information

10 Distributed Matrix Sketching

10 Distributed Matrix Sketching 10 Distributed Matrix Sketching In order to define a distributed matrix sketching problem thoroughly, one has to specify the distributed model, data model and the partition model of data. The distributed

More information

Lecture 9: Matrix approximation continued

Lecture 9: Matrix approximation continued 0368-348-01-Algorithms in Data Mining Fall 013 Lecturer: Edo Liberty Lecture 9: Matrix approximation continued Warning: This note may contain typos and other inaccuracies which are usually discussed during

More information

arxiv: v6 [cs.ds] 11 Jul 2012

arxiv: v6 [cs.ds] 11 Jul 2012 Simple and Deterministic Matrix Sketching Edo Liberty arxiv:1206.0594v6 [cs.ds] 11 Jul 2012 Abstract We adapt a well known streaming algorithm for approximating item frequencies to the matrix sketching

More information

Frequent Directions: Simple and Deterministic Matrix Sketching

Frequent Directions: Simple and Deterministic Matrix Sketching : Simple and Deterministic Matrix Sketching Mina Ghashami University of Utah ghashami@cs.utah.edu Jeff M. Phillips University of Utah jeffp@cs.utah.edu Edo Liberty Yahoo Labs edo.liberty@yahoo.com David

More information

Randomized Algorithms

Randomized Algorithms Randomized Algorithms Saniv Kumar, Google Research, NY EECS-6898, Columbia University - Fall, 010 Saniv Kumar 9/13/010 EECS6898 Large Scale Machine Learning 1 Curse of Dimensionality Gaussian Mixture Models

More information

Improved Practical Matrix Sketching with Guarantees

Improved Practical Matrix Sketching with Guarantees Improved Practical Matrix Sketching with Guarantees Amey Desai Mina Ghashami Jeff M. Phillips January 26, 215 Abstract Matrices have become essential data representations for many large-scale problems

More information

sublinear time low-rank approximation of positive semidefinite matrices Cameron Musco (MIT) and David P. Woodru (CMU)

sublinear time low-rank approximation of positive semidefinite matrices Cameron Musco (MIT) and David P. Woodru (CMU) sublinear time low-rank approximation of positive semidefinite matrices Cameron Musco (MIT) and David P. Woodru (CMU) 0 overview Our Contributions: 1 overview Our Contributions: A near optimal low-rank

More information

dimensionality reduction for k-means and low rank approximation

dimensionality reduction for k-means and low rank approximation dimensionality reduction for k-means and low rank approximation Michael Cohen, Sam Elder, Cameron Musco, Christopher Musco, Mădălina Persu Massachusetts Institute of Technology 0 overview Simple techniques

More information

11 Heavy Hitters Streaming Majority

11 Heavy Hitters Streaming Majority 11 Heavy Hitters A core mining problem is to find items that occur more than one would expect. These may be called outliers, anomalies, or other terms. Statistical models can be layered on top of or underneath

More information

15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018

15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018 15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018 Today we ll talk about a topic that is both very old (as far as computer science

More information

Efficiently Implementing Sparsity in Learning

Efficiently Implementing Sparsity in Learning Efficiently Implementing Sparsity in Learning M. Magdon-Ismail Rensselaer Polytechnic Institute (Joint Work) December 9, 2013. Out-of-Sample is What Counts NO YES A pattern exists We don t know it We have

More information

Fast Dimension Reduction

Fast Dimension Reduction Fast Dimension Reduction Nir Ailon 1 Edo Liberty 2 1 Google Research 2 Yale University Introduction Lemma (Johnson, Lindenstrauss (1984)) A random projection Ψ preserves all ( n 2) distances up to distortion

More information

Relative-Error CUR Matrix Decompositions

Relative-Error CUR Matrix Decompositions RandNLA Reading Group University of California, Berkeley Tuesday, April 7, 2015. Motivation study [low-rank] matrix approximations that are explicitly expressed in terms of a small numbers of columns and/or

More information

Lecture 18 Nov 3rd, 2015

Lecture 18 Nov 3rd, 2015 CS 229r: Algorithms for Big Data Fall 2015 Prof. Jelani Nelson Lecture 18 Nov 3rd, 2015 Scribe: Jefferson Lee 1 Overview Low-rank approximation, Compression Sensing 2 Last Time We looked at three different

More information

Randomized algorithms for the approximation of matrices

Randomized algorithms for the approximation of matrices Randomized algorithms for the approximation of matrices Luis Rademacher The Ohio State University Computer Science and Engineering (joint work with Amit Deshpande, Santosh Vempala, Grant Wang) Two topics

More information

Edo Liberty Principal Scientist Amazon Web Services. Streaming Quantiles

Edo Liberty Principal Scientist Amazon Web Services. Streaming Quantiles Edo Liberty Principal Scientist Amazon Web Services Streaming Quantiles Streaming Quantiles Manku, Rajagopalan, Lindsay. Random sampling techniques for space efficient online computation of order statistics

More information

Continuous Matrix Approximation on Distributed Data

Continuous Matrix Approximation on Distributed Data Continuous Matrix Approximation on Distributed Data Mina Ghashami School of Computing University of Utah ghashami@cs.uah.edu Jeff M. Phillips School of Computing University of Utah jeffp@cs.uah.edu Feifei

More information

12 Count-Min Sketch and Apriori Algorithm (and Bloom Filters)

12 Count-Min Sketch and Apriori Algorithm (and Bloom Filters) 12 Count-Min Sketch and Apriori Algorithm (and Bloom Filters) Many streaming algorithms use random hashing functions to compress data. They basically randomly map some data items on top of each other.

More information

Improved Practical Matrix Sketching with Guarantees

Improved Practical Matrix Sketching with Guarantees Improved Practical Matrix Sketching with Guarantees Mina Ghashami, Amey Desai, and Jeff M. Phillips University of Utah Abstract. Matrices have become essential data representations for many large-scale

More information

Sketching as a Tool for Numerical Linear Algebra

Sketching as a Tool for Numerical Linear Algebra Sketching as a Tool for Numerical Linear Algebra (Part 2) David P. Woodruff presented by Sepehr Assadi o(n) Big Data Reading Group University of Pennsylvania February, 2015 Sepehr Assadi (Penn) Sketching

More information

Optimal Principal Component Analysis in Distributed and Streaming Models

Optimal Principal Component Analysis in Distributed and Streaming Models Optimal Principal Component Analysis in Distributed and Streaming Models Christos Boutsidis Goldman Sachs New York, USA christos.boutsidis@gmail.com David P. Woodruff IBM Research California, USA dpwoodru@us.ibm.com

More information

Sketching as a Tool for Numerical Linear Algebra

Sketching as a Tool for Numerical Linear Algebra Sketching as a Tool for Numerical Linear Algebra David P. Woodruff presented by Sepehr Assadi o(n) Big Data Reading Group University of Pennsylvania February, 2015 Sepehr Assadi (Penn) Sketching for Numerical

More information

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Linear Regression Regression Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Example: Height, Gender, Weight Shoe Size Audio features

More information

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Linear Regression Regression Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Example: Height, Gender, Weight Shoe Size Audio features

More information

randomized block krylov methods for stronger and faster approximate svd

randomized block krylov methods for stronger and faster approximate svd randomized block krylov methods for stronger and faster approximate svd Cameron Musco and Christopher Musco December 2, 25 Massachusetts Institute of Technology, EECS singular value decomposition n d left

More information

Estimating Dominance Norms of Multiple Data Streams Graham Cormode Joint work with S. Muthukrishnan

Estimating Dominance Norms of Multiple Data Streams Graham Cormode Joint work with S. Muthukrishnan Estimating Dominance Norms of Multiple Data Streams Graham Cormode graham@dimacs.rutgers.edu Joint work with S. Muthukrishnan Data Stream Phenomenon Data is being produced faster than our ability to process

More information

Effective computation of biased quantiles over data streams

Effective computation of biased quantiles over data streams Effective computation of biased quantiles over data streams Graham Cormode cormode@bell-labs.com S. Muthukrishnan muthu@cs.rutgers.edu Flip Korn flip@research.att.com Divesh Srivastava divesh@research.att.com

More information

arxiv: v2 [cs.ds] 17 Feb 2016

arxiv: v2 [cs.ds] 17 Feb 2016 Efficient Algorithm for Sparse Matrices Mina Ghashami University of Utah ghashami@cs.utah.edu Edo Liberty Yahoo Labs edo.liberty@yahoo.com Jeff M. Phillips University of Utah jeffp@cs.utah.edu arxiv:1602.00412v2

More information

Unsupervised Machine Learning and Data Mining. DS 5230 / DS Fall Lecture 7. Jan-Willem van de Meent

Unsupervised Machine Learning and Data Mining. DS 5230 / DS Fall Lecture 7. Jan-Willem van de Meent Unsupervised Machine Learning and Data Mining DS 5230 / DS 4420 - Fall 2018 Lecture 7 Jan-Willem van de Meent DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford) Dimensionality Reduction Goal:

More information

Fast Random Projections

Fast Random Projections Fast Random Projections Edo Liberty 1 September 18, 2007 1 Yale University, New Haven CT, supported by AFOSR and NGA (www.edoliberty.com) Advised by Steven Zucker. About This talk will survey a few random

More information

Machine Learning - MT & 14. PCA and MDS

Machine Learning - MT & 14. PCA and MDS Machine Learning - MT 2016 13 & 14. PCA and MDS Varun Kanade University of Oxford November 21 & 23, 2016 Announcements Sheet 4 due this Friday by noon Practical 3 this week (continue next week if necessary)

More information

Fast Monte Carlo Algorithms for Matrix Operations & Massive Data Set Analysis

Fast Monte Carlo Algorithms for Matrix Operations & Massive Data Set Analysis Fast Monte Carlo Algorithms for Matrix Operations & Massive Data Set Analysis Michael W. Mahoney Yale University Dept. of Mathematics http://cs-www.cs.yale.edu/homes/mmahoney Joint work with: P. Drineas

More information

1 Approximate Quantiles and Summaries

1 Approximate Quantiles and Summaries CS 598CSC: Algorithms for Big Data Lecture date: Sept 25, 2014 Instructor: Chandra Chekuri Scribe: Chandra Chekuri Suppose we have a stream a 1, a 2,..., a n of objects from an ordered universe. For simplicity

More information

Randomized Numerical Linear Algebra: Review and Progresses

Randomized Numerical Linear Algebra: Review and Progresses ized ized SVD ized : Review and Progresses Zhihua Department of Computer Science and Engineering Shanghai Jiao Tong University The 12th China Workshop on Machine Learning and Applications Xi an, November

More information

Sketching as a Tool for Numerical Linear Algebra All Lectures. David Woodruff IBM Almaden

Sketching as a Tool for Numerical Linear Algebra All Lectures. David Woodruff IBM Almaden Sketching as a Tool for Numerical Linear Algebra All Lectures David Woodruff IBM Almaden Massive data sets Examples Internet traffic logs Financial data etc. Algorithms Want nearly linear time or less

More information

EECS 275 Matrix Computation

EECS 275 Matrix Computation EECS 275 Matrix Computation Ming-Hsuan Yang Electrical Engineering and Computer Science University of California at Merced Merced, CA 95344 http://faculty.ucmerced.edu/mhyang Lecture 22 1 / 21 Overview

More information

Space-optimal Heavy Hitters with Strong Error Bounds

Space-optimal Heavy Hitters with Strong Error Bounds Space-optimal Heavy Hitters with Strong Error Bounds Graham Cormode graham@research.att.com Radu Berinde(MIT) Piotr Indyk(MIT) Martin Strauss (U. Michigan) The Frequent Items Problem TheFrequent Items

More information

Tighter Low-rank Approximation via Sampling the Leveraged Element

Tighter Low-rank Approximation via Sampling the Leveraged Element Tighter Low-rank Approximation via Sampling the Leveraged Element Srinadh Bhojanapalli The University of Texas at Austin bsrinadh@utexas.edu Prateek Jain Microsoft Research, India prajain@microsoft.com

More information

An Optimal Algorithm for l 1 -Heavy Hitters in Insertion Streams and Related Problems

An Optimal Algorithm for l 1 -Heavy Hitters in Insertion Streams and Related Problems An Optimal Algorithm for l 1 -Heavy Hitters in Insertion Streams and Related Problems Arnab Bhattacharyya Indian Institute of Science, Bangalore arnabb@csa.iisc.ernet.in Palash Dey Indian Institute of

More information

Chapter XII: Data Pre and Post Processing

Chapter XII: Data Pre and Post Processing Chapter XII: Data Pre and Post Processing Information Retrieval & Data Mining Universität des Saarlandes, Saarbrücken Winter Semester 2013/14 XII.1 4-1 Chapter XII: Data Pre and Post Processing 1. Data

More information

Lecture 10. Sublinear Time Algorithms (contd) CSC2420 Allan Borodin & Nisarg Shah 1

Lecture 10. Sublinear Time Algorithms (contd) CSC2420 Allan Borodin & Nisarg Shah 1 Lecture 10 Sublinear Time Algorithms (contd) CSC2420 Allan Borodin & Nisarg Shah 1 Recap Sublinear time algorithms Deterministic + exact: binary search Deterministic + inexact: estimating diameter in a

More information

Distributed Summaries

Distributed Summaries Distributed Summaries Graham Cormode graham@research.att.com Pankaj Agarwal(Duke) Zengfeng Huang (HKUST) Jeff Philips (Utah) Zheiwei Wei (HKUST) Ke Yi (HKUST) Summaries Summaries allow approximate computations:

More information

Maintaining Significant Stream Statistics over Sliding Windows

Maintaining Significant Stream Statistics over Sliding Windows Maintaining Significant Stream Statistics over Sliding Windows L.K. Lee H.F. Ting Abstract In this paper, we introduce the Significant One Counting problem. Let ε and θ be respectively some user-specified

More information

CS 598CSC: Algorithms for Big Data Lecture date: Sept 11, 2014

CS 598CSC: Algorithms for Big Data Lecture date: Sept 11, 2014 CS 598CSC: Algorithms for Big Data Lecture date: Sept 11, 2014 Instructor: Chandra Cheuri Scribe: Chandra Cheuri The Misra-Greis deterministic counting guarantees that all items with frequency > F 1 /

More information

Approximate Counts and Quantiles over Sliding Windows

Approximate Counts and Quantiles over Sliding Windows Approximate Counts and Quantiles over Sliding Windows Arvind Arasu Stanford University arvinda@cs.stanford.edu Gurmeet Singh Manku Stanford University manku@cs.stanford.edu Abstract We consider the problem

More information

14 Singular Value Decomposition

14 Singular Value Decomposition 14 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing

More information

Data Mining Techniques

Data Mining Techniques Data Mining Techniques CS 6220 - Section 3 - Fall 2016 Lecture 12 Jan-Willem van de Meent (credit: Yijun Zhao, Percy Liang) DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford) Linear Dimensionality

More information

A Simpler and More Efficient Deterministic Scheme for Finding Frequent Items over Sliding Windows

A Simpler and More Efficient Deterministic Scheme for Finding Frequent Items over Sliding Windows A Simpler and More Efficient Deterministic Scheme for Finding Frequent Items over Sliding Windows ABSTRACT L.K. Lee Department of Computer Science The University of Hong Kong Pokfulam, Hong Kong lklee@cs.hku.hk

More information

Lecture 3 Sept. 4, 2014

Lecture 3 Sept. 4, 2014 CS 395T: Sublinear Algorithms Fall 2014 Prof. Eric Price Lecture 3 Sept. 4, 2014 Scribe: Zhao Song In today s lecture, we will discuss the following problems: 1. Distinct elements 2. Turnstile model 3.

More information

Mining Frequent Items in a Stream Using Flexible Windows (Extended Abstract)

Mining Frequent Items in a Stream Using Flexible Windows (Extended Abstract) Mining Frequent Items in a Stream Using Flexible Windows (Extended Abstract) Toon Calders, Nele Dexters and Bart Goethals University of Antwerp, Belgium firstname.lastname@ua.ac.be Abstract. In this paper,

More information

Johnson-Lindenstrauss, Concentration and applications to Support Vector Machines and Kernels

Johnson-Lindenstrauss, Concentration and applications to Support Vector Machines and Kernels Johnson-Lindenstrauss, Concentration and applications to Support Vector Machines and Kernels Devdatt Dubhashi Department of Computer Science and Engineering, Chalmers University, dubhashi@chalmers.se Functional

More information

MANY scientific computations, signal processing, data analysis and machine learning applications lead to large dimensional

MANY scientific computations, signal processing, data analysis and machine learning applications lead to large dimensional Low rank approximation and decomposition of large matrices using error correcting codes Shashanka Ubaru, Arya Mazumdar, and Yousef Saad 1 arxiv:1512.09156v3 [cs.it] 15 Jun 2017 Abstract Low rank approximation

More information

Fast Dimension Reduction

Fast Dimension Reduction Fast Dimension Reduction MMDS 2008 Nir Ailon Google Research NY Fast Dimension Reduction Using Rademacher Series on Dual BCH Codes (with Edo Liberty) The Fast Johnson Lindenstrauss Transform (with Bernard

More information

From Non-Negative Matrix Factorization to Deep Learning

From Non-Negative Matrix Factorization to Deep Learning The Math!! From Non-Negative Matrix Factorization to Deep Learning Intuitions and some Math too! luissarmento@gmailcom https://wwwlinkedincom/in/luissarmento/ October 18, 2017 The Math!! Introduction Disclaimer

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Lecture #21: Dimensionality Reduction Seoul National University 1 In This Lecture Understand the motivation and applications of dimensionality reduction Learn the definition

More information

Quantum Recommendation Systems

Quantum Recommendation Systems Quantum Recommendation Systems Iordanis Kerenidis 1 Anupam Prakash 2 1 CNRS, Université Paris Diderot, Paris, France, EU. 2 Nanyang Technological University, Singapore. April 4, 2017 The HHL algorithm

More information

Approximate Spectral Clustering via Randomized Sketching

Approximate Spectral Clustering via Randomized Sketching Approximate Spectral Clustering via Randomized Sketching Christos Boutsidis Yahoo! Labs, New York Joint work with Alex Gittens (Ebay), Anju Kambadur (IBM) The big picture: sketch and solve Tradeoff: Speed

More information

Link Analysis Ranking

How do search engines decide how to rank your query results? Guess why Google ranks the query results the way it does. How would you do it? Naïve ranking of query results: Given query

RandNLA: Randomized Numerical Linear Algebra

Petros Drineas, Rensselaer Polytechnic Institute, Computer Science Department. To access my web page: drineas. RandNLA: sketch a matrix by row/column sampling

Online Principal Component Analysis. Boutsidis, Garber, Karnin, Liberty. Presented by Zohar Karnin, November 23, 2014

Data Matrix: Often,

QALGO workshop, Riga. Quantum algorithms for linear algebra.

Center for Quantum Technologies and Nanyang Technological University, Singapore. September 22, 2015. Overview

Single Pass PCA of Matrix Products

Shanshan Wu, The University of Texas at Austin, shanshan@utexas.edu. Sujay Sanghavi, The University of Texas at Austin, sanghavi@mail.utexas.edu. Srinadh Bhojanapalli, Toyota

Normalized power iterations for the computation of SVD

Per-Gunnar Martinsson, Department of Applied Mathematics, University of Colorado Boulder, Co. Per-gunnar.Martinsson@Colorado.edu. Arthur Szlam, Courant

7 Principal Component Analysis

This topic will build a series of techniques to deal with high-dimensional data. Unlike regression problems, our goal is not to predict a value (the y-coordinate), it is

Edo Liberty. Accelerated Dense Random Projections

Contents: 1. List of common notations 2. Introduction to random projections

Statistics 202: Data Mining. © Jonathan Taylor. Week 2. Based in part on slides from textbook, slides of Susan Holmes. October 3, 2012

Part I: Other datatypes, preprocessing. Other datatypes: Document data. You might start with a collection of

Part I. Other datatypes, preprocessing. Other datatypes. Other datatypes. Week 2 Based in part on slides from textbook, slides of Susan Holmes

Week 2. Based in part on slides from textbook, slides of Susan Holmes. October 3, 2012. Part I: Other datatypes, preprocessing. Other datatypes: Document data. You might start with

DATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD

The curse of dimensionality: Real data usually have thousands, or millions of dimensions. E.g., web documents, where the dimensionality is the vocabulary

Processing Big Data Matrix Sketching

Dimensionality reduction. Linear: Principal Component Analysis (SVD-based), Compressed sensing, Matrix sketching. Non-linear: Kernel PCA, Isometric mapping. Matrix sketching

Lecture 01 August 31, 2017

Sketching Algorithms for Big Data, Fall 2017. Prof. Jelani Nelson. Scribe: Vinh-Kha Le. 1 Overview: In this lecture, we overviewed the six main topics covered in the course, reviewed

First Efficient Convergence for Streaming k-PCA: a Global, Gap-Free, and Near-Optimal Rate

58th Annual IEEE Symposium on Foundations of Computer Science. Zeyuan Allen-Zhu, Microsoft Research, zeyuan@csail.mit.edu

Data stream algorithms via expander graphs

Sumit Ganguly, sganguly@cse.iitk.ac.in, IIT Kanpur, India. Abstract. We present a simple way of designing deterministic algorithms for problems in the data stream

Provable Alternating Minimization Methods for Non-convex Optimization

Prateek Jain, Microsoft Research, India. Joint work with Praneeth Netrapalli, Sujay Sanghavi, Alekh Agarwal, Animashree Anandkumar, Rashish

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin

Introduction to Machine Learning, 2013-14. Slides: Eran Halperin. Singular Value Decomposition (SVD): The singular value decomposition (SVD)

CS 229r: Algorithms for Big Data, Fall 2015. Lecture 17, 10/28

Prof. Jelani Nelson. Scribe: Morris Yau. 1 Overview: In the last lecture we defined subspace embeddings; a subspace embedding is a linear transformation

Dimensionality Reduction and Principle Components Analysis

Outline: What is dimensionality reduction? Principle Components Analysis (PCA). Example (Bishop, ch 12). PCA vs linear regression. PCA as a mixture

Sketching as a Tool for Numerical Linear Algebra All Lectures. David Woodruff IBM Almaden

Massive data sets. Examples: Internet traffic logs, Financial data, etc. Algorithms: Want nearly linear time or less

Estimating Frequencies and Finding Heavy Hitters Jonas Nicolai Hovmand, Morten Houmøller Nygaard,

Jonas Nicolai Hovmand, 2011 3884; Morten Houmøller Nygaard, 2011 4582. Master's Thesis, Computer Science, June 2016. Main Supervisor: Gerth Stølting Brodal

A Mergeable Summaries

Pankaj K. Agarwal, Graham Cormode, Zengfeng Huang, Jeff M. Phillips, Zhewei Wei, and Ke Yi. We study the mergeability of data summaries. Informally speaking, mergeability requires

Quick Introduction to Nonnegative Matrix Factorization

Norm Matloff, University of California at Davis. 1 The Goal: Given a u × v matrix A with nonnegative elements, we wish to find nonnegative, rank-k matrices

CS264: Beyond Worst-Case Analysis Lecture #15: Topic Modeling and Nonnegative Matrix Factorization

Tim Roughgarden, February 28, 2017. 1 Preamble: This lecture fulfills a promise made back in Lecture #1,

15 Singular Value Decomposition

For any high-dimensional data analysis, one's first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing

Linear Subspace Models

Goal: Explore linear models of a data set. Motivation: A central question in vision concerns how we represent a collection of data vectors. The data vectors may be rasterized images,

Supremum of simple stochastic processes

Subspace embeddings. Daniel Hsu, COMS 4772. Recap: JL lemma. For any ε ∈ (0, 1/2), point set S ⊂ R^d of cardinality |S| = n, and k ∈ N such that k ≥ 16 ln n / ε², there

Machine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012

Lecture 22, Nov 13, 2012. Based on slides from Eric Xing, CMU. Reading: Chap 12.1, CB book. Factor or Component

Title: Count-Min Sketch Name: Graham Cormode 1 Affil./Addr. Department of Computer Science, University of Warwick,

Coventry, UK. Keywords: streaming algorithms; frequent items; approximate counting, sketch

Lower Bound Techniques for Multiparty Communication Complexity

Qin Zhang, Indiana University Bloomington. Based on works with Jeff Phillips, Elad Verbin and David Woodruff. The multiparty number-in-hand

CPSC 340 Assignment 4 (due November 17 ATE)

Multi-Class Logistic. The function example multiclass loads a multi-class classification dataset with y_i ∈ {1, 2, 3, 4, 5} and fits a one-vs-all classification model

Ad Placement Strategies

Case Study 1: Estimating Click Probabilities. Tackling an Unknown Number of Features with Sketching. Machine Learning for Big Data, CSE547/STAT548, University of Washington. Emily Fox 2014 Emily Fox January

Methods for sparse analysis of high-dimensional data, II

Rachel Ward, May 26, 2011. High dimensional data with low-dimensional structure: 300 by 300 pixel images = 90,000 dimensions. High dimensional

Principal components analysis COMS 4771

1. Representation learning. Useful representations of data. Representation learning: Given: raw feature vectors x_1, x_2, ..., x_n ∈ R^d. Goal: learn a useful feature

Lecture 2. Frequency problems

Ricard Gavaldà. MIRI Seminar on Data Streams, Spring 2015. Contents: 1 Frequency problems in data streams 2 Approximating inner product 3 Computing frequency moments

7 Algorithms for Massive Data Problems

Massive Data, Sampling. This chapter deals with massive data problems where the input data (a graph, a matrix or some other object) is too large to be stored in random

3 Best-Fit Subspaces and Singular Value Decomposition

Think of the rows of an n × d matrix A as n data points in a d-dimensional space and consider the problem of finding the best k-dimensional subspace

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A =

Mathematics Review. A.1.1 Matrices and Vectors. Definition of Matrix. An M×N matrix A is a two-dimensional array of numbers A = [a_11 a_12 ... a_1N; a_21 a_22 ... a_2N; ...; a_M1 a_M2 ... a_MN]. A matrix can

Exercise Sheet 1. 1 Probability revision 1: Student-t as an infinite mixture of Gaussians

Show that an infinite mixture of Gaussian distributions, with Gamma distributions as mixing weights in the following

CS 5321: Advanced Algorithms. Amortized Analysis of Data Structures. Motivations. Motivation cont'd

Ali Ebnenasir, Department of Computer Science, Michigan Technological University. Motivations: Why amortized analysis and when? Suppose you