Simple and Deterministic Matrix Sketches
1 Simple and Deterministic Matrix Sketches Edo Liberty + ongoing work with: Mina Ghashami, Jeff Phillips and David Woodruff. Edo Liberty: Simple and Deterministic Matrix Sketches 1 / 41
2 Data Matrices Often our data is represented by a matrix.
3 Data Matrices which is often too large to work with on a single machine...

| Data | Columns | Rows | n | sparse |
| Textual | Documents | Words | > | yes |
| Actions | Users | Types | > 10^8 | yes |
| Visual | Images | Pixels, SIFT | > 10^9 | no |
| Audio | Songs, tracks | Frequencies | > 10^9 | no |
| ML | Examples | Features | > 10^5 | no |
| Financial | Prices | Items, Stocks | > 10^6 | no |

We think of A ∈ R^{d×n} as n column vectors in R^d and typically n ≫ d.
4 Streaming Matrices Sometimes, we cannot store the entire matrix at all.
5 Streaming Matrices Example: can we compute AA^T from a stream of columns A_i? (enough for PCA, for example).
6 Streaming Matrices Example: can we compute AA^T from a stream of columns A_i? (enough for PCA, for example).

AA^T = Σ_{i=1}^n A_i A_i^T

Naïve solution: compute AA^T in time O(nd^2) and space O(d^2). Think about 1Mp images, d = 10^6. This solution requires 10^12 operations per update and 1T of space.
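The naïve solution above can be sketched in a few lines of numpy; this is a minimal illustration (function name hypothetical), and it makes the O(d^2) per-update cost visible: every column triggers a full rank-one update of a d×d accumulator.

```python
import numpy as np

def streaming_covariance(columns, d):
    """Naively compute A A^T from a stream of columns.

    Each update costs O(d^2) time and the accumulator takes
    O(d^2) space -- the costs that become prohibitive when
    d is large (e.g. d = 10^6 for 1Mp images)."""
    AAt = np.zeros((d, d))
    for a in columns:
        AAt += np.outer(a, a)  # rank-one update A_i A_i^T
    return AAt
```

Iterating over the rows of A.T yields the columns of A, so `streaming_covariance(A.T, d)` matches `A @ A.T`.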
7 Matrix Approximation Matrix sketching or approximation: efficiently compute a concisely representable matrix B such that B ≈ A or BB^T ≈ AA^T. Working with B instead of A is often good enough:

- Dimension reduction
- Signal denoising
- Classification
- Regression
- Clustering
- Approximate matrix multiplication
- Reconstruction
- Recommendation
- ...
8 Matrix Approximation

Column subset selection algorithms

| Paper | Space | Time | Bound |
| FKV04 | O(k^4/ε^6 · max(k^4, ε^2)) | O(k^5/ε^6 · max(k^4, ε^2)) | P, εA |
| DV06 | #C = O(k/ε + k^2 log k); O(n(k/ε + k^2 log k)) | O(nnz(A)(k/ε + k^2 log k) + (n + d)(k^2/ε^2 + k^3 log(k/ε) + k^4 log^2 k)) | P, εR |
| DKM06 LinearTimeSVD | #C = O(1/ε^2); O((n + 1/ε^2)/ε^4) | O((n + 1/ε^2)/ε^4 + nnz(A)) | P, εL2 |
| | #C = O(k/ε^2); O((k/ε^2)(n + k/ε^2)) | O((k/ε^2)^2 (n + k/ε^2) + nnz(A)) | P, εA |
| DKM06 ConstantTimeSVD | #C+R = O(1/ε^4); O(1/ε^12 + nk/ε^4) | O(1/ε^12 + nk/ε^4 + nnz(A)) | P, εL2 |
| | #C+R = O(k^2/ε^4); O(k^6/ε^12 + nk^3/ε^4) | O(k^6/ε^12 + nk^3/ε^4 + nnz(A)) | P, εA |
| DMM08 CUR | #C = O(k^2/ε^2); #R = O(k^4/ε^6) | O(nd^2) | C, εR |
| MD09 ColumnSelect | #C = O(k log k/ε^2); O(nk log k/ε^2) | O(nd^2) | P_{O(k log k/ε^2)}, εR |
| BDM11 | #C = 2k/ε · (1 + o(1)) | O((ndk + dk^3) ε^{-2/3}) | P_{2k/ε(1+o(1))}, εR |

[Relative Errors for Deterministic Low-Rank Matrix Approximations, Ghashami, Phillips 2013]

Sparsification and entry sampling

| Paper | Space | Time | Bound |
| AM07 | ρn/ε^2 + n polylog(n) | nnz + ρn/ε^2 + n polylog(n) | ‖A − B‖_2 ≤ ε‖A‖_2 |
| AHK06 | (ñnz · n/ε^2)^{1/2} | nnz + (ñnz · n/ε^2)^{1/2} | ‖A − B‖_2 ≤ ε‖A‖_2 |
| DZ11 | ρn log(n)/ε^2 | nnz + ρn log(n)/ε^2 | ‖A − B‖_2 ≤ ε‖A‖_2 |
| AKL13 | ñ ρ log(n)/ε^2 + (ρ log(n) ñnz/ε^2)^{1/2} | | ‖A − B‖_2 ≤ ε‖A‖_2 |

[Near-optimal Distributions for Data Matrix Sampling, Achlioptas, Karnin, Liberty, 2013]
9 Matrix Approximation

Linear subspace embedding sketches

| Paper | Space | Time | Bound |
| DKM06 LinearTimeSVD | #R = O(1/ε^2); O((d + 1/ε^2)/ε^4) | O((d + 1/ε^2)/ε^4 + nnz(A)) | P, εL2 |
| | #R = O(k/ε^2); O((k/ε^2)^2 (d + k/ε^2)) | O((k/ε^2)^2 (d + k/ε^2) + nnz(A)) | P, εA |
| Sar06 (turnstile) | #R = O(k/ε + k log k); O(d(k/ε + k log k)) | O(nnz(A)(k/ε + k log k) + d(k/ε + k log k)^2) | P_{O(k/ε + k log k)}, εR |
| CW09 | #R = O(k/ε) | O(nd^2 + ndk/ε) | P_{O(k/ε)}, εR |
| CW09 | O((n + d)(k/ε)) | O(nd^2 + ndk/ε) | C, εR |
| CW09 | O((k/ε^2)(n + d/ε^2)) | O(n(k/ε^2)^2 + nd(k/ε^2) + nd^2) | C, εR |

Deterministic sketching algorithms

| Paper | Space | Time | Bound |
| FSS13 | O((k/ε) log n) | n((k/ε) log n)^{O(1)} | P_{2k/ε}, εR |
| Lib13 | #R = O(ρ/ε); O(dρ/ε) | O(ndρ/ε) | P_{O(ρ/ε)}, εL2 |
| GP13 | #R = k/ε + k; O(dk/ε) | O(ndk/ε) | P, εR |

[Relative Errors for Deterministic Low-Rank Matrix Approximations, Ghashami, Phillips 2013]
10 Frequent Directions Goal: efficiently maintain a matrix B with only l = 2/ε columns such that

‖AA^T − BB^T‖_2 ≤ ε‖A‖²_F

Intuition: extend Frequent Items.
[Finding repeated elements, Misra, Gries, 1982]
[Frequency estimation of internet packet streams with limited space, Demaine, Lopez-Ortiz, Munro, 2002]
[A simple algorithm for finding frequent elements in streams and bags, Karp, Shenker, Papadimitriou, 2003]
[Efficient Computation of Frequent and Top-k Elements in Data Streams, Metwally, Agrawal, Abbadi, 2006]
(An algorithm so good it was invented 4 times.)
11 Frequent Items Obtain the frequency f(i) of each item in the stream of items.
12 Frequent Items With d counters it's easy but not good enough (IP addresses, queries...).
13 Frequent Items (Misra-Gries) Let's keep fewer than a fixed number of counters l.
14 Frequent Items If an item has a counter we add 1 to that counter.
15 Frequent Items Otherwise, we create a new counter for it and set it to 1.
16 Frequent Items But now we do not have fewer than l counters.
17 Frequent Items Let δ be the median counter value at time t.
18 Frequent Items Decrease all counters by δ (or set to zero if less than δ).
19 Frequent Items And continue...
20 Frequent Items The approximate counts are f′.
21 Frequent Items We increase the count by only 1 for each item appearance:

f′(i) ≤ f(i)

Because we decrease each counter by at most δ_t at time t:

f′(i) ≥ f(i) − Σ_t δ_t

Calculating the total approximate frequencies:

0 ≤ Σ_i f′(i) ≤ n − (l/2) Σ_t δ_t  ⟹  Σ_t δ_t ≤ 2n/l

Setting l = 2/ε yields:

f(i) − f′(i) ≤ Σ_t δ_t ≤ εn
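The counter-maintenance loop of the preceding slides can be sketched in a few lines of Python. This is an illustrative sketch (function name hypothetical) of the classic decrement-by-one variant; the slides instead decrement by the median counter value δ, but the same guarantee f(i) − f′(i) ≤ 2n/l argument applies.

```python
def misra_gries(stream, l):
    """Approximate item frequencies using fewer than l counters.

    Returns counters f' with f'(i) <= f(i), and f(i) - f'(i)
    bounded by the total amount decremented (the deltas above)."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1          # item has a counter: add 1
        elif len(counters) < l - 1:
            counters[item] = 1           # free counter: start at 1
        else:
            # No free counter: decrease all counters, dropping
            # any that reach zero (frees at least one slot).
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

For example, on the stream "aabacaa" with l = 2 the only surviving counter is for 'a', with an undercount that respects the bound.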
22 Frequent Directions We keep a sketch of at most l columns.
23 Frequent Directions We maintain the invariant that some columns are empty (zero valued).
24 Frequent Directions Input vectors are simply stored in empty columns.
25 Frequent Directions Input vectors are simply stored in empty columns.
26 Frequent Directions When the sketch is full we need to zero out some columns...
27 Frequent Directions Using the SVD we compute B = USV^T and set B_new = US.
28 Frequent Directions Note that BB^T = B_new B_new^T so we don't lose anything.
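The invariance BB^T = B_new B_new^T follows because BB^T = U S V^T V S U^T = U S² U^T, which does not involve V. A small numpy check (illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 4))

# B = U S V^T; dropping the rotation V^T keeps the Gram matrix:
# (U S)(U S)^T = U S^2 U^T = B B^T.
U, s, Vt = np.linalg.svd(B, full_matrices=False)
B_new = U * s  # broadcast: scales column j of U by s[j]

assert np.allclose(B @ B.T, B_new @ B_new.T)
```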
29 Frequent Directions The columns of B are now orthogonal and in decreasing magnitude order.
30 Frequent Directions Let δ = ‖B_{l/2}‖² (the squared norm of the middle column).
31 Frequent Directions Reduce the squared column norms by δ (or nullify columns whose squared norm is less than δ).
32 Frequent Directions Start aggregating columns again...
33 Frequent Directions

Input: l, A ∈ R^{d×n}
B ← all-zeros matrix ∈ R^{d×l}
for i ∈ [n] do
    insert A_i into a zero valued column of B
    if B has no zero valued columns then
        [U, Σ, V] ← SVD(B)
        δ ← σ²_{l/2}
        Σ̌ ← √(max(Σ² − I_l δ, 0))
        B ← U Σ̌   # At least half the columns of B are zero.
Return: B
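The pseudocode above translates almost line for line into numpy. A minimal sketch, assuming d ≥ l and a dense in-memory matrix streamed column by column (function name illustrative):

```python
import numpy as np

def frequent_directions(A, ell):
    """Maintain a d x ell sketch B of A (d x n), streamed by columns,
    with ||A A^T - B B^T||_2 <= 2 ||A||_F^2 / ell.

    Assumes d >= ell so the thin SVD keeps B at shape (d, ell)."""
    d, n = A.shape
    B = np.zeros((d, ell))
    for i in range(n):
        # Find an empty (all-zero) column; shrink first if none is free.
        zero_cols = np.where(~B.any(axis=0))[0]
        if len(zero_cols) == 0:
            U, s, _ = np.linalg.svd(B, full_matrices=False)
            delta = s[ell // 2] ** 2
            # Reduce squared singular values by delta (floor at zero);
            # this zeroes at least half of the columns.
            s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            B = U * s_shrunk
            zero_cols = np.where(~B.any(axis=0))[0]
        B[:, zero_cols[0]] = A[:, i]
    return B
```

With ε = 2/l this gives the ε‖A‖²_F error bound proved on the next slides, which the sketch satisfies on random inputs.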
34 Bounding the error We first bound ‖AA^T − BB^T‖. (Here C^t denotes the sketch after inserting A_t and before shrinking, and B^t the sketch after shrinking.)

sup_{‖x‖=1} |‖xA‖² − ‖xB‖²| = sup_{‖x‖=1} Σ_{t=1}^n [⟨x, A_t⟩² + ‖xB^{t−1}‖² − ‖xB^t‖²]
 = sup_{‖x‖=1} Σ_{t=1}^n [‖xC^t‖² − ‖xB^t‖²]
 ≤ Σ_{t=1}^n ‖C^{tT} C^t − B^{tT} B^t‖ ‖x‖² ≤ Σ_{t=1}^n δ_t

Which gives: ‖AA^T − BB^T‖ ≤ Σ_{t=1}^n δ_t
35 Bounding the error We compute the Frobenius norm of the final sketch:

0 ≤ ‖B‖²_F = Σ_{t=1}^n [‖B^t‖²_F − ‖B^{t−1}‖²_F]
 = Σ_{t=1}^n [(‖C^t‖²_F − ‖B^{t−1}‖²_F) − (‖C^t‖²_F − ‖B^t‖²_F)]
 = Σ_{t=1}^n [‖A_t‖² − tr(C^{tT} C^t − B^{tT} B^t)]
 ≤ ‖A‖²_F − (l/2) Σ_{t=1}^n δ_t

Which gives: Σ_{t=1}^n δ_t ≤ 2‖A‖²_F / l
36 Bounding the error We saw that ‖AA^T − BB^T‖ ≤ Σ_{t=1}^n δ_t and that Σ_{t=1}^n δ_t ≤ 2‖A‖²_F / l. Setting l = 2/ε yields

‖AA^T − BB^T‖ ≤ ε‖A‖²_F.

The two proofs are (maybe unsurprisingly) very similar...
37 Experiments ‖AA^T − BB^T‖ as a function of the sketch size l. [Plot: sketch accuracy vs. number of rows in the sketch, comparing Naive, Sampling, Hashing, Random Projections, Frequent Directions Bound, Frequent Directions, and Brute Force.] Synthetic input matrix with linearly decaying singular values.
38 Experiments Running time in seconds as a function of n (x-axis) and d (y-axis). [Plot: running time vs. number of rows in the matrix.] The running time scales linearly in n, d and l as expected.
39 More results

Theorem. For any matrix A ∈ R^{n×d}, FrequentDirections guarantees that:

‖AA^T − BB^T‖_2 ≤ ‖A − A_k‖²_F / (l − k).

This also holds for all k < l, including k = 0. For example, for l = k + 1/ε we use O(dk + d/ε) space and have

‖AA^T − BB^T‖_2 ≤ ε‖A − A_k‖²_F.

Theorem. This is space optimal: any streaming algorithm with this guarantee must use Ω(dk + d/ε) space.
40 More results

Theorem. Let B ∈ R^{l×d} be the sketch produced by FrequentDirections. For any k < l it holds that

‖A − π^k_B(A)‖²_F ≤ (1 + k/(l − k)) ‖A − A_k‖²_F.

For example, for l = k/ε we use O(dk/ε) space and have

‖A − π^k_B(A)‖²_F ≤ (1 + ε) ‖A − A_k‖²_F.

Theorem. This is space optimal: any streaming algorithm with this guarantee must use Ω(dk/ε) space.
41 More results

Theorem. There exists a variant of FrequentDirections with the same guarantees and space complexity whose running time is Õ(l²n + l·nnz(A)).

Here, Õ(·) suppresses logarithmic factors and numerical convergence dependencies. The power method requires Õ(1) iterations to converge.

Conjecture. This is not optimal.
42 Thanks
More informationQuick Introduction to Nonnegative Matrix Factorization
Quick Introduction to Nonnegative Matrix Factorization Norm Matloff University of California at Davis 1 The Goal Given an u v matrix A with nonnegative elements, we wish to find nonnegative, rank-k matrices
More informationCS264: Beyond Worst-Case Analysis Lecture #15: Topic Modeling and Nonnegative Matrix Factorization
CS264: Beyond Worst-Case Analysis Lecture #15: Topic Modeling and Nonnegative Matrix Factorization Tim Roughgarden February 28, 2017 1 Preamble This lecture fulfills a promise made back in Lecture #1,
More information15 Singular Value Decomposition
15 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing
More informationLinear Subspace Models
Linear Subspace Models Goal: Explore linear models of a data set. Motivation: A central question in vision concerns how we represent a collection of data vectors. The data vectors may be rasterized images,
More informationSupremum of simple stochastic processes
Subspace embeddings Daniel Hsu COMS 4772 1 Supremum of simple stochastic processes 2 Recap: JL lemma JL lemma. For any ε (0, 1/2), point set S R d of cardinality 16 ln n S = n, and k N such that k, there
More informationMachine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012
Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Principal Components Analysis Le Song Lecture 22, Nov 13, 2012 Based on slides from Eric Xing, CMU Reading: Chap 12.1, CB book 1 2 Factor or Component
More informationTitle: Count-Min Sketch Name: Graham Cormode 1 Affil./Addr. Department of Computer Science, University of Warwick,
Title: Count-Min Sketch Name: Graham Cormode 1 Affil./Addr. Department of Computer Science, University of Warwick, Coventry, UK Keywords: streaming algorithms; frequent items; approximate counting, sketch
More informationLower Bound Techniques for Multiparty Communication Complexity
Lower Bound Techniques for Multiparty Communication Complexity Qin Zhang Indiana University Bloomington Based on works with Jeff Phillips, Elad Verbin and David Woodruff 1-1 The multiparty number-in-hand
More informationCPSC 340 Assignment 4 (due November 17 ATE)
CPSC 340 Assignment 4 due November 7 ATE) Multi-Class Logistic The function example multiclass loads a multi-class classification datasetwith y i {,, 3, 4, 5} and fits a one-vs-all classification model
More informationAd Placement Strategies
Case Study 1: Estimating Click Probabilities Tackling an Unknown Number of Features with Sketching Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox 2014 Emily Fox January
More informationMethods for sparse analysis of high-dimensional data, II
Methods for sparse analysis of high-dimensional data, II Rachel Ward May 26, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 55 High dimensional
More informationPrincipal components analysis COMS 4771
Principal components analysis COMS 4771 1. Representation learning Useful representations of data Representation learning: Given: raw feature vectors x 1, x 2,..., x n R d. Goal: learn a useful feature
More informationLecture 2. Frequency problems
1 / 43 Lecture 2. Frequency problems Ricard Gavaldà MIRI Seminar on Data Streams, Spring 2015 Contents 2 / 43 1 Frequency problems in data streams 2 Approximating inner product 3 Computing frequency moments
More information7 Algorithms for Massive Data Problems
7 Algorithms for Massive Data Problems Massive Data, Sampling This chapter deals with massive data problems where the input data (a graph, a matrix or some other object) is too large to be stored in random
More information3 Best-Fit Subspaces and Singular Value Decomposition
3 Best-Fit Subspaces and Singular Value Decomposition (SVD) Think of the rows of an n d matrix A as n data points in a d-dimensional space and consider the problem of finding the best k-dimensional subspace
More informationMatrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A =
30 MATHEMATICS REVIEW G A.1.1 Matrices and Vectors Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = a 11 a 12... a 1N a 21 a 22... a 2N...... a M1 a M2... a MN A matrix can
More informationExercise Sheet 1. 1 Probability revision 1: Student-t as an infinite mixture of Gaussians
Exercise Sheet 1 1 Probability revision 1: Student-t as an infinite mixture of Gaussians Show that an infinite mixture of Gaussian distributions, with Gamma distributions as mixing weights in the following
More informationCS 5321: Advanced Algorithms Amortized Analysis of Data Structures. Motivations. Motivation cont d
CS 5321: Advanced Algorithms Amortized Analysis of Data Structures Ali Ebnenasir Department of Computer Science Michigan Technological University Motivations Why amortized analysis and when? Suppose you
More information