Sketching as a Tool for Numerical Linear Algebra
David P. Woodruff

presented by Sepehr Assadi
o(n) Big Data Reading Group, University of Pennsylvania
February 2015
Goal

New survey by David Woodruff: Sketching as a Tool for Numerical Linear Algebra.

Topics:
- Subspace Embeddings
- Least Squares Regression
- Least Absolute Deviation Regression
- Low Rank Approximation
- Graph Sparsification
- Sketching Lower Bounds
Introduction

You have big data!
- Computationally expensive to deal with
- Excessive storage requirements
- Hard to communicate ...

Summarize your data:
- Sampling: a representative subset of the data
- Sketching: an aggregate summary of the whole data
Model

Input: a matrix A ∈ R^{n×d} and a vector b ∈ R^n.
Output: a function F(A, b, ...), e.g. least squares regression.

Different goals:
- Faster algorithms
- Streaming
- Distributed computation
Linear Sketching

Input: a matrix A ∈ R^{n×d}.
- Let r ≪ n and let S ∈ R^{r×n} be a random matrix.
- Call S·A the sketch of A.
- Compute F(S·A) instead of F(A).
Linear Sketching (cont.)

Pros:
- Compute on an r×d matrix instead of an n×d one: smaller representation and faster computation.
- Linearity: S·(A + B) = S·A + S·B, so we can compose linear sketches!

Cons:
- F(S·A) is only an approximation of F(A).
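The linearity property above can be checked directly. A minimal numpy sketch, using a Gaussian sketching matrix as one illustrative choice (the dimensions and scaling 1/√r are assumptions, not fixed by the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, r = 1000, 10, 50

# One illustrative sketching matrix: i.i.d. Gaussian entries, scaled by 1/sqrt(r).
S = rng.normal(0.0, 1.0 / np.sqrt(r), size=(r, n))

A = rng.normal(size=(n, d))
B = rng.normal(size=(n, d))

# Linearity: sketching A + B equals summing the individual sketches,
# so sketches computed on separate machines or stream chunks compose.
lhs = S @ (A + B)
rhs = S @ A + S @ B
print(np.allclose(lhs, rhs))
```

Linearity is exact (up to floating point), unlike the approximation incurred when evaluating F on the sketch.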
Least Squares Regression (ℓ2-regression)

Input: a matrix A ∈ R^{n×d} (full column rank) and a vector b ∈ R^n.
Output: x* = arg min_x ‖Ax − b‖₂.

Closed-form solution: x* = (AᵀA)⁻¹ Aᵀ b, giving a Θ(nd²)-time algorithm using naive matrix multiplication.
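The closed-form solution can be written in a few lines of numpy (a minimal sketch; the dimensions are arbitrary and forming the normal equations is shown only to mirror the formula, since QR/SVD-based solvers are numerically preferable):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 5
A = rng.normal(size=(n, d))   # full column rank with probability 1
b = rng.normal(size=n)

# Closed form via the normal equations: x* = (A^T A)^{-1} A^T b.
x_star = np.linalg.solve(A.T @ A, A.T @ b)

# Reference solution from numpy's SVD-based least squares solver.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
```

Both computations agree for well-conditioned A; `lstsq` avoids squaring the condition number.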
Approximate ℓ2-regression

Input: a matrix A ∈ R^{n×d} (full column rank), a vector b ∈ R^n, and a parameter 0 < ε < 1.
Output: a vector x̂ ∈ R^d such that ‖Ax̂ − b‖₂ ≤ (1 + ε) · min_x ‖Ax − b‖₂.
Approximate ℓ2-regression (cont.)

A sketching algorithm:
1. Sample a random matrix S ∈ R^{r×n}.
2. Compute S·A and S·b.
3. Output x̂ = arg min_x ‖(SA)x − Sb‖₂.

Which randomized family of matrices S, and what value of r?
Approximate ℓ2-regression (cont.)

An introductory construction:
- Let r = Θ(d/ε²).
- Let S ∈ R^{r×n} be a matrix of i.i.d. normal random variables with mean zero and variance 1/r.

Proof sketch: on the board.
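This construction is easy to try numerically. A minimal sketch of the whole sketched-regression pipeline, assuming a fixed seed and an illustrative constant in r = Θ(d/ε²) (the constant 8 is an arbitrary choice, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, eps = 5000, 10, 0.5
r = int(8 * d / eps**2)   # r = Theta(d / eps^2); constant chosen ad hoc

A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Gaussian sketch: i.i.d. entries with mean 0 and variance 1/r.
S = rng.normal(0.0, 1.0 / np.sqrt(r), size=(r, n))

# Solve the small r x d sketched problem instead of the n x d one.
x_hat, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
x_opt, *_ = np.linalg.lstsq(A, b, rcond=None)

res_hat = np.linalg.norm(A @ x_hat - b)
res_opt = np.linalg.norm(A @ x_opt - b)
print(res_hat / res_opt)  # at most 1 + eps with high probability
```

In practice the observed ratio is usually far below the worst-case 1 + ε bound.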
Approximate ℓ2-regression (cont.)

Problems:
- Computing S·A takes Θ(nrd) time.
- Constructing S requires Θ(nr) space.

Different constructions for S:
- Fast Johnson-Lindenstrauss transforms: O(nd log d) + poly(d/ε) time [Sarlos, FOCS 06].
- Optimal O(nnz(A)) + poly(d/ε) time algorithm [Clarkson, Woodruff, STOC 13].
- Random sign matrices with Θ(d)-wise independent entries: an O(d²/ε · log(nd))-space streaming algorithm [Clarkson, Woodruff, STOC 09].
Subspace Embedding

Definition (ℓ2-subspace embedding). A (1 ± ε) ℓ2-subspace embedding for a matrix A ∈ R^{n×d} is a matrix S such that, for all x ∈ R^d,
‖SAx‖₂² = (1 ± ε) ‖Ax‖₂².

This is really a subspace embedding for the column space of A.

Oblivious ℓ2-subspace embedding: the distribution from which S is drawn is oblivious to A.

One very common tool for (oblivious) ℓ2-subspace embeddings is the Johnson-Lindenstrauss transform (JLT).
Johnson-Lindenstrauss transform

Definition (JLT(ε, δ, f)). A random matrix S ∈ R^{r×n} forms a JLT(ε, δ, f) if, with probability at least 1 − δ, for any f-element subset V ⊆ R^n, it holds that
for all v, v' ∈ V: |⟨Sv, Sv'⟩ − ⟨v, v'⟩| ≤ ε ‖v‖₂ ‖v'‖₂.

Usual statement (the original Johnson-Lindenstrauss lemma):

Lemma (JLL). Given N points q₁, ..., q_N ∈ R^n, there exists a matrix S ∈ R^{t×n} (a linear map) with t = Θ(log N / ε²) such that, with high probability, simultaneously for all pairs q_i and q_j,
‖S(q_i − q_j)‖₂ = (1 ± ε) ‖q_i − q_j‖₂.
Johnson-Lindenstrauss transform (cont.)

A simple construction of JLT(ε, δ, f):

Theorem. Let 0 < ε, δ < 1 and let S = (1/√r) · R ∈ R^{r×n}, where the entries R_{i,j} are independent standard normal random variables. If r = Ω(ε⁻² log(f/δ)), then S is a JLT(ε, δ, f).

Other constructions:
- Random sign matrices [Achlioptas, 03], [Clarkson, Woodruff, STOC 09]
- Random sparse matrices [Dasgupta, Kumar, Sarlos, STOC 10], [Kane, Nelson, J. ACM 14]
- Fast Johnson-Lindenstrauss transforms [Ailon, Chazelle, STOC 06]
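The Gaussian construction is a one-liner to test empirically. A minimal sketch checking norm preservation on a random f-element set (the constant 16 in r and the fixed seed are assumptions for illustration; the theorem only requires r = Ω(ε⁻² log(f/δ))):

```python
import numpy as np

rng = np.random.default_rng(3)
n, f = 2000, 20
eps, delta = 0.25, 0.1
# r = Omega(eps^-2 log(f/delta)); the constant 16 is an illustrative choice.
r = int(16 * np.log(f / delta) / eps**2)

V = rng.normal(size=(f, n))                         # f vectors in R^n
S = rng.normal(0.0, 1.0 / np.sqrt(r), size=(r, n))  # S = (1/sqrt(r)) * R

SV = V @ S.T  # each row is S applied to one vector of V

# Ratio ||Sv|| / ||v|| should be within 1 +/- eps for every v in V.
ratios = np.linalg.norm(SV, axis=1) / np.linalg.norm(V, axis=1)
print(ratios.min(), ratios.max())
```

The same experiment with pairwise differences of the rows reproduces the JLL distance guarantee.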
JLT results in ℓ2-subspace embedding

Claim. S = JLT(ε, δ, f) is an oblivious ℓ2-subspace embedding for A ∈ R^{n×d}.

Challenge:
- JLT(ε, δ, f) provides a guarantee only for a single finite set in R^n.
- An ℓ2-subspace embedding requires the guarantee for an infinite set, i.e. the column space of A.
JLT results in ℓ2-subspace embedding (cont.)

Let 𝒮 be the unit sphere of the column space of A:
𝒮 = {y ∈ R^n : y = Ax for some x ∈ R^d and ‖y‖₂ = 1}.

We seek a finite subset N ⊆ 𝒮 so that if
⟨Sw, Sw'⟩ = ⟨w, w'⟩ ± ε for all w, w' ∈ N,
then ‖Sy‖₂ = 1 ± O(ε) for all y ∈ 𝒮.
JLT results in ℓ2-subspace embedding (cont.)

Lemma (1/2-net for 𝒮). It suffices to choose any N ⊆ 𝒮 such that for every y ∈ 𝒮 there is a w ∈ N with ‖y − w‖₂ ≤ 1/2.

Proof.
1. Decompose y = y⁽⁰⁾ + y⁽¹⁾ + y⁽²⁾ + ..., where ‖y⁽ⁱ⁾‖₂ ≤ 1/2ⁱ and each y⁽ⁱ⁾ is a scalar multiple of a point in N.
2. Then ‖Sy‖₂² = ‖S(y⁽⁰⁾ + y⁽¹⁾ + y⁽²⁾ + ...)‖₂² = 1 ± O(ε).
A 1/2-net of 𝒮

Lemma. There exists a 1/2-net N of 𝒮 with |N| ≤ 5^d.

Proof.
1. Take a maximal set N' of points on the unit sphere in R^d such that no two points are within distance 1/2 of each other.
2. Let U be an orthonormal basis matrix for the column space of A.
3. Set N = {y ∈ R^n : y = Ux for some x ∈ N'}.
Subspace Embedding via JLT

Theorem. Let 0 < ε, δ < 1 and let S = JLT(ε, δ, 5^d). For any fixed matrix A ∈ R^{n×d}, with probability 1 − δ, S is a (1 ± ε) ℓ2-subspace embedding for A, i.e.,
for all x ∈ R^d: ‖SAx‖₂ = (1 ± ε) ‖Ax‖₂.

This yields:
- an O(nnz(A) · ε⁻¹ log d) time algorithm using the column-sparsity transform of Kane and Nelson [Kane, Nelson, J. ACM 14];
- an O(nd log n) time algorithm using the Fast Johnson-Lindenstrauss transform of Ailon and Chazelle [Ailon, Chazelle, STOC 06].
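The subspace-embedding property can be spot-checked numerically. A minimal sketch with a Gaussian S (the constant 20 in r is an arbitrary illustrative choice; the guarantee is for all x simultaneously, so random directions only sample the subspace rather than certify it):

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, eps = 3000, 8, 0.25
# r = O(d / eps^2) suffices for a Gaussian subspace embedding; constant is illustrative.
r = int(20 * d / eps**2)

A = rng.normal(size=(n, d))
S = rng.normal(0.0, 1.0 / np.sqrt(r), size=(r, n))

# Spot-check ||SAx|| / ||Ax|| over many random directions x in R^d.
X = rng.normal(size=(d, 100))
AX = A @ X
ratios = np.linalg.norm(S @ AX, axis=0) / np.linalg.norm(AX, axis=0)
print(ratios.min(), ratios.max())
```

Checking the singular values of S·U, for U an orthonormal basis of the column space, would certify the property for all x at once.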
Other Subspace Embedding Algorithms

- Non-JLT-based subspace embeddings: an O(nnz(A)) + poly(d/ε) time algorithm [Clarkson, Woodruff, STOC 13].
- Non-oblivious subspace embeddings: based on leverage score sampling [Drineas, Mahoney, Muthukrishnan, SODA 06].
ℓ2-regression via Oblivious Subspace Embedding

Theorem. Let S ∈ R^{r×n} be an oblivious ℓ2-subspace embedding matrix and let x̂ = arg min_x ‖SAx − Sb‖₂; then
‖Ax̂ − b‖₂ ≤ (1 + ε) · min_x ‖Ax − b‖₂.

Proof.
1. Let U ∈ R^{n×(d+1)} be an orthonormal basis matrix for the column space of A together with the vector b.
2. Suppose S is an ℓ2-subspace embedding for U.
Questions?