Sketching as a Tool for Numerical Linear Algebra

Transcription:

Sketching as a Tool for Numerical Linear Algebra. David P. Woodruff, presented by Sepehr Assadi. o(n) Big Data Reading Group, University of Pennsylvania, February 2015.

Goal. New survey by David Woodruff: Sketching as a Tool for Numerical Linear Algebra. Topics: Subspace Embeddings, Least Squares Regression, Least Absolute Deviation Regression, Low Rank Approximation, Graph Sparsification, Sketching Lower Bounds.

Introduction. You have big data! It is computationally expensive to deal with, has excessive storage requirements, and is hard to communicate...

Introduction (cont.). You have big data! It is computationally expensive to deal with, has excessive storage requirements, and is hard to communicate... So summarize your data: sampling keeps a representative subset of the data, while sketching keeps an aggregate summary of the whole data.

Model. Input: a matrix $A \in \mathbb{R}^{n \times d}$ and a vector $b \in \mathbb{R}^n$. Output: a function $F(A, b, \ldots)$, e.g. least squares regression. Different goals: faster algorithms, streaming, distributed computation.

Linear Sketching. Input: a matrix $A \in \mathbb{R}^{n \times d}$. Let $r \ll n$ and let $S \in \mathbb{R}^{r \times n}$ be a random matrix. The sketch is $S \cdot A$; compute $F(S \cdot A)$ instead of $F(A)$.
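A minimal NumPy sketch of this idea (an illustration, not from the talk; the dimensions, the seed, and the Gaussian choice of $S$ are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, r = 10_000, 20, 400                      # r << n

A = rng.standard_normal((n, d))                # the large input matrix
S = rng.standard_normal((r, n)) / np.sqrt(r)   # random sketching matrix

SA = S @ A                                     # the sketch: an r x d matrix
print(SA.shape)                                # (400, 20) -- much smaller than (10000, 20)
```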

Linear Sketching (cont.). Pros: we compute on an $r \times d$ matrix instead of an $n \times d$ one, giving a smaller representation and faster computation; by linearity, $S(A + B) = SA + SB$, so we can compose linear sketches! Cons: $F(S \cdot A)$ is only an approximation of $F(A)$.

Least Squares Regression ($\ell_2$-regression). Input: a matrix $A \in \mathbb{R}^{n \times d}$ (full column rank) and a vector $b \in \mathbb{R}^n$. Output: $x^* = \arg\min_x \|Ax - b\|_2 \in \mathbb{R}^d$. Closed-form solution: $x^* = (A^T A)^{-1} A^T b$, a $\Theta(nd^2)$-time algorithm using naive matrix multiplication.
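A direct rendering of the closed form (illustrative only; the sizes and seed are arbitrary, and in practice a least-squares solver is numerically preferable to forming $A^T A$):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5_000, 10
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

# Closed form x* = (A^T A)^{-1} A^T b, assuming A has full column rank
x_star = np.linalg.solve(A.T @ A, A.T @ b)

# Reference solution from a standard least-squares solver
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_star, x_ref))
```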

Approximate $\ell_2$-regression. Input: a matrix $A \in \mathbb{R}^{n \times d}$ (full column rank), a vector $b \in \mathbb{R}^n$, and a parameter $0 < \varepsilon < 1$. Output: $\hat{x} \in \mathbb{R}^d$ such that $\|A\hat{x} - b\|_2 \le (1 + \varepsilon) \min_x \|Ax - b\|_2$.

Approximate $\ell_2$-regression (cont.). A sketching algorithm: sample a random matrix $S \in \mathbb{R}^{r \times n}$, compute $S \cdot A$ and $S \cdot b$, and output $\hat{x} = \arg\min_x \|(SA)x - (Sb)\|_2$. Which randomized family of matrices $S$, and what value of $r$?

Approximate $\ell_2$-regression (cont.). An introductory construction: let $r = \Theta(d/\varepsilon^2)$ and let $S \in \mathbb{R}^{r \times n}$ be a matrix of i.i.d. normal random variables with mean zero and variance $1/r$. Proof sketch: on the board.
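A sketch-and-solve illustration of this construction (the constant inside $r = \Theta(d/\varepsilon^2)$, the problem sizes, and the seed are arbitrary choices for the demo):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, eps = 20_000, 10, 0.5
r = int(4 * d / eps**2)            # r = Theta(d / eps^2); constant chosen for the demo

A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

S = rng.standard_normal((r, n)) / np.sqrt(r)   # i.i.d. N(0, 1/r) entries

x_hat, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)   # solve the small sketched problem
x_opt, *_ = np.linalg.lstsq(A, b, rcond=None)           # exact solution for comparison

res_hat = np.linalg.norm(A @ x_hat - b)
res_opt = np.linalg.norm(A @ x_opt - b)
print(res_hat / res_opt)           # typically close to 1, and at most 1 + eps w.h.p.
```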

Approximate $\ell_2$-regression (cont.). Problems: computing $S \cdot A$ takes $\Theta(nrd)$ time, and constructing $S$ requires $\Theta(nr)$ space. Different constructions for $S$: fast Johnson-Lindenstrauss transforms, giving $O(nd \log d) + \mathrm{poly}(d/\varepsilon)$ time [Sarlos, FOCS 06]; an optimal $O(\mathrm{nnz}(A)) + \mathrm{poly}(d/\varepsilon)$-time algorithm [Clarkson, Woodruff, STOC 13]; random sign matrices with $\Theta(d)$-wise independent entries, giving an $O(d^2/\varepsilon \cdot \log(nd))$-space streaming algorithm [Clarkson, Woodruff, STOC 09].
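The input-sparsity-time construction uses an extremely sparse $S$. Below is a minimal CountSketch-style sketch in the spirit of that construction (the sketch size $r$, sizes, and seed are arbitrary assumptions; $S$ is applied implicitly, so the work is proportional to $\mathrm{nnz}(A)$):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, r = 50_000, 10, 1_000

A = rng.standard_normal((n, d))

# CountSketch-style sparse embedding: each row of A is hashed to one of r
# buckets and multiplied by a random sign; S @ A is formed without storing S.
h = rng.integers(0, r, size=n)       # hash bucket for each row
s = rng.choice([-1.0, 1.0], size=n)  # random sign for each row

SA = np.zeros((r, d))
np.add.at(SA, h, s[:, None] * A)     # accumulate signed rows into buckets

print(SA.shape)                      # (1000, 10)
```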

Subspace Embedding. Definition ($\ell_2$-subspace embedding): a $(1 \pm \varepsilon)$ $\ell_2$-subspace embedding for a matrix $A \in \mathbb{R}^{n \times d}$ is a matrix $S$ for which, for all $x \in \mathbb{R}^d$, $\|SAx\|_2^2 = (1 \pm \varepsilon)\|Ax\|_2^2$. It is really a subspace embedding for the column space of $A$. Oblivious $\ell_2$-subspace embedding: the distribution from which $S$ is chosen is oblivious to $A$. One very common tool for (oblivious) $\ell_2$-subspace embeddings is the Johnson-Lindenstrauss transform (JLT).

Johnson-Lindenstrauss Transform. Definition (JLT$(\varepsilon, \delta, f)$): a random matrix $S \in \mathbb{R}^{r \times n}$ forms a JLT$(\varepsilon, \delta, f)$ if, with probability at least $1 - \delta$, for any $f$-element subset $V \subseteq \mathbb{R}^n$ it holds that $|\langle Sv, Sv' \rangle - \langle v, v' \rangle| \le \varepsilon\,\|v\|_2\,\|v'\|_2$ for all $v, v' \in V$.

Johnson-Lindenstrauss Transform (cont.). Usual statement (i.e. the original Johnson-Lindenstrauss lemma). Lemma (JLL): given $N$ points $q_1, \ldots, q_N \in \mathbb{R}^n$, there exists a matrix $S \in \mathbb{R}^{t \times n}$ (a linear map) with $t = \Theta(\log N / \varepsilon^2)$ such that, with high probability, simultaneously for all pairs $q_i$ and $q_j$, $\|S(q_i - q_j)\|_2 = (1 \pm \varepsilon)\,\|q_i - q_j\|_2$.

Johnson-Lindenstrauss Transform (cont.). A simple construction of a JLT$(\varepsilon, \delta, f)$. Theorem: let $0 < \varepsilon, \delta < 1$ and $S = \frac{1}{\sqrt{r}} R \in \mathbb{R}^{r \times n}$, where the entries $R_{i,j}$ are independent standard normal random variables. Assuming $r = \Omega(\varepsilon^{-2} \log(f/\delta))$, $S$ is a JLT$(\varepsilon, \delta, f)$. Other constructions: random sign matrices [Achlioptas 03], [Clarkson, Woodruff, STOC 09]; random sparse matrices [Dasgupta, Kumar, Sarlos, STOC 10], [Kane, Nelson, J. ACM 14]; fast Johnson-Lindenstrauss transforms [Ailon, Chazelle, STOC 06].
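An empirical check of the Gaussian construction on a small point set (a toy verification under arbitrary sizes, seed, and constant inside $r = \Omega(\varepsilon^{-2}\log f)$, not a proof):

```python
import numpy as np

rng = np.random.default_rng(4)
n, f, eps = 2_000, 50, 0.2
r = int(8 * np.log(f) / eps**2)    # r = Omega(eps^-2 log f); constant chosen for the demo

V = rng.standard_normal((f, n))    # f points in R^n
S = rng.standard_normal((r, n)) / np.sqrt(r)
SV = V @ S.T                       # images of the points, shape (f, r)

# Worst pairwise distortion | ||S(v - v')|| / ||v - v'|| - 1 |
worst = 0.0
for i in range(f):
    for j in range(i + 1, f):
        ratio = np.linalg.norm(SV[i] - SV[j]) / np.linalg.norm(V[i] - V[j])
        worst = max(worst, abs(ratio - 1))
print(worst)                       # typically well below eps
```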

JLT yields an $\ell_2$-subspace embedding. Claim: a JLT$(\varepsilon, \delta, f)$ matrix $S$ is an oblivious $\ell_2$-subspace embedding for $A \in \mathbb{R}^{n \times d}$. Challenge: a JLT$(\varepsilon, \delta, f)$ provides a guarantee only for a single finite set in $\mathbb{R}^n$, while an $\ell_2$-subspace embedding requires the guarantee for an infinite set, namely the column space of $A$.

JLT yields an $\ell_2$-subspace embedding (cont.). Let $\mathcal{S}$ be the unit sphere in the column space of $A$: $\mathcal{S} = \{y \in \mathbb{R}^n : y = Ax \text{ for some } x \in \mathbb{R}^d \text{ and } \|y\|_2 = 1\}$. We seek a finite subset $\mathcal{N} \subseteq \mathcal{S}$ so that if $\langle Sw, Sw' \rangle = \langle w, w' \rangle \pm \varepsilon$ for all $w, w' \in \mathcal{N}$, then $\|Sy\|_2 = (1 \pm \varepsilon)\,\|y\|_2$ for all $y \in \mathcal{S}$.

JLT yields an $\ell_2$-subspace embedding (cont.). Lemma ($\tfrac{1}{2}$-net for $\mathcal{S}$): it suffices to choose any $\mathcal{N}$ such that for every $y \in \mathcal{S}$ there is $w \in \mathcal{N}$ with $\|y - w\|_2 \le 1/2$. Proof: (1) decompose $y = y^{(0)} + y^{(1)} + y^{(2)} + \cdots$, where $\|y^{(i)}\|_2 \le \frac{1}{2^i}$ and $y^{(i)} / \|y^{(i)}\|_2 \in \mathcal{N}$; (2) then $\|Sy\|_2^2 = \|S(y^{(0)} + y^{(1)} + y^{(2)} + \cdots)\|_2^2 = 1 \pm O(\varepsilon)$.
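Filling in step 2 (the standard expansion, using the JLT inner-product guarantee, which is scale-invariant and therefore applies to the rescaled net points $y^{(i)}$):
\[
\|Sy\|_2^2
  = \Big\langle S\textstyle\sum_i y^{(i)},\, S\textstyle\sum_j y^{(j)} \Big\rangle
  = \sum_{i,j} \langle S y^{(i)}, S y^{(j)} \rangle
  = \sum_{i,j} \Big( \langle y^{(i)}, y^{(j)} \rangle \pm \varepsilon\, \|y^{(i)}\|_2 \|y^{(j)}\|_2 \Big)
  = \|y\|_2^2 \pm \varepsilon \sum_{i,j} 2^{-i}\, 2^{-j}
  = 1 \pm 4\varepsilon ,
\]
since $\sum_i \|y^{(i)}\|_2 \le \sum_i 2^{-i} = 2$.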

$\tfrac{1}{2}$-net of $\mathcal{S}$. Lemma: there exists a $\tfrac{1}{2}$-net $\mathcal{N}$ of $\mathcal{S}$ with $|\mathcal{N}| \le 5^d$. Proof: (1) find a maximal set $\mathcal{N}'$ of points on the unit sphere in $\mathbb{R}^d$ such that no two points are within distance $1/2$ of each other; (2) let $U \in \mathbb{R}^{n \times d}$ be an orthonormal basis matrix for the column space of $A$; (3) set $\mathcal{N} = \{y \in \mathbb{R}^n : y = Ux \text{ for some } x \in \mathcal{N}'\}$ (each such $y$ has $\|y\|_2 = 1$).
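The $|\mathcal{N}| \le 5^d$ bound follows from a standard volume argument (a step not spelled out on the slide): the balls $B(x, 1/4)$ for $x \in \mathcal{N}'$ are pairwise disjoint and all contained in $B(0, 5/4)$, so
\[
|\mathcal{N}'| \;\le\; \frac{\mathrm{vol}\big(B(0, 5/4)\big)}{\mathrm{vol}\big(B(0, 1/4)\big)} \;=\; \left(\frac{5/4}{1/4}\right)^{d} \;=\; 5^d .
\]
Maximality also makes $\mathcal{N}'$ a $\tfrac{1}{2}$-net: any point of the sphere farther than $1/2$ from every chosen point could have been added to the set.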

Subspace Embedding via JLT. Theorem: let $0 < \varepsilon, \delta < 1$ and let $S$ be a JLT$(\varepsilon, \delta, 5^d)$. For any fixed matrix $A \in \mathbb{R}^{n \times d}$, with probability $1 - \delta$, $S$ is a $(1 \pm \varepsilon)$ $\ell_2$-subspace embedding for $A$, i.e., for all $x \in \mathbb{R}^d$, $\|SAx\|_2 = (1 \pm \varepsilon)\,\|Ax\|_2$. This yields an $O(\mathrm{nnz}(A) \cdot \varepsilon^{-1} \log d)$-time algorithm using the column-sparsity transform of Kane and Nelson [Kane, Nelson, J. ACM 14], and an $O(nd \log n)$-time algorithm using the fast Johnson-Lindenstrauss transform of Ailon and Chazelle [Ailon, Chazelle, STOC 06].
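A quick numerical check of the subspace-embedding property (illustrative; sizes, seed, and the Gaussian sketch are arbitrary assumptions): for an orthonormal basis $U$ of the column space of $A$, $S$ satisfies $\|SAx\|_2 = (1 \pm \varepsilon)\|Ax\|_2$ for all $x$ exactly when every singular value of $SU$ lies in $[1 - \varepsilon,\, 1 + \varepsilon]$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, eps = 5_000, 8, 0.25
r = int(6 * d / eps**2)            # sketch size; constant chosen for the demo

A = rng.standard_normal((n, d))
S = rng.standard_normal((r, n)) / np.sqrt(r)

U, _ = np.linalg.qr(A)             # orthonormal basis for the column space of A
sv = np.linalg.svd(S @ U, compute_uv=False)

# S is a (1 +- eps) subspace embedding for A exactly when every singular
# value of S @ U lies in [1 - eps, 1 + eps].
print(sv.min(), sv.max())
```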

Other Subspace Embedding Algorithms. Subspace embeddings that are not JLT-based: an $O(\mathrm{nnz}(A)) + \mathrm{poly}(d/\varepsilon)$-time algorithm [Clarkson, Woodruff, STOC 13]. Non-oblivious subspace embeddings: based on leverage score sampling [Drineas, Mahoney, Muthukrishnan, SODA 06].

$\ell_2$-regression via Oblivious Subspace Embedding. Theorem: let $S \in \mathbb{R}^{r \times n}$ be any oblivious subspace embedding matrix and $\hat{x} = \arg\min_x \|SAx - Sb\|_2$; then $\|SA\hat{x} - Sb\|_2 \le (1 + \varepsilon) \min_x \|Ax - b\|_2$. Proof: (1) let $U \in \mathbb{R}^{n \times (d+1)}$ be an orthonormal basis for the columns of $A$ together with the vector $b$; (2) suppose $S$ is an $\ell_2$-subspace embedding for $U$.
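A standard completion of the argument (not spelled out on the slide): every residual $Ax - b$ lies in the column span of $U$, so the embedding guarantee gives $\|S(Ax - b)\|_2 = (1 \pm \varepsilon)\,\|Ax - b\|_2$ for all $x$ simultaneously. Writing $x^* = \arg\min_x \|Ax - b\|_2$ and using that $\hat{x}$ minimizes the sketched objective,
\[
(1 - \varepsilon)\,\|A\hat{x} - b\|_2 \;\le\; \|S(A\hat{x} - b)\|_2 \;\le\; \|S(Ax^* - b)\|_2 \;\le\; (1 + \varepsilon)\,\|Ax^* - b\|_2 ,
\]
so $\|A\hat{x} - b\|_2 \le \frac{1+\varepsilon}{1-\varepsilon}\,\min_x \|Ax - b\|_2 = (1 + O(\varepsilon))\,\min_x \|Ax - b\|_2$, matching the approximation goal stated earlier.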

Questions?