Random Methods for Linear Algebra

Gittens (gittens@acm.caltech.edu), Applied and Computational Mathematics, California Institute of Technology, October 2, 2009

Outline
1 The Johnson-Lindenstrauss Transform
2
3
4

The Johnson-Lindenstrauss lemma says that a collection of n points in any Euclidean space can be mapped into a Euclidean space of dimension k = O(log n) with little distortion:

Theorem. Let ε ∈ (0, 1/2), and let P = {p_1, ..., p_n} be a set of n points in R^n. Let k be an integer with k ≥ C ε^{-2} log n, where C is a sufficiently large absolute constant. Then there exists a mapping f : R^n → R^k such that

(1 - ε) ‖p_i - p_j‖ ≤ ‖f(p_i) - f(p_j)‖ ≤ (1 + ε) ‖p_i - p_j‖

for all i, j = 1, 2, ..., n.

The JL lemma allows us to compress (an approximation of) the representation of a set of points from O(n^2) space to O(n log n). Better, the JL transform can be realized as multiplication by a random matrix; projecting all the points then takes O(n^2 log n) time. Even better, you can use particularly nice random matrices and obtain a Fast JL Transform (FJLT); projection then takes O(n log n + polylog(1/ε, log n)) time.
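To make the dense case concrete, here is a small numpy sketch (my illustration, not from the talk; the constant 4 in the choice of k and all other parameter values are arbitrary) that projects random points with a scaled Gaussian matrix and checks the pairwise distortion empirically.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n, eps = 1000, 0.3
k = int(np.ceil(4 * np.log(n) / eps**2))       # k = O(eps^-2 log n); the constant 4 is a guess

points = rng.standard_normal((n, n))           # n points in R^n
G = rng.standard_normal((k, n)) / np.sqrt(k)   # Gaussian projection, E||Gx||^2 = ||x||^2
projected = points @ G.T                       # all points mapped to R^k

ratio = pdist(projected) / pdist(points)       # ratio of projected to original distances
print(k, ratio.min(), ratio.max())             # typically lands inside [1 - eps, 1 + eps]
```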

Ailon and Chazelle introduced the FJLT. Earlier work showed you need k = O(log n), so to decrease computation they considered using sparse projection matrices; the problem is that sparse projection matrices distort sparse vectors. Key ideas: use an FFT to increase the support of sparse vectors, and use a randomized FFT to avoid sparsifying dense vectors. The form of the random projection doesn't depend on the data, and with probability 2/3 one application of the FJLT succeeds.

Structure of the FJLT: Φ = PHD, where
- P is a k × n matrix with P_ij ~ N(0, q^{-1}) with probability q and P_ij = 0 with probability 1 - q, where q depends on n and ε;
- H is the n × n normalized Hadamard matrix (a real analog of the Fourier matrix);
- D is an n × n diagonal matrix whose diagonal entries are independent and drawn uniformly from {-1, +1}.
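A rough sketch of this construction in numpy (assumed details and parameter choices on my part; an explicit Hadamard matrix stands in for a fast transform, n must be a power of two, and P is rescaled by 1/sqrt(k) so the squared norm is preserved in expectation):

```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(1)
n, k, q = 1024, 64, 0.05                    # q is a placeholder; in the FJLT it depends on n, eps

D = rng.choice([-1.0, 1.0], size=n)         # diagonal of D: independent random signs
H = hadamard(n) / np.sqrt(n)                # normalized Hadamard matrix
mask = rng.random((k, n)) < q               # each entry of P is nonzero with probability q
P = np.where(mask, rng.normal(0.0, 1.0 / np.sqrt(q), (k, n)), 0.0) / np.sqrt(k)

def fjlt(x):
    """Apply Phi = P H D to a vector x of length n."""
    return P @ (H @ (D * x))

x = np.zeros(n)
x[0] = 1.0                                  # a maximally sparse unit vector
print(np.linalg.norm(fjlt(x)))              # roughly 1; concentration improves with k
```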

The magic lies in HD. Fix p ∈ R^n with unit length, then examine

(HDp)_1 = Σ_{j=1}^n H_{1j} d_{jj} p_j.

Applying Hoeffding's inequality (which upper-bounds the probability that a sum of bounded random variables deviates from its mean) together with the fact that |H_{ij}| = n^{-1/2} gives

P(|(HDp)_1| ≥ t) ≤ 2 exp(-n t^2 / 2),

so with very high probability (HDp)_1 = O(n^{-1/2}), up to a logarithmic factor.

H and D are isometries, and the (HDp)_i are identically distributed, so they are all on the order of n^{-1/2}: we get densification of sparse vectors without sparsification of dense vectors. P is then tailored appropriately to get a JLT. Randomized FFTs of this kind have since become very popular (SRFTs).
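A quick numerical check of this densification claim (illustrative only, same assumed setup as the previous sketch):

```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(2)
n = 1024
D = rng.choice([-1.0, 1.0], size=n)
H = hadamard(n) / np.sqrt(n)

sparse = np.zeros(n)
sparse[0] = 1.0                              # worst case for a sparse projection matrix
dense = rng.standard_normal(n)
dense /= np.linalg.norm(dense)

print(np.abs(H @ (D * sparse)).max(), n ** -0.5)   # every entry is exactly n^(-1/2) here
print(np.abs(H @ (D * dense)).max())               # a dense vector stays spread out
```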

The Low Rank Approximation Problem: given a large A ∈ R^{m×n}, efficiently approximate the solution to

min_{rank(B) ≤ k} ‖A - B‖_ξ,   ξ ∈ {F, 2}.

Cost of a dense SVD (the entire SVD): O(min{mn^2, m^2 n}). Cost of a sparse SVD (top k singular vectors): O(kmn + k^2 m).
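For reference (a standard fact, not specific to the talk), the truncated SVD solves this problem exactly; it is the baseline the randomized methods below try to approximate cheaply. A minimal numpy check:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 300, 200, 10
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n)) \
    + 0.01 * rng.standard_normal((m, n))     # nearly rank-k test matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
B = (U[:, :k] * s[:k]) @ Vt[:k]              # Eckart-Young: best rank-k approximation
print(np.linalg.norm(A - B, 2), s[k])        # spectral error equals sigma_{k+1}(A)
```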

Ways to use randomization:
- Reduce the size of the matrix, e.g. by biased sampling of its rows or columns, take the SVD of the smaller matrix, and show it is close to the SVD of the original (the Monte Carlo approach).
- Approximate the matrix with a random sparse matrix, and show that the SVD of the sparse matrix is close to the SVD of the original.
- Find a rank-k basis Q (a matrix with k orthonormal columns) which minimizes ‖A - QQ*A‖ and take B = QQ*A.

Frieze, Kannan, and Vempala's work on the Monte Carlo method is seminal. The key observation is that a good low rank approximation to A (in Frobenius norm) lies in the span of a small subset of its rows. Their algorithm samples rows of A to form X, biased according to their share of ‖A‖_F^2; after scaling, the SVD of X^T X approximates that of A^T A. Rudelson and Vershynin showed that if the numerical rank of A is r, then by sampling O(r log r) rows of A one can construct a matrix P_k so that

‖A - A P_k‖_2 ≤ σ_{k+1}(A) + ε ‖A‖_2.
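A sketch of the row-sampling idea in numpy (my paraphrase of the scheme, with assumed scaling; taking the SVD of the rescaled sample X directly gives the same right singular vectors as the eigendecomposition of X^T X):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, k, c = 2000, 300, 10, 200
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))   # exactly rank-k test matrix

row_norms2 = np.einsum('ij,ij->i', A, A)
p = row_norms2 / row_norms2.sum()              # each row's share of ||A||_F^2
idx = rng.choice(m, size=c, p=p)               # sample c rows with these probabilities
X = A[idx] / np.sqrt(c * p[idx])[:, None]      # rescale so that E[X^T X] = A^T A

_, _, Vt = np.linalg.svd(X, full_matrices=False)
Ak = A @ Vt[:k].T @ Vt[:k]                     # project A onto the sampled right singular space
print(np.linalg.norm(A - Ak, 'fro') / np.linalg.norm(A, 'fro'))
```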

Schemes: Achlioptas and McSherry considered replacing A with a sparse X using the scheme

X_jk = a_jk / p_jk with probability p_jk, and X_jk = 0 otherwise,

where p_jk = p (a_jk / b)^2 (more or less) and b is the largest absolute value of an entry of A. Key idea: sparsification acts like adding Gaussian noise, which has weak spectral features.
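A numpy sketch of this kind of entrywise scheme (my reading of the slide: the probabilities are capped at 1, and the overall level p is chosen here so that roughly 10% of the entries are kept; both are assumptions on my part):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((500, 500))

b = np.abs(A).max()                          # largest absolute entry of A
target = 0.10                                # aim to keep ~10% of the entries (my choice)
p = target * b**2 * A.size / (A**2).sum()    # sets the overall level of the p_jk
P_keep = np.minimum(p * (A / b) ** 2, 1.0)   # p_jk ~ p (a_jk / b)^2, capped at 1

keep = rng.random(A.shape) < P_keep
X = np.where(keep, A / np.maximum(P_keep, 1e-12), 0.0)   # unbiased: E[X] = A entrywise

print(keep.mean())                                       # fraction of entries retained
print(np.linalg.norm(A - X, 2) / np.linalg.norm(A, 2))   # relative spectral error
```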

Our idea: develop bounds on the error of sparsification schemes which satisfy EX = A and whose entries are independent. We want a bound on the expected spectral norm of Z = A - X:

E‖Z‖_2 ≤ C [ max_j (Σ_k E Z_jk^2)^{1/2} + max_k (Σ_j E Z_jk^2)^{1/2} + (Σ_{jk} E(X_jk - a_jk)^4)^{1/4} ].

This bound is optimal.

Found similar bounds for other useful norms:

E‖Z‖_{∞→1} ≤ 2 E(‖Z‖_col + ‖Z^T‖_col),
E‖Z‖_{∞→2} ≤ 2 E‖Z‖_F + 2 min_D E‖Z D^{-1}‖_2.

These bounds are also asymptotically optimal.

Deviations from the Expected Error. Let g be a measurable function of n random variables and let F = g(X_1, ..., X_n). Let F_i denote the random variable obtained by replacing the i-th argument of g with an independent copy: F_i = g(X_1, ..., X_i', ..., X_n).

Theorem. Assume that there exists a positive constant C such that, almost surely,

Σ_{i=1}^n (F - F_i)^2 1_{F > F_i} ≤ C.

Then for all t > 0,

P(F > EF + t) ≤ exp(-t^2 / (4C)).

Let F = ‖Z‖_ξ and let F_i be obtained by replacing Z_ij = a_ij - X_ij with Z'_ij = a_ij - X'_ij. It follows that if the entries of X are bounded by D/2, then

P(‖A - X‖_2 > (1 + δ) E‖A - X‖_2) ≤ exp( -δ^2 (E‖A - X‖_2)^2 / (4D^2) ),
P(‖A - X‖_{∞→1} > (1 + δ) E‖A - X‖_{∞→1}) ≤ exp( -δ^2 (E‖A - X‖_{∞→1})^2 / (4D^2 nm) ).

A similar bound holds for the (∞, 2) error.

The combination of our bounds on the expected error with these tail bounds performs as well as Achlioptas and McSherry's analysis of their scheme. Additionally, analyzing other schemes is straightforward, our tail bounds are sharper and apply over a wider range of matrix sizes, and the comparison to another sparsification scheme (that of Arora, Hazan, and Kale) is equally favorable.

Find a rank-k basis Q (a matrix with k orthonormal columns) minimizing ‖A - QQ*A‖. Idea: if A were exactly rank k, take Y = AΩ, where the random matrix Ω has k columns; then the QR factorization Y = QR gives a suitable Q with high probability. Since A is not exactly rank k, oversample: let Ω have l > k columns. We need a bound on ‖A - QQ*A‖ so we know how to choose l to make this quantity small.
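A minimal numpy sketch of this oversampled scheme with a Gaussian test matrix (the parameter values and the amount of oversampling are my choices):

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, k, p = 1000, 600, 20, 10
A = (rng.standard_normal((m, k)) * np.linspace(1, 0.01, k)) @ rng.standard_normal((k, n)) \
    + 1e-3 * rng.standard_normal((m, n))     # numerically rank-k test matrix

l = k + p
Omega = rng.standard_normal((n, l))          # random test matrix with l = k + p columns
Y = A @ Omega                                # sample the range of A
Q, _ = np.linalg.qr(Y)                       # orthonormal basis for range(Y), m x l
B = Q @ (Q.T @ A)                            # the approximation Q Q^T A

print(np.linalg.norm(A - B, 2))                          # compare with sigma_{k+1}(A)
print(np.linalg.svd(A, compute_uv=False)[k])
```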

A nice bound by Tropp et al.: let

A = U [ Σ_1 0 ; 0 Σ_2 ] [ V_1* ; V_2* ],

where Σ_1 contains the top k singular values and Σ_2 contains the bottom n - k. Further, let Ω_1 = V_1*Ω and Ω_2 = V_2*Ω, so that

Y = AΩ = U [ Σ_1 Ω_1 ; Σ_2 Ω_2 ].

Theorem. Assuming that Ω_1 has full row rank, the approximation error satisfies

‖(I - P_Y)A‖_ξ^2 ≤ ‖Σ_2‖_ξ^2 + ‖Σ_2 Ω_2 Ω_1^†‖_ξ^2,

where ξ ∈ {F, 2} and P_Y is the orthogonal projector onto the range of Y.

Known results. If Ω is an n × (k + p) Gaussian matrix, then

E‖(I - P_Y)A‖_F ≤ (1 + k/(p - 1))^{1/2} ( Σ_{j>k} σ_j^2 )^{1/2}.

If Ω = √(n/l) RHD is an SRFT, where H and D are as before and R is a random l × n matrix that restricts to l uniformly randomly chosen coordinates, then

‖(I - P_Y)A‖ ≤ √(1 + 18n/l) σ_{k+1}

except with probability O(k^{-1/26}).
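A sketch of an SRFT test matrix in numpy (assumed orientation: Omega is built n × l so that Y = A Omega makes sense; an explicit Hadamard matrix stands in for a fast transform, and n must be a power of two):

```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(7)
n, l = 1024, 64
D = rng.choice([-1.0, 1.0], size=n)                 # random signs, as in the FJLT
H = hadamard(n) / np.sqrt(n)                        # normalized Hadamard matrix
rows = rng.choice(n, size=l, replace=False)         # R: keep l uniformly random coordinates

Omega = np.sqrt(n / l) * (H[rows] * D).T            # n x l test matrix, sqrt(n/l) * (R H D)^T
# Usage: Y = A @ Omega, then Q, _ = np.linalg.qr(Y) as before.
```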

Another useful form for Ω: experimentally, taking Ω to be a submatrix of a product of random Givens rotations seems to give good results:

G_1 G_2 ⋯ G_s P =: ( Ω_samp  Ω_err ),

where P is a random permutation matrix and each G_i is a Givens rotation through a uniformly randomly chosen angle in a uniformly randomly chosen plane.
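A sketch of this construction (assumed details on my part: compose s random Givens rotations with a random column permutation and keep the first l columns as Omega_samp):

```python
import numpy as np

rng = np.random.default_rng(8)
n, l, s = 256, 32, 1000

G = np.eye(n)
for _ in range(s):
    i, j = rng.choice(n, size=2, replace=False)     # uniformly random plane (i, j)
    theta = rng.uniform(0, 2 * np.pi)               # uniformly random rotation angle
    c, sn = np.cos(theta), np.sin(theta)
    gi, gj = G[i].copy(), G[j].copy()
    G[i], G[j] = c * gi - sn * gj, sn * gi + c * gj  # left-multiply by the Givens rotation

GP = G[:, rng.permutation(n)]                       # product of rotations times a permutation
Omega_samp, Omega_err = GP[:, :l], GP[:, l:]        # partition into sampled and error blocks
print(np.allclose(GP @ GP.T, np.eye(n)))            # sanity check: the product is orthogonal
```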

I was able to show that

‖Ω_1^†‖_2^2 = 1 / (1 - ‖V_1* Ω_err Ω_err* V_1‖_2),

so the problem reduces to showing that ‖V_1* Ω_err Ω_err* V_1‖_2 is bounded away from 1 with reasonable probability. HARD: there is a connection to the convergence of the Kac walk on the sphere.