Subset Selection: Deterministic vs. Randomized
Ilse Ipsen, North Carolina State University
Joint work with: Stan Eisenstat (Yale), Mary Beth Broadbent, Martin Brown, Kevin Penner


Subset Selection

Given: a real or complex matrix A and an integer k. Determine a permutation matrix P so that AP = (A_1  A_2), where A_1 has k columns and A_2 has n−k columns.

Important columns A_1: the columns of A_1 are very linearly independent.
Redundant columns A_2: the columns of A_2 are well represented by A_1.

Subset Selection: Requirements

Important columns A_1: the smallest singular value σ_k(A_1) should be large.
Redundant columns A_2: min_Z ‖A_1 Z − A_2‖_2 should be small (two norm).

Subset Selection: Requirements

Important columns A_1: the smallest singular value σ_k(A_1) should be large,
    σ_k(A)/γ ≤ σ_k(A_1) ≤ σ_k(A)   for some γ ≥ 1.
Redundant columns A_2: min_Z ‖A_1 Z − A_2‖_2 should be small (two norm),
    σ_{k+1}(A) ≤ min_Z ‖A_1 Z − A_2‖_2 ≤ γ σ_{k+1}(A).

Outline

Deterministic algorithms: strong RRQR, SVD
Randomized 2-phase algorithm
Perturbation analysis of the randomized algorithm
Numerical experiments: randomized vs. deterministic
New deterministic algorithm

Deterministic Subset Selection

Businger & Golub (1965): QR with column pivoting
Faddeev, Kublanovskaya & Faddeeva (1968)
Golub, Klema & Stewart (1976)
Gragg & Stewart (1976)
Stewart (1984)
Foster (1986)
T. Chan (1987)
Hong & Pan (1992)
Chandrasekaran & Ipsen (1994)
Gu & Eisenstat (1996): strong RRQR

First Deterministic Algorithm

Rank revealing QR decomposition
    AP = Q [R_11  R_12; 0  R_22],   where Q^T Q = I and R_11 is k×k.

Important columns: A_1 = Q [R_11; 0], and σ_i(A_1) = σ_i(R_11) for 1 ≤ i ≤ k.
Redundant columns: A_2 = Q [R_12; R_22], and min_Z ‖A_1 Z − A_2‖_2 = ‖R_22‖_2.
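The identity min_Z ‖A_1 Z − A_2‖_2 = ‖R_22‖_2 is easy to check numerically; a minimal sketch with NumPy/SciPy (the matrix size, k, and seed are arbitrary choices, not from the talk):

```python
import numpy as np
from scipy.linalg import qr, lstsq

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 12))
k = 5

Q, R, piv = qr(A, pivoting=True)          # AP = QR with column pivoting
A1, A2 = A[:, piv[:k]], A[:, piv[k:]]     # important / redundant columns

Z = lstsq(A1, A2)[0]                      # minimizer of ||A1 Z - A2||
print(np.linalg.norm(A1 @ Z - A2, 2))     # residual in the two norm ...
print(np.linalg.norm(R[k:, k:], 2))       # ... matches ||R_22||_2
```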

Strong RRQR (Gu & Eisenstat 1996)

Input: m×n matrix A, m ≥ n, and integer k
Output: AP = Q [R_11  R_12; 0  R_22], where R_11 is k×k

R_11 is well conditioned:
    σ_i(A)/√(1 + k(n−k)) ≤ σ_i(R_11) ≤ σ_i(A)   for 1 ≤ i ≤ k
R_22 is small:
    σ_{k+j}(A) ≤ σ_j(R_22) ≤ √(1 + k(n−k)) σ_{k+j}(A)
Offdiagonal block is not too large:
    |(R_11^{-1} R_12)_{ij}| ≤ 1

Strong RRQR Algorithm

1. Compute some QR decomposition with column pivoting:
       AP_initial = Q [R_11  R_12; 0  R_22]
2. Repeat: exchange a column of [R_11; 0] with a column of [R_12; R_22];
   update the permutation P and retriangularize;
   until det(R_11) stops increasing.
3. Output: AP_final = (A_1  A_2), where A_1 has k columns and A_2 has n−k columns.
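A minimal sketch of this swap loop, assuming NumPy/SciPy; the efficient O(mnk) updating formulas of Gu & Eisenstat are replaced here by brute-force refactorization of every candidate swap, so this illustrates the logic only (the function name and the sweep cap are illustrative choices, not from the talk):

```python
import numpy as np
from scipy.linalg import qr

def strong_rrqr_sketch(A, k, max_sweeps=50):
    n = A.shape[1]
    # Step 1: ordinary QR with column pivoting gives the starting permutation.
    _, _, piv = qr(A, pivoting=True)
    perm = list(piv)

    # |det(R_11)| = product of the singular values of A_1 = A[:, perm[:k]].
    def vol(p):
        return np.prod(np.linalg.svd(A[:, p[:k]], compute_uv=False))

    current = vol(perm)
    for _ in range(max_sweeps):
        best_v, best_swap = current * (1 + 1e-12), None
        for i in range(k):                 # column of (R_11; 0)
            for j in range(k, n):          # column of (R_12; R_22)
                trial = perm.copy()
                trial[i], trial[j] = trial[j], trial[i]
                v = vol(trial)
                if v > best_v:
                    best_v, best_swap = v, (i, j)
        if best_swap is None:              # det(R_11) stopped increasing
            break
        i, j = best_swap
        perm[i], perm[j] = perm[j], perm[i]
        current = best_v
    return np.array(perm)                  # first k entries select A_1
```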

Second Deterministic Algorithm

Singular value decomposition:
    AP = (A_1  A_2) = U [Σ_1  0; 0  Σ_2] [V_11  V_12; V_21  V_22],
where A_1 has k columns, A_2 has n−k columns, and V_11 is k×k.

Important columns A_1:
    σ_i(A)/‖V_11^{-1}‖ ≤ σ_i(A_1) ≤ σ_i(A)   for all 1 ≤ i ≤ k
Redundant columns A_2:
    σ_{k+1}(A) ≤ min_Z ‖A_1 Z − A_2‖_2 ≤ ‖V_11^{-1}‖ σ_{k+1}(A)
[Hong & Pan 1992]

Almost Strong RRQR Algorithm

In the spirit of Golub, Klema and Stewart (1976).

1. Compute the SVD
       A = U [Σ_1  0; 0  Σ_2] [V_1; V_2],
   where V_1 holds the k dominant right singular vectors.
2. Apply strong RRQR to V_1:
       V_1 P = (V_11  V_12), where V_11 is k×k and V_12 has n−k columns, so that
       1/√(1 + k(n−k)) ≤ σ_i(V_11) ≤ 1   for 1 ≤ i ≤ k.
3. Output: AP = (A_1  A_2), where A_1 has k columns and A_2 has n−k columns.
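A compact sketch of this algorithm; ordinary pivoted QR from SciPy stands in for strong RRQR (an assumed simplification that weakens the bounds but keeps the structure visible):

```python
import numpy as np
from scipy.linalg import qr

def almost_strong_rrqr(A, k):
    # Step 1: SVD; the rows of V1 are the k dominant right singular vectors.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    V1 = Vt[:k, :]                       # k x n
    # Step 2: rank-revealing QR on V_1 (pivoted QR as a stand-in).
    _, _, perm = qr(V1, pivoting=True)
    # Step 3: the first k entries of perm index the columns A_1.
    return perm
```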

Almost Strong RRQR

Produces a permutation P so that AP = (A_1  A_2), where A_1 has k columns, A_2 has n−k columns, and:

Important columns A_1:
    σ_i(A)/√(1 + k(n−k)) ≤ σ_i(A_1) ≤ σ_i(A)   for 1 ≤ i ≤ k
Redundant columns A_2:
    σ_{k+1}(A) ≤ min_Z ‖A_1 Z − A_2‖_2 ≤ √(1 + k(n−k)) σ_{k+1}(A)

Deterministic Subset Selection

Algorithms: strong RRQR, SVD.
Permuting columns of A corresponds to permuting the right singular vector matrix V.
Perturbation bounds are in terms of V:
    σ_{k+1}(A) ≤ min_Z ‖A_1 Z − A_2‖_2 ≤ ‖V_11^{-1}‖ σ_{k+1}(A)
Operation count for an m×n matrix, m ≥ n:
    min_Z ‖A_1 Z − A_2‖_2 ≤ √(1 + f^2 k(n−k)) σ_{k+1}(A) in O(mn^2 + n^3 log_f n) flops.

Randomized Algorithms

Frieze, Kannan & Vempala 1998, 2004
Drineas, Kannan & Mahoney 2006
Deshpande, Rademacher, Vempala & Wang 2006
Rudelson & Vershynin 2007
Liberty, Woolfe, Martinsson, Rokhlin & Tygert 2007
Drineas, Mahoney & Muthukrishnan 2006, 2008
Boutsidis, Mahoney & Drineas 2008, 2009
Civril & Magdon-Ismail 2009
Survey paper: Halko, Martinsson & Tropp 2009

2-Phase Randomized Algorithm (Boutsidis, Mahoney & Drineas 2009)

1. Randomized phase: sample a small number (about k log k) of columns.
2. Deterministic phase: apply rank revealing QR to the sampled columns.

With 70% probability:
Two norm:        min_Z ‖A_1 Z − A_2‖_2 ≤ O(k^{3/4} log^{1/2} k (n−k)^{1/4}) ‖Σ_2‖_2
Frobenius norm:  min_Z ‖A_1 Z − A_2‖_F ≤ O(k log^{1/2} k) ‖Σ_2‖_F

Deterministic vs. Randomized Algorithms

Want: a permutation P so that AP = (A_1  A_2), with A_1 having k columns and A_2 having n−k columns.

Compute the SVD
    A = U [Σ_1  0; 0  Σ_2] [V_1; V_2]
and obtain P from the k dominant right singular vectors V_1.

Deterministic: apply RRQR to all columns of the matrix V_1.
Randomized: apply RRQR to a subset of the columns of the scaled matrix V_1 D.

2-Phase Randomized Algorithm

1. Compute the SVD
       A = U [Σ_1  0; 0  Σ_2] [V_1; V_2].
2. Randomized phase:
   Scale: V_1 → V_1 D.
   Sample c columns in expectation: (V_1 D) P_s = (V_1s D_s  …), where V_1s D_s has ĉ columns.
3. Deterministic phase:
   Apply RRQR to V_1s D_s: (V_1s D_s) P_d = (V_11 D_1  …), where V_11 D_1 has k columns.
4. Output: A P_s P_d = (A_1  A_2), where A_1 has k columns and A_2 has n−k columns.
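The whole algorithm fits in a few lines of NumPy/SciPy. A sketch under two assumptions not fixed by this slide: Frobenius-norm sampling probabilities q_i = ‖(V_1)_i‖_2^2 / k, and pivoted QR in place of strong RRQR:

```python
import numpy as np
from scipy.linalg import qr

def two_phase_randomized(A, k, c, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    n = A.shape[1]
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    V1 = Vt[:k, :]                        # k dominant right singular vectors
    # Randomized phase: sample column i with p_i = min(1, c*q_i),
    # scale the kept columns by D_ii = 1/sqrt(p_i).
    q = np.sum(V1**2, axis=0) / k
    p = np.minimum(1.0, c * q)
    idx = np.flatnonzero(rng.random(n) < p)
    VD = V1[:, idx] / np.sqrt(p[idx])     # V_1s D_s, with chat = len(idx) columns
    # Deterministic phase: RRQR on the sampled, scaled columns
    # (assumes chat >= k; otherwise resample with a larger c).
    _, _, piv = qr(VD, pivoting=True)
    return idx[piv[:k]]                   # indices of the selected columns A_1
```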

SVD: Perturbation Bounds

    AP = U [Σ_1  0; 0  Σ_2] [V_11  V_12; V_21  V_22],   where V_11 is k×k.

For deterministic algorithms:
    min_Z ‖A_1 Z − A_2‖ ≤ ‖Σ_2‖ / σ_k(V_11)
or
    min_Z ‖A_1 Z − A_2‖ ≤ ‖Σ_2‖ + ‖Σ_2 V_21‖ / σ_k(V_11).

For the randomized 2-phase algorithm:
    min_Z ‖A_1 Z − A_2‖ ≤ ‖Σ_2‖ + ‖Σ_2 V_21 D_1‖ / σ_k(V_11 D_1)
for any nonsingular matrix D_1.

Perturbation Bounds for the Randomized Algorithm

D_1 is the scaling matrix:
    min_Z ‖A_1 Z − A_2‖ ≤ ‖Σ_2‖ + ‖Σ_2 V_21 D_1‖ / σ_k(V_11 D_1).
V_11 D_1 comes from the RRQR (V_1s D_s) P_d = (V_11 D_1  …), where V_1s D_s has ĉ columns and V_11 D_1 has k columns, so
    min_Z ‖A_1 Z − A_2‖ ≤ ‖Σ_2‖ + √(1 + k(ĉ−k)) ‖Σ_2 V_21 D_1‖ / σ_k(V_1s D_s).

We need, with high probability:
    ‖Σ_2 V_21 D_1‖ ⪅ ‖Σ_2‖   and   σ_k(V_1s D_s) ≫ 0.

Probabilistic Bounds: Frobenius Norm

Column i of X is sampled with probability p_i.
Scaling matrix: D_ii = 1/√p_i with probability p_i, and D_ii = 0 otherwise.

Frobenius norm:  ‖XD‖_F^2 = trace(X D^2 X^T)
Linearity:       E[‖XD‖_F^2] = trace(X E[D^2] X^T)
Scaling:         E[D_ii^2] = p_i · (1/p_i) + (1 − p_i) · 0 = 1, so E[D^2] = I
Expected value:  E[‖XD‖_F^2] = ‖X‖_F^2

Markov's inequality:
    Prob[‖XD‖_F^2 ≤ α ‖X‖_F^2] ≥ 1 − 1/α.
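The expectation argument is easy to sanity-check by Monte Carlo; a small sketch (the dimensions, probabilities, and trial count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = rng.standard_normal((5, n))
p = rng.uniform(0.1, 1.0, size=n)      # any sampling probabilities in (0, 1]

trials, total = 20000, 0.0
for _ in range(trials):
    keep = rng.random(n) < p           # column i kept with probability p_i
    # ||X D||_F^2: kept columns scaled by D_ii = 1/sqrt(p_i), others zeroed
    total += np.sum((X[:, keep] / np.sqrt(p[keep]))**2)

print(total / trials)                  # approximately ||X||_F^2
print(np.linalg.norm(X, 'fro')**2)
```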

Randomized Subset Selection: Frobenius Norm

Perturbation bound:
    min_Z ‖A_1 Z − A_2‖_F ≤ ‖Σ_2‖_F + √(1 + k(ĉ−k)) ‖Σ_2 V_21 D_1‖_F / σ_k(V_1s D_s).
With probability 1 − 1/α:
    ‖Σ_2 V_21 D_1‖_F ≤ √α ‖Σ_2 V_2‖_F = √α ‖Σ_2‖_F.
This holds for any probability distribution.

Still to show: σ_k(V_1s D_s) ≫ 0 with high probability.

Sampling: Frobenius Norm

When is σ_k(V_1s D_s) ≫ 0 with high probability?

Expected number of sampled columns: c.
Column i of V_1 is sampled with probability p_i = min{1, c q_i}, where q_i = ‖(V_1)_i‖_2^2 / k.
Try to sample columns with large norm.
Scaling matrix: D = diag(1/√p_1, …, 1/√p_n).

If c = Θ(k log k), then with probability 0.9:
    σ_k(V_1s D_s) ≥ 1/2.
[Boutsidis, Mahoney & Drineas 2009]

Sampling: Frobenius Norm

How many columns should V_1s actually have?
Expected number of sampled columns from V_1: c.
Actual number of columns in V_1s: ĉ, with E[ĉ] ≤ c.

If c q_i ≤ 1 for all i, and ĉ ≥ k, then
    σ_k(V_1s D_s) ≤ √(ĉ/c).
So if ĉ < c/4, then σ_k(V_1s D_s) < 1/2.
Make sure that enough columns are actually sampled.

Randomized Subset Selection: Two Norm

    min_Z ‖A_1 Z − A_2‖_2 ≤ ‖Σ_2‖_2 + √(1 + k(ĉ−k)) ‖Σ_2 V_21 D_1‖_2 / σ_k(V_1s D_s)

Probability distribution: p_i = min{1, c q_i}, with
    q_i = (1/2) ‖(V_1)_i‖_2^2 / k + (1/2) (‖(Σ_2 V_2)_i‖_2 / ‖Σ_2 V_2‖_F)^2.

With probability 0.9:
    ‖Σ_2 V_21 D_1‖_2 ≤ γ ((n−k+1) log c / c)^{1/4} ‖Σ_2‖_2.
[Boutsidis, Mahoney & Drineas 2009]

Numerical Experiments

Compare strong RRQR and the randomized 2-phase algorithm.
Subset selection in the 2-norm.
Matrix orders n = 500, 2000; 0 < k ≤ 240.
Matrices: Kahan, random, scaled random, triangular, numerical rank k.
Randomized algorithm: run 40 times.

Iterative determination of c:
    c = 2k
    while σ_k(V_1s D_s) < 1/2 do c = 2c
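The doubling loop for c might look like this in NumPy; sample_scaled and choose_c are illustrative names, and Frobenius-norm probabilities are assumed:

```python
import numpy as np

def sample_scaled(V1, c, rng):
    # Sample column i of V1 with p_i = min(1, c*q_i), q_i = ||(V1)_i||^2 / k,
    # and scale the kept columns by 1/sqrt(p_i).
    k, n = V1.shape
    q = np.sum(V1**2, axis=0) / k
    p = np.minimum(1.0, c * q)
    idx = np.flatnonzero(rng.random(n) < p)
    return idx, V1[:, idx] / np.sqrt(p[idx])

def choose_c(V1, k, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    c = 2 * k                                      # start at c = 2k
    while True:
        idx, VD = sample_scaled(V1, c, rng)
        if len(idx) >= k and np.linalg.svd(VD, compute_uv=False)[k - 1] >= 0.5:
            return c, idx, VD                      # sigma_k(V_1s D_s) >= 1/2
        c *= 2                                     # otherwise double c
```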

Accuracy

Residuals min_Z ‖A_1 Z − A_2‖_2 for n = 500 and k = 20:

                           Randomized
    Matrix        RRQR     max      min      mean
    Kahan         7×10^0   2×10^0   5×10^-1  5×10^-1
    num. rank k   7×10^0   2×10^1   8×10^0   1×10^1
    triangular    3×10^0   1×10^0   1×10^0   1×10^0
    random        3×10^1   3×10^1   3×10^1   3×10^1
    scaled rand   4×10^1   5×10^1   4×10^1   5×10^1

No significant difference in accuracy between the deterministic and randomized algorithms.

Number of Sampled Columns

Values of c for n = 500 and k = 20:

    Matrix        c values tried           most frequent   mean ĉ
    Kahan         40, 80, 160, 320, 640    2k = 40         56
    num. rank k   40, 80, 160              4k = 80         83
    triangular    40, 80, 160              4k = 80         97
    random        40, 80, 160              4k = 80         93
    scaled rand   40, 80, 160              4k = 80         97

c = 4k seems to be a good value.

Different Probability Distributions

Two norm:
    q_i = (1/2) ‖(V_1)_i‖_2^2 / k + (1/2) (‖(Σ_2 V_2)_i‖_2 / ‖Σ_2 V_2‖_F)^2
        expensive
    q_i = (1/2) ‖(V_1)_i‖_2^2 / k + (1/2) (‖(A)_i‖_2^2 − ‖(A V_1^T V_1)_i‖_2^2) / (‖A‖_F^2 − ‖A V_1^T V_1‖_F^2)
        numerically unstable (can be negative)

Frobenius norm:
    q_i = ‖(V_1)_i‖_2^2 / k
        numerically stable and cheap
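All three distributions are cheap to write down in NumPy; a sketch using the slides' convention that V_1 is k×n (the function name is mine):

```python
import numpy as np

def sampling_distributions(A, k):
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    V1 = Vt[:k, :]
    lev = np.sum(V1**2, axis=0) / k                  # ||(V_1)_i||^2 / k

    # Frobenius norm: numerically stable and cheap.
    q_F = lev

    # Two norm with the exact residual Sigma_2 V_2: expensive.
    R = A - (A @ V1.T) @ V1                          # = U_2 Sigma_2 V_2
    q_2 = 0.5 * lev + 0.5 * np.sum(R**2, axis=0) / np.sum(R**2)

    # Two norm with residual norms by subtraction: can go negative
    # in floating point, hence numerically unstable.
    P = (A @ V1.T) @ V1
    num = np.sum(A**2, axis=0) - np.sum(P**2, axis=0)
    den = np.sum(A**2) - np.sum(P**2)
    q_u = 0.5 * lev + 0.5 * num / den
    return q_2, q_u, q_F
```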

Different Probability Distributions

Residuals min_Z ‖A_1 Z − A_2‖_2 for n = 500 and k = 20:

    Matrix        P   max      min      mean     ĉ
    num. rank k   2   2×10^1   8×10^0   1×10^1   83
                  F   1×10^1   7×10^0   1×10^1   88
    triangular    2   1×10^0   1×10^0   1×10^0   97
                  F   1×10^0   1×10^0   8×10^-1  91
    random        2   3×10^1   3×10^1   3×10^1   93
                  F   3×10^1   3×10^1   3×10^1   87
    scaled rand   2   5×10^1   4×10^1   5×10^1   97
                  F   5×10^1   4×10^1   5×10^1   93
    Kahan         2   2×10^0   5×10^-1  5×10^-1  56
                  F   7×10^-1  5×10^-1  5×10^-1  42

No difference in accuracy between the 2-norm and Frobenius-norm probability distributions.

Ideas from the Randomized Algorithm

    AP = U [Σ_1  0; 0  Σ_2] [V_1; V_2],   V_1 = (V_11  V_12),
where V_11 has k columns and V_12 has n−k columns.

Order the columns of V_1 by decreasing norm; apply strong RRQR to the columns of largest norm.

Intuition:
    ‖V_11‖_F^2 = k − ‖V_12‖_F^2
        This means: ‖V_11‖_F large implies ‖V_12‖_F small.
    σ_k(V_11)^2 = 1 − ‖V_12‖_2^2
        This means: ‖V_12‖_2 small implies σ_k(V_11) large.
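Both identities follow from the orthonormal rows of V_1 (V_1 V_1^T = I_k) and are easy to verify numerically; a small sketch with an arbitrary random orthonormal V_1:

```python
import numpy as np

rng = np.random.default_rng(3)
k, n = 4, 10
# V1: k x n with orthonormal rows, split as (V_11 V_12), V_11 being k x k.
V1 = np.linalg.qr(rng.standard_normal((n, k)))[0].T
V11, V12 = V1[:, :k], V1[:, k:]

# ||V_11||_F^2 = k - ||V_12||_F^2
print(np.linalg.norm(V11, 'fro')**2, k - np.linalg.norm(V12, 'fro')**2)

# sigma_k(V_11)^2 = 1 - ||V_12||_2^2
print(np.linalg.svd(V11, compute_uv=False)[-1]**2,
      1 - np.linalg.norm(V12, 2)**2)
```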

New 2-Phase Deterministic Algorithm

1. Compute the SVD
       A = U [Σ_1  0; 0  Σ_2] [V_1; V_2].
2. Ordering phase:
   Order according to decreasing column norms: V_1 → V_1 P_O.
   Select the leading 4k columns: A P_O = (A_S  …), where A_S has 4k columns.
3. Rank revealing phase:
   Apply strong RRQR to A_S: A_S P_R = (A_1  …), where A_1 has k columns.
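A sketch of the new algorithm, again with pivoted QR standing in for strong RRQR in the rank-revealing phase (an assumed simplification, and the function name is mine):

```python
import numpy as np
from scipy.linalg import qr

def new_two_phase_deterministic(A, k):
    n = A.shape[1]
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    V1 = Vt[:k, :]
    # Ordering phase: sort by decreasing column norm in V_1, keep leading 4k.
    order = np.argsort(-np.sum(V1**2, axis=0))
    S = order[:min(4 * k, n)]                     # candidate columns A_S
    # Rank-revealing phase: strong RRQR on A_S (pivoted QR as a stand-in).
    _, _, piv = qr(A[:, S], pivoting=True)
    return S[piv[:k]]                             # indices of the columns A_1
```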

Results for the New Deterministic Algorithm

n = 2000 and k = 40.
Residuals min_Z ‖A_1 Z − A_2‖_2; time ratio TR = time(new algorithm) / time(RRQR):

                  Residuals            σ_k(A_1)
    Matrix        RRQR     new         RRQR      new        TR
    Kahan         4×10^0   4×10^0      8×10^-2   8×10^-2    0.11
    random        9×10^1   9×10^1      1×10^-1   1×10^-1    0.03
    s. rand       1×10^2   1×10^2      2×10^-1   2×10^-1    0.07
    triang        4×10^0   3×10^0      3×10^-1   3×10^-1    0.02

The new algorithm appears to be as accurate as RRQR, and possibly faster.

Subset Selection: Summary

Given: a real or complex matrix A and an integer k.
Want: AP = (A_1  A_2), with A_1 having k columns, such that
    σ_k(A_1) ≈ σ_k(A)   and   min_Z ‖A_1 Z − A_2‖ ≈ σ_{k+1}(A).

Deterministic algorithms: strong RRQR, SVD.
Randomized 2-phase algorithm: no more accurate than strong RRQR for matrices of order up to 2000, and with numerical issues.
New deterministic algorithm: as accurate as strong RRQR and perhaps faster.