Subset Selection: Deterministic vs. Randomized. Ilse Ipsen, North Carolina State University. Joint work with Stan Eisenstat (Yale).


Slide 1: Subset Selection: Deterministic vs. Randomized
Ilse Ipsen, North Carolina State University.
Joint work with: Stan Eisenstat (Yale); Mary Beth Broadbent, Martin Brown, Kevin Penner.

Slide 2: Subset Selection
Given: a real or complex matrix $A$ and an integer $k$, determine a permutation matrix $P$ so that
$AP = (\underbrace{A_1}_{k} \ \ A_2)$
Important columns $A_1$: the columns of $A_1$ are very linearly independent.
Redundant columns $A_2$: the columns of $A_2$ are well represented by $A_1$.

Slide 3: Subset Selection Requirements
Important columns $A_1$: the smallest singular value $\sigma_k(A_1)$ should be large.
Redundant columns $A_2$: $\min_X \|A_1 X - A_2\|_2$ should be small (two norm).

Slide 4: Subset Selection Requirements
Important columns $A_1$: the smallest singular value $\sigma_k(A_1)$ should be large,
$\sigma_k(A)/\gamma \le \sigma_k(A_1) \le \sigma_k(A)$ for some $\gamma$.
Redundant columns $A_2$: $\min_X \|A_1 X - A_2\|_2$ should be small (two norm),
$\sigma_{k+1}(A) \le \min_X \|A_1 X - A_2\|_2 \le \gamma\, \sigma_{k+1}(A)$ for some $\gamma$.
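Both requirements are easy to evaluate numerically for a candidate permutation. A minimal Python sketch (the helper name subset_quality and the random test matrix are illustrative assumptions, not from the slides); it uses the fact that $\min_X \|A_1 X - A_2\|_2$ is the residual of projecting $A_2$ onto the range of $A_1$:

```python
import numpy as np

def subset_quality(A, perm, k):
    """Evaluate a column subset: sigma_k(A1) and min_X ||A1 X - A2||_2."""
    A1, A2 = A[:, perm[:k]], A[:, perm[k:]]
    sigma_k = np.linalg.svd(A1, compute_uv=False)[-1]   # smallest singular value of A1
    # min_X ||A1 X - A2||_2 is the residual of projecting A2 onto range(A1)
    X, *_ = np.linalg.lstsq(A1, A2, rcond=None)
    residual = np.linalg.norm(A1 @ X - A2, 2)
    return sigma_k, residual

A = np.random.randn(50, 20)
sigma_k, residual = subset_quality(A, np.arange(20), k=5)
print(sigma_k, residual)                                # compare with sigma_5(A), sigma_6(A):
print(np.linalg.svd(A, compute_uv=False)[[4, 5]])
```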

Slide 5: Outline
Deterministic algorithms: strong RRQR, SVD
Randomized 2-phase algorithm
Perturbation analysis of the randomized algorithm
Numerical experiments: randomized vs. deterministic
New deterministic algorithm

Slide 6: Deterministic Subset Selection
Businger & Golub (1965): QR with column pivoting
Faddeev, Kublanovskaya & Faddeeva (1968)
Golub, Klema & Stewart (1976)
Gragg & Stewart (1976)
Stewart (1984)
Foster (1986)
T. Chan (1987)
Hong & Pan (1992)
Chandrasekaran & Ipsen (1994)
Gu & Eisenstat (1996): strong RRQR

Slide 7: First Deterministic Algorithm
Rank revealing QR decomposition:
$AP = Q \begin{pmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{pmatrix}$, where $Q^T Q = I$.
Important columns: $Q \begin{pmatrix} R_{11} \\ 0 \end{pmatrix} = A_1$ and $\sigma_i(A_1) = \sigma_i(R_{11})$ for $1 \le i \le k$.
Redundant columns: $Q \begin{pmatrix} R_{12} \\ R_{22} \end{pmatrix} = A_2$ and $\min_X \|A_1 X - A_2\|_2 = \|R_{22}\|_2$.
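The identity $\min_X \|A_1 X - A_2\|_2 = \|R_{22}\|_2$ can be checked with SciPy's pivoted QR. A hedged sketch: scipy.linalg.qr with pivoting=True performs classical Businger-Golub column pivoting, not a strong RRQR, and the random test matrix is an illustrative assumption:

```python
import numpy as np
from scipy.linalg import qr

A = np.random.randn(60, 30)
k = 10
Q, R, perm = qr(A, pivoting=True)        # A[:, perm] = Q @ R
A1, A2 = A[:, perm[:k]], A[:, perm[k:]]
R22 = R[k:, k:]
# the residual of the redundant columns equals ||R22||_2
X, *_ = np.linalg.lstsq(A1, A2, rcond=None)
print(np.linalg.norm(A1 @ X - A2, 2), np.linalg.norm(R22, 2))  # should agree
```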

Slide 8: Strong RRQR (Gu & Eisenstat 1996)
Input: $m \times n$ matrix $A$, $m \ge n$, integer $k$.
Output: $AP = Q \begin{pmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{pmatrix}$, where $R_{11}$ is $k \times k$.
$R_{11}$ is well conditioned: $\dfrac{\sigma_i(A)}{\sqrt{1 + k(n-k)}} \le \sigma_i(R_{11}) \le \sigma_i(A)$ for $1 \le i \le k$.
$R_{22}$ is small: $\sigma_{k+j}(A) \le \sigma_j(R_{22}) \le \sqrt{1 + k(n-k)}\, \sigma_{k+j}(A)$.
Offdiagonal block not too large: $\big|(R_{11}^{-1} R_{12})_{ij}\big| \le 1$.

Slide 9: Strong RRQR Algorithm
1. Compute some QR decomposition with column pivoting: $AP_{\mathrm{initial}} = Q \begin{pmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{pmatrix}$.
2. Repeat: exchange a column of $\begin{pmatrix} R_{11} \\ 0 \end{pmatrix}$ with a column of $\begin{pmatrix} R_{12} \\ R_{22} \end{pmatrix}$; update the permutation $P$ and retriangularize; until $\det(R_{11})$ stops increasing.
3. Output: $AP_{\mathrm{final}} = (\underbrace{A_1}_{k} \ \underbrace{A_2}_{n-k})$.
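Gu and Eisenstat interleave the exchange search with cheap factorization updates; the following is a deliberately naive sketch that refactors from scratch at every candidate swap, so it illustrates the logic of step 2 rather than the published algorithm (the function name, the tolerance f > 1, and the sweep limit are assumptions):

```python
import numpy as np
from scipy.linalg import qr

def strong_rrqr_naive(A, k, f=1.02, max_sweeps=20):
    """Greedy column exchanges until |det(R11)| stops increasing by a factor > f."""
    n = A.shape[1]
    _, _, perm = qr(A, pivoting=True)       # step 1: QR with column pivoting
    perm = list(perm)

    def logdet_R11(p):
        R = qr(A[:, p[:k]], mode='economic')[1]
        return np.sum(np.log(np.abs(np.diag(R))))

    current = logdet_R11(perm)
    for _ in range(max_sweeps):
        best, swap = current, None
        for i in range(k):                   # column of (R11; 0)
            for j in range(k, n):            # column of (R12; R22)
                p = perm.copy()
                p[i], p[j] = p[j], p[i]
                val = logdet_R11(p)
                if val > best + np.log(f):   # f > 1 guarantees termination
                    best, swap = val, (i, j)
        if swap is None:                     # det(R11) stopped increasing
            break
        i, j = swap
        perm[i], perm[j] = perm[j], perm[i]
        current = best
    return np.array(perm)

A = np.random.randn(40, 25)
perm = strong_rrqr_naive(A, k=8)
```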

Slide 10: Second Deterministic Algorithm
Singular value decomposition:
$AP = (\underbrace{A_1}_{k} \ \underbrace{A_2}_{n-k}) = U \begin{pmatrix} \Sigma_1 & \\ & \Sigma_2 \end{pmatrix} \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix}$
Important columns $A_1$: $\sigma_i(A) / \|V_{11}^{-1}\| \le \sigma_i(A_1) \le \sigma_i(A)$ for all $1 \le i \le k$.
Redundant columns $A_2$: $\sigma_{k+1}(A) \le \min_X \|A_1 X - A_2\| \le \|V_{11}^{-1}\|\, \sigma_{k+1}(A)$. [Hong & Pan 1992]

Slide 11: Almost Strong RRQR Algorithm
In the spirit of Golub, Klema and Stewart (1976).
1. Compute the SVD $A = U \begin{pmatrix} \Sigma_1 & \\ & \Sigma_2 \end{pmatrix} \begin{pmatrix} V_1 \\ V_2 \end{pmatrix}$.
2. Apply strong RRQR to $V_1$: $V_1 P = (\underbrace{V_{11}}_{k} \ \underbrace{V_{12}}_{n-k})$ with $\sigma_i(V_{11}) \ge \dfrac{1}{\sqrt{1 + k(n-k)}}$ for $1 \le i \le k$.
3. Output: $AP = (\underbrace{A_1}_{k} \ \underbrace{A_2}_{n-k})$.
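A sketch of this SVD-based variant, with SciPy's pivoted QR standing in for strong RRQR on $V_1$ (an assumption; the slides specify the Gu-Eisenstat algorithm):

```python
import numpy as np
from scipy.linalg import qr, svd

def almost_strong_rrqr(A, k):
    """Permutation from rank revealing QR on the k dominant right singular vectors."""
    _, _, Vt = svd(A, full_matrices=False)
    V1 = Vt[:k, :]                        # k x n: dominant right singular vectors
    _, _, perm = qr(V1, pivoting=True)    # stand-in for strong RRQR on V1
    return perm                           # A[:, perm[:k]] are the important columns

A = np.random.randn(100, 40)
perm = almost_strong_rrqr(A, k=10)
A1 = A[:, perm[:10]]
```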

Slide 12: Almost Strong RRQR
Produces a permutation $P$ so that $AP = (\underbrace{A_1}_{k} \ \underbrace{A_2}_{n-k})$, where
Important columns $A_1$: $\dfrac{\sigma_i(A)}{\sqrt{1 + k(n-k)}} \le \sigma_i(A_1) \le \sigma_i(A)$ for $1 \le i \le k$.
Redundant columns $A_2$: $\sigma_{k+1}(A) \le \min_X \|A_1 X - A_2\| \le \sqrt{1 + k(n-k)}\, \sigma_{k+1}(A)$.

Slide 13: Deterministic Subset Selection
Algorithms: strong RRQR, SVD.
Permuting columns of $A$ corresponds to permuting the right singular vector matrix $V$.
Perturbation bounds in terms of $V$: $\sigma_{k+1}(A) \le \min_X \|A_1 X - A_2\| \le \|V_{11}^{-1}\|\, \sigma_{k+1}(A)$.
Operation count for an $m \times n$ matrix, $m \ge n$: $\min_X \|A_1 X - A_2\| \le \sqrt{1 + f^2\, k(n-k)}\, \sigma_{k+1}(A)$ in $O\big(mn^2 + n^3 \log_f n\big)$ flops.

Slide 14: Randomized Algorithms
Frieze, Kannan & Vempala 1998, 2004
Drineas, Kannan & Mahoney 2006
Deshpande, Rademacher, Vempala & Wang 2006
Rudelson & Vershynin 2007
Liberty, Woolfe, Martinsson, Rokhlin & Tygert 2007
Drineas, Mahoney & Muthukrishnan 2006, 2008
Boutsidis, Mahoney & Drineas 2008, 2009
Civril & Magdon-Ismail 2009
Survey paper: Halko, Martinsson & Tropp 2009

Slide 15: 2-Phase Randomized Algorithm (Boutsidis, Mahoney & Drineas)
Randomized phase: sample a small number ($\approx k \log k$) of columns.
Deterministic phase: apply rank revealing QR to the sampled columns.
With 70% probability:
Two norm: $\min_X \|A_1 X - A_2\|_2 \le O\big(k^{3/4} \log^{1/2} k \, (n-k)^{1/4}\big)\, \|\Sigma_2\|_2$
Frobenius norm: $\min_X \|A_1 X - A_2\|_F \le O\big(k \log^{1/2} k\big)\, \|\Sigma_2\|_F$

Slide 16: Deterministic vs. Randomized Algorithms
Want: a permutation $P$ so that $AP = (\underbrace{A_1}_{k} \ \underbrace{A_2}_{n-k})$.
Compute the SVD $A = U \begin{pmatrix} \Sigma_1 & \\ & \Sigma_2 \end{pmatrix} \begin{pmatrix} V_1 \\ V_2 \end{pmatrix}$.
Obtain $P$ from the $k$ dominant right singular vectors $V_1$.

Slide 17: Deterministic vs. Randomized Algorithms
Want: a permutation $P$ so that $AP = (\underbrace{A_1}_{k} \ \underbrace{A_2}_{n-k})$.
Compute the SVD $A = U \begin{pmatrix} \Sigma_1 & \\ & \Sigma_2 \end{pmatrix} \begin{pmatrix} V_1 \\ V_2 \end{pmatrix}$ and obtain $P$ from the $k$ dominant right singular vectors $V_1$.
Deterministic: apply RRQR to all columns of the matrix $V_1$.
Randomized: apply RRQR to a subset of columns of the scaled matrix $V_1 D$.

Slide 18: 2-Phase Randomized Algorithm
1. Compute the SVD $A = U \begin{pmatrix} \Sigma_1 & \\ & \Sigma_2 \end{pmatrix} \begin{pmatrix} V_1 \\ V_2 \end{pmatrix}$.
2. Randomized phase: scale $V_1 \to V_1 D$ and sample $c$ columns (in expectation): $(V_1 D)\, P_s = (\underbrace{V_{1s} D_s}_{\hat c} \ \ * )$.
3. Deterministic phase: apply RRQR to $V_{1s} D_s$: $(V_{1s} D_s)\, P_d = (\underbrace{V_{11} D_1}_{k} \ \ * )$.
4. Output: $A P_s P_d = (\underbrace{A_1}_{k} \ \underbrace{A_2}_{n-k})$.
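An end-to-end sketch of the two phases, using the Frobenius-norm sampling probabilities of slide 28 and, again, pivoted QR as a stand-in for strong RRQR (two_phase_randomized is a hypothetical name, and the sketch assumes the sample happens to contain at least k columns; slides 29 and 30 discuss what to do otherwise):

```python
import numpy as np
from scipy.linalg import qr, svd

def two_phase_randomized(A, k, c):
    _, _, Vt = svd(A, full_matrices=False)
    V1 = Vt[:k, :]                                # k dominant right singular vectors
    # randomized phase: sample column i with p_i = min(1, c*q_i), scale by 1/sqrt(p_i)
    q = np.sum(V1**2, axis=0) / k
    p = np.minimum(1.0, c * q)
    cols = np.flatnonzero(np.random.rand(A.shape[1]) < p)
    assert cols.size >= k                         # in practice: resample or double c
    V1s_Ds = V1[:, cols] / np.sqrt(p[cols])
    # deterministic phase: rank revealing QR on the sampled, scaled columns
    _, _, pd = qr(V1s_Ds, pivoting=True)
    important = cols[pd[:k]]
    rest = np.setdiff1d(np.arange(A.shape[1]), important)
    return np.concatenate([important, rest])

A = np.random.randn(200, 80)
k = 10
perm = two_phase_randomized(A, k, c=4*k)          # c = 4k per the experiments below
A1 = A[:, perm[:k]]
```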

Slide 19: SVD Perturbation Bounds
$AP = U \begin{pmatrix} \Sigma_1 & \\ & \Sigma_2 \end{pmatrix} \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix}$, where $V_{11}$ is $k \times k$.
For deterministic algorithms:
$\min_X \|A_1 X - A_2\| \le \|\Sigma_2\| / \sigma_k(V_{11})$
or
$\min_X \|A_1 X - A_2\| \le \|\Sigma_2\| + \|\Sigma_2 V_{21}\| / \sigma_k(V_{11})$

Slide 20: SVD Perturbation Bounds
$AP = U \begin{pmatrix} \Sigma_1 & \\ & \Sigma_2 \end{pmatrix} \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix}$, where $V_{11}$ is $k \times k$.
For deterministic algorithms:
$\min_X \|A_1 X - A_2\| \le \|\Sigma_2\| / \sigma_k(V_{11})$ or $\min_X \|A_1 X - A_2\| \le \|\Sigma_2\| + \|\Sigma_2 V_{21}\| / \sigma_k(V_{11})$.
For the randomized 2-phase algorithm:
$\min_X \|A_1 X - A_2\| \le \|\Sigma_2\| + \|\Sigma_2 V_{21} D_1\| / \sigma_k(V_{11} D_1)$
for any nonsingular matrix $D_1$.

Slide 21: Perturbation Bounds for the Randomized Algorithm
$D_1$ is a scaling matrix:
$\min_X \|A_1 X - A_2\| \le \|\Sigma_2\| + \|\Sigma_2 V_{21} D_1\| / \sigma_k(V_{11} D_1)$
$V_{11} D_1$ comes from the RRQR $(\underbrace{V_{1s} D_s}_{\hat c})\, P_d = (\underbrace{V_{11} D_1}_{k} \ \ * )$, so
$\min_X \|A_1 X - A_2\| \le \|\Sigma_2\| + \sqrt{1 + k(\hat c - k)}\, \|\Sigma_2 V_{21} D_1\| / \sigma_k(V_{1s} D_s)$

Slide 22: Perturbation Bounds for the Randomized Algorithm
$D_1$ is a scaling matrix:
$\min_X \|A_1 X - A_2\| \le \|\Sigma_2\| + \sqrt{1 + k(\hat c - k)}\, \|\Sigma_2 V_{21} D_1\| / \sigma_k(V_{1s} D_s)$
We need, with high probability:
$\|\Sigma_2 V_{21} D_1\| \lesssim \|\Sigma_2\|$ and $\sigma_k(V_{1s} D_s) \gg 0$.

Slide 23: Probabilistic Bounds (Frobenius Norm)
Column $i$ of $X$ is sampled with probability $p_i$.
Scaling matrix: $D_{ii} = 1/\sqrt{p_i}$ with probability $p_i$ (and $D_{ii} = 0$ otherwise).

Slide 24: Probabilistic Bounds (Frobenius Norm)
Column $i$ of $X$ is sampled with probability $p_i$; the scaling matrix has $D_{ii} = 1/\sqrt{p_i}$ with probability $p_i$ (and $0$ otherwise).
Frobenius norm: $\|X D\|_F^2 = \operatorname{trace}(X D^2 X^T)$
Linearity: $E\big[\|X D\|_F^2\big] = \operatorname{trace}\big(X\, \underbrace{E[D^2]}_{I}\, X^T\big)$
Scaling: $E\big[D_{ii}^2\big] = p_i \cdot \tfrac{1}{p_i} + (1 - p_i) \cdot 0 = 1$
Expected value: $E\big[\|X D\|_F^2\big] = \|X\|_F^2$

Slide 25: Probabilistic Bounds (Frobenius Norm)
Column $i$ of $X$ is sampled with probability $p_i$; the scaling matrix has $D_{ii} = 1/\sqrt{p_i}$ with probability $p_i$ (and $0$ otherwise), so that $E\big[\|X D\|_F^2\big] = \|X\|_F^2$.
Markov's inequality:
$\operatorname{Prob}\big[\|X D\|_F^2 \le \alpha\, \|X\|_F^2\big] \ge 1 - \tfrac{1}{\alpha}$
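The expectation identity and the Markov bound on slides 23 to 25 are easy to check by simulation; a small sketch (the test matrix and the uniform choice p_i = 0.3 are arbitrary illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 50))
p = np.full(50, 0.3)                            # arbitrary sampling probabilities

def sampled_norm_sq(X, p, rng):
    keep = rng.random(p.size) < p
    d = np.where(keep, 1.0 / np.sqrt(p), 0.0)   # D_ii = 1/sqrt(p_i) w.p. p_i, else 0
    return np.sum((X * d)**2)                   # ||X D||_F^2

trials = np.array([sampled_norm_sq(X, p, rng) for _ in range(20000)])
print(trials.mean(), np.sum(X**2))              # E||XD||_F^2 vs ||X||_F^2
alpha = 2.0                                     # Markov: P(||XD||_F^2 >= a*||X||_F^2) <= 1/a
print(np.mean(trials >= alpha * np.sum(X**2)), 1 / alpha)
```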

Slide 26: Randomized Subset Selection (Frobenius Norm)
Perturbation bound:
$\min_X \|A_1 X - A_2\|_F \le \|\Sigma_2\|_F + \sqrt{1 + k(\hat c - k)}\, \|\Sigma_2 V_{21} D_1\|_F / \sigma_k(V_{1s} D_s)$
With probability $1 - \tfrac{1}{\alpha}$:
$\|\Sigma_2 V_{21} D_1\|_F \le \sqrt{\alpha}\, \|\Sigma_2 V_2\|_F = \sqrt{\alpha}\, \|\Sigma_2\|_F$
This holds for any probability distribution.

Slide 27: Randomized Subset Selection (Frobenius Norm)
Perturbation bound:
$\min_X \|A_1 X - A_2\|_F \le \|\Sigma_2\|_F + \sqrt{1 + k(\hat c - k)}\, \|\Sigma_2 V_{21} D_1\|_F / \sigma_k(V_{1s} D_s)$
With probability $1 - \tfrac{1}{\alpha}$: $\|\Sigma_2 V_{21} D_1\|_F \le \sqrt{\alpha}\, \|\Sigma_2\|_F$, for any probability distribution.
Still to show: $\sigma_k(V_{1s} D_s) \gg 0$ with high probability.

Slide 28: Sampling (Frobenius Norm)
When is $\sigma_k(V_{1s} D_s) \gg 0$ with high probability?
Expected number of sampled columns: $c$.
Column $i$ of $V_1$ is sampled with probability $p_i = \min\{1, c\, q_i\}$, where $q_i = \|(V_1)_i\|_2^2 / k$: try to sample columns with large norm.
Scaling matrix: $D = \operatorname{diag}\big(1/\sqrt{p_1}, \ldots, 1/\sqrt{p_n}\big)$.
If $c = \Theta(k \log k)$, then with probability $0.9$: $\sigma_k(V_{1s} D_s) \ge 1/2$. [Boutsidis, Mahoney & Drineas 2009]

Slide 29: Sampling (Frobenius Norm)
How many columns should $V_{1s}$ actually have?
Expected number of sampled columns from $V_1$: $c$.
Actual number of columns in $V_{1s}$: $\hat c$, with $E[\hat c] \le c$.

Slide 30: Sampling (Frobenius Norm)
How many columns should $V_{1s}$ actually have?
Expected number of sampled columns from $V_1$: $c$; actual number of columns in $V_{1s}$: $\hat c$, with $E[\hat c] \le c$.
If $c\, q_i \le 1$ for all $i$, and $\hat c \ge k$, then $\sigma_k(V_{1s} D_s) \le \sqrt{\hat c / c}$.
If $\hat c < c/4$ then $\sigma_k(V_{1s} D_s) < 1/2$.
Make sure that enough columns are actually sampled.

Slide 31: Randomized Subset Selection (Two Norm)
$\min_X \|A_1 X - A_2\|_2 \le \|\Sigma_2\|_2 + \sqrt{1 + k(\hat c - k)}\, \|\Sigma_2 V_{21} D_1\|_2 / \sigma_k(V_{1s} D_s)$
Probability distribution: $p_i = \min\{1, c\, q_i\}$ with
$q_i = \dfrac{1}{2}\, \dfrac{\|(V_1)_i\|_2^2}{k} + \dfrac{1}{2} \left( \dfrac{\|(\Sigma_2 V_2)_i\|_2}{\|\Sigma_2 V_2\|_F} \right)^2$
With probability $0.9$:
$\|\Sigma_2 V_{21} D_1\|_2 \le \gamma \left( \dfrac{(n-k+1) \log c}{c} \right)^{1/4} \|\Sigma_2\|_2$
[Boutsidis, Mahoney & Drineas 2009]

Slide 32: Numerical Experiments
Compare strong RRQR and the randomized 2-phase algorithm.
Subset selection in the two norm.
Matrix orders $n \le 500$, $k \le 240$.
Matrices: Kahan, random, scaled random, triangular, numerical rank $k$.
Randomized algorithm: run 40 times.
Iterative determination of $c$: start with $c = 2k$; while $\sigma_k(V_{1s} D_s) < 1/2$, set $c = 2c$.
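The doubling loop for $c$, sketched with the same sampling scheme as the 2-phase sketch after slide 18 (choose_c is a hypothetical helper name; once $c$ is large enough that every $p_i = 1$, the loop terminates because $V_{1s} D_s = V_1$ has $\sigma_k = 1$):

```python
import numpy as np
from scipy.linalg import svd

def choose_c(V1, k, rng):
    """Double c until the sampled, scaled columns satisfy sigma_k >= 1/2."""
    c = 2 * k
    q = np.sum(V1**2, axis=0) / k
    while True:
        p = np.minimum(1.0, c * q)
        cols = np.flatnonzero(rng.random(V1.shape[1]) < p)
        V1s_Ds = V1[:, cols] / np.sqrt(p[cols])
        if cols.size >= k and svd(V1s_Ds, compute_uv=False)[k-1] >= 0.5:
            return c, cols
        c *= 2                        # sigma_k too small: double c and resample

A = np.random.randn(200, 80); k = 10
V1 = svd(A, full_matrices=False)[2][:k, :]
c, cols = choose_c(V1, k, np.random.default_rng(0))
```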

Slide 33: Accuracy
Residuals $\min_X \|A_1 X - A_2\|_2$ for $n = 500$ and $k = 20$.
[Table: for each matrix (Kahan, numerical rank $k$, triangular, random, scaled random), the RRQR residual and the max, min, and mean residuals of the randomized algorithm; the numerical entries were not preserved in this transcription.]
No significant difference in accuracy between the deterministic and randomized algorithms.

Slide 34: Number of Sampled Columns
Values of $c$ for $n = 500$ and $k = 20$.
[Table: for each matrix, the $c$ values tried, the most frequent final $c$, and the mean $\hat c$. Kahan: $c \in \{40, 80, 160, 320, 640\}$, most frequent $2k = 40$; numerical rank $k$, triangular, random, scaled random: $c \in \{40, 80, 160\}$, most frequent $4k = 80$; the mean $\hat c$ values were not preserved in this transcription.]
$c = 4k$ seems to be a good value.

Slide 35: Different Probability Distributions
Two norm:
$q_i = \dfrac{1}{2}\, \dfrac{\|(V_1)_i\|_2^2}{k} + \dfrac{1}{2} \left( \dfrac{\|(\Sigma_2 V_2)_i\|_2}{\|\Sigma_2 V_2\|_F} \right)^2$ : expensive
$q_i = \dfrac{1}{2}\, \dfrac{\|(V_1)_i\|_2^2}{k} + \dfrac{1}{2}\, \dfrac{\|(A)_i\|_2^2 - \|(A V_1^T V_1)_i\|_2^2}{\|A\|_F^2 - \|A V_1^T V_1\|_F^2}$ : numerically unstable (can be negative)
Frobenius norm:
$q_i = \dfrac{\|(V_1)_i\|_2^2}{k}$ : numerically stable and cheap
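A sketch computing the three candidate distributions side by side (the test matrix is an illustrative assumption). Since $V_1$ has orthonormal rows, $A V_1^T V_1$ is the projection of $A$ onto its top-$k$ right singular subspace, so $A - A V_1^T V_1 = U \Sigma_2 V_2$; the subtraction form of the second variant is where floating-point cancellation can drive $q_i$ negative:

```python
import numpy as np
from scipy.linalg import svd

A = np.random.randn(100, 60)
k = 8
_, _, Vt = svd(A, full_matrices=False)
V1 = Vt[:k, :]
E = A - (A @ V1.T) @ V1                    # = U Sigma2 V2, the rank-k residual

# two norm distribution, computed from the residual directly: expensive
q_two_norm = 0.5 * np.sum(V1**2, axis=0) / k \
           + 0.5 * np.sum(E**2, axis=0) / np.sum(E**2)
# same quantity via subtraction: cancellation can make entries negative
q_unstable = 0.5 * np.sum(V1**2, axis=0) / k \
           + 0.5 * (np.sum(A**2, axis=0) - np.sum((A @ V1.T @ V1)**2, axis=0)) \
                 / (np.sum(A**2) - np.sum((A @ V1.T @ V1)**2))
# Frobenius norm distribution: stable and cheap
q_frobenius = np.sum(V1**2, axis=0) / k
print(q_unstable.min())                    # may dip below zero in floating point
```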

Slide 36: Different Probability Distributions
Residuals $\min_X \|A_1 X - A_2\|_2$ for $n = 500$ and $k = 20$.
[Table: for each matrix (numerical rank $k$, triangular, random, scaled random, Kahan) under the Frobenius norm distribution: max, min, and mean residuals and $\hat c$; the numerical entries were not preserved in this transcription.]
No difference in accuracy between the two norm and Frobenius norm probability distributions.

Slide 37: Ideas from the Randomized Algorithm
$AP = U \begin{pmatrix} \Sigma_1 & \\ & \Sigma_2 \end{pmatrix} \begin{pmatrix} V_1 \\ V_2 \end{pmatrix}$, with $V_1 = (\underbrace{V_{11}}_{k} \ \underbrace{V_{12}}_{n-k})$.
Order the columns of $V_1$ by decreasing norm, and apply strong RRQR to the columns of largest norm.
Intuition:
$\|V_{11}\|_F^2 = k - \|V_{12}\|_F^2$: this means $\|V_{11}\|_F$ large implies $\|V_{12}\|_F$ small.
$\sigma_k(V_{11})^2 = 1 - \|V_{12}\|_2^2$: this means $\|V_{12}\|_2$ small implies $\sigma_k(V_{11})$ large.

Slide 38: New 2-Phase Deterministic Algorithm
1. Compute the SVD $A = U \begin{pmatrix} \Sigma_1 & \\ & \Sigma_2 \end{pmatrix} \begin{pmatrix} V_1 \\ V_2 \end{pmatrix}$.
2. Ordering phase: order the columns of $V_1$ by decreasing norm, $V_1 \to V_1 P_O$, and select the leading $4k$ columns: $A P_O = (\underbrace{A_S}_{4k} \ \ * )$.
3. Rank revealing phase: apply strong RRQR to $A_S$: $A_S P_R = (\underbrace{A_1}_{k} \ \ * )$.
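A sketch of the new algorithm under the same stand-in assumption as before (ordinary pivoted QR in place of strong RRQR for the rank revealing phase; the function name is hypothetical, and the sketch assumes $4k \le n$):

```python
import numpy as np
from scipy.linalg import qr, svd

def two_phase_deterministic(A, k):
    _, _, Vt = svd(A, full_matrices=False)
    V1 = Vt[:k, :]
    # ordering phase: sort columns of V1 by decreasing norm, keep the leading 4k
    order = np.argsort(-np.sum(V1**2, axis=0))
    selected = order[:4*k]
    # rank revealing phase: strong RRQR (here: pivoted QR) on the selected columns of A
    _, _, pr = qr(A[:, selected], pivoting=True)
    important = selected[pr[:k]]
    rest = np.setdiff1d(np.arange(A.shape[1]), important)
    return np.concatenate([important, rest])

A = np.random.randn(300, 120)
perm = two_phase_deterministic(A, k=10)
```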

Slide 39: Results for the New Deterministic Algorithm
$n = 2000$ and $k = 40$.
Residuals $\min_X \|A_1 X - A_2\|_2$; time ratio $\mathrm{TR}$ = time(new algorithm) / time(RRQR).
[Table: for each matrix (Kahan, random, scaled random, triangular): residuals and $\sigma_k(A_1)$ for RRQR and for the new algorithm, and TR; the numerical entries were not preserved in this transcription.]
The new algorithm appears to be as accurate as RRQR and possibly faster.

Slide 40: Subset Selection Summary
Given: a real or complex matrix $A$ and an integer $k$.
Want: $AP = (\underbrace{A_1}_{k} \ \ A_2)$ with $\sigma_k(A_1) \approx \sigma_k(A)$ and $\min_X \|A_1 X - A_2\| \approx \sigma_{k+1}(A)$.
Deterministic algorithms: strong RRQR, SVD.
Randomized 2-phase algorithm: no more accurate than strong RRQR for matrices of order up to 2000, and with numerical issues.
New deterministic algorithm: as accurate as strong RRQR and perhaps faster.
