Rank revealing factorizations, and low rank approximations


1 Rank revealing factorizations, and low rank approximations
L. Grigori, Inria Paris, UPMC, February 2017

2 Plan
Low rank matrix approximation
Rank revealing QR factorization
LU_CRTP: truncated LU factorization with column and row tournament pivoting
Experimental results, LU_CRTP
Randomized algorithms for low rank approximation


4 Low rank matrix approximation
Problem: given an $m \times n$ matrix $A$, compute a rank-k approximation $ZW^T$, where $Z$ is $m \times k$ and $W^T$ is $k \times n$.
A problem with diverse applications, from scientific computing (fast solvers for integral equations, H-matrices) to data analytics (principal component analysis, image processing, ...).
Flops: computing $Ax$ costs $2mn$, while computing $Z(W^T x)$ costs $2(m+n)k$.
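A minimal Matlab illustration of the flop counts above (sizes are hypothetical; here $Z$ and $W$ are obtained from a truncated SVD): the approximation is applied as two thin products, never forming $ZW^T$ explicitly.
  % Apply a rank-k approximation Z*W' to a vector without forming Z*W'.
  m = 2000; n = 1000; k = 20;        % illustrative sizes
  A = randn(m, n);
  [U, S, V] = svd(A, 'econ');
  Z = U(:, 1:k) * S(1:k, 1:k);       % m x k
  W = V(:, 1:k);                     % n x k, so A is approximated by Z*W'
  x = randn(n, 1);
  y_full = A * x;                    % ~ 2mn flops
  y_lowrank = Z * (W' * x);          % ~ 2(m+n)k flops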

5 Singular value decomposition
Given $A \in R^{m \times n}$, $m \ge n$, its singular value decomposition is
$A = U \Sigma V^T = \begin{pmatrix} U_1 & U_2 & U_3 \end{pmatrix} \begin{pmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} V_1 & V_2 \end{pmatrix}^T$
where
$U$ is an $m \times m$ orthogonal matrix whose columns are the left singular vectors of $A$; $U_1$ is $m \times k$, $U_2$ is $m \times (n-k)$, $U_3$ is $m \times (m-n)$;
$\Sigma$ is $m \times n$, and its diagonal is formed by $\sigma_1(A) \ge \dots \ge \sigma_n(A) \ge 0$; $\Sigma_1$ is $k \times k$, $\Sigma_2$ is $(n-k) \times (n-k)$;
$V$ is an $n \times n$ orthogonal matrix whose columns are the right singular vectors of $A$; $V_1$ is $n \times k$, $V_2$ is $n \times (n-k)$.

6 Norms
$\|A\|_F = \sqrt{\sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2} = \sqrt{\sigma_1^2(A) + \dots + \sigma_n^2(A)}$, and $\|A\|_2 = \sigma_{\max}(A) = \sigma_1(A)$.
Some properties:
$\|A\|_2 \le \|A\|_F \le \sqrt{\min(m,n)}\, \|A\|_2$
Orthogonal invariance: if $Q \in R^{m \times m}$ and $Z \in R^{n \times n}$ are orthogonal, then $\|QAZ\|_F = \|A\|_F$ and $\|QAZ\|_2 = \|A\|_2$.

7 Low rank matrix approximation
Best rank-k approximation: $A_k = U_k \Sigma_k V_k^T$, the rank-k truncated SVD of $A$ [Eckart and Young, 1936]:
$\min_{\mathrm{rank}(\tilde A_k) \le k} \|A - \tilde A_k\|_2 = \|A - A_k\|_2 = \sigma_{k+1}(A)$ (1)
$\min_{\mathrm{rank}(\tilde A_k) \le k} \|A - \tilde A_k\|_F = \|A - A_k\|_F = \sqrt{\sum_{j=k+1}^n \sigma_j^2(A)}$ (2)
[Figures: original image, rank-38 approximation (SVD), rank-75 approximation (SVD). Image source: https://upload.wikimedia.org/wikipedia/commons/a/a1/alan_turing_aged_16.jpg]
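A small Matlab check of (1) and (2) (a sketch with arbitrary sizes; the Eckart-Young result holds for any matrix):
  n = 200; k = 10;
  A = randn(n) * diag(2.^-(1:n)) * randn(n);     % matrix with decaying spectrum
  [U, S, V] = svd(A);
  Ak = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';     % best rank-k approximation
  s = diag(S);
  norm(A - Ak, 2) - s(k+1)                       % ~ 0, matches (1)
  norm(A - Ak, 'fro') - norm(s(k+1:end))         % ~ 0, matches (2)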

8 Large data sets
Matrix $A$ might not exist entirely at a given time: rows or columns are added progressively.
Streaming algorithm: can solve an arbitrarily large problem with one pass over the data (a row or a column at a time).
Weakly streaming algorithm: can solve a problem with $O(1)$ passes over the data.
Matrix $A$ might exist only implicitly, and is never formed explicitly.

9 Low rank matrix approximation: trade-offs

10 Plan
Low rank matrix approximation
Rank revealing QR factorization
LU_CRTP: truncated LU factorization with column and row tournament pivoting
Experimental results, LU_CRTP
Randomized algorithms for low rank approximation

11 Rank revealing QR factorization
Given $A$ of size $m \times n$, consider the decomposition
$AP_c = QR = Q \begin{pmatrix} R_{11} & R_{12} \\ & R_{22} \end{pmatrix}$, (3)
where $R_{11}$ is $k \times k$, and $P_c$ and $k$ are chosen such that $\|R_{22}\|_2$ is small and $R_{11}$ is well-conditioned.
By the interlacing property of singular values [Golub and Van Loan, 4th edition, p. 487],
$\sigma_i(R_{11}) \le \sigma_i(A)$ and $\sigma_j(R_{22}) \ge \sigma_{k+j}(A)$ for $1 \le i \le k$ and $1 \le j \le n-k$.
In particular, $\sigma_{k+1}(A) \le \sigma_{\max}(R_{22}) = \|R_{22}\|_2$.
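In Matlab, qr with three outputs performs QR with column pivoting, so the bound $\sigma_{k+1}(A) \le \|R_{22}\|_2$ can be checked directly (a sketch):
  m = 300; n = 200; k = 20;
  A = gallery('randsvd', [m n], 1e8);   % test matrix with prescribed conditioning
  [Q, R, Pc] = qr(A);                   % QRCP: A*Pc = Q*R
  R22 = R(k+1:m, k+1:n);
  s = svd(A);
  [s(k+1), norm(R22, 2)]                % sigma_{k+1}(A) <= ||R22||_2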

12 Rank revealing QR factorization
Given $A$ of size $m \times n$, consider the decomposition
$AP_c = QR = Q \begin{pmatrix} R_{11} & R_{12} \\ & R_{22} \end{pmatrix}$. (4)
If $\|R_{22}\|_2$ is small:
$Q(:, 1:k)$ forms an approximate orthogonal basis for the range of $A$: since $a(:,j) = \sum_{i=1}^{\min(j,k)} r_{ij}\, Q(:,i)$, we have $\mathrm{range}(A) \approx \mathrm{span}\{Q(:,1), \dots, Q(:,k)\}$;
$P_c \begin{pmatrix} -R_{11}^{-1} R_{12} \\ I \end{pmatrix}$ is an approximate right null space of $A$.

13 Rank revealing QR factorization
The factorization from equation (4) is rank revealing if
$1 \le \frac{\sigma_i(A)}{\sigma_i(R_{11})}, \frac{\sigma_j(R_{22})}{\sigma_{k+j}(A)} \le q_1(n,k)$
for $1 \le i \le k$ and $1 \le j \le \min(m,n) - k$, where $\sigma_{\max}(A) = \sigma_1(A) \ge \dots \ge \sigma_{\min}(A) = \sigma_n(A)$.
It is strong rank revealing [Gu and Eisenstat, 1996] if in addition $\|R_{11}^{-1} R_{12}\|_{\max} \le q_2(n,k)$.
Gu and Eisenstat show that given $k$ and $f$, there exists a $P_c$ such that $q_1(n,k) = \sqrt{1 + f^2 k(n-k)}$ and $q_2(n,k) = f$.
The factorization is computed in $4mnk$ flops (QRCP) plus $O(mnk)$ flops.

14 QR with column pivoting [Businger and Golub, 1965]
Idea: at the first iteration, each trailing column is decomposed into a part parallel to the first column (or $e_1$) and an orthogonal part (in rows $2:m$). The column of maximum norm is the column with the largest component orthogonal to the first column.
Implementation: at each step of the QR factorization, find the column of maximum norm and permute it into the leading position. If $\mathrm{rank}(A) = k$, at step $k+1$ the maximum norm is 0.
No need to compute the column norms at each step; just downdate them, since
$Q^T v = w = \begin{pmatrix} w_1 \\ w(2:m) \end{pmatrix}$, $\|w(2:m)\|_2^2 = \|v\|_2^2 - w_1^2$.

15 QR with column pivoting [Businger and Golub, 1965]
Sketch of the algorithm (column norm vector: $colnrm(j) = \|A(:,j)\|_2$, $j = 1:n$):
for j = 1:n do
  Find the column p of largest norm among columns j:n
  if colnrm(p) > epsilon then
    1. Pivot: swap columns j and p in A and modify colnrm.
    2. Compute a Householder matrix $H_j$ s.t. $H_j A(j:m, j) = \pm \|A(j:m, j)\|_2\, e_1$.
    3. Update A(j:m, j+1:n) = H_j A(j:m, j+1:n).
    4. Norm downdate: colnrm(j+1:n).^2 = colnrm(j+1:n).^2 - A(j, j+1:n).^2.
  else
    Break
  end if
end for
If the algorithm stops after $k$ steps,
$\sigma_{\max}(R_{22}) \le \sqrt{n-k}\, \max_{1 \le j \le n-k} \|R_{22}(:,j)\|_2 \le \sqrt{n-k}\, \epsilon$.
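A compact runnable Matlab version of this sketch (an illustration only, not the LAPACK dgeqp3 implementation; it uses explicit Householder updates and the norm downdate of step 4):
  function [R, piv] = qrcp(A, tol)
  % QR with column pivoting via Householder reflections (teaching sketch).
  [m, n] = size(A);
  piv = 1:n;
  colnrm = sum(A.^2, 1);                    % squared column norms
  for j = 1:min(m, n)
      [mx, p] = max(colnrm(j:n)); p = p + j - 1;
      if sqrt(max(mx, 0)) <= tol, break; end
      A(:, [j p]) = A(:, [p j]);            % 1) pivot: swap columns j and p
      colnrm([j p]) = colnrm([p j]);
      piv([j p]) = piv([p j]);
      v = A(j:m, j);
      v(1) = v(1) + sign(v(1) + (v(1) == 0)) * norm(v);
      v = v / norm(v);                      % 2) Householder vector
      A(j:m, j:n) = A(j:m, j:n) - 2 * v * (v' * A(j:m, j:n));   % 3) update
      colnrm(j+1:n) = colnrm(j+1:n) - A(j, j+1:n).^2;           % 4) downdate
  end
  R = triu(A);
  end
After the call, A(:, piv) has the pivoted factorization whose triangular factor R is returned.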

16 Strong RRQR [Gu and Eisenstat, 1996]
Since
$\det(R_{11}) = \prod_{i=1}^{k} \sigma_i(R_{11}) = \sqrt{\det(A^T A)} \,\Big/ \prod_{i=1}^{n-k} \sigma_i(R_{22})$,
a strong RRQR is related to a large $\det(R_{11})$. The following algorithm interchanges columns that increase $\det(R_{11})$, given $f$ and $k$.
Compute a strong RRQR factorization, given k:
  Compute $A\Pi = QR$ by using QRCP
  while there exist $i$ and $j$ such that $\det(\bar R_{11})/\det(R_{11}) > f$, where $R_{11} = R(1:k, 1:k)$, $\Pi_{i,j+k}$ permutes columns $i$ and $j+k$, $R\,\Pi_{i,j+k} = \bar Q \bar R$, and $\bar R_{11} = \bar R(1:k, 1:k)$ do
    Find such $i$ and $j$
    Compute $R\,\Pi_{i,j+k} = \bar Q \bar R$ and $\Pi = \Pi\, \Pi_{i,j+k}$
  end while

17 Strong RRQR (contd)
It can be shown that
$\frac{\det(\bar R_{11})}{\det(R_{11})} = \sqrt{(R_{11}^{-1} R_{12})_{i,j}^2 + \omega_i^2(R_{11})\, \chi_j^2(R_{22})}$ (5)
for any $1 \le i \le k$ and $1 \le j \le n-k$ (here $\chi_j(A)$ is the 2-norm of the $j$-th column of $A$, and $\omega_i(A)$ is the 2-norm of the $i$-th row of $A^{-1}$).
Compute a strong RRQR factorization, given k:
  Compute $A\Pi = QR$ by using QRCP
  while $\max_{1 \le i \le k,\, 1 \le j \le n-k} \sqrt{(R_{11}^{-1} R_{12})_{i,j}^2 + \omega_i^2(R_{11})\, \chi_j^2(R_{22})} > f$ do
    Find $i$ and $j$ such that $\sqrt{(R_{11}^{-1} R_{12})_{i,j}^2 + \omega_i^2(R_{11})\, \chi_j^2(R_{22})} > f$
    Compute $R\,\Pi_{i,j+k} = \bar Q \bar R$ and $\Pi = \Pi\, \Pi_{i,j+k}$
  end while

18 Strong RRQR (contd)
$\det(R_{11})$ strictly increases with every permutation and no permutation repeats, hence only a finite number of permutations can be performed.

19 Strong RRQR (contd)
Theorem [Gu and Eisenstat, 1996]: if the QR factorization with column pivoting from equation (4) satisfies the inequality
$\sqrt{(R_{11}^{-1} R_{12})_{i,j}^2 + \omega_i^2(R_{11})\, \chi_j^2(R_{22})} \le f$
for any $1 \le i \le k$ and $1 \le j \le n-k$, then
$1 \le \frac{\sigma_i(A)}{\sigma_i(R_{11})}, \frac{\sigma_j(R_{22})}{\sigma_{k+j}(A)} \le \sqrt{1 + f^2 k(n-k)}$
for any $1 \le i \le k$ and $1 \le j \le \min(m,n) - k$.

20 Sketch of the proof ([Gu and Eisenstat, 1996])
Assume $A$ has full column rank. Let $\alpha = \sigma_{\max}(R_{22}) / \sigma_{\min}(R_{11})$, and write
$R = \begin{pmatrix} R_{11} & \\ & R_{22}/\alpha \end{pmatrix} \begin{pmatrix} I_k & R_{11}^{-1} R_{12} \\ & \alpha I_{n-k} \end{pmatrix} = \bar R\, W$.
We have $\sigma_i(R) \le \sigma_i(\bar R)\, \|W\|_2$, $1 \le i \le n$. Since $\sigma_{\min}(R_{11}) = \sigma_{\max}(R_{22}/\alpha)$, we get $\sigma_i(\bar R) = \sigma_i(R_{11})$ for $1 \le i \le k$. Moreover
$\|W\|_2^2 \le 1 + \|R_{11}^{-1} R_{12}\|_2^2 + \alpha^2 \le 1 + \|R_{11}^{-1} R_{12}\|_F^2 + \|R_{11}^{-1}\|_F^2\, \|R_{22}\|_F^2 = 1 + \sum_{i=1}^{k} \sum_{j=1}^{n-k} \big( (R_{11}^{-1} R_{12})_{i,j}^2 + \omega_i^2(R_{11})\, \chi_j^2(R_{22}) \big) \le 1 + f^2 k(n-k)$.
We obtain $\sigma_i(A) \le \sigma_i(R_{11})\, \sqrt{1 + f^2 k(n-k)}$.

21 Tournament pivoting [Demmel et al., 2015]
One step of CA-RRQR: tournament pivoting is used to select $k$ columns.
Partition $A = (A_1, A_2, A_3, A_4)$. Select $k$ cols from each column block, by using QR with column pivoting.
At each level $i$ of the tree, at each node $j$ do in parallel:
  Let $A_{v,i-1}$, $A_{w,i-1}$ be the cols selected by the children of node $j$.
  Select $k$ cols from $(A_{v,i-1}, A_{w,i-1})$, by using QR with column pivoting.
Permute the selected columns into leading positions and compute QR with no pivoting:
$AP_{c1} = Q_1 \begin{pmatrix} R_{11} & R_{12} \\ & R_{22} \end{pmatrix}$.


27 Select k columns from a tall and skinny matrix
Given $W$ of size $m \times 2k$, $m \gg k$, $k$ columns are selected as follows:
$W = Q R_{02}$ using TSQR,
$R_{02} P_c = Q_2 R_2$ using QRCP,
return $W P_c(:, 1:k)$.
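A serial Matlab sketch of this selection (plain qr stands in for the parallel TSQR; the function name is illustrative):
  function cols = select_k_cols(W, k)
  % Select k columns of a tall-skinny W via its R factor + QRCP.
  [~, R02] = qr(W, 0);               % stand-in for TSQR; R02 is 2k x 2k
  [~, ~, p] = qr(R02, 'vector');     % QRCP on the small factor: R02(:,p) = Q2*R2
  cols = p(1:k);                     % indices of the k selected columns of W
  end
W(:, cols) then plays the role of $W P_c(:, 1:k)$.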

28 Reduction trees
Any shape of reduction tree can be used during CA-RRQR, depending on the underlying architecture.
Binary tree: the leaves $A_{00}, A_{10}, A_{20}, A_{30}$ are reduced pairwise: $f(A_{00})$ and $f(A_{10})$ are merged into $f(A_{01})$, $f(A_{20})$ and $f(A_{30})$ into $f(A_{11})$, and $f(A_{01})$, $f(A_{11})$ into the root $f(A_{02})$.
Flat tree: $f(A_{00})$ is merged with $A_{10}$ to give $f(A_{01})$, then with $A_{20}$ to give $f(A_{02})$, then with $A_{30}$ to give $f(A_{03})$.
Notation: at each node of the reduction tree, $f(A_{ij})$ returns the first $b$ columns obtained after performing (strong) RRQR of $A_{ij}$. A serial sketch of the flat-tree variant follows.
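This is a minimal Matlab sketch of flat-tree tournament pivoting (QRCP plays the role of f at each node; the block size and function names are illustrative):
  function cols = tournament_flat(A, k)
  % Flat-tree tournament: sweep column blocks, carrying k candidate columns.
  n = size(A, 2);
  cand = pick(A, 1:min(2*k, n), k);        % f(A_00) on the first block
  for j0 = 2*k+1 : k : n
      blk = j0 : min(j0 + k - 1, n);
      cand = pick(A, [cand, blk], k);      % merge winners with the next block
  end
  cols = cand;
  end

  function sel = pick(A, idx, k)
  % Select (up to) k columns among A(:, idx) by QR with column pivoting.
  [~, ~, p] = qr(A(:, idx), 'vector');
  sel = idx(p(1:min(k, numel(p))));
  end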

29 Rank revealing properties of CA-RRQR
It is shown in [Demmel et al., 2015] that the column permutation computed by CA-RRQR satisfies
$\chi_j^2(R_{11}^{-1} R_{12}) + \big( \chi_j(R_{22}) / \sigma_{\min}(R_{11}) \big)^2 \le F_{TP}^2$, for $j = 1, \dots, n-k$, (6)
where $F_{TP}$ depends on $k$, $f$, $n$, the shape of the reduction tree used during tournament pivoting, and the number of iterations of CA-RRQR.

30 CA-RRQR - bounds for one tournament
Selecting $k$ columns by using tournament pivoting reveals the rank of $A$ with the following bounds:
$1 \le \frac{\sigma_i(A)}{\sigma_i(R_{11})}, \frac{\sigma_j(R_{22})}{\sigma_{k+j}(A)} \le \sqrt{1 + F_{TP}^2 (n-k)}$, $\|R_{11}^{-1} R_{12}\|_{\max} \le F_{TP}$.
Binary tree of depth $\log_2(n/k)$:
$F_{TP} \le \frac{1}{\sqrt{2k}} \, (n/k)^{\log_2(\sqrt 2 f k)}$. (7)
The upper bound is a decreasing function of $k$ when $k > n/(\sqrt 2 f)$.
Flat tree of depth $n/k$:
$F_{TP} \le \frac{1}{\sqrt{2k}} \, (\sqrt 2 f k)^{n/k}$. (8)

31 Cost of CA-RRQR
Cost of CA-RRQR vs QR with column pivoting, for an $n \times n$ matrix on a $\sqrt P \times \sqrt P$ processor grid, block size $k$:
Flops: $4n^3/P + O(n^2 k \log P / \sqrt P)$ vs $(4/3) n^3 / P$
Bandwidth: $O(n^2 \log P / \sqrt P)$ vs same
Latency: $O(n \log P / k)$ vs $O(n \log P)$
Communication optimal, modulo polylogarithmic factors, by choosing $k = \frac{1}{2 \log_2 P} \cdot \frac{n}{\sqrt P}$.

32 Numerical results
Stability is close to QRCP for many tested matrices. The absolute values of the diagonal entries of $R$ (resp. $L$) are referred to as R-values (resp. L-values).
Methods compared:
RRQR: QR with column pivoting
CA-RRQR-B: tournament pivoting based on a binary tree
CA-RRQR-F: tournament pivoting based on a flat tree
SVD

33 Numerical results - devil's stairs
Devil's stairs (Stewart), a matrix with multiple gaps in the singular values. Matlab code:
  Len = 20; s = zeros(n,1); Nst = floor(n/Len);
  for i = 1:Nst
      s(1+Len*(i-1):Len*i) = -0.6*(i-1);
  end
  s(Len*Nst:end) = -0.6*(Nst-1);
  s = 10.^s;
  A = orth(rand(n)) * diag(s) * orth(randn(n));
QLP decomposition (Stewart): $AP_{c1} = Q_1 R_1$ using CA-RRQR, then $R_1^T = Q_2 R_2$.
[Figures: R-values and singular values with trustworthiness check, and L-values after QLP, versus column index, for QRCP, CARRQR-B, CARRQR-F, and the SVD.]


35 Numerical results (contd)
[Figures: R-values, singular values, and trustworthiness checks for QRCP, CARRQR-B, CARRQR-F, and the SVD.
Left: exponent - exponential distribution, $\sigma_1 = 1$, $\sigma_i = \alpha^{i-1}$ ($i = 2, \dots, n$), $\alpha = 10^{-1/11}$ [Bischof, 1991].
Right: shaw - 1D image restoration model [Hansen, 2007].]
The trustworthiness checks plot
$\epsilon \cdot \min\{ \|(A\Pi_0)(:,i)\|_2, \|(A\Pi_1)(:,i)\|_2, \|(A\Pi_2)(:,i)\|_2 \}$ (9)
$\epsilon \cdot \max\{ \|(A\Pi_0)(:,i)\|_2, \|(A\Pi_1)(:,i)\|_2, \|(A\Pi_2)(:,i)\|_2 \}$ (10)
where $\Pi_j$ ($j = 0, 1, 2$) are the permutation matrices obtained by QRCP, CARRQR-B, and CARRQR-F, and $\epsilon$ is the machine precision.

36 Numerical results - a set of 18 matrices
[Figure: ratios $|R(i,i)| / \sigma_i(R)$ for QRCP (top plot), CARRQR-B (second plot), and CARRQR-F (third plot). The numbers along the x-axis are the indices of the test matrices.]

37 Plan
Low rank matrix approximation
Rank revealing QR factorization
LU_CRTP: truncated LU factorization with column and row tournament pivoting
Experimental results, LU_CRTP
Randomized algorithms for low rank approximation

38 LU versus QR - filled graph G+(A)
Consider $A$ SPD with $A = LL^T$. Given $G(A) = (V, E)$, the filled graph $G^+(A) = (V, E^+)$ is defined as follows: there is an edge $(i,j)$ in $G^+(A)$ iff there is a path from $i$ to $j$ in $G(A)$ going through lower-numbered vertices.
$G(L + L^T) = G^+(A)$, ignoring cancellations. The definition also holds for directed graphs (LU factorization).
[Example: sparsity patterns of $A$ and of $L + L^T$, illustrating the fill-in.]

39 LU versus QR
Filled column intersection graph $G_\cap^+(A)$: the graph of the Cholesky factor of $A^T A$.
$G(R) \subseteq G_\cap^+(A)$.
$A^T A$ can have many more nonzeros than $A$.

40 LU versus QR - numerical stability
Let $\hat L$ and $\hat U$ be the computed factors of the block LU factorization. Then
$\hat L \hat U = A + E$, $\|E\|_{\max} \le c(n)\, \epsilon\, \big( \|A\|_{\max} + \|\hat L\|_{\max} \|\hat U\|_{\max} \big)$. (11)
For partial pivoting, $\|L\|_{\max} \le 1$ and $\|U\|_{\max} \le 2^n \|A\|_{\max}$; in practice, $\|U\|_{\max} \le n \|A\|_{\max}$.

41 Low rank approximation based on LU factorization
Given the desired rank $k$, the factorization has the form
$P_r A P_c = \begin{pmatrix} \bar A_{11} & \bar A_{12} \\ \bar A_{21} & \bar A_{22} \end{pmatrix} = \begin{pmatrix} I & \\ \bar A_{21} \bar A_{11}^{-1} & I \end{pmatrix} \begin{pmatrix} \bar A_{11} & \bar A_{12} \\ & S(\bar A_{11}) \end{pmatrix}$, (12)
where $A \in R^{m \times n}$, $\bar A_{11} \in R^{k \times k}$, and $S(\bar A_{11}) = \bar A_{22} - \bar A_{21} \bar A_{11}^{-1} \bar A_{12}$.
The rank-k approximation matrix $\tilde A_k$ is
$\tilde A_k = \begin{pmatrix} I \\ \bar A_{21} \bar A_{11}^{-1} \end{pmatrix} \begin{pmatrix} \bar A_{11} & \bar A_{12} \end{pmatrix} = \begin{pmatrix} \bar A_{11} \\ \bar A_{21} \end{pmatrix} \bar A_{11}^{-1} \begin{pmatrix} \bar A_{11} & \bar A_{12} \end{pmatrix}$. (13)
$\bar A_{11}^{-1}$ is never formed; its factorization is used when $\tilde A_k$ is applied to a vector.
In randomized algorithms, $U = C^+ A R^+$, where $C^+$, $R^+$ are Moore-Penrose generalized inverses.
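A Matlab sketch of applying $\tilde A_k$ from (13) to a vector, given the permuted matrix, using an LU factorization of $\bar A_{11}$ in place of its inverse (the helper name is illustrative):
  function y = apply_Atilde(Abar, k, x)
  % y = Atilde_k * x for Atilde_k = [A11; A21] * inv(A11) * [A11 A12],
  % where Abar = Pr*A*Pc and A11 = Abar(1:k, 1:k); inv(A11) is never formed.
  A11 = Abar(1:k, 1:k);
  [L11, U11, p] = lu(A11, 'vector');   % factor A11 once
  t = Abar(1:k, :) * x;                % t = [A11 A12] * x
  t = U11 \ (L11 \ t(p));              % t = A11^{-1} * t via the factors
  y = Abar(:, 1:k) * t;                % y = [A11; A21] * t
  end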

42 Design space
Non-exhaustive list for selecting k columns and rows:
1. Select k linearly independent columns of A (call the result B), by using:
  1.1 (strong) QRCP / tournament pivoting using QR,
  1.2 LU / tournament pivoting based on LU, with some form of pivoting (column, complete, rook),
  1.3 randomization: premultiply X = ZA, where the random matrix Z is short and fat, then pick k rows from X^T by some method from 2) below,
  1.4 tournament pivoting based on randomized algorithms to select columns at each step.
2. Select k linearly independent rows of B, by using:
  2.1 (strong) QRCP / tournament pivoting based on QR, on B^T or on Q^T, the rows of the thin Q factor of B,
  2.2 LU / tournament pivoting based on LU, with pivoting (row, complete, rook), on B,
  2.3 tournament pivoting based on randomized algorithms to select rows.

43 Select k cols using tournament pivoting
Partition $A = (A_1, A_2, A_3, A_4)$. Select $k$ cols from each column block, by using QR with column pivoting.
At each level $i$ of the tree, at each node $j$ do in parallel:
  Let $A_{v,i-1}$, $A_{w,i-1}$ be the cols selected by the children of node $j$.
  Select $k$ cols from $(A_{v,i-1}, A_{w,i-1})$, by using QR with column pivoting.
Return the columns in $A_{ji}$.


49 LU_CRTP factorization - one block step
One step of truncated block LU based on column/row tournament pivoting on a matrix $A$ of size $m \times n$:
1. Select $k$ columns by using tournament pivoting and permute them in front; the bounds for the singular values are governed by $q_1(n,k)$:
$AP_c = Q \begin{pmatrix} R_{11} & R_{12} \\ & R_{22} \end{pmatrix} = \begin{pmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{pmatrix} \begin{pmatrix} R_{11} & R_{12} \\ & R_{22} \end{pmatrix}$
2. Select $k$ rows from $(Q_{11}; Q_{21})$, of size $m \times k$, by using tournament pivoting:
$P_r Q = \begin{pmatrix} \bar Q_{11} & \bar Q_{12} \\ \bar Q_{21} & \bar Q_{22} \end{pmatrix}$
such that $\|\bar Q_{21} \bar Q_{11}^{-1}\|_{\max} \le F_{TP}$, with bounds for the singular values governed by $q_2(m,k)$.
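A dense Matlab sketch of one such block step, with plain QRCP standing in for both tournaments (tournament pivoting is the communication-avoiding replacement for these two QRCP calls):
  function [pr, pc] = lucrtp_step(A, k)
  % Select k columns of A, then k rows, as in steps 1 and 2 above.
  [~, ~, pc] = qr(A, 'vector');        % 1) column selection (QRCP stand-in)
  [Q, ~] = qr(A(:, pc(1:k)), 0);       %    thin Q factor of the k columns
  [~, ~, pr] = qr(Q', 'vector');       % 2) row selection: QRCP on Q'
  end
The leading k entries of pc and pr define $P_c$ and $P_r$; the block factorization (12) is then formed on A(pr, pc).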

50 Orthogonal matrices
Given an orthogonal matrix $Q \in R^{m \times m}$ and its partitioning
$Q = \begin{pmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{pmatrix}$, (14)
the selection of $k$ cols by tournament pivoting from $(Q_{11}; Q_{21})^T$ leads to the factorization
$P_r Q = \begin{pmatrix} \bar Q_{11} & \bar Q_{12} \\ \bar Q_{21} & \bar Q_{22} \end{pmatrix} = \begin{pmatrix} I & \\ \bar Q_{21} \bar Q_{11}^{-1} & I \end{pmatrix} \begin{pmatrix} \bar Q_{11} & \bar Q_{12} \\ & S(\bar Q_{11}) \end{pmatrix}$ (15)
where $S(\bar Q_{11}) = \bar Q_{22} - \bar Q_{21} \bar Q_{11}^{-1} \bar Q_{12} = \bar Q_{22}^{-T}$.

51 Orthogonal matrices (contd)
The factorization
$P_r Q = \begin{pmatrix} \bar Q_{11} & \bar Q_{12} \\ \bar Q_{21} & \bar Q_{22} \end{pmatrix} = \begin{pmatrix} I & \\ \bar Q_{21} \bar Q_{11}^{-1} & I \end{pmatrix} \begin{pmatrix} \bar Q_{11} & \bar Q_{12} \\ & S(\bar Q_{11}) \end{pmatrix}$ (16)
satisfies
$\rho_j(\bar Q_{21} \bar Q_{11}^{-1}) \le F_{TP}$, (17)
$\frac{1}{q_2(m,k)} \le \sigma_i(\bar Q_{11}) \le 1$, (18)
$\sigma_{\min}(\bar Q_{11}) = \sigma_{\min}(\bar Q_{22})$ (19)
for all $1 \le i \le k$, $1 \le j \le m-k$, where $\rho_j(A)$ is the 2-norm of the $j$-th row of $A$ and $q_2(m,k) = \sqrt{1 + F_{TP}^2 (m-k)}$.

52 Sketch of the proof
$P_r A P_c = \begin{pmatrix} \bar A_{11} & \bar A_{12} \\ \bar A_{21} & \bar A_{22} \end{pmatrix} = \begin{pmatrix} I & \\ \bar A_{21} \bar A_{11}^{-1} & I \end{pmatrix} \begin{pmatrix} \bar A_{11} & \bar A_{12} \\ & S(\bar A_{11}) \end{pmatrix} = \begin{pmatrix} I & \\ \bar Q_{21} \bar Q_{11}^{-1} & I \end{pmatrix} \begin{pmatrix} \bar Q_{11} & \bar Q_{12} \\ & S(\bar Q_{11}) \end{pmatrix} \begin{pmatrix} R_{11} & R_{12} \\ & R_{22} \end{pmatrix}$ (20)
where $\bar Q_{21} \bar Q_{11}^{-1} = \bar A_{21} \bar A_{11}^{-1}$ and $S(\bar A_{11}) = S(\bar Q_{11})\, R_{22} = \bar Q_{22}^{-T} R_{22}$.

53 Sketch of the proof (contd)
We obtain
$\bar A_{11} = \bar Q_{11} R_{11}$, (21)
$S(\bar A_{11}) = S(\bar Q_{11})\, R_{22} = \bar Q_{22}^{-T} R_{22}$. (22)
We then have
$\sigma_i(A) \ge \sigma_i(\bar A_{11}) \ge \sigma_{\min}(\bar Q_{11})\, \sigma_i(R_{11}) \ge \frac{\sigma_i(A)}{q_1(n,k)\, q_2(m,k)}$,
$\sigma_{k+j}(A) \le \sigma_j(S(\bar A_{11})) = \sigma_j(S(\bar Q_{11})\, R_{22}) \le \|S(\bar Q_{11})\|_2\, \sigma_j(R_{22}) \le q_1(n,k)\, q_2(m,k)\, \sigma_{k+j}(A)$,
where $q_1(n,k) = \sqrt{1 + F_{TP}^2 (n-k)}$ and $q_2(m,k) = \sqrt{1 + F_{TP}^2 (m-k)}$.

54 LU_CRTP factorization - bounds if rank = k
Given $A$ of size $m \times n$, one step of LU_CRTP computes the decomposition
$\bar A = P_r A P_c = \begin{pmatrix} \bar A_{11} & \bar A_{12} \\ \bar A_{21} & \bar A_{22} \end{pmatrix} = \begin{pmatrix} I & \\ \bar Q_{21} \bar Q_{11}^{-1} & I \end{pmatrix} \begin{pmatrix} \bar A_{11} & \bar A_{12} \\ & S(\bar A_{11}) \end{pmatrix}$ (23)
where $\bar A_{11}$ is of size $k \times k$ and
$S(\bar A_{11}) = \bar A_{22} - \bar A_{21} \bar A_{11}^{-1} \bar A_{12} = \bar A_{22} - \bar Q_{21} \bar Q_{11}^{-1} \bar A_{12}$. (24)
It satisfies the following properties:
$\rho_l(\bar A_{21} \bar A_{11}^{-1}) = \rho_l(\bar Q_{21} \bar Q_{11}^{-1}) \le F_{TP}$, (25)
$\|S(\bar A_{11})\|_{\max} \le \min\big( (1 + F_{TP} \sqrt k)\, \|A\|_{\max},\; F_{TP} \sqrt{1 + F_{TP}^2 (m-k)}\; \sigma_k(A) \big)$,
$1 \le \frac{\sigma_i(A)}{\sigma_i(\bar A_{11})}, \frac{\sigma_j(S(\bar A_{11}))}{\sigma_{k+j}(A)} \le q(m,n,k)$, (26)
for any $1 \le l \le m-k$, $1 \le i \le k$, and $1 \le j \le \min(m,n) - k$, where $q(m,n,k) = \sqrt{(1 + F_{TP}^2 (n-k))\, (1 + F_{TP}^2 (m-k))}$.

55 LU_CRTP factorization - bounds if rank = K = Tk
Consider $T$ block steps of the LU_CRTP factorization:
$P_r A P_c = \begin{pmatrix} I & & & & \\ L_{21} & I & & & \\ \vdots & \vdots & \ddots & & \\ L_{T1} & L_{T2} & \cdots & I & \\ L_{T+1,1} & L_{T+1,2} & \cdots & L_{T+1,T} & I \end{pmatrix} \begin{pmatrix} U_{11} & U_{12} & \cdots & U_{1T} & U_{1,T+1} \\ & U_{22} & \cdots & U_{2T} & U_{2,T+1} \\ & & \ddots & \vdots & \vdots \\ & & & U_{TT} & U_{T,T+1} \\ & & & & U_{T+1,T+1} \end{pmatrix}$ (27)
where $U_{tt}$ is $k \times k$ for $1 \le t \le T$, and $U_{T+1,T+1}$ is $(m - Tk) \times (n - Tk)$. Then:
$\rho_l(L_{i+1,j}) \le F_{TP}$ for any $1 \le l \le k$,
$\|U\|_{\max} \le \min\big( (1 + F_{TP} \sqrt k)^{K/k}\, \|A\|_{\max},\; q_2(m,k)\, q(m,n,k)^{K/k - 1}\, \sigma_K(A) \big)$,
where $q_2(m,k) = \sqrt{1 + F_{TP}^2 (m-k)}$ and $q(m,n,k) = \sqrt{(1 + F_{TP}^2 (n-k))\, (1 + F_{TP}^2 (m-k))}$.

56 LU_CRTP factorization - bounds if rank = K = Tk (contd)
Consider $T = K/k$ block steps of the LU_CRTP factorization, as in (27). Then:
$1 \le \frac{\sigma_{(t-1)k+i}(A)}{\sigma_i(U_{tt})} \le q(m - (t-1)k,\, n - (t-1)k,\, k)\, \prod_{v=0}^{t-2} q(m - vk,\, n - vk,\, k)$,
$1 \le \frac{\sigma_j(U_{T+1,T+1})}{\sigma_{K+j}(A)} \le \prod_{v=0}^{K/k - 1} q(m - vk,\, n - vk,\, k)$,
for any $1 \le i \le k$, $1 \le t \le T$, and $1 \le j \le \min(m,n) - K$, where $q(m,n,k) = \sqrt{(1 + F_{TP}^2 (n-k))\, (1 + F_{TP}^2 (m-k))}$.

57 Arithmetic complexity - arbitrary sparse matrices
Let $d_i$ be the number of nonzeros in column $i$ of $A$, $nnz(A) = \sum_{i=1}^{n} d_i$. $A$ is permuted such that $d_1 \le \dots \le d_n$, and $A = [A_{00}, \dots, A_{n/k,0}]$ is partitioned into $n/k$ blocks of columns.
At the first step of TP, pick $k$ cols from $A_1 = [A_{00}, A_{10}]$:
$nnz(A_1) \le \sum_{i=1}^{2k} d_i$, $flops_{QR}(A_1) \le 8k^2 \sum_{i=1}^{2k} d_i$.
At the second step of TP, pick $k$ cols from $A_2$:
$nnz(A_2) \le \sum_{i=k+1}^{3k} d_i$, $flops_{QR}(A_2) \le 8k^2 \sum_{i=k+1}^{3k} d_i$.
Bounds attained when: [the example sparsity pattern of A was not recovered].

58 Arithmetic complexity - arbitrary sparse matrices (2)
$nnz_{\max}(TP_{FT}) \le 4\, d_n\, k$,
$nnz_{total}(TP_{FT}) \le 2 \Big( \sum_{i=1}^{2k} d_i + \sum_{i=k+1}^{3k} d_i + \dots + \sum_{i=n-2k+1}^{n} d_i \Big) k \le 4k \sum_{i=1}^{n} d_i = 4\, nnz(A)\, k$,
$flops(TP_{FT}) \le 16\, nnz(A)\, k^2$.

59 Tournament pivoting for sparse matrices
Arithmetic complexity:
If $A$ has an arbitrary sparsity structure: $flops(TP_{FT}) \le 16\, nnz(A)\, k^2$, $flops(TP_{BT}) \le 8 \frac{nnz(A)}{P} k^2 \log \frac{n}{Pk}$ (on $P$ processors).
If $G(A^T A)$ is an $n^{1/2}$-separable graph: $flops(TP_{FT}) \le O(nnz(A)\, k^{3/2})$, $flops(TP_{BT}) \le O\big( \frac{nnz(A)}{P} k^{3/2} \log \frac{n}{Pk} \big)$.
Randomized algorithm by Clarkson and Woodruff, STOC'13: given an $n \times n$ matrix $A$, it computes $LDW^T$, where $D$ is $k \times k$, such that with failure probability 1/10
$\|A - LDW^T\|_F \le (1 + \epsilon) \|A - A_k\|_F$, where $A_k$ is the best rank-k approximation.
The cost of this algorithm is $flops \le O(nnz(A)) + n k^2 \epsilon^{-4} \log^{O(1)}(n k^2 \epsilon^{-4})$.
Tournament pivoting is faster if $\epsilon \le 1/(nnz(A)/n)^{1/4}$, or if $\epsilon = 0.1$ and $nnz(A)/n \le 10^4$.


61 Plan
Low rank matrix approximation
Rank revealing QR factorization
LU_CRTP: truncated LU factorization with column and row tournament pivoting
Experimental results, LU_CRTP
Randomized algorithms for low rank approximation

62 Numerical results
[Figures: evolution of the singular values computed by QRCP, LU-CRQRCP, LU-CRTP, and the SVD, versus the index of the singular values.
Left: exponent - exponential distribution, $\sigma_1 = 1$, $\sigma_i = \alpha^{i-1}$ ($i = 2, \dots, n$), $\alpha = 10^{-1/11}$ [Bischof, 1991].
Right: foxgood - severely ill-posed test problem of the 1st kind Fredholm integral equation, used by Fox and Goodwin.]

63 Numerical results
Here $k = 16$ and the factorization is truncated at $K = 128$ (bars) or $K = 240$ (red lines).
LU_CTP: column tournament pivoting + partial pivoting.
All singular values smaller than the machine precision $\epsilon$ are replaced by $\epsilon$.
[Figure: the numbers along the x-axis are the indices of the test matrices.]

64 Results for image
[Figures: original image; singular value distribution; rank-38 approximations by SVD, LUPP, and LU_CRTP; rank-75 approximation by LU_CRTP. Image size not recovered.]

65 Results for image
[Figures: original image; singular value distribution; rank-105 approximations by SVD, LUPP, and LU_CRTP; rank-209 approximation by LU_CRTP. Image size not recovered.]

66 Comparing nnz in the factors L, U versus Q, R
[Table with columns: Name/size; Nnz A(:, 1:K); Rank K; Nnz QRCP / Nnz LU_CRTP; Nnz LU_CRTP / Nnz LUPP. Rows: gemat, wang, RFdevice, Parab_fem, Mac_econ. The numeric values were not recovered.]

67 Performance results
Selection of 256 columns by tournament pivoting.
Edison, Cray XC30 (NERSC): 2 x 12-core Intel Ivy Bridge (2.4 GHz).
Tournament pivoting uses SPQR (T. Davis) + dgeqp3 (LAPACK); time in seconds.
Matrices: Parab_fem, Mac_econ (dimensions at the leaves on 32 procs not recovered).
[Figures/tables: time versus number of MPI processes, and time for selecting 2k columns on 32 procs with SPQR + dgeqp3.]

68 Plan
Low rank matrix approximation
Rank revealing QR factorization
LU_CRTP: truncated LU factorization with column and row tournament pivoting
Experimental results, LU_CRTP
Randomized algorithms for low rank approximation

69 Randomized algorithms - main idea
Construct a low dimensional subspace that captures the action of $A$, then restrict $A$ to the subspace and compute a standard QR or SVD factorization. Obtained as follows:
1. Compute an approximate basis for the range of $A$ ($m \times n$): find $Q$ ($m \times k$) with orthonormal columns and approximate $A$ by the projection of its columns onto the space spanned by $Q$: $A \approx QQ^T A$.
2. Use $Q$ to compute a standard factorization of $A$.
Source: Halko et al., Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions, SIREV 2011.

70 Why a random projection works
Johnson-Lindenstrauss lemma: for any $0 < \epsilon < 1$ and any set of vectors $x_1, \dots, x_n$ in $R^m$, let $k \ge 4 (\epsilon^2/2 - \epsilon^3/3)^{-1} \ln(n)$, and let $F$ be a random $k \times m$ orthogonal matrix multiplied by $\sqrt{m/k}$. Then, with probability at least $1/n$, for all $1 \le i, j \le n$:
$(1 - \epsilon) \|x_i - x_j\|^2 \le \|F(x_i - x_j)\|^2 \le (1 + \epsilon) \|x_i - x_j\|^2$.
Any $m$-vector can be embedded in $k = O(\log(n)/\epsilon^2)$ dimensions while incurring a distortion of at most $1 \pm \epsilon$ between all pairs of $m$-vectors.
JL relies on $F$ being a uniformly distributed random orthonormal matrix. Such an $F$ can be obtained by computing the QR factorization of an $m \times k$ matrix of i.i.d. $N(0,1)$ random variables.
Source: Theorem 2.1 and its proof in S. Dasgupta and A. Gupta, An Elementary Proof of a Theorem of Johnson and Lindenstrauss, 2003.
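A quick Matlab experiment illustrating the lemma (a sketch; F is built exactly as described, from the QR of a Gaussian matrix):
  m = 2000; n = 50; ep = 0.25;
  k = ceil(4 * log(n) / (ep^2/2 - ep^3/3));     % JL embedding dimension
  X = randn(m, n);                              % n points in R^m (columns)
  [Q, ~] = qr(randn(m, k), 0);                  % m x k, orthonormal columns
  F = sqrt(m/k) * Q';                           % scaled k x m projection
  Y = F * X;
  worst = 0;
  for a = 1:n-1
      for b = a+1:n
          r = norm(Y(:,a) - Y(:,b))^2 / norm(X(:,a) - X(:,b))^2;
          worst = max(worst, abs(r - 1));       % observed distortion
      end
  end
  worst                                          % typically below ep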

71 Typical randomized truncated SVD
Algorithm. Input: $m \times n$ matrix $A$, desired rank $k$, $l = p + k$, exponent $q$.
1. Sample an $n \times l$ test matrix $G$ with independent mean-zero, unit-variance Gaussian entries.
2. Compute $Y = (AA^T)^q A G$. /* Y is expected to span the column space of A */
3. Construct $Q \in R^{m \times l}$ with columns forming an orthonormal basis for the range of $Y$.
4. Compute $B = Q^T A$.
5. Compute the SVD of $B = \hat U \tilde \Sigma \tilde V^T$.
Return the approximation $\tilde A_k = Q \hat U \tilde \Sigma \tilde V^T$.
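A direct Matlab transcription of this algorithm (a sketch; the only liberty taken is re-orthonormalizing between the power-iteration products in step 2, which is standard practice for numerical stability):
  function [U, S, V] = rsvd(A, k, p, q)
  % Randomized truncated SVD of A: rank k, oversampling p, exponent q.
  l = k + p;
  G = randn(size(A, 2), l);           % 1) Gaussian test matrix
  Y = A * G;                          % 2) Y = (A*A')^q * A * G
  for t = 1:q
      [Y, ~] = qr(Y, 0);              %    orthonormalize for stability
      [Z, ~] = qr(A' * Y, 0);
      Y = A * Z;
  end
  [Q, ~] = qr(Y, 0);                  % 3) orthonormal basis of range(Y)
  B = Q' * A;                         % 4) l x n
  [Ub, S, V] = svd(B, 'econ');        % 5) SVD of the small matrix
  U = Q * Ub;
  U = U(:, 1:k); S = S(1:k, 1:k); V = V(:, 1:k);   % Atilde_k = U*S*V'
  end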

72 Randomized truncated SVD (q = 0)
The best approximation is obtained when $Q$ equals the first $k+p$ left singular vectors of $A$: given $A = U \Sigma V^T$,
$QQ^T A = U(1:m, 1:k+p)\, \Sigma(1:k+p, 1:k+p)\, (V(1:n, 1:k+p))^T$, and $\|A - QQ^T A\|_2 = \sigma_{k+p+1}$.
Theorem 1.1 from Halko et al.: if $G$ is chosen i.i.d. $N(0,1)$, $p \ge 2$, $q = 0$, then the expectation with respect to the random matrix $G$ satisfies
$E(\|A - QQ^T A\|_2) \le \Big( 1 + \frac{4 \sqrt{k+p}}{p-1} \sqrt{\min(m,n)} \Big)\, \sigma_{k+1}(A)$,
and the probability that the error satisfies
$\|A - QQ^T A\|_2 \le \big( 1 + 11 \sqrt{(k+p) \cdot \min(m,n)} \big)\, \sigma_{k+1}(A)$
is at least $1 - 6/p^p$. For $p = 6$, this probability is at least 0.9998.

73 Randomized truncated SVD
Theorem 10.6, Halko et al. (average spectral norm): under the same hypotheses as Theorem 1.1 from Halko et al.,
$E(\|A - QQ^T A\|_2) \le \Big( 1 + \sqrt{\frac{k}{p-1}} \Big)\, \sigma_{k+1}(A) + \frac{e \sqrt{k+p}}{p} \Big( \sum_{j=k+1}^{n} \sigma_j^2(A) \Big)^{1/2}$.
Fast decay of the singular values: if $\big( \sum_{j>k} \sigma_j^2(A) \big)^{1/2} \approx \sigma_{k+1}$, then the approximation should be accurate.
Slow decay of the singular values: if $\big( \sum_{j>k} \sigma_j^2(A) \big)^{1/2} \approx \sqrt{n-k}\, \sigma_{k+1}$ and $n$ is large, then the approximation might not be accurate.
Source: G. Martinsson's talk.

74 Power iteration, q >= 1
The matrix $(AA^T)^q A$ has a faster decay in its singular values: it has the same left singular vectors as $A$, and its singular values are $\sigma_j((AA^T)^q A) = (\sigma_j(A))^{2q+1}$.

75 Cost of randomized truncated SVD
Randomized SVD requires $2q+1$ passes over the matrix. The last steps of the algorithm cost:
(2) Compute $Y = (AA^T)^q A G$: $2(2q+1)\, nnz(A)\, (k+p)$
(3) Compute the QR of $Y$: $2m(k+p)^2$
(4) Compute $B = Q^T A$: $2\, nnz(A)\, (k+p)$
(5) Compute the SVD of $B$: $O(n(k+p)^2)$
If $nnz(A)/m \ge k+p$ and $q = 1$, then (2) and (4) dominate (3). To be faster than deterministic approaches, the cost of (2) and (4) needs to be reduced.

76 Fast Johnson-Lindenstrauss transform
Find a sparse or structured $G$ such that computing $AG$ is cheap, e.g. a subsampled random Fourier transform (SRFT):
$G = \sqrt{\frac{n}{k+p}}\, D\, F\, S$, where
$D$ is $n \times n$ diagonal with entries uniformly distributed on the unit circle in $C$,
$F$ is the $n \times n$ discrete Fourier transform, $F_{jk} = \frac{1}{\sqrt n} e^{-2\pi i (j-1)(k-1)/n}$,
$S$ is a random subset of $k+p$ columns of the $n \times n$ identity (it draws $k+p$ columns at random from $DF$).
Computational cost:
(2) Compute $AG$ in $O(mn \log(n))$, or $O(mn \log(k+p))$ via a subsampled FFT.
(4) Computing $B = Q^T A$ is still expensive; its cost can be reduced by row sampling.
References: Ailon and Chazelle (2006); Liberty, Rokhlin, Tygert, and Woolfe (2006).
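A Matlab sketch of drawing the sample $Y = AG$ with an SRFT through FFTs (assuming the definitions above; Matlab's fft matches this DFT convention up to the $1/\sqrt n$ factor, and the sample is complex even for real A):
  m = 1000; n = 1024; l = 32;            % l = k + p, illustrative
  A = randn(m, n);
  d = exp(2i * pi * rand(n, 1));         % diagonal of D: random unit circle
  cols = randperm(n, l);                 % S: l random columns of the identity
  Z = fft(A .* d.', [], 2);              % rows of (A*D)*F, times sqrt(n)
  Y = Z(:, cols) / sqrt(l);              % Y = A*G with G = sqrt(n/l)*D*F*S
  [Q, ~] = qr(Y, 0);                     % orthonormal basis of the sample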

77 Summary of computation cost
For a dense matrix $A$ of size $m \times n$:
QR with column pivoting: $4mnk$
Randomized SVD with a Gaussian matrix: $O(mnk)$
Randomized SVD with an SRFT: $O(mn \log(k))$

78 Results from image processing (from Halko et al)
A matrix $A$ arising from a diffusion geometry approach: $A$ is a graph Laplacian on the manifold of 3 x 3 patches of a pixel grayscale image (the intensity of each pixel is an integer; the matrix size and image size were not recovered).
The vector $x^{(i)} \in R^9$ gives the intensities of the pixels in a 3 x 3 neighborhood of pixel $i$.
$W$ reflects the similarities between patches, with $\sigma = 50$ reflecting the level of sensitivity: $w_{ij} = \exp\{ -\|x^{(i)} - x^{(j)}\|^2 / \sigma^2 \}$.
Sparsify $W$, then compute the dominant eigenvectors of $A = D^{-1/2} W D^{-1/2}$.

79 Experimental results (from Halko et al)
[Figures: approximation error $\|A - QQ^T A\|_2$, and estimated eigenvalues for a fixed k (value not recovered).]

80 Clarkson and Woodruff, STOC 2013
Based on a randomized sparse embedding. Let $S$, of size $poly(k/\epsilon) \times n$, be formed such that each column has one nonzero entry, $\pm 1$, in a randomly chosen row.
Given $A$ of size $n \times n$ and rank $k$, for a certain $poly(k/\epsilon)$, with probability at least 9/10 the column space of $A$ is preserved, that is, for all $x \in R^n$:
$\|SAx\|_2 = (1 \pm \epsilon) \|Ax\|_2$.
$SA$ can be computed in $nnz(A)$ time.
Source: Woodruff's talk, STOC 2013.
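A Matlab sketch of such a sparse embedding (CountSketch-style; the sketch size v is illustrative) applied in nnz(A) time:
  n = 5000; v = 400;                        % v = poly(k/eps), illustrative
  A = sprandn(n, n, 1e-3);                  % sparse test matrix
  rows = randi(v, n, 1);                    % one random row per column of S
  signs = 2 * randi(2, n, 1) - 3;           % random +/-1 entries
  S = sparse(rows, (1:n)', signs, v, n);    % each column has one nonzero
  SA = S * A;                               % costs O(nnz(A))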

81 Clarkson and Woodruff, STOC 2013
Main idea: let $A$ be an $n \times n$ matrix, $S$ a $v \times n$ sparse embedding matrix with $v = \Theta(\epsilon^{-4} k^2 \log^6(k/\epsilon))$, and $R$ a $t \times n$ sparse embedding matrix with $t = O(k \epsilon^{-1} \log(k/\epsilon))$. Form
$\tilde A = A R^T (S A R^T)^{+} S A$
and extract the low rank approximation from $\tilde A$.
More details in Theorem 47 from STOC 2013; Theorem 47 relies on $S$ and $R$ being the product of a sparse embedding and an SRHT matrix.

82 Clarkson and Woodruff, STOC 2013
Given an $n \times n$ matrix $A$, it computes $LDW^T$, where $D$ is $k \times k$, such that with failure probability 1/10
$\|A - LDW^T\|_F \le (1 + \epsilon) \|A - A_k\|_F$, where $A_k$ is the best rank-k approximation.
$flops \le O(nnz(A)) + (n k^2 \epsilon^{-4} + k^3 \epsilon^{-5}) \log^{O(1)}(n k^2 \epsilon^{-4} + k^3 \epsilon^{-5})$.

83 More details on CA deterministic algorithms [Demmel et al., 2015]
Communication avoiding rank revealing QR factorization with column pivoting, Demmel, Grigori, Gu, and Xiang, SIAM J. Matrix Analysis and Applications, 2015.
Low rank approximation of a sparse matrix based on LU factorization with column and row tournament pivoting, with S. Cayrols and J. Demmel, Inria TR.

84 References (1)
Bischof, C. H. (1991). A parallel QR factorization algorithm with controlled local pivoting. SIAM J. Sci. Stat. Comput., 12.
Businger, P. A. and Golub, G. H. (1965). Linear least squares solutions by Householder transformations. Numer. Math., 7.
Demmel, J., Grigori, L., Gu, M., and Xiang, H. (2015). Communication-avoiding rank-revealing QR decomposition. SIAM Journal on Matrix Analysis and its Applications, 36(1).
Eckart, C. and Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1.
Eisenstat, S. C. and Ipsen, I. C. F. (1995). Relative perturbation techniques for singular value problems. SIAM J. Numer. Anal., 32(6).
Gu, M. and Eisenstat, S. C. (1996). Efficient algorithms for computing a strong rank-revealing QR factorization. SIAM J. Sci. Comput., 17(4).
Hansen, P. C. (2007). Regularization tools: A Matlab package for analysis and solution of discrete ill-posed problems. Numerical Algorithms, 46.

85 Results used in the proofs
Interlacing property of singular values [Golub and Van Loan, 4th edition, p. 487]: let $A = [a_1 \dots a_n]$ be a column partitioning of an $m \times n$ matrix with $m \ge n$. If $A_r = [a_1 \dots a_r]$, then for $r = 1 : n-1$
$\sigma_1(A_{r+1}) \ge \sigma_1(A_r) \ge \sigma_2(A_{r+1}) \ge \dots \ge \sigma_r(A_{r+1}) \ge \sigma_r(A_r) \ge \sigma_{r+1}(A_{r+1})$.
Given an $n \times n$ matrix $B$ and an $n \times k$ matrix $C$, then ([Eisenstat and Ipsen, 1995], p. 1977)
$\sigma_{\min}(B)\, \sigma_j(C) \le \sigma_j(BC) \le \sigma_{\max}(B)\, \sigma_j(C)$, $j = 1, \dots, k$.


More information

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 9

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 9 STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 9 1. qr and complete orthogonal factorization poor man s svd can solve many problems on the svd list using either of these factorizations but they

More information

Applied Mathematics 205. Unit II: Numerical Linear Algebra. Lecturer: Dr. David Knezevic

Applied Mathematics 205. Unit II: Numerical Linear Algebra. Lecturer: Dr. David Knezevic Applied Mathematics 205 Unit II: Numerical Linear Algebra Lecturer: Dr. David Knezevic Unit II: Numerical Linear Algebra Chapter II.3: QR Factorization, SVD 2 / 66 QR Factorization 3 / 66 QR Factorization

More information

Direct Methods for Solving Linear Systems. Simon Fraser University Surrey Campus MACM 316 Spring 2005 Instructor: Ha Le

Direct Methods for Solving Linear Systems. Simon Fraser University Surrey Campus MACM 316 Spring 2005 Instructor: Ha Le Direct Methods for Solving Linear Systems Simon Fraser University Surrey Campus MACM 316 Spring 2005 Instructor: Ha Le 1 Overview General Linear Systems Gaussian Elimination Triangular Systems The LU Factorization

More information

Randomized methods for approximating matrices

Randomized methods for approximating matrices Randomized methods for approximating matrices Gunnar Martinsson The University of Colorado at Boulder Collaborators: Edo Liberty, Vladimir Rokhlin, Yoel Shkolnisky, Arthur Szlam, Joel Tropp, Mark Tygert,

More information

Fundamentals of Engineering Analysis (650163)

Fundamentals of Engineering Analysis (650163) Philadelphia University Faculty of Engineering Communications and Electronics Engineering Fundamentals of Engineering Analysis (6563) Part Dr. Omar R Daoud Matrices: Introduction DEFINITION A matrix is

More information

Institute for Advanced Computer Studies. Department of Computer Science. Two Algorithms for the The Ecient Computation of

Institute for Advanced Computer Studies. Department of Computer Science. Two Algorithms for the The Ecient Computation of University of Maryland Institute for Advanced Computer Studies Department of Computer Science College Park TR{98{12 TR{3875 Two Algorithms for the The Ecient Computation of Truncated Pivoted QR Approximations

More information

Randomized Algorithms for Very Large-Scale Linear Algebra

Randomized Algorithms for Very Large-Scale Linear Algebra Randomized Algorithms for Very Large-Scale Linear Algebra Gunnar Martinsson The University of Colorado at Boulder Collaborators: Edo Liberty, Vladimir Rokhlin (congrats!), Yoel Shkolnisky, Arthur Szlam,

More information

LU Factorization. LU factorization is the most common way of solving linear systems! Ax = b LUx = b

LU Factorization. LU factorization is the most common way of solving linear systems! Ax = b LUx = b AM 205: lecture 7 Last time: LU factorization Today s lecture: Cholesky factorization, timing, QR factorization Reminder: assignment 1 due at 5 PM on Friday September 22 LU Factorization LU factorization

More information

Properties of Matrices and Operations on Matrices

Properties of Matrices and Operations on Matrices Properties of Matrices and Operations on Matrices A common data structure for statistical analysis is a rectangular array or matris. Rows represent individual observational units, or just observations,

More information

randomized block krylov methods for stronger and faster approximate svd

randomized block krylov methods for stronger and faster approximate svd randomized block krylov methods for stronger and faster approximate svd Cameron Musco and Christopher Musco December 2, 25 Massachusetts Institute of Technology, EECS singular value decomposition n d left

More information

c 2015 Society for Industrial and Applied Mathematics

c 2015 Society for Industrial and Applied Mathematics SIAM J MATRIX ANAL APPL Vol 36, No, pp 55 89 c 05 Society for Industrial and Applied Mathematics COMMUNICATION AVOIDING RANK REVEALING QR FACTORIZATION WITH COLUMN PIVOTING JAMES W DEMMEL, LAURA GRIGORI,

More information

CS 229r: Algorithms for Big Data Fall Lecture 17 10/28

CS 229r: Algorithms for Big Data Fall Lecture 17 10/28 CS 229r: Algorithms for Big Data Fall 2015 Prof. Jelani Nelson Lecture 17 10/28 Scribe: Morris Yau 1 Overview In the last lecture we defined subspace embeddings a subspace embedding is a linear transformation

More information

Algebra C Numerical Linear Algebra Sample Exam Problems

Algebra C Numerical Linear Algebra Sample Exam Problems Algebra C Numerical Linear Algebra Sample Exam Problems Notation. Denote by V a finite-dimensional Hilbert space with inner product (, ) and corresponding norm. The abbreviation SPD is used for symmetric

More information

Equality: Two matrices A and B are equal, i.e., A = B if A and B have the same order and the entries of A and B are the same.

Equality: Two matrices A and B are equal, i.e., A = B if A and B have the same order and the entries of A and B are the same. Introduction Matrix Operations Matrix: An m n matrix A is an m-by-n array of scalars from a field (for example real numbers) of the form a a a n a a a n A a m a m a mn The order (or size) of A is m n (read

More information

Numerical Methods I Non-Square and Sparse Linear Systems

Numerical Methods I Non-Square and Sparse Linear Systems Numerical Methods I Non-Square and Sparse Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 September 25th, 2014 A. Donev (Courant

More information

Principal Component Analysis

Principal Component Analysis Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used

More information

Singular Value Decomposition

Singular Value Decomposition Singular Value Decomposition CS 205A: Mathematical Methods for Robotics, Vision, and Graphics Doug James (and Justin Solomon) CS 205A: Mathematical Methods Singular Value Decomposition 1 / 35 Understanding

More information

Gaussian Elimination for Linear Systems

Gaussian Elimination for Linear Systems Gaussian Elimination for Linear Systems Tsung-Ming Huang Department of Mathematics National Taiwan Normal University October 3, 2011 1/56 Outline 1 Elementary matrices 2 LR-factorization 3 Gaussian elimination

More information

AM 205: lecture 8. Last time: Cholesky factorization, QR factorization Today: how to compute the QR factorization, the Singular Value Decomposition

AM 205: lecture 8. Last time: Cholesky factorization, QR factorization Today: how to compute the QR factorization, the Singular Value Decomposition AM 205: lecture 8 Last time: Cholesky factorization, QR factorization Today: how to compute the QR factorization, the Singular Value Decomposition QR Factorization A matrix A R m n, m n, can be factorized

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 1: Course Overview & Matrix-Vector Multiplication Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 20 Outline 1 Course

More information

CALU: A Communication Optimal LU Factorization Algorithm

CALU: A Communication Optimal LU Factorization Algorithm CALU: A Communication Optimal LU Factorization Algorithm James Demmel Laura Grigori Hua Xiang Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-010-9

More information

Randomized Algorithms

Randomized Algorithms Randomized Algorithms Saniv Kumar, Google Research, NY EECS-6898, Columbia University - Fall, 010 Saniv Kumar 9/13/010 EECS6898 Large Scale Machine Learning 1 Curse of Dimensionality Gaussian Mixture Models

More information

A randomized algorithm for approximating the SVD of a matrix

A randomized algorithm for approximating the SVD of a matrix A randomized algorithm for approximating the SVD of a matrix Joint work with Per-Gunnar Martinsson (U. of Colorado) and Vladimir Rokhlin (Yale) Mark Tygert Program in Applied Mathematics Yale University

More information

sublinear time low-rank approximation of positive semidefinite matrices Cameron Musco (MIT) and David P. Woodru (CMU)

sublinear time low-rank approximation of positive semidefinite matrices Cameron Musco (MIT) and David P. Woodru (CMU) sublinear time low-rank approximation of positive semidefinite matrices Cameron Musco (MIT) and David P. Woodru (CMU) 0 overview Our Contributions: 1 overview Our Contributions: A near optimal low-rank

More information

STA141C: Big Data & High Performance Statistical Computing

STA141C: Big Data & High Performance Statistical Computing STA141C: Big Data & High Performance Statistical Computing Numerical Linear Algebra Background Cho-Jui Hsieh UC Davis May 15, 2018 Linear Algebra Background Vectors A vector has a direction and a magnitude

More information

Linear Algebra Primer

Linear Algebra Primer Linear Algebra Primer David Doria daviddoria@gmail.com Wednesday 3 rd December, 2008 Contents Why is it called Linear Algebra? 4 2 What is a Matrix? 4 2. Input and Output.....................................

More information