Block Matrix Computations and the Singular Value Decomposition: A Tale of Two Ideas. Charles F. Van Loan, Department of Computer Science, Cornell University. Supported in part by the NSF contract CCR-9901988.
Block Matrices

A block matrix is a matrix with matrix entries, e.g.,

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \qquad A_{ij} \in \mathbb{R}^{3\times 2}.$$

Operations are pretty much business as usual, e.g.,

$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^T = \begin{bmatrix} A_{11}^T & A_{21}^T \\ A_{12}^T & A_{22}^T \end{bmatrix},$$

and the (1,1) block of $A^T A$ is $A_{11}^T A_{11} + A_{21}^T A_{21}$, etc.
E.g., Strassen Multiplication

$$\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}$$

$$\begin{aligned}
P_1 &= (A_{11} + A_{22})(B_{11} + B_{22}) \\
P_2 &= (A_{21} + A_{22})B_{11} \\
P_3 &= A_{11}(B_{12} - B_{22}) \\
P_4 &= A_{22}(B_{21} - B_{11}) \\
P_5 &= (A_{11} + A_{12})B_{22} \\
P_6 &= (A_{21} - A_{11})(B_{11} + B_{12}) \\
P_7 &= (A_{12} - A_{22})(B_{21} + B_{22})
\end{aligned}$$

$$\begin{aligned}
C_{11} &= P_1 + P_4 - P_5 + P_7 & C_{12} &= P_3 + P_5 \\
C_{21} &= P_2 + P_4 & C_{22} &= P_1 + P_3 - P_2 + P_6
\end{aligned}$$
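The seven-product scheme above can be sketched in a few lines of NumPy. This is one level of the recursion only (a full implementation would recur on the seven products), it assumes the dimension is even, and `strassen_2x2` is an illustrative name:

```python
import numpy as np

def strassen_2x2(A, B):
    """One level of Strassen multiplication on a 2x2 block partition.

    A, B are n-by-n with n even; the seven P_i products are formed
    with ordinary matmul (a full Strassen would recur on them).
    """
    n = A.shape[0]
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]

    P1 = (A11 + A22) @ (B11 + B22)
    P2 = (A21 + A22) @ B11
    P3 = A11 @ (B12 - B22)
    P4 = A22 @ (B21 - B11)
    P5 = (A11 + A12) @ B22
    P6 = (A21 - A11) @ (B11 + B12)
    P7 = (A12 - A22) @ (B21 + B22)

    C = np.empty_like(A)
    C[:h, :h] = P1 + P4 - P5 + P7   # C11
    C[:h, h:] = P3 + P5             # C12
    C[h:, :h] = P2 + P4             # C21
    C[h:, h:] = P1 + P3 - P2 + P6   # C22
    return C
```

Seven block multiplications replace the usual eight, at the cost of extra block additions.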
Singular Value Decomposition (SVD)

If $A \in \mathbb{R}^{m\times n}$ then there exist orthogonal $U \in \mathbb{R}^{m\times m}$ and $V \in \mathbb{R}^{n\times n}$ so that

$$U^T A V = \Sigma = \mathrm{diag}(\sigma_1,\ldots,\sigma_n) \in \mathbb{R}^{m\times n},$$

where $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$ are the singular values.
Singular Value Decomposition (SVD)

Fact 1. The columns of $V = [v_1, \ldots, v_n]$ and $U = [u_1, \ldots, u_m]$ are the right and left singular vectors, and they are related by

$$A v_j = \sigma_j u_j, \qquad A^T u_j = \sigma_j v_j, \qquad j = 1{:}n.$$
Singular Value Decomposition (SVD)

Fact 2. The SVD of $A$ is related to the eigendecompositions of $A^T A$ and $A A^T$:

$$V^T (A^T A) V = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2), \qquad U^T (A A^T) U = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2, \underbrace{0, \ldots, 0}_{m-n}).$$
Singular Value Decomposition (SVD)

Fact 3. The smallest singular value is the distance from $A$ to the set of rank-deficient matrices:

$$\sigma_{\min} = \min_{\mathrm{rank}(B) < n} \|A - B\|_F.$$
Singular Value Decomposition (SVD)

Fact 4. The matrix $\sigma_1 u_1 v_1^T$ is the closest rank-1 matrix to $A$, i.e., it solves the problem

$$\min_{\mathrm{rank}(B) = 1} \|A - B\|_F.$$
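Facts 1 and 4 are easy to check numerically. A small NumPy sketch (the sizes 5-by-3 and the seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(A)   # U is 5x5, Vt is 3x3, s holds sigma_1..sigma_3

# Fact 1: A v_1 = sigma_1 u_1
assert np.allclose(A @ Vt[0, :], s[0] * U[:, 0])

# Fact 4: sigma_1 u_1 v_1^T is the closest rank-1 matrix, at Frobenius
# distance sqrt(sigma_2^2 + sigma_3^2) from A (Eckart-Young)
B1 = s[0] * np.outer(U[:, 0], Vt[0, :])
assert np.isclose(np.linalg.norm(A - B1, 'fro'),
                  np.sqrt(s[1]**2 + s[2]**2))
```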
The High-Level Message

It is important to be able to think at the block level because of problem structure. It is important to be able to develop block matrix algorithms. There is a progression: Simple Linear Algebra → Block Linear Algebra → Multilinear Algebra.
Reasoning at the Block Level
Uncontrollability

The system $\dot{x} = Ax + Bu$, with $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times p}$, $n > p$, is uncontrollable if

$$G = \int_0^t e^{A(t-\tau)} B B^T e^{A^T(t-\tau)}\, d\tau$$

is singular.
Uncontrollability

The Gramian $G$ can be obtained from a single $2n$-by-$2n$ block matrix exponential:

$$\tilde{A} = \begin{bmatrix} A & BB^T \\ 0 & -A^T \end{bmatrix}, \qquad e^{\tilde{A}t} = \begin{bmatrix} F_{11} & F_{12} \\ 0 & F_{22} \end{bmatrix} \;\Longrightarrow\; G = F_{11} F_{12}^T.$$
Nearness to Uncontrollability

The system $\dot{x} = Ax + Bu$, with $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times p}$, $n > p$, is nearly uncontrollable if the minimum singular value of

$$G = \int_0^t e^{A(t-\tau)} B B^T e^{A^T(t-\tau)}\, d\tau$$

is small. Again,

$$\tilde{A} = \begin{bmatrix} A & BB^T \\ 0 & -A^T \end{bmatrix}, \qquad e^{\tilde{A}t} = \begin{bmatrix} F_{11} & F_{12} \\ 0 & F_{22} \end{bmatrix} \;\Longrightarrow\; G = F_{11} F_{12}^T.$$
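The block-exponential trick can be sanity-checked numerically. This sketch assumes SciPy's `expm` is available and uses the sign convention $\tilde{A} = \begin{bmatrix} A & BB^T \\ 0 & -A^T \end{bmatrix}$, under which $G = F_{11}F_{12}^T$; `gramian_via_expm` is an illustrative name:

```python
import numpy as np
from scipy.linalg import expm  # assumes SciPy is available

def gramian_via_expm(A, B, t):
    """Controllability Gramian G = int_0^t e^{A(t-s)} B B^T e^{A^T(t-s)} ds,
    read off from one exponential of the 2n-by-2n block matrix
    Atil = [A, B B^T; 0, -A^T]:  G = F11 @ F12^T."""
    n = A.shape[0]
    Atil = np.block([[A, B @ B.T],
                     [np.zeros((n, n)), -A.T]])
    F = expm(Atil * t)
    return F[:n, :n] @ F[:n, n:].T
```

For a check, the same integral can be approximated with a trapezoidal rule on $e^{As} B B^T e^{A^T s}$ over $[0, t]$ (the substitution $s = t - \tau$ turns one integral into the other).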
Developing Algorithms at the Block Level
Block Matrix Factorizations: A Challenge

By a block algorithm we mean an algorithm that is rich in matrix-matrix multiplication.

Is there a block algorithm for Gaussian elimination? I.e., is there a way to rearrange the $O(n^3)$ operations so that the implementation spends at most $O(n^2)$ time not doing matrix-matrix multiplication?
Why? Re-use ideology: when you touch data, you want to use it a lot. Not all linear algebra operations are equal in this regard.

Level | Example | Data | Work
1 | $\alpha = y^T z$ | $O(n)$ | $O(n)$
1 | $y = y + \alpha z$ | $O(n)$ | $O(n)$
2 | $y = y + Az$ | $O(n^2)$ | $O(n^2)$
2 | $A = A + yz^T$ | $O(n^2)$ | $O(n^2)$
3 | $A = A + BC$ | $O(n^2)$ | $O(n^3)$

Here, $\alpha$ is a scalar, $y$, $z$ are vectors, and $A$, $B$, $C$ are matrices.
Scalar LU

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ \ell_{21} & 1 & 0 \\ \ell_{31} & \ell_{32} & 1 \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{bmatrix}$$

Equating entries and solving in order:

$$\begin{aligned}
a_{11} = u_{11} &\;\Rightarrow\; u_{11} = a_{11} & a_{12} = u_{12} &\;\Rightarrow\; u_{12} = a_{12} & a_{13} = u_{13} &\;\Rightarrow\; u_{13} = a_{13} \\
a_{21} = \ell_{21} u_{11} &\;\Rightarrow\; \ell_{21} = a_{21}/u_{11} & a_{31} = \ell_{31} u_{11} &\;\Rightarrow\; \ell_{31} = a_{31}/u_{11} \\
a_{22} = \ell_{21} u_{12} + u_{22} &\;\Rightarrow\; u_{22} = a_{22} - \ell_{21} u_{12} & a_{23} = \ell_{21} u_{13} + u_{23} &\;\Rightarrow\; u_{23} = a_{23} - \ell_{21} u_{13} \\
a_{32} = \ell_{31} u_{12} + \ell_{32} u_{22} &\;\Rightarrow\; \ell_{32} = (a_{32} - \ell_{31} u_{12})/u_{22} \\
a_{33} = \ell_{31} u_{13} + \ell_{32} u_{23} + u_{33} &\;\Rightarrow\; u_{33} = a_{33} - \ell_{31} u_{13} - \ell_{32} u_{23}
\end{aligned}$$
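The recurrences above extend to any $n$. A minimal unpivoted sketch (it assumes nonzero pivots; `lu_scalar` is an illustrative name):

```python
import numpy as np

def lu_scalar(A):
    """Doolittle LU without pivoting (assumes nonzero pivots):
    returns unit lower triangular L and upper triangular U with A = L @ U."""
    n = A.shape[0]
    L = np.eye(n)
    U = np.zeros((n, n))
    for k in range(n):
        # row k of U, then column k of L, exactly as in the 3x3 derivation
        U[k, k:] = A[k, k:] - L[k, :k] @ U[:k, k:]
        L[k+1:, k] = (A[k+1:, k] - L[k+1:, :k] @ U[:k, k]) / U[k, k]
    return L, U
```

Each step is dominated by level-1 and level-2 operations, which is precisely what the block formulation seeks to avoid.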
Recursive Description

If $\alpha \in \mathbb{R}$, $v, w \in \mathbb{R}^{n-1}$, and $B \in \mathbb{R}^{(n-1)\times(n-1)}$, then

$$A = \begin{bmatrix} \alpha & w^T \\ v & B \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ v/\alpha & \tilde{L} \end{bmatrix} \begin{bmatrix} \alpha & w^T \\ 0 & \tilde{U} \end{bmatrix}$$

is the LU factorization of $A$ if $\tilde{L}\tilde{U} = \tilde{A}$ is the LU factorization of $\tilde{A} = B - vw^T/\alpha$. Rich in level-2 operations.
Block LU: Recursive Description

$$\begin{bmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix} \begin{bmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \qquad A_{11} \in \mathbb{R}^{p\times p}, \quad A_{22} \in \mathbb{R}^{(n-p)\times(n-p)}.$$

$A_{11} = L_{11} U_{11}$: get $L_{11}$, $U_{11}$ via LU of $A_{11}$ — $O(p^3)$
$A_{21} = L_{21} U_{11}$: solve triangular systems for $L_{21}$ — $O(np^2)$
$A_{12} = L_{11} U_{12}$: solve triangular systems for $U_{12}$ — $O(np^2)$
$A_{22} = L_{21} U_{12} + L_{22} U_{22}$: form $\tilde{A} = A_{22} - L_{21} U_{12}$ — $O(n^2 p)$; get $L_{22}$, $U_{22}$ via LU of $\tilde{A}$ (recur)

Rich in level-3 operations!
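The four steps above can be sketched as a right-looking block LU. This is a sketch without pivoting (so it assumes safe pivots, e.g. diagonal dominance); `lu_block` and the inner `lu_unblocked` are illustrative names:

```python
import numpy as np

def lu_unblocked(A):
    """Unblocked Doolittle LU (no pivoting): A = L @ U, L unit lower tri."""
    n = A.shape[0]
    L, U = np.eye(n), np.zeros((n, n))
    for k in range(n):
        U[k, k:] = A[k, k:] - L[k, :k] @ U[:k, k:]
        L[k+1:, k] = (A[k+1:, k] - L[k+1:, :k] @ U[:k, k]) / U[k, k]
    return L, U

def lu_block(A, p):
    """Right-looking block LU with block size p (no pivoting).

    Panels are factored with the unblocked routine; the trailing
    submatrix is touched only through the level-3 update A22 - L21 @ U12.
    """
    A = A.copy()
    n = A.shape[0]
    L, U = np.eye(n), np.zeros((n, n))
    for k in range(0, n, p):
        e = min(k + p, n)
        L[k:e, k:e], U[k:e, k:e] = lu_unblocked(A[k:e, k:e])  # LU of A11
        L[e:, k:e] = np.linalg.solve(U[k:e, k:e].T,
                                     A[e:, k:e].T).T          # L21 U11 = A21
        U[k:e, e:] = np.linalg.solve(L[k:e, k:e],
                                     A[k:e, e:])              # L11 U12 = A12
        A[e:, e:] -= L[e:, k:e] @ U[k:e, e:]                  # level-3 update
    return L, U
```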
Consider the level-3 update $\tilde{A} = A_{22} - L_{21} U_{12}$, e.g. with $p = 3$. The block size $p$ must be big enough so that the advantage of data re-use is realized.
Block Algorithms for (Some) Matrix Factorizations

LU with pivoting: $PA = LU$, with $P$ a permutation, $L$ lower triangular, $U$ upper triangular.

Cholesky factorization for symmetric positive definite $A$: $A = GG^T$, with $G$ lower triangular.

The QR factorization for rectangular matrices: $A = QR$, with $Q \in \mathbb{R}^{m\times m}$ orthogonal and $R \in \mathbb{R}^{m\times n}$ upper triangular.
Block Matrix Factorization Algorithms via Aggregation
Developing a Block QR Factorization

The standard algorithm computes $Q$ as a product of Householder reflections,

$$Q^T A = H_n \cdots H_1 A = R.$$

After $k$ steps, $H_k \cdots H_1 A$ has zeros below the diagonal in its first $k$ columns; each $H_j$ introduces the zeros in column $j$. The $H$ matrices look like this:

$$H = I - 2vv^T, \qquad v \text{ a unit 2-norm column vector.}$$
Aggregation

$$A = [\,A_1 \;\; A_2\,], \qquad A_1 \text{ with } p \text{ columns}, \quad p \ll n.$$

Generate the first $p$ Householders based on $A_1$:

$$H_p \cdots H_1 A_1 = \begin{bmatrix} R_{11} \\ 0 \end{bmatrix}, \qquad R_{11} \text{ upper triangular.}$$

Aggregate $H_1, \ldots, H_p$:

$$H_p \cdots H_1 = I - 2WY^T, \qquad W, Y \in \mathbb{R}^{m\times p}.$$

Apply to the rest of the matrix:

$$(H_p \cdots H_1) A = [\,(H_p \cdots H_1) A_1 \;\; (I - 2WY^T) A_2\,].$$
The WY Representation

Aggregation:

$$(I - 2WY^T)(I - 2vv^T) = I - 2W_+ Y_+^T, \qquad W_+ = [\,W \;\; (I-2WY^T)v\,], \quad Y_+ = [\,Y \;\; v\,].$$

Application:

$$A \leftarrow (I - 2WY^T) A = A - (2W)(Y^T A).$$
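The aggregation recurrence can be checked directly. A sketch (`aggregate_wy` is an illustrative name); note that feeding $v_1, \ldots, v_p$ in order accumulates the product $H_1 H_2 \cdots H_p$, and since each $H_i$ is symmetric, the transpose $I - 2YW^T$ is then $H_p \cdots H_1$:

```python
import numpy as np

def aggregate_wy(vs):
    """Accumulate a product of Householder reflections I - 2 v v^T in WY
    form via the updates  W+ = [W, (I - 2WY^T)v],  Y+ = [Y, v].
    Feeding v1, ..., vp yields  I - 2 W Y^T = H1 H2 ... Hp.
    """
    m = len(vs[0])
    W = np.zeros((m, 0))
    Y = np.zeros((m, 0))
    for v in vs:
        w_new = v - 2.0 * W @ (Y.T @ v)        # (I - 2WY^T) v
        W = np.hstack([W, w_new.reshape(-1, 1)])
        Y = np.hstack([Y, v.reshape(-1, 1)])
    return W, Y
```

Applying the aggregate then costs two matrix-matrix products, $A - (2W)(Y^T A)$, instead of $p$ rank-1 updates.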
The Curse of Similarity Transforms
A Block Householder Tridiagonalization?

$A$ symmetric. Compute orthogonal $Q$ such that $Q^T A Q = T$ is tridiagonal, i.e., nonzero only on the main diagonal and the first sub- and superdiagonals. The standard algorithm computes $Q$ as a product of Householder reflections, $Q = H_1 \cdots H_{n-2}$.
A Block Householder Tridiagonalization?

$H_1$ introduces zeros in the first column: in $H_1^T A$, the entries below position (2,1) become zero. But $H_1$ scrambles rows 2 through 6.
A Block Householder Tridiagonalization?

Must also post-multiply: in $H_1^T A H_1$, the corresponding entries of the first row become zero as well, and $H_1$ scrambles columns 2 through 6.
A Block Householder Tridiagonalization?

$H_2$ introduces zeros into the second column of $H_1^T A H_1$. Note that because of $H_1$'s impact, $H_2$ depends on all of $A$'s entries. The $H_i$ can be aggregated, but $A$ must be completely updated along the way, which destroys the advantage.
Jacobi Methods for the Symmetric Eigenproblem

These methods do not initially reduce $A$ to condensed form. They repeatedly reduce the sum of squares of the off-diagonal elements, e.g. with a rotation in the (2,4) plane:

$$A \leftarrow J^T A J, \qquad J = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & c & 0 & s \\ 0 & 0 & 1 & 0 \\ 0 & -s & 0 & c \end{bmatrix}.$$

The (2,4) subproblem:

$$\begin{bmatrix} d_1 & 0 \\ 0 & d_2 \end{bmatrix} = \begin{bmatrix} c & s \\ -s & c \end{bmatrix}^T \begin{bmatrix} a_{22} & a_{24} \\ a_{42} & a_{44} \end{bmatrix} \begin{bmatrix} c & s \\ -s & c \end{bmatrix},$$

where $c^2 + s^2 = 1$, and the off-diagonal sum of squares is reduced by $2a_{24}^2$.
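The 2-by-2 subproblem has a closed-form cosine-sine pair. A sketch following the usual symmetric Schur recipe (choosing the smaller tangent root for stability; `symschur2` is an illustrative name):

```python
import numpy as np

def symschur2(app, apq, aqq):
    """Return (c, s) with c^2 + s^2 = 1 such that
    [c s; -s c]^T [app apq; apq aqq] [c s; -s c]  is diagonal."""
    if apq == 0.0:
        return 1.0, 0.0
    tau = (aqq - app) / (2.0 * apq)
    # smaller-magnitude root of t^2 + 2*tau*t - 1 = 0
    if tau >= 0:
        t = 1.0 / (tau + np.sqrt(1.0 + tau * tau))
    else:
        t = -1.0 / (-tau + np.sqrt(1.0 + tau * tau))
    c = 1.0 / np.sqrt(1.0 + t * t)
    return c, t * c
```

Zeroing $a_{24}$ with this rotation is exactly what reduces the off-diagonal sum of squares by $2a_{24}^2$.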
Jacobi Methods for the Symmetric Eigenproblem

Cycle through the subproblems (1,2), (1,3), (1,4), (2,3), (2,4), (3,4), each time applying $A \leftarrow J^T A J$ with a rotation in the corresponding plane, e.g.

$$\text{(1,2): } J = \begin{bmatrix} c & s & 0 & 0 \\ -s & c & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad \text{(1,3): } J = \begin{bmatrix} c & 0 & s & 0 \\ 0 & 1 & 0 & 0 \\ -s & 0 & c & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad \text{etc.}$$
Parallel Jacobi

Parallel ordering: $\{(1,2), (3,4)\}$, $\{(1,4), (2,3)\}$, $\{(1,3), (2,4)\}$. The pairs within each set are disjoint, so their rotations touch different rows and columns and can be applied concurrently, e.g.

$$\text{(1,2): } J = \begin{bmatrix} c & s & 0 & 0 \\ -s & c & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad \text{(3,4): } J = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & c & s \\ 0 & 0 & -s & c \end{bmatrix}.$$
Block Jacobi

$$A \leftarrow Q^T A Q, \qquad Q = \begin{bmatrix} I & 0 & 0 & 0 \\ 0 & Q_{11} & 0 & Q_{12} \\ 0 & 0 & I & 0 \\ 0 & Q_{21} & 0 & Q_{22} \end{bmatrix}.$$

The (2,4) subproblem:

$$\begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix} = \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix}^T \begin{bmatrix} A_{22} & A_{24} \\ A_{42} & A_{44} \end{bmatrix} \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix}.$$

Convergence analysis requires an understanding of 2-by-2 block matrices that are orthogonal...
The CS Decomposition

The blocks of an orthogonal matrix have related SVDs:

$$\begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix} = \begin{bmatrix} U_1 & 0 \\ 0 & U_2 \end{bmatrix} \begin{bmatrix} C & S \\ -S & C \end{bmatrix} \begin{bmatrix} V_1 & 0 \\ 0 & V_2 \end{bmatrix}^T,$$

$$C = \mathrm{diag}(\cos\theta_1, \ldots, \cos\theta_m), \qquad S = \mathrm{diag}(\sin\theta_1, \ldots, \sin\theta_m),$$

so that $U_1^T Q_{11} V_1 = C$, $U_1^T Q_{12} V_2 = S$, $U_2^T Q_{21} V_1 = -S$, $U_2^T Q_{22} V_2 = C$.
The Curse of Sparsity
How to Extract Level-3 Performance?

Suppose we want to solve a large sparse $Ax = b$ problem. Iterative methods for this problem can often be dramatically accelerated if you can find a matrix $M$ with two properties:

It is easy/efficient to solve linear systems of the form $Mz = r$.
$M$ approximates the essence of $A$.

$M$ is called a preconditioner, and the original iteration is modified to effectively solve the equivalent linear system $(M^{-1}A)x = M^{-1}b$.
Idea: Kronecker Product Preconditioners

$$\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} \otimes \begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \end{bmatrix} = \begin{bmatrix} b_{11}C & b_{12}C \\ b_{21}C & b_{22}C \end{bmatrix}$$

Replicated block structure.
Some Properties and a Big Fact

Properties:

$$(B \otimes C)^T = B^T \otimes C^T$$
$$(B \otimes C)^{-1} = B^{-1} \otimes C^{-1}$$
$$(B \otimes C)(D \otimes F) = BD \otimes CF$$
$$B \otimes (C \otimes D) = (B \otimes C) \otimes D$$

Big Fact: If $B$ and $C$ are $m$-by-$m$, then the $m^2$-by-$m^2$ linear system $(B \otimes C)z = r$ can be solved in $O(m^3)$ time rather than $O(m^6)$ time.
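The Big Fact follows from the identity $(B \otimes C)\,\mathrm{vec}(X) = \mathrm{vec}(C X B^T)$ with column-major vec: solving $(B \otimes C)z = r$ reduces to two dense $m$-by-$m$ solves (each with $m$ right-hand sides). A sketch (`kron_solve` is an illustrative name):

```python
import numpy as np

def kron_solve(B, C, r):
    """Solve (B kron C) z = r in O(m^3) instead of O(m^6).

    With column-major vec, (B kron C) vec(X) = vec(C X B^T), so it
    suffices to solve C X B^T = R where r = vec(R)."""
    m = B.shape[0]
    R = r.reshape(m, m, order='F')   # r = vec(R), column-major
    X1 = np.linalg.solve(C, R)       # C X1 = R
    X = np.linalg.solve(B, X1.T).T   # X B^T = X1  <=>  B X^T = X1^T
    return X.reshape(-1, order='F')
```

Forming $B \otimes C$ explicitly, by contrast, already costs $O(m^4)$ storage before any factorization begins.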
Capturing Essence with a Kronecker Product

Given

$$A = \begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{bmatrix}, \qquad A_{ij} \in \mathbb{R}^{m\times m},$$

choose $B$ and $C$ to minimize

$$\|A - B \otimes C\|_F = \left\| \begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{bmatrix} - \begin{bmatrix} b_{11}C & b_{12}C & b_{13}C \\ b_{21}C & b_{22}C & b_{23}C \\ b_{31}C & b_{32}C & b_{33}C \end{bmatrix} \right\|_F.$$

The exact solution can be obtained via the SVD...
Solution of the $\min \|A - B \otimes C\|_F$ Problem

Make the blocks of $A$ into vectors and arrange them in block-column-major order:

$$\tilde{A} = [\,\mathrm{col}(A_{11}) \;\; \mathrm{col}(A_{21}) \;\; \mathrm{col}(A_{31}) \;\; \mathrm{col}(A_{12}) \;\cdots\; \mathrm{col}(A_{33})\,].$$

Compute the largest singular value $\sigma_{\max}$ of $\tilde{A}$ and the corresponding singular vectors $u_{\max}$ and $v_{\max}$. Then

$$B_{\mathrm{opt}} = \sqrt{\sigma_{\max}}\; \mathrm{reshape}(v_{\max}, 3, 3), \qquad C_{\mathrm{opt}} = \sqrt{\sigma_{\max}}\; \mathrm{reshape}(u_{\max}, m, m).$$
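The rearrangement-plus-SVD recipe above can be sketched for general block sizes (`nearest_kron` is an illustrative name; $A$ is assumed to partition conformally into an `mb`-by-`nb` grid of `mc`-by-`nc` blocks):

```python
import numpy as np

def nearest_kron(A, mb, nb, mc, nc):
    """Minimize ||A - B kron C||_F over B (mb x nb) and C (mc x nc).

    Builds the rearrangement Atil whose columns are col(A_ij) in
    block-column-major order, then takes its best rank-1 approximation."""
    cols = []
    for j in range(nb):                 # block-column-major ordering
        for i in range(mb):
            blk = A[i*mc:(i+1)*mc, j*nc:(j+1)*nc]
            cols.append(blk.reshape(-1, order='F'))   # col(A_ij)
    Atil = np.column_stack(cols)
    U, s, Vt = np.linalg.svd(Atil, full_matrices=False)
    B = np.sqrt(s[0]) * Vt[0, :].reshape(mb, nb, order='F')
    C = np.sqrt(s[0]) * U[:, 0].reshape(mc, nc, order='F')
    return B, C
```

When $A$ is exactly a Kronecker product, $\tilde{A}$ has rank one and the recipe recovers it exactly (up to the harmless scaling split between $B$ and $C$).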
Conclusions

It is important to be able to think at the block level because of problem structure. It is important to be able to develop block matrix algorithms. There is a progression: Simple Linear Algebra → Block Linear Algebra → Multilinear Algebra (Thursday).