Structured Matrix Computations from Structured Tensors
Lecture 4. CP and KSVD Representations
Charles F. Van Loan, Cornell University
CIME-EMS Summer School, June 22-26, 2015, Cetraro, Italy
What is this Lecture About?

Two More Tensor SVDs

The CP representation has a diagonal aspect like the SVD, but there is no orthogonality.

The Kronecker product SVD can be used to write a given matrix as an optimal sum of Kronecker products. If the matrix is obtained via a tensor unfolding, then we obtain yet another SVD-like representation.
The CP Representation
The CP Representation

Definition. The CP representation of an n1 x n2 x n3 tensor A has the form

    A = Σ_{k=1}^{r} λ_k · F(:,k) ∘ G(:,k) ∘ H(:,k)

where the λ_k are real scalars and F ∈ IR^{n1 x r}, G ∈ IR^{n2 x r}, and H ∈ IR^{n3 x r}.

Equivalently,

    A(i1, i2, i3) = Σ_{j=1}^{r} λ_j F(i1,j) G(i2,j) H(i3,j)

    vec(A) = Σ_{j=1}^{r} λ_j H(:,j) ⊗ G(:,j) ⊗ F(:,j)
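As a quick sketch (not part of the original slides), the CP sum above can be assembled into the full tensor with one einsum call; `cp_to_full` is a made-up helper name, and vec is taken column-major (MATLAB-style) as in the slides:

```python
import numpy as np

def cp_to_full(lam, F, G, H):
    """Assemble A = sum_k lam[k] * F(:,k) o G(:,k) o H(:,k) as an n1 x n2 x n3 array."""
    # einsum forms the r weighted outer products and sums them over k
    return np.einsum('k,ik,jk,lk->ijl', lam, F, G, H)
```

With this convention, `A.reshape(-1, order='F')` reproduces the vec identity on the slide, with the Kronecker factors ordered H, G, F.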
Tucker Vs. CP

The Tucker representation:

    A = Σ_{j1=1}^{r1} Σ_{j2=1}^{r2} Σ_{j3=1}^{r3} S(j1,j2,j3) · U1(:,j1) ∘ U2(:,j2) ∘ U3(:,j3)

The CP representation:

    A = Σ_{j=1}^{r} λ_j · F(:,j) ∘ G(:,j) ∘ H(:,j)

In Tucker the U's have orthonormal columns; in CP, the matrices F, G, and H do not have orthonormal columns. In CP the core tensor is diagonal, while in Tucker it is not.
A Note on Terminology

The CP decomposition also goes by the name of the CANDECOMP/PARAFAC decomposition:

    CANDECOMP = Canonical Decomposition
    PARAFAC   = Parallel Factors Decomposition
A Little More About Tensor Rank
The CP Representation and Rank

Definition. If

    A = Σ_{j=1}^{r} λ_j · F(:,j) ∘ G(:,j) ∘ H(:,j)

is the shortest possible CP representation of A, then rank(A) = r.
Tensor Rank Anomaly 1

The largest rank attainable for an n1-by-···-by-nd tensor is called the maximum rank. There is no simple formula for it in terms of the dimensions n1,...,nd; indeed, its precise value is known only for small examples. Maximum rank does not equal min{n1,...,nd} unless d ≤ 2.
Tensor Rank Anomaly 2

If the set of rank-k tensors in IR^{n1 x ··· x nd} has positive Lebesgue measure, then k is a typical rank.

    Size         Typical Ranks
    2 x 2 x 2    2, 3
    3 x 3 x 3    4
    3 x 3 x 4    4, 5
    3 x 3 x 5    5, 6

For n1-by-n2 matrices, typical rank and maximal rank are both equal to min{n1, n2}.
Tensor Rank Anomaly 3

The rank of a particular tensor over the real field may differ from its rank over the complex field.

Tensor Rank Anomaly 4

A tensor with a given rank may be approximated with arbitrary precision by a tensor of lower rank. Such a tensor is said to be degenerate.
The Nearest CP Problem
The CP Approximation Problem

Definition.

Given: A ∈ IR^{n1 x n2 x n3} and r.

Determine: λ ∈ IR^r and F ∈ IR^{n1 x r}, G ∈ IR^{n2 x r}, H ∈ IR^{n3 x r} (with unit 2-norm columns) so that if

    X = Σ_{j=1}^{r} λ_j · F(:,j) ∘ G(:,j) ∘ H(:,j)

then ‖A − X‖_F is minimized.

A multilinear optimization problem.
The CP Approximation Problem: Equivalent Formulations

    ‖A − Σ_{j=1}^{r} λ_j F(:,j) ∘ G(:,j) ∘ H(:,j)‖_F

      = ‖A_(1) − Σ_{j=1}^{r} λ_j F(:,j) (H(:,j) ⊗ G(:,j))^T‖_F

      = ‖A_(2) − Σ_{j=1}^{r} λ_j G(:,j) (H(:,j) ⊗ F(:,j))^T‖_F

      = ‖A_(3) − Σ_{j=1}^{r} λ_j H(:,j) (G(:,j) ⊗ F(:,j))^T‖_F
Introducing the Khatri-Rao Product

Definition. If

    B = [ b_1 | ··· | b_r ] ∈ IR^{n1 x r},    C = [ c_1 | ··· | c_r ] ∈ IR^{n2 x r},

then the Khatri-Rao product of B and C is given by

    B ⊙ C = [ b_1 ⊗ c_1 | ··· | b_r ⊗ c_r ].

Column-wise Kronecker products. Note that B ⊙ C ∈ IR^{n1 n2 x r}.
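In code the definition is a one-liner; the sketch below (with a made-up helper name `khatri_rao`) builds the column-wise Kronecker products via einsum rather than an explicit loop:

```python
import numpy as np

def khatri_rao(B, C):
    """Khatri-Rao product: column j of the result is kron(B[:, j], C[:, j])."""
    n1, r = B.shape
    n2, r2 = C.shape
    assert r == r2, "B and C must have the same number of columns"
    # form all products B[i, j] * C[k, j], then stack index i over index k
    return np.einsum('ir,kr->ikr', B, C).reshape(n1 * n2, r)
```

The result is the n1 n2-by-r matrix of the slide, so it is tall and thin whenever r is small relative to n1 n2.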
The CP Approximation Problem: Equivalent Formulations

    ‖A − Σ_{j=1}^{r} λ_j F(:,j) ∘ G(:,j) ∘ H(:,j)‖_F

      = ‖A_(1) − F diag(λ_j) (H ⊙ G)^T‖_F
      = ‖A_(2) − G diag(λ_j) (H ⊙ F)^T‖_F
      = ‖A_(3) − H diag(λ_j) (G ⊙ F)^T‖_F
The CP Approximation Problem: The Alternating LS Solution Framework

    ‖A − X‖_F = ‖A_(1) − F diag(λ_j) (H ⊙ G)^T‖_F
              = ‖A_(2) − G diag(λ_j) (H ⊙ F)^T‖_F
              = ‖A_(3) − H diag(λ_j) (G ⊙ F)^T‖_F

1. Fix G and H and improve λ and F.
2. Fix F and H and improve λ and G.
3. Fix F and G and improve λ and H.
The CP Approximation Problem: The Alternating LS Solution Framework

Repeat:

1. Let F minimize ‖A_(1) − F (H ⊙ G)^T‖_F and for j = 1:r set λ_j = ‖F(:,j)‖_2 and F(:,j) = F(:,j)/λ_j.
2. Let G minimize ‖A_(2) − G (H ⊙ F)^T‖_F and for j = 1:r set λ_j = ‖G(:,j)‖_2 and G(:,j) = G(:,j)/λ_j.
3. Let H minimize ‖A_(3) − H (G ⊙ F)^T‖_F and for j = 1:r set λ_j = ‖H(:,j)‖_2 and H(:,j) = H(:,j)/λ_j.

These are linear least squares problems. The columns of F, G, and H are normalized.
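A minimal numpy sketch of one sweep of this framework, assuming the usual (Kolda-Bader style) mode-n unfoldings and forming the Khatri-Rao matrices explicitly for clarity; the fast properties that follow avoid forming them:

```python
import numpy as np

def cp_als_sweep(A, F, G, H):
    """One sweep of alternating least squares for the CP approximation problem.

    A is an n1 x n2 x n3 array; F, G, H each have r columns. Each step solves
    an ordinary linear LS problem for one (scaled) factor, then renormalizes.
    """
    def kr(X, Y):  # Khatri-Rao product: column-wise Kronecker products
        return np.einsum('ir,kr->ikr', X, Y).reshape(-1, X.shape[1])

    def normalize(M):  # pull the column 2-norms out as lambda
        lam = np.linalg.norm(M, axis=0)
        return lam, M / lam

    n1, n2, n3 = A.shape
    A1 = A.reshape(n1, n2 * n3, order='F')                     # mode-1 unfolding
    A2 = np.moveaxis(A, 1, 0).reshape(n2, n1 * n3, order='F')  # mode-2 unfolding
    A3 = np.moveaxis(A, 2, 0).reshape(n3, n1 * n2, order='F')  # mode-3 unfolding

    lam, F = normalize(np.linalg.lstsq(kr(H, G), A1.T, rcond=None)[0].T)
    lam, G = normalize(np.linalg.lstsq(kr(H, F), A2.T, rcond=None)[0].T)
    lam, H = normalize(np.linalg.lstsq(kr(G, F), A3.T, rcond=None)[0].T)
    return lam, F, G, H
```

Since each step solves its LS subproblem exactly, the fit never gets worse from one sweep to the next, which is the monotonicity that makes ALS attractive.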
The CP Approximation Problem: Solving the LS Problems

The solution to

    min_F ‖A_(1) − F (H ⊙ G)^T‖_F = min_F ‖A_(1)^T − (H ⊙ G) F^T‖_F

can be obtained by solving the normal equation system

    ((H ⊙ G)^T (H ⊙ G)) F^T = (H ⊙ G)^T A_(1)^T.

This can be solved efficiently by exploiting two properties of the Khatri-Rao product.
The Khatri-Rao Product

Fast Property 1. If B ∈ IR^{n1 x r} and C ∈ IR^{n2 x r}, then

    (B ⊙ C)^T (B ⊙ C) = (B^T B) .* (C^T C)

where .* denotes pointwise multiplication.
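A small numerical check of Property 1 (a sketch, not from the slides): the r x r Gram matrix can be formed from the small factors alone, without ever touching the n1 n2-length columns:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, r = 4, 3, 5
B = rng.standard_normal((n1, r))
C = rng.standard_normal((n2, r))

# Khatri-Rao product assembled column by column (the expensive route)
BC = np.column_stack([np.kron(B[:, j], C[:, j]) for j in range(r)])

# Fast Property 1: Gram matrix from the factors, O((n1 + n2) r^2) work
lhs = BC.T @ BC
rhs = (B.T @ B) * (C.T @ C)   # numpy's * is the pointwise .* of the slide
print(np.allclose(lhs, rhs))  # True
```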
The Khatri-Rao Product

Fast Property 2. If

    B = [ b_1 | ··· | b_r ] ∈ IR^{n1 x r},    C = [ c_1 | ··· | c_r ] ∈ IR^{n2 x r},

z ∈ IR^{n1 n2}, and y = (B ⊙ C)^T z, then

    y = [ c_1^T Z b_1 ; ··· ; c_r^T Z b_r ],    Z = reshape(z, n2, n1).
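And a matching check of Property 2 (again a sketch): the matrix-vector product with (B ⊙ C)^T reduces to r small bilinear forms on a column-major reshape of z:

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, r = 4, 3, 5
B = rng.standard_normal((n1, r))
C = rng.standard_normal((n2, r))
z = rng.standard_normal(n1 * n2)

BC = np.column_stack([np.kron(B[:, j], C[:, j]) for j in range(r)])

# Fast Property 2: y_k = c_k^T Z b_k with Z = reshape(z, n2, n1), column-major
Z = z.reshape(n2, n1, order='F')
y = np.array([C[:, k] @ Z @ B[:, k] for k in range(r)])
print(np.allclose(BC.T @ z, y))  # True
```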
Overall: The Khatri-Rao LS Problem

Structure. Given B ∈ IR^{n1 x r}, C ∈ IR^{n2 x r}, and z ∈ IR^{n1 n2}, minimize ‖(B ⊙ C)x − z‖_2.

Data sparse: an n1 n2-by-r LS problem defined by O((n1 + n2)r) data.

Solution Procedure
1. Form M = (B^T B) .* (C^T C).              O((n1 + n2)r^2)
2. Cholesky: M = LL^T.                       O(r^3)
3. Form y = (B ⊙ C)^T z using Property 2.    O(n1 n2 r)
4. Solve Mx = y.                             O(r^2)

O(n1 n2 r) vs O(n1 n2 r^2)
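The four-step procedure can be sketched directly in numpy; `kr_ls_solve` is a made-up name, and the n1 n2-by-r matrix B ⊙ C is never formed:

```python
import numpy as np

def kr_ls_solve(B, C, z):
    """Solve min_x || (B kr C) x - z ||_2 without forming the Khatri-Rao matrix."""
    n1, r = B.shape
    n2, _ = C.shape
    M = (B.T @ B) * (C.T @ C)              # Step 1: Gram matrix via Property 1
    L = np.linalg.cholesky(M)              # Step 2: M = L L^T (M is s.p.d. if full rank)
    Z = z.reshape(n2, n1, order='F')       # Step 3: y = (B kr C)^T z via Property 2
    y = np.array([C[:, k] @ Z @ B[:, k] for k in range(r)])
    # Step 4: solve M x = y by two triangular solves
    return np.linalg.solve(L.T, np.linalg.solve(L, y))
```

For well-conditioned problems this matches the answer from the explicit least squares solve at a fraction of the cost.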
The Kronecker Product SVD
The Nearest Kronecker Product Problem

Find B and C so that ‖A − B ⊗ C‖_F = min. For example, with A ∈ IR^{6x4}, B ∈ IR^{3x2}, C ∈ IR^{2x2}:

    ‖ [ a11 a12 a13 a14 ]                                ‖
    ‖ [ a21 a22 a23 a24 ]   [ b11 b12 ]                  ‖
    ‖ [ a31 a32 a33 a34 ] − [ b21 b22 ] ⊗ [ c11 c12 ]    ‖
    ‖ [ a41 a42 a43 a44 ]   [ b31 b32 ]   [ c21 c22 ]    ‖
    ‖ [ a51 a52 a53 a54 ]                                ‖
    ‖ [ a61 a62 a63 a64 ]                                ‖_F

After rearrangement this equals

    ‖ [ a11 a21 a12 a22 ]   [ b11 ]                          ‖
    ‖ [ a31 a41 a32 a42 ]   [ b21 ]                          ‖
    ‖ [ a51 a61 a52 a62 ] − [ b31 ] [ c11 c21 c12 c22 ]      ‖
    ‖ [ a13 a23 a14 a24 ]   [ b12 ]                          ‖
    ‖ [ a33 a43 a34 a44 ]   [ b22 ]                          ‖
    ‖ [ a53 a63 a54 a64 ]   [ b32 ]                          ‖_F
The Nearest Kronecker Product Problem

Find B and C so that ‖A − B ⊗ C‖_F = min. It is a nearest rank-1 problem:

    φ_A(B, C) = ‖Ã − vec(B) vec(C)^T‖_F

with SVD solution: if Ã = UΣV^T, then

    vec(B) = √σ1 · U(:,1),    vec(C) = √σ1 · V(:,1).
The Nearest Kronecker Product Problem: The Tilde Matrix

    A = [ A11 A12 ]
        [ A21 A22 ]      (A ∈ IR^{6x4}, blocks A_ij ∈ IR^{2x2})
        [ A31 A32 ]

implies

    Ã = [ vec(A11)^T ]
        [ vec(A21)^T ]
        [ vec(A31)^T ]
        [ vec(A12)^T ]
        [ vec(A22)^T ]
        [ vec(A32)^T ]
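The tilde rearrangement is a short loop in numpy; `tilde` is a made-up helper name, and vec is column-major as on the slide:

```python
import numpy as np

def tilde(A, r1, c1):
    """Rearrange A (blocked into r1 x c1 blocks) so row (i2, j2) is vec(A_{i2,j2})^T.

    Rows are ordered A11, A21, ..., then A12, A22, ..., matching the slide.
    """
    r2, c2 = A.shape[0] // r1, A.shape[1] // c1
    rows = []
    for j2 in range(c2):            # column-block index varies slowest
        for i2 in range(r2):        # row-block index varies fastest
            blk = A[i2 * r1:(i2 + 1) * r1, j2 * c1:(j2 + 1) * c1]
            rows.append(blk.reshape(-1, order='F'))   # column-major vec()
    return np.array(rows)
```

The point of the ordering is that if A = B ⊗ C exactly, then tilde(A) is exactly the rank-1 matrix vec(B) vec(C)^T.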
The Kronecker Product SVD (KPSVD)

Theorem. If

    A = [ A_{1,1}  ···  A_{1,c2}  ]
        [   ⋮             ⋮       ]      A_{i2,j2} ∈ IR^{r1 x c1},
        [ A_{r2,1} ···  A_{r2,c2} ]

then there exist U_1,...,U_{r_KP} ∈ IR^{r2 x c2}, V_1,...,V_{r_KP} ∈ IR^{r1 x c1}, and scalars σ_1 ≥ ··· ≥ σ_{r_KP} > 0 such that

    A = Σ_{k=1}^{r_KP} σ_k U_k ⊗ V_k.

The sets {vec(U_k)} and {vec(V_k)} are orthonormal, and r_KP is the Kronecker rank of A with respect to the chosen blocking.
The Kronecker Product SVD (KPSVD): Constructive Proof

Compute the SVD of Ã:

    Ã = UΣV^T = Σ_{k=1}^{r_KP} σ_k u_k v_k^T

and define the U_k and V_k by

    vec(U_k) = u_k,  i.e.,  U_k = reshape(u_k, r2, c2),
    vec(V_k) = v_k,  i.e.,  V_k = reshape(v_k, r1, c1),

for k = 1:r_KP.
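The constructive proof translates directly into code; `kpsvd` is a made-up name, and the tilde rearrangement is inlined as a comprehension:

```python
import numpy as np

def kpsvd(A, r1, c1):
    """KPSVD of A with respect to an r1 x c1 block partitioning.

    Returns sigma and lists U, V with A = sum_k sigma[k] * kron(U[k], V[k]).
    """
    r2, c2 = A.shape[0] // r1, A.shape[1] // c1
    # tilde matrix: row (i2, j2) holds vec(A_{i2,j2})^T, column-block order
    At = np.array([A[i2 * r1:(i2 + 1) * r1, j2 * c1:(j2 + 1) * c1].reshape(-1, order='F')
                   for j2 in range(c2) for i2 in range(r2)])
    Ut, s, Vt = np.linalg.svd(At, full_matrices=False)
    U = [Ut[:, k].reshape(r2, c2, order='F') for k in range(len(s))]   # vec(U_k) = u_k
    V = [Vt[k, :].reshape(r1, c1, order='F') for k in range(len(s))]   # vec(V_k) = v_k
    return s, U, V
```

Truncating the sum after r terms gives the nearest-Kronecker-rank-r approximation of the following slide; the nearest single Kronecker product is the r = 1 case.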
The Kronecker Product SVD (KPSVD): Nearest Rank-r

If r ≤ r_KP, then

    A_r = Σ_{k=1}^{r} σ_k U_k ⊗ V_k

is the nearest matrix to A (in the Frobenius norm) that has Kronecker rank r.
Structured Kronecker Product Approximation Problems

    min_{B,C} ‖A − B ⊗ C‖_F

If A is symmetric and positive definite, then so are B and C.
If A is block Toeplitz with Toeplitz blocks, then B and C are Toeplitz.
If A is a block band matrix with banded blocks, then B and C are banded.
Can use the Lanczos SVD if A is large and sparse.
A Tensor Approximation Idea

Motivation. Unfold A ∈ IR^{n x n x n x n} into an n^2-by-n^2 matrix A. Express A as a sum of Kronecker products:

    A = Σ_{k=1}^{r} σ_k B_k ⊗ C_k,    B_k, C_k ∈ IR^{n x n}.

Back to the tensor:

    A = Σ_{k=1}^{r} σ_k C_k ∘ B_k,   i.e.,   A(i1, i2, j1, j2) = Σ_{k=1}^{r} σ_k C_k(i1, i2) B_k(j1, j2).

Sums of tensor products of matrices instead of vectors.
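A small numerical check of the matrix/tensor correspondence above (a sketch; the particular index mapping between the n^2 x n^2 matrix and the order-4 tensor is one consistent choice, not the only one):

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 3, 2
sigma = np.array([1.5, 0.5])
B = rng.standard_normal((r, n, n))
C = rng.standard_normal((r, n, n))

# the n^2 x n^2 unfolding: A = sum_k sigma_k * B_k kron C_k
A_mat = sum(sigma[k] * np.kron(B[k], C[k]) for k in range(r))

# the order-4 tensor: A(i1,i2,j1,j2) = sum_k sigma_k * C_k(i1,i2) * B_k(j1,j2)
A_ten = sum(sigma[k] * np.einsum('ab,cd->abcd', C[k], B[k]) for k in range(r))

# under the mapping A_mat[j1*n+i1, j2*n+i2] = A_ten[i1,i2,j1,j2] the two agree
print(np.allclose(A_mat.reshape(n, n, n, n).transpose(1, 3, 0, 2), A_ten))  # True
```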
The Nearest Kronecker Product Problem: Harder

    φ_A(B, C, D) = ‖A − B ⊗ C ⊗ D‖_F

with

    ‖A − B ⊗ C ⊗ D‖_F^2 = Σ_{i1=1}^{r1} Σ_{j1=1}^{c1} Σ_{i2=1}^{r2} Σ_{j2=1}^{c2} Σ_{i3=1}^{r3} Σ_{j3=1}^{c3} ( A(i1, j1, i2, j2, i3, j3) − B(i3, j3) C(i2, j2) D(i1, j1) )^2

Trying to approximate an order-6 tensor with a triplet of order-2 tensors. Would have to apply componentwise optimization.
Concluding Remarks
Optional Fun Problems

Problem E4. Suppose

    A = [ B11 ⊗ C11   B12 ⊗ C12 ]
        [ B21 ⊗ C21   B22 ⊗ C22 ]

and that the B_ij and C_ij are each m-by-m.

(a) Assuming that structure is fully exploited, how many flops are required to compute y = Ax where x ∈ IR^{2m^2}?
(b) How many flops are required to explicitly form A?
(c) How many flops are required to compute y = Ax assuming that A has been explicitly formed?

Problem A4. Suppose A is n^2-by-n^2. How would you compute X ∈ IR^{n x n} so that ‖A − X ⊗ X‖_F is minimized?