From Matrix to Tensor
Charles F. Van Loan
Department of Computer Science
January 28, 2016
From Matrix to Tensor / From Tensor To Matrix — 1 / 68
What is a Tensor?
Instead of just A(i, j), it's A(i, j, k) or A(i1, i2, ..., id).
Where Might They Come From?
Discretization: A(i, j, k, l) might house the value of f(w, x, y, z) at (w, x, y, z) = (wi, xj, yk, zl).
High-Dimension Evaluations: Given a basis {φi(r), i = 1:n},
   A(p, q, r, s) = ∫_{R³} ∫_{R³} φp(r1) φq(r1) φr(r2) φs(r2) / ‖r1 − r2‖ dr1 dr2.
Multiway Analysis: A(i, j, k, l) is a value that captures an interaction between four variables/factors.
You May Have Seen Them Before...
Here is a 3x3 block matrix with 2x2 blocks:
    [ a11 a12 | a13 a14 | a15 a16 ]
    [ a21 a22 | a23 a24 | a25 a26 ]
A = [ a31 a32 | a33 a34 | a35 a36 ]
    [ a41 a42 | a43 a44 | a45 a46 ]
    [ a51 a52 | a53 a54 | a55 a56 ]
    [ a61 a62 | a63 a64 | a65 a66 ]
This is a reshaping of a 2 x 2 x 3 x 3 tensor:
Matrix entry a45 is the (2,1) entry of the (2,3) block.
Matrix entry a45 is A(2, 3, 2, 1).
A Tensor Has Parts
A matrix has columns and rows. A tensor has fibers.
A fiber of a tensor A is a vector obtained by fixing all but one of A's indices.
Given A = A(1:3, 1:5, 1:4, 1:7), here is a mode-2 fiber:
A(2, 1:5, 4, 6) = [ A(2,1,4,6); A(2,2,4,6); A(2,3,4,6); A(2,4,4,6); A(2,5,4,6) ]
This is the (2,4,6) mode-2 fiber.
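A minimal NumPy sketch of extracting that fiber (the talk uses 1-based MATLAB indexing; NumPy is 0-based, and the tensor here is a hypothetical random array):

```python
import numpy as np

# Order-4 tensor A(1:3, 1:5, 1:4, 1:7); in 0-based NumPy indexing,
# the (2,4,6) mode-2 fiber of the slide is A[1, :, 3, 5].
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5, 4, 7))

fiber = A[1, :, 3, 5]   # fix all but the mode-2 index
print(fiber.shape)      # (5,)
```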
Fibers Can Be Assembled Into a Matrix
The mode-1, mode-2, and mode-3 unfoldings of A ∈ IR^{4x3x2}:
A_(1) = [ a111 a121 a131 a112 a122 a132
          a211 a221 a231 a212 a222 a232
          a311 a321 a331 a312 a322 a332
          a411 a421 a431 a412 a422 a432 ]   columns indexed by (j,k) = (1,1),(2,1),(3,1),(1,2),(2,2),(3,2)
A_(2) = [ a111 a211 a311 a411 a112 a212 a312 a412
          a121 a221 a321 a421 a122 a222 a322 a422
          a131 a231 a331 a431 a132 a232 a332 a432 ]   columns indexed by (i,k) = (1,1),(2,1),(3,1),(4,1),(1,2),(2,2),(3,2),(4,2)
A_(3) = [ a111 a211 a311 a411 a121 a221 a321 a421 a131 a231 a331 a431
          a112 a212 a312 a412 a122 a222 a322 a422 a132 a232 a332 a432 ]   columns indexed by (i,j) = (1,1),(2,1),(3,1),(4,1),(1,2),...,(4,3)
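A sketch of a modal unfolding in NumPy: with column-major (Fortran) ordering of the remaining modes, this reproduces the A_(k) layouts above. The helper name `unfold` is ours, not part of any library:

```python
import numpy as np

# Mode-k unfolding: mode-k fibers become the columns of a matrix.
def unfold(A, k):
    # move mode k to the front, then flatten the rest column-major
    return np.reshape(np.moveaxis(A, k, 0), (A.shape[k], -1), order='F')

A = np.arange(24).reshape(4, 3, 2, order='F')   # a 4 x 3 x 2 example
A1, A2, A3 = unfold(A, 0), unfold(A, 1), unfold(A, 2)
print(A1.shape, A2.shape, A3.shape)   # (4, 6) (3, 8) (2, 12)
```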
There are Many Ways to Unfold a Given Tensor
Here is one way to unfold A(1:2, 1:3, 1:2, 1:2, 1:3), with rows indexed by (i1,i2,i3) and columns by (i4,i5):
              (1,1)    (2,1)    (1,2)    (2,2)    (1,3)    (2,3)
(1,1,1)  [  a11111   a11121   a11112   a11122   a11113   a11123
(2,1,1)     a21111   a21121   a21112   a21122   a21113   a21123
(1,2,1)     a12111   a12121   a12112   a12122   a12113   a12123
(2,2,1)     a22111   a22121   a22112   a22122   a22113   a22123
(1,3,1)     a13111   a13121   a13112   a13122   a13113   a13123
(2,3,1)     a23111   a23121   a23112   a23122   a23113   a23123
(1,1,2)     a11211   a11221   a11212   a11222   a11213   a11223
(2,1,2)     a21211   a21221   a21212   a21222   a21213   a21223
(1,2,2)     a12211   a12221   a12212   a12222   a12213   a12223
(2,2,2)     a22211   a22221   a22212   a22222   a22213   a22223
(1,3,2)     a13211   a13221   a13212   a13222   a13213   a13223
(2,3,2)     a23211   a23221   a23212   a23222   a23213   a23223 ]
With the Matlab Tensor Toolbox: B = tenmat(A,[1 2 3],[4 5])
There are Many Ways to Unfold a Given Tensor
tenmat(A,[1 2 3],[4 5])   tenmat(A,[4 5],[1 2 3])
tenmat(A,[1 2 4],[3 5])   tenmat(A,[3 5],[1 2 4])
tenmat(A,[1 2 5],[3 4])   tenmat(A,[3 4],[1 2 5])
tenmat(A,[1 3 4],[2 5])   tenmat(A,[2 5],[1 3 4])
tenmat(A,[1 3 5],[2 4])   tenmat(A,[2 4],[1 3 5])
tenmat(A,[1 4 5],[2 3])   tenmat(A,[2 3],[1 4 5])
tenmat(A,[2 3 4],[1 5])   tenmat(A,[1 5],[2 3 4])
tenmat(A,[2 3 5],[1 4])   tenmat(A,[1 4],[2 3 5])
tenmat(A,[2 4 5],[1 3])   tenmat(A,[1 3],[2 4 5])
tenmat(A,[3 4 5],[1 2])   tenmat(A,[1 2],[3 4 5])
tenmat(A,[1],[2 3 4 5])   tenmat(A,[2 3 4 5],[1])
tenmat(A,[2],[1 3 4 5])   tenmat(A,[1 3 4 5],[2])
tenmat(A,[3],[1 2 4 5])   tenmat(A,[1 2 4 5],[3])
tenmat(A,[4],[1 2 3 5])   tenmat(A,[1 2 3 5],[4])
tenmat(A,[5],[1 2 3 4])   tenmat(A,[1 2 3 4],[5])
Choice makes life complicated...
Paradigm for Much of Tensor Computations
To say something about a tensor A:
1. Thoughtfully unfold the tensor A into a matrix A.
2. Use classical matrix computations to discover something interesting/useful about the matrix A.
3. Map your insights back to the tensor A.
Computing (parts of) decompositions is how we do this in classical matrix computations.
Matrix Factorizations and Decompositions
A = UΣVᵀ   PA = LU   A = QR   A = GGᵀ   PAPᵀ = LDLᵀ   QᵀAQ = D   X⁻¹AX = J   UᵀAU = T   AP = QR   A = ULVᵀ   PAQᵀ = LU   ...
It's a Language
The Singular Value Decomposition
Perhaps the most versatile and important of all the different matrix decompositions is the SVD:
[ a11 a12 ]   [ c1 -s1 ] [ σ1  0 ] [ c2 -s2 ]ᵀ
[ a21 a22 ] = [ s1  c1 ] [  0 σ2 ] [ s2  c2 ]
            = σ1 [ c1 ] [ c2  s2 ]  +  σ2 [ -s1 ] [ -s2  c2 ]
                 [ s1 ]                   [  c1 ]
where c1² + s1² = 1 and c2² + s2² = 1.
This is a very special sum of rank-1 matrices.
Rank-1 Matrices: You Have Seen Them Before
    [ 1  2  3  4  5  6  7  8  9
      2  4  6  8 10 12 14 16 18
      3  6  9 12 15 18 21 24 27
      4  8 12 16 20 24 28 32 36
T =   5 10 15 20 25 30 35 40 45
      6 12 18 24 30 36 42 48 54
      7 14 21 28 35 42 49 56 63
      8 16 24 32 40 48 56 64 72
      9 18 27 36 45 54 63 72 81 ]
Rank-1 Matrices: They Are Data Sparse
    [ 1  2  3  4  5  6  7  8  9
      2  4  6  8 10 12 14 16 18
      3  6  9 12 15 18 21 24 27
      4  8 12 16 20 24 28 32 36
T =   5 10 15 20 25 30 35 40 45   = v vᵀ,   v = [1; 2; 3; 4; 5; 6; 7; 8; 9]
      6 12 18 24 30 36 42 48 54
      7 14 21 28 35 42 49 56 63
      8 16 24 32 40 48 56 64 72
      9 18 27 36 45 54 63 72 81 ]
The Matrix SVD
Expresses the matrix as a special sum of rank-1 matrices. If A ∈ IR^{nxn}, then
A = Σ_{k=1}^{n} σk uk vkᵀ
Here σ1 ≥ σ2 ≥ ... ≥ σr > σ_{r+1} = ... = σn = 0, and
U = [u1 u2 ... un],  V = [v1 v2 ... vn]
have columns that are mutually orthogonal.
The Matrix SVD: Nearness Problems
Expresses the matrix as a special sum of rank-1 matrices. If A ∈ IR^{nxn}, then
A = Σ_{k=1}^{n} σk uk vkᵀ
Here σ1 ≥ σ2 ≥ ... ≥ σr > σ_{r+1} = ... = σn = 0, and
U = [u1 u2 ... un],  V = [v1 v2 ... vn]
have columns that are mutually orthogonal.
The smallest singular value tells us how far A is from being rank deficient.
The Matrix SVD: Data Sparse Approximation
Expresses the matrix as a special sum of rank-1 matrices. If A ∈ IR^{nxn}, then
A ≈ Σ_{k=1}^{r} σk uk vkᵀ = A_r
Here σ1 ≥ σ2 ≥ ... ≥ σr > σ_{r+1} = ... = σn = 0, and
U = [u1 u2 ... un],  V = [v1 v2 ... vn]
have columns that are mutually orthogonal.
A_r is the closest matrix to A that has rank r. If r << n, then A_r is a data sparse approximation of A because O(nr) << O(n²).
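A sketch of this truncation (Eckart–Young) in NumPy, on a hypothetical nearly rank-5 matrix; the 2-norm error of the best rank-r approximation is exactly the next singular value:

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 50, 5
# a matrix that is rank 5 plus a tiny perturbation
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, n)) \
    + 1e-8 * rng.standard_normal((n, n))

U, s, Vt = np.linalg.svd(A)
Ar = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]   # A_r = sum_{k<=r} sigma_k u_k v_k^T

err = np.linalg.norm(A - Ar, 2)
print(np.isclose(err, s[r]))   # 2-norm error equals sigma_{r+1}
```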
There is a New Definition of Big
In Matrix Computations, to say that A ∈ IR^{n1 x n2} is big is to say that both n1 and n2 are big. E.g.,
n1 = 500000,  n2 = 100000.
In Tensor Computations, to say that A ∈ IR^{n1 x ... x nd} is big is to say that n1·n2···nd is big, and this need not require big nk. E.g.,
n1 = n2 = ... = n1000 = 2.
Why Data Sparse Tensor Approximation is Important
1. If you want to see the transition from Matrix-Based Scientific Computation to Tensor-Based Scientific Computation, you will need tensor algorithms that scale with d.
2. This requires a framework for low-rank tensor approximation.
3. This requires some kind of tensor-level SVD.
What is a Rank-1 Tensor? Think Matrix First
This:
R = [ r11 r12 ] = f gᵀ = [ f1 ] [ g1 g2 ] = [ f1g1 f1g2 ]
    [ r21 r22 ]          [ f2 ]             [ f2g1 f2g2 ]
Is the same as this:
         [ r11 ]   [ g1f1 ]
vec(R) = [ r21 ] = [ g1f2 ] = [ g1 ] ⊗ [ f1 ]
         [ r12 ]   [ g2f1 ]   [ g2 ]   [ f2 ]
         [ r22 ]   [ g2f2 ]
The Kronecker Product of Vectors
        [ x1 ]            [ x1y1 ]
x ⊗ y = [ x2 ] ⊗ [ y1 ] = [ x1y2 ]   [ x1·y ]
        [ x3 ]   [ y2 ]   [ x2y1 ] = [ x2·y ]
                          [ x2y2 ]   [ x3·y ]
                          [ x3y1 ]
                          [ x3y2 ]
So What is a Rank-1 Tensor?
R ∈ IR^{2x2x2} is rank-1 if there exist f, g, h ∈ IR² such that
         [ r111 ]   [ h1g1f1 ]
         [ r211 ]   [ h1g1f2 ]
         [ r121 ]   [ h1g2f1 ]
vec(R) = [ r221 ] = [ h1g2f2 ] = [ h1 ] ⊗ [ g1 ] ⊗ [ f1 ]
         [ r112 ]   [ h2g1f1 ]   [ h2 ]   [ g2 ]   [ f2 ]
         [ r212 ]   [ h2g1f2 ]
         [ r122 ]   [ h2g2f1 ]
         [ r222 ]   [ h2g2f2 ]
That is, r_ijk = h_k g_j f_i.
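A quick NumPy check of this identity: the Kronecker product of the three vectors is the column-major vectorization of the outer-product tensor (vector values are made up for illustration):

```python
import numpy as np

# vec(R) = h kron g kron f  <=>  r_ijk = f_i g_j h_k
f, g, h = np.array([1., 2.]), np.array([3., 4.]), np.array([5., 6.])

vecR = np.kron(h, np.kron(g, f))        # length-8 vector, index i fastest
R = np.einsum('i,j,k->ijk', f, g, h)    # the same rank-1 tensor as a 2x2x2 array

# column-major reshape recovers R from vec(R)
print(np.allclose(R, vecR.reshape(2, 2, 2, order='F')))
```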
What Might a Tensor SVD Look Like?
vec(R) = [ r111; r211; r121; r221; r112; r212; r122; r222 ]
       = h⁽¹⁾ ⊗ g⁽¹⁾ ⊗ f⁽¹⁾ + h⁽²⁾ ⊗ g⁽²⁾ ⊗ f⁽²⁾ + h⁽³⁾ ⊗ g⁽³⁾ ⊗ f⁽³⁾
A special sum of rank-1 tensors.
What Does the Matrix SVD Look Like?
This:
[ a11 a12 ] = [ u11 u12 ] [ σ1  0 ] [ v11 v12 ]ᵀ = σ1 [ u11 ] [ v11 ]ᵀ + σ2 [ u12 ] [ v12 ]ᵀ
[ a21 a22 ]   [ u21 u22 ] [  0 σ2 ] [ v21 v22 ]        [ u21 ] [ v21 ]       [ u22 ] [ v22 ]
Is the same as this:
         [ a11 ]      [ v11u11 ]      [ v12u12 ]
vec(A) = [ a21 ] = σ1 [ v11u21 ] + σ2 [ v12u22 ]
         [ a12 ]      [ v21u11 ]      [ v22u12 ]
         [ a22 ]      [ v21u21 ]      [ v22u22 ]
       = σ1 [ v11 ] ⊗ [ u11 ] + σ2 [ v12 ] ⊗ [ u12 ]
            [ v21 ]   [ u21 ]      [ v22 ]   [ u22 ]
What Might a Tensor SVD Look Like?
vec(R) = [ r111; r211; r121; r221; r112; r212; r122; r222 ]
       = h⁽¹⁾ ⊗ g⁽¹⁾ ⊗ f⁽¹⁾ + h⁽²⁾ ⊗ g⁽²⁾ ⊗ f⁽²⁾ + h⁽³⁾ ⊗ g⁽³⁾ ⊗ f⁽³⁾
A special sum of rank-1 tensors. Getting that special sum often requires multilinear optimization. We had better understand that before we proceed.
A Nearest Rank-1 Tensor Problem
Find σ ≥ 0 and
[ c1 ] = [ cos(θ1) ],  [ c2 ] = [ cos(θ2) ],  [ c3 ] = [ cos(θ3) ]
[ s1 ]   [ sin(θ1) ]   [ s2 ]   [ sin(θ2) ]   [ s3 ]   [ sin(θ3) ]
so that
φ(σ, θ1, θ2, θ3) = ‖ [ a111; a211; a121; a221; a112; a212; a122; a222 ] − σ [ c3 ] ⊗ [ c2 ] ⊗ [ c1 ] ‖₂
                                                                             [ s3 ]   [ s2 ]   [ s1 ]
is minimized.
A Nearest Rank-1 Tensor Problem
Find σ ≥ 0 and
[ c1 ] = [ cos(θ1) ],  [ c2 ] = [ cos(θ2) ],  [ c3 ] = [ cos(θ3) ]
[ s1 ]   [ sin(θ1) ]   [ s2 ]   [ sin(θ2) ]   [ s3 ]   [ sin(θ3) ]
so that
φ(σ, θ1, θ2, θ3) = ‖ [ a111; a211; a121; a221; a112; a212; a122; a222 ] − σ [ c3c2c1; c3c2s1; c3s2c1; c3s2s1; s3c2c1; s3c2s1; s3s2c1; s3s2s1 ] ‖₂
is minimized.
Alternating Least Squares
Freeze c2, s2, c3, and s3 and minimize, with respect to x1 = σc1 and y1 = σs1,
φ = ‖ [ a111; a211; a121; a221; a112; a212; a122; a222 ] − σ [ c3c2c1; c3c2s1; c3s2c1; c3s2s1; s3c2c1; s3c2s1; s3s2c1; s3s2s1 ] ‖₂
  = ‖ [ a111; a211; a121; a221; a112; a212; a122; a222 ] − M1 [ x1; y1 ] ‖₂,
where
M1 = [ c3c2 0; 0 c3c2; c3s2 0; 0 c3s2; s3c2 0; 0 s3c2; s3s2 0; 0 s3s2 ].
This is an ordinary linear least squares problem. We then get improved σ, c1, and s1 via
σ = sqrt(x1² + y1²),  [ c1; s1 ] = [ x1; y1 ] / σ.
Alternating Least Squares
Freeze c1, s1, c3, and s3 and minimize, with respect to x2 = σc2 and y2 = σs2,
φ = ‖ [ a111; a211; a121; a221; a112; a212; a122; a222 ] − σ [ c3c2c1; c3c2s1; c3s2c1; c3s2s1; s3c2c1; s3c2s1; s3s2c1; s3s2s1 ] ‖₂
  = ‖ [ a111; a211; a121; a221; a112; a212; a122; a222 ] − M2 [ x2; y2 ] ‖₂,
where
M2 = [ c3c1 0; c3s1 0; 0 c3c1; 0 c3s1; s3c1 0; s3s1 0; 0 s3c1; 0 s3s1 ].
This is an ordinary linear least squares problem. We then get improved σ, c2, and s2 via
σ = sqrt(x2² + y2²),  [ c2; s2 ] = [ x2; y2 ] / σ.
Alternating Least Squares
Freeze c1, s1, c2, and s2 and minimize, with respect to x3 = σc3 and y3 = σs3,
φ = ‖ [ a111; a211; a121; a221; a112; a212; a122; a222 ] − σ [ c3c2c1; c3c2s1; c3s2c1; c3s2s1; s3c2c1; s3c2s1; s3s2c1; s3s2s1 ] ‖₂
  = ‖ [ a111; a211; a121; a221; a112; a212; a122; a222 ] − M3 [ x3; y3 ] ‖₂,
where
M3 = [ c2c1 0; c2s1 0; s2c1 0; s2s1 0; 0 c2c1; 0 c2s1; 0 s2c1; 0 s2s1 ].
This is an ordinary linear least squares problem. We then get improved σ, c3, and s3 via
σ = sqrt(x3² + y3²),  [ c3; s3 ] = [ x3; y3 ] / σ.
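A compact sketch of the whole alternating scheme in NumPy. Instead of the slides' trig parametrization, each step solves for an unnormalized vector and renormalizes, which is equivalent (the least squares matrix above has orthonormal columns, so the solution is a simple contraction). All names here are ours:

```python
import numpy as np

# ALS for the nearest rank-1 tensor: freeze two unit vectors, solve an
# ordinary least squares problem for (sigma times) the third, renormalize.
rng = np.random.default_rng(2)
A = rng.standard_normal((2, 2, 2))

u1, u2, u3 = (np.array([1.0, 0.0]) for _ in range(3))
for sweep in range(50):
    for mode in range(3):
        # contract A against the two frozen vectors
        if mode == 0:
            x = np.einsum('ijk,j,k->i', A, u2, u3)
        elif mode == 1:
            x = np.einsum('ijk,i,k->j', A, u1, u3)
        else:
            x = np.einsum('ijk,i,j->k', A, u1, u2)
        sigma = np.linalg.norm(x)            # sigma = sqrt(x^2 + y^2)
        [u1, u2, u3][mode][:] = x / sigma    # improved unit vector

R = sigma * np.einsum('i,j,k->ijk', u1, u2, u3)
print(np.linalg.norm(A - R) <= np.linalg.norm(A))   # the fit never makes things worse
```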
Componentwise Optimization
A Common Framework for Tensor-Related Optimization:
Choose a subset of the unknowns such that if they are (temporarily) fixed, then we are presented with some standard matrix problem in the remaining unknowns. By choosing different subsets, cycle through all the unknowns. Repeat until converged.
The standard matrix problem that we end up solving is usually some kind of linear least squares problem.
We Are Now Ready For This!
A = U Σ Vᵀ
That is, we are ready to look at SVD ideas at the tensor level.
The Higher-Order SVD
Motivation: In the matrix case, if A ∈ IR^{n1 x n2} and A = U1 S U2ᵀ, then
vec(A) = Σ_{j1=1}^{n1} Σ_{j2=1}^{n2} S(j1, j2) · U2(:, j2) ⊗ U1(:, j1)
We are able to choose orthogonal U1 and U2 so that S = U1ᵀ A U2 is diagonal.
The Higher-Order SVD
Definition: Given A ∈ IR^{n1 x n2 x n3}, compute the SVDs of the modal unfoldings
A_(1) = U1 Σ1 V1ᵀ,   A_(2) = U2 Σ2 V2ᵀ,   A_(3) = U3 Σ3 V3ᵀ
and then compute S ∈ IR^{n1 x n2 x n3} so that
vec(A) = Σ_{j1=1}^{n1} Σ_{j2=1}^{n2} Σ_{j3=1}^{n3} S(j1, j2, j3) · U3(:, j3) ⊗ U2(:, j2) ⊗ U1(:, j1)
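A minimal NumPy sketch of this recipe (the `unfold` helper is ours; the core is obtained by multiplying each mode by Ukᵀ, and the expansion above then reconstructs A exactly):

```python
import numpy as np

def unfold(A, k):
    return np.reshape(np.moveaxis(A, k, 0), (A.shape[k], -1), order='F')

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3, 2))

# Uk = left singular vectors of the mode-k unfolding
U = [np.linalg.svd(unfold(A, k))[0] for k in range(3)]

# core tensor: S = A  x1 U1^T  x2 U2^T  x3 U3^T
S = np.einsum('ijk,ia,jb,kc->abc', A, U[0], U[1], U[2])

# reconstruction: A = S  x1 U1  x2 U2  x3 U3
A2 = np.einsum('abc,ia,jb,kc->ijk', S, U[0], U[1], U[2])
print(np.allclose(A, A2))
```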
Recall...
The mode-1, mode-2, and mode-3 unfoldings of A ∈ IR^{4x3x2}:
A_(1) = [ a111 a121 a131 a112 a122 a132
          a211 a221 a231 a212 a222 a232
          a311 a321 a331 a312 a322 a332
          a411 a421 a431 a412 a422 a432 ]   columns indexed by (j,k) = (1,1),(2,1),(3,1),(1,2),(2,2),(3,2)
A_(2) = [ a111 a211 a311 a411 a112 a212 a312 a412
          a121 a221 a321 a421 a122 a222 a322 a422
          a131 a231 a331 a431 a132 a232 a332 a432 ]   columns indexed by (i,k) = (1,1),(2,1),(3,1),(4,1),(1,2),(2,2),(3,2),(4,2)
A_(3) = [ a111 a211 a311 a411 a121 a221 a321 a421 a131 a231 a331 a431
          a112 a212 a312 a412 a122 a222 a322 a422 a132 a232 a332 a432 ]   columns indexed by (i,j) = (1,1),(2,1),(3,1),(4,1),(1,2),...,(4,3)
The Truncated Higher-Order SVD
The HO-SVD:
vec(A) = Σ_{j1=1}^{n1} Σ_{j2=1}^{n2} Σ_{j3=1}^{n3} S(j1, j2, j3) · U3(:, j3) ⊗ U2(:, j2) ⊗ U1(:, j1)
The core tensor S is not diagonal, but its entries get smaller as you move away from the (1,1,1) entry.
The Truncated HO-SVD:
vec(A) ≈ Σ_{j1=1}^{r1} Σ_{j2=1}^{r2} Σ_{j3=1}^{r3} S(j1, j2, j3) · U3(:, j3) ⊗ U2(:, j2) ⊗ U1(:, j1)
The Tucker Nearness Problem
Assume that A ∈ IR^{n1 x n2 x n3}. Given integers r1, r2, and r3, compute
U1: n1 x r1, orthonormal columns
U2: n2 x r2, orthonormal columns
U3: n3 x r3, orthonormal columns
and a tensor S ∈ IR^{r1 x r2 x r3} so that
‖ vec(A) − Σ_{j1=1}^{r1} Σ_{j2=1}^{r2} Σ_{j3=1}^{r3} S(j1, j2, j3) · U3(:, j3) ⊗ U2(:, j2) ⊗ U1(:, j1) ‖₂
is minimized.
Componentwise Optimization
1. Fix U2 and U3 and minimize with respect to S and U1:
   ‖ vec(A) − Σ_{j1=1}^{r1} Σ_{j2=1}^{r2} Σ_{j3=1}^{r3} S(j1, j2, j3) · U3(:, j3) ⊗ U2(:, j2) ⊗ U1(:, j1) ‖₂
2. Fix U1 and U3 and minimize with respect to S and U2:
   ‖ vec(A) − Σ_{j1=1}^{r1} Σ_{j2=1}^{r2} Σ_{j3=1}^{r3} S(j1, j2, j3) · U3(:, j3) ⊗ U2(:, j2) ⊗ U1(:, j1) ‖₂
3. Fix U1 and U2 and minimize with respect to S and U3:
   ‖ vec(A) − Σ_{j1=1}^{r1} Σ_{j2=1}^{r2} Σ_{j3=1}^{r3} S(j1, j2, j3) · U3(:, j3) ⊗ U2(:, j2) ⊗ U1(:, j1) ‖₂
The CP-Decomposition
It also goes by the name of the CANDECOMP/PARAFAC Decomposition.
CANDECOMP = Canonical Decomposition
PARAFAC = Parallel Factors Decomposition
A Different Kind of Rank-1 Summation
The Tucker representation
vec(A) = Σ_{j1=1}^{r1} Σ_{j2=1}^{r2} Σ_{j3=1}^{r3} S(j1, j2, j3) · U3(:, j3) ⊗ U2(:, j2) ⊗ U1(:, j1)
uses orthogonal U1, U2, and U3. The CP representation
vec(A) = Σ_{j=1}^{r} λj · U3(:, j) ⊗ U2(:, j) ⊗ U1(:, j)
uses nonorthogonal U1, U2, and U3. The smallest possible r is called the rank of A.
Tensor Rank is Trickier than Matrix Rank
If [ a111; a211; a121; a221; a112; a212; a122; a222 ] = randn(8,1), then
rank = 2 with probability 79%,  rank = 3 with probability 21%.
This is different from the matrix case: if A = randn(n,n), then rank(A) = n with probability 1.
Componentwise Optimization
Fix r ≤ rank(A) and minimize:
‖ vec(A) − Σ_{j=1}^{r} λj · U3(:, j) ⊗ U2(:, j) ⊗ U1(:, j) ‖₂
Improve U1 and the λj by fixing U2 and U3 and minimizing
‖ vec(A) − Σ_{j=1}^{r} λj · U3(:, j) ⊗ U2(:, j) ⊗ U1(:, j) ‖₂
Etc. The component optimizations are highly structured least squares problems.
The Tensor Train Decomposition
Idea: Approximate a high-order tensor with a collection of order-3 tensors. Each order-3 tensor is connected to its left and right neighbor through a simple summation.
An example of a tensor network.
Tensor Train: An Example
Given the carriages...
G1: n1 x r1,  G2: r1 x n2 x r2,  G3: r2 x n3 x r3,  G4: r3 x n4 x r4,  G5: r4 x n5
we define the train A(1:n1, 1:n2, 1:n3, 1:n4, 1:n5) by
A(i1, i2, i3, i4, i5) = Σ_{k1=1}^{r1} Σ_{k2=1}^{r2} Σ_{k3=1}^{r3} Σ_{k4=1}^{r4} G1(i1,k1) G2(k1,i2,k2) G3(k2,i3,k3) G4(k3,i4,k4) G5(k4,i5)
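The quadruple sum collapses into a chain of small matrix-vector products, which is exactly why tensor trains are cheap to evaluate. A sketch with made-up carriages (the helper `tt_entry` is ours):

```python
import numpy as np

# Evaluate one entry of a tensor train with carriages
# G1: n1 x r1, G2: r1 x n2 x r2, G3: r2 x n3 x r3, G4: r3 x n4 x r4, G5: r4 x n5.
def tt_entry(G1, G2, G3, G4, G5, i):
    i1, i2, i3, i4, i5 = i
    v = G1[i1, :]          # length r1
    v = v @ G2[:, i2, :]   # length r2
    v = v @ G3[:, i3, :]   # length r3
    v = v @ G4[:, i4, :]   # length r4
    return v @ G5[:, i5]   # scalar: the sum over k1, k2, k3, k4

rng = np.random.default_rng(4)
n, r = 3, 2
G1 = rng.standard_normal((n, r))
G2, G3, G4 = (rng.standard_normal((r, n, r)) for _ in range(3))
G5 = rng.standard_normal((r, n))

a = tt_entry(G1, G2, G3, G4, G5, (0, 1, 2, 1, 0))   # the entry A(1,2,3,2,1)
```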
Tensor Train: An Example
Given the carriages...
G1: n1 x r,  G2: r x n2 x r,  G3: r x n3 x r,  G4: r x n4 x r,  G5: r x n5
A(i1, i2, i3, i4, i5) = Σ_{k1=1}^{r} Σ_{k2=1}^{r} Σ_{k3=1}^{r} Σ_{k4=1}^{r} G1(i1,k1) G2(k1,i2,k2) G3(k2,i3,k3) G4(k3,i4,k4) G5(k4,i5)
Data Sparse: O(nr²) instead of O(n⁵).
The Kronecker Product SVD
A way to obtain a data sparse representation of an order-4 tensor. It is based on the Kronecker product of matrices, e.g.,
    [ u11 u12 ]       [ u11·V  u12·V ]
A = [ u21 u22 ] ⊗ V = [ u21·V  u22·V ]
    [ u31 u32 ]       [ u31·V  u32·V ]
and the fact that an order-4 tensor is a reshaped block matrix, e.g.,
A(i1, i2, i3, i4) = U(i1, i2) V(i3, i4)
Kronecker Products are Data Sparse
If B and C are n-by-n, then B ⊗ C is n²-by-n². Thus, we need O(n²) numbers to describe an O(n⁴) object.
The Nearest Kronecker Product Problem
Find B and C so that ‖A − B ⊗ C‖_F = min:
   [ a11 a12 a13 a14 ]
   [ a21 a22 a23 a24 ]    [ b11 b12 ]
‖  [ a31 a32 a33 a34 ] −  [ b21 b22 ] ⊗ [ c11 c12 ]  ‖_F
   [ a41 a42 a43 a44 ]    [ b31 b32 ]   [ c21 c22 ]
   [ a51 a52 a53 a54 ]
   [ a61 a62 a63 a64 ]
After rearranging the blocks (one row per vectorized block), this equals
   [ a11 a21 a12 a22 ]    [ b11 ]
   [ a31 a41 a32 a42 ]    [ b21 ]
‖  [ a51 a61 a52 a62 ] −  [ b31 ] [ c11 c21 c12 c22 ]  ‖_F
   [ a13 a23 a14 a24 ]    [ b12 ]
   [ a33 a43 a34 a44 ]    [ b22 ]
   [ a53 a63 a54 a64 ]    [ b32 ]
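So the nearest Kronecker product is a nearest rank-1 problem for the rearranged matrix, solvable with one SVD. A NumPy sketch, tested on a hypothetical matrix that is exactly a Kronecker product so the fit is exact:

```python
import numpy as np

rng = np.random.default_rng(5)
m1, n1, m2, n2 = 3, 2, 2, 2   # B is m1 x n1, C is m2 x n2
A = np.kron(rng.standard_normal((m1, n1)), rng.standard_normal((m2, n2)))

# Rearrangement R(A): one row per block, each row = vec of that block,
# blocks taken in column-major order so rows line up with vec(B).
R = np.array([A[i*m2:(i+1)*m2, j*n2:(j+1)*n2].flatten(order='F')
              for j in range(n1) for i in range(m1)])

# dominant rank-1 term of R(A) gives vec(B) vec(C)^T
U, s, Vt = np.linalg.svd(R)
B = np.sqrt(s[0]) * U[:, 0].reshape(m1, n1, order='F')
C = np.sqrt(s[0]) * Vt[0, :].reshape(m2, n2, order='F')

print(np.allclose(A, np.kron(B, C)))   # exact here, since rank(R(A)) = 1
```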
The Kronecker Product SVD
If
    [ A11 ... A1n ]
A = [  :       :  ],   Aij ∈ IR^{nxn},
    [ An1 ... Ann ]
then there exist U1, ..., Ur ∈ IR^{nxn}, V1, ..., Vr ∈ IR^{nxn}, and scalars σ1 ≥ ... ≥ σr > 0 such that
A = Σ_{k=1}^{r} σk Uk ⊗ Vk.
A Tensor Approximation Idea
Unfold A ∈ IR^{n x n x n x n} into an n²-by-n² matrix A. Express A as a sum of Kronecker products:
A = Σ_{k=1}^{r} σk Bk ⊗ Ck,   Bk, Ck ∈ IR^{nxn}
Back to tensor:
A(i1, i2, j1, j2) = Σ_{k=1}^{r} σk Ck(i1, i2) Bk(j1, j2)
Sums of tensor products of matrices instead of vectors. O(n²r).
The Higher-Order Generalized Singular Value Decomposition
We are given a collection of m-by-n data matrices {A1, ..., AN}, each of which has full column rank. Do an SVD-like reduction on all of them simultaneously,
A1 = U1 Σ1 Vᵀ
 :
AN = UN ΣN Vᵀ
with a shared V, in a way that exposes common features.
The 2-Matrix GSVD
If A1 ∈ IR^{m1 x n} and A2 ∈ IR^{m2 x n} (shown here for n = 3), then there exist orthogonal U1, orthogonal U2, and nonsingular X so that
            [ c1  0  0 ]              [ s1  0  0 ]
            [  0 c2  0 ]              [  0 s2  0 ]
U1ᵀ A1 X =  [  0  0 c3 ] = Σ1,   U2ᵀ A2 X =  [  0  0 s3 ] = Σ2.
            [  0  0  0 ]              [  0  0  0 ]
            [  0  0  0 ]              [  0  0  0 ]
The Higher-Order GSVD Framework
1. Compute V⁻¹ S_N V = diag(λi) where
S_N = (1 / (N(N−1))) Σ_{i=1}^{N} Σ_{j=i+1}^{N} ( (Aiᵀ Ai)(Ajᵀ Aj)⁻¹ + (Ajᵀ Aj)(Aiᵀ Ai)⁻¹ ).
2. For k = 1:N compute Ak V⁻ᵀ = Uk Σk where the Uk have unit 2-norm columns and the Σk are diagonal.
The eigenvalues of S_N are never smaller than 1.
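A small NumPy sketch that forms S_N for a few hypothetical full-column-rank matrices and checks the eigenvalue claim numerically:

```python
import numpy as np

rng = np.random.default_rng(6)
N, m, n = 3, 10, 4
As = [rng.standard_normal((m, n)) for _ in range(N)]   # full column rank w.p. 1
G = [A.T @ A for A in As]                              # Gram matrices A_i^T A_i

S = np.zeros((n, n))
for i in range(N):
    for j in range(i + 1, N):
        S += G[i] @ np.linalg.inv(G[j]) + G[j] @ np.linalg.inv(G[i])
S /= N * (N - 1)

lam = np.linalg.eigvals(S)
print(np.all(lam.real >= 1 - 1e-8))   # eigenvalues of S_N are never below 1
```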
The Common HO-GSVD Subspace: Definition
The eigenvectors associated with the unit eigenvalues of S_N define the common HO-GSVD subspace:
HO-GSVD(A1, ..., AN) = { v : S_N v = v }
We are able to stably compute this without ever forming S_N explicitly, via a sequence of 2-matrix GSVDs.
The Common HO-GSVD Subspace: Relevance
In general, we have these rank-1 expansions
Ak = Uk Σk Vᵀ = Σ_{i=1}^{n} σi⁽ᵏ⁾ ui⁽ᵏ⁾ viᵀ,   k = 1:N,
where V = [v1, ..., vn]. But if (say) HO-GSVD(A1, ..., AN) = span{v1, v2}, then
Ak = σ1⁽ᵏ⁾ u1⁽ᵏ⁾ v1ᵀ + σ2⁽ᵏ⁾ u2⁽ᵏ⁾ v2ᵀ + Σ_{i=3}^{n} σi⁽ᵏ⁾ ui⁽ᵏ⁾ viᵀ,   k = 1:N,
and {u1⁽ᵏ⁾, u2⁽ᵏ⁾} is an orthonormal set orthogonal to span{u3⁽ᵏ⁾, ..., un⁽ᵏ⁾}. Moreover, u1⁽ᵏ⁾ and u2⁽ᵏ⁾ are left singular vectors for Ak.
This expansion identifies features that are common across the datasets A1, ..., AN.
The Pivoted Cholesky Decomposition
PAPᵀ = LDLᵀ, where L is unit lower triangular, D is diagonal, and the permutation P is chosen step by step so that the largest remaining diagonal entry becomes the next pivot.
We will use this on a problem where the tensor has multiple symmetries and unfolds to a highly structured positive semidefinite matrix with multiple symmetries.
The Two-Electron Integral Tensor (TEI)
Given a basis {φi(r), i = 1:n} of atomic orbital functions, we consider the following order-4 tensor:
A(p, q, r, s) = ∫_{R³} ∫_{R³} φp(r1) φq(r1) φr(r2) φs(r2) / ‖r1 − r2‖ dr1 dr2.
The TEI tensor plays an important role in electronic structure theory and ab initio quantum chemistry.
The TEI tensor has these symmetries:
A(p, q, r, s) = A(q, p, r, s)   (i)
A(p, q, r, s) = A(p, q, s, r)   (ii)
A(p, q, r, s) = A(r, s, p, q)   (iii)
We say that A is ((12)(34))-symmetric.
The [1,2]-by-[3,4] Unfolding of a ((12)(34))-Symmetric A
If A = A_{[1,2]x[3,4]}, then A is symmetric and (among other things) is perfect shuffle symmetric.
    [ 11 12 13 12 14 15 13 15 16
      12 17 18 17 19 20 18 20 21
      13 18 22 18 23 24 22 24 25
      12 17 18 17 19 20 18 20 21
A =   14 19 23 19 26 27 23 27 28
      15 20 24 20 27 29 24 29 30
      13 18 22 18 23 24 22 24 25
      15 20 24 20 27 29 24 29 30
      16 21 25 21 28 30 25 30 31 ]
Each column reshapes into a 3x3 symmetric matrix, e.g., A(:,1) reshapes to
[ 11 12 13
  12 14 15
  13 15 16 ]
What is perfect shuffle symmetry?
Perfect Shuffle Symmetry
An n²-by-n² matrix A has perfect shuffle symmetry if
A = Π_{n,n} A Π_{n,n}
where
Π_{n,n} = I_{n²}(:, v),   v = [ 1:n:n²  2:n:n²  ...  n:n:n² ].
e.g.,
          [ 1 0 0 0 0 0 0 0 0
            0 0 0 1 0 0 0 0 0
            0 0 0 0 0 0 1 0 0
            0 1 0 0 0 0 0 0 0
Π_{3,3} =   0 0 0 0 1 0 0 0 0
            0 0 0 0 0 0 0 1 0
            0 0 1 0 0 0 0 0 0
            0 0 0 0 0 1 0 0 0
            0 0 0 0 0 0 0 0 1 ]
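A sketch of building Π_{n,n} in NumPy (0-based version of the v above) and checking its defining action, Π_{n,n} vec(X) = vec(Xᵀ); the helper name is ours:

```python
import numpy as np

def perfect_shuffle(n):
    # 0-based version of v = [1:n:n^2, 2:n:n^2, ..., n:n:n^2]
    v = np.concatenate([np.arange(j, n * n, n) for j in range(n)])
    return np.eye(n * n)[:, v]

P = perfect_shuffle(3)
B = np.random.default_rng(7).standard_normal((3, 3))

# the shuffle maps vec(X) to vec(X^T), i.e. it swaps Kronecker factors
x = B.flatten(order='F')
print(np.allclose(P @ x, B.T.flatten(order='F')))
```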
Structured Low-Rank Approximation
We have an n²-by-n² matrix A that is symmetric and perfect shuffle symmetric, and it basically has rank n. Using PAPᵀ = LDLᵀ we are able to write
A = Σ_{k=1}^{n} dk uk ukᵀ
where each rank-1 term is symmetric and perfect shuffle symmetric.
This structured data-sparse representation reduces work by an order of magnitude in the application we are considering.
Notation: The Challenge
Scientific computing is increasingly tensor-based. It is hard to spread the word about tensor computations because summations, transpositions, and symmetries are typically described through multiple indices. And different camps have very different notations, e.g.
t(i1, i2, i3, i4, i5) = Σ_{j1, j2, j3, j4} a(i1, j1) b(j1, i2, j2) c(j2, i3, j3) d(j3, i4, j4) e(j4, i5)
Brevity is the Soul of Wit
Multiple Summations:
Σ_{j=1}^{n},   Σ_{j1=1}^{n1} ··· Σ_{jd=1}^{nd}
Transposition:
If T = [2 1 4 3], then B = A^T means B(i1, i2, i3, i4) = A(i2, i1, i4, i3).
Contractions:
For all 1 ≤ i ≤ m and 1 ≤ j ≤ n:  A(i, j) = Σ_{k=1}^{p} B(i, k) C(k, j).
From Jacobi's 1846 Eigenvalue Paper
A system of linear equations:
(a,a)α + (a,b)β + (a,c)γ + ··· + (a,p)ω = αx
(b,a)α + (b,b)β + (b,c)γ + ··· + (b,p)ω = βx
  ⋮
(p,a)α + (p,b)β + (p,c)γ + ··· + (p,p)ω = ωx
Somewhere between 1846 and the present we picked up conventional matrix-vector notation: Ax = b.
How did the transition from scalar notation to matrix-vector notation happen?
The Next Big Thing...
Scalar-Level Thinking
  — 1960s: the factorization paradigm: LU, LDLᵀ, QR, UΣVᵀ, etc. —
Matrix-Level Thinking
  — 1980s: cache utilization, parallel computing, LAPACK, etc. —
Block Matrix-Level Thinking
  — 2000s: high-dimensional modeling, cheap storage, good notation, etc. —
Tensor-Level Thinking