From Matrix to Tensor. Charles F. Van Loan


From Matrix to Tensor
Charles F. Van Loan, Department of Computer Science
January 28, 2016
From Matrix to Tensor From Tensor To Matrix 1 / 68

What is a Tensor? Instead of just A(i, j) it's A(i, j, k) or A(i_1, i_2, ..., i_d). From Matrix to Tensor From Tensor To Matrix 2 / 68

Where Might They Come From?
Discretization: A(i, j, k, l) might house the value of f(w, x, y, z) at (w, x, y, z) = (w_i, x_j, y_k, z_l).
High-Dimension Evaluations: Given a basis {φ_i(r)}_{i=1}^{n},
A(p, q, r, s) = ∫_{R^3} ∫_{R^3} φ_p(r_1) φ_q(r_1) φ_r(r_2) φ_s(r_2) / |r_1 − r_2| dr_1 dr_2.
Multiway Analysis: A(i, j, k, l) is a value that captures an interaction between four variables/factors.
From Matrix to Tensor From Tensor To Matrix 3 / 68
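As a small illustration of the discretization case, here is a MATLAB sketch that fills such a tensor on a grid; the function f and the grids are made up for the example.

```matlab
% Minimal sketch (hypothetical f and grids): build A(i,j,k,l) = f(w_i, x_j, y_k, z_l).
f = @(w,x,y,z) exp(-(w.^2 + x.^2 + y.^2 + z.^2));    % an arbitrary smooth function
w = linspace(0,1,10); x = linspace(0,1,12); y = linspace(0,1,8); z = linspace(0,1,6);
[W,X,Y,Z] = ndgrid(w,x,y,z);   % grids laid out so A(i,j,k,l) pairs with (w_i,x_j,y_k,z_l)
A = f(W,X,Y,Z);                % a 10-by-12-by-8-by-6 tensor of samples
size(A)
```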

You May Have Seen Them Before... Here is a 3x3 block matrix with 2x2 blocks:

A = [ a11 a12 | a13 a14 | a15 a16
      a21 a22 | a23 a24 | a25 a26
      a31 a32 | a33 a34 | a35 a36
      a41 a42 | a43 a44 | a45 a46
      a51 a52 | a53 a54 | a55 a56
      a61 a62 | a63 a64 | a65 a66 ]

This is a reshaping of a 2 x 2 x 3 x 3 tensor: matrix entry a45 is the (2,1) entry of the (2,3) block, i.e., matrix entry a45 is A(2, 3, 2, 1).
From Matrix to Tensor From Tensor To Matrix 4 / 68

A Tensor Has Parts. A matrix has columns and rows. A tensor has fibers. A fiber of a tensor A is a vector obtained by fixing all but one of A's indices. Given A = A(1:3, 1:5, 1:4, 1:7), here is a mode-2 fiber:

A(2, 1:5, 4, 6) = [ A(2, 1, 4, 6); A(2, 2, 4, 6); A(2, 3, 4, 6); A(2, 4, 4, 6); A(2, 5, 4, 6) ]

This is the (2,4,6) mode-2 fiber.
From Matrix to Tensor From Tensor To Matrix 5 / 68

Fibers Can Be Assembled Into a Matrix. The mode-1, mode-2, and mode-3 unfoldings of A ∈ R^(4x3x2):

A_(1) = [ a111  a121  a131  a112  a122  a132
          a211  a221  a231  a212  a222  a232
          a311  a321  a331  a312  a322  a332
          a411  a421  a431  a412  a422  a432 ]    columns indexed by (j,k) = (1,1),(2,1),(3,1),(1,2),(2,2),(3,2)

A_(2) = [ a111  a211  a311  a411  a112  a212  a312  a412
          a121  a221  a321  a421  a122  a222  a322  a422
          a131  a231  a331  a431  a132  a232  a332  a432 ]    columns indexed by (i,k) = (1,1),(2,1),(3,1),(4,1),(1,2),(2,2),(3,2),(4,2)

A_(3) = [ a111  a211  a311  a411  a121  a221  a321  a421  a131  a231  a331  a431
          a112  a212  a312  a412  a122  a222  a322  a422  a132  a232  a332  a432 ]    columns indexed by (i,j) = (1,1),...,(4,3)

From Matrix to Tensor From Tensor To Matrix 6 / 68
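As a concrete check of the layouts above, here is a plain-MATLAB sketch (no toolbox assumed, variable names mine) that builds the three modal unfoldings of a 4-by-3-by-2 tensor with permute and column-major reshape.

```matlab
% Modal unfoldings of a 4x3x2 tensor via permute + reshape.
A  = randn(4,3,2);
A1 = reshape(A, 4, 6);                     % mode-1: rows i, columns (j,k), j fastest
A2 = reshape(permute(A,[2 1 3]), 3, 8);    % mode-2: rows j, columns (i,k), i fastest
A3 = reshape(permute(A,[3 1 2]), 2, 12);   % mode-3: rows k, columns (i,j), i fastest
% Spot check: A(2,3,1) should sit in row 2, column 3 of the mode-1 unfolding.
A1(2,3) - A(2,3,1)                         % = 0
```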

There are Many Ways to Unfold a Given Tensor. Here is one way to unfold A(1:2, 1:3, 1:2, 1:2, 1:3), with rows indexed by (i1,i2,i3) and columns by (i4,i5):

              (1,1)    (2,1)    (1,2)    (2,2)    (1,3)    (2,3)
(1,1,1)  B = [ a11111  a11121  a11112  a11122  a11113  a11123
(2,1,1)        a21111  a21121  a21112  a21122  a21113  a21123
(1,2,1)        a12111  a12121  a12112  a12122  a12113  a12123
(2,2,1)        a22111  a22121  a22112  a22122  a22113  a22123
(1,3,1)        a13111  a13121  a13112  a13122  a13113  a13123
(2,3,1)        a23111  a23121  a23112  a23122  a23113  a23123
(1,1,2)        a11211  a11221  a11212  a11222  a11213  a11223
(2,1,2)        a21211  a21221  a21212  a21222  a21213  a21223
(1,2,2)        a12211  a12221  a12212  a12222  a12213  a12223
(2,2,2)        a22211  a22221  a22212  a22222  a22213  a22223
(1,3,2)        a13211  a13221  a13212  a13222  a13213  a13223
(2,3,2)        a23211  a23221  a23212  a23222  a23213  a23223 ]

With the Matlab Tensor Toolbox: B = tenmat(A,[1 2 3],[4 5])
From Matrix to Tensor From Tensor To Matrix 7 / 68
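Without the toolbox, the same kind of "row modes / column modes" unfolding can be sketched with permute and reshape; treating the claim that this reproduces the tenmat layout above (first listed mode varying fastest) as an assumption:

```matlab
% Plain-MATLAB sketch of an unfolding with row modes [1 2 3] and column modes [4 5].
A = randn(2,3,2,2,3);
rmodes = [1 2 3];  cmodes = [4 5];
n = size(A);
B = reshape(permute(A, [rmodes cmodes]), prod(n(rmodes)), prod(n(cmodes)));
size(B)   % 12-by-6
```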

There are Many Ways to Unfold a Given Tensor.
tenmat(a,[1 2 3],[4 5])   tenmat(a,[1 2 4],[3 5])   tenmat(a,[1 2 5],[3 4])
tenmat(a,[1 3 4],[2 5])   tenmat(a,[1 3 5],[2 4])   tenmat(a,[1 4 5],[2 3])
tenmat(a,[2 3 4],[1 5])   tenmat(a,[2 3 5],[1 4])   tenmat(a,[2 4 5],[1 3])
tenmat(a,[3 4 5],[1 2])   tenmat(a,[4 5],[1 2 3])   tenmat(a,[3 5],[1 2 4])
tenmat(a,[3 4],[1 2 5])   tenmat(a,[2 5],[1 3 4])   tenmat(a,[2 4],[1 3 5])
tenmat(a,[2 3],[1 4 5])   tenmat(a,[1 5],[2 3 4])   tenmat(a,[1 4],[2 3 5])
tenmat(a,[1 3],[2 4 5])   tenmat(a,[1 2],[3 4 5])   tenmat(a,[1],[2 3 4 5])
tenmat(a,[2],[1 3 4 5])   tenmat(a,[3],[1 2 4 5])   tenmat(a,[4],[1 2 3 5])
tenmat(a,[5],[1 2 3 4])   tenmat(a,[2 3 4 5],[1])   tenmat(a,[1 3 4 5],[2])
tenmat(a,[1 2 4 5],[3])   tenmat(a,[1 2 3 5],[4])   tenmat(a,[1 2 3 4],[5])
Choice makes life complicated...
From Matrix to Tensor From Tensor To Matrix 8 / 68

Paradigm for Much of Tensor Computations To say something about a tensor A: 1. Thoughtfully unfold tensor A into a matrix A. 2. Use classical matrix computations to discover something interesting/useful about matrix A. 3. Map your insights back to tensor A. Computing (parts of) decompositions is how we do this in classical matrix computations. From Matrix to Tensor From Tensor To Matrix 9 / 68

Matrix Factorizations and Decompositions

A = UΣV^T   PA = LU   A = QR   A = GG^T   PAP^T = LDL^T   Q^T AQ = D   X^{-1}AX = J   U^T AU = T   AP = QR   A = ULV^T   PAQ^T = LU   (repeated over and over, filling the slide)

It's a Language

From Matrix to Tensor From Tensor To Matrix 10 / 68


The Singular Value Decomposition. Perhaps the most versatile and important of all the different matrix decompositions is the SVD:

[a11 a12; a21 a22] = [c1 -s1; s1 c1] [σ1 0; 0 σ2] [c2 -s2; s2 c2]^T
                   = σ1 [c1; s1] [c2 s2] + σ2 [-s1; c1] [-s2 c2]

where c1^2 + s1^2 = 1 and c2^2 + s2^2 = 1. This is a very special sum of rank-1 matrices.
From Matrix to Tensor From Tensor To Matrix 12 / 68

Rank-1 Matrices: You Have Seen Them Before

T = [ 1   2   3   4   5   6   7   8   9
      2   4   6   8  10  12  14  16  18
      3   6   9  12  15  18  21  24  27
      4   8  12  16  20  24  28  32  36
      5  10  15  20  25  30  35  40  45
      6  12  18  24  30  36  42  48  54
      7  14  21  28  35  42  49  56  63
      8  16  24  32  40  48  56  64  72
      9  18  27  36  45  54  63  72  81 ]

From Matrix to Tensor From Tensor To Matrix 13 / 68

Rank-1 Matrices: They Are Data Sparse

T = (the 9-by-9 multiplication table above) = v v^T,   v = [1; 2; 3; 4; 5; 6; 7; 8; 9]

From Matrix to Tensor From Tensor To Matrix 14 / 68

The Matrix SVD. Expresses the matrix as a special sum of rank-1 matrices. If A ∈ R^(n x n) then

A = Σ_{k=1}^{n} σ_k u_k v_k^T

Here σ_1 ≥ σ_2 ≥ ... ≥ σ_r > σ_{r+1} = ... = σ_n = 0 and U = [u_1 u_2 ... u_n], V = [v_1 v_2 ... v_n] have columns that are mutually orthogonal.
From Matrix to Tensor From Tensor To Matrix 15 / 68

The Matrix SVD: Nearness Problems. Expresses the matrix as a special sum of rank-1 matrices. If A ∈ R^(n x n) then

A = Σ_{k=1}^{n} σ_k u_k v_k^T

Here σ_1 ≥ σ_2 ≥ ... ≥ σ_r > σ_{r+1} = ... = σ_n = 0 and U = [u_1 u_2 ... u_n], V = [v_1 v_2 ... v_n] have columns that are mutually orthogonal. The smallest nonzero singular value σ_r: that's how far A is from being rank deficient.
From Matrix to Tensor From Tensor To Matrix 16 / 68

The Matrix SVD: Data Sparse Approximation. Expresses the matrix as a special sum of rank-1 matrices. If A ∈ R^(n x n) then

A ≈ Σ_{k=1}^{r} σ_k u_k v_k^T = A_r

Here σ_1 ≥ σ_2 ≥ ... ≥ σ_r > σ_{r+1} = ... = σ_n = 0 and U = [u_1 u_2 ... u_n], V = [v_1 v_2 ... v_n] have columns that are mutually orthogonal. A_r is the closest matrix to A that has rank r. If r << n, then that is a data sparse approximation of A because O(nr) << O(n^2).
From Matrix to Tensor From Tensor To Matrix 17 / 68
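A small MATLAB sketch of that truncation on random data (illustrative only): keep the first r singular triplets and the 2-norm error is σ_{r+1}.

```matlab
% Best rank-r approximation A_r from the SVD; the 2-norm error equals sigma_{r+1}.
n = 50;  r = 5;
A = randn(n, r) * randn(r, n) + 1e-3 * randn(n, n);   % a nearly rank-r test matrix
[U, S, V] = svd(A);
Ar = U(:,1:r) * S(1:r,1:r) * V(:,1:r)';
[norm(A - Ar), S(r+1, r+1)]                           % the two numbers agree
```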

There is a New Definition of Big. In Matrix Computations, to say that A ∈ R^(n1 x n2) is big is to say that both n1 and n2 are big, e.g., n1 = 500000, n2 = 100000. In Tensor Computations, to say that A ∈ R^(n1 x ... x nd) is big is to say that n1*n2*...*nd is big, and this need not require big nk, e.g., n1 = n2 = ... = n1000 = 2.
From Matrix to Tensor From Tensor To Matrix 18 / 68

Why Data Sparse Tensor Approximation is Important. 1. If you want to see the transition from Matrix-Based Scientific Computation to Tensor-Based Scientific Computation, you will need tensor algorithms that scale with d. 2. This requires a framework for low-rank tensor approximation. 3. This requires some kind of tensor-level SVD.
From Matrix to Tensor From Tensor To Matrix 19 / 68

What is a Rank-1 Tensor? Think Matrix First. This:

R = [r11 r12; r21 r22] = f g^T = [f1; f2] [g1 g2] = [f1 g1  f1 g2; f2 g1  f2 g2]

Is the same as this:

vec(R) = [r11; r21; r12; r22] = [g1 f1; g1 f2; g2 f1; g2 f2] = [g1; g2] ⊗ [f1; f2]

From Matrix to Tensor From Tensor To Matrix 20 / 68

The Kronecker Product of Vectors

x ⊗ y = [x1; x2; x3] ⊗ [y1; y2] = [x1 y1; x1 y2; x2 y1; x2 y2; x3 y1; x3 y2] = [x1 y; x2 y; x3 y]

From Matrix to Tensor From Tensor To Matrix 21 / 68

So What is a Rank-1 Tensor? R ∈ R^(2x2x2) is rank-1 if there exist f, g, h ∈ R^2 such that

vec(R) = [r111; r211; r121; r221; r112; r212; r122; r222] = [h1 g1 f1; h1 g1 f2; h1 g2 f1; h1 g2 f2; h2 g1 f1; h2 g1 f2; h2 g2 f1; h2 g2 f2] = [h1; h2] ⊗ [g1; g2] ⊗ [f1; f2]

i.e., r_ijk = h_k g_j f_i.
From Matrix to Tensor From Tensor To Matrix 22 / 68
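A quick MATLAB check of the identity above (illustrative, variable names mine): build R(i,j,k) = f(i) g(j) h(k) explicitly and compare its column-major vec with h ⊗ g ⊗ f.

```matlab
% vec of a rank-1 2x2x2 tensor equals kron(h, kron(g, f)).
f = randn(2,1);  g = randn(2,1);  h = randn(2,1);
R = zeros(2,2,2);
for k = 1:2
  for j = 1:2
    for i = 1:2
      R(i,j,k) = f(i)*g(j)*h(k);
    end
  end
end
norm(R(:) - kron(h, kron(g, f)))   % essentially zero
```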

What Might a Tensor SVD Look Like?

vec(R) = [r111; r211; r121; r221; r112; r212; r122; r222] = h^(1) ⊗ g^(1) ⊗ f^(1) + h^(2) ⊗ g^(2) ⊗ f^(2) + h^(3) ⊗ g^(3) ⊗ f^(3)

A special sum of rank-1 tensors.
From Matrix to Tensor From Tensor To Matrix 23 / 68

What Does the Matrix SVD Look Like? This:

[a11 a12; a21 a22] = [u11 u12; u21 u22] [σ1 0; 0 σ2] [v11 v12; v21 v22]^T = σ1 [u11; u21] [v11; v21]^T + σ2 [u12; u22] [v12; v22]^T

Is the same as this:

[a11; a21; a12; a22] = σ1 [v11 u11; v11 u21; v21 u11; v21 u21] + σ2 [v12 u12; v12 u22; v22 u12; v22 u22] = σ1 [v11; v21] ⊗ [u11; u21] + σ2 [v12; v22] ⊗ [u12; u22]

From Matrix to Tensor From Tensor To Matrix 24 / 68

What Might a Tensor SVD Look Like?

vec(R) = [r111; r211; r121; r221; r112; r212; r122; r222] = h^(1) ⊗ g^(1) ⊗ f^(1) + h^(2) ⊗ g^(2) ⊗ f^(2) + h^(3) ⊗ g^(3) ⊗ f^(3).

A special sum of rank-1 tensors. Getting that special sum often requires multilinear optimization. We better understand that before we proceed.
From Matrix to Tensor From Tensor To Matrix 25 / 68

A Nearest Rank-1 Tensor Problem. Find σ ≥ 0 and

[c1; s1] = [cos(θ1); sin(θ1)]   [c2; s2] = [cos(θ2); sin(θ2)]   [c3; s3] = [cos(θ3); sin(θ3)]

so that

φ(σ, θ1, θ2, θ3) = || [a111; a211; a121; a221; a112; a212; a122; a222] − σ [c3; s3] ⊗ [c2; s2] ⊗ [c1; s1] ||_2

is minimized.
From Matrix to Tensor From Tensor To Matrix 26 / 68

A Nearest Rank-1 Tensor Problem. Find σ ≥ 0 and [c1; s1] = [cos(θ1); sin(θ1)], [c2; s2] = [cos(θ2); sin(θ2)], [c3; s3] = [cos(θ3); sin(θ3)] so that

φ(σ, θ1, θ2, θ3) = || [a111; a211; a121; a221; a112; a212; a122; a222] − σ [c3 c2 c1; c3 c2 s1; c3 s2 c1; c3 s2 s1; s3 c2 c1; s3 c2 s1; s3 s2 c1; s3 s2 s1] ||_2

is minimized.
From Matrix to Tensor From Tensor To Matrix 27 / 68

Alternating Least Squares. Freeze c2, s2, c3, and s3 and minimize

φ = || [a111; a211; a121; a221; a112; a212; a122; a222] − σ [c3 c2 c1; c3 c2 s1; c3 s2 c1; c3 s2 s1; s3 c2 c1; s3 c2 s1; s3 s2 c1; s3 s2 s1] ||
  = || [a111; a211; a121; a221; a112; a212; a122; a222] − [c3 c2  0; 0  c3 c2; c3 s2  0; 0  c3 s2; s3 c2  0; 0  s3 c2; s3 s2  0; 0  s3 s2] [x1; y1] ||

with respect to x1 = σ c1 and y1 = σ s1. This is an ordinary linear least squares problem. We then get improved σ, c1, and s1 via

σ = sqrt(x1^2 + y1^2),   [c1; s1] = [x1; y1] / σ

From Matrix to Tensor From Tensor To Matrix 28 / 68

Alternating Least Squares. Freeze c1, s1, c3, and s3 and minimize

φ = || [a111; a211; a121; a221; a112; a212; a122; a222] − σ [c3 c2 c1; c3 c2 s1; c3 s2 c1; c3 s2 s1; s3 c2 c1; s3 c2 s1; s3 s2 c1; s3 s2 s1] ||
  = || [a111; a211; a121; a221; a112; a212; a122; a222] − [c3 c1  0; c3 s1  0; 0  c3 c1; 0  c3 s1; s3 c1  0; s3 s1  0; 0  s3 c1; 0  s3 s1] [x2; y2] ||

with respect to x2 = σ c2 and y2 = σ s2. This is an ordinary linear least squares problem. We then get improved σ, c2, and s2 via

σ = sqrt(x2^2 + y2^2),   [c2; s2] = [x2; y2] / σ

From Matrix to Tensor From Tensor To Matrix 29 / 68

Alternating Least Squares. Freeze c1, s1, c2, and s2 and minimize

φ = || [a111; a211; a121; a221; a112; a212; a122; a222] − σ [c3 c2 c1; c3 c2 s1; c3 s2 c1; c3 s2 s1; s3 c2 c1; s3 c2 s1; s3 s2 c1; s3 s2 s1] ||
  = || [a111; a211; a121; a221; a112; a212; a122; a222] − [c2 c1  0; c2 s1  0; s2 c1  0; s2 s1  0; 0  c2 c1; 0  c2 s1; 0  s2 c1; 0  s2 s1] [x3; y3] ||

with respect to x3 = σ c3 and y3 = σ s3. This is an ordinary linear least squares problem. We then get improved σ, c3, and s3 via

σ = sqrt(x3^2 + y3^2),   [c3; s3] = [x3; y3] / σ

From Matrix to Tensor From Tensor To Matrix 30 / 68

Componentwise Optimization A Common Framework for Tensor-Related Optimization: Choose a subset of the unknowns such that if they are (temporarily) fixed, then we are presented with some standard matrix problem in the remaining unknowns. By choosing different subsets, cycle through all the unknowns. Repeat until converged. The standard matrix problem that we end up solving is usually some kind of linear least squares problem. From Matrix to Tensor From Tensor To Matrix 31 / 68
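To make the alternating idea concrete, here is a minimal MATLAB sketch of a rank-1 fit to a 2-by-2-by-2 tensor. It cycles through the three factor vectors using the modal unfoldings; this is the unnormalized-vector form of the iteration rather than the angle parametrization on the slides, and it has no convergence test, so treat it as an illustration only.

```matlab
% Alternating updates for the nearest rank-1 tensor sigma * (h o g o f), 2x2x2 case.
A  = randn(2,2,2);
A1 = reshape(A, 2, 4);                     % columns (j,k), j fastest
A2 = reshape(permute(A,[2 1 3]), 2, 4);    % columns (i,k), i fastest
A3 = reshape(permute(A,[3 1 2]), 2, 4);    % columns (i,j), i fastest
f = randn(2,1); f = f/norm(f);  g = randn(2,1); g = g/norm(g);  h = randn(2,1); h = h/norm(h);
for it = 1:50
    f = A1*kron(h, g);  f = f/norm(f);     % fix g,h: the best f solves a tiny LS problem
    g = A2*kron(h, f);  g = g/norm(g);     % fix f,h
    h = A3*kron(g, f);  h = h/norm(h);     % fix f,g
end
sigma = A(:)' * kron(h, kron(g, f));       % best scaling for the current f, g, h
norm(A(:) - sigma*kron(h, kron(g, f)))     % residual of the rank-1 fit
```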

We Are Now Ready For This! (Figure: a matrix SVD drawn in pictures, U^T A V = Σ.) That is, we are ready to look at SVD ideas at the tensor level.
From Matrix to Tensor From Tensor To Matrix 32 / 68

The Higher-Order SVD. Motivation: In the matrix case, if A ∈ R^(n1 x n2) and A = U1 S U2^T, then

vec(A) = Σ_{j1=1}^{n1} Σ_{j2=1}^{n2} S(j1, j2) · U2(:, j2) ⊗ U1(:, j1)

We are able to choose orthogonal U1 and U2 so that S = U1^T A U2 is diagonal.
From Matrix to Tensor From Tensor To Matrix 33 / 68

The Higher-Order SVD. Definition: Given A ∈ R^(n1 x n2 x n3), compute the SVDs of the modal unfoldings

A_(1) = U1 Σ1 V1^T    A_(2) = U2 Σ2 V2^T    A_(3) = U3 Σ3 V3^T

and then compute S ∈ R^(n1 x n2 x n3) so that

vec(A) = Σ_{j1=1}^{n1} Σ_{j2=1}^{n2} Σ_{j3=1}^{n3} S(j1, j2, j3) · U3(:, j3) ⊗ U2(:, j2) ⊗ U1(:, j1)

From Matrix to Tensor From Tensor To Matrix 34 / 68
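Here is a compact MATLAB sketch of that definition for a small 4-by-3-by-2 tensor (illustrative only; for sizes this small the core can be formed through one big Kronecker product).

```matlab
% HOSVD of a 4x3x2 tensor: orthogonal factors from the unfoldings, then the core.
A  = randn(4,3,2);
A1 = reshape(A, 4, 6);
A2 = reshape(permute(A,[2 1 3]), 3, 8);
A3 = reshape(permute(A,[3 1 2]), 2, 12);
[U1,~,~] = svd(A1);  [U2,~,~] = svd(A2);  [U3,~,~] = svd(A3);
vecS = kron(U3, kron(U2, U1))' * A(:);        % core tensor, vectorized
S = reshape(vecS, 4, 3, 2);
norm(A(:) - kron(U3, kron(U2, U1)) * S(:))    % ~ 0: an exact multilinear representation
```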

Recall... The mode-1, mode-2, and mode-3 unfoldings of A ∈ R^(4x3x2):

A_(1) = [ a111  a121  a131  a112  a122  a132
          a211  a221  a231  a212  a222  a232
          a311  a321  a331  a312  a322  a332
          a411  a421  a431  a412  a422  a432 ]    columns indexed by (j,k) = (1,1),(2,1),(3,1),(1,2),(2,2),(3,2)

A_(2) = [ a111  a211  a311  a411  a112  a212  a312  a412
          a121  a221  a321  a421  a122  a222  a322  a422
          a131  a231  a331  a431  a132  a232  a332  a432 ]    columns indexed by (i,k) = (1,1),(2,1),(3,1),(4,1),(1,2),(2,2),(3,2),(4,2)

A_(3) = [ a111  a211  a311  a411  a121  a221  a321  a421  a131  a231  a331  a431
          a112  a212  a312  a412  a122  a222  a322  a422  a132  a232  a332  a432 ]    columns indexed by (i,j) = (1,1),...,(4,3)

From Matrix to Tensor From Tensor To Matrix 35 / 68

The Truncated Higher-Order SVD. The HO-SVD:

vec(A) = Σ_{j1=1}^{n1} Σ_{j2=1}^{n2} Σ_{j3=1}^{n3} S(j1, j2, j3) · U3(:, j3) ⊗ U2(:, j2) ⊗ U1(:, j1)

The core tensor S is not diagonal, but its entries get smaller as you move away from the (1,1,1) entry. The Truncated HO-SVD:

vec(A) ≈ Σ_{j1=1}^{r1} Σ_{j2=1}^{r2} Σ_{j3=1}^{r3} S(j1, j2, j3) · U3(:, j3) ⊗ U2(:, j2) ⊗ U1(:, j1)

From Matrix to Tensor From Tensor To Matrix 36 / 68

The Tucker Nearness Problem. Assume that A ∈ R^(n1 x n2 x n3). Given integers r1, r2, and r3, compute

U1: n1 x r1, orthonormal columns
U2: n2 x r2, orthonormal columns
U3: n3 x r3, orthonormal columns

and a tensor S ∈ R^(r1 x r2 x r3) so that

|| vec(A) − Σ_{j1=1}^{r1} Σ_{j2=1}^{r2} Σ_{j3=1}^{r3} S(j1, j2, j3) · U3(:, j3) ⊗ U2(:, j2) ⊗ U1(:, j1) ||_2

is minimized.
From Matrix to Tensor From Tensor To Matrix 37 / 68

Componentwise Optimization

1. Fix U2 and U3 and minimize with respect to S and U1:
   || vec(A) − Σ_{j1=1}^{r1} Σ_{j2=1}^{r2} Σ_{j3=1}^{r3} S(j1, j2, j3) · U3(:, j3) ⊗ U2(:, j2) ⊗ U1(:, j1) ||_2
2. Fix U1 and U3 and minimize the same objective with respect to S and U2.
3. Fix U1 and U2 and minimize the same objective with respect to S and U3.

From Matrix to Tensor From Tensor To Matrix 38 / 68
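One standard way to carry out these componentwise steps (often called higher-order orthogonal iteration) updates each Uk from an SVD of an unfolding of A times the Kronecker product of the other two factors. A rough MATLAB sketch, seeded with the truncated HOSVD and with no convergence test; illustrative, not a production implementation:

```matlab
% Tucker (r1,r2,r3) fit of a 4x3x2 tensor by alternating over U1, U2, U3.
A  = randn(4,3,2);  r = [3 2 2];
A1 = reshape(A, 4, 6);
A2 = reshape(permute(A,[2 1 3]), 3, 8);
A3 = reshape(permute(A,[3 1 2]), 2, 12);
[U1,~,~] = svd(A1);  U1 = U1(:,1:r(1));
[U2,~,~] = svd(A2);  U2 = U2(:,1:r(2));
[U3,~,~] = svd(A3);  U3 = U3(:,1:r(3));
for it = 1:20
    [Q,~,~] = svd(A1*kron(U3,U2), 'econ');  U1 = Q(:,1:r(1));   % step 1: U2, U3 fixed
    [Q,~,~] = svd(A2*kron(U3,U1), 'econ');  U2 = Q(:,1:r(2));   % step 2: U1, U3 fixed
    [Q,~,~] = svd(A3*kron(U2,U1), 'econ');  U3 = Q(:,1:r(3));   % step 3: U1, U2 fixed
end
S = reshape(kron(U3,kron(U2,U1))' * A(:), r);                   % optimal core for these Uk
norm(A(:) - kron(U3,kron(U2,U1)) * S(:))                        % fit error
```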

The CP-Decomposition It also goes by the name of the CANDECOMP/PARAFAC Decomposition. CANDECOMP = Canonical Decomposition PARAFAC = Parallel Factors Decomposition From Matrix to Tensor From Tensor To Matrix 39 / 68

A Different Kind of Rank-1 Summation. The Tucker representation

vec(A) = Σ_{j1=1}^{r1} Σ_{j2=1}^{r2} Σ_{j3=1}^{r3} S(j1, j2, j3) · U3(:, j3) ⊗ U2(:, j2) ⊗ U1(:, j1)

uses orthogonal U1, U2, and U3. The CP representation

vec(A) = Σ_{j=1}^{r} λ_j · U3(:, j) ⊗ U2(:, j) ⊗ U1(:, j)

uses nonorthogonal U1, U2, and U3. The smallest possible r is called the rank of A.
From Matrix to Tensor From Tensor To Matrix 40 / 68
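A tiny MATLAB sketch of what a CP representation stores and how the tensor is rebuilt from it (the factor matrices and weights here are random placeholders):

```matlab
% Rebuild vec(A) from CP factors U1 (n1 x r), U2 (n2 x r), U3 (n3 x r) and weights lambda.
n1 = 4; n2 = 3; n3 = 2; r = 3;
U1 = randn(n1,r);  U2 = randn(n2,r);  U3 = randn(n3,r);  lambda = randn(r,1);
vecA = zeros(n1*n2*n3, 1);
for j = 1:r
    vecA = vecA + lambda(j) * kron(U3(:,j), kron(U2(:,j), U1(:,j)));
end
A = reshape(vecA, n1, n2, n3);    % the order-3 tensor with this CP representation
```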

Tensor Rank is Trickier than Matrix Rank. If

[a111; a211; a121; a221; a112; a212; a122; a222] = randn(8,1),

then rank(A) = 2 with probability 79% and rank(A) = 3 with probability 21%. This is different from the matrix case: if A = randn(n,n), then rank(A) = n with probability 1.
From Matrix to Tensor From Tensor To Matrix 41 / 68
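Those percentages can be checked empirically. For a random 2-by-2-by-2 tensor with nonsingular first slice, a standard criterion (used here as an assumption, not stated on the slide) is that the rank is 2 when the eigenvalues of A(:,:,1)\A(:,:,2) are real and 3 when they form a complex pair; the rank-2 probability for Gaussian data is π/4 ≈ 0.785.

```matlab
% Monte Carlo estimate of P(rank = 2) for a Gaussian 2x2x2 tensor via the slice pencil.
trials = 100000;  rank2 = 0;
for t = 1:trials
    A  = randn(2,2,2);
    ev = eig(A(:,:,1) \ A(:,:,2));     % eigenvalues of the pencil formed by the two slices
    if isreal(ev)
        rank2 = rank2 + 1;
    end
end
rank2/trials                           % ~ 0.785 = pi/4
```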

Componentwise Optimization Fix r rank(a) and minimize: r vec(a) λ j U 3 (:, j) U 2 (:, j) U 1 (:, j) j=1 2 Improve U 1 and the λ j by fixing U 2 and U 3 and minimizing r vec(a) λ j U 3 (:, j) U 2 (:, j) U 1 (:, j) j=1 2 Etc. The component optimizations are highly structured least squares problems. From Matrix to Tensor From Tensor To Matrix 42 / 68

The Tensor Train Decomposition Idea: Approximate a high-order tensor with a collection of order-3 tensors. Each order-3 tensor is connected to its left and right neighbor through a simple summation. An example of a tensor network. From Matrix to Tensor From Tensor To Matrix 43 / 68

Tensor Train: An Example. Given the carriages...

G1: n1 x r1    G2: r1 x n2 x r2    G3: r2 x n3 x r3    G4: r3 x n4 x r4    G5: r4 x n5

we define the train A(1:n1, 1:n2, 1:n3, 1:n4, 1:n5) by

A(i1, i2, i3, i4, i5) = Σ_{k1=1}^{r1} Σ_{k2=1}^{r2} Σ_{k3=1}^{r3} Σ_{k4=1}^{r4} G1(i1, k1) G2(k1, i2, k2) G3(k2, i3, k3) G4(k3, i4, k4) G5(k4, i5)

From Matrix to Tensor From Tensor To Matrix 44 / 68


Tensor Train: An Example. Given the carriages...

G1: n1 x r    G2: r x n2 x r    G3: r x n3 x r    G4: r x n4 x r    G5: r x n5

A(i1, i2, i3, i4, i5) = Σ_{k1=1}^{r} Σ_{k2=1}^{r} Σ_{k3=1}^{r} Σ_{k4=1}^{r} G1(i1, k1) G2(k1, i2, k2) G3(k2, i3, k3) G4(k3, i4, k4) G5(k4, i5)

Data Sparse: O(n r^2) instead of O(n^5).
From Matrix to Tensor From Tensor To Matrix 49 / 68
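A short MATLAB sketch of evaluating one entry of such a train: the nested sums collapse into a chain of small matrix-vector products (random carriages; reshape rather than squeeze is used so the sketch still works when r = 1).

```matlab
% Evaluate A(i1,...,i5) from TT carriages G1..G5 by a left-to-right sweep.
n = [4 5 3 6 2];  r = 3;
G1 = randn(n(1), r);  G2 = randn(r, n(2), r);  G3 = randn(r, n(3), r);
G4 = randn(r, n(4), r);  G5 = randn(r, n(5));
i = [2 4 1 5 2];                                  % the entry we want
v = G1(i(1), :);                                  % 1 x r
v = v * reshape(G2(:, i(2), :), r, r);            % absorb carriage 2
v = v * reshape(G3(:, i(3), :), r, r);            % absorb carriage 3
v = v * reshape(G4(:, i(4), :), r, r);            % absorb carriage 4
a = v * G5(:, i(5))                               % the scalar A(i1,i2,i3,i4,i5)
```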

The Kronecker Product SVD. A way to obtain a data sparse representation of an order-4 tensor. It is based on the Kronecker product of matrices, e.g.,

A = [u11 u12; u21 u22; u31 u32] ⊗ V = [u11·V  u12·V; u21·V  u22·V; u31·V  u32·V]

and the fact that an order-4 tensor is a reshaped block matrix, e.g., A(i1, i2, i3, i4) = U(i1, i2) V(i3, i4).
From Matrix to Tensor From Tensor To Matrix 50 / 68

Kronecker Products are Data Sparse. If B and C are n-by-n, then B ⊗ C is n^2-by-n^2. Thus, we need O(n^2) numbers to describe an O(n^4) object.
From Matrix to Tensor From Tensor To Matrix 51 / 68

The Nearest Kronecker Product Problem. Find B and C so that ||A − B ⊗ C||_F = min:

|| [ a11 a12 a13 a14                          ||
     a21 a22 a23 a24        [ b11 b12         ||
     a31 a32 a33 a34    −     b21 b22   ⊗  [ c11 c12
     a41 a42 a43 a44          b31 b32 ]      c21 c22 ]  ||_F
     a51 a52 a53 a54
     a61 a62 a63 a64 ]

After rearrangement this is a nearest rank-1 matrix problem:

|| [ a11 a21 a12 a22        [ b11
     a31 a41 a32 a42          b21
     a51 a61 a52 a62    −     b31    [ c11 c21 c12 c22 ]  ||_F
     a13 a23 a14 a24          b12
     a33 a43 a34 a44          b22
     a53 a63 a54 a64 ]        b32 ]

From Matrix to Tensor From Tensor To Matrix 52 / 68
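A MATLAB sketch of this rearrangement approach for the 6-by-4 case above (the block-looping and the rank-1 SVD step are spelled out for clarity; illustrative code, not an optimized routine):

```matlab
% Nearest Kronecker product A ~ B (kron) C via rearrangement + rank-1 SVD.
m1 = 3; n1 = 2; m2 = 2; n2 = 2;          % B is m1 x n1, C is m2 x n2
A = randn(m1*m2, n1*n2);
R = zeros(m1*n1, m2*n2);
row = 0;
for j = 1:n1
    for i = 1:m1
        row = row + 1;
        Aij = A((i-1)*m2+1:i*m2, (j-1)*n2+1:j*n2);   % the (i,j) block of A
        R(row,:) = reshape(Aij, 1, m2*n2);           % vec(A_ij)'
    end
end
[U, S, V] = svd(R);                                  % R ~ vec(B) * vec(C)'
B = reshape(sqrt(S(1,1))*U(:,1), m1, n1);
C = reshape(sqrt(S(1,1))*V(:,1), m2, n2);
norm(A - kron(B,C), 'fro')                           % the minimal Frobenius error
```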

The Kronecker Product SVD. If

A = [ A11 ... A1n; ... ; An1 ... Ann ],   Aij ∈ R^(n x n),

then there exist U1, ..., Ur ∈ R^(n x n), V1, ..., Vr ∈ R^(n x n), and scalars σ1 ≥ ... ≥ σr > 0 such that

A = Σ_{k=1}^{r} σ_k · U_k ⊗ V_k.

From Matrix to Tensor From Tensor To Matrix 53 / 68

A Tensor Approximation Idea. Unfold A ∈ R^(n x n x n x n) into an n^2-by-n^2 matrix A. Express A as a sum of Kronecker products:

A = Σ_{k=1}^{r} σ_k · B_k ⊗ C_k,   B_k, C_k ∈ R^(n x n)

Back to tensor:

A(i1, i2, j1, j2) = Σ_{k=1}^{r} σ_k · C_k(i1, i2) B_k(j1, j2)

Sums of tensor products of matrices instead of vectors. O(n^2 r).
From Matrix to Tensor From Tensor To Matrix 54 / 68

The Higher-Order Generalized Singular Value Decomposition. We are given a collection of m-by-n data matrices {A1, ..., AN}, each of which has full column rank. Do an SVD thing on each of them simultaneously,

A1 = U1 Σ1 V^T,  ...,  AN = UN ΣN V^T   (with the same V),

that exposes common features.
From Matrix to Tensor From Tensor To Matrix 55 / 68

The 2-Matrix GSVD. If A1 and A2 are tall matrices with 3 columns, then there exist orthogonal U1, orthogonal U2, and nonsingular X so that

U1^T A1 X = Σ1 = [ diag(c1, c2, c3); 0 ]    and    U2^T A2 X = Σ2 = [ diag(s1, s2, s3); 0 ]

From Matrix to Tensor From Tensor To Matrix 56 / 68

The Higher-Order GSVD Framework.

1. Compute V^{-1} S_N V = diag(λ_i) where

   S_N = (1 / (N(N−1))) Σ_{i=1}^{N} Σ_{j=i+1}^{N} ( (A_i^T A_i)(A_j^T A_j)^{-1} + (A_j^T A_j)(A_i^T A_i)^{-1} ).

2. For k = 1:N compute A_k V^{-T} = U_k Σ_k, where the U_k have unit 2-norm columns and the Σ_k are diagonal.

The eigenvalues of S_N are never smaller than 1.
From Matrix to Tensor From Tensor To Matrix 57 / 68
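A direct MATLAB transcription of step 1 for small random data (illustrative only; the stable approach mentioned on the next slide avoids forming S_N explicitly):

```matlab
% Form S_N from N full-column-rank data matrices and look at its eigenvalues.
m = 20;  n = 4;  N = 3;
A = cell(N,1);
for k = 1:N, A{k} = randn(m, n); end           % full column rank with probability 1
S = zeros(n);
for i = 1:N
    Gi = A{i}'*A{i};
    for j = i+1:N
        Gj = A{j}'*A{j};
        S = S + Gi/Gj + Gj/Gi;                 % (Ai'Ai)(Aj'Aj)^{-1} + (Aj'Aj)(Ai'Ai)^{-1}
    end
end
S = S / (N*(N-1));
[V, L] = eig(S);
min(real(diag(L)))                             % never smaller than 1
```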

The Common HO-GSVD Subspace: Definition. The eigenvectors associated with the unit eigenvalues of S_N define the common HO-GSVD subspace:

HO-GSVD(A1, ..., AN) = { v : S_N v = v }

We are able to stably compute this without ever forming S_N explicitly, via a sequence of 2-matrix GSVDs.
From Matrix to Tensor From Tensor To Matrix 58 / 68

The Common HO-GSVD Subspace: Relevance. In general, we have these rank-1 expansions:

A_k = U_k Σ_k V^T = Σ_{i=1}^{n} σ_i^(k) u_i^(k) v_i^T,   k = 1:N,

where V = [v_1, ..., v_n]. But if (say) HO-GSVD(A1, ..., AN) = span{v1, v2}, then

A_k = σ_1 u_1^(k) v_1^T + σ_2 u_2^(k) v_2^T + Σ_{i=3}^{n} σ_i^(k) u_i^(k) v_i^T,   k = 1:N,

and {u_1^(k), u_2^(k)} is an orthonormal set that is orthogonal to span{u_3^(k), ..., u_n^(k)}. Moreover, u_1^(k) and u_2^(k) are left singular vectors for A_k. This expansion identifies features that are common across the datasets A1, ..., AN.
From Matrix to Tensor From Tensor To Matrix 59 / 68

The Pivoted Cholesky Decomposition

P A P^T = L D L^T, where P is a permutation, L is unit lower triangular, and D is diagonal. (The slide shows the nonzero pattern of the factors.)

We will use this on a problem where the tensor has multiple symmetries and unfolds to a highly structured positive semidefinite matrix with multiple symmetries.
From Matrix to Tensor From Tensor To Matrix 60 / 68

The Two-Electron Integral Tensor (TEI). Given a basis {φ_i(r)}_{i=1}^{n} of atomic orbital functions, we consider the following order-4 tensor:

A(p, q, r, s) = ∫_{R^3} ∫_{R^3} φ_p(r_1) φ_q(r_1) φ_r(r_2) φ_s(r_2) / |r_1 − r_2| dr_1 dr_2.

The TEI tensor plays an important role in electronic structure theory and ab initio quantum chemistry. The TEI tensor has these symmetries:

(i) A(p, q, r, s) = A(q, p, r, s)    (ii) A(p, q, r, s) = A(p, q, s, r)    (iii) A(p, q, r, s) = A(r, s, p, q)

We say that A is ((12)(34))-symmetric.
From Matrix to Tensor From Tensor To Matrix 61 / 68

The [1,2] x [3,4] Unfolding of a ((12)(34))-Symmetric A. If A = A_[1,2]x[3,4], then A is symmetric and (among other things) is perfect shuffle symmetric.

A = [ 11 12 13 12 14 15 13 15 16
      12 17 18 17 19 20 18 20 21
      13 18 22 18 23 24 22 24 25
      12 17 18 17 19 20 18 20 21
      14 19 23 19 26 27 23 27 28
      15 20 24 20 27 29 24 29 30
      13 18 22 18 23 24 22 24 25
      15 20 24 20 27 29 24 29 30
      16 21 25 21 28 30 25 30 31 ]

Each column reshapes into a 3x3 symmetric matrix, e.g., A(:,1) reshapes to

[ 11 12 13
  12 14 15
  13 15 16 ]

What is perfect shuffle symmetry?
From Matrix to Tensor From Tensor To Matrix 62 / 68

Perfect Shuffle Symmetry. An n^2-by-n^2 matrix A has perfect shuffle symmetry if

A = Π_{n,n} A Π_{n,n}

where Π_{n,n} = I_{n^2}(:, v), v = [ 1:n:n^2, 2:n:n^2, ..., n:n:n^2 ]. E.g.,

Π_{3,3} = [ 1 0 0 0 0 0 0 0 0
            0 0 0 1 0 0 0 0 0
            0 0 0 0 0 0 1 0 0
            0 1 0 0 0 0 0 0 0
            0 0 0 0 1 0 0 0 0
            0 0 0 0 0 0 0 1 0
            0 0 1 0 0 0 0 0 0
            0 0 0 0 0 1 0 0 0
            0 0 0 0 0 0 0 0 1 ]

From Matrix to Tensor From Tensor To Matrix 63 / 68
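A small MATLAB sketch of the permutation just defined, together with two of its properties; the second one, Π(B⊗C)Π = C⊗B, is my added framing (not stated on the slide) for why Kronecker-structured matrices pick up this symmetry.

```matlab
% Build the perfect shuffle permutation Pi_{n,n} and check two of its properties.
n  = 3;
v  = reshape(reshape(1:n^2, n, n)', [], 1);   % [1:n:n^2, 2:n:n^2, ..., n:n:n^2]
I  = eye(n^2);
Pi = I(:, v);
X  = randn(n);  B = randn(n);  C = randn(n);
norm(Pi*X(:) - reshape(X', [], 1))            % Pi * vec(X) = vec(X')
norm(Pi*kron(B,C)*Pi - kron(C,B))             % so kron(B,C) + kron(C,B) is shuffle symmetric
```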

Structured Low-Rank Approximation. We have an n^2-by-n^2 matrix A that is symmetric and perfect shuffle symmetric, and it basically has rank n. Using P A P^T = L D L^T we are able to write

A = Σ_{k=1}^{n} d_k u_k u_k^T

where each rank-1 term is symmetric and perfect shuffle symmetric. This structured data-sparse representation reduces work by an order of magnitude in the application we are considering.
From Matrix to Tensor From Tensor To Matrix 64 / 68

Notation: The Challenge. Scientific computing is increasingly tensor-based. It is hard to spread the word about tensor computations because summations, transpositions, and symmetries are typically described through multiple indices. And different camps have very different notations, e.g.,

t_{i1 i2 i3 i4 i5} = a_{i1 j1} b_{i2 j1 j2} c_{i3 j2 j3} d_{i4 j3 j4} e_{i5 j4}

From Matrix to Tensor From Tensor To Matrix 65 / 68

Brevity is the Soul of Wit

Multiple Summations:  Σ_{j=1}^{n}  versus  Σ_{j1=1}^{n1} ··· Σ_{jd=1}^{nd}

Transposition: If T = [2 1 4 3] then B = A^T means B(i1, i2, i3, i4) = A(i2, i1, i4, i3)

Contractions: For all 1 ≤ i ≤ m and 1 ≤ j ≤ n:  A(i, j) = Σ_{k=1}^{p} B(i, k) C(k, j)

From Matrix to Tensor From Tensor To Matrix 66 / 68
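In MATLAB terms the transposition and the contraction above are one-liners; a trivial sketch just to tie the index notation to code:

```matlab
% Mode permutation ("transposition" by [2 1 4 3]) and a simple contraction.
A4 = randn(3,4,5,6);
B4 = permute(A4, [2 1 4 3]);     % B4(i1,i2,i3,i4) = A4(i2,i1,i4,i3)
B  = randn(7,9);  C = randn(9,8);
A  = B*C;                        % A(i,j) = sum_k B(i,k)*C(k,j)
```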

From Jacobi's 1846 Eigenvalue Paper. A system of linear equations:

(a, a)α + (a, b)β + (a, c)γ + ··· + (a, p)ω = α x
(b, a)α + (b, b)β + (b, c)γ + ··· + (b, p)ω = β x
  ···
(p, a)α + (p, b)β + (p, c)γ + ··· + (p, p)ω = ω x

Somewhere between 1846 and the present we picked up conventional matrix-vector notation: Ax = b. How did the transition from scalar notation to matrix-vector notation happen?
From Matrix to Tensor From Tensor To Matrix 67 / 68

The Next Big Thing...

Scalar-Level Thinking
  → (1960's) Matrix-Level Thinking: the factorization paradigm: LU, LDL^T, QR, UΣV^T, etc.
  → (1980's) Block Matrix-Level Thinking: cache utilization, parallel computing, LAPACK, etc.
  → (2000's) Tensor-Level Thinking: high-dimensional modeling, cheap storage, good notation, etc.

From Matrix to Tensor From Tensor To Matrix 68 / 68