Lecture 4. CP and KSVD Representations. Charles F. Van Loan

Structured Matrix Computations from Structured Tensors
Lecture 4. CP and KSVD Representations
Charles F. Van Loan, Cornell University
CIME-EMS Summer School, June 22-26, 2015, Cetraro, Italy

What is this Lecture About? Two More Tensor SVDs

The CP representation has a diagonal aspect like the SVD, but there is no orthogonality. The Kronecker product SVD can be used to write a given matrix as an optimal sum of Kronecker products; if the matrix is obtained via a tensor unfolding, then we obtain yet another SVD-like representation.

The CP Representation

The CP Representation

Definition. The CP representation of an $n_1 \times n_2 \times n_3$ tensor $A$ has the form

    $A = \sum_{k=1}^{r} \lambda_k \, F(:,k) \circ G(:,k) \circ H(:,k)$

where the $\lambda_k$ are real scalars and $F \in \mathbb{R}^{n_1 \times r}$, $G \in \mathbb{R}^{n_2 \times r}$, $H \in \mathbb{R}^{n_3 \times r}$.

Equivalently,

    $A(i_1, i_2, i_3) = \sum_{j=1}^{r} \lambda_j \, F(i_1, j)\, G(i_2, j)\, H(i_3, j)$

and

    $\mathrm{vec}(A) = \sum_{j=1}^{r} \lambda_j \, H(:, j) \otimes G(:, j) \otimes F(:, j).$
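
A small NumPy sketch (assuming the usual column-major vec convention; the helper name cp_to_tensor is ours) that assembles a tensor from its CP factors and checks the vec identity above:

```python
import numpy as np

def cp_to_tensor(lam, F, G, H):
    # A(i1,i2,i3) = sum_j lam[j] * F[i1,j] * G[i2,j] * H[i3,j]
    return np.einsum('j,ij,kj,lj->ikl', lam, F, G, H)

rng = np.random.default_rng(0)
n1, n2, n3, r = 4, 3, 2, 2
lam = rng.random(r)
F, G, H = (rng.standard_normal((n, r)) for n in (n1, n2, n3))
A = cp_to_tensor(lam, F, G, H)

# vec(A) = sum_j lam[j] * H(:,j) kron G(:,j) kron F(:,j)   (column-major vec)
v = sum(lam[j] * np.kron(H[:, j], np.kron(G[:, j], F[:, j])) for j in range(r))
print(np.allclose(v, A.flatten(order='F')))   # True
```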

Tucker vs. CP

The Tucker representation:

    $A = \sum_{j_1=1}^{r_1} \sum_{j_2=1}^{r_2} \sum_{j_3=1}^{r_3} S(j_1, j_2, j_3) \, U_1(:, j_1) \circ U_2(:, j_2) \circ U_3(:, j_3)$

The CP representation:

    $A = \sum_{j=1}^{r} \lambda_j \, F(:, j) \circ G(:, j) \circ H(:, j)$

In Tucker the $U$'s have orthonormal columns; in CP the matrices $F$, $G$, and $H$ do not have orthonormal columns. In CP the core tensor is diagonal, while in Tucker it is not.

A Note on Terminology

The CP decomposition also goes by the name of the CANDECOMP/PARAFAC decomposition: CANDECOMP = Canonical Decomposition, PARAFAC = Parallel Factors Decomposition.

A Little More About Tensor Rank

The CP Representation and Rank

Definition. If

    $A = \sum_{j=1}^{r} \lambda_j \, F(:, j) \circ G(:, j) \circ H(:, j)$

is the shortest possible CP representation of $A$, then $\mathrm{rank}(A) = r$.

Tensor Rank Anomaly 1

The largest rank attainable for an $n_1$-by-$\cdots$-by-$n_d$ tensor is called the maximum rank. There is no simple formula for it in terms of the dimensions $n_1, \ldots, n_d$; indeed, its precise value is known only for small examples. Maximum rank does not equal $\min\{n_1, \ldots, n_d\}$ unless $d \le 2$.

Tensor Rank Anomaly 2

If the set of rank-$k$ tensors in $\mathbb{R}^{n_1 \times \cdots \times n_d}$ has positive Lebesgue measure, then $k$ is a typical rank.

    Size         Typical ranks
    2 × 2 × 2    2, 3
    3 × 3 × 3    4
    3 × 3 × 4    4, 5
    3 × 3 × 5    5, 6

For $n_1$-by-$n_2$ matrices, typical rank and maximum rank are both equal to the smaller of $n_1$ and $n_2$.

Tensor Rank Anomaly 3

The rank of a particular tensor over the real field may differ from its rank over the complex field.

Tensor Rank Anomaly 4

A tensor with a given rank may be approximated with arbitrary precision by a tensor of lower rank. Such a tensor is said to be degenerate.

The Nearest CP Problem

The CP Approximation Problem

Given $A \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ and $r$, determine $\lambda \in \mathbb{R}^r$ and $F \in \mathbb{R}^{n_1 \times r}$, $G \in \mathbb{R}^{n_2 \times r}$, $H \in \mathbb{R}^{n_3 \times r}$ (with unit 2-norm columns) so that if

    $X = \sum_{j=1}^{r} \lambda_j \, F(:, j) \circ G(:, j) \circ H(:, j)$

then $\| A - X \|_F^2$ is minimized. This is a multilinear optimization problem.

The CP Approximation Problem: Equivalent Formulations

    $\Big\| A - \sum_{j=1}^{r} \lambda_j \, F(:, j) \circ G(:, j) \circ H(:, j) \Big\|_F
      = \Big\| A_{(1)} - \sum_{j=1}^{r} \lambda_j \, F(:, j) \big( H(:, j) \otimes G(:, j) \big)^T \Big\|_F$
    $\phantom{\Big\| A - \cdots \Big\|_F}
      = \Big\| A_{(2)} - \sum_{j=1}^{r} \lambda_j \, G(:, j) \big( H(:, j) \otimes F(:, j) \big)^T \Big\|_F$
    $\phantom{\Big\| A - \cdots \Big\|_F}
      = \Big\| A_{(3)} - \sum_{j=1}^{r} \lambda_j \, H(:, j) \big( G(:, j) \otimes F(:, j) \big)^T \Big\|_F$

Introducing the Khatri-Rao Product

Definition. If $B = [\, b_1 \; \cdots \; b_r \,] \in \mathbb{R}^{n_1 \times r}$ and $C = [\, c_1 \; \cdots \; c_r \,] \in \mathbb{R}^{n_2 \times r}$, then the Khatri-Rao product of $B$ and $C$ is given by

    $B \odot C = [\, b_1 \otimes c_1 \;\; \cdots \;\; b_r \otimes c_r \,],$

i.e., column-wise Kronecker products. Note that $B \odot C \in \mathbb{R}^{n_1 n_2 \times r}$.
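
A minimal NumPy sketch of the column-wise Kronecker product (the helper name khatri_rao is ours; it is reused in the sketches below):

```python
import numpy as np

def khatri_rao(B, C):
    # column j of the result is kron(B[:, j], C[:, j])
    n1, r = B.shape
    n2, _ = C.shape
    return np.einsum('ir,jr->ijr', B, C).reshape(n1 * n2, r)

B = np.arange(6.0).reshape(3, 2)
C = np.arange(4.0).reshape(2, 2)
print(khatri_rao(B, C))   # a 6-by-2 matrix
```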

The CP Approximation Problem: Equivalent Formulations

    $\Big\| A - \sum_{j=1}^{r} \lambda_j \, F(:, j) \circ G(:, j) \circ H(:, j) \Big\|_F
      = \| A_{(1)} - F \, \mathrm{diag}(\lambda_j) \, (H \odot G)^T \|_F$
    $\phantom{\Big\| A - \cdots \Big\|_F}
      = \| A_{(2)} - G \, \mathrm{diag}(\lambda_j) \, (H \odot F)^T \|_F$
    $\phantom{\Big\| A - \cdots \Big\|_F}
      = \| A_{(3)} - H \, \mathrm{diag}(\lambda_j) \, (G \odot F)^T \|_F$
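
A quick numerical check of the mode-1 identity, assuming the usual unfolding in which the columns of $A_{(1)}$ are ordered with $i_2$ varying fastest (column-major vec convention); khatri_rao is the helper sketched above:

```python
import numpy as np

def khatri_rao(B, C):
    n1, r = B.shape; n2, _ = C.shape
    return np.einsum('ir,jr->ijr', B, C).reshape(n1 * n2, r)

rng = np.random.default_rng(1)
n1, n2, n3, r = 4, 3, 2, 2
lam = rng.random(r)
F, G, H = (rng.standard_normal((n, r)) for n in (n1, n2, n3))
A = np.einsum('j,ij,kj,lj->ikl', lam, F, G, H)

A1 = A.reshape(n1, n2 * n3, order='F')    # mode-1 unfolding
print(np.allclose(A1, F @ np.diag(lam) @ khatri_rao(H, G).T))   # True
```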

The CP Approximation Problem: The Alternating LS Solution Framework

    $\| A - X \|_F
      = \| A_{(1)} - F \, \mathrm{diag}(\lambda_j) \, (H \odot G)^T \|_F
      = \| A_{(2)} - G \, \mathrm{diag}(\lambda_j) \, (H \odot F)^T \|_F
      = \| A_{(3)} - H \, \mathrm{diag}(\lambda_j) \, (G \odot F)^T \|_F$

1. Fix $G$ and $H$ and improve $\lambda$ and $F$.
2. Fix $F$ and $H$ and improve $\lambda$ and $G$.
3. Fix $F$ and $G$ and improve $\lambda$ and $H$.

The CP Approximation Problem: The Alternating LS Solution Framework

Repeat:

1. Let $F$ minimize $\| A_{(1)} - F (H \odot G)^T \|_F$ and for $j = 1{:}r$ set $\lambda_j = \| F(:, j) \|_2$ and $F(:, j) = F(:, j)/\lambda_j$.
2. Let $G$ minimize $\| A_{(2)} - G (H \odot F)^T \|_F$ and for $j = 1{:}r$ set $\lambda_j = \| G(:, j) \|_2$ and $G(:, j) = G(:, j)/\lambda_j$.
3. Let $H$ minimize $\| A_{(3)} - H (G \odot F)^T \|_F$ and for $j = 1{:}r$ set $\lambda_j = \| H(:, j) \|_2$ and $H(:, j) = H(:, j)/\lambda_j$.

These are linear least squares problems. The columns of $F$, $G$, and $H$ are normalized.
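
A rough CP-ALS sketch in NumPy under the same unfolding conventions as above. It solves each least squares subproblem with a generic lstsq rather than the structured normal-equations approach on the next slides, and cp_als / khatri_rao are our own hypothetical helper names:

```python
import numpy as np

def khatri_rao(B, C):
    n1, r = B.shape; n2, _ = C.shape
    return np.einsum('ir,jr->ijr', B, C).reshape(n1 * n2, r)

def cp_als(A, r, iters=50, seed=0):
    n1, n2, n3 = A.shape
    rng = np.random.default_rng(seed)
    F, G, H = (rng.standard_normal((n, r)) for n in (n1, n2, n3))
    # modal unfoldings (remaining indices combined column-major)
    A1 = A.reshape(n1, n2 * n3, order='F')
    A2 = np.moveaxis(A, 1, 0).reshape(n2, n1 * n3, order='F')
    A3 = np.moveaxis(A, 2, 0).reshape(n3, n1 * n2, order='F')
    for _ in range(iters):
        F = np.linalg.lstsq(khatri_rao(H, G), A1.T, rcond=None)[0].T
        lam = np.linalg.norm(F, axis=0); F = F / lam
        G = np.linalg.lstsq(khatri_rao(H, F), A2.T, rcond=None)[0].T
        lam = np.linalg.norm(G, axis=0); G = G / lam
        H = np.linalg.lstsq(khatri_rao(G, F), A3.T, rcond=None)[0].T
        lam = np.linalg.norm(H, axis=0); H = H / lam
    return lam, F, G, H

# quick check on a synthetic rank-2 tensor
rng = np.random.default_rng(3)
lam0 = np.array([2.0, 1.0])
F0, G0, H0 = (rng.standard_normal((n, 2)) for n in (5, 4, 3))
A = np.einsum('j,ij,kj,lj->ikl', lam0, F0, G0, H0)
lam, F, G, H = cp_als(A, 2, iters=200)
X = np.einsum('j,ij,kj,lj->ikl', lam, F, G, H)
print(np.linalg.norm(A - X) / np.linalg.norm(A))   # residual should be small
```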

The CP Approximation Problem: Solving the LS Problems

The solution to

    $\min_F \| A_{(1)} - F (H \odot G)^T \|_F = \min_F \| A_{(1)}^T - (H \odot G) F^T \|_F$

can be obtained by solving the normal equation system

    $(H \odot G)^T (H \odot G) \, F^T = (H \odot G)^T A_{(1)}^T.$

This can be done efficiently by exploiting two properties of the Khatri-Rao product.

The Khatri-Rao Product: Fast Property 1

If $B \in \mathbb{R}^{n_1 \times r}$ and $C \in \mathbb{R}^{n_2 \times r}$, then

    $(B \odot C)^T (B \odot C) = (B^T B) \;.\!*\; (C^T C)$

where $.*$ denotes pointwise multiplication.
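
A short numerical check of Property 1 (khatri_rao as sketched earlier):

```python
import numpy as np

def khatri_rao(B, C):
    n1, r = B.shape; n2, _ = C.shape
    return np.einsum('ir,jr->ijr', B, C).reshape(n1 * n2, r)

rng = np.random.default_rng(2)
B, C = rng.standard_normal((5, 3)), rng.standard_normal((4, 3))
KR = khatri_rao(B, C)
print(np.allclose(KR.T @ KR, (B.T @ B) * (C.T @ C)))   # True
```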

The Khatri-Rao Product: Fast Property 2

If $B = [\, b_1 \; \cdots \; b_r \,] \in \mathbb{R}^{n_1 \times r}$, $C = [\, c_1 \; \cdots \; c_r \,] \in \mathbb{R}^{n_2 \times r}$, $z \in \mathbb{R}^{n_1 n_2}$, and $y = (B \odot C)^T z$, then

    $y = \begin{bmatrix} c_1^T Z b_1 \\ \vdots \\ c_r^T Z b_r \end{bmatrix},
     \qquad Z = \mathrm{reshape}(z, n_2, n_1).$
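
And a check of Property 2, assuming a column-major (MATLAB-style) reshape of $z$:

```python
import numpy as np

def khatri_rao(B, C):
    n1, r = B.shape; n2, _ = C.shape
    return np.einsum('ir,jr->ijr', B, C).reshape(n1 * n2, r)

rng = np.random.default_rng(4)
n1, n2, r = 5, 4, 3
B, C = rng.standard_normal((n1, r)), rng.standard_normal((n2, r))
z = rng.standard_normal(n1 * n2)
Z = z.reshape(n2, n1, order='F')                          # reshape(z, n2, n1)
y = np.array([C[:, k] @ Z @ B[:, k] for k in range(r)])   # y_k = c_k' Z b_k
print(np.allclose(y, khatri_rao(B, C).T @ z))             # True
```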

Overall: The Khatri-Rao LS Problem

Structure. Given $B \in \mathbb{R}^{n_1 \times r}$, $C \in \mathbb{R}^{n_2 \times r}$, and $z \in \mathbb{R}^{n_1 n_2}$, minimize $\| (B \odot C) x - z \|_2$.

Data sparse: an $n_1 n_2$-by-$r$ LS problem defined by $O((n_1 + n_2) r)$ data.

Solution procedure:

1. Form $M = (B^T B) \,.\!*\, (C^T C)$.  $O((n_1 + n_2) r^2)$
2. Cholesky: $M = L L^T$.  $O(r^3)$
3. Form $y = (B \odot C)^T z$ using Property 2.  $O(n_1 n_2 r)$
4. Solve $M x = y$.  $O(r^2)$

Overall: $O(n_1 n_2 r)$ versus $O(n_1 n_2 r^2)$ if the Khatri-Rao product is formed explicitly.
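
A sketch of this procedure in NumPy (np.linalg.solve stands in for steps 2 and 4; in practice a Cholesky-based solve would be used since M is symmetric positive definite, and kr_ls_solve is our own name):

```python
import numpy as np

def kr_ls_solve(B, C, z):
    """Minimize ||(B kr C) x - z||_2 without forming the Khatri-Rao product."""
    n1, r = B.shape
    n2, _ = C.shape
    M = (B.T @ B) * (C.T @ C)                                  # Property 1
    Z = z.reshape(n2, n1, order='F')
    y = np.array([C[:, k] @ Z @ B[:, k] for k in range(r)])    # Property 2
    return np.linalg.solve(M, y)                               # M x = y

# compare with an explicit dense least squares solve
rng = np.random.default_rng(5)
n1, n2, r = 50, 40, 4
B, C = rng.standard_normal((n1, r)), rng.standard_normal((n2, r))
z = rng.standard_normal(n1 * n2)
KR = np.einsum('ir,jr->ijr', B, C).reshape(n1 * n2, r)
print(np.allclose(kr_ls_solve(B, C, z), np.linalg.lstsq(KR, z, rcond=None)[0]))  # True
```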

The Kronecker Product SVD

The Nearest Kronecker Product Problem

Find $B$ and $C$ so that $\| A - B \otimes C \|_F = \min$. For example, with a $6 \times 4$ matrix $A$, $B \in \mathbb{R}^{3 \times 2}$, and $C \in \mathbb{R}^{2 \times 2}$:

    $\left\|
      \begin{bmatrix}
        a_{11} & a_{12} & a_{13} & a_{14} \\
        a_{21} & a_{22} & a_{23} & a_{24} \\
        a_{31} & a_{32} & a_{33} & a_{34} \\
        a_{41} & a_{42} & a_{43} & a_{44} \\
        a_{51} & a_{52} & a_{53} & a_{54} \\
        a_{61} & a_{62} & a_{63} & a_{64}
      \end{bmatrix}
      -
      \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix}
      \otimes
      \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix}
    \right\|_F
    =
    \left\|
      \begin{bmatrix}
        a_{11} & a_{21} & a_{12} & a_{22} \\
        a_{31} & a_{41} & a_{32} & a_{42} \\
        a_{51} & a_{61} & a_{52} & a_{62} \\
        a_{13} & a_{23} & a_{14} & a_{24} \\
        a_{33} & a_{43} & a_{34} & a_{44} \\
        a_{53} & a_{63} & a_{54} & a_{64}
      \end{bmatrix}
      -
      \begin{bmatrix} b_{11} \\ b_{21} \\ b_{31} \\ b_{12} \\ b_{22} \\ b_{32} \end{bmatrix}
      \begin{bmatrix} c_{11} & c_{21} & c_{12} & c_{22} \end{bmatrix}
    \right\|_F$

The Nearest Kronecker Product Problem

Find $B$ and $C$ so that $\| A - B \otimes C \|_F = \min$. It is a nearest rank-1 problem:

    $\phi_A(B, C) = \| A - B \otimes C \|_F = \| \tilde{A} - \mathrm{vec}(B)\, \mathrm{vec}(C)^T \|_F$

with SVD solution: if $\tilde{A} = U \Sigma V^T$, then

    $\mathrm{vec}(B) = \sqrt{\sigma_1}\; U(:, 1), \qquad \mathrm{vec}(C) = \sqrt{\sigma_1}\; V(:, 1).$

The Nearest Kronecker Product Problem: The Tilde Matrix

    $A = \begin{bmatrix}
        a_{11} & a_{12} & a_{13} & a_{14} \\
        a_{21} & a_{22} & a_{23} & a_{24} \\
        a_{31} & a_{32} & a_{33} & a_{34} \\
        a_{41} & a_{42} & a_{43} & a_{44} \\
        a_{51} & a_{52} & a_{53} & a_{54} \\
        a_{61} & a_{62} & a_{63} & a_{64}
      \end{bmatrix}
      = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \\ A_{31} & A_{32} \end{bmatrix}$

implies

    $\tilde{A} = \begin{bmatrix}
        \mathrm{vec}(A_{11})^T \\ \mathrm{vec}(A_{21})^T \\ \mathrm{vec}(A_{31})^T \\
        \mathrm{vec}(A_{12})^T \\ \mathrm{vec}(A_{22})^T \\ \mathrm{vec}(A_{32})^T
      \end{bmatrix}
      = \begin{bmatrix}
        a_{11} & a_{21} & a_{12} & a_{22} \\
        a_{31} & a_{41} & a_{32} & a_{42} \\
        a_{51} & a_{61} & a_{52} & a_{62} \\
        a_{13} & a_{23} & a_{14} & a_{24} \\
        a_{33} & a_{43} & a_{34} & a_{44} \\
        a_{53} & a_{63} & a_{54} & a_{64}
      \end{bmatrix}.$
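
A NumPy sketch of the rearrangement and of the SVD-based nearest Kronecker product (the helper name rearrange and its blocking arguments are ours; blocks are taken column-major over the block grid, as on the slide):

```python
import numpy as np

def rearrange(A, r2, c2, r1, c1):
    """Tilde matrix: rows are vec(A_{i2,j2})', blocks ordered column-major."""
    return np.vstack([A[i2*r1:(i2+1)*r1, j2*c1:(j2+1)*c1].flatten(order='F')
                      for j2 in range(c2) for i2 in range(r2)])

# nearest Kronecker product for a 6x4 matrix viewed as a 3x2 grid of 2x2 blocks
rng = np.random.default_rng(6)
A = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(rearrange(A, 3, 2, 2, 2))
B = np.sqrt(s[0]) * U[:, 0].reshape(3, 2, order='F')    # vec(B) = sqrt(s1)*u1
C = np.sqrt(s[0]) * Vt[0, :].reshape(2, 2, order='F')   # vec(C) = sqrt(s1)*v1
print(np.linalg.norm(A - np.kron(B, C), 'fro'))         # minimal Frobenius error
```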

The Kronecker Product SVD (KPSVD)

Theorem. If

    $A = \begin{bmatrix} A_{11} & \cdots & A_{1,c_2} \\ \vdots & & \vdots \\ A_{r_2,1} & \cdots & A_{r_2,c_2} \end{bmatrix},
     \qquad A_{i_2,j_2} \in \mathbb{R}^{r_1 \times c_1},$

then there exist $U_1, \ldots, U_{r_{KP}} \in \mathbb{R}^{r_2 \times c_2}$, $V_1, \ldots, V_{r_{KP}} \in \mathbb{R}^{r_1 \times c_1}$, and scalars $\sigma_1 \ge \cdots \ge \sigma_{r_{KP}} > 0$ such that

    $A = \sum_{k=1}^{r_{KP}} \sigma_k \, U_k \otimes V_k.$

The sets $\{\mathrm{vec}(U_k)\}$ and $\{\mathrm{vec}(V_k)\}$ are orthonormal, and $r_{KP}$ is the Kronecker rank of $A$ with respect to the chosen blocking.

The Kronecker Product SVD (KPSVD): Constructive Proof

Compute the SVD of $\tilde{A}$:

    $\tilde{A} = U \Sigma V^T = \sum_{k=1}^{r_{KP}} \sigma_k \, u_k v_k^T$

and define the $U_k$ and $V_k$ by

    $\mathrm{vec}(U_k) = u_k, \quad \mathrm{vec}(V_k) = v_k,
     \qquad U_k = \mathrm{reshape}(u_k, r_2, c_2), \quad V_k = \mathrm{reshape}(v_k, r_1, c_1)$

for $k = 1{:}r_{KP}$.
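
A self-contained sketch of this construction (it repeats the rearrange helper so it runs on its own; kpsvd is our own name):

```python
import numpy as np

def rearrange(A, r2, c2, r1, c1):
    return np.vstack([A[i2*r1:(i2+1)*r1, j2*c1:(j2+1)*c1].flatten(order='F')
                      for j2 in range(c2) for i2 in range(r2)])

def kpsvd(A, r2, c2, r1, c1):
    """A = sum_k s[k] * kron(Us[k], Vs[k]) for the given blocking."""
    U, s, Vt = np.linalg.svd(rearrange(A, r2, c2, r1, c1), full_matrices=False)
    Us = [u.reshape(r2, c2, order='F') for u in U.T]   # vec(U_k) = u_k
    Vs = [v.reshape(r1, c1, order='F') for v in Vt]    # vec(V_k) = v_k
    return s, Us, Vs

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 4))
s, Us, Vs = kpsvd(A, 3, 2, 2, 2)
print(np.allclose(A, sum(sk * np.kron(Uk, Vk) for sk, Uk, Vk in zip(s, Us, Vs))))  # True
```

Truncating the sum after r terms gives the nearest Kronecker-rank-r matrix described on the next slide.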

The Kronecker Product SVD (KPSVD): Nearest Kronecker Rank r

If $r \le r_{KP}$, then

    $A_r = \sum_{k=1}^{r} \sigma_k \, U_k \otimes V_k$

is the nearest matrix to $A$ (in the Frobenius norm) that has Kronecker rank $r$.

Structured Kronecker Product Approximation Problems

    $\min_{B,C} \| A - B \otimes C \|_F$

If $A$ is symmetric and positive definite, then so are $B$ and $C$. If $A$ is block Toeplitz with Toeplitz blocks, then $B$ and $C$ are Toeplitz. If $A$ is a block band matrix with banded blocks, then $B$ and $C$ are banded. A Lanczos SVD can be used if $A$ is large and sparse.

A Tensor Approximation Idea

Motivation. Unfold $A \in \mathbb{R}^{n \times n \times n \times n}$ into an $n^2$-by-$n^2$ matrix $A$. Express $A$ as a sum of Kronecker products:

    $A = \sum_{k=1}^{r} \sigma_k \, B_k \otimes C_k, \qquad B_k, C_k \in \mathbb{R}^{n \times n}.$

Back to the tensor:

    $A = \sum_{k=1}^{r} \sigma_k \, C_k \circ B_k, \quad \text{i.e.,} \quad
     A(i_1, i_2, j_1, j_2) = \sum_{k=1}^{r} \sigma_k \, C_k(i_1, i_2)\, B_k(j_1, j_2).$

Sums of tensor products of matrices instead of vectors.
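
A small consistency check of this correspondence, under one assumed indexing convention for the unfolding (rows paired from (i1, j1), columns from (i2, j2), both column-major); the lecture's exact convention may differ:

```python
import numpy as np

n, r = 3, 2
rng = np.random.default_rng(8)
sig = rng.random(r)
Bs = [rng.standard_normal((n, n)) for _ in range(r)]
Cs = [rng.standard_normal((n, n)) for _ in range(r)]

# order-4 tensor: A(i1,i2,j1,j2) = sum_k sig_k * C_k(i1,i2) * B_k(j1,j2)
A = sum(s * np.multiply.outer(C, B) for s, B, C in zip(sig, Bs, Cs))

# unfold: rows indexed by (i1, j1), columns by (i2, j2), both column-major
M = A.transpose(0, 2, 1, 3).reshape(n * n, n * n, order='F')

# the unfolding equals the matrix sum of Kronecker products
print(np.allclose(M, sum(s * np.kron(B, C) for s, B, C in zip(sig, Bs, Cs))))  # True
```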

The Nearest Kronecker Product Problem: Harder

    $\phi_A(B, C, D) = \| A - B \otimes C \otimes D \|_F^2
      = \sum_{i_1=1}^{r_1} \sum_{j_1=1}^{c_1} \sum_{i_2=1}^{r_2} \sum_{j_2=1}^{c_2} \sum_{i_3=1}^{r_3} \sum_{j_3=1}^{c_3}
        \big( A(i_1, j_1, i_2, j_2, i_3, j_3) - B(i_3, j_3)\, C(i_2, j_2)\, D(i_1, j_1) \big)^2$

This amounts to approximating an order-6 tensor with a triplet of order-2 tensors. One would have to apply componentwise optimization.

Concluding Remarks

Optional Fun Problems

Problem E4. Suppose

    $A = \begin{bmatrix} B_{11} \otimes C_{11} & B_{12} \otimes C_{12} \\ B_{21} \otimes C_{21} & B_{22} \otimes C_{22} \end{bmatrix}$

and that the $B_{ij}$ and $C_{ij}$ are each m-by-m.

(a) Assuming that structure is fully exploited, how many flops are required to compute $y = Ax$ where $x \in \mathbb{R}^{2m^2}$?
(b) How many flops are required to explicitly form $A$?
(c) How many flops are required to compute $y = Ax$ assuming that $A$ has been explicitly formed?

Problem A4. Suppose $A$ is $n^2$-by-$n^2$. How would you compute $X \in \mathbb{R}^{n \times n}$ so that $\| A - X \otimes X \|_F$ is minimized?