Math 671: Tensor Train decomposition methods


Math 671. Eduardo Corona, University of Michigan at Ann Arbor. December 8, 2016.

Table of Contents
1 Preliminaries and goal
2 Unfolding matrices for tensorized arrays
  The Tensor Train decomposition
3 Model problem: integral equations
  Problem formulation

Preliminaries and goal. Motivation: low rank approximation.
- Optimal low rank approximation using a truncated SVD with k terms.
- Quasi-optimal low rank approximation using the ID.
- Block low-rank structure: the Fast Multipole Method and others (e.g. the Butterfly algorithm).

Preliminaries and goal. What is a tensor?
A tensor is a generalization of a vector to multiple dimensions. In the most abstract setting, a tensor T is an element of the tensor product of vector spaces V_1 ⊗ V_2 ⊗ ... ⊗ V_d. In the context of vectors and matrices (V_i = R or R^n), you can also think of tensors as multidimensional arrays of d dimensions:
- A vector v is a 1-tensor, with elements v(i).
- A matrix A is a 2-tensor, with elements A(i, j).
- A tensor T of dimension d has elements T(i_1, i_2, ..., i_d).

Preliminaries and goal. Going from a vector or matrix to a tensor.
It is possible to go from a tensor to a vector by flattening or vectorizing it. It is also possible to create extra dimensions in a vector to tensorize it. You might have already done this with matrices; in fact, all arrays in your computer are really just vectors (Matlab commands: reshape and permute). When we merge tensor indices, we will indicate it by placing a bar on top of the merged index: i = i_1 i_2 ... i_d.
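As a quick illustration (a small sketch added here, not from the lecture), the Matlab commands mentioned above are enough to move between the vector and tensor views of the same data:

% Tensorize a length-8 vector into a 2 x 2 x 2 array and flatten it back.
% Matlab stores arrays column-major, so the first index varies fastest.
v = (1:8)';                % a 1-tensor (vector)
T = reshape(v, [2 2 2]);   % a 3-tensor T(i1, i2, i3)
w = reshape(T, [], 1);     % merge all indices back into one: recovers v
disp(norm(v - w));         % prints 0
P = permute(T, [2 1 3]);   % reorder dimensions: P(i2, i1, i3) = T(i1, i2, i3)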

Preliminaries and goal. The Tensor Train decomposition.
The TT decomposition (Tyrtyshnikov, Oseledets et al.) is an extremely efficient numerical method to compress tensors.
- It extends low rank approximation to d-dimensional tensors.
- It overcomes the curse of dimensionality: for many examples, work and storage are O(d) or O(d^k) for small k.
- The decomposition can also be applied to vectors and matrices by reshaping them as higher dimensional tensors.

Preliminaries and goal. Goal: function compression and fast matrix algebra.
By tensorizing function samples, we can use the TT decomposition to evaluate, interpolate, and perform operations on that function efficiently. Applying this to matrices, we can obtain compact factorizations of A and A^{-1}. We also have fast algorithms to perform operations with these factorizations (e.g. applying them to vectors).


Unfolding matrices.
For a tensor T of dimension d, unfolding matrices can be defined as
T_k(i_1 i_2 ... i_k, i_{k+1} ... i_d) = T(i_1, i_2, ..., i_d), k = 1, ..., d.
That is, rows and columns result from merging the first k and the last d-k dimensions of T, respectively. Using Matlab notation,
T_k = reshape(T, prod(n(1:k)), prod(n(k+1:d)))
where n(i) is the size of the i-th dimension.
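A direct translation of this definition (a minimal sketch; the function name is just for illustration, and it assumes T is stored as an ordinary Matlab array so that size(T) returns the mode sizes n_1, ..., n_d):

% k-th unfolding matrix: merge the first k modes into rows, the rest into columns.
function Tk = unfolding(T, k)
    n  = size(T);
    Tk = reshape(T, prod(n(1:k)), prod(n(k+1:end)));
end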

Why are unfolding matrices T_k important?
As we will see, the TT decomposition is obtained from low rank decompositions of a tensor's unfolding matrices; that is, low rank decomposition for matrices, applied to each T_k.
- What do unfolding matrices of a tensorized vector mean? What about for tensorized matrices?
- Why can we expect them to be low rank?

Unfolding matrices for tensorized arrays. What does it mean to tensorize a vector?
Let us assume we have a vector f obtained by taking 8 samples of a function f(x), say f(x) = sin(x) on [0, 2π]. What does it mean to make it a tensor?

Unfolding matrices for tensorized arrays. Tensorized vector index.
[Figure: the domain Ω is split in two at each of three levels; the binary labels (0/1) at levels 1, 2, 3 give the indices i_1, i_2, i_3 of a sample x_i. The unfolding matrix is F_T(i_1 i_2, i_3) = f(x_i).]

Unfolding matrices for tensorized arrays. Unfolding matrices: vectors of function samples.
Let f : Ω → R. A hierarchical partition of Ω gives a tree structure T of depth d. Any x ∈ Ω can be encoded by indices {i_l}, l = 1, ..., d-1, recording which of the n_l tree branches it belongs to at level l. We then take n_d samples of f on each leaf (indexed by i_d). Indexing by these integer coordinates and tensorizing,
F_T(i_1, i_2, ..., i_d) = f(x_i), i = i_1 i_2 ... i_d.
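As a concrete check (an illustrative sketch, not the lecture's code), tensorize samples of f(x) = sin(x) and look at the numerical ranks of the unfolding matrices; by the angle-addition formula sin(a + b) = sin(a)cos(b) + cos(a)sin(b), every unfolding has rank at most 2:

% Sample sin(x) on 2^10 points, reshape into a 2 x 2 x ... x 2 (10-way) tensor,
% and report the numerical rank of each unfolding matrix F_k.
d = 10;  N = 2^d;
x = linspace(0, 2*pi, N);
F = reshape(sin(x), 2*ones(1, d));
for k = 1:d-1
    Fk = reshape(F, 2^k, 2^(d-k));
    fprintf('rank(F_%d) = %d\n', k, rank(Fk));   % at most 2 for every k
end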

Unfolding matrices for tensorized arrays Why are these unfolding matrices low rank?

Unfolding matrices for tensorized arrays. Model problem: integral equations for PDEs.
Many boundary value problems from classical physics, when cast as boundary or volume integral equations, take the form
A[σ](x) = a(x)σ(x) + ∫_Γ K(x, y)σ(y) ds(y) = f(x), x ∈ Γ,
where K(x, y) is a kernel function related to the fundamental solution of the PDE. It is typically singular near the diagonal (y = x) but otherwise smooth. We prefer Fredholm equations of the 2nd kind (identity + compact).

Unfolding matrices for tensorized arrays. Discretization.
Discretizing these integrals using, e.g., the Nyström method,
(Aσ)_i = a(x_i)σ_i + Σ_{j=1}^{N} K(x_i, y_j) σ_j ω_j = f(x_i),
results in a linear system Aσ = f where A is a dense N × N matrix. If K is singular, special quadratures are needed when sources y_j and targets x_i get close.
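For illustration only, here is a minimal Nyström-style sketch on the unit circle with trapezoidal weights and a smooth Gaussian stand-in kernel (an assumption made for the example; it is not one of the singular PDE kernels discussed above):

% Nystrom-style discretization of a(x)sigma(x) + int_Gamma K(x,y) sigma(y) ds(y) = f(x)
% on the unit circle, with a(x) = 1 and a smooth stand-in kernel.
N  = 200;
t  = 2*pi*(0:N-1)'/N;                       % equispaced nodes on the circle
xy = [cos(t), sin(t)];
w  = (2*pi/N) * ones(N, 1);                 % trapezoidal quadrature weights
K  = @(p, q) exp(-sum((p - q).^2, 2));      % smooth Gaussian kernel (assumption)
A  = eye(N);                                % identity part, a(x) = 1
for i = 1:N
    A(i, :) = A(i, :) + (K(repmat(xy(i,:), N, 1), xy) .* w)';
end
f     = ones(N, 1);
sigma = A \ f;                              % dense solve; fast algebra comes later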

Unfolding matrices for tensorized arrays. Matrices of kernel samples.
Matrix entries are kernel evaluations K(x_i, y_j) for N sources {y_j} and M targets {x_i}. If we partition the domain and range of K(x, y), we can consider a hierarchy of interactions. At every level, a matrix block is encoded by the integer coordinate pair (i_l, j_l), or equivalently, by a block coordinate b_l = i_l j_l:
A_T(i_1 j_1, i_2 j_2, ..., i_d j_d) = A(i_1 i_2 ... i_d, j_1 j_2 ... j_d) = K(x_i, y_j),
with merged indices i = i_1 i_2 ... i_d and j = j_1 j_2 ... j_d.

Unfolding matrices for tensorized arrays. Tensorized matrix index.
[Figure: A(i, j) is unfolded level by level into matrices A_1, A_2, A_3 (shown with sizes 16 x 16, 32 x 4, and 64 x 1). The unfolding matrix A_l collects all interactions between source and target nodes at a given level.]

Unfolding matrices for tensorized arrays Why are these unfolding matrices low rank?

The Tensor Train decomposition. What is the TT decomposition?
For a d-dimensional tensor A sampled at N = ∏_{i=1}^{d} n_i points indexed by (i_1, i_2, ..., i_d), this decomposition can be written as
A(i_1, i_2, ..., i_d) ≈ Σ_{α_1, ..., α_{d-1}} G_1(i_1, α_1) G_2(α_1, i_2, α_2) ... G_d(α_{d-1}, i_d).
Each G_k is known as a tensor core. The auxiliary indices α_k determine the number of terms in the decomposition and run from 1 to r_k; r_k is known as the k-th TT rank.
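Equivalently, each entry is a product of small matrices: fixing i_k turns G_k into an r_{k-1} x r_k matrix, and the sums over the α's become a chain of matrix products. A minimal sketch (the function name is illustrative), assuming the cores are stored as 3-way Matlab arrays G{k} of size r_{k-1} x n_k x r_k with r_0 = r_d = 1:

% Evaluate one entry A(idx(1), ..., idx(d)) of a tensor given in TT format.
function val = tt_entry(G, idx)
    v = 1;                                        % row vector of size 1 x r_0 = 1
    for k = 1:numel(G)
        [r1, ~, r2] = size(G{k});
        Gk = reshape(G{k}(:, idx(k), :), r1, r2); % core slice for index i_k
        v  = v * Gk;                              % accumulate a 1 x r_k row vector
    end
    val = v;                                      % scalar, since r_d = 1
end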

The Tensor Train decomposition. How to compute TT cores and ranks?
The k-th TT rank is the rank of the k-th unfolding matrix A_k, and a TT decomposition may be obtained by a series of low rank approximations. You can think of it as an extension of the SVD to tensors (it is not the only one, but it is one with some of the very same properties). By truncating the ranks, we also get a quasi-optimal analogue of the optimal low rank approximation.

The Tensor Train decomposition. How to obtain a TT decomposition.
If we compute our favorite low rank approximation of A_1, we obtain
A_1(i_1, i_2 ... i_d) ≈ Σ_{α_1} U(i_1, α_1) V(α_1, i_2 ... i_d).
The first core G_1(i_1, α_1) is a reshaping of U. We can then iterate this procedure on V, reshaped as V(α_1 i_2, i_3, ..., i_d), to obtain the remaining cores. More efficient algorithms use a series of low TT rank approximations, enriched with updates local to each core G_k (the amen_cross algorithm from the TT-Toolbox).
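A sketch of this SVD-based construction (often called TT-SVD; the assumptions here are that T is a full Matlab array, that ranks are chosen by a relative singular value cutoff tol, and that cores are returned as r_{k-1} x n_k x r_k arrays):

% Sequential SVD construction of TT cores, truncating at relative tolerance tol.
function G = tt_svd(T, tol)
    n = size(T);  d = numel(n);
    G = cell(1, d);
    C = reshape(T, n(1), []);                     % current unfolding, r_0 * n_1 rows
    r = 1;                                        % r_0 = 1
    for k = 1:d-1
        [U, S, V] = svd(C, 'econ');
        s  = diag(S);
        rk = max(1, sum(s > tol * s(1)));         % truncated TT rank r_k
        G{k} = reshape(U(:, 1:rk), r, n(k), rk);  % k-th core
        C = S(1:rk, 1:rk) * V(:, 1:rk)';          % remainder still to be decomposed
        C = reshape(C, rk * n(k+1), []);          % fold the next mode into the rows
        r = rk;
    end
    G{d} = reshape(C, r, n(d), 1);                % last core
end

Applied to the tensorized sin samples from the earlier sketch, tt_svd should recover them to the chosen tolerance with every TT rank equal to 2.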

The Tensor Train decomposition. TT for matrices.
[Figure: A, viewed as its first unfolding A_1 (16 x 16), is factored as M_1 = U_1 V_1 with U_1 (16 x r_1) giving the first core G_1 and V_1 (r_1 x 16); V_1 is reshaped into M_2 (4r_1 x 4) and factored as U_2 V_2, with U_2 (4r_1 x r_2) giving G_2 and V_2 (r_2 x 4); finally V_2 is reshaped into M_3 (4r_2 x 1) = U_3, giving the last core G_3.]

The Tensor Train decomposition. Why does it achieve better compression?
- The TT rank is low if there exists a small basis of interactions.
- For many examples from differential and integral equations, ranks are actually bounded or grow very slowly (like log N).
- Other examples have higher growth (Toeplitz matrices have ranks growing like N^{1/2}).
- Symmetries, and particularly translation or rotation invariance, reduce TT ranks significantly.

Model problem: integral equations. Problem formulation: TT matrix-vector apply.
Once we have an approximation of A or A^{-1}, we want to compute fast matrix-vector products. If x can be efficiently compressed using the TT decomposition, the TT cores of y = Ax can be computed as
Y_k(α_k β_k, i_k, α_{k+1} β_{k+1}) = Σ_{j_k} G_k(α_k, i_k, j_k, α_{k+1}) X_k(β_k, j_k, β_{k+1}).
If x is a dense, incompressible vector, a fast O(N log N) algorithm proceeds by contracting one dimension at a time (applying one core at a time).
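A direct sketch of that contraction (illustrative function name), assuming the TT-matrix cores are stored as 4-way arrays G{k} of size ra1 x m_k x n_k x ra2 and the TT-vector cores as 3-way arrays X{k} of size rx1 x n_k x rx2; the merged rank pairs are ordered with the β index fastest, which is what kron produces:

% Cores of y = A*x when both A and x are in TT format: contract the shared
% index j_k and merge the rank pairs (alpha, beta) via Kronecker products.
function Y = tt_matvec_cores(G, X)
    d = numel(G);
    Y = cell(1, d);
    for k = 1:d
        [ra1, m, n, ra2] = size(G{k});
        [rx1, ~,    rx2] = size(X{k});
        Yk = zeros(ra1 * rx1, m, ra2 * rx2);
        for i = 1:m
            Yi = zeros(ra1 * rx1, ra2 * rx2);
            for j = 1:n
                Gij = reshape(G{k}(:, i, j, :), ra1, ra2);
                Xj  = reshape(X{k}(:, j, :),    rx1, rx2);
                Yi  = Yi + kron(Gij, Xj);       % sum over the shared index j_k
            end
            Yk(:, i, :) = reshape(Yi, ra1 * rx1, 1, ra2 * rx2);
        end
        Y{k} = Yk;
    end
end

Note that the TT ranks of y are products of the ranks of A and x, so in practice this is followed by a TT rounding (re-compression) step, omitted here.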