TENSOR APPROXIMATION TOOLS FREE OF THE CURSE OF DIMENSIONALITY

Eugene Tyrtyshnikov, Institute of Numerical Mathematics, Russian Academy of Sciences (joint work with Ivan Oseledets)

WHAT ARE TENSORS? Tensors = d-dimensional arrays: A = [a_{ij...k}], i ≤ I, j ≤ J, ..., k ≤ K. Tensor A has: dimensionality (order) d = number of indices (modes, axes, directions, ways); size n_1 × ... × n_d (number of nodes along each axis).

WHAT IS THE PROBLEM? NUMBER OF TENSOR ELEMENTS = n^d GROWS EXPONENTIALLY IN d. WATER AND UNIVERSE: an H2O molecule has 18 electrons, and each electron has 3 coordinates, so we have 18 × 3 = 54 axes. If we take 32 nodes on each axis, we obtain 32^54 ≈ 10^81 points, which is close to the number of atoms in the universe. CURSE OF DIMENSIONALITY
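
A one-line check (my own, not part of the talk) of the arithmetic behind this example, in Python:

import math
# 32^54 grid points: the base-10 exponent is 54 * log10(32) ~ 81.3, i.e. about 2 * 10^81
print(54 * math.log10(32))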

WE SURVIVE WITH: compact (low-parametric) representations for tensors, and methods for computations in these compact representations.

TUCKER DECOMPOSITION
a(i_1,...,i_d) = Σ_{α_1=1}^{r_1} ... Σ_{α_d=1}^{r_d} g(α_1,...,α_d) q_1(i_1,α_1) ... q_d(i_d,α_d)
L. R. Tucker, Some mathematical notes on three-mode factor analysis, Psychometrika, V. 31, P. 279-311 (1966).
COMPONENTS: 2D arrays q_1,...,q_d with dnr entries; a d-dimensional array g(α_1,...,α_d) with r^d entries.
CURSE OF DIMENSIONALITY REMAINS
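
A minimal NumPy sketch (my own, with assumed toy sizes) of reconstructing a third-order tensor from its Tucker components and counting parameters; the r^d core is what keeps the curse of dimensionality alive:

import numpy as np

n, r, d = 20, 3, 3                                   # mode size, Tucker rank, order (toy values)
q = [np.random.randn(n, r) for _ in range(d)]        # factor matrices q_k(i_k, alpha_k)
g = np.random.randn(r, r, r)                         # core tensor g(alpha_1, alpha_2, alpha_3)

# a(i1,i2,i3) = sum over alphas of g(a1,a2,a3) q1(i1,a1) q2(i2,a2) q3(i3,a3)
a = np.einsum('abc,ia,jb,kc->ijk', g, q[0], q[1], q[2])

full_params = n ** d                                 # n^d entries of the full tensor
tucker_params = d * n * r + r ** d                   # dnr factor entries plus r^d core entries
print(a.shape, full_params, tucker_params)           # the r^d term still grows exponentially in d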

CANONICAL DECOMPOSITION (PARAFAC, CANDECOMP)
a(i_1,...,i_d) = Σ_{α=1}^{R} u_1(i_1,α) ... u_d(i_d,α)
The number of defining parameters is dRn.
DRAWBACKS: INSTABILITY (cf. Lim, de Silva). Take linearly independent x_1,...,x_d, y_1,...,y_d and set
a = Σ_{t=1}^{d} z_1^t ⊗ ... ⊗ z_d^t,   where z_k^t = x_k for k ≠ t and z_k^t = y_k for k = t.
Then
a = (1/ε)(x_1 + εy_1) ⊗ ... ⊗ (x_d + εy_d) − (1/ε) x_1 ⊗ ... ⊗ x_d + O(ε),
so this rank-d tensor is a limit of rank-2 tensors and has no best rank-2 approximation.
EVENTUAL LACK OF ROBUST ALGORITHMS
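
A small numerical confirmation (my own sketch, d = 3 and random vectors) of the instability example above: the rank-2 tensors approach the rank-d tensor as ε → 0.

import numpy as np

n, d = 5, 3
rng = np.random.default_rng(0)
x = [rng.standard_normal(n) for _ in range(d)]
y = [rng.standard_normal(n) for _ in range(d)]

def outer(vectors):
    # tensor (outer) product v_1 x v_2 x v_3 of three vectors
    return np.einsum('i,j,k->ijk', *vectors)

# a = sum over t of z_1^t x ... x z_d^t with z_k^t = x_k for k != t and y_k for k = t
a = sum(outer([y[k] if k == t else x[k] for k in range(d)]) for t in range(d))

for eps in (1e-1, 1e-3, 1e-5):
    # rank-2 tensor: (1/eps) * [(x_1 + eps y_1) x ... x (x_d + eps y_d) - x_1 x ... x x_d]
    b = (outer([x[k] + eps * y[k] for k in range(d)]) - outer(x)) / eps
    print(eps, np.linalg.norm(a - b))    # the error decreases like O(eps)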

a(i_1,...,i_d) = Σ_{α_1=1}^{r_1} ... Σ_{α_d=1}^{r_d} g(α_1,...,α_d) q_1(i_1,α_1) ... q_d(i_d,α_d)   (TUCKER DECOMPOSITION)

a(i_1,...,i_d) = Σ_{α=1}^{R} u_1(i_1,α) ... u_d(i_d,α)   (CANONICAL DECOMPOSITION: PARAFAC, CANDECOMP)

a(i_1,...,i_d) = Σ_{α_1,...,α_{d-1}} g_1(i_1,α_1) g_2(α_1,i_2,α_2) ... g_{d-1}(α_{d-2},i_{d-1},α_{d-1}) g_d(α_{d-1},i_d)   (TENSOR-TRAIN DECOMPOSITION)

TENSORS AND MATRICES. Let A = [a_{ijklm}]. Take a pair of mutually complementary long indices: (ij) and (klm), or (kl) and (ijm), ... Tensor A gives rise to unfolding matrices B_1 = [b_{(ij),(klm)}], B_2 = [b_{(kl),(ijm)}], ... By definition, b_{(ij),(klm)} = b_{(kl),(ijm)} = ... = a_{ijklm}.
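
A small NumPy sketch (toy sizes of my choosing): grouping indices into two long indices is just a reshape, possibly after a transpose.

import numpy as np

ni, nj, nk, nl, nm = 2, 3, 4, 5, 6
A = np.random.randn(ni, nj, nk, nl, nm)                       # A = [a_{ijklm}]

# B1 = [b_{(ij),(klm)}]: rows indexed by (i,j), columns by (k,l,m)
B1 = A.reshape(ni * nj, nk * nl * nm)

# B2 = [b_{(kl),(ijm)}]: bring axes k,l to the front, then reshape
B2 = np.transpose(A, (2, 3, 0, 1, 4)).reshape(nk * nl, ni * nj * nm)

print(B1.shape, B2.shape)                                     # both hold exactly the entries a_{ijklm}
print(np.linalg.matrix_rank(B1))                              # ranks of unfoldings bound the TT ranks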

DIMENSIONALITY CAN BE DECREASED
a(i_1,...,i_d) = a(i_1,...,i_k; i_{k+1},...,i_d) = Σ_{s=1}^{r} u(i_1,...,i_k; s) v(i_{k+1},...,i_d; s)
Dimension d reduces to dimensions k + 1 and d − k + 1. Proceed by recursion; a binary tree arises.

TUCKER VIA RECURSION
[Figure: recursive binary splitting of the index set {1,2,3,4,5}, introducing one auxiliary index α_k per spatial index]
a(i_1,i_2,i_3,i_4,i_5) = Σ_{α_1,α_2,α_3,α_4,α_5} g(α_1,α_2,α_3,α_4,α_5) q_1(i_1,α_1) q_2(i_2,α_2) q_3(i_3,α_3) q_4(i_4,α_4) q_5(i_5,α_5)

BINARY TREE IMPLIES: any auxiliary index belongs to exactly two leaf tensors; the tensor is the sum over all auxiliary indices of the product of elements of the leaf tensors.
HOW TO AVOID r^d PARAMETERS: let any leaf tensor have at most one spatial index, and let any leaf tensor have at most two (three) auxiliary indices.

TREE WITHOUT TUCKER
[Figure: a linear (degenerate) binary tree over the indices 1,...,5, with auxiliary indices α_1,...,α_4 on the edges]
TENSOR-TRAIN DECOMPOSITION
a(i_1,i_2,i_3,i_4,i_5) = Σ_{α_1,α_2,α_3,α_4} g_1(i_1,α_1) g_2(α_1,i_3,α_3) g_3(α_3,i_5,α_4) g_4(α_4,i_4,α_2) g_5(α_2,i_2)

HOW MANY PARAMETERS?
NUMBER OF TT PARAMETERS = 2nr + (d − 2)nr^2
EXTENDED TT DECOMPOSITION
[Figure: a tree in which every spatial index gets its own leaf factor and the interior carriages carry only auxiliary indices]
NUMBER OF EXTENDED TT PARAMETERS = dnr + (d − 2)r^3

THE TREE IS NOT NEEDED! EVERYTHING IS DEFINED BY A PERMUTATION OF THE SPATIAL INDICES.
TENSOR-TRAIN DECOMPOSITION
a(i_1,i_2,i_3,i_4,i_5) = Σ_{β_1,β_2,β_3,β_4} g_1(i_σ(1),β_1) g_2(β_1,i_σ(2),β_2) g_3(β_2,i_σ(3),β_3) g_4(β_3,i_σ(4),β_4) g_5(β_4,i_σ(5))
TT = Tree Tucker, yet neither Tree nor Tucker: TENSOR TRAIN

MINIMAL TT DECOMPOSITION
Let 1 ≤ β_k ≤ r_k. What are the minimal values of the compression ranks r_k?
r_k ≥ rank A_k^σ,   A_k^σ = [ a^σ(i_σ(1),...,i_σ(k); i_σ(k+1),...,i_σ(d)) ],
where a^σ(i_σ(1),...,i_σ(k); i_σ(k+1),...,i_σ(d)) = a(i_1,...,i_d).

GENERAL PROPERTIES
THEOREM 1. Assume that a tensor a(i_1,...,i_d) possesses a canonical decomposition with R terms. Then a(i_1,...,i_d) admits a TT decomposition of rank R or less.
THEOREM 2. Assume that for any small ε > 0 some ε-perturbation of the tensor a(i_1,...,i_d) possesses a canonical decomposition with R terms. Then a(i_1,...,i_d) admits a TT decomposition of rank R or less.

FROM CANONICAL TO TENSOR TRAIN
a(i_1,...,i_d) = Σ_{s=1}^{R} u_1(i_1,s) ... u_d(i_d,s)
= Σ_{α_1,...,α_{d-1}} u_1(i_1,α_1) δ(α_1,α_2) u_2(i_2,α_2) ... δ(α_{d-2},α_{d-1}) u_{d-1}(i_{d-1},α_{d-1}) u_d(i_d,α_{d-1})
FREE!
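
A short sketch (my own NumPy code, toy sizes) of this "free" conversion: the interior TT cores are the canonical factors with a Kronecker delta on the auxiliary indices.

import numpy as np

def cp_to_tt(factors):
    # factors[k] has shape (n_k, R); returns TT cores of shapes (1,n_1,R), (R,n_k,R), ..., (R,n_d,1)
    d, R = len(factors), factors[0].shape[1]
    cores = []
    for k, U in enumerate(factors):
        n = U.shape[0]
        if k == 0:
            cores.append(U.reshape(1, n, R))                  # g_1(i_1, alpha_1)
        elif k == d - 1:
            cores.append(U.T.reshape(R, n, 1))                # g_d(alpha_{d-1}, i_d)
        else:
            core = np.zeros((R, n, R))
            core[np.arange(R), :, np.arange(R)] = U.T         # delta(alpha_{k-1}, alpha_k) u_k(i_k, alpha_k)
            cores.append(core)
    return cores

factors = [np.random.randn(4, 3) for _ in range(4)]           # a random canonical tensor, d = 4, R = 3
a_cp = np.einsum('ia,ja,ka,la->ijkl', *factors)
a_tt = np.einsum('aib,bjc,ckd,dle->ijkl', *cp_to_tt(factors))
print(np.linalg.norm(a_cp - a_tt))                            # ~1e-15: the conversion is exact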

EFFECTIVE RANK OF A TENSOR
ERank(a) = lim sup_{ε→+0} min { rank(b) : ||b − a|| ≤ ε, b ∈ C(n_1,...,n_d) }
F(n_1,...,n_d): all tensors of size n_1 × ... × n_d with entries from F. Let a ∈ F(n_1,...,n_d) ⊂ C(n_1,...,n_d). Then the canonical rank over F depends on F, while the effective rank does not. The notion is close to the border-rank concept (Bini, Capovani), which still depends on F.
THEOREM 2 (reformulated). Let a ∈ F(n_1,...,n_d). Then for this tensor there exists a TT decomposition of rank r ≤ ERank(a) with the entries of all tensors (cores) belonging to F.

EXAMPLE 1. A d-dimensional tensor in the matrix form
A = Λ ⊗ I ⊗ ... ⊗ I + I ⊗ Λ ⊗ ... ⊗ I + ... + I ⊗ ... ⊗ I ⊗ Λ
P(h) ≡ ⊗_{s=1}^{d} (I + hΛ) = I + hA + O(h^2),   so   A = (P(h) − P(0)) / h + O(h)
ERank(A) = 2

EXAMPLE 2. A real-valued tensor F is generated by the function f(x_1,...,x_d) = sin(x_1 + ... + x_d) on some 1D grids for x_1,...,x_d. Beylkin et al.: the canonical rank of F over R does not exceed d (and is likely to be exactly d). However, sin x = (exp(ix) − exp(−ix)) / (2i), so ERank(F) = 2.

EXAMPLE 3. A d-dimensional tensor A arises from the discretization of the operator
A = Σ_{1 ≤ i ≤ j ≤ d} a_{ij} ∂^2/(∂x_i ∂x_j)
on a tensor grid for the variables x_1,...,x_d. The canonical rank is about d^2/2. However, ERank(A) ≤ (3/2)d + 1 (N. Zamarashkin, I. Oseledets, E. Tyrtyshnikov).

TENSOR TRAIN DECOMPOSITION
a(i_1,...,i_d) = Σ_{α_0,...,α_d} g_1(α_0,i_1,α_1) g_2(α_1,i_2,α_2) ... g_d(α_{d-1},i_d,α_d)
MATRIX FORM
a(i_1,...,i_d) = G_1^{i_1} G_2^{i_2} ... G_d^{i_d}
MINIMAL TT COMPRESSION RANKS:
r_k = rank A_k,   A_k = [a_{(i_1...i_k),(i_{k+1}...i_d)}],   0 ≤ k ≤ d,   size(G_k^{i_k}) = r_{k-1} × r_k
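
A tiny sketch (my own, random toy cores) of the matrix form: one entry of the tensor is a product of d small matrices, so it never requires the n^d entries.

import numpy as np

d, n, r = 6, 4, 3
# cores g_k(alpha_{k-1}, i_k, alpha_k) with boundary ranks r_0 = r_d = 1
cores = [np.random.randn(1 if k == 0 else r, n, 1 if k == d - 1 else r) for k in range(d)]

def tt_entry(cores, idx):
    # evaluate a(i_1,...,i_d) = G_1^{i_1} ... G_d^{i_d} by multiplying the core slices
    res = np.eye(1)
    for core, i in zip(cores, idx):
        res = res @ core[:, i, :]            # G_k^{i_k} is an r_{k-1} x r_k matrix
    return res[0, 0]

print(tt_entry(cores, (0, 1, 2, 3, 0, 1)))   # cost O(d r^2) per entry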

THE KEY TO EVERYTHING: THE PROBLEM OF RECOMPRESSION. Given a tensor train with large ranks, find in its ε-vicinity a tensor train with smaller compression ranks.
METHOD OF TT RECOMPRESSION (I. V. Oseledets): THE NUMBER OF OPERATIONS IS LINEAR IN THE DIMENSIONALITY d AND THE MODE SIZE n, AND THE RESULT HAS GUARANTEED APPROXIMATION ACCURACY.

METHOD OF TENSOR TRAIN RECOMPRESSION
The minimal TT compression ranks are the ranks of the unfolding matrices A_k. The matrices A_k are of size n^k × n^{d−k} but never appear as full arrays of n^d elements. Nevertheless, the SVDs of the A_k are constructed, with the orthogonal (unitary) factors kept in a compact factorized form. When the smallest singular values are neglected, the accuracy is GUARANTEED. To show the idea, consider a TT decomposition
a(i_1,i_2,i_3) = Σ_{α_1,α_2} g_1(i_1,α_1) g_2(α_1,i_2,α_2) g_3(α_2,i_3)

TENSOR TRAIN RECOMPRESSION: RIGHT TO LEFT by QR
a(i_1,i_2,i_3) = Σ_{α_1,α_2} g_1(i_1,α_1) g_2(α_1,i_2,α_2) g_3(α_2; i_3)
= Σ_{α_1,α_2'} g_1(i_1,α_1) ĝ_2(α_1,i_2; α_2') q_3(α_2'; i_3)
= Σ_{α_1',α_2'} ĝ_1(i_1; α_1') q_2(α_1'; i_2,α_2') q_3(α_2'; i_3)
The matrices q_2(α_1'; i_2,α_2') and q_3(α_2'; i_3) obtain orthonormal rows.
QR:  g_3(α_2; i_3) = Σ_{α_2'} r_3(α_2; α_2') q_3(α_2'; i_3),   ĝ_2(α_1,i_2; α_2') = Σ_{α_2} g_2(α_1,i_2; α_2) r_3(α_2, α_2')
QR:  ĝ_2(α_1; i_2,α_2') = Σ_{α_1'} r_2(α_1; α_1') q_2(α_1'; i_2,α_2'),   ĝ_1(i_1; α_1') = Σ_{α_1} g_1(i_1; α_1) r_2(α_1; α_1')

TENSOR TRAIN RECOMPRESSION: LEFT TO RIGHT by SVD
a(i_1,i_2,i_3) = Σ_{α_1,α_2} ĝ_1(i_1; α_1) q_2(α_1; i_2,α_2) q_3(α_2; i_3)
= Σ_{α_1,α_2} z_1(i_1; α_1) ĝ_2(α_1; i_2,α_2) q_3(α_2; i_3)
= Σ_{α_1,α_2} z_1(i_1; α_1) z_2(α_1; i_2,α_2) ĝ_3(α_2; i_3)
The matrices z_1(i_1; α_1) and z_2(α_1,i_2; α_2) obtain orthonormal columns.

LEMMA ON ORTHONORMALITY
Let k ≤ l and let the matrices q_k(α_{k-1}; i_k,α_k), ..., q_l(α_{l-1}; i_l,α_l) have orthonormal rows. Then the matrix
Q_k(α_{k-1}; i) ≡ Q_k(α_{k-1}; i_k,...,i_l,α_l) = Σ_{α_k,...,α_{l-1}} q_k(α_{k-1}; i_k,α_k) ... q_l(α_{l-1}; i_l,α_l)
has orthonormal rows as well.
PROOF BY INDUCTION. Since Q_k(α_{k-1}; i_k, i) = Σ_{α_k} q_k(α_{k-1}; i_k,α_k) Q_{k+1}(α_k; i), we have
Σ_{i_k,i} Q_k(α; i_k,i) Q_k(β; i_k,i) = Σ_{i_k,i} Σ_{µ,ν} q_k(α; i_k,µ) Q_{k+1}(µ; i) q_k(β; i_k,ν) Q_{k+1}(ν; i)
= Σ_{i_k} Σ_{µ,ν} q_k(α; i_k,µ) q_k(β; i_k,ν) δ(µ,ν) = Σ_{i_k,α_k} q_k(α; i_k,α_k) q_k(β; i_k,α_k) = δ(α,β).

TENSOR TRAIN RECOMPRESSION
a(i_1,i_2,i_3) = Σ_{α_1,α_2} ĝ_1(i_1,α_1) q_2(α_1,i_2,α_2) q_3(α_2,i_3)
= Σ_{α_1,α_2} z_1(i_1,α_1) ĝ_2(α_1,i_2,α_2) q_3(α_2,i_3)
= Σ_{α_1,α_2} z_1(i_1,α_1) z_2(α_1,i_2,α_2) ĝ_3(α_2,i_3)
rank A_1 = rank [ĝ_1(α_0,i_1; α_1)],   rank A_2 = rank [ĝ_2(α_1,i_2; α_2)],   rank A_3 = rank [ĝ_3(α_2,i_3; α_3)]
The complexity of computing the compression ranks is linear in d; truncation is performed in the SVDs of small matrices.
NUMBER OF OPERATIONS = O(dnr^3)
GUARANTEED ACCURACY ≤ √(d−1) · ε (in the Frobenius norm)
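
The following NumPy sketch is my own compact implementation of the recompression (rounding) procedure just described, under the assumption that cores are stored as arrays of shape (r_{k-1}, n_k, r_k): a right-to-left QR orthogonalization followed by a left-to-right truncated SVD.

import numpy as np

def tt_round(cores, eps):
    d = len(cores)
    cores = [c.copy() for c in cores]
    # right-to-left sweep: make cores 2..d row-orthonormal, push triangular factors to the left
    for k in range(d - 1, 0, -1):
        r0, n, r1 = cores[k].shape
        q, r = np.linalg.qr(cores[k].reshape(r0, n * r1).T)           # unfold and orthonormalize rows
        cores[k] = q.T.reshape(-1, n, r1)
        cores[k - 1] = np.einsum('aib,bc->aic', cores[k - 1], r.T)    # absorb r into the previous core
    # left-to-right sweep: truncated SVD of each core with per-step tolerance delta
    delta = eps / np.sqrt(max(d - 1, 1))
    for k in range(d - 1):
        r0, n, r1 = cores[k].shape
        u, s, vt = np.linalg.svd(cores[k].reshape(r0 * n, r1), full_matrices=False)
        tail = np.sqrt(np.cumsum(s[::-1] ** 2))[::-1]                 # tail[j] = norm of s[j:]
        rank = max(1, int(np.sum(tail > delta)))                      # smallest rank with tail <= delta
        cores[k] = u[:, :rank].reshape(r0, n, rank)
        cores[k + 1] = np.einsum('ab,bic->aic', s[:rank, None] * vt[:rank], cores[k + 1])
    return cores

# usage: a random train with inflated ranks shrinks to its true ranks, the accuracy being kept
d, n = 5, 4
cores = [np.random.randn(1 if k == 0 else 6, n, 1 if k == d - 1 else 6) for k in range(d)]
print([c.shape for c in tt_round(cores, 1e-10)])

Splitting the tolerance as ε/√(d−1) per truncation keeps the total Frobenius-norm error below ε, in line with the guarantee quoted above.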

TT APPROXIMATION FOR THE LAPLACIAN
d     TT recompression time   Canonical rank   Compression rank
10    0.01 sec                10               2
20    0.09 sec                20               2
40    0.78 sec                40               2
80    13 sec                  80               2
160   152 sec                 160              2
200   248 sec                 200              2
The 1D grids are of size 32; the tensor has modes of size n = 1024.

WHAT CAN WE DO WITH TENSOR TRAINS?
a(i_1,...,i_d) = Σ_{α_1,...,α_{d-1}} g_1(i_1,α_1) g_2(α_1,i_2,α_2) ... g_d(α_{d-1},i_d)
RECOMPRESSION: given a tensor train with TT-ranks r, we can approximate it by another tensor train with guaranteed accuracy using O(dnr^3) operations.
QUASI-OPTIMALITY OF RECOMPRESSION: ERROR ≤ √(d−1) × BEST APPROXIMATION ERROR WITH THE SAME TT-RANKS.
EFFICIENT APPROXIMATE MATRIX OPERATIONS

CANONICAL VERSUS TENSOR-TRAIN
Operation                   Canonical             Tensor-Train
Number of parameters        O(dnR)                O(dnr + (d−2)r^3)
Matrix-by-vector            O(dn^2 R^2)           O(dn^2 r^2 + dr^6)
Addition                    O(dnR)                O(dnr)
Recompression               O(dnR^2 + d^3 R^3)    O(dnr^2 + dr^4)
Tensor-vector contraction   O(dnR)                O(dnr + dr^3)

TENSOR-VECTOR CONTRACTION
γ = Σ_{i_1,...,i_d} a(i_1,...,i_d) x_1(i_1) ... x_d(i_d)
ALGORITHM: compute the matrices Z_k = Σ_{i_k} g_k(α_{k-1}, i_k, α_k) x_k(i_k), then multiply: γ = Z_1 Z_2 ... Z_d.
NUMBER OF OPERATIONS = O(dnr^2)
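
A short sketch (my own, random toy data) of this contraction; the cost is one small matrix per carriage.

import numpy as np

d, n, r = 8, 10, 4
cores = [np.random.randn(1 if k == 0 else r, n, 1 if k == d - 1 else r) for k in range(d)]
xs = [np.random.randn(n) for _ in range(d)]

def tt_contract(cores, xs):
    res = np.eye(1)
    for core, x in zip(cores, xs):
        Z = np.einsum('aib,i->ab', core, x)     # Z_k = sum_i g_k(:, i, :) x_k(i), an r x r matrix
        res = res @ Z
    return res[0, 0]

print(tt_contract(cores, xs))                   # O(d n r^2) operations in total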

RECOVER A d-DIMENSIONAL TENSOR FROM A SMALL PORTION OF ITS ELEMENTS
We are given a procedure that computes any prescribed element a(i_1,...,i_d). We need to choose the right elements and use them to construct a TT approximation of this tensor. A TT decomposition with maximal compression rank r can be constructed from some O(dnr^2) elements.

HOW THIS PROBLEM IS SOLVED FOR MATRICES
Let A be close to a matrix of rank r: σ_{r+1}(A) ≤ ε. Then there exists a cross of r columns C and r rows R such that
|(A − C G^{-1} R)_{ij}| ≤ (r + 1) ε,
where G is the r × r matrix on the intersection of C and R. Take G of maximal volume among all r × r submatrices of A.
S. A. Goreinov, E. E. Tyrtyshnikov: The maximal-volume concept in approximation by low-rank matrices, Contemporary Mathematics, Vol. 208 (2001), 47-51.
S. A. Goreinov, E. E. Tyrtyshnikov, N. L. Zamarashkin: A theory of pseudo-skeleton approximations, Linear Algebra Appl. 261: 1-21 (1997). Doklady RAS (1995).

GOOD INSTEAD OF BEST: PSEUDO-MAX-VOLUME
Given A of size n × r, find a row permutation that moves a good submatrix into the upper r × r block. Since a right-side multiplication scales the volumes of all r × r submatrices by the same factor, assume that the upper block is the identity:
A = [ I ; Ã ],   Ã = [a_{ij}],   r+1 ≤ i ≤ n,   1 ≤ j ≤ r.
NECESSARY FOR MAX-VOL: |a_{ij}| ≤ 1 for r+1 ≤ i ≤ n, 1 ≤ j ≤ r. Let this define a good submatrix. Then here is an algorithm: if some |a_{ij}| ≥ 1 + δ, swap rows i and j; restore I in the first r rows by a right-side multiplication; check the new a_{ij}; quit if all are less than 1 + δ, otherwise repeat.
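
A compact sketch (my own NumPy code; the initialization is one simple choice among many) of this pseudo-max-volume search for a dominant r × r submatrix of a tall matrix.

import numpy as np

def maxvol(A, tol=1.05, max_iters=100):
    # returns row indices of a dominant r x r submatrix of the n x r matrix A
    n, r = A.shape
    # initial guess: greedy row-pivoted Gram-Schmidt (any nonsingular start would do)
    Q, rows = A.copy(), []
    for _ in range(r):
        i = int(np.argmax(np.sum(Q ** 2, axis=1)))
        rows.append(i)
        q = Q[i] / np.linalg.norm(Q[i])
        Q = Q - np.outer(Q @ q, q)
    # swap rows until no entry of A * inv(A[rows]) exceeds 1 + delta in modulus
    for _ in range(max_iters):
        B = A @ np.linalg.inv(A[rows])
        i, j = np.unravel_index(np.argmax(np.abs(B)), B.shape)
        if abs(B[i, j]) <= tol:
            break
        rows[j] = i                       # each swap multiplies the volume by |B[i, j]| > 1
    return rows

A = np.random.randn(200, 5)
rows = maxvol(A)
print(sorted(rows), np.max(np.abs(A @ np.linalg.inv(A[rows]))))   # all coefficients below ~1.05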

MATRIX CROSS ALGORITHM
Assume we are given some initial column indices j_1,...,j_r. Find maximal-volume row indices i_1,...,i_r in these columns. Find maximal-volume column indices in the rows i_1,...,i_r. Proceed choosing columns and rows until the skeleton cross approximations stabilize.
E. E. Tyrtyshnikov, Incomplete cross approximation in the mosaic-skeleton method, Computing 64, no. 4 (2000), 367-380.
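
A sketch (my own code, reusing the maxvol helper above; the entry function f and the test matrix are illustrative choices) of the alternating cross algorithm: only a few columns and rows of the matrix are ever evaluated.

import numpy as np

def matrix_cross(f, n, m, r, sweeps=4):
    # approximate the n x m matrix A, A[i, j] = f(i, j), by a skeleton C G^{-1} R
    cols = list(np.random.choice(m, r, replace=False))               # initial column indices
    for _ in range(sweeps):
        C = np.array([[f(i, j) for j in cols] for i in range(n)])    # the chosen r columns
        rows = maxvol(C)                                             # good rows inside them
        R = np.array([[f(i, j) for j in range(m)] for i in rows])    # the chosen r rows
        cols = maxvol(R.T)                                           # good columns inside them
    return C, C[rows, :], R                                          # C, G (intersection block), R

n, m, r = 300, 400, 6
f = lambda i, j: 1.0 / (i + j + 1)                                   # a Hilbert-type test matrix
C, G, R = matrix_cross(f, n, m, r)
A = np.array([[f(i, j) for j in range(m)] for i in range(n)])        # full matrix, only for checking
print(np.linalg.norm(A - C @ np.linalg.solve(G, R)) / np.linalg.norm(A))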

TENSOR-TRAIN CROSS INTERPOLATION
Given a(i_1,i_2,i_3,i_4), consider the unfoldings and r-column sets:
A_1 = [a(i_1; i_2,i_3,i_4)],   J_1 = { i_2^(β_1) i_3^(β_1) i_4^(β_1) }
A_2 = [a(i_1,i_2; i_3,i_4)],   J_2 = { i_3^(β_2) i_4^(β_2) }
A_3 = [a(i_1,i_2,i_3; i_4)],   J_3 = { i_4^(β_3) }
Successively choose good rows:
I_1 = { i_1^(α_1) } in a(i_1; i_2,i_3,i_4):   a = Σ_{α_1} g_1(i_1; α_1) a_2(α_1; i_2,i_3,i_4)
I_2 = { i_1^(α_2) i_2^(α_2) } in a_2(α_1,i_2; i_3,i_4):   a_2 = Σ_{α_2} g_2(α_1,i_2; α_2) a_3(α_2,i_3; i_4)
I_3 = { i_1^(α_3) i_2^(α_3) i_3^(α_3) } in a_3(α_2,i_3; i_4):   a_3 = Σ_{α_3} g_3(α_2,i_3; α_3) g_4(α_3; i_4)
Finally
a = Σ_{α_1,α_2,α_3} g_1(i_1,α_1) g_2(α_1,i_2,α_2) g_3(α_2,i_3,α_3) g_4(α_3,i_4)

TT-CROSS INTERPOLATION OF A TENSOR
A tensor A of size n_1 × n_2 × ... × n_d with compression ranks r_k = rank A_k, A_k = A(i_1 i_2 ... i_k; i_{k+1} ... i_d), is recovered from the elements of the TT-cross
C_k(α_{k-1}, i_k, β_k) = A(i_1^(α_{k-1}), i_2^(α_{k-1}), ..., i_{k-1}^(α_{k-1}), i_k, j_{k+1}^(β_k), ..., j_d^(β_k)).
The TT-cross is defined by the index sets
I_k = { i_1^(α_k) ... i_k^(α_k) },  1 ≤ α_k ≤ r_k,   J_k = { j_{k+1}^(β_k) ... j_d^(β_k) },  1 ≤ β_k ≤ r_k,
with a nested property for the α-sets. Require nonsingularity of the r_k × r_k matrices
Â_k(α_k, β_k) = A(i_1^(α_k), i_2^(α_k), ..., i_k^(α_k); j_{k+1}^(β_k), ..., j_d^(β_k)),   α_k, β_k = 1,...,r_k.

FORMULA FOR TT-INTERPOLATION
A(i_1,i_2,...,i_d) = Σ_{α_1,...,α_{d-1}} Ĉ_1(α_0,i_1,α_1) Ĉ_2(α_1,i_2,α_2) ... Ĉ_d(α_{d-1},i_d,α_d)
Ĉ_k(α_{k-1},i_k,α_k) = Σ_{α_k'} C_k(α_{k-1},i_k,α_k') Â_k^{-1}(α_k',α_k),   k = 1,...,d,   Â_d = I

TENSOR-TRAIN CROSS ALGORITHM
Assume we are given r_k initial column indices j_{k+1}^(β_k),...,j_d^(β_k) in the unfolding matrices A_k. Find r_k maximal-volume rows in the submatrices of A_k of the form a(i_1^(α_{k-1}),...,i_{k-1}^(α_{k-1}), i_k; j_{k+1}^(β_k),...,j_d^(β_k)). Use the row indices obtained and do the same from right to left to find new column indices. Proceed with these sweeps from left to right and from right to left. Stop when the tensor trains stabilize.

EXAMPLE OF TT-CROSS APPROXIMATION: HILBERT TENSOR
a(i_1,i_2,...,i_d) = 1 / (i_1 + i_2 + ... + i_d),   d = 60, n = 32
r_max   Time    Iterations   Relative accuracy
2       1.37    5            1.897278e+00
3       4.22    7            5.949094e-02
4       7.19    7            2.226874e-02
5       15.42   9            2.706828e-03
6       21.82   9            1.782433e-04
7       29.62   9            2.151107e-05
8       38.12   9            4.650634e-06
9       48.97   9            5.233465e-07
10      59.14   9            6.552869e-08
11      72.14   9            7.915633e-09
12      75.27   8            2.814507e-09

COMPUTATION OF d-DIMENSIONAL INTEGRALS: EXAMPLE 1
I(d) = ∫_{[0,1]^d} sin(x_1 + x_2 + ... + x_d) dx_1 dx_2 ... dx_d = Im ∫_{[0,1]^d} e^{i(x_1+x_2+...+x_d)} dx_1 dx_2 ... dx_d = Im( ((e^i − 1)/i)^d )
Use the Chebyshev (Clenshaw-Curtis) quadrature with n = 11 nodes. All n^d values are NEVER COMPUTED! Instead, we find a TT-cross and construct a TT approximation of this tensor.
d      I                 Relative accuracy   Time
10     -6.299353e-01     1.409952e-15        0.14
100    -3.926795e-03     2.915654e-13        0.77
500    -7.287664e-10     2.370536e-12        4.64
1000   -2.637513e-19     3.482065e-11        11.60
2000   2.628834e-37      8.905594e-12        33.05
4000   9.400335e-74      2.284085e-10        105.49
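
A quick check (my own, feasible only for small d since it builds the full grid) of the closed-form reference value Im(((e^i − 1)/i)^d) against a brute-force tensor-product Gauss-Legendre quadrature.

import numpy as np

def reference(d):
    return (((np.exp(1j) - 1) / 1j) ** d).imag

def brute_force(d, n=11):
    # full tensor-product Gauss-Legendre quadrature on [0,1]^d (n^d points, small d only)
    x, w = np.polynomial.legendre.leggauss(n)
    x, w = (x + 1) / 2, w / 2                          # map nodes and weights from [-1,1] to [0,1]
    grids = np.meshgrid(*([x] * d), indexing='ij')
    vals = np.sin(sum(grids))
    weight = np.ones_like(vals)
    for axis in range(d):
        shape = [1] * d
        shape[axis] = n
        weight = weight * w.reshape(shape)
    return float(np.sum(vals * weight))

for d in (2, 3, 4):
    print(d, reference(d), brute_force(d))             # the two values agree to machine precision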

COMPUTATION OF d-DIMENSIONAL INTEGRALS: EXAMPLE 2
I(d) = ∫_{[0,1]^d} √(x_1^2 + x_2^2 + ... + x_d^2) dx_1 dx_2 ... dx_d,   d = 100
The Chebyshev quadrature with n = 41 nodes plus a TT-cross of size r_max = 32 gives a reference solution. For comparison, take n = 11 nodes:
r_max   Relative accuracy   Time
2       1.747414e-01        1.76
4       2.823821e-03        11.52
8       4.178328e-05        42.76
10      3.875489e-07        66.28
12      2.560370e-07        94.39
14      4.922604e-08        127.60
16      9.789895e-10        167.02
18      1.166096e-10        211.09
20      2.706435e-11        260.13

INCREASE DIMENSIONALITY (TENSORS INSTEAD OF MATRICES)
A matrix is a 2-way array. A d-level matrix is naturally viewed as a 2d-way array:
A(i, j) = A(i_1,i_2,...,i_d; j_1,j_2,...,j_d),   i ≡ (i_1...i_d),   j ≡ (j_1...j_d).
It is important to consider the related reshaped array
B(i_1 j_1, ..., i_d j_d) = A(i_1,i_2,...,i_d; j_1,j_2,...,j_d).
Matrix A is represented by tensor B.
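
A small sketch (my own, toy sizes) of this index interleaving in NumPy: the row and column indices are split into level digits and each pair (i_k, j_k) is merged into one long index.

import numpy as np

d, m = 3, 2                                          # d levels, each level of size m (A is m^d x m^d)
A = np.random.randn(m ** d, m ** d)

T = A.reshape([m] * d + [m] * d)                     # axes: i_1,...,i_d, j_1,...,j_d
perm = [ax for k in range(d) for ax in (k, d + k)]   # interleave to i_1, j_1, i_2, j_2, ...
B = np.transpose(T, perm).reshape([m * m] * d)       # B(i_1 j_1, ..., i_d j_d)

print(A.shape, B.shape)                              # (8, 8) -> (4, 4, 4): matrix A as a d-way tensor B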

MINIMAL TENSOR TRAINS
a(i_1...i_d; j_1...j_d) = Σ_{1 ≤ α_k ≤ r_k} g_1(i_1 j_1, α_1) g_2(α_1, i_2 j_2, α_2) ... g_{d-1}(α_{d-2}, i_{d-1} j_{d-1}, α_{d-1}) g_d(α_{d-1}, i_d j_d)
The minimal possible values of the compression ranks r_k equal the ranks of specific unfolding matrices:
r_k = rank A_k,   A_k = [A(i_1 j_1, ..., i_k j_k; i_{k+1} j_{k+1}, ..., i_d j_d)]
If all r_k = 1, then A = G_1 ⊗ ... ⊗ G_d. In general,
A = Σ_{α_1,α_2,α_3,...} G_{1,α_1} ⊗ G_{2,α_1 α_2} ⊗ G_{3,α_2 α_3} ⊗ ...

NO CURSE OF DIMENSIONALITY
Let 1 ≤ i_k, j_k ≤ n and r_k = r. Then the number of representation parameters is dn^2 r^2. The dependence on d is linear!
SO LET US MAKE d AS LARGE AS POSSIBLE BY ADDING FICTITIOUS AXES
Assume we had d_0 levels. If n = 2^{d_1}, then set d = d_0 d_1. Then
memory = 4dr^2,   d = log_2(size(A)):
LOGARITHMIC IN THE SIZE OF THE MATRIX

CAUCHY-TOEPLITZ EXAMPLE
A = [ 1 / (i − j + 1/2) ]
Relative accuracy   Compression ranks for A and A^{-1}
1.e-5               3 7 8 8 8 7 7 7 3
1.e-7               3 7 9 10 10 9 9 7 3
1.e-9               3 7 11 11 11 11 11 7 3
1.e-11              3 7 12 13 13 13 12 7 3
1.e-13              3 7 14 14 15 14 14 7 3
n = 1024, d_0 = 1, d_1 = 10

INVERSES TO BANDED TOEPLITZ MATRICES
Let A be a banded Toeplitz matrix: A = [a(i − j)] with a_k = 0 for |k| > s, where s is the half-bandwidth.
THEOREM. Let size(A) = 2^d × 2^d and det A ≠ 0. Then
r_k(A^{-1}) ≤ 4s^2 + 1,   k = 1,...,d−1,
and the estimate is sharp.
COROLLARY. The inverse of a banded Toeplitz matrix A of size 2^d × 2^d with half-bandwidth s has a TT representation with O(s^4 log_2 n) parameters.
Using a Newton iteration with approximations, we obtain an inversion algorithm of complexity O(log_2 n).

AVERAGE COMPRESSION RANK
The average compression rank r is defined by memory = 4dr^2, i.e. r = √(memory / (4d)).
INVERSION OF A d_0-DIMENSIONAL LAPLACIAN BY A MODIFIED NEWTON METHOD, d_1 = 10
Physical dimensionality d_0                          1      3      5      10     30     50
Average compression rank of A                        2.8    3.5    3.6    3.7    3.8    3.8
Average compression rank of the approx. to A^{-1}    7.3    18.6   19.2   17.4   16.1   16.5
Time (sec)                                           2.     10.    17.    23.    27.    33.
||AX − I|| / ||I||                                   1.e-2  6.e-3  2.e-3  5.e-5  4.e-5  4.e-5
The last matrix size is 2^100.

INVERSION OF A 10-DIMENSIONAL LAPLACIAN VIA THE INTEGRAL REPRESENTATION BY THE STENGER FORMULA
∫_0^∞ exp(−At) dt ≈ (h/τ) Σ_{k=−M}^{M} w_k exp(−(t_k/τ) A),   h = π/√M,   w_k = t_k = exp(hk),   λ_min(A/τ) ≥ 1
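
A quick numerical check (my own, on a small dense 1D Laplacian; the quadrature constants follow the formula above) that this exponential-sum quadrature indeed approximates A^{-1} = ∫_0^∞ exp(−At) dt without any linear solves.

import numpy as np

n = 20
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)     # 1D Laplacian, symmetric positive definite
lam, Q = np.linalg.eigh(A)                               # eigendecomposition, used for expm below
tau = lam[0]                                             # scaling so that lambda_min(A / tau) = 1

def expmA(c):
    # exp(c * A) for symmetric A via its eigendecomposition
    return (Q * np.exp(c * lam)) @ Q.T

M = 40
h = np.pi / np.sqrt(M)
approx = np.zeros_like(A)
for k in range(-M, M + 1):
    t_k = np.exp(h * k)
    approx += t_k * expmA(-t_k / tau)                    # weights w_k = t_k
approx *= h / tau

print(np.linalg.norm(approx @ A - np.eye(n)))            # small, and decreasing as M grows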

CONCLUSIONS AND PERSPECTIVES
Tensor-train decompositions and the corresponding algorithms (see http://pub.inm.ras.ru) provide excellent approximation tools for vectors and matrices. A TT-toolbox for Matlab is available: http://spring.inm.ras.ru/osel. The memory needed depends on the matrix size logarithmically; this is a terrific advantage when the compression ranks are small, which is exactly the case in many applications. Approximate inverses can be computed in the tensor-train format, generally with complexity logarithmic in the size of the matrix. Applications include huge-scale matrices (with size up to 2^100) as well as typical large-scale and even modest-scale matrices (like images). The key to efficient tensor-train operations is the recompression algorithm, with complexity O(dnr^6), and the reliability of the SVD. A modified Newton method with truncations and integral representations of matrix functions are viable in the tensor-train format.

GOOD PERSPECTIVES
Multi-variate interpolation (construction of tensor trains from a small portion of all elements; tensor cross methods using the maximal-volume concept). Fast computation of integrals in d dimensions (no Monte Carlo). Approximate matrix operations (e.g. inversion) with complexity O(log_2 n): linear in d means linear in log_2 n. A new direction in data compression and image processing (movies). Statistical interpretation of tensor trains. Applications to quantum chemistry, multi-parametric optimization, stochastic PDEs, data mining, etc.

MORE DETAILS and WORK IN PROGRESS
I. V. Oseledets and E. E. Tyrtyshnikov, Breaking the curse of dimensionality, or how to use SVD in many dimensions, Research Report 09-03, Hong Kong: ICM HKBU, 2009 (www.math.hkbu.edu.hk/icm/pdf/09-03.pdf); SIAM J. Sci. Comput., 2009.
I. Oseledets, Compact matrix form of the d-dimensional tensor decomposition, SIAM J. Sci. Comput., 2009.
I. V. Oseledets, Tensors inside matrices give logarithmic complexity, SIAM J. Matrix Anal. Appl., 2009.
I. V. Oseledets, TT-Cross Approximation for Multidimensional Arrays, Research Report 09-11, Hong Kong: ICM HKBU, 2009 (www.math.hkbu.edu.hk/icm/pdf/09-11.pdf); Linear Algebra Appl., 2009.
I. Oseledets, E. E. Tyrtyshnikov, On a recursive decomposition of multi-dimensional tensors, Doklady RAS, vol. 427, no. 2 (2009).
I. Oseledets, On a new tensor decomposition, Doklady RAS, vol. 427, no. 3 (2009).
I. Oseledets, On approximation of matrices with logarithmic number of parameters, Doklady RAS, vol. 427, no. 4 (2009).
N. Zamarashkin, I. Oseledets, E. Tyrtyshnikov, Tensor structure of the inverse to a banded Toeplitz matrix, Doklady RAS, vol. 427, no. 5 (2009).
In preparation: efficient ranks of tensors and stability of TT approximations; TTM for image processing; TT approximations in electronic structure calculations.