Lecture 1: Introduction to low-rank tensor representation/approximation. Center for Uncertainty Quantification. Alexander Litvinenko

Lecture 1: Introduction to low-rank tensor representation/approximation. Alexander Litvinenko. http://sri-uq.kaust.edu.sa/

KAUST. Figure: KAUST campus, 5 years old, approx. 7,000 people (including 1,400 children), 100 nations. 2 / 46

Stochastic Numerics Group at KAUST 3 / 46

Outline: Motivation; Examples; Post-processing in low-rank data format. Lecture 2: Higher-order tensors; Example: low-rank approximation with FFT techniques; Definitions of different tensor formats; Applications; Computation of the characteristic function. 4 / 46

Aims of low-rank/sparse data approximation: 1. Represent/approximate multidimensional operators and functions in a data-sparse format. 2. Perform linear algebra algorithms in tensor format (truncation is an issue). 3. Extract the needed information from the data-sparse solution (e.g., mean, variance, quantiles). 5 / 46

Used literature and slides: the book of W. Hackbusch (2012); the dissertations of I. Oseledets and M. Espig; articles of Tyrtyshnikov et al., De Lathauwer et al., L. Grasedyck, B. Khoromskij, M. Espig; lecture courses and presentations of Boris and Venera Khoromskij; software by T. Kolda et al., M. Espig et al., D. Kressner and C. Tobler, I. Oseledets et al. 6 / 46

Challenges of numerical computations: modelling of multi-particle interactions in large molecular systems such as proteins and biomolecules; modelling of large atomic (metallic) clusters; stochastic and parametric equations; machine learning, data mining and information technologies; multidimensional dynamical systems; data compression; financial mathematics; analysis of multi-dimensional data. 7 / 46

An example of an SPDE: $-\mathrm{div}(\kappa(x,\omega)\nabla u(x,\omega)) = f(x,\omega)$, $x\in G\subset\mathbb{R}^d$, $\omega\in\Omega$, and $U = L^2(G)$. The further decomposition $L^2(\Omega) = L^2\big(\times_j \Omega_j\big) = \bigotimes_j L^2(\Omega_j) = \bigotimes_j L^2(\mathbb{R},\Gamma_j)$ results in $u(x,\omega)\in U\otimes S = L^2(G)\otimes\bigotimes_j L^2(\mathbb{R},\Gamma_j)$. 8 / 46

Example 1: KLE and PCE in stochastic PDEs. Write first the Karhunen-Loeve Expansion and then, for the uncorrelated random variables, the Polynomial Chaos Expansion:
$u(x,\omega) = \sum_{i=1}^K \sqrt{\lambda_i}\,\varphi_i(x)\,\xi_i(\omega) = \sum_{i=1}^K \sqrt{\lambda_i}\,\varphi_i(x)\sum_{\alpha\in J}\xi_i^{(\alpha)} H_\alpha(\theta(\omega)),$ (1)
where $\xi_i^{(\alpha)}$ is a tensor because $\alpha = (\alpha_1,\alpha_2,\ldots,\alpha_M,\ldots)$ is a multi-index. 9 / 46

Tensor of order 2. Let $M := U\Sigma V^T \approx \tilde U\tilde\Sigma\tilde V^T = M_k$ (truncated singular value decomposition). Denote $A := \tilde U\tilde\Sigma$ and $B := \tilde V$; then $M_k = AB^T$. Storage of $A$ and $B^T$ is $k(n+m)$ in contrast to $nm$ for $M$. Figure: sketch of the decomposition $M = U\Sigma V^T$. 10 / 46
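
A minimal numpy sketch (not from the slides) of this rank-k factorization; the sizes n, m and the rank k below are illustrative choices.

    import numpy as np

    n, m, k = 200, 150, 10
    M = np.random.rand(n, m)                       # some matrix to compress

    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    A = U[:, :k] * s[:k]                           # A = U~ Sigma~  (n x k)
    B = Vt[:k, :].T                                # B = V~         (m x k)
    M_k = A @ B.T                                  # rank-k approximation

    print(k * (n + m), n * m)                      # storage of (A, B) versus storage of M
    print(np.linalg.norm(M - M_k, 2), s[k])        # spectral error equals sigma_{k+1}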

Example: arithmetic operations. Let $v\in\mathbb{R}^m$. Suppose $M_k = AB^T\in\mathbb{R}^{n\times m}$, $A\in\mathbb{R}^{n\times k}$, $B\in\mathbb{R}^{m\times k}$ is given. Property 1: $M_k v = AB^T v = A(B^T v)$. Cost $O(km + kn)$. Suppose $M = CD^T$, $C\in\mathbb{R}^{n\times k}$ and $D\in\mathbb{R}^{m\times k}$. Property 2: $M_k + M = A_{new}B_{new}^T$, $A_{new} := [A\ C]\in\mathbb{R}^{n\times 2k}$ and $B_{new} := [B\ D]\in\mathbb{R}^{m\times 2k}$. The cost of rank truncation from $2k$ back to $k$ is $O((n+m)k^2 + k^3)$. 11 / 46
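
A sketch of Properties 1 and 2 in numpy, together with a QR-plus-small-SVD recompression from rank 2k back to rank k at the stated O((n+m)k^2 + k^3) cost; the function names are mine.

    import numpy as np

    def matvec(A, B, v):
        """Property 1: M_k v = A (B^T v), cost O(km + kn)."""
        return A @ (B.T @ v)

    def add_and_truncate(A, B, C, D, k):
        """Property 2 plus recompression: [A C][B D]^T is truncated back to rank k
        via QR of the stacked factors and an SVD of a small 2k x 2k matrix."""
        A2, B2 = np.hstack([A, C]), np.hstack([B, D])      # rank-2k factors
        QA, RA = np.linalg.qr(A2)                          # O(n k^2)
        QB, RB = np.linalg.qr(B2)                          # O(m k^2)
        U, s, Vt = np.linalg.svd(RA @ RB.T)                # small SVD, O(k^3)
        return QA @ (U[:, :k] * s[:k]), QB @ Vt[:k, :].T   # new factors of rank k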

Example: difference between matrix rank and tensor rank. In both cases below the matrix rank is full. The Gaussian kernel $\exp(-|x-y|^2)$ has Kronecker rank 1. The exponential kernel $e^{-|x-y|}$ can be approximated by a tensor with a low Kronecker rank $r$:
r: 1, 2, 3, 4, 5, 6, 10
$\|C - C_r\|/\|C\|$: 11.5, 1.7, 0.4, 0.14, 0.035, 0.007, 2.8e-8
$\|C - C_r\|_2/\|C\|_2$: 6.7, 0.52, 0.1, 0.03, 0.008, 0.001, 5.3e-9
12 / 46

Example: compute mean and variance in a low-rank format. Let $W := [v_1, v_2, \ldots, v_m]$, where the $v_i$ are snapshots. Given the truncated SVD $W_k = AB^T \approx W\in\mathbb{R}^{n\times m}$, $A := U_k S_k$, $B := V_k$, then
$\bar v = \frac1m\sum_{i=1}^m v_i = \frac1m\sum_{i=1}^m A b_i = A\bar b,$ (2)
$C = \frac{1}{m-1} W_c W_c^T = \frac{1}{m-1} AB^T BA^T = \frac{1}{m-1} AA^T.$ (3)
The diagonal of $C$ can be computed with complexity $O(k^2(m+n))$. 13 / 46
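
A small numpy illustration of this, assuming the snapshots are the columns of W and that A, B come from a truncated SVD as above; the O(k^2(m+n)) diagonal is obtained through the k x k Gram matrix of the centered coefficients.

    import numpy as np

    n, m, k = 500, 100, 15
    W = np.random.rand(n, k) @ np.random.rand(k, m)          # synthetic rank-k snapshot matrix

    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A, B = U[:, :k] * s[:k], Vt[:k, :].T                     # W ~ A B^T, A = U_k S_k, B = V_k

    v_bar = A @ B.mean(axis=0)                               # (2): mean = A * (mean of rows of B)
    Bc = B - B.mean(axis=0)                                  # centering done in the k coefficients
    G = Bc.T @ Bc                                            # k x k Gram matrix, O(m k^2)
    var_diag = np.einsum('ik,kl,il->i', A, G, A) / (m - 1)   # diagonal of C, O(n k^2)

    print(np.allclose(v_bar, W.mean(axis=1)))
    print(np.allclose(var_diag, W.var(axis=1, ddof=1)))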

Accuracy of the mean and covariance. Lemma: Let $\|W - W_k\| \le \varepsilon$ and let $\bar v_k$ be the rank-$k$ approximation of the mean $\bar v$. Then a) $\|\bar v - \bar v_k\| \le \frac{1}{\sqrt{n}}\,\varepsilon$, b) $\|C - C_k\| \le \frac{1}{m-1}\,\varepsilon^2$. 14 / 46

Lecture 2: Higher order tensors 15 / 46

Definition of a tensor of order d. A tensor of order $d$ is a multidimensional array over a $d$-tuple index set $I = I_1\times\cdots\times I_d$: $A = [a_{i_1\ldots i_d} : i_l\in I_l]\in\mathbb{R}^I$, $I_l = \{1,\ldots,n_l\}$, $l = 1,\ldots,d$. $A$ is an element of the linear space $V_n = \bigotimes_{l=1}^d V_l$, $V_l = \mathbb{R}^{I_l}$, equipped with the Euclidean scalar product $\langle\cdot,\cdot\rangle : V_n\times V_n\to\mathbb{R}$, defined as $\langle A, B\rangle := \sum_{(i_1\ldots i_d)\in I} a_{i_1\ldots i_d}\, b_{i_1\ldots i_d}$ for $A, B\in V_n$. 16 / 46

Rank-k representation. The tensor product of vectors $u^{(l)} = \{u^{(l)}_{i_l}\}_{i_l=1}^{n_l}\in\mathbb{R}^{I_l}$ forms the canonical rank-1 tensor $A^{(1)} \equiv [u_i]_{i\in I} = u^{(1)}\otimes\cdots\otimes u^{(d)}$ with entries $u_i = u^{(1)}_{i_1}\cdots u^{(d)}_{i_d}$. 17 / 46

Tensor formats: CP, Tucker, TT.
CP: $A(i_1,i_2,i_3) \approx \sum_{\alpha=1}^{r} u_1(i_1,\alpha)\,u_2(i_2,\alpha)\,u_3(i_3,\alpha)$
Tucker: $A(i_1,i_2,i_3) \approx \sum_{\alpha_1,\alpha_2,\alpha_3} c(\alpha_1,\alpha_2,\alpha_3)\,u_1(i_1,\alpha_1)\,u_2(i_2,\alpha_2)\,u_3(i_3,\alpha_3)$
TT: $A(i_1,\ldots,i_d) \approx \sum_{\alpha_1,\ldots,\alpha_{d-1}} G_1(i_1,\alpha_1)\,G_2(\alpha_1,i_2,\alpha_2)\cdots G_d(\alpha_{d-1},i_d)$
Discrete: $G_k(i_k)$ is an $r_{k-1}\times r_k$ matrix, $r_0 = r_d = 1$. 18 / 46
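
A toy numpy sketch of the TT format (random cores, illustrative sizes): one entry A(i_1,...,i_d) is the product of the slice matrices G_k(i_k), checked here against the assembled full tensor.

    import numpy as np

    d, n, r = 4, 5, 3
    ranks = [1, r, r, r, 1]                                   # r_0 = r_d = 1
    cores = [np.random.rand(ranks[k], n, ranks[k + 1]) for k in range(d)]

    def tt_entry(cores, idx):
        """A(i_1,...,i_d) = G_1(i_1) G_2(i_2) ... G_d(i_d); each G_k(i_k) is r_{k-1} x r_k."""
        M = cores[0][:, idx[0], :]
        for k in range(1, len(cores)):
            M = M @ cores[k][:, idx[k], :]
        return M[0, 0]

    # assemble the full tensor for comparison (only feasible for tiny examples)
    full = cores[0]
    for k in range(1, d):
        full = np.tensordot(full, cores[k], axes=([-1], [0]))
    full = full.reshape(d * (n,))                             # drop the boundary ranks r_0 = r_d = 1

    idx = (1, 4, 0, 2)
    print(np.isclose(tt_entry(cores, idx), full[idx]))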

Canonical and Tucker tensor formats (Taken from Boris Khoromskij) 19 / 46

Tensor and Matrices. A tensor $(A_i)_{i\in I}\in\mathbb{R}^I$, where $I = I_1\times I_2\times\cdots\times I_d$, $\#I_\mu = n_\mu$, $\#I = \prod_{\mu=1}^d n_\mu$. Rank-1 tensor: $A = u_1\otimes u_2\otimes\cdots\otimes u_d =: \bigotimes_{\mu=1}^d u_\mu$, $A_{i_1,\ldots,i_d} = (u_1)_{i_1}\cdots(u_d)_{i_d}$. Rank-$k$ tensor: $A = \sum_{i=1}^k u_i\otimes v_i$; matrix: $A = \sum_{i=1}^k u_i v_i^T$. The Kronecker product $A\otimes B\in\mathbb{R}^{nm\times nm}$ is a block matrix whose $ij$-th block is $[A_{ij}B]$. 20 / 46

Examples (B. Khoromskij's lecture). Rank 1: $f = \exp(f_1(x_1)+\cdots+f_d(x_d)) = \prod_{j=1}^d \exp(f_j(x_j))$. Rank 2: $f = \sin\big(\sum_{j=1}^d x_j\big)$, since $2\mathrm{i}\,\sin\big(\sum_{j=1}^d x_j\big) = e^{\mathrm{i}\sum_{j=1}^d x_j} - e^{-\mathrm{i}\sum_{j=1}^d x_j}$. The rank-$d$ function $f(x_1,\ldots,x_d) = x_1 + x_2 + \cdots + x_d$ can be approximated by a rank-2 tensor with any prescribed accuracy: $f \approx \frac{\prod_{j=1}^d(1+\varepsilon x_j)}{\varepsilon} - \frac{1}{\varepsilon} + O(\varepsilon)$ as $\varepsilon\to 0$. 21 / 46

Examples where you can apply low-rank tensor approximation. Classical Green kernels, $x, y\in\mathbb{R}^d$: the Helmholtz kernel $f(x) = \frac{e^{\mathrm{i}\kappa|x-y|}}{|x-y|}$ (4); the Yukawa potential $\int_{[-a,a]^3}\frac{e^{-\mu|x-y|}}{|x-y|}\,u(y)\,dy$ (5); $\log(|x-y|)$, $\frac{1}{|x-y|}$ (6). Other multivariate functions: (a) $\frac{1}{\sqrt{x_1^2+\cdots+x_d^2}}$, (b) $\frac{1}{x_1^2+\cdots+x_d^2}$, (c) $\frac{e^{-\lambda x}}{x}$. (7) For (a) use ($\rho = x_1^2+\cdots+x_d^2\in[1,R]$, $R>1$) the integral representation $\frac{1}{\rho} = \int_{\mathbb{R}_+} e^{-\rho t}\,dt.$ (8) 22 / 46
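
One way (my illustration, not from the slides) to turn (8) into a separable approximation is an exponential-sum quadrature: after the substitution t = e^s the trapezoidal rule gives 1/rho ~ sum_k w_k e^{-t_k rho}, and with rho = x_1^2 + ... + x_d^2 every term factorizes over the coordinates. The step size and truncation below are ad hoc choices.

    import numpy as np

    # 1/rho = int_0^inf exp(-rho*t) dt ; substitute t = exp(s), then the trapezoidal
    # (sinc) rule on the real line gives an exponential sum 1/rho ~ sum_k w_k exp(-t_k*rho)
    h, M = 0.5, 40                                   # step size and truncation (ad hoc)
    s = h * np.arange(-M, M + 1)
    t, w = np.exp(s), h * np.exp(s)                  # nodes t_k and weights w_k

    rho = np.linspace(1.0, 100.0, 1000)              # rho in [1, R]
    approx = (w * np.exp(-np.outer(rho, t))).sum(axis=1)
    print(np.max(np.abs(approx - 1.0 / rho)))        # uniform error on [1, R]

    # with rho = x_1^2 + ... + x_d^2, each term exp(-t_k*rho) = prod_j exp(-t_k*x_j^2)
    # is a rank-1 tensor, so the sum is a separable approximation with 2M+1 terms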

Examples (B. Khoromskij's lecture). Canonical rank $d$, TT rank 2:
$f(x_1,\ldots,x_d) = w_1(x_1) + w_2(x_2) + \cdots + w_d(x_d)$
$= (w_1(x_1),\ 1)\begin{pmatrix}1 & 0\\ w_2(x_2) & 1\end{pmatrix}\cdots\begin{pmatrix}1 & 0\\ w_{d-1}(x_{d-1}) & 1\end{pmatrix}\begin{pmatrix}1\\ w_d(x_d)\end{pmatrix}$ 23 / 46

Example: rank(f) = 2.
$f = \sin(x_1 + x_2 + \cdots + x_d)$
$= (\sin x_1,\ \cos x_1)\begin{pmatrix}\cos x_2 & -\sin x_2\\ \sin x_2 & \cos x_2\end{pmatrix}\cdots\begin{pmatrix}\cos x_{d-1} & -\sin x_{d-1}\\ \sin x_{d-1} & \cos x_{d-1}\end{pmatrix}\begin{pmatrix}\cos x_d\\ \sin x_d\end{pmatrix}$ 24 / 46
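
A quick numerical check of this rank-2 matrix-product representation (the 2x2 blocks are the rotation-type matrices written above; the sample points are random):

    import numpy as np

    def sin_sum_tt(x):
        """Evaluate sin(x_1 + ... + x_d) through the rank-2 matrix-product representation."""
        v = np.array([np.sin(x[0]), np.cos(x[0])])            # row vector (sin x1, cos x1)
        for xk in x[1:-1]:
            v = v @ np.array([[np.cos(xk), -np.sin(xk)],
                              [np.sin(xk),  np.cos(xk)]])
        return v @ np.array([np.cos(x[-1]), np.sin(x[-1])])   # last column (cos xd, sin xd)^T

    x = np.random.rand(6)
    print(np.isclose(sin_sum_tt(x), np.sin(x.sum())))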

Tensor properties of the FFT. Let $u = \sum_{j=1}^{k_u}\bigotimes_{i=1}^d u_{ji}$. Then the $d$-dimensional Fourier transform $\tilde u = \mathcal{F}^{[d]}(u)$ is given by
$\tilde u = \mathcal{F}^{[d]}(u) = \Big(\bigotimes_{i=1}^d \mathcal{F}_i\Big)\sum_{j=1}^{k_u}\bigotimes_{i=1}^d u_{ji} = \sum_{j=1}^{k_u}\bigotimes_{i=1}^d \mathcal{F}_i(u_{ji}).$ (9) 25 / 46

FFT, Hadamard and Kronecker products. Lemma: If $u = \sum_{j=1}^{k_u}\bigotimes_{i=1}^d u_{ji}$ and $q = \sum_{l=1}^{k_q}\bigotimes_{i=1}^d q_{li}$, then
$u\odot q = \sum_{j=1}^{k_u}\sum_{l=1}^{k_q}\bigotimes_{i=1}^d (u_{ji}\odot q_{li})$ and $\mathcal{F}^{[d]}(u\odot q) = \sum_{j=1}^{k_u}\sum_{l=1}^{k_q}\bigotimes_{i=1}^d \mathcal{F}_i(u_{ji}\odot q_{li}).$ 26 / 46
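
A 2D numpy check of the lemma for two rank-1 terms: the 2D FFT of the Hadamard product equals the outer product of 1D FFTs of the factor-wise Hadamard products.

    import numpy as np

    n, m = 64, 48
    a, c = np.random.rand(n), np.random.rand(n)
    b, d = np.random.rand(m), np.random.rand(m)

    S, T = np.outer(a, b), np.outer(c, d)                 # rank-1 matrices a b^T and c d^T
    lhs = np.fft.fft2(S * T)                              # full 2D FFT of the Hadamard product
    rhs = np.outer(np.fft.fft(a * c), np.fft.fft(b * d))  # factor-wise 1D FFTs, still rank 1
    print(np.allclose(lhs, rhs))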

Numerics: an example from geology. Domain: 20 m x 20 m x 20 m, 25,000 x 25,000 x 25,000 dofs. 4,000 measurements are randomly distributed within the volume, with increasing data density towards the lower left back corner of the domain. The covariance model is anisotropic Gaussian with unit variance, with 32 correlation lengths fitting into the domain in the horizontal directions and 64 correlation lengths fitting into the vertical direction. 27 / 46

Example 6: Kriging. The top left figure shows the entire domain at a sampling rate of 1:64 per direction, followed by a series of zooms into the lower left back corner with zoom factors (sampling rates) of 4 (1:16), 16 (1:4) and 64 (1:1) for the top right, bottom left and bottom right plots, respectively. Color scale: the 95% confidence interval $[\mu - 2\sigma,\ \mu + 2\sigma]$. 28 / 46

Definitions of different tensor formats MATHEMATICAL DEFINITIONS, PROPERTIES AND APPLICATIONS 29 / 46

Definitions of CP. Let $\mathcal{T} := \bigotimes_{\mu=1}^d \mathbb{R}^{n_\mu}$ be the tensor space constructed from the vector spaces $(\mathbb{R}^{n_\mu}, \langle\cdot,\cdot\rangle_{\mathbb{R}^{n_\mu}})$, $d\ge 3$. A tensor representation $U$ is a multilinear map $U: P\to\mathcal{T}$, where the parametric space is $P = \times_{\nu=1}^D P_\nu$ ($d\le D$). Further, $P_\nu$ depends on some representation rank parameter $r_\nu\in\mathbb{N}$. A standard example of a tensor representation is the canonical tensor format. Note: we distinguish between a tensor $v\in\mathcal{T}$ and its tensor format representation $p\in P$, where $v = U(p)$. 30 / 46

r-terms, tensor rank, canonical tensor format. The set $\mathcal{R}_r$ of tensors which can be represented in $\mathcal{T}$ with $r$ terms is defined as
$\mathcal{R}_r(\mathcal{T}) := \mathcal{R}_r := \Big\{\sum_{i=1}^r\bigotimes_{\mu=1}^d v_{i\mu}\in\mathcal{T} : v_{i\mu}\in\mathbb{R}^{n_\mu}\Big\}.$ (10)
Let $v\in\mathcal{T}$. The tensor rank of $v$ in $\mathcal{T}$ is
$\mathrm{rank}(v) := \min\{r\in\mathbb{N}_0 : v\in\mathcal{R}_r\}.$ (11)
Example: the Laplace operator in 3D: $\Delta_3 = \Delta_1\otimes I\otimes I + I\otimes\Delta_1\otimes I + I\otimes I\otimes\Delta_1$. 31 / 46

Definitions of CP. The canonical tensor format is defined by the mapping
$U_{cp} : \times_{\mu=1}^d\big(\mathbb{R}^{n_\mu}\big)^r \to \mathcal{R}_r,$ (12)
$\hat v := (v_{i\mu} : 1\le i\le r,\ 1\le\mu\le d) \mapsto U_{cp}(\hat v) := \sum_{i=1}^r\bigotimes_{\mu=1}^d v_{i\mu}.$ 32 / 46

Properties of CP. Lemma: Let $r_1, r_2\in\mathbb{N}$, $u\in\mathcal{R}_{r_1}$ and $v\in\mathcal{R}_{r_2}$. We have
(i) $\langle u, v\rangle_{\mathcal{T}} = \sum_{j_1=1}^{r_1}\sum_{j_2=1}^{r_2}\prod_{\mu=1}^d\langle u_{j_1\mu}, v_{j_2\mu}\rangle_{\mathbb{R}^{n_\mu}}$. The computational cost of $\langle u,v\rangle_{\mathcal{T}}$ is $O\big(r_1 r_2\sum_{\mu=1}^d n_\mu\big)$.
(ii) $u + v\in\mathcal{R}_{r_1+r_2}$.
(iii) $u\odot v\in\mathcal{R}_{r_1 r_2}$, where $\odot$ denotes the pointwise (Hadamard) product. Further, $u\odot v$ can be computed in the canonical tensor format with $r_1 r_2\sum_{\mu=1}^d n_\mu$ arithmetic operations.
Let $R_1 = A_1B_1^T$, $R_2 = A_2B_2^T$ be rank-$k$ matrices; then $R_1 + R_2 = [A_1\ A_2][B_1\ B_2]^T$ is a rank-$2k$ matrix. Rank truncation is required! 33 / 46
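
A small numpy sketch of property (i): the inner product of two CP tensors from the mode-wise Gram matrices, compared against the assembled full tensors (sizes are toy choices).

    import numpy as np

    d, n, r1, r2 = 3, 6, 4, 5
    U = [np.random.rand(n, r1) for _ in range(d)]    # factors of u: one n x r1 matrix per mode
    V = [np.random.rand(n, r2) for _ in range(d)]    # factors of v

    def cp_full(F):
        """Assemble sum_i f_{i,1} x ... x f_{i,d} as a full array (tiny examples only)."""
        out = 0.0
        for i in range(F[0].shape[1]):
            t = F[0][:, i]
            for mu in range(1, len(F)):
                t = np.multiply.outer(t, F[mu][:, i])
            out = out + t
        return out

    # property (i): <u,v> = sum_{j1,j2} prod_mu <u_{j1,mu}, v_{j2,mu}>, cost O(r1*r2*sum_mu n_mu)
    gram = np.ones((r1, r2))
    for mu in range(d):
        gram *= U[mu].T @ V[mu]
    print(np.isclose(gram.sum(), np.sum(cp_full(U) * cp_full(V))))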

Properties of CP: Hadamard product and Fourier transform. Let $u = \sum_{j=1}^k\bigotimes_{i=1}^d u_{ji}$, $u_{ji}\in\mathbb{R}^n$. Then
$\mathcal{F}^{[d]}(u) = \sum_{j=1}^k\bigotimes_{i=1}^d\mathcal{F}_i(u_{ji}),\quad\text{where } \mathcal{F}^{[d]} = \bigotimes_{i=1}^d\mathcal{F}_i.$ (13)
Let $S = AB^T = \sum_{i=0}^{k_1} a_i b_i^T\in\mathbb{R}^{n\times m}$ and $T = CD^T = \sum_{j=0}^{k_2} c_j d_j^T\in\mathbb{R}^{n\times m}$, where $a_i, c_j\in\mathbb{R}^n$, $b_i, d_j\in\mathbb{R}^m$, $k_1, k_2, n, m > 0$. Then
$\mathcal{F}^{(2)}(S\odot T) = \sum_{i=0}^{k_1}\sum_{j=0}^{k_2}\mathcal{F}(a_i\odot c_j)\,\mathcal{F}(b_i\odot d_j)^T.$ 34 / 46

Tucker tensor format.
$A = \sum_{i_1=1}^{k_1}\cdots\sum_{i_d=1}^{k_d} c_{i_1,\ldots,i_d}\; u^{1}_{i_1}\otimes\cdots\otimes u^{d}_{i_d}$ (14)
Core tensor $c\in\mathbb{R}^{k_1\times\cdots\times k_d}$, rank $(k_1,\ldots,k_d)$. Nonlinear fixed-rank approximation problem:
$X^{*} = \operatorname{argmin}_{X,\ \mathrm{rank}(k_1,\ldots,k_d)}\ \|A - X\|$ (15)
The problem is well posed but not solved: there are many local minima. HOSVD (De Lathauwer et al.) yields a rank-$(k_1,\ldots,k_d)$ tensor $Y$ with $\|A - Y\| \le \sqrt{d}\,\|A - X^{*}\|$: reliable arithmetic, but exponential scaling of the core ($c\in\mathbb{R}^{k_1\times k_2\times\cdots\times k_d}$). 35 / 46
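
A minimal HOSVD sketch in numpy (my implementation, with hand-chosen ranks): factor matrices from SVDs of the mode unfoldings, core tensor by projecting A onto them.

    import numpy as np

    def hosvd(A, ranks):
        """Truncated HOSVD: factor matrices from SVDs of the mode unfoldings,
        core c = A x_1 U_1^T x_2 U_2^T ... x_d U_d^T."""
        Us = []
        for mu in range(A.ndim):
            A_mu = np.moveaxis(A, mu, 0).reshape(A.shape[mu], -1)   # mode-mu unfolding
            U, _, _ = np.linalg.svd(A_mu, full_matrices=False)
            Us.append(U[:, :ranks[mu]])
        core = A
        for mu, U in enumerate(Us):                                 # project mode mu onto U_mu
            core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, mu, 0), axes=1), 0, mu)
        return core, Us

    def tucker_full(core, Us):
        A = core
        for mu, U in enumerate(Us):                                 # expand back: A = c x_mu U_mu
            A = np.moveaxis(np.tensordot(U, np.moveaxis(A, mu, 0), axes=1), 0, mu)
        return A

    A = np.random.rand(10, 12, 14)
    core, Us = hosvd(A, (5, 5, 5))
    print(core.shape, np.linalg.norm(A - tucker_full(core, Us)) / np.linalg.norm(A))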

Example: canonical rank d, whereas TT rank is 2. Consider the d-dimensional Laplacian over a uniform tensor grid. It is known to have the Kronecker rank-$d$ representation
$\Delta_d = A\otimes I_N\otimes\cdots\otimes I_N + I_N\otimes A\otimes\cdots\otimes I_N + \cdots + I_N\otimes I_N\otimes\cdots\otimes A \in\mathbb{R}^{N^d\times N^d}$ (16)
with $A = \Delta_1 = \mathrm{tridiag}\{-1, 2, -1\}\in\mathbb{R}^{N\times N}$ and $I_N$ the $N\times N$ identity. Notice that for the canonical rank we have $\mathrm{rank}_C(\Delta_d) = d$, while the TT rank of $\Delta_d$ is equal to 2 for any dimension, due to the explicit representation
$\Delta_d = (\Delta_1\ \ I)\bowtie\begin{pmatrix}I & 0\\ \Delta_1 & I\end{pmatrix}\bowtie\cdots\bowtie\begin{pmatrix}I & 0\\ \Delta_1 & I\end{pmatrix}\bowtie\begin{pmatrix}I\\ \Delta_1\end{pmatrix}$ (17)
where the rank product operation $\bowtie$ is defined as a regular matrix product of the two corresponding core matrices, their blocks being multiplied by means of the tensor product. A similar bound is true for the Tucker rank: $\mathrm{rank}_{Tuck}(\Delta_d) = 2$. 36 / 46
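
A small numpy sketch assembling (16) as a sum of Kronecker products for d = 3 (N is kept tiny, since the full matrix has N^d rows):

    import numpy as np
    from functools import reduce

    N, d = 4, 3
    A = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)   # Delta_1 = tridiag{-1, 2, -1}
    I = np.eye(N)

    # Delta_d = sum_k  I x ... x A (position k) x ... x I   -- canonical (Kronecker) rank d
    Delta_d = sum(reduce(np.kron, [A if j == k else I for j in range(d)]) for k in range(d))
    print(Delta_d.shape)                                    # (N**d, N**d)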

Advantages and disadvantages. Denote by $k$ the rank, $d$ the dimension, and $n$ the number of dofs in 1D:
1. CP: ill-posed approximation problem, storage $O(dnk)$, approximations hard to compute.
2. Tucker: reliable arithmetic based on the SVD, $O(dnk + k^d)$.
3. Hierarchical Tucker: based on the SVD, storage $O(dnk + dk^3)$, truncation $O(dnk^2 + dk^4)$.
4. TT: based on the SVD, $O(dnk^2)$ or $O(dnk^3)$, stable, linear in $d$, polynomial in the rank.
5. Quantics-TT: $O(n^d) \to O(d\log_q n)$. 37 / 46

POSTPROCESSING: postprocessing in the canonical tensor format. Plan: we want to compute frequencies and level sets (needed for the probability density function). An intermediate step: compute the sign function and then the characteristic function $\chi_I(u)$. 38 / 46

Characteristic and sign functions. Definition: the characteristic $\chi_I(u)\in\mathcal{T}$ of $u\in\mathcal{T}$ in $I\subset\mathbb{R}$ is, for every multi-index $i\in I$, pointwise defined as
$(\chi_I(u))_i := \begin{cases}1, & u_i\in I;\\ 0, & u_i\notin I.\end{cases}$ (18)
Furthermore, $\mathrm{sign}(u)\in\mathcal{T}$ is for all $i\in I$ pointwise defined by
$(\mathrm{sign}(u))_i := \begin{cases}1, & u_i > 0;\\ -1, & u_i < 0;\\ 0, & u_i = 0.\end{cases}$ (19) 39 / 46

How to compute the characteristic function? Lemma: Let $u\in\mathcal{T}$, $a, b\in\mathbb{R}$, and $\mathbf{1} = \bigotimes_{\mu=1}^d\mathbf{1}_\mu$, where $\mathbf{1}_\mu := (1,\ldots,1)^T\in\mathbb{R}^{n_\mu}$.
(i) If $I = \mathbb{R}_{<b}$, then $\chi_I(u) = \frac12\big(\mathbf{1} + \mathrm{sign}(b\mathbf{1} - u)\big)$.
(ii) If $I = \mathbb{R}_{>a}$, then $\chi_I(u) = \frac12\big(\mathbf{1} - \mathrm{sign}(a\mathbf{1} - u)\big)$.
(iii) If $I = (a, b)$, then $\chi_I(u) = \frac12\big(\mathrm{sign}(b\mathbf{1} - u) - \mathrm{sign}(a\mathbf{1} - u)\big)$. 40 / 46
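
A pointwise numpy check of formula (iii); note that in actual tensor computations sign(u) is evaluated by iterative schemes directly in the low-rank format, which this sketch does not attempt.

    import numpy as np

    u = np.random.randn(1000)                          # entries of a (here: unstructured) tensor u
    a, b = -0.5, 0.7

    chi = 0.5 * (np.sign(b - u) - np.sign(a - u))      # formula (iii) for I = (a, b)
    print(np.array_equal(chi, ((u > a) & (u < b)).astype(float)))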

Definition of level set and frequency. Definition: Let $I\subset\mathbb{R}$ and $u\in\mathcal{T}$. The level set $L_I(u)\in\mathcal{T}$ of $u$ with respect to $I$ is pointwise defined by
$(L_I(u))_i := \begin{cases}u_i, & u_i\in I;\\ 0, & u_i\notin I,\end{cases}$ (20)
for all $i$. The frequency $F_I(u)\in\mathbb{N}$ of $u$ with respect to $I$ is defined as
$F_I(u) := \#\,\mathrm{supp}\,\chi_I(u).$ (21) 41 / 46

Computation of level sets and frequency. Proposition: Let $I\subset\mathbb{R}$, $u\in\mathcal{T}$, and $\chi_I(u)$ its characteristic. We have $L_I(u) = \chi_I(u)\odot u$ and $\mathrm{rank}(L_I(u)) \le \mathrm{rank}(\chi_I(u))\,\mathrm{rank}(u)$. The frequency $F_I(u)\in\mathbb{N}$ of $u$ with respect to $I$ is $F_I(u) = \langle\chi_I(u), \mathbf{1}\rangle$, where $\mathbf{1} = \bigotimes_{\mu=1}^d\mathbf{1}_\mu$, $\mathbf{1}_\mu := (1,\ldots,1)^T\in\mathbb{R}^{n_\mu}$. 42 / 46

How to compute the mean value in the CP format. Let $u = \sum_{j=1}^r\bigotimes_{\mu=1}^d u_{j\mu}\in\mathcal{R}_r$; then the mean value $\overline{u}$ can be computed as a scalar product:
$\overline{u} = \Big\langle u,\ \bigotimes_{\mu=1}^d\frac{\mathbf{1}_\mu}{n_\mu}\Big\rangle = \sum_{j=1}^r\prod_{\mu=1}^d\frac{\langle u_{j\mu}, \mathbf{1}_\mu\rangle}{n_\mu} = \sum_{j=1}^r\prod_{\mu=1}^d\frac{1}{n_\mu}\Big(\sum_{k=1}^{n_\mu}(u_{j\mu})_k\Big),$ (22), (23)
where $\mathbf{1}_\mu := (1,\ldots,1)^T\in\mathbb{R}^{n_\mu}$. The numerical cost is $O\big(r\sum_{\mu=1}^d n_\mu\big)$. 43 / 46

How to compute the variance in the CP format. Let $u\in\mathcal{R}_r$ and
$\tilde u := u - \overline{u}\bigotimes_{\mu=1}^d\mathbf{1}_\mu = \sum_{j=1}^{r+1}\bigotimes_{\mu=1}^d\tilde u_{j\mu}\in\mathcal{R}_{r+1};$ (24)
then the variance $\mathrm{var}(u)$ of $u$ can be computed as follows:
$\mathrm{var}(u) = \frac{\langle\tilde u, \tilde u\rangle}{\prod_{\mu=1}^d n_\mu} = \frac{1}{\prod_{\mu=1}^d n_\mu}\Big\langle\sum_{i=1}^{r+1}\bigotimes_{\mu=1}^d\tilde u_{i\mu},\ \sum_{j=1}^{r+1}\bigotimes_{\nu=1}^d\tilde u_{j\nu}\Big\rangle = \sum_{i=1}^{r+1}\sum_{j=1}^{r+1}\prod_{\mu=1}^d\frac{1}{n_\mu}\big\langle\tilde u_{i\mu}, \tilde u_{j\mu}\big\rangle.$
The numerical cost is $O\big((r+1)^2\sum_{\mu=1}^d n_\mu\big)$. 44 / 46
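
A numpy sketch of (22)-(24) for a small random CP tensor (d = 3), checked against the assembled full tensor; the centering term is appended as one extra rank-1 term as in (24).

    import numpy as np

    d, n, r = 3, 8, 4
    U = [np.random.rand(n, r) for _ in range(d)]              # CP factors u_{j,mu} as columns

    # mean, (22)-(23): sum over the r terms of the product of the factor means
    u_mean = np.prod([F.mean(axis=0) for F in U], axis=0).sum()

    # variance, (24): append the rank-1 term -u_mean * (1 x ... x 1), then <u~,u~> / prod(n_mu)
    Ut = [np.hstack([F, np.ones((n, 1))]) for F in U]
    Ut[0][:, -1] *= -u_mean                                   # carry the coefficient in mode 1
    gram = np.ones((r + 1, r + 1))
    for F in Ut:
        gram *= F.T @ F                                       # mode-wise Gram matrices
    u_var = gram.sum() / n**d

    # reference with the assembled full tensor (d = 3 here)
    full = sum(np.multiply.outer(np.multiply.outer(U[0][:, j], U[1][:, j]), U[2][:, j])
               for j in range(r))
    print(np.isclose(u_mean, full.mean()), np.isclose(u_var, full.var()))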

Conclusion. Today we discussed: motivation (why we need low-rank tensors); tensors of second order (matrices); post-processing: computation of the mean and variance; the CP, Tucker and tensor train formats; the fact that many classical kernels have (or can be approximated in) a low-rank tensor format; an example from kriging. 45 / 46

Tensor Software. Ivan Oseledets et al., Tensor Train toolbox (Matlab), http://spring.inm.ras.ru/osel ; D. Kressner, C. Tobler, Hierarchical Tucker Toolbox (Matlab), http://www.sam.math.ethz.ch/nlagroup/htucker toolbox.html ; M. Espig et al., Tensor Calculus library (C): http://gitorious.org/tensorcalculus 46 / 46