Lecture 1: Introduction to low-rank tensor representation/approximation. Alexander Litvinenko, http://sri-uq.kaust.edu.sa/
KAUST. Figure: KAUST campus, 5 years old, approx. 7000 people (including 1400 kids), 100 nations. 2 / 46
Stochastic Numerics Group at KAUST 3 / 46
Outline
Motivation
Examples
Post-processing in low-rank data format
Lecture 2: Higher order tensors
Example: low-rank approx. with FFT techniques
Definitions of different tensor formats
Applications
Computation of the characteristic function
4 / 46
Aims of low-rank/sparse data approximation
1. Represent/approximate multidimensional operators and functions in a data-sparse format
2. Perform linear algebra algorithms in tensor format (truncation is an issue)
3. Extract the needed information from the data-sparse solution (e.g. mean, variance, quantiles, etc.)
5 / 46
Used Literature and Slides
Book of W. Hackbusch (2012); dissertations of I. Oseledets and M. Espig; articles of Tyrtyshnikov et al., De Lathauwer et al., L. Grasedyck, B. Khoromskij, M. Espig; lecture courses and presentations of Boris and Venera Khoromskij; software by T. Kolda et al., M. Espig et al., D. Kressner and C. Tobler, I. Oseledets et al. 6 / 46
Challenges of numerical computations
modelling of multi-particle interactions in large molecular systems such as proteins and biomolecules,
modelling of large atomic (metallic) clusters,
stochastic and parametric equations,
machine learning, data mining and information technologies,
multidimensional dynamical systems,
data compression,
financial mathematics,
analysis of multi-dimensional data.
7 / 46
An example of an SPDE
− div(κ(x, ω) ∇u(x, ω)) = f(x, ω), where x ∈ G ⊂ R^d, ω ∈ Ω, and U = L²(G).
Further decomposition L²(Ω) = L²(×_j Ω_j) = ⊗_j L²(Ω_j) = ⊗_j L²(R, Γ_j) results in
u(x, ω) ∈ U ⊗ S = L²(G) ⊗ ⊗_j L²(R, Γ_j).
8 / 46
Example 1: KLE and PCE in stochastic PDEs
Write first the Karhunen–Loève expansion and then, for uncorrelated random variables, the polynomial chaos expansion:
u(x, ω) = Σ_{i=1}^K √λ_i φ_i(x) ξ_i(ω) = Σ_{i=1}^K √λ_i φ_i(x) Σ_{α∈J} ξ_i^{(α)} H_α(θ(ω)),   (1)
where (ξ_i^{(α)}) is a tensor, because α = (α_1, α_2, ..., α_M, ...) is a multi-index.
9 / 46
Tensor of order 2
Let M := UΣV^T ≈ Ũ Σ̃ Ṽ^T =: M_k (truncated singular value decomposition). Denote A := Ũ Σ̃ and B := Ṽ; then M_k = AB^T. Storage of A and B^T is k(n + m), in contrast to nm for M.
Figure: sketch of the factorization M = U Σ V^T. 10 / 46
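As an illustration (not from the slides), a minimal NumPy sketch of this rank-k storage scheme, assuming a synthetic matrix of exact rank k:

```python
import numpy as np

# Sketch: factor a low-rank matrix via truncated SVD and store only A, B.
rng = np.random.default_rng(0)
n, m, k = 200, 150, 5
M = rng.standard_normal((n, k)) @ rng.standard_normal((k, m))  # exact rank k

U, s, Vt = np.linalg.svd(M, full_matrices=False)
A = U[:, :k] * s[:k]      # A := U_k * Sigma_k, shape (n, k)
B = Vt[:k, :].T           # B := V_k,           shape (m, k)
M_k = A @ B.T             # rank-k reconstruction

rel_err = np.linalg.norm(M - M_k) / np.linalg.norm(M)
storage_full, storage_lr = n * m, k * (n + m)
```

Here `storage_lr = k(n+m)` is 875 numbers versus 30000 for the full matrix.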
Example: Arithmetic operations
Let v ∈ R^m and suppose M_k = AB^T ∈ R^{n×m}, A ∈ R^{n×k}, B ∈ R^{m×k} is given.
Property 1: M_k v = AB^T v = A(B^T v). Cost O(km + kn).
Suppose M = CD^T, C ∈ R^{n×k} and D ∈ R^{m×k}.
Property 2: M_k + M = A_new B_new^T, A_new := [A C] ∈ R^{n×2k} and B_new := [B D] ∈ R^{m×2k}.
Cost of rank truncation from 2k back to k: O((n + m)k² + k³). 11 / 46
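A small NumPy check of both properties (an illustrative sketch, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 300, 200, 4
A, B = rng.standard_normal((n, k)), rng.standard_normal((m, k))
C, D = rng.standard_normal((n, k)), rng.standard_normal((m, k))
v = rng.standard_normal(m)

# Property 1: apply M_k = A B^T to v in O(km + kn), never forming M_k.
y = A @ (B.T @ v)

# Property 2: sum of two rank-k matrices by concatenating factors.
A_new = np.hstack([A, C])     # n x 2k
B_new = np.hstack([B, D])     # m x 2k
S = A_new @ B_new.T           # equals A B^T + C D^T, rank <= 2k
```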
Example: Difference between matrix rank and tensor rank
In both cases below the matrix rank is full. The Gaussian kernel exp(−|x − y|²) has Kronecker rank 1. The exponential kernel e^{−|x−y|} can be approximated by a tensor of low Kronecker rank r:

r                  1      2      3     4      5       6       10
‖C − C_r‖/‖C‖      11.5   1.7    0.4   0.14   0.035   0.007   2.8e−8
‖C − C_r‖₂/‖C‖₂    6.7    0.52   0.1   0.03   0.008   0.001   5.3e−9
12 / 46
Example: Compute the mean and the variance in a low-rank format
Let W := [v_1, v_2, ..., v_m], where the v_i are snapshots. Given the truncated SVD W_k = AB^T ≈ W ∈ R^{n×m}, A := U_k S_k, B := V_k:
v̄ = (1/m) Σ_{i=1}^m v_i = (1/m) Σ_{i=1}^m A b_i = A b̄,   (2)
C = 1/(m−1) W_c W_c^T = 1/(m−1) AB^T BA^T = 1/(m−1) AA^T.   (3)
The diagonal of C can be computed with complexity O(k²(m + n)). 13 / 46
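A NumPy sketch of formulas (2)–(3) on synthetic snapshots; the covariance factorization uses the orthonormality B^T B = I of the SVD factor:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, r = 100, 40, 6
W = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))   # snapshots

# mean via the factors of W = A B^T: columns are v_i = A b_i
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A, B = U * s, Vt.T
v_mean = A @ B.mean(axis=0)                    # A times mean of the b_i

# covariance of the centered snapshots: C = A_c A_c^T / (m - 1),
# since B_c^T B_c = I for the SVD factors of the centered matrix
Wc = W - W.mean(axis=1, keepdims=True)
Uc, sc, Vct = np.linalg.svd(Wc, full_matrices=False)
Ac = Uc * sc
diag_C = np.einsum('ik,ik->i', Ac, Ac) / (m - 1)   # cheap diagonal of C
```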
Accuracy of the mean and covariance
Lemma. Let ‖W − W_k‖ ≤ ε, and let v̄_k and C_k be the rank-k approximations of the mean v̄ and the covariance C. Then
a) ‖v̄ − v̄_k‖ ≤ (1/√m) ε,
b) ‖C − C_k‖ ≤ 1/(m−1) ε².
14 / 46
Lecture 2: Higher order tensors 15 / 46
Definition of a tensor of order d
A tensor of order d is a multidimensional array over a d-tuple index set I = I_1 × ... × I_d:
A = [a_{i_1 ... i_d} : i_l ∈ I_l] ∈ R^I, I_l = {1, ..., n_l}, l = 1, ..., d.
A is an element of the linear space V_n = ⊗_{l=1}^d V_l, V_l = R^{I_l}, equipped with the Euclidean scalar product ⟨·, ·⟩ : V_n × V_n → R, defined as
⟨A, B⟩ := Σ_{(i_1 ... i_d) ∈ I} a_{i_1 ... i_d} b_{i_1 ... i_d}, for A, B ∈ V_n.
16 / 46
Rank-1 representation
The tensor product of vectors u^{(l)} = {u^{(l)}_{i_l}}_{i_l=1}^{n_l} ∈ R^{I_l} forms the canonical rank-1 tensor
A^{(1)} = [u_i]_{i ∈ I} = u^{(1)} ⊗ ... ⊗ u^{(d)}, with entries u_i = u^{(1)}_{i_1} ··· u^{(d)}_{i_d}.
17 / 46
Tensor formats: CP, Tucker, TT
CP: A(i_1, i_2, i_3) ≈ Σ_{α=1}^r u_1(i_1, α) u_2(i_2, α) u_3(i_3, α)
Tucker: A(i_1, i_2, i_3) ≈ Σ_{α_1, α_2, α_3} c(α_1, α_2, α_3) u_1(i_1, α_1) u_2(i_2, α_2) u_3(i_3, α_3)
TT: A(i_1, ..., i_d) ≈ Σ_{α_1, ..., α_{d−1}} G_1(i_1, α_1) G_2(α_1, i_2, α_2) ··· G_d(α_{d−1}, i_d)
Discrete: G_k(i_k) is an r_{k−1} × r_k matrix, r_0 = r_d = 1.
18 / 46
Canonical and Tucker tensor formats (Taken from Boris Khoromskij) 19 / 46
Tensors and matrices
Tensor: (A_i)_{i ∈ I} ∈ R^I, where I = I_1 × I_2 × ... × I_d, #I_μ = n_μ, #I = ∏_{μ=1}^d n_μ.
Rank-1 tensor: A = u_1 ⊗ u_2 ⊗ ... ⊗ u_d =: ⊗_{μ=1}^d u_μ, with A_{i_1,...,i_d} = (u_1)_{i_1} ··· (u_d)_{i_d}.
Rank-k tensor: A = Σ_{i=1}^k u_i ⊗ v_i; matrix: A = Σ_{i=1}^k u_i v_i^T.
The Kronecker product A ⊗ B ∈ R^{nm × nm} is a block matrix whose ij-th block is [A_{ij} B].
20 / 46
Examples (B. Khoromskij's lecture)
Rank 1: f = exp(f_1(x_1) + ... + f_d(x_d)) = ∏_{j=1}^d exp(f_j(x_j))
Rank 2: f = sin(Σ_{j=1}^d x_j), since 2i sin(Σ_{j=1}^d x_j) = e^{i Σ_{j=1}^d x_j} − e^{−i Σ_{j=1}^d x_j}
The rank-d function f(x_1, ..., x_d) = x_1 + x_2 + ... + x_d can be approximated by a rank-2 tensor with any prescribed accuracy:
f ≈ (∏_{j=1}^d (1 + ε x_j) − 1) / ε + O(ε), as ε → 0.
21 / 46
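The rank-2 approximation of x_1 + ... + x_d is easy to check numerically (a sketch with arbitrarily chosen sample values):

```python
import numpy as np

x = np.array([0.3, -0.1, 0.7, 0.2, -0.4, 0.5])   # sample point, d = 6
exact = x.sum()

# f approx (prod_j (1 + eps*x_j) - 1) / eps: a sum of two rank-1 terms;
# the error is O(eps) and shrinks as eps -> 0
for eps in (1e-2, 1e-4, 1e-6):
    approx = (np.prod(1.0 + eps * x) - 1.0) / eps
    err = abs(approx - exact)
```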
Examples where you can apply low-rank tensor approximation
f(x) = ∫_{[−a,a]³} e^{iκ|x−y|} / |x−y| · u(y) dy — Helmholtz kernel   (4)
f(x) = ∫_{[−a,a]³} e^{−μ|x−y|} / |x−y| · u(y) dy — Yukawa potential   (5)
Classical Green kernels, x, y ∈ R^d: log(|x−y|), 1/|x−y|   (6)
Other multivariate functions:
(a) 1/(x_1² + ... + x_d²), (b) 1/√(x_1² + ... + x_d²), (c) e^{−λ|x|}/|x|.   (7)
For (a) use (with ρ = x_1² + ... + x_d² ∈ [1, R], R > 1):
1/ρ = ∫_{R_+} e^{−ρt} dt.   (8)
22 / 46
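Identity (8) is the basis of exponential-sum (sinc quadrature) approximations: substituting t = e^s and discretizing turns 1/ρ into a short sum of exponentials, each of which is rank-1 in the variables x_1², ..., x_d². A sketch; the node spacing and window below are ad-hoc choices, not from the slides:

```python
import numpy as np

rho = np.linspace(1.0, 100.0, 500)     # evaluate 1/rho on [1, R], R = 100

# 1/rho = int_0^inf exp(-rho t) dt; substitute t = e^s, trapezoid rule in s
h = 0.3
s = np.arange(-30.0, 15.0, h)          # quadrature nodes
approx = sum(h * np.exp(sk) * np.exp(-rho * np.exp(sk)) for sk in s)

rel_err = np.max(np.abs(approx * rho - 1.0))
```

Each term exp(−ρ e^{s_k}) factors as ∏_μ exp(−x_μ² e^{s_k}), so roughly 150 nodes give a Kronecker-rank-150 approximation that is accurate to near machine precision on [1, R].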
Examples (B. Khoromskij's lecture)
Canonical rank d, TT rank 2:
f(x_1, ..., x_d) = w_1(x_1) + w_2(x_2) + ... + w_d(x_d)
 = (w_1(x_1), 1) [[1, 0], [w_2(x_2), 1]] ··· [[1, 0], [w_{d−1}(x_{d−1}), 1]] (1, w_d(x_d))^T
23 / 46
Example: rank(f) = 2
f = sin(x_1 + x_2 + ... + x_d)
 = (sin x_1, cos x_1) [[cos x_2, −sin x_2], [sin x_2, cos x_2]] ··· [[cos x_{d−1}, −sin x_{d−1}], [sin x_{d−1}, cos x_{d−1}]] (cos x_d, sin x_d)^T
24 / 46
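This rank-2 (TT-style) factorization of sin(x_1 + ... + x_d) can be verified directly; a sketch with arbitrary sample values:

```python
import numpy as np

x = np.array([0.2, 0.5, -0.3, 1.1, 0.7])          # sample point, d = 5

def core(t):
    # middle 2x2 core: advances the running (sin, cos) pair by t
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

row = np.array([np.sin(x[0]), np.cos(x[0])])      # first core (row vector)
for t in x[1:-1]:
    row = row @ core(t)                            # row = (sin S, cos S)
val = row @ np.array([np.cos(x[-1]), np.sin(x[-1])])  # last core (column)
```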
Tensor properties of the FFT
Let u = Σ_{j=1}^{k_u} ⊗_{i=1}^d u_{ji}. Then the d-dimensional Fourier transform ũ = F^{(d)}(u) is given by
ũ = F^{(d)}(u) = (⊗_{i=1}^d F_i)( Σ_{j=1}^{k_u} ⊗_{i=1}^d u_{ji} ) = Σ_{j=1}^{k_u} ⊗_{i=1}^d F_i(u_{ji}).   (9)
25 / 46
FFT, Hadamard and Kronecker products
Lemma. If u = Σ_{j=1}^{k_u} ⊗_{i=1}^d u_{ji} and q = Σ_{l=1}^{k_q} ⊗_{i=1}^d q_{li}, then
u ⊙ q = Σ_{j=1}^{k_u} Σ_{l=1}^{k_q} ⊗_{i=1}^d (u_{ji} ⊙ q_{li}),
and
F^{(d)}(u ⊙ q) = Σ_{j=1}^{k_u} Σ_{l=1}^{k_q} ⊗_{i=1}^d F_i(u_{ji} ⊙ q_{li}).
26 / 46
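The rank-1 case of the lemma is easy to check with NumPy's FFT (an illustrative sketch): the 2D FFT of an outer product factorizes into the outer product of 1D FFTs, and the Hadamard product acts factor-wise.

```python
import numpy as np

rng = np.random.default_rng(3)
u1, u2 = rng.standard_normal(8), rng.standard_normal(16)
q1, q2 = rng.standard_normal(8), rng.standard_normal(16)

U, Q = np.outer(u1, u2), np.outer(q1, q2)     # rank-1 tensors of order 2

# F^(2)(u .* q) = F(u1 .* q1) (x) F(u2 .* q2)
lhs = np.fft.fft2(U * Q)
rhs = np.outer(np.fft.fft(u1 * q1), np.fft.fft(u2 * q2))
```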
Numerics: an example from geology
Domain: 20m × 20m × 20m, 25,000 × 25,000 × 25,000 dofs. 4,000 measurements randomly distributed within the volume, with increasing data density towards the lower left back corner of the domain. The covariance model is anisotropic Gaussian with unit variance, with 32 correlation lengths fitting into the domain in each horizontal direction and 64 correlation lengths fitting into the vertical direction. 27 / 46
Example 6: Kriging
The top left figure shows the entire domain at a sampling rate of 1:64 per direction, followed by a series of zooms into the lower left back corner with zoom factors (sampling rates) of 4 (1:16), 16 (1:4) and 64 (1:1) for the top right, bottom left and bottom right plots, respectively. Color scale: the 95% confidence interval [μ − 2σ, μ + 2σ]. 28 / 46
Definitions of different tensor formats MATHEMATICAL DEFINITIONS, PROPERTIES AND APPLICATIONS 29 / 46
Definitions of CP
Let T := ⊗_{μ=1}^d R^{n_μ} be the tensor space constructed from the vector spaces (R^{n_μ}, ⟨·,·⟩_{R^{n_μ}}) (d ≥ 3). A tensor representation U is a multilinear map U : P → T, where the parametric space is P = ×_{ν=1}^D P_ν (d ≤ D). Further, P_ν depends on some representation rank parameter r_ν ∈ N. A standard example of a tensor representation is the canonical tensor format.
Note: we distinguish between a tensor v ∈ T and its tensor format representation p ∈ P, where v = U(p).
30 / 46
r-terms, tensor rank, canonical tensor format
The set R_r of tensors which can be represented in T with r terms is defined as
R_r := { Σ_{i=1}^r ⊗_{μ=1}^d v_{iμ} ∈ T : v_{iμ} ∈ R^{n_μ} }.   (10)
Let v ∈ T. The tensor rank of v in T is
rank(v) := min { r ∈ N_0 : v ∈ R_r }.   (11)
Example: the Laplace operator in 3D:
Δ_3 = Δ_1 ⊗ I ⊗ I + I ⊗ Δ_1 ⊗ I + I ⊗ I ⊗ Δ_1.
31 / 46
Definitions of CP
The canonical tensor format is defined by the mapping
U_cp : ×_{μ=1}^d (R^{n_μ})^r → R_r,   (12)
v̂ := (v_{iμ} : 1 ≤ i ≤ r, 1 ≤ μ ≤ d) ↦ U_cp(v̂) := Σ_{i=1}^r ⊗_{μ=1}^d v_{iμ}.
32 / 46
Properties of CP
Lemma. Let r_1, r_2 ∈ N, u ∈ R_{r_1} and v ∈ R_{r_2}. We have:
(i) ⟨u, v⟩_T = Σ_{j_1=1}^{r_1} Σ_{j_2=1}^{r_2} ∏_{μ=1}^d ⟨u_{j_1 μ}, v_{j_2 μ}⟩_{R^{n_μ}}. The computational cost of ⟨u, v⟩_T is O(r_1 r_2 Σ_{μ=1}^d n_μ).
(ii) u + v ∈ R_{r_1 + r_2}.
(iii) u ⊙ v ∈ R_{r_1 r_2}, where ⊙ denotes the pointwise Hadamard product. Further, u ⊙ v can be computed in the canonical tensor format with r_1 r_2 Σ_{μ=1}^d n_μ arithmetic operations.
Let R_1 = A_1 B_1^T and R_2 = A_2 B_2^T be rank-k matrices; then R_1 + R_2 = [A_1 A_2][B_1 B_2]^T is a rank-2k matrix. Rank truncation is needed!
33 / 46
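Property (iii) in action: a NumPy sketch (not from the slides) that forms the Hadamard product of two CP tensors factor-wise, producing a CP representation of rank r1·r2.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n, r1, r2 = 3, 5, 2, 3
U = [rng.standard_normal((n, r1)) for _ in range(d)]   # CP factors of u
V = [rng.standard_normal((n, r2)) for _ in range(d)]   # CP factors of v

def cp_full(F):
    # assemble the full order-3 tensor from CP factor matrices
    T = np.zeros((n, n, n))
    for i in range(F[0].shape[1]):
        T += np.multiply.outer(
            np.multiply.outer(F[0][:, i], F[1][:, i]), F[2][:, i])
    return T

# Hadamard product in CP: all pairwise column products, rank r1 * r2
W = [np.einsum('ni,nj->nij', A, B).reshape(n, r1 * r2)
     for A, B in zip(U, V)]
```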
Properties of CP: Hadamard product and FT
Let u = Σ_{j=1}^k ⊗_{i=1}^d u_{ji}, u_{ji} ∈ R^n. Then
F^{(d)}(u) = Σ_{j=1}^k ⊗_{i=1}^d F_i(u_{ji}), where F^{(d)} = ⊗_{i=1}^d F_i.   (13)
Let S = AB^T = Σ_{i=1}^{k_1} a_i b_i^T ∈ R^{n×m} and T = CD^T = Σ_{j=1}^{k_2} c_j d_j^T ∈ R^{n×m}, where a_i, c_j ∈ R^n, b_i, d_j ∈ R^m, k_1, k_2, n, m > 0. Then
F^{(2)}(S ⊙ T) = Σ_{i=1}^{k_1} Σ_{j=1}^{k_2} F(a_i ⊙ c_j) F(b_i ⊙ d_j)^T.
34 / 46
Tucker tensor format
A = Σ_{i_1=1}^{k_1} ··· Σ_{i_d=1}^{k_d} c_{i_1,...,i_d} u^{(1)}_{i_1} ⊗ ... ⊗ u^{(d)}_{i_d}   (14)
Core tensor c ∈ R^{k_1 × ... × k_d}, rank (k_1, ..., k_d). Nonlinear fixed-rank approximation problem:
X* = argmin_{rank(X) ≤ (k_1,...,k_d)} ‖A − X‖   (15)
The problem is well-posed but not solved: there are many local minima.
HOSVD (De Lathauwer et al.) yields a rank-(k_1, ..., k_d) tensor Y with ‖A − Y‖ ≤ √d ‖A − X*‖.
Reliable arithmetic, but exponential scaling (c ∈ R^{k_1 × k_2 × ... × k_d}).
35 / 46
Example: canonical rank d, whereas TT rank is 2
The d-Laplacian over a uniform tensor grid is known to have the Kronecker rank-d representation
Δ_d = A ⊗ I_N ⊗ ... ⊗ I_N + I_N ⊗ A ⊗ ... ⊗ I_N + ... + I_N ⊗ I_N ⊗ ... ⊗ A,   (16)
with A = Δ_1 = tridiag{−1, 2, −1} ∈ R^{N×N} and I_N the N × N identity. Notice that the canonical rank is rank_C(Δ_d) = d, while the TT rank of Δ_d is equal to 2 for any dimension, due to the explicit representation
Δ_d = (Δ_1  I) ⋈ [[I, 0], [Δ_1, I]] ⋈ ... ⋈ [[I, 0], [Δ_1, I]] ⋈ (I, Δ_1)^T,   (17)
where the rank product operation ⋈ is defined as a regular matrix product of the two corresponding core matrices, their blocks being multiplied by means of the tensor product. A similar bound holds for the Tucker rank: rank_Tuck(Δ_d) = 2.
36 / 46
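The Kronecker rank-d representation (16) can be assembled directly; its eigenvalues are sums of the 1D eigenvalues, which gives a cheap consistency check (an illustrative sketch for small N and d):

```python
import numpy as np

N, d = 4, 3
I = np.eye(N)
A = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)  # tridiag{-1, 2, -1}

# Delta_d = sum over mu of  I (x) ... (x) A (position mu) (x) ... (x) I
Delta = np.zeros((N**d, N**d))
for mu in range(d):
    term = np.array([[1.0]])
    for nu in range(d):
        term = np.kron(term, A if nu == mu else I)
    Delta += term

# eigenvalues of Delta_d are all sums lambda_i + lambda_j + lambda_k
lam = np.linalg.eigvalsh(A)
lam_d = np.sort(np.add.outer(np.add.outer(lam, lam), lam).ravel())
```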
Advantages and disadvantages
Denote by k the rank, d the dimension, and n = #dofs in 1D:
1. CP: the approximation algorithm is ill-posed; storage O(dnk); approximations hard to compute
2. Tucker: reliable arithmetic based on the SVD, O(dnk + k^d)
3. Hierarchical Tucker: based on the SVD, storage O(dnk + dk³), truncation O(dnk² + dk⁴)
4. TT: based on the SVD, O(dnk²) or O(dnk³); stable, linear in d, polynomial in the rank
5. Quantics-TT: O(n^d) → O(d log_q n)
37 / 46
POSTPROCESSING
Postprocessing in the canonical tensor format. Plan:
We want to compute frequencies and level sets (needed for the probability density function).
An intermediate step: compute the sign function and then the characteristic function χ_I(u).
38 / 46
Characteristic and sign functions
Definition. The characteristic χ_I(u) ∈ T of u ∈ T in I ⊂ R is for every multi-index i ∈ I defined pointwise as
(χ_I(u))_i := 1 if u_i ∈ I, and 0 if u_i ∉ I.   (18)
Furthermore, sign(u) ∈ T is for all i ∈ I defined pointwise by
(sign(u))_i := 1 if u_i > 0; −1 if u_i < 0; 0 if u_i = 0.   (19)
39 / 46
How to compute the characteristic function?
Lemma. Let u ∈ T, a, b ∈ R, and 1 = ⊗_{μ=1}^d 1_μ, where 1_μ := (1, ..., 1)^T ∈ R^{n_μ}.
(i) If I = (−∞, b), then χ_I(u) = ½(1 + sign(b1 − u)).
(ii) If I = (a, ∞), then χ_I(u) = ½(1 − sign(a1 − u)).
(iii) If I = (a, b), then χ_I(u) = ½(sign(b1 − u) − sign(a1 − u)).
40 / 46
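Pointwise, case (iii) reads as follows (a NumPy sketch on a plain vector; in the tensor setting sign(·) itself must be approximated in low-rank arithmetic):

```python
import numpy as np

u = np.array([-1.5, 0.2, 0.9, 2.4, 3.1])
a, b = 0.0, 2.5
one = np.ones_like(u)

# indicator of the open interval (a, b) via two sign evaluations
chi = 0.5 * (np.sign(b * one - u) - np.sign(a * one - u))
```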
Definition of level set and frequency
Definition. Let I ⊂ R and u ∈ T. The level set L_I(u) ∈ T of u with respect to I is defined pointwise by
(L_I(u))_i := u_i if u_i ∈ I, and 0 if u_i ∉ I,   (20)
for all i ∈ I. The frequency F_I(u) ∈ N of u with respect to I is defined as
F_I(u) := # supp χ_I(u).   (21)
41 / 46
Computation of level sets and frequencies
Proposition. Let I ⊂ R, u ∈ T, and let χ_I(u) be its characteristic. We have L_I(u) = χ_I(u) ⊙ u and rank(L_I(u)) ≤ rank(χ_I(u)) rank(u).
The frequency F_I(u) ∈ N of u with respect to I is F_I(u) = ⟨χ_I(u), 1⟩, where 1 = ⊗_{μ=1}^d 1_μ, 1_μ := (1, ..., 1)^T ∈ R^{n_μ}.
42 / 46
How to compute the mean value in the CP format
Let u = Σ_{j=1}^r ⊗_{μ=1}^d u_{jμ} ∈ R_r; then the mean value ū can be computed as a scalar product:
ū = ⟨ Σ_{j=1}^r ⊗_{μ=1}^d u_{jμ}, ⊗_{μ=1}^d (1_μ / n_μ) ⟩ = Σ_{j=1}^r ∏_{μ=1}^d ⟨u_{jμ}, 1_μ⟩ / n_μ   (22)
 = Σ_{j=1}^r ∏_{μ=1}^d (1/n_μ) ( Σ_{k=1}^{n_μ} (u_{jμ})_k ),   (23)
where 1_μ := (1, ..., 1)^T ∈ R^{n_μ}. The numerical cost is O(r Σ_{μ=1}^d n_μ).
43 / 46
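A sketch of this mean-value formula in NumPy (synthetic CP factors, order d = 3), compared against averaging the assembled full tensor:

```python
import numpy as np

rng = np.random.default_rng(5)
r, n = 4, [5, 6, 7]                                       # rank, mode sizes
U = [rng.standard_normal((n[mu], r)) for mu in range(3)]  # CP factor matrices

# mean = sum_j prod_mu (1/n_mu) sum_k (u_{j mu})_k  -- cost O(r * sum n_mu)
mean_cp = sum(
    np.prod([U[mu][:, j].sum() / n[mu] for mu in range(3)]) for j in range(r)
)

# reference: assemble the full tensor and average all entries
T = np.zeros(n)
for j in range(r):
    T += np.multiply.outer(
        np.multiply.outer(U[0][:, j], U[1][:, j]), U[2][:, j])
```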
How to compute the variance in the CP format
Let u ∈ R_r and
ũ := u − ū ⊗_{μ=1}^d 1_μ = Σ_{j=1}^{r+1} ⊗_{μ=1}^d ũ_{jμ} ∈ R_{r+1};   (24)
then the variance var(u) of u can be computed as follows:
var(u) = ⟨ũ, ũ⟩ / ∏_{μ=1}^d n_μ = (1 / ∏_{μ=1}^d n_μ) ⟨ Σ_{i=1}^{r+1} ⊗_{μ=1}^d ũ_{iμ}, Σ_{j=1}^{r+1} ⊗_{ν=1}^d ũ_{jν} ⟩
 = Σ_{i=1}^{r+1} Σ_{j=1}^{r+1} ∏_{μ=1}^d (1/n_μ) ⟨ũ_{iμ}, ũ_{jμ}⟩.
The numerical cost is O((r + 1)² Σ_{μ=1}^d n_μ).
44 / 46
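And a matching sketch for the variance (synthetic factors again): append the rank-1 correction −ū·(1 ⊗ ... ⊗ 1) and evaluate ⟨ũ, ũ⟩ via a Hadamard product of factor-wise Gram matrices.

```python
import numpy as np

rng = np.random.default_rng(6)
r, n = 3, [4, 5, 6]
U = [rng.standard_normal((n[mu], r)) for mu in range(3)]   # CP factors of u

# reference full tensor and its mean
T = np.zeros(n)
for j in range(r):
    T += np.multiply.outer(
        np.multiply.outer(U[0][:, j], U[1][:, j]), U[2][:, j])
u_bar = T.mean()

# u_tilde = u - u_bar * (1 x 1 x 1): one extra rank-1 term, rank r + 1
Ut = [np.hstack([U[mu], (-u_bar if mu == 0 else 1.0) * np.ones((n[mu], 1))])
      for mu in range(3)]

# var(u) = <u_tilde, u_tilde> / prod(n_mu), via Gram matrices
G = np.ones((r + 1, r + 1))
for mu in range(3):
    G *= Ut[mu].T @ Ut[mu]        # entrywise product of Gram matrices
var_cp = G.sum() / np.prod(n)
```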
Conclusion
Today we discussed:
Motivation: why we need low-rank tensors
Tensors of order two (matrices)
Postprocessing: computation of the mean and the variance
CP, Tucker and tensor train (TT) tensor formats
Many classical kernels have (or can be approximated in) a low-rank tensor format
An example from kriging
45 / 46
Tensor Software
Ivan Oseledets et al., TT (Tensor Train) Toolbox (Matlab), http://spring.inm.ras.ru/osel
D. Kressner, C. Tobler, Hierarchical Tucker Toolbox (Matlab), http://www.sam.math.ethz.ch/nlagroup/htucker_toolbox.html
M. Espig et al., Tensor Calculus library (C): http://gitorious.org/tensorcalculus
46 / 46