Adaptive low-rank approximation in hierarchical tensor format using least-squares method
Workshop on Challenges in HD Analysis and Computation, San Servolo, 4/5/2016

Adaptive low-rank approximation in hierarchical tensor format using least-squares method

Anthony Nouy, Ecole Centrale Nantes, GeM
Joint work with Mathilde Chevreuil, Loic Giraldi, Prashant Rai
Supported by the ANR (CHORUS project)
High-dimensional problems in uncertainty quantification (UQ)

Parameter-dependent models:
$\mathcal{M}(u(X); X) = 0$,
where $X$ are random variables.

Solving forward and inverse UQ problems requires the evaluation of the model for many instances of $X$. In practice, we rely on approximations of the solution map $x \mapsto u(x)$, which are used as surrogate models.

Complexity issues:
- High-dimensional function $u(x_1, \dots, x_d)$.
- For complex numerical models, only a few evaluations of the function are available.
Structured approximation for high-dimensional problems

Specific structures of high-dimensional functions have to be exploited (application dependent):
- Low effective dimensionality: $u(x_1, \dots, x_d) \approx u_1(x_1)$
- Low-order interactions: $u(x_1, \dots, x_d) \approx u_1(x_1) + \dots + u_d(x_d)$
- Sparsity (relative to a basis or frame): $u(x) = \sum_{\alpha \in \mathbb{N}^d} u_\alpha \psi_\alpha(x) \approx \sum_{\alpha \in \Lambda} u_\alpha \psi_\alpha(x)$
- Low-rank structures
Outline

1. Rank-structured approximation
2. Statistical learning methods for tensor approximation
3. Adaptive approximation in tree-based low-rank formats
Rank-structured approximation

A multivariate function $u(x_1, \dots, x_d)$ defined on $\mathcal{X} = \mathcal{X}_1 \times \dots \times \mathcal{X}_d$ is identified with an element of the tensor space
$V_1 \otimes \dots \otimes V_d = \mathrm{span}\{v^1(x_1) \cdots v^d(x_d) \,;\, v^\nu \in V_\nu\}$,
where $V_\nu$ is a space of functions defined on $\mathcal{X}_\nu$.

Approximation in a subset of tensors with bounded rank:
$\mathcal{M}_r = \{v \in V_1^{n_1} \otimes \dots \otimes V_d^{n_d} \,;\, \mathrm{rank}(v) \le r\}$

For order-two tensors, there is a single notion of rank:
$\mathrm{rank}(v) \le r \iff v = \sum_{i=1}^r v_i^1(x_1)\, v_i^2(x_2)$

For higher-order tensors, there are different notions of rank, such as the canonical rank:
$\mathrm{rank}(v) \le r \iff v = \sum_{i=1}^r v_i^1(x_1) \cdots v_i^d(x_d)$,
with storage complexity $O(rdn)$.
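For order-two tensors, the best rank-r approximation in the L² (Frobenius) sense is obtained by truncating the singular value decomposition (Eckart-Young theorem). A minimal numpy sketch (the function name is illustrative, not from the talk):

```python
import numpy as np

def truncated_svd(A, r):
    """Best rank-r approximation of a matrix A (Eckart-Young theorem)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # keep the r leading singular triplets
    return U[:, :r] * s[:r] @ Vt[:r, :]

# a rank-2 matrix is recovered exactly with r = 2
a, b = np.arange(5.0), np.ones(5)
A = np.outer(a, a) + np.outer(b, b)
err = np.linalg.norm(A - truncated_svd(A, 2))
```

For higher-order tensors no such direct characterization of best low-rank approximations exists, which is one motivation for the subspace-based formats.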
Subspace-based low-rank formats

$\alpha$-rank: for $\alpha \subset \{1, \dots, d\}$, write $V = V_\alpha \otimes V_{\alpha^c}$ with $V_\alpha = \bigotimes_{\mu \in \alpha} V_\mu$, and
$\mathrm{rank}_\alpha(v) \le r_\alpha \iff v = \sum_{i=1}^{r_\alpha} v_i^\alpha(x_\alpha)\, v_i^{\alpha^c}(x_{\alpha^c}), \quad v_i^\alpha \in V_\alpha,\ v_i^{\alpha^c} \in V_{\alpha^c}$
$\iff v \in U_\alpha \otimes V_{\alpha^c}$ with $\dim(U_\alpha) = r_\alpha$.

Example: $d = 4$, $\alpha = \{1, 2\}$, $\alpha^c = \{3, 4\}$.

Storage complexity: $O(r_\alpha (n^{\#\alpha} + n^{\#\alpha^c}))$.
Subspace-based low-rank formats

Tree-based Tucker format [Hackbusch-Kuhn 09, Oseledets-Tyrtyshnikov 09]:
$\mathrm{rank}_T(v) = (\mathrm{rank}_\alpha(v) : \alpha \in T)$, with $T$ a dimension tree; e.g. for $d = 4$ the binary tree with root $\{1,2,3,4\}$, interior nodes $\{1,2\}$ and $\{3,4\}$, and leaves $\{1\}, \{2\}, \{3\}, \{4\}$.

Storage complexity: $O(dnR + dR^{s+1})$ with $R = \max_\alpha r_\alpha$ and $s$ the maximal number of children of a node.

Example (additive model): $u(x_1, \dots, x_d) = u_1(x_1) + \dots + u_d(x_d)$ has $\mathrm{rank}_T(u) = (2, 2, 2, \dots, 2)$ for any tree $T$.
Subspace-based low-rank formats

Tensor-train format: a degenerate case where $T$ is a subset of a linearly structured dimension tree,
$T = \{\{1\}, \{1,2\}, \{1,2,3\}, \dots, \{1, \dots, d-1\}\}$.

$\mathrm{rank}_T(v) \le r \iff v(x) = \sum_{i_1=1}^{r_{\{1\}}} \sum_{i_2=1}^{r_{\{1,2\}}} \cdots \sum_{i_{d-1}=1}^{r_{\{1,\dots,d-1\}}} v^1_{1,i_1}(x_1)\, v^2_{i_1,i_2}(x_2) \cdots v^d_{i_{d-1},1}(x_d)$

Storage complexity: $O(dnR^2)$ with $R = \max_\alpha r_\alpha$.
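Once a tensor is stored in TT format, evaluating one entry reduces to a chain of small matrix products, which is where the linear-in-d complexity comes from. A sketch for the discrete case (the helper name and the core layout are illustrative assumptions):

```python
import numpy as np

def tt_eval(cores, idx):
    """Evaluate a tensor given in TT format at a multi-index.
    cores[k] has shape (r_{k-1}, n_k, r_k), with r_0 = r_d = 1."""
    v = np.ones((1, 1))
    for G, i in zip(cores, idx):
        v = v @ G[:, i, :]   # contract one core at a time: O(d R^2) per entry
    return v[0, 0]

# rank-one example: the TT representation of an elementary (separable) tensor
u1, u2, u3 = np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])
cores = [u1.reshape(1, 2, 1), u2.reshape(1, 2, 1), u3.reshape(1, 2, 1)]
val = tt_eval(cores, (0, 1, 1))   # u1[0] * u2[1] * u3[1]
```

The same contraction pattern, with the index selection replaced by evaluation of univariate basis functions, evaluates a TT-format multivariate function at a point.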
Subspace-based low-rank formats

Storage and computational complexity scale as $O(d)$.

Good topological and geometrical properties of the manifolds $\mathcal{M}_r$ of tensors with bounded rank [Holtz-Rohwedder-Schneider 11, Uschmajew-Vandereycken 13, Falco-Hackbusch-N. 15].

Multilinear parametrization:
$\mathcal{M}_r = \{v = F(p_1, \dots, p_L) \,;\, p_k \in P_k,\ 1 \le k \le L\}$,
where $F$ is a multilinear map.
Statistical learning methods for tensor approximation

Approximation of a function $u(X) = u(X_1, \dots, X_d)$ from evaluations $\{y^k = u(x^k)\}_{k=1}^K$ on a training set $\{x^k\}_{k=1}^K$ (i.i.d. samples of $X$).

Approximation in subsets of rank-structured functions by minimization of an empirical risk:
$\mathcal{M}_r = \{v : \mathrm{rank}(v) \le r\}, \qquad \mathcal{R}_K(v) = \frac{1}{K} \sum_{k=1}^K \ell(u(x^k), v(x^k))$,
where $\ell$ is a loss function.

Here, we consider least-squares regression,
$\mathcal{R}_K(v) = \frac{1}{K} \sum_{k=1}^K (u(x^k) - v(x^k))^2 = \hat{\mathbb{E}}_K\big((u(X) - v(X))^2\big)$,
but other loss functions could be used for objectives other than $L^2$-approximation (e.g. classification).
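The empirical risk with square loss is simply a Monte Carlo estimate of the $L^2(P_X)$ error of the surrogate. A minimal sketch with a hypothetical target and a crude surrogate (both illustrative, not from the talk):

```python
import numpy as np

def empirical_risk(u, v, X):
    """R_K(v) = (1/K) sum_k (u(x^k) - v(x^k))^2,
    the Monte Carlo estimate of E[(u(X) - v(X))^2]."""
    return np.mean((u(X) - v(X)) ** 2)

rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 2))          # i.i.d. samples of X ~ U([0,1]^2)
u = lambda X: X[:, 0] * X[:, 1]          # model to approximate
v = lambda X: 0.25 + 0.0 * X[:, 0]       # constant surrogate (the mean of u)
risk = empirical_risk(u, v, X)           # approx Var(X1*X2) = 7/144
```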
Alternating minimization algorithm

Multilinear parametrization of the tensor manifold:
$\mathcal{M}_r = \{v = F(p_1, \dots, p_L) : p_l \in \mathbb{R}^{m_l},\ 1 \le l \le L\}$,
so that
$\min_{v \in \mathcal{M}_r} \mathcal{R}_K(v) = \min_{p_1, \dots, p_L} \mathcal{R}_K(F(p_1, \dots, p_L))$.

Alternating minimization algorithm: solve successive minimization problems
$\min_{p_l \in \mathbb{R}^{m_l}} \mathcal{R}_K(F(p_1, \dots, p_l, \dots, p_L))$,
where $v(x) = \Psi_l(x)^T p_l$ is linear in $p_l$, so that each step is a standard linear approximation problem
$\min_{p_l \in \mathbb{R}^{m_l}} \frac{1}{K} \sum_{k=1}^K \ell(u(x^k), \Psi_l(x^k)^T p_l)$.
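A minimal alternating least-squares sketch for the canonical format, assuming a monomial feature basis (np.vander) and a fixed number of sweeps; this illustrates the scheme, it is not the talk's implementation:

```python
import numpy as np

def als_fit(X, y, rank, degree=2, n_sweeps=50, seed=0):
    """ALS for a canonical rank-`rank` model
    v(x) = sum_i prod_nu phi(x_nu)^T P[nu][:, i],
    with a monomial basis phi (an illustrative choice)."""
    K, d = X.shape
    Phi = [np.vander(X[:, nu], degree + 1) for nu in range(d)]
    rng = np.random.default_rng(seed)
    P = [rng.standard_normal((degree + 1, rank)) for _ in range(d)]
    for _ in range(n_sweeps):
        for nu in range(d):
            # product of all factors except nu, evaluated on the samples
            W = np.ones((K, rank))
            for mu in range(d):
                if mu != nu:
                    W *= Phi[mu] @ P[mu]
            # linear least-squares problem in the parameters of factor nu
            A = (Phi[nu][:, :, None] * W[:, None, :]).reshape(K, -1)
            sol, *_ = np.linalg.lstsq(A, y, rcond=None)
            P[nu] = sol.reshape(degree + 1, rank)

    def predict(Xq):
        W = np.ones((len(Xq), rank))
        for mu in range(d):
            W *= np.vander(Xq[:, mu], degree + 1) @ P[mu]
        return W.sum(axis=1)
    return predict

# fit a rank-one function from random samples
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(300, 3))
y = X[:, 0] * X[:, 1] * X[:, 2]
v = als_fit(X, y, rank=1)
err = np.sqrt(np.mean((v(X) - y) ** 2))
```

Each inner step is an ordinary least-squares problem in the parameters of one factor, with the remaining factors frozen inside the design matrix.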
Regularization

$\min_{p_1, \dots, p_L} \mathcal{R}_K(F(p_1, \dots, p_L)) + \sum_{l=1}^L \lambda_l \Omega_l(p_l)$,
with regularization functionals $\Omega_l$ promoting sparsity (e.g. $\Omega_l(p_l) = \|p_l\|_1$), smoothness, ...

The alternating minimization algorithm then requires the solution of successive standard regularized linear approximation problems
$\min_{p_l} \frac{1}{K} \sum_{k=1}^K \ell(u(x^k), \Psi_l(x^k)^T p_l) + \lambda_l \Omega_l(p_l). \qquad (*)$

For the square loss and $\Omega_l(p_l) = \|p_l\|_1$, $(*)$ is a LASSO problem. Cross-validation methods are used for the selection of the regularization parameters $\lambda_l$.
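With the square loss and an l1 penalty, each inner subproblem is a LASSO. A self-contained proximal-gradient (ISTA) sketch, with the penalty weight fixed for brevity where the talk would select it by cross-validation (data and names are illustrative):

```python
import numpy as np

def lasso_ista(Psi, y, lam, n_iter=1000):
    """Solve min_p (1/2K) ||y - Psi p||^2 + lam ||p||_1 by proximal
    gradient (ISTA); a minimal stand-in for the inner LASSO subproblem."""
    K, m = Psi.shape
    step = K / np.linalg.norm(Psi, 2) ** 2      # 1/L with L = ||Psi||_2^2 / K
    p = np.zeros(m)
    for _ in range(n_iter):
        grad = Psi.T @ (Psi @ p - y) / K
        z = p - step * grad
        p = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return p

rng = np.random.default_rng(0)
K, m = 200, 30
Psi = rng.standard_normal((K, m))               # features Psi_l(x^k) as rows
p_true = np.zeros(m)
p_true[[2, 7, 11]] = [1.5, -2.0, 0.7]           # sparse ground truth
y = Psi @ p_true
p_hat = lasso_ista(Psi, y, lam=0.01)
```

The soft-thresholding step is what drives most coefficients exactly to zero, i.e. what selects the active basis functions.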
Illustrations

Approximation in tensor-train (TT) format:
$v(x_1, \dots, x_d) = \sum_{i_1=1}^{r_1} \cdots \sum_{i_{d-1}=1}^{r_{d-1}} v^1_{1,i_1}(x_1) \cdots v^d_{i_{d-1},1}(x_d), \qquad v^k_{i_{k-1}, i_k} \in \mathbb{P}_q$

Polynomial approximations: $v = F(p_1, \dots, p_d)$ with parameter $p_k \in \mathbb{R}^{(q+1) r_k r_{k-1}}$ gathering the coefficients of the functions of $x_k$ on a polynomial basis (orthonormal in $L^2_{P_{X_k}}(\mathcal{X}_k)$).

Sparsity-inducing regularization and cross-validation (leave-one-out) are used for the automatic selection of polynomial basis functions, followed by standard least-squares in the selected basis.
Illustration: Borehole function

The Borehole function models water flow through a borehole:
$u(X) = \dfrac{2\pi T_u (H_u - H_l)}{\ln(r/r_w)\left(1 + \dfrac{2 L T_u}{\ln(r/r_w)\, r_w^2 K_w} + \dfrac{T_u}{T_l}\right)}, \qquad X = (r_w, \log(r), T_u, H_u, T_l, H_l, L, K_w)$

- $r_w$: radius of borehole (m), $N(\mu = 0.10,\ \sigma = \cdot)$
- $r$: radius of influence (m), $LN(\mu = 7.71,\ \sigma = \cdot)$
- $T_u$: transmissivity of upper aquifer (m²/yr), $U(63070,\ \cdot)$
- $H_u$: potentiometric head of upper aquifer (m), $U(990, 1110)$
- $T_l$: transmissivity of lower aquifer (m²/yr), $U(63.1, 116)$
- $H_l$: potentiometric head of lower aquifer (m), $U(700, 820)$
- $L$: length of borehole (m), $U(1120, 1680)$
- $K_w$: hydraulic conductivity of borehole (m/yr), $U(9855, 12045)$

Polynomial approximation with degree $q = 8$. (Some distribution parameters and the test-set size were lost in the transcription.)
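A direct implementation of the Borehole model as a Python function; the formula is the standard benchmark form, and the mid-range input values in the usage line are illustrative:

```python
import numpy as np

def borehole(rw, r, Tu, Hu, Tl, Hl, L, Kw):
    """Water flow through the borehole (m^3/yr)."""
    ln_r = np.log(r / rw)
    num = 2.0 * np.pi * Tu * (Hu - Hl)
    den = ln_r * (1.0 + 2.0 * L * Tu / (ln_r * rw**2 * Kw) + Tu / Tl)
    return num / den

# evaluation at (roughly) mid-range values of the input distributions
val = borehole(rw=0.10, r=2200.0, Tu=89535.0, Hu=1050.0,
               Tl=89.55, Hl=760.0, L=1400.0, Kw=10950.0)
```

Such a cheap closed-form model is what makes the Borehole function a convenient benchmark: large training and test sets can be generated at negligible cost.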
Illustration: Borehole function

Test error for different ranks and for different sizes $K$ of the training set ($K = 100$, $1000$, $10000$); the numerical values of the table were lost in the transcription.

Finding the optimal rank is a combinatorial problem...
Heuristic strategy for rank adaptation (tree-based Tucker format)

Given a dimension tree $T \subset 2^{\{1,\dots,d\}}$, construct a sequence of approximations $u_m$ in tree-based Tucker format with increasing rank:
$u_m \in \{v : \mathrm{rank}_T(v) \le (r^m_\alpha)_{\alpha \in T}\}$.

At iteration $m$,
$r^{m+1}_\alpha = r^m_\alpha + 1$ if $\alpha \in T_m$, and $r^{m+1}_\alpha = r^m_\alpha$ if $\alpha \notin T_m$,
where $T_m \subset T$ is selected in order to obtain (hopefully) the fastest decrease of the error.

A possible strategy consists in computing the singular values $\sigma^\alpha_1 \ge \dots \ge \sigma^\alpha_{r^m_\alpha}$ of the $\alpha$-matricizations $\mathcal{M}_\alpha(u_m)$ of $u_m$ for all $\alpha \in T$:
$\mathcal{M}_\alpha(u_m) = \sum_{i=1}^{r^m_\alpha} \sigma^\alpha_i\, v^\alpha_i \otimes v^{\alpha^c}_i \in V_\alpha \otimes V_{\alpha^c}$,
with $\|u_m\|^2 = \sum_{i=1}^{r^m_\alpha} (\sigma^\alpha_i)^2$ for all $\alpha \in T$.

The smallest singular value $\sigma^\alpha_{r^m_\alpha}$ provides an estimation of an upper bound of the error committed by truncating the $\alpha$-rank of $u_m$ in $V_\alpha \otimes V_{\alpha^c}$.

Letting $0 \le \theta \le 1$, we choose
$T_m = \{\alpha \in T : \sigma^\alpha_{r^m_\alpha} \ge \theta \max_{\beta \in T} \sigma^\beta_{r^m_\beta}\}$.
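A sketch of this selection rule on a full tensor: compute the singular values of each alpha-matricization, then pick the nodes whose smallest singular value is within a factor theta of the largest one. The helper names and the dict-based interface are illustrative:

```python
import numpy as np

def alpha_singular_values(A, alpha):
    """Singular values of the alpha-matricization M_alpha(A) of a full tensor A."""
    comp = [mu for mu in range(A.ndim) if mu not in alpha]
    rows = int(np.prod([A.shape[mu] for mu in alpha]))
    M = np.transpose(A, list(alpha) + comp).reshape(rows, -1)
    return np.linalg.svd(M, compute_uv=False)

def select_nodes(smallest_sv, theta=0.5):
    """T_m = {alpha : sigma^alpha_{r_alpha} >= theta * max_beta sigma^beta_{r_beta}}."""
    s_max = max(smallest_sv.values())
    return {a for a, s in smallest_sv.items() if s >= theta * s_max}

# an elementary (rank-one) tensor has a single nonzero singular value
A = np.einsum('i,j,k->ijk', np.array([1.0, 2.0]),
              np.array([1.0, 1.0]), np.array([3.0, 0.0]))
s = alpha_singular_values(A, (0,))
chosen = select_nodes({(0,): 0.8, (1,): 0.1, (0, 1): 0.5}, theta=0.5)
```

In the learning setting the matricizations of $u_m$ are available from its low-rank representation, so the SVDs act on small matrices rather than on a full tensor.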
Illustration: Borehole function

Training set of size $K = 1000$. Rank and test error at each iteration of the rank-adaptive algorithm (the numerical values of the table were lost in the transcription).

The selected (anisotropic) rank gives a test error one order of magnitude better than the optimal isotropic rank $(r, r, \dots, r)$.
Illustration: Borehole function

Selection of optimal ranks for different training-set sizes $K$ ($K = 100$, $1000$, $10000$), in TT format and in canonical format (the numerical values of the tables were lost in the transcription).
Influence of the tree

Test error for different linear trees $T$, each associated with an ordering $\{\nu_1, \dots, \nu_d\}$ of the dimensions (training set of size $K = 50$). The optimal ranks and test errors for the trees $T_1, \dots, T_4$ were lost in the transcription.

Finding the optimal tree is a combinatorial problem...
Strategy for tree adaptation

Starting from an initial tree, we iterate the following two steps:

1. Run the algorithm with rank adaptation to compute an approximation $v$ associated with the current tree:
$v(x_1, \dots, x_d) = \sum_{i_1=1}^{r_1} \cdots \sum_{i_{d-1}=1}^{r_{d-1}} v^1_{1,i_1}(x_{\nu_1}) \cdots v^d_{i_{d-1},1}(x_{\nu_d})$

2. Run a tree optimization algorithm yielding an equivalent representation of $v$ (at the current precision),
$v(x_1, \dots, x_d) \approx \sum_{i_1=1}^{\tilde r_1} \cdots \sum_{i_{d-1}=1}^{\tilde r_{d-1}} \tilde v^1_{1,i_1}(x_{\tilde\nu_1}) \cdots \tilde v^d_{i_{d-1},1}(x_{\tilde\nu_d})$,
with reduced ranks $\{\tilde r_1, \dots, \tilde r_{d-1}\}$, where $\{\tilde\nu_1, \dots, \tilde\nu_d\}$ is a permutation of $\{\nu_1, \dots, \nu_d\}$.
Strategy for tree adaptation

Illustration with a training set of size $K = 50$. We run the algorithm for different initial trees; the original slide indicates in blue the permuted dimensions in the final tree. The optimal ranks and test errors for the initial and final trees were lost in the transcription.
Concluding remarks and on-going works

- For rank adaptation, possible use of constructive (greedy) algorithms for tree-based Tucker formats.
- Need for robust strategies for tree adaptation.
- Statistical dimension of low-rank subsets?
- Adaptive sampling strategies.
- Goal-oriented construction of low-rank approximations.
References and announcement

A. Nouy. Low-rank methods for high-dimensional approximation and model order reduction. arXiv preprint.

M. Chevreuil, R. Lebrun, A. Nouy, and P. Rai. A least-squares method for sparse low rank approximation of multivariate functions. SIAM/ASA Journal on Uncertainty Quantification, 3(1).

Open post-doc positions at Ecole Centrale Nantes (France): model order reduction for uncertainty quantification, high-dimensional approximation, low-rank tensor approximation, statistical learning.
More informationLecture 1: Introduction to low-rank tensor representation/approximation. Center for Uncertainty Quantification. Alexander Litvinenko
tifica Lecture 1: Introduction to low-rank tensor representation/approximation Alexander Litvinenko http://sri-uq.kaust.edu.sa/ KAUST Figure : KAUST campus, 5 years old, approx. 7000 people (include 1400
More informationQuantifying conformation fluctuation induced uncertainty in bio-molecular systems
Quantifying conformation fluctuation induced uncertainty in bio-molecular systems Guang Lin, Dept. of Mathematics & School of Mechanical Engineering, Purdue University Collaborative work with Huan Lei,
More informationSparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda
Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic
More informationInformation geometry for bivariate distribution control
Information geometry for bivariate distribution control C.T.J.Dodson + Hong Wang Mathematics + Control Systems Centre, University of Manchester Institute of Science and Technology Optimal control of stochastic
More informationConvex relaxation for Combinatorial Penalties
Convex relaxation for Combinatorial Penalties Guillaume Obozinski Equipe Imagine Laboratoire d Informatique Gaspard Monge Ecole des Ponts - ParisTech Joint work with Francis Bach Fête Parisienne in Computation,
More informationLow-rank tensor discretization for high-dimensional problems
Low-rank tensor discretization for high-dimensional problems Katharina Kormann August 6, 2017 1 Introduction Problems of high-dimensionality appear in many areas of science. High-dimensionality is usually
More informationA Block-Jacobi Algorithm for Non-Symmetric Joint Diagonalization of Matrices
A Block-Jacobi Algorithm for Non-Symmetric Joint Diagonalization of Matrices ao Shen and Martin Kleinsteuber Department of Electrical and Computer Engineering Technische Universität München, Germany {hao.shen,kleinsteuber}@tum.de
More informationLeast Squares Regression
E0 70 Machine Learning Lecture 4 Jan 7, 03) Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in the lecture. They are not a substitute
More informationGENERAL VECTOR SPACES AND SUBSPACES [4.1]
GENERAL VECTOR SPACES AND SUBSPACES [4.1] General vector spaces So far we have seen special spaces of vectors of n dimensions denoted by R n. It is possible to define more general vector spaces A vector
More informationSparse Approximation and Variable Selection
Sparse Approximation and Variable Selection Lorenzo Rosasco 9.520 Class 07 February 26, 2007 About this class Goal To introduce the problem of variable selection, discuss its connection to sparse approximation
More informationIntroduction to Compressed Sensing
Introduction to Compressed Sensing Alejandro Parada, Gonzalo Arce University of Delaware August 25, 2016 Motivation: Classical Sampling 1 Motivation: Classical Sampling Issues Some applications Radar Spectral
More informationScientific Computing: Dense Linear Systems
Scientific Computing: Dense Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course MATH-GA.2043 or CSCI-GA.2112, Spring 2012 February 9th, 2012 A. Donev (Courant Institute)
More informationNUMERICAL METHODS WITH TENSOR REPRESENTATIONS OF DATA
NUMERICAL METHODS WITH TENSOR REPRESENTATIONS OF DATA Institute of Numerical Mathematics of Russian Academy of Sciences eugene.tyrtyshnikov@gmail.com 2 June 2012 COLLABORATION MOSCOW: I.Oseledets, D.Savostyanov
More informationLeast Squares Regression
CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the
More informationLearning Multiple Tasks with a Sparse Matrix-Normal Penalty
Learning Multiple Tasks with a Sparse Matrix-Normal Penalty Yi Zhang and Jeff Schneider NIPS 2010 Presented by Esther Salazar Duke University March 25, 2011 E. Salazar (Reading group) March 25, 2011 1
More informationGreedy algorithms for high-dimensional non-symmetric problems
Greedy algorithms for high-dimensional non-symmetric problems V. Ehrlacher Joint work with E. Cancès et T. Lelièvre Financial support from Michelin is acknowledged. CERMICS, Ecole des Ponts ParisTech &
More informationMathematical Methods for Data Analysis
Mathematical Methods for Data Analysis Massimiliano Pontil Istituto Italiano di Tecnologia and Department of Computer Science University College London Massimiliano Pontil Mathematical Methods for Data
More informationApproximate Kernel Methods
Lecture 3 Approximate Kernel Methods Bharath K. Sriperumbudur Department of Statistics, Pennsylvania State University Machine Learning Summer School Tübingen, 207 Outline Motivating example Ridge regression
More informationAvailable Ph.D position in Big data processing using sparse tensor representations
Available Ph.D position in Big data processing using sparse tensor representations Research area: advanced mathematical methods applied to big data processing. Keywords: compressed sensing, tensor models,
More information9. Numerical linear algebra background
Convex Optimization Boyd & Vandenberghe 9. Numerical linear algebra background matrix structure and algorithm complexity solving linear equations with factored matrices LU, Cholesky, LDL T factorization
More informationLinear Methods for Regression. Lijun Zhang
Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived
More informationIntroduction to Sparsity. Xudong Cao, Jake Dreamtree & Jerry 04/05/2012
Introduction to Sparsity Xudong Cao, Jake Dreamtree & Jerry 04/05/2012 Outline Understanding Sparsity Total variation Compressed sensing(definition) Exact recovery with sparse prior(l 0 ) l 1 relaxation
More informationFrom Matrix to Tensor. Charles F. Van Loan
From Matrix to Tensor Charles F. Van Loan Department of Computer Science January 28, 2016 From Matrix to Tensor From Tensor To Matrix 1 / 68 What is a Tensor? Instead of just A(i, j) it s A(i, j, k) or
More informationTECHNISCHE UNIVERSITÄT BERLIN
TECHNISCHE UNIVERSITÄT BERLIN Adaptive Stochastic Galerkin FEM with Hierarchical Tensor Representations Martin Eigel Max Pfeffer Reinhold Schneider Preprint 2015/29 Preprint-Reihe des Instituts für Mathematik
More informationLinear vector spaces and subspaces.
Math 2051 W2008 Margo Kondratieva Week 1 Linear vector spaces and subspaces. Section 1.1 The notion of a linear vector space. For the purpose of these notes we regard (m 1)-matrices as m-dimensional vectors,
More information(Part 1) High-dimensional statistics May / 41
Theory for the Lasso Recall the linear model Y i = p j=1 β j X (j) i + ɛ i, i = 1,..., n, or, in matrix notation, Y = Xβ + ɛ, To simplify, we assume that the design X is fixed, and that ɛ is N (0, σ 2
More informationA Do It Yourself Guide to Linear Algebra
A Do It Yourself Guide to Linear Algebra Lecture Notes based on REUs, 2001-2010 Instructor: László Babai Notes compiled by Howard Liu 6-30-2010 1 Vector Spaces 1.1 Basics Definition 1.1.1. A vector space
More informationMean-field dual of cooperative reproduction
The mean-field dual of systems with cooperative reproduction joint with Tibor Mach (Prague) A. Sturm (Göttingen) Friday, July 6th, 2018 Poisson construction of Markov processes Let (X t ) t 0 be a continuous-time
More informationOptimization for Compressed Sensing
Optimization for Compressed Sensing Robert J. Vanderbei 2014 March 21 Dept. of Industrial & Systems Engineering University of Florida http://www.princeton.edu/ rvdb Lasso Regression The problem is to solve
More informationLecture 2: Linear Algebra Review
EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1
More informationStochastic methods for solving partial differential equations in high dimension
Stochastic methods for solving partial differential equations in high dimension Marie Billaud-Friess Joint work with : A. Macherey, A. Nouy & C. Prieur marie.billaud-friess@ec-nantes.fr Centrale Nantes,
More informationBayesian Decision and Bayesian Learning
Bayesian Decision and Bayesian Learning Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1 / 30 Bayes Rule p(x ω i
More informationDATA MINING AND MACHINE LEARNING
DATA MINING AND MACHINE LEARNING Lecture 5: Regularization and loss functions Lecturer: Simone Scardapane Academic Year 2016/2017 Table of contents Loss functions Loss functions for regression problems
More information1.1 Basis of Statistical Decision Theory
ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 1: Introduction Lecturer: Yihong Wu Scribe: AmirEmad Ghassami, Jan 21, 2016 [Ed. Jan 31] Outline: Introduction of
More informationEfficient Solvers for Stochastic Finite Element Saddle Point Problems
Efficient Solvers for Stochastic Finite Element Saddle Point Problems Catherine E. Powell c.powell@manchester.ac.uk School of Mathematics University of Manchester, UK Efficient Solvers for Stochastic Finite
More informationISyE 691 Data mining and analytics
ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)
More informationA Randomized Approach for Crowdsourcing in the Presence of Multiple Views
A Randomized Approach for Crowdsourcing in the Presence of Multiple Views Presenter: Yao Zhou joint work with: Jingrui He - 1 - Roadmap Motivation Proposed framework: M2VW Experimental results Conclusion
More informationStructured Statistical Learning with Support Vector Machine for Feature Selection and Prediction
Structured Statistical Learning with Support Vector Machine for Feature Selection and Prediction Yoonkyung Lee Department of Statistics The Ohio State University http://www.stat.ohio-state.edu/ yklee Predictive
More informationOverview of sparse system identification
Overview of sparse system identification J.-Ch. Loiseau 1 & Others 2, 3 1 Laboratoire DynFluid, Arts et Métiers ParisTech, France 2 LIMSI, Université d Orsay CNRS, France 3 University of Washington, Seattle,
More informationhttps://goo.gl/kfxweg KYOTO UNIVERSITY Statistical Machine Learning Theory Sparsity Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT OF INTELLIGENCE SCIENCE AND TECHNOLOGY 1 KYOTO UNIVERSITY Topics:
More information