On the Convergence of Alternating Least Squares Optimisation in Tensor Format Representations


PREPRINT, May 2015

On the Convergence of Alternating Least Squares Optimisation in Tensor Format Representations

Mike Espig*, Wolfgang Hackbusch†, Aram Khachatryan*

* RWTH Aachen University, Institut für Geometrie und Praktische Mathematik, Templergraben 55, 52062 Aachen, Germany. Address: RWTH Aachen University, Department of Mathematics, IGPM, Pontdriesch 14-16, 52062 Aachen, Germany. E-mail: mike.espig@alopax.de
† Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany

28th May 2015

Abstract. The approximation of tensors is important for the efficient numerical treatment of high-dimensional problems, but it remains an extremely challenging task. One of the most popular approaches to tensor approximation is the alternating least squares (ALS) method. In our study, the convergence of the alternating least squares algorithm is considered. The analysis is done for arbitrary tensor format representations and is based on the multilinearity of the tensor format. In tensor format representation techniques, tensors are approximated by multilinear combinations of objects of lower dimensionality. The resulting reduction of dimensionality not only reduces the amount of required storage but also the computational effort.

Keywords: tensor format, tensor representation, tensor network, alternating least squares optimisation, orthogonal projection method.
MSC: 15A69, 49M20, 65K05, 68W25, 90C26.

1 Introduction

During the last years, tensor format representation techniques were successfully applied to the solution of high-dimensional problems like stochastic and parametric partial differential equations [6, 11, 14, 20, 24, 26, 27]. With standard techniques it is impossible to store all entries of the discretised high-dimensional objects explicitly. The reason is that the computational complexity and the storage cost grow exponentially with the number of dimensions. Besides storage, one should also solve these high-dimensional problems in a reasonable (e.g. linear) time and obtain a solution in some compressed (low-rank/sparse) tensor format. Among other prominent problems, the efficient solution of linear systems is one of the most important tasks in scientific computing. We consider a minimisation problem on the tensor space V = ⊗_{ν=1}^d R^{m_ν} equipped with the Euclidean inner product ⟨·,·⟩. The objective function f : V → R of the optimisation task is quadratic,

f(v) := (1/‖b‖²) [ ½ ⟨Av, v⟩ − ⟨b, v⟩ ],   (1)

where A ∈ R^{(m_1⋯m_d) × (m_1⋯m_d)} is a positive definite matrix (A > 0, Aᵀ = A) and b ∈ V. A tensor u ∈ V is represented in a tensor format. A tensor format U : P_1 × ⋯ × P_L → V is a multilinear map from the cartesian
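For illustration only (this sketch is ours and not part of the original paper; the random test data and variable names are assumptions), the following NumPy snippet evaluates the normalised quadratic objective of Eq. (1) on a full tensor treated as a flat vector and checks that the exact solution A⁻¹b is a stationary point of f.

```python
import numpy as np

def f(v, A, b):
    """Quadratic objective f(v) = (1/||b||^2) * (0.5*<Av, v> - <b, v>), cf. Eq. (1)."""
    return (0.5 * v @ (A @ v) - b @ v) / (b @ b)

# Small full-format illustration: A symmetric positive definite, b arbitrary.
rng = np.random.default_rng(0)
n = 8
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # positive definite
b = rng.standard_normal(n)

v_star = np.linalg.solve(A, b)       # minimiser A^{-1} b
grad = (A @ v_star - b) / (b @ b)    # gradient of f at the minimiser
print(np.allclose(grad, 0))          # True: v_star is a stationary point of f
```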

product of parameter spaces P_1, …, P_L into the tensor space V. An L-tuple of vectors (p_1, …, p_L) ∈ P := P_1 × ⋯ × P_L is called a representation system of u if u = U(p_1, …, p_L). The precise definition of tensor format representations is given in Section 2. The solution A⁻¹b = argmin_{v ∈ V} f(v) is approximated by elements from the range set of the tensor format U, i.e. we are looking for a representation system (p*_1, …, p*_L) ∈ P such that for

F := f ∘ U : P_1 × ⋯ × P_L → R   (2)

with

F(p_1, …, p_L) = (1/‖b‖²) [ ½ ⟨A U(p_1, …, p_L), U(p_1, …, p_L)⟩ − ⟨b, U(p_1, …, p_L)⟩ ]

we have

F(p*_1, …, p*_L) = inf_{(p_1, …, p_L) ∈ P} F(p_1, …, p_L).

The alternating least squares (ALS) algorithm [2, 3, 10, 18, 21, 29, 31] is iteratively defined. Suppose that the k-th iterate p_k = (p^k_1, …, p^k_L) and the first µ − 1 components p^{k+1}_1, …, p^{k+1}_{µ−1} of the (k+1)-th iterate p_{k+1} have been determined. The basic step of the ALS algorithm is to compute the minimum norm solution

p^{k+1}_µ := argmin_{q_µ ∈ P_µ} F(p^{k+1}_1, …, p^{k+1}_{µ−1}, q_µ, p^k_{µ+1}, …, p^k_L).

Thus, in order to obtain p_{k+1} from p_k, we have to solve successively L ordinary least squares problems. The ALS algorithm is a nonlinear Gauss–Seidel method. The local convergence of the nonlinear Gauss–Seidel method to a stationary point p* ∈ P follows from the convergence of the linear Gauss–Seidel method applied to the Hessian F″(p*) at the limit point p*. If the linear Gauss–Seidel method converges R-linearly, then there exists a neighbourhood B(p*) of p* such that for every initial guess p_0 ∈ B(p*) the nonlinear Gauss–Seidel method converges R-linearly with the same rate as the linear Gauss–Seidel method. We refer the reader to Ortega and Rheinboldt for a description of the nonlinear Gauss–Seidel method [28, Section 7.4] and its convergence analysis [28, Thm. 10.3.5, Thm. 10.3.4, and Thm. 10.1.3]. A representation system of a represented tensor is not unique, since the tensor representation U is multilinear. Consequently, the matrix F″(p*) is not positive definite. Therefore, convergence of the linear Gauss–Seidel method is in general not ensured. However, if the Hessian matrix at p* is positive semidefinite, then the linear Gauss–Seidel method still converges for sequences orthogonal to the kernel of F″(p*), see e.g. [19, 23]. Under useful assumptions on the null space of F″(p*), Uschmajew et al. [33, 36] showed local convergence of the ALS method. These assumptions are related to the non-uniqueness of a representation system and are meaningful in the context of a nonlinear Gauss–Seidel method. However, for tensor format representations the assumptions are not true in general, see the counterexample of Mohlenkamp [25, Section 2.5] and the discussion in [36, Section 3.4].

The current analysis is not based on the mathematical techniques developed for the nonlinear Gauss–Seidel method, but on the multilinearity of the tensor representation U. This fact is in contrast to previous works. The present article is partially related to the study by Mohlenkamp [25]. For example, the statement of Lemma 4.14 is already described there for the canonical tensor format. Section 2 contains a unified mathematical description of tensor formats. The relation between an orthogonal projection method and the ALS algorithm is explained in Section 3. The convergence of the ALS method is analysed in Section 4, where we consider global convergence. Further, the rate of convergence is described in detail, and explicit examples for all kinds of convergence rates are given: the ALS method can, for all tensor formats of practical interest, converge sublinearly, Q-linearly, and even Q-superlinearly. We illustrate our theoretical results on numerical examples in Section 5. We refer the reader to [28] for details concerning convergence speed.

2 Unified Description of Tensor Format Representations

A tensor format representation for tensors in V is described by a parameter space P = ×_{µ=1}^L P_µ and a multilinear map U : P → V from the parameter space into the tensor space. For the numerical treatment of high-dimensional problems by means of tensor formats it is essential to distinguish between a tensor u ∈ V and a representation system p ∈ P of u, where u = U(p). The data size of a representation system is often proportional to d. Thanks to the multilinearity of U, the numerical cost of standard operations like matrix-vector multiplication, addition, and computation of scalar products is also proportional to d, see e.g. [10, 15, 17, 30, 32].

Notation 2.1 (N_n). The set N_n of natural numbers not larger than n ∈ N is denoted by N_n := {j ∈ N : 1 ≤ j ≤ n}.

Definition 2.2 (Parameter Space, Tensor Format Representation, Representation System). Let L ≥ d and, for µ ∈ N_L, let P_µ be a finite dimensional vector space equipped with an inner product ⟨·,·⟩_{P_µ}. The parameter space P is the following cartesian product:

P := ×_{µ=1}^L P_µ.   (3)

A multilinear map U from the parameter space P into the tensor space V is called a tensor format representation,

U : ×_{µ=1}^L P_µ → ⊗_{ν=1}^d R^{m_ν}.   (4)

We say u ∈ V is represented in the tensor format representation U if u ∈ range U. A tuple (p_1, …, p_L) ∈ P is called a representation system of u if u = U(p_1, …, p_L).

Remark 2.3. Due to the multilinearity of U, a representation system of a given tensor u ∈ range(U) is not uniquely determined.

Example 2.4. For the canonical tensor format representation with r terms we have L = d and P_µ = R^{m_µ × r}. The canonical tensor format representation with r terms is the following multilinear map:

U_CF : ×_{µ=1}^d R^{m_µ × r} → V,   (p_1, …, p_d) ↦ U_CF(p_1, …, p_d) := Σ_{j=1}^r ⊗_{µ=1}^d p_{µ,j},

where p_{µ,j} denotes the j-th column of the matrix p_µ ∈ R^{m_µ × r}. For recent algorithms in the canonical tensor format we refer to [7, 8, 9, 10, 12]. The tensor train (TT) format representation discussed in [30] is, for d = 3 and representation ranks r_1, r_2 ∈ N, defined by the multilinear map

U_TT : R^{m_1 × r_1} × R^{m_2 × r_1 × r_2} × R^{m_3 × r_2} → R^{m_1} ⊗ R^{m_2} ⊗ R^{m_3},
(p_1, p_2, p_3) ↦ U_TT(p_1, p_2, p_3) := Σ_{i=1}^{r_1} Σ_{j=1}^{r_2} p_{1,i} ⊗ p_{2,i,j} ⊗ p_{3,j}.
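As a concrete illustration of Definition 2.2 and Example 2.4, the following NumPy sketch (our own illustrative code; the function and variable names are not taken from the paper) realises the canonical map U_CF and the three-dimensional TT map U_TT and verifies multilinearity numerically in the first argument.

```python
import numpy as np

def U_cf(factors):
    """Canonical format U_CF(p_1,...,p_d) = sum_j p_{1,j} x ... x p_{d,j}.
    Each p_mu is an (m_mu x r) matrix; column j holds the j-th elementary factor."""
    r = factors[0].shape[1]
    full = 0
    for j in range(r):
        term = factors[0][:, j]
        for p in factors[1:]:
            term = np.tensordot(term, p[:, j], axes=0)   # outer product, mode by mode
        full = full + term
    return full

def U_tt(p1, p2, p3):
    """TT format for d = 3: sum_{i,j} p_{1,i} x p_{2,i,j} x p_{3,j},
    with p1 in R^{m1 x r1}, p2 in R^{m2 x r1 x r2}, p3 in R^{m3 x r2}."""
    return np.einsum('ai,bij,cj->abc', p1, p2, p3)

rng = np.random.default_rng(1)
m, r1, r2 = 4, 2, 3
p1, q1 = rng.standard_normal((m, r1)), rng.standard_normal((m, r1))
p2 = rng.standard_normal((m, r1, r2))
p3 = rng.standard_normal((m, r2))

# Multilinearity in the first argument: U(a*p1 + q1, p2, p3) = a*U(p1, p2, p3) + U(q1, p2, p3).
a = 0.7
print(np.allclose(U_tt(a * p1 + q1, p2, p3), a * U_tt(p1, p2, p3) + U_tt(q1, p2, p3)))  # True

# The canonical map agrees with a direct sum of outer products.
print(np.allclose(U_cf([p1, p1, p1]), np.einsum('ai,bi,ci->abc', p1, p1, p1)))          # True
```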

3 Orthogonal Projection Method and Alternating Least Squares Algorithm

It is shown in the following that the ALS algorithm is an orthogonal projection method on subspaces of V = ⊗_{ν=1}^d R^{m_ν}. For a better understanding, we briefly repeat the description of projection methods; see e.g. [4, 34] for a detailed description. An orthogonal projection method for solving the linear system Av = b is defined by means of a sequence (K_k)_{k∈N} of subspaces of V and the construction of a sequence (v_k)_{k∈N} ⊂ V such that v_{k+1} ∈ K_k and r_{k+1} = b − A v_{k+1} ⊥ K_k. A prototype of a projection method is given in Algorithm 1.

Algorithm 1 Prototype Projection Method
1: while Stop Condition do
2:   Compute an orthonormal basis V_k = [u^k_1, …, u^k_{m_k}] of K_k
3:   r_k := b − A v_k
4:   v_{k+1} = v_k + V_k (V_kᵀ A V_k)⁻¹ V_kᵀ r_k
5:   k ← k + 1
6: end while

Notation 3.1 (L(A, B)). Let A, B be two arbitrary vector spaces. The vector space of linear maps from A to B is denoted by L(A, B) := {ϕ : A → B : ϕ is linear}.

In the following, let U : P → V be a tensor format representation, see Definition 2.2. We need to define subspaces of V in order to show that the ALS algorithm is an orthogonal projection method. The multilinearity of U and the special form of the ALS micro-step are important for the definition of these subspaces. Let µ ∈ N_L and let v ∈ V be a tensor represented in the tensor format U, i.e. there is (p_1, …, p_{µ−1}, p_µ, p_{µ+1}, …, p_L) ∈ P such that v = U(p_1, …, p_{µ−1}, p_µ, p_{µ+1}, …, p_L). Since the tensor format representation U is multilinear, we can define a linear map W_µ(p_1, …, p_{µ−1}, p_{µ+1}, …, p_L) ∈ L(P_µ, V) such that v = W_µ(p_1, …, p_{µ−1}, p_{µ+1}, …, p_L) p_µ. The map W_µ depends multilinearly on the parameters p_1, …, p_{µ−1}, p_{µ+1}, …, p_L. The linear subspace range(W_µ(p_1, …, p_{µ−1}, p_{µ+1}, …, p_L)) ⊆ V is of great importance for the ALS method. For the rest of the article, we identify linear maps with their canonical matrix representations.

Definition 3.2. Let µ ∈ N_L. For a given representation system p = (p_1, …, p_L) ∈ P we write p_[µ] := (p_1, …, p_{µ−1}, p_{µ+1}, …, p_L) and define

W_{µ, p_[µ]} : P_µ → V,   p̃_µ ↦ W_{µ, p_[µ]} p̃_µ := U(p_1, …, p_{µ−1}, p̃_µ, p_{µ+1}, …, p_L).   (5)

We simply write W_µ for W_{µ, p_[µ]}, i.e. W_µ := W_{µ, p_[µ]}, if it is clear from the context which representation system is considered.

Proposition 3.3. Let µ ∈ N_L and p = (p_1, …, p_L) ∈ P. The following holds:

(i) W_{µ, p_[µ]} is a linear map and range(W_{µ, p_[µ]}) is a linear subspace of V.
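A minimal NumPy sketch of one step of Algorithm 1 (our own illustration, with made-up test data) shows the defining property of an orthogonal projection method: the new residual is orthogonal to the subspace K_k.

```python
import numpy as np

def projection_step(v, A, b, Vk):
    """One step of the prototype projection method (Algorithm 1):
    given an orthonormal basis Vk (columns) of the subspace K_k,
    return v_{k+1} = v_k + Vk (Vk^T A Vk)^{-1} Vk^T r_k with r_k = b - A v_k."""
    r = b - A @ v
    Ak = Vk.T @ A @ Vk                                  # projected (Ritz) matrix
    return v + Vk @ np.linalg.solve(Ak, Vk.T @ r)

rng = np.random.default_rng(2)
n, m = 10, 3
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                             # positive definite
b = rng.standard_normal(n)
v = rng.standard_normal(n)
Vk, _ = np.linalg.qr(rng.standard_normal((n, m)))       # orthonormal basis of a random subspace
v1 = projection_step(v, A, b, Vk)
print(np.allclose(Vk.T @ (b - A @ v1), 0))              # True: r_{k+1} is orthogonal to K_k
```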

6 ( ) (ii) We have rank W µ, p [µ] dim(p µ ). ( ) ( ) (iii) range W µ, p [µ] range(u), i.e. for all v range W µ, p [µ] there exist p µ P µ such that v = U(p,..., p µ, p µ, p µ+,..., p L ). (iv) Set H µ := W T W µ, p [µ] µ, p [µ] R dim Pµ dim Pµ and let H µ = ŨµD µ Ũµ T be the diagonalisation of the square matrix H µ, where D µ = diag(δ i,µ ) i=,...,dim Pµ with δ,µ δ 2,µ δ dim Pµ,µ. Define further D µ = diag( δ i,µ ) i=,...,rank(w. Then the columns of µ, p [µ]) ( ) build an orthonormal basis of range W µ, p [µ] and V µ p µ = W µ, p [µ] V µ := W µ, p [µ]ũµ D 2 µ (6) ( ) Ũ µ D 2 µ p µ = U(p,..., p µ, Ũµ D 2 µ p µ, p µ+,..., p L ). (7) (v) The map is multilinear. W µ : P P µ P µ+..., P L L(P µ, V) p µ W µ, p [µ] Proof. Note that W µ, p [µ] is linear, since the tensor format U is multilinear. The rest of the assertions follows after short calculations, where the last assertion (v) is a direct consequence of the multilinearity of U. Remark 3.4. In chemistry the definition of V µ in Proposition 3.3 (iv) is often called Löwdin transformation, see [35, Section 3.4.5]. Nevertheless, the construction can be found in several proofs for the existence of the singular value decomposition, see e.g. [6, Lemma 2.9]. Definition 3.5. Let µ N L, p = (p,..., p L ) P, and F : P R as defined in Eq. (2). We define F µ,p [µ] : P µ R (8) p µ F µ,p [µ]( p µ ) := F (p,..., p µ, p µ, p µ+,..., p L ). We write for convenience F µ := F µ,p [µ] if it is clear from the context which representation system is considered. Lemma 3.6. Let µ N L and p = (p,..., p L ) P. We have (i) F µ(q µ ) = W T µ (b AW µ q µ ), (ii) (V T µ AV µ ) V T µ b = argmin qµ P µ F µ (q µ ), where V µ is defined in Eq. (6). Proof. (i): Let q µ P µ. We have f(w µ q µ ) = F µ (q µ ) for all µ N L and F µ (q µ ) = [ ] b 2 2 AW µq µ, W µ q µ b, W µ q µ = [ ] W T b 2 2 µ AW µ q µ, q µ W T µ b, q µ. 5

7 Since Wµ T AW µ is symmetric, we have F µ(q µ ) = Wµ T AW µ q µ Wµ T b = Wµ T (b AW µ q µ ). (ii): For V µ q µ range(w µ ) we can write f(v µ q µ ) = [ ] V T b 2 2 µ AV µ q µ, q µ V T µ b, q µ. Since V µ is a basis of range(w µ ), we have that V T µ AV µ is positive definite and therefore p µ = argmin qµ P µ F µ (q µ ) V T µ AV µ p µ V T µ b = p µ = (V T µ AV µ ) V T µ b. Theorem 3.7. Let µ N L and p = (p,..., p L ) P. We have where V µ is from Eq. (6). p µ = argmin qµ P µ F µ (q µ ) b A V µ p µ rangew µ, Proof. Follows from Lemma 3.6 and orthogonal projection theorem. Algorithm 2 Alternating Least Squares (ALS) Algorithm : Set k := and choose an initial guess p = (p,..., p L ) P, p, := p, and v := U(p ). 2: while Stop Condition do 3: v k, := v k 4: for µ L do 5: Compute an orthonormal basis V of the range space of W := W [µ] µ,p k, µ 6: p k+, see e.g. Eq. (5) and (6). µ := Ũk, µ D 2 k, µ (V T AV ) D 2 k, µũ T T Wµ (p k+,..., p k+ µ }{{}, pk µ+,..., p k L)b (9) G + k, µ := p + := (p k+,..., p k+ µ, pk+ µ, p k µ+,..., p k L) r := b Av k, µ v + = v + V (V T ) V T = V (V T ) V T + ) where Ũk, µ D 2 k, µ is from Eq. (6) 7: end for 8: p k+ := p k,l and v k+ := U(p k+ ) 9: k k + : end while Remark 3.8. From the definition of p k+ µ, it follows directly that p k+ µ kernel(w µ,k ) and p k+ µ is the vector with smallest norm that fulfils the normal equation G p k+ µ = W T b. This is very important for the convergence analysis of the ALS method and we like to point out that our results are based on this condition. We must give special attention in a correct implementation of an ALS micro-step in order to fulfil this essential property. 6
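The following sketch of a single ALS micro-step is our own illustration for the canonical format with small mode sizes (it builds W_µ as an explicit matrix, which is only feasible for tiny examples); it is not the implementation used by the authors. The essential point of Remark 3.8 is reflected in the use of the pseudoinverse, which returns the minimum-norm solution of the possibly singular normal equations.

```python
import numpy as np

def elementary(vectors):
    """Flat vector of the elementary (rank-one) tensor of the given vectors."""
    t = vectors[0]
    for v in vectors[1:]:
        t = np.tensordot(t, v, axes=0)
    return t.ravel()

def build_W(factors, mu):
    """Explicit matrix of the linear map W_mu (Definition 3.2) for the canonical format.
    Column (j, i) is the elementary tensor with the unit vector e_i placed in column j
    of the mu-th factor; vec(q) is ordered as (j, i) -> j*m_mu + i."""
    d = len(factors)
    m_mu, r = factors[mu].shape
    cols = []
    for j in range(r):
        for i in range(m_mu):
            vecs = [factors[nu][:, j] if nu != mu else np.eye(m_mu)[:, i] for nu in range(d)]
            cols.append(elementary(vecs))
    return np.column_stack(cols)

def als_micro_step(factors, mu, A, b_vec):
    """One ALS micro-step: replace the mu-th factor by the minimum-norm solution of the
    normal equations W^T A W q = W^T b (cf. Remark 3.8); A acts on vectorised tensors."""
    W = build_W(factors, mu)
    G = W.T @ A @ W
    q = np.linalg.pinv(G) @ (W.T @ b_vec)        # minimum-norm least-squares solution
    m_mu, r = factors[mu].shape
    factors[mu] = q.reshape(r, m_mu).T           # undo the (j, i) ordering
    return factors

# Toy run: a few ALS sweeps on a random rank-2 right-hand side with A = identity.
rng = np.random.default_rng(3)
d, m, r = 3, 4, 2
b_vec = elementary([rng.standard_normal(m) for _ in range(d)]) \
      + elementary([rng.standard_normal(m) for _ in range(d)])
A = np.eye(m ** d)
factors = [rng.standard_normal((m, r)) for _ in range(d)]
for sweep in range(20):
    for mu in range(d):
        factors = als_micro_step(factors, mu, A, b_vec)
approx = sum(elementary([factors[nu][:, j] for nu in range(d)]) for j in range(r))
print(np.linalg.norm(b_vec - approx))            # final residual after the ALS sweeps
```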

Figure 1: Graphical illustration of an ALS micro-step for the case A = id. At the current iteration step, we define the linear map W_{k,µ} ∈ L(P_µ, V) by means of v_{k,µ−1} and the multilinearity of U, cf. Definition 3.2. The successor v_{k,µ} is then the best approximation of b on the subspace range(W_{k,µ}) ⊆ V.

4 Convergence Analysis

We consider global convergence of the ALS method. The convergence analysis for an arbitrary tensor format representation U : ×_{µ=1}^L P_µ → V is a quite challenging task. The objective function F from Eq. (2) is highly nonlinear. Even the existence of a minimum is in general not ensured, see [5] and [22]. We need further assumptions on the sequence from the ALS method. In order to justify our assumptions, let us study an example from Lim and de Silva [5], where it is shown that the tensor

b = x ⊗ x ⊗ y + x ⊗ y ⊗ x + y ⊗ x ⊗ x

with tensor rank 3 has no best tensor rank 2 approximation. Lim and de Silva explained this by constructing a sequence (v_k)_{k∈N} of rank 2 tensors

v_k = (x + (1/k) y) ⊗ (x + (1/k) y) ⊗ (k x + y) − x ⊗ x ⊗ (k x)  →  b   (k → ∞).

The linear map W_{1,k} from Definition 3.2 and the first component vector p^k_1 of the parameter system have the following form:

9 W,k = p k = (x + k ) y (x + k ) y, x x Id R n, }{{} column vectors of the matrix ( ) ( ) (kx + y) + kx. It is easy to verify that the equation W,k p k = v k holds. Furthermore, we have lim k pk =, W = lim W,k = ( x x x x ) Id R n. k Obviously, the rank of W is equal to n but rank(w,k ) = 2n for all k N. This example shows already that we need assumptions on the boundedness of the parameter system and on the dimension of the subspace span(w µ,k ). Definition 4. (Critical Points). The set M of critical points is defined by M := { v V : p P : v = U(p) F (p) = }. () In our context, critical points are tensors that can be represented in our tensor format U and there exists a parameter system p such that (f U) (p) =, i.e. p is a stationary point of F = f U. A representation system of a tensor v = U(p) is never uniquely defined since the tensor format is a multilinear map. The following remark shows that the non uniqueness of a parameter system has even more subtle effects, in particular when the parameter system of v = U(p) is also a stationary point of F. Remark 4.2. In general, v M does not imply F (ˆp) = for any parameter system ˆp of v, i.e. there exist a tensor format Ũ and two different p, ˆp P such that Ũ(p) = Ũ(ˆp) and = F (p) F (ˆp). Proof. Let Ũ : R 2 R 2 R 2 R 2 R 4 (x, y) Obviously, Ũ is a bilinear map. Further, let b = and e 2 the canonical vectors in R 2, i.e. Then the following holds e = ( Ũ(x, y) := x y + x 2 y x y + x 2 y x y 2 x 2 y 2., A = Id in the definition of F from Eq. (2), and e ) (, e 2 = ). 8

10 a) Ũ(e, e ) = Ũ(e 2, e ), b) F (e, e ) =, c) F (e 2, e ). Elementary calculations result in Ũ(e, e ) = Ũ(e 2, e ) =. The definition of F from Eq. (2) gives Then and F (x, y) = 3 ( 2 (2(y (x + x 2 )) 2 + (x y 2 ) 2 + (x 2 y 2 ) 2 ) 2(x + x 2 )y x 2 y 2 ). F x(x, e ) = F (x, e ) = 3 ((x + x 2 ) 2 2(x + x 2 )), F (e, y) = 3 (y2 + 2 y2 2 2y ) F (e 2, y) = 3 (y2 + 2 y2 2 2y y 2 ). ( 2(x +x 2 ) 3 2(x +x 2 ) 3 One verifies that F (e, e ) = ), F y(e, y) = and F (e 2, e ) = ( 2(y ) 3 y ) ( 2(y ), F y(e 2, y) = 3 y 2 3 ). For a convenient understanding, let us briefly repeat the notations from the ALS method, see Algorithm 2. Let µ N L {}, k N, and p = (p k+,..., p k+ µ, pk µ,... p k L) P, () v = U(p ) = U(p k+,..., p k+ µ, pk µ,... p k L) V be the elements of the sequences (p ) k N and (v ) k N from the ALS algorithm. Note that p k = p k, = (p k,..., pk L ) and v k = U(p k ). Definition 4.3 (A(v k )). The set of accumulation points of (v k ) k N is denoted by A(v k ), i.e. A(v k ) := {v V : v is an accumulation point of (v k ) k N }. (2) We demonstrate in Theorem 4.3 that every accumulation point of (v k ) k N is a critical point, i.e. A(v k ) M. This is an existence statement on the parameter space P. Lemma 4.5 shows us a candidate for such a parameter system. 9

11 Remark 4.4. Obviously, if the sequence of parameter (p k ) k N is bounded, then the set of accumulation points of (p k ) k N is not empty. Consequently, the set A(v k ) is not empty, since the tensor format U is a continuous map. Lemma 4.5. Let the sequence (p k ) k N from the ALS method be bounded and define for J N the following set of accumulation points: A J := L µ= There exists p = (p,..., p L ) A J such that {p P : p is an accumulation point of (p ) k J }. p = min p A J p. Proof. Since the sequence (p k ) k N is bounded, it follows from the definition in Eq. () that (p ) k N is } also bounded. Therefore the set {p P : p is an accumulation point of (p ) k J is not empty and compact. Hence A is a compact and non-empty set. We are now ready to establish our main assumptions on the sequence from the ALS method. Assumption 4.6. During the article, we say that (p k ) k N satisfies assumption A or assumption A2 if the following holds true: (A:) The sequence (p k ) k N is bounded. (A2:) The sequence (p k ) k N is bounded and for J N we have µ N L : k J : k J : k k rank(w ) = rank(w µ), (3) where p = (p,..., p L ) A J is a accumulation point form Lemma 4.5 and W = W µ (p k+,... p k+ µ, pk µ+,..., p k L), Wµ = W µ (p,... p µ, p µ+,..., p L). Remark 4.7. In the proof of Theorem 4.3, assumption A2 ensures that the ALS method depends continuously on the parameter system p, i.e. G + W T b k G+ µ W T µ b for a convergent subsequence (p ) k J. Using the notations and definitions from Section 3, we define further for k N and µ N L. A := V T AV, (4) For the ALS method there is an explicit formula for the decay of the values between f(v k, µ+ ) and f(v k, µ ). The relation between the function values from Eq. (5) is crucial for the convergence analysis of the ALS method. Lemma 4.8. Let k N, µ N L. We have f(v k, µ+ ) f(v ) = V A V T r, r 2 b 2 (5)

12 Proof. From Algorithm 2, we have that v + = v +, where := V A V T r. Elementary calculations give f(v + ) = b 2 [A(v + ), v + b, v + ] = f(v ) + r k, µ, k, µ + 2 A k, µ, k, µ b 2 V A V T r k, µ, r k, µ + 2 AV A V T r, V A V T r = f(v ) + V A V T r k, µ, r k, µ + 2 = f(v ) + b 2 = f(v ) 2 b 2 V A V T r k, µ, r k, µ. b 2 V A V T r k, µ, r k, µ Corollary 4.9. (f(v k )) k N R is a descending sequence and there exists α R such that f(v k ) k α. Proof. Let k N and µ N L. From Lemma 4.8 it follows that f(v k+ ) f(v k ) = f(v k,l ) f(v k, ) = = 2 b 2 L µ= L f(v k, µ ) f(v ) µ= A V T r, V T r, since the matrices A are positive definite. This shows that (f(v k)) k N R is a descending sequence. The sequence of function values (f(v k )) k N is bounded from below, since the matrix A in the definition of f is positive definite. Therefore, there exists an α R such that f(v k ) α. k Lemma 4.. Let (v ) k N,µ NL V be the sequence from Algorithm 2. We have f(v ) = 2 b 2 v, b = 2 b 2 v 2 A (6) for all k N, µ N L, where v, w A := Av, w and v A := v, v A. Proof. Let k N and µ N L. We have ( ) v, v A = AV V T ( ) AV V T b, V V T AV V T = V T b, ( V T AV ) V T b = v, b. The rest follows from the definition of f, see Eq. (). b Corollary 4.. Let (v ) k N,µ NL V be the sequence of represented tensors from the ALS algorithm. The following holds:

13 (a) f(v + ) f(v ), (b) v + 2 A v 2 A, (c) v +, b v, b. Proof. Follows from Lemma 4.9 and Lemma 4.. Lemma 4.2. Let the sequence (p k ) k N P fulfil the assumption A. Then we have max F µ(p k µ) µ L k. Proof. According to Lemma 3.6 and Lemma 4.8, we have f(v k ) f(v k+ ) = = = L f(v ) f(v k, µ ) = L 2 b 2 µ= µ= L ( 2 b 2 A Ũ D 2 kµ 2 b 2 µ= L µ= A ) T W T r, A V T r, V T r ( ) T Ũ D 2 kµ W T r ( ) T ( ) T Ũ D 2 kµ F µ(p k µ), Ũ D 2 kµ F µ(p k µ) L 2 b 2 λ min (A )λ min (Ũ D kµ ) Ũ T F µ (p k µ) µ= L 2λ max (A) b 2 µ= λ min (Ũ D 2 kµ ) Ũ T F µ (p k µ) 2, where V = W Ũ D 2 kµ is from Eq. (6). In the last estimate, we have used that the Ritz values are bounded by the smallest and largest eigenvalue of A, i.e λ min (A) λ min (A ) λ max (A ) λ max (A). Since the tensor format U is continues and the sequence (p k ) k N is bounded, it follows from the theorem of ) Gershgorin and Cauchy-Schwarz inequality that there is γ > such that λ max (W T W γ, recall that W T W = ŨD Ũ T, see Proposition 3.3. Therefore, we have f(v k ) f(v k+ ) 2λ max (A)γ b 2 L F µ(p k µ) 2 µ= Further, it follows from Corollary 4.9 that = lim f(vk ) f(v k+ ) = lim k 2λ max (A)γ b 2 max k µ L F µ(p k µ) max µ L. F µ(p k µ) 2. Theorem 4.3. Let (v k ) k N be the sequence of represented tensors and suppose that the sequence of parameter (p k ) k N P from the ALS method fulfils assumption A2. Every accumulation point of (v k ) k N is a critical point, i.e. A(v k ) M. Further, we have dist (v k, M) k. 2

14 Proof. Let v A(v k ) be an accumulation point and (v k ) k J N U(P ) a subsequence in the range set of the tensor format U with v k v. Then there exists (p k) k J N P with v k = U(p k ) for all k J. Further, k let µ N L and define for all k J g := (p k+,..., p k+ µ, p k µ+,..., p k L) P. Let µ N L and (g ) k J J with g p := argmax p AJ p P, see Lemma 4.5. Without loss k of generality, let us assume that µ =. This assumption makes the the notations not more complicated then necessary. Since (g ) k J is bounded, there exists p [µ] P and a corresponding subsequence (g ) k Jµ J such that p k = g k, = (p k, p k 2,..., p k L) k p = (p, p 2,..., p L ), g k, = (p k+, p k 2,..., p k L) k p[] = ( p, p 2,..., p L ), g = (p k+,..., p k+ µ, p k µ+,..., p k L) k p[µ] = ( p,..., p µ, p µ+,..., p L ), g k,l = (p k+,..., p k+ L ) k p[l] = ( p,..., p L ). From Lemma 4.5 and U(p k ) k v it follows that v = lim k U(g ) = U(p[µ] ) f.a. µ N L, (7) where k J. Furthermore, we have To show this, assume that M := p = p [] = = p [L]. { µ N L : p [µ] p } and define ν := min M N L. From assumption A2 it follows p k+ ν = G + k,ν W T k,ν b k G+ ν W T ν b = p ν. Thus we have in particular that p ν kernelw ν, see the Definition of G + ν and Proposition 3.3. Since U(p ) = U(p [ν] ) W ν p ν = W ν p ν, it follows further that δ ν := p ν p ν kernelw ν and p ν 2 = p ν 2 + δ ν 2. Lemma 3.6 and Lemma 4.2 show that From the definition of ν, we have then note that for ν = min M W T k,ν (AW k,νp k ν b) k. W T ν (AW ν p ν b) =, g k,ν = (p k+,..., p k+ ν, p k ν+,..., p k L) k p[ν] = (p,..., p ν, p ν, p ν+,..., p L ) holds. Since p ν = G + ν W T ν b and p = argmin p AJ p, it follows p ν = p ν. Hence, we have p ν = p ν, because p ν 2 = p ν 2 + δ ν 2 implies then δ ν =. But p ν = p ν contradicts the definition of ν = min M. Consequently, we have p = p [] = = p [L]. 3

15 From Eq. (7) and the definition of p k+ µ it follows then and v = U(p ) = lim k F µ(g ) = F µ(p ) for all µ N L, i.e. A(v k ) M. Now, let δ k = inf v M v k v and suppose that there exists a subsequence (δ k ) k J N with lim k δ k = δ R +. Then (v k ) k J has a convergent subsequence. Since this subsequence must have its limit point in M, it follows that δ =. Which proves dist (v k, M) k by contradiction. Lemma 4.4. Let (v k ) k N V be the sequence of represented tensors from the ALS method. It holds Proof. Let k N. We have v k+ v k 2 L A = v v µ= v k+ v k A k. 2 A L v v A µ= Since v + v = V A V T r, it follows further v + v 2 A = V A V T r 2 = AV A A = A V T r, V T AV A V T = V A V T r, r. Combining this with Eq. (5) and (8) gives V T r 2 L L v + v 2 A.(8) µ= r, V A V T r = A V T r, V T r L v k+ v k 2 A 2L b 2 (f(v ) f(v + )) = 2L b 2 (f(v k ) f(v k+ )). µ= From Corollary 4.9 it follows (f(v k ) f(v k )) k. Therefore, we have v k+ v k A k. Lemma 4.5. Let (v k ) k N V be the sequence of tensors from the ALS method and v V with lim k v k = v. Further, let µ N L and (v ) k N V as defined in Algorithm 2. We have lim v = v for all µ N L. k Proof. Define v k, := v k (like in Algorithmus 2) and assume that M := { } µ N L : lim v v. k 4

16 Furthermore, set µ := min M N L. From Lemma 4.4 it follows v v A v v µ,k A + v v A k. But this contradicts the definition of µ. In the following, the dimension of the tensor space V = d µ= Rmµ is denoted by N N, i.e. N = d µ= m µ. The statement of Lemma 4.2 delivers an explicit recursion formula for the tangent of the angle between iteration points and an arbitrary tensor. This result is important for the rate of convergence of the ALS algorithm. Lemma 4.6. Let µ, ν N L, ν µ, and p = (p,..., p ν,..., p µ,..., p L ) P. There exists a multilinear map M µ,ν : P P ν P ν+ P µ P µ+ P L V L(P ν, P µ ) such that W T µ (p,..., g ν,..., p µ, p µ+,..., p L )b = M µ,ν (p,..., p ν, p ν+,..., p µ, p µ+,..., p L, b)g ν (9) for all g ν P ν. Moreover, we have M µ,ν (p,..., p ν, p ν+,..., p µ, p µ+,..., p L, b) = M T ν,µ(p,..., p ν, p ν+,..., p µ, p µ+,..., p L, b). Proof. Follows form Proposition 3.3 (v) and definition of W T µ (p,..., g ν,..., p µ, p µ+,..., p L )b. Corollary 4.7. Let µ N L, k 2, and p = (p k+,..., p k+ µ, pk µ, p k µ+,..., pk L ) form Algorithm 2. There exists a multilinear map M µ : P P µ 2 P µ+ P L V L(P µ, P µ ) such that p k+ µ = G + k, µ M µ(p k+,..., p k+ µ 2, pk µ+,..., p k L, b)p k+ µ, (2) p k+ µ = G + k, µ M T µ (p k+,..., p k+ µ 2, pk µ+,..., p k L, b)p k µ, (2) i.e. p k+ µ = G + k, µ M µ(p k+,..., p k+ µ 2, pk µ+,..., p k L, b) G + k, µ M T µ (p k+,..., p k+ µ 2, pk µ+,..., p k L, b)p k µ, where G + k, µ and G+ k, µ are defined in Algorithm 2. Proof. Follows form Eq. (9) and Lemma 4.6. The following example shows a concrete realisation of the matrix M µ for the tensor rank-one approximation problem. Example 4.8. The approximation of b V by a rank one tensor is considered. Let v k = p k pk 2... pk d and t t d d b = b µ,iµ, i = β (i,...,i d ) i d = µ= i.e. the tensor b is given in the Tucker decomposition. From Eq. (9) it follows p k+ t t d d = d µ=2 p k µ 2 β (i,...,i d ) b µ,iµ, p k µ b,i i = i d = µ=2 t t d t 2 t d = d b p k,i β (i d 2,...,i d ) = µ=2 p k µ d µ=2 p k µ p k d 2 i = i d = i 2 = B Γ,k Bd T k pd }{{}, M (p k 2,...,pk d )= 5 i d = d µ=2 bµ,iµ, p k µ p k µ b T d,i d p k d

17 where B µ = ( ) b µ,iµ : i µ t µ R n µ t µ, Bµ T B µ = Id R tµ, and the entries of the matrix Γ,k R t t d are defined by bµ,iµ, p k µ [Γ,k ] i,i d = t 2 i 2 = t d d β (i,...,i d ) i d = µ=2 p k µ ( i t, i d t d ). Note that Γ,k is a diagonal matrix if the coefficient tensor β d µ= Rtµ is super- diagonal, see the example in [3]. For p k d it follows further p k d = and finally t d µ= p k 2 µ i = t d d β (i,...,i d ) i d = µ= p k+ = b µ,iµ, p k µ b d,id = d µ= p k µ 2 B Γ,k Γ T,k BT p k. p k 2 d B d Γ T p k µ,k BT p k Lemma 4.9. Let µ N L, (p ) k N P, and (v ) k N V be the sequences from Algorithm 2. Furthermore, define M := M µ (p k+,..., p k+ µ 2, pk µ+,..., p k L, b), H := W T W, N := W G + M H + W T, where we have used the notations from Algorithm 2. A micro-step of the ALS method is described by the following recursion formula: v + = N v for all k 2, (22) i.e. U(p k+,..., p k+ µ, pk+ µ, p k µ+,..., p k L) = N U(p k+,..., p k+ µ, pk µ, p k µ+,..., p k L). Proof. According to Corollary 4.7, Remark 3.8, and definition of v +, we have that N v = W G + M H + W T W p k+ µ = W G + M H + H p k+ µ = W G + M p k+ µ = W G + W T b = W p k+ µ = v +. Lemma 4.2. Let (N ) k N, µ NL be the sequence of matrices from Lemma 4.9, v V \ {}, and R R N N an orthogonal matrix with span( v) = range(r), i.e. the column vectors of R form an orthonormal basis of the linear space span( v). Assume further that c := vt v v R \ {} and s := R T v R N \ {} holds true. Then we have the following recursion formula for the tangent of the angles: q (s) tan [ v, v + ] = tan [ v, v ], where q (s) := q (c) := q (c) R T N v c + R T N R s, s v T N v c + v T N R s. c µ=2 6

18 Proof. The block matrix V := [ v R ] R N N, ( v := v/ v ). is orthogonal, i.e. the columns of the matrix V build an orthonormal basis of the tensor space V. The tensor v and the matrix N are represented with respect to the basis V, i.e v = V V T v = [ v R ] ( v T ) v R T = [ v R ] ( ) c v s and N = V ( V T N V ) V T = [ v R ] [ v T N v v T N R R T N v R T N R ] [ v R ] T. The recursion formula (22) leads to the recursion of the coefficient vector ( ) [ ck+,µ v = T N v v T ] ( ) ( N R c v R T N v R T = T N v c + v T N R s N R R T N v c + R T N R s s k+,µ s ). Since s and c we have tan 2 [ v, v + ] = = RR T v +, v + vv T v +, v + = RT v + 2 (v T v + ) 2 = s + 2 (c + ) 2 = q(s) q (c) 2 R T v 2 (v T v ) 2 = q(s) q (c) 2 tan 2 [ v, v ]. ( ( ) q (s) 2 q (c) ) 2 s 2 (c ) 2 Theorem 4.2. Suppose that the sequence (p k ) k N P from Algorithm 2 fulfils assumption A. If one accumulation point v A(v k ) is isolated, then we have v k k Furthermore, we have that either the ALS method converges after finitely many iteration steps or tan [ v, v + ] q µ tan [ v, v ], v. where q µ := lim sup k q (s) q (c). Proof. Let ε > such that v is the only accumulation point in Ū := {v V : v v A ε}. Assuming that the sequence (v k ) k N V from the ALS algorithm does not converge to v and let I N be a subset with v v k A ε for all k I. Since v is the only accumulation in Ū and (v k) k N does not converge to v the following set I k is for all k I well-defined and finite: { I k := k N : v v k A ε for all k k k }. 7

The definition of the map k₀ : I → N, k ↦ k₀(k) := max I_k implies that ‖v* − v_{k₀(k)}‖_A ≤ ε and ‖v* − v_{k₀(k)+1}‖_A > ε for all k ∈ I. Since v* is the only accumulation point of (v_k)_{k∈N} in Ū, it follows that the subsequence (v_{k₀(k)})_{k∈I} converges to v*. Therefore, we have ‖v* − v_{k₀(k)}‖_A ≤ ε/2 and

‖v_{k₀(k)+1} − v_{k₀(k)}‖_A ≥ ‖v* − v_{k₀(k)+1}‖_A − ‖v* − v_{k₀(k)}‖_A ≥ ε/2

for sufficiently large k ∈ I. But this contradicts the statement ‖v_{k+1} − v_k‖_A → 0 from Lemma 4.14. The inequality for the rate of convergence of an ALS micro-step, tan∠(v*, v_{k,µ+1}) ≤ q_µ tan∠(v*, v_{k,µ}), follows directly from Lemma 4.20 and the definition of q_µ. Note that in Lemma 4.20, c_{k,µ} ≠ 0 since lim_{k→∞} v_{k,µ} = v*. If s_{k,µ} = 0 for some k ∈ N, then the ALS method converges after finitely many iteration steps.

Corollary 4.22. Suppose that the sequence (p_k)_{k∈N} ⊂ P fulfils A2 and assume that the set of critical points M is discrete (in topology, a set which is made up only of isolated points is called discrete). Then the sequence of represented tensors (v_k)_{k∈N} from the ALS method is convergent.

Proof. Follows directly from Theorem 4.21 and Theorem 4.13.

Remark 4.23. The convergence rate for an entire ALS iteration step is given by q := Π_{µ=1}^L q_µ, since

tan∠(v*, v_{k+1}) = tan∠(v*, v_{k,L}) ≤ q_L tan∠(v*, v_{k,L−1}) ≤ ⋯ ≤ (Π_{µ=1}^L q_µ) tan∠(v*, v_{k,0}) = q tan∠(v*, v_k).

Without further assumptions on the tensor b from Eq. (1), one cannot say more about the rate of convergence. But the ALS method can converge sublinearly, Q-linearly, and even Q-superlinearly. We refer the reader to [28] for a detailed description of convergence speed.

- If q = 0, then the sequence (tan∠(v*, v_k))_{k∈N} converges Q-superlinearly.
- If q < 1, then the sequence (tan∠(v*, v_k))_{k∈N} converges at least Q-linearly.
- If q = 1, then the sequence (tan∠(v*, v_k))_{k∈N} converges sublinearly.

A specific tensor format U has practically no impact on the different convergence rates, since we can find explicit examples for all cases already for rank-one tensors. Please note that the representation of rank-one tensors is included in all tensor formats of practical interest. In the following, we give a brief overview of our results about the convergence rates for the tensor rank-one approximation; please see [13] for proofs and a detailed description. The multilinear map that describes the representation of rank-one tensors is given by

U : ×_{µ=1}^d R^n → ⊗_{µ=1}^d R^n,   (p_1, …, p_d) ↦ U(p_1, …, p_d) = ⊗_{µ=1}^d p_µ.

A tensor b is called totally orthogonally decomposable if there exists r ∈ N with

b = Σ_{j=1}^r λ_j ⊗_{µ=1}^d b_{µ,j}   (b_{µ,j} ∈ R^n)

such that for all µ ∈ N_d and j_1, j_2 ∈ N_r the following holds: ⟨b_{µ,j_1}, b_{µ,j_2}⟩ = δ_{j_1,j_2}. The set of all totally orthogonally decomposable tensors is denoted by

TO := { b ∈ V : b is totally orthogonally decomposable } ⊆ V.

It is shown in [13] that the tensor rank-one approximation of every b ∈ TO by means of the ALS method converges Q-superlinearly, i.e. q = 0. For examples of Q-linear and sublinear convergence, we consider the tensor b_λ ∈ V given by

b_λ = ⊗_{µ=1}^3 p + λ ( p ⊗ q ⊗ q + q ⊗ p ⊗ q + q ⊗ q ⊗ p )

for some λ ∈ R and p, q ∈ R^n with ‖p‖ = ‖q‖ = 1, ⟨p, q⟩ = 0. If λ ≤ 1/2, it is shown in [13] that v* = ⊗_{µ=1}^3 p is the unique best rank-one approximation of b_λ. Furthermore, for the rate of convergence we have the following two cases:

a) For λ = 1/2 it holds q = 1, i.e. the sequence (tan∠(v*, v_k))_{k∈N} converges sublinearly.

b) For λ < 1/2 the ALS method converges Q-linearly with the convergence rate

q_λ = (λ/2) [ 3λ + λ² + √((3λ + λ²)² + 4λ) ].

This example is not restricted to d = 3. The extension to higher dimensions is straightforward, see [13] for details.

5 Numerical Experiments

In this section, we observe the convergence behaviour of the ALS method by using data from interesting examples and, more importantly, from real applications. In all cases, we focus particularly on the convergence rate.

5.1 Example 1

We consider an example introduced by Mohlenkamp in [25, Section 4.3.5]. Here we have A = id and

b = 2 e₁ ⊗ e₁ ⊗ e₁ + 9 e₂ ⊗ e₂ ⊗ e₂ =: 2 b₁ + 9 b₂,   e₁ := (1, 0)ᵀ, e₂ := (0, 1)ᵀ,

see Eq. (1). The tensor b is totally orthogonally decomposable. Although the example is rather simple, it is of great theoretical interest. It follows from Theorem 4.21 and [13] that the rate of convergence for an ALS micro-step is

q_µ = limsup_{k→∞} q^(s)_{k,µ} / q^(c)_{k,µ} = 0.

Here the ALS method converges Q-superlinearly. For τ > 0, our initial guess is defined by

v_0(τ) := (τ, 1)ᵀ ⊗ (τ, 1)ᵀ ⊗ (τ, 1)ᵀ.

A comparison of the overlaps of v_0(τ) with the two terms of b shows that for τ < 1/2 the initial guess v_0(τ) is dominated by b₂; therefore the ALS iteration converges to b₂, see [13] for details. In our numerical test, the tangent of the angle between the current iteration point and the corresponding parameter of the dominant term b_l (1 ≤ l ≤ 2) is plotted in Figure 2, i.e.

tan φ_{k,l} = √(1 − cos² φ_{k,l}) / cos φ_{k,l},   (23)

where cos φ_{k,l} = ⟨p^k_1, e_l⟩ / ‖p^k_1‖.

Figure 2: The tangent tan φ_{k,2} from Eq. (23) is plotted over the iteration number k for τ ∈ {0.4, 0.495, 0.4999}.

5.2 Example 2

Most algorithms in ab initio electronic structure theory compute quantities in terms of one- and two-electron integrals. In [1] we considered the low-rank approximation of the two-electron integrals. In order to illustrate

the convergence of the ALS method on an example of practical interest, we use the two-electron integrals in the so-called AO basis for the CH₄ molecule. We refer the reader to [1] for a detailed description of our example. The ALS method converges here Q-linearly, see Figure 3.

Figure 3: The approximation of the two-electron integrals for methane is considered. The tangent of the angle between the current iteration point and the limit point is shown with respect to the iteration number.

5.3 Example 3

We consider the tensor

b_λ = ⊗_{µ=1}^3 p + λ ( p ⊗ q ⊗ q + q ⊗ p ⊗ q + q ⊗ q ⊗ p )

from Remark 4.23. The vectors p and q are arbitrarily generated orthogonal vectors with norm 1. The values of tan φ_{k,1} are plotted, where φ_{k,1} is the angle between p^k_1 and the limit point p. For the case λ = 0.5 the convergence is sublinear, whereas for λ < 0.5 it is Q-linear. According to Theorem 4.21 and [13], the rate of convergence for an ALS micro-step is given by

q_λ = limsup_{k→∞} q^(s,λ)_{k,1} / q^(c,λ)_{k,1} = (λ/2) [ 3λ + λ² + √((3λ + λ²)² + 4λ) ].

For λ = 0.46, we have the convergence rate q_{0.46} = 0.847. In Figure 6 the ratio tan φ_{k+1,1} / tan φ_{k,1} is plotted; it matches q_{0.46} = 0.847 perfectly. This plot shows on an example that the analytical description of the convergence rate is precise.
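The behaviour reported in Example 3 can be reproduced with a few lines of NumPy. The sketch below is our own illustration: it builds b_λ for λ = 0.46, runs the rank-one ALS with A = id (each micro-step is then the exact least-squares update of one factor), and compares the observed per-iteration reduction of tan φ_{k,1} with the rate q_λ stated above. All names, the dimension n, and the initial guess are assumptions.

```python
import numpy as np

def outer3(a, b, c):
    return np.einsum('i,j,k->ijk', a, b, c)

def q_rate(lam):
    """Asymptotic micro-step rate q_lambda for b_lambda (lambda < 1/2), as stated above."""
    s = 3 * lam + lam ** 2
    return 0.5 * lam * (s + np.sqrt(s ** 2 + 4 * lam))

rng = np.random.default_rng(5)
n, lam = 6, 0.46
p = rng.standard_normal(n); p /= np.linalg.norm(p)
q = rng.standard_normal(n); q -= (q @ p) * p; q /= np.linalg.norm(q)   # orthonormal to p

b = outer3(p, p, p) + lam * (outer3(p, q, q) + outer3(q, p, q) + outer3(q, q, p))

# Rank-one ALS with A = id: each micro-step is the exact least-squares update of one factor.
u = [p + 0.3 * q, p + 0.3 * q, p + 0.3 * q]           # initial guess close to the limit p x p x p
tangents = []
for it in range(60):
    for mu in range(3):
        others = [u[nu] for nu in range(3) if nu != mu]
        contraction = np.einsum('ijk,j,k->i', np.moveaxis(b, mu, 0), *others)
        u[mu] = contraction / (np.linalg.norm(others[0]) ** 2 * np.linalg.norm(others[1]) ** 2)
    c = abs(u[0] @ p) / np.linalg.norm(u[0])
    tangents.append(np.sqrt(1 - c ** 2) / c)           # tan of the angle between u[0] and p

ratios = [tangents[k + 1] / tangents[k] for k in range(40, 50)]
print(q_rate(lam), np.mean(ratios))   # the observed ratio approaches the predicted rate ~0.847
```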

(a) The tangent tan φ_{k,1} for λ ∈ {0.1, 0.2, 0.3}.   (b) The tangent tan φ_{k,1} for λ ∈ {0.44, 0.46, 0.48}.

Figure 4: The approximation of b_λ from Remark 4.23 is considered. The tangent of the angle between the current iteration point and the limit point is plotted with respect to the iteration number. For λ < 0.5 the sequence converges Q-linearly with a convergence rate q_λ = (λ/2) ( 3λ + λ² + √((3λ + λ²)² + 4λ) ) < 1.

References

[1] U. Benedikt, A. Auer, M. Espig, and W. Hackbusch. Tensor decomposition in post-Hartree-Fock methods. I. Two-electron integrals and MP2. The Journal of Chemical Physics, 134(5), 2011.

[2] G. Beylkin and M. J. Mohlenkamp. Numerical operator calculus in higher dimensions. Proceedings of the National Academy of Sciences, 99(16):10246–10251, 2002.

[3] G. Beylkin and M. J. Mohlenkamp. Algorithms for numerical analysis in high dimensions. SIAM Journal on Scientific Computing, 26(6):2133–2159, 2005.

[4] C. Brezinski. Projection Methods for Systems of Equations. Studies in Computational Mathematics. Elsevier Science, 1997.

[5] V. de Silva and L.-H. Lim. Tensor rank and the ill-posedness of the best low-rank approximation problem. SIAM J. Matrix Anal. Appl., 30:1084–1127, 2008.

[6] A. Doostan, A. Validi, and G. Iaccarino. Non-intrusive low-rank separated approximation of high-dimensional stochastic models. Comput. Methods Appl. Mech. Engrg., 263:42–55, 2013.

[7] M. Espig. Effiziente Bestapproximation mittels Summen von Elementartensoren in hohen Dimensionen. PhD thesis, Universität Leipzig, 2008.

[8] M. Espig, L. Grasedyck, and W. Hackbusch. Black box low tensor rank approximation using fibre-crosses. Constructive Approximation, 2009.

[9] M. Espig and W. Hackbusch. A regularized Newton method for the efficient approximation of tensors represented in the canonical tensor format. Numerische Mathematik, 2012.

[10] M. Espig, W. Hackbusch, S. Handschuh, and R. Schneider. Optimization problems in contracted tensor networks. Computing and Visualization in Science, 14(6):271–285, 2011.

Figure 5: The approximation of b_λ from Remark 4.23 is considered. The tangent of the angle between the current iteration point and the limit point is plotted with respect to the iteration number. For λ = 0.5 we have sublinear convergence, since q_{0.5} = 1.

[11] M. Espig, W. Hackbusch, A. Litvinenko, H. G. Matthies, and E. Zander. Efficient analysis of high dimensional data in tensor formats. Accepted for Springer Lecture Notes in Computational Science and Engineering.

[12] M. Espig, W. Hackbusch, T. Rohwedder, and R. Schneider. Variational calculus with sums of elementary tensors of fixed rank. Numerische Mathematik, 2012.

[13] M. Espig and A. Khachatryan. Convergence of alternating least squares optimisation for rank-one approximation to high order tensors. Preprint.

[14] M. Espig, W. Hackbusch, A. Litvinenko, H. G. Matthies, and P. Wähnert. Efficient low-rank approximation of the stochastic Galerkin matrix in tensor formats. Computers & Mathematics with Applications, 2012.

[15] L. Grasedyck. Hierarchical singular value decomposition of tensors. SIAM J. Matrix Analysis Applications, 31(4):2029–2054, 2010.

[16] W. Hackbusch. Tensor Spaces and Numerical Tensor Calculus. Springer, 2012.

[17] W. Hackbusch and S. Kühn. A new scheme for the tensor representation. The Journal of Fourier Analysis and Applications, 15(5):706–722, 2009.

[18] S. Holtz, T. Rohwedder, and R. Schneider. The alternating linear scheme for tensor optimization in the tensor train format. SIAM J. Sci. Comput., 34(2):A683–A713, March 2012.

[19] H. Keller. On the solution of singular and semidefinite linear systems by iteration. Journal of the Society for Industrial and Applied Mathematics Series B Numerical Analysis, 2(2):281–290, 1965.

Figure 6: The ratio tan φ_{k+1,1} / tan φ_{k,1} is plotted for λ = 0.46. The rate of convergence from Theorem 4.21 is for this example equal to 0.847. The plot illustrates that the description of the convergence rate is accurate and sharp.

[20] B. N. Khoromskij and C. Schwab. Tensor-structured Galerkin approximation of parametric and stochastic elliptic PDEs. SIAM J. Sci. Comput., 33(1), February 2011.

[21] T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.

[22] J. M. Landsberg, Y. Qi, and K. Ye. On the geometry of tensor network states. Quantum Information and Computation, 12(3-4), 2012.

[23] Y.-J. Lee, J. Wu, J. Xu, and L. Zikatanov. On the convergence of iterative methods for semidefinite linear systems. SIAM J. Matrix Anal. Appl., 28(3):634–641, August 2006.

[24] H. G. Matthies and E. Zander. Solving stochastic systems with low-rank tensor compression. Linear Algebra Appl., 436(10), May 2012.

[25] M. J. Mohlenkamp. Musings on multilinear fitting. Linear Algebra Appl., 438(2), 2013.

[26] A. Nouy. A generalized spectral decomposition technique to solve a class of linear stochastic partial differential equations. Comput. Methods Appl. Mech. Engrg., 196(45-48), 2007.

[27] A. Nouy. Proper generalized decompositions and separated representations for the numerical solution of high dimensional stochastic problems. Arch. Comput. Methods Eng., 17(4):403–434, 2010.

[28] J. M. Ortega and W. C. Rheinboldt. Iterative Solution of Nonlinear Equations in Several Variables. Society for Industrial and Applied Mathematics, 1970.

[29] I. V. Oseledets. DMRG approach to fast linear algebra in the TT-format. Comput. Meth. in Appl. Math., 11(3), 2011.

[30] I. V. Oseledets. Tensor-train decomposition. SIAM J. Scientific Computing, 33(5), 2011.

[31] I. V. Oseledets and S. V. Dolgov. Solution of linear systems and matrix inversion in the TT-format. SIAM J. Scientific Computing, 34(5), 2012.

[32] I. V. Oseledets and E. Tyrtyshnikov. Breaking the curse of dimensionality, or how to use SVD in many dimensions. SIAM J. Scientific Computing, 31(5), 2009.

[33] T. Rohwedder and A. Uschmajew. On local convergence of alternating schemes for optimization of convex problems in the tensor train format. SIAM Journal on Numerical Analysis, 51(2):1134–1162, 2013.

[34] Y. Saad. Iterative Methods for Sparse Linear Systems. Society for Industrial and Applied Mathematics.

[35] A. Szabó and N. S. Ostlund. Modern Quantum Chemistry: Introduction to Advanced Electronic Structure Theory. Dover Books on Chemistry Series. Dover Publications, 1996.

[36] A. Uschmajew. Local convergence of the alternating least squares algorithm for canonical tensor approximation. SIAM Journal on Matrix Analysis and Applications, 33(2), 2012.


More information

Where is matrix multiplication locally open?

Where is matrix multiplication locally open? Linear Algebra and its Applications 517 (2017) 167 176 Contents lists available at ScienceDirect Linear Algebra and its Applications www.elsevier.com/locate/laa Where is matrix multiplication locally open?

More information

4 Hilbert spaces. The proof of the Hilbert basis theorem is not mathematics, it is theology. Camille Jordan

4 Hilbert spaces. The proof of the Hilbert basis theorem is not mathematics, it is theology. Camille Jordan The proof of the Hilbert basis theorem is not mathematics, it is theology. Camille Jordan Wir müssen wissen, wir werden wissen. David Hilbert We now continue to study a special class of Banach spaces,

More information

Principal Component Analysis

Principal Component Analysis Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used

More information

AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES

AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES JOEL A. TROPP Abstract. We present an elementary proof that the spectral radius of a matrix A may be obtained using the formula ρ(a) lim

More information

Lecture 2: Linear Algebra Review

Lecture 2: Linear Algebra Review EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1

More information

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product Chapter 4 Hilbert Spaces 4.1 Inner Product Spaces Inner Product Space. A complex vector space E is called an inner product space (or a pre-hilbert space, or a unitary space) if there is a mapping (, )

More information

Max Planck Institute Magdeburg Preprints

Max Planck Institute Magdeburg Preprints Thomas Mach Computing Inner Eigenvalues of Matrices in Tensor Train Matrix Format MAX PLANCK INSTITUT FÜR DYNAMIK KOMPLEXER TECHNISCHER SYSTEME MAGDEBURG Max Planck Institute Magdeburg Preprints MPIMD/11-09

More information

Functional Analysis. James Emery. Edit: 8/7/15

Functional Analysis. James Emery. Edit: 8/7/15 Functional Analysis James Emery Edit: 8/7/15 Contents 0.1 Green s functions in Ordinary Differential Equations...... 2 0.2 Integral Equations........................ 2 0.2.1 Fredholm Equations...................

More information

Review and problem list for Applied Math I

Review and problem list for Applied Math I Review and problem list for Applied Math I (This is a first version of a serious review sheet; it may contain errors and it certainly omits a number of topic which were covered in the course. Let me know

More information

CHAPTER 11. A Revision. 1. The Computers and Numbers therein

CHAPTER 11. A Revision. 1. The Computers and Numbers therein CHAPTER A Revision. The Computers and Numbers therein Traditional computer science begins with a finite alphabet. By stringing elements of the alphabet one after another, one obtains strings. A set of

More information

Matrix assembly by low rank tensor approximation

Matrix assembly by low rank tensor approximation Matrix assembly by low rank tensor approximation Felix Scholz 13.02.2017 References Angelos Mantzaflaris, Bert Juettler, Boris Khoromskij, and Ulrich Langer. Matrix generation in isogeometric analysis

More information

sublinear time low-rank approximation of positive semidefinite matrices Cameron Musco (MIT) and David P. Woodru (CMU)

sublinear time low-rank approximation of positive semidefinite matrices Cameron Musco (MIT) and David P. Woodru (CMU) sublinear time low-rank approximation of positive semidefinite matrices Cameron Musco (MIT) and David P. Woodru (CMU) 0 overview Our Contributions: 1 overview Our Contributions: A near optimal low-rank

More information

für Mathematik in den Naturwissenschaften Leipzig

für Mathematik in den Naturwissenschaften Leipzig ŠܹÈÐ Ò ¹ÁÒ Ø ØÙØ für Mathematik in den Naturwissenschaften Leipzig Quantics-TT Approximation of Elliptic Solution Operators in Higher Dimensions (revised version: January 2010) by Boris N. Khoromskij,

More information

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization MATHEMATICS OF OPERATIONS RESEARCH Vol. 29, No. 3, August 2004, pp. 479 491 issn 0364-765X eissn 1526-5471 04 2903 0479 informs doi 10.1287/moor.1040.0103 2004 INFORMS Some Properties of the Augmented

More information

Exercise Sheet 1.

Exercise Sheet 1. Exercise Sheet 1 You can download my lecture and exercise sheets at the address http://sami.hust.edu.vn/giang-vien/?name=huynt 1) Let A, B be sets. What does the statement "A is not a subset of B " mean?

More information

The tensor algebra of power series spaces

The tensor algebra of power series spaces The tensor algebra of power series spaces Dietmar Vogt Abstract The linear isomorphism type of the tensor algebra T (E) of Fréchet spaces and, in particular, of power series spaces is studied. While for

More information

Iterative Methods for Solving A x = b

Iterative Methods for Solving A x = b Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http

More information

Dynamical low-rank approximation

Dynamical low-rank approximation Dynamical low-rank approximation Christian Lubich Univ. Tübingen Genève, Swiss Numerical Analysis Day, 17 April 2015 Coauthors Othmar Koch 2007, 2010 Achim Nonnenmacher 2008 Dajana Conte 2010 Thorsten

More information

Categories and Quantum Informatics: Hilbert spaces

Categories and Quantum Informatics: Hilbert spaces Categories and Quantum Informatics: Hilbert spaces Chris Heunen Spring 2018 We introduce our main example category Hilb by recalling in some detail the mathematical formalism that underlies quantum theory:

More information

ON ORTHOGONAL REDUCTION TO HESSENBERG FORM WITH SMALL BANDWIDTH

ON ORTHOGONAL REDUCTION TO HESSENBERG FORM WITH SMALL BANDWIDTH ON ORTHOGONAL REDUCTION TO HESSENBERG FORM WITH SMALL BANDWIDTH V. FABER, J. LIESEN, AND P. TICHÝ Abstract. Numerous algorithms in numerical linear algebra are based on the reduction of a given matrix

More information

PRODUCT OF OPERATORS AND NUMERICAL RANGE

PRODUCT OF OPERATORS AND NUMERICAL RANGE PRODUCT OF OPERATORS AND NUMERICAL RANGE MAO-TING CHIEN 1, HWA-LONG GAU 2, CHI-KWONG LI 3, MING-CHENG TSAI 4, KUO-ZHONG WANG 5 Abstract. We show that a bounded linear operator A B(H) is a multiple of a

More information

Linear Algebra. Paul Yiu. Department of Mathematics Florida Atlantic University. Fall A: Inner products

Linear Algebra. Paul Yiu. Department of Mathematics Florida Atlantic University. Fall A: Inner products Linear Algebra Paul Yiu Department of Mathematics Florida Atlantic University Fall 2011 6A: Inner products In this chapter, the field F = R or C. We regard F equipped with a conjugation χ : F F. If F =

More information

4. Multi-linear algebra (MLA) with Kronecker-product data.

4. Multi-linear algebra (MLA) with Kronecker-product data. ect. 3. Tensor-product interpolation. Introduction to MLA. B. Khoromskij, Leipzig 2007(L3) 1 Contents of Lecture 3 1. Best polynomial approximation. 2. Error bound for tensor-product interpolants. - Polynomial

More information

Institut für Mathematik

Institut für Mathematik U n i v e r s i t ä t A u g s b u r g Institut für Mathematik Martin Rasmussen, Peter Giesl A Note on Almost Periodic Variational Equations Preprint Nr. 13/28 14. März 28 Institut für Mathematik, Universitätsstraße,

More information

INF-SUP CONDITION FOR OPERATOR EQUATIONS

INF-SUP CONDITION FOR OPERATOR EQUATIONS INF-SUP CONDITION FOR OPERATOR EQUATIONS LONG CHEN We study the well-posedness of the operator equation (1) T u = f. where T is a linear and bounded operator between two linear vector spaces. We give equivalent

More information

Characterization of half-radial matrices

Characterization of half-radial matrices Characterization of half-radial matrices Iveta Hnětynková, Petr Tichý Faculty of Mathematics and Physics, Charles University, Sokolovská 83, Prague 8, Czech Republic Abstract Numerical radius r(a) is the

More information

2 Two-Point Boundary Value Problems

2 Two-Point Boundary Value Problems 2 Two-Point Boundary Value Problems Another fundamental equation, in addition to the heat eq. and the wave eq., is Poisson s equation: n j=1 2 u x 2 j The unknown is the function u = u(x 1, x 2,..., x

More information

Linear algebra for MATH2601: Theory

Linear algebra for MATH2601: Theory Linear algebra for MATH2601: Theory László Erdős August 12, 2000 Contents 1 Introduction 4 1.1 List of crucial problems............................... 5 1.2 Importance of linear algebra............................

More information

Chapter 8 Integral Operators

Chapter 8 Integral Operators Chapter 8 Integral Operators In our development of metrics, norms, inner products, and operator theory in Chapters 1 7 we only tangentially considered topics that involved the use of Lebesgue measure,

More information

Large Scale Data Analysis Using Deep Learning

Large Scale Data Analysis Using Deep Learning Large Scale Data Analysis Using Deep Learning Linear Algebra U Kang Seoul National University U Kang 1 In This Lecture Overview of linear algebra (but, not a comprehensive survey) Focused on the subset

More information

Solution of Linear Equations

Solution of Linear Equations Solution of Linear Equations (Com S 477/577 Notes) Yan-Bin Jia Sep 7, 07 We have discussed general methods for solving arbitrary equations, and looked at the special class of polynomial equations A subclass

More information

Mathematical Optimisation, Chpt 2: Linear Equations and inequalities

Mathematical Optimisation, Chpt 2: Linear Equations and inequalities Mathematical Optimisation, Chpt 2: Linear Equations and inequalities Peter J.C. Dickinson p.j.c.dickinson@utwente.nl http://dickinson.website version: 12/02/18 Monday 5th February 2018 Peter J.C. Dickinson

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

The following definition is fundamental.

The following definition is fundamental. 1. Some Basics from Linear Algebra With these notes, I will try and clarify certain topics that I only quickly mention in class. First and foremost, I will assume that you are familiar with many basic

More information

Tensors. Notes by Mateusz Michalek and Bernd Sturmfels for the lecture on June 5, 2018, in the IMPRS Ringvorlesung Introduction to Nonlinear Algebra

Tensors. Notes by Mateusz Michalek and Bernd Sturmfels for the lecture on June 5, 2018, in the IMPRS Ringvorlesung Introduction to Nonlinear Algebra Tensors Notes by Mateusz Michalek and Bernd Sturmfels for the lecture on June 5, 2018, in the IMPRS Ringvorlesung Introduction to Nonlinear Algebra This lecture is divided into two parts. The first part,

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

OUTLINE ffl CFD: elliptic pde's! Ax = b ffl Basic iterative methods ffl Krylov subspace methods ffl Preconditioning techniques: Iterative methods ILU

OUTLINE ffl CFD: elliptic pde's! Ax = b ffl Basic iterative methods ffl Krylov subspace methods ffl Preconditioning techniques: Iterative methods ILU Preconditioning Techniques for Solving Large Sparse Linear Systems Arnold Reusken Institut für Geometrie und Praktische Mathematik RWTH-Aachen OUTLINE ffl CFD: elliptic pde's! Ax = b ffl Basic iterative

More information

October 25, 2013 INNER PRODUCT SPACES

October 25, 2013 INNER PRODUCT SPACES October 25, 2013 INNER PRODUCT SPACES RODICA D. COSTIN Contents 1. Inner product 2 1.1. Inner product 2 1.2. Inner product spaces 4 2. Orthogonal bases 5 2.1. Existence of an orthogonal basis 7 2.2. Orthogonal

More information

A PROJECTED HESSIAN GAUSS-NEWTON ALGORITHM FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS AND INEQUALITIES

A PROJECTED HESSIAN GAUSS-NEWTON ALGORITHM FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS AND INEQUALITIES IJMMS 25:6 2001) 397 409 PII. S0161171201002290 http://ijmms.hindawi.com Hindawi Publishing Corp. A PROJECTED HESSIAN GAUSS-NEWTON ALGORITHM FOR SOLVING SYSTEMS OF NONLINEAR EQUATIONS AND INEQUALITIES

More information

We describe the generalization of Hazan s algorithm for symmetric programming

We describe the generalization of Hazan s algorithm for symmetric programming ON HAZAN S ALGORITHM FOR SYMMETRIC PROGRAMMING PROBLEMS L. FAYBUSOVICH Abstract. problems We describe the generalization of Hazan s algorithm for symmetric programming Key words. Symmetric programming,

More information

Uniqueness of the Solutions of Some Completion Problems

Uniqueness of the Solutions of Some Completion Problems Uniqueness of the Solutions of Some Completion Problems Chi-Kwong Li and Tom Milligan Abstract We determine the conditions for uniqueness of the solutions of several completion problems including the positive

More information

Applied Linear Algebra

Applied Linear Algebra Applied Linear Algebra Peter J. Olver School of Mathematics University of Minnesota Minneapolis, MN 55455 olver@math.umn.edu http://www.math.umn.edu/ olver Chehrzad Shakiban Department of Mathematics University

More information

Analysis-3 lecture schemes

Analysis-3 lecture schemes Analysis-3 lecture schemes (with Homeworks) 1 Csörgő István November, 2015 1 A jegyzet az ELTE Informatikai Kar 2015. évi Jegyzetpályázatának támogatásával készült Contents 1. Lesson 1 4 1.1. The Space

More information

Math Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88

Math Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88 Math Camp 2010 Lecture 4: Linear Algebra Xiao Yu Wang MIT Aug 2010 Xiao Yu Wang (MIT) Math Camp 2010 08/10 1 / 88 Linear Algebra Game Plan Vector Spaces Linear Transformations and Matrices Determinant

More information

A 3 3 DILATION COUNTEREXAMPLE

A 3 3 DILATION COUNTEREXAMPLE A 3 3 DILATION COUNTEREXAMPLE MAN DUEN CHOI AND KENNETH R DAVIDSON Dedicated to the memory of William B Arveson Abstract We define four 3 3 commuting contractions which do not dilate to commuting isometries

More information

SPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS

SPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS SPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS G. RAMESH Contents Introduction 1 1. Bounded Operators 1 1.3. Examples 3 2. Compact Operators 5 2.1. Properties 6 3. The Spectral Theorem 9 3.3. Self-adjoint

More information

Low Rank Approximation Lecture 7. Daniel Kressner Chair for Numerical Algorithms and HPC Institute of Mathematics, EPFL

Low Rank Approximation Lecture 7. Daniel Kressner Chair for Numerical Algorithms and HPC Institute of Mathematics, EPFL Low Rank Approximation Lecture 7 Daniel Kressner Chair for Numerical Algorithms and HPC Institute of Mathematics, EPFL daniel.kressner@epfl.ch 1 Alternating least-squares / linear scheme General setting:

More information

Introduction to the Tensor Train Decomposition and Its Applications in Machine Learning

Introduction to the Tensor Train Decomposition and Its Applications in Machine Learning Introduction to the Tensor Train Decomposition and Its Applications in Machine Learning Anton Rodomanov Higher School of Economics, Russia Bayesian methods research group (http://bayesgroup.ru) 14 March

More information

Tensorization of Electronic Schrödinger Equation and Second Quantization

Tensorization of Electronic Schrödinger Equation and Second Quantization Tensorization of Electronic Schrödinger Equation and Second Quantization R. Schneider (TUB Matheon) Mariapfarr, 2014 Introduction Electronic Schrödinger Equation Quantum mechanics Goal: Calculation of

More information

A Recursive Trust-Region Method for Non-Convex Constrained Minimization

A Recursive Trust-Region Method for Non-Convex Constrained Minimization A Recursive Trust-Region Method for Non-Convex Constrained Minimization Christian Groß 1 and Rolf Krause 1 Institute for Numerical Simulation, University of Bonn. {gross,krause}@ins.uni-bonn.de 1 Introduction

More information

Non-Intrusive Solution of Stochastic and Parametric Equations

Non-Intrusive Solution of Stochastic and Parametric Equations Non-Intrusive Solution of Stochastic and Parametric Equations Hermann G. Matthies a Loïc Giraldi b, Alexander Litvinenko c, Dishi Liu d, and Anthony Nouy b a,, Brunswick, Germany b École Centrale de Nantes,

More information

Sampling and Low-Rank Tensor Approximations

Sampling and Low-Rank Tensor Approximations Sampling and Low-Rank Tensor Approximations Hermann G. Matthies Alexander Litvinenko, Tarek A. El-Moshely +, Brunswick, Germany + MIT, Cambridge, MA, USA wire@tu-bs.de http://www.wire.tu-bs.de $Id: 2_Sydney-MCQMC.tex,v.3

More information

ADJOINTS, ABSOLUTE VALUES AND POLAR DECOMPOSITIONS

ADJOINTS, ABSOLUTE VALUES AND POLAR DECOMPOSITIONS J. OPERATOR THEORY 44(2000), 243 254 c Copyright by Theta, 2000 ADJOINTS, ABSOLUTE VALUES AND POLAR DECOMPOSITIONS DOUGLAS BRIDGES, FRED RICHMAN and PETER SCHUSTER Communicated by William B. Arveson Abstract.

More information

Simple Examples on Rectangular Domains

Simple Examples on Rectangular Domains 84 Chapter 5 Simple Examples on Rectangular Domains In this chapter we consider simple elliptic boundary value problems in rectangular domains in R 2 or R 3 ; our prototype example is the Poisson equation

More information

Max-Planck-Institut fur Mathematik in den Naturwissenschaften Leipzig H 2 -matrix approximation of integral operators by interpolation by Wolfgang Hackbusch and Steen Borm Preprint no.: 04 200 H 2 -Matrix

More information

A sensitivity result for quadratic semidefinite programs with an application to a sequential quadratic semidefinite programming algorithm

A sensitivity result for quadratic semidefinite programs with an application to a sequential quadratic semidefinite programming algorithm Volume 31, N. 1, pp. 205 218, 2012 Copyright 2012 SBMAC ISSN 0101-8205 / ISSN 1807-0302 (Online) www.scielo.br/cam A sensitivity result for quadratic semidefinite programs with an application to a sequential

More information

Math 671: Tensor Train decomposition methods II

Math 671: Tensor Train decomposition methods II Math 671: Tensor Train decomposition methods II Eduardo Corona 1 1 University of Michigan at Ann Arbor December 13, 2016 Table of Contents 1 What we ve talked about so far: 2 The Tensor Train decomposition

More information