Computing and decomposing tensors


1 Computing and decomposing tensors Tensor rank decomposition Nick Vannieuwenhoven (FWO / KU Leuven)

2 Overview 1 Sensitivity: Condition numbers; Tensor rank decomposition. 2 Pencil-based algorithms; Alternating least squares methods; Quasi-Newton optimization methods; Riemannian quasi-Newton optimization methods. 3 Tensorlab: Basic operations; Tucker decomposition; Tensor rank decomposition. 4 References

3 Sensitivity: Overview

4 Sensitivity, Condition numbers: Overview

5 Sensitivity Condition numbers Sensitivity In numerical computations, the sensitivity of the output of a computation to perturbations at the input is extremely important. We have already seen an example: computing singular values appears to behave nicely with respect to perturbations.

6 Sensitivity Condition numbers Consider a small example matrix A. Computing the singular value decomposition Û Ŝ V̂^T of the floating-point representation Ã of A numerically using Matlab, we find A ≈ Û Ŝ V̂^T. Comparing the numerically computed singular values with the exact ones, we find 6 correct digits of the exact solution in all cases.

7 Sensitivity Condition numbers However, when comparing the computed left singular vector corresponding to one of the singular values σ to the exact solution, we recover far fewer correct digits, even though the matrix Û Ŝ V̂^T contains at least 5 correct digits in each entry. How is this possible?

8 Sensitivity Condition numbers We say that the problem of computing the singular values has a different sensitivity to perturbations than the computational problem of computing the left singular vectors. Assuming the singular values are distinct, these problems can be modeled as functions f_1 : F^{m×n} → F^{min{m,n}} and f_2 : F^{m×n} → F^{m×min{m,n}}, respectively. What we have observed above is that

‖f_1(x) − f_1(x + δx)‖ / ‖δx‖ ≈ 0.4, while ‖f_2(x) − f_2(x + δx)‖ / ‖δx‖ ≥ 800 at least,

for x = A and δx = Ã − A (with ‖δx‖ small).
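This difference in sensitivity is easy to reproduce. The following is a minimal plain-Matlab sketch on a synthetic matrix of my own (not the matrix from the slides): two nearly equal singular values make the associated singular vectors ill-conditioned, while the singular values themselves stay well-behaved.

[U, ~] = qr(randn(6)); [V, ~] = qr(randn(6));
A  = U * diag([3 2 1+1e-6 1 0.5 0.1]) * V';   % two nearly equal singular values
dA = 1e-9 * randn(6);                         % a tiny perturbation
[U1, S1, ~] = svd(A);
[U2, S2, ~] = svd(A + dA);
norm(diag(S1) - diag(S2)) / norm(dA, 'fro')   % small: the singular values are well-conditioned
s = sign(U1(:,3)' * U2(:,3));                 % fix the sign ambiguity of svd
norm(U1(:,3) - s*U2(:,3)) / norm(dA, 'fro')   % large, roughly 1/gap = 1e6 in the worst case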

9 Sensitivity Condition numbers Condition numbers The condition number quantifies the worst-case sensitivity of f to perturbations of the input:

κ[f](x) := lim_{ɛ→0} sup_{y ∈ B_ɛ(x)} ‖f(y) − f(x)‖ / ‖y − x‖.

(In the accompanying picture, a ball B_ɛ(x) of radius ɛ around x is mapped by f into a region of radius roughly κɛ around f(x).)

10 Sensitivity Condition numbers If f : F^M ⊇ X → Y ⊆ F^N is a differentiable function, then the condition number is fully determined by the first-order approximation of f. Indeed, in this case we have f(x + Δ) = f(x) + JΔ + o(‖Δ‖), where J is the Jacobian matrix containing all first-order partial derivatives. Then,

κ = lim_{ɛ→0} sup_{‖Δ‖ ≤ ɛ} ‖f(x) + JΔ + o(‖Δ‖) − f(x)‖ / ‖Δ‖ = max_{‖Δ‖ = 1} ‖JΔ‖ = ‖J‖.
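As a quick numerical illustration (a toy map of my own choosing, not from the slides), the spectral norm of the Jacobian can be compared against observed perturbation ratios in plain Matlab:

f = @(x) [x(1)*x(2); x(1)^2];            % a simple differentiable map R^2 -> R^2
J = @(x) [x(2), x(1); 2*x(1), 0];        % its Jacobian
x = [1; 2];
kappa = norm(J(x))                       % condition number = spectral norm of J
worst = 0;
for trial = 1:10000
    d = 1e-7 * randn(2, 1);
    worst = max(worst, norm(f(x + d) - f(x)) / norm(d));
end
worst                                    % approaches kappa from below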

11 Sensitivity Condition numbers More generally, for manifolds, we can apply Rice's (1966) geometric framework of conditioning. Proposition (Rice, 1966). Let X ⊆ F^m be a manifold of inputs and Y ⊆ F^n a manifold of outputs with dim X = dim Y. Then, the condition number of F : X → Y at x_0 ∈ X is

κ[F](x_0) = ‖d_{x_0}F‖ = sup_{‖x‖ = 1} ‖d_{x_0}F(x)‖,

where d_{x_0}F : T_{x_0}X → T_{F(x_0)}Y is the derivative. See, e.g., Blum, Cucker, Shub, and Smale (1998) or Bürgisser and Cucker (2013) for a more modern treatment.

12 Sensitivity, Tensor rank decomposition: Overview

13 Sensitivity Tensor rank decomposition The tensor rank decomposition problem In the remainder, S denotes the smooth manifold of rank-1 tensors in R^{n_1 × ⋯ × n_d}, called the Segre manifold. To determine the condition number of computing a CPD, we analyze the addition map

Φ_r : S × ⋯ × S → R^{n_1 × ⋯ × n_d}, (A_1, ..., A_r) ↦ A_1 + ⋯ + A_r.

Note that the domain and codomain are smooth manifolds.

14 Sensitivity Tensor rank decomposition From differential geometry, we know that Φ_r is a local diffeomorphism onto its image if the derivative d_a Φ_r is injective, i.e., has maximal rank equal to r·dim S. Let a = (A_1, ..., A_r) ∈ S^{×r} and A = Φ_r(a). If d_a Φ_r is injective, then by the Inverse Function Theorem there exists a local inverse function Φ_a^{−1} : E → F, where E ⊆ Im(Φ_r) is an open neighborhood of A, F ⊆ S^{×r} is an open neighborhood of a, and Φ_r ∘ Φ_a^{−1} = Id_E and Φ_a^{−1} ∘ Φ_r = Id_F. In other words, Φ_a^{−1} computes one particular ordered CPD of A. This function has condition number κ[Φ_a^{−1}](A) = ‖d_A Φ_a^{−1}‖ = ‖(d_a Φ_r)^{−1}‖.

15 Sensitivity Tensor rank decomposition The derivative d_a Φ_r is seen to be the map

d_a Φ_r : T_{A_1}S × ⋯ × T_{A_r}S → T_A R^{n_1 × ⋯ × n_d}, (Ȧ_1, ..., Ȧ_r) ↦ Ȧ_1 + ⋯ + Ȧ_r.

Hence, if U_i is an orthonormal basis of T_{A_i}S ⊆ T_{A_i}R^{n_1 × ⋯ × n_d}, then the map is represented in coordinates by the matrix

U = [U_1 U_2 ⋯ U_r] ∈ R^{(n_1⋯n_d) × r·dim S}.

Summarizing, if we are given an ordered CPD a of A, then the condition number of computing this ordered CPD may be computed as the inverse of the smallest singular value of U.
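For a third-order tensor, this recipe fits in a few lines of plain Matlab. The sketch below is only an illustration under genericity assumptions (random rank-1 terms): for each term it builds a basis of the tangent space T_{A_i}S from the partial derivatives of (a, b, c) ↦ a ⊗ b ⊗ c, orthonormalizes it with orth, and returns the inverse of the smallest singular value of the stacked matrix U.

n = [6 5 4]; r = 3;
a = randn(n(1), r); b = randn(n(2), r); c = randn(n(3), r);   % random rank-1 terms
U = [];
for i = 1:r
    % columns span the tangent space of the Segre manifold at a_i x b_i x c_i
    Ti = [kron(c(:,i), kron(b(:,i), eye(n(1)))), ...
          kron(c(:,i), kron(eye(n(2)), a(:,i))), ...
          kron(eye(n(3)), kron(b(:,i), a(:,i)))];
    U = [U, orth(Ti)];    % orthonormal basis U_i, stacked as on the slide above
end
kappa = 1 / min(svd(U))   % condition number of this (ordered) CPD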

16 Sensitivity Tensor rank decomposition From the above derivation, it also follows that κ[Φ_a^{−1}](A) = κ[Φ_{a'}^{−1}](A), where a' = (A_{π_1}, ..., A_{π_r}), for every permutation π ∈ S_r. Hence, the above gives us the condition number of the tensor decomposition problem at the CPD {A_1, ..., A_r}. We write κ({A_1, ..., A_r}) := κ[Φ_a^{−1}](A).

17 Sensitivity Tensor rank decomposition Interpretation If

A = A_1 + ⋯ + A_r = Σ_{i=1}^r a_i^1 ⊗ ⋯ ⊗ a_i^d and B = B_1 + ⋯ + B_r = Σ_{i=1}^r b_i^1 ⊗ ⋯ ⊗ b_i^d

are tensors in R^{n_1 × ⋯ × n_d}, then for ‖A − B‖_F → 0 we have the asymptotically sharp bound

min_{π ∈ S_r} ( Σ_{i=1}^r ‖A_i − B_{π_i}‖_F^2 )^{1/2}  ≤  κ({A_1, ..., A_r}) · ‖A − B‖_F,

where the left-hand side is the forward error, κ is the condition number, and ‖A − B‖_F is the backward error.

18 Overview

19 Pencil-based algorithms: Overview

20 Pencil-based algorithms Pencil-based algorithms A pencil-based algorithm (PBA) is an algorithm for computing the unique rank-r CPD of a tensor A ∈ R^{n_1 × n_2 × n_3} with n_1 ≥ n_2 ≥ r. Let

A = Σ_{i=1}^r a_i ⊗ b_i ⊗ c_i, where ‖a_i‖ = 1.

The matrices A = [a_i], B = [b_i] and C = [c_i] of A are called factor matrices.

21 Pencil-based algorithms Fix a matrix Q ∈ R^{n_3 × 2} with two orthonormal columns. The key step in a PBA is the projection step

B := (I, I, Q^T) · A = Σ_{i=1}^r a_i ⊗ b_i ⊗ z_i, where z_i := Q^T c_i,

yielding an n_1 × n_2 × 2 tensor B whose factor matrices are (A, B, Z). Thereafter, we compute the specific orthogonal Tucker decomposition

B = (Q_1, Q_2, I) · S = Σ_{i=1}^r (Q_1 x_i) ⊗ (Q_2 y_i) ⊗ z_i,

where x_i = Q_1^T a_i and y_i = Q_2^T b_i.

22 Pencil-based algorithms Let X = [x_i / ‖x_i‖] and Y = [y_i / ‖y_i‖]. Then, the core tensor S ∈ R^{r × r × 2} has the following two 3-slices:

S_j = (I, I, e_j^T) · S = Σ_{i=1}^r λ_{j,i} (x_i/‖x_i‖) ⊗ (y_i/‖y_i‖) = X diag(λ_j) Y^T, j = 1, 2,

where λ_j := [z_{j,i} ‖x_i‖ ‖y_i‖]_{i=1}^r and z_i = [z_{j,i}]_{j=1}^2. If S_1 and S_2 are nonsingular, then we see that

S_1 S_2^{−1} = X diag(λ_1) diag(λ_2)^{−1} X^{−1}.

Thus X is the r × r matrix of eigenvectors of S_1 S_2^{−1}.

23 Pencil-based algorithms Recall that the factor matrices of B are (A, B, Z) and by definition A = Q_1 X. The rank-1 tensors can then be recovered by noting that

A_(1) = Σ_{i=1}^r a_i (b_i ⊗ c_i)^T =: A (B ⊙ C)^T,

where B ⊙ C := [b_i ⊗ c_i]_{i=1}^r ∈ R^{n_2 n_3 × r}. Since A is left invertible, we get

A ⊙ (A^† A_(1))^T = A ⊙ (B ⊙ C) = [a_i ⊗ (b_i ⊗ c_i)] ∈ R^{n_1 n_2 n_3 × r},

which is a matrix containing the rank-1 tensors of the CPD as columns.

24 Pencil-based algorithms Algorithm 1: Standard PBA algorithm
input: A tensor A ∈ R^{n_1 × n_2 × n_3}, n_1 ≥ n_2 ≥ r, of rank r.
output: Rank-1 tensors a_i ⊗ b_i ⊗ c_i of the CPD of A.
Sample a random n_3 × 2 matrix Q with orthonormal columns;
B ← (I, I, Q^T) · A;
Compute a rank-(r, r, min{r, n_3}) HOSVD (Q_1, Q_2, Q_3) · S = A;
S_1 ← (I, I, e_1^T) · S;
S_2 ← (I, I, e_2^T) · S;
Compute the eigendecomposition S_1 S_2^{−1} = X D X^{−1};
A ← Q_1 X;
A ⊙ B ⊙ C ← A ⊙ (A^† A_(1))^T;
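To see the pencil idea in action, here is a minimal plain-Matlab sketch for the easiest case n_1 = n_2 = r, so that no orthogonal compression is needed; it illustrates slides 21 to 23 and is not the Tensorlab implementation.

r = 5; n = [5 5 6];
A0 = randn(n(1), r); B0 = randn(n(2), r); C0 = randn(n(3), r);
T = zeros(n);
for i = 1:r                               % build the rank-r tensor
    T = T + reshape(kron(C0(:,i), kron(B0(:,i), A0(:,i))), n);
end
Q  = orth(randn(n(3), 2));                % random projection of the third mode
T3 = reshape(T, n(1)*n(2), n(3));
S1 = reshape(T3*Q(:,1), n(1), n(2));      % first pencil slice
S2 = reshape(T3*Q(:,2), n(1), n(2));      % second pencil slice
[X, ~] = eig(S1 / S2);                    % eigenvectors recover the first factor
A  = X;                                   % up to column scaling and permutation
W  = pinv(A) * reshape(T, n(1), n(2)*n(3));        % rows are scaled (c_i kron b_i)'
terms = zeros(prod(n), r);
for i = 1:r
    terms(:, i) = kron(W(i,:).', A(:,i)); % vectorized rank-1 terms a_i x b_i x c_i
end
norm(sum(terms, 2) - T(:)) / norm(T(:))   % small: the recovered terms sum to T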

25 Pencil-based algorithms Numerical issues A variant of this algorithm is implemented in Tensorlab v3.0 as cpd_gevd. Let us perform an experiment with it. We create the first tensor that comes to mind: a rank-5 random tensor A of size 5 × 5 × 5:
>> Ut{1} = randn(5,5);
>> Ut{2} = randn(5,5);
>> Ut{3} = randn(5,5);
>> A = cpdgen(Ut);

26 Pencil-based algorithms Compute A's decomposition and compare its distance to the input decomposition, relative to the machine precision ɛ ≈ 10^{−16}:
>> Ur = cpd_gevd(A, 5);
>> E = kr(Ut) - kr(Ur);
>> norm( E(:), 2 ) / eps
ans = 8.649e+04
This large number can arise because of a high condition number. However,
>> kappa = condition_number( Ut )
ans = .34

27 Pencil-based algorithms It thus appears that there is something wrong with the algorithm, even though it is a mathematically sound algorithm for computing low-rank CPDs. The reason is the following. Theorem (Beltrán, Breiding, V, 2018). Many algorithms based on a reduction to R^{n_1 × n_2 × 2} are numerically unstable: the forward error produced by the algorithm divided by the backward error is much larger than the condition number on an open set of inputs. These methods should be used with care as they do not necessarily yield the highest attainable precision.

28 Pencil-based algorithms The instability of the algorithm leads to an excess factor ω on top of the condition number of the computational problem. (Figure: the excess factor observed for cpd_pba and cpd_gevd in a test with random rank-1 tuples.)

29 Alternating least squares methods: Overview

30 Alternating least squares methods Alternating least squares methods The problem of computing a CPD of A ∈ F^{n_1 × ⋯ × n_d} and the problem of approximating A by a (low-rank) CPD can both be formulated as optimization problems. For example,

min_{A_k ∈ F^{n_k × r}, k=1,...,d} ‖A − Σ_{i=1}^r a_i^1 ⊗ a_i^2 ⊗ ⋯ ⊗ a_i^d‖_F,

where a_i^k denotes the i-th column of A_k. One of the earliest methods devised specifically for this problem is the alternating least squares (ALS) scheme.

31 Alternating least squares methods The ALS method is based on a reformulation of the objective function; it is equivalent to

min_{A_j ∈ F^{n_j × r}, j=1,...,d} ‖A_(k) − A_k (A_1 ⊙ ⋯ ⊙ A_{k−1} ⊙ A_{k+1} ⊙ ⋯ ⊙ A_d)^T‖

for every k = 1, 2, ..., d. The key observation is that if the A_j, j ≠ k, are fixed, then finding the optimal A_k is a linear problem! It is namely

A_k = A_(k) ((A_1 ⊙ ⋯ ⊙ A_{k−1} ⊙ A_{k+1} ⊙ ⋯ ⊙ A_d)^T)^†.

32 Alternating least squares methods The standard ALS method then solves the optimization problem by cyclically fixing all but one factor matrix A_k. Algorithm 2: ALS method
input: A tensor A ∈ F^{n_1 × ⋯ × n_d}.
input: A target rank r.
output: Factor matrices (A_1, ..., A_d) of a CPD approximating A.
Initialize factor matrices A_k ∈ F^{n_k × r} (e.g., entries sampled i.i.d. from N(0, 1), or via a truncated HOSVD);
while not converged do
  for k = 1, 2, ..., d do
    Â_k ← A_1 ⊙ ⋯ ⊙ A_{k−1} ⊙ A_{k+1} ⊙ ⋯ ⊙ A_d;
    A_k ← A_(k) (Â_k^T)^†;
  end
end
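For a third-order tensor, the scheme above can be written in a few lines of plain Matlab. The sketch below is my own minimal version, not Tensorlab's cpd_als: it uses Matlab's column-major flattenings and a local Khatri-Rao helper kr, and omits convergence tests and normalization.

function [A, B, C] = als_cpd(T, r, iters)
% Minimal ALS sketch for a third-order tensor T and target rank r.
[n1, n2, n3] = size(T);
A = randn(n1, r); B = randn(n2, r); C = randn(n3, r);
T1 = reshape(T, n1, n2*n3);                    % mode-1 flattening
T2 = reshape(permute(T, [2 1 3]), n2, n1*n3);  % mode-2 flattening
T3 = reshape(permute(T, [3 1 2]), n3, n1*n2);  % mode-3 flattening
for it = 1:iters
    A = T1 * pinv(kr(C, B)).';   % with B, C fixed, A solves a linear least squares problem
    B = T2 * pinv(kr(C, A)).';
    C = T3 * pinv(kr(B, A)).';
end
end
function K = kr(X, Y)            % columnwise Kronecker (Khatri-Rao) product
K = zeros(size(X,1)*size(Y,1), size(X,2));
for i = 1:size(X, 2)
    K(:, i) = kron(X(:,i), Y(:,i));
end
end

Calling [A, B, C] = als_cpd(T, r, 50) on a tensor of exact rank r typically drives the residual down quickly, but, as discussed on the next slide, there is no general convergence guarantee.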

33 Alternating least squares methods The ALS scheme produces a sequence of incrementally better approximations. However, the convergence properties of the ALS method are very poorly understood. The scheme has accumulation points, but it is not known if they correspond to critical points of the objective function. That is, we do not know if the accumulation points of the ALS scheme correspond to points satisfying the first-order optimality conditions. Uschmajew (2012) proved local convergence to critical points where the Hessian is positive semi-definite and of maximal rank.

34 Quasi-Newton optimization methods: Overview

35 Quasi-Newton optimization methods Quasi-Newton optimization methods The optimization problem

min_{A_k ∈ F^{n_k × r}, k=1,...,d} ‖A − Σ_{i=1}^r a_i^1 ⊗ ⋯ ⊗ a_i^d‖

can be solved using any of the standard optimization methods, such as nonlinear conjugate gradient and quasi-Newton methods. We discuss the Gauss-Newton method with a trust region as globalization method.

36 Quasi-Newton optimization methods Recall that the univariate Newton method is based on the following idea. (Slides 36 to 40 show a figure illustrating this idea.)

41 Quasi-Newton optimization methods Newton's method for minimizing a smooth real function f(x) consists of generating a sequence of iterates x_0, x_1, ..., where x_{k+1} is the minimizer of the local second-order Taylor series expansion of f(x), namely

f(x_k + p) ≈ m_{x_k}(p) := f(x_k) + p^T g_k + ½ p^T H_k p,

where g_k := ∇f(x_k) is the gradient of f at x_k, and H_k := ∇²f(x_k) is the symmetric Hessian matrix of f at x_k.

42 Quasi-Newton optimization methods Recall from calculus that the gradient of f : R^n → R with coordinates x_i on R^n is the column vector

∇f(y) := [ ∂f/∂x_1(y); ∂f/∂x_2(y); ...; ∂f/∂x_n(y) ].

The Hessian matrix is the n × n matrix of second-order partial derivatives,

∇²f(y) := [ ∂²f/∂x_i ∂x_j (y) ]_{i,j=1}^n.

43 Quasi-Newton optimization methods The optimal search direction p according to the second-order model should satisfy the first-order optimality conditions. That is,

0 = ∇m_{x_k}(p) = g_k + H_k p.

Hence, p = −H_k^{−1} g_k; this is called the Newton search direction.

44 Quasi-Newton optimization methods Hence, the basic Newton method is obtained. Algorithm 3: Newton's method
input: An objective function f.
input: A starting point x_0 ∈ R^n.
output: A critical point x_* of the objective function f.
k ← 0;
while not converged do
  Compute the gradient g_k = ∇f(x_k);
  Compute the Hessian H_k = ∇²f(x_k);
  p_k ← −H_k^{−1} g_k;
  x_{k+1} ← x_k + p_k;
  k ← k + 1;
end
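As a minimal plain-Matlab sketch, here is Algorithm 3 applied to a small smooth convex function of my own choosing (not an example from the slides), with the gradient and Hessian coded by hand:

f = @(x) exp(x(1)+3*x(2)) + exp(x(1)-3*x(2)) + exp(-x(1));
g = @(x) [exp(x(1)+3*x(2)) + exp(x(1)-3*x(2)) - exp(-x(1)); ...
          3*exp(x(1)+3*x(2)) - 3*exp(x(1)-3*x(2))];
H = @(x) [exp(x(1)+3*x(2)) + exp(x(1)-3*x(2)) + exp(-x(1)), 3*exp(x(1)+3*x(2)) - 3*exp(x(1)-3*x(2)); ...
          3*exp(x(1)+3*x(2)) - 3*exp(x(1)-3*x(2)),          9*exp(x(1)+3*x(2)) + 9*exp(x(1)-3*x(2))];
x = [0.5; 0.5];
for k = 1:10
    p = -H(x) \ g(x);                    % Newton search direction
    x = x + p;
    fprintf('k = %2d, |grad f| = %.3e\n', k, norm(g(x)));
end

The printed gradient norms shrink steadily at first and then collapse towards machine precision within a few steps, which is the quadratic convergence discussed on the next slide.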

45 Quasi-Newton optimization methods Newton's method is locally quadratically convergent: if the initial iterate x_0 is sufficiently close to the solution x_*, then

‖x_* − x_{k+1}‖ = O(‖x_* − x_k‖²).

This means that close to the solution the number of correct digits doubles every step.

46 Quasi-Newton optimization methods For example, Newton's method applied to a small bivariate system f(x, y) = 0 converges to the root starting from x_0 = (3, 3). (Figure: a semilogarithmic plot of the error ‖x_* − x_k‖ versus the iteration number k.)

47 Quasi-Newton optimization methods While Newton's method has great local convergence properties, the plain version is not suitable because: 1 it has no guaranteed global convergence, and 2 the Hessian matrix can be difficult to compute. These problems are addressed respectively by 1 incorporating a trust region scheme, and 2 using a cheap approximation of the Hessian matrix.

48 Quasi-Newton optimization methods The Gauss-Newton Hessian approximation A cheap approximation of the Hessian matrix is available for nonlinear least squares problems. In this case, the objective function takes the form

f(x) = ½‖F(x)‖² = ½⟨F(x), F(x)⟩, where F : R^n → R^m.

49 Quasi-Newton optimization methods The gradient of a least-squares objective function f at x is

∇f(x) = (J_F(x))^T F(x),

where J_F(x) ∈ R^{m × n} is the Jacobian matrix of F. That is, if F_k(x) denotes the kth component function of F, then

J_F(x) := [ ∂F_i/∂x_j (x) ]_{i=1,...,m; j=1,...,n}.

50 Quasi-Newton optimization methods The Hessian matrix of f is

∇²f(x) = (J_F(x))^T (J_F(x)) + ⟨dJ_F(x), F(x)⟩.

Near a solution, we hope to have F(x_*) ≈ 0, so that the last term often has a negligible contribution. This reasoning leads to the Gauss-Newton approximation

(J_F(x))^T (J_F(x)) ≈ ∇²f(x).

Replacing the Hessian with the Gauss-Newton approximation yields local linear convergence. If f(x_*) = 0 at a solution, then the local convergence is quadratic.
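A minimal plain-Matlab sketch of a Gauss-Newton iteration on a toy nonlinear least squares problem of my own (fitting an assumed model y ≈ θ_1 e^{θ_2 t}; nothing here comes from the slides):

t = (0:0.1:1)';
y = 2*exp(-1.5*t) + 0.01*randn(size(t));         % synthetic data
F = @(th) th(1)*exp(th(2)*t) - y;                % residual vector F(theta)
J = @(th) [exp(th(2)*t), th(1)*t.*exp(th(2)*t)]; % Jacobian of F
th = [1; -1];                                    % starting point
for k = 1:20
    p  = -J(th) \ F(th);   % Gauss-Newton step: least-squares solution of J*p = -F
    th = th + p;
end
th                                               % close to [2; -1.5]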

51 Quasi-Newton optimization methods Trust region globalization The method sketched thus far has no global convergence guarantees. Furthermore, the Gauss-Newton approximation of the Hessian could be very ill-conditioned, resulting in large updates. The trust region globalization scheme can solve both of these problems. Let J_k := J_F(x_k). The idea is to trust the local second-order model at x_k,

m_{x_k}(p) = f(x_k) + p^T g_k + ½ p^T J_k^T J_k p,

only in a small neighborhood around x_k.

52 Quasi-Newton optimization methods Instead of taking the unconstrained minimizer of m_{x_k}(p), a trust region method solves the trust region subproblem:

min_{p ∈ R^n} m_{x_k}(p) subject to ‖p‖ ≤ Δ_k,

where Δ_k > 0 is the trust region radius. (Figure: the trust region ball of radius Δ_k around x_k and the step p.)

53 Quasi-Newton optimization methods The trust region radius is modified in every step according to the following scheme. Let the computed update direction be p_k, with ‖p_k‖ ≤ Δ_k. The trustworthiness of the second-order model is defined as

ρ_k = ( f(x_k) − f(x_k + p_k) ) / ( m_{x_k}(0) − m_{x_k}(p_k) ).

If the trustworthiness ρ_k is very high (e.g., ρ_k ≥ 0.75) and if in addition ‖p_k‖ = Δ_k, then the trust region radius is increased (e.g., Δ_{k+1} = 2Δ_k). On the other hand, if ρ_k ≤ β is very low (e.g., ρ_k ≤ 0.25), then the trust region radius is decreased (e.g., Δ_{k+1} = Δ_k/4).

54 Quasi-Newton optimization methods The trustworthiness ρ_k is also used to decide whether or not to accept a step in the direction of the computed p_k. If ρ_k ≤ γ ≤ β is very small (e.g., ρ_k close to 0), then the search direction p_k is rejected. Otherwise, p_k is accepted as a good direction, and we set x_{k+1} = x_k + p_k.
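A minimal sketch of this bookkeeping as a plain-Matlab helper; the concrete thresholds 0.75, 0.25 and 0.1 and the cap Delta_max are common choices and partly my own assumptions rather than values taken from the slides.

function [x, Delta] = tr_update(x, p, rho, Delta, Delta_max)
% One trust-region bookkeeping step: update the radius and accept/reject p.
if rho >= 0.75 && abs(norm(p) - Delta) <= 1e-12*Delta
    Delta = min(2*Delta, Delta_max);   % very trustworthy model and a full step: grow
elseif rho <= 0.25
    Delta = Delta / 4;                 % untrustworthy model: shrink
end
if rho >= 0.1
    x = x + p;                         % accept the step
end                                    % otherwise x is kept and only Delta changes
end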

55 Quasi-Newton optimization methods For (approximately) solving the trust region subproblem

min_{p ∈ R^n} m_{x_k}(p) subject to ‖p‖ ≤ Δ_k,

one can exploit the following fact. If the unconstrained minimizer

p_k^* = −(J_k^T J_k)^{−1} g_k = −(J_k^T J_k)^{−1} J_k^T r_k = −J_k^† r_k, where r_k := F(x_k),

falls within the trust region, ‖p_k^*‖ ≤ Δ_k, then it is the optimal solution of the trust region subproblem. Otherwise, there exists a λ > 0 such that the optimal solution p_k satisfies

(J_k^T J_k + λI) p_k = −J_k^T r_k with ‖p_k‖ = Δ_k.

Nocedal and Wright (2006) discuss strategies for finding λ.
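Many implementations do not solve for λ exactly but settle for the classical dogleg approximation of the subproblem; the sketch below is the textbook variant of Nocedal and Wright in plain Matlab (the dogleg used on slide 62 differs slightly in the intermediate case).

function p = dogleg(J, r, Delta)
% Approximate the Gauss-Newton trust-region subproblem by the dogleg path.
g  = J' * r;                        % gradient of f(x) = 0.5*||F(x)||^2
pN = -(J \ r);                      % unconstrained Gauss-Newton minimizer
if norm(pN) <= Delta
    p = pN; return                  % interior solution: take the full GN step
end
pC = -((g'*g) / norm(J*g)^2) * g;   % Cauchy point: minimizer along -g
if norm(pC) >= Delta
    p = -(Delta / norm(g)) * g;     % steepest descent clipped to the boundary
    return
end
d = pN - pC;                        % otherwise walk from pC towards pN ...
a = d'*d; b = 2*(pC'*d); c = pC'*pC - Delta^2;
tau = (-b + sqrt(b^2 - 4*a*c)) / (2*a);
p = pC + tau*d;                     % ... until the boundary ||p|| = Delta is hit
end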

56 Quasi-Newton optimization methods Several variations of this quasi-Newton method with trust region globalization are implemented in Tensorlab as cpd_nls. See Sorber, Van Barel, De Lathauwer (2013) for details.

57 Riemannian quasi-Newton optimization methods: Overview

58 Riemannian quasi-Newton optimization methods Riemannian optimization An alternative way to formulate the approximation of a tensor by a low-rank CPD consists of optimizing over a product of Segre manifolds:

min_{(A_1,...,A_r) ∈ S × ⋯ × S} ‖Φ_r(A_1, ..., A_r) − A‖_F.

This is an optimization of a differentiable function over a smooth manifold. Such problems are studied in Riemannian optimization.

59 Riemannian quasi-Newton optimization methods In general, if M ⊆ R^N is an m-dimensional smooth manifold and F : M → R^n a smooth function, then min_{x ∈ M} ‖F(x)‖ is a Riemannian optimization problem that can be solved by, e.g., a Riemannian Gauss-Newton method; see Absil, Mahony, Sepulchre (2008).

60 Riemannian quasi-Newton optimization methods In an RGN method, the objective function f(x) = ½‖Φ_r(x) − A‖² is locally approximated at a ∈ S^{×r} by the quadratic model

m_a(t) := f(a) + ⟨d_a f, t⟩ + ½⟨t, (d_a Φ_r^* ∘ d_a Φ_r)(t)⟩,

where H_a := d_a Φ_r^* ∘ d_a Φ_r is the GN Hessian approximation, and ⟨·, ·⟩ is the inner product inherited from the ambient R^N. Note that the domain of m_a is now T_a S^{×r}.

61 Riemannian quasi-Newton optimization methods As before, the RGN method with trust region considers the model to be accurate only in a radius Δ about a. (Figure: the trust region in the tangent space at a and the step p.) The trust region subproblem (TRS) is

min_{t ∈ T_a S^{×r}} m_a(t) subject to ‖t‖ ≤ Δ,

whose solution p ∈ T_a S^{×r} yields the next search direction.

62 Riemannian quasi-Newton optimization methods In Breiding, V (2018), the TRS is solved by combining a standard dogleg step with a hot restarting scheme. Let g_a be the coordinate representation of d_a f, and let H_a be the matrix of d_a Φ_r^* ∘ d_a Φ_r. The dogleg step approximates the solution p of the TRS by

p = p_N := −H_a^{−1} g_a, if ‖p_N‖ ≤ Δ;
p = p_C := −( g_a^T g_a / g_a^T H_a g_a ) g_a, if ‖p_N‖ > Δ and ‖p_C‖ ≥ Δ;
p = p_I := p_C + (τ − 1)(p_N − p_C) s.t. ‖p_I‖ = Δ, otherwise,

where τ is the solution of ‖p_C + (τ − 1)(p_N − p_C)‖ = Δ.

63 Riemannian quasi-Newton optimization methods The Newton direction p_N = −H_a^{−1} g_a is vital to the dogleg step. Unfortunately, H_a = d_a Φ_r^* ∘ d_a Φ_r can be close to a singular matrix. In fact,

‖H_a^{−1}‖ = 1 / ς_m(d_a Φ_r)² =: κ(a)², where m = dim S^{×r},

so H_a is ill-conditioned if and only if the CPD is ill-conditioned at a. Whenever H_a is close to a singular matrix, we suggest applying random perturbations to the current decomposition a until H_a is sufficiently well-behaved. We call this hot restarting.

64 Riemannian quasi-Newton optimization methods Retraction We need to advance from a ∈ S^{×r} to a new point of S^{×r} along the direction p. However, while a + p ∈ T_a S^{×r} (identified with an affine subspace through a), this point does not in general lie in S^{×r}! (Figure: the tangent space T_a S^{×r} at a, the step p, and the retracted point R_a(p) on S^{×r}.) We need a retraction operator (Absil, Mahony, Sepulchre, 2008) for smoothly mapping a neighborhood of 0 ∈ T_a S^{×r} back to S^{×r}.

65 Riemannian quasi-Newton optimization methods Given a retraction operator R for S, a retraction operator R for the product manifold S^{×r} = S × ⋯ × S at a = (A_1, ..., A_r) is

R_a(·) := (R_{A_1} × R_{A_2} × ⋯ × R_{A_r})(·),

which is called the product retraction. Some known retraction operators for S are 1 the rank-(1, ..., 1) T-HOSVD, and 2 the rank-(1, ..., 1) ST-HOSVD, both essentially proved by Kressner, Steinlechner, Vandereycken (2014).
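To get a feel for what a retraction does for a single Segre factor, the following plain-Matlab sketch retracts an ambient step by computing a dominant rank-1 approximation with a few higher-order power iterations. This is only an illustration of the concept: the slides advocate the rank-(1, ..., 1) T-HOSVD or ST-HOSVD instead, and the iteration below is not what Tensorlab or the RGN method uses.

n = [5 4 3];
a = randn(n(1),1); b = randn(n(2),1); c = randn(n(3),1);
A = reshape(kron(c, kron(b, a)), n);            % point on the Segre manifold
P = A + 0.05*randn(n);                          % A plus a small ambient step
u = randn(n(1),1); v = randn(n(2),1); w = randn(n(3),1);
for it = 1:20                                   % higher-order power iteration
    u = reshape(P, n(1), []) * kron(w, v); u = u / norm(u);
    v = reshape(permute(P, [2 1 3]), n(2), []) * kron(w, u); v = v / norm(v);
    w = reshape(permute(P, [3 1 2]), n(3), []) * kron(v, u); w = w / norm(w);
end
sigma = u' * reshape(P, n(1), []) * kron(w, v);
R = reshape(kron(w, kron(v, sigma*u)), n);      % retracted rank-1 tensor
norm(R(:) - A(:)) / norm(A(:))                  % stays close to A for a small step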

66 Riemannian quasi-Newton optimization methods RGN with trust region method:
S1. Choose random initial points A_i ∈ S.
S2. Let a^{(0)} ← (A_1, ..., A_r), and set k ← 0.
S3. Choose a trust region radius Δ > 0.
S4. While not converged, do:
S4.1. Solve the trust region subproblem, resulting in p_k ∈ T_{a^{(k)}} S^{×r}.
S4.2. Compute the tentative next iterate a^{(k+1)} ← R_{a^{(k)}}(p_k) via a retraction in the direction of p_k from a^{(k)}.
S4.3. Accept or reject the next iterate. If the former, increment k.
S4.4. Update the trust region radius.
The details can be found in Breiding, V (2018).

67 Riemannian quasi-Newton optimization methods Numerical experiments We compare the RGN method of Breiding, V (2018) with some state-of-the-art nonlinear least squares solvers in Tensorlab v3.0 (Vervliet et al., 2016), namely nls_lm and nls_gndl, both with the LargeScale option turned off and on.

68 Riemannian quasi-Newton optimization methods We consider parameterized tensors in R^{n_1 × n_2 × n_3} with varying condition numbers. There are three parameters: 1 c ∈ [0, 1] regulates the collinearity of the factor matrices, 2 s regulates the scaling, and 3 r is the rank. Typically, 1 increasing c increases the geometric condition number, 2 increasing s increases the classic condition number, and 3 increasing r decreases the probability of finding a decomposition. See the afternotes for the precise construction.

69 Riemannian quasi-Newton optimization methods The true rank-r tensor is then

A = Σ_{i=1}^r a_i^1 ⊗ a_i^2 ⊗ a_i^3.

Finally, we normalize the tensor and add random Gaussian noise E ∈ R^{n_1 × n_2 × n_3} of magnitude τ:

B = A/‖A‖_F + τ E/‖E‖_F.

The tensor B is the one we would like to approximate by a tensor of rank r.
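In plain Matlab, the last two steps look as follows; the sizes, the rank and the noise level are placeholder values, and the collinearity/scaling construction of the factor vectors is omitted (the factors below are plain Gaussian).

n = [15 15 15]; r = 10; tau = 1e-5;
a1 = randn(n(1), r); a2 = randn(n(2), r); a3 = randn(n(3), r);
A = zeros(n);
for i = 1:r
    A = A + reshape(kron(a3(:,i), kron(a2(:,i), a1(:,i))), n);   % rank-1 terms
end
E = randn(n);                                  % Gaussian noise tensor
B = A / norm(A(:)) + tau * E / norm(E(:));     % normalized tensor plus scaled noise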

70 Riemannian quasi-Newton optimization methods We will choose k random starting points and then apply each of the methods to each of the starting points. The key performance criterion (on a single processor) is the expected time to success (TTS). Let 1 the probability of success be p_S, 2 the probability of failure be p_F = 1 − p_S, 3 a successful decomposition take m_S seconds, and 4 a failed decomposition take m_F seconds. Then, the expected time to a first success is

E[TTS] = Σ_{k=0}^∞ p_F^k p_S (m_S + k m_F) = (p_S m_S + p_F m_F) / p_S.
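The closed form is easy to sanity-check by simulation; the probabilities and timings below are made-up placeholders, not measurements from the experiments.

pS = 0.3; mS = 4.0; mF = 1.5;            % assumed success probability and timings
pF = 1 - pS;
tts_formula = (pS*mS + pF*mF) / pS
N = 1e5; tts = zeros(N, 1);
for j = 1:N
    t = 0;
    while rand > pS                      % repeat failed attempts ...
        t = t + mF;
    end
    tts(j) = t + mS;                     % ... until the first success
end
mean(tts)                                % agrees with tts_formula up to sampling error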

71 Riemannian quasi-Newton optimization methods Speedup of RGN-HR (Figure: speedup of RGN-HR over RGN-Reg, GNDL, and GNDL-PCG as a function of the collinearity c and the scaling s, for ranks r = 5 up to r = 30; noise level τ = 10^{−3}.)

72 Riemannian quasi-Newton optimization methods Speedup of RGN-HR (Figure: speedup of RGN-HR over RGN-Reg, GNDL, and GNDL-PCG as a function of the collinearity c and the scaling s, for ranks r = 5 up to r = 30; noise level τ = 10^{−5}.)

73 Riemannian quasi-Newton optimization methods Speedup of RGN-HR (Figure: speedup of RGN-HR over RGN-Reg, GNDL, and GNDL-PCG as a function of the collinearity c and the scaling s, for ranks r = 5 up to r = 30; noise level τ = 10^{−7}.)

74 Riemannian quasi-Newton optimization methods Speedup of RGN-HR (Figure: speedup of RGN-HR over GNDL and GNDL-PCG as a function of the scaling s and the rank r for a second family of test tensors; some runs failed, giving an infinite speedup; noise level τ = 10^{−5}.)

75 Riemannian quasi-Newton optimization methods Convergence plots (Figures on slides 75 to 84: objective value versus time in seconds for RGN-HR, RGN-Reg, GNDL, LM, GNDL-PCG, and LM-PCG on model tensors of rank 7 at noise level τ = 10^{−5}, for several scalings s.)

85 Tensorlab: Overview

86 Tensorlab Tensorlab is a package facilitating computations with both structured and unstructured tensors in Matlab. Development on Tensorlab v1.0 started at KU Leuven in the research group of L. De Lathauwer. The current version, v3.0, was released in March 2016. Version 4.0 is anticipated next week!

87 Tensorlab Tensorlab focuses on interpretable models, offering algorithms for working with Tucker decompositions, tensor rank decompositions, and certain block term decompositions. In addition, it offers a flexible framework called structured data fusion in which the various decompositions can be coupled or fused while imposing additional structural constraints, such as symmetry, sparsity and nonnegativity.

88 Tensorlab, Basic operations: Overview

89 Tensorlab Basic operations A dense tensor is represented as a plain Matlab array:
A = randn(3, 5, 7, 2);
A(:, :, :, :)
Matlab then prints the slices ans(:, :, i, j) of the array.

90 Tensorlab Basic operations The Frobenius norm of a tensor is computed as follows.
A = reshape(1:100, [5 10 2]);
frob(A)
ans = 581.6786
frob(A, 'squared') - 100*101*201/6
ans = 0

91 Tensorlab Basic operations Flattenings are computed as follows.
A = reshape(1:24, [2 3 4]);
% mode-1 flattening
tens2mat(A, [1], [2 3])
% mode-2 flattening
tens2mat(A, [2], [1 3])

92 Tensorlab Basic operations Multilinear multiplications are computed as follows.
A = randn(5, 5, 5);
U1 = randn(5, 5); U2 = randn(5, 5); U3 = randn(5, 5);
T = tmprod(A, {U1, U2, U3}, 1:3);
X1 = tmprod(A, {U1, U3}, [1 3], 'H');
X2 = tmprod(A, {U1', U3'}, [1 3]);
frob(X1 - X2)
ans = 0

93 Tensorlab Basic operations Tensorlab has a nice way for visualizing third-order tensors.
U = {1.5.^(0:25)', linspace(0, 1, 50)', 2*abs(sin(linspace(0, 4*pi, 50)))'};
A = cpdgen(U);
voxel3(A)
This produces a voxel plot of the rank-1 tensor A (graphic omitted).

94 Tensorlab, Tucker decomposition: Overview

95 Tensorlab Tucker decomposition Given a core tensor and the factor matrices, Tensorlab can generate the full tensor represented by this Tucker decomposition as follows.
S = randn(5, 5, 5);
U = {randn(4, 5), randn(6, 5), randn(7, 5)};
T1 = lmlragen(U, S);
size(T1)
ans = 4 6 7
% Compare with the definition
T2 = tmprod(S, U, 1:3);
frob(T1 - T2)
ans = 1.435e-15

96 Tensorlab Tucker decomposition The HOSVD is called the multilinear singular value decomposition in Tensorlab and can be computed as follows.
[F, C] = lmlra_rnd([ ], [ ]);
A = lmlragen(F, C);
[U, S, sv] = mlsvd(A);   % 0.3 s

97 Tensorlab Tucker decomposition The factor-k singular values can be plotted by running
for k = 1:4, semilogy(sv{k}, 'x'), hold all, end
legend('factor 1', 'factor 2', 'factor 3', 'factor 4')
which produces a plot of the mode-k singular values (graphic omitted).

98 Tensorlab Tucker decomposition Of practical importance are truncated HOSVDs. The default strategy in Tensorlab is sequential truncation, while parallel truncation is provided as an option.
[F, C] = lmlra_rnd([ ], [ ]);
A = lmlragen(F, C) + 1e-5*randn([ ]);
% Sequential truncation
[U, S] = mlsvd(A, [ ]);          % 6.5 s
[U, S] = mlsvd(A, 5e-12, 1:3);   % 4.6 s
% Parallel truncation
[V, T] = mlsvd(A, [ ], 0);       % s
[V, T] = mlsvd(A, 5e-12, 0);     % s

99 Tensorlab Tucker decomposition Tensorlab also offers optimization algorithms for seeking the optimal low multilinear rank approximation of a tensor. By default Tensorlab chooses an initial point by either a fast approximation to the ST-HOSVD via randomized SVDs (mlsvd_rsi), or by adaptive cross approximation (lmlra_aca). The basic usage is as follows:
[F, C] = lmlra_rnd([ ], [ ]);
A = lmlragen(F, C);
% Compute an lmlra using default settings.
[U, S] = lmlra(A, [ ]);

100 Tensorlab, Tensor rank decomposition: Overview

101 Tensorlab Tensor rank decomposition Tensorlab represents a CPD by a cell array containing its factor matrices. For example, random factor matrices can be generated as follows:
sizeTensor = [ ];
rnk = 5;
F = cpd_rnd(sizeTensor, rnk)
F = 1x3 cell array of factor matrices, each with 5 columns
The tensor represented by these factor matrices can be generated as follows:
A = cpdgen(F);

102 Tensorlab Tensor rank decomposition Tensorlab offers several algorithms for computing CPDs. A deterministic but unstable PBA that can be applied if the rank is smaller than two of the dimensions is cpd_gevd.
tf = cpd_rnd([ ], 5);
A = cpdgen(tf);
af = cpd_gevd(A, 5);
frobcpdres(A, af)
ans = .966 e
Warning: cpd_gevd returns random factor matrices if the assumptions of the method are not satisfied.

103 Tensorlab Tensor rank decomposition Several optimization methods are implemented in Tensorlab for approximating a tensor by a low-rank CPD. The advised way of computing a CPD is via the driver routine cpd, which automatically performs several steps: 1 optional Tucker compression, 2 choice of initialization (cpd_gevd if possible, otherwise random), 3 optional decompression and refinement if compression was applied.

104 Tensorlab Tensor rank decomposition The basic usage is as follows:
tf = cpd_rnd([ ], 5);
A = cpdgen(tf);
af = cpd(A, 5);
frobcpdres(A, af)
ans = 5.09 e 4

105 Tensorlab Tensor rank decomposition Optionally, some algorithm options can be specified. The most important options are the following. Compression: the string 'auto', the boolean false, or a function handle to the Tucker compression algorithm, for example mlsvd, lmlra_aca or mlsvd_rsi. Initialization: the string 'auto', or a function handle to the initialization method that should be employed, for example cpd_rnd or cpd_gevd. Algorithm: a function handle to the optimization method for computing the tensor rank decomposition, for example cpd_als, cpd_nls or cpd_minf. AlgorithmOptions: a structure containing the options for the optimization method.

106 Tensorlab Tensor rank decomposition The algorithm options are important in their own right, as they determine how much computational effort is spent and how accurate the returned solution will be. The main options are the following. TolAbs: The tolerance for the squared error between the tensor represented by the current factor matrices and the given tensor. If it is less than this value, the optimization method halts successfully. TolFun: The tolerance for the relative change in function value. If it is less than this value, the optimization method halts successfully. TolX: The tolerance for the relative change in factor matrices. If it is less than this value, the optimization method halts successfully. MaxIter: The maximum number of iterations the optimization method will perform before giving up.

107 Tensorlab Tensor rank decomposition For example,
algOptions = struct;
algOptions.TolAbs = 1e-12;
algOptions.TolFun = 1e-12;
algOptions.TolX = 1e-8;
algOptions.MaxIter = 500;
algOptions.Algorithm = @nls_gndl;
options = struct;
options.Compression = false;
options.Initialization = @cpd_rnd;
options.Algorithm = @cpd_nls;
options.AlgorithmOptions = algOptions;

108 Tensorlab Tensor rank decomposition
% Create an easy problem
tf = cpd_rnd([ ], 5);
A = cpdgen(tf);
% Solve using the options
[af, out] = cpd(A, 5, options);
frobcpdres(A, af)
ans = e 0

109 Tensorlab Tensor rank decomposition You can also ask for detailed information about the progression of the optimization method.
out.Algorithm
ans =
  struct with fields:
            Name: 'cpd_nls'
           alpha: [ ]
    cgiterations: [ ]
        cgrelres: [ ]
           delta: [ ]
            fval: [ ]
            info: 4   % Absolute tolerance reached
          infops: [ ]
      iterations: 5
          relerr: 5.9 e 4
         relfval: [ ]
         relstep: [ ]
             rho: [ ]

110 References: Overview

111 References References for sensitivity Bürgisser, Cucker, Condition: The Geometry of Numerical Algorithms, Springer, 2013. Blum, Cucker, Shub, Smale, Complexity and Real Computation, Springer, 1998. Breiding, Vannieuwenhoven, The condition number of join decompositions, SIAM Journal on Matrix Analysis and Applications, 2017. Rice, A theory of condition, SIAM Journal on Numerical Analysis, 1966.

112 References References for algorithms Absil, Mahony, Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, 2008. Beltrán, Breiding, Vannieuwenhoven, Pencil-based algorithms for tensor decomposition are unstable, arXiv preprint, 2018. Breiding, Vannieuwenhoven, A Riemannian trust region method for the canonical tensor rank approximation problem, SIAM Journal on Optimization, 2018. Kolda, Bader, Tensor decompositions and applications, SIAM Review, 2009. Kressner, Steinlechner, Vandereycken, Low-rank tensor completion by Riemannian optimization, BIT, 2014. Leurgans, Ross, Abel, A decomposition for three-way arrays, SIAM Journal on Matrix Analysis and Applications, 1993. Nocedal, Wright, Numerical Optimization, Springer, 2006.

113 References References for algorithms Sorber, Van Barel, De Lathauwer, Optimization-based algorithms for tensor decompositions: Canonical polyadic decomposition, decomposition in rank-(L_r, L_r, 1) terms, and a new generalization, SIAM Journal on Optimization, 2013. Uschmajew, Local convergence of the alternating least squares algorithm for canonical tensor approximation, SIAM Journal on Matrix Analysis and Applications, 2012.

114 References References for Tensorlab Sidiropoulos, De Lathauwer, Fu, Huang, Papalexakis, Faloutsos, Tensor decompositions for signal processing and machine learning, IEEE Transactions on Signal Processing, 2017. Sorber, Van Barel, De Lathauwer, Optimization-based algorithms for tensor decompositions: Canonical polyadic decomposition, decomposition in rank-(L_r, L_r, 1) terms, and a new generalization, SIAM Journal on Optimization, 2013. Vervliet, Debals, Sorber, Van Barel, De Lathauwer, Tensorlab v3.0, available from www.tensorlab.net, March 2016. Vervliet, De Lathauwer, Numerical optimization based algorithms for data fusion, 2018.


More information

Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization

Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization Frank E. Curtis, Lehigh University Beyond Convexity Workshop, Oaxaca, Mexico 26 October 2017 Worst-Case Complexity Guarantees and Nonconvex

More information

Implicitly and Explicitly Constrained Optimization Problems for Training of Recurrent Neural Networks

Implicitly and Explicitly Constrained Optimization Problems for Training of Recurrent Neural Networks Implicitly and Explicitly Constrained Optimization Problems for Training of Recurrent Neural Networks Carl-Johan Thore Linköping University - Division of Mechanics 581 83 Linköping - Sweden Abstract. Training

More information

Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem

Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem min{f (x) : x R n }. The iterative algorithms that we will consider are of the form x k+1 = x k + t k d k, k = 0, 1,...

More information

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0 Numerical Analysis 1 1. Nonlinear Equations This lecture note excerpted parts from Michael Heath and Max Gunzburger. Given function f, we seek value x for which where f : D R n R n is nonlinear. f(x) =

More information

Computational Linear Algebra

Computational Linear Algebra Computational Linear Algebra PD Dr. rer. nat. habil. Ralf-Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2018/19 Part 6: Some Other Stuff PD Dr.

More information

Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem

Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem min{f (x) : x R n }. The iterative algorithms that we will consider are of the form x k+1 = x k + t k d k, k = 0, 1,...

More information

Fundamentals of Multilinear Subspace Learning

Fundamentals of Multilinear Subspace Learning Chapter 3 Fundamentals of Multilinear Subspace Learning The previous chapter covered background materials on linear subspace learning. From this chapter on, we shall proceed to multiple dimensions with

More information

Matrix Derivatives and Descent Optimization Methods

Matrix Derivatives and Descent Optimization Methods Matrix Derivatives and Descent Optimization Methods 1 Qiang Ning Department of Electrical and Computer Engineering Beckman Institute for Advanced Science and Techonology University of Illinois at Urbana-Champaign

More information

Multi-Linear Mappings, SVD, HOSVD, and the Numerical Solution of Ill-Conditioned Tensor Least Squares Problems

Multi-Linear Mappings, SVD, HOSVD, and the Numerical Solution of Ill-Conditioned Tensor Least Squares Problems Multi-Linear Mappings, SVD, HOSVD, and the Numerical Solution of Ill-Conditioned Tensor Least Squares Problems Lars Eldén Department of Mathematics, Linköping University 1 April 2005 ERCIM April 2005 Multi-Linear

More information

Solving linear equations with Gaussian Elimination (I)

Solving linear equations with Gaussian Elimination (I) Term Projects Solving linear equations with Gaussian Elimination The QR Algorithm for Symmetric Eigenvalue Problem The QR Algorithm for The SVD Quasi-Newton Methods Solving linear equations with Gaussian

More information

REPORTS IN INFORMATICS

REPORTS IN INFORMATICS REPORTS IN INFORMATICS ISSN 0333-3590 A class of Methods Combining L-BFGS and Truncated Newton Lennart Frimannslund Trond Steihaug REPORT NO 319 April 2006 Department of Informatics UNIVERSITY OF BERGEN

More information

Stochastic Optimization Algorithms Beyond SG

Stochastic Optimization Algorithms Beyond SG Stochastic Optimization Algorithms Beyond SG Frank E. Curtis 1, Lehigh University involving joint work with Léon Bottou, Facebook AI Research Jorge Nocedal, Northwestern University Optimization Methods

More information

10.34 Numerical Methods Applied to Chemical Engineering Fall Quiz #1 Review

10.34 Numerical Methods Applied to Chemical Engineering Fall Quiz #1 Review 10.34 Numerical Methods Applied to Chemical Engineering Fall 2015 Quiz #1 Review Study guide based on notes developed by J.A. Paulson, modified by K. Severson Linear Algebra We ve covered three major topics

More information

Optimization Tutorial 1. Basic Gradient Descent

Optimization Tutorial 1. Basic Gradient Descent E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.

More information

1 Non-negative Matrix Factorization (NMF)

1 Non-negative Matrix Factorization (NMF) 2018-06-21 1 Non-negative Matrix Factorization NMF) In the last lecture, we considered low rank approximations to data matrices. We started with the optimal rank k approximation to A R m n via the SVD,

More information

Line Search Algorithms

Line Search Algorithms Lab 1 Line Search Algorithms Investigate various Line-Search algorithms for numerical opti- Lab Objective: mization. Overview of Line Search Algorithms Imagine you are out hiking on a mountain, and you

More information

Newton-type Methods for Solving the Nonsmooth Equations with Finitely Many Maximum Functions

Newton-type Methods for Solving the Nonsmooth Equations with Finitely Many Maximum Functions 260 Journal of Advances in Applied Mathematics, Vol. 1, No. 4, October 2016 https://dx.doi.org/10.22606/jaam.2016.14006 Newton-type Methods for Solving the Nonsmooth Equations with Finitely Many Maximum

More information

Low Rank Approximation Lecture 7. Daniel Kressner Chair for Numerical Algorithms and HPC Institute of Mathematics, EPFL

Low Rank Approximation Lecture 7. Daniel Kressner Chair for Numerical Algorithms and HPC Institute of Mathematics, EPFL Low Rank Approximation Lecture 7 Daniel Kressner Chair for Numerical Algorithms and HPC Institute of Mathematics, EPFL daniel.kressner@epfl.ch 1 Alternating least-squares / linear scheme General setting:

More information

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL) Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective

More information

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005 3 Numerical Solution of Nonlinear Equations and Systems 3.1 Fixed point iteration Reamrk 3.1 Problem Given a function F : lr n lr n, compute x lr n such that ( ) F(x ) = 0. In this chapter, we consider

More information

Homework 5. Convex Optimization /36-725

Homework 5. Convex Optimization /36-725 Homework 5 Convex Optimization 10-725/36-725 Due Tuesday November 22 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)

More information

Lecture 7 Unconstrained nonlinear programming

Lecture 7 Unconstrained nonlinear programming Lecture 7 Unconstrained nonlinear programming Weinan E 1,2 and Tiejun Li 2 1 Department of Mathematics, Princeton University, weinan@princeton.edu 2 School of Mathematical Sciences, Peking University,

More information

Matrix Assembly in FEA

Matrix Assembly in FEA Matrix Assembly in FEA 1 In Chapter 2, we spoke about how the global matrix equations are assembled in the finite element method. We now want to revisit that discussion and add some details. For example,

More information

Fitting a Tensor Decomposition is a Nonlinear Optimization Problem

Fitting a Tensor Decomposition is a Nonlinear Optimization Problem Fitting a Tensor Decomposition is a Nonlinear Optimization Problem Evrim Acar, Daniel M. Dunlavy, and Tamara G. Kolda* Sandia National Laboratories Sandia is a multiprogram laboratory operated by Sandia

More information

CHAPTER 10: Numerical Methods for DAEs

CHAPTER 10: Numerical Methods for DAEs CHAPTER 10: Numerical Methods for DAEs Numerical approaches for the solution of DAEs divide roughly into two classes: 1. direct discretization 2. reformulation (index reduction) plus discretization Direct

More information

Numerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09

Numerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09 Numerical Optimization 1 Working Horse in Computer Vision Variational Methods Shape Analysis Machine Learning Markov Random Fields Geometry Common denominator: optimization problems 2 Overview of Methods

More information

Algunos ejemplos de éxito en matemáticas computacionales

Algunos ejemplos de éxito en matemáticas computacionales Algunos ejemplos de éxito en matemáticas computacionales Carlos Beltrán Coloquio UC3M Leganés 4 de febrero de 2015 Solving linear systems Ax = b where A is n n, det(a) 0 Input: A M n (C), det(a) 0. b C

More information

Chapter 8 Structured Low Rank Approximation

Chapter 8 Structured Low Rank Approximation Chapter 8 Structured Low Rank Approximation Overview Low Rank Toeplitz Approximation Low Rank Circulant Approximation Low Rank Covariance Approximation Eculidean Distance Matrix Approximation Approximate

More information

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions International Journal of Control Vol. 00, No. 00, January 2007, 1 10 Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions I-JENG WANG and JAMES C.

More information

Nonnegative Tensor Factorization using a proximal algorithm: application to 3D fluorescence spectroscopy

Nonnegative Tensor Factorization using a proximal algorithm: application to 3D fluorescence spectroscopy Nonnegative Tensor Factorization using a proximal algorithm: application to 3D fluorescence spectroscopy Caroline Chaux Joint work with X. Vu, N. Thirion-Moreau and S. Maire (LSIS, Toulon) Aix-Marseille

More information

Scientific Computing: Dense Linear Systems

Scientific Computing: Dense Linear Systems Scientific Computing: Dense Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course MATH-GA.2043 or CSCI-GA.2112, Spring 2012 February 9th, 2012 A. Donev (Courant Institute)

More information

Quasi-Newton Methods

Quasi-Newton Methods Newton s Method Pros and Cons Quasi-Newton Methods MA 348 Kurt Bryan Newton s method has some very nice properties: It s extremely fast, at least once it gets near the minimum, and with the simple modifications

More information

Complexity analysis of second-order algorithms based on line search for smooth nonconvex optimization

Complexity analysis of second-order algorithms based on line search for smooth nonconvex optimization Complexity analysis of second-order algorithms based on line search for smooth nonconvex optimization Clément Royer - University of Wisconsin-Madison Joint work with Stephen J. Wright MOPTA, Bethlehem,

More information

Max-Planck-Institut für Mathematik in den Naturwissenschaften Leipzig

Max-Planck-Institut für Mathematik in den Naturwissenschaften Leipzig Max-Planck-Institut für Mathematik in den Naturwissenschaften Leipzig Towards a condition number theorem for the tensor rank decomposition by Paul Breiding and Nick Vannieuwenhoven Preprint no.: 3 208

More information

STA141C: Big Data & High Performance Statistical Computing

STA141C: Big Data & High Performance Statistical Computing STA141C: Big Data & High Performance Statistical Computing Numerical Linear Algebra Background Cho-Jui Hsieh UC Davis May 15, 2018 Linear Algebra Background Vectors A vector has a direction and a magnitude

More information

Key words. conjugate gradients, normwise backward error, incremental norm estimation.

Key words. conjugate gradients, normwise backward error, incremental norm estimation. Proceedings of ALGORITMY 2016 pp. 323 332 ON ERROR ESTIMATION IN THE CONJUGATE GRADIENT METHOD: NORMWISE BACKWARD ERROR PETR TICHÝ Abstract. Using an idea of Duff and Vömel [BIT, 42 (2002), pp. 300 322

More information