Computing and decomposing tensors


1 Computing and decomposing tensors Tensor rank decomposition Nick Vannieuwenhoven (FWO / KU Leuven)

2 Overview 1 Sensitivity: Condition numbers; Tensor rank decomposition. 2 Pencil-based algorithms; Alternating least squares methods; Quasi-Newton optimization methods; Riemannian quasi-Newton optimization methods. 3 Tensorlab: Basic operations; Tucker decomposition; Tensor rank decomposition. 4 References

3 Sensitivity: Overview

4 Sensitivity, Condition numbers: Overview

5 Sensitivity Condition numbers Sensitivity In numerical computations, the sensitivity of the output of a computation to perturbations at the input is extremely important. We have already seen an example: computing singular values appears to behave nicely with respect to perturbations.

6 Sensitivity Condition numbers Consider a small example matrix A. Computing the singular value decomposition Û Ŝ V̂^T of the floating-point representation Ã of A numerically using Matlab, we find A ≈ Û Ŝ V̂^T. Comparing the numerically computed singular values with the exact ones, we find 6 correct digits of the exact solution in all cases.

7 Sensitivity Condition numbers However, when comparing the computed left singular vector corresponding to one of the singular values σ to the exact solution, we recover far fewer correct digits, even though the matrix Û Ŝ V̂^T contains at least 5 correct digits in each entry. How is this possible?

8 Sensitivity Condition numbers We say that the problem of computing the singular values has a different sensitivity to perturbations than the computational problem of computing the left singular vectors. Assuming the singular values are distinct, these problems can be modeled as functions f_1 : F^{m×n} → F^{min{m,n}} and f_2 : F^{m×n} → F^{m×min{m,n}}, respectively. What we have observed above is that

‖f_1(x) − f_1(x + δx)‖ / ‖δx‖ ≈ 0.4, while ‖f_2(x) − f_2(x + δx)‖ / ‖δx‖ ≥ 800 at least,

for x = A and δx = Ã − A (with ‖δx‖ small).
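This difference in sensitivity is easy to reproduce. The following is a minimal plain-Matlab sketch on a synthetic matrix of my own (not the matrix from the slides): two nearly equal singular values make the associated singular vectors ill-conditioned, while the singular values themselves stay well-behaved.

[U, ~] = qr(randn(6)); [V, ~] = qr(randn(6));
A  = U * diag([3 2 1+1e-6 1 0.5 0.1]) * V';   % two nearly equal singular values
dA = 1e-9 * randn(6);                         % a tiny perturbation
[U1, S1, ~] = svd(A);
[U2, S2, ~] = svd(A + dA);
norm(diag(S1) - diag(S2)) / norm(dA, 'fro')   % small: the singular values are well-conditioned
s = sign(U1(:,3)' * U2(:,3));                 % fix the sign ambiguity of svd
norm(U1(:,3) - s*U2(:,3)) / norm(dA, 'fro')   % large, roughly 1/gap = 1e6 in the worst case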

9 Sensitivity Condition numbers Condition numbers The condition number quantifies the worst-case sensitivity of f to perturbations of the input:

κ[f](x) := lim_{ɛ→0} sup_{y ∈ B_ɛ(x)} ‖f(y) − f(x)‖ / ‖y − x‖.

(In the accompanying picture, a ball B_ɛ(x) of radius ɛ around x is mapped by f into a region of radius roughly κɛ around f(x).)

10 Sensitivity Condition numbers If f : F^M ⊇ X → Y ⊆ F^N is a differentiable function, then the condition number is fully determined by the first-order approximation of f. Indeed, in this case we have f(x + Δ) = f(x) + JΔ + o(‖Δ‖), where J is the Jacobian matrix containing all first-order partial derivatives. Then,

κ = lim_{ɛ→0} sup_{‖Δ‖ ≤ ɛ} ‖f(x) + JΔ + o(‖Δ‖) − f(x)‖ / ‖Δ‖ = max_{‖Δ‖ = 1} ‖JΔ‖ = ‖J‖.
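As a quick numerical illustration (a toy map of my own choosing, not from the slides), the spectral norm of the Jacobian can be compared against observed perturbation ratios in plain Matlab:

f = @(x) [x(1)*x(2); x(1)^2];            % a simple differentiable map R^2 -> R^2
J = @(x) [x(2), x(1); 2*x(1), 0];        % its Jacobian
x = [1; 2];
kappa = norm(J(x))                       % condition number = spectral norm of J
worst = 0;
for trial = 1:10000
    d = 1e-7 * randn(2, 1);
    worst = max(worst, norm(f(x + d) - f(x)) / norm(d));
end
worst                                    % approaches kappa from below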

11 Sensitivity Condition numbers More generally, for manifolds, we can apply Rice's (1966) geometric framework of conditioning. Proposition (Rice, 1966). Let X ⊆ F^m be a manifold of inputs and Y ⊆ F^n a manifold of outputs with dim X = dim Y. Then, the condition number of F : X → Y at x_0 ∈ X is

κ[F](x_0) = ‖d_{x_0}F‖ = sup_{‖x‖ = 1} ‖d_{x_0}F(x)‖,

where d_{x_0}F : T_{x_0}X → T_{F(x_0)}Y is the derivative. See, e.g., Blum, Cucker, Shub, and Smale (1998) or Bürgisser and Cucker (2013) for a more modern treatment.

12 Sensitivity, Tensor rank decomposition: Overview

13 Sensitivity Tensor rank decomposition The tensor rank decomposition problem In the remainder, S denotes the smooth manifold of rank-1 tensors in R^{n_1 × ⋯ × n_d}, called the Segre manifold. To determine the condition number of computing a CPD, we analyze the addition map

Φ_r : S × ⋯ × S → R^{n_1 × ⋯ × n_d}, (A_1, ..., A_r) ↦ A_1 + ⋯ + A_r.

Note that the domain and codomain are smooth manifolds.

14 Sensitivity Tensor rank decomposition From differential geometry, we know that Φ_r is a local diffeomorphism onto its image if the derivative d_a Φ_r is injective, i.e., has maximal rank equal to r·dim S. Let a = (A_1, ..., A_r) ∈ S^{×r} and A = Φ_r(a). If d_a Φ_r is injective, then by the Inverse Function Theorem there exists a local inverse function Φ_a^{−1} : E → F, where E ⊆ Im(Φ_r) is an open neighborhood of A, F ⊆ S^{×r} is an open neighborhood of a, and Φ_r ∘ Φ_a^{−1} = Id_E and Φ_a^{−1} ∘ Φ_r = Id_F. In other words, Φ_a^{−1} computes one particular ordered CPD of A. This function has condition number κ[Φ_a^{−1}](A) = ‖d_A Φ_a^{−1}‖ = ‖(d_a Φ_r)^{−1}‖.

15 Sensitivity Tensor rank decomposition The derivative d_a Φ_r is seen to be the map

d_a Φ_r : T_{A_1}S × ⋯ × T_{A_r}S → T_A R^{n_1 × ⋯ × n_d}, (Ȧ_1, ..., Ȧ_r) ↦ Ȧ_1 + ⋯ + Ȧ_r.

Hence, if U_i is an orthonormal basis of T_{A_i}S ⊆ T_{A_i}R^{n_1 × ⋯ × n_d}, then the map is represented in coordinates by the matrix

U = [U_1 U_2 ⋯ U_r] ∈ R^{(n_1⋯n_d) × r·dim S}.

Summarizing, if we are given an ordered CPD a of A, then the condition number of computing this ordered CPD may be computed as the inverse of the smallest singular value of U.
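For a third-order tensor, this recipe fits in a few lines of plain Matlab. The sketch below is only an illustration under genericity assumptions (random rank-1 terms): for each term it builds a basis of the tangent space T_{A_i}S from the partial derivatives of (a, b, c) ↦ a ⊗ b ⊗ c, orthonormalizes it with orth, and returns the inverse of the smallest singular value of the stacked matrix U.

n = [6 5 4]; r = 3;
a = randn(n(1), r); b = randn(n(2), r); c = randn(n(3), r);   % random rank-1 terms
U = [];
for i = 1:r
    % columns span the tangent space of the Segre manifold at a_i x b_i x c_i
    Ti = [kron(c(:,i), kron(b(:,i), eye(n(1)))), ...
          kron(c(:,i), kron(eye(n(2)), a(:,i))), ...
          kron(eye(n(3)), kron(b(:,i), a(:,i)))];
    U = [U, orth(Ti)];    % orthonormal basis U_i, stacked as on the slide above
end
kappa = 1 / min(svd(U))   % condition number of this (ordered) CPD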

16 Sensitivity Tensor rank decomposition From the above derivation, it also follows that κ[Φ_a^{−1}](A) = κ[Φ_{a'}^{−1}](A), where a' = (A_{π_1}, ..., A_{π_r}), for every permutation π ∈ S_r. Hence, the above gives us the condition number of the tensor decomposition problem at the CPD {A_1, ..., A_r}. We write κ({A_1, ..., A_r}) := κ[Φ_a^{−1}](A).

17 Sensitivity Tensor rank decomposition Interpretation If

A = A_1 + ⋯ + A_r = Σ_{i=1}^r a_i^1 ⊗ ⋯ ⊗ a_i^d and B = B_1 + ⋯ + B_r = Σ_{i=1}^r b_i^1 ⊗ ⋯ ⊗ b_i^d

are tensors in R^{n_1 × ⋯ × n_d}, then for ‖A − B‖_F → 0 we have the asymptotically sharp bound

min_{π ∈ S_r} ( Σ_{i=1}^r ‖A_i − B_{π_i}‖_F^2 )^{1/2}  ≤  κ({A_1, ..., A_r}) · ‖A − B‖_F,

where the left-hand side is the forward error, κ is the condition number, and ‖A − B‖_F is the backward error.

18 Overview

19 Pencil-based algorithms: Overview

20 Pencil-based algorithms Pencil-based algorithms A pencil-based algorithm (PBA) is an algorithm for computing the unique rank-r CPD of a tensor A ∈ R^{n_1 × n_2 × n_3} with n_1 ≥ n_2 ≥ r. Let

A = Σ_{i=1}^r a_i ⊗ b_i ⊗ c_i, where ‖a_i‖ = 1.

The matrices A = [a_i], B = [b_i] and C = [c_i] of A are called factor matrices.

21 Pencil-based algorithms Fix a matrix Q ∈ R^{n_3 × 2} with two orthonormal columns. The key step in a PBA is the projection step

B := (I, I, Q^T) · A = Σ_{i=1}^r a_i ⊗ b_i ⊗ z_i, where z_i := Q^T c_i,

yielding an n_1 × n_2 × 2 tensor B whose factor matrices are (A, B, Z). Thereafter, we compute the specific orthogonal Tucker decomposition

B = (Q_1, Q_2, I) · S = Σ_{i=1}^r (Q_1 x_i) ⊗ (Q_2 y_i) ⊗ z_i,

where x_i = Q_1^T a_i and y_i = Q_2^T b_i.

22 Pencil-based algorithms Let X = [x_i / ‖x_i‖] and Y = [y_i / ‖y_i‖]. Then, the core tensor S ∈ R^{r × r × 2} has the following two 3-slices:

S_j = (I, I, e_j^T) · S = Σ_{i=1}^r λ_{j,i} (x_i/‖x_i‖) ⊗ (y_i/‖y_i‖) = X diag(λ_j) Y^T, j = 1, 2,

where λ_j := [z_{j,i} ‖x_i‖ ‖y_i‖]_{i=1}^r and z_i = [z_{j,i}]_{j=1}^2. If S_1 and S_2 are nonsingular, then we see that

S_1 S_2^{−1} = X diag(λ_1) diag(λ_2)^{−1} X^{−1}.

Thus X is the r × r matrix of eigenvectors of S_1 S_2^{−1}.

23 Pencil-based algorithms Recall that the factor matrices of B are (A, B, Z) and by definition A = Q_1 X. The rank-1 tensors can then be recovered by noting that

A_(1) = Σ_{i=1}^r a_i (b_i ⊗ c_i)^T =: A (B ⊙ C)^T,

where B ⊙ C := [b_i ⊗ c_i]_{i=1}^r ∈ R^{n_2 n_3 × r}. Since A is left invertible, we get

A ⊙ (A^† A_(1))^T = A ⊙ (B ⊙ C) = [a_i ⊗ (b_i ⊗ c_i)] ∈ R^{n_1 n_2 n_3 × r},

which is a matrix containing the rank-1 tensors of the CPD as columns.

24 Pencil-based algorithms Algorithm 1: Standard PBA algorithm
input: A tensor A ∈ R^{n_1 × n_2 × n_3}, n_1 ≥ n_2 ≥ r, of rank r.
output: Rank-1 tensors a_i ⊗ b_i ⊗ c_i of the CPD of A.
Sample a random n_3 × 2 matrix Q with orthonormal columns;
B ← (I, I, Q^T) · A;
Compute a rank-(r, r, min{r, n_3}) HOSVD (Q_1, Q_2, Q_3) · S = A;
S_1 ← (I, I, e_1^T) · S;
S_2 ← (I, I, e_2^T) · S;
Compute the eigendecomposition S_1 S_2^{−1} = X D X^{−1};
A ← Q_1 X;
A ⊙ B ⊙ C ← A ⊙ (A^† A_(1))^T;
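To see the pencil idea in action, here is a minimal plain-Matlab sketch for the easiest case n_1 = n_2 = r, so that no orthogonal compression is needed; it illustrates slides 21 to 23 and is not the Tensorlab implementation.

r = 5; n = [5 5 6];
A0 = randn(n(1), r); B0 = randn(n(2), r); C0 = randn(n(3), r);
T = zeros(n);
for i = 1:r                               % build the rank-r tensor
    T = T + reshape(kron(C0(:,i), kron(B0(:,i), A0(:,i))), n);
end
Q  = orth(randn(n(3), 2));                % random projection of the third mode
T3 = reshape(T, n(1)*n(2), n(3));
S1 = reshape(T3*Q(:,1), n(1), n(2));      % first pencil slice
S2 = reshape(T3*Q(:,2), n(1), n(2));      % second pencil slice
[X, ~] = eig(S1 / S2);                    % eigenvectors recover the first factor
A  = X;                                   % up to column scaling and permutation
W  = pinv(A) * reshape(T, n(1), n(2)*n(3));        % rows are scaled (c_i kron b_i)'
terms = zeros(prod(n), r);
for i = 1:r
    terms(:, i) = kron(W(i,:).', A(:,i)); % vectorized rank-1 terms a_i x b_i x c_i
end
norm(sum(terms, 2) - T(:)) / norm(T(:))   % small: the recovered terms sum to T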

25 Pencil-based algorithms Numerical issues A variant of this algorithm is implemented in Tensorlab v3.0 as cpd_gevd. Let us perform an experiment with it. We create the first tensor that comes to mind: a rank-5 random tensor A of size 5 × 5 × 5:
>> Ut{1} = randn(5,5);
>> Ut{2} = randn(5,5);
>> Ut{3} = randn(5,5);
>> A = cpdgen(Ut);

26 Pencil-based algorithms Compute A's decomposition and compare its distance to the input decomposition, relative to the machine precision ɛ ≈ 10^{−16}:
>> Ur = cpd_gevd(A, 5);
>> E = kr(Ut) - kr(Ur);
>> norm( E(:), 2 ) / eps
ans = 8.649e+04
This large number can arise because of a high condition number. However,
>> kappa = condition_number( Ut )
ans = .34

27 Pencil-based algorithms It thus appears that there is something wrong with the algorithm, even though it is a mathematically sound algorithm for computing low-rank CPDs. The reason is the following. Theorem (Beltrán, Breiding, V, 2018). Many algorithms based on a reduction to R^{n_1 × n_2 × 2} are numerically unstable: the forward error produced by the algorithm divided by the backward error is much larger than the condition number on an open set of inputs. These methods should be used with care as they do not necessarily yield the highest attainable precision.

28 Pencil-based algorithms The instability of the algorithm leads to an excess factor ω on top of the condition number of the computational problem. (Figure: the excess factor observed for cpd_pba and cpd_gevd in a test with random rank-1 tuples.)

29 Alternating least squares methods: Overview

30 Alternating least squares methods Alternating least squares methods The problem of computing a CPD of A ∈ F^{n_1 × ⋯ × n_d} and the problem of approximating A by a (low-rank) CPD can both be formulated as optimization problems. For example,

min_{A_k ∈ F^{n_k × r}, k=1,...,d} ‖A − Σ_{i=1}^r a_i^1 ⊗ a_i^2 ⊗ ⋯ ⊗ a_i^d‖_F,

where a_i^k denotes the i-th column of A_k. One of the earliest methods devised specifically for this problem is the alternating least squares (ALS) scheme.

31 Alternating least squares methods The ALS method is based on a reformulation of the objective function; it is equivalent to

min_{A_j ∈ F^{n_j × r}, j=1,...,d} ‖A_(k) − A_k (A_1 ⊙ ⋯ ⊙ A_{k−1} ⊙ A_{k+1} ⊙ ⋯ ⊙ A_d)^T‖

for every k = 1, 2, ..., d. The key observation is that if the A_j, j ≠ k, are fixed, then finding the optimal A_k is a linear problem! It is namely

A_k = A_(k) ((A_1 ⊙ ⋯ ⊙ A_{k−1} ⊙ A_{k+1} ⊙ ⋯ ⊙ A_d)^T)^†.

32 Alternating least squares methods The standard ALS method then solves the optimization problem by cyclically fixing all but one factor matrix A_k. Algorithm 2: ALS method
input: A tensor A ∈ F^{n_1 × ⋯ × n_d}.
input: A target rank r.
output: Factor matrices (A_1, ..., A_d) of a CPD approximating A.
Initialize factor matrices A_k ∈ F^{n_k × r} (e.g., entries sampled i.i.d. from N(0, 1), or via a truncated HOSVD);
while not converged do
  for k = 1, 2, ..., d do
    Â_k ← A_1 ⊙ ⋯ ⊙ A_{k−1} ⊙ A_{k+1} ⊙ ⋯ ⊙ A_d;
    A_k ← A_(k) (Â_k^T)^†;
  end
end
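For a third-order tensor, the scheme above can be written in a few lines of plain Matlab. The sketch below is my own minimal version, not Tensorlab's cpd_als: it uses Matlab's column-major flattenings and a local Khatri-Rao helper kr, and omits convergence tests and normalization.

function [A, B, C] = als_cpd(T, r, iters)
% Minimal ALS sketch for a third-order tensor T and target rank r.
[n1, n2, n3] = size(T);
A = randn(n1, r); B = randn(n2, r); C = randn(n3, r);
T1 = reshape(T, n1, n2*n3);                    % mode-1 flattening
T2 = reshape(permute(T, [2 1 3]), n2, n1*n3);  % mode-2 flattening
T3 = reshape(permute(T, [3 1 2]), n3, n1*n2);  % mode-3 flattening
for it = 1:iters
    A = T1 * pinv(kr(C, B)).';   % with B, C fixed, A solves a linear least squares problem
    B = T2 * pinv(kr(C, A)).';
    C = T3 * pinv(kr(B, A)).';
end
end
function K = kr(X, Y)            % columnwise Kronecker (Khatri-Rao) product
K = zeros(size(X,1)*size(Y,1), size(X,2));
for i = 1:size(X, 2)
    K(:, i) = kron(X(:,i), Y(:,i));
end
end

Calling [A, B, C] = als_cpd(T, r, 50) on a tensor of exact rank r typically drives the residual down quickly, but, as discussed on the next slide, there is no general convergence guarantee.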

33 Alternating least squares methods The ALS scheme produces a sequence of incrementally better approximations. However, the convergence properties of the ALS method are very poorly understood. The scheme has accumulation points, but it is not known if they correspond to critical points of the objective function. That is, we do not know if the accumulation points of the ALS scheme correspond to points satisfying the first-order optimality conditions. Uschmajew (2012) proved local convergence to critical points where the Hessian is positive semi-definite and of maximal rank.

34 Quasi-Newton optimization methods: Overview

35 Quasi-Newton optimization methods Quasi-Newton optimization methods The optimization problem

min_{A_k ∈ F^{n_k × r}, k=1,...,d} ‖A − Σ_{i=1}^r a_i^1 ⊗ ⋯ ⊗ a_i^d‖

can be solved using any of the standard optimization methods, such as nonlinear conjugate gradient and quasi-Newton methods. We discuss the Gauss-Newton method with a trust region as globalization method.

36 Quasi-Newton optimization methods Recall that the univariate Newton method is based on the following idea. (Slides 36 to 40 show a figure illustrating this idea.)

41 Quasi-Newton optimization methods Newton's method for minimizing a smooth real function f(x) consists of generating a sequence of iterates x_0, x_1, ..., where x_{k+1} is the minimizer of the local second-order Taylor series expansion of f(x), namely

f(x_k + p) ≈ m_{x_k}(p) := f(x_k) + p^T g_k + ½ p^T H_k p,

where g_k := ∇f(x_k) is the gradient of f at x_k, and H_k := ∇²f(x_k) is the symmetric Hessian matrix of f at x_k.

42 Quasi-Newton optimization methods Recall from calculus that the gradient of f : R^n → R with coordinates x_i on R^n is the column vector

∇f(y) := [ ∂f/∂x_1(y); ∂f/∂x_2(y); ...; ∂f/∂x_n(y) ].

The Hessian matrix is the n × n matrix of second-order partial derivatives,

∇²f(y) := [ ∂²f/∂x_i ∂x_j (y) ]_{i,j=1}^n.

43 Quasi-Newton optimization methods The optimal search direction p according to the second-order model should satisfy the first-order optimality conditions. That is,

0 = ∇m_{x_k}(p) = g_k + H_k p.

Hence, p = −H_k^{−1} g_k; this is called the Newton search direction.

44 Quasi-Newton optimization methods Hence, the basic Newton method is obtained. Algorithm 3: Newton's method
input: An objective function f.
input: A starting point x_0 ∈ R^n.
output: A critical point x_* of the objective function f.
k ← 0;
while not converged do
  Compute the gradient g_k = ∇f(x_k);
  Compute the Hessian H_k = ∇²f(x_k);
  p_k ← −H_k^{−1} g_k;
  x_{k+1} ← x_k + p_k;
  k ← k + 1;
end
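As a minimal plain-Matlab sketch, here is Algorithm 3 applied to a small smooth convex function of my own choosing (not an example from the slides), with the gradient and Hessian coded by hand:

f = @(x) exp(x(1)+3*x(2)) + exp(x(1)-3*x(2)) + exp(-x(1));
g = @(x) [exp(x(1)+3*x(2)) + exp(x(1)-3*x(2)) - exp(-x(1)); ...
          3*exp(x(1)+3*x(2)) - 3*exp(x(1)-3*x(2))];
H = @(x) [exp(x(1)+3*x(2)) + exp(x(1)-3*x(2)) + exp(-x(1)), 3*exp(x(1)+3*x(2)) - 3*exp(x(1)-3*x(2)); ...
          3*exp(x(1)+3*x(2)) - 3*exp(x(1)-3*x(2)),          9*exp(x(1)+3*x(2)) + 9*exp(x(1)-3*x(2))];
x = [0.5; 0.5];
for k = 1:10
    p = -H(x) \ g(x);                    % Newton search direction
    x = x + p;
    fprintf('k = %2d, |grad f| = %.3e\n', k, norm(g(x)));
end

The printed gradient norms shrink steadily at first and then collapse towards machine precision within a few steps, which is the quadratic convergence discussed on the next slide.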

45 Quasi-Newton optimization methods Newton's method is locally quadratically convergent: if the initial iterate x_0 is sufficiently close to the solution x_*, then

‖x_* − x_{k+1}‖ = O(‖x_* − x_k‖²).

This means that close to the solution the number of correct digits doubles every step.

46 Quasi-Newton optimization methods For example, Newton's method applied to a small bivariate system f(x, y) = 0 converges to the root starting from x_0 = (3, 3). (Figure: a semilogarithmic plot of the error ‖x_* − x_k‖ versus the iteration number k.)

47 Quasi-Newton optimization methods While Newton's method has great local convergence properties, the plain version is not suitable because: 1 it has no guaranteed global convergence, and 2 the Hessian matrix can be difficult to compute. These problems are addressed respectively by 1 incorporating a trust region scheme, and 2 using a cheap approximation of the Hessian matrix.

48 Quasi-Newton optimization methods The Gauss-Newton Hessian approximation A cheap approximation of the Hessian matrix is available for nonlinear least squares problems. In this case, the objective function takes the form

f(x) = ½‖F(x)‖² = ½⟨F(x), F(x)⟩, where F : R^n → R^m.

49 Quasi-Newton optimization methods The gradient of a least-squares objective function f at x is

∇f(x) = (J_F(x))^T F(x),

where J_F(x) ∈ R^{m × n} is the Jacobian matrix of F. That is, if F_k(x) denotes the kth component function of F, then

J_F(x) := [ ∂F_i/∂x_j (x) ]_{i=1,...,m; j=1,...,n}.

50 Quasi-Newton optimization methods The Hessian matrix of f is

∇²f(x) = (J_F(x))^T (J_F(x)) + ⟨dJ_F(x), F(x)⟩.

Near a solution, we hope to have F(x_*) ≈ 0, so that the last term often has a negligible contribution. This reasoning leads to the Gauss-Newton approximation

(J_F(x))^T (J_F(x)) ≈ ∇²f(x).

Replacing the Hessian with the Gauss-Newton approximation yields local linear convergence. If f(x_*) = 0 at a solution, then the local convergence is quadratic.
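A minimal plain-Matlab sketch of a Gauss-Newton iteration on a toy nonlinear least squares problem of my own (fitting an assumed model y ≈ θ_1 e^{θ_2 t}; nothing here comes from the slides):

t = (0:0.1:1)';
y = 2*exp(-1.5*t) + 0.01*randn(size(t));         % synthetic data
F = @(th) th(1)*exp(th(2)*t) - y;                % residual vector F(theta)
J = @(th) [exp(th(2)*t), th(1)*t.*exp(th(2)*t)]; % Jacobian of F
th = [1; -1];                                    % starting point
for k = 1:20
    p  = -J(th) \ F(th);   % Gauss-Newton step: least-squares solution of J*p = -F
    th = th + p;
end
th                                               % close to [2; -1.5]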

51 Quasi-Newton optimization methods Trust region globalization The method sketched thus far has no global convergence guarantees. Furthermore, the Gauss-Newton approximation of the Hessian could be very ill-conditioned, resulting in large updates. The trust region globalization scheme can solve both of these problems. Let J_k := J_F(x_k). The idea is to trust the local second-order model at x_k,

m_{x_k}(p) = f(x_k) + p^T g_k + ½ p^T J_k^T J_k p,

only in a small neighborhood around x_k.

52 Quasi-Newton optimization methods Instead of taking the unconstrained minimizer of m_{x_k}(p), a trust region method solves the trust region subproblem:

min_{p ∈ R^n} m_{x_k}(p) subject to ‖p‖ ≤ Δ_k,

where Δ_k > 0 is the trust region radius. (Figure: the trust region ball of radius Δ_k around x_k and the step p.)

53 Quasi-Newton optimization methods The trust region radius is modified in every step according to the following scheme. Let the computed update direction be p_k, with ‖p_k‖ ≤ Δ_k. The trustworthiness of the second-order model is defined as

ρ_k = ( f(x_k) − f(x_k + p_k) ) / ( m_{x_k}(0) − m_{x_k}(p_k) ).

If the trustworthiness ρ_k is very high (e.g., ρ_k ≥ 0.75) and if in addition ‖p_k‖ = Δ_k, then the trust region radius is increased (e.g., Δ_{k+1} = 2Δ_k). On the other hand, if ρ_k ≤ β is very low (e.g., ρ_k ≤ 0.25), then the trust region radius is decreased (e.g., Δ_{k+1} = Δ_k/4).

54 Quasi-Newton optimization methods The trustworthiness ρ_k is also used to decide whether or not to accept a step in the direction of the computed p_k. If ρ_k ≤ γ ≤ β is very small (e.g., ρ_k close to 0), then the search direction p_k is rejected. Otherwise, p_k is accepted as a good direction, and we set x_{k+1} = x_k + p_k.
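A minimal sketch of this bookkeeping as a plain-Matlab helper; the concrete thresholds 0.75, 0.25 and 0.1 and the cap Delta_max are common choices and partly my own assumptions rather than values taken from the slides.

function [x, Delta] = tr_update(x, p, rho, Delta, Delta_max)
% One trust-region bookkeeping step: update the radius and accept/reject p.
if rho >= 0.75 && abs(norm(p) - Delta) <= 1e-12*Delta
    Delta = min(2*Delta, Delta_max);   % very trustworthy model and a full step: grow
elseif rho <= 0.25
    Delta = Delta / 4;                 % untrustworthy model: shrink
end
if rho >= 0.1
    x = x + p;                         % accept the step
end                                    % otherwise x is kept and only Delta changes
end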

55 Quasi-Newton optimization methods For (approximately) solving the trust region subproblem

min_{p ∈ R^n} m_{x_k}(p) subject to ‖p‖ ≤ Δ_k,

one can exploit the following fact. If the unconstrained minimizer

p_k^* = −(J_k^T J_k)^{−1} g_k = −(J_k^T J_k)^{−1} J_k^T r_k = −J_k^† r_k, where r_k := F(x_k),

falls within the trust region, ‖p_k^*‖ ≤ Δ_k, then it is the optimal solution of the trust region subproblem. Otherwise, there exists a λ > 0 such that the optimal solution p_k satisfies

(J_k^T J_k + λI) p_k = −J_k^T r_k with ‖p_k‖ = Δ_k.

Nocedal and Wright (2006) discuss strategies for finding λ.
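Many implementations do not solve for λ exactly but settle for the classical dogleg approximation of the subproblem; the sketch below is the textbook variant of Nocedal and Wright in plain Matlab (the dogleg used on slide 62 differs slightly in the intermediate case).

function p = dogleg(J, r, Delta)
% Approximate the Gauss-Newton trust-region subproblem by the dogleg path.
g  = J' * r;                        % gradient of f(x) = 0.5*||F(x)||^2
pN = -(J \ r);                      % unconstrained Gauss-Newton minimizer
if norm(pN) <= Delta
    p = pN; return                  % interior solution: take the full GN step
end
pC = -((g'*g) / norm(J*g)^2) * g;   % Cauchy point: minimizer along -g
if norm(pC) >= Delta
    p = -(Delta / norm(g)) * g;     % steepest descent clipped to the boundary
    return
end
d = pN - pC;                        % otherwise walk from pC towards pN ...
a = d'*d; b = 2*(pC'*d); c = pC'*pC - Delta^2;
tau = (-b + sqrt(b^2 - 4*a*c)) / (2*a);
p = pC + tau*d;                     % ... until the boundary ||p|| = Delta is hit
end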

56 Quasi-Newton optimization methods Several variations of this quasi-Newton method with trust region globalization are implemented in Tensorlab as cpd_nls. See Sorber, Van Barel, De Lathauwer (2013) for details.

57 Riemannian quasi-Newton optimization methods: Overview

58 Riemannian quasi-Newton optimization methods Riemannian optimization An alternative way to formulate the approximation of a tensor by a low-rank CPD consists of optimizing over a product of Segre manifolds:

min_{(A_1,...,A_r) ∈ S × ⋯ × S} ‖Φ_r(A_1, ..., A_r) − A‖_F.

This is an optimization of a differentiable function over a smooth manifold. Such problems are studied in Riemannian optimization.

59 Riemannian quasi-Newton optimization methods In general, if M ⊆ R^N is an m-dimensional smooth manifold and F : M → R^n a smooth function, then min_{x ∈ M} ‖F(x)‖ is a Riemannian optimization problem that can be solved by, e.g., a Riemannian Gauss-Newton method; see Absil, Mahony, Sepulchre (2008).

60 Riemannian quasi-Newton optimization methods In an RGN method, the objective function f(x) = ½‖Φ_r(x) − A‖² is locally approximated at a ∈ S^{×r} by the quadratic model

m_a(t) := f(a) + ⟨d_a f, t⟩ + ½⟨t, (d_a Φ_r^* ∘ d_a Φ_r)(t)⟩,

where H_a := d_a Φ_r^* ∘ d_a Φ_r is the GN Hessian approximation, and ⟨·, ·⟩ is the inner product inherited from the ambient R^N. Note that the domain of m_a is now T_a S^{×r}.

61 Riemannian quasi-Newton optimization methods As before, the RGN method with trust region considers the model to be accurate only in a radius Δ about a. (Figure: the trust region in the tangent space at a and the step p.) The trust region subproblem (TRS) is

min_{t ∈ T_a S^{×r}} m_a(t) subject to ‖t‖ ≤ Δ,

whose solution p ∈ T_a S^{×r} yields the next search direction.

62 Riemannian quasi-Newton optimization methods In Breiding, V (2018), the TRS is solved by combining a standard dogleg step with a hot restarting scheme. Let g_a be the coordinate representation of d_a f, and let H_a be the matrix of d_a Φ_r^* ∘ d_a Φ_r. The dogleg step approximates the solution p of the TRS by

p = p_N := −H_a^{−1} g_a, if ‖p_N‖ ≤ Δ;
p = p_C := −( g_a^T g_a / g_a^T H_a g_a ) g_a, if ‖p_N‖ > Δ and ‖p_C‖ ≥ Δ;
p = p_I := p_C + (τ − 1)(p_N − p_C) s.t. ‖p_I‖ = Δ, otherwise,

where τ is the solution of ‖p_C + (τ − 1)(p_N − p_C)‖ = Δ.

63 Riemannian quasi-Newton optimization methods The Newton direction p_N = −H_a^{−1} g_a is vital to the dogleg step. Unfortunately, H_a = d_a Φ_r^* ∘ d_a Φ_r can be close to a singular matrix. In fact,

‖H_a^{−1}‖ = 1 / ς_m(d_a Φ_r)² =: κ(a)², where m = dim S^{×r},

so H_a is ill-conditioned if and only if the CPD is ill-conditioned at a. Whenever H_a is close to a singular matrix, we suggest applying random perturbations to the current decomposition a until H_a is sufficiently well-behaved. We call this hot restarting.

64 Riemannian quasi-Newton optimization methods Retraction We need to advance from a ∈ S^{×r} to a new point of S^{×r} along the direction p. However, while a + p ∈ T_a S^{×r} (identified with an affine subspace through a), this point does not in general lie in S^{×r}! (Figure: the tangent space T_a S^{×r} at a, the step p, and the retracted point R_a(p) on S^{×r}.) We need a retraction operator (Absil, Mahony, Sepulchre, 2008) for smoothly mapping a neighborhood of 0 ∈ T_a S^{×r} back to S^{×r}.

65 Riemannian quasi-Newton optimization methods Given a retraction operator R for S, a retraction operator R for the product manifold S^{×r} = S × ⋯ × S at a = (A_1, ..., A_r) is

R_a(·) := (R_{A_1} × R_{A_2} × ⋯ × R_{A_r})(·),

which is called the product retraction. Some known retraction operators for S are 1 the rank-(1, ..., 1) T-HOSVD, and 2 the rank-(1, ..., 1) ST-HOSVD, both essentially proved by Kressner, Steinlechner, Vandereycken (2014).
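To get a feel for what a retraction does for a single Segre factor, the following plain-Matlab sketch retracts an ambient step by computing a dominant rank-1 approximation with a few higher-order power iterations. This is only an illustration of the concept: the slides advocate the rank-(1, ..., 1) T-HOSVD or ST-HOSVD instead, and the iteration below is not what Tensorlab or the RGN method uses.

n = [5 4 3];
a = randn(n(1),1); b = randn(n(2),1); c = randn(n(3),1);
A = reshape(kron(c, kron(b, a)), n);            % point on the Segre manifold
P = A + 0.05*randn(n);                          % A plus a small ambient step
u = randn(n(1),1); v = randn(n(2),1); w = randn(n(3),1);
for it = 1:20                                   % higher-order power iteration
    u = reshape(P, n(1), []) * kron(w, v); u = u / norm(u);
    v = reshape(permute(P, [2 1 3]), n(2), []) * kron(w, u); v = v / norm(v);
    w = reshape(permute(P, [3 1 2]), n(3), []) * kron(v, u); w = w / norm(w);
end
sigma = u' * reshape(P, n(1), []) * kron(w, v);
R = reshape(kron(w, kron(v, sigma*u)), n);      % retracted rank-1 tensor
norm(R(:) - A(:)) / norm(A(:))                  % stays close to A for a small step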

66 Riemannian quasi-Newton optimization methods RGN with trust region method:
S1. Choose random initial points A_i ∈ S.
S2. Let a^{(0)} ← (A_1, ..., A_r), and set k ← 0.
S3. Choose a trust region radius Δ > 0.
S4. While not converged, do:
S4.1. Solve the trust region subproblem, resulting in p_k ∈ T_{a^{(k)}} S^{×r}.
S4.2. Compute the tentative next iterate a^{(k+1)} ← R_{a^{(k)}}(p_k) via a retraction in the direction of p_k from a^{(k)}.
S4.3. Accept or reject the next iterate. If the former, increment k.
S4.4. Update the trust region radius.
The details can be found in Breiding, V (2018).

67 Riemannian quasi-Newton optimization methods Numerical experiments We compare the RGN method of Breiding, V (2018) with some state-of-the-art nonlinear least squares solvers in Tensorlab v3.0 (Vervliet et al., 2016), namely nls_lm and nls_gndl, both with the LargeScale option turned off and on.

68 Riemannian quasi-Newton optimization methods We consider parameterized tensors in R^{n_1 × n_2 × n_3} with varying condition numbers. There are three parameters: 1 c ∈ [0, 1] regulates the collinearity of the factor matrices, 2 s regulates the scaling, and 3 r is the rank. Typically, 1 increasing c increases the geometric condition number, 2 increasing s increases the classic condition number, and 3 increasing r decreases the probability of finding a decomposition. See the afternotes for the precise construction.

69 Riemannian quasi-Newton optimization methods The true rank-r tensor is then

A = Σ_{i=1}^r a_i^1 ⊗ a_i^2 ⊗ a_i^3.

Finally, we normalize the tensor and add random Gaussian noise E ∈ R^{n_1 × n_2 × n_3} of magnitude τ:

B = A/‖A‖_F + τ E/‖E‖_F.

The tensor B is the one we would like to approximate by a tensor of rank r.
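In plain Matlab, the last two steps look as follows; the sizes, the rank and the noise level are placeholder values, and the collinearity/scaling construction of the factor vectors is omitted (the factors below are plain Gaussian).

n = [15 15 15]; r = 10; tau = 1e-5;
a1 = randn(n(1), r); a2 = randn(n(2), r); a3 = randn(n(3), r);
A = zeros(n);
for i = 1:r
    A = A + reshape(kron(a3(:,i), kron(a2(:,i), a1(:,i))), n);   % rank-1 terms
end
E = randn(n);                                  % Gaussian noise tensor
B = A / norm(A(:)) + tau * E / norm(E(:));     % normalized tensor plus scaled noise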

70 Riemannian quasi-Newton optimization methods We will choose k random starting points and then apply each of the methods to each of the starting points. The key performance criterion (on a single processor) is the expected time to success (TTS). Let 1 the probability of success be p_S, 2 the probability of failure be p_F = 1 − p_S, 3 a successful decomposition take m_S seconds, and 4 a failed decomposition take m_F seconds. Then, the expected time to a first success is

E[TTS] = Σ_{k=0}^∞ p_F^k p_S (m_S + k m_F) = (p_S m_S + p_F m_F) / p_S.
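The closed form is easy to sanity-check by simulation; the probabilities and timings below are made-up placeholders, not measurements from the experiments.

pS = 0.3; mS = 4.0; mF = 1.5;            % assumed success probability and timings
pF = 1 - pS;
tts_formula = (pS*mS + pF*mF) / pS
N = 1e5; tts = zeros(N, 1);
for j = 1:N
    t = 0;
    while rand > pS                      % repeat failed attempts ...
        t = t + mF;
    end
    tts(j) = t + mS;                     % ... until the first success
end
mean(tts)                                % agrees with tts_formula up to sampling error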

71 Riemannian quasi-Newton optimization methods Speedup of RGN-HR (Figure: speedup of RGN-HR over RGN-Reg, GNDL, and GNDL-PCG as a function of the collinearity c and the scaling s, for ranks r = 5 up to r = 30; noise level τ = 10^{−3}.)

72 Riemannian quasi-Newton optimization methods Speedup of RGN-HR (Figure: speedup of RGN-HR over RGN-Reg, GNDL, and GNDL-PCG as a function of the collinearity c and the scaling s, for ranks r = 5 up to r = 30; noise level τ = 10^{−5}.)

73 Riemannian quasi-Newton optimization methods Speedup of RGN-HR (Figure: speedup of RGN-HR over RGN-Reg, GNDL, and GNDL-PCG as a function of the collinearity c and the scaling s, for ranks r = 5 up to r = 30; noise level τ = 10^{−7}.)

74 Riemannian quasi-Newton optimization methods Speedup of RGN-HR (Figure: speedup of RGN-HR over GNDL and GNDL-PCG as a function of the scaling s and the rank r for a second family of test tensors; some runs failed, giving an infinite speedup; noise level τ = 10^{−5}.)

75 Riemannian quasi-Newton optimization methods Convergence plots (Figures on slides 75 to 84: objective value versus time in seconds for RGN-HR, RGN-Reg, GNDL, LM, GNDL-PCG, and LM-PCG on model tensors of rank 7 at noise level τ = 10^{−5}, for several scalings s.)

85 Tensorlab: Overview

86 Tensorlab Tensorlab is a package facilitating computations with both structured and unstructured tensors in Matlab. Development on Tensorlab v1.0 started at KU Leuven in the research group of L. De Lathauwer. The current version, v3.0, was released in March 2016. Version 4.0 is anticipated next week!

87 Tensorlab Tensorlab focuses on interpretable models, offering algorithms for working with Tucker decompositions, tensor rank decompositions, and certain block term decompositions. In addition, it offers a flexible framework called structured data fusion in which the various decompositions can be coupled or fused while imposing additional structural constraints, such as symmetry, sparsity and nonnegativity.

88 Tensorlab, Basic operations: Overview

89 Tensorlab Basic operations A dense tensor is represented as a plain Matlab array:
A = randn(3, 5, 7, 2);
A(:, :, :, :)
Matlab then prints the slices ans(:, :, i, j) of the array.

90 Tensorlab Basic operations The Frobenius norm of a tensor is computed as follows.
A = reshape(1:100, [5 10 2]);
frob(A)
ans = 581.6786
frob(A, 'squared') - 100*101*201/6
ans = 0

91 Tensorlab Basic operations Flattenings are computed as follows.
A = reshape(1:24, [2 3 4]);
% mode-1 flattening
tens2mat(A, [1], [2 3])
% mode-2 flattening
tens2mat(A, [2], [1 3])

92 Tensorlab Basic operations Multilinear multiplications are computed as follows.
A = randn(5, 5, 5);
U1 = randn(5, 5); U2 = randn(5, 5); U3 = randn(5, 5);
T = tmprod(A, {U1, U2, U3}, 1:3);
X1 = tmprod(A, {U1, U3}, [1 3], 'H');
X2 = tmprod(A, {U1', U3'}, [1 3]);
frob(X1 - X2)
ans = 0

93 Tensorlab Basic operations Tensorlab has a nice way for visualizing third-order tensors.
U = {1.5.^(0:25)', linspace(0, 1, 50)', 2*abs(sin(linspace(0, 4*pi, 50)))'};
A = cpdgen(U);
voxel3(A)
This produces a voxel plot of the rank-1 tensor A (graphic omitted).

94 Tensorlab, Tucker decomposition: Overview

95 Tensorlab Tucker decomposition Given a core tensor and the factor matrices, Tensorlab can generate the full tensor represented by this Tucker decomposition as follows.
S = randn(5, 5, 5);
U = {randn(4, 5), randn(6, 5), randn(7, 5)};
T1 = lmlragen(U, S);
size(T1)
ans = 4 6 7
% Compare with the definition
T2 = tmprod(S, U, 1:3);
frob(T1 - T2)
ans = 1.435e-15

96 Tensorlab Tucker decomposition The HOSVD is called the multilinear singular value decomposition in Tensorlab and can be computed as follows.
[F, C] = lmlra_rnd([ ], [ ]);
A = lmlragen(F, C);
[U, S, sv] = mlsvd(A);   % 0.3 s

97 Tensorlab Tucker decomposition The factor-k singular values can be plotted by running
for k = 1:4, semilogy(sv{k}, 'x'), hold all, end
legend('factor 1', 'factor 2', 'factor 3', 'factor 4')
which produces a plot of the mode-k singular values (graphic omitted).

98 Tensorlab Tucker decomposition Of practical importance are truncated HOSVDs. The default strategy in Tensorlab is sequential truncation, while parallel truncation is provided as an option.
[F, C] = lmlra_rnd([ ], [ ]);
A = lmlragen(F, C) + 1e-5*randn([ ]);
% Sequential truncation
[U, S] = mlsvd(A, [ ]);          % 6.5 s
[U, S] = mlsvd(A, 5e-12, 1:3);   % 4.6 s
% Parallel truncation
[V, T] = mlsvd(A, [ ], 0);       % s
[V, T] = mlsvd(A, 5e-12, 0);     % s

99 Tensorlab Tucker decomposition Tensorlab also offers optimization algorithms for seeking the optimal low multilinear rank approximation of a tensor. By default Tensorlab chooses an initial point by either a fast approximation to the ST-HOSVD via randomized SVDs (mlsvd_rsi), or by adaptive cross approximation (lmlra_aca). The basic usage is as follows:
[F, C] = lmlra_rnd([ ], [ ]);
A = lmlragen(F, C);
% Compute an lmlra using default settings.
[U, S] = lmlra(A, [ ]);

100 Tensorlab, Tensor rank decomposition: Overview

101 Tensorlab Tensor rank decomposition Tensorlab represents a CPD by a cell array containing its factor matrices. For example, random factor matrices can be generated as follows:
sizeTensor = [ ];
rnk = 5;
F = cpd_rnd(sizeTensor, rnk)
F = 1x3 cell array of factor matrices, each with 5 columns
The tensor represented by these factor matrices can be generated as follows:
A = cpdgen(F);

102 Tensorlab Tensor rank decomposition Tensorlab offers several algorithms for computing CPDs. A deterministic but unstable PBA that can be applied if the rank is smaller than two of the dimensions is cpd_gevd.
tf = cpd_rnd([ ], 5);
A = cpdgen(tf);
af = cpd_gevd(A, 5);
frobcpdres(A, af)
ans = .966 e
Warning: cpd_gevd returns random factor matrices if the assumptions of the method are not satisfied.

103 Tensorlab Tensor rank decomposition Several optimization methods are implemented in Tensorlab for approximating a tensor by a low-rank CPD. The advised way of computing a CPD is via the driver routine cpd, which automatically performs several steps: 1 optional Tucker compression, 2 choice of initialization (cpd_gevd if possible, otherwise random), 3 optional decompression and refinement if compression was applied.

104 Tensorlab Tensor rank decomposition The basic usage is as follows:
tf = cpd_rnd([ ], 5);
A = cpdgen(tf);
af = cpd(A, 5);
frobcpdres(A, af)
ans = 5.09 e 4

105 Tensorlab Tensor rank decomposition Optionally, some algorithm options can be specified. The most important options are the following. Compression: the string 'auto', the boolean false, or a function handle to the Tucker compression algorithm, for example mlsvd, lmlra_aca or mlsvd_rsi. Initialization: the string 'auto', or a function handle to the initialization method that should be employed, for example cpd_rnd or cpd_gevd. Algorithm: a function handle to the optimization method for computing the tensor rank decomposition, for example cpd_als, cpd_nls or cpd_minf. AlgorithmOptions: a structure containing the options for the optimization method.

106 Tensorlab Tensor rank decomposition The algorithm options are important in their own right, as they determine how much computational effort is spent and how accurate the returned solution will be. The main options are the following. TolAbs: The tolerance for the squared error between the tensor represented by the current factor matrices and the given tensor. If it is less than this value, the optimization method halts successfully. TolFun: The tolerance for the relative change in function value. If it is less than this value, the optimization method halts successfully. TolX: The tolerance for the relative change in factor matrices. If it is less than this value, the optimization method halts successfully. MaxIter: The maximum number of iterations the optimization method will perform before giving up.

107 Tensorlab Tensor rank decomposition For example,
algOptions = struct;
algOptions.TolAbs = 1e-12;
algOptions.TolFun = 1e-12;
algOptions.TolX = 1e-8;
algOptions.MaxIter = 500;
algOptions.Algorithm = @nls_gndl;
options = struct;
options.Compression = false;
options.Initialization = @cpd_rnd;
options.Algorithm = @cpd_nls;
options.AlgorithmOptions = algOptions;

108 Tensorlab Tensor rank decomposition
% Create an easy problem
tf = cpd_rnd([ ], 5);
A = cpdgen(tf);
% Solve using the options
[af, out] = cpd(A, 5, options);
frobcpdres(A, af)
ans = e 0

109 Tensorlab Tensor rank decomposition You can also ask for detailed information about the progression of the optimization method.
out.Algorithm
ans =
  struct with fields:
            Name: 'cpd_nls'
           alpha: [ ]
    cgiterations: [ ]
        cgrelres: [ ]
           delta: [ ]
            fval: [ ]
            info: 4   % Absolute tolerance reached
          infops: [ ]
      iterations: 5
          relerr: 5.9 e 4
         relfval: [ ]
         relstep: [ ]
             rho: [ ]

110 References: Overview

111 References References for sensitivity Bürgisser, Cucker, Condition: The Geometry of Numerical Algorithms, Springer, 2013. Blum, Cucker, Shub, Smale, Complexity and Real Computation, Springer, 1998. Breiding, Vannieuwenhoven, The condition number of join decompositions, SIAM Journal on Matrix Analysis and Applications, 2017. Rice, A theory of condition, SIAM Journal on Numerical Analysis, 1966.

112 References References for algorithms Absil, Mahony, Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, 2008. Beltrán, Breiding, Vannieuwenhoven, Pencil-based algorithms for tensor decomposition are unstable, arXiv preprint, 2018. Breiding, Vannieuwenhoven, A Riemannian trust region method for the canonical tensor rank approximation problem, SIAM Journal on Optimization, 2018. Kolda, Bader, Tensor decompositions and applications, SIAM Review, 2009. Kressner, Steinlechner, Vandereycken, Low-rank tensor completion by Riemannian optimization, BIT, 2014. Leurgans, Ross, Abel, A decomposition for three-way arrays, SIAM Journal on Matrix Analysis and Applications, 1993. Nocedal, Wright, Numerical Optimization, Springer, 2006.

113 References References for algorithms Sorber, Van Barel, De Lathauwer, Optimization-based algorithms for tensor decompositions: Canonical polyadic decomposition, decomposition in rank-(L_r, L_r, 1) terms, and a new generalization, SIAM Journal on Optimization, 2013. Uschmajew, Local convergence of the alternating least squares algorithm for canonical tensor approximation, SIAM Journal on Matrix Analysis and Applications, 2012.

114 References References for Tensorlab Sidiropoulos, De Lathauwer, Fu, Huang, Papalexakis, Faloutsos, Tensor decompositions for signal processing and machine learning, IEEE Transactions on Signal Processing, 2017. Sorber, Van Barel, De Lathauwer, Optimization-based algorithms for tensor decompositions: Canonical polyadic decomposition, decomposition in rank-(L_r, L_r, 1) terms, and a new generalization, SIAM Journal on Optimization, 2013. Vervliet, Debals, Sorber, Van Barel, De Lathauwer, Tensorlab v3.0, available from www.tensorlab.net, March 2016. Vervliet, De Lathauwer, Numerical optimization based algorithms for data fusion, 2018.


More information

Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization

Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization Frank E. Curtis, Lehigh University Beyond Convexity Workshop, Oaxaca, Mexico 26 October 2017 Worst-Case Complexity Guarantees and Nonconvex

More information

Implicitly and Explicitly Constrained Optimization Problems for Training of Recurrent Neural Networks

Implicitly and Explicitly Constrained Optimization Problems for Training of Recurrent Neural Networks Implicitly and Explicitly Constrained Optimization Problems for Training of Recurrent Neural Networks Carl-Johan Thore Linköping University - Division of Mechanics 581 83 Linköping - Sweden Abstract. Training

More information

Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem

Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem min{f (x) : x R n }. The iterative algorithms that we will consider are of the form x k+1 = x k + t k d k, k = 0, 1,...

More information

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0 Numerical Analysis 1 1. Nonlinear Equations This lecture note excerpted parts from Michael Heath and Max Gunzburger. Given function f, we seek value x for which where f : D R n R n is nonlinear. f(x) =

More information

Computational Linear Algebra

Computational Linear Algebra Computational Linear Algebra PD Dr. rer. nat. habil. Ralf-Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2018/19 Part 6: Some Other Stuff PD Dr.

More information

Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem

Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem Lecture 4 - The Gradient Method Objective: find an optimal solution of the problem min{f (x) : x R n }. The iterative algorithms that we will consider are of the form x k+1 = x k + t k d k, k = 0, 1,...

More information

Fundamentals of Multilinear Subspace Learning

Fundamentals of Multilinear Subspace Learning Chapter 3 Fundamentals of Multilinear Subspace Learning The previous chapter covered background materials on linear subspace learning. From this chapter on, we shall proceed to multiple dimensions with

More information

Matrix Derivatives and Descent Optimization Methods

Matrix Derivatives and Descent Optimization Methods Matrix Derivatives and Descent Optimization Methods 1 Qiang Ning Department of Electrical and Computer Engineering Beckman Institute for Advanced Science and Techonology University of Illinois at Urbana-Champaign

More information

Multi-Linear Mappings, SVD, HOSVD, and the Numerical Solution of Ill-Conditioned Tensor Least Squares Problems

Multi-Linear Mappings, SVD, HOSVD, and the Numerical Solution of Ill-Conditioned Tensor Least Squares Problems Multi-Linear Mappings, SVD, HOSVD, and the Numerical Solution of Ill-Conditioned Tensor Least Squares Problems Lars Eldén Department of Mathematics, Linköping University 1 April 2005 ERCIM April 2005 Multi-Linear

More information

Solving linear equations with Gaussian Elimination (I)

Solving linear equations with Gaussian Elimination (I) Term Projects Solving linear equations with Gaussian Elimination The QR Algorithm for Symmetric Eigenvalue Problem The QR Algorithm for The SVD Quasi-Newton Methods Solving linear equations with Gaussian

More information

REPORTS IN INFORMATICS

REPORTS IN INFORMATICS REPORTS IN INFORMATICS ISSN 0333-3590 A class of Methods Combining L-BFGS and Truncated Newton Lennart Frimannslund Trond Steihaug REPORT NO 319 April 2006 Department of Informatics UNIVERSITY OF BERGEN

More information

Stochastic Optimization Algorithms Beyond SG

Stochastic Optimization Algorithms Beyond SG Stochastic Optimization Algorithms Beyond SG Frank E. Curtis 1, Lehigh University involving joint work with Léon Bottou, Facebook AI Research Jorge Nocedal, Northwestern University Optimization Methods

More information

10.34 Numerical Methods Applied to Chemical Engineering Fall Quiz #1 Review

10.34 Numerical Methods Applied to Chemical Engineering Fall Quiz #1 Review 10.34 Numerical Methods Applied to Chemical Engineering Fall 2015 Quiz #1 Review Study guide based on notes developed by J.A. Paulson, modified by K. Severson Linear Algebra We ve covered three major topics

More information

Optimization Tutorial 1. Basic Gradient Descent

Optimization Tutorial 1. Basic Gradient Descent E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.

More information

1 Non-negative Matrix Factorization (NMF)

1 Non-negative Matrix Factorization (NMF) 2018-06-21 1 Non-negative Matrix Factorization NMF) In the last lecture, we considered low rank approximations to data matrices. We started with the optimal rank k approximation to A R m n via the SVD,

More information

Line Search Algorithms

Line Search Algorithms Lab 1 Line Search Algorithms Investigate various Line-Search algorithms for numerical opti- Lab Objective: mization. Overview of Line Search Algorithms Imagine you are out hiking on a mountain, and you

More information

Newton-type Methods for Solving the Nonsmooth Equations with Finitely Many Maximum Functions

Newton-type Methods for Solving the Nonsmooth Equations with Finitely Many Maximum Functions 260 Journal of Advances in Applied Mathematics, Vol. 1, No. 4, October 2016 https://dx.doi.org/10.22606/jaam.2016.14006 Newton-type Methods for Solving the Nonsmooth Equations with Finitely Many Maximum

More information

Low Rank Approximation Lecture 7. Daniel Kressner Chair for Numerical Algorithms and HPC Institute of Mathematics, EPFL

Low Rank Approximation Lecture 7. Daniel Kressner Chair for Numerical Algorithms and HPC Institute of Mathematics, EPFL Low Rank Approximation Lecture 7 Daniel Kressner Chair for Numerical Algorithms and HPC Institute of Mathematics, EPFL daniel.kressner@epfl.ch 1 Alternating least-squares / linear scheme General setting:

More information

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL) Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective

More information

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005 3 Numerical Solution of Nonlinear Equations and Systems 3.1 Fixed point iteration Reamrk 3.1 Problem Given a function F : lr n lr n, compute x lr n such that ( ) F(x ) = 0. In this chapter, we consider

More information

Homework 5. Convex Optimization /36-725

Homework 5. Convex Optimization /36-725 Homework 5 Convex Optimization 10-725/36-725 Due Tuesday November 22 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)

More information

Lecture 7 Unconstrained nonlinear programming

Lecture 7 Unconstrained nonlinear programming Lecture 7 Unconstrained nonlinear programming Weinan E 1,2 and Tiejun Li 2 1 Department of Mathematics, Princeton University, weinan@princeton.edu 2 School of Mathematical Sciences, Peking University,

More information

Matrix Assembly in FEA

Matrix Assembly in FEA Matrix Assembly in FEA 1 In Chapter 2, we spoke about how the global matrix equations are assembled in the finite element method. We now want to revisit that discussion and add some details. For example,

More information

Fitting a Tensor Decomposition is a Nonlinear Optimization Problem

Fitting a Tensor Decomposition is a Nonlinear Optimization Problem Fitting a Tensor Decomposition is a Nonlinear Optimization Problem Evrim Acar, Daniel M. Dunlavy, and Tamara G. Kolda* Sandia National Laboratories Sandia is a multiprogram laboratory operated by Sandia

More information

CHAPTER 10: Numerical Methods for DAEs

CHAPTER 10: Numerical Methods for DAEs CHAPTER 10: Numerical Methods for DAEs Numerical approaches for the solution of DAEs divide roughly into two classes: 1. direct discretization 2. reformulation (index reduction) plus discretization Direct

More information

Numerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09

Numerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09 Numerical Optimization 1 Working Horse in Computer Vision Variational Methods Shape Analysis Machine Learning Markov Random Fields Geometry Common denominator: optimization problems 2 Overview of Methods

More information

Algunos ejemplos de éxito en matemáticas computacionales

Algunos ejemplos de éxito en matemáticas computacionales Algunos ejemplos de éxito en matemáticas computacionales Carlos Beltrán Coloquio UC3M Leganés 4 de febrero de 2015 Solving linear systems Ax = b where A is n n, det(a) 0 Input: A M n (C), det(a) 0. b C

More information

Chapter 8 Structured Low Rank Approximation

Chapter 8 Structured Low Rank Approximation Chapter 8 Structured Low Rank Approximation Overview Low Rank Toeplitz Approximation Low Rank Circulant Approximation Low Rank Covariance Approximation Eculidean Distance Matrix Approximation Approximate

More information

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions International Journal of Control Vol. 00, No. 00, January 2007, 1 10 Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions I-JENG WANG and JAMES C.

More information

Nonnegative Tensor Factorization using a proximal algorithm: application to 3D fluorescence spectroscopy

Nonnegative Tensor Factorization using a proximal algorithm: application to 3D fluorescence spectroscopy Nonnegative Tensor Factorization using a proximal algorithm: application to 3D fluorescence spectroscopy Caroline Chaux Joint work with X. Vu, N. Thirion-Moreau and S. Maire (LSIS, Toulon) Aix-Marseille

More information

Scientific Computing: Dense Linear Systems

Scientific Computing: Dense Linear Systems Scientific Computing: Dense Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course MATH-GA.2043 or CSCI-GA.2112, Spring 2012 February 9th, 2012 A. Donev (Courant Institute)

More information

Quasi-Newton Methods

Quasi-Newton Methods Newton s Method Pros and Cons Quasi-Newton Methods MA 348 Kurt Bryan Newton s method has some very nice properties: It s extremely fast, at least once it gets near the minimum, and with the simple modifications

More information

Complexity analysis of second-order algorithms based on line search for smooth nonconvex optimization

Complexity analysis of second-order algorithms based on line search for smooth nonconvex optimization Complexity analysis of second-order algorithms based on line search for smooth nonconvex optimization Clément Royer - University of Wisconsin-Madison Joint work with Stephen J. Wright MOPTA, Bethlehem,

More information

Max-Planck-Institut für Mathematik in den Naturwissenschaften Leipzig

Max-Planck-Institut für Mathematik in den Naturwissenschaften Leipzig Max-Planck-Institut für Mathematik in den Naturwissenschaften Leipzig Towards a condition number theorem for the tensor rank decomposition by Paul Breiding and Nick Vannieuwenhoven Preprint no.: 3 208

More information

STA141C: Big Data & High Performance Statistical Computing

STA141C: Big Data & High Performance Statistical Computing STA141C: Big Data & High Performance Statistical Computing Numerical Linear Algebra Background Cho-Jui Hsieh UC Davis May 15, 2018 Linear Algebra Background Vectors A vector has a direction and a magnitude

More information

Key words. conjugate gradients, normwise backward error, incremental norm estimation.

Key words. conjugate gradients, normwise backward error, incremental norm estimation. Proceedings of ALGORITMY 2016 pp. 323 332 ON ERROR ESTIMATION IN THE CONJUGATE GRADIENT METHOD: NORMWISE BACKWARD ERROR PETR TICHÝ Abstract. Using an idea of Duff and Vömel [BIT, 42 (2002), pp. 300 322

More information