Computing and decomposing tensors
1 Computing and decomposing tensors: Tensor rank decomposition. Nick Vannieuwenhoven (FWO / KU Leuven)
2 Overview
1 Sensitivity: Condition numbers; Tensor rank decomposition.
2 Algorithms: Pencil-based algorithms; Alternating least squares methods; Quasi-Newton optimization methods; Riemannian quasi-Newton optimization methods.
3 Tensorlab: Basic operations; Tucker decomposition; Tensor rank decomposition.
4 References
3 Sensitivity: overview
4 Sensitivity - Condition numbers: overview
5 Sensitivity Condition numbers Sensitivity In numerical computations, the sensitivity of the output of a computation to perturbations at the input is extremely important. We have already seen an example: computing singular values appears to behave nicely with respect to perturbations.
6 Sensitivity Condition numbers Consider the matrix A = [matrix omitted]. Computing the singular value decomposition ÛŜV^T of the floating-point representation Ã of A numerically using Matlab, we find A ≈ ÛŜV^T. [Table: computed versus exact singular values.] In all cases, we found 16 correct digits of the exact solution.
7 Sensitivity Condition numbers However, when comparing the computed left singular vector corresponding to the singular value σ to the exact solution, [table: computed versus exact vector], we have only recovered a couple of digits correctly, even though the matrix ÛŜV^T contains at least 15 correct digits of each entry. How is this possible?
8 Sensitivity Condition numbers We say that the problem of computing the singular values has a different sensitivity to perturbations than the computational problem of computing the left singular vectors. Assuming the singular values are distinct, these problems can be modeled as functions f_1 : F^{m×n} → F^{min{m,n}} and f_2 : F^{m×n} → F^{m×min{m,n}}, respectively. What we have observed above is that

‖f_1(x) − f_1(x + δx)‖ / ‖δx‖ ≈ 0.4, whereas ‖f_2(x) − f_2(x + δx)‖ / ‖δx‖ ≥ 800,

at least for x = A and δx = Ã − A (with ‖δx‖ small).
9 Sensitivity Condition numbers Condition numbers The condition number quantifies the worst-case sensitivity of f to perturbations of the input:

κ[f](x) := lim_{ε→0} sup_{y ∈ B_ε(x)} ‖f(y) − f(x)‖ / ‖y − x‖.

[Figure: a ball B_ε(x) around x is mapped by f into a region around f(x) of radius approximately κε.]
10 Sensitivity Condition numbers If f : X ⊆ F^M → Y ⊆ F^N is a differentiable function, then the condition number is fully determined by the first-order approximation of f. Indeed, in this case we have f(x + Δ) = f(x) + JΔ + o(‖Δ‖), where J is the Jacobian matrix containing all first-order partial derivatives. Then,

κ[f](x) = lim_{ε→0} sup_{‖Δ‖ ≤ ε} ‖f(x) + JΔ + o(‖Δ‖) − f(x)‖ / ‖Δ‖ = max_{‖Δ‖ = 1} ‖JΔ‖ = ‖J‖_2.
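As a quick numerical check, the spectral norm of a finite-difference Jacobian already estimates κ[f](x). A minimal Matlab sketch; the function handle f, the point x, and the step size h are assumptions of this illustration, not part of the slides:

function kappa = cond_via_jacobian(f, x)
% Estimate kappa[f](x) = ||J||_2 with a forward-difference Jacobian.
fx = f(x); n = numel(x); h = 1e-7;
J = zeros(numel(fx), n);
for i = 1:n
    e = zeros(n, 1); e(i) = h;
    J(:, i) = (f(x + e) - fx) / h;   % ith column of the Jacobian
end
kappa = norm(J, 2);                  % largest singular value of J
end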
11 Sensitivity Condition numbers More generally, for manifolds, we can apply Rice's (1966) geometric framework of conditioning:

Proposition (Rice, 1966). Let X ⊆ F^m be a manifold of inputs and Y ⊆ F^n a manifold of outputs with dim X = dim Y. Then, the condition number of F : X → Y at x_0 ∈ X is

κ[F](x_0) = ‖d_{x_0}F‖ = sup_{‖x‖ = 1} ‖d_{x_0}F(x)‖,

where d_{x_0}F : T_{x_0}X → T_{F(x_0)}Y is the derivative. See, e.g., Blum, Cucker, Shub, and Smale (1998) or Bürgisser and Cucker (2013) for a more modern treatment.
12 Sensitivity - Tensor rank decomposition: overview
13 Sensitivity Tensor rank decomposition The tensor rank decomposition problem In the remainder, S denotes the smooth manifold of rank-1 tensors in R^{n_1 × ... × n_d}, called the Segre manifold. To determine the condition number of computing a CPD, we analyze the addition map:

Φ_r : S × ... × S → R^{n_1 × ... × n_d}, (A_1, ..., A_r) ↦ A_1 + ... + A_r.

Note that the domain and codomain are smooth manifolds.
14 Sensitivity Tensor rank decomposition From differential geometry, we know that Φ_r is a local diffeomorphism onto its image if the derivative d_a Φ_r is injective, i.e., has maximal rank equal to r · dim S. Let a = (A_1, ..., A_r) ∈ S × ... × S and A = Φ_r(a). If d_a Φ_r is injective, then by the Inverse Function Theorem there exists a local inverse function Φ_a^{-1} : E → F, where E ⊆ Im(Φ_r) is an open neighborhood of A, F ⊆ S × ... × S is an open neighborhood of a, and Φ_r ∘ Φ_a^{-1} = Id_E and Φ_a^{-1} ∘ Φ_r = Id_F. In other words, Φ_a^{-1} computes one particular ordered CPD of A. This function has condition number

κ[Φ_a^{-1}](A) = ‖d_A Φ_a^{-1}‖ = ‖(d_a Φ_r)^{-1}‖.
15 Sensitivity Tensor rank decomposition The derivative d_a Φ_r is seen to be the map

d_a Φ_r : T_{A_1}S × ... × T_{A_r}S → T_A R^{n_1 × ... × n_d}, (Ȧ_1, ..., Ȧ_r) ↦ Ȧ_1 + ... + Ȧ_r.

Hence, if U_i is an orthonormal basis of T_{A_i}S ⊆ T_{A_i}R^{n_1 × ... × n_d}, then the map is represented in coordinates as the matrix

U = [U_1 U_2 ... U_r] ∈ R^{(n_1 ⋯ n_d) × r·dim S}.

Summarizing, if we are given an ordered CPD a of A, then the condition number of computing this ordered CPD may be computed as the inverse of the smallest singular value of U.
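For third-order tensors this recipe can be carried out directly. The sketch below is an illustration under the definitions above (it is not Tensorlab's condition_number routine): it assembles U block by block from the factor matrices of the rank-1 terms and returns the inverse of its smallest singular value.

function kappa = cpd_condition(A, B, C)
% A, B, C: factor matrices (n1 x r, n2 x r, n3 x r) of the rank-1 terms.
r = size(A, 2);
U = [];
for i = 1:r
    a = A(:, i); b = B(:, i); c = C(:, i);
    % T_{A_i}S consists of the tensors x o b o c + a o y o c + a o b o z;
    % in vectorized (Kronecker) coordinates these are the columns of Ti:
    Ti = [kron(c, kron(b, eye(numel(a)))), ...
          kron(c, kron(eye(numel(b)), a)), ...
          kron(eye(numel(c)), kron(b, a))];
    U = [U, orth(Ti)];     % append an orthonormal basis U_i of T_{A_i}S
end
kappa = 1 / min(svd(U));   % inverse of the smallest singular value of U
end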
16 Sensitivity Tensor rank decomposition From the above derivation, it also follows that

κ[Φ_a^{-1}](A) = κ[Φ_{a'}^{-1}](A), where a' = (A_{π_1}, ..., A_{π_r}),

for every permutation π ∈ S_r. Hence, the above gives us the condition number of the tensor decomposition problem at the CPD {A_1, ..., A_r}. We write κ({A_1, ..., A_r}) := κ[Φ_a^{-1}](A).
17 Sensitivity Tensor rank decomposition Interpretation If

A = A_1 + ... + A_r = Σ_{i=1}^r a_i^1 ∘ ... ∘ a_i^d and B = B_1 + ... + B_r = Σ_{i=1}^r b_i^1 ∘ ... ∘ b_i^d

are tensors in R^{n_1 × ... × n_d}, then for ‖A − B‖_F → 0 we have the asymptotically sharp bound

min_{π ∈ S_r} sqrt( Σ_{i=1}^r ‖A_i − B_{π_i}‖_F^2 ) ≤ κ({A_1, ..., A_r}) · ‖A − B‖_F,

that is, forward error ≤ condition number × backward error.
18 Algorithms: overview
19 Pencil-based algorithms: overview
20 Pencil-based algorithms Pencil-based algorithms A pencil-based algorithm (PBA) is an algorithm for computing the unique rank-r CPD of a tensor A ∈ R^{n_1 × n_2 × n_3} with n_1 ≥ n_2 ≥ r. Let

A = Σ_{i=1}^r a_i ∘ b_i ∘ c_i, where ‖a_i‖ = 1.

The matrices A = [a_i], B = [b_i] and C = [c_i] of A are called factor matrices.
21 Pencil-based algorithms Fix a matrix Q ∈ R^{n_3 × 2} with two orthonormal columns. The key step in a PBA is the projection step

B := (I, I, Q^T) · A = Σ_{i=1}^r a_i ∘ b_i ∘ (Q^T c_i) =: Σ_{i=1}^r a_i ∘ b_i ∘ z_i,

yielding an n_1 × n_2 × 2 tensor B whose factor matrices are (A, B, Z). Thereafter, we compute the specific orthogonal Tucker decomposition

B = (Q_1, Q_2, I) · S = Σ_{i=1}^r (Q_1 x_i) ∘ (Q_2 y_i) ∘ z_i,

where x_i = Q_1^T a_i and y_i = Q_2^T b_i.
22 Pencil-based algorithms Let X = [x_i / ‖x_i‖] and Y = [y_i / ‖y_i‖]. Then, the core tensor S ∈ R^{r × r × 2} has the following two 3-slices:

S_j = (I, I, e_j^T) · S = Σ_{i=1}^r λ_{j,i} (x_i / ‖x_i‖) ∘ (y_i / ‖y_i‖) = X diag(λ_j) Y^T, j = 1, 2,

where λ_j := [z_{j,i} ‖x_i‖ ‖y_i‖]_{i=1}^r and z_i = [z_{j,i}]_{j=1}^2. If S_1 and S_2 are nonsingular, then we see that

S_1 S_2^{-1} = X diag(λ_1) diag(λ_2)^{-1} X^{-1}.

Thus X is the r × r matrix of eigenvectors of S_1 S_2^{-1}.
23 Pencil-based algorithms Recall that the factor matrices of B are (A, B, Z) and by definition A = Q_1 X. The rank-1 tensors can then be recovered by noting that

A_(1) = Σ_{i=1}^r a_i (b_i ⊗ c_i)^T =: A (B ⊙ C)^T,

where B ⊙ C := [b_i ⊗ c_i]_{i=1}^r ∈ R^{n_2 n_3 × r}. Since A is left invertible, we get

A ⊙ (A^† A_(1))^T = A ⊙ (B ⊙ C) = [a_i ⊗ b_i ⊗ c_i]_{i=1}^r ∈ R^{n_1 n_2 n_3 × r},

which is a matrix containing the rank-1 tensors of the CPD as columns.
24 Pencil-based algorithms Algorithm 1: Standard PBA
input: A tensor A ∈ R^{n_1 × n_2 × n_3}, n_1 ≥ n_2 ≥ r, of rank r.
output: Rank-1 tensors a_i ∘ b_i ∘ c_i of the CPD of A.
1. Sample a random n_3 × 2 matrix Q with orthonormal columns.
2. B ← (I, I, Q^T) · A;
3. Compute a rank-(r, r, 2) HOSVD (Q_1, Q_2, Q_3) · S = B;
4. S_1 ← (I, I, e_1^T) · S; S_2 ← (I, I, e_2^T) · S;
5. Compute the eigendecomposition S_1 S_2^{-1} = X D X^{-1};
6. A ← Q_1 X;
7. A ⊙ B ⊙ C ← A ⊙ (A^† A_(1))^T;
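The following Matlab sketch mirrors these steps for a generic rank-r tensor; the function name and the use of plain svd, eig, and pinv are choices of this illustration, not of Tensorlab's cpd_gevd.

function [A, B, C] = pba_sketch(T, r)
[n1, n2, n3] = size(T);
[Q, ~] = qr(randn(n3, 2), 0);                          % random n3 x 2 orthonormal Q
P = reshape(reshape(T, n1*n2, n3) * Q, [n1, n2, 2]);   % projection (I, I, Q^T) . T
% Orthogonal Tucker compression of the projected tensor to r x r x 2:
[U1, ~, ~] = svd(reshape(P, n1, n2*2), 'econ');                   Q1 = U1(:, 1:r);
[U2, ~, ~] = svd(reshape(permute(P, [2 1 3]), n2, n1*2), 'econ'); Q2 = U2(:, 1:r);
S1 = Q1' * P(:,:,1) * Q2;  S2 = Q1' * P(:,:,2) * Q2;   % the two r x r slices
[X, ~] = eig(S1 / S2);                                 % eigenvectors of S1 * inv(S2)
A = Q1 * X;                                            % first factor matrix
% Recover the remaining factors from the mode-1 unfolding:
W = pinv(A) * reshape(T, n1, n2*n3);                   % row l is vec(b_l * c_l^T)
B = zeros(n2, r);  C = zeros(n3, r);
for l = 1:r
    [u, s, v] = svd(reshape(W(l, :), n2, n3));         % each slice is rank one
    B(:, l) = u(:, 1) * s(1, 1);
    C(:, l) = v(:, 1);
end
end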
25 Pencil-based algorithms Numerical issues A variant of this algorithm is implemented in Tensorlab v3.0 as cpd_gevd. Let us perform an experiment with it. We create the first tensor that comes to mind: a rank-5 random tensor A of size 5 × 5 × 5:

>> Ut{1} = randn(5,5);
>> Ut{2} = randn(5,5);
>> Ut{3} = randn(5,5);
>> A = cpdgen(Ut);
26 Pencil-based algorithms Compute A's decomposition and compare its distance to the input decomposition, relative to the machine precision ε ≈ 10^{-16}:

>> Ur = cpd_gevd(A, 5);
>> E = kr(Ut) - kr(Ur);
>> norm( E(:), 2 ) / eps
ans =
    8.649e+04

This large number can arise because of a high condition number. However,

>> kappa = condition_number( Ut )
ans =
    1.34
27 Pencil-based algorithms It thus appears that there is something wrong with the algorithm, even though it is a mathematically sound algorithm for computing low-rank CPDs. The reason is the following.

Theorem (Beltrán, Breiding, V, 2018). Many algorithms based on a reduction to R^{n_1 × n_2 × 2} are numerically unstable: the forward error produced by the algorithm divided by the backward error is much larger than the condition number on an open set of inputs.

These methods should be used with care as they do not necessarily yield the highest attainable precision.
28 Pencil-based algorithms The instability of the algorithm leads to an excess factor ω on top of the condition number of the computational problem: [Figure: excess factor ω for cpd_pba variants and cpd_gevd, tested with random rank-1 tuples.]
29 Alternating least squares methods: overview
30 Alternating least squares methods Alternating least squares methods The problem of computing a CPD of A ∈ F^{n_1 × ... × n_d} and the problem of approximating A by a (low-rank) CPD can each be formulated as an optimization problem. For example,

min_{A_k ∈ F^{n_k × r}, k = 1, ..., d} ‖A − [[A_1, A_2, ..., A_d]]‖_F,

where [[A_1, ..., A_d]] denotes the tensor whose CPD has factor matrices A_1, ..., A_d. One of the earliest methods devised specifically for this problem is the alternating least squares (ALS) scheme.
31 Alternating least squares methods The ALS method is based on a reformulation of the objective function; it is equivalent to

min_{A_k ∈ F^{n_k × r}} ‖A_(k) − A_k (A_1 ⊙ ... ⊙ A_{k-1} ⊙ A_{k+1} ⊙ ... ⊙ A_d)^T‖

for every k = 1, 2, ..., d. The key observation is that if the A_j, j ≠ k, are fixed, then finding the optimal A_k is a linear least-squares problem! Its solution is namely

A_k = A_(k) ((A_1 ⊙ ... ⊙ A_{k-1} ⊙ A_{k+1} ⊙ ... ⊙ A_d)^T)^†.
32 Alternating least squares methods The standard ALS method then solves the optimization problem by cyclically fixing all but one factor matrix A_k.

Algorithm 2: ALS method
input: A tensor A ∈ F^{n_1 × ... × n_d}.
input: A target rank r.
output: Factor matrices (A_1, ..., A_d) of a CPD approximating A.
Initialize the factor matrices A_k ∈ F^{n_k × r} (e.g., entries sampled i.i.d. from N(0, 1), or a truncated HOSVD);
while not converged do
  for k = 1, 2, ..., d do
    Â_k ← A_1 ⊙ ... ⊙ A_{k-1} ⊙ A_{k+1} ⊙ ... ⊙ A_d;
    A_k ← A_(k) (Â_k^T)^†;
  end
end
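For a third-order tensor the loop body is only a few lines. The sketch below is a minimal illustration (no convergence test, normalization, or regularization; kr_ is a local Khatri-Rao helper, since this sketch does not assume Tensorlab is available):

function U = als_sketch(T, r, maxit)
[n1, n2, n3] = size(T);
U = {randn(n1, r), randn(n2, r), randn(n3, r)};  % random initialization
T1 = reshape(T, n1, n2*n3);                      % mode-1 unfolding
T2 = reshape(permute(T, [2 1 3]), n2, n1*n3);    % mode-2 unfolding
T3 = reshape(permute(T, [3 1 2]), n3, n1*n2);    % mode-3 unfolding
for it = 1:maxit
    U{1} = T1 / kr_(U{3}, U{2}).';               % fix U{2}, U{3}; solve for U{1}
    U{2} = T2 / kr_(U{3}, U{1}).';
    U{3} = T3 / kr_(U{2}, U{1}).';
end
end
function K = kr_(A, B)
% Columnwise Kronecker (Khatri-Rao) product.
K = reshape(bsxfun(@times, permute(A, [3 1 2]), permute(B, [1 3 2])), ...
            size(A,1)*size(B,1), size(A,2));
end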
33 Alternating least squares methods The ALS scheme produces a sequence of incrementally better approximations. However, the convergence properties of the ALS method are very poorly understood. The scheme has accumulation points, but it is not known in general whether they correspond to critical points of the objective function. That is, we do not know if the accumulation points of the ALS scheme correspond to points satisfying the first-order optimality conditions. Uschmajew (2012) proved local convergence to critical points where the Hessian is positive semi-definite and of maximal rank.
34 Quasi-Newton optimization methods: overview
35 Quasi-Newton optimization methods Quasi-Newton optimization methods The optimization problem

min_{A_k ∈ F^{n_k × r}, k = 1, ..., d} ‖A − [[A_1, A_2, ..., A_d]]‖_F

can be solved using any of the standard optimization methods, such as nonlinear conjugate gradient and quasi-Newton methods. We discuss the Gauss-Newton method with trust region as globalization method.
36 Quasi-Newton optimization methods Recall that the univariate Newton method is based on the following idea. [Figure: animated sketch of Newton iterations on a univariate function, built up over slides 36-40.]
41 Quasi-Newton optimization methods Newton's method for minimizing a smooth real function f(x) consists of generating a sequence of iterates x_0, x_1, ..., where x_{k+1} is the optimal solution of the local second-order Taylor series expansion of f(x), namely

f(x_k + p) ≈ m_{x_k}(p) := f(x_k) + p^T g_k + (1/2) p^T H_k p,

where g_k := ∇f(x_k) is the gradient of f at x_k, and H_k := ∇²f(x_k) is the symmetric Hessian matrix of f at x_k.
42 Quasi-Newton optimization methods Recall from calculus that the gradient of f : R^n → R with coordinates x_1, ..., x_n on R^n is

∇f(y) := [ ∂f/∂x_1 (y); ∂f/∂x_2 (y); ...; ∂f/∂x_n (y) ].

The Hessian matrix is

∇²f(y) := [ ∂²f/∂x_i ∂x_j (y) ]_{i,j=1}^n,

the n × n matrix whose (i, j) entry is the second partial derivative of f with respect to x_i and x_j, evaluated at y.
43 Quasi-Newton optimization methods The optimal search direction p according to the second-order model should satisfy the first-order optimality conditions. That is,

0 = ∇m_{x_k}(p) = g_k + H_k p.

Hence,

p = −H_k^{-1} g_k;

this is called the Newton search direction.
44 Quasi-Newton optimization methods Hence, the basic Newton method is obtained.

Algorithm 3: Newton's method
input: An objective function f.
input: A starting point x_0 ∈ R^n.
output: A critical point x* of the objective function f.
k ← 0;
while not converged do
  Compute the gradient g_k = ∇f(x_k);
  Compute the Hessian H_k = ∇²f(x_k);
  p_k ← −H_k^{-1} g_k;
  x_{k+1} ← x_k + p_k;
  k ← k + 1;
end
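In Matlab the loop is direct. A minimal sketch, assuming a handle fgh that returns the value, gradient, and Hessian (an assumption of this illustration; there is no globalization, so it may diverge far from a solution):

function x = newton_sketch(fgh, x0, tol, maxit)
x = x0;
for k = 1:maxit
    [~, g, H] = fgh(x);        % gradient and Hessian at the current iterate
    if norm(g) < tol, break; end
    p = -H \ g;                % Newton direction from H * p = -g
    x = x + p;
end
end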
45 Quasi-Newton optimization methods Newton's method is locally quadratically convergent: if the initial iterate x_0 is sufficiently close to the solution x*, then

‖x* − x_{k+1}‖ = O(‖x* − x_k‖²).

This means that close to the solution the number of correct digits doubles every step.
46 Quasi-Newton optimization methods For example, Newton's method applied to a two-dimensional system f(x, y) converges to its root starting from x_0 = (3, 3). [Figure: semilog plot of the error ‖x* − x_k‖ against the iteration number k, illustrating quadratic convergence.]
47 Quasi-Newton optimization methods While Newton's method has great local convergence properties, the plain version is not suitable because:
1 it has no guaranteed global convergence, and
2 the Hessian matrix can be difficult to compute.
These problems are addressed respectively by incorporating a trust region scheme, and by using a cheap approximation of the Hessian matrix.
48 Quasi-Newton optimization methods The Gauss-Newton Hessian approximation A cheap approximation of the Hessian matrix is available for nonlinear least squares problems. In this case, the objective function takes the form

f(x) = (1/2) ‖F(x)‖² = (1/2) ⟨F(x), F(x)⟩, where F : R^n → R^m.
49 Quasi-Newton optimization methods The gradient of a least-squares objective function f at x is

∇f(x) = (J_F(x))^T F(x),

where J_F(x) ∈ R^{m × n} is the Jacobian matrix of F. That is, if F_k(x) denotes the kth component function of F, then

J_F(x) := [ ∂F_k/∂x_j (x) ]_{k=1,...,m; j=1,...,n},

the m × n matrix whose kth row is the transposed gradient of F_k.
50 Quasi-Newton optimization methods The Hessian matrix of f is

∇²f(x) = (J_F(x))^T (J_F(x)) + ⟨dJ_F(x), F(x)⟩.

Near a solution, we hope to have F(x*) ≈ 0, so that the last term often has a negligible contribution. This reasoning leads to the Gauss-Newton approximation

(J_F(x))^T (J_F(x)) ≈ ∇²f(x).

Replacing the Hessian with the Gauss-Newton approximation yields local linear convergence. If f(x*) = 0 at a solution, then the local convergence is quadratic.
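A bare Gauss-Newton iteration then looks as follows; resjac, a handle returning the residual F(x) and the Jacobian J_F(x), is an assumption of this sketch, and there is no globalization:

function x = gauss_newton_sketch(resjac, x0, maxit)
x = x0;
for k = 1:maxit
    [r, J] = resjac(x);
    p = -(J' * J) \ (J' * r);  % Gauss-Newton direction; equals -pinv(J)*r for full-rank J
    x = x + p;
end
end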
51 Quasi-Newton optimization methods Trust region globalization The method sketched thus far has no global convergence guarantees. Furthermore, the Gauss-Newton approximation of the Hessian could be very ill-conditioned, resulting in large updates. The trust region globalization scheme can solve both of these problems. Let J_k := J_F(x_k). The idea is to trust the local second-order model at x_k,

m_{x_k}(p) = f(x_k) + p^T g_k + (1/2) p^T J_k^T J_k p,

only in a small neighborhood around x_k.
52 Quasi-Newton optimization methods Instead of taking the unconstrained minimizer of m_{x_k}(p), a trust region method solves the trust region subproblem:

min_{p ∈ R^n} m_{x_k}(p) subject to ‖p‖ ≤ Δ_k,

where Δ_k > 0 is the trust region radius. [Figure: the trust region around x_k and the step p.]
53 Quasi-Newton optimization methods The trust region radius is modified in every step according to the following scheme. Let the computed update direction be p_k, with ‖p_k‖ ≤ Δ_k. The trustworthiness of the second-order model is defined as

ρ_k = (f(x_k) − f(x_k + p_k)) / (m_{x_k}(0) − m_{x_k}(p_k)).

If the trustworthiness ρ_k is very high (e.g., ρ_k ≥ 0.75) and if in addition ‖p_k‖ = Δ_k, then the trust region radius is increased (e.g., Δ_{k+1} = 2Δ_k). On the other hand, if ρ_k is very low (e.g., ρ_k ≤ 0.25), then the trust region radius is decreased (e.g., Δ_{k+1} = Δ_k / 4).
54 Quasi-Newton optimization methods The trustworthiness ρ_k is also used to decide whether or not to accept a step in the direction of the computed p_k. If ρ_k is very small (e.g., ρ_k ≤ 0.1), then the search direction p_k is rejected. Otherwise, p_k is accepted as a good direction, and we set x_{k+1} = x_k + p_k.
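Put together, one step of this bookkeeping could be sketched as follows (f and m are handles for the objective and the model; the thresholds 0.75, 0.25, and 0.1 follow the text, while the factors 2 and 1/4 are the usual choices):

function [x, Delta] = tr_update(x, p, Delta, f, m)
rho = (f(x) - f(x + p)) / (m(zeros(size(p))) - m(p));  % trustworthiness
if rho >= 0.75 && abs(norm(p) - Delta) < 1e-12
    Delta = 2 * Delta;     % trustworthy model and the step hit the boundary
elseif rho <= 0.25
    Delta = Delta / 4;     % poor model: shrink the trust region
end
if rho > 0.1
    x = x + p;             % accept the step; otherwise keep the old iterate
end
end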
55 Quasi-Newton optimization methods For (approximately) solving the trust region subproblem

min_{p ∈ R^n} m_{x_k}(p) subject to ‖p‖ ≤ Δ_k,

one can exploit the following fact. If the unconstrained minimizer

p_k = −(J_k^T J_k)^{-1} g_k = −(J_k^T J_k)^{-1} J_k^T r_k = −J_k^† r_k, where r_k := F(x_k),

falls within the trust region, ‖p_k‖ ≤ Δ_k, then this is the optimal solution of the trust region subproblem. Otherwise, there exists a λ > 0 such that the optimal solution p_k satisfies

(J_k^T J_k + λI) p_k = −J_k^T r_k with ‖p_k‖ = Δ_k.

Nocedal and Wright (2006) discuss strategies for finding λ.
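The sketch below finds such a λ by simple bisection on ‖p(λ)‖ = Δ, assuming the unconstrained minimizer lies outside the region; Nocedal and Wright's actual strategies are more refined (e.g., a safeguarded Newton iteration on the secular equation), so this is only an illustration.

function p = trs_lambda_sketch(J, r, Delta)
g = J' * r;  H = J' * J;  n = size(H, 1);
lo = 0;  hi = 1;
while norm((H + hi*eye(n)) \ g) > Delta   % bracket lambda from above
    hi = 2 * hi;
end
for it = 1:60                             % bisect: ||p(lambda)|| decreases in lambda
    mid = (lo + hi) / 2;
    p = -(H + mid*eye(n)) \ g;
    if norm(p) > Delta, lo = mid; else, hi = mid; end
end
end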
56 Quasi-Newton optimization methods Several variations of this quasi-Newton method with trust region globalization are implemented in Tensorlab as cpd_nls. See Sorber, Van Barel, De Lathauwer (2013) for details.
57 Riemannian quasi-Newton optimization methods: overview
58 Riemannian quasi-Newton optimization methods Riemannian optimization An alternative way to formulate the approximation of a tensor by a low-rank CPD consists of optimizing over a product of Segre manifolds S^{×r} := S × ... × S (r factors):

min_{(A_1, ..., A_r) ∈ S^{×r}} ‖Φ_r(A_1, ..., A_r) − A‖_F.

This is an optimization problem with a differentiable objective function over a smooth manifold. These problems are studied in Riemannian optimization.
59 Riemannian quasi-Newton optimization methods In general, if M ⊆ R^N is an m-dimensional smooth manifold and F : M → R a smooth function, then

min_{x ∈ M} F(x)

is a Riemannian optimization problem that can be solved by, e.g., a Riemannian Gauss-Newton method; see Absil, Mahony, Sepulchre (2008).
60 Riemannian quasi-Newton optimization methods In a RGN method, the objective function f(x) = (1/2) ‖Φ_r(x) − A‖² is locally approximated at a ∈ S^{×r} by the quadratic model

m_a(t) := f(a) + ⟨d_a f, t⟩ + (1/2) ⟨t, (d_a Φ_r^* ∘ d_a Φ_r)(t)⟩,

where H_a := d_a Φ_r^* ∘ d_a Φ_r is the GN Hessian approximation, and ⟨·,·⟩ is the inner product inherited from the ambient R^N. Note that the domain of m_a is now T_a S^{×r}.
61 Riemannian quasi-Newton optimization methods As before, the RGN method with trust region considers the model to be accurate only in a radius Δ about a. [Figure: trust region of radius Δ in the tangent space at a, with step p.] The trust region subproblem (TRS) is

min_{t ∈ T_a S^{×r}} m_a(t) subject to ‖t‖ ≤ Δ,

whose solution p ∈ T_a S^{×r} yields the next search direction.
62 Riemannian quasi-Newton optimization methods In Breiding, V (2018), the TRS is solved by combining a standard dogleg step with a hot restarting scheme. Let g_a be the coordinate representation of d_a f, and let H_a be the matrix of d_a Φ_r^* ∘ d_a Φ_r. The dogleg step approximates the solution p of the TRS by

p = p_N := −H_a^{-1} g_a, if ‖p_N‖ ≤ Δ;
p = (Δ / ‖p_C‖) p_C, where p_C := −((g_a^T g_a) / (g_a^T H_a g_a)) g_a, if ‖p_N‖ > Δ and ‖p_C‖ ≥ Δ;
p = p_I := p_C + (τ − 1)(p_N − p_C) such that ‖p_I‖ = Δ, otherwise,

where τ is the solution of ‖p_C + (τ − 1)(p_N − p_C)‖ = Δ.
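In code, the three cases read as follows; this is a plain Matlab sketch of the standard dogleg step, not the Tensorlab implementation:

function p = dogleg_sketch(H, g, Delta)
pN = -H \ g;                               % Newton point
if norm(pN) <= Delta
    p = pN;  return
end
pC = -((g'*g) / (g'*H*g)) * g;             % unconstrained Cauchy point
if norm(pC) >= Delta
    p = (Delta / norm(pC)) * pC;           % truncated steepest-descent step
    return
end
% Intersect pC + s*(pN - pC), s = tau - 1 in [0, 1], with the sphere ||p|| = Delta:
d = pN - pC;
a = d'*d;  b = 2*(pC'*d);  c = pC'*pC - Delta^2;
s = (-b + sqrt(b^2 - 4*a*c)) / (2*a);      % positive root, since c < 0
p = pC + s * d;
end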
63 Riemannian quasi-Newton optimization methods The Newton direction p_N = −H_a^{-1} g_a is vital to the dogleg step. Unfortunately, H_a = d_a Φ_r^* ∘ d_a Φ_r can be close to a singular matrix. In fact,

‖H_a^{-1}‖ = 1 / ς_m(d_a Φ_r)² =: κ(a)²,

where m = dim S^{×r} and ς_m denotes the mth (smallest) singular value. Hence H_a is ill-conditioned if and only if the CPD is ill-conditioned at a. Whenever H_a is close to a singular matrix, we suggest applying random perturbations to the current decomposition a until H_a is sufficiently well-behaved. We call this hot restarting.
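As a sketch, with the decomposition a stored as a coordinate vector and a hypothetical handle gn_hessian for H_a (both are assumptions of this illustration, not names from the slides):

while cond(gn_hessian(a)) > 1e8       % GN Hessian approximation nearly singular
    a = a + 1e-3 * randn(size(a));    % randomly perturb the decomposition
end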
64 Riemannian quasi-Newton optimization methods Retraction We need to advance from a ∈ S^{×r} to a new point of S^{×r}, along the direction p. However, while a + p ∈ T_a S^{×r}, this point does not lie in S^{×r}! [Figure: the tangent space T_a S^{×r} at a, the step p, and the retracted point R_a(p) on S^{×r}.] We need a retraction operator (Absil, Mahony, Sepulchre, 2008) for smoothly mapping a neighborhood of 0 ∈ T_a S^{×r} back to S^{×r}.
65 Riemannian quasi-Newton optimization methods Given a retraction operator R for S, a retraction operator for the product manifold S^{×r} = S × ... × S at a = (A_1, ..., A_r) is

R_a(·) := (R_{A_1} × R_{A_2} × ... × R_{A_r})(·),

which is called the product retraction. Some known retraction operators for S are
1 the rank-(1, ..., 1) T-HOSVD, and
2 the rank-(1, ..., 1) ST-HOSVD,
both essentially proved by Kressner, Steinlechner, Vandereycken (2014).
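For a third-order Segre manifold the rank-(1, 1, 1) truncated HOSVD amounts to computing three dominant singular vectors. A sketch (illustrative only; not the Kressner et al. or Tensorlab implementation):

function [s, u] = rank1_thosvd(X)
% Map an ambient third-order tensor X (e.g., a point A + p) to the nearby
% rank-1 tensor s * u{1} o u{2} o u{3}.
n = size(X);
u = cell(1, 3);
for k = 1:3
    Xk = reshape(permute(X, [k, setdiff(1:3, k)]), n(k), []);  % mode-k unfolding
    [Uk, ~, ~] = svd(Xk, 'econ');
    u{k} = Uk(:, 1);                   % dominant mode-k singular vector
end
s = u{1}' * reshape(reshape(X, n(1)*n(2), n(3)) * u{3}, n(1), n(2)) * u{2};
end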
66 Riemannian quasi-Newton optimization methods RGN with trust region method:
S1. Choose random initial points A_i ∈ S.
S2. Let a^{(0)} ← (A_1, ..., A_r), and set k ← 0.
S3. Choose a trust region radius Δ > 0.
S4. While not converged, do:
S4.1. Solve the trust region subproblem, resulting in p_k ∈ T_{a^{(k)}} S^{×r}.
S4.2. Compute the tentative next iterate a^{(k+1)} ← R_{a^{(k)}}(p_k) via a retraction in the direction of p_k from a^{(k)}.
S4.3. Accept or reject the next iterate. If the former, increment k.
S4.4. Update the trust region radius.
The details can be found in Breiding, V (2018).
67 Riemannian quasi-Newton optimization methods Numerical experiments We compare the RGN method of Breiding, V (2018) with some state-of-the-art nonlinear least squares solvers in Tensorlab v3.0 (Vervliet et al., 2016), namely nls_lm and nls_gndl, both with the LargeScale option turned off and on.
68 Riemannian quasi-Newton optimization methods We consider parameterized tensors in R^{n_1 × n_2 × n_3} with varying condition numbers. There are three parameters:
1 c ∈ [0, 1] regulates the collinearity of the factor matrices,
2 s regulates the scaling, and
3 r is the rank.
Typically,
1 increasing c increases the geometric condition number,
2 increasing s increases the classic condition number, and
3 increasing r decreases the probability of finding a decomposition.
See the afternotes for the precise construction.
69 Riemannian quasi-Newton optimization methods The true rank-r tensor is then

A = Σ_{i=1}^r a_i^1 ∘ a_i^2 ∘ a_i^3.

Finally, we normalize the tensor and add random Gaussian noise E ∈ R^{n_1 × n_2 × n_3} of magnitude τ:

B = A / ‖A‖_F + τ E / ‖E‖_F.

The tensor B is the one we would like to approximate by a tensor of rank r.
70 Riemannian quasi-Newton optimization methods We will choose k random starting points and then apply each of the methods to each of the starting points. The key performance criterion (on a single processor) is the expected time to success (TTS). Let
1 the probability of success be p_S,
2 the probability of failure be p_F = 1 − p_S,
3 a successful decomposition take m_S seconds, and
4 a failed decomposition take m_F seconds.
Then, the expected time to a first success is

E[TTS] = Σ_{k=0}^∞ p_F^k p_S (m_S + k m_F) = (p_S m_S + p_F m_F) / p_S.
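Given a batch of runs, the estimate is a few lines of Matlab; success (a logical vector) and t (the runtimes) are hypothetical variable names in this sketch:

pS = mean(success);  pF = 1 - pS;
mS = mean(t(success));   % average runtime of a successful run
mF = mean(t(~success));  % average runtime of a failed run
tts = (pS * mS + pF * mF) / pS;   % expected time to a first success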
71 Riemannian quasi-Newton optimization methods [Figure: speedup of RGN-HR over RGN-Reg, GNDL, and GNDL-PCG as a function of the collinearity c, for several ranks up to r = 30; noise level τ = 10^{-3}.]
72 Riemannian quasi-Newton optimization methods [Figure: the same speedup comparison at noise level τ = 10^{-5}.]
73 Riemannian quasi-Newton optimization methods [Figure: the same speedup comparison at noise level τ = 10^{-7}.]
74 Riemannian quasi-Newton optimization methods [Figure: speedup of RGN-HR over GNDL and GNDL-PCG as a function of the rank r (some runs failed or timed out); noise level τ = 10^{-5}.]
75-84 Riemannian quasi-Newton optimization methods Convergence plots [Figures: objective value versus time (s) for RGN-HR, RGN-Reg, GNDL, LM, GNDL-PCG, and LM-PCG on rank-7 models with various scalings s; noise level τ = 10^{-5}.]
85 Tensorlab: overview
86 Tensorlab Tensorlab is a package facilitating computations with both structured and unstructured tensors in Matlab. Development on Tensorlab v1.0 started in the early 2010s at KU Leuven by the research group of L. De Lathauwer. The current version, v3.0, was released in March 2016. Version 4.0 is anticipated next week!
87 Tensorlab Tensorlab focuses on interpretable models, offering algorithms for working with Tucker decompositions, tensor rank decompositions, and certain block term decompositions. In addition, it offers a flexible framework called structured data fusion in which the various decompositions can be coupled or fused while imposing additional structural constraints, such as symmetry, sparsity and nonnegativity.
88 Tensorlab - Basic operations: overview
89 Tensorlab Basic operations A dense tensor is represented as a plain Matlab array:

A = randn(3, 5, 7, 2);
A(:, :, :, :)   % prints each 3 x 5 slice ans(:,:,i,j) in turn
90 Tensorlab Basic operations The Frobenius norm of a tensor is computed as follows.

A = reshape(1:100, [5 20]);
frob(A)
ans =
   581.6786
frob(A, 'squared') - 100*101*201/6
ans =
     0
91 Tensorlab Basic operations Flattenings are computed as follows.

A = reshape(1:24, [2 3 4]);
% mode-1 flattening
tens2mat(A, [1], [2 3])
ans =
     1     3     5     7     9    11    13    15    17    19    21    23
     2     4     6     8    10    12    14    16    18    20    22    24
% mode-2 flattening
tens2mat(A, [2], [1 3])
ans =
     1     2     7     8    13    14    19    20
     3     4     9    10    15    16    21    22
     5     6    11    12    17    18    23    24
92 Tensorlab Basic operations Multilinear multiplications are computed as follows.

A = randn(5, 5, 5);
U1 = randn(5, 5);
U2 = randn(5, 5);
U3 = randn(5, 5);
T = tmprod(A, {U1, U2, U3}, 1:3);
X1 = tmprod(A, {U2, U3}, 2:3, 'H');
X2 = tmprod(A, {U2', U3'}, 2:3);
frob(X1 - X2)
ans =
     0
93 Tensorlab Basic operations Tensorlab has a nice way for visualizing third-order tensors.

U = {1.5.^(0:25).', linspace(0, 1, 50).', 2*abs(sin(linspace(0, 4*pi, 50))).'};
A = cpdgen(U);
voxel3(A)

produces the following graphic: [Figure: voxel plot of the rank-1 tensor.]
94 Tensorlab - Tucker decomposition: overview
95 Tensorlab Tucker decomposition Given a core tensor and the factor matrices, Tensorlab can generate the full tensor represented by this Tucker decomposition as follows.

S = randn(5, 5, 5);
U = {randn(4, 5), randn(6, 5), randn(7, 5)};
T1 = lmlragen(U, S);
size(T1)
ans =
     4     6     7
% Compare with definition
T2 = tmprod(S, U, 1:3);
frob(T1 - T2)
ans =
   1.435e-15
96 Tensorlab Tucker decomposition The HOSVD is called the multilinear singular value decomposition in Tensorlab and can be computed as follows.

[F, C] = lmlra_rnd([ ], [ ]);
A = lmlragen(F, C);
[U, S, sv] = mlsvd(A);   % 0.3 s
97 Tensorlab Tucker decomposition The factor-k singular values can be plotted by running

for k = 1:4, semilogy(sv{k}, 'x'), hold all, end
legend('factor 1', 'factor 2', 'factor 3', 'factor 4')

which produces the graphic: [Figure: factor-k singular value decay for k = 1, ..., 4.]
98 Tensorlab Tucker decomposition Of practical importance are truncated HOSVDs. The default strategy in Tensorlab is sequential truncation, while parallel truncation is provided as an option.

[F, C] = lmlra_rnd([ ], [ ]);
A = lmlragen(F, C) + 1e-5 * randn([ ]);
% Sequential truncation
[U, S] = mlsvd(A, [ ]);          % 6.5 s
[U, S] = mlsvd(A, 5e-12, 1:3);   % 4.6 s
% Parallel truncation
[V, T] = mlsvd(A, [ ], 0);       % ~2 s
[V, T] = mlsvd(A, 5e-12, 0);     % ~2 s
99 Tensorlab Tucker decomposition Tensorlab also offers optimization algorithms for seeking the optimal low multilinear rank approximation of a tensor. By default Tensorlab chooses an initial point by either a fast approximation to the ST-HOSVD via randomized SVDs (mlsvd_rsi), or by adaptive cross approximation (lmlra_aca). The basic usage is as follows:

[F, C] = lmlra_rnd([ ], [ ]);
A = lmlragen(F, C);
% Compute lmlra using default settings.
[U, S] = lmlra(A, [ ]);
100 Tensorlab - Tensor rank decomposition: overview
101 Tensorlab Tensor rank decomposition Tensorlab represents a CPD by a cell array containing its factor matrices. For example, random factor matrices can be generated as follows:

sizeTensor = [ ];
rnk = 5;
F = cpd_rnd(sizeTensor, rnk)
F =
    1x3 cell array of n_k x 5 factor matrices

The tensor represented by these factor matrices can be generated as follows:

A = cpdgen(F);
102 Tensorlab Tensor rank decomposition Tensorlab offers several algorithms for computing CPDs. A deterministic but unstable PBA that can be applied if the rank is smaller than two of the dimensions is cpd_gevd.

tf = cpd_rnd([ ], 5);
A = cpdgen(tf);
af = cpd_gevd(A, 5);
frobcpdres(A, af)
ans =
   1.966e-12

Warning: cpd_gevd returns random factor matrices if the assumptions of the method are not satisfied.
103 Tensorlab Tensor rank decomposition Several optimization methods are implemented in Tensorlab for approximating a tensor by a low-rank CPD. The advised way for computing a CPD is via the driver routine cpd, which automatically performs several steps:
1 optional Tucker compression,
2 choice of initialization (cpd_gevd if possible, otherwise random), and
3 optional decompression and refinement if compression was applied.
104 Tensorlab Tensor rank decomposition The basic usage is as follows:

tf = cpd_rnd([ ], 5);
A = cpdgen(tf);
af = cpd(A, 5);
frobcpdres(A, af)
ans =
   5.09e-14
105 Tensorlab Tensor rank decomposition Optionally, some algorithm options can be specified. The most important options are the following.
1 Compression: the string 'auto', the boolean false, or a function handle to the Tucker compression algorithm, for example mlsvd, lmlra_aca or mlsvd_rsi.
2 Initialization: the string 'auto', or a function handle to the initialization method that should be employed, for example cpd_rnd or cpd_gevd.
3 Algorithm: a function handle to the optimization method for computing the tensor rank decomposition, for example cpd_als, cpd_nls or cpd_minf.
4 AlgorithmOptions: a structure containing the options for the optimization method.
106 Tensorlab Tensor rank decomposition The algorithm options are important in their own right, as they determine how much computational effort is spent and how accurate the returned solution will be. The main options are the following.
1 TolAbs: the tolerance for the squared error between the tensor represented by the current factor matrices and the given tensor. If it is less than this value, the optimization method halts successfully.
2 TolFun: the tolerance for the relative change in function value. If it is less than this value, the optimization method halts successfully.
3 TolX: the tolerance for the relative change in the factor matrices. If it is less than this value, the optimization method halts successfully.
4 MaxIter: the maximum number of iterations the optimization method will perform before giving up.
107 Tensorlab Tensor rank decomposition For example,

algOptions = struct;
algOptions.TolAbs = 1e-12;
algOptions.TolFun = 1e-12;
algOptions.TolX = 1e-8;
algOptions.MaxIter = 500;
algOptions.Algorithm = @nls_gndl;
options = struct;
options.Compression = false;
options.Initialization = @cpd_rnd;
options.Algorithm = @cpd_nls;
options.AlgorithmOptions = algOptions;
108 Tensorlab Tensor rank decomposition

% Create an easy problem
tf = cpd_rnd([ ], 5);
A = cpdgen(tf);
% Solve using options
[af, out] = cpd(A, 5, options);
frobcpdres(A, af)
ans =
   1e-10
109 Tensorlab Tensor rank decomposition You can also ask for detailed information about the progression of the optimization method.

out.Algorithm
ans =
  struct with fields:
            Name: 'cpd_nls'
           alpha: [ ]
    cgiterations: [ ]
        cgrelres: [ ]
           delta: [ ]
            fval: [ ]
            info: 4   % Absolute tolerance reached
          infops: [ ]
      iterations: 5
          relerr: 5.9e-14
         relfval: [ ]
         relstep: [ ]
             rho: [ ]
110 References: overview
111 References References for sensitivity
Bürgisser, Cucker, Condition: The Geometry of Numerical Algorithms, Springer, 2013.
Blum, Cucker, Shub, Smale, Complexity and Real Computation, Springer, 1998.
Breiding, Vannieuwenhoven, The condition number of join decompositions, SIAM Journal on Matrix Analysis and Applications, 2017.
Rice, A theory of condition, SIAM Journal on Numerical Analysis, 1966.
112 References References for algorithms
Absil, Mahony, Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, 2008.
Beltrán, Breiding, Vannieuwenhoven, Pencil-based algorithms for tensor rank decomposition are unstable, arXiv preprint, 2018.
Breiding, Vannieuwenhoven, A Riemannian trust region method for the canonical tensor rank approximation problem, SIAM Journal on Optimization, 2018.
Kolda, Bader, Tensor decompositions and applications, SIAM Review, 2009.
Kressner, Steinlechner, Vandereycken, Low-rank tensor completion by Riemannian optimization, BIT, 2014.
Leurgans, Ross, Abel, A decomposition for three-way arrays, SIAM Journal on Matrix Analysis and Applications, 1993.
Nocedal, Wright, Numerical Optimization, Springer, 2006.
113 References References for algorithms
Sorber, Van Barel, De Lathauwer, Optimization-based algorithms for tensor decompositions: Canonical polyadic decomposition, decomposition in rank-(L_r, L_r, 1) terms, and a new generalization, SIAM Journal on Optimization, 2013.
Uschmajew, Local convergence of the alternating least squares algorithm for canonical tensor approximation, SIAM Journal on Matrix Analysis and Applications, 2012.
114 References References for Tensorlab
Sidiropoulos, De Lathauwer, Fu, Huang, Papalexakis, Faloutsos, Tensor decomposition for signal processing and machine learning, IEEE Transactions on Signal Processing, 2017.
Sorber, Van Barel, De Lathauwer, Optimization-based algorithms for tensor decompositions: Canonical polyadic decomposition, decomposition in rank-(L_r, L_r, 1) terms, and a new generalization, SIAM Journal on Optimization, 2013.
Vervliet, Debals, Sorber, Van Barel, De Lathauwer, Tensorlab v3.0, available at https://www.tensorlab.net, 2016.
Vervliet, De Lathauwer, Numerical optimization based algorithms for data fusion, 2018.
More informationLine Search Algorithms
Lab 1 Line Search Algorithms Investigate various Line-Search algorithms for numerical opti- Lab Objective: mization. Overview of Line Search Algorithms Imagine you are out hiking on a mountain, and you
More informationNewton-type Methods for Solving the Nonsmooth Equations with Finitely Many Maximum Functions
260 Journal of Advances in Applied Mathematics, Vol. 1, No. 4, October 2016 https://dx.doi.org/10.22606/jaam.2016.14006 Newton-type Methods for Solving the Nonsmooth Equations with Finitely Many Maximum
More informationLow Rank Approximation Lecture 7. Daniel Kressner Chair for Numerical Algorithms and HPC Institute of Mathematics, EPFL
Low Rank Approximation Lecture 7 Daniel Kressner Chair for Numerical Algorithms and HPC Institute of Mathematics, EPFL daniel.kressner@epfl.ch 1 Alternating least-squares / linear scheme General setting:
More informationPart 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)
Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective
More informationUniversity of Houston, Department of Mathematics Numerical Analysis, Fall 2005
3 Numerical Solution of Nonlinear Equations and Systems 3.1 Fixed point iteration Reamrk 3.1 Problem Given a function F : lr n lr n, compute x lr n such that ( ) F(x ) = 0. In this chapter, we consider
More informationHomework 5. Convex Optimization /36-725
Homework 5 Convex Optimization 10-725/36-725 Due Tuesday November 22 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationLecture 7 Unconstrained nonlinear programming
Lecture 7 Unconstrained nonlinear programming Weinan E 1,2 and Tiejun Li 2 1 Department of Mathematics, Princeton University, weinan@princeton.edu 2 School of Mathematical Sciences, Peking University,
More informationMatrix Assembly in FEA
Matrix Assembly in FEA 1 In Chapter 2, we spoke about how the global matrix equations are assembled in the finite element method. We now want to revisit that discussion and add some details. For example,
More informationFitting a Tensor Decomposition is a Nonlinear Optimization Problem
Fitting a Tensor Decomposition is a Nonlinear Optimization Problem Evrim Acar, Daniel M. Dunlavy, and Tamara G. Kolda* Sandia National Laboratories Sandia is a multiprogram laboratory operated by Sandia
More informationCHAPTER 10: Numerical Methods for DAEs
CHAPTER 10: Numerical Methods for DAEs Numerical approaches for the solution of DAEs divide roughly into two classes: 1. direct discretization 2. reformulation (index reduction) plus discretization Direct
More informationNumerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09
Numerical Optimization 1 Working Horse in Computer Vision Variational Methods Shape Analysis Machine Learning Markov Random Fields Geometry Common denominator: optimization problems 2 Overview of Methods
More informationAlgunos ejemplos de éxito en matemáticas computacionales
Algunos ejemplos de éxito en matemáticas computacionales Carlos Beltrán Coloquio UC3M Leganés 4 de febrero de 2015 Solving linear systems Ax = b where A is n n, det(a) 0 Input: A M n (C), det(a) 0. b C
More informationChapter 8 Structured Low Rank Approximation
Chapter 8 Structured Low Rank Approximation Overview Low Rank Toeplitz Approximation Low Rank Circulant Approximation Low Rank Covariance Approximation Eculidean Distance Matrix Approximation Approximate
More informationStochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions
International Journal of Control Vol. 00, No. 00, January 2007, 1 10 Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions I-JENG WANG and JAMES C.
More informationNonnegative Tensor Factorization using a proximal algorithm: application to 3D fluorescence spectroscopy
Nonnegative Tensor Factorization using a proximal algorithm: application to 3D fluorescence spectroscopy Caroline Chaux Joint work with X. Vu, N. Thirion-Moreau and S. Maire (LSIS, Toulon) Aix-Marseille
More informationScientific Computing: Dense Linear Systems
Scientific Computing: Dense Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course MATH-GA.2043 or CSCI-GA.2112, Spring 2012 February 9th, 2012 A. Donev (Courant Institute)
More informationQuasi-Newton Methods
Newton s Method Pros and Cons Quasi-Newton Methods MA 348 Kurt Bryan Newton s method has some very nice properties: It s extremely fast, at least once it gets near the minimum, and with the simple modifications
More informationComplexity analysis of second-order algorithms based on line search for smooth nonconvex optimization
Complexity analysis of second-order algorithms based on line search for smooth nonconvex optimization Clément Royer - University of Wisconsin-Madison Joint work with Stephen J. Wright MOPTA, Bethlehem,
More informationMax-Planck-Institut für Mathematik in den Naturwissenschaften Leipzig
Max-Planck-Institut für Mathematik in den Naturwissenschaften Leipzig Towards a condition number theorem for the tensor rank decomposition by Paul Breiding and Nick Vannieuwenhoven Preprint no.: 3 208
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Numerical Linear Algebra Background Cho-Jui Hsieh UC Davis May 15, 2018 Linear Algebra Background Vectors A vector has a direction and a magnitude
More informationKey words. conjugate gradients, normwise backward error, incremental norm estimation.
Proceedings of ALGORITMY 2016 pp. 323 332 ON ERROR ESTIMATION IN THE CONJUGATE GRADIENT METHOD: NORMWISE BACKWARD ERROR PETR TICHÝ Abstract. Using an idea of Duff and Vömel [BIT, 42 (2002), pp. 300 322
More information