Algorithms for tensor approximations
1 Algorithms for tensor approximations
Lek-Heng Lim
MSRI Summer Graduate Workshop
July 7–18, 2008
2 Synopsis
Naïve: the Gauss–Seidel heuristic.
Harmonic analysis: pursuit algorithms.
Real algebraic geometry: semidefinite programming.
Riemannian geometry: Grassmann–Newton method.
3 Recap: best low rank approximation of a hypermatrix
Outer product rank: $A \in \mathbb{R}^{l \times m \times n}$. Want unit vectors $u_i \in \mathbb{R}^l$, $v_i \in \mathbb{R}^m$, $w_i \in \mathbb{R}^n$ and $\sigma_i \in \mathbb{R}$ that minimize
$$\Bigl\lVert A - \sum_{i=1}^r \sigma_i\, u_i \otimes v_i \otimes w_i \Bigr\rVert.$$
Symmetric outer product rank: $A \in S^3(\mathbb{R}^n)$. Want unit vectors $v_i$ and $\lambda_i \in \mathbb{R}$ that minimize
$$\Bigl\lVert A - \sum_{i=1}^r \lambda_i\, v_i \otimes v_i \otimes v_i \Bigr\rVert.$$
Nonnegative outer product rank: $A \in \mathbb{R}^{l \times m \times n}_+$. Want unit vectors $x_i \in \mathbb{R}^l_+$, $y_i \in \mathbb{R}^m_+$, $z_i \in \mathbb{R}^n_+$ and $\delta_i \in \mathbb{R}_+$ that minimize
$$\Bigl\lVert A - \sum_{i=1}^r \delta_i\, x_i \otimes y_i \otimes z_i \Bigr\rVert.$$
4 Recap: best low rank approximation of a hypermatrix
Multilinear rank: $A \in \mathbb{R}^{l \times m \times n}$. Want matrices $U \in \mathbb{R}^{l \times r_1}$, $V \in \mathbb{R}^{m \times r_2}$, $W \in \mathbb{R}^{n \times r_3}$ with orthonormal columns and a core $C \in \mathbb{R}^{r_1 \times r_2 \times r_3}$ that minimize
$$\lVert A - (U, V, W) \cdot C \rVert.$$
Hybrid: $A \in \mathbb{R}^{l \times m \times n}$. Want $B_1, \dots, B_r \in \mathbb{R}^{l \times m \times n}$ with $\operatorname{rank}(B_i) \le (r_1, r_2, r_3)$ and $\lVert B_i \rVert = 1$ that minimize
$$\Bigl\lVert A - \sum_{i=1}^r \sigma_i B_i \Bigr\rVert.$$
5 Gauss–Seidel method
The optimal solution $B$ to $\operatorname{argmin}_{\operatorname{rank}(B) \le r} \lVert A - B \rVert_F$ is not easy to compute since the objective function is non-convex. A widely used strategy is a nonlinear Gauss–Seidel algorithm, better known as the Alternating Least Squares (ALS) algorithm:

Algorithm: ALS for optimal rank-$r$ approximation
initialize $X^{(0)} \in \mathbb{R}^{l \times r}$, $Y^{(0)} \in \mathbb{R}^{m \times r}$, $Z^{(0)} \in \mathbb{R}^{n \times r}$; initialize $\rho^{(0)}$, $\varepsilon > 0$, $k = 0$;
while the relative decrease $(\rho^{(k)} - \rho^{(k+1)})/\rho^{(k)} > \varepsilon$:
  $X^{(k+1)} \leftarrow \operatorname{argmin}_{X \in \mathbb{R}^{l \times r}} \bigl\lVert A - \sum_{\alpha=1}^r x_\alpha \otimes y^{(k)}_\alpha \otimes z^{(k)}_\alpha \bigr\rVert_F^2$;
  $Y^{(k+1)} \leftarrow \operatorname{argmin}_{Y \in \mathbb{R}^{m \times r}} \bigl\lVert A - \sum_{\alpha=1}^r x^{(k+1)}_\alpha \otimes y_\alpha \otimes z^{(k)}_\alpha \bigr\rVert_F^2$;
  $Z^{(k+1)} \leftarrow \operatorname{argmin}_{Z \in \mathbb{R}^{n \times r}} \bigl\lVert A - \sum_{\alpha=1}^r x^{(k+1)}_\alpha \otimes y^{(k+1)}_\alpha \otimes z_\alpha \bigr\rVert_F^2$;
  $\rho^{(k+1)} \leftarrow \bigl\lVert A - \sum_{\alpha=1}^r x^{(k+1)}_\alpha \otimes y^{(k+1)}_\alpha \otimes z^{(k+1)}_\alpha \bigr\rVert_F^2$;
  $k \leftarrow k + 1$;

Coordinate cycling heuristic. May not converge.
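Each ALS update above is a linear least-squares problem in an unfolding of $A$. The following is a minimal numpy sketch of one ALS sweep, assuming the tensor is a numpy array; the helper `khatri_rao` and the variable names are illustrative, not from the slides.

```python
import numpy as np

def khatri_rao(U, V):
    # columnwise Kronecker product: column a is kron(U[:, a], V[:, a])
    return np.einsum('ir,jr->ijr', U, V).reshape(-1, U.shape[1])

def als_sweep(A, X, Y, Z):
    """One sweep of ALS for a rank-r approximation of A in R^{l x m x n}.

    Each factor update is the least-squares solution against the
    corresponding unfolding of A; a sketch, with no normalization
    or convergence safeguards.
    """
    l, m, n = A.shape
    A1 = A.reshape(l, m * n)                     # mode-1 unfolding
    A2 = A.transpose(1, 0, 2).reshape(m, l * n)  # mode-2 unfolding
    A3 = A.transpose(2, 0, 1).reshape(n, l * m)  # mode-3 unfolding
    X = np.linalg.lstsq(khatri_rao(Y, Z), A1.T, rcond=None)[0].T
    Y = np.linalg.lstsq(khatri_rao(X, Z), A2.T, rcond=None)[0].T
    Z = np.linalg.lstsq(khatri_rao(X, Y), A3.T, rcond=None)[0].T
    residual = np.linalg.norm(A1 - X @ khatri_rao(Y, Z).T) ** 2
    return X, Y, Z, residual
```

Iterating `als_sweep` until the residual stops decreasing gives the coordinate-cycling heuristic of the slide.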
6 Best $r$-term approximation
$$f \approx \alpha_1 f_1 + \alpha_2 f_2 + \cdots + \alpha_r f_r.$$
Target function $f \in H$ (vector space, cone, etc.); $f_1, \dots, f_r \in D \subseteq H$ a dictionary; $\alpha_1, \dots, \alpha_r \in \mathbb{R}$ or $\mathbb{C}$ (linear), $\mathbb{R}_+$ (convex), $\mathbb{R} \cup \{-\infty\}$ (tropical).
With respect to $\varphi : H \times H \to \mathbb{R}$, some measure of nearness between pairs of points (e.g. norms, metrics, volumes, expectation, entropy, Bregman divergences, etc.), want
$$\operatorname{argmin}\{\varphi(f, \alpha_1 f_1 + \cdots + \alpha_r f_r) \mid f_i \in D\}.$$
For concreteness: $H$ is a separable Hilbert space; the measure of nearness is a norm, but not necessarily the one induced by its inner product.
Reference: various papers by A. Cohen, R. DeVore, V. Temlyakov.
7 Recap: dictionaries
Discrete cosine: $D = \bigl\{\sqrt{2/N}\,\cos\bigl(\bigl(k+\tfrac12\bigr)\bigl(n+\tfrac12\bigr)\tfrac{\pi}{N}\bigr) \mid k \in [N-1]\bigr\} \subseteq \mathbb{C}^N$.
Taylor: $D = \{x^n \mid n \in \mathbb{N} \cup \{0\}\} \subseteq C^\omega(\mathbb{R})$.
Fourier: $D = \{\cos(nx), \sin(nx) \mid n \in \mathbb{Z}\} \subseteq L^2(-\pi, \pi)$.
Peter–Weyl: $D = \{\langle \pi(x) e_i, e_j \rangle \mid \pi \in \hat{G},\ i, j \in [d_\pi]\} \subseteq L^2(G)$.
8 Recap: dictionaries
Paley–Wiener: $D = \{\operatorname{sinc}(x - n) \mid n \in \mathbb{Z}\} \subseteq H^2(\mathbb{R})$.
Gabor: $D = \{e^{i\alpha n x} e^{-(x - m\beta)^2/2} \mid (m, n) \in \mathbb{Z} \times \mathbb{Z}\} \subseteq L^2(\mathbb{R})$.
Wavelet: $D = \{2^{n/2} \psi(2^n x - m) \mid (m, n) \in \mathbb{Z} \times \mathbb{Z}\} \subseteq L^2(\mathbb{R})$.
Friends of wavelets: $D \subseteq L^2(\mathbb{R}^2)$: beamlets, brushlets, curvelets, ridgelets, wedgelets, multiwavelets.
9 Approximants
Definition. Let $D \subseteq H$ be a dictionary. For $r \in \mathbb{N}$, the set of $r$-term approximants is
$$\Sigma_r(D) := \Bigl\{ \sum_{i=1}^r \alpha_i f_i \in H \;\Big|\; \alpha_i \in \mathbb{C},\ f_i \in D \Bigr\}.$$
Let $f \in H$. The error of $r$-term approximation is $\sigma_r(f) := \inf_{g \in \Sigma_r(D)} \lVert f - g \rVert$.
A linear combination of two $r$-term approximants may have more than $r$ non-zero terms, so $\Sigma_r(D)$ is not a subspace of $H$; hence the name nonlinear approximation, in contrast with the usual (linear) approximation $\inf_{g \in \operatorname{span}(D)} \lVert f - g \rVert$.
10 Small is beautiful
$$f \approx \sum_{f_i \in I \subseteq D} \alpha_i f_i.$$
Want a good approximation, i.e. $\lVert f - \sum_{f_i \in I} \alpha_i f_i \rVert$ small. Want a sparse/concentrated representation, i.e. $|I|$ small.
Sparsity depends on the choice of $D$. With $D_{10} = \{10^n \mid n \in \mathbb{Z}\}$ and $D_3 = \{3^n \mid n \in \mathbb{Z}\} \subseteq \mathbb{R}$:
$$\tfrac13 = [0.333\ldots]_{10} = \sum_{n=1}^\infty 3 \times 10^{-n} = [0.1]_3.$$
With $D_{\mathrm{fourier}} = \{\cos(nx), \sin(nx) \mid n \in \mathbb{Z}\}$ and $D_{\mathrm{taylor}} = \{x^n \mid n \in \mathbb{N} \cup \{0\}\}$:
$$\tfrac{x}{2} = \sin(x) - \tfrac12 \sin(2x) + \tfrac13 \sin(3x) - \cdots, \qquad \sin(x) = x - \tfrac16 x^3 + \tfrac{1}{120} x^5 - \cdots.$$
11 Bigger is better
Union of dictionaries: allows for efficient (sparse) representation of different features:
$D = D_{\mathrm{fourier}} \cup D_{\mathrm{wavelets}}$, $D = D_{\mathrm{spikes}} \cup D_{\mathrm{sinusoids}} \cup D_{\mathrm{splines}}$, $D = D_{\mathrm{wavelets}} \cup D_{\mathrm{curvelets}} \cup D_{\mathrm{beamlets}} \cup D_{\mathrm{ridgelets}}$.
Such a $D$ is called an overcomplete or redundant dictionary.
Trade-off: computational complexity. Rule of thumb: the larger and more diverse the dictionary, the more efficient/sparser the representation.
Observation: the dictionaries above are all zero-dimensional (at most countably infinite).
Question: what about dictionaries that are continuously varying families of functions?
12 Dictionaries of positive dimension
Neural networks: $D = \{\sigma(w^\top x + w_0) \in L^2(\mathbb{R}^n) \mid (w_0, w) \in \mathbb{R} \times \mathbb{R}^n\}$, where $\sigma : \mathbb{R} \to \mathbb{R}$ is a sigmoid function, e.g. $\sigma(x) = [1 + \exp(-x)]^{-1}$.
Exponential: $D = \{e^{-tx} \mid t \in \mathbb{R}_+\}$ or $D = \{e^{-\tau x} \mid \tau \in \mathbb{C}\}$.
Separable: $D = \{g \in L^2(\mathbb{R}^3) \mid g(x, y, z) = \vartheta(x)\varphi(y)\psi(z)\}$, where $\vartheta, \varphi, \psi : \mathbb{R} \to \mathbb{R}$.
Symmetric separable: $D = \{g \in L^2(\mathbb{R}^3) \mid g(x, y, z) = \varphi(x)\varphi(y)\varphi(z)\}$, where $\varphi : \mathbb{R} \to \mathbb{R}$.
13 Same thing, different names
The $r$th secant (quasiprojective) variety of the Segre variety is the set of $r$-term approximants: if $D = \operatorname{Seg}(\mathbb{R}^l, \mathbb{R}^m, \mathbb{R}^n)$, then $\Sigma_r(D) = \{A \in \mathbb{R}^{l \times m \times n} \mid \operatorname{rank}(A) \le r\}$.
Outer product decomposition: $D = \{u \otimes v \otimes w \mid (u, v, w) \in \mathbb{R}^l \times \mathbb{R}^m \times \mathbb{R}^n\} = \{A \in \mathbb{R}^{l \times m \times n} \mid \operatorname{rank}(A) \le 1\}$.
Symmetric outer product decomposition: $D = \{v \otimes v \otimes v \mid v \in \mathbb{R}^n\} = \{A \in S^3(\mathbb{R}^n) \mid \operatorname{rank}_S(A) \le 1\}$.
Nonnegative outer product decomposition: $D = \{x \otimes y \otimes z \mid (x, y, z) \in \mathbb{R}^l_+ \times \mathbb{R}^m_+ \times \mathbb{R}^n_+\} = \{A \in \mathbb{R}^{l \times m \times n}_+ \mid \operatorname{rank}_+(A) \le 1\}$.
14 Pursuit algorithms
Stepwise projection: $g_k = \operatorname{argmin}_{g \in D} \{\lVert f - h \rVert \mid h \in \operatorname{span}\{g_1, \dots, g_{k-1}, g\}\}$, then $f_k = \operatorname{proj}_{\operatorname{span}\{g_1, \dots, g_k\}}(f)$.
Orthogonal matching pursuit: $g_k = \operatorname{argmax}_{g \in D} \lvert\langle f - f_{k-1}, g \rangle\rvert$, then $f_k = \operatorname{proj}_{\operatorname{span}\{g_1, \dots, g_k\}}(f)$.
Pure greedy: $g_k = \operatorname{argmax}_{g \in D} \lvert\langle f - f_{k-1}, g \rangle\rvert$, then $f_k = f_{k-1} + \langle f - f_{k-1}, g_k \rangle g_k$.
Relaxed greedy: $g_k = \operatorname{argmin}_{g \in D} \{\lVert f - h \rVert \mid h \in \operatorname{span}\{f_{k-1}, g\}\}$, then $f_k = \alpha_k f_{k-1} + \beta_k g_k$.
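For a finite dictionary stored as the columns of a matrix, the pure greedy method (matching pursuit) takes only a few lines of numpy. A sketch, assuming the atoms are unit-norm; names are my own.

```python
import numpy as np

def matching_pursuit(f, D, r):
    """Pure greedy pursuit: select r atoms from the columns of D
    (assumed unit-norm) to approximate the vector f."""
    residual = f.copy()
    approx = np.zeros_like(f)
    for _ in range(r):
        scores = D.T @ residual          # <f - f_{k-1}, g> for every atom g
        k = np.argmax(np.abs(scores))    # best-correlated atom
        approx += scores[k] * D[:, k]
        residual = f - approx
    return approx
```

Replacing the coefficient update with a least-squares fit over all atoms selected so far turns this into orthogonal matching pursuit.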
15 Recap: hypermatrices are functions on finite sets
Totally ordered finite sets: $[n] = \{1 < 2 < \cdots < n\}$, $n \in \mathbb{N}$.
A hypermatrix (order 3) is a function $f : [l] \times [m] \times [n] \to \mathbb{R}$. If $f(i, j, k) = a_{ijk}$, then $f$ is represented by $A = [a_{ijk}]_{i,j,k=1}^{l,m,n} \in \mathbb{R}^{l \times m \times n}$.
$\ell^2([l] \times [m] \times [n]) = \ell^2([l]) \otimes \ell^2([m]) \otimes \ell^2([n])$: for $A, B \in \mathbb{R}^{l \times m \times n}$,
$$\langle A, B \rangle = \sum_{i,j,k=1}^{l,m,n} a_{ijk} b_{ijk}, \qquad \lVert A \rVert_F^2 = \sum_{i,j,k=1}^{l,m,n} a_{ijk}^2.$$
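In numpy the inner product and Frobenius norm above are elementwise sums; a one-off illustration with an arbitrary shape (not from the slides):

```python
import numpy as np

A = np.random.rand(3, 4, 5)
B = np.random.rand(3, 4, 5)
inner = np.sum(A * B)     # <A, B> = sum_{ijk} a_ijk * b_ijk
fro_sq = np.sum(A ** 2)   # ||A||_F^2
```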
16 Pursuit algorithms for tensor approximations
Tensor approximation. Target function $f : [l] \times [m] \times [n] \to \mathbb{R}$. Dictionary of separable functions,
$$D = \{g : [l] \times [m] \times [n] \to \mathbb{R} \mid g(i, j, k) = \vartheta(i)\varphi(j)\psi(k)\},$$
where $\vartheta : [l] \to \mathbb{R}$, $\varphi : [m] \to \mathbb{R}$, $\psi : [n] \to \mathbb{R}$.
Symmetric tensor approximation. Target function $f : [n] \times [n] \times [n] \to \mathbb{R}$ with $f(i, j, k) = f(j, i, k) = \cdots = f(k, j, i)$. Dictionary of symmetric separable functions,
$$D_S = \{g : [n] \times [n] \times [n] \to \mathbb{R} \mid g(i, j, k) = \vartheta(i)\vartheta(j)\vartheta(k)\},$$
where $\vartheta : [n] \to \mathbb{R}$.
17 Pursuit algorithms for tensor approximations
Nonnegative tensor approximation. Target function $f : [l] \times [m] \times [n] \to \mathbb{R}_+$. Dictionary of nonnegative separable functions,
$$D_+ = \{g : [l] \times [m] \times [n] \to \mathbb{R}_+ \mid g(i, j, k) = \vartheta(i)\varphi(j)\psi(k)\},$$
where $\vartheta : [l] \to \mathbb{R}_+$, $\varphi : [m] \to \mathbb{R}_+$, $\psi : [n] \to \mathbb{R}_+$.
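With the separable dictionary, each greedy step of slide 14 must select a (near-)best rank-1 tensor from the residual; one common way to realize the selection is a few alternating power-type iterations. A numpy sketch under that assumption, with no claim about convergence:

```python
import numpy as np

def best_rank_one(R, iters=20):
    """Approximate the dominant rank-1 term of a tensor R by
    alternating (higher-order power-type) updates."""
    l, m, n = R.shape
    x, y, z = np.ones(l), np.ones(m), np.ones(n)
    for _ in range(iters):
        x = np.einsum('ijk,j,k->i', R, y, z); x /= np.linalg.norm(x)
        y = np.einsum('ijk,i,k->j', R, x, z); y /= np.linalg.norm(y)
        z = np.einsum('ijk,i,j->k', R, x, y); z /= np.linalg.norm(z)
    sigma = np.einsum('ijk,i,j,k->', R, x, y, z)
    return sigma, x, y, z

def greedy_tensor_pursuit(A, r):
    """Pure greedy pursuit over the separable dictionary: peel off
    r approximate rank-1 terms from the residual."""
    R, terms = A.copy(), []
    for _ in range(r):
        sigma, x, y, z = best_rank_one(R)
        R = R - sigma * np.einsum('i,j,k->ijk', x, y, z)
        terms.append((sigma, x, y, z))
    return terms, R
```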
18 Some history
Let $f$ be a polynomial in the variables $x = (x_1, \dots, x_N)$. Suppose $f : \mathbb{R}^N \to \mathbb{R}$ is non-negative valued, i.e. $f(x) \ge 0$ for all $x \in \mathbb{R}^N$.
Question: can we write $f$ as a sum of squares of polynomials, $f(x) = \sum_{j=1}^M p_j(x)^2$?
Answer (Hilbert): not in general, e.g.
$$f(w, x, y, z) = w^4 + x^2 y^2 + y^2 z^2 + z^2 x^2 - 4xyzw$$
(non-negative by the AM–GM inequality applied to the first four terms, yet not a sum of squares of polynomials).
Hilbert's 17th problem: can we write $f$ as a sum of squares of rational functions,
$$f(x) = \sum_{j=1}^M \Bigl(\frac{p_j(x)}{q_j(x)}\Bigr)^2?$$
Answer (Artin): yes!
19 SDP-based algorithms
Observation 1:
$$F(x_{11}, \dots, z_{nr}) = \Bigl\lVert A - \sum_{\alpha=1}^r x_\alpha \otimes y_\alpha \otimes z_\alpha \Bigr\rVert_F^2 = \sum_{i,j,k=1}^{l,m,n} \Bigl(a_{ijk} - \sum_{\alpha=1}^r x_{i\alpha} y_{j\alpha} z_{k\alpha}\Bigr)^2$$
is a polynomial of total degree 6 (resp. $2k$ for order-$k$ tensors) in the variables $x_{11}, \dots, z_{nr}$.
Multivariate polynomial optimization: the non-convex problem $\operatorname{argmin} F(x_{11}, \dots, z_{nr})$ may be relaxed to a convex problem (so a global optimum is guaranteed), which can in turn be solved using semidefinite programming (SDP). [Lasserre; 2001], [Parrilo; 2003], [Parrilo, Sturmfels; 2003].
20 How it works
Observation 2: if $F - \lambda$ can be expressed as a sum of squares of polynomials,
$$F(x_{11}, \dots, z_{nr}) - \lambda = \sum_{i=1}^N P_i(x_{11}, \dots, z_{nr})^2,$$
then $\lambda$ is a global lower bound for $F$, i.e.
$$F(x_{11}, \dots, z_{nr}) \ge \lambda$$
for all $x_{11}, \dots, z_{nr} \in \mathbb{R}$.
Simple strategy: find the largest $\lambda$ such that $F - \lambda$ is a sum of squares. Then $\lambda$ is often $\min F(x_{11}, \dots, z_{nr})$.
21 Sketch
Write $v = (1, x_{11}, \dots, z_{nr}, \dots, x_{l1} y_{m1} z_{n1}, \dots, z_{nr}^3)$, the $D$-tuple of monomials of total degree at most 3 (so that the entries of $vv^\top$ run over monomials of total degree up to 6), where
$$D := \binom{r(l + m + n) + 3}{3}.$$
Write $F(x_{11}, \dots, z_{nr}) = \alpha^\top u$, where $u$ is the tuple of monomials of total degree at most 6 and $\alpha = (\alpha_1, \alpha_2, \dots)$ collects the coefficients of the respective monomials, $\alpha_1$ being the constant term.
Since $\deg(F)$ is even, $F$ may also be written as
$$F(x_{11}, \dots, z_{nr}) = v^\top M v$$
for some symmetric $M \in \mathbb{R}^{D \times D}$. So
$$F(x_{11}, \dots, z_{nr}) - \lambda = v^\top (M - \lambda E_{11}) v,$$
where $E_{11} = e_1 e_1^\top \in \mathbb{R}^{D \times D}$.
22 Sketch
Observation 3: the right-hand side is a sum of squares iff $M - \lambda E_{11}$ is positive semidefinite (since then $M - \lambda E_{11} = B^\top B$ and $v^\top B^\top B v = \lVert Bv \rVert^2$). Hence we have
$$\text{maximize} \quad \lambda \qquad \text{subject to} \quad v^\top (S + \lambda E_{11}) v = F, \quad S \succeq 0.$$
This is an SDP problem:
$$\text{minimize} \quad \langle 0, S \rangle - \lambda \qquad \text{subject to} \quad \langle S, B_1 \rangle + \lambda = \alpha_1, \quad \langle S, B_k \rangle = \alpha_k, \ k = 2, 3, \dots, \quad S \succeq 0, \ \lambda \in \mathbb{R},$$
with one linear constraint per monomial of $u$.
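To make the Gram-matrix construction concrete, here is a minimal single-variable instance using cvxpy (cvxpy and the example polynomial are my additions, not from the slides). We bound $F(x) = x^4 - 3x^2 + 2$ from below by the largest $\lambda$ for which $F - \lambda$ is a sum of squares in the monomial vector $v = (1, x, x^2)$.

```python
import cvxpy as cp

# F(x) = x^4 - 3x^2 + 2; c[k] is the coefficient of x^k
c = [2.0, 0.0, -3.0, 0.0, 1.0]

Q = cp.Variable((3, 3), PSD=True)   # Gram matrix S in v = (1, x, x^2)
lam = cp.Variable()

constraints = [                      # match coefficients of v^T Q v to F - lam
    Q[0, 0] == c[0] - lam,           # constant term
    2 * Q[0, 1] == c[1],             # x
    2 * Q[0, 2] + Q[1, 1] == c[2],   # x^2
    2 * Q[1, 2] == c[3],             # x^3
    Q[2, 2] == c[4],                 # x^4
]
cp.Problem(cp.Maximize(lam), constraints).solve()  # needs an SDP solver, e.g. SCS
print(lam.value)  # about -0.25 = min F, attained at x^2 = 3/2
```

For univariate polynomials nonnegativity and SOS coincide, so here the bound is exact.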
23 Properties
May be solved in polynomial time. Like all SDP-based algorithms, duality produces a certificate that tells us whether we have arrived at a globally optimal solution: the duality gap, i.e. the difference between the values of the primal and dual objective functions, is 0 at a global minimum.
Complexity: for rank-$r$ approximations to order-$k$ tensors $A \in \mathbb{R}^{d_1 \times \cdots \times d_k}$,
$$D = \binom{r(d_1 + \cdots + d_k) + k}{k}$$
is large even for moderate $d_i$, $r$ and $k$.
Sparsity to the rescue: the polynomials we are interested in are always sparse (e.g. for $k = 3$, only terms of the form $xyz$, $x^2y^2z^2$, or $uvwxyz$ appear).
24 Newton polytope
The Newton polytope of a polynomial $f$ is the convex hull of the exponent vectors of the monomials in $f$.
Example. A polynomial $f(x, y) = 3.67x^4y^{10} + \cdots$ whose monomials are $x^4y^{10}$, $x^3y^3$, $x^3$, $x^2$, and $1$ has Newton polytope the convex hull of the points $(4, 10), (3, 3), (3, 0), (2, 0), (0, 0)$ in $\mathbb{R}^2$. A polynomial $g(x, y, z) = 1.7x^4y^6z^2 + \cdots$ whose monomials are $x^4y^6z^2$, $x^3z^5$, $y^4$, and $yz^2$ has Newton polytope the convex hull of the points $(4, 6, 2), (3, 0, 5), (0, 4, 0), (0, 1, 2)$ in $\mathbb{R}^3$.
Theorem (Reznick). If $f(x) = \sum_{i=1}^m p_i(x)^2$, then the exponent vectors of the monomials in each $p_i$ must lie in $\tfrac12 \operatorname{Newton}(f)$.
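Computing a Newton polytope from the exponent vectors is just a convex-hull computation; a quick illustration with scipy (not part of the slides), using the two-variable example above:

```python
import numpy as np
from scipy.spatial import ConvexHull

# exponent vectors of the monomials of f(x, y) in the example
exponents = np.array([(4, 10), (3, 3), (3, 0), (2, 0), (0, 0)])
hull = ConvexHull(exponents)
print(exponents[hull.vertices])  # vertices of Newton(f); here only
                                 # (0,0), (3,0), (4,10) are extreme points
```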
25 Multilinear polynomial
The Newton polytope of a polynomial of the form
$$f(x_{11}, \dots, z_{nr}) = \lambda + \sum_{i,j,k=1}^{l,m,n} \Bigl(a_{ijk} - \sum_{\alpha=1}^r x_{i\alpha} y_{j\alpha} z_{k\alpha}\Bigr)^2$$
is spanned by $1$ and the monomials of the form $x_{i\alpha}^2 y_{j\alpha}^2 z_{k\alpha}^2$ (the monomials of the form $x_{i\alpha} y_{j\alpha} z_{k\alpha}$ and $x_{i\alpha} y_{j\alpha} z_{k\alpha} \cdot x_{i\beta} y_{j\beta} z_{k\beta}$ lie inside the polytope and may all be dropped as spanning points).
So if $f(x_{11}, \dots, z_{nr}) = \sum_{j=1}^N p_j(x_{11}, \dots, z_{nr})^2$, then only $1$ and monomials of the form $x_{i\alpha} y_{j\alpha} z_{k\alpha}$ may occur in $p_1, \dots, p_N$.
In other words, we have reduced the size of the problem from $\binom{r(l+m+n)+3}{3}$ to $rlmn + 1$.
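For a sense of scale, the reduction is dramatic even for small sizes (illustrative numbers, not from the slides):

```python
from math import comb

l, m, n, r = 10, 10, 10, 5
N = r * (l + m + n)            # number of variables x_11, ..., z_nr
full = comb(N + 3, 3)          # all monomials of total degree <= 3
reduced = r * l * m * n + 1    # just 1 and the monomials x_ia*y_ja*z_ka
print(full, reduced)           # 585276 vs 5001
```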
26 Global convergence
If polynomials of the form
$$\lambda + \sum_{i,j,k=1}^{l,m,n} \Bigl(a_{ijk} - \sum_{\alpha=1}^r x_{i\alpha} y_{j\alpha} z_{k\alpha}\Bigr)^2$$
can always be written as a sum of squares of polynomials (we don't know whether they can), then the SDP algorithm for optimal low-rank tensor approximation will always converge globally.
Numerical experiments performed by Parrilo on general polynomials yield $\lambda = \min F$ in all cases.
27 Best multilinear rank approximation
Given $A \in \mathbb{R}^{l \times m \times n}$, want $\operatorname{rank}(B) = (r_1, r_2, r_3)$ with
$$\min \lVert A - B \rVert_F = \min \lVert A - (X, Y, Z) \cdot C \rVert_F,$$
where $C \in \mathbb{R}^{r_1 \times r_2 \times r_3}$ and $X \in \mathbb{R}^{l \times r_1}$, $Y \in \mathbb{R}^{m \times r_2}$, $Z \in \mathbb{R}^{n \times r_3}$ have orthonormal columns.
The problem is overparameterized and equivalent to
$$\max \lVert (X^\top, Y^\top, Z^\top) \cdot A \rVert_F \quad \text{subject to} \quad X^\top X = I, \ Y^\top Y = I, \ Z^\top Z = I.$$
The problem is defined on a product of Grassmann manifolds, since
$$\lVert (X^\top, Y^\top, Z^\top) \cdot A \rVert_F = \lVert ((XQ_1)^\top, (YQ_2)^\top, (ZQ_3)^\top) \cdot A \rVert_F$$
for any $(Q_1, Q_2, Q_3) \in O(r_1) \times O(r_2) \times O(r_3)$: only the subspaces spanned by the columns of $X$, $Y$, $Z$ matter. The problem is thus reformulated as
$$\max_{(X, Y, Z) \in \operatorname{Gr}(l, r_1) \times \operatorname{Gr}(m, r_2) \times \operatorname{Gr}(n, r_3)} \Phi(X, Y, Z).$$
28 Newton and quasi-Newton algorithms on manifolds
$T_X$: tangent space at $X \in \operatorname{Gr}(n, r)$, represented as $T_X = \{\Delta \in \mathbb{R}^{n \times r} \mid \Delta^\top X = 0\}$.
1. Compute the Grassmann gradient $\nabla\Phi \in T_{(X,Y,Z)}$.
2. Compute the Hessian, or update a Hessian approximation $H : T_{(X,Y,Z)} \to T_{(X,Y,Z)}$.
3. At $(X, Y, Z) \in \operatorname{Gr}(l, r_1) \times \operatorname{Gr}(m, r_2) \times \operatorname{Gr}(n, r_3)$, solve $H\Delta = -\nabla\Phi$ for the search direction $\Delta$.
4. Update the iterate $(X, Y, Z)$: move along the geodesic from $(X, Y, Z)$ in the direction given by $\Delta$.
Optimize over a product of three (or more) Grassmannians. [Gabay; 1982], [Edelman, Arias, Smith; 1999], [Eldén, Savas; 2008].
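As a concrete first-order instance of step 1 and the update, here is a projected-gradient sketch for $\Phi(X, Y, Z) = \lVert (X^\top, Y^\top, Z^\top) \cdot A \rVert_F^2$ in numpy. It uses a QR retraction in place of the exact geodesic, so it is a simplification of the Newton scheme above, and all names are illustrative.

```python
import numpy as np

def grassmann_gradient_step(A, X, Y, Z, step=1e-2):
    """One Riemannian gradient-ascent step in X for
    Phi(X, Y, Z) = ||(X^T, Y^T, Z^T) . A||_F^2 (sketch only)."""
    # core C = A x1 X^T x2 Y^T x3 Z^T
    C = np.einsum('ijk,ia,jb,kc->abc', A, X, Y, Z)
    # Euclidean gradient w.r.t. X
    G = 2 * np.einsum('ijk,jb,kc,abc->ia', A, Y, Z, C)
    G = G - X @ (X.T @ G)              # project onto the tangent space T_X
    Q, _ = np.linalg.qr(X + step * G)  # retract back to orthonormal columns
    return Q
```

Cycling the analogous step over $X$, $Y$, $Z$ gives a simple ascent method on the product of Grassmannians.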
29 Picture
[Figure not reproduced in the transcription.]
30 Quasi-Newton and BFGS update
The BFGS update:
$$H_{k+1} = H_k - \frac{H_k s_k s_k^\top H_k}{s_k^\top H_k s_k} + \frac{y_k y_k^\top}{y_k^\top s_k},$$
where $s_k = x_{k+1} - x_k = t_k p_k$ and $y_k = \nabla f_{k+1} - \nabla f_k$.
On the Grassmann manifold these vectors are defined at different points, belonging to different tangent spaces.
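In Euclidean coordinates the update is a one-liner; a sketch (the Grassmann version additionally requires the parallel transports described on the next slides):

```python
import numpy as np

def bfgs_update(H, s, y):
    """Standard BFGS update of the Hessian approximation H,
    with s = x_{k+1} - x_k and y = grad f_{k+1} - grad f_k."""
    Hs = H @ s
    return H - np.outer(Hs, Hs) / (s @ Hs) + np.outer(y, y) / (y @ s)
```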
31 Different ways of parallel transporting vectors
Let $X \in \operatorname{Gr}(n, r)$, let $\Delta_1, \Delta_2 \in T_X$, and let $X(t)$ be the geodesic path along $\Delta_1$.
1. Parallel transport using global coordinates: $\Delta_2(t) = T_{\Delta_1}(t)\,\Delta_2$.
2. We also have $\Delta_1 = X_\perp D_1$ and $\Delta_2 = X_\perp D_2$, where $X_\perp$ is a basis for $T_X$. Let $X_\perp(t)$ be a basis for $T_{X(t)}$. Parallel transport using local coordinates: $\Delta_2(t) = X_\perp(t)\,D_2$.
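In global coordinates the transport operator has a closed form (Edelman, Arias, Smith). A numpy sketch, assuming $X$ has orthonormal columns and using the thin SVD $\Delta_1 = U\Sigma V^\top$; names are my own.

```python
import numpy as np

def grassmann_transport(X, D1, D2, t):
    """Parallel transport of D2 in T_X along the Grassmann geodesic
    from X in direction D1, via the Edelman-Arias-Smith formula."""
    U, s, Vt = np.linalg.svd(D1, full_matrices=False)  # D1 = U diag(s) V^T
    cosS = np.diag(np.cos(s * t))
    sinS = np.diag(np.sin(s * t))
    M = (-X @ Vt.T @ sinS + U @ cosS) @ U.T
    return M @ D2 + D2 - U @ (U.T @ D2)                # + (I - U U^T) D2
```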
32 Parallel transport in local coordinates
All transported tangent vectors have the same coordinate representation in the basis $X_\perp(t)$ at all points on the path $X(t)$.
Plus: no need to transport the gradient or the Hessian. Minus: need to compute $X_\perp(t)$.
In global coordinates we compute, with transports into $T_{k+1}$:
$$s_k = t_k T_{\Delta_k}(t_k)\, p_k, \qquad y_k = \nabla f_{k+1} - T_{\Delta_k}(t_k)\, \nabla f_k,$$
$$T_{\Delta_k}(t_k)\, H_k\, T_{\Delta_k}^{-1}(t_k) : T_{k+1} \to T_{k+1},$$
and then
$$H_{k+1} = H_k - \frac{H_k s_k s_k^\top H_k}{s_k^\top H_k s_k} + \frac{y_k y_k^\top}{y_k^\top s_k}.$$
33 Limited memory BFGS
Compact representation of BFGS in Euclidean space:
$$H_k = H_0 + \begin{bmatrix} S_k & H_0 Y_k \end{bmatrix} \begin{bmatrix} R_k^{-\top}(D_k + Y_k^\top H_0 Y_k) R_k^{-1} & -R_k^{-\top} \\ -R_k^{-1} & 0 \end{bmatrix} \begin{bmatrix} S_k^\top \\ Y_k^\top H_0 \end{bmatrix},$$
where
$$S_k = [s_0, \dots, s_{k-1}], \qquad Y_k = [y_0, \dots, y_{k-1}], \qquad D_k = \operatorname{diag}\bigl[s_0^\top y_0, \dots, s_{k-1}^\top y_{k-1}\bigr],$$
$$R_k = \begin{bmatrix} s_0^\top y_0 & s_0^\top y_1 & \cdots & s_0^\top y_{k-1} \\ 0 & s_1^\top y_1 & \cdots & s_1^\top y_{k-1} \\ & & \ddots & \vdots \\ 0 & & & s_{k-1}^\top y_{k-1} \end{bmatrix}.$$
34 Limited memory BFGS
Limited memory BFGS [Byrd et al.; 1994]: replace $H_0$ by $\gamma_k I$ and keep the $m$ most recent $s_j$ and $y_j$:
$$H_k = \gamma_k I + \begin{bmatrix} S_k & \gamma_k Y_k \end{bmatrix} \begin{bmatrix} R_k^{-\top}(D_k + \gamma_k Y_k^\top Y_k) R_k^{-1} & -R_k^{-\top} \\ -R_k^{-1} & 0 \end{bmatrix} \begin{bmatrix} S_k^\top \\ \gamma_k Y_k^\top \end{bmatrix},$$
where
$$S_k = [s_{k-m}, \dots, s_{k-1}], \qquad Y_k = [y_{k-m}, \dots, y_{k-1}], \qquad D_k = \operatorname{diag}\bigl[s_{k-m}^\top y_{k-m}, \dots, s_{k-1}^\top y_{k-1}\bigr],$$
$$R_k = \begin{bmatrix} s_{k-m}^\top y_{k-m} & s_{k-m}^\top y_{k-m+1} & \cdots & s_{k-m}^\top y_{k-1} \\ 0 & s_{k-m+1}^\top y_{k-m+1} & \cdots & s_{k-m+1}^\top y_{k-1} \\ & & \ddots & \vdots \\ 0 & & & s_{k-1}^\top y_{k-1} \end{bmatrix}.$$
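The compact representation lets one apply $H_k$ to a vector without ever forming it. A numpy sketch of that matrix-vector product, with function and variable names of my own choosing:

```python
import numpy as np

def lbfgs_apply(S, Y, gamma, v):
    """Apply the compact-representation L-BFGS inverse Hessian to v.

    S, Y: n-by-m arrays whose columns are the m most recent s_j, y_j.
    gamma: scaling of the initial approximation H_0 = gamma * I.
    """
    m = S.shape[1]
    SY = S.T @ Y                       # m-by-m, entries s_i^T y_j
    R = np.triu(SY)                    # upper-triangular R_k
    D = np.diag(np.diag(SY))           # diagonal D_k
    Rinv = np.linalg.inv(R)
    top_left = Rinv.T @ (D + gamma * (Y.T @ Y)) @ Rinv
    M = np.block([[top_left, -Rinv.T],
                  [-Rinv,    np.zeros((m, m))]])
    W = np.hstack([S, gamma * Y])      # [S_k, gamma * Y_k]
    return gamma * v + W @ (M @ (W.T @ v))
```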
35 L-BFGS on the Grassmann manifold
In each iteration, parallel transport the vectors in $S_k$ and $Y_k$ to the current tangent space, i.e. perform $\bar{S}_k = TS_k$, $\bar{Y}_k = TY_k$, where $T$ is the transport matrix.
There is no need to modify $R_k$ or $D_k$: $\langle u, v \rangle = \langle Tu, Tv \rangle$, where $u, v \in T_k$ and $Tu, Tv \in T_{k+1}$.
$H_k$ is nonsingular while the Hessian is singular. This is no problem: $T_k$ at $x_k$ is an invariant subspace of $H_k$, i.e. if $v \in T_k$ then $H_k v \in T_k$. [Savas, Lim; 2008]