Nonconvex? NP! (No Problem!) Ryan Tibshirani, Convex Optimization 10-725/36-725
2 Outline

Today:
- Convex versus nonconvex?
- Classical nonconvex problems
- Eigen problems
- Graph problems
- Nonconvex proximal operators
- Discrete problems
- Infinite-dimensional problems
- Statistical problems
- Miscellaneous
3 Beyond the tip?
4 Some takeaway points

- If possible, formulate the task in terms of convex optimization: typically easier to solve and easier to analyze.
- Nonconvex does not necessarily mean nonscientific! However, statistically, it can often mean high(er) variance. This is true both intrinsically, and because we can rarely solve nonconvex problems (to global optimality).
- In more cases than you might expect, nonconvex problems can be solved exactly (to global optimality).
5 What does it mean for a problem to be nonconvex?

Consider a generic optimization problem:

$$\min_x \; f(x) \quad \text{subject to} \quad g_i(x) \le 0,\; i=1,\dots,m, \qquad h_j(x) = 0,\; j=1,\dots,r$$

This is a convex problem if f and g_i, i = 1, ..., m are convex, and h_j, j = 1, ..., r are affine. A nonconvex problem is one of this form where not all of these conditions are met. But trivial modifications of convex problems can lead to nonconvex formulations, so we really just consider nonconvex problems that are not trivially equivalent to convex ones.
6 What does it mean to solve a nonconvex problem?

Nonconvex problems can have local minima, i.e., there can exist a feasible x such that

$$f(y) \ge f(x) \;\; \text{for all feasible } y \text{ with } \|x - y\|_2 \le R,$$

but x is still not globally optimal. (Note: we proved that this could not happen for convex problems.) Hence by solving a nonconvex problem, we mean finding the global minimizer. We also implicitly mean doing it efficiently, i.e., in polynomial time.
7 Addendum

This is really about putting together a list of interesting problems that are surprisingly tractable... so there will be exceptions about nonconvexity and/or requiring exact global optima. (Also, I'm sure that there are many more examples out there that I'm missing, so I invite you to give me ideas / contribute!)
8 Classical nonconvex problems
9 Linear-fractional programs

A linear-fractional program is of the form

$$\min_x \; \frac{c^T x + d}{e^T x + f} \quad \text{subject to} \quad Gx \le h,\; e^T x + f > 0,\; Ax = b$$

This is nonconvex (but quasiconvex). Provided that this problem is feasible, it is in fact equivalent to the linear program

$$\min_{y,z} \; c^T y + dz \quad \text{subject to} \quad Gy - hz \le 0,\; z \ge 0,\; Ay - bz = 0,\; e^T y + fz = 1$$
10 The link between the two problems is the transformation

$$y = \frac{x}{e^T x + f}, \qquad z = \frac{1}{e^T x + f}$$

The proof of their equivalence is simple; e.g., see B & V Chapter 4. Linear-fractional problems show up in the study of solution paths for some common statistical estimation problems. E.g., knots in the lasso path (values of λ at which a coefficient becomes nonzero) are optimal values of linear-fractional programs. See Taylor et al. (2013), "Inference in adaptive regression via the Kac-Rice formula".
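As a quick illustration (not from the slides), here is a sketch of the LP reformulation solved with scipy.optimize.linprog on made-up data; the Ax = b constraint is dropped for brevity, and the box constraints are only there to keep the toy problem bounded.

```python
# Sketch: solve a linear-fractional program via its LP reformulation (made-up data).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m = 3, 5
c, d = rng.standard_normal(n), 1.0
e, f = rng.standard_normal(n), 10.0                 # assume e^T x + f > 0 on the feasible set
G = np.vstack([rng.standard_normal((m, n)), np.eye(n), -np.eye(n)])   # includes -1 <= x <= 1
h = np.ones(m + 2 * n)

# LP variables (y, z): minimize c^T y + d z
#   subject to  G y - h z <= 0,  z >= 0,  e^T y + f z = 1
obj = np.concatenate([c, [d]])
A_ub = np.hstack([G, -h[:, None]])
b_ub = np.zeros(len(h))
A_eq = np.concatenate([e, [f]])[None, :]
b_eq = np.array([1.0])
bounds = [(None, None)] * n + [(0, None)]           # y free, z >= 0

res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
y, z = res.x[:n], res.x[n]
x_hat = y / z                                       # recover the fractional program's solution
print("optimal ratio:", (c @ x_hat + d) / (e @ x_hat + f))
```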
11 Geometric programs

A monomial is a function f : R^n_{++} -> R of the form

$$f(x) = \gamma\, x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}$$

for γ > 0 and a_1, ..., a_n ∈ R. A posynomial is a sum of monomials,

$$f(x) = \sum_{k=1}^p \gamma_k\, x_1^{a_{k1}} x_2^{a_{k2}} \cdots x_n^{a_{kn}}$$

A geometric program is of the form

$$\min_x \; f(x) \quad \text{subject to} \quad g_i(x) \le 1,\; i=1,\dots,m, \qquad h_j(x) = 1,\; j=1,\dots,r$$

where f and g_i, i = 1, ..., m are posynomials and h_j, j = 1, ..., r are monomials. This is nonconvex.
12 This is equivalent to a convex problem, via a simple transformation. Given f(x) = γ x_1^{a_1} x_2^{a_2} ··· x_n^{a_n}, let y_i = log x_i and rewrite this as

$$\gamma (e^{y_1})^{a_1} (e^{y_2})^{a_2} \cdots (e^{y_n})^{a_n} = e^{a^T y + b}$$

for b = log γ. Also, a posynomial can be written as $\sum_{k=1}^p e^{a_k^T y + b_k}$. With this variable substitution, and after taking logs, a geometric program is equivalent to

$$\begin{aligned}
\min_y \;\; & \log\Big(\sum_{k=1}^{p_0} e^{a_{0k}^T y + b_{0k}}\Big) \\
\text{subject to} \;\; & \log\Big(\sum_{k=1}^{p_i} e^{a_{ik}^T y + b_{ik}}\Big) \le 0,\; i=1,\dots,m \\
& c_j^T y + d_j = 0,\; j=1,\dots,r
\end{aligned}$$

This is convex, recalling the convexity of soft max (log-sum-exp) functions.
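A minimal sketch of a geometric program in CVXPY, which applies exactly this log change of variables internally when called with gp=True; the tiny problem data below are made up for illustration.

```python
# Sketch: a small geometric program via CVXPY's disciplined geometric programming mode.
import cvxpy as cp

x = cp.Variable(pos=True)
y = cp.Variable(pos=True)
z = cp.Variable(pos=True)

# Maximize a monomial subject to posynomial constraints (equivalent to a GP in
# standard minimization form after inverting the objective).
objective = cp.Maximize(x * y * z)
constraints = [4 * x * y * z + 2 * x * z <= 10, x <= 2 * y, y <= 2 * x, z >= 1]
prob = cp.Problem(objective, constraints)
prob.solve(gp=True)       # CVXPY handles the log transformation internally
print(prob.value, x.value, y.value, z.value)
```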
13 Many interesting problems are geometric programs; see Boyd et al. (2007), "A tutorial on geometric programming", and also Chapters 4.5 and 8.8 of the B & V book. Example floor planning program:

$$\begin{aligned}
\min_{W,H,x,y,w,h} \;\; & W H \\
\text{subject to} \;\; & 0 \le x_i \le W,\; i=1,\dots,n \\
& 0 \le y_i \le H,\; i=1,\dots,n \\
& x_i + w_i \le x_j,\; (i,j) \in L \\
& y_i + h_i \le y_j,\; (i,j) \in B \\
& w_i h_i = C_i,\; i=1,\dots,n
\end{aligned}$$

(Extension: Sra and Hosseini (2013), "Geometric optimization on positive definite matrices with application to elliptically contoured distributions".)
14 Problems with two quadratic functions

Consider a problem involving two quadratics:

$$\min_x \; x^T A_0 x + 2 b_0^T x + c_0 \quad \text{subject to} \quad x^T A_1 x + 2 b_1^T x + c_1 \le 0$$

Here A_0, A_1 need not be positive definite, so this is nonconvex. The dual problem can be cast as

$$\begin{aligned}
\max_{u,v} \;\; & u \\
\text{subject to} \;\; & \begin{bmatrix} A_0 + v A_1 & b_0 + v b_1 \\ (b_0 + v b_1)^T & c_0 + v c_1 - u \end{bmatrix} \succeq 0, \quad v \ge 0
\end{aligned}$$

The dual is convex (as always), and strong duality holds. See Appendix B of B & V, and also Beck and Eldar (2006), "Strong duality in nonconvex quadratic optimization with two quadratic constraints".
15 Handling convex equality constraints

Given convex f and g_i, i = 1, ..., m, the problem

$$\min_x \; f(x) \quad \text{subject to} \quad g_i(x) \le 0,\; i=1,\dots,m, \qquad h(x) = 0$$

is nonconvex when h is convex but not affine. A convex relaxation of this problem is

$$\min_x \; f(x) \quad \text{subject to} \quad g_i(x) \le 0,\; i=1,\dots,m, \qquad h(x) \le 0$$

If we can ensure that h(x*) = 0 at any solution x* of the relaxed problem, then the two are equivalent.
16 From B & V Exercises 4.6 and 4.58, e.g., consider the maximum utility problem

$$\begin{aligned}
\max_{x,b} \;\; & \sum_{t=1}^T \alpha_t u(x_t) \\
\text{subject to} \;\; & b_{t+1} = b_t + f(b_t) - x_t,\; t=1,\dots,T \\
& 0 \le x_t \le b_t,\; t=1,\dots,T
\end{aligned}$$

where the initial budget b_1 ≥ 0 is fixed. Interpretation: x_t is the amount you spend of your total available money b_t at time t; the concave function u gives utility, and the concave function f measures investment return. This is not a convex problem, because of the equality constraint; but we can relax it to b_{t+1} ≤ b_t + f(b_t) - x_t, t = 1, ..., T, without changing the solution (think about throwing out money).
17 Convexifying constraint sets

Given a nonconvex set C, consider the nonconvex problem

$$\min_x \; c^T x \quad \text{subject to} \quad x \in C$$

Due to linearity of the objective, this is equivalent to the convex problem

$$\min_x \; c^T x \quad \text{subject to} \quad x \in \mathrm{conv}(C)$$

Proof: let f* be the optimal value in the first problem, and x* a solution of the second. Then x* = Σ_i a_i x_i where x_i ∈ C, a_i ≥ 0, and Σ_i a_i = 1. Note

$$f^\star \ge c^T x^\star = \sum_i a_i c^T x_i \ge \sum_i a_i f^\star = f^\star$$

Thus all x_i must be optimal for the first problem. But note that the convex problem is not necessarily easy! It could be very hard to even form conv(C) (cf. the cutting plane method for integer programming).
18 Eigen problems
19 Principal component analysis

Given a matrix X ∈ R^{n×p}, consider the nonconvex problem

$$\min_R \; \|X - R\|_F^2 \quad \text{subject to} \quad \mathrm{rank}(R) = k$$

for some fixed k. The solution here is given by the singular value decomposition of X: if X = U D V^T, then R̂ = U_k D_k V_k^T, where U_k, V_k are the first k columns of U, V, and D_k contains the first k diagonal elements of D. I.e., R̂ is the reconstruction of X from its first k principal components. This is often called the Eckart-Young theorem, established in 1936, but it was probably known even earlier; see Stewart (1992), "On the early history of the singular value decomposition".
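A minimal numpy sketch of the Eckart-Young solution: the best rank-k approximation of X in Frobenius norm via its truncated SVD (toy data made up).

```python
# Sketch: best rank-k approximation via the truncated SVD.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
k = 5

U, d, Vt = np.linalg.svd(X, full_matrices=False)
R_hat = U[:, :k] @ np.diag(d[:k]) @ Vt[:k, :]    # U_k D_k V_k^T

print(np.linalg.matrix_rank(R_hat))              # k
print(np.linalg.norm(X - R_hat, "fro") ** 2)     # optimal value of the rank-k problem
```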
20 Fantope

Another characterization of the SVD is via the following nonconvex problem, given X ∈ R^{n×p}:

$$\min_{Z \in S^p} \; \|X - XZ\|_F^2 \quad \text{subject to} \quad \mathrm{rank}(Z) = k,\; Z \text{ a projection}$$

or, equivalently,

$$\max_{Z \in S^p} \; \langle X^T X, Z \rangle \quad \text{subject to} \quad \mathrm{rank}(Z) = k,\; Z \text{ a projection}$$

The solution here is Ẑ = V_k V_k^T, where the columns of V_k ∈ R^{p×k} give the first k eigenvectors of X^T X. This is equivalent to a convex problem. Express the constraint set C as

$$C = \{ Z \in S^p : \mathrm{rank}(Z) = k,\; Z \text{ is a projection} \} = \{ Z \in S^p : \lambda_i(Z) \in \{0,1\},\; i=1,\dots,p,\; \mathrm{tr}(Z) = k \}$$
21 Now consider the convex hull F_k = conv(C):

$$F_k = \{ Z \in S^p : \lambda_i(Z) \in [0,1],\; i=1,\dots,p,\; \mathrm{tr}(Z) = k \} = \{ Z \in S^p : 0 \preceq Z \preceq I,\; \mathrm{tr}(Z) = k \}$$

This is called the Fantope of order k. Further, the convex problem

$$\max_{Z \in S^p} \; \langle X^T X, Z \rangle \quad \text{subject to} \quad Z \in F_k$$

admits the same solution as the original one, i.e., Ẑ = V_k V_k^T. See Fan (1949), "On a theorem of Weyl concerning eigenvalues of linear transformations", and Overton and Womersley (1992), "On the sum of the largest eigenvalues of a symmetric matrix". Sparse PCA extension: Vu et al. (2013), "Fantope projection and selection: near-optimal convex relaxation of sparse PCA".
22 Classical multidimensional scaling

Let x_1, ..., x_n ∈ R^p, and define similarities S_ij = (x_i - x̄)^T (x_j - x̄). For fixed k, classical multidimensional scaling or MDS solves the nonconvex problem

$$\min_{z_1,\dots,z_n} \; \sum_{i,j=1}^n \big( S_{ij} - (z_i - \bar z)^T (z_j - \bar z) \big)^2$$

(Figure from Hastie et al. (2009), "The elements of statistical learning".)
23 Let S be the similarity matrix (entries S_ij = (x_i - x̄)^T (x_j - x̄)). The classical MDS problem has an exact solution in terms of the eigendecomposition S = U D^2 U^T: ẑ_1, ..., ẑ_n are the rows of U_k D_k, where U_k is the first k columns of U, and D_k the first k diagonal entries of D. Note that other very similar forms of MDS are not convex, and not directly solvable, e.g., least squares scaling, with d_ij = ||x_i - x_j||_2:

$$\min_{z_1,\dots,z_n} \; \sum_{i,j=1}^n \big( d_{ij} - \|z_i - z_j\|_2 \big)^2$$

See Hastie et al. (2009), Chapter 14.
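A sketch of classical MDS via the eigendecomposition of the centered similarity matrix, as described above; the data are made up.

```python
# Sketch: classical MDS from the eigendecomposition S = U D^2 U^T.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((30, 10))        # n = 30 points in R^10
k = 2

xc = x - x.mean(axis=0)                  # center the points
S = xc @ xc.T                            # S_ij = (x_i - xbar)^T (x_j - xbar)

evals, U = np.linalg.eigh(S)             # eigenvalues in ascending order
order = np.argsort(evals)[::-1][:k]      # indices of the top-k eigenvalues (= D^2)
Z = U[:, order] * np.sqrt(np.maximum(evals[order], 0.0))   # rows of U_k D_k = embeddings z_i

print(Z.shape)                           # (30, 2)
```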
24 Generalized eigenvalue problems

Given B, W ∈ S^p_{++}, consider the nonconvex problem

$$\max_v \; \frac{v^T B v}{v^T W v}$$

This is a generalized eigenvalue problem, with exact solution given by the top eigenvector of W^{-1} B. This is important, e.g., in Fisher's discriminant analysis, where B is the between-class covariance matrix, and W the within-class covariance matrix. See Hastie et al. (2009), Chapter 4.
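A sketch of solving the generalized Rayleigh quotient with scipy's symmetric generalized eigensolver; B and W below are random positive definite stand-ins.

```python
# Sketch: maximize v^T B v / v^T W v via the generalized eigenproblem B v = lambda W v.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
p = 5
A1, A2 = rng.standard_normal((p, p)), rng.standard_normal((p, p))
B = A1 @ A1.T + np.eye(p)        # made-up "between-class" PD matrix
W = A2 @ A2.T + np.eye(p)        # made-up "within-class" PD matrix

evals, evecs = eigh(B, W)        # generalized symmetric eigenproblem
v = evecs[:, -1]                 # eigenvector for the largest generalized eigenvalue

print(evals[-1], (v @ B @ v) / (v @ W @ v))   # the two numbers agree
```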
25 Graph problems
26 Min cut

Given a graph G = (V, E) with V = {1, ..., n}, two nodes s, t ∈ V, and costs c_ij ≥ 0 on edges (i, j) ∈ E, the min cut problem is:

$$\begin{aligned}
\min_{b \in \mathbb{R}^{|E|},\, x \in \mathbb{R}^{|V|}} \;\; & \sum_{(i,j) \in E} b_{ij} c_{ij} \\
\text{subject to} \;\; & b_{ij} \ge x_j - x_i \\
& b_{ij}, x_i, x_j \in \{0,1\} \text{ for all } i, j \\
& x_s = 0,\; x_t = 1
\end{aligned}$$

Think of b_ij as the indicator that the edge (i, j) traverses the cut from s to t; think of x_i as an indicator that node i is grouped with t. This nonconvex problem can be solved exactly using max flow.
27 A relaxation of min cut

$$\begin{aligned}
\min_{b \in \mathbb{R}^{|E|},\, x \in \mathbb{R}^{|V|}} \;\; & \sum_{(i,j) \in E} b_{ij} c_{ij} \\
\text{subject to} \;\; & b_{ij} \ge x_j - x_i \text{ for all } i, j \\
& b \ge 0,\; x_s = 0,\; x_t = 1
\end{aligned}$$

This is an LP, and recall that it is the dual of the max flow LP:

$$\begin{aligned}
\max_{f \in \mathbb{R}^{|E|}} \;\; & \sum_{(s,j) \in E} f_{sj} \\
\text{subject to} \;\; & f_{ij} \ge 0,\; f_{ij} \le c_{ij} \text{ for all } (i,j) \in E \\
& \sum_{(i,k) \in E} f_{ik} = \sum_{(k,j) \in E} f_{kj} \text{ for all } k \in V \setminus \{s,t\}
\end{aligned}$$

The max flow min cut theorem tells us that the relaxed min cut is tight.
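A small sketch (not from the slides) of computing a min s-t cut via max flow with networkx; the toy graph and capacities below are made up.

```python
# Sketch: min cut equals max flow on a toy directed graph.
import networkx as nx

G = nx.DiGraph()
edges = [("s", "a", 3.0), ("s", "b", 2.0), ("a", "b", 1.0),
         ("a", "t", 2.0), ("b", "t", 3.0)]
G.add_weighted_edges_from(edges, weight="capacity")

cut_value, (s_side, t_side) = nx.minimum_cut(G, "s", "t")
flow_value, flow_dict = nx.maximum_flow(G, "s", "t")

print(cut_value, flow_value)      # equal, by max flow min cut duality
print(s_side, t_side)             # the two sides of the optimal cut
```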
28 Max cut

Given an undirected graph G = (V, E) with V = {1, ..., n} and edge costs c_ij ≥ 0, (i, j) ∈ E, the max cut problem is:

$$\max_{v_1,\dots,v_n \in \mathbb{R}} \; \sum_{(i,j) \in E} c_{ij} \frac{1 - v_i v_j}{2} \quad \text{subject to} \quad v_i^2 = 1,\; i=1,\dots,n$$

Nonconvex, NP hard! SDP relaxation of Goemans and Williamson (1994), "Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming":

$$\max_{v_1,\dots,v_n \in \mathbb{R}^n} \; \sum_{(i,j) \in E} c_{ij} \frac{1 - v_i^T v_j}{2} \quad \text{subject to} \quad \|v_i\|_2^2 = 1,\; i=1,\dots,n$$

Remarkable fact: with randomized rounding, E[f_SDP] ≥ 0.878 f_OPT.
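A sketch of the Goemans-Williamson relaxation written over the Gram matrix, with randomized hyperplane rounding, using cvxpy; the small graph and weights are made up.

```python
# Sketch: max cut SDP relaxation + randomized rounding (Goemans-Williamson style).
import numpy as np
import cvxpy as cp

n = 5
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0), (4, 0, 1.0), (0, 2, 1.0)]
C = np.zeros((n, n))
for i, j, w in edges:
    C[i, j] = C[j, i] = w

# SDP over the Gram matrix V_ij = v_i^T v_j: maximize sum_{edges} c_ij (1 - V_ij) / 2
V = cp.Variable((n, n), PSD=True)
obj = cp.Maximize(cp.sum(cp.multiply(C, 1 - V)) / 4)   # /4 since each edge appears twice in C
prob = cp.Problem(obj, [cp.diag(V) == 1])
prob.solve()

# Randomized rounding: factor V and cut with a random hyperplane
L = np.linalg.cholesky(V.value + 1e-6 * np.eye(n))     # rows approximate the vectors v_i
g = np.random.default_rng(0).standard_normal(n)
signs = np.sign(L @ g)
cut_value = sum(w for i, j, w in edges if signs[i] != signs[j])
print(prob.value, cut_value)
```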
29 Nonconvex proximal operators
30 Hard-thresholding

Given y ∈ R^n, one of the simplest problems in signal processing:

$$\min_\beta \; \sum_{i=1}^n (y_i - \beta_i)^2 + \sum_{i=1}^n \lambda_i 1\{\beta_i \neq 0\}$$

The solution is given by hard-thresholding y,

$$\beta_i = \begin{cases} y_i & \text{if } y_i^2 > \lambda_i \\ 0 & \text{otherwise} \end{cases}, \qquad i=1,\dots,n,$$

and can be seen by inspection. Special case of λ_i = λ, i = 1, ..., n:

$$\min_\beta \; \|y - \beta\|_2^2 + \lambda \|\beta\|_0$$

Compare to soft-thresholding, the proximal operator for the l_1 norm. Also, note: changing the loss to ||y - Xβ||_2^2 gives best subset selection, which is NP hard for a general matrix X.
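A one-line sketch of the hard-thresholding solution above (constant λ case, made-up input):

```python
# Sketch: hard-thresholding, the prox of the l0 penalty.
import numpy as np

def hard_threshold(y, lam):
    """Solve min_beta ||y - beta||_2^2 + lam * ||beta||_0 elementwise."""
    return np.where(y ** 2 > lam, y, 0.0)

y = np.array([0.2, -1.5, 0.9, 3.0])
print(hard_threshold(y, lam=1.0))   # [ 0.  -1.5  0.   3. ]
```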
31 Potts minimization

Consider the 1d segmentation problem, also called Potts minimization:

$$\min_\beta \; \sum_{i=1}^n (y_i - \beta_i)^2 + \lambda \sum_{i=1}^{n-1} 1\{\beta_i \neq \beta_{i+1}\}$$

Nonconvex, but solvable by dynamic programming, in two ways: Bellman (1961), "On the approximation of curves by line segments using dynamic programming", and Johnson (2013), "A dynamic programming algorithm for the fused lasso and L0-segmentation". Johnson: more efficient; Bellman: more general. Worst-case O(n^2), but with practical performance more like O(n).
32 Tree-leaves projection

Given a target u ∈ R^n, a tree g on R^n, and a label y ∈ {0, 1}, consider

$$\min_z \; \|u - z\|_2^2 + \lambda \, 1\{g(z) \neq y\}$$

Interpretation: find z close to u, whose label under g is not unlike y. Argue directly that the solution is either ẑ = u or ẑ = P_S(u), where S = g^{-1}(y) = {z : g(z) = y} is the set of leaves of g assigned label y. Just compute both options for ẑ and compare costs; the problem reduces to computing P_S(u), the projection onto a set of tree leaves, which is highly nonconvex. (This is a subroutine of a broader algorithm for nonconvex optimization; see Carreira-Perpinan and Wang (2012), "Distributed optimization of deeply nested systems".)
33 The set S is a union of axis-aligned boxes; projection onto any one box is fast, O(n) operations.
34 To project onto S, we could just scan through all boxes and take the closest. Faster: decorate each node of the tree with the labels of its leaves and its bounding box, then perform depth-first search, pruning nodes:
- that do not contain a leaf labeled y, or
- whose bounding box is farther away than the current closest box
35 Discrete problems
36 Binary graph segmentation

Given y ∈ R^n and an undirected graph G = (V, E) with V = {1, ..., n}, consider binary graph segmentation:

$$\min_{\beta \in \{0,1\}^n} \; \sum_{i=1}^n (y_i - \beta_i)^2 + \sum_{(i,j) \in E} \lambda_{ij} 1\{\beta_i \neq \beta_j\}$$

Nonconvex, but a simple manipulation delivers the equivalent form

$$\max_{A \subseteq \{1,\dots,n\}} \; \sum_{i \in A} a_i + \sum_{j \in A^c} b_j - \sum_{(i,j) \in E,\, |A \cap \{i,j\}| = 1} \lambda_{ij}$$

which is a segmentation problem that can be solved exactly using min cut/max flow. E.g., Kleinberg and Tardos (2005), "Algorithm design", Chapter 7.
37 Can apply recursively to get a version of graph hierarchical clustering (divisive). E.g., take the graph as a 2d grid for image segmentation. (From ac.kr)
38 Discrete Potts minimization

Given y ∈ R^n, now consider discrete Potts minimization:

$$\min_{\beta \in \{b_1,\dots,b_k\}^n} \; \sum_{i=1}^n (y_i - \beta_i)^2 + \lambda \sum_{i=1}^{n-1} 1\{\beta_i \neq \beta_{i+1}\}$$

where {b_1, ..., b_k} is some fixed discrete set. This is nonconvex, and can be efficiently solved using classical dynamic programming. The key insight is that the 1-dimensional structure allows us to exactly solve and store

$$\begin{aligned}
\hat\beta_1(\beta_2) &= \mathop{\mathrm{argmin}}_{\beta_1 \in \{b_1,\dots,b_k\}} \; \underbrace{(y_1 - \beta_1)^2 + \lambda 1\{\beta_1 \neq \beta_2\}}_{f_1(\beta_1,\beta_2)} \\
\hat\beta_2(\beta_3) &= \mathop{\mathrm{argmin}}_{\beta_2 \in \{b_1,\dots,b_k\}} \; f_1\big(\hat\beta_1(\beta_2), \beta_2\big) + (y_2 - \beta_2)^2 + \lambda 1\{\beta_2 \neq \beta_3\} \\
&\;\;\vdots
\end{aligned}$$
39 DP algorithm:

- Make a forward pass over β_1, ..., β_{n-1}, keeping a look-up table; also keep a look-up table for the optimal partial criterion values f_1, ..., f_{n-1}
- Solve exactly for β_n
- Make a backward pass over β_{n-1}, ..., β_1, reading off the look-up table

(The look-up tables are indexed by β_1, ..., β_{n-1} and f_1, ..., f_{n-1} against the candidate values b_1, ..., b_k.)

Requires O(n k^2) operations.
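A sketch of this O(n k^2) dynamic program (forward pass of partial costs, then backtracking); the toy data at the bottom are made up.

```python
# Sketch: dynamic programming for discrete Potts minimization.
import numpy as np

def discrete_potts(y, levels, lam):
    """Minimize sum_i (y_i - beta_i)^2 + lam * sum_i 1{beta_i != beta_{i+1}},
    with each beta_i restricted to `levels`."""
    y = np.asarray(y, dtype=float)
    b = np.asarray(levels, dtype=float)
    n, k = len(y), len(b)

    cost = (y[0] - b) ** 2                 # cost[j]: best partial cost with beta_i = b[j]
    argmin_prev = np.zeros((n, k), dtype=int)

    for i in range(1, n):
        # transition cost from previous level l to current level j: lam if l != j
        trans = cost[:, None] + lam * (1 - np.eye(k))
        argmin_prev[i] = np.argmin(trans, axis=0)
        cost = trans[argmin_prev[i], np.arange(k)] + (y[i] - b) ** 2

    beta_idx = np.empty(n, dtype=int)
    beta_idx[-1] = int(np.argmin(cost))
    for i in range(n - 1, 0, -1):          # backward pass, reading off the table
        beta_idx[i - 1] = argmin_prev[i, beta_idx[i]]
    return b[beta_idx]

y = [0.1, 0.0, 0.2, 1.1, 0.9, 1.0]
print(discrete_potts(y, levels=[0.0, 1.0], lam=0.5))   # [0, 0, 0, 1, 1, 1]
```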
40 Nearly optimal K-means

Given data points x_1, ..., x_n ∈ R^p, the K-means problem is

$$\min_{c_1,\dots,c_K} \; \underbrace{\sum_{i=1}^n \min_{k=1,\dots,K} \|x_i - c_k\|_2^2}_{f(c_1,\dots,c_K)}$$

This is nonconvex, NP hard, and it is usually approximately solved using Lloyd's algorithm, run many times, with random starts. A careful choice of starting positions makes a big impact: if we run Lloyd's algorithm once, starting at c_1 = s_1, ..., c_K = s_K, for special (random) s_1, ..., s_K, then we get estimates ĉ_1, ..., ĉ_K with

$$\mathbb{E}\big[ f(\hat c_1,\dots,\hat c_K) \big] \le 8(\log K + 2) \min_{c_1,\dots,c_K \in \mathbb{R}^p} f(c_1,\dots,c_K)$$
41 See Arthur and Vassilvitskii (2007), "k-means++: The advantages of careful seeding". Their construction of s_1, ..., s_K is simple (see the sketch after this list):

- Begin by choosing s_1 uniformly at random among x_1, ..., x_n
- Compute squared distances d_i^2 = ||x_i - s_1||_2^2 for all points i not chosen, and choose s_2 by drawing from the remaining points, with probability weights d_i^2 / Σ_j d_j^2
- Recompute the squared distances as d_i^2 = min{ ||x_i - s_1||_2^2, ||x_i - s_2||_2^2 } and choose s_3 according to the same recipe
- And so on, until all of s_1, ..., s_K are chosen
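A sketch (not from the paper) of the k-means++ seeding recipe described above, on made-up data:

```python
# Sketch: k-means++ style D^2-weighted seeding.
import numpy as np

def kmeanspp_seeds(X, K, rng=None):
    """Return K seed centers chosen by squared-distance-weighted sampling."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    seeds = [X[rng.integers(n)]]                       # s_1 uniform at random
    for _ in range(1, K):
        d2 = np.min([((X - s) ** 2).sum(axis=1) for s in seeds], axis=0)
        probs = d2 / d2.sum()                          # D^2 weighting (chosen points get 0)
        seeds.append(X[rng.choice(n, p=probs)])
    return np.array(seeds)

X = np.vstack([np.random.default_rng(0).normal(m, 0.3, size=(50, 2)) for m in (0, 3, 6)])
print(kmeanspp_seeds(X, K=3, rng=1))
```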
42 Infinite-dimensional problems
43 Smoothing splines

Given pairs (x_i, y_i) ∈ R × R, i = 1, ..., n, the smoothing spline solves

$$\min_f \; \sum_{i=1}^n \big( y_i - f(x_i) \big)^2 + \lambda \int \big(f''(t)\big)^2 \, dt$$

Optimization domain: all functions f such that ∫ (f''(t))^2 dt < ∞, an infinite-dimensional set. One can show that the solution f̂ to the above problem is unique, and given by a natural cubic spline with knots at x_1, ..., x_n. (Proof: use integration by parts.) Hence we can parametrize by

$$f = \sum_{j=1}^n \theta_j \eta_j$$

where η_1, ..., η_n are natural cubic spline basis functions. The task now is to solve for the coefficients θ ∈ R^n.
44 Plugging in f = Σ_{j=1}^n θ_j η_j, we transform the smoothing spline problem into the finite-dimensional form

$$\min_\theta \; \|y - N\theta\|_2^2 + \lambda \theta^T \Omega \theta$$

where N_ij = η_j(x_i), and Ω_ij = ∫ η_i''(t) η_j''(t) dt. The solution is explicitly given by

$$\hat\theta = (N^T N + \lambda \Omega)^{-1} N^T y$$

and the fitted function is f̂ = Σ_{j=1}^n θ̂_j η_j. With a proper choice of basis functions (B-splines), the calculation of θ̂ is O(n). See Wahba (1990), "Spline models for observational data"; Green and Silverman (1994), "Nonparametric regression and generalized linear models"; Hastie et al. (2009), Chapter 5.
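A sketch of the finite-dimensional solve above, assuming the basis matrix N (with N_ij = η_j(x_i)) and the penalty matrix Ω have already been built from a natural cubic spline basis; here they are stand-in placeholder matrices only.

```python
# Sketch: smoothing spline coefficients from the normal equations (placeholder N, Omega).
import numpy as np

rng = np.random.default_rng(0)
n, lam = 40, 0.1
y = rng.standard_normal(n)
N = rng.standard_normal((n, n))          # placeholder for eta_j(x_i)
Omega = np.eye(n)                        # placeholder for the curvature penalty matrix

theta_hat = np.linalg.solve(N.T @ N + lam * Omega, N.T @ y)
fitted = N @ theta_hat                   # values of f_hat at the inputs x_i
print(fitted[:5])
```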
45 Locally adaptive regression splines

Given the same setup, the (cubic) locally adaptive regression spline solves

$$\min_f \; \sum_{i=1}^n \big( y_i - f(x_i) \big)^2 + \lambda \cdot \mathrm{TV}(f''')$$

Optimization domain: all functions f with TV(f''') < ∞, which is again infinite-dimensional. Similar to before, one can show that the solution f̂ to the above problem is a cubic spline, but with two key differences:
- It can have any number of knots ≤ n - 4 (tuned by λ)
- The knots do not necessarily lie at the input points x_1, ..., x_n!

Details in Mammen and van de Geer (1997), "Locally adaptive regression splines". Summary: these are statistically more adaptive but computationally more challenging than smoothing splines.
46 [Figure: smoothing spline fit (easier to compute) vs. locally adaptive regression spline fit (more adaptive)]

A finite-dimensional approximation is given in Mammen and van de Geer (1997), and a much faster approximation in Tibshirani (2014), "Adaptive piecewise polynomial estimation via trend filtering".
47 Reproducing kernel Hilbert spaces

Let κ : R^d × R^d → R_{++} be a positive definite kernel, and H_κ the function space generated by (possibly infinite) linear combinations of the functions κ(·, z), z ∈ R^d. This is a reproducing kernel Hilbert space or RKHS, and is equipped with a norm ||·||_{H_κ}. Given data (x_i, y_i) ∈ R^d × R, i = 1, ..., n, consider the problem

$$\min_{f \in H_\kappa} \; \sum_{i=1}^n \big( y_i - f(x_i) \big)^2 + \lambda \|f\|_{H_\kappa}^2$$

This is infinite-dimensional, but by Mercer's theorem, the solution satisfies f = Σ_{j=1}^n α_j κ(·, x_j). Letting K ∈ R^{n×n} have elements K_ij = κ(x_i, x_j), our problem reduces to

$$\min_\alpha \; \|y - K\alpha\|_2^2 + \lambda \alpha^T K \alpha$$

(Beyond regression: the same trick works for kernel SVMs, kernel PCA, etc.)
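A sketch of the finite-dimensional kernel problem above, with a Gaussian kernel on made-up 1d data; for symmetric invertible K, the stationarity condition simplifies to (K + λI)α = y.

```python
# Sketch: kernel ridge regression, min_alpha ||y - K alpha||^2 + lam alpha^T K alpha.
import numpy as np

rng = np.random.default_rng(0)
n, lam, bandwidth = 50, 0.1, 1.0
x = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(x[:, 0]) + 0.1 * rng.standard_normal(n)

K = np.exp(-((x - x.T) ** 2) / (2 * bandwidth ** 2))   # K_ij = kappa(x_i, x_j)

# Gradient condition: K (K alpha - y) + lam K alpha = 0  =>  (K + lam I) alpha = y
alpha = np.linalg.solve(K + lam * np.eye(n), y)
fitted = K @ alpha
print(np.mean((fitted - y) ** 2))
```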
48 Statistical problems
49 Sparse underdetermined linear systems

Suppose that X ∈ R^{n×p} has unit-normed columns, ||X_i||_2 = 1, for i = 1, ..., p. Given y ∈ R^n, consider the problem of finding the sparsest solution to the linear system:

$$\min_\beta \; \|\beta\|_0 \quad \text{subject to} \quad X\beta = y$$

This is nonconvex and known to be NP hard, for a generic X. A natural convex relaxation is the l_1 basis pursuit problem:

$$\min_\beta \; \|\beta\|_1 \quad \text{subject to} \quad X\beta = y$$

It turns out that there is a deep connection between the two; we cite results from Donoho (2006), "For most large underdetermined systems of linear equations, the minimal l_1-norm solution is also the sparsest solution".
50 As n, p grow large, with p > n, there exists a threshold ρ (depending on the ratio p/n), such that for most matrices X, the following holds. If we solve the l_1 problem and find a solution with:

- fewer than ρn nonzero components, then we must have found the unique solution of the l_0 problem
- greater than ρn nonzero components, then there is no solution of the linear system with fewer than ρn nonzero components

Here "most" can be quantified precisely via a uniform probability measure over matrices X with unit-norm columns. There is a large and fast-moving body of related literature. See Donoho et al. (2009), "Message-passing algorithms for compressed sensing", for a nice review.
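A sketch (not from the references) of l_1 basis pursuit recovering a sparse vector from an underdetermined system, using cvxpy on synthetic data:

```python
# Sketch: basis pursuit on a random underdetermined system with a sparse ground truth.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p, s = 40, 100, 5
X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)                   # unit-normed columns
beta_true = np.zeros(p)
beta_true[rng.choice(p, s, replace=False)] = rng.standard_normal(s)
y = X @ beta_true

beta = cp.Variable(p)
prob = cp.Problem(cp.Minimize(cp.norm1(beta)), [X @ beta == y])
prob.solve()
print(np.max(np.abs(beta.value - beta_true)))    # near zero: exact recovery
```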
51 Exact low-rank matrix completion

Given a matrix Y ∈ R^{n×n}, partially observed over a set of indices Ω ⊆ {1, ..., n}^2, consider the problem of finding the lowest-rank matrix matching Y on the observed set:

$$\min_B \; \mathrm{rank}(B) \quad \text{subject to} \quad B_{ij} = Y_{ij},\; (i,j) \in \Omega$$

This is nonconvex. Natural convex relaxation:

$$\min_B \; \|B\|_{\mathrm{tr}} \quad \text{subject to} \quad B_{ij} = Y_{ij},\; (i,j) \in \Omega$$

Under some assumptions, it can be shown that the solution to the convex problem is exactly equal to the solution to the nonconvex problem, with high probability over the sampling model. See, e.g., Candes and Recht (2008), "Exact matrix completion via convex optimization", and many papers since.
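A sketch of the trace-norm (nuclear norm) relaxation on a small synthetic low-rank matrix, using cvxpy; the observation pattern and data are made up.

```python
# Sketch: nuclear norm matrix completion on a rank-2 toy matrix.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, r = 20, 2
Y = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # rank-2 target
M = (rng.random((n, n)) < 0.6).astype(float)                    # 1 = observed entry

B = cp.Variable((n, n))
prob = cp.Problem(cp.Minimize(cp.normNuc(B)), [cp.multiply(M, B) == M * Y])
prob.solve()
print(np.linalg.norm(B.value - Y) / np.linalg.norm(Y))          # small relative error
```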
52 Miscellaneous
53 Curvature domination

Given convex f and nonconvex g, consider a problem

$$\min_x \; f(x) + g(x)$$

For convex h, we can always rewrite this problem as

$$\min_x \; \underbrace{f(x) + g(x) - h(x)}_{F(x)} + h(x)$$

Sometimes we can choose h to make F smooth and strictly convex! How? Prove that ∇²f(x) ≻ ∇²(h - g)(x) for all x. This is apparently an old idea. See Parekh and Selesnick (2015), "Convex fused lasso denoising with non-convex regularization and its use for pulse detection", and references therein. (Related ideas on curvature domination from statistics: Zhang, Loh, Wainwright.)
54 Setting of Parekh and Selesnick (2015) (simplified):

$$\min_\beta \; \frac{1}{2}\|y - \beta\|_2^2 + \lambda \sum_{i=1}^{n-1} \phi_a(\beta_i - \beta_{i+1})$$

where φ_a(x) = (1/a) log(1 + a|x|), for a > 0. This uses a nonconvex segmentation penalty, so the whole problem appears to be nonconvex. Fact: s_a(x) = φ_a(x) - |x| is twice differentiable, strictly concave, with -a ≤ s_a''(x) ≤ 0 for all x.
55 Rewrite the problem (denoting by D the first difference operator) as

$$\min_\beta \; \underbrace{\frac{1}{2}\|y - \beta\|_2^2 + \lambda \sum_{i=1}^{n-1} \big( \phi_a([D\beta]_i) - |[D\beta]_i| \big)}_{F_a(\beta)} + \lambda \|D\beta\|_1$$

Now compute

$$\nabla^2 F_a(\beta) = I + \lambda D^T S_a(D\beta) D$$

where S_a(x) = diag(s_a''(x_1), ..., s_a''(x_{n-1})). It is easy to check that

$$D^T S_a(D\beta) D \succeq -a\, D^T D \succeq -4a\, I$$

using our previous fact, and λ_max(D^T D) = 4. Thus F_a(β), and hence our whole problem, is strictly convex provided 1 - 4aλ > 0, i.e., a < 1/(4λ).
56 [Figure from Parekh and Selesnick (2015): surface/contour plots of the criterion illustrating the convexity condition — the function G(x) is convex for a_0 = 1/2, a_1 = 1/8 (a small enough), and not convex for a_0 = 1/2, a_1 = 1/3 (a too large, violating their Theorem 1).]
57 Gradient descent converges to minimizers

Given f twice continuously differentiable, dom(f) = R^n, and an initial point x^(0) ∈ R^n, our old friend gradient descent repeats:

$$x^{(k)} = x^{(k-1)} - t_k \nabla f(x^{(k-1)}), \quad k = 1, 2, 3, \dots$$

When f has Lipschitz gradient, with constant L > 0, but is nonconvex, can anything be said? A very old problem. Remarkable result from Lee et al. (2016), "Gradient descent converges to minimizers": if the step sizes are small enough, t_k ≤ 1/L, k = 1, 2, 3, ..., and x^(0) is drawn from any density over R^n, then saddle points are unlikely limit points, i.e.,

$$\mathbb{P}\Big( \lim_{k \to \infty} x^{(k)} = x^\star \Big) = 0, \quad \text{for any isolated strict saddle point } x^\star \text{ of } f$$

Panageas and Piliouras (2016) have already relaxed the isolated saddle point and global Lipschitz conditions. Much progress since...
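A toy sketch of the phenomenon (not from the paper): gradient descent with a small fixed step size and a random start on the nonconvex function f(x_1, x_2) = (x_1^2 - 1)^2 + x_2^2, which has a strict saddle at (0, 0) and minimizers at (±1, 0).

```python
# Sketch: randomly initialized gradient descent avoids the strict saddle at the origin.
import numpy as np

def grad_f(x):
    # gradient of (x1^2 - 1)^2 + x2^2
    return np.array([4 * x[0] * (x[0] ** 2 - 1), 2 * x[1]])

rng = np.random.default_rng(0)
x = rng.standard_normal(2)          # random initialization (a density over R^2)
t = 0.05                            # small fixed step size
for _ in range(2000):
    x = x - t * grad_f(x)

print(x)                            # lands near (+1, 0) or (-1, 0), not at the saddle (0, 0)
```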