Nonconvex? NP! (No Problem!) Ryan Tibshirani, Convex Optimization 10-725/36-725


1 Nonconvex? NP! (No Problem!) Ryan Tibshirani, Convex Optimization 10-725/36-725

2 Beyond the tip?

3 Some takeaway points
- If possible, formulate the task in terms of convex optimization: typically easier to solve, easier to analyze
- Nonconvex does not necessarily mean nonscientific! However, statistically, it does typically mean high(er) variance
- In more cases than you might expect, nonconvex problems can be solved exactly (to global optimality)

4 What does it mean for a problem to be nonconvex?

Consider a generic convex optimization problem:

    min_x  f(x)
    subject to  h_i(x) ≤ 0, i = 1, ..., m
                l_j(x) = 0, j = 1, ..., r

Here f, h_i, i = 1, ..., m are convex, and l_j, j = 1, ..., r are affine. A nonconvex problem is one of this form, where not all conditions are met on the functions

But trivial modifications of convex problems can lead to nonconvex formulations... so we really just consider nonconvex problems that are not trivially equivalent to convex ones

5 What does it mean to solve a nonconvex problem?

Nonconvex problems can have local minima, i.e., there can exist a feasible x such that

    f(y) ≥ f(x) for all feasible y such that ‖x − y‖_2 ≤ R

but x is still not globally optimal. (Note: we proved that this could not happen for convex problems)

Hence by solving a nonconvex problem, we mean finding the global minimizer. We also implicitly mean doing it efficiently, i.e., in polynomial time

6 Addendum

This is really about putting together a list of cool problems that are surprisingly tractable... hence there will be exceptions about nonconvexity and/or requiring exact global optima

(Also, I'm sure that there are many more examples out there that I'm missing, so I invite you to contribute to the list!)

7 Outline

Rough categories for today's problems:
- Classical/core nonconvex problems
- Eigen problems
- Graph problems
- Nonconvex proximal operators
- Discrete problems
- Infinite-dimensional problems
- Statistical problems

8 Classical/core nonconvex problems

9 Linear-fractional programs

A linear-fractional program is of the form

    min_{x ∈ R^n}  (c^T x + d) / (e^T x + f)
    subject to  Gx ≤ h, e^T x + f > 0
                Ax = b

This is nonconvex (but quasiconvex). Provided that this problem is feasible, it is in fact equivalent to the linear program

    min_{y ∈ R^n, z ∈ R}  c^T y + dz
    subject to  Gy − hz ≤ 0, z ≥ 0
                Ay − bz = 0, e^T y + fz = 1

10 The link between the two problems is the transformation

    y = x / (e^T x + f),   z = 1 / (e^T x + f)

The proof of their equivalence is simple; e.g., see B & V Chapter 4

Linear-fractional problems show up in the study of solution paths for many common statistical estimation problems. E.g., the knots in the lasso path can be seen as the optimal values of linear-fractional programs. See Taylor et al. (2013), Tests in adaptive regression via the Kac-Rice formula
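To make the reformulation concrete, here is a minimal sketch (the problem data are made up purely for illustration) that hands the equivalent LP to scipy.optimize.linprog and then recovers the linear-fractional solution via x = y/z:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data for min (c^T x + d)/(e^T x + f) s.t. Gx <= h, Ax = b
c = np.array([1.0, -2.0]); d = 0.5
e = np.array([0.5, 1.0]);  f = 2.0
G = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]); h = np.array([4.0, 0.0, 0.0])
A = np.zeros((0, 2)); b = np.zeros(0)   # no equality constraints in this toy example

# Equivalent LP in (y, z): min c^T y + d z
#   s.t. Gy - hz <= 0, Ay - bz = 0, e^T y + f z = 1, z >= 0
n = len(c)
obj = np.concatenate([c, [d]])
A_ub = np.hstack([G, -h[:, None]])
b_ub = np.zeros(G.shape[0])
A_eq = np.vstack([np.hstack([A, -b[:, None]]), np.concatenate([e, [f]])])
b_eq = np.concatenate([np.zeros(A.shape[0]), [1.0]])
bounds = [(None, None)] * n + [(0, None)]

res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
y_opt, z_opt = res.x[:n], res.x[n]
x_opt = y_opt / z_opt   # recover the linear-fractional solution
```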

11 Geometric programs

A monomial is a function f : R^n_{++} → R of the form

    f(x) = γ x_1^{a_1} x_2^{a_2} ··· x_n^{a_n}

for γ > 0, a_1, ..., a_n ∈ R. A posynomial is a sum of monomials,

    f(x) = Σ_{k=1}^p γ_k x_1^{a_{k1}} x_2^{a_{k2}} ··· x_n^{a_{kn}}

A geometric program is a problem of the form

    min  f(x)
    subject to  g_i(x) ≤ 1, i = 1, ..., m
                h_j(x) = 1, j = 1, ..., r

where f, g_i, i = 1, ..., m are posynomials and h_j, j = 1, ..., r are monomials. This is nonconvex

12 This is equivalent to a convex problem, via a simple transformation. Given f(x) = γ x_1^{a_1} x_2^{a_2} ··· x_n^{a_n}, let y_i = log x_i and rewrite this as

    γ (e^{y_1})^{a_1} (e^{y_2})^{a_2} ··· (e^{y_n})^{a_n} = e^{a^T y + b}

for b = log γ. Also, a posynomial can be written as Σ_{k=1}^p e^{a_k^T y + b_k}. With this variable substitution, and after taking logs, a geometric program is equivalent to

    min  log( Σ_{k=1}^{p_0} e^{a_{0k}^T y + b_{0k}} )
    subject to  log( Σ_{k=1}^{p_i} e^{a_{ik}^T y + b_{ik}} ) ≤ 0, i = 1, ..., m
                c_j^T y + d_j = 0, j = 1, ..., r

This is convex, recalling the convexity of soft max functions
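A minimal sketch of this log-space transformation using cvxpy's log_sum_exp atom, on a toy GP that I made up for illustration (the specific problem and all data below are assumptions, not from the slides):

```python
import numpy as np
import cvxpy as cp

# Toy GP: minimize x1*x2 + 1/(x1*x2) subject to x1/x2 <= 1 and x1*x2^2 = 1.
# In log-space y = log(x), monomials become affine, posynomials become log-sum-exp.
A0 = np.array([[1.0, 1.0], [-1.0, -1.0]])   # exponents of the two objective terms
b0 = np.zeros(2)                            # log of the (unit) positive coefficients

y = cp.Variable(2)
objective = cp.Minimize(cp.log_sum_exp(A0 @ y + b0))
constraints = [
    np.array([1.0, -1.0]) @ y <= 0,   # log(x1/x2) <= 0
    np.array([1.0, 2.0]) @ y == 0,    # log(x1*x2^2) = 0, i.e. x1*x2^2 = 1
]
prob = cp.Problem(objective, constraints)
prob.solve()
x_opt = np.exp(y.value)   # map back to the original positive variables
```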

13 Many interesting problems are geometric programs; see Boyd et al. (2007), A tutorial on geometric programming, and also Chapter 8.8 of the B & V book

[Figure: layout example with cells C_i at positions (x_i, y_i), widths w_i, heights h_i, inside an H × W region]

Extension to matrix world: Sra and Hosseini (2013), Geometric optimization on positive definite matrices with application to elliptically contoured distributions

14 Handling convex equality constraints

Given convex f, h_i, i = 1, ..., m, the problem

    min_x  f(x)
    subject to  h_i(x) ≤ 0, i = 1, ..., m
                l(x) = 0

is nonconvex when l is convex but not affine. A convex relaxation of this problem is

    min_x  f(x)
    subject to  h_i(x) ≤ 0, i = 1, ..., m
                l(x) ≤ 0

If we can ensure that l(x*) = 0 at any solution x* of the above problem, then the two are equivalent

15 From B & V Exercises 4.6 and 4.58, e.g., consider the maximum utility problem

    max_{x_0,...,x_T ∈ R, b_1,...,b_{T+1} ∈ R}  Σ_{t=0}^T α_t u(x_t)
    subject to  b_{t+1} = b_t + f(b_t) − x_t, t = 0, ..., T
                0 ≤ x_t ≤ b_t, t = 0, ..., T

where b_0 ≥ 0 is fixed. Interpretation: x_t is the amount spent of your total available money b_t at time t; concave function u gives utility, concave function f measures investment return

This is not a convex problem, because of the equality constraint; but we can relax to

    b_{t+1} ≤ b_t + f(b_t) − x_t, t = 0, ..., T

without changing the solution (think about throwing out money)

16 Problems with two quadratic functions

Consider the problem involving two quadratics

    min_{x ∈ R^n}  x^T A_0 x + 2 b_0^T x + c_0
    subject to  x^T A_1 x + 2 b_1^T x + c_1 ≤ 0

Here A_0, A_1 need not be positive definite, so this is nonconvex. The dual problem can be cast as

    max_{u ∈ R, v ∈ R}  u
    subject to  [ A_0 + v A_1        b_0 + v b_1     ]
                [ (b_0 + v b_1)^T    c_0 + v c_1 − u ]  ⪰ 0
                v ≥ 0

and (as always) is convex. Furthermore, strong duality holds. See Appendix B of B & V, see also Beck and Eldar (2006), Strong duality in nonconvex quadratic optimization with two quadratic constraints

17 Eigen problems

18 Principal component analysis

Given a matrix Y ∈ R^{n×p}, consider the nonconvex problem

    min_{X ∈ R^{n×p}}  ‖Y − X‖_F^2
    subject to  rank(X) = k

for some fixed k. The solution here is given by the singular value decomposition of Y: if Y = UDV^T, then

    X̂ = U_k D_k V_k^T

where U_k, V_k are the first k columns of U, V, and D_k is the first k diagonal elements of D. I.e., X̂ is the reconstruction of Y from its first k principal components

This is often called the Eckart-Young Theorem, established in 1936, but was probably known even earlier; see Stewart (1992), On the early history of the singular value decomposition
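For instance, a quick numpy sketch of the Eckart-Young solution on a random toy matrix: truncate the SVD at rank k.

```python
import numpy as np

# Best rank-k approximation of Y via the SVD (toy data)
rng = np.random.default_rng(0)
Y = rng.standard_normal((50, 20))
k = 5

U, d, Vt = np.linalg.svd(Y, full_matrices=False)
X_hat = U[:, :k] @ np.diag(d[:k]) @ Vt[:k, :]   # = U_k D_k V_k^T

# Sanity check: X_hat has rank k, and its Frobenius error is the minimum possible
print(np.linalg.matrix_rank(X_hat), np.linalg.norm(Y - X_hat, "fro"))
```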

19 Fantope

Another characterization of the SVD is via the following nonconvex problem, given a symmetric matrix S ∈ R^{p×p}:

    min_{Z ∈ R^{p×p}}  ‖S − Z‖_F^2
    subject to  rank(Z) = k, Z is a projection

The solution here is Ẑ = V_k V_k^T, where the columns of V_k ∈ R^{p×k} give the first k eigenvectors of S

This is equivalent to a convex problem. Start by expressing the constraint set C as

    C = { Z ∈ R^{p×p} : rank(Z) = k, Z is a projection }
      = { Z ∈ R^{p×p} : Z = Z^T, λ_i(Z) ∈ {0, 1} for i = 1, ..., p, tr(Z) = k }

20 Now consider the convex hull F_k = conv(C):

    F_k = { Z ∈ R^{p×p} : Z = Z^T, λ_i(Z) ∈ [0, 1], i = 1, ..., p, tr(Z) = k }
        = { Z ∈ R^{p×p} : Z = Z^T, 0 ⪯ Z ⪯ I, tr(Z) = k }

This is called the Fantope of order k. Further, the convex problem

    min_{Z ∈ R^{p×p}}  ‖S − Z‖_F^2
    subject to  Z ∈ F_k

admits the same solution as the original one, i.e., Ẑ = V_k V_k^T

See Fan (1949), On a theorem of Weyl concerning eigenvalues of linear transformations, and Overton and Womersley (1992), On the sum of the largest eigenvalues of a symmetric matrix

Sparse PCA extension: Vu et al. (2013), Fantope projection and selection: near-optimal convex relaxation of sparse PCA
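A small numpy sketch (random symmetric toy matrix) of the common solution Ẑ = V_k V_k^T, computed from the top k eigenvectors:

```python
import numpy as np

# The Fantope-constrained projection of a symmetric S recovers V_k V_k^T
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 10))
S = (A + A.T) / 2
k = 3

evals, evecs = np.linalg.eigh(S)    # eigenvalues in ascending order
V_k = evecs[:, -k:]                 # top k eigenvectors
Z_hat = V_k @ V_k.T                 # solution of both the nonconvex and Fantope problems

print(np.allclose(Z_hat, Z_hat @ Z_hat), np.trace(Z_hat))  # a projection with trace k
```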

21 Classical multidimensional scaling

Let x_1, ..., x_n ∈ R^p, and define similarities S_ij = (x_i − x̄)^T (x_j − x̄). Classical multidimensional scaling solves the nonconvex problem

    min_{z_1,...,z_n ∈ R^k}  Σ_{i,j} ( S_ij − (z_i − z̄)^T (z_j − z̄) )^2

for a fixed k

[Figure from Hastie et al. (2009), The Elements of Statistical Learning]

22 Let S be the similarity matrix (entries S_ij = (x_i − x̄)^T (x_j − x̄))

The classical MDS problem has an exact solution in terms of the eigendecomposition S = U D^2 U^T: ẑ_1, ..., ẑ_n are the rows of U_k D_k, where U_k is the first k columns of U, and D_k the first k diagonal entries of D

Note: other very similar forms of MDS are not convex, and not directly solvable, e.g., least squares scaling, with d_ij = ‖x_i − x_j‖_2:

    min_{z_1,...,z_n ∈ R^k}  Σ_{i,j} ( d_ij − ‖z_i − z_j‖_2 )^2

See Hastie et al. (2009), Chapter 14
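A numpy sketch of the exact classical MDS solution on toy data, building S by centering the points and embedding via the top k eigenvectors (dimensions below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 5))
k = 2

Xc = X - X.mean(axis=0)          # center the points
S = Xc @ Xc.T                    # S_ij = (x_i - xbar)^T (x_j - xbar)

evals, evecs = np.linalg.eigh(S)                 # ascending eigenvalues
U_k = evecs[:, -k:][:, ::-1]                     # top k eigenvectors
D_k = np.sqrt(np.maximum(evals[-k:][::-1], 0))   # top k singular values of Xc
Z = U_k * D_k                                    # rows are the embeddings z_1, ..., z_n
```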

23 Generalized eigenvalue problems

Given B, W ∈ R^{p×p}, B, W ⪰ 0, consider the nonconvex problem

    max_{v ∈ R^p}  (v^T B v) / (v^T W v)

This is a generalized eigenvalue problem, with exact solution given by the top eigenvector of W^{-1} B

This is important, e.g., in Fisher's discriminant analysis, where B is the between-class covariance matrix, and W the within-class covariance matrix

See Hastie et al. (2009), Chapter 4
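In code, scipy solves the generalized symmetric eigenproblem directly; a sketch with made-up positive definite toy matrices:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
p = 6
Rb, Rw = rng.standard_normal((p, p)), rng.standard_normal((p, p))
B = Rb @ Rb.T + np.eye(p)
W = Rw @ Rw.T + np.eye(p)

evals, evecs = eigh(B, W)      # solves B v = lambda W v, eigenvalues ascending
v_top = evecs[:, -1]           # maximizer of v^T B v / v^T W v
print(v_top @ B @ v_top / (v_top @ W @ v_top), evals[-1])   # the two values agree
```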

24 Graph problems

25 Min cut

Given a graph G = (V, E) with V = {1, ..., n}, two nodes s, t ∈ V, and costs c_ij ≥ 0 on edges (i, j) ∈ E. Min cut problem:

    min_{b ∈ R^{|E|}, x ∈ R^{|V|}}  Σ_{(i,j)∈E} b_ij c_ij
    subject to  b_ij ≥ x_i − x_j
                b_ij, x_i, x_j ∈ {0, 1} for all i, j
                x_s = 0, x_t = 1

Think of b_ij as the indicator that the edge (i, j) traverses the cut from s to t; think of x_i as an indicator that node i is grouped with t. This nonconvex problem can be solved exactly using max flow (max flow/min cut theorem)

26 A relaxation of min cut

    min_{b ∈ R^{|E|}, x ∈ R^{|V|}}  Σ_{(i,j)∈E} b_ij c_ij
    subject to  b_ij ≥ x_i − x_j for all i, j
                b ≥ 0
                x_s = 0, x_t = 1

This is an LP; it is the dual of the max flow LP (see lecture 12):

    max_{f ∈ R^{|E|}}  Σ_{(s,j)∈E} f_sj
    subject to  f_ij ≥ 0, f_ij ≤ c_ij for all (i, j) ∈ E
                Σ_{(i,k)∈E} f_ik = Σ_{(k,j)∈E} f_kj for all k ∈ V \ {s, t}

Max flow min cut theorem tells us that the relaxed min cut is tight
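For example, a sketch using networkx (the graph, node names, and capacities below are made up) to solve min cut exactly through max flow:

```python
import networkx as nx

G = nx.DiGraph()
edges = [("s", "a", 3.0), ("s", "b", 2.0), ("a", "b", 1.0),
         ("a", "t", 2.0), ("b", "t", 3.0)]
for u, v, c in edges:
    G.add_edge(u, v, capacity=c)   # "capacity" plays the role of the cost c_ij

flow_value, flow_dict = nx.maximum_flow(G, "s", "t")
cut_value, (s_side, t_side) = nx.minimum_cut(G, "s", "t")
print(flow_value, cut_value)       # equal, by the max flow / min cut theorem
print(s_side, t_side)              # the two sides of an optimal cut
```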

27 Shortest paths

Given a graph G = (V, E), with edge costs c_e, e ∈ E, consider the shortest path problem between two nodes s, t ∈ V:

    min_{paths P = (e_1, ..., e_r)}  Σ_{e ∈ P} c_e
    subject to  e_{1,1} = s, e_{r,2} = t
                e_{i,2} = e_{i+1,1}, i = 1, ..., r − 1

Dijkstra's algorithm solves this problem (and more), from Dijkstra (1959), A note on two problems in connexion with graphs

Clever implementations run in O(|E| log |V|) time; e.g., see Kleinberg and Tardos (2005), Algorithm design, Chapter 5
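A compact sketch of Dijkstra's algorithm using Python's heapq (the adjacency structure and toy graph are my own):

```python
import heapq

def dijkstra(adj, s):
    """Shortest path distances from s; adj maps node -> list of (neighbor, cost >= 0).
    A standard heap-based implementation, O(|E| log |V|)."""
    dist = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, c in adj.get(u, []):
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Toy graph
adj = {"s": [("a", 1.0), ("b", 4.0)], "a": [("b", 2.0), ("t", 6.0)], "b": [("t", 1.0)]}
print(dijkstra(adj, "s")["t"])   # 4.0, via s -> a -> b -> t
```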

28 Nonconvex proximal operators

29 Hard-thresholding

One of the simplest nonconvex problems, given y ∈ R^n:

    min_{β ∈ R^n}  Σ_{i=1}^n (y_i − β_i)^2 + Σ_{i=1}^n λ_i 1{β_i ≠ 0}

Solution is given by hard-thresholding y,

    β_i = y_i if y_i^2 > λ_i, and β_i = 0 otherwise, i = 1, ..., n

and can be seen by inspection. Special case λ_i = λ, i = 1, ..., n:

    min_{β ∈ R^n}  ‖y − β‖_2^2 + λ ‖β‖_0

Compare to soft-thresholding, the prox operator for the ℓ1 penalty. Note: changing the loss to ‖y − Xβ‖_2^2 gives best subset selection, which is NP hard for general X
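The solution is one line of numpy; a small sketch:

```python
import numpy as np

def hard_threshold(y, lam):
    """Solution of the (nonconvex) weighted l0 prox problem: keep y_i if y_i^2 > lam_i, else 0."""
    y, lam = np.asarray(y, float), np.asarray(lam, float)
    return np.where(y**2 > lam, y, 0.0)

print(hard_threshold([3.0, 0.5, -1.2], lam=1.0))   # [ 3.   0.  -1.2]
```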

30 ℓ0 segmentation

Consider the nonconvex ℓ0 segmentation problem

    min_{β ∈ R^n}  Σ_{i=1}^n (y_i − β_i)^2 + λ Σ_{i=1}^{n−1} 1{β_i ≠ β_{i+1}}

Can be solved exactly using dynamic programming, in two ways: Bellman (1961), On the approximation of curves by line segments using dynamic programming, and Johnson (2013), A dynamic programming algorithm for the fused lasso and L0-segmentation

Johnson: more efficient, Bellman: more general

Worst-case O(n^2), but with practical performance more like O(n)

31 Tree-leaves projection

Given target u ∈ R^n, tree g on R^n, and label y ∈ {0, 1}, consider

    min_{z ∈ R^n}  ‖u − z‖_2^2 + λ 1{g(z) ≠ y}

Interpretation: find z close to u, whose label under g is not unlike y. Argue directly that the solution is either ẑ = u or ẑ = P_S(u), where

    S = g^{−1}(y) = {z : g(z) = y}

the set of leaves of g assigned label y. We simply compute both options for ẑ and compare costs. Therefore the problem reduces to computing P_S(u), the projection onto a set of tree leaves, a highly nonconvex set

This appears as a subroutine of a broader algorithm for nonconvex optimization; see Carreira-Perpinan and Wang (2012), Distributed optimization of deeply nested systems

32 The set S is a union of axis-aligned boxes; projection onto any one box is fast, O(n) operations
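For a single axis-aligned box, the projection is just coordinate-wise clipping; a one-line numpy sketch (toy values):

```python
import numpy as np

# Projection of u onto the box [lo, hi] is coordinate-wise clipping, O(n)
u = np.array([0.2, -1.5, 3.0])
lo = np.array([0.0, -1.0, 0.0])
hi = np.array([1.0, 1.0, 2.0])
proj = np.clip(u, lo, hi)     # [0.2, -1.0, 2.0]
```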

33 To project onto S, could just scan through all boxes, and take the closest

Faster: decorate each node of the tree with the labels of its leaves, and a bounding box. Perform depth-first search, pruning nodes that do not contain a leaf labeled y, or whose bounding box is farther away than the current closest box

34 Discrete problems

35 Binary graph segmentation

Given y ∈ R^n, and a graph G = (V, E), V = {1, ..., n}, consider binary graph segmentation:

    min_{β ∈ {0,1}^n}  Σ_{i=1}^n (y_i − β_i)^2 + Σ_{(i,j)∈E} λ_ij 1{β_i ≠ β_j}

Simple manipulation brings this problem to the form

    max_{A ⊆ {1,...,n}}  Σ_{i ∈ A} a_i + Σ_{j ∈ A^c} b_j − Σ_{(i,j)∈E, |A ∩ {i,j}| = 1} λ_ij

which is a segmentation problem that can be solved exactly using min cut/max flow. E.g., Kleinberg and Tardos (2005), Algorithm design, Chapter 7

36 E.g., apply recursively to get a version of graph hierarchical clustering (divisive)

E.g., take the graph as a 2d grid for image segmentation (From ac.kr)

37 Discrete ℓ0 segmentation

Now consider discrete ℓ0 segmentation:

    min_{β ∈ {b_1,...,b_k}^n}  Σ_{i=1}^n (y_i − β_i)^2 + λ Σ_{i=1}^{n−1} 1{β_i ≠ β_{i+1}}

where {b_1, ..., b_k} is some fixed discrete set. This can be efficiently solved using classic (discrete) dynamic programming

Key insight is that the 1-dimensional structure allows us to exactly solve and store

    β̂_1(β_2) = argmin_{β_1 ∈ {b_1,...,b_k}}  (y_1 − β_1)^2 + λ 1{β_1 ≠ β_2}    [call this objective f_1(β_1, β_2)]
    β̂_2(β_3) = argmin_{β_2 ∈ {b_1,...,b_k}}  f_1(β̂_1(β_2), β_2) + (y_2 − β_2)^2 + λ 1{β_2 ≠ β_3}
    ...

38 Algorithm:
- Make a forward pass over β_1, ..., β_{n−1}, keeping a look-up table; also keep a look-up table for the optimal partial criterion values f_1, ..., f_{n−1}
- Solve exactly for β_n
- Make a backward pass over β_{n−1}, ..., β_1, reading off the look-up table

[Look-up tables: β̂_1, ..., β̂_{n−1} and f_1, ..., f_{n−1}, each stored over the k candidate levels b_1, ..., b_k]

Requires O(nk) operations
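A compact numpy sketch of this recursion (variable names and the toy example are my own; it illustrates the forward/backward passes, not the specific implementations of Bellman or Johnson):

```python
import numpy as np

def l0_segment_discrete(y, levels, lam):
    """Exact discrete l0 segmentation by dynamic programming (sketch).

    Minimizes sum_i (y_i - beta_i)^2 + lam * sum_i 1{beta_i != beta_{i+1}}
    with each beta_i restricted to the given discrete levels."""
    y = np.asarray(y, dtype=float)
    b = np.asarray(levels, dtype=float)          # candidate levels b_1, ..., b_k
    n, k = len(y), len(b)

    # f[j] = optimal cost of the first i terms, as a function of beta_{i+1} = b[j]
    f = np.zeros(k)
    choice = np.zeros((n - 1, k), dtype=int)     # look-up table of argmins

    for i in range(n - 1):                       # forward pass
        # cost[l, j] = f[l] + (y_i - b[l])^2 + lam * 1{b[l] != b[j]}
        base = f + (y[i] - b) ** 2
        cost = base[:, None] + lam * (np.arange(k)[:, None] != np.arange(k)[None, :])
        choice[i] = np.argmin(cost, axis=0)
        f = np.min(cost, axis=0)

    # Last variable: pick beta_n directly
    beta_idx = np.empty(n, dtype=int)
    beta_idx[-1] = np.argmin(f + (y[-1] - b) ** 2)

    for i in range(n - 2, -1, -1):               # backward pass: read off the table
        beta_idx[i] = choice[i, beta_idx[i + 1]]
    return b[beta_idx]

# Toy example: a noisy two-level signal
rng = np.random.default_rng(0)
y = np.concatenate([np.full(20, 0.1), np.full(20, 2.0)]) + 0.05 * rng.standard_normal(40)
print(l0_segment_discrete(y, levels=np.linspace(0, 2, 21), lam=0.5))
```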

39 Infinite-dimensional problems

40 Smoothing splines

Given pairs (x_i, y_i) ∈ R × R, i = 1, ..., n, smoothing splines solve

    min_f  Σ_{i=1}^n ( y_i − f(x_i) )^2 + λ ∫ ( f^{((k+1)/2)}(t) )^2 dt

for a fixed odd k. The domain of minimization here is all functions f for which ∫ (f^{((k+1)/2)}(t))^2 dt < ∞. Infinite-dimensional problem, but convex (in function space)

Can show that the solution f̂ to the above problem is unique, and given by a natural spline of order k, with knots at x_1, ..., x_n. This means we can restrict our attention to functions

    f = Σ_{j=1}^n θ_j η_j

where η_1, ..., η_n are natural spline basis functions

41 Plugging in f = Σ_{j=1}^n θ_j η_j, transform the smoothing spline problem into finite-dimensional form:

    min_{θ ∈ R^n}  ‖y − Nθ‖_2^2 + λ θ^T Ω θ

where N_ij = η_j(x_i), and Ω_ij = ∫ η_i^{((k+1)/2)}(t) η_j^{((k+1)/2)}(t) dt. The solution is explicitly given by

    θ̂ = (N^T N + λΩ)^{−1} N^T y

and the fitted function is f̂ = Σ_{j=1}^n θ̂_j η_j. With proper choice of basis functions (B-splines), calculation of θ̂ is O(n)

See, e.g., Wahba (1990), Spline models for observational data; Green and Silverman (1994), Nonparametric regression and generalized linear models; Hastie et al. (2009), Chapter 5
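In code, the finite-dimensional solve is a single linear system. The sketch below assumes the basis matrix N (N_ij = η_j(x_i)) and penalty matrix Ω have already been computed; random stand-ins are used here only so the snippet runs:

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam = 50, 0.1
N = rng.standard_normal((n, n))        # stand-in for the natural spline basis matrix
Omega = np.eye(n)                      # stand-in for the true penalty matrix
y = rng.standard_normal(n)

theta_hat = np.linalg.solve(N.T @ N + lam * Omega, N.T @ y)
y_fit = N @ theta_hat                  # fitted values f_hat(x_i)
```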

42 Locally adaptive regression splines

Given same setup, locally adaptive regression splines solve

    min_f  Σ_{i=1}^n ( y_i − f(x_i) )^2 + λ TV(f^{(k)})

for fixed k, even or odd. The domain is all f with TV(f^{(k)}) < ∞, and again this is infinite-dimensional but convex

Again, can show that a solution f̂ to the above problem is given by a spline of order k, but two key differences:
- Can have any number of knots ≤ n − k − 1 (tuned by λ)
- Knots do not necessarily coincide with input points x_1, ..., x_n

See Mammen and van de Geer (1997), Locally adaptive regression splines; in short, these are statistically more adaptive but computationally more challenging than smoothing splines

43 Mammen and van de Geer (1997) consider restricting attention to splines with knots contained in {x_1, ..., x_n}; this turns the problem into the finite-dimensional form

    min_{θ ∈ R^n}  ‖y − Gθ‖_2^2 + λ Σ_{j=k+2}^n |θ_j|

where G_ij = g_j(x_i), and g_1, ..., g_n is a basis for splines with knots at x_1, ..., x_n. The fitted function is f̂ = Σ_{j=1}^n θ̂_j g_j

These authors prove that the solution f̂ of this (tractable) problem and the solution f* of the original problem differ by

    max_{x ∈ [x_1, x_n]} | f̂(x) − f*(x) | ≤ d_k · TV((f*)^{(k)}) · Δ^k

with Δ the maximum gap between inputs. Therefore, statistically it is reasonable to solve the finite-dimensional problem
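A sketch of solving this lasso-type finite-dimensional problem with cvxpy; the truncated power basis below is only an illustrative stand-in for the spline basis g_1, ..., g_n (the exact basis construction in Mammen and van de Geer differs), and the data are synthetic:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, k, lam = 100, 2, 5.0
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(8 * x) + 0.1 * rng.standard_normal(n)

# Stand-in basis: polynomials 1, x, ..., x^k plus truncated power functions
# (x - x_j)_+^k with knots at the first n-k-1 inputs (n columns in total).
G = np.column_stack(
    [x**j for j in range(k + 1)]
    + [np.maximum(x - x[j], 0) ** k for j in range(n - k - 1)]
)

theta = cp.Variable(n)
obj = cp.sum_squares(y - G @ theta) + lam * cp.norm1(theta[k + 1:])  # penalize theta_{k+2}, ..., theta_n
prob = cp.Problem(cp.Minimize(obj))
prob.solve()
f_hat = G @ theta.value   # fitted values at the inputs
```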

44 E.g., a comparison, tuned to the same overall model complexity:

[Figure: left panel, smoothing spline fit; right panel, finite-dimensional locally adaptive regression spline fit]

The left fit is easier to compute, but the right is more adaptive

(Note: trend filtering estimates are asymptotically equivalent to locally adaptive regression splines, but much more efficient)

45 Statistical problems

46 Sparse underdetermined linear systems

Suppose that X ∈ R^{n×p} has unit normed columns, ‖X_i‖_2 = 1, for i = 1, ..., p. Given y, consider the problem of finding the sparsest solution of the linear system:

    min_{β ∈ R^p}  ‖β‖_0
    subject to  Xβ = y

This is nonconvex and known to be NP hard, for a generic X. A natural convex relaxation is the ℓ1 basis pursuit problem:

    min_{β ∈ R^p}  ‖β‖_1
    subject to  Xβ = y

It turns out that there is a deep connection between the two; we cite results from Donoho (2006), For most large underdetermined systems of linear equations, the minimal ℓ1 norm solution is also the sparsest solution
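A short cvxpy sketch of basis pursuit on synthetic data (dimensions and sparsity level chosen arbitrarily for illustration):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p, s = 40, 100, 5
X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)          # unit-norm columns
beta_true = np.zeros(p)
beta_true[rng.choice(p, s, replace=False)] = rng.standard_normal(s)
y = X @ beta_true

beta = cp.Variable(p)
prob = cp.Problem(cp.Minimize(cp.norm1(beta)), [X @ beta == y])
prob.solve()
print(np.max(np.abs(beta.value - beta_true)))   # typically near zero in this sparse regime
```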

47 As n, p grow large, p > n, there exists a threshold ρ (depending on the ratio p/n), such that for most matrices X, if we solve the ℓ1 problem and find a solution with:
- fewer than ρn nonzero components, then this is the unique solution of the ℓ0 problem
- greater than ρn nonzero components, then there is no solution of the linear system with less than ρn nonzero components

(Here "most" is quantified precisely in terms of a probability over matrices X, constructed by drawing columns of X uniformly at random over the unit sphere in R^n)

There is a large and fast-moving body of related literature. See Donoho et al. (2009), Message-passing algorithms for compressed sensing for a nice review

48 Nearly optimal K-means

Given data points x_1, ..., x_n ∈ R^p, the K-means problem solves

    min_{c_1,...,c_K ∈ R^p}  f(c_1, ..., c_K) = (1/n) Σ_{i=1}^n min_{k=1,...,K} ‖x_i − c_k‖_2^2

This is NP hard, and is usually approximately solved using Lloyd's algorithm, run many times, with random starts

Careful choice of starting positions makes a big impact: running Lloyd's algorithm once, from c_1 = s_1, ..., c_K = s_K, for cleverly chosen random s_1, ..., s_K, yields estimates ĉ_1, ..., ĉ_K satisfying

    E[ f(ĉ_1, ..., ĉ_K) ] ≤ 8(log K + 2) · min_{c_1,...,c_K ∈ R^p} f(c_1, ..., c_K)

49 See Arthur and Vassilvitskii (2007), k-means++: The advantages of careful seeding. In fact, their construction of s_1, ..., s_K is very simple:
- Begin by choosing s_1 uniformly at random among x_1, ..., x_n
- Compute squared distances d_i^2 = ‖x_i − s_1‖_2^2 for all points i not chosen, and choose s_2 by drawing from the remaining points, with probability weights d_i^2 / Σ_j d_j^2
- Recompute the squared distances as d_i^2 = min{ ‖x_i − s_1‖_2^2, ‖x_i − s_2‖_2^2 } and choose s_3 according to the same recipe
- And so on, until s_1, ..., s_K are chosen
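A short numpy sketch of this seeding rule (function name and toy data are my own; the seeds would then be passed to a Lloyd's algorithm implementation of your choice):

```python
import numpy as np

def kmeans_pp_seeds(X, K, rng=None):
    """k-means++ seeding (Arthur & Vassilvitskii, 2007), sketched: each new seed is drawn
    with probability proportional to its squared distance to the closest seed so far."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    seeds = [X[rng.integers(n)]]                      # s_1 uniform at random
    for _ in range(K - 1):
        d2 = np.min([np.sum((X - s) ** 2, axis=1) for s in seeds], axis=0)
        probs = d2 / d2.sum()                         # already-chosen points get weight 0
        seeds.append(X[rng.choice(n, p=probs)])
    return np.array(seeds)

# Toy data
X = np.random.default_rng(0).standard_normal((200, 2))
print(kmeans_pp_seeds(X, K=3, rng=1))
```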
