Convex Optimization M2


1 Convex Optimization M2, Lecture 8. A. d'Aspremont.

2 Applications

3 Outline
Geometrical problems
Approximation problems
Combinatorial optimization
Statistics

4 Geometrical problems

5 Linear discrimination
Separate two sets of points $\{x_1,\dots,x_N\}$ and $\{y_1,\dots,y_M\}$ by a hyperplane:
$$a^T x_i + b > 0, \quad i = 1,\dots,N, \qquad a^T y_i + b < 0, \quad i = 1,\dots,M.$$
Homogeneous in $a$, $b$, hence equivalent to
$$a^T x_i + b \geq 1, \quad i = 1,\dots,N, \qquad a^T y_i + b \leq -1, \quad i = 1,\dots,M,$$
a set of linear inequalities in $a$, $b$.

6 Robust linear discrimination
The (Euclidean) distance between the hyperplanes
$$H_1 = \{z \mid a^T z + b = 1\}, \qquad H_2 = \{z \mid a^T z + b = -1\}$$
is $\mathrm{dist}(H_1, H_2) = 2/\|a\|_2$.
To separate the two sets of points by maximum margin,
$$\begin{array}{ll} \mbox{minimize} & (1/2)\|a\|_2 \\ \mbox{subject to} & a^T x_i + b \geq 1, \quad i = 1,\dots,N \\ & a^T y_i + b \leq -1, \quad i = 1,\dots,M \end{array} \qquad (1)$$
which is (after squaring the objective) a QP in $a$, $b$.
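As an illustration, here is a minimal numerical sketch of this maximum-margin QP. The use of cvxpy and the synthetic point sets are assumptions for the example, not part of the slides.

```python
import numpy as np
import cvxpy as cp

# Synthetic point sets (assumption: drawn far enough apart to be linearly separable)
rng = np.random.default_rng(0)
X1 = rng.normal(loc=+3.0, size=(30, 2))   # points x_i
X2 = rng.normal(loc=-3.0, size=(40, 2))   # points y_i

a = cp.Variable(2)
b = cp.Variable()

# minimize (1/2)||a||_2^2 (squared objective)  s.t.  a^T x_i + b >= 1,  a^T y_i + b <= -1
constraints = [X1 @ a + b >= 1, X2 @ a + b <= -1]
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(a)), constraints)
prob.solve()

print("margin width 2/||a||_2 =", 2 / np.linalg.norm(a.value))
```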

7 Lagrange dual of maximum margin separation problem
$$\begin{array}{ll} \mbox{maximize} & \mathbf{1}^T\lambda + \mathbf{1}^T\mu \\ \mbox{subject to} & 2\left\|\sum_{i=1}^N \lambda_i x_i - \sum_{i=1}^M \mu_i y_i\right\|_2 \leq 1 \\ & \mathbf{1}^T\lambda = \mathbf{1}^T\mu, \quad \lambda \succeq 0, \quad \mu \succeq 0 \end{array} \qquad (2)$$
From duality, the optimal value is the inverse of the maximum margin of separation.
Interpretation:
change variables to $\theta_i = \lambda_i/\mathbf{1}^T\lambda$, $\gamma_i = \mu_i/\mathbf{1}^T\mu$, $t = 1/(\mathbf{1}^T\lambda + \mathbf{1}^T\mu)$
invert the objective to minimize $1/(\mathbf{1}^T\lambda + \mathbf{1}^T\mu) = t$:
$$\begin{array}{ll} \mbox{minimize} & t \\ \mbox{subject to} & \left\|\sum_{i=1}^N \theta_i x_i - \sum_{i=1}^M \gamma_i y_i\right\|_2 \leq t \\ & \theta \succeq 0, \ \mathbf{1}^T\theta = 1, \quad \gamma \succeq 0, \ \mathbf{1}^T\gamma = 1 \end{array}$$
The optimal value is the distance between the convex hulls.

8 Approximate linear separation of non-separable sets
$$\begin{array}{ll} \mbox{minimize} & \mathbf{1}^T u + \mathbf{1}^T v \\ \mbox{subject to} & a^T x_i + b \geq 1 - u_i, \quad i = 1,\dots,N \\ & a^T y_i + b \leq -1 + v_i, \quad i = 1,\dots,M \\ & u \succeq 0, \quad v \succeq 0 \end{array}$$
an LP in $a$, $b$, $u$, $v$; at the optimum, $u_i = \max\{0, 1 - a^T x_i - b\}$ and $v_i = \max\{0, 1 + a^T y_i + b\}$.
Can be interpreted as a heuristic for minimizing the number of misclassified points.

9 Support vector classifier
$$\begin{array}{ll} \mbox{minimize} & \|a\|_2 + \gamma(\mathbf{1}^T u + \mathbf{1}^T v) \\ \mbox{subject to} & a^T x_i + b \geq 1 - u_i, \quad i = 1,\dots,N \\ & a^T y_i + b \leq -1 + v_i, \quad i = 1,\dots,M \\ & u \succeq 0, \quad v \succeq 0 \end{array}$$
produces a point on the trade-off curve between the inverse of the margin $2/\|a\|_2$ and the classification error, measured by the total slack $\mathbf{1}^T u + \mathbf{1}^T v$.
(Same example as the previous page, with $\gamma = 0.1$.)

10 Support Vector Machines: Duality
Given $m$ data points $x_i \in \mathbf{R}^n$ with labels $y_i \in \{-1, 1\}$, the maximum margin classification problem can be written
$$\begin{array}{ll} \mbox{minimize} & \frac{1}{2}\|w\|_2^2 + C\,\mathbf{1}^T z \\ \mbox{subject to} & y_i(w^T x_i) \geq 1 - z_i, \quad i = 1,\dots,m \\ & z \succeq 0 \end{array}$$
in the variables $w \in \mathbf{R}^n$, $z \in \mathbf{R}^m$, with parameter $C > 0$.
We can append a constant component to each $x_i$ (increasing the problem dimension by one) and absorb the offset into $w$, so we can assume w.l.o.g. $b = 0$ in the classifier $w^T x_i + b$.
The Lagrangian is written
$$L(w, z, \alpha) = \frac{1}{2}\|w\|_2^2 + C\,\mathbf{1}^T z + \sum_{i=1}^m \alpha_i\left(1 - z_i - y_i\, w^T x_i\right)$$
with dual variable $\alpha \in \mathbf{R}^m_+$.

11 Support Vector Machines: Duality
The Lagrangian can be rewritten
$$L(w, z, \alpha) = \frac{1}{2}\left\|w - \sum_{i=1}^m \alpha_i y_i x_i\right\|_2^2 - \frac{1}{2}\left\|\sum_{i=1}^m \alpha_i y_i x_i\right\|_2^2 + (C\mathbf{1} - \alpha)^T z + \mathbf{1}^T\alpha$$
with dual variable $\alpha \in \mathbf{R}^m_+$. Minimizing in $(w, z)$ we form the dual problem
$$\begin{array}{ll} \mbox{maximize} & -\frac{1}{2}\left\|\sum_{i=1}^m \alpha_i y_i x_i\right\|_2^2 + \mathbf{1}^T\alpha \\ \mbox{subject to} & 0 \preceq \alpha \preceq C\mathbf{1} \end{array}$$
At the optimum, we must have
$$w = \sum_{i=1}^m \alpha_i y_i x_i \quad \mbox{and} \quad \alpha_i = C \ \mbox{if}\ z_i > 0$$
(this is the representer theorem).

12 Support Vector Machines: the kernel trick
If we write $X$ for the data matrix with columns $x_i$, the dual can be rewritten
$$\begin{array}{ll} \mbox{maximize} & -\frac{1}{2}\alpha^T \mathrm{diag}(y)\, X^T X\, \mathrm{diag}(y)\,\alpha + \mathbf{1}^T\alpha \\ \mbox{subject to} & 0 \preceq \alpha \preceq C\mathbf{1} \end{array}$$
This means that the data only appears in the dual through the Gram matrix
$$K = X^T X,$$
which is called the kernel matrix.
In particular, the original dimension $n$ does not appear in the dual: SVM complexity only grows with the number of samples. In particular, the $x_i$ are allowed to be infinite dimensional. The only requirement on $K$ is that $K \succeq 0$.
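A small sketch of this dual written so that the data only enters through the Gram matrix $K = X^T X$. The use of cvxpy and the toy data are assumptions; replacing $K$ by any positive semidefinite kernel matrix gives the kernelized version.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m = 40
X = np.hstack([rng.normal(+2, 1, (2, 20)), rng.normal(-2, 1, (2, 20))])  # columns x_i
y = np.concatenate([np.ones(20), -np.ones(20)])
C = 1.0

K = X.T @ X      # Gram (kernel) matrix: the data enters the dual only through K

alpha = cp.Variable(m)
# For the linear kernel, alpha^T diag(y) K diag(y) alpha = ||X diag(y) alpha||_2^2,
# which keeps the problem in DCP form; for a general PSD kernel matrix K one would
# use cp.quad_form(alpha, np.diag(y) @ K @ np.diag(y)) instead.
obj = cp.Maximize(cp.sum(alpha) - 0.5 * cp.sum_squares(X @ cp.multiply(y, alpha)))
prob = cp.Problem(obj, [alpha >= 0, alpha <= C])
prob.solve()

w = X @ (y * alpha.value)   # w = sum_i alpha_i y_i x_i (representer theorem, linear kernel)
print("dual optimum:", prob.value, " support vectors:", int(np.sum(alpha.value > 1e-4)))
```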

13 Approximation problems

14 Norm approximation
$$\mbox{minimize} \quad \|Ax - b\|$$
($A \in \mathbf{R}^{m\times n}$ with $m \geq n$, $\|\cdot\|$ is a norm on $\mathbf{R}^m$)
Interpretations of the solution $x^\star = \mathop{\rm argmin}_x \|Ax - b\|$:
geometric: $Ax^\star$ is the point in $\mathcal{R}(A)$ closest to $b$
estimation: linear measurement model $y = Ax + v$; $y$ are measurements, $x$ is unknown, $v$ is measurement error; given $y = b$, the best guess of $x$ is $x^\star$
optimal design: $x$ are design variables (input), $Ax$ is the result (output); $x^\star$ is the design that best approximates the desired result $b$

15 Examples
Least-squares approximation ($\|\cdot\|_2$): the solution satisfies the normal equations
$$A^T A x = A^T b$$
($x^\star = (A^T A)^{-1}A^T b$ if $\mathrm{rank}\,A = n$).
Chebyshev approximation ($\|\cdot\|_\infty$): can be solved as an LP
$$\begin{array}{ll} \mbox{minimize} & t \\ \mbox{subject to} & -t\mathbf{1} \preceq Ax - b \preceq t\mathbf{1} \end{array}$$
Sum of absolute residuals approximation ($\|\cdot\|_1$): can be solved as an LP
$$\begin{array}{ll} \mbox{minimize} & \mathbf{1}^T y \\ \mbox{subject to} & -y \preceq Ax - b \preceq y \end{array}$$
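A short sketch comparing the three penalties on random data. cvxpy is an assumption here; the LP reformulations above are handled automatically by the modeling layer.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n = 100, 30
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

def approx(p):
    """Minimize ||Ax - b||_p for p in {1, 2, 'inf'} and return the minimizer."""
    x = cp.Variable(n)
    cp.Problem(cp.Minimize(cp.norm(A @ x - b, p))).solve()
    return x.value

x1, x2, xinf = approx(1), approx(2), approx("inf")

# the l2 solution should satisfy the normal equations A^T A x = A^T b
print("normal-equation residual:", np.linalg.norm(A.T @ A @ x2 - A.T @ b))
for name, xs in [("l1  ", x1), ("l2  ", x2), ("linf", xinf)]:
    r = A @ xs - b
    print(name, "max|r_i| =", np.abs(r).max(), " #(r_i ~ 0) =", int(np.sum(np.abs(r) < 1e-6)))
```

The residual counts illustrate the point of the next slide: the $\ell_1$ fit drives many residuals exactly to zero, while the Chebyshev fit equalizes the largest ones.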

16 Penalty function approximation
$$\begin{array}{ll} \mbox{minimize} & \phi(r_1) + \cdots + \phi(r_m) \\ \mbox{subject to} & r = Ax - b \end{array}$$
($A \in \mathbf{R}^{m\times n}$, $\phi : \mathbf{R} \rightarrow \mathbf{R}$ is a convex penalty function)
Examples:
quadratic: $\phi(u) = u^2$
deadzone-linear with width $a$: $\phi(u) = \max\{0, |u| - a\}$
log-barrier with limit $a$: $\phi(u) = -a^2\log(1 - (u/a)^2)$ for $|u| < a$, $+\infty$ otherwise
[Figure: the quadratic, deadzone-linear and log-barrier penalties $\phi(u)$ plotted against $u$.]

17 Example ($m = 100$, $n = 30$): histogram of residuals for the penalties
$$\phi(u) = |u|, \qquad \phi(u) = u^2, \qquad \phi(u) = \max\{0, |u| - a\}, \qquad \phi(u) = -\log(1 - u^2)$$
[Figure: four histograms of the residuals $r$, for $p = 1$, $p = 2$, deadzone-linear and log-barrier penalties.]
The shape of the penalty function has a large effect on the distribution of the residuals.

18 Huber penalty function (with parameter $M$)
$$\phi_{\rm hub}(u) = \begin{cases} u^2 & |u| \leq M \\ M(2|u| - M) & |u| > M \end{cases}$$
Linear growth for large $u$ makes the approximation less sensitive to outliers.
[Figure, left: Huber penalty $\phi_{\rm hub}(u)$ for $M = 1$. Right: affine function $f(t) = \alpha + \beta t$ fitted to 42 points $(t_i, y_i)$ (circles) using quadratic (dashed) and Huber (solid) penalties.]
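A quick sketch of the robust fit described above, using cvxpy's huber atom; the synthetic data with injected outliers is an assumption for the example.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
t = np.linspace(-10, 10, 42)
y = 1.0 + 0.5 * t + rng.normal(scale=0.5, size=t.size)
y[::10] += 15.0                       # a few large outliers

alpha, beta = cp.Variable(), cp.Variable()
r = alpha + beta * t - y

cp.Problem(cp.Minimize(cp.sum_squares(r))).solve()
ab_ls = (alpha.value, beta.value)     # quadratic penalty, pulled toward the outliers

cp.Problem(cp.Minimize(cp.sum(cp.huber(r, M=1.0)))).solve()
ab_hub = (alpha.value, beta.value)    # Huber penalty, less affected by the outliers

print("quadratic fit:", ab_ls)
print("Huber fit:    ", ab_hub)
```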

19 Combinatorial problems

20 Nonconvex Problems
Nonconvexity makes problems essentially intractable...
sometimes it is the result of bad problem formulation
however, it often arises because of some natural limitation: fixed transaction costs, binary communications, ...
What can be done?... We will use convex optimization results to:
find bounds on the optimal value, by relaxation
get good feasible points via randomization

21 Nonconvex Problems
Focus here on a specific class of problems: general QCQPs, written
$$\begin{array}{ll} \mbox{minimize} & x^T P_0 x + q_0^T x + r_0 \\ \mbox{subject to} & x^T P_i x + q_i^T x + r_i \leq 0, \quad i = 1,\dots,m \end{array}$$
If all $P_i$ are p.s.d., this is a convex problem... so here we suppose that at least one $P_i$ is not p.s.d.

22 Example: Boolean Least Squares
Boolean least-squares problem:
$$\begin{array}{ll} \mbox{minimize} & \|Ax - b\|^2 \\ \mbox{subject to} & x_i^2 = 1, \quad i = 1,\dots,n \end{array}$$
a basic problem in digital communications
could check all $2^n$ possible values of $x$...
an NP-hard problem, and very hard in practice
many heuristics for approximate solutions

23 Example: Partitioning Problem
Two-way partitioning problem, described in exercise 5.39 of the textbook:
$$\begin{array}{ll} \mbox{minimize} & x^T W x \\ \mbox{subject to} & x_i^2 = 1, \quad i = 1,\dots,n \end{array}$$
where $W \in \mathbf{S}^n$, with $W_{ii} = 0$.
A feasible $x$ corresponds to the partition $\{1,\dots,n\} = \{i \mid x_i = 1\} \cup \{i \mid x_i = -1\}$.
The matrix coefficient $W_{ij}$ can be interpreted as the cost of having elements $i$ and $j$ in the same set of the partition.
The objective is to find the partition with least total cost.
Classic particular instance: MAXCUT ($W_{ij} \geq 0$).

24 Convex Relaxation
The original QCQP
$$\begin{array}{ll} \mbox{minimize} & x^T P_0 x + q_0^T x + r_0 \\ \mbox{subject to} & x^T P_i x + q_i^T x + r_i \leq 0, \quad i = 1,\dots,m \end{array}$$
can be rewritten
$$\begin{array}{ll} \mbox{minimize} & \mathrm{Tr}(XP_0) + q_0^T x + r_0 \\ \mbox{subject to} & \mathrm{Tr}(XP_i) + q_i^T x + r_i \leq 0, \quad i = 1,\dots,m \\ & X \succeq xx^T \\ & \mathrm{Rank}(X) = 1 \end{array}$$
The only nonconvex constraint is now $\mathrm{Rank}(X) = 1$...

25 Convex Relaxation: Semidefinite Relaxation
We can directly relax this last constraint, i.e. drop the nonconvex $\mathrm{Rank}(X) = 1$ and keep only $X \succeq xx^T$.
The resulting program gives a lower bound on the optimal value:
$$\begin{array}{ll} \mbox{minimize} & \mathrm{Tr}(XP_0) + q_0^T x + r_0 \\ \mbox{subject to} & \mathrm{Tr}(XP_i) + q_i^T x + r_i \leq 0, \quad i = 1,\dots,m \\ & X \succeq xx^T \end{array}$$
Tricky... Can it be improved?

26 Lagrangian Relaxation
Start from the original problem
$$\begin{array}{ll} \mbox{minimize} & x^T P_0 x + q_0^T x + r_0 \\ \mbox{subject to} & x^T P_i x + q_i^T x + r_i \leq 0, \quad i = 1,\dots,m \end{array}$$
and form the Lagrangian
$$L(x, \lambda) = x^T\Big(P_0 + \sum_{i=1}^m \lambda_i P_i\Big)x + \Big(q_0 + \sum_{i=1}^m \lambda_i q_i\Big)^T x + r_0 + \sum_{i=1}^m \lambda_i r_i$$
in the variables $x \in \mathbf{R}^n$ and $\lambda \in \mathbf{R}^m_+$...

27 Lagrangian Relaxation: Lagrangian
The dual can be computed explicitly as an (unconstrained) quadratic minimization problem: using
$$\inf_{x \in \mathbf{R}^n}\; x^T P x + q^T x + r = \begin{cases} r - \frac{1}{4}q^T P^\dagger q, & \mbox{if } P \succeq 0 \mbox{ and } q \in \mathcal{R}(P) \\ -\infty, & \mbox{otherwise} \end{cases}$$
we get
$$\inf_x L(x, \lambda) = -\frac{1}{4}\Big(q_0 + \sum_{i=1}^m \lambda_i q_i\Big)^T\Big(P_0 + \sum_{i=1}^m \lambda_i P_i\Big)^\dagger\Big(q_0 + \sum_{i=1}^m \lambda_i q_i\Big) + \sum_{i=1}^m \lambda_i r_i + r_0$$
where we recognize a Schur complement...

28 Lagrangian Relaxation: Dual
The dual of the QCQP is then given by
$$\begin{array}{ll} \mbox{maximize} & \gamma + \sum_{i=1}^m \lambda_i r_i + r_0 \\ \mbox{subject to} & \begin{bmatrix} P_0 + \sum_{i=1}^m \lambda_i P_i & \big(q_0 + \sum_{i=1}^m \lambda_i q_i\big)/2 \\ \big(q_0 + \sum_{i=1}^m \lambda_i q_i\big)^T/2 & -\gamma \end{bmatrix} \succeq 0 \\ & \lambda_i \geq 0, \quad i = 1,\dots,m \end{array}$$
which is a semidefinite program in the variables $\lambda \in \mathbf{R}^m$ and $\gamma \in \mathbf{R}$, and can be solved efficiently.
Let us look at what happens when we use semidefinite duality to compute the dual of this last program (the bidual of the original problem)...

29 Lagrangian Relaxation: Bidual
Taking the dual again, we get the SDP
$$\begin{array}{ll} \mbox{minimize} & \mathrm{Tr}(XP_0) + q_0^T x + r_0 \\ \mbox{subject to} & \mathrm{Tr}(XP_i) + q_i^T x + r_i \leq 0, \quad i = 1,\dots,m \\ & \begin{bmatrix} X & x \\ x^T & 1 \end{bmatrix} \succeq 0 \end{array}$$
in the variables $X \in \mathbf{S}^n$ and $x \in \mathbf{R}^n$.
This is a convexification of the original program: we have recovered the semidefinite relaxation in an automatic way.

30 Lagrangian Relaxation: Boolean LS
An example: Boolean least squares. Using this technique, the original Boolean LS problem
$$\begin{array}{ll} \mbox{minimize} & \|Ax - b\|^2 \\ \mbox{subject to} & x_i^2 = 1, \quad i = 1,\dots,n \end{array}$$
can be relaxed as the SDP
$$\begin{array}{ll} \mbox{minimize} & \mathrm{Tr}(A^TAX) - 2b^TAx + b^Tb \\ \mbox{subject to} & \begin{bmatrix} X & x \\ x^T & 1 \end{bmatrix} \succeq 0 \\ & X_{ii} = 1, \quad i = 1,\dots,n \end{array}$$
This program then produces a lower bound on the optimal value of the original Boolean LS program.

31 Lagrangian Relaxation: Partitioning
The partitioning problem defined above is
$$\begin{array}{ll} \mbox{minimize} & x^T W x \\ \mbox{subject to} & x_i^2 = 1, \quad i = 1,\dots,n \end{array}$$
The variable $x$ disappears from the relaxation, which becomes:
$$\begin{array}{ll} \mbox{minimize} & \mathrm{Tr}(WX) \\ \mbox{subject to} & X \succeq 0 \\ & X_{ii} = 1, \quad i = 1,\dots,n \end{array}$$
Feasible points? Lagrangian relaxations only provide lower bounds on the optimal value:
how can we compute good feasible points?
can we measure how suboptimal this lower bound is?

32 Randomization
The original QCQP
$$\begin{array}{ll} \mbox{minimize} & x^T P_0 x + q_0^T x + r_0 \\ \mbox{subject to} & x^T P_i x + q_i^T x + r_i \leq 0, \quad i = 1,\dots,m \end{array}$$
was relaxed into
$$\begin{array}{ll} \mbox{minimize} & \mathrm{Tr}(XP_0) + q_0^T x + r_0 \\ \mbox{subject to} & \mathrm{Tr}(XP_i) + q_i^T x + r_i \leq 0, \quad i = 1,\dots,m \\ & \begin{bmatrix} X & x \\ x^T & 1 \end{bmatrix} \succeq 0 \end{array}$$
The last (Schur complement) constraint is equivalent to $X - xx^T \succeq 0$.
Hence, if $x$ and $X$ are the solution to the relaxed program, then $X - xx^T$ is a covariance matrix...

33 Randomization
For the problem
$$\begin{array}{ll} \mbox{minimize} & \mathrm{Tr}(XP_0) + q_0^T x + r_0 \\ \mbox{subject to} & \mathrm{Tr}(XP_i) + q_i^T x + r_i \leq 0, \quad i = 1,\dots,m \\ & \begin{bmatrix} X & x \\ x^T & 1 \end{bmatrix} \succeq 0 \end{array}$$
pick $y$ as a Gaussian variable with $y \sim \mathcal{N}(x, X - xx^T)$: $y$ will solve the QCQP on average over this distribution.
In other words, with $\mathbf{E}[yy^T] = \mathbf{E}[(y - x)(y - x)^T] + xx^T = X$, which means $\mathbf{E}[y^T P y] = \mathrm{Tr}(PX)$, the problem above becomes
$$\begin{array}{ll} \mbox{minimize} & \mathbf{E}[y^T P_0 y + q_0^T y + r_0] \\ \mbox{subject to} & \mathbf{E}[y^T P_i y + q_i^T y + r_i] \leq 0, \quad i = 1,\dots,m \end{array}$$
A good feasible point can then be obtained by sampling enough points $y$...

34 Bounds on suboptimality
In certain particular cases, it is possible to get a hard bound on the gap between the optimal value and the relaxation result. A classic example is the MAXCUT bound.
The MAXCUT problem is a particular case of the partitioning problem:
$$\begin{array}{ll} \mbox{maximize} & x^T W x \\ \mbox{subject to} & x_i^2 = 1, \quad i = 1,\dots,n \end{array}$$
with $W \succeq 0$. Its Lagrangian relaxation is computed as
$$\begin{array}{ll} \mbox{maximize} & \mathrm{Tr}(WX) \\ \mbox{subject to} & X \succeq 0 \\ & X_{ii} = 1, \quad i = 1,\dots,n \end{array}$$

35 Bounds on suboptimality: MAXCUT
Let $X$ be a solution to this program.
We look for a feasible point by sampling from the normal distribution $\mathcal{N}(0, X)$.
We convert each sample point $x$ to a feasible point by rounding it to the nearest value in $\{-1, 1\}$, i.e. taking $\hat{x} = \mathrm{sgn}(x)$.
Crucially, when $\hat{x}$ is sampled using that procedure, the expected value of the objective $\mathbf{E}[\hat{x}^T W \hat{x}]$ can be computed explicitly:
$$\mathbf{E}[\hat{x}^T W \hat{x}] = \frac{2}{\pi}\sum_{i,j=1}^n W_{ij}\arcsin(X_{ij}) = \frac{2}{\pi}\mathrm{Tr}(W\arcsin(X))$$
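A sketch of the sampling-and-rounding step, including the arcsin formula for the expected objective. cvxpy, numpy, and the random PSD matrix $W$ are assumptions made only for this example.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n = 15
B = rng.normal(size=(n, n))
W = B @ B.T                                        # W is PSD, as assumed on the previous slide

X = cp.Variable((n, n), symmetric=True)
cp.Problem(cp.Maximize(cp.trace(W @ X)), [X >> 0, cp.diag(X) == 1]).solve()
Xv = (X.value + X.value.T) / 2 + 1e-9 * np.eye(n)  # symmetrize / regularize numerically

samples = rng.multivariate_normal(np.zeros(n), Xv, size=500)
xhat = np.sign(samples)
xhat[xhat == 0] = 1.0                              # feasible points, entries in {-1, +1}
vals = np.einsum("ki,ij,kj->k", xhat, W, xhat)     # objective x^T W x for each sample

expected = (2 / np.pi) * np.trace(W @ np.arcsin(np.clip(Xv, -1, 1)))
print("SDP upper bound   :", np.trace(W @ Xv))
print("E[xhat^T W xhat]  :", expected)
print("best sampled value:", vals.max())
```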

36 Bounds on suboptimality: MAXCUT
We are guaranteed to reach this expected value $\frac{2}{\pi}\mathrm{Tr}(W\arcsin(X))$ after sampling a few (feasible) points $\hat{x}$.
Hence we know that the optimal value $OPT$ of the MAXCUT problem satisfies
$$\frac{2}{\pi}\mathrm{Tr}(W\arcsin(X)) \leq OPT \leq \mathrm{Tr}(WX)$$
Furthermore, using $\arcsin(X) \succeq X$, we can simplify (and relax) the above expression to get
$$\frac{2}{\pi}\mathrm{Tr}(WX) \leq OPT \leq \mathrm{Tr}(WX)$$
The procedure detailed above therefore guarantees a feasible point whose value is at least a fraction $2/\pi$ of the optimum.

37 Numerical Example: Boolean LS
Boolean least-squares problem:
$$\begin{array}{ll} \mbox{minimize} & \|Ax - b\|^2 \\ \mbox{subject to} & x_i^2 = 1, \quad i = 1,\dots,n \end{array}$$
With
$$\|Ax - b\|^2 = x^TA^TAx - 2b^TAx + b^Tb = \mathrm{Tr}(A^TAX) - 2b^TAx + b^Tb$$
where $X = xx^T$, we can hence express BLS as
$$\begin{array}{ll} \mbox{minimize} & \mathrm{Tr}(A^TAX) - 2b^TAx + b^Tb \\ \mbox{subject to} & X_{ii} = 1, \quad X \succeq xx^T, \quad \mathrm{Rank}(X) = 1 \end{array}$$
... still a very hard problem.

38 SDP relaxation for BLS
Using Lagrangian relaxation, remember:
$$X \succeq xx^T \quad \Longleftrightarrow \quad \begin{bmatrix} X & x \\ x^T & 1 \end{bmatrix} \succeq 0$$
We obtained the SDP relaxation (with variables $X$, $x$)
$$\begin{array}{ll} \mbox{minimize} & \mathrm{Tr}(A^TAX) - 2b^TAx + b^Tb \\ \mbox{subject to} & X_{ii} = 1, \quad \begin{bmatrix} X & x \\ x^T & 1 \end{bmatrix} \succeq 0 \end{array}$$
The optimal value of the SDP gives a lower bound for BLS; if the optimal matrix is rank one, we're done.

39 Interpretation via randomization
We can think of the variables $X$, $x$ in the SDP relaxation as defining a normal distribution $z \sim \mathcal{N}(x, X - xx^T)$, with $\mathbf{E}[z_i^2] = X_{ii} = 1$; the SDP objective is then $\mathbf{E}\|Az - b\|^2$.
This suggests a randomized method for BLS:
find $X^{\rm opt}$, $x^{\rm opt}$ optimal for the SDP relaxation
generate $z$ from $\mathcal{N}(x^{\rm opt}, X^{\rm opt} - x^{\rm opt}x^{{\rm opt}\,T})$
take $x = \mathrm{sgn}(z)$ as an approximate solution of BLS
(can repeat many times and take the best one)
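Putting slides 38 and 39 together, a compact sketch of the SDP relaxation and the randomized rounding for Boolean least squares; all problem data is random and cvxpy/numpy are assumed, so this is only an illustration of the recipe.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n = 30, 20
A = rng.normal(size=(m, n))
b = rng.normal(size=m)
P = A.T @ A

# Lifted variable Z = [[X, x], [x^T, 1]] >= 0, with X_ii = 1
Z = cp.Variable((n + 1, n + 1), PSD=True)
X, x = Z[:n, :n], Z[:n, n]
objective = cp.trace(P @ X) - 2 * b @ A @ x + b @ b
prob = cp.Problem(cp.Minimize(objective), [cp.diag(X) == 1, Z[n, n] == 1])
prob.solve()

# Randomization: sample z ~ N(x, X - x x^T), round to sign(z), keep the best sample
Xv, xv = X.value, x.value
cov = Xv - np.outer(xv, xv)
cov = (cov + cov.T) / 2 + 1e-9 * np.eye(n)       # symmetrize / regularize numerically
zs = rng.multivariate_normal(xv, cov, size=200)
cand = np.sign(zs)
cand[cand == 0] = 1.0
vals = np.sum((cand @ A.T - b) ** 2, axis=1)

print("SDP lower bound       :", prob.value)
print("best rounded objective:", vals.min())
```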

40 Example
(Randomly chosen) parameters $A \in \mathbf{R}^{150\times 100}$, $b \in \mathbf{R}^{150}$; $x \in \mathbf{R}^{100}$, so the feasible set has $2^{100} \approx 10^{30}$ points.
LS approximate solution: minimize $\|Ax - b\|$ s.t. $\|x\|^2 = n$, then round: yields an objective 8.7% over the SDP relaxation bound.
Randomized method (using the SDP optimal distribution):
best of 20 samples: 3.1% over SDP bound
best of 1000 samples: 2.6% over SDP bound

41 Example: Boolean LS (continued)
[Figure: histogram of frequency against $\|Ax - b\|/(\mbox{SDP bound})$ over the randomized samples, with the SDP bound and the LS solution marked.]

42 Example: Partitioning Problem
We go back now to the two-way partitioning problem considered in exercise 5.39 of the textbook:
$$\begin{array}{ll} \mbox{minimize} & x^T W x \\ \mbox{subject to} & x_i^2 = 1, \quad i = 1,\dots,n \end{array}$$
The Lagrange dual of this problem is given by the SDP:
$$\begin{array}{ll} \mbox{maximize} & -\mathbf{1}^T\nu \\ \mbox{subject to} & W + \mathrm{diag}(\nu) \succeq 0 \end{array}$$

43 Example: Partitioning
The dual of this SDP is the new SDP
$$\begin{array}{ll} \mbox{minimize} & \mathrm{Tr}(WX) \\ \mbox{subject to} & X \succeq 0 \\ & X_{ii} = 1, \quad i = 1,\dots,n \end{array}$$
The solution $X^{\rm opt}$ gives a lower bound on the optimal value $p^{\rm opt}$ of the partitioning problem:
solve this SDP to find $X^{\rm opt}$ (and the bound)
let $v$ denote an eigenvector of $X^{\rm opt}$ associated with its largest eigenvalue
now let $\hat{x} = \mathrm{sgn}(v)$: the vector $\hat{x}$ is our guess for a good partition
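A minimal sketch of this SDP plus the eigenvector rounding; cvxpy, numpy, and the random cost matrix $W$ are assumptions made for the example.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n = 30
W = rng.normal(size=(n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)

X = cp.Variable((n, n), symmetric=True)
prob = cp.Problem(cp.Minimize(cp.trace(W @ X)), [X >> 0, cp.diag(X) == 1])
prob.solve()

# Round: take the eigenvector for the largest eigenvalue of X_opt and keep its signs
eigvals, eigvecs = np.linalg.eigh(X.value)
xhat = np.sign(eigvecs[:, -1])
xhat[xhat == 0] = 1.0                       # guard against exact zeros

print("SDP lower bound         :", prob.value)
print("cost of sgn(v) partition:", xhat @ W @ xhat)
```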

44 Partitioning: Randomization
We generate independent samples $x^{(1)},\dots,x^{(K)}$ from a normal distribution with zero mean and covariance $X^{\rm opt}$.
For each sample we consider the heuristic approximate solution $\hat{x}^{(k)} = \mathrm{sgn}(x^{(k)})$, and we then take the one with lowest cost.
We compare the performance of these methods on a randomly chosen problem:
the optimal SDP lower bound is equal to $-1641$
the simple $\mathrm{sgn}(x)$ heuristic gives a partition with total cost $-1280$
exactly what the optimal value is, we can't say; all we can say at this point is that it lies between $-1641$ and $-1280$.

45 Partitioning: Numerical Example
[Figure: histogram of the objective obtained by the randomized heuristic, over 1000 samples.]
The minimum value reached here is $-1328$.

46 Partitioning: Numerical Example
[Figure: objective value against the number of samples.]
We're not sure what the optimal cost is, but now we know it's between $-1641$ and $-1328$.

47 Applications in Statistics

48 Parametric distribution estimation
Distribution estimation problem: estimate the probability density $p(y)$ of a random variable from observed values.
Parametric distribution estimation: choose from a family of densities $p_x(y)$, indexed by a parameter $x$.
Maximum likelihood estimation:
$$\mbox{maximize (over $x$)} \quad \log p_x(y)$$
$y$ is the observed value
$l(x) = \log p_x(y)$ is called the log-likelihood function
can add constraints $x \in C$ explicitly, or define $p_x(y) = 0$ for $x \notin C$
a convex optimization problem if $\log p_x(y)$ is concave in $x$ for fixed $y$

49 Linear measurements with IID noise
Linear measurement model
$$y_i = a_i^T x + v_i, \quad i = 1,\dots,m$$
$x \in \mathbf{R}^n$ is a vector of unknown parameters
$v_i$ is IID measurement noise, with density $p(z)$
$y_i$ is the measurement: $y \in \mathbf{R}^m$ has density $p_x(y) = \prod_{i=1}^m p(y_i - a_i^Tx)$
Maximum likelihood estimate: any solution $x$ of
$$\mbox{maximize} \quad l(x) = \sum_{i=1}^m \log p(y_i - a_i^Tx)$$
($y$ is the observed value)

50 Examples
Gaussian noise $\mathcal{N}(0, \sigma^2)$: $p(z) = (2\pi\sigma^2)^{-1/2}e^{-z^2/(2\sigma^2)}$,
$$l(x) = -\frac{m}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^m (a_i^Tx - y_i)^2$$
ML estimate is the LS solution.
Laplacian noise: $p(z) = (1/(2a))e^{-|z|/a}$,
$$l(x) = -m\log(2a) - \frac{1}{a}\sum_{i=1}^m |a_i^Tx - y_i|$$
ML estimate is the $\ell_1$-norm solution.
Uniform noise on $[-a, a]$:
$$l(x) = \begin{cases} -m\log(2a) & |a_i^Tx - y_i| \leq a, \quad i = 1,\dots,m \\ -\infty & \mbox{otherwise} \end{cases}$$
ML estimate is any $x$ with $|a_i^Tx - y_i| \leq a$.

51 Logistic regression
Random variable $y \in \{0, 1\}$ with distribution
$$p = \mathrm{Prob}(y = 1) = \frac{\exp(a^Tu + b)}{1 + \exp(a^Tu + b)}$$
$a$, $b$ are parameters; $u \in \mathbf{R}^n$ are (observable) explanatory variables.
Estimation problem: estimate $a$, $b$ from $m$ observations $(u_i, y_i)$.
Log-likelihood function (for $y_1 = \cdots = y_k = 1$, $y_{k+1} = \cdots = y_m = 0$):
$$l(a, b) = \log\left(\prod_{i=1}^k \frac{\exp(a^Tu_i + b)}{1 + \exp(a^Tu_i + b)}\ \prod_{i=k+1}^m \frac{1}{1 + \exp(a^Tu_i + b)}\right) = \sum_{i=1}^k (a^Tu_i + b) - \sum_{i=1}^m \log(1 + \exp(a^Tu_i + b))$$
concave in $a$, $b$.
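Because the log-likelihood is concave, it can be maximized directly; a short sketch using cvxpy's logistic atom, with synthetic observations $(u_i, y_i)$ assumed for the example.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m = 50
u = np.sort(rng.uniform(-5, 5, size=m))
p_true = 1 / (1 + np.exp(-(1.5 * u - 1.0)))
y = (rng.uniform(size=m) < p_true).astype(float)     # labels in {0, 1}

a, b = cp.Variable(), cp.Variable()
z = a * u + b
# log-likelihood: sum_i y_i (a u_i + b) - sum_i log(1 + exp(a u_i + b)), concave in (a, b)
loglik = cp.sum(cp.multiply(y, z)) - cp.sum(cp.logistic(z))
cp.Problem(cp.Maximize(loglik)).solve()

print("ML estimate: a =", a.value, " b =", b.value)
```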

52 Example ($n = 1$, $m = 50$ measurements)
[Figure: $\mathrm{Prob}(y = 1)$ against $u$; circles show the 50 points $(u_i, y_i)$, the solid curve is the ML estimate of $p = \exp(au + b)/(1 + \exp(au + b))$.]

53 Experiment design
$m$ linear measurements $y_i = a_i^Tx + w_i$, $i = 1,\dots,m$, of unknown $x \in \mathbf{R}^n$
measurement errors $w_i$ are IID $\mathcal{N}(0, 1)$
ML (least-squares) estimate is
$$\hat{x} = \Big(\sum_{i=1}^m a_ia_i^T\Big)^{-1}\sum_{i=1}^m y_ia_i$$
error $e = \hat{x} - x$ has zero mean and covariance
$$E = \mathbf{E}[ee^T] = \Big(\sum_{i=1}^m a_ia_i^T\Big)^{-1}$$
confidence ellipsoids are given by $\{x \mid (x - \hat{x})^TE^{-1}(x - \hat{x}) \leq \beta\}$
Experiment design: choose $a_i \in \{v_1,\dots,v_p\}$ (a set of possible test vectors) to make $E$ small.

54 Vector optimization formulation
$$\begin{array}{ll} \mbox{minimize (w.r.t. $\mathbf{S}^n_+$)} & E = \Big(\sum_{k=1}^p m_kv_kv_k^T\Big)^{-1} \\ \mbox{subject to} & m_k \geq 0, \quad m_1 + \cdots + m_p = m \\ & m_k \in \mathbf{Z} \end{array}$$
variables are $m_k$ (number of vectors $a_i$ equal to $v_k$)
difficult in general, due to the integer constraint
Relaxed experiment design: assume $m \gg p$, use $\lambda_k = m_k/m$ as a (continuous) real variable
$$\begin{array}{ll} \mbox{minimize (w.r.t. $\mathbf{S}^n_+$)} & E = (1/m)\Big(\sum_{k=1}^p \lambda_kv_kv_k^T\Big)^{-1} \\ \mbox{subject to} & \lambda \succeq 0, \quad \mathbf{1}^T\lambda = 1 \end{array}$$
common scalarizations: minimize $\log\det E$, $\mathrm{Tr}\,E$, $\lambda_{\max}(E)$, ...
can add other convex constraints, e.g., bound the experiment cost $c^T\lambda \leq B$

55 Experiment design: D-optimal design
$$\begin{array}{ll} \mbox{minimize} & \log\det\Big(\sum_{k=1}^p \lambda_kv_kv_k^T\Big)^{-1} \\ \mbox{subject to} & \lambda \succeq 0, \quad \mathbf{1}^T\lambda = 1 \end{array}$$
interpretation: minimizes the volume of the confidence ellipsoids
Dual problem:
$$\begin{array}{ll} \mbox{maximize} & \log\det W + n\log n \\ \mbox{subject to} & v_k^TWv_k \leq 1, \quad k = 1,\dots,p \end{array}$$
interpretation: $\{x \mid x^TWx \leq 1\}$ is the minimum volume ellipsoid centered at the origin that includes all test vectors $v_k$
Complementary slackness: for $\lambda$, $W$ primal and dual optimal,
$$\lambda_k(1 - v_k^TWv_k) = 0, \quad k = 1,\dots,p$$
the optimal experiment uses vectors $v_k$ on the boundary of the ellipsoid defined by $W$.
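A sketch of the D-optimal design problem using the log_det atom; cvxpy and the random candidate vectors $v_k$ are assumptions made for the example.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p = 2, 20
V = rng.normal(size=(p, n))                  # rows are the candidate test vectors v_k

lam = cp.Variable(p, nonneg=True)
# information matrix sum_k lambda_k v_k v_k^T (affine in lambda)
info = sum(lam[k] * np.outer(V[k], V[k]) for k in range(p))
prob = cp.Problem(cp.Maximize(cp.log_det(info)), [cp.sum(lam) == 1])
prob.solve()

# Typically only a few weights are nonzero: the chosen test vectors
active = np.where(lam.value > 1e-3)[0]
print("active vectors:", active, " weights:", lam.value[active])
```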

56 Experiment design example ($p = 20$)
[Figure: candidate vectors and the optimal design, with $\lambda_1 = 0.5$, $\lambda_2 = 0.5$.]
The design uses two vectors, on the boundary of the ellipse defined by the optimal $W$.

57 Experiment design: Derivation of the dual
First reformulate the primal problem with a new variable $X$:
$$\begin{array}{ll} \mbox{minimize} & \log\det X^{-1} \\ \mbox{subject to} & X = \sum_{k=1}^p \lambda_kv_kv_k^T, \quad \lambda \succeq 0, \quad \mathbf{1}^T\lambda = 1 \end{array}$$
The Lagrangian is
$$L(X, \lambda, Z, z, \nu) = \log\det X^{-1} + \mathrm{Tr}\Big(Z\Big(X - \sum_{k=1}^p \lambda_kv_kv_k^T\Big)\Big) - z^T\lambda + \nu(\mathbf{1}^T\lambda - 1)$$
minimize over $X$ by setting the gradient to zero: $-X^{-1} + Z = 0$
the minimum over $\lambda_k$ is $-\infty$ unless $-v_k^TZv_k - z_k + \nu = 0$
Dual problem:
$$\begin{array}{ll} \mbox{maximize} & n + \log\det Z - \nu \\ \mbox{subject to} & v_k^TZv_k \leq \nu, \quad k = 1,\dots,p \end{array}$$
Change variable $W = Z/\nu$ and optimize over $\nu$ to get the dual of page 55.
