CSE4830 Kernel Methods in Machine Learning


Lecture 3: Convex optimization and duality
Juho Rousu, 27 September 2017
This lecture
Crash course on convex optimisation
Lagrangian duality
Derivation of the SVM dual
Additional reading on convex optimisation: Boyd & Vandenberghe, Convex Optimization, Cambridge University Press, 2004.
Convex functions
A function $f : \mathbb{R}^n \to \mathbb{R}$ is convex if for all $x, y$ and $0 \le \theta \le 1$ we have
$$f(\theta x + (1-\theta) y) \le \theta f(x) + (1-\theta) f(y).$$
Geometrical interpretation: the graph of the function lies on or below the line segment from $(x, f(x))$ to $(y, f(y))$.
A function $f$ is strictly convex if strict inequality holds above whenever $x \ne y$ and $0 < \theta < 1$, and concave if $-f$ is convex.
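The definition can be probed numerically by sampling random triples $(x, y, \theta)$. The sketch below is only an illustration: the helper name `violates_convexity` and the two test functions are assumptions, not part of the lecture, and sampling can refute convexity but never prove it.

```python
import numpy as np

def violates_convexity(f, dim, trials=10_000, seed=0, tol=1e-9):
    """Return True if some sampled (x, y, theta) breaks the convexity inequality."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x = rng.normal(size=dim)
        y = rng.normal(size=dim)
        theta = rng.uniform()
        lhs = f(theta * x + (1 - theta) * y)
        rhs = theta * f(x) + (1 - theta) * f(y)
        if lhs > rhs + tol:
            return True
    return False

def squared_norm(v):
    return float(v @ v)          # convex

def neg_squared_norm(v):
    return -float(v @ v)         # concave, hence not convex

convex_ok = not violates_convexity(squared_norm, dim=3)
concave_fails = violates_convexity(neg_squared_norm, dim=3)
```

A function that passes this test may still be nonconvex; a single violation, on the other hand, is a certificate of nonconvexity.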
First order conditions
Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable. Then $f$ is convex if and only if
$$f(y) \ge f(x) + \nabla f(x)^\top (y - x)$$
holds for all $x, y \in \mathbb{R}^n$.
The right-hand side, the first order Taylor approximation of $f$, is a global underestimator of $f$.
Geometrical interpretation: a convex function lies above each of its tangents.
Corresponding forms of the inequality can be written for strictly convex functions (replace $\ge$ with $>$ for $y \ne x$) and concave functions (replace $\ge$ with $\le$).
Second order conditions
Assume $f : \mathbb{R}^n \to \mathbb{R}$ is twice differentiable. Then $f$ is convex if and only if its Hessian matrix (matrix of second derivatives)
$$H(x) = \nabla^2 f(x), \qquad [\nabla^2 f(x)]_{ij} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j}, \quad i = 1, \dots, n, \ j = 1, \dots, n,$$
is positive semidefinite at every $x$: $z^\top H(x) z \ge 0$ for all $z \in \mathbb{R}^n$.
Geometrically the condition means that the function has nonnegative curvature at $x$.
Strict convexity is partially characterized by second order conditions: if $H(x) = \nabla^2 f(x)$ is positive definite for all $x$, that is $z^\top H(x) z > 0$ for all $z \ne 0$, then $f$ is strictly convex. The converse does not hold in general (for example, $f(x) = x^4$ is strictly convex but $f''(0) = 0$).
For a function defined on $\mathbb{R}$, the condition reduces to $f''(x) \ge 0$, that is, the first derivative $f'$ is nondecreasing.
Analogous conditions can be written for (strictly) concave functions and negative (semi)definite Hessians.
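For a quadratic function the Hessian is constant, so the second order condition comes down to one eigenvalue computation. A minimal sketch, assuming NumPy and two hand-picked example functions that are not from the lecture:

```python
import numpy as np

def is_psd(H, tol=1e-10):
    """Check positive semidefiniteness of a symmetric matrix via its eigenvalues."""
    eigenvalues = np.linalg.eigvalsh(H)
    return bool(eigenvalues.min() >= -tol)

# f(x) = x1^2 + x1*x2 + x2^2 has the constant Hessian [[2, 1], [1, 2]]
H_convex = np.array([[2.0, 1.0], [1.0, 2.0]])

# f(x) = x1^2 - x2^2 (a saddle) has the constant Hessian diag(2, -2)
H_saddle = np.array([[2.0, 0.0], [0.0, -2.0]])

convex_case = is_psd(H_convex)
saddle_case = is_psd(H_saddle)
```

The tolerance `tol` is there only to absorb floating-point error in the eigenvalues; for non-quadratic functions the check would have to hold at every point, not at a single Hessian.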
Operations that preserve convexity of functions
Nonnegative weighted sums: a nonnegative weighted sum of convex functions, $f = w_1 f_1 + \dots + w_m f_m$ with $w_j \ge 0$, is convex. Similarly, a nonnegative weighted sum of concave functions is concave. These properties extend to infinite sums and integrals.
Pointwise maximum and supremum: the pointwise maximum $f(x) = \max(f_1(x), f_2(x), \dots, f_m(x))$ of a set $f_1, \dots, f_m$ of convex functions is convex. The pointwise supremum of an infinite set of convex functions is convex.
Similarly, the pointwise minimum (infimum) of concave functions is concave.
Convex sets
A line segment between $x_1 \in \mathbb{R}^n$ and $x_2 \in \mathbb{R}^n$ is defined as all points that satisfy
$$x = \theta x_1 + (1-\theta) x_2, \quad 0 \le \theta \le 1.$$
A convex set contains the line segment between any two distinct points in the set:
$$x_1, x_2 \in C, \ 0 \le \theta \le 1 \ \Rightarrow \ \theta x_1 + (1-\theta) x_2 \in C.$$
(Figure: examples of convex and nonconvex sets. Q: Which ones are convex?)
Operations that preserve convexity of sets
There are two main ways of establishing the convexity of a set $C$:
1. Apply the definition of convexity: $x_1, x_2 \in C, \ 0 \le \theta \le 1 \ \Rightarrow \ \theta x_1 + (1-\theta) x_2 \in C$.
2. Show that the set can be obtained from simpler convex sets with operations that preserve convexity, most importantly:
intersection: if $S_1, S_2$ are convex, their intersection $S_1 \cap S_2$ is convex;
affine functions: if $S$ is convex and $f(x) = Ax + b$ is affine, the image of $S$ under $f$, $f(S) = \{f(x) \mid x \in S\}$, is convex.
Convex optimization problem
Standard form of a convex optimization problem:
$$\min_{x \in D} f_0(x) \quad \text{s.t.} \quad f_i(x) \le 0, \ i = 1, \dots, m, \qquad h_i(x) = 0, \ i = 1, \dots, p.$$
The problem is composed of the following components:
The variable $x \in D$ from a domain $D$; $D = \mathbb{R}^n$ is typical.
The objective function $f_0 : \mathbb{R}^n \to \mathbb{R}$ to be minimized, a convex function of the variable $x$.
The constraint functions $f_i : \mathbb{R}^n \to \mathbb{R}$ related to inequality constraints, convex functions of $x$.
The constraint functions $h_i(x) = a_i^\top x - b_i$ related to equality constraints, affine (linear) functions of $x$.
Convex optimization problem: feasibility and optimality
For the standard form $\min_{x \in D} f_0(x)$ s.t. $f_i(x) \le 0$, $i = 1, \dots, m$, and $h_i(x) = 0$, $i = 1, \dots, p$:
A value of $x$ that satisfies the constraints is called feasible; the set of all feasible points is called the feasible set.
The feasible set defined by the above constraints is a convex set.
$x^*$ is optimal if it is feasible and has the smallest objective function value among all feasible $z \in D$.
Why is convexity useful?
(Figure, two panels: convex objective vs. nonconvex objective; convex feasible set vs. nonconvex feasible set.)
Why convexity?
Convex objective:
We can always improve a suboptimal objective value by taking a sufficiently small step in the direction of the negative gradient.
All local optima are global optima.
Convex constraints, i.e. convex feasible set:
Any point between two feasible points is feasible.
Updates remain inside the feasible set as long as the update direction is towards a feasible point.
Consequence: fast algorithms based on the principle of feasible descent.
Quadratic optimization problem (QP)
A quadratic program (QP) has a quadratic objective function and affine constraint functions. A QP can be expressed in the form
$$\min_{x \in \mathbb{R}^n} \tfrac{1}{2} x^\top P x + q^\top x + r \quad \text{s.t.} \quad g_i^\top x - h_i \le 0, \ i = 1, \dots, m, \qquad a_i^\top x - b_i = 0, \ i = 1, \dots, p.$$
$P \in \mathbb{R}^{n \times n}$ is a positive semidefinite matrix and $q, g_i, a_i \in \mathbb{R}^n$.
The feasible set is convex, as the intersection of a set of halfspaces ($g_i^\top x \le h_i$) and a set of hyperplanes ($a_i^\top x - b_i = 0$).
The objective decreases the steepest when moving along $-\nabla f_0(x) = -(Px + q)$.
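Ridge regression as a QP
$$\min_w \sum_{i=1}^{l} (\langle w, x_i \rangle - y_i)^2 + \lambda \|w\|^2$$
Express the squared norm of the weight vector as $\|w\|_2^2 = w^\top I_n w$.
Write the squared error in vector form: $\langle y - Xw, y - Xw \rangle = y^\top y - 2 w^\top X^\top y + w^\top X^\top X w$.
The quadratic part is given by the matrix $X^\top X + \lambda I_n$ and the linear part by $-2 X^\top y$. (Under the $\tfrac{1}{2} x^\top P x$ convention of the QP standard form this corresponds to $P = 2(X^\top X + \lambda I_n)$, $q = -2 X^\top y$, $r = y^\top y$.)
The ridge regression problem is thus written as a QP:
$$\min_w \ w^\top (X^\top X + \lambda I_n) w - 2 w^\top X^\top y + y^\top y.$$
This is an example of an unconstrained QP.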
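Because the ridge QP is unconstrained and convex, setting its gradient $2(X^\top X + \lambda I_n) w - 2 X^\top y$ to zero gives the minimiser directly. The sketch below checks this on random data; the data, the value of `lam`, and all variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))      # hypothetical data: 50 examples, 4 features
y = rng.normal(size=50)
lam = 0.5                         # hypothetical regularisation parameter

def ridge_objective(w):
    residual = y - X @ w
    return residual @ residual + lam * (w @ w)

# Stationarity of the QP objective: (X^T X + lam I) w = X^T y
A = X.T @ X + lam * np.eye(X.shape[1])
w_star = np.linalg.solve(A, X.T @ y)

# The gradient 2 A w - 2 X^T y should vanish at the minimiser
gradient_at_optimum = 2 * A @ w_star - 2 * X.T @ y
```

Since $X^\top X + \lambda I_n$ is positive definite for $\lambda > 0$, the linear system has a unique solution and any perturbation of `w_star` increases the objective.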
Soft-margin SVM as a QP
$$\min_w \sum_{i=1}^{l} L_{\text{Hinge}}(\langle w, x_i \rangle, y_i) + \frac{\lambda}{2} \|w\|^2$$
Express the hinge loss $L_{\text{Hinge}}(\langle w, x \rangle, y) = \max(0, 1 - y\, w^\top x)$ by an upper bound (slack) $\xi_i \ge 0$ and the inequality constraint $1 - y_i w^\top x_i \le \xi_i$.
Express the squared norm of the weight vector as $\|w\|_2^2 = w^\top I_n w$.
Denote $P = I_n$ and $q = \mathbf{1}_l \in \mathbb{R}^l$, the vector of ones.
We obtain the QP
$$\min_{w, \xi} \ \mathbf{1}_l^\top \xi + \frac{\lambda}{2} w^\top I_n w \quad \text{s.t.} \quad 1 - y_i w^\top x_i \le \xi_i, \ \ \xi_i \ge 0, \ i = 1, \dots, l.$$
Duality
The principle of viewing an optimization problem from two interchangeable views, the primal and the dual view.
Intuitively:
Minimization of a primal objective corresponds to maximization of the dual objective.
Primal constraints correspond to dual variables.
Dual constraints correspond to primal variables.
Motivation in this course: learning with features (primal) corresponds to learning with kernels (dual).
Duality: Lagrangian
Consider the primal optimisation problem with variable $x \in \mathbb{R}^n$:
$$\min_{x \in D} f_0(x) \quad \text{s.t.} \quad f_i(x) \le 0, \ i = 1, \dots, m, \qquad h_i(x) = 0, \ i = 1, \dots, p.$$
Augment the objective function with the weighted sum of the constraint functions to form the Lagrangian of the optimization problem:
$$L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x).$$
$\lambda_i$, $i = 1, \dots, m$, and $\nu_i$, $i = 1, \dots, p$ ($\nu$ is the Greek letter nu), are called the Lagrange multipliers or dual variables.
Example: minimizing $f_0(x) = x^2$ s.t. $f_1(x) = 1 - x \le 0$.
Lagrangian: $L(x, \lambda) = x^2 + \lambda(1 - x)$.
Lagrangian dual function
The Lagrangian dual function $g : \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}$ is the minimum value of the Lagrangian over $x$:
$$g(\lambda, \nu) = \inf_x L(x, \lambda, \nu) = \inf_x \Big\{ f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x) \Big\}.$$
Intuitively:
Fixing the coefficients $(\lambda, \nu)$ corresponds to a certain level of penalty.
The infimum returns the optimal $x$ for that level of penalty.
$g(\lambda, \nu)$ is the corresponding value of the Lagrangian.
$g(\lambda, \nu)$ is a concave function, as a pointwise infimum of a family of affine functions of $(\lambda, \nu)$.
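Example
Minimizing $x^2$ s.t. $x \ge 1$.
Lagrangian dual function: $g(\lambda) = \inf_x (x^2 + \lambda(1 - x))$.
Set the derivative to zero:
$$\frac{\partial}{\partial x} (x^2 + \lambda(1 - x)) = 2x - \lambda = 0 \ \Rightarrow \ x = \lambda/2.$$
Plug back into the Lagrangian:
$$g(\lambda) = \frac{\lambda^2}{4} + \lambda(1 - \lambda/2) = -\frac{\lambda^2}{4} + \lambda.$$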
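The closed form can be cross-checked against a brute-force infimum over a grid of $x$ values. This is a sketch under assumptions: the grid ranges are arbitrary choices, and the primal optimum $p^* = 1$ (attained at $x = 1$) is read off the problem directly.

```python
import numpy as np

def lagrangian(x, lam):
    return x**2 + lam * (1 - x)

def dual_function(lam):
    # closed form: the minimiser over x is x = lam / 2
    return -lam**2 / 4 + lam

lams = np.linspace(0.0, 4.0, 401)
xs = np.linspace(-5.0, 5.0, 100_001)

# Brute-force infimum over the x grid, for comparison with the closed form
brute_force = np.array([lagrangian(xs, lam).min() for lam in lams])
closed_form = dual_function(lams)

best_lam = lams[np.argmax(closed_form)]
p_star = 1.0   # primal optimum: x = 1 gives f0(x) = 1
```

On this grid the dual function never exceeds `p_star`, and it reaches that value at $\lambda = 2$, so for this problem the best dual bound coincides with the primal optimum.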
Lower bounds on optimal value
The Lagrangian dual function gives us lower bounds on the optimal value $p^*$ of the primal problem. For the example $\min x^2$ s.t. $x \ge 1$ (figure not reproduced): $g(\lambda) \le 1 = p^*$.
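Lower bounds on optimal value
In general, it holds that $g(\lambda, \nu) \le p^*$ for any nonnegative $\lambda$ ($\lambda_i \ge 0$, $i = 1, \dots, m$) and for any $\nu$.
To see this, let $\tilde{x}$ be a feasible point of the original problem, so all primal constraints are satisfied: $f_i(\tilde{x}) \le 0$ and $h_i(\tilde{x}) = 0$.
We have $\lambda_i f_i(\tilde{x}) \le 0$, $i = 1, \dots, m$, since $\lambda_i \ge 0$ by assumption.
Similarly $\nu_i h_i(\tilde{x}) = 0$ for $i = 1, \dots, p$.
Thus the value of the Lagrangian is at most the objective function value at $\tilde{x}$:
$$L(\tilde{x}, \lambda, \nu) = f_0(\tilde{x}) + \sum_{i=1}^{m} \lambda_i f_i(\tilde{x}) + \sum_{i=1}^{p} \nu_i h_i(\tilde{x}) \le f_0(\tilde{x}).$$
Now,
$$g(\lambda, \nu) = \inf_{x \in D} L(x, \lambda, \nu) \le L(\tilde{x}, \lambda, \nu) \le f_0(\tilde{x}),$$
as the infimum is computed over a set containing $\tilde{x}$. Since this holds for every feasible $\tilde{x}$, it holds in particular for an optimal one, giving $g(\lambda, \nu) \le p^*$.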
The Lagrange dual problem
For each pair $(\lambda, \nu)$ with $\lambda \ge 0$, the Lagrange dual function gives a lower bound on the optimal value $p^*$.
What is the tightest lower bound that can be achieved? We need to find the maximum. This gives us an optimization problem
$$\max_{\lambda, \nu} \ g(\lambda, \nu) \quad \text{s.t.} \quad \lambda \ge 0.$$
It is called the Lagrange dual problem of the original optimization problem.
It is a convex optimisation problem, since it is equivalent to minimising $-g(\lambda, \nu)$, which is a convex function.
Dual problem for SVMs: derivation of the dual soft-margin SVM
Recall the soft-margin SVM from last lecture:
$$\min_{w, \xi} \ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{l} \xi_i \quad \text{s.t.} \quad 1 - y_i (\langle w, x_i \rangle + b) \le \xi_i, \ \ \xi_i \ge 0, \ i = 1, \dots, l.$$
(Figure: scatter plot with axes "GC content before 'AG'" and "GC content after 'AG'", with a slack $\xi$ marked; not reproduced.)
$\xi_i$ is called the slack variable; when it is positive, the margin $\gamma(x_i) < 1$.
The sum of slacks is to be minimized, so the objective still favours hyperplanes that separate the classes well.
The coefficient $C > 0$ controls the balance between maximizing the margin and the amount of slack needed.
Dual problem for SVMs: note on the ST & C book formulation
We use the canonical hyperplane representation: we fix the margin $\gamma = 1$ and seek the shortest weight vector $w$ that achieves the margin, penalizing outliers by the slack variables $\xi$ corresponding to the hinge loss.
The ST & C book (Section 7.2.2) writes the problem in the margin maximization form, keeping the norm of the weight vector fixed. This leads to a different derivation of the dual, but the same result.
$$\min_{\gamma, w, \xi} \ -\gamma + C \sum_{i=1}^{l} \xi_i \quad \text{s.t.} \quad y_i (\langle w, x_i \rangle + b) \ge \gamma - \xi_i, \ \ \xi_i \ge 0, \ i = 1, \dots, l, \qquad \|w\|^2 = 1.$$
(Figure: scatter plot with axes "GC content before 'AG'" and "GC content after 'AG'", with a slack $\xi$ marked; not reproduced.)
Dual problem for SVMs: recipe to find the dual problem
1. Write down the Lagrangian.
2. Find the minimum of the Lagrangian by setting derivatives w.r.t. primal variables to zero.
3. Plug the solution $(w^*, \xi^*)$ back into the Lagrangian to obtain the dual function.
4. Write down the optimisation problem to maximise the dual function w.r.t. the dual variables.
Dual problem for SVMs: deriving the dual SVM
We first write the constraints in the standard form of a QP:
$$\min_{w, \xi} \ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{l} \xi_i \quad \text{s.t.} \quad 1 - y_i w^\top x_i - \xi_i \le 0, \ \ -\xi_i \le 0, \ i = 1, \dots, l.$$
Above we have dropped the intercept $b$ of the hyperplane; instead we assume that a constant feature $x_0 = \text{const}$ has been added.
Dual problem for SVMs: Lagrangian of the soft-margin SVM
Step 1. Write down the Lagrangian.
We have an objective and two sets of inequality constraints. Use the dual variables $\lambda = (\alpha, \beta)$, where $\alpha = (\alpha_i)_{i=1}^{l}$ corresponds to the first set of constraints ($1 - y_i w^\top x_i - \xi_i \le 0$) and $\beta = (\beta_i)_{i=1}^{l}$ to the second set ($-\xi_i \le 0$).
The Lagrangian becomes
$$L(w, \xi, \alpha, \beta) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{l} \xi_i + \sum_{i=1}^{l} \alpha_i (1 - y_i w^\top x_i - \xi_i) + \sum_{i=1}^{l} \beta_i (-\xi_i).$$
Dual problem for SVMs: minimum of the Lagrangian (variable $w$)
$$L(w, \xi, \alpha, \beta) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{l} \xi_i + \sum_{i=1}^{l} \alpha_i (1 - y_i w^\top x_i - \xi_i) + \sum_{i=1}^{l} \beta_i (-\xi_i)$$
Step 2. Find the minimum of the Lagrangian by setting derivatives w.r.t. primal variables to zero.
For the variable $w$ we get
$$\nabla_w L(w, \xi, \alpha, \beta) = w - \sum_{i=1}^{l} \alpha_i y_i x_i = 0 \ \Rightarrow \ w = \sum_{i=1}^{l} \alpha_i y_i x_i.$$
Dual problem for SVMs: minimum of the Lagrangian (variable $\xi$)
Step 2, continued. For the variable $\xi_i$ we get
$$\frac{\partial}{\partial \xi_i} L(w, \xi, \alpha, \beta) = C - \alpha_i - \beta_i = 0.$$
Together with $\beta_i \ge 0$ this implies $\alpha_i \le C$ for all $i = 1, \dots, l$.
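Dual problem for SVMs: Lagrangian dual function
Step 3. Plug the solution back into the Lagrangian to obtain the dual function.
Substitute $w = \sum_{i=1}^{l} \alpha_i y_i x_i$ and $C - \alpha_i - \beta_i = 0$ into
$$L(w, \xi, \alpha, \beta) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{l} \xi_i + \sum_{i=1}^{l} \alpha_i (1 - y_i w^\top x_i - \xi_i) + \sum_{i=1}^{l} \beta_i (-\xi_i).$$
The terms containing $\xi_i$ and $\beta_i$ cancel out, since $\xi_i$ is multiplied by $C - \alpha_i - \beta_i = 0$:
$$L(w, \xi, \alpha, \beta) = \frac{1}{2} \Big\| \sum_{i=1}^{l} \alpha_i y_i x_i \Big\|^2 + \sum_{i=1}^{l} \alpha_i \Big( 1 - y_i \Big( \sum_{j=1}^{l} \alpha_j y_j x_j \Big)^{\!\top} x_i \Big).$$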
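Dual problem for SVMs: Lagrangian dual function (continued)
$$L(w, \xi, \alpha, \beta) = \frac{1}{2} \Big\| \sum_{i=1}^{l} \alpha_i y_i x_i \Big\|^2 + \sum_{i=1}^{l} \alpha_i \Big( 1 - y_i \Big( \sum_{j=1}^{l} \alpha_j y_j x_j \Big)^{\!\top} x_i \Big)$$
Expanding the squares and simplifying gives the dual function, which depends only on the dual variable $\alpha$:
$$g(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j x_j^\top x_i.$$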
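The substitution can be sanity-checked numerically: for any $\alpha$ with $0 \le \alpha_i \le C$, setting $\beta = C - \alpha$ and $w = \sum_i \alpha_i y_i x_i$ should make the Lagrangian equal to $g(\alpha)$ regardless of $\xi$. The sketch below uses random data; the sizes, the value of `C`, and the variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
l, n, C = 6, 3, 2.0                      # hypothetical sizes and C
X = rng.normal(size=(l, n))
y = rng.choice([-1.0, 1.0], size=l)
alpha = rng.uniform(0.0, C, size=l)      # any alpha with 0 <= alpha_i <= C
beta = C - alpha                         # from C - alpha_i - beta_i = 0
xi = rng.uniform(0.0, 3.0, size=l)       # arbitrary: the xi terms should cancel

def lagrangian(w, xi, alpha, beta):
    margins = y * (X @ w)
    return (0.5 * w @ w + C * xi.sum()
            + alpha @ (1.0 - margins - xi) + beta @ (-xi))

w = (alpha * y) @ X                      # stationarity in w
G = (y[:, None] * y[None, :]) * (X @ X.T)
g_alpha = alpha.sum() - 0.5 * alpha @ G @ alpha
lagrangian_value = lagrangian(w, xi, alpha, beta)
```

Changing `xi` to any other vector leaves `lagrangian_value` unchanged, which is exactly the cancellation used in Step 3.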
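Dual problem for SVMs: dual SVM optimisation problem
Step 4. Write down the optimisation problem to maximise the dual function w.r.t. the dual variables.
We have nonnegativity $\alpha_i \ge 0$ as well as the upper bound derived above, $\alpha_i \le C$, so the dual problem is
$$\max_{\alpha} \ g(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \langle x_j, x_i \rangle \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \ i = 1, \dots, l.$$
Plug in a kernel $\kappa(x_i, x_j) = \langle \phi(x_j), \phi(x_i) \rangle_F$ to obtain the dual SVM problem:
$$\max_{\alpha} \ g(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \kappa(x_i, x_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \ i = 1, \dots, l.$$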
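Since the dual has only box constraints (the intercept was dropped), one simple way to approximate its solution is projected gradient ascent: take a gradient step on $g$ and clip $\alpha$ back to $[0, C]$. This is a minimal sketch, not the solver used in the course; the toy data, the step size rule, the iteration count, and the function names are all assumptions.

```python
import numpy as np

def solve_dual_svm(K, y, C, n_iter=5000):
    """Projected gradient ascent on g(alpha) subject to 0 <= alpha_i <= C."""
    Q = (y[:, None] * y[None, :]) * K          # Q_ij = y_i y_j kappa(x_i, x_j)
    step = 1.0 / (np.linalg.eigvalsh(Q).max() + 1e-12)
    alpha = np.zeros(len(y))
    for _ in range(n_iter):
        gradient = 1.0 - Q @ alpha             # gradient of g at alpha
        alpha = np.clip(alpha + step * gradient, 0.0, C)
    return alpha

def dual_objective(alpha, K, y):
    Q = (y[:, None] * y[None, :]) * K
    return alpha.sum() - 0.5 * alpha @ Q @ alpha

# Hypothetical toy data: two Gaussian blobs, linear kernel, no intercept
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+1.5, 1.0, size=(20, 2)),
               rng.normal(-1.5, 1.0, size=(20, 2))])
y = np.concatenate([np.ones(20), -np.ones(20)])
C = 1.0
K = X @ X.T

alpha = solve_dual_svm(K, y, C)
w = (alpha * y) @ X                            # w = sum_i alpha_i y_i x_i
xi = np.maximum(0.0, 1.0 - y * (X @ w))        # smallest feasible slacks for this w
primal_value = 0.5 * w @ w + C * xi.sum()
dual_value = dual_objective(alpha, K, y)
```

For any feasible `alpha`, `dual_value` is a lower bound on `primal_value`; how close the two get depends on how long the iteration runs. With a nonlinear kernel only `K` changes, and `w` is then no longer formed explicitly.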
Properties of convex optimisation problems
We will look at further concepts to understand the properties of convex optimisation problems:
Weak and strong duality
Duality gap
Complementary slackness
KKT conditions
Weak and strong duality
Let $p^*$ and $d^*$ denote the primal and dual optimal values of an optimization problem.
Weak duality $d^* \le p^*$ always holds, even when the primal optimization problem is nonconvex.
Strong duality $d^* = p^*$ holds for special classes of convex optimization problems.
QPs have strong duality.
Duality gap
A pair $x, (\lambda, \nu)$, where $x$ is primal feasible and $(\lambda, \nu)$ is dual feasible, is called a primal-dual feasible pair.
For a primal-dual feasible pair, the quantity
$$f_0(x) - g(\lambda, \nu)$$
is called the duality gap.
A primal-dual feasible pair localizes the primal and dual optimal values,
$$g(\lambda, \nu) \le d^* \le p^* \le f_0(x),$$
into an interval whose width is given by the duality gap.
If the duality gap is zero, we know that $x$ is primal optimal and $(\lambda, \nu)$ is dual optimal.
We can use the duality gap as a stopping criterion for optimisation.
Stopping criterion for optimization
Suppose the algorithm generates a sequence of primal feasible $x^{(k)}$ and dual feasible $(\lambda^{(k)}, \nu^{(k)})$ solutions for $k = 1, 2, \dots$
Then the duality gap can be used as the stopping criterion: e.g. stop when
$$f_0(x^{(k)}) - g(\lambda^{(k)}, \nu^{(k)}) \le \epsilon$$
for some $\epsilon > 0$.
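The criterion can be sketched on the toy problem $\min x^2$ s.t. $1 - x \le 0$, whose dual function is $g(\lambda) = -\lambda^2/4 + \lambda$. The loop below runs projected ascent on $\lambda$ and builds a primal feasible point from each $\lambda$; the step size, the way the primal point is constructed, and the function names are assumptions made for illustration.

```python
def dual_function(lam):
    return -lam**2 / 4 + lam

def solve_with_gap_stopping(eps=1e-6, step=0.5, max_iter=1000):
    """Dual ascent on the toy problem, stopped by the duality gap."""
    lam = 0.0
    for k in range(max_iter):
        x = max(1.0, lam / 2)              # Lagrangian minimiser, pushed into x >= 1
        gap = x**2 - dual_function(lam)    # f0(x) - g(lambda), nonnegative
        if gap <= eps:
            return x, lam, gap, k
        lam = max(0.0, lam + step * (1 - lam / 2))   # projected ascent step on g
    return x, lam, gap, max_iter

x_final, lam_final, gap_final, iterations = solve_with_gap_stopping()
```

When the loop stops, the returned gap certifies that `x_final` is within `eps` of the optimal value, without needing to know $p^*$ in advance.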
Complementary slackness
Let $x^*$ be a primal optimal (and thus also feasible) solution and $(\lambda^*, \nu^*)$ a dual optimal (and thus also feasible) solution, and let strong duality hold, i.e. $d^* = p^*$.
Then, at the optimum,
$$d^* = g(\lambda^*, \nu^*) = \inf_x \Big\{ f_0(x) + \sum_{i=1}^{m} \lambda_i^* f_i(x) + \sum_{i=1}^{p} \nu_i^* h_i(x) \Big\} \le f_0(x^*) + \sum_{i=1}^{m} \lambda_i^* f_i(x^*) + \sum_{i=1}^{p} \nu_i^* h_i(x^*) \le f_0(x^*) = p^*.$$
First inequality: definition of the infimum. Second inequality: $x^*$ is a feasible solution and $\lambda^* \ge 0$.
Due to $d^* = p^*$, the inequalities must hold as equalities, so the penalty terms must sum to zero.
Complementary slackness (continued)
We have
$$\sum_{i=1}^{m} \lambda_i^* f_i(x^*) + \sum_{i=1}^{p} \nu_i^* h_i(x^*) = 0.$$
Since $h_i(x^*) = 0$, $i = 1, \dots, p$, we conclude that
$$\sum_{i=1}^{m} \lambda_i^* f_i(x^*) = 0,$$
and since each term is nonpositive,
$$\lambda_i^* f_i(x^*) = 0, \quad i = 1, \dots, m.$$
This condition is called complementary slackness.
Complementary slackness: intuition
At the optimum there cannot be slack in both the dual variable ($\lambda_i^* \ge 0$) and the constraint ($f_i(x^*) \le 0$) at the same time:
$$\lambda_i^* > 0 \ \Rightarrow \ f_i(x^*) = 0 \qquad \text{and} \qquad f_i(x^*) < 0 \ \Rightarrow \ \lambda_i^* = 0.$$
At the optimum, positive Lagrange multipliers are associated with active constraints.
Complementary slackness for the soft-margin SVM
Recall the Lagrangian of the soft-margin SVM:
$$L(w, \xi, \alpha, \beta) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{l} \xi_i + \sum_{i=1}^{l} \alpha_i (1 - y_i w^\top x_i - \xi_i) + \sum_{i=1}^{l} \beta_i (-\xi_i).$$
The complementary slackness condition gives, for all $i = 1, \dots, l$:
$$\alpha_i^* (1 - y_i w^{*\top} x_i - \xi_i^*) = 0, \qquad \beta_i^* \xi_i^* = 0.$$
Karush-Kuhn-Tucker (KKT) conditions
Summarizing, at the optimum the following conditions must hold:
Inequality constraints satisfied: $f_i(x^*) \le 0$, $i = 1, \dots, m$.
Equality constraints satisfied: $h_i(x^*) = 0$, $i = 1, \dots, p$.
Nonnegativity of the dual variables of the inequality constraints: $\lambda_i^* \ge 0$, $i = 1, \dots, m$.
Complementary slackness: $\lambda_i^* f_i(x^*) = 0$, $i = 1, \dots, m$.
Derivative of the Lagrangian vanishes:
$$\nabla_x L(x^*, \lambda^*, \nu^*) = \nabla f_0(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla f_i(x^*) + \sum_{i=1}^{p} \nu_i^* \nabla h_i(x^*) = 0.$$
These conditions are called the Karush-Kuhn-Tucker conditions.
For convex problems, satisfying the KKT conditions is sufficient as well as necessary for optimality.
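As a small worked check, the conditions can be evaluated for the toy problem $\min x^2$ s.t. $1 - x \le 0$, with Lagrangian $L(x, \lambda) = x^2 + \lambda(1 - x)$. The helper `kkt_residuals` and the two test points are illustrative assumptions, not part of the lecture.

```python
def kkt_residuals(x, lam):
    """KKT quantities for: minimise x^2 subject to 1 - x <= 0."""
    constraint = 1.0 - x                               # f_1(x), must be <= 0
    return {
        "primal_feasible": constraint <= 1e-12,
        "dual_feasible": lam >= 0.0,
        "complementary_slackness": lam * constraint,   # must equal 0
        "stationarity": 2.0 * x - lam,                 # dL/dx, must equal 0
    }

at_optimum = kkt_residuals(x=1.0, lam=2.0)    # the primal-dual optimum
not_optimal = kkt_residuals(x=2.0, lam=0.0)   # feasible, but not stationary
```

The point $(x, \lambda) = (2, 0)$ satisfies feasibility and complementary slackness yet fails stationarity, which shows that all the conditions are needed together.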
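KKT conditions for the soft-margin SVM
Inequality constraints satisfied: $1 - y_i w^{*\top} x_i - \xi_i^* \le 0$ and $-\xi_i^* \le 0$, $i = 1, \dots, l$.
No equality constraints to worry about.
Nonnegativity of the dual variables of the inequality constraints: $\alpha_i^* \ge 0$, $\beta_i^* \ge 0$, $i = 1, \dots, l$.
Complementary slackness: for all $i = 1, \dots, l$,
$$\alpha_i^* (1 - y_i w^{*\top} x_i - \xi_i^*) = 0, \qquad \beta_i^* \xi_i^* = 0.$$
Derivative of the Lagrangian vanishes:
$$\nabla_{(w, \xi)} L(w^*, \xi^*, \alpha^*, \beta^*) = \nabla_{(w, \xi)} \Big( \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{l} \xi_i + \sum_{i=1}^{l} \alpha_i^* (1 - y_i w^\top x_i - \xi_i) + \sum_{i=1}^{l} \beta_i^* (-\xi_i) \Big) = 0$$
$$\Rightarrow \ w^* = \sum_{i=1}^{l} \alpha_i^* y_i x_i, \qquad C - \alpha_i^* - \beta_i^* = 0 \ \text{ and hence } \ \alpha_i^* \le C, \ i = 1, \dots, l.$$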
KKT conditions for the soft-margin SVM: categorizing data points
By using the KKT conditions, we can categorize data points by their position w.r.t. the margin:
A. Too small margin, "difficult data point": $\alpha_i = C \ \Rightarrow \ y_i w^\top x_i \le 1$.
B. "At the boundary": $0 < \alpha_i < C \ \Rightarrow \ y_i w^\top x_i = 1$.
C. More than enough margin, "easy data point": $\alpha_i = 0 \ \Rightarrow \ y_i w^\top x_i \ge 1$.
(Figure: separating hyperplane $\langle w, x \rangle + b = 0$ with margin lines $\langle w, x \rangle + b = -1$ and $\langle w, x \rangle + b = +1$; data points labelled A, B and C, with slacks $\xi > 0$ and $\xi > 1$ marked.)
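Given a dual solution, the three categories can be read off the values of $\alpha_i$ alone. A minimal sketch, assuming NumPy; the function name `categorize_points` and the tolerance `tol` (needed because a numerical solver rarely returns exact 0 or C) are assumptions.

```python
import numpy as np

def categorize_points(alpha, C, tol=1e-8):
    """Label each training point A, B or C from its dual variable alpha_i."""
    alpha = np.asarray(alpha, dtype=float)
    labels = np.full(alpha.shape, "B", dtype="<U1")   # default: 0 < alpha_i < C
    labels[alpha >= C - tol] = "A"                    # alpha_i = C: margin at most 1
    labels[alpha <= tol] = "C"                        # alpha_i = 0: margin at least 1
    return labels

example = categorize_points([0.0, 0.3, 1.0, 1e-12], C=1.0)
```

Points labelled A or B are the ones with $\alpha_i > 0$, so they are the only ones that contribute to $w = \sum_i \alpha_i y_i x_i$.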
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your onepage crib sheet. No calculators or electronic items.
More informationLinear vs Nonlinear classifier. CS789: Machine Learning and Neural Network. Introduction
Linear vs Nonlinear classifier CS789: Machine Learning and Neural Network Support Vector Machine Jakramate Bootkrajang Department of Computer Science Chiang Mai University Linear classifier is in the
More informationSupport Vector Machine (SVM) & Kernel CE717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Support Vector Machine (SVM) & Kernel CE717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Linear classifier Which classifier? x 2 x 1 2 Linear classifier Margin concept x 2
More informationMax MarginClassifier
Max MarginClassifier Oliver Schulte  CMPT 726 Bishop PRML Ch. 7 Outline Maximum Margin Criterion Math Maximizing the Margin NonSeparable Data Kernels and Nonlinear Mappings Where does the maximization
More informationApplied inductive learning  Lecture 7
Applied inductive learning  Lecture 7 Louis Wehenkel & Pierre Geurts Department of Electrical Engineering and Computer Science University of Liège Montefiore  Liège  November 5, 2012 Find slides: http://montefiore.ulg.ac.be/
More informationConvex Optimization and Modeling
Convex Optimization and Modeling Duality Theory and Optimality Conditions 5th lecture, 12.05.2010 Jun.Prof. Matthias Hein Program of today/next lecture Lagrangian and duality: the Lagrangian the dual
More informationEE/AA 578, Univ of Washington, Fall Duality
7. Duality EE/AA 578, Univ of Washington, Fall 2016 Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized
More informationSupport Vector Machine (continued)
Support Vector Machine continued) Overlapping class distribution: In practice the classconditional distributions may overlap, so that the training data points are no longer linearly separable. We need
More informationLecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem
Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem Michael Patriksson 00 The Relaxation Theorem 1 Problem: find f := infimum f(x), x subject to x S, (1a) (1b) where f : R n R
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Support Vector Machines Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique
More informationInterior Point Algorithms for Constrained Convex Optimization
Interior Point Algorithms for Constrained Convex Optimization Chee Wei Tan CS 8292 : Advanced Topics in Convex Optimization and its Applications Fall 2010 Outline Inequality constrained minimization problems
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationNonlinear Support Vector Machines
Nonlinear Support Vector Machines Andrea Passerini passerini@disi.unitn.it Machine Learning Nonlinear Support Vector Machines Nonlinearly separable problems Hardmargin SVM can address linearly separable
More informationLecture 7: Convex Optimizations
Lecture 7: Convex Optimizations Radu Balan, David Levermore March 29, 2018 Convex Sets. Convex Functions A set S R n is called a convex set if for any points x, y S the line segment [x, y] := {tx + (1
More informationCS798: Selected topics in Machine Learning
CS798: Selected topics in Machine Learning Support Vector Machine Jakramate Bootkrajang Department of Computer Science Chiang Mai University Jakramate Bootkrajang CS798: Selected topics in Machine Learning
More information10701 Recitation 5 Duality and SVM. Ahmed Hefny
10701 Recitation 5 Duality and SVM Ahmed Hefny Outline Langrangian and Duality The Lagrangian Duality Eamples Support Vector Machines Primal Formulation Dual Formulation Soft Margin and Hinge Loss Lagrangian
More informationSoftmargin SVM can address linearly separable problems with outliers
Nonlinear Support Vector Machines Nonlinearly separable problems Hardmargin SVM can address linearly separable problems Softmargin SVM can address linearly separable problems with outliers Nonlinearly
More informationConvex Optimization Overview (cnt d)
Conve Optimization Overview (cnt d) Chuong B. Do November 29, 2009 During last week s section, we began our study of conve optimization, the study of mathematical optimization problems of the form, minimize
More informationSubgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus
1/41 Subgradient Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes definition subgradient calculus duality and optimality conditions directional derivative Basic inequality
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines Shivani Agarwal Support Vector Machines (SVMs) Algorithm for learning linear classifiers Motivated by idea of maximizing margin Efficient extension to nonlinear
More informationMath 273a: Optimization Subgradients of convex functions
Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions
More informationSupport Vector Machines
EE 17/7AT: Optimization Models in Engineering Section 11/1  April 014 Support Vector Machines Lecturer: Arturo Fernandez Scribe: Arturo Fernandez 1 Support Vector Machines Revisited 1.1 Strictly) Separable
More informationOptimality Conditions for Constrained Optimization
72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)
More informationSupport Vector Machine I
Support Vector Machine I Statistical Data Analysis with Positive Definite Kernels Kenji Fukumizu Institute of Statistical Mathematics, ROIS Department of Statistical Science, Graduate University for Advanced
More informationKernel Machines. Pradeep Ravikumar Coinstructor: Manuela Veloso. Machine Learning
Kernel Machines Pradeep Ravikumar Coinstructor: Manuela Veloso Machine Learning 10701 SVM linearly separable case n training points (x 1,, x n ) d features x j is a ddimensional vector Primal problem:
More informationOptimization for Communications and Networks. Poompat Saengudomlert. Session 4 Duality and Lagrange Multipliers
Optimization for Communications and Networks Poompat Saengudomlert Session 4 Duality and Lagrange Multipliers P Saengudomlert (2015) Optimization Session 4 1 / 14 24 Dual Problems Consider a primal convex
More informationMachine Learning And Applications: Supervised LearningSVM
Machine Learning And Applications: Supervised LearningSVM Raphaël Bournhonesque École Normale Supérieure de Lyon, Lyon, France raphael.bournhonesque@enslyon.fr 1 Supervised vs unsupervised learning Machine
More informationShiqian Ma, MAT258A: Numerical Optimization 1. Chapter 4. Subgradient
Shiqian Ma, MAT258A: Numerical Optimization 1 Chapter 4 Subgradient Shiqian Ma, MAT258A: Numerical Optimization 2 4.1. Subgradients definition subgradient calculus duality and optimality conditions Shiqian
More informationConvex Functions. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)
Convex Functions Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470  Convex Optimization Fall 201718, HKUST, Hong Kong Outline of Lecture Definition convex function Examples
More informationMachine Learning A Geometric Approach
Machine Learning A Geometric Approach CIML book Chap 7.7 Linear Classification: Support Vector Machines (SVM) Professor Liang Huang some slides from Alex Smola (CMU) Linear Separator Ham Spam From Perceptron
More informationApplications of Linear Programming
Applications of Linear Programming lecturer: András London University of Szeged Institute of Informatics Department of Computational Optimization Lecture 9 Nonlinear programming In case of LP, the goal
More informationCSE4830 Kernel Methods in Machine Learning
CSE4830 Kernel Methods in Machine Learning Lecture 5: Multiclass and preference learning Juho Rousu 11. October, 2017 Juho Rousu 11. October, 2017 1 / 37 Agenda from now on: This week s theme: going
More informationLecture 3. Optimization Problems and Iterative Algorithms
Lecture 3 Optimization Problems and Iterative Algorithms January 13, 2016 This material was jointly developed with Angelia Nedić at UIUC for IE 598ns Outline Special Functions: Linear, Quadratic, Convex
More informationLecture 3 January 28
EECS 28B / STAT 24B: Advanced Topics in Statistical LearningSpring 2009 Lecture 3 January 28 Lecturer: Pradeep Ravikumar Scribe: Timothy J. Wheeler Note: These lecture notes are still rough, and have only
More informationTutorial on Convex Optimization: Part II
Tutorial on Convex Optimization: Part II Dr. Khaled Ardah Communications Research Laboratory TU Ilmenau Dec. 18, 2018 Outline Convex Optimization Review Lagrangian Duality Applications Optimal Power Allocation
More informationOptimization for Machine Learning
Optimization for Machine Learning (Problems; Algorithms  A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017) Course materials http://suvrit.de/teaching.html
More informationCONVEX OPTIMIZATION, DUALITY, AND THEIR APPLICATION TO SUPPORT VECTOR MACHINES. Contents 1. Introduction 1 2. Convex Sets
CONVEX OPTIMIZATION, DUALITY, AND THEIR APPLICATION TO SUPPORT VECTOR MACHINES DANIEL HENDRYCKS Abstract. This paper develops the fundamentals of convex optimization and applies them to Support Vector
More informationISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints
ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints Instructor: Prof. Kevin Ross Scribe: Nitish John October 18, 2011 1 The Basic Goal The main idea is to transform a given constrained
More informationNumerical optimization
Numerical optimization Lecture 4 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of nonrigid shapes Stanford University, Winter 2009 2 Longest Slowest Shortest Minimal Maximal
More informationSupport Vector Machines
Support Vector Machines Statistical Inference with Reproducing Kernel Hilbert Space Kenji Fukumizu Institute of Statistical Mathematics, ROIS Department of Statistical Science, Graduate University for
More informationPart IB Optimisation
Part IB Optimisation Theorems Based on lectures by F. A. Fischer Notes taken by Dexter Chua Easter 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after
More informationA : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution:
15: Leastsquares I A : k n. Usually k > n otherwise easily the minimum is zero. Analytical solution: f (x) =(Ax b) T (Ax b) =x T A T Ax 2b T Ax + b T b f (x) = 2A T Ax 2A T b = 0 ChihJen Lin (National
More informationQuiz Discussion. IE417: Nonlinear Programming: Lecture 12. Motivation. Why do we care? Jeff Linderoth. 16th March 2006
Quiz Discussion IE417: Nonlinear Programming: Lecture 12 Jeff Linderoth Department of Industrial and Systems Engineering Lehigh University 16th March 2006 Motivation Why do we care? We are interested in
More informationEE364a Review Session 5
EE364a Review Session 5 EE364a Review announcements: homeworks 1 and 2 graded homework 4 solutions (check solution to additional problem 1) scpd phonein office hours: tuesdays 67pm (6507231156) 1 Complementary
More informationConvex Optimization M2
Convex Optimization M2 Lecture 8 A. d Aspremont. Convex Optimization M2. 1/57 Applications A. d Aspremont. Convex Optimization M2. 2/57 Outline Geometrical problems Approximation problems Combinatorial
More informationCSC 411 Lecture 17: Support Vector Machine
CSC 411 Lecture 17: Support Vector Machine Ethan Fetaya, James Lucas and Emad Andrews University of Toronto CSC411 Lec17 1 / 1 Today Maxmargin classification SVM Hard SVM Duality Soft SVM CSC411 Lec17
More informationLinear Support Vector Machine. Classification. Linear SVM. Huiping Cao. Huiping Cao, Slide 1/26
Huiping Cao, Slide 1/26 Classification Linear SVM Huiping Cao linear hyperplane (decision boundary) that will separate the data Huiping Cao, Slide 2/26 Support Vector Machines rt Vector Find a linear Machines
More informationsubject to (x 2)(x 4) u,
Exercises Basic definitions 5.1 A simple example. Consider the optimization problem with variable x R. minimize x 2 + 1 subject to (x 2)(x 4) 0, (a) Analysis of primal problem. Give the feasible set, the
More information