Convex Optimization and SVM


Problem 0. Cf. lecture notes, pages 12 to 18.

Problem 1.

(i) A slab is an intersection of two half-spaces, hence convex.

(ii) A wedge is an intersection of two half-spaces, hence convex.

(iii) Since
$$\{x : \|x - x_0\|_2 \le \|x - y\|_2 \ \text{for all } y \in S\} \;=\; \bigcap_{y \in S} \{x : \|x - x_0\|_2 \le \|x - y\|_2\},$$
the set is convex as an intersection of half-spaces.

(iv) Not convex in general. Take for instance $S = \{-1, 1\}$ and $T = \{0\}$.

(v) Convex! Indeed, $x + S_2 \subseteq S_1$ if and only if $x + y \in S_1$ for all $y \in S_2$. Thus
$$\{x : x + S_2 \subseteq S_1\} \;=\; \bigcap_{y \in S_2} \{x : x + y \in S_1\} \;=\; \bigcap_{y \in S_2} (S_1 - y),$$
where each translated set $S_1 - y$ is convex, so the intersection is convex.

Problem 2.

(i) The Lagrangian of the LP
$$\text{minimize } c^T x \quad \text{subject to } Ax \preceq b$$
is
$$L(x, \lambda) = c^T x + \lambda^T (Ax - b) = -b^T \lambda + (A^T \lambda + c)^T x.$$
The dual function is
$$g(\lambda) = \inf_x L(x, \lambda) = -b^T \lambda + \inf_x\, (A^T \lambda + c)^T x =
\begin{cases} -b^T \lambda & \text{if } A^T \lambda + c = 0,\\ -\infty & \text{otherwise.} \end{cases}$$
The Lagrange dual problem is
$$\text{maximize } -b^T \lambda \quad \text{subject to } A^T \lambda + c = 0, \ \lambda \succeq 0.$$
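Before moving on to the dual of the dual in (ii), here is a quick numerical sanity check of the primal/dual pair derived above (a sketch of my own, not part of the original solution): on a randomly generated LP that is feasible for both problems by construction, the two optimal values coincide, as strong duality for LPs predicts. The use of scipy.optimize.linprog and the data-generation scheme are illustrative choices.

```python
# Verify numerically that the LP dual derived in (i) attains the same value
# as the primal:  minimize c^T x s.t. Ax <= b   vs.
#                 maximize -b^T lam s.t. A^T lam + c = 0, lam >= 0.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 20, 5
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + rng.uniform(0.1, 1.0, m)  # primal feasible by construction
c = -A.T @ rng.uniform(0.1, 1.0, m)                         # dual feasible by construction

# Primal LP (x is a free variable).
primal = linprog(c, A_ub=A, b_ub=b, bounds=[(None, None)] * n)

# Dual LP, written as a minimization of b^T lam (linprog minimizes).
dual = linprog(b, A_eq=A.T, b_eq=-c, bounds=[(0, None)] * m)

print("primal optimal value:", primal.fun)
print("dual optimal value:  ", -dual.fun)  # flip the sign back to the maximisation value
```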

(ii) We now derive the dual of the dual problem. To start, we rewrite the dual problem in standard (minimisation) form:
$$\text{minimize } b^T \lambda \quad \text{subject to } A^T \lambda + c = 0, \ \lambda \succeq 0.$$
The Lagrangian of this optimisation problem is
$$L(\lambda, \alpha, \beta) = b^T \lambda + \alpha^T (A^T \lambda + c) - \beta^T \lambda = (b - \beta + A\alpha)^T \lambda + c^T \alpha,$$
with $\beta \succeq 0$. The Lagrange dual function is
$$g(\alpha, \beta) = \inf_\lambda \left\{ (b - \beta + A\alpha)^T \lambda + c^T \alpha \right\} =
\begin{cases} c^T \alpha & \text{if } A\alpha + b - \beta = 0,\\ -\infty & \text{otherwise.} \end{cases}$$
The dual of the dual is thus
$$\text{maximize } c^T \alpha \quad \text{subject to } A\alpha + b - \beta = 0, \ \beta \succeq 0,$$
which is equivalent to
$$\text{minimize } -c^T \alpha \quad \text{subject to } A\alpha = \beta - b, \ \beta \succeq 0,$$
which, in turn (substituting $x = -\alpha$, so that $Ax = b - \beta \preceq b$), is equivalent to the original LP problem
$$\text{minimize } c^T x \quad \text{subject to } Ax \preceq b.$$

(iii) Using the weaker form of Slater's condition (affine constraints only need to be feasible), strong duality holds for any LP provided the primal is feasible. Applying this to the dual, strong duality also holds for LPs whenever the dual is feasible. The only possible case in which strong duality can fail is when both the primal and the dual are infeasible, for which $p^\star = +\infty$ and $d^\star = -\infty$.
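This failure case can be made concrete with a one-dimensional instance (my own illustration, not from the original solution): take $A = [0]$, $b = [-1]$, $c = [1]$. The primal constraint $0 \cdot x \le -1$ is infeasible, and the dual constraint $0 \cdot \lambda + 1 = 0$ is infeasible as well, so $p^\star = +\infty$ and $d^\star = -\infty$. A quick check with scipy (names and usage are illustrative):

```python
# A tiny LP for which both the primal and the dual are infeasible.
import numpy as np
from scipy.optimize import linprog

A, b, c = np.array([[0.0]]), np.array([-1.0]), np.array([1.0])

# Primal: minimize c^T x subject to A x <= b (x free).
primal = linprog(c, A_ub=A, b_ub=b, bounds=[(None, None)])

# Dual: maximize -b^T lam subject to A^T lam + c = 0, lam >= 0,
# written here as a minimization of b^T lam.
dual = linprog(b, A_eq=A.T, b_eq=-c, bounds=[(0, None)])

print(primal.status, primal.message)  # status 2: infeasible
print(dual.status, dual.message)      # status 2: infeasible
```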

Problem 3.

(i) The functions $g_1$ and $g_2$ are polynomials on $\mathbb{R}^2$. A quick glance at their Hessians shows that they are strictly convex functions. The sets $C_j = \{x \in \mathbb{R}^2 : g_j(x) \le 0\}$ are both convex. The feasible set is thus convex, as the intersection of two convex sets.

(ii) The KKT conditions are
$$2(\lambda_1 + \lambda_2)x + 1 - 2\lambda_1 + 8\lambda_2 = 0,$$
$$2(\lambda_1 + \lambda_2)y + 1 + 6\lambda_2 = 0,$$
$$\lambda_1 g_1(x, y) = 0, \qquad \lambda_2 g_2(x, y) = 0,$$
together with primal feasibility ($g_1(x,y) \le 0$, $g_2(x,y) \le 0$) and dual feasibility ($\lambda_1, \lambda_2 \ge 0$).

(iii) Check case by case, depending on the values of $\lambda_1, \lambda_2$: if $\lambda_1 = \lambda_2 = 0$, if $\lambda_1 = 0, \lambda_2 > 0$, or if $\lambda_1 > 0, \lambda_2 > 0$, there is no solution. The only solution occurs when $\lambda_1 > 0, \lambda_2 = 0$, which is
$$(x^\star, y^\star) = \left(1 - \tfrac{\sqrt{2}}{2},\; -\tfrac{\sqrt{2}}{2}\right), \quad \text{for } \lambda_1^\star = -\frac{1}{2y^\star}.$$

Problem 4.

(i) Linear programming.

(ii) Quadratic programming:
$$\text{minimize } \|x_1 - x_2\|_2^2 \quad \text{subject to } A_1 x_1 \preceq b_1, \ A_2 x_2 \preceq b_2.$$

Problem 6.

(i) The feasible set of the LP relaxation contains the feasible set of the Boolean LP.

(ii) It follows from (i) that the Boolean LP is infeasible if the LP relaxation is infeasible.

(iii) The Lagrangian function is
$$L(x, \lambda, \nu) = x^T \operatorname{diag}(\nu)\, x + (c + A^T \lambda - \nu)^T x - b^T \lambda.$$
Minimising over $x$, we obtain the Lagrange dual function
$$g(\lambda, \nu) = -\frac{1}{4} \sum_i \frac{(c_i + a_i^T \lambda - \nu_i)^2}{\nu_i} - b^T \lambda$$
if $\nu \succeq 0$, and $-\infty$ otherwise, where $a_i$ denotes the $i$-th column of $A$.

(iv) The dual of the LP relaxation problem can be found to be
$$\text{maximize } -b^T u - \mathbf{1}^T w \quad \text{subject to } A^T u + c + w = v, \ u \succeq 0, \ v \succeq 0, \ w \succeq 0.$$
A careful comparison between this problem and the dual problem derived in (iii) shows that they are equivalent: they give the same optimal value.
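A small numerical illustration of (i) and (ii) (a sketch of my own, not part of the original solution): on a random instance that is feasible by construction, the LP relaxation gives a lower bound on the Boolean LP, whose optimum is found here by brute-force enumeration. The data generation and the use of scipy are illustrative choices.

```python
# Compare a Boolean LP (solved by brute force) with its LP relaxation:
# the relaxation's optimal value can only be lower or equal.
import itertools
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 8, 6
A = rng.standard_normal((m, n))
x0 = rng.integers(0, 2, n)                 # a Boolean point ...
b = A @ x0 + rng.uniform(0.1, 1.0, m)      # ... kept feasible by construction
c = rng.standard_normal(n)

# LP relaxation: minimize c^T x subject to A x <= b, 0 <= x <= 1.
relax = linprog(c, A_ub=A, b_ub=b, bounds=[(0, 1)] * n)

# Boolean LP: enumerate all x in {0, 1}^n and keep the feasible ones.
best = np.inf
for bits in itertools.product([0, 1], repeat=n):
    x = np.array(bits)
    if np.all(A @ x <= b):
        best = min(best, c @ x)

print("LP relaxation value:", relax.fun)
print("Boolean LP value:   ", best)
assert relax.fun <= best + 1e-9            # the relaxation is a lower bound
```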

Problem 7.

(i) Points inside the tube $[y(x_i) - \epsilon,\ y(x_i) + \epsilon]$ have slack variables equal to zero. Points are allowed to lie outside the $\epsilon$-tube provided that their slack variables are non-zero, in which case the penalty is linear:
$$t_i > y(x_i) + \epsilon \;\Rightarrow\; \xi_i = t_i - y(x_i) - \epsilon = \text{penalty induced},$$
$$t_i < y(x_i) - \epsilon \;\Rightarrow\; \hat{\xi}_i = y(x_i) - t_i - \epsilon = \text{penalty induced}.$$
The error function
$$C \sum_i (\xi_i + \hat{\xi}_i) + \frac{1}{2} \|\beta\|^2$$
must be minimised subject to the non-negativity of the slack variables and to $t_i \le y(x_i) + \epsilon + \xi_i$, $t_i \ge y(x_i) - \epsilon - \hat{\xi}_i$.

(ii) The Lagrangian is
$$L(\beta_0, \beta, \xi, \hat{\xi}, \lambda, \hat{\lambda}, \nu, \hat{\nu}) = C \sum_i (\xi_i + \hat{\xi}_i) + \frac{1}{2} \|\beta\|^2 - \sum_i (\lambda_i \xi_i + \hat{\lambda}_i \hat{\xi}_i) - \sum_i \nu_i (\epsilon + \xi_i + y(x_i) - t_i) - \sum_i \hat{\nu}_i (\epsilon + \hat{\xi}_i - y(x_i) + t_i).$$
The primal conditions are $\xi_i \ge 0$, $\hat{\xi}_i \ge 0$, $t_i \le y(x_i) + \epsilon + \xi_i$ and $t_i \ge y(x_i) - \epsilon - \hat{\xi}_i$. The dual conditions are $\lambda_i, \hat{\lambda}_i, \nu_i, \hat{\nu}_i \ge 0$. Complementary slackness ensures $\lambda_i \xi_i = 0$, $\hat{\lambda}_i \hat{\xi}_i = 0$, $\nu_i(\epsilon + \xi_i + y(x_i) - t_i) = 0$ and $\hat{\nu}_i(\epsilon + \hat{\xi}_i - y(x_i) + t_i) = 0$. The gradient of the Lagrangian vanishes:
$$\frac{\partial L}{\partial \beta_0} = -\sum_i (\nu_i - \hat{\nu}_i) = 0, \qquad \frac{\partial L}{\partial \beta} = \beta - \sum_i (\nu_i - \hat{\nu}_i)\, x_i = 0,$$
$$\frac{\partial L}{\partial \xi_i} = C - \lambda_i - \nu_i = 0, \qquad \frac{\partial L}{\partial \hat{\xi}_i} = C - \hat{\lambda}_i - \hat{\nu}_i = 0.$$

(iii) Check that the dual problem reduces to
$$\text{maximize } -\frac{1}{2} \sum_{i,j} (\nu_i - \hat{\nu}_i)(\nu_j - \hat{\nu}_j)\, x_i^T x_j - \epsilon \sum_i (\nu_i + \hat{\nu}_i) + \sum_i (\nu_i - \hat{\nu}_i)\, t_i$$
$$\text{subject to } 0 \le \nu_i \le C, \quad 0 \le \hat{\nu}_i \le C,$$
together with $\sum_i (\nu_i - \hat{\nu}_i) = 0$, which follows from the stationarity condition for $\beta_0$.
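To make the penalty structure of (i) concrete, here is a minimal numpy sketch (my own illustration, not part of the original solution) that computes the slack variables and the regularised error function for a linear model $y(x) = x^T \beta + \beta_0$; the function name and the data are illustrative.

```python
# Epsilon-insensitive error of Problem 7(i): slacks are zero inside the tube
# and grow linearly with the distance to the tube outside of it.
import numpy as np

def svr_objective(beta, beta0, X, t, C=1.0, eps=0.1):
    """C * sum(xi + xi_hat) + 0.5 * ||beta||^2 for a linear model."""
    y = X @ beta + beta0                    # model predictions y(x_i)
    xi = np.maximum(0.0, t - y - eps)       # slack for targets above the tube
    xi_hat = np.maximum(0.0, y - t - eps)   # slack for targets below the tube
    return C * np.sum(xi + xi_hat) + 0.5 * np.dot(beta, beta)

# Tiny example: only the third point lies outside the tube and is penalised.
X = np.array([[0.0], [1.0], [2.0]])
t = np.array([0.05, 1.0, 2.5])
print(svr_objective(np.array([1.0]), 0.0, X, t, C=1.0, eps=0.1))  # 0.5*1 + 1.0*0.4
```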

(iv) $y^\star(x_i) = x_i^T \beta^\star + \beta_0 = \sum_{j=1}^{n} (\nu_j - \hat{\nu}_j)\, x_i^T x_j + \beta_0$.

(v) Points with $\nu_i > 0$ lie on or above the upper boundary of the tube; points with $\hat{\nu}_i > 0$ lie on or below the lower boundary. For each observation outside the tube, either $\nu_i$ or $\hat{\nu}_i$ is non-zero; they cannot both be non-zero at the same time (check the complementary slackness conditions). The support vectors are the points for which either $\nu_i$ or $\hat{\nu}_i$ is non-zero. Points strictly inside the tube have $\nu_i = \hat{\nu}_i = 0$ and do not contribute to the solution.

(vi) Pick a point on the upper boundary of the tube, i.e. one for which $0 < \nu_i < C$ (so that $\lambda_i = C - \nu_i > 0$, hence $\xi_i = 0$ and $\epsilon + y(x_i) - t_i = 0$). For such a point,
$$\beta_0 = t_i - \epsilon - \sum_{j=1}^{n} (\nu_j - \hat{\nu}_j)\, x_i^T x_j,$$
and average over all such points.
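As an illustration of (iv) and (v) (a sketch of my own, not part of the original solution), scikit-learn's SVR solves this dual for a linear kernel: dual_coef_ stores the non-zero differences $\nu_j - \hat{\nu}_j$, support_ indexes the points on or outside the $\epsilon$-tube, and intercept_ plays the role of $\beta_0$. The data and parameter values below are arbitrary.

```python
# Fit a linear epsilon-SVR and check that no point strictly inside the tube
# is a support vector, as stated in Problem 7(v).
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.linspace(0, 4, 30).reshape(-1, 1)
t = 0.8 * X.ravel() + 0.3 + rng.normal(0.0, 0.15, 30)   # noisy linear targets

eps = 0.2
model = SVR(kernel="linear", C=1.0, epsilon=eps).fit(X, t)

residual = np.abs(t - model.predict(X))
inside = residual < eps - 1e-3          # strictly inside the tube (with solver tolerance)
support = np.zeros(len(t), dtype=bool)
support[model.support_] = True          # indices of the support vectors

assert not np.any(inside & support)
print("support vectors:", model.support_.size, "of", len(t))
print("dual coefficients (nu_j - nu_hat_j):", model.dual_coef_.ravel())
print("intercept beta_0:", model.intercept_)
```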