
1 DUALITY

2 WHY DUALITY? For unconstrained problems $\min_x f(x)$ we have gradient descent, Newton's method, quasi-Newton, conjugate gradients, etc. But what if $f(x)$ is non-differentiable? And what about constrained problems: $\min_x f(x)$ subject to $g(x) \le 0$, $h(x) = 0$?

3 LAGRANGIAN Simple case: $\min_x f(x)$ subject to $Ax - b = 0$. Saddle-point form: $\min_x \max_\lambda\, f(x) + \langle \lambda, Ax - b \rangle$. The function $L(x, \lambda) = f(x) + \langle \lambda, Ax - b \rangle$ is the Lagrangian.

4 SADDLE-POINT FORM Simple case: $\min_x f(x)$ subject to $Ax - b = 0$. The inner maximization enforces the constraint: $\max_\lambda f(x) + \langle \lambda, Ax - b \rangle = \begin{cases} f(x), & Ax - b = 0 \\ \infty, & \text{otherwise,} \end{cases}$ and therefore $\min_x \max_\lambda f(x) + \langle \lambda, Ax - b \rangle = \min_{x:\,Ax=b} f(x)$.

5 INEQUALITY CONSTRAINTS $\min_x f(x)$ subject to $Ax - b = 0$, $Cx - d \le 0$. Saddle-point form: $\min_x \max_{\lambda,\, \mu \ge 0} f(x) + \langle \lambda, Ax - b \rangle + \langle \mu, Cx - d \rangle$, with a non-negativity constraint on $\mu$. Why does this work? When $Cx - d \le 0$ the maximizing $\mu$ is $0$ and the penalty vanishes; when any component of $Cx - d$ is positive, the max over $\mu \ge 0$ is $\infty$.

6 GENERAL FORM LAGRANGIAN $\min_x f(x)$ subject to $g(x) \le 0$, $h(x) = 0$. Lagrangian: $L(x, \lambda, \mu) = f(x) + \langle \lambda, h(x) \rangle + \langle \mu, g(x) \rangle$. Saddle-point formulation: $f(x^\star) = \min_x \max_{\lambda,\, \mu \ge 0} f(x) + \langle \lambda, h(x) \rangle + \langle \mu, g(x) \rangle$.
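As a sanity check (my own toy example, not from the slides), consider $\min_x x^2$ subject to $x - 1 = 0$; working the saddle point by hand:

```latex
% Toy example: minimize x^2 subject to x - 1 = 0.
\[
  L(x,\lambda) = x^2 + \lambda (x - 1), \qquad
  \partial_x L = 2x + \lambda = 0 \;\Rightarrow\; x = -\lambda/2 .
\]
% Dual function and its maximization:
\[
  d(\lambda) = \min_x L(x,\lambda) = -\tfrac{\lambda^2}{4} - \lambda, \qquad
  d'(\lambda) = -\tfrac{\lambda}{2} - 1 = 0 \;\Rightarrow\; \lambda^\star = -2 .
\]
% Recovering the primal solution: x^* = -lambda^*/2 = 1, and d(lambda^*) = 1 = f(x^*).
```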

7 CALCULUS INTERPRETATION $\min_x f(x)$ subject to $h(x) = 0$. At a solution, the gradient of the objective is parallel to the gradient of the constraint: $\nabla f(x) = -\lambda \nabla h(x)$, i.e. $\nabla f(x) + \lambda \nabla h(x) = 0$. (Figure: contours of the objective tangent to a linear constraint.)

8 CALCULUS INTERPRETATION $\min_x f(x)$ subject to $h(x) = 0$; gradient of objective parallel to gradient of constraint: $\nabla f(x) = -\lambda \nabla h(x)$, i.e. $\nabla f(x) + \lambda \nabla h(x) = 0$. The same condition falls out of the Lagrangian $L(x, \lambda) = f(x) + \langle \lambda, h(x) \rangle$: optimality of $\min_x$ requires $\nabla_x L(x, \lambda) = \nabla f(x) + \lambda \nabla h(x) = 0$.

9 INEQUALITY CONDITIONS $\min_x f(x)$ subject to $g(x) \le 0$. $L(x, \mu) = f(x) + \langle \mu, g(x) \rangle$, and $\min_x \max_{\mu \ge 0} f(x) + \langle \mu, g(x) \rangle$. Inactive constraint: $\mu = 0$. Active constraint: $\mu > 0$. (Figure: $x^\star$ in the interior of the feasible set vs. $x^\star$ on the constraint boundary.)

10 INEQUALITY $\nabla_x L(x, \mu) = \nabla f(x) + \mu \nabla g(x) = 0$. Inactive: $\mu = 0$. Active: $\mu > 0$.

11 OPTIMALITY CONDITIONS: KKT SYSTEM $\min_x f(x)$ subject to $g(x) \le 0$, $h(x) = 0$, with $L(x, \lambda, \mu) = f(x) + \langle \lambda, h(x) \rangle + \langle \mu, g(x) \rangle$. The necessary conditions (Karush-Kuhn-Tucker): $\nabla f(x^\star) + \lambda \nabla h(x^\star) + \mu \nabla g(x^\star) = 0$ (primal/dual optimality); $g(x^\star) \le 0$, $h(x^\star) = 0$ (primal feasibility); $\mu \ge 0$ (dual feasibility); $\mu_i\, g_i(x^\star) = 0$ (complementary slackness).
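A minimal numerical illustration (my own sketch, not from the slides): for $\min x_1^2 + x_2^2$ subject to $1 - x_1 - x_2 \le 0$, the solution is $x^\star = (1/2, 1/2)$ with multiplier $\mu = 1$, and each KKT condition can be checked directly:

```python
import numpy as np

# Toy problem: minimize f(x) = x1^2 + x2^2  subject to  g(x) = 1 - x1 - x2 <= 0.
# Known solution (by symmetry): x* = (1/2, 1/2) with multiplier mu = 1.
x_star = np.array([0.5, 0.5])
mu = 1.0

grad_f = 2 * x_star                 # gradient of the objective at x*
grad_g = np.array([-1.0, -1.0])     # gradient of g(x) = 1 - x1 - x2
g_val = 1 - x_star.sum()

print("stationarity:", grad_f + mu * grad_g)   # -> [0. 0.]
print("primal feasibility:", g_val <= 0)       # -> True (g = 0: active)
print("dual feasibility:", mu >= 0)            # -> True
print("complementary slackness:", mu * g_val)  # -> 0.0
```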

12 DUAL FUNCTION $d(\lambda, \mu) = \min_x L(x, \lambda, \mu) = \min_x f(x) + \langle \lambda, h(x) \rangle + \langle \mu, g(x) \rangle$. The dual is concave. Why? (For each fixed $x$, $L$ is affine in $(\lambda, \mu)$, and a pointwise minimum of affine functions is concave.) Does this depend on convexity of $f$? (No.)

13 DUAL FUNCTION $d(\lambda, \mu) = \min_x L(x, \lambda, \mu) = \min_x f(x) + \langle \lambda, h(x) \rangle + \langle \mu, g(x) \rangle$. For $\mu \ge 0$, the dual is a lower bound on the optimal objective: $\min_x f(x) + \langle \lambda, h(x) \rangle + \langle \mu, g(x) \rangle \le f(x^\star)$. Why? Because the optimal $x^\star$ satisfies the constraints, so $h(x^\star) = 0$ and $\langle \mu, g(x^\star) \rangle \le 0$.

14 GEOMETRIC INTERPRETATION OF LOWER BOUND $\min_x f(x)$ subject to $g(x) \le 0$, $h(x) = 0$. Make the constraints implicit via characteristic functions: $f(x) + \chi_0(h(x)) + \chi_-(g(x))$, where $\chi_-(z) = \begin{cases} 0, & z \le 0 \\ \infty, & \text{otherwise} \end{cases}$ and $\chi_0$ is $0$ at $z = 0$ and $\infty$ elsewhere. (Figure: graph of the characteristic function.)

15 GEOMETRIC INTERPRETATION $\min_x f(x)$ subject to $g(x) \le 0$, $h(x) = 0$. Implicit constraints: $f(x) + \chi_0(h(x)) + \chi_-(g(x))$. Each characteristic function has a linear lower bound: $\chi_-(z) \ge \langle z, \mu \rangle$ for any $\mu \ge 0$. (Figure: a linear function lying below the characteristic function.)

16 GEOMETRIC INTERPRETATION $\min_x f(x)$ subject to $g(x) \le 0$, $h(x) = 0$. Implicit constraints: $f(x) + \chi_0(h(x)) + \chi_-(g(x))$. Replacing the characteristic functions by their linear lower bounds gives the Lagrangian lower bound $f(x) + \langle \lambda, h(x) \rangle + \langle \mu, g(x) \rangle$.

17 MAX OF DUAL The best lower bound maximizes the dual: $\max_{\lambda,\, \mu \ge 0} d(\lambda, \mu) = \max_{\lambda,\, \mu \ge 0} \min_x L(x, \lambda, \mu)$. Compare with the primal saddle-point problem $\min_x \max_{\lambda,\, \mu \ge 0} L(x, \lambda, \mu)$. Does maximizing the dual equal minimizing the primal, i.e. does $\max \min = \min \max$?

18 WEAK/STRONG DUALITY Weak duality: $\max_{\lambda,\, \mu \ge 0} d(\lambda, \mu) \le f(x^\star)$. Always holds, even for non-convex problems. Strong duality: $\max_{\lambda,\, \mu \ge 0} d(\lambda, \mu) = f(x^\star)$. Holds most of the time for convex problems.
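A quick example (mine, not from the slides) of how loose weak duality can be without convexity:

```latex
% Non-convex example with an infinite duality gap:
%   minimize x^3 subject to 1 - x <= 0   (optimal value f(x*) = 1 at x* = 1).
\[
  d(\mu) = \min_{x \in \mathbb{R}} \; x^3 + \mu (1 - x) = -\infty
  \quad \text{for every } \mu \ge 0,
\]
% since x^3 - mu*x is unbounded below as x -> -infinity. Weak duality,
% sup_mu d(mu) = -infinity <= 1 = f(x*), holds trivially; strong duality fails.
```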

19 SLATER'S CONDITION $\min_x f(x)$ subject to $g(x) \le 0$, $Ax = b$. Slater's condition (a constraint qualification) holds if there is a strictly feasible point: $f(x) < \infty$, $g(x) < 0$, $Ax = b$. (Figure: a feasible set that is not strictly feasible vs. one that is.)

20 SLATER'S CONDITION $\min_x f(x)$ subject to $g(x) \le 0$, $Ax = b$. Slater's condition holds if there is a strictly feasible point: $f(x) < \infty$, $g(x) < 0$, $Ax = b$. Theorem: convex $+$ (linear equalities) $+$ (Slater's condition) $\Rightarrow$ strong duality: $\max_{\lambda,\, \mu \ge 0} \min_x L(x, \lambda, \mu) = \min_x \max_{\lambda,\, \mu \ge 0} L(x, \lambda, \mu)$, i.e. $\max_{\lambda,\, \mu \ge 0} d(\lambda, \mu) = f(x^\star)$.

21 NON-HOMOGENEOUS SVM Choose the line with the largest margin $2 / \lVert w \rVert$, and push the data to the right side of the line: $\min_{w,b} \tfrac{1}{2} \lVert w \rVert^2 + C \sum_i h\big(l_i (d_i^T w - b)\big)$, where $h$ is the hinge loss. (Figure: decision line $d^T w - b = 0$ and margin line $d^T w - b = 1$.)

22 DUAL: THE HARD WAY $\min_{w,b} \tfrac{1}{2} \lVert w \rVert^2 + C \sum_i h\big(l_i (d_i^T w - b)\big) = \min_{w,b} \tfrac{1}{2} \lVert w \rVert^2 + C \sum_i \max\{1 - l_i (d_i^T w - b),\, 0\}$. Idea: make this differentiable. In matrix form, with $L = \operatorname{diag}(l)$ and $D$ the matrix whose rows are the data points: $\min_{w,b} \tfrac{1}{2} \lVert w \rVert^2 + C \langle 1, \max\{1 - L(Dw - b),\, 0\} \rangle$. Standard form: $\min_{w,b,x} \tfrac{1}{2} \lVert w \rVert^2 + C \langle 1, x \rangle$ subject to $x \ge 1 - L(Dw - b)$, $x \ge 0$.

23 DUAL: THE HARD WAY $\min_{w,b,x} \tfrac{1}{2} \lVert w \rVert^2 + C \langle 1, x \rangle$ subject to $x \ge 1 - L(Dw - b)$, $x \ge 0$. Introduce positive multipliers $\alpha, \beta \ge 0$: $d(\alpha, \beta) = \min_{w,b,x} \tfrac{1}{2} \lVert w \rVert^2 + C \langle 1, x \rangle + \langle \alpha,\, 1 - L(Dw - b) - x \rangle + \langle \beta, -x \rangle$. This is too complicated; let's reduce it. Setting the derivative with respect to $x$ to zero: $C1 - \alpha - \beta = 0$ with $\alpha, \beta \ge 0$, so $\beta = C1 - \alpha$, with the new constraint $0 \le \alpha_i \le C$.

24 DUAL: THE HARD WAY Substituting $\beta = C1 - \alpha$ makes the $x$ terms cancel: $C\langle 1, x \rangle - \langle \alpha, x \rangle - \langle \beta, x \rangle = 0$, leaving $d(\alpha) = \min_{w,b} \tfrac{1}{2} \lVert w \rVert^2 + \langle \alpha,\, 1 - L(Dw - b) \rangle = \min_{w,b} \tfrac{1}{2} \lVert w \rVert^2 + \langle \alpha, 1 \rangle - \langle \alpha, LDw \rangle + \langle \alpha, l \rangle b$, for $0 \le \alpha_i \le C$.

25 DUAL: THE HARD WAY $d(\alpha) = \min_{w,b} \tfrac{1}{2} \lVert w \rVert^2 + \langle \alpha, 1 \rangle - \langle \alpha, LDw \rangle + \langle \alpha, l \rangle b$. The minimum over $b$ is $-\infty$ unless $\langle \alpha, l \rangle = 0$, which becomes a constraint. The minimum over $w$: $\nabla_w = w - D^T L \alpha = 0$, or $w = D^T L \alpha$. Substituting back: $d(\alpha) = \tfrac{1}{2} \lVert D^T L \alpha \rVert^2 + \langle \alpha, 1 \rangle - \lVert D^T L \alpha \rVert^2 = -\tfrac{1}{2} \lVert D^T L \alpha \rVert^2 + \langle \alpha, 1 \rangle$.

26 ADVANTAGES OF DUAL Original problem $\min_{w,b} \tfrac{1}{2} \lVert w \rVert^2 + C \langle 1, \max\{1 - L(Dw - b), 0\} \rangle$ is non-differentiable. Dual problem: maximize $d(\alpha) = -\tfrac{1}{2} \lVert D^T L \alpha \rVert^2 + \langle \alpha, 1 \rangle$ subject to $0 \le \alpha \le C$, $\langle \alpha, l \rangle = 0$ is smooth, and the primal solution is recovered as $w = D^T L \alpha$. Solvable via coordinate descent (LIBSVM); a simpler projected-gradient sketch follows below.
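The slides mention coordinate descent (LIBSVM); as a simpler illustration, here is a sketch of my own (made-up data, projected gradient ascent rather than coordinate descent) that maximizes this dual and recovers $w = D^T L \alpha$:

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up linearly separable data: rows of D are points, l holds labels +/-1.
D = np.vstack([rng.normal(+2, 1, (20, 2)), rng.normal(-2, 1, (20, 2))])
l = np.concatenate([np.ones(20), -np.ones(20)])
C = 1.0
L = np.diag(l)

# Dual: maximize d(a) = -0.5*||D^T L a||^2 + <a, 1>, s.t. 0 <= a <= C, <a, l> = 0.
Q = L @ D @ D.T @ L                    # so d(a) = -0.5 a^T Q a + 1^T a
alpha = np.zeros(len(l))
step = 1.0 / np.linalg.norm(Q, 2)      # safe step from the largest eigenvalue
for _ in range(5000):
    alpha += step * (1 - Q @ alpha)        # gradient ascent on the dual
    alpha -= l * (alpha @ l) / (l @ l)     # project onto the hyperplane <a,l> = 0
    alpha = np.clip(alpha, 0, C)           # project onto the box [0, C]
    # (alternating the two projections is only approximate; fine for a sketch)

w = D.T @ (L @ alpha)                  # recover primal weights w = D^T L alpha
print("w =", w)
```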

27 ADVANTAGES OF DUAL Dual problem: maximize $d(\alpha) = -\tfrac{1}{2} \lVert D^T L \alpha \rVert^2 + \langle \alpha, 1 \rangle$ subject to $0 \le \alpha \le C$, $\langle \alpha, l \rangle = 0$. Kernel form: $d(\alpha) = -\tfrac{1}{2} \alpha^T L D D^T L \alpha + \langle \alpha, 1 \rangle$ subject to the same constraints; the matrix $DD^T$ is the kernel.

28 KERNELS $(DD^T)_{ij} = \langle d_i, d_j \rangle$ is big when $d_i$ and $d_j$ are close, small when they are far apart. A kernel function $k(d_i, d_j)$ is big when $d_i$ and $d_j$ are similar, small when they are different. The dual SVM only needs to know a similarity function. Kernel methods: replace the inner product with some other similarity.

29 EXAMPLE: TWO MOONS A kernel function $k(d_i, d_j)$: big when $d_i$ and $d_j$ are similar, small when they are different. (Figure: two-moons data with "far" and "close" pairs marked.)

30 EXAMPLE: TWO MOONS A kernel function $k(d_i, d_j)$: big when $d_i$ and $d_j$ are similar, small when they are different. For example, the Gaussian kernel $k(x, y) = \exp(-\lVert x - y \rVert^2 / \sigma^2)$. (Figure: two-moons data with "far" and "close" pairs marked.)
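A small sketch (mine) of building the kernel matrix $K_{ij} = k(d_i, d_j)$ with this Gaussian kernel:

```python
import numpy as np

def gaussian_kernel_matrix(D, sigma=1.0):
    """K[i, j] = exp(-||d_i - d_j||^2 / sigma^2) for rows d_i of D."""
    # Pairwise squared distances via ||a-b||^2 = ||a||^2 + ||b||^2 - 2<a,b>.
    sq = (D ** 2).sum(axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2 * D @ D.T
    return np.exp(-np.maximum(dist2, 0) / sigma ** 2)  # clamp tiny negatives

# Example: three points on a line; nearby points get kernel values near 1.
K = gaussian_kernel_matrix(np.array([[0.0], [0.1], [3.0]]))
print(K.round(3))
```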

31 EXAMPLE: TWO MOONS Kernelized dual: with $K_{ij} = k(d_i, d_j)$, maximize $d(\alpha) = -\tfrac{1}{2} \alpha^T L K L \alpha + \langle \alpha, 1 \rangle$ subject to $0 \le \alpha \le C$, $\langle \alpha, l \rangle = 0$. Return to this later!

32 CONJUGATE FUNCTION $f^*(y) = \max_x\, y^T x - f(x)$. Is it convex? (Yes: it is a pointwise maximum of functions affine in $y$, regardless of $f$.)

33 CONJUGATE OF NORM $f(x) = \lVert x \rVert$: $f^*(y) = \max_x\, y^T x - \lVert x \rVert$. Define the dual norm $\lVert y \rVert_* = \max_x\, y^T x / \lVert x \rVert$, so the Hölder inequality $y^T x \le \lVert y \rVert_* \lVert x \rVert$ holds. If $\lVert y \rVert_* \le 1$ then $y^T x - \lVert x \rVert \le 0$ with equality at $x = 0$, so $f^* = 0$; if $\lVert y \rVert_* > 1$, scaling up a maximizing $x$ drives the value to $\infty$. Hence $f^*(y) = \begin{cases} 0, & \lVert y \rVert_* \le 1 \\ \infty, & \text{otherwise.} \end{cases}$

34 EXAMPLES $f(x) = \lVert x \rVert_2$: $f^*(y) = \chi_2(y) = \begin{cases} 0, & \lVert y \rVert_2 \le 1 \\ \infty, & \lVert y \rVert_2 > 1. \end{cases}$ $f(x) = |x|$: $f^*(y) = \begin{cases} 0, & |y| \le 1 \\ \infty, & |y| > 1. \end{cases}$ $f(x) = \lVert x \rVert_1$: $f^*(y) = \chi_\infty(y) = \begin{cases} 0, & \lVert y \rVert_\infty \le 1 \\ \infty, & \lVert y \rVert_\infty > 1. \end{cases}$
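As a numerical sanity check (mine, brute force on a finite grid), the conjugate of $f(x) = |x|$ should be roughly $0$ for $|y| \le 1$ and blow up for $|y| > 1$:

```python
import numpy as np

def conjugate_1d(f, y, grid):
    """Brute-force f*(y) = max_x (y*x - f(x)) over a finite grid."""
    return np.max(y * grid - f(grid))

grid = np.linspace(-100, 100, 200001)
for y in [0.0, 0.5, 0.99, 1.01, 2.0]:
    # For |y| <= 1 this is ~0; for |y| > 1 it grows with the grid radius
    # (it would be infinity on an unbounded domain).
    print(f"f*({y}) ~ {conjugate_1d(np.abs, y, grid):.2f}")
```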

35 HOW TO USE CONJUGATE $\min_x g(x) + f(Ax + b)$: write this as $\min_{x,y} g(x) + f(y)$ subject to $y = Ax + b$. Then $d(\lambda) = \min_{x,y} g(x) + f(y) + \langle \lambda, Ax + b - y \rangle = \langle \lambda, b \rangle + \min_x \big( g(x) + \langle A^T \lambda, x \rangle \big) + \min_y \big( f(y) - \langle \lambda, y \rangle \big) = \langle \lambda, b \rangle - \max_x \big( \langle -A^T \lambda, x \rangle - g(x) \big) - \max_y \big( \langle \lambda, y \rangle - f(y) \big)$, i.e. $d(\lambda) = \langle \lambda, b \rangle - g^*(-A^T \lambda) - f^*(\lambda)$.

36 EXAMPLE: LASSO $\min_x \mu \lVert x \rVert_1 + \tfrac{1}{2} \lVert Ax - b \rVert^2$ (how big should $\mu$ be?). In the form $g(x) + f(Ax + b)$, with $d(\lambda) = \langle \lambda, b \rangle - g^*(-A^T \lambda) - f^*(\lambda)$ and the scaling rule $(\mu J)^*(y) = \mu J^*(y / \mu)$, the dual is: maximize $\langle \lambda, b \rangle - \chi_\infty(A^T \lambda / \mu) - \tfrac{1}{2} \lVert \lambda \rVert^2$ (after a change of variables $\lambda \to -\lambda$ to eliminate the negative sign).

37 EXAMPLE: LASSO $\min_x \mu \lVert x \rVert_1 + \tfrac{1}{2} \lVert Ax - b \rVert^2$. Dual: maximize $\langle \lambda, b \rangle - \chi_\infty(A^T \lambda / \mu) - \tfrac{1}{2} \lVert \lambda \rVert^2$. Completing the square: maximize $-\tfrac{1}{2} \lVert \lambda - b \rVert^2 + \tfrac{1}{2} \lVert b \rVert^2$ subject to $\lVert A^T \lambda \rVert_\infty \le \mu$.

38 EXAMPLE: LASSO $\min_x \mu \lVert x \rVert_1 + \tfrac{1}{2} \lVert Ax - b \rVert^2$. Dual problem: maximize $-\tfrac{1}{2} \lVert \lambda - b \rVert^2 + \tfrac{1}{2} \lVert b \rVert^2$ subject to $\lVert A^T \lambda \rVert_\infty \le \mu$. Zero is a primal solution when the optimal objective is $\mu \lVert 0 \rVert_1 + \tfrac{1}{2} \lVert A \cdot 0 - b \rVert^2 = \tfrac{1}{2} \lVert b \rVert^2$. This coincides with the dual variable $\lambda = b$, which is feasible when $\mu \ge \lVert A^T b \rVert_\infty$.

39 EXAMPLE: LASSO $\min_x \mu \lVert x \rVert_1 + \tfrac{1}{2} \lVert Ax - b \rVert^2$. The solution is zero when $\mu \ge \lVert A^T b \rVert_\infty$, which suggests the heuristic $\mu_{\text{guess}} = \tfrac{1}{10} \lVert A^T b \rVert_\infty$.
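A quick numerical check of this threshold (my own sketch, using proximal gradient / ISTA on random data; ISTA is a standard LASSO solver, not something the slides introduce):

```python
import numpy as np

def ista(A, b, mu, iters=2000):
    """Proximal gradient (ISTA) for min_x mu*||x||_1 + 0.5*||Ax - b||^2."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = x - step * A.T @ (A @ x - b)     # gradient step on the smooth part
        x = np.sign(z) * np.maximum(np.abs(z) - step * mu, 0)  # soft-threshold
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(30, 10))
b = rng.normal(size=30)
mu_max = np.abs(A.T @ b).max()               # the threshold ||A^T b||_inf

print(np.abs(ista(A, b, 1.01 * mu_max)).max())  # ~0: solution is exactly zero
print(np.abs(ista(A, b, 0.10 * mu_max)).max())  # > 0: solution is nonzero
```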

40 EXAMPLE: SVM In the form $\min_x g(x) + f(Ax)$: stack $x = \begin{pmatrix} w \\ b \end{pmatrix}$, take $A = [LD,\ -l]$, and set $f(z) = h(z) = C \sum_i \max\{1 - z_i, 0\}$ (the hinge), $g(x) = g(w, b) = \tfrac{1}{2} \lVert w \rVert^2$, so the problem is $\min_{w,b} \tfrac{1}{2} \lVert w \rVert^2 + h(LDw - lb)$.

41 HOW TO USE CONJUGATE $f(z) = h(z) = C \sum_i \max\{1 - z_i, 0\}$. $f^*(\beta) = \max_z \langle \beta, z \rangle - h(z) = \max_z \langle \beta, z \rangle - C \sum_i \max\{0, 1 - z_i\}$. Working componentwise, this is $\max_{z_i} \min\{\beta_i z_i,\ \beta_i z_i - C(1 - z_i)\} = \max_{z_i} \min\{\beta_i z_i,\ (\beta_i + C) z_i - C\}$; the max occurs where the two linear functions intersect (at $z_i = 1$), and is finite exactly when the slopes have opposite signs, i.e. $\beta_i \in [-C, 0]$. Hence $f^*(\beta) = \begin{cases} \langle 1, \beta \rangle, & \beta_i \in [-C, 0] \\ \infty, & \text{otherwise.} \end{cases}$

42 HOW TO USE CONJUGATE $g(x) = g(w, b) = \tfrac{1}{2} \lVert w \rVert^2$. With $y = (y_w, y_b)$: $g^*(y) = \max_{w,b} \langle y_w, w \rangle + \langle y_b, b \rangle - \tfrac{1}{2} \lVert w \rVert^2 = \begin{cases} \tfrac{1}{2} \lVert y_w \rVert^2, & y_b = 0 \\ \infty, & \text{otherwise} \end{cases}$ (the max over $b$ is unbounded unless $y_b = 0$).

43 SVM DUAL $\min_x g(x) + f(Ax)$ has dual $d(\beta) = -g^*(-A^T \beta) - f^*(\beta)$, with $x = \begin{pmatrix} w \\ b \end{pmatrix}$, $A = [LD,\ -l]$, $g^*(y) = \begin{cases} \tfrac{1}{2} \lVert y_w \rVert^2, & y_b = 0 \\ \infty, & \text{otherwise,} \end{cases}$ and $f^*(\beta) = \begin{cases} \sum_i \beta_i, & \beta \in [-C, 0] \\ \infty, & \text{otherwise.} \end{cases}$ This gives: maximize $-\tfrac{1}{2} \lVert D^T L \beta \rVert^2 - \langle 1, \beta \rangle$ subject to $l^T \beta = 0$, $-C \le \beta \le 0$. Same as before, with $\beta = -\alpha$.

44 EXAMPLE: TV $\min_x \tfrac{1}{2} \lVert x - f \rVert^2 + \mu |\nabla x|$, in the form $g(x) + \tilde f(Ax)$ with $A = \nabla$ (note the name clash: $f$ is the data here, so write $\tilde f$ for the generic function). $\tilde f(z) = \mu |z| \Rightarrow \tilde f^*(y) = \chi_\infty(y / \mu)$, using $(\mu J)^*(y) = \mu J^*(y / \mu)$; and $g(z) = \tfrac{1}{2} \lVert z - f \rVert^2 \Rightarrow g^*(y) = \tfrac{1}{2} \lVert y + f \rVert^2 - \tfrac{1}{2} \lVert f \rVert^2$. Form the dual problem $d(\lambda) = -g^*(-A^T \lambda) - \tilde f^*(\lambda)$: maximize $-\tfrac{1}{2} \lVert \nabla^T \lambda - f \rVert^2 - \chi_\infty(\lambda / \mu)$ (dropping the constant $\tfrac{1}{2} \lVert f \rVert^2$).

45 EXAMPLE: TV Maximize $-\tfrac{1}{2} \lVert \nabla^T \lambda - f \rVert^2 - \chi_\infty(\lambda / \mu)$, i.e. maximize $-\tfrac{1}{2} \lVert \nabla^T \lambda - f \rVert^2$ subject to $\lVert \lambda \rVert_\infty \le \mu$: a smooth objective with a simple constraint.
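A 1-D sketch of this (mine, not from the slides): projected gradient ascent on the dual, recovering the denoised signal as $x = f - \nabla^T \lambda$ (which follows from minimizing $\tfrac{1}{2}\lVert x - f \rVert^2 + \langle \lambda, \nabla x \rangle$ over $x$):

```python
import numpy as np

# 1-D total-variation denoising: min_x 0.5*||x - f||^2 + mu*|grad x|,
# solved through the dual: max -0.5*||grad^T lam - f||^2  s.t. |lam_i| <= mu.
n, mu = 200, 2.0
f = np.concatenate([np.zeros(100), np.ones(100)]) \
    + 0.1 * np.random.default_rng(2).normal(size=n)

def grad(x):       # forward differences (maps R^n to R^(n-1))
    return x[1:] - x[:-1]

def grad_T(lam):   # transpose of the difference operator
    out = np.zeros(n)
    out[:-1] -= lam
    out[1:] += lam
    return out

lam = np.zeros(n - 1)
step = 0.25                    # ||grad||^2 <= 4, so 1/4 is a safe step size
for _ in range(3000):
    lam += step * grad(f - grad_T(lam))  # ascent on -0.5*||grad^T lam - f||^2
    lam = np.clip(lam, -mu, mu)          # project onto the box |lam_i| <= mu

x = f - grad_T(lam)            # recover the primal (denoised) signal
print(x[:5].round(2), x[-5:].round(2))   # left half near 0, right half near 1
```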

46 NON-CONVEX PROBLEM: SPECTRAL CLUSTERING Non-linearly separable classes. (Figures: two moons, Swiss roll.)

47 SPECTRAL CLUSTERING Non-linearly separable classes (two moons). (Dis)similarity matrix $W_{ij} = e^{-\lVert d_i - d_j \rVert^2 / \sigma^2}$; labels $x_i \in \{-1, 1\}$.

48 LABELING PROBLEM $\min_x x^T W x$ subject to $x_i^2 = 1$. Big $W_{ij}$ pushes for different labels; small $W_{ij}$ allows the same label. Is this convex? Why? How hard is this problem? (The constraint set $\{x : x_i^2 = 1\}$ is discrete and non-convex, so this is a hard combinatorial problem.) Dual function: $d(\lambda) = \min_x x^T W x + \langle \lambda, x^2 - 1 \rangle$. Is this convex (concave in $\lambda$)? Why? How hard is this problem?

49 LABELING PROBLEM Dual function: $d(\lambda) = \min_x x^T W x + \langle \lambda, x^2 - 1 \rangle$. Since $\sum_i \lambda_i x_i^2 = x^T \operatorname{diag}(\lambda)\, x$, we get $d(\lambda) = \min_x x^T (W + \operatorname{diag}(\lambda))\, x - \langle \lambda, 1 \rangle = \begin{cases} -\langle \lambda, 1 \rangle, & W + \operatorname{diag}(\lambda) \succeq 0 \\ -\infty, & \text{otherwise.} \end{cases}$

50 LABELING PROBLEM Dual function: $d(\lambda) = \min_x x^T (W + \operatorname{diag}(\lambda))\, x - \langle \lambda, 1 \rangle = \begin{cases} -\langle \lambda, 1 \rangle, & W + \operatorname{diag}(\lambda) \succeq 0 \\ -\infty, & \text{otherwise.} \end{cases}$ Pick the dual vector to be the smallest constant we can get away with: $\lambda_i = -\lambda_{\min}(W)$ for all $i$, giving $d = n\, \lambda_{\min}(W)$. The minimizer $x = \arg\min_x x^T (W - \lambda_{\min} I)\, x + \langle \lambda_{\min}, 1 \rangle = e_{\min}$, the eigenvector of the smallest eigenvalue.

51 LABELING PROBLEM Approximate solution: $x = \arg\min_x x^T (W - \lambda_{\min} I)\, x + \langle \lambda_{\min}, 1 \rangle = e_{\min}$. Final step, integer rounding: $x^\star_{\text{approx}} = \operatorname{round}(e_{\min})$ (i.e. take signs to land in $\{-1, 1\}$).

52 OVERALL STRATEGY $\min_x x^T W x$ subject to $x_i^2 = 1$; convex relaxation (why approximate? because we relax the integer constraint). Dual function: $d(\lambda) = \min_x x^T W x + \langle \lambda, x^2 - 1 \rangle$. Approximate solution: $x = \arg\min_x x^T (W - \lambda_{\min} I)\, x + \langle \lambda_{\min}, 1 \rangle = e_{\min}$. Final step, integer rounding: $x^\star_{\text{approx}} = \operatorname{round}(e_{\min})$. Note: we could also define a SIMilarity matrix and use the largest eigenvalue.

53 DO CODE EXAMPLE: spectral clustering (a sketch follows below).
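A minimal sketch of the code example (my own, numpy only, synthetic two-moons data). It follows the slides' recipe of rounding an extreme eigenvector, but swaps in the standard practical variant: the eigenvector of the second-smallest eigenvalue of the graph Laplacian $L = \operatorname{diag}(W 1) - W$ built from the Gaussian similarity $W$:

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic two-moons data (made up): two interleaved half-circles plus noise.
t = rng.uniform(0, np.pi, 100)
top = np.c_[np.cos(t), np.sin(t)]
bottom = np.c_[1 - np.cos(t), 0.5 - np.sin(t)]
D = np.vstack([top, bottom]) + 0.05 * rng.normal(size=(200, 2))

# Gaussian weight matrix from the slides: W_ij = exp(-||d_i - d_j||^2 / sigma^2).
sigma = 0.3
sq = (D ** 2).sum(axis=1)
W = np.exp(-np.maximum(sq[:, None] + sq[None, :] - 2 * D @ D.T, 0) / sigma ** 2)

# Practical variant of the "extreme eigenvector + rounding" recipe: use the
# graph Laplacian and round the eigenvector of its second-smallest eigenvalue
# (the smallest is ~0 with a constant eigenvector, which carries no labels).
Lap = np.diag(W.sum(axis=1)) - W
vals, vecs = np.linalg.eigh(Lap)     # eigenvalues in ascending order
labels = np.sign(vecs[:, 1])         # integer rounding to {-1, +1}

print(labels[:100].mean(), labels[100:].mean())  # ~ +/-1: moons separated
```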

54 NEWTON'S METHOD Smooth problem $\min_x f(x)$; quadratic approximation $\tfrac{1}{2} (x - x^k)^T H (x - x^k) + \langle x - x^k, g \rangle$, where $H$ and $g$ are the Hessian and gradient at $x^k$. What about a constrained problem: $\min_x f(x)$ subject to $Ax = b$?

55 CONSTRAINED NEWTON Constrained problem: $\min_x f(x)$ subject to $Ax = b$. Quadratic model: $\min_x \tfrac{1}{2} (x - x^k)^T H (x - x^k) + \langle x - x^k, g \rangle$ subject to $Ax = b$, with Lagrangian $L(x, \lambda) = \tfrac{1}{2} (x - x^k)^T H (x - x^k) + \langle x - x^k, g \rangle + \langle \lambda, Ax - b \rangle$. KKT conditions: $H(x - x^k) + g + A^T \lambda = 0$, $Ax = b$.

56 CONSTRAINED NEWTON KKT conditions: $H(x - x^k) + g + A^T \lambda = 0$, $Ax = b$. With the Newton direction $d = x - x^k$, this is the linear system $\begin{pmatrix} H & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} d \\ \lambda \end{pmatrix} = \begin{pmatrix} -g \\ b - Ax^k \end{pmatrix}$. The solution is approximate for $f$ (it minimizes the quadratic model), but it satisfies the constraints exactly.

57 NEWTON ALGORITHM Compute the Hessian and gradient; solve the Newton system $\begin{pmatrix} H & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} d \\ \lambda \end{pmatrix} = \begin{pmatrix} -g \\ b - Ax^k \end{pmatrix}$; Armijo search: find $\tau$ with $f(x^k + \tau d) \le f(x^k) + \tfrac{\tau}{10}\, g^T d$; update the iterate $x^{k+1} = x^k + \tau d$. When the stepsize $\tau = 1$, the constraints are satisfied exactly. (A sketch follows below.)
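A compact sketch (mine) of this algorithm on a made-up objective, starting from a feasible point so that the Newton direction is a descent direction and the Armijo backtracking terminates:

```python
import numpy as np

def constrained_newton(f, grad, hess, A, b, x0, iters=20):
    """Equality-constrained Newton for min f(x) s.t. Ax = b, with Armijo search."""
    x = x0.astype(float)
    m = A.shape[0]
    for _ in range(iters):
        g, H = grad(x), hess(x)
        # KKT system: [H A^T; A 0] [d; lam] = [-g; b - A x]
        K = np.block([[H, A.T], [A, np.zeros((m, m))]])
        rhs = np.concatenate([-g, b - A @ x])
        d = np.linalg.solve(K, rhs)[: x.size]
        tau = 1.0
        while f(x + tau * d) > f(x) + (tau / 10) * (g @ d):  # Armijo, alpha = 1/10
            tau /= 2
        x = x + tau * d
    return x

# Toy problem: min sum(exp(x_i)) + 0.5*||x||^2  subject to  x_1 + x_2 + x_3 = 1.
f = lambda x: np.exp(x).sum() + 0.5 * x @ x
grad = lambda x: np.exp(x) + x
hess = lambda x: np.diag(np.exp(x)) + np.eye(x.size)
A, b = np.ones((1, 3)), np.array([1.0])
x = constrained_newton(f, grad, hess, A, b, np.array([1.0, 0.0, 0.0]))
print(x, A @ x)   # symmetric solution x_i = 1/3, constraint satisfied
```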

58 NEWTON COMPLEXITY $\min_x f(x)$ subject to $Ax = b$. Theorem: Suppose the Hessian of $f$ is Lipschitz continuous. Then after finitely many constrained Newton steps the unit stepsize ($\tau = 1$) is an Armijo step, and $\lVert x^{k+1} - x^\star \rVert \le C \lVert x^k - x^\star \rVert^2$ (quadratic convergence). Furthermore, the constraints are exactly satisfied.

(Note to self: this can be two lectures! Still needed: examples, non-convex problems, applications for matrix factorization.)

