# DUALITY


1 DUALITY

2 WHY DUALITY? For smooth unconstrained problems $\min_x f(x)$ we have gradient descent, Newton's method, quasi-Newton, conjugate gradients, etc. But what about non-differentiable $f(x)$? And constrained problems: $\min_x f(x)$ subject to $g(x) \le 0$, $h(x) = 0$?

3 LAGRANGIAN Simple case: $\min_x f(x)$ subject to $Ax - b = 0$. Saddle-point form: $\min_x \max_\lambda\, f(x) + \langle \lambda, Ax - b \rangle$, where $f(x) + \langle \lambda, Ax - b \rangle$ is the Lagrangian.

4 SADDLE-POINT FORM Simple case: $\min_x f(x)$ subject to $Ax - b = 0$. For the inner maximization,
$$\max_\lambda\, f(x) + \langle \lambda, Ax - b \rangle = \begin{cases} f(x), & Ax - b = 0 \\ \infty, & \text{otherwise,} \end{cases}$$
so the saddle-point form recovers the constrained problem:
$$\min_x \max_\lambda\, f(x) + \langle \lambda, Ax - b \rangle = \min_{x:\,Ax=b} f(x).$$
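A tiny numeric illustration of this mechanism: maximizing over $\lambda$ returns $f(x)$ at feasible points and blows up at infeasible ones. The toy problem, the two test points, and the large probe multiplier are illustrative choices, not from the slides.

```python
import numpy as np

# Toy problem: minimize f(x) = ||x||^2 subject to Ax - b = 0.
# The inner max over lambda of f(x) + <lambda, Ax - b> equals f(x)
# when Ax = b and +infinity otherwise, so only feasible x survive.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

def inner_max(x, lam_scale=1e6):
    # max over lambda of <lambda, Ax - b> is 0 if Ax = b, else unbounded;
    # we approximate "unbounded" by probing one huge multiplier.
    r = A @ x - b
    return x @ x + lam_scale * np.abs(r).sum()

x_feasible = np.array([0.5, 0.5])    # satisfies Ax = b
x_infeasible = np.array([1.0, 1.0])  # violates Ax = b

print(inner_max(x_feasible))    # -> 0.5, which is f(x)
print(inner_max(x_infeasible))  # huge: the multiplier punishes infeasibility
```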

5 INEQUALITY CONSTRAINTS For $\min_x f(x)$ subject to $Ax - b = 0$, $Cx - d \le 0$:
$$\min_x \max_{\lambda,\;\mu \ge 0}\, f(x) + \langle \lambda, Ax - b \rangle + \langle \mu, Cx - d \rangle,$$
with a non-negativity constraint on $\mu$. Why does this work? When $Cx - d \le 0$, the maximum over $\mu \ge 0$ is attained at $\mu = 0$; otherwise it is $\infty$.

6 GENERAL FORM LAGRANGIAN For $\min_x f(x)$ subject to $g(x) \le 0$, $h(x) = 0$, the Lagrangian is $L(x, \lambda, \mu) = f(x) + \langle \lambda, h(x) \rangle + \langle \mu, g(x) \rangle$. Saddle-point formulation:
$$f(x^\star) = \min_x \max_{\lambda,\;\mu \ge 0}\, f(x) + \langle \lambda, h(x) \rangle + \langle \mu, g(x) \rangle.$$

7 CALCULUS INTERPRETATION For $\min_x f(x)$ subject to $h(x) = 0$: at a solution the gradient of the objective is parallel to the gradient of the constraint, $\nabla f(x) = -\lambda \nabla h(x)$, i.e. $\nabla f(x) + \lambda \nabla h(x) = 0$. (Picture: contours of the objective tangent to a linear constraint.)

8 CALCULUS INTERPRETATION For $\min_x f(x)$ subject to $h(x) = 0$: gradient of objective parallel to gradient of constraint, $\nabla f(x) = -\lambda \nabla h(x)$. With $L(x, \lambda) = f(x) + \langle \lambda, h(x) \rangle$, the optimality condition for $\min_x$ is $\partial_x L(x, \lambda) = \nabla f(x) + \lambda \nabla h(x) = 0$.

9 INEQUALITY CONDITIONS For $\min_x f(x)$ subject to $g(x) \le 0$, with $L(x, \mu) = f(x) + \langle \mu, g(x) \rangle$ and $\min_x \max_{\mu \ge 0}\, f(x) + \langle \mu, g(x) \rangle$: when the constraint is inactive at $x^\star$, $\mu = 0$; when it is active, $\mu > 0$.

10 INEQUALITY $\partial_x L(x, \mu) = \nabla f(x) + \mu \nabla g(x) = 0$. Inactive: $\mu = 0$. Active: $\mu > 0$.

11 OPTIMALITY CONDITIONS: KKT SYSTEM For $\min_x f(x)$ subject to $g(x) \le 0$, $h(x) = 0$, with $L(x, \lambda, \mu) = f(x) + \langle \lambda, h(x) \rangle + \langle \mu, g(x) \rangle$, the necessary conditions (Karush-Kuhn-Tucker) are:
$$\nabla f(x^\star) + \lambda \nabla h(x^\star) + \mu \nabla g(x^\star) = 0 \quad \text{(primal/dual optimality)}$$
$$g(x^\star) \le 0, \quad h(x^\star) = 0 \quad \text{(primal feasibility)}$$
$$\mu \ge 0 \quad \text{(dual feasibility)}$$
$$\mu_i\, g_i(x^\star) = 0 \quad \text{(complementary slackness)}$$
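For an equality-constrained quadratic program the KKT conditions are a linear system, so they can be solved and checked directly. A minimal sketch; the matrices $H$, $g$, $A$, $b$ below are illustrative choices:

```python
import numpy as np

# KKT check for an equality-constrained quadratic program:
#   minimize (1/2) x^T H x + g^T x  subject to  A x = b.
# Stationarity: H x + g + A^T lam = 0; primal feasibility: A x = b.
H = np.array([[2.0, 0.0], [0.0, 4.0]])
g = np.array([-1.0, -2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# Solve the KKT system [[H, A^T], [A, 0]] [x; lam] = [-g; b].
n, m = H.shape[0], A.shape[0]
K = np.block([[H, A.T], [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([-g, b]))
x, lam = sol[:n], sol[n:]

stationarity = H @ x + g + A.T @ lam   # should be ~0
feasibility = A @ x - b                # should be ~0
```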

12 DUAL FUNCTION $d(\lambda, \mu) = \min_x L(x, \lambda, \mu) = \min_x f(x) + \langle \lambda, h(x) \rangle + \langle \mu, g(x) \rangle$. The dual is concave. Why? It is a pointwise minimum of functions affine in $(\lambda, \mu)$. Does this depend on convexity of $f$? No.

13 DUAL FUNCTION $d(\lambda, \mu) = \min_x L(x, \lambda, \mu) = \min_x f(x) + \langle \lambda, h(x) \rangle + \langle \mu, g(x) \rangle$. The dual is a lower bound on the optimal objective: $\min_x f(x) + \langle \lambda, h(x) \rangle + \langle \mu, g(x) \rangle \le f(x^\star)$ for $\mu \ge 0$. Why? Because the optimal $x^\star$ satisfies the constraints, so $h(x^\star) = 0$ and $\langle \mu, g(x^\star) \rangle \le 0$, and the minimum over all $x$ can only be smaller.

14 GEOMETRIC INTERPRETATION OF LOWER BOUND For $\min_x f(x)$ subject to $g(x) \le 0$, $h(x) = 0$, the implicit-constraint form is $f(x) + \chi_0(h(x)) + \chi(g(x))$, where $\chi_0$ is the indicator of $\{0\}$ and
$$\chi(z) = \begin{cases} 0, & z \le 0 \\ \infty, & \text{otherwise.} \end{cases}$$

15 GEOMETRIC INTERPRETATION Implicit-constraint form $f(x) + \chi_0(h(x)) + \chi(g(x))$. The indicator has a linear lower bound: $\chi(z) \ge \langle z, \mu \rangle$ for any $\mu \ge 0$.

16 GEOMETRIC INTERPRETATION Replacing the indicators in $f(x) + \chi_0(h(x)) + \chi(g(x))$ by their linear lower bounds gives the lower bound $f(x) + \langle \lambda, h(x) \rangle + \langle \mu, g(x) \rangle$.

17 MAX OF DUAL
$$\max_{\lambda,\;\mu \ge 0} d(\lambda, \mu) = \max_{\lambda,\;\mu \ge 0} \min_x\, f(x) + \langle \lambda, h(x) \rangle + \langle \mu, g(x) \rangle = \max_{\lambda,\;\mu \ge 0} \min_x L(x, \lambda, \mu)$$
Is this the same as $\min_x \max_{\lambda,\;\mu \ge 0} L(x, \lambda, \mu)$, whose solution is the primal optimum? Does maximizing the dual equal minimizing the primal?

18 WEAK/STRONG DUALITY Weak duality: $\max_{\lambda,\;\mu \ge 0} d(\lambda, \mu) \le f(x^\star)$. Always holds, even for non-convex problems. Strong duality: $\max_{\lambda,\;\mu \ge 0} d(\lambda, \mu) = f(x^\star)$. Holds most of the time for convex problems.

19 SLATER'S CONDITION For $\min_x f(x)$ subject to $g(x) \le 0$, $Ax = b$: Slater's condition (a constraint qualification) holds if there is a strictly feasible point, i.e. some $x$ with $f(x) < \infty$, $g(x) < 0$, $Ax = b$. (Pictures: a feasible set that is not strictly feasible vs. one that is.)

20 SLATER'S CONDITION For $\min_x f(x)$ subject to $g(x) \le 0$, $Ax = b$, Slater's condition holds if there is a strictly feasible point: $f(x) < \infty$, $g(x) < 0$, $Ax = b$. Theorem: convex + (linear equalities) + (Slater's condition) $\Rightarrow$ strong duality:
$$\max_{\lambda,\;\mu \ge 0} \min_x L(x, \lambda, \mu) = \min_x \max_{\lambda,\;\mu \ge 0} L(x, \lambda, \mu), \qquad \max_{\lambda,\;\mu \ge 0} d(\lambda, \mu) = f(x^\star).$$
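Strong duality can be seen numerically on a convex toy problem where Slater holds. The problem $\min x^2$ s.t. $1 - x \le 0$ (strictly feasible at, e.g., $x = 2$) is an illustrative choice; its dual $d(\mu) = \mu - \mu^2/4$ is worked out by hand below.

```python
import numpy as np

# Convex toy problem with a strictly feasible point (Slater holds):
#   minimize x^2  subject to  1 - x <= 0,  so x* = 1 and f(x*) = 1.
# Dual: d(mu) = min_x x^2 + mu*(1 - x) = mu - mu^2/4 for mu >= 0.
mus = np.linspace(0.0, 10.0, 100001)
dual_vals = mus - mus**2 / 4.0
dual_opt = dual_vals.max()    # attained at mu = 2

primal_opt = 1.0              # x* = 1 sits on the constraint boundary
print(primal_opt, dual_opt)   # strong duality: the two values agree
```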

21 NON-HOMOGENEOUS SVM Choose the line with the largest margin $2/\|w\|$, and push data to the right side of the line:
$$\min_{w,b}\; \tfrac{1}{2}\|w\|^2 + C \sum_i h\big(l_i (d_i^T w - b)\big),$$
where $h$ is the hinge loss, $l_i$ the labels, and $d_i$ the data points; the decision line is $d^T w - b = 0$, with margin line $d^T w - b = 1$.

22 DUAL: THE HARD WAY
$$\min_{w,b}\; \tfrac{1}{2}\|w\|^2 + C \sum_i \max\{1 - l_i (d_i^T w - b),\, 0\} = \min_{w,b}\; \tfrac{1}{2}\|w\|^2 + C\,\langle 1,\, \max\{1 - L(Dw - b),\, 0\} \rangle,$$
with $L = \mathrm{diag}(l)$ and $D$ the data matrix. Idea: make this differentiable. Standard form:
$$\min_{w,b,x}\; \tfrac{1}{2}\|w\|^2 + C \langle 1, x \rangle \quad \text{subject to} \quad x \ge 1 - L(Dw - b), \quad x \ge 0.$$

23 DUAL: THE HARD WAY For $\min_{w,b,x} \tfrac{1}{2}\|w\|^2 + C\langle 1, x\rangle$ subject to $x \ge 1 - L(Dw - b)$, $x \ge 0$, introduce positive multipliers $\alpha, \beta \ge 0$:
$$d(\alpha, \beta) = \min_{w,b,x}\; \tfrac{1}{2}\|w\|^2 + C\langle 1, x\rangle + \langle \alpha,\; 1 - L(Dw - b) - x \rangle + \langle \beta, -x \rangle.$$
This is too complicated; let's reduce it. The derivative with respect to $x$ gives $C\mathbf{1} - \alpha - \beta = 0$ with $\alpha, \beta \ge 0$, so $\beta = C\mathbf{1} - \alpha$, with the new constraint $0 \le \alpha_i \le C$.

24 DUAL: THE HARD WAY Substituting $\beta = C\mathbf{1} - \alpha$ (for $0 \le \alpha_i \le C$), the $\langle C\mathbf{1}, x\rangle$ terms cancel:
$$d(\alpha) = \min_{w,b}\; \tfrac{1}{2}\|w\|^2 + \langle \alpha,\; 1 - L(Dw - b) \rangle = \min_{w,b}\; \tfrac{1}{2}\|w\|^2 + \langle \alpha, 1 \rangle - \langle \alpha, LDw \rangle + \langle \alpha, l \rangle\, b.$$

25 DUAL: THE HARD WAY Minimizing over $b$: finite only if $\langle \alpha, l \rangle = 0$. Minimizing over $w$: $w - D^T L \alpha = 0$, i.e. $w = D^T L \alpha$. Then
$$d(\alpha) = \tfrac{1}{2}\|D^T L \alpha\|^2 + \langle \alpha, 1 \rangle - \langle \alpha, L D D^T L \alpha \rangle = \tfrac{1}{2}\|D^T L \alpha\|^2 + \langle \alpha, 1 \rangle - \|D^T L \alpha\|^2 = -\tfrac{1}{2}\|D^T L \alpha\|^2 + \langle \alpha, 1 \rangle.$$

26 ADVANTAGES OF DUAL Original problem: $\min_{w,b} \tfrac{1}{2}\|w\|^2 + C\langle 1, \max\{1 - L(Dw - b), 0\}\rangle$ is non-differentiable. Dual problem:
$$\text{maximize}\; d(\alpha) = -\tfrac{1}{2}\|D^T L \alpha\|^2 + \langle \alpha, 1 \rangle \quad \text{subject to} \quad 0 \le \alpha \le C, \quad \langle \alpha, l \rangle = 0$$
is smooth, and solvable via coordinate descent (LIBSVM). Recover $w = D^T L \alpha$.
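A sketch of solving this dual numerically. For simplicity the bias $b$ is dropped (a homogeneous SVM), which removes the $\langle \alpha, l \rangle = 0$ constraint and leaves only the box $0 \le \alpha \le C$, so plain projected gradient ascent works instead of LIBSVM-style coordinate descent. The synthetic data, $C$, and iteration count are illustrative choices.

```python
import numpy as np

# Dual SVM (bias dropped) by projected gradient ascent:
#   maximize -0.5*||D^T L a||^2 + <1, a>  subject to  0 <= a <= C.
rng = np.random.default_rng(0)
D = np.vstack([rng.normal(+2, 0.5, (20, 2)),   # class +1
               rng.normal(-2, 0.5, (20, 2))])  # class -1
l = np.concatenate([np.ones(20), -np.ones(20)])
L = np.diag(l)
C = 1.0

Q = L @ D @ D.T @ L                   # Hessian of the negated dual
alpha = np.zeros(len(l))
step = 1.0 / np.linalg.norm(Q, 2)     # safe step for a concave quadratic
for _ in range(2000):
    grad = np.ones_like(alpha) - Q @ alpha   # gradient of d(alpha)
    alpha = np.clip(alpha + step * grad, 0.0, C)  # project onto the box

w = D.T @ L @ alpha                   # recover the primal weights
accuracy = (np.sign(D @ w) == l).mean()
```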

27 ADVANTAGES OF DUAL Kernel form of the dual:
$$d(\alpha) = -\tfrac{1}{2}\, \alpha^T L \underbrace{D D^T}_{\text{kernel}} L \alpha + \langle \alpha, 1 \rangle \quad \text{subject to} \quad 0 \le \alpha \le C, \quad \langle \alpha, l \rangle = 0.$$

28 KERNELS $(DD^T)_{ij} = \langle d_i, d_j \rangle$: big when $d_i$ and $d_j$ are close, small when they are far apart. A kernel function $k(d_i, d_j)$: big when $d_i$ and $d_j$ are similar, small when they are different. The dual SVM only needs this similarity function. Kernel methods: replace the inner product with some other similarity.

29 EXAMPLE: TWO MOONS Kernel function $k(d_i, d_j)$: big when $d_i$ and $d_j$ are similar, small when they are different. (Figure: two-moons data with "far" and "close" pairs marked.)

30 EXAMPLE: TWO MOONS Gaussian kernel: $k(x, y) = \exp(-\|x - y\|^2 / \sigma^2)$.

31 EXAMPLE: TWO MOONS Replace the Gram matrix with the kernel matrix $K_{ij} = k(d_i, d_j)$:
$$d(\alpha) = -\tfrac{1}{2}\, \alpha^T L K L \alpha + \langle \alpha, 1 \rangle \quad \text{subject to} \quad 0 \le \alpha \le C, \quad \langle \alpha, l \rangle = 0.$$
Return to this later!

32 CONJUGATE FUNCTION $f^*(y) = \max_x\; y^T x - f(x)$. Is it convex? Yes: it is a pointwise maximum of functions affine in $y$.

33 CONJUGATE OF NORM For $f(x) = \|x\|$: $f^*(y) = \max_x\; y^T x - \|x\|$. Hölder inequality: $y^T x \le \|y\|_* \|x\|$, where the dual norm is $\|y\|_* = \max_x\, y^T x / \|x\|$. If $\|y\|_* \le 1$ then $f^* = 0$ (why?); if $\|y\|_* > 1$ then $f^* = \infty$ (why?). So
$$f^*(y) = \begin{cases} 0, & \|y\|_* \le 1 \\ \infty, & \text{otherwise.} \end{cases}$$

34 EXAMPLES
$$f(x) = \|x\|_2 \;\Rightarrow\; f^*(y) = \chi_2(y) = \begin{cases} 0, & \|y\|_2 \le 1 \\ \infty, & \|y\|_2 > 1 \end{cases}$$
$$f(x) = |x| \;\Rightarrow\; f^*(y) = \chi_\infty(y) = \begin{cases} 0, & |y| \le 1 \\ \infty, & |y| > 1 \end{cases}$$
$$f(x) = \|x\|_1 \;\Rightarrow\; f^*(y) = \chi_\infty(y) = \begin{cases} 0, & \|y\|_\infty \le 1 \\ \infty, & \|y\|_\infty > 1 \end{cases}$$
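The conjugate can be checked numerically by maximizing over a grid. A minimal sketch for scalar $f(x) = |x|$, whose conjugate should be the indicator of $[-1, 1]$; the grid bounds and test values are illustrative:

```python
import numpy as np

# Approximate f*(y) = max_x (y*x - |x|) for f(x) = |x| on a grid.
# True conjugate: 0 for |y| <= 1, +infinity (here: grows with the
# grid radius) for |y| > 1.
xs = np.linspace(-100.0, 100.0, 200001)

def conj_abs(y):
    return np.max(y * xs - np.abs(xs))

inside = conj_abs(0.5)   # |y| <= 1  ->  f*(y) = 0 (max attained at x = 0)
outside = conj_abs(2.0)  # |y| > 1   ->  value grows with the grid radius
print(inside, outside)
```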

35 HOW TO USE CONJUGATE For $\min_x g(x) + f(Ax + b)$, write this as $\min_{x,y} g(x) + f(y)$ subject to $y = Ax + b$. Then
$$d(\lambda) = \min_{x,y}\; g(x) + f(y) + \langle \lambda, Ax + b - y \rangle = \langle \lambda, b \rangle - \max_x \big[\langle -A^T\lambda, x \rangle - g(x)\big] - \max_y \big[\langle \lambda, y \rangle - f(y)\big],$$
so
$$d(\lambda) = \langle \lambda, b \rangle - g^*(-A^T\lambda) - f^*(\lambda).$$

36 EXAMPLE: LASSO $\min_x\, \mu\|x\|_1 + \tfrac{1}{2}\|Ax - b\|^2$ (how big should $\mu$ be?). In the form $g(x) + f(Ax + b)$ with $g = \mu\|\cdot\|_1$ and $f = \tfrac{1}{2}\|\cdot\|^2$:
$$d(\lambda) = \langle \lambda, b \rangle - g^*(-A^T\lambda) - f^*(\lambda).$$
Note $(\mu J)^*(y) = \mu J^*(y/\mu)$. Changing variables to eliminate the negative sign:
$$\text{maximize}\quad \langle \lambda, b \rangle - \chi_\infty\!\big(\tfrac{1}{\mu} A^T\lambda\big) - \tfrac{1}{2}\|\lambda\|^2.$$

37 EXAMPLE: LASSO $\min_x\, \mu\|x\|_1 + \tfrac{1}{2}\|Ax - b\|^2$. Completing the square in the dual:
$$\text{maximize}\quad -\tfrac{1}{2}\|\lambda - b\|^2 + \tfrac{1}{2}\|b\|^2 \quad \text{subject to} \quad \|A^T\lambda\|_\infty \le \mu.$$

38 EXAMPLE: LASSO Dual problem: maximize $-\tfrac{1}{2}\|\lambda - b\|^2 + \tfrac{1}{2}\|b\|^2$ subject to $\|A^T\lambda\|_\infty \le \mu$. Zero is a primal solution when the optimal objective is $\tfrac{1}{2}\|A\cdot 0 - b\|^2 = \tfrac{1}{2}\|b\|^2$. This coincides with the dual variable $\lambda = b$, which is feasible when $\mu \ge \|A^T b\|_\infty$.

39 EXAMPLE: LASSO The solution of $\min_x\, \mu\|x\|_1 + \tfrac{1}{2}\|Ax - b\|^2$ is zero when $\mu \ge \|A^T b\|_\infty$. A reasonable guess: $\mu = \tfrac{1}{10}\|A^T b\|_\infty$.
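This threshold can be verified numerically by running a lasso solver above and below it. ISTA (proximal gradient with soft-thresholding) is a standard choice, used here as an illustrative stand-in; the random problem sizes are arbitrary.

```python
import numpy as np

# Check the lasso shrink-to-zero threshold with ISTA: for
#   min_x mu*||x||_1 + 0.5*||Ax - b||^2
# the solution is x = 0 whenever mu >= ||A^T b||_inf.
rng = np.random.default_rng(1)
A = rng.normal(size=(30, 10))
b = rng.normal(size=30)
mu_max = np.abs(A.T @ b).max()

def soft(z, t):
    # soft-thresholding, the prox of t*||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(mu, iters=3000):
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/Lipschitz constant
    for _ in range(iters):
        x = soft(x - step * (A.T @ (A @ x - b)), step * mu)
    return x

x_big = ista(1.1 * mu_max)    # above the threshold: exactly zero
x_small = ista(0.1 * mu_max)  # below it: some nonzero coefficients
```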

40 EXAMPLE: SVM Write $\min_{w,b}\, \tfrac{1}{2}\|w\|^2 + h(LDw - lb)$ in the form $g(x) + f(Ax)$ with
$$x = \begin{pmatrix} w \\ b \end{pmatrix}, \quad A = [\,LD \;\; {-l}\,], \quad f(z) = h(z) = C \sum_i \max\{1 - z_i,\, 0\}, \quad g(x) = g(w, b) = \tfrac{1}{2}\|w\|^2.$$

41 HOW TO USE CONJUGATE Conjugate of the hinge $f(z) = h(z) = C\sum_i \max\{1 - z_i,\, 0\}$, computed per coordinate:
$$f^*(\beta) = \max_z\, \langle \beta, z \rangle - h(z) = \max_z\, \min\{\langle \beta, z\rangle,\; \langle \beta, z \rangle - C(1 - z)\} = \max_z\, \min\{\langle \beta, z\rangle,\; \langle \beta + C, z \rangle - C\}.$$
The max occurs where the two linear functions intersect (at $z = 1$), giving
$$f^*(\beta) = \begin{cases} \mathbf{1}^T \beta, & \beta_i \in [-C, 0] \\ \infty, & \text{otherwise.} \end{cases}$$

42 HOW TO USE CONJUGATE Conjugate of $g(x) = g(w, b) = \tfrac{1}{2}\|w\|^2$:
$$g^*(y) = \max_{w,b}\; \langle y_w, w \rangle + \langle y_b, b \rangle - \tfrac{1}{2}\|w\|^2 = \begin{cases} \tfrac{1}{2}\|y_w\|^2, & y_b = 0 \\ \infty, & \text{otherwise.} \end{cases}$$

43 SVM DUAL With $g(x) + f(Ax)$, $x = (w, b)$, $A = [\,LD \;\; {-l}\,]$, we get $d(\beta) = -g^*(-A^T\beta) - f^*(\beta)$, where
$$g^*(y) = \begin{cases} \tfrac{1}{2}\|y_w\|^2, & y_b = 0 \\ \infty, & \text{otherwise,} \end{cases} \qquad f^*(\beta) = \begin{cases} \sum_i \beta_i, & \beta_i \in [-C, 0] \\ \infty, & \text{otherwise.} \end{cases}$$
This gives
$$\text{maximize}\quad -\tfrac{1}{2}\|D^T L \beta\|^2 - \mathbf{1}^T\beta \quad \text{subject to} \quad l^T\beta = 0, \quad -C \le \beta \le 0.$$
Same as before, with $\alpha = -\beta$.

44 EXAMPLE: TV $\min_x\, \tfrac{1}{2}\|x - f\|^2 + \mu\|\nabla x\|_1$, in the form $g(x) + f(Ax)$ with $A = \nabla$:
$$f(z) = \mu\|z\|_1 \;\Rightarrow\; f^*(y) = \chi_\infty(y/\mu) \quad \text{(using } (\mu J)^*(y) = \mu J^*(y/\mu)\text{)},$$
$$g(z) = \tfrac{1}{2}\|z - f\|^2 \;\Rightarrow\; g^*(y) = \tfrac{1}{2}\|y + f\|^2 - \tfrac{1}{2}\|f\|^2.$$
Form the dual problem $d(\lambda) = -g^*(-A^T\lambda) - f^*(\lambda)$:
$$\text{maximize}\quad -\tfrac{1}{2}\|\nabla^T\lambda - f\|^2 - \chi_\infty(\lambda/\mu).$$

45 EXAMPLE: TV The dual
$$\text{maximize}\quad -\tfrac{1}{2}\|\nabla^T\lambda - f\|^2 \quad \text{subject to} \quad \|\lambda\|_\infty \le \mu$$
has a smooth objective and simple constraints.
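Because the dual objective is smooth and the constraint is a box, projected gradient ascent applies directly. A 1D sketch: the noisy step signal, $\mu$, and iteration count are illustrative choices, and $\nabla$ is taken as the forward-difference matrix.

```python
import numpy as np

# 1D dual TV denoising: maximize -0.5*||G^T lam - f||^2 over
# ||lam||_inf <= mu by projected gradient ascent, then recover
# the primal solution x = f - G^T lam.
rng = np.random.default_rng(2)
n = 100
f = np.concatenate([np.zeros(50), np.ones(50)]) + 0.05 * rng.normal(size=n)

G = np.diff(np.eye(n), axis=0)        # forward differences, (n-1) x n
mu = 0.3

lam = np.zeros(n - 1)
step = 1.0 / np.linalg.norm(G, 2) ** 2
for _ in range(5000):
    ascent = -G @ (G.T @ lam - f)     # gradient of the smooth dual term
    lam = np.clip(lam + step * ascent, -mu, mu)   # project onto the box

x = f - G.T @ lam                     # primal solution from the dual
tv_before = np.abs(np.diff(f)).sum()
tv_after = np.abs(np.diff(x)).sum()   # denoising reduces total variation
```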

46 NON-CONVEX PROBLEM: SPECTRAL CLUSTERING Non-linearly separable classes. (Figures: two moons, Swiss roll.)

47 SPECTRAL CLUSTERING Non-linearly separable classes (two moons). (Dis)similarity matrix $W_{ij} = e^{-\|d_i - d_j\|^2/\sigma^2}$; labels $x_i \in \{-1, 1\}$.

48 LABELING PROBLEM $\min_x\, x^T W x$ subject to $x_i^2 = 1$: minimizing pushes $x_i x_j = -1$ (different labels) where $W_{ij}$ is big, and allows $x_i x_j = +1$ (same label) where it is small. Is this convex? No: the constraint set $\{\pm 1\}^n$ is discrete, and the problem is hard (combinatorial) in general. Dual function: $d(\lambda) = \min_x\, x^T W x + \langle \lambda, x^2 - 1 \rangle$. Is this convex? Yes, $d$ is concave in $\lambda$, so maximizing it is an easy convex problem.

49 LABELING PROBLEM Dual function: $d(\lambda) = \min_x\, x^T W x + \langle \lambda, x^2 - 1 \rangle$. Since $\sum_i \lambda_i x_i^2 = x^T \mathrm{diag}(\lambda)\, x$,
$$d(\lambda) = \min_x\, x^T (W + \mathrm{diag}(\lambda))\, x - \langle \lambda, 1 \rangle = \begin{cases} -\langle \lambda, 1 \rangle, & W + \mathrm{diag}(\lambda) \succeq 0 \\ -\infty, & \text{otherwise.} \end{cases}$$

50 LABELING PROBLEM $d(\lambda) = -\langle \lambda, 1 \rangle$ when $W + \mathrm{diag}(\lambda) \succeq 0$. Pick the dual vector to be the smallest constant we can get away with: $\lambda_i = -\lambda_{\min}(W)$ for all $i$, giving $d(\lambda) = n\,\lambda_{\min}(W)$. The corresponding
$$x = \arg\min_x\, x^T (W - \lambda_{\min} I)\, x \quad (\text{normalized to } \|x\|^2 = n) \; = e_{\min},$$
the eigenvector of the smallest eigenvalue.

51 LABELING PROBLEM Approximate solution: $x = e_{\min}$, the eigenvector of the smallest eigenvalue of $W$. Final step, integer rounding: $x^\star_{\text{approx}} = \mathrm{round}(e_{\min})$, rounding each entry to $\pm 1$.

52 OVERALL STRATEGY $\min_x\, x^T W x$ subject to $x_i^2 = 1$, attacked via convex relaxation. Why only approximate? The dual relaxation drops the integer constraint. Dual function: $d(\lambda) = \min_x\, x^T W x + \langle \lambda, x^2 - 1 \rangle$. Approximate solution: $x = e_{\min}$. Final step, integer rounding: $x^\star_{\text{approx}} = \mathrm{round}(e_{\min})$. Note: we could also define a SIMilarity matrix and use the eigenvector of the largest eigenvalue.

53 DO CODE EXAMPLE spectral clustering
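A minimal code sketch of the strategy above: build a dissimilarity matrix, take the eigenvector of its smallest eigenvalue, and round signs to get labels. Plain squared distances stand in for the dissimilarity, and the two well-separated Gaussian blobs are an illustrative toy dataset (simpler than two moons).

```python
import numpy as np

# Spectral-clustering sketch in the deck's style: labels come from
# rounding the eigenvector of the smallest eigenvalue of a
# dissimilarity matrix W (big W_ij -> far apart -> different labels).
rng = np.random.default_rng(3)
pts = np.vstack([rng.normal(+3, 0.2, (15, 2)),
                 rng.normal(-3, 0.2, (15, 2))])
true_labels = np.concatenate([np.ones(15), -np.ones(15)])

# Dissimilarity: squared distances (big when far, small when close).
diff = pts[:, None, :] - pts[None, :, :]
W = (diff ** 2).sum(axis=-1)

eigvals, eigvecs = np.linalg.eigh(W)  # eigenvalues in ascending order
e_min = eigvecs[:, 0]                 # eigenvector of smallest eigenvalue
labels = np.sign(e_min)               # integer rounding step

# Up to a global sign flip, this should recover the two blobs.
agreement = (labels == true_labels).mean()
```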

54 NEWTON'S METHOD For a smooth problem $\min_x f(x)$, use the quadratic approximation
$$\tfrac{1}{2}(x - x^k)^T H (x - x^k) + \langle x - x^k,\, g \rangle.$$
What about a constrained problem, $\min_x f(x)$ subject to $Ax = b$?

55 CONSTRAINED NEWTON Constrained problem: $\min_x f(x)$ subject to $Ax = b$. Minimize the quadratic model subject to the constraint:
$$\min_x\; \tfrac{1}{2}(x - x^k)^T H (x - x^k) + \langle x - x^k,\, g \rangle \quad \text{subject to} \quad Ax = b,$$
$$L(x, \lambda) = \tfrac{1}{2}(x - x^k)^T H (x - x^k) + \langle x - x^k,\, g \rangle + \langle \lambda, Ax - b \rangle.$$
KKT conditions: $H(x - x^k) + g + A^T\lambda = 0$, $\; Ax = b$.

56 CONSTRAINED NEWTON KKT conditions: $H(x - x^k) + g + A^T\lambda = 0$, $Ax = b$. With Newton direction $d = x - x^k$:
$$\begin{pmatrix} H & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} d \\ \lambda \end{pmatrix} = \begin{pmatrix} -g \\ b - Ax^k \end{pmatrix}.$$
The solution is approximate (it minimizes the quadratic model, not $f$), but it satisfies the constraints.

57 NEWTON ALGORITHM Compute the Hessian and gradient; solve the Newton system
$$\begin{pmatrix} H & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} d \\ \lambda \end{pmatrix} = \begin{pmatrix} -g \\ b - Ax^k \end{pmatrix};$$
Armijo search: find a stepsize $\tau$ with $f(x^k + \tau d) \le f(x^k) + \tfrac{\tau}{10}\, g^T d$; update the iterate $x^{k+1} = x^k + \tau d$. When the stepsize is 1, the constraints are satisfied exactly.
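The algorithm above can be sketched in a few lines. The toy problem (minimize $e^{x_1} + e^{x_2}$ subject to $x_1 + x_2 = 0$, with solution $x^\star = (0, 0)$) and the iteration budget are illustrative choices; the sufficient-decrease constant $1/10$ is the one on the slide.

```python
import numpy as np

# Constrained Newton with Armijo backtracking on a toy problem:
#   minimize exp(x1) + exp(x2)  subject to  x1 + x2 = 0,  x* = (0, 0).
A = np.array([[1.0, 1.0]])
b = np.array([0.0])

f = lambda x: np.exp(x).sum()
grad = lambda x: np.exp(x)
hess = lambda x: np.diag(np.exp(x))

x = np.array([3.0, -3.0])            # feasible start: A x = b
for _ in range(20):
    H, g = hess(x), grad(x)
    # Solve the KKT system [[H, A^T], [A, 0]] [d; lam] = [-g; b - A x].
    K = np.block([[H, A.T], [A, np.zeros((1, 1))]])
    rhs = np.concatenate([-g, b - A @ x])
    d = np.linalg.solve(K, rhs)[:2]  # Newton direction
    t = 1.0
    while f(x + t * d) > f(x) + (t / 10.0) * (g @ d):  # Armijo test
        t /= 2.0
    x = x + t * d
# From a feasible start, equality constraints stay satisfied throughout.
```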

58 NEWTON COMPLEXITY $\min_x f(x)$ subject to $Ax = b$. Theorem: suppose the Hessian of $f$ is Lipschitz continuous. Then after finitely many constrained Newton steps the unit stepsize is an Armijo step ($\tau = 1$), and
$$\|x^{k+1} - x^\star\| \le c\, \|x^k - x^\star\|^2.$$
Furthermore, the constraints are exactly satisfied.

### This can be 2 lectures! Still need: examples of non-convex problems, applications for matrix factorization.


### Linearized Alternating Direction Method: Two Blocks and Multiple Blocks. Zhouchen Lin 林宙辰北京大学

Linearized Alternating Direction Method: Two Blocks and Multiple Blocks Zhouchen Lin 林宙辰北京大学 Dec. 3, 014 Outline Alternating Direction Method (ADM) Linearized Alternating Direction Method (LADM) Two Blocks

### CONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS

CONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS A Dissertation Submitted For The Award of the Degree of Master of Philosophy in Mathematics Neelam Patel School of Mathematics

### E5295/5B5749 Convex optimization with engineering applications. Lecture 5. Convex programming and semidefinite programming

E5295/5B5749 Convex optimization with engineering applications Lecture 5 Convex programming and semidefinite programming A. Forsgren, KTH 1 Lecture 5 Convex optimization 2006/2007 Convex quadratic program

### Outline. Basic concepts: SVM and kernels SVM primal/dual problems. Chih-Jen Lin (National Taiwan Univ.) 1 / 22

Outline Basic concepts: SVM and kernels SVM primal/dual problems Chih-Jen Lin (National Taiwan Univ.) 1 / 22 Outline Basic concepts: SVM and kernels Basic concepts: SVM and kernels SVM primal/dual problems

### Linear and non-linear programming

Linear and non-linear programming Benjamin Recht March 11, 2005 The Gameplan Constrained Optimization Convexity Duality Applications/Taxonomy 1 Constrained Optimization minimize f(x) subject to g j (x)

### 4TE3/6TE3. Algorithms for. Continuous Optimization

4TE3/6TE3 Algorithms for Continuous Optimization (Duality in Nonlinear Optimization ) Tamás TERLAKY Computing and Software McMaster University Hamilton, January 2004 terlaky@mcmaster.ca Tel: 27780 Optimality

### Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2015 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

### Support Vector Machines

EE 17/7AT: Optimization Models in Engineering Section 11/1 - April 014 Support Vector Machines Lecturer: Arturo Fernandez Scribe: Arturo Fernandez 1 Support Vector Machines Revisited 1.1 Strictly) Separable

### 2.098/6.255/ Optimization Methods Practice True/False Questions

2.098/6.255/15.093 Optimization Methods Practice True/False Questions December 11, 2009 Part I For each one of the statements below, state whether it is true or false. Include a 1-3 line supporting sentence

### Numerical Optimization of Partial Differential Equations

Numerical Optimization of Partial Differential Equations Part I: basic optimization concepts in R n Bartosz Protas Department of Mathematics & Statistics McMaster University, Hamilton, Ontario, Canada

### Lecture 7: Convex Optimizations

Lecture 7: Convex Optimizations Radu Balan, David Levermore March 29, 2018 Convex Sets. Convex Functions A set S R n is called a convex set if for any points x, y S the line segment [x, y] := {tx + (1