1 DUALITY
2 WHY DUALITY? No constraints, $\min_x f(x)$: gradient descent, Newton's method, quasi-Newton, conjugate gradients, etc. Nondifferentiable $f(x)$: ???? Constrained problems, $\min_x f(x)$ subject to $g(x) \le 0$, $h(x) = 0$: ????
3 LAGRANGIAN Simple case: $\min_x f(x)$ subject to $Ax - b = 0$. Saddle-point form: $\min_x \max_\lambda \; f(x) + \langle \lambda, Ax - b \rangle$. The function $f(x) + \langle \lambda, Ax - b \rangle$ is the Lagrangian.
4 SADDLE-POINT FORM Simple case: $\min_x f(x)$ subject to $Ax - b = 0$. Since $\max_\lambda f(x) + \langle \lambda, Ax - b \rangle = \begin{cases} f(x), & Ax - b = 0 \\ \infty, & \text{otherwise,} \end{cases}$ the saddle-point form recovers the constrained problem: $\min_x \max_\lambda f(x) + \langle \lambda, Ax - b \rangle = \min_{Ax = b} f(x)$.
5 INEQUALITY CONSTRAINTS $\min_x f(x)$ subject to $Ax - b = 0$, $Cx - d \le 0$: $\min_x \max_{\lambda,\, \mu \ge 0} f(x) + \langle \lambda, Ax - b \rangle + \langle \mu, Cx - d \rangle$, with a nonnegativity constraint on $\mu$. Why does this work? If $Cx - d \le 0$, the inner max picks $\mu = 0$; if any component of $Cx - d$ is positive, the inner max is $\infty$.
6 GENERAL FORM LAGRANGIAN $\min_x f(x)$ subject to $g(x) \le 0$, $h(x) = 0$. Lagrangian: $L(x, \lambda, \mu) = f(x) + \langle \lambda, h(x) \rangle + \langle \mu, g(x) \rangle$. Saddle-point formulation: $f(x^\star) = \min_x \max_{\lambda,\, \mu \ge 0} f(x) + \langle \lambda, h(x) \rangle + \langle \mu, g(x) \rangle$.
7 CALCULUS INTERPRETATION $\min_x f(x)$ subject to $h(x) = 0$: at a solution, the gradient of the objective is parallel to the gradient of the constraint, $\nabla f(x) = -\lambda \nabla h(x)$, i.e. $\nabla f(x) + \lambda \nabla h(x) = 0$. (Figure: contours of the objective meeting a linear constraint.)
8 CALCULUS INTERPRETATION Same problem via the Lagrangian $L(x, \lambda) = f(x) + \langle \lambda, h(x) \rangle$: optimality of $\min_x$ gives $\nabla_x L(x, \lambda) = \nabla f(x) + \lambda \nabla h(x) = 0$.
9 INEQUALITY CONDITIONS $\min_x f(x)$ subject to $g(x) \le 0$: $L(x, \mu) = f(x) + \langle \mu, g(x) \rangle$, and $\min_x \max_{\mu \ge 0} f(x) + \langle \mu, g(x) \rangle$. Inactive constraint: $\mu = 0$. Active constraint: $\mu > 0$. (Figure: the two cases, with $x^\star$ in the interior vs. on the boundary.)
10 INEQUALITY $\nabla_x L(x, \mu) = \nabla f(x) + \mu \nabla g(x) = 0$. Inactive: $\mu = 0$. Active: $\mu > 0$.
11 OPTIMALITY CONDITIONS: KKT SYSTEM $\min_x f(x)$ subject to $g(x) \le 0$, $h(x) = 0$, with $L(x, \lambda, \mu) = f(x) + \langle \lambda, h(x) \rangle + \langle \mu, g(x) \rangle$. Necessary conditions (Karush-Kuhn-Tucker): $\nabla f(x^\star) + \lambda \nabla h(x^\star) + \mu \nabla g(x^\star) = 0$ (primal/dual optimality); $g(x^\star) \le 0$, $h(x^\star) = 0$ (primal feasibility); $\mu \ge 0$ (dual feasibility); $\mu_i\, g_i(x^\star) = 0$ (complementary slackness).
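Not on the slides: a minimal numerical sanity check of the KKT conditions on a toy equality-constrained problem (the problem and all values are illustrative).

import numpy as np

# Toy problem: minimize (1/2)||x||^2 subject to <a, x> = 1.
# Lagrangian: L(x, lam) = (1/2)||x||^2 + lam * (<a, x> - 1).
# Stationarity: x + lam * a = 0  =>  x = -lam * a.
# Feasibility:  <a, x> = 1       =>  lam = -1 / ||a||^2.
a = np.array([3.0, 4.0])
lam = -1.0 / (a @ a)
x_star = -lam * a

print(x_star)                                # [0.12, 0.16]
print(np.allclose(x_star + lam * a, 0.0))    # stationarity holds
print(np.isclose(a @ x_star, 1.0))           # primal feasibility holds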
12 DUAL FUNCTION $d(\lambda, \mu) = \min_x L(x, \lambda, \mu) = \min_x f(x) + \langle \lambda, h(x) \rangle + \langle \mu, g(x) \rangle$. The dual is concave. Why? Does this depend on convexity of $f$? (No: $d$ is a pointwise minimum of functions affine in $(\lambda, \mu)$, hence concave for any $f$.)
13 DUAL FUNCTION $d(\lambda, \mu) = \min_x L(x, \lambda, \mu) = \min_x f(x) + \langle \lambda, h(x) \rangle + \langle \mu, g(x) \rangle$. The dual is a lower bound on the optimal objective: $\min_x f(x) + \langle \lambda, h(x) \rangle + \langle \mu, g(x) \rangle \le f(x^\star)$. Why? Because the optimal $x^\star$ satisfies the constraints, so $h(x^\star) = 0$ and $\langle \mu, g(x^\star) \rangle \le 0$ whenever $\mu \ge 0$.
14 GEOMETRIC INTERPRETATION OF LOWER BOUND $\min_x f(x)$ subject to $g(x) \le 0$, $h(x) = 0$. Make the constraints implicit: minimize $f(x) + \chi_0(h(x)) + \chi_-(g(x))$, where $\chi_0$ is the indicator of $\{0\}$ and $\chi_-(z) = \begin{cases} 0, & z \le 0 \\ \infty, & \text{otherwise.} \end{cases}$ (Figure: graph of the indicator $\chi_-$.)
15 GEOMETRIC INTERPRETATION Same implicit form $f(x) + \chi_0(h(x)) + \chi_-(g(x))$. Linear lower bound on the indicator: $\chi_-(z) \ge \langle z, \mu \rangle$ for any $\mu \ge 0$. (Figure: a line of nonnegative slope lying below the indicator $\chi_-$.)
16 GEOMETRIC INTERPRETATION Replacing the indicators in $f(x) + \chi_0(h(x)) + \chi_-(g(x))$ by their linear lower bounds gives the Lagrangian lower bound $f(x) + \langle \lambda, h(x) \rangle + \langle \mu, g(x) \rangle$.
17 MAX OF DUAL $\max_{\lambda,\, \mu \ge 0} d(\lambda, \mu) = \max_{\lambda,\, \mu \ge 0} \min_x f(x) + \langle \lambda, h(x) \rangle + \langle \mu, g(x) \rangle = \max_{\lambda,\, \mu \ge 0} \min_x L(x, \lambda, \mu)$. Is this equal to $\min_x \max_{\lambda,\, \mu \ge 0} L(x, \lambda, \mu)$, whose solution is the primal solution? Does maximizing the dual = minimizing the primal???
18 WEAK/STRONG DUALITY Weak duality: $\max_{\lambda,\, \mu \ge 0} d(\lambda, \mu) \le f(x^\star)$. Always holds, even for nonconvex problems. Strong duality: $\max_{\lambda,\, \mu \ge 0} d(\lambda, \mu) = f(x^\star)$. Holds most of the time for convex problems.
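Weak vs. strong duality can also be seen numerically on the same toy problem as above (all values illustrative; not part of the slides).

import numpy as np

# Toy problem: min (1/2)||x||^2 s.t. <a, x> = 1.
# Dual: d(lam) = min_x (1/2)||x||^2 + lam*(<a,x> - 1)
#             = -(1/2)*lam^2*||a||^2 - lam   (minimizer x = -lam*a).
a = np.array([3.0, 4.0])
f_star = 1.0 / (2 * (a @ a))             # primal optimal value

d = lambda lam: -0.5 * lam**2 * (a @ a) - lam
for lam in np.random.randn(5):
    assert d(lam) <= f_star + 1e-12      # weak duality: every d(lam) is a lower bound

lam_star = -1.0 / (a @ a)
print(np.isclose(d(lam_star), f_star))   # strong duality: the best bound is tight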
19 SLATER'S CONDITION $\min_x f(x)$ subject to $g(x) \le 0$, $Ax = b$. Slater's condition (a constraint qualification) holds if there is a strictly feasible point: some $x$ with $f(x) < \infty$, $g(x) < 0$, $Ax = b$. (Figure: a feasible set that is not strictly feasible vs. one that is.)
20 SLATER'S CONDITION Theorem: convex problem + (linear equalities) + (Slater's condition) $\Rightarrow$ strong duality: $\max_{\lambda,\, \mu \ge 0} \min_x L(x, \lambda, \mu) = \min_x \max_{\lambda,\, \mu \ge 0} L(x, \lambda, \mu)$, i.e. $\max_{\lambda,\, \mu \ge 0} d(\lambda, \mu) = f(x^\star)$.
21 NON-HOMOGENEOUS SVM Choose the line with the largest margin $2 / \|w\|$, and push the data to the right side of the line: $\min_{w, b} \; \tfrac12 \|w\|^2 + C \sum_i h\big(l_i (d_i^T w - b)\big)$, where $h$ is the hinge loss. (Figure: separating line $d^T w - b = 0$ with margin line $d^T w - b = 1$.)
22 DUAL: THE HARD WAY $\min_{w, b} \tfrac12 \|w\|^2 + C \sum_i h\big(l_i(d_i^T w - b)\big) = \min_{w, b} \tfrac12 \|w\|^2 + C \sum_i \max\{1 - l_i(d_i^T w - b), 0\} = \min_{w, b} \tfrac12 \|w\|^2 + C\, \mathbf{1}^T \max\{\mathbf{1} - L(Dw - b), 0\}$, where $D$ has rows $d_i^T$ and $L = \operatorname{diag}(l)$. Idea: make this differentiable. Standard form: $\min_{w, b, x} \tfrac12 \|w\|^2 + C \langle \mathbf{1}, x \rangle$ subject to $x \ge \mathbf{1} - L(Dw - b)$, $x \ge 0$.
23 DUAL: THE HARD WAY $\min_{w, b, x} \tfrac12 \|w\|^2 + C \langle \mathbf{1}, x \rangle$ subject to $x \ge \mathbf{1} - L(Dw - b)$, $x \ge 0$. With positive multipliers $\alpha, \beta$: $d(\alpha, \beta) = \min_{w, b, x} \tfrac12 \|w\|^2 + C \langle \mathbf{1}, x \rangle + \langle \alpha, \mathbf{1} - L(Dw - b) - x \rangle + \langle \beta, -x \rangle$. This is too complicated; let's reduce it. Setting the derivative with respect to $x$ to zero gives $C \mathbf{1} - \alpha - \beta = 0$; with $\alpha, \beta \ge 0$, the solution is $\beta = C \mathbf{1} - \alpha$, with the new constraint $0 \le \alpha_i \le C$.
24 DUAL: THE HARD WAY Substituting $\beta = C \mathbf{1} - \alpha$, $0 \le \alpha_i \le C$, the $x$ terms cancel: $d(\alpha) = \min_{w, b} \tfrac12 \|w\|^2 + \langle \alpha, \mathbf{1} - L(Dw - b) \rangle = \min_{w, b} \tfrac12 \|w\|^2 + \langle \alpha, \mathbf{1} \rangle - \langle \alpha, LDw \rangle + \langle \alpha, l \rangle\, b$.
25 DUAL: THE HARD WAY $d(\alpha) = \min_{w, b} \tfrac12 \|w\|^2 + \langle \alpha, \mathbf{1} \rangle - \langle \alpha, LDw \rangle + \langle \alpha, l \rangle\, b$. Minimizing over $b$ forces $\langle \alpha, l \rangle = 0$; minimizing over $w$ gives $w - D^T L \alpha = 0$, i.e. $w = D^T L \alpha$. Then $d(\alpha) = \tfrac12 \|D^T L \alpha\|^2 + \langle \alpha, \mathbf{1} \rangle - \langle \alpha, L D D^T L \alpha \rangle = \tfrac12 \|D^T L \alpha\|^2 + \langle \alpha, \mathbf{1} \rangle - \|D^T L \alpha\|^2 = -\tfrac12 \|D^T L \alpha\|^2 + \langle \alpha, \mathbf{1} \rangle$.
26 ADVANTAGES OF DUAL Original problem: $\min_{w, b} \tfrac12 \|w\|^2 + C\, \mathbf{1}^T \max\{\mathbf{1} - L(Dw - b), 0\}$, nondifferentiable. Dual problem: maximize $d(\alpha) = -\tfrac12 \|D^T L \alpha\|^2 + \langle \alpha, \mathbf{1} \rangle$ subject to $0 \le \alpha \le C$, $\langle \alpha, l \rangle = 0$, which is smooth, with $w = D^T L \alpha$. Solvable via coordinate descent (LIBSVM).
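As an illustrative aside (not from the slides): scikit-learn's SVC wraps LIBSVM, which solves exactly this dual, and it exposes the products $l_i \alpha_i$ for the support vectors, so the recovery $w = D^T L \alpha$ can be checked directly. The data below are made up.

import numpy as np
from sklearn.svm import SVC  # wraps LIBSVM, which solves this dual by coordinate descent

# Illustrative data: rows of D are the points d_i, l holds labels in {-1, +1}.
rng = np.random.default_rng(0)
D = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(+1, 1, (50, 2))])
l = np.array([-1] * 50 + [+1] * 50)

svc = SVC(kernel="linear", C=1.0).fit(D, l)

# LIBSVM returns l_i * alpha_i for the support vectors; recover w = D^T L alpha.
w = svc.dual_coef_ @ svc.support_vectors_
print(np.allclose(w, svc.coef_))   # matches the primal weights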
27 ADVANTAGES OF DUAL Dual problem: maximize $d(\alpha) = -\tfrac12 \|D^T L \alpha\|^2 + \langle \alpha, \mathbf{1} \rangle$ subject to $0 \le \alpha \le C$, $\langle \alpha, l \rangle = 0$. Kernel form: $d(\alpha) = -\tfrac12 \alpha^T L\, D D^T L\, \alpha + \langle \alpha, \mathbf{1} \rangle$, same constraints; the data enter only through the kernel matrix $D D^T$.
28 KERNELS $(D D^T)_{ij} = \langle d_i, d_j \rangle$: big when $d_i$ and $d_j$ are close, small when they are far apart. A kernel function $k(d_i, d_j)$: big when $d_i$ and $d_j$ are similar, small when they are different. The dual SVM only needs this similarity function. Kernel methods: replace the inner product with some other similarity.
29 EXAMPLE: TWO MOONS Kernel function $k(d_i, d_j)$: big when $d_i$ and $d_j$ are similar, small when they are different. (Figure: two-moons data, with a "far" pair and a "close" pair marked.)
30 EXAMPLE: TWO MOONS Same picture with a Gaussian kernel, $k(x, y) = \exp(-\|x - y\|^2 / \sigma^2)$. (Figure: two-moons data, far/close pairs.)
31 EXAMPLE: TWO MOONS With kernel matrix $K_{ij} = k(d_i, d_j)$, replace $d(\alpha) = -\tfrac12 \alpha^T L D D^T L \alpha + \langle \alpha, \mathbf{1} \rangle$ by $d(\alpha) = -\tfrac12 \alpha^T L K L \alpha + \langle \alpha, \mathbf{1} \rangle$, subject to $0 \le \alpha \le C$, $\langle \alpha, l \rangle = 0$. Return to this later!!
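A quick sketch of the point of this slide, using scikit-learn's make_moons (an assumption of this example, not part of the lecture): a linear SVM cannot separate the moons, while the same dual with a Gaussian kernel matrix can.

from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two moons are not linearly separable; only the kernel matrix
# K_ij = k(d_i, d_j) enters the dual, so swapping kernels is trivial.
X, y = make_moons(n_samples=200, noise=0.05, random_state=0)

print(SVC(kernel="linear", C=1.0).fit(X, y).score(X, y))  # typically well below 1
print(SVC(kernel="rbf", C=1.0).fit(X, y).score(X, y))     # typically ~1.0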
32 CONJUGATE FUNCTION $f^*(y) = \max_x \; y^T x - f(x)$. Is it convex? (Yes, always: it is a pointwise max of functions affine in $y$.)
33 CONJUGATE OF NORM $f(x) = \|x\|$: $f^*(y) = \max_x \; y^T x - \|x\|$. Hölder inequality: $y^T x \le \|y\|_* \|x\|$, where $\|y\|_* = \max_x \; y^T x / \|x\|$ is the dual norm. $\|y\|_* \le 1 \Rightarrow f^* = 0$ (why?); $\|y\|_* > 1 \Rightarrow f^* = \infty$ (why?). So $f^*(y) = \begin{cases} 0, & \|y\|_* \le 1 \\ \infty, & \text{otherwise.} \end{cases}$
34 EXAMPLES $f(x) = \|x\|_2 \;\Rightarrow\; f^*(y) = \chi_2(y) = \begin{cases} 0, & \|y\|_2 \le 1 \\ \infty, & \|y\|_2 > 1; \end{cases}$ $\quad f(x) = |x| \;\Rightarrow\; f^*(y) = \begin{cases} 0, & |y| \le 1 \\ \infty, & |y| > 1; \end{cases}$ $\quad f(x) = \|x\|_1 \;\Rightarrow\; f^*(y) = \chi_\infty(y) = \begin{cases} 0, & \|y\|_\infty \le 1 \\ \infty, & \|y\|_\infty > 1. \end{cases}$
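These conjugates are easy to verify numerically; a one-dimensional sketch for $f(x) = |x|$ (the grid and test values are illustrative, not from the slides).

import numpy as np

# Approximate f*(y) = max_x  y*x - |x|  on a grid in R^1:
# finite (zero) for |y| <= 1, and blowing up with the grid radius for |y| > 1.
xs = np.linspace(-100, 100, 200001)
fstar = lambda y: np.max(y * xs - np.abs(xs))
print(fstar(0.7))   # ~0.0   (inside the dual-norm ball)
print(fstar(1.3))   # ~30.0  (grows with the grid radius: really +infinity)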
35 HOW TO USE CONJUGATE $\min_x g(x) + f(Ax + b)$: write this as $\min_{x, y} g(x) + f(y)$ subject to $y = Ax + b$. Then $d(\lambda) = \min_{x, y} g(x) + f(y) + \langle \lambda, Ax + b - y \rangle = \langle \lambda, b \rangle + \min_x \big( g(x) + \langle A^T \lambda, x \rangle \big) + \min_y \big( f(y) - \langle \lambda, y \rangle \big) = \langle \lambda, b \rangle - \max_x \big( \langle -A^T \lambda, x \rangle - g(x) \big) - \max_y \big( \langle \lambda, y \rangle - f(y) \big)$, so $d(\lambda) = \langle \lambda, b \rangle - g^*(-A^T \lambda) - f^*(\lambda)$.
36 EXAMPLE: LASSO $\min_x \mu \|x\|_1 + \tfrac12 \|Ax - b\|^2$; how big should $\mu$ be? Apply the template $g(x) + f(Ax + b)$ with $g = \mu \|\cdot\|_1$, $f = \tfrac12 \|\cdot\|^2$, and $b \to -b$: $d(\lambda) = \langle \lambda, -b \rangle - g^*(-A^T \lambda) - f^*(\lambda)$. Note $(\mu J)^*(y) = \mu J^*(y / \mu)$, so the dual is: maximize $-\langle \lambda, b \rangle - \chi_\infty(-A^T \lambda / \mu) - \tfrac12 \|\lambda\|^2$. Change variables $\lambda \to -\lambda$ to eliminate the negative sign: maximize $\langle \lambda, b \rangle - \chi_\infty(A^T \lambda / \mu) - \tfrac12 \|\lambda\|^2$.
37 EXAMPLE: LASSO $\min_x \mu \|x\|_1 + \tfrac12 \|Ax - b\|^2$: maximize $\langle \lambda, b \rangle - \chi_\infty(A^T \lambda / \mu) - \tfrac12 \|\lambda\|^2$. Complete the square: maximize $-\tfrac12 \|\lambda - b\|^2 + \tfrac12 \|b\|^2$ subject to $\|A^T \lambda\|_\infty \le \mu$.
38 EXAMPLE: LASSO Dual problem: maximize $-\tfrac12 \|\lambda - b\|^2 + \tfrac12 \|b\|^2$ subject to $\|A^T \lambda\|_\infty \le \mu$. Zero is a primal solution when the optimal objective is $\tfrac12 \|A \cdot 0 - b\|^2 = \tfrac12 \|b\|^2$, which coincides with the dual variable $\lambda = b$ being feasible: $\mu \ge \|A^T b\|_\infty$.
39 EXAMPLE: LASSO $\min_x \mu \|x\|_1 + \tfrac12 \|Ax - b\|^2$: the solution is zero when $\mu \ge \|A^T b\|_\infty$. A reasonable guess: $\mu = \tfrac{1}{10} \|A^T b\|_\infty$.
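A hedged numerical check of this threshold using a minimal proximal-gradient (ISTA) lasso solver; this is a sketch, not the lecture's method, and the matrix, sizes, and iteration count are arbitrary.

import numpy as np

# Check the zero-solution threshold for  min_x  mu*||x||_1 + (1/2)||Ax - b||^2.
rng = np.random.default_rng(1)
A = rng.standard_normal((40, 10))
b = rng.standard_normal(40)

def lasso_ista(A, b, mu, iters=5000):
    x = np.zeros(A.shape[1])
    t = 1.0 / np.linalg.norm(A, 2) ** 2          # step = 1/L with L = ||A||_2^2
    for _ in range(iters):
        z = x - t * A.T @ (A @ x - b)            # gradient step on the smooth part
        x = np.sign(z) * np.maximum(np.abs(z) - t * mu, 0.0)  # soft-threshold (prox of mu*||.||_1)
    return x

thresh = np.linalg.norm(A.T @ b, np.inf)
print(np.allclose(lasso_ista(A, b, 1.01 * thresh), 0.0))  # True: zero is optimal
print(np.allclose(lasso_ista(A, b, 0.10 * thresh), 0.0))  # False: nonzero solution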
40 EXAMPLE: SVM Cast the SVM as $g(x) + f(Ax)$: $\min_{w, b} \tfrac12 \|w\|^2 + h(LDw - lb)$ with $x = \begin{pmatrix} w \\ b \end{pmatrix}$, $A = [\,LD \;\; {-l}\,]$, $f(z) = h(z) = C \sum_i \max\{1 - z_i, 0\}$, and $g(x) = g(w, b) = \tfrac12 \|w\|^2$.
41 HOW TO USE CONJUGATE $f(z) = h(z) = C \sum_i \max\{1 - z_i, 0\}$: $f^*(\alpha) = \max_z \langle \alpha, z \rangle - h(z) = \max_z \langle \alpha, z \rangle - C \sum_i \max\{0, 1 - z_i\}$. Coordinate-wise this is $\max_z \min\{\alpha z, \; (\alpha + C) z - C\}$; the max occurs where the two linear functions intersect, at $z = 1$, and is finite only for $\alpha \in [-C, 0]$: $f^*(\alpha) = \begin{cases} \mathbf{1}^T \alpha, & \alpha_i \in [-C, 0] \\ \infty, & \text{otherwise.} \end{cases}$
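The intersection argument can be checked numerically per coordinate (the value of $C$ and the grid are illustrative assumptions).

import numpy as np

# Hinge conjugate, per coordinate, with C = 2:
# sup_z  a*z - C*max(0, 1 - z)  equals a for a in [-C, 0], +infinity otherwise.
C = 2.0
zs = np.linspace(-50, 50, 100001)
conj = lambda a: np.max(a * zs - C * np.maximum(0.0, 1.0 - zs))
print(conj(-1.0))   # ~-1.0  (inside [-C, 0]: equals a, attained at z = 1)
print(conj(0.5))    # ~25.0  (outside: grows with the grid radius, really +infinity)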
42 HOW TO USE CONJUGATE $g(x) = g(w, b) = \tfrac12 \|w\|^2$: $g^*(y) = \max_x \langle y, x \rangle - \tfrac12 \|w\|^2 = \max_{w, b} \langle y_w, w \rangle + y_b\, b - \tfrac12 \|w\|^2 = \begin{cases} \tfrac12 \|y_w\|^2, & y_b = 0 \\ \infty, & \text{otherwise.} \end{cases}$
43 SVM DUAL For $g(x) + f(Ax)$, $d(\alpha) = -g^*(-A^T \alpha) - f^*(\alpha)$, with $x = \begin{pmatrix} w \\ b \end{pmatrix}$, $A = [\,LD \;\; {-l}\,]$, $g^*(y) = \begin{cases} \tfrac12 \|y_w\|^2, & y_b = 0 \\ \infty, & \text{otherwise,} \end{cases}$ and $f^*(\alpha) = \begin{cases} \sum_i \alpha_i, & \alpha \in [-C, 0] \\ \infty, & \text{otherwise.} \end{cases}$ This gives: maximize $-\tfrac12 \|D^T L \alpha\|^2 - \mathbf{1}^T \alpha$ subject to $l^T \alpha = 0$, $-C \le \alpha \le 0$; the same as before, with $\alpha \to -\alpha$.
44 EXAMPLE: TV $\min_x \tfrac12 \|x - f\|^2 + \mu \|\nabla x\|_1$, in the form $g(x) + f(Ax)$ with $A = \nabla$. The regularizer $\mu \|z\|_1$ has conjugate $\chi_\infty(y / \mu)$ (using $(\mu J)^*(y) = \mu J^*(y / \mu)$), and $g(z) = \tfrac12 \|z - f\|^2$ has conjugate $g^*(y) = \tfrac12 \|y + f\|^2 - \tfrac12 \|f\|^2$. Form the dual $d(\lambda) = -g^*(-A^T \lambda) - f^*(\lambda)$: maximize $-\tfrac12 \|\nabla^T \lambda - f\|^2 - \chi_\infty(\lambda / \mu)$, up to the constant $\tfrac12 \|f\|^2$.
45 EXAMPLE: TV maximize $-\tfrac12 \|\nabla^T \lambda - f\|^2 - \chi_\infty(\lambda / \mu)$, i.e. maximize $-\tfrac12 \|\nabla^T \lambda - f\|^2$ subject to $\|\lambda\|_\infty \le \mu$: a smooth objective with simple constraints.
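A sketch of how this dual might be solved by projected gradient ascent on a 1-D signal, recovering the primal via $x = f - \nabla^T \lambda$; the signal, $\mu$, step size, and iteration count are all illustrative assumptions, not the lecture's choices.

import numpy as np

# Dual of 1-D TV denoising: maximize -(1/2)||G^T lam - f||^2 s.t. ||lam||_inf <= mu.
n, mu = 100, 1.0
f = np.repeat([0.0, 2.0], n // 2) + 0.1 * np.random.default_rng(2).standard_normal(n)

G = np.diff(np.eye(n), axis=0)                 # forward-difference "gradient" matrix
lam = np.zeros(n - 1)
step = 1.0 / np.linalg.norm(G, 2) ** 2
for _ in range(2000):
    lam = lam - step * G @ (G.T @ lam - f)     # gradient ascent on the concave objective
    lam = np.clip(lam, -mu, mu)                # project onto the box ||lam||_inf <= mu
x = f - G.T @ lam                              # recover the primal denoised signal
print(np.round(x[:3], 2), np.round(x[-3:], 2)) # roughly 0 at the start, roughly 2 at the end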
46 NONCONVEX PROBLEM: SPECTRAL CLUSTERING Nonlinearly separable classes. (Figures: two moons; Swiss roll.)
47 SPECTRAL CLUSTERING Nonlinearly separable classes (two moons). (Dis)similarity matrix $W_{ij} = e^{-\|d_i - d_j\|^2 / \sigma^2}$; labels $x_i \in \{-1, 1\}$.
48 LABELING PROBLEM $\min_x x^T W x$ subject to $x_i^2 = 1$. Big $W_{ij}$: different labels; small $W_{ij}$: same label. Is this convex? Why? How hard is this problem? Dual function: $d(\lambda) = \min_x x^T W x + \langle \lambda, x^2 - \mathbf{1} \rangle$, with $x^2$ elementwise. Is this convex? Why? How hard is this problem?
49 LABELING PROBLEM Dual function $d(\lambda) = \min_x x^T W x + \langle \lambda, x^2 - \mathbf{1} \rangle$. Since $\sum_i \lambda_i x_i^2 = x^T \operatorname{diag}(\lambda)\, x$: $d(\lambda) = \min_x x^T (W + \operatorname{diag}(\lambda)) x - \langle \lambda, \mathbf{1} \rangle = \begin{cases} -\langle \lambda, \mathbf{1} \rangle, & W + \operatorname{diag}(\lambda) \succeq 0 \\ -\infty, & \text{otherwise.} \end{cases}$
50 LABELING PROBLEM Dual function $d(\lambda) = \begin{cases} -\langle \lambda, \mathbf{1} \rangle, & W + \operatorname{diag}(\lambda) \succeq 0 \\ -\infty, & \text{otherwise.} \end{cases}$ Pick the dual vector to be the smallest constant we can get away with: $\lambda_i = -\lambda_{\min}(W)$ for all $i$, giving $d(\lambda) = n\, \lambda_{\min}(W)$. The corresponding $x = \arg\min_x x^T (W - \lambda_{\min} I)\, x + n \lambda_{\min} = e_{\min}$, the eigenvector of the smallest eigenvalue.
51 LABELING PROBLEM Approximate solution: $x = \arg\min_x x^T (W - \lambda_{\min} I)\, x + n \lambda_{\min} = e_{\min}$. Final step, integer rounding: $x^\star_{\text{approx}} = \operatorname{round}(e_{\min})$.
52 OVERALL STRATEGY $\min_x x^T W x$ subject to $x_i^2 = 1$; convex relaxation via the dual function $d(\lambda) = \min_x x^T W x + \langle \lambda, x^2 - \mathbf{1} \rangle$; approximate solution $x = \arg\min_x x^T (W - \lambda_{\min} I)\, x + n \lambda_{\min} = e_{\min}$; final step, integer rounding $x^\star_{\text{approx}} = \operatorname{round}(e_{\min})$. Why approximate? Note: we could also define a SIMilarity matrix and use the largest eigenvalue.
53 DO CODE EXAMPLE spectral clustering
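One possible code example, assuming Python. The slides end at "extreme eigenvector + rounding"; this sketch uses the standard spectral-clustering variant of that idea, the second-smallest eigenvector (Fiedler vector) of the graph Laplacian built from $W$, which is a deliberate substitution, not the slides' exact matrix, and $\sigma$ is a guess.

import numpy as np
from sklearn.datasets import make_moons
from scipy.spatial.distance import cdist

X, y = make_moons(n_samples=200, noise=0.05, random_state=0)

sigma = 0.25
W = np.exp(-cdist(X, X) ** 2 / sigma**2)       # W_ij = exp(-||d_i - d_j||^2 / sigma^2)
Lap = np.diag(W.sum(axis=1)) - W               # graph Laplacian L = diag(W 1) - W

vals, vecs = np.linalg.eigh(Lap)               # eigenvalues in ascending order
labels = np.sign(vecs[:, 1])                   # round the Fiedler vector to {-1, +1}
acc = max(np.mean(labels == 2 * y - 1), np.mean(labels == 1 - 2 * y))
print(acc)                                     # typically near 1.0 for a suitable sigma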
54 NEWTON'S METHOD Smooth problem $\min_x f(x)$; quadratic approximation $\tfrac12 (x - x^k)^T H (x - x^k) + \langle x - x^k, g \rangle$. What about a constrained problem, $\min_x f(x)$ subject to $Ax = b$?
55 CONSTRAINED NEWTON Constrained problem $\min_x f(x)$ subject to $Ax = b$: minimize $\tfrac12 (x - x^k)^T H (x - x^k) + \langle x - x^k, g \rangle$ subject to $Ax = b$, with $L(x, \lambda) = \tfrac12 (x - x^k)^T H (x - x^k) + \langle x - x^k, g \rangle + \langle \lambda, Ax - b \rangle$. KKT conditions: $H(x - x^k) + g + A^T \lambda = 0$, $Ax = b$.
56 CONSTRAINED NEWTON KKT conditions $H(x - x^k) + g + A^T \lambda = 0$, $Ax = b$. With Newton direction $d = x - x^k$: $\begin{pmatrix} H & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} d \\ \lambda \end{pmatrix} = \begin{pmatrix} -g \\ b - A x^k \end{pmatrix}$. The solution is only an approximate minimizer of $f$, but it satisfies the constraints exactly.
57 NEWTON ALGORITHM Compute the Hessian and gradient; solve the Newton system $\begin{pmatrix} H & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} d \\ \lambda \end{pmatrix} = \begin{pmatrix} -g \\ b - A x^k \end{pmatrix}$; Armijo search $f(x^k + \tau d) \le f(x^k) + \tfrac{\tau}{10}\, g^T d$; update the iterate $x^{k+1} = x^k + \tau d$. When the stepsize $\tau = 1$, the constraints are satisfied exactly.
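A minimal numerical version of one constrained Newton step; the objective here is quadratic, so a single solve of the KKT system is exact (all values are illustrative).

import numpy as np

# One constrained Newton step on: min (1/2) x^T Q x  subject to  A x = b.
rng = np.random.default_rng(4)
Q = np.diag([1.0, 2.0, 3.0])           # Hessian H (constant for a quadratic)
A = rng.standard_normal((1, 3))
b = np.array([1.0])

xk = np.zeros(3)
g = Q @ xk                             # gradient at x^k
KKT = np.block([[Q, A.T], [A, np.zeros((1, 1))]])
rhs = np.concatenate([-g, b - A @ xk])
d_lam = np.linalg.solve(KKT, rhs)      # stacked (d, lambda)
x = xk + d_lam[:3]

print(np.allclose(A @ x, b))                        # constraints hold exactly
print(np.allclose(Q @ x + A.T @ d_lam[3:], 0.0))    # stationarity holds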
58 NEWTON COMPLEXITY $\min_x f(x)$ subject to $Ax = b$. Theorem: suppose the Hessian of $f$ is Lipschitz continuous. Then after finitely many constrained Newton steps the unit stepsize is an Armijo step ($\tau = 1$), and $\|x^{k+1} - x^\star\| \le C \|x^k - x^\star\|^2$. Furthermore, the constraints are exactly satisfied.
(Instructor note: this can be two lectures! Still needed: examples of nonconvex problems, applications for matrix factorization. Also the Moreau decomposition $x = \operatorname{prox}_f(x) + \operatorname{prox}_{f^*}(x)$; use it to get the prox of norms!)
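A quick check of the Moreau decomposition for $f = \|\cdot\|_1$, whose prox is soft-thresholding, while $f^* = \chi_\infty$ has projection onto the unit box as its prox (the test vector is illustrative).

import numpy as np

# Moreau decomposition: x = prox_f(x) + prox_{f*}(x), here for f = ||.||_1.
x = np.random.default_rng(5).standard_normal(8)

prox_f = np.sign(x) * np.maximum(np.abs(x) - 1.0, 0.0)   # soft-threshold at 1
prox_fstar = np.clip(x, -1.0, 1.0)                       # project onto [-1, 1]^n

print(np.allclose(x, prox_f + prox_fstar))               # True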