
COM S 578X: Optimization for Machine Learning
Lecture Note 4: Duality

Jia (Kevin) Liu
Assistant Professor, Department of Computer Science
Iowa State University, Ames, Iowa, USA
Fall 2018

Outline

In this lecture:
- Lagrange dual problem
- Weak and strong duality
- Geometric interpretation
- Examples in machine learning

The Lagrangian

Standard optimization problem (may or may not be convex):

    minimize    f(x)
    subject to  g_i(x) ≤ 0,  i = 1, …, m
                h_j(x) = 0,  j = 1, …, p

with variable x ∈ Rⁿ, domain D, and optimal value p*.

Lagrangian: L : Rⁿ × Rᵐ × Rᵖ → R, with dom(L) = D × Rᵐ × Rᵖ:

    L(x, u, v) = f(x) + Σ_{i=1}^m u_i g_i(x) + Σ_{j=1}^p v_j h_j(x)

- a weighted sum of the objective and constraint functions;
- u_i ≥ 0 is the dual variable (Lagrange multiplier) associated with g_i(x) ≤ 0;
- v_j ∈ R is the dual variable (Lagrange multiplier) associated with h_j(x) = 0.
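
As a concrete toy instance of the definition above (my example, not from the slides), take n = 1, m = 1, p = 0 and the problem of minimizing x² subject to x ≥ 1:

    % Toy instance: minimize f(x) = x^2 subject to g_1(x) = 1 - x <= 0
    \[
      L(x, u) = f(x) + u\, g_1(x) = x^2 + u(1 - x), \qquad u \ge 0.
    \]

The same toy problem is reused in the numerical sketch after the next slide.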

Lagrangian Dual Function

Lagrangian dual function θ : Rᵐ × Rᵖ → R:

    θ(u, v) = inf_{x ∈ D} L(x, u, v)
            = inf_{x ∈ D} ( f(x) + Σ_{i=1}^m u_i g_i(x) + Σ_{j=1}^p v_j h_j(x) )

θ(u, v) is concave in (u, v) (it is a pointwise infimum of functions affine in (u, v)), and can be −∞ for some (u, v).

Lower bound property: If u ≥ 0, then θ(u, v) ≤ p*.

Proof: If x̄ ∈ D is feasible and u ≥ 0, then, since g_i(x̄) ≤ 0, u_i ≥ 0, and h_j(x̄) = 0,

    f(x̄) ≥ L(x̄, u, v) ≥ inf_{x ∈ D} L(x, u, v) = θ(u, v).

Since this holds for any feasible x̄, minimizing the left-hand side over all feasible x̄ yields p* ≥ θ(u, v). ∎
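
Here is a minimal numerical sketch of the lower-bound property (assuming NumPy), continuing the toy problem minimize x² subject to 1 − x ≤ 0, whose optimal value is p* = 1 at x = 1:

    import numpy as np

    # Toy problem (not from the slides): minimize x^2 s.t. 1 - x <= 0, so p* = 1.
    xs = np.linspace(-10.0, 10.0, 200001)    # dense grid standing in for inf over D = R
    f = xs**2
    g = 1.0 - xs

    p_star = 1.0
    for u in [0.0, 0.5, 1.0, 2.0, 4.0]:
        theta = np.min(f + u * g)            # theta(u) = inf_x L(x, u)
        print(f"u = {u:3.1f}  theta(u) = {theta:7.4f}  <= p* = {p_star}")

    # Analytically theta(u) = u - u^2/4, maximized at u = 2 where theta(2) = 1 = p*.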

Example: Least-Norm Solution of Linear Equations

    minimize    xᵀx
    subject to  Ax = b

with variable x ∈ Rⁿ.

Dual function: The Lagrangian is L(x, v) = xᵀx + vᵀ(Ax − b). To minimize L(x, v) over x, set the gradient equal to zero:

    ∇_x L(x, v) = 2x + Aᵀv = 0  ⟹  x = −(1/2) Aᵀv

Plugging this back into L, we obtain

    θ(v) = L(−(1/2) Aᵀv, v) = −(1/4) vᵀAAᵀv − bᵀv,

which is clearly a concave function of v (its Hessian −(1/2) AAᵀ is negative semidefinite).

Lower bound property: p* ≥ −(1/4) vᵀAAᵀv − bᵀv for all v.
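
A quick NumPy check of this bound (a sketch under my assumptions: A has full row rank, so AAᵀ is invertible; the closed forms follow from the derivation above):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 8))          # fat matrix; full row rank w.h.p.
    b = rng.standard_normal(3)
    K = A @ A.T

    # Primal optimum: x* = A^T (A A^T)^{-1} b, so p* = b^T (A A^T)^{-1} b.
    p_star = b @ np.linalg.solve(K, b)

    # Maximize theta(v) = -(1/4) v^T A A^T v - b^T v: gradient is zero at
    # v* = -2 (A A^T)^{-1} b.
    v_star = -2.0 * np.linalg.solve(K, b)
    d_star = -0.25 * v_star @ K @ v_star - b @ v_star

    print(p_star, d_star)                    # equal: the lower bound is tight here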

Example: Linear Programming

    minimize    cᵀx
    subject to  Ax = b, x ≥ 0

with dual variables u ∈ R₊ᵐ, v ∈ Rᵖ.

Dual function: The Lagrangian is

    L(x, u, v) = cᵀx + vᵀ(Ax − b) − uᵀx = −bᵀv + (c + Aᵀv − u)ᵀx

L is affine in x, hence

    θ(u, v) = inf_x L(x, u, v) = { −bᵀv   if Aᵀv − u + c = 0
                                 { −∞     otherwise

θ is linear on the affine domain {(u, v) : Aᵀv − u + c = 0}, hence concave.

Lower bound property: p* ≥ −bᵀv if Aᵀv + c ≥ 0.

Example: Equality-Constrained Norm Minimization

    minimize    ‖x‖
    subject to  Ax = b

Dual function:

    θ(v) = inf_x ( ‖x‖ − vᵀAx + bᵀv ) = { bᵀv   if ‖Aᵀv‖_* ≤ 1
                                        { −∞    otherwise

where ‖v‖_* ≜ sup_{‖u‖≤1} uᵀv is the dual norm of ‖·‖.

Proof: It follows from the fact that inf_x (‖x‖ − yᵀx) = 0 if ‖y‖_* ≤ 1, and −∞ otherwise:
- If ‖y‖_* ≤ 1, then ‖x‖ − yᵀx ≥ 0 for all x (by the generalized Cauchy–Schwarz inequality yᵀx ≤ ‖y‖_* ‖x‖), with equality at x = 0.
- If ‖y‖_* > 1, choose x = tu, where u satisfies ‖u‖ ≤ 1 and uᵀy = ‖y‖_* > 1; then ‖x‖ − yᵀx = t(‖u‖ − ‖y‖_*) → −∞ as t → ∞.

Lower bound property: p* ≥ bᵀv if ‖Aᵀv‖_* ≤ 1.
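
A sketch checking this pair numerically for the Euclidean norm, which is its own dual norm (assuming CVXPY; the data are random):

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 6))
    b = rng.standard_normal(3)

    # Primal: minimize ||x||_2 subject to A x = b.
    x = cp.Variable(6)
    p_star = cp.Problem(cp.Minimize(cp.norm(x, 2)), [A @ x == b]).solve()

    # Dual: maximize b^T v subject to ||A^T v||_2 <= 1 (the 2-norm is self-dual).
    v = cp.Variable(3)
    d_star = cp.Problem(cp.Maximize(b @ v), [cp.norm(A.T @ v, 2) <= 1]).solve()

    print(p_star, d_star)                    # equal up to solver tolerance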

Example: Two-Way Partitioning (NP-Hard)

    minimize    xᵀWx
    subject to  x_i² = 1,  i = 1, …, n

with W symmetric. A nonconvex problem; the feasible set {−1, 1}ⁿ contains 2ⁿ discrete points (hard in general).

Interpretation: partition {1, …, n} into two sets; W_ij is the cost of assigning i and j to the same set, −W_ij the cost of assigning them to different sets.

Dual function (Diag(v) denotes the diagonal matrix with v on its diagonal):

    θ(v) = inf_x ( xᵀWx + Σ_i v_i (x_i² − 1) ) = inf_x ( xᵀ(W + Diag(v))x − 1ᵀv )
         = { −1ᵀv   if W + Diag(v) ⪰ 0
           { −∞     otherwise

Lower bound property: p* ≥ −1ᵀv if W + Diag(v) ⪰ 0.

Example: v = −λ_min(W)·1 gives the nontrivial bound p* ≥ n·λ_min(W).
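
A small NumPy sketch comparing this eigenvalue bound with the exact optimum found by brute force (enumeration is only feasible for small n; the sizes and names are my choices):

    import itertools
    import numpy as np

    rng = np.random.default_rng(0)
    n = 10
    B = rng.standard_normal((n, n))
    W = (B + B.T) / 2.0                      # random symmetric cost matrix

    # Dual bound with v = -lambda_min(W) * 1:  p* >= n * lambda_min(W).
    bound = n * np.linalg.eigvalsh(W)[0]

    # Exact optimum by enumerating all 2^n sign vectors.
    p_star = min(np.array(s) @ W @ np.array(s)
                 for s in itertools.product([-1.0, 1.0], repeat=n))

    print(bound, p_star)                     # bound <= p*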

Lagrangian Dual and Conjugate Function

    minimize    f(x)
    subject to  Ax ≤ b, Cx = d

Dual function:

    θ(u, v) = inf_{x ∈ dom(f)} ( f(x) + (Aᵀu + Cᵀv)ᵀx − bᵀu − dᵀv )
            = −f*(−Aᵀu − Cᵀv) − bᵀu − dᵀv

where f*(y) ≜ sup_{x ∈ dom(f)} ( yᵀx − f(x) ) is the (Legendre–Fenchel) conjugate of f.

This simplifies the derivation of the dual whenever the conjugate of f is known.

Example: Entropy maximization

    f(x) = Σ_{i=1}^n x_i log x_i,    f*(y) = Σ_{i=1}^n exp(y_i − 1)
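
A quick numerical sanity check of the entropy conjugate pair (a sketch assuming SciPy; the test vector and starting point are arbitrary choices of mine):

    import numpy as np
    from scipy.optimize import minimize

    # Verify f*(y) = sum_i exp(y_i - 1) for f(x) = sum_i x_i log x_i on x > 0.
    y = np.array([0.3, -1.2, 0.8])
    closed_form = np.sum(np.exp(y - 1.0))

    # f*(y) = sup_x { y^T x - f(x) }: maximize by minimizing the negative over x > 0.
    neg_obj = lambda x: -(y @ x - np.sum(x * np.log(x)))
    res = minimize(neg_obj, x0=np.ones_like(y), bounds=[(1e-9, None)] * len(y))

    print(closed_form, -res.fun)             # agree up to solver tolerance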

The Lagrangian Dual Problem

    maximize    θ(u, v)
    subject to  u ≥ 0

Finds the best (largest) lower bound on p* obtainable from the Lagrangian dual function.

A convex optimization problem (maximizing a concave function); its optimal value is denoted d*.

(u, v) are dual feasible if u ≥ 0 and (u, v) ∈ dom(θ).

Often simplified by making the implicit constraint (u, v) ∈ dom(θ) explicit.

Example: Standard-form LP and its dual:

    minimize    cᵀx                    maximize    −bᵀv
    subject to  Ax = b, x ≥ 0          subject to  Aᵀv + c ≥ 0

More on this later.

Weak and Strong Duality

Weak duality: d* ≤ p*
- Always holds (for convex and nonconvex problems).
- Can be used to find nontrivial lower bounds for difficult problems. For example, solving the SDP (semidefinite program)

      maximize    −1ᵀv
      subject to  W + Diag(v) ⪰ 0

  yields a lower bound for the two-way partitioning problem.

Strong duality: d* = p*
- Does not hold in general.
- Usually holds for convex problems (under certain mild conditions).
- Conditions that guarantee strong duality for convex problems are called constraint qualifications (CQs).

Slater's Constraint Qualification

Strong duality holds for a convex problem

    minimize    f(x)
    subject to  g_i(x) ≤ 0,  i = 1, …, m
                Ax = b

if it is strictly feasible, i.e.,

    ∃ x ∈ int(D) : g_i(x) < 0, i = 1, …, m,  Ax = b

- Also guarantees that the dual optimum is attained (if p* > −∞).
- Can be further relaxed: e.g., int(D) can be replaced by relint(D) (the interior relative to the affine hull), and linear inequalities do not need to hold with strict inequality.
- There exist many other types of constraint qualifications (see [BSS, Ch. 5]).

Other Well-Known CQs

- LICQ (linear independence CQ): The gradients of the active inequality constraints and the gradients of the equality constraints are linearly independent at x*.
- MFCQ (Mangasarian–Fromovitz CQ): The gradients of the equality constraints are linearly independent at x*, and there exists d ∈ Rⁿ such that ∇g_i(x*)ᵀd < 0 for all active inequality constraints and ∇h_j(x*)ᵀd = 0 for all equality constraints.
- CRCQ (constant rank CQ): For each subset of the gradients of the active inequality constraints and of the gradients of the equality constraints, the rank is constant in a vicinity of x*.
- CPLD (constant positive linear dependence CQ): For each subset of the gradients of the active inequality constraints and of the gradients of the equality constraints, if it is positive-linearly dependent at x*, then it is positive-linearly dependent in a vicinity of x*.
- QNCQ (quasi-normality CQ): If the gradients of the active inequality constraints and the gradients of the equality constraints are positive-linearly dependent at x* with multipliers u_i for the inequalities and v_j for the equalities, then there is no sequence x_k → x* such that v_j ≠ 0 ⟹ v_j h_j(x_k) > 0 and u_i ≠ 0 ⟹ u_i g_i(x_k) > 0.

value 9 " value Geometric Interpretation Mi?fg function " c Bssr op g) peg, sit, p*=pco, For ease of exposition, consider problem with one constraint g(x) apple 0 Interpretation of dual function P sent lines 11 11 I Perturbed (u) = inf (uq + p), where G = {(g(x),f(x)) x 2D} (q,p)2g t ) p hp =p # obj T q pcobj ppc ) E HI, ptxaicm I * y " c RAS ) q uq + p = (u) is an supporting hyperplane to set G with slope u Hyperplane intersects paxis at p = (u) JKL (CS@ISU) COM S 578X: Lecture 4 14 / 21

Geometric Interpretation (Epigraph Variation)

The same interpretation holds if G is replaced with the epigraph-like set

    A = {(q, p) : g(x) ≤ q, f(x) ≤ p for some x ∈ D}

Strong duality:
- Holds if there is a non-vertical supporting hyperplane to A at (0, p*).
- For a convex problem, A is convex, hence it has a supporting hyperplane at (0, p*).
- Slater's condition: if there exists (q̃, p̃) ∈ A with q̃ < 0, then every supporting hyperplane at (0, p*) must be non-vertical.

Strong Duality: Geometric Interpretation

[Figure: an example set A in the (q, p)-plane whose supporting hyperplane at (0, p*) is vertical; here there is no point (q̃, p̃) ∈ A with q̃ < 0 forcing the supporting hyperplane to be non-vertical, i.e., Slater's condition clearly does NOT hold.]

Example: Inequality-Form LP

Primal problem:

    minimize    cᵀx
    subject to  Ax ≤ b

Dual function:

    θ(u) = inf_x ( (c + Aᵀu)ᵀx − bᵀu ) = { −bᵀu   if Aᵀu + c = 0
                                         { −∞     otherwise

Dual problem:

    maximize    −bᵀu
    subject to  Aᵀu + c = 0, u ≥ 0

From Slater's condition: p* = d* if Ax̃ < b for some x̃. In fact, p* = d* always holds, except when both the primal and the dual are infeasible.
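
A sketch verifying p* = d* numerically (assuming SciPy; the random data are constructed so that both problems are feasible):

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(0)
    A = rng.standard_normal((8, 3))
    b = A @ rng.standard_normal(3) + 1.0     # strictly feasible primal (Slater holds)
    c = -A.T @ rng.random(8)                 # c = -A^T u0 with u0 >= 0: feasible dual

    # Primal: minimize c^T x subject to A x <= b, x free.
    primal = linprog(c, A_ub=A, b_ub=b, bounds=[(None, None)] * 3)

    # Dual: maximize -b^T u s.t. A^T u + c = 0, u >= 0 (phrased as a minimization).
    dual = linprog(b, A_eq=A.T, b_eq=-c, bounds=[(0, None)] * 8)

    print(primal.fun, -dual.fun)             # equal: strong duality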

Example: Quadratic Program

Primal problem (P ∈ S₊₊ⁿ, i.e., P ≻ 0):

    minimize    xᵀPx
    subject to  Ax ≤ b

Dual function: Setting the gradient of the Lagrangian with respect to x to zero gives x = −(1/2) P⁻¹Aᵀu, so

    θ(u) = inf_x ( xᵀPx + uᵀ(Ax − b) ) = −(1/4) uᵀAP⁻¹Aᵀu − bᵀu,

a concave quadratic function of u.

Dual problem:

    maximize    −(1/4) uᵀAP⁻¹Aᵀu − bᵀu
    subject to  u ≥ 0

From Slater's condition: p* = d* if Ax̃ < b for some x̃. In fact, p* = d* always holds here.
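
A CVXPY sketch checking p* = d* on random data (my construction; the Cholesky factor is used only to write the dual objective in a solver-friendly sum-of-squares form):

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 4, 6
    B = rng.standard_normal((n, n))
    P = B @ B.T + n * np.eye(n)              # symmetric positive definite P
    A = rng.standard_normal((m, n))
    b = A @ rng.standard_normal(n) + 0.1     # a strictly feasible point exists (Slater)

    # Primal QP.
    x = cp.Variable(n)
    p_star = cp.Problem(cp.Minimize(cp.quad_form(x, P)), [A @ x <= b]).solve()

    # Dual QP: u^T A P^{-1} A^T u = ||R^T A^T u||^2 with P^{-1} = R R^T.
    R = np.linalg.cholesky(np.linalg.inv(P))
    u = cp.Variable(m)
    d_star = cp.Problem(
        cp.Maximize(-0.25 * cp.sum_squares(R.T @ (A.T @ u)) - b @ u),
        [u >= 0]).solve()

    print(p_star, d_star)                    # equal up to solver tolerance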

Example: Support Vector Machine

Given labels y ∈ {−1, 1}ᵐ and feature vectors x_1, …, x_m, let X ≜ [x_1, …, x_m]ᵀ.

Recall from Lecture 1 the support vector machine problem:

    minimize_{w, b, ξ}   (1/2) wᵀw + C Σ_{i=1}^m ξ_i
    subject to           y_i (wᵀx_i + b) ≥ 1 − ξ_i,  i = 1, …, m
                         ξ_i ≥ 0,  i = 1, …, m

Introducing dual variables u ≥ 0 and v ≥ 0, we obtain the Lagrangian:

    L(w, b, ξ, u, v) = (1/2)‖w‖₂² + C Σ_{i=1}^m ξ_i + Σ_{i=1}^m u_i (1 − ξ_i − y_i(wᵀx_i + b)) − Σ_{i=1}^m v_i ξ_i

Minimizing over w, b, ξ yields the Lagrangian dual function.

Example: Support Vector Machine (cont'd)

    θ(u, v) = { −(1/2) uᵀX̃X̃ᵀu + 1ᵀu   if u = C·1 − v and uᵀy = 0
              { −∞                      otherwise

where X̃ ≜ Diag(y) X. The dual problem, after eliminating v, becomes:

    minimize    (1/2) uᵀX̃X̃ᵀu − 1ᵀu
    subject to  uᵀy = 0,  0 ≤ u_i ≤ C, i = 1, …, m

Slater's condition is satisfied ⟹ strong duality.

In the ML literature, it is more common to work with the dual: a quadratic objective with a PSD Hessian, plus a single linear constraint and simple box constraints.

At optimality, one can verify that w* = X̃ᵀu* (this is not a coincidence; it will be proved via the KKT conditions in the next class).
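
To make the dual concrete, here is a CVXPY sketch on a synthetic two-cluster dataset (the data and names are my own); it solves the box-constrained dual QP above and recovers w* = X̃ᵀu*:

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    m, d, C = 40, 2, 1.0
    X = np.vstack([rng.standard_normal((m // 2, d)) + 1.5,
                   rng.standard_normal((m // 2, d)) - 1.5])   # two clusters
    y = np.r_[np.ones(m // 2), -np.ones(m // 2)]
    Xt = np.diag(y) @ X                      # X-tilde = Diag(y) X

    # Dual SVM: minimize (1/2) u^T Xt Xt^T u - 1^T u  s.t.  u^T y = 0, 0 <= u <= C.
    u = cp.Variable(m)
    cp.Problem(cp.Minimize(0.5 * cp.sum_squares(Xt.T @ u) - cp.sum(u)),
               [u @ y == 0, u >= 0, u <= C]).solve()

    w = Xt.T @ u.value                       # primal weights recovered from the dual
    print(w)

The training points with u_i > 0 are exactly the support vectors, which is one practical reason the dual form is preferred.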

Next Class: Optimality Conditions