Optimization
Yuh-Jye Lee
Data Science and Machine Intelligence Lab, National Chiao Tung University
March 28, 2017
1 / 40

The Key Idea of Newton's Method
Let f : R^n → R be a twice differentiable function:
  f(x + d) = f(x) + ∇f(x)ᵀd + (1/2)dᵀ∇²f(x)d + β(x, d)‖d‖², where lim_{d→0} β(x, d) = 0.
At the i-th iteration, use a quadratic function to approximate f(x):
  f(x) ≈ f(x^i) + ∇f(x^i)ᵀ(x − x^i) + (1/2)(x − x^i)ᵀ∇²f(x^i)(x − x^i)
  x^{i+1} = arg min of this quadratic approximation.
2 / 40

Newton's Method
Start with x^0 ∈ R^n. Having x^i, stop if ∇f(x^i) = 0. Else compute x^{i+1} as follows:
  1. Newton direction: ∇²f(x^i)d^i = −∇f(x^i). (You have to solve a system of linear equations here!)
  2. Update: x^{i+1} = x^i + d^i.
It converges only when x^0 is close enough to x*.
3 / 40
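A minimal MATLAB sketch of this iteration (not part of the slides; gradf and hessf are assumed to be function handles for ∇f and ∇²f, saved as newton_min.m):

% Newton's method: repeatedly solve the Newton system and update the iterate.
function x = newton_min(gradf, hessf, x, tol, maxit)
    for i = 1:maxit
        g = gradf(x);
        if norm(g) < tol, break; end   % stop when the gradient (nearly) vanishes
        d = -hessf(x) \ g;             % Newton direction: solve hessf(x)*d = -gradf(x)
        x = x + d;                     % update the iterate
    end
end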

Newton's Method with a BAD Initial Point
  f(x) = −(1/6)x^6 + (1/4)x^4 + 2x^2
  g_i(x) = f(x^i) + f′(x^i)(x − x^i) + (1/2)f″(x^i)(x − x^i)^2
  g_1(x) = f(1) + 4(x − 1) + (x − 1)^2
  g_2(x) = f(−1) − 4(x + 1) + (x + 1)^2
  g_1′(−1) = g_2′(1) = 0
Starting from x^0 = 1 (or −1), the iterates jump back and forth between 1 and −1, so the method cannot converge to the optimal solution.
4 / 40
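This cycling is easy to reproduce numerically (a sketch, assuming the reconstruction of f above; not part of the slides):

% Newton's method on f(x) = -(1/6)x^6 + (1/4)x^4 + 2x^2, started at x = 1.
fp  = @(x) -x.^5 + x.^3 + 4*x;     % f'(x)
fpp = @(x) -5*x.^4 + 3*x.^2 + 4;   % f''(x)
x = 1;                             % bad initial point
for i = 1:6
    x = x - fp(x)/fpp(x);          % Newton update
    fprintf('iteration %d: x = %g\n', i, x);
end
% prints -1, 1, -1, ...: the iterates never approach the local minimizer x* = 0.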

Constrained Optimization Problem
Problem setting: given functions f, g_i, i = 1, ..., k, and h_j, j = 1, ..., m, defined on a domain Ω ⊆ R^n,
  min_{x∈Ω} f(x)
  s.t. g_i(x) ≤ 0, ∀i
       h_j(x) = 0, ∀j
where f(x) is called the objective function and g(x) ≤ 0, h(x) = 0 are called the constraints.
5 / 40

Example I
  min f(x) = 2x_1^2 + x_2^2 + 3x_3^2
  s.t. 2x_1 − 3x_2 + 4x_3 = 49
Solution:
  L(x, β) = f(x) + β(2x_1 − 3x_2 + 4x_3 − 49), β ∈ R
  ∂L/∂x_1 = 0 ⇒ 4x_1 + 2β = 0
  ∂L/∂x_2 = 0 ⇒ 2x_2 − 3β = 0
  ∂L/∂x_3 = 0 ⇒ 6x_3 + 4β = 0
  2x_1 − 3x_2 + 4x_3 − 49 = 0
  ⇒ β = −6, x_1 = 3, x_2 = −9, x_3 = 4
6 / 40
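Because the three stationarity conditions plus the constraint form a linear system in (x_1, x_2, x_3, β), the result is easy to verify numerically; a minimal MATLAB sketch (not part of the slides):

% Solve the 4x4 linear system from Example I; unknowns are [x1; x2; x3; beta].
M = [4  0  0  2;      % 4*x1 + 2*beta = 0
     0  2  0 -3;      % 2*x2 - 3*beta = 0
     0  0  6  4;      % 6*x3 + 4*beta = 0
     2 -3  4  0];     % 2*x1 - 3*x2 + 4*x3 = 49
rhs = [0; 0; 0; 49];
sol = M \ rhs          % expected: [3; -9; 4; -6]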

Example II
  min_{x∈R^2} x_1^2 + x_2^2
  s.t. x_1 + x_2 ≥ 4
       x_1 − x_2 ≤ 2
       x_1, x_2 ≥ 0
∇f(x) = [2x_1, 2x_2]ᵀ; the solution is x* = [2, 2]ᵀ, where ∇f(x*) = [4, 4]ᵀ.
7 / 40

Definitions and Notation
Feasible region:
  F = {x ∈ Ω | g(x) ≤ 0, h(x) = 0}
where g(x) = [g_1(x), ..., g_k(x)]ᵀ and h(x) = [h_1(x), ..., h_m(x)]ᵀ.
A solution of the optimization problem is a point x* ∈ F such that there exists no x ∈ F for which f(x) < f(x*); such an x* is called a global minimum.
8 / 40

Definitions and Notation
A point x̄ ∈ F is called a local minimum of the optimization problem if ∃ ε > 0 such that f(x) ≥ f(x̄) for all x ∈ F with ‖x − x̄‖ < ε.
At the solution x*, an inequality constraint g_i(x) is said to be active if g_i(x*) = 0; otherwise it is called an inactive constraint.
  g_i(x) ≤ 0 ⟺ g_i(x) + ξ_i = 0, ξ_i ≥ 0
where ξ_i is called the slack variable.
9 / 40

Definitions and Notation
Removing an inactive constraint from an optimization problem will NOT affect the optimal solution. This is a very useful feature in SVM.
If F = R^n then the problem is called an unconstrained minimization problem. The least-squares problem is in this category, and so is the SSVM formulation.
It is difficult to find the global minimum without a convexity assumption.
10 / 40

The Most Important Concepts in Optimization (Minimization)
A point is an optimal solution of an unconstrained minimization problem if there exists no descent direction ⟹ ∇f(x*) = 0.
A point is an optimal solution of a constrained minimization problem if there exists no feasible descent direction ⟹ KKT conditions.
There might exist a descent direction, but moving along this direction would leave the feasible region.
11 / 40

Minimum Principle
Let f : R^n → R be a convex and continuously differentiable function and F ⊆ R^n be the feasible region.
  x* ∈ arg min_{x∈F} f(x) ⟺ ∇f(x*)ᵀ(x − x*) ≥ 0, ∀x ∈ F
Example: min (x − 1)^2 s.t. a ≤ x ≤ b
12 / 40
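A short check of the example (not worked out on the slide): if 1 ∈ [a, b], the minimizer is x* = 1 and f′(x*) = 0, so the inequality holds trivially; if b < 1, the minimizer is x* = b, and then f′(x*)(x − x*) = 2(b − 1)(x − b) ≥ 0 for every x ∈ [a, b], since both factors are nonpositive. The case a > 1 is symmetric.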

Example II
  min_{x∈R^2} x_1^2 + x_2^2
  s.t. x_1 + x_2 ≥ 4
       x_1 − x_2 ≤ 2
       x_1, x_2 ≥ 0
∇f(x) = [2x_1, 2x_2]ᵀ; the solution is x* = [2, 2]ᵀ, where ∇f(x*) = [4, 4]ᵀ.
13 / 40
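This instance illustrates the minimum principle from the previous slide (a short check, assuming the constraints as reconstructed above): for every feasible x, ∇f(x*)ᵀ(x − x*) = 4(x_1 − 2) + 4(x_2 − 2) = 4(x_1 + x_2 − 4) ≥ 0, because x_1 + x_2 ≥ 4 on the feasible region.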

Linear Programming Problem
An optimization problem in which the objective function and all constraints are linear functions is called a linear programming problem (LP):
  min pᵀx
  s.t. Ax ≤ b
       Cx = d
       L ≤ x ≤ U
14 / 40

Linear Programming Solver in MATLAB
X=LINPROG(f,A,b) attempts to solve the linear programming problem:
  min_x f'*x subject to: A*x <= b
X=LINPROG(f,A,b,Aeq,beq) solves the problem above while additionally satisfying the equality constraints Aeq*x = beq.
X=LINPROG(f,A,b,Aeq,beq,LB,UB) defines a set of lower and upper bounds on the design variables, X, so that the solution is in the range LB <= X <= UB. Use empty matrices for LB and UB if no bounds exist. Set LB(i) = -Inf if X(i) is unbounded below; set UB(i) = Inf if X(i) is unbounded above.
15 / 40

Linear Programming Solver in MATLAB X=LINPROG(f,A,b,Aeq,beq,LB,UB,X0) sets the starting point to X0. This option is only available with the active-set algorithm. The default interior point algorithm will ignore any non-empty starting point. You can type help linprog in MATLAB to get more information! 16 / 40
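A quick illustration of the call (a made-up toy problem, not from the slides):

% Toy LP: min -x1 - 2*x2  s.t.  x1 + x2 <= 4,  x1 <= 3,  x1, x2 >= 0
f  = [-1; -2];
A  = [1 1; 1 0];
b  = [4; 3];
lb = zeros(2, 1);
x  = linprog(f, A, b, [], [], lb, [])   % expected solution: x = [0; 4]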

L_1-Approximation: min_{x∈R^n} ‖Ax − b‖_1, where ‖z‖_1 = Σ_{i=1}^m |z_i|
  min_{x,s} 1ᵀs                    min_{x,s} Σ_{i=1}^m s_i
  s.t. −s ≤ Ax − b ≤ s       or    s.t. −s_i ≤ A_i x − b_i ≤ s_i
Equivalently,
  min_{x,s} 0ᵀx + 1ᵀs
  s.t. [ A  −I ] [x]    [ b ]
       [−A  −I ] [s] ≤  [−b]
where the constraint matrix is 2m × (n + m).
17 / 40
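A sketch of how this reformulation could be handed to linprog (A and b are assumed given; not part of the slides):

% L1 fit: min ||A*x - b||_1 via the LP above, with variables [x; s].
[m, n] = size(A);
f   = [zeros(n, 1); ones(m, 1)];        % objective: sum of the s_i
Alp = [ A, -eye(m);                     %  A*x - s <=  b
       -A, -eye(m)];                    % -A*x - s <= -b
blp = [b; -b];
xs  = linprog(f, Alp, blp);
x   = xs(1:n);                          % the L1 solution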

Chebyshev Approximation: min_{x∈R^n} ‖Ax − b‖_∞, where ‖z‖_∞ = max_{1≤i≤m} |z_i|
  min_{x,γ} γ
  s.t. −1γ ≤ Ax − b ≤ 1γ
Equivalently,
  min_{x,γ} 0ᵀx + γ
  s.t. [ A  −1 ] [x]    [ b ]
       [−A  −1 ] [γ] ≤  [−b]
where 1 is the all-ones vector and the constraint matrix is 2m × (n + 1).
18 / 40
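The same pattern works here (again a sketch with A and b assumed given):

% Chebyshev fit: min ||A*x - b||_inf, with variables [x; gamma].
[m, n] = size(A);
f   = [zeros(n, 1); 1];
Alp = [ A, -ones(m, 1);                 %  A*x - gamma <=  b
       -A, -ones(m, 1)];                % -A*x - gamma <= -b
blp = [b; -b];
xg  = linprog(f, Alp, blp);
x   = xg(1:n);                          % the minimax solution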

Quadratic Programming Problem
If the objective function is convex quadratic while the constraints are all linear then the problem is called a convex quadratic programming problem (QP):
  min (1/2)xᵀQx + pᵀx
  s.t. Ax ≤ b
       Cx = d
       L ≤ x ≤ U
19 / 40

Quadratic Programming Solver in MATLAB
X=QUADPROG(H,f,A,b) attempts to solve the quadratic programming problem:
  min_x 0.5*x'*H*x + f'*x subject to: A*x <= b
X=QUADPROG(H,f,A,b,Aeq,beq) solves the problem above while additionally satisfying the equality constraints Aeq*x=beq.
X=QUADPROG(H,f,A,b,Aeq,beq,LB,UB) defines a set of lower and upper bounds on the design variables, X, so that the solution is in the range LB <= X <= UB. Use empty matrices for LB and UB if no bounds exist. Set LB(i) = -Inf if X(i) is unbounded below; set UB(i) = Inf if X(i) is unbounded above.
20 / 40

Quadratic Programming Solver in MATLAB X=QUADPROG(H,f,A,b,Aeq,beq,LB,UB,X0) sets the starting point to X0. You can type help quadprog in MATLAB to get more information! 21 / 40
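For instance, Example II from earlier can be passed to quadprog directly (a sketch, assuming the constraints as reconstructed on slide 7):

% Example II: min x1^2 + x2^2  s.t.  x1 + x2 >= 4,  x1 - x2 <= 2,  x >= 0
H  = 2*eye(2);
f  = zeros(2, 1);
A  = [-1 -1;          % -(x1 + x2) <= -4
       1 -1];         %   x1 - x2  <=  2
b  = [-4; 2];
lb = zeros(2, 1);
x  = quadprog(H, f, A, b, [], [], lb, [])   % expected: x = [2; 2]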

Standard Support Vector Machine
  min_{w,b,ξ_A,ξ_B} C(1ᵀξ_A + 1ᵀξ_B) + (1/2)‖w‖_2^2
  s.t. (Aw + 1b) + ξ_A ≥ 1
       −(Bw + 1b) + ξ_B ≥ 1
       ξ_A ≥ 0, ξ_B ≥ 0
22 / 40
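This is itself a convex QP, so it can be assembled for quadprog; a sketch (not from the slides), assuming A holds the positive-class points (mA x n), B the negative-class points (mB x n), and C > 0 is given:

% Standard SVM as a QP in the stacked variable z = [w; b; xiA; xiB].
[mA, n] = size(A);  mB = size(B, 1);
H = blkdiag(eye(n), zeros(1 + mA + mB));             % quadratic term: 0.5*||w||^2
f = [zeros(n + 1, 1); C*ones(mA + mB, 1)];           % linear term: C*(sum xiA + sum xiB)
Aineq = [-A, -ones(mA, 1), -eye(mA), zeros(mA, mB);  % -(A*w + b) - xiA <= -1
          B,  ones(mB, 1), zeros(mB, mA), -eye(mB)]; %  (B*w + b) - xiB <= -1
bineq = -ones(mA + mB, 1);
lb = [-inf(n + 1, 1); zeros(mA + mB, 1)];            % slack variables are nonnegative
z  = quadprog(H, f, Aineq, bineq, [], [], lb, []);
w  = z(1:n);  b = z(n + 1);                          % separating hyperplane w'*x + b = 0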

Farkas' Lemma
For any matrix A ∈ R^{m×n} and any vector b ∈ R^n, either
  Ax ≤ 0, bᵀx > 0 has a solution
or
  Aᵀα = b, α ≥ 0 has a solution,
but never both.
23 / 40

Farkas' Lemma
Ax ≤ 0, bᵀx > 0 has a solution ⟺ b is NOT in the cone generated by A_1 and A_2.
(Figure: the solution area {x | bᵀx > 0} ∩ {x | Ax ≤ 0} is nonempty.)
24 / 40

Farkas' Lemma
Aᵀα = b, α ≥ 0 has a solution ⟺ b is in the cone generated by A_1 and A_2.
(Figure: in this case {x | bᵀx > 0} ∩ {x | Ax ≤ 0} = ∅.)
25 / 40

Minimization Problem vs. Kuhn-Tucker Stationary-point Problem
MP:
  min_{x∈Ω} f(x)
  s.t. g(x) ≤ 0
KTSP: find x̄ ∈ Ω, ᾱ ∈ R^k such that
  ∇f(x̄) + ᾱᵀ∇g(x̄) = 0
  ᾱᵀg(x̄) = 0
  g(x̄) ≤ 0
  ᾱ ≥ 0
26 / 40
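As a concrete instance (a short check, not on the slides, using Example II with g_1(x) = 4 − x_1 − x_2, g_2(x) = x_1 − x_2 − 2, g_3(x) = −x_1, g_4(x) = −x_2): at x̄ = (2, 2) only g_1 is active, and ᾱ = (4, 0, 0, 0) gives ∇f(x̄) + ᾱᵀ∇g(x̄) = (4, 4) + 4·(−1, −1) = 0, with ᾱᵀg(x̄) = 0, g(x̄) ≤ 0 and ᾱ ≥ 0, so (x̄, ᾱ) solves the KTSP.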

Lagrangian Function
Let L(x, α) = f(x) + αᵀg(x) with α ≥ 0.
If f(x) and g(x) are convex, then L(x, α) is convex.
For a fixed α ≥ 0, if x̄ ∈ arg min{L(x, α) | x ∈ R^n}, then
  ∇_x L(x, α)|_{x=x̄} = ∇f(x̄) + αᵀ∇g(x̄) = 0.
The above result is a sufficient condition if L(x, α) is convex.
27 / 40

KTSP with Equality Constraints? (Assume the h(x) = 0 are linear functions.)
  h(x) = 0 ⟺ h(x) ≤ 0 and −h(x) ≤ 0
KTSP: find x̄ ∈ Ω, ᾱ ∈ R^k, β̄_+, β̄_− ∈ R^m such that
  ∇f(x̄) + ᾱᵀ∇g(x̄) + (β̄_+ − β̄_−)ᵀ∇h(x̄) = 0
  ᾱᵀg(x̄) = 0, (β̄_+)ᵀh(x̄) = 0, (β̄_−)ᵀ(−h(x̄)) = 0
  g(x̄) ≤ 0, h(x̄) = 0
  ᾱ ≥ 0, β̄_+, β̄_− ≥ 0
28 / 40

KTSP with Equality Constraints
KTSP: find x̄ ∈ Ω, ᾱ ∈ R^k, β̄ ∈ R^m such that
  ∇f(x̄) + ᾱᵀ∇g(x̄) + β̄ᵀ∇h(x̄) = 0
  ᾱᵀg(x̄) = 0, g(x̄) ≤ 0, h(x̄) = 0
  ᾱ ≥ 0
Let β̄ = β̄_+ − β̄_− with β̄_+, β̄_− ≥ 0; then β̄ is a free variable.
29 / 40

Generalized Lagrangian Function
Let L(x, α, β) = f(x) + αᵀg(x) + βᵀh(x) with α ≥ 0.
If f(x) and g(x) are convex and h(x) is linear, then L(x, α, β) is convex.
For fixed α ≥ 0, if x̄ ∈ arg min{L(x, α, β) | x ∈ R^n}, then
  ∇_x L(x, α, β)|_{x=x̄} = ∇f(x̄) + αᵀ∇g(x̄) + βᵀ∇h(x̄) = 0.
The above result is a sufficient condition if L(x, α, β) is convex.
30 / 40

Lagrangian Dual Problem
  max_{α,β} min_{x∈Ω} L(x, α, β)
  s.t. α ≥ 0
31 / 40

Lagrangian Dual Problem
  max_{α,β} min_{x∈Ω} L(x, α, β)          max_{α,β} θ(α, β)
  s.t. α ≥ 0                     ⟺        s.t. α ≥ 0
where θ(α, β) = inf_{x∈Ω} L(x, α, β).
32 / 40
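A one-dimensional illustration (not on the slides): for min x^2 s.t. 1 − x ≤ 0, the Lagrangian is L(x, α) = x^2 + α(1 − x), so θ(α) = inf_x L(x, α) = α − α^2/4 (attained at x = α/2). Maximizing θ over α ≥ 0 gives α* = 2 with θ(α*) = 1, which equals the primal optimal value f(x*) = 1 at x* = 1.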

Weak Duality Theorem
Let x̄ ∈ Ω be a feasible solution of the primal problem and (α, β) a feasible solution of the dual problem. Then f(x̄) ≥ θ(α, β), since
  θ(α, β) = inf_{x∈Ω} L(x, α, β) ≤ L(x̄, α, β).
Corollary:
  sup{θ(α, β) | α ≥ 0} ≤ inf{f(x) | g(x) ≤ 0, h(x) = 0}
33 / 40

Weak Duality Theorem: Corollary
If f(x*) = θ(α*, β*), where α* ≥ 0, g(x*) ≤ 0 and h(x*) = 0, then x* and (α*, β*) solve the primal and dual problems, respectively. In this case,
  0 ≤ α* ⊥ g(x*) ≤ 0,
i.e., α_i* g_i(x*) = 0 for each i (complementary slackness).
34 / 40

Saddle Point of the Lagrangian
Let x* ∈ Ω, α* ≥ 0, β* ∈ R^m satisfy
  L(x*, α, β) ≤ L(x*, α*, β*) ≤ L(x, α*, β*), ∀x ∈ Ω, α ≥ 0.
Then (x*, α*, β*) is called a saddle point of the Lagrangian function.
35 / 40

Saddle Point of f(x, y) = x^2 − y^2
(Figure: the surface z = x^2 − y^2, whose saddle point is at the origin.)
36 / 40

Dual Problem of a Linear Program
Primal LP:
  min_{x∈R^n} pᵀx
  subject to Ax ≥ b, x ≥ 0
Dual LP:
  max_{α∈R^m} bᵀα
  subject to Aᵀα ≤ p, α ≥ 0
All duality theorems hold and work perfectly!
37 / 40
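A tiny numeric check with linprog (illustrative data, not from the slides), converting the ">=" constraints to the "<=" form linprog expects:

% Primal: min p'*x  s.t.  A*x >= b, x >= 0;  Dual: max b'*alpha  s.t.  A'*alpha <= p, alpha >= 0
p = [2; 3];  A = [1 1; 1 2];  b = [3; 4];
x     = linprog( p, -A, -b, [], [], zeros(2, 1), []);   % primal solution
alpha = linprog(-b,  A',  p, [], [], zeros(2, 1), []);  % dual solution (max via min of -b'*alpha)
[p'*x, b'*alpha]                                        % both values equal 7 here: strong duality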

Lagrangian Function of the Primal LP
  L(x, α) = pᵀx + α_1ᵀ(b − Ax) + α_2ᵀ(−x)
  max_{α_1,α_2≥0} min_{x∈R^n} L(x, α_1, α_2)
  ⟺ max_{α_1,α_2≥0} pᵀx + α_1ᵀ(b − Ax) + α_2ᵀ(−x)
     subject to p − Aᵀα_1 − α_2 = 0  (i.e., ∇_x L(x, α_1, α_2) = 0)
Substituting p = Aᵀα_1 + α_2 back into L(x, α) leaves bᵀα_1, which recovers the dual LP on the previous slide.
38 / 40

Application of LP Duality: the LSQ Normal Equations Always Have a Solution
For any matrix A ∈ R^{m×n} and any vector b ∈ R^m, consider
  min_{x∈R^n} ‖Ax − b‖_2^2
  x* ∈ arg min{‖Ax − b‖_2^2} ⟺ AᵀAx* = Aᵀb
Claim: AᵀAx = Aᵀb always has a solution.
39 / 40
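A quick numerical illustration of the claim (toy data, not from the slides): even when AᵀA is singular, the normal equations remain consistent.

% A has linearly dependent columns, so A'*A is singular, yet A'*A*x = A'*b is solvable.
A = [1 2; 2 4; 3 6];          % second column = 2 * first column
b = [1; 0; 2];
x = pinv(A'*A) * (A'*b);      % one particular solution of the normal equations
norm(A'*A*x - A'*b)           % essentially zero: the system is consistent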

Dual Problem of a Strictly Convex Quadratic Program
Primal QP:
  min_{x∈R^n} (1/2)xᵀQx + pᵀx
  s.t. Ax ≤ b
With the strict convexity assumption (Q positive definite), we have the
Dual QP:
  max −(1/2)(p + Aᵀα)ᵀQ⁻¹(Aᵀα + p) − bᵀα
  s.t. α ≥ 0
40 / 40
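To see the two problems line up numerically, the dual can itself be written as a small QP in α and both solved with quadprog (an illustrative sketch with made-up data, not from the slides):

% Primal: min 0.5*x'*Q*x + p'*x  s.t.  A*x <= b
Q = [2 0; 0 2];  p = [-2; -5];  A = [1 1; -1 2];  b = [3; 4];
xstar = quadprog(Q, p, A, b);
primal_val = 0.5*xstar'*Q*xstar + p'*xstar;
% Dual: max over alpha >= 0 of -0.5*(p + A'*alpha)'*inv(Q)*(p + A'*alpha) - b'*alpha,
% i.e. min 0.5*alpha'*(A*inv(Q)*A')*alpha + (A*inv(Q)*p + b)'*alpha  (plus a constant)
H = A*(Q\A');  f = A*(Q\p) + b;
alpha = quadprog(H, f, [], [], [], [], zeros(2, 1), []);
dual_val = -0.5*(p + A'*alpha)'*(Q\(p + A'*alpha)) - b'*alpha;
[primal_val, dual_val]        % the two values coincide: strong duality for this convex QP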