Lecture 18: Optimization Programming


Fall, 2016

Outline
1. Unconstrained Optimization
2. Equality-constrained Optimization, Inequality-constrained Optimization, Mixture-constrained Optimization
3. Quadratic Programming (QP), Linear Programming (LP)

Unconstrained Optimization

Consider the minimization problem min_x f(x), where x ∈ R^d and f is continuously differentiable.

Necessary condition for a local minimum: if x* is a local minimum of f(x), then ∇f(x*) = 0. A point satisfying this necessary condition is called a stationary point, which can be a local minimum, a local maximum, or a saddle point.

Sufficient conditions for x* to be a local minimum: ∇f(x*) = 0 and ∇²f(x*) is positive definite.
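As a concrete illustration (not from the slides), the following sketch checks these conditions for f(x) = x1^2 - x2^2, whose stationary point at the origin is a saddle point rather than a minimum; the gradient and Hessian are hand-coded for this toy function.

import numpy as np

# f(x) = x1^2 - x2^2: the origin is a stationary point but not a local minimum.
def grad(x):
    return np.array([2.0 * x[0], -2.0 * x[1]])

def hessian(x):
    return np.array([[2.0, 0.0], [0.0, -2.0]])

x_star = np.array([0.0, 0.0])
print(grad(x_star))                          # [0. 0.]  -> stationary point
print(np.linalg.eigvalsh(hessian(x_star)))   # [-2. 2.] -> indefinite Hessian, so a saddle point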

Constrained Optimization

Consider the minimization problem

  min_x f(x)
  subject to g_i(x) >= 0, i = 1, ..., m,
             h_j(x) = 0, j = 1, ..., k.

Assume x ∈ R^d. f : R^d → R is called the objective function; g_i : R^d → R, i = 1, ..., m, are the inequality constraints; and h_j : R^d → R, j = 1, ..., k, are the equality constraints. Assume f, the g_i's, and the h_j's are all continuously differentiable.

Single Equality Constraint

Consider the minimization problem

  min_x f(x)              (1)
  subject to h(x) = 0.    (2)

Introduce the Lagrange function

  L(x, λ) = f(x) + λ h(x).

Lagrange extreme value theory: if x* is a local minimum under the constraint, then there exists some λ* such that

  ∇_x L(x*, λ*) = 0,   ∇_λ L(x*, λ*) = 0.

Interpretations

The above conditions imply that (x*, λ*) is a stationary point of L. The above equations are equivalent to

  ∇_x f(x*) + λ* ∇_x h(x*) = 0,   h(x*) = 0.

Note that the first condition gives d equations.
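A small symbolic check of these conditions (an illustration, not part of the lecture): minimize f(x) = x1^2 + x2^2 subject to h(x) = x1 + x2 - 1 = 0, solving the stationarity equations of the Lagrangian with SymPy.

import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
f = x1**2 + x2**2
h = x1 + x2 - 1                        # equality constraint h(x) = 0
L = f + lam * h                        # Lagrange function

# Stationarity in x together with feasibility (stationarity in lam)
eqs = [sp.diff(L, x1), sp.diff(L, x2), sp.diff(L, lam)]
print(sp.solve(eqs, [x1, x2, lam]))    # {x1: 1/2, x2: 1/2, lam: -1}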

Multiple Equality Constraints

Consider the minimization problem

  min_x f(x)                                 (3)
  subject to h_j(x) = 0, j = 1, ..., k.      (4)

Introduce multiple Lagrange multipliers λ = (λ_1, ..., λ_k) and define

  L(x, λ) = f(x) + Σ_{j=1}^k λ_j h_j(x).

Lagrange extreme value theory: if x* is a local minimum under the constraints, then there exists some λ* = (λ_1*, ..., λ_k*) such that

  ∇_x L(x*, λ*) = 0,   ∇_λ L(x*, λ*) = 0.

In other words, (x*, λ*) is a stationary point of L.

Example: Entropy Maximization

Consider the minimization problem

  min_{p_1, ..., p_n} Σ_{i=1}^n p_i log(p_i),   subject to Σ_{i=1}^n p_i = 1.

The Lagrange function is

  L(p, λ) = Σ_{i=1}^n p_i log(p_i) + λ (Σ_{i=1}^n p_i - 1).

Setting ∂L/∂p_i = log(p_i) + 1 + λ = 0 gives log(p_i) = -1 - λ for i = 1, ..., n. Together with Σ_{i=1}^n p_i = 1, we have p_1 = ... = p_n = 1/n. This problem is known as maximizing the information entropy.
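The closed-form answer can be verified numerically; the sketch below (an illustration using SciPy, with n = 4 chosen arbitrarily) minimizes Σ_i p_i log(p_i) over the probability simplex and should return p_i ≈ 1/4.

import numpy as np
from scipy.optimize import minimize

n = 4
neg_entropy = lambda p: np.sum(p * np.log(p))           # objective: sum_i p_i log p_i
cons = {'type': 'eq', 'fun': lambda p: np.sum(p) - 1}   # constraint: sum_i p_i = 1
bnds = [(1e-9, 1.0)] * n                                # keep p_i > 0 so the log is defined

res = minimize(neg_entropy, x0=np.array([0.7, 0.1, 0.1, 0.1]), bounds=bnds, constraints=cons)
print(res.x)   # approximately [0.25, 0.25, 0.25, 0.25]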

Generalized Lagrange Methods

Consider the minimization problem

  min_x f(x)   subject to g_i(x) >= 0, i = 1, ..., m.

Introduce multiple non-negative Lagrange multipliers µ_1 >= 0, ..., µ_m >= 0 and define

  L(x, µ) = f(x) - Σ_{i=1}^m µ_i g_i(x).

The µ_i's are also called dual variables or slack variables; correspondingly, we call x the primal variables. We would like to minimize L with respect to x and maximize L with respect to µ.

KKT Necessary Optimality Conditions (KKT = Karush-Kuhn-Tucker)

If x* is a local minimum under the above inequality constraints, then there exists some µ* = (µ_1*, ..., µ_m*) such that

  Stationarity: ∇_x L(x*, µ*) = 0, i.e., ∇_x f(x*) - Σ_{i=1}^m µ_i* ∇_x g_i(x*) = 0.
  Primal feasibility: g_i(x*) >= 0 for i = 1, ..., m.
  Dual feasibility: µ_i* >= 0 for i = 1, ..., m.
  Complementary slackness: µ_i* g_i(x*) = 0 for i = 1, ..., m.
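As a toy illustration (not from the lecture): minimize f(x) = (x - 2)^2 subject to g(x) = x - 3 >= 0. The constraint is active at x* = 3 with µ* = 2, and the sketch below checks all four conditions there.

# Toy KKT check for min (x - 2)^2  s.t.  g(x) = x - 3 >= 0 (illustrative values).
f_grad = lambda x: 2.0 * (x - 2.0)    # gradient of the objective
g      = lambda x: x - 3.0            # inequality constraint, g(x) >= 0
g_grad = lambda x: 1.0

x_star, mu_star = 3.0, 2.0
print(f_grad(x_star) - mu_star * g_grad(x_star) == 0.0)   # stationarity
print(g(x_star) >= 0.0)                                   # primal feasibility
print(mu_star >= 0.0)                                     # dual feasibility
print(mu_star * g(x_star) == 0.0)                         # complementary slackness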

Sufficient Optimality Conditions

The above necessary conditions are also sufficient for optimality if
  f and the g_i's are continuously differentiable, and
  f is convex and the g_i's are concave (equivalently, the functions -g_i are convex).
In this case, if there exists a solution x* satisfying the KKT conditions, then x* is actually a global optimum. Examples: SVM problems.

Alternative Constraints

Consider the minimization problem

  min_x f(x)   subject to g_i(x) <= 0, i = 1, ..., m.

Introduce multiple non-negative Lagrange multipliers µ_1 >= 0, ..., µ_m >= 0 and define

  L(x, µ) = f(x) + Σ_{i=1}^m µ_i g_i(x).

Again, the µ_i's are called dual variables or slack variables and x the primal variables, and we would like to minimize L with respect to x and maximize L with respect to µ.

KKT Necessary Optimality Conditions

If x* is a local minimum under the above inequality constraints, then there exists some µ* = (µ_1*, ..., µ_m*) such that

  Stationarity: ∇_x L(x*, µ*) = 0, i.e., ∇_x f(x*) + Σ_{i=1}^m µ_i* ∇_x g_i(x*) = 0.
  Primal feasibility: g_i(x*) <= 0 for i = 1, ..., m.
  Dual feasibility: µ_i* >= 0 for i = 1, ..., m.
  Complementary slackness: µ_i* g_i(x*) = 0 for i = 1, ..., m.

Generalized Lagrange Methods

Consider the minimization problem

  min_x f(x)
  subject to g_i(x) >= 0, i = 1, ..., m,
             h_j(x) = 0, j = 1, ..., k.

Introduce two sets of Lagrange multipliers, µ_1 >= 0, ..., µ_m >= 0 and λ_1, ..., λ_k, and define the Lagrange function

  L(x, µ, λ) = f(x) - Σ_{i=1}^m µ_i g_i(x) + Σ_{j=1}^k λ_j h_j(x).

The µ_i's and λ_j's are dual variables or slack variables.

KKT Necessary Optimality Conditions

If x* is a local minimum under the above constraints, then there exists some (µ*, λ*) such that

  Stationarity: ∇_x L(x*, µ*, λ*) = 0, i.e.,
    ∇_x f(x*) - Σ_{i=1}^m µ_i* ∇_x g_i(x*) + Σ_{j=1}^k λ_j* ∇_x h_j(x*) = 0.
  Primal feasibility: g_i(x*) >= 0, i = 1, ..., m;  h_j(x*) = 0, j = 1, ..., k.
  Dual feasibility: µ_i* >= 0 for i = 1, ..., m.
  Complementary slackness: µ_i* g_i(x*) = 0, i = 1, ..., m.
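To make the mixed case concrete, here is a small numeric check (an illustration, not from the slides): minimize x1^2 + x2^2 subject to h(x) = x1 + x2 - 1 = 0 and g(x) = x1 >= 0. The minimizer is x* = (1/2, 1/2) with λ* = -1 and µ* = 0 (the inequality is inactive).

import numpy as np

x_star = np.array([0.5, 0.5])
mu_star, lam_star = 0.0, -1.0

grad_f = 2.0 * x_star             # gradient of x1^2 + x2^2 at x*
grad_g = np.array([1.0, 0.0])     # gradient of g(x) = x1
grad_h = np.array([1.0, 1.0])     # gradient of h(x) = x1 + x2 - 1

print(grad_f - mu_star * grad_g + lam_star * grad_h)      # [0. 0.] -> stationarity
print(x_star[0] >= 0, np.isclose(x_star.sum() - 1, 0))    # primal feasibility
print(mu_star >= 0, mu_star * x_star[0] == 0)             # dual feasibility and complementary slackness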

Sufficient Optimality Conditions

The above necessary conditions are also sufficient for optimality if
  f, the g_i's, and the h_j's are continuously differentiable, and
  f is convex, the g_i's are concave, and the h_j's are linear (affine) constraints.
In this case, if there exists a solution x* satisfying the KKT conditions, then x* is actually a global optimum.

Quadratic Programming (QP)

In a quadratic program (QP), the objective function is quadratic in x and the constraints are linear in x. Consider the minimization problem

  min_x f(x) = (1/2) x^T Q x + c^T x,
  subject to Ax <= b, Ex = d,

where Q is a symmetric d × d matrix, A is an m × d matrix, E is a k × d matrix, c ∈ R^d, b ∈ R^m, and d ∈ R^k. Example: SVM.
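For illustration, a QP in this form can be handed to an off-the-shelf modeling tool; the sketch below uses CVXPY (an assumed dependency; any QP solver would do) on small made-up data.

import numpy as np
import cvxpy as cp

# Made-up problem data for min (1/2) x^T Q x + c^T x  s.t.  Ax <= b, Ex = d
Q = np.array([[2.0, 0.0], [0.0, 2.0]])
c = np.array([-2.0, -5.0])
A = np.array([[1.0, 1.0]])
b = np.array([3.0])
E = np.array([[1.0, -1.0]])
d = np.array([0.0])

x = cp.Variable(2)
objective = cp.Minimize(0.5 * cp.quad_form(x, Q) + c @ x)
constraints = [A @ x <= b, E @ x == d]
cp.Problem(objective, constraints).solve()
print(x.value)   # optimal x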

Properties of QP

If Q is non-negative definite (positive semidefinite), then f(x) is convex. If Q is non-negative definite, the above QP has a global minimizer provided there exists at least one x satisfying the constraints and f(x) is bounded below on the feasible region. If Q is positive definite, the above QP has a unique global minimizer. If Q = 0, the problem becomes a linear program (LP). From optimization theory: if Q is non-negative definite, the necessary and sufficient condition for a point x to be a global minimizer is that it satisfies the KKT conditions.

Solving QP

If there are only equality constraints, the QP can be solved via a linear system. In general, a variety of methods are commonly used for solving QPs:
  Methods: ellipsoid, interior point, active set, conjugate gradient, etc.
  Software packages: Matlab, CPLEX, AMPL, GAMS, etc.
For positive definite Q, the ellipsoid method solves the problem in polynomial time. In other words, the running time is upper bounded by a polynomial in the size of the input to the algorithm, i.e., T = O(d^k) for some constant k.
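To illustrate the equality-constrained case (illustrative data, not from the slides): for min (1/2) x^T Q x + c^T x subject to Ex = d, the KKT conditions Qx + c + E^T λ = 0 and Ex = d form a single linear system, which NumPy can solve directly.

import numpy as np

# Illustrative data for min (1/2) x^T Q x + c^T x  s.t.  Ex = d
Q = np.array([[2.0, 0.0], [0.0, 2.0]])
c = np.array([-2.0, -5.0])
E = np.array([[1.0, 1.0]])
d = np.array([1.0])

k = E.shape[0]
# KKT system:  [Q  E^T] [ x ]   [-c]
#              [E   0 ] [lam] = [ d]
KKT = np.block([[Q, E.T], [E, np.zeros((k, k))]])
rhs = np.concatenate([-c, d])
sol = np.linalg.solve(KKT, rhs)
x_opt, lam_opt = sol[:2], sol[2:]
print(x_opt, lam_opt)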

Dual Problem for QP

Consider the minimization problem

  min_x f(x) = (1/2) x^T Q x,   subject to Ax <= b.

The Lagrange function is

  L(x, λ) = f(x) + λ^T (Ax - b).

We need to minimize L with respect to x and maximize L with respect to λ. The dual function is defined as a function of the dual variables only, by solving for x first with any fixed λ:

  G(λ) = inf_x L(x, λ).

Dual Problem Derivation for QP

With λ fixed, we solve the equation ∂L/∂x (x, λ) = 0 and obtain x = -Q^{-1} A^T λ. Plugging this into L, we get the dual function

  G(λ) = -(1/2) λ^T A Q^{-1} A^T λ - b^T λ.

The dual problem of the QP is

  max_λ G(λ) = -(1/2) λ^T A Q^{-1} A^T λ - b^T λ,   subject to λ >= 0.

We first find λ* = argmax_λ G(λ), and then compute x* = -Q^{-1} A^T λ*. The idea is similar to the profile method.
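A numerical sanity check of this derivation (illustrative data; SciPy's bound-constrained minimizer is applied to -G(λ)): solve the dual, recover x* = -Q^{-1} A^T λ*, and confirm it is primal feasible.

import numpy as np
from scipy.optimize import minimize

# Illustrative data for min (1/2) x^T Q x  s.t.  Ax <= b
Q = np.array([[2.0, 0.0], [0.0, 4.0]])
A = np.array([[-1.0, -1.0]])      # encodes the constraint x1 + x2 >= 2
b = np.array([-2.0])
Q_inv = np.linalg.inv(Q)

# Dual function G(lam) = -(1/2) lam^T A Q^{-1} A^T lam - b^T lam; maximize it over lam >= 0
neg_G = lambda lam: 0.5 * lam @ A @ Q_inv @ A.T @ lam + b @ lam
res = minimize(neg_G, x0=np.ones(1), bounds=[(0.0, None)])
lam_star = res.x

x_star = -Q_inv @ A.T @ lam_star            # recover the primal solution
print(lam_star, x_star, A @ x_star <= b + 1e-6)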