Introduction to Optimization Techniques. Nonlinear Programming

Optimal Solutions

Consider the optimization problem min_{x ∈ F} f(x), where F ⊆ R^n.

Definition: x* ∈ F is optimal (a global minimum) for this problem if f(x*) ≤ f(x) for all x ∈ F.

Definition: x* ∈ F is a local minimum if there is an ε > 0 so that f(x*) ≤ f(x) for all x ∈ F ∩ N(x*, ε), where N(x*, ε) = { x ∈ R^n : ||x - x*|| < ε } and ||·|| is a norm on R^n.

Lagrange Multiplier Method

Now consider the optimization problem

    min_{x ∈ X} f(x) subject to g(x) ≤ 0    (GP)

where X ⊆ R^n, f : X → R, g : X → R^m, and we take the feasible region to be F = { x ∈ X : g(x) ≤ 0 }.

Definition: The Lagrangian for (GP) is the function

    L(x, λ) = f(x) + λᵀ g(x),

where λᵀ g(x) is the inner product of the vector λ with the vector g(x); that is, L(x, λ) = f(x) + Σ_{i=1}^{m} λ_i g_i(x).
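To make the definition concrete, here is a minimal Python sketch (the toy f and g below are our own illustrative choices, not from the slides) that evaluates L(x, λ) = f(x) + λᵀ g(x):

```python
import numpy as np

def lagrangian(f, g, x, lam):
    """L(x, lam) = f(x) + lam^T g(x), where g returns the m-vector of constraint values."""
    return f(x) + np.dot(lam, g(x))

# Toy instance: f(x) = x1^2 + x2^2 with the single constraint 1 - x1 - x2 <= 0.
f = lambda x: x[0]**2 + x[1]**2
g = lambda x: np.array([1.0 - x[0] - x[1]])
print(lagrangian(f, g, np.array([0.5, 0.5]), np.array([1.0])))   # 0.5
```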

Theorem 1 (Lagrange): Let λ* ∈ R_+^m (i.e., λ* is a nonnegative m-vector) and let x* ∈ X solve the relatively unconstrained problem

    min_{x ∈ X} L(x, λ*).

If x* ∈ F and λ*ᵀ g(x*) = 0, then x* is optimal for (GP).

Proof: By assumption,

    f(x*) + λ*ᵀ g(x*) ≤ f(x) + λ*ᵀ g(x) for all x ∈ X.

But λ*ᵀ g(x*) = 0 then implies

    f(x*) ≤ f(x) + λ*ᵀ g(x) for all x ∈ X.

But F ⊆ X and λ*ᵀ g(x) ≤ 0 for every x ∈ F (since λ* ≥ 0 and g(x) ≤ 0); therefore, for all x ∈ F,

    f(x*) ≤ f(x) + λ*ᵀ g(x) ≤ f(x),

so x* is optimal for (GP).

Note: The theorem assumes an appropriate λ* exists and that we know its value.
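When an appropriate λ* is known, the hypotheses of Theorem 1 are easy to verify numerically. In the sketch below, the convex problem min x² subject to 1 - x ≤ 0 (over X = R) and the multiplier λ* = 2 are our own illustrative choices, not taken from the slides:

```python
from scipy.optimize import minimize_scalar

f = lambda x: x**2
g = lambda x: 1.0 - x
lam_star = 2.0

x_star = minimize_scalar(lambda x: f(x) + lam_star * g(x)).x   # x* solves min_x L(x, lam*)
print(round(x_star, 6))                    # 1.0
print(g(x_star) <= 1e-6)                   # x* is feasible, i.e. x* in F
print(abs(lam_star * g(x_star)) <= 1e-6)   # complementary slackness: lam* g(x*) = 0
# Theorem 1 then guarantees that x* = 1 is optimal for min x^2 s.t. 1 - x <= 0.
```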

Ex 1 (λ* may not exist): Consider

    min x subject to x² ≤ 0, x ∈ X = R.

Then F = {0} and therefore x* = 0 is optimal. Let λ = 0; then min_{x ∈ X} L(x, 0) = min_{x ∈ R} x = -∞, so x* = 0 is not optimal for min_{x ∈ X} L(x, 0). If λ > 0, then min_{x ∈ X} L(x, λ) = min_{x ∈ R} (x + λx²), whose minimizer is x(λ) = -1/(2λ) ≠ 0; thus x(λ) is optimal for min_{x ∈ X} L(x, λ) and x* = 0 is not.
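A quick numerical confirmation of this example (the solver call is our own device, not part of the slides): for every λ > 0 the unconstrained minimizer of L(·, λ) is -1/(2λ), never the optimal point x* = 0.

```python
from scipy.optimize import minimize_scalar

for lam in [0.5, 1.0, 10.0, 100.0]:
    x_min = minimize_scalar(lambda x, lam=lam: x + lam * x**2).x
    print(lam, x_min, -1.0 / (2.0 * lam))   # the minimizer matches -1/(2*lam), never 0
```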

Saddle Points for Optimality

Ex 2: Show there is no such λ* (for Theorem 1) for the problem

    min x subject to 1 - x² ≤ 0, x ∈ X = R_+.

Definition: For problem GP, (x*, λ*) is said to be a saddle-point of the Lagrangian if x* ∈ X, λ* ≥ 0, and

    L(x*, λ) ≤ L(x*, λ*) ≤ L(x, λ*) for all λ ∈ R_+^m and for all x ∈ X.

Theorem 2: (x*, λ*) ∈ X × R_+^m is a saddle-point of the Lagrangian of GP if and only if

(i) x* solves min_{x ∈ X} L(x, λ*),
(ii) x* ∈ X, g(x*) ≤ 0 (i.e., x* ∈ F),
(iii) λ*ᵀ g(x*) = 0 (complementary slackness).

Proof: Assume (x*, λ*) ∈ X × R_+^m is a saddle-point. Then L(x*, λ*) ≤ L(x, λ*) for all x ∈ X is precisely statement (i): x* solves min_{x ∈ X} L(x, λ*).

Also, since L(x*, λ) ≤ L(x*, λ*) for all λ ∈ R_+^m, we have

    f(x*) + λᵀ g(x*) ≤ f(x*) + λ*ᵀ g(x*) for all λ ∈ R_+^m,

that is,

    (λ - λ*)ᵀ g(x*) ≤ 0 for all λ ∈ R_+^m.    (1)

Assume g(x*) ≤ 0 fails. Without loss of generality (wlog) we may assume g_1(x*) > 0. Let λ_1 = λ*_1 + 1 and λ_i = λ*_i for i ≥ 2; then λ ≥ 0 and

    (λ - λ*)ᵀ g(x*) = (λ_1 - λ*_1) g_1(x*) = g_1(x*) > 0,

and this contradicts (1). Therefore, (ii) holds: x* ∈ X, g(x*) ≤ 0.

Now, (1) implies (by taking λ = 0) that -λ*ᵀ g(x*) ≤ 0, i.e., λ*ᵀ g(x*) ≥ 0. But we've just shown that g(x*) ≤ 0 and we've assumed λ* ≥ 0; therefore λ*ᵀ g(x*) ≤ 0 and, hence, (iii) holds: λ*ᵀ g(x*) = 0.

Conversely, assume (x*, λ*) ∈ X × R_+^m satisfies conditions (i)-(iii). Condition (i) is precisely the statement L(x*, λ*) ≤ L(x, λ*) for all x ∈ X, and we now need to show L(x*, λ) ≤ L(x*, λ*) for all λ ∈ R_+^m.

For λ ∈ R_+^m we have λᵀ g(x*) ≤ 0 = λ*ᵀ g(x*), and therefore

    f(x*) + λᵀ g(x*) ≤ f(x*) = f(x*) + λ*ᵀ g(x*),

or L(x*, λ) ≤ L(x*, λ*) for all λ ∈ R_+^m, which completes the proof.

Corollary: If problem GP has a saddle-point (x*, λ*), then x* is optimal for GP.

Proof: If (x*, λ*) is a saddle-point, then conditions (i)-(iii) hold and therefore Theorem 1 applies.

Note: Examples 1 and 2 show that not all problems have saddle-points (even though the problem may have an optimal solution).
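Conditions (i)-(iii) are easy to test numerically for a candidate pair. The helper below is our own sketch (not from the slides); sampling X only gives evidence for condition (i), not a proof. It is applied to the same convex toy problem used after Theorem 1, for which (x*, λ*) = (1, 2) is a saddle-point.

```python
import numpy as np

def looks_like_saddle_point(f, g, x_star, lam_star, x_samples, tol=1e-8):
    L = lambda x, lam: f(x) + np.dot(lam, g(x))
    feasible = np.all(g(x_star) <= tol)                              # (ii)  g(x*) <= 0
    comp_slack = abs(np.dot(lam_star, g(x_star))) <= tol             # (iii) lam*^T g(x*) = 0
    minimizes = all(L(x_star, lam_star) <= L(x, lam_star) + tol      # (i)   x* minimizes L(., lam*)
                    for x in x_samples)
    return feasible and comp_slack and minimizes

f = lambda x: x**2
g = lambda x: np.array([1.0 - x])
print(looks_like_saddle_point(f, g, 1.0, np.array([2.0]), np.linspace(-5, 5, 1001)))  # True
```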

HW 1: Let (x*, λ*) be a saddle-point for GP and let x' ≠ x* be optimal for GP. Show whether or not (x', λ*) is a saddle-point.

HW 1a: Let (x', λ') and (x'', λ'') be two saddle-points for GP. Show whether or not (x', λ'') is a saddle-point.

HW 2: Using the definition of the Lagrangian for GP, derive the Lagrangians for

(a) max_{x ∈ X} f(x) subject to g(x) ≤ 0,
(b) max_{x ∈ X} f(x) subject to g(x) ≥ 0,
(c) min_{x ∈ X} f(x) subject to g(x) ≥ 0.

HW 3: Using the definition of the Lagrangian for GP, derive the Lagrangian for

    min_{x ∈ X} f(x) subject to h(x) = 0, r(x) ≤ 0,

and show that the multipliers for the vector function h(x) are unrestricted in sign. (Hint: First write h(x) = 0 as h(x) ≤ 0, -h(x) ≤ 0.)

HW 4 (a useful lower bound): For problem (GP) show that, for all λ ∈ R_+^m,

    min_{x ∈ X} L(x, λ) ≤ min_{x ∈ F} f(x).

(Hint: recall that, for x ∈ F, λᵀ g(x) ≤ 0.)

Dual Problem

For problem GP,

    min_{x ∈ X} f(x) subject to g(x) ≤ 0,    (GP)

we define the dual problem, denoted by (D), to be

    max_{λ ≥ 0} L*(λ)    (D)

where L*(λ) = min_{x ∈ X} L(x, λ) (L* is called the dual function).

Note: The above HW shows that for all λ ∈ R_+^m (i.e., for all λ ≥ 0) we have L*(λ) ≤ min_{x ∈ F} f(x), and hence

    max_{λ ≥ 0} L*(λ) ≤ min_{x ∈ F} f(x).

This is called the Weak Duality Theorem.
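The Weak Duality Theorem is easy to see numerically. The rough sketch below (the grid approximation of X and the test problem are our own choices) evaluates the dual function on a grid of multipliers and compares its maximum with the primal optimum for min x² subject to 1 - x ≤ 0:

```python
import numpy as np

xs = np.linspace(-5.0, 5.0, 2001)             # a grid standing in for X = R
fvals = xs**2                                  # f(x) = x^2
gvals = 1.0 - xs                               # g(x) = 1 - x

primal_opt = fvals[gvals <= 0].min()           # min of f over the feasible grid points
dual_vals = [np.min(fvals + lam * gvals) for lam in np.linspace(0.0, 5.0, 51)]
print(max(dual_vals), "<=", primal_opt)        # max L*(lam) <= min_{x in F} f(x); here both are ~1
```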

Ex 3: Consider the linear program

    min cᵀx subject to Ax ≥ b, x ≥ 0.

Let X = { x ∈ R^n : x ≥ 0 } and take the Lagrangian to be

    L(x, λ) = cᵀx - λᵀ(Ax - b) = (cᵀ - λᵀA)x + λᵀb,

so that

    L*(λ) = min_{x ≥ 0} [ (cᵀ - λᵀA)x + λᵀb ].

Notational Digression: Let A be an m × n matrix. We let a_i denote the i-th row of A and we let a^j denote the j-th column of A; that is, A may be viewed either as its rows a_1, a_2, ..., a_m stacked on top of one another or as its columns a^1, a^2, ..., a^n placed side by side. By Ax is meant

    Ax = Σ_{j=1}^{n} a^j x_j

(where a number times a vector is the number times each component of the vector).

We also have Ax = (a_1 x, a_2 x, ..., a_m x)ᵀ; that is, the i-th component of Ax is a_i x. By yᵀA is meant yᵀA = Σ_i y_i a_i, i.e., yᵀA = y_1 a_1 + y_2 a_2 + ... + y_m a_m.

Now, for the linear program we have

    L*(λ) = min_{x ≥ 0} [ (cᵀ - λᵀA)x + λᵀb ]
          = min_{x ≥ 0} [ Σ_j (c_j - λᵀa^j) x_j + Σ_i λ_i b_i ]
          = Σ_j min_{x_j ≥ 0} (c_j - λᵀa^j) x_j + Σ_i λ_i b_i,

since the minimization separates over the components x_j.
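A quick numpy check of the two expansions above (the matrix and vectors are arbitrary test data):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])               # m = 2 rows, n = 3 columns
x = np.array([1.0, -1.0, 2.0])
y = np.array([3.0, -2.0])

print(np.allclose(A @ x, sum(x[j] * A[:, j] for j in range(3))))   # Ax = sum_j x_j a^j
print(np.allclose(y @ A, sum(y[i] * A[i, :] for i in range(2))))   # y^T A = sum_i y_i a_i
```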

Now, if cᵀ - λᵀA ≥ 0 fails, there exists at least one index k so that c_k - λᵀa^k < 0 and, therefore,

    min_{x_k ≥ 0} (c_k - λᵀa^k) x_k = -∞;

hence, if cᵀ - λᵀA ≥ 0 fails, we have L*(λ) = -∞. On the other hand, if cᵀ - λᵀA ≥ 0, then each c_k - λᵀa^k ≥ 0 and, hence,

    min_{x_k ≥ 0} (c_k - λᵀa^k) x_k = 0.

Therefore, if cᵀ - λᵀA ≥ 0, we have L*(λ) = λᵀb.

Hence, max_{λ ≥ 0} L*(λ) may be rewritten as

    max λᵀb subject to cᵀ - λᵀA ≥ 0, λ ≥ 0,

or

    max bᵀλ subject to Aᵀλ ≤ c, λ ≥ 0,

and this, of course, is the usual linear programming dual of

    min cᵀx subject to Ax ≥ b, x ≥ 0.
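As a sanity check on this derivation, the sketch below (with arbitrarily chosen data c, A, b, our own choices) solves the primal LP and the dual just derived and compares optimal values; for LPs the two values coincide (strong LP duality):

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([2.0, 3.0])
A = np.array([[1.0, 1.0],
              [1.0, 2.0]])
b = np.array([3.0, 4.0])

# Primal: min c^T x  s.t.  Ax >= b, x >= 0   (written as -Ax <= -b for linprog)
primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None)] * 2)
# Dual:   max b^T lam  s.t.  A^T lam <= c, lam >= 0   (solved as min of -b^T lam)
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(0, None)] * 2)
print(primal.fun, -dual.fun)    # both 7.0 for this data: the optimal values coincide
```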

HW 5: For the linear program

    min cᵀx subject to Ax ≥ b, x ≥ 0,

take X = R^n and let L(x, λ, γ) = cᵀx - λᵀ(Ax - b) - γᵀx. Following the line of argument above, develop the dual problem

    max_{λ ≥ 0, γ ≥ 0} L*(λ, γ).

We now characterize saddle-points in terms of duality.

Optimality via Duality

Theorem 3: For problem (GP) the pair (x*, λ*) ∈ X × R_+^m is a saddle-point if and only if (a) x* solves GP, (b) λ* solves D, and (c) f(x*) = L*(λ*).

Proof: Assume (x*, λ*) is a saddle-point. By Theorems 1 and 2 we automatically have (a): x* solves GP. Also,

    L*(λ*) = min_{x ∈ X} L(x, λ*) = min_{x ∈ X} [ f(x) + λ*ᵀ g(x) ] = f(x*) + λ*ᵀ g(x*) = f(x*),

where the last two equalities follow from conditions (i) and (iii) of Theorem 2. Thus L*(λ*) = f(x*) and condition (c) holds. By weak duality, we have L*(λ) ≤ f(x*) for all λ ∈ R_+^m, and hence L*(λ*) = f(x*) implies λ* solves max_{λ ≥ 0} L*(λ), so (b) holds.

Conversely, assume conditions (a), (b), (c) hold (and, of course, λ* ≥ 0). We show (x*, λ*) is a saddle-point. Condition (c) states

    f(x*) = L*(λ*) = min_{x ∈ X} L(x, λ*) ≤ L(x*, λ*) = f(x*) + λ*ᵀ g(x*).    (2)

But condition (a) implies x* ∈ F and, hence, λ*ᵀ g(x*) ≤ 0, so it must also be the case that

    f(x*) + λ*ᵀ g(x*) ≤ f(x*).

Therefore we must have f(x*) + λ*ᵀ g(x*) = f(x*), i.e., λ*ᵀ g(x*) = 0, and condition (iii) of Theorem 2 holds. Also, x* ∈ F implies condition (ii) of Theorem 2 holds. Hence, it only remains to show that condition (i) holds, i.e., that x* solves min_{x ∈ X} L(x, λ*). But this follows immediately from (2), since λ*ᵀ g(x*) = 0 implies (using (2)) that min_{x ∈ X} L(x, λ*) = L(x*, λ*), and condition (i) of Theorem 2 holds.

Ex 1 (revisited): min x subject to x² ≤ 0, x ∈ X = R. Then L(x, λ) = x + λx² and L*(λ) = min_{x ∈ R} (x + λx²). For λ = 0, L*(0) = min_{x ∈ R} x = -∞. For λ > 0, the minimizer is x(λ) = -1/(2λ), so L*(λ) = -1/(2λ) + λ/(4λ²) = -1/(4λ). Hence

    L*(λ) = -∞, if λ = 0;    L*(λ) = -1/(4λ), if λ > 0.

While sup_{λ > 0} L*(λ) = 0, we see that max_{λ ≥ 0} L*(λ) has no optimal solution; that is, there is no λ ≥ 0 so that L*(λ) = 0.

Ex 2 (revisited):

    min x subject to 1 - x² ≤ 0, x ∈ X = R_+.    (P)

For λ = 0 we see that L*(0) = min_{x ≥ 0} x = 0, and for λ > 0 we have L(x, λ) = x + λ(1 - x²) and L*(λ) = min_{x ≥ 0} [ x + λ(1 - x²) ] = -∞. Hence

    L*(λ) = 0, if λ = 0;    L*(λ) = -∞, if λ > 0.

Thus λ* = 0 solves max_{λ ≥ 0} L*(λ) and L*(0) = 0. But x* = 1 solves (P) and f(x*) = 1; therefore

    L*(λ*) = 0 ≠ 1 = f(x*), i.e., max_{λ ≥ 0} L*(λ) = 0 < 1 = min_{x ∈ F} f(x).

[NOTE: when this situation occurs we say that there is a duality gap. That is, if x* solves GP, λ* solves D, and L*(λ*) ≠ f(x*), we have a duality gap.]

Therefore, any problem with a duality gap cannot have a saddle-point.
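The gap can also be seen numerically. The sketch below (a crude grid approximation of X = R_+, our own device) evaluates L*(λ) for a few multipliers and compares with the primal optimum of Ex 2:

```python
import numpy as np

xs = np.linspace(0.0, 100.0, 200001)           # a grid standing in for X = R+
fvals = xs                                      # f(x) = x
gvals = 1.0 - xs**2                             # g(x) = 1 - x^2

print(fvals[gvals <= 0].min())                  # primal optimum: 1.0, attained at x* = 1
for lam in [0.0, 0.1, 1.0]:
    print(lam, np.min(fvals + lam * gvals))     # 0 at lam = 0, large negative for lam > 0
```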

HW 6: Consider the problem, denoted by (P_I),

    min_{x ∈ X_I} cᵀx subject to Ax ≥ b,    (P_I)

where X_I = { x ∈ R^n : x_j ≥ 0 and integer-valued, j = 1, ..., n }. [NOTE: (P_I) is a version of the so-called linear integer programming problem.] Let (P) denote the associated linear program

    min_{x ∈ X} cᵀx subject to Ax ≥ b,    (P)

where X = { x ∈ R^n : x ≥ 0 }.

Let L_I* denote the dual function for (P_I) and let L* denote the dual function for (P). Show whether or not L_I*(λ) ≥ L*(λ) for all λ ≥ 0.

HW 7: Consider the problem

    min_{x ∈ X} x subject to 2 - 2x = 0, where X = [0, 1]

(note: the constraint is an equality constraint).

(a) Derive L*(λ).
(b) Decide whether or not the problem has a saddle-point.

Karush-Kuhn-Tucker Points

Consider the optimization problem

    min_{x ∈ X} f(x) subject to g(x) ≤ 0,    (GDP)

where f is differentiable on the interior of X and each g_i, i = 1, ..., m, is also differentiable on the interior of X. We use the symbol GDP for "general differentiable problem". As before,

    L(x, λ) = f(x) + λᵀ g(x).

Definition: We say (x*, λ*) ∈ (int X) × R_+^m is a Karush-Kuhn-Tucker point (KKT point) if

(i') ∇_x L(x*, λ*) = 0,
(ii) x* ∈ X, g(x*) ≤ 0,
(iii) λ*ᵀ g(x*) = 0,

where ∇_x L(x*, λ*) = ( ∂L(x*, λ*)/∂x_1, ..., ∂L(x*, λ*)/∂x_n )ᵀ is the gradient, with respect to x, of the Lagrangian evaluated at x*.

Ex 2 (revisited): min x subject to 1 - x² ≤ 0, x ∈ R_+; then x* = 1 is optimal for (P). Consider L(x, λ) = x + λ(1 - x²). Then

    ∇_x L(x, λ) = 1 - 2λx.

Therefore, if we take λ* = 1/2 we have ∇_x L(x*, λ*) = ∇_x L(1, 1/2) = 0. Also, x* ∈ int R_+, g(x*) = 1 - (x*)² = 0, and λ*ᵀ g(x*) = (1/2)(1 - 1) = 0. Therefore we see that (x*, λ*) = (1, 1/2) is a Karush-Kuhn-Tucker point for (P) [recall: (P) has no saddle-point].
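A quick finite-difference check of the computation above (the helper and the tolerances are our own, not part of the slides):

```python
def grad_x_L(x, lam, h=1e-6):
    L = lambda t: t + lam * (1.0 - t**2)        # L(x, lam) = f(x) + lam*g(x) for Ex 2
    return (L(x + h) - L(x - h)) / (2.0 * h)    # central-difference derivative in x

x_star, lam_star = 1.0, 0.5
g = lambda x: 1.0 - x**2

print(abs(grad_x_L(x_star, lam_star)) < 1e-6)   # (i')  grad_x L(x*, lam*) = 0
print(x_star > 0 and g(x_star) <= 0)            # (ii)  x* in int R+ and g(x*) <= 0
print(abs(lam_star * g(x_star)) < 1e-12)        # (iii) complementary slackness
```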

Ex 1 (revisited): min x subject to x² ≤ 0, x ∈ R (P). Let x* = 0, the only feasible point. Now L(x, λ) = x + λx², so ∇_x L(x, λ) = 1 + 2λx and ∇_x L(0, λ) = 1 ≠ 0 for all λ. Therefore, (P) has no KKT point (as well as no S.P.).

HW 8: Consider the linear program (A is m × n)

    min_{x ∈ R^n} cᵀx subject to Ax ≥ b, x ≥ 0,    (P)

and let L(x, λ, γ) = cᵀx - λᵀ(Ax - b) - γᵀx. Show that (P) has an optimal solution x* if, and only if, there exist vectors λ* ∈ R_+^m and γ* ∈ R_+^n so that (x*, (λ*, γ*)) is a saddle-point. Is it necessarily true that (x*, (λ*, γ*)) is also a Karush-Kuhn-Tucker point? Why? [HINT: Don't be afraid to use your knowledge of linear programming.]

Economic Motivation of Duality

Assume (GP) describes our optimal production cost:

    min_{x ∈ X} f(x) subject to g(x) ≤ 0.    (GP)

We are faced with the following offer. The Dual Co. will buy us out as follows: the Dual Co. provides us with a vector of prices λ = (λ_1, λ_2, ..., λ_m)ᵀ ≥ 0. We then choose x ∈ X and the Dual Co. will pay us -λᵀ g(x) = -Σ_i λ_i g_i(x) (think of the vector g(x) as being the vector of resources used when we choose x ∈ X). On the other hand, since we will not be producing, we are to pay the Dual Co. the savings in production costs (i.e., f(x)). Therefore, the net payment to the Dual Co. is f(x) + λᵀ g(x) = L(x, λ).

Of course, when faced with λ ≥ 0 we would like to pay the Dual Co. as little as possible; i.e., we would like to pay L*(λ) = min_{x ∈ X} L(x, λ). And, of course, the Dual Co. would like to choose a vector λ so that we pay it as much as possible; that is, the Dual Co. would like to choose a λ* ≥ 0 so that L*(λ*) = max_{λ ≥ 0} L*(λ). Now, we already know that max_{λ ≥ 0} L*(λ) ≤ min_{x ∈ F} f(x) (i.e., our largest possible payment to the Dual Co. is no larger than our optimal production cost).

If max_{λ ≥ 0} L*(λ) < min_{x ∈ F} f(x) (a duality gap), the Dual Co., presumably, will not want to buy us out, since the amount of money it receives from us must be smaller than its optimal production cost after it buys us out. Therefore, a necessary condition for a rational Dual Co. to make us an offer in the first place is that max_{λ ≥ 0} L*(λ) = min_{x ∈ F} f(x). Therefore, assume this condition is met. Now assume that, when faced with λ ≥ 0, we choose an x ∈ X that violates the constraints (wlog, assume g_1(x) > 0). The Dual Co. will then argue that it did not correctly estimate λ_1 and it will offer a new price vector μ ≥ 0 where, say, μ_i = λ_i for i ≥ 2 and μ_1 = λ_1 + θ with θ > 0 (i.e., it will raise, or increase, λ_1).

Indeed,

    L(x, μ) = f(x) + (λ_1 + θ) g_1(x) + λ_2 g_2(x) + ... + λ_m g_m(x) = L(x, λ) + θ g_1(x) > L(x, λ);

i.e., our net payment to the Dual Co. will increase if the Dual Co. is allowed to change its price offer λ and we stick to our previous choice of the vector x. [Of course, since the Dual Co. has changed its λ, we'll insist on being allowed to change our x.] Therefore, a necessary condition for the Dual Co. to be satisfied with its own price offer is that g(x) ≤ 0. On the other hand, suppose g(x) ≤ 0 but λᵀ g(x) ≠ 0. Then λᵀ g(x) < 0 and, therefore, wlog, we may assume λ_1 g_1(x) < 0.

Show that the Dual Co. will then want to decrease the price λ_1. Therefore, a necessary condition for the Dual Company to be satisfied with its offer is that λᵀ g(x) = 0, g(x) ≤ 0, and max_{λ ≥ 0} L*(λ) = min_{x ∈ F} f(x). Hence, a necessary condition for us to actually be bought out is that our optimal-production optimization problem (GP) has a saddle-point.

Crude Idea of a Dual Algorithm for GP

Step 0: Set k = 1 and select λ^1 ≥ 0.

Step 1: Let x^k ∈ X solve min_{x ∈ X} L(x, λ^k). If x^k ∈ F = { x ∈ X : g(x) ≤ 0 } and (λ^k)ᵀ g(x^k) = 0, go to Step 3.

Step 2: Let I_k = { i : g_i(x^k) > 0 } and C_k = { j : λ_j^k g_j(x^k) < 0 }. (NOTE: I_k ∩ C_k = ∅ and I_k ∪ C_k ≠ ∅.) For each i ∈ I_k (if I_k ≠ ∅), let λ_i^{k+1} > λ_i^k. For each j ∈ C_k (if C_k ≠ ∅), let 0 ≤ λ_j^{k+1} < λ_j^k. Leave all other components of λ^k unchanged, set k = k + 1, and return to Step 1.

Step 3: Stop; x^k is optimal for GP ((x^k, λ^k) is a S.P.).

NOTE: The above algorithm is not well-defined, for the following reasons: (a) At Step 2 we have not specified by how much to increase the prices associated with violated constraints, nor have we specified by how much to decrease the prices associated with satisfied constraints for which complementary slackness fails. (b) Even if GP has a saddle-point, we do not yet know whether this algorithm converges to a S.P. (c) Since this algorithm seeks a saddle-point for GP, the procedure is automatically in trouble if (GP) does not have a S.P. This algorithm, of course, was motivated by the economic discussion above; we'll have more to say about these types of procedures later in the course.
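As one purely illustrative instantiation (the slides deliberately leave the price updates unspecified), the sketch below fixes a constant price step and applies the procedure to the convex test problem min x² subject to 1 - x ≤ 0; the step size, stopping tolerance, and inner solver are our own choices, and nothing here addresses issues (a)-(c):

```python
from scipy.optimize import minimize_scalar

f = lambda x: x**2
g = lambda x: 1.0 - x                         # single constraint g(x) <= 0
lam, step, tol = 0.0, 0.5, 1e-6               # Step 0: k = 1, lam^1 = 0

for k in range(1000):
    x = minimize_scalar(lambda t: f(t) + lam * g(t)).x    # Step 1: x^k solves min_x L(x, lam^k)
    if g(x) <= tol and abs(lam * g(x)) <= tol:            # x^k feasible + complementary slackness
        break                                             # Step 3: stop; (x^k, lam^k) ~ saddle point
    if g(x) > tol:
        lam += step                                       # Step 2: raise the price of the violated constraint
    else:
        lam = max(0.0, lam - step)                        # Step 2: lower the price when slackness fails
print(x, lam)                                             # converges to about (1.0, 2.0) here
```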