The Fundamental Theorem of Linear Inequalities

The Fundamental Theorem of Linear Inequalities Lecture 8, Continuous Optimisation Oxford University Computing Laboratory, HT 2006 Notes by Dr Raphael Hauser (hauser@comlab.ox.ac.uk)

Constrained Optimisation and the Need for Optimality Conditions: In the remaining part of this course we will consider the problem of minimising objective functions over constrained domains. The general problem of this kind can be written in the form
$\min_{x \in \mathbb{R}^n} f(x)$ s.t. $g_i(x) = 0 \ (i \in E)$, $g_j(x) \leq 0 \ (j \in I)$,
where $E$ and $I$ are finite index sets corresponding to the equality and inequality constraints, and where $f, g_i \in C^k(\mathbb{R}^n, \mathbb{R})$ for all $i \in I \cup E$.

In unconstrained optimisation we found that we can use the optimality conditions derived in Lecture 1 to transform optimisation problems into zero-finding problems for systems of nonlinear equations $\nabla f(x) = 0$. We will spend the next few lectures developing a similar approach to constrained optimisation: in this case the optimal solutions can be characterised by systems of nonlinear equations and inequalities.
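As a reminder of the unconstrained case, the following is a minimal sketch of this transformation: Newton's method applied to the stationarity condition $\nabla f(x) = 0$. The test function $f(x) = x - \ln x$ is a hypothetical choice for illustration, not taken from the notes.

```python
def newton_on_gradient(df, d2f, x0, tol=1e-12, max_iter=50):
    """Newton's method on the stationarity condition f'(x) = 0:
    iterate x <- x - f'(x)/f''(x) until the gradient is (numerically) zero."""
    x = x0
    for _ in range(max_iter):
        g = df(x)
        if abs(g) < tol:
            break
        x -= g / d2f(x)
    return x

# f(x) = x - ln(x) on x > 0: f'(x) = 1 - 1/x vanishes at the minimiser x* = 1
x_star = newton_on_gradient(lambda x: 1 - 1/x, lambda x: x**-2, x0=0.5)
```

Minimising $f$ has become a pure root-finding problem for $f'$; the constrained analogue developed below replaces this single equation by a system of equations and inequalities.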

A natural by-product of this analysis will be the notion of a Lagrangian dual of an optimisation problem: every optimisation problem (called the primal) has a sister problem in the space of Lagrange multipliers (called the dual). In constrained optimisation it is often advantageous to think of the primal and dual in a combined primal-dual framework, where each sheds light from a different angle on a certain saddle-point finding problem.

First we will take a closer look at systems of linear inequalities and prove a theorem that will be of fundamental importance in everything that follows.

Theorem 1 (Fundamental theorem of linear inequalities): Let $a_1, \dots, a_m, b \in \mathbb{R}^n$ be a set of vectors. Then exactly one of the two following alternatives occurs:
(I) $\exists\, y \in \mathbb{R}^m_+$ such that $b = \sum_{i=1}^m y_i a_i$.
(II) $\exists\, d \in \mathbb{R}^n$ such that $d^T b < 0$ and $d^T a_i \geq 0$ for all $i = 1, \dots, m$.

Note that Alternative (I) says that $b$ lies in the convex cone generated by the vectors $a_i$:
$b \in \mathrm{cone}(a_1, \dots, a_m) := \{\sum_{i=1}^m \lambda_i a_i : \lambda_i \geq 0 \ \forall i\}$.
Alternative (II) on the other hand says that the hyperplane $d^\perp := \{x \in \mathbb{R}^n : d^T x = 0\}$ strictly separates $b$ from the convex set $\mathrm{cone}(a_1, \dots, a_m)$. Thus, Theorem 1 is a result about convex separation: either $b$ is a member of $\mathrm{cone}(a_1, \dots, a_m)$, or there exists a hyperplane that strictly separates the two objects.
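As a concrete numerical illustration, each alternative can be exhibited with explicit witnesses. The generators and vectors below are hypothetical, chosen only for the example:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])          # columns a_1 = (1,0), a_2 = (1,1)

# Alternative (I): b = 2*a_1 + 1*a_2 lies in cone(a_1, a_2)
y = np.array([2.0, 1.0])
b_in = A @ y                         # b = (3, 1)
assert np.all(y >= 0)

# Alternative (II): b = (0,-1) lies outside the cone; d = (0,1) separates it,
# since d.b < 0 while d.a_i >= 0 for both generators
b_out = np.array([0.0, -1.0])
d = np.array([0.0, 1.0])
assert d @ b_out < 0 and np.all(d @ A >= 0)
```

Geometrically, the cone here is the wedge between the directions $(1,0)$ and $(1,1)$; the hyperplane $d^\perp$ is the $x_1$-axis, with the cone on one side and $b_{\text{out}}$ strictly on the other.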

[Figure: two sketches of the cone $C = \mathrm{cone}(a_1, a_2, a_3)$. Left, Alternative (I): $b$ lies inside $C$. Right, Alternative (II): $b$ lies outside $C$ and the hyperplane $d^\perp$ separates it from the cone.]

Lemma 1: The two alternatives of Theorem 1 are mutually exclusive.
Proof: If both alternatives held simultaneously, then $y_i \geq 0$ and $d^T a_i \geq 0$ for all $i$ would yield the contradiction
$0 \leq \sum_{i=1}^m y_i (d^T a_i) = d^T \sum_{i=1}^m y_i a_i = d^T b < 0.$

Lemma 2: W.l.o.g. we may assume that $\mathrm{span}\{a_1, \dots, a_m\} = \mathbb{R}^n$.
Proof: Suppose $\mathrm{span}\{a_1, \dots, a_m\} \neq \mathbb{R}^n$. If $b \in \mathrm{span}\{a_1, \dots, a_m\}$, then we can restrict all arguments of the proof of Theorem 1 to the linear subspace $\mathrm{span}\{a_1, \dots, a_m\}$ of $\mathbb{R}^n$. Otherwise, if $b \notin \mathrm{span}\{a_1, \dots, a_m\}$, then $b$ cannot be written in the form $b = \sum_{i=1}^m \mu_i a_i$, so Alternative (I) does not hold.

It remains to show that Alternative (II) applies in this case. Let $\pi$ be the orthogonal projection of $\mathbb{R}^n$ onto $\mathrm{span}\{a_1, \dots, a_m\}$, and let $d = \pi(b) - b$. Then $d \perp \mathrm{span}\{a_1, \dots, a_m\}$ and $d \neq 0$, so that
$d^T b = d^T(b - \pi(b)) + d^T \pi(b) = -\|d\|^2 + 0 < 0$, and $d^T a_i = 0$ for all $i$.
Therefore, Alternative (II) holds.
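The projection argument can be checked numerically. A minimal numpy sketch, with hypothetical vectors: $a_1, a_2$ span a plane in $\mathbb{R}^3$ and $b$ sticks out of it; least squares computes the projection $\pi$:

```python
import numpy as np

# a_1, a_2 span a 2-dimensional subspace of R^3; b lies outside that span
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])           # columns a_1, a_2
b = np.array([1.0, 2.0, 3.0])        # third component sticks out of the span

# orthogonal projection pi(b) onto span{a_1, a_2}, computed by least squares
pi_b = A @ np.linalg.lstsq(A, b, rcond=None)[0]
d = pi_b - b                          # d is orthogonal to the span, d != 0

assert np.allclose(d @ A, 0)          # d.a_i = 0 for all i
assert np.isclose(d @ b, -d @ d)      # d.b = -||d||^2 < 0
```

So $d$ is a valid witness for Alternative (II), exactly as in the proof.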

Because of Lemma 2, we will henceforth assume that $\mathrm{span}\{a_1, \dots, a_m\} = \mathbb{R}^n$. We will next construct an algorithm that stops as soon as a situation corresponding to either Alternative (I) or (II) is detected. This is in fact the simplex algorithm for LP in disguise.

Algorithm 1:
S0 Choose $J^1 \subseteq \{1, \dots, m\}$ such that $\mathrm{span}\{a_i : i \in J^1\} = \mathbb{R}^n$ and $|J^1| = n$.
S1 For $k = 1, 2, \dots$ repeat
1. decompose $b = \sum_{i \in J^k} y_i^k a_i$;
2. if $y_i^k \geq 0$ for all $i \in J^k$, return $y^k$ and stop;
3. else begin
   let $j^k := \min\{i \in J^k : y_i^k < 0\}$;
   let $\pi^k : \mathbb{R}^n \to \mathrm{span}\{a_i : i \in J^k \setminus \{j^k\}\}$ be the orthogonal projection;
   let $d^k := \|a_{j^k} - \pi^k(a_{j^k})\|^{-2}\,(a_{j^k} - \pi^k(a_{j^k}))$;
   if $(d^k)^T a_i \geq 0$ for all $i = 1, \dots, m$, return $d^k$ and stop;
   end
4. let $l^k := \min\{i : (d^k)^T a_i < 0\}$;
5. let $J^{k+1} := J^k \setminus \{j^k\} \cup \{l^k\}$;
end.

Comments: If the algorithm returns $y^k$ in Step 2, then Alternative (I) holds: let $y_i = 0$ for $i \notin J^k$ and $y_i = y_i^k$ for $i \in J^k$. Then $y \in \mathbb{R}^m_+$ and $b = \sum_i y_i a_i$.

If the algorithm enters Step 3, then $\{i \in J^k : y_i^k < 0\} \neq \emptyset$, because the condition of Step 2 is not satisfied.

The vector $d^k$ constructed in Step 3 satisfies
$(d^k)^T b = \sum_{i \in J^k} y_i^k (d^k)^T a_i = y_{j^k}^k (d^k)^T a_{j^k} < 0. \quad (1)$
Therefore, if the algorithm returns $d^k$, then Alternative (II) holds with $d = d^k$.

If the algorithm enters Step 4, then $\{i : (d^k)^T a_i < 0\} \neq \emptyset$, because the condition of the last if statement of Step 3 is not satisfied. Moreover, since
$(d^k)^T a_{j^k} = 1$ and $(d^k)^T a_i = 0$ for $i \in J^k \setminus \{j^k\}$,
we have $\{i : (d^k)^T a_i < 0\} \cap J^k = \emptyset$. This shows that $l^k \notin J^k$. We have $\mathrm{span}\{a_i : i \in J^{k+1}\} = \mathbb{R}^n$, because $(d^k)^T a_{l^k} \neq 0$ and $(d^k)^T a_i = 0$ for $i \in J^k \setminus \{j^k\}$ show that $a_{l^k} \notin \mathrm{span}\{a_i : i \in J^k \setminus \{j^k\}\}$. Moreover, $|J^{k+1}| = n$.
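Algorithm 1 can be implemented compactly with numpy. This is a sketch under the standing assumptions (the columns of $A$ span $\mathbb{R}^n$), using least squares for the orthogonal projections $\pi^k$ and the minimal-index choices of Steps 3 and 4:

```python
import numpy as np

def fundamental_theorem(A, b, tol=1e-9):
    """Run Algorithm 1 on the columns a_1..a_m of A and the vector b.
    Returns ('I', y) with y >= 0 and A @ y = b, or ('II', d) with
    d.b < 0 and d.a_i >= 0 for all i.  Assumes the columns span R^n."""
    n, m = A.shape
    # S0: greedily pick an initial basis J^1 with span{a_i : i in J} = R^n
    J, rank = [], 0
    for i in range(m):
        if np.linalg.matrix_rank(A[:, J + [i]]) > rank:
            J, rank = J + [i], rank + 1
        if rank == n:
            break
    while True:
        y = np.linalg.solve(A[:, J], b)          # 1. decompose b = sum y_i a_i
        if np.all(y >= -tol):                    # 2. Alternative (I) detected
            y_full = np.zeros(m)
            for pos, i in enumerate(J):
                y_full[i] = max(y[pos], 0.0)
            return 'I', y_full
        # 3. smallest index in J with a negative coefficient
        j = min(i for pos, i in enumerate(J) if y[pos] < -tol)
        rest = [i for i in J if i != j]
        # residual of projecting a_j onto span{a_i : i in rest}
        r = A[:, j] - A[:, rest] @ np.linalg.lstsq(A[:, rest], A[:, j],
                                                   rcond=None)[0]
        d = r / (r @ r)                          # normalised so that d.a_j = 1
        if np.all(d @ A >= -tol):                # Alternative (II) detected
            return 'II', d
        l = min(i for i in range(m) if d @ A[:, i] < -tol)   # 4. entering index
        J = sorted(rest + [l])                   # 5. pivot: J \ {j} + {l}

# Hypothetical data: a_1=(1,0), a_2=(-1,0), a_3=(0,1); b=(-2,1) is in the cone
case, y = fundamental_theorem(np.array([[1., -1., 0.],
                                        [0., 0., 1.]]), np.array([-2., 1.]))
```

Note the minimal-index choices of $j^k$ and $l^k$: by Lemma 3 below they guarantee that no basis $J^k$ ever repeats, hence termination.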

Lemma 3: It can never occur that $J^k = J^t$ for $k < t$.
Proof: Assume to the contrary that $J^k = J^t$ for some iterations $k < t$, and let $j_{\max} := \max\{j^s : k \leq s \leq t-1\}$. Then there exists $p \in \{k, k+1, \dots, t-1\}$ such that $j_{\max} = j^p$. Since $J^k = J^t$, there also exists $q \in \{k, k+1, \dots, t-1\}$ such that $j_{\max} = l^q$. In other words, the index $j_{\max}$ must at some point have left $J$ and later re-entered it, or else it must have entered and then left again.

Now $j_{\max} = j^p$ implies that $y_i^p \geq 0$ for all $i \in J^p$ with $i < j_{\max}$. Likewise, for the same indices $i$ we have $(d^q)^T a_i \geq 0$, since $i < j_{\max} = l^q = \min\{i : (d^q)^T a_i < 0\}$. Furthermore, we have $y_{j_{\max}}^p = y_{j^p}^p < 0$ and $(d^q)^T a_{j_{\max}} = (d^q)^T a_{l^q} < 0$.

And finally, since $J^s \cap \{j_{\max}+1, \dots, m\}$ remains unchanged for $s = k, \dots, t$, the indices $i \in J^p$ with $i > j_{\max}$ also belong to $J^q \setminus \{j^q\}$, so $(d^q)^T a_i = 0$ for all such $i$. Every term in the following sum is therefore nonnegative, and the term for $i = j_{\max}$ is strictly positive:
$(d^q)^T b = \sum_{i \in J^p} y_i^p (d^q)^T a_i \geq 0. \quad (2)$
On the other hand, (1) applied in iteration $q$ shows $(d^q)^T b < 0$, contradicting (2). Thus, indices $k < t$ such that $J^k = J^t$ do not exist.

We are finally ready to prove the fundamental theorem of linear inequalities.
Proof (Theorem 1): Each $J^k$ is a subset of $\{1, \dots, m\}$, so there are only finitely many choices for these index sets, and Lemma 3 shows that there are no repetitions in the sequence $J^1, J^2, \dots$ Hence the sequence must be finite. But this is only possible if in some iteration $k$ Algorithm 1 either returns $y^k$, detecting that Alternative (I) holds, or $d^k$, detecting that Alternative (II) holds.

The Implicit Function Theorem: Another fundamental tool we will need is the implicit function theorem. Before stating the theorem, let us illustrate it with an example.

Example 1: The function $f(x_1, x_2) = x_1^2 + x_2^2 - 1$ has a zero at the point $(1, 0)$, and $\frac{\partial f}{\partial x_1}(1, 0) = 2 \neq 0$. In a neighbourhood of this point the level set $\{(x_1, x_2) : f(x_1, x_2) = 0\}$ can be explicitly parameterised in terms of $x_2$; that is, there exists a function $h(t)$ such that $f(x_1, x_2) = 0$ if and only if $(x_1, x_2) = (h(t), t)$ for some value of $t$.

The level set is nothing else but the unit circle $S^1$, and for $(x_1, x_2)$ with $x_1 > 0$ we have $f(x_1, x_2) = 0$ if and only if $x_1 = h(x_2)$, where $h(t) = \sqrt{1 - t^2}$. Thus
$S^1 \cap \{x \in \mathbb{R}^2 : x_1 > 0\} = \{(h(t), t) : t \in (-1, 1)\}$,
as claimed. Another way to say this is that $S^1$ is a differentiable manifold with local coordinate map $\varphi : S^1 \cap \{x \in \mathbb{R}^2 : x_1 > 0\} \to (-1, 1)$, $x \mapsto x_2$.

The parameterisation in terms of $x_2$ was only possible because $\frac{\partial f}{\partial x_1}(1, 0) \neq 0$. To illustrate this, note that we also have $f(0, 1) = 0$, but now $\frac{\partial f}{\partial x_1}(0, 1) = 0$, and we cannot parameterise $S^1$ in terms of $x_2$ in a neighbourhood of $(0, 1)$. In fact, for $x_2$ in a neighbourhood of $1$ there are two $x_1$ values $\pm\sqrt{1 - x_2^2}$ with $f(x_1, x_2) = 0$ when $x_2 < 1$, and none when $x_2 > 1$.
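The claims of Example 1 can be checked numerically. A small sketch verifying that $(h(t), t)$ stays on the level set, and that the slope of $h$ matches the implicit-differentiation formula $h'(t) = -(\partial f/\partial x_1)^{-1}\,\partial f/\partial x_2 = -t/h(t)$:

```python
import math

def f(x1, x2):
    return x1**2 + x2**2 - 1

def h(t):
    # explicit parameterisation of the circle on the half-plane x1 > 0
    return math.sqrt(1 - t**2)

for t in [-0.9, -0.5, 0.0, 0.3, 0.8]:
    assert abs(f(h(t), t)) < 1e-12           # (h(t), t) stays on the level set
    eps = 1e-6
    fd = (h(t + eps) - h(t - eps)) / (2 * eps)   # finite-difference slope
    assert abs(fd - (-t / h(t))) < 1e-5      # matches h'(t) = -t/h(t)
```

At $t = 1$ the check breaks down, as it must: $h$ is not differentiable there, reflecting $\frac{\partial f}{\partial x_1}(0, 1) = 0$.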

Example 2: [Figure: surface and contour plots of a function of two variables; only axis-tick residue survived the transcription.]

Generalisation: Let $f \in C^k(\mathbb{R}^{p+q}, \mathbb{R}^p)$. Let $f'_B(x)$ be the leading $p \times p$ block of the Jacobian matrix $f'(x) = [\, f'_B(x) \;\; f'_N(x) \,]$, and $f'_N(x)$ the trailing $p \times q$ block. Likewise, let $x_B$ be the leading $p \times 1$ block of the vector $x$ and $x_N$ the trailing $q \times 1$ block.

Theorem 2 (Implicit Function Theorem): Let $f \in C^k(\mathbb{R}^{p+q}, \mathbb{R}^p)$ and let $\bar{x} \in \mathbb{R}^{p+q}$ be such that $f(\bar{x}) = 0$ and $f'_B(\bar{x})$ is nonsingular. Then there exist open neighbourhoods $U_B \subseteq \mathbb{R}^p$ of $\bar{x}_B$ and $U_N \subseteq \mathbb{R}^q$ of $\bar{x}_N$, and a function $h \in C^k(U_N, U_B)$, such that for all $(x_B, x_N) \in U_B \times U_N$:
i) $f(x_B, x_N) = 0 \iff x_B = h(x_N)$;
ii) $f'_B(x)$ is nonsingular;
iii) $h'(x_N) = -(f'_B(x))^{-1} f'_N(x)$, where $x = (h(x_N), x_N)$.
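For a linear map $f(x) = A_B x_B + A_N x_N$ the theorem holds globally and $h$ can be written down explicitly, which makes the formula in iii) easy to check. A small numpy sketch with hypothetical blocks $A_B$, $A_N$ ($p = 2$, $q = 1$):

```python
import numpy as np

# f : R^(p+q) -> R^p with p = 2, q = 1, f(x) = A_B @ x_B + A_N @ x_N
A_B = np.array([[2.0, 1.0],
                [0.0, 1.0]])        # nonsingular leading p x p block f'_B
A_N = np.array([[1.0],
                [3.0]])             # trailing p x q block f'_N

# solving f = 0 for x_B gives h(x_N) = -A_B^{-1} A_N x_N, so
# h'(x_N) = -A_B^{-1} A_N, exactly the formula in Theorem 2 iii)
h_prime = -np.linalg.solve(A_B, A_N)

x_N = np.array([0.7])
x_B = h_prime @ x_N                 # = h(x_N) in the linear case
assert np.allclose(A_B @ x_B + A_N @ x_N, 0)    # f(h(x_N), x_N) = 0
```

In the nonlinear case the same formula holds locally, with $A_B$ and $A_N$ replaced by the Jacobian blocks evaluated at $x = (h(x_N), x_N)$.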

Reading Assignment: Lecture-Note 8.