The Fundamental Theorem of Linear Inequalities Lecture 8, Continuous Optimisation Oxford University Computing Laboratory, HT 2006 Notes by Dr Raphael Hauser (hauser@comlab.ox.ac.uk)
Constrained Optimisation and the Need for Optimality Conditions: In the remaining part of this course we will consider the problem of minimising objective functions over constrained domains. The general problem of this kind can be written in the form
$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{s.t.}\quad g_i(x) = 0 \;(i \in E), \quad g_i(x) \geq 0 \;(i \in I),$$
where $E$ and $I$ are the finite index sets corresponding to the equality and inequality constraints, and where $f, g_i \in C^k(\mathbb{R}^n, \mathbb{R})$ for all $i \in E \cup I$.
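As a concrete instance of this general form, the following sketch (assuming NumPy and SciPy are available; the particular objective and constraints are illustrative choices, not taken from the lecture) minimises $f(x) = x_1^2 + x_2^2$ subject to one equality and one inequality constraint:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative instance of: min f(x) s.t. g_i(x) = 0 (i in E), g_i(x) >= 0 (i in I)
# Here: minimise x1^2 + x2^2 subject to x1 + x2 - 1 = 0 and x1 - 0.2 >= 0.
f = lambda x: x[0] ** 2 + x[1] ** 2

constraints = [
    {"type": "eq", "fun": lambda x: x[0] + x[1] - 1.0},  # equality constraint (E)
    {"type": "ineq", "fun": lambda x: x[0] - 0.2},       # inequality constraint (I)
]

# SLSQP handles mixed equality/inequality constraints.
res = minimize(f, x0=np.array([1.0, 0.0]), method="SLSQP", constraints=constraints)
x_opt = res.x  # by symmetry the minimiser is near (0.5, 0.5)
```

The inequality constraint is inactive at the solution; the equality constraint forces the minimiser onto the line $x_1 + x_2 = 1$.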
In unconstrained optimisation we found that we can use the optimality conditions derived in Lecture 1 to transform optimisation problems into zero-finding problems for systems of nonlinear equations $\nabla f(x) = 0$. We will spend the next few lectures developing a similar approach to constrained optimisation: in this case the optimal solutions can be characterised by systems of nonlinear equations and inequalities.
A natural by-product of this analysis will be the notion of a Lagrangian dual of an optimisation problem: every optimisation problem - called the primal - has a sister problem in the space of Lagrange multipliers - called the dual. In constrained optimisation it is often advantageous to think of the primal and dual in a combined primal-dual framework where each sheds light from a different angle on a certain saddle-point finding problem.
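For linear programming, the primal-dual pair can already be made concrete. The following sketch (an illustrative LP chosen here, not taken from the notes; assumes SciPy) solves a small primal and its LP dual and checks that the optimal values coincide:

```python
import numpy as np
from scipy.optimize import linprog

# Primal: min c^T x  s.t.  A x >= b, x >= 0.
c = np.array([2.0, 3.0])
A = np.array([[1.0, 1.0], [1.0, 2.0]])
b = np.array([3.0, 4.0])

# linprog uses A_ub x <= b_ub, so A x >= b is rewritten as -A x <= -b.
primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None)] * 2)

# Dual: max b^T y  s.t.  A^T y <= c, y >= 0  (minimise -b^T y for linprog).
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(0, None)] * 2)

primal_value = primal.fun   # optimal primal objective
dual_value = -dual.fun      # optimal dual objective (undo the sign flip)
# Strong LP duality: the two optimal values agree (both equal 7 here).
```

The dual variables are exactly the Lagrange multipliers of the primal constraints, which is the saddle-point view referred to above.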
First we will take a closer look at systems of linear inequalities and prove a theorem that will be of fundamental importance in everything that follows:

Theorem 1: Fundamental theorem of linear inequalities. Let $a_1, \dots, a_m, b \in \mathbb{R}^n$ be a set of vectors. Then exactly one of the following two alternatives occurs:
(I) $\exists\, y \in \mathbb{R}^m_+$ such that $b = \sum_{i=1}^m y_i a_i$.
(II) $\exists\, d \in \mathbb{R}^n$ such that $d^T b < 0$ and $d^T a_i \geq 0$ for all $i = 1, \dots, m$.
Note that Alternative (I) says that $b$ lies in the convex cone generated by the vectors $a_i$:
$$b \in \operatorname{cone}(a_1, \dots, a_m) := \Big\{ \sum_{i=1}^m \lambda_i a_i : \lambda_i \geq 0 \;\forall i \Big\}.$$
Alternative (II) on the other hand says that the hyperplane $d^\perp := \{x \in \mathbb{R}^n : d^T x = 0\}$ strictly separates $b$ from the convex set $\operatorname{cone}(a_1, \dots, a_m)$. Thus, Theorem 1 is a result about convex separation: either $b$ is a member of $\operatorname{cone}(a_1, \dots, a_m)$, or there exists a hyperplane that strictly separates the two objects.
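In $\mathbb{R}^2$ the two alternatives are easy to exhibit by hand. A minimal sketch (with hand-picked illustrative vectors, assuming NumPy) showing one instance of each case:

```python
import numpy as np

a1, a2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])  # cone(a1, a2) = first quadrant

# Alternative (I): b = 2*a1 + 3*a2 lies in the cone, with multipliers y >= 0.
b_in = np.array([2.0, 3.0])
y = np.array([2.0, 3.0])
assert np.allclose(y[0] * a1 + y[1] * a2, b_in) and np.all(y >= 0)

# Alternative (II): b' = (-1, 1) is outside the cone; d = (1, 0) separates it,
# since d^T b' < 0 while d^T a_i >= 0 for every generator.
b_out = np.array([-1.0, 1.0])
d = np.array([1.0, 0.0])
assert d @ b_out < 0 and d @ a1 >= 0 and d @ a2 >= 0
```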
[Figure: two cases for the cone $C = \operatorname{cone}(a_1, a_2, a_3)$. Left: $b \in C$ (Alternative I). Right: $b \notin C$, and a hyperplane with normal $d$ separates $b$ from $C$ (Alternative II).]
Lemma 1: The two alternatives of Theorem 1 are mutually exclusive.

Proof: If both alternatives held simultaneously, we would find the contradiction
$$0 \leq \sum_{i=1}^m y_i (d^T a_i) = d^T \sum_{i=1}^m y_i a_i = d^T b < 0. \qquad \square$$
Lemma 2: W.l.o.g. we may assume that $\operatorname{span}\{a_1, \dots, a_m\} = \mathbb{R}^n$.

Proof: Suppose $\operatorname{span}\{a_1, \dots, a_m\} \neq \mathbb{R}^n$. If $b \in \operatorname{span}\{a_1, \dots, a_m\}$, then we can restrict all arguments of the proof of Theorem 1 to the linear subspace $\operatorname{span}\{a_1, \dots, a_m\}$ of $\mathbb{R}^n$. Else, if $b \notin \operatorname{span}\{a_1, \dots, a_m\}$, then $b$ cannot be written in the form $b = \sum_{i=1}^m \mu_i a_i$, so Alternative (I) does not hold.
It remains to show that Alternative (II) applies in this case. Let $\pi$ be the orthogonal projection of $\mathbb{R}^n$ onto $\operatorname{span}\{a_1, \dots, a_m\}$, and let $d = \pi(b) - b$. Note that $d \neq 0$ because $b \notin \operatorname{span}\{a_1, \dots, a_m\}$. Then $d \perp \operatorname{span}\{a_1, \dots, a_m\}$, so that
$$d^T b = d^T(b - \pi(b)) + d^T \pi(b) = -\|d\|^2 + 0 < 0, \qquad d^T a_i = 0 \;\forall i.$$
Therefore, Alternative (II) holds. $\square$
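The projection argument in the proof of Lemma 2 can be checked numerically. A sketch (illustrative vectors, assuming NumPy): with $a_1, a_2$ spanning the $x_1 x_2$-plane in $\mathbb{R}^3$ and $b$ outside that span, the vector $d = \pi(b) - b$ satisfies $d^T b < 0$ and $d^T a_i = 0$:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])      # columns a_1, a_2 span the x1-x2 plane in R^3
b = np.array([1.0, 2.0, 3.0])   # b is not in span{a_1, a_2}

# Orthogonal projection of b onto span{a_1, a_2} via least squares.
coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
pi_b = A @ coeffs               # pi(b) = (1, 2, 0)
d = pi_b - b                    # d = (0, 0, -3), orthogonal to the span

dTb = d @ b                     # = -||d||^2 = -9 < 0
dTa = d @ A                     # = (0, 0): d is orthogonal to each a_i
```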
Because of Lemma 2, we will henceforth assume that $\operatorname{span}\{a_1, \dots, a_m\} = \mathbb{R}^n$. We will next construct an algorithm that stops when a situation corresponding to either Alternative (I) or (II) is detected. This is in fact the simplex algorithm for LP in disguise.
Algorithm 1:
S0 Choose $J_1 \subseteq \{1, \dots, m\}$ such that $\operatorname{span}\{a_i : i \in J_1\} = \mathbb{R}^n$ and $|J_1| = n$.
S1 For $k = 1, 2, \dots$ repeat
  1. Decompose $b = \sum_{i \in J_k} y^k_i a_i$.
  2. If $y^k_i \geq 0$ for all $i \in J_k$, return $y^k$ and stop.
  3. Else begin
     let $j_k := \min\{i \in J_k : y^k_i < 0\}$,
     let $\pi_k : \mathbb{R}^n \to \operatorname{span}\{a_i : i \in J_k \setminus \{j_k\}\}$ be the orthogonal projection,
     let $d^k := \|a_{j_k} - \pi_k(a_{j_k})\|^{-2} \,\big(a_{j_k} - \pi_k(a_{j_k})\big)$, so that $(d^k)^T a_{j_k} = 1$.
     If $(d^k)^T a_i \geq 0$ for all $i = 1, \dots, m$, return $d^k$ and stop.
     end
  4. Let $l_k := \min\{i : (d^k)^T a_i < 0\}$.
  5. Let $J_{k+1} := (J_k \setminus \{j_k\}) \cup \{l_k\}$.
end.
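Algorithm 1 can be sketched directly in code. The following implementation is a sketch assuming NumPy; the function name and tolerance handling are my own, and the greedy choice of the initial basis $J_1$ is one possible realisation of step S0. It returns which alternative of Theorem 1 holds, together with the certificate $y$ or $d$:

```python
import numpy as np

def fundamental_alternative(A, b, tol=1e-12):
    """Run Algorithm 1 on the columns a_1, ..., a_m of A (shape n x m).

    Assumes span{a_1, ..., a_m} = R^n (justified by Lemma 2). Returns
    ('I', y) with y >= 0 and A @ y = b, or ('II', d) with d^T b < 0
    and d^T a_i >= 0 for all i.
    """
    n, m = A.shape
    # S0: greedily choose n linearly independent columns as the initial basis J_1.
    J = []
    for i in range(m):
        if np.linalg.matrix_rank(A[:, J + [i]]) == len(J) + 1:
            J.append(i)
        if len(J) == n:
            break
    while True:
        yJ = np.linalg.solve(A[:, J], b)                 # step 1: b = sum y_i a_i
        if np.all(yJ >= -tol):                           # step 2: Alternative (I)
            y = np.zeros(m)
            y[J] = np.maximum(yJ, 0.0)
            return "I", y
        j = min(i for i, yi in zip(J, yJ) if yi < -tol)  # step 3 (smallest index)
        rest = [i for i in J if i != j]
        B = A[:, rest]
        r = A[:, j] - B @ np.linalg.lstsq(B, A[:, j], rcond=None)[0]
        d = r / (r @ A[:, j])                            # scaled so d^T a_j = 1
        if np.all(d @ A >= -tol):                        # Alternative (II) detected
            return "II", d
        l = min(i for i in range(m) if d @ A[:, i] < -tol)   # step 4
        J = sorted(rest + [l])                           # step 5
```

Taking the smallest eligible index in steps 3 and 4 is exactly the rule that Lemma 3 below exploits to rule out cycling.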
Comments:
If the algorithm returns $y^k$ in Step 2, then Alternative (I) holds: let $y_i = 0$ for $i \notin J_k$ and $y_i = y^k_i$ for $i \in J_k$. Then $y \in \mathbb{R}^m_+$ and $b = \sum_i y_i a_i$.
If the algorithm enters Step 3, then $\{i \in J_k : y^k_i < 0\} \neq \emptyset$, because the condition of Step 2 is not satisfied.
The vector $d^k$ constructed in Step 3 satisfies
$$(d^k)^T b = \sum_{i \in J_k} y^k_i (d^k)^T a_i = y^k_{j_k} (d^k)^T a_{j_k} < 0. \qquad (1)$$
Therefore, if the algorithm returns $d^k$, then Alternative (II) holds with $d = d^k$.
If the algorithm enters Step 4, then $\{i : (d^k)^T a_i < 0\} \neq \emptyset$, because the condition of the last if statement of Step 3 is not satisfied. Moreover, since
$$(d^k)^T a_{j_k} = 1, \qquad (d^k)^T a_i = 0 \;(i \in J_k \setminus \{j_k\}),$$
we have $\{i : (d^k)^T a_i < 0\} \cap J_k = \emptyset$. This shows that $l_k \notin J_k$.
We have $\operatorname{span}\{a_i : i \in J_{k+1}\} = \mathbb{R}^n$, because $(d^k)^T a_{l_k} \neq 0$ and $(d^k)^T a_i = 0$ $(i \in J_k \setminus \{j_k\})$ show that $a_{l_k} \notin \operatorname{span}\{a_i : i \in J_k \setminus \{j_k\}\}$. Moreover, $|J_{k+1}| = n$.
Lemma 3: It can never occur that $J_k = J_t$ for $k < t$.

Proof: Assume to the contrary that $J_k = J_t$ for some iterations $k < t$, and let $j_{\max} := \max\{j_s : k \leq s \leq t-1\}$. Then there exists $p \in \{k, k+1, \dots, t-1\}$ such that $j_{\max} = j_p$. Since $J_k = J_t$, there also exists $q \in \{k, k+1, \dots, t-1\}$ such that $j_{\max} = l_q$. In other words, the index must have left $J$ at some stage and then re-entered, or else it must have entered and then left again.
Now $j_{\max} = j_p$ implies that for all $i \in J_p$ with $i < j_{\max}$ we have $y^p_i \geq 0$. Likewise, for the same indices $i$ we have $(d^q)^T a_i \geq 0$, as $i < j_{\max} = l_q = \min\{i : (d^q)^T a_i < 0\}$. Furthermore, we have $y^p_{j_{\max}} = y^p_{j_p} < 0$ and $(d^q)^T a_{j_{\max}} = (d^q)^T a_{l_q} < 0$.
And finally, since $J_s \cap \{j_{\max}+1, \dots, m\}$ remains unchanged for $s = k, \dots, t$, we have $(d^q)^T a_i = 0$ for all $i \in J_p$ with $i > j_{\max}$. Therefore,
$$(d^q)^T b = \sum_{i \in J_p} y^p_i (d^q)^T a_i \geq 0. \qquad (2)$$
On the other hand, (1) shows $(d^q)^T b < 0$, contradicting (2). Thus, indices $k < t$ such that $J_k = J_t$ do not exist. $\square$
We are finally ready to prove the fundamental theorem of linear inequalities:

Proof (Theorem 1): Since $J_k \subseteq \{1, \dots, m\}$, there are only finitely many choices for these index sets, and Lemma 3 shows that there are no repetitions in the sequence $J_1, J_2, \dots$; hence the sequence must be finite. But this is only possible if in some iteration $k$ Algorithm 1 either returns $y^k$, detecting that Alternative (I) holds, or $d^k$, detecting that Alternative (II) holds. $\square$
The Implicit Function Theorem: Another fundamental tool we will need is the implicit function theorem. Before stating the theorem, let us illustrate it with an example:

Example 1: The function $f(x_1, x_2) = x_1^2 + x_2^2 - 1$ has a zero at the point $(1, 0)$, and $\frac{\partial f}{\partial x_1}(1, 0) = 2 \neq 0$. In a neighbourhood of this point the level set $\{(x_1, x_2) : f(x_1, x_2) = 0\}$ can be explicitly parameterised in terms of $x_2$; that is, there exists a function $h(t)$ such that $f(x_1, x_2) = 0$ if and only if $(x_1, x_2) = (h(t), t)$ for some value of $t$.
The level set is nothing else but the unit circle $S^1$, and for $(x_1, x_2)$ with $x_1 > 0$ we have $f(x_1, x_2) = 0$ if and only if $x_1 = h(x_2)$, where $h(t) = \sqrt{1 - t^2}$. Thus,
$$S^1 \cap \{x \in \mathbb{R}^2 : x_1 > 0\} = \{(h(t), t) : t \in (-1, 1)\},$$
as claimed. Another way to say this is that $S^1$ is a differentiable manifold with local coordinate map
$$\varphi : S^1 \cap \{x \in \mathbb{R}^2 : x_1 > 0\} \to (-1, 1), \qquad x \mapsto x_2.$$
The parameterisation in terms of $x_2$ was only possible because $\frac{\partial f}{\partial x_1}(1, 0) \neq 0$. To illustrate this, note that we also have $f(0, 1) = 0$, but now $\frac{\partial f}{\partial x_1}(0, 1) = 0$, and we cannot parameterise $S^1$ in terms of $x_2$ in a neighbourhood of $(0, 1)$. In fact, in a neighbourhood of $x_2 = 1$ there are two values $x_1 = \pm\sqrt{1 - x_2^2}$ with $f(x_1, x_2) = 0$ when $x_2 < 1$, and none when $x_2 > 1$.
Example 2: [Figure: surface and contour plots; the numerical axis labels from the original slides are not recoverable in this transcription.]
Generalisation: Let $f \in C^k(\mathbb{R}^{p+q}, \mathbb{R}^p)$. Let $f'_B(x)$ be the leading $p \times p$ block of the Jacobian matrix $f'(x) = [\, f'_B(x) \;\; f'_N(x) \,]$, and $f'_N(x)$ the trailing $p \times q$ block. Let $x_B$ be the leading $p \times 1$ block of the vector $x$ and $x_N$ the trailing $q \times 1$ block.
Theorem 2: Implicit Function Theorem. Let $f \in C^k(\mathbb{R}^{p+q}, \mathbb{R}^p)$ and let $\bar{x} \in \mathbb{R}^{p+q}$ be such that $f(\bar{x}) = 0$ and $f'_B(\bar{x})$ is nonsingular. Then there exist open neighbourhoods $U_B \subseteq \mathbb{R}^p$ of $\bar{x}_B$ and $U_N \subseteq \mathbb{R}^q$ of $\bar{x}_N$ and a function $h \in C^k(U_N, U_B)$ such that for all $(x_B, x_N) \in U_B \times U_N$:
i) $f(x_B, x_N) = 0 \iff x_B = h(x_N)$,
ii) $f'_B(x)$ is nonsingular,
iii) $h'(x_N) = -\big(f'_B(x)\big)^{-1} f'_N(x)$, where $x = (h(x_N), x_N)$.
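With $p = q = 1$ and the circle function of Example 1, formula iii) can be verified numerically. A sketch (assuming NumPy; the finite-difference step size is an arbitrary choice): here $f'_B = \partial f/\partial x_1 = 2x_1$ and $f'_N = \partial f/\partial x_2 = 2x_2$, so the theorem predicts $h'(x_N) = -x_2/x_1$:

```python
import numpy as np

f = lambda x1, x2: x1 ** 2 + x2 ** 2 - 1.0  # Example 1: the unit circle
h = lambda t: np.sqrt(1.0 - t ** 2)         # explicit parameterisation x1 = h(x2)

t = 0.6                                     # a point with x1 = h(t) = 0.8 > 0
x1 = h(t)

# Theorem 2 iii) with p = q = 1: h'(x_N) = -(f'_B)^{-1} f'_N = -(2 x1)^{-1} (2 t).
predicted = -(2.0 * t) / (2.0 * x1)         # = -t/x1 = -0.75

# Finite-difference approximation of h'(t) for comparison.
eps = 1e-6
fd = (h(t + eps) - h(t - eps)) / (2.0 * eps)
```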
Reading Assignment: Lecture-Note 8.