CS711008Z Algorithm Design and Analysis


CS711008Z Algorithm Design and Analysis
Lecture 9. Algorithm design technique: Linear programming and duality
Dongbo Bu
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
1 / 183

Outline
- The first example: the dual of the Diet problem;
- Understanding duality: Lagrangian function, Lagrangian dual function, and Lagrangian dual problem;
- Conditions of optimal solutions;
- Four properties of duality for linear programs;
- Solving LP using duality: dual simplex algorithm, primal-dual algorithm, and interior point method;
- Applications of duality: Farkas' lemma, von Neumann's MiniMax theorem, Yao's MiniMax theorem, the dual problem in SVM, and the ShortestPath problem;
- Appendix: proof of Slater's theorem, and finding an initial solution to the dual problem.
2 / 183

Importance of duality
When minimizing a function f(x), it is invaluable to know a lower bound on f(x) in advance. Calculating lower bounds is extremely important in the design of approximation algorithms and in the branch-and-bound method. Duality and relaxation (say, integer relaxation or convex relaxation) are powerful techniques for obtaining reasonable lower bounds.
Linear programs come in primal/dual pairs: every feasible solution to one of the two problems provides a bound on the objective value of the other.
The dual problem is always convex even if the primal problem is not.
3 / 183

The first example: the dual of the Diet problem
4 / 183

Revisiting the Diet problem
A housewife wonders how much money she must spend on foods in order to get all the energy (2000 kcal), protein (55 g), and calcium (800 mg) that she needs every day.

Food             Energy  Protein  Calcium  Price  Quantity
Oatmeal          110     4        2        3      x1
Whole milk       160     8        285      9      x2
Cherry pie       420     4        22       20     x3
Pork with beans  260     14       80       19     x4

Linear program:
  min  3x1 + 9x2 + 20x3 + 19x4                  (money)
  s.t. 110x1 + 160x2 + 420x3 + 260x4 ≥ 2000     (energy)
       4x1 + 8x2 + 4x3 + 14x4 ≥ 55              (protein)
       2x1 + 285x2 + 22x3 + 80x4 ≥ 800          (calcium)
       x1, x2, x3, x4 ≥ 0
5 / 183

Dual of the Diet problem: the Pricing problem
Consider a company producing protein powder, energy bars, and calcium tablets as substitutes for foods. The company wants to design a reasonable pricing strategy to earn as much money as possible. However, the prices cannot be arbitrarily high, due to the following considerations:
1. If the prices are competitive with foods, one might consider buying a combination of the ingredients rather than foods;
2. Otherwise, one will choose to buy foods directly.
6 / 183

LP model of the Pricing problem

Food             Energy  Protein  Calcium  Price (cents)
Oatmeal          110     4        2        3
Whole milk       160     8        285      9
Cherry pie       420     4        22       20
Pork with beans  260     14       80       19
Price            y1      y2       y3

Linear program:
  max  2000y1 + 55y2 + 800y3        (money)
  s.t. 110y1 + 4y2 + 2y3 ≤ 3        (oatmeal)
       160y1 + 8y2 + 285y3 ≤ 9      (milk)
       420y1 + 4y2 + 22y3 ≤ 20      (pie)
       260y1 + 14y2 + 80y3 ≤ 19     (pork & beans)
       y1, y2, y3 ≥ 0
7 / 183
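Both LPs above are small enough to solve directly. The following is a minimal sketch (assuming SciPy is available) that solves the Diet LP and its pricing dual with scipy.optimize.linprog and confirms that the two optimal values coincide:

```python
from scipy.optimize import linprog

# Primal Diet LP:  min c^T x  s.t.  Ax >= b, x >= 0.
# linprog expects <= constraints, so Ax >= b is passed as -Ax <= -b.
c = [3, 9, 20, 19]
A = [[110, 160, 420, 260],
     [4,   8,   4,   14],
     [2,   285, 22,  80]]
b = [2000, 55, 800]
primal = linprog(c, A_ub=[[-a for a in row] for row in A],
                 b_ub=[-bi for bi in b], bounds=[(0, None)] * 4)

# Dual Pricing LP:  max b^T y  s.t.  A^T y <= c, y >= 0.
# linprog minimizes, so the objective is passed as -b.
AT = [list(col) for col in zip(*A)]
dual = linprog([-bi for bi in b], A_ub=AT, b_ub=c, bounds=[(0, None)] * 3)

print(round(primal.fun, 3), round(-dual.fun, 3))  # both 67.096 (cents)
```

The cheapest diet costs exactly as much as the best revenue the ingredient seller can extract — a preview of strong duality.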

Primal problem and Dual problem
      c1   c2  ...  cn
      a11  a12 ...  a1n  | b1
      a21  a22 ...  a2n  | b2
      ...
      am1  am2 ...  amn  | bm
The Primal problem and the Dual problem are two points of view of the coefficient matrix A:
- Primal problem: row point of view.
- Dual problem: column point of view.
8 / 183

Primal problem
  min  c1 x1 + c2 x2 + ... + cn xn
  s.t. a11 x1 + a12 x2 + ... + a1n xn ≥ b1
       a21 x1 + a22 x2 + ... + a2n xn ≥ b2
       ...
       am1 x1 + am2 x2 + ... + amn xn ≥ bm
       xi ≥ 0 for each i
Primal problem: row point of view (in red):
  min c^T x  s.t. Ax ≥ b, x ≥ 0
9 / 183

Dual problem
Take a nonnegative combination of the rows: multiply row i by yi ≥ 0 and add them up. In each column j the combined coefficient y1 a1j + y2 a2j + ... + ym amj must stay below cj, while the combined right-hand side y1 b1 + y2 b2 + ... + ym bm is maximized.
Dual problem: column point of view (in blue):
  max b^T y  s.t. A^T y ≤ c, y ≥ 0
10 / 183

How to write the Dual problem? Case 1
For each constraint in the Primal problem, a variable is set in the Dual problem. If the Primal problem has ≥ inequality constraints, the Dual problem is written as follows.
Primal problem:
  min c^T x  s.t. Ax ≥ b, x ≥ 0
Dual problem:
  max b^T y  s.t. A^T y ≤ c, y ≥ 0
11 / 183

How to write the Dual problem? Case 2
For each constraint in the Primal problem, a variable is set in the Dual problem. If the Primal problem has ≤ inequality constraints, the Dual problem is written as follows.
Primal problem:
  min c^T x  s.t. Ax ≤ b, x ≥ 0
Dual problem:
  max b^T y  s.t. A^T y ≤ c, y ≤ 0
12 / 183

How to write the Dual problem? Case 3
For each constraint in the Primal problem, a variable is set in the Dual problem. If the Primal problem has equality constraints, the Dual problem is as follows.
Primal problem:
  min c^T x  s.t. Ax = b, x ≥ 0
Dual problem:
  max b^T y  s.t. A^T y ≤ c
Note: there is neither a y ≥ 0 nor a y ≤ 0 constraint in the dual problem; y is free.
13 / 183

Why can the Dual problem be written as above? Understanding duality from the Lagrangian dual point of view
14 / 183

Standard form of constrained optimization problems
Consider the following constrained optimization problem (might be non-convex):
  min  f0(x)
  s.t. fi(x) ≤ 0,  i = 1, ..., m
       hi(x) = 0,  i = 1, ..., p
Here the variables are x ∈ R^n, and we use D = (∩_{i=0}^{m} dom fi) ∩ (∩_{i=1}^{p} dom hi) to represent the domain of definition. We use p* to represent the optimal value of the problem.
15 / 183

An equivalent unconstrained optimization problem
We can transform this constrained optimization problem into an equivalent unconstrained optimization problem:
  min f0(x) + Σ_{i=1}^{m} I_-(fi(x)) + Σ_{i=1}^{p} I_0(hi(x))
where x ∈ D, and I_-(u) and I_0(u) are indicator functions for the non-positive reals and for the set {0}, respectively:
  I_-(u) = 0 if u ≤ 0,  ∞ if u > 0
  I_0(u) = 0 if u = 0,  ∞ if u ≠ 0
(Figure: plots of I_-(u) and I_0(u).)
16 / 183
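To see that this reformulation is exact, here is a small pure-Python check on a hypothetical one-dimensional instance (min (x-3)² subject to x - 1 ≤ 0; the instance is not from the slides, just an illustration): the constrained minimum and the indicator-penalized unconstrained minimum agree.

```python
INF = float("inf")

def I_minus(u):
    # indicator of the non-positive reals: 0 if u <= 0, +inf otherwise
    return 0.0 if u <= 0 else INF

def f0(x):
    return (x - 3.0) ** 2

xs = [i / 1000.0 for i in range(-2000, 4001)]  # grid over [-2, 4]

# constrained:   min f0(x)  s.t.  x - 1 <= 0
constrained = min(f0(x) for x in xs if x - 1 <= 0)
# unconstrained: min f0(x) + I_minus(x - 1)  over the whole grid
unconstrained = min(f0(x) + I_minus(x - 1) for x in xs)

print(constrained, unconstrained)  # 4.0 4.0 -- the reformulation is exact
```

Any point violating the constraint is assigned objective +∞, so the unconstrained search can never prefer it.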

Difficulty in solving the optimization problem
Intuitively, I_-(u) and I_0(u) represent our infinite dissatisfaction with the violation of constraints. However, both I_-(u) and I_0(u) are non-differentiable, making the optimization problem
  min f0(x) + Σ_{i=1}^{m} I_-(fi(x)) + Σ_{i=1}^{p} I_0(hi(x)),
although unconstrained, not easy to solve.
Question: how can we solve this optimization problem efficiently?
17 / 183

Approximating I_-(u) using a differentiable function (1)
One approximation to I_-(u) is the logarithmic barrier function:
  Î_-(u) = -(1/t) log(-u)   (t > 0)
The difference between Î_-(u) and I_-(u) decreases as t increases. This approximation is used in the interior point method.
(Figure: -(1/t) log(-u) versus I_-(u).)
18 / 183

Approximating I_-(u) using a differentiable function (2)
Another approximation to I_-(u) is a penalty function:
  Î_-(u) = φ(u) = u^t (t > 1) if u ≥ 0, and 0 otherwise
The penalty function penalizes any u greater than zero. It is a hands-off method for converting constrained problems into unconstrained ones, for which an initial feasible solution is easy to obtain. However, in some cases it cannot be applied, because the objective function is undefined outside the feasible set, or because the unconstrained problem becomes ill-conditioned as t increases.
(Figure: φ(u) versus I_-(u).)
19 / 183

Approximating I_-(u) using a differentiable function (3)
Another approximation to I_-(u) is a simple linear function:
  Î_-(u) = λu   (λ ≥ 0)
Despite the considerable difference between Î_-(u) and I_-(u), Î_-(u) still provides lower-bound information about I_-(u).
(Figure: λu versus I_-(u).)
20 / 183

Approximating I_0(u) using a differentiable function
I_0(u) can also be approximated using a linear function:
  Î_0(u) = νu
Although Î_0(u) deviates considerably from I_0(u), it still provides lower-bound information about I_0(u). It is worth pointing out that, unlike Î_-(u), Î_0(u) has no restriction on ν.
(Figure: νu versus I_0(u).)
21 / 183

Lagrangian function
Consider the unconstrained optimization problem min f0(x) + Σ_{i=1}^{m} I_-(fi(x)) + Σ_{i=1}^{p} I_0(hi(x)). Now let's replace I_-(u) with λu (λ ≥ 0) and replace I_0(u) with νu. The objective function then becomes:
  L(x, λ, ν) = f0(x) + Σ_{i=1}^{m} λi fi(x) + Σ_{i=1}^{p} νi hi(x)
This function is called the Lagrangian function; it is a lower bound of f0(x) at any feasible solution x when λ ≥ 0. Here we call λi the Lagrangian multiplier for the i-th inequality constraint fi(x) ≤ 0, and νi the Lagrangian multiplier for the i-th equality constraint hi(x) = 0.
22 / 183

Lagrangian connecting primal and dual: an example
Primal problem:
  min x² - 2x  s.t. x ≤ 0
Lagrangian:
  L(x, λ) = x² - 2x + λx
Note that L(x, λ) is a lower bound of the primal objective function x² - 2x when λ ≥ 0 and x ≤ 0.
Dual problem:
  max -(1/4)(2 - λ)²  s.t. λ ≥ 0
23 / 183

Lagrangian connecting primal and dual
(Figure: plots of the primal objective f0(x) = x² - 2x, the Lagrangian L(x, y) = x² - 2x + yx, and the dual objective g(y) = -(1/4)(2 - y)².)
Observation: primal objective function ≥ Lagrangian ≥ dual objective function in the feasible region.
24 / 183

Lagrangian dual function and Lagrangian dual problem
25 / 183

Lagrangian dual function
Consider the following constrained optimization problem:
  min  f0(x)
  s.t. fi(x) ≤ 0,  i = 1, ..., m
       hi(x) = 0,  i = 1, ..., p
Lagrangian function:
  L(x, λ, ν) = f0(x) + Σ_{i=1}^{m} λi fi(x) + Σ_{i=1}^{p} νi hi(x)
which is a lower bound of f0(x) at any feasible solution x when λ ≥ 0. Now let's consider the infimum of the Lagrangian function (called the Lagrangian dual function):
  g(λ, ν) = inf_{x ∈ D} L(x, λ, ν)
Note: infimum rather than minimum is used here, as some sets have no minimum.
26 / 183

Lagrangian dual problem
The Lagrangian dual function provides a lower bound on the primal objective function, i.e.
  f0(x) ≥ L(x, λ, ν) ≥ g(λ, ν)
for any feasible solution x when λ ≥ 0. Now let's find the tightest such lower bound, which can be obtained by solving the following Lagrangian dual problem:
  max g(λ, ν)  s.t. λ ≥ 0
27 / 183

The dual problem is always convex
Note that the Lagrangian dual function g(λ, ν) is a point-wise infimum of functions that are affine in (λ, ν):
  g(λ, ν) = inf_{x ∈ D} L(x, λ, ν) = inf_{x ∈ D} ( f0(x) + Σ_{i=1}^{m} λi fi(x) + Σ_{i=1}^{p} νi hi(x) )
Thus the Lagrangian dual function g(λ, ν) is always concave, and the dual problem is always a convex programming problem, even if the primal problem is non-convex.
28 / 183

Note: the dual is not intrinsic
It is worth pointing out that the dual problem and its optimal objective value are not properties of the primal feasible set and primal objective function alone; they also depend on the specific constraints used in the primal problem. Thus we can construct equivalent primal optimization problems with different duals in the following ways:
- Replacing the primal objective function f0(x) with h(f0(x)), where h(u) is monotonically increasing.
- Introducing new variables.
- Adding redundant constraints.
29 / 183

Lagrangian function and dual function: an example I
Consider the following primal problem:
  min  e^{x²} + 4e^{-x²}
  s.t. -2e^{-x²} + 1 ≤ 0
Lagrangian function:
  L(x, λ) = e^{x²} + 4e^{-x²} - λ(-2e^{-x²} + 1)
where λ ≤ 0 (on this slide the multiplier enters with a minus sign, so -λ ≥ 0 plays the role of the usual nonnegative multiplier, and L is again a lower bound of the objective on the feasible set).
(Figure: f0(x) = e^{x²} + 4e^{-x²}, the constraint function f1(x) = -2e^{-x²} + 1, and L(x, λ) for λ = -0.5, -1, -1.5.)
30 / 183

Lagrangian function and dual function: an example II
Lagrangian dual function:
  g(λ) = inf_{x ∈ R} L(x, λ)
       = 5 + λ            if λ ≤ -1.5
       = 2√(4 + 2λ) - λ   if -1.5 ≤ λ ≤ 0
(Figure: plot of g(λ) for λ ∈ [-5, 0].)
31 / 183
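The piecewise formula can be verified numerically: minimize L(x, λ) on a fine grid of x values and compare with the closed form. A pure-Python sketch following the slide's sign convention (multiplier -λ with λ ≤ 0); the grid range [-3, 3] is an assumption that safely contains the minimizer:

```python
import math

def L(x, lam):
    # Lagrangian of this example, with multiplier -lam and lam <= 0
    return math.exp(x * x) + 4 * math.exp(-x * x) \
        - lam * (-2 * math.exp(-x * x) + 1)

def g_closed(lam):
    # the piecewise closed form claimed above (valid for lam <= 0)
    return 5 + lam if lam <= -1.5 else 2 * math.sqrt(4 + 2 * lam) - lam

xs = [i / 1000.0 for i in range(-3000, 3001)]   # grid over [-3, 3]
for lam in (-3.0, -2.0, -1.5, -1.0, -0.5, 0.0):
    g_numeric = min(L(x, lam) for x in xs)      # brute-force inf_x L(x, lam)
    assert abs(g_numeric - g_closed(lam)) < 1e-3, (lam, g_numeric)
print("closed form of g matches the grid minimum; g(0) =", g_closed(0.0))
```

Maximizing g over λ ≤ 0 gives d* = g(0) = 4, which here equals the primal optimum (attained at x² = ln 2).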

Deriving the dual problem of a linear program in standard form
32 / 183

Dual problem of an LP in standard form
Consider an LP problem in standard form:
  min c^T x  s.t. Ax ≥ b, x ≥ 0
Lagrangian function:
  L(x, λ, ν) = c^T x - Σ_{i=1}^{m} λi (ai1 x1 + ... + ain xn - bi) - Σ_{i=1}^{n} νi xi
where λ ≥ 0, ν ≥ 0. Notice that for any feasible solution x and λ ≥ 0, ν ≥ 0, the Lagrangian function is a lower bound of the primal objective function, i.e. c^T x ≥ L(x, λ, ν), and further
  c^T x ≥ L(x, λ, ν) ≥ inf_{x ∈ D} L(x, λ, ν)
Let's define the Lagrangian dual function g(λ, ν) = inf_{x ∈ D} L(x, λ, ν) and rewrite the above inequality as
  c^T x ≥ L(x, λ, ν) ≥ g(λ, ν)
33 / 183

Lagrangian dual function
What is the Lagrangian dual function g(λ, ν)?
  g(λ, ν) = inf_{x ∈ D} L(x, λ, ν)
          = inf_{x ∈ D} ( c^T x - Σ_{i=1}^{m} λi (ai1 x1 + ... + ain xn - bi) - Σ_{i=1}^{n} νi xi )
          = inf_{x ∈ D} ( λ^T b + (c^T - λ^T A - ν^T) x )
          = λ^T b   if c^T = λ^T A + ν^T
          = -∞      otherwise
Note that D = R^n. Thus g(λ, ν) = λ^T b if c^T = λ^T A + ν^T; otherwise g(λ, ν) = -∞, which is a trivial lower bound for the primal objective function c^T x. We usually denote the domain of g(λ, ν) as dom g = {(λ, ν) | g(λ, ν) > -∞}.
34 / 183

Lagrangian dual problem
Now let's find the tightest lower bound of the primal objective function c^T x, which can be calculated by solving the following Lagrangian dual problem:
  max g(λ, ν)   where g(λ, ν) = λ^T b if c^T = λ^T A + ν^T, and -∞ otherwise
  s.t. λ ≥ 0, ν ≥ 0
or, explicitly representing the constraints in dom g:
  max  λ^T b
  s.t. λ^T A ≤ c^T
       λ ≥ 0
Note that this is exactly the dual form of the LP if we replace λ by y; thus we have another explanation of the dual variables y: they are Lagrangian multipliers.
35 / 183

An example
Primal problem:
  min x  s.t. x ≥ 2, x ≥ 0
Lagrangian function:
  L(x, λ, ν) = x - λ(x - 2) - νx = 2λ + (1 - λ - ν)x
Note that when λ ≥ 0, ν ≥ 0, and x ≥ 2, L(x, λ, ν) is a lower bound of the primal objective function x.
Lagrangian dual function:
  g(λ, ν) = inf_{x ∈ R} L(x, λ, ν) = 2λ if 1 - λ - ν = 0, and -∞ otherwise
Dual problem:
  max 2λ  s.t. λ ≤ 1, λ ≥ 0
36 / 183
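A tiny numeric companion to this example (pure Python, brute force over grids): both the primal optimum and the dual optimum equal 2.

```python
xs = [i / 100.0 for i in range(200, 1001)]   # feasible x: grid over [2, 10]
p_star = min(xs)                             # min x  s.t.  x >= 2, x >= 0

lams = [i / 100.0 for i in range(0, 101)]    # dual feasible: 0 <= lam <= 1
d_star = max(2 * lam for lam in lams)        # max 2*lam over the dual region
print(p_star, d_star)                        # 2.0 2.0 -- strong duality
```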

Deriving the dual problem of a linear program in slack form
37 / 183

Dual problem of an LP in slack form
Consider an LP problem in slack form:
  min c^T x  s.t. Ax = b, x ≥ 0
Lagrangian function:
  L(x, λ, ν) = c^T x - Σ_{i=1}^{m} λi (ai1 x1 + ... + ain xn - bi) - Σ_{i=1}^{n} νi xi
Notice that for any feasible solution x and ν ≥ 0 (λ is unrestricted), the Lagrangian function is a lower bound of the primal objective function, i.e. c^T x ≥ L(x, λ, ν), and further
  c^T x ≥ L(x, λ, ν) ≥ inf_{x ∈ D} L(x, λ, ν)
Let's define the Lagrangian dual function g(λ, ν) = inf_{x ∈ D} L(x, λ, ν) and rewrite the above inequality as
  c^T x ≥ L(x, λ, ν) ≥ g(λ, ν)
38 / 183

Lagrangian dual function
What is the Lagrangian dual function g(λ, ν)?
  g(λ, ν) = inf_{x ∈ D} L(x, λ, ν)
          = inf_{x ∈ D} ( c^T x - Σ_{i=1}^{m} λi (ai1 x1 + ... + ain xn - bi) - Σ_{i=1}^{n} νi xi )
          = inf_{x ∈ D} ( λ^T b + (c^T - λ^T A - ν^T) x )
          = λ^T b   if c^T = λ^T A + ν^T
          = -∞      otherwise
Note that D = R^n. Thus g(λ, ν) = λ^T b if c^T = λ^T A + ν^T; otherwise g(λ, ν) = -∞, which is a trivial lower bound for the primal objective function c^T x.
39 / 183

Lagrangian dual problem
Now let's find the tightest lower bound of the primal objective function c^T x, which can be calculated by solving the following Lagrangian dual problem:
  max g(λ, ν)   where g(λ, ν) = λ^T b if c^T = λ^T A + ν^T, and -∞ otherwise
  s.t. ν ≥ 0
or, explicitly representing the constraints in dom g:
  max  λ^T b
  s.t. λ^T A ≤ c^T
Note that the dual problem does not have a λ ≥ 0 constraint, since for any λi, λi (ai1 x1 + ... + ain xn - bi) is a lower bound for I_0(ai1 x1 + ... + ain xn - bi).
40 / 183

Explanations of dual variables
41 / 183

Two explanations of dual variables y: L. Kantorovich vs. T. Koopmans
1. Price interpretation: constrained optimization plays an important role in economics. Dual variables are also called shadow prices (by T. Koopmans), i.e. the instantaneous change in the objective function when a constraint is relaxed, or the marginal cost of strengthening a constraint.
2. Lagrangian multiplier interpretation: dual variables are essentially Lagrangian multipliers, which describe the effect of the constraints on the objective function (by L. Kantorovich). For example, they describe how much the optimal objective value changes when bi increases to bi + Δbi. In fact, we have ∂L(x, λ)/∂bi = λi.
42 / 183

Explanation of dual variables y: using Diet as an example
Optimal solution to the primal problem with b1 = 2000, b2 = 55, b3 = 800: x* = (14.24, 2.70, 0, 0), c^T x* = 67.096.
Optimal solution to the dual problem: y* = (0.0269, 0, 0.0164), y*^T b = 67.096.
Let's make a slight change to b and watch the effect on min c^T x:
1. b = (2001, 55, 800): min c^T x = 67.123 (note that y1 = 0.0269 ≈ 67.123 - 67.096)
2. b = (2000, 56, 800): min c^T x = 67.096 (note that y2 = 0 = 67.096 - 67.096)
3. b = (2000, 55, 801): min c^T x = 67.112 (note that y3 = 0.0164 ≈ 67.112 - 67.096)
43 / 183
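These sensitivity numbers can be reproduced by re-solving the perturbed LPs. A sketch assuming SciPy is available: bump each bi by 1 and compare the change in the optimum with the corresponding dual variable yi.

```python
from scipy.optimize import linprog

c = [3, 9, 20, 19]
A = [[110, 160, 420, 260], [4, 8, 4, 14], [2, 285, 22, 80]]

def diet_opt(b):
    # min c^T x s.t. Ax >= b, x >= 0 (rows negated for linprog's <= form)
    res = linprog(c, A_ub=[[-a for a in row] for row in A],
                  b_ub=[-bi for bi in b], bounds=[(0, None)] * 4)
    return res.fun

base = diet_opt([2000, 55, 800])          # ~67.096
y = (0.0269, 0.0, 0.0164)                 # optimal dual solution from above
for i, bumped in enumerate(([2001, 55, 800],
                            [2000, 56, 800],
                            [2000, 55, 801])):
    delta = diet_opt(bumped) - base
    print(i + 1, round(delta, 4), y[i])   # each change matches y_i
```

The protein constraint is slack at the optimum, so increasing b2 costs nothing; its shadow price is 0.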

Property of the Lagrangian dual problem: weak duality
44 / 183

Weak duality
Let p* and d* denote the optimal objective values of a primal problem and its dual problem, respectively. We always have d* ≤ p*, regardless of non-convexity of the primal problem. The difference p* - d* is called the duality gap.
Weak duality holds even if p* = -∞, in which case d* = -∞ as well, meaning that the dual problem is infeasible. Similarly, if d* = +∞, the primal problem is infeasible.
As dual problems are always convex, it is relatively easy to calculate d* efficiently and thus obtain a lower bound for p*.
45 / 183

An example of non-zero duality gap
Consider the following non-convex optimization problem:
  min  x1 x2
  s.t. x1 ≥ 0, x2 ≥ 0, x1² + x2² ≤ 1
Lagrange dual function:
  g(λ) = inf_{x ∈ D} ( x1 x2 - λ1 x1 - λ2 x2 + λ3 (x1² + x2² - 1) )
Dual problem:
  max g(λ)  s.t. λ1 ≥ 0, λ2 ≥ 0, λ3 ≥ 1/2
Duality gap: p* = 0 > d* = -1/2.
46 / 183
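A brute-force check of the gap (pure Python): the primal optimum over the feasible quarter-disk is 0, while the dual value at λ = (0, 0, 1/2) is already -1/2.

```python
n = 100
grid = [i / n for i in range(-2 * n, 2 * n + 1)]   # [-2, 2], step 0.01

# Primal: min x1*x2 over x1 >= 0, x2 >= 0, x1^2 + x2^2 <= 1.
p_star = min(x1 * x2 for x1 in grid for x2 in grid
             if x1 >= 0 and x2 >= 0 and x1 * x1 + x2 * x2 <= 1)

# Dual value at lam = (0, 0, 1/2):
#   g(0, 0, 1/2) = inf_x x1*x2 + (x1^2 + x2^2 - 1)/2
#                = inf_x (x1 + x2)^2 / 2 - 1/2  =  -1/2
g_val = min(x1 * x2 + 0.5 * (x1 * x1 + x2 * x2 - 1)
            for x1 in grid for x2 in grid)

print(p_star, round(g_val, 6))   # 0.0 -0.5 -- a duality gap of 1/2
```

The infimum defining g ranges over all of R², not just the feasible set, which is why the dual bound can sit strictly below p* for this non-convex problem.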

Property of the Lagrangian dual problem: strong duality
47 / 183

Strong duality
Strong duality holds if p* = d*, i.e., the duality gap is 0. Strong duality does not necessarily hold for every optimization problem, but it almost always holds for convex ones, i.e.
  min  f0(x)
  s.t. fi(x) ≤ 0,  i = 1, ..., m
       Ax = b
where fi(x) (i = 0, 1, ..., m) are convex functions.
The conditions that guarantee strong duality are called regularity conditions; one of them is Slater's condition.
48 / 183

Slater's condition
Slater's condition: consider a convex optimization problem. Strong duality holds if there exists a vector x ∈ relint D such that
  fi(x) < 0, i = 1, ..., m,  and  Ax = b.
Suppose the first k constraints are affine; then Slater's condition relaxes to: there exists a vector x ∈ relint D such that
  fi(x) ≤ 0, i = 1, ..., k,   fi(x) < 0, i = k + 1, ..., m,  and  Ax = b.
In particular, if all constraints are affine, then the original constraints themselves form Slater's condition.
Please refer to the Appendix for the proof of Slater's theorem.
49 / 183

Conditions of optimal solutions: KKT conditions
50 / 183

Three types of optimization problems
It is relatively easy to optimize an objective function without any constraints, say:
  min f0(x)
But how to optimize an objective function with equality constraints?
  min f0(x)  s.t. hi(x) = 0, i = 1, 2, ..., p
And how to optimize an objective function with inequality constraints?
  min f0(x)  s.t. fi(x) ≤ 0, i = 1, 2, ..., m;  hi(x) = 0, i = 1, 2, ..., p
51 / 183

Optimization problem over equality constraints
Consider the following optimization problem:
  min f0(x, y)  s.t. h(x, y) = 0
Intuition: suppose (x*, y*) is the optimum point. Then at (x*, y*), f0(x, y) does not change when walking along the curve h(x, y) = 0; otherwise, we could follow the curve to make f0(x, y) smaller, meaning that the starting point (x*, y*) is not optimal.
(Figure: the curve h(x, y) = 0 with its gradient ∇h(x, y), and a contour f0(x, y) = 1 with its gradient ∇f0(x, y), meeting at (x*, y*).)
52 / 183

(Figure: at (x*, y*), the contour of f0 tangentially touches the curve h(x, y) = 0.)
So at (x*, y*) the red curve tangentially touches a blue contour, i.e. there exists a real λ such that:
  ∇f0(x*, y*) = λ ∇h(x*, y*)
Lagrange must have cleverly noticed that the equation above looks like the partial derivatives of some larger scalar function:
  L(x, y, λ) = f0(x, y) - λ h(x, y)
Necessary condition for an optimum point: if (x*, y*) is a local optimum, then there exists a λ such that ∇L(x*, y*, λ) = 0.
53 / 183

Understanding the Lagrangian function
Lagrangian function: a combination of the original optimization objective function and the constraints:
  L(x, y, λ) = f0(x, y) - λ h(x, y)
The critical points of the Lagrangian L(x, y, λ) occur at saddle points rather than at local minima (or maxima). Thus, to utilize numerical optimization techniques, we must first transform the problem so that the critical points lie at local minima; this is done by calculating the magnitude of the gradient of the Lagrangian.
54 / 183

Optimization problems over inequality constraints
Consider the following optimization problem:
  min f0(x, y)  s.t. f1(x, y) ≤ 0
Figure: two cases.
Case 1: the optimum point (x*, y*) lies on the curve f1(x, y) = 0; then the Lagrangian condition ∇L(x*, y*, λ) = 0 applies.
Case 2: (x*, y*) lies within the interior region f1(x, y) < 0; then we have ∇f0(x*, y*) = 0 at (x*, y*).
55 / 183

Complementary slackness
These two cases can be summarized as the following two conditions:
  (Stationary point) ∇L(x*, y*, λ) = 0
  (Complementary slackness) λ f1(x*, y*) = 0
Reason: in Case 2, f1(x*, y*) < 0 forces λ = 0 by complementary slackness; we then recover ∇f0(x*, y*) = 0 from ∇L(x*, y*, λ) = 0.
Complementary slackness, also called orthogonality by Gomory, is essentially equivalent to strong duality for convex optimization problems. A relaxation of this condition, λi fi(x*) = -μ, where μ is a small positive number, is used in the interior point method.
56 / 183

Proof of complementary slackness
Assume that strong duality holds, i.e., p* = d*. Let x* and (λ*, ν*), λ* ≥ 0, denote optimal solutions to the primal and dual problems, respectively. We have:
  f0(x*) = g(λ*, ν*)
         = inf_{x ∈ D} ( f0(x) + Σ_{i=1}^{m} λi* fi(x) + Σ_{i=1}^{p} νi* hi(x) )
         ≤ f0(x*) + Σ_{i=1}^{m} λi* fi(x*) + Σ_{i=1}^{p} νi* hi(x*)
         ≤ f0(x*)
Thus the last two inequalities turn into equalities, which implies that the Lagrangian function L(x, λ*, ν*) reaches its minimum at x*, and that Σ_{i=1}^{m} λi* fi(x*) = 0. Note that each λi* fi(x*) ≤ 0. Hence λi* fi(x*) = 0 for i = 1, ..., m.
57 / 183

KKT conditions for general optimization problems I
Consider the following constrained optimization problem (might be non-convex):
  min  f0(x)
  s.t. fi(x) ≤ 0,  i = 1, ..., m
       hi(x) = 0,  i = 1, ..., p
Lagrangian function:
  L(x, λ, ν) = f0(x) + Σ_{i=1}^{m} λi fi(x) + Σ_{i=1}^{p} νi hi(x)
Let x* and (λ*, ν*) denote the optimal solutions to the primal and dual problems, respectively, and suppose that strong duality holds.
58 / 183

KKT conditions for general optimization problems II
Since the Lagrangian function reaches its minimum at x*, its gradient is 0 at x*, i.e.,
  ∇f0(x*) + Σ_{i=1}^{m} λi* ∇fi(x*) + Σ_{i=1}^{p} νi* ∇hi(x*) = 0
Then x* and (λ*, ν*) satisfy the following KKT conditions:
1. (Stationary point) ∇L(x*, λ*, ν*) = 0
2. (Primal feasibility) fi(x*) ≤ 0, i = 1, ..., m; hi(x*) = 0, i = 1, ..., p
3. (Dual feasibility) λi* ≥ 0, i = 1, ..., m
4. (Complementary slackness) λi* fi(x*) = 0, i = 1, ..., m
59 / 183
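As a minimal concrete instance, the four conditions can be checked on the earlier toy problem min x² - 2x s.t. x ≤ 0 (pure Python; the pair x* = 0, λ* = 2 comes from solving the stationarity equation 2x - 2 + λ = 0 together with x = 0):

```python
# problem: min f0(x) = x^2 - 2x   s.t.   f1(x) = x <= 0
x_star, lam_star = 0.0, 2.0        # claimed primal/dual optimal pair

grad_f0 = 2 * x_star - 2           # f0'(x*) = 2x - 2
grad_f1 = 1.0                      # f1'(x)  = 1

stationary      = (grad_f0 + lam_star * grad_f1 == 0)  # d/dx L(x*, lam*) = 0
primal_feasible = (x_star <= 0)
dual_feasible   = (lam_star >= 0)
complementary   = (lam_star * x_star == 0)

print(stationary, primal_feasible, dual_feasible, complementary)
# True True True True -- all four KKT conditions hold
```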

KKT conditions for convex problems I
Consider the following convex optimization problem:
  min  f0(x)
  s.t. fi(x) ≤ 0,  i = 1, ..., m
       Ax = b
Lagrangian function:
  L(x, λ, ν) = f0(x) + Σ_{i=1}^{m} λi fi(x) + ν^T (Ax - b)
60 / 183

KKT conditions for convex problems II
x* and (λ*, ν*) are optimal solutions to the primal and dual problems, respectively, if they satisfy the following KKT conditions:
1. (Stationary point) ∇L(x*, λ*, ν*) = 0
2. (Primal feasibility) fi(x*) ≤ 0, i = 1, ..., m; Ax* = b
3. (Dual feasibility) λi* ≥ 0, i = 1, ..., m
4. (Complementary slackness) λi* fi(x*) = 0, i = 1, ..., m
The KKT conditions, named after William Karush, Harold W. Kuhn, and Albert W. Tucker, are usually not solved directly in optimization; instead, iterative successive approximation is most often used to find solutions that satisfy them.
61 / 183

Four properties of duality for linear programs
62 / 183

Property 1: the primal is the dual of the dual
Theorem. For a linear program, the primal problem is the dual of its dual.
For a general optimization problem, the dual of the dual is not always the primal problem, but it is a convex relaxation of the primal problem.
63 / 183

Property 2: weak duality
Theorem (Weak duality). The objective value of any feasible solution to the dual problem is a lower bound on the objective value of the primal problem.
This property is easy to prove, as the Lagrangian function connects the primal objective function and the dual objective function.
64 / 183

An example: the Diet problem and its dual
Primal problem P:
  min  3x1 + 9x2 + 20x3 + 19x4                  (money)
  s.t. 110x1 + 160x2 + 420x3 + 260x4 ≥ 2000     (energy)
       4x1 + 8x2 + 4x3 + 14x4 ≥ 55              (protein)
       2x1 + 285x2 + 22x3 + 80x4 ≥ 800          (calcium)
       x1, x2, x3, x4 ≥ 0
Feasible solution x = (0, 8, 2, 0): c^T x = 112.
Dual problem D:
  max  2000y1 + 55y2 + 800y3        (money)
  s.t. 110y1 + 4y2 + 2y3 ≤ 3        (oatmeal)
       160y1 + 8y2 + 285y3 ≤ 9      (milk)
       420y1 + 4y2 + 22y3 ≤ 20      (pie)
       260y1 + 14y2 + 80y3 ≤ 19     (pork & beans)
       y1, y2, y3 ≥ 0
Feasible solution y = (0.0269, 0, 0.0164): y^T b = 67.096.
The theorem states that c^T x ≥ y^T b for any feasible solutions x and y.
65 / 183

Proof
Consider the following primal problem:
  min c^T x  s.t. Ax ≥ b, x ≥ 0
and dual problem:
  max b^T y  s.t. A^T y ≤ c, y ≥ 0
Let x and y denote feasible solutions to the primal and dual problems, respectively. We have
  c^T x ≥ y^T A x   (by the feasibility of the dual problem, i.e., y^T A ≤ c^T, and x ≥ 0)
Therefore
  c^T x ≥ y^T A x ≥ y^T b   (by the feasibility of the primal problem, i.e., Ax ≥ b, and y ≥ 0)
66 / 183

Property 3: strong duality
Theorem (Strong duality). Consider a linear program. If the primal problem has an optimal solution, then the dual problem also has an optimal solution with the same objective value.
Proof. Suppose x* = [B^{-1} b; 0] is the optimal solution to the primal problem, where B is the optimal basis. We have c^T - c_B^T B^{-1} A ≥ 0.
Let's set y^T = c_B^T B^{-1}. Then y^T A = c_B^T B^{-1} A ≤ c^T, so y is feasible for the dual problem. We will show that y is optimal: in fact, we have y^T b = c_B^T B^{-1} b = c^T x*. That is, y^T b reaches its upper bound, so y is an optimal solution to the dual problem.
67 / 183

Property 4: complementary slackness
Theorem. Let x and y denote feasible solutions to the primal and dual problems, respectively. Then x and y are both optimal iff
  ui = yi (ai1 x1 + ai2 x2 + ... + ain xn - bi) = 0 for every 1 ≤ i ≤ m, and
  vj = (cj - a1j y1 - a2j y2 - ... - amj ym) xj = 0 for every 1 ≤ j ≤ n.
Intuition: if a constraint of the primal problem is slack (holds with strict inequality), then the corresponding dual variable is 0.
An example: the optimal solutions to Diet and its dual are x* = (14.244, 2.707, 0, 0) and y* = (0.0269, 0, 0.0164):
  110x1 + 160x2 + 420x3 + 260x4 = 2000
  4x1 + 8x2 + 4x3 + 14x4 > 55   ⟹  y2 = 0
  2x1 + 285x2 + 22x3 + 80x4 = 800
  x1, x2, x3, x4 ≥ 0
68 / 183
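A pure-Python verification for the Diet problem: recover the exact primal and dual optima from the tight constraints (a 2×2 linear system each, solved by Cramer's rule), then check that every product ui and vj vanishes.

```python
def solve2(a, b, c, d, e, f):
    """Solve a*x + b*y = e, c*x + d*y = f by Cramer's rule."""
    det = a * d - b * c
    return ((e * d - b * f) / det, (a * f - e * c) / det)

# Primal optimum: energy and calcium constraints are tight, x3 = x4 = 0.
x1, x2 = solve2(110, 160, 2, 285, 2000, 800)
x = [x1, x2, 0.0, 0.0]
# Dual optimum: oatmeal and milk constraints are tight, y2 = 0.
y1, y3 = solve2(110, 2, 160, 285, 3, 9)
y = [y1, 0.0, y3]

A = [[110, 160, 420, 260], [4, 8, 4, 14], [2, 285, 22, 80]]
b = [2000, 55, 800]
c = [3, 9, 20, 19]

# u_i pairs a dual variable with its primal constraint's slack;
# v_j pairs a primal variable with its dual constraint's slack.
u = [y[i] * (sum(A[i][j] * x[j] for j in range(4)) - b[i]) for i in range(3)]
v = [(c[j] - sum(A[i][j] * y[i] for i in range(3))) * x[j] for j in range(4)]
assert all(abs(t) < 1e-9 for t in u + v)
print("complementary slackness holds at the optimum")
```

Either factor of each product is zero: every slack primal constraint has a zero dual variable, and every positive primal variable has a tight dual constraint.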

Proof
Proof. ui = 0 and vj = 0 for all i and j
  ⟺ Σi ui = 0 and Σj vj = 0   (since ui ≥ 0, vj ≥ 0)
  ⟺ Σi ui + Σj vj = 0
  ⟺ (y^T A x - y^T b) + (c^T x - y^T A x) = 0
  ⟺ y^T b = c^T x
  ⟺ y and x are optimal solutions (by the strong duality property, i.e., both y^T b and c^T x reach their bound).
69 / 183

Summary: 9 cases of primal and dual problems

Example 1: Primal has unbounded objective value and Dual is infeasible

Primal:
  min  −2x_1 − 2x_2
  s.t. x_1 − x_2 ≥ −1
       −x_1 + x_2 ≥ −1
       x_1, x_2 ≥ 0

Dual:
  max  −y_1 − y_2
  s.t. y_1 ≥ 0
       y_2 ≥ 0
       y_1 − y_2 ≤ −2
       −y_1 + y_2 ≤ −2

(The primal is unbounded along x_1 = x_2 = t; summing the two dual constraints gives 0 ≤ −4, so the dual is infeasible.)

Example 2: both Primal and Dual are infeasible

Primal:
  min  x_1 − 2x_2
  s.t. x_1 − x_2 ≥ 2
       −x_1 + x_2 ≥ −1
       x_1, x_2 ≥ 0

Dual:
  max  2y_1 − y_2
  s.t. y_1 ≥ 0
       y_2 ≥ 0
       y_1 − y_2 ≤ 1
       −y_1 + y_2 ≤ −2

(Summing the primal constraints gives 0 ≥ 1; summing the dual constraints gives 0 ≤ −1.)

Solving linear program using duality

KKT conditions for linear program

Consider a linear program in slack form:
  min c^T x  s.t. Ax = b, x ≥ 0

The KKT conditions turn into: x and y are optimal solutions to the primal and dual problems, respectively, iff they satisfy the following three conditions:
 1. (Primal feasibility) Ax = b, x ≥ 0
 2. (Dual feasibility) y^T A ≤ c^T
 3. (Complementary slackness) c^T x = y^T b

Question: how to obtain x and y that satisfy these three conditions simultaneously?
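The three conditions are mechanical to check. Below is a minimal Python checker, applied to a tiny instance invented for illustration (min x_1 + 2x_2 s.t. x_1 + x_2 = 1; it is not from the slides):

```python
def is_optimal_pair(A, b, c, x, y, tol=1e-9):
    """Check the three KKT conditions for min c^T x s.t. Ax = b, x >= 0."""
    m, n = len(A), len(A[0])
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    primal = all(abs(dot(A[i], x) - b[i]) < tol for i in range(m)) and min(x) >= -tol
    dual = all(dot([A[i][j] for i in range(m)], y) <= c[j] + tol for j in range(n))
    slack = abs(dot(c, x) - dot(b, y)) < tol   # c^T x = y^T b
    return primal and dual and slack

# Illustrative instance: min x1 + 2*x2 s.t. x1 + x2 = 1, x >= 0.
# Optimal pair: x = (1, 0) with y = 1 (dual: max y s.t. y <= 1, y <= 2).
assert is_optimal_pair([[1, 1]], [1], [1, 2], [1, 0], [1])
# x = (0, 1) is primal feasible but the objectives differ (2 vs. 1), so the check fails.
assert not is_optimal_pair([[1, 1]], [1], [1, 2], [0, 1], [1])
```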

Strategy 1

 1: x = x_0;  // Initialize x with a primal feasible solution
 2: y = y_0;  // Calculate initial y according to complementary slackness
 3: while TRUE do
 4:   x = Improve(x);
 5:   // Improve x towards dual feasibility of the corresponding y. Throughout this process, primal feasibility is maintained and y is recalculated according to complementary slackness
 6:   if stopping(x) then
 7:     break;  // If y is dual feasible then stop
 8:   end if
 9: end while
10: return x;

Example: Primal simplex

Strategy 2

 1: y = y_0;  // Initialize y with a dual feasible solution
 2: x = x_0;  // Calculate initial x according to complementary slackness
 3: while TRUE do
 4:   y = Improve(y);
 5:   // Improve y towards primal feasibility of the corresponding x. Throughout this process, dual feasibility is maintained and x is recalculated according to complementary slackness
 6:   if stopping(y) then
 7:     break;  // If x is primal feasible then stop
 8:   end if
 9: end while
10: return y;

Examples: Dual simplex, primal-dual method

Strategy 3

 1: x = x_0;  // Initialize x with a primal feasible solution
 2: y = y_0;  // Initialize y with a dual feasible solution
 3: while TRUE do
 4:   (x, y) = Improve(x, y);  // Improve x and y towards complementary slackness
 5:   if stopping(x, y) then
 6:     break;  // If x and y satisfy complementary slackness then stop
 7:   end if
 8: end while
 9: return y;

Example: Interior point method

Dual simplex method

Revisiting primal simplex: An example

  min  −x_3 + x_4
  s.t. x_1 − x_3 + 2x_4 = 2
       x_2 + 3x_3 − 2x_4 = 6
       x_1, x_2, x_3, x_4 ≥ 0

Initial tableau (basis {x_1, x_2}):

                  |  x_1 |  x_2 |  x_3 |  x_4
  −z = 0          | c̄_1 = 0 | c̄_2 = 0 | c̄_3 = −1 | c̄_4 = 1
  x_B1 = b̄_1 = 2 |   1  |   0  |  −1  |   2
  x_B2 = b̄_2 = 6 |   0  |   1  |   3  |  −2

(Figure: the feasible region in the (x_3, x_4) plane, bounded by −x_3 + 2x_4 = 2 and 3x_3 − 2x_4 = 6.)

Step 2

(Same LP as on the previous slide.)

Tableau after one pivot (x_3 enters, x_2 leaves; basis {x_1, x_3}):

                  |  x_1 |  x_2 |  x_3 |  x_4
  −z = 2          | c̄_1 = 0 | c̄_2 = 1/3 | c̄_3 = 0 | c̄_4 = 1/3
  x_B1 = b̄_1 = 4 |   1  |  1/3 |   0  |  4/3
  x_B2 = b̄_2 = 2 |   0  |  1/3 |   1  | −2/3

All reduced costs are now non-negative, so x = (4, 0, 2, 0) with z = −2 is optimal.

(Figure: the same feasible region; the current vertex is (x_3, x_4) = (2, 0).)

The viewpoint of the dual problem

Primal problem P:
  min  −x_3 + x_4
  s.t. x_1 − x_3 + 2x_4 = 2
       x_2 + 3x_3 − 2x_4 = 6
       x_1, x_2, x_3, x_4 ≥ 0

Dual problem D:
  max  2y_1 + 6y_2
  s.t. y_1 ≤ 0
       y_2 ≤ 0
       −y_1 + 3y_2 ≤ −1
       2y_1 − 2y_2 ≤ 1

(Figure: left, the primal feasible region in the (x_3, x_4) plane; right, the dual feasible region in the (y_1, y_2) plane, bounded by −y_1 + 3y_2 = −1 and 2y_1 − 2y_2 = 1.)

Set primal and dual solutions according to a basis

Let's consider a linear program in slack form and its dual problem:
  Primal problem: min c^T x  s.t. Ax = b, x ≥ 0
  Dual problem:   max b^T y  s.t. A^T y ≤ c

From each basis B of A, we can set a primal solution x and a dual solution y:
  Primal solution: x = [B^{-1} b; 0]. x is feasible iff B^{-1} b ≥ 0.
  Dual solution: y^T = c_B^T B^{-1}. y is feasible iff y^T A ≤ c^T, i.e., c̄^T = c^T − c_B^T B^{-1} A = c^T − y^T A ≥ 0.

Note that with this setting, complementary slackness follows automatically, since c^T x = c_B^T B^{-1} b = y^T b.
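The two formulas can be evaluated directly on the example LP from the neighboring slides (min −x_3 + x_4 in slack form). The basis matrix here is only 2x2, so its inverse can be written by hand; this is an illustrative sketch, not library code:

```python
from fractions import Fraction as F

# Slack-form LP from the neighboring slides:
#   min -x3 + x4  s.t.  x1 - x3 + 2*x4 = 2,  x2 + 3*x3 - 2*x4 = 6,  x >= 0
A = [[F(1), F(0), F(-1), F(2)],
     [F(0), F(1), F(3), F(-2)]]
b = [F(2), F(6)]
c = [F(0), F(0), F(-1), F(1)]

def basis_solutions(j1, j2):
    """Return (x_B, y) with x_B = B^{-1} b and y^T = c_B^T B^{-1}
    for the basis made of columns (a_{j1}, a_{j2})."""
    p, q = A[0][j1], A[0][j2]
    r, s = A[1][j1], A[1][j2]
    det = p * s - q * r
    Binv = [[s / det, -q / det], [-r / det, p / det]]   # inverse of B = [[p,q],[r,s]]
    xB = [Binv[0][0] * b[0] + Binv[0][1] * b[1],
          Binv[1][0] * b[0] + Binv[1][1] * b[1]]
    y = [c[j1] * Binv[0][0] + c[j2] * Binv[1][0],
         c[j1] * Binv[0][1] + c[j2] * Binv[1][1]]
    return xB, y

# Basis {a_1, a_2} (the identity): primal feasible, but y = (0, 0) is dual infeasible.
xB, y = basis_solutions(0, 1)
assert xB == [2, 6] and y == [0, 0]

# Basis {a_1, a_3}: x_B = (4, 2), y = (0, -1/3), and c^T x = y^T b = -2.
xB, y = basis_solutions(0, 2)
assert xB == [4, 2] and y == [0, F(-1, 3)]
assert y[0] * b[0] + y[1] * b[1] == -2
```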

Primal feasible basis

Consider the primal problem:
  min  −x_3 + x_4
  s.t. x_1 − x_3 + 2x_4 = 2
       x_2 + 3x_3 − 2x_4 = 6
       x_1, x_2, x_3, x_4 ≥ 0

Tableau (basis {x_1, x_2}):

                  |  x_1 |  x_2 |  x_3 |  x_4
  −z = 0          | c̄_1 = 0 | c̄_2 = 0 | c̄_3 = −1 | c̄_4 = 1
  x_B1 = b̄_1 = 2 |   1  |   0  |  −1  |   2
  x_B2 = b̄_2 = 6 |   0  |   1  |   3  |  −2

Primal variables: x; feasibility requires B^{-1} b ≥ 0.
A basis B is called primal feasible if all elements of B^{-1} b (the first column of the tableau, except for the −z entry) are non-negative.

Dual feasible basis

Now let's consider the dual problem:
  max  2y_1 + 6y_2
  s.t. y_1, y_2 ≤ 0
       −y_1 + 3y_2 ≤ −1
       2y_1 − 2y_2 ≤ 1

Consider the primal simplex tableau again:

                  |  x_1 |  x_2 |  x_3 |  x_4
  −z = 0          | c̄_1 = 0 | c̄_2 = 0 | c̄_3 = −1 | c̄_4 = 1
  x_B1 = b̄_1 = 2 |   1  |   0  |  −1  |   2
  x_B2 = b̄_2 = 6 |   0  |   1  |   3  |  −2

Dual variables: y^T = c_B^T B^{-1}; feasibility requires y^T A ≤ c^T.
A basis B is called dual feasible if all elements of c̄^T = c^T − c_B^T B^{-1} A = c^T − y^T A (the first row of the tableau, except for the −z entry) are non-negative.

Another viewpoint of the primal simplex algorithm

Thus the primal simplex algorithm can also be described as follows:
 1. Starting point: the algorithm starts with a primal basic feasible solution (the first column of the simplex tableau satisfies x_B = B^{-1} b ≥ 0). Set the dual variables y^T = c_B^T B^{-1}.
 2. Improvement: by pivoting the basis, we move towards dual feasibility, i.e., the first row of the tableau satisfies c̄^T = c^T − c_B^T B^{-1} A = c^T − y^T A ≥ 0. Here, the improvement of dual feasibility is achieved by selecting a negative element in the first row in the pivoting process. Throughout the process we maintain primal feasibility and complementary slackness, i.e., c^T x = c_B^T B^{-1} b = y^T b.
 3. Stopping criterion: c̄^T = c^T − c_B^T B^{-1} A ≥ 0, i.e., y^T A ≤ c^T. In other words, the iteration ends when the basis is both primal feasible and dual feasible.

Another viewpoint of primal simplex: Step 1

  min  −x_3 + x_4
  s.t. x_1 − x_3 + 2x_4 = 2
       x_2 + 3x_3 − 2x_4 = 6
       x_1, x_2, x_3, x_4 ≥ 0

Tableau (basis {x_1, x_2}):

                  |  x_1 |  x_2 |  x_3 |  x_4
  −z = 0          | c̄_1 = 0 | c̄_2 = 0 | c̄_3 = −1 | c̄_4 = 1
  x_B1 = b̄_1 = 2 |   1  |   0  |  −1  |   2
  x_B2 = b̄_2 = 6 |   0  |   1  |   3  |  −2

Dual solution: y^T = c_B^T B^{-1} = [0, 0], which is infeasible (the constraint −y_1 + 3y_2 ≤ −1 is violated).

(Figure: the primal feasible region in the (x_3, x_4) plane and the dual feasible region in the (y_1, y_2) plane.)

Another viewpoint of primal simplex: Step 2

(Same LP as on the previous slide.)

Tableau (basis {x_1, x_3}):

                  |  x_1 |  x_2 |  x_3 |  x_4
  −z = 2          | c̄_1 = 0 | c̄_2 = 1/3 | c̄_3 = 0 | c̄_4 = 1/3
  x_B1 = b̄_1 = 4 |   1  |  1/3 |   0  |  4/3
  x_B2 = b̄_2 = 2 |   0  |  1/3 |   1  | −2/3

Dual solution: y^T = c_B^T B^{-1} = [0, −1/3], which is feasible. The basis is now both primal and dual feasible, so it is optimal.

(Figure: the primal and dual feasible regions, with the optimal points marked.)

Dual simplex works in just the opposite fashion

Dual simplex:
 1. Starting point: the algorithm starts with a dual basic feasible solution y^T = c_B^T B^{-1} such that y^T A ≤ c^T, i.e., the first row of the simplex tableau satisfies c̄^T = c^T − c_B^T B^{-1} A = c^T − y^T A ≥ 0. Set the primal variables x_B = B^{-1} b and x_N = 0.
 2. Improvement: by pivoting the basis, we move towards primal feasibility, i.e., the first column of the tableau satisfies B^{-1} b ≥ 0. Here, the improvement of primal feasibility is achieved by selecting a negative element from the first column in the pivoting process. Throughout the process we maintain dual feasibility and complementary slackness, i.e., c^T x = c_B^T B^{-1} b = y^T b.
 3. Stopping criterion: x_B = B^{-1} b ≥ 0. In other words, the iteration ends when the basis is both primal feasible and dual feasible.

Primal simplex vs. dual simplex

Both primal simplex and dual simplex terminate at the same condition: the basis is primal feasible and dual feasible simultaneously. However, the final objective is reached in totally opposite fashions: the primal simplex method keeps primal feasibility while the dual simplex method keeps dual feasibility during the pivoting process.

The primal simplex algorithm first selects an entering variable and then determines the leaving variable. In contrast, the dual simplex method does the opposite: it first selects a leaving variable and then determines the entering variable.

Dual simplex(b I, z, A, b, c) 1: //Dual simplex starts with a dual feasible basis. Here, B I contains the indices of the basic variables. 2: while TRUE do 3: if there is no index l (1 l m) has b l < 0 then 4: x =CalculateX(B I, A, b, c); 5: return (x, z); 6: end if; 7: choose an index l having b l < 0 according to a certain rule; 8: for each index j (1 i n) do 9: if a lj < 0 then 10: j = c j a lj ; 11: else 12: j = ; 13: end if 14: end for 15: choose an index e that minimizes j; 16: if e = then 17: return no feasible solution ; 18: end if 19: (B I, A, b, c, z) = Pivot(B I, A, b, c, z, e, l); 20: end while 90 / 183

An example

Standard form:
  min  5x_1 + 35x_2 + 20x_3
  s.t. x_1 − x_2 − x_3 ≤ −2
       −x_1 − 3x_2 ≤ −3
       x_1, x_2, x_3 ≥ 0

Slack form:
  min  5x_1 + 35x_2 + 20x_3
  s.t. x_1 − x_2 − x_3 + x_4 = −2
       −x_1 − 3x_2 + x_5 = −3
       x_1, x_2, x_3, x_4, x_5 ≥ 0

Step 1

                   |  x_1 |  x_2 |  x_3 |  x_4 |  x_5
  −z = 0           | c̄_1 = 5 | c̄_2 = 35 | c̄_3 = 20 | c̄_4 = 0 | c̄_5 = 0
  x_B1 = b̄_1 = −2 |   1  |  −1  |  −1  |   1  |   0
  x_B2 = b̄_2 = −3 |  −1  |  −3  |   0  |   0  |   1

Basis (in blue): B = {a_4, a_5}
Solution: x = [B^{-1} b; 0] = (0, 0, 0, −2, −3), dual feasible (all c̄_j ≥ 0) but primal infeasible.
Pivoting: choose a_5 to leave the basis since b̄_2 = −3 < 0; choose a_1 to enter the basis since min_{j: a_2j < 0} c̄_j / (−a_2j) = c̄_1 / (−a_21).

Step 2

                   |  x_1 |  x_2 |  x_3 |  x_4 |  x_5
  −z = −15         | c̄_1 = 0 | c̄_2 = 20 | c̄_3 = 20 | c̄_4 = 0 | c̄_5 = 5
  x_B1 = b̄_1 = −5 |   0  |  −4  |  −1  |   1  |   1
  x_B2 = b̄_2 = 3  |   1  |   3  |   0  |   0  |  −1

Basis (in blue): B = {a_1, a_4}
Solution: x = [B^{-1} b; 0] = (3, 0, 0, −5, 0).
Pivoting: choose a_4 to leave the basis since b̄_1 = −5 < 0; choose a_2 to enter the basis since min_{j: a_1j < 0} c̄_j / (−a_1j) = c̄_2 / (−a_12).

Step 3

                     |  x_1 |  x_2 |  x_3 |  x_4 |  x_5
  −z = −40           | c̄_1 = 0 | c̄_2 = 0 | c̄_3 = 15 | c̄_4 = 5 | c̄_5 = 10
  x_B1 = b̄_1 = 5/4  |   0  |   1  |  1/4 | −1/4 | −1/4
  x_B2 = b̄_2 = −3/4 |   1  |   0  | −3/4 |  3/4 | −1/4

Basis (in blue): B = {a_1, a_2}
Solution: x = [B^{-1} b; 0] = (−3/4, 5/4, 0, 0, 0).
Pivoting: choose a_1 to leave the basis since b̄_2 = −3/4 < 0; choose a_3 to enter the basis since min_{j: a_2j < 0} c̄_j / (−a_2j) = c̄_3 / (−a_23).

Step 4

                   |  x_1 |  x_2 |  x_3 |  x_4 |  x_5
  −z = −55         | c̄_1 = 20 | c̄_2 = 0 | c̄_3 = 0 | c̄_4 = 20 | c̄_5 = 5
  x_B1 = b̄_1 = 1  |  1/3 |   1  |   0  |   0  | −1/3
  x_B2 = b̄_2 = 1  | −4/3 |   0  |   1  |  −1  |  1/3

Basis (in blue): B = {a_2, a_3}
Solution: x = [B^{-1} b; 0] = (0, 1, 1, 0, 0). The basis is now both primal and dual feasible. Done! Optimal objective z = 55.
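The four tableau steps above can be reproduced with a compact dual simplex routine. This is a sketch under two assumptions that match the example: the initial basis columns form an identity (so the initial reduced costs are just c), and the leaving row is the one with the most negative b̄_l. Exact rational arithmetic avoids rounding issues:

```python
from fractions import Fraction

def dual_simplex(A, b, c, basis):
    """Dual simplex for min c^T x s.t. Ax = b, x >= 0, starting from a dual
    feasible basis whose columns form an identity (so reduced costs = c >= 0).
    Row rule: most negative b-bar. Column rule: dual ratio test."""
    m, n = len(A), len(A[0])
    A = [[Fraction(v) for v in row] for row in A]
    b = [Fraction(v) for v in b]
    cbar = [Fraction(v) for v in c]     # reduced costs (first tableau row)
    neg_z = Fraction(0)                 # the -z entry of the tableau
    while True:
        if min(b) >= 0:                 # primal feasible as well: optimal
            x = [Fraction(0)] * n
            for i, j in enumerate(basis):
                x[j] = b[i]
            return x, -neg_z
        l = min(range(m), key=lambda i: b[i])          # leaving row
        # entering column: minimize cbar_j / (-a_lj) over a_lj < 0
        cand = [(cbar[j] / -A[l][j], j) for j in range(n) if A[l][j] < 0]
        if not cand:
            raise ValueError("no feasible solution")
        _, e = min(cand)
        piv = A[l][e]                   # pivot on (l, e)
        b[l] /= piv
        A[l] = [v / piv for v in A[l]]
        for i in range(m):
            if i != l and A[i][e] != 0:
                f = A[i][e]
                b[i] -= f * b[l]
                A[i] = [A[i][j] - f * A[l][j] for j in range(n)]
        f = cbar[e]
        neg_z -= f * b[l]
        cbar = [cbar[j] - f * A[l][j] for j in range(n)]
        basis[l] = e

# The example above: slack form with initial basis {x_4, x_5}.
x, z = dual_simplex([[1, -1, -1, 1, 0], [-1, -3, 0, 0, 1]],
                    [-2, -3], [5, 35, 20, 0, 0], [3, 4])
assert x == [0, 1, 1, 0, 0] and z == 55
```

Tracing the routine on this instance visits exactly the bases {a_4,a_5}, {a_1,a_4}, {a_1,a_2}, {a_2,a_3} shown in Steps 1 to 4.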

When is the dual simplex method useful?

 1. The dual simplex algorithm is best suited for problems for which an initial dual feasible solution is easily available.
 2. It is particularly useful for reoptimizing a problem after a constraint has been added or some parameters have been changed so that the previously optimal basis is no longer feasible.
 3. Trying dual simplex is also worthwhile if your LP appears to be highly degenerate, i.e., there are many vertices of the feasible region for which the associated basis is degenerate. We may find that a large number of iterations (moves between adjacent vertices) occur with little or no improvement. [1]

[1] References: Operations Research Models and Methods, Paul A. Jensen and Jonathan F. Bard; OR-Notes, J. E. Beasley

Primal-dual method: another strategy to find a pair of primal and dual variables satisfying the KKT conditions

Primal-dual method: a brief history

In 1955, H. Kuhn proposed the Hungarian method for the MaxWeightedMatching problem. This method effectively exploits the duality property of linear programming.
In 1956, G. Dantzig, R. Ford, and D. Fulkerson extended this idea to solve linear programming problems.
In 1957, R. Ford and D. Fulkerson applied this idea to solve the network-flow problem and the Hitchcock problem.
In 1957, J. Munkres applied this idea to solve the transportation problem.

Primal-dual method

Basic idea: it is not easy to find primal variables x and dual variables y that satisfy the KKT conditions simultaneously. A reasonable strategy is to start from a dual feasible y and pose restrictions on x according to complementary slackness. Then a step-by-step improvement procedure is performed towards the primal feasibility of x.

(Diagram: Primal P (x) and Dual D (y); Step 1: y → restricted primal RP (x); Step 2: x → the dual of RP, DRP (ȳ); Step 3: y = y + θȳ.)

Basic idea of the primal-dual method

Primal P:
  min  c_1 x_1 + c_2 x_2 + ... + c_n x_n
  s.t. a_11 x_1 + a_12 x_2 + ... + a_1n x_n = b_1   (y_1)
       a_21 x_1 + a_22 x_2 + ... + a_2n x_n = b_2   (y_2)
       ...
       a_m1 x_1 + a_m2 x_2 + ... + a_mn x_n = b_m   (y_m)
       x_1, x_2, ..., x_n ≥ 0

Dual D:
  max  b_1 y_1 + b_2 y_2 + ... + b_m y_m
  s.t. a_11 y_1 + a_21 y_2 + ... + a_m1 y_m ≤ c_1
       a_12 y_1 + a_22 y_2 + ... + a_m2 y_m ≤ c_2
       ...
       a_1n y_1 + a_2n y_2 + ... + a_mn y_m ≤ c_n

Basic idea: suppose we are given a dual feasible solution y. Let's verify whether y is an optimal solution or not:
 1. If y is an optimal solution to the dual problem D, then the corresponding primal variables x should satisfy a restricted primal problem called RP;
 2. Furthermore, even if y is not optimal, the solution to the dual of RP (called DRP) still provides invaluable information: it can be used to improve y.

Step 1: y → x (I)

Dual problem D:
  max  b_1 y_1 + b_2 y_2 + ... + b_m y_m
  s.t. a_11 y_1 + a_21 y_2 + ... + a_m1 y_m ≤ c_1   ("=" holds ⇒ x_1 ≥ 0 allowed)
       ...
       a_1n y_1 + a_2n y_2 + ... + a_mn y_m ≤ c_n   ("<" holds ⇒ x_n = 0)

y provides information about the corresponding primal variables x:
 1. Given a dual feasible solution y, let's check whether y is an optimal solution or not.
 2. If y is optimal, we have the following restrictions on x:
      a_1i y_1 + a_2i y_2 + ... + a_mi y_m < c_i  ⇒  x_i = 0
    (Reason: complementary slackness. An optimal solution y satisfies (a_1i y_1 + a_2i y_2 + ... + a_mi y_m − c_i) x_i = 0.)
 3. Let's use J to record the indices of the tight constraints, where "=" holds.


Step 1: y → x (III)

 4. Thus the corresponding primal solution x should satisfy the following restricted primal problem (RP):
 5. RP:
      a_11 x_1 + a_12 x_2 + ... + a_1n x_n = b_1
      a_21 x_1 + a_22 x_2 + ... + a_2n x_n = b_2
      ...
      a_m1 x_1 + a_m2 x_2 + ... + a_mn x_n = b_m
      x_i = 0   for i ∉ J
      x_i ≥ 0   for i ∈ J
 6. In other words, the optimality of y is determined via solving RP.

But how to solve RP? (I)

RP:
  a_11 x_1 + a_12 x_2 + ... + a_1n x_n = b_1
  a_21 x_1 + a_22 x_2 + ... + a_2n x_n = b_2
  ...
  a_m1 x_1 + a_m2 x_2 + ... + a_mn x_n = b_m
  x_i = 0   for i ∉ J
  x_i ≥ 0   for i ∈ J

How to solve RP? Recall that a system Ax = b, x ≥ 0 can be solved via solving an extended LP.

But how to solve RP? (II)

RP (extended through introducing artificial variables s_i):
  min  ϵ = s_1 + s_2 + ... + s_m
  s.t. s_1 + a_11 x_1 + ... + a_1n x_n = b_1
       s_2 + a_21 x_1 + ... + a_2n x_n = b_2
       ...
       s_m + a_m1 x_1 + ... + a_mn x_n = b_m
       x_i = 0   for i ∉ J
       x_i ≥ 0   for i ∈ J
       s_i ≥ 0   for all i

 1. If ϵ_OPT = 0, then we find a feasible solution to RP, implying that y is an optimal solution;
 2. If ϵ_OPT > 0, y is not an optimal solution.

Step 2: x → ȳ (I)

Alternatively, we can solve the dual of RP, called DRP:
  max  w = b_1 ȳ_1 + b_2 ȳ_2 + ... + b_m ȳ_m
  s.t. a_11 ȳ_1 + a_21 ȳ_2 + ... + a_m1 ȳ_m ≤ 0
       a_12 ȳ_1 + a_22 ȳ_2 + ... + a_m2 ȳ_m ≤ 0
       ...   (one such constraint for each j ∈ J)
       ȳ_1, ȳ_2, ..., ȳ_m ≤ 1

 1. If w_OPT = 0, y is an optimal solution.
 2. If w_OPT > 0, y is not an optimal solution. However, the optimal solution ȳ still provides useful information: it can be used to improve y.

The difference between DRP and D

Dual problem D:
  max  b_1 y_1 + b_2 y_2 + ... + b_m y_m
  s.t. a_11 y_1 + a_21 y_2 + ... + a_m1 y_m ≤ c_1
       a_12 y_1 + a_22 y_2 + ... + a_m2 y_m ≤ c_2
       ...
       a_1n y_1 + a_2n y_2 + ... + a_mn y_m ≤ c_n

DRP:
  max  w = b_1 ȳ_1 + b_2 ȳ_2 + ... + b_m ȳ_m
  s.t. a_1j ȳ_1 + a_2j ȳ_2 + ... + a_mj ȳ_m ≤ 0   for each j ∈ J
       ȳ_1, ȳ_2, ..., ȳ_m ≤ 1

How to write DRP from D?
  Replace c_j with 0;
  Keep only the |J| constraints with j ∈ J;
  Add the restriction ȳ_1, ȳ_2, ..., ȳ_m ≤ 1 (which keeps DRP bounded).

Step 3: ȳ → y (I)

Why can ȳ be used to improve y? Consider an improved dual solution y' = y + θȳ, θ > 0. We have:

Objective function: since ȳ^T b = w_OPT > 0, we get y'^T b = y^T b + θ w_OPT > y^T b. In other words, y + θȳ is better than y.

Constraints: dual feasibility requires that:
  For any j ∈ J, a_1j ȳ_1 + a_2j ȳ_2 + ... + a_mj ȳ_m ≤ 0. Thus y'^T a_j = y^T a_j + θ ȳ^T a_j ≤ c_j for any θ > 0.
  For any j ∉ J, there are two cases:

Step 3: ȳ → y (II)

 1. For all j ∉ J, a_1j ȳ_1 + a_2j ȳ_2 + ... + a_mj ȳ_m ≤ 0: then y' is feasible for any θ > 0, since for 1 ≤ j ≤ n,
      a_1j y'_1 + a_2j y'_2 + ... + a_mj y'_m
      = a_1j y_1 + a_2j y_2 + ... + a_mj y_m + θ (a_1j ȳ_1 + a_2j ȳ_2 + ... + a_mj ȳ_m)
      ≤ c_j
    Hence the dual problem D is unbounded and the primal problem P is infeasible.
 2. There exists j ∉ J with a_1j ȳ_1 + a_2j ȳ_2 + ... + a_mj ȳ_m > 0: we can safely set
      θ = min over such j of  (c_j − (a_1j y_1 + a_2j y_2 + ... + a_mj y_m)) / (a_1j ȳ_1 + a_2j ȳ_2 + ... + a_mj ȳ_m)  =  (c_j − y^T a_j) / (ȳ^T a_j)
    to guarantee that y'^T a_j = y^T a_j + θ ȳ^T a_j ≤ c_j.

Primal-dual algorithm

 1: Infeasible = "No"; Optimal = "No"; y = y_0;  // y_0 is a feasible solution to the dual problem D
 2: while TRUE do
 3:   find the tight-constraint index set J, and set x_j = 0 for j ∉ J;
 4:   // thus we have a smaller RP
 5:   solve DRP; denote the solution as ȳ;
 6:   if the DRP objective value w_OPT = 0 then
 7:     Optimal = "Yes";
 8:     return y;
 9:   end if
10:   if ȳ^T a_j ≤ 0 for all j ∉ J then
11:     Infeasible = "Yes";
12:     return;
13:   end if
14:   set θ = min { (c_j − y^T a_j) / (ȳ^T a_j) : ȳ^T a_j > 0, j ∉ J };
15:   update y = y + θȳ;
16: end while

Advantages of the primal-dual algorithm

The primal-dual algorithm terminates if an anti-cycling rule is used. (Reason: the objective value y^T b strictly increases when there is no degeneracy.)

Neither RP nor DRP explicitly relies on c. In fact, the information of c is represented in J. This leads to another advantage of the primal-dual technique: RP is usually a purely combinatorial problem. Take ShortestPath as an example: its RP corresponds to a connection problem.

An optimal solution to DRP usually has a combinatorial explanation, especially for graph-theoretic problems.

More and more constraints become tight during the primal-dual process.

Unlike dual simplex, which starts from a dual basic feasible solution, the primal-dual method requires only a dual feasible solution. (See Lecture 10 for a primal-dual algorithm for the MaximumFlow problem.)

ShortestPath: Dijkstra's algorithm is essentially a primal-dual algorithm

ShortestPath problem

INPUT: n cities, and a collection of roads. A road from city i to j has a distance d(i, j). Two specific cities: s and t.
OUTPUT: the shortest path from city s to t.

(Figure: a graph with edges s→u of length 5, s→v of length 8, u→v of length 1, u→t of length 6, and v→t of length 2.)
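Before developing the primal-dual view, it helps to record what Dijkstra's algorithm computes on this instance. A standard heap-based sketch, with the edge list taken from the figure:

```python
import heapq

def dijkstra(graph, s):
    """Single-source shortest-path distances.
    graph maps each node to a list of (neighbor, edge length) pairs."""
    dist = {s: 0}
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

# The example instance: s->u (5), s->v (8), u->v (1), u->t (6), v->t (2).
graph = {"s": [("u", 5), ("v", 8)], "u": [("v", 1), ("t", 6)], "v": [("t", 2)]}
dist = dijkstra(graph, "s")
# The shortest s-t path is s -> u -> v -> t, of length 5 + 1 + 2 = 8.
assert dist == {"s": 0, "u": 5, "v": 6, "t": 8}
```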

ShortestPath problem: Primal problem

(Graph: edges x_1 = (s, u) of length 5, x_2 = (s, v) of length 8, x_3 = (u, v) of length 1, x_4 = (u, t) of length 6, x_5 = (v, t) of length 2.)

Primal problem: set a variable for each road (intuition: x_i = 0/1 indicates whether edge i appears in the shortest path), and each constraint says that when the path enters a node through an edge it leaves through another edge.

  min  5x_1 + 8x_2 + 1x_3 + 6x_4 + 2x_5
  s.t. x_1 + x_2 = 1             (vertex s)
       −x_4 − x_5 = −1           (vertex t)
       −x_1 + x_3 + x_4 = 0      (vertex u)
       −x_2 − x_3 + x_5 = 0      (vertex v)
       x_1, x_2, x_3, x_4, x_5 = 0/1

ShortestPath problem: LP relaxation

Primal problem: relax the 0/1 integer linear program into a linear program; this is justified by the totally unimodular property of the constraint matrix.

  min  5x_1 + 8x_2 + 1x_3 + 6x_4 + 2x_5
  s.t. x_1 + x_2 = 1             (vertex s)
       −x_4 − x_5 = −1           (vertex t)
       −x_1 + x_3 + x_4 = 0      (vertex u)
       −x_2 − x_3 + x_5 = 0      (vertex v)
       x_1, x_2, x_3, x_4, x_5 ≥ 0
       x_1, x_2, x_3, x_4, x_5 ≤ 1

ShortestPath problem: Dual problem

Dual problem: set variables y_i for cities and variables z_i for the constraints x_i ≤ 1.

  max  y_s − y_t + z_1 + z_2 + z_3 + z_4 + z_5
  s.t. y_s − y_u + z_1 ≤ 5
       y_s − y_v + z_2 ≤ 8
       y_u − y_v + z_3 ≤ 1
       −y_t + y_u + z_4 ≤ 6
       −y_t + y_v + z_5 ≤ 2
       z_i ≤ 0,  i = 1, ..., 5

Dual of ShortestPath problem

(Figure: the nodes drawn at heights y_s, y_u, y_v, y_t.)

Dual problem: set a variable for each city. (Intuition: y_i is the height of city i; thus y_s − y_t denotes the height difference between s and t, providing a lower bound on the shortest path length.) Since each z_i ≤ 0 appears with positive sign in the objective, we can set z_i = 0 and omit these variables.

  max  y_s − y_t
  s.t. y_s − y_u ≤ 5       (x_1: edge (s, u))
       y_s − y_v ≤ 8       (x_2: edge (s, v))
       y_u − y_v ≤ 1       (x_3: edge (u, v))
       −y_t + y_u ≤ 6      (x_4: edge (u, t))
       −y_t + y_v ≤ 2      (x_5: edge (v, t))

A simplified version

Dual problem: simplify by setting y_t = 0 (and remove the second constraint, for vertex t, from the primal problem P accordingly).

  max  y_s
  s.t. y_s − y_u ≤ 5   (x_1: edge (s, u))
       y_s − y_v ≤ 8   (x_2: edge (s, v))
       y_u − y_v ≤ 1   (x_3: edge (u, v))
       y_u ≤ 6         (x_4: edge (u, t))
       y_v ≤ 2         (x_5: edge (v, t))

Iteration 1 (I)

Dual feasible solution: y = (y_s, y_u, y_v) = (0, 0, 0). Let's check the constraints in D:
  y_s − y_u < 5  ⇒  x_1 = 0
  y_s − y_v < 8  ⇒  x_2 = 0
  y_u − y_v < 1  ⇒  x_3 = 0
  y_u < 6        ⇒  x_4 = 0
  y_v < 2        ⇒  x_5 = 0

Identifying tight constraints in D: J = ∅, implying that x_1 = x_2 = x_3 = x_4 = x_5 = 0.

RP:
  min  s_1 + s_2 + s_3
  s.t. s_1 + x_1 + x_2 = 1         (node s)
       s_2 − x_1 + x_3 + x_4 = 0   (node u)
       s_3 − x_2 − x_3 + x_5 = 0   (node v)
       s_1, s_2, s_3 ≥ 0
       x_1, x_2, x_3, x_4, x_5 = 0

Iteration 1 (II)

DRP:
  max  ȳ_s
  s.t. ȳ_s ≤ 1, ȳ_u ≤ 1, ȳ_v ≤ 1

Solve DRP using a combinatorial technique: optimal solution ȳ = (1, 0, 0). (Note: the optimal solution is not unique.)

Step length θ:
  θ = min{ (c_1 − y^T a_1)/(ȳ^T a_1), (c_2 − y^T a_2)/(ȳ^T a_2) } = min{5, 8} = 5

Update y: y = y + θȳ = (5, 0, 0).

Iteration 1 (III)

(Figure: heights y_s = 5, y_u = y_v = y_t = 0.)

From the point of view of Dijkstra's algorithm:
  The optimal solution to DRP is ȳ = (1, 0, 0): ȳ_i = 1 exactly on the explored vertex set S = {s} in Dijkstra's algorithm. In fact, DRP is solved via identifying the nodes reachable from s through tight edges.
  The step length θ = min{(c_1 − y^T a_1)/(ȳ^T a_1), (c_2 − y^T a_2)/(ȳ^T a_2)} = min{5, 8} = 5: finding the closest vertex to the nodes in S via comparing all edges going out of S.

Iteration 2 (I)

Dual feasible solution: y = (5, 0, 0). Let's check the constraints in D:
  y_s − y_u = 5
  y_s − y_v < 8  ⇒  x_2 = 0
  y_u − y_v < 1  ⇒  x_3 = 0
  y_u < 6        ⇒  x_4 = 0
  y_v < 2        ⇒  x_5 = 0

Identifying tight constraints in D: J = {1}, implying that x_2 = x_3 = x_4 = x_5 = 0.

RP:
  min  s_1 + s_2 + s_3
  s.t. s_1 + x_1 + x_2 = 1         (node s)
       s_2 − x_1 + x_3 + x_4 = 0   (node u)
       s_3 − x_2 − x_3 + x_5 = 0   (node v)
       s_1, s_2, s_3 ≥ 0
       x_2, x_3, x_4, x_5 = 0

Iteration 2 (II)

DRP:
  max  ȳ_s
  s.t. ȳ_s − ȳ_u ≤ 0
       ȳ_s, ȳ_u, ȳ_v ≤ 1

Solve DRP using a combinatorial technique: optimal solution ȳ = (1, 1, 0). (Note: the optimal solution is not unique.)

Step length θ:
  θ = min{ (c_2 − y^T a_2)/(ȳ^T a_2), (c_3 − y^T a_3)/(ȳ^T a_3), (c_4 − y^T a_4)/(ȳ^T a_4) } = min{3, 1, 6} = 1

Update y: y = y + θȳ = (6, 1, 0).

Iteration 2 (III)

(Figure: heights y_s = 6, y_u = 1, y_v = y_t = 0.)

From the point of view of Dijkstra's algorithm:
  The optimal solution to DRP is ȳ = (1, 1, 0): the explored vertex set S = {s, u} in Dijkstra's algorithm. In fact, DRP is solved via identifying the nodes reachable from s through tight edges.
  The step length θ = min{(c_2 − y^T a_2)/(ȳ^T a_2), (c_3 − y^T a_3)/(ȳ^T a_3), (c_4 − y^T a_4)/(ȳ^T a_4)} = min{3, 1, 6} = 1: finding the closest vertex to the nodes in S via comparing all edges going out of S.

Iteration 3 (I)

Dual feasible solution: y = (6, 1, 0). Let's check the constraints in D:
  y_s − y_u = 5
  y_s − y_v < 8  ⇒  x_2 = 0
  y_u − y_v = 1
  y_u < 6        ⇒  x_4 = 0
  y_v < 2        ⇒  x_5 = 0

Identifying tight constraints in D: J = {1, 3}, implying that x_2 = x_4 = x_5 = 0.

RP:
  min  s_1 + s_2 + s_3
  s.t. s_1 + x_1 + x_2 = 1         (node s)
       s_2 − x_1 + x_3 + x_4 = 0   (node u)
       s_3 − x_2 − x_3 + x_5 = 0   (node v)
       s_1, s_2, s_3 ≥ 0
       x_2, x_4, x_5 = 0

Iteration 3 (II)

DRP:
  max  ȳ_s
  s.t. ȳ_s − ȳ_u ≤ 0
       ȳ_u − ȳ_v ≤ 0
       ȳ_s, ȳ_u, ȳ_v ≤ 1

Solve DRP using a combinatorial technique: optimal solution ȳ = (1, 1, 1). (Note: the optimal solution is not unique.)

Step length θ:
  θ = min{ (c_4 − y^T a_4)/(ȳ^T a_4), (c_5 − y^T a_5)/(ȳ^T a_5) } = min{5, 2} = 2

Update y: y = y + θȳ = (8, 3, 2).

Iteration 3 (III)

(Figure: heights y_s = 8, y_u = 3, y_v = 2, y_t = 0.)

From the point of view of Dijkstra's algorithm:
  The optimal solution to DRP is ȳ = (1, 1, 1): the explored vertex set S = {s, u, v} in Dijkstra's algorithm. In fact, DRP is solved via identifying the nodes reachable from s through tight edges.

Iteration 3 (IV)

  The step length θ = min{(c_4 − y^T a_4)/(ȳ^T a_4), (c_5 − y^T a_5)/(ȳ^T a_5)} = min{5, 2} = 2: finding the closest vertex to the nodes in S via comparing all edges going out of S.

Iteration 4 (I)

Dual feasible solution: y = (8, 3, 2). Let's check the constraints in D:
  y_s − y_u = 5
  y_s − y_v < 8  ⇒  x_2 = 0
  y_u − y_v = 1
  y_u < 6        ⇒  x_4 = 0
  y_v = 2

Identifying tight constraints in D: J = {1, 3, 5}, implying that x_2 = x_4 = 0.

RP:
  min  s_1 + s_2 + s_3
  s.t. s_1 + x_1 + x_2 = 1         (node s)
       s_2 − x_1 + x_3 + x_4 = 0   (node u)
       s_3 − x_2 − x_3 + x_5 = 0   (node v)
       s_1, s_2, s_3 ≥ 0
       x_2, x_4 = 0
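The iterations above can be reproduced in code. The sketch below follows the slides' combinatorial shortcut: DRP is solved by setting ȳ_i = 1 on the set S of nodes reachable from s via tight edges (stopping when t becomes reachable, which means w_OPT = 0), and θ is the minimum ratio over edges leaving S. The edge list and the simplified dual (with y_t fixed to 0) are as in the preceding slides:

```python
from fractions import Fraction

# Edges of the example: (tail, head, length). y_t is fixed to 0.
edges = [("s", "u", 5), ("s", "v", 8), ("u", "v", 1), ("u", "t", 6), ("v", "t", 2)]
nodes = ["s", "u", "v"]                  # remaining dual variables

def column(tail, head):
    """Column a_j of the simplified dual constraint y_tail - y_head <= c_j
    (entries for the eliminated variable y_t simply vanish)."""
    return {n: (1 if n == tail else 0) - (1 if n == head else 0) for n in nodes}

y = {n: Fraction(0) for n in nodes}
trace = []
while True:
    # Step 1: indices J of tight dual constraints.
    tight = [j for j, (t, h, cj) in enumerate(edges)
             if sum(column(t, h)[n] * y[n] for n in nodes) == cj]
    # Step 2: solve DRP combinatorially: S = nodes reachable from s via tight edges.
    S = {"s"}
    changed = True
    while changed:
        changed = False
        for j in tight:
            t, h, _ = edges[j]
            if t in S and h not in S:
                S.add(h)
                changed = True
    if "t" in S:                         # w_OPT = 0: y is optimal
        break
    ybar = {n: Fraction(1 if n in S else 0) for n in nodes}
    # Step 3: step length theta and update y = y + theta * ybar.
    theta = min((cj - sum(column(t, h)[n] * y[n] for n in nodes)) /
                sum(column(t, h)[n] * ybar[n] for n in nodes)
                for j, (t, h, cj) in enumerate(edges)
                if j not in tight and sum(column(t, h)[n] * ybar[n] for n in nodes) > 0)
    for n in nodes:
        y[n] += theta * ybar[n]
    trace.append((y["s"], y["u"], y["v"]))

# The trajectory matches Iterations 1-3, and y_s equals the shortest path length.
assert trace == [(5, 0, 0), (6, 1, 0), (8, 3, 2)]
assert y["s"] == 8
```

The loop is exactly Dijkstra's algorithm in disguise: S grows by one explored vertex per iteration, and θ is the distance to the closest unexplored vertex.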