Introduction to Linear and Combinatorial Optimization (ADM I)


Introduction to Linear and Combinatorial Optimization (ADM I) Rolf Möhring based on the 2011/12 course by Martin Skutella TU Berlin WS 2013/14 1

General Remarks new flavor of ADM I: introduce linear and combinatorial optimization together; to be continued in ADM II Discrete Optimization (SoSe 2014) lectures (Rolf Möhring): Wednesday, 10:15-11:45, MA 041; Thursday, 16:15-17:45, MA 043 exercise session (Torsten Gellert): Friday, 14:15-15:45, MA 041 tutorial sessions (Fabian Wegscheider), t.b.a. on Friday homework: set of problems every week, sometimes programming exercises (details t.b.a.) final oral exam (Modulabschlussprüfung) in February or March alternatively: oral exam for both ADM I & II next summer 2

Outline 1 Introduction 2 Linear Programming Basics 3 The Geometry of Linear Programming 4 The Simplex Method 5 Duality Theory 6 Optimal Trees and Paths 7 Maximum Flow Problems 8 Minimum-Cost Flow Problems 9 NP-Completeness 3

Chapter 0: Introduction 3

Optimization Problems Generic optimization problem Given: set X, function f : X → R Task: find x* ∈ X maximizing (minimizing) f(x*), i.e., f(x*) ≥ f(x) (resp. f(x*) ≤ f(x)) for all x ∈ X. An x* with these properties is called an optimal solution (optimum). Here, X is the set of feasible solutions, f is the objective function. Short form: maximize f(x) subject to x ∈ X, or simply: max{f(x) : x ∈ X}. Problem: Too general to say anything meaningful! 4

Convex Optimization Problems Definition 0.1. Let X ⊆ R^n and f : X → R. a X is convex if for all x, y ∈ X and 0 ≤ λ ≤ 1 it holds that λx + (1−λ)y ∈ X. b f is convex if for all x, y ∈ X and 0 ≤ λ ≤ 1 with λx + (1−λ)y ∈ X it holds that λ f(x) + (1−λ) f(y) ≥ f(λx + (1−λ)y). c If X and f are both convex, then min{f(x) : x ∈ X} is a convex optimization problem. Note: f : X → R is called concave if −f is convex. 5

Local and Global Optimality Definition 0.2. Let X ⊆ R^n and f : X → R. x* ∈ X is a local optimum of the optimization problem min{f(x) : x ∈ X} if there is an ε > 0 such that f(x*) ≤ f(x) for all x ∈ X with ||x* − x||_2 ≤ ε. Theorem 0.3. For a convex optimization problem, every local optimum is a (global) optimum. Proof:... 6

Optimization Problems Considered in this Course: maximize f(x) subject to x ∈ X. X ⊆ R^n a polyhedron, f a linear function: linear optimization problem (in particular convex). X ⊆ Z^n the integer points of a polyhedron, f a linear function: integer linear optimization problem. X related to some combinatorial structure (e.g., graph): combinatorial optimization problem. X finite (but usually huge): discrete optimization problem. 7

Example: Shortest Path Problem Given: directed graph D = (V, A), weight function w : A → R_≥0, start node s ∈ V, destination node t ∈ V. Task: find an s-t-path of minimum weight in D. That is, X = {P ⊆ A : P is an s-t-path in D} and f : X → R is given by f(P) = Σ_{a ∈ P} w(a). Remark. Note that the finite set of feasible solutions X is only implicitly given by D. This holds for all interesting problems in combinatorial optimization! 8

Example: Minimum Spanning Tree (MST) Problem Given: undirected graph G = (V, E), weight function w : E → R_≥0. Task: find a connected subgraph of G containing all nodes in V with minimum total weight. That is, X = {E' ⊆ E : E' connects all nodes in V} and f : X → R is given by f(E') = Σ_{e ∈ E'} w(e). Remarks. Notice that there always exists an optimal solution without cycles. A connected graph without cycles is called a tree. A subgraph of G containing all nodes in V is called spanning. 9

Example: Minimum Cost Flow Problem Given: directed graph D = (V, A), with arc capacities u : A → R_≥0, arc costs c : A → R, and node balances b : V → R. Interpretation: nodes v ∈ V with b(v) > 0 (b(v) < 0) have supply (demand) and are called sources (sinks); the capacity u(a) of arc a ∈ A limits the amount of flow that can be sent through arc a. Task: find a flow x : A → R_≥0 obeying capacities and satisfying all supplies and demands, that is, 0 ≤ x(a) ≤ u(a) for all a ∈ A and Σ_{a ∈ δ^+(v)} x(a) − Σ_{a ∈ δ^-(v)} x(a) = b(v) for all v ∈ V, such that x has minimum cost c(x) := Σ_{a ∈ A} c(a) x(a). 10

Example: Minimum Cost Flow Problem (cont.) Formulation as a linear program (LP): minimize Σ_{a ∈ A} c(a) x(a) (0.1) subject to Σ_{a ∈ δ^+(v)} x(a) − Σ_{a ∈ δ^-(v)} x(a) = b(v) for all v ∈ V, (0.2) x(a) ≤ u(a) for all a ∈ A, (0.3) x(a) ≥ 0 for all a ∈ A. (0.4) Objective function given by (0.1). Set of feasible solutions: X = {x ∈ R^A : x satisfies (0.2), (0.3), and (0.4)}. Notice that (0.1) is a linear function of x and (0.2)-(0.4) are linear equations and linear inequalities, respectively. → linear program 11
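
Illustration (not part of the lecture): a minimal Python sketch that builds the LP (0.1)-(0.4) for a small, made-up network and hands it to scipy.optimize.linprog. The node and arc data are hypothetical.

```python
# Sketch: min-cost flow LP (0.1)-(0.4) on a tiny, hypothetical 4-node network.
import numpy as np
from scipy.optimize import linprog

# arcs as (tail, head, capacity u(a), cost c(a)); balances b: source 0, sink 3
arcs = [(0, 1, 4, 1.0), (0, 2, 2, 2.0), (1, 2, 3, 1.0), (1, 3, 2, 3.0), (2, 3, 4, 1.0)]
b = np.array([3.0, 0.0, 0.0, -3.0])

n_nodes, n_arcs = len(b), len(arcs)
c = np.array([cost for (_, _, _, cost) in arcs])      # objective (0.1)
A_eq = np.zeros((n_nodes, n_arcs))                    # flow conservation (0.2)
for j, (tail, head, _, _) in enumerate(arcs):
    A_eq[tail, j] += 1.0   # arc leaves its tail node
    A_eq[head, j] -= 1.0   # arc enters its head node
bounds = [(0.0, cap) for (_, _, cap, _) in arcs]      # constraints (0.3) and (0.4)

res = linprog(c, A_eq=A_eq, b_eq=b, bounds=bounds, method="highs")
print(res.x, res.fun)      # optimal flow and its cost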

Example (cont.): Adding Fixed Cost Fixed costs w : A → R_≥0. If arc a ∈ A shall be used (i.e., x(a) > 0), it must be bought at cost w(a). Add variables y(a) ∈ {0, 1} with y(a) = 1 if arc a is used, 0 otherwise. This leads to the following mixed-integer linear program (MIP): minimize Σ_{a ∈ A} c(a) x(a) + Σ_{a ∈ A} w(a) y(a) subject to Σ_{a ∈ δ^+(v)} x(a) − Σ_{a ∈ δ^-(v)} x(a) = b(v) for all v ∈ V, x(a) ≤ u(a) y(a) for all a ∈ A, x(a) ≥ 0 for all a ∈ A, y(a) ∈ {0, 1} for all a ∈ A. MIP: Linear program where some variables may only take integer values. 12

Example: Maximum Weighted Matching Problem Given: undirected graph G = (V, E), weight function w : E → R. Task: find a matching M ⊆ E with maximum total weight. (M ⊆ E is a matching if every node is incident to at most one edge in M.) Formulation as an integer linear program (IP): Variables: x_e ∈ {0, 1} for e ∈ E with x_e = 1 if and only if e ∈ M. maximize Σ_{e ∈ E} w(e) x_e subject to Σ_{e ∈ δ(v)} x_e ≤ 1 for all v ∈ V, x_e ∈ {0, 1} for all e ∈ E. IP: Linear program where all variables may only take integer values. 13

Example: Traveling Salesperson Problem (TSP) Given: complete graph K_n on n nodes, weight function w : E(K_n) → R. Task: find a Hamiltonian circuit with minimum total weight. (A Hamiltonian circuit visits every node exactly once.) Formulation as an integer linear program? (later!) 14

Example: Minimum Node Coloring Problem Given: undirected graph G = (V, E). Task: color the nodes of G such that adjacent nodes get different colors. Use a minimum number of colors. Definition 0.4. A graph G = (V, E) whose nodes can be colored with two colors is called bipartite. 15

Example: Weighted Vertex Cover Problem Given: undirected graph G = (V, E), weight function w : V → R_≥0. Task: find U ⊆ V of minimum total weight such that every edge e ∈ E has at least one endpoint in U. Formulation as an integer linear program (IP): Variables: x_v ∈ {0, 1} for v ∈ V with x_v = 1 if and only if v ∈ U. minimize Σ_{v ∈ V} w(v) x_v subject to x_v + x_v' ≥ 1 for all e = {v, v'} ∈ E, x_v ∈ {0, 1} for all v ∈ V. 16

Typical Questions For a given optimization problem: How to find an optimal solution? How to find a feasible solution? Does there exist an optimal/feasible solution? How to prove that a computed solution is optimal? How difficult is the problem? Does there exist an efficient algorithm with small worst-case running time? How to formulate the problem as a (mixed integer) linear program? Is there a useful special structure of the problem? 17

Preliminary Outline of the Course linear programming and the simplex algorithm geometric interpretation of the simplex algorithm LP duality, complementary slackness efficient algorithms for minimum spanning trees, shortest paths efficient algorithms for maximum flows and minimum cost flows complexity theory 18

Literature on Linear Optimization (not complete) D. Bertsimas, J. N. Tsitsiklis, Introduction to Linear Optimization, Athena, 1997. V. Chvátal, Linear Programming, Freeman, 1983. G. B. Dantzig, Linear Programming and Extensions, Princeton University Press, 1998 (1963). M. Grötschel, L. Lovász, A. Schrijver, Geometric Algorithms and Combinatorial Optimization, Springer, 1988. J. Matoušek, B. Gärtner, Understanding and Using Linear Programming, Springer, 2006. M. Padberg, Linear Optimization and Extensions, Springer, 1995. A. Schrijver, Theory of Linear and Integer Programming, Wiley, 1986. R. J. Vanderbei, Linear Programming, Springer, 2001. 19

Literature on Combinatorial Optimization (not complete) R. K. Ahuja, T. L. Magnanti, J. B. Orlin, Network Flows: Theory, Algorithms, and Applications, Prentice-Hall, 1993. W. J. Cook, W. H. Cunningham, W. R. Pulleyblank, A. Schrijver, Combinatorial Optimization, Wiley, 1998. L. R. Ford, D. R. Fulkerson, Flows in Networks, Princeton University Press, 1962. M. R. Garey, D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, 1979. B. Korte, J. Vygen, Combinatorial Optimization: Theory and Algorithms, Springer, 2002. C. H. Papadimitriou, K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity, Dover Publications, reprint 1998. A. Schrijver, Combinatorial Optimization: Polyhedra and Efficiency, Springer, 2003. 20

Chapter 1: Linear Programming Basics (cp. Bertsimas & Tsitsiklis, Chapter 1) 21

Example of a Linear Program minimize 2x_1 − x_2 + 4x_3 subject to x_1 + x_2 + x_4 ≤ 2, 3x_2 − x_3 = 5, x_3 + x_4 ≥ 3, x_1 ≥ 0, x_3 ≤ 0 Remarks. the objective function is linear in the vector of variables x = (x_1, x_2, x_3, x_4)^T constraints are linear inequalities and linear equations the last two constraints are special (non-negativity and non-positivity constraint, respectively) 22

General Linear Program minimize c^T x subject to a_i^T x ≥ b_i for i ∈ M_1, (1.1) a_i^T x = b_i for i ∈ M_2, (1.2) a_i^T x ≤ b_i for i ∈ M_3, (1.3) x_j ≥ 0 for j ∈ N_1, (1.4) x_j ≤ 0 for j ∈ N_2, (1.5) with c ∈ R^n, a_i ∈ R^n and b_i ∈ R for i ∈ M_1 ∪ M_2 ∪ M_3 (finite index sets), and N_1, N_2 ⊆ {1,..., n} given. x ∈ R^n satisfying constraints (1.1)-(1.5) is a feasible solution. A feasible solution x* is an optimal solution if c^T x* ≤ c^T x for all feasible solutions x. The linear program is unbounded if, for all k ∈ R, there is a feasible solution x ∈ R^n with c^T x ≤ k. 23

Special Forms of Linear Programs maximizing c^T x is equivalent to minimizing (−c)^T x. any linear program can be written in the form minimize c^T x subject to A x ≥ b for some A ∈ R^{m×n} and b ∈ R^m: rewrite a_i^T x = b_i as: a_i^T x ≥ b_i and −a_i^T x ≥ −b_i, rewrite a_i^T x ≤ b_i as: (−a_i)^T x ≥ −b_i. Linear program in standard form: min c^T x s.t. A x = b, x ≥ 0 with A ∈ R^{m×n}, b ∈ R^m, and c ∈ R^n. 24

Example: Diet Problem Given: n different foods, m different nutrients a_ij := amount of nutrient i in one unit of food j b_i := requirement of nutrient i in some ideal diet c_j := cost of one unit of food j Task: find a cheapest ideal diet consisting of foods 1,..., n. LP formulation: Let x_j := number of units of food j in the diet: min c^T x s.t. A x = b, x ≥ 0, or min c^T x s.t. A x ≥ b, x ≥ 0, with A = (a_ij) ∈ R^{m×n}, b = (b_i) ∈ R^m, c = (c_j) ∈ R^n. 25
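
Illustration (not part of the lecture): a sketch of the inequality form of the diet LP with made-up numbers, solved via scipy.optimize.linprog; linprog takes "≤" constraints, so A x ≥ b is passed as −A x ≤ −b.

```python
# Sketch: diet LP  min c^T x  s.t.  A x >= b, x >= 0  (3 foods, 2 nutrients, made-up data).
import numpy as np
from scipy.optimize import linprog

A = np.array([[2.0, 1.0, 0.0],     # amount of nutrient i in one unit of food j
              [1.0, 3.0, 2.0]])
b = np.array([8.0, 12.0])          # nutrient requirements
c = np.array([1.5, 2.0, 1.0])      # cost per unit of food

res = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None)] * 3, method="highs")
print("cheapest diet:", res.x, "cost:", res.fun)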

Reduction to Standard Form Any linear program can be brought into standard form: elimination of free (unbounded) variables x_j: replace x_j with x_j^+, x_j^- ≥ 0 and set x_j = x_j^+ − x_j^-. elimination of non-positive variables x_j: replace x_j ≤ 0 with (−x_j) ≥ 0. elimination of an inequality constraint a_i^T x ≤ b_i: introduce a slack variable s ≥ 0 and rewrite: a_i^T x + s = b_i. elimination of an inequality constraint a_i^T x ≥ b_i: introduce a slack variable s ≥ 0 and rewrite: a_i^T x − s = b_i. 26

Example The linear program min 2x_1 + 4x_2 s.t. x_1 + x_2 ≥ 3, 3x_1 + 2x_2 = 14, x_1 ≥ 0 is equivalent to the standard form problem min 2x_1 + 4x_2^+ − 4x_2^- s.t. x_1 + x_2^+ − x_2^- − x_3 = 3, 3x_1 + 2x_2^+ − 2x_2^- = 14, x_1, x_2^+, x_2^-, x_3 ≥ 0 27
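
Illustration (not part of the lecture): a sketch checking numerically that the original LP and its standard form have the same optimal value; scipy.optimize.linprog is used as an off-the-shelf solver.

```python
# Sketch: the original LP and its standard form reformulation give the same optimum (-4).
from scipy.optimize import linprog

# original: min 2x1 + 4x2  s.t.  x1 + x2 >= 3,  3x1 + 2x2 = 14,  x1 >= 0  (x2 free)
orig = linprog(c=[2, 4], A_ub=[[-1, -1]], b_ub=[-3],
               A_eq=[[3, 2]], b_eq=[14],
               bounds=[(0, None), (None, None)], method="highs")

# standard form in variables (x1, x2+, x2-, x3), all nonnegative
std = linprog(c=[2, 4, -4, 0],
              A_eq=[[1, 1, -1, -1], [3, 2, -2, 0]], b_eq=[3, 14],
              bounds=[(0, None)] * 4, method="highs")

print(orig.fun, std.fun)   # both optimal values coincide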

Affine Linear and Convex Functions Lemma 1.1. a An affine linear function f : R^n → R given by f(x) = c^T x + d with c ∈ R^n, d ∈ R, is both convex and concave. b If f_1,..., f_k : R^n → R are convex functions, then f : R^n → R defined by f(x) := max_{i=1,...,k} f_i(x) is also convex. Proof:... 28

Piecewise Linear Convex Objective Functions Let c_1,..., c_k ∈ R^n and d_1,..., d_k ∈ R. Consider the piecewise linear convex function x ↦ max_{i=1,...,k} (c_i^T x + d_i): min max_{i=1,...,k} (c_i^T x + d_i) s.t. A x ≥ b is equivalent to min z s.t. z ≥ c_i^T x + d_i for all i, A x ≥ b. Example: let c_1,..., c_n ≥ 0: min Σ_{i=1}^n c_i |x_i| s.t. A x ≥ b is equivalent to min Σ_{i=1}^n c_i z_i s.t. z_i ≥ x_i, z_i ≥ −x_i, A x ≥ b, and also to min Σ_{i=1}^n c_i (x_i^+ + x_i^-) s.t. A (x^+ − x^-) ≥ b, x^+, x^- ≥ 0. 29
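
Illustration (not part of the lecture): a sketch of the epigraph reformulation with made-up data C, d, A, b; the variables are (x, z) and the constraint z ≥ c_i^T x + d_i is written as c_i^T x − z ≤ −d_i for linprog.

```python
# Sketch: min over x of max_i (c_i^T x + d_i) subject to A x >= b, as an LP in (x, z).
import numpy as np
from scipy.optimize import linprog

C = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])   # rows c_i^T (made up)
d = np.array([0.0, 0.0, 2.0])
A = np.array([[1.0, 1.0]]); b = np.array([1.0])        # A x >= b (made up)

n, k = 2, len(d)
obj = np.r_[np.zeros(n), 1.0]                          # minimize z only
A_ub = np.vstack([np.c_[C, -np.ones(k)],               # c_i^T x - z <= -d_i
                  np.c_[-A, np.zeros(len(b))]])        # A x >= b  as  -A x <= -b
b_ub = np.r_[-d, -b]

res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + 1), method="highs")
print(res.x, res.fun)                                  # x and the minimized maximum z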

Graphical Representation and Solution 2D example: min −x_1 − x_2 s.t. x_1 + 2x_2 ≤ 3, 2x_1 + x_2 ≤ 3, x_1, x_2 ≥ 0 30

Graphical Representation and Solution (cont.) 3D example: min −x_1 − x_2 − x_3 s.t. x_1 ≤ 1, x_2 ≤ 1, x_3 ≤ 1, x_1, x_2, x_3 ≥ 0 31

Graphical Representation and Solution (cont.) another 2D example: min c_1 x_1 + c_2 x_2 s.t. −x_1 + x_2 ≤ 1, x_1, x_2 ≥ 0 for c = (1, 1)^T, the unique optimal solution is x = (0, 0)^T for c = (1, 0)^T, the optimal solutions are exactly the points x = (0, x_2)^T with 0 ≤ x_2 ≤ 1 for c = (0, 1)^T, the optimal solutions are exactly the points x = (x_1, 0)^T with x_1 ≥ 0 for c = (−1, −1)^T, the problem is unbounded, the optimal cost is −∞ if we add the constraint x_1 + x_2 ≤ −1, the problem is infeasible 32

Properties of the Set of Optimal Solutions In the last example, the following 5 cases occurred: i there is a unique optimal solution ii there exist infinitely many optimal solutions, but the set of optimal solutions is bounded iii there exist infinitely many optimal solutions and the set of optimal solutions is unbounded iv the problem is unbounded, i.e., the optimal cost is −∞ and no feasible solution is optimal v the problem is infeasible, i.e., the set of feasible solutions is empty These are indeed all cases that can occur in general (see later). 33

Visualizing LPs in Standard Form Example: Let A = (1, 1, 1) ∈ R^{1×3}, b = (1) ∈ R^1 and consider the set of feasible solutions P = {x ∈ R^3 : A x = b, x ≥ 0}. More generally: if A ∈ R^{m×n} with m ≤ n and the rows of A are linearly independent, then {x ∈ R^n : A x = b} is an (n−m)-dimensional affine subspace of R^n. The set of feasible solutions lies in this affine subspace and is only constrained by the non-negativity constraints x ≥ 0. 34

Chapter 2: The Geometry of Linear Programming (cp. Bertsimas & Tsitsiklis, Chapter 2) 35

Polyhedra and Polytopes Definition 2.1. Let A ∈ R^{m×n} and b ∈ R^m. a the set {x ∈ R^n : A x ≥ b} is called a polyhedron b {x : A x = b, x ≥ 0} is a polyhedron in standard form representation Definition 2.2. a A set S ⊆ R^n is bounded if there is K ∈ R such that ||x|| ≤ K for all x ∈ S. b A bounded polyhedron is called a polytope. 36

Hyperplanes and Halfspaces Definition 2.3. Let a ∈ R^n \ {0} and b ∈ R: a the set {x ∈ R^n : a^T x = b} is called a hyperplane b the set {x ∈ R^n : a^T x ≥ b} is called a halfspace Remarks Hyperplanes and halfspaces are convex sets. A polyhedron is an intersection of finitely many halfspaces. 37

Convex Combination and Convex Hull Definition 2.4. Let x_1,..., x_k ∈ R^n and λ_1,..., λ_k ∈ R_≥0 with λ_1 + ... + λ_k = 1. a The vector Σ_{i=1}^k λ_i x_i is a convex combination of x_1,..., x_k. b The convex hull of x_1,..., x_k is the set of all convex combinations. 38

Convex Sets, Convex Combinations, and Convex Hulls Theorem 2.5. a The intersection of convex sets is convex. b Every polyhedron is a convex set. c A convex combination of a finite number of elements of a convex set also belongs to that set. d The convex hull of finitely many vectors is a convex set. Proof:... Corollary 2.6. The convex hull of x_1,..., x_k ∈ R^n is the smallest (w.r.t. inclusion) convex subset of R^n containing x_1,..., x_k. Proof:... 39

Extreme Points and Vertices of Polyhedra Definition 2.7. Let P ⊆ R^n be a polyhedron. a x ∈ P is an extreme point of P if x ≠ λ y + (1−λ) z for all y, z ∈ P \ {x} and 0 ≤ λ ≤ 1, i.e., x is not a convex combination of two other points in P. b x ∈ P is a vertex of P if there is some c ∈ R^n such that c^T x < c^T y for all y ∈ P \ {x}, i.e., x is the unique optimal solution to the LP min{c^T z : z ∈ P}. 40

Active and Binding Constraints In the following, let P ⊆ R^n be a polyhedron defined by a_i^T x ≥ b_i for i ∈ M_1, a_i^T x = b_i for i ∈ M_2, with a_i ∈ R^n and b_i ∈ R, for all i. Definition 2.8. If x* ∈ R^n satisfies a_i^T x* = b_i for some i, then the corresponding constraint is active (or binding) at x*. 41

Basic Facts from Linear Algebra Theorem 2.9. Let x* ∈ R^n and I = {i : a_i^T x* = b_i}. The following are equivalent: i there are n vectors in {a_i : i ∈ I} which are linearly independent; ii the vectors in {a_i : i ∈ I} span R^n; iii x* is the unique solution to the system of equations a_i^T x = b_i, i ∈ I. 42

Vertices, Extreme Points, and Basic Feasible Solutions Definition 2.10. a x ∈ R^n is a basic solution of P if all equality constraints are active and there are n linearly independent constraints that are active. b A basic solution satisfying all constraints is a basic feasible solution. Theorem 2.11. For x ∈ P, the following are equivalent: i x is a vertex of P; ii x is an extreme point of P; iii x is a basic feasible solution of P. Proof:... 43

Number of Vertices Corollary 2.12. a A polyhedron has a finite number of vertices and basic solutions. b For a polyhedron in R^n given by m linear equality and inequality constraints, this number is at most (m choose n). Example: P := {x ∈ R^n : 0 ≤ x_i ≤ 1, i = 1,..., n} (n-dimensional unit cube) number of constraints: m = 2n number of vertices: 2^n 44

Adjacent Basic Solutions and Edges Definition 2.13. Let P ⊆ R^n be a polyhedron. a Two distinct basic solutions are adjacent if there are n − 1 linearly independent constraints that are active at both of them. b If both solutions are feasible, the line segment that joins them is an edge of P. 45

Polyhedra in Standard Form Let A ∈ R^{m×n}, b ∈ R^m, and P = {x ∈ R^n : A x = b, x ≥ 0}. Observation One can assume without loss of generality that rank(A) = m. Theorem 2.14. x ∈ R^n is a basic solution of P if and only if A x = b and there are indices B(1),..., B(m) ∈ {1,..., n} such that the columns A_B(1),..., A_B(m) of matrix A are linearly independent and x_i = 0 for all i ∉ {B(1),..., B(m)}. Proof:... x_B(1),..., x_B(m) are basic variables, the remaining variables are non-basic. The vector of basic variables is denoted by x_B := (x_B(1),..., x_B(m))^T. A_B(1),..., A_B(m) are the basic columns of A and form a basis of R^m. The matrix B := (A_B(1),..., A_B(m)) ∈ R^{m×m} is called the basis matrix. 46

Basic Columns and Basic Solutions Observation 2.15. Let x ∈ R^n be a basic solution, then: B x_B = b and thus x_B = B^{-1} b; x is a basic feasible solution if and only if x_B = B^{-1} b ≥ 0. Example: m = 2 (figure: columns A_1, A_2, A_3, A_4 with A_4 = −A_1 and right-hand side b in R^2) A_1, A_3 or A_2, A_3 form bases with corresponding basic feasible solutions. A_1, A_4 do not form a basis. A_1, A_2 and A_2, A_4 and A_3, A_4 form bases with infeasible basic solutions. 47
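
Illustration (not part of the lecture): a sketch that enumerates all column choices of a small, made-up standard-form polyhedron and classifies them exactly as on this slide: no basis, feasible basic solution, or infeasible basic solution.

```python
# Sketch: enumerate all bases of P = {x : Ax = b, x >= 0} for a toy instance (m = 2, n = 4).
import itertools
import numpy as np

A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0, 1.0]])
b = np.array([4.0, 2.0])
m, n = A.shape

for cols in itertools.combinations(range(n), m):
    B = A[:, cols]
    if abs(np.linalg.det(B)) < 1e-9:
        print(cols, "columns are linearly dependent -> no basis")
        continue
    x = np.zeros(n)
    x[list(cols)] = np.linalg.solve(B, b)      # x_B = B^{-1} b, nonbasic entries stay 0
    status = "feasible" if np.all(x >= -1e-9) else "infeasible"
    print(cols, "basic solution", x, status)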

Bases and Basic Solutions Corollary 2.16. Every basis A B(1),..., A B(m) determines a unique basic solution. Thus, different basic solutions correspond to different bases. But: two different bases might yield the same basic solution. Example: If b = 0, then x = 0 is the only basic solution. 48

Adjacent Bases Definition 2.17. Two bases A_B(1),..., A_B(m) and A_B'(1),..., A_B'(m) are adjacent if they share all but one column. Observation 2.18. a Two adjacent basic solutions can always be obtained from two adjacent bases. b If two adjacent bases lead to distinct basic solutions, then the latter are adjacent. 49

Degeneracy Definition 2.19. A basic solution x of a polyhedron P is degenerate if more than n constraints are active at x. Observation 2.20. Let P = {x ∈ R^n : A x = b, x ≥ 0} be a polyhedron in standard form with A ∈ R^{m×n} and b ∈ R^m. a A basic solution x ∈ P is degenerate if and only if more than n − m components of x are zero. b For a non-degenerate basic solution x ∈ P, there is a unique basis. 50

Three Different Reasons for Degeneracy i redundant variables Example: x_1 + x_2 = 1, x_3 = 0, x_1, x_2, x_3 ≥ 0, i.e., A = (1 1 0; 0 0 1) ii redundant constraints Example: x_1 + 2x_2 ≤ 3, 2x_1 + x_2 ≤ 3, x_1 + x_2 ≤ 2, x_1, x_2 ≥ 0 iii geometric reasons Example: Octahedron Observation 2.21. Perturbing the right hand side vector b may remove degeneracy. 51

Existence of Extreme Points Definition 2.22. A polyhedron P ⊆ R^n contains a line if there is x ∈ P and a direction d ∈ R^n \ {0} such that x + λ d ∈ P for all λ ∈ R. Theorem 2.23. Let P = {x ∈ R^n : A x ≥ b} with A ∈ R^{m×n} and b ∈ R^m. The following are equivalent: i There exists an extreme point x ∈ P. ii P does not contain a line. iii A contains n linearly independent rows. Proof:... 52

Existence of Extreme Points (cont.) Corollary 2.24. a A non-empty polytope contains an extreme point. b A non-empty polyhedron in standard form contains an extreme point. Proof of b: write A x = b, x ≥ 0 equivalently as A x ≥ b, −A x ≥ −b, I x ≥ 0; the identity block contains n linearly independent rows. Example: P = {(x_1, x_2, x_3) ∈ R^3 : x_1 + x_2 ≥ 1, x_1 + 2x_2 ≥ 0} contains a line since (1, 1, 0)^T + λ (0, 0, 1)^T ∈ P for all λ ∈ R. 53

Optimality of Extreme Points Theorem 2.25. Let P ⊆ R^n be a polyhedron and c ∈ R^n. If P has an extreme point and min{c^T x : x ∈ P} is bounded, then there is an extreme point that is optimal. Proof:... Corollary 2.26. Every linear programming problem is either infeasible or unbounded or there exists an optimal solution. Proof: Every linear program is equivalent to an LP in standard form. The claim thus follows from Corollary 2.24 and Theorem 2.25. 54

Chapter 3: The Simplex Method (cp. Bertsimas & Tsitsiklis, Chapter 3) 55

Linear Program in Standard Form Throughout this chapter, we consider the following standard form problem: minimize c^T x subject to A x = b, x ≥ 0 with A ∈ R^{m×n}, rank(A) = m, b ∈ R^m, and c ∈ R^n. 56

Basic Directions Observation 3.1. Let B = (A_B(1),..., A_B(m)) be a basis matrix. The values of the basic variables x_B(1),..., x_B(m) in the system A x = b are uniquely determined by the values of the non-basic variables. Proof: A x = b ⟺ B x_B + Σ_{j ∉ {B(1),...,B(m)}} A_j x_j = b ⟺ x_B = B^{-1} (b − Σ_{j ∉ {B(1),...,B(m)}} A_j x_j). Definition 3.2. For fixed j ∉ {B(1),..., B(m)}, let d ∈ R^n be given by d_j := 1, d_B := −B^{-1} A_j, and d_j' := 0 for j' ∉ {j, B(1),..., B(m)}. Then A (x + θ d) = b, for all θ ∈ R, and d is the jth basic direction. 57

Feasible Directions Definition 3.3. Let P ⊆ R^n be a polyhedron. For x ∈ P the vector d ∈ R^n \ {0} is a feasible direction at x if there is a θ > 0 with x + θ d ∈ P. Example: Some feasible directions at several points of a polyhedron. 58

Feasible Directions Consider a basic feasible solution x. Question: Is the jth basic direction d a feasible direction? Case 1: If x is a non-degenerate basic feasible solution, then x_B > 0 and x + θ d ≥ 0 for θ > 0 small enough. The answer is yes! Case 2: If x is degenerate, the answer might be no! E.g., if x_B(i) = 0 and d_B(i) < 0, then x + θ d ≱ 0 for all θ > 0. Example: n = 5, m = 3, n − m = 2 (figure: polyhedron with constraints x_1 = 0,..., x_5 = 0) 1st basic direction at y (basic variables x_2, x_4, x_5); 3rd basic direction at z (basic variables x_1, x_2, x_4). 59

Reduced Cost Coefficients Consider a basic solution x. Question: How does the cost change when moving along the jth basic direction d? c^T (x + θ d) = c^T x + θ c^T d = c^T x + θ (c_j − c_B^T B^{-1} A_j), where we abbreviate c̄_j := c_j − c_B^T B^{-1} A_j. Definition 3.4. For a given basic solution x, the reduced cost of variable x_j, j = 1,..., n, is c̄_j := c_j − c_B^T B^{-1} A_j. Observation 3.5. The reduced cost of a basic variable x_B(i) is zero. Proof: c̄_B(i) = c_B(i) − c_B^T B^{-1} A_B(i) = c_B(i) − c_B^T e_i = c_B(i) − c_B(i) = 0. 60
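
Illustration (not part of the lecture): a sketch computing the reduced-cost vector c̄ = c − c_B^T B^{-1} A for one basis of a made-up standard-form problem; the basic entries come out as zero, as in Observation 3.5.

```python
# Sketch: reduced costs c_bar = c - c_B^T B^{-1} A for the slack basis of a toy LP.
import numpy as np

A = np.array([[1.0, 2.0, 1.0, 0.0],
              [2.0, 1.0, 0.0, 1.0]])
b = np.array([4.0, 4.0])
c = np.array([-1.0, -1.0, 0.0, 0.0])

basis = [2, 3]                                  # slack basis, so x_B = b >= 0
B = A[:, basis]
x_B = np.linalg.solve(B, b)
c_bar = c - c[basis] @ np.linalg.solve(B, A)    # c_bar_j = c_j - c_B^T B^{-1} A_j
print("x_B =", x_B, " reduced costs:", c_bar)   # reduced costs of basic variables are 0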

Optimality Criterion Theorem 3.6. Let x be a basic feasible solution and c̄ the vector of reduced costs. a If c̄ ≥ 0, then x is an optimal solution. b If x is an optimal solution and non-degenerate, then c̄ ≥ 0. Proof:... Definition 3.7. A basis matrix B is optimal if i B^{-1} b ≥ 0 and ii c̄^T = c^T − c_B^T B^{-1} A ≥ 0. Observation 3.8. If B is an optimal basis, the associated basic solution x is feasible and optimal. 61

Development of the Simplex Method Assumption (for now): only non-degenerate basic feasible solutions. Let x be a basic feasible solution with c̄_j < 0 for some j ∉ {B(1),..., B(m)}. Let d be the jth basic direction: 0 > c̄_j = c^T d. It is desirable to go to y := x + θ* d with θ* := max{θ : x + θ d ∈ P}. Question: How to determine θ*? By construction of d, it holds that A (x + θ d) = b for all θ ∈ R, i.e., x + θ d ∈ P ⟺ x + θ d ≥ 0. Case 1: d ≥ 0 ⟹ x + θ d ≥ 0 for all θ ≥ 0 ⟹ θ* = ∞. Thus, the LP is unbounded. Case 2: d_k < 0 for some k ⟹ x_k + θ d_k ≥ 0 ⟺ θ ≤ −x_k / d_k. Thus, θ* = min_{k: d_k < 0} (−x_k / d_k) = min_{i=1,...,m: d_B(i) < 0} (−x_B(i) / d_B(i)) > 0. 62

Development of the Simplex Method (cont.) Assumption (for now): only non-degenerate basic feasible solutions. Let x be a basic feasible solution with c̄_j < 0 for some j ∉ {B(1),..., B(m)}. Let d be the jth basic direction: 0 > c̄_j = c^T d. It is desirable to go to y := x + θ* d with θ* := max{θ : x + θ d ∈ P}. θ* = min_{k: d_k < 0} (−x_k / d_k) = min_{i=1,...,m: d_B(i) < 0} (−x_B(i) / d_B(i)). Let l ∈ {1,..., m} with θ* = −x_B(l) / d_B(l); then y_j = θ* and y_B(l) = 0. ⟹ x_j replaces x_B(l) as a basic variable and we get a new basis matrix B̄ = (A_B(1),..., A_B(l−1), A_j, A_B(l+1),..., A_B(m)) = (A_B̄(1),..., A_B̄(m)) with B̄(i) = B(i) if i ≠ l, and B̄(i) = j if i = l. 63

Core of the Simplex Method Theorem 3.9. Let x be a non-degenerate basic feasible solution, j ∉ {B(1),..., B(m)} with c̄_j < 0, d the jth basic direction, and θ* := max{θ : x + θ d ∈ P} < ∞. a θ* = min_{i=1,...,m: d_B(i) < 0} (−x_B(i) / d_B(i)) = −x_B(l) / d_B(l) for some l ∈ {1,..., m}. Let B̄(i) := B(i) for i ≠ l and B̄(l) := j. b A_B̄(1),..., A_B̄(m) are linearly independent and B̄ is a basis matrix. c y := x + θ* d is a basic feasible solution associated with B̄ and c^T y < c^T x. Proof:... 64

An Iteration of the Simplex Method Given: basis B = (A_B(1),..., A_B(m)), corresponding basic feasible solution x. 1 Let c̄^T := c^T − c_B^T B^{-1} A. If c̄ ≥ 0, then STOP; else choose j with c̄_j < 0. 2 Let u := B^{-1} A_j. If u ≤ 0, then STOP (optimal cost is −∞). 3 Let θ* := min_{i: u_i > 0} x_B(i) / u_i = x_B(l) / u_l for some l ∈ {1,..., m}. 4 Form the new basis by replacing A_B(l) with A_j; the corresponding basic feasible solution y is given by y_j := θ* and y_B(i) := x_B(i) − θ* u_i for i ≠ l. Remark: We say that the nonbasic variable x_j enters the basis and the basic variable x_B(l) leaves the basis. 65
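
Illustration (not part of the lecture): a compact Python sketch of this iteration, repeated until termination. It uses dense numpy algebra and the smallest-subscript entering rule for simplicity, assumes a feasible starting basis is given, and is tried on the 2D example of the graphical-solution slide.

```python
# Sketch of the simplex iteration above (naive dense implementation, not efficient).
import numpy as np

def simplex(A, b, c, basis, max_iter=100):
    m, n = A.shape
    basis = list(basis)
    for _ in range(max_iter):
        B_inv = np.linalg.inv(A[:, basis])
        x_B = B_inv @ b
        c_bar = c - c[basis] @ B_inv @ A                 # step 1: reduced costs
        entering = [j for j in range(n) if j not in basis and c_bar[j] < -1e-9]
        if not entering:
            x = np.zeros(n); x[basis] = x_B
            return x, c @ x                              # optimal basis reached
        j = min(entering)                                # smallest subscript enters
        u = B_inv @ A[:, j]                              # step 2
        if np.all(u <= 1e-9):
            return None, -np.inf                         # problem is unbounded
        ratios = [(x_B[i] / u[i], i) for i in range(m) if u[i] > 1e-9]
        theta, l = min(ratios)                           # step 3: ratio test
        basis[l] = j                                     # step 4: basis change
    raise RuntimeError("iteration limit reached (possible cycling)")

# usage:  min -x1 - x2  s.t.  x1 + 2x2 <= 3,  2x1 + x2 <= 3,  x >= 0  (slack basis x3, x4)
A = np.array([[1.0, 2.0, 1.0, 0.0], [2.0, 1.0, 0.0, 1.0]])
b = np.array([3.0, 3.0]); c = np.array([-1.0, -1.0, 0.0, 0.0])
print(simplex(A, b, c, basis=[2, 3]))                    # optimum (1, 1, 0, 0), cost -2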

Correctness of the Simplex Method Theorem 3.10. If every basic feasible solution is non-degenerate, the simplex method terminates after finitely many iterations in one of the following two states: i we have an optimal basis B and an associated basic feasible solution x which is optimal; ii we have a vector d satisfying A d = 0, d ≥ 0, and c^T d < 0; the optimal cost is −∞. Proof sketch: The simplex method makes progress in every iteration. Since there are only finitely many different basic feasible solutions, it stops after a finite number of iterations. 66

Simplex Method for Degenerate Problems An iteration of the simplex method can also be applied if x is a degenerate basic feasible solution. In this case it might happen that θ* := min_{i: u_i > 0} x_B(i) / u_i = x_B(l) / u_l = 0 if some basic variable x_B(l) is zero and d_B(l) < 0. Thus, y = x + θ* d = x and the current basic feasible solution does not change. But replacing A_B(l) with A_j still yields a new basis with associated basic feasible solution y = x. Remark: Even if θ* is positive, more than one of the original basic variables may become zero at the new point x + θ* d. Since only one of them leaves the basis, the new basic feasible solution y is degenerate. 67

Example c x 4 = 0 x 3 = 0 x x 6 = 0 x5 = 0 y x2 = 0 x 1 = 0 68

Pivot Selection Question: How to choose j with c̄_j < 0 and l with x_B(l) / u_l = min_{i: u_i > 0} x_B(i) / u_i if several possible choices exist? Attention: The choice of j is critical for the overall behavior of the simplex method. Three popular choices are: smallest subscript rule: choose the smallest j with c̄_j < 0. (very simple; no need to compute the entire vector c̄; usually leads to many iterations) steepest descent rule: choose j such that c̄_j < 0 is minimal. (relatively simple; commonly used for mid-size problems; does not necessarily yield the best neighboring solution) best improvement rule: choose j such that θ* c̄_j is minimal. (computationally expensive; used for large problems; usually leads to very few iterations) 69

Revised Simplex Method Observation 3.11. To execute one iteration of the simplex method efficiently, it suffices to know B(1),..., B(m), the inverse B^{-1} of the basis matrix, and the input data A, b, and c. It is then easy to compute: x_B = B^{-1} b, u = B^{-1} A_j, c̄^T = c^T − c_B^T B^{-1} A, θ* = min_{i: u_i > 0} x_B(i) / u_i = x_B(l) / u_l. The new basis matrix is then B̄ = (A_B(1),..., A_B(l−1), A_j, A_B(l+1),..., A_B(m)). Critical question: How to obtain B̄^{-1} efficiently? 70

Computing the Inverse of the Basis Matrix Notice that B^{-1} B̄ = (e_1,..., e_{l−1}, u, e_{l+1},..., e_m). Thus, B̄^{-1} can be obtained from B^{-1} as follows: multiply the lth row of B^{-1} with 1/u_l; for i ≠ l, subtract u_i times the resulting lth row from the ith row. These are exactly the elementary row operations needed to turn B^{-1} B̄ into the identity matrix! Elementary row operations are the same as multiplying the matrix with corresponding elementary matrices from the left hand side. Equivalently: Obtaining B̄^{-1} from B^{-1} Apply elementary row operations to the matrix (B^{-1} | u) to make the last column equal to the unit vector e_l. The first m columns of the resulting matrix form the inverse B̄^{-1} of the new basis matrix B̄. 71
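
Illustration (not part of the lecture): a sketch of exactly this update, checked against recomputing the inverse from scratch on a made-up instance; the function name update_basis_inverse is my own.

```python
# Sketch: obtain the new inverse from (B^{-1} | u) by the elementary row operations above.
import numpy as np

def update_basis_inverse(B_inv, u, l):
    """Row-reduce (B_inv | u) so that the last column becomes the unit vector e_l."""
    M = np.hstack([B_inv, u.reshape(-1, 1)]).astype(float)
    M[l] /= M[l, -1]                       # make the pivot element 1
    for i in range(M.shape[0]):
        if i != l:
            M[i] -= M[i, -1] * M[l]        # eliminate u_i in all other rows
    return M[:, :-1]                       # first m columns are the new inverse

# quick check against recomputing the inverse directly
A = np.array([[1.0, 2.0, 1.0, 0.0], [2.0, 1.0, 0.0, 1.0]])
basis, j, l = [2, 3], 0, 1                  # column A_0 replaces the l-th basic column
B_inv = np.linalg.inv(A[:, basis])
u = B_inv @ A[:, j]
new_basis = basis.copy(); new_basis[l] = j
print(np.allclose(update_basis_inverse(B_inv, u, l), np.linalg.inv(A[:, new_basis])))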

An Iteration of the Revised Simplex Method Given: B = (A_B(1),..., A_B(m)), corresponding basic feasible solution x, and B^{-1}. 1 Let p^T := c_B^T B^{-1} and c̄_j := c_j − p^T A_j for j ∉ {B(1),..., B(m)}; if c̄ ≥ 0, then STOP; else choose j with c̄_j < 0. 2 Let u := B^{-1} A_j. If u ≤ 0, then STOP (optimal cost is −∞). 3 Let θ* := min_{i: u_i > 0} x_B(i) / u_i = x_B(l) / u_l for some l ∈ {1,..., m}. 4 Form the new basis by replacing A_B(l) with A_j; the corresponding basic feasible solution y is given by y_j := θ* and y_B(i) := x_B(i) − θ* u_i for i ≠ l. 5 Apply elementary row operations to the matrix (B^{-1} | u) to make the last column equal to the unit vector e_l. The first m columns of the resulting matrix yield B̄^{-1}. 72

Full Tableau Implementation Main idea Instead of maintaining and updating the matrix B^{-1}, we maintain and update the m × (n+1) matrix B^{-1} (b | A) = (B^{-1} b | B^{-1} A), which is called the simplex tableau. The zeroth column B^{-1} b contains x_B. For i = 1,..., n, the ith column of the tableau is B^{-1} A_i. The column u = B^{-1} A_j corresponding to the variable x_j that is about to enter the basis is the pivot column. If the lth basic variable x_B(l) exits the basis, the lth row of the tableau is the pivot row. The element u_l > 0 is the pivot element. 73

Full Tableau Implementation (cont.) Notice: The simplex tableau B^{-1} (b | A) represents the linear equation system B^{-1} b = B^{-1} A x, which is equivalent to A x = b. Updating the simplex tableau At the end of an iteration, the simplex tableau B^{-1} (b | A) has to be updated to B̄^{-1} (b | A). B̄^{-1} can be obtained from B^{-1} by elementary row operations, i.e., B̄^{-1} = Q B^{-1} where Q is a product of elementary matrices. Thus, B̄^{-1} (b | A) = Q B^{-1} (b | A), and the new tableau B̄^{-1} (b | A) can be obtained by applying the same elementary row operations to the old tableau. 74

Zeroth Row of the Simplex Tableau In order to keep track of the objective function value and the reduced costs, we consider the following augmented simplex tableau: the zeroth row is (−c_B^T B^{-1} b | c^T − c_B^T B^{-1} A), placed above the rows (B^{-1} b | B^{-1} A); in more detail, the zeroth row reads −c_B^T x_B | c̄_1 ... c̄_n, and rows 1,..., m read x_B(1),..., x_B(m) | B^{-1} A_1 ... B^{-1} A_n. Update after one iteration The zeroth row is updated by adding a multiple of the pivot row to the zeroth row so as to set the reduced cost of the entering variable to zero. 75

An Iteration of the Full Tableau Implementation Given: simplex tableau corresponding to a feasible basis B = (A_B(1),..., A_B(m)). 1 If c̄ ≥ 0 (zeroth row), then STOP; else choose a pivot column j with c̄_j < 0. 2 If u = B^{-1} A_j ≤ 0 (jth column), STOP (optimal cost is −∞). 3 Let θ* := min_{i: u_i > 0} x_B(i) / u_i = x_B(l) / u_l for some l ∈ {1,..., m} (cp. columns 0 and j). 4 Form the new basis by replacing A_B(l) with A_j. 5 Apply elementary row operations to the simplex tableau so that u_l (the pivot element) becomes one and all other entries of the pivot column become zero. 76
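
Illustration (not part of the lecture): a sketch of a single full-tableau pivot (step 5), applied to the initial tableau of the example on the following slides; the minus signs in the cost row, which the transcription dropped, are restored here.

```python
# Sketch: one pivot of the full tableau implementation (row 0 is the cost row).
import numpy as np

def pivot(tableau, l, j):
    """Pivot on row l (1..m) and column j (1..n) of the (m+1) x (n+1) tableau."""
    T = tableau.astype(float).copy()
    T[l] /= T[l, j]                       # pivot element becomes 1
    for i in range(T.shape[0]):
        if i != l:
            T[i] -= T[i, j] * T[l]        # all other entries in column j become 0
    return T

# initial tableau of  min -10x1 - 12x2 - 12x3  with slack basis (x4, x5, x6)
T = np.array([
    [  0, -10, -12, -12, 0, 0, 0],
    [ 20,   1,   2,   2, 1, 0, 0],
    [ 20,   2,   1,   2, 0, 1, 0],
    [ 20,   2,   2,   1, 0, 0, 1]], dtype=float)
print(pivot(T, l=2, j=1))                 # x1 enters, the second basic variable (x5) leaves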

Full Tableau Implementation: An Example A simple linear programming problem: min −10x_1 − 12x_2 − 12x_3 s.t. x_1 + 2x_2 + 2x_3 ≤ 20, 2x_1 + x_2 + 2x_3 ≤ 20, 2x_1 + 2x_2 + x_3 ≤ 20, x_1, x_2, x_3 ≥ 0 77

Set of Feasible Solutions (figure): the feasible region in (x_1, x_2, x_3)-space with vertices A = (0, 0, 0)^T, B = (0, 0, 10)^T, C = (0, 10, 0)^T, D = (10, 0, 0)^T, E = (4, 4, 4)^T. 78

Introducing Slack Variables min −10x_1 − 12x_2 − 12x_3 s.t. x_1 + 2x_2 + 2x_3 ≤ 20, 2x_1 + x_2 + 2x_3 ≤ 20, 2x_1 + 2x_2 + x_3 ≤ 20, x_1, x_2, x_3 ≥ 0 → LP in standard form min −10x_1 − 12x_2 − 12x_3 s.t. x_1 + 2x_2 + 2x_3 + x_4 = 20, 2x_1 + x_2 + 2x_3 + x_5 = 20, 2x_1 + 2x_2 + x_3 + x_6 = 20, x_1,..., x_6 ≥ 0 Observation The right hand side of the system is non-negative. Therefore the point (0, 0, 0, 20, 20, 20)^T is a basic feasible solution and we can start the simplex method with basis B(1) = 4, B(2) = 5, B(3) = 6. 79

Setting Up the Simplex Tableau x_1 x_2 x_3 x_4 x_5 x_6 | 0 | −10 −12 −12 0 0 0 | x_4 = 20 | 1 2 2 1 0 0 | x_5 = 20 | 2 1 2 0 1 0 | x_6 = 20 | 2 2 1 0 0 1 Determine pivot column (e. g., take smallest subscript rule). 80

Setting Up the Simplex Tableau x x 1 x 2 x 3 x 4 x 5 x B(i) 6 u i 0 10 12 12 0 0 0 x 4 = 20 1 2 2 1 0 0 20 x 5 = 20 2 1 2 0 1 0 10 x 6 = 20 2 2 1 0 0 1 10 Determine pivot column (e. g., take smallest subscript rule). c 1 < 0 and x 1 enters the basis. Find pivot row with u i > 0 minimizing x B(i) u i. Rows 2 and 3 both attain the minimum. 80

Setting Up the Simplex Tableau x x 1 x 2 x 3 x 4 x 5 x B(i) 6 u i 0 10 12 12 0 0 0 x 4 = 20 1 2 2 1 0 0 20 x 5 = 20 2 1 2 0 1 0 10 x 6 = 20 2 2 1 0 0 1 10 Determine pivot column (e. g., take smallest subscript rule). c 1 < 0 and x 1 enters the basis. Find pivot row with u i > 0 minimizing x B(i) u i. Rows 2 and 3 both attain the minimum. Choose i = 2 with B(i) = 5. = x 5 leaves the basis. 80

Setting Up the Simplex Tableau x 1 x 2 x 3 x 4 x 5 x 6 0 10 12 12 0 0 0 x 4 = 20 1 2 2 1 0 0 x 5 = 20 2 1 2 0 1 0 x 6 = 20 2 2 1 0 0 1 Determine pivot column (e. g., take smallest subscript rule). c 1 < 0 and x 1 enters the basis. Find pivot row with u i > 0 minimizing x B(i) u i. Rows 2 and 3 both attain the minimum. Choose i = 2 with B(i) = 5. = x 5 leaves the basis. Perform basis change: Eliminate other entries in the pivot column. 80

Setting Up the Simplex Tableau x 1 x 2 x 3 x 4 x 5 x 6 100 0 7 2 0 5 0 x 4 = 20 1 2 2 1 0 0 x 5 = 20 2 1 2 0 1 0 x 6 = 20 2 2 1 0 0 1 Determine pivot column (e. g., take smallest subscript rule). c 1 < 0 and x 1 enters the basis. Find pivot row with u i > 0 minimizing x B(i) u i. Rows 2 and 3 both attain the minimum. Choose i = 2 with B(i) = 5. = x 5 leaves the basis. Perform basis change: Eliminate other entries in the pivot column. 80

Setting Up the Simplex Tableau x 1 x 2 x 3 x 4 x 5 x 6 100 0 7 2 0 5 0 x 4 = 20 1 2 2 1 0 0 x 5 = 20 2 1 2 0 1 0 x 6 = 20 2 2 1 0 0 1 Determine pivot column (e. g., take smallest subscript rule). c 1 < 0 and x 1 enters the basis. Find pivot row with u i > 0 minimizing x B(i) u i. Rows 2 and 3 both attain the minimum. Choose i = 2 with B(i) = 5. = x 5 leaves the basis. Perform basis change: Eliminate other entries in the pivot column. 80

Setting Up the Simplex Tableau x 1 x 2 x 3 x 4 x 5 x 6 100 0 7 2 0 5 0 x 4 = 10 0 1.5 1 1 0.5 0 x 5 = 20 2 1 2 0 1 0 x 6 = 20 2 2 1 0 0 1 Determine pivot column (e. g., take smallest subscript rule). c 1 < 0 and x 1 enters the basis. Find pivot row with u i > 0 minimizing x B(i) u i. Rows 2 and 3 both attain the minimum. Choose i = 2 with B(i) = 5. = x 5 leaves the basis. Perform basis change: Eliminate other entries in the pivot column. 80

Setting Up the Simplex Tableau x 1 x 2 x 3 x 4 x 5 x 6 100 0 7 2 0 5 0 x 4 = 10 0 1.5 1 1 0.5 0 x 5 = 20 2 1 2 0 1 0 x 6 = 20 2 2 1 0 0 1 Determine pivot column (e. g., take smallest subscript rule). c 1 < 0 and x 1 enters the basis. Find pivot row with u i > 0 minimizing x B(i) u i. Rows 2 and 3 both attain the minimum. Choose i = 2 with B(i) = 5. = x 5 leaves the basis. Perform basis change: Eliminate other entries in the pivot column. 80

Setting Up the Simplex Tableau x 1 x 2 x 3 x 4 x 5 x 6 100 0 7 2 0 5 0 x 4 = 10 0 1.5 1 1 0.5 0 x 5 = 20 2 1 2 0 1 0 x 6 = 0 0 1 1 0 1 1 Determine pivot column (e. g., take smallest subscript rule). c 1 < 0 and x 1 enters the basis. Find pivot row with u i > 0 minimizing x B(i) u i. Rows 2 and 3 both attain the minimum. Choose i = 2 with B(i) = 5. = x 5 leaves the basis. Perform basis change: Eliminate other entries in the pivot column. 80

Setting Up the Simplex Tableau x 1 x 2 x 3 x 4 x 5 x 6 100 0 7 2 0 5 0 x 4 = 10 0 1.5 1 1 0.5 0 x 5 = 20 2 1 2 0 1 0 x 6 = 0 0 1 1 0 1 1 Determine pivot column (e. g., take smallest subscript rule). c 1 < 0 and x 1 enters the basis. Find pivot row with u i > 0 minimizing x B(i) u i. Rows 2 and 3 both attain the minimum. Choose i = 2 with B(i) = 5. = x 5 leaves the basis. Perform basis change: Eliminate other entries in the pivot column. 80

Setting Up the Simplex Tableau x 1 x 2 x 3 x 4 x 5 x 6 100 0 7 2 0 5 0 x 4 = 10 0 1.5 1 1 0.5 0 x 1 = 10 1 0.5 1 0 0.5 0 x 6 = 0 0 1 1 0 1 1 Determine pivot column (e. g., take smallest subscript rule). c 1 < 0 and x 1 enters the basis. Find pivot row with u i > 0 minimizing x B(i) u i. Rows 2 and 3 both attain the minimum. Choose i = 2 with B(i) = 5. = x 5 leaves the basis. Perform basis change: Eliminate other entries in the pivot column. 80

Setting Up the Simplex Tableau x 1 x 2 x 3 x 4 x 5 x 6 100 0 7 2 0 5 0 x 4 = 10 0 1.5 1 1 0.5 0 x 1 = 10 1 0.5 1 0 0.5 0 x 6 = 0 0 1 1 0 1 1 Determine pivot column (e. g., take smallest subscript rule). c 1 < 0 and x 1 enters the basis. Find pivot row with u i > 0 minimizing x B(i) u i. Rows 2 and 3 both attain the minimum. Choose i = 2 with B(i) = 5. = x 5 leaves the basis. Perform basis change: Eliminate other entries in the pivot column. Obtain new basic feasible solution (10, 0, 0, 10, 0, 0) T with cost -100. 80

Geometric Interpretation in the Original Polyhedron (figure): the feasible region with vertices A = (0, 0, 0)^T, B = (0, 0, 10)^T, C = (0, 10, 0)^T, D = (10, 0, 0)^T, E = (4, 4, 4)^T; the simplex method has moved from A to D. 81

Next Iterations x 1 x 2 x 3 x 4 x 5 x 6 100 0 7 2 0 5 0 x 4 = 10 0 1.5 1 1 0.5 0 x 1 = 10 1 0.5 1 0 0.5 0 x 6 = 0 0 1 1 0 1 1 c 2, c 3 < 0 = two possible choices for pivot column. 82

Next Iterations x x 1 x 2 x 3 x 4 x 5 x B(i) 6 u i 100 0 7 2 0 5 0 x 4 = 10 0 1.5 1 1 0.5 0 10 x 1 = 10 1 0.5 1 0 0.5 0 10 x 6 = 0 0 1 1 0 1 1 c 2, c 3 < 0 = two possible choices for pivot column. Choose x 3 to enter the new basis. u 3 < 0 = third row cannot be chosen as pivot row. Choose x 4 to leave basis. 82

Next Iterations x 1 x 2 x 3 x 4 x 5 x 6 100 0 7 2 0 5 0 x 4 = 10 0 1.5 1 1 0.5 0 x 1 = 10 1 0.5 1 0 0.5 0 x 6 = 0 0 1 1 0 1 1 c 2, c 3 < 0 = two possible choices for pivot column. Choose x 3 to enter the new basis. u 3 < 0 = third row cannot be chosen as pivot row. Choose x 4 to leave basis. 82

Next Iterations x 1 x 2 x 3 x 4 x 5 x 6 120 0 4 0 2 4 0 x 4 = 10 0 1.5 1 1 0.5 0 x 1 = 10 1 0.5 1 0 0.5 0 x 6 = 0 0 1 1 0 1 1 c 2, c 3 < 0 = two possible choices for pivot column. Choose x 3 to enter the new basis. u 3 < 0 = third row cannot be chosen as pivot row. Choose x 4 to leave basis. 82

Next Iterations x 1 x 2 x 3 x 4 x 5 x 6 120 0 4 0 2 4 0 x 4 = 10 0 1.5 1 1 0.5 0 x 1 = 10 1 0.5 1 0 0.5 0 x 6 = 0 0 1 1 0 1 1 c 2, c 3 < 0 = two possible choices for pivot column. Choose x 3 to enter the new basis. u 3 < 0 = third row cannot be chosen as pivot row. Choose x 4 to leave basis. 82

Next Iterations x 1 x 2 x 3 x 4 x 5 x 6 120 0 4 0 2 4 0 x 4 = 10 0 1.5 1 1 0.5 0 x 1 = 0 1 1 0 1 1 0 x 6 = 0 0 1 1 0 1 1 c 2, c 3 < 0 = two possible choices for pivot column. Choose x 3 to enter the new basis. u 3 < 0 = third row cannot be chosen as pivot row. Choose x 4 to leave basis. 82

Next Iterations x 1 x 2 x 3 x 4 x 5 x 6 120 0 4 0 2 4 0 x 4 = 10 0 1.5 1 1 0.5 0 x 1 = 0 1 1 0 1 1 0 x 6 = 0 0 1 1 0 1 1 c 2, c 3 < 0 = two possible choices for pivot column. Choose x 3 to enter the new basis. u 3 < 0 = third row cannot be chosen as pivot row. Choose x 4 to leave basis. 82

Next Iterations x 1 x 2 x 3 x 4 x 5 x 6 120 0 4 0 2 4 0 x 4 = 10 0 1.5 1 1 0.5 0 x 1 = 0 1 1 0 1 1 0 x 6 = 10 0 2.5 0 1 1.5 1 c 2, c 3 < 0 = two possible choices for pivot column. Choose x 3 to enter the new basis. u 3 < 0 = third row cannot be chosen as pivot row. Choose x 4 to leave basis. 82

Next Iterations x 1 x 2 x 3 x 4 x 5 x 6 120 0 4 0 2 4 0 x 3 = 10 0 1.5 1 1 0.5 0 x 1 = 0 1 1 0 1 1 0 x 6 = 10 0 2.5 0 1 1.5 1 c 2, c 3 < 0 = two possible choices for pivot column. Choose x 3 to enter the new basis. u 3 < 0 = third row cannot be chosen as pivot row. Choose x 4 to leave basis. New basic feasible solution (0, 0, 10, 0, 0, 10) T with cost -120, corresponding to point B in the original polyhedron. 82

Geometric Interpretation in the Original Polyhedron (figure): the feasible region with vertices A = (0, 0, 0)^T, B = (0, 0, 10)^T, C = (0, 10, 0)^T, D = (10, 0, 0)^T, E = (4, 4, 4)^T; the simplex method has moved from D to B. 83

Next Iterations x x 1 x 2 x 3 x 4 x 5 x B(i) 6 u i 120 0 4 0 2 4 0 20 x 3 = 10 0 1.5 1 1 0.5 0 3 x 1 = 0 1 1 0 1 1 0 x 6 = 10 0 2.5 0 1 1.5 1 4 Thus (4, 4, 4, 0, 0, 0) is an optimal solution with cost -136, corresponding to point E = (4, 4, 4) in the original polyhedron. 84

Next Iterations x x 1 x 2 x 3 x 4 x 5 x B(i) 6 u i 120 0 4 0 2 4 0 20 x 3 = 10 0 1.5 1 1 0.5 0 3 x 1 = 0 1 1 0 1 1 0 x 6 = 10 0 2.5 0 1 1.5 1 4 < 20 3 Thus (4, 4, 4, 0, 0, 0) is an optimal solution with cost -136, corresponding to point E = (4, 4, 4) in the original polyhedron. 84

Next Iterations x 1 x 2 x 3 x 4 x 5 x 6 120 0 4 0 2 4 0 x 3 = 10 0 1.5 1 1 0.5 0 x 1 = 0 1 1 0 1 1 0 x 6 = 10 0 2.5 0 1 1.5 1 x 2 enters the basis, x 6 leaves it. We get x 1 x 2 x 3 x 4 x 5 x 6 136 0 0 0 3.6 1.6 1.6 x 3 = 4 0 0 1 0.4 0.4 0.6 x 1 = 4 1 0 0 0.6 0.4 0.4 x 2 = 4 0 1 0 0.4 0.6 0.4 and the reduced costs are all non-negative. Thus (4, 4, 4, 0, 0, 0) is an optimal solution with cost -136, corresponding to point E = (4, 4, 4) in the original polyhedron. 84

All Iterations from Geometric Point of View (figure): the feasible region with vertices A = (0, 0, 0)^T, B = (0, 0, 10)^T, C = (0, 10, 0)^T, D = (10, 0, 0)^T, E = (4, 4, 4)^T; the simplex path visits A, D, B, and ends at the optimal vertex E. 85


Comparison of Full Tableau and Revised Simplex Methods The following table gives the computational cost of one iteration of the simplex method for the two variants introduced above: memory: full tableau O(mn), revised simplex O(m^2); worst-case time: full tableau O(mn), revised simplex O(mn); best-case time: full tableau O(mn), revised simplex O(m^2). Conclusion For implementation purposes, the revised simplex method is clearly preferable due to its smaller memory requirement and smaller average running time. The full tableau method is convenient for solving small LP instances by hand since all necessary information is readily available. 86

Practical Performance Enhancements Numerical stability The most critical issue when implementing the (revised) simplex method is numerical stability. In order to deal with this, a number of additional ideas from numerical linear algebra are needed. Every update of B^{-1} introduces roundoff or truncation errors which accumulate and might eventually lead to highly inaccurate results. Solution: Compute the matrix B^{-1} from scratch once in a while. Instead of computing B^{-1} explicitly, it can be stored as a product of matrices Q_k Q_{k−1} ··· Q_1 where each matrix Q_i can be specified in terms of m coefficients. Then B̄^{-1} = Q_{k+1} B^{-1} = Q_{k+1} ··· Q_1. This might also save space. Instead of computing B^{-1} explicitly, compute and store an LU decomposition. 87

Cycling Problem: If an LP is degenerate, the simplex method might end up in an infinite loop (cycling). Example: x_1 x_2 x_3 x_4 x_5 x_6 x_7 | 3 | −3/4 20 −1/2 6 0 0 0 | x_5 = 0 | 1/4 −8 −1 9 1 0 0 | x_6 = 0 | 1/2 −12 −1/2 3 0 1 0 | x_7 = 1 | 0 0 1 0 0 0 1 Pivoting rules Column selection: let the nonbasic variable with most negative reduced cost c̄_j enter the basis, i.e., steepest descent rule. Row selection: among basic variables that are eligible to exit the basis, select the one with smallest subscript. 88

Iteration 1 x x 1 x 2 x 3 x 4 x 5 x 6 x B(i) 7 u i 3 3/4 20 1/2 6 0 0 0 x 5 = 0 1/4 8 1 9 1 0 0 0 x 6 = 0 1/2 12 1/2 3 0 1 0 0 x 7 = 1 0 0 1 0 0 0 1 Basis change: x 1 enters the basis x 5 leaves. Bases visited (5, 6, 7) 89

Iteration 1 x 1 x 2 x 3 x 4 x 5 x 6 x 7 3 3/4 20 1/2 6 0 0 0 x 5 = 0 1/4 8 1 9 1 0 0 x 6 = 0 1/2 12 1/2 3 0 1 0 x 7 = 1 0 0 1 0 0 0 1 Basis change: x 1 enters the basis x 5 leaves. Bases visited (5, 6, 7) 89

Iteration 1 x 1 x 2 x 3 x 4 x 5 x 6 x 7 3 0 4 7/2 33 3 0 0 x 5 = 0 1/4 8 1 9 1 0 0 x 6 = 0 1/2 12 1/2 3 0 1 0 x 7 = 1 0 0 1 0 0 0 1 Basis change: x 1 enters the basis x 5 leaves. Bases visited (5, 6, 7) 89

Iteration 1 x 1 x 2 x 3 x 4 x 5 x 6 x 7 3 0 4 7/2 33 3 0 0 x 5 = 0 1/4 8 1 9 1 0 0 x 6 = 0 1/2 12 1/2 3 0 1 0 x 7 = 1 0 0 1 0 0 0 1 Basis change: x 1 enters the basis x 5 leaves. Bases visited (5, 6, 7) 89

Iteration 1 x 1 x 2 x 3 x 4 x 5 x 6 x 7 3 0 4 7/2 33 3 0 0 x 5 = 0 1/4 8 1 9 1 0 0 x 6 = 0 0 4 3/2 15 2 1 0 x 7 = 1 0 0 1 0 0 0 1 Basis change: x 1 enters the basis x 5 leaves. Bases visited (5, 6, 7) 89

Iteration 1 x 1 x 2 x 3 x 4 x 5 x 6 x 7 3 0 4 7/2 33 3 0 0 x 5 = 0 1/4 8 1 9 1 0 0 x 6 = 0 0 4 3/2 15 2 1 0 x 7 = 1 0 0 1 0 0 0 1 Basis change: x 1 enters the basis x 5 leaves. Bases visited (5, 6, 7) 89

Iteration 1 x 1 x 2 x 3 x 4 x 5 x 6 x 7 3 0 4 7/2 33 3 0 0 x 5 = 0 1/4 8 1 9 1 0 0 x 6 = 0 0 4 3/2 15 2 1 0 x 7 = 1 0 0 1 0 0 0 1 Basis change: x 1 enters the basis x 5 leaves. Bases visited (5, 6, 7) 89

Iteration 1 x 1 x 2 x 3 x 4 x 5 x 6 x 7 3 0 4 7/2 33 3 0 0 x 1 = 0 1 32 4 36 4 0 0 x 6 = 0 0 4 3/2 15 2 1 0 x 7 = 1 0 0 1 0 0 0 1 Basis change: x 1 enters the basis x 5 leaves. Bases visited (5, 6, 7) 89

Iteration 2 x x 1 x 2 x 3 x 4 x 5 x 6 x B(i) 7 u i 3 0 4 7/2 33 3 0 0 x 1 = 0 1 32 4 36 4 0 0 x 6 = 0 0 4 3/2 15 2 1 0 0 x 7 = 1 0 0 1 0 0 0 1 Basis change: x 2 enters the basis x 6 leaves. Bases visited (5, 6, 7) (1, 6, 7) 90

Iteration 3 x x 1 x 2 x 3 x 4 x 5 x 6 x B(i) 7 u i 3 0 0 2 18 1 1 0 x 1 = 0 1 0 8 156 12 4 0 0 x 2 = 0 0 1 3/8 15/4 1/2 1/4 0 0 x 7 = 1 0 0 1 0 0 0 1 1 Basis change: x 3 enters the basis x 1 leaves. Bases visited (5, 6, 7) (1, 6, 7) (1, 2, 7) 91

Iteration 4 x x 1 x 2 x 3 x 4 x 5 x 6 x B(i) 7 u i 3 1/4 0 0 3 2 3 0 x 3 = 0 1/8 0 1 21/2 3/2 1 0 x 2 = 0 3/64 1 0 3/16 1/16 1/8 0 0 x 7 = 1 1/8 0 0 21/2 3/2 1 1 2/21 Basis change: x 4 enters the basis x 2 leaves. Bases visited (5, 6, 7) (1, 6, 7) (1, 2, 7) (3, 2, 7) 92

Iteration 5 x x 1 x 2 x 3 x 4 x 5 x 6 x B(i) 7 u i 3 1/2 16 0 0 1 1 0 x 3 = 0 5/2 56 1 0 2 6 0 0 x 4 = 0 1/4 16/3 0 1 1/3 2/3 0 0 x 7 = 1 5/2 56 0 0 2 6 1 Basis change: x 5 enters the basis x 3 leaves. Bases visited (5, 6, 7) (1, 6, 7) (1, 2, 7) (3, 2, 7) (3, 4, 7) Observation After 4 pivoting iterations our basic feasible solution still has not changed. 93

Iteration 6 x x 1 x 2 x 3 x 4 x 5 x 6 x B(i) 7 u i 3 7/4 44 1/2 0 0 2 0 x 1 = 0 5/4 28 1/2 0 1 3 0 x 2 = 0 1/6 4 1/6 1 0 1/3 0 0 x 7 = 1 0 0 1 0 0 0 1 Basis change: x 6 enters the basis x 4 leaves. Bases visited (5, 6, 7) (1, 6, 7) (1, 2, 7) (3, 2, 7) (3, 4, 7) (5, 4, 7) 94

Back at the Beginning x 1 x 2 x 3 x 4 x 5 x 6 x 7 3 3/4 20 1/2 6 0 0 0 x 5 = 0 1/4 8 1 9 1 0 0 x 6 = 0 1/2 12 1/2 3 0 1 0 x 7 = 1 0 0 1 0 0 0 1 Bases visited (5, 6, 7) (1, 6, 7) (1, 2, 7) (3, 2, 7) (3, 4, 7) (5, 4, 7) (5, 6, 7) This is the same basis that we started with. Conclusion Continuing with the pivoting rules we agreed on at the beginning, the simplex method will never terminate in this example. 95

Anticycling We discuss two pivoting rules that are guaranteed to avoid cycling: the lexicographic rule and Bland's rule. 96

Lexicographic Order Definition 3.12. A vector u ∈ R^n is lexicographically positive (negative) if u ≠ 0 and the first nonzero entry of u is positive (negative). Symbolically, we write u >_L 0 (resp. u <_L 0). A vector u ∈ R^n is lexicographically larger (smaller) than a vector v ∈ R^n if u ≠ v and u − v >_L 0 (resp. u − v <_L 0). We write u >_L v (resp. u <_L v). Example: (0, 2, 3, 0)^T >_L (0, 2, 1, 4)^T, (0, 4, 5, 0)^T <_L (1, 2, 1, 2)^T 97
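
Illustration (not part of the lecture): a small helper sketch implementing the comparisons of Definition 3.12; the function names lex_positive and lex_greater are my own.

```python
# Sketch: lexicographic comparison as in Definition 3.12.
def lex_positive(u, eps=1e-12):
    """u >_L 0: u != 0 and its first nonzero entry is positive."""
    for ui in u:
        if abs(ui) > eps:
            return ui > 0
    return False                  # the zero vector is neither lex. positive nor negative

def lex_greater(u, v):
    """u >_L v  iff  u - v >_L 0."""
    return lex_positive([ui - vi for ui, vi in zip(u, v)])

print(lex_greater([0, 2, 3, 0], [0, 2, 1, 4]))    # True, as in the example
print(lex_greater([0, 4, 5, 0], [1, 2, 1, 2]))    # False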

Lexicographic Pivoting Rule Lexicographic pivoting rule in the full tableau implementation: 1 Choose an arbitrary column A_j with c̄_j < 0 to enter the basis. Let u := B^{-1} A_j be the jth column of the tableau. 2 For each i with u_i > 0, divide the ith row of the tableau by u_i and, among these rows, choose the lexicographically smallest one, say row l. Then the lth basic variable x_B(l) exits the basis. Remark The lexicographic pivoting rule always leads to a unique choice for the exiting variable. Otherwise two rows of B^{-1} A would have to be linearly dependent, which contradicts our assumption on the matrix A. 98

Lexicographic Pivoting Rule (cont.) Theorem 3.13. Suppose that the simplex algorithm starts with lexicographically positive rows 1,..., m in the simplex tableau. Suppose that the lexicographic pivoting rule is followed. Then: a Rows 1,..., m of the simplex tableau remain lexicographically positive throughout the algorithm. b c The zeroth row strictly increases lexicographically at each iteration. The simplex algorithm terminates after a finite number of iterations. Proof:... 99

Remarks on Lexicographic Pivoting Rule The lexicographic pivoting rule was derived by considering a small perturbation of the right hand side vector b leading to a non-degenerate problem (see exercises). The lexicographic pivoting rule can also be used in conjunction with the revised simplex method, provided that B^{-1} is computed explicitly (this is not the case in sophisticated implementations). The assumption in the theorem on the lexicographically positive rows in the tableau can be made without loss of generality: Rearrange the columns of A such that the basic columns (forming the identity matrix in the tableau) come first. Since the zeroth column is nonnegative for a basic feasible solution, all rows are lexicographically positive. 100

Bland's Rule Smallest subscript pivoting rule (Bland's rule) 1 Choose the column A_j with c̄_j < 0 and j minimal to enter the basis. 2 Among all basic variables x_i that could exit the basis, select the one with smallest i. Theorem (without proof) The simplex algorithm with Bland's rule terminates after a finite number of iterations. Remark Bland's rule is compatible with an implementation of the revised simplex method in which the reduced costs of the nonbasic variables are computed one at a time, in the natural order, until a negative one is discovered. 101
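
Illustration (not part of the lecture): a sketch of how one pivot choice under Bland's rule could be computed from the revised-simplex quantities; the function name, tolerance, and dense numpy representation are my own choices.

```python
# Sketch: Bland's rule pivot selection (0-based indices stand in for "smallest subscript").
import numpy as np

def blands_rule(A, b, c, basis, B_inv, eps=1e-9):
    """Return (entering column j, leaving row l), or (None, None) if the basis is optimal."""
    x_B = B_inv @ b
    c_bar = c - c[basis] @ B_inv @ A
    entering = [j for j in range(A.shape[1]) if j not in basis and c_bar[j] < -eps]
    if not entering:
        return None, None                          # current basis is optimal
    j = min(entering)                              # smallest eligible index enters
    u = B_inv @ A[:, j]
    rows = [i for i in range(len(u)) if u[i] > eps]
    if not rows:
        raise ValueError("problem is unbounded")
    theta = min(x_B[i] / u[i] for i in rows)
    ties = [i for i in rows if abs(x_B[i] / u[i] - theta) < eps]
    l = min(ties, key=lambda i: basis[i])          # smallest basic subscript leaves
    return j, l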

Finding an Initial Basic Feasible Solution So far we always assumed that the simplex algorithm starts with a basic feasible solution. We now discuss how such a solution can be obtained. Introducing artificial variables The two-phase simplex method The big-m method 102

Introducing Artificial Variables Example: min x_1 + x_2 + x_3 s.t. x_1 + 2x_2 + 3x_3 = 3, −x_1 + 2x_2 + 6x_3 = 2, 4x_2 + 9x_3 = 5, 3x_3 + x_4 = 1, x_1,..., x_4 ≥ 0 Auxiliary problem with artificial variables: min x_5 + x_6 + x_7 + x_8 s.t. x_1 + 2x_2 + 3x_3 + x_5 = 3, −x_1 + 2x_2 + 6x_3 + x_6 = 2, 4x_2 + 9x_3 + x_7 = 5, 3x_3 + x_4 + x_8 = 1, x_1,..., x_4, x_5,..., x_8 ≥ 0 103

Auxiliary Problem Auxiliary problem with artificial variables: min x_5 + x_6 + x_7 + x_8 s.t. x_1 + 2x_2 + 3x_3 + x_5 = 3, −x_1 + 2x_2 + 6x_3 + x_6 = 2, 4x_2 + 9x_3 + x_7 = 5, 3x_3 + x_4 + x_8 = 1, x_1,..., x_4, x_5,..., x_8 ≥ 0 Observation x = (0, 0, 0, 0, 3, 2, 5, 1) is a basic feasible solution for this problem with basic variables (x_5, x_6, x_7, x_8). We can form the initial tableau. 104
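
Illustration (not part of the lecture): a general sketch of the phase-one construction, not the tableau computation that follows; the function name phase_one_data and the sign-flip handling for negative right-hand sides are my own choices.

```python
# Sketch: build the auxiliary problem  min 1^T y  s.t.  A x + I y = b,  x, y >= 0,
# whose artificial variables give an obvious starting basic feasible solution.
import numpy as np

def phase_one_data(A, b):
    A = np.asarray(A, dtype=float).copy()
    b = np.asarray(b, dtype=float).copy()
    neg = b < 0
    A[neg] *= -1; b[neg] *= -1            # make the right-hand side nonnegative
    m, n = A.shape
    A_aux = np.hstack([A, np.eye(m)])     # artificial variables x_{n+1}, ..., x_{n+m}
    c_aux = np.r_[np.zeros(n), np.ones(m)]
    basis = list(range(n, n + m))         # the artificials form the starting basis
    return A_aux, b, c_aux, basis

# The auxiliary problem has optimal value 0 iff A x = b, x >= 0 is feasible;
# the example on this slide fits this pattern with m = 4 constraints.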

Initial Tableau x 1 x 2 x 3 x 4 x 5 x 6 x 7 x 8 0 0 0 0 0 1 1 1 1 x 5 = 3 1 2 3 0 1 0 0 0 x 6 = 2 1 2 6 0 0 1 0 0 x 7 = 5 0 4 9 0 0 0 1 0 x 8 = 1 0 0 3 1 0 0 0 1 Calculate reduced costs by eliminating the nonzero-entries for the basis-variables. Now we can proceed as seen before... 105

Initial Tableau x 1 x 2 x 3 x 4 x 5 x 6 x 7 x 8 3 1 2 3 0 0 1 1 1 x 5 = 3 1 2 3 0 1 0 0 0 x 6 = 2 1 2 6 0 0 1 0 0 x 7 = 5 0 4 9 0 0 0 1 0 x 8 = 1 0 0 3 1 0 0 0 1 Calculate reduced costs by eliminating the nonzero-entries for the basis-variables. Now we can proceed as seen before... 105

Initial Tableau x 1 x 2 x 3 x 4 x 5 x 6 x 7 x 8 5 0 4 9 0 0 0 1 1 x 5 = 3 1 2 3 0 1 0 0 0 x 6 = 2 1 2 6 0 0 1 0 0 x 7 = 5 0 4 9 0 0 0 1 0 x 8 = 1 0 0 3 1 0 0 0 1 Calculate reduced costs by eliminating the nonzero-entries for the basis-variables. Now we can proceed as seen before... 105

Initial Tableau x 1 x 2 x 3 x 4 x 5 x 6 x 7 x 8 10 0 8 18 0 0 0 0 1 x 5 = 3 1 2 3 0 1 0 0 0 x 6 = 2 1 2 6 0 0 1 0 0 x 7 = 5 0 4 9 0 0 0 1 0 x 8 = 1 0 0 3 1 0 0 0 1 Calculate reduced costs by eliminating the nonzero-entries for the basis-variables. Now we can proceed as seen before... 105

Initial Tableau x 1 x 2 x 3 x 4 x 5 x 6 x 7 x 8 11 0 8 21 1 0 0 0 0 x 5 = 3 1 2 3 0 1 0 0 0 x 6 = 2 1 2 6 0 0 1 0 0 x 7 = 5 0 4 9 0 0 0 1 0 x 8 = 1 0 0 3 1 0 0 0 1 Calculate reduced costs by eliminating the nonzero-entries for the basis-variables. Now we can proceed as seen before... 105

Minimizing the Auxiliary Problem x 1 x 2 x 3 x 4 x 5 x 6 x 7 x 8 11 0 8 21 1 0 0 0 0 x 5 = 3 1 2 3 0 1 0 0 0 x 6 = 2 1 2 6 0 0 1 0 0 x 7 = 5 0 4 9 0 0 0 1 0 x 8 = 1 0 0 3 1 0 0 0 1 Basis change: x 4 enters the basis, x 8 exits. 106

Minimizing the Auxiliary Problem x 1 x 2 x 3 x 4 x 5 x 6 x 7 x 8 10 0 8 18 0 0 0 0 1 x 5 = 3 1 2 3 0 1 0 0 0 x 6 = 2 1 2 6 0 0 1 0 0 x 7 = 5 0 4 9 0 0 0 1 0 x 4 = 1 0 0 3 1 0 0 0 1 Basis change: x 3 enters the basis, x 4 exits. 107

Minimizing the Auxiliary Problem x 1 x 2 x 3 x 4 x 5 x 6 x 7 x 8 4 0 8 0 6 0 0 0 7 x 5 = 2 1 2 0 1 1 0 0 1 x 6 = 0 1 2 0 2 0 1 0 2 x 7 = 2 0 4 0 3 0 0 1 3 x 3 = 1/3 0 0 1 1/3 0 0 0 1/3 Basis change: x 2 enters the basis, x 6 exits. 108

Minimizing the Auxiliary Problem x 1 x 2 x 3 x 4 x 5 x 6 x 7 x 8 4 4 0 0 2 0 4 0 1 x 5 = 2 2 0 0 1 1 1 0 1 x 2 = 0 1/2 1 0 1 0 1/2 0 1 x 7 = 2 0 0 1 0 2 1 1 1 x 3 = 1/3 0 0 1 1/3 0 0 0 1/3 Basis change: x 1 enters the basis, x 5 exits. 109

Minimizing the Auxiliary Problem x 1 x 2 x 3 x 4 x 5 x 6 x 7 x 8 0 0 0 0 0 2 2 0 1 x 1 = 1 1 0 0 1/2 1/2 1/2 0 1/2 x 2 = 1/2 0 1 0 3/4 1/4 1/4 0 3/4 x 7 = 0 0 0 0 0 1 1 1 0 x 3 = 1/3 0 0 1 1/3 0 0 0 1/3 Basic feasible solution for auxiliary problem with (auxiliary) cost value 0 Also feasible for the original problem - but not (yet) basic. 110

Obtaining a Basis for the Original Problem x 1 x 2 x 3 x 4 x 5 x 6 x 7 x 8 0 0 0 0 0 2 2 0 1 x 1 = 1 1 0 0 1/2 1/2 1/2 0 1/2 x 2 = 1/2 0 1 0 3/4 1/4 1/4 0 3/4 x 7 = 0 0 0 0 0 1 1 1 0 x 3 = 1/3 0 0 1 1/3 0 0 0 1/3 Observation Restricting the tableau to the original variables, we get a zero row. Thus the original equations are linearly dependent. We can remove the third row. 111

Obtaining a Basis for the Original Problem x 1 x 2 x 3 x 4 11/6 0 0 0 1/12 x 1 = 1 1 0 0 1/2 x 2 = 1/2 0 1 0 3/4 x 3 = 1/3 0 0 1 1/3 We finally obtain a basic feasible solution for the original problem. After computing the reduced costs for this basis (as seen in the beginning), the simplex method can start with its typical iterations. 112

Omitting Artificial Variables Auxiliary problem min x 5 +x 6 +x 7 +x 8 s.t. x 1 +2 x 2 +3 x 3 +x 5 = 3 x 1 +2 x 2 +6 x 3 +x 6 = 2 4 x 2 +9 x 3 +x 7 = 5 3 x 3 +x 4 +x 8 = 1 x 1,..., x 8 ≥ 0 Artificial variable x 8 could have been omitted by setting x 4 to 1 in the initial basis. This is possible as x 4 only appears in one constraint. Generally, this can be done, e. g., with all slack variables that have nonnegative right hand sides. 113

Phase I of the Simplex Method Given: LP in standard form: min{c^T x | A x = b, x ≥ 0} 1 Transform the problem such that b ≥ 0 (multiply constraints by −1 where necessary). 2 Introduce artificial variables y_1,..., y_m and solve the auxiliary problem min ∑_{i=1}^m y_i s.t. A x + I_m y = b, x, y ≥ 0. 3 If the optimal cost is positive, then STOP (original LP is infeasible). 4 If no artificial variable is in the final basis, eliminate the artificial variables and columns and STOP (a feasible basis for the original LP has been found). 5 If the lth basic variable is artificial, find j ∈ {1,..., n} such that the lth entry of B^{-1} A_j is nonzero. Use this entry as pivot element and replace the lth basic variable with x_j. 6 If no such j ∈ {1,..., n} exists, eliminate the lth row (constraint). 114

The Two-phase Simplex Method Two-phase simplex method 1 Given an LP in standard form, first run phase I. 2 If phase I yields a basic feasible solution for the original LP, enter phase II (see above). Possible outcomes of the two-phase simplex method i Problem is infeasible (detected in phase I). ii Problem is feasible but rows of A are linearly dependent (detected and corrected at the end of phase I by eliminating redundant constraints). iii Optimal cost is −∞ (detected in phase II). iv Problem has an optimal basic feasible solution (found in phase II). Remark: (ii) is not an outcome but only an intermediate result leading to outcome (iii) or (iv). 115

Big-M Method Alternative idea: Combine the two phases into one by introducing sufficiently large penalty costs for artificial variables. This way, the LP min ∑_{i=1}^n c_i x_i s.t. A x = b, x ≥ 0 becomes min ∑_{i=1}^n c_i x_i + M ∑_{j=1}^m y_j s.t. A x + I_m y = b, x, y ≥ 0 Remark: If M is sufficiently large and the original program has a feasible solution, all artificial variables will be driven to zero by the simplex method. 116

How to Choose M? Observation Initially, M only occurs in the zeroth row. As the zeroth row never becomes the pivot row, this property is maintained while the simplex method is running. All we need is an order on all values that can appear as reduced cost coefficients. Order on cost coefficients a·M + b < c·M + d :⟺ (a < c) ∨ (a = c ∧ b < d) In particular, −a·M + b < 0 < a·M + b for any positive a and arbitrary b, so we can decide whether a cost coefficient is negative or not. There is no need to give M a fixed numerical value. 117
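This order can be implemented symbolically. A small Python sketch (the class name is our own) that compares reduced costs of the form a·M + b without ever fixing a numerical value for M:

from dataclasses import dataclass

@dataclass(frozen=True)
class MCoeff:
    a: float  # coefficient of M
    b: float  # constant part
    def __lt__(self, other):
        # a*M + b < c*M + d  iff  (a < c) or (a = c and b < d)
        return (self.a, self.b) < (other.a, other.b)
    def is_negative(self):
        return self < MCoeff(0.0, 0.0)

# e.g. a reduced cost of the form -8*M + 1 is negative, so its column may enter the basis
assert MCoeff(-8, 1).is_negative()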

Example Example: min x 1 + x 2 + x 3 s.t. x 1 + 2 x 2 + 3 x 3 = 3 x 1 + 2 x 2 + 6 x 3 = 2 4 x 2 + 9 x 3 = 5 3 x 3 + x 4 = 1 x 1,..., x 4 0 118

Introducing Artificial Variables and M Auxiliary problem: min x 1 +x 2 +x 3 + M x 5 + M x 6 + M x 7 s.t. x 1 +2 x 2 +3 x 3 x 5 = 3 x 1 +2 x 2 +6 x 3 + x 6 = 2 4 x 2 +9 x 3 + x 7 = 5 3 x 3 +x 4 = 1 x 1,..., x 4, x 5, x 6, x 7 0 Note that this time the unnecessary artificial variable x 8 has been omitted. We start off with (x 5, x 6, x 7, x 4 ) = (3, 2, 5, 1). 119

Forming the Initial Tableau x 1 x 2 x 3 x 4 x 5 x 6 x 7 0 1 1 1 0 M M M 3 1 2 3 0 1 0 0 2 1 2 6 0 0 1 0 5 0 4 9 0 0 0 1 1 0 0 3 1 0 0 0 Compute reduced costs by eliminating the nonzero entries for the basic variables. 120

Forming the Initial Tableau x 1 x 2 x 3 x 4 x 5 x 6 x 7 3M M + 1 2M + 1 3M + 1 0 0 M M 3 1 2 3 0 1 0 0 2 1 2 6 0 0 1 0 5 0 4 9 0 0 0 1 1 0 0 3 1 0 0 0 Compute reduced costs by eliminating the nonzero entries for the basic variables. 120

Forming the Initial Tableau x 1 x 2 x 3 x 4 x 5 x 6 x 7 5M 1 4M + 1 9M + 1 0 0 0 M 3 1 2 3 0 1 0 0 2 1 2 6 0 0 1 0 5 0 4 9 0 0 0 1 1 0 0 3 1 0 0 0 Compute reduced costs by eliminating the nonzero entries for the basic variables. 120

Forming the Initial Tableau x 1 x 2 x 3 x 4 x 5 x 6 x 7 10M 1 8M + 1 18M + 1 0 0 0 0 3 1 2 3 0 1 0 0 2 1 2 6 0 0 1 0 5 0 4 9 0 0 0 1 1 0 0 3 1 0 0 0 Compute reduced costs by eliminating the nonzero entries for the basic variables. 120

First Iteration x 1 x 2 x 3 x 4 x 5 x 6 x 7 10M 1 8M + 1 18M + 1 0 0 0 0 3 1 2 3 0 1 0 0 2 1 2 6 0 0 1 0 5 0 4 9 0 0 0 1 1 0 0 3 1 0 0 0 Reduced costs for x 2 and x 3 are negative. Basis change: x 3 enters the basis, x 4 leaves. 121

Second Iteration x 1 x 2 x 3 x 4 x 5 x 6 x 7 4M 1/3 1 8M + 1 0 6M 1/3 0 0 0 2 1 2 0 1 1 0 0 0 1 2 0 2 0 1 0 2 0 4 0 3 0 0 1 1/3 0 0 1 1/3 0 0 0 Basis change: x 2 enters the basis, x 6 leaves. 122

Third Iteration x 1 x 2 x 3 x 4 x 5 x 6 x 7 4M 1/3 4M + 3/2 0 0 2M + 2/3 0 4M 1/2 0 2 2 0 0 1 1 1 0 0 1/2 1 0 1 0 1/2 0 2 2 0 0 1 0 2 1 1/3 0 0 1 1/3 0 0 0 Basis change: x 1 enters the basis, x 5 leaves. 123

Fourth Iteration x 1 x 2 x 3 x 4 x 5 x 6 x 7 11/6 0 0 0 1/12 2M 3/4 2M + 1/4 0 21 1 0 0 1/2 1/2 1/2 0 1/2 0 1 0 3/4 1/4 1/4 0 0 0 0 0 0 1 1 1 1/3 0 0 1 1/3 0 0 0 Note that all artificial variables have already been driven to 0. Basis change: x 4 enters the basis, x 3 leaves. 124

Fifth Iteration x 1 x 2 x 3 x 4 x 5 x 6 x 7 7/4 0 0 1/4 0 2M 3/4 2M + 1/4 0 1/2 1 0 3/2 0 1/2 1/2 0 5/4 0 1 9/4 0 1/4 1/4 0 0 0 0 0 0 1 1 1 1 0 0 3 1 0 0 0 We now have an optimal solution of the auxiliary problem, as all reduced costs are nonnegative (M presumed large enough). By eliminating the third row as in the previous example, we get a basic feasible and also optimal solution to the original problem. 125

Computational Efficiency of the Simplex Method Observation The computational efficiency of the simplex method is determined by i ii the computational effort of each iteration; the number of iterations. Question: How many iterations are needed in the worst case? Idea for negative answer (lower bound) Describe a polyhedron with an exponential number of vertices; a path that visits all vertices and always moves from a vertex to an adjacent one that has lower costs. 126

Computational Efficiency of the Simplex Method Unit cube Consider the unit cube in R^n, defined by the constraints 0 ≤ x_i ≤ 1, i = 1,..., n. The unit cube has 2^n vertices; a spanning path, i. e., a path traveling the edges of the cube visiting each vertex exactly once. [Figure: spanning paths of the unit cube for n = 2 and n = 3] 127

Computational Efficiency of the Simplex Method (cont.) Klee-Minty cube Consider a perturbation of the unit cube in R^n, defined by the constraints 0 ≤ x_1 ≤ 1, ε·x_{i−1} ≤ x_i ≤ 1 − ε·x_{i−1}, i = 2,..., n for some ε ∈ (0, 1/2). [Figure: Klee-Minty cubes for n = 2 and n = 3] 128

Computational Efficiency of the Simplex Method (cont.) Klee-Minty cube 0 ≤ x_1 ≤ 1, ε·x_{i−1} ≤ x_i ≤ 1 − ε·x_{i−1}, i = 2,..., n, ε ∈ (0, 1/2) Theorem 3.14. Consider the linear programming problem of minimizing −x_n subject to the constraints above. Then, a the feasible set has 2^n vertices; b the vertices can be ordered so that each one is adjacent to and has lower cost than the previous one; c there exists a pivoting rule under which the simplex method requires 2^n − 1 changes of basis before it terminates. 129
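For concreteness, a short Python sketch that writes the Klee-Minty constraints in the form A x ≤ b; the function name is our own, and feeding the result to an LP solver is left to the reader.

def klee_minty_constraints(n, eps=0.3):
    # returns (A, b) with A x <= b encoding
    # 0 <= x_1 <= 1 and eps*x_{i-1} <= x_i <= 1 - eps*x_{i-1} for i = 2, ..., n
    A, b = [], []
    def row(entries):
        r = [0.0] * n
        for idx, val in entries:
            r[idx] = val
        return r
    A.append(row([(0, -1.0)])); b.append(0.0)   # -x_1 <= 0
    A.append(row([(0, 1.0)]));  b.append(1.0)   #  x_1 <= 1
    for i in range(1, n):
        A.append(row([(i - 1, eps), (i, -1.0)])); b.append(0.0)  # eps*x_{i-1} - x_i <= 0
        A.append(row([(i - 1, eps), (i, 1.0)]));  b.append(1.0)  # eps*x_{i-1} + x_i <= 1
    return A, b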

Diameter of Polyhedra Definition 3.15. The distance d(x, y) between two vertices x, y is the minimum number of edges required to reach y starting from x. The diameter D(P) of polyhedron P is the maximum d(x, y) over all pairs of vertices (x, y). Δ(n, m) is the maximum D(P) over all polytopes in R^n that are represented in terms of m inequality constraints. Δ_u(n, m) is the maximum D(P) over all polyhedra in R^n that are represented in terms of m inequality constraints. Example: Δ(2, 8) = ⌊8/2⌋ = 4, Δ_u(2, 8) = 8 − 2 = 6 130

Hirsch Conjecture Observation: The diameter of the feasible set in a linear programming problem is a lower bound on the number of steps required by the simplex method, no matter which pivoting rule is being used. Polynomial Hirsch Conjecture: Δ(n, m) ≤ poly(m, n) Remarks Known lower bound: Δ_u(n, m) ≥ m − n + ⌊n/5⌋ Known upper bounds: Δ(n, m) ≤ Δ_u(n, m) < m^{1+log_2 n} = (2n)^{log_2 m} The Strong Hirsch Conjecture Δ(n, m) ≤ m − n was disproven in 2010 by Paco Santos for n = 43, m = 86. 131

Average Case Behavior of the Simplex Method Despite the exponential lower bounds on the worst case behavior of the simplex method (Klee-Minty cubes etc.), the simplex method usually behaves well in practice. The number of iterations is typically O(m). There have been several attempts to explain this phenomenon from a more theoretical point of view. These results say that on average the number of iterations is O( ) (usually polynomial). One main difficulty is to come up with a meaningful and, at the same time, manageable definition of the term on average. 132

Chapter 4: Duality Theory (cp. Bertsimas & Tsitsiklis, Chapter 4) 133

Motivation For A ∈ R^{m×n}, b ∈ R^m, and c ∈ R^n, consider the linear program min c^T x s.t. A x ≥ b, x ≥ 0 Question: How to derive lower bounds on the optimal solution value? Idea: For p ∈ R^m with p ≥ 0: A x ≥ b ⟹ (p^T A) x ≥ p^T b Thus, if c^T ≥ p^T A, then c^T x ≥ (p^T A) x ≥ p^T b for all feasible solutions x. Find the best (largest) lower bound in this way: max p^T b s.t. p^T A ≤ c^T, p ≥ 0 (equivalently: max b^T p s.t. A^T p ≤ c, p ≥ 0) This LP is the dual linear program of our initial LP. 134

Primal and Dual Linear Program Consider the general linear program: min c^T x s.t. a_i^T x ≥ b_i for i ∈ M_1, a_i^T x ≤ b_i for i ∈ M_2, a_i^T x = b_i for i ∈ M_3, x_j ≥ 0 for j ∈ N_1, x_j ≤ 0 for j ∈ N_2, x_j free for j ∈ N_3. Obtain a lower bound: max p^T b s.t. p_i ≥ 0 for i ∈ M_1, p_i ≤ 0 for i ∈ M_2, p_i free for i ∈ M_3, p^T A_j ≤ c_j for j ∈ N_1, p^T A_j ≥ c_j for j ∈ N_2, p^T A_j = c_j for j ∈ N_3. The linear program on the right hand side is the dual linear program of the primal linear program on the left hand side. 135

Primal and Dual Variables and Constraints
primal LP (minimize)                 dual LP (maximize)
constraints: a_i^T x ≥ b_i           variables:   p_i ≥ 0
             a_i^T x ≤ b_i                        p_i ≤ 0
             a_i^T x = b_i                        p_i free
variables:   x_j ≥ 0                 constraints: p^T A_j ≤ c_j
             x_j ≤ 0                              p^T A_j ≥ c_j
             x_j free                             p^T A_j = c_j
136

Examples primal LP: min c^T x s.t. A x ≥ b; dual LP: max p^T b s.t. p^T A = c^T, p ≥ 0. primal LP: min c^T x s.t. A x = b, x ≥ 0; dual LP: max p^T b s.t. p^T A ≤ c^T. 137

Basic Properties of the Dual Linear Program Theorem 4.1. The dual of the dual LP is the primal LP. Proof:... Theorem 4.2. Let Π 1 and Π 2 be two LPs where Π 2 has been obtained from Π 1 by (several) transformations of the following type: i ii iii replace a free variable by the difference of two non-negative variables; introduce a slack variable in order to replace an inequality constraint by an equation; if some row of a feasible equality system is a linear combination of the other rows, eliminate this row. Then the dual of Π 1 is equivalent to the dual of Π 2. Proof:... 138

Weak Duality Theorem Theorem 4.3. If x is a feasible solution to the primal LP (minimization problem) and p a feasible solution to the dual LP (maximization problem), then c^T x ≥ p^T b. Proof:... Corollary 4.4. Consider a primal-dual pair of linear programs as above. a If the primal LP is unbounded (i. e., optimal cost = −∞), then the dual LP is infeasible. b If the dual LP is unbounded (i. e., optimal cost = +∞), then the primal LP is infeasible. c If x and p are feasible solutions to the primal and dual LP, resp., and if c^T x = p^T b, then x and p are optimal solutions. 139

Strong Duality Theorem Theorem 4.5. If an LP has an optimal solution, so does its dual and the optimal costs are equal. Proof:... 140

Different Possibilities for Primal and Dual LP primal \ dual finite optimum unbounded infeasible finite optimum possible impossible impossible unbounded impossible impossible possible infeasible impossible possible possible Example of infeasible primal and dual LP: min x 1 + 2 x 2 max p 1 + 3 p 2 s.t. x 1 + x 2 = 1 s.t. p 1 + 2 p 2 = 1 2 x 1 + 2 x 2 = 3 p 1 + 2 p 2 = 2 141

Complementary Slackness Consider the following pair of primal and dual LPs: min c^T x s.t. A x ≥ b and max p^T b s.t. p^T A = c^T, p ≥ 0. If x and p are feasible solutions, then c^T x = p^T A x ≥ p^T b. Thus, c^T x = p^T b ⟺ p_i = 0 for all i with a_i^T x > b_i. Theorem 4.6. Consider an arbitrary pair of primal and dual LPs. Let x and p be feasible solutions to the primal and dual LP, respectively. Then x and p are both optimal if and only if u_i := p_i (a_i^T x − b_i) = 0 for all i, and v_j := (c_j − p^T A_j) x_j = 0 for all j. Proof:... 142

Geometric View Consider pair of primal and dual LPs with A R m n and rank(a) = n: min c T x max p T b s.t. A x b s.t. p T A = c T p 0 Let I {1,..., m} with I = n and a i, i I, linearly independent. = a i T x = b i, i I, has unique solution x I (basic solution) Let p R m (dual vector). Then x, p are optimal solutions if i ii iii iv a i T x b i for all i (primal feasibility) p i = 0 for all i I (complementary slackness) m i=1 p i a i = c (dual feasibility) p 0 (dual feasibility) (ii) and (iii) imply i I p i a i = c which has a unique solution p I. The a i, i I, form basis for dual LP and p I is corresponding basic solution. 143

Geometric View (cont.) [Figure: constraint vectors a_1,..., a_5 and cost vector c at four basic solutions A, B, C, D] 144

Dual Variables as Marginal Costs Consider the primal dual pair: min c^T x s.t. A x = b, x ≥ 0 and max p^T b s.t. p^T A ≤ c^T. Let x be an optimal basic feasible solution to the primal LP with basis B, i. e., x_B = B^{-1} b, and assume that x_B > 0 (i. e., x non-degenerate). Replace b by b + d. For small d, the basis B remains feasible and optimal: B^{-1}(b + d) = B^{-1} b + B^{-1} d ≥ 0 (feasibility), c̄^T = c^T − c_B^T B^{-1} A ≥ 0 (optimality). Optimal cost of the perturbed problem is c_B^T B^{-1}(b + d) = c_B^T x_B + (c_B^T B^{-1}) d, where p^T := c_B^T B^{-1}. Thus, p_i is the marginal cost per unit increase of b_i. 145

Dual Variables as Shadow Prices Diet problem: a ij := amount of nutrient i in one unit of food j b i := requirement of nutrient i in some ideal diet c j := cost of one unit of food j on the food market LP duality: Let x j := number of units of food j in the diet: min c T x max p T b s.t. A x = b s.t. p T A c T x 0 Dual interpretation: p i is fair price per unit of nutrient i p T A j is value of one unit of food j on the nutrient market food j used in ideal diet (xj > 0) is consistently priced at the two markets (by complementary slackness) ideal diet has the same value on both markets (by strong duality) 146

Dual Basic Solutions Consider LP in standard form with A R m n, rank(a) = m, and dual LP: Observation 4.7. A basis B yields min c T x max p T b s.t. A x = b s.t. p T A c T x 0 a primal basic solution given by x B := B 1 b and a dual basic solution p T := c B T B 1. Moreover, a the values of the primal and the dual basic solutions are equal: b p is feasible if and only if c 0; c c B T x B = c B T B 1 b = p T b ; reduced cost c i = 0 corresponds to active dual constraint; d p is degenerate if and only if c i = 0 for some non-basic variable x i. 147

Dual Simplex Method Let B be a basis whose corresponding dual basic solution p is feasible. If also the primal basic solution x is feasible, then x, p are optimal. Assume that x B(l) < 0 and consider the lth row of the simplex tableau (x B(l), v 1,..., v n ) (pivot row) I Let j {1,..., n} with v j < 0 and c j v j = min c i i:v i <0 v i Performing an iteration of the simplex method with pivot element v j yields new basis B and corresponding dual basic solution p with II c B T B 1 A c T and p T b p T b (with > if c j > 0). If v i 0 for all i {1,..., n}, then the dual LP is unbounded and the primal LP is infeasible. 148

Remarks on the Dual Simplex Method Dual simplex method terminates if lexicographic pivoting rule is used: Choose any row l with xb(l) < 0 to be the pivot row. Among all columns j with vj < 0 choose the one which is lexicographically minimal when divided by v j. Dual simplex method is useful if, e. g., dual basic solution is readily available. Example: Resolve LP after right-hand-side b has changed. 149

Chapter 5: Optimal Trees and Paths (cp. Cook, Cunningham, Pulleyblank & Schrijver, Chapter 2) 150

Trees and Forests Definition 5.1. i An undirected graph having no circuit is called a forest. ii A connected forest is called a tree. Theorem 5.2. Let G = (V, E) be an undirected graph on n = V nodes. Then, the following statements are equivalent: i ii iii iv v vi G is a tree. G has n 1 edges and no circuit. G has n 1 edges and is connected. G is connected. If an arbitrary edge is removed, the resulting subgraph is disconnected. G has no circuit. Adding an arbitrary edge to G creates a circuit. G contains a unique path between any pair of nodes. Proof: See exercises. 151

Kruskal s Algorithm Minimum Spanning Tree (MST) Problem Given: connected graph G = (V, E), cost function c : E R. Task: find spanning tree T = (V, F ) of G with minimum cost e F c(e). Kruskal s Algorithm for MST 1 Sort the edges in E such that c(e 1 ) c(e 2 ) c(e m ). 2 Set T := (V, ). 3 For i := 1 to m do: If adding e i to T does not create a circuit, then add e i to T. 152
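A compact Python sketch of Kruskal's Algorithm with a union-find structure for the circuit test; the function name, node labels 0, ..., n-1, and the edge format are our own assumptions.

def kruskal(n, edges):
    # edges: list of (cost, v, w); returns the list of MST edges (v, w, cost)
    parent = list(range(n))
    def find(v):                      # root of v's component, with path compression
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    tree = []
    for c, v, w in sorted(edges):     # step 1: consider edges in order of nondecreasing cost
        rv, rw = find(v), find(w)
        if rv != rw:                  # adding {v, w} creates no circuit
            parent[rv] = rw
            tree.append((v, w, c))
    return tree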

Example for Kruskal's Algorithm [Figure: example graph on nodes a, b, d, f, g, h, k with edge costs] 153

Prim's Algorithm Notation: For a graph G = (V, E) and A ⊆ V let δ(A) := {e = {v, w} ∈ E | v ∈ A and w ∈ V \ A}. We call δ(A) the cut induced by A. Prim's Algorithm for MST 1 Set U := {r} for some node r ∈ V and F := ∅; set T := (U, F ). 2 While U ≠ V, determine a minimum cost edge e ∈ δ(U). 3 Set F := F ∪ {e} and U := U ∪ {w} with e = {v, w}, w ∈ V \ U. 154
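A Python sketch of Prim's Algorithm; it uses a binary heap, which gives running time O(m log n) instead of the O(n^2) bookkeeping discussed later, and the function name and adjacency format are our own assumptions.

import heapq

def prim(n, adj, r=0):
    # adj[v]: list of (cost, w) for edges {v, w}; returns MST edges (v, w, cost)
    in_tree = [False] * n
    in_tree[r] = True
    heap = [(c, r, w) for c, w in adj[r]]
    heapq.heapify(heap)
    tree = []
    while heap and len(tree) < n - 1:
        c, v, w = heapq.heappop(heap)     # cheapest edge leaving the current tree
        if in_tree[w]:
            continue
        in_tree[w] = True
        tree.append((v, w, c))
        for c2, u in adj[w]:
            if not in_tree[u]:
                heapq.heappush(heap, (c2, w, u))
    return tree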

Example for Prim's Algorithm [Figure: the same example graph as for Kruskal's Algorithm] 155

Correctness of the MST Algorithms Lemma 5.3. A graph G = (V, E) is connected if and only if there is no set A V, A V, with δ(a) =. Proof:... Notation: We say that B E is extendible to an MST if B is contained in the edge-set of some MST of G. Theorem 5.4. Let B E be extendible to an MST and A V with B δ(a) =. If e is a min-cost edge in δ(a), then B {e} is extendible to an MST. Proof:... Correctness of Prim s Algorithm immediately follows. Kruskal: Whenever an edge e = {v, w} is added, it is cheapest edge in cut induced by subset of nodes currently reachable from v. 156

Efficiency of Prim's Algorithm Prim's Algorithm for MST 1 Set U := {r} for some node r ∈ V and F := ∅; set T := (U, F ). 2 While U ≠ V, determine a minimum cost edge e ∈ δ(U). 3 Set F := F ∪ {e} and U := U ∪ {w} with e = {v, w}, w ∈ V \ U. Straightforward implementation achieves running time O(nm) where, as usual, n := |V| and m := |E|: the while-loop has n − 1 iterations; a min-cost edge e ∈ δ(U) can be found in O(m) time. Idea for improved running time O(n^2): For each v ∈ V \ U, always keep a minimum cost edge h(v) connecting v to some node in U. In each iteration, information about all h(v), v ∈ V \ U, can be updated in O(n) time. Find a min-cost edge e ∈ δ(U) in O(n) time by only considering the edges h(v), v ∈ V \ U. Best known running time is O(m + n log n) (uses Fibonacci heaps). 157

Efficiency of Kruskal s Algorithm Kruskal s Algorithm for MST 1 Sort the edges in E such that c(e 1 ) c(e 2 ) c(e m ). 2 Set T := (V, ). 3 For i := 1 to m do: If adding e i to T does not create a circuit, then add e i to T. Theorem 5.5. Kruskal s Algorithm can be implemented to run in O(m log m) time. Proof:... 158

Minimum Spanning Trees and Linear Programming Notation: For S ⊆ V let γ(S) := { e = {v, w} ∈ E | v, w ∈ S }. For a vector x ∈ R^E and a subset B ⊆ E let x(B) := ∑_{e∈B} x_e. Consider the following integer linear program: min c^T x s.t. x(γ(S)) ≤ |S| − 1 for all ∅ ≠ S ⊆ V (5.1), x(E) = |V| − 1 (5.2), x_e ∈ {0, 1} for all e ∈ E Observations: A feasible solution x ∈ {0, 1}^E is the characteristic vector of a subset F ⊆ E. F does not contain a circuit due to (5.1) and has |V| − 1 edges due to (5.2). Thus, F forms a spanning tree of G. Moreover, the edge set of an arbitrary spanning tree of G yields a feasible solution x ∈ {0, 1}^E. 159

Minimum Spanning Trees and Linear Programming (cont.) Consider the LP relaxation of the integer programming formulation: min c^T x s.t. x(γ(S)) ≤ |S| − 1 for all ∅ ≠ S ⊆ V, x(E) = |V| − 1, x_e ≥ 0 for all e ∈ E Theorem 5.6. Let x ∈ {0, 1}^E be the characteristic vector of an MST. Then x is an optimal solution to the LP above. Proof:... Corollary 5.7. The vertices of the polytope given by the set of feasible LP solutions are exactly the characteristic vectors of spanning trees of G. The polytope is thus the convex hull of the characteristic vectors of all spanning trees. 160

Shortest Path Problem Given: digraph D = (V, A), node r ∈ V, arc costs c_a, a ∈ A. Task: for each v ∈ V, find a dipath from r to v of least cost (if one exists) Remarks: Existence of an r-v-dipath can be checked, e. g., by breadth-first search. Ensure existence of r-v-dipaths: add arcs (r, v) of sufficiently large cost. Basic idea behind all algorithms for solving the shortest path problem: If y_v, v ∈ V, is the least cost of a dipath from r to v, then y_v + c_{(v,w)} ≥ y_w for all (v, w) ∈ A. (5.3) Remarks: More generally, subpaths of shortest paths are shortest paths! If there is a shortest r-v-dipath for all v ∈ V, then there is a shortest path tree, i. e., a directed spanning tree T rooted at r such that the unique r-v-dipath in T is a least-cost r-v-dipath in D. 161

Feasible Potentials Definition 5.8. A vector y ∈ R^V is a feasible potential if it satisfies (5.3). Lemma 5.9. If y is a feasible potential with y_r = 0 and P an r-v-dipath, then y_v ≤ c(P). Proof: Suppose that P is v_0, a_1, v_1,..., a_k, v_k, where v_0 = r and v_k = v. Then, c(P) = ∑_{i=1}^k c_{a_i} ≥ ∑_{i=1}^k (y_{v_i} − y_{v_{i−1}}) = y_{v_k} − y_{v_0} = y_v. Corollary 5.10. If y is a feasible potential with y_r = 0 and P an r-v-dipath of cost y_v, then P is a least-cost r-v-dipath. 162

Ford's Algorithm Ford's Algorithm i Set y_r := 0, p(r) := r, y_v := ∞, and p(v) := null, for all v ∈ V \ {r}. ii While there is an arc a = (v, w) ∈ A with y_w > y_v + c_{(v,w)}, set y_w := y_v + c_{(v,w)} and p(w) := v. Question: Does the algorithm always terminate? Example: [Figure: digraph on nodes r, a, b, d containing a negative-cost dicircuit] Observation: The algorithm does not terminate because of the negative-cost dicircuit. 163

Validity of Ford s Algorithm Lemma 5.11. If there is no negative-cost dicircuit, then at any stage of the algorithm: a if y v, then y v is the cost of some simple dipath from r to v; b if p(v) null, then p defines a simple r-v-dipath of cost at most y v. Proof:... Theorem 5.12. If there is no negative-cost dicircuit, then Ford s Algorithm terminates after a finite number of iterations. At termination, y is a feasible potential with y r = 0 and, for each node v V, p defines a least-cost r-v-dipath. Proof:... 164

Feasible Potentials and Negative-Cost Dicircuits Theorem 5.13. A digraph D = (V, A) with arc costs c R A has a feasible potential if and only if there is no negative-cost dicircuit. Proof:... Remarks: If there is a dipath but no least-cost dipath from r to v, it is because there are arbitrarily cheap nonsimple r-v-dipaths. Finding a least-cost simple dipath from r to v is, however, difficult (see later). Lemma 5.14. If c is integer-valued, C := 2 max a A c a + 1, and there is no negative-cost dicircuit, then Ford s Algorithm terminates after at most C n 2 iterations. Proof: Exercise. 165

Feasible Potentials and Linear Programming As a consequence of Ford's Algorithm we get: Theorem 5.15. Let D = (V, A) be a digraph, r, s ∈ V, and c ∈ R^A. If, for every v ∈ V, there exists a least-cost dipath from r to v, then min{c(P) | P an r-s-dipath} = max{y_s − y_r | y a feasible potential}. Formulate the right-hand side as a linear program and consider the dual: max y_s − y_r s.t. y_w − y_v ≤ c_{(v,w)} for all (v, w) ∈ A; dual: min c^T x s.t. ∑_{a∈δ^-(v)} x_a − ∑_{a∈δ^+(v)} x_a = b_v for all v ∈ V, x_a ≥ 0 for all a ∈ A, with b_s = 1, b_r = −1, and b_v = 0 for all v ∉ {r, s}. Notice: The dual is the LP relaxation of an ILP formulation of the shortest r-s-dipath problem (x_a ≙ number of times a shortest r-s-dipath uses arc a). 166

Bases of Shortest Path LP Consider again the dual LP: min s.t. c T x x a x a = b v a δ (v) a δ + (v) x a 0 for all a A for all v V The underlying matrix Q is the incidence matrix of D. Lemma 5.16. Let D = (V, A) be a connected digraph and Q its incidence matrix. A subset of columns of Q indexed by a subset of arcs F A forms a basis of the linear subspace of R n spanned by the columns of Q if and only if F is the arc-set of a spanning tree of D. Proof: Exercise. 167

Refinement of Ford s Algorithm Ford s Algorithm i ii Set y r := 0, p(r) := r, y v :=, and p(v) := null, for all v V \ {r}. While there is an arc a = (v, w) A with y w > y v + c (v,w), set y w := y v + c (v,w) and p(w) := v. # iterations crucially depends on order in which arcs are chosen. Suppose that arcs are chosen in order S = f 1, f 2, f 3,..., f l. Dipath P is embedded in S if P s arc sequence is a subsequence of S. Lemma 5.17. If an r-v-dipath P is embedded in S, then y v c(p) after Ford s Algorithm has gone through the sequence S. Proof:... Goal: Find short sequence S such that, for all v V, a least-cost r-v-dipath is embedded in S. 168

Ford-Bellman Algorithm Basic idea: Every simple dipath is embedded in S 1, S 2,..., S n 1 where, for all i, S i is an ordering of A. This yields a shortest path algorithm with running time O(nm). Ford-Bellman Algorithm i ii iii initialize y, p (see Ford s Algorithm); for i = 1 to n 1 do for all a = (v, w) A do iv if y w > y v + c (v,w), then set y w := y v + c (v,w) and p(w) := v; Theorem 5.18. The algorithm runs in O(nm) time. If, at termination, y is a feasible potential, then p yields a least-cost r-v-dipath for each v V. Otherwise, the given digraph contains a negative-cost dicircuit. 169
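A direct Python transcription of the Ford-Bellman Algorithm; the function name is ours, nodes are 0, ..., n-1, and arcs are given as triples (v, w, cost).

def ford_bellman(n, arcs, r):
    INF = float('inf')
    y = [INF] * n
    p = [None] * n
    y[r], p[r] = 0, r
    for _ in range(n - 1):                 # n - 1 passes over all arcs
        for v, w, c in arcs:
            if y[v] + c < y[w]:
                y[w], p[w] = y[v] + c, v
    for v, w, c in arcs:                   # final check: is y a feasible potential?
        if y[v] + c < y[w]:
            return None                    # negative-cost dicircuit detected
    return y, p                            # p encodes least-cost r-v-dipaths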

Acyclic Digraphs and Topological Orderings Definition 5.19. Consider a digraph D = (V, A). a b An ordering v 1, v 2,..., v n of V so that i < j for each (v i, v j ) A is called a topological ordering. If D has a topological ordering, then D is called acyclic. Observations: Digraph D is acyclic if and only if it does not contain a dicircuit. Let D be acyclic and S an ordering of A such that (v i, v j ) precedes (v k, v l ) if i < k. Then every dipath of D is embedded in S. Theorem 5.20. The shortest path problem on acyclic digraphs can be solved in time O(m). Proof:... 170
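A sketch of the O(m) shortest path computation on an acyclic digraph, assuming a topological ordering of the nodes is already given; the function name and data format are our own.

def dag_shortest_paths(order, adj, r):
    # order: topological ordering of the nodes; adj[v]: list of (w, cost)
    INF = float('inf')
    y = {v: INF for v in order}
    p = {v: None for v in order}
    y[r] = 0
    for v in order:                        # every dipath is embedded in this arc order
        if y[v] < INF:
            for w, c in adj[v]:
                if y[v] + c < y[w]:
                    y[w], p[w] = y[v] + c, v
    return y, p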

Dijkstra's Algorithm Consider the special case of nonnegative costs, i. e., c_a ≥ 0, for each a ∈ A. Dijkstra's Algorithm i initialize y, p (see Ford's Algorithm); set S := V ; ii while S ≠ ∅ do iii choose v ∈ S with y_v minimum and delete v from S; iv for each w ∈ V with (v, w) ∈ A do v if y_w > y_v + c_{(v,w)}, then set y_w := y_v + c_{(v,w)} and p(w) := v; Example: [Figure: example digraph on nodes r, a, b, p, q with nonnegative arc costs and the computed labels y_v] 171
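A heap-based Python sketch of Dijkstra's Algorithm for nonnegative arc costs, corresponding to the O(m log n) implementation mentioned in Theorem 5.22; the function name and adjacency format are our own assumptions.

import heapq

def dijkstra(n, adj, r):
    # adj[v]: list of (w, cost) with cost >= 0
    INF = float('inf')
    y = [INF] * n
    p = [None] * n
    y[r] = 0
    done = [False] * n
    heap = [(0, r)]
    while heap:
        _, v = heapq.heappop(heap)
        if done[v]:
            continue
        done[v] = True                     # y[v] is final once v has been removed from S
        for w, c in adj[v]:
            if not done[w] and y[v] + c < y[w]:
                y[w], p[w] = y[v] + c, v
                heapq.heappush(heap, (y[w], w))
    return y, p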

Correctness of Dijkstra s Algorithm Lemma 5.21. For each w V, let y w be the value of y w when w is removed from S. If u is deleted from S before v, then y u y v. Proof:... Theorem 5.22. If c 0, then Dijkstra s Algorithm solves the shortest paths problem correctly in time O(n 2 ). A heap-based implementation yields running time O(m log n). Proof:... Remark: The for-loop in Dijkstra s Algorithm (step iv) can be modified such that only arcs (v, w) with w S are considered. 172

Feasible Potentials and Nonnegative Costs Observation 5.23. For given arc costs c ∈ R^A and node potential y ∈ R^V, define arc costs c' ∈ R^A by c'_{(v,w)} := c_{(v,w)} + y_v − y_w. Then, for all v, w ∈ V, a least-cost v-w-dipath w.r.t. c' is a least-cost v-w-dipath w.r.t. c, and vice versa. Proof: Notice that for any v-w-dipath P it holds that c'(P) = c(P) + y_v − y_w. Corollary 5.24. For given arc costs c ∈ R^A (not necessarily nonnegative) and a given feasible potential y ∈ R^V, one can use Dijkstra's Algorithm to solve the shortest paths problem. Definition 5.25. For a digraph D = (V, A), arc costs c ∈ R^A are called conservative if there is no negative-cost dicircuit in D, i. e., if there is a feasible potential y ∈ R^V. 173

Efficient Algorithms What is an efficient algorithm? efficient: consider running time algorithm: Turing Machine or other formal model of computation Simplified Definition An algorithm consists of elementary steps like, e. g., variable assignments simple arithmetic operations which only take a constant amount of time. The running time of the algorithm on a given input is the number of such steps and operations. 174

Bit Model and Arithmetic Model Two ways of measuring the running time and the size of the input I of A: Bit Model Count bit operations; e. g., adding two n-bit numbers takes n + 1 steps; multiplying them takes O(n^2) steps. Size of input I is the total number of bits needed to encode structure and numbers. Arithmetic Model Simple arithmetic operations on arbitrary numbers can be performed in constant time. Size of input I is the total number of bits needed to encode the structure plus the number of numbers in the input. 175

Polynomial vs. Strongly Polynomial Running Time Definition 5.26. i An algorithm runs in polynomial time if, in the bit model, its (worst-case) running time is polynomially bounded in the input size. ii An algorithm runs in strongly polynomial time if, in the bit model as well as in the arithmetic model, its (worst-case) running time is polynomially bounded in the input size. Examples: Prim s and Kruskal s Algorithm as well as the Ford-Bellman Algorithm and Dijkstra s Algorithm run in strongly polynomial time. The Euclidean Algorithm runs in polynomial time but not in strongly polynomial time. 176

Pseudopolynomial Running Time In the bit model, we assume that numbers are binary encoded, i. e., the encoding of the number n N needs log n + 1 bits. Thus, the running time bound O(C n 2 ) of Ford s Algorithm where C := 2 max a A c a + 1 is not polynomial in the input size. If we assume, however, that numbers are unary encoded, then C n 2 is polynomially bounded in the input size. Definition 5.27. An algorithm runs in pseudopolynomial time if, in the bit model with unary encoding of numbers, its (worst-case) running time is polynomially bounded in the input size. 177

Chapter 6: Maximum Flow Problems (cp. Cook, Cunningham, Pulleyblank & Schrijver, Chapter 3) 178

Maximum s-t-flow Problem Given: Digraph D = (V, A), arc capacities u ∈ R^A_{≥0}, nodes s, t ∈ V. Definition 6.1. A flow in D is a vector x ∈ R^A_{≥0}. Moreover, a flow x in D i obeys arc capacities and is called feasible if x_a ≤ u_a for each a ∈ A; ii has excess ex_x(v) := x(δ^-(v)) − x(δ^+(v)) at node v ∈ V ; iii satisfies flow conservation at node v ∈ V if ex_x(v) = 0; iv is a circulation if it satisfies flow conservation at each node v ∈ V ; v is an s-t-flow of value ex_x(t) if it satisfies flow conservation at each node v ∈ V \ {s, t} and if ex_x(t) ≥ 0. The maximum s-t-flow problem asks for a feasible s-t-flow in D of maximum value. 179

s-t-flows and s-t-cuts For a subset of nodes U ⊆ V, the excess of U is defined as ex_x(U) := x(δ^-(U)) − x(δ^+(U)). Lemma 6.2. For a flow x and a subset of nodes U it holds that ex_x(U) = ∑_{v∈U} ex_x(v). In particular, the value of an s-t-flow x is equal to ex_x(t) = −ex_x(s) = ex_x(U) for each U ⊆ V \ {s} with t ∈ U. Proof:... For U ⊆ V \ {s} with t ∈ U, the subset of arcs δ^-(U) is called an s-t-cut. Lemma 6.3. Let U ⊆ V \ {s} with t ∈ U. The value of a feasible s-t-flow x is at most the capacity u(δ^-(U)) of the s-t-cut δ^-(U). Equality holds if and only if x_a = u_a for each a ∈ δ^-(U) and x_a = 0 for each a ∈ δ^+(U). Proof:... 180

Residual Graph and Residual Arcs For a = (v, w) A, let a 1 := (w, v) be the corresponding backward arc and A 1 := {a 1 a A}. Definition 6.4. For a feasible flow x, the set of residual arcs is given by A x := {a A x a < u a } {a 1 A 1 x a > 0}. Moreover, the digraph D x := (V, A x ) is called the residual graph of x. Remark: A dipath in D x is called an x-augmenting path. Lemma 6.5. If x is a feasible s-t-flow such that D x does not contain an s-t-dipath, then x is a maximum s-t-flow. Proof:... 181

Residual Capacities Definition 6.6. Let x be a feasible flow. For a ∈ A, define u_x(a) := u(a) − x(a) if a ∈ A_x, and u_x(a^{-1}) := x(a) if a^{-1} ∈ A_x. The value u_x(a) is called the residual capacity of arc a ∈ A_x. Observations: If x is a feasible flow in (D, u) and y a feasible flow in (D_x, u_x), then z(a) := x(a) + y(a) − y(a^{-1}) for a ∈ A yields a feasible flow z in D (we write z := x + y for short). If x, z are feasible flows in (D, u), then y(a) := max{0, z(a) − x(a)}, y(a^{-1}) := max{0, x(a) − z(a)} for a ∈ A yields a feasible flow y in D_x (we write y := z − x for short). 182

Max-Flow Min-Cut Theorem and Ford-Fulkerson Algorithm Theorem 6.7. The maximum s-t-flow value equals the minimum capacity of an s-t-cut. Proof:... Corollary. A feasible s-t-flow x is maximum if and only if D_x does not contain an s-t-dipath. Ford-Fulkerson Algorithm i set x := 0; ii while there is an s-t-dipath P in D_x iii set x := x + δ·χ_P with δ := min{u_x(a) | a ∈ P}; Here, χ_P : A → {0, 1, −1} is the characteristic vector of dipath P defined by χ_P(a) = 1 if a ∈ P, χ_P(a) = −1 if a^{-1} ∈ P, and χ_P(a) = 0 otherwise, for all a ∈ A. 183

Termination of the Ford-Fulkerson Algorithm Theorem 6.8. a If all capacities are rational, then the algorithm terminates with a maximum s-t-flow. b If all capacities are integral, it finds an integral maximum s-t-flow. Proof:... When an arbitrary x-augmenting path is chosen in every iteration, the Ford-Fulkerson Algorithm can behave badly: [Figure: digraph with four arcs of capacity 10^k around a middle arc (v, w) of capacity 1] Remark: There exist instances with finite irrational capacities where the Ford-Fulkerson Algorithm never terminates and the flow value converges to a value that is strictly smaller than the maximum flow value (see ex.). 184

Running Time of the Ford-Fulkerson Algorithm Theorem 6.9. If all capacities are integral and the maximum flow value is K <, then the Ford-Fulkerson Algorithm terminates after at most K iterations. Its running time is O(m K) in this case. Proof: In each iteration the flow value is increased by at least 1. A variant of the Ford-Fulkerson Algo. is the Edmonds-Karp Algorithm: In each iteration, choose shortest s-t-dipath in D x (using BFS). Theorem 6.10. The Edmonds-Karp Algorithm terminates after at most n m iterations; its running time is O(n m 2 ). Proof:... Remark: The Edmonds-Karp Algorithm can be implemented with running time O(n 2 m). 185
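A self-contained Python sketch of the Edmonds-Karp variant (shortest augmenting paths found by BFS); the residual capacities are kept in a dictionary, all names are our own, and we assume no pair of anti-parallel arcs, since a forward capacity and a backward residual arc would otherwise share a key.

from collections import defaultdict, deque

def edmonds_karp(capacities, s, t):
    # capacities: dict (v, w) -> u(v, w); returns the maximum s-t-flow value
    residual = defaultdict(float)
    residual.update(capacities)
    adj = defaultdict(set)
    for v, w in capacities:
        adj[v].add(w)
        adj[w].add(v)                      # allow traversal of backward residual arcs
    value = 0.0
    while True:
        pred = {s: None}                   # BFS for a shortest s-t-dipath in D_x
        queue = deque([s])
        while queue and t not in pred:
            v = queue.popleft()
            for w in adj[v]:
                if w not in pred and residual[(v, w)] > 0:
                    pred[w] = v
                    queue.append(w)
        if t not in pred:
            return value                   # no augmenting path: the flow is maximum
        delta, w = float('inf'), t         # bottleneck residual capacity along the path
        while pred[w] is not None:
            delta = min(delta, residual[(pred[w], w)])
            w = pred[w]
        w = t                              # augment along the path
        while pred[w] is not None:
            residual[(pred[w], w)] -= delta
            residual[(w, pred[w])] += delta
            w = pred[w]
        value += delta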

Application: Kőnig's Theorem Definition 6.11. Consider an undirected graph G = (V, E). i A matching in G is a subset of edges M ⊆ E with e ∩ e' = ∅ for all e, e' ∈ M with e ≠ e'. ii A node cover is a subset of nodes C ⊆ V with e ∩ C ≠ ∅ for all e ∈ E. Observation: In a bipartite graph G = (P ∪ Q, E), a maximum cardinality matching can be found by a maximum flow computation. Theorem 6.12. In bipartite graphs, the maximum cardinality of a matching equals the minimum cardinality of a node cover. Proof:... 186

Arc-Based LP Formulation Straightforward LP formulation of the maximum s-t-flow problem: max x a s.t. Dual LP: min a δ + (s) x a a δ (s) a δ (v) a δ + (v) x a x a = 0 for all v V \ {s, t} x a u(a) for all a A x a 0 for all a A u(a) z a a A s.t. y w y v + z (v,w) 0 for all (v, w) A y s = 1, y t = 0 z a 0 for all a A 187

Dual Solutions and s-t-cuts min u(a) z a a A s.t. y w y v + z (v,w) 0 for all (v, w) A y s = 1, y t = 0 z a 0 for all a A Observation: An s-t-cut δ + (U) (with U V \ {t}, s U) yields feasible dual solution (y, z) of value u(δ + (U)): let y be the characteristic vector χ U of U (i. e., y v = 1 for v U, y v = 0 for v V \ U) let z be the characteristic vector χ δ+ (U) of δ + (U) (i. e., z a = 1 for a δ + (U), z a = 0 for a A \ δ + (U)) Theorem 6.13. There exists an s-t-cut δ + (U) (with U V \ {t}, s U) such that the corresponding dual solution (y, z) is an optimal dual solution. Proof:... 188

Flow Decomposition Theorem 6.14. For an s-t-flow x in D, there exist s-t-dipaths P 1,..., P k and dicircuits C 1,..., C l in D with k + l m and y P1,..., y Pk, y C1,..., y Cl 0 with x a = i:a P i y Pi + Moreover, the value of x is k i=1 y P i. Proof:... j:a C j y Cj for all a A. Observation: For an s-t-flow x and a flow decomposition as in Theorem 6.14, let x a := i:a P i y Pi for all a A. Then x is an s-t-flow of the same value as x and x a x a for all a A. 189

Path-Based LP Formulation Let P be the set of all s-t-dipaths in D. max Dual LP: s.t. min s.t. P P y P P P:a P y P 0 y P u(a) u(a) z a a A z a 1 a P z a 0 for all a A for all P P for all P P for all a A Remark. Notice that P and thus the number of variables of the primal LP and the number of constraints of the dual LP can be exponential in n and m. 190

Dual Solutions and s-t-cuts min s.t. u(a) z a a A z a 1 a P z a 0 for all P P for all a A Observation: An s-t-cut δ + (U) (with U V \ {t}, s U) yields feasible dual solution z of value u(δ + (U)): let z be the characteristic vector χ δ+ (U) of δ + (U) (i. e., z a = 1 for a δ + (U), z a = 0 for a A \ δ + (U)) Theorem 6.15. There exists an s-t-cut δ + (U) (with U V \ {t}, s U) such that the corresponding dual solution z is an optimal dual solution. 191

Maximum s-t-flows: Another Algorithmic Approach A feasible flow x is a maximum s-t-flow if it fulfills two conditions: i ex_x(v) = 0 for all v ∈ V \ {s, t}; (flow conservation) ii there is no s-t-dipath in D_x. Ford-Fulkerson and Edmonds-Karp always fulfill the first condition and terminate as soon as the second condition is fulfilled. The Goldberg-Tarjan Algorithm (or Push-Relabel Algorithm, or Preflow-Push Algorithm) always fulfills the second condition and terminates as soon as the first condition is fulfilled. Definition 6.16. i A flow x is called a preflow if ex_x(v) ≥ 0 for all v ∈ V \ {s}. ii A node v ∈ V \ {s, t} is called active if ex_x(v) > 0. 192

Valid Labelings and Admissible Arcs Definition 6.17. Let x be a preflow. A function d : V → Z_{≥0} with d(s) = n and d(t) = 0 is called a valid labeling if d(v) ≤ d(w) + 1 for all (v, w) ∈ A_x. An arc (v, w) ∈ A_x is called admissible if v is active and d(v) = d(w) + 1. For v, w ∈ V, let d_x(v, w) be the length of a shortest v-w-dipath in D_x. Observation 6.18. Let x be a feasible preflow and d a valid labeling. Then d_x(v, t) ≥ d(v). Lemma 6.19. Let x be a feasible preflow and d a valid labeling. a There is a v-s-dipath in D_x for every active node v. b There is no s-t-dipath in D_x. Proof:... 193

Goldberg-Tarjan Algorithm Goldberg-Tarjan Algorithm 1 for a δ + (s) set x(a) := u(a); for a A \ δ + (s) set x(a) := 0; set d(s) := n; for v V \ {s} set d(v) := 0; 2 while there is an active node v do 3 if there is no admissible arc a δ + D x (v) then Relabel(v); choose an admissible arc a δ + D x (v) and Push(a); Relabel(v) set d(v) := min{d(w) + 1 (v, w) A x }; Push(a) augment x along arc a by γ := min{ex x (v), u x (a)}; (a δ + D x (v)) 194
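A minimal Python sketch of the generic push-relabel scheme above; it processes an arbitrary active node rather than the highest-label one, so it only illustrates the basic variant, all names are our own assumptions, and anti-parallel arc pairs are assumed absent.

from collections import defaultdict

def goldberg_tarjan(n, capacities, s, t):
    # capacities: dict (v, w) -> u(v, w), nodes 0, ..., n-1
    residual = defaultdict(float)
    residual.update(capacities)
    nbrs = defaultdict(set)
    for v, w in capacities:
        nbrs[v].add(w)
        nbrs[w].add(v)
    excess = defaultdict(float)
    d = [0] * n
    d[s] = n                                   # valid labeling: d(s) = n, d(t) = 0
    for w in list(nbrs[s]):                    # saturate all arcs leaving s (initial preflow)
        delta = residual[(s, w)]
        if delta > 0:
            residual[(s, w)] -= delta
            residual[(w, s)] += delta
            excess[w] += delta
    active = [v for v in range(n) if v not in (s, t) and excess[v] > 0]
    while active:
        v = active[0]
        pushed = False
        for w in nbrs[v]:
            if residual[(v, w)] > 0 and d[v] == d[w] + 1:   # admissible arc: Push
                delta = min(excess[v], residual[(v, w)])
                residual[(v, w)] -= delta
                residual[(w, v)] += delta
                excess[v] -= delta
                excess[w] += delta
                pushed = True
                break
        if not pushed:                                       # no admissible arc: Relabel
            d[v] = 1 + min(d[w] for w in nbrs[v] if residual[(v, w)] > 0)
        active = [u for u in range(n) if u not in (s, t) and excess[u] > 0]
    return excess[t]                                         # value of the maximum s-t-flow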

Analysis of the Goldberg-Tarjan Algorithm Lemma 6.20. At any stage of the algorithm, x is a feasible preflow and d a valid labeling. Proof:... Corollary 6.21. After termination of the algorithm, x is a maximum s-t-flow. Proof:... Lemma 6.22. a A label d(v) is never decreased by the algorithm; calling Relabel(v) strictly increases d(v). b d(v) ≤ 2n − 1 throughout the algorithm. c The number of Relabel operations is at most 2n^2. Proof:... 195

Bounding the Number of Push Operations We distinguish two types of Push operations: A Push operation on arc a is called saturating if, after the Push, arc a has disappeared from the residual graph D x. Otherwise, the Push operation is called nonsaturating and node v with a δ + (v) is no longer active. Lemma 6.23. The number of saturating Push operations is in O(m n). Proof:... Lemma 6.24. The number of unsaturating pushes is at most O(m n 2 ). Lemma 6.25. If the algorithm always chooses an active node v with d(v) maximum, then the number of unsaturating pushes is in O(n 3 ). Proof:... 196

Running Time of the Goldberg-Tarjan Algorithm Theorem 6.26. The Goldberg-Tarjan Algorithm finds a maximum s-t-flow in O(n 2 m) time. Theorem 6.27. If the algorithm always chooses an active node v with d(v) maximum, its running time is O(n 3 ). Remark: If the algorithm always chooses an active node v with d(v) maximum, one can show that the number of unsaturating pushes and thus the total running time is at most O(n 2 m). 197

Chapter 7: Minimum-Cost Flow Problems (cp. Cook, Cunningham, Pulleyblank & Schrijver, Chapter 4; Korte & Vygen, Chapter 9) 198

b -Transshipments and Costs Given: Digraph D = (V, A), capacities u : A R 0, arc costs c : A R Definition 7.1. i Let b : V R. A flow x is called b -transshipment if ex x (v) = b(v) for all v V. ii The cost of a flow x is defined as c(x) := a A c(a) x(a). Observation 7.2. If there is a feasible b -transshipment, then one can be found by a maximum flow computation. Proof:... Remark. The existence of a b -transshipment implies that v V b(v) = 0. 199

Minimum-Cost b-Transshipment Problem Minimum-cost b-transshipment problem Given: D = (V, A), u : A → R_{≥0}, c : A → R, b : V → R Task: find a feasible b-transshipment of minimum cost Special cases: min-cost s-t-flow problem (for given flow value) min-cost circulation problem Cost of residual arc: For a given feasible flow x, we extend the cost function c to A_x by defining c(a^{-1}) := −c(a) for a ∈ A. 200

Optimality Criteria Theorem 7.3. A feasible b -transshipment x has minimum cost among all feasible b -transshipments if and only if each dicircuit of D x has nonnegative cost. Proof:... Theorem 7.4. A feasible b -transshipment x has minimum cost among all feasible b -transshipments if and only if there is a feasible potential y R V, i. e., Proof:... y v + c ( (v, w) ) y w for all (v, w) A x. 201

Alternative Proof of Theorem 7.4 Consider LP formulation of min-cost b -transshipment problem: min c(a) x a s.t. Dual LP: max a A x a x a = b(v) for all v V a δ (v) a δ + (v) x a u(a) for all a A x a 0 for all a A b(v) y v + u(a) z a a A v V s.t. y w y v + z (v,w) c ( (v, w) ) for all (v, w) A z a 0 for all a A The result follows from complementary slackness conditions. 202

Negative-Cycle Canceling Algorithm Negative-cycle canceling algorithm i compute a feasible b -transshipment x or determine that none exists; ii iii while there is a negative-cost dicircuit C in D x set x := x + δ χ C with δ := min{u x (a) a C}; Remarks: The negative-cost dicircuit C in step (ii) can be found in O(nm) time by the Ford-Bellman Algorithm. However, the number of iterations is only pseudo-polynomial in the input size. If arc capacities and b-values are integral, the algorithm returns an integral min-cost b -transshipment. 203

Minimum-Mean-Cycle Canceling Algorithm The mean cost of a dicircuit C in D_x is ∑_{a∈C} c(a) / |C|. Theorem 7.5. Choosing a minimum mean-cost dicircuit in step (ii), the number of iterations is in O(n m^2 log n). Proof:... Theorem 7.6. A minimum mean-cost dicircuit can be found in O(n m) time. Proof: Exercise! 204

Running Time of Minimum-Mean-Cycle Canceling Corollary 7.7. A min-cost b -transshipment can be found in O(n 2 m 3 log n) time. Remark: Goldberg and Tarjan showed that the running time of the minimum-mean cycle canceling algorithm can be improved to O(n m 2 log 2 n). 205

Augmenting Flow Along Min-Cost Dipaths Remark: In the following we assume without loss of generality that in a given min-cost b-transshipment problem all arc costs are nonnegative; there is a dipath of infinite capacity between every pair of nodes. Theorem 7.8. Let x be a feasible min-cost b-transshipment, s, t ∈ V, and P a min-cost s-t-dipath in D_x with bottleneck capacity u_x(P) := min_{a∈P} u_x(a). Then, x + δ·χ_P with 0 ≤ δ ≤ u_x(P) is a feasible min-cost b'-transshipment with b'(v) := b(v) + δ for v = t, b'(v) := b(v) − δ for v = s, and b'(v) := b(v) otherwise. Proof:... 206

Successive Shortest Path Algorithm In the following we assume that ∑_{v∈V} b(v) = 0. Successive Shortest Path Algorithm i set x := 0; ii while b ≠ 0 iii find a min-cost s-t-dipath P in D_x for s, t ∈ V with b(s) < 0, b(t) > 0; iv set δ := min{−b(s), b(t), u_x(P)} and x := x + δ·χ_P, b(s) := b(s) + δ, b(t) := b(t) − δ; Theorem 7.9. If all arc capacities and b-values are integral, the Successive Shortest Path Algorithm terminates with an integral min-cost b-transshipment after at most (1/2) ∑_{v∈V} |b(v)| iterations. Proof:... 207

Capacity Scaling For a flow x and > 0, let A x := {a A x u x (a) }, D x := (V, A x ); set U := max a A u(a). Successive Shortest Path Algorithm with Capacity Scaling i set x := 0, := 2 log U, p(v) := 0 for all v V ; ii while 1 iii for all a = (v, w) A x with c(a) < p(w) p(v) iv v vi set b(v) := b(v) + u x (a) and b(w) := b(w) u x (a); augment x by sending u x (a) units of flow along arc a; set S( ) := {v V b(v) }, T ( ) := {v V b(v) }; while S( ) and T ( ) vii find min-cost s-t-dipath P in D x for some s S( ), t T ( ); viii set p to the vector of shortest (min-cost) path distances from s; augment flow units along P in x; update b, S( ), T ( ), D x ; := /2; 208

Analysis of Running Time Remark: Steps (iii) (iv) ensure that optimality conditions are always fulfilled. Theorem 7.10. If all arc capacities and b-values are integral, the Successive Shortest Path Algorithm with Capacity Scaling terminates with an integral min-cost b -transshipment after at most O(m log U) calls to a shortest path subroutine. Proof:... Remark: A variant of the Successive Shortest Path Algorithm with strongly polynomial running time can be obtained by a refined use of capacity scaling (Orlin 1988/1993). 209

Chapter 8: Complexity Theory (cp. Cook, Cunningham, Pulleyblank & Schrijver, Chapter 9; Korte & Vygen, Chapter 15) 210

Efficient Algorithms: Historical Remark Edmonds (1965), Edmonds (1967) [Figure: excerpts from Edmonds's papers on efficient algorithms] Jack Edmonds (1934– ) 211

Is There a Good Algorithm for the TSP? 212

Decision Problems Most of complexity theory is based on decision problems such as, e. g.: (Undirected) Hamiltonian Circuit Problem Given: undirected graph G = (V, E). Task: decide whether G contains a Hamiltonian circuit. Definition 8.1. i ii A decision problem is a pair P = (X, Y ). The elements of X are called instances of P, the elements of Y X are the yes-instances, those of X \ Y are no-instances. An algorithm for a decision problem (X, Y ) decides for a given x X whether x Y. Example. For Hamiltonian Circuit, X is the set of all (undirected) graphs and Y X is the subset of graphs containing a Hamiltonian circuit. 213

Further Examples of Decision Problems (Integer) Linear Inequalities Problem Given: matrix A Z m n, vector b Z m. Task: decide whether there is x Q n (x Z n ) with A x b. Clique Problem Given: undirected graph G = (V, E), positive integer k. Task: decide whether G has k mutually adjacent nodes (i. e., a complete subgraph on k nodes which is called clique of size k). Node Covering Problem Given: undirected graph G = (V, E), positive integer k. Task: decide whether there is a node cover C V with C k. 214

Further Examples of Decision Problems (cont.) Set Packing Problem Given: finite set U, family of subsets S ⊆ 2^U, positive integer k. Task: decide whether there are k mutually disjoint subsets in S. Set Covering Problem Given: finite set U, family of subsets S ⊆ 2^U, positive integer k. Task: decide whether there is R ⊆ S with U = ∪_{R∈R} R and |R| ≤ k. Hitting Set Problem Given: finite set U, family of subsets S ⊆ 2^U, positive integer k. Task: decide whether there is T ⊆ U with T ∩ S ≠ ∅ for all S ∈ S, and |T| ≤ k. 215

Further Examples of Decision Problems (cont.) Node Coloring Problem Given: undirected graph G = (V, E), positive integer k. Task: decide whether the nodes of G can be colored with at most k colors such that adjacent nodes get different colors. Spanning Tree Problem Given: graph G = (V, E), edge weights w : E Z, positive integer k. Task: decide whether there is a spanning subtree of weight at most k. Steiner Tree Problem Given: graph G = (V, E), terminals T V, edge weights w : E Z, positive integer k. Task: decide whether there is a subtree of G of weight at most k that contains all terminals in T. 216

Complexity Classes P and NP Definition 8.2. The class of all decision problems for which there is a deterministic polynomial time algorithm is denoted by P. Example: The Spanning Tree Problem is in P. Definition 8.3. A decision problem P = (X, Y ) belongs to the complexity class NP if there is a polynomial function p : Z → Z and a decision problem P' = (X', Y') in P, where X' := {(x, c) | x ∈ X, c ∈ {0, 1}^{p(size(x))}}, such that Y = {x ∈ X | ∃ c : (x, c) ∈ Y'}. We call c with (x, c) ∈ Y' a certificate for x. Examples: The Hamiltonian Circuit Problem and all problems listed on the last three slides belong to NP. 217

Certificates and Nondeterministic Turing Machines Remarks. The complexity class P consists of all decision problems that can be solved by a deterministic Turing machine in polynomial time. The complexity class NP consists of all decision problems that can be solved by a non-deterministic Turing machine in polynomial time. NP stands for Non-deterministic Polynomial. A certificate c for a problem instance x describes an accepting computation path of the non-deterministic Turing machine for x. Lemma 8.4. P ⊆ NP. Proof: Deterministic Turing machines are a special case of non-deterministic Turing machines. 218

Polynomial Reductions Definition 8.5. Let P 1 and P 2 be decision problems. We say that P 1 polynomially reduces to P 2 if there exists a polynomial time oracle algorithm for P 1 using an oracle for P 2. Remark. An oracle for P 2 is a subroutine that can solve an instance of P 2. Calling the oracle (subroutine) is counted as one elementary computation step. Lemma 8.6. Let P 1 and P 2 be decision problems. If P 2 P and P 1 polynomially reduces to P 2, then P 1 P. Proof: Replace the oracle by a polynomial algorithm for P 2. 219

Finding Hamiltonian Circuits via Hamiltonian Paths Hamiltonian Path Problem Given: undirected graph G = (V, E). Task: decide whether G contains a Hamiltonian path. Lemma 8.7. The Hamiltonian Circuit Problem polynomially reduces to the Hamiltonian Path Problem. Proof: The following oracle algorithm runs in polynomial time. Oracle Algorithm. i for every e = {v, w} ∈ E do ii construct G' := (V ∪ {s, t}, E ∪ { {s, v}, {w, t} }); iii ask the oracle whether there is a Hamiltonian path in G'; iv if the oracle answers yes, then stop and output yes; v output no; 220

Polynomial Transformations Definition 8.8. Let P_1 = (X_1, Y_1) and P_2 = (X_2, Y_2) be decision problems. We say that P_1 polynomially transforms to P_2 if there exists a function f : X_1 → X_2 computable in polynomial time such that for all x ∈ X_1: x ∈ Y_1 ⟺ f(x) ∈ Y_2. Lemma 8.9. The Hamiltonian Circuit Problem polynomially transforms to the Hamiltonian Path Problem. Proof:... Remarks. A polynomial reduction is also called a Turing reduction. A polynomial transformation is also called a Karp reduction. Every Karp reduction is also a Turing reduction, but not vice versa. Both notions are transitive. 221

NP-Completeness Definition 8.10. A decision problem P NP is NP-complete if all other problems in NP polynomially transform to P. Satisfiability Problem (SAT) Given: Boolean variables x 1,..., x n and a family of clauses where each clause is a disjunction of Boolean variables or their negations. Task: decide whether there is a truth assignment to x 1,..., x n such that all clauses are satisfied. Example: (x 1 x 2 x 3 ) (x 2 x 3 ) ( x 1 x 2 ) 222

Cook s Theorem (1971) Theorem 8.11. The Satisfiability problem is NP-complete. Stephen Cook (1939 ) Proof idea: SAT is obviously in NP. One can show that any nondeterministic Turing machine can be encoded as an instance of SAT. 223

Proving NP-Completeness Lemma 8.12. Let P_1 and P_2 be decision problems. If P_1 is NP-complete, P_2 ∈ NP, and P_1 polynomially transforms to P_2, then P_2 is NP-complete. Proof: As mentioned above, polynomial transformations are transitive. 3-Satisfiability Problem (3SAT) Given: Boolean variables x_1,..., x_n and a family of clauses where each clause is a disjunction of at most 3 Boolean variables or their negations. Task: decide whether there is a truth assignment to x_1,..., x_n such that all clauses are satisfied. Theorem 8.13. 3SAT is NP-complete. Proof:... 224

Proving NP-Completeness (cont.) Stable Set Problem Given: undirected graph G = (V, E), positive integer k. Task: decide whether G has k mutually non-adjacent nodes (i. e., a stable set of size k). Theorem 8.14. The Stable Set problem and the Clique problem are both NP-complete. Proof:... 225

Transformations for Karp's 21 NP-Complete Problems Richard M. Karp (1972). Reducibility Among Combinatorial Problems 226

P vs. NP Theorem 8.15. If a decision problem P is NP-complete and P ∈ P, then P = NP. Proof: See definition of NP-completeness and Lemma 8.6. There are two possible scenarios for the shape of the complexity world: [Figure: scenario A shows P and the NP-complete problems as disjoint parts of NP; scenario B shows P = NP = NP-complete] It is widely believed that P ≠ NP, i. e., scenario A holds. Deciding whether P = NP or P ≠ NP is one of the seven millennium prize problems established by the Clay Mathematics Institute in 2000. 227

Complexity Class coNP and coNP-complete Problems Definition 8.16. i The complement of a decision problem P = (X, Y ) is P̄ := (X, X \ Y ). ii coNP := { P̄ | P ∈ NP } iii A problem P ∈ coNP is coNP-complete if all problems in coNP polynomially transform to P. Theorem 8.17. i P is NP-complete if and only if P̄ is coNP-complete. ii Unless NP = coNP, no coNP-complete problem is in NP. iii Unless P = NP, there are problems in NP that are neither in P nor NP-complete. 229

Complexity Landscape (Widely Believed)

[Diagram: P lies inside NP ∩ coNP; the NP-complete problems lie in NP \ coNP; the coNP-complete problems lie in coNP \ NP.]

Remark. Almost all problems that are known to be in NP ∩ coNP are also known to be in P. One of the few exceptions used to be the problem Primes (given a positive integer, decide whether it is prime), before it was shown to be in P by Agrawal, Kayal, and Saxena in 2002.

230

The Asymmetry of Yes and No

Example: For a given TSP instance, does there exist a tour of cost at most 1573084?

Pictures taken from www.tsp.gatech.edu

231

Partition Problem

Partition Problem
Given: positive integers a_1, ..., a_n ∈ Z_{>0}.
Task: decide whether there exists S ⊆ {1, ..., n} with Σ_{i ∈ S} a_i = Σ_{i ∈ {1,...,n} \ S} a_i.

Theorem 8.18.
(i) The Partition Problem can be solved in pseudopolynomial time.
(ii) The Partition Problem is NP-complete.

Proof of (i): ...

232
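For part (i), the standard pseudopolynomial approach is a subset-sum dynamic program over all achievable sums up to half of the total. The sketch below is one natural way to obtain the claimed running time, not necessarily the proof given in the lecture.

    def partition(a):
        # Decide the Partition Problem in O(n · Σ a_i) time: pseudopolynomial,
        # since the bound is polynomial in the numeric values, not in their
        # binary encoding length.
        total = sum(a)
        if total % 2 == 1:
            return False                        # an odd total can never be split evenly
        target = total // 2
        reachable = [False] * (target + 1)      # reachable[s]: some subset of a sums to s
        reachable[0] = True
        for x in a:
            for s in range(target, x - 1, -1):  # backwards, so each a_i is used at most once
                if reachable[s - x]:
                    reachable[s] = True
        return reachable[target]

    print(partition([3, 1, 1, 2, 2, 1]))   # True, e.g. {3, 2} versus {1, 1, 2, 1}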

Strongly NP-Complete Problems

Let P = (X, Y) be a decision problem and p : Z → Z a polynomial function. Let X_p ⊆ X denote the subset of instances x where the absolute value of every (integer) number in the input x is at most p(size(x)). Let P_p := (X_p, Y ∩ X_p).

Definition 8.19. A problem P ∈ NP is strongly NP-complete if P_p is NP-complete for some polynomial function p : Z → Z.

Theorem 8.20. Unless P = NP, there is no pseudopolynomial algorithm for any strongly NP-complete problem P.
Proof: Such an algorithm would imply that the NP-complete problem P_p is in P and thus P = NP.

233

Strongly NP-Complete Problems (cont.)

Examples:
The decision problem corresponding to the Traveling Salesperson Problem (TSP) is strongly NP-complete.
The Partition Problem is NP-complete but not strongly NP-complete (unless P = NP, by Theorem 8.20).
The 3-Partition Problem below is strongly NP-complete.

3-Partition Problem
Given: positive integers a_1, ..., a_{3n}, B ∈ Z_{>0} with Σ_{i=1}^{3n} a_i = nB.
Task: decide whether the index set {1, ..., 3n} can be partitioned into n disjoint subsets S_1, ..., S_n such that Σ_{i ∈ S_j} a_i = B for j = 1, ..., n.

Remark. The 3-Partition Problem remains strongly NP-complete if one adds the requirement that B/4 < a_i < B/2 for all i = 1, ..., 3n. Notice that each S_j must contain exactly three indices in this case.

234

NP-Hardness

Definition 8.21. Let P be an optimization or decision problem.
(i) P is NP-hard if all problems in NP polynomially reduce to it.
(ii) P is strongly NP-hard if P_p is NP-hard for some polynomial function p.

Remarks.
A sufficient criterion for an optimization problem to be (strongly) NP-hard is the (strong) NP-completeness or (strong) NP-hardness of the corresponding decision problem.
It is open whether each NP-hard decision problem P ∈ NP is NP-complete (difference between Turing and Karp reductions!).
An example of an NP-hard decision problem that does not appear to be in NP (and thus might not be NP-complete) is the following: given an instance of SAT, decide whether the majority of all truth assignments satisfy all clauses.

235

NP-Hardness (cont.) Theorem 8.22. Unless P = NP, there is no polynomial time algorithm for any NP-hard decision or optimization problem. Proof: Clear. Remark: It is thus widely believed that problems like the TSP and all other NP-hard optimization problems cannot be solved in polynomial time. 236

Complexity of Linear Programming

As discussed in Chapter 4, so far no variant of the simplex method has been shown to have a polynomial running time. Therefore, the complexity of Linear Programming remained unresolved for a long time.

Only in 1979 did the Soviet mathematician Leonid Khachiyan prove that the so-called ellipsoid method, earlier developed for nonlinear optimization, can be modified in order to solve LPs in polynomial time. In November 1979, the New York Times featured Khachiyan and his algorithm in a front-page story.

We will give a sketch of the ellipsoid method and its analysis. More details can, e.g., be found in the book of Bertsimas & Tsitsiklis (Chapter 8) or in the book Geometric Algorithms and Combinatorial Optimization by Grötschel, Lovász & Schrijver (Springer, 1988).

237

New York Times, Nov. 27, 1979 238