
Network Optimization
Fall 2014 - Math 514
Lecturer: Thomas Rothvoss
University of Washington, Seattle
Last changes: December 4, 2014

Contents
1 The MaxCut problem
  1.1 Semidefinite programs
  1.2 The rounding
2 The Interior-Point Method
  2.1 Introduction
  2.2 Setting up the LP
  2.3 The potential function
  2.4 The algorithm
  2.5 When can we stop?
  2.6 Rescaling
  2.7 A primal update
  2.8 The dual update
3 Matroid Intersection
  3.1 Introduction
  3.2 The exchange lemma
  3.3 The rank function
  3.4 A reverse exchange lemma
  3.5 The algorithm

Chapter 1 The MaxCut problem

In this chapter, we discuss one of the most prominent problems in optimization and theoretical computer science, the MaxCut problem:

MAXCUT
Input: An undirected graph $G = (V,E)$.
Goal: Find a cut $S \subseteq V$ that maximizes the number $|\delta(S)|$ of cut edges.

For example, in the graph depicted below, the dashed cut $S$ cuts 3 edges, which should be optimal.

[Figure: a small example graph with nodes $i$, $j$ and a dashed cut $S$]

It turns out that this problem is NP-hard, which is somewhat surprising given that we already know that the MinCut problem is solvable in polynomial time. In other words, flipping the objective function does make this problem hard to solve. Håstad proved in 1997 that it is even NP-hard to find a cut that is within a factor of $\frac{16}{17} \approx 94\%$ of the optimum. Since we won't always be able to find the optimum solution efficiently, we will instead try to efficiently cut as many edges as we can guarantee relative to the optimum. To warm up, let us see what to expect here.

Lemma 1.1. In any graph $G = (V,E)$ there is a cut that cuts at least half of the edges.

Proof. We take a random cut $S \subseteq V$. In other words, for each node $i \in V$ we independently flip a fair coin, and if it comes up heads (that is, with probability $\frac{1}{2}$) we add $i$ to $S$. Then an edge $e = \{u,v\}$ has a probability of exactly $\frac{1}{2}$ to end up in the cut $\delta(S)$. By linearity of expectation, $E[|\delta(S)|] = \frac{1}{2}|E|$. In particular, there must be at least one cut that cuts at least half of the edges (otherwise, the expectation could not be $\frac{1}{2}|E|$).

Since the random cut satisfies $E[|\delta(S)|] = \frac{1}{2}|E| \geq \frac{1}{2} \cdot \text{optimum}$, we say that it gives a 2-approximation to the MaxCut problem. One can also deterministically find a cut that cuts at least half of the edges, using a greedy algorithm.
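The notes do not spell out the greedy algorithm. A minimal Python sketch of one standard variant follows, assuming the graph is given as an edge list over the nodes $0, \ldots, n-1$ (the names `greedy_cut` and `cut_size` are hypothetical). Nodes are placed one at a time, each on the side opposite to the majority of its already-placed neighbors. Every edge is decided when its later endpoint is placed, and at that moment at least half of those edges get cut, so the final cut contains at least $|E|/2$ edges.

```python
def greedy_cut(n, edges):
    """Return a set S of nodes from {0, ..., n-1} whose cut contains at least |E|/2 edges."""
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    side = {}  # node -> True if the node is in S, False otherwise
    for i in range(n):
        # Count already-placed neighbors on each side (unplaced ones are ignored).
        in_s = sum(1 for j in neighbors[i] if side.get(j) is True)
        out_s = sum(1 for j in neighbors[i] if side.get(j) is False)
        # Join the side with fewer placed neighbors, which cuts the majority of those edges.
        side[i] = in_s <= out_s
    return {i for i in range(n) if side[i]}


def cut_size(S, edges):
    """Number of edges with exactly one endpoint in S, i.e. |delta(S)|."""
    return sum(1 for u, v in edges if (u in S) != (v in S))


k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
S = greedy_cut(4, k4)
print(S, cut_size(S, k4))  # prints: {0, 2} 4
```

On $K_4$ the greedy cut happens to be optimal (4 of 6 edges). In general only the factor $\frac{1}{2}$ is guaranteed.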

To beat the factor of 2, we might wonder whether an LP could be useful. A natural integer linear programming formulation for the MaxCut problem would be as follows:

$$\begin{aligned} \max \ & \sum_{e \in E} x_e \\ & x_{(u,v)} \leq \min\{y_u + y_v,\ 2 - (y_u + y_v)\} && \forall (u,v) \in E \\ & y_u, x_e \in \{0,1\} && \forall u \in V,\ e \in E \end{aligned}$$

Here we have variables $x_e$ telling us whether edge $e$ is cut and variables $y_u$ telling us whether node $u$ is in $S$. In fact, this program models MaxCut exactly. We might wonder what happens if we replace the integrality constraints by $y_u, x_e \in [0,1]$ and solve that linear program. The problem is that this LP is completely useless: for any graph, the solution $y_u := \frac{1}{2}$ and $x_e := 1$ is feasible and pretends to cut all edges, which means the objective function value is $|E|$. But there are many graphs (e.g. complete graphs) where only about half of the edges can be cut. We say that the integrality gap of this LP is 2, since $\frac{\text{LP optimum}}{\text{optimum}}$ gets arbitrarily close to 2 (for instance on large complete graphs). It seems that in order to improve over the factor of 2, we need a completely new idea.

1.1 Semidefinite programs

We call a symmetric matrix $X \in \mathbb{R}^{n \times n}$ positive semidefinite (PSD) if all its eigenvalues are non-negative (in this chapter, we only consider symmetric matrices). In this case, we write $X \succeq 0$. From your basic linear algebra course, you might remember the following facts:

Lemma 1.2. For a symmetric matrix $X \in \mathbb{R}^{n \times n}$, the following are equivalent:
a) $a^T X a \geq 0$ for all $a \in \mathbb{R}^n$.
b) $X$ is positive semidefinite.
c) There exists a matrix $U$ so that $X = UU^T$.
d) There are vectors $u_1, \ldots, u_n \in \mathbb{R}^n$ with $X_{ij} = \langle u_i, u_j \rangle$ for all $i,j \in \{1,\ldots,n\}$.¹

Proof. A basic fact from linear algebra is that a symmetric real matrix is diagonalizable, which means we can write $X = WDW^T$ where the columns of $W$ are the orthonormal eigenvectors $v_1, \ldots, v_n \in \mathbb{R}^n$ and $D$ is a diagonal matrix with the eigenvalues $\lambda_1, \ldots, \lambda_n$ on the diagonal.
a) ⇒ b): $0 \leq v_i^T X v_i = \lambda_i \|v_i\|_2^2 = \lambda_i$.
b) ⇒ c): $X = WDW^T = UU^T$ for $U := W\sqrt{D}$.
c) ⇔ d): Choose $u_i$ as the $i$-th row of $U$ (and conversely, let $U$ be the matrix with rows $u_1, \ldots, u_n$).
c) ⇒ a): For any $a \in \mathbb{R}^n$, $a^T X a = a^T U U^T a = \|U^T a\|_2^2 \geq 0$.
¹ The dimension of the vectors $u_i$ could also be bounded by $\operatorname{rank}(X) \leq n$.
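As a small numerical illustration of c) ⇔ d) and c) ⇒ a), the plain-Python sketch below (helper names are hypothetical, data is random) builds $X = UU^T$ as the Gram matrix of the rows $u_i$ of $U$ and checks $a^T X a = \|U^T a\|_2^2$ for random vectors $a$. It also evaluates the quadratic form of a symmetric matrix that is not PSD.

```python
import random


def gram(U):
    """X = U U^T, i.e. X[i][j] = <u_i, u_j> for the rows u_i of U (Lemma 1.2 d)."""
    return [[sum(a * b for a, b in zip(ui, uj)) for uj in U] for ui in U]


def quad_form(X, a):
    """Compute a^T X a."""
    n = len(a)
    return sum(a[i] * X[i][j] * a[j] for i in range(n) for j in range(n))


def norm_sq_Ut_a(U, a):
    """Compute ||U^T a||_2^2, where (U^T a)_k = sum_i U[i][k] * a[i]."""
    return sum(sum(U[i][k] * a[i] for i in range(len(U))) ** 2 for k in range(len(U[0])))


rng = random.Random(0)
U = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
X = gram(U)
for _ in range(100):
    a = [rng.uniform(-1, 1) for _ in range(3)]
    # c) => a): the quadratic form is a squared norm, hence non-negative.
    assert abs(quad_form(X, a) - norm_sq_Ut_a(U, a)) < 1e-9

# A symmetric matrix with eigenvalues 3 and -1 is not PSD; a = (1, -1) witnesses this.
print(quad_form([[1, 2], [2, 1]], [1, -1]))  # prints: -2
```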

The ingenious insight of Goemans and Williamson (1994) was to solve MaxCut with a semidefinite program instead of a linear program (note that $X$ is restricted to symmetric matrices):

$$(\text{SDP}) \qquad \max \ \frac{1}{2} \sum_{(i,j) \in E} (1 - X_{ij}) \quad \text{s.t.} \quad X \succeq 0, \quad X_{ii} = 1 \ \ \forall i \in V, \quad X \in \mathbb{R}^{n \times n}$$

where $n := |V|$. Note that this is not a linear program, but the set of feasible solutions $X$ is convex. To see this, consider the set of PSD matrices
$$S^n_+ := \{X \in \mathbb{R}^{n \times n} \mid X \text{ symmetric},\ X \succeq 0\}.$$
Using Lemma 1.2 we can rewrite this as
$$S^n_+ = \{X \in \mathbb{R}^{n \times n} \mid X \text{ symmetric},\ \langle X, aa^T \rangle \geq 0 \ \ \forall a \in \mathbb{R}^n\}.$$
In other words, the set $S^n_+$ is defined by infinitely many linear constraints. In particular, this implies that $S^n_+$ is convex. In fact, $S^n_+$ is a non-polyhedral convex cone. For example, for $n = 2$, using the symmetry of the matrices, one can draw a 3-dimensional picture of the set of matrices $\begin{pmatrix} \alpha & \beta \\ \beta & \gamma \end{pmatrix} \succeq 0$ (note that my drawing skills are limited, so the actual set $S^n_+$ is rounder than the picture).

[Figure: the cone $S^2_+$ drawn in the coordinates $\alpha$, $\beta$, $\gamma$]

Let us see that the SDP is actually meaningful:

Lemma 1.3. The SDP is a relaxation of MaxCut.

Proof. Suppose that $S \subseteq V$ is a cut in our graph. We define $y \in \{-1,1\}^V$ with
$$y_i := \begin{cases} 1 & \text{if } i \in S \\ -1 & \text{if } i \notin S. \end{cases}$$
Then $X := yy^T$ is a positive semidefinite matrix with $X_{ii} = y_i^2 = 1$ and the objective function is
$$\frac{1}{2} \sum_{(i,j) \in E} \underbrace{(1 - y_i y_j)}_{\in \{0,2\}} = |\{(i,j) \in E : y_i \neq y_j\}| = |\delta(S)|.$$
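The computation in the proof of Lemma 1.3 can be replayed directly. The sketch below (plain Python, hypothetical helper names) forms $X = yy^T$ for a cut $S$ of the 5-cycle and checks that the SDP objective $\frac{1}{2}\sum_{(i,j)\in E}(1 - X_{ij})$ equals $|\delta(S)|$.

```python
def sdp_objective(X, edges):
    """SDP objective: 1/2 * sum over edges (i, j) of (1 - X[i][j])."""
    return 0.5 * sum(1 - X[i][j] for i, j in edges)


def cut_to_matrix(S, n):
    """X = y y^T with y_i = 1 if i in S and y_i = -1 otherwise."""
    y = [1 if i in S else -1 for i in range(n)]
    return [[y[i] * y[j] for j in range(n)] for i in range(n)]


edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]  # the 5-cycle
S = {0, 2}
X = cut_to_matrix(S, 5)
assert all(X[i][i] == 1 for i in range(5))  # X_ii = y_i^2 = 1
cut = sum(1 for i, j in edges if (i in S) != (j in S))
print(sdp_objective(X, edges), cut)  # prints: 4.0 4
```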

We mentioned that linear programs can be solved in polynomial time (even though we never gave an algorithm for that). It turns out that SDPs can be solved in polynomial time up to any desired accuracy with a variant of the Interior Point Method or with the Ellipsoid Method. The latter method works for all convex problems as long as one is able to find a violated inequality (if there is any). If a matrix $X$ is not positive semidefinite, then there is an eigenvector $a \in \mathbb{R}^n$ with a negative eigenvalue (which can be found efficiently using methods from linear algebra), and $\langle aa^T, X \rangle = a^T X a < 0$ is the violated inequality.

1.2 The rounding

From now on, we take it for granted that the SDP can be solved. We want to show how a cut can be extracted from the matrix $X$. By Lemma 1.2 d), we know that there are vectors $u_1, \ldots, u_n \in \mathbb{R}^n$ so that $X_{ij} = \langle u_i, u_j \rangle$ for all $i,j \in V$ (and those vectors can be found via the Cholesky decomposition). This means that we have an equivalent solution to the vector program
$$\max \ \frac{1}{2} \sum_{(i,j) \in E} (1 - \langle u_i, u_j \rangle) \quad \text{s.t.} \quad \|u_i\|_2 = 1 \ \ \forall i \in V, \quad u_i \in \mathbb{R}^n.$$
There is a nice physical interpretation of this vector program: it tries to embed the vectors into the sphere of radius 1, while for any edge $(i,j) \in E$ there is a repulsion force between $u_i$ and $u_j$. In particular, if we added many parallel edges between a pair $i,j$, their vectors $u_i, u_j$ would be pushed apart until they are almost antipodal. For example, the optimal embedding of the 5-cycle into the 1-dimensional sphere looks as follows:

[Figure: the graph $G$ (a 5-cycle) and its SDP solution, with the vectors $u_1, \ldots, u_5$ on the unit circle]

Now, consider the following hyperplane rounding algorithm:
(1) Take a random unit vector $a \in \mathbb{R}^n$.
(2) Define $S := \{i \in V \mid \langle a, u_i \rangle \geq 0\}$.
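A minimal sketch of the hyperplane rounding step, assuming the unit vectors $u_i$ are already given as Python lists (solving the SDP and the Cholesky step are not shown, and `hyperplane_round` is a hypothetical name). A uniformly random direction is obtained from independent standard Gaussians. Normalizing $a$ is unnecessary because only the sign of $\langle a, u_i \rangle$ matters.

```python
import math
import random


def hyperplane_round(vectors, rng):
    """Return S = {i : <a, u_i> >= 0} for a uniformly random direction a."""
    dim = len(vectors[0])
    # A vector of i.i.d. standard Gaussians points in a uniformly random direction.
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return {i for i, u in enumerate(vectors) if sum(x * y for x, y in zip(a, u)) >= 0}


# The 5-cycle embedded on the unit circle: consecutive nodes are 4*pi/5 apart.
n = 5
vectors = [[math.cos(4 * math.pi * k / n), math.sin(4 * math.pi * k / n)] for k in range(n)]
edges = [(k, (k + 1) % n) for k in range(n)]
rng = random.Random(1)
sizes = []
for _ in range(200):
    S = hyperplane_round(vectors, rng)
    sizes.append(sum(1 for i, j in edges if (i in S) != (j in S)))
print(set(sizes))
```

For this embedding every sampled cut has exactly 4 edges. Each edge spans an angle of $\frac{4\pi}{5}$, so the expected number of cut edges is $5 \cdot \frac{4}{5} = 4$, while a cut always contains an even number of the cycle's edges and at most 4 of them.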

[Figure: the vectors $u_i$, $u_j$ and the random direction $a$]

Geometrically speaking, the algorithm takes a random hyperplane through the origin and takes all nodes $i$ whose vectors are on one side as $S$. Let us analyze the probability for a particular edge to be cut:

Lemma 1.4. For any edge $(i,j) \in E$ one has $\Pr[(i,j) \in \delta(S)] = \frac{1}{\pi}\arccos(\langle u_i, u_j \rangle)$.

Proof. The angle between the vectors is exactly $\theta := \arccos(\langle u_i, u_j \rangle)$. Consider the two-dimensional space $U := \operatorname{span}\{u_i, u_j\}$. Note that the projection of a uniformly random direction $a$ in $\mathbb{R}^n$ onto $U$ is just a uniformly random 2-dimensional direction. So, we can consider what the algorithm would do in a two-dimensional plane. Here, it is not difficult to see that $\Pr[u_i, u_j \text{ separated}] = \frac{2\theta}{2\pi}$. In fact, both nodes are separated exactly if the vector $a^{\perp}$ that is orthogonal to $a$ falls into the gray shaded area:

[Figure: the wedge of angle $\theta$ between $u_i$ and $u_j$ and its reflection through the origin, shaded gray, together with the direction $a$]

While the chance of cutting edge $(i,j)$ is $\frac{1}{\pi}\arccos(\langle u_i, u_j \rangle)$, the contribution of the edge to the SDP objective function is $\frac{1}{2}(1 - \langle u_i, u_j \rangle)$. A computer plot shows that the ratio between those quantities satisfies
$$\frac{\frac{1}{\pi}\arccos(t)}{\frac{1}{2}(1-t)} \geq 0.87 \qquad \forall t \in [-1,1).$$

[Figure: plot of the ratio $\frac{\frac{1}{\pi}\arccos(t)}{\frac{1}{2}(1-t)}$ over $t \in [-1,1]$, staying above 0.87]
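The "computer plot" can be replaced by a quick numerical minimization. The sketch below evaluates the ratio on a fine grid of $t \in [-1, 1)$ (the grid size is an arbitrary choice). The minimum found is the Goemans-Williamson constant, roughly $0.878$, attained near $t \approx -0.69$.

```python
import math


def gw_ratio(t):
    """Cut probability arccos(t)/pi divided by the SDP contribution (1 - t)/2."""
    return (math.acos(t) / math.pi) / ((1 - t) / 2)


steps = 200_000
grid = [-1 + 2 * k / steps for k in range(steps)]  # t in [-1, 1), avoiding t = 1
alpha, t_min = min((gw_ratio(t), t) for t in grid)
print(f"min ratio ~ {alpha:.4f} at t ~ {t_min:.3f}")
```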

It follows that $E[|\delta(S)|] \geq 0.87 \cdot \text{SDP value} \geq 0.87 \cdot \text{optimum value}$. In other words, the approximation algorithm cuts in expectation at least 87% of the number of edges that the optimum solution cuts. There are instances where the algorithm does not perform better. In fact, the algorithm is conjectured to be optimal for MaxCut. To be precise, the Unique Games Conjecture would imply that it is NP-hard to compute a solution that is better than the above constant of 0.87.

Exercise 1. Suppose that the graph $G = (V,E)$ is bipartite. Prove that the random cut $S$ produced by the rounding algorithm cuts all the edges with probability 1. In other words, the SDP algorithm will find the correct partition of the bipartite graph.

Exercise 2. Argue why for a small enough constant $\varepsilon > 0$ the following is true: given a graph $G = (V,E)$ in which the maximum cut cuts $(1-\varepsilon)|E|$ many edges, the Goemans-Williamson algorithm returns a cut that cuts in expectation at least $0.99|E|$ many edges.

Exercise 3. Consider MaxCut with the additional constraint that specified pairs of vertices have to be on the same or on opposite sides of the cut. Formally, we are given two sets of pairs of vertices, $S_1$ and $S_2$. The pairs in $S_1$ need to be separated, and those in $S_2$ need to be on the same side of the cut sought. Under these constraints, the problem is to find a maximum cut. Assume that the constraints provided by $S_1$ and $S_2$ are consistent. Give a semidefinite program and the equivalent vector program for it. Then argue how the Goemans-Williamson algorithm will again give a 0.87-approximation algorithm for this problem.

Chapter 2 The Interior-Point Method

2.1 Introduction

In this chapter, we want to discuss one variant of the Interior Point Method, which is a polynomial time algorithm to solve linear programs, say of the form $\min\{c^T x \mid Ax = b,\ x \geq 0\}$. To be more precise, we discuss the variant suggested by Ye with simplifications of Freund. In the exposition we closely follow the lecture notes of Michel Goemans that you can find under http://www-math.mit.edu/~goemans/notes-lp.ps. In particular, see those notes for any missing details.

Despite the fact that linear programming is such a basic and important problem and has many nice features, such as local optimum = global optimum and LP duality, it is surprisingly difficult to obtain an algorithm that provably runs in polynomial time. The oldest method to solve LPs is the Simplex method, which moves from a vertex to a neighboring vertex that has a better objective function value. But for most implementations of the Simplex algorithm one can cook up worst-case polytopes where the algorithm needs exponentially many steps. It is still a major open problem whether the Simplex method can be modified to run in polynomial time.

As the name suggests, the interior point method instead moves through the interior of a polyhedron in order to find the optimum point. Probably the most intuitive algorithm would just move in the direction $-c$ of steepest decrease of the objective function as far as possible. But in a general situation we might hit a boundary very quickly, and it would be impossible to argue that we make any significant progress.

[Figure: an interior point $x$ of a polytope, the objective direction $c$, and the optimum vertex]

The main ingredient of interior point methods is a magic potential function that guides us towards the optimum and has the property that in every step one can make a good chunk of progress. The variant that we discuss will solve a linear program in time $O(n^{3.5} L)$, where $L$ is the number of bits describing the input data.

2.2 Setting up the LP

Linear programming has the nice property that it is highly self-reducible. For example, if you want to solve a linear program $\min\{c^T x \mid Ax = b,\ x \geq 0\}$, then we can instead solve $\min\{c^T x + \lambda M \mid Ax = b(1-\lambda),\ (x,\lambda) \geq 0\}$; if we choose $M$ large enough, then the optima will coincide, but for the second LP we do know a trivial feasible solution $(x,\lambda) = (0,1)$. Hence, in order to solve general linear programs, it suffices to find an optimum solution to a system for which we at least know some feasible solution.

We consider a particular pair of primal and dual LPs, which is
$$(P):\ \min\{c^T x \mid Ax = b,\ x \geq 0\} \qquad (D):\ \max\{b^T y \mid A^T y + s = c,\ s \geq 0\}.$$
Note that the optimum values of both LPs coincide (given feasibility) and for any feasible solutions $x$ and $(y,s)$ one has $c^T x \geq b^T y$. The vector $s$ denotes the slack of the dual. We call a triple $(x,y,s)$ a primal-dual solution if $x$ is feasible for (P) and $(y,s)$ is feasible for (D). We call such a triple optimal if $x$ is an optimal solution for (P) and $(y,s)$ is an optimal solution for (D). In the following, $n$ denotes the number of columns of $A$, which will be the dominating factor for the running time. First observe that it is easy to check whether a given solution is optimal:

Lemma 2.1. Let $(x,y,s)$ be a primal-dual solution. Then the triple is optimal if and only if $x^T s = 0$.

Proof. Simply note that the duality gap of the solution is exactly
$$c^T x - b^T y = c^T x - \underbrace{x^T A^T}_{=b^T} y = x^T(c - A^T y) = x^T s,$$
and we know that the duality gap is 0 if and only if the primal-dual solution is optimal.

Observe that for any given primal-dual feasible triple $(x,y,s)$, we can estimate from the duality gap $x^T s$ how close we are to an optimum. We can assume that $A$ has full row rank (otherwise, we can eliminate redundant equations). Note that if $s$ is known, the solution $y$ with $A^T y + s = c$ is unique (because of the rank assumption). Hence we can consider a solution simply as a pair $(x,s)$ and omit $y$.
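A tiny numerical check of the identity $c^T x - b^T y = x^T s$ from Lemma 2.1, on a made-up instance. We pick $A$, a primal point $x \geq 0$, any $y$ and a slack $s \geq 0$, and then define $b := Ax$ and $c := A^T y + s$, so that the triple is primal-dual feasible by construction.

```python
A = [[1, 2, 0], [0, 1, 3]]  # 2 x 3 constraint matrix (made-up data)
x = [1, 2, 1]               # primal point, x >= 0
y = [2, -1]                 # dual variables (free)
s = [1, 3, 2]               # dual slack, s >= 0

m, n = len(A), len(A[0])
b = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]         # b = A x
c = [sum(A[i][j] * y[i] for i in range(m)) + s[j] for j in range(n)]  # c = A^T y + s

gap = sum(c[j] * x[j] for j in range(n)) - sum(b[i] * y[i] for i in range(m))
print(gap, sum(x[j] * s[j] for j in range(n)))  # prints: 9 9
```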
We will design an algorithm that starts from some feasible primal-dual solution $(x,s)$ and then finds an optimum one.

2.3 The potential function

Finally, we choose the potential function. For a feasible pair $(x,s)$, define
$$G(x,s) := \underbrace{(n + \sqrt{n})}_{=:q} \ln(x^T s) - \sum_{j=1}^n \ln(x_j s_j).$$
Here, $q$ is a parameter that has to be chosen larger than $n$; it turns out that the choice $q = n + \sqrt{n}$ minimizes the number of iterations, so we make that choice right now. The function $G$ is also called a logarithmic barrier function. If $G(x,s)$ is extremely small, say $G(x,s) \to -\infty$, then in particular the duality gap $x^T s$ is small while all the $x_j s_j$ must be bounded away from 0.

To get some intuition behind this choice, consider $f(t_1,t_2) := 3\ln(t_1 + t_2) - \ln(t_1) - \ln(t_2)$. Then the function is minimized if $(t_1,t_2) \to (0,0)$, but the origin needs to be approached while keeping some distance to the boundaries $t_1 \geq 0$ and $t_2 \geq 0$.
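A direct transcription of $G$ with $q = n + \sqrt{n}$, together with the toy function $f$ (plain Python; the name `potential` is hypothetical). Along the diagonal $t_1 = t_2 = t$ one gets $f = 3\ln 2 + \ln t \to -\infty$. Approaching the origin too close to an axis, e.g. along $t_2 = t_1^2$, gives $f = 3\ln(1+t_1) \to 0$, so the barrier terms punish unbalanced approaches.

```python
import math


def potential(x, s):
    """G(x, s) = q ln(x^T s) - sum_j ln(x_j s_j) with q = n + sqrt(n)."""
    n = len(x)
    q = n + math.sqrt(n)
    gap = sum(xj * sj for xj, sj in zip(x, s))
    return q * math.log(gap) - sum(math.log(xj * sj) for xj, sj in zip(x, s))


def f(t1, t2):
    """Toy barrier from the text: 3 ln(t1 + t2) - ln(t1) - ln(t2)."""
    return 3 * math.log(t1 + t2) - math.log(t1) - math.log(t2)


for t in (1e-1, 1e-3, 1e-6):
    # Balanced approach (t, t) versus approach (t, t^2) hugging the t1-axis.
    print(t, f(t, t), f(t, t * t))
```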

[Figure: the quadrant $t_1, t_2 \geq 0$, with $f$ decreasing towards $(0,0)$]

Let us check that a polynomially small potential function implies that the duality gap is exponentially small.

Lemma 2.2. Suppose that $(x,y,s)$ is a primal-dual feasible solution with $G(x,s) \leq -k\sqrt{n}$ for some $k > 0$. Then the duality gap satisfies $x^T s \leq e^{-k}$.

Proof. First we should understand why we picked $q = n + \sqrt{n}$. We need the extra term $\sqrt{n}$ so that the duality gap dominates the rest. More precisely, we claim that
$$n \ln(x^T s) - \sum_{j=1}^n \ln(x_j s_j) \geq n \ln(n). \qquad (\ast)$$
This follows from Jensen's inequality and the fact that the logarithm is concave:
$$\ln\Big(\sum_{j=1}^n s_j x_j\Big) - \ln(n) = \ln\Big(\mathop{E}_{j \in [n]}[s_j x_j]\Big) \geq \mathop{E}_{j \in [n]}[\ln(x_j s_j)] = \frac{1}{n}\sum_{j=1}^n \ln(x_j s_j),$$
where the inequality is Jensen's inequality together with the concavity of $\ln$. Now we see that
$$G(x,s) = (n + \sqrt{n})\ln(x^T s) - \sum_{j=1}^n \ln(x_j s_j) \overset{(\ast)}{\geq} \sqrt{n}\ln(x^T s) + n\ln(n) \geq \sqrt{n}\ln(x^T s),$$
and rearranging gives $\ln(x^T s) \leq \frac{1}{\sqrt{n}} G(x,s) \leq -k$.

2.4 The algorithm

As we already remarked, one can transform a linear program so that there are at least some obvious solutions. In particular, one can easily find a feasible pair $(x^0, s^0)$ so that $G(x^0, s^0) \leq O(\sqrt{n} L)$. This is one of the points where the choice of $q := n + \sqrt{n}$ is relevant. The appendix of Goemans' lecture notes gives the somewhat boring details, which we omit here.

To give a better overview, we want to state the whole algorithm right now; we will give the details in the remaining sections. In particular, we will need to discuss how the primal updates and the dual updates are made and how step (6) works.
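Inequality $(\ast)$ and the resulting bound $\ln(x^T s) \leq G(x,s)/\sqrt{n}$ from the proof of Lemma 2.2 can be spot-checked on random positive vectors. The sketch below is only a sanity check of the algebra, not part of the algorithm. It re-implements $G$ so that it is self-contained, and the sampling ranges are arbitrary.

```python
import math
import random


def potential(x, s):
    """G(x, s) = (n + sqrt(n)) ln(x^T s) - sum_j ln(x_j s_j)."""
    n = len(x)
    gap = sum(a * b for a, b in zip(x, s))
    return (n + math.sqrt(n)) * math.log(gap) - sum(math.log(a * b) for a, b in zip(x, s))


rng = random.Random(0)
for _ in range(1000):
    n = rng.randint(1, 20)
    x = [rng.uniform(1e-3, 10) for _ in range(n)]
    s = [rng.uniform(1e-3, 10) for _ in range(n)]
    gap = sum(a * b for a, b in zip(x, s))
    # (*): n ln(x^T s) - sum_j ln(x_j s_j) >= n ln(n), by Jensen's inequality.
    lhs = n * math.log(gap) - sum(math.log(a * b) for a, b in zip(x, s))
    assert lhs >= n * math.log(n) - 1e-9
    # Consequence used in Lemma 2.2: ln(x^T s) <= G(x, s) / sqrt(n).
    assert math.log(gap) <= potential(x, s) / math.sqrt(n) + 1e-9
```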

Ye's interior point algorithm
Input: A linear program $\min\{\tilde{c}^T x \mid \tilde{A}x \leq \tilde{b}\}$ with encoding length $L$
Output: An optimum solution $x^*$
(1) Transform the LP into a pair of primal $\min\{c^T x \mid Ax = b,\ x \geq 0\}$ and dual $\max\{b^T y \mid A^T y + s = c,\ s \geq 0\}$ so that we know a solution $(x^0, s^0)$ with $G(x^0, s^0) \leq O(\sqrt{n}L)$.
(2) WHILE $G(x^t, s^t) > -2\sqrt{n}L$ DO
(3)   Compute the gradient $g$ of $G(x^t, s^t)$ and its projection $d$ onto the nullspace of $A$.
(4)   Perform either a primal update (change $x^t$) or a dual update (change $s^t$), and call the new vector $(x^{t+1}, s^{t+1})$.
(5)   $t := t + 1$
(6) Extract a vertex $x^*$ of (P) that is at least as good as the current $x^t$.

We will see that the updates can be performed so that the potential function always decreases by a constant. Then the algorithm takes only $O(\sqrt{n} L)$ iterations, and each takes no more than $O(n^3)$ running time. A not very accurate visualization of the algorithm (restricted to the primal view) would be as follows:

[Figure: iterates $x^0, x^1, \ldots, x^t$ moving through the interior of (P) in the direction of the objective $c$, followed by step (6) to the vertex $x^*$]

We begin by discussing step (6).

2.5 When can we stop?

As our input is rational, it is somewhat clear that if we are close enough to an optimum solution, then we can obtain the actual optimum.

Lemma 2.3. Suppose that the total number of bits encoding $A, b, c$ is $L$ and all those data are integral. If $(\bar{x}, \bar{s})$ is a primal-dual feasible pair and $\bar{x}^T \bar{s} < 2^{-2L}$, then any vertex $x^*$ with $c^T x^* \leq c^T \bar{x}$ is optimal.

Proof. First, let us prove the following fact (which we prefer to show for LPs in inequality form): given any vertex $x$ of $\{x \mid Ax \leq b\}$ (with $A, b$ integral), one has
$$x = \Big(\frac{p_1}{q}, \ldots, \frac{p_n}{q}\Big) \quad \text{with } p_i, q \in \mathbb{Z} \text{ and } |p_i|, |q| \leq 2^L,$$
where $L$ is the number of bits describing $A, b$. To see this, recall that there is a square submatrix $B$ of $A$ and a subvector $d$ of $b$ so that $Bx = d$ has exactly the vertex $x$ as its solution. We can use Cramer's rule to write
$$x_i = \frac{\det(B_i)}{\det(B)}$$

where $B_i$ is the matrix $B$ with the $i$-th column replaced by $d$. Moreover, using the Hadamard bound,
$$|\det(B)| \leq \prod_{k} \|B^{(k)}\|_2 \leq 2^L,$$
where $B^{(k)}$ is the $k$-th row of $B$, and we use that the product of numbers with total encoding length $L$ can be upper bounded by $2^L$. Similarly, we get $|\det(B_i)| \leq 2^L$.

Now, suppose we have two different vertices $x^1, x^2$ with $c^T x^1 \neq c^T x^2$. Let $q_1, q_2$ be their common denominators, so that $x^1 \in \frac{1}{q_1}\mathbb{Z}^n$ and $x^2 \in \frac{1}{q_2}\mathbb{Z}^n$. Then $q_1 q_2$ is a common denominator and, since $c$ is integral, we must have
$$|c^T x^1 - c^T x^2| \geq \frac{1}{q_1 q_2} \geq 2^{-2L}.$$
Since an optimal vertex exists and $c^T x^* - \text{OPT} \leq c^T \bar{x} - \text{OPT} \leq \bar{x}^T \bar{s} < 2^{-2L}$, the vertex $x^*$ must be optimal.

There is the following general argument that, given any point $\bar{x} \in P$ of a polyhedron, we can easily find a vertex that is not worse in terms of an objective function.

Lemma 2.4. Let $P = \{x \mid Ax = b,\ x \geq 0\}$ be any polyhedron, let $\bar{x} \in P$ and let $c$ be an objective function for which the LP $\min\{c^T x \mid x \in P\}$ is bounded. Then one can find a vertex $x^*$ of $P$ with $c^T x^* \leq c^T \bar{x}$ in polynomial time.

Proof. Let $I := \{i \mid \bar{x}_i = 0\}$ be the indices that are already 0 in the starting solution. Pick any vector $d \neq 0$ with $Ad = 0$ and $d_i = 0$ for all $i \in I$. If there is no such $d$, then $\bar{x}$ is already a vertex and we are done. Suppose $c^T d \leq 0$ (otherwise flip the sign of $d$). Walk in the direction of $d$ until we hit some inequality $x_i \geq 0$. If $P$ is unbounded in direction $d$, then $c^T d = 0$ since by assumption the LP is bounded; then walk in direction $-d$ instead (since $P$ lies in the non-negative orthant, not both directions can be unbounded). We update $I$ and repeat until we have a vertex that is not worse than the starting point in terms of the objective function.

2.6 Rescaling

It will be more convenient to describe one iteration of the interior point method if the primal point $x$ is not too close to any boundary, i.e. if $x_j \gg 0$ for all $j$. To ensure this, we can apply an affine rescaling. The idea is very simple.
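If $(\bar{x}, \bar{y}, \bar{s})$ is our current point, then we rescale
$$x'_j := \frac{x_j}{\bar{x}_j}, \qquad s'_j := s_j \bar{x}_j, \qquad A'_{ij} := A_{ij}\bar{x}_j, \qquad b'_i := b_i, \qquad c'_j := c_j \bar{x}_j,$$
so that in particular $\bar{x}'_j = 1$ for all $j$.

[Figure: the rescaling maps the current point $\bar{x}$ of $\{x \mid Ax = b\}$ to the point $\mathbf{1}$ of $\{x' \mid A'x' = b\}$]

The important observation is that $x'_j s'_j = x_j s_j$; hence in particular the potential function does not change and we have $G(\mathbf{1}, \bar{s}') = G(\bar{x}, \bar{s})$. In other words, if we can find a step that improves the potential function after rescaling, then we can do the same for the original problem. But the advantage is that we now have $\bar{x}' = \mathbf{1}$, which will simplify the reasoning.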
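The rescaling is easy to carry out explicitly. The following sketch (plain Python, made-up data, hypothetical function name `rescale`) maps $\bar{x}$ to $\mathbf{1}$ and checks that $A'\mathbf{1} = b$, that $A'^T y + s' = c'$ still holds with the same $y$, and that every product $x_j s_j$ is preserved, which is why $G$ does not change.

```python
def rescale(A, c, x_bar, s_bar):
    """Return (A', c', x', s') with A'_ij = A_ij x_bar_j, c'_j = c_j x_bar_j, x' = 1, s'_j = s_j x_bar_j."""
    A_new = [[a_ij * xb for a_ij, xb in zip(row, x_bar)] for row in A]
    c_new = [cj * xb for cj, xb in zip(c, x_bar)]
    x_new = [1.0 for _ in x_bar]
    s_new = [sj * xb for sj, xb in zip(s_bar, x_bar)]
    return A_new, c_new, x_new, s_new


A = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]]
x_bar = [0.5, 2.0, 0.25]
y = [2.0, -1.0]
s_bar = [1.0, 3.0, 2.0]
b = [sum(a * xj for a, xj in zip(row, x_bar)) for row in A]         # primal feasible: A x_bar = b
c = [A[0][j] * y[0] + A[1][j] * y[1] + s_bar[j] for j in range(3)]  # dual feasible: A^T y + s_bar = c

A2, c2, x2, s2 = rescale(A, c, x_bar, s_bar)
assert all(abs(sum(a * xj for a, xj in zip(row, x2)) - bi) < 1e-12 for row, bi in zip(A2, b))
assert all(abs(A2[0][j] * y[0] + A2[1][j] * y[1] + s2[j] - c2[j]) < 1e-12 for j in range(3))
assert all(abs(x2[j] * s2[j] - x_bar[j] * s_bar[j]) < 1e-12 for j in range(3))
```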

2.7 A primal update

Let us quickly remind ourselves of some basic optimization facts. If we have a function $f : \mathbb{R}^n \to \mathbb{R}$, then the gradient is the vector
$$\nabla f(x) = \Big(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}\Big)\Big|_x.$$
In particular, $\nabla f(x)$ points in the direction in which $f$ has the steepest increase from $x$ as base point. Another interpretation is that $f(x + \Delta x) - f(x) \approx \nabla f(x)^T \Delta x$ is the linear approximation of $f$ at $x$. If we want to optimize the potential function $G$, it should be useful to know its gradient
$$g := \nabla_x G(x,s)\Big|_{(\mathbf{1},s)} = \nabla_x \Big(q\ln(x^T s) - \sum_{j=1}^n \ln(s_j x_j)\Big)\Big|_{(\mathbf{1},s)} = \Big(\frac{q}{x^T s} s_j - \frac{1}{x_j}\Big)_{j \in [n]}\Big|_{(\mathbf{1},s)} = \frac{q}{\mathbf{1}^T s} s - \mathbf{1},$$
using that $\frac{d}{dz}\ln(az) = \frac{1}{z}$ and $\frac{d}{dz}\ln(az + b) = \frac{a}{az+b}$. Since we will move along $-g$, the interpretation of the components of $g$ is that the term $\frac{q}{\mathbf{1}^T s}s$ is responsible for minimizing $x^T s$, while the term $-\mathbf{1}$ pushes us away from the non-negativity constraints.

As we want to decrease the potential function $G(x,s)$, we should try to move from $x = \mathbf{1}$ in the direction of $-g$. But of course, we need to make sure that $x$ remains feasible, so we should move in the direction of the projection of $-g$ onto the nullspace of $A$.

[Figure: the vectors $g$ and $-g$ and their projections $d$ and $-d$ onto the subspace $\{x \mid Ax = 0\}$]

We can actually get an explicit formula for that projection; more important than the exact formula is the fact that it can be computed in time $O(n^3)$.

Lemma 2.5. The projection of $g$ onto the nullspace $\{x \mid Ax = 0\}$ is the vector $d = (I - A^T(AA^T)^{-1}A)g$.

Proof. What should a projection satisfy? The product with $A$ should give
$$A\big(g - A^T(AA^T)^{-1}Ag\big) = Ag - \underbrace{AA^T(AA^T)^{-1}}_{=I}Ag = 0.$$
The second criterion is that for any $w$ with $Aw = 0$ the projection matrix should leave the vector invariant, which means
$$\big(I - A^T(AA^T)^{-1}A\big)w = \underbrace{Iw}_{=w} - A^T(AA^T)^{-1}\underbrace{Aw}_{=0} = w.$$
Finally, the matrix $I - A^T(AA^T)^{-1}A$ is symmetric, so this is indeed the orthogonal projection.

If we moved $x$ directly in the direction of $-g$, it would be clear even without a proof that the potential function has to decrease (for a small enough step). It is plausible that if the projection $d$ is long enough, then moving along $-d$ is sufficient for an improvement.
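Lemma 2.5 translates directly into code. The sketch below (plain Python, made-up data, hypothetical function names; a small Gaussian elimination stands in for a linear algebra library) computes the gradient $g = \frac{q}{\mathbf{1}^T s}s - \mathbf{1}$ at the rescaled point $x = \mathbf{1}$ and its projection $d = (I - A^T(AA^T)^{-1}A)g$. It then checks that $Ad = 0$ and $g^T d = \|d\|_2^2$.

```python
import math


def solve(M, r):
    """Solve the square linear system M z = r by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [ri] for row, ri in zip(M, r)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[i][k] -= factor * aug[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (aug[i][n] - sum(aug[i][k] * z[k] for k in range(i + 1, n))) / aug[i][i]
    return z


def project_nullspace(A, g):
    """d = (I - A^T (A A^T)^{-1} A) g, assuming A has full row rank."""
    m, n = len(A), len(A[0])
    AAT = [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(m)] for i in range(m)]
    Ag = [sum(A[i][k] * g[k] for k in range(n)) for i in range(m)]
    z = solve(AAT, Ag)  # z = (A A^T)^{-1} A g, without forming the inverse
    return [g[k] - sum(A[i][k] * z[i] for i in range(m)) for k in range(n)]


A = [[1.0, 2.0, 0.0, 1.0], [0.0, 1.0, 3.0, 1.0]]
s = [1.0, 3.0, 2.0, 0.5]                 # dual slack at the rescaled point x = 1
n = len(s)
q = n + math.sqrt(n)
g = [q / sum(s) * sj - 1.0 for sj in s]  # gradient g = q s / (1^T s) - 1
d = project_nullspace(A, g)
```

Solving the $m \times m$ system $(AA^T)z = Ag$ instead of inverting $AA^T$ is the usual choice. The cost is dominated by forming $AA^T$ and the elimination, which fits the $O(n^3)$ per iteration mentioned above since $m \leq n$.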

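Lemma 2.6. Let $(\mathbf{1}, s)$ be a feasible primal-dual pair with $\|d\|_2 \geq 0.4$. Then $(\tilde{x}, \tilde{s})$ defined by
$$\tilde{x} := \mathbf{1} - \frac{d}{4\|d\|_2} \quad \text{and} \quad \tilde{s} := s$$
is another feasible primal-dual pair and $G(\tilde{x}, \tilde{s}) \leq G(\mathbf{1}, s) - \text{constant}$.

Proof. From $\mathbf{1}$ we move only a Euclidean distance of $\frac{1}{4}$, so every coordinate changes by at most $\frac{1}{4}$ and hence $\tilde{x} > 0$. Also, $d$ is in the nullspace of $A$, hence again $A\tilde{x} = b$ and $\tilde{x}$ is feasible for the primal LP. Now, let us inspect how the potential function changes. We have
$$G\Big(\mathbf{1} - \frac{d}{4\|d\|_2}, s\Big) - G(\mathbf{1}, s) = q \ln\Bigg(\frac{\big(\mathbf{1} - \frac{d}{4\|d\|_2}\big)^T s}{\mathbf{1}^T s}\Bigg) - \sum_{j=1}^n \ln\Big(1 - \frac{d_j}{4\|d\|_2}\Big) = q \ln\Big(1 - \frac{d^T s}{4\|d\|_2\, \mathbf{1}^T s}\Big) - \sum_{j=1}^n \ln\Big(1 - \frac{d_j}{4\|d\|_2}\Big) \qquad (\ast\ast)$$
using that $\ln(s_j(1-\delta)) - \ln(s_j) = \ln(1-\delta)$. Now we upper bound the first part using $\ln(1-x) \leq -x$ and the second part using $-\ln(1-x) \leq x + \frac{x^2}{2 \cdot (3/4)}$ for $|x| \leq \frac{1}{4}$ (note that $|d_j| \leq \|d\|_2$ and $|d^T s| \leq \|d\|_2\, \mathbf{1}^T s$), and we get
$$(\ast\ast) \leq -\frac{q\, d^T s}{4\|d\|_2\, \mathbf{1}^T s} + \sum_{j=1}^n \frac{d_j}{4\|d\|_2} + \frac{2}{3}\sum_{j=1}^n \frac{d_j^2}{16\|d\|_2^2} = -\frac{1}{4\|d\|_2} \underbrace{\Big(\frac{q}{\mathbf{1}^T s}s - \mathbf{1}\Big)^T d}_{= g^T d = \|d\|_2^2} + \frac{1}{24} = -\frac{\|d\|_2}{4} + \frac{1}{24} \leq -\frac{1}{10} + \frac{1}{24} = -\frac{7}{120};$$
here we used that $g^T d = \|d\|_2^2$ as $d$ is the orthogonal projection of $g$. For understanding this calculation it may help to keep in mind the first order approximation
$$G(\mathbf{1} + \Delta x, s) - G(\mathbf{1}, s) \approx \Big(\frac{q}{\mathbf{1}^T s}s - \mathbf{1}\Big)^T \Delta x.$$
We chose the update $\Delta x := -\frac{d}{4\|d\|_2}$ (and there was the second order error term that we had to bound).

2.8 The dual update

If $\|d\|_2 < 0.4$, then a primal update might not bring enough progress. In that case, we want to make a dual update. Note that the roles of $x$ and $s$ in the potential function $G(x,s)$ are actually symmetric. Now, if we cannot move $x$ in the direction of the gradient, then maybe we should move $s$ in the direction of the gradient. While $x$ has to be moved along the nullspace, $s$ has to be moved orthogonally to the nullspace. But the projection of the gradient has to be large for one of those two spaces. The dual update will be $\tilde{x} := \mathbf{1}$ and
$$\tilde{s} := s - \frac{\mathbf{1}^T s}{q}(g - d) = s - \frac{\mathbf{1}^T s}{q}\Big(\overbrace{\frac{q}{\mathbf{1}^T s}s - \mathbf{1}}^{=g} - d\Big) = \frac{\mathbf{1}^T s}{q}(d + \mathbf{1}).$$
First, let us verify that the dual update decreases the duality gap; later, this will also translate into a decrease in the potential function.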

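Lemma 2.7. If $\|d\|_2 < 0.4$, then one has $\mathbf{1}^T \tilde{s} \leq \big(1 - \frac{0.3}{\sqrt{n}}\big)\mathbf{1}^T s$ (in fact for every $n \geq 1$).

Proof. First, note that
$$\frac{\mathbf{1}^T \tilde{s}}{\mathbf{1}^T s} = \frac{\mathbf{1}^T(d + \mathbf{1})}{q} \leq \frac{n + \|d\|_1}{n + \sqrt{n}} \leq \frac{n + \sqrt{n}\,\|d\|_2}{n + \sqrt{n}} \leq \frac{n + 0.4\sqrt{n}}{n + \sqrt{n}} = 1 - \frac{0.6}{\sqrt{n} + 1} \leq 1 - \frac{0.3}{\sqrt{n}}.$$

Lemma 2.8. Suppose that $(\mathbf{1}, s)$ is a primal-dual feasible pair with $\|d\|_2 < 0.4$. Then $(\tilde{x}, \tilde{s})$ with $\tilde{x} := \mathbf{1}$ and $\tilde{s} := s - \frac{\mathbf{1}^T s}{q}(g - d)$ is a primal-dual feasible pair with $G(\mathbf{1}, \tilde{s}) \leq G(\mathbf{1}, s) - \text{constant}$.

Proof. The vector $g - d$ lies in the row space of $A$, so moving $s$ along $g - d$ keeps $c - s$ in the row space, hence there is still a $\tilde{y}$ so that $A^T \tilde{y} + \tilde{s} = c$. We need to argue that $\tilde{s} > 0$; this follows from $\tilde{s} = \frac{\mathbf{1}^T s}{q}(d + \mathbf{1})$ and $|d_j| \leq \|d\|_2 < 0.4$. We have
$$G(\mathbf{1}, \tilde{s}) - G(\mathbf{1}, s) = (n + \sqrt{n})\ln\Big(\frac{\mathbf{1}^T\tilde{s}}{\mathbf{1}^T s}\Big) + \sum_{j=1}^n \ln(s_j) - \sum_{j=1}^n \ln(\tilde{s}_j)$$
$$= \sqrt{n}\underbrace{\ln\Big(\frac{\mathbf{1}^T\tilde{s}}{\mathbf{1}^T s}\Big)}_{\leq -0.3/\sqrt{n}} + \underbrace{\Big(\sum_{j=1}^n \ln(s_j) - n\ln(\mathbf{1}^T s)\Big)}_{\leq -n\ln(n) \text{ by Jensen}} + \underbrace{\Big(n\ln(\mathbf{1}^T\tilde{s}) - \sum_{j=1}^n \ln(\tilde{s}_j)\Big)}_{\text{to show: } \leq n\ln(n) + \frac{2}{15}} \leq -0.3 + \frac{2}{15} = -\frac{1}{6},$$
where for the first term we used Lemma 2.7 and $\ln(1+x) \leq x$. The missing calculation is, with $\tilde{s} = \frac{\mathbf{1}^T s}{q}(\mathbf{1} + d)$,
$$n\ln(\mathbf{1}^T\tilde{s}) - \sum_{j=1}^n \ln(\tilde{s}_j) = n\ln\big(\mathbf{1}^T(\mathbf{1} + d)\big) - \sum_{j=1}^n \ln(1 + d_j) = n\ln(n) + \underbrace{n\ln\Big(1 + \frac{\mathbf{1}^T d}{n}\Big)}_{\leq \mathbf{1}^T d} - \sum_{j=1}^n \underbrace{\ln(1 + d_j)}_{\geq d_j - \frac{d_j^2}{2 \cdot 0.6}} \leq n\ln(n) + \frac{5}{6}\|d\|_2^2 \leq n\ln(n) + \frac{5}{6} \cdot 0.4^2 = n\ln(n) + \frac{2}{15},$$
where we use that the scalar $\frac{\mathbf{1}^T s}{q}$ cancels out. This finishes the missing piece of the analysis.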
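To tie Sections 2.7 and 2.8 together, here is a sketch of one execution of step (4) at a rescaled point $x = \mathbf{1}$ (plain Python; all function names are hypothetical, and a tiny Gaussian elimination stands in for a linear algebra library). It takes the primal update of Lemma 2.6 when $\|d\|_2 \geq 0.4$ and the dual update of Lemma 2.8 otherwise. The random check tests the guaranteed decreases $\frac{\|d\|_2}{4} - \frac{1}{24} \geq \frac{7}{120}$ and $0.3 - \frac{2}{15} = \frac{1}{6}$ from those lemmas. Rescaling, the stopping rule and step (6) are omitted.

```python
import math
import random


def solve(M, r):
    """Solve the square system M z = r by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [ri] for row, ri in zip(M, r)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[i][k] -= factor * aug[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (aug[i][n] - sum(aug[i][k] * z[k] for k in range(i + 1, n))) / aug[i][i]
    return z


def potential(x, s):
    """G(x, s) = (n + sqrt(n)) ln(x^T s) - sum_j ln(x_j s_j)."""
    n = len(x)
    gap = sum(a * b for a, b in zip(x, s))
    return (n + math.sqrt(n)) * math.log(gap) - sum(math.log(a * b) for a, b in zip(x, s))


def ye_step(A, s):
    """One update at the rescaled primal point x = 1; returns (kind, x_new, s_new)."""
    m, n = len(A), len(A[0])
    q = n + math.sqrt(n)
    total = sum(s)
    g = [q / total * sj - 1.0 for sj in s]  # gradient of G in x at (1, s)
    AAT = [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(m)] for i in range(m)]
    z = solve(AAT, [sum(A[i][k] * g[k] for k in range(n)) for i in range(m)])
    d = [g[k] - sum(A[i][k] * z[i] for i in range(m)) for k in range(n)]  # projection onto {Ax = 0}
    norm_d = math.sqrt(sum(dk * dk for dk in d))
    if norm_d >= 0.4:
        # Primal update (Lemma 2.6): x~ = 1 - d / (4 ||d||_2), s unchanged.
        return "primal", [1.0 - dk / (4.0 * norm_d) for dk in d], list(s)
    # Dual update (Lemma 2.8): s~ = (1^T s / q) (1 + d), x unchanged.
    return "dual", [1.0] * n, [total / q * (1.0 + dk) for dk in d]


rng = random.Random(0)
for _ in range(200):
    A = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(2)]
    s = [rng.uniform(0.1, 10) for _ in range(6)]
    kind, x_new, s_new = ye_step(A, s)
    drop = potential([1.0] * 6, s) - potential(x_new, s_new)
    assert drop >= (7 / 120 if kind == "primal" else 1 / 6) - 1e-9
```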

Chapter 3 Matroid Intersection

This chapter is a reproduction of a section in Lex Schrijver's lecture notes, with somewhat more details.

3.1 Introduction

In a previous chapter of this course, we learned what a matroid is. It is a pair $M = (X, \mathcal{I})$ where $X$ is called the ground set and $\mathcal{I}$ is a family of subsets of $X$, which are called the independent sets. Additionally, the matroid has to satisfy the following three axioms:
1. Non-emptiness: $\emptyset \in \mathcal{I}$.
2. Monotonicity: If $Y \in \mathcal{I}$ and $Z \subseteq Y$, then $Z \in \mathcal{I}$.
3. Exchange property: If $Y, Z \in \mathcal{I}$ with $|Y| < |Z|$, then there is an $x \in Z \setminus Y$ so that $Y \cup \{x\} \in \mathcal{I}$.

Examples of matroids are:
• The set of forests in an undirected graph forms a graphical matroid.
• If $v_1, \ldots, v_n$ are vectors in a vector space, then $M = ([n], \mathcal{I})$ with $\mathcal{I} = \{I \subseteq [n] \mid \{v_i\}_{i \in I} \text{ linearly independent}\}$ is a linear matroid.
• A partition matroid with ground set $X$ can be obtained as follows: take any partition $X = B_1 \,\dot\cup\, \ldots \,\dot\cup\, B_m$ and select numbers $d_i \in \{0, \ldots, |B_i|\}$. Then $M = (X, \mathcal{I})$ with $\mathcal{I} := \{I : |I \cap B_i| \leq d_i \text{ for all } i = 1, \ldots, m\}$ is a matroid.

We already learned that one can use the greedy algorithm to find a maximum weight independent set. In this chapter, we will see that a much more complex problem can also be solved in polynomial time:

MATROID INTERSECTION
Input: Matroids $M_1 = (X, \mathcal{I}_1)$ and $M_2 = (X, \mathcal{I}_2)$ on the same ground set
Goal: Find $\max\{|I| : I \in \mathcal{I}_1 \cap \mathcal{I}_2\}$

To understand that this is a non-trivial problem, we want to argue that it contains maximum bipartite matching as a special case. To see this, take any bipartite graph $G = (V, E)$. Suppose that $V = U \,\dot\cup\, W$ with $U = \{u_1, \ldots, u_{|U|}\}$ and $W = \{w_1, \ldots, w_{|W|}\}$ are the two sides. Then we can define two matroids that both have the edge set $E$ as ground set as follows: take $M_1 = (E, \mathcal{I}_1)$ as the partition matroid with partitions

$\delta(u_1), \ldots, \delta(u_{|U|})$, all with parameter $d_i := 1$. Similarly, we introduce $M_2 = (E, \mathcal{I}_2)$ as the partition matroid with partitions $\delta(w_1), \ldots, \delta(w_{|W|})$, again with parameters 1. Now the matroid intersection problem asks to select as many edges as possible, where in each neighborhood $\delta(u_i)$ and $\delta(w_j)$ we select at most one edge. This is exactly maximum bipartite matching. See the figure below for an example:

[Figure: a bipartite graph $G = (V,E)$ with $U = \{u_1, u_2\}$, $W = \{w_1, w_2, w_3\}$ and edges $e_1, \ldots, e_5$, together with the partition classes $B_{u_1}, B_{u_2}$ of $M_1$ and the classes $B_{w_j}$ of $M_2$]

3.2 The exchange lemma

For example, if we have two spanning trees $T_1, T_2$ in a graph, then the exchange property implies that for any $e \in T_1$, there exists some edge $f(e) \in T_2$ so that $(T_1 \setminus \{e\}) \cup \{f(e)\}$ is again a spanning tree. Now we will see that a stronger property is true: the map $f : T_1 \to T_2$ can be chosen to be bijective.

Lemma 3.1. Let $M = (X, \mathcal{I})$ be a matroid and let $Y, Z \in \mathcal{I}$ be disjoint independent sets of the same size. Define a bipartite exchange graph $H = (Y \cup Z, E)$ with $E = \{(y,z) : (Y \setminus \{y\}) \cup \{z\} \in \mathcal{I}\}$. Then $H$ contains a perfect matching.

Proof. Suppose for the sake of contradiction that $H$ has no perfect matching. From Hall's condition we know that there must be subsets $S \subseteq Y$ and $S' \subseteq Z$ so that all edges incident to $S'$ have their partner in $S$ and $|S| < |S'|$.

[Figure: the sets $Y$ and $Z$ with $S \subseteq Y$ and $S' \subseteq Z$]

Since $|S| < |S'|$ and $S, S'$ are both independent sets, there is an element $z \in S'$ so that $S \cup \{z\} \in \mathcal{I}$. We can keep adding elements from $Y$ to $S \cup \{z\}$ until we get a set $U \subseteq Y \cup \{z\}$ with $|U| = |Y|$.

[Figure: the sets $S \subseteq Y$ and $S' \subseteq Z$, the element $z \in S'$, the set $U$, and the element $x \in Y \setminus U$]

There is exactly one element in $Y \setminus U$; we call it $x$. Since $S \cup \{z\} \subseteq U$, we have $x \notin S$. Then $(Y \setminus \{x\}) \cup \{z\} = U \in \mathcal{I}$, so $(x,z) \in E$ would be an edge from $S'$ to a node outside of $S$, a contradiction.

We will use the exchange graph more intensively later. Formally, for a matroid $M = (X, \mathcal{I})$ and an independent set $Y \in \mathcal{I}$, we define $H(M, Y)$ as the bipartite graph with partitions $Y$ and $X \setminus Y$, where we have an edge between $y \in Y$ and $x \in X \setminus Y$ if $(Y \setminus \{y\}) \cup \{x\} \in \mathcal{I}$.

3.3 The rank function

Again, let $M = (X, \mathcal{I})$ be a matroid. Recall that an inclusion-wise maximal independent set is called a basis. Moreover, all bases have the same size, which is also called the rank of the matroid. One can generalize this to the rank function $r_M : 2^X \to \mathbb{Z}_{\geq 0}$, which is defined by
$$r_M(S) := \max\{|Y| : Y \subseteq S \text{ and } Y \in \mathcal{I}\}$$
and which, for a subset $S \subseteq X$ of the ground set, tells how many independent elements one can select from $S$.

Now suppose we have two matroids $M_1 = (X, \mathcal{I}_1)$ and $M_2 = (X, \mathcal{I}_2)$ over the same ground set. The rank function will be useful to decide at some point that we have found the largest common independent set. Let us make the following observation:

Lemma 3.2. Let $M_1 = (X, \mathcal{I}_1)$ and $M_2 = (X, \mathcal{I}_2)$ be matroids with rank functions $r_1$ and $r_2$. Then for any independent set $Y \in \mathcal{I}_1 \cap \mathcal{I}_2$ and any set $U \subseteq X$ one has $|Y| \leq r_1(U) + r_2(X \setminus U)$.

Proof. We have
$$|Y| = \underbrace{|U \cap Y|}_{\leq r_1(U)} + \underbrace{|(X \setminus U) \cap Y|}_{\leq r_2(X \setminus U)} \leq r_1(U) + r_2(X \setminus U),$$
using that $Y$ is an independent set in both matroids.

Later in the algorithm, we will see that this inequality is tight for some $Y$ and $U$. As a side remark, for the partition matroids of a bipartite graph, the lemma coincides with the fact that the size of any vertex cover is an upper bound on the size of any matching.
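For a graphical matroid, the exchange graph of Lemma 3.1 is easy to build by brute force. The sketch below (plain Python; the helper names and the example graph are illustrative) uses a union-find forest test as independence oracle and takes two edge-disjoint spanning trees $Y, Z$ of $K_4$. It builds the edges $(y,z)$ with $(Y \setminus \{y\}) \cup \{z\} \in \mathcal{I}$ and searches for a perfect matching by trying all bijections, which is only sensible for tiny examples.

```python
from itertools import permutations


def is_forest(edges, n):
    """Independence oracle of the graphical matroid on nodes 0..n-1: True iff no cycle."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # u and v are already connected, so (u, v) closes a cycle
        parent[ru] = rv
    return True


def exchange_edges(Y, Z, indep):
    """Edges (y, z) of the exchange graph: (Y - y) + z is independent."""
    return {(y, z) for y in Y for z in Z if indep([e for e in Y if e != y] + [z])}


def has_perfect_matching(Y, Z, E):
    """Brute force over all bijections Y -> Z (fine for tiny examples only)."""
    return any(all((y, z) in E for y, z in zip(Y, perm)) for perm in permutations(Z))


Y = [(0, 1), (1, 2), (2, 3)]  # spanning tree of K4: path 0-1-2-3
Z = [(0, 2), (0, 3), (1, 3)]  # edge-disjoint spanning tree: path 2-0-3-1
E = exchange_edges(Y, Z, lambda S: is_forest(S, 4))
print(has_perfect_matching(Y, Z, E))  # prints: True
```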

3.4 A reverse exchange lemma

We just saw that the exchange graph has a perfect matching between independent sets of the same size. We now show a converse, namely that a unique perfect matching between an independent set $Y$ and any set $Z$ implies that $Z$ is also independent. In the following, we will consider perfect matchings in the graph $H(M,Y)$ on $Y \,\triangle\, Z$. What we mean is a perfect matching $N$ that matches the nodes in $Y \setminus Z$ to the nodes in $Z \setminus Y$, where each edge $(y,z) \in N$ satisfies $(Y \setminus \{y\}) \cup \{z\} \in \mathcal{I}$.

[Figure: the sets $Y$ and $Z$]

Lemma 3.3. Let $M = (X, \mathcal{I})$ be a matroid, let $Y \in \mathcal{I}$ be an independent set and let $Z \subseteq X$ be any set with $|Z| = |Y|$. Suppose that there exists a unique perfect matching $N$ in $H(M,Y)$ on $Y \,\triangle\, Z$. Then $Z \in \mathcal{I}$.

Proof. Let $E = \{(y,z) \in (Y \setminus Z) \times (Z \setminus Y) \mid (Y \setminus \{y\}) \cup \{z\} \in \mathcal{I}\}$ be the set of all exchange edges between $Y \setminus Z$ and $Z \setminus Y$.

Claim: $E$ has a leaf¹ $y \in Y \setminus Z$.

Proof of claim: By assumption there is a perfect matching $N \subseteq E$. Start at any node $w \in Y \,\triangle\, Z$. If we are on the right side $Z \setminus Y$, then move along a matching edge in $N$; if we are on the left side $Y \setminus Z$, take a non-matching edge. If we ever revisit a node, then we have found an even length cycle $C \subseteq E$ that alternates between matching edges and non-matching edges. Hence $N \,\triangle\, C$ is again a perfect matching, which contradicts the uniqueness. That implies that our walk will not revisit a node, but that it will get stuck at some point. It cannot get stuck at a node in $Z \setminus Y$ because there is always a matching edge incident to it. Hence it can only get stuck at a node $y \in Y \setminus Z$ that is incident to only one edge $(y,z)$, and that edge must be in $N$.

[Figure: the leaf $y \in Y \setminus Z$ with its unique edge $(y,z) \in N$ to $z \in Z \setminus Y$]

Let $z$ denote the element with $(y,z) \in N$. Note that $Z' := (Z \setminus \{z\}) \cup \{y\}$ satisfies $|Y \,\triangle\, Z'| = |Y \,\triangle\, Z| - 2$ and there is still exactly one perfect matching on $Y \,\triangle\, Z'$ (namely $N \setminus \{(y,z)\}$). Hence we can apply induction and assume that $Z' \in \mathcal{I}$.

¹ Recall that a leaf is a degree-1 node.

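[Figure: the sets $Y$, $Z$ and $Z'$ with the elements $y$ and $z$]

The set $(Y \setminus y) \cup \{z\}$ is independent because $(y, z) \in N \subseteq E$, and it is contained in $(Y \cup Z) \setminus y$. Hence $r_M((Y \cup Z) \setminus y) \geq r_M((Y \setminus y) \cup \{z\}) = |Y|$. The set $Z' \setminus y$ is independent of size $|Y| - 1$ and also contained in $(Y \cup Z) \setminus y$, so by the matroid exchange property there is some element $x \in ((Y \cup Z) \setminus y) \setminus Z'$ such that $S := (Z' \setminus y) \cup \{x\}$ is an independent set of size $|Y|$. If $x = z$, then $Z = S \in \mathcal{I}$ and we are done. Otherwise, $x \in Y \setminus Z$.

[Figure: the set $S$ together with $Y$, $Z$ and the elements $x$, $y$, $z$]

As $|S| > |Y \setminus y|$, the exchange property gives an element $s \in S \setminus (Y \setminus y)$ with $(Y \setminus y) \cup \{s\} \in \mathcal{I}$. Since $y \notin S$, we have $s \in S \setminus Y = (Z \setminus Y) \setminus \{z\}$, so $(y, s)$ is a second exchange edge at $y$. That contradicts the choice of $y$ as a leaf.

3.5 The algorithm

Now suppose that we have two matroids $M_1 = (X, \mathcal{I}_1)$ and $M_2 = (X, \mathcal{I}_2)$ over the same ground set. Our algorithm starts with the independent set $Y := \emptyset$ and then augments it iteratively. Suppose we already have some joint independent set $Y \in \mathcal{I}_1 \cap \mathcal{I}_2$. We will show how to either find another set $Y' \in \mathcal{I}_1 \cap \mathcal{I}_2$ with $|Y'| = |Y| + 1$ or decide that $Y$ is already optimal. Let us define the sets
$$X_1 := \{y \in X \setminus Y : Y \cup \{y\} \in \mathcal{I}_1\} \quad \text{and} \quad X_2 := \{y \in X \setminus Y : Y \cup \{y\} \in \mathcal{I}_2\}.$$
In other words, $X_1$ contains the elements that could be added to $Y$ so that we still have an independent set in $M_1$, and similarly for $X_2$ and $M_2$. We define a directed graph $H = (X, E)$ as follows: for all $y \in Y$ and $x \in X \setminus Y$,
$$(y, x) \in E \iff (Y \setminus y) \cup \{x\} \in \mathcal{I}_1, \qquad (x, y) \in E \iff (Y \setminus y) \cup \{x\} \in \mathcal{I}_2.$$
Let us check what this graph does for bipartite matching, where $M_1$ and $M_2$ are the partition matroids modelling the two sides. In this case $Y$ corresponds to a matching, $X_1$ consists of the edges whose left-side node is unmatched by $Y$, and $X_2$ consists of the edges whose right-side node is unmatched. We also observe that a $Y$-augmenting path corresponds to a directed path in $H$ from $X_1$ to $X_2$.

[Figure: an original bipartite graph with edges $e_1, \dots, e_5$ and a matching $Y$, next to its exchange graph $H$ with the sets $X_1$ and $X_2$]

With a bit of care, we can use the concept of augmenting paths also for general matroids.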
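The definitions of $X_1$, $X_2$ and the directed exchange graph $H$ translate directly into code. The following Python sketch is a minimal illustration, assuming both matroids are given by independence oracles on lists; `partition_oracle` and `exchange_digraph` are hypothetical names, and the instance is a tiny bipartite graph with left nodes a, b and right nodes 1, 2.

```python
def partition_oracle(part_of):
    """Hypothetical helper: partition matroid with at most one element per part."""
    def is_indep(S):
        parts = [part_of[e] for e in S]
        return len(parts) == len(set(parts))
    return is_indep

def exchange_digraph(X, Y, indep1, indep2):
    """Return (X1, X2, arcs) for Y ∈ I1 ∩ I2, where
    arc (y, x) iff (Y \\ y) ∪ {x} ∈ I1 and arc (x, y) iff (Y \\ y) ∪ {x} ∈ I2."""
    outside = [x for x in X if x not in Y]
    X1 = {x for x in outside if indep1(Y + [x])}
    X2 = {x for x in outside if indep2(Y + [x])}
    arcs = set()
    for y in Y:
        rest = [e for e in Y if e != y]   # Y \ y
        for x in outside:
            if indep1(rest + [x]):
                arcs.add((y, x))
            if indep2(rest + [x]):
                arcs.add((x, y))
    return X1, X2, arcs

# Bipartite graph with left nodes a, b and right nodes 1, 2; Y = {a1} is a matching.
X = [("a", 1), ("a", 2), ("b", 1)]
M1 = partition_oracle({e: e[0] for e in X})   # left-side partition matroid
M2 = partition_oracle({e: e[1] for e in X})   # right-side partition matroid
Y = [("a", 1)]
X1, X2, arcs = exchange_digraph(X, Y, M1, M2)
print("X1 =", X1)      # edges whose left node is unmatched
print("X2 =", X2)      # edges whose right node is unmatched
print(sorted(arcs))
```

Here $b1 \in X_1$, the arcs $(b1, a1)$ and $(a1, a2)$, and $a2 \in X_2$ form the directed path $b1, a1, a2$. It corresponds to the augmenting path b, 1, a, 2 of the matching, and swapping along it yields the larger matching $\{b1, a2\}$.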

Lemma 3.4. Suppose there exists a directed path $z_0, y_1, z_1, \dots, y_m, z_m$ in $H$ starting at a vertex $z_0 \in X_1$ and ending at a vertex $z_m \in X_2$. If this is a shortest such path, then
$$Y' := (Y \setminus \{y_1, \dots, y_m\}) \cup \{z_0, \dots, z_m\} \in \mathcal{I}_1 \cap \mathcal{I}_2.$$

Proof. We will show that $Y' \in \mathcal{I}_1$; the other inclusion follows by symmetry. In the figure below, the left-hand side shows the directed path, and the right-hand side shows only the edges $E'$ of the exchange graph $H(M_1, Y)$ that run between $Y \setminus Z$ and $Z \setminus Y$ for $Z := (Y \setminus \{y_1, \dots, y_m\}) \cup \{z_1, \dots, z_m\} = Y' \setminus z_0$.

[Figure: left, the directed $X_1$-$X_2$ path $z_0, y_1, z_1, y_2, z_2, y_3, z_3$; right, the edges $E'$ of $H(M_1, Y)$ between $Y \setminus Z$ and $Z \setminus Y$]

Note that the edges $\{(y_i, z_i) : i = 1, \dots, m\}$ from the directed path form a perfect matching on $Y \triangle Z$. While $E'$ may contain more edges than that, it does not contain a chord, that is, an edge $(y_i, z_j)$ with $j > i$. The reason is that in this case our $X_1$-$X_2$ path would not have been the shortest possible one, as we could have used the chord as a shortcut. Now consider the complete chordless graph $E'' := \{(y_i, z_j) : i \geq j\}$. This graph has only one perfect matching: $y_1$ is adjacent only to $z_1$, so $(y_1, z_1)$ has to be in every perfect matching; then remove $y_1$ and $z_1$ and apply induction.

[Figure: the chordless graph $E''$ on $y_1, \dots, y_4$ and $z_1, \dots, z_4$]

Since $E' \subseteq E''$, the perfect matching on $Y \triangle Z$ in $E'$ is unique as well, so Lemma 3.3 gives $Z = Y' \setminus z_0 \in \mathcal{I}_1$. We know that $r_{M_1}(Y \cup Y') \geq r_{M_1}(Y \cup \{z_0\}) = |Y| + 1$, since $z_0 \in X_1$ is one of the $M_1$-augmenting elements. On the other hand, $r_{M_1}((Y \cup Y') \setminus \{z_0\}) = |Y|$, as none of the elements $z_1, \dots, z_m$ is in $X_1$ (here we use again that we have a shortest path). Since $Y' \setminus z_0$ is independent of size $|Y|$, the exchange property yields an element of $Y \cup Y'$ that augments it to an independent set of size $|Y| + 1$, and by the previous rank bound this element can only be $z_0$ itself. Hence $Y' = (Y' \setminus z_0) \cup \{z_0\} \in \mathcal{I}_1$.

Lemma 3.5. Suppose there is no path from a node in $X_1$ to a node in $X_2$. Then $Y$ is optimal. In particular, we can find a subset $U \subseteq X$ such that $|Y| = r_{M_1}(U) + r_{M_2}(X \setminus U)$.

Proof. Let $U := \{i \in X : \text{there is no } X_1\text{-}i \text{ path in } H\}$ (or, maybe more intuitively, $X \setminus U$ is the set of nodes that are reachable from $X_1$).
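The uniqueness argument in the proof of Lemma 3.4 can be checked by brute force for small $m$. The Python sketch below is only an illustration: the helper names are hypothetical and indices are 0-based, so $y_1$ and $z_1$ both correspond to index 0. It confirms that the chordless graph $E''$ has exactly one perfect matching and that adding a single chord already creates a second one, which illustrates why the proof needs a shortest path.

```python
from itertools import permutations

def count_perfect_matchings(m, allowed):
    """Count perfect matchings y_i -> z_sigma(i) using only pairs (i, j) in
    `allowed`; indices are 0-based here (y_1 is index 0). Brute force."""
    return sum(all((i, sigma[i]) in allowed for i in range(m))
               for sigma in permutations(range(m)))

def chordless(m):
    """E'' = {(y_i, z_j) : i >= j}: all pairs except the chords (j > i)."""
    return {(i, j) for i in range(m) for j in range(m) if i >= j}

for m in range(1, 7):
    assert count_perfect_matchings(m, chordless(m)) == 1

# Adding the single chord (y_1, z_2) creates a second perfect matching.
print(count_perfect_matchings(3, chordless(3) | {(0, 1)}))  # prints 2
```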

[Figure: the sets $X_1$, $Y$, $U$ and $X_2$ with the elements $y$ and $x$]

First, we claim that $r_{M_1}(U) = |Y \cap U|$. One direction is easy: $r_{M_1}(U) \geq r_{M_1}(U \cap Y) = |U \cap Y|$. For the other direction, suppose for the sake of contradiction that $r_{M_1}(U) > |Y \cap U|$; then there is some $x \in U \setminus Y$ such that $(Y \cap U) \cup \{x\}$ is an independent set of size $|Y \cap U| + 1$. There are two cases, depending on whether or not $x$ also increases the rank of $Y$ itself:

Case $r_{M_1}(Y \cup \{x\}) = |Y| + 1$. Then $x \in X_1 \cap U$, which contradicts the choice of $U$.

Case $r_{M_1}(Y \cup \{x\}) = |Y|$. Take a maximal independent set $Z$ with $(Y \cap U) \cup \{x\} \subseteq Z \subseteq Y \cup \{x\}$. Then $|Z| = |Y|$, so there is exactly one element $y \in Y \setminus U$ such that $Z = (Y \setminus y) \cup \{x\}$. This implies that $H$ contains the directed edge $(y, x)$. Then the node $x \in U$ is reachable from an element $y \notin U$, and hence from $X_1$, which contradicts the definition of $U$.

From the contradiction we obtain that indeed $r_{M_1}(U) = |Y \cap U|$. Similarly one can show that $r_{M_2}(X \setminus U) = |Y \cap (X \setminus U)|$ (we skip this, as the argument is symmetric). Overall, we have found a set $U$ such that
$$|Y| = |Y \cap U| + |Y \cap (X \setminus U)| = r_{M_1}(U) + r_{M_2}(X \setminus U),$$
and by Lemma 3.2 no joint independent set can be larger than $Y$.

It follows that:

Theorem 3.6. Matroid intersection can be solved in polynomial time.

Proof. Start from $Y := \emptyset$ and iteratively construct the directed exchange graph, compute a shortest $X_1$-$X_2$ path and augment $Y$ along it, as long as possible. Each augmentation increases $|Y|$ by one, so there are at most $|X|$ iterations, and each iteration needs only polynomially many independence tests.

The matroids that we have seen so far all had some explicit representation. Note that the matroid intersection algorithm also works in the black box model, where the only information that we have about the matroids is given by a so-called independence oracle. This is a method that receives a set $Y \subseteq X$ and simply answers whether or not it is an independent set.

Our algorithm provides a nice min-max formula for the size of joint independent sets:

Theorem 3.7 (Edmonds' matroid intersection theorem). For any matroids $M_1 = (X, \mathcal{I}_1)$ and $M_2 = (X, \mathcal{I}_2)$ one has
$$\max\{|S| : S \in \mathcal{I}_1 \cap \mathcal{I}_2\} = \min_{U \subseteq X} \{r_{M_1}(U) + r_{M_2}(X \setminus U)\}.$$

Proof. We saw the inequality $\leq$ already in Lemma 3.2. When the matroid intersection algorithm terminates, it has found a set $U$ providing equality.
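Putting Lemmas 3.4 and 3.5 together gives the algorithm of Theorem 3.6. The Python sketch below is a minimal, unoptimized version in the black box model, assuming both matroids are given by independence oracles on lists; the names `partition_oracle`, `rank` and `matroid_intersection` are hypothetical. It uses breadth-first search from all of $X_1$ to find a shortest $X_1$-$X_2$ path, flips membership along that path, and on termination returns the set $U$ of nodes not reachable from $X_1$. On a small bipartite instance it compares $|Y|$ with the certificate $r_{M_1}(U) + r_{M_2}(X \setminus U)$ and with a brute-force maximum, as in Theorem 3.7.

```python
from collections import deque
from itertools import combinations

def partition_oracle(part_of):
    """Hypothetical helper: partition matroid with at most one element per part."""
    def is_indep(S):
        parts = [part_of[e] for e in S]
        return len(parts) == len(set(parts))
    return is_indep

def rank(indep, S):
    """Greedy rank r_M(S) from an independence oracle."""
    B = []
    for e in S:
        if indep(B + [e]):
            B.append(e)
    return len(B)

def matroid_intersection(X, indep1, indep2):
    """Augment Y along shortest X1-X2 paths of H until none exists.
    Returns Y and U = {nodes of H not reachable from X1} (Lemma 3.5)."""
    Y = []
    while True:
        outside = [x for x in X if x not in Y]
        X1 = [x for x in outside if indep1(Y + [x])]
        X2 = {x for x in outside if indep2(Y + [x])}

        def successors(v):
            if v in Y:  # arcs (y, x) with (Y \ y) ∪ {x} ∈ I1
                rest = [e for e in Y if e != v]
                return [x for x in outside if indep1(rest + [x])]
            # v is outside Y: arcs (v, y) with (Y \ y) ∪ {v} ∈ I2
            return [y for y in Y if indep2([e for e in Y if e != y] + [v])]

        # Breadth-first search from all of X1 at once; the first X2 node
        # dequeued ends a shortest X1-X2 path, as Lemma 3.4 requires.
        pred = {x: None for x in X1}
        queue = deque(X1)
        end = None
        while queue:
            v = queue.popleft()
            if v in X2:
                end = v
                break
            for w in successors(v):
                if w not in pred:
                    pred[w] = v
                    queue.append(w)

        if end is None:  # no X1-X2 path: Y is optimal
            return Y, [x for x in X if x not in pred]

        path = []
        while end is not None:
            path.append(end)
            end = pred[end]
        # Y' = (Y \ {y_1..y_m}) ∪ {z_0..z_m}: flip membership along the path.
        Y = [e for e in Y if e not in path] + [e for e in path if e not in Y]

X = [("a", 1), ("a", 2), ("b", 1), ("c", 2), ("c", 3)]
M1 = partition_oracle({e: e[0] for e in X})
M2 = partition_oracle({e: e[1] for e in X})
Y, U = matroid_intersection(X, M1, M2)

# Certificate from Lemma 3.5 and brute-force maximum for comparison.
certificate = rank(M1, U) + rank(M2, [x for x in X if x not in U])
best = max(k for k in range(len(X) + 1)
           for S in combinations(X, k) if M1(list(S)) and M2(list(S)))
print(len(Y), certificate, best)  # prints 3 3 3
```

Each augmentation adds one more element outside $Y$ than it removes, so the outer loop runs at most $|X|$ times. The brute-force `best` is exponential and only serves to check the min-max equality on this tiny instance.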