Primal-Dual Interior-Point Methods for Linear Programming Based on Newton's Method

Robert M. Freund

March 2004

© 2004 Massachusetts Institute of Technology.
1 The Problem

The logarithmic barrier approach to solving a linear program dates back to the work of Fiacco and McCormick in 1967 in their book Sequential Unconstrained Minimization Techniques, also known simply as SUMT. The method was not believed then to be either practically or theoretically interesting, when in fact today it is both! The method was re-born as a consequence of Karmarkar's interior-point method, and has been the subject of an enormous amount of research and computation, even to this day. In these notes we present the basic algorithm and a basic analysis of its performance.

Consider the linear programming problem in standard form:

P:  minimize  c^T x
    s.t.      Ax = b
              x ≥ 0,

where x is a vector of n variables, whose standard linear programming dual problem is:

D:  maximize  b^T π
    s.t.      A^T π + s = c
              s ≥ 0.

Given a feasible solution x of P and a feasible solution (π, s) of D, the duality gap is simply
c^T x − b^T π = x^T s ≥ 0.

We introduce the following notation, which will be very convenient for manipulating equations, etc. Suppose that x > 0. Define the matrix X to be the n × n diagonal matrix whose diagonal entries are precisely the components of x:

X = diag(x_1, x_2, ..., x_n).

Notice that X is positive definite, and so is X^2, which is the diagonal matrix with entries (x_1)^2, (x_2)^2, ..., (x_n)^2. Similarly, the matrices X^{-1} and X^{-2} are the diagonal matrices

X^{-1} = diag(1/x_1, 1/x_2, ..., 1/x_n)
and

X^{-2} = diag(1/(x_1)^2, 1/(x_2)^2, ..., 1/(x_n)^2).

Let us introduce a logarithmic barrier term for P. We obtain:

P(μ):  minimize  c^T x − μ Σ_{j=1}^n ln(x_j)
       s.t.      Ax = b
                 x > 0.

Because the gradient of the objective function of P(μ) is simply c − μX^{-1}e (where e is the vector of ones, i.e., e = (1, 1, ..., 1)^T), the Karush-Kuhn-Tucker conditions for P(μ) are:

Ax = b,  x > 0
c − μX^{-1}e = A^T π.                                            (1)

If we define s = μX^{-1}e, then Xs = μe,
or equivalently XSe = μe, and we can rewrite the Karush-Kuhn-Tucker conditions as:

Ax = b,  x > 0
A^T π + s = c                                                    (2)
XSe − μe = 0.

From the equations of (2) it follows that if (x, π, s) is a solution of (2), then x is feasible for P, (π, s) is feasible for D, and the resulting duality gap is:

x^T s = e^T XSe = μ e^T e = μn.

This suggests that we try solving P(μ) for a variety of values of μ as μ → 0.

However, we cannot usually solve (2) exactly, because the third equation group is not linear in the variables. We will instead define a β-approximate solution of the Karush-Kuhn-Tucker conditions (2). A β-approximate solution of P(μ) is defined as any solution (x, π, s) of

Ax = b,  x > 0
A^T π + s = c                                                    (3)
‖(1/μ)Xs − e‖ ≤ β.

Here the norm ‖·‖ is the Euclidean norm.
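Numerically, the β-approximation conditions (3) are straightforward to check. Below is a minimal sketch in Python/numpy; the constraint data, point, and tolerance are illustrative choices, not data from the notes:

```python
import numpy as np

def proximity(x, s, mu):
    """Central-path proximity measure ||(1/mu) X s - e|| for x, s > 0."""
    return np.linalg.norm(x * s / mu - 1.0)

def is_beta_approximate(A, b, c, x, pi, s, mu, beta, tol=1e-9):
    """Test the three conditions of (3): primal feasibility, dual
    feasibility, and proximity to the central path at parameter mu."""
    primal_ok = np.allclose(A @ x, b, atol=tol) and np.all(x > 0)
    dual_ok = np.allclose(A.T @ pi + s, c, atol=tol)
    return primal_ok and dual_ok and proximity(x, s, mu) <= beta

# Illustrative data (not from the notes): one constraint, two variables.
A = np.array([[1.0, 1.0]]); b = np.array([2.0])
c = np.array([0.0, 0.5])
x = np.array([1.0, 1.0]); pi = np.array([-1.0])
s = c - A.T @ pi              # s = (1.0, 1.5) > 0, so (pi, s) is dual feasible
mu = x @ s / len(x)           # mu = 1.25, the average complementarity x^T s / n
print(is_beta_approximate(A, b, c, x, pi, s, mu, beta=0.3))   # True
```

Here ‖(1/μ)Xs − e‖ = √0.08 ≈ 0.283, so the point passes the test at β = 0.3 but would fail it at β = 0.25; the duality gap x^T s then lies within a factor 1 ± β of nμ.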
Lemma. If ( x, π, s ) is a β-approximate solution of P () and β <, then x is feasible for P, ( π, s) is feasible for D, and the duality gap satisfies: n( β) c T x b T π = x T s n( + β). (4) Proof: Primal feasibility is obvious. To prove dual feasibility, we need to show that s 0. To see this, note that the third equation system of (3) implies that β x j s j β which we can rearrange as: ( β) x j s j ( + β). (5) Therefore x j s j ( β) > 0, which implies x j s j > 0, and so s j > 0. From (5) we have n n n n( β) = ( β) x j s j = x T s ( + β) = n( + β). j= j= j= 2 The Primal-Dual Algorithm Based on the analysis just presented, we are motivated to develop the following algorithm: Step 0. Initialization. Data is (x 0,π 0,s 0, 0 ). k = 0. Assume that (x 0,π 0,s 0 )isa β-approximate solution of P ( 0 ) for some known value of β that satisfies β < 2. 6
Step. Set current values. ( x, π, s ) =(x k,π k,s k ), = k. Step 2. Shrink. Set = α for some α (0, ). Step 3. Compute the primal-dual Newton direction. Compute the Newton step ( x, π, s) for the equation system (2) at (x, π, s) =( x, π, s ) for =, by solving the following system of equations in the variables ( x, π, s): A x = 0 A T π + s = 0 (6) S x + X s = XSe e. Denote the solution to this system by ( x, π, s). Step 4. Update All Values. (x,π,s)=( x, π, s )+( x, π, s) Step 5. Reset Counter and Continue. (x k+,π k+,s k+ )=(x,π,s). =. k k +. Go to Step. k+ Figure shows a picture of the algorithm. Some of the issues regarding this algorithm include: how to set the approximation constant β and the fractional decrease 3 parameter α. We will see that it will be convenient to set β = 40 and 8 α =. + n the derivation of the primal-dual Newton equation system (6) 5 7
Figure 1: A conceptual picture of the interior-point algorithm.
whether or not successive iterative values (x^k, π^k, s^k) are β-approximate solutions to P(μ^k)

how to get the method started.

3 The Primal-Dual Newton Step

We introduce a logarithmic barrier term for P to obtain P(μ):

P(μ):  minimize  c^T x − μ Σ_{j=1}^n ln(x_j)
       s.t.      Ax = b
                 x > 0.

Because the gradient of the objective function of P(μ) is simply c − μX^{-1}e (where e is the vector of ones, i.e., e = (1, 1, ..., 1)^T), the Karush-Kuhn-Tucker conditions for P(μ) are:

Ax = b,  x > 0
c − μX^{-1}e = A^T π.                                            (7)

We define s = μX^{-1}e, whereby Xs = μe, or equivalently

XSe = μe,
and we can rewrite the Karush-Kuhn-Tucker conditions as:

Ax = b,  x > 0
A^T π + s = c                                                    (8)
XSe − μe = 0.

Let (x̄, π̄, s̄) be our current iterate, which we assume is primal and dual feasible, namely:

A x̄ = b,  x̄ > 0,  A^T π̄ + s̄ = c,  s̄ > 0.                        (9)

Introducing a direction (Δx, Δπ, Δs), the next iterate will be (x̄, π̄, s̄) + (Δx, Δπ, Δs), and we want to solve:

A(x̄ + Δx) = b,  x̄ + Δx > 0
A^T (π̄ + Δπ) + (s̄ + Δs) = c
(X̄ + ΔX)(S̄ + ΔS)e − μe = 0.

Keeping in mind that (x̄, π̄, s̄) is primal-dual feasible and so satisfies (9), we can rearrange the above to be:

A Δx = 0
A^T Δπ + Δs = 0
S̄ Δx + X̄ Δs = μe − X̄S̄e − ΔXΔSe.

Notice that the only nonlinear term in the above system of equations in (Δx, Δπ, Δs) is the term ΔXΔSe in the last equation. If we erase this term, which yields the linearized version of the equations, we obtain the following primal-dual Newton equation system:
A Δx = 0
A^T Δπ + Δs = 0                                                  (10)
S̄ Δx + X̄ Δs = μe − X̄S̄e.

The solution (Δx, Δπ, Δs) of the system (10) is called the primal-dual Newton step. We can manipulate these equations to yield the following formulas for the solution:

Δx = [ I − X̄S̄^{-1}A^T (A X̄S̄^{-1} A^T)^{-1} A ] ( −x̄ + μ S̄^{-1} e ),

Δπ = (A X̄S̄^{-1} A^T)^{-1} A ( x̄ − μ S̄^{-1} e ),                 (11)

Δs = −A^T (A X̄S̄^{-1} A^T)^{-1} A ( x̄ − μ S̄^{-1} e ).

Notice, by the way, that the computational effort in these equations lies primarily in solving a single equation:

(A X̄S̄^{-1} A^T) Δπ = A ( x̄ − μ S̄^{-1} e ).

Once this system is solved, we can easily substitute:

Δs = −A^T Δπ
Δx = −x̄ + μ S̄^{-1} e − X̄S̄^{-1} Δs.                              (12)

However, let us instead simply work with the primal-dual Newton system (10). Suppose that (Δx, Δπ, Δs) is the (unique) solution of the primal-dual Newton system (10). We obtain the new value of the variables (x, π, s) by taking the Newton step:
(x,π,s)=( x, π, s)+( x, π, s). We have the following very powerful convergence theorem which demonstrates the quadratic convergence of Newton s method for this problem, with an explicit guarantee of the range in which quadratic convergence takes place. Theorem 3. (Explicit Quadratic Convergence of Newton s Method). Suppose that ( x, π, s ) is a β-approximate solution of P () and β< 2. Let ( x, π, s) be the solution to the primal-dual Newton equations (0), and let: (x,π,s)=( x, π, s )+( x, π, s). ( ) +β Then (x,π,s) is a β ( β) 2 -approximate solution of P (). 2 Proof: Our current point ( x, π, s ) satisfies: A x = b, x> 0 A T π + s = c (3) XSe e β. Furthermore the primal-dual Newton step ( x, π, s) satisfies: A x = 0 A T π + s = 0 (4) S x + X s = e XSe. Note from the first two equations of (4) that x T s = 0. From the third equation of (3) we have β s j x j + β, j =,...,n, (5) 2
which implies:

x̄_j ≥ μ(1 − β)/s̄_j  and  s̄_j ≥ μ(1 − β)/x̄_j,  j = 1, ..., n.    (16)

As a result of this we obtain:

μ(1 − β) ‖X̄^{-1}Δx‖^2 = μ(1 − β) Δx^T X̄^{-1} X̄^{-1} Δx
  ≤ Δx^T X̄^{-1} S̄ Δx
  = Δx^T X̄^{-1} ( μe − X̄S̄e − X̄Δs )
  = Δx^T X̄^{-1} ( μe − X̄S̄e )
  ≤ ‖X̄^{-1}Δx‖ ‖μe − X̄S̄e‖
  ≤ ‖X̄^{-1}Δx‖ μβ,

where the middle equality uses Δx^T Δs = 0. From this it follows that

‖X̄^{-1}Δx‖ ≤ β/(1 − β) < 1.

Therefore

x′ = x̄ + Δx = X̄(e + X̄^{-1}Δx) > 0.

We have the exact same chain of inequalities for the dual variables:

μ(1 − β) ‖S̄^{-1}Δs‖^2 = μ(1 − β) Δs^T S̄^{-1} S̄^{-1} Δs
  ≤ Δs^T S̄^{-1} X̄ Δs
  = Δs^T S̄^{-1} ( μe − X̄S̄e − S̄Δx )
  = Δs^T S̄^{-1} ( μe − X̄S̄e )
  ≤ ‖S̄^{-1}Δs‖ ‖μe − X̄S̄e‖
  ≤ ‖S̄^{-1}Δs‖ μβ.

From this it follows that

‖S̄^{-1}Δs‖ ≤ β/(1 − β) < 1.

Therefore

s′ = s̄ + Δs = S̄(e + S̄^{-1}Δs) > 0.

Next note from (14) that for j = 1, ..., n we have:

x′_j s′_j = (x̄_j + Δx_j)(s̄_j + Δs_j) = x̄_j s̄_j + x̄_j Δs_j + s̄_j Δx_j + Δx_j Δs_j = μ + Δx_j Δs_j.
Therefore

(1/μ) X′S′e − e = (1/μ) ΔXΔSe.

From this we obtain:

‖(1/μ) X′S′e − e‖ ≤ Σ_{j=1}^n |Δx_j Δs_j| / μ
  = Σ_{j=1}^n (x̄_j s̄_j / μ) |Δx_j / x̄_j| |Δs_j / s̄_j|
  ≤ (1 + β) Σ_{j=1}^n |Δx_j / x̄_j| |Δs_j / s̄_j|
  ≤ (1 + β) ‖X̄^{-1}Δx‖ ‖S̄^{-1}Δs‖
  ≤ (1 + β) (β/(1 − β))^2.

4 Complexity Analysis of the Algorithm

Theorem 4.1 (Relaxation Theorem). Suppose that (x̄, π̄, s̄) is a β = 3/40-approximate solution of P(μ). Let

α = 1 − 1/(8/5 + 8√n)

and let μ′ = αμ. Then (x̄, π̄, s̄) is a β′ = 1/5-approximate solution of P(μ′).

Proof: The triplet (x̄, π̄, s̄) satisfies A x̄ = b, x̄ > 0, and A^T π̄ + s̄ = c, and so it remains to show that ‖(1/μ′)X̄s̄ − e‖ ≤ 1/5. We have

‖(1/μ′)X̄s̄ − e‖ = ‖(1/(αμ))X̄s̄ − e‖ = (1/α) ‖(1/μ)X̄s̄ − e + (1 − α)e‖
  ≤ (1/α) ‖(1/μ)X̄s̄ − e‖ + ((1 − α)/α) ‖e‖
  ≤ (1/α)(3/40) + ((1 − α)/α)√n = (3/40 + (1 − α)√n)/α = 1/5,

where the last equality follows from the definition of α.

Theorem 4.2 (Convergence Theorem). Suppose that (x^0, π^0, s^0) is a β = 3/40-approximate solution of P(μ^0). Then for all k = 1, 2, 3, ..., (x^k, π^k, s^k) is a β = 3/40-approximate solution of P(μ^k).

Proof: By induction, suppose that the theorem is true for iterates 0, 1, 2, ..., k. Then (x^k, π^k, s^k) is a β = 3/40-approximate solution of P(μ^k). From the Relaxation Theorem, (x^k, π^k, s^k) is a 1/5-approximate solution of P(μ^{k+1}), where μ^{k+1} = αμ^k. From the Quadratic Convergence Theorem, (x^{k+1}, π^{k+1}, s^{k+1}) is a β′-approximate solution of P(μ^{k+1}) for

β′ = ((1 + 1/5)/(1 − 1/5)^2)(1/5)^2 = 3/40.

Therefore, by induction, the theorem is true for all values of k.

Figure 2 shows a better picture of the algorithm.

Theorem 4.3 (Complexity Theorem). Suppose that (x^0, π^0, s^0) is a β = 3/40-approximate solution of P(μ^0). In order to obtain primal and dual feasible solutions (x^k, π^k, s^k) with a duality gap of at most ɛ, one needs to run the algorithm for at most

k = ⌈ 10√n ln( (43 (x^0)^T s^0) / (37 ɛ) ) ⌉

iterations.

Proof: Let k be as defined above. Note that

α = 1 − 1/(8/5 + 8√n) = 1 − 5/(8 + 40√n) ≤ 1 − 1/(10√n),

where the last inequality holds because 10√n ≥ 8 for n ≥ 1.
Figure 2: Another picture of the interior-point algorithm.
Therefore

μ^k ≤ (1 − 1/(10√n))^k μ^0.

This implies that

c^T x^k − b^T π^k = (x^k)^T s^k ≤ μ^k n (1 + β) ≤ (1 − 1/(10√n))^k μ^0 n (43/40) ≤ (1 − 1/(10√n))^k (43/37) (x^0)^T s^0,

where the last inequality uses nμ^0 ≤ (x^0)^T s^0 / (1 − β) = (40/37)(x^0)^T s^0 from (4). Taking logarithms, we obtain

ln( c^T x^k − b^T π^k ) ≤ k ln( 1 − 1/(10√n) ) + ln( (43/37)(x^0)^T s^0 )
  ≤ −k/(10√n) + ln( (43/37)(x^0)^T s^0 )
  ≤ −ln( (43 (x^0)^T s^0)/(37 ɛ) ) + ln( (43/37)(x^0)^T s^0 ) = ln(ɛ).

Therefore c^T x^k − b^T π^k ≤ ɛ.

5 An Implementable Primal-Dual Interior-Point Algorithm

Herein we describe a more implementable primal-dual interior-point algorithm. This algorithm differs from the previous method in the following respects:

We do not assume that the current point is near the central path. In fact, we do not assume that the current point is even feasible.

The fractional decrease parameter α is set to α = 1/10 rather than the conservative value of α = 1 − 1/(8/5 + 8√n).
We do not necessarily take the full Newton step at each iteration, and we take different step-sizes in the primal and the dual.

Our current point is (x̄, π̄, s̄), for which x̄ > 0 and s̄ > 0, but quite possibly A x̄ ≠ b and/or A^T π̄ + s̄ ≠ c. We are given a value of the central path barrier parameter μ > 0. We want to compute a direction (Δx, Δπ, Δs) so that (x̄ + Δx, π̄ + Δπ, s̄ + Δs) approximately solves the central path equations. We set up the system:

(1)  A(x̄ + Δx) = b
(2)  A^T (π̄ + Δπ) + (s̄ + Δs) = c
(3)  (X̄ + ΔX)(S̄ + ΔS)e = μe.

We linearize this system of equations and rearrange the terms to obtain the Newton equations for the current point (x̄, π̄, s̄):

(1)  A Δx = b − A x̄ =: r_1
(2)  A^T Δπ + Δs = c − A^T π̄ − s̄ =: r_2
(3)  S̄ Δx + X̄ Δs = μe − X̄S̄e =: r_3.

We refer to the solution (Δx, Δπ, Δs) of the above system as the primal-dual Newton direction at the point (x̄, π̄, s̄). It differs from that derived earlier only in that earlier it was assumed that r_1 = 0 and r_2 = 0.

Given our current point (x̄, π̄, s̄) and a given value of μ, we compute the Newton direction (Δx, Δπ, Δs) and we update our variables by choosing primal and dual step-sizes α_P and α_D to obtain new values:

(x̄, π̄, s̄) ← (x̄ + α_P Δx, π̄ + α_D Δπ, s̄ + α_D Δs).
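The Newton equations above form one linear system in (Δx, Δπ, Δs), so a direct (if not efficient) way to compute the direction is to assemble the block matrix and call a linear solver; production codes instead reduce to a single smaller system, as in Section 3. A sketch in Python/numpy, with an illustrative two-variable instance that is not from the notes; the step-size rule is the standard ratio test that keeps the iterates strictly positive:

```python
import numpy as np

def newton_direction(A, b, c, x, pi, s, mu):
    """Solve the (possibly infeasible) primal-dual Newton system
         A dx         = r1 := b - A x
         A^T dpi + ds = r2 := c - A^T pi - s
         S dx + X ds  = r3 := mu e - X S e
    assembled as one dense block linear system in (dx, dpi, ds)."""
    m, n = A.shape
    K = np.zeros((2 * n + m, 2 * n + m))
    K[:m, :n] = A                      # block row for A dx = r1
    K[m:m + n, n:n + m] = A.T          # block row for A^T dpi + ds = r2
    K[m:m + n, n + m:] = np.eye(n)
    K[m + n:, :n] = np.diag(s)         # block row for S dx + X ds = r3
    K[m + n:, n + m:] = np.diag(x)
    rhs = np.concatenate([b - A @ x, c - A.T @ pi - s, mu * np.ones(n) - x * s])
    d = np.linalg.solve(K, rhs)
    return d[:n], d[n:n + m], d[n + m:]

def step_sizes(x, s, dx, ds, r=0.99):
    """Ratio test: largest steps, capped at 1, keeping x and s strictly positive."""
    ap = min(1.0, r * min((-x[j] / dx[j] for j in range(len(x)) if dx[j] < 0),
                          default=np.inf))
    ad = min(1.0, r * min((-s[j] / ds[j] for j in range(len(s)) if ds[j] < 0),
                          default=np.inf))
    return ap, ad

# Tiny illustrative instance (not from the notes).
A = np.array([[1.0, 1.0]]); b = np.array([2.0]); c = np.array([1.0, 2.0])
x = np.array([0.5, 0.5]); pi = np.array([0.0]); s = np.array([1.0, 1.0])
dx, dpi, ds = newton_direction(A, b, c, x, pi, s, mu=0.1)
ap, ad = step_sizes(x, s, dx, ds)
```

Substituting the computed direction back into the three residual equations verifies the solve; the block system is nonsingular whenever A has full row rank and x, s > 0.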
In order to ensure that x̄ > 0 and s̄ > 0, we choose a value of r satisfying 0 < r < 1 (r = 0.99 is a common value in practice), and determine α_P and α_D as follows:

α_P = min{ 1, r · min_{Δx_j < 0} { −x̄_j / Δx_j } }

α_D = min{ 1, r · min_{Δs_j < 0} { −s̄_j / Δs_j } }.

These step-sizes ensure that the next iterate (x̄, π̄, s̄) satisfies x̄ > 0 and s̄ > 0.

5.1 Decreasing the Path Parameter μ

We also want to shrink μ at each iteration, in order to (hopefully) shrink the duality gap. The current iterate is (x̄, π̄, s̄), and the current values satisfy:

μ ≈ x̄^T s̄ / n.

We then re-set μ to

μ ← (1/10) ( x̄^T s̄ / n ),

where the fractional decrease 1/10 is user-specified.

5.2 The Stopping Criterion

We typically declare the problem solved when it is almost primal feasible, almost dual feasible, and there is almost no duality gap. We set our
tolerance ɛ to be a small positive number, typically ɛ = 10^{-8}, for example, and we stop when:

(1)  ‖A x̄ − b‖ ≤ ɛ
(2)  ‖A^T π̄ + s̄ − c‖ ≤ ɛ
(3)  s̄^T x̄ ≤ ɛ.

5.3 The Full Interior-Point Algorithm

1. Given (x^0, π^0, s^0) satisfying x^0 > 0, s^0 > 0, and μ^0 > 0, and r satisfying 0 < r < 1, and ɛ > 0. Set k ← 0.

2. Test stopping criterion. Check if:

(1)  ‖A x^k − b‖ ≤ ɛ
(2)  ‖A^T π^k + s^k − c‖ ≤ ɛ
(3)  (s^k)^T x^k ≤ ɛ.

If so, STOP. If not, proceed.

3. Set

μ ← (1/10) ( (x^k)^T s^k / n ).

4. Solve the Newton equation system:

(1)  A Δx = b − A x^k =: r_1
(2)  A^T Δπ + Δs = c − A^T π^k − s^k =: r_2
(3)  S^k Δx + X^k Δs = μe − X^k S^k e =: r_3.
5. Determine the step-sizes:

α_P = min{ 1, r · min_{Δx_j < 0} { −x^k_j / Δx_j } }

α_D = min{ 1, r · min_{Δs_j < 0} { −s^k_j / Δs_j } }.

6. Update values:

(x^{k+1}, π^{k+1}, s^{k+1}) ← (x^k + α_P Δx, π^k + α_D Δπ, s^k + α_D Δs).

Re-set k ← k + 1 and return to Step 2.

5.4 Remarks on Interior-Point Methods

The algorithm just described is almost exactly what is used in commercial interior-point method software. A typical interior-point code will solve a linear or quadratic optimization problem in 25-80 iterations, regardless of the dimension of the problem. These days, interior-point methods have been extended to allow for the solution of a very large class of convex nonlinear optimization problems.

6 Exercises on Interior-Point Methods for LP

1. (Interior Point Method) Verify that the formulas given in (11) indeed solve the equation system (10).
2. (Interior Point Method) Prove Proposition 6.1 of the notes on Newton's method for a logarithmic barrier algorithm for linear programming.

3. (Interior Point Method) Create and implement your own interior-point algorithm to solve the following linear program:

LP:  minimize  c^T x
     s.t.      Ax = b
               x ≥ 0,

where

A = ( 1  1  1  0 )      b = ( 100 )
    ( 1  1  0  1 ),         (  50 ),

and

c = ( −9  −10  0  0 )^T.

In running your algorithm, you will need to specify the starting point (x^0, π^0, s^0), the starting value μ^0 of the barrier parameter, and the value of the fractional decrease parameter α. Use the following values and make eight runs of the algorithm as follows:

Starting points:
(x^0, π^0, s^0) = ((1, 1, 1, 1), (0, 0), (1, 1, 1, 1))
(x^0, π^0, s^0) = ((100, 100, 100, 100), (0, 0), (100, 100, 100, 100))

Values of μ^0:
μ^0 = 1000.0
μ^0 = 10.0

Values of α:
α = 1 − 1/(10√n)
α = 1/10

4. (Interior Point Method) Suppose that x̄ > 0 is a feasible solution of the following logarithmic barrier problem:
P(μ):  minimize  c^T x − μ Σ_{j=1}^n ln(x_j)
       s.t.      Ax = b
                 x > 0,

and that we wish to test if x̄ is a β-approximate solution to P(μ) for a given value of μ. The conditions for x̄ to be a β-approximate solution to P(μ) are that there exist values (π, s) for which:

A x̄ = b,  x̄ > 0
A^T π + s = c                                                    (17)
‖(1/μ)X̄s − e‖ ≤ β.

Construct an analytic test for whether or not such (π, s) exist by solving an associated equality-constrained quadratic program (which has a closed-form solution). HINT: Think of trying to minimize ‖(1/μ)X̄s − e‖ over appropriate values of (π, s).

5. Consider the logarithmic barrier problem:

P(μ):  min   c^T x − μ Σ_{j=1}^n ln(x_j)
       s.t.  Ax = b
             x > 0.

Suppose x̄ is a feasible point, i.e., x̄ > 0 and A x̄ = b. Let X̄ = diag(x̄).

a. Construct the (primal) Newton direction d̄ at x̄. Show that d̄ can be written as:

d̄ = X̄ ( I − X̄A^T (A X̄^2 A^T)^{-1} A X̄ ) ( e − (1/μ) X̄ c ).

b. Show in the above that P = I − X̄A^T (A X̄^2 A^T)^{-1} A X̄ is a projection matrix.
c. Suppose A^T π + s = c and s > 0 for some (π, s). Show that d̄ can also be written as:

d̄ = X̄ ( I − X̄A^T (A X̄^2 A^T)^{-1} A X̄ ) ( e − (1/μ) X̄ s ).

d. Show that if there exists (π, s) with A^T π + s = c and s > 0, and

‖ e − (1/μ) X̄ s ‖ ≤ 1/4,

then ‖X̄^{-1} d̄‖ ≤ 1/4, and then the Newton iterate satisfies x̄ + d̄ > 0.
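As a starting point for Exercise 3, the full algorithm of Section 5.3 can be sketched in a few dozen lines of Python/numpy. The tiny LP at the bottom is an illustrative placeholder, not the exercise data, and the dense block solve is chosen for clarity rather than efficiency:

```python
import numpy as np

def primal_dual_ipm(A, b, c, x, pi, s, r=0.99, eps=1e-8, shrink=0.1, max_iter=200):
    """A sketch of the implementable primal-dual algorithm of Section 5.3."""
    m, n = A.shape
    for k in range(max_iter):
        # Step 2: stopping criterion (near-feasible, near-zero gap).
        if (np.linalg.norm(A @ x - b) <= eps
                and np.linalg.norm(A.T @ pi + s - c) <= eps
                and s @ x <= eps):
            return x, pi, s, k
        # Step 3: shrink the path parameter.
        mu = shrink * (x @ s) / n
        # Step 4: Newton system, assembled as one dense block system.
        K = np.zeros((2 * n + m, 2 * n + m))
        K[:m, :n] = A
        K[m:m + n, n:n + m] = A.T
        K[m:m + n, n + m:] = np.eye(n)
        K[m + n:, :n] = np.diag(s)
        K[m + n:, n + m:] = np.diag(x)
        rhs = np.concatenate([b - A @ x, c - A.T @ pi - s, mu * np.ones(n) - x * s])
        d = np.linalg.solve(K, rhs)
        dx, dpi, ds = d[:n], d[n:n + m], d[n + m:]
        # Step 5: ratio test keeps the iterates strictly positive.
        ap = min(1.0, r * min((-x[j] / dx[j] for j in range(n) if dx[j] < 0),
                              default=np.inf))
        ad = min(1.0, r * min((-s[j] / ds[j] for j in range(n) if ds[j] < 0),
                              default=np.inf))
        # Step 6: update, with separate primal and dual step-sizes.
        x, pi, s = x + ap * dx, pi + ad * dpi, s + ad * ds
    return x, pi, s, max_iter

# Illustrative run: min x1 + 2 x2  s.t.  x1 + x2 = 1, x >= 0 (optimum x = (1, 0)).
A = np.array([[1.0, 1.0]]); b = np.array([1.0]); c = np.array([1.0, 2.0])
x, pi, s, iters = primal_dual_ipm(A, b, c, np.ones(2), np.zeros(1), np.ones(2))
```

The starting point here is infeasible (Ax^0 ≠ b), which the infeasible Newton system of Section 5 handles; the iterate becomes primal feasible as soon as a full primal step α_P = 1 is taken, and the complementarity s^T x shrinks by roughly the factor 1/10 per iteration thereafter.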