Chapter 14 Linear Programming: Interior-Point Methods

In the 1980s it was discovered that many large linear programs could be solved efficiently by formulating them as nonlinear problems and solving them with various modifications of nonlinear algorithms such as Newton's method. One characteristic of these methods was that they required all iterates to satisfy the inequality constraints in the problem strictly, so they became known as interior-point methods. By the early 1990s, one class, primal-dual methods, had distinguished itself as the most efficient practical approach, and proved to be a strong competitor to the simplex method on large problems. These methods are the focus of this chapter.

Interior-point methods arose from the search for algorithms with better theoretical properties than the simplex method. As we mentioned in Chapter ??, the simplex method can be inefficient on certain pathological problems. Roughly speaking, the time required to solve a linear program may be exponential in the size of the problem, as measured by the number of unknowns and the amount of storage needed for the problem data. For almost all practical problems, the simplex method is much more efficient than this bound would suggest, but its poor worst-case complexity motivated the development of new algorithms with better guaranteed performance. The first such method was the ellipsoid method, proposed by Khachiyan [11], which finds a solution in time that is at worst polynomial in the problem size. Unfortunately, this method approaches its worst-case bound on all problems and is not competitive with the simplex method in practice.

Karmarkar's projective algorithm [10], announced in 1984, also has the polynomial complexity property, but it came with the added inducement of good practical behavior. The initial claims of excellent performance on large linear programs were never fully borne out, but the announcement prompted a great deal of research activity and a wide array of methods described by such labels as affine-scaling, logarithmic barrier, potential-reduction, path-following,

primal-dual, and infeasible interior-point. All are related to Karmarkar's original algorithm, and to the log-barrier approach described in Chapter ??, but many of the approaches can be motivated and analyzed independently of the earlier methods.

Interior-point methods share common features that distinguish them from the simplex method. Each interior-point iteration is expensive to compute and can make significant progress towards the solution, while the simplex method usually requires a larger number of inexpensive iterations. Geometrically speaking, the simplex method works its way around the boundary of the feasible polytope, testing a sequence of vertices in turn until it finds the optimal one. Interior-point methods approach the boundary of the feasible set only in the limit. They may approach the solution either from the interior or the exterior of the feasible region, but they never actually lie on the boundary of this region.

In this chapter, we outline some of the basic ideas behind primal-dual interior-point methods, including the relationship to Newton's method and homotopy methods and the concept of the central path. We sketch the important methods in this class, and give a comprehensive convergence analysis of a long-step path-following method. We describe in some detail a practical predictor-corrector algorithm proposed by Mehrotra, which is the basis of much of the current generation of software.

14.1 Primal-Dual Methods

Outline

We consider the linear programming problem in standard form; that is,

    min c^T x,  subject to  Ax = b,  x ≥ 0,                            (14.1)

where c and x are vectors in R^n, b is a vector in R^m, and A is an m × n matrix. The dual problem for (14.1) is

    max b^T λ,  subject to  A^T λ + s = c,  s ≥ 0,                     (14.2)

where λ is a vector in R^m and s is a vector in R^n. As shown in Chapter ??, solutions of (14.1), (14.2) are characterized by the Karush-Kuhn-Tucker conditions (??), which we restate here as follows:

    A^T λ + s = c,                                                     (14.3a)
    Ax = b,                                                            (14.3b)
    x_i s_i = 0,   i = 1, 2, ..., n,                                   (14.3c)
    (x, s) ≥ 0.                                                        (14.3d)

Primal-dual methods find solutions (x*, λ*, s*) of this system by applying variants of Newton's method to the three equalities in (14.3) and modifying the search directions and step lengths so that the inequalities (x, s) ≥ 0 are satisfied strictly at every iteration. The equations (14.3a), (14.3b), (14.3c) are only mildly nonlinear and so are not difficult to solve by themselves. However, the problem becomes much more difficult when we add the nonnegativity requirement (14.3d). This nonnegativity condition is the source of all the complications in the design and analysis of interior-point methods.

To derive primal-dual interior-point methods we restate the optimality conditions (14.3) in a slightly different form by means of a mapping F from R^{2n+m} to R^{2n+m}:

    F(x, λ, s) = [ A^T λ + s − c ]
                 [ Ax − b        ] = 0,                                (14.4a)
                 [ XSe           ]

    (x, s) ≥ 0,                                                        (14.4b)

where

    X = diag(x_1, x_2, ..., x_n),   S = diag(s_1, s_2, ..., s_n),      (14.5)

and e = (1, 1, ..., 1)^T. Primal-dual methods generate iterates (x^k, λ^k, s^k) that satisfy the bounds (14.4b) strictly, that is, x^k > 0 and s^k > 0. This property is the origin of the term interior-point. By respecting these bounds, the methods avoid spurious solutions, that is, points that satisfy F(x, λ, s) = 0 but not (x, s) ≥ 0. Spurious solutions abound, and do not provide useful information about solutions of (14.1) or (14.2), so it makes sense to exclude them altogether from the region of search.

Many interior-point methods actually require the iterates to be strictly feasible; that is, each (x^k, λ^k, s^k) must satisfy the linear equality constraints for the primal and dual problems. If we define the primal-dual feasible set F and strictly feasible set F° by

    F  = {(x, λ, s) | Ax = b, A^T λ + s = c, (x, s) ≥ 0},              (14.6a)
    F° = {(x, λ, s) | Ax = b, A^T λ + s = c, (x, s) > 0},              (14.6b)

the strict feasibility condition can be written concisely as (x^k, λ^k, s^k) ∈ F°.
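For concreteness, the residual map (14.4a) and the membership test for F° in (14.6b) translate directly into code. The following NumPy sketch is illustrative only; the function names and the feasibility tolerance are our own choices, not part of the chapter.

```python
import numpy as np

def kkt_residuals(A, b, c, x, lam, s):
    # The three blocks of F(x, lam, s) in (14.4a):
    #   dual residual A^T lam + s - c, primal residual Ax - b,
    #   and the pairwise products x_i s_i (the XSe block).
    return A.T @ lam + s - c, A @ x - b, x * s

def in_strictly_feasible_set(A, b, c, x, lam, s, tol=1e-8):
    # Membership in F° from (14.6b): equalities hold (to a tolerance)
    # and (x, s) > 0 componentwise.
    r_c, r_b, _ = kkt_residuals(A, b, c, x, lam, s)
    return (np.linalg.norm(r_b) <= tol and np.linalg.norm(r_c) <= tol
            and np.all(x > 0) and np.all(s > 0))
```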

Like most iterative algorithms in optimization, primal-dual interior-point methods have two basic ingredients: a procedure for determining the step and a measure of the desirability of each point in the search space. The procedure for determining the search direction has its origins in Newton's method for the nonlinear equations (14.4a). Newton's method forms a linear model for F around the current point and obtains the search direction (Δx, Δλ, Δs) by solving the following system of linear equations:

    J(x, λ, s) [ Δx ]
               [ Δλ ] = −F(x, λ, s),
               [ Δs ]

where J is the Jacobian of F. (See Chapter ?? for a detailed discussion of Newton's method for nonlinear systems.) If the current point is strictly feasible (that is, (x, λ, s) ∈ F°), the Newton step equations become

    [ 0   A^T   I ] [ Δx ]   [  0   ]
    [ A   0     0 ] [ Δλ ] = [  0   ].                                 (14.7)
    [ S   0     X ] [ Δs ]   [ −XSe ]

Usually, a full step along this direction would violate the bound (x, s) ≥ 0. To avoid this difficulty, we perform a line search along the Newton direction so that the new iterate is

    (x, λ, s) + α(Δx, Δλ, Δs),

for some line search parameter α ∈ (0, 1]. Unfortunately, we often can take only a small step along this direction (α ≪ 1) before violating the condition (x, s) > 0. Hence, the pure Newton direction (14.7), which is known as the affine scaling direction, often does not allow us to make much progress toward a solution. Most primal-dual methods modify the basic Newton procedure in two important ways:

1. They bias the search direction toward the interior of the nonnegative orthant (x, s) ≥ 0, so that we can move further along the direction before one of the components of (x, s) becomes negative.

2. They prevent the components of (x, s) from moving too close to the boundary of the nonnegative orthant.

To describe these modifications, we need to introduce the concept of the central path, and of neighborhoods of this path.

The Central Path

The central path C is an arc of strictly feasible points that plays a vital role in primal-dual algorithms. It is parametrized by a scalar τ > 0, and each point (x_τ, λ_τ, s_τ) ∈ C solves the following system:

    A^T λ + s = c,                                                     (14.8a)
    Ax = b,                                                            (14.8b)
    x_i s_i = τ,   i = 1, 2, ..., n,                                   (14.8c)
    (x, s) > 0.                                                        (14.8d)

These conditions differ from the KKT conditions only in the term τ on the right-hand side of (14.8c). Instead of the complementarity condition (14.3c), we require that the pairwise products x_i s_i have the same (positive) value for all indices i. From (14.8), we can define the central path as

    C = {(x_τ, λ_τ, s_τ) | τ > 0}.

It can be shown that (x_τ, λ_τ, s_τ) is defined uniquely for each τ > 0 if and only if F° is nonempty. Another way of defining C is to use the mapping F defined in (14.4) and write

    F(x_τ, λ_τ, s_τ) = [ 0  ]
                       [ 0  ],     (x_τ, s_τ) > 0.                     (14.9)
                       [ τe ]

The equations (14.8) approximate (14.3) more and more closely as τ goes to zero. If C converges to anything as τ → 0, it must converge to a primal-dual solution of the linear program. The central path thus guides us to a solution along a route that maintains positivity of the x and s components and decreases the pairwise products x_i s_i, i = 1, 2, ..., n, to zero at the same rate.

Primal-dual algorithms take Newton steps toward points on C for which τ > 0, rather than pure Newton steps for F. Since these steps are biased toward the interior of the nonnegative orthant defined by (x, s) ≥ 0, it usually is possible to take longer steps along them than along the pure Newton (affine scaling) steps, before violating the positivity condition. To describe the biased search direction, we introduce a centering parameter σ ∈ [0, 1] and a duality measure µ defined by

    µ = (1/n) Σ_{i=1}^n x_i s_i = x^T s / n,                           (14.10)

which measures the average value of the pairwise products x_i s_i. By fixing τ = σµ and applying Newton's method to the system (14.9), we obtain

    [ 0   A^T   I ] [ Δx ]   [      0       ]
    [ A   0     0 ] [ Δλ ] = [      0       ].                         (14.11)
    [ S   0     X ] [ Δs ]   [ −XSe + σµe   ]

The step (Δx, Δλ, Δs) is a Newton step toward the point (x_σµ, λ_σµ, s_σµ) ∈ C, at which the pairwise products x_i s_i are all equal to σµ. In contrast, the step (14.7) aims directly for the point at which the KKT conditions (14.3) are satisfied.

If σ = 1, the equations (14.11) define a centering direction, a Newton step toward the point (x_µ, λ_µ, s_µ) ∈ C, at which all the pairwise products x_i s_i are identical to the current average value of µ. Centering directions are usually biased strongly toward the interior of the nonnegative orthant and make little, if any, progress in reducing the duality measure µ. However, by moving closer to C, they set the scene for a substantial reduction in µ on the next iteration. At the other extreme, the value σ = 0 gives the standard Newton (affine scaling) step (14.7). Many algorithms use intermediate values of σ from the open interval (0, 1) to trade off between the twin goals of reducing µ and improving centrality.
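As an illustration of how the step equations (14.11) might be solved at a strictly feasible iterate, here is a minimal dense NumPy sketch. It assembles the full (2n+m)×(2n+m) block matrix explicitly, which is adequate for small examples but not how practical codes proceed (they reduce to the forms discussed later in Section 14.3); setting sigma = 0 recovers the affine-scaling direction (14.7).

```python
import numpy as np

def centered_newton_step(A, x, lam, s, sigma):
    # Solve (14.11): [0 A^T I; A 0 0; S 0 X] (dx, dlam, ds) = (0, 0, -XSe + sigma*mu*e).
    m, n = A.shape
    mu = x @ s / n                                   # duality measure (14.10)
    K = np.block([
        [np.zeros((n, n)), A.T,              np.eye(n)],
        [A,                np.zeros((m, m)), np.zeros((m, n))],
        [np.diag(s),       np.zeros((n, m)), np.diag(x)],
    ])
    rhs = np.concatenate([np.zeros(n), np.zeros(m),
                          -x * s + sigma * mu * np.ones(n)])
    step = np.linalg.solve(K, rhs)
    return step[:n], step[n:n + m], step[n + m:]     # (dx, dlam, ds)
```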

A Primal-Dual Framework

With these basic concepts in hand, we can define a general framework for primal-dual algorithms.

Framework 14.1 (Primal-Dual)
  Given (x^0, λ^0, s^0) ∈ F°;
  for k = 0, 1, 2, ...
    Solve
        [ 0     A^T   I   ] [ Δx^k ]   [           0            ]
        [ A     0     0   ] [ Δλ^k ] = [           0            ],     (14.12)
        [ S^k   0     X^k ] [ Δs^k ]   [ −X^k S^k e + σ_k µ_k e  ]
    where σ_k ∈ [0, 1] and µ_k = (x^k)^T s^k / n;
    Set
        (x^{k+1}, λ^{k+1}, s^{k+1}) = (x^k, λ^k, s^k) + α_k (Δx^k, Δλ^k, Δs^k),   (14.13)
    choosing α_k so that (x^{k+1}, s^{k+1}) > 0;
  end (for).

The choices of centering parameter σ_k and step length α_k are crucial to the performance of the method. Techniques for controlling these parameters, directly and indirectly, give rise to a wide variety of methods with different theoretical properties.

So far, we have assumed that the starting point (x^0, λ^0, s^0) is strictly feasible and, in particular, that it satisfies the linear equations Ax^0 = b, A^T λ^0 + s^0 = c. All subsequent iterates also respect these constraints, because of the zero right-hand side terms in (14.12). For most problems, however, a strictly feasible starting point is difficult to find. Infeasible-interior-point methods require only that the components of x^0 and s^0 be strictly positive. The search direction needs to be modified so that it improves feasibility as well as centrality at each iteration, but this requirement entails only a slight change to the step equation (14.11). If we define the residuals for the two linear equations as

    r_b = Ax − b,    r_c = A^T λ + s − c,                              (14.14)

the modified step equation is

    [ 0   A^T   I ] [ Δx ]   [    −r_c      ]
    [ A   0     0 ] [ Δλ ] = [    −r_b      ].                         (14.15)
    [ S   0     X ] [ Δs ]   [ −XSe + σµe   ]

The search direction is still a Newton step toward the point (x_σµ, λ_σµ, s_σµ) ∈ C. It tries to correct all the infeasibility in the equality constraints in a single step. If a full step is taken at any iteration (that is, α_k = 1 for some k), the residuals r_b and r_c become zero, and all subsequent iterates remain strictly feasible. We discuss infeasible-interior-point methods further in Section 14.3.
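The infeasible-interior-point modification changes only the right-hand side, as (14.15) shows. A sketch along the same lines as the previous one, with the residuals (14.14) evaluated at the current point (again an illustrative dense formulation, not production code):

```python
import numpy as np

def infeasible_newton_step(A, b, c, x, lam, s, sigma):
    # Solve (14.15): same coefficient matrix as (14.11), but with right-hand
    # side (-r_c, -r_b, -XSe + sigma*mu*e), where r_b, r_c are the residuals (14.14).
    m, n = A.shape
    mu = x @ s / n
    r_b = A @ x - b
    r_c = A.T @ lam + s - c
    K = np.block([
        [np.zeros((n, n)), A.T,              np.eye(n)],
        [A,                np.zeros((m, m)), np.zeros((m, n))],
        [np.diag(s),       np.zeros((n, m)), np.diag(x)],
    ])
    rhs = np.concatenate([-r_c, -r_b, -x * s + sigma * mu * np.ones(n)])
    step = np.linalg.solve(K, rhs)
    return step[:n], step[n:n + m], step[n + m:]
```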

Central Path Neighborhoods and Path-Following Methods

Path-following algorithms explicitly restrict the iterates to a neighborhood of the central path C and follow C to a solution of the linear program. By preventing the iterates from coming too close to the boundary of the nonnegative orthant, they ensure that search directions calculated from each iterate make at least a minimal amount of progress toward the solution.

A key ingredient of most optimization algorithms is a measure of the desirability of each point in the search space. In path-following algorithms, the duality measure µ defined by (14.10) fills this role. By forcing the duality measure µ_k to zero as k → ∞, we ensure that the iterates (x^k, λ^k, s^k) come closer and closer to satisfying the KKT conditions (14.3).

The two most interesting neighborhoods of C are the so-called 2-norm neighborhood N_2(θ) defined by

    N_2(θ) = {(x, λ, s) ∈ F° | ‖XSe − µe‖_2 ≤ θµ},                     (14.16)

for some θ ∈ [0, 1), and the one-sided ∞-norm neighborhood N_{−∞}(γ) defined by

    N_{−∞}(γ) = {(x, λ, s) ∈ F° | x_i s_i ≥ γµ for all i = 1, 2, ..., n},   (14.17)

for some γ ∈ (0, 1]. (Typical values of the parameters are θ = 0.5 and γ = 10^{−3}.) If a point lies in N_{−∞}(γ), each pairwise product x_i s_i must be at least some small multiple γ of their average value µ. This requirement is actually quite modest, and we can make N_{−∞}(γ) encompass most of the feasible region F by choosing γ close to zero. The N_2(θ) neighborhood is more restrictive, since certain points in F° do not belong to N_2(θ) no matter how close θ is chosen to its upper bound of 1.

By keeping all iterates inside one or the other of these neighborhoods, path-following methods reduce all the pairwise products x_i s_i to zero at more or less the same rate. Figure 14.1 shows the projection of the central path C onto the primal variables for a typical problem, along with a typical neighborhood N.

Figure 14.1: Central path, projected into space of primal variables x, showing a typical neighborhood N.
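The two neighborhood conditions are cheap to test once an iterate is known to lie in F°. A small illustrative sketch (the feasibility check from the earlier sketches is assumed to be done separately):

```python
import numpy as np

def in_N2(x, s, theta):
    # 2-norm neighborhood condition of (14.16): ||XSe - mu*e||_2 <= theta*mu.
    mu = x @ s / len(x)
    return np.linalg.norm(x * s - mu) <= theta * mu

def in_N_minus_infinity(x, s, gamma):
    # One-sided infinity-norm condition of (14.17): x_i s_i >= gamma*mu for all i.
    mu = x @ s / len(x)
    return np.all(x * s >= gamma * mu)
```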

Path-following methods are akin to homotopy methods for general nonlinear equations, which also define a path to be followed to the solution. Traditional homotopy methods stay in a tight tubular neighborhood of their path, making incremental changes to the parameter and chasing the homotopy path all the way to a solution. For primal-dual methods, this neighborhood is conical rather than tubular, and it tends to be broad and loose for larger values of the duality measure µ. It narrows as µ → 0, however, because of the positivity requirement (x, s) > 0.

The algorithm we specify below, a special case of Framework 14.1, is known as a long-step path-following algorithm. This algorithm can make rapid progress because of its use of the wide neighborhood N_{−∞}(γ), for γ close to zero. It depends on two parameters σ_min and σ_max, which are lower and upper bounds on the centering parameter σ_k. The search direction is, as usual, obtained by solving (14.12), and we choose the step length α_k to be as large as possible, subject to the requirement that we stay inside N_{−∞}(γ). Here and in later analysis, we use the notation

    (x^k(α), λ^k(α), s^k(α)) := (x^k, λ^k, s^k) + α(Δx^k, Δλ^k, Δs^k),     (14.18a)
    µ_k(α) := x^k(α)^T s^k(α) / n.                                         (14.18b)

Algorithm 14.2 (Long-Step Path-Following)
  Given γ, σ_min, σ_max with γ ∈ (0, 1), 0 < σ_min < σ_max < 1, and (x^0, λ^0, s^0) ∈ N_{−∞}(γ);
  for k = 0, 1, 2, ...
    Choose σ_k ∈ [σ_min, σ_max];
    Solve (14.12) to obtain (Δx^k, Δλ^k, Δs^k);
    Choose α_k as the largest value of α in [0, 1] such that
        (x^k(α), λ^k(α), s^k(α)) ∈ N_{−∞}(γ);                              (14.19)
    Set (x^{k+1}, λ^{k+1}, s^{k+1}) = (x^k(α_k), λ^k(α_k), s^k(α_k));
  end (for).

Typical behavior of the algorithm is illustrated in Figure 14.2 for the case of n = 2. The horizontal and vertical axes in this figure represent the pairwise products x_1 s_1 and x_2 s_2, so the central path C is the line emanating from the origin at an angle of 45°. (A point at the origin of this illustration is a primal-dual solution if it also satisfies the feasibility conditions (14.3a), (14.3b), and (14.3d).) In the unusual geometry of Figure 14.2, the search directions (Δx^k, Δλ^k, Δs^k) transform to curves rather than straight lines.

As Figure 14.2 shows (and the analysis confirms), the lower bound σ_min on the centering parameter ensures that each search direction starts out by moving away from the boundary of N_{−∞}(γ) and into the relative interior of this neighborhood. That is, small steps along the search direction improve the centrality. Larger values of α take us outside the neighborhood again, since the error in approximating the nonlinear system (14.9) by the linear step equations (14.11) becomes more pronounced as α increases. Still, we are guaranteed that a certain minimum step can be taken before we reach the boundary of N_{−∞}(γ), as we show in the analysis below.
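A compact sketch of Algorithm 14.2 follows. It reuses the dense solve of (14.12) from the earlier sketches and approximates "the largest α in [0, 1] satisfying (14.19)" by a simple backtracking loop; the starting point is assumed to lie in N_{−∞}(γ), and all parameter defaults are our own illustrative choices rather than values prescribed by the text.

```python
import numpy as np

def long_step_path_following(A, b, c, x, lam, s, gamma=1e-3,
                             sigma_min=0.1, sigma_max=0.5,
                             tol=1e-8, max_iter=200):
    m, n = A.shape
    for _ in range(max_iter):
        mu = x @ s / n
        if mu <= tol:
            break
        sigma = 0.5 * (sigma_min + sigma_max)        # any choice in [sigma_min, sigma_max]
        # Step equations (14.12), assembled densely for illustration.
        K = np.block([
            [np.zeros((n, n)), A.T,              np.eye(n)],
            [A,                np.zeros((m, m)), np.zeros((m, n))],
            [np.diag(s),       np.zeros((n, m)), np.diag(x)],
        ])
        rhs = np.concatenate([np.zeros(n), np.zeros(m),
                              -x * s + sigma * mu * np.ones(n)])
        step = np.linalg.solve(K, rhs)
        dx, dlam, ds = step[:n], step[n:n + m], step[n + m:]
        # Backtrack from alpha = 1 until (14.19) holds (an approximation of
        # the "largest alpha" rule in Algorithm 14.2).
        alpha = 1.0
        while alpha > 1e-12:
            xa, sa = x + alpha * dx, s + alpha * ds
            mua = xa @ sa / n
            if np.all(xa > 0) and np.all(sa > 0) and np.all(xa * sa >= gamma * mua):
                break
            alpha *= 0.9
        x, lam, s = x + alpha * dx, lam + alpha * dlam, s + alpha * ds
    return x, lam, s
```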

Figure 14.2: Iterates of Algorithm 14.2, plotted in (xs) space, showing the central path C, the boundary of the neighborhood N, and iterates 0, 1, 2, 3 (axes x_1 s_1 and x_2 s_2).

We present a complete analysis of Algorithm 14.2, which makes use of surprisingly simple mathematical foundations, in Section 14.2 below. With judicious choices of σ_k, this algorithm is fairly efficient in practice. With a few more changes it becomes the basis of a truly competitive method, as we discuss in Section 14.3.

An infeasible-interior-point variant of Algorithm 14.2 can be constructed by generalizing the definition of N_{−∞}(γ) to allow violation of the feasibility conditions. In this extended neighborhood, the residual norms ‖r_b‖ and ‖r_c‖ are bounded by a constant multiple of the duality measure µ. By squeezing µ to zero, we also force r_b and r_c to zero, so that the iterates approach complementarity and feasibility at the same time.

14.2 Analysis of Algorithm 14.2

We now present a comprehensive analysis of Algorithm 14.2. Our aim is to show that, given some small tolerance ε > 0 and some mild assumptions about the starting point (x^0, λ^0, s^0), the algorithm requires O(n log(1/ε)) iterations to identify a point (x^k, λ^k, s^k) for which µ_k ≤ ε, where µ_k = (x^k)^T s^k / n. For small ε, the point (x^k, λ^k, s^k) satisfies the primal-dual optimality conditions except for perturbations of about ε in the right-hand side of (14.3c), so it is usually very close to a primal-dual solution of the original linear program. The O(n log(1/ε)) estimate is a worst-case bound on the number of iterations required; on practical problems, the number of iterations required appears to increase only slightly as n increases.

The simplex method may require 2^n iterations to solve a problem with n variables, though in practice it usually requires a modest multiple of max(m, n) iterations, where m is the row dimension of the constraint matrix A in (14.1).

As is typical for interior-point methods, the analysis builds from a purely technical lemma to a powerful theorem in just a few pages. We start with the technical lemma (Lemma 14.1) and use it to prove Lemma 14.2, a bound on the vector of pairwise products Δx_i Δs_i, i = 1, 2, ..., n. Theorem 14.3 finds a lower bound on the step length α_k and a corresponding estimate of the reduction in µ on iteration k. Finally, Theorem 14.4 proves that O(n log(1/ε)) iterations are required to identify a point for which µ_k < ε, for a given tolerance ε ∈ (0, 1).

Lemma 14.1 Let u and v be any two vectors in R^n with u^T v ≥ 0. Then

    ‖UVe‖ ≤ 2^{−3/2} ‖u + v‖^2,

where

    U = diag(u_1, u_2, ..., u_n),   V = diag(v_1, v_2, ..., v_n).

Proof. First, note that for any two scalars α and β with αβ ≥ 0, we have from the algebraic-geometric mean inequality that

    |αβ|^{1/2} ≤ (1/2)|α + β|.                                         (14.20)

Since u^T v ≥ 0, we have

    0 ≤ u^T v = Σ_{u_i v_i ≥ 0} u_i v_i + Σ_{u_i v_i < 0} u_i v_i
              = ‖[u_i v_i]_{i∈P}‖_1 − ‖[u_i v_i]_{i∈M}‖_1,             (14.21)

where we partitioned the index set {1, 2, ..., n} as

    P = {i | u_i v_i ≥ 0},   M = {i | u_i v_i < 0}.

Now,

    ‖UVe‖ = ( ‖[u_i v_i]_{i∈P}‖^2 + ‖[u_i v_i]_{i∈M}‖^2 )^{1/2}
          ≤ ( ‖[u_i v_i]_{i∈P}‖_1^2 + ‖[u_i v_i]_{i∈M}‖_1^2 )^{1/2}        since ‖·‖_2 ≤ ‖·‖_1
          ≤ ( 2 ‖[u_i v_i]_{i∈P}‖_1^2 )^{1/2}                              from (14.21)
          = 2^{1/2} Σ_{i∈P} |u_i v_i|
          ≤ 2^{1/2} Σ_{i∈P} (1/4)(u_i + v_i)^2                             from (14.20)
          = 2^{−3/2} Σ_{i∈P} (u_i + v_i)^2
          ≤ 2^{−3/2} Σ_{i=1}^n (u_i + v_i)^2 = 2^{−3/2} ‖u + v‖^2,

completing the proof.

Lemma 14.2 If (x, λ, s) ∈ N_{−∞}(γ), then

    ‖ΔX ΔS e‖ ≤ 2^{−3/2} (1 + 1/γ) nµ.

Proof. It is easy to show using (14.12) that

    Δx^T Δs = 0.                                                       (14.22)

By multiplying the last block row in (14.12) by (XS)^{−1/2} and using the definition D = X^{1/2} S^{−1/2}, we obtain

    D^{−1} Δx + D Δs = (XS)^{−1/2} (−XSe + σµe).                       (14.23)

Because (D^{−1} Δx)^T (D Δs) = Δx^T Δs = 0, we can apply Lemma 14.1 with u = D^{−1} Δx and v = D Δs to obtain

    ‖ΔX ΔS e‖ = ‖(D^{−1} ΔX)(D ΔS) e‖
              ≤ 2^{−3/2} ‖D^{−1} Δx + D Δs‖^2                 from Lemma 14.1
              = 2^{−3/2} ‖(XS)^{−1/2} (−XSe + σµe)‖^2         from (14.23).

Expanding the squared Euclidean norm and using such relationships as x^T s = nµ and e^T e = n, we obtain

    ‖ΔX ΔS e‖ ≤ 2^{−3/2} [ x^T s − 2σµ e^T e + σ^2 µ^2 Σ_{i=1}^n 1/(x_i s_i) ]
              ≤ 2^{−3/2} [ x^T s − 2σµ e^T e + σ^2 µ^2 n/(γµ) ]        since x_i s_i ≥ γµ
              = 2^{−3/2} [ 1 − 2σ + σ^2/γ ] nµ
              ≤ 2^{−3/2} (1 + 1/γ) nµ,

as claimed.

Theorem 14.3 Given the parameters γ, σ_min, and σ_max in Algorithm 14.2, there is a constant δ independent of n such that

    µ_{k+1} ≤ (1 − δ/n) µ_k,                                           (14.24)

for all k ≥ 0.

Proof. We start by proving that

    (x^k(α), λ^k(α), s^k(α)) ∈ N_{−∞}(γ)   for all α ∈ [0, 2^{3/2} σ_k γ (1−γ)/((1+γ) n)],   (14.25)

where (x^k(α), λ^k(α), s^k(α)) is defined as in (14.18). It follows that the step length α_k is at least as long as the upper bound of this interval, that is,

    α_k ≥ (2^{3/2}/n) σ_k γ (1−γ)/(1+γ).                               (14.26)

For any i = 1, 2, ..., n, we have from Lemma 14.2 that

    |Δx_i^k Δs_i^k| ≤ ‖ΔX^k ΔS^k e‖_2 ≤ 2^{−3/2} (1 + 1/γ) nµ_k.       (14.27)

Using (14.12), we have from x_i^k s_i^k ≥ γµ_k and (14.27) that

    x_i^k(α) s_i^k(α) = (x_i^k + α Δx_i^k)(s_i^k + α Δs_i^k)
                      = x_i^k s_i^k + α (x_i^k Δs_i^k + s_i^k Δx_i^k) + α^2 Δx_i^k Δs_i^k
                      ≥ x_i^k s_i^k (1 − α) + α σ_k µ_k − α^2 |Δx_i^k Δs_i^k|
                      ≥ γ(1 − α) µ_k + α σ_k µ_k − α^2 2^{−3/2} (1 + 1/γ) nµ_k.

By summing the n components of the equation S^k Δx^k + X^k Δs^k = −X^k S^k e + σ_k µ_k e (the third block row from (14.12)), and using (14.22) and the definition of µ_k and µ_k(α) (see (14.18)), we obtain

    µ_k(α) = (1 − α(1 − σ_k)) µ_k.

From these last two formulas, we can see that the proximity condition x_i^k(α) s_i^k(α) ≥ γ µ_k(α) is satisfied, provided that

    γ(1 − α) µ_k + α σ_k µ_k − α^2 2^{−3/2} (1 + 1/γ) nµ_k ≥ γ(1 − α + α σ_k) µ_k.

Rearranging this expression, we obtain

    α σ_k µ_k (1 − γ) ≥ α^2 2^{−3/2} nµ_k (1 + 1/γ),

which is true if

    α ≤ (2^{3/2}/n) σ_k γ (1 − γ)/(1 + γ).

We have proved that (x^k(α), λ^k(α), s^k(α)) satisfies the proximity condition for N_{−∞}(γ) when α lies in the range stated in (14.25). It is not difficult to show that (x^k(α), λ^k(α), s^k(α)) ∈ F° for all α in the given range (see the exercises). Hence, we have proved (14.25) and therefore (14.26).

We complete the proof of the theorem by estimating the reduction in µ on the kth step. Because of (14.22), (14.26), and the last block row of (14.11), we have

    µ_{k+1} = x^k(α_k)^T s^k(α_k) / n
            = [ (x^k)^T s^k + α_k ((x^k)^T Δs^k + (s^k)^T Δx^k) + α_k^2 (Δx^k)^T Δs^k ] / n     (14.28)
            = µ_k + α_k ( −(x^k)^T s^k / n + σ_k µ_k )                                          (14.29)
            = (1 − α_k (1 − σ_k)) µ_k                                                           (14.30)
            ≤ ( 1 − (2^{3/2}/n) γ (1−γ)/(1+γ) σ_k (1 − σ_k) ) µ_k.                              (14.31)

Now, the function σ(1 − σ) is a concave quadratic function of σ, so on any given interval it attains its minimum value at one of the endpoints. Hence, we have

    σ_k (1 − σ_k) ≥ min { σ_min (1 − σ_min), σ_max (1 − σ_max) },   for all σ_k ∈ [σ_min, σ_max].

The proof is completed by substituting this estimate into (14.31) and setting

    δ = 2^{3/2} γ (1−γ)/(1+γ) min { σ_min (1 − σ_min), σ_max (1 − σ_max) }.

We conclude with the complexity result.

Theorem 14.4 Given ε > 0 and γ ∈ (0, 1), suppose the starting point (x^0, λ^0, s^0) ∈ N_{−∞}(γ) in Algorithm 14.2 has

    µ_0 ≤ 1/ε^κ                                                        (14.32)

for some positive constant κ. Then there is an index K with K = O(n log(1/ε)) such that

    µ_k ≤ ε,   for all k ≥ K.

Proof. By taking logarithms of both sides in (14.24), we obtain

    log µ_{k+1} ≤ log(1 − δ/n) + log µ_k.

By repeatedly applying this formula and using (14.32), we have

    log µ_k ≤ k log(1 − δ/n) + log µ_0 ≤ k log(1 − δ/n) + κ log(1/ε).

The following well-known estimate for the log function,

    log(1 + β) ≤ β,   for all β > −1,

implies that

    log µ_k ≤ k(−δ/n) + κ log(1/ε).

Therefore, the convergence criterion µ_k ≤ ε is satisfied if we have

    k(−δ/n) + κ log(1/ε) ≤ log ε.

This inequality holds for all k that satisfy

    k ≥ K = (1 + κ)(n/δ) log(1/ε),

so the proof is complete.

14.3 Practical Primal-Dual Algorithms

Practical implementations of interior-point algorithms follow the spirit of the theory in the previous section, in that strict positivity of x^k and s^k is maintained throughout and each step is a Newton-like step involving a centering component. However, several aspects of theoretical algorithms are typically ignored, while several enhancements are added that have a significant effect on practical performance. We discuss in this section the algorithmic enhancements that are present in most interior-point software.

Practical implementations typically do not enforce membership of the central path neighborhoods N_2 and N_{−∞} defined in the previous section. Rather, they calculate the maximum steplengths that can be taken in the x and s variables (separately) without violating nonnegativity, then take a steplength of slightly less than this maximum. Given an iterate (x^k, λ^k, s^k) with (x^k, s^k) > 0, and a step (Δx^k, Δλ^k, Δs^k), it is easy to show that the quantities α^pri_{k,max} and α^dual_{k,max} defined as follows:

    α^pri_{k,max}  := min_{i: Δx_i^k < 0} ( −x_i^k / Δx_i^k ),         (14.33a)
    α^dual_{k,max} := min_{i: Δs_i^k < 0} ( −s_i^k / Δs_i^k ),         (14.33b)

are the largest values of α for which x^k + α Δx^k ≥ 0 and s^k + α Δs^k ≥ 0, respectively. Practical algorithms then choose the steplengths to lie in the open intervals defined by these maxima, that is,

    α^pri_k ∈ (0, α^pri_{k,max}),   α^dual_k ∈ (0, α^dual_{k,max}),

and then obtain a new iterate by setting

    x^{k+1} = x^k + α^pri_k Δx^k,   (λ^{k+1}, s^{k+1}) = (λ^k, s^k) + α^dual_k (Δλ^k, Δs^k).

If the step (Δx^k, Δλ^k, Δs^k) rectifies the infeasibility in the KKT conditions (14.3a) and (14.3b), that is,

    A Δx^k = −r_b^k = −(Ax^k − b),   A^T Δλ^k + Δs^k = −r_c^k = −(A^T λ^k + s^k − c),

it is easy to show that the infeasibilities at the new iterate satisfy

    r_b^{k+1} = (1 − α^pri_k) r_b^k,   r_c^{k+1} = (1 − α^dual_k) r_c^k,     (14.34)

where r_b^{k+1} and r_c^{k+1} are defined in the obvious way.
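The maximum steplengths (14.33) amount to a one-line ratio test. A short sketch (the convention of returning infinity when no component of the step is negative is our own choice):

```python
import numpy as np

def max_steplength(v, dv):
    # Largest alpha >= 0 with v + alpha*dv >= 0, as in (14.33); returns inf
    # when no component of dv is negative (no blocking component).
    neg = dv < 0
    return np.inf if not np.any(neg) else float(np.min(-v[neg] / dv[neg]))

# alpha_pri_max  = max_steplength(x, dx)   # (14.33a)
# alpha_dual_max = max_steplength(s, ds)   # (14.33b)
```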

A second feature of practical algorithms is their use of corrector steps, a development made practical by Mehrotra [14]. These steps compensate for the linearization error made by the Newton (affine-scaling) step in modeling the equation x_i s_i = 0, i = 1, 2, ..., n (see (14.3c)). Consider the affine-scaling direction (Δx^aff, Δλ^aff, Δs^aff) defined by

    [ 0   A^T   I ] [ Δx^aff ]   [ −r_c ]
    [ A   0     0 ] [ Δλ^aff ] = [ −r_b ],                             (14.35)
    [ S   0     X ] [ Δs^aff ]   [ −XSe ]

(where r_b and r_c are defined in (14.14)). If we take a full step in this direction, we obtain

    (x_i + Δx_i^aff)(s_i + Δs_i^aff) = x_i s_i + x_i Δs_i^aff + s_i Δx_i^aff + Δx_i^aff Δs_i^aff = Δx_i^aff Δs_i^aff.

That is, the updated value of x_i s_i is Δx_i^aff Δs_i^aff rather than the ideal value 0. We can solve the following system to obtain a step (Δx^cor, Δλ^cor, Δs^cor) that corrects for this deviation from the ideal:

    [ 0   A^T   I ] [ Δx^cor ]   [         0          ]
    [ A   0     0 ] [ Δλ^cor ] = [         0          ].               (14.36)
    [ S   0     X ] [ Δs^cor ]   [ −ΔX^aff ΔS^aff e   ]

In many cases, the combined step (Δx^aff, Δλ^aff, Δs^aff) + (Δx^cor, Δλ^cor, Δs^cor) does a better job of reducing the duality measure than does the affine-scaling step alone.

Like theoretical algorithms such as the one analysed in Section ??, practical algorithms make use of centering steps, with an adaptive choice of the centering parameter σ_k. Mehrotra [14] introduced a highly successful scheme for choosing σ_k based on the effectiveness of the affine-scaling step at iteration k. Roughly, if the affine-scaling step (multiplied by a steplength to maintain nonnegativity of x and s) reduces the duality measure significantly, there is not much need for centering, so a smaller value of σ_k is appropriate. Conversely, if not much progress can be made along this direction before reaching the boundary of the nonnegative orthant, a larger value of σ_k will ensure that the next iterate is more centered, so a longer step will be possible from this next point. Specifically, Mehrotra's scheme calculates the maximum allowable steplengths along the affine-scaling direction (14.35) as follows:

    α^pri_aff  := min( 1, min_{i: Δx_i^aff < 0} ( −x_i / Δx_i^aff ) ),      (14.37a)
    α^dual_aff := min( 1, min_{i: Δs_i^aff < 0} ( −s_i / Δs_i^aff ) ),      (14.37b)

and then defines µ_aff to be the value of µ that would be obtained by using these steplengths, that is,

    µ_aff = (x + α^pri_aff Δx^aff)^T (s + α^dual_aff Δs^aff) / n.           (14.38)

The centering parameter σ is then chosen as follows:

    σ = ( µ_aff / µ )^3.                                               (14.39)

To summarize, computation of the search direction in Mehrotra's method requires the solution of two linear systems. First, the system (14.35) is solved to obtain the affine-scaling direction, also known as the predictor step. This step is used to define the right-hand side for the corrector step (see (14.36)) and to calculate the centering parameter from (14.38), (14.39). Second, the search direction is calculated by solving

    [ 0   A^T   I ] [ Δx ]   [            −r_c                ]
    [ A   0     0 ] [ Δλ ] = [            −r_b                ].       (14.40)
    [ S   0     X ] [ Δs ]   [ −XSe − ΔX^aff ΔS^aff e + σµe   ]

Note that the predictor, corrector, and centering contributions have been aggregated on the right-hand side of this system. The coefficient matrix in both linear systems (14.35) and (14.40) is the same. Thus, the factorization of the matrix needs to be computed only once, and the marginal cost of solving the second system is relatively small.

Once the search direction has been calculated, the maximum steplengths along this direction are computed from (14.33), and the steplengths actually used are chosen to be

    α^pri_k = min(1, η_k α^pri_max),   α^dual_k = min(1, η_k α^dual_max),   (14.41)

where η_k ∈ [0.9, 1.0) is chosen so that η_k → 1 as the iterates approach the primal-dual solution, to accelerate the asymptotic convergence. (For details on how η is chosen and on other elements of the algorithm such as the choice of starting point (x^0, λ^0, s^0), see Mehrotra [14].)

We now specify the algorithm in the usual format.

Algorithm 14.3 (Predictor-Corrector Algorithm (Mehrotra [14]))
  Given (x^0, λ^0, s^0) with (x^0, s^0) > 0;
  for k = 0, 1, 2, ...
    Set (x, λ, s) = (x^k, λ^k, s^k) and solve (14.35) for (Δx^aff, Δλ^aff, Δs^aff);
    Calculate α^pri_aff, α^dual_aff, and µ_aff as in (14.37) and (14.38);
    Set centering parameter to σ = (µ_aff / µ)^3;
    Solve (14.40) for (Δx, Δλ, Δs);
    Calculate α^pri_k and α^dual_k from (14.41);
    Set
        x^{k+1} = x^k + α^pri_k Δx,
        (λ^{k+1}, s^{k+1}) = (λ^k, s^k) + α^dual_k (Δλ, Δs);
  end (for).
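Putting the pieces together, the following NumPy sketch mirrors Algorithm 14.3. It solves the two systems (14.35) and (14.40) with a dense solve for readability, holds η fixed at 0.99 rather than driving it to 1, and uses a simplistic stopping test; production codes instead factor A D^2 A^T once per iteration, as described under "Solving the Linear Systems" below.

```python
import numpy as np

def mehrotra_predictor_corrector(A, b, c, x, lam, s, eta=0.99, tol=1e-8, max_iter=100):
    m, n = A.shape

    def solve_block(x, s, rhs):
        # Coefficient matrix shared by (14.35) and (14.40).
        K = np.block([[np.zeros((n, n)), A.T, np.eye(n)],
                      [A, np.zeros((m, m)), np.zeros((m, n))],
                      [np.diag(s), np.zeros((n, m)), np.diag(x)]])
        step = np.linalg.solve(K, rhs)
        return step[:n], step[n:n + m], step[n + m:]

    def max_step(v, dv):
        neg = dv < 0
        return np.inf if not np.any(neg) else np.min(-v[neg] / dv[neg])

    for _ in range(max_iter):
        r_b, r_c = A @ x - b, A.T @ lam + s - c
        mu = x @ s / n
        if mu <= tol and np.linalg.norm(r_b) <= tol and np.linalg.norm(r_c) <= tol:
            break
        # Predictor (affine-scaling) step, system (14.35).
        dx_aff, dlam_aff, ds_aff = solve_block(
            x, s, np.concatenate([-r_c, -r_b, -x * s]))
        a_pri = min(1.0, max_step(x, dx_aff))            # (14.37a)
        a_dual = min(1.0, max_step(s, ds_aff))           # (14.37b)
        mu_aff = (x + a_pri * dx_aff) @ (s + a_dual * ds_aff) / n     # (14.38)
        sigma = (mu_aff / mu) ** 3                                    # (14.39)
        # Combined predictor-corrector-centering step, system (14.40).
        rhs3 = -x * s - dx_aff * ds_aff + sigma * mu * np.ones(n)
        dx, dlam, ds = solve_block(x, s, np.concatenate([-r_c, -r_b, rhs3]))
        a_pri = min(1.0, eta * max_step(x, dx))          # (14.41)
        a_dual = min(1.0, eta * max_step(s, ds))
        x, lam, s = x + a_pri * dx, lam + a_dual * dlam, s + a_dual * ds
    return x, lam, s
```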

No convergence theory is available for Mehrotra's algorithm, at least in the form in which it is described above. In fact, there are examples for which the algorithm diverges. Simple safeguards could be incorporated into the method to force it into the convergence framework of existing methods. However, most programs do not implement these safeguards, because the good practical performance of Mehrotra's algorithm makes them unnecessary.

When presented with a linear program that is infeasible or unbounded, the algorithm above typically diverges, with the infeasibilities r_b^k and r_c^k and/or the duality measure µ_k going to ∞. Since the symptoms of infeasibility and unboundedness are fairly easy to recognize, interior-point codes contain heuristics to detect and report these conditions. More rigorous approaches for detecting infeasibility and unboundedness make use of the homogeneous self-dual formulation; see Wright [24, Chapter 9] and the references therein for a discussion. A more recent approach that applies directly to infeasible-interior-point methods is described by Todd [20].

Solving the Linear Systems

Most of the computational effort in primal-dual methods is taken up in solving linear systems such as (14.15), (14.35), and (14.40). The coefficient matrix in these systems is usually large and sparse, since the constraint matrix A is itself large and sparse in most applications. The special structure in the step equations allows us to reformulate them as systems with more compact symmetric coefficient matrices, which are easier and cheaper to factor than the original sparse form.

We apply the reformulation procedures to the following general form of the linear system:

    [ 0   A^T   I ] [ Δx ]   [ −r_c  ]
    [ A   0     0 ] [ Δλ ] = [ −r_b  ].                                (14.42)
    [ S   0     X ] [ Δs ]   [ −r_xs ]

Since x and s are strictly positive, the diagonal matrices X and S are nonsingular. Hence, by eliminating Δs from (14.42), we obtain the following equivalent system:

    [ 0     A       ] [ Δλ ]   [ −r_b               ]
    [ A^T   −D^{−2} ] [ Δx ] = [ −r_c + X^{−1} r_xs ],                 (14.43a)

    Δs = −X^{−1} r_xs − X^{−1} S Δx,                                   (14.43b)

where we have introduced the notation

    D = S^{−1/2} X^{1/2}.                                              (14.44)

This form of the step equations usually is known as the augmented system. Since the matrix X^{−1} S is also diagonal and nonsingular, we can go a step further,

eliminating Δx from (14.43a) to obtain another equivalent form:

    A D^2 A^T Δλ = −r_b − A X S^{−1} r_c + A S^{−1} r_xs,              (14.45a)
    Δs = −r_c − A^T Δλ,                                                (14.45b)
    Δx = −S^{−1} r_xs − X S^{−1} Δs.                                   (14.45c)

This form often is called the normal-equations form because the system (14.45a) can be viewed as the normal equations (??) for a linear least-squares problem with coefficient matrix D A^T.

Most implementations of primal-dual methods are based on formulations like (14.45). They use direct sparse Cholesky algorithms to factor the matrix A D^2 A^T, and then perform triangular solves with the resulting sparse factors to obtain the step Δλ from (14.45a). The steps Δs and Δx are recovered from (14.45b) and (14.45c). General-purpose sparse Cholesky software can be applied to A D^2 A^T, but a few modifications are needed because A D^2 A^T may be ill-conditioned or singular. (Ill conditioning of this system is often observed during the final stages of a primal-dual algorithm, when the elements of the diagonal weighting matrix D^2 take on both huge and tiny values.)

A disadvantage of this formulation is that if A contains any dense columns, the matrix A D^2 A^T is completely dense. Hence, practical software identifies dense and nearly-dense columns, excludes them from the matrix product A D^2 A^T, and performs the Cholesky factorization of the resulting sparse matrix. Then, a device such as a Sherman-Morrison-Woodbury update is applied to account for the excluded columns. We refer the reader to Wright [24, Chapter 11] for further details.

The formulation (14.43) has received less attention than (14.45), mainly because algorithms and software for factoring sparse symmetric indefinite matrices are more complicated, slower, and less prevalent than sparse Cholesky algorithms. Nevertheless, the formulation (14.43) is cleaner and more flexible than (14.45) in a number of respects. We avoid the fill-in caused by dense columns in A in the matrix product A D^2 A^T, and free variables (that is, components of x with no explicit lower or upper bounds) can be handled without resorting to the various artificial devices needed in the normal-equations form.
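The reduction to the normal-equations form (14.45) is also easy to express in code. The sketch below uses a dense Cholesky factorization from SciPy as a stand-in for the sparse Cholesky solvers that practical codes employ, and makes no attempt to handle dense columns or near-singularity of A D^2 A^T.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def normal_equations_step(A, x, s, r_b, r_c, r_xs):
    # Solve the general step equations (14.42) via the normal-equations form (14.45).
    d2 = x / s                                   # diagonal of D^2 = X S^{-1}
    M = (A * d2) @ A.T                           # A D^2 A^T
    rhs = -r_b - (A * d2) @ r_c + (A / s) @ r_xs # right-hand side of (14.45a)
    dlam = cho_solve(cho_factor(M), rhs)
    ds = -r_c - A.T @ dlam                       # (14.45b)
    dx = -r_xs / s - d2 * ds                     # (14.45c), using X S^{-1} = diag(d2)
    return dx, dlam, ds
```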

14.4 Other Primal-Dual Algorithms and Extensions

Other Path-Following Methods

Framework 14.1 is the basis of a number of other algorithms of the path-following variety. They are less important from a practical viewpoint, but we mention them here because of their elegance and their strong theoretical properties.

Some path-following methods choose conservative values for the centering parameter σ (that is, σ only slightly less than 1) so that unit steps (that is, a steplength of α = 1) can be taken along the resulting direction from (14.11) without leaving the chosen neighborhood. These methods, which are known as short-step path-following methods, make only slow progress toward the solution because they require the iterates to stay inside a restrictive N_2 neighborhood (14.16).

Better results are obtained with the predictor-corrector method, due to Mizuno, Todd, and Ye [15], which uses two N_2 neighborhoods, nested one inside the other. (Despite the similar terminology, this algorithm is quite distinct from Algorithm 14.3 of Section 14.3.) Every second step of this method is a predictor step, which starts in the inner neighborhood and moves along the affine-scaling direction (computed by setting σ = 0 in (14.11)) to the boundary of the outer neighborhood. The gap between neighborhood boundaries is wide enough to allow this step to make significant progress in reducing µ. Alternating with the predictor steps are corrector steps (computed with σ = 1 and α = 1), which take the next iterate back inside the inner neighborhood in preparation for the next predictor step. The predictor-corrector algorithm produces a sequence of duality measures µ_k that converge superlinearly to zero, in contrast to the linear convergence that characterizes most methods.

Potential-Reduction Methods

Potential-reduction methods take steps of the same form as path-following methods, but they do not explicitly follow the central path C and can be motivated independently of it. They use a logarithmic potential function to measure the worth of each point in F° and aim to achieve a certain fixed reduction in this function at each iteration. The primal-dual potential function, which we denote generically by Φ, usually has two important properties:

    Φ → ∞   if x_i s_i → 0 for some i, while µ = x^T s / n does not go to 0,    (14.46a)
    Φ → −∞  if and only if (x, λ, s) → Ω.                                       (14.46b)

The first property (14.46a) prevents any one of the pairwise products x_i s_i from approaching zero independently of the others, and therefore keeps the iterates away from the boundary of the nonnegative orthant. The second property (14.46b) relates Φ to the solution set Ω. If our algorithm forces Φ to −∞, then (14.46b) ensures that the sequence approaches the solution set.

An interesting primal-dual potential function is the Tanabe-Todd-Ye function Φ_ρ defined by

    Φ_ρ(x, s) = ρ log x^T s − Σ_{i=1}^n log x_i s_i,                   (14.47)

for some parameter ρ > n (see [17], [21]). Like all algorithms based on Framework 14.1, potential-reduction algorithms obtain their search directions by solving (14.12), for some σ_k ∈ (0, 1), and they take steps of length α_k along these directions. For instance, the step length α_k may be chosen to approximately minimize Φ_ρ along the computed direction. By fixing σ_k = n/(n + √n) for all k,

one can guarantee constant reduction in Φ_ρ at every iteration. Hence, Φ_ρ will approach −∞, forcing convergence. Adaptive and heuristic choices of σ_k and α_k are also covered by the theory, provided that they at least match the reduction in Φ_ρ obtained from the conservative theoretical values of these parameters.

Extensions

Primal-dual methods for linear programming can be extended to wider classes of problems. There are simple extensions of the algorithm to the monotone linear complementarity problem (LCP) and convex quadratic programming problems for which the convergence and polynomial complexity properties of the linear programming algorithms are retained. The LCP is the problem of finding vectors x and s in R^n that satisfy the following conditions:

    s = Mx + q,   (x, s) ≥ 0,   x^T s = 0,                             (14.48)

where M is a positive semidefinite n × n matrix and q ∈ R^n. The similarity between (14.48) and the KKT conditions (14.3) is obvious: The last two conditions in (14.48) correspond to (14.3d) and (14.3c), respectively, while the condition s = Mx + q is similar to the equations (14.3a) and (14.3b). For practical instances of the problem (14.48), see Cottle, Pang, and Stone [3].

In convex quadratic programming, we minimize a convex quadratic objective subject to linear constraints. A convex quadratic generalization of the standard form linear program (14.1) is

    min c^T x + (1/2) x^T G x,   subject to Ax = b,  x ≥ 0,            (14.49)

where G is a symmetric n × n positive semidefinite matrix. The KKT conditions for this problem are similar to those for linear programming (14.3) and also to the linear complementarity problem (14.48). See Section ?? for further discussion of interior-point methods for (14.49).

Primal-dual methods for nonlinear programming problems can be devised by writing down the KKT conditions in a form similar to (14.3), adding slack variables where necessary to convert all the inequality conditions to simple bounds of the type (14.3d). As in Framework 14.1, the basic primal-dual step is found by applying a modified Newton's method to some formulation of these KKT conditions, curtailing the step length to ensure that the bounds are satisfied strictly by all iterates. When the nonlinear programming problem is convex (that is, its objective and constraint functions are all convex functions), global convergence of primal-dual methods can be proved. Extensions to general nonlinear programming problems are not so straightforward; this topic is the subject of Chapter ??.

Interior-point methods are highly effective in solving semidefinite programming problems, a class of problems involving symmetric matrix variables that are constrained to be positive semidefinite. Semidefinite programming, which has been the topic of concentrated research since the early 1990s, has applications in many areas, including control theory and combinatorial optimization. Further information on this increasingly important topic can be found in the survey papers of Todd [19] and Vandenberghe and Boyd [22] and the books of Nesterov and Nemirovskii [16], Boyd et al. [1], and Boyd and Vandenberghe [2].
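Returning briefly to the convex quadratic program (14.49): its KKT system and the corresponding centered Newton step can be obtained by the same derivation that led from (14.3) to (14.11). The display below is our own summary of that pattern (written for a strictly feasible point), not a formula stated explicitly in this chapter.

```latex
% KKT conditions for (14.49), and the centered step at a strictly feasible point
\begin{aligned}
& Gx + c - A^T\lambda - s = 0, \qquad Ax = b, \qquad x_i s_i = 0,\ i=1,\dots,n, \qquad (x,s) \ge 0, \\[4pt]
& \begin{bmatrix} -G & A^T & I \\ A & 0 & 0 \\ S & 0 & X \end{bmatrix}
  \begin{bmatrix} \Delta x \\ \Delta\lambda \\ \Delta s \end{bmatrix}
  =
  \begin{bmatrix} 0 \\ 0 \\ -XSe + \sigma\mu e \end{bmatrix},
  \qquad \mu = x^T s / n .
\end{aligned}
```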

14.5 Perspectives and Software

The appearance of interior-point methods in the 1980s presented the first serious challenge to the dominance of the simplex method as a practical means of solving linear programming problems. By about 1990, interior-point codes had emerged that incorporated the techniques described in Section 14.3 and that were superior on many large problems to the simplex codes available at that time. The years that followed saw a quantum improvement in simplex software, evidenced by the appearance of packages such as CPLEX and XPRESS-MP. These improvements were due to algorithmic advances such as steepest-edge pivoting (see Goldfarb and Forrest [8]) and improved pricing heuristics, and also to close attention to the nuts and bolts of efficient implementation. The efficiency of interior-point codes also continued to improve, through improvements in the linear algebra for solving the step equations and through the use of higher-order correctors in the step calculation (see Gondzio [9]). During this period, a number of good interior-point codes became freely available, at least for research use, and found their way into many applications.

In general, simplex codes are faster on problems of small to medium dimensions, while interior-point codes are competitive and often faster on large problems. However, this rule is certainly not hard-and-fast; it depends strongly on the structure of the particular application. Interior-point methods are generally not able to take advantage of prior knowledge about the solution, such as an estimate of the solution itself or an estimate of the optimal basis. Hence, these methods are less useful than simplex approaches in situations in which warm-start information is readily available. The most widespread situation of this type involves branch-and-bound algorithms for solving integer programs, where each node in the branch-and-bound tree requires the solution of a linear program that differs only slightly from one already solved in the parent node.

Interior-point software has the advantage that it is easy to program, relative to the simplex method. The most complex operation is the solution of the large linear systems at each iteration to compute the step; software to perform this linear algebra operation is readily available. The interior-point code LIPSOL [25] is written entirely in the Matlab language, apart from a small amount of FORTRAN code that interfaces to the linear algebra software. The code PCx [4] is written in C, but also is easy for the interested user to comprehend and modify. It is even possible for a non-expert in optimization to write an efficient interior-point implementation from scratch that is customized to their particular application.

Notes and References

For more details on the material of this chapter, see the book by Wright [24].

As noted in the text, Karmarkar's method arose from a search for linear programming algorithms with better worst-case behavior than the simplex method. The first algorithm with polynomial complexity, Khachiyan's ellipsoid algorithm [11], was a computational disappointment. In contrast, the execution times required by Karmarkar's method were not too much greater than simplex codes at the time of its introduction, particularly for large linear programs. Karmarkar's is a primal algorithm; that is, it is described, motivated, and implemented purely in terms of the primal problem (14.1) without reference to the dual. At each iteration, Karmarkar's algorithm performs a projective transformation on the primal feasible set that maps the current iterate x^k to the center of the set and takes a step in the feasible steepest descent direction for the transformed space. Progress toward optimality is measured by a logarithmic potential function. Nice descriptions of the algorithm can be found in Karmarkar's original paper [10] and in Fletcher [7]. Karmarkar's method falls outside the scope of this chapter, and in any case, its practical performance does not appear to match the most efficient primal-dual methods. The algorithms we discussed in this chapter have polynomial complexity, like Karmarkar's method.

Many of the algorithmic ideas that have been examined since 1984 actually had their genesis in three works that preceded Karmarkar's paper. The first of these is the book of Fiacco and McCormick [6] on logarithmic barrier functions, which proves existence of the central path, among many other results. Further analysis of the central path was carried out by McLinden [12], in the context of nonlinear complementarity problems. Finally, there is Dikin's paper [5], in which an interior-point method known as primal affine-scaling was originally proposed. The outburst of research on primal-dual methods, which culminated in the efficient software packages available today, dates to the seminal paper of Megiddo [13].

Todd gives an excellent survey of potential reduction methods in [18]. He relates the primal-dual potential reduction method mentioned above to pure primal potential reduction methods, including Karmarkar's original algorithm, and discusses extensions to special classes of nonlinear problems.

For an introduction to complexity theory and its relationship to optimization, see the book by Vavasis [23].

Exercises

1. This exercise illustrates the fact that the bounds (x, s) ≥ 0 are essential in relating solutions of the system (14.4a) to solutions of the linear program (14.1) and its dual. Consider the following linear program in R^2:

    min x_1,   subject to x_1 + x_2 = 1,  (x_1, x_2) ≥ 0.

Show that the primal-dual solution is

    x* = (0, 1)^T,   λ* = 0,   s* = (1, 0)^T.

Also verify that the system F(x, λ, s) = 0 has the spurious solution

    x = (1, 0)^T,   λ = 1,   s = (0, −1)^T,

which has no relation to the solution of the linear program.

2. (i) Show that N_2(θ_1) ⊂ N_2(θ_2) when 0 ≤ θ_1 < θ_2 < 1, and that N_{−∞}(γ_1) ⊂ N_{−∞}(γ_2) for 0 < γ_2 ≤ γ_1 ≤ 1.
   (ii) Show that N_2(θ) ⊂ N_{−∞}(γ) if γ ≤ 1 − θ.

3. Given an arbitrary point (x, λ, s) ∈ F°, find the range of γ values for which (x, λ, s) ∈ N_{−∞}(γ). (The range depends on x and s.)

4. For n = 2, find a point (x, s) > 0 for which the condition

    ‖XSe − µe‖_2 ≤ θµ

is not satisfied for any θ ∈ [0, 1).

5. Prove that the neighborhoods N_{−∞}(1) (see (14.17)) and N_2(0) (see (14.16)) coincide with the central path C.

6. Show that Φ_ρ defined by (14.47) has the property (14.46a).

7. Prove that the coefficient matrix in (14.11) is nonsingular if and only if A has full row rank.

8. Given (Δx, Δλ, Δs) satisfying (14.12), prove (14.22).

9. (NEW) Given an iterate (x^k, λ^k, s^k) with (x^k, s^k) > 0, show that the quantities α^pri_max and α^dual_max defined by (14.33) are the largest values of α such that x^k + α Δx^k ≥ 0 and s^k + α Δs^k ≥ 0, respectively.

10. (NEW) Verify (14.34).

11. Given that X and S are diagonal with positive diagonal elements, show that the coefficient matrix in (14.45a) is symmetric and positive definite if and only if A has full row rank. Does this result continue to hold if we replace D by a diagonal matrix in which exactly m of the diagonal elements are positive and the remainder are zero? (Here m is the number of rows of A.)

12. Given a point (x, λ, s) with (x, s) > 0, consider the trajectory H defined by

    F(x̂(τ), λ̂(τ), ŝ(τ)) = [ (1 − τ)(A^T λ + s − c) ]
                            [ (1 − τ)(Ax − b)        ],    (x̂(τ), ŝ(τ)) ≥ 0,
                            [ (1 − τ) XSe            ]

for τ ∈ [0, 1], and note that (x̂(0), λ̂(0), ŝ(0)) = (x, λ, s), while the limit of (x̂(τ), λ̂(τ), ŝ(τ)) as τ → 1 will lie in the primal-dual solution set of the linear program. Find equations for the first, second, and third derivatives of H with respect to τ at τ = 0. Hence, write down a Taylor series approximation to H near the point (x, λ, s).

13. Consider the following linear program, which contains free variables denoted by y:

    min c^T x + d^T y,   subject to A_1 x + A_2 y = b,  x ≥ 0.

By introducing Lagrange multipliers λ for the equality constraints and s for the bounds x ≥ 0, write down optimality conditions for this problem in an analogous fashion to (14.3). Following (14.4) and (14.11), use these conditions to derive the general step equations for a primal-dual interior-point method. Express these equations in augmented system form analogously to (14.43), and explain why it is not possible to reduce further to a formulation like (14.45) in which the coefficient matrix is symmetric positive definite.

14. Program Algorithm 14.3 in Matlab. Choose η = 0.99 uniformly in (14.41). Test your code on a linear programming problem (14.1) generated by choosing A randomly, and then setting x, s, b, and c as follows:

    x_i = random positive number for i = 1, 2, ..., m,   and x_i = 0 for i = m+1, m+2, ..., n;
    s_i = 0 for i = 1, 2, ..., m,   and s_i = random positive number for i = m+1, m+2, ..., n;
    λ = random vector;
    c = A^T λ + s,   b = Ax.

Choose the starting point (x^0, λ^0, s^0) with the components of x^0 and s^0 set to large positive values.