A Second Full-Newton Step $O(n)$ Infeasible Interior-Point Algorithm for Linear Optimization

H. Mansouri*   C. Roos

August 1, 2005 / July 1, 2005

Department of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, P.O. Box 5031, 2600 GA Delft, The Netherlands
e-mail: h.mansouri@ewi.tudelft.nl, c.roos@ewi.tudelft.nl

Abstract

In [4] the second author presented a new primal-dual infeasible interior-point algorithm that uses full-Newton steps and whose iteration bound coincides with the best known bound for infeasible interior-point algorithms. Each iteration consists of a step that restores the feasibility for an intermediate problem (the so-called feasibility step) and a few ordinary centering steps. No more than $O(n \log (n/\varepsilon))$ iterations are required for getting an $\varepsilon$-solution of the problem at hand. In this paper we use a different feasibility step and show that with a simpler analysis the same result can be obtained.

Keywords: Linear optimization, infeasible interior-point method, primal-dual method, polynomial complexity.

AMS Subject Classification: 90C05, 90C51

1 Introduction

For a discussion of the relevance and practical importance of IIPMs we refer to the introduction of [4]. In that paper the second author presented the first primal-dual Infeasible Interior-Point Method (IIPM) that uses full-Newton steps for solving the linear optimization (LO) problem in the standard form

\[ (P) \qquad \min \{\, c^T x : Ax = b,\ x \ge 0 \,\}, \]

and its dual problem

\[ (D) \qquad \max \{\, b^T y : A^T y + s = c,\ s \ge 0 \,\}, \]

*The first author kindly acknowledges the support of the Iranian Ministry of Science, Research and Technology. On leave from the Department of Mathematical Science, Shahrekord University, P.O. Box 115, Shahrekord, Iran.
where $A \in \mathbb{R}^{m \times n}$, $b, y \in \mathbb{R}^m$, $c, x, s \in \mathbb{R}^n$ and $\operatorname{rank}(A) = m$. The vectors $x$, $y$ and $s$ are the vectors of variables.

As usual for IIPMs it was assumed in [4] that one knows a positive scalar $\zeta$ such that $\|(x^*; s^*)\|_\infty \le \zeta$ for some optimal solution $(x^*, y^*, s^*)$ of (P) and (D), and that the initial iterates are $(x^0, y^0, s^0) = (\zeta e, 0, \zeta e)$, where $e$ denotes the all-one vector of length $n$. Using $(x^0)^T s^0 = n\zeta^2$, the total number of iterations in the algorithm of [4] is bounded above by

\[ 16\, n \log \frac{\max \{ n\zeta^2, \|r_b^0\|, \|r_c^0\| \}}{\varepsilon}, \tag{1} \]

where $r_b^0$ and $r_c^0$ are the initial residual vectors:

\[ r_b^0 = b - Ax^0 = b - \zeta A e, \tag{2} \]
\[ r_c^0 = c - A^T y^0 - s^0 = c - \zeta e. \tag{3} \]

If no such constant $\zeta$ is known, we take $\zeta = 2^L$, where $L$ denotes the binary input size of (P) and (D). In that case infeasibility or unboundedness of (P) and (D) can also be detected by the algorithm. Up to a constant factor, the iteration bound (1) was first obtained by Mizuno [2] and it is still the best known iteration bound for IIPMs. See also [3, 6].

To describe the aim of this paper we need to recall the main ideas underlying the algorithm in [4]. For any $\nu$ with $0 < \nu \le 1$ we consider the perturbed problem $(P_\nu)$, defined by

\[ (P_\nu) \qquad \min \{\, (c - \nu r_c^0)^T x : Ax = b - \nu r_b^0,\ x \ge 0 \,\}, \]

and its dual problem $(D_\nu)$, which is given by

\[ (D_\nu) \qquad \max \{\, (b - \nu r_b^0)^T y : A^T y + s = c - \nu r_c^0,\ s \ge 0 \,\}. \]

Note that if $\nu = 1$ then $x = x^0$ yields a strictly feasible solution of $(P_\nu)$, and $(y, s) = (y^0, s^0)$ a strictly feasible solution of $(D_\nu)$. Due to the choice of the initial iterates we may conclude that if $\nu = 1$ then $(P_\nu)$ and $(D_\nu)$ each have a strictly feasible solution, which means that both perturbed problems then satisfy the well known interior-point condition (IPC). More generally one has the following lemma (see also [4, Lemma 3.1]).

Lemma 1.1 (Theorem 5.13 in [7]) The perturbed problems $(P_\nu)$ and $(D_\nu)$ satisfy the IPC for each $\nu \in (0, 1]$ if and only if the original problems (P) and (D) are feasible.

Assuming that (P) and (D) are feasible, it follows from Lemma 1.1 that the problems $(P_\nu)$ and $(D_\nu)$ satisfy the IPC for each $\nu \in (0, 1]$.
But then their central paths exist. This means that the system¹

\[ b - Ax = \nu r_b^0, \quad x \ge 0, \tag{4} \]
\[ c - A^T y - s = \nu r_c^0, \quad s \ge 0, \tag{5} \]
\[ xs = \mu e \tag{6} \]

¹ Here and below we use the following notation: if $x, s \in \mathbb{R}^n$, then $xs$ denotes the componentwise (or Hadamard) product of the vectors $x$ and $s$. Furthermore, if $z \in \mathbb{R}^n_+$ and $f : \mathbb{R}_+ \to \mathbb{R}_+$, then $f(z)$ denotes the vector in $\mathbb{R}^n_+$ whose $i$-th component is $f(z_i)$, with $1 \le i \le n$.
has a unique solution, for every $\mu > 0$. If $\nu \in (0, 1]$ and $\mu = \nu\zeta^2$ we denote this unique solution in the sequel as $(x(\nu), y(\nu), s(\nu))$. As a consequence, $x(\nu)$ is the $\mu$-center of $(P_\nu)$ and $(y(\nu), s(\nu))$ the $\mu$-center of $(D_\nu)$. Due to this notation we have, by taking $\nu = 1$,

\[ (x(1), y(1), s(1)) = (x^0, y^0, s^0) = (\zeta e, 0, \zeta e). \]

We measure proximity of iterates $(x, y, s)$ to the $\mu$-center of the perturbed problems $(P_\nu)$ and $(D_\nu)$ by the quantity $\delta(x, s; \mu)$, which is defined as follows:

\[ \delta(x, s; \mu) := \delta(v) := \tfrac12 \| v - v^{-1} \|, \qquad \text{where } v := \sqrt{\frac{xs}{\mu}}. \tag{7} \]

Initially we have $x = s = \zeta e$ and $\mu = \zeta^2$, whence $v = e$ and $\delta(x, s; \mu) = 0$. In the sequel we assume that at the start of each iteration $\delta(x, s; \mu)$ is smaller than or equal to a (small) threshold value $\tau > 0$. So this is certainly true at the start of the first iteration.

Now we describe one main iteration of our algorithm. Suppose that for some $\nu \in (0, 1]$ we have $x$, $y$ and $s$ satisfying the feasibility conditions (4) and (5), and such that

\[ x^T s = n\mu \quad \text{and} \quad \delta(x, s; \mu) \le \tau, \tag{8} \]

where $\mu = \nu\zeta^2$. Each main iteration consists of one so-called feasibility step, a $\mu$-update, and a few centering steps. First we find new iterates $x^f$, $y^f$ and $s^f$ that satisfy (4) and (5) with $\nu$ replaced by $\nu^+ = (1 - \theta)\nu$, where $\theta \in (0, 1)$. As we will see, by taking $\theta$ small enough this can be realized by one feasibility step, to be described below. So, as a result of the feasibility step we obtain iterates that are feasible for $(P_{\nu^+})$ and $(D_{\nu^+})$. Then we reduce $\nu$ to $\nu^+$ and apply a limited number of centering steps with respect to the $\mu^+$-centers of $(P_{\nu^+})$ and $(D_{\nu^+})$. The centering steps keep the iterates feasible for $(P_{\nu^+})$ and $(D_{\nu^+})$; their purpose is to get iterates $x^+$, $y^+$ and $s^+$ such that $(x^+)^T s^+ = n\mu^+$, where $\mu^+ = \nu^+\zeta^2$, and $\delta(x^+, s^+; \mu^+) \le \tau$. This process is repeated until the duality gap and the norms of the residual vectors are less than some prescribed accuracy parameter $\varepsilon$.

Before describing the search directions used in the feasibility step and the centering steps we give a more formal description of the algorithm in Figure 1.
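In a numerical implementation the proximity measure (7) is a one-liner. The sketch below (Python/NumPy; the helper name `delta` and the sample data are ours, not from the paper) computes $v = \sqrt{xs/\mu}$ and $\delta(x, s; \mu) = \tfrac12\|v - v^{-1}\|$, and confirms that the starting point $x = s = \zeta e$, $\mu = \zeta^2$ has proximity zero.

```python
import numpy as np

def delta(x, s, mu):
    """Proximity measure (7): 0.5 * ||v - 1/v|| with v = sqrt(x*s/mu)
    (all products and the square root taken componentwise)."""
    v = np.sqrt(x * s / mu)
    return 0.5 * np.linalg.norm(v - 1.0 / v)

zeta = 2.0
x0 = zeta * np.ones(3)
s0 = zeta * np.ones(3)
print(delta(x0, s0, zeta**2))                        # -> 0.0 (here v = e)
print(delta(np.array([4.0]), np.array([1.0]), 1.0))  # -> 0.75 (here v = 2)
```

A single scalar entry illustrates the measure: $x = 4$, $s = 1$, $\mu = 1$ gives $v = 2$ and $\delta = \tfrac12|2 - \tfrac12| = 0.75$.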
For the feasibility step we used in [4] search directions $\Delta^f x$, $\Delta^f y$ and $\Delta^f s$ that are uniquely defined by the system

\[ A\,\Delta^f x = \theta\nu r_b^0, \tag{9} \]
\[ A^T \Delta^f y + \Delta^f s = \theta\nu r_c^0, \tag{10} \]
\[ s\,\Delta^f x + x\,\Delta^f s = \mu e - xs. \tag{11} \]

It can easily be understood that if $(x, y, s)$ is feasible for the perturbed problems $(P_\nu)$ and $(D_\nu)$, then after the feasibility step the iterates satisfy the feasibility conditions for $(P_{\nu^+})$ and $(D_{\nu^+})$, provided that they satisfy the nonnegativity conditions. Assuming that before the step $\delta(x, s; \mu) \le \tau$ holds, and by taking $\theta$ small enough, it can be guaranteed that after the step the iterates

\[ x^f = x + \Delta^f x, \quad y^f = y + \Delta^f y, \quad s^f = s + \Delta^f s \]

are nonnegative and moreover $\delta(x^f, s^f; \mu^+) \le 1/\sqrt{2}$, where $\mu^+ = (1 - \theta)\mu$. So, after the $\mu$-update the iterates are feasible for $(P_{\nu^+})$ and $(D_{\nu^+})$ and $\mu$ is such that $\delta(x^f, s^f; \mu) \le 1/\sqrt{2}$.

In the centering steps, starting at the iterates $(x, y, s) = (x^f, y^f, s^f)$ and targeting at the $\mu$-centers, the search directions $\Delta x$, $\Delta y$, $\Delta s$ are the usual primal-dual Newton directions, uniquely
Primal-Dual Infeasible IPM

Input:
  accuracy parameter $\varepsilon > 0$;
  barrier update parameter $\theta$, $0 < \theta < 1$;
  threshold parameter $\tau > 0$.

begin
  $x := \zeta e$; $y := 0$; $s := \zeta e$; $\nu := 1$;
  while $\max( x^T s, \|b - Ax\|, \|c - A^T y - s\| ) \ge \varepsilon$ do
  begin
    feasibility step:
      $(x, y, s) := (x, y, s) + (\Delta^f x, \Delta^f y, \Delta^f s)$;
    $\mu$-update:
      $\mu := (1 - \theta)\mu$;
    centering steps:
      while $\delta(x, s; \mu) \ge \tau$ do
        $(x, y, s) := (x, y, s) + (\Delta x, \Delta y, \Delta s)$
      endwhile
  end
end

Figure 1: Algorithm

defined by

\[ A\,\Delta x = 0, \tag{12} \]
\[ A^T \Delta y + \Delta s = 0, \tag{13} \]
\[ s\,\Delta x + x\,\Delta s = \mu e - xs. \tag{14} \]

Denoting the iterates after a centering step as $x^+$, $y^+$ and $s^+$, we recall from [4] the following result.

Lemma 1.2 If $\delta := \delta(x, s; \mu) \le 1$, then the primal-dual Newton step is feasible, i.e., $x^+$ and $s^+$ are nonnegative, and $(x^+)^T s^+ = n\mu$. Moreover, if $\delta < 1$, then $\delta(x^+, s^+; \mu) \le \delta^2$.

The centering steps serve to get iterates that satisfy $x^T s = n\mu^+$ and $\delta(x, s; \mu^+) \le \tau$, where $\tau$ is (much) smaller than $1/\sqrt{2}$. By using Lemma 1.2, the required number of centering steps can easily be obtained. Because after the $\mu$-update we have $\delta = \delta(x^f, s^f; \mu^+) \le 1/\sqrt{2}$, after $k$ centering steps the iterates $(x, y, s)$ satisfy

\[ \delta(x, s; \mu^+) \le \left( \frac{1}{\sqrt{2}} \right)^{2^k}. \]
From this one easily deduces that no more than

\[ \log_2 \left( \log_2 \frac{1}{\tau^2} \right) \tag{15} \]

centering steps are needed.

Having described the approach taken in [4], we are now able to explain the aim of this paper. We use the same algorithm as described in Figure 1, but we change the definition of the feasibility step by replacing equation (11) by

\[ s\,\Delta^f x + x\,\Delta^f s = 0. \tag{16} \]

As we will see, this simplifies the analysis of the algorithm at some places, whereas the iteration bound essentially remains the same.

To conclude this section we briefly describe how the paper is organized. Section 2 is devoted to the analysis of the new feasibility step, which is the main part of the paper. We will see that the new search direction requires a different analysis, but at some places we can use results that were obtained in [4]; in such cases we will cite these results without repeating their proofs. Like in [4] we need lower and upper bounds for the vectors $x(\nu)$ and $s(\nu)$; we use the bounds that were derived in the Appendix of [4], with reference to that paper. The final iteration bound is derived in Section 3. Some concluding remarks can be found in Section 4.

Some notations used throughout the paper are as follows. $\|\cdot\|$ denotes the 2-norm of a vector, and for $x_1, \dots, x_n \in \mathbb{R}$, $(x_1; x_2; \dots; x_n)$ denotes the column vector in $\mathbb{R}^n$ with these entries. Furthermore, $e$ denotes the all-one vector of length $n$. We write $f(x) = O(g(x))$ if $f(x) \le \gamma\, g(x)$ for some positive constant $\gamma$.

2 Analysis of the feasibility step

Let $x$, $y$ and $s$ denote the iterates at the start of an iteration, and assume $\delta(x, s; \mu) \le \tau$. Recall that at the start of the first iteration this is certainly true, because then $\delta(x, s; \mu) = 0$.

2.1 The feasibility step and the choice of τ and θ

As we established in Section 1, the feasibility step generates new iterates $x^f$, $y^f$ and $s^f$ that satisfy the feasibility conditions for $(P_{\nu^+})$ and $(D_{\nu^+})$, except possibly the nonnegativity conditions.
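For concreteness, the algorithm of Figure 1 with the new feasibility step (16) can be sketched in a few dozen lines of NumPy. The feasibility step solves (9), (10), (16) and the centering steps solve (12)–(14), both via the usual normal-equations elimination; the toy instance, the value $\zeta = 2$, and all helper names are our own illustrative choices, not from the paper.

```python
import numpy as np

def newton_directions(A, x, s, r_p, r_d, r_c):
    """Solve A*dx = r_p, A^T*dy + ds = r_d, s*dx + x*ds = r_c
    by eliminating dx and ds (normal equations with diag(x/s))."""
    d = x / s
    M = (A * d) @ A.T                            # A diag(x/s) A^T
    dy = np.linalg.solve(M, r_p - A @ (r_c / s) + A @ (d * r_d))
    dx = d * (A.T @ dy) - d * r_d + r_c / s
    ds = r_d - A.T @ dy
    return dx, dy, ds

def delta(x, s, mu):
    v = np.sqrt(x * s / mu)
    return 0.5 * np.linalg.norm(v - 1.0 / v)

def infeasible_ipm(A, b, c, zeta, eps=1e-6, max_iter=100_000):
    m, n = A.shape
    x = zeta * np.ones(n); y = np.zeros(m); s = zeta * np.ones(n)
    mu, nu = zeta**2, 1.0
    theta, tau = 1.0 / (4 * n), 1.0 / 8          # parameter choices of the paper
    rb0, rc0 = b - A @ x, c - A.T @ y - s        # initial residuals
    for _ in range(max_iter):
        if max(x @ s, np.linalg.norm(b - A @ x),
               np.linalg.norm(c - A.T @ y - s)) < eps:
            break
        # feasibility step: right-hand sides (9), (10) and (16)
        fx, fy, fs = newton_directions(A, x, s, theta * nu * rb0,
                                       theta * nu * rc0, np.zeros(n))
        x, y, s = x + fx, y + fy, s + fs
        mu *= 1.0 - theta; nu *= 1.0 - theta     # mu-update
        for _ in range(50):                      # centering steps (12)-(14)
            if delta(x, s, mu) < tau:
                break
            cx, cy, cs = newton_directions(A, x, s, np.zeros(m),
                                           np.zeros(n), mu - x * s)
            x, y, s = x + cx, y + cy, s + cs
    return x, y, s

# Toy instance (ours): min x1 + 2*x2  s.t.  x1 + x2 = 2, x >= 0.
# An optimal triple is ((2, 0), 1, (0, 1)), so zeta = 2 satisfies the norm bound.
A = np.array([[1.0, 1.0]]); b = np.array([2.0]); c = np.array([1.0, 2.0])
x, y, s = infeasible_ipm(A, b, c, zeta=2.0)
print(np.round(x, 4))   # close to [2, 0]
```

The iterates start infeasible (at $\zeta e$) and both residual norms shrink by the factor $1 - \theta$ in every main iteration, exactly as in the analysis above.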
A crucial element in the analysis is to show that after the feasibility step $\delta(x^f, s^f; \mu^+) \le 1/\sqrt{2}$, i.e., that the new iterates are positive and within the region where the Newton process targeting at the $\mu^+$-centers of $(P_{\nu^+})$ and $(D_{\nu^+})$ is quadratically convergent.

We define

\[ d_x := \frac{v\,\Delta^f x}{x}, \qquad d_s := \frac{v\,\Delta^f s}{s}, \tag{17} \]

with $v$ as defined in (7). Now using (16) and $xs = \mu v^2$ we may write

\[ x^f s^f = xs + (s\,\Delta^f x + x\,\Delta^f s) + \Delta^f x\,\Delta^f s = xs + \Delta^f x\,\Delta^f s = \mu (v^2 + d_x d_s). \tag{18} \]

Lemma 2.1 The iterates $x^f$, $y^f$, $s^f$ are strictly feasible if and only if $v^2 + d_x d_s > 0$.
Proof: Note that if $x^f$ and $s^f$ are positive, then (18) makes clear that $v^2 + d_x d_s > 0$, proving the "only if" part of the statement in the lemma.

For the proof of the converse implication we introduce a step length $\alpha \in [0, 1]$, and we define

\[ x^\alpha = x + \alpha\,\Delta^f x, \quad y^\alpha = y + \alpha\,\Delta^f y, \quad s^\alpha = s + \alpha\,\Delta^f s. \]

We then have $x^0 = x$, $x^1 = x^f$, and similar relations for $y$ and $s$. Hence $x^0 s^0 = xs > 0$. We write

\[ x^\alpha s^\alpha = (x + \alpha\,\Delta^f x)(s + \alpha\,\Delta^f s) = xs + \alpha (s\,\Delta^f x + x\,\Delta^f s) + \alpha^2\, \Delta^f x\,\Delta^f s. \]

From the definitions of $d_x$ and $d_s$ in (17) we deduce $\Delta^f x\,\Delta^f s = \mu\, d_x d_s$. Using this, $s\,\Delta^f x + x\,\Delta^f s = 0$ and $\mu v^2 = xs$, we obtain

\[ x^\alpha s^\alpha = xs + \alpha^2\, \Delta^f x\,\Delta^f s = \mu (v^2 + \alpha^2 d_x d_s). \]

Now suppose $v^2 + d_x d_s > 0$. Then $d_x d_s > -v^2$, so we get

\[ x^\alpha s^\alpha > \mu (v^2 - \alpha^2 v^2) = \mu (1 - \alpha^2) v^2 = (1 - \alpha^2)\, xs, \quad \alpha \in [0, 1]. \]

Since $(1 - \alpha^2)\, xs \ge 0$, it follows that $x^\alpha s^\alpha > 0$ for $0 \le \alpha \le 1$. Hence, none of the entries of $x^\alpha$ and $s^\alpha$ vanishes for $0 \le \alpha \le 1$. Since $x^0$ and $s^0$ are positive, and $x^\alpha$ and $s^\alpha$ depend linearly on $\alpha$, this implies that $x^\alpha > 0$ and $s^\alpha > 0$ for $0 \le \alpha \le 1$. Hence, $x^1$ and $s^1$ must be positive, proving the "if" part of the statement in the lemma. □

Using (17) we may also write

\[ x^f = x + \Delta^f x = x + \frac{x\, d_x}{v} = \frac{x}{v}(v + d_x), \tag{19} \]
\[ s^f = s + \Delta^f s = s + \frac{s\, d_s}{v} = \frac{s}{v}(v + d_s). \tag{20} \]

To simplify the presentation we will denote $\delta(x, s; \mu)$ below simply as $\delta$. Recall that we assume that before the feasibility step one has $\delta \le \tau$.

Lemma 2.2 The iterates $x^f$, $y^f$, $s^f$ are certainly strictly feasible if

\[ \|d_x\| < \frac{1}{\rho(\delta)} \quad \text{and} \quad \|d_s\| < \frac{1}{\rho(\delta)}, \tag{21} \]

where

\[ \rho(\delta) := \delta + \sqrt{1 + \delta^2}. \tag{22} \]

Proof: It is clear from (19) that $x^f$ is strictly feasible if and only if $v + d_x > 0$. This certainly holds if $\|d_x\| < \min(v)$. Since $\delta = \tfrac12 \|v - v^{-1}\|$, the minimal value $t$ that an entry of $v$ can attain satisfies $t \le 1$ and $\tfrac12(1/t - t) \le \delta$. The extreme case gives $t^2 + 2\delta t - 1 = 0$, whence $t = -\delta + \sqrt{1 + \delta^2} = 1/\rho(\delta)$. This proves the first inequality in (21). The second inequality is obtained in the same way. □

The above proof makes clear that the elements of the vector $v$ satisfy

\[ \frac{1}{\rho(\delta)} \le v_i \le \rho(\delta), \quad i = 1, \dots, n. \tag{23} \]
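The lower bound $1/\rho(\delta)$ in (23) is tight: for a vector $v$ with a single entry $t < 1$ and all other entries equal to 1, one has $\delta(v) = \tfrac12(1/t - t)$, so $t^2 + 2\delta t - 1 = 0$ and $t = 1/\rho(\delta)$ exactly. A quick numerical check (ours, not from the paper):

```python
import math

def rho(delta):
    """rho(delta) = delta + sqrt(1 + delta^2), as in (22)."""
    return delta + math.sqrt(1.0 + delta**2)

# v = (t, 1, ..., 1) with t < 1: only the first entry contributes to delta(v).
t = 0.9
d = 0.5 * (1.0 / t - t)
print(1.0 / rho(d))   # -> 0.9 up to rounding: the bound in (23) is attained
```

With $\tau = 1/8$ one gets $\rho(\tau) \approx 1.133$, so all entries of $v$ stay in roughly $[0.883,\ 1.133]$ at the start of a feasibility step.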
One may easily check that the system (9), (10) and (16), which defines the search directions $\Delta^f x$, $\Delta^f y$ and $\Delta^f s$, can be expressed in terms of the scaled search directions $d_x$ and $d_s$ as follows:

\[ \bar{A} d_x = \theta\nu r_b^0, \tag{24} \]
\[ \bar{A}^T \frac{\Delta^f y}{\mu} + d_s = \theta\nu\, v s^{-1} r_c^0, \tag{25} \]
\[ d_x + d_s = 0, \tag{26} \]

where

\[ \bar{A} = A V^{-1} X, \quad V = \operatorname{diag}(v), \quad X = \operatorname{diag}(x). \tag{27} \]

Hence, due to (26), we have $d_s = -d_x$, and therefore

\[ \xi := d_x d_s = -d_x^2 \le 0. \tag{28} \]

Assuming $v^2 + d_x d_s > 0$, which according to Lemma 2.1 holds if and only if the iterates $x^f$, $y^f$, $s^f$ are strictly feasible, we proceed by deriving an upper bound for $\delta(x^f, s^f; \mu^+)$. According to definition (7) it holds that

\[ \delta(x^f, s^f; \mu^+) = \tfrac12 \| v^f - (v^f)^{-1} \|, \qquad \text{where } v^f = \sqrt{\frac{x^f s^f}{\mu^+}}. \]

In the sequel we denote $\delta(x^f, s^f; \mu^+)$ also shortly by $\delta(v^f)$. We can prove the following result.

Lemma 2.3 Assuming $v^2 + d_x d_s > 0$, one has

\[ 4\delta(v^f)^2 \le \frac{4\delta^2 + \theta^2 n - \|d_x\|^2}{1 - \theta} + \frac{(1 - \theta)\, \rho(\delta)^4 \|d_x\|^2}{1 - \rho(\delta)^2 \|d_x\|^2}. \]

Proof: After division of both sides in (18) by $\mu^+$ we get, using (28),

\[ (v^f)^2 = \frac{\mu (v^2 + d_x d_s)}{\mu^+} = \frac{v^2 + d_x d_s}{1 - \theta} = \frac{v^2 + \xi}{1 - \theta}. \]

Hence we have

\[ 4\delta(v^f)^2 = \sum_{i=1}^n \left( v_i^f - \frac{1}{v_i^f} \right)^2 = \sum_{i=1}^n \left( \frac{v_i^2 + \xi_i}{1 - \theta} + \frac{1 - \theta}{v_i^2 + \xi_i} \right) - 2n. \]

For each $i$ we define the function

\[ f_i(z_i) := \frac{v_i^2 - z_i}{1 - \theta} + \frac{1 - \theta}{v_i^2 - z_i}, \quad i = 1, \dots, n. \]

One may easily verify that if $v_i^2 - z_i > 0$ then $f_i(z_i)$ is convex in $z_i$. Taking $z = -\xi = -d_x d_s = d_x^2 \ge 0$, since $v^2 - z = v^2 + d_x d_s > 0$ we may therefore apply Lemma A.1. This gives, also using $e^T z = \|d_x\|^2$,

\[ 4\delta(v^f)^2 + 2n \le \frac{1}{\|d_x\|^2} \sum_{j=1}^n d_{xj}^2 \left[ \frac{v_j^2 - \|d_x\|^2}{1 - \theta} + \frac{1 - \theta}{v_j^2 - \|d_x\|^2} + \sum_{i \ne j} \left( \frac{v_i^2}{1 - \theta} + \frac{1 - \theta}{v_i^2} \right) \right]. \]
Using [4, Lemma.] we obtain i j v i + vi = i=1 v i + v j vi = 4δ + θ n Substituting this gives the following upper bound for 4δv + : d xj v j d x d x = 4δ + θ n + 1 d x = 4δ + θ n + 1 d x + v j d x + 4δ + θ n d xj d xj = 4δ + θ n d x + 1 d x Finally, by using 3 we get v j d x d x d xj vj + v j v j + vj v j +. v j + v vj d x j + + v j d x v j d x vj d x. v j 4δv + 4δ + θ n d x = 4δ + θ n d x This implies the lemma. + 1 d x + ρδ4 d x 1 ρδ d x. d ρδ 4 d x xj 1 ρδ d x We conclude this section by presenting a value that we not allow d x to exceed. It may be worth noting that d x is dependent on the value of θ, as is clear from 4-6. This fact will be explored later on. For the moment we observe that because we need to have δv + 1/, it follows from Lemma.3 that it suffices if At this stage we decide to choose θ n d x + Then, for n 1 and δ τ, one may easily verify that 4δ + ρδ4 d x 1 ρδ d x. τ = 1 8, θ = α n, α 1. 9 d x 1 We proceed by considering the vectors d x more in detail. δv + 1. 30 8
2.2 An upper bound for ‖d_x‖

It is clear from (24)–(26) that $d_x$ is the unique solution of the system

\[ \bar{A} d_x = \theta\nu r_b^0, \qquad \bar{A}^T \frac{\Delta^f y}{\mu} - d_x = \theta\nu\, v s^{-1} r_c^0. \]

To derive an upper bound for $\|d_x\|$ we recall a result from [4, Lemma 4.7]. There it was proved that if a vector $q$ satisfies

\[ \bar{A} q = \theta\nu r_b^0, \qquad \bar{A}^T \chi + q = \theta\nu\, v s^{-1} r_c^0 \]

for some vector $\chi$, then

\[ \sqrt{\mu}\, \|q\| \le \theta\nu\zeta \sqrt{ e^T \left( \frac{x}{s} + \frac{s}{x} \right) }. \]

Using almost the same reasoning one easily proves that we also have

\[ \sqrt{\mu}\, \|d_x\| \le \theta\nu\zeta \sqrt{ e^T \left( \frac{x}{s} + \frac{s}{x} \right) }. \tag{31} \]

To proceed we need upper and lower bounds for the elements of the vectors $x$ and $s$.

2.3 Bounds for x/s and s/x and the choice of α

Recall that $x$ is feasible for $(P_\nu)$ and $(y, s)$ for $(D_\nu)$ and, moreover, $\delta(x, s; \mu) \le \tau$, i.e., these iterates are close to the $\mu$-centers of $(P_\nu)$ and $(D_\nu)$. Based on this information we need to estimate the sizes of the entries of the vectors $x/s$ and $s/x$. Since $\tau = 1/8$, we can again use a result from [4], namely Corollary A.10, which gives

\[ \frac{x}{s} \le \frac{2\, x(\nu)^2}{\mu}, \qquad \frac{s}{x} \le \frac{2\, s(\nu)^2}{\mu}. \]

Substitution into (31) yields

\[ \sqrt{\mu}\, \|d_x\| \le \frac{\sqrt{2}\,\theta\nu\zeta}{\sqrt{\mu}}\, \| (x(\nu); s(\nu)) \|. \]

This implies

\[ \|d_x\| \le \frac{\sqrt{2}\,\theta\nu\zeta}{\mu}\, \| (x(\nu); s(\nu)) \|. \]

Therefore, also using $\mu = \nu\zeta^2$ and $\theta = \alpha/\sqrt{n}$, we obtain the following upper bound for $\|d_x\|$:

\[ \|d_x\| \le \frac{\sqrt{2}\,\alpha}{\zeta\sqrt{n}}\, \| (x(\nu); s(\nu)) \|. \tag{32} \]

Following [4], we define

\[ \kappa(\zeta, \nu) = \frac{\| (x(\nu); s(\nu)) \|}{\zeta\sqrt{2n}}, \quad 0 < \nu \le 1, \tag{33} \]
and

\[ \bar{\kappa}(\zeta) = \max_{0 < \nu \le 1} \kappa(\zeta, \nu). \tag{34} \]

Now we may write

\[ \|d_x\| \le 2\alpha\, \bar{\kappa}(\zeta). \]

Note that since $x(1) = s(1) = \zeta e$, we have $\kappa(\zeta, 1) = 1$. Hence it follows that $\bar{\kappa}(\zeta) \ge 1$. We found in (30) that in order to have $\delta(v^f) \le 1/\sqrt{2}$, we should have $\|d_x\| \le 1/\sqrt{2}$. This certainly holds if $2\sqrt{2}\,\alpha\,\bar{\kappa}(\zeta) \le 1$. We conclude that if we take

\[ \alpha = \frac{1}{2\sqrt{2}\, \bar{\kappa}(\zeta)}, \]

then we will certainly have $\delta(v^f) \le 1/\sqrt{2}$.

Lemma 2.4 (Section 4.6 in [4]) One has $\bar{\kappa}(\zeta) \le \sqrt{2n}$.

Substituting this upper bound for $\bar{\kappa}(\zeta)$, we obtain that we certainly have $\delta(v^f) \le 1/\sqrt{2}$ if

\[ \alpha = \frac{1}{4\sqrt{n}}. \tag{35} \]

According to (29) this gives the following value for $\theta$:

\[ \theta = \frac{1}{4n}. \tag{36} \]

3 Iteration bound

In the previous sections we have found that if at the start of an iteration the iterates satisfy $\delta(x, s; \mu) \le \tau$, with $\tau$ as defined in (29) and $\theta$ as in (36), then after the feasibility step and the $\mu$-update the iterates satisfy $\delta(x^f, s^f; \mu^+) \le 1/\sqrt{2}$.

According to (15), at most

\[ \left\lceil \log_2 \left( \log_2 \frac{1}{\tau^2} \right) \right\rceil = \lceil \log_2 (\log_2 64) \rceil = 3 \]

centering steps suffice to get iterates that satisfy $\delta(x, s; \mu^+) \le \tau$. So each main iteration consists of one feasibility step and at most 3 centering steps. In each main iteration both the duality gap and the norms of the residual vectors are reduced by the factor $1 - \theta$. Hence, using $(x^0)^T s^0 = n\zeta^2$, the total number of main iterations is bounded above by

\[ \frac{1}{\theta} \log \frac{\max \{ n\zeta^2, \|r_b^0\|, \|r_c^0\| \}}{\varepsilon} = 4n \log \frac{\max \{ n\zeta^2, \|r_b^0\|, \|r_c^0\| \}}{\varepsilon}. \]

It has become a custom to measure the complexity of an IPM by the required number of inner iterations, i.e., by the number of times that we need to compute a new search direction. Since each main iteration consists of at most four inner iterations — one feasibility step and three centering steps — we obtain that the total number of inner iterations is bounded above by

\[ 16\, n \log \frac{\max \{ n\zeta^2, \|r_b^0\|, \|r_c^0\| \}}{\varepsilon}. \]

Note that this bound is exactly the same as the bound (1).
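To get a feel for the size of this bound, here is a small computation (the numbers are ours, purely illustrative): with $n = 100$, $\zeta = 1$, $\varepsilon = 10^{-6}$ and residual norms dominated by $n\zeta^2 = 100$, the bound evaluates to roughly $2.9 \times 10^4$ inner iterations.

```python
import math

def inner_iteration_bound(n, zeta, rb_norm, rc_norm, eps):
    """16*n*log(max{n*zeta^2, ||rb0||, ||rc0||}/eps), natural logarithm."""
    return 16 * n * math.log(max(n * zeta**2, rb_norm, rc_norm) / eps)

# Illustrative numbers (ours): 1600 * ln(1e8) = 29473.09..., so 29474 rounded up.
print(math.ceil(inner_iteration_bound(100, 1.0, 100.0, 100.0, 1e-6)))  # -> 29474
```

As the formula shows, the bound grows linearly in $n$ and only logarithmically in the accuracy $1/\varepsilon$ and the starting-point data.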
4 Concluding remarks

Using similar techniques as in [4] we analyzed a full-Newton step method that differs from the algorithm considered in [4] only by the definition of the feasibility step. The new step is defined by equation (16):

\[ s\,\Delta^f x + x\,\Delta^f s = 0, \]

whereas the feasibility step in [4] was determined by

\[ s\,\Delta^f x + x\,\Delta^f s = \mu e - xs. \]

There is one more natural candidate for the definition of this step, namely

\[ s\,\Delta^f x + x\,\Delta^f s = \mu^+ e - xs. \]

We leave it to the future to analyze a full-Newton step method based on this candidate search direction, but it is unlikely that this will lead to a much better iteration bound. To change the order of the bound it would be more fruitful to improve the upper bound for the parameter $\bar{\kappa}(\zeta)$ as given by Lemma 2.4. Let us recall from [4] that, based on extensive computational evidence, we conjecture that if $\zeta$ is large enough, then $\bar{\kappa}(\zeta) = 1$. If this conjecture were true, then the iteration bound would improve by a factor $\sqrt{n}$.

References

[1] Y.Q. Bai, M. El Ghami, and C. Roos. A comparative study of kernel functions for primal-dual interior-point algorithms in linear optimization. SIAM Journal on Optimization, 15(1):101–128, 2004.

[2] S. Mizuno. Polynomiality of infeasible-interior-point algorithms for linear programming. Mathematical Programming, 67:109–119, 1994.

[3] F.A. Potra. An infeasible-interior-point predictor-corrector algorithm for linear programming. SIAM Journal on Optimization, 6(1):19–32, 1996.

[4] C. Roos. A full-Newton step O(n) infeasible interior-point algorithm for linear optimization, February 2005. Submitted to SIAM Journal on Optimization.

[5] C. Roos, T. Terlaky, and J.-Ph. Vial. Theory and Algorithms for Linear Optimization. An Interior-Point Approach. John Wiley & Sons, Chichester, UK, 1997.

[6] M.J. Todd and Y. Ye. A lower bound on the number of iterations of long-step primal-dual linear programming algorithms. Annals of Operations Research, 62:233–252, 1996. Interior point methods in mathematical programming.

[7] Y. Ye.
Interior Point Algorithms: Theory and Analysis. John Wiley & Sons, Chichester, UK, 1997.

A Appendix: A technical lemma

Lemma A.1 For $i = 1, \dots, n$, let $f_i : \mathbb{R}_+ \to \mathbb{R}$ denote a convex function. Then for any nonzero vector $z \in \mathbb{R}^n_+$ the following inequality holds:

\[ \sum_{i=1}^n f_i(z_i) \le \frac{1}{e^T z} \sum_{j=1}^n z_j \left[ f_j(e^T z) + \sum_{i \ne j} f_i(0) \right]. \]
Proof: We define the function $F : \mathbb{R}^n_+ \to \mathbb{R}$ by

\[ F(z) = \sum_{i=1}^n f_i(z_i), \quad z \ge 0. \]

Letting $e_j$ denote the $j$-th unit vector in $\mathbb{R}^n$, we may write $z$ as a convex combination of the vectors $(e^T z)\, e_j$, as follows:

\[ z = \sum_{j=1}^n \frac{z_j}{e^T z}\, (e^T z)\, e_j. \]

Indeed, $\sum_{j=1}^n z_j / e^T z = 1$ and $z_j / e^T z \ge 0$ for each $j$. Since $F(z)$ is a sum of convex functions, $F(z)$ is convex in $z$, and hence we have

\[ F(z) \le \sum_{j=1}^n \frac{z_j}{e^T z}\, F\big( (e^T z)\, e_j \big) = \sum_{j=1}^n \frac{z_j}{e^T z} \sum_{i=1}^n f_i\big( (e^T z)\, (e_j)_i \big). \]

Since $(e_j)_i = 1$ if $i = j$ and zero if $i \ne j$, we obtain

\[ F(z) \le \sum_{j=1}^n \frac{z_j}{e^T z} \left[ f_j(e^T z) + \sum_{i \ne j} f_i(0) \right]. \]

Hence the inequality in the lemma follows. □
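As a sanity check of Lemma A.1 (ours, not part of the paper), one can evaluate both sides for a concrete family of convex functions, for instance $f_i(t) = (t - i)^2$ with $z = (0.5, 1.0, 1.5)$, so that $e^T z = 3$:

```python
# Numerical check of Lemma A.1 with f_i(t) = (t - i)^2 (convex on R_+)
# and z = (0.5, 1.0, 1.5); an illustration of ours, not from the paper.
def f(i, t):
    return (t - i) ** 2

z = [0.5, 1.0, 1.5]
n = len(z)
ez = sum(z)                                   # e^T z = 3.0

lhs = sum(f(i + 1, z[i]) for i in range(n))   # sum_i f_i(z_i)
rhs = (1.0 / ez) * sum(
    z[j] * (f(j + 1, ez) + sum(f(i + 1, 0.0) for i in range(n) if i != j))
    for j in range(n)
)
print(lhs, rhs)   # lhs = 3.5, rhs is about 9.0: the inequality holds
```

Here the left-hand side is $0.25 + 1 + 2.25 = 3.5$ and the right-hand side is $\tfrac13(8.5 + 11 + 7.5) = 9$, consistent with the lemma.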