Full Newton step polynomial time methods for LO based on locally self concordant barrier functions
(work in progress)

Kees Roos and Hossein Mansouri
e-mail: [C.Roos,H.Mansouri]@ewi.tudelft.nl
URL: http://www.isa.ewi.tudelft.nl/ roos

Georgia Tech, Atlanta, GA
November 21, A.D. 2005
Outline

  Self-concordant (barrier) functions
  Definitions
  Newton step and proximity measure
  Algorithm with full Newton steps
  Complexity analysis
  Minimization of a linear function over a convex domain
  Algorithm with full Newton steps
  Complexity analysis
  Kernel-function-based approach
  Linear optimization via self-dual embedding
  Central path of self-dual problem
  Kernel-function-based barrier functions
  Complexity results
  Local self-concordancy of kernel-function-based barrier functions
  Analysis of the full Newton step method
  Concluding remarks
  Some references
Self-concordant univariate functions

We start by considering a univariate function $\phi : D \to \mathbb{R}$. The domain $D$ of the function $\phi$ must be an open interval in $\mathbb{R}$. One calls $\phi$ a $\kappa$-self-concordant (SC) function if there exists a nonnegative number $\kappa$ such that
$$|\phi'''(x)| \le 2\kappa\,\left(\phi''(x)\right)^{3/2}, \quad \forall x \in D. \qquad (1)$$
Note that this definition assumes that $\phi''(x)$ is nonnegative, whence $\phi$ is convex, and moreover that $\phi$ is three times differentiable.

Moreover, if $\phi''(x) > 0$ for all $x \in D$, then $\phi$ is SC if and only if
$$\frac{\left(\phi'''(x)\right)^2}{\left(\phi''(x)\right)^3}$$
is bounded above (by $4\kappa^2$).
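As a small illustration of definition (1), the following sketch evaluates the pointwise ratio $|\phi'''(x)| / (2\,\phi''(x)^{3/2})$ for two functions on $(0,\infty)$. The two test functions and all function names are assumptions chosen for illustration; they do not come from the slides. For $-\ln x$ the ratio is 1 everywhere, so it is 1-SC, whereas for $1/x$ the ratio grows like $\sqrt{x}$, so no finite $\kappa$ works on the whole half-line.

```python
def local_kappa(d2, d3):
    """Smallest kappa satisfying |phi'''| <= 2*kappa*(phi'')**1.5 at one point.

    d2 and d3 are the second and third derivative at that point (d2 > 0).
    """
    return abs(d3) / (2.0 * d2 ** 1.5)


def kappa_log(x):
    # phi(x) = -ln x:  phi'' = 1/x^2,  phi''' = -2/x^3
    return local_kappa(1.0 / x**2, -2.0 / x**3)


def kappa_reciprocal(x):
    # phi(x) = 1/x:  phi'' = 2/x^3,  phi''' = -6/x^4
    return local_kappa(2.0 / x**3, -6.0 / x**4)


if __name__ == "__main__":
    for x in (0.01, 1.0, 100.0):
        print(x, kappa_log(x), kappa_reciprocal(x))
```

The ratio is computed pointwise, which is the same quantity that is later called the local value of $\kappa$.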
Self-concordant (multivariate) functions

Let $\phi : D \to \mathbb{R}$ be a strictly convex function, where the domain $D$ is an open convex subset of $\mathbb{R}^n$, with $n > 1$. So $\phi$ is a multivariate function. Then $\phi$ is called a $\kappa$-SC function if its restriction to an arbitrary line in its domain is $\kappa$-SC. In other words, $\phi$ is $\kappa$-SC if and only if $\varphi(t) := \phi(x + th)$ is $\kappa$-SC for all $x \in D$ and for all $h \in \mathbb{R}^n$. The domain of $\varphi(t)$ is defined in the natural way: given $x$ and $h$ it consists of all $t$ such that $x + th \in D$.

We want to find the minimal value of $\phi$ on its domain (if it exists) by Newton's method.
Newton step and proximity measure

Let $\phi : D \to \mathbb{R}$ be a strictly convex $\kappa$-SC function having a minimizer, and such that the minimal value equals 0. The Newton step at $x$ is defined by
$$\Delta x = -H(x)^{-1} g(x), \qquad (2)$$
where $g(x)$ and $H(x)$ denote the gradient and the Hessian of $\phi(x)$ at $x$, respectively. In the sequel we always assume that $\phi$ is strictly convex. As a consequence, the quantity
$$\lambda(x) = \sqrt{\Delta x^T H(x)\,\Delta x} = \|\Delta x\|_{H(x)} = \sqrt{g(x)^T H(x)^{-1} g(x)}$$
can be used as a measure for the distance of $x$ to the minimizer of $\phi(x)$.

The quantity $\lambda(x)$ plays a crucial role in the analysis of Newton's method. Many results can be nicely expressed by using the univariate (nonnegative) function $\omega(t)$ defined by
$$\omega(t) = t - \ln(1 + t), \quad t > -1. \qquad (3)$$
For example, if $\lambda(x) < \frac{1}{\kappa}$ then one has
$$\phi(x) \le -\frac{\kappa\lambda(x) + \ln(1 - \kappa\lambda(x))}{\kappa^2} = \frac{\omega(-\kappa\lambda(x))}{\kappa^2}.$$
Hence, since $\omega(t)$ is monotonically decreasing if $t \in (-1, 0]$, we obtain
$$\lambda(x) \le \frac{1}{4\kappa} \;\Longrightarrow\; \phi(x) \le \frac{\omega(-\frac14)}{\kappa^2} = \frac{0.0376821}{\kappa^2} \le \frac{1}{26\kappa^2}.$$
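The numerical constants on this slide can be checked directly. The sketch below evaluates $\omega(-\frac14)$ and, as an assumed test case that is not taken from the slides, compares $\phi(x)$ with the bound $\omega(-\kappa\lambda(x))/\kappa^2$ for the univariate function $\phi(x) = x - 1 - \ln x$, which is 1-SC with minimal value 0 at $x = 1$ and has $\lambda(x) = |x - 1|$.

```python
import math


def omega(t):
    """omega(t) = t - ln(1 + t), defined for t > -1 (equation (3))."""
    if t <= -1.0:
        raise ValueError("omega(t) requires t > -1")
    return t - math.log1p(t)


def phi(x):
    # Assumed test function: 1-self-concordant, minimal value 0 at x = 1.
    return x - 1.0 - math.log(x)


def proximity(x):
    # lambda(x) = |g(x)| / sqrt(H(x)) with g = 1 - 1/x and H = 1/x^2.
    return abs(1.0 - 1.0 / x) * x


def value_bound(lam, kappa=1.0):
    """Upper bound omega(-kappa*lam) / kappa^2, valid when kappa*lam < 1."""
    return omega(-kappa * lam) / kappa**2


if __name__ == "__main__":
    print(omega(-0.25), 1.0 / 26.0)
    for x in (0.8, 0.95, 1.1, 1.25):
        print(x, phi(x), value_bound(proximity(x)))
```

For this particular function the bound is attained when $x < 1$ and is strict when $x > 1$.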
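Quadratic convergence result

A major result in the theory of self-concordant functions states that the Newton process is quadratically convergent if $3\kappa\lambda(x) < 1$. This is because of the following result.

Lemma 1 If $\kappa\lambda(x) < 1$ then $x + \Delta x \in D$ and
$$\lambda(x + \Delta x) \le \kappa \left( \frac{\lambda(x)}{1 - \kappa\lambda(x)} \right)^2.$$

Corollary 1 If $3\kappa\lambda(x) < 1$ then $x + \Delta x \in D$ and
$$\lambda(x + \Delta x) \le \left( \tfrac{3}{2}\,\lambda(x)\,\sqrt{\kappa} \right)^2.$$
The corollary follows from Lemma 1 because $\kappa\lambda(x) < \frac13$ gives $\frac{1}{1 - \kappa\lambda(x)} < \frac32$.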
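Algorithm with full Newton steps

Assuming that we know a point $x \in D$ with $\lambda(x) \le \frac{1}{3\kappa}$ we can easily obtain a point $x \in D$ such that $\lambda(x) \le \epsilon$, for prescribed $\epsilon > 0$, with the following algorithm.

Input:
  An accuracy parameter $\epsilon \in (0,1)$;
  $x \in D$ such that $\lambda(x) \le \frac{1}{3\kappa}$.
while $\lambda(x) \ge \epsilon$ do
  $x := x + \Delta x$
endwhile

Theorem 1 Let $x \in D$ and $\lambda(x) \le \frac{1}{3\kappa}$. Then the algorithm with full Newton steps requires at most
$$\left\lceil \log_2 \frac{\log \epsilon}{\log \frac34} \right\rceil \le \left\lceil \log_2 \left( 3.5 \log \frac{1}{\epsilon} \right) \right\rceil$$
iterations. The output is a point $x \in D$ such that $\lambda(x) \le \epsilon$.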
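The loop above can be sketched in a few lines. The example function $\phi(x) = x - 1 - \ln x$ (1-SC, minimizer $x = 1$) and the starting point are assumptions made for illustration only; for this function a full Newton step maps $\lambda$ to exactly $\lambda^2$, so the quadratic convergence of Lemma 1 is visible in the recorded proximity values.

```python
import math

KAPPA = 1.0  # phi(x) = x - 1 - ln x is 1-self-concordant (assumed example)


def gradient(x):
    return 1.0 - 1.0 / x


def hessian(x):
    return 1.0 / x**2


def proximity(x):
    return abs(gradient(x)) / math.sqrt(hessian(x))


def full_newton(x, eps):
    """Repeat x := x + dx while lambda(x) >= eps; return x and all lambdas."""
    assert proximity(x) <= 1.0 / (3.0 * KAPPA)
    history = [proximity(x)]
    while history[-1] >= eps:
        x = x - gradient(x) / hessian(x)  # full Newton step, no line search
        history.append(proximity(x))
    return x, history


def iteration_bound(eps):
    """Bound of Theorem 1, with natural logarithms."""
    return math.ceil(math.log2(math.log(eps) / math.log(0.75)))


if __name__ == "__main__":
    x, history = full_newton(1.3, 1e-8)
    print(x, len(history) - 1, iteration_bound(1e-8))
    print(history)
```

The number of Newton steps taken is `len(history) - 1`, which can be compared with `iteration_bound(eps)`.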
Minimization of a linear function over a convex domain

We consider the problem of minimizing a linear function over a closed convex domain $\mathcal{D}$:
$$(P) \quad \min \left\{ c^T x : x \in \mathcal{D} \right\}.$$
We assume that we have a self-concordant barrier function $\phi : D \to \mathbb{R}$, where $D = \operatorname{int} \mathcal{D}$, and also that $H(x) = \nabla^2 \phi(x)$ is positive definite for every $x \in D$. For $\mu > 0$ we define
$$\phi_\mu(x) := \frac{c^T x}{\mu} + \phi(x), \quad x \in D,$$
$$(P_\mu) \quad \inf \left\{ \phi_\mu(x) : x \in D \right\}.$$
We have
$$g_\mu(x) := \nabla \phi_\mu(x) = \frac{c}{\mu} + \nabla \phi(x) = \frac{c}{\mu} + g(x),$$
$$H_\mu(x) := \nabla^2 \phi_\mu(x) = \nabla^2 \phi(x) = H(x),$$
$$\nabla^3 \phi_\mu(x) = \nabla^3 \phi(x).$$
Note that the two higher derivatives do not depend on $\mu$. It follows that $\phi_\mu(x)$ is self-concordant, with the same $\kappa$ as $\phi$. The minimizer of $\phi_\mu(x)$, if it exists, is denoted as $x(\mu)$. When $\mu$ runs through all positive numbers then $x(\mu)$ runs through the central path of $(P)$.

We expect that $x(\mu)$ converges to an optimal solution of $(P)$ when $\mu$ approaches 0. Therefore we are going to follow the central path. This approach is likely to be feasible: since $\phi_\mu(x)$ is self-concordant, its minimizer can be computed efficiently.
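Newton step and proximity measure

$$\phi_\mu(x) := \frac{c^T x}{\mu} + \phi(x), \quad x \in D,$$
$$g_\mu(x) := \nabla \phi_\mu(x) = \frac{c}{\mu} + \nabla \phi(x) = \frac{c}{\mu} + g(x),$$
$$H_\mu(x) := \nabla^2 \phi_\mu(x) = \nabla^2 \phi(x) = H(x), \qquad \nabla^3 \phi_\mu(x) = \nabla^3 \phi(x).$$
The Newton step at $x$ is now given by
$$\Delta x = -H(x)^{-1} g_\mu(x)$$
and the distance of $x \in D$ to the $\mu$-center $x(\mu)$ is measured by the quantity
$$\lambda_\mu(x) = \sqrt{\Delta x^T H(x)\,\Delta x} = \sqrt{g_\mu(x)^T H(x)^{-1} g_\mu(x)} = \|g_\mu(x)\|_{H^{-1}}.$$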
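Effect of a $\mu$-update and the barrier parameter $\nu$

Let $\lambda = \lambda_\mu(x)$ and $\mu^+ = (1 - \theta)\mu$. Our aim is to estimate $\lambda_{\mu^+}(x)$. We have
$$g_{\mu^+}(x) = \frac{c}{\mu^+} + \nabla\phi(x) = \frac{c}{(1-\theta)\mu} + \nabla\phi(x) = \frac{1}{1-\theta} \left( \frac{c}{\mu} + \nabla\phi(x) - \theta\,\nabla\phi(x) \right) = \frac{1}{1-\theta} \left( g_\mu(x) - \theta\,\nabla\phi(x) \right).$$
Hence, denoting $H(x)$ shortly as $H$,
$$\lambda_{\mu^+}(x) = \frac{1}{1-\theta} \left\| g_\mu(x) - \theta\,\nabla\phi(x) \right\|_{H^{-1}} \le \frac{1}{1-\theta} \Big( \underbrace{\| g_\mu(x) \|_{H^{-1}}}_{\lambda_\mu(x)} + \theta\, \| g(x) \|_{H^{-1}} \Big).$$

Definition 1 Let $\nu \ge 0$. The self-concordant barrier function $\phi$ is called a $\nu$-barrier if
$$\lambda(x)^2 = \| g(x) \|^2_{H^{-1}} \le \nu, \quad \forall x \in D.$$
An immediate consequence of this definition is

Lemma 2 If $\phi$ is a self-concordant $\nu$-barrier then
$$\lambda_{\mu^+}(x) \le \frac{\lambda_\mu(x) + \theta\sqrt{\nu}}{1 - \theta}.$$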
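Algorithm with full Newton steps

Input:
  An accuracy parameter $\epsilon > 0$;
  proximity parameter $\tau > 0$;
  update parameter $\theta$, $0 < \theta < 1$;
  $x = x^0 \in D$ and $\mu = \mu^0 > 0$ such that $\lambda_\mu(x) \le \tau < \frac{1}{\kappa}$.
while $\mu \left( \nu + \frac{\tau(\tau + \sqrt{\nu})}{1 - \kappa\tau} \right) \ge \epsilon$ do
  $\mu := (1 - \theta)\mu$;
  $x := x + \Delta x$;
endwhile

Theorem 2 If $\tau = \frac{1}{9\kappa}$ and $\theta = \frac{5}{9 + 36\kappa\sqrt{\nu}}$, then the algorithm with full Newton steps requires not more than
$$2 \left( 1 + 4\kappa\sqrt{\nu} \right) \ln \frac{2\mu^0 \nu}{\epsilon}$$
iterations. The output is a point $x \in D$ such that $c^T x \le c^T x^* + \epsilon$, where $x^*$ denotes an optimal solution of $(P)$.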
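A minimal sketch of this path-following loop, on an assumed toy instance that is not from the slides: minimize $x$ over $[0, 1]$ with the barrier $\phi(x) = -\ln x - \ln(1 - x)$. By the sum rule for barriers, $\kappa = 1$ and $\nu = 2$ are valid parameters for this $\phi$. The starting pair $x^0 = 0.5$, $\mu^0 = 4$ is chosen so that $\lambda_\mu(x) \le \tau$ holds initially.

```python
import math

KAPPA = 1.0  # assumed: -ln x - ln(1 - x) is 1-self-concordant
NU = 2.0     # assumed: valid barrier parameter for the sum of two log terms
C = 1.0      # objective: minimize C * x over [0, 1], optimal value 0


def grad_hess(x):
    g = -1.0 / x + 1.0 / (1.0 - x)
    h = 1.0 / x**2 + 1.0 / (1.0 - x) ** 2
    return g, h


def proximity(x, mu):
    g, h = grad_hess(x)
    return abs(C / mu + g) / math.sqrt(h)


def path_following(eps, x=0.5, mu=4.0):
    tau = 1.0 / (9.0 * KAPPA)
    theta = 5.0 / (9.0 + 36.0 * KAPPA * math.sqrt(NU))
    assert proximity(x, mu) <= tau, "starting point too far from the path"
    gap_factor = NU + tau * (tau + math.sqrt(NU)) / (1.0 - KAPPA * tau)
    iterations, worst = 0, proximity(x, mu)
    while mu * gap_factor >= eps:
        mu *= 1.0 - theta               # mu-update
        g, h = grad_hess(x)
        x -= (C / mu + g) / h           # full Newton step for phi_mu
        iterations += 1
        worst = max(worst, proximity(x, mu))
    return x, mu, iterations, worst


def theorem2_bound(eps, mu0=4.0):
    return 2.0 * (1.0 + 4.0 * KAPPA * math.sqrt(NU)) * math.log(2.0 * mu0 * NU / eps)


if __name__ == "__main__":
    print(path_following(1e-3), theorem2_bound(1e-3))
```

The returned `worst` value is the largest proximity seen after a Newton step, which Theorem 2's analysis predicts stays below $\tau = \frac{1}{9\kappa}$.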
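Graphical illustration of full-Newton-step path-following method

[Figure: one iteration. The plot shows the central path $\mu e$, the neighborhood $\lambda(x) \le \tau$, the iterates $z^0$ and $z^1$ with $z^k = x^k s^k$, and the target $(1 - \theta)\mu e$.]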
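Relevant part of the analysis of the algorithm

At the start of the first iteration we have $x \in D$ and $\mu = \mu^0$ such that $\lambda_\mu(x) \le \tau$. When the barrier parameter is updated to $\mu^+ = (1 - \theta)\mu$, Lemma 2 gives
$$\lambda_{\mu^+}(x) \le \frac{\lambda_\mu(x) + \theta\sqrt{\nu}}{1 - \theta} \le \frac{\tau + \theta\sqrt{\nu}}{1 - \theta}. \qquad (4)$$
Then after the Newton step, the new iterate is $x^+ = x + \Delta x$ and
$$\lambda_{\mu^+}(x^+) \le \kappa \left( \frac{\lambda_{\mu^+}(x)}{1 - \kappa\lambda_{\mu^+}(x)} \right)^2. \qquad (5)$$
The algorithm is well defined if we choose $\tau$ and $\theta$ such that $\lambda_{\mu^+}(x^+) \le \tau$. To get the lowest iteration bound, we need at the same time to maximize $\theta$. From (5) we deduce that $\lambda_{\mu^+}(x^+) \le \tau$ certainly holds if
$$\frac{\lambda_{\mu^+}(x)}{1 - \kappa\lambda_{\mu^+}(x)} \le \frac{\sqrt{\tau}}{\sqrt{\kappa}},$$
which is equivalent to $\lambda_{\mu^+}(x) \le \frac{\sqrt{\tau}}{\kappa\sqrt{\tau} + \sqrt{\kappa}}$. According to (4) this will hold if
$$\frac{\tau + \theta\sqrt{\nu}}{1 - \theta} \le \frac{\sqrt{\tau}}{\kappa\sqrt{\tau} + \sqrt{\kappa}}.$$
This leads to the following condition on $\theta$:
$$\theta \le \frac{\sqrt{\tau}\left( 1 - \sqrt{\kappa\tau} - \kappa\tau \right)}{\sqrt{\tau} + \sqrt{\nu\kappa}\left( 1 + \sqrt{\kappa\tau} \right)}.$$
We choose $\tau = \frac{1}{9\kappa}$. The upper bound for $\theta$ gets the value $\frac{5}{9 + 36\kappa\sqrt{\nu}} \ge \frac{1}{2 + 8\kappa\sqrt{\nu}}$, and then $\lambda_{\mu^+}(x) \le \frac{1}{4\kappa}$. This justifies the choice of the value of $\tau$ and $\theta$ in the theorem. For the rest of the proof we refer to the relevant references.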
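The arithmetic behind the choice $\tau = \frac{1}{9\kappa}$ can be checked numerically. The sketch below evaluates the general upper bound on $\theta$, then plugs the resulting $\theta$ into the worst cases of (4) and (5). The function names are hypothetical and the sample values of $\kappa$ and $\nu$ are arbitrary.

```python
import math


def theta_upper_bound(tau, kappa, nu):
    """Right-hand side of the condition on theta derived from (4) and (5)."""
    r = math.sqrt(kappa * tau)
    numerator = math.sqrt(tau) * (1.0 - r - kappa * tau)
    denominator = math.sqrt(tau) + math.sqrt(nu * kappa) * (1.0 + r)
    return numerator / denominator


def one_iteration_worst_case(kappa, nu):
    tau = 1.0 / (9.0 * kappa)
    theta = theta_upper_bound(tau, kappa, nu)
    # Worst case of (4): proximity right after the mu-update.
    lam_update = (tau + theta * math.sqrt(nu)) / (1.0 - theta)
    # Worst case of (5): proximity after the full Newton step.
    lam_newton = kappa * (lam_update / (1.0 - kappa * lam_update)) ** 2
    return tau, theta, lam_update, lam_newton


if __name__ == "__main__":
    for kappa, nu in ((1.0, 2.0), (1.25, 100.0), (3.0, 7.5)):
        print(kappa, nu, one_iteration_worst_case(kappa, nu))
```

With the maximal $\theta$ both worst-case inequalities hold with equality, which is why $\tau$ is recovered exactly after one iteration.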
Linear optimization via self-dual embedding (1)

It is now well known that every linear optimization problem can be solved efficiently if we can find in polynomial time a strictly complementary solution of problems of the form
$$(SP) \quad \min \left\{ q^T x : Mx + q \ge 0,\ x \ge 0 \right\},$$
where the $n \times n$ matrix $M$ is skew-symmetric (i.e., $M^T = -M$) and $q = (0; \ldots; 0; n) \in \mathbb{R}^n$, and under the assumption that the all-one vector $\mathbf{1}$ is feasible with $M\mathbf{1} + q = \mathbf{1}$.

The problem $(SP)$ is trivial in the sense that it has a trivial optimal solution, namely $x = 0$, with 0 as optimal value. But this observation is not sufficient for our goal, since we need a strictly complementary solution of $(SP)$. What this means requires some explanation.
Linear optimization via self-dual embedding (2)

We associate to any vector $x \in \mathbb{R}^n$ its slack vector $s(x)$ according to
$$s(x) = Mx + q.$$
In the sequel we simply denote $s(x)$ as $s$, and $s$ will always have this meaning. Since $M$ is skew-symmetric we have $z^T M z = 0$ for every vector $z \in \mathbb{R}^n$. Hence we have
$$q^T x = (s - Mx)^T x = s^T x - x^T M x = s^T x.$$
Therefore, if $x$ is feasible, then $x$ is optimal if and only if $s^T x = 0$. Since $x$ and $s$ are nonnegative this holds if and only if $x_i s_i = 0$ for each $i$. This shows that $x$ is optimal if and only if the vectors $x$ and $s$ are complementary vectors. We say that $x$ is a strictly complementary solution if moreover $x_i + s_i > 0$ for each $i$.

Summarizing these facts, we have that $x$ is feasible if $x \ge 0$ and $s \ge 0$. A feasible $x$ is optimal if $xs = 0$ (componentwise product), and $x$ is a strictly complementary solution if moreover $x + s > 0$. Thus we need to solve the system
$$s = Mx + q, \quad x \ge 0, \quad s \ge 0, \quad xs = 0, \quad x + s > 0.$$
Central path

The basic idea of IPMs is to replace the so-called complementarity condition $xs = 0$ for $(SP)$ by the parameterized equation $xs = \mu\mathbf{1}$, with $\mu > 0$. Thus we consider the system
$$s = Mx + q, \quad x \ge 0, \quad s \ge 0, \quad xs = \mu\mathbf{1}.$$
Clearly, any solution $(x, s)$ will satisfy $x > 0$ and $s > 0$. Note that $x = s = \mathbf{1}$ and $\mu = 1$ satisfy this system.

Surprisingly enough, a solution exists for each $\mu > 0$, and this solution is unique. It is denoted as $(x(\mu), s(\mu))$ and we call $x(\mu)$ the $\mu$-center of $(SP)$; $s(\mu)$ is the corresponding slack vector. The set of $\mu$-centers (with $\mu$ running through all positive real numbers) gives a homotopy path, which is called the central path of $(SP)$.

If $\mu \to 0$ then the limit of the central path exists and since the limit point satisfies the complementarity condition, the limit yields an optimal solution for $(SP)$. Moreover, this solution can be shown to be strictly complementary.

We will start our method at $x = s = \mathbf{1}$ and $\mu = 1$. The method uses nonnegative barrier functions $\phi_\mu(x, s)$, for each $\mu > 0$, such that $\phi_\mu(x(\mu), s(\mu)) = 0$. If $s = Mx + q$ then we denote $\phi_\mu(x, s)$ as $\Phi_\mu(x)$.
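To make the self-dual model and its central path concrete, here is a sketch on an assumed $2 \times 2$ instance that is not from the slides: $M = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, $q = (0; 2)$, which satisfies $M\mathbf{1} + q = \mathbf{1}$. For this instance the $\mu$-center has the closed form $x(\mu) = (1, \mu)$, $s(\mu) = (\mu, 1)$, so the limit for $\mu \to 0$ is visibly strictly complementary.

```python
# Assumed toy instance of (SP): M skew-symmetric, q = (0; n) with n = 2.
M = [[0.0, 1.0], [-1.0, 0.0]]
Q = [0.0, 2.0]


def slack(x):
    """s(x) = M x + q."""
    return [sum(M[i][j] * x[j] for j in range(2)) + Q[i] for i in range(2)]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def mu_center(mu):
    """Closed-form mu-center, valid for this particular instance only."""
    return [1.0, mu]


if __name__ == "__main__":
    for mu in (1.0, 0.1, 1e-6):
        x = mu_center(mu)
        s = slack(x)
        # q^T x equals s^T x because z^T M z = 0 for skew-symmetric M.
        print(mu, x, s, [xi * si for xi, si in zip(x, s)], dot(Q, x), dot(s, x))
```

At $\mu = 1$ the code reproduces the starting point $x = s = \mathbf{1}$ used by the method.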
Kernel-function-based barrier functions

First we choose a kernel function $\psi : (0, \infty) \to [0, \infty)$. We require that $\psi(t)$ is three times differentiable and strictly convex, and moreover that $\psi(t)$ is minimal at $t = 1$, whereas $\psi(1) = 0$. Then we define
$$\Phi_\mu(x) := \phi_\mu(x, s) = 2 \sum_{i=1}^n \psi(v_i), \quad \text{where } v := \sqrt{\frac{xs}{\mu}}, \quad s = Mx + q.$$
The barrier function $\Phi_\mu(x)$ based on the kernel function $\psi(t)$ is defined on the interior of the domain of $(SP)$. $\phi_\mu(x, s)$ is strictly convex and minimal when $v = \mathbf{1}$, and then $x = x(\mu)$ (and $s = s(\mu)$).

Provided that $\theta$ is small enough, after a full Newton step we get a good enough approximation of $x = x(\mu)$. Then we repeat the above process: reduce $\mu$ by the factor $1 - \theta$, do a full Newton step, etc., until $\mu$ is close enough to zero. At the end this yields an $\epsilon$-solution of the problem $(SP)$.

In earlier papers we used a search direction determined by the system
$$M \Delta x = \Delta s, \qquad s\,\Delta x + x\,\Delta s = -\mu\, v\, \psi'(v).$$
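Complexity results

i | kernel function $\psi_i(t)$ | small-update | large-update | ref.
1 | $\frac{t^2 - 1}{2} - \ln t$ | $O\left(\sqrt{n} \ln \frac{n}{\epsilon}\right)$ | $O\left(n \ln \frac{n}{\epsilon}\right)$ | RTV
2 | $\frac12 \left( t - \frac1t \right)^2$ | $O\left(\sqrt{n} \ln \frac{n}{\epsilon}\right)$ | $O\left(n^{2/3} \ln \frac{n}{\epsilon}\right)$ | PRT
3 | $\frac{t^2 - 1}{2} + \frac{t^{1-q} - 1}{q - 1}$, $q > 1$ | $O\left(q^2 \sqrt{n} \ln \frac{n}{\epsilon}\right)$ | $O\left(q\, n^{\frac{q+1}{2q}} \ln \frac{n}{\epsilon}\right)$ | PRT
4 | $\frac{t^2 - 1}{2} + \frac{t^{1-q} - 1}{q(q-1)} - \frac{q-1}{q}(t - 1)$, $q > 1$ | $O\left(q \sqrt{n} \ln \frac{n}{\epsilon}\right)$ | $O\left(q\, n^{\frac{q+1}{2q}} \ln \frac{n}{\epsilon}\right)$ | PRT
5 | $\frac{t^2 - 1}{2} + \frac{e^{1/t} - e}{e}$ | $O\left(\sqrt{n} \ln \frac{n}{\epsilon}\right)$ | $O\left(\sqrt{n} \ln^2 n \ln \frac{n}{\epsilon}\right)$ | BER
6 | $\frac{t^2 - 1}{2} - \int_1^t e^{\frac{1}{\xi} - 1}\, d\xi$ | $O\left(\sqrt{n} \ln \frac{n}{\epsilon}\right)$ | $O\left(\sqrt{n} \ln^2 n \ln \frac{n}{\epsilon}\right)$ | BER
7 | $t - 1 + \frac{t^{1-q} - 1}{q - 1}$, $q > 1$ | $O\left(q^2 \sqrt{n} \ln \frac{n}{\epsilon}\right)$ | $O\left(q n \ln \frac{n}{\epsilon}\right)$ | BR

In all cases the iteration bound for small-update methods is $O\left(\sqrt{n} \log \frac{n}{\epsilon}\right)$. The best bound for large-update methods is obtained for $i \in \{3, 4\}$ by taking $q = \frac12 \log n$. This gives the iteration bound $O\left(\sqrt{n}\,(\log n) \log \frac{n}{\epsilon}\right)$, which is currently the best known bound for large-update methods.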
Local self-concordancy of the barrier function

We define $\phi : D \to \mathbb{R}$ to be locally $\kappa$-SC at $x \in D \subseteq \mathbb{R}^n$ if $\phi(x + th)$ is $\kappa$-SC for all $h \in \mathbb{R}^n$; to express the dependence of $\kappa$ on $x$ we use the notation $\kappa(x)$. Clearly $\phi$ is $\kappa$-SC if and only if $\kappa(x)$ is bounded above by some (finite) constant on the domain of $\phi$.

It is well known that the classical logarithmic barrier function, whose kernel function is $\frac{t^2 - 1}{2} - \ln t$, is SC. But this is quite exceptional. In general kernel-function-based barrier functions are not SC, but they are locally SC. The following table shows this for the kernel function $\psi_2(t)$.

i | kernel function $\psi_i(t)$ | small-update bound | large-update bound | local $\kappa$ for $\psi(t)$ | local $\kappa$ for $\Phi_\mu(x)$
1 | $\frac{t^2 - 1}{2} - \ln t$ | $O\left(\sqrt{n} \ln \frac{n}{\epsilon}\right)$ | $O\left(n \ln \frac{n}{\epsilon}\right)$ | $1$ | $1$
2 | $\frac12 \left( t - \frac1t \right)^2$ | $O\left(\sqrt{n} \ln \frac{n}{\epsilon}\right)$ | ? | $\frac{2t}{\sqrt3}$ | $\frac{2\|v\|_\infty}{\sqrt3}$

At the start of the algorithm we have $v = \mathbf{1}$, where the local value of $\kappa$ is $2/\sqrt3$. During the course of the algorithm the iterates stay so close to the central path that $v$ stays in a very small neighborhood of $\mathbf{1}$, and hence the barrier function is SC for some suitable value of $\kappa$, slightly larger than $2/\sqrt3$.
Assumptions on the kernel function

We assume that
$$\psi(t) = \tfrac12 \left( t^2 - 1 \right) + \psi_b(t) \qquad (6)$$
and we make the following assumptions:
$$\psi_b'(t) < 0, \quad \psi_b''(t) > 0, \quad \psi_b'''(t) < 0, \quad t > 0.$$
It will be convenient to use the following notations (for $t > 0$):
$$\xi(t) := \psi''(t) - \frac{\psi'(t)}{t}, \qquad \xi_b(t) = \psi_b''(t) - \frac{\psi_b'(t)}{t}. \qquad (7)$$
Note that these definitions imply
$$\xi(t) = \xi_b(t) > 0, \quad t > 0.$$
Consequences of the assumption

For $x > 0$ and $s > 0$ we have
$$\phi_\mu(x, s) = 2 \sum_{i=1}^n \psi(v_i) = \sum_{i=1}^n \left( v_i^2 - 1 \right) + 2 \sum_{i=1}^n \psi_b(v_i).$$
Hence, if $s := s(x) = Mx + q$ then
$$\Phi_\mu(x) = \phi_\mu(x, s) = \frac{x^T s - n\mu}{\mu} + 2 \sum_{j=1}^n \psi_b(v_j) = \frac{q^T x - n\mu}{\mu} + 2 \sum_{j=1}^n \psi_b\left( \sqrt{\frac{x_j s_j}{\mu}} \right).$$
In the special case that $\psi(t)$ is the kernel function of the logarithmic barrier function we have $\psi_b(t) = -\ln t$, whence
$$\phi_\mu(x, s) = \frac{q^T x}{\mu} - \sum_{j=1}^n \ln\left( x_j s_j \right) + n \ln \mu - n,$$
which is (up to the constant term $n \ln \mu - n$) the classical primal-dual logarithmic barrier function.
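Results on local self-concordancy (1)

Let
$$N(t) = \frac{\psi_b'(t)}{\sqrt{\psi_b''(t)}}, \quad t > 0.$$

Theorem 3 $\nu(x, s) = 2\, \|N(v)\|^2$.

It is quite surprising that the local value of $\nu$ depends only on the vector $v$. Recall that if $x \approx x(\mu)$ and $s \approx s(\mu)$ then $v \approx \mathbf{1}$. We give two examples.

i | $\psi_i(t)$ | $\psi_b(t)$ | $\psi_b'(t)$ | $\psi_b''(t)$ | $\psi_b'''(t)$ | $\nu(t)$ | $\nu(v)$
1 | $\frac{t^2 - 1}{2} - \ln t$ | $-\ln t$ | $-\frac1t$ | $\frac{1}{t^2}$ | $-\frac{2}{t^3}$ | $2$ | $2n$
2 | $\frac12 \left( t - \frac1t \right)^2$ | $\frac12 \left( \frac{1}{t^2} - 1 \right)$ | $-\frac{1}{t^3}$ | $\frac{3}{t^4}$ | $-\frac{12}{t^5}$ | $\frac{2}{3t^2}$ | $\frac23 \left\| v^{-1} \right\|^2$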
Proof of Theorem 3

We apply the composition rule, which is well known.

Lemma 3 Let $\phi_i$ be $(\kappa_i, \nu_i)$-SCBs on $D_i$, for $i = 1, 2$. Then $\phi_1 + \phi_2$ is a $(\kappa, \nu)$-SCB for $D_1 \cap D_2$, where $\kappa = \max\{\kappa_1, \kappa_2\}$ and $\nu = \nu_1 + \nu_2$.

Since the linear part in $\phi_\mu(x, s)$ is 0-self-concordant, with $\nu = 0$, it suffices to consider
$$f(x, s) = 2 \sum_{j=1}^n \psi_b\left( \sqrt{\frac{x_j s_j}{\mu}} \right),$$
where $s = s(x) = Mx + q$. In the sequel we will neglect this relation between $s$ and $x$. Thus we will prove that $f(x, s)$ is $(\kappa, \nu)$-self-concordant on the set
$$\left\{ (x, s) : x \in \mathbb{R}^n_+,\ s \in \mathbb{R}^n_+ \right\}.$$
This will imply that $f(x, s)$ is a $(\kappa, \nu)$-self-concordant barrier function for the domain of $(SP)$, which is the intersection of this set and the affine space determined by $s = Mx + q$. We do this by considering each of the terms in the definition separately and then applying the composition rule of Lemma 3.
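The case n = 1 (1)

$$f(x, s) = 2\psi_b\left( \sqrt{\frac{xs}{\mu}} \right), \quad x > 0,\ s > 0.$$
Now let $\sigma, \tau \in \mathbb{R}$ and $\alpha$ be such that $x + \alpha\sigma > 0$ and $s + \alpha\tau > 0$. We define
$$v(\alpha) = \sqrt{\frac{(x + \alpha\sigma)(s + \alpha\tau)}{\mu}}$$
and
$$\varphi(\alpha) = f(x + \alpha\sigma, s + \alpha\tau) = 2\psi_b(v(\alpha)).$$
Writing
$$v = \sqrt{\frac{xs}{\mu}}, \qquad h = \frac{\sigma}{x}, \qquad k = \frac{\tau}{s},$$
we have, using $xs = \mu v^2$,
$$(x + \alpha\sigma)(s + \alpha\tau) = xs\,(1 + \alpha h)(1 + \alpha k) = \mu v^2 (1 + \alpha h)(1 + \alpha k),$$
and hence
$$v(\alpha)^2 = v^2 (1 + \alpha h)(1 + \alpha k).$$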
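The case n = 1 (2)

$$v(\alpha)^2 = v^2 (1 + \alpha h)(1 + \alpha k).$$
Taking successive derivatives with respect to $\alpha$ at both sides we obtain
$$2 v(\alpha) v'(\alpha) = v^2 \left( h(1 + \alpha k) + k(1 + \alpha h) \right),$$
$$v(\alpha) v''(\alpha) + v'(\alpha)^2 = v^2 hk,$$
$$v(\alpha) v'''(\alpha) + 3 v'(\alpha) v''(\alpha) = 0.$$
Substitution of $\alpha = 0$ gives
$$v(0)^2 = v^2, \qquad 2 v'(0) = v\,(h + k), \qquad v\,v''(0) + v'(0)^2 = v^2 hk, \qquad v\,v'''(0) + 3 v'(0) v''(0) = 0.$$
This gives
$$v(0) = v, \qquad v'(0) = \tfrac12 v\,(h + k), \qquad v''(0) = -\tfrac14 v\,(h - k)^2, \qquad v'''(0) = \tfrac38 v\,(h + k)(h - k)^2.$$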
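The case n = 1 (3)

Since
$$\varphi'(\alpha) = 2\psi_b'(v(\alpha))\, v'(\alpha),$$
$$\varphi''(\alpha) = 2 \left[ \psi_b''(v(\alpha))\, v'(\alpha)^2 + \psi_b'(v(\alpha))\, v''(\alpha) \right],$$
$$\varphi'''(\alpha) = 2 \left[ \psi_b'''(v(\alpha))\, v'(\alpha)^3 + 3\psi_b''(v(\alpha))\, v'(\alpha) v''(\alpha) + \psi_b'(v(\alpha))\, v'''(\alpha) \right],$$
it follows that
$$\varphi'(0) = 2\psi_b'(v)\, v'(0),$$
$$\varphi''(0) = 2 \left[ \psi_b''(v)\, v'(0)^2 + \psi_b'(v)\, v''(0) \right],$$
$$\varphi'''(0) = 2 \left[ \psi_b'''(v)\, v'(0)^3 + 3\psi_b''(v)\, v'(0) v''(0) + \psi_b'(v)\, v'''(0) \right].$$
Substitution of the above expressions for $v(0)$, $v'(0)$, $v''(0)$ and $v'''(0)$ yields
$$\varphi'(0) = \psi_b'(v)\, v\,(h + k),$$
$$\varphi''(0) = \tfrac12 \left[ \psi_b''(v)\, v^2 (h + k)^2 - \psi_b'(v)\, v\,(h - k)^2 \right], \qquad (8)$$
$$\varphi'''(0) = \tfrac14 \left[ \psi_b'''(v)\, v^2 (h + k)^2 - 3\,\xi_b(v)\, v\,(h - k)^2 \right] v\,(h + k).$$

Lemma 4 $\phi_\mu(x, s)$ is strictly convex.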
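The three closed-form expressions for $\varphi'(0)$, $\varphi''(0)$ and $\varphi'''(0)$ can be sanity-checked against a case where $\varphi$ is known explicitly. For the logarithmic kernel, $\psi_b(t) = -\ln t$, one has $\varphi(\alpha) = -2\ln v - \ln(1 + \alpha h) - \ln(1 + \alpha k)$, so $\varphi'(0) = -(h + k)$, $\varphi''(0) = h^2 + k^2$ and $\varphi'''(0) = -2(h^3 + k^3)$. The sketch below, with hypothetical function names, compares both.

```python
def phi_derivatives(d1, d2, d3, v, h, k):
    """phi'(0), phi''(0), phi'''(0) from the slide formulas.

    d1, d2, d3 are psi_b'(v), psi_b''(v), psi_b'''(v).
    """
    xi_b = d2 - d1 / v          # definition (7)
    y, z = h + k, h - k
    first = d1 * v * y
    second = 0.5 * (d2 * v**2 * y**2 - d1 * v * z**2)
    third = 0.25 * (d3 * v**2 * y**2 - 3.0 * xi_b * v * z**2) * v * y
    return first, second, third


def log_kernel_derivatives(v):
    # psi_b(t) = -ln t
    return -1.0 / v, 1.0 / v**2, -2.0 / v**3


def log_kernel_exact(h, k):
    # Derivatives at 0 of -2 ln v - ln(1 + a h) - ln(1 + a k).
    return -(h + k), h**2 + k**2, -2.0 * (h**3 + k**3)


if __name__ == "__main__":
    v, h, k = 1.7, 0.4, -0.9
    print(phi_derivatives(*log_kernel_derivatives(v), v, h, k))
    print(log_kernel_exact(h, k))
```

The agreement holds for every $v > 0$, reflecting that for the logarithmic kernel the derivatives do not depend on $v$ at all.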
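Computation of $\nu$

To compute the barrier parameter $\nu$ we need to find an upper bound for
$$\frac{\left( \varphi'(0) \right)^2}{\varphi''(0)} = \frac{\left[ \psi_b'(v)\, v\,(h + k) \right]^2}{\frac12 \left[ \psi_b''(v)\, v^2 (h + k)^2 - \psi_b'(v)\, v\,(h - k)^2 \right]}. \qquad (9)$$
Substituting
$$y = h + k, \qquad z = h - k,$$
we have
$$\nu = \max_{y, z} \frac{\left[ \psi_b'(v)\, v y \right]^2}{\frac12 \left[ \psi_b''(v)\, v^2 y^2 - \psi_b'(v)\, v z^2 \right]} = \frac{2 \left[ \psi_b'(v) \right]^2}{\psi_b''(v)} = 2N(v)^2,$$
where the maximum is attained at $z = 0$, because $-\psi_b'(v)\, v z^2 \ge 0$. Thus we have proved the following lemma.

Lemma 5 If $n = 1$, and $N(t)$ is as defined before, then $\nu(x, s) = 2N(v)^2$.

Theorem 3 If $n \ge 1$ then $\nu = 2\, \|N(v)\|^2$.

Proof: This is an immediate consequence of Lemma 3 and Lemma 5.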
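Results on local self-concordancy (2)

We define
$$K(t) = \frac{1}{\bar\rho(t) \sqrt{2\,(3 - \bar\rho(t))}} \cdot \frac{-\psi_b'''(t)}{\left( \psi_b''(t) \right)^{3/2}},$$
where
$$\rho(t) = \frac{\psi_b'(t)\, \psi_b'''(t)}{\xi_b(t)\, \psi_b''(t)}, \qquad \bar\rho(t) = \min[2, \rho(t)], \qquad \xi_b(t) = \psi_b''(t) - \frac{\psi_b'(t)}{t}.$$

Theorem 4 $\kappa(x, s) = \|K(v)\|_\infty$.

i | $\psi_i(t)$ | $\psi_b(t)$ | $\psi_b'(t)$ | $\psi_b''(t)$ | $\psi_b'''(t)$ | $\xi(t)$ | $\rho(t)$ | $\kappa(t)$ | $\kappa(v)$
1 | $\frac{t^2 - 1}{2} - \ln t$ | $-\ln t$ | $-\frac1t$ | $\frac{1}{t^2}$ | $-\frac{2}{t^3}$ | $\frac{2}{t^2}$ | $1$ | $1$ | $1$
2 | $\frac12 \left( t - \frac1t \right)^2$ | $\frac12 \left( \frac{1}{t^2} - 1 \right)$ | $-\frac{1}{t^3}$ | $\frac{3}{t^4}$ | $-\frac{12}{t^5}$ | $\frac{4}{t^4}$ | $1$ | $\frac{2t}{\sqrt3}$ | $\frac{2\|v\|_\infty}{\sqrt3}$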
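Proof of Theorem 4 (1)

We first consider the case where $n = 1$. Then $\kappa = \kappa(x, s)$ is defined by
$$2\kappa = \max_{h, k} \frac{\varphi'''(0)}{\left( \varphi''(0) \right)^{3/2}} = \max_{h, k} \frac{\frac14 \left[ \psi_b'''(v)\, v^2 (h + k)^2 - 3\,\xi(v)\, v\,(h - k)^2 \right] v\,(h + k)}{\left[ \frac12 \left[ \psi_b''(v)\, v^2 (h + k)^2 - \psi_b'(v)\, v\,(h - k)^2 \right] \right]^{3/2}}.$$
Substituting
$$y = h + k, \qquad z = h - k,$$
we get
$$2\sqrt2\, \kappa = \max_{y, z} \frac{\left[ \psi_b'''(v)\, v^2 y^2 - 3\,\xi(v)\, v z^2 \right] v y}{\left[ \psi_b''(v)\, v^2 y^2 - \psi_b'(v)\, v z^2 \right]^{3/2}}.$$
The last expression is homogeneous in $(y, z)$. It follows that
$$2\sqrt2\, \kappa = \max \left\{ \left[ \psi_b'''(v)\, v^2 y^2 - 3\,\xi(v)\, v z^2 \right] v y \;:\; \psi_b''(v)\, v^2 y^2 - \psi_b'(v)\, v z^2 = 1 \right\}.$$
Before proceeding we recall the definitions of $\rho(t)$ and $\bar\rho(t)$:
$$\rho(t) = \frac{\psi_b'(t)\, \psi_b'''(t)}{\xi(t)\, \psi_b''(t)}, \qquad \bar\rho(t) = \min[2, \rho(t)]. \qquad (10)$$
Note that $\bar\rho(t) \in (0, 2]$.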
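Proof of Theorem 4 (2)

$$2\sqrt2\, \kappa = \max \left\{ \left[ \psi_b'''(v)\, v^2 y^2 - 3\,\xi(v)\, v z^2 \right] v y \;:\; \psi_b''(v)\, v^2 y^2 - \psi_b'(v)\, v z^2 = 1 \right\}. \qquad (11)$$
The optimality conditions are, for some suitable multiplier $\lambda$,
$$3\psi_b'''(v)\, v^3 y^2 - 3\,\xi(v)\, v^2 z^2 = 3\lambda\, \psi_b''(v)\, v^2 y,$$
$$-6\,\xi(v)\, v^2 y z = 3\lambda \left[ -\psi_b'(v)\, v z \right],$$
or, equivalently,
$$\psi_b'''(v)\, v y^2 - \xi(v)\, z^2 = \lambda\, \psi_b''(v)\, y, \qquad 2\,\xi(v)\, v y z = \lambda\, \psi_b'(v)\, z. \qquad (12)$$
We see that either $z = 0$ or $2\,\xi(v)\, v y = \lambda\, \psi_b'(v)$. If $z = 0$ then the constraint in our problem implies that $\psi_b''(v)\, v^2 y^2 = 1$, and hence (since $\psi_b'''(v) < 0$), $\kappa$ is in this case given by
$$2\sqrt2\, \kappa = \frac{-\psi_b'''(v)}{\left( \psi_b''(v) \right)^{3/2}}. \qquad (13)$$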
Proof of Theorem 4 (3)

Now assuming $z \ne 0$, we can eliminate $\lambda$ by substituting $2\,\xi(v)\, v y = \lambda\, \psi_b'(v)$ into (12), which gives
$$\psi_b'(v) \left[ \psi_b'''(v)\, v y^2 - \xi(v)\, z^2 \right] = \lambda\, \psi_b'(v)\, \psi_b''(v)\, y = 2\,\xi(v)\, \psi_b''(v)\, v y^2.$$
Rearranging the terms, and using (10), we obtain
$$-\psi_b'(v)\, \xi(v)\, z^2 = \left[ 2\,\xi(v)\, \psi_b''(v) - \psi_b'(v)\, \psi_b'''(v) \right] v y^2 = (2 - \rho(v))\, \xi(v)\, \psi_b''(v)\, v y^2,$$
yielding
$$-\psi_b'(v)\, z^2 = (2 - \rho(v))\, \psi_b''(v)\, v y^2. \qquad (14)$$
Since $\psi_b''(v) > 0$ and $-\psi_b'(v) > 0$, this equation has no solution with $z \ne 0$ if $\rho(v) > 2$, and hence $\kappa$ is then given by (13). If $\rho(v) \le 2$, substitution of (14) into the constraint $\psi_b''(v)\, v^2 y^2 - \psi_b'(v)\, v z^2 = 1$ yields
$$\psi_b''(v)\, v^2 y^2 + (2 - \rho(v))\, \psi_b''(v)\, v^2 y^2 = 1,$$
or, equivalently,
$$[3 - \rho(v)]\, \psi_b''(v)\, v^2 y^2 = 1. \qquad (15)$$
Hence
$$v y = \frac{\pm 1}{\sqrt{[3 - \rho(v)]\, \psi_b''(v)}}. \qquad (16)$$
Proof of Theorem 4 (4)

The rest of the proof consists of computing the value of the objective function using the relations found so far. Using (14), (10) and (15), respectively, we may write
$$2\sqrt2\, \kappa = \left[ \psi_b'''(v) + 3\,\xi(v)\, \frac{(2 - \rho(v))\, \psi_b''(v)}{\psi_b'(v)} \right] v^3 y^3$$
$$= \frac{1}{\psi_b'(v)} \left[ \psi_b'(v)\, \psi_b'''(v) + 3\,(2 - \rho(v))\, \xi(v)\, \psi_b''(v) \right] v^3 y^3$$
$$= \frac{\xi(v)\, \psi_b''(v)}{\psi_b'(v)} \left[ \rho(v) + 3(2 - \rho(v)) \right] v^3 y^3$$
$$= \frac{2\,\xi(v)}{\psi_b'(v)} \left[ (3 - \rho(v))\, \psi_b''(v)\, v^2 y^2 \right] v y = \frac{2\,\xi(v)}{\psi_b'(v)}\, v y.$$
Finally, using (10) and (16) respectively, and taking the sign in (16) that makes the value positive (since we are maximizing), we get
$$2\sqrt2\, \kappa = \frac{2\,\xi(v)}{-\psi_b'(v) \sqrt{[3 - \rho(v)]\, \psi_b''(v)}} = \frac{2}{\rho(v) \sqrt{3 - \rho(v)}} \cdot \frac{-\psi_b'''(v)}{\left( \psi_b''(v) \right)^{3/2}}.$$
For $\rho(v) = 2$ this yields exactly the same value as in (13). Thus the following holds.

Lemma 6 If $n = 1$, and with $K(t)$ as defined above, we have $\kappa(x, s) = K(v)$.

Theorem 4 If $n \ge 1$ then $\kappa = \|K(v)\|_\infty$.

Proof: This is an immediate consequence of Lemma 3 and Lemma 6.
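Summary of results

From now on we assume that $s = Mx + q$. Our ingredients are

Theorem 3 $\nu(x) = 2\, \|N(v)\|^2$, where $N(t) = \frac{\psi_b'(t)}{\sqrt{\psi_b''(t)}}$.

Theorem 4 $\kappa(x) = \|K(v)\|_\infty$, where
$$K(t) = \frac{1}{\bar\rho(t) \sqrt{2\,(3 - \bar\rho(t))}} \cdot \frac{-\psi_b'''(t)}{\left( \psi_b''(t) \right)^{3/2}},$$
with
$$\rho(t) = \frac{\psi_b'(t)\, \psi_b'''(t)}{\xi_b(t)\, \psi_b''(t)}, \qquad \bar\rho(t) = \min[2, \rho(t)], \qquad \xi_b(t) = \psi_b''(t) - \frac{\psi_b'(t)}{t}.$$

Lemma 7 During the course of the algorithm we have $\lambda(x) \le \frac{1}{4\kappa}$.

This lemma implies
$$\phi_\mu(x, s) = 2 \sum_{i=1}^n \psi(v_i) \le \frac{\omega(-\frac14)}{\kappa^2} = \frac{0.0376821}{\kappa^2} \le \frac{1}{26\kappa^2}.$$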
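The quantities $N(t)$ and $K(t)$ are straightforward to evaluate once the first three derivatives of $\psi_b$ are available. The sketch below, with hypothetical function names, computes the per-coordinate values $2N(t)^2$ and $K(t)$ for the two kernels tabulated earlier and can be compared with the closed forms $2$ and $1$ (logarithmic kernel) and $\frac{2}{3t^2}$ and $\frac{2t}{\sqrt3}$ (kernel $\psi_2$).

```python
import math


def log_kernel(t):
    """(psi_b', psi_b'', psi_b''') for psi_b(t) = -ln t."""
    return -1.0 / t, 1.0 / t**2, -2.0 / t**3


def psi2_kernel(t):
    """(psi_b', psi_b'', psi_b''') for psi_b(t) = (t**-2 - 1) / 2."""
    return -1.0 / t**3, 3.0 / t**4, -12.0 / t**5


def local_nu(kernel, t):
    """Per-coordinate barrier parameter 2 * N(t)**2."""
    d1, d2, _ = kernel(t)
    return 2.0 * d1**2 / d2


def local_kappa(kernel, t):
    """Per-coordinate self-concordance parameter K(t)."""
    d1, d2, d3 = kernel(t)
    xi_b = d2 - d1 / t
    rho = d1 * d3 / (xi_b * d2)
    rho_bar = min(2.0, rho)
    return -d3 / (rho_bar * math.sqrt(2.0 * (3.0 - rho_bar)) * d2**1.5)


if __name__ == "__main__":
    for t in (0.7, 1.0, 1.4):
        print(t, local_nu(log_kernel, t), local_kappa(log_kernel, t),
              local_nu(psi2_kernel, t), local_kappa(psi2_kernel, t))
```

For a vector $v$, the values for the whole barrier function follow by summing `local_nu` over the coordinates and taking the maximum of `local_kappa`.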
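Some examples of barrier functions and their local $\kappa$ and $\nu$ values

i | $\psi_b(t)$ | $\psi_b'(t)$ | $\psi_b''(t)$ | $\psi_b'''(t)$ | $\xi_b(t)$ | $\rho(t)$ | $\nu(t)$ | $\kappa(t)$
1 | $-\log t$ | $-\frac1t$ | $\frac{1}{t^2}$ | $-\frac{2}{t^3}$ | $\frac{2}{t^2}$ | $1$ | $2$ | $1$
2 | $\frac12 \left( t^{-2} - 1 \right)$ | $-\frac{1}{t^3}$ | $\frac{3}{t^4}$ | $-\frac{12}{t^5}$ | $\frac{4}{t^4}$ | $1$ | $\frac{2}{3t^2}$ | $\frac{2t}{\sqrt3}$
3 | $\frac{t^{1-q} - 1}{q - 1}$ | $-t^{-q}$ | $q\,t^{-q-1}$ | $-q(q+1)\,t^{-q-2}$ | $(q+1)\,t^{-q-1}$ | $1$ | $\frac{2}{q\,t^{q-1}}$ | $\frac{(q+1)\,t^{\frac{q-1}{2}}}{2\sqrt{q}}$
4 | $\frac{e^{1/t} - e}{e}$ | $-\frac{e^{1/t-1}}{t^2}$ | $\frac{(1+2t)\,e^{1/t-1}}{t^4}$ | $-\frac{(1+6t+6t^2)\,e^{1/t-1}}{t^6}$ | $\frac{(1+3t)\,e^{1/t-1}}{t^4}$ | $\frac{1+6t+6t^2}{1+5t+6t^2}$ | $\frac{2\,e^{1/t-1}}{1+2t}$ | $\frac{(1+3t)^{3/2}}{\sqrt{2\,(2+9t+12t^2)\,e^{1/t-1}}}$
5 | $-\int_1^t e^{\frac1\xi - 1}\, d\xi$ | $-e^{1/t-1}$ | $\frac{e^{1/t-1}}{t^2}$ | $-\frac{(1+2t)\,e^{1/t-1}}{t^4}$ | $\frac{(1+t)\,e^{1/t-1}}{t^2}$ | $\frac{1+2t}{1+t}$ | $2t^2\,e^{1/t-1}$ | $\frac{(1+t)^{3/2}}{t\sqrt{2\,(2+t)\,e^{1/t-1}}}$
6 | $\frac{e^{\sigma(1-t)} - 1}{\sigma}$ | $-e^{\sigma(1-t)}$ | $\sigma e^{\sigma(1-t)}$ | $-\sigma^2 e^{\sigma(1-t)}$ | $\frac{(1+\sigma t)\,e^{\sigma(1-t)}}{t}$ | $\frac{\sigma t}{1+\sigma t}$ | $\frac{2\,e^{\sigma(1-t)}}{\sigma}$ | $\frac{(1+\sigma t)^{3/2}}{t\sqrt{2\sigma\,(3+2\sigma t)\,e^{\sigma(1-t)}}}$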
Analysis of the algorithm (1)

Note that $\psi(t)$ is monotonically decreasing for $t \le 1$ and monotonically increasing for $t \ge 1$. In the sequel we denote by $\varrho : [0, \infty) \to [1, \infty)$ the inverse function of $\psi(t)$ for $t \ge 1$ and by $\chi : [0, \infty) \to (0, 1]$ the inverse function of $\psi(t)$ for $0 < t \le 1$. So we have
$$\varrho(s) = t \;\Leftrightarrow\; s = \psi(t), \quad s \ge 0,\ t \ge 1, \qquad (17)$$
and
$$\chi(s) = t \;\Leftrightarrow\; s = \psi(t), \quad s \ge 0,\ 0 < t \le 1. \qquad (18)$$
Note that $\chi(s)$ is monotonically decreasing and $\varrho(s)$ is monotonically increasing in $s \ge 0$.

Lemma 8 Let $t > 0$ and $\psi(t) \le s$. Then $\chi(s) \le t \le \varrho(s)$.

Proof: This is almost obvious. Since $\psi(t)$ is strictly convex and minimal at $t = 1$, with $\psi(1) = 0$, $\psi(t) \le s$ implies that $t$ belongs to a closed interval whose extremal points are $\chi(s)$ and $\varrho(s)$.
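The two inverse functions rarely have closed forms, but they are easy to evaluate by bisection because $\psi$ is monotone on each side of $t = 1$. The following sketch does this for the logarithmic kernel $\psi(t) = \frac{t^2 - 1}{2} - \ln t$; the names `chi` and `varrho` and the bracketing strategy are assumptions made for this illustration.

```python
import math


def psi(t):
    # Kernel of the logarithmic barrier function: minimal at t = 1, psi(1) = 0.
    return 0.5 * (t * t - 1.0) - math.log(t)


def _bisect(f, lo, hi, steps=200):
    """Root of f on [lo, hi], assuming f(lo) <= 0 <= f(hi)."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if f(mid) <= 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def varrho(s):
    """Inverse of psi on [1, inf): the t >= 1 with psi(t) = s."""
    hi = 2.0
    while psi(hi) < s:
        hi *= 2.0
    return _bisect(lambda t: psi(t) - s, 1.0, hi)


def chi(s):
    """Inverse of psi on (0, 1]: the t <= 1 with psi(t) = s."""
    lo = 0.5
    while psi(lo) < s:
        lo *= 0.5
    # psi is decreasing here, so bisect on s - psi(t), which is increasing.
    return _bisect(lambda t: s - psi(t), lo, 1.0)


if __name__ == "__main__":
    for s in (0.0, 0.01, 0.5, 5.0):
        print(s, chi(s), varrho(s))
```

Lemma 8 then says that every $t$ with $\psi(t) \le s$ lies between `chi(s)` and `varrho(s)`.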
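Graphical illustration of the functions $\chi(s)$ and $\varrho(s)$

[Figure: graph of $\psi(t)$ against $t$, with a horizontal level $s$ and the two corresponding points $\chi(s)$ and $\varrho(s)$ marked on the $t$-axis.]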
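Analysis of the algorithm (2)

The local values of $\kappa$ and $\nu$ are given by
$$\kappa(x) = \max_i K(v_i), \qquad \nu(x) = 2 \sum_{i=1}^n N(v_i)^2.$$
We need to find values of $\kappa$ and $\nu$ such that
$$\kappa(x) \le \kappa, \qquad \nu(x) \le \nu$$
for each $v$ that occurs during the course of the algorithm. This certainly holds if
$$\phi_\mu(x, s) = 2 \sum_{i=1}^n \psi(v_i) \le \frac{1}{26\kappa^2} \;\Longrightarrow\; \max_i K(v_i) \le \kappa, \quad 2 \sum_{i=1}^n N(v_i)^2 \le \nu.$$
The left-hand side of this implication implies
$$\psi(v_i) \le \frac{1}{52\kappa^2}, \quad i = 1, \ldots, n.$$
According to Lemma 8 this implies
$$\chi\left( \frac{1}{52\kappa^2} \right) \le v_i \le \varrho\left( \frac{1}{52\kappa^2} \right), \quad i = 1, \ldots, n.$$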
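Analysis of the algorithm (3)

$$\chi\left( \frac{1}{52\kappa^2} \right) \le v_i \le \varrho\left( \frac{1}{52\kappa^2} \right), \quad i = 1, \ldots, n.$$
If we choose $\kappa$ such that
$$K\left( \varrho\left( \frac{1}{52\kappa^2} \right) \right) \le \kappa \qquad (19)$$
then the barrier function is locally $\kappa$-SC. The above inequality certainly has a solution, because if $\kappa$ goes to infinity then the left-hand side approaches $K(1)$, which is finite, whereas the right-hand side goes to $\infty$. Let $\bar\kappa$ denote the smallest solution of (19). Finally, if we take $\bar\nu$ such that
$$\bar\nu = 2n \left( N\left( \chi\left( \frac{1}{52\bar\kappa^2} \right) \right) \right)^2$$
then the barrier function is a locally $(\bar\kappa, \bar\nu)$-SC barrier function.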
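For a concrete kernel the smallest solution of (19) can be found numerically. The sketch below does this for $\psi_2(t) = \frac12 (t - \frac1t)^2$, using $K(t) = \frac{2t}{\sqrt3}$ and $N(t)^2 = \frac{1}{3t^2}$ from the earlier tables. The closed-form inverses of $\psi_2$ (obtained by solving $t - \frac1t = \pm\sqrt{2s}$), the bisection bracket, and the reliance on $K$ being increasing are assumptions of this illustration.

```python
import math

SQRT3 = math.sqrt(3.0)


def K(t):
    # Local kappa for psi_2, taken from the table: 2t / sqrt(3).
    return 2.0 * t / SQRT3


def N_squared(t):
    # N(t)^2 = psi_b'(t)^2 / psi_b''(t) = 1 / (3 t^2) for psi_2.
    return 1.0 / (3.0 * t * t)


def varrho(s):
    # Solve t - 1/t = sqrt(2 s) for t >= 1.
    r = math.sqrt(2.0 * s)
    return 0.5 * (r + math.sqrt(r * r + 4.0))


def chi(s):
    # Solve t - 1/t = -sqrt(2 s) for 0 < t <= 1.
    r = math.sqrt(2.0 * s)
    return 0.5 * (-r + math.sqrt(r * r + 4.0))


def kappa_bar(steps=200):
    """Smallest kappa with K(varrho(1 / (52 kappa^2))) <= kappa, by bisection."""
    def excess(kappa):
        return K(varrho(1.0 / (52.0 * kappa * kappa))) - kappa
    lo, hi = K(1.0), 2.0 * K(1.0)   # excess(lo) >= 0 > excess(hi)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return hi


def nu_bar_per_coordinate(kb):
    """nu_bar / n = 2 * N(chi(1 / (52 kb^2)))^2."""
    return 2.0 * N_squared(chi(1.0 / (52.0 * kb * kb)))


if __name__ == "__main__":
    kb = kappa_bar()
    print(kb, K(1.0), nu_bar_per_coordinate(kb))
```

The computed value of $\bar\kappa$ is slightly larger than $K(1) = 2/\sqrt3$, in line with the remark on the slide about local self-concordancy.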
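Analysis of the algorithm (4)

Substitution of the chosen values of $\bar\kappa$ and $\bar\nu$ yields (also using that $\mu^0 = 1$) the following iteration bound for the algorithm:
$$2 \left( 1 + 4\bar\kappa\sqrt{\bar\nu} \right) \ln \frac{2\bar\nu}{\epsilon} = 2 \left( 1 + 4\bar\kappa \sqrt{2n}\, \left| N\left( \chi\left( \tfrac{1}{52\bar\kappa^2} \right) \right) \right| \right) \ln \frac{4n \left( N\left( \chi\left( \frac{1}{52\bar\kappa^2} \right) \right) \right)^2}{\epsilon}.$$
Note that apart from $n$ the coefficients occurring in this expression depend only on the kernel function $\psi$, and not on $n$. Thus we may safely state that for every kernel function satisfying our conditions the iteration bound is
$$O\left( \sqrt{n} \log \frac{n}{\epsilon} \right).$$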
Concluding remarks

Recently we have used kernel-function-based barrier functions (including so-called self-regular kernel functions) to improve the iteration bound for large-update methods from $O\left(n \log \frac{n}{\epsilon}\right)$ to $O\left(\sqrt{n}\,(\log n) \log \frac{n}{\epsilon}\right)$.

We were surprised to observe (most of the time after a tedious analysis, for each kernel function separately) that the iteration bounds for small-update methods based on these barrier functions always turned out to be $O\left(\sqrt{n} \log \frac{n}{\epsilon}\right)$. The current results seem to explain this phenomenon.

The results presented in this talk can be easily generalized to other (symmetric) cone optimization problems, like second-order cone optimization and semidefinite optimization.

The next challenge is to find out if we can obtain the improved bounds for large-update methods by using this approach.
Some references

Y.Q. Bai, M. El Ghami, and C. Roos. A comparative study of kernel functions for primal-dual interior-point algorithms in linear optimization. SIAM Journal on Optimization, 15(1):101–128 (electronic), 2004.

J. Peng, C. Roos, and T. Terlaky. Self-Regularity. A New Paradigm for Primal-Dual Interior-Point Algorithms. Princeton University Press, 2002.

M. Salahi, T. Terlaky, and G. Zhang. The complexity of self-regular proximity based infeasible IPMs. Technical Report 2003/3, Advanced Optimization Laboratory, McMaster University, Hamilton, Ontario, Canada, 2003.

S. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press, Cambridge, 2004.

Y. Nesterov. Introductory Lectures on Convex Optimization. Kluwer Academic Publishers, Dordrecht, The Netherlands, 2004.

F. Glineur. Topics in convex optimization: interior-point methods, conic duality and approximations. PhD thesis, Faculté Polytechnique de Mons, Mons, Belgium, 2001.

Y.E. Nesterov and A.S. Nemirovskii. Interior Point Polynomial Methods in Convex Programming. Theory and Algorithms. SIAM, Philadelphia, USA, 1993.