Primal-Dual Interior-Point Methods
Javier Peña
Convex Optimization 10-725/36-725
Last time: duality revisited

Consider the problem

    \min_x  f(x)
    subject to  h(x) \le 0,  Ax = b

Lagrangian:

    L(x, u, v) = f(x) + u^T h(x) + v^T (Ax - b)

We can rewrite the primal problem as

    \min_x \max_{u \ge 0, v} L(x, u, v)

and the dual problem is

    \max_{u \ge 0, v} \min_x L(x, u, v)
Optimality conditions

Assume f, h_1, ..., h_m are convex and differentiable. Assume also that strong duality holds. KKT optimality conditions for primal and dual:

    \nabla f(x) + Dh(x)^T u + A^T v = 0
    U h(x) = 0
    Ax = b
    u \ge 0,  h(x) \le 0

Here U = Diag(u) and Dh(x) = [\nabla h_1(x) \cdots \nabla h_m(x)]^T is the Jacobian of h, so Dh(x)^T u = \sum_i u_i \nabla h_i(x).
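As a quick numerical sanity check, the KKT conditions can be verified on a tiny instance. The sketch below uses a made-up one-dimensional problem (minimize x^2 subject to x >= 1, with no equality constraints, so the A^T v term drops out); it is an illustration of the conditions above, not part of the lecture.

```python
import numpy as np

# Toy problem (hypothetical): minimize f(x) = x^2 subject to h(x) = 1 - x <= 0.
# Known solution: x* = 1 with multiplier u* = 2 (no equality constraints, so no v).
x, u = 1.0, 2.0

grad_f = 2 * x          # \nabla f(x)
grad_h = -1.0           # \nabla h(x)
h = 1 - x               # h(x)

stationarity = grad_f + grad_h * u   # stationarity residual: should be 0
comp_slack = u * h                   # complementary slackness: should be 0

print(stationarity, comp_slack, u >= 0, h <= 0)
```

All four KKT conditions hold at (x*, u*), confirming optimality for this instance.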
Barrier problem

    \min_x  f(x) + \tau \phi(x)
    subject to  Ax = b

where \phi(x) = -\sum_{i=1}^m \log(-h_i(x)).

Optimality conditions for the barrier problem (and its dual), i.e. the central path equations:

    \nabla f(x) + Dh(x)^T u + A^T v = 0
    U h(x) = -\tau 1
    Ax = b
    u > 0,  h(x) < 0

Useful fact: the solution (x(\tau), u(\tau), v(\tau)) has duality gap

    f(x(\tau)) - \min_x L(x, u(\tau), v(\tau)) = m\tau
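The "useful fact" can be checked by hand on a one-dimensional example. The sketch below uses my own toy instance (not from the slides): minimize x subject to 0 <= x <= 1, so m = 2, where the barrier stationarity condition reduces to a quadratic in x and the duality gap comes out to exactly m\tau.

```python
import numpy as np

# Toy instance (hypothetical): min x subject to -x <= 0 and x - 1 <= 0 (m = 2).
# Barrier stationarity 1 - tau/x + tau/(1 - x) = 0 reduces to
# x^2 - (1 + 2*tau)*x + tau = 0; take the root in (0, 1).
tau = 0.1
x = ((1 + 2 * tau) - np.sqrt((1 + 2 * tau) ** 2 - 4 * tau)) / 2

u1, u2 = tau / x, tau / (1 - x)   # central-path multipliers u_i = -tau / h_i(x)
# L(x, u) = x - u1*x + u2*(x - 1); since 1 - u1 + u2 = 0 on the central path,
# min_x L(x, u) = -u2, so the duality gap is f(x(tau)) - (-u2).
gap = x + u2

print(gap, 2 * tau)   # the gap equals m*tau with m = 2
```

The stationarity identity 1 - u1 + u2 = 0 is what makes the inner minimization of the Lagrangian finite, and the resulting gap is exactly 2\tau.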
Outline

Today:
- Primal-dual interior-point method
- Special case: linear programming
- Extension to semidefinite programming
Barrier method versus primal-dual method

Like the barrier method, primal-dual interior-point methods aim to compute (approximately) points on the central path. Main differences between primal-dual and barrier methods:

- Primal-dual interior-point methods usually take one Newton step per iteration (there is no additional inner loop for the centering step).
- Primal-dual interior-point iterates are not necessarily feasible.
- Primal-dual interior-point methods are typically more efficient: under suitable conditions they exhibit better-than-linear convergence.
Central path equations and Newton step

Central path equations:

    \nabla f(x) + Dh(x)^T u + A^T v = 0
    U h(x) + \tau 1 = 0
    Ax - b = 0
    u > 0,  h(x) < 0

Newton step: solve

    [ \nabla^2 f(x) + \sum_i u_i \nabla^2 h_i(x)   Dh(x)^T   A^T ] [ \Delta x ]
    [ U Dh(x)                                      H(x)      0   ] [ \Delta u ] = -r(x, u, v)
    [ A                                            0         0   ] [ \Delta v ]

where

    r(x, u, v) := [ \nabla f(x) + Dh(x)^T u + A^T v ]
                  [ U h(x) + \tau 1                 ]
                  [ Ax - b                          ],     H(x) = Diag(h(x))
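The Newton system above can be assembled directly. The sketch below uses an assumed toy problem of my own (minimize ||x||^2/2 over the simplex, so \nabla^2 f = I, h(x) = -x, and \nabla^2 h_i = 0) and solves one Newton step; the chosen point and \tau are arbitrary.

```python
import numpy as np

# Assumed toy problem: f(x) = ||x||^2 / 2, h(x) = -x (so h_i(x) = -x_i <= 0),
# one equality constraint 1^T x = 1.
n = 2
x = np.array([0.6, 0.4])
u = np.array([0.5, 0.5])
v = np.array([0.1])
tau = 0.1

A = np.ones((1, n))
b = np.array([1.0])
Dh = -np.eye(n)                    # Jacobian of h(x) = -x
U, H = np.diag(u), np.diag(-x)     # U = Diag(u), H(x) = Diag(h(x))

# Residual r(x, u, v); here grad f = x, grad^2 f = I, grad^2 h_i = 0
r = np.concatenate([x + Dh.T @ u + A.T @ v,    # r_dual
                    u * (-x) + tau,            # r_cent = U h(x) + tau*1
                    A @ x - b])                # r_prim
# Newton (KKT) matrix
M = np.block([[np.eye(n), Dh.T, A.T],
              [U @ Dh,    H,    np.zeros((n, 1))],
              [A,         np.zeros((1, n + 1))]])

delta = np.linalg.solve(M, -r)         # (dx, du, dv)
print(np.linalg.norm(M @ delta + r))   # ~0: the step solves the linearized system
```

The solve exactly cancels the linearized residual; the nonlinear residual r decreases only after a damped step, which is the job of the line search discussed below.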
Surrogate duality gap, residuals

Define the dual, central, and primal residuals at the current point (x, u, v) as

    r_dual = \nabla f(x) + Dh(x)^T u + A^T v
    r_cent = U h(x) + \tau 1
    r_prim = Ax - b

Given x, u with h(x) < 0, u > 0, the surrogate duality gap is -h(x)^T u. This is a true duality gap when r_dual = 0 and r_prim = 0.

Observe that (x, u, v) is on the central path if and only if u > 0, h(x) < 0, and r(x, u, v) = 0 for \tau = -h(x)^T u / m.
Given x, u such that h(x) < 0, u > 0, define \tau(x, u) := -h(x)^T u / m.

Primal-Dual Algorithm
1. Choose \sigma \in (0, 1)
2. Choose (x^0, u^0, v^0) such that h(x^0) < 0, u^0 > 0
3. For k = 0, 1, ...
   - Compute the Newton step for (x, u, v) = (x^k, u^k, v^k), \tau := \sigma \tau(x^k, u^k)
   - Choose steplength \theta_k and set (x^{k+1}, u^{k+1}, v^{k+1}) := (x^k, u^k, v^k) + \theta_k (\Delta x, \Delta u, \Delta v)

Parallel notation in the barrier method: \tau = 1/t, \sigma = 1/\mu.
Backtracking line search

At each step, we need to find \theta and set

    x^+ = x + \theta \Delta x,  u^+ = u + \theta \Delta u,  v^+ = v + \theta \Delta v

Two main goals:
- Maintain h(x) < 0, u > 0
- Reduce \|r(x, u, v)\|

Use a multi-stage backtracking line search for this purpose: start with the largest step size \theta_max \le 1 that keeps u + \theta \Delta u \ge 0:

    \theta_max = min { 1, min { -u_i / \Delta u_i : \Delta u_i < 0 } }

Then, with parameters \alpha, \beta \in (0, 1), set \theta = 0.99 \theta_max, and
- update \theta = \beta\theta until h_i(x^+) < 0, i = 1, ..., m
- update \theta = \beta\theta until \|r(x^+, u^+, v^+)\| \le (1 - \alpha\theta) \|r(x, u, v)\|
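The three stages above translate into a short function. This is a sketch with names of my own choosing (`line_search`, callables `h` and `r_norm` are assumptions, not from the slides); the check at the bottom uses a synthetic one-dimensional setup in which the maximum step is determined by the multiplier ratio.

```python
import numpy as np

# Sketch of the multi-stage backtracking line search.
# h maps x to the vector of constraint values; r_norm gives ||r(x, u, v)||.
def line_search(x, u, v, dx, du, dv, h, r_norm, alpha=0.01, beta=0.5):
    # Stage 1: largest theta <= 1 keeping u + theta*du >= 0, then take 99% of it
    neg = du < 0
    theta = 0.99 * (min(1.0, np.min(-u[neg] / du[neg])) if neg.any() else 1.0)
    # Stage 2: shrink until strict feasibility of the inequality constraints
    while np.any(h(x + theta * dx) >= 0):
        theta *= beta
    # Stage 3: shrink until sufficient decrease of the residual norm
    r0 = r_norm(x, u, v)
    while r_norm(x + theta * dx, u + theta * du, v + theta * dv) > (1 - alpha * theta) * r0:
        theta *= beta
    return theta

# Synthetic check: h(x) = x - 2, residual ||x - 3|| with exact Newton step dx = 2;
# du = -4 forces theta_max = 0.25, hence theta = 0.99 * 0.25 = 0.2475.
theta = line_search(np.array([1.0]), np.array([1.0]), np.array([0.0]),
                    np.array([2.0]), np.array([-4.0]), np.array([0.0]),
                    h=lambda x: x - 2.0,
                    r_norm=lambda x, u, v: np.linalg.norm(x - 3.0))
print(theta)  # 0.2475
```

Because dx is an exact Newton step for the linear residual in this check, stage 3 accepts the first candidate and the dual-nonnegativity cap from stage 1 is what binds.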
Special case: linear programming

Consider

    \min_x  c^T x
    subject to  Ax = b,  x \ge 0

for c \in R^n, A \in R^{m x n}, b \in R^m. Dual:

    \max_{y,s}  b^T y
    subject to  A^T y + s = c,  s \ge 0
Some history

- Dantzig (1940s): the simplex method, still one of today's most well-known and well-studied algorithms for LPs
- Klee and Minty (1972): a pathological LP with n variables and 2n constraints on which the simplex method takes 2^n iterations
- Khachiyan (1979): polynomial-time algorithm for LPs, based on the ellipsoid method of Nemirovski and Yudin (1976). Strong in theory, weak in practice
- Karmarkar (1984): interior-point polynomial-time method for LPs. Fairly efficient (US Patent 4,744,026, expired in 2006)
- Renegar (1988): Newton-based interior-point algorithm for LPs. Best known theoretical complexity until very recent work by Lee-Sidford
- Modern state-of-the-art LP solvers typically use both simplex and interior-point methods
Optimality conditions and central path equations

Optimality conditions for the previous primal-dual pair of linear programs:

    A^T y + s = c
    Ax = b
    XS1 = 0
    x, s \ge 0

Central path equations:

    A^T y + s = c
    Ax = b
    XS1 = \tau 1
    x, s > 0

Here X = Diag(x) and S = Diag(s).
Primal-dual method versus barrier method

Newton equations for the primal-dual method:

    [ 0   A^T  I ] [ \Delta x ]     [ A^T y + s - c ]
    [ A   0    0 ] [ \Delta y ] = - [ Ax - b        ]
    [ S   0    X ] [ \Delta s ]     [ XS1 - \tau 1  ]

Simple observation:

    XS1 = \tau 1  <=>  s = \tau X^{-1} 1  <=>  x = \tau S^{-1} 1

Hence we can eliminate either s or x to obtain the optimality conditions for the primal or dual barrier problem, respectively.
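The LP Newton system above is small enough to drive a complete toy solver. The following is a minimal infeasible-start sketch under assumptions of my own: a made-up instance (min x1 + 2 x2 subject to x1 + x2 = 1, x >= 0, optimum 1 at x = (1, 0)), a fixed \sigma = 0.1, and a simple 0.99-of-maximum step rule instead of the full backtracking line search.

```python
import numpy as np

# Made-up LP instance: min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0 (optimal value 1)
A = np.array([[1.0, 1.0]]); b = np.array([1.0]); c = np.array([1.0, 2.0])
n, m = 2, 1
x, s = np.ones(n), np.ones(n)     # strictly positive, not necessarily feasible
y = np.zeros(m)
sigma = 0.1

for _ in range(60):
    tau = sigma * (x @ s) / n
    X, S = np.diag(x), np.diag(s)
    # Newton system [0 A^T I; A 0 0; S 0 X] (dx, dy, ds) = -(r_dual, r_prim, r_cent)
    M = np.block([[np.zeros((n, n)), A.T,              np.eye(n)],
                  [A,                np.zeros((m, m)), np.zeros((m, n))],
                  [S,                np.zeros((n, m)), X]])
    r = np.concatenate([A.T @ y + s - c, A @ x - b, x * s - tau])
    d = np.linalg.solve(M, -r)
    dx, dy, ds = d[:n], d[n:n + m], d[n + m:]
    # Step length: 0.99 of the largest step keeping x, s > 0
    ratios = np.concatenate([-x[dx < 0] / dx[dx < 0], -s[ds < 0] / ds[ds < 0]])
    theta = 0.99 * min(1.0, ratios.min()) if ratios.size else 1.0
    x, y, s = x + theta * dx, y + theta * dy, s + theta * ds
    if x @ s < 1e-10 and np.linalg.norm(A @ x - b) < 1e-10:
        break

print(c @ x)  # approximately 1, the optimal value
```

Note that primal and dual infeasibility shrink by the factor (1 - \theta) at each step, since the Newton step cancels the linear residuals exactly; only the complementarity residual is genuinely nonlinear.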
Newton steps for barrier problems

Primal central path equations:

    A^T y + \tau X^{-1} 1 = c,  Ax = b,  x > 0

Dual central path equations:

    A^T y + s = c,  \tau A S^{-1} 1 = b,  s > 0

Primal Newton step:

    [ -\tau X^{-2}  A^T ] [ \Delta x ]     [ A^T y + \tau X^{-1} 1 - c ]
    [ A             0   ] [ \Delta y ] = - [ Ax - b                    ]

Dual Newton step:

    [ A^T  I              ] [ \Delta y ]     [ A^T y + s - c       ]
    [ 0    -\tau A S^{-2} ] [ \Delta s ] = - [ \tau A S^{-1} 1 - b ]
Example: barrier versus primal-dual

Example from B & V 11.3.2 and 11.7.4: standard LP with n = 50 variables and m = 100 equality constraints. The barrier method uses various values of \mu; the primal-dual method uses \mu = 10. Both use \alpha = 0.01, \beta = 0.5.

[Figure: left, barrier method duality gap versus Newton iterations for \mu = 2, 50, 150; right (B & V Figure 11.21), primal-dual surrogate duality gap \hat\eta and residual norm r_feas = (\|r_prim\|^2 + \|r_dual\|^2)^{1/2} versus iteration number. The residual converges rapidly to zero within 24 iterations; the surrogate gap reaches a very small value in about 28 iterations.]

Can see that primal-dual is faster to converge to high accuracy.
Now consider a sequence of problems with n = 2m and n growing. The barrier method uses \mu = 100 and runs just two outer loops (decreasing the duality gap by a factor of 10^4); the primal-dual method uses \mu = 10 and stops when the duality gap and feasibility gap are at most 10^{-8}.

[Figure: number of Newton iterations versus m (from 10^1 to 10^3, log scale) for the barrier method (left) and the primal-dual method (right).]

The primal-dual method requires only slightly more iterations, despite the fact that it is producing higher-accuracy solutions.
Interior-point methods for semidefinite programming

Primal:

    \min_X  C \cdot X
    subject to  A_i \cdot X = b_i,  i = 1, ..., m
                X \succeq 0

Dual:

    \max_y  b^T y
    subject to  \sum_{i=1}^m y_i A_i + S = C
                S \succeq 0

Recall the trace inner product in S^n: X \cdot S = trace(XS). Strong duality holds, and the primal and dual are attained, if both are strictly feasible.
Optimality conditions for semidefinite programming

Primal and dual problems:

    \min_X  C \cdot X                \max_{y,S}  b^T y
    subject to  A(X) = b             subject to  A^*(y) + S = C
                X \succeq 0                      S \succeq 0

Here A : S^n -> R^m is a linear map, with adjoint A^*. Assume also that strong duality holds. Then X^* and (y^*, S^*) are respectively primal and dual optimal solutions if and only if (X^*, y^*, S^*) solves

    A^*(y) + S = C
    A(X) = b
    XS = 0
    X, S \succeq 0
Central path for semidefinite programming

Primal barrier problem:

    \min_X  C \cdot X - \tau \log(\det(X))
    subject to  A(X) = b

Dual barrier problem:

    \max_{y,S}  b^T y + \tau \log(\det(S))
    subject to  A^*(y) + S = C

Optimality conditions for both:

    A^*(y) + S = C
    A(X) = b
    XS = \tau I
    X, S \succ 0
Newton step

Primal central path equations:

    A^*(y) + \tau X^{-1} = C
    A(X) = b
    X \succ 0

Newton equations:

    -\tau X^{-1} \Delta X X^{-1} + A^*(\Delta y) = -(A^*(y) + \tau X^{-1} - C)
    A(\Delta X) = -(A(X) - b)

Similar dual central path and Newton equations involve (\Delta y, \Delta S).
Primal-dual Newton step

Recall the central path equations:

    A^*(y) + S - C = 0
    A(X) - b = 0
    XS - \tau I = 0
    X, S \succ 0

Natural Newton step:

    A^*(\Delta y) + \Delta S = -(A^*(y) + S - C)
    A(\Delta X) = -(A(X) - b)
    \Delta X S + X \Delta S = -(XS - \tau I)

But we run into issues of symmetry: XS is not symmetric in general, and the last equation need not produce a symmetric \Delta X.
Nesterov-Todd direction

We want to linearize XS = \tau I.

Primal linearization: rewrite it as S - \tau X^{-1} = 0 and linearize:

    \tau X^{-1} \Delta X X^{-1} + \Delta S = \tau X^{-1} - S

Dual linearization: rewrite it as X - \tau S^{-1} = 0 and linearize:

    \Delta X + \tau S^{-1} \Delta S S^{-1} = \tau S^{-1} - X
Nesterov-Todd direction

Proper primal-dual linearization: average of the previous two,

    W^{-1} \Delta X W^{-1} + \Delta S = \tau X^{-1} - S

or equivalently

    \Delta X + W \Delta S W = \tau S^{-1} - X

provided W S W = X. Achieve the above by taking W as the geometric mean of X and S^{-1}:

    W = S^{-1/2} (S^{1/2} X S^{1/2})^{1/2} S^{-1/2} = X^{1/2} (X^{1/2} S X^{1/2})^{-1/2} X^{1/2}
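The scaling matrix W is computable with ordinary eigendecompositions. The sketch below (my own helper `spd_power` and made-up positive definite matrices X, S are assumptions) evaluates both formulas for W and checks the defining property W S W = X.

```python
import numpy as np

# Matrix powers of a symmetric positive definite matrix via eigendecomposition
def spd_power(M, p):
    w, V = np.linalg.eigh(M)
    return (V * w ** p) @ V.T   # V diag(w^p) V^T

X = np.array([[2.0, 0.5], [0.5, 1.0]])   # made-up positive definite matrices
S = np.array([[1.0, 0.2], [0.2, 3.0]])

# W = S^{-1/2} (S^{1/2} X S^{1/2})^{1/2} S^{-1/2}
Sh, Shinv = spd_power(S, 0.5), spd_power(S, -0.5)
W = Shinv @ spd_power(Sh @ X @ Sh, 0.5) @ Shinv

# Equivalent formula: W = X^{1/2} (X^{1/2} S X^{1/2})^{-1/2} X^{1/2}
Xh = spd_power(X, 0.5)
W2 = Xh @ spd_power(Xh @ S @ Xh, -0.5) @ Xh

print(np.allclose(W @ S @ W, X), np.allclose(W, W2))  # True True
```

Since W is symmetric positive definite and satisfies W S W = X, both linearizations above produce the same symmetric direction, which is the point of the Nesterov-Todd scaling.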
Given X, S \succ 0, define \tau(X, S) := X \cdot S / n.

Primal-Dual Algorithm for Semidefinite Programming
1. Choose \sigma \in (0, 1)
2. Choose (X^0, y^0, S^0) such that X^0, S^0 \succ 0
3. For k = 0, 1, ...
   - Compute the Nesterov-Todd direction for (X, y, S) = (X^k, y^k, S^k), \tau := \sigma \tau(X^k, S^k)
   - Choose steplength \theta_k and set (X^{k+1}, y^{k+1}, S^{k+1}) := (X^k, y^k, S^k) + \theta_k (\Delta X, \Delta y, \Delta S)
References and further reading

- S. Boyd and L. Vandenberghe (2004), Convex Optimization, Chapter 11
- S. Wright (1997), Primal-Dual Interior-Point Methods, Chapters 5 and 6
- J. Renegar (2001), A Mathematical View of Interior-Point Methods
- Y. Nesterov and M. Todd (1998), "Primal-dual interior-point methods for self-scaled cones", SIAM Journal on Optimization