Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation
Instructor: Moritz Hardt Email: hardt+ee227c@berkeley.edu
Graduate Instructor: Max Simchowitz Email: msimchow+ee227c@berkeley.edu
October 15, 2018

26 Primal-dual interior point methods

Previously, we discussed barrier methods and the so-called short-step method for solving constrained LPs, and proved that convergence is guaranteed (albeit slow). Here, we study a primal-dual interior-point method (the so-called "long-step" path-following algorithm), which similarly seeks to approximate points on the central path. Unlike the short-step method, the long-step method maintains iterates of primal-dual variables and takes a more aggressive step size, so long as the iterate lies within a neighborhood of the central path.

26.1 Deriving the dual problem

Let $x \in \mathbb{R}^n$ be the decision variable, and $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$ and $c \in \mathbb{R}^n$. Consider the linear program
\[
\min \; c^\top x \quad \text{s.t.} \quad Ax = b, \; x \geq 0.
\]
Here, $\geq$ is pointwise.
Observe that we can always enforce the constraint $Ax = b$ by introducing a dual variable $z$:
\[
\min_{x \geq 0} \max_{z} \; c^\top x + z^\top(b - Ax)
\geq \max_{z} \; b^\top z + \min_{x \geq 0} \; (c - A^\top z)^\top x
= \max_{z} \; b^\top z - \infty \cdot \mathbb{I}(A^\top z \not\leq c),
\]
since the inner minimum is $0$ if $A^\top z \leq c$ and $-\infty$ otherwise. Hence, the dual problem is as follows:
\[
\max \; b^\top z \quad \text{s.t.} \quad A^\top z \leq c.
\]
This is equivalent to
\[
\max \; b^\top z \quad \text{s.t.} \quad A^\top z + s = c, \; s \geq 0,
\]
where we introduce the slack variable $s \in \mathbb{R}^n$. We say that $(x, z, s)$ are jointly feasible if
\[
Ax = b, \quad A^\top z + s = c, \quad x, s \geq 0.
\]
Moreover, we can compute that for feasible $(x, z, s)$,
\[
\langle x, s\rangle = x^\top(c - A^\top z) = \langle x, c\rangle - \langle Ax, z\rangle = \langle x, c\rangle - \langle b, z\rangle \geq 0.
\]
This is a proof of weak duality, namely that for any feasible $x$ and $z$,
\[
\langle x, c\rangle \geq \langle b, z\rangle,
\]
and therefore $\langle x^*, c\rangle \geq \langle b, z^*\rangle$. Moreover, if there exists a feasible $(x^*, z^*, s^*)$ with $\langle x^*, s^*\rangle = 0$, then we have $\langle x^*, c\rangle = \langle b, z^*\rangle$, which is strong duality. Duality is also useful to bound the suboptimality gap, since if $(x, z, s)$ is feasible, then
\[
\langle x, s\rangle = \langle x, c\rangle - \langle b, z\rangle \geq \langle x, c\rangle - \langle x^*, c\rangle = \langle x - x^*, c\rangle,
\]
using $\langle b, z\rangle \leq \langle x^*, c\rangle$.
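The identity $\langle x, s\rangle = \langle x, c\rangle - \langle b, z\rangle \geq 0$ is easy to check numerically. A minimal sketch in Python (NumPy assumed available; all problem data is made up — rather than solving an LP, we construct a feasible primal-dual pair directly):

```python
import numpy as np

# Hypothetical toy data: choose x > 0, z, s > 0 first, then define
# b = A x and c = A^T z + s so that (x, z, s) is feasible by construction.
rng = np.random.default_rng(0)
m, n = 3, 5
A = rng.standard_normal((m, n))
x = rng.uniform(1.0, 2.0, size=n)  # primal feasible: x > 0, Ax = b below
z = rng.standard_normal(m)         # dual variable (unconstrained)
s = rng.uniform(0.5, 1.0, size=n)  # dual slack: s > 0
b = A @ x
c = A.T @ z + s

gap = c @ x - b @ z                # duality gap <c, x> - <b, z>
assert np.isclose(gap, x @ s)      # the gap equals <x, s> ...
assert gap > 0                     # ... and is nonnegative (weak duality)
```

Because $x$ and $s$ were chosen strictly positive, the gap here is strictly positive; it vanishes exactly at a primal-dual optimal pair.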
Figure 1: Require iterates to stay within a certain neighborhood of the central path. We want the pairwise products $x_i s_i$ to be not too different for $i = 1, 2, \dots, n$. [Figure: the central path $\mathcal{C}$ and the neighborhood $\mathcal{N}_{-\infty}(\gamma)$, drawn in the $(x_1 s_1, x_2 s_2)$ plane.]

26.2 Primal-dual iterates along the central path

The above suggests the following approach. Consider iterates $(x_k, z_k, s_k)$, and define
\[
\mu_k := \frac{1}{n}\langle x_k, s_k\rangle = \frac{\langle x_k, c\rangle - \langle b, z_k\rangle}{n} \geq \frac{\langle x_k - x^*, c\rangle}{n}.
\]
Define the strictly feasible set
\[
\mathcal{F}^o := \{(x, z, s) : Ax = b, \; A^\top z + s = c, \; x, s > 0\}.
\]
Minimizing $\mu_k$ thus amounts to minimizing a bilinear objective over a linear constraint set. The goal is to generate iterates $(x_{k+1}, z_{k+1}, s_{k+1})$ such that
\[
\mu_{k+1} \leq (1 - C n^{-\rho})\,\mu_k.
\]
This implies that $\langle x_k - x^*, c\rangle \leq \epsilon$ in $k = O(n^\rho \log(n/\epsilon))$ steps. The goal is to find a tuple $(\bar{x}, \bar{z}, \bar{s})$ such that $\mu(\bar{x}, \bar{s}) \approx 0$. We consider the following
approach. Define
\[
F_\tau(x, z, s) :=
\begin{pmatrix}
Ax - b \\
A^\top z + s - c \\
x \circ s - \tau \mathbf{1}
\end{pmatrix},
\]
where $x \circ s$ denotes the elementwise product. Then the goal is to approximately solve $F_0(x^*, z^*, s^*) = 0$ subject to $x^*, s^* \geq 0$. We see that this can be obtained by computing the solutions $(x_\tau, z_\tau, s_\tau)$ to $F_\tau(x, z, s) = 0$ as $\tau \to 0$. We call the curve $\tau \mapsto (x_\tau, z_\tau, s_\tau)$ the central path. Note that, on the central path, $x_i s_i = \tau$ for some $\tau > 0$. To ensure we stay close to the central path, we consider the neighborhood
\[
\mathcal{N}_{-\infty}(\gamma) := \{(x, z, s) \in \mathcal{F}^o : \min_i x_i s_i \geq \gamma \mu(x, s)\}.
\]
What we would like to do is take iterates $(x_k, z_k, s_k)$ such that $\mu_k$ decreases, and $(x_k, z_k, s_k) \in \mathcal{N}_{-\infty}(\gamma)$ for an appropriate constant $\gamma \in (0, 1)$. Membership in $\mathcal{N}_{-\infty}(\gamma)$ ensures the nonnegativity constraints. This is portrayed in Figure 1.

1 Input: Parameters $\gamma \in (0, 1)$, $0 < \sigma_{\min} < \sigma_{\max} < 1$, and initialization $(x_0, z_0, s_0) \in \mathcal{N}_{-\infty}(\gamma)$. for $k = 0, 1, 2, \dots$ do
2 Choose $\sigma_k \in [\sigma_{\min}, \sigma_{\max}]$;
3 Run a Newton step on $F_{\sigma_k \mu_k}$ (to be defined). Let $(\Delta x_k, \Delta z_k, \Delta s_k)$ denote the Newton step
\[
(\Delta x_k, \Delta z_k, \Delta s_k) = -J(w_k)^{-1} F_{\tau_k}(w_k), \quad \text{where } \tau_k = \sigma_k \mu_k,\; w_k = (x_k, z_k, s_k),
\]
and $J$ denotes the Jacobian of $F_{\tau_k}$. Let $\alpha_k \in (0, 1]$ be the largest step size such that
\[
\alpha_k = \max\{\alpha \in (0, 1] : (x_k, z_k, s_k) + \alpha(\Delta x_k, \Delta z_k, \Delta s_k) \in \mathcal{N}_{-\infty}(\gamma)\}.
\]
Set $(x_{k+1}, z_{k+1}, s_{k+1}) \leftarrow (x_k, z_k, s_k) + \alpha_k(\Delta x_k, \Delta z_k, \Delta s_k)$.
4 end
Algorithm 1: Long-step Path-Following method
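Algorithm 1 can be sketched in a few dozen lines of Python. This is a minimal illustration, not an efficient implementation: the instance, the halving search for $\alpha_k$, and all parameter values ($\gamma$, $\sigma$, the iteration count) are made up, and a real solver would exploit the block structure of the Newton system rather than forming and factoring the full Jacobian.

```python
import numpy as np

def mu(x, s):
    """Average pairwise product mu(x, s) = <x, s> / n."""
    return (x @ s) / x.size

def in_neighborhood(x, s, gamma):
    """Membership test for N_{-inf}(gamma); the equality constraints are
    preserved exactly by the Newton step, so only positivity is checked."""
    return np.all(x > 0) and np.all(s > 0) and np.min(x * s) >= gamma * mu(x, s)

def newton_step(A, x, z, s, tau):
    """Solve the Newton system for F_tau at a feasible point (x, z, s)."""
    m, n = A.shape
    J = np.zeros((m + 2 * n, m + 2 * n))   # variable order: (x, z, s)
    J[:m, :n] = A                          # row block 1: d(Ax - b)
    J[m:m + n, n:n + m] = A.T              # row block 2: d(A^T z + s - c)
    J[m:m + n, n + m:] = np.eye(n)
    J[m + n:, :n] = np.diag(s)             # row block 3: d(x o s - tau 1)
    J[m + n:, n + m:] = np.diag(x)
    rhs = np.concatenate([np.zeros(m + n), -(x * s) + tau])
    d = np.linalg.solve(J, rhs)
    return d[:n], d[n:n + m], d[n + m:]

def long_step(A, x, z, s, gamma=1e-3, sigma=0.5, iters=30):
    """Long-step path following: backtrack alpha from 1 until the
    candidate iterate lies in N_{-inf}(gamma)."""
    for _ in range(iters):
        tau = sigma * mu(x, s)
        dx, dz, ds = newton_step(A, x, z, s, tau)
        alpha = 1.0
        while alpha > 1e-10 and not in_neighborhood(x + alpha * dx, s + alpha * ds, gamma):
            alpha *= 0.5                   # crude search for a feasible step size
        x, z, s = x + alpha * dx, z + alpha * dz, s + alpha * ds
    return x, z, s

# Made-up instance with a known strictly feasible start: x0 = s0 = 1, z0 = 0,
# and b = A x0, c = s0, so (x0, z0, s0) starts on the central path (mu = 1).
rng = np.random.default_rng(0)
m, n = 3, 6
A = rng.standard_normal((m, n))
x0, z0, s0 = np.ones(n), np.zeros(m), np.ones(n)
b, c = A @ x0, s0.copy()
x1, z1, s1 = long_step(A, x0, z0, s0)

assert mu(x1, s1) < mu(x0, s0)    # duality gap strictly decreased
assert np.allclose(A @ x1, b)     # primal feasibility maintained exactly
```

Note that $b$ and $c$ never appear inside the loop: since the iterates start feasible and the Newton step lies in the null space of the equality constraints, the residuals $Ax - b$ and $A^\top z + s - c$ remain zero throughout.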
26.3 Generating Iterates with the Newton Step

The Newton step is a method for solving fixed-point equations $F(w) = 0$. Indeed,
\[
F(w + d) = F(w) + J(w) d + o(\|d\|),
\]
where $J(w)$ denotes the Jacobian of $F$ at $w$.

Figure 2: Newton's method in 1D. Starting from some $w_0 \in \mathbb{R}^N$, the iterative scheme $d_k = -J(w_k)^{-1} F(w_k)$, $w_{k+1} = w_k + d_k$, $k = 0, 1, 2, \dots$, finds better and better approximations to the roots (or zeroes) of a real-valued function. [Slide credit: Wright (UW-Madison), Interior-Point Methods, August 2017.]

Newton's method then chooses
\[
w^+ = w + d, \quad \text{where } J(w)\,d = -F(w),
\]
which implies that $F(w + d) = o(\|d\|)$ for $w$ sufficiently close to the fixed point. This is what gives the method its quick convergence. Note that, if $F$ is a linear map, then in fact one Newton step suffices; this can be seen from the Taylor expansion. Our function $F_{\tau_k}$ is nearly linear, but not quite.

Let's compute the Newton step. We observe that the Jacobian is the linear operator
\[
J =
\begin{pmatrix}
A & 0 & 0 \\
0 & A^\top & I \\
\mathrm{Diag}(s) & 0 & \mathrm{Diag}(x)
\end{pmatrix}.
\]
Moreover, since $(x_k, z_k, s_k) \in \mathcal{F}^o$, we have that
\[
F_{\tau_k}(x_k, z_k, s_k) =
\begin{pmatrix}
A x_k - b \\
A^\top z_k + s_k - c \\
x_k \circ s_k - \tau_k \mathbf{1}
\end{pmatrix}
=
\begin{pmatrix}
0 \\
0 \\
x_k \circ s_k - \tau_k \mathbf{1}
\end{pmatrix}.
\]
Let's drop subscripts. Then one can verify that the Newton step $(\Delta x, \Delta z, \Delta s)$ satisfies
\[
\begin{pmatrix}
A & 0 & 0 \\
0 & A^\top & I \\
\mathrm{Diag}(s) & 0 & \mathrm{Diag}(x)
\end{pmatrix}
\begin{pmatrix}
\Delta x \\ \Delta z \\ \Delta s
\end{pmatrix}
=
\begin{pmatrix}
0 \\ 0 \\ -x \circ s + \tau \mathbf{1}
\end{pmatrix}.
\]
Some remarks:

1. $A \Delta x = 0$, and $A^\top \Delta z + \Delta s = 0$.
2. This implies that $(x^+, z^+, s^+) := (x + \Delta x, z + \Delta z, s + \Delta s)$ satisfies $A x^+ - b = 0$ and $A^\top z^+ + s^+ - c = 0$.

3. We also have
\[
s \circ \Delta x + x \circ \Delta s = -x \circ s + \tau \mathbf{1},
\]
and thus
\[
x^+ \circ s^+ = x \circ s + (s \circ \Delta x + x \circ \Delta s) + \Delta x \circ \Delta s = x \circ s - x \circ s + \tau \mathbf{1} + \Delta x \circ \Delta s = \tau \mathbf{1} + \Delta x \circ \Delta s.
\]

4. Thus,
\[
F_\tau(x^+, z^+, s^+) = (0, 0, \Delta x \circ \Delta s).
\]
In other words, if we can argue that $\Delta x \circ \Delta s$ is negligible, then the Newton step produces an almost exact solution.

A more concrete analysis is to study the term
\[
n \mu(x + \alpha \Delta x, s + \alpha \Delta s) = \langle x + \alpha \Delta x, s + \alpha \Delta s\rangle = \langle x, s\rangle + \alpha(\langle \Delta x, s\rangle + \langle x, \Delta s\rangle) + \alpha^2 \langle \Delta x, \Delta s\rangle.
\]
The last term in the above display vanishes, since $\Delta s = -A^\top \Delta z$ by remark 1:
\[
\langle \Delta x, \Delta s\rangle = -\Delta x^\top (A^\top \Delta z) = -(A \Delta x)^\top \Delta z = 0.
\]
Moreover, since $s \circ \Delta x + x \circ \Delta s = -x \circ s + \tau \mathbf{1}$, we have by summing coordinates that
\[
\langle \Delta x, s\rangle + \langle x, \Delta s\rangle = -\langle x, s\rangle + \tau n = -(1 - \sigma)\langle x, s\rangle,
\]
where the last step uses $n\tau = n\sigma\mu = \sigma\langle x, s\rangle$. Hence,
\[
n \mu(x + \alpha \Delta x, s + \alpha \Delta s) = n \mu(x, s)\,(1 - \alpha(1 - \sigma)).
\]
Hence, if one can show that $\alpha(1 - \sigma) \geq C(n) > 0$ for some quantity depending on the dimension, then we see that
\[
n \mu(x_k, s_k) \leq (1 - C(n))^k \, n \mu(x_0, s_0),
\]
giving us the rate of decrease. One can then show, with more technical work, that $\alpha = \Omega(1/n)$ while maintaining the $\mathcal{N}_{-\infty}(\gamma)$ invariant.
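The identities in remarks 1-4 and the resulting gap-decrease formula can all be verified numerically by solving a single Newton system. A sketch under made-up data (NumPy assumed; the step size $\alpha$ and centering parameter $\sigma$ are arbitrary choices):

```python
import numpy as np

# Made-up strictly feasible point: x = s = 1, z = 0, with b = A x and c = s,
# so that (x, z, s) lies on the central path with mu = 1.
rng = np.random.default_rng(0)
m, n = 2, 4
A = rng.standard_normal((m, n))
x, z, s = np.ones(n), np.zeros(m), np.ones(n)
b, c = A @ x, s.copy()
sigma = 0.5
tau = sigma * (x @ s) / n

# Assemble and solve the Newton system in the variable order (x, z, s).
J = np.zeros((m + 2 * n, m + 2 * n))
J[:m, :n] = A
J[m:m + n, n:n + m] = A.T
J[m:m + n, n + m:] = np.eye(n)
J[m + n:, :n] = np.diag(s)
J[m + n:, n + m:] = np.diag(x)
d = np.linalg.solve(J, np.concatenate([np.zeros(m + n), -(x * s) + tau]))
dx, dz, ds = d[:n], d[n:n + m], d[n + m:]

assert np.allclose(A @ dx, 0)               # remark 1: A dx = 0
assert np.allclose(A.T @ dz + ds, 0)        # remark 1: A^T dz + ds = 0
assert np.isclose(dx @ ds, 0)               # cross term <dx, ds> vanishes
alpha = 0.7                                 # arbitrary damped step size
lhs = (x + alpha * dx) @ (s + alpha * ds)   # n * mu after the damped step
rhs = (x @ s) * (1 - alpha * (1 - sigma))
assert np.isclose(lhs, rhs)                 # n mu(alpha) = n mu (1 - alpha(1 - sigma))
```

The final assertion is exactly the decrease formula derived above: the duality gap shrinks by the factor $1 - \alpha(1 - \sigma)$ regardless of the problem data, which is why the analysis reduces to lower-bounding $\alpha$.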