Convex Optimization, Prof. Nati Srebro
Lecture 12: Infeasible-Start Newton's Method and Interior Point Methods
Equality Constrained Optimization

$\min_x f_0(x)$ s.t. $Ax = b$, with $A \in \mathbb{R}^{p \times n}$, $b \in \mathbb{R}^p$

Using access to:
- 2nd-order oracle for $f_0$: $x \mapsto (f_0(x), \nabla f_0(x), \nabla^2 f_0(x))$
- Explicit access to $A, b$
Equality Constrained Newton

Init: feasible $x_0$ ($x_0 \in \mathrm{dom}(f)$ and $Ax_0 = b$)
Iterate:
- Solve $\begin{bmatrix} \nabla^2 f(x_k) & A^\top \\ A & 0 \end{bmatrix} \begin{bmatrix} \Delta x_k \\ \nu \end{bmatrix} = \begin{bmatrix} -\nabla f(x_k) \\ b - Ax_k \end{bmatrix}$
- Stop if $\lambda(x_k)^2 = \Delta x_k^\top \nabla^2 f(x_k)\, \Delta x_k \le \varepsilon$
- Set $t_k$ by backtracking linesearch; $x_{k+1} \leftarrow x_k + t_k \Delta x_k$

Equivalent to unconstrained Newton on $\tilde f(z) = f(Fz + x_0)$, where $\mathrm{null}(A) = \mathrm{image}(F)$:
- Input: feasible $x_0$
- Do: use Newton to minimize $\tilde f(z) = f_0(Fz + x_0)$ starting from $z_0 = 0$
- Return $Fz_k + x_0$

$f_0$ self-concordant $\Rightarrow$ $\tilde f$ self-concordant $\Rightarrow$ $k = O\big(f_0(x_0) - f_0(x^\star) + \log\log(1/\varepsilon)\big)$
Equality Constrained Newton: Feasibility of the Iterates

Init: feasible $x_0$ ($x_0 \in \mathrm{dom}(f)$ and $Ax_0 = b$); iterate as on the previous slide: solve the KKT system for $\Delta x_k$, stop if $\lambda(x_k)^2 = \Delta x_k^\top \nabla^2 f(x_k)\, \Delta x_k \le \varepsilon$, set $t_k$ by backtracking linesearch, $x_{k+1} \leftarrow x_k + t_k \Delta x_k$.

Is $x_{k+1}$ feasible?
- $f(x_{k+1}) < \infty$ (otherwise we backtrack)
- Equality constraint: $Ax_{k+1} = Ax_k + t_k A\Delta x_k$, and the KKT system gives $A\Delta x_k = b - Ax_k$
- If $Ax_k = b$, then $A\Delta x_k = b - Ax_k = 0$, so $Ax_{k+1} = b$: feasible
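The feasible-start iteration can be sketched in NumPy. The toy objective $f(x) = \frac{1}{2}\|x\|^2 - \sum_i \log x_i$ with the single constraint $\sum_i x_i = 1$, and all helper names, are my own illustration, not from the slides:

```python
import numpy as np

def f(x):
    # toy self-concordant objective; +inf encodes the implicit domain x > 0
    if np.any(x <= 0):
        return np.inf
    return 0.5 * x @ x - np.sum(np.log(x))

def grad(x):
    return x - 1.0 / x

def hess(x):
    return np.eye(len(x)) + np.diag(1.0 / x**2)

def newton_eq(x, A, b, eps=1e-10, alpha=0.25, beta=0.5):
    # feasible-start equality-constrained Newton: assumes A @ x == b already
    p, n = A.shape
    for _ in range(50):
        g, H = grad(x), hess(x)
        # KKT system: [[H, A^T], [A, 0]] [dx; nu] = [-g; b - A x]
        K = np.block([[H, A.T], [A, np.zeros((p, p))]])
        rhs = np.concatenate([-g, b - A @ x])
        dx = np.linalg.solve(K, rhs)[:n]
        lam2 = dx @ H @ dx                    # Newton decrement squared
        if lam2 <= eps:
            break
        t = 1.0                               # backtracking linesearch on f
        while f(x + t * dx) > f(x) + alpha * t * (g @ dx):
            t *= beta
        x = x + t * dx
    return x

A = np.ones((1, 3)); b = np.array([1.0])
x = newton_eq(np.array([0.5, 0.3, 0.2]), A, b)   # feasible start: entries sum to 1
```

Since $A\Delta x_k = b - Ax_k = 0$ at every feasible iterate, each step stays on the constraint; by symmetry the minimizer here is $x_i = 1/3$.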
Starting at a Feasible Point

Is it always easy to find a feasible $x_0$? (i.e. $f(x_0) < \infty$, $Ax_0 = b$)

E.g. for $\min_x -\sum_{i=1}^n \log(x_i)$ s.t. $Ax = b$, finding a feasible point means finding a feasible point of an LP, which is as hard as LP optimization: we could solve $\min c^\top x$ s.t. $Gx \le h$ by bisection search over $\theta$, each step solving, over $x$, $z \in \mathbb{R}^m$, $t \in \mathbb{R}$:
$\max \sum_{i=1}^m \log(z_i) + \log(t)$ s.t. $Gx + z = h$, $t = \theta - c^\top x$
Infeasible Start Newton

KKT conditions:
- $r_P(x) = Ax - b = 0$
- $r_D(x, \nu) = \nabla f(x) + A^\top \nu = 0$

Linearized about $x = x_k + \Delta x$ and $\nu = \nu_k + \Delta\nu$:
$\begin{bmatrix} \nabla^2 f(x_k) & A^\top \\ A & 0 \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta\nu \end{bmatrix} = -\begin{bmatrix} \nabla f(x_k) + A^\top \nu_k \\ Ax_k - b \end{bmatrix}$

Init: $x_0 \in \mathrm{dom}(f)$ and any $\nu_0$
Iterate:
- Solve the system above for $(\Delta x, \Delta\nu)$
- Set $t_k$ by backtracking linesearch on $\|r\|_2$: while $\|r(x_k + t\Delta x_k, \nu_k + t\Delta\nu_k)\| > (1 - \alpha t)\,\|r(x_k, \nu_k)\|$, set $t \leftarrow \beta t$
- $x_{k+1} \leftarrow x_k + t_k \Delta x_k$, $\nu_{k+1} \leftarrow \nu_k + t_k \Delta\nu_k$

Use $\|r(x, \nu)\|_2 = \|[r_P(x);\, r_D(x, \nu)]\|_2$ as the measure of progress. (See Boyd and Vandenberghe p. 533.)
Infeasible Start Newton: Will $x_k$ Be Feasible?

Same iteration as above, with $r(x, \nu) = [r_P(x);\, r_D(x, \nu)]$; stop if $Ax_k = b$ and $\|r(x_k, \nu_k)\| \le \varepsilon$.

- We will always have $f(x_k) < \infty$ (otherwise we backtrack)
- After a full step, with $t = 1$: $Ax_{k+1} = A(x_k + \Delta x_k) = Ax_k + (b - Ax_k) = b$
- If $Ax_k = b$, then $Ax_{k+1} = A(x_k + t\Delta x_k) = Ax_k + t(b - Ax_k) = b$
- Conclusion: once we enter the quadratic phase (full steps, $t = 1$), $x_k$ is and stays feasible
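A matching infeasible-start sketch on the same toy problem as before (my example, not the slides'): the step now also updates $\nu$, and the backtracking is on the residual norm $\|r\|_2$ rather than on $f$.

```python
import numpy as np

def grad(x):  # gradient of the toy objective (1/2)||x||^2 - sum(log x_i)
    return x - 1.0 / x

def hess(x):
    return np.eye(len(x)) + np.diag(1.0 / x**2)

def residual(x, nu, A, b):
    # stacked residual r = [r_D; r_P] = [grad f + A^T nu; A x - b]
    return np.concatenate([grad(x) + A.T @ nu, A @ x - b])

def infeasible_newton(x, nu, A, b, eps=1e-10, alpha=0.25, beta=0.5):
    p, n = A.shape
    for _ in range(100):
        r = residual(x, nu, A, b)
        if np.allclose(A @ x, b) and np.linalg.norm(r) <= eps:
            break
        K = np.block([[hess(x), A.T], [A, np.zeros((p, p))]])
        d = np.linalg.solve(K, -r)            # primal-dual Newton step
        dx, dnu = d[:n], d[n:]
        t = 1.0                               # backtrack on ||r||, staying in the domain
        while (np.any(x + t * dx <= 0) or
               np.linalg.norm(residual(x + t * dx, nu + t * dnu, A, b))
               > (1 - alpha * t) * np.linalg.norm(r)):
            t *= beta
        x, nu = x + t * dx, nu + t * dnu
    return x, nu

A = np.ones((1, 3)); b = np.array([1.0])
# infeasible start: entries sum to 6, not 1; any nu0 is allowed
x, nu = infeasible_newton(np.array([1.0, 2.0, 3.0]), np.zeros(1), A, b)
```

As the slide argues, the first full step makes the iterate exactly feasible, after which $A\Delta x = 0$ keeps it so; at the solution $x_i = 1/3$, stationarity $x - 1/x + \nu = 0$ gives $\nu = 8/3$.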
Equality Constrained Optimization: Summary

$\min f_0(x)$ s.t. $Ax = b$, with $A \in \mathbb{R}^{p \times n}$, $b \in \mathbb{R}^p$

Using access to: 2nd-order oracle for $f_0$, $x \mapsto (f_0(x), \nabla f_0(x), \nabla^2 f_0(x))$; explicit access to $A, b$; $x_0 \in \mathrm{dom}(f_0)$ with $Ax_0 = b$.

Analysis: if $f_0$ is convex and self-concordant, then a feasible $\varepsilon$-suboptimal solution is reached at $k = O\big(f_0(x_0) - f_0(x^\star) + \log\log(1/\varepsilon)\big)$.
Adding Inequality Constraints

$\min f_0(x)$ s.t. $f_i(x) \le 0$, $i = 1..m$, and $Ax = b$, with $A \in \mathbb{R}^{p \times n}$, $b \in \mathbb{R}^p$

Using access to:
- 2nd-order oracles for $f_0$ and each $f_i$: $x \mapsto (f_0(x), \nabla f_0(x), \nabla^2 f_0(x))$, $x \mapsto (f_i(x), \nabla f_i(x), \nabla^2 f_i(x))$
- Explicit access to $A, b$
- $x_0 \in \mathrm{dom}(f_0) \cap \bigcap_i \mathrm{dom}(f_i)$ (but for now, assume access to a feasible $x_0$)
Optimization with Implicit Constraints

We already saw implicit constraints, e.g. $\min_x f_0(x) - \sum_{i=1}^n \log(x_i)$, which implicitly requires $x > 0$.

Generally not a problem for descent methods: if $x_k \in \mathrm{int}\,\mathrm{dom}(f_0)$, the linesearch will find $x_{k+1} = x_k + t_k \Delta x_k \in \mathrm{dom}(f_0)$. In fact, we never leave the sublevel set $S = \{x \mid f_0(x) \le f_0(x_0)\} \subseteq \mathrm{dom}(f_0)$.

Runtime: with gradient descent, $k$ depends on the conditioning of $f_0$ over $S$, e.g. $\max_{x \in S} \|\nabla^2 f_0(x)\|$; but with Newton, for self-concordant $f_0$: $k \propto f_0(x_0) - f_0(x^\star)$.

[Figure: plot of barrier-type functions, e.g. $-\log x$.]
Making Constraints Implicit

$\min f_0(x)$ s.t. $f_i(x) \le 0$, $Ax = b$

$\Longleftrightarrow\ \min f_0(x) + \sum_{i=1}^m I(f_i(x))$ s.t. $Ax = b$, where $I(u) = \begin{cases} 0 & u \le 0 \\ \infty & u > 0 \end{cases}$

$\approx\ \min f_0(x) + \sum_{i=1}^m I_t(f_i(x))$ s.t. $Ax = b$, where $I_t(u) = \begin{cases} -\frac{1}{t}\log(-u) & u < 0 \\ \infty & u \ge 0 \end{cases}$

[Figure: $I_1, I_2, I_3$ approaching the indicator $I$ as $t$ increases.]
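A quick numeric check of how $I_t$ approximates the indicator $I$ (the sample points are my own):

```python
import numpy as np

def I_t(u, t):
    # log-barrier approximation of the indicator I(u) = 0 if u <= 0, inf otherwise
    return -np.log(-u) / t if u < 0 else np.inf

# at a strictly feasible point the penalty shrinks toward I(-0.5) = 0 as t grows;
# at an infeasible point I_t agrees with I exactly
vals = [I_t(-0.5, t) for t in (1, 10, 100)]
```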
Barrier Methods

$\min f_0(x) + \sum_{i=1}^m I_t(f_i(x))$ s.t. $Ax = b$

Could use other barriers or penalties. Log barrier $I_t(u) = -\frac{1}{t}\log(-u)$:
- Clean analysis with good guarantees
- Connection with the dual (as we will see)
- Works well in practice

[Figure: barrier function vs. penalty function.]
Log Barrier

$x^\star(t) = \arg\min_x f_0(x) - \frac{1}{t}\sum_{i=1}^m \log(-f_i(x))$ s.t. $Ax = b$

Central path: $\{x^\star(t) \mid t > 0\}$. Each $x^\star(t)$ is strictly feasible for the original (constrained) problem: $f_i(x^\star(t)) < 0$, $Ax^\star(t) = b$. We will show: $x^\star(t) \to x^\star$.

- How suboptimal (for the original problem) is $x^\star(t)$?
- How do we set $t$?
- What's the complexity of optimizing with that $t$?
Log Barrier: How Suboptimal Is $x^\star(t)$?

Barrier problem: $\min f_0(x) - \frac{1}{t}\sum_{i=1}^m \log(-f_i(x))$ s.t. $Ax = b$; original problem: $\min f_0(x)$ s.t. $f_i(x) \le 0$, $Ax = b$.

$L_t(x, \nu) = f_0(x) - \frac{1}{t}\sum_i \log(-f_i(x)) + \langle \nu, Ax - b \rangle$

At the barrier optimum $x^\star(t)$ with dual optimum $\nu^\star(t)$:
$0 = \nabla_x L_t(x^\star(t), \nu^\star(t)) = \nabla f_0(x^\star(t)) + \sum_i \frac{-1}{t f_i(x^\star(t))} \nabla f_i(x^\star(t)) + A^\top \nu^\star(t)$

Define $\lambda^\star_i(t) = \frac{-1}{t f_i(x^\star(t))} > 0$. For the original Lagrangian $L(x, \lambda, \nu) = f_0(x) + \sum_i \lambda_i f_i(x) + \langle \nu, Ax - b \rangle$, this says $\nabla_x L(x^\star(t), \lambda^\star(t), \nu^\star(t)) = 0$, so the infimum defining the dual is attained at $x^\star(t)$:
$g(\lambda^\star(t), \nu^\star(t)) = \inf_x L(x, \lambda^\star(t), \nu^\star(t)) = L(x^\star(t), \lambda^\star(t), \nu^\star(t)) = f_0(x^\star(t)) + \sum_i \frac{-1}{t f_i(x^\star(t))} f_i(x^\star(t)) + \langle \nu^\star(t), Ax^\star(t) - b \rangle = f_0(x^\star(t)) - \frac{m}{t}$

Conclusion: $x^\star(t)$ is strictly feasible, $(\lambda^\star(t), \nu^\star(t))$ is (strictly) dual feasible, and $f_0(x^\star(t)) - g(\lambda^\star(t), \nu^\star(t)) = \frac{m}{t}$.
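The $m/t$ gap can be checked on a toy problem where the central path has a closed form, $\min c^\top x$ s.t. $-x_i \le 0$ with no equality constraints (the example and all values are mine, not the slides'):

```python
import numpy as np

c = np.array([2.0, 3.0, 5.0])     # min c^T x s.t. x >= 0; optimal value p* = 0
m, t = len(c), 10.0

# barrier problem: min c^T x - (1/t) sum log(x_i); gradient c - 1/(t x) = 0
x_t = 1.0 / (t * c)               # closed-form central-path point x*(t)

lam = -1.0 / (t * (-x_t))         # lambda_i*(t) = -1/(t f_i(x*(t))), here = c_i
# g(lam) = inf_x (c - lam)^T x = 0 since lam == c, so the duality gap is:
gap = c @ x_t - 0.0
```

Here $f_0(x^\star(t)) = \sum_i c_i/(t c_i) = m/t$ while $p^\star = 0$, so the gap is exactly $m/t$.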
Log Barrier: Sub-Optimality

$x^\star(t)$ is $\frac{m}{t}$-suboptimal for the original problem. To get $\varepsilon$-suboptimal, set $t = m/\varepsilon$ and solve using Newton. Runtime? Number of Newton iterations?

Assumptions:
- $f_0(x)$ is self-concordant
- $f_i(x)$ is quadratic (or linear), so $-\log(-f_i(x))$ is self-concordant

The barrier objective is not self-concordant, but scaled by $t$ it is: $\tilde f(x) = t f_0(x) - \sum_{i=1}^m \log(-f_i(x))$

#Newton iterations:
$k \propto \tilde f(x_0) - \tilde f(x^\star(t)) = t\big(f_0(x_0) - f_0(x^\star(t))\big) + \sum_i \log\frac{f_i(x^\star(t))}{f_i(x_0)} \le \frac{m}{\varepsilon}\big(f_0(x_0) - f_0(x^\star)\big) + \sum_i \log\frac{f_i(x^\star(t))}{f_i(x_0)}$

where the second term is typically small if $x_0$ is not too close to the boundary.

Overall runtime: $O\big(\frac{m}{\varepsilon}(n+p)^3\big)$
Central Path Log Barrier Method

Init: feasible $x^{(0)}$ and some $t^{(0)}$
Do:
- Solve the $t^{(k)}$-barrier problem using Newton, starting at $x^{(k)}$; $x^{(k+1)} \leftarrow x^\star(t^{(k)})$
- Stop if $\frac{m}{t^{(k)}} \le \epsilon$
- $t^{(k+1)} \leftarrow \mu t^{(k)}$ (for some parameter $\mu > 1$)

When do we stop Newton? Why are the nested loops OK? Stop at machine precision; OK since Newton gets there quickly.

Sketch of analysis:
- #outer steps to reach $\frac{m}{t} \le \epsilon$: $\frac{\log(m/(\epsilon t^{(0)}))}{\log \mu}$
- #Newton iterations to optimize each barrier problem: $\propto t^{(k+1)}\big(f_0(x^\star(t^{(k)})) - f_0(x^\star(t^{(k+1)}))\big) + \sum_i \log\frac{f_i(x^\star(t^{(k+1)}))}{f_i(x^\star(t^{(k)}))} \le m(\mu - 1 - \log\mu)$
- Using $\mu = 1 + \frac{1}{\sqrt{m}}$: $m(\mu - 1 - \log\mu) = O(1)$ per centering, for $O\big(\sqrt{m}\,\log(m/\epsilon)\big)$ Newton iterations overall
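The outer loop above can be sketched with the inner Newton solve replaced by the closed-form centering point of the earlier toy problem, $\min c^\top x$ s.t. $x \ge 0$ (the example and parameter values are mine):

```python
import math
import numpy as np

c = np.array([2.0, 3.0, 5.0])
m, eps, mu = len(c), 1e-6, 10.0
t, outer = 1.0, 0

x = 1.0 / (t * c)              # x*(t^(0)): centering step, here in closed form
while m / t > eps:             # stop once the m/t duality-gap certificate is small
    t *= mu                    # t^(k+1) = mu * t^(k)
    x = 1.0 / (t * c)          # "inner loop": exact here, normally Newton
    outer += 1

# outer-iteration count matches ceil(log(m / (eps * t0)) / log(mu)) with t0 = 1
predicted = math.ceil(math.log(m / eps) / math.log(mu))
```

With an aggressive $\mu = 10$ the analysis above would charge more Newton iterations per centering; the $\mu = 1 + 1/\sqrt{m}$ choice trades more outer steps for $O(1)$ Newton iterations each.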
Central Path Log Barrier Method: Summary

Init: feasible $x^{(0)}$ and some $t^{(0)}$
Do:
- Solve the $t^{(k)}$-barrier problem using Newton, starting at $x^{(k)}$; $x^{(k+1)} \leftarrow x^\star(t^{(k)})$
- Stop if $\frac{m}{t^{(k)}} \le \epsilon$
- $t^{(k+1)} \leftarrow \mu t^{(k)}$ (for some parameter $\mu > 1$)

Access to: 2nd-order oracles for $f_0, f_i$; explicit access to $A, b$; strictly feasible point $x^{(0)}$.

Assumptions:
- $f_0$ convex and self-concordant
- $f_i$ convex quadratic (or linear)
- $x^{(0)}$ strictly feasible with $f_i(x^{(0)}) < -\delta$

Overall #Newton iterations: $O\big(\sqrt{m}\,(\log(1/\epsilon) + \log\log(1/\delta))\big)$
Overall runtime: $O\big(\sqrt{m}\,\big((n+p)^3 + m \text{ 2nd-order oracle evals}\big)\log(1/\epsilon)\big)$

[Photos: Arkadi Nemirovski, John von Neumann, Narendra Karmarkar, Yuri Nesterov.]