
Agenda: Interior Point Methods

1. Barrier functions
2. Analytic center
3. Central path
4. Barrier method
5. Primal-dual path following algorithms
6. Nesterov-Todd scaling
7. Complexity analysis

Interior point methods

Primal (P):   minimize c^T x   subject to Gx + s = h, Ax = b, s ⪰_K 0
Dual (D):     maximize −h^T z − b^T y   subject to A^T y + G^T z + c = 0, z ⪰_K 0

WLOG (why?), work with [G; A] of full column rank.

Interior point methods (IPMs): maintain primal and dual strict feasibility while working toward complementary slackness:
- (x_k, s_k) primal feasible with s_k ≻ 0
- (y_k, z_k) dual feasible with z_k ≻ 0
- ⟨z_k, s_k⟩ → 0

Main idea

    minimize   t c^T x + ϕ(s)
    subject to [G I; A 0] [x; s] = [h; b]

ϕ is a barrier function defined on int(K) with the following properties:
- strictly convex
- analytic
- self-concordant
- blows up when s approaches ∂K

For each t, there is a unique minimizer (x*(t), s*(t)) [requires a tiny bit of thought]. Limiting points as t → ∞ are primal-optimal solutions. The smooth curve (x*(t), s*(t)) is usually called the central path.

Canonical cones and canonical barriers

1. K = R^n_+:
    ϕ(x) = −Σ_{i=1}^n log x_i
    ∇ϕ(x) = −(1/x_1, …, 1/x_n)
    ∇²ϕ(x) = diag(1/x_1², …, 1/x_n²)

2. K = L = {(x_1, x_2, …, x_n) : ||(x_1, …, x_{n−1})||_2 ≤ x_n} (Lorentz cone):
    ϕ(x) = −(1/2) log(x^T J x)
    ∇ϕ(x) = −Jx / (x^T J x)
    ∇²ϕ(x) = −J / (x^T J x) + 2 (Jx)(Jx)^T / (x^T J x)²
    with J = [−I 0; 0 1]
    Note [∇²ϕ(x)]^{−1} = 2 x x^T − (x^T J x) J
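A small numerical sanity check of the Lorentz-cone formulas above (with the sign conventions just stated); the test point is an arbitrary interior point of the cone.

    import numpy as np

    n = 4
    J = np.diag(np.r_[-np.ones(n - 1), 1.0])        # J = diag(-I, 1)

    def phi(x):                                     # Lorentz-cone barrier, theta = 1
        return -0.5 * np.log(x @ J @ x)

    def grad(x):
        return -J @ x / (x @ J @ x)

    def hess(x):
        q = x @ J @ x
        return -J / q + 2 * np.outer(J @ x, J @ x) / q**2

    x = np.array([0.3, -0.2, 0.1, 1.0])             # strictly inside: ||x_bar|| < x_n
    eps = 1e-6                                      # finite-difference check of the gradient
    fd = np.array([(phi(x + eps*e) - phi(x - eps*e)) / (2*eps) for e in np.eye(n)])
    print(np.allclose(fd, grad(x), atol=1e-5))
    # closed-form inverse of the Hessian
    print(np.allclose(np.linalg.inv(hess(x)), 2*np.outer(x, x) - (x @ J @ x)*J))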

3. K = S^n_+:
    ϕ(X) = −log det X
    ∇ϕ(X) = −X^{−1}
    ∇²ϕ(X)[H] = X^{−1} H X^{−1}
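A similar check for the S^n_+ barrier, comparing the formula ∇²ϕ(X)[H] = X^{−1} H X^{−1} with a finite difference of ∇ϕ(X) = −X^{−1}; the random data are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))
    X = A @ A.T + 5 * np.eye(5)                       # a point in the interior of S^n_+
    H = rng.standard_normal((5, 5)); H = (H + H.T) / 2

    # directional derivative of grad(phi)(X) = -inv(X) along H
    eps = 1e-6
    fd = (-np.linalg.inv(X + eps*H) + np.linalg.inv(X - eps*H)) / (2*eps)
    Xi = np.linalg.inv(X)
    print(np.allclose(fd, Xi @ H @ Xi, atol=1e-6))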

Self-concordance

The barriers (1), (2), (3) are strictly convex and self-concordant.

Implication of self-concordance: Newton's method is extremely effective at minimizing smooth, convex, self-concordant objectives.

Barrier function for composite cones

Product cone K = K_1 × ⋯ × K_m, x = (x_1, …, x_m), x_i ∈ K_i
Barrier: ϕ(x) = Σ_i ϕ_i(x_i)
Each ϕ_i self-concordant ⟹ ϕ self-concordant

Properties of barrier functions: generalized logarithm

(i) ϕ(tx) = ϕ(x) − θ(ϕ) log t, for t > 0
    θ(ϕ) = n for R^n_+, θ(ϕ) = 1 for L, θ(ϕ) = n for S^n_+

Further properties following from (i):
(ii)  ⟨∇ϕ(x), x⟩ = −θ(ϕ)
(iii) [∇²ϕ(x)] x = −∇ϕ(x)
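A quick numerical check of (i)-(iii) for the R^n_+ barrier; the point x and the scalar t are arbitrary.

    import numpy as np

    x = np.array([0.5, 1.2, 3.0]); t = 2.7; n = len(x)
    phi  = lambda v: -np.sum(np.log(v))
    grad = lambda v: -1.0 / v
    hess = lambda v: np.diag(1.0 / v**2)

    print(np.isclose(phi(t*x), phi(x) - n*np.log(t)))   # (i) with theta = n
    print(np.isclose(grad(x) @ x, -n))                  # (ii)
    print(np.allclose(hess(x) @ x, -grad(x)))           # (iii)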

Barriers are self-dual

K: cone product in which each component is an LP, SOCP, or SDP cone (so K = K*)

For every x ∈ int(K), −∇ϕ(x) ∈ int(K)

The mapping
    int(K) → int(K),  x ↦ −∇ϕ(x)
is self-inverse and homogeneous of degree −1:
    −∇ϕ(−∇ϕ(x)) = x,   −∇ϕ(tx) = t^{−1} (−∇ϕ(x))   for x ∈ int(K), t > 0

Analytic center

    minimize   ϕ(s)
    subject to [G I; A 0] [x; s] = [h; b]

- Convex program
- Solution is strictly feasible
- Unique solution (x*, s*)

Computing the analytic center: Newton's method + line search

(P')  minimize f(x)   (convex)
      subject to Ax = b

Pure Newton's method: sequence {x_k}, k = 0, 1, 2, …
Input: x_0 feasible
Repeat
    x_{k+1} = argmin_{Ax=b} [ f(x_k) + ⟨∇f(x_k), x − x_k⟩ + (1/2)(x − x_k)^T ∇²f(x_k) (x − x_k) ]
until convergence

With v = x_{k+1} − x_k, this boils down to

    minimize   ∇f(x_k)^T v + (1/2) v^T ∇²f(x_k) v
    subject to Av = 0

Optimality conditions
    ∇f(x_k) + ∇²f(x_k) v + A^T λ = 0,   Av = 0
or in matrix form
    [∇²f(x_k)  A^T; A  0] [v; λ] = [−∇f(x_k); 0]
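A minimal numpy sketch of one such equality-constrained Newton step, obtained by assembling and solving the KKT system above; the objective, constraint data, and test point are illustrative placeholders.

    import numpy as np

    def newton_step(grad_f, hess_f, A, x):
        # Solve [H A^T; A 0][v; lam] = [-g; 0] for the Newton direction v.
        g, H = grad_f(x), hess_f(x)
        p, n = A.shape
        KKT = np.block([[H, A.T], [A, np.zeros((p, p))]])
        rhs = np.concatenate([-g, np.zeros(p)])
        return np.linalg.solve(KKT, rhs)[:n]            # v satisfies A v = 0

    # toy example: analytic center of {x > 0 : A x = b}, with f(x) = -sum(log x)
    A = np.array([[1.0, 1.0, 1.0]]); x = np.array([0.2, 0.3, 0.5])   # A x = 1
    v = newton_step(lambda x: -1/x, lambda x: np.diag(1/x**2), A, x)
    print(v, A @ v)                                     # A v should be ~ 0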

Problem: a full Newton step can take us outside the feasible set (outside dom f)

Solution: line search, x_{k+1} = x_k + t v
- Exact line search: t̂ = argmin_t f(x_k + t v)
- Backtracking line search: pick 0 < α, β < 1 and
      while f(x + tv) > f(x) + α t ⟨∇f(x), v⟩ do t := βt
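A short sketch of the backtracking rule above; the starting step t = 1 and the values α = 0.25, β = 0.5 are illustrative choices, and f is expected to return +∞ outside its domain so the loop also backs off infeasible trial points.

    import numpy as np

    def backtracking(f, grad_f, x, v, alpha=0.25, beta=0.5):
        # Shrink t until the sufficient-decrease condition above holds.
        # f should return np.inf outside its domain.
        t = 1.0
        fx, slope = f(x), grad_f(x) @ v
        while f(x + t * v) > fx + alpha * t * slope:
            t *= beta
        return t

    # usage with the barrier f(x) = -sum(log x) on the positive orthant
    f = lambda x: -np.sum(np.log(x)) if np.all(x > 0) else np.inf
    g = lambda x: -1.0 / x
    x = np.array([1.0, 2.0]); v = np.array([4.0, -3.0])   # descent direction; full step leaves dom f
    print(backtracking(f, g, x, v))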

Complexity analysis

f: convex and self-concordant. Repeat until convergence:
(1) compute the Newton direction v
(2) compute t̂ from the line search
(3) update x := x + t̂ v

Theorem. Assume ε < 1/2. Then f(x_k) − f* ≤ ε provided
    k ≥ (f(x_0) − f*)/γ + log₂ log₂(1/ε),
where γ > 0 is a constant depending only on the line-search parameters. For practical purposes, log₂ log₂(1/ε) is constant, e.g. 5.

This lecture: K = K* (symmetry)

(P)  minimize   c^T x
     subject to Gx + s = h, Ax = b, s ⪰ 0

(D)  maximize   −h^T z − b^T y
     subject to A^T y + G^T z + c = 0, z ⪰ 0

(same cone in (P) and (D))

Central path

    minimize   ⟨c, x⟩ + t^{−1} ϕ(s)
    subject to Gx + s = h, Ax = b

Optimality conditions:
- (x, s) feasible and s ≻ 0
- there exists (y, z) with c + A^T y + G^T z = 0 (dual feasible) and z ≻ 0
- t^{−1} ∇ϕ(s) = −z

Important consequence: compare with the optimality conditions at t = ∞
- (x, s) primal feasible
- (y, z) dual feasible
- (s, z) ⪰ 0
- complementary slackness: ⟨s, z⟩ = 0

Central path:
- (x, s) primal feasible
- (y, z) dual feasible
- (s, z) ≻ 0
- relaxed complementary slackness: ⟨s, z⟩ = θ(K)/t

Dual central path

(D)       maximize   −h^T z − b^T y
          subject to A^T y + G^T z + c = 0, z ⪰ 0

(Dual CP) minimize   h^T z + b^T y + t^{−1} ϕ(z)
          subject to A^T y + G^T z + c = 0

Theorem. The primal and dual central paths are linked via
    z*(t) = −t^{−1} ∇ϕ(s*(t)),   s*(t) = −t^{−1} ∇ϕ(z*(t))
There is only one central path (s*(t), z*(t)), and ⟨s*(t), z*(t)⟩ = θ(K)/t.

Proof

(x, s) on the central path:
    Gx + s = h, Ax = b, s ≻ 0
    ∃ (y, z): A^T y + G^T z + c = 0,  t^{−1} ∇ϕ(s) = −z

Dual central path:
    minimize   h^T z + b^T y + t^{−1} ϕ(z)
    subject to A^T y + G^T z + c = 0

Lagrangian: h^T z + b^T y + t^{−1} ϕ(z) − x^T (A^T y + G^T z + c)

Optimality conditions: (y, z) on the dual central path iff
    A^T y + G^T z + c = 0, z ≻ 0
    ∃ x: b − Ax = 0 and s := h − Gx = −t^{−1} ∇ϕ(z)

Since z = −t^{−1} ∇ϕ(s) ⟺ s = −t^{−1} ∇ϕ(z), the two characterizations coincide: there is a unique central path.

⟨s, z⟩ = ⟨s, −t^{−1} ∇ϕ(s)⟩ = θ(K)/t   (using property (ii))

Characterization of the central path

(CP1) s*(t) strictly feasible
(CP2) z*(t) strictly feasible
(CP3) augmented complementary slackness: z*(t) = −t^{−1} ∇ϕ(s*(t))

In the case of SDP:
    t Z*(t) = [S*(t)]^{−1}  ⟹  Z*(t) S*(t) = t^{−1} I  ⟹  trace(Z*(t) S*(t)) = n/t

(CP1)-(CP2)-(CP3) fully characterize the central path.

Duality gap along the central path

    c^T x + b^T y + h^T z = −y^T Ax − z^T Gx + b^T y + h^T z = (h − Gx)^T z = s^T z = θ(K)/t

Proposition. The duality gap along the central path is t^{−1} θ(K). In particular,
    c^T x − p* ≤ θ(K)/t   and   d* + b^T y + h^T z ≤ θ(K)/t

Therefore, as t → ∞,
    (x(t), s(t)) → optimal primal solution,   (y(t), z(t)) → optimal dual solution.

Path following algorithm

- Start with t = t_0 and (x*(t_0), s*(t_0))
- Increase t to t_1 > t_0 and compute (x*(t_1), s*(t_1)) using Newton's method initialized at (x*(t_0), s*(t_0))
- Few Newton iterations are needed because we may already be inside the region of quadratic convergence

Barrier method

    minimize   c^T x
    subject to Gx + s = h, Ax = b, s ⪰ 0

Given strictly feasible (x, s), t = t_0 > 0, µ > 1 and tol > 0, repeat:
1. Centering step: compute (x*(t), s*(t)) by solving
       minimize   t c^T x + ϕ(s)
       subject to Gx + s = h, Ax = b
2. Update (x, s) := (x*(t), s*(t))
3. Quit if θ(K)/t < tol
4. Increase t := µ t
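As an illustration of the loop above, here is a minimal numpy sketch of the barrier method for the LP case K = R^m_+ with inequality constraints only (Gx ⪯ h; the equality constraints Ax = b are dropped for brevity), using damped Newton centering with backtracking. All function names and the toy data are illustrative, not from the slides.

    import numpy as np

    def centering(c, G, h, x, t, tol=1e-8, max_iter=50):
        # Damped Newton for  min  t*c@x - sum(log(h - G@x))
        for _ in range(max_iter):
            s = h - G @ x
            grad = t * c + G.T @ (1.0 / s)
            hess = G.T @ np.diag(1.0 / s**2) @ G
            v = -np.linalg.solve(hess, grad)
            lam2 = -grad @ v                           # Newton decrement squared
            if lam2 / 2 <= tol:
                return x
            def f(y):                                  # +inf outside the domain
                sy = h - G @ y
                return np.inf if np.any(sy <= 0) else t * c @ y - np.sum(np.log(sy))
            step = 1.0                                 # backtracking line search
            while f(x + step * v) > f(x) + 0.25 * step * (grad @ v):
                step *= 0.5
            x = x + step * v
        return x

    def barrier_method(c, G, h, x0, t0=1.0, mu=10.0, tol=1e-6):
        x, t, m = x0, t0, len(h)
        while m / t >= tol:                            # duality gap bound theta(K)/t = m/t
            x = centering(c, G, h, x, t)
            t *= mu
        return x

    # toy LP:  minimize c^T x  subject to  0 <= x <= 1  (strictly feasible start x = 0.5)
    n = 3
    c = np.array([1.0, -2.0, 0.5])
    G = np.vstack([np.eye(n), -np.eye(n)])
    h = np.concatenate([np.ones(n), np.zeros(n)])
    print(barrier_method(c, G, h, 0.5 * np.ones(n)))   # approximately (0, 1, 0)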

Primal-dual path following methods

- Closely related to barrier methods
- Follow the central path to find approximate solutions
- Steps are computed by linearizing the central path equations

Central path:
    Gx + s = h
    Ax = b
    A^T y + G^T z + c = 0
    (s, z) ≻ 0
    z = −(1/t) ∇ϕ(s)

e.g. SDP: G_t(S, Z) := Z − (1/t) S^{−1} = 0

Main idea: from (t, s, z), update to (t_+, s_+, z_+)

(i) Write the last central-path condition as an equivalent system Ḡ_t(s, z) = 0
(ii) Choose t_+ > t and linearize:
    Ḡ_{t_+}(s + Δs, z + Δz) ≈ Ḡ_{t_+}(s, z) + ∇_s Ḡ_{t_+} Δs + ∇_z Ḡ_{t_+} Δz = 0

Suppose the current guess is feasible.

(iii) Solve the system
    G Δx + Δs = 0
    A Δx = 0
    A^T Δy + G^T Δz = 0
    ∇_s Ḡ_{t_+} Δs + ∇_z Ḡ_{t_+} Δz = −Ḡ_{t_+}(s, z)
and update
    s_+ = s + α Δs,   z_+ = z + β Δz

Symmetrization: how do we construct the system Ḡ_t(s, z) = 0?

SDP:  Z = (1/t) S^{−1}  ⟺  ZS = (1/t) I  ⟺  SZ = (1/t) I

Popular approach: make the system symmetric in S and Z:
    (1/2)(SZ + ZS) = (1/t) I

Fact [requires some thought]: for (S, Z) ≻ 0,
    (1/t) S^{−1} = Z  ⟺  (1/2)(SZ + ZS) = (1/t) I

Leads to the Alizadeh-Haeberly-Overton search direction and the "SZ + ZS" primal-dual path following method.

Other symmetrizations

LP: with (s, z) ≻ 0 and s ∘ z := (s_i z_i)_{i=1,…,m},
    z = −(1/t) ∇ϕ(s)  ⟺  s ∘ z = (1/t) 1

SOCP: L = {x = (x̄, x_n) ∈ R^n : ||x̄|| ≤ x_n}
    ϕ(x) = −(1/2) log D_x,   ∇ϕ(x) = (1/D_x) (x̄, −x_n),   D_x = x_n² − ||x̄||²

Then
    z = −(1/t) ∇ϕ(s)  ⟺  { t z̄ = −s̄/D_s,  t z_n = s_n/D_s }  ⟺  { z̄ s_n + z_n s̄ = 0,  t z_n = s_n/D_s }
where the second equivalence follows from 1/D_s = t z_n / s_n. Since
    t ⟨z̄, s̄⟩ + t z_n s_n = (s_n² − ||s̄||²)/D_s = 1,
we have
    z = −(1/t) ∇ϕ(s)  ⟺  { z̄ s_n + z_n s̄ = 0,  ⟨s̄, z̄⟩ + s_n z_n = 1/t }

Scaling

Idea for SDP: take any Q ≻ 0. Then
    Z = (1/t) S^{−1}  ⟺  SZ = (1/t) I  ⟺  Q S Z Q^{−1} = (1/t) I
    Z = (1/t) S^{−1}  ⟺  ZS = (1/t) I  ⟺  Q^{−1} Z S Q = (1/t) I
so on the central path
    (1/2) [Q S Z Q^{−1} + Q^{−1} Z S Q] = (1/t) I

- complete freedom in choosing Q
- Q can vary from one iteration to the next

Change of coordinates

    (1/2) [ (QSQ)(Q^{−1}ZQ^{−1}) + (Q^{−1}ZQ^{−1})(QSQ) ] = (1/t) I

Change of coordinates:
    S̃ = Q S Q ≻ 0,   Z̃ = Q^{−1} Z Q^{−1} ≻ 0
- preserves the positive definite cone
- preserves the central path

Convergence analysis is simplified considerably when, at each iteration, Q is chosen such that S̃ and Z̃ commute, where (S, Z) are the iterates about to be updated.

Nesterov-Todd scaling: choose the scaling so that S̃ = Z̃

General scaling: for (S, Z) ≻ 0, impose the central path equation in scaled variables, G_t(S̃, Z̃) = 0.

W is a scaling matrix if multiplication by W and by W^{−T}
- preserves the cone
- preserves the central path: (S, Z) on CP  ⟺  (W^{−T} S, W Z) on CP

Example K = S^n_+: for a matrix Q, define the linear maps
    W(X) = Q X Q^T,   W^T(X) = Q^T X Q,   W^{−T}(X) = Q^{−T} X Q^{−1}
For a positive scaling Q ≻ 0: W(X) = Q X Q, W^{−T}(X) = Q^{−1} X Q^{−1}; W is a scaling matrix (preserves the cone and the central path).

Nesterov-Todd scaling

- Used in SeDuMi and SDPT3
- W associated with ŝ, ẑ is chosen such that
      W^{−T} ŝ = W ẑ = λ,
  which implies ⟨ŝ, ẑ⟩ = ⟨λ, λ⟩
- W^{−T} W^{−1} = ∇²ϕ(w), where w is the unique point obeying ∇²ϕ(w) ŝ = ẑ

NT scaling for S^n_+

Positive scaling Q ≻ 0:
    W(X) = Q X Q,   W^{−T}(X) = Q^{−1} X Q^{−1},   W^{−T} W^{−1}(X) = Q^{−2} X Q^{−2}
Since [∇²ϕ(P)] X = P^{−1} X P^{−1}, take
    Q = P^{1/2},  where P^{−1} Ŝ P^{−1} = Ẑ,  i.e.
    P = Ŝ^{1/2} (Ŝ^{1/2} Ẑ Ŝ^{1/2})^{−1/2} Ŝ^{1/2}
Can be computed by Cholesky or SVD/eigenvalue computations.
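A small numerical check of these formulas on random data (assuming the convention λ = Wẑ = W^{−T}ŝ used above): compute P, set Q = P^{1/2}, and verify that Q^{−1} Ŝ Q^{−1} = Q Ẑ Q and ⟨Ŝ, Ẑ⟩ = ⟨λ, λ⟩.

    import numpy as np

    def sqrtm_psd(M):
        # symmetric PSD matrix square root via eigendecomposition
        w, V = np.linalg.eigh(M)
        return (V * np.sqrt(w)) @ V.T

    rng = np.random.default_rng(1)
    def rand_pd(n):
        A = rng.standard_normal((n, n)); return A @ A.T + n * np.eye(n)

    S, Z = rand_pd(4), rand_pd(4)
    Shalf = sqrtm_psd(S)
    P = Shalf @ np.linalg.inv(sqrtm_psd(Shalf @ Z @ Shalf)) @ Shalf   # NT scaling point
    Q = sqrtm_psd(P)
    Qi = np.linalg.inv(Q)

    print(np.allclose(np.linalg.inv(P) @ S @ np.linalg.inv(P), Z))    # Hess(phi)(P) S = Z
    lam1, lam2 = Qi @ S @ Qi, Q @ Z @ Q                               # W^{-T} S and W Z
    print(np.allclose(lam1, lam2))                                    # both equal lambda
    print(np.isclose(np.trace(S @ Z), np.trace(lam1 @ lam1)))         # <S,Z> = <lam,lam>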

NT scaling for R^n_+

Positive diagonal scaling: W = diag(w_i) with
    (1/w_i) ŝ_i = w_i ẑ_i   ⟹   w_i = (ŝ_i / ẑ_i)^{1/2}
    λ = W^{−T} ŝ = W ẑ = ( (ŝ_i ẑ_i)^{1/2} )_i

NT scaling for the Lorentz cone (Ben-Tal and Nemirovski, Chapter 6.8)

    W = β (2 v v^T − J),   v = (w + e_n) / (2(w_n + 1))^{1/2}

    w = (1/(2γ)) [ ŝ / (ŝ^T J ŝ)^{1/2} + J ẑ / (ẑ^T J ẑ)^{1/2} ]

    γ = [ 1/2 + ŝ^T ẑ / (2 (ŝ^T J ŝ)^{1/2} (ẑ^T J ẑ)^{1/2}) ]^{1/2}

    β = [ ŝ^T J ŝ / (ẑ^T J ẑ) ]^{1/4}
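A numerical check of the Lorentz-cone scaling above (again assuming the convention λ = Wẑ = W^{−T}ŝ and J = diag(−I, 1)); the two test points are arbitrary interior points of the cone.

    import numpy as np

    n = 4
    J = np.diag(np.r_[-np.ones(n - 1), 1.0])
    e_n = np.eye(n)[-1]

    def nt_scaling(s, z):
        qs, qz = s @ J @ s, z @ J @ z                  # both positive inside the cone
        gamma = np.sqrt(0.5 + (s @ z) / (2 * np.sqrt(qs * qz)))
        w = (s / np.sqrt(qs) + J @ z / np.sqrt(qz)) / (2 * gamma)
        v = (w + e_n) / np.sqrt(2 * (w[-1] + 1))
        beta = (qs / qz) ** 0.25
        return beta * (2 * np.outer(v, v) - J)

    s = np.array([0.3, -0.1, 0.4, 1.0])                # ||s_bar|| < s_n
    z = np.array([-0.2, 0.5, 0.1, 2.0])                # ||z_bar|| < z_n
    W = nt_scaling(s, z)
    lam = W @ z
    print(np.allclose(np.linalg.solve(W, s), lam))     # W^{-T} s = W z = lambda
    print(np.isclose(s @ z, lam @ lam))                # <s,z> = <lambda,lambda>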

Basic primal-dual update

Current iterates (x̂, ŝ) and (ŷ, ẑ) with ŝ ≻ 0, ẑ ≻ 0.

1. Set t such that ŝ^T ẑ = θ(K)/t, and compute the scaling W for (ŝ, ẑ)
2. Choose t_+ = µ t (µ > 1)
3. Solve the KKT system obtained by linearizing the central path equations
       Gx + s = h,  Ax = b,  A^T y + G^T z + c = 0,  Ḡ_{t_+}(W^{−T} s, W z) = 0
   around (x̂, ŝ, ŷ, ẑ)
4. Update
       (x_+, s_+) = (x̂, ŝ) + α_p (Δx, Δs)
       (y_+, z_+) = (ŷ, ẑ) + α_d (Δy, Δz)
   with step sizes chosen so that positivity is preserved: s_+ ≻ 0, z_+ ≻ 0

Linearized CP equations

Residual:
    r := ( Gx̂ + ŝ − h,  Ax̂ − b,  A^T ŷ + G^T ẑ + c )   (r = 0 if strictly feasible)

Linear system:
    ( G Δx + Δs,  A Δx,  A^T Δy + G^T Δz ) = −r
and
    Ḡ_{t_+}(W^{−T} ŝ, W ẑ) + Ḡ′_{t_+} [ W^{−T} Δs + W Δz ] = 0
where the scaling obeys W^{−T} ŝ = W ẑ = λ

The linearized equation

LP:   Ḡ_t = s ∘ z − (1/t) e,  with  s ∘ z := (s_i z_i)_i  and  e = 1 (all-ones vector)

SDP:  Ḡ_t = (1/2)[SZ + ZS] − (1/t) I = s ∘ z − (1/t) e,  with  s ∘ z := (1/2)[SZ + ZS]  and  e = I

SOCP: s ∘ z := ( z̄ s_n + z_n s̄,  ⟨s, z⟩ )  and  e = (0, …, 0, 1), so the conditions read s ∘ z = (1/t) e

With Ḡ_{t_+}(λ, λ) = λ ∘ λ − (1/t_+) e and Ḡ′_{t_+}[ W^{−T} Δs + W Δz ] = λ ∘ [ W^{−T} Δs + W Δz ], the linearized equation reads
    λ ∘ [ W^{−T} Δs + W Δz ] = (1/t_+) e − λ ∘ λ

Path following algorithm

Choose starting points x̂, ŷ and ŝ ≻ 0, ẑ ≻ 0.

1. Compute residuals and evaluate the stopping criteria
       r = ( Gx̂ + ŝ − h,  Ax̂ − b,  A^T ŷ + G^T ẑ + c )
   Terminate if ||r|| and ŝ^T ẑ are sufficiently small.

2. Compute the scaling matrix W:
       λ = W^{−T} ŝ = W ẑ,   1/t := ŝ^T ẑ / θ(K)

3. Compute the affine scaling directions: solve
       ( G Δx_a + Δs_a,  A Δx_a,  A^T Δy_a + G^T Δz_a ) = −r
       λ ∘ [ W^{−T} Δs_a + W Δz_a ] = −λ ∘ λ

4. Select the barrier parameter:
       σ = [ (ŝ + α_p Δs_a)^T (ẑ + α_d Δz_a) / (ŝ^T ẑ) ]^δ,   t_+ = t/σ
   where δ is an algorithm parameter (typical value δ = 3) and
       α_p = sup {α ∈ [0, 1] : ŝ + α Δs_a ⪰ 0},   α_d = sup {α ∈ [0, 1] : ẑ + α Δz_a ⪰ 0}

5. Compute the search direction: solve
       ( G Δx + Δs,  A Δx,  A^T Δy + G^T Δz ) = −r
       λ ∘ [ W^{−T} Δs + W Δz ] = (1/t_+) e − λ ∘ λ

6. Update the iterates
       (x̂, ŝ) := (x̂, ŝ) + min{1, 0.99 α_p} (Δx, Δs)
       (ŷ, ẑ) := (ŷ, ẑ) + min{1, 0.99 α_d} (Δy, Δz)
   where
       α_p = sup {α ≥ 0 : ŝ + α Δs ⪰ 0},   α_d = sup {α ≥ 0 : ẑ + α Δz ⪰ 0}
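To make steps 1-6 concrete, here is a minimal Python/numpy sketch for the LP cone K = R^m_+ without the equality constraints Ax = b (so the variable y and the residual block Ax − b drop out). For this cone the NT scaling is the diagonal matrix W = diag((ŝ_i/ẑ_i)^{1/2}), and the scaled equation λ ∘ (W^{−T}Δs + WΔz) = (1/t_+)e − λ∘λ reduces to ẑ∘Δs + ŝ∘Δz = (1/t_+)1 − ŝ∘ẑ. Function names, the toy data, and the parameter choices are mine, not from the slides.

    import numpy as np

    def solve_kkt(G, r1, r3, s, z, d):
        # Solve  G dx + ds = -r1,  G^T dz = -r3,  z*ds + s*dz = d  (componentwise),
        # i.e. the scaled equation lambda o (W^{-T} ds + W dz) = d written out for LP.
        D = s / z                                   # diagonal of W^T W
        rhs1 = -r1 - d / z
        dx = np.linalg.solve(G.T @ (G / D[:, None]), -r3 + G.T @ (rhs1 / D))
        dz = (G @ dx - rhs1) / D
        ds = (d - s * dz) / z
        return dx, ds, dz

    def max_step(v, dv):
        # sup { a >= 0 : v + a*dv >= 0 }  (np.inf if unbounded)
        neg = dv < 0
        return np.min(-v[neg] / dv[neg]) if np.any(neg) else np.inf

    def primal_dual_lp(c, G, h, x, z, delta=3, tol=1e-8, max_iter=50):
        s = h - G @ x                               # strictly feasible start: s > 0, z > 0
        m = len(h)
        for _ in range(max_iter):
            r1 = G @ x + s - h                      # primal residual (stays ~0 here)
            r3 = G.T @ z + c                        # dual residual
            gap = s @ z
            if max(np.linalg.norm(r3), gap) < tol:
                break
            # step 3: affine scaling direction
            dxa, dsa, dza = solve_kkt(G, r1, r3, s, z, -s * z)
            ap, ad = min(1.0, max_step(s, dsa)), min(1.0, max_step(z, dza))
            # step 4: barrier parameter
            sigma = ((s + ap * dsa) @ (z + ad * dza) / gap) ** delta
            # step 5: search direction, with 1/t_+ = sigma * gap / m
            dx, ds, dz = solve_kkt(G, r1, r3, s, z, sigma * gap / m - s * z)
            # step 6: update, staying strictly inside the cone
            ap = min(1.0, 0.99 * max_step(s, ds))
            ad = min(1.0, 0.99 * max_step(z, dz))
            x, s = x + ap * dx, s + ap * ds
            z = z + ad * dz
        return x, z

    # toy LP:  minimize c^T x  subject to  0 <= x <= 1
    n = 3
    c = np.array([1.0, -2.0, 0.5])
    G = np.vstack([np.eye(n), -np.eye(n)])
    h = np.concatenate([np.ones(n), np.zeros(n)])
    x, z = primal_dual_lp(c, G, h, x=0.5 * np.ones(n), z=np.ones(2 * n))
    print(np.round(x, 4))                           # approximately (0, 1, 0)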

Interpretation

- Step 3: the affine scaling directions solve the linearized CP equations (with zero target for the complementarity)
- Step 4: heuristic for updating t → t_+, based on an estimate of the quality of the affine scaling direction; σ is small if the step in the affine scaling direction gives a large reduction in ŝ^T ẑ
- Step 5: the system has the same coefficient matrix as in step 3; with a direct method we can solve the two systems for the cost of one (i.e., reuse the matrix factorization)

Mehrotra correction

Step 5: solve the same system but with the right-hand side of the linearized equation replaced by
    (1/t_+) e − λ ∘ λ − [ W^{−T} Δs_a ] ∘ [ W Δz_a ]

The extra term is an approximation of the second-order term in
    [ W^{−T} (ŝ + Δs) ] ∘ [ W (ẑ + Δz) ] = (1/t_+) e

Typically saves a few iterations.

Newton equations

Eliminating Δs reduces the system to
    [ 0   A^T   G^T     ] [Δx]
    [ A   0     0       ] [Δy]  =  RHS
    [ G   0    −W^T W   ] [Δz]

Eliminating Δz:
    [ G^T W^{−1} W^{−T} G   A^T ] [Δx]  =  RHS
    [ A                     0   ] [Δy]

Because W^T W = [∇²ϕ(w)]^{−1} (NT scaling),
    G^T W^{−1} W^{−T} G = G^T ∇²ϕ(w) G
is the Hessian of the barrier ϕ(h − Gx) evaluated at the scaling point w.

Complexity analysis: SDP

Short-step path following methods based on commutative scalings (e.g. NT).

Neighborhood of the central path:
    (t̂, Ŝ, Ẑ) ∈ N_0.1  ⟺  || t̂ Ŝ^{1/2} Ẑ Ŝ^{1/2} − I ||_2 ≤ 0.1,  with Ŝ, Ẑ strictly feasible

1. Choose a new value of t:
       t_+ = (1 − χ/√n)^{−1} t̂,   χ: parameter
2. Solve the linearized CP equations with a commutative scaling

Key result

Theorem. If χ ≤ 0.1, then Ŝ_+, Ẑ_+ are strictly feasible,
    ⟨Ŝ_+, Ẑ_+⟩ = n / t_+,  and  (t̂_+, Ŝ_+, Ẑ_+) ∈ N_0.1

- Same proximity to the central path
- Value of the centrality parameter larger by a factor 1 + O(1)/√n
- Once we reach N_0.1, we can trace the primal-dual central path, staying in N_0.1 and increasing the parameter by an absolute constant factor every O(√n) steps

In general,
    t_+ = (1 − 0.1/√θ(K))^{−1} t

Once we have managed to get close to the central path, every O(√θ(K)) steps of the scheme improve the quality of the approximation by an absolute constant factor.

In particular, it takes no more than
    O(1) √θ(K) log( 1 + θ(K)/(t_0 ε) )
steps to generate a strictly feasible ε-solution.

References

1. A. Ben-Tal and A. Nemirovski, Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications, MPS-SIAM Series on Optimization.
2. S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press.
3. L. Vandenberghe, EE236C lecture notes (Spring 2011), UCLA.