Convex Optimization, Lecture 13. Today: Interior-Point methods (continued); Central Path method for SDP; Feasibility and Phase I methods; From Central Path to Primal/Dual.
Central Path Log-Barrier Method

Init: strictly feasible x^(0) and some t^(0) > 0
Do: solve the t^(k)-barrier problem using Newton starting at x^(k); set x^(k+1) ← x*(t^(k))
    Stop if m/t^(k) ≤ ε; otherwise t^(k+1) ← μ t^(k) (for some parameter μ > 1)

Access to: 2nd-order oracle for f_0 and the f_i; explicit access to A, b; a strictly feasible point x^(0)
Assumptions: f_0 convex and self-concordant; each f_i convex quadratic (or linear); x^(0) strictly feasible, i.e. f_i(x^(0)) < 0

Overall #Newton iterations: O(√m · (log(1/ε) + log log(1/ξ)))
Overall runtime: O(√m · (n³ + cost of the oracle evals) · log(1/ε))

(Portraits on slide: John von Neumann, Narendra Karmarkar, Arkadi Nemirovski, Yurii Nesterov.)
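The outer loop above can be made concrete with a minimal runnable sketch on a toy 1-D problem; the problem, function names, and constants (μ = 10, tolerances) below are illustrative choices, not prescribed by the slide.

```python
import math

# Minimal sketch of the log-barrier central path loop on a toy 1-D problem
# (the problem and all names/constants here are illustrative):
#   minimize f0(x) = x^2   s.t.  f1(x) = 1 - x <= 0      (optimum x* = 1)
# The t-barrier objective is phi_t(x) = t*x^2 - log(x - 1).

def phi(x, t):
    return t * x * x - math.log(x - 1.0)

def centering(x, t, tol=1e-8, max_iter=50):
    """Inner loop: damped Newton on phi_t from a strictly feasible x > 1."""
    for _ in range(max_iter):
        g = 2.0 * t * x - 1.0 / (x - 1.0)       # phi_t'(x)
        if abs(g) < tol:
            break
        h = 2.0 * t + 1.0 / (x - 1.0) ** 2      # phi_t''(x) > 0
        dx = -g / h
        s = 1.0
        while x + s * dx <= 1.0:                # stay strictly feasible
            s *= 0.5
        while phi(x + s * dx, t) > phi(x, t) + 0.25 * s * g * dx:
            s *= 0.5                            # backtracking line search
        x += s * dx
    return x

def barrier_method(x=2.0, t=1.0, mu=10.0, eps=1e-6, m=1):
    """Outer loop: x*(t) is m/t-suboptimal, so stop once m/t <= eps."""
    while m / t > eps:
        x = centering(x, t)                     # x^(k+1) <- x*(t^(k))
        t *= mu                                 # t^(k+1) <- mu * t^(k)
    return centering(x, t)

x_opt = barrier_method()
```

The constraint x ≥ 1 is active at the optimum, so the iterates approach x* = 1 from inside the feasible region as t grows.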
Optimizing with Matrix Inequalities

min_{x∈R^n} f_0(x)
s.t. f_1(x) ⪰ 0, Ax = b

where f_1: R^n → S^p (symmetric p×p matrices).

Central path given by solutions to:

min_{x∈R^n} f_0(x) − (1/t) log det(f_1(x))
s.t. Ax = b
min_{x∈R^n} f_0(x) − (1/t) log det(f_1(x))   s.t. Ax = b

Let x*(t) be the optimum and ν*(t) the dual optimum for the equality constraint. Stationarity gives, for each coordinate j,

∇_j f_0(x*(t)) − (1/t) ⟨ f_1(x*(t))^{-1}, ∇_j f_1(x*(t)) ⟩ + (A^T ν*(t))_j = 0.

Define Λ*(t) = (1/t) f_1(x*(t))^{-1} ≻ 0.

For the original problem

min_{x∈R^n} f_0(x)   s.t. f_1(x) ⪰ 0, Ax = b

(x*(t) is strictly feasible), consider the Lagrangian L(x, Λ, ν) = f_0(x) − ⟨Λ, f_1(x)⟩ + ν^T(Ax − b) with Λ ⪰ 0. The stationarity condition above says exactly ∇_x L(x*(t), Λ*(t), ν*(t)) = 0, so x*(t) minimizes L(·, Λ*(t), ν*(t)), and

g(Λ*(t), ν*(t)) = inf_x L(x, Λ*(t), ν*(t)) = L(x*(t), Λ*(t), ν*(t))
  = f_0(x*(t)) − (1/t) ⟨ f_1(x*(t))^{-1}, f_1(x*(t)) ⟩ + ν*(t)^T(Ax*(t) − b)
  = f_0(x*(t)) − p/t.

How suboptimal is x*(t)? (Λ*(t), ν*(t)) is (strictly) dual feasible with

f_0(x*(t)) − g(Λ*(t), ν*(t)) = p/t.
Optimizing with Matrix Inequalities

min_{x∈R^n} f_0(x)
s.t. f_1(x) ⪰ 0, Ax = b        (f_1: R^n → S^p)

t-barrier problem:

min_{x∈R^n} f_0(x) − (1/t) log det(f_1(x))
s.t. Ax = b

An optimum x*(t) for the t-barrier problem is p/t-suboptimal for the constrained problem.

Central Path method:
Init: strictly feasible x^(0) and some t^(0)
Do: solve the t^(k)-barrier problem using Newton starting at x^(k); set x^(k+1) ← x*(t^(k))
    Stop if p/t^(k) ≤ ε; otherwise t^(k+1) ← μ t^(k) (for some parameter μ > 1)
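Since the matrix-inequality central path is driven by the −log det barrier, it may help to see its gradient in code: for an affine map F(x) = F_0 + Σ_j x_j F_j, one has ∂/∂x_j [−log det F(x)] = −tr(F(x)^{-1} F_j). A small numpy sketch (the matrices below are arbitrary illustrative data, not from the lecture), checked against finite differences:

```python
import numpy as np

# Gradient of the log-det barrier for an affine matrix map
#   F(x) = F0 + x_1*F1 + ... + x_n*Fn  (all symmetric),  phi(x) = -log det F(x)
# so that d phi / d x_j = -tr(F(x)^{-1} Fj). Illustrative data only.

def F_of(x, F0, Fs):
    return F0 + sum(xj * Fj for xj, Fj in zip(x, Fs))

def barrier(x, F0, Fs):
    sign, logdet = np.linalg.slogdet(F_of(x, F0, Fs))
    assert sign > 0, "x must be strictly feasible: F(x) must be PD"
    return -logdet

def barrier_grad(x, F0, Fs):
    Finv = np.linalg.inv(F_of(x, F0, Fs))
    return np.array([-np.trace(Finv @ Fj) for Fj in Fs])

rng = np.random.default_rng(0)
p, n = 4, 3
F0 = 5.0 * np.eye(p)                     # strongly PD anchor
Fs = []
for _ in range(n):
    M = rng.standard_normal((p, p))
    Fs.append((M + M.T) / 2.0)           # symmetrize
x = 0.1 * rng.standard_normal(n)         # small, so F(x) stays PD

g = barrier_grad(x, F0, Fs)
step = 1e-6                              # central finite-difference check
g_fd = np.array([(barrier(x + step * e, F0, Fs)
                  - barrier(x - step * e, F0, Fs)) / (2 * step)
                 for e in np.eye(n)])
```

The finite-difference check is a cheap way to confirm the trace formula before plugging the gradient (and the analogous Hessian) into Newton centering.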
Central Path Method for SDP

Init: strictly feasible x^(0) and some t^(0)
Do: solve the t^(k)-barrier problem using Newton starting at x^(k); set x^(k+1) ← x*(t^(k))
    Stop if p/t^(k) ≤ ε; otherwise t^(k+1) ← μ t^(k) (for some parameter μ > 1)

Access to: 2nd-order oracle for f_0 and f_1; explicit access to A, b; a strictly feasible point x^(0)
Assumptions: f_0: R^n → R convex and self-concordant; f_1: R^n → S^p convex quadratic (or linear); x^(0) strictly feasible with f_1(x^(0)) ≻ 0

Overall #Newton iterations: O(√p · (log(1/ε) + log log(1/ξ)))
Overall runtime: O(√p · (n³ + cost of the oracle evals) · log(1/ε))
Feasibility and Phase I Methods

Recall that in the Log-Barrier Central Path method we need to start with a (strictly) feasible x^(0). Two phases:

Phase I: solve a feasibility problem
Phase II: use its solution as the starting point for the barrier method

We can convert feasibility to an optimization problem:

(P) Find x ∈ R^n s.t. f_i(x) ≤ 0, Ax = b

(P̄) min_{x∈R^n, s∈R} s   s.t. f_i(x) ≤ s, Ax = b

This optimization problem is always strictly feasible: we can start from any solution of Ax^(0) = b and set s^(0) = max_i f_i(x^(0)) + 1, so that f_i(x^(0)) < s^(0) strictly. Then we can apply the log-barrier method to solve (P̄).
(P̄) min_{x∈R^n, s∈R} s   s.t. f_i(x) ≤ s

How well do we need to optimize?

If we find a P̄-feasible (x, s) with s < 0 ⇒ x is strictly P-feasible.
If we get an ε-suboptimal solution to (P̄) with s > ε ⇒ the optimal value of (P̄) is positive, so P is infeasible.
Otherwise, there could be a solution that is feasible but not strictly so, and we cannot decide.
Can convert feasibility to optimization with matrix constraints too:

Find x ∈ R^n s.t. f_1(x) ⪰ 0    ⟶    min_{x∈R^n, s∈R} s   s.t. f_1(x) + sI ⪰ 0

Finally, note that we can also reduce optimization to feasibility:

min f_0(x) s.t. f_i(x) ≤ 0    ⟶    (P_s) Find x s.t. f_i(x) ≤ 0, f_0(x) ≤ s

then (binary) search over s.
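The "search over s" can be a simple bisection: each feasibility query shrinks a bracket around the optimal value p*. A toy sketch, where the feasibility oracle is hard-coded analytically for illustration (in practice each query would itself be a Phase I solve):

```python
# Reducing optimization to feasibility by bisection on s, for the toy problem
#   p* = min x^2 s.t. x >= 1   (so p* = 1).
# The oracle below is computed analytically for illustration; in general each
# call would be a feasibility (Phase I) solve of (P_s).

def feasible(s):
    # Is there an x with x >= 1 and x**2 <= s?  Yes iff s >= 1.
    return s >= 1.0

lo, hi = 0.0, 4.0            # p* is known to lie in [lo, hi]
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if feasible(mid):
        hi = mid             # (P_mid) feasible: p* <= mid
    else:
        lo = mid             # (P_mid) infeasible: p* > mid
p_star_estimate = hi
```

Each bisection step halves the bracket, so the number of feasibility solves is logarithmic in the desired accuracy.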
From Central Path to Primal/Dual

Let us review our approach. We would like to solve the KKT conditions of (P):

(KKT)  f_i(x) ≤ 0, Ax = b
       λ ≥ 0
       ∇f_0(x) + Σ_i λ_i ∇f_i(x) + A^T ν = 0
       λ_i f_i(x) = 0

At each iteration we consider the problem (P_t), i.e. solving:

       Ax = b
       ∇f_0(x) + Σ_i (−1/(t f_i(x))) ∇f_i(x) + A^T ν = 0

And we do this by Newton: linearize w.r.t. x (and ν) around x^(k).
This can be viewed as solving a modified KKT system:

(KKT_t)  f_i(x) ≤ 0, Ax = b
         λ ≥ 0
         ∇f_0(x) + Σ_i λ_i ∇f_i(x) + A^T ν = 0
         −λ_i f_i(x) = 1/t

Solve by:
(i) eliminating λ_i = −1/(t f_i(x)), to get a problem in (x, ν);
(ii) linearizing w.r.t. (x, ν) around x^(k).

Instead, in the Primal/Dual method we maintain both x^(k) and λ^(k), and linearize (KKT_t) w.r.t. both x and λ around (x^(k), λ^(k)), without first eliminating λ.
Primal-dual method

Define the residuals:

r_pri(x) = Ax − b ∈ R^p
r_dual(x, λ, ν) = ∇f_0(x) + Σ_i λ_i ∇f_i(x) + A^T ν ∈ R^n
r_cent(t)(x, λ) = ( λ_1 f_1(x) + 1/t, …, λ_m f_m(x) + 1/t ) ∈ R^m

Jointly: r^(t)(x, λ, ν) = (r_pri, r_dual, r_cent(t)) ∈ R^{p+n+m}.

If (x, λ, ν) satisfy r^(t)(x, λ, ν) = 0 (and f_i(x) < 0, λ > 0), then x = x*(t), λ = λ*(t), and ν = ν*(t).
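For a concrete instance, here is a sketch of the three residuals for a QP with linear inequalities f_i(x) = g_i^T x − h_i and equalities Ax = b; the problem data is made up for illustration:

```python
import numpy as np

# Residuals of the modified KKT system for
#   min (1/2) x^T P x + q^T x   s.t.  Gx <= h,  Ax = b,
# with f_i(x) = g_i^T x - h_i. Data below is illustrative only.

def residuals(x, lam, nu, t, P, q, G, h, A, b):
    f = G @ x - h                               # f_i(x); all < 0 when strictly feasible
    r_pri = A @ x - b                           # in R^p
    r_dual = P @ x + q + G.T @ lam + A.T @ nu   # grad f0 + sum_i lam_i grad f_i + A^T nu, in R^n
    r_cent = lam * f + 1.0 / t                  # entries lam_i f_i(x) + 1/t, in R^m
    return r_pri, r_dual, r_cent

P = np.eye(2); q = np.zeros(2)
G = np.array([[1.0, 0.0]]); h = np.array([1.0])
A = np.array([[1.0, 1.0]]); b = np.array([1.0])
x = np.array([0.2, 0.3]); lam = np.array([0.5]); nu = np.array([0.1]); t = 2.0

r_pri, r_dual, r_cent = residuals(x, lam, nu, t, P, q, G, h, A, b)
```

Here f(x) = −0.8 < 0 and λ > 0, so the point is strictly feasible for the inequality but none of the residuals vanish: the iterates of the primal-dual method drive all three to zero simultaneously.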
Therefore, at each iteration we approximately solve

r^(t)(x + Δx, λ + Δλ, ν + Δν) = 0   s.t. f_i(x + Δx) < 0, λ + Δλ > 0,

while always maintaining f_i(x) < 0 and λ_i > 0. The first block of r^(t), r_dual, is called the dual residual; the last block, r_pri = Ax − b, the primal residual; and the middle block, r_cent, the centrality residual, i.e. the residual of the modified complementarity condition.

The Newton step is obtained by linearizing w.r.t. (x, λ, ν) around the current point y = (x, λ, ν):

r^(t)(y + Δy) ≈ r^(t)(y) + Dr^(t)(y) Δy = 0,   i.e.   Δy = −Dr^(t)(y)^{-1} r^(t)(y),

which, with our sign convention r_cent,i = λ_i f_i(x) + 1/t, boils down to the linear system (cf. Boyd & Vandenberghe, eq. (11.54)):

[ ∇²f_0(x) + Σ_i λ_i ∇²f_i(x)   Df(x)^T      A^T ] [ Δx ]      [ r_dual ]
[ diag(λ) Df(x)                 diag(f(x))   0   ] [ Δλ ]  = − [ r_cent ]
[ A                             0            0   ] [ Δν ]      [ r_pri  ]

where Df(x) is the m×n matrix with rows ∇f_i(x)^T. The solution (Δx_pd, Δλ_pd, Δν_pd) is the primal-dual search direction. The primal and dual directions are coupled, through both the coefficient matrix and the residuals: Δx_pd depends on the current values of λ and ν as well as on x. Note also that if x satisfies Ax = b (the primal residual r_pri is zero), then AΔx_pd = 0, so Δx_pd is a (primal) feasible direction: for any step s, A(x + sΔx_pd) = b. At an exact solution of the modified KKT system, the duality gap is m/t.
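Assembling and solving this Newton system for the same kind of small QP instance (made-up data; the f_i are linear, so the top-left Hessian block is just P). The last assertion checks the feasible-direction remark: since this x satisfies Ax = b, the computed Δx obeys AΔx = 0.

```python
import numpy as np

# Primal-dual Newton step for  min (1/2) x^T P x + q^T x  s.t. Gx <= h, Ax = b,
# using the convention r_cent_i = lam_i f_i(x) + 1/t. Illustrative data only.

P = np.eye(2); q = np.zeros(2)
G = np.array([[1.0, 0.0]]); h = np.array([1.0])
A = np.array([[1.0, 1.0]]); b = np.array([1.0])
x = np.array([0.5, 0.5])        # satisfies Ax = b, and f(x) = -0.5 < 0
lam = np.array([0.5]); nu = np.array([0.1]); t = 2.0

f = G @ x - h
r_pri = A @ x - b               # = 0 here
r_dual = P @ x + q + G.T @ lam + A.T @ nu
r_cent = lam * f + 1.0 / t

n, m, p = 2, 1, 1
J = np.block([
    [P,                  G.T,               A.T],
    [np.diag(lam) @ G,   np.diag(f),        np.zeros((m, p))],
    [A,                  np.zeros((p, m)),  np.zeros((p, p))],
])
step = np.linalg.solve(J, -np.concatenate([r_dual, r_cent, r_pri]))
dx, dlam, dnu = step[:n], step[n:n + m], step[n + m:]
```

Because the bottom block row reads A Δx = −r_pri, primal feasibility, once attained, is preserved exactly by any step length.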
It follows that:

r_pri(x) = 0 ⇒ x is primal feasible.

r_dual(x, λ, ν) = 0 ⇒ ∇_x L(x, λ, ν) = 0, so x minimizes L(·, λ, ν), and

g(λ, ν) = f_0(x) + Σ_i λ_i f_i(x) + ν^T(Ax − b) > −∞,

so (λ, ν) is dual feasible.

If in addition we have r_cent = 0 (so λ_i f_i(x) = −1/t) and r_pri = 0, then:

g(λ, ν) = f_0(x) + Σ_i λ_i · (−1/(λ_i t)) + 0 = f_0(x) − m/t.

So the gap between (P) and (D): f_0(x) − g(λ, ν) = m/t, i.e. suboptimality ≤ m/t.
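The m/t gap can be verified numerically on a toy problem (min x² s.t. x ≥ 1, so m = 1); for convenience the central point is computed from the closed-form centrality condition rather than by Newton:

```python
import math

# Check f0(x*(t)) - g(lambda*(t)) = m/t on the toy problem
#   min x^2 s.t. f1(x) = 1 - x <= 0   (m = 1).
# The centrality condition 2*t*x = 1/(x - 1) solves in closed form.

t = 50.0
x = (1.0 + math.sqrt(1.0 + 2.0 / t)) / 2.0   # central point x*(t)
lam = -1.0 / (t * (1.0 - x))                 # lambda*(t) = -1/(t*f1(x*)) > 0
# dual function: g(lam) = min_x [x^2 + lam*(1 - x)], minimized at x = lam/2
g_dual = lam - lam * lam / 4.0
gap = x * x - g_dual                         # should equal m/t = 1/t
```

Here the gap comes out to exactly 1/t (up to rounding), matching the derivation above.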
Even if r_cent ≠ 0, as long as r_pri = 0 and r_dual = 0, we have

g(λ, ν) = f_0(x) + Σ_i λ_i f_i(x)

f_0(x) − g(λ, ν) = −Σ_i λ_i f_i(x) =: η̂(x, λ),

where η̂(x, λ) > 0 is the surrogate (duality) gap, and we are at most η̂-suboptimal.
Primal-dual interior-point algorithm

Start at initial x^(0), λ^(0), ν^(0) s.t. f_i(x^(0)) < 0 and λ_i^(0) > 0.
Iterate:
  Determine t^(k): set t^(k) = μ m / η̂(x^(k), λ^(k))
  Compute search direction: linearize (KKT_{t^(k)}) for x = x^(k) + Δx, λ = λ^(k) + Δλ, ν = ν^(k) + Δν; solve to obtain (Δx^(k), Δλ^(k), Δν^(k))
  Set step size s^(k) by line search on ‖r^(t)(x, λ, ν)‖, ensuring f_i(x) < 0 and λ_i > 0
  Update: (x^(k+1), λ^(k+1), ν^(k+1)) = (x^(k), λ^(k), ν^(k)) + s^(k) (Δx^(k), Δλ^(k), Δν^(k))
  Stop if: ‖r_pri‖ ≤ ε_feas and ‖r_dual‖ ≤ ε_feas (approximately feasible), and η̂(x^(k), λ^(k)) ≤ ε

Important:
  x^(k) need not be feasible: OK if Ax^(k) ≠ b.
  Also, (λ^(k), ν^(k)) need not be dual feasible: g(λ^(k), ν^(k)) can be −∞.
Advantages: single loop, no Phase I.
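A compact runnable sketch of this loop on a 1-D toy problem (min x² s.t. x ≥ 1; there are no equality constraints, so the ν block is dropped). The constants μ = 10 and the line-search parameters are typical choices, not prescribed by the slide:

```python
import math

# Primal-dual interior-point sketch for: min x^2 s.t. f1(x) = 1 - x <= 0.
# KKT solution: x* = 1, lambda* = 2 (from 2x - lambda = 0 at x = 1).

def residual(x, lam, t):
    r_dual = 2.0 * x - lam                 # grad f0 + lam*grad f1, grad f1 = -1
    r_cent = lam * (1.0 - x) + 1.0 / t     # lam * f1(x) + 1/t
    return r_dual, r_cent

def rnorm(x, lam, t):
    rd, rc = residual(x, lam, t)
    return math.hypot(rd, rc)

x, lam = 2.0, 1.0                          # f1(x) < 0 and lam > 0
mu, eps = 10.0, 1e-8
for _ in range(200):
    eta = -lam * (1.0 - x)                 # surrogate gap -lam*f1(x) > 0
    rd, _ = residual(x, lam, 1.0)          # r_dual does not depend on t
    if eta < eps and abs(rd) < eps:
        break
    t = mu * 1.0 / eta                     # t = mu * m / eta, with m = 1
    rd, rc = residual(x, lam, t)
    # Newton system (linearize the residual in (x, lam)):
    #   [  2      -1   ] [ dx  ]      [ rd ]
    #   [ -lam   1 - x ] [ dlam]  = - [ rc ]
    det = 2.0 * (1.0 - x) - lam
    dx = (-rd * (1.0 - x) - rc) / det
    dlam = (-2.0 * rc - lam * rd) / det
    # line search: keep lam > 0 and f1(x) < 0, then decrease ||r^(t)||
    s = 1.0 if dlam >= 0 else min(1.0, 0.99 * lam / (-dlam))
    while 1.0 - (x + s * dx) >= 0.0:
        s *= 0.5
    norm0 = rnorm(x, lam, t)
    for _ in range(40):
        if rnorm(x + s * dx, lam + s * dlam, t) <= (1.0 - 0.01 * s) * norm0:
            break
        s *= 0.5
    x += s * dx
    lam += s * dlam
```

Note the single loop: t is recomputed from the surrogate gap at every iteration instead of being held fixed while an inner centering problem is solved to completion.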
Why no need for Phase I? We don't need to ensure Ax^(0) = b, but we do need f_i(x^(0)) < 0 and λ^(0) > 0. We can rewrite (P) as:

min_{x∈R^n, s∈R} f_0(x)
s.t. f_i(x) ≤ s, s = 0

Now we can start with any x^(0) in the domain of all the f_i (so f_i(x^(0)) < ∞), and set s^(0) = max_i f_i(x^(0)) + 1, so that f_i(x^(0)) < s^(0) strictly; the equality constraint s = 0 is handled like Ax = b and need not hold initially.
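This always-strictly-feasible initialization can be spelled out directly; the f_i below are arbitrary examples chosen for illustration:

```python
# Constructing a strictly feasible start for the reformulation
#   min f0(x)  s.t.  f_i(x) <= s,  s = 0.
# The f_i below are arbitrary examples; any x0 in their common domain works.

fs = [lambda x: x[0] ** 2 - 1.0,        # f1
      lambda x: x[0] + x[1]]            # f2

x0 = [3.0, -1.0]                        # any point where all f_i are finite
s0 = max(f(x0) for f in fs) + 1.0       # strictly above every f_i(x0)

strictly_feasible = all(f(x0) < s0 for f in fs)
# the equality s = 0 need not hold at the start: the primal-dual method
# tolerates infeasible equality constraints (r_pri != 0 initially)
```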
If finding such an x^(0) is hard (e.g. the f_i have different domains), we can rewrite as:

min_{x, x_1, …, x_m ∈ R^n, s∈R} f_0(x)
s.t. f_i(x_i) ≤ s, s = 0, x = x_i for all i

Then we can find a point in the domain of each f_i separately. But this uses many more variables (≈ mn).