CSCI 1951-G Optimization Methods in Finance Part 09: Interior Point Methods

CSCI 1951-G Optimization Methods in Finance Part 09: Interior Point Methods March 23, 2018 1 / 35

This material is covered in S. Boyd and L. Vandenberghe's book Convex Optimization, https://web.stanford.edu/~boyd/cvxbook/. Some of the material and the figures are taken from it. 2 / 35

Context
Two weeks ago: unconstrained problems, solved with descent methods.
Last week: linearly constrained problems, solved with Newton's method.
This week: inequality constrained problems, solved with interior point methods.
3 / 35

Inequality constrained minimization problems
$\min f_0(x)$ s.t. $f_i(x) \le 0$, $i = 1, \dots, m$; $Ax = b$
$f_0, \dots, f_m$: convex and twice continuously differentiable; $A \in \mathbb{R}^{p \times n}$ with $\mathrm{rank}(A) = p < n$.
Assume:
an optimal solution $x^*$ exists, with objective value $p^*$;
the problem is strictly feasible (i.e., the feasible region has interior points), so Slater's condition holds: there exist $\lambda^*$ and $\nu^*$ that, together with $x^*$, satisfy the KKT conditions.
4 / 35

Hierarchy of algorithms
Transforming a constrained problem into an unconstrained one: always possible, but has drawbacks.
Solving the constrained problem directly: leverages the problem structure.
What is the easiest class of constrained problems to solve? Quadratic Problems with Linear equality Constraints (LCQP): they only require solving...a system of linear equations.
How did we solve generic problems with linear equality constraints? With Newton's method, which solves a sequence of...LCQPs!
We will solve inequality constrained problems with interior point methods, which solve a sequence of linearly constrained problems!
5 / 35

Problem Transformation
Goal: approximate the Inequality Constrained Problem (ICP) with an Equality Constrained Problem (ECP) solvable with Newton's method.
We start by transforming the ICP into an equivalent ECP.
From: $\min f_0(x)$ s.t. $f_i(x) \le 0$, $i = 1, \dots, m$; $Ax = b$
To: $\min g(x)$ s.t. $Ax = b$
where $g(x) = f_0(x) + \sum_{i=1}^m I_-(f_i(x))$ and $I_-(u) = 0$ for $u \le 0$, $I_-(u) = \infty$ for $u > 0$.
So we just use Newton's method and we are done. The End. Nope.
6 / 35

Logarithmic barrier
$\min f_0(x) + \sum_{i=1}^m I_-(f_i(x))$ s.t. $Ax = b$
The objective function is in general not differentiable: we can't use Newton's method.
We want to approximate $I_-(u)$ with a differentiable function:
$\hat{I}_-(u) = -\frac{1}{t}\log(-u)$
with domain $-\mathbb{R}_{++}$ (i.e., $u < 0$), and where $t > 0$ is a parameter.
7 / 35

Logarithmic barrier
The reformulated problem above has no inequality constraints, but its objective function is not (in general) differentiable, so Newton's method cannot be applied.
$\hat{I}_-(u)$ is a convex and differentiable function.
[Figure 11.1: the dashed lines show the function $I_-(u)$, and the solid curves show $\hat{I}_-(u) = -(1/t)\log(-u)$ for $t = 0.5, 1, 2$. The curve for $t = 2$ gives the best approximation.]
8 / 35
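To make the approximation concrete, here is a minimal numerical sketch (mine, not from the slides; assumes numpy) that evaluates $\hat{I}_-(u) = -(1/t)\log(-u)$ for the three values of $t$ shown in Figure 11.1.

```python
import numpy as np

def I_hat(u, t):
    """Differentiable approximation -(1/t)*log(-u) of the indicator I_(u); defined for u < 0 only."""
    return -np.log(-np.asarray(u, dtype=float)) / t

u = np.linspace(-3.0, -0.01, 5)           # sample points in the domain u < 0
for t in (0.5, 1.0, 2.0):                 # the three curves shown in Figure 11.1
    print(f"t={t}:", np.round(I_hat(u, t), 3))
# As t grows, I_hat(u, t) -> 0 for every fixed u < 0, i.e., it approaches the ideal indicator.
```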

Logarithmic barrier
$\min f_0(x) - \frac{1}{t}\sum_{i=1}^m \log(-f_i(x))$ s.t. $Ax = b$
The objective function is convex and differentiable: we can use Newton's method.
$\phi(x) = -\sum_{i=1}^m \log(-f_i(x))$ is called the logarithmic barrier for the problem.
9 / 35

Example: Inequality form linear programming
$\min c^T x$ s.t. $Ax \le b$
The logarithmic barrier for this problem is
$\phi(x) = -\sum_{i=1}^m \log(b_i - a_i^T x)$
where the $a_i^T$ are the rows of $A$.
10 / 35

How to choose t?
$\min f_0(x) + \frac{1}{t}\phi(x)$ s.t. $Ax = b$
is an approximation of the original problem. How does the quality of the approximation change with $t$?
As $t$ grows, $\frac{1}{t}\phi(x)$ tends to $\sum_{i=1}^m I_-(f_i(x))$, so the approximation quality increases.
So let's just use a large $t$? Nope.
11 / 35

Why not use a large t immediately?
What's the intuition behind Newton's method? Replace the objective function with its 2nd-order Taylor approximation at $x$:
$f(x + v) \approx f(x) + \nabla f(x)^T v + \frac{1}{2} v^T \nabla^2 f(x) v$
When does this approximation (and Newton's method) work well? When the Hessian changes slowly.
Is that the case for the barrier function?
12 / 35

Back to the example
$\min c^T x$ s.t. $Ax \le b$
$\phi(x) = -\sum_{i=1}^m \log(b_i - a_i^T x)$
$\nabla^2 \phi(x) = \sum_{i=1}^m \frac{1}{(b_i - a_i^T x)^2} a_i a_i^T$
The Hessian changes fast as $x$ gets close to the boundary of the feasible region.
13 / 35
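A small sketch (mine, not from the slides; assumes numpy and a toy box instance) of the LP barrier and its derivatives; it prints the largest Hessian eigenvalue as $x$ approaches the boundary, illustrating how quickly the curvature blows up.

```python
import numpy as np

def lp_barrier(A, b, x):
    """Value, gradient, and Hessian of phi(x) = -sum_i log(b_i - a_i^T x)."""
    s = b - A @ x                        # slacks; phi is defined only when all s_i > 0
    val = -np.sum(np.log(s))
    grad = A.T @ (1.0 / s)
    hess = A.T @ np.diag(1.0 / s**2) @ A
    return val, grad, hess

# Unit box -1 <= x_i <= 1 written as A x <= b (an illustrative toy instance)
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)
for d in (0.5, 0.1, 0.01, 0.001):        # distance to the boundary along x_1
    x = np.array([1.0 - d, 0.0])
    _, _, H = lp_barrier(A, b, x)
    print(f"dist {d:>6}: max Hessian eigenvalue {np.linalg.eigvalsh(H).max():.1e}")
```

The largest eigenvalue grows roughly like $1/d^2$, which is the "Hessian changes fast near the boundary" behavior named above.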

Why not use a large t immediately?
The Hessian of the function $f_0 + \frac{1}{t}\phi$ varies rapidly near the boundary of the feasible set. This makes directly using a large $t$ inefficient.
Instead, we will solve a sequence of problems of the form
$\min f_0(x) + \frac{1}{t}\phi(x)$ s.t. $Ax = b$
for increasing values of $t$. We start each Newton minimization at the solution of the problem for the previous value of $t$.
14 / 35

The central path
Slight rewrite: $\min t f_0(x) + \phi(x)$ s.t. $Ax = b$
Assume it has a unique solution $x^*(t)$ for each $t > 0$.
Central path: $\{x^*(t) : t > 0\}$ (made of central points)
15 / 35

The central path
Necessary and sufficient conditions for $x^*(t)$:
Strict feasibility: $Ax^*(t) = b$, $f_i(x^*(t)) < 0$, $i = 1, \dots, m$
Zero of the Lagrangian (centrality condition): there exists $\hat\nu$ such that
$0 = t \nabla f_0(x^*(t)) + \nabla\phi(x^*(t)) + A^T \hat\nu = t \nabla f_0(x^*(t)) + \sum_{i=1}^m \frac{1}{-f_i(x^*(t))} \nabla f_i(x^*(t)) + A^T \hat\nu$
16 / 35

Back to the example
$\min c^T x$ s.t. $Ax \le b$, with $\phi(x) = -\sum_{i=1}^m \log(b_i - a_i^T x)$
Centrality condition (there are no equality constraints here, so the $A^T\hat\nu$ term drops):
$0 = t \nabla f_0(x^*(t)) + \nabla\phi(x^*(t)) = t c + \sum_{i=1}^m \frac{1}{b_i - a_i^T x} a_i$
17 / 35

Back to the example
$0 = t c + \sum_{i=1}^m \frac{1}{b_i - a_i^T x} a_i$
From this condition we see that $x^*(t)$ minimizes the Lagrangian (details on the next slides).
[Figure 11.2: central path for an LP with $n = 2$ and $m = 6$. The dashed curves show three contour lines of the logarithmic barrier function $\phi$. The central path converges to the optimal point $x^*$ as $t \to \infty$. Also shown is the point on the central path with $t = 10$. The optimality condition above can be verified geometrically at this point: the line $c^T x = c^T x^*(10)$ is tangent to the contour line of $\phi$ through $x^*(10)$.]
18 / 35

Dual point from the central path
Every central point $x^*(t)$ yields a dual feasible point $(\lambda^*(t), \nu^*(t))$, thus a...lower bound on the optimal objective value $p^*$:
$\lambda_i^*(t) = -\frac{1}{t f_i(x^*(t))}$, $i = 1, \dots, m$, and $\nu^*(t) = \frac{\hat\nu}{t}$
The proof gives us a lot of information.
19 / 35

Proof
$\lambda_i^*(t) > 0$ because $f_i(x^*(t)) < 0$.
Rewrite the centrality condition (and divide by $t$):
$0 = t \nabla f_0(x^*(t)) + \sum_{i=1}^m \frac{1}{-f_i(x^*(t))} \nabla f_i(x^*(t)) + A^T \hat\nu$
$0 = \nabla f_0(x^*(t)) + \sum_{i=1}^m \lambda_i^*(t) \nabla f_i(x^*(t)) + A^T \nu^*(t)$
The above equals $\nabla_x L(x^*(t), \lambda^*(t), \nu^*(t)) = 0$, i.e., $x^*(t)$...minimizes the Lagrangian at $(\lambda^*(t), \nu^*(t))$.
20 / 35

Proof
Let's look at the dual function:
$g(\lambda^*(t), \nu^*(t)) = f_0(x^*(t)) + \sum_{i=1}^m \lambda_i^*(t) f_i(x^*(t)) + \nu^*(t)^T (A x^*(t) - b)$
It holds that $g(\lambda^*(t), \nu^*(t)) = f_0(x^*(t)) - m/t$.
So $f_0(x^*(t)) - p^* \le m/t$, i.e., $x^*(t)$ is no more than $m/t$-suboptimal!
$x^*(t)$ converges to $x^*$ as $t \to \infty$.
21 / 35
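A minimal sketch (my own naming, not from the slides; assumes numpy) that turns a central point into the dual point and the certified gap, directly following the formulas above.

```python
import numpy as np

def dual_point_from_central(t, f_vals, nu_hat):
    """Given a central point x*(t) with constraint values f_vals = [f_1(x*(t)), ..., f_m(x*(t))]
    (all strictly negative) and the multiplier nu_hat from the centrality condition,
    return the dual feasible point (lambda*(t), nu*(t)) and the certified duality gap m/t."""
    f_vals = np.asarray(f_vals, dtype=float)
    lam = -1.0 / (t * f_vals)            # lambda_i*(t) = -1 / (t f_i(x*(t))) > 0
    nu = np.asarray(nu_hat, dtype=float) / t
    gap = len(f_vals) / t                # f_0(x*(t)) - g(lambda*(t), nu*(t)) = m/t
    return lam, nu, gap
```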

The barrier method
To get an $\varepsilon$-approximation we could just set $t = m/\varepsilon$ and solve
$\min \frac{m}{\varepsilon} f_0(x) + \phi(x)$ s.t. $Ax = b$
This method does not scale well with the size of the problem and with $\varepsilon$.
Barrier method: compute $x^*(t)$ for an increasing sequence of values of $t$, until $t \ge m/\varepsilon$.
22 / 35

The barrier method
input: strictly feasible $x = x^{(0)}$, $t = t^{(0)} > 0$, $\mu > 1$, $\varepsilon > 0$
repeat:
1. Centering step: compute $x^*(t)$ by minimizing $t f_0 + \phi$ subject to $Ax = b$, starting at $x$
2. Update: $x \leftarrow x^*(t)$
3. Stopping criterion: quit if $m/t < \varepsilon$
4. Increase $t$: $t \leftarrow \mu t$
What can we ask about this algorithm?
23 / 35
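The following is a minimal sketch of this loop for the inequality-form LP running example ($\min c^T x$ s.t. $Ax \le b$, no equality constraints), so the centering step is plain damped Newton. It assumes numpy; the function names and the tiny box instance are mine, not from the course.

```python
import numpy as np

def centering_step(c, A, b, x, t, tol=1e-8, max_iter=50):
    """Minimize t*c^T x + phi(x), phi(x) = -sum log(b - A x), by damped Newton steps.
    Assumes x is strictly feasible (A x < b componentwise)."""
    for _ in range(max_iter):
        s = b - A @ x                            # slacks, must stay > 0
        grad = t * c + A.T @ (1.0 / s)           # gradient of t*f0 + phi (f0 is linear)
        hess = A.T @ np.diag(1.0 / s**2) @ A     # Hessian of phi
        v = np.linalg.solve(hess, -grad)         # Newton direction
        if -grad @ v / 2 < tol:                  # Newton decrement stopping rule
            break
        step = 1.0
        while np.min(b - A @ (x + step * v)) <= 0:   # shrink until strictly feasible
            step *= 0.5
        f_cur = t * (c @ x) - np.sum(np.log(s))
        while t * (c @ (x + step * v)) - np.sum(np.log(b - A @ (x + step * v))) \
                > f_cur + 0.25 * step * (grad @ v):  # backtracking (Armijo) line search
            step *= 0.5
        x = x + step * v
    return x

def barrier_method_lp(c, A, b, x0, t0=1.0, mu=10.0, eps=1e-6):
    """Barrier method for min c^T x s.t. A x <= b, started from a strictly feasible x0."""
    x, t, m = x0, t0, len(b)
    while True:
        x = centering_step(c, A, b, x, t)        # 1. centering step, warm-started at x
        if m / t < eps:                          # 3. stop: certified duality gap m/t < eps
            return x
        t *= mu                                  # 4. increase t

# Tiny usage example: minimize x_1 + x_2 over the unit box -1 <= x_i <= 1.
A = np.vstack([np.eye(2), -np.eye(2)]); b = np.ones(4)
c = np.array([1.0, 1.0]); x0 = np.zeros(2)
print(barrier_method_lp(c, A, b, x0))            # approaches (-1, -1)
```

The warm start in the outer loop is exactly the point of the method: each centering problem is started at the previous center, so Newton needs only a few steps per outer iteration.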

The barrier method
What can we ask about this algorithm?
1. How many iterations does it take to converge?
2. Do we need to optimally solve the centering step?
3. What is a good value for $\mu$?
4. How to choose $t^{(0)}$?
24 / 35

Convergence
The algorithm stops when $m/t < \varepsilon$; $t$ starts at $t^{(0)}$ and is increased to $\mu t$ at each iteration.
How to compute the number of iterations needed? We must find the smallest $i$ such that
$\frac{m}{\varepsilon} < t^{(0)} \mu^i$
It holds:
$i = \left\lceil \frac{\log\left(m / (\varepsilon t^{(0)})\right)}{\log \mu} \right\rceil$
Is there anything important that this analysis does not tell us? It does not tell us whether, as $t$ grows, the centering step becomes more difficult. (It does not.)
25 / 35
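As a quick check of the formula, a tiny sketch (the function name is mine):

```python
import math

def outer_iterations(m, eps, t0, mu):
    """Smallest i with t0 * mu**i > m/eps, i.e., the number of outer (centering) iterations."""
    return max(0, math.ceil(math.log(m / (eps * t0)) / math.log(mu)))

print(outer_iterations(m=100, eps=1e-6, t0=2.0, mu=10.0))  # -> 8, since 2 * 10**8 > 1e8 = m/eps
```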

[Figure 11.8: average number of Newton steps required to solve 100 randomly generated LPs of different dimensions, with $n = 2m$. Error bars show the standard deviation around the average value, for each value of $m$. The growth in the number of Newton steps required, as the problem dimensions range over a 100:1 ratio, is very small.]
26 / 35

The barrier method
What can we ask about this algorithm?
1. How many iterations does it take to converge?
2. Do we need to optimally solve the centering step?
3. What is a good value for $\mu$?
4. How to choose $t^{(0)}$?
27 / 35

Solving the centering step optimally?
Computing $x^*(t)$ exactly is not necessary: the central path has no significance in itself, it just leads to a solution of the original problem.
Inexact centering will still lead to the solution, but the points $(\lambda^*(t), \nu^*(t))$ may not be dual feasible. This issue can be corrected (homework).
Additionally, getting an extremely accurate minimizer of $t f_0 + \phi$ only takes a few more Newton iterations than a good minimizer, so why not just go for it?
28 / 35

The barrier method
What can we ask about this algorithm?
1. How many iterations does it take to converge?
2. Do we need to optimally solve the centering step?
3. What is a good value for $\mu$?
4. How to choose $t^{(0)}$?
29 / 35

Choosing µ
The choice of $\mu$ involves a trade-off between the number of outer iterations of the barrier method and the number of inner iterations of Newton's method.
For small $\mu$, $t$ grows...slowly: successive centers $x^*(t)$, $x^*(\mu t)$ are close, so the initial point for Newton's method is very good and it converges to the next $x^*(t)$ in few inner iterations; but more outer iterations are needed.
For larger $\mu$, the opposite holds.
The two effects roughly cancel out: the total number of inner iterations stays nearly constant for sufficiently large $\mu$.
30 / 35

[Figure 11.4: progress of the barrier method for a small LP, showing duality gap versus cumulative number of Newton steps. Three plots are shown, corresponding to three values of the parameter $\mu$: 2, 50, and 150. In each case, we have approximately linear convergence of the duality gap. The stopping criterion for Newton's method is $\lambda(x)^2/2 \le 10^{-5}$, where $\lambda(x)$ is the Newton decrement.]
31 / 35

[Figure 11.5: trade-off in the choice of the parameter $\mu$, for a small LP. The vertical axis shows the total number of Newton steps required to reduce the duality gap from 100 to $10^{-3}$, and the horizontal axis shows $\mu$. The plot shows that the barrier method works well for values of $\mu$ larger than around 3, but is otherwise not sensitive to the value of $\mu$.]
32 / 35

The barrier method
What can we ask about this algorithm?
1. How many iterations does it take to converge?
2. Do we need to optimally solve the centering step?
3. What is a good value for $\mu$?
4. How to choose $t^{(0)}$?
33 / 35

How to choose t^(0)
A very large initial $t$ incurs more inner iterations at the first outer iteration.
A very small initial $t$ incurs more outer iterations.
$m/t^{(0)}$ is the duality gap after the first centering step. We want to choose $t^{(0)}$ so that $m/t^{(0)} \approx \mu\,(f_0(x^{(0)}) - p^*)$.
If we have feasible dual points $(\lambda, \nu)$, with duality gap $\eta = f_0(x^{(0)}) - g(\lambda, \nu)$, then we can take $t^{(0)} = m/\eta$. Thus after the first outer iteration we get (roughly) the same duality gap as the initial primal and dual points.
34 / 35
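A two-line sketch of this rule (names are mine, not from the slides):

```python
def initial_t(m, f0_x0, g_dual):
    """t^(0) = m / eta, where eta = f0(x^(0)) - g(lambda, nu) is the initial duality gap."""
    return m / (f0_x0 - g_dual)

print(initial_t(m=4, f0_x0=1.0, g_dual=-1.0))  # eta = 2, so t^(0) = 2.0
```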

Recap
Inequality constrained problems
Up and down a hierarchy of algorithms
The central path
Getting the dual points and the optimality certificate
The barrier method
Convergence, parameters, and other details
35 / 35