Lecture 16: October 22


10-725/36-725: Convex Optimization, Fall 2018
Lecturer: Ryan Tibshirani
Scribes: Nic Dalmasso, Alan Mishler, Benjamin LeRoy

Note: LaTeX template courtesy of the UC Berkeley EECS dept. Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications. They may be distributed outside this class only with the permission of the Instructor.

16.1 Equality-Constrained Newton's Method

Consider the problem

$$\min_x \; f(x) \quad \text{subject to} \quad Ax = b.$$

We might consider the following approaches to find a solution:

Eliminating equality constraints. Set $x = Fy + x_0$, where the columns of $F$ span the null space of $A$ and $Ax_0 = b$, then solve the unconstrained problem in $y$. The problem with this is that we may lose the structure of the Hessian of the original problem. Say the original Hessian is diagonal; under this transformation, with $g(y) = f(Fy + x_0)$, we have

$$\nabla^2 g(y) = F^T \nabla^2 f(Fy + x_0)\, F.$$

Generally, multiplying by $F$ destroys the structure of the original Hessian: if the original Hessian was diagonal or sparse, the Hessian of the transformed problem will in general be neither.

Deriving the dual. The Lagrange dual function is $-f^*(-A^T v) - b^T v$, and with strong duality holding by a refined Slater's condition, we may be able to express $x^\star$ in terms of $v^\star$. However, if $A$ has some type of structure, we again lose it when passing to the dual problem.

Instead, we would like to apply Newton's method directly, taking into account that each newly proposed point must remain feasible. In equality-constrained Newton's method, we start with $x^{(0)}$ such that $Ax^{(0)} = b$. We then repeat the updates $x^+ = x + tv$, where $x^+$ is again a feasible point, which we guarantee by selecting the direction $v$ as

$$v = \operatorname*{argmin}_{Az = 0} \; \nabla f(x)^T z + \frac{1}{2} z^T \nabla^2 f(x)\, z,$$

where, as in unconstrained Newton's method, we are again minimizing a quadratic approximation of the function, here restricted to directions in the null space of $A$.
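The constrained quadratic minimization defining the direction $v$ reduces to solving one linear system (the bordered KKT system). As an illustration only, here is a minimal NumPy sketch; the example problem, numbers, and function name are mine, not from the lecture. For a quadratic objective, one full Newton step from a feasible point lands exactly at the constrained minimizer:

```python
import numpy as np

def eq_newton_step(hess, grad, A):
    """Solve the KKT system  [H  A^T; A  0][v; w] = [-grad; 0]
    for the equality-constrained Newton direction v."""
    n, p = hess.shape[0], A.shape[0]
    K = np.block([[hess, A.T], [A, np.zeros((p, p))]])
    rhs = np.concatenate([-grad, np.zeros(p)])
    sol = np.linalg.solve(K, rhs)
    return sol[:n]            # v; sol[n:] is the multiplier w

# Example: minimize (1/2) x^T Q x - q^T x  subject to  A x = b
Q = np.diag([1.0, 2.0, 3.0])
q = np.array([1.0, 1.0, 1.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])

x = np.array([1.0, 0.0, 0.0])           # feasible start: A x = b
v = eq_newton_step(Q, Q @ x - q, A)     # Newton direction, A v = 0
x_new = x + v                           # full step solves the quadratic exactly

assert np.allclose(A @ x_new, b)        # still feasible
```

Since $Av = 0$, feasibility is preserved no matter the step size, which is exactly the point of restricting $v$ to the null space of $A$.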
The new point $x^+$ is still feasible, since

$$Ax^+ = Ax + tAv = b + 0 = b.$$

Compared to unconstrained Newton's method, this is a slightly bigger system to solve. Since $v$ is the solution of minimizing a quadratic subject to an equality constraint, the two methods solve, for some $w$:

Unconstrained Newton's method:
$$\nabla^2 f(x)\, v = -\nabla f(x).$$

Equality-constrained Newton's method:
$$\begin{bmatrix} \nabla^2 f(x) & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} v \\ w \end{bmatrix} = \begin{bmatrix} -\nabla f(x) \\ 0 \end{bmatrix}.$$

16.1.1 Quasi-Newton Methods

Oftentimes it can be expensive to compute or store the Hessian, and approximating it may be preferable. Quasi-Newton methods approximate the Hessian with a matrix $H \succ 0$, making the update $x^+ = x - tH^{-1} \nabla f(x)$. These methods do not converge at a quadratic rate like Newton's method, but they still enjoy a faster-than-linear convergence rate, called a superlinear rate. Their advantage is that each iteration takes only $O(n^2)$ operations, which improves on the $O(n^3)$ operations per iteration of Newton's method.

16.2 Barrier Method

The barrier method is an interior point method; from this category we will also explore the primal-dual method. The usefulness of the barrier method comes into play when considering the following hierarchy:

- Quadratic problems: very easy to solve; we have a closed-form solution.
- Equality-constrained quadratic problems: still easy; we use the KKT conditions to derive a closed-form solution.
- Equality-constrained smooth problems: this is where Newton's method is useful, as it reduces these problems to a sequence of equality-constrained quadratic problems, for which we can compute closed-form solutions.
- Inequality- and equality-constrained smooth problems: we currently do not know how to solve these; this is where interior point methods are useful.

The intuition behind the barrier method is to push the inequality constraints into the problem in a smooth way, reducing the problem to an equality-constrained one. Consider now the following minimization problem:

$$\min_x \; f(x) \quad \text{subject to} \quad h_i(x) \le 0, \; i = 1, \dots, m, \quad Ax = b.$$

We assume $f, h_1, \dots, h_m$ are convex, twice differentiable functions with domains in $\mathbb{R}^n$. We define the log-barrier function as

$$\phi(x) = -\sum_{i=1}^m \log(-h_i(x)).$$

The domain of this function is the set of strictly feasible $x$, i.e., points that do not attain equality in any of the inequality constraints. If we assume this set is non-empty, which will be the case here, we automatically also obtain strong duality by Slater's condition.

The rationale behind the log-barrier function is that, ignoring the equality constraints, the problem above can be written by incorporating the inequality constraints via indicator functions, which in turn can be approximated by the log-barrier function:

$$f(x) + \sum_{i=1}^m I_{\{h_i(x) \le 0\}}(x) \;\approx\; f(x) + \frac{1}{t}\phi(x) = f(x) - \frac{1}{t}\sum_{i=1}^m \log(-h_i(x)),$$

for $t > 0$ a large number. The approximation is more accurate the larger $t$ is, with the log-barrier also approaching $\infty$ as any $h_i(x) \to 0$. The gradient and Hessian of the log-barrier function can be computed using the chain rule and product rule respectively, resulting in:

$$\nabla \phi(x) = -\sum_{i=1}^m \frac{1}{h_i(x)} \nabla h_i(x),$$

$$\nabla^2 \phi(x) = \sum_{i=1}^m \frac{1}{h_i(x)^2} \nabla h_i(x) \nabla h_i(x)^T - \sum_{i=1}^m \frac{1}{h_i(x)} \nabla^2 h_i(x).$$

16.3 Central Path

We can formally state the barrier problem as

$$\min_x \; t f(x) + \phi(x) \quad \text{subject to} \quad Ax = b,$$

for $t > 0$. The central path is the set of solutions $x^\star(t)$ as $t$ varies. As $t \to \infty$, the log-barrier approaches the indicator function as described above, so we hope that the central path approaches the solution $x^\star$ of the original problem. There are at least two reasons why we don't just solve the barrier problem for some single large value of $t$ and hope the solution is close to $x^\star$:

1. As $t$ gets large, Newton's method may become numerically unstable.
2. Newton's method may take a long time to enter the pure convergence phase if we do this.

Instead, it is better to take a sequence of increasing $t$ values and use each solution $x^\star(t)$ as a warm start for the next $t$. If this is done carefully, Newton's method can be kept in the pure convergence phase.
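The log-barrier gradient and Hessian formulas above can be checked numerically. Here is a small sketch of my own (the box-constraint example is an assumption, not from the notes) for affine constraints $h_i(x) = d_i^T x - e_i$, where $\nabla h_i = d_i$ and $\nabla^2 h_i = 0$, so the second Hessian term vanishes:

```python
import numpy as np

def log_barrier(x, D, e):
    """phi, grad, hess of phi(x) = -sum_i log(-h_i(x)) for affine h_i(x) = d_i^T x - e_i."""
    h = D @ x - e                                # strict feasibility means h_i < 0
    assert np.all(h < 0), "x must be strictly feasible"
    phi = -np.sum(np.log(-h))
    grad = -(D.T @ (1.0 / h))                    # -sum_i (1/h_i) d_i
    hess = D.T @ ((1.0 / h**2)[:, None] * D)     # sum_i (1/h_i^2) d_i d_i^T
    return phi, grad, hess

# box constraints -1 <= x_i <= 1, written as D x <= e
D = np.vstack([np.eye(2), -np.eye(2)])
e = np.ones(4)
x = np.array([0.3, -0.2])

phi, grad, hess = log_barrier(x, D, e)

# check the gradient formula against forward finite differences
eps = 1e-6
fd = np.array([(log_barrier(x + eps * ei, D, e)[0] - phi) / eps for ei in np.eye(2)])
assert np.allclose(fd, grad, atol=1e-4)
```

Note how the gradient blows up as any $h_i(x) \to 0$: this is the "barrier" that keeps iterates strictly inside the feasible set.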

16.3.1 Barrier Problem for a Linear Program

In an LP, the barrier problem is

$$\min_x \; t c^T x - \sum_{i=1}^m \log(e_i - d_i^T x),$$

corresponding to the polyhedral constraint $Dx \le e$. The contours of the log-barrier function approximate this feasible set better and better from the inside as $t \to \infty$. Taking the gradient and setting it equal to zero, we obtain the gradient optimality condition

$$0 = tc + \sum_{i=1}^m \frac{1}{e_i - d_i^T x^\star(t)}\, d_i.$$

In other words, the gradient of the barrier function at the optimum, $\nabla \phi(x^\star(t))$, must be parallel to $-c$.

16.3.2 Bounding the Duality Gap for the Original Problem

The Lagrangian for the original problem is

$$L(x, u, v) = f(x) + \sum_{i=1}^m u_i h_i(x) + v^T (Ax - b). \quad (16.1)$$

We can use the KKT conditions for the central path to identify feasible dual points for this Lagrangian. The KKT conditions for the central path are:

$$t \nabla f(x^\star(t)) - \sum_{i=1}^m \frac{1}{h_i(x^\star(t))} \nabla h_i(x^\star(t)) + A^T w = 0, \quad (16.2)$$
$$A x^\star(t) = b, \quad (16.3)$$
$$h_i(x^\star(t)) < 0, \quad i = 1, \dots, m. \quad (16.4)$$

For a given $x^\star(t)$ and $w$ that satisfy these conditions, let

$$u_i^\star(t) = -\frac{1}{t\, h_i(x^\star(t))}, \quad i = 1, \dots, m, \quad (16.5)$$
$$v^\star(t) = \frac{w}{t}. \quad (16.6)$$

Since $h_i(x^\star(t)) < 0$, due to the strict feasibility requirement, we know that $u_i^\star(t) > 0$. Additionally, notice that $\nabla_x L(x^\star(t), u^\star(t), v^\star(t)) = 0$, so $x^\star(t)$ minimizes $L(x, u^\star(t), v^\star(t))$ over $x$, and hence $g(u^\star(t), v^\star(t)) > -\infty$, i.e., $(u^\star(t), v^\star(t))$ is in the domain of the dual function. Therefore $u^\star(t), v^\star(t)$ are feasible for the dual problem. We now have

$$g(u^\star(t), v^\star(t)) = f(x^\star(t)) + \sum_{i=1}^m u_i^\star(t) h_i(x^\star(t)) + v^\star(t)^T (A x^\star(t) - b) \quad (16.7)$$
$$= f(x^\star(t)) - m/t, \quad (16.8)$$

and therefore

$$f(x^\star(t)) - f^\star \le f(x^\star(t)) - g(u^\star(t), v^\star(t)) = m/t. \quad (16.9)$$

This means that $x^\star(t) \to x^\star$ as $t \to \infty$, as we had hoped. It also provides a useful stopping criterion.
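The duality-gap bound $f(x^\star(t)) - f^\star \le m/t$ is easy to verify numerically on a toy problem. A sketch of my own (the example is an assumption, not from the notes): minimize $x$ subject to $0 \le x \le 1$, so $f^\star = 0$ and $m = 2$. The central-path point solves the barrier optimality condition $t - 1/x + 1/(1-x) = 0$, which we locate by bisection:

```python
import numpy as np

# minimize x subject to 0 <= x <= 1:  f* = 0, m = 2 inequality constraints
def central_path_point(t):
    """Solve t - 1/x + 1/(1-x) = 0 (barrier optimality) by bisection.
    The derivative is -inf near x = 0 and equals t > 0 at x = 0.5,
    so the root lies in (0, 0.5)."""
    lo, hi = 1e-12, 0.5
    for _ in range(200):
        mid = (lo + hi) / 2
        g = t - 1/mid + 1/(1 - mid)
        lo, hi = (mid, hi) if g < 0 else (lo, mid)
    return (lo + hi) / 2

m = 2
for t in [1.0, 10.0, 100.0, 1000.0]:
    x_t = central_path_point(t)
    assert x_t - 0.0 <= m / t      # f(x*(t)) - f* <= m/t holds at every t
```

Here $x^\star(t)$ is the smaller root of $t x^2 - (t+2)x + 1 = 0$, which behaves like $1/t$ for large $t$, consistent with the $m/t$ bound.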

16.4 Another Interpretation of the Central Path

Above, we derived the central path from the barrier problem. Another way to understand the central path is as the solution to the following perturbed KKT conditions:

$$\nabla f(x) + \sum_{i=1}^m u_i \nabla h_i(x) + A^T v = 0, \quad (16.10)$$
$$u_i h_i(x) = -1/t, \quad i = 1, \dots, m, \quad (16.11)$$
$$h_i(x) \le 0, \; i = 1, \dots, m, \qquad Ax = b, \quad (16.12)$$
$$u_i \ge 0, \quad i = 1, \dots, m, \quad (16.13)$$

i.e., we have replaced the usual complementary slackness equations $u_i h_i(x) = 0$ with $u_i h_i(x) = -1/t$. As $t \to \infty$, of course, the perturbed complementary slackness condition approaches the original condition.

16.5 Barrier Method

The barrier method iteratively (as $t \to \infty$) solves a sequence of problems. Concretely, for some $t^{(0)} > 0$ and $\mu > 1$ (the multiplier for $t$):

1. For $k \ge 1$, define $t^{(k)} = \mu t^{(k-1)}$.
2. Solve the barrier problem

$$\min_x \; t^{(k)} f(x) + \phi(x) \quad \text{subject to} \quad Ax = b \quad (16.14)$$

to get $x^\star(t^{(k)})$ using Newton's method, initialized at $x^\star(t^{(k-1)})$ (a warm start).

16.5.1 Global Parameters

Although the barrier method is decently robust in practice to the choices of $\mu$ and $t^{(0)}$:

1. If $\mu$ is too small, we potentially need many outer iterations to reach global convergence; if $\mu$ is too large, the warm starts may not keep the Newton steps in the quadratic convergence phase.
2. If $t^{(0)}$ is too small, we potentially need many outer iterations to reach global convergence; if $t^{(0)}$ is too large, the first Newton solve may take a while to enter the quadratic convergence phase.

16.5.2 Convergence

Following the central path, the barrier method attains

$$f(x^{(k)}) - f^\star \le \frac{m}{\mu^k t^{(0)}},$$

so reaching an accuracy of $\epsilon$ requires

$$k \ge \frac{\log\big(m / (t^{(0)} \epsilon)\big)}{\log \mu}$$

outer iterations. Note that when we are in the quadratic convergence phase, the number of steps for numerical convergence of Newton's method is about 5 or 6, making this of the correct order provided the warm starts keep the descents in the quadratic convergence phase.
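Putting the pieces together, here is a minimal sketch of the full barrier method on a toy problem (my own example and parameter choices, not from the notes): minimize $c^T x$ subject to box constraints $-1 \le x_i \le 1$, with a damped Newton inner loop and the $m/t$ gap bound as the stopping rule. The optimum pushes each coordinate to the boundary opposite the sign of $c_i$:

```python
import numpy as np

def barrier_method(c, t0=1.0, mu=10.0, eps=1e-8, max_newton=50):
    """Barrier method sketch for  min c^T x  s.t.  -1 <= x_i <= 1  (m = 2n)."""
    n = len(c)
    m = 2 * n
    x = np.zeros(n)                       # strictly feasible start
    t = t0
    while m / t > eps:                    # stop via the duality-gap bound m/t
        for _ in range(max_newton):       # centering: min t c^T x + phi(x)
            # phi(x) = -sum log(1 - x_i) - sum log(1 + x_i)
            g = t * c + 1.0/(1.0 - x) - 1.0/(1.0 + x)          # gradient
            H = np.diag(1.0/(1.0 - x)**2 + 1.0/(1.0 + x)**2)   # Hessian
            v = -np.linalg.solve(H, g)    # Newton direction
            if -g @ v < 1e-12:            # squared Newton decrement small: centered
                break
            s = 1.0                       # backtrack to stay strictly feasible
            while np.max(np.abs(x + s * v)) >= 1.0:
                s /= 2.0
            x = x + s * v
        t *= mu                           # increase t, warm-starting from x
    return x

c = np.array([1.0, -2.0])
x = barrier_method(c)
assert np.allclose(x, [-1.0, 1.0], atol=1e-6)   # x_i driven to -sign(c_i)
```

With $\mu = 10$ each centering problem starts close to its solution, so Newton converges in a handful of steps, matching the warm-start discussion above.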

16.5.3 Newton Iterations

The above comment about Newton's method converging in at most 5-6 steps requires us to be in the quadratic convergence phase. This can be formalized with the following statement. Suppose that:

- the function $tf + \phi$ is self-concordant, and
- our original problem has bounded sublevel sets.

Then solving each centering problem with Newton's method to approximate accuracy yields a total number of Newton iterations of $O(\log(m / (t^{(0)} \epsilon)))$. LPs, QPs, and QCQPs all have self-concordant $tf + \phi$ (this only requires $f$ and the $h_i$ to be linear or quadratic), so the above fact is useful for them.

16.6 Feasibility Methods

To leverage some of the above analysis, specifically the dual formulation in Section 16.3.2, we need to have a strictly feasible point (for Slater's condition). This is also very important for initializing $x^{(0)}$. To find a strictly feasible point, we can apply the barrier method to

$$\min_{x, s} \; s \quad \text{subject to} \quad h_i(x) \le s, \; i = 1, \dots, m, \quad Ax = b,$$

and terminate as soon as $s < 0$ (the corresponding $x$ is then strictly feasible). A variation of this also allows us to see which constraints $h_i(x) \le 0$, if any, cause the problem to be infeasible, by looking at

$$\min_{x, s} \; 1^T s \quad \text{subject to} \quad h_i(x) \le s_i, \; i = 1, \dots, m, \quad Ax = b, \quad s \ge 0,$$

which again can be solved with the barrier method.

16.7 History and Motivation for Interior-Point Methods

See the lecture slides (Primal-dual interior-point methods, slide 4) for some historical bullet points.
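As a footnote on the phase-I feasibility problem: when the inequality constraints are affine, the first formulation above is itself an LP, so any LP solver can be used in place of the barrier method. A small sketch using `scipy.optimize.linprog` (the example, asking whether the line $x_1 + x_2 = 1$ meets the open box $-1 < x_i < 1$, is mine, not from the notes):

```python
import numpy as np
from scipy.optimize import linprog

def phase_one(D, e, A, b):
    """Phase-I LP:  min_{x,s} s  s.t.  Dx - s*1 <= e,  Ax = b.
    Returns (x, s); s < 0 certifies that x is strictly feasible for Dx <= e."""
    n = D.shape[1]
    c = np.zeros(n + 1)
    c[-1] = 1.0                                        # objective: minimize s
    A_ub = np.hstack([D, -np.ones((D.shape[0], 1))])   # Dx - s <= e
    A_eq = np.hstack([A, np.zeros((A.shape[0], 1))])   # Ax = b (s unconstrained)
    res = linprog(c, A_ub=A_ub, b_ub=e, A_eq=A_eq, b_eq=b,
                  bounds=[(None, None)] * (n + 1))
    return res.x[:n], res.x[-1]

# Does  x1 + x2 = 1  intersect the open box  -1 < x_i < 1 ?
D = np.vstack([np.eye(2), -np.eye(2)])
e = np.ones(4)
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

x, s = phase_one(D, e, A, b)
assert s < 0 and np.all(D @ x < e) and np.allclose(A @ x, b)
```

Here the optimal value is $s = -0.5$ at $x = (0.5, 0.5)$: the negative $s$ certifies strict feasibility, exactly the termination criterion described above.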