Sequential Convex Programming

Sequential Convex Programming
- sequential convex programming
- alternating convex optimization
- convex-concave procedure
Prof. S. Boyd, EE364b, Stanford University

Methods for nonconvex optimization problems
- convex optimization methods are (roughly) always global, always fast
- for general nonconvex problems, we have to give up one:
  - local optimization methods are fast, but need not find the global solution (and even when they do, cannot certify it)
  - global optimization methods find the global solution (and certify it), but are not always fast (indeed, are often slow)
- this lecture: local optimization methods that are based on solving a sequence of convex problems

Sequential convex programming (SCP)
- a local optimization method for nonconvex problems that leverages convex optimization
- convex portions of a problem are handled exactly and efficiently
- SCP is a heuristic: it can fail to find an optimal (or even feasible) point
- results can (and often do) depend on the starting point (can run the algorithm from many initial points and take the best result)
- SCP often works well, i.e., finds a feasible point with good, if not optimal, objective value

Problem
we consider the nonconvex problem

    minimize    f_0(x)
    subject to  f_i(x) <= 0,  i = 1,...,m
                h_i(x) = 0,   i = 1,...,p

with variable x ∈ R^n
- f_0 and f_i (possibly) nonconvex
- h_i (possibly) non-affine

Basic idea of SCP
- maintain an estimate of the solution x^(k), and a convex trust region T^(k) ⊆ R^n
- form convex approximation f̂_i of f_i over the trust region T^(k)
- form affine approximation ĥ_i of h_i over the trust region T^(k)
- x^(k+1) is an optimal point for the approximate convex problem

    minimize    f̂_0(x)
    subject to  f̂_i(x) <= 0,  i = 1,...,m
                ĥ_i(x) = 0,   i = 1,...,p
                x ∈ T^(k)

Trust region
- typical trust region is a box around the current point:

    T^(k) = { x | |x_i - x^(k)_i| <= ρ_i, i = 1,...,n }

- if x_i appears only in convex inequalities and affine equalities, can take ρ_i = ∞

Affine and convex approximations via Taylor expansions
- (affine) first-order Taylor expansion:

    f̂(x) = f(x^(k)) + ∇f(x^(k))^T (x - x^(k))

- (convex part of) second-order Taylor expansion:

    f̂(x) = f(x^(k)) + ∇f(x^(k))^T (x - x^(k)) + (1/2)(x - x^(k))^T P (x - x^(k)),
    P = (∇²f(x^(k)))_+, the PSD part of the Hessian

- these give local approximations, which don't depend on the trust region radii ρ_i
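The PSD-part construction above is easy to sketch in numpy: clip the negative eigenvalues of the Hessian to zero, then build the resulting convex quadratic model. This is an illustrative helper, not code from the slides; `psd_part` and `convexified_quadratic` are hypothetical names.

```python
import numpy as np

def psd_part(H):
    """PSD part of a symmetric matrix: keep only the nonnegative eigenvalues."""
    w, V = np.linalg.eigh(H)
    return (V * np.maximum(w, 0)) @ V.T

def convexified_quadratic(f, grad, hess, xk):
    """Second-order Taylor model of f at xk, Hessian replaced by its PSD part."""
    P = psd_part(hess(xk))
    g = grad(xk)
    fk = f(xk)
    def fhat(x):
        d = x - xk
        return fk + g @ d + 0.5 * d @ P @ d
    return fhat
```

The model agrees with f at x^(k) and is convex by construction, even when the true Hessian is indefinite.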

Particle method
- choose points z_1,...,z_K ∈ T^(k) (e.g., all vertices, some vertices, grid, random, ...)
- evaluate y_i = f(z_i)
- fit the data (z_i, y_i) with a convex (or affine) function (using convex optimization)
- advantages:
  - handles nondifferentiable functions, or functions for which evaluating derivatives is difficult
  - gives regional models, which depend on the current point and trust region radii ρ_i

Fitting affine or quadratic functions to data
- fit a convex quadratic function to the data (z_i, y_i):

    minimize    sum_{i=1}^K ( (z_i - x^(k))^T P (z_i - x^(k)) + q^T (z_i - x^(k)) + r - y_i )²
    subject to  P ⪰ 0

  with variables P ∈ S^n, q ∈ R^n, r ∈ R
- can use other objectives, add other convex constraints
- no need to solve exactly
- this problem is solved for each nonconvex constraint, at each SCP step
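The affine special case of this fit is an ordinary least-squares problem, which needs no SDP solver; the full convex-quadratic fit (with the P ⪰ 0 constraint) would require one. A minimal numpy sketch of the affine fit, with an illustrative function name:

```python
import numpy as np

def fit_affine(Z, y):
    """Least-squares affine fit y_i ~ a^T z_i + b to sampled points.
    Z is K-by-n (one sample point per row), y has length K."""
    K, n = Z.shape
    A = np.hstack([Z, np.ones((K, 1))])        # rows [z_i, 1]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:n], coef[n]                   # slope a, intercept b
```

On data generated by an exact affine function, the fit recovers the coefficients.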

Quasi-linearization
- a cheap and simple method for affine approximation
- write h(x) as A(x)x + b(x) (many ways to do this)
- use ĥ(x) = A(x^(k))x + b(x^(k))
- example: h(x) = (1/2)x^T P x + q^T x + r = ((1/2)Px + q)^T x + r

    ĥ_ql(x)  = ((1/2)P x^(k) + q)^T x + r
    ĥ_tay(x) = h(x^(k)) + (P x^(k) + q)^T (x - x^(k))
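Both approximations agree with h at the expansion point x^(k) but differ away from it. A small numpy sketch of the quadratic example (helper names are illustrative):

```python
import numpy as np

def h(x, P, q, r):
    """Quadratic h(x) = (1/2) x^T P x + q^T x + r."""
    return 0.5 * x @ P @ x + q @ x + r

def h_ql(x, xk, P, q, r):
    """Quasi-linearization: write h(x) = ((1/2)Px + q)^T x + r and
    freeze the coefficient at xk."""
    return (0.5 * P @ xk + q) @ x + r

def h_tay(x, xk, P, q, r):
    """First-order Taylor expansion of h about xk."""
    return h(xk, P, q, r) + (P @ xk + q) @ (x - xk)
```

At x = x^(k) both return h(x^(k)) exactly; their slopes differ by (1/2)P x^(k).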

Example: nonconvex QP

    minimize    f(x) = (1/2)x^T P x + q^T x
    subject to  ||x||_∞ <= 1

with P symmetric but not PSD
- use the convex approximation

    f̂(x) = f(x^(k)) + (P x^(k) + q)^T (x - x^(k)) + (1/2)(x - x^(k))^T P_+ (x - x^(k))

  where P_+ is the PSD part of P

- example with x ∈ R^20; SCP with ρ = 0.2, started from 10 different points
  [figure: f(x^(k)) versus iteration k for the 10 runs]
- runs typically converge to points with objective value between -60 and -50
- dashed line shows the lower bound on the optimal value, -66.5

Lower bound via Lagrange dual
- write the constraints as x_i² <= 1 and form the Lagrangian

    L(x, λ) = (1/2)x^T P x + q^T x + sum_{i=1}^n λ_i (x_i² - 1)
            = (1/2)x^T (P + diag(2λ)) x + q^T x - 1^T λ

- g(λ) = -(1/2) q^T (P + diag(2λ))^{-1} q - 1^T λ; need P + diag(2λ) ≻ 0
- solve the dual problem to get the best lower bound:

    maximize    -(1/2) q^T (P + diag(2λ))^{-1} q - 1^T λ
    subject to  λ ⪰ 0
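For any fixed λ ⪰ 0, minimizing the Lagrangian L(x, λ) = (1/2)x^T(P + diag(2λ))x + q^T x - 1^T λ over x in closed form gives the bound; a minimal numpy sketch (assuming that Lagrangian, with an illustrative function name):

```python
import numpy as np

def dual_bound(P, q, lam):
    """g(lam) = -(1/2) q^T (P + diag(2 lam))^{-1} q - 1^T lam,
    a lower bound on the box-constrained nonconvex QP whenever
    lam >= 0 and P + diag(2 lam) is positive definite."""
    M = P + np.diag(2 * lam)
    assert np.all(np.linalg.eigvalsh(M) > 0), "need P + diag(2 lam) > 0"
    return -0.5 * q @ np.linalg.solve(M, q) - lam.sum()
```

By weak duality, g(λ) never exceeds f(x) for any feasible x (here, any x with ||x||_∞ <= 1), even though P itself is indefinite.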

Some (related) issues
- the approximate convex problem can be infeasible
- how do we evaluate progress when x^(k) isn't feasible? need to take into account
  - the objective f_0(x^(k))
  - the inequality constraint violations f_i(x^(k))_+
  - the equality constraint violations |h_i(x^(k))|
- controlling the trust region size ρ:
  - ρ too large: approximations are poor, leading to a bad choice of x^(k+1)
  - ρ too small: approximations are good, but progress is slow

Exact penalty formulation
- instead of the original problem, we solve the unconstrained problem

    minimize  φ(x) = f_0(x) + λ ( sum_{i=1}^m f_i(x)_+ + sum_{i=1}^p |h_i(x)| )

  where λ > 0
- for λ large enough, the minimizer of φ is a solution of the original problem
- for SCP, use the convex approximation

    φ̂(x) = f̂_0(x) + λ ( sum_{i=1}^m f̂_i(x)_+ + sum_{i=1}^p |ĥ_i(x)| )

- the approximate problem is always feasible
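The penalty φ(x) = f_0(x) + λ(Σ f_i(x)_+ + Σ |h_i(x)|) is straightforward to evaluate; a minimal sketch with callables for the objective and constraint functions (the helper name is illustrative):

```python
def phi(x, f0, fs, hs, lam):
    """Exact penalty: f0(x) plus lam times the total constraint violation,
    i.e. sum of positive parts f_i(x)_+ and absolute residuals |h_i(x)|."""
    viol = sum(max(f(x), 0.0) for f in fs) + sum(abs(h(x)) for h in hs)
    return f0(x) + lam * viol
```

Feasible points incur no penalty; infeasible points are charged λ per unit of violation, which is what makes a single unconstrained minimization meaningful.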

Trust region update
- judge algorithm progress by the decrease in φ, using the solution x̃ of the approximate problem
- decrease with the approximate objective: δ̂ = φ(x^(k)) - φ̂(x̃) (called the predicted decrease)
- decrease with the exact objective: δ = φ(x^(k)) - φ(x̃)
- if δ >= αδ̂:  ρ^(k+1) = β_succ ρ^(k),  x^(k+1) = x̃
  (α ∈ (0,1), β_succ >= 1; typical values α = 0.1, β_succ = 1.1)
- if δ < αδ̂:  ρ^(k+1) = β_fail ρ^(k),  x^(k+1) = x^(k)
  (β_fail ∈ (0,1); typical value β_fail = 0.5)
- interpretation: if the actual decrease is more (less) than the fraction α of the predicted decrease, then increase (decrease) the trust region size
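One accept/reject step of this update can be sketched directly (illustrative helper; `phi` and `phi_hat` are the exact and approximate penalty functions passed in as callables):

```python
def trust_region_step(x, x_trial, phi, phi_hat, rho,
                      alpha=0.1, beta_succ=1.1, beta_fail=0.5):
    """One SCP trust region update: accept x_trial and grow rho if the
    actual decrease is at least alpha times the predicted decrease,
    otherwise keep x and shrink rho."""
    pred = phi(x) - phi_hat(x_trial)   # predicted decrease, delta-hat
    actual = phi(x) - phi(x_trial)     # actual decrease, delta
    if actual >= alpha * pred:
        return x_trial, beta_succ * rho    # success: accept, expand
    return x, beta_fail * rho              # failure: reject, contract
```

With phi = phi_hat (exact model) every improving trial point is accepted; a trial point that increases φ is rejected and the trust region shrinks.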

Nonlinear optimal control
[figure: 2-link system with link lengths l_1, l_2, masses m_1, m_2, and joint angles θ_1, θ_2, controlled by torques τ_1 and τ_2 (no gravity)]

- dynamics given by M(θ)θ̈ + W(θ, θ̇)θ̇ = τ, with

    M(θ) = [ (m_1 + m_2)l_1²                  m_2 l_1 l_2 (s_1 s_2 + c_1 c_2) ]
           [ m_2 l_1 l_2 (s_1 s_2 + c_1 c_2)  m_2 l_2²                        ]

    W(θ, θ̇) = [ 0                                      m_2 l_1 l_2 (s_1 c_2 - c_1 s_2) θ̇_2 ]
              [ -m_2 l_1 l_2 (s_1 c_2 - c_1 s_2) θ̇_1   0                                   ]

  where s_i = sin θ_i, c_i = cos θ_i
- nonlinear optimal control problem:

    minimize    J = ∫_0^T ||τ(t)||_2² dt
    subject to  θ(0) = θ_init,  θ̇(0) = 0,  θ(T) = θ_final,  θ̇(T) = 0
                ||τ(t)||_∞ <= τ_max,  0 <= t <= T

Discretization
- discretize with time interval h = T/N
- J ≈ h sum_{i=1}^N ||τ_i||_2², with τ_i = τ(ih)
- approximate the derivatives as

    θ̇(ih) ≈ (θ_{i+1} - θ_{i-1}) / (2h),   θ̈(ih) ≈ (θ_{i+1} - 2θ_i + θ_{i-1}) / h²

- approximate the dynamics as a set of nonlinear equality constraints:

    M(θ_i) (θ_{i+1} - 2θ_i + θ_{i-1}) / h²
      + W(θ_i, (θ_{i+1} - θ_{i-1})/(2h)) (θ_{i+1} - θ_{i-1})/(2h) = τ_i

- θ_0 = θ_1 = θ_init;  θ_N = θ_{N+1} = θ_final
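The central-difference formulas used here are both second-order accurate; a quick numerical sanity check (not from the slides), against the exact derivatives of sin at t = 1:

```python
import numpy as np

# central-difference approximations of the first and second derivative,
# with step h = 0.01, checked against cos(t) and -sin(t)
h = 0.01
t = 1.0
d1 = (np.sin(t + h) - np.sin(t - h)) / (2 * h)               # ~ cos(t)
d2 = (np.sin(t + h) - 2 * np.sin(t) + np.sin(t - h)) / h**2  # ~ -sin(t)
```

Both errors scale as O(h²), so halving h cuts them by roughly a factor of four.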

- discretized nonlinear optimal control problem:

    minimize    h sum_{i=1}^N ||τ_i||_2²
    subject to  θ_0 = θ_1 = θ_init,  θ_N = θ_{N+1} = θ_final
                ||τ_i||_∞ <= τ_max,  i = 1,...,N
                M(θ_i) (θ_{i+1} - 2θ_i + θ_{i-1}) / h²
                  + W(θ_i, (θ_{i+1} - θ_{i-1})/(2h)) (θ_{i+1} - θ_{i-1})/(2h) = τ_i

- replace the equality constraints with the quasi-linearized versions

    M(θ_i^(k)) (θ_{i+1} - 2θ_i + θ_{i-1}) / h²
      + W(θ_i^(k), (θ_{i+1}^(k) - θ_{i-1}^(k))/(2h)) (θ_{i+1} - θ_{i-1})/(2h) = τ_i

- trust region: only on θ_i
- initialize with θ_i = θ_init + ((i - 1)/(N - 1))(θ_final - θ_init), i = 1,...,N

Numerical example
- m_1 = 1, m_2 = 5, l_1 = 1, l_2 = 1
- N = 40, T = 10
- θ_init = (0, -2.9), θ_final = (3, 2.9)
- τ_max = 1.1
- α = 0.1, β_succ = 1.1, β_fail = 0.5, ρ^(1) = 90
- λ = 2

SCP progress
[figure: penalty objective φ(x^(k)) versus iteration k]

Convergence of J and torque residuals
[figures: objective J^(k) and the sum of torque residuals versus iteration k]

Predicted and actual decreases in φ
[figures: predicted decrease δ̂ (dotted) and actual decrease δ (solid) versus k; trust region size ρ^(k) versus k]

Trajectory plan
[figures: planned torques τ_1, τ_2 and joint angles θ_1, θ_2 versus time t]

Difference of convex programming
- express the problem as

    minimize    f_0(x) - g_0(x)
    subject to  f_i(x) - g_i(x) <= 0,  i = 1,...,m

  where the f_i and g_i are convex
- the functions f_i - g_i are called difference of convex functions
- the problem is sometimes called a difference of convex program

Convex-concave procedure
- obvious convexification at x^(k): replace f(x) - g(x) with

    f̂(x) = f(x) - g(x^(k)) - ∇g(x^(k))^T (x - x^(k))

- since f̂(x) >= f(x) - g(x) for all x, no trust region is needed:
  - the true objective at x̃ is better than the convexified objective
  - the true feasible set contains the feasible set of the convexified problem
- SCP in this form is sometimes called the convex-concave procedure
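The majorization property is easy to verify numerically on a one-dimensional DC decomposition; a minimal sketch with f(x) = x⁴ and g(x) = x² (both convex, chosen here only for illustration):

```python
import numpy as np

# DC objective f - g; linearizing the convex g at xk gives a global
# upper bound f_hat on f - g, since g(x) >= g(xk) + g'(xk)(x - xk)
f = lambda x: x**4          # convex part, kept exactly
g = lambda x: x**2          # convex part that enters with a minus sign
dg = lambda x: 2 * x        # derivative of g

def f_hat(x, xk):
    """Convexified objective at iterate xk."""
    return f(x) - g(xk) - dg(xk) * (x - xk)
```

Here f_hat(x, xk) - (f(x) - g(x)) = (x - xk)², so the bound is tight exactly at the expansion point, which is why no trust region is required.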

Example (BV 7.1)
- given samples y_1,...,y_N ∈ R^n from N(0, Σ_true)
- the negative log-likelihood function is

    f(Σ) = log det Σ + Tr(Σ^{-1} Y),   Y = (1/N) sum_{i=1}^N y_i y_i^T

  (dropping a constant and a positive scale factor)
- ML estimate of Σ, with prior knowledge Σ_ij >= 0:

    minimize    f(Σ) = log det Σ + Tr(Σ^{-1} Y)
    subject to  Σ_ij >= 0,  i, j = 1,...,n

  with variable Σ (the constraint Σ ≻ 0 is implicit)

- the first term in f is concave; the second term is convex
- linearize the first term in the objective to get

    f̂(Σ) = log det Σ^(k) + Tr( (Σ^(k))^{-1} (Σ - Σ^(k)) ) + Tr(Σ^{-1} Y)
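Because log det is concave on positive definite matrices, its linearization is a global over-estimator, which is what makes this a convex-concave step. A small numpy check (illustrative helper names):

```python
import numpy as np

def logdet(S):
    """log det of a positive definite matrix, via slogdet for stability."""
    return np.linalg.slogdet(S)[1]

def logdet_lin(S, Sk):
    """First-order expansion of log det at Sk:
    log det Sk + Tr(Sk^{-1} (S - Sk))."""
    return logdet(Sk) + np.trace(np.linalg.solve(Sk, S - Sk))
```

For any positive definite S and Sk, logdet_lin(S, Sk) >= logdet(S), with equality at S = Sk.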

Numerical example
- convergence of a problem instance with n = 10, N = 15
  [figure: objective f(Σ) versus iteration k]

Alternating convex optimization
- given a nonconvex problem with variable (x_1,...,x_n) ∈ R^n
- I_1,...,I_k ⊆ {1,...,n} are index subsets with ∪_j I_j = {1,...,n}
- suppose the problem is convex in the variables x_i, i ∈ I_j, when x_i, i ∉ I_j are fixed
- alternating convex optimization method: cycle through j, in each step optimizing over the variables x_i, i ∈ I_j
- special case: bi-convex problem
  - x = (u, v); the problem is convex in u (v) with v (u) fixed
  - alternate optimizing over u and v

Nonnegative matrix factorization
- NMF problem:

    minimize    ||A - XY||_F
    subject to  X_ij >= 0,  Y_ij >= 0

  with variables X ∈ R^{m×k}, Y ∈ R^{k×n}, and data A ∈ R^{m×n}
- a difficult problem, except for a few special cases (e.g., k = 1)
- alternating convex optimization: solve QPs to optimize over X, then Y, then X, ...
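A minimal numpy sketch of the alternating idea; here the classic Lee-Seung multiplicative updates stand in for the slides' exact QP solves (same alternation between improving X with Y fixed and vice versa, but each half-step is only an approximate minimization; the function name and defaults are illustrative):

```python
import numpy as np

def nmf(A, k, iters=200, seed=0):
    """Alternating NMF heuristic via multiplicative updates.
    Each update improves one factor with the other held fixed, while
    keeping all entries nonnegative."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    X = rng.random((m, k))
    Y = rng.random((k, n))
    eps = 1e-12                       # guard against division by zero
    for _ in range(iters):
        X *= (A @ Y.T) / (X @ Y @ Y.T + eps)   # improve X, Y fixed
        Y *= (X.T @ A) / (X.T @ X @ Y + eps)   # improve Y, X fixed
    return X, Y
```

Like the QP-based alternation, this is a heuristic: it decreases the residual monotonically but the result depends on the starting point.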

Example
- convergence for an example with m = n = 50, k = 5 (five starting points)
  [figure: residual ||A - XY||_F versus iteration k]