Sequential Convex Programming

- sequential convex programming
- alternating convex optimization
- convex-concave procedure

Prof. S. Boyd, EE364b, Stanford University
Methods for nonconvex optimization problems

- convex optimization methods are (roughly) always global, always fast
- for general nonconvex problems, we have to give up one:
  - local optimization methods are fast, but need not find the global solution
    (and even when they do, cannot certify it)
  - global optimization methods find the global solution (and certify it), but
    are not always fast (indeed, are often slow)
- this lecture: local optimization methods that are based on solving a
  sequence of convex problems
Sequential convex programming (SCP)

- a local optimization method for nonconvex problems that leverages convex
  optimization: convex portions of a problem are handled exactly and
  efficiently
- SCP is a heuristic: it can fail to find an optimal (or even feasible) point
- results can (and often do) depend on the starting point (can run the
  algorithm from many initial points and take the best result)
- SCP often works well, i.e., finds a feasible point with good, if not
  optimal, objective value
Problem

we consider the nonconvex problem

    minimize    f_0(x)
    subject to  f_i(x) ≤ 0,  i = 1,...,m
                h_i(x) = 0,  i = 1,...,p

with variable x ∈ R^n; f_0 and f_i are (possibly) nonconvex; h_i are
(possibly) non-affine
Basic idea of SCP

- maintain an estimate of the solution x^(k), and a convex trust region
  T^(k) ⊆ R^n
- form convex approximation f̂_i of f_i over the trust region T^(k)
- form affine approximation ĥ_i of h_i over the trust region T^(k)
- x^(k+1) is an optimal point for the approximate convex problem

    minimize    f̂_0(x)
    subject to  f̂_i(x) ≤ 0,  i = 1,...,m
                ĥ_i(x) = 0,  i = 1,...,p
                x ∈ T^(k)
Trust region

- typical trust region is a box around the current point:

    T^(k) = {x | |x_i − x_i^(k)| ≤ ρ_i, i = 1,...,n}

- if x_i appears only in convex inequalities and affine equalities, can take
  ρ_i = ∞
Affine and convex approximations via Taylor expansions

- (affine) first-order Taylor expansion:

    f̂(x) = f(x^(k)) + ∇f(x^(k))^T (x − x^(k))

- (convex part of) second-order Taylor expansion:

    f̂(x) = f(x^(k)) + ∇f(x^(k))^T (x − x^(k))
            + (1/2)(x − x^(k))^T P (x − x^(k))

  with P = (∇²f(x^(k)))_+, the PSD part of the Hessian
- these give local approximations, which don't depend on the trust region
  radii ρ_i
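The PSD part of a Hessian can be computed by eigendecomposition, clipping the negative eigenvalues to zero. A minimal numpy sketch (the matrix H below is a made-up example):

```python
import numpy as np

def psd_part(H):
    """PSD part of a symmetric matrix: eigendecompose, clip negative
    eigenvalues to zero, and reassemble."""
    w, V = np.linalg.eigh(H)
    return V @ np.diag(np.maximum(w, 0.0)) @ V.T

# indefinite example with eigenvalues 2 and -1
H = np.array([[2.0, 0.0], [0.0, -1.0]])
P = psd_part(H)   # equals diag(2, 0)
```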
Particle method

- particle method: choose points z_1,...,z_K ∈ T^(k) (e.g., all vertices,
  some vertices, grid, random, ...)
- evaluate y_i = f(z_i)
- fit the data (z_i, y_i) with a convex (affine) function (using convex
  optimization)
- advantages: handles nondifferentiable functions, or functions for which
  evaluating derivatives is difficult
- gives regional models, which depend on the current point and the trust
  region radii ρ_i
Fitting affine or quadratic functions to data

fit a convex quadratic function to the data (z_i, y_i):

    minimize    Σ_{i=1}^K ((z_i − x^(k))^T P (z_i − x^(k))
                           + q^T (z_i − x^(k)) + r − y_i)²
    subject to  P ⪰ 0

with variables P ∈ S^n, q ∈ R^n, r ∈ R

- can use other objectives, add other convex constraints
- no need to solve exactly
- this problem is solved for each nonconvex constraint, at each SCP step
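The simplest instance of this fit is the affine case (P = 0), which reduces to ordinary least squares; the PSD-constrained quadratic fit additionally needs an SDP solver. A numpy sketch of the affine special case, on made-up particle data:

```python
import numpy as np

# particles z_i (rows of Z) and values y_i = f(z_i); here f is itself affine,
# so the least-squares fit should recover it exactly
rng = np.random.default_rng(0)
Z = rng.standard_normal((20, 3))
y = Z @ np.array([1.0, -2.0, 0.5]) + 3.0

# fit f_hat(z) = a^T z + b by least squares over rows [z_i^T, 1]
A = np.hstack([Z, np.ones((Z.shape[0], 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
a, b = coef[:-1], coef[-1]
```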
Quasi-linearization

- a cheap and simple method for affine approximation
- write h(x) as A(x)x + b(x) (many ways to do this)
- use ĥ(x) = A(x^(k))x + b(x^(k))

example: h(x) = (1/2)x^T P x + q^T x + r = ((1/2)Px + q)^T x + r

    ĥ_ql(x)  = ((1/2)P x^(k) + q)^T x + r
    ĥ_tay(x) = (P x^(k) + q)^T (x − x^(k)) + h(x^(k))
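Both affine approximations agree with h at the expansion point but generally differ elsewhere; a quick numpy check on a small made-up instance:

```python
import numpy as np

P = np.array([[2.0, 1.0], [1.0, 3.0]])
q = np.array([1.0, -1.0])
r = 0.5
xk = np.array([0.5, -0.5])            # expansion point x^(k)

h     = lambda x: 0.5 * x @ P @ x + q @ x + r
h_ql  = lambda x: (0.5 * P @ xk + q) @ x + r          # quasi-linearization
h_tay = lambda x: (P @ xk + q) @ (x - xk) + h(xk)     # first-order Taylor

# both are exact at x^(k); away from x^(k) they differ
```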
Example

nonconvex QP:

    minimize    f(x) = (1/2)x^T P x + q^T x
    subject to  ‖x‖_∞ ≤ 1

with P symmetric but not PSD; use the approximation

    f(x^(k)) + (P x^(k) + q)^T (x − x^(k))
        + (1/2)(x − x^(k))^T P_+ (x − x^(k))
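A self-contained sketch of SCP on this problem, with randomly generated data; the convex subproblem (box intersected with an l_inf trust region) is solved here by projected gradient descent, a stand-in for a real QP solver. Because the convexified model majorizes f, the objective is nonincreasing across SCP steps:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
A = rng.standard_normal((n, n))
P = A + A.T                          # symmetric, indefinite in general
q = rng.standard_normal(n)
f = lambda x: 0.5 * x @ P @ x + q @ x

# PSD part of P for the convexified model
w, V = np.linalg.eigh(P)
P_plus = V @ np.diag(np.maximum(w, 0.0)) @ V.T

rho = 0.2                            # trust region radius
x = np.zeros(n)
for k in range(50):                  # SCP iterations
    lo = np.maximum(x - rho, -1.0)   # box cap trust region, lower bounds
    hi = np.minimum(x + rho, 1.0)    # ... and upper bounds
    L = max(np.max(np.maximum(w, 0.0)), 1.0)  # Lipschitz const. of model grad
    z = x.copy()
    for _ in range(200):             # projected gradient on the convex model
        g = P @ x + q + P_plus @ (z - x)      # gradient of convexified model
        z = np.clip(z - g / L, lo, hi)
    x = z
```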
- example with x ∈ R^20
- SCP with ρ = 0.2, started from 10 different points

[Figure: f(x^(k)) versus iteration k for the 10 runs.]

- runs typically converge to points between −60 and −50
- dashed line shows the lower bound on the optimal value, −66.5
Lower bound via Lagrange dual

write the constraints as x_i² ≤ 1 and form the Lagrangian

    L(x, λ) = (1/2)x^T P x + q^T x + Σ_{i=1}^n λ_i (x_i² − 1)
            = (1/2)x^T (P + diag(2λ)) x + q^T x − 1^T λ

so g(λ) = −(1/2)q^T (P + diag(2λ))^{−1} q − 1^T λ; need P + diag(2λ) ⪰ 0

solve the dual problem to get the best lower bound:

    maximize    −(1/2)q^T (P + diag(2λ))^{−1} q − 1^T λ
    subject to  λ ⪰ 0
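By weak duality, any dual-feasible λ already gives a lower bound, even without solving the dual problem. A numpy sketch that evaluates the dual function at one hand-picked λ (not the dual optimum) and compares it with the primal objective at a feasible point; the data are random:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))
P = A + A.T                      # symmetric, generally not PSD
q = rng.standard_normal(n)

# choose lambda so that P + diag(2*lambda) is positive definite
lam = np.full(n, max(0.0, -np.linalg.eigvalsh(P).min()) / 2 + 0.5)
Pd = P + np.diag(2 * lam)

# dual function value: minimize the Lagrangian over x analytically
g_lam = -0.5 * q @ np.linalg.solve(Pd, q) - lam.sum()

# primal objective at some feasible point (clipped into the box)
x = np.clip(-np.linalg.solve(Pd, q), -1.0, 1.0)
f_x = 0.5 * x @ P @ x + q @ x
# weak duality guarantees g_lam <= f_x
```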
Some (related) issues

- the approximate convex problem can be infeasible
- how do we evaluate progress when x^(k) isn't feasible? need to take into
  account:
  - objective f_0(x^(k))
  - inequality constraint violations f_i(x^(k))_+
  - equality constraint violations |h_i(x^(k))|
- controlling the trust region size:
  - ρ too large: approximations are poor, leading to a bad choice of x^(k+1)
  - ρ too small: approximations are good, but progress is slow
Exact penalty formulation

instead of the original problem, we solve the unconstrained problem

    minimize  φ(x) = f_0(x) + λ(Σ_{i=1}^m f_i(x)_+ + Σ_{i=1}^p |h_i(x)|)

where λ > 0

- for λ large enough, the minimizer of φ is a solution of the original problem
- for SCP, use the convex approximation

    φ̂(x) = f̂_0(x) + λ(Σ_{i=1}^m f̂_i(x)_+ + Σ_{i=1}^p |ĥ_i(x)|)

- the approximate problem is always feasible
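A tiny numerical illustration of exactness, on a hypothetical one-dimensional problem (minimize x² subject to x − 1 = 0; the optimal dual multiplier has magnitude 2, so any λ > 2 makes the penalty exact):

```python
import numpy as np

lam = 10.0                       # penalty weight, larger than |nu*| = 2
f0 = lambda x: x**2
h  = lambda x: x - 1.0           # equality constraint h(x) = 0
phi = lambda x: f0(x) + lam * np.abs(h(x))

# crude grid minimization of phi; the minimizer matches the constrained
# solution x* = 1 because lam is large enough
xs = np.linspace(-2.0, 2.0, 4001)
x_star = xs[np.argmin(phi(xs))]
```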
Trust region update

- judge algorithm progress by the decrease in φ, using the solution x̃ of the
  approximate problem
- decrease with approximate objective: δ̂ = φ(x^(k)) − φ̂(x̃) (called the
  predicted decrease)
- decrease with exact objective: δ = φ(x^(k)) − φ(x̃)
- if δ ≥ αδ̂: ρ^(k+1) = β_succ ρ^(k), x^(k+1) = x̃
  (α ∈ (0,1), β_succ ≥ 1; typical values α = 0.1, β_succ = 1.1)
- if δ < αδ̂: ρ^(k+1) = β_fail ρ^(k), x^(k+1) = x^(k)
  (β_fail ∈ (0,1); typical value β_fail = 0.5)
- interpretation: if the actual decrease is more (less) than the fraction α of
  the predicted decrease, then increase (decrease) the trust region size
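The update rule as a small pure-Python function (the function and argument names are ours, not from the slides):

```python
def trust_region_update(delta, delta_hat, rho,
                        alpha=0.1, beta_succ=1.1, beta_fail=0.5):
    """Return (new rho, whether the candidate point x_tilde is accepted).

    Accept the step and grow the trust region when the actual decrease
    delta is at least the fraction alpha of the predicted decrease
    delta_hat; otherwise reject the step and shrink the trust region.
    """
    if delta >= alpha * delta_hat:
        return beta_succ * rho, True    # x^(k+1) = x_tilde
    return beta_fail * rho, False       # x^(k+1) = x^(k)
```

For example, with predicted decrease 2 and actual decrease 1, the step is accepted and ρ grows by 10%.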
Nonlinear optimal control

[Figure: 2-link system with link lengths l_1, l_2, masses m_1, m_2, joint
angles θ_1, θ_2, and applied torques τ_1, τ_2.]

2-link system, controlled by torques τ_1 and τ_2 (no gravity)
dynamics are given by M(θ)θ̈ + W(θ, θ̇)θ̇ = τ, with

    M(θ) = [ (m_1 + m_2)l_1²                   m_2 l_1 l_2 (s_1 s_2 + c_1 c_2) ]
           [ m_2 l_1 l_2 (s_1 s_2 + c_1 c_2)   m_2 l_2²                        ]

    W(θ, θ̇) = [ 0                                      m_2 l_1 l_2 (s_1 c_2 − c_1 s_2) θ̇_2 ]
              [ −m_2 l_1 l_2 (s_1 c_2 − c_1 s_2) θ̇_1   0                                   ]

where s_i = sin θ_i, c_i = cos θ_i

nonlinear optimal control problem:

    minimize    J = ∫_0^T ‖τ(t)‖_2² dt
    subject to  θ(0) = θ_init,  θ̇(0) = 0,  θ(T) = θ_final,  θ̇(T) = 0
                ‖τ(t)‖_∞ ≤ τ_max,  0 ≤ t ≤ T
Discretization

- discretize with time interval h = T/N
- J ≈ h Σ_{i=1}^N ‖τ_i‖_2², with τ_i = τ(ih)
- approximate the derivatives as

    θ̇(ih) ≈ (θ_{i+1} − θ_{i−1}) / (2h),
    θ̈(ih) ≈ (θ_{i+1} − 2θ_i + θ_{i−1}) / h²

- approximate the dynamics as a set of nonlinear equality constraints:

    M(θ_i) (θ_{i+1} − 2θ_i + θ_{i−1}) / h²
        + W(θ_i, (θ_{i+1} − θ_{i−1}) / (2h)) (θ_{i+1} − θ_{i−1}) / (2h) = τ_i

- θ_0 = θ_1 = θ_init;  θ_N = θ_{N+1} = θ_final
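The central differences above are second-order accurate; a quick numpy check against θ(t) = sin t:

```python
import numpy as np

h = 0.01
t = 1.0
theta = np.sin

d1 = (theta(t + h) - theta(t - h)) / (2 * h)              # approx theta'(t)
d2 = (theta(t + h) - 2 * theta(t) + theta(t - h)) / h**2  # approx theta''(t)

# exact values are cos(1) and -sin(1); the errors are O(h^2)
```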
discretized nonlinear optimal control problem:

    minimize    h Σ_{i=1}^N ‖τ_i‖_2²
    subject to  θ_0 = θ_1 = θ_init,  θ_N = θ_{N+1} = θ_final
                ‖τ_i‖_∞ ≤ τ_max,  i = 1,...,N
                M(θ_i) (θ_{i+1} − 2θ_i + θ_{i−1}) / h²
                    + W(θ_i, (θ_{i+1} − θ_{i−1}) / (2h)) (θ_{i+1} − θ_{i−1}) / (2h) = τ_i

- replace the equality constraints with the quasilinearized versions

    M(θ_i^(k)) (θ_{i+1} − 2θ_i + θ_{i−1}) / h²
        + W(θ_i^(k), (θ_{i+1}^(k) − θ_{i−1}^(k)) / (2h)) (θ_{i+1} − θ_{i−1}) / (2h) = τ_i

- trust region: only on θ_i
- initialize with θ_i = θ_init + ((i − 1)/(N − 1))(θ_final − θ_init),
  i = 1,...,N
Numerical example

- m_1 = 1, m_2 = 5, l_1 = 1, l_2 = 1
- N = 40, T = 10
- θ_init = (0, −2.9), θ_final = (3, 2.9)
- τ_max = 1.1
- α = 0.1, β_succ = 1.1, β_fail = 0.5, ρ^(1) = 90
- λ = 2
SCP progress

[Figure: penalty objective φ(x^(k)) versus iteration k.]
Convergence of J and torque residuals

[Figure: left, objective J^(k) versus k; right, sum of torque residuals
versus k (log scale).]
Predicted and actual decreases in φ

[Figure: left, predicted decrease δ̂ (dotted) and actual decrease δ (solid)
versus k; right, trust region size ρ^(k) versus k (log scale).]
Trajectory plan

[Figure: planned torques τ_1, τ_2 and joint angles θ_1, θ_2 versus t.]
Difference of convex programming

express the problem as

    minimize    f_0(x) − g_0(x)
    subject to  f_i(x) − g_i(x) ≤ 0,  i = 1,...,m

where f_i and g_i are convex

- f_i − g_i are called difference of convex functions
- the problem is sometimes called a difference of convex program
Convex-concave procedure

- obvious convexification at x^(k): replace f(x) − g(x) with

    f̂(x) = f(x) − g(x^(k)) − ∇g(x^(k))^T (x − x^(k))

- since f̂(x) ≥ f(x) − g(x) for all x, no trust region is needed:
  - the true objective at x̃ is better than the convexified objective
  - the true feasible set contains the feasible set of the convexified problem
- this form of SCP is sometimes called the convex-concave procedure
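The key inequality f̂ ≥ f − g holds because linearizing the convex g gives a global under-estimator of g, hence an over-estimator of f − g. This can be checked numerically on a toy one-dimensional DC function (f and g below are made-up examples):

```python
import numpy as np

xk = 0.7                          # expansion point
f  = lambda x: x**4               # convex part
g  = lambda x: x**2               # convex part being subtracted
dg = lambda x: 2 * x              # derivative of g

# convexified objective: linearize g at xk
f_hat = lambda x: f(x) - g(xk) - dg(xk) * (x - xk)

xs = np.linspace(-2.0, 2.0, 401)
gap = f_hat(xs) - (f(xs) - g(xs))   # >= 0 everywhere, zero at xk
```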
Example (BV §7.1)

- given samples y_1,...,y_N ∈ R^n from N(0, Σ_true)
- negative log-likelihood function is

    f(Σ) = log det Σ + Tr(Σ^{−1} Y),    Y = (1/N) Σ_{i=1}^N y_i y_i^T

  (dropping a constant and a positive scale factor)
- ML estimate of Σ, with prior knowledge Σ_ij ≥ 0:

    minimize    f(Σ) = log det Σ + Tr(Σ^{−1} Y)
    subject to  Σ_ij ≥ 0,  i, j = 1,...,n

  with variable Σ (the constraint Σ ≻ 0 is implicit)
- the first term in f is concave; the second term is convex
- linearize the first term in the objective to get

    f̂(Σ) = log det Σ^(k) + Tr((Σ^(k))^{−1} (Σ − Σ^(k))) + Tr(Σ^{−1} Y)
Numerical example

convergence of a problem instance with n = 10, N = 15

[Figure: f(Σ^(k)) versus iteration k.]
Alternating convex optimization

- given a nonconvex problem with variable (x_1,...,x_n) ∈ R^n
- I_1,...,I_k ⊆ {1,...,n} are index subsets with ∪_j I_j = {1,...,n}
- suppose the problem is convex in the subset of variables x_i, i ∈ I_j, when
  x_i, i ∉ I_j, are fixed
- alternating convex optimization method: cycle through j, in each step
  optimizing over the variables x_i, i ∈ I_j
- special case: bi-convex problem
  - x = (u, v); the problem is convex in u (v) with v (u) fixed
  - alternate optimizing over u and v
Nonnegative matrix factorization

NMF problem:

    minimize    ‖A − XY‖_F
    subject to  X_ij ≥ 0,  Y_ij ≥ 0

- variables X ∈ R^{m×k}, Y ∈ R^{k×n}; data A ∈ R^{m×n}
- a difficult problem, except for a few special cases (e.g., k = 1)
- alternating convex optimization: solve QPs to optimize over X, then Y,
  then X, ...
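A numpy sketch of the alternating scheme: with Y fixed the problem in X is a nonnegative least-squares problem (and vice versa). Here each subproblem is only approximately solved, by a few projected-gradient steps, as a stand-in for the QP solves on the slide; the data are a random exactly rank-k nonnegative matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 20, 15, 3
A = rng.random((m, k)) @ rng.random((k, n))   # nonnegative, rank k
X = rng.random((m, k))
Y = rng.random((k, n))

def nnls_pg_step(B, C, D, iters=50):
    """Approximately minimize ||D - B C||_F over C >= 0 by projected
    gradient descent with step 1/L, L the gradient Lipschitz constant."""
    L = np.linalg.norm(B.T @ B, 2) + 1e-12
    for _ in range(iters):
        C = np.maximum(C - (B.T @ (B @ C - D)) / L, 0.0)
    return C

res0 = np.linalg.norm(A - X @ Y)
for _ in range(30):
    Y = nnls_pg_step(X, Y, A)                 # optimize over Y, X fixed
    X = nnls_pg_step(Y.T, X.T, A.T).T         # optimize over X, Y fixed
res = np.linalg.norm(A - X @ Y)               # nonincreasing by construction
```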
Example

convergence for an example with m = n = 50, k = 5 (five starting points)

[Figure: ‖A − XY‖_F versus iteration k for the five runs.]