ALADIN: An Algorithm for Distributed Non-Convex Optimization and Control


Boris Houska, Yuning Jiang, Janick Frasch, Rien Quirynen, Dimitris Kouzoupis, Moritz Diehl
ShanghaiTech University, University of Magdeburg, University of Freiburg

Motivation: sensor network localization

Decoupled case: each sensor takes a measurement η_i of its position χ_i and solves, for all i ∈ {1, ..., 7},

    min_{χ_i}  ‖χ_i − η_i‖_2^2 .

Motivation: sensor network localization

Coupled case: the sensors additionally measure the distance to their neighbors and solve the coupled problem

    min_χ  Σ_{i=1}^{7} { ‖χ_i − η_i‖_2^2 + ( ‖χ_i − χ_{i+1}‖_2 − η̄_i )^2 }    with χ_8 = χ_1,

where η̄_i denotes the measured distance between sensors i and i+1.

Motivation: sensor network localization

Equivalent formulation: set x_1 = (χ_1, ζ_1) with ζ_1 = χ_2, set x_2 = (χ_2, ζ_2) with ζ_2 = χ_3, and so on.

Motivation: sensor network localization

Equivalent formulation (cont.):
- new variables x_i = (χ_i, ζ_i)
- separable non-convex objectives

      f_i(x_i) = 1/2 ‖χ_i − η_i‖_2^2 + 1/2 ‖ζ_i − η_{i+1}‖_2^2 + 1/2 ( ‖χ_i − ζ_i‖_2 − η̄_i )^2

- affine coupling, ζ_i = χ_{i+1}, which can be written as Σ_{i=1}^{7} A_i x_i = 0.

Motivation: sensor network localization

Optimization problem:

    min_x  Σ_{i=1}^{7} f_i(x_i)    s.t.    Σ_{i=1}^{7} A_i x_i = 0.
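To make the separable structure concrete, the following is a minimal NumPy sketch (not the authors' code) of how the local objectives f_i and coupling matrices A_i could be set up; the 2-dimensional positions and the measurement arrays eta and eta_bar are hypothetical placeholders.

import numpy as np

N = 7                                   # number of sensors in the ring
eta = np.random.randn(N, 2)             # hypothetical position measurements eta_i
eta_bar = np.abs(np.random.randn(N))    # hypothetical distance measurements eta_bar_i

def f(i, x_i):
    # local objective f_i(x_i) with x_i = (chi_i, zeta_i) in R^4
    chi, zeta = x_i[:2], x_i[2:]
    return (0.5 * np.sum((chi - eta[i])**2)
            + 0.5 * np.sum((zeta - eta[(i + 1) % N])**2)
            + 0.5 * (np.linalg.norm(chi - zeta) - eta_bar[i])**2)

# Coupling zeta_i = chi_{i+1} (with chi_8 = chi_1) as sum_i A_i x_i = 0:
# constraint block i encodes zeta_i - chi_{i+1} = 0.
A = np.zeros((N, 2 * N, 4))
for i in range(N):
    A[i, 2*i:2*i+2, 2:4] = np.eye(2)                          # +zeta_i in block row i
    A[i, 2*((i-1) % N):2*((i-1) % N)+2, 0:2] = -np.eye(2)     # -chi_i in block row i-1

# sanity check: a consistent configuration x_i = (chi_i, chi_{i+1}) satisfies the coupling
chi = np.random.randn(N, 2)
x = [np.concatenate([chi[i], chi[(i + 1) % N]]) for i in range(N)]
print("f_1(x_1) =", f(0, x[0]))
print(np.allclose(sum(A[i] @ x[i] for i in range(N)), 0))     # True

Each f_i only touches the local variables x_i, while all coupling between neighbors is pushed into the affine constraint; this is exactly the structure the algorithms below exploit.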

Aim of distributed optimization algorithms

Find local minimizers of

    min_x  Σ_{i=1}^{N} f_i(x_i)    s.t.    Σ_{i=1}^{N} A_i x_i = b.

- Functions f_i : R^n → R potentially non-convex.
- Matrices A_i ∈ R^{m×n} and the vector b ∈ R^m are given.
- Problem: N is large.

Overview

Theory: distributed optimization algorithms, ALADIN.
Applications: sensor network localization, MPC with long horizons.

Distributed optimization problem

Find local minimizers of

    min_x  Σ_{i=1}^{N} f_i(x_i)    s.t.    Σ_{i=1}^{N} A_i x_i = b.

- Functions f_i : R^n → R potentially non-convex.
- Matrices A_i ∈ R^{m×n} and the vector b ∈ R^m are given.
- Problem: N is large.

Dual decomposition

Main idea: solve the dual problem max_λ d(λ) with

    d(λ) = Σ_{i=1}^{N} min_{x_i} { f_i(x_i) + λ^T A_i x_i } − λ^T b.

- Evaluation of d can be parallelized.
- Applicable if the f_i are (strictly) convex.
- For non-convex f_i a duality gap is possible.

H. Everett. Generalized Lagrange multiplier method for solving problems of optimum allocation of resources, 1963.
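As an illustration of the idea above, here is a minimal sketch (not from the talk) that evaluates d(λ) by solving the inner minimizations independently and then performs gradient ascent on λ; the quadratic objectives, problem data, and step size are hypothetical choices made only so the example runs.

import numpy as np

rng = np.random.default_rng(0)
N, n, m = 5, 3, 2
A = rng.standard_normal((N, m, n))      # hypothetical coupling matrices A_i
c = rng.standard_normal((N, n))         # centers of hypothetical strictly convex f_i
b = rng.standard_normal(m)

# f_i(x_i) = 0.5*||x_i - c_i||^2, so argmin_x { f_i(x) + lam^T A_i x } = c_i - A_i^T lam
def solve_local(i, lam):
    return c[i] - A[i].T @ lam

lam = np.zeros(m)
alpha = 0.02                             # conservative dual step size
for k in range(300):
    x = np.array([solve_local(i, lam) for i in range(N)])    # parallelizable over i
    grad = sum(A[i] @ x[i] for i in range(N)) - b            # gradient of the concave dual d
    lam = lam + alpha * grad                                  # dual gradient ascent
print("coupling residual:", np.linalg.norm(sum(A[i] @ x[i] for i in range(N)) - b))

For these strictly convex f_i the coupling residual shrinks towards zero; for non-convex f_i the same scheme can stall at a duality gap, which is the limitation ADMM and ALADIN try to address.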

ADMM (consensus variant)
Alternating Direction Method of Multipliers

Input: initial guesses x_i ∈ R^n and λ_i ∈ R^m; ρ > 0, ε > 0.

Repeat:
1. Solve the decoupled NLPs

       min_{y_i}  f_i(y_i) + λ_i^T A_i y_i + ρ/2 ‖A_i (y_i − x_i)‖_2^2 .

2. Implement the dual gradient steps  λ_i^+ = λ_i + ρ A_i (y_i − x_i).
3. Solve the coupled QP

       min_{x^+}  Σ_{i=1}^{N} { ρ/2 ‖A_i (y_i − x_i^+)‖_2^2 − (λ_i^+)^T A_i x_i^+ }    s.t.    Σ_{i=1}^{N} A_i x_i^+ = b.

4. Update the iterates: x ← x^+ and λ ← λ^+.

D. Gabay, B. Mercier. A dual algorithm for the solution of nonlinear variational problems via finite element approximations, 1976.
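The following is a minimal, self-contained sketch of the four steps above (not the authors' implementation): the decoupled NLPs are handed to a generic solver, and the coupled QP is solved through its (possibly singular) KKT system with a least-squares solve. The convex test objectives, problem data, and ρ are hypothetical.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N, n, m = 4, 3, 2
A = rng.standard_normal((N, m, n))
c = rng.standard_normal((N, n))
b = rng.standard_normal(m)
rho = 1.0

def f(i, z):                      # hypothetical convex test objectives
    return 0.5 * np.sum((z - c[i])**2) + 0.1 * np.sum(z**4)

x = np.zeros((N, n))
lam = np.zeros((N, m))            # one multiplier estimate per block (consensus variant)

for it in range(30):
    # 1. decoupled NLPs (parallelizable over i)
    y = np.array([minimize(lambda z, i=i: f(i, z) + lam[i] @ (A[i] @ z)
                           + 0.5 * rho * np.sum((A[i] @ (z - x[i]))**2), x[i]).x
                  for i in range(N)])
    # 2. dual gradient steps
    lam = lam + rho * np.einsum('imn,in->im', A, y - x)
    # 3. coupled QP via its KKT system (block Hessians rho*A_i^T A_i, one coupling constraint)
    K = np.zeros((N * n + m, N * n + m)); r = np.zeros(N * n + m)
    for i in range(N):
        K[i*n:(i+1)*n, i*n:(i+1)*n] = rho * A[i].T @ A[i]
        K[i*n:(i+1)*n, N*n:] = A[i].T
        K[N*n:, i*n:(i+1)*n] = A[i]
        r[i*n:(i+1)*n] = rho * A[i].T @ A[i] @ y[i] + A[i].T @ lam[i]
    r[N*n:] = b
    sol = np.linalg.lstsq(K, r, rcond=None)[0]
    # 4. update the iterates (lam was already updated in step 2)
    x = sol[:N*n].reshape(N, n)

print("residual of decoupled solutions:", np.linalg.norm(np.einsum('imn,in->im', A, y).sum(0) - b))

For convex f_i the decoupled solutions y_i should gradually satisfy the coupling constraint; the next slide shows why this can fail when the f_i are non-convex.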

Limitations of ADMM

1) The convergence rate of ADMM is very scaling dependent.
2) ADMM may diverge if the f_i are non-convex.

Example:

    min_x  x_1 x_2    s.t.    x_1 − x_2 = 0.

- unique and regular minimizer at x_1* = x_2* = λ* = 0
- for ρ = 3/4 all sub-problems are strictly convex
- nevertheless ADMM is divergent: λ^+ = −2 λ

This talk addresses Problem 2) and mitigates Problem 1).
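The divergence on this example is easy to reproduce numerically. Below is a small sketch (a hypothetical reconstruction, not the talk's code) of the consensus-variant iteration on this single-block problem with ρ = 3/4, where the decoupled NLP is solved in closed form from its stationarity conditions.

import numpy as np

rho = 0.75                      # sub-problem Hessian [[rho, 1-rho], [1-rho, rho]] is positive definite
A = np.array([[1.0, -1.0]])     # coupling x1 - x2 = 0 (b = 0)
x = np.array([1.0, -1.0])       # arbitrary initial primal guess
lam = 1.0                       # arbitrary initial multiplier

for k in range(6):
    # 1. decoupled NLP: min_y y1*y2 + lam*(y1 - y2) + rho/2*(y1 - y2 - (x1 - x2))^2
    #    stationarity gives y1 + y2 = 0 and (2*rho - 1)*y1 = rho*(x1 - x2) - lam
    c = x[0] - x[1]
    y1 = (rho * c - lam) / (2 * rho - 1)
    y = np.array([y1, -y1])
    # 2. dual gradient step
    lam = lam + rho * (A @ (y - x)).item()
    # 3. coupled QP enforces x1 = x2; any feasible point works here, take x = 0
    x = np.zeros(2)
    print(f"iteration {k}: lambda = {lam:+.1f}")

After the first iteration the multiplier obeys λ ← −2 λ, so it flips sign and doubles in magnitude at every step, even though every sub-problem solved along the way is strictly convex.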

Overview

Theory: distributed optimization algorithms, ALADIN.
Applications: sensor network localization, MPC with long horizons.

ALADIN (full-step variant)
Augmented Lagrangian based Alternating Direction Inexact Newton method

Input: initial guesses x_i ∈ R^n and λ ∈ R^m; ρ > 0, ε > 0.

Repeat:
1. Solve the decoupled NLPs

       min_{y_i}  f_i(y_i) + λ^T A_i y_i + ρ/2 ‖y_i − x_i‖_{Σ_i}^2 .

2. Compute g_i = ∇f_i(y_i), choose H_i ≈ ∇²f_i(y_i), and solve the coupled QP

       min_{Δy}  Σ_{i=1}^{N} { 1/2 Δy_i^T H_i Δy_i + g_i^T Δy_i }    s.t.    Σ_{i=1}^{N} A_i (y_i + Δy_i) = b    (multiplier λ^+).

3. Set x ← y + Δy and λ ← λ^+.
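Here is a minimal, self-contained sketch of the full-step variant (not the authors' implementation): Σ_i is taken as the identity, the decoupled NLPs go to a generic solver, and the coupled QP is solved through its Schur complement in λ^+. The convex test objectives, their derivatives, and all data are hypothetical.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
N, n, m = 4, 3, 2
A = rng.standard_normal((N, m, n))
c = rng.standard_normal((N, n))
b = rng.standard_normal(m)
rho = 1.0

f      = lambda i, z: 0.5 * np.sum((z - c[i])**2) + 0.1 * np.sum(z**4)
grad_f = lambda i, z: (z - c[i]) + 0.4 * z**3
hess_f = lambda i, z: np.eye(n) + np.diag(1.2 * z**2)

x = np.zeros((N, n))
lam = np.zeros(m)                 # single coupling multiplier

for k in range(15):
    # 1. decoupled NLPs: min_y f_i(y) + lam^T A_i y + rho/2*||y - x_i||^2   (Sigma_i = I)
    y = np.array([minimize(lambda z, i=i: f(i, z) + lam @ (A[i] @ z)
                           + 0.5 * rho * np.sum((z - x[i])**2), x[i]).x
                  for i in range(N)])
    # 2. coupled QP: min_dy sum_i 1/2 dy_i^T H_i dy_i + g_i^T dy_i  s.t.  sum_i A_i (y_i + dy_i) = b
    g = [grad_f(i, y[i]) for i in range(N)]
    H = [hess_f(i, y[i]) for i in range(N)]
    Hinv_g  = [np.linalg.solve(H[i], g[i]) for i in range(N)]
    Hinv_At = [np.linalg.solve(H[i], A[i].T) for i in range(N)]
    resid = b - sum(A[i] @ y[i] for i in range(N))
    S = sum(A[i] @ Hinv_At[i] for i in range(N))              # dual Schur complement
    lam_plus = np.linalg.solve(S, -(resid + sum(A[i] @ Hinv_g[i] for i in range(N))))
    dy = np.array([-(Hinv_g[i] + Hinv_At[i] @ lam_plus) for i in range(N)])
    # 3. full step
    x, lam = y + dy, lam_plus
    print(f"iter {k}: NLP-step feasibility gap = {np.linalg.norm(resid):.2e}")

With exact Hessians H_i = ∇²f_i(y_i), the feasibility gap of the decoupled solutions should contract quickly for this convex test problem, which is consistent with the local convergence result stated below.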

Special cases

- For ρ → ∞: ALADIN → SQP.
- For 0 < ρ < ∞: if H_i = ρ A_i^T A_i and Σ_i = A_i^T A_i, then ALADIN → ADMM.
- For ρ = 0: ALADIN → dual decomposition (+ inexact Newton).

Local convergence

Assumptions:
- the f_i are twice continuously differentiable,
- the minimizer (x*, λ*) is a regular KKT point (LICQ + SOSC satisfied),
- ρ satisfies ∇²f_i(y_i) + ρ Σ_i ≻ 0.

Theorem: The full-step variant of ALADIN converges locally
1. with a quadratic convergence rate, if H_i = ∇²f_i(y_i) + O(‖y_i − x*‖),
2. with a linear convergence rate, if ‖H_i − ∇²f_i(y_i)‖ is sufficiently small.

Globalization

Definition (L1-penalty function): we say that x^+ is a descent step if Φ(x^+) < Φ(x) for

    Φ(x) = Σ_{i=1}^{N} f_i(x_i) + λ̄ ‖ Σ_{i=1}^{N} A_i x_i − b ‖_1 ,    λ̄ sufficiently large.

Globalization

Rough sketch:
- As long as ∇²f_i(x_i) + ρ Σ_i ≻ 0, the proximal objectives f̃_i(y_i) = f_i(y_i) + ρ/2 ‖y_i − x_i‖_{Σ_i}^2 are strictly convex in a neighborhood of the current primal iterates x_i.
- If we do not update x, then y can be enforced to converge to the solution z* of the convex auxiliary problem

      min_z  Σ_{i=1}^{N} f̃_i(z_i)    s.t.    Σ_{i=1}^{N} A_i z_i = b.

- Strategy: if x^+ is not a descent step, we skip the primal update until y is a descent step and then set x^+ = y (similar to proximal methods).

This strategy leads to a globalization routine for ALADIN if H_i ≻ 0.

Inequalities

Distributed NLP with inequalities:

    min_x  Σ_{i=1}^{N} f_i(x_i)    s.t.    Σ_{i=1}^{N} A_i x_i = b,    h_i(x_i) ≤ 0.

- Functions f_i : R^n → R and h_i : R^n → R^{n_h} potentially non-convex.
- Matrices A_i ∈ R^{m×n} and the vector b ∈ R^m are given.
- Problem: N is large.

ALADIN (with inequalities)
Augmented Lagrangian based Alternating Direction Inexact Newton method

Input: initial guesses x_i ∈ R^n and λ ∈ R^m; ρ > 0, ε > 0.

Repeat:
1. Solve the decoupled NLPs

       min_{y_i}  f_i(y_i) + λ^T A_i y_i + ρ/2 ‖y_i − x_i‖_{Σ_i}^2    s.t.    h_i(y_i) ≤ 0    (multiplier κ_i).

2. Set g_i = ∇f_i(y_i) + ∇h_i(y_i) κ_i and H_i ≈ ∇²( f_i(y_i) + κ_i^T h_i(y_i) ), and solve the coupled QP

       min_{Δy}  Σ_{i=1}^{N} { 1/2 Δy_i^T H_i Δy_i + g_i^T Δy_i }    s.t.    Σ_{i=1}^{N} A_i (y_i + Δy_i) = b    (multiplier λ^+).

3. Set x ← y + Δy and λ ← λ^+.

ALADIN (with inequalities)
Augmented Lagrangian based Alternating Direction Inexact Newton method

Remarks:
- If an approximation C̄_i ≈ C_i = ∇h_i(y_i) of the Jacobian of the (active) inequality constraints is available, solve the QP

      min_{Δy, s}  Σ_{i=1}^{N} { 1/2 Δy_i^T H_i Δy_i + g_i^T Δy_i } + λ^T s + μ/2 ‖s‖_2^2
      s.t.  Σ_{i=1}^{N} A_i (y_i + Δy_i) = b + s    (multiplier λ_QP),
            C̄_i Δy_i = 0,  i ∈ {1, ..., N},

  with g_i = ∇f_i(y_i) + (C_i − C̄_i)^T κ_i and μ > 0 instead.
- If H_i and C̄_i are constant, matrix decompositions can be pre-computed.

Overview

Theory: distributed optimization algorithms, ALADIN.
Applications: sensor network localization, MPC with long horizons.

Sensor network localization

Case study:
- 25,000 sensors measure positions and distances in a graph
- additional inequality constraints, ‖χ_i − η_i‖_2 ≤ σ, remove outliers

ALADIN versus SQP

- 10^5 primal and 7.5 · 10^4 dual optimization variables
- implementation in JULIA

Overview

Theory: distributed optimization algorithms, ALADIN.
Applications: sensor network localization, MPC with long horizons.

Nonlinear MPC

Repeat:
- Wait for the state measurement x̂.
- Solve

      min_{x,u}  Σ_{i=0}^{m−1} l(x_i, u_i) + M(x_m)
      s.t.  x_{i+1} = f(x_i, u_i),  x_0 = x̂,  (x_i, u_i) ∈ X × U.

- Send u_0 to the process.

ACADO Toolkit

MPC benchmark

Nonlinear model, 4 states, 2 controls. Run-time of ACADO code generation (not distributed):

    Integration & sensitivities:     117 µs   (65 %)
    QP (condensing + qpOASES):        59 µs   (33 %)
    Complete real-time iteration:    181 µs  (100 %)

MPC with ALADIN

ALADIN Step 1: solve decoupled NLPs. Choose a short horizon n = m/N and solve, for each sub-horizon j,

    min_{y,v}  Ψ_j(y_0^j) + Σ_{i=0}^{n−1} l(y_i^j, v_i^j) + Φ_{j+1}(y_n^j)
    s.t.  y_{i+1}^j = f(y_i^j, v_i^j),  i = 0, ..., n−1,
          y_i^j ∈ X,  v_i^j ∈ U.

Arrival and end costs depend on the ALADIN iterates:

    Ψ_0(y) = I(x̂, y),    Φ_N(y) = M(y),
    Ψ_j(y) = −λ_j^T y + ρ/2 ‖y − z_j‖_P^2,    Φ_j(y) = λ_j^T y + ρ/2 ‖y − z_j‖_P^2,

where the opposite signs on the multiplier term come from splitting the coupling constraint between neighboring sub-horizons.

MPC with ALADIN

ALADIN Step 2: solve the coupled QP
- As there are no constraints, the coupled QP is an LQR problem.
- Solve all matrix-valued Riccati equations offline.
- Solve the online QP by a backward-forward sweep.

Code export:
- Export all online operations as optimized C code.
- NLP solver: explicit MPC (rough heuristic).
- One ALADIN iteration per sampling time; skip globalization.
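To illustrate the offline/online split just described, here is a generic finite-horizon LQR sketch with hypothetical system matrices (it is not the talk's coupled ALADIN QP): the matrix-valued Riccati recursion and the feedback gains are pre-computed offline, and the online work reduces to a backward sweep for the affine terms followed by a forward rollout.

import numpy as np

rng = np.random.default_rng(3)
nx, nu, T = 4, 2, 20
A = np.eye(nx) + 0.05 * rng.standard_normal((nx, nx))   # hypothetical linearized dynamics
B = 0.1 * rng.standard_normal((nx, nu))
Q, R, Qf = np.eye(nx), 0.1 * np.eye(nu), 10 * np.eye(nx)

# offline: matrix-valued Riccati recursion P_k and feedback gains K_k
P = [None] * (T + 1); K = [None] * T; Huu_inv = [None] * T; Hux = [None] * T
P[T] = Qf
for k in range(T - 1, -1, -1):
    Hux[k] = B.T @ P[k + 1] @ A
    Huu_inv[k] = np.linalg.inv(R + B.T @ P[k + 1] @ B)
    K[k] = Huu_inv[k] @ Hux[k]
    P[k] = Q + A.T @ P[k + 1] @ A - Hux[k].T @ K[k]

# online: backward-forward sweep for the current affine cost terms q_k, r_k
def sweep(x0, q, r, qT):
    p = qT; kff = [None] * T
    for k in range(T - 1, -1, -1):            # backward sweep: affine value-function terms
        hu = r[k] + B.T @ p
        kff[k] = Huu_inv[k] @ hu
        p = q[k] + A.T @ p - Hux[k].T @ kff[k]
    xs, us, xk = [x0], [], x0
    for k in range(T):                        # forward sweep: roll out the affine feedback law
        u = -K[k] @ xk - kff[k]
        xk = A @ xk + B @ u
        xs.append(xk); us.append(u)
    return np.array(xs), np.array(us)

q = 0.1 * rng.standard_normal((T, nx)); r = 0.1 * rng.standard_normal((T, nu))
xs, us = sweep(rng.standard_normal(nx), q, r, np.zeros(nx))
print("terminal state norm:", np.linalg.norm(xs[-1]))

Only vector recursions and matrix-vector products remain in the online loop; the expensive matrix factorizations happen once, offline.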

ALADIN + code generation (results by Yuning Jiang)

Same nonlinear model as before, n = 2. Run-time of real-time ALADIN on 10 processors:

    Parallel explicit MPC:       6 µs   (54 %)
    QP sweeps:                   3 µs   (27 %)
    Communication overhead:      2 µs   (18 %)

Conclusions

ALADIN theory:
- can solve non-convex distributed optimization problems, min_x Σ_{i=1}^{N} f_i(x_i) s.t. Σ_{i=1}^{N} A_i x_i = b, to local optimality
- contains SQP, ADMM, and dual decomposition as special cases
- local convergence analysis similar to SQP; globalization possible

Conclusions

ALADIN applications:
- large-scale distributed sensor network: ALADIN outperforms SQP and ADMM
- small-scale embedded MPC: ALADIN can be used to alternate between explicit and online MPC

References

B. Houska, J. Frasch, M. Diehl. An Augmented Lagrangian Based Algorithm for Distributed Non-Convex Optimization. SIAM Journal on Optimization (SIOPT), 2016.

D. Kouzoupis, R. Quirynen, B. Houska, M. Diehl. A Block Based ALADIN Scheme for Highly Parallelizable Direct Optimal Control. ACC, 2016.