ALADIN: An Algorithm for Distributed Non-Convex Optimization and Control


Boris Houska, Yuning Jiang, Janick Frasch, Rien Quirynen, Dimitris Kouzoupis, Moritz Diehl
ShanghaiTech University, University of Magdeburg, University of Freiburg

Motivation: sensor network localization

Decoupled case: each sensor takes a measurement η_i of its position χ_i and solves, for all i ∈ {1, ..., 7},

    min_{χ_i}  ‖χ_i − η_i‖_2^2 .

Motivation: sensor network localization

Coupled case: the sensors additionally measure the distance to their neighbors and solve the coupled problem

    min_χ  Σ_{i=1}^{7} { ‖χ_i − η_i‖_2^2 + ( ‖χ_i − χ_{i+1}‖_2 − η̄_i )^2 }    with χ_8 = χ_1,

where η̄_i denotes the measured distance between sensors i and i+1.

Motivation: sensor network localization

Equivalent formulation: set x_1 = (χ_1, ζ_1) with ζ_1 = χ_2, set x_2 = (χ_2, ζ_2) with ζ_2 = χ_3, and so on.

Motivation: sensor network localization

Equivalent formulation (cont.):
- new variables x_i = (χ_i, ζ_i)
- separable non-convex objectives

      f_i(x_i) = 1/2 ‖χ_i − η_i‖_2^2 + 1/2 ‖ζ_i − η_{i+1}‖_2^2 + 1/2 ( ‖χ_i − ζ_i‖_2 − η̄_i )^2

- affine coupling, ζ_i = χ_{i+1}, which can be written as Σ_{i=1}^{7} A_i x_i = 0.

Motivation: sensor network localization

Optimization problem:

    min_x  Σ_{i=1}^{7} f_i(x_i)    s.t.    Σ_{i=1}^{7} A_i x_i = 0.
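To make the separable structure concrete, the following is a minimal NumPy sketch (not the authors' code) of how the local objectives f_i and coupling matrices A_i could be set up; the 2-dimensional positions and the measurement arrays eta and eta_bar are hypothetical placeholders.

import numpy as np

N = 7                                   # number of sensors in the ring
eta = np.random.randn(N, 2)             # hypothetical position measurements eta_i
eta_bar = np.abs(np.random.randn(N))    # hypothetical distance measurements eta_bar_i

def f(i, x_i):
    # local objective f_i(x_i) with x_i = (chi_i, zeta_i) in R^4
    chi, zeta = x_i[:2], x_i[2:]
    return (0.5 * np.sum((chi - eta[i])**2)
            + 0.5 * np.sum((zeta - eta[(i + 1) % N])**2)
            + 0.5 * (np.linalg.norm(chi - zeta) - eta_bar[i])**2)

# Coupling zeta_i = chi_{i+1} (with chi_8 = chi_1) as sum_i A_i x_i = 0:
# constraint block i encodes zeta_i - chi_{i+1} = 0.
A = np.zeros((N, 2 * N, 4))
for i in range(N):
    A[i, 2*i:2*i+2, 2:4] = np.eye(2)                          # +zeta_i in block row i
    A[i, 2*((i-1) % N):2*((i-1) % N)+2, 0:2] = -np.eye(2)     # -chi_i in block row i-1

# sanity check: a consistent configuration x_i = (chi_i, chi_{i+1}) satisfies the coupling
chi = np.random.randn(N, 2)
x = [np.concatenate([chi[i], chi[(i + 1) % N]]) for i in range(N)]
print("f_1(x_1) =", f(0, x[0]))
print(np.allclose(sum(A[i] @ x[i] for i in range(N)), 0))     # True

Each f_i only touches the local variables x_i, while all coupling between neighbors is pushed into the affine constraint; this is exactly the structure the algorithms below exploit.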

Aim of distributed optimization algorithms

Find local minimizers of

    min_x  Σ_{i=1}^{N} f_i(x_i)    s.t.    Σ_{i=1}^{N} A_i x_i = b.

- Functions f_i : R^n → R potentially non-convex.
- Matrices A_i ∈ R^{m×n} and the vector b ∈ R^m are given.
- Problem: N is large.

Overview

Theory: distributed optimization algorithms, ALADIN.
Applications: sensor network localization, MPC with long horizons.

Distributed optimization problem

Find local minimizers of

    min_x  Σ_{i=1}^{N} f_i(x_i)    s.t.    Σ_{i=1}^{N} A_i x_i = b.

- Functions f_i : R^n → R potentially non-convex.
- Matrices A_i ∈ R^{m×n} and the vector b ∈ R^m are given.
- Problem: N is large.

Dual decomposition

Main idea: solve the dual problem max_λ d(λ) with

    d(λ) = Σ_{i=1}^{N} min_{x_i} { f_i(x_i) + λ^T A_i x_i } − λ^T b.

- Evaluation of d can be parallelized.
- Applicable if the f_i are (strictly) convex.
- For non-convex f_i a duality gap is possible.

H. Everett. Generalized Lagrange multiplier method for solving problems of optimum allocation of resources, 1963.
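As an illustration of the idea above, here is a minimal sketch (not from the talk) that evaluates d(λ) by solving the inner minimizations independently and then performs gradient ascent on λ; the quadratic objectives, problem data, and step size are hypothetical choices made only so the example runs.

import numpy as np

rng = np.random.default_rng(0)
N, n, m = 5, 3, 2
A = rng.standard_normal((N, m, n))      # hypothetical coupling matrices A_i
c = rng.standard_normal((N, n))         # centers of hypothetical strictly convex f_i
b = rng.standard_normal(m)

# f_i(x_i) = 0.5*||x_i - c_i||^2, so argmin_x { f_i(x) + lam^T A_i x } = c_i - A_i^T lam
def solve_local(i, lam):
    return c[i] - A[i].T @ lam

lam = np.zeros(m)
alpha = 0.02                             # conservative dual step size
for k in range(300):
    x = np.array([solve_local(i, lam) for i in range(N)])    # parallelizable over i
    grad = sum(A[i] @ x[i] for i in range(N)) - b            # gradient of the concave dual d
    lam = lam + alpha * grad                                  # dual gradient ascent
print("coupling residual:", np.linalg.norm(sum(A[i] @ x[i] for i in range(N)) - b))

For these strictly convex f_i the coupling residual shrinks towards zero; for non-convex f_i the same scheme can stall at a duality gap, which is the limitation ADMM and ALADIN try to address.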

ADMM (consensus variant)
Alternating Direction Method of Multipliers

Input: initial guesses x_i ∈ R^n and λ_i ∈ R^m; ρ > 0, ε > 0.

Repeat:
1. Solve the decoupled NLPs

       min_{y_i}  f_i(y_i) + λ_i^T A_i y_i + ρ/2 ‖A_i (y_i − x_i)‖_2^2 .

2. Implement the dual gradient steps  λ_i^+ = λ_i + ρ A_i (y_i − x_i).
3. Solve the coupled QP

       min_{x^+}  Σ_{i=1}^{N} { ρ/2 ‖A_i (y_i − x_i^+)‖_2^2 − (λ_i^+)^T A_i x_i^+ }    s.t.    Σ_{i=1}^{N} A_i x_i^+ = b.

4. Update the iterates: x ← x^+ and λ ← λ^+.

D. Gabay, B. Mercier. A dual algorithm for the solution of nonlinear variational problems via finite element approximations, 1976.
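The following is a minimal, self-contained sketch of the four steps above (not the authors' implementation): the decoupled NLPs are handed to a generic solver, and the coupled QP is solved through its (possibly singular) KKT system with a least-squares solve. The convex test objectives, problem data, and ρ are hypothetical.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N, n, m = 4, 3, 2
A = rng.standard_normal((N, m, n))
c = rng.standard_normal((N, n))
b = rng.standard_normal(m)
rho = 1.0

def f(i, z):                      # hypothetical convex test objectives
    return 0.5 * np.sum((z - c[i])**2) + 0.1 * np.sum(z**4)

x = np.zeros((N, n))
lam = np.zeros((N, m))            # one multiplier estimate per block (consensus variant)

for it in range(30):
    # 1. decoupled NLPs (parallelizable over i)
    y = np.array([minimize(lambda z, i=i: f(i, z) + lam[i] @ (A[i] @ z)
                           + 0.5 * rho * np.sum((A[i] @ (z - x[i]))**2), x[i]).x
                  for i in range(N)])
    # 2. dual gradient steps
    lam = lam + rho * np.einsum('imn,in->im', A, y - x)
    # 3. coupled QP via its KKT system (block Hessians rho*A_i^T A_i, one coupling constraint)
    K = np.zeros((N * n + m, N * n + m)); r = np.zeros(N * n + m)
    for i in range(N):
        K[i*n:(i+1)*n, i*n:(i+1)*n] = rho * A[i].T @ A[i]
        K[i*n:(i+1)*n, N*n:] = A[i].T
        K[N*n:, i*n:(i+1)*n] = A[i]
        r[i*n:(i+1)*n] = rho * A[i].T @ A[i] @ y[i] + A[i].T @ lam[i]
    r[N*n:] = b
    sol = np.linalg.lstsq(K, r, rcond=None)[0]
    # 4. update the iterates (lam was already updated in step 2)
    x = sol[:N*n].reshape(N, n)

print("residual of decoupled solutions:", np.linalg.norm(np.einsum('imn,in->im', A, y).sum(0) - b))

For convex f_i the decoupled solutions y_i should gradually satisfy the coupling constraint; the next slide shows why this can fail when the f_i are non-convex.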

Limitations of ADMM

1) The convergence rate of ADMM is very scaling dependent.
2) ADMM may diverge if the f_i are non-convex.

Example:

    min_x  x_1 x_2    s.t.    x_1 − x_2 = 0.

- unique and regular minimizer at x_1* = x_2* = λ* = 0
- for ρ = 3/4 all sub-problems are strictly convex
- nevertheless ADMM is divergent: λ^+ = −2 λ

This talk addresses Problem 2) and mitigates Problem 1).
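The divergence on this example is easy to reproduce numerically. Below is a small sketch (a hypothetical reconstruction, not the talk's code) of the consensus-variant iteration on this single-block problem with ρ = 3/4, where the decoupled NLP is solved in closed form from its stationarity conditions.

import numpy as np

rho = 0.75                      # sub-problem Hessian [[rho, 1-rho], [1-rho, rho]] is positive definite
A = np.array([[1.0, -1.0]])     # coupling x1 - x2 = 0 (b = 0)
x = np.array([1.0, -1.0])       # arbitrary initial primal guess
lam = 1.0                       # arbitrary initial multiplier

for k in range(6):
    # 1. decoupled NLP: min_y y1*y2 + lam*(y1 - y2) + rho/2*(y1 - y2 - (x1 - x2))^2
    #    stationarity gives y1 + y2 = 0 and (2*rho - 1)*y1 = rho*(x1 - x2) - lam
    c = x[0] - x[1]
    y1 = (rho * c - lam) / (2 * rho - 1)
    y = np.array([y1, -y1])
    # 2. dual gradient step
    lam = lam + rho * (A @ (y - x)).item()
    # 3. coupled QP enforces x1 = x2; any feasible point works here, take x = 0
    x = np.zeros(2)
    print(f"iteration {k}: lambda = {lam:+.1f}")

After the first iteration the multiplier obeys λ ← −2 λ, so it flips sign and doubles in magnitude at every step, even though every sub-problem solved along the way is strictly convex.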

Overview

Theory: distributed optimization algorithms, ALADIN.
Applications: sensor network localization, MPC with long horizons.

ALADIN (full-step variant)
Augmented Lagrangian based Alternating Direction Inexact Newton method

Input: initial guesses x_i ∈ R^n and λ ∈ R^m; ρ > 0, ε > 0.

Repeat:
1. Solve the decoupled NLPs

       min_{y_i}  f_i(y_i) + λ^T A_i y_i + ρ/2 ‖y_i − x_i‖_{Σ_i}^2 .

2. Compute g_i = ∇f_i(y_i), choose H_i ≈ ∇²f_i(y_i), and solve the coupled QP

       min_{Δy}  Σ_{i=1}^{N} { 1/2 Δy_i^T H_i Δy_i + g_i^T Δy_i }    s.t.    Σ_{i=1}^{N} A_i (y_i + Δy_i) = b    (multiplier λ^+).

3. Set x ← y + Δy and λ ← λ^+.
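Here is a minimal, self-contained sketch of the full-step variant (not the authors' implementation): Σ_i is taken as the identity, the decoupled NLPs go to a generic solver, and the coupled QP is solved through its Schur complement in λ^+. The convex test objectives, their derivatives, and all data are hypothetical.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
N, n, m = 4, 3, 2
A = rng.standard_normal((N, m, n))
c = rng.standard_normal((N, n))
b = rng.standard_normal(m)
rho = 1.0

f      = lambda i, z: 0.5 * np.sum((z - c[i])**2) + 0.1 * np.sum(z**4)
grad_f = lambda i, z: (z - c[i]) + 0.4 * z**3
hess_f = lambda i, z: np.eye(n) + np.diag(1.2 * z**2)

x = np.zeros((N, n))
lam = np.zeros(m)                 # single coupling multiplier

for k in range(15):
    # 1. decoupled NLPs: min_y f_i(y) + lam^T A_i y + rho/2*||y - x_i||^2   (Sigma_i = I)
    y = np.array([minimize(lambda z, i=i: f(i, z) + lam @ (A[i] @ z)
                           + 0.5 * rho * np.sum((z - x[i])**2), x[i]).x
                  for i in range(N)])
    # 2. coupled QP: min_dy sum_i 1/2 dy_i^T H_i dy_i + g_i^T dy_i  s.t.  sum_i A_i (y_i + dy_i) = b
    g = [grad_f(i, y[i]) for i in range(N)]
    H = [hess_f(i, y[i]) for i in range(N)]
    Hinv_g  = [np.linalg.solve(H[i], g[i]) for i in range(N)]
    Hinv_At = [np.linalg.solve(H[i], A[i].T) for i in range(N)]
    resid = b - sum(A[i] @ y[i] for i in range(N))
    S = sum(A[i] @ Hinv_At[i] for i in range(N))              # dual Schur complement
    lam_plus = np.linalg.solve(S, -(resid + sum(A[i] @ Hinv_g[i] for i in range(N))))
    dy = np.array([-(Hinv_g[i] + Hinv_At[i] @ lam_plus) for i in range(N)])
    # 3. full step
    x, lam = y + dy, lam_plus
    print(f"iter {k}: NLP-step feasibility gap = {np.linalg.norm(resid):.2e}")

With exact Hessians H_i = ∇²f_i(y_i), the feasibility gap of the decoupled solutions should contract quickly for this convex test problem, which is consistent with the local convergence result stated below.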

Special cases

- For ρ → ∞: ALADIN → SQP.
- For 0 < ρ < ∞: if H_i = ρ A_i^T A_i and Σ_i = A_i^T A_i, then ALADIN → ADMM.
- For ρ = 0: ALADIN → dual decomposition (+ inexact Newton).

Local convergence

Assumptions:
- the f_i are twice continuously differentiable,
- the minimizer (x*, λ*) is a regular KKT point (LICQ + SOSC satisfied),
- ρ satisfies ∇²f_i(y_i) + ρ Σ_i ≻ 0.

Theorem: The full-step variant of ALADIN converges locally
1. with a quadratic convergence rate, if H_i = ∇²f_i(y_i) + O(‖y_i − x*‖),
2. with a linear convergence rate, if ‖H_i − ∇²f_i(y_i)‖ is sufficiently small.

Globalization

Definition (L1-penalty function): we say that x^+ is a descent step if Φ(x^+) < Φ(x) for

    Φ(x) = Σ_{i=1}^{N} f_i(x_i) + λ̄ ‖ Σ_{i=1}^{N} A_i x_i − b ‖_1 ,    λ̄ sufficiently large.

Globalization

Rough sketch:
- As long as ∇²f_i(x_i) + ρ Σ_i ≻ 0, the proximal objectives f̃_i(y_i) = f_i(y_i) + ρ/2 ‖y_i − x_i‖_{Σ_i}^2 are strictly convex in a neighborhood of the current primal iterates x_i.
- If we do not update x, then y can be enforced to converge to the solution z* of the convex auxiliary problem

      min_z  Σ_{i=1}^{N} f̃_i(z_i)    s.t.    Σ_{i=1}^{N} A_i z_i = b.

- Strategy: if x^+ is not a descent step, we skip the primal update until y is a descent step and then set x^+ = y (similar to proximal methods).

This strategy leads to a globalization routine for ALADIN if H_i ≻ 0.

Inequalities

Distributed NLP with inequalities:

    min_x  Σ_{i=1}^{N} f_i(x_i)    s.t.    Σ_{i=1}^{N} A_i x_i = b,    h_i(x_i) ≤ 0.

- Functions f_i : R^n → R and h_i : R^n → R^{n_h} potentially non-convex.
- Matrices A_i ∈ R^{m×n} and the vector b ∈ R^m are given.
- Problem: N is large.

ALADIN (with inequalities)
Augmented Lagrangian based Alternating Direction Inexact Newton method

Input: initial guesses x_i ∈ R^n and λ ∈ R^m; ρ > 0, ε > 0.

Repeat:
1. Solve the decoupled NLPs

       min_{y_i}  f_i(y_i) + λ^T A_i y_i + ρ/2 ‖y_i − x_i‖_{Σ_i}^2    s.t.    h_i(y_i) ≤ 0    (multiplier κ_i).

2. Set g_i = ∇f_i(y_i) + ∇h_i(y_i) κ_i and H_i ≈ ∇²( f_i(y_i) + κ_i^T h_i(y_i) ), and solve the coupled QP

       min_{Δy}  Σ_{i=1}^{N} { 1/2 Δy_i^T H_i Δy_i + g_i^T Δy_i }    s.t.    Σ_{i=1}^{N} A_i (y_i + Δy_i) = b    (multiplier λ^+).

3. Set x ← y + Δy and λ ← λ^+.

ALADIN (with inequalities)
Augmented Lagrangian based Alternating Direction Inexact Newton method

Remarks:
- If an approximation C̄_i ≈ C_i = ∇h_i(y_i) of the Jacobian of the (active) inequality constraints is available, solve the QP

      min_{Δy, s}  Σ_{i=1}^{N} { 1/2 Δy_i^T H_i Δy_i + g_i^T Δy_i } + λ^T s + μ/2 ‖s‖_2^2
      s.t.  Σ_{i=1}^{N} A_i (y_i + Δy_i) = b + s    (multiplier λ_QP),
            C̄_i Δy_i = 0,  i ∈ {1, ..., N},

  with g_i = ∇f_i(y_i) + (C_i − C̄_i)^T κ_i and μ > 0 instead.
- If H_i and C̄_i are constant, matrix decompositions can be pre-computed.

Overview

Theory: distributed optimization algorithms, ALADIN.
Applications: sensor network localization, MPC with long horizons.

Sensor network localization

Case study:
- 25,000 sensors measure positions and distances in a graph
- additional inequality constraints, ‖χ_i − η_i‖_2 ≤ σ, remove outliers

ALADIN versus SQP

- 10^5 primal and 7.5 · 10^4 dual optimization variables
- implementation in JULIA

Overview

Theory: distributed optimization algorithms, ALADIN.
Applications: sensor network localization, MPC with long horizons.

Nonlinear MPC

Repeat:
- Wait for the state measurement x̂.
- Solve

      min_{x,u}  Σ_{i=0}^{m−1} l(x_i, u_i) + M(x_m)
      s.t.  x_{i+1} = f(x_i, u_i),  x_0 = x̂,  (x_i, u_i) ∈ X × U.

- Send u_0 to the process.

ACADO Toolkit

MPC benchmark

Nonlinear model, 4 states, 2 controls. Run-time of ACADO code generation (not distributed):

    Integration & sensitivities:     117 µs   (65 %)
    QP (condensing + qpOASES):        59 µs   (33 %)
    Complete real-time iteration:    181 µs  (100 %)

MPC with ALADIN

ALADIN Step 1: solve decoupled NLPs. Choose a short horizon n = m/N and solve, for each sub-horizon j,

    min_{y,v}  Ψ_j(y_0^j) + Σ_{i=0}^{n−1} l(y_i^j, v_i^j) + Φ_{j+1}(y_n^j)
    s.t.  y_{i+1}^j = f(y_i^j, v_i^j),  i = 0, ..., n−1,
          y_i^j ∈ X,  v_i^j ∈ U.

Arrival and end costs depend on the ALADIN iterates:

    Ψ_0(y) = I(x̂, y),    Φ_N(y) = M(y),
    Ψ_j(y) = −λ_j^T y + ρ/2 ‖y − z_j‖_P^2,    Φ_j(y) = λ_j^T y + ρ/2 ‖y − z_j‖_P^2,

where the opposite signs on the multiplier term come from splitting the coupling constraint between neighboring sub-horizons.

MPC with ALADIN

ALADIN Step 2: solve the coupled QP
- As there are no constraints, the coupled QP is an LQR problem.
- Solve all matrix-valued Riccati equations offline.
- Solve the online QP by a backward-forward sweep.

Code export:
- Export all online operations as optimized C code.
- NLP solver: explicit MPC (rough heuristic).
- One ALADIN iteration per sampling time; skip globalization.
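To illustrate the offline/online split just described, here is a generic finite-horizon LQR sketch with hypothetical system matrices (it is not the talk's coupled ALADIN QP): the matrix-valued Riccati recursion and the feedback gains are pre-computed offline, and the online work reduces to a backward sweep for the affine terms followed by a forward rollout.

import numpy as np

rng = np.random.default_rng(3)
nx, nu, T = 4, 2, 20
A = np.eye(nx) + 0.05 * rng.standard_normal((nx, nx))   # hypothetical linearized dynamics
B = 0.1 * rng.standard_normal((nx, nu))
Q, R, Qf = np.eye(nx), 0.1 * np.eye(nu), 10 * np.eye(nx)

# offline: matrix-valued Riccati recursion P_k and feedback gains K_k
P = [None] * (T + 1); K = [None] * T; Huu_inv = [None] * T; Hux = [None] * T
P[T] = Qf
for k in range(T - 1, -1, -1):
    Hux[k] = B.T @ P[k + 1] @ A
    Huu_inv[k] = np.linalg.inv(R + B.T @ P[k + 1] @ B)
    K[k] = Huu_inv[k] @ Hux[k]
    P[k] = Q + A.T @ P[k + 1] @ A - Hux[k].T @ K[k]

# online: backward-forward sweep for the current affine cost terms q_k, r_k
def sweep(x0, q, r, qT):
    p = qT; kff = [None] * T
    for k in range(T - 1, -1, -1):            # backward sweep: affine value-function terms
        hu = r[k] + B.T @ p
        kff[k] = Huu_inv[k] @ hu
        p = q[k] + A.T @ p - Hux[k].T @ kff[k]
    xs, us, xk = [x0], [], x0
    for k in range(T):                        # forward sweep: roll out the affine feedback law
        u = -K[k] @ xk - kff[k]
        xk = A @ xk + B @ u
        xs.append(xk); us.append(u)
    return np.array(xs), np.array(us)

q = 0.1 * rng.standard_normal((T, nx)); r = 0.1 * rng.standard_normal((T, nu))
xs, us = sweep(rng.standard_normal(nx), q, r, np.zeros(nx))
print("terminal state norm:", np.linalg.norm(xs[-1]))

Only vector recursions and matrix-vector products remain in the online loop; the expensive matrix factorizations happen once, offline.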

ALADIN + code generation (results by Yuning Jiang)

Same nonlinear model as before, n = 2. Run-time of real-time ALADIN on 10 processors:

    Parallel explicit MPC:       6 µs   (54 %)
    QP sweeps:                   3 µs   (27 %)
    Communication overhead:      2 µs   (18 %)

Conclusions

ALADIN theory:
- can solve non-convex distributed optimization problems, min_x Σ_{i=1}^{N} f_i(x_i) s.t. Σ_{i=1}^{N} A_i x_i = b, to local optimality
- contains SQP, ADMM, and dual decomposition as special cases
- local convergence analysis similar to SQP; globalization possible

Conclusions

ALADIN applications:
- large-scale distributed sensor network: ALADIN outperforms SQP and ADMM
- small-scale embedded MPC: ALADIN can be used to alternate between explicit and online MPC

References

B. Houska, J. Frasch, M. Diehl. An Augmented Lagrangian Based Algorithm for Distributed Non-Convex Optimization. SIAM Journal on Optimization (SIOPT), 2016.

D. Kouzoupis, R. Quirynen, B. Houska, M. Diehl. A Block Based ALADIN Scheme for Highly Parallelizable Direct Optimal Control. ACC, 2016.