MULTIPLE GRADIENT DESCENT ALGORITHM (MGDA) FOR MULTIOBJECTIVE OPTIMIZATION

MULTIPLE GRADIENT DESCENT ALGORITHM (MGDA) FOR MULTIOBJECTIVE OPTIMIZATION
Jean-Antoine Désidéri, INRIA Project-Team OPALE, Sophia Antipolis Méditerranée Centre, http://www-sop.inria.fr/opale
ISMP 2012, 21st International Symposium on Mathematical Programming, Berlin, August 19-24, 2012, session "PDE-Constrained Optimization and Multi-Level/Multi-Grid Methods".
ECCOMAS 2012, European Congress on Computational Methods in Applied Sciences and Engineering, Vienna (Austria), September 10-14, 2012, TO16: Computational inverse problems and optimization.

Pareto optimality
Design vector Y ∈ Υ ⊂ R^N; objective functions to be minimized (reduced): J_i(Y) (i = 1,...,n) (see e.g. K. Miettinen, Nonlinear Multiobjective Optimization, Kluwer Academic Publishers).
Dominance in efficiency: design-point Y¹ dominates design-point Y² in efficiency, Y¹ ≺ Y², iff J_i(Y¹) ≤ J_i(Y²) (∀i = 1,...,n) and at least one inequality is strict (a partial-order relationship).
The designer's Holy Grail: the Pareto set, i.e. the set of "non-dominated design-points" or "Pareto-optimal solutions".
The Pareto front: image of the Pareto set in the objective-function space (it bounds the domain of attainable performance).
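The dominance relation translates directly into code; below is a minimal Python/NumPy sketch (the function names and the brute-force filtering strategy are illustrative choices, not part of the original presentation):

```python
import numpy as np

def dominates(J1, J2):
    """True if the design with objective vector J1 dominates the one with J2:
    J1 <= J2 componentwise, with at least one strict inequality."""
    J1, J2 = np.asarray(J1), np.asarray(J2)
    return bool(np.all(J1 <= J2) and np.any(J1 < J2))

def non_dominated(points):
    """Brute-force extraction of the non-dominated subset of a list of
    objective vectors (the image of a candidate Pareto set)."""
    return [Ji for i, Ji in enumerate(points)
            if not any(dominates(Jk, Ji) for k, Jk in enumerate(points) if k != i)]

if __name__ == "__main__":
    pts = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (3.0, 3.0)]
    print(non_dominated(pts))   # (3.0, 3.0) is dominated; the other three are kept
```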

Pareto front identification
Favorable situation: continuous and convex Pareto front; gradient-based optimization can then be used efficiently, if the number n of objective functions is not too large. Non-convex or discontinuous Pareto fronts do exist. Evolutionary strategies or GAs are most commonly used for robustness: NSGA-II (Deb), PAES.
From: "A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II", K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, IEEE Transactions on Evolutionary Computation, Vol. 6, No. 2, April 2002.
Nondominated solutions with NSGA-II on KUR (testcase by F. Kursawe, 1990), n = 3:
f_1(x) = Σ_{i=1}^{n−1} [ −10 exp( −0.2 √(x_i² + x_{i+1}²) ) ]
f_2(x) = Σ_{i=1}^{n} [ |x_i|^{0.8} + 5 sin(x_i³) ]

Our general objective
Construct a robust gradient-based method for Pareto-front identification.
Note: in shape optimization, N is often the dimension of a parameterization, meant to be large, as the dimension of a discretization usually is; thus, often, N ≫ n. However, the following theoretical results also apply to cases where n > N, which can occur in other engineering-optimization situations.

The basic question: constructing a gradient-based cooperative algorithm
Knowing the gradients ∇J_i(Y⁰) of n criteria, assumed to be smooth functions of the design vector Y ∈ R^N, at a given starting design-point Y⁰, can we define a nonzero vector ω ∈ R^N in the direction of which the Fréchet derivatives of all criteria have the same sign,
(∇J_i(Y⁰), ω) ≥ 0 (∀i = 1, 2, ..., n)?
Answer: yes, if Y⁰ is not Pareto-optimal! Then ω is locally a descent direction common to all criteria (the descent step being taken along −ω).

Notion of Pareto stationarity
Classically, for smooth real-valued functions of N variables: optimality (+ regularity) ⇒ stationarity. Now, for smooth R^n-valued functions of N variables, which stationarity requirement is to be enforced for Pareto-optimality?
Proposition 1 (Pareto stationarity): the design point Y⁰, in an open domain B in which the objective-functions are smooth and convex, is Pareto-stationary if there exists a convex combination of the gradients equal to 0:
∃ α = {α_i} (i = 1,...,n) s.t. Σ_{i=1}^n α_i ∇J_i(Y⁰) = 0, α_i ≥ 0 (∀i), Σ_{i=1}^n α_i = 1.
Then (to be established subsequently): Pareto-optimality (+ regularity) ⇒ Pareto stationarity.

General principle
Proposition 2 (minimum-norm element in the convex hull): let {u_i} (i = 1,...,n) be a family of n vectors in R^N, and U the so-called convex hull of this family, i.e. the following set of vectors:
U = { u ∈ R^N : u = Σ_{i=1}^n α_i u_i ; α_i ≥ 0 (∀i); Σ_{i=1}^n α_i = 1 }.
Then U admits a unique element of minimal norm, ω, and the following holds:
∀u ∈ U : (u, ω) ≥ ||ω||²,
in which (u, v) denotes the usual scalar product of the vectors u and v.
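Proposition 2 is constructive: finding ω amounts to a small quadratic program in the coefficients α ∈ R^n (not in R^N). As a hedged illustration, it can be solved with a generic constrained optimizer; the helper name and the use of SciPy's SLSQP are my choices, while the author's own procedure, via a parameterization of the convex hull, is described later:

```python
import numpy as np
from scipy.optimize import minimize

def min_norm_element(U):
    """Minimum-norm element of the convex hull of the rows of U (n x N).
    Returns (omega, alpha)."""
    U = np.asarray(U, dtype=float)
    n = U.shape[0]
    G = U @ U.T                                   # Gram matrix, n x n
    res = minimize(lambda a: a @ G @ a,           # ||sum_i alpha_i u_i||^2
                   np.full(n, 1.0 / n),
                   jac=lambda a: 2.0 * (G @ a),
                   method="SLSQP",
                   bounds=[(0.0, 1.0)] * n,
                   constraints=[{"type": "eq", "fun": lambda a: a.sum() - 1.0}])
    alpha = res.x
    return alpha @ U, alpha

if __name__ == "__main__":
    u1, u2 = np.array([1.0, 0.0]), np.array([0.0, 2.0])
    omega, alpha = min_norm_element([u1, u2])
    # check the inequality (u, omega) >= ||omega||^2 on the generators
    print(omega, u1 @ omega, u2 @ omega, omega @ omega)
```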

Existence and uniqueness of ω
Existence ⇐ U closed (and bounded). Uniqueness ⇐ U convex. To establish uniqueness, let ω₁ and ω₂ both realize the minimum: ω₁, ω₂ ∈ Argmin_{u∈U} ||u||. Then, since U is convex, ∀ε ∈ [0,1], ω₁ + ε r₁₂ ∈ U (r₁₂ = ω₂ − ω₁), and
||ω₁ + ε r₁₂|| ≥ ||ω₁||.
Square both sides and replace by scalar products:
∀ε ∈ [0,1] : (ω₁ + ε r₁₂, ω₁ + ε r₁₂) − (ω₁, ω₁) ≥ 0, i.e. 2ε(r₁₂, ω₁) + ε²(r₁₂, r₁₂) ≥ 0.
As ε → 0⁺, this condition requires that (r₁₂, ω₁) ≥ 0; but then, for ε = 1:
||ω₂||² − ||ω₁||² = 2(r₁₂, ω₁) + (r₁₂, r₁₂) > 0
unless r₁₂ = 0, i.e. ω₂ = ω₁.
Remark: the identification of the element ω is equivalent to the constrained minimization of a quadratic form in R^n, and not R^N.

First consequence
∀u ∈ U : (u, ω) ≥ ||ω||².
Proof: let u ∈ U and r = u − ω. ∀ε ∈ [0,1], ω + εr ∈ U (convexity); hence
||ω + εr||² − ||ω||² = 2ε(r, ω) + ε²(r, r) ≥ 0,
and this requires that (r, ω) = (u − ω, ω) ≥ 0.

Proof of Pareto stationarity at a Pareto-optimal design-point
Y⁰ ∈ B (open ball) ⊂ R^N. Hypothesis: the {J_i(Y)} (∀i ≤ n) are smooth and convex in B; u_i = ∇J_i(Y⁰) (∀i ≤ n). Without loss of generality, assume J_i(Y⁰) = 0 (∀i). Then:
Y⁰ = Argmin_Y J_n(Y) subject to J_i(Y) ≤ 0 (∀i ≤ n−1).   (1)
Let U_{n−1} be the convex hull of the gradients {u₁, u₂, ..., u_{n−1}} and ω_{n−1} = Argmin_{u ∈ U_{n−1}} ||u||. The vector ω_{n−1} exists, is unique, and is such that (u_i, ω_{n−1}) ≥ ||ω_{n−1}||² (∀i ≤ n−1). Two possible situations:
1. ω_{n−1} = 0, and the objective-functions {J₁, J₂, ..., J_{n−1}} satisfy the Pareto-stationarity condition at Y = Y⁰; a fortiori, the condition is also satisfied by the whole set of objective-functions.
2. Otherwise ω_{n−1} ≠ 0. Let j_i(ε) = J_i(Y⁰ − ε ω_{n−1}) (i = 1,...,n−1), so that j_i(0) = 0 and j_i'(0) = −(u_i, ω_{n−1}) ≤ −||ω_{n−1}||² < 0, and for sufficiently small, strictly-positive ε:
j_i(ε) = J_i(Y⁰ − ε ω_{n−1}) < 0 (∀i ≤ n−1),
so that Slater's qualification condition is satisfied for (1). Thus the Lagrangian L = J_n(Y) + Σ_{i=1}^{n−1} λ_i J_i(Y) is stationary w.r.t. Y at Y = Y⁰:
u_n + Σ_{i=1}^{n−1} λ_i u_i = 0,
in which λ_i ≥ 0 (∀i ≤ n−1) since the constraints hold with equality, J_i(Y⁰) = 0 (KKT condition). Normalizing this equation results in the Pareto-stationarity condition.

R^N: affine and vector structures, notation
Usually indistinct. When the distinction is necessary, a dotted symbol is used for an affine subset/subspace. In particular, for the convex hull U: U̇ denotes the set of points to which the vectors u ∈ U point, when drawn from a given origin O. Rigorously speaking, the term "convex hull" applies to U̇.

Convex hulls: a case where O' ∉ U̇
Figure: the affine convex hull U̇ ⊂ Ȧ₂ (points) and the corresponding vector convex hull U of {u₁, u₂, u₃} drawn from the origin O.

Second consequence
Note that ∀u ∈ U:
u − u_n = Σ_{i=1}^n α_i u_i − (Σ_{i=1}^n α_i) u_n = Σ_{i=1}^{n−1} α_i u_{n,i}   (u_{n,i} = u_i − u_n).
Thus U ⊂ A_{n−1} (or, in affine space, U̇ ⊂ Ȧ_{n−1}), where A_{n−1} is the set of vectors pointing to an affine sub-space Ȧ_{n−1} of dimension at most n−1. Define the orthogonal projection O' of O onto Ȧ_{n−1}. Since OO' ⊥ Ȧ_{n−1} ⊇ U̇: if O' ∈ U̇, then ω = OO', and in this case
∀u ∈ U : (u, ω) = const. = ||ω||².

Second consequence: proof
Since O' is the orthogonal projection of O onto Ȧ_{n−1}, OO' ⊥ Ȧ_{n−1}, and in particular OO' ⊥ U̇. If additionally O' ∈ U̇, the vector OO' is the minimum-norm element of U: ω = OO'. In this case, ω can be calculated by ignoring the inequality constraints α_i ≥ 0 (∀i), which are automatically satisfied by the solution. Hence the vector ω realizes the minimum of the quadratic form ||Σ_{i=1}^n α_i u_i||² subject only to the equality constraint Σ_{i=1}^n α_i = 1. The Lagrangian is formed with a single multiplier λ:
L(α, λ) = ½ ( Σ_{i=1}^n α_i u_i , Σ_{i=1}^n α_i u_i ) + λ ( Σ_{i=1}^n α_i − 1 ).
The stationarity condition requires that, ∀i:
∂L/∂α_i = (u_i, ω) + λ = 0 ⇒ (u_i, ω) = −λ = const. = ||ω||².
By convex combination, the result extends straightforwardly to the whole of U.

Application to the case of gradients
Suppose u_i = ∇J_i(Y⁰) (∀i); then:
- either ω ≠ 0, and all criteria admit positive Fréchet derivatives in the direction of ω (all equal if ω belongs to the interior of U̇);
- or ω = 0, and the current design point Y⁰ is Pareto-stationary:
∃ {α_i} (i = 1,...,n), α_i ≥ 0 (∀i), Σ_{i=1}^n α_i = 1, such that Σ_{i=1}^n α_i u_i = 0.

Proposition 3 (common descent direction): by virtue of Propositions 1 and 2, in the particular case where u_i = ∇J'_i = ∇J_i(Y⁰)/S_i (S_i: user-supplied scaling constant, S_i > 0), two situations are possible at Y = Y⁰:
- either ω = 0, and the design point Y⁰ is Pareto-stationary (possibly Pareto-optimal);
- or ω ≠ 0, and ω is a descent direction common to all criteria {J_i(Y)} (i = 1,...,n); additionally, if ω is interior to U̇, the scalar products (u, ω) (u ∈ U), and in particular the Fréchet derivatives (u_i, ω), are all equal to ||ω||².
MGDA: substitute the vector ω for the single-criterion gradient in the steepest-descent method.
Proposition 4 (convergence): certain normalization provisions being made, in R^N, the Multiple-Gradient Descent Algorithm (MGDA) converges to a Pareto-stationary point.

Convergence proof
Define the criteria to be positive (possibly apply an exp-transform) and tending to infinity at infinity (possibly add an exploding term outside of the working ball): ∀i, J_i(Y) → +∞ as ||Y|| → ∞. Assume these criteria to be continuous.
If the iteration stops in a finite number of steps, a Pareto-stationary point has been reached. Otherwise, the iteration continues indefinitely, generating an infinite sequence of design points {Y^k}. The corresponding sequences of criteria are infinite, positive and monotone-decreasing; hence they are bounded. Consequently, the sequence of iterates {Y^k} is itself bounded, and it admits a subsequence converging to, say, Ȳ. Necessarily, Ȳ is a Pareto-stationary point. To establish this, assume instead that the vector ω corresponding to Ȳ is nonzero. Then, for each criterion, there exists a stepsize ρ for which the variation is finite. These criteria are finite in number; hence the smallest such ρ will cause a finite variation of all criteria, in contradiction with the fact that only infinitely-small variations of the criteria are realized from Ȳ.

Remark 1: practical determination of the vector ω
ω = Argmin_{u∈U} ||u||, to be solved in U ⊂ R^N (usually, but not necessarily, N ≫ n), where
U = { u ∈ R^N : u = Σ_{i=1}^n α_i u_i ; α_i ≥ 0 (∀i); Σ_{i=1}^n α_i = 1 }.
Parameterization in the convex hull: let α_i = σ_i² (i = 1,...,n) to satisfy trivially the inequality constraints (α_i ≥ 0, ∀i), and transform the equality constraint Σ_{i=1}^n α_i = 1 into Σ_{i=1}^n σ_i² = 1, i.e. σ = (σ₁, σ₂, ..., σ_n) ∈ S_n, where S_n is the unit sphere of R^n, and precisely not of R^N.

Determining the vector ω (cont'd)
The sphere is easily parameterized using trigonometric functions of n−1 independent arcs φ₁, φ₂, ..., φ_{n−1}:
σ₁ = cos φ₁ · cos φ₂ · cos φ₃ · ... · cos φ_{n−1}
σ₂ = sin φ₁ · cos φ₂ · cos φ₃ · ... · cos φ_{n−1}
σ₃ = sin φ₂ · cos φ₃ · ... · cos φ_{n−1}
...
σ_{n−1} = sin φ_{n−2} · cos φ_{n−1}
σ_n = sin φ_{n−1}
(Consider only φ_i ∈ [0, π/2], ∀i, and set φ₀ = π/2.) Let c_i = cos² φ_i (i = 1,...,n) and get:
α₁ = c₁ · c₂ · c₃ · ... · c_{n−1}
α₂ = (1 − c₁) · c₂ · c₃ · ... · c_{n−1}
α₃ = (1 − c₂) · c₃ · ... · c_{n−1}
...
α_{n−1} = (1 − c_{n−2}) · c_{n−1}
α_n = 1 − c_{n−1}
(c₀ = 0, and c_i ∈ [0,1] for all i ≥ 1.)

Determining the vector ω (end)
The convex hull is thus parameterized in [0,1]^{n−1} (n: number of criteria), independently of N (dimension of the design space). For example, with n = 4 criteria:
α₁ = c₁c₂c₃, α₂ = (1 − c₁)c₂c₃, α₃ = (1 − c₂)c₃, α₄ = 1 − c₃, with (c₁, c₂, c₃) ∈ [0,1]³.
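The map c ↦ α is easy to code and to verify; a small sketch (the function name is mine, assuming the nested-product formulas above):

```python
import numpy as np

def alphas_from_c(c):
    """Map (c_1, ..., c_{n-1}) in [0,1]^{n-1} to the convex-hull coefficients:
    alpha_1 = c_1...c_{n-1}, alpha_i = (1 - c_{i-1}) c_i...c_{n-1}, alpha_n = 1 - c_{n-1}."""
    c = np.asarray(c, dtype=float)
    n = c.size + 1
    alpha = np.empty(n)
    tail = 1.0                       # running product c_{n-1} c_{n-2} ...
    for i in range(n - 1, 0, -1):    # fill alpha_n, ..., alpha_2 (0-based index i)
        alpha[i] = (1.0 - c[i - 1]) * tail
        tail *= c[i - 1]
    alpha[0] = tail                  # alpha_1 = c_1 c_2 ... c_{n-1}
    return alpha

if __name__ == "__main__":
    a = alphas_from_c([0.3, 0.5, 0.9])
    print(a, a.sum())                # coefficients are >= 0 and sum to 1
```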

Case n = 2
Remark 2: an appropriate normalization of the gradients may turn out to be essential: u_i = ∇J_i(Y⁰)/S_i.
Without gradient normalization: ω = OO', unless the angle (u₁, u₂) is acute and the norms ||u₁||, ||u₂|| are very different; in that case, ω is equal to the vector of smaller norm.
With normalization: the equality ω = OO' holds automatically ⇒ equal Fréchet derivatives.
General case: if the vectors {u_i} are NOT normalized, those of smaller norm are more influential in determining the direction of the vector ω.
Figure: the two configurations for n = 2.

Normalizations being examined
Recall: u_i = ∇J'_i = ∇J_i(Y⁰)/S_i (S_i > 0) ⇒ ω; δY = −ρω ⇒ δJ_i = (∇J_i, δY) = −ρ S_i (u_i, ω), and (u_i, ω) = const. if ω does not lie on the edge of the convex hull.
- Standard: u_i = ∇J_i(Y⁰) / ||∇J_i(Y⁰)||
- Equal logarithmic variations (whenever ω is not on an edge): u_i = ∇J_i(Y⁰) / J_i(Y⁰)
- Newton-inspired, when lim J_i = 0: u_i = J_i(Y⁰) ∇J_i(Y⁰) / ||∇J_i(Y⁰)||²
- Newton-inspired, when lim J_i ≠ 0: u_i = max( J_i^{(k−1)} − J_i^{(k)}, δ J_i(Y⁰) ) ∇J_i(Y⁰) / ||∇J_i(Y⁰)||², k: iteration no., δ > 0 (small)

Is the standard normalization sufficient to guarantee that ω = OO' (yielding equal Fréchet derivatives)? No, if n > 2.
Figure: three unit vectors u₁, u₂, u₃ on the unit sphere whose affine hull Ȧ₂ is such that OO' ∉ U.

Experimenting MGDA
From Adrien Zerbinati, on-going doctoral thesis. Basic testcase: minimize the functions
f(x, y) = 4x² + y² + xy
g(x, y) = (x − 1)² + 3(y − 1)²
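A minimal MGDA loop on this test case might look as follows (standard normalization of the gradients by their norms, a fixed step factor, and a generic QP solver for ω; these are illustrative choices, not the settings used in the cited thesis):

```python
import numpy as np
from scipy.optimize import minimize

def f(p):      x, y = p; return 4*x**2 + y**2 + x*y
def g(p):      x, y = p; return (x - 1)**2 + 3*(y - 1)**2
def grad_f(p): x, y = p; return np.array([8*x + y, 2*y + x])
def grad_g(p): x, y = p; return np.array([2*(x - 1), 6*(y - 1)])

def min_norm(U):
    """Minimum-norm element of the convex hull of the rows of U."""
    G = U @ U.T
    n = U.shape[0]
    res = minimize(lambda a: a @ G @ a, np.full(n, 1/n), method="SLSQP",
                   bounds=[(0, 1)]*n,
                   constraints=[{"type": "eq", "fun": lambda a: a.sum() - 1}])
    return res.x @ U

def mgda(Y, step=0.05, iters=300, tol=1e-6):
    for _ in range(iters):
        u1, u2 = grad_f(Y), grad_g(Y)
        # standard normalization: each gradient divided by its own norm
        U = np.vstack([u1 / (np.linalg.norm(u1) + 1e-12),
                       u2 / (np.linalg.norm(u2) + 1e-12)])
        omega = min_norm(U)
        if np.linalg.norm(omega) < tol:   # (approximately) Pareto-stationary
            break
        Y = Y - step * omega              # common descent step
    return Y

if __name__ == "__main__":
    for Y0 in [(0.5, 2.0), (1.5, 2.5), (-1.5, -2.5)]:
        Y = mgda(np.array(Y0, dtype=float))
        print(Y0, "->", Y, (f(Y), g(Y)))
```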

Basic testcase: convergence from different initial design points.
Figure: design space and function space; initial points (0.5, 2), (1.5, 2.5), (−1.5, −2.5).

Fonseca testcase
Minimize 2 functions of 3 variables:
f_{1,2}(x₁, x₂, x₃) = 1 − exp( − Σ_{i=1}^{3} ( x_i ± 1/√3 )² )

Fonseca testcase (cont'd): convergence from initial design points distributed over a sphere.
Figure: design space and function space; continuous but nonconvex front.

Fonseca testcase (end): comparison with the Pareto Archived Evolution Strategy (PAES).
Figure: design space and function space; MGDA iterates vs. points generated by PAES (2×50 and 6×12).

Wing-shape optimization
Wave-drag minimization in conjunction with lift maximization; metamodel-assisted MGDA (Adrien Zerbinati). Wing-shape geometry extruded from an airfoil geometry (initial airfoil: NACA0012), 10 parameters. Two-point/two-objective optimization:
1 Subsonic Eulerian flow: M = 0.3, α = 8°, for C_L maximization
2 Transonic Eulerian flow: M = 0.83, α = 2°, for C_D minimization
An initial database of 40 design points evolves as follows:
1 Evaluate each new point by two 3D Eulerian flow computations
2 Construct surrogate models for C_L and C_D (using the entire dataset), and apply MGDA to convergence
3 Enrich the database with new points, if appropriate
After 6 loops, the database is made of 227 design points (554 flow computations).
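The loop above can be sketched schematically; in the code below everything (the RBF surrogate, the stand-in "flow solver", the seeding and enrichment rules) is an illustrative reconstruction of the listed procedure, not the actual tool chain used for this application:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator
from scipy.optimize import minimize

def cfd(Y):
    """Stand-in for the two 3D Eulerian flow computations (-C_L, C_D):
    a cheap analytic placeholder, NOT the actual flow solver."""
    Y = np.asarray(Y)
    return np.array([np.sum((Y - 0.3)**2), np.sum((Y + 0.3)**2)])

def min_norm_dir(U):
    """Minimum-norm element of the convex hull of the rows of U."""
    G = U @ U.T
    n = len(U)
    a = minimize(lambda a: a @ G @ a, np.full(n, 1/n), method="SLSQP",
                 bounds=[(0, 1)]*n,
                 constraints=[{"type": "eq", "fun": lambda a: a.sum() - 1}]).x
    return a @ U

def surrogate_grads(model, Y, h=1e-4):
    """Finite-difference gradients of both surrogate objectives at Y."""
    E = np.eye(Y.size)
    cols = [(model((Y + h*e)[None]) - model((Y - h*e)[None]))[0] / (2*h) for e in E]
    return np.array(cols).T                            # shape (2, dim)

def metamodel_assisted_mgda(db_Y, n_loops=6, mgda_steps=50, step=0.02, seed=0):
    rng = np.random.default_rng(seed)
    db_J = np.array([cfd(Y) for Y in db_Y])
    for _ in range(n_loops):
        model = RBFInterpolator(db_Y, db_J)            # surrogates for both objectives
        # seed MGDA from a (slightly perturbed) good point of the current database
        Y = db_Y[np.argmin(db_J.sum(axis=1))] + 0.01 * rng.standard_normal(db_Y.shape[1])
        for _ in range(mgda_steps):                    # MGDA driven by the surrogate
            U = surrogate_grads(model, Y)
            U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
            omega = min_norm_dir(U)
            if np.linalg.norm(omega) < 1e-6:
                break
            Y = Y - step * omega
        db_Y = np.vstack([db_Y, Y])                    # enrich the database (true evaluation)
        db_J = np.vstack([db_J, cfd(Y)])
    return db_Y, db_J

if __name__ == "__main__":
    Y0 = np.random.default_rng(0).uniform(-1, 1, (40, 10))   # initial database, 10 parameters
    Y, J = metamodel_assisted_mgda(Y0)
    print(Y.shape, J.shape)
```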

Wing-shape optimization 2: convergence of the dataset over 6 loops.
Figure: −lift coefficient (subsonic) vs. drag coefficient (transonic); initial database and steps 1 to 6.

Wing-shape optimization 3: low-drag quasi-Pareto-optimal shape.
Figure: subsonic (M = 0.3, α = 8°) and transonic (M = 0.83, α = 2°) views.

Wing-shape optimization 4: high-lift quasi-Pareto-optimal shape.
Figure: subsonic (M = 0.3, α = 8°) and transonic (M = 0.83, α = 2°) views.

Wing-shape optimization 5: intermediate quasi-Pareto-optimal shape.
Figure: subsonic (M = 0.3, α = 8°) and transonic (M = 0.83, α = 2°) views.

Description of the test case
In-house CFD solver (Num3sis): compressible Navier-Stokes (FV/FE), parallel computing (MPI); CPU cost: 4 h on 32 cores; CAD and mesh by GMSH.
Test case: Mach number 0.1; Reynolds number Re ≈ 2800; two-criterion shape optimization; fine mesh (some 700,000 points).

First objective function: velocity variance
U_mean = (1/V_tot) Σ_cel u(cel) Vol(cel), the sum running over the cells cel such that d(cel, P_o) ≤ ε;
σ²_vel,x = (1/V_tot) Σ_cel ( U_mean,x − u_x(cel) )² Vol(cel), over the same cells;
σ² = σ²_vel,x + σ²_vel,y + σ²_vel,z.

Second objective function: pressure loss
p_k = p_mean at the inlet (k = i) or outlet (k = o); u_k = u_mean at the in- or out-let;
Δp = p_i − p_o + ρ_i u_i²/2 − ρ_o u_o²/2.

Parameters (figure).

MGDA on the surrogate model (figure).

Identifying the Pareto set
Figure: metamodel-assisted MGDA, step-by-step evolution in the (velocity variance, pressure loss) plane: nominal point, initial database, 3rd/6th/9th steps, initial and final Pareto fronts. Initial and final non-dominated sets; 223 solver calls.

Velocity-magnitude streamlines: shape comparison.
Figure: 1: best velocity variance; 2: nominal shape; 3: best pressure loss.

Velocity magnitude in a section just after the bend.
Figure: 1: best velocity variance; 2: nominal shape; 3: best pressure loss; section position shown.

Velocity magnitude in the output section.
Figure: 1: best velocity variance; 2: nominal shape; 3: best pressure loss; section position shown.

NOMINAL versus LOWEST VELOCITY VARIANCE (figure).

NOMINAL versus LOWEST PRESSURE LOSS (figure).

LOWEST VELOCITY VARIANCE versus LOWEST PRESSURE LOSS (figure).

Dirichlet problem
−Δu = f over Ω = [−1,1] × [−1,1], u = 0 on Γ = ∂Ω.
Discrete solution by a standard 2nd-order finite-difference method over a 40 × 40 quadrangular mesh.
Figure: u_h surface and contour lines.

Domain partitioning
Function-value controls at 4 interfaces, yielding 4 sub-domain Dirichlet problems with 2 controlled interfaces each (figure).

Formal expression of the interface jumps (approximated by 2nd-order one-sided finite differences):
over γ₁ (0 ≤ x ≤ 1; y = 0): s₁(x) = ∂u/∂y(x, 0⁺) − ∂u/∂y(x, 0⁻) = [∂u₁/∂y − ∂u₄/∂y](x, 0);
over γ₂ (x = 0; 0 ≤ y ≤ 1): s₂(y) = ∂u/∂x(0⁺, y) − ∂u/∂x(0⁻, y) = [∂u₁/∂x − ∂u₂/∂x](0, y);
over γ₃ (−1 ≤ x ≤ 0; y = 0): s₃(x) = ∂u/∂y(x, 0⁺) − ∂u/∂y(x, 0⁻) = [∂u₂/∂y − ∂u₃/∂y](x, 0);
over γ₄ (x = 0; −1 ≤ y ≤ 0): s₄(y) = ∂u/∂x(0⁺, y) − ∂u/∂x(0⁻, y) = [∂u₄/∂x − ∂u₃/∂x](0, y).

Functionals and matching condition
Interface functionals: J_i = ½ ∫_{γ_i} s_i² w dγ_i := J_i(v); that is, explicitly:
J₁ = ½ ∫₀¹ s₁(x)² w(x) dx, J₂ = ½ ∫₀¹ s₂(y)² w(y) dy,
J₃ = ½ ∫₋₁⁰ s₃(x)² w(x) dx, J₄ = ½ ∫₋₁⁰ s₄(y)² w(y) dy
(w(t), t ∈ [0,1], is an optional weighting function, and w(−t) = w(t)).
Matching condition: J₁ = J₂ = J₃ = J₄ = 0.

Adjoint problems
Eight Dirichlet sub-problems:
Δp_i = 0 (Ω_i), p_i = 0 (∂Ω_i \ γ_i), p_i = s_i w (γ_i);
Δq_i = 0 (Ω_i), q_i = 0 (∂Ω_i \ γ_{i+1}), q_i = s_{i+1} w (γ_{i+1}).
Green's formula:
∫_{γ_i} s_i (∂u'_i/∂n) w = ∫_{γ_i} (∂p_i/∂n) v'_i + ∫_{γ_{i+1}} (∂p_i/∂n) v'_{i+1},
∫_{γ_{i+1}} s_{i+1} (∂u'_i/∂n) w = ∫_{γ_i} (∂q_i/∂n) v'_i + ∫_{γ_{i+1}} (∂q_i/∂n) v'_{i+1}.

Functional gradients
Differentiating each interface functional, J'_i = ∫_{γ_i} s_i s'_i w dγ_i, and eliminating the linearized states by means of the adjoint states p_j, q_j gives, e.g. for the first functional,
J'₁ = ∫₀¹ [∂(p₁ − q₄)/∂y](x, 0) v'₁(x) dx + ∫₀¹ [∂p₁/∂x](0, y) v'₂(y) dy + ∫₋₁⁰ [∂q₄/∂x](0, y) v'₄(y) dy,
and analogous expressions for J'₂, J'₃, J'₄, involving (q₁ − p₂), (q₂ − p₃) and (p₄ − q₃) respectively.
Conclusion: J'_i = Σ_{j=1}^4 ∫_{γ_j} G_{i,j} v'_j dγ_j (i = 1,...,4); {G_{i,j}}: partial gradients.

Other technical details
Dirichlet sub-problems: all 12 (4 direct, 4 × 2 adjoint) sub-problems are solved by direct inversion (discrete sine-transform):
u_h = (Ω_X ⊗ Ω_Y) (Λ_X ⊕ Λ_Y)⁻¹ (Ω_X ⊗ Ω_Y) f_h.
Integrals are approximated by the trapezoidal rule.
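For a Dirichlet problem on a uniform square grid, such a direct inversion diagonalizes the 1D second-difference operators with discrete sine transforms. The sketch below is the standard fast Poisson solver written as an illustration of the formula u_h = (Ω_X ⊗ Ω_Y)(Λ_X ⊕ Λ_Y)⁻¹(Ω_X ⊗ Ω_Y) f_h, not the author's code:

```python
import numpy as np
from scipy.fft import dstn, idstn

def poisson_dirichlet(f, L=2.0):
    """Solve -Laplace(u) = f on a square of side L with u = 0 on the boundary,
    2nd-order finite differences on a uniform grid; f holds the interior values
    (shape (m, m)).  Each 1D second-difference operator is diagonalized by the
    type-I discrete sine transform."""
    m = f.shape[0]
    h = L / (m + 1)
    k = np.arange(1, m + 1)
    lam = (2.0 - 2.0 * np.cos(np.pi * k / (m + 1))) / h**2   # eigenvalues of -d^2/dx^2
    F = dstn(f, type=1)                                      # forward sine transform
    U = F / (lam[:, None] + lam[None, :])                    # divide by Lambda_X + Lambda_Y
    return idstn(U, type=1)                                  # back-transform

if __name__ == "__main__":
    m, L = 39, 2.0                          # 40 x 40 mesh -> 39 x 39 interior points
    x = np.linspace(-1.0, 1.0, m + 2)[1:-1]
    X, Y = np.meshgrid(x, x, indexing="ij")
    u_exact = np.sin(np.pi * (X + 1) / 2) * np.sin(np.pi * (Y + 1) / 2)
    f = 2.0 * (np.pi / 2)**2 * u_exact      # -Laplace(u_exact)
    u = poisson_dirichlet(f, L)
    print(np.max(np.abs(u - u_exact)))      # O(h^2) discretization error
```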

Reference method: quasi-Newton steepest descent applied to the agglomerated criterion
J = Σ_{i=1}^4 J_i, with gradient ∇J = Σ_{i=1}^4 ∇J_i.
Iteration: v^{(l+1)} = v^{(l)} − ρ_l ∇J^{(l)}. The stepsize is set so that the predicted decrease equals εJ^{(l)}:
δJ = ∇J^{(l)} · δv^{(l)} = −ρ_l ||∇J^{(l)}||² ⇒ ρ_l = εJ^{(l)} / ||∇J^{(l)}||².

Quasi-Newton steepest descent, ε = 1: convergence history.
Figure: asymptotic (20 iterations) and global (200 iterations) convergence of J₁, J₂, J₃, J₄ and J = Σ J_i.

Quasi-Newton steepest descent, ε = 1: history of gradients over 200 iterations.
Figure: discretized functional gradient of J = Σ_i J_i w.r.t. v₁ and v₂.

Quasi-Newton steepest descent, ε = 1: history of gradients over 200 iterations (cont'd).
Figure: discretized functional gradient of J = Σ_i J_i w.r.t. v₃ and v₄.

Quasi-Newton steepest descent, ε = 1: discrete solution.
Figure: u_h(x, y) surface and contour lines.

Practical determination of the minimum-norm element ω
ω = Σ_{i=1}^4 α_i u_i, u_i = ∇J_i(Y⁰), using the following parameterization of the convex hull:
α₁ = c₁c₂c₃, α₂ = (1 − c₁)c₂c₃, α₃ = (1 − c₂)c₃, α₄ = 1 − c₃,
where (c₁, c₂, c₃) ∈ [0,1]³ are discretized with a step of 0.01; the best set of coefficients is retained.
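A brute-force version of this determination (the 0.01 grid step follows the slide; the function name and the random test data are illustrative):

```python
import numpy as np
from itertools import product

def omega_grid_search(U, step=0.01):
    """Minimum-norm element of the convex hull of four vectors (rows of U),
    by exhaustive search over the parameterization
    alpha = (c1 c2 c3, (1-c1) c2 c3, (1-c2) c3, 1-c3), (c1, c2, c3) in [0,1]^3."""
    U = np.asarray(U, dtype=float)
    grid = np.arange(0.0, 1.0 + step/2, step)
    best, best_norm2 = None, np.inf
    for c1, c2, c3 in product(grid, repeat=3):
        alpha = np.array([c1*c2*c3, (1 - c1)*c2*c3, (1 - c2)*c3, 1 - c3])
        w = alpha @ U
        n2 = w @ w
        if n2 < best_norm2:
            best, best_norm2 = w, n2
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    U = rng.standard_normal((4, 76))       # 4 gradients, 76 design variables (as in the talk)
    omega = omega_grid_search(U)
    print(np.linalg.norm(omega))           # exhaustive search: ~10^6 candidates, slow but simple
```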

Stepsize
Iteration: v^{(l+1)} = v^{(l)} − ρ_l ω^{(l)}. The stepsize is set so that the predicted decrease equals εJ^{(l)}:
δJ = ∇J^{(l)} · δv^{(l)} = −ρ_l (∇J^{(l)} · ω) ⇒ ρ_l = εJ^{(l)} / (∇J^{(l)} · ω).
In practice: ε = 1.

MGDA, ε = 1: asymptotic convergence.
Figure: unscaled (u_i = ∇J_i) vs. scaled (u_i = ∇J_i / J_i).

MGDA, ε = 1: global convergence.
Figure: unscaled (u_i = ∇J_i) vs. scaled (u_i = ∇J_i / J_i).

First conclusions
Scaling is essential; convergence somewhat disappointing. Who is to blame?
- the insufficiently accurate determination of ω;
- the non-optimality of the scaling of the gradients;
- the non-optimality of the step-size, the parameter ε being maintained equal to 1 throughout;
- the large dimension of the design space, here 76 (4 interfaces associated with 19 d.o.f.'s each).

MGDA II: a direct algorithm
Valid when the gradients are linearly independent. User-supplied scaling factors {S_i} (i = 1,...,n), and scaled gradients J'_i = ∇J_i(Y⁰)/S_i (S_i > 0; e.g. S_i = J_i for logarithmic gradients). Perform a Gram-Schmidt orthogonalization with a special normalization:
- set u₁ = J'₁;
- for i = 2,...,n, set u_i = ( J'_i − Σ_{k<i} c_{i,k} u_k ) / A_i, where, for k < i, c_{i,k} = (J'_i, u_k)/(u_k, u_k), and A_i = 1 − Σ_{k<i} c_{i,k} if nonzero, ε_i otherwise (ε_i arbitrary, but small).
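A direct transcription of this construction, as I read it (the fallback value ε_i and the test data below are mine):

```python
import numpy as np

def mgda2_direction(Jp, eps=1e-3):
    """MGDA II: Gram-Schmidt with the special normalization of the slides.
    Jp: array (n, N) of already-scaled gradients J'_i, assumed linearly
    independent.  Returns the orthogonal basis {u_i} and the direction omega."""
    Jp = np.asarray(Jp, dtype=float)
    n = Jp.shape[0]
    u = np.zeros_like(Jp)
    u[0] = Jp[0]
    for i in range(1, n):
        c = np.array([Jp[i] @ u[k] / (u[k] @ u[k]) for k in range(i)])
        A = 1.0 - c.sum()
        if A == 0.0:                 # degenerate case: arbitrary small A_i
            A = eps
        u[i] = (Jp[i] - c @ u[:i]) / A
    # minimum-norm element of the convex hull of the orthogonal u_i
    inv = 1.0 / np.einsum("ij,ij->i", u, u)      # 1 / ||u_i||^2
    alpha = inv / inv.sum()
    return u, alpha @ u

if __name__ == "__main__":
    Jp = np.array([[1.0, 0.2, 0.0],
                   [0.1, 1.0, 0.3],
                   [0.0, 0.2, 1.0]])
    u, omega = mgda2_direction(Jp)
    print([Ji @ omega for Ji in Jp])   # equal Fréchet derivatives, all = ||omega||^2
    print(omega @ omega)
```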

Consequences
The element ω always lies in the interior of the convex hull:
ω = Σ_{i=1}^n α_i u_i, α_i = (1/||u_i||²) / ( Σ_{j=1}^n 1/||u_j||² ) = 1 / ( 1 + Σ_{j≠i} ||u_i||²/||u_j||² ) < 1.
This implies equal projections of ω onto the u_k:
∀k: (u_k, ω) = α_k ||u_k||² = ||ω||²,
and finally (J'_i, ω) = ||ω||² (∀i) (with a possibly-modified scale S'_i = (1 + ε_i) S_i), that is, the same positive constant.
EQUAL FRÉCHET DERIVATIVES FORMED WITH SCALED GRADIENTS.

Geometrical demonstration
Given J'_i = ∇J_i(Y⁰)/S_i, perform the Gram-Schmidt process to compute u_i = [ J'_i − Σ_{k<i} c_{i,k} u_k ] / A_i; then O' ∈ U̇ ⇒ ω = OO' ⇒ (u₁, ω) = (u₂, ω) = (u₃, ω) = ||ω||². Moreover,
J'_i = A_i u_i + Σ_{k<i} c_{i,k} u_k ⇒ (J'_i, ω) = ( A_i + Σ_{k<i} c_{i,k} ) ||ω||² = ||ω||² = const.,
since A_i + Σ_{k<i} c_{i,k} = 1 by the normalization through A_i.
Figure: O, O', u₁, u₂, u₃ and u ∈ U.

MGDA II b: automatic rescale
Do not let the constant A_i become negative; instead, define A_i = 1 − Σ_{k<i} c_{i,k} only if this number is strictly positive. Otherwise, use the modified scale S'_i = ( Σ_{k<i} c_{i,k} ) S_i ("automatic rescale"), so that
c'_{i,k} = c_{i,k} / Σ_{k<i} c_{i,k}, with Σ_{k<i} c'_{i,k} = 1,
and set A_i = ε_i, for some small ε_i.
Same formal conclusion: the Fréchet derivatives are equal; but their value is much larger, and (at least) one criterion has been rescaled.

Some open questions
Is ω_II = ω? Yes, if n = 2 and the angle is obtuse; no, in general.
Is the implication lim ω_II = 0 ⇒ lim ω = 0 true (assuming the limit is the result of the iteration)?
In which order should we perform the Gram-Schmidt process? Etc.

MGDA II, ε = 1: global convergence.
Figure: unscaled (u_i = ∇J_i) vs. scaled (u_i = ∇J_i / J_i).

MGDA II b, ε = 1: global convergence.
Figure: unscaled (u_i = ∇J_i) vs. scaled (u_i = ∇J_i / J_i), automatic rescale on in both cases.

Recap: convergence over 500 iterations.
Figure: unscaled (u_i = ∇J_i), automatic rescale off, vs. scaled (u_i = ∇J_i / J_i), automatic rescale on.

MGDA III (Strelitzia reginae)
Figure: sequence of views of the bird-of-paradise flower used as an illustration, with the gradients u₁, u₂ and the resulting direction ω overlaid.

MGDA III: definition (1)
Start from the scaled gradients {J'_i} and duplicates {g_i}: J'_i = ∇J_i(Y⁰)/S_i (S_i: user-supplied scale), and proceed in 3 steps: A, B and C.
A: Initialization. Set (justified afterwards) u₁ := g₁ = J'_k, where k = Argmax_i min_j (J'_j, J'_i)/(J'_i, J'_i). Set the n × n lower-triangular array c = {c_{i,j}} (i ≥ j) to 0. Set, conservatively, I := n (expected number of computed orthogonal basis vectors). Assign some appropriate value to a cut-off constant a, 0 ≤ a < 1. (Note: the main diagonal of the array c is to contain cumulative row-sums.)

MGDA III: definition (2)
B: Main Gram-Schmidt loop; for i = 2, 3, ..., (at most) n, do:
1. Calculate the (i−1)-st column of coefficients, c_{j,i−1} = (g_j, u_{i−1}) / (u_{i−1}, u_{i−1}) (∀j = i,...,n), and update the cumulative row-sums: c_{j,j} := c_{j,j} + c_{j,i−1} = Σ_{k<i} c_{j,k} (∀j = i,...,n).
2. Test: if the condition c_{j,j} > a (∀j = i,...,n) is satisfied, set I := i−1 and interrupt the Gram-Schmidt process (go to 3). Otherwise, compute the next orthogonal vector u_i as follows:

MGDA III: definition (3)
- Identify the index l = Argmin_j { c_{j,j} : i ≤ j ≤ n } (note that c_{l,l} ≤ a < 1).
- Permute the information associated with i and l: g-vectors g_i ↔ g_l, rows i and l of the array c, and cumulative row-sums c_{i,i} ↔ c_{l,l}.
- Set A_i = 1 − c_{i,i} ≥ 1 − a > 0 (c_{i,i} = former c_{l,l} ≤ a), and calculate u_i = ( g_i − Σ_{k<i} c_{i,k} u_k ) / A_i (in which g_i = former g_l, c_{i,k} = former c_{l,k}).
- If u_i ≠ 0, return to 1 with incremented i; otherwise g_i = Σ_{k<i} c_{i,k} u_k = Σ_{k<i} c'_{i,k} g_k, where the {c'_{i,k}} are calculated by backward substitution. Then, if c'_{i,k} ≥ 0 (∀k < i): Pareto stationarity detected, STOP the iteration; otherwise (ambiguous, exceptional case): STOP the Gram-Schmidt process, compute the original (basic MGDA) ω and go to C.

MGDA III: definition (4)
3. Calculate ω as the minimum-norm element in the convex hull of {u₁, u₂, ..., u_I}:
ω = Σ_{i=1}^I α_i u_i, α_i = (1/||u_i||²) / ( Σ_{j=1}^I 1/||u_j||² ) = 1 / ( 1 + Σ_{j≠i} ||u_i||²/||u_j||² ).
(Note that here all computed u_i ≠ 0, and 0 < α_i < 1.)
C: If ||ω|| < TOL, STOP the iteration; otherwise, perform the descent step and return to B.
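A simplified sketch of steps A, B and 3 follows; the degenerate branches of step 2 (u_i = 0 and the "ambiguous" case) are omitted, the cumulative row-sums are stored in a separate array rather than on the diagonal of c, and the cut-off a and the test gradients are free illustrative choices; this is one reading of the slides, not the reference implementation:

```python
import numpy as np

def mgda3_direction(Jp, a=0.0):
    """Simplified MGDA III: ordered, possibly incomplete Gram-Schmidt on the
    scaled gradients Jp (n x N), with cut-off constant 0 <= a < 1."""
    Jp = np.asarray(Jp, dtype=float)
    n = Jp.shape[0]
    g = Jp.copy()
    # A: initialization -- pick g_1 maximizing the worst normalized inner product
    scores = [min((gj @ gi) / (gi @ gi) for gj in g) for gi in g]
    k0 = int(np.argmax(scores))
    g[[0, k0]] = g[[k0, 0]]
    u = [g[0]]
    c = np.zeros((n, n))          # strict lower part: Gram-Schmidt coefficients
    rowsum = np.zeros(n)          # cumulative row-sums (the slides' diagonal of c)
    I = n
    # B: main ordered Gram-Schmidt loop
    for i in range(1, n):
        for j in range(i, n):
            c[j, i-1] = (g[j] @ u[i-1]) / (u[i-1] @ u[i-1])
            rowsum[j] += c[j, i-1]
        if all(rowsum[j] > a for j in range(i, n)):   # remaining gradients well represented
            I = i
            break
        l = i + int(np.argmin(rowsum[i:n]))
        g[[i, l]] = g[[l, i]]                         # reorder gradients and bookkeeping
        c[[i, l]] = c[[l, i]]
        rowsum[[i, l]] = rowsum[[l, i]]
        A = 1.0 - rowsum[i]                           # >= 1 - a > 0 by construction
        u.append((g[i] - c[i, :i] @ np.array(u)) / A)
    # 3: minimum-norm element of the convex hull of u_1 .. u_I
    u = np.array(u[:I])
    inv = 1.0 / np.einsum("ij,ij->i", u, u)
    alpha = inv / inv.sum()
    return alpha @ u

if __name__ == "__main__":
    Jp = np.array([[1.0, 0.1, 0.0],
                   [0.9, 0.2, 0.1],
                   [0.8, 0.0, 0.2]])   # gradients sharing a general trend
    omega = mgda3_direction(Jp, a=0.3)
    print([Ji @ omega for Ji in Jp])   # all Fréchet derivatives exceed a * ||omega||^2
```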

MGDA III: consequences
If I = n: MGDA III = MGDA II b + automatic ordering in the Gram-Schmidt process, making the rescale unnecessary since, by construction, ∀i: A_i ≥ 1 − a > 0.
Otherwise, I < n (incomplete Gram-Schmidt process):
- first I Fréchet derivatives: (g_i, ω) = (u_i, ω) = ||ω||² > 0 (∀i = 1,...,I);
- subsequent ones (i > I): ω = Σ_{j=1}^I α_j u_j, g_i = Σ_{k=1}^I c_{i,k} u_k + v_i with v_i ⊥ {u₁, u₂, ..., u_I}, hence
(g_i, ω) = Σ_{k=1}^I c_{i,k} (u_k, ω) = ( Σ_{k=1}^I c_{i,k} ) ||ω||² = c_{i,i} ||ω||² > a ||ω||² > 0.

Choice of g₁: justification
At initialization, we have set u₁ := g₁ = J'_k, where k = Argmax_i min_j (J'_j, J'_i)/(J'_i, J'_i). This is equivalent to maximizing c_{2,1} = c_{2,2}, that is, to maximizing the least cumulative row-sum at the first estimation. (Recall that the favorable situation is the one in which all cumulative row-sums are positive, or > a.)

Expected benefits
Automatic ordering and rescale. When the gradients exhibit a general trend, ω is found in fewer steps and has a larger norm. A larger ||ω|| implies larger directional derivatives, (g_i, ω/||ω||) = ||ω|| or > a||ω||, and hence a greater efficiency of the descent step.

When Hessians are known: addressing the question of scaling
For the objective function J_i(Y) alone, Newton's iteration writes Y₁ = Y₀ − p_i, with H_i p_i = ∇J_i(Y⁰). Split p_i into orthogonal components p_i = q_i + r_i, where
q_i = ( p_i, ∇J_i(Y⁰) ) ∇J_i(Y⁰) / ||∇J_i(Y⁰)||² and r_i ⊥ ∇J_i(Y⁰).
Define the scaled gradient J'_i = q_i and apply MGDA III. Equivalently, set the scaling factor as follows:
S_i = ||∇J_i(Y⁰)||² / ( p_i, ∇J_i(Y⁰) ).

Variant b
Use BFGS iterative estimates in place of the exact Hessians:
H_i^{(k+1)} = H_i^{(k)} − ( H_i^{(k)} s^{(k)} s^{(k)T} H_i^{(k)} ) / ( s^{(k)T} H_i^{(k)} s^{(k)} ) + ( z_i^{(k)} z_i^{(k)T} ) / ( z_i^{(k)T} s^{(k)} ),
k: iteration index (∀i = 1,...,n), in which H_i^{(0)} = Id, s^{(k)} = Y^{(k+1)} − Y^{(k)}, z_i^{(k)} = ∇J_i(Y^{(k+1)}) − ∇J_i(Y^{(k)}).
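This is the classical BFGS formula for the Hessian approximation (not its inverse), maintained separately for each criterion; a minimal sketch:

```python
import numpy as np

def bfgs_update(H, s, z):
    """One BFGS update of a Hessian approximation H:
    H_new = H - (H s s^T H)/(s^T H s) + (z z^T)/(z^T s),
    with s = Y_{k+1} - Y_k and z = grad J_i(Y_{k+1}) - grad J_i(Y_k)."""
    Hs = H @ s
    return H - np.outer(Hs, Hs) / (s @ Hs) + np.outer(z, z) / (z @ s)

if __name__ == "__main__":
    # quadratic check: J(Y) = 1/2 Y^T A Y, so the gradient differences are z = A s
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    H = np.eye(2)                            # H^(0) = Id, as on the slide
    rng = np.random.default_rng(0)
    for _ in range(10):
        s = rng.standard_normal(2)
        H = bfgs_update(H, s, A @ s)
    print(np.allclose(H @ s, A @ s))         # the secant condition holds for the last s
    print(H)
```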

Recommended step-size
J_i(Y⁰ − ρω) = J_i(Y⁰) − ρ (∇J_i(Y⁰), ω) + ½ ρ² (H_i ω, ω) + ...
δJ_i := J_i(Y⁰) − J_i(Y⁰ − ρω) ≈ A_i ρ − ½ B_i ρ², where A_i = S_i a_i, B_i = S_i b_i, and a_i = (J'_i, ω), b_i = (H_i ω, ω)/S_i.
If the scales {S_i} are truly relevant for the objective-functions:
ρ* = Argmax_ρ min_i δ_i, where δ_i = δJ_i / S_i = a_i ρ − ½ b_i ρ².

Recommended step-size (2)
If I = n: ρ* = ρ_I := ||ω||² / b_I, where b_I = max_{i ≤ I} b_i is assumed to be positive.
If I < n: ρ_II := a ||ω||² / b_II, where b_II = max_{i > I} b_i is assumed to be positive. Also
ρ̂ = 2(1 − a)||ω||² / (b_I − b_II),
at which point the two bounding parabolas intersect: ||ω||² ρ − ½ b_I ρ² = a ||ω||² ρ − ½ b_II ρ².

Recommended step-size (3)
Figure: δ versus ρ in the four cases: a) I = n; b) I < n, ρ̂ > max(ρ_I, ρ_II); c) I < n, ρ̂ ∈ (ρ_I, ρ_II); d) I < n, ρ̂ < min(ρ_I, ρ_II).

Recommended step-size (end)
If I = n, ρ* = ρ_I. If I < n, ρ* is the element of the triplet {ρ_I, ρ_II, ρ̂} which separates the other two.
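The selection rule can be written as a short helper (a sketch of my reading of the three candidate step-sizes; variable names and the example values are mine):

```python
import numpy as np

def recommended_step(omega, b, I, a):
    """Candidate step-sizes of the slides.
    omega : common descent direction;  b : list of b_i = (H_i omega, omega)/S_i;
    I     : number of orthogonal basis vectors actually computed (I <= n);
    a     : cut-off constant used in MGDA III.
    If I == n the recommended step is rho_I; otherwise it is the element of
    {rho_I, rho_II, rho_hat} that separates the other two (their median).
    Assumes b_I > 0, b_II > 0 and b_I != b_II."""
    w2 = float(omega @ omega)
    b = np.asarray(b, dtype=float)
    b_I = b[:I].max()
    rho_I = w2 / b_I
    if I == len(b):
        return rho_I
    b_II = b[I:].max()
    rho_II = a * w2 / b_II
    rho_hat = 2.0 * (1.0 - a) * w2 / (b_I - b_II)   # intersection of the bounding parabolas
    return float(np.median([rho_I, rho_II, rho_hat]))

if __name__ == "__main__":
    omega = np.array([0.3, 0.4])
    print(recommended_step(omega, b=[2.0, 1.5, 0.5], I=2, a=0.3))
```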

Multiple-Gradient Descent Algorithm: CONCLUSIONS (theory)
- Notion of Pareto stationarity introduced to characterize "standard" Pareto-optimal points.
- MGDA makes it possible to identify a descent direction common to an arbitrary number of criteria, knowing the local gradients, in number at most equal to the number of design parameters.
- Proof of convergence of MGDA to Pareto-stationary points.
- Capability to identify the Pareto front demonstrated on mathematical test-cases; non-convexity of the Pareto front is not a problem.
- MGDA II: a direct procedure, faster and more accurate, when the gradients are linearly independent; variant b (with automatic rescale) recommended.
- On-going research: scaling, preconditioning, step-size adjustment, extension to cases of linearly-dependent gradients, ...

CONCLUSIONS (applications)
- Implementation and first experiments.
- Validation on analytical test cases and comparison with a Pareto-capturing method (PAES).
- Extension to meta-model-assisted MGDA and an automobile cooling-system test case.
- Successful design experiments with 3D compressible flow (Euler and Navier-Stokes).
- On-going development of meta-model-assisted MGDA and testing on a 3D external-aerodynamics case (to be presented at ECCOMAS 2012, Vienna, Austria).

CONCLUSIONS (applications, cont'd): domain-partitioning model problem
The method works, but it is not as cost-efficient as the standard quasi-Newton method applied to the agglomerated criterion; however, the test-case was peculiar in several ways:
- Pareto front and Pareto set reduced to a singleton;
- large dimension of the design space (4 criteria, 76 design variables);
- a robust procedure to adjust the step-size is needed;
- the determination of ω in the basic method should be made more accurate, perhaps using iterative refinement;
- the scaling of the gradients was found important but never fully analyzed (logarithmic gradients appropriate for vanishing criteria);
- general preconditioning not yet clear (on-going).
The direct variant MGDA II was found faster and more robust; with the automatic rescaling procedure on, MGDA II b seems to exhibit quadratic convergence.

Flirting with the Pareto front: cooperation AND competition
MGDA is combined with a locally-adapted Nash game based on a territory splitting that preserves the Pareto-stationarity condition to second order.
COOPERATION (MGDA): the design point is steered to the Pareto front.
COMPETITION (Nash game): at the start (from the Pareto front), α₁ ∇J₁ + α₂ ∇J₂ = 0; then let J_A = α₁ J₁ + α₂ J₂ and J_B = J₂, and adapt the territory-splitting to best preserve J_A; then the trajectory remains tangent to the Pareto front.

Some references
Reports from the INRIA Open Archive: M.G.D.A., INRIA Research Report No. 6953 (2009), revised version, October 2012 (Open Archive: http://hal.inria.fr/inria-00389811).
Related INRIA Research Reports: Nos. 7667 (2011), 7922 (2012), 7968 (2012), 8068 (2012).
Other: C. R. Acad. Sci. Paris, Ser. I 350 (2012) 313-318.