Machine Learning: A change of paradigm in Flow Control?

Laurent CORDIER (Laurent.Cordier@univ-poitiers.fr)
PPRIME Institute, Poitiers, France

GDR CDD, Nantes, November 19, 2015

Flow control: state of the art
1. Phenomenological approaches
   Pros: physically based; work experimentally.
   Cons: restricted to one type of flow physics.
2. Model-based control
   (a) Based on identification: ARMAX, ERA, OKID
       Pros: purely data-driven; work experimentally.
       Cons: restricted to one type of flow physics (linearized behaviour).
   (b) Based on first-principles equations and (optionally) data
       Pros: rigorous approach.
       Cons: purely numerical; too fragile to work in most real configurations.
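To make route (a) concrete, here is a minimal sketch that is not taken from the talk: a least-squares fit of a first-order ARX model $y_{k+1} = a\,y_k + b\,u_k$, i.e. ARMAX without the moving-average noise term. The model order, the name `fit_arx1` and the test system are assumptions for illustration. ERA and OKID instead build a state-space model from impulse-response (Markov) parameters and are not shown. Because the fitted model is linear, it only captures linearized behaviour, which is the limitation listed above.

```python
import random


def fit_arx1(y, u):
    """Least-squares fit of y[k+1] = a*y[k] + b*u[k] (first-order ARX, no noise model)."""
    # Accumulate the 2x2 normal equations [[syy, syu], [syu, suu]] [a, b]^T = [ry, ru]^T
    syy = syu = suu = ry = ru = 0.0
    for k in range(len(y) - 1):
        syy += y[k] * y[k]
        syu += y[k] * u[k]
        suu += u[k] * u[k]
        ry += y[k + 1] * y[k]
        ru += y[k + 1] * u[k]
    det = syy * suu - syu * syu
    # Solve the 2x2 system with Cramer's rule
    a = (ry * suu - syu * ru) / det
    b = (syy * ru - syu * ry) / det
    return a, b


if __name__ == "__main__":
    # Hypothetical plant y[k+1] = 0.9 y[k] + 0.5 u[k] excited by a random input
    rng = random.Random(0)
    u = [rng.uniform(-1.0, 1.0) for _ in range(200)]
    y = [0.0]
    for k in range(199):
        y.append(0.9 * y[k] + 0.5 * u[k])
    print(fit_arx1(y, u))
```

With noise-free data the fit recovers the true coefficients up to rounding; with measurement noise a plain least-squares ARX fit becomes biased, which is one motivation for the noise model in ARMAX.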

1 Model-based control

2 Machine Learning

Machine learning: definitions and applications
Arthur Samuel (1959): "Field of study that gives computers the ability to learn without being explicitly programmed."
Tom Mitchell (1998), well-posed learning problem: "A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E."
Applications: database mining, games, autonomous helicopters, handwriting recognition, natural language processing, ...

Machine learning: sub-categories
1. Supervised learning
   Learn a mapping from inputs $x$ to outputs $y$ given a labeled set $\mathcal{D}_{SL} = \{(x_i, y_i)\}_{i=1}^{N}$.
   Examples: classification (pattern recognition), regression.
2. Unsupervised learning
   Given only inputs $\mathcal{D}_{UL} = \{x_i\}_{i=1}^{N}$, discover interesting patterns.
   Examples: clustering, dimensionality reduction (PCA).
3. Reinforcement learning
   Learn how to take actions in an environment so as to maximize a cumulative reward.
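PCA, the dimensionality-reduction example above, can be illustrated on 2-D data, where the eigenproblem of the covariance matrix has a closed form. This is a hypothetical pure-Python sketch (the name `pca_2d` is an assumption); for flow snapshots, PCA (known as POD in fluid mechanics) is normally computed with an SVD of a large snapshot matrix.

```python
import math


def pca_2d(points):
    """Return (mean, leading principal direction, explained-variance ratio) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix [[a, b], [b, c]] of the centred data
    a = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    c = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix
    half_trace = (a + c) / 2
    disc = math.sqrt(((a - c) / 2) ** 2 + b * b)
    lam1, lam2 = half_trace + disc, half_trace - disc
    # Eigenvector of lam1: (lam1 - c, b), or a coordinate axis if b == 0
    if abs(b) > 1e-12:
        vx, vy = lam1 - c, b
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return (mx, my), (vx / norm, vy / norm), lam1 / (lam1 + lam2)
```

For points lying on the line $y = 2x$, the leading direction is $(1, 2)/\sqrt{5}$ and it explains all of the variance, so one coordinate suffices to describe the data.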

3 Genetic Programming Control

Genetic programming basics
Step 1: first generation of random nonlinear control laws $b^1_m = K^1_m(s)$, $m = 1, \dots, 100$.
Steps 2 ... n: biologically inspired optimization of the control laws based on their fitness grades $J[b = K(s)]$.
Reference: J. R. Koza (1992), Genetic Programming, The MIT Press.
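The two steps above can be sketched as a minimal genetic-programming loop. Everything here is an assumption for illustration, not the implementation used in the talk: control laws are nested tuples, the primitive set is reduced to +, -, * and tanh, the cost J is the mean squared deviation from a hypothetical target law on a grid of sensor values (in MLC, J would be measured on the plant), and only replication (elitism) and mutation are used, with tournament selection.

```python
import math
import random

# Hypothetical primitive set (the slides also show /, exp, log, sin, cos)
BINARY = {"+": lambda x, y: x + y, "-": lambda x, y: x - y, "*": lambda x, y: x * y}
UNARY = {"tanh": math.tanh}
N_SENSORS = 2  # sensor signals s[0], s[1] (assumed)
LEAVES = ("s", "c")


def random_tree(rng, depth):
    """Grow a random law; leaves are sensors ("s", i) or constants ("c", value)."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.5:
            return ("s", rng.randrange(N_SENSORS))
        return ("c", rng.uniform(-1.0, 1.0))
    if rng.random() < 0.7:
        op = rng.choice(sorted(BINARY))
        return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))
    return ("tanh", random_tree(rng, depth - 1))


def evaluate(tree, s):
    """Actuation command b = K(s) for the sensor vector s."""
    tag = tree[0]
    if tag == "s":
        return s[tree[1]]
    if tag == "c":
        return tree[1]
    if tag in BINARY:
        return BINARY[tag](evaluate(tree[1], s), evaluate(tree[2], s))
    return UNARY[tag](evaluate(tree[1], s))


def depth(tree):
    if tree[0] in LEAVES:
        return 0
    return 1 + max(depth(child) for child in tree[1:])


def mutate(rng, tree):
    """Replace a randomly chosen subtree by a fresh random subtree."""
    if tree[0] in LEAVES or rng.random() < 0.3:
        return random_tree(rng, 2)
    children = list(tree[1:])
    i = rng.randrange(len(children))
    children[i] = mutate(rng, children[i])
    return (tree[0], *children)


def cost(tree, samples):
    """Fitness grade J (lower is better): deviation from a hypothetical target law."""
    err = 0.0
    for s in samples:
        d = evaluate(tree, s) - (math.tanh(s[0]) - s[1])
        err += d * d
    j = err / len(samples)
    return j if math.isfinite(j) else float("inf")


def tournament(rng, scored, k=5):
    """Return the law with the lowest J among k randomly drawn individuals."""
    return min(rng.sample(scored, k), key=lambda item: item[0])[1]


def run_gp(generations=20, pop_size=100, seed=0):
    """Return (J, law) of the best individual in the last evaluated generation."""
    rng = random.Random(seed)
    samples = [(x / 2, y / 2) for x in range(-2, 3) for y in range(-2, 3)]
    # Step 1: first generation of random control laws b = K_m(s), m = 1..pop_size
    population = [random_tree(rng, 4) for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        scored = sorted(((cost(t, samples), t) for t in population), key=lambda item: item[0])
        best = scored[0]
        # Replication: the best 10% are copied unchanged (elitism)
        nxt = [t for _, t in scored[: pop_size // 10]]
        # Mutation of tournament-selected parents fills the rest; depth is capped at 8
        while len(nxt) < pop_size:
            child = mutate(rng, tournament(rng, scored))
            nxt.append(child if depth(child) <= 8 else tournament(rng, scored))
        population = nxt
    return best
```

Because the best individuals are replicated unchanged, the lowest cost can never increase from one generation to the next; the depth cap keeps the evaluated expressions small and bounded.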

Genetic programming: operations
[Figure: expression trees illustrating the three operations REPLICATION, CROSS-OVER and MUTATION. Internal nodes are operators and functions (+, *, /, exp, log, sin, cos, tanh), leaves are sensor signals s1 ... s4 and constants C, and the root gives the actuation command b.]
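The three operations in the figure act on expression trees. A self-contained sketch, assuming trees stored as nested tuples such as `("+", ("s", 1), ("c", 2.0))` (a hypothetical representation): replication copies an individual unchanged, cross-over swaps two randomly chosen subtrees between two parents, and mutation replaces a random subtree. In a full GP run the new subtree for mutation would be generated at random; here it is passed in explicitly to keep the sketch short.

```python
import random

LEAF_TAGS = ("s", "c")  # ("s", i): sensor s_i, ("c", value): constant C


def paths(tree, prefix=()):
    """Index path of every node in the tree (the root is the empty path)."""
    out = [prefix]
    if tree[0] not in LEAF_TAGS:
        for i, child in enumerate(tree[1:], start=1):
            out.extend(paths(child, prefix + (i,)))
    return out


def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace(tree, path, new):
    """Copy of tree with the node at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]


def replication(parent):
    return parent  # tuples are immutable, so the copy is the individual itself


def crossover(rng, p1, p2):
    """Swap one randomly chosen subtree of p1 with one of p2."""
    a, b = rng.choice(paths(p1)), rng.choice(paths(p2))
    return replace(p1, a, get(p2, b)), replace(p2, b, get(p1, a))


def mutation(rng, parent, new_subtree):
    """Replace one randomly chosen subtree of parent by new_subtree."""
    return replace(parent, rng.choice(paths(parent)), new_subtree)


def to_str(tree):
    tag = tree[0]
    if tag == "s":
        return f"s{tree[1]}"
    if tag == "c":
        return f"{tree[1]:g}"
    if len(tree) == 2:
        return f"{tag}({to_str(tree[1])})"
    return f"({to_str(tree[1])} {tag} {to_str(tree[2])})"


if __name__ == "__main__":
    rng = random.Random(1)
    p1 = ("+", ("*", ("s", 1), ("c", 2.0)), ("exp", ("s", 2)))
    p2 = ("*", ("s", 3), ("log", ("s", 1)))
    c1, c2 = crossover(rng, p1, p2)
    print(to_str(c1), "|", to_str(c2))
```

Cross-over only exchanges material, so the total number of nodes in the two offspring equals that of the two parents; mutation is the operation that injects new material.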

Bayesian Control
1st International Workshop on Bayesian Inference for Modelling and Control in Fluid Mechanics, 14/04/2015, Poitiers, France
Genetic programming (GP) for closed-loop flow control
V. Parezanovic (1), L. Cordier (1), B. R. Noack (1), J. P. Bonnet (1), T. Duriez (2), M. Segond (3), M. Abel (3), S. L. Brunton (4)
(1) Institut PPRIME, CNRS, Poitiers, France; (2) CONICET, Buenos Aires, Argentina; (3) Ambrosys GmbH, Potsdam, Germany; (4) University of Washington, Seattle, USA
Project support: ANR Chair of Excellence "Closed-loop control of turbulent shear flows using reduced-order models" (TUCOROM)

Experimental setup
TUCOROM demonstrator for control of the mixing layer.
[Figure: wind-tunnel schematic labelled with fans 1 and 2 (side by side), flow intake, splitter plate, grid, tripwires, 2D displacement system, hot-wire rake, honeycomb, foam, ramp, settling chambers, convergent, test section and diffuser; dimensions 2650 mm and 1000 mm.]
Wind tunnel: long test section, independently driven streams, velocity range 0 to 12 m/s.

Machine learning control design: learning phase
[Diagram: the mixing-layer plant in a loop with the learning module (genetic programming); the control-law output passes through a Heaviside function, marked with a question mark on the slide.]

Machine learning control design: real-time phase
[Diagram: the mixing-layer plant driven by an independent real-time controller that maps the sensor signals s1 ... sn through the learned law and a Heaviside function.]
The learning module is disconnected.
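The real-time phase can be sketched as follows, assuming (this is an interpretation of the diagrams, not stated on the slides) that the Heaviside function turns the learned law into an on/off actuator command $b = H(K(s))$, which would be consistent with the duty cycles reported in the results. The law `learned_law` and the sensor readings are hypothetical stand-ins for the best GP individual and the hot-wire signals.

```python
import math
from typing import Callable, Sequence

Law = Callable[[Sequence[float]], float]


def heaviside(x: float) -> int:
    """On/off command: 1 if x > 0, else 0 (the convention H(0) = 0 is an assumption)."""
    return 1 if x > 0 else 0


def make_controller(law: Law) -> Callable[[Sequence[float]], int]:
    """Freeze a learned law K and return the real-time map s -> b = H(K(s))."""
    def controller(sensors: Sequence[float]) -> int:
        return heaviside(law(sensors))
    return controller


def learned_law(s: Sequence[float]) -> float:
    """Hypothetical stand-in for the best GP individual K(s1, s2)."""
    return math.tanh(s[0]) - 0.5 * s[1]


controller = make_controller(learned_law)
# Stand-in for successive sensor readings (s1, s2) arriving in real time
readings = [(0.2, 0.1), (-0.3, 0.4), (1.0, 2.5)]
commands = [controller(s) for s in readings]
```

Once the law is frozen, the controller is a pure function of the current sensor values, so it can run without the learning module attached.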

MLC results (i): frequency selection
[Figure: u' [m²/s²] in the (t [s], y [mm]) plane at x = 200 mm for the unactuated, open-loop and MLC cases; f_a is the actuation frequency and dc the duty cycle.]
Objective Max W at x = 200 mm (mixing-layer thickness): open loop f_a = 21 Hz, dc = 50%; MLC f_a = 19 Hz, dc = 48%; reported gains +132% and +120%.
Objective Max K at x = 200 mm (mixing-layer fluctuation energy): open loop f_a = 12 Hz, dc = 70%, +144%; MLC f_a = 12 Hz, dc = 62%, +152%.

4 Cluster Reduced-Order Model
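The slides of this section did not survive extraction. As a hedged sketch of the general idea of a cluster-based reduced-order model, assuming its usual two ingredients: snapshots (here 2-D feature vectors) are grouped by k-means, and the dynamics is approximated by a Markov chain whose transition matrix counts jumps between the clusters of successive snapshots. All names and data are hypothetical.

```python
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means: return (centroids, cluster label of each point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid in squared Euclidean distance
        labels = [min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
                  for p in points]
        # Update step: each centroid moves to the mean of its members (unchanged if empty)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


def transition_matrix(labels, k):
    """P[i][j]: probability of moving from cluster j to cluster i in one time step."""
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(labels, labels[1:]):
        counts[b][a] += 1
    P = [[0.0] * k for _ in range(k)]
    for j in range(k):
        total = sum(counts[i][j] for i in range(k))
        for i in range(k):
            P[i][j] = counts[i][j] / total if total else 0.0
    return P


def propagate(P, prob):
    """One step of the cluster-probability dynamics p_{n+1} = P p_n."""
    k = len(P)
    return [sum(P[i][j] * prob[j] for j in range(k)) for i in range(k)]
```

The matrix is column-stochastic, so propagating a probability vector keeps it normalized; a periodic flow that alternates between two states shows up as a purely off-diagonal transition matrix.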

5 Reinforcement Learning

Reinforcement learning set-up
[Diagram: the agent sends an action a to the environment and receives a reward r and the new state s.]
The agent interacts with the environment to gain knowledge: it explores and receives rewards, its actions change the state of the environment, and it chooses actions so as to maximize the long-term reward.
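One standard way to realize this loop is tabular Q-learning. The talk does not say which algorithm it uses, so the following is only an illustrative sketch on a hypothetical 5-state chain where the single reward is obtained on reaching the right end; the learning rate, discount and exploration rate are assumed values.

```python
import random

N_STATES, GOAL = 5, 4  # hypothetical chain: states 0..4, goal (terminal) at 4
ACTIONS = (-1, +1)     # action 0 moves left, action 1 moves right


def env_step(state, action):
    """Environment: apply the action and return (next_state, reward, done)."""
    nxt = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def greedy(q, state, rng):
    """Best action under the current estimates, ties broken at random."""
    best = max(q[state])
    return rng.choice([a for a, v in enumerate(q[state]) if v == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(100):
            # Explore with probability eps, otherwise exploit current knowledge
            a = rng.randrange(2) if rng.random() < eps else greedy(q, s, rng)
            s2, r, done = env_step(s, a)
            # Temporal-difference update towards r + gamma * max_a' Q(s', a')
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q
```

After training, the greedy policy moves right in every non-terminal state, and $Q(s, \text{right})$ approaches $\gamma^{3-s}$, the discounted value of the reward three minus $s$ steps later.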

Markov decision process
Definition:
$S$: state space (finite), $s_k \in S$
$A$: action space (finite), $a_k \in A$
Transition probability: $p(s_{k+1} \mid s_k, a_k)$
$r$: reward function, $r_{k+1} = r(s_k, a_k, s_{k+1})$
$\gamma \in [0, 1[$: discount factor
$\Pi$: policy, either deterministic, $a = \Pi(s)$, or stochastic, $p_\Pi(a \mid s) = \Pi(a \mid s)$
Objective: find a policy $\Pi$ that maximizes the expected long-term reward
$$V^\Pi(s) = \mathbb{E}\left[\sum_{k=0}^{+\infty} \gamma^k r_{k+1} \,\middle|\, s_0 = s, \Pi\right], \qquad r_{k+1} = r(s_k, a_k, s_{k+1}).$$
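For a fixed policy, $V^\Pi$ satisfies the Bellman expectation equation $V^\Pi(s) = \sum_a \Pi(a \mid s) \sum_{s'} p(s' \mid s, a)\,[\,r(s, a, s') + \gamma V^\Pi(s')\,]$, which can be iterated to its fixed point because $\gamma < 1$. A minimal sketch of this iterative policy evaluation, with the MDP stored in plain dictionaries (the layout and the name `evaluate_policy` are assumptions):

```python
def evaluate_policy(states, p, r, policy, gamma=0.9, tol=1e-10):
    """Iterative policy evaluation for a finite MDP.

    p[(s, a)]       -> {s_next: transition probability}
    r(s, a, s_next) -> reward r_{k+1}
    policy[s]       -> {a: Pi(a | s)}  (a deterministic policy puts probability 1 on one action)
    """
    v = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Right-hand side of the Bellman expectation equation at state s
            new = sum(pi_a * sum(prob * (r(s, a, s2) + gamma * v[s2])
                                 for s2, prob in p[(s, a)].items())
                      for a, pi_a in policy[s].items())
            delta = max(delta, abs(new - v[s]))
            v[s] = new  # in-place (Gauss-Seidel) update
        if delta < tol:
            return v
```

As a check, a two-state chain that alternates A, B, A, ... and pays 1 on each arrival in B gives $V(A) = 1/(1 - \gamma^2)$ and $V(B) = \gamma\,V(A)$.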

Hash functions (2)

2D cylinder wake: control performance and effect of the noise
[Figure: two cases, without noise and with noise. Left: $a(t)$. Right: $C_d(t)$ under three different control policies: zero command (black), best known command (ora), and the present approach.]

Questions?
