Development of a Deep Recurrent Neural Network Controller for Flight Applications

Development of a Deep Recurrent Neural Network Controller for Flight Applications. American Control Conference (ACC), May 26, 2017. Scott A. Nivison and Pramod P. Khargonekar, Department of Electrical and Computer Engineering, University of Florida.

Outline: Inspiration; DLC Limitations and Research Goals; Background - Direct Policy Search; Flight Vehicle (Plant) Model; Architecture - Deep Learning Flight Control; Optimization - Deep Learning Controller; Simulation Results; Conclusions; Future Work. Distribution A: Approved for public release; distribution is unlimited.

Inspiration. Deep Learning (DL): dominant performance in multiple machine learning areas (speech, language, vision, face recognition, etc.); replaces time-consuming, hand-tuned machine learning methods. Key breakthroughs in establishing deep learning: optimization methods (stochastic gradient descent, Hessian-free, Nesterov's momentum, etc.); deep recurrent neural network (RNN) architectures (deep stacked RNN, deep bidirectional RNN, etc.); RNN modules (Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM)); automatic gradient computation; parallel computing. Connecting deep learning to control: Guided Policy Search uses trajectory optimization to assist policy learning (S. Levine and V. Koltun, 2013); Hessian-free optimization for learning deep RNN control under uncertainties (I. Sutskever, 2013).

DLC Limitations and Research Goals. Policy search/deep learning (DL) control limitations: large number of computations at each time step; no standard training procedures for control tasks; few analysis tools for deep neural network architectures; the non-convex form of the optimization provides few guarantees; lack of research on optimization with regard to robustness; most RL approaches combine modeled dynamics with real-life trials to tune the policy (not practical for flight control). Research goal: develop a robust deep recurrent neural network (RNN) controller with gated recurrent unit (GRU) modules for a highly agile, high-speed flight vehicle, trained on a set of sample trajectories that contain disturbances, aerodynamic uncertainties, and significant control attenuation/amplification in the plant dynamics during optimization.

Direct Policy Search Background. Reinforcement Learning (RL): develop methods to sufficiently train an agent by maximizing a reward function through repeated interactions with its environment. Adaptive longitudinal controller with Markovian short-period dynamics for the high-speed flight vehicle: x_{t+1} = f(x_t, u_t) + w, x_0 ~ p(x_0). Find a parametrized policy (π_Θ) for the finite-horizon problem: π_Θ* = argmax_{π_Θ} J(Θ) = argmax_{π_Θ} Σ_{t=1}^{t_f} γ^t E[r(x_t, u_t) | π_Θ], with γ ∈ [0, 1]. Certainty-Equivalence (CE) Assumption: the optimal policy for the learned model corresponds to the optimal policy for the true dynamics. How to use models for long-term predictions: 1. stochastic inference (i.e., trajectory sampling); 2. deterministic approximate inference. *M. P. Deisenroth et al., "A survey on policy search for robotics," Foundations and Trends in Robotics, vol. 2, 2013.

Direct Policy Search Background. Stochastic sampling: expected long-term reward J(Θ) = Σ_{t=1}^{t_f} γ^t E[r(x_t, u_t) | π_Θ]. Approximation of J(Θ) from N sampled trajectories: Ĵ(Θ) = (1/N) Σ_{i=1}^{N} Σ_{t=1}^{t_f} γ^t r(x_t^i), with lim_{N→∞} Ĵ(Θ) = J(Θ). Deterministic approximations: approximate p(x_t) (e.g., unscented transformation or moment matching), E[r(x_t, u_t) | π_Θ] = ∫ r(x_t) p(x_t) dx_t, with p(x_t) ≈ N(x_t | μ_t^x, Σ_t^x). *M. P. Deisenroth et al., "A survey on policy search for robotics," Foundations and Trends in Robotics, vol. 2, 2013.
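
To make the stochastic-sampling approximation concrete, the sketch below estimates Ĵ(Θ) by averaging discounted returns over N sampled trajectories. The scalar dynamics f, reward r, and linear policy here are illustrative placeholders, not the flight-vehicle model or GRU controller from the talk.

# Minimal sketch of the stochastic-sampling estimate of J(Theta).
# The scalar dynamics, reward, and linear policy are placeholders, not the
# flight-vehicle plant or the deep RNN controller from the paper.
import numpy as np

rng = np.random.default_rng(0)

def f(x, u):                       # placeholder plant dynamics
    return 0.9 * x + 0.1 * u

def reward(x, u):                  # placeholder instantaneous reward r(x, u)
    return -(x ** 2 + 0.01 * u ** 2)

def policy(theta, x):              # simple parametrized policy u = pi_Theta(x)
    return theta * x

def estimate_J(theta, N=1000, t_f=50, gamma=0.99, noise_std=0.05):
    """Monte Carlo approximation: J_hat = (1/N) sum_i sum_t gamma^t r(x_t^i, u_t^i)."""
    total = 0.0
    for _ in range(N):
        x = rng.normal()                                   # x_0 ~ p(x_0)
        for t in range(t_f):
            u = policy(theta, x)
            total += (gamma ** t) * reward(x, u)
            x = f(x, u) + rng.normal(scale=noise_std)      # x_{t+1} = f(x_t, u_t) + w
    return total / N

print(estimate_J(theta=-0.5))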

Deep Learning based Flight Control. Deep learning training architecture; generalized form of the discrete plant dynamics: x_t are the states of the plant; u_t^act is the actuator output; y_t is the output vector; ζ_p is the plant noise; λ_u is the control effectiveness; d_u is an input disturbance; ρ_u, ρ_α, ρ_q are uncertainty parameters.
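
As a rough illustration of how these quantities could enter one step of a simulated training trajectory, the sketch below applies the uncertainty scalings, control effectiveness, input disturbance, and plant noise to a simple linear discrete model. The matrices and the exact way each parameter multiplies the dynamics are assumptions for illustration, not the paper's plant equations.

# Sketch of one step of a generalized discrete plant with uncertainty knobs.
# A, B, C and the way rho_alpha, rho_q, rho_u, lambda_u, d_u, zeta_p enter the
# dynamics are illustrative assumptions, not the exact model from the paper.
import numpy as np

def plant_step(x, u_act, p, rng):
    A = np.array([[0.99, 0.01],        # nominal discrete short-period-style dynamics
                  [-0.02, 0.97]])
    B = np.array([0.0, 0.05])
    C = np.array([1.0, 0.0])

    rho = np.array([p["rho_alpha"], p["rho_q"]])        # state-wise uncertainty scalings
    zeta_p = rng.normal(scale=p["noise_std"], size=2)   # plant noise
    # Control effectiveness, input uncertainty, and input disturbance on the actuator output.
    u_eff = p["lambda_u"] * p["rho_u"] * u_act + p["d_u"]

    x_next = rho * (A @ x) + B * u_eff + zeta_p
    y = C @ x_next                                      # output vector
    return x_next, y

rng = np.random.default_rng(0)
params = dict(rho_alpha=1.05, rho_q=0.95, rho_u=1.0, lambda_u=0.7, d_u=0.01, noise_std=1e-3)
x_next, y = plant_step(np.zeros(2), u_act=0.1, p=params, rng=rng)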

Flight Vehicle Model. Longitudinal rigid-body dynamics, force and moment equations, and aerodynamic coefficients (longitudinal) (equations shown as figures on the slide).

Deep Learning Controller - Architecture. Stacked Recurrent Neural Network (S-RNN) controller. Input vector: c_t = [e_i, α, q, q̇], with e = y_sys − y_cmd and e_i = ∫_0^{t_f} e dτ. Θ_i is the set of parameter matrices for each GRU module: Θ_i = {U_u, W_u, U_r, W_r, U_h, W_h, b_1, b_2, b_3}. Θ collects the total parameters of the controller: Θ = [Θ_1, Θ_2, …, Θ_L, V, c], where L is the total number of layers of GRU modules.
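
A minimal sketch of a stacked GRU controller forward pass, assuming the standard GRU gate equations for each module's parameter set Θ_i and a linear output layer u_t = V h_L + c; the actual gate formulation, layer sizes, and input ordering in the paper may differ.

# Sketch of a stacked GRU (S-RNN) controller forward pass.
# Standard GRU gate equations are assumed; Theta_i = {U_u, W_u, U_r, W_r,
# U_h, W_h, b_1, b_2, b_3} per module, plus output weights V and bias c.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(theta_i, h, c_in):
    """One GRU module: previous hidden state h, module input c_in."""
    z = sigmoid(theta_i["U_u"] @ c_in + theta_i["W_u"] @ h + theta_i["b_1"])  # update gate
    r = sigmoid(theta_i["U_r"] @ c_in + theta_i["W_r"] @ h + theta_i["b_2"])  # reset gate
    h_tilde = np.tanh(theta_i["U_h"] @ c_in + theta_i["W_h"] @ (r * h) + theta_i["b_3"])
    return (1.0 - z) * h + z * h_tilde

def controller(theta, hidden, c_t):
    """Stacked controller: c_t -> GRU layer 1 -> ... -> layer L -> u_t = V h_L + c."""
    x_in = c_t
    for i, theta_i in enumerate(theta["modules"]):
        hidden[i] = gru_step(theta_i, hidden[i], x_in)
        x_in = hidden[i]
    u_t = theta["V"] @ x_in + theta["c"]
    return u_t, hidden

# Tiny example: L = 2 layers, 4 hidden units, 4-dimensional input c_t.
rng = np.random.default_rng(0)
def init_module(n_h, n_in):
    g = lambda *s: 0.1 * rng.standard_normal(s)
    return dict(U_u=g(n_h, n_in), W_u=g(n_h, n_h), U_r=g(n_h, n_in), W_r=g(n_h, n_h),
                U_h=g(n_h, n_in), W_h=g(n_h, n_h), b_1=g(n_h), b_2=g(n_h), b_3=g(n_h))

theta = dict(modules=[init_module(4, 4), init_module(4, 4)],
             V=0.1 * rng.standard_normal((1, 4)), c=np.zeros(1))
hidden = [np.zeros(4), np.zeros(4)]
u_t, hidden = controller(theta, hidden, np.array([0.0, 0.02, -0.01, 0.0]))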

Deep Learning Controller - Optimization. Estimated expected long-term reward: Ĵ(Θ) = (1/N) Σ_{i=1}^{N} J_i(Θ). Cost function for each trajectory i: J_i(Θ) = Σ_{t=0}^{t_f} γ^t χ(x_t, u_t). label = P: only uncertainties; label = R: uncertainties, noise, and disturbances. Instantaneous measurement: χ(x_t, u_t) = k_1 e_t^2 + k_2 f_u^2 if label = P; χ(x_t, u_t) = k_3 f_e^2 + k_4 f_u^2 if label = R, where f_e = max(|e_t| − b_e, 0) and f_u = max(|u_t| − b_u, 0). The uncertainty samples for the ith trajectory, [ρ_α^i, ρ_q^i, ρ_u^i, λ_u^i], are drawn from the uniform distributions R_α, R_q, R_u, Λ_u ~ Uniform(min, max). N is the number of sampled trajectories; t_f is the duration of each sampled trajectory; k_1, k_2, k_3, k_4 are positive constant gains; b_e, b_u are static constant bounds of the funnel; e_t = y_sys − y_ref. Time-varying funnels: β_u(t) = {x ∈ R^n : |U(x, t)| ≤ b_u(t)}, β_e(t) = {x ∈ R^n : |E(x, t)| ≤ b_e(t)}.
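
A small sketch of how the piecewise instantaneous cost and the per-trajectory cost might be evaluated; the gains k_1..k_4, funnel bounds b_e, b_u, and discount value are illustrative choices, not the values used in the paper.

# Sketch of the piecewise instantaneous cost chi(x_t, u_t) and per-trajectory cost J_i.
# Gains k1..k4, funnel bounds b_e, b_u, and gamma are illustrative values.
def chi(e_t, u_t, label, k=(1.0, 0.1, 1.0, 0.1), b_e=0.05, b_u=0.2):
    k1, k2, k3, k4 = k
    f_e = max(abs(e_t) - b_e, 0.0)     # tracking-error funnel violation
    f_u = max(abs(u_t) - b_u, 0.0)     # control funnel violation
    if label == "P":                   # performance trajectories: only uncertainties
        return k1 * e_t ** 2 + k2 * f_u ** 2
    else:                              # "R": uncertainties, noise, and disturbances
        return k3 * f_e ** 2 + k4 * f_u ** 2

def J_i(errors, controls, label, gamma=0.99):
    """Discounted cost of one sampled trajectory: sum_t gamma^t chi(x_t, u_t)."""
    return sum((gamma ** t) * chi(e, u, label)
               for t, (e, u) in enumerate(zip(errors, controls)))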

DLC Incremental Training. Optimization specifications: architectures RNN/GRU, RNN, TD-FNN; L-BFGS (quasi-Newton) optimization; N = 2,970 sample trajectories (540 performance trajectories, 2,430 robust trajectories); 3.1 GHz PC with 256 GB RAM and 10 cores; parallel processing; roughly 40 hours to optimize over 1,000 sample trajectories.
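
The outer loop can be sketched as flattening the controller parameters Θ and handing the trajectory-averaged cost to an off-the-shelf L-BFGS routine. The quadratic stand-in objective below takes the place of the real rollout-based cost Ĵ(Θ), and SciPy's L-BFGS-B is used purely for illustration rather than as the specific solver setup from the paper.

# Sketch of the outer optimization loop: flatten the controller parameters Theta
# and minimize the average sampled-trajectory cost with L-BFGS (quasi-Newton).
# The quadratic stand-in objective replaces the real rollout-based cost J_hat(Theta).
import numpy as np
from scipy.optimize import minimize

def J_hat(theta_flat):
    """Placeholder for (1/N) sum_i J_i(Theta): in the real setup this would roll out
    N sampled trajectories through the plant model with the GRU controller."""
    return float(np.sum((theta_flat - 0.3) ** 2))

theta0 = np.zeros(10)                           # flattened initial controller parameters
result = minimize(J_hat, theta0, method="L-BFGS-B", options={"maxiter": 200})
theta_opt = result.x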

DLC - Results. Flight condition, initial conditions, and augmented polynomial short-period model (details shown as figures on the slide).

DLC - Results (same flight condition, initial conditions, and augmented polynomial short-period model). Performance metrics: per-model average tracking error ATE = (1/N) Σ_{i=1}^{N} Σ_{t=1}^{t_f} |e_t|; per-model average control rate ACR = (1/N) Σ_{i=1}^{N} Σ_{t=1}^{t_f} |u̇_t|; CTE = Σ_{j=1}^{M} ATE_j; CCR = Σ_{j=1}^{M} ACR_j. DLC performance: 66% reduction in CTE, 91.5% reduction in CCR. N is the number of trajectories for each analysis model; t_f is the duration of each sampled trajectory; CTE is the cumulative tracking error; CCR is the cumulative control rate.
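
A sketch of how the per-model averages and the cumulative CTE/CCR metrics could be computed from logged trajectories; the function and field names follow the reconstructed formulas above and are not taken from the paper.

# Sketch of the tracking-error and control-rate metrics over M analysis models,
# each evaluated with N logged trajectories (names follow the formulas above).
import numpy as np

def ate(errors):
    """Average tracking error over N trajectories: (1/N) sum_i sum_t |e_t|."""
    return np.mean([np.sum(np.abs(e)) for e in errors])

def acr(controls, dt):
    """Average control rate over N trajectories: (1/N) sum_i sum_t |u_dot_t|."""
    return np.mean([np.sum(np.abs(np.diff(u) / dt)) for u in controls])

def cte_ccr(models, dt=0.01):
    """Cumulative metrics: sum the per-model averages over the M analysis models."""
    CTE = sum(ate(m["errors"]) for m in models)
    CCR = sum(acr(m["controls"], dt) for m in models)
    return CTE, CCR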

DLC - Results (same flight condition, initial conditions, and augmented polynomial short-period model). Tracking-response plots comparing the gain-scheduled controller and the deep learning controller (shown as figures on the slide).

DLC - Results. Gain-scheduled controller versus deep learning controller with no uncertainty or disturbances (shown as figures on the slide).

Contribution and Conclusion. Created a novel training procedure focused on bringing the benefits of deep learning to flight control. Trained the controller using a set of sample trajectories that contain disturbances, aerodynamic uncertainties, and significant control attenuation/amplification in the plant dynamics during optimization. Found benefits in using a piecewise cost function that allows the designer to address both robustness and performance criteria simultaneously. Utilized an incremental initialization training procedure for deep recurrent neural networks.

Future Work. Pursue a vehicle model with flexible-body effects, time delays, control effectiveness variations, center-of-gravity changes, and aerodynamic parameter variations. Explore improving parameter convergence and analytic guarantees: Kullback-Leibler (KL) divergence, importance sampling, etc. Pursue development and use of robustness analysis tools for deep learning controllers to provide region-of-attraction estimates and time-delay margins: sum-of-squares programming, etc.

Questions?