Optimal Control with Learned Forward Models

Size: px
Start display at page:

Download "Optimal Control with Learned Forward Models"

Transcription

1 Optimal Control with Learned Forward Models Pieter Abbeel UC Berkeley Jan Peters TU Darmstadt 1

2 Where we are? Reinforcement Learning Data = {(x i, u i, x i+1, r i )}} x u xx r u xx V (x) π (u x) Now V (x) π (u x) Policy Search 2 π (u x) Optimal Control with Model Learning Value Function Methods

3 Goal of this Lecture 1. Step: Learn an Forward Model 2. Step: Use your favorite Optimal Control Method to get an optimal policy action Robot Policy Forward Model state action Forward Model state 3

4 Outline of the Lecture 1.Introduction to Optimal Control 2.Solving Linear-Quadratic Optimal Control Problems 3.Optimal Control with Learned Models 4.Hot story: Marc Deisenroth s PILCO Approach 5.Final Remarks 4

5 Example: Ball Paddling What are the states x? Ball Velocity Racket Velocity Ball Position Racket Position 5

6 Example: Ball Paddling What are the actions u? All motor torques? If you do not have a model... Joint Accelerations? Perfect, if you have a good model... Maybe identify the proper degrees of freedom? Accelerations in Operational Space? Ideally!... but only if you have a good operational space control law! 6

7 Example: Ball Paddling What are good rewards r? Task knowledge or success/failure? For some algorithms rewards in {1,0} are perfect... Real problems often require reward shaping... What s a good reward for our problem? Height of the ball? Distance between ball and the paddle? Ball needs to move in a certain region? All of the above? Additional punishments? All of these together do the job! 7

8 So can we get this to work? The state space has at least 12 dimensions. The action space has at least 3 dimensions. Can discretizations deal with such spaces? No! Finding an Optimal Value Function is limited by the curse of dimensionality. 8

9 Example: Real world application... So how can you get this by RL? 9

10 Outline of the Lecture 1.Introduction to Optimal Control 2.Solving Linear-Quadratic Optimal Control Problems 3.Optimal Control with Learned Models 4.Hot story: Marc Deisenroth s PILCO Approach 5.Final Remarks 10

11 Optimal Control Goal The goal of optimal control is find a policy u ~ π(x) such that J(π) = lim T 1 T E T k=0 r(x t, u t ) is maximal for a given reward function such as r(x, u) = x T Qx u T Ru system: x = Ax + Bu + For Simplicity of Derivation! Nothing Changes! 11...so how do we solve this?!

12 Plot the Problem x = Ax + Bu + r(x, u) = x T Qx u T Ru 12

13 Bellman Recipe: Steps At the last step, we have the value function V T (x) =0! 2.For t=t-1, compute optimal policy such that πt (u x) =argmax π r(x, u)+v t+1 (f(x, u)) determined by d du 13 r(x, u)+v t+1 (f(x, u)) = 0 d du x T Qx u T Ru) = 0 u = 0

14 Bellman Recipe: Step Obtain next value function V t (x) =max π r(x, u)+v t+1 (f(x, u)) V t+1(x) = r(x, u )+V t+1(f(x, u )) = x T Qx u T Ru = x T Qx 4.As not converged, go back to Step 2. 14!

15 Bellman Recipe: Step 2 again! 2.For t<t-1, compute optimal policy such that πt (u x) =argmax π r(x, u)+v t+1 (f(x, u)) determined by d r(x, u)+v du t+1 (f(x, u)) = 0 d x T Qx u T Ru f(x, u) T P t+1 f(x, u) du = 0 d x T Qx u T Ru (Ax + Bu) T P t+1 (Ax + Bu) du = 0 which implies d du Ru B T P t+1 (Ax + Bu) = 0 Policy Parameters! 15 u = (R + B T P t+1 B) 1 B T P t+1 Ax = θ t x!

16 Bellman Recipe: Step Obtain next value function V t (x) =max π r(x, u)+v t+1 (f(x, u)) 4. We have a converged to Recursion: P t = Q θ T t+1rθ t+1 (A + Bθ t+1 ) T P t+1 +(A + Bθ t+1 ) 16 V t (x) = r(x, u )+V t+1(ax + Bu ) = x T Qx (θ t x) T R(θ t x) (Ax + Bθ t+1 x) T P t+1 +(Ax + Bθ t x) = x T [Q θ T t Rθ t (A + Bθ t ) T P t+1 +(A + Bθ t )]x = x T P t x θ t = (R + B T P t+1 B) 1 B T P t+1 A!

17 Optimal Solution u = θ t x 17!

18 Outline of the Lecture 1.Introduction to Optimal Control 2.Solving Linear-Quadratic Optimal Control Problems 3.Optimal Control with Learned Models 4.Hot story: Marc Deisenroth s PILCO Approach 5.Final Remarks 18

19 Example: Swing-Up System Reward 19

20 Possible: Learn Solutions only where needed! If you know places where we start we can just look ahead! 20 Atkeson & Schaal, 1995

21 21 Atkeson & Schaal, 1995 Local Solutions Every smooth function can be modeled with a Taylor expansion f(x) =f(a)+ df dx (x a)+ 1 (x a)t d2 x=a 2 f dx 2 (x a)+... x=a Hence, we can also approximate: x f(ˆx, û)+ df dx (x ˆx)+ df (u û) ˆx,û duˆx,û r r(ˆx, û)+ = a 0 t + A t(x ˆx)+B t (u û) dr dx dr du T x ˆx u û x ˆx u û T d 2 r dx 2 d 2 r dxdu d 2 r dxdu d 2 r du 2 x ˆx u û

22 Bellman Recipe: Steps At the last step, we have the value function V T (x) =0! 2.For t=t-1, compute optimal policy such that πt (u x) =argmax π r(x, u)+v t+1 (f(x, u)) u = û gives. 3.Obtain next value function 4.As not converged, go back to Step V t (x) =max π r(x, u)+v t+1 (f(x, u)) V t (x) = r t+1 (x ˆx t ) T Q t+1 (x ˆx t )

23 Bellman Recipe: Step 2-4 again! 2.For t<t-1, compute optimal policy such that πt (u x) =argmax π r(x, u)+v t+1 (f(x, u)) determined by u = û t (R t + B T t P t B t ) 1 B T t P t+1 (a 0 t + A t (x ˆx t )) = θ 1 t (x ˆx t )+θ 0 t 3.Obtain the recursions θ 1 t = û t (R t + B T t P t+1 B t ) 1 B T t P t+1 A t (x ˆx t ) θ 0 t = û t (R t + B T t P t+1b t ) 1 B T t P t+1a 0 t P t = Q t θ T t R t θ 1 t +(A + B t θ 1 t )P t+1 (A + B t θ 1 t ) 23!

24 How do we get the Optimal Policy 1.Forward Propagation: Run Simulator to Obtain Linearizations 2.Backward Solution: Compute Optimal Control Law 3.If not converged, go to 1. 24

25 Model Learning with subsequent Policy Optimization 25 Atkeson & Schaal, 1996; Schaal, 1997

26 Model Learning with subsequent Policy Optimization 26 Atkeson & Schaal, 1996; Schaal, 1997

27 Outline of the Lecture 1.Introduction to Optimal Control 2.Solving Linear-Quadratic Optimal Control Problems 3.Optimal Control with Learned Models 4.Hot story: Marc Deisenroth s PILCO Approach 5.Final Remarks 27

28 Probabilistic Forward Models Marc Deisenroth Many forward models explain measured data. Choosing a bad model destroys will cause an optimization bias. Can we average ensure robustness towards bad approximations? Yes! Even in a Bayesian way with Gaussian Process Regression! 28 Deisenroth, Fox, Rasmussen, R:SS 2011

29 Basic Idea Marc Deisenroth With a GP, you can compute the distributions over all future states based on all forward models weighted by their likelihood. The propagation is still approximate and done by moment matching From the distribution: Expected Return: Gradient: We can do policy updates! 29 Deisenroth, Fox, Rasmussen, R:SS 2011

30 Applications Marc Deisenroth 30 Deisenroth, Fox, Rasmussen, R:SS 2011

31 Outline of the Lecture 1.Introduction to Optimal Control 2.Solving Linear-Quadratic Optimal Control Problems 3.Optimal Control with Learned Models 4.Hot story: Marc Deisenroth s PILCO Approach 5.Final Remarks 31

32 Conclusions You have learned about optimal control today! Only two cased are solvable: linear & discrete! Linear scales but does not generalize. Discrete generalizes but does not scale. Using Learned Models, you can compute at least optimal policy tubes. If you have many many tubes, in good regions, you have a policy. We will continue with Value Function and Policy Search Methods. 32

33 Further Reading C. G. Atkeson (1994), Using Local Trajectory Optimizers to Speed Up Global Optimization in Dynamic Programming, Proceedings, Neural Information Processing Systems, Denver, Colorado, December, 1993, In: Neural Information Processing Systems 6, J. D. Cowan, G. Tesauro, and J. Alspector, eds. Morgan Kaufmann, Schaal, S. (1997). Learning from demonstration. In: M.C. Mozer, M. Jordan, & T. Petsche (eds.), Advances in Neural Information Processing Systems 9, pp Cambridge, MA: MIT Press Marc P. Deisenroth, Carl E. Rasmussen, Dieter Fox (2011). Learning to Control a Low-Cost Robotic Manipulator Using Data-Efficient Reinforcement Learning, Robotics: Science & Systems (RSS 2011) 33

PILCO: A Model-Based and Data-Efficient Approach to Policy Search

PILCO: A Model-Based and Data-Efficient Approach to Policy Search PILCO: A Model-Based and Data-Efficient Approach to Policy Search (M.P. Deisenroth and C.E. Rasmussen) CSC2541 November 4, 2016 PILCO Graphical Model PILCO Probabilistic Inference for Learning COntrol

More information

Model-Based Reinforcement Learning with Continuous States and Actions

Model-Based Reinforcement Learning with Continuous States and Actions Marc P. Deisenroth, Carl E. Rasmussen, and Jan Peters: Model-Based Reinforcement Learning with Continuous States and Actions in Proceedings of the 16th European Symposium on Artificial Neural Networks

More information

Efficient Reinforcement Learning for Motor Control

Efficient Reinforcement Learning for Motor Control Efficient Reinforcement Learning for Motor Control Marc Peter Deisenroth and Carl Edward Rasmussen Department of Engineering, University of Cambridge Trumpington Street, Cambridge CB2 1PZ, UK Abstract

More information

Robotics. Control Theory. Marc Toussaint U Stuttgart

Robotics. Control Theory. Marc Toussaint U Stuttgart Robotics Control Theory Topics in control theory, optimal control, HJB equation, infinite horizon case, Linear-Quadratic optimal control, Riccati equations (differential, algebraic, discrete-time), controllability,

More information

Tutorial on Policy Gradient Methods. Jan Peters

Tutorial on Policy Gradient Methods. Jan Peters Tutorial on Policy Gradient Methods Jan Peters Outline 1. Reinforcement Learning 2. Finite Difference vs Likelihood-Ratio Policy Gradients 3. Likelihood-Ratio Policy Gradients 4. Conclusion General Setup

More information

Expectation Propagation in Dynamical Systems

Expectation Propagation in Dynamical Systems Expectation Propagation in Dynamical Systems Marc Peter Deisenroth Joint Work with Shakir Mohamed (UBC) August 10, 2012 Marc Deisenroth (TU Darmstadt) EP in Dynamical Systems 1 Motivation Figure : Complex

More information

Robotics Part II: From Learning Model-based Control to Model-free Reinforcement Learning

Robotics Part II: From Learning Model-based Control to Model-free Reinforcement Learning Robotics Part II: From Learning Model-based Control to Model-free Reinforcement Learning Stefan Schaal Max-Planck-Institute for Intelligent Systems Tübingen, Germany & Computer Science, Neuroscience, &

More information

Learning Control Under Uncertainty: A Probabilistic Value-Iteration Approach

Learning Control Under Uncertainty: A Probabilistic Value-Iteration Approach Learning Control Under Uncertainty: A Probabilistic Value-Iteration Approach B. Bischoff 1, D. Nguyen-Tuong 1,H.Markert 1 anda.knoll 2 1- Robert Bosch GmbH - Corporate Research Robert-Bosch-Str. 2, 71701

More information

Data-Driven Differential Dynamic Programming Using Gaussian Processes

Data-Driven Differential Dynamic Programming Using Gaussian Processes 5 American Control Conference Palmer House Hilton July -3, 5. Chicago, IL, USA Data-Driven Differential Dynamic Programming Using Gaussian Processes Yunpeng Pan and Evangelos A. Theodorou Abstract We present

More information

ELEC-E8119 Robotics: Manipulation, Decision Making and Learning Policy gradient approaches. Ville Kyrki

ELEC-E8119 Robotics: Manipulation, Decision Making and Learning Policy gradient approaches. Ville Kyrki ELEC-E8119 Robotics: Manipulation, Decision Making and Learning Policy gradient approaches Ville Kyrki 9.10.2017 Today Direct policy learning via policy gradient. Learning goals Understand basis and limitations

More information

A Tour of Reinforcement Learning The View from Continuous Control. Benjamin Recht University of California, Berkeley

A Tour of Reinforcement Learning The View from Continuous Control. Benjamin Recht University of California, Berkeley A Tour of Reinforcement Learning The View from Continuous Control Benjamin Recht University of California, Berkeley trustable, scalable, predictable Control Theory! Reinforcement Learning is the study

More information

Lecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu

Lecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu Lecture: Gaussian Process Regression STAT 6474 Instructor: Hongxiao Zhu Motivation Reference: Marc Deisenroth s tutorial on Robot Learning. 2 Fast Learning for Autonomous Robots with Gaussian Processes

More information

EM-based Reinforcement Learning

EM-based Reinforcement Learning EM-based Reinforcement Learning Gerhard Neumann 1 1 TU Darmstadt, Intelligent Autonomous Systems December 21, 2011 Outline Expectation Maximization (EM)-based Reinforcement Learning Recap : Modelling data

More information

Reinforcement Learning with Reference Tracking Control in Continuous State Spaces

Reinforcement Learning with Reference Tracking Control in Continuous State Spaces Reinforcement Learning with Reference Tracking Control in Continuous State Spaces Joseph Hall, Carl Edward Rasmussen and Jan Maciejowski Abstract The contribution described in this paper is an algorithm

More information

Model-based Imitation Learning by Probabilistic Trajectory Matching

Model-based Imitation Learning by Probabilistic Trajectory Matching Model-based Imitation Learning by Probabilistic Trajectory Matching Peter Englert1, Alexandros Paraschos1, Jan Peters1,2, Marc Peter Deisenroth1 Abstract One of the most elegant ways of teaching new skills

More information

Introduction to Reinforcement Learning. CMPT 882 Mar. 18

Introduction to Reinforcement Learning. CMPT 882 Mar. 18 Introduction to Reinforcement Learning CMPT 882 Mar. 18 Outline for the week Basic ideas in RL Value functions and value iteration Policy evaluation and policy improvement Model-free RL Monte-Carlo and

More information

Modelling and Control of Nonlinear Systems using Gaussian Processes with Partial Model Information

Modelling and Control of Nonlinear Systems using Gaussian Processes with Partial Model Information 5st IEEE Conference on Decision and Control December 0-3, 202 Maui, Hawaii, USA Modelling and Control of Nonlinear Systems using Gaussian Processes with Partial Model Information Joseph Hall, Carl Rasmussen

More information

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Dynamic Programming Marc Toussaint University of Stuttgart Winter 2018/19 Motivation: So far we focussed on tree search-like solvers for decision problems. There is a second important

More information

CSci 8980: Advanced Topics in Graphical Models Gaussian Processes

CSci 8980: Advanced Topics in Graphical Models Gaussian Processes CSci 8980: Advanced Topics in Graphical Models Gaussian Processes Instructor: Arindam Banerjee November 15, 2007 Gaussian Processes Outline Gaussian Processes Outline Parametric Bayesian Regression Gaussian

More information

PILCO: A Model-Based and Data-Efficient Approach to Policy Search

PILCO: A Model-Based and Data-Efficient Approach to Policy Search Marc Peter Deisenroth marc@cs.washington.edu Department of Computer Science & Engineering, University of Washington, USA Carl Edward Rasmussen cer54@cam.ac.uk Department of Engineering, University of Cambridge,

More information

Partially Observable Markov Decision Processes (POMDPs) Pieter Abbeel UC Berkeley EECS

Partially Observable Markov Decision Processes (POMDPs) Pieter Abbeel UC Berkeley EECS Partially Observable Markov Decision Processes (POMDPs) Pieter Abbeel UC Berkeley EECS Many slides adapted from Jur van den Berg Outline POMDPs Separation Principle / Certainty Equivalence Locally Optimal

More information

Robotics. Mobile Robotics. Marc Toussaint U Stuttgart

Robotics. Mobile Robotics. Marc Toussaint U Stuttgart Robotics Mobile Robotics State estimation, Bayes filter, odometry, particle filter, Kalman filter, SLAM, joint Bayes filter, EKF SLAM, particle SLAM, graph-based SLAM Marc Toussaint U Stuttgart DARPA Grand

More information

Probabilistic Model-based Imitation Learning

Probabilistic Model-based Imitation Learning Probabilistic Model-based Imitation Learning Peter Englert, Alexandros Paraschos, Jan Peters,2, and Marc Peter Deisenroth Department of Computer Science, Technische Universität Darmstadt, Germany. 2 Max

More information

Nonlinear Optimization for Optimal Control

Nonlinear Optimization for Optimal Control Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]

More information

Minimax Differential Dynamic Programming: An Application to Robust Biped Walking

Minimax Differential Dynamic Programming: An Application to Robust Biped Walking Minimax Differential Dynamic Programming: An Application to Robust Biped Walking Jun Morimoto Human Information Science Labs, Department 3, ATR International Keihanna Science City, Kyoto, JAPAN, 619-0288

More information

Lecture Notes: (Stochastic) Optimal Control

Lecture Notes: (Stochastic) Optimal Control Lecture Notes: (Stochastic) Optimal ontrol Marc Toussaint Machine Learning & Robotics group, TU erlin Franklinstr. 28/29, FR 6-9, 587 erlin, Germany July, 2 Disclaimer: These notes are not meant to be

More information

Markov Chain Monte Carlo Methods for Stochastic

Markov Chain Monte Carlo Methods for Stochastic Markov Chain Monte Carlo Methods for Stochastic Optimization i John R. Birge The University of Chicago Booth School of Business Joint work with Nicholas Polson, Chicago Booth. JRBirge U Florida, Nov 2013

More information

Statistical Techniques in Robotics (16-831, F12) Lecture#21 (Monday November 12) Gaussian Processes

Statistical Techniques in Robotics (16-831, F12) Lecture#21 (Monday November 12) Gaussian Processes Statistical Techniques in Robotics (16-831, F12) Lecture#21 (Monday November 12) Gaussian Processes Lecturer: Drew Bagnell Scribe: Venkatraman Narayanan 1, M. Koval and P. Parashar 1 Applications of Gaussian

More information

Reinforcement Learning In Continuous Time and Space

Reinforcement Learning In Continuous Time and Space Reinforcement Learning In Continuous Time and Space presentation of paper by Kenji Doya Leszek Rybicki lrybicki@mat.umk.pl 18.07.2008 Leszek Rybicki lrybicki@mat.umk.pl Reinforcement Learning In Continuous

More information

Probabilistic Differential Dynamic Programming

Probabilistic Differential Dynamic Programming Probabilistic Differential Dynamic Programming Yunpeng Pan and Evangelos A. Theodorou Daniel Guggenheim School of Aerospace Engineering Institute for Robotics and Intelligent Machines Georgia Institute

More information

Robotics: Science & Systems [Topic 6: Control] Prof. Sethu Vijayakumar Course webpage:

Robotics: Science & Systems [Topic 6: Control] Prof. Sethu Vijayakumar Course webpage: Robotics: Science & Systems [Topic 6: Control] Prof. Sethu Vijayakumar Course webpage: http://wcms.inf.ed.ac.uk/ipab/rss Control Theory Concerns controlled systems of the form: and a controller of the

More information

Inverse Optimal Control

Inverse Optimal Control Inverse Optimal Control Oleg Arenz Technische Universität Darmstadt o.arenz@gmx.de Abstract In Reinforcement Learning, an agent learns a policy that maximizes a given reward function. However, providing

More information

Policy Search for Path Integral Control

Policy Search for Path Integral Control Policy Search for Path Integral Control Vicenç Gómez 1,2, Hilbert J Kappen 2, Jan Peters 3,4, and Gerhard Neumann 3 1 Universitat Pompeu Fabra, Barcelona Department of Information and Communication Technologies,

More information

CS599 Lecture 2 Function Approximation in RL

CS599 Lecture 2 Function Approximation in RL CS599 Lecture 2 Function Approximation in RL Look at how experience with a limited part of the state set be used to produce good behavior over a much larger part. Overview of function approximation (FA)

More information

Learning to Control an Octopus Arm with Gaussian Process Temporal Difference Methods

Learning to Control an Octopus Arm with Gaussian Process Temporal Difference Methods Learning to Control an Octopus Arm with Gaussian Process Temporal Difference Methods Yaakov Engel Joint work with Peter Szabo and Dmitry Volkinshtein (ex. Technion) Why use GPs in RL? A Bayesian approach

More information

Using Gaussian Processes for Variance Reduction in Policy Gradient Algorithms *

Using Gaussian Processes for Variance Reduction in Policy Gradient Algorithms * Proceedings of the 8 th International Conference on Applied Informatics Eger, Hungary, January 27 30, 2010. Vol. 1. pp. 87 94. Using Gaussian Processes for Variance Reduction in Policy Gradient Algorithms

More information

Generalization and Function Approximation

Generalization and Function Approximation Generalization and Function Approximation 0 Generalization and Function Approximation Suggested reading: Chapter 8 in R. S. Sutton, A. G. Barto: Reinforcement Learning: An Introduction MIT Press, 1998.

More information

Particle Filtering for Data-Driven Simulation and Optimization

Particle Filtering for Data-Driven Simulation and Optimization Particle Filtering for Data-Driven Simulation and Optimization John R. Birge The University of Chicago Booth School of Business Includes joint work with Nicholas Polson. JRBirge INFORMS Phoenix, October

More information

Optimal Control, Trajectory Optimization, Learning Dynamics

Optimal Control, Trajectory Optimization, Learning Dynamics Carnegie Mellon School of Computer Science Deep Reinforcement Learning and Control Optimal Control, Trajectory Optimization, Learning Dynamics Katerina Fragkiadaki So far.. Most Reinforcement Learning

More information

Reinforcement Learning in Continuous Time and Space

Reinforcement Learning in Continuous Time and Space LETTER Communicated by Peter Dayan Reinforcement Learning in Continuous Time and Space Kenji Doya ATR Human Information Processing Research Laboratories, Soraku, Kyoto 619-288, Japan This article presents

More information

Maximum Margin Planning

Maximum Margin Planning Maximum Margin Planning Nathan Ratliff, Drew Bagnell and Martin Zinkevich Presenters: Ashesh Jain, Michael Hu CS6784 Class Presentation Theme 1. Supervised learning 2. Unsupervised learning 3. Reinforcement

More information

Q-Learning in Continuous State Action Spaces

Q-Learning in Continuous State Action Spaces Q-Learning in Continuous State Action Spaces Alex Irpan alexirpan@berkeley.edu December 5, 2015 Contents 1 Introduction 1 2 Background 1 3 Q-Learning 2 4 Q-Learning In Continuous Spaces 4 5 Experimental

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond January 18, 2017 Contents 1 Batch and Recursive Estimation 2 Towards Bayesian Filtering 3 Kalman Filter and Bayesian Filtering and Smoothing

More information

Reinforcement Learning. Donglin Zeng, Department of Biostatistics, University of North Carolina

Reinforcement Learning. Donglin Zeng, Department of Biostatistics, University of North Carolina Reinforcement Learning Introduction Introduction Unsupervised learning has no outcome (no feedback). Supervised learning has outcome so we know what to predict. Reinforcement learning is in between it

More information

In: Proc. BENELEARN-98, 8th Belgian-Dutch Conference on Machine Learning, pp 9-46, 998 Linear Quadratic Regulation using Reinforcement Learning Stephan ten Hagen? and Ben Krose Department of Mathematics,

More information

Path Integral Stochastic Optimal Control for Reinforcement Learning

Path Integral Stochastic Optimal Control for Reinforcement Learning Preprint August 3, 204 The st Multidisciplinary Conference on Reinforcement Learning and Decision Making RLDM203 Path Integral Stochastic Optimal Control for Reinforcement Learning Farbod Farshidian Institute

More information

Markov Chain Monte Carlo Methods for Stochastic Optimization

Markov Chain Monte Carlo Methods for Stochastic Optimization Markov Chain Monte Carlo Methods for Stochastic Optimization John R. Birge The University of Chicago Booth School of Business Joint work with Nicholas Polson, Chicago Booth. JRBirge U of Toronto, MIE,

More information

Particle Filtering Approaches for Dynamic Stochastic Optimization

Particle Filtering Approaches for Dynamic Stochastic Optimization Particle Filtering Approaches for Dynamic Stochastic Optimization John R. Birge The University of Chicago Booth School of Business Joint work with Nicholas Polson, Chicago Booth. JRBirge I-Sim Workshop,

More information

Hamilton-Jacobi-Bellman Equation Feb 25, 2008

Hamilton-Jacobi-Bellman Equation Feb 25, 2008 Hamilton-Jacobi-Bellman Equation Feb 25, 2008 What is it? The Hamilton-Jacobi-Bellman (HJB) equation is the continuous-time analog to the discrete deterministic dynamic programming algorithm Discrete VS

More information

Probabilistic Model-based Imitation Learning

Probabilistic Model-based Imitation Learning Probabilistic Model-based Imitation Learning Peter Englert, Alexandros Paraschos, Jan Peters,2, and Marc Peter Deisenroth Department of Computer Science, Technische Universität Darmstadt, Germany. 2 Max

More information

Partially Observable Markov Decision Processes (POMDPs)

Partially Observable Markov Decision Processes (POMDPs) Partially Observable Markov Decision Processes (POMDPs) Sachin Patil Guest Lecture: CS287 Advanced Robotics Slides adapted from Pieter Abbeel, Alex Lee Outline Introduction to POMDPs Locally Optimal Solutions

More information

Statistical Techniques in Robotics (16-831, F12) Lecture#20 (Monday November 12) Gaussian Processes

Statistical Techniques in Robotics (16-831, F12) Lecture#20 (Monday November 12) Gaussian Processes Statistical Techniques in Robotics (6-83, F) Lecture# (Monday November ) Gaussian Processes Lecturer: Drew Bagnell Scribe: Venkatraman Narayanan Applications of Gaussian Processes (a) Inverse Kinematics

More information

Factor Analysis and Kalman Filtering (11/2/04)

Factor Analysis and Kalman Filtering (11/2/04) CS281A/Stat241A: Statistical Learning Theory Factor Analysis and Kalman Filtering (11/2/04) Lecturer: Michael I. Jordan Scribes: Byung-Gon Chun and Sunghoon Kim 1 Factor Analysis Factor analysis is used

More information

Chapter 8: Generalization and Function Approximation

Chapter 8: Generalization and Function Approximation Chapter 8: Generalization and Function Approximation Objectives of this chapter: Look at how experience with a limited part of the state set be used to produce good behavior over a much larger part. Overview

More information

and 3) a value function update using exponential eligibility traces is more efficient and stable than that based on Euler approximation. The algorithm

and 3) a value function update using exponential eligibility traces is more efficient and stable than that based on Euler approximation. The algorithm Reinforcement Learning In Continuous Time and Space Kenji Doya Λ ATR Human Information Processing Research Laboratories 2-2 Hikaridai, Seika, Soraku, Kyoto 619-288, Japan Neural Computation, 12(1), 219-245

More information

INF 5860 Machine learning for image classification. Lecture 14: Reinforcement learning May 9, 2018

INF 5860 Machine learning for image classification. Lecture 14: Reinforcement learning May 9, 2018 Machine learning for image classification Lecture 14: Reinforcement learning May 9, 2018 Page 3 Outline Motivation Introduction to reinforcement learning (RL) Value function based methods (Q-learning)

More information

Machine Learning. Bayesian Regression & Classification. Marc Toussaint U Stuttgart

Machine Learning. Bayesian Regression & Classification. Marc Toussaint U Stuttgart Machine Learning Bayesian Regression & Classification learning as inference, Bayesian Kernel Ridge regression & Gaussian Processes, Bayesian Kernel Logistic Regression & GP classification, Bayesian Neural

More information

Adaptive Reinforcement Learning

Adaptive Reinforcement Learning Adaptive Reinforcement Learning Increasing the applicability for large and time varying systems using parallel Gaussian Process regression and adaptive nonlinear control Delft Center for Systems and Control

More information

Model Selection for Gaussian Processes

Model Selection for Gaussian Processes Institute for Adaptive and Neural Computation School of Informatics,, UK December 26 Outline GP basics Model selection: covariance functions and parameterizations Criteria for model selection Marginal

More information

EKF, UKF. Pieter Abbeel UC Berkeley EECS. Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics

EKF, UKF. Pieter Abbeel UC Berkeley EECS. Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics EKF, UKF Pieter Abbeel UC Berkeley EECS Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics Kalman Filter Kalman Filter = special case of a Bayes filter with dynamics model and sensory

More information

EKF, UKF. Pieter Abbeel UC Berkeley EECS. Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics

EKF, UKF. Pieter Abbeel UC Berkeley EECS. Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics EKF, UKF Pieter Abbeel UC Berkeley EECS Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics Kalman Filter Kalman Filter = special case of a Bayes filter with dynamics model and sensory

More information

Chapter 3: The Reinforcement Learning Problem

Chapter 3: The Reinforcement Learning Problem Chapter 3: The Reinforcement Learning Problem Objectives of this chapter: describe the RL problem we will be studying for the remainder of the course present idealized form of the RL problem for which

More information

Logistic Regression. William Cohen

Logistic Regression. William Cohen Logistic Regression William Cohen 1 Outline Quick review classi5ication, naïve Bayes, perceptrons new result for naïve Bayes Learning as optimization Logistic regression via gradient ascent Over5itting

More information

Optimal Control. McGill COMP 765 Oct 3 rd, 2017

Optimal Control. McGill COMP 765 Oct 3 rd, 2017 Optimal Control McGill COMP 765 Oct 3 rd, 2017 Classical Control Quiz Question 1: Can a PID controller be used to balance an inverted pendulum: A) That starts upright? B) That must be swung-up (perhaps

More information

1 Kalman Filter Introduction

1 Kalman Filter Introduction 1 Kalman Filter Introduction You should first read Chapter 1 of Stochastic models, estimation, and control: Volume 1 by Peter S. Maybec (available here). 1.1 Explanation of Equations (1-3) and (1-4) Equation

More information

Lecture 6: CS395T Numerical Optimization for Graphics and AI Line Search Applications

Lecture 6: CS395T Numerical Optimization for Graphics and AI Line Search Applications Lecture 6: CS395T Numerical Optimization for Graphics and AI Line Search Applications Qixing Huang The University of Texas at Austin huangqx@cs.utexas.edu 1 Disclaimer This note is adapted from Section

More information

Machine Learning I Continuous Reinforcement Learning

Machine Learning I Continuous Reinforcement Learning Machine Learning I Continuous Reinforcement Learning Thomas Rückstieß Technische Universität München January 7/8, 2010 RL Problem Statement (reminder) state s t+1 ENVIRONMENT reward r t+1 new step r t

More information

Gaussian Processes. 1 What problems can be solved by Gaussian Processes?

Gaussian Processes. 1 What problems can be solved by Gaussian Processes? Statistical Techniques in Robotics (16-831, F1) Lecture#19 (Wednesday November 16) Gaussian Processes Lecturer: Drew Bagnell Scribe:Yamuna Krishnamurthy 1 1 What problems can be solved by Gaussian Processes?

More information

Gaussian with mean ( µ ) and standard deviation ( σ)

Gaussian with mean ( µ ) and standard deviation ( σ) Slide from Pieter Abbeel Gaussian with mean ( µ ) and standard deviation ( σ) 10/6/16 CSE-571: Robotics X ~ N( µ, σ ) Y ~ N( aµ + b, a σ ) Y = ax + b + + + + 1 1 1 1 1 1 1 1 1 1, ~ ) ( ) ( ), ( ~ ), (

More information

Mathematics 136 Calculus 2 Everything You Need Or Want To Know About Partial Fractions (and maybe more!) October 19 and 21, 2016

Mathematics 136 Calculus 2 Everything You Need Or Want To Know About Partial Fractions (and maybe more!) October 19 and 21, 2016 Mathematics 36 Calculus 2 Everything You Need Or Want To Know About Partial Fractions (and maybe more!) October 9 and 2, 206 Every rational function (quotient of polynomials) can be written as a polynomial

More information

Lecture 5 Linear Quadratic Stochastic Control

Lecture 5 Linear Quadratic Stochastic Control EE363 Winter 2008-09 Lecture 5 Linear Quadratic Stochastic Control linear-quadratic stochastic control problem solution via dynamic programming 5 1 Linear stochastic system linear dynamical system, over

More information

Maximum Likelihood (ML), Expectation Maximization (EM) Pieter Abbeel UC Berkeley EECS

Maximum Likelihood (ML), Expectation Maximization (EM) Pieter Abbeel UC Berkeley EECS Maximum Likelihood (ML), Expectation Maximization (EM) Pieter Abbeel UC Berkeley EECS Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics Outline Maximum likelihood (ML) Priors, and

More information

I D I A P. Online Policy Adaptation for Ensemble Classifiers R E S E A R C H R E P O R T. Samy Bengio b. Christos Dimitrakakis a IDIAP RR 03-69

I D I A P. Online Policy Adaptation for Ensemble Classifiers R E S E A R C H R E P O R T. Samy Bengio b. Christos Dimitrakakis a IDIAP RR 03-69 R E S E A R C H R E P O R T Online Policy Adaptation for Ensemble Classifiers Christos Dimitrakakis a IDIAP RR 03-69 Samy Bengio b I D I A P December 2003 D a l l e M o l l e I n s t i t u t e for Perceptual

More information

Active Policy Iteration: Efficient Exploration through Active Learning for Value Function Approximation in Reinforcement Learning

Active Policy Iteration: Efficient Exploration through Active Learning for Value Function Approximation in Reinforcement Learning Active Policy Iteration: fficient xploration through Active Learning for Value Function Approximation in Reinforcement Learning Takayuki Akiyama, Hirotaka Hachiya, and Masashi Sugiyama Department of Computer

More information

Reinforcement Learning

Reinforcement Learning Reinforcement Learning Inverse Reinforcement Learning Inverse RL, behaviour cloning, apprenticeship learning, imitation learning. Vien Ngo Marc Toussaint University of Stuttgart Outline Introduction to

More information

Learning Gaussian Process Models from Uncertain Data

Learning Gaussian Process Models from Uncertain Data Learning Gaussian Process Models from Uncertain Data Patrick Dallaire, Camille Besse, and Brahim Chaib-draa DAMAS Laboratory, Computer Science & Software Engineering Department, Laval University, Canada

More information

CS 188: Artificial Intelligence Spring Announcements

CS 188: Artificial Intelligence Spring Announcements CS 188: Artificial Intelligence Spring 2011 Lecture 12: Probability 3/2/2011 Pieter Abbeel UC Berkeley Many slides adapted from Dan Klein. 1 Announcements P3 due on Monday (3/7) at 4:59pm W3 going out

More information

Machine Learning 4771

Machine Learning 4771 Machine Learning 4771 Instructor: ony Jebara Kalman Filtering Linear Dynamical Systems and Kalman Filtering Structure from Motion Linear Dynamical Systems Audio: x=pitch y=acoustic waveform Vision: x=object

More information

Computer Intensive Methods in Mathematical Statistics

Computer Intensive Methods in Mathematical Statistics Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of

More information

Extending Model-based Policy Gradients for Robots in Heteroscedastic Environments

Extending Model-based Policy Gradients for Robots in Heteroscedastic Environments Extending Model-based Policy Gradients for Robots in Heteroscedastic Environments John Martin and Brendan Englot Department of Mechanical Engineering Stevens Institute of Technology {jmarti3, benglot}@stevens.edu

More information

Neural Network Training

Neural Network Training Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond Department of Biomedical Engineering and Computational Science Aalto University January 26, 2012 Contents 1 Batch and Recursive Estimation

More information

Approximate Q-Learning. Dan Weld / University of Washington

Approximate Q-Learning. Dan Weld / University of Washington Approximate Q-Learning Dan Weld / University of Washington [Many slides taken from Dan Klein and Pieter Abbeel / CS188 Intro to AI at UC Berkeley materials available at http://ai.berkeley.edu.] Q Learning

More information

Lecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010

Lecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010 Hidden Lecture 4: Hidden : An Introduction to Dynamic Decision Making November 11, 2010 Special Meeting 1/26 Markov Model Hidden When a dynamical system is probabilistic it may be determined by the transition

More information

Bellman s Curse of Dimensionality

Bellman s Curse of Dimensionality Bellman s Curse of Dimensionality n- dimensional state space Number of states grows exponen

More information

From Bayes to Extended Kalman Filter

From Bayes to Extended Kalman Filter From Bayes to Extended Kalman Filter Michal Reinštein Czech Technical University in Prague Faculty of Electrical Engineering, Department of Cybernetics Center for Machine Perception http://cmp.felk.cvut.cz/

More information

1 Introduction 2. 4 Q-Learning The Q-value The Temporal Difference The whole Q-Learning process... 5

1 Introduction 2. 4 Q-Learning The Q-value The Temporal Difference The whole Q-Learning process... 5 Table of contents 1 Introduction 2 2 Markov Decision Processes 2 3 Future Cumulative Reward 3 4 Q-Learning 4 4.1 The Q-value.............................................. 4 4.2 The Temporal Difference.......................................

More information

Lecture 23: Reinforcement Learning

Lecture 23: Reinforcement Learning Lecture 23: Reinforcement Learning MDPs revisited Model-based learning Monte Carlo value function estimation Temporal-difference (TD) learning Exploration November 23, 2006 1 COMP-424 Lecture 23 Recall:

More information

Reinforcement Learning Wrap-up

Reinforcement Learning Wrap-up Reinforcement Learning Wrap-up Slides courtesy of Dan Klein and Pieter Abbeel University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.

More information

Gaussian Process Regression

Gaussian Process Regression Gaussian Process Regression 4F1 Pattern Recognition, 21 Carl Edward Rasmussen Department of Engineering, University of Cambridge November 11th - 16th, 21 Rasmussen (Engineering, Cambridge) Gaussian Process

More information

REINFORCE Framework for Stochastic Policy Optimization and its use in Deep Learning

REINFORCE Framework for Stochastic Policy Optimization and its use in Deep Learning REINFORCE Framework for Stochastic Policy Optimization and its use in Deep Learning Ronen Tamari The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (#67679) February 28, 2016 Ronen Tamari

More information

Machine Learning. Machine Learning: Jordan Boyd-Graber University of Maryland REINFORCEMENT LEARNING. Slides adapted from Tom Mitchell and Peter Abeel

Machine Learning. Machine Learning: Jordan Boyd-Graber University of Maryland REINFORCEMENT LEARNING. Slides adapted from Tom Mitchell and Peter Abeel Machine Learning Machine Learning: Jordan Boyd-Graber University of Maryland REINFORCEMENT LEARNING Slides adapted from Tom Mitchell and Peter Abeel Machine Learning: Jordan Boyd-Graber UMD Machine Learning

More information

Policy Search For Learning Robot Control Using Sparse Data

Policy Search For Learning Robot Control Using Sparse Data Policy Search For Learning Robot Control Using Sparse Data B. Bischoff 1, D. Nguyen-Tuong 1, H. van Hoof 2, A. McHutchon 3, C. E. Rasmussen 3, A. Knoll 4, J. Peters 2,5, M. P. Deisenroth 6,2 Abstract In

More information

Gaussians. Pieter Abbeel UC Berkeley EECS. Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics

Gaussians. Pieter Abbeel UC Berkeley EECS. Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics Gaussians Pieter Abbeel UC Berkeley EECS Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics Outline Univariate Gaussian Multivariate Gaussian Law of Total Probability Conditioning

More information

arxiv: v4 [cs.lg] 2 Jul 2018

arxiv: v4 [cs.lg] 2 Jul 2018 Model-Free Trajectory-based Policy Optimization with Monotonic Improvement arxiv:1606.09197v4 [cs.lg] 2 Jul 2018 Riad Akrour 1 riad@robot-learning.de Abbas Abdolmaleki 2 abbas.a@ua.pt Hany Abdulsamad 1

More information

Lecture 17: Numerical Optimization October 2014

Lecture 17: Numerical Optimization October 2014 Lecture 17: Numerical Optimization 36-350 22 October 2014 Agenda Basics of optimization Gradient descent Newton s method Curve-fitting R: optim, nls Reading: Recipes 13.1 and 13.2 in The R Cookbook Optional

More information

CS599 Lecture 1 Introduction To RL

CS599 Lecture 1 Introduction To RL CS599 Lecture 1 Introduction To RL Reinforcement Learning Introduction Learning from rewards Policies Value Functions Rewards Models of the Environment Exploitation vs. Exploration Dynamic Programming

More information

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012 Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood

More information

Lecture 3: The Reinforcement Learning Problem

Lecture 3: The Reinforcement Learning Problem Lecture 3: The Reinforcement Learning Problem Objectives of this lecture: describe the RL problem we will be studying for the remainder of the course present idealized form of the RL problem for which

More information