
Underactuated Robotics: Learning, Planning, and Control for Efficient and Agile Machines
Course Notes for MIT 6.832

Russ Tedrake
Massachusetts Institute of Technology

© Russ Tedrake, 2009


Contents

Preface

Fully Actuated vs. Underactuated Systems
    Motivation
        Honda's ASIMO vs. Passive Dynamic Walkers
        Birds vs. modern aircraft
        The common theme
    Definitions
    Feedback Linearization
    Input and State Constraints
    Underactuated robotics
    Goals for the course

Part I: Nonlinear Dynamics and Control

The Simple Pendulum
    Introduction
    Nonlinear Dynamics w/ a Constant Torque
        The Overdamped Pendulum
        The Undamped Pendulum w/ Zero Torque
        The Undamped Pendulum w/ a Constant Torque
        The Damped Pendulum
    The Torque-limited Simple Pendulum

The Acrobot and Cart-Pole
    Introduction
    The Acrobot
        Equations of Motion
    Cart-Pole
        Equations of Motion
    Balancing
        Linearizing the Manipulator Equations
        Controllability of Linear Systems
        LQR Feedback
    Partial Feedback Linearization
        PFL for the Cart-Pole System
        General Form
    Swing-Up Control
        Energy Shaping
        Simple Pendulum
        Cart-Pole
        Acrobot
    Discussion
    Other Model Systems

Manipulation
    Introduction
    Dynamics of Manipulation
        Form Closure
        Force Closure
    Active research topics
    Ground reaction forces in Walking
        ZMP
        Underactuation in Walking

Walking
    Limit Cycles
    Poincaré Maps
    The Ballistic Walker
    The Rimless Wheel
        Stance Dynamics
        Foot Collision
        Return Map
        Fixed Points and Stability
    The Compass Gait
    The Kneed Walker
    Numerical Analysis
        Finding Limit Cycles
        Local Stability of Limit Cycle

Running
    Introduction
    Comparative Biomechanics
    Raibert hoppers
    Spring-loaded inverted pendulum (SLIP)
        Flight phase
        Stance phase
        Transitions
        Approximate solution
    Koditschek's Simplified Hopper
    Lateral Leg Spring (LLS)

Flight
    Flat Plate Theory
    Simplest Glider Model
    Perching
    Swimming and Flapping Flight
        Swimming
        The Aerodynamics of Flapping Flight

Model Systems with Stochasticity
    Stochastic Dynamics
        The Master Equation
        Continuous Time, Continuous Space
        Discrete Time, Discrete Space
    Stochastic Stability
    Walking on Rough Terrain
    State Estimation
    System Identification

Part II: Optimal Control and Motion Planning

Dynamic Programming
    Introduction to Optimal Control
    Finite Horizon Problems
        Additive Cost
    Dynamic Programming in Discrete Time
        Discrete-State, Discrete-Action
        Continuous-State, Discrete-Action
        Continuous-State, Continuous-Actions
    Infinite Horizon Problems
    Value Iteration
    Value Iteration w/ Function Approximation
        Special case: Barycentric interpolation
    Detailed Example: the double integrator
        Pole placement
        The optimal control approach
        The minimum-time problem
        The quadratic regulator
    Detailed Example: The Simple Pendulum

Analytical Optimal Control with the Hamilton-Jacobi-Bellman Sufficiency Theorem
    Introduction
    Dynamic Programming in Continuous Time
    Infinite-Horizon Problems
    The Hamilton-Jacobi-Bellman
    Examples

Analytical Optimal Control with Pontryagin's Minimum Principle
    Introduction
    Necessary conditions for optimality
    Pontryagin's minimum principle
    Derivation sketch using calculus of variations
    Examples

Trajectory Optimization
    The Policy Space
    Nonlinear optimization
        Gradient Descent
        Sequential Quadratic Programming
    Shooting Methods
        Computing the gradient with Backpropagation through time (BPTT)
        Computing the gradient w/ Real-Time Recurrent Learning (RTRL)
        BPTT vs. RTRL
    Direct Collocation
    LQR trajectory stabilization
        Linearizing along trajectories
        Linear Time-Varying (LTV) LQR
    Iterative LQR
    Real-time planning (aka receding horizon control)

Feasible Motion Planning
    Artificial Intelligence via Search
    Motion Planning as Search
    Configuration Space
    Sampling-based Planners
    Rapidly-Exploring Randomized Trees (RRTs)
    Proximity Metrics
    Reachability-Guided RRTs
    Performance
    Probabilistic Roadmaps
    Discrete Search Algorithms
    Global policies from local policies
    Real-time Planning
    Multi-query Planning
    Probabilistic Roadmaps
    Feedback Motion Planning

Stochastic Optimal Control
    Essentials
    Implications of Stochasticity
    Markov Decision Processes
    Dynamic Programming Methods
    Policy Gradient Methods

Model-free Value Methods
    Introduction
    Policy Evaluation for known Markov Chains
        Monte Carlo Evaluation
        Bootstrapping
        A Continuum of Updates
        The TD(λ) Algorithm
        TD(λ) with function approximators
        LSTD
    Off-policy evaluation
        Q functions
        TD for Q with function approximation
        Importance Sampling
        LSTDQ
    Policy Improvement
        Sarsa(λ)
        Q(λ)
        LSPI
    Case Studies: Checkers and Backgammon

Model-free Policy Search
    Introduction
    Stochastic Gradient Descent
    The Weight Perturbation Algorithm
        Performance of Weight Perturbation
        Weight Perturbation with an Estimated Baseline
    The REINFORCE Algorithm
        Optimizing a stochastic function
        Adding noise to the outputs
        Episodic REINFORCE
        Infinite-horizon REINFORCE
        LTI REINFORCE
    Better baselines with Importance sampling

Actor-Critic Methods
    Introduction
    Pitfalls of RL
        Value methods
        Policy Gradient methods
    Actor-Critic Methods
    Case Study: Toddler

Part III: Applications and Extensions

Learning Case Studies and Course Wrap-up
    Learning Robots
        Ng's Helicopters
        Schaal and Atkeson
        AIBO
        UNH Biped
        Morimoto
        Heaving Foil
    Optima for Animals
        Bone geometry
        Bounding flight
        Preferred walking and running speeds/transitions
    RL in the Brain
        Bird-song
        Dopamine TD
    Course Wrap-up

Part IV: Appendix

A  Robotics Preliminaries
    A.1  Deriving the equations of motion (an example)
    A.2  The Manipulator Equations

B  Machine Learning Preliminaries
    B.1  Function Approximation

Preface

This book is about building robots that move with speed, efficiency, and grace. The author believes that this can only be achieved through a tight coupling between mechanical design, passive dynamics, and nonlinear control synthesis. Therefore, these notes contain selected material from dynamical systems theory, as well as linear and nonlinear control. These notes also reflect a deep belief that computational algorithms play an essential role in finding and optimizing solutions to complex dynamics and control problems. Algorithms play an increasingly central role in modern control theory; nowadays even rigorous mathematicians use algorithms to develop mathematical proofs. Therefore, the notes also cover selected material from optimization theory, motion planning, and machine learning.

Although the material in the book comes from many sources, the presentation is targeted very specifically at a handful of robotics problems. Concepts are introduced only when and if they can help progress our capabilities in robotics. I hope that the result is a broad but reasonably self-contained and readable manuscript that will be of use to any robotics practitioner.

Organization

The material in these notes is organized into two main parts: nonlinear dynamics and control, which introduces a series of increasingly complex dynamical systems and the associated control ideas, and optimal control and motion planning, which introduces a series of general derivations and algorithms that can be applied to many, if not all, of the problems introduced in the first part of the book. This second part of the book is organized by technique, which is perhaps the most logical order when using the book as a reference. In teaching the course, however, I take a spiral trajectory through the material, introducing robot dynamics and control problems one at a time, and introducing only the techniques that are required to solve that particular problem. Finally, a third part of the book puts it all together through a few more complicated case studies and examples.

Exercises

The exercises in these notes come in a few varieties. The standard exercises are intended to be straightforward extensions of the material presented in the chapter. Some exercises are labeled as MATLAB exercises: these are computational investigations, which sometimes involve existing code that will help get you started. Finally, some exercises are labeled as CHALLENGE problems. These are problems that I have not yet seen or found the answers to, but which I would very much like to solve. I cannot guarantee that they are unsolved in the literature, but the intention is to identify some problems which would advance the state of the art.

Russ Tedrake, 2009

© Russ Tedrake, 2009


Notation

Dynamics and System Identification:
    q          Generalized coordinates (e.g., joint space)
    x          State space (x = [q^T, q̇^T]^T)
    u          Controllable inputs
    ᾱ          Time trajectory of α
    s_i ∈ S    s_i is a particular state in the set of all states S (for a discrete-state system)
    a_i ∈ A    a_i is a particular action from the set of all actions A (for a discrete-action system)
    w          Uncontrollable inputs (disturbances)
    y          Outputs
    v          Measurement errors
    z          Observations
    H          Mass/Inertial Matrix
    C          Coriolis Matrix
    G          Gravity and potential terms
    f          First-order plant dynamics
    T          Kinetic Energy
    U          Potential Energy
    L          Lagrangian (L = T - U)

Learning and Optimal Control:
    π              Control policy
    π*             Optimal control policy
    α, β, γ, ...   Parameters
    g              Instantaneous cost
    h              Terminal cost
    J              Long-term cost / cost-to-go function (value function)
    J*             Optimal cost-to-go function
    e              Eligibility vector
    η              Learning rate

Basic Math:
    E[z], μ_z      Expected value of z
    σ_z^2          Variance of z
    σ_xy, C_xy     Scalar covariance (and covariance matrix) between x and y
    ∂f/∂x          Vector gradient: the m × n matrix whose (i, j) entry is ∂f_i/∂x_j
    δ(z)           Continuous delta-function, defined via ∫ δ(z) dz = 1
    δ[z]           Discrete delta-function, equals 1 when z = 0, zero otherwise
    δ_ij           Shorthand for δ[i - j]
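As a quick illustration of how these symbols fit together, the sketch below writes out the undamped simple pendulum of Chapter 2 in this notation. The specific modeling choices (a point mass m on a massless rod of length l, with the joint angle θ measured from the downward vertical) are assumptions made for this example, not part of the notation table itself.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}

% Illustrative example only: the undamped simple pendulum of Chapter 2,
% assuming a point mass m on a massless rod of length l, with the joint
% angle \theta measured from the downward vertical.
\begin{align*}
  q &= \theta, &
  x &= \begin{bmatrix} q \\ \dot q \end{bmatrix}, &
  u &= \text{applied torque}, \\
  T &= \tfrac{1}{2} m l^2 \dot\theta^2, &
  U &= -m g l \cos\theta, &
  L &= T - U, \\
  H(q) &= m l^2, &
  C(q, \dot q) &= 0, &
  G(q) &= m g l \sin\theta.
\end{align*}

% Substituting into the manipulator form H(q)\ddot{q} + C(q,\dot{q})\dot{q} + G(q) = u
% recovers the familiar pendulum dynamics:
\begin{equation*}
  m l^2 \ddot\theta + m g l \sin\theta = u.
\end{equation*}

\end{document}
```

The same pattern (identify q, form x, write T, U, and L, then read off H, C, and G) is what Appendix A.2 formalizes as the manipulator equations.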

MIT OpenCourseWare
Underactuated Robotics, Spring 2009
For information about citing these materials or our Terms of Use, visit:
