Learning Manipulation Patterns


1 Learning Manipulation Patterns. Outline: Introduction, Dynamic Movement Primitives, Reinforcement Learning. Neuroinformatics Group, February 29, 2012

2-4 Motivation
- shift to reactive motion generation using dynamical systems
- robustify movement skeletons
Two questions structure the talk:
- What are suitable motion representations? Dynamic Movement Primitives (DMPs)
- How can we improve them by explorative learning? Policy Improvement with Path Integrals (PI²)

5-7 Goal-Directed Motion: Dynamic Movement Primitives (Ijspeert, Nakanishi, Schaal, ICRA 2002)
Motion is represented as the evolution of a dynamical system. The basis is a spring-damper system that generates a basic motion towards the goal:

τv̇_t = K(g − x_t) − D v_t,  τẋ_t = v_t,  or equivalently  τẍ_t = α(β(g − x_t) − ẋ_t)

with x_t the current position, ẋ_t the current velocity, and g the goal of the motion. K and D are chosen such that the system is critically damped.
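The point attractor is easy to check numerically. Below is a minimal sketch (not from the talk; constants such as α = 25 are illustrative) that Euler-integrates the first-order form τv̇ = α(β(g − x) − v), τẋ = v; the choice β = α/4 makes the system critically damped, so x converges to g without overshoot.

```python
# Minimal sketch (illustrative constants): Euler integration of the
# spring-damper system  tau*v_dot = alpha*(beta*(g - x) - v),  tau*x_dot = v.
alpha = 25.0
beta = alpha / 4.0      # critical damping: fastest convergence, no overshoot
tau = 1.0               # temporal scaling: larger tau -> slower motion
g = 1.0                 # goal position
x, v = 0.0, 0.0         # start at rest at x = 0
dt = 0.001

for _ in range(int(3.0 * tau / dt)):
    v_dot = alpha * (beta * (g - x) - v) / tau
    v += v_dot * dt
    x += (v / tau) * dt

print(f"x(3*tau) = {x:.4f}, goal g = {g}")   # x has converged to g
```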

8-10 New idea: add an external force to the spring-damper system to represent complex trajectory shapes:

τẍ_t = α(β(g − x_t) − ẋ_t) + g_t^T θ

11-14 External Force

f(s) = g_t^T θ = (Σ_i θ_i ψ_i(s)) / (Σ_i ψ_i(s)) · (g − x_0) · s

- weighted (soft-max normalized) sum of Gaussian basis functions ψ_i(s) = exp(−h_i (s − c_i)²), with centers c_i logarithmically distributed in [0, 1]
- amplitude scaled by the initial distance to the goal (g − x_0)
- influence weighted by the canonical time s, so the force vanishes as s → 0
(illustration from Schaal et al., Progress in Brain Research, 2007)
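In code, the normalized basis expansion might look as follows; this is a sketch with assumed names and values (the basis count, the center spacing, and the widths h are my choices, not from the slides):

```python
import numpy as np

# Sketch of the forcing term f(s) (basis count, centers, widths assumed):
# Gaussian basis psi_i(s) = exp(-h_i * (s - c_i)^2); log-spaced centers are
# roughly equidistant in time because s itself decays exponentially.
alpha_s, n_basis = 4.0, 10
c = np.exp(-alpha_s * np.linspace(0.0, 1.0, n_basis))   # centers in (0, 1]
h = 1.0 / np.diff(c, append=c[-1] / 2.0) ** 2           # widths from spacing

def forcing(s, theta, g, x0):
    """f(s) = (sum_i theta_i psi_i(s)) / (sum_i psi_i(s)) * (g - x0) * s."""
    psi = np.exp(-h * (s - c) ** 2)
    return (psi @ theta) / psi.sum() * (g - x0) * s

theta = np.zeros(n_basis)            # theta = 0 recovers the plain attractor
print(forcing(0.5, theta, g=1.0, x0=0.0))   # -> 0.0
```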

15-16 Canonical System
To decouple the external force from the spring-damper evolution, a new phase/time variable s replaces explicit time:

τṡ = −α_s s

s is initially set to 1 and exponentially converges to 0. Coupling the phase to the tracking error pauses the influence of the force under perturbations (phase stopping):

τṡ = −α_s s / (1 + α_c (x_actual − x_expected)²)
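A possible implementation of the canonical system, including the error coupling (the division form of the coupling follows the DMP literature; the constants are illustrative):

```python
# Canonical-system sketch (constants illustrative): tau*s_dot = -alpha_s*s,
# optionally slowed down by the tracking error ("phase stopping").
alpha_s, alpha_c, tau, dt = 4.0, 10.0, 1.0, 0.001

def canonical_step(s, x_actual, x_expected):
    # Unperturbed, the phase decays exponentially from 1 towards 0; a large
    # tracking error makes s_dot ~ 0, freezing the phase and with it f(s).
    s_dot = -alpha_s * s / (1.0 + alpha_c * (x_actual - x_expected) ** 2)
    return s + (s_dot / tau) * dt

s = 1.0
for _ in range(1000):                       # one second, no perturbation
    s = canonical_step(s, x_actual=0.0, x_expected=0.0)
print(f"s(1.0) = {s:.4f}")                  # approx exp(-alpha_s) = 0.0183
```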

17 Properties

τẍ_t = α(β(g − x_t) − ẋ_t) + g_t^T θ,  τṡ = −α_s s,  f(s) = g_t^T θ = (Σ_i θ_i ψ_i(s)) / (Σ_i ψ_i(s)) · (g − x_0) · s

- convergence to the goal g (the force term vanishes as s → 0)
- motions are self-similar for different goals or start points
- coupling of multiple DOF through the shared canonical phase s
- adapting τ yields temporal scaling
- robust to perturbations due to the attractor dynamics
- decouples the basic goal-directed motion from the task-specific trajectory shape
- the weights θ_i can be learned with linear regression

18 Learning from Demonstration
- record a motion x(t), ẋ(t), ẍ(t)
- choose τ to match its duration
- integrate the canonical system to obtain s(t)
- compute the target force f_target(s) = τẍ(t) − α(β(g − x(t)) − ẋ(t))
- minimize E = Σ_s (f_target(s) − f(s))² with linear regression
(from Ijspeert, Nakanishi, Schaal, ICRA 2002)
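Since f(s) is linear in θ, the minimization reduces to ordinary least squares. The sketch below (synthetic minimum-jerk demonstration; names, constants, and the basis layout are my assumptions) builds the basis features and solves for θ in closed form, following the slide's definition of f_target:

```python
import numpy as np

# Imitation-learning sketch (synthetic demonstration, assumed constants).
alpha, beta, alpha_s = 25.0, 25.0 / 4.0, 4.0
T, dt = 1.0, 0.001
t = np.arange(0.0, T, dt)
tau = T                                        # tau chosen to match duration

g = 1.0                                        # synthetic minimum-jerk reach
u = t / T
x = g * (10 * u**3 - 15 * u**4 + 6 * u**5)
xd = np.gradient(x, dt)
xdd = np.gradient(xd, dt)

s = np.exp(-alpha_s * t / tau)                 # canonical phase in closed form

# Target force implied by the demonstration (as on the slide):
f_target = tau * xdd - alpha * (beta * (g - x) - xd)

# Basis features: f(s) = (sum_i theta_i psi_i(s)) / (sum psi) * (g - x0) * s
n_basis = 15
c = np.exp(-alpha_s * np.linspace(0.0, 1.0, n_basis))
h = 1.0 / np.diff(c, append=c[-1] / 2.0) ** 2
psi = np.exp(-h * (s[:, None] - c) ** 2)       # shape (time, basis)
Phi = psi / psi.sum(axis=1, keepdims=True) * (g - x[0]) * s[:, None]

theta, *_ = np.linalg.lstsq(Phi, f_target, rcond=None)
rms = np.sqrt(np.mean((Phi @ theta - f_target) ** 2))
print(f"fit residual RMS: {rms:.4f}")
```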

19 Periodic Motion
Replace the canonical system by a limit-cycle oscillator with phase φ and amplitude A:

τφ̇ = 1 (φ taken mod 2π),  f(φ, A) = A · (Σ_i θ_i ψ_i(φ)) / (Σ_i ψ_i(φ)),  ψ_i(φ) = exp(h_i (cos(φ − c_i) − 1))

The ψ_i are von Mises basis functions, i.e. Gaussians living on a circle.
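A brief sketch of the von Mises basis and the rhythmic forcing term (center placement and width are assumptions):

```python
import numpy as np

# Von Mises basis sketch (center placement and width assumed):
# psi_i(phi) = exp(h_i * (cos(phi - c_i) - 1)) peaks at phi = c_i, is
# 2*pi-periodic, and behaves like a Gaussian near its center.
n_basis = 8
c = np.linspace(0.0, 2.0 * np.pi, n_basis, endpoint=False)  # centers on circle
h = 2.5 * np.ones(n_basis)                                  # common width

def rhythmic_forcing(phi, A, theta):
    """f(phi, A) = A * (sum_i theta_i psi_i(phi)) / (sum_i psi_i(phi))."""
    psi = np.exp(h * (np.cos(phi - c) - 1.0))
    return A * (psi @ theta) / psi.sum()

theta = np.ones(n_basis)             # constant weights -> constant force A
print(rhythmic_forcing(np.pi / 3.0, A=2.0, theta=theta))    # -> 2.0
```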

20 Periodic Motion: Examples
[Figure: modification of the learned rhythmic drumming pattern (flexion/extension of the right elbow, R_EB). (A) trajectory learned by the rhythmic DMP; (B) temporary modification with A → 2A; (C) temporary modification with τ → τ/2; (D) temporary modification with g → g + 1. Modified parameters were applied between t = 3 s and t = 7 s. In all modifications the movement patterns do not change qualitatively, and convergence to the new attractor under the changed parameters is very rapid. From Schaal et al., Progress in Brain Research, 2007.]

21 Robustifying Motions
- a motion obtained from imitation learning is fragile
- robustify it by self-exploration
Competing RL methods:
- Stefan Schaal: PI² (Policy Improvement with Path Integrals)
- Jan Peters: PoWER (Policy Learning by Weighting Exploration with the Returns)

22 Policy Improvement with Path Integrals (PI²) (Evangelos Theodorou, PhD thesis 2011)
- optimize the shape parameters θ with respect to a cost function J
- use direct reinforcement learning: exploration takes place directly in the policy parameter space θ
- PI² is derived from principles of optimal control
- its update rule is based on cost-weighted averaging (next slide)

23-31 PI² Algorithm
Input: DMP with initial parameters θ, cost function J

While (cost not converged):
  Explore (k = 1, ..., K):
  - sample exploration vectors ɛ_{i,k} ~ N(0, Σ), giving perturbed parameters θ_k = θ + ɛ_k
  - execute the DMP: τẍ_t = α(β(g − x_t) − ẋ_t) + g_t^T (θ + ɛ_{i,k})
  - determine the cost J(τ_{i,k}) of the resulting trajectory τ_{i,k}
  Update:
  - weighted averaging with a Boltzmann distribution:
      P(τ_{i,k}) = exp(−(1/λ) J(τ_{i,k})) / Σ_k exp(−(1/λ) J(τ_{i,k}))
      δθ_{t_i} = Σ_{k=1}^{K} P(τ_{i,k}) ɛ_{i,k}
  - parameter update: θ ← θ + δθ
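The loop is compact enough to sketch directly. The toy version below replaces "execute DMP and determine cost" by an arbitrary quadratic surrogate and keeps the slide's structure of sampling, Boltzmann weighting, and cost-weighted averaging; it omits the per-time-step bookkeeping of the full PI² derivation, and all names and constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout_cost(theta):
    """Stand-in for 'execute DMP, determine cost J(tau)': an arbitrary
    quadratic cost with optimum at theta = 1 (illustrative only)."""
    return float(np.sum((theta - 1.0) ** 2))

n_params, K, lam, sigma = 10, 20, 0.1, 0.3
theta = np.zeros(n_params)                 # initial shape parameters

for update in range(100):
    # Explore: K perturbed rollouts, eps_k ~ N(0, sigma^2 I).
    eps = rng.normal(0.0, sigma, size=(K, n_params))
    J = np.array([rollout_cost(theta + e) for e in eps])

    # Update: Boltzmann weighting (low cost -> high weight), stabilized
    # by subtracting the minimum cost before exponentiating.
    P = np.exp(-(J - J.min()) / lam)
    P /= P.sum()
    theta = theta + P @ eps                # cost-weighted average of eps

print(f"final cost: {rollout_cost(theta):.6f}")   # close to 0
```

Note how the update stays within the convex hull of the sampled θ + ɛ_k, which is the safety property the next slide highlights.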

32 Some Advantages of PI²
- applicable to very high-dimensional spaces (Stulp, Buchli, Theodorou, Schaal: Reinforcement learning of full-body humanoid motor skills. 10th IEEE-RAS International Conference on Humanoid Robots, 2010; best paper finalist)
- no gradient needed: deals with discontinuous, noisy cost functions
- θ is updated within the convex hull of the sampled ɛ_k, a safe update rule
