Learning Manipulation Patterns
1 Outline: Introduction, Dynamic Movement Primitives, Reinforcement Learning. Neuroinformatics Group, February 29, 2012
2-4 Motivation
shift to reactive motion generation using dynamical systems
robustify movement skeletons
What are suitable motion representations? Dynamic Movement Primitives (DMP)
How can we improve by explorative learning? Policy Improvement with Path Integrals (PI²)
5 Goal-Directed Motion: Dynamic Movement Primitives (Ijspeert, Nakanishi, Schaal, ICRA 02)
motion as the evolution of a dynamical system; basis: spring-damper system
τ v̇_t = K(g - x_t) - D v_t, τ ẋ_t = v_t, or equivalently τ ẍ_t = α(β(g - x_t) - ẋ_t)
x_t: current position, ẋ_t: current velocity, g: goal of motion
choose K and D to obtain critical damping
6-7 The spring-damper system generates a basic motion towards the goal: τ ẍ_t = α(β(g - x_t) - ẋ_t)
8-10 New idea: add an external force to represent complex trajectory shapes: τ ẍ_t = α(β(g - x_t) - ẋ_t) + g_tᵀ θ
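The transformation system above can be integrated numerically; here is a minimal Euler-integration sketch in Python with illustrative gains (α = 25, β = α/4 for critical damping; these values are an assumption, not taken from the slides) and a pluggable forcing term:

```python
import numpy as np

def integrate_dmp(x0, g, tau=1.0, alpha=25.0, beta=6.25,
                  forcing=lambda t: 0.0, dt=0.001, T=1.0):
    """Euler-integrate tau * xdd = alpha*(beta*(g - x) - xd) + f(t)."""
    x, xd = float(x0), 0.0
    traj = [x]
    for k in range(int(T / dt)):
        xdd = (alpha * (beta * (g - x) - xd) + forcing(k * dt)) / tau
        xd += dt * xdd
        x += dt * xd
        traj.append(x)
    return np.array(traj)

traj = integrate_dmp(x0=0.0, g=1.0)  # with f = 0 the motion converges to g
```

With a zero forcing term this reduces to the plain spring-damper motion of slides 6-7; a non-trivial forcing callback reshapes the trajectory without affecting convergence.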
11 External Force
f(s) = g_tᵀ θ = (Σ_i θ_i ψ_i(s)) / (Σ_i ψ_i(s)) · (g - x_0) · s
weighted sum of basis functions ψ_i: ψ_i(s) = exp(-h_i (s - c_i)²), Gaussians with centers c_i logarithmically distributed in [0, 1]
figure from Schaal et al., Progress in Brain Research, 2007
12-14 External Force (continued)
f(s) = g_tᵀ θ = (Σ_i θ_i ψ_i(s)) / (Σ_i ψ_i(s)) · (g - x_0) · s, with ψ_i(s) = exp(-h_i (s - c_i)²)
weighted sum of basis functions ψ_i, normalized soft-max style
amplitude scaled by the initial distance to the goal (g - x_0)
influence weighted by the canonical time s → 0
figure from Schaal et al., Progress in Brain Research, 2007
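The forcing term can be transcribed directly; in this sketch the basis parameters (number of basis functions, centers, widths) are illustrative choices, not values from the slides:

```python
import numpy as np

def forcing_term(s, theta, centers, widths, g, x0):
    """f(s) = (sum_i theta_i psi_i(s)) / (sum_i psi_i(s)) * (g - x0) * s,
    with Gaussian basis psi_i(s) = exp(-h_i * (s - c_i)**2)."""
    psi = np.exp(-widths * (s - centers) ** 2)
    return psi.dot(theta) / psi.sum() * (g - x0) * s

n = 10
centers = np.exp(-4.6 * np.linspace(0.0, 1.0, n))  # log-spaced in (0, 1]
widths = np.full(n, 50.0)

# with all weights equal, the normalized basis sum is 1, so f(s) = (g - x0) * s
f = forcing_term(0.5, np.ones(n), centers, widths, g=2.0, x0=0.0)
```

The soft-max normalization makes f a convex combination of the weights, which is why equal weights collapse to the amplitude-and-phase scaling alone.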
15 Canonical System
decouple the external force from the spring-damper evolution via a new phase/time variable s: τ ṡ = -α_s s
s is initially set to 1 and exponentially converges to 0
16 Canonical System with phase stopping
τ ṡ = -α_s s / (1 + α_c (x_actual - x_expected)²)
s is initially set to 1 and exponentially converges to 0; under perturbations the phase slows down, pausing the influence of the force
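The unperturbed canonical system is a one-line integration; α_s = 4.6 is an illustrative choice (so that s(T) ≈ 0.01), not a value given in the slides:

```python
import numpy as np

def canonical_phase(alpha_s=4.6, tau=1.0, dt=0.001, T=1.0):
    """Integrate tau * sdot = -alpha_s * s with s(0) = 1 (Euler)."""
    s, out = 1.0, [1.0]
    for _ in range(int(T / dt)):
        s += dt * (-alpha_s * s) / tau
        out.append(s)
    return np.array(out)

s = canonical_phase()  # decays monotonically from 1 towards 0
```

Because the forcing term is multiplied by s, its influence vanishes as s → 0, which guarantees the spring-damper convergence to the goal.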
17 Properties
τ ẍ_t = α(β(g - x_t) - ẋ_t) + g_tᵀ θ, τ ṡ = -α_s s, f(s) = g_tᵀ θ = (Σ_i θ_i ψ_i(s)) / (Σ_i ψ_i(s)) · (g - x_0) · s
convergence to the goal g
motions are self-similar for different goal or start points
coupling of multiple DOFs through the canonical phase s
adapt τ for temporal scaling
robust to perturbations due to attractor dynamics
decoupling of the basic goal-directed motion from the task-specific trajectory shape
weights θ_i can be learned with linear regression
18 Learning from Demonstration
record the motion x(t), ẋ(t), ẍ(t)
choose τ to match the duration
integrate the canonical system to obtain s(t)
f_target(s) = τ ẍ(t) - α(β(g - x(t)) - ẋ(t))
minimize E = Σ_s (f_target(s) - f(s))² with regression
from Ijspeert, Nakanishi, Schaal, ICRA 02
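The regression step of slide 18 can be sketched with plain least squares (the original work uses locally weighted regression; gains, basis count, and widths below are assumptions for illustration):

```python
import numpy as np

def learn_weights(x, dt, tau=1.0, alpha=25.0, beta=6.25,
                  n_basis=10, alpha_s=4.6, h=50.0):
    """Least-squares fit of theta to f_target(s) = tau*xdd - alpha*(beta*(g-x) - xd)."""
    xd = np.gradient(x, dt)
    xdd = np.gradient(xd, dt)
    g, x0 = x[-1], x[0]
    t = np.arange(len(x)) * dt
    s = np.exp(-alpha_s * t / tau)                         # canonical phase s(t)
    f_target = tau * xdd - alpha * (beta * (g - x) - xd)
    c = np.exp(-alpha_s * np.linspace(0.0, 1.0, n_basis))  # log-spaced centers
    psi = np.exp(-h * (s[:, None] - c[None, :]) ** 2)      # shape (T, n_basis)
    # column i of the design matrix is psi_i(s) / sum_j psi_j(s) * (g - x0) * s
    X = psi / psi.sum(axis=1, keepdims=True) * ((g - x0) * s)[:, None]
    theta, *_ = np.linalg.lstsq(X, f_target, rcond=None)
    return theta

demo = np.sin(np.linspace(0.0, np.pi / 2, 1000))  # toy demonstration from 0 to 1
theta = learn_weights(demo, dt=0.001)
```

Since f is linear in θ, the minimization of E is a single linear solve; no iterative training is needed.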
19 Periodic Motion
replace the canonical system by a limit-cycle oscillator with phase φ and amplitude A: τ φ̇ = 1, φ mod 2π
f(φ, A) = A · (Σ_i θ_i ψ_i(φ)) / (Σ_i ψ_i(φ)), with ψ_i(φ) = exp(h_i (cos(φ - c_i) - 1))
ψ_i: von Mises basis functions, i.e. Gaussians living on a circle
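The periodic forcing term differs from the point-attractor one only in its basis; a short sketch (centers, widths, and weights are illustrative assumptions):

```python
import numpy as np

def periodic_forcing(phi, A, theta, centers, h):
    """f(phi, A) = A * (sum_i theta_i psi_i(phi)) / (sum_i psi_i(phi)),
    von Mises basis psi_i(phi) = exp(h_i * (cos(phi - c_i) - 1))."""
    psi = np.exp(h * (np.cos(phi - centers) - 1.0))
    return A * psi.dot(theta) / psi.sum()

n = 8
centers = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
h = np.full(n, 2.5)
theta = np.ones(n)

f0 = periodic_forcing(0.0, 1.0, theta, centers, h)          # = A for equal weights
f1 = periodic_forcing(2.0 * np.pi, 1.0, theta, centers, h)  # 2*pi-periodic
```

Because the basis depends on φ only through cos(φ - c_i), the forcing term is automatically 2π-periodic, which is exactly the property the limit-cycle oscillator requires.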
20 Periodic Motion: Examples
Figure (from Schaal et al., Progress in Brain Research, 2007): modification of the learned rhythmic drumming pattern (flexion/extension of the right elbow, R_EB). (A) trajectory learned by the rhythmic DMP; (B) temporary modification with A → 2A in Eq. (16); (C) temporary modification with τ → τ/2 in Eqs. (9) and (15); (D) temporary modification with g → g + 1 in Eq. (9) (dotted line). Modified parameters were applied between t = 3 s and t = 7 s. Note that in all modifications the movement patterns do not change qualitatively, and convergence to the new attractor under changed parameters is very rapid.
21 PI²: Robustifying Motions
motion obtained from imitation learning is fragile; robustify it by self-exploration
competing RL methods:
Stefan Schaal: PI², Policy Improvement with Path Integrals
Jan Peters: PoWER, Policy Learning by Weighting Exploration with the Returns
22 Policy Improvement with Path Integrals (PI²) (Evangelos Theodorou, PhD 11)
optimize the shape parameters θ w.r.t. a cost function J
use direct reinforcement learning: exploration directly in the policy parameter space θ
PI² is derived from principles of optimal control; its update rule is based on cost-weighted averaging (next slide)
23-31 PI² algorithm
Input: DMP with initial parameters θ, cost function J
While (cost not converged):
  Explore: sample exploration vectors ε_{i,k} ~ N(0, Σ), set θ_k = θ + ε_k
    execute the DMP τ ẍ_t = α(β(g - x_t) - ẋ_t) + g_tᵀ (θ + ε_{i,k})
    determine the cost J(τ_i) of each resulting trajectory
  Update: weighted averaging with the Boltzmann distribution
    P(τ_{i,k}) = exp(-(1/λ) J(τ_{i,k})) / Σ_{k'} exp(-(1/λ) J(τ_{i,k'}))
    δθ_{t_i} = Σ_{k=1..K} P(τ_{i,k}) ε_{i,k}
    parameter update: θ ← θ + δθ
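The explore/update loop above can be sketched in a few lines. This is a simplified per-update version (full PI² weights costs per time step); the quadratic cost stands in for an actual DMP rollout, and λ, Σ, and K are illustrative choices:

```python
import numpy as np

def pi2_iteration(theta, cost, Sigma, K=20, lam=0.1, rng=None):
    """One PI2-style update: sample eps_k ~ N(0, Sigma), evaluate the cost of
    each perturbed parameter vector, then average the exploration vectors
    with Boltzmann weights P_k ~ exp(-J_k / lam)."""
    rng = rng or np.random.default_rng(0)
    eps = rng.multivariate_normal(np.zeros(len(theta)), Sigma, size=K)
    J = np.array([cost(theta + e) for e in eps])
    w = np.exp(-(J - J.min()) / lam)  # subtract the min for numerical stability
    return theta + (w / w.sum()).dot(eps)

# toy quadratic cost with minimum at [1, -1], a stand-in for a DMP rollout cost
cost = lambda th: float(np.sum((th - np.array([1.0, -1.0])) ** 2))
theta = np.zeros(2)
rng = np.random.default_rng(1)
for _ in range(50):
    theta = pi2_iteration(theta, cost, Sigma=0.1 * np.eye(2), rng=rng)
```

Note that the update is a convex combination of the sampled ε_k, so θ always stays within their convex hull, the "safe update" property highlighted on the next slide.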
32 Some Advantages of PI²
applicable to very high-dimensional spaces (Stulp, F., Buchli, J., Theodorou, E., and Schaal, S. (2010). Reinforcement learning of full-body humanoid motor skills. In 10th IEEE-RAS International Conference on Humanoid Robots; best paper finalist)
no gradient needed: deals with discontinuous, noisy cost functions
the update of θ stays within the convex hull of the ε_k: a safe update rule
CITEC Summer School 2013, Learning - From Physics to Knowledge: Selected Learning Methods
Robert Haschke, Neuroinformatics Group, CITEC, Bielefeld University, September 2013
More informationDesign and Control of Variable Stiffness Actuation Systems
Design and Control of Variable Stiffness Actuation Systems Gianluca Palli, Claudio Melchiorri, Giovanni Berselli and Gabriele Vassura DEIS - DIEM - Università di Bologna LAR - Laboratory of Automation
More informationPROJECT 1 DYNAMICS OF MACHINES 41514
PROJECT DYNAMICS OF MACHINES 454 Theoretical and Experimental Modal Analysis and Validation of Mathematical Models in Multibody Dynamics Ilmar Ferreira Santos, Professor Dr.-Ing., Dr.Techn., Livre-Docente
More information1-DOF Forced Harmonic Vibration. MCE371: Vibrations. Prof. Richter. Department of Mechanical Engineering. Handout 8 Fall 2011
MCE371: Vibrations Prof. Richter Department of Mechanical Engineering Handout 8 Fall 2011 Harmonic Forcing Functions Transient vs. Steady Vibration Follow Palm, Sect. 4.1, 4.9 and 4.10 Harmonic forcing
More informationIdentification of Human Skill based on Feedforward / Feedback Switched Dynamical Model
Identification of Human Skill based on Feedforward / Feedback Switched Dynamical Model Hiroyuki Okuda, Hidenori Takeuchi, Shinkichi Inagaki and Tatsuya Suzuki Department of Mechanical Science and Engineering,
More informationOperational Space Control of Constrained and Underactuated Systems
Robotics: Science and Systems 2 Los Angeles, CA, USA, June 27-3, 2 Operational Space Control of Constrained and Underactuated Systems Michael Mistry Disney Research Pittsburgh 472 Forbes Ave., Suite Pittsburgh,
More informationPolicy Gradient Reinforcement Learning for Robotics
Policy Gradient Reinforcement Learning for Robotics Michael C. Koval mkoval@cs.rutgers.edu Michael L. Littman mlittman@cs.rutgers.edu May 9, 211 1 Introduction Learning in an environment with a continuous
More informationto Motor Adaptation Olivier Sigaud September 24, 2013
An Operational Space Control approach to Motor Adaptation Olivier Sigaud Université Pierre et Marie Curie, PARIS 6 http://people.isir.upmc.fr/sigaud September 24, 2013 1 / 49 Table of contents Introduction
More informationApproximate Dynamic Programming
Approximate Dynamic Programming A. LAZARIC (SequeL Team @INRIA-Lille) Ecole Centrale - Option DAD SequeL INRIA Lille EC-RL Course Value Iteration: the Idea 1. Let V 0 be any vector in R N A. LAZARIC Reinforcement
More informationarxiv: v1 [cs.lg] 13 Dec 2013
Efficient Baseline-free Sampling in Parameter Exploring Policy Gradients: Super Symmetric PGPE Frank Sehnke Zentrum für Sonnenenergie- und Wasserstoff-Forschung, Industriestr. 6, Stuttgart, BW 70565 Germany
More informationReinforcement Learning
Reinforcement Learning Model-Based Reinforcement Learning Model-based, PAC-MDP, sample complexity, exploration/exploitation, RMAX, E3, Bayes-optimal, Bayesian RL, model learning Vien Ngo MLR, University
More informationDYNAMICS OF MACHINERY 41514
DYNAMICS OF MACHINERY 454 PROJECT : Theoretical and Experimental Modal Analysis and Validation of Mathematical Models in Multibody Dynamics Holistic Overview of the Project Steps & Their Conceptual Links
More informationCS885 Reinforcement Learning Lecture 7a: May 23, 2018
CS885 Reinforcement Learning Lecture 7a: May 23, 2018 Policy Gradient Methods [SutBar] Sec. 13.1-13.3, 13.7 [SigBuf] Sec. 5.1-5.2, [RusNor] Sec. 21.5 CS885 Spring 2018 Pascal Poupart 1 Outline Stochastic
More informationCoupling Movement Primitives: Interaction with the Environment and Bimanual Tasks
IEEE T-RO Coupling Movement Primitives: Interaction with the Environment and Bimanual Tasks Andrej Gams, Bojan Nemec, Auke J Ijspeert and Aleš Ude Abstract The framework of dynamic movement primitives
More informationLecture 18: Reinforcement Learning Sanjeev Arora Elad Hazan
COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 18: Reinforcement Learning Sanjeev Arora Elad Hazan Some slides borrowed from Peter Bodik and David Silver Course progress Learning
More informationNeural Networks and Differential Dynamic Programming for Reinforcement Learning Problems
Neural Networks and Differential Dynamic Programming for Reinforcement Learning Problems Akihiko Yamaguchi 1 and Christopher G. Atkeson 1 Abstract We explore a model-based approach to reinforcement learning
More informationAn Introduction to Reinforcement Learning
An Introduction to Reinforcement Learning Shivaram Kalyanakrishnan shivaram@csa.iisc.ernet.in Department of Computer Science and Automation Indian Institute of Science August 2014 What is Reinforcement
More informationInverse Optimality Design for Biological Movement Systems
Inverse Optimality Design for Biological Movement Systems Weiwei Li Emanuel Todorov Dan Liu Nordson Asymtek Carlsbad CA 921 USA e-mail: wwli@ieee.org. University of Washington Seattle WA 98195 USA Google
More information