Learning Manipulation Patterns


1 Learning Manipulation Patterns. Outline: Introduction, Dynamic Movement Primitives, Reinforcement Learning. Neuroinformatics Group, February 29, 2012

2-4 Motivation
- shift to reactive motion generation using dynamical systems
- robustify movement skeletons
Two questions structure the talk:
- What are suitable motion representations? Dynamic Movement Primitives (DMPs)
- How can we improve them by explorative learning? Policy Improvement with Path Integrals (PI²)

5-7 Goal-Directed Motion: Dynamic Movement Primitives (Ijspeert, Nakanishi, Schaal, ICRA 2002)
Motion is represented as the evolution of a dynamical system. The basis is a spring-damper system that generates a basic motion towards the goal:

τv̇_t = K(g − x_t) − D v_t,  τẋ_t = v_t,  or equivalently  τẍ_t = α(β(g − x_t) − ẋ_t)

with x_t the current position, ẋ_t the current velocity, and g the goal of the motion. K and D are chosen such that the system is critically damped.
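The point attractor is easy to check numerically. Below is a minimal sketch (not from the talk; constants such as α = 25 are illustrative) that Euler-integrates the first-order form τv̇ = α(β(g − x) − v), τẋ = v; the choice β = α/4 makes the system critically damped, so x converges to g without overshoot.

```python
# Minimal sketch (illustrative constants): Euler integration of the
# spring-damper system  tau*v_dot = alpha*(beta*(g - x) - v),  tau*x_dot = v.
alpha = 25.0
beta = alpha / 4.0      # critical damping: fastest convergence, no overshoot
tau = 1.0               # temporal scaling: larger tau -> slower motion
g = 1.0                 # goal position
x, v = 0.0, 0.0         # start at rest at x = 0
dt = 0.001

for _ in range(int(3.0 * tau / dt)):
    v_dot = alpha * (beta * (g - x) - v) / tau
    v += v_dot * dt
    x += (v / tau) * dt

print(f"x(3*tau) = {x:.4f}, goal g = {g}")   # x has converged to g
```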

8-10 New idea: add an external force to the spring-damper system to represent complex trajectory shapes:

τẍ_t = α(β(g − x_t) − ẋ_t) + g_t^T θ

11-14 External Force

f(s) = g_t^T θ = (Σ_i θ_i ψ_i(s)) / (Σ_i ψ_i(s)) · (g − x_0) · s

- weighted (soft-max normalized) sum of Gaussian basis functions ψ_i(s) = exp(−h_i (s − c_i)²), with centers c_i logarithmically distributed in [0, 1]
- amplitude scaled by the initial distance to the goal (g − x_0)
- influence weighted by the canonical time s, so the force vanishes as s → 0
(illustration from Schaal et al., Progress in Brain Research, 2007)
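In code, the normalized basis expansion might look as follows; this is a sketch with assumed names and values (the basis count, the center spacing, and the widths h are my choices, not from the slides):

```python
import numpy as np

# Sketch of the forcing term f(s) (basis count, centers, widths assumed):
# Gaussian basis psi_i(s) = exp(-h_i * (s - c_i)^2); log-spaced centers are
# roughly equidistant in time because s itself decays exponentially.
alpha_s, n_basis = 4.0, 10
c = np.exp(-alpha_s * np.linspace(0.0, 1.0, n_basis))   # centers in (0, 1]
h = 1.0 / np.diff(c, append=c[-1] / 2.0) ** 2           # widths from spacing

def forcing(s, theta, g, x0):
    """f(s) = (sum_i theta_i psi_i(s)) / (sum_i psi_i(s)) * (g - x0) * s."""
    psi = np.exp(-h * (s - c) ** 2)
    return (psi @ theta) / psi.sum() * (g - x0) * s

theta = np.zeros(n_basis)            # theta = 0 recovers the plain attractor
print(forcing(0.5, theta, g=1.0, x0=0.0))   # -> 0.0
```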

15-16 Canonical System
To decouple the external force from the spring-damper evolution, a new phase/time variable s replaces explicit time:

τṡ = −α_s s

s is initially set to 1 and exponentially converges to 0. Coupling the phase to the tracking error pauses the influence of the force under perturbations (phase stopping):

τṡ = −α_s s / (1 + α_c (x_actual − x_expected)²)
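A possible implementation of the canonical system, including the error coupling (the division form of the coupling follows the DMP literature; the constants are illustrative):

```python
# Canonical-system sketch (constants illustrative): tau*s_dot = -alpha_s*s,
# optionally slowed down by the tracking error ("phase stopping").
alpha_s, alpha_c, tau, dt = 4.0, 10.0, 1.0, 0.001

def canonical_step(s, x_actual, x_expected):
    # Unperturbed, the phase decays exponentially from 1 towards 0; a large
    # tracking error makes s_dot ~ 0, freezing the phase and with it f(s).
    s_dot = -alpha_s * s / (1.0 + alpha_c * (x_actual - x_expected) ** 2)
    return s + (s_dot / tau) * dt

s = 1.0
for _ in range(1000):                       # one second, no perturbation
    s = canonical_step(s, x_actual=0.0, x_expected=0.0)
print(f"s(1.0) = {s:.4f}")                  # approx exp(-alpha_s) = 0.0183
```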

17 Properties

τẍ_t = α(β(g − x_t) − ẋ_t) + g_t^T θ,  τṡ = −α_s s,  f(s) = g_t^T θ = (Σ_i θ_i ψ_i(s)) / (Σ_i ψ_i(s)) · (g − x_0) · s

- convergence to the goal g (the force term vanishes as s → 0)
- motions are self-similar for different goals or start points
- coupling of multiple DOF through the shared canonical phase s
- adapting τ yields temporal scaling
- robust to perturbations due to the attractor dynamics
- decouples the basic goal-directed motion from the task-specific trajectory shape
- the weights θ_i can be learned with linear regression

18 Learning from Demonstration
- record a motion x(t), ẋ(t), ẍ(t)
- choose τ to match its duration
- integrate the canonical system to obtain s(t)
- compute the target force f_target(s) = τẍ(t) − α(β(g − x(t)) − ẋ(t))
- minimize E = Σ_s (f_target(s) − f(s))² with linear regression
(from Ijspeert, Nakanishi, Schaal, ICRA 2002)
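Since f(s) is linear in θ, the minimization reduces to ordinary least squares. The sketch below (synthetic minimum-jerk demonstration; names, constants, and the basis layout are my assumptions) builds the basis features and solves for θ in closed form, following the slide's definition of f_target:

```python
import numpy as np

# Imitation-learning sketch (synthetic demonstration, assumed constants).
alpha, beta, alpha_s = 25.0, 25.0 / 4.0, 4.0
T, dt = 1.0, 0.001
t = np.arange(0.0, T, dt)
tau = T                                        # tau chosen to match duration

g = 1.0                                        # synthetic minimum-jerk reach
u = t / T
x = g * (10 * u**3 - 15 * u**4 + 6 * u**5)
xd = np.gradient(x, dt)
xdd = np.gradient(xd, dt)

s = np.exp(-alpha_s * t / tau)                 # canonical phase in closed form

# Target force implied by the demonstration (as on the slide):
f_target = tau * xdd - alpha * (beta * (g - x) - xd)

# Basis features: f(s) = (sum_i theta_i psi_i(s)) / (sum psi) * (g - x0) * s
n_basis = 15
c = np.exp(-alpha_s * np.linspace(0.0, 1.0, n_basis))
h = 1.0 / np.diff(c, append=c[-1] / 2.0) ** 2
psi = np.exp(-h * (s[:, None] - c) ** 2)       # shape (time, basis)
Phi = psi / psi.sum(axis=1, keepdims=True) * (g - x[0]) * s[:, None]

theta, *_ = np.linalg.lstsq(Phi, f_target, rcond=None)
rms = np.sqrt(np.mean((Phi @ theta - f_target) ** 2))
print(f"fit residual RMS: {rms:.4f}")
```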

19 Periodic Motion
Replace the canonical system by a limit-cycle oscillator with phase φ and amplitude A:

τφ̇ = 1 (φ taken mod 2π),  f(φ, A) = A · (Σ_i θ_i ψ_i(φ)) / (Σ_i ψ_i(φ)),  ψ_i(φ) = exp(h_i (cos(φ − c_i) − 1))

The ψ_i are von Mises basis functions, i.e. Gaussians living on a circle.
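A brief sketch of the von Mises basis and the rhythmic forcing term (center placement and width are assumptions):

```python
import numpy as np

# Von Mises basis sketch (center placement and width assumed):
# psi_i(phi) = exp(h_i * (cos(phi - c_i) - 1)) peaks at phi = c_i, is
# 2*pi-periodic, and behaves like a Gaussian near its center.
n_basis = 8
c = np.linspace(0.0, 2.0 * np.pi, n_basis, endpoint=False)  # centers on circle
h = 2.5 * np.ones(n_basis)                                  # common width

def rhythmic_forcing(phi, A, theta):
    """f(phi, A) = A * (sum_i theta_i psi_i(phi)) / (sum_i psi_i(phi))."""
    psi = np.exp(h * (np.cos(phi - c) - 1.0))
    return A * (psi @ theta) / psi.sum()

theta = np.ones(n_basis)             # constant weights -> constant force A
print(rhythmic_forcing(np.pi / 3.0, A=2.0, theta=theta))    # -> 2.0
```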

20 Periodic Motion: Examples
[Figure: modification of the learned rhythmic drumming pattern (flexion/extension of the right elbow, R_EB). (A) trajectory learned by the rhythmic DMP; (B) temporary modification with A → 2A; (C) temporary modification with τ → τ/2; (D) temporary modification with g → g + 1. Modified parameters were applied between t = 3 s and t = 7 s. In all modifications the movement patterns do not change qualitatively, and convergence to the new attractor under the changed parameters is very rapid. From Schaal et al., Progress in Brain Research, 2007.]

21 Robustifying Motions
- a motion obtained from imitation learning is fragile
- robustify it by self-exploration
Competing RL methods:
- Stefan Schaal: PI² (Policy Improvement with Path Integrals)
- Jan Peters: PoWER (Policy Learning by Weighting Exploration with the Returns)

22 Policy Improvement with Path Integrals (PI²) (Evangelos Theodorou, PhD thesis 2011)
- optimize the shape parameters θ with respect to a cost function J
- use direct reinforcement learning: exploration takes place directly in the policy parameter space θ
- PI² is derived from principles of optimal control
- its update rule is based on cost-weighted averaging (next slide)

23-31 PI² Algorithm
Input: DMP with initial parameters θ, cost function J

While (cost not converged):
  Explore (k = 1, ..., K):
  - sample exploration vectors ɛ_{i,k} ~ N(0, Σ), giving perturbed parameters θ_k = θ + ɛ_k
  - execute the DMP: τẍ_t = α(β(g − x_t) − ẋ_t) + g_t^T (θ + ɛ_{i,k})
  - determine the cost J(τ_{i,k}) of the resulting trajectory τ_{i,k}
  Update:
  - weighted averaging with a Boltzmann distribution:
      P(τ_{i,k}) = exp(−(1/λ) J(τ_{i,k})) / Σ_k exp(−(1/λ) J(τ_{i,k}))
      δθ_{t_i} = Σ_{k=1}^{K} P(τ_{i,k}) ɛ_{i,k}
  - parameter update: θ ← θ + δθ
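The loop is compact enough to sketch directly. The toy version below replaces "execute DMP and determine cost" by an arbitrary quadratic surrogate and keeps the slide's structure of sampling, Boltzmann weighting, and cost-weighted averaging; it omits the per-time-step bookkeeping of the full PI² derivation, and all names and constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout_cost(theta):
    """Stand-in for 'execute DMP, determine cost J(tau)': an arbitrary
    quadratic cost with optimum at theta = 1 (illustrative only)."""
    return float(np.sum((theta - 1.0) ** 2))

n_params, K, lam, sigma = 10, 20, 0.1, 0.3
theta = np.zeros(n_params)                 # initial shape parameters

for update in range(100):
    # Explore: K perturbed rollouts, eps_k ~ N(0, sigma^2 I).
    eps = rng.normal(0.0, sigma, size=(K, n_params))
    J = np.array([rollout_cost(theta + e) for e in eps])

    # Update: Boltzmann weighting (low cost -> high weight), stabilized
    # by subtracting the minimum cost before exponentiating.
    P = np.exp(-(J - J.min()) / lam)
    P /= P.sum()
    theta = theta + P @ eps                # cost-weighted average of eps

print(f"final cost: {rollout_cost(theta):.6f}")   # close to 0
```

Note how the update stays within the convex hull of the sampled θ + ɛ_k, which is the safety property the next slide highlights.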

32 Some Advantages of PI²
- applicable to very high-dimensional spaces (Stulp, Buchli, Theodorou, Schaal: Reinforcement learning of full-body humanoid motor skills. 10th IEEE-RAS International Conference on Humanoid Robots, 2010; best paper finalist)
- no gradient needed: deals with discontinuous, noisy cost functions
- θ is updated within the convex hull of the sampled ɛ_k, a safe update rule
