Gaussian Processes for Bayesian Filtering

1 Gaussian Processes for Bayesian Filtering
Dieter Fox, University of Washington
Joint work with Jonathan Ko, Brian Ferris, Marc Deisenroth, Ashish Deshpande, Neil Lawrence
Funded by NSF and ONR

2 Bayes Filters
Task: Estimate state of a dynamical system from sensor data and control information
Key problems in robotics: localization, mapping, people and object tracking, activity recognition, POMDPs
Various instantiations / approximations: Kalman filter, EKF, UKF, ADF, particle filters, grid filters
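The predict/update recursion shared by all of the filters named on this slide can be sketched with the simplest instantiation, a grid filter. This is a minimal illustration, not code from the talk: the three-cell world, the transition matrix, and the likelihood values are made-up assumptions.

```python
import numpy as np

def bayes_filter_step(belief, transition, likelihood):
    """One predict/update cycle of a discrete (grid) Bayes filter.

    belief:     (n,) prior probability over n grid cells
    transition: (n, n) matrix, transition[i, j] = p(s_next = i | s = j, u)
    likelihood: (n,) vector, likelihood[i] = p(z | s_next = i)
    """
    predicted = transition @ belief        # prediction: sum over previous states
    posterior = likelihood * predicted     # correction: weight by observation likelihood
    return posterior / posterior.sum()     # normalize to a probability distribution

# Hypothetical 3-cell cyclic world: the control moves the robot one cell
# forward with probability 0.9 (each column of the matrix sums to 1).
belief = np.array([1 / 3, 1 / 3, 1 / 3])
transition = np.array([[0.1, 0.0, 0.9],
                       [0.9, 0.1, 0.0],
                       [0.0, 0.9, 0.1]])
likelihood = np.array([0.8, 0.1, 0.1])     # the sensor favors cell 0
posterior = bayes_filter_step(belief, transition, likelihood)
```

Starting from a uniform belief, the prediction step leaves the belief uniform, so the posterior is just the normalized likelihood. The Kalman, particle, and GP-based filters replace the table-based models here with other representations of the same two steps.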

3 GP-BayesFilters [Ko-F: RSS-08, ARJ-09]
[Figure: graphical model with controls u(k-1), u(k), u(k+1), states s(k-1), s(k), s(k+1), and observations z(k-1), z(k), z(k+1). The GP dynamics model supplies the mean µ and, from σ, the process noise Q(k); the GP observation model supplies the mean µ and, from σ, the observation noise R(k).]
Typical Bayesian filtering:
  Parametric dynamics and observation models
  Approximate posterior via sampling (PF), sigma points (UKF), linearization (EKF), moment matching (ADF)
GP-BayesFilters:
  GP dynamics and observation models
  Noise derived from GP prediction uncertainty
  Can be integrated seamlessly into Bayes filters: EKF, UKF, PF, ADF

4 Learning GP Dynamics and Observation Models
Ground truth training sequence: $S = [s_1, s_2, \ldots, s_n]$, $Z = [z_1, z_2, \ldots, z_n]$, $U = [u_1, u_2, \ldots, u_n]$
Learn observation and dynamics GPs:
  GP observation model: $s_k \rightarrow z_k$
  GP dynamics model: $[s_k, u_k] \rightarrow \Delta s_k = s_{k+1} - s_k$
  EGP dynamics model: $[s_k, u_k] \rightarrow r_k = \Delta s_k - f(s_k, u_k)$
Learn separate GP for each output dimension -> diagonal noise matrix
[Deisenroth-etal] introduced GP-ADFs and EP for smoothing in GP dynamical systems
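The dynamics-model training setup on this slide can be sketched as follows: stack inputs [s_k, u_k], use the state differences as targets, and fit one GP per output dimension. This is a minimal sketch under stated assumptions: `SimpleGP`, `se_kernel`, and `learn_dynamics_gps` are illustrative names, the hyperparameters are fixed by hand rather than learned, and the toy position/velocity system is made up.

```python
import numpy as np

def se_kernel(A, B, length=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)

class SimpleGP:
    """GP regression for one output dimension (fixed, hand-picked hyperparameters)."""
    def __init__(self, X, y, noise_var=1e-2):
        self.X, self.noise_var = X, noise_var
        K = se_kernel(X, X) + noise_var * np.eye(len(X))
        self.K_inv = np.linalg.inv(K)
        self.alpha = self.K_inv @ y

    def predict(self, x):
        """Return the predictive mean and variance at a single input x."""
        k = se_kernel(x[None, :], self.X)[0]
        mean = k @ self.alpha
        var = 1.0 - k @ self.K_inv @ k + self.noise_var
        return mean, var

def learn_dynamics_gps(S, U):
    """Fit one GP per state dimension on inputs [s_k, u_k] and targets s_{k+1} - s_k."""
    X = np.hstack([S[:-1], U[:-1]])
    dS = S[1:] - S[:-1]
    return [SimpleGP(X, dS[:, d]) for d in range(S.shape[1])]

# Toy ground-truth sequence (assumed system: position and velocity driven by a control).
rng = np.random.default_rng(0)
n = 60
U = rng.uniform(-1.0, 1.0, size=(n, 1))
S = np.zeros((n, 2))
for k in range(n - 1):
    pos, vel = S[k]
    S[k + 1] = [pos + 0.1 * vel, 0.9 * vel + 0.1 * U[k, 0]]

gps = learn_dynamics_gps(S, U)
x_near = np.hstack([S[10], U[10]])       # a training input
x_far = np.array([50.0, 50.0, 50.0])     # far from all training data
var_near = [gp.predict(x_near)[1] for gp in gps]
var_far = [gp.predict(x_far)[1] for gp in gps]
```

The predictive variance is small near the training data and reverts to the prior variance far from it. That state-dependent variance is what a GP-BayesFilter uses as its noise term.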

5 GP-PF Propagation
$S_k \rightarrow S_{k+1}$
for $m = 1 \ldots M$: $s^m_{k+1} = GP_\mu(s^m_k, u_k) + \mathrm{sample}(GP_\Sigma(s^m_k))$
Propagate each particle using GP prediction
Sample from GP uncertainty
One GP mean and variance prediction per particle
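The particle propagation step can be sketched as below. To keep the example short, `gp_mean` and `gp_var` are hypothetical closed-form stand-ins for the mean and per-dimension variance of a trained GP; they are assumptions for illustration, not part of the talk.

```python
import numpy as np

def gp_pf_propagate(particles, u, gp_mean, gp_var, rng):
    """Propagate each particle through the GP mean and add noise drawn from the GP variance.

    particles: (M, d) array of states
    gp_mean(s, u) -> (d,) predicted next state
    gp_var(s, u)  -> (d,) predictive variance per state dimension
    """
    propagated = np.empty_like(particles)
    for m, s in enumerate(particles):
        mean = gp_mean(s, u)               # one GP mean prediction per particle
        var = gp_var(s, u)                 # one GP variance prediction per particle
        propagated[m] = mean + rng.normal(0.0, np.sqrt(var))
    return propagated

# Hypothetical stand-ins for a trained GP dynamics model.
def gp_mean(s, u):
    return s + 0.1 * u

def gp_var(s, u):
    return 0.01 * (1.0 + s**2)             # uncertainty grows away from the origin

rng = np.random.default_rng(0)
particles = rng.normal(size=(500, 2))
u = np.array([1.0, 0.0])
next_particles = gp_pf_propagate(particles, u, gp_mean, gp_var, rng)
```

Because the variance is evaluated at each particle's own state, particles in poorly modeled regions spread out more than particles in well-covered regions.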

6 GP-EKF Propagation
$\mu_k, \Sigma_k \rightarrow \mu_{k+1}, \Sigma_{k+1}$
$\mu_{k+1} = GP_\mu(\mu_k)$
$G = \partial GP_\mu(\mu_k) / \partial s_k$
$\Sigma_{k+1} = G \Sigma_k G^T + GP_\Sigma(\mu_k)$
Propagate mean using GP prediction
Use gradient of GP to propagate covariance
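A minimal sketch of this prediction step follows, using a squared-exponential GP whose mean has a closed-form gradient. Several things here are assumptions for illustration: the class name `GPModel`, the fixed hyperparameters shared across output dimensions, the toy linear training data, and the omission of the control input (as in the slide's formulas).

```python
import numpy as np

class GPModel:
    """SE-kernel GP with one output per state dimension and shared, fixed hyperparameters."""
    def __init__(self, X, Y, length=1.0, noise_var=1e-2):
        self.X, self.length, self.noise_var = X, length, noise_var
        K = self.kernel(X) + noise_var * np.eye(len(X))
        self.K_inv = np.linalg.inv(K)
        self.alpha = self.K_inv @ Y                       # (n_train, d)

    def kernel(self, A):
        d2 = ((A[:, None, :] - self.X[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-0.5 * d2 / self.length**2)

    def mean(self, x):
        return self.kernel(x[None, :])[0] @ self.alpha    # GP_mu(x), shape (d,)

    def cov(self, x):
        k = self.kernel(x[None, :])[0]
        var = 1.0 - k @ self.K_inv @ k + self.noise_var
        return var * np.eye(self.alpha.shape[1])          # GP_Sigma(x), diagonal

    def jacobian(self, x):
        k = self.kernel(x[None, :])[0]
        dk = (self.X - x) * k[:, None] / self.length**2   # d k_i / d x, shape (n_train, d)
        return self.alpha.T @ dk                          # d GP_mu / d x, shape (d, d)

def gp_ekf_predict(gp, mu, Sigma):
    """EKF prediction with the GP mean as dynamics and the GP variance as process noise."""
    mu_next = gp.mean(mu)
    G = gp.jacobian(mu)
    Sigma_next = G @ Sigma @ G.T + gp.cov(mu)
    return mu_next, Sigma_next

# Toy training data: next states under an assumed linear system.
rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(40, 2))
A = np.array([[1.0, 0.1], [0.0, 0.9]])
Y = X @ A.T
gp = GPModel(X, Y)
mu, Sigma = np.array([0.5, -0.3]), 0.1 * np.eye(2)
mu_next, Sigma_next = gp_ekf_predict(gp, mu, Sigma)
```

Only one GP mean, one gradient, and one variance evaluation are needed per step, which is why the EKF variant is the cheapest of the three propagation schemes.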

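7 GP-UKF Propagation
$\mu_k, \Sigma_k \rightarrow \mu_{k+1}, \Sigma_{k+1}$
$\chi_k = \left(\mu_k,\ \mu_k + \gamma\sqrt{\Sigma_k},\ \mu_k - \gamma\sqrt{\Sigma_k}\right)$
for $i = 0 \ldots 2n$: $\chi^i_{k+1} = GP_\mu(\chi^i_k)$
$\mu_{k+1} = \sum_{i=0}^{2n} \omega^i_m \chi^i_{k+1}$
$\Sigma_{k+1} = \sum_{i=0}^{2n} \omega^i_c (\chi^i_{k+1} - \mu_{k+1})(\chi^i_{k+1} - \mu_{k+1})^T + GP_\Sigma(\mu_k)$
Propagate each sigma point using GP prediction
2n+1 sigma points -> 2n+1 GP mean predictions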
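The sigma-point propagation can be sketched as follows. The slide does not specify the weights, so this sketch assumes the basic unscented transform with a single parameter `kappa` and identical mean and covariance weights. `gp_mean` and `gp_var` are hypothetical stand-ins for a trained GP.

```python
import numpy as np

def gp_ukf_predict(mu, Sigma, gp_mean, gp_var, kappa=1.0):
    """UKF prediction: push 2n+1 sigma points through the GP mean, add GP noise at mu."""
    n = len(mu)
    L = np.linalg.cholesky((n + kappa) * Sigma)      # gamma * sqrt(Sigma), gamma^2 = n + kappa
    chi = np.vstack([mu, mu + L.T, mu - L.T])        # (2n+1, n) sigma points
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    w[0] = kappa / (n + kappa)                       # weight of the central point
    chi_next = np.array([gp_mean(x) for x in chi])   # one GP mean prediction per sigma point
    mu_next = w @ chi_next
    diff = chi_next - mu_next
    Sigma_next = (w[:, None] * diff).T @ diff + np.diag(gp_var(mu))
    return mu_next, Sigma_next

# Hypothetical stand-ins for a trained GP: linear mean, constant per-dimension variance.
A = np.array([[1.0, 0.1], [0.0, 0.9]])
gp_mean = lambda x: A @ x
gp_var = lambda x: np.array([0.01, 0.02])

mu = np.array([1.0, 2.0])
Sigma = np.array([[0.5, 0.1], [0.1, 0.3]])
mu_next, Sigma_next = gp_ukf_predict(mu, Sigma, gp_mean, gp_var)
```

With a linear mean function the unscented transform is exact, so the result matches the Kalman prediction `A @ mu` and `A @ Sigma @ A.T + Q`. With a real GP mean the sigma points capture nonlinearity without requiring a gradient.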

8 From EKF to GP-EKF

9 WiFi-Based Location Estimation [Ferris-Haehnel-Fox: RSS-06]
[Figure panels: Mean, Variance]
Similar to [Schwaighofer-etal: NIPS-03]

10 Building Model

11 Tracking Example

12 WiFi-SLAM: Mapping without Ground Truth [Ferris-F-Lawrence: IJCAI-07]
Using GPLVMs

13 Blimp Testbed
Task: Track a blimp with two webcams
Baseline: Parametric model that takes drag, thrust, gravity, etc. into account
GP-BayesFilters and parametric model trained on ground truth data obtained with Vicon motion capture system

14 GP-UKF Tracking Example
Blue ellipses: sigma points projected into observation space
Green ellipse: Mean state estimate

15 Tracking Results
Percentage reduction in RMS over parametric baseline
Cross validation with 900 timesteps for training
hetgp: Heteroscedastic GP with variable noise [Kersting-etal: ICML-07]
sparsegp: sparsified to 50 active points [Snelson-Ghahramani: NIPS-06]

16 Dealing with Training Data Sparsity
[Figure panels: Full process model tracking; No right turn process model tracking]
Training data for right turns removed

17 Heteroscedastic GP
[Figure panels: Heterosc. GP, Regular GP]
Grey shading indicates tail motor power

18 GP-UKF Issue
Simulated robot moving in circuit while observing landmarks
Sigma points in region of low training sample density
Poor GP prediction leads to large UKF tracking error
[Figure: training data density, landmarks, and sigma points]
[Table: poserr and velerr for GPUKF, GPEKF, GPPF]

19 Going Latent
Sometimes ground truth states are not or only partially available
Instead of optimizing over GP hyperparameters only, optimize over latent states S as well

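20 GP Latent Variable Models [Ko-F: RSS-09, ARJ-10]
[Figure: chain of latent states s_1, ..., s_n with controls u_1, ..., u_n and observations z_1, ..., z_n, linked by GP dynamics and GP observation models]
Latent variable models [Lawrence: NIPS-03, Wang-etal: PAMI-08]
Learn latent states and GPs in one optimization:
$\arg\max_{S, \Theta_Z, \Theta_S} \log p(S, \Theta_Z, \Theta_S \mid Z, U, \hat{S}) = \log p(Z \mid S, \Theta_Z) + \log p(S \mid U, \Theta_S) + \log p(S \mid \hat{S}) + \log p(\Theta_Z) + \log p(\Theta_S) + \mathrm{const}$
Can take noisy labels $\hat{S}$ into account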

21 Slotcar Testbed
Track contains banked curves, elevation changes
Custom IMU with gyros and accelerometers built by Intel Research Seattle
Observations very noisy, perceptual aliasing

22 Predictive Capability
[Figure axes: observation error vs. lookahead [steps]]
Latent space dimensionality: GPBFL 3D, HSE-HMM 20D
HSE-HMM [Song-etal: ICML-10] much more efficient
GPBFL optimization can incorporate noisy labels

23 Simple Trajectory Replay
[Figure: GP graphical model with controls u_1, u_2, latent states s_1, s_2, and observations z_1, z_2]
Learning:
  Human demonstrates control
  Learn latent states using GPBF-Learn
  Learn mapping from state to control
Replay:
  Track state using GP-BayesFilter
  Use control given by control GP

24 Trajectory Replay

25 Time Alignment
1D latent position vs. ground truth track position
Blue indicates GPBFL alignment (multiple laps)

26 Learning from Noisy State Labels
[Figure axes: position along track (cm) vs. frame #]
1D latent space indicates car position on track
Learn position of car when initialized with noisy weak labels
Shading indicates control values (darker is stronger)

27 Comparison with Subspace Identification
[Figure axes: observation vs. time]
Simulated system with 1D observation and 1D control input
Prediction errors:
  N4SID [Overschee-94]: 8.08
  KCCA subspace identification method [Kawahara-etal: NIPS-07]: 6.42
  GPBF-Learn:

28 ACT Hand Control [Deshpande-Ko-F-Matsuoka: IJRR-13]
1. Investigation of muscle-joint kinematical relationship
2. How to control joints with muscles?

29 ACT Hand Tendon Arrangements
Tendon hood structure for extensors
Critical for preserving hand functionality
Slides over the bones and joints
We have non-linear, non-constant relationships between muscles and joints [Wilkinson et al, ICRA 03]

30 Determine Muscle-Joint Kinematics
Move finger in its ranges of motion
Record joint angle and muscle excursion data
Determine mappings using joint and muscle data
Determine moment arm matrix using mapping functions
Muscle lengths $l$ as functions of joint angles $\theta$: $l_j = f_j(\theta)$, $j = 1, \ldots, 6$
$\dot{l} = R(\theta)\,\dot{\theta}$, with non-constant moment arm matrix $R_{ij} = \partial l_i / \partial \theta_j = \partial f_i / \partial \theta_j$
[Deshpande et al, BioRob 08, J Biomech 09]

31 Force Optimized Joint Control
Determine the desired joint torques
Desired joint positions
Finger dynamics
Determine muscle forces
$\tau_{\mathrm{joint}} = R^T F_{\mathrm{muscle}}$
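The moment arm matrix and the torque relation from the two slides above can be illustrated numerically. This is only a sketch: `muscle_lengths` is a made-up mapping for 3 muscles and 2 joints (the talk uses 6 muscles and learned mappings), and the Jacobian is taken by central differences instead of differentiating a fitted model. It maps given muscle forces to joint torques; it does not perform the force optimization itself.

```python
import numpy as np

def muscle_lengths(theta):
    """Hypothetical mapping l = f(theta) for 3 muscles and 2 joints (not the ACT Hand's)."""
    t0, t1 = theta
    return np.array([1.0 * t0 + 0.2 * t0**2,
                     0.8 * t1,
                     -0.5 * t0 - 0.6 * t1])

def moment_arm_matrix(f, theta, eps=1e-6):
    """R[i, j] = d f_i / d theta_j, estimated by central differences."""
    theta = np.asarray(theta, dtype=float)
    cols = []
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        cols.append((f(theta + step) - f(theta - step)) / (2.0 * eps))
    return np.column_stack(cols)

theta = np.array([0.3, -0.2])
R = moment_arm_matrix(muscle_lengths, theta)   # depends on theta: non-constant moment arms
F_muscle = np.array([2.0, 1.0, 0.5])           # assumed muscle forces
tau_joint = R.T @ F_muscle                     # joint torques produced by those forces
```

Since the first muscle's length is quadratic in the first joint angle, its moment arm changes with the joint configuration. This is the non-constant behavior the slides refer to.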

32 Force Optimized Joint Control
Better position tracking than with polynomial fit
Motions are not smooth

33 GP-Based Control

34 RL with GP Dynamics Models
So far: GPs for filtering and prediction, subspace id, trajectory replay, hand control
Now: Incorporation of GP models into RL

35 PILCO: Probabilistic Inference for Learning Control [Deisenroth-etal, ICML-11, RSS-11]
Model-based policy search to minimize given cost function
Policy: mapping from state to control
Rollout: plan using current policy and GP dynamics model
Policy parameter update via CG/BFGS
Highly data efficient

36 Model Learning and Approximate Inference
[Figure panels: Gaussian Process Forward Model, mapping $(x_{t-1}, u_{t-1})$ to $x_t$; Approximate Inference for Policy Learning]
Probabilistic GP model consistently describes model uncertainties
Long-term planning requires approximate inference: moment matching
Model uncertainties are integrated out analytically (as opposed to MC [Bagnell-00])
Deisenroth-etal also introduced GP-ADFs and EP for smoothing in GP dynamical systems

37 Controlling a Low-Cost Robotic Manipulator
Low-cost system ($500 for robot arm and Kinect)
Very noisy
No sensor information about robot's joint configuration used
Goal: Learn to stack tower of 5 blocks from scratch
Kinect camera for tracking block in end-effector
State: coordinates (3D) of block center (from Kinect camera)
4 controlled DoF
20 learning trials for stacking 5 blocks (5 seconds long each)
Account for system noise, e.g., robot arm, image processing

38 Collision Avoidance
[Figure axes: x dist. to target (in m), y dist. to target (in m)]
Use valuable prior information about obstacles if available
Incorporation into planning -> penalize in cost function

39 Collision Avoidance Results
Experimental Setup
Training runs (during learning) with collisions
Cautious learning and exploration (rather safe than risky-successful)
Learning slightly slower, but with significantly fewer collisions during training
Average collision reduction (during training): 32.5% -> 0.5%

40 Summary
GPs provide flexible modeling framework
Take data noise and uncertainty due to data sparsity into account
Seamless integration into Bayes filters
Combination with parametric models increases accuracy and reduces amount of training data
Subspace identification via extended GPLVMs
Data efficient RL
Computational complexity is a key problem
Advances in GPs and GPLVMs can be leveraged

Gaussian with mean ($\mu$) and standard deviation ($\sigma$)
Slide from Pieter Abbeel (CSE-571: Robotics, 10/6/16)
$X \sim N(\mu, \sigma^2)$ and $Y = aX + b$ imply $Y \sim N(a\mu + b, a^2\sigma^2)$
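As a small worked instance of the linear-transformation rule for a Gaussian: the mean transforms as a·µ + b, and the standard deviation scales by |a| (so the variance scales by a²). The function name and the numbers are illustrative assumptions.

```python
def linear_transform_gaussian(mu, sigma, a, b):
    """Mean and standard deviation of Y = a*X + b for X with mean mu and std sigma."""
    return a * mu + b, abs(a) * sigma

mean_y, std_y = linear_transform_gaussian(mu=1.0, sigma=2.0, a=-3.0, b=0.5)
```

For these values the mean is -3·1 + 0.5 = -2.5 and the standard deviation is 3·2 = 6. The offset b shifts the mean only, and the sign of a does not affect the spread.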