Gaussian with mean ( µ ) and standard deviation ( σ)

Size: px

Start display at page:

Download "Gaussian with mean ( µ ) and standard deviation ( σ)"

Shauna Rice
5 years ago
Views:

2 Slide from Pieter Abbeel Gaussian with mean ( µ ) and standard deviation ( σ) 10/6/16 CSE-571: Robotics

3 X ~ N( µ, σ ) Y ~ N( aµ + b, a σ ) Y = ax + b

4 , ~ ) ( ) ( ), ( ~ ), ( ~ σ σ µ σ σ σ µ σ σ σ σ µ σ µ N X p X p N X N X

5 Picture from [Bishop: Pattern Recognition and Machine Learning, 006] p(x) = Ν(µ,Σ) x b x = Σ = x a x b, µ = µ a µ b Σ aa Σ ab Σ ba Σ bb 1 p(x) = (π ) d/ Σ 1 (x µ)t Σ 1 (x µ) e 1/ x a 10/6/16 CSE-571: Robotics 5

6 Slide from Pieter Abbeel " µ = [1; 0] " Σ = [1 0; 0 1] " µ = [-.5; 0] " Σ = [1 0; 0 1] " µ = [-1; -1.5] " Σ = [1 0; 0 1] 10/6/16 CSE-571: Robotics 6

7 Slide from Pieter Abbeel! µ = [0; 0]! Σ = [1 0 ; 0 1] " µ = [0; 0] " Σ = [.6 0 ; 0.6] " µ = [0; 0] " Σ = [ 0 ; 0 ] 10/6/16 CSE-571: Robotics 7

8 Slide from Pieter Abbeel " µ = [0; 0] " Σ = [1 0; 0 1] " µ = [0; 0] " Σ = [1 0.5; 0.5 1] " µ = [0; 0] " Σ = [1 0.8; 0.8 1] 10/6/16 CSE-571: Robotics 8

9 Slide from Pieter Abbeel " µ = [0; 0] " Σ = [1-0.5 ; ] " µ = [0; 0] " Σ = [1-0.8 ; ] " µ = [0; 0] " Σ = [ ; ] 10/6/16 CSE-571: Robotics 9

Pictures from [Bishop: PRML, 006] p Marginalizing joint distribution results in a Gaussian x a x b = Ν µ a µ b, Σ aa Σ ba Σ ab Σ bb p(x a ) = p( x a, x b )dx b ( ) p(x a ) = Ν µ a, Σ aa Conditioning

10 Pictures from [Bishop: PRML, 006] p Marginalizing joint distribution results in a Gaussian x a x b = Ν µ a µ b, Σ aa Σ ba Σ ab Σ bb p(x a ) = p( x a, x b )dx b ( ) p(x a ) = Ν µ a, Σ aa Conditioning also leads to a Gaussian ( ) p(x a x b ) = Ν µ a b, Σ a b µ a b = µ a + Σ ab Σ 1 bb (x b µ b ) Cross co-variance Prior Variance (b) Σ a b = Σ aa Σ ab Σ 1 bb Σ ba Observed value Prior mean (b) Prior Variance (a) Shrink term (>= 0) 10/6/16 CSE-571: Robotics 10

11 10/6/16 CSE-571: Robotics 11

12 Modeling the relationship between real-valued variables in data Sensor models, dynamics models, stock market etc Two broad classes of models: Parametric: Learn a model of the data, use model to make new predictions Eg: Linear, Non-linear, Neural Networks etc. Non-Parametric: Keep the data around and use it to make new predictions Eg: Nearest Neighbor methods, Locally Weighted Regression, Gaussian Processes etc. 10/6/16 CSE-571: Robotics 1

13 Idea: Summarize data using a learned model: Linear, Polynomial Neural Networks etc y 1 0 Parametric models Computationally efficient, tradeoff complexity vs generalization 1 3 Training set Linear Polynomial 4 Polynomial x 10/6/16 CSE-571: Robotics 13

14 Idea: Use nearest neighbor s prediction (with some interpolation) Non-parametric, keeps all data Ex: 1-NN, NN with linear interpolation Easy. Needs lot of data Best you can do in limit of infinite data Computationally expensive in high dimensions Y Non Parametric models Training set 1 NN NN Linear X 10/6/16 CSE-571: Robotics 14

15 Idea: Interpolate based on close training data Closeness defined using a kernel function Test output is a weighted interpolation of training outputs Locally Weighted Regression, Gaussian Processes Can model arbitrary (smooth) functions Need to keep around some (maybe all) training data Y Smooth Non Parametric models Training set LWR NN GP GP Var X 10/6/16 CSE-571: Robotics 15

16 10/6/16 CSE-571: Robotics 16

17 Non-parametric regression model Distribution over functions Fully specified by training data, mean and covariance functions Covariance given by kernel which measures distance of inputs in kernel space 10/6/16 CSE-571: Robotics 17

18 Given, inputs (x) and targets(y): D= {( x, y ),( x, y ),,( x, y )} = ( X, y) 1 1 GPs model the targets as a noisy function of the inputs: y i = f (x i ) +ε; ε ~ N (0,σ n ) Formally, a GP is a collection of random variables, any finite number of which have a joint Gaussian distribution: f (x) ~ GP(m(x), k(x, x')) m(x) = E[ f (x)] k(x, x') = E[( f (x) m(x))( f (x') m(x'))] 10/6/16 CSE-571: Robotics 18 n n

19 Given a (finite) set of inputs (X), GP models the outputs (y) as jointly Gaussian: m(x 1 ) m(x ) m =! K = m(x n ) P( y X ) = N (m(x ), K(X, X ) +σ n I) k(x 1, x 1 ) k(x 1, x n ) k(x, x 1 )! k(x i, x i )! k(x n, x 1 ) k(x n, x n ) Usually, we assume zero-mean prior Noise Can define other mean functions (constant, polynomials etc) 10/6/16 CSE-571: Robotics 19

20 Covariance matrix (K) is defined through the kernel function: Specifies covariance of the outputs as the function of inputs Example: Squared Exponential Kernel Covariance proportional to distance in input space Similar input points will have similar outputs k(x, x ʹ) = σ f e 1 ( x ʹ x )W ( x x ʹ )T 10/6/16 CSE-571: Robotics 0

21 Pictures from [Bishop: PRML, 006] GP prior: Outputs jointly zero-mean Gaussian: P(y X) = Ν(0, K +σ n I) 10/6/16 CSE-571: Robotics 1

22 Training data: 1 1 Test pair (y unknown): {x *, y * } GP outputs are jointly Gaussian: Conditioning on y: ( ) P( y * x *,y,x) = N µ *,σ * µ * = k * T σ * = k ** k * T ( K +σ n I) 1 y ( K +σ n I) 1 k * k * [i] = k(x *,x i ); k ** = k(x *,x * ) D= {( x, y ),( x, y ),,( x, y )} = ( X, y) P(y, y * X, x * ) = N(µ, Σ); P(y X) = N(0, Κ +σ n I) 10/6/16 CSE-571: Robotics n n p(x a x b ) = Ν ( µ a b, Σ ) a b µ a b = µ a + Σ ab Σ 1 bb (x b µ b ) Σ a b = Σ aa Σ ab Σ 1 bb Σ ba Recall conditional

23 10/6/16 CSE-571: Robotics 3

24 Noise Standard deviation ( σ n) Affects how likely a new observation changes predictions (and covariance) Kernel (choose based on data) SE, Exponential, Matern etc. Kernel hyperparameters: SE kernel: k(x, x ʹ) = σ f e 1 ( x ʹ Length scale (how fast the function changes) Scale factor (how large the function variance is) x )W ( x x ʹ )T 10/6/16 CSE-571: Robotics 4

25 Pictures from [Bishop: PRML, 006] k(x, x ) = θ 0 exp θ 1 x x +θ +θ 3 xt x' 10/6/16 CSE-571: Robotics 5

26 Maximize data log likelihood: θ * = arg max θ p(y X,θ) log p(y X,θ) = 1 yt ( K + σ n I) 1 y 1 log K + σ n ( I) n log π Compute derivatives wrt. params n l Optimize using conjugate gradient descent θ = σ,, σ f 10/6/16 CSE-571: Robotics 6

27 10/6/16 CSE-571: Robotics 7

28 Learn hyperparameters via numerical methods Learn noise model at the same time 10/6/16 CSE-571: Robotics 8

29 System: Commercial blimp envelope with custom gondola XScale based computer with Bluetooth connectivity Two main motors with tail motor (3D control) Ground truth obtained via VICON motion capture system 10/6/16 CSE-571: Robotics 9

1-D state=[pos,rot,transvel,rotvel] Describes evolution of state as ODE Forces / torques considered: buoyancy, gravity, drag, thrust 16 parameters are learned

30 1-D state=[pos,rot,transvel,rotvel] Describes evolution of state as ODE Forces / torques considered: buoyancy, gravity, drag, thrust 16 parameters are learned by optimization on ground truth motion capture data = = ) * ( ) * ( ) ( 1 1 ω ω ω ξ ω ξ J Torques J Mv Forces M H v R v p dt d s e b! 10/6/16 CSE-571: Robotics 30

31 c Δs o c 1 Δs 1 s s 3 o 3 s 1 Use ground truth state to extract: Dynamics data D S = [ s1, c1 ], Δs1, [ s, c], Δs Learn model using Gaussian process regression Learn process noise inherent in system 10/6/16 CSE-571: Robotics 31

32 c 1 Δs 1 s1 Combine GP model with parametric model D X = f ([ s 1, c1 ]) [ s1, c1 ], Δs1 f ([ s1, c1 ]) Advantages Captures aspects of system not considered by parametric model Learns noise model in same way as GP-only models Higher accuracy for same amount of training data s 10/6/16 CSE-571: Robotics 3

33 Dynamics model error Propagation method pos(mm) rot(deg) vel(mm/s) rotvel(deg/s) Param GPonly EGP training points, mean error over 900 test points For dynamics model, 0.5 sec predictions 10/6/16 CSE-571: Robotics 33

34 Heteroscedastic (state dependent) noise Non-stationary GPs Coupled outputs Sparse GPs Online: Decide whether or not to accept new point Remove points Optimize small set of points Classification Laplace approximation No closed-form solution, sampling 10/6/16 CSE-571: Robotics 34

parametric models increases accuracy and reduces need for

35 GPs provide ﬂexible modeling framework Take data noise and uncertainty due to data sparsity into account Combination with parametric models increases accuracy and reduces need for training data Computational complexity is a key problem 10/6/16 CSE-571: Robotics 35

36 Website: GP book: GPLVM: GPDM: Bishop book: 10/6/16 CSE-571: Robotics 36

Computer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)

Computer Vision Group Prof. Daniel Cremers. 2. Regression (cont.) Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori