Latent state estimation using control theory

1 Latent state estimation using control theory

Bert Kappen
SNN, Donders Institute, Radboud University, Nijmegen
Gatsby Unit, UCL, London

August 3, 2017

with Hans Christian Ruiz

2 Smoothing problem

Given observed data y_{1:T}, estimate (statistics of) the marginal posteriors p(x_t | y_{1:T}), with

p(x_{1:T} | y_{1:T}) = p(x_{1:T}) p(y_{1:T} | x_{1:T}) / p(y_{1:T}),    p(y_{1:T}) = Σ_{x_{1:T}} p(x_{1:T}) p(y_{1:T} | x_{1:T})

NB: this is different from the filtering problem p(x_t | y_{1:t}).

Common approaches:
- Kalman filtering: exact for linear dynamics and Gaussian observations (sketched below)
- Variational methods, EP, extended Kalman filtering: valid for simple posterior distributions
- Particle filtering, SMC: generic, but inefficient
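For the linear-Gaussian case, exact smoothing is a Kalman filter followed by a Rauch-Tung-Striebel backward pass. A minimal scalar sketch; the model constants (A, Q, R) are illustrative assumptions, not values from the slides:

```python
import numpy as np

def kalman_rts(y, A=0.95, Q=0.1, R=0.5, m0=0.0, P0=1.0):
    """Exact filtered and smoothed marginals for the scalar model
    x_t = A x_{t-1} + N(0, Q),  y_t = x_t + N(0, R)."""
    T = len(y)
    m_f, P_f = np.zeros(T), np.zeros(T)
    m, P = m0, P0
    for t in range(T):                        # forward (filtering) pass
        m, P = A * m, A * A * P + Q           # predict
        K = P / (P + R)                       # Kalman gain
        m, P = m + K * (y[t] - m), (1 - K) * P
        m_f[t], P_f[t] = m, P
    m_s, P_s = m_f.copy(), P_f.copy()
    for t in range(T - 2, -1, -1):            # backward (RTS smoothing) pass
        P_pred = A * A * P_f[t] + Q
        G = A * P_f[t] / P_pred
        m_s[t] = m_f[t] + G * (m_s[t + 1] - A * m_f[t])
        P_s[t] = P_f[t] + G * G * (P_s[t + 1] - P_pred)
    return m_f, P_f, m_s, P_s

y = np.random.default_rng(1).standard_normal(50)   # dummy observations
m_f, P_f, m_s, P_s = kalman_rts(y)                 # note P_s <= P_f
```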

4 Time series inference

Prior process p(x_{1:T} | x_0): dX_t = dW_t, X_0 = 0. Observation at the end time only: p(y_T | x_T) ∝ exp(−β x_T²).

[Figure] Red: p(x_t | y_{1:t}, x_0) (exact). Blue: p(x_t | y_{1:t}, x_0) (PF). Black: p(x_t | y_{1:T}, x_0) (exact).

5 Time series inference

Prior process p(x_{1:T} | x_0): dX_t = dW_t, X_0 = 0. Observation at the end time only: p(y_T | x_T) ∝ exp(−β x_T²).

[Figure] Red: p(x_t | y_{1:t}, x_0) (exact). Blue: p(x_t | y_{1:t}, x_0) (PF + resampling). Black: p(x_t | y_{1:T}, x_0) (exact).

6 Time series inference

Prior process p(x_{1:T} | x_0): dX_t = dW_t, X_0 = 0. Observation at the end time only: p(y_T | x_T) ∝ exp(−β x_T²).

Samples from the filtered distribution do not represent the smoothed distribution well.
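A minimal simulation of this effect (the values of N, T and β are assumptions; the exact settings behind the figures are not given in the transcription): forward particles represent the filtered distribution, but after reweighting by the end-time likelihood only a fraction of them carries the smoothed distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, dt, beta = 1000, 1.0, 0.01, 5.0
steps = int(T / dt)

X = np.zeros((N, steps + 1))                  # N Brownian paths from x0 = 0
for k in range(steps):
    X[:, k + 1] = X[:, k] + np.sqrt(dt) * rng.standard_normal(N)

logw = -beta * X[:, -1] ** 2                  # end-time observation weights
w = np.exp(logw - logw.max())
w /= w.sum()
print("ESS:", 1.0 / np.sum(w ** 2), "of", N)

mid = steps // 2                              # smoothed marginal at t = T/2
m = np.average(X[:, mid], weights=w)
v = np.average((X[:, mid] - m) ** 2, weights=w)
print("filtered mean/var:", X[:, mid].mean(), X[:, mid].var())
print("smoothed mean/var:", m, v)
```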

7 Control approach to time-series inference

Outline:
- Introduction to control theory and path integral control theory
- Optimal control ⇔ optimal sampling
- Adaptive importance sampler: estimate an improved sampler from self-generated data
- fMRI application

8 Control theory

Consider a stochastic dynamical system

dX_t = f(X_t, u) dt + dW_t,    ⟨dW_{t,i} dW_{t,j}⟩ = ν_{ij} dt

Given X_0, find the control function u(x, t) that minimizes the expected future cost

C = E(φ(X_T) + ∫_0^T dt V(X_t, u(X_t, t)))
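For any given control function, this expected cost can be estimated by plain Monte Carlo roll-outs. A sketch for a one-dimensional system with unit noise; the choices f = 0, φ(x) = x² and V = ½u² are illustrative assumptions:

```python
import numpy as np

def expected_cost(u, phi, V, T=1.0, dt=0.01, N=10_000, seed=0):
    """Monte Carlo estimate of C = E[phi(X_T) + int_0^T V(X_t, u) dt]
    for dX = u(X, t) dt + dW (drift f = 0 assumed)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(N)
    cost = np.zeros(N)
    for k in range(int(T / dt)):
        uk = u(x, k * dt)
        cost += V(x, uk) * dt                 # accumulate path cost
        x += uk * dt + np.sqrt(dt) * rng.standard_normal(N)
    return np.mean(cost + phi(x))

C = expected_cost(u=lambda x, t: -x,          # a stabilising feedback law
                  phi=lambda x: x ** 2,
                  V=lambda x, u: 0.5 * u ** 2)
print("estimated expected cost:", C)
```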

9 Control theory

Standard approach: define J(x, t) as the optimal cost-to-go from (x, t):

J(x, t) = min_{u_{t:T}} E_u(φ(X_T) + ∫_t^T dt V(X_t, u(X_t, t)) | X_t = x)

J satisfies a partial differential equation,

−∂_t J(x, t) = min_u (V(x, u) + f(x, u) ∂_x J(x, t) + ½ ν ∂_x² J(x, t)),    J(x, T) = φ(x)

with u = u(x, t). This is the HJB equation. The optimal control u*(x, t) defines a distribution over trajectories p*(τ) (= p(τ | x_0, 0)).

10 Path integral control theory

dX_t = f(X_t, t) dt + g(X_t, t)(u(X_t, t) dt + dW_t),    X_0 = x_0

The goal is to find the function u(x, t) that minimizes

C = E_u(S(τ) + ½ ∫_0^T dt u(X_t, t)²),    S(τ) = φ(X_T) + ∫_0^T dt V(X_t, t)
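The step from the nonlinear HJB equation to the sampling formulas on the next slides goes through the logarithmic transform J = −log ψ. A sketch of that step for the simplest case f = 0, g = ν = 1, V = 0 (illustrative assumptions):

```latex
-\partial_t J = \min_u\left(\tfrac12 u^2 + u\,\partial_x J
   + \tfrac12 \partial_x^2 J\right)
\;\Rightarrow\; u^* = -\partial_x J, \qquad
-\partial_t J = -\tfrac12 (\partial_x J)^2 + \tfrac12 \partial_x^2 J .
% Substituting J = -\log\psi cancels the quadratic term:
\partial_t \psi = -\tfrac12\,\partial_x^2 \psi, \qquad
\psi(x, T) = e^{-\phi(x)}
\;\Rightarrow\; \psi(x, t) = \mathbb{E}\!\left[ e^{-\phi(X_T)} \,\middle|\, X_t = x \right]
```

By Feynman-Kac, the cost-to-go J = −log ψ is thus an expectation over uncontrolled trajectories, which is what slide 13 exploits.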

11 Path integral control theory

Equivalent formulation: find the distribution over trajectories p that minimizes

C(p) = ∫ dτ p(τ) (S(τ) + log(p(τ)/q(τ)))

where q(τ) = q(τ | x_0, 0) is the distribution over uncontrolled trajectories. The optimal solution is

p*(τ) = (1/ψ) q(τ) e^{−S(τ)} = p(τ | u*),    E_u ½ ∫_0^T dt u(X_t, t)² = ∫ dτ p(τ) log(p(τ)/q(τ))

Equivalence of optimal control and discounted cost (Girsanov).

13 Path integral control theory

The optimal control cost is

C(p*) = −log ψ = J(x_0, 0),    with ψ = ∫ dτ q(τ) e^{−S(τ)} = E_q e^{−S}

J(x, t) can be computed by forward sampling from q.

14 Estimating ψ = E e^{−S}

Sample N trajectories from the uncontrolled dynamics:

τ_i ~ q(τ),    w_i = e^{−S(τ_i)},    ψ̂ = (1/N) Σ_i w_i

ψ̂ is an unbiased estimate of ψ. Sampling efficiency is inversely proportional to the variance of the (normalized) w_i:

ESS = N / (1 + N Var(w))

[Figure: trajectories sampled from q, with the resulting ESS and cost C.]
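A small helper computing both quantities from sampled values S(τ_i); with weights normalized to sum to one, 1/Σ_i w_i² equals N/(1 + N Var(w)) for Var(w) = Σ_i (w_i − 1/N)²:

```python
import numpy as np

def psi_and_ess(S):
    """psi-hat = mean of e^{-S}; ESS of the normalized weights."""
    w = np.exp(-np.asarray(S))
    psi_hat = w.mean()                # unbiased estimate of psi
    wn = w / w.sum()                  # normalized to sum to one
    return psi_hat, 1.0 / np.sum(wn ** 2)
```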

15 Importance sampling

Sampling N trajectories from controlled dynamics and reweighting yields an unbiased estimate of the cost-to-go:

τ_i ~ p(τ),    w_i = e^{−S(τ_i)} q(τ_i)/p(τ_i) = e^{−S_u(τ_i)},    ψ̂ = (1/N) Σ_i w_i

S_u(τ) = S(τ) + ½ ∫_0^T dt u(X_t, t)² + ∫_0^T u(X_t, t) dW_t

[Figure: three panels with increasing ESS and decreasing empirical cost C.]
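A sketch of this reweighting for the Brownian toy problem above (end-time likelihood exp(−β x_T²) assumed); the Girsanov terms ½∫u² dt + ∫u dW are accumulated along the controlled path:

```python
import numpy as np

def sample_Su(u, N=1000, T=1.0, dt=0.01, beta=5.0, seed=0):
    """Simulate dX = u dt + dW and return S_u(tau_i); exp(-S_u) are
    unbiased importance weights for psi."""
    rng = np.random.default_rng(seed)
    x = np.zeros(N)
    Su = np.zeros(N)
    for k in range(int(T / dt)):
        uk = u(x, k * dt)
        dW = np.sqrt(dt) * rng.standard_normal(N)
        Su += 0.5 * uk ** 2 * dt + uk * dW    # Girsanov correction
        x += uk * dt + dW
    return Su + beta * x ** 2                 # add S(tau): end-time term

for name, ctrl in [("u = 0", lambda x, t: np.zeros_like(x)),
                   ("u = -2x", lambda x, t: -2.0 * x)]:
    Su = sample_Su(ctrl)
    w = np.exp(-(Su - Su.min())); w /= w.sum()
    print(name, "ESS:", 1.0 / np.sum(w ** 2))
```

The crude feedback u = −2x is closer to the optimal control for this problem and typically raises the ESS, illustrating the theorem on the next slide.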

16 Importance sampling

S_u(τ) = S(τ) + ½ ∫_0^T dt u(X_t, t)² + ∫_0^T u(X_t, t) dW_t

Thm: a better u (in the sense of optimal control) provides a better sampler (in the sense of effective sample size). The optimal u = u* (in the sense of optimal control) requires only one sample, and S_{u*}(τ) is deterministic! (Thijssen, Kappen 2015)

17 Proof

The control cost is C(p) = E_p(S(τ) + log(p(τ)/q(τ))) = E S_u.

Using Jensen's inequality:

C* = −log Σ_τ q(τ) e^{−S(τ)} = −log Σ_τ p(τ) e^{−S(τ)} q(τ)/p(τ)
   ≤ Σ_τ p(τ) (S(τ) + log(p(τ)/q(τ))) = C(p)

The inequality is saturated when S(τ) + log(p(τ)/q(τ)) has zero variance: the left- and right-hand sides then both evaluate to S(τ) + log(p(τ)/q(τ)). This is realized when p = p*.³

³ p* exists when Σ_τ q(τ) e^{−S(τ)} < ∞.

19 The Path Integral Cross Entropy (PICE) method

We wish to estimate ψ = ∫ dτ q(τ) e^{−S(τ)}. The optimal (zero-variance) importance sampler is p*(τ) = (1/ψ) q(τ) e^{−S(τ)}.

We approximate p*(τ) by p_u(τ), where u(x, t | θ) is a parametrized control function. Following the Cross Entropy method, we minimize KL(p* ‖ p_u):

∂_θ KL(p* ‖ p_u) ∝ −E_u(e^{−S_u} ∫_0^T dW_t ∂_θ u(X_t, t | θ))

u(x, t | θ) is arbitrary. Estimate the gradient by sampling. (Kappen, Ruiz 2016)
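A sketch of one adaptive importance-sampling iteration in this spirit, for the Brownian toy problem with a linear feedback controller u(x, t) = a(t)x + b(t). The per-step update is a weighted least-squares fit of the sampled noise increments; the particle number, step size η and ridge term are illustrative assumptions, not the exact scheme of the papers:

```python
import numpy as np

def ais_iteration(a, b, N=2000, dt=0.01, beta=5.0, eta=0.5, seed=0):
    """One forward pass with u(x,t) = a[k] x + b[k], followed by a
    reweighted least-squares update of (a, b). Returns a, b, ESS."""
    rng = np.random.default_rng(seed)
    steps = len(a)
    x, Su = np.zeros(N), np.zeros(N)
    xs, dWs = np.zeros((steps, N)), np.zeros((steps, N))
    for k in range(steps):                    # controlled forward pass
        uk = a[k] * x + b[k]
        dW = np.sqrt(dt) * rng.standard_normal(N)
        xs[k], dWs[k] = x, dW
        Su += 0.5 * uk ** 2 * dt + uk * dW    # Girsanov terms of S_u
        x += uk * dt + dW
    Su += beta * x ** 2                       # end-time observation
    w = np.exp(-(Su - Su.min()))
    w /= w.sum()                              # self-normalized weights
    for k in range(steps):                    # fit dW ~ (da*x + db) dt under w
        Sx, Sxx = np.sum(w * xs[k]), np.sum(w * xs[k] ** 2) + 1e-9
        Sy, Sxy = np.sum(w * dWs[k]), np.sum(w * xs[k] * dWs[k])
        det = Sxx - Sx ** 2
        a[k] += eta * (Sxy - Sx * Sy) / (det * dt)
        b[k] += eta * (Sxx * Sy - Sx * Sxy) / (det * dt)
    return a, b, 1.0 / np.sum(w ** 2)

a, b = np.zeros(100), np.zeros(100)
for it in range(10):
    a, b, ess = ais_iteration(a, b, seed=it)
    print(f"iteration {it}: ESS = {ess:.0f} / 2000")
```

Iterating this update drives the controller toward u* and the ESS toward N, in the spirit of the adaptive importance sampler used in the experiments below.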

20 Coordination of UAVs

Chao Xu, ACC 2017

21 Control ↔ Inference

Fleming, Mitter

22 Path integral control theory

Equivalent formulation: find the distribution over trajectories p that minimizes

C(p) = ∫ dτ p(τ) (S(τ) + log(p(τ)/q(τ)))

where q(τ) = q(τ | x_0, 0) is the distribution over uncontrolled trajectories. The optimal solution is

p*(τ) = (1/ψ) q(τ) e^{−S(τ)} = p(τ | u*)

For smoothing, q(τ) is the prior process p(x_{1:T} | x_0) and e^{−S(τ)} = Π_t p(y_t | x_t).
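Concretely, the path cost is then the accumulated negative log-likelihood at the observation times. A minimal sketch with an assumed Gaussian observation model (σ_y is an illustrative noise level):

```python
import numpy as np

def S_of_path(x_at_obs, y, sigma_y=0.5):
    """S(tau) = -sum_t log p(y_t | x_t) for Gaussian observations
    y_t = x_t + N(0, sigma_y^2)."""
    x_at_obs, y = np.asarray(x_at_obs), np.asarray(y)
    return np.sum(0.5 * ((y - x_at_obs) / sigma_y) ** 2
                  + 0.5 * np.log(2 * np.pi * sigma_y ** 2))
```

Plugging this S into the importance samplers above turns them into smoothers.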

23 Control versus particle smoothing

Linear stochastic dynamics, Gaussian observations at t = 0 and t = T.

- Control method with a linear time-dependent controller u(x, t) = a(t)x + b(t): N = particles, 5 IS iterations; ESS from .5% to 98%.
- FS with N = particles: ESS = %.
- FFBSi with N = M = forward and backward particles: ESS = 7%.

The control error in the posterior mean is orders of magnitude smaller than the FFBSi error.

24 Control versus particle smoothing

Linear stochastic dynamics, Gaussian observations at t = 0 and t = T.

The FFBSi error in the posterior mean (averaged over time) increases with more unlikely observations, while the control error does not.

25 The control method scales with many observations

Linear stochastic dynamics, Gaussian observations at t = 0 and t = T.

26 Non-linear problem

5-dimensional neural network model with 4 observations of one neuron.

Left: number of iterations needed to reach the target ESS versus the number of observations (N = 6). Annealing is vital for a large number of observations.
Right: number of iterations needed to reach the target ESS versus the number of particles (5 observations). Annealing allows a significantly smaller number of particles.

27 Non-linear problem

5-dimensional neural network model with 4 observations of one neuron.

Variance of the posterior mean for APIS, FS and FFBSi versus the number of observations. N = 6 forward particles; the number of backward particles is chosen such that APIS and FFBSi have similar CPU time.

28 Non-linear problem

5-dimensional neural network model with 4 observations of one neuron.

ESS for a typical run of APIS with annealing. N = 75.

29 Inferring neural activity from BOLD fMRI

Subjects were asked to respond as fast as possible to a visual or auditory stimulus. We measure the fMRI BOLD response in motor cortex and reconstruct the neural signal from the BOLD signal.

Neural activity: dZ_t = −A Z_t dt + σ_z dW_t
Hemodynamics: vasodilation s(z, s, f), blood flow f(s)
Balloon model: blood volume v(f, v), deoxygenated hemoglobin q(f, v, q)
BOLD: y(v, q)
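The slide only names the state variables; below is a hedged sketch of one Euler step of the standard balloon-Windkessel hemodynamic model (Friston et al.) with textbook constants, which may differ from the exact parameterization used in the talk:

```python
import numpy as np

def hemo_step(z, s, f, v, q, dt,
              kappa=0.65, gamma=0.41, tau=0.98, alpha=0.32, E0=0.34):
    """One Euler step of vasodilation s, blood flow f, blood volume v and
    deoxyhemoglobin q, driven by neural activity z."""
    ds = z - kappa * s - gamma * (f - 1.0)
    df = s
    dv = (f - v ** (1.0 / alpha)) / tau
    dq = (f * (1.0 - (1.0 - E0) ** (1.0 / f)) / E0
          - v ** (1.0 / alpha) * q / v) / tau
    return s + ds * dt, f + df * dt, v + dv * dt, q + dq * dt

def bold(v, q, V0=0.04, k1=7.0 * 0.34, k2=2.0, k3=2.0 * 0.34 - 0.2):
    return V0 * (k1 * (1 - q) + k2 * (1 - q / v) + k3 * (1 - v))

s, f, v, q, dt, y = 0.0, 1.0, 1.0, 1.0, 0.01, []
for k in range(2000):
    z = 1.0 if 100 <= k < 150 else 0.0        # brief neural event
    s, f, v, q = hemo_step(z, s, f, v, q, dt)
    y.append(bold(v, q))
```

The smoothing problem is then: given noisy samples of y, infer the latent path of Z_t; with the machinery above this amounts to controlling the joint (Z, s, f, v, q) dynamics.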

32 Inferring neural activity from BOLD fMRI

u(x, t) = a(t)x + b(t); 5. particles, iterations.

Error in stimulus time reconstruction: .5 ± .77 s. Error using a fit of two Gamma functions: .8 ± .5 s.

33 Summary

- The smoothing problem (Bayesian posterior computation in time series) can be formulated as a stochastic optimal control problem (Fleming 99).
- Better controller = better sampler.
- A parametrized controller found by optimizing the cross entropy can be (in principle) arbitrarily accurate.
- Improved performance compared to state-of-the-art SMC.
- Excellent parallelization possible.

38 Thank you!

Kappen, Hilbert Johan, and Hans Christian Ruiz. "Adaptive importance sampling for control and inference." Journal of Statistical Physics 162.5 (2016).
Ruiz, Hans-Christian, and Hilbert J. Kappen. "Particle smoothing for hidden diffusion processes: Adaptive path integral smoother." IEEE Transactions on Signal Processing 65.12 (2017).
