Optimal filtering and the dual process
1. Optimal filtering and the dual process

Omiros Papaspiliopoulos, ICREA & Universitat Pompeu Fabra
Joint work with Matteo Ruggiero, Collegio Carlo Alberto & University of Turin
2. The filtering problem

[Figure: hidden Markov model represented as a graphical model, $X_{t_0} \to X_{t_1} \to X_{t_2} \to \cdots$ with emissions $Y_0, Y_1, Y_2, \dots$]

Signal: latent Markov chain $X_{t_n} \in \mathcal{X}$, with transition kernel $P_t(x, dx')$ and initial distribution $\nu$.
Emission densities: data $Y_n \in \mathcal{Y}$ with conditional density $f_x(y)$ (non-dominated filtering models are also considered).
Goal: evaluate the filtering distributions $\nu_n(dx) := \mathcal{L}(X_{t_n} \mid Y_0, \dots, Y_n)$.
Subcase: if $X_{t_0} = X_{t_1} = \cdots$, this is a classical Bayesian inference problem.
3. Filtering recursions

Mathematically, the filtering distributions solve the recursion
$$\nu_0 = \phi_{Y_0}(\nu), \qquad \nu_n = \phi_{Y_n}\big(\psi_{t_n - t_{n-1}}(\nu_{n-1})\big), \quad n > 0,$$
where, for a probability measure $\xi$:

Update: $\phi_y(\xi)(dx) = \dfrac{f_x(y)\,\xi(dx)}{p_\xi(y)}$, with $p_\xi(y) = \displaystyle\int_{\mathcal{X}} f_x(y)\,\xi(dx)$.

Conjugate pair $(f_x, \bar{\mathcal{F}})$: $\xi \in \bar{\mathcal{F}} \;\Rightarrow\; \phi_y(\xi) \in \bar{\mathcal{F}}$.

Prediction: $\psi_t(\xi)(dx') = \displaystyle\int_{\mathcal{X}} \xi(dx)\,P_t(x, dx')$.

Note that the update and prediction operators satisfy
$$\phi_y\Big(\sum_{i=1}^n w_i \xi_i\Big) = \sum_{i=1}^n \frac{w_i\, p_{\xi_i}(y)}{\sum_j w_j\, p_{\xi_j}(y)}\, \phi_y(\xi_i), \qquad \psi_t\Big(\sum_{i=1}^n w_i \xi_i\Big) = \sum_{i=1}^n w_i\, \psi_t(\xi_i).$$
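In the finite-state case (the Baum filter of the next slide) this recursion is a few lines of code. The sketch below is illustrative, not from the talk: a hypothetical two-state chain with Poisson emission intensities, alternating Bayes updates $\phi_y$ and kernel steps $\psi$.

```python
import math

# Baum filter: the recursion nu_n = phi_{Y_n}(psi(nu_{n-1})) on a finite
# signal space, where phi is Bayes' update and psi applies the transition
# kernel. The 2-state chain and Poisson intensities are illustrative.

states = [1.0, 5.0]                      # signal values x (Poisson intensities)
P = [[0.9, 0.1], [0.2, 0.8]]             # transition kernel P(x, dx')
nu = [0.5, 0.5]                          # initial distribution

def f(x, y):                             # emission density f_x(y): Poisson
    return math.exp(-x) * x**y / math.factorial(y)

def update(nu, y):                       # phi_y: pointwise Bayes' rule
    w = [f(x, y) * p for x, p in zip(states, nu)]
    tot = sum(w)                         # tot = p_nu(y), predictive of y
    return [wi / tot for wi in w]

def predict(nu):                         # psi: one step of the kernel
    return [sum(nu[i] * P[i][j] for i in range(2)) for j in range(2)]

ys = [0, 1, 6, 7, 5, 0]
nu = update(nu, ys[0])                   # nu_0 = phi_{Y_0}(nu)
for y in ys[1:]:
    nu = update(predict(nu), y)          # nu_n = phi_{Y_n}(psi(nu_{n-1}))
print(nu)
```

Since the last observation is $0$, the filter ends up heavily favouring the low-intensity state.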
4. Finite-dimensional filters

A finite-dimensional filter is a solution of the previous recursion such that there exists a finite-dimensional family $\mathcal{F}_f$ of probability measures with
$$\nu \in \mathcal{F}_f \;\Rightarrow\; \phi_y(\nu),\ \psi_t(\nu) \in \mathcal{F}_f,$$
so the evolution of the mixtures can be described by successive transformations of a finite-dimensional parameter.

Nonparametric: any model where $\mathcal{X}$ is a finite set (Baum filter).
Parametric: the linear Gaussian system (Kalman filter).
5. Motivating model: Cox-type process

A dynamic version of the standard conjugate Bayesian model for count data:
$$Y_n \mid X_{t_n} \sim \text{Poisson}(X_{t_n}),$$
with $X_{t_n}$ a Markov chain with gamma marginal. For example,
$$X_{t_n} = U_n^2, \qquad U_n = a\,U_{n-1} + \beta\,\eta_n,$$
with stationary distribution $N\big(0,\ \beta^2/(1 - a^2)\big)$ when $|a| < 1$.
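A minimal simulation of this model, with illustrative values for $a$ and $\beta$ (not taken from the talk): the squared AR(1) chain gives a gamma-marginal intensity, from which the counts are drawn.

```python
import random, math

# Simulate the Cox-type model: U_n = a*U_{n-1} + beta*eta_n, X_n = U_n^2
# (gamma marginal), Y_n | X_n ~ Poisson(X_n). Parameters are illustrative.
random.seed(1)
a, beta, n = 0.8, 1.0, 5000
u = random.gauss(0.0, beta / math.sqrt(1 - a**2))  # stationary N(0, beta^2/(1-a^2))
xs, ys = [], []
for _ in range(n):
    u = a * u + beta * random.gauss(0.0, 1.0)
    x = u * u                                      # signal with gamma marginal
    k, t = 0, 0.0                                  # Y ~ Poisson(x), sampled via
    while True:                                    # unit-rate exponential gaps
        t += random.expovariate(1.0)
        if t > x:
            break
        k += 1
    xs.append(x); ys.append(k)
print(sum(ys) / n)   # close to E[X] = beta^2/(1 - a^2)
```

The empirical mean of the counts matches $E[Y_n] = E[X_{t_n}] = \beta^2/(1-a^2)$ up to Monte Carlo error.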
6. More generally, we can relate the signal to the Feller / square-root / CIR / continuous-state branching process with immigration, e.g. Kawazu & Watanabe (Theory Probab. Appl., 1971):
$$dX_t = (\delta\sigma^2 - 2\gamma X_t)\,dt + 2\sigma\sqrt{X_t}\,dB_t, \qquad \delta, \gamma, \sigma > 0.$$
For $\delta \geq 2$ this process has a stationary and limiting distribution, which is gamma$(\delta/2, \gamma/\sigma^2)$. For the economy of this talk, we focus on $\delta \geq 2$ (otherwise the boundary behaviour needs to be specified explicitly).
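The stationary gamma law is easy to check numerically. A simple Euler-Maruyama sketch (a standard discretization, not part of the talk), with a truncation guard at zero and illustrative parameter values, compares the long-run average against the stationary mean $\delta\sigma^2/(2\gamma)$:

```python
import random, math

# Euler-Maruyama discretization of dX = (delta*sigma^2 - 2*gamma*X) dt
#                                      + 2*sigma*sqrt(X) dB.
# With delta >= 2 the stationary law is gamma(delta/2, gamma/sigma^2);
# the parameter values are illustrative.
random.seed(2)
delta, gamma_, sigma = 4.0, 1.0, 1.0
dt, nsteps = 0.01, 200_000
x = delta * sigma**2 / (2 * gamma_)          # start at the stationary mean
acc = 0.0
for _ in range(nsteps):
    dB = random.gauss(0.0, math.sqrt(dt))
    x += (delta * sigma**2 - 2 * gamma_ * x) * dt \
         + 2 * sigma * math.sqrt(max(x, 0.0)) * dB
    x = max(x, 0.0)                          # full-truncation guard
    acc += x
print(acc / nsteps)   # long-run average, near delta*sigma^2/(2*gamma)
```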
7. Computable filters

We will call a filter computable when $\nu_n$ is characterised by a finite number of parameters that can be computed at a cost growing polynomially in $n$. A special case are the finite-dimensional filters, for which the number of parameters does not grow with $n$, so the cost of computing them grows linearly in $n$.

We will identify filters that evolve within mixtures of finite-dimensional distributions indexed by $(m, \theta)$. We first present the framework and then give examples of signal processes that fit into it.
8. Conditions

A1 (Reversibility): $\pi(dx)\,P_t(x, dx') = \pi(dx')\,P_t(x', dx)$.

Notation: $\mathcal{M} = \mathbb{Z}_+^K = \{m = (m_1, \dots, m_K) : m_j \in \mathbb{Z}_+,\ j = 1, \dots, K\}$; $0$; $e_j$; $|m| = \sum_i m_i$; the product order on $\mathcal{M}$; $m - i$ for $i \leq m$.
9. A 2D example for $\mathcal{M}$

[Figure: the lattice points $(0,0)$ through $(2,2)$ of $\mathbb{Z}_+^2$, arranged by the product order.]
10. A2 (Conjugacy): For $\Theta \subseteq \mathbb{R}^l$, $l \in \mathbb{Z}_+$, let $h : \mathcal{X} \times \mathcal{M} \times \Theta \to \mathbb{R}_+$ be such that $\sup_x h(x, m, \theta) < \infty$ for all $m \in \mathcal{M}$, $\theta \in \Theta$, and $h(x, 0, \theta) = 1$ for some $\theta \in \Theta$. Let
$$\mathcal{F} = \{h(x, m, \theta)\,\pi(dx),\ m \in \mathcal{M},\ \theta \in \Theta\}$$
be a family of probability measures such that there exist functions $t : \mathcal{Y} \times \mathcal{M} \to \mathcal{M}$ and $T : \mathcal{Y} \times \Theta \to \Theta$, with $m \mapsto t(y, m)$ increasing, and such that
$$\phi_y\big(h(x, m, \theta)\,\pi(dx)\big) = h(x, t(y, m), T(y, \theta))\,\pi(dx).$$
Note:
$$p_{h(x,m,\theta)\pi(dx)}(y) =: c(m, \theta, y) = \frac{f_x(y)\,h(x, m, \theta)}{h(x, t(y, m), T(y, \theta))},$$
which does not depend on $x$.
11. A3 (Duality): Let $r : \Theta \to \Theta$, let $\lambda : \mathbb{Z}_+ \to \mathbb{R}_+$ be increasing, let $\rho : \Theta \to \mathbb{R}_+$ be continuous, and consider a Markov process $(M_t, \Theta_t)$ with state space $\mathcal{M} \times \Theta$ such that:

- $d\Theta_t/dt = r(\Theta_t)$, $\Theta_0 = \theta_0$;
- when at $(M_t, \Theta_t) = (m, \theta)$, the process jumps down to state $(m - e_j, \theta)$ with instantaneous rate $\lambda(|m|)\,\rho(\theta)\,m_j/|m|$;

and which is dual to $X_t$ with respect to the functions $h$, i.e., for all $x \in \mathcal{X}$, $m \in \mathcal{M}$, $\theta \in \Theta$, $t \geq 0$:
$$\mathbb{E}^x\big[h(X_t, m, \theta)\big] = \mathbb{E}^{(m,\theta)}\big[h(x, M_t, \Theta_t)\big].$$
When $K = 0$ or $l = 0$ the dual is just $\Theta_t$ or $M_t$, respectively.
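A path of such a dual can be simulated by integrating the ODE for $\Theta_t$ on a fine grid and thinning the death events. The sketch below is illustrative: it takes $K = 1$ and borrows the $r$, $\lambda$, $\rho$ that will appear in the CIR example later in the talk, with the convention that the total death rate from state $m$ is $\lambda(m)\rho(\theta)$.

```python
import random

# One path of the dual (M_t, Theta_t): Theta_t follows d(theta)/dt = r(theta),
# while M_t is a pure-death process jumping m -> m - 1 at rate
# lambda(m)*rho(theta). r, lambda, rho are the CIR choices (illustrative here).
random.seed(3)
gamma_, sigma = 1.0, 1.0
r   = lambda th: 2 * sigma**2 * th * (gamma_ / sigma**2 - th)
lam = lambda m: 2 * sigma**2 * m
rho = lambda th: th

m, theta, t, dt = 10, 3.0, 0.0, 1e-4
path = [(t, m, theta)]
while t < 10.0:
    theta += r(theta) * dt                         # deterministic component
    if m > 0 and random.random() < lam(m) * rho(theta) * dt:
        m -= 1                                     # a death event: m -> m - 1
    t += dt
    path.append((t, m, theta))
print(path[-1])
```

As expected from the remarks that follow, $M_t$ is non-increasing (with $0$ absorbing) and $\Theta_t$ converges deterministically to $\gamma/\sigma^2$.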
12. Remarks

- Absorption/Ergodicity: $M_t$ can only jump to smaller states, and $0$ is absorbing.
- Duality functions: Radon-Nikodym derivatives between measures that are conjugate to the emission density.
- Transition probabilities: $p_{m,n}(t; \theta) = \mathbb{P}[M_t = n \mid M_0 = m, \Theta_0 = \theta]$, for $n, m \in \mathcal{M}$, $n \leq m$; analytic expressions exist (Proposition 2.1).
13. Duality and optimal filtering: propagation

Proposition. Under A1-A2-A3,
$$\psi_t\big(h(x, m, \theta)\,\pi(dx)\big) = \sum_{0 \leq i \leq m} p_{m,\,m-i}(t; \theta)\, h(x, m - i, \Theta_t)\,\pi(dx).$$

Proof.
$$\psi_t\big(h(x, m, \theta)\,\pi(dx)\big)(dx') = \int_{\mathcal{X}} h(x, m, \theta)\,\pi(dx)\,P_t(x, dx') = \int_{\mathcal{X}} h(x, m, \theta)\,\pi(dx')\,P_t(x', dx)$$
$$= \pi(dx')\,\mathbb{E}^{x'}\big[h(X_t, m, \theta)\big] = \pi(dx')\,\mathbb{E}^{(m,\theta)}\big[h(x', M_t, \Theta_t)\big] = \sum_{n \leq m} p_{m,n}(t; \theta)\, h(x', n, \Theta_t)\,\pi(dx').$$
14. Duality and optimal filtering: update

Proposition. For
$$\mathcal{F}_f = \Big\{\sum_{m \in \Lambda} w_m\, h(x, m, \theta)\,\pi(dx) \ :\ \Lambda \subset \mathcal{M},\ |\Lambda| < \infty,\ w_m \geq 0,\ \sum_{m \in \Lambda} w_m = 1\Big\},$$
and under A1-A2-A3, $\mathcal{F}_f$ is closed under prediction and update:
$$\phi_y\Big(\sum_{m \in \Lambda} w_m\, h(x, m, \theta)\,\pi(dx)\Big) = \sum_{n \in t(y, \Lambda)} \hat{w}_n\, h(x, n, T(y, \theta))\,\pi(dx),$$
with $t(y, \Lambda) := \{n : n = t(y, m),\ m \in \Lambda\}$, $\hat{w}_n \propto w_m\, c(m, \theta, y)$ for $n = t(y, m)$, $\sum_{n \in t(y,\Lambda)} \hat{w}_n = 1$; and
$$\psi_t\Big(\sum_{m \in \Lambda} w_m\, h(x, m, \theta)\,\pi(dx)\Big) = \sum_{n \in G(\Lambda)} \Big(\sum_{m \in \Lambda,\ m \geq n} w_m\, p_{m,n}(t; \theta)\Big)\, h(x, n, \Theta_t)\,\pi(dx),$$
where $G(\Lambda) = \{n \in \mathcal{M} : n \leq m \text{ for some } m \in \Lambda\}$.
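As a concrete instance of the update formula, take Poisson emissions with the gamma family of the CIR example, $h(x, m, \theta)\pi(dx) = \text{gamma}(dx;\, \delta/2 + m, \theta)$. Standard gamma-Poisson conjugacy then gives $t(y, m) = m + y$, $T(y, \theta) = \theta + 1$, and $c(m, \theta, y)$ equal to the negative-binomial predictive. A sketch with illustrative values of $\delta$, $\theta$, and the starting mixture:

```python
import math

# Update phi_y on a finite mixture sum_m w_m gamma(delta/2 + m, theta)
# with Poisson(y | x) emissions: each component is conjugate, so
# t(y, m) = m + y, T(y, theta) = theta + 1, and c(m, theta, y) is the
# negative-binomial predictive. Parameters are illustrative.
delta = 4.0

def c(m, theta, y):
    """Predictive p(y) under x ~ gamma(delta/2 + m, theta), y | x ~ Poisson(x)."""
    a = delta / 2 + m
    return math.exp(math.lgamma(a + y) - math.lgamma(a) - math.lgamma(y + 1)
                    + a * math.log(theta) - (a + y) * math.log(theta + 1))

def update(weights, theta, y):
    """phi_y: maps {m: w_m} at parameter theta to {m + y: w_hat} at theta + 1."""
    new = {m + y: w * c(m, theta, y) for m, w in weights.items()}
    tot = sum(new.values())                 # normalize the reweighted mixture
    return {m: w / tot for m, w in new.items()}, theta + 1.0

w, th = {0: 0.6, 2: 0.4}, 1.5
w, th = update(w, th, 3)
print(w, th)
```

Note how $m \mapsto t(y, m)$ just shifts the support of $\Lambda$ by $y$, exactly as in the proposition.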
15. Generators

(Care is needed with domains, but we skip this for economy of time.)

The generator of a Markov process $X_t$ with semigroup operator $P_t$ is a linear operator $A$ with domain denoted $\mathcal{D}(A)$, linked to the semigroup via the Kolmogorov backward equation
$$\frac{\partial}{\partial t} P_t f(x) = (A\, P_t f)(x), \qquad f \in \mathcal{D}(A),$$
where on the left-hand side $P_t f(x)$ is differentiated in $t$ for given $x$, whereas on the right-hand side $A$ acts on $P_t f(x)$ as a function of $x$ for given $t$.
16. Local duality

(Care is needed with domains, but we skip this for economy of time.)

If $X_t$ solves an SDE on $\mathbb{R}^d$,
$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t,$$
then its generator is
$$(A f)(x) = \sum_{i=1}^d b_i(x)\,\frac{\partial f(x)}{\partial x_i} + \frac{1}{2}\sum_{i,j=1}^d a_{i,j}(x)\,\frac{\partial^2 f(x)}{\partial x_i \partial x_j}, \qquad f \in \mathcal{D}(A),$$
for $a_{i,j}(x) := (\sigma(x)\sigma(x)^T)_{i,j}$.

If $(M_t, \Theta_t)$ is a Markov process as in A3, its generator is
$$(A g)(m, \theta) = \lambda(|m|)\,\rho(\theta)\sum_{i=1}^K \frac{m_i}{|m|}\,\big[g(m - e_i, \theta) - g(m, \theta)\big] + \sum_{i=1}^l r_i(\theta)\,\frac{\partial g(m, \theta)}{\partial \theta_i}, \qquad g \in \mathcal{D}(A).$$
17. A4 (Local duality): The function $h(x, m, \theta)$ defined in A2 is such that $h(x, m, \theta)$, as a function of $x$, belongs to $\mathcal{D}(A)$ for all $(m, \theta) \in \mathcal{M} \times \Theta$; as a function of $(m, \theta)$, it belongs to the domain of the dual generator for all $x \in \mathcal{X}$; and for all $x \in \mathcal{X}$, $m \in \mathcal{M}$, $\theta \in \Theta$:
$$\big(A\, h(\cdot, m, \theta)\big)(x) = \big(A\, h(x, \cdot, \cdot)\big)(m, \theta). \tag{1}$$
This implies the duality in A3. (Proof: resolvents/Laplace transforms.)
18. I will focus on the properties of the signal and illustrate the dual. Clearly, for computable filtering we also need the emission density to be conjugate to the family $\mathcal{F}$.

I will present results for:
- Linear diffusion signals - Gaussian - only deterministic dual
- CIR signals - gamma - both deterministic and death-process dual

Due to time constraints I will skip results for:
- Multi-dimensional Wright-Fisher signals - Dirichlet - only death-process dual
- Fleming-Viot - Dirichlet process prior - only death-process dual
- Dawson-Watanabe - gamma process - both deterministic and death-process dual
19. Linear diffusion signals

SDE:
$$dX_t = -\frac{\sigma^2}{\alpha}(X_t - \gamma)\,dt + \sqrt{2}\,\sigma\,dB_t.$$
Invariant measure: $\pi(dx) = \text{Normal}(dx;\, \gamma, \alpha)$.
Generator:
$$A = \big(\sigma^2\gamma/\alpha - \sigma^2 x/\alpha\big)\,\frac{d}{dx} + \sigma^2\,\frac{d^2}{dx^2}.$$
Duality functions:
$$h(x, \mu, \tau) = \Big(\frac{\alpha}{\tau}\Big)^{1/2}\exp\Big\{-\frac{(x - \mu)^2}{2\tau} + \frac{(x - \gamma)^2}{2\alpha}\Big\} = \frac{\text{Normal}(dx;\, \mu, \tau)}{\pi(dx)}.$$
20. Linear diffusion signals: the Kalman filter in one line

A small calculation yields:
$$A\,h(\cdot, \mu, \tau)(x) = \frac{\sigma^2(\gamma - \mu)}{\alpha}\,\frac{\partial h(x, \mu, \tau)}{\partial \mu} + 2\sigma^2(1 - \tau/\alpha)\,\frac{\partial h(x, \mu, \tau)}{\partial \tau}.$$
Hence the dual is purely deterministic and described by the ODEs:
$$d\mu_t/dt = \frac{\sigma^2}{\alpha}(\gamma - \mu_t), \qquad d\tau_t/dt = 2\sigma^2(1 - \tau_t/\alpha).$$
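These linear ODEs can be solved in closed form, recovering the familiar Kalman prediction step: the mean relaxes towards $\gamma$ and the variance towards $\alpha$. The sketch below (with illustrative parameter values) states the closed forms and verifies them against the ODEs by finite differences.

```python
import math

# Closed-form solutions of the dual ODEs
#   d(mu)/dt  = (sigma^2/alpha)(gamma - mu)
#   d(tau)/dt = 2 sigma^2 (1 - tau/alpha):
#   mu_t  = gamma + (mu_0 - gamma) exp(-sigma^2 t / alpha)
#   tau_t = alpha + (tau_0 - alpha) exp(-2 sigma^2 t / alpha)
# i.e. exactly the Kalman prediction step for this OU signal.
gamma_, alpha, sigma = 2.0, 1.5, 0.8
mu0, tau0 = -1.0, 0.3

def mu(t):  return gamma_ + (mu0 - gamma_) * math.exp(-sigma**2 * t / alpha)
def tau(t): return alpha + (tau0 - alpha) * math.exp(-2 * sigma**2 * t / alpha)

# residuals of the ODEs via a centred finite difference
t, h = 0.7, 1e-6
res_mu  = (mu(t + h) - mu(t - h)) / (2 * h) - sigma**2 / alpha * (gamma_ - mu(t))
res_tau = (tau(t + h) - tau(t - h)) / (2 * h) - 2 * sigma**2 * (1 - tau(t) / alpha)
print(res_mu, res_tau)
```

Both residuals vanish to numerical precision, confirming the closed forms.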
21. CIR signals

(Some care is needed with domains.)

SDE, for $\sigma > 0$, $\gamma > 0$, $\delta \geq 2$:
$$dX_t = (\delta\sigma^2 - 2\gamma X_t)\,dt + 2\sigma\sqrt{X_t}\,dB_t.$$
Invariant measure: $\pi(dx) = \text{gamma}(dx;\, \delta/2, \gamma/\sigma^2)$.
Generator:
$$A = (\delta\sigma^2 - 2\gamma x)\,\frac{d}{dx} + 2\sigma^2 x\,\frac{d^2}{dx^2}.$$
Duality functions:
$$h(x, m, \theta) = \frac{\Gamma(\delta/2)}{\Gamma(\delta/2 + m)}\Big(\frac{\gamma}{\sigma^2}\Big)^{-\delta/2}\theta^{\delta/2 + m}\, x^m\, e^{-(\theta - \gamma/\sigma^2)x} = \frac{\text{gamma}(dx;\, \delta/2 + m, \theta)}{\pi(dx)}.$$
22. CIR signals

A direct calculation yields:
$$A\,h(\cdot, m, \theta)(x) = 2m\sigma^2\theta\, h(x, m-1, \theta) + \sigma^2(\delta + 2m)(\theta - \gamma/\sigma^2)\, h(x, m+1, \theta) - \sigma^2\big[2m\theta + (\delta + 2m)(\theta - \gamma/\sigma^2)\big]\, h(x, m, \theta)$$
$$= 2m\sigma^2\theta\,\big(h(x, m-1, \theta) - h(x, m, \theta)\big) + \sigma^2(\theta - \gamma/\sigma^2)(2\theta x - \delta - 2m)\, h(x, m, \theta)$$
$$= 2m\sigma^2\theta\,\big(h(x, m-1, \theta) - h(x, m, \theta)\big) + 2\sigma^2\theta(\gamma/\sigma^2 - \theta)\,\frac{\partial h(x, m, \theta)}{\partial \theta}.$$
Therefore we consider a two-component process $(M_t, \Theta_t)$ with generator as in A3 and
$$\lambda(m) = 2m\sigma^2, \qquad r(\theta) = 2\sigma^2\theta(\gamma/\sigma^2 - \theta), \qquad \rho(\theta) = \theta,$$
and the above calculation gives $A\,h(\cdot, m, \theta)(x) = A\,h(x, \cdot, \cdot)(m, \theta)$.
23. CIR signals

Under this setting,
$$\Theta_t = \frac{\gamma}{\sigma^2}\,\frac{\theta e^{2\gamma t}}{\theta e^{2\gamma t} + \gamma/\sigma^2 - \theta},$$
which implies that the transition probabilities of the death process simplify to binomial probabilities,
$$p_{m,\,m-i}(t; \theta) = \text{Bin}\Big(m - i;\ m,\ \frac{\gamma}{\sigma^2}\big(\theta e^{2\gamma t} + \gamma/\sigma^2 - \theta\big)^{-1}\Big),$$
which in turn immediately yields
$$\psi_t\big(\text{gamma}(m + \delta/2, \theta)\big) = \sum_{k=0}^m \text{Bin}\Big(k;\ m,\ \frac{\gamma}{\sigma^2}\big(\theta e^{2\gamma t} + \gamma/\sigma^2 - \theta\big)^{-1}\Big)\ \text{gamma}\Big(k + \delta/2,\ \frac{\gamma}{\sigma^2}\,\frac{\theta e^{2\gamma t}}{\theta e^{2\gamma t} + \gamma/\sigma^2 - \theta}\Big).$$
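As a numerical sanity check (with illustrative parameters, not from the talk), the mean of this binomial-gamma mixture must agree with the CIR conditional mean $E[X_t \mid X_0 = x] = x e^{-2\gamma t} + \frac{\delta\sigma^2}{2\gamma}(1 - e^{-2\gamma t})$ averaged over $X_0 \sim \text{gamma}(m + \delta/2, \theta)$:

```python
import math

# Prediction operator psi_t for the CIR filter: a gamma(m + delta/2, theta)
# component propagates to a binomial mixture of gamma(k + delta/2, Theta_t)
# components. Check: the mixture mean equals the propagated CIR mean.
delta, gamma_, sigma = 4.0, 1.0, 1.0
m, theta, t = 5, 3.0, 0.4
c = gamma_ / sigma**2
E = math.exp(2 * gamma_ * t)
p = c / (theta * E + c - theta)                    # binomial success probability
Theta_t = c * theta * E / (theta * E + c - theta)  # evolved dual parameter

weights = [math.comb(m, k) * p**k * (1 - p)**(m - k) for k in range(m + 1)]
mix_mean = sum(w * (k + delta / 2) / Theta_t for k, w in enumerate(weights))

# CIR conditional mean averaged over X_0 ~ gamma(m + delta/2, theta)
cir_mean = (m + delta / 2) / theta / E + delta * sigma**2 / (2 * gamma_) * (1 - 1 / E)
print(mix_mean, cir_mean)
```

The two means agree exactly, as a short calculation with the binomial mean $mp$ confirms.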
24. Discussion/Extensions

- Complexity
- Pruning
- Smoothing/Likelihood/Simulation
- Approximation
- Other models
25. Complexity

$\mathcal{L}(X_{t_n} \mid Y_0, \dots, Y_{n-1})$ has support on $\Lambda_n \times \{\theta_n\}$, with $\Lambda_{n+1} = G(t(Y_n, \Lambda_n))$ and weights that can be computed recursively. The component probabilities at time $n$ can be computed at a cost that is at most $O(|\Lambda_n|^2)$, but $|\Lambda_n|$ increases with $n$.

Proposition. Under the assumption that $t(y, m) = m + N(y)$, where $N : \mathcal{Y} \to \mathcal{M}$, we have
$$\Lambda_n = G\Big(m_0 + \sum_{i=1}^n N(Y_i)\Big), \qquad |\Lambda_n| \leq \Big(1 + \frac{d_n}{K}\Big)^K,$$
where $d_n = \big|m_0 + \sum_{i=1}^n N(Y_i)\big|$.

When the observations follow a stationary process, $d_n$ will be of order $n$, giving an overall filtering complexity of $O(n^{2K})$, where the constant depends on $K$ but not on $n$.
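The support size $|G(m)| = \prod_j (1 + m_j)$ and the bound above are easy to check numerically (the vector $m$ below is illustrative):

```python
import math

# Size of the filter support Lambda_n = G(m): the number of lattice points
# below m in the product order is prod_j (1 + m_j), bounded via the AM-GM
# inequality by (1 + |m|/K)^K. The vector m is illustrative.
def support_size(m):
    return math.prod(1 + mj for mj in m)

m = (3, 1, 4)                 # K = 3, |m| = 8
K, d = len(m), sum(m)
size = support_size(m)
bound = (1 + d / K) ** K
print(size, bound)
```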
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Hidden Markov Models Barnabás Póczos & Aarti Singh Slides courtesy: Eric Xing i.i.d to sequential data So far we assumed independent, identically distributed
More informationModern Methods of Statistical Learning sf2935 Auxiliary material: Exponential Family of Distributions Timo Koski. Second Quarter 2016
Auxiliary material: Exponential Family of Distributions Timo Koski Second Quarter 2016 Exponential Families The family of distributions with densities (w.r.t. to a σ-finite measure µ) on X defined by R(θ)
More informationSlice Sampling Mixture Models
Slice Sampling Mixture Models Maria Kalli, Jim E. Griffin & Stephen G. Walker Centre for Health Services Studies, University of Kent Institute of Mathematics, Statistics & Actuarial Science, University
More informationRandom Walks Conditioned to Stay Positive
1 Random Walks Conditioned to Stay Positive Bob Keener Let S n be a random walk formed by summing i.i.d. integer valued random variables X i, i 1: S n = X 1 + + X n. If the drift EX i is negative, then
More informationOptimal transportation and optimal control in a finite horizon framework
Optimal transportation and optimal control in a finite horizon framework Guillaume Carlier and Aimé Lachapelle Université Paris-Dauphine, CEREMADE July 2008 1 MOTIVATIONS - A commitment problem (1) Micro
More informationGaussian processes for inference in stochastic differential equations
Gaussian processes for inference in stochastic differential equations Manfred Opper, AI group, TU Berlin November 6, 2017 Manfred Opper, AI group, TU Berlin (TU Berlin) inference in SDE November 6, 2017
More informationChapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang
Chapter 4 Dynamic Bayesian Networks 2016 Fall Jin Gu, Michael Zhang Reviews: BN Representation Basic steps for BN representations Define variables Define the preliminary relations between variables Check
More informationDependent hierarchical processes for multi armed bandits
Dependent hierarchical processes for multi armed bandits Federico Camerlenghi University of Bologna, BIDSA & Collegio Carlo Alberto First Italian meeting on Probability and Mathematical Statistics, Torino
More informationHybrid Dirichlet processes for functional data
Hybrid Dirichlet processes for functional data Sonia Petrone Università Bocconi, Milano Joint work with Michele Guindani - U.T. MD Anderson Cancer Center, Houston and Alan Gelfand - Duke University, USA
More informationAsymptotics for posterior hazards
Asymptotics for posterior hazards Igor Prünster University of Turin, Collegio Carlo Alberto and ICER Joint work with P. Di Biasi and G. Peccati Workshop on Limit Theorems and Applications Paris, 16th January
More informationIntroduction to asymptotic techniques for stochastic systems with multiple time-scales
Introduction to asymptotic techniques for stochastic systems with multiple time-scales Eric Vanden-Eijnden Courant Institute Motivating examples Consider the ODE {Ẋ = Y 3 + sin(πt) + cos( 2πt) X() = x
More information