Variantes Algorithmiques et Justifications Théoriques (Algorithmic Variants and Theoretical Justifications)


Variantes Algorithmiques et Justifications Théoriques

François Le Gland, INRIA Rennes and IRMAR

scientific complement, École Doctorale MATISSE
IRISA and INRIA, salle Markov, Thursday 26 January 2012

programme of the day

#1 more general models : from non linear and non Gaussian systems to hidden Markov models and partially observed Markov chains, so as to handle e.g.
- regime / mode switching
- correlation between state noise and observation noise

#2 for each of these models (or just for the most general model), representation of $P[X_{0:n} \in dx_{0:n} \mid Y_{0:n}]$ as a Gibbs-Boltzmann distribution, with a recursive formulation, and idem for $P[X_n \in dx_n \mid Y_{0:n}]$

#3 particle approximation (SIS and SIR algorithms) from either representation

#4 asymptotic behaviour as the sample size goes to infinity

#5 numerous algorithmic variants

(some notations : 1)

some notations

if $X$ is a random variable taking values in $E$, then the mapping $\varphi \mapsto E[\varphi(X)]$, or equivalently $A \mapsto P[X \in A]$, defines a probability distribution $\mu$ on $E$, denoted as $\mu(dx) = P[X \in dx]$, which characterizes the uncertainty about $X$ and is such that

$$E[\varphi(X)] = \int_E \varphi(x) \, \mu(dx) = \langle \mu, \varphi \rangle \qquad\text{or}\qquad P[X \in A] = \mu(A)$$

(some notations : 2)

some notations (continued)

a transition probability kernel $M(x,dx')$ on $E$ is a collection of probability distributions on $E$, indexed by $x \in E$

it acts on functions according to

$$M\varphi(x) = \int_E M(x,dx') \, \varphi(x')$$

and acts on probability distributions according to

$$\mu M(dx') = \int_E \mu(dx) \, M(x,dx')$$

seen as a mixture distribution, characterized by

$$\langle \mu M, \varphi \rangle = \int_E \left[ \int_E \mu(dx) \, M(x,dx') \right] \varphi(x') = \int_E \mu(dx) \left[ \int_E M(x,dx') \, \varphi(x') \right] = \langle \mu, M\varphi \rangle$$

Non linear and non Gaussian systems, and beyond

- non linear and non Gaussian systems
- hidden Markov models
- partially observed Markov chains
- likelihood free models

(non linear and non Gaussian systems : 1)

non linear and non Gaussian systems

prior model for the hidden state, taking values in $E$ :

$$X_k = f_k(X_{k-1}, W_k) \qquad\text{with } W_k \sim p_k^W(dw)$$

with initial condition $X_0 \sim \eta_0(dx)$

observation taking values in $\mathbb{R}^d$, with additive noise admitting a density :

$$Y_k = h_k(X_k) + V_k \qquad\text{with } V_k \sim q_k^V(v) \, dv$$

the random variables $X_0$, $W_1,\dots,W_k,\dots$ and $V_0,V_1,\dots,V_k,\dots$ are mutually independent, but not necessarily Gaussian

only requirement (to be used later on) : easy to
- simulate a r.v. according to $\eta_0(dx)$ or to $p_k^W(dw)$
- evaluate the function $q_k^V(v)$ for any $v \in \mathbb{R}^d$
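A minimal simulation sketch of such a system in Python; the specific choices of $f_k$, $h_k$, the Gaussian noise laws and all numerical values are illustrative assumptions, not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar model, for illustration only:
# X_k = 0.9 X_{k-1} + W_k,  Y_k = X_k^2 / 20 + V_k,
# with W_k ~ N(0,1), V_k ~ N(0, 0.5^2) and X_0 ~ N(0,1).
def f(x, w):            # state map f_k(x, w)
    return 0.9 * x + w

def h(x):               # observation map h_k(x)
    return x**2 / 20.0

def simulate(n):
    """Draw one trajectory (X_0,...,X_n) and observations (Y_0,...,Y_n)."""
    x = rng.normal()                              # X_0 ~ eta_0
    xs, ys = [], []
    for _ in range(n + 1):
        xs.append(x)
        ys.append(h(x) + 0.5 * rng.normal())      # Y_k = h(X_k) + V_k
        x = f(x, rng.normal())                    # X_k -> X_{k+1}
    return np.array(xs), np.array(ys)

xs, ys = simulate(50)
```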

(non linear and non Gaussian systems : 2)

Proposition : the hidden states $\{X_k\}$ form a Markov chain taking values in $E$, i.e.

$$P[X_k \in dx' \mid X_{0:k-1}] = P[X_k \in dx' \mid X_{k-1}]$$

characterization in terms of the transition kernel $P[X_k \in dx' \mid X_{k-1} = x] = Q_k(x,dx')$, defined (implicitly) by its action on functions

$$Q_k \varphi(x) = E[\varphi(X_k) \mid X_{k-1} = x] = E[\varphi(f_k(X_{k-1}, W_k)) \mid X_{k-1} = x] = \int_{\mathbb{R}^p} \varphi(f_k(x,w)) \, p_k^W(dw)$$

(non linear and non Gaussian systems : 3)

Remark : it is easy to simulate the next state $X_k$ given $X_{k-1} = x$, i.e. to simulate a r.v. according to $Q_k(x,dx')$ for a given $x \in E$ : indeed, set $X_k = f_k(x, W_k)$, where $W_k$ is simulated according to $p_k^W(dw)$

Remark : in general, the transition kernel $Q_k(x,dx')$ does not admit a density : indeed, conditionally on $X_{k-1} = x$, the r.v. $X_k$ necessarily belongs to the subset

$$M(x) = \{x' \in \mathbb{R}^m : \text{there exists } w \in \mathbb{R}^p \text{ such that } x' = f_k(x,w)\}$$

if $p < m$ and under some mild regularity assumptions, this subset of $\mathbb{R}^m$ has zero Lebesgue measure : therefore, conditionally on $X_{k-1} = x$, the probability distribution $Q_k(x,dx')$ of the r.v. $X_k$ cannot have a density w.r.t. the Lebesgue measure on $\mathbb{R}^m$

(non linear and non Gaussian systems : 4)

Remark : if $f_k(x,w) = b_k(x) + w$ and if the probability distribution $p_k^W(dw)$ of the r.v. $W_k$ admits a density, still denoted $p_k^W(w)$, i.e. if

$$X_k = b_k(X_{k-1}) + W_k \qquad\text{with } W_k \sim p_k^W(w) \, dw$$

then a more explicit expression is available

$$Q_k(x,dx') = p_k^W(x' - b_k(x)) \, dx'$$

i.e. the transition kernel $Q_k(x,dx')$ admits an (easy to evaluate) density : indeed, the change of variable $x' = b_k(x) + w$ yields

$$Q_k \varphi(x) = \int_{\mathbb{R}^m} \varphi(b_k(x) + w) \, p_k^W(w) \, dw = \int_{\mathbb{R}^m} \varphi(x') \, p_k^W(x' - b_k(x)) \, dx'$$

(non linear and non Gaussian systems : 5)

Proposition : the observations $\{Y_k\}$ satisfy the memoryless channel assumption, i.e.

$$P[Y_{0:n} \in dy_{0:n} \mid X_{0:n}] = \prod_{k=0}^n P[Y_k \in dy_k \mid X_k]$$

characterization in terms of the emission density

$$P[Y_k \in dy \mid X_k = x] = q_k^V(y - h_k(x)) \, dy$$

define the likelihood function

$$g_k(x) = q_k^V(Y_k - h_k(x))$$

a quantitative measure of consistency between a possible hidden state $x \in E$ and the actual observation $Y_k$

Remark : easy to evaluate $g_k(x)$ for any $x \in E$
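The likelihood evaluation is then a single density evaluation; a sketch assuming Gaussian observation noise (the noise law and its scale are assumptions of this sketch):

```python
import numpy as np

SIGMA_V = 0.5   # assumed observation noise standard deviation

def likelihood(x, y_k, h):
    """g_k(x) = q_V(Y_k - h_k(x)), here with q_V the N(0, SIGMA_V^2) density.
    Works for a scalar x or an array of particle positions."""
    r = y_k - h(x)
    return np.exp(-0.5 * (r / SIGMA_V) ** 2) / (SIGMA_V * np.sqrt(2 * np.pi))
```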

(hidden Markov models : 1)

hidden Markov models

motivating example : hybrid continuous / discrete systems

$$X_k = f_k(s_{k-1}, X_{k-1}, W_k) \qquad Y_k = h_k(X_k) + V_k$$

where the regime / mode sequence $\{s_k\}$ forms a Markov chain with finite state space

this does not fit into non linear and non Gaussian systems; however
- the hidden states and modes $\{(X_k, s_k)\}$ jointly form a Markov chain
- the observations $\{Y_k\}$ satisfy the memoryless channel assumption

i.e. it fits into hidden Markov models

Remark : easy to simulate the next state $(X_k, s_k)$ given $(X_{k-1}, s_{k-1}) = (x,s)$

(hidden Markov models : 2)

more generally, the hidden states $\{X_k\}$ could form a Markov chain taking values in a quite general space $E$, e.g.
- hybrid continuous / discrete
- differentiable manifold
- constrained
- graphical (collection of connected edges)

characterization in terms of the transition kernel and the initial distribution

$$P[X_k \in dx' \mid X_{k-1} = x] = Q_k(x,dx') \qquad P[X_0 \in dx] = \eta_0(dx)$$

the joint probability distribution of the hidden states $X_{0:n}$ verifies

$$P[X_{0:n} \in dx_{0:n}] = \eta_0(dx_0) \prod_{k=1}^n Q_k(x_{k-1}, dx_k)$$

(hidden Markov models : 3)

the user should respect displacement constraints due to obstacles, as read on the map

(hidden Markov models : 4)

simplified model : the user walks on a Voronoi graph, so that displacement constraints due to obstacles are automatically taken into account

(hidden Markov models : 5)

the observations $\{Y_k\}$ could verify the memoryless channel assumption, i.e.

$$P[Y_{0:n} \in dy_{0:n} \mid X_{0:n}] = \prod_{k=0}^n P[Y_k \in dy_k \mid X_k]$$

characterization in terms of the emission density

$$P[Y_k \in dy \mid X_k = x] = g_k(x,y) \, \lambda_k^F(dy)$$

where the nonnegative measure $\lambda_k^F(dy)$, defined on $F$, does not depend on $x \in E$

define (abuse of notation) the likelihood function as $g_k(x) = g_k(x, Y_k)$, a quantitative measure of consistency between $x \in E$ and the observation $Y_k$

the joint conditional distribution of the observations $Y_{0:n}$ given the hidden states $X_{0:n}$ verifies

$$P[Y_{0:n} \in dy_{0:n} \mid X_{0:n} = x_{0:n}] = \prod_{k=0}^n g_k(x_k, y_k) \, \lambda_0^F(dy_0) \cdots \lambda_n^F(dy_n)$$

(hidden Markov models : 6)

representation as

    X_{k-1} ---> X_k ---> X_{k+1}
       |          |          |
       v          v          v
    Y_{k-1}      Y_k      Y_{k+1}

where arrows represent dependency between random variables

only requirement (to be used later on) : easy to
- simulate, for any $x \in E$, a r.v. according to the transition kernel $Q_k(x,dx')$
- evaluate, for any $x' \in E$, the likelihood function $g_k(x')$

(hidden Markov models : 7)

hidden Markov models : importance decomposition

motivation : in the simulations seen last week, the basic paradigm is
- particles move according to the prior model, described by its transition kernel
- new particles are weighted by evaluating the likelihood function
- hopefully, the resulting weighted empirical distribution provides a reasonable approximation of the intractable Bayesian filter

concern / questions : is this safe? could more information be used in the mutation step?

(hidden Markov models : 8)

recall the indoor navigation example : if the user is detected by a beacon with known location $a$ and finite range $R$, then necessarily the user position lies within the detection disk centered at $a$ and with radius $R$

in other words, generating particles according to the prior model alone could result in (a few, some, many, all) particles outside the detection disk, i.e. useless particles, a waste

why not generate all new particles explicitly within the disk, and accommodate for the wrong model by changing the weights? more generally, why not (and how) use the next observation to move the particles? the ideal situation would be : particles move according to the posterior model

warning

(hidden Markov models : 9)

[Figure 1 : prior density and generated sample]

(hidden Markov models : 10)

[Figure 2 : prior density and histogram associated with the generated sample]

(hidden Markov models : 11)

[Figure 1, repeated : prior density and generated sample]

(hidden Markov models : 12)

[Figure 3a : prior density, likelihood function, posterior density and weighted sample]

(hidden Markov models : 13)

[Figure 4a : prior density, likelihood function, posterior density and histogram associated with the weighted sample]

(hidden Markov models : 14)

[Figure 1, repeated : prior density and generated sample]

(hidden Markov models : 15)

[Figure 3b : prior density, likelihood function, posterior density and weighted sample (more difficult)]

(hidden Markov models : 16)

[Figure 4b : prior density, likelihood function, posterior density and histogram associated with the weighted sample (more difficult)]

(hidden Markov models : 17)

[Figure 1, repeated : prior density and generated sample]

(hidden Markov models : 18)

[Figure 3c : prior density, likelihood function, posterior density and weighted sample (just impossible)]

(hidden Markov models : 19)

possible (non unique) decomposition of

$$\gamma_0(dx) = g_0(x) \, \eta_0(dx) = g_0^{imp}(x) \, \eta_0^{imp}(dx)$$

and

$$R_k(x,dx') = Q_k(x,dx') \, g_k(x') = g_k^{imp}(x,x') \, Q_k^{imp}(x,dx')$$

as the product of
- a nonnegative weight function $g_0^{imp}(x)$ or $g_k^{imp}(x,x')$
- a probability distribution $\eta_0^{imp}(dx)$ or a transition kernel $Q_k^{imp}(x,dx')$, respectively

only requirement about the proposed decomposition : easy to
- simulate a r.v. according to $\eta_0^{imp}(dx)$
- simulate, for any $x \in E$, a r.v. according to $Q_k^{imp}(x,dx')$
- evaluate, for any $x, x' \in E$, the weighting function $g_k^{imp}(x,x')$

attention : evaluating the weighting function $g_k^{imp}(x,x')$ requires some knowledge about the transition kernels $Q_k^{imp}(x,dx')$ and $Q_k(x,dx')$ (which was not required originally)

(hidden Markov models : 20)

popular (optimal) importance decomposition : blind vs. guided mutation

$$P[X_k \in dx', Y_k \in dy' \mid X_{k-1} = x] = \underbrace{P[Y_k \in dy' \mid X_k = x', X_{k-1} = x]}_{g_k(x',y') \, \lambda_k(dy')} \ \underbrace{P[X_k \in dx' \mid X_{k-1} = x]}_{Q_k(x,dx')}$$

and alternatively

$$P[X_k \in dx', Y_k \in dy' \mid X_{k-1} = x] = \underbrace{P[X_k \in dx' \mid Y_k = y', X_{k-1} = x]}_{\widehat{Q}_k(x,y',dx')} \ \underbrace{P[Y_k \in dy' \mid X_{k-1} = x]}_{\widehat{g}_k(x,y') \, \lambda_k(dy')}$$

i.e.

$$R_k(x,dx') = g_k(x') \, Q_k(x,dx') = \widehat{g}_k(x) \, \widehat{Q}_k(x,dx')$$

with (abuse of notation) $\widehat{g}_k(x) = \widehat{g}_k(x, Y_k)$ and $\widehat{Q}_k(x,dx') = \widehat{Q}_k(x, Y_k, dx')$

(hidden Markov models : 21)

remaining question : how easy is it to
- simulate, for any $x \in E$, a r.v. according to $\widehat{Q}_k(x,dx')$?
- evaluate, for any $x \in E$, the weighting function $\widehat{g}_k(x)$?

positive answer in a special case : linear observations and additive Gaussian noise

$$X_k = f_k(X_{k-1}) + \sigma_k(X_{k-1}) \, W_k \qquad Y_k = H_k X_k + V_k$$

indeed (for simplicity, assume $\sigma_k(x) = I$)

$$Y_k = H_k [f_k(X_{k-1}) + W_k] + V_k = H_k f_k(X_{k-1}) + (H_k W_k + V_k)$$

so, conditionally on $X_{k-1} = x$, the r.v. $(X_k, Y_k)$ is jointly Gaussian, with mean and covariance matrix

$$\begin{pmatrix} f_k(x) \\ H_k f_k(x) \end{pmatrix} \qquad\text{and}\qquad \begin{pmatrix} Q_k^W & Q_k^W H_k^* \\ H_k Q_k^W & H_k Q_k^W H_k^* + Q_k^V \end{pmatrix}$$
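A sketch of this special case in Python: standard Gaussian conditioning gives both a draw from $\widehat{Q}_k(x,dx')$ and the weight $\widehat{g}_k(x)$. The assumption $\sigma_k(x) = I$ follows the slide; the function and argument names are illustrative:

```python
import numpy as np

def optimal_proposal(x, y, f_k, H, QW, QV, rng):
    """Sample X_k from Law(X_k | X_{k-1}=x, Y_k=y) and return the weight
    g_hat_k(x) = density of N(H f_k(x), H QW H' + QV) evaluated at y.
    Sketch under the slide's special case sigma_k(x) = I."""
    mx = f_k(x)                            # prior mean of X_k given X_{k-1} = x
    S = H @ QW @ H.T + QV                  # covariance of Y_k given X_{k-1} = x
    K = QW @ H.T @ np.linalg.inv(S)        # gain
    innov = y - H @ mx
    m = mx + K @ innov                     # conditional mean of X_k given (x, y)
    P = QW - K @ H @ QW                    # conditional covariance
    xk = rng.multivariate_normal(m, P)     # draw from Q_hat_k(x, dx')
    d = len(y)
    g_hat = np.exp(-0.5 * innov @ np.linalg.solve(S, innov)) \
            / np.sqrt((2 * np.pi) ** d * np.linalg.det(S))
    return xk, g_hat
```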

(partially observed Markov chains : 1)

partially observed Markov chains

motivating example : assume that (unsynchronized) sensors take noisy observations of different components of the hidden state at different time instants, e.g. $X_k = (X_k^1, X_k^2)$ and, for simplicity,

$$X_k = f(X_{k-1}) + W_k \qquad\text{and}\qquad Y_k = \begin{cases} h^1(X_k^1) + V_k^1 & \text{at odd time instants} \\ H^2 X_k^2 + V_k^2 & \text{at even time instants} \end{cases}$$

observing all components of the hidden state is fine, but processing partial observations at each time instant can be risky, since the likelihood functions will be flat along some directions : ideally, try to collect and process two successive observations simultaneously, so that the likelihood functions are more peaked

(partially observed Markov chains : 2)

down sampling : set $\bar{X}_k = X_{2k+1}$ and $\bar{Y}_k = \begin{pmatrix} Y_{2k+1} \\ Y_{2k+2} \end{pmatrix}$

state equation

$$\bar{X}_k = X_{2k+1} = f(X_{2k}) + W_{2k+1} = f(f(X_{2k-1}) + W_{2k}) + W_{2k+1}$$

i.e. $\bar{X}_k = \bar{f}(\bar{X}_{k-1}, \bar{W}_k)$ with $\bar{W}_k = \begin{pmatrix} W_{2k} \\ W_{2k+1} \end{pmatrix}$

(partially observed Markov chains : 3)

observation equation : introducing the projections $\pi^1$ and $\pi^2$ on the 1st and 2nd components of the state vector yields

$$\bar{Y}_k = \begin{pmatrix} Y_{2k+1} \\ Y_{2k+2} \end{pmatrix} = \begin{pmatrix} h^1(X_{2k+1}^1) + V_{2k+1}^1 \\ H^2 X_{2k+2}^2 + V_{2k+2}^2 \end{pmatrix} = \begin{pmatrix} h^1(\pi^1(X_{2k+1})) + V_{2k+1}^1 \\ H^2 \, \pi^2(f(X_{2k+1}) + W_{2k+2}) + V_{2k+2}^2 \end{pmatrix}$$

i.e. $\bar{Y}_k = \bar{h}(\bar{X}_k) + \bar{V}_k$ with $\bar{V}_k = \begin{pmatrix} V_{2k+1}^1 \\ H^2 \, \pi^2(W_{2k+2}) + V_{2k+2}^2 \end{pmatrix}$

(partially observed Markov chains : 4)

resulting system

$$\bar{X}_k = \bar{f}(\bar{X}_{k-1}, \bar{W}_k) \qquad \bar{Y}_k = \bar{h}(\bar{X}_k) + \bar{V}_k$$

with $\bar{W}_k = \begin{pmatrix} W_{2k} \\ W_{2k+1} \end{pmatrix}$ and $\bar{V}_k = \begin{pmatrix} V_{2k+1}^1 \\ H^2 \, \pi^2(W_{2k+2}) + V_{2k+2}^2 \end{pmatrix}$

clearly $\bar{W}_k$ and $\bar{V}_{k-1}$ share $W_{2k}$ in common and are correlated, hence dependent, and the memoryless channel assumption cannot hold

(partially observed Markov chains : 5)

trick : decompose $\bar{W}_k = M \, \bar{V}_{k-1} + \bar{B}_k$, where $\bar{B}_k$ and $\bar{V}_{k-1}$ are now independent, substitute in the state equation, and import $\bar{V}_{k-1} = \bar{Y}_{k-1} - \bar{h}(\bar{X}_{k-1})$ from the observation equation, yielding

$$\bar{X}_k = \bar{f}(\bar{X}_{k-1}, \, M(\bar{Y}_{k-1} - \bar{h}(\bar{X}_{k-1})) + \bar{B}_k) \qquad \bar{Y}_k = \bar{h}(\bar{X}_k) + \bar{V}_k$$

this does not fit into a hidden Markov model : the hidden state alone does not form a Markov chain; however, the hidden states and observations $\{(\bar{X}_k, \bar{Y}_k)\}$ jointly form a Markov chain, only the second component of which is observed

(partially observed Markov chains : 6)

even more generally, with the previous motivating example in mind, the hidden states and observations $\{(X_k, Y_k)\}$ could jointly form a Markov chain taking values in the product space $E \times F$

characterization in terms of the transition kernel

$$P[X_k \in dx', Y_k \in dy' \mid X_{k-1} = x, Y_{k-1} = y] = R_k(x, y, y', dx') \, \lambda_k^F(y, dy')$$

and the initial distribution

$$P[X_0 \in dx, Y_0 \in dy] = \gamma_0(y, dx) \, \lambda_0^F(dy)$$

attention : the hidden states $\{X_k\}$ alone need not form a Markov chain

the joint probability distribution of the hidden states and observations $(X_{0:n}, Y_{0:n})$ verifies

$$P[X_{0:n} \in dx_{0:n}, Y_{0:n} \in dy_{0:n}] = \gamma_0(y_0, dx_0) \prod_{k=1}^n R_k(x_{k-1}, y_{k-1}, y_k, dx_k) \ \lambda_0^F(dy_0) \prod_{k=1}^n \lambda_k^F(y_{k-1}, dy_k)$$

(partially observed Markov chains : 7)

partially observed Markov chains : importance decomposition

required (non unique) decomposition of

$$\gamma_0(dx) = g_0^{imp}(x) \, \eta_0^{imp}(dx) \qquad\text{and}\qquad R_k(x,dx') = g_k^{imp}(x,x') \, Q_k^{imp}(x,dx')$$

as the product of
- a nonnegative weight function $g_0^{imp}(x)$ or $g_k^{imp}(x,x')$
- a probability distribution $\eta_0^{imp}(dx)$ or a transition kernel $Q_k^{imp}(x,dx')$, respectively

only requirement about the proposed decomposition : easy to
- simulate a r.v. according to $\eta_0^{imp}(dx)$
- simulate, for any $x \in E$, a r.v. according to $Q_k^{imp}(x,dx')$
- evaluate, for any $x, x' \in E$, the weighting function $g_k^{imp}(x,x')$

(likelihood free models : 1)

likelihood free models

so far, at least implicitly, additive observation noise has been assumed

$$Y_k = h(X_k) + V_k \qquad\text{with } V_k \sim q_k^V(v) \, dv$$

with a known and explicit form for the probability density $q_k^V(v)$ : this was the key assumption in deriving the expression of the emission density

$$P[Y_k \in dy \mid X_k = x] = g_k(x,y) \, \lambda_k(dy)$$

hence an explicit expression of the likelihood function

questions : could anything be said in more general cases, where no explicit expression is available for a density, or where a density does not even exist?
- non additive observation noise, with dimension smaller than the observation, i.e. $Y_k = h(X_k, V_k)$
- perfect observations, i.e. the observation noise is simply not present : $Y_k = h(X_k)$

(likelihood free models : 2)

trick, a form of ABC (approximate Bayesian computation) : pretend that the observations are produced by a slightly perturbed but regular model, i.e.

$$Y_k = h(X_k) + V_k + \varepsilon \, U_k \qquad\text{or}\qquad Y_k = h(X_k, V_k) + \varepsilon \, U_k \qquad\text{or}\qquad Y_k = h(X_k) + \varepsilon \, U_k$$

depending on the case under consideration, with $U_k \sim q_k^U(u) \, du$, and set $(X_k, V_k)$ as the new hidden state

new requirement : easy to
- simulate $(X_k, V_k)$ jointly
- evaluate the density $q_k^U(u)$
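A sketch of the resulting ABC-style weighting, assuming a Gaussian perturbation noise $U_k$ (an illustrative assumption; any density $q^U_k$ would do):

```python
import numpy as np

def abc_weight(y_k, y_sim, eps):
    """ABC-style pseudo-likelihood: pretend Y_k = h(X_k, V_k) + eps * U_k with
    U_k ~ N(0, I), so the weight of a simulated noiseless observation y_sim is
    the Gaussian density of (y_k - y_sim) with scale eps."""
    r = np.atleast_1d(y_k - y_sim)
    d = r.size
    return np.exp(-0.5 * (r @ r) / eps**2) / ((eps * np.sqrt(2 * np.pi)) ** d)
```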

Bayesian filter

- hidden Markov models
  - representation as a Gibbs-Boltzmann distribution
  - recursive formulation
- partially observed Markov chains + given importance decomposition
  - representation as a Gibbs-Boltzmann distribution
  - recursive formulation

(Bayesian filter : hidden Markov models : representation : 1)

Bayesian filter : hidden Markov models : representation

Theorem : the joint conditional distribution of the hidden state sequence $X_{0:n}$ given the observations $Y_{0:n}$ is a Gibbs-Boltzmann distribution

$$P[X_{0:n} \in dx_{0:n} \mid Y_{0:n}] \propto \underbrace{\prod_{k=0}^n g_k(x_k)}_{g_{0:n}(x_{0:n})} \ \underbrace{\eta_0(dx_0) \prod_{k=1}^n Q_k(x_{k-1}, dx_k)}_{\eta_{0:n}(dx_{0:n})}$$

with the likelihood functions defined (abuse of notation) as $g_k(x) = g_k(x, Y_k)$, and with the joint probability distribution of the hidden state sequence $X_{0:n}$

$$\eta_{0:n}(dx_{0:n}) = P[X_{0:n} \in dx_{0:n}] = \eta_0(dx_0) \prod_{k=1}^n Q_k(x_{k-1}, dx_k)$$

(Bayesian filter : hidden Markov models : representation : 2)

general principle :

$$p_{X \mid Y = y}(x) = \frac{p_{X,Y}(x,y)}{p_Y(y)} \qquad\text{i.e.}\qquad p_{X \mid Y = y}(x) \propto p_{X,Y}(x,y)$$

Proof : Bayes rule + Markov property + memoryless channel assumption yield the joint probability distribution of the hidden states and observations $(X_{0:n}, Y_{0:n})$

$$P[X_{0:n} \in dx_{0:n}, Y_{0:n} \in dy_{0:n}] = P[Y_{0:n} \in dy_{0:n} \mid X_{0:n} = x_{0:n}] \, P[X_{0:n} \in dx_{0:n}] = \eta_0(dx_0) \prod_{k=1}^n Q_k(x_{k-1}, dx_k) \ \prod_{k=0}^n g_k(x_k, y_k) \, \lambda_0^F(dy_0) \cdots \lambda_n^F(dy_n)$$

hence

$$P[X_{0:n} \in dx_{0:n} \mid Y_{0:n}] \propto \eta_0(dx_0) \prod_{k=1}^n Q_k(x_{k-1}, dx_k) \ \prod_{k=0}^n g_k(x_k)$$

(Bayesian filter : hidden Markov models : representation : 3)

Remark : for any function $f$ depending on the whole trajectory

$$E[f(X_{0:n}) \mid Y_{0:n}] \propto \int_{E \times \cdots \times E} f(x_{0:n}) \, g_{0:n}(x_{0:n}) \, \eta_{0:n}(dx_{0:n}) = E[f(X_{0:n}) \prod_{k=0}^n g_k(X_k)]$$

an expectation w.r.t. the hidden state sequence $X_{0:n}$, while the observations $Y_{0:n}$ are fixed implicit parameters in the likelihood functions : recall (abuse of notation) $g_k(x) = g_k(x, Y_k)$

if $f = \varphi \circ \pi$ depends only upon the last state, then

$$\langle \mu_n, \varphi \rangle = E[\varphi(X_n) \mid Y_{0:n}] \propto E[\varphi(X_n) \prod_{k=0}^n g_k(X_k)] = \langle \gamma_n, \varphi \rangle$$

which defines the unnormalized distribution $\gamma_n(dx)$ implicitly, through its action on arbitrary functions

(Bayesian filter : hidden Markov models : representation : 4)

for a given importance decomposition

$$P[X_{0:n} \in dx_{0:n} \mid Y_{0:n}] \propto \eta_0(dx_0) \prod_{k=1}^n Q_k(x_{k-1}, dx_k) \ \prod_{k=0}^n g_k(x_k) = \underbrace{\eta_0^{imp}(dx_0) \prod_{k=1}^n Q_k^{imp}(x_{k-1}, dx_k)}_{\eta_{0:n}^{imp}(dx_{0:n})} \ \underbrace{\prod_{k=0}^n g_k^{imp}(x_{k-1}, x_k)}_{g_{0:n}^{imp}(x_{0:n})}$$

(with the convention $g_0^{imp}(x_{-1}, x_0) = g_0^{imp}(x_0)$)

(Bayesian filter : hidden Markov models : recursive formulation : 1)

Bayesian filter : hidden Markov models : recursive formulation

Theorem : the Bayesian filter $\mu_k(dx) = P[X_k \in dx \mid Y_{0:k}]$ satisfies

$$\mu_{k-1} \xrightarrow{\text{prediction}} \eta_k = \mu_{k-1} Q_k \xrightarrow{\text{correction}} \mu_k = g_k \cdot \eta_k$$

with initial condition $\eta_0(dx) = P[X_0 \in dx]$

Remark : in the Theorem statement,

$$\mu_{k-1} Q_k(dx') = \int_E \mu_{k-1}(dx) \, Q_k(x,dx')$$

denotes the mixture distribution resulting from the transition kernel $Q_k(x,dx')$ acting on the probability distribution $\mu_{k-1}(dx)$, and

$$g_k \cdot \eta_k = \frac{g_k \, \eta_k}{\langle \eta_k, g_k \rangle}$$

denotes the (projective) product of the prior probability distribution $\eta_k(dx')$ with the likelihood function $g_k(x')$
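To make the two-step recursion concrete, here is a sketch on a finite grid, where the kernel $Q_k$ becomes a row-stochastic matrix and $g_k$ a vector; the discretization itself is an illustrative assumption, not part of the slides:

```python
import numpy as np

def filter_step(mu_prev, Q, g):
    """One prediction/correction step of the Bayesian filter on a finite grid.
    mu_prev: (N,) filter at time k-1;  Q: (N,N) row-stochastic matrix with
    Q[i,j] playing the role of Q_k(x_i, dx_j);  g: (N,) likelihood g_k on the grid."""
    eta = mu_prev @ Q          # prediction: eta_k = mu_{k-1} Q_k
    mu = g * eta               # correction: unnormalized g_k * eta_k
    return mu / mu.sum()       # projective product g_k . eta_k
```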

(Bayesian filter : hidden Markov models : recursive formulation : 2)

Proof : recall the representation of the joint conditional probability distribution of the hidden state sequence $X_{0:k}$ given the observations $Y_{0:k}$

$$P[X_{0:k} \in dx_{0:k} \mid Y_{0:k}] \propto \eta_0(dx_0) \prod_{p=1}^k Q_p(x_{p-1}, dx_p) \ \prod_{p=0}^k g_p(x_p) \propto g_k(x_k) \, Q_k(x_{k-1}, dx_k) \, P[X_{0:k-1} \in dx_{0:k-1} \mid Y_{0:k-1}]$$

integration w.r.t. the variables $x_{0:k-1}$ (in the RHS, first w.r.t. the variables $x_{0:k-2}$ and next w.r.t. the variable $x_{k-1}$) provides the conditional distribution of the current hidden state $X_k$ given the observations $Y_{0:k}$, i.e. the Bayesian filter, as

$$\mu_k(dx_k) = P[X_k \in dx_k \mid Y_{0:k}] \propto g_k(x_k) \int_E Q_k(x_{k-1}, dx_k) \, P[X_{k-1} \in dx_{k-1} \mid Y_{0:k-1}] = g_k(x_k) \, \underbrace{\int_E \mu_{k-1}(dx_{k-1}) \, Q_k(x_{k-1}, dx_k)}_{\eta_k(dx_k)}$$

(Bayesian filter : hidden Markov models : recursive formulation : 3)

Remark : the unnormalized version satisfies the recurrent relation

$$\gamma_k(dx') = g_k(x') \int_E \gamma_{k-1}(dx) \, Q_k(x,dx') \qquad\text{and}\qquad \mu_k = \frac{\gamma_k}{\langle \gamma_k, 1 \rangle}$$

or equivalently

$$\gamma_k(dx') = \int_E \gamma_{k-1}(dx) \, R_k(x,dx')$$

introducing the nonnegative kernel $R_k(x,dx') = Q_k(x,dx') \, g_k(x')$

(Bayesian filter : partially observed Markov chains : representation : 1)

Bayesian filter : partially observed Markov chains : representation

Theorem : the joint conditional distribution of the hidden state sequence $X_{0:n}$ given the observations $Y_{0:n}$ verifies

$$P[X_{0:n} \in dx_{0:n} \mid Y_{0:n}] \propto \gamma_0(dx_0) \prod_{k=1}^n R_k(x_{k-1}, dx_k)$$

with the nonnegative distribution defined (abuse of notation) as $\gamma_0(dx) = \gamma_0(Y_0, dx)$, and with the nonnegative kernel defined (abuse of notation) as $R_k(x_{k-1}, dx_k) = R_k(x_{k-1}, Y_{k-1}, Y_k, dx_k)$

(Bayesian filter : partially observed Markov chains : representation : 2)

general principle :

$$p_{X \mid Y = y}(x) = \frac{p_{X,Y}(x,y)}{p_Y(y)} \qquad\text{i.e.}\qquad p_{X \mid Y = y}(x) \propto p_{X,Y}(x,y)$$

Proof : by definition, the joint probability distribution of the hidden states and observations $(X_{0:n}, Y_{0:n})$ verifies

$$P[X_{0:n} \in dx_{0:n}, Y_{0:n} \in dy_{0:n}] = \gamma_0(y_0, dx_0) \prod_{k=1}^n R_k(x_{k-1}, y_{k-1}, y_k, dx_k) \ \lambda_0^F(dy_0) \prod_{k=1}^n \lambda_k^F(y_{k-1}, dy_k)$$

hence

$$P[X_{0:n} \in dx_{0:n} \mid Y_{0:n}] \propto \gamma_0(dx_0) \prod_{k=1}^n R_k(x_{k-1}, dx_k)$$

(Bayesian filter : partially observed Markov chains : representation : 3)

for a given importance decomposition

$$P[X_{0:n} \in dx_{0:n} \mid Y_{0:n}] \propto \gamma_0(dx_0) \prod_{k=1}^n R_k(x_{k-1}, dx_k) = \underbrace{\eta_0^{imp}(dx_0) \prod_{k=1}^n Q_k^{imp}(x_{k-1}, dx_k)}_{\eta_{0:n}^{imp}(dx_{0:n})} \ \underbrace{\prod_{k=0}^n g_k^{imp}(x_{k-1}, x_k)}_{g_{0:n}^{imp}(x_{0:n})}$$

(with the same convention for $g_0^{imp}$ as above)

(Bayesian filter : partially observed Markov chains : recursive formulation : 1)

Bayesian filter : partially observed Markov chains : recursive formulation

Theorem : the Bayesian filter $\mu_k(dx) = P[X_k \in dx \mid Y_{0:k}]$ satisfies

$$\mu_k(dx') \propto \int_E \mu_{k-1}(dx) \, R_k(x,dx')$$

with initial condition $\mu_0(dx) \propto \gamma_0(dx)$

Remark : the unnormalized version satisfies the recurrent relation

$$\gamma_k(dx') = \int_E \gamma_{k-1}(dx) \, R_k(x,dx') \qquad\text{and}\qquad \mu_k = \frac{\gamma_k}{\langle \gamma_k, 1 \rangle}$$

(Bayesian filter : partially observed Markov chains : recursive formulation : 2)

Proof : recall the representation of the joint conditional probability distribution of the hidden state sequence $X_{0:k}$ given the observations $Y_{0:k}$

$$P[X_{0:k} \in dx_{0:k} \mid Y_{0:k}] \propto \gamma_0(dx_0) \prod_{p=1}^k R_p(x_{p-1}, dx_p) \propto R_k(x_{k-1}, dx_k) \, P[X_{0:k-1} \in dx_{0:k-1} \mid Y_{0:k-1}]$$

integration w.r.t. the variables $x_{0:k-1}$ (in the RHS, first w.r.t. the variables $x_{0:k-2}$ and next w.r.t. the variable $x_{k-1}$) provides the conditional distribution of the current hidden state $X_k$ given the observations $Y_{0:k}$, i.e. the Bayesian filter, as

$$\mu_k(dx_k) = P[X_k \in dx_k \mid Y_{0:k}] \propto \int_E R_k(x_{k-1}, dx_k) \, P[X_{k-1} \in dx_{k-1} \mid Y_{0:k-1}] = \int_E \mu_{k-1}(dx_{k-1}) \, R_k(x_{k-1}, dx_k)$$

Monte Carlo approximation : particle filters

- Monte Carlo methods : importance sampling
- importance sampling (SIS algorithm), derived from the Bayesian filter representation
  - recursive formulation
- redistribution (SIR algorithm), adaptive redistribution, derived directly from the Bayesian filter recursive formulation
- estimation error, CLT

(Monte Carlo methods : importance sampling : 1)

Monte Carlo methods

if computing an integral (or a mathematical expectation)

$$\langle \mu, \varphi \rangle = \int_E \varphi(x) \, \mu(dx) = E[\varphi(X)] \qquad\text{with } X \sim \mu(dx)$$

is difficult, but simulating a r.v. according to the distribution $\mu$ is easy, then introduce the empirical probability distribution

$$S^N(\mu) = \frac{1}{N} \sum_{i=1}^N \delta_{\xi^i}$$

where $(\xi^1,\dots,\xi^N)$ is an $N$-sample distributed according to $\mu$, and the approximation

$$\langle \mu, \varphi \rangle \approx \langle S^N(\mu), \varphi \rangle = \frac{1}{N} \sum_{i=1}^N \varphi(\xi^i)$$

by the law of large numbers, $\langle S^N(\mu), \varphi \rangle \to \langle \mu, \varphi \rangle$ in probability as $N \uparrow \infty$, with speed $1/\sqrt{N}$

(Monte Carlo methods : importance sampling : 2)

indeed

$$\langle S^N(\mu) - \mu, \varphi \rangle = \frac{1}{N} \sum_{i=1}^N (\varphi(\xi^i) - \langle \mu, \varphi \rangle)$$

hence, since the samples are independent, the (non asymptotic) mean square error

$$E|\langle S^N(\mu) - \mu, \varphi \rangle|^2 = \frac{1}{N^2} \sum_{i,j=1}^N E[(\varphi(\xi^i) - \langle \mu, \varphi \rangle)(\varphi(\xi^j) - \langle \mu, \varphi \rangle)] = \frac{1}{N^2} \sum_{i=1}^N \underbrace{E|\varphi(\xi^i) - \langle \mu, \varphi \rangle|^2}_{\mathrm{var}(\varphi,\mu)} = \frac{1}{N} \, \mathrm{var}(\varphi,\mu)$$

and the central limit theorem holds

$$\sqrt{N} \, \langle S^N(\mu) - \mu, \varphi \rangle = \frac{1}{\sqrt{N}} \sum_{i=1}^N (\varphi(\xi^i) - \langle \mu, \varphi \rangle) \Longrightarrow N(0, \mathrm{var}(\varphi,\mu))$$

in distribution as $N \uparrow \infty$
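A minimal numerical illustration of the $1/\sqrt{N}$ behaviour; the test function and distribution are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

def mc_estimate(phi, sample):
    """<S_N(mu), phi> = (1/N) sum_i phi(xi_i), the plain Monte Carlo estimator."""
    return phi(sample).mean()

# Illustration: E[phi(X)] with X ~ N(0,1) and phi(x) = x^2 (true value 1);
# the fluctuation around 1 shrinks like 1/sqrt(N).
for N in (100, 10_000, 1_000_000):
    print(N, mc_estimate(lambda x: x**2, rng.normal(size=N)))
```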

(Monte Carlo methods : importance sampling : 3)

important special case : Gibbs-Boltzmann distribution

$$\mu = g \cdot \eta = \frac{g \, \eta}{\langle \eta, g \rangle} \qquad\text{i.e.}\qquad \langle \mu, \varphi \rangle = \frac{\langle \eta, g \, \varphi \rangle}{\langle \eta, g \rangle}$$

with a (non unique) decomposition in terms of
- a probability distribution $\eta$
- a nonnegative function $g$

introduce the unnormalized distribution defined by

$$\langle \gamma, \varphi \rangle = \langle \eta, g \, \varphi \rangle = E[g(\Xi) \, \varphi(\Xi)]$$

where the r.v. $\Xi$ is distributed according to $\eta$, hence

$$\langle \mu, \varphi \rangle = \frac{\langle \eta, g \varphi \rangle}{\langle \eta, g \rangle} = \frac{\langle \gamma, \varphi \rangle}{\langle \gamma, 1 \rangle} \qquad (*)$$

motivation : Bayes rule

$$\text{posterior distribution} \propto \text{likelihood function} \times \text{prior distribution}$$

(Monte Carlo methods : importance sampling : 4)

if simulating a r.v. according to $\mu$ is difficult, but simulating a r.v. according to $\eta$ and evaluating the nonnegative function $g(x)$ for any $x$ is easy, then it is possible to approximate $\mu$ by a weighted empirical probability distribution, associated with a sample distributed according to $\eta$ and weighted with the nonnegative function $g(x)$, even though the normalizing constant $\langle \eta, g \rangle$ might be unknown

(Monte Carlo methods : importance sampling : 5)

importance sampling

idea : approximate the numerator and the denominator in $(*)$ with a single sample distributed according to $\eta$ : introduce the approximation

$$\langle \gamma, \varphi \rangle = \langle \eta, g \varphi \rangle \approx \langle S^N(\eta), g \varphi \rangle = \frac{1}{N} \sum_{i=1}^N g(\xi^i) \, \varphi(\xi^i)$$

hence

$$\langle \mu, \varphi \rangle = \langle g \cdot \eta, \varphi \rangle \approx \langle g \cdot S^N(\eta), \varphi \rangle = \frac{\sum_{i=1}^N g(\xi^i) \, \varphi(\xi^i)}{\sum_{i=1}^N g(\xi^i)}$$

where $(\xi^1,\dots,\xi^N)$ is an $N$-sample with common probability distribution $\eta$

(Monte Carlo methods : importance sampling : 6)

in other words

$$\gamma \approx \gamma^N = g \, S^N(\eta) = \frac{1}{N} \sum_{i=1}^N g(\xi^i) \, \delta_{\xi^i}$$

and

$$\mu \approx \mu^N = g \cdot S^N(\eta) = \sum_{i=1}^N \frac{g(\xi^i)}{\sum_{j=1}^N g(\xi^j)} \, \delta_{\xi^i} = \sum_{i=1}^N w^i \, \delta_{\xi^i}$$

where the nonnegative normalized weights $(w^1,\dots,w^N)$ are defined, for any $i = 1,\dots,N$, by

$$w^i = \frac{g(\xi^i)}{\sum_{j=1}^N g(\xi^j)}$$
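A sketch of self-normalized importance sampling with these weights; the particular $\eta$, $g$ and $\varphi$ in the usage example are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def importance_sampling(phi, g, sample):
    """Self-normalized importance sampling: <mu, phi> ~ sum_i w_i phi(xi_i)
    with w_i = g(xi_i) / sum_j g(xi_j), for mu = g . eta and xi_i ~ eta."""
    w = g(sample)
    w = w / w.sum()                     # normalized weights w^i
    return np.sum(w * phi(sample))

# Illustration: eta = N(0,1) and g(x) = 1_{x>0}, so mu is the half-normal law;
# estimate E[X | X > 0] = sqrt(2/pi) ~ 0.798.
xi = rng.normal(size=100_000)
print(importance_sampling(lambda x: x, lambda x: (x > 0).astype(float), xi))
```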

(importance sampling SIS algorithm : 1)

importance sampling SIS algorithm

recall the Bayesian filter representation as a Gibbs-Boltzmann distribution

$$\mu_{0:n} = g_{0:n} \cdot \eta_{0:n} = \frac{g_{0:n} \, \eta_{0:n}}{\langle \eta_{0:n}, g_{0:n} \rangle} \qquad\text{i.e.}\qquad \langle \mu_{0:n}, f \rangle = \frac{\langle \eta_{0:n}, g_{0:n} \, f \rangle}{\langle \eta_{0:n}, g_{0:n} \rangle}$$

with

$$g_{0:n}(x_{0:n}) = \prod_{k=0}^n g_k(x_{k-1}, x_k)$$

(with the convention $g_0(x_{-1}, x_0) = g_0(x_0)$), and with the joint probability distribution of the hidden states $X_{0:n}$

$$\eta_{0:n}(dx_{0:n}) = P[X_{0:n} \in dx_{0:n}] = \eta_0(dx_0) \prod_{k=1}^n Q_k(x_{k-1}, dx_k)$$

the unnormalized version is defined as

$$\langle \gamma_{0:n}, f \rangle = \langle \eta_{0:n}, g_{0:n} \, f \rangle = E[g_{0:n}(X_{0:n}) \, f(X_{0:n})] \qquad\text{and}\qquad \langle \mu_{0:n}, f \rangle = \frac{\langle \gamma_{0:n}, f \rangle}{\langle \gamma_{0:n}, 1 \rangle}$$

and if $f = \varphi \circ \pi$ depends only upon the last state and not on the whole trajectory, then

$$\langle \gamma_{0:n}, \varphi \circ \pi \rangle = E[g_{0:n}(X_{0:n}) \, \varphi \circ \pi(X_{0:n})] = E[\varphi(X_n) \prod_{k=0}^n g_k(X_{k-1}, X_k)] = \langle \gamma_n, \varphi \rangle$$

(importance sampling SIS algorithm : 2)

importance sampling : approximation

$$\langle \gamma_{0:n}, f \rangle = \langle \eta_{0:n}, g_{0:n} f \rangle \approx \langle S^N(\eta_{0:n}), g_{0:n} f \rangle = \frac{1}{N} \sum_{i=1}^N g_{0:n}(\xi_{0:n}^i) \, f(\xi_{0:n}^i)$$

and

$$\langle \mu_{0:n}, f \rangle = \langle g_{0:n} \cdot \eta_{0:n}, f \rangle \approx \langle g_{0:n} \cdot S^N(\eta_{0:n}), f \rangle = \frac{\sum_{i=1}^N g_{0:n}(\xi_{0:n}^i) \, f(\xi_{0:n}^i)}{\sum_{i=1}^N g_{0:n}(\xi_{0:n}^i)}$$

for any function $f$ depending on the whole trajectory, where $(\xi_{0:n}^1,\dots,\xi_{0:n}^N)$ is an $N$-sample with common probability distribution $\eta_{0:n}$

(importance sampling SIS algorithm : 3)

in particular, if $f = \varphi \circ \pi$ depends only upon the last state and not on the whole sequence, then

$$\langle \gamma_n, \varphi \rangle = \langle \gamma_{0:n}, \varphi \circ \pi \rangle \approx \frac{1}{N} \sum_{i=1}^N g_{0:n}(\xi_{0:n}^i) \, \varphi(\xi_n^i)$$

and

$$\langle \mu_n, \varphi \rangle = \langle \mu_{0:n}, \varphi \circ \pi \rangle \approx \frac{\sum_{i=1}^N g_{0:n}(\xi_{0:n}^i) \, \varphi(\xi_n^i)}{\sum_{i=1}^N g_{0:n}(\xi_{0:n}^i)}$$

for any function $\varphi$, where $(\xi_{0:n}^1,\dots,\xi_{0:n}^N)$ is an $N$-sample with common probability distribution $\eta_{0:n}$, and where, for $i = 1,\dots,N$, $\xi_n^i = \pi(\xi_{0:n}^i)$ denotes the last state of the sequence $\xi_{0:n}^i = (\xi_0^i,\dots,\xi_n^i)$

(importance sampling SIS algorithm : 4)

in other words

$$\gamma_n \approx \gamma_n^N = \frac{1}{N} \sum_{i=1}^N g_{0:n}(\xi_{0:n}^i) \, \delta_{\xi_n^i}$$

and

$$\mu_n \approx \mu_n^N = \sum_{i=1}^N \frac{g_{0:n}(\xi_{0:n}^i)}{\sum_{j=1}^N g_{0:n}(\xi_{0:n}^j)} \, \delta_{\xi_n^i} = \sum_{i=1}^N w_n^i \, \delta_{\xi_n^i}$$

where the nonnegative normalized weights $(w_n^1,\dots,w_n^N)$ are defined, for any $i = 1,\dots,N$, by

$$w_n^i = \frac{g_{0:n}(\xi_{0:n}^i)}{\sum_{j=1}^N g_{0:n}(\xi_{0:n}^j)}$$

(importance sampling SIS algorithm : 5)

SIS algorithm

importance sampling approximation : non recursive, depth first implementation

simulate an $N$-sample of hidden state sequences $(\xi_{0:n}^1,\dots,\xi_{0:n}^N)$ : independently for any $i = 1,\dots,N$, simulate a sequence $\xi_{0:n}^i = (\xi_0^i,\dots,\xi_n^i)$, i.e.
- simulate a r.v. $\xi_0^i$ according to $\eta_0(dx)$
- for any $k = 1,\dots,n$, simulate a r.v. $\xi_k^i$ according to $Q_k(\xi_{k-1}^i, dx')$

and define, for any $i = 1,\dots,N$

$$g_{0:n}(\xi_{0:n}^i) = \prod_{k=0}^n g_k(\xi_{k-1}^i, \xi_k^i) \qquad\text{and}\qquad w_n^i = \frac{g_{0:n}(\xi_{0:n}^i)}{\sum_{j=1}^N g_{0:n}(\xi_{0:n}^j)}$$

(importance sampling SIS algorithm : 6)

importance sampling approximation : non recursive implementation for non linear and non Gaussian systems

simulate an $N$-sample of hidden state sequences $(\xi_{0:n}^1,\dots,\xi_{0:n}^N)$ : independently for any $i = 1,\dots,N$, simulate a sequence $\xi_{0:n}^i = (\xi_0^i,\dots,\xi_n^i)$, i.e.
- simulate a r.v. $\xi_0^i$ according to $\eta_0(dx)$
- for any $k = 1,\dots,n$, simulate a r.v. $W_k^i$ according to $p_k^W(dw)$ and set $\xi_k^i = f_k(\xi_{k-1}^i, W_k^i)$

and define, for any $i = 1,\dots,N$

$$g_{0:n}(\xi_{0:n}^i) = \prod_{k=0}^n q_k^V(Y_k - h_k(\xi_k^i)) \qquad\text{and}\qquad w_n^i = \frac{g_{0:n}(\xi_{0:n}^i)}{\sum_{j=1}^N g_{0:n}(\xi_{0:n}^j)}$$

(importance sampling SIS algorithm : 7)

recursive formulation of the weight update : for any $k = 1,\dots,n$ and any $i = 1,\dots,N$

$$w_k^i = \frac{g_{0:k}(\xi_{0:k}^i)}{\sum_{j=1}^N g_{0:k}(\xi_{0:k}^j)} = \frac{g_{0:k-1}(\xi_{0:k-1}^i) \, g_k(\xi_{k-1}^i, \xi_k^i)}{\sum_{j=1}^N g_{0:k-1}(\xi_{0:k-1}^j) \, g_k(\xi_{k-1}^j, \xi_k^j)} = \frac{w_{k-1}^i \, g_k(\xi_{k-1}^i, \xi_k^i)}{\sum_{j=1}^N w_{k-1}^j \, g_k(\xi_{k-1}^j, \xi_k^j)}$$

benefit : allows a breadth first implementation

(importance sampling SIS algorithm : 8)

SIS algorithm (sequential importance sampling) : recursive implementation

for $k = 0$, independently for any $i = 1,\dots,N$ : simulate a r.v. $\xi_0^i$ according to $\eta_0(dx)$, and define

$$w_0^i = \frac{g_0(\xi_0^i)}{\sum_{j=1}^N g_0(\xi_0^j)}$$

for any $k = 1,\dots,n$, independently for any $i = 1,\dots,N$ : simulate a r.v. $\xi_k^i$ according to $Q_k(\xi_{k-1}^i, dx')$, and update the weight as

$$w_k^i = \frac{w_{k-1}^i \, g_k(\xi_{k-1}^i, \xi_k^i)}{\sum_{j=1}^N w_{k-1}^j \, g_k(\xi_{k-1}^j, \xi_k^j)}$$

(importance sampling SIS algorithm : 9)

SIS algorithm (sequential importance sampling) : recursive implementation for non linear and non Gaussian systems

for $k = 0$, independently for any $i = 1,\dots,N$ : simulate a r.v. $\xi_0^i$ according to $\eta_0(dx)$, and define

$$w_0^i = \frac{q_0^V(Y_0 - h_0(\xi_0^i))}{\sum_{j=1}^N q_0^V(Y_0 - h_0(\xi_0^j))}$$

for any $k = 1,\dots,n$, independently for any $i = 1,\dots,N$ : simulate a r.v. $W_k^i$ according to $p_k^W(dw)$, set $\xi_k^i = f_k(\xi_{k-1}^i, W_k^i)$, and update the weight as

$$w_k^i = \frac{w_{k-1}^i \, q_k^V(Y_k - h_k(\xi_k^i))}{\sum_{j=1}^N w_{k-1}^j \, q_k^V(Y_k - h_k(\xi_k^j))}$$
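A sketch of this recursive SIS algorithm in Python; the callables `sample_eta0`, `sample_Q` and `g` stand for the model-specific simulation and likelihood routines and are assumptions of this sketch:

```python
import numpy as np

def sis(ys, sample_eta0, sample_Q, g, N, rng):
    """Sequential importance sampling: particles move blindly under the prior
    kernel Q_k, weights are updated multiplicatively by the likelihood g_k
    and renormalized at each step.  Returns a weighted sample targeting mu_n."""
    xi = sample_eta0(N, rng)                 # xi_0^i ~ eta_0
    w = g(0, xi, ys[0])
    w = w / w.sum()                          # w_0^i
    for k in range(1, len(ys)):
        xi = sample_Q(k, xi, rng)            # xi_k^i ~ Q_k(xi_{k-1}^i, dx')
        w = w * g(k, xi, ys[k])              # w_k^i propto w_{k-1}^i g_k(...)
        w = w / w.sum()
    return xi, w
```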

(importance sampling SIS algorithm : 10)

pros : higher weights are allocated to simulated sequences that are most often consistent with the observations

cons : weights are evaluated afterwards, and have no impact on how sequences are simulated (blind simulation strategy); moreover, along a given sequence, weights accumulate in a multiplicative way
- weight degeneracy : in practice, one single sequence receives a much larger weight than all the other sequences, whose contributions are therefore negligible
- memory effect : a sequence cannot be consistent with all the observations; a sequence that is consistent (resp. inconsistent) with the current observation, but inconsistent (resp. consistent) with earlier observations, will receive a small (resp. a large) weight

proposed solutions :
- use the observations to guide how sequences are simulated
- from time to time, replicate / terminate sequences according to their respective weights

(SIR algorithm : 1)

SIR algorithm

approximate the Bayesian filter $\mu_n(dx) = P[X_n \in dx \mid Y_{0:n}]$ using the recursive formulation

$$\mu_{k-1} \xrightarrow{\text{prediction}} \eta_k = \mu_{k-1} Q_k \xrightarrow{\text{correction}} \mu_k = g_k \cdot \eta_k$$

with initial condition $\mu_0 = g_0 \cdot \eta_0$

idea : look for approximations in the form of (possibly weighted) empirical probability distributions

$$\eta_k \approx \eta_k^N = \sum_{i=1}^N v_k^i \, \delta_{\xi_k^i} \qquad\text{and}\qquad \mu_k \approx \mu_k^N = \sum_{i=1}^N w_k^i \, \delta_{\xi_k^i}$$

associated with a population of $N$ particles, characterized by
- positions $(\xi_k^1,\dots,\xi_k^N)$ in $E$
- nonnegative normalized weights $(v_k^1,\dots,v_k^N)$ and $(w_k^1,\dots,w_k^N)$

(SIR algorithm : 2)

initial approximation : using importance sampling

$$\mu_0 = g_0 \cdot \eta_0 \approx g_0 \cdot S^N(\eta_0) = \sum_{i=1}^N \frac{g_0(\xi_0^i)}{\sum_{j=1}^N g_0(\xi_0^j)} \, \delta_{\xi_0^i} = \sum_{i=1}^N w_0^i \, \delta_{\xi_0^i}$$

where the variables $(\xi_0^1,\dots,\xi_0^N)$ are i.i.d. with common probability distribution $\eta_0$

correction step : clearly, from the definition

$$\mu_k^N = g_k \cdot \eta_k^N = \sum_{i=1}^N \frac{v_k^i \, g_k(\xi_k^i)}{\sum_{j=1}^N v_k^j \, g_k(\xi_k^j)} \, \delta_{\xi_k^i} = \sum_{i=1}^N w_k^i \, \delta_{\xi_k^i}$$

which automatically has the desired form

(SIR algorithm : 3)

prediction step : from the definition

$$\langle \mu_{k-1}^N Q_k, \varphi \rangle = \int_E \mu_{k-1}^N(dx) \int_E Q_k(x,dx') \, \varphi(x') = \sum_{i=1}^N w_{k-1}^i \int_E Q_k(\xi_{k-1}^i, dx') \, \varphi(x')$$

for any function $\varphi$, hence

$$\mu_{k-1}^N Q_k = \sum_{i=1}^N w_{k-1}^i \, m_k^i \qquad\text{with}\qquad m_k^i(dx') = Q_k(\xi_{k-1}^i, dx') \quad\text{for any } i = 1,\dots,N$$

in the form of a finite mixture : this requires a further approximation (several sampling schemes are available)

(SIR algorithm : 4)

multinomial resampling : simulate an $N$-sample $(\xi_k^1,\dots,\xi_k^N)$ according to $\mu_{k-1}^N Q_k$, and set

$$\mu_{k-1}^N Q_k \approx \eta_k^N = S^N(\mu_{k-1}^N Q_k) = \frac{1}{N} \sum_{i=1}^N \delta_{\xi_k^i} = \sum_{i=1}^N v_k^i \, \delta_{\xi_k^i} \qquad\text{with } v_k^i = 1/N \text{ for any } i = 1,\dots,N$$

the weights are used to select (with replacement) the mixture components, with the expected consequence that components with higher weights are selected several times; conversely, components with lower weights are possibly discarded, and will no longer contribute to the approximation

if $R^i$ denotes how many times the $i$-th mixture component has been selected, or equivalently how many samples in the new approximation originate from the $i$-th mixture component, for any $i = 1,\dots,N$, then the r.v. $(R^1,\dots,R^N)$ has a multinomial distribution
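A sketch of multinomial resampling with NumPy; the counts of the selected indices are exactly the multinomial r.v. $(R^1,\dots,R^N)$ of the slide:

```python
import numpy as np

def multinomial_resample(xi, w, rng):
    """Draw N indices i.i.d. from the weights (with replacement); the counts
    (R^1,...,R^N) follow a Multinomial(N, w) law.  Returns the selected
    particle positions together with the new equal weights 1/N."""
    N = len(w)
    idx = rng.choice(N, size=N, p=w)     # selection according to the weights
    return xi[idx], np.full(N, 1.0 / N)
```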

(SIR algorithm : 5)

intuitively, if all the mixture weights are equal (or close) to $1/N$, i.e. if the distribution of the mixture weights is close to equidistribution, then selecting mixture components could be counter-productive

weight preservation : simulate exactly one individual from each mixture component and preserve its weight, i.e. independently for any $i = 1,\dots,N$, simulate $\xi_k^i$ according to $m_k^i(dx') = Q_k(\xi_{k-1}^i, dx')$ and set

$$\mu_{k-1}^N Q_k \approx \eta_k^N = \sum_{i=1}^N w_{k-1}^i \, \delta_{\xi_k^i} = \sum_{i=1}^N v_k^i \, \delta_{\xi_k^i} \qquad\text{with } v_k^i = w_{k-1}^i \text{ for any } i = 1,\dots,N$$

intuitively, this approach is appropriate if the distribution of the mixture weights is close to equidistribution, and less appropriate in the extreme case where most weights are zero, except for a few components with positive weights

(SIR algorithm : 6)

SIR algorithm (sampling with importance resampling) : recursive implementation

for $k = 0$, independently for any $i = 1,\dots,N$ : simulate a r.v. $\xi_0^i$ according to $\eta_0(dx)$, and define

$$w_0^i = \frac{g_0(\xi_0^i)}{\sum_{j=1}^N g_0(\xi_0^j)}$$

for any $k = 1,\dots,n$, independently for any $i = 1,\dots,N$ :
- select an individual $\widehat{\xi}_{k-1}^i$ among the population $(\xi_{k-1}^1,\dots,\xi_{k-1}^N)$, according to the weights $(w_{k-1}^1,\dots,w_{k-1}^N)$
- simulate a r.v. $\xi_k^i$ according to $Q_k(\widehat{\xi}_{k-1}^i, dx')$
- and define

$$w_k^i = \frac{g_k(\widehat{\xi}_{k-1}^i, \xi_k^i)}{\sum_{j=1}^N g_k(\widehat{\xi}_{k-1}^j, \xi_k^j)}$$

(SIR algorithm : 7)

SIR algorithm (sampling with importance resampling) : recursive implementation for non linear and non Gaussian systems

for $k = 0$, independently for any $i = 1,\dots,N$ : simulate a r.v. $\xi_0^i$ according to $\eta_0(dx)$, and define

$$w_0^i = \frac{q_0^V(Y_0 - h_0(\xi_0^i))}{\sum_{j=1}^N q_0^V(Y_0 - h_0(\xi_0^j))}$$

for any $k = 1,\dots,n$, independently for any $i = 1,\dots,N$ :
- select an individual $\widehat{\xi}_{k-1}^i$ among the population $(\xi_{k-1}^1,\dots,\xi_{k-1}^N)$, according to the weights $(w_{k-1}^1,\dots,w_{k-1}^N)$
- simulate a r.v. $W_k^i$ according to $p_k^W(dw)$ and set $\xi_k^i = f_k(\widehat{\xi}_{k-1}^i, W_k^i)$
- and define

$$w_k^i = \frac{q_k^V(Y_k - h_k(\xi_k^i))}{\sum_{j=1}^N q_k^V(Y_k - h_k(\xi_k^j))}$$
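Putting the three steps together, a sketch of the bootstrap SIR filter for scalar states; as before, `sample_eta0`, `sample_Q` and `g` are user-supplied assumptions, and alternative or adaptive resampling schemes (next slides) would plug in at the selection step:

```python
import numpy as np

def sir(ys, sample_eta0, sample_Q, g, N, rng):
    """Bootstrap SIR particle filter, following the slides' three steps:
    selection (multinomial resampling), mutation under the prior kernel Q_k,
    and weighting by the likelihood g_k.  Returns the filter means."""
    xi = sample_eta0(N, rng)                     # xi_0^i ~ eta_0
    w = g(0, xi, ys[0]); w /= w.sum()
    means = [np.sum(w * xi)]                     # <mu_0^N, x>
    for k in range(1, len(ys)):
        idx = rng.choice(N, size=N, p=w)         # selection step
        xi = sample_Q(k, xi[idx], rng)           # mutation step
        w = g(k, xi, ys[k]); w /= w.sum()        # weighting step
        means.append(np.sum(w * xi))
    return np.array(means)
```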

(SIR algorithm : 8)

to summarize, particles
- are selected according to their respective weights $(w_{k-1}^1,\dots,w_{k-1}^N)$ [selection step]
- evolve according to the transition probabilities $Q_k(x,dx')$ [mutation step]
- and are weighted by evaluating the likelihood function $g_k$ [weighting step]

pros : weights do not accumulate along each sequence, but are used to select (or resample) particles : particles with larger (resp. smaller) weights are replicated (resp. terminated); by keeping only the most probable particles at each time instant, the expected benefit is to concentrate the available computing power within the regions of interest

(SIR algorithm : 9)

cons : introduces additional randomness, in the resampling (selection) step

proposed solutions :
- alternative resampling strategies, which allocate an (almost) deterministic number of offspring to each selected particle
- adaptive resampling, performed only when the weights $(w_k^1,\dots,w_k^N)$ are too unbalanced (far from equidistribution)

cons : because of replication, fewer truly distinct positions are available (sample impoverishment); position degeneracy : in practice, one implicitly relies on the mutation step to bring diversity back

proposed solution : after the resampling (selection) step, add some random move to each selected particle, or apply some artificial Markovian dynamics (Metropolis-Hastings, Gibbs sampling, etc.)

(particle filtering : adaptive sampling / resampling : 1)

adaptive SIR algorithm

given a finite mixture

$$m = \sum_{i=1}^N w^i \, m^i$$

selecting mixture components is interesting only if the weights $(w^1,\dots,w^N)$ are far from equidistribution

several heuristic criteria have been proposed to quantify the departure from equidistribution, and to decide whether particles should be resampled or not, e.g.
- effective sample size
- entropy

(particle filtering : adaptive sampling / resampling : 2)

$\chi^2$ distance and effective sample size

the $\chi^2$ distance between two probability vectors $p = (p^1,\dots,p^N)$ and $q = (q^1,\dots,q^N)$ is defined as

$$\chi^2(p,q) = \sum_{i=1}^N q^i \left( \frac{p^i}{q^i} - 1 \right)^2$$

in particular, for $p = w = (w^1,\dots,w^N)$ and $q = (1/N,\dots,1/N)$, it holds

$$\chi^2(w,q) = \frac{1}{N} \sum_{i=1}^N (N w^i - 1)^2 = \frac{1}{N} \sum_{i=1}^N (N w^i)^2 - 1 = N \sum_{i=1}^N (w^i)^2 - 1 \ge 0$$

hence

$$1 \le N_{\mathrm{eff}} = 1 \Big/ \sum_{i=1}^N (w^i)^2 \le N$$

where the upper bound is attained at equidistribution, which suggests to resample if

$$H(w^1,\dots,w^N) = N \sum_{i=1}^N (w^i)^2 - 1 = \frac{N}{N_{\mathrm{eff}}} - 1 \ge H_{\mathrm{red}}$$

for some threshold $H_{\mathrm{red}} > 0$, still to be fixed
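A sketch of the corresponding adaptive test; the default threshold value is a common heuristic choice, not fixed by the slides:

```python
import numpy as np

def effective_sample_size(w):
    """N_eff = 1 / sum_i (w^i)^2, between 1 and N; equals N at equidistribution."""
    return 1.0 / np.sum(w**2)

def should_resample(w, frac=0.5):
    """Resample when N_eff falls below a fraction of N, i.e. when the chi^2
    divergence from equidistribution H = N / N_eff - 1 is too large.
    The default frac = 0.5 is a common heuristic, an assumption of this sketch."""
    return effective_sample_size(w) < frac * len(w)
```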

(estimation error, CLT : 1)

on the way to asymptotic results (in 3 slides)

recall the linear evolution of the unnormalized version of the Bayesian filter

$$\gamma_k = \gamma_{k-1} R_k = g_k \, (\gamma_{k-1} Q_k) = g_k \, (\mu_{k-1} Q_k) \, \langle \gamma_{k-1}, 1 \rangle = g_k \, \eta_k \, \langle \gamma_{k-1}, 1 \rangle$$

with initial condition $\gamma_0 = g_0 \, \eta_0$

proposed particle approximation for the unnormalized distribution

$$\gamma_k^N = g_k \, \eta_k^N \, \langle \gamma_{k-1}^N, 1 \rangle \qquad\text{with initial condition}\qquad \gamma_0^N = g_0 \, \eta_0^N \quad\text{and}\quad \eta_0^N = S^N(\eta_0)$$

clearly

$$\langle \gamma_k^N, 1 \rangle = \langle \eta_k^N, g_k \rangle \, \langle \gamma_{k-1}^N, 1 \rangle \qquad\text{and}\qquad \langle \gamma_0^N, 1 \rangle = \langle \eta_0^N, g_0 \rangle$$

and it follows that

$$\frac{\gamma_k^N}{\langle \gamma_k^N, 1 \rangle} = g_k \cdot \eta_k^N = \mu_k^N \qquad\text{and}\qquad \frac{\gamma_0^N}{\langle \gamma_0^N, 1 \rangle} = g_0 \cdot \eta_0^N = \mu_0^N$$

i.e. the normalized version of the proposed particle approximation $\gamma_k^N$ coincides with the SIR bootstrap approximation $\mu_k^N$ of the Bayesian filter

(estimation error, CLT : 2)

Remark : key for the induction : for any $k = 1,\dots,n$, by difference,

$$\gamma_k^N - \gamma_k = g_k \, (\gamma_{k-1}^N Q_k - \gamma_{k-1} Q_k) + g_k \, (\eta_k^N - \mu_{k-1}^N Q_k) \, \langle \gamma_{k-1}^N, 1 \rangle$$

hence

$$\langle \gamma_k^N - \gamma_k, \varphi \rangle = \langle \gamma_{k-1}^N - \gamma_{k-1}, Q_k(g_k \, \varphi) \rangle + \langle \eta_k^N - \mu_{k-1}^N Q_k, g_k \, \varphi \rangle \, \langle \gamma_{k-1}^N, 1 \rangle$$

the error at the current generation, evaluated on a function $\varphi$, is decomposed into
- the error at the previous generation, evaluated on the function $R_k \varphi = Q_k(g_k \varphi)$
- a local error resulting from the Monte Carlo approximation

even though the samples are actually dependent, because of the resampling at each generation, conditionally on the previous generations the new samples are generated independently

(estimation error, CLT : 3)

with this conditioning argument, the error estimates

$$\sup_{\varphi : \|\varphi\| = 1} E \left| \frac{\langle \gamma_k^N - \gamma_k, \varphi \rangle}{\langle \gamma_k, 1 \rangle} \right| \le \frac{c_k}{\sqrt{N}} \qquad\text{and}\qquad \sup_{\varphi : \|\varphi\| = 1} E |\langle \mu_k^N - \mu_k, \varphi \rangle|^2 \le \frac{c_k}{N}$$

of order $1/\sqrt{N}$, and the CLT

$$\sqrt{N} \, \frac{\langle \gamma_k^N - \gamma_k, \varphi \rangle}{\langle \gamma_k, 1 \rangle} \Longrightarrow N(0, V_k(\varphi)) \qquad\text{and}\qquad \sqrt{N} \, \langle \mu_k^N - \mu_k, \varphi \rangle \Longrightarrow N(0, v_k(\varphi))$$

with $v_k(\varphi) = V_k(\varphi - \langle \mu_k, \varphi \rangle)$, can be obtained by induction

Some algorithmic variants

- regularization
- progressive weighting, MCMC iterations
- sample size adaptation
- marginalization aka Rao-Blackwellization
  - interacting Kalman filters
  - interacting finite state (Baum) filters

(marginalization aka Rao-Blackwellization : 1)

conditioning as a variance reduction technique : if

$$E[f(X^1, X^2)] = E[\,E[f(X^1, X^2) \mid X^2]\,] = E[F(X^2)] \qquad\text{where}\qquad F(x^2) = E[f(X^1, X^2) \mid X^2 = x^2] = \int_E f(x^1, x^2) \, P[X^1 \in dx^1 \mid X^2 = x^2]$$

has an explicit expression, then the Monte Carlo estimator

$$\frac{1}{N} \sum_{i=1}^N F(X_i^2) \approx E[F(X^2)] = E[f(X^1, X^2)]$$

where $(X_1^2,\dots,X_N^2)$ is an $N$-sample with the same common distribution as $X^2$, has a smaller variance than the Monte Carlo estimator

$$\frac{1}{N} \sum_{i=1}^N f(X_i^1, X_i^2) \approx E[f(X^1, X^2)]$$

where $((X_1^1, X_1^2),\dots,(X_N^1, X_N^2))$ is an $N$-sample with the same common distribution as $(X^1, X^2)$
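A toy numerical illustration of this variance reduction; the choice of $f$ and of the laws of $X^1$, $X^2$ is an assumption made so that $F$ is explicit:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy example: X1, X2 ~ N(0,1) independent and f(x1, x2) = x1^2 + x2, so that
# F(x2) = E[f(X1, X2) | X2 = x2] = 1 + x2 is explicit (not the slides' model).
N = 10_000
x1, x2 = rng.normal(size=N), rng.normal(size=N)

crude = x1**2 + x2          # f(X1_i, X2_i), crude Monte Carlo
rao_b = 1.0 + x2            # F(X2_i), the X1-average done in closed form

print(crude.mean(), crude.var() / N)   # same target E[f] = 1 ...
print(rao_b.mean(), rao_b.var() / N)   # ... but smaller estimator variance
```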

(marginalization aka Rao-Blackwellization : 2)

1st example : conditionally linear Gaussian systems

$$X_k^L = F_k^L(X_{k-1}^{NL}) \, X_{k-1}^L + f_k^L(X_{k-1}^{NL}) + W_k^L$$
$$X_k^{NL} = F_k^{NL}(X_{k-1}^{NL}) \, X_{k-1}^L + f_k^{NL}(X_{k-1}^{NL}) + W_k^{NL}$$
$$Y_k = h_k(X_k^{NL}) + V_k$$

clearly

$$E[\varphi(X_n^L, X_n^{NL}) \prod_{k=0}^n g_k(X_k^{NL})] = E[\,E[\varphi(X_n^L, X_n^{NL}) \mid X_{0:n}^{NL}] \prod_{k=0}^n g_k(X_k^{NL})\,]$$

and the conditional distribution of the linear component $X_n^L$, given the nonlinear component sequence $X_{0:n}^{NL}$, is Gaussian, with mean $\bar{X}_k^{L \mid NL}$ and covariance matrix $P_k^{L \mid NL}$ given explicitly, in recursive form, by the Kalman filter equations

introduce the new hidden state $\{(X_k^{NL}, \bar{X}_k^{L \mid NL}, P_k^{L \mid NL})\}$ instead of $\{(X_k^L, X_k^{NL})\}$

benefit : explore with particles the subspace associated with the nonlinear components; associated with each particle, a Kalman filter estimates the linear components

(marginalization aka Rao-Blackwellization : 3)

2nd example : non linear systems with Markovian switching regimes / modes

$$X_k = f_k(s_{k-1}, X_{k-1}, W_k) \qquad Y_k = h_k(X_k) + V_k$$

where the regime / mode sequence $\{s_k\}$ forms a Markov chain with finite state space

clearly

$$E[\varphi(s_n, X_n) \prod_{k=0}^n g_k(X_k)] = E[\,E[\varphi(s_n, X_n) \mid X_{0:n}] \prod_{k=0}^n g_k(X_k)\,]$$

and the conditional distribution of the regime / mode $s_n$, given the continuous component sequence $X_{0:n}$, is a finite dimensional probability vector, defined by

$$p_n^i = P[s_n = i \mid X_{0:n}] \qquad\text{for any } i \in I$$

and given explicitly, in recursive form, by solving the Baum forward equation

introduce the new hidden state $\{(X_k, p_k)\}$ instead of $\{(s_k, X_k)\}$

benefit : avoid sampling in the finite state space

Conclusion

particle filtering provides an implementation of the Bayesian approach that is
- intuitive, easy to understand and to implement
- flexible : it adapts to many models, and many algorithmic variants are available
- numerically efficient, through some selection mechanism
- amenable to mathematical analysis


More information

Hidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing

Hidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing Hidden Markov Models By Parisa Abedi Slides courtesy: Eric Xing i.i.d to sequential data So far we assumed independent, identically distributed data Sequential (non i.i.d.) data Time-series data E.g. Speech

More information

16 : Markov Chain Monte Carlo (MCMC)

16 : Markov Chain Monte Carlo (MCMC) 10-708: Probabilistic Graphical Models 10-708, Spring 2014 16 : Markov Chain Monte Carlo MCMC Lecturer: Matthew Gormley Scribes: Yining Wang, Renato Negrinho 1 Sampling from low-dimensional distributions

More information

Lecture 8: Bayesian Estimation of Parameters in State Space Models

Lecture 8: Bayesian Estimation of Parameters in State Space Models in State Space Models March 30, 2016 Contents 1 Bayesian estimation of parameters in state space models 2 Computational methods for parameter estimation 3 Practical parameter estimation in state space

More information

Chris Bishop s PRML Ch. 8: Graphical Models

Chris Bishop s PRML Ch. 8: Graphical Models Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular

More information

Lecture 7 and 8: Markov Chain Monte Carlo

Lecture 7 and 8: Markov Chain Monte Carlo Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani

More information

Bayesian Inference and MCMC

Bayesian Inference and MCMC Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the

More information

Introduction to Probabilistic Graphical Models: Exercises

Introduction to Probabilistic Graphical Models: Exercises Introduction to Probabilistic Graphical Models: Exercises Cédric Archambeau Xerox Research Centre Europe cedric.archambeau@xrce.xerox.com Pascal Bootcamp Marseille, France, July 2010 Exercise 1: basics

More information

Lecture Particle Filters. Magnus Wiktorsson

Lecture Particle Filters. Magnus Wiktorsson Lecture Particle Filters Magnus Wiktorsson Monte Carlo filters The filter recursions could only be solved for HMMs and for linear, Gaussian models. Idea: Approximate any model with a HMM. Replace p(x)

More information

Introduction to Bayesian methods in inverse problems

Introduction to Bayesian methods in inverse problems Introduction to Bayesian methods in inverse problems Ville Kolehmainen 1 1 Department of Applied Physics, University of Eastern Finland, Kuopio, Finland March 4 2013 Manchester, UK. Contents Introduction

More information

4 Derivations of the Discrete-Time Kalman Filter

4 Derivations of the Discrete-Time Kalman Filter Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof N Shimkin 4 Derivations of the Discrete-Time

More information

Computer Vision Group Prof. Daniel Cremers. 14. Sampling Methods

Computer Vision Group Prof. Daniel Cremers. 14. Sampling Methods Prof. Daniel Cremers 14. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric

More information

Mean field simulation for Monte Carlo integration. Part II : Feynman-Kac models. P. Del Moral

Mean field simulation for Monte Carlo integration. Part II : Feynman-Kac models. P. Del Moral Mean field simulation for Monte Carlo integration Part II : Feynman-Kac models P. Del Moral INRIA Bordeaux & Inst. Maths. Bordeaux & CMAP Polytechnique Lectures, INLN CNRS & Nice Sophia Antipolis Univ.

More information

MCMC and Gibbs Sampling. Kayhan Batmanghelich

MCMC and Gibbs Sampling. Kayhan Batmanghelich MCMC and Gibbs Sampling Kayhan Batmanghelich 1 Approaches to inference l Exact inference algorithms l l l The elimination algorithm Message-passing algorithm (sum-product, belief propagation) The junction

More information

Graphical Models and Kernel Methods

Graphical Models and Kernel Methods Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.

More information

Kalman filtering and friends: Inference in time series models. Herke van Hoof slides mostly by Michael Rubinstein

Kalman filtering and friends: Inference in time series models. Herke van Hoof slides mostly by Michael Rubinstein Kalman filtering and friends: Inference in time series models Herke van Hoof slides mostly by Michael Rubinstein Problem overview Goal Estimate most probable state at time k using measurement up to time

More information

Bagging During Markov Chain Monte Carlo for Smoother Predictions

Bagging During Markov Chain Monte Carlo for Smoother Predictions Bagging During Markov Chain Monte Carlo for Smoother Predictions Herbert K. H. Lee University of California, Santa Cruz Abstract: Making good predictions from noisy data is a challenging problem. Methods

More information

Blind Equalization via Particle Filtering

Blind Equalization via Particle Filtering Blind Equalization via Particle Filtering Yuki Yoshida, Kazunori Hayashi, Hideaki Sakai Department of System Science, Graduate School of Informatics, Kyoto University Historical Remarks A sequential Monte

More information

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

More information

Efficient Variational Inference in Large-Scale Bayesian Compressed Sensing

Efficient Variational Inference in Large-Scale Bayesian Compressed Sensing Efficient Variational Inference in Large-Scale Bayesian Compressed Sensing George Papandreou and Alan Yuille Department of Statistics University of California, Los Angeles ICCV Workshop on Information

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Hidden Markov Models Barnabás Póczos & Aarti Singh Slides courtesy: Eric Xing i.i.d to sequential data So far we assumed independent, identically distributed

More information

Lecture 4: State Estimation in Hidden Markov Models (cont.)

Lecture 4: State Estimation in Hidden Markov Models (cont.) EE378A Statistical Signal Processing Lecture 4-04/13/2017 Lecture 4: State Estimation in Hidden Markov Models (cont.) Lecturer: Tsachy Weissman Scribe: David Wugofski In this lecture we build on previous

More information

Basic math for biology

Basic math for biology Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood

More information

Computer Intensive Methods in Mathematical Statistics

Computer Intensive Methods in Mathematical Statistics Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 5 Sequential Monte Carlo methods I 31 March 2017 Computer Intensive Methods (1) Plan of today s lecture

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin

More information

A Note on Auxiliary Particle Filters

A Note on Auxiliary Particle Filters A Note on Auxiliary Particle Filters Adam M. Johansen a,, Arnaud Doucet b a Department of Mathematics, University of Bristol, UK b Departments of Statistics & Computer Science, University of British Columbia,

More information

An introduction to Sequential Monte Carlo

An introduction to Sequential Monte Carlo An introduction to Sequential Monte Carlo Thang Bui Jes Frellsen Department of Engineering University of Cambridge Research and Communication Club 6 February 2014 1 Sequential Monte Carlo (SMC) methods

More information

COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017

COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University FEATURE EXPANSIONS FEATURE EXPANSIONS

More information

Undirected Graphical Models

Undirected Graphical Models Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional

More information

State-Space Methods for Inferring Spike Trains from Calcium Imaging

State-Space Methods for Inferring Spike Trains from Calcium Imaging State-Space Methods for Inferring Spike Trains from Calcium Imaging Joshua Vogelstein Johns Hopkins April 23, 2009 Joshua Vogelstein (Johns Hopkins) State-Space Calcium Imaging April 23, 2009 1 / 78 Outline

More information

Answers and expectations

Answers and expectations Answers and expectations For a function f(x) and distribution P(x), the expectation of f with respect to P is The expectation is the average of f, when x is drawn from the probability distribution P E

More information

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling 10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel

More information

F denotes cumulative density. denotes probability density function; (.)

F denotes cumulative density. denotes probability density function; (.) BAYESIAN ANALYSIS: FOREWORDS Notation. System means the real thing and a model is an assumed mathematical form for the system.. he probability model class M contains the set of the all admissible models

More information

Strong Lens Modeling (II): Statistical Methods

Strong Lens Modeling (II): Statistical Methods Strong Lens Modeling (II): Statistical Methods Chuck Keeton Rutgers, the State University of New Jersey Probability theory multiple random variables, a and b joint distribution p(a, b) conditional distribution

More information

Lecture 6: Gaussian Channels. Copyright G. Caire (Sample Lectures) 157

Lecture 6: Gaussian Channels. Copyright G. Caire (Sample Lectures) 157 Lecture 6: Gaussian Channels Copyright G. Caire (Sample Lectures) 157 Differential entropy (1) Definition 18. The (joint) differential entropy of a continuous random vector X n p X n(x) over R is: Z h(x

More information

Particle Filters. Outline

Particle Filters. Outline Particle Filters M. Sami Fadali Professor of EE University of Nevada Outline Monte Carlo integration. Particle filter. Importance sampling. Degeneracy Resampling Example. 1 2 Monte Carlo Integration Numerical

More information

Markov Chain Monte Carlo Methods for Stochastic

Markov Chain Monte Carlo Methods for Stochastic Markov Chain Monte Carlo Methods for Stochastic Optimization i John R. Birge The University of Chicago Booth School of Business Joint work with Nicholas Polson, Chicago Booth. JRBirge U Florida, Nov 2013

More information

Why do we care? Measurements. Handling uncertainty over time: predicting, estimating, recognizing, learning. Dealing with time

Why do we care? Measurements. Handling uncertainty over time: predicting, estimating, recognizing, learning. Dealing with time Handling uncertainty over time: predicting, estimating, recognizing, learning Chris Atkeson 2004 Why do we care? Speech recognition makes use of dependence of words and phonemes across time. Knowing where

More information

Auxiliary Particle Methods

Auxiliary Particle Methods Auxiliary Particle Methods Perspectives & Applications Adam M. Johansen 1 adam.johansen@bristol.ac.uk Oxford University Man Institute 29th May 2008 1 Collaborators include: Arnaud Doucet, Nick Whiteley

More information

Data assimilation in high dimensions

Data assimilation in high dimensions Data assimilation in high dimensions David Kelly Courant Institute New York University New York NY www.dtbkelly.com February 12, 2015 Graduate seminar, CIMS David Kelly (CIMS) Data assimilation February

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project

More information

Lecture 4 October 18th

Lecture 4 October 18th Directed and undirected graphical models Fall 2017 Lecture 4 October 18th Lecturer: Guillaume Obozinski Scribe: In this lecture, we will assume that all random variables are discrete, to keep notations

More information

INTRODUCTION TO PATTERN RECOGNITION

INTRODUCTION TO PATTERN RECOGNITION INTRODUCTION TO PATTERN RECOGNITION INSTRUCTOR: WEI DING 1 Pattern Recognition Automatic discovery of regularities in data through the use of computer algorithms With the use of these regularities to take

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector

More information