Algorithmic Variants and Theoretical Justifications (Variantes Algorithmiques et Justifications Théoriques)
1 Scientific complement course, École Doctorale MATISSE. IRISA and INRIA, Markov room, Thursday 26 January 2012. Algorithmic Variants and Theoretical Justifications (Variantes Algorithmiques et Justifications Théoriques). François Le Gland, INRIA Rennes and IRMAR
2 ...programme of the day
#1 more general models: from non linear and non Gaussian systems to hidden Markov models and partially observed Markov chains, so as to handle e.g. regime / mode switching, or correlation between state noise and observation noise
#2 for each of these models (or just for the most general model), representation of $P[X_{0:n} \in dx_{0:n} \mid Y_{0:n}]$ as a Gibbs Boltzmann distribution, with recursive formulation, and likewise for $P[X_n \in dx_n \mid Y_{0:n}]$
#3 particle approximation (SIS and SIR algorithms) from either representation
#4 asymptotic behaviour as sample size goes to infinity
#5 numerous algorithmic variants
3 1 (some notations : 1) some notations (continued)
if $X$ is a random variable taking values in $E$, then the mapping $\varphi \mapsto E[\varphi(X)]$, or equivalently $A \mapsto P[X \in A]$, defines a probability distribution $\mu$ on $E$, denoted as $\mu(dx) = P[X \in dx]$, and such that
$E[\varphi(X)] = \int_E \varphi(x)\,\mu(dx) = \langle \mu, \varphi \rangle$ or $P[X \in A] = \mu(A)$
this distribution characterizes uncertainty about $X$
4 2 (some notations : 2) transition probability kernel $M(x,dx')$ on $E$: collection of probability distributions on $E$ indexed by $x \in E$
acts on functions according to $M\varphi(x) = \int_E M(x,dx')\,\varphi(x')$
and acts on probability distributions according to $\mu M(dx') = \int_E \mu(dx)\,M(x,dx')$, seen as a mixture distribution
characterized by $\langle \mu M, \varphi \rangle = \int_E \big[\int_E \mu(dx)\,M(x,dx')\big]\,\varphi(x') = \int_E \mu(dx)\,\big[\int_E M(x,dx')\,\varphi(x')\big] = \langle \mu, M\varphi \rangle$
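The two actions of a kernel can be checked numerically on a finite state space, where a kernel is simply a row-stochastic matrix. The following is a minimal sketch under that assumption; the matrix `M`, the distribution `mu`, the function `phi` and all helper names are hypothetical illustrations, not taken from the slides.

```python
def apply_to_function(M, phi):
    """(M phi)(x) = sum over x' of M[x][x'] * phi[x']."""
    return [sum(m * p for m, p in zip(row, phi)) for row in M]


def apply_to_distribution(mu, M):
    """(mu M)(x') = sum over x of mu[x] * M[x][x'], a mixture of the rows of M."""
    n_cols = len(M[0])
    return [sum(mu[x] * M[x][j] for x in range(len(mu))) for j in range(n_cols)]


def pairing(mu, phi):
    """<mu, phi> = sum over x of mu[x] * phi[x]."""
    return sum(m * p for m, p in zip(mu, phi))


# Hypothetical 3-state example: each row of M is a probability distribution.
M = [[0.9, 0.1, 0.0],
     [0.2, 0.5, 0.3],
     [0.0, 0.4, 0.6]]
mu = [0.5, 0.3, 0.2]
phi = [1.0, -2.0, 4.0]

lhs = pairing(apply_to_distribution(mu, M), phi)   # <mu M, phi>
rhs = pairing(mu, apply_to_function(M, phi))       # <mu, M phi>
```

Both pairings agree up to rounding, which is the duality stated on the slide.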
5 Non linear and non Gaussian systems, and beyond
non linear and non Gaussian systems
hidden Markov models
partially observed Markov chains
likelihood free models
6 3 (non linear and non Gaussian systems : 1) non linear and non Gaussian systems
prior model for hidden state taking values in $E$:
$X_k = f_k(X_{k-1}, W_k)$ with $W_k \sim p_k^W(dw)$, and initial condition $X_0 \sim \eta_0(dx)$
observation taking values in $\mathbb{R}^d$, with additive noise admitting a density:
$Y_k = h_k(X_k) + V_k$ with $V_k \sim q_k^V(v)\,dv$
random variables $X_0$, $W_1, \dots, W_k, \dots$ and $V_0, V_1, \dots, V_k, \dots$ are mutually independent, but not necessarily Gaussian
only requirement (to be used later on): it is easy to
simulate a r.v. according to $\eta_0(dx)$ or to $p_k^W(dw)$
evaluate function $q_k^V(v)$ for any $v \in \mathbb{R}^d$
7 4 (non linear and non Gaussian systems : 2) Proposition: hidden states $\{X_k\}$ form a Markov chain taking values in $E$, i.e.
$P[X_k \in dx' \mid X_{0:k-1}] = P[X_k \in dx' \mid X_{k-1}]$
characterization in terms of transition kernel $P[X_k \in dx' \mid X_{k-1} = x] = Q_k(x,dx')$, defined (implicitly) by its action on functions
$Q_k\varphi(x) = E[\varphi(X_k) \mid X_{k-1} = x] = E[\varphi(f_k(X_{k-1},W_k)) \mid X_{k-1} = x] = \int \varphi(f_k(x,w))\,p_k^W(dw)$
where the integral runs over the values $w$ of the state noise
8 5 (non linear and non Gaussian systems : 3) Remark: it is easy to simulate next state $X_k$ given $X_{k-1} = x$, i.e. to simulate a r.v. according to $Q_k(x,dx')$ for a given $x \in E$
indeed, set $X_k = f_k(x, W_k)$ where $W_k$ is simulated according to $p_k^W(dw)$
Remark: in general, transition kernel $Q_k(x,dx')$ does not admit a density
indeed, conditionally on $X_{k-1} = x$, r.v. $X_k$ necessarily belongs to the subset
$M(x) = \{x' \in \mathbb{R}^m : \text{there exists } w \in \mathbb{R}^p \text{ such that } x' = f_k(x,w)\}$
if $p < m$ and under some mild regularity assumptions, this subset of $\mathbb{R}^m$ has zero Lebesgue measure
therefore, conditionally on $X_{k-1} = x$, probability distribution $Q_k(x,dx')$ of r.v. $X_k$ cannot have a density w.r.t. Lebesgue measure on $\mathbb{R}^m$
9 6 (non linear and non Gaussian systems : 4) Remark: if $f_k(x,w) = b_k(x) + w$ and if probability distribution $p_k^W(dw)$ of r.v. $W_k$ admits a density, still denoted $p_k^W(w)$, i.e. if
$X_k = b_k(X_{k-1}) + W_k$ with $W_k \sim p_k^W(w)\,dw$
then a more explicit expression is available
$Q_k(x,dx') = p_k^W(x' - b_k(x))\,dx'$
i.e. transition kernel $Q_k(x,dx')$ admits an (easy to evaluate) density
indeed, change of variable $x' = b_k(x) + w$ yields
$Q_k\varphi(x) = \int_{\mathbb{R}^m} \varphi(b_k(x) + w)\,p_k^W(w)\,dw = \int_{\mathbb{R}^m} \varphi(x')\,p_k^W(x' - b_k(x))\,dx'$
10 7 (non linear and non Gaussian systems : 5) Proposition: observations $\{Y_k\}$ satisfy memoryless channel assumption, i.e.
$P[Y_{0:n} \in dy_{0:n} \mid X_{0:n}] = \prod_{k=0}^n P[Y_k \in dy_k \mid X_k]$
characterization in terms of emission density
$P[Y_k \in dy \mid X_k = x] = q_k^V(y - h_k(x))\,dy$
define likelihood function
$g_k(x) = q_k^V(Y_k - h_k(x))$
a quantitative measure of consistency between possible hidden state $x \in E$ and actual observation $Y_k$
Remark: it is easy to evaluate $g_k(x)$ for any $x \in E$
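The two requirements on the model (simulate the next state, evaluate the likelihood) can be sketched as follows. This is only an illustration under stated assumptions: the scalar functions `f` and `h`, the Gaussian choice for both noises and all names are hypothetical, and the slides do not require Gaussian noise.

```python
import math
import random


def f(k, x, w):
    """Hypothetical state function f_k(x, w); any non linear map would do."""
    return 0.5 * x + 25.0 * x / (1.0 + x * x) + w


def h(k, x):
    """Hypothetical observation function h_k(x)."""
    return x * x / 20.0


def sample_w(rng):
    """Simulate W_k; assumed standard Gaussian here for illustration only."""
    return rng.gauss(0.0, 1.0)


def q_v(v, sigma_v=1.0):
    """Density q_k^V(v) of the observation noise; assumed Gaussian here."""
    return math.exp(-0.5 * (v / sigma_v) ** 2) / (sigma_v * math.sqrt(2.0 * math.pi))


def simulate_next_state(k, x, rng):
    """Simulate X_k according to Q_k(x, dx') by setting X_k = f_k(x, W_k)."""
    return f(k, x, sample_w(rng))


def likelihood(k, x, y):
    """g_k(x) = q_k^V(Y_k - h_k(x)), with the observation y plugged in."""
    return q_v(y - h(k, x))


rng = random.Random(0)
x_next = simulate_next_state(1, 0.1, rng)
g_value = likelihood(1, x_next, 0.3)
```

Note that simulating from $Q_k(x,dx')$ never requires a density for the kernel, only the ability to draw the noise.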
11 8 (hidden Markov models : 1) hidden Markov models
motivating example: hybrid continuous / discrete systems
$X_k = f_k(s_{k-1}, X_{k-1}, W_k)$
$Y_k = h_k(X_k) + V_k$
where regime / mode sequence $\{s_k\}$ forms a Markov chain with finite state space
this does not fit into non linear and non Gaussian systems, however
hidden states and modes $\{(X_k, s_k)\}$ jointly form a Markov chain
observations $\{Y_k\}$ satisfy memoryless channel assumption
i.e. it fits into hidden Markov models
Remark: it is easy to simulate next state $(X_k, s_k)$ given $(X_{k-1}, s_{k-1}) = (x, s)$
12 9 (hidden Markov models : 2) more generally, hidden states $\{X_k\}$ could form a Markov chain taking values in a quite general space $E$, e.g.
hybrid continuous / discrete
differentiable manifold
constrained
graphical (collection of connected edges)
characterization in terms of transition kernel and initial distribution
$P[X_k \in dx' \mid X_{k-1} = x] = Q_k(x,dx')$ and $P[X_0 \in dx] = \eta_0(dx)$
joint probability distribution of hidden states $X_{0:n}$ verifies
$P[X_{0:n} \in dx_{0:n}] = \eta_0(dx_0) \prod_{k=1}^n Q_k(x_{k-1}, dx_k)$
13 10 (hidden Markov models : 3) user should respect displacement constraints due to obstacles, as read on map
14 11 (hidden Markov models : 4) simplified model : user walks on a Voronoi graph, displacement constraints due to obstacles are taken automatically into account
15 12 (hidden Markov models : 5) observations $\{Y_k\}$ could verify memoryless channel assumption, i.e.
$P[Y_{0:n} \in dy_{0:n} \mid X_{0:n}] = \prod_{k=0}^n P[Y_k \in dy_k \mid X_k]$
characterization in terms of emission density
$P[Y_k \in dy \mid X_k = x] = g_k(x,y)\,\lambda_k^F(dy)$
where nonnegative measure $\lambda_k^F(dy)$ defined on $F$ does not depend on $x \in E$
define (abuse of notation) likelihood function as $g_k(x) = g_k(x, Y_k)$, a quantitative measure of consistency between $x \in E$ and observation $Y_k$
joint conditional distribution of observations $Y_{0:n}$ given hidden states $X_{0:n}$ verifies
$P[Y_{0:n} \in dy_{0:n} \mid X_{0:n} = x_{0:n}] = \prod_{k=0}^n g_k(x_k, y_k)\;\lambda_0^F(dy_0) \cdots \lambda_n^F(dy_n)$
16 13 (hidden Markov models : 6) representation as a dependency graph: $X_{k-1} \to X_k \to X_{k+1}$ along the hidden chain, with $X_{k-1} \to Y_{k-1}$, $X_k \to Y_k$ and $X_{k+1} \to Y_{k+1}$
arrows represent dependency between random variables
only requirement (to be used later on): it is easy to
simulate for any $x \in E$, a r.v. according to transition kernel $Q_k(x,dx')$
evaluate for any $x' \in E$, likelihood function $g_k(x')$
17 14 (hidden Markov models : 7) hidden Markov models : importance decomposition
motivation: from simulations seen last week, basic paradigm
particles move according to prior model, described by its transition kernel
new particles are weighted by evaluating likelihood function
hopefully, resulting weighted empirical distribution provides reasonable approximation to non tractable Bayesian filter
concerns / questions: is this safe? could more information be used in mutation step?
18 15 (hidden Markov models : 8) recall indoor navigation example: if user is detected by a beacon with known location $a$ and with finite range $R$, then necessarily user position is within detection disk centered at $a$ and with radius $R$
in other words, generating particles according to prior model alone could result in (a few, some, many, all) particles outside detection disk, i.e. useless, wasted particles
why not generate explicitly all new particles within disk, and accommodate for wrong model by changing weights?
more generally, why not (and how) use next observation to move particles?
ideal situation would be that particles move according to posterior model
warning
19 16 (hidden Markov models : 9) [Figure 1: prior density and generated sample (sample view)]
20 17 (hidden Markov models : 10) [Figure 2: prior density and histogram associated with generated sample (histogram view)]
21 18 (hidden Markov models : 11) [Figure 1, repeated: prior density and generated sample (sample view)]
22 19 (hidden Markov models : 12) [Figure 3a: prior density, likelihood function, posterior density and weighted sample (weighted sample view)]
23 20 (hidden Markov models : 13) [Figure 4a: prior density, likelihood function, posterior density and histogram associated with weighted sample (histogram view)]
24 21 (hidden Markov models : 14) [Figure 1, repeated: prior density and generated sample (sample view)]
25 22 (hidden Markov models : 15) [Figure 3b: prior density, likelihood function, posterior density and weighted sample (more difficult)]
26 23 (hidden Markov models : 16) [Figure 4b: prior density, likelihood function, posterior density and histogram associated with weighted sample (more difficult)]
27 24 (hidden Markov models : 17) [Figure 1, repeated: prior density and generated sample (sample view)]
28 25 (hidden Markov models : 18) [Figure 3c: prior density, likelihood function, posterior density and weighted sample (just impossible)]
29 26 (hidden Markov models : 19) possible (non unique) decomposition
$\gamma_0(dx) = g_0(x)\,\eta_0(dx) = g_0^{imp}(x)\,\eta_0^{imp}(dx)$
and
$R_k(x,dx') = Q_k(x,dx')\,g_k(x') = g_k^{imp}(x,x')\,Q_k^{imp}(x,dx')$
as product of
a nonnegative weight function $g_0^{imp}(x)$ or $g_k^{imp}(x,x')$
a probability distribution $\eta_0^{imp}(dx)$ or a transition kernel $Q_k^{imp}(x,dx')$, respectively
only requirement about proposed decomposition: it is easy to
simulate a r.v. according to $\eta_0^{imp}(dx)$
simulate for any $x \in E$, a r.v. according to $Q_k^{imp}(x,dx')$
evaluate for any $x, x' \in E$, weighting function $g_k^{imp}(x,x')$
attention: evaluating weighting function $g_k^{imp}(x,x')$ requires some knowledge about transition kernels $Q_k^{imp}(x,dx')$ and $Q_k(x,dx')$ (this was not required originally)
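30 27 (hidden Markov models : 20) popular (optimal) importance decomposition: blind vs. guided mutation
$P[X_k \in dx', Y_k \in dy' \mid X_{k-1} = x] = P[Y_k \in dy' \mid X_k = x', X_{k-1} = x]\;P[X_k \in dx' \mid X_{k-1} = x]$, where the first factor is $g_k(x',y')\,\lambda_k(dy')$ and the second factor is $Q_k(x,dx')$
and alternatively
$P[X_k \in dx', Y_k \in dy' \mid X_{k-1} = x] = P[X_k \in dx' \mid Y_k = y', X_{k-1} = x]\;P[Y_k \in dy' \mid X_{k-1} = x]$, where the first factor is $\widehat{Q}_k(x,y',dx')$ and the second factor is $\widehat{g}_k(x,y')\,\lambda_k(dy')$
i.e.
$R_k(x,dx') = g_k(x')\,Q_k(x,dx') = \widehat{g}_k(x)\,\widehat{Q}_k(x,dx')$
with (abuse of notation) $\widehat{g}_k(x) = \widehat{g}_k(x, Y_k)$ and $\widehat{Q}_k(x,dx') = \widehat{Q}_k(x, Y_k, dx')$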
31 28 (hidden Markov models : 21) remaining question: how easy is it to
simulate for any $x \in E$, a r.v. according to $\widehat{Q}_k(x,dx')$?
evaluate for any $x \in E$, weighting function $\widehat{g}_k(x)$?
positive answer in special case: linear observations and additive Gaussian noise
$X_k = f_k(X_{k-1}) + \sigma_k(X_{k-1})\,W_k$
$Y_k = H_k X_k + V_k$
indeed (for simplicity, assume $\sigma_k(x) = I$)
$Y_k = H_k\,[f_k(X_{k-1}) + W_k] + V_k = H_k f_k(X_{k-1}) + (H_k W_k + V_k)$
conditionally on $X_{k-1} = x$, r.v. $(X_k, Y_k)$ is jointly Gaussian, with mean
$\begin{pmatrix} f_k(x) \\ H_k f_k(x) \end{pmatrix}$
and covariance matrix
$\begin{pmatrix} Q_k^W & Q_k^W H_k^\top \\ H_k Q_k^W & H_k Q_k^W H_k^\top + Q_k^V \end{pmatrix}$
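In the scalar case, standard Gaussian conditioning applied to the joint mean and covariance above gives both factors of the optimal decomposition in closed form. The sketch below assumes scalar state and observation, $\sigma_k(x) = 1$, and hypothetical names (`fx` for $f_k(x)$, `qw` and `qv` for the noise variances); it is an illustration, not the author's implementation.

```python
import math


def gauss_pdf(z, mean, var):
    """Density of a scalar Gaussian N(mean, var) evaluated at z."""
    return math.exp(-0.5 * (z - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def optimal_decomposition(fx, H, qw, qv, y):
    """Return (mean, variance) of X_k given Y_k = y and X_{k-1} = x,
    together with the weight g_hat(x), i.e. the density of Y_k given X_{k-1} = x.

    fx : value f_k(x) of the state function at the previous state
    H  : scalar observation coefficient H_k
    qw : variance of the state noise W_k
    qv : variance of the observation noise V_k
    """
    s = H * qw * H + qv                 # variance of Y_k given X_{k-1} = x
    gain = qw * H / s                   # cross covariance divided by s
    post_mean = fx + gain * (y - H * fx)
    post_var = qw - gain * H * qw
    g_hat = gauss_pdf(y, H * fx, s)
    return post_mean, post_var, g_hat


post_mean, post_var, g_hat = optimal_decomposition(fx=1.0, H=2.0, qw=0.5, qv=0.3, y=2.5)
```

For any candidate next state `xp`, the product `gauss_pdf(y, H * xp, qv) * gauss_pdf(xp, fx, qw)` (blind decomposition) equals `g_hat * gauss_pdf(xp, post_mean, post_var)` (guided decomposition), which is the identity $g_k(x')\,Q_k(x,dx') = \widehat{g}_k(x)\,\widehat{Q}_k(x,dx')$.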
32 29 (partially observed Markov chains : 1) partially observed Markov chains
motivating example: assume (unsynchronized) sensors take noisy observations of different components of hidden state at different time instants, e.g. $X_k = (X_k^1, X_k^2)$ and for simplicity
$X_k = f(X_{k-1}) + W_k$
$Y_k = h_1(X_k^1) + V_k^1$ at odd time instants, and $Y_k = H_2 X_k^2 + V_k^2$ at even time instants
observing all components of hidden state is fine, but processing partial observations at each time instant can be risky, since likelihood functions will be flat along some directions
ideally, try to collect and process simultaneously two successive observations, so that likelihood functions are more peaked
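33 30 (partially observed Markov chains : 2) down sampling: set
$\bar{X}_k = X_{2k+1}$ and $\bar{Y}_k = \begin{pmatrix} Y_{2k+1} \\ Y_{2k+2} \end{pmatrix}$
state equation
$\bar{X}_k = X_{2k+1} = f(X_{2k}) + W_{2k+1} = f(f(X_{2k-1}) + W_{2k}) + W_{2k+1}$
i.e.
$\bar{X}_k = \bar{f}(\bar{X}_{k-1}, \bar{W}_k)$ with $\bar{W}_k = \begin{pmatrix} W_{2k} \\ W_{2k+1} \end{pmatrix}$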
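34 31 (partially observed Markov chains : 3) observation equation: introducing projections $\pi_1$ and $\pi_2$ on 1st and 2nd components of state vector, yields
$\bar{Y}_k = \begin{pmatrix} Y_{2k+1} \\ Y_{2k+2} \end{pmatrix} = \begin{pmatrix} h_1(X_{2k+1}^1) + V_{2k+1}^1 \\ H_2 X_{2k+2}^2 + V_{2k+2}^2 \end{pmatrix} = \begin{pmatrix} h_1(\pi_1(X_{2k+1})) + V_{2k+1}^1 \\ H_2\,\pi_2(f(X_{2k+1}) + W_{2k+2}) + V_{2k+2}^2 \end{pmatrix}$
i.e.
$\bar{Y}_k = \bar{h}(\bar{X}_k) + \bar{V}_k$ with $\bar{V}_k = \begin{pmatrix} V_{2k+1}^1 \\ H_2\,\pi_2(W_{2k+2}) + V_{2k+2}^2 \end{pmatrix}$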
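35 32 (partially observed Markov chains : 4) resulting system
$\bar{X}_k = \bar{f}(\bar{X}_{k-1}, \bar{W}_k)$
$\bar{Y}_k = \bar{h}(\bar{X}_k) + \bar{V}_k$
with
$\bar{W}_k = \begin{pmatrix} W_{2k} \\ W_{2k+1} \end{pmatrix}$ and $\bar{V}_k = \begin{pmatrix} V_{2k+1}^1 \\ H_2\,\pi_2(W_{2k+2}) + V_{2k+2}^2 \end{pmatrix}$
clearly $\bar{W}_k$ and $\bar{V}_{k-1}$ share $W_{2k}$ in common and are correlated, hence dependent, and memoryless channel assumption cannot hold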
36 33 (partially observed Markov chains : 5) trick: decompose $\bar{W}_k = M\,\bar{V}_{k-1} + \bar{B}_k$ where $\bar{B}_k$ and $\bar{V}_{k-1}$ are now independent, substitute in state equation and import $\bar{V}_{k-1} = \bar{Y}_{k-1} - \bar{h}(\bar{X}_{k-1})$ from observation equation, yielding
$\bar{X}_k = \bar{f}(\bar{X}_{k-1},\, M\,(\bar{Y}_{k-1} - \bar{h}(\bar{X}_{k-1})) + \bar{B}_k)$
$\bar{Y}_k = \bar{h}(\bar{X}_k) + \bar{V}_k$
this does not fit into hidden Markov models, since hidden state alone does not form a Markov chain
however, hidden states and observations $\{(\bar{X}_k, \bar{Y}_k)\}$ jointly form a Markov chain, only the second component of which is observed
37 34 (partially observed Markov chains : 6) even more generally, with previous motivating example in mind, hidden states and observations $\{(X_k, Y_k)\}$ could jointly form a Markov chain taking values in product space $E \times F$
characterization in terms of transition kernel
$P[X_k \in dx', Y_k \in dy' \mid X_{k-1} = x, Y_{k-1} = y] = R_k(x,y,y',dx')\,\lambda_k^F(y,dy')$
and initial distribution
$P[X_0 \in dx, Y_0 \in dy] = \gamma_0(y,dx)\,\lambda_0^F(dy)$
attention: hidden states $\{X_k\}$ alone need not form a Markov chain
joint probability distribution of hidden states and observations $(X_{0:n}, Y_{0:n})$
$P[X_{0:n} \in dx_{0:n}, Y_{0:n} \in dy_{0:n}] = \gamma_0(y_0,dx_0) \prod_{k=1}^n R_k(x_{k-1},y_{k-1},y_k,dx_k)\;\lambda_0^F(dy_0) \prod_{k=1}^n \lambda_k^F(y_{k-1},dy_k)$
38 35 (partially observed Markov chains : 7) partially observed Markov chains : importance decomposition
required (non unique) decomposition
$\gamma_0(dx) = g_0^{imp}(x)\,\eta_0^{imp}(dx)$
and
$R_k(x,dx') = g_k^{imp}(x,x')\,Q_k^{imp}(x,dx')$
as product of
a nonnegative weight function $g_0^{imp}(x)$ or $g_k^{imp}(x,x')$
a probability distribution $\eta_0^{imp}(dx)$ or a transition kernel $Q_k^{imp}(x,dx')$, respectively
only requirement about proposed decomposition: it is easy to
simulate a r.v. according to $\eta_0^{imp}(dx)$
simulate for any $x \in E$, a r.v. according to $Q_k^{imp}(x,dx')$
evaluate for any $x, x' \in E$, weighting function $g_k^{imp}(x,x')$
39 36 (likelihood free models : 1) likelihood free models
so far, at least implicitly, additive observation noise has been assumed
$Y_k = h(X_k) + V_k$ with $V_k \sim q_k^V(v)\,dv$
with known and explicit form for probability density $q_k^V(v)$
this was the key assumption in deriving the expression of the emission density
$P[Y_k \in dy \mid X_k = x] = g_k(x,y)\,\lambda_k(dy)$
hence an explicit expression of the likelihood function
question: could anything be said in more general cases where no explicit expression is available for a density, or where it does not even exist?
non additive observation noise, with dimension smaller than observation, i.e. $Y_k = h(X_k, V_k)$
perfect observations, i.e. observation noise is simply not present, $Y_k = h(X_k)$
40 37 (likelihood free models : 2) trick, a form of ABC (approximate Bayesian computation): pretend that observations are produced by a slightly perturbed but regular model, i.e.
$Y_k = h(X_k) + V_k + \varepsilon U_k$
or
$Y_k = h(X_k, V_k) + \varepsilon U_k$
or
$Y_k = h(X_k) + \varepsilon U_k$
depending on the case under consideration, with $U_k \sim q_k^U(u)\,du$, and set $(X_k, V_k)$ as new hidden state
new requirement: it is easy to
simulate $(X_k, V_k)$ jointly
evaluate density $q_k^U(u)$
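For the perfect observation case $Y_k = h(X_k) + \varepsilon U_k$, the perturbed model has an explicit likelihood: since $\varepsilon U_k$ has density $u \mapsto q_k^U(u/\varepsilon)/\varepsilon$ in the scalar case, one gets $g_k(x) = q_k^U((Y_k - h(x))/\varepsilon)/\varepsilon$. The sketch below assumes a scalar observation and a standard Gaussian $q_k^U$; the function names and the choice of `h` are hypothetical.

```python
import math


def standard_normal_pdf(u):
    """Assumed density q^U of the artificial perturbation U_k (illustration only)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def abc_likelihood(x, y, h, eps, q_u=standard_normal_pdf):
    """Likelihood of state x under the perturbed model Y = h(X) + eps * U.

    The density of eps * U at the residual r is q_u(r / eps) / eps (scalar case).
    """
    residual = y - h(x)
    return q_u(residual / eps) / eps


def h_square(x):
    """Hypothetical noise free observation function."""
    return x * x


# States whose noise free output is close to the observation get large weights.
g_close = abc_likelihood(x=2.0, y=4.0, h=h_square, eps=0.1)
g_far = abc_likelihood(x=1.0, y=4.0, h=h_square, eps=0.1)
```

A smaller `eps` makes the perturbed model closer to the original one but concentrates the likelihood, so fewer simulated states receive a non negligible weight.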
41 Bayesian filter
hidden Markov models: representation as Gibbs Boltzmann distribution, recursive formulation
partially observed Markov chains + given importance decomposition: representation as Gibbs Boltzmann distribution
42 38 (Bayesian filter : hidden Markov models : representation : 1) Bayesian filter : hidden Markov models : representation
Theorem: joint conditional distribution of hidden state sequence $X_{0:n}$ given observations $Y_{0:n}$ as a Gibbs Boltzmann distribution
$P[X_{0:n} \in dx_{0:n} \mid Y_{0:n}] \propto g_{0:n}(x_{0:n})\;\eta_{0:n}(dx_{0:n})$ with $g_{0:n}(x_{0:n}) = \prod_{k=0}^n g_k(x_k)$
with likelihood functions defined (abuse of notation) as $g_k(x) = g_k(x, Y_k)$
and with joint probability distribution of hidden state sequence $X_{0:n}$
$\eta_{0:n}(dx_{0:n}) = P[X_{0:n} \in dx_{0:n}] = \eta_0(dx_0) \prod_{k=1}^n Q_k(x_{k-1}, dx_k)$
43 39 (Bayesian filter : hidden Markov models : representation : 2) general principle:
$p_{X \mid Y = y}(x) = \dfrac{p_{X,Y}(x,y)}{p_Y(y)}$, i.e. $p_{X \mid Y = y}(x) \propto p_{X,Y}(x,y)$
Proof: Bayes rule + Markov property + memoryless channel assumption yield joint probability distribution of hidden states and observations $(X_{0:n}, Y_{0:n})$
$P[X_{0:n} \in dx_{0:n}, Y_{0:n} \in dy_{0:n}] = P[Y_{0:n} \in dy_{0:n} \mid X_{0:n} = x_{0:n}]\;P[X_{0:n} \in dx_{0:n}] = \eta_0(dx_0) \prod_{k=1}^n Q_k(x_{k-1},dx_k) \prod_{k=0}^n g_k(x_k,y_k)\;\lambda_0^F(dy_0) \cdots \lambda_n^F(dy_n)$
hence
$P[X_{0:n} \in dx_{0:n} \mid Y_{0:n}] \propto \eta_0(dx_0) \prod_{k=1}^n Q_k(x_{k-1},dx_k) \prod_{k=0}^n g_k(x_k)$
44 40 (Bayesian filter : hidden Markov models : representation : 3) Remark: for any function $f$ depending on whole trajectory
$E[f(X_{0:n}) \mid Y_{0:n}] \propto \int_E \cdots \int_E f(x_{0:n})\,g_{0:n}(x_{0:n})\,\eta_{0:n}(dx_{0:n}) = E[f(X_{0:n}) \prod_{k=0}^n g_k(X_k)]$
expectation is w.r.t. hidden state sequence $X_{0:n}$, while observations $Y_{0:n}$ are fixed implicit parameters in likelihood functions: recall (abuse of notation) $g_k(x) = g_k(x, Y_k)$
if $f = \varphi \circ \pi$ depends only upon last state, then
$\langle \mu_n, \varphi \rangle = E[\varphi(X_n) \mid Y_{0:n}] \propto E[\varphi(X_n) \prod_{k=0}^n g_k(X_k)] = \langle \gamma_n, \varphi \rangle$
which defines unnormalized distribution $\gamma_n(dx)$ implicitly, through its action on arbitrary functions
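45 41 (Bayesian filter : hidden Markov models : representation : 4) for a given importance decomposition
$P[X_{0:n} \in dx_{0:n} \mid Y_{0:n}] \propto \eta_0(dx_0) \prod_{k=1}^n Q_k(x_{k-1},dx_k) \prod_{k=0}^n g_k(x_k) = \eta_{0:n}^{imp}(dx_{0:n})\;g_{0:n}^{imp}(x_{0:n})$
with
$\eta_{0:n}^{imp}(dx_{0:n}) = \eta_0^{imp}(dx_0) \prod_{k=1}^n Q_k^{imp}(x_{k-1},dx_k)$ and $g_{0:n}^{imp}(x_{0:n}) = g_0^{imp}(x_0) \prod_{k=1}^n g_k^{imp}(x_{k-1},x_k)$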
46 42 (Bayesian filter : hidden Markov models : recursive formulation : 1) Bayesian filter : hidden Markov models : recursive formulation
Theorem: Bayesian filter $\mu_k(dx) = P[X_k \in dx \mid Y_{0:k}]$ satisfies
$\mu_{k-1} \to$ (prediction) $\to \eta_k = \mu_{k-1} Q_k \to$ (correction) $\to \mu_k = g_k \cdot \eta_k$
with initial condition $\eta_0(dx) = P[X_0 \in dx]$
Remark: in Theorem statement
$\mu_{k-1} Q_k(dx') = \int_E \mu_{k-1}(dx)\,Q_k(x,dx')$
denotes mixture distribution resulting from transition kernel $Q_k(x,dx')$ acting on probability distribution $\mu_{k-1}(dx)$, and
$g_k \cdot \eta_k = \dfrac{g_k\,\eta_k}{\langle \eta_k, g_k \rangle}$
denotes (projective) product of prior probability distribution $\eta_k(dx')$ with likelihood function $g_k(x')$
47 43 (Bayesian filter : hidden Markov models : recursive formulation : 2) Proof: recall representation for joint conditional probability distribution of hidden state sequence $X_{0:k}$ given observations $Y_{0:k}$
$P[X_{0:k} \in dx_{0:k} \mid Y_{0:k}] \propto \eta_0(dx_0) \prod_{p=1}^k Q_p(x_{p-1},dx_p) \prod_{p=0}^k g_p(x_p) \propto g_k(x_k)\,Q_k(x_{k-1},dx_k)\,P[X_{0:k-1} \in dx_{0:k-1} \mid Y_{0:k-1}]$
integration w.r.t. variables $x_{0:k-1}$ (and in RHS, first w.r.t. variables $x_{0:k-2}$ and next w.r.t. variable $x_{k-1}$) provides conditional distribution of current hidden state $X_k$ given observations $Y_{0:k}$, i.e. Bayesian filter, as
$\mu_k(dx_k) = P[X_k \in dx_k \mid Y_{0:k}] \propto g_k(x_k) \int_E Q_k(x_{k-1},dx_k)\,P[X_{k-1} \in dx_{k-1} \mid Y_{0:k-1}] = g_k(x_k) \int_E \mu_{k-1}(dx_{k-1})\,Q_k(x_{k-1},dx_k) = g_k(x_k)\,\eta_k(dx_k)$
48 44 (Bayesian filter : hidden Markov models : recursive formulation : 3) Remark: unnormalized version satisfies recurrent relation
$\gamma_k(dx') = g_k(x') \int_E \gamma_{k-1}(dx)\,Q_k(x,dx')$ and $\mu_k = \dfrac{\gamma_k}{\langle \gamma_k, 1 \rangle}$
or equivalently
$\gamma_k(dx') = \int_E \gamma_{k-1}(dx)\,R_k(x,dx')$
introducing nonnegative kernel $R_k(x,dx') = Q_k(x,dx')\,g_k(x')$
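On a finite state space the integrals become sums and the recursion can be run exactly. The sketch below illustrates the unnormalized recursion and its normalization under that assumption; the two-state chain, the likelihood values and the function names are hypothetical, and the kernel is taken time homogeneous for brevity.

```python
def normalize(weights):
    """Divide nonnegative weights by their sum."""
    total = sum(weights)
    return [w / total for w in weights]


def forward_filter(eta0, Q, likelihoods):
    """Return the filters mu_0, ..., mu_n and the final unnormalized gamma_n.

    eta0[x]           : initial distribution of X_0
    Q[x][xp]          : transition kernel Q(x, dx'), time homogeneous here
    likelihoods[k][x] : g_k(x), i.e. g_k(x, Y_k) with Y_k already plugged in
    """
    n_states = len(eta0)
    # gamma_0 = g_0 * eta_0
    gamma = [likelihoods[0][x] * eta0[x] for x in range(n_states)]
    filters = [normalize(gamma)]
    for g_k in likelihoods[1:]:
        # gamma_k(x') = g_k(x') * sum over x of gamma_{k-1}(x) Q(x, x')
        gamma = [g_k[xp] * sum(gamma[x] * Q[x][xp] for x in range(n_states))
                 for xp in range(n_states)]
        filters.append(normalize(gamma))
    return filters, gamma


# Hypothetical two-state example with three observations.
eta0 = [0.6, 0.4]
Q = [[0.7, 0.3], [0.2, 0.8]]
likelihoods = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.7]]
filters, gamma_n = forward_filter(eta0, Q, likelihoods)
```

The total mass of `gamma_n` is the normalizing constant $\langle \gamma_n, 1 \rangle$; for long observation records one would usually normalize at each step, since the unnormalized masses shrink geometrically.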
49 45 (Bayesian filter : partially observed Markov chains : representation : 1) Bayesian filter : partially observed Markov chains : representation
Theorem: joint conditional distribution of hidden state sequence $X_{0:n}$ given observations $Y_{0:n}$
$P[X_{0:n} \in dx_{0:n} \mid Y_{0:n}] \propto \gamma_0(dx_0) \prod_{k=1}^n R_k(x_{k-1},dx_k)$
with nonnegative distribution defined (abuse of notation) as $\gamma_0(dx) = \gamma_0(Y_0,dx)$
and with nonnegative kernel defined (abuse of notation) as $R_k(x_{k-1},dx_k) = R_k(x_{k-1},Y_{k-1},Y_k,dx_k)$
50 46 (Bayesian filter : partially observed Markov chains : representation : 2) general principle:
$p_{X \mid Y = y}(x) = \dfrac{p_{X,Y}(x,y)}{p_Y(y)}$, i.e. $p_{X \mid Y = y}(x) \propto p_{X,Y}(x,y)$
Proof: by definition, joint probability distribution of hidden states and observations $(X_{0:n}, Y_{0:n})$
$P[X_{0:n} \in dx_{0:n}, Y_{0:n} \in dy_{0:n}] = \gamma_0(y_0,dx_0) \prod_{k=1}^n R_k(x_{k-1},y_{k-1},y_k,dx_k)\;\lambda_0^F(dy_0) \prod_{k=1}^n \lambda_k^F(y_{k-1},dy_k)$
hence
$P[X_{0:n} \in dx_{0:n} \mid Y_{0:n}] \propto \gamma_0(dx_0) \prod_{k=1}^n R_k(x_{k-1},dx_k)$
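51 47 (Bayesian filter : partially observed Markov chains : representation : 3) for a given importance decomposition
$P[X_{0:n} \in dx_{0:n} \mid Y_{0:n}] \propto \gamma_0(dx_0) \prod_{k=1}^n R_k(x_{k-1},dx_k) = \eta_{0:n}^{imp}(dx_{0:n})\;g_{0:n}^{imp}(x_{0:n})$
with
$\eta_{0:n}^{imp}(dx_{0:n}) = \eta_0^{imp}(dx_0) \prod_{k=1}^n Q_k^{imp}(x_{k-1},dx_k)$ and $g_{0:n}^{imp}(x_{0:n}) = g_0^{imp}(x_0) \prod_{k=1}^n g_k^{imp}(x_{k-1},x_k)$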
52 48 (Bayesian filter : partially observed Markov chains : recursive formulation : 1) Bayesian filter : partially observed Markov chains : recursive formulation
Theorem: Bayesian filter $\mu_k(dx) = P[X_k \in dx \mid Y_{0:k}]$ satisfies
$\mu_k(dx') \propto \int_E \mu_{k-1}(dx)\,R_k(x,dx')$
with initial condition $\mu_0(dx) \propto \gamma_0(dx)$
Remark: unnormalized version satisfies recurrent relation
$\gamma_k(dx') = \int_E \gamma_{k-1}(dx)\,R_k(x,dx')$ and $\mu_k = \dfrac{\gamma_k}{\langle \gamma_k, 1 \rangle}$
53 49 (Bayesian filter : partially observed Markov chains : recursive formulation : 2) Proof: recall representation for joint conditional probability distribution of hidden state sequence $X_{0:k}$ given observations $Y_{0:k}$
$P[X_{0:k} \in dx_{0:k} \mid Y_{0:k}] \propto \gamma_0(dx_0) \prod_{p=1}^k R_p(x_{p-1},dx_p) \propto R_k(x_{k-1},dx_k)\,P[X_{0:k-1} \in dx_{0:k-1} \mid Y_{0:k-1}]$
integration w.r.t. variables $x_{0:k-1}$ (and in RHS, first w.r.t. variables $x_{0:k-2}$ and next w.r.t. variable $x_{k-1}$) provides conditional distribution of current hidden state $X_k$ given observations $Y_{0:k}$, i.e. Bayesian filter, as
$\mu_k(dx_k) = P[X_k \in dx_k \mid Y_{0:k}] \propto \int_E R_k(x_{k-1},dx_k)\,P[X_{k-1} \in dx_{k-1} \mid Y_{0:k-1}] = \int_E \mu_{k-1}(dx_{k-1})\,R_k(x_{k-1},dx_k)$
54 Monte Carlo approximation : particle filters
Monte Carlo methods : importance sampling
importance sampling: SIS algorithm, derived from Bayesian filter representation; recursive formulation
redistribution: SIR algorithm, adaptive redistribution, derived directly from Bayesian filter recursive formulation
estimation error, CLT
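55 50 (Monte Carlo methods : importance sampling : 1) Monte Carlo methods
if computing an integral (or a mathematical expectation)
$\langle \mu, \varphi \rangle = \int_E \varphi(x)\,\mu(dx) = E[\varphi(X)]$ with $X \sim \mu(dx)$
is difficult, but simulating a r.v. according to distribution $\mu$ is easy, then introduce empirical probability distribution
$S^N(\mu) = \dfrac{1}{N} \sum_{i=1}^N \delta_{\xi^i}$
where $(\xi^1, \dots, \xi^N)$ is an $N$ sample distributed according to $\mu$, and approximation
$\langle \mu, \varphi \rangle \approx \langle S^N(\mu), \varphi \rangle = \dfrac{1}{N} \sum_{i=1}^N \varphi(\xi^i)$
by law of large numbers, $\langle S^N(\mu), \varphi \rangle \to \langle \mu, \varphi \rangle$ in probability as $N \uparrow \infty$, with speed $1/\sqrt{N}$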
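56 51 (Monte Carlo methods : importance sampling : 2) indeed
$\langle S^N(\mu) - \mu, \varphi \rangle = \dfrac{1}{N} \sum_{i=1}^N (\varphi(\xi^i) - \langle \mu, \varphi \rangle)$
hence (non asymptotic) mean square error
$E\,|\langle S^N(\mu) - \mu, \varphi \rangle|^2 = \dfrac{1}{N}\,\mathrm{var}(\varphi, \mu)$
since, by independence,
$\dfrac{1}{N^2} \sum_{i,j=1}^N E[(\varphi(\xi^i) - \langle \mu, \varphi \rangle)\,(\varphi(\xi^j) - \langle \mu, \varphi \rangle)] = \dfrac{1}{N^2} \sum_{i=1}^N E\,|\varphi(\xi^i) - \langle \mu, \varphi \rangle|^2$, where each term of the last sum equals $\mathrm{var}(\varphi, \mu)$
and central limit theorem holds
$\sqrt{N}\,\langle S^N(\mu) - \mu, \varphi \rangle = \dfrac{1}{\sqrt{N}} \sum_{i=1}^N (\varphi(\xi^i) - \langle \mu, \varphi \rangle) \Longrightarrow N(0, \mathrm{var}(\varphi, \mu))$ in distribution as $N \uparrow \infty$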
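57 52 (Monte Carlo methods : importance sampling : 3) important special case: Gibbs Boltzmann distribution
$\mu = g \cdot \eta = \dfrac{g\,\eta}{\langle \eta, g \rangle}$ i.e. $\langle \mu, \varphi \rangle = \dfrac{\langle \eta, g\varphi \rangle}{\langle \eta, g \rangle}$
with (non unique) decomposition in terms of
a probability distribution $\eta$
a nonnegative function $g$
introduce unnormalized distribution defined by
$\langle \gamma, \varphi \rangle = \langle \eta, g\varphi \rangle = E[g(\Xi)\,\varphi(\Xi)]$, where r.v. $\Xi$ is distributed according to $\eta$
hence
$\langle \mu, \varphi \rangle = \dfrac{\langle \eta, g\varphi \rangle}{\langle \eta, g \rangle} = \dfrac{\langle \gamma, \varphi \rangle}{\langle \gamma, 1 \rangle}$ ($\star$)
motivation: Bayes rule, posterior distribution $\propto$ likelihood function $\times$ prior distribution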
58 53 (Monte Carlo methods : importance sampling : 4) if simulating a r.v. according to µ is difficult, but simulating a r.v. according to η and evaluating nonnegative function g(x) for any x is easy, then it is possible to approximate µ by a weighted empirical probability distribution associated with a sample distributed according to η and weighted with nonnegative function g(x) even though normalizing constant η, g might be unknown
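59 54 (Monte Carlo methods : importance sampling : 5) importance sampling
idea: approximate numerator and denominator in ($\star$) with a unique sample distributed according to $\eta$: introduce approximation
$\langle \gamma, \varphi \rangle = \langle \eta, g\varphi \rangle \approx \langle S^N(\eta), g\varphi \rangle = \dfrac{1}{N} \sum_{i=1}^N g(\xi^i)\,\varphi(\xi^i)$
hence
$\langle \mu, \varphi \rangle = \langle g \cdot \eta, \varphi \rangle \approx \langle g \cdot S^N(\eta), \varphi \rangle = \dfrac{\sum_{i=1}^N g(\xi^i)\,\varphi(\xi^i)}{\sum_{i=1}^N g(\xi^i)}$
where $(\xi^1, \dots, \xi^N)$ is an $N$ sample with common probability distribution $\eta$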
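60 55 (Monte Carlo methods : importance sampling : 6) in other words
$\gamma \approx \gamma^N = g\,S^N(\eta) = \dfrac{1}{N} \sum_{i=1}^N g(\xi^i)\,\delta_{\xi^i}$
and
$\mu \approx \mu^N = g \cdot S^N(\eta) = \sum_{i=1}^N \dfrac{g(\xi^i)}{\sum_{j=1}^N g(\xi^j)}\,\delta_{\xi^i} = \sum_{i=1}^N w^i\,\delta_{\xi^i}$
where nonnegative normalized weights $(w^1, \dots, w^N)$ are defined for any $i = 1, \dots, N$ by
$w^i = \dfrac{g(\xi^i)}{\sum_{j=1}^N g(\xi^j)}$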
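The weighted empirical distribution above can be sketched in a few lines. The example is hypothetical: the prior $\eta$ is taken as a standard Gaussian and the weight function as $g(x) = \exp(-(y - x)^2/2)$ with $y = 1$, for which the target $\mu = g \cdot \eta$ is known to be Gaussian with mean $1/2$ and variance $1/2$; function names are illustrative only.

```python
import math
import random


def importance_sampling_estimate(phi, g, sample_eta, n_samples, rng):
    """Approximate <mu, phi> with mu = g . eta, using a single sample from eta.

    Returns the estimate and the normalized weights w^i = g(xi^i) / sum_j g(xi^j).
    """
    xs = [sample_eta(rng) for _ in range(n_samples)]
    gs = [g(x) for x in xs]
    total = sum(gs)  # the unknown constant <eta, g> cancels in the ratio
    weights = [value / total for value in gs]
    estimate = sum(w * phi(x) for w, x in zip(weights, xs))
    return estimate, weights


rng = random.Random(1)
y_obs = 1.0
estimate, weights = importance_sampling_estimate(
    phi=lambda x: x,                                  # estimate the mean of mu
    g=lambda x: math.exp(-0.5 * (y_obs - x) ** 2),    # nonnegative weight function
    sample_eta=lambda r: r.gauss(0.0, 1.0),           # simulate from eta
    n_samples=20000,
    rng=rng,
)
```

With this choice the estimate should be close to the exact mean $1/2$, without ever evaluating the normalizing constant $\langle \eta, g \rangle$.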
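61 56 (importance sampling SIS algorithm : 1) importance sampling SIS algorithm
recall Bayesian filter representation as a Gibbs Boltzmann distribution
$\mu_{0:n} = g_{0:n} \cdot \eta_{0:n} = \dfrac{g_{0:n}\,\eta_{0:n}}{\langle \eta_{0:n}, g_{0:n} \rangle}$ i.e. $\langle \mu_{0:n}, f \rangle = \dfrac{\langle \eta_{0:n}, g_{0:n} f \rangle}{\langle \eta_{0:n}, g_{0:n} \rangle}$
with $g_{0:n}(x_{0:n}) = \prod_{k=0}^n g_k(x_{k-1}, x_k)$ (for $k = 0$, the factor depends on $x_0$ only)
and with joint probability distribution of hidden states $X_{0:n}$
$\eta_{0:n}(dx_{0:n}) = P[X_{0:n} \in dx_{0:n}] = \eta_0(dx_0) \prod_{k=1}^n Q_k(x_{k-1},dx_k)$
unnormalized version defined as
$\langle \gamma_{0:n}, f \rangle = \langle \eta_{0:n}, g_{0:n} f \rangle = E[g_{0:n}(X_{0:n})\,f(X_{0:n})]$, so that $\langle \mu_{0:n}, f \rangle = \dfrac{\langle \gamma_{0:n}, f \rangle}{\langle \gamma_{0:n}, 1 \rangle}$
and if $f = \varphi \circ \pi$ depends only upon last state and not on whole trajectory, then
$\langle \gamma_{0:n}, \varphi \circ \pi \rangle = E[g_{0:n}(X_{0:n})\,\varphi \circ \pi(X_{0:n})] = E[\varphi(X_n) \prod_{k=0}^n g_k(X_{k-1}, X_k)] = \langle \gamma_n, \varphi \rangle$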
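62 57 (importance sampling SIS algorithm : 2) importance sampling: approximation
$\langle \gamma_{0:n}, f \rangle = \langle \eta_{0:n}, g_{0:n} f \rangle \approx \langle S^N(\eta_{0:n}), g_{0:n} f \rangle = \dfrac{1}{N} \sum_{i=1}^N g_{0:n}(\xi_{0:n}^i)\,f(\xi_{0:n}^i)$
and
$\langle \mu_{0:n}, f \rangle = \langle g_{0:n} \cdot \eta_{0:n}, f \rangle \approx \langle g_{0:n} \cdot S^N(\eta_{0:n}), f \rangle = \dfrac{\sum_{i=1}^N g_{0:n}(\xi_{0:n}^i)\,f(\xi_{0:n}^i)}{\sum_{i=1}^N g_{0:n}(\xi_{0:n}^i)}$
for any function $f$ depending on whole trajectory, where $(\xi_{0:n}^1, \dots, \xi_{0:n}^N)$ is an $N$ sample with common probability distribution $\eta_{0:n}$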
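63 58 (importance sampling SIS algorithm : 3) in particular, if $f = \varphi \circ \pi$ depends only upon last state and not on whole sequence, then
$\langle \gamma_n, \varphi \rangle = \langle \gamma_{0:n}, \varphi \circ \pi \rangle \approx \dfrac{1}{N} \sum_{i=1}^N g_{0:n}(\xi_{0:n}^i)\,\varphi(\xi_n^i)$
and
$\langle \mu_n, \varphi \rangle = \langle \mu_{0:n}, \varphi \circ \pi \rangle \approx \dfrac{\sum_{i=1}^N g_{0:n}(\xi_{0:n}^i)\,\varphi(\xi_n^i)}{\sum_{i=1}^N g_{0:n}(\xi_{0:n}^i)}$
for any function $\varphi$, where $(\xi_{0:n}^1, \dots, \xi_{0:n}^N)$ is an $N$ sample with common probability distribution $\eta_{0:n}$, and for $i = 1, \dots, N$, $\xi_n^i = \pi(\xi_{0:n}^i)$ denotes last state of sequence $\xi_{0:n}^i = (\xi_0^i, \dots, \xi_n^i)$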
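64 59 (importance sampling SIS algorithm : 4) in other words
$\gamma_n \approx \gamma_n^N = \dfrac{1}{N} \sum_{i=1}^N g_{0:n}(\xi_{0:n}^i)\,\delta_{\xi_n^i}$
and
$\mu_n \approx \mu_n^N = \sum_{i=1}^N \dfrac{g_{0:n}(\xi_{0:n}^i)}{\sum_{j=1}^N g_{0:n}(\xi_{0:n}^j)}\,\delta_{\xi_n^i} = \sum_{i=1}^N w_n^i\,\delta_{\xi_n^i}$
where nonnegative normalized weights $(w_n^1, \dots, w_n^N)$ are defined for any $i = 1, \dots, N$ by
$w_n^i = \dfrac{g_{0:n}(\xi_{0:n}^i)}{\sum_{j=1}^N g_{0:n}(\xi_{0:n}^j)}$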
65 60 (importance sampling SIS algorithm : 5) SIS algorithm, importance sampling approximation: non recursive, depth first implementation
simulate an $N$ sample of hidden state sequences $(\xi_{0:n}^1, \dots, \xi_{0:n}^N)$: independently for any $i = 1, \dots, N$, simulate a sequence $\xi_{0:n}^i = (\xi_0^i, \dots, \xi_n^i)$, i.e.
simulate a r.v. $\xi_0^i$ according to $\eta_0(dx)$
for any $k = 1, \dots, n$, simulate a r.v. $\xi_k^i$ according to $Q_k(\xi_{k-1}^i, dx')$
and define for any $i = 1, \dots, N$
$g_{0:n}(\xi_{0:n}^i) = \prod_{k=0}^n g_k(\xi_{k-1}^i, \xi_k^i)$ and $w_n^i = \dfrac{g_{0:n}(\xi_{0:n}^i)}{\sum_{j=1}^N g_{0:n}(\xi_{0:n}^j)}$
66 61 (importance sampling SIS algorithm : 6) importance sampling approximation: non recursive implementation for non linear and non Gaussian systems
simulate an $N$ sample of hidden state sequences $(\xi_{0:n}^1, \dots, \xi_{0:n}^N)$: independently for any $i = 1, \dots, N$, simulate a sequence $\xi_{0:n}^i = (\xi_0^i, \dots, \xi_n^i)$, i.e.
simulate a r.v. $\xi_0^i$ according to $\eta_0(dx)$
for any $k = 1, \dots, n$, simulate a r.v. $W_k^i$ according to $p_k^W(dw)$ and set $\xi_k^i = f_k(\xi_{k-1}^i, W_k^i)$
and define for any $i = 1, \dots, N$
$g_{0:n}(\xi_{0:n}^i) = \prod_{k=0}^n q_k^V(Y_k - h_k(\xi_k^i))$ and $w_n^i = \dfrac{g_{0:n}(\xi_{0:n}^i)}{\sum_{j=1}^N g_{0:n}(\xi_{0:n}^j)}$
67 62 (importance sampling SIS algorithm : 7) recursive formulation of weights updating: for any $k = 1, \dots, n$ and for any $i = 1, \dots, N$
$w_k^i = \dfrac{g_{0:k}(\xi_{0:k}^i)}{\sum_{j=1}^N g_{0:k}(\xi_{0:k}^j)} = \dfrac{g_{0:k-1}(\xi_{0:k-1}^i)\,g_k(\xi_{k-1}^i, \xi_k^i)}{\sum_{j=1}^N g_{0:k-1}(\xi_{0:k-1}^j)\,g_k(\xi_{k-1}^j, \xi_k^j)} = \dfrac{w_{k-1}^i\,g_k(\xi_{k-1}^i, \xi_k^i)}{\sum_{j=1}^N w_{k-1}^j\,g_k(\xi_{k-1}^j, \xi_k^j)}$
benefit: allows breadth first implementation
68 63 (importance sampling SIS algorithm : 8) SIS algorithm (sequential importance sampling): recursive implementation
for $k = 0$, independently for any $i = 1, \dots, N$
simulate a r.v. $\xi_0^i$ according to $\eta_0(dx)$, and define $w_0^i = \dfrac{g_0(\xi_0^i)}{\sum_{j=1}^N g_0(\xi_0^j)}$
for any $k = 1, \dots, n$, independently for any $i = 1, \dots, N$
simulate a r.v. $\xi_k^i$ according to $Q_k(\xi_{k-1}^i, dx')$, and update weight as $w_k^i = \dfrac{w_{k-1}^i\,g_k(\xi_{k-1}^i, \xi_k^i)}{\sum_{j=1}^N w_{k-1}^j\,g_k(\xi_{k-1}^j, \xi_k^j)}$
69 64 (importance sampling SIS algorithm : 9) SIS algorithm (sequential importance sampling): recursive implementation for non linear and non Gaussian systems
for $k = 0$, independently for any $i = 1, \dots, N$
simulate a r.v. $\xi_0^i$ according to $\eta_0(dx)$, and define $w_0^i = \dfrac{q_0^V(Y_0 - h_0(\xi_0^i))}{\sum_{j=1}^N q_0^V(Y_0 - h_0(\xi_0^j))}$
for any $k = 1, \dots, n$, independently for any $i = 1, \dots, N$
simulate a r.v. $W_k^i$ according to $p_k^W(dw)$ and set $\xi_k^i = f_k(\xi_{k-1}^i, W_k^i)$, and update weight as $w_k^i = \dfrac{w_{k-1}^i\,q_k^V(Y_k - h_k(\xi_k^i))}{\sum_{j=1}^N w_{k-1}^j\,q_k^V(Y_k - h_k(\xi_k^j))}$
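The recursive SIS implementation can be sketched as follows for a toy scalar model. Everything model specific here is an assumption made for illustration: $f_k(x,w) = a\,x + w$, $h_k(x) = x$, Gaussian noises, a standard Gaussian $\eta_0$, and hypothetical function names. Weights are accumulated in the log domain, which is equivalent to the multiplicative update up to normalization and only protects against numerical underflow.

```python
import math
import random


def log_q_v(v, sigma_v):
    """Log of the (assumed Gaussian) observation noise density q^V(v)."""
    return -0.5 * (v / sigma_v) ** 2 - math.log(sigma_v * math.sqrt(2.0 * math.pi))


def normalize_log_weights(log_weights):
    """Turn accumulated log weights into nonnegative weights summing to one."""
    top = max(log_weights)
    unnormalized = [math.exp(lw - top) for lw in log_weights]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def sis_filter(ys, n_particles, rng, a=0.9, sigma_w=1.0, sigma_v=1.0):
    """Sequential importance sampling for X_k = a X_{k-1} + W_k, Y_k = X_k + V_k."""
    # k = 0: simulate from eta_0 (assumed N(0, 1)) and weight with g_0
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    log_w = [log_q_v(ys[0] - x, sigma_v) for x in xs]
    for y in ys[1:]:
        # blind mutation: each sequence is extended with the prior kernel Q_k
        xs = [a * x + rng.gauss(0.0, sigma_w) for x in xs]
        # multiplicative weight update, written additively in the log domain
        log_w = [lw + log_q_v(y - x, sigma_v) for lw, x in zip(log_w, xs)]
    return xs, normalize_log_weights(log_w)


rng = random.Random(2)
observations = [0.5, 1.0, 1.5, 1.2, 0.8, 1.1, 0.9, 1.3]
particles, weights = sis_filter(observations, n_particles=200, rng=rng)
largest_weight = max(weights)
```

Inspecting `largest_weight` as the number of observations grows is one simple way to observe how the weight mass tends to concentrate on few sequences.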
70 65 (importance sampling SIS algorithm : 10) pros: higher weights are allocated to simulated sequences that are often consistent with observations
cons: weights are evaluated afterwards, and do not have impact on how sequences are simulated (blind simulation strategy) + along a given sequence, weights are accumulated in a multiplicative way
weights degeneracy: in practice, one single sequence receives a much larger weight than all other sequences, whose contributions are therefore negligible
memory effect: a sequence cannot be consistent with all observations; a sequence that is consistent (resp. inconsistent) with current observation, but inconsistent (resp. consistent) with earlier observations, will receive a small (resp. a large) weight
proposed solutions:
use observations to guide how sequences are simulated
from time to time, replicate / terminate sequences according to their respective weights
71 66 (SIR algorithm : 1) SIR algorithm
approximate Bayesian filter $\mu_n(dx) = P[X_n \in dx \mid Y_{0:n}]$ using recursive formulation
$\mu_{k-1} \to$ (prediction) $\to \eta_k = \mu_{k-1} Q_k \to$ (correction) $\to \mu_k = g_k \cdot \eta_k$
with initial condition $\mu_0 = g_0 \cdot \eta_0$
idea: look for approximations in the form of (possibly weighted) empirical probability distributions
$\eta_k \approx \eta_k^N = \sum_{i=1}^N v_k^i\,\delta_{\xi_k^i}$ and $\mu_k \approx \mu_k^N = \sum_{i=1}^N w_k^i\,\delta_{\xi_k^i}$
associated with population of $N$ particles characterized by
positions $(\xi_k^1, \dots, \xi_k^N)$ in $E$
nonnegative normalized weights $(v_k^1, \dots, v_k^N)$ and $(w_k^1, \dots, w_k^N)$
72 67 (SIR algorithm : 2) initial approximation: using importance sampling
$\mu_0 = g_0 \cdot \eta_0 \approx g_0 \cdot S^N(\eta_0) = \sum_{i=1}^N \dfrac{g_0(\xi_0^i)}{\sum_{j=1}^N g_0(\xi_0^j)}\,\delta_{\xi_0^i} = \sum_{i=1}^N w_0^i\,\delta_{\xi_0^i}$
where variables $(\xi_0^1, \dots, \xi_0^N)$ are i.i.d. with common probability distribution $\eta_0$
correction step: clearly, from definition
$\mu_k^N = g_k \cdot \eta_k^N = \sum_{i=1}^N \dfrac{v_k^i\,g_k(\xi_k^i)}{\sum_{j=1}^N v_k^j\,g_k(\xi_k^j)}\,\delta_{\xi_k^i} = \sum_{i=1}^N w_k^i\,\delta_{\xi_k^i}$
which automatically has desired form
73 68 (SIR algorithm : 3) prediction step: from definition
$\langle \mu_{k-1}^N Q_k, \varphi \rangle = \int_E \int_E \mu_{k-1}^N(dx)\,Q_k(x,dx')\,\varphi(x') = \sum_{i=1}^N w_{k-1}^i \int_E Q_k(\xi_{k-1}^i,dx')\,\varphi(x') = \int_E \big[\sum_{i=1}^N w_{k-1}^i\,Q_k(\xi_{k-1}^i,dx')\big]\,\varphi(x')$
for any function $\varphi$, hence
$\mu_{k-1}^N Q_k = \sum_{i=1}^N w_{k-1}^i\,m_k^i$
in form of a finite mixture, with $m_k^i(dx') = Q_k(\xi_{k-1}^i,dx')$ for any $i = 1, \dots, N$
this requires further approximation (several sampling schemes available)
74 69 (SIR algorithm : 4) multinomial resampling
simulate an $N$ sample $(\xi_k^1, \dots, \xi_k^N)$ according to $\mu_{k-1}^N Q_k$, and set
$\mu_{k-1}^N Q_k \approx \eta_k^N = S^N(\mu_{k-1}^N Q_k) = \dfrac{1}{N} \sum_{i=1}^N \delta_{\xi_k^i} = \sum_{i=1}^N v_k^i\,\delta_{\xi_k^i}$
with $v_k^i = 1/N$ for any $i = 1, \dots, N$
weights are used to select (with replacement) mixture components, with expected consequence that components with higher weights are selected several times
conversely, components with lower weights are possibly discarded and will not further contribute to approximation
if $R^i$ denotes how many times $i$-th mixture component has been selected, or equivalently how many samples in new approximation originate from $i$-th mixture component, for any $i = 1, \dots, N$, then r.v. $(R^1, \dots, R^N)$ has a multinomial distribution
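The selection of mixture components can be sketched with the standard library, since `random.Random.choices` draws with replacement according to given weights. The weights and the helper name below are hypothetical.

```python
import random


def multinomial_resample(weights, rng):
    """Select N component indices with replacement, according to the weights.

    Returns the selected indices and the counts R^i, i.e. how many times
    each component has been selected; the counts sum to N.
    """
    n = len(weights)
    indices = rng.choices(range(n), weights=weights, k=n)
    counts = [0] * n
    for i in indices:
        counts[i] += 1
    return indices, counts


rng = random.Random(0)
weights = [0.5, 0.3, 0.2, 0.0]   # hypothetical normalized weights
indices, counts = multinomial_resample(weights, rng)
```

Components with zero weight are never selected, and the vector `counts` is one draw from the multinomial distribution with parameters $N$ and the weights.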
70 (SIR algorithm : 5)
intuitively, if all mixture weights are equal (or close) to $1/N$, i.e. if distribution of mixture weights is close to equidistribution, then selecting mixture components could be counterproductive
weights preservation
simulate exactly one individual from each mixture component and preserve its weight, i.e. independently for any $i = 1,\dots,N$ simulate $\xi_k^i$ according to $m_k^i(dx') = Q_k(\xi_{k-1}^i,dx')$ and set
$$\mu_{k-1}^N Q_k \approx \eta_k^N = \sum_{i=1}^N w_{k-1}^i\,\delta_{\xi_k^i} = \sum_{i=1}^N v_k^i\,\delta_{\xi_k^i}$$
with $v_k^i = w_{k-1}^i$ for any $i = 1,\dots,N$
intuitively, this approach is appropriate if distribution of mixture weights is close to equidistribution, and less appropriate in extreme case where most weights are zero, except a few components with positive weights
71 (SIR algorithm : 6)
SIR algorithm (sampling with importance resampling) : recursive implementation
for $k = 0$, independently for any $i = 1,\dots,N$
- simulate a r.v. $\xi_0^i$ according to $\eta_0(dx)$, and define
$$w_0^i = \frac{g_0(\xi_0^i)}{\sum_{j=1}^N g_0(\xi_0^j)}$$
for any $k = 1,\dots,n$, independently for any $i = 1,\dots,N$
- select an individual $\hat\xi_{k-1}^i$ among population $(\xi_{k-1}^1,\dots,\xi_{k-1}^N)$ and according to weights $(w_{k-1}^1,\dots,w_{k-1}^N)$
- simulate a r.v. $\xi_k^i$ according to $Q_k(\hat\xi_{k-1}^i,dx')$ and define
$$w_k^i = \frac{g_k(\hat\xi_{k-1}^i,\xi_k^i)}{\sum_{j=1}^N g_k(\hat\xi_{k-1}^j,\xi_k^j)}$$
72 (SIR algorithm : 7)
SIR algorithm (sampling with importance resampling) : recursive formulation for non linear and non Gaussian systems
for $k = 0$, independently for any $i = 1,\dots,N$
- simulate a r.v. $\xi_0^i$ according to $\eta_0(dx)$, and define
$$w_0^i = \frac{q_0^V(Y_0 - h_0(\xi_0^i))}{\sum_{j=1}^N q_0^V(Y_0 - h_0(\xi_0^j))}$$
for any $k = 1,\dots,n$, independently for any $i = 1,\dots,N$
- select an individual $\hat\xi_{k-1}^i$ among population $(\xi_{k-1}^1,\dots,\xi_{k-1}^N)$ and according to weights $(w_{k-1}^1,\dots,w_{k-1}^N)$
- simulate a r.v. $W_k^i$ according to $p_k^W(dw)$, set $\xi_k^i = f_k(\hat\xi_{k-1}^i,W_k^i)$ and define
$$w_k^i = \frac{q_k^V(Y_k - h_k(\xi_k^i))}{\sum_{j=1}^N q_k^V(Y_k - h_k(\xi_k^j))}$$
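The recursion on this slide can be sketched as follows, under loudly stated assumptions: the scalar model ($f(x,w) = 0.9\,x + w$, $h(x) = x$, centred Gaussian noises with standard deviation 0.5, $\eta_0 = N(0,1)$) is hypothetical and chosen only for illustration, $f$ and $h$ are taken time invariant for brevity, and every function name is made up for this sketch. It is not the author's implementation.

```python
import math
import random


def gaussian_density(v, sigma):
    """Density of a centred Gaussian with standard deviation sigma."""
    return math.exp(-0.5 * (v / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def normalize(raw):
    """Rescale nonnegative numbers so that they sum to one."""
    total = sum(raw)  # assumed > 0 here; a robust implementation should guard this
    return [r / total for r in raw]


def sir_filter(ys, n, f, h, sample_x0, sample_w, q_v, rng):
    """Bootstrap SIR filter: returns filter means and the final weighted sample."""
    # k = 0: simulate from eta_0, weight with the likelihood of Y_0
    particles = [sample_x0(rng) for _ in range(n)]
    weights = normalize([q_v(ys[0] - h(x)) for x in particles])
    means = [sum(w * x for w, x in zip(weights, particles))]
    for y in ys[1:]:
        # selection step: pick ancestors according to the previous weights
        selected = rng.choices(particles, weights=weights, k=n)
        # mutation step: push each selected particle through the state equation
        particles = [f(x, sample_w(rng)) for x in selected]
        # weighting step: evaluate the observation noise density at the residual
        weights = normalize([q_v(y - h(x)) for x in particles])
        means.append(sum(w * x for w, x in zip(weights, particles)))
    return means, particles, weights


# Hypothetical scalar model, for illustration only.
SIGMA_W, SIGMA_V = 0.5, 0.5
f = lambda x, w: 0.9 * x + w
h = lambda x: x

rng = random.Random(1)
state = rng.gauss(0.0, 1.0)
observations = [h(state) + rng.gauss(0.0, SIGMA_V)]
for _ in range(20):
    state = f(state, rng.gauss(0.0, SIGMA_W))
    observations.append(h(state) + rng.gauss(0.0, SIGMA_V))

means, particles, weights = sir_filter(
    observations, 500, f, h,
    sample_x0=lambda r: r.gauss(0.0, 1.0),
    sample_w=lambda r: r.gauss(0.0, SIGMA_W),
    q_v=lambda v: gaussian_density(v, SIGMA_V),
    rng=rng,
)
```

Only two capabilities are used, as required earlier in the slides: simulating from $\eta_0$ and from the state noise distribution, and evaluating the observation noise density.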
73 (SIR algorithm : 8)
to summarize, particles $(\xi_{k-1}^1,\dots,\xi_{k-1}^N)$
- are selected according to their respective weights $(w_{k-1}^1,\dots,w_{k-1}^N)$ [selection step]
- evolve according to transition probabilities $Q_k(x,dx')$ [mutation step]
- and are weighted by evaluating likelihood function $g_k$ [weighting step]
pros : weights do not accumulate along each sequence, but are used to select (or resample) particles
- particles with larger (resp. smaller) weights are replicated (resp. are terminated)
- by keeping only most probable particles at each time instant, expected benefit is to concentrate available computing power within regions of interest
74 (SIR algorithm : 9)
cons : introduces additional randomness, in resampling (selection) step
proposed solutions
- alternate resampling strategies, that allocate an (almost) deterministic number of offspring to each selected particle
- adaptive resampling, only when weights $(w_k^1,\dots,w_k^N)$ are too unbalanced (far from equidistribution)
cons : because of replication, fewer truly distinct positions are available (sample impoverishment, positions degeneracy) : in practice, implicitly rely on mutation step to bring diversity again
proposed solution
- after resampling (selection) step, add some random move to each selected particle, or apply some artificial Markovian dynamics (Metropolis Hastings, Gibbs sampling, etc.)
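As an illustration of a resampling strategy that allocates an almost deterministic number of offspring, here is a minimal sketch of systematic resampling. The slide does not name any specific scheme, so this choice is an assumption; the function name and the example weights are hypothetical as well. With this scheme the number of offspring of component $i$ is either the integer part of $N w_i$ or that value plus one.

```python
import random


def systematic_resample(weights, rng):
    """Select N ancestor indices using a single uniform draw and a regular grid."""
    n = len(weights)
    u = rng.random() / n          # one uniform offset in [0, 1/N)
    ancestors = []
    i = 0
    cumulative = weights[0]       # running cumulative sum of the weights
    for j in range(n):
        position = u + j / n      # j-th point of the regular grid
        while position >= cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        ancestors.append(i)
    return ancestors


# Hypothetical weights for which N * w_i is an integer for every component.
rng = random.Random(0)
weights = [0.5, 0.25, 0.25, 0.0]
ancestors = systematic_resample(weights, rng)
counts = [ancestors.count(i) for i in range(len(weights))]
print(ancestors, counts)
```

In this example $N w_i$ is an integer for every component, so the offspring numbers are fully deterministic: only the single uniform draw is random, and it does not affect the counts.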
75 (particle filtering : adaptive sampling / resampling : 1)
adaptive SIR algorithm
given a finite mixture
$$m = \sum_{i=1}^N w_i\,m_i$$
selecting mixture components is interesting only if weights $(w_1,\dots,w_N)$ are far from equidistribution
several heuristic criteria have been proposed to quantify departure from equidistribution, and to decide whether particles should be resampled or not, e.g.
- effective sample size
- entropy
76 (particle filtering : adaptive sampling / resampling : 2)
$\chi^2$ distance and effective sample size
$\chi^2$ distance between two probability vectors $p = (p_1,\dots,p_N)$ and $q = (q_1,\dots,q_N)$ is defined as
$$\chi^2(p,q) = \sum_{i=1}^N q_i\,\Big(\frac{p_i}{q_i} - 1\Big)^2$$
in particular for $p = w = (w_1,\dots,w_N)$ and $q = (1/N,\dots,1/N)$, it holds
$$0 \le \frac{1}{N} \sum_{i=1}^N (N\,w_i - 1)^2 = \frac{1}{N} \sum_{i=1}^N (N\,w_i)^2 - 1 = N \sum_{i=1}^N w_i^2 - 1$$
hence
$$N_{\text{eff}} = 1 \Big/ \Big[\sum_{i=1}^N w_i^2\Big] \le N$$
where equality is attained at equidistribution, which suggests to resample if
$$H(w_1,\dots,w_N) = N \sum_{i=1}^N w_i^2 - 1 = \frac{N}{N_{\text{eff}}} - 1 \ge H_{\text{red}}$$
for some threshold $H_{\text{red}} > 0$ still to be fixed
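The criterion on this slide translates directly into code. The sketch below is illustrative only: the function names are hypothetical, and the threshold value used in the example ($H_{\text{red}} = 1$, which amounts to resampling once $N_{\text{eff}} \le N/2$) is an assumption, since the slide leaves the threshold open.

```python
def effective_sample_size(weights):
    """N_eff = 1 / sum_i w_i^2, for nonnegative weights summing to one."""
    return 1.0 / sum(w * w for w in weights)


def chi2_to_equidistribution(weights):
    """H(w) = N * sum_i w_i^2 - 1 = N / N_eff - 1."""
    n = len(weights)
    return n * sum(w * w for w in weights) - 1.0


def should_resample(weights, h_red):
    """Adaptive rule: resample only if weights are far from equidistribution."""
    return chi2_to_equidistribution(weights) >= h_red


balanced = [0.25, 0.25, 0.25, 0.25]
degenerate = [1.0, 0.0, 0.0, 0.0]
H_RED = 1.0  # hypothetical threshold

for w in (balanced, degenerate):
    print(effective_sample_size(w), chi2_to_equidistribution(w), should_resample(w, H_RED))
```

At equidistribution the effective sample size equals $N$ and $H$ vanishes; when a single weight carries all the mass, the effective sample size drops to 1 and $H = N - 1$.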
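77 (estimation error, CLT : 1)
on the way to asymptotic results (in 3 slides)
recall linear evolution for unnormalized version of Bayesian filter
$$\gamma_k = \gamma_{k-1}\,R_k = g_k\,(\gamma_{k-1}\,Q_k) = g_k\,(\mu_{k-1}\,Q_k)\,\langle\gamma_{k-1},1\rangle = g_k\,\eta_k\,\langle\gamma_{k-1},1\rangle$$
with initial condition $\gamma_0 = g_0\,\eta_0$
proposed particle approximation for unnormalized distribution
$$\gamma_k^N = g_k\,\eta_k^N\,\langle\gamma_{k-1}^N,1\rangle$$
with initial condition $\gamma_0^N = g_0\,\eta_0^N$ and $\eta_0^N = S^N(\eta_0)$ : clearly
$$\langle\gamma_k^N,1\rangle = \langle\eta_k^N,g_k\rangle\,\langle\gamma_{k-1}^N,1\rangle \quad\text{and}\quad \langle\gamma_0^N,1\rangle = \langle\eta_0^N,g_0\rangle$$
and it follows
$$\frac{\gamma_k^N}{\langle\gamma_k^N,1\rangle} = g_k \cdot \eta_k^N = \mu_k^N \quad\text{and}\quad \frac{\gamma_0^N}{\langle\gamma_0^N,1\rangle} = g_0 \cdot \eta_0^N = \mu_0^N$$
normalized version of proposed particle approximation for $\gamma_k^N$ coincides with SIR bootstrap approximation $\mu_k^N$ for Bayesian filter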
78 (estimation error, CLT : 2)
Remark key for induction : for any $k = 1,\dots,n$ and by difference
$$\gamma_k^N - \gamma_k = g_k\,\eta_k^N\,\langle\gamma_{k-1}^N,1\rangle - g_k\,(\gamma_{k-1}\,Q_k) = g_k\,(\gamma_{k-1}^N Q_k - \gamma_{k-1} Q_k) + g_k\,(\eta_k^N - \mu_{k-1}^N Q_k)\,\langle\gamma_{k-1}^N,1\rangle$$
hence
$$\langle\gamma_k^N - \gamma_k,\phi\rangle = \langle\gamma_{k-1}^N - \gamma_{k-1},\,Q_k(g_k\,\phi)\rangle + \langle\eta_k^N - \mu_{k-1}^N Q_k,\,g_k\,\phi\rangle\,\langle\gamma_{k-1}^N,1\rangle$$
error at current generation, evaluated on function $\phi$, is decomposed into
- error at previous generation, evaluated on function $R_k\,\phi = Q_k(g_k\,\phi)$
- local error resulting from Monte Carlo approximation
even though samples are actually dependent, because of resampling at each generation, conditionally on previous generations, new samples are generated independently
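79 (estimation error, CLT : 3)
with this conditioning argument, error estimates
$$\sup_{\phi\,:\,\|\phi\| = 1} E\,\Big|\frac{\langle\gamma_k^N - \gamma_k,\phi\rangle}{\langle\gamma_k,1\rangle}\Big| \le \frac{c_k}{\sqrt{N}} \quad\text{and}\quad \sup_{\phi\,:\,\|\phi\| = 1} \Big\{E\,|\langle\mu_k^N - \mu_k,\phi\rangle|^2\Big\}^{1/2} \le \frac{c_k}{\sqrt{N}}$$
of order $1/\sqrt{N}$, and CLT
$$\sqrt{N}\,\frac{\langle\gamma_k^N - \gamma_k,\phi\rangle}{\langle\gamma_k,1\rangle} \Longrightarrow N(0,V_k(\phi)) \quad\text{and}\quad \sqrt{N}\,\langle\mu_k^N - \mu_k,\phi\rangle \Longrightarrow N(0,v_k(\phi))$$
with $v_k(\phi) = V_k(\phi - \langle\mu_k,\phi\rangle)$, can be obtained by induction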
Some algorithmic variants
- regularization
- progressive weighting, MCMC iterations
- sample size adaptation
- marginalization aka Rao Blackwellization
- interacting Kalman filters
- interacting finite state (Baum) filters
80 (marginalization aka Rao Blackwellization : 1)
conditioning as a variance reduction technique
if
$$F(x^2) = E[f(X^1,X^2) \mid X^2 = x^2] = \int_{E^1} f(x^1,x^2)\,P[X^1 \in dx^1 \mid X^2 = x^2]$$
has an explicit expression, then
$$E[f(X^1,X^2)] = E[\,E[f(X^1,X^2) \mid X^2]\,] = E[F(X^2)]$$
and Monte Carlo estimator
$$\frac{1}{N} \sum_{i=1}^N F(X_i^2) \approx E[F(X^2)] = E[f(X^1,X^2)]$$
where $(X_1^2,\dots,X_N^2)$ is an $N$-sample with same common distribution as $X^2$, has smaller variance than Monte Carlo estimator
$$\frac{1}{N} \sum_{i=1}^N f(X_i^1,X_i^2) \approx E[f(X^1,X^2)]$$
where $((X_1^1,X_1^2),\dots,(X_N^1,X_N^2))$ is an $N$-sample with same common distribution as $(X^1,X^2)$
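The variance reduction can be observed numerically on a toy example. Everything below is an assumption made for illustration: $X^2 \sim N(0,1)$, $X^1$ given $X^2 = x^2$ is $N(x^2,1)$, and $f(x^1,x^2) = (x^1)^2$, so that $F(x^2) = (x^2)^2 + 1$ is explicit and $E[f(X^1,X^2)] = 2$. Function names are hypothetical.

```python
import random
import statistics


def sample_pair(rng):
    """Simulate (X1, X2) with X2 ~ N(0, 1) and X1 | X2 ~ N(X2, 1)."""
    x2 = rng.gauss(0.0, 1.0)
    x1 = rng.gauss(x2, 1.0)
    return x1, x2


def plain_estimate(pairs):
    """Monte Carlo estimator based on f(x1, x2) = x1^2."""
    return sum(x1 * x1 for x1, _ in pairs) / len(pairs)


def rao_blackwell_estimate(pairs):
    """Monte Carlo estimator based on F(x2) = E[X1^2 | X2 = x2] = x2^2 + 1."""
    return sum(x2 * x2 + 1.0 for _, x2 in pairs) / len(pairs)


rng = random.Random(0)
plain, rao_blackwell = [], []
for _ in range(200):                                # independent replications
    pairs = [sample_pair(rng) for _ in range(50)]   # N = 50 samples each
    plain.append(plain_estimate(pairs))
    rao_blackwell.append(rao_blackwell_estimate(pairs))

var_plain = statistics.variance(plain)
var_rb = statistics.variance(rao_blackwell)
print(var_plain, var_rb)
```

In this toy model the exact variances of the two estimators are $8/N$ and $2/N$ respectively, so the empirical variance of the conditioned estimator should come out roughly four times smaller.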
81 (marginalization aka Rao Blackwellization : 2)
1st example : conditionally linear Gaussian systems
$$X_k^L = F_k^L(X_{k-1}^{NL})\,X_{k-1}^L + f_k^L(X_{k-1}^{NL}) + W_k^L$$
$$X_k^{NL} = F_k^{NL}(X_{k-1}^{NL})\,X_{k-1}^L + f_k^{NL}(X_{k-1}^{NL}) + W_k^{NL}$$
$$Y_k = h_k(X_k^{NL}) + V_k$$
clearly
$$E\Big[\phi(X_n^L,X_n^{NL}) \prod_{k=0}^n g_k(X_k^{NL})\Big] = E\Big[E[\phi(X_n^L,X_n^{NL}) \mid X_{0:n}^{NL}] \prod_{k=0}^n g_k(X_k^{NL})\Big]$$
and conditional distribution of so-called linear component $X_n^L$ given so-called nonlinear component sequence $X_{0:n}^{NL}$ is Gaussian, with mean $\hat X_n^{L|NL}$ and covariance matrix $P_n^{L|NL}$ given explicitly, in recursive form, by Kalman filter equations
introduce new hidden state $\{(X_k^{NL},\hat X_k^{L|NL},P_k^{L|NL})\}$ instead of $\{(X_k^L,X_k^{NL})\}$
benefit : explore with particles subspace associated with nonlinear components, and associated with each particle, a Kalman filter estimates linear components
82 (marginalization aka Rao Blackwellization : 3)
2nd example : non linear systems with Markovian switching regimes / modes
$$X_k = f_k(s_{k-1},X_{k-1},W_k)$$
$$Y_k = h_k(X_k) + V_k$$
where regime / mode sequence $\{s_k\}$ forms a Markov chain with finite state space $I$
clearly
$$E\Big[\phi(s_n,X_n) \prod_{k=0}^n g_k(X_k)\Big] = E\Big[E[\phi(s_n,X_n) \mid X_{0:n}] \prod_{k=0}^n g_k(X_k)\Big]$$
and conditional distribution of regime / mode $s_n$ given continuous components sequence $X_{0:n}$ is a finite dimensional probability vector, defined by
$$p_n^i = P[s_n = i \mid X_{0:n}] \quad\text{for any } i \in I$$
given explicitly, in recursive form, by solving Baum forward equation
introduce new hidden state $\{(X_k,p_k)\}$ instead of $\{(s_k,X_k)\}$
benefit : avoid sampling finite state space
Conclusion
particle filtering provides an implementation of Bayesian approach that is
- intuitive, easy to understand and implement
- flexible, adapts to many models, many algorithmic variants available
- numerically efficient, through some selection mechanism
- amenable to mathematical analysis
More informationMarkov Chain Monte Carlo Methods for Stochastic
Markov Chain Monte Carlo Methods for Stochastic Optimization i John R. Birge The University of Chicago Booth School of Business Joint work with Nicholas Polson, Chicago Booth. JRBirge U Florida, Nov 2013
More informationWhy do we care? Measurements. Handling uncertainty over time: predicting, estimating, recognizing, learning. Dealing with time
Handling uncertainty over time: predicting, estimating, recognizing, learning Chris Atkeson 2004 Why do we care? Speech recognition makes use of dependence of words and phonemes across time. Knowing where
More informationAuxiliary Particle Methods
Auxiliary Particle Methods Perspectives & Applications Adam M. Johansen 1 adam.johansen@bristol.ac.uk Oxford University Man Institute 29th May 2008 1 Collaborators include: Arnaud Doucet, Nick Whiteley
More informationData assimilation in high dimensions
Data assimilation in high dimensions David Kelly Courant Institute New York University New York NY www.dtbkelly.com February 12, 2015 Graduate seminar, CIMS David Kelly (CIMS) Data assimilation February
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationLecture 4 October 18th
Directed and undirected graphical models Fall 2017 Lecture 4 October 18th Lecturer: Guillaume Obozinski Scribe: In this lecture, we will assume that all random variables are discrete, to keep notations
More informationINTRODUCTION TO PATTERN RECOGNITION
INTRODUCTION TO PATTERN RECOGNITION INSTRUCTOR: WEI DING 1 Pattern Recognition Automatic discovery of regularities in data through the use of computer algorithms With the use of these regularities to take
More informationProbabilistic Graphical Models
2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector
More information