Advanced Monte Carlo integration methods

P. Del Moral (INRIA team ALEA), INRIA & Bordeaux Mathematical Institute & CMAP, Ecole Polytechnique
MCQMC 2012, Sydney — Sunday tutorial, 12th, 2012

Some hyper-refs:
- Feynman-Kac Formulae. Genealogical and Interacting Particle Systems with Applications, Springer (2004).
- Sequential Monte Carlo Samplers, JRSS B (2006), joint work with Doucet & Jasra.
- On the concentration of interacting processes, Foundations & Trends in Machine Learning (2012), joint work with Hu & Wu. [+ Refs]

More references on the website http://www.math.u-bordeaux1.fr/~delmoral/index.html [+ Links]
Outline
- Some basic notation
- Monte Carlo methods: a brief introduction; the importance sampling trick; the Metropolis-Hastings model; the Gibbs sampler
- Measure valued equations: sequence of target probabilities; probability mass transport models; nonlinear Markov transport models; Boltzmann-Gibbs measures
- Feynman-Kac models: updating-prediction models; path space models; application domains & extensions; a little stochastic analysis
- Interacting Monte Carlo models: interacting/adaptive MCMC methods; mean field interacting particle models; the 4 particle estimates
Basic notation

$\mathcal{P}(E)$: probability measures on $E$; $\mathcal{B}(E)$: bounded functions on $E$.

For $(\mu, f) \in \mathcal{P}(E) \times \mathcal{B}(E)$: $\mu(f) = \int \mu(dx)\, f(x)$.

Integral operators $Q(x_1, dx_2)$, $x_1 \in E_1$, $x_2 \in E_2$:
$Q(f)(x_1) = \int Q(x_1, dx_2)\, f(x_2)$ and $(\mu Q)(dx_2) = \int \mu(dx_1)\, Q(x_1, dx_2)$
(with the duality $(\mu Q)(f) = \mu(Q(f))$).

Boltzmann-Gibbs transformation (Bayes-type updating rule), for a positive and bounded potential function $G$:
$\mu(dx) \mapsto \Psi_G(\mu)(dx) = \frac{1}{\mu(G)}\, G(x)\, \mu(dx)$.
Basic notation

For a finite state space $E = \{1, \dots, d\}$, integral operations = matrix operations. Indeed, with the row vector $\mu = [\mu(1), \dots, \mu(d)]$, the matrix $Q = (Q(i,j))_{1 \le i,j \le d}$ and the column vector $f = [f(1), \dots, f(d)]^\top$:
$(\mu Q)(j) = \sum_i \mu(i)\, Q(i,j)$ and $Q(f)(i) = \sum_j Q(i,j)\, f(j)$
... and of course the duality formula $\mu(f) = \sum_i \mu(i)\, f(i)$.
Monte Carlo methods — a brief introduction, the importance sampling trick, the Metropolis-Hastings model, the Gibbs sampler
Introduction

Objective: given a target probability $\eta(dx)$, compute the map $\eta : f \mapsto \eta(f) = \int f(x)\, \eta(dx)$.

Monte Carlo methods: $\eta^N := \frac{1}{N} \sum_{1 \le i \le N} \delta_{X^i} \xrightarrow[N \to \infty]{} \eta$, with two traditional types of samples:
- $(X^i)_{i \ge 1}$ i.i.d. with common law $\eta$
- $(X^i)_{i \ge 1}$ a Markov chain with invariant measure $\eta$
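The crude Monte Carlo estimator above can be sketched in a few lines (an illustration added to these notes, not in the original slides; the target $\eta$ is taken as the uniform law on $[0,1]$ and $f(x) = x^2$, so that $\eta(f) = 1/3$):

```python
import random

def monte_carlo(f, sampler, N):
    """Crude Monte Carlo estimate of eta(f) from N i.i.d. samples of eta."""
    return sum(f(sampler()) for _ in range(N)) / N

# Illustrative target: eta = uniform on [0,1], f(x) = x^2, eta(f) = 1/3.
random.seed(0)
est = monte_carlo(lambda x: x * x, random.random, 200_000)
```

The same driver covers the second type of samples: replace `sampler` by successive draws from an $\eta$-invariant Markov kernel.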
The importance sampling trick ($f > 0$)

Key formula: $\eta(f) = \int f(x)\, \frac{d\eta}{d\pi}(x)\, \pi(dx) = \pi(f g)$ with $g = \frac{d\eta}{d\pi}$.

Sample $X^i$ i.i.d. with common law $\pi$ and set $\eta^N = \frac{1}{N} \sum_{1 \le i \le N} g(X^i)\, \delta_{X^i}$.

Note that $N\,\mathrm{Var}\left( \eta^N(f) \right) = \eta(g\, f^2) - \eta(f)^2 = 0$ for the updated/twisted measure
$\pi(dx) = \Psi_f(\eta)(dx) := \frac{1}{\eta(f)}\, f(x)\, \eta(dx)$ (so that $g = \eta(f)/f$).
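A minimal sketch of the importance sampling trick (illustration only; the pair $\eta = N(0,1)$, $\pi = N(0, \sigma^2)$ and the closed-form density ratio $g = d\eta/d\pi$ are choices made here, not taken from the slides):

```python
import math
import random

def importance_sampling(f, g, pi_sampler, N):
    """Estimate eta(f) = pi(f*g), with g = d(eta)/d(pi), from N i.i.d. pi-samples."""
    total = 0.0
    for _ in range(N):
        x = pi_sampler()
        total += f(x) * g(x)
    return total / N

# Illustrative pair: eta = N(0,1), pi = N(0, sigma^2); the density ratio
# g = d(eta)/d(pi) is available in closed form.
sigma = 2.0
def g(x):
    return sigma * math.exp(-0.5 * x * x + 0.5 * (x / sigma) ** 2)

random.seed(1)
est = importance_sampling(lambda x: x * x, g, lambda: random.gauss(0.0, sigma), 200_000)
# Here eta(f) = E[X^2] = 1 under N(0,1).
```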
The Metropolis-Hastings model

From $x$ we propose $x'$ with law $P(x, dx')$ and accept the move with probability
$a(x, x') := 1 \wedge \frac{\eta(dx')\, P(x', dx)}{\eta(dx)\, P(x, dx')}$.

The resulting Markov transition $M(x, dx')$ is $\eta$-reversible: for $x \neq x'$,
$\eta(dx)\, M(x, dx') = \eta(dx)\, P(x, dx')\, a(x, x') = \left[ \eta(dx)\, P(x, dx') \right] \wedge \left[ \eta(dx')\, P(x', dx) \right] = \eta(dx')\, M(x', dx)$.

Fixed point equation: $\int \eta(dx)\, M(x, dx') = (\eta M)(dx') = \eta(dx')$, i.e. $\eta M = \eta$.
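A minimal Metropolis-Hastings sketch with a symmetric random-walk proposal, so the Hastings ratio reduces to the target density ratio; the standard Gaussian target and unit step size are illustrative choices, not from the slides:

```python
import math
import random

def metropolis_hastings(log_target, proposal, x0, n_steps, rng):
    """Metropolis-Hastings chain targeting exp(log_target); `proposal` must be
    symmetric, so the acceptance ratio reduces to the target density ratio."""
    x = x0
    path = [x]
    for _ in range(n_steps):
        y = proposal(x, rng)
        # accept with probability 1 ^ target(y)/target(x)
        if rng.random() < math.exp(min(0.0, log_target(y) - log_target(x))):
            x = y
        path.append(x)
    return path

# Illustrative target: standard Gaussian, random-walk proposal with step sd 1.
rng = random.Random(2)
chain = metropolis_hastings(lambda x: -0.5 * x * x,
                            lambda x, r: x + r.gauss(0.0, 1.0),
                            0.0, 100_000, rng)
mean = sum(chain) / len(chain)
second_moment = sum(x * x for x in chain) / len(chain)
```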
The Metropolis-Hastings model

Markov chain samples $X_1 \xrightarrow{M} X_2 \xrightarrow{M} X_3 \xrightarrow{M} \cdots \xrightarrow{M} X_{n-1} \xrightarrow{M} X_n \xrightarrow{M} \cdots$
with the Markov transport equation
$\underbrace{P(X_n \in dx_n)}_{=\eta_n(dx_n)} = \int \underbrace{P(X_{n-1} \in dx_{n-1})}_{=\eta_{n-1}(dx_{n-1})}\; \underbrace{P(X_n \in dx_n \mid X_{n-1} = x_{n-1})}_{=M(x_{n-1}, dx_n)}$

Linear measure valued equation: $\eta_n = \eta_{n-1} M \xrightarrow[n \to \infty]{} \eta = \eta M$.
Example 1 — Boltzmann-Gibbs target measure:
$\eta(dx) = \eta_n(dx) = \frac{1}{Z_n}\, e^{-\beta_n V(x)}\, \lambda(dx)$ with $\beta_n = 1/T_n \uparrow \infty$.

The M-H transition $M_n$ s.t. $\eta_n = \eta_n M_n$:
- proposition of moves $P(x, dx')$
- acceptance rate $a_n(x, x') = 1 \wedge \left( e^{-\beta_n [V(x') - V(x)]}\; \frac{\lambda(dx')\, P(x', dx)}{\lambda(dx)\, P(x, dx')} \right)$

If $P$ is $\lambda$-reversible then we have (with $a_+ = \max(a, 0)$)
$a_n(x, x') = e^{-\beta_n [V(x') - V(x)]_+}$
⇒ a stochastic-style steepest descent.

Some mixing problems: $\beta_n$ large ⇒ high rejection / absorption in local minima.
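The stochastic steepest descent above can be sketched as a simulated annealing loop; the quadratic energy $V$, the geometric cooling schedule and the uniform proposal are illustrative assumptions added here:

```python
import math
import random

def anneal(V, betas, x0, steps_per_level, step, rng):
    """Metropolis dynamics with target exp(-beta V(x)) and acceptance
    probability exp(-beta [V(x') - V(x)]_+), over an increasing schedule
    of inverse temperatures beta (a stochastic-style steepest descent)."""
    x = x0
    for beta in betas:
        for _ in range(steps_per_level):
            y = x + rng.uniform(-step, step)  # symmetric (lambda-reversible) proposal
            if rng.random() < math.exp(-beta * max(V(y) - V(x), 0.0)):
                x = y
    return x

# Illustrative energy landscape with minimum at x = 3, geometric cooling.
rng = random.Random(3)
V = lambda x: (x - 3.0) ** 2
x_final = anneal(V, [2.0 ** k for k in range(12)], -5.0, 500, 1.0, rng)
```

Note the mixing problem named on the slide: at the largest $\beta$ values almost every uphill proposal is rejected, which is harmless here but traps the chain in local minima for multimodal $V$.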
Example 2 — restriction probability:
$\eta(dx) = \eta_n(dx) = \frac{1}{Z_n}\, 1_{A_n}(x)\, \lambda(dx)$ with $A_n \downarrow A$.

The M-H transition $M_n$ s.t. $\eta_n = \eta_n M_n$ = a "shaker" of the set $A_n$:
- $\lambda$-reversible propositions $P(x, dx')$
- acceptance iff $x' \in A_n$.

Example of $\lambda$-reversible moves: $\lambda = N(0,1) = \mathrm{Law}(W)$ and
$X' = a\, X + \sqrt{1 - a^2}\; W$, $a \in [0, 1]$.

Some mixing problems: $a$ or $A_n$ too small ⇒ high rejection.
Gibbs samplers ⊂ Metropolis-Hastings models

On product state spaces: $\eta(dx) := \eta(d(x^1, x^2))$ on $E = (E^1 \times E^2)$.

Disintegration formulae:
$\eta(d(x^1, x^2)) = \eta^1(dx^1)\, P^2(x^1, dx^2) = \eta^2(dx^2)\, P^1(x^2, dx^1)$.

Unit acceptance rates: for the move $x = (x^1, x^2) \xrightarrow{P^1} x' = (x'^1, x^2)$ (resampling the first coordinate given the second),
$a^1(x, x') := 1 \wedge \frac{\left[ \eta^2(dx^2)\, P^1(x^2, dx'^1) \right] P^1(x^2, dx^1)}{\left[ \eta^2(dx^2)\, P^1(x^2, dx^1) \right] P^1(x^2, dx'^1)} = 1 = a^2(x, x')$.
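A minimal Gibbs sampler sketch on a product space, for a bivariate Gaussian whose full conditionals $P^1, P^2$ are available in closed form (the correlation value is an illustrative choice added here):

```python
import random

def gibbs_bivariate_normal(rho, n_steps, rng):
    """Systematic-scan Gibbs sampler for a bivariate normal with unit
    marginals and correlation rho: each full conditional is
    X1 | X2 = x2 ~ N(rho*x2, 1 - rho^2), and every move is accepted."""
    x1 = x2 = 0.0
    s = (1.0 - rho * rho) ** 0.5
    samples = []
    for _ in range(n_steps):
        x1 = rng.gauss(rho * x2, s)  # sample from P1(x2, dx1)
        x2 = rng.gauss(rho * x1, s)  # sample from P2(x1, dx2)
        samples.append((x1, x2))
    return samples

rng = random.Random(4)
samples = gibbs_bivariate_normal(0.5, 100_000, rng)
corr = sum(a * b for a, b in samples) / len(samples)
```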
Measure valued equations — sequence of target probabilities, probability mass transport models, nonlinear Markov transport models, Boltzmann-Gibbs measures
Measure valued equations

Measure valued equations = sequence of (target) probabilities (with increasing complexity)
$\eta_0 \to \eta_1 \to \cdots \to \eta_{n-1} \to \eta_n \to \eta_{n+1} \to \cdots$

Examples:
- Boltzmann-Gibbs w.r.t. $\beta_n \uparrow$: $\eta_n(dx) = \frac{1}{Z_n}\, e^{-\beta_n V(x)}\, \lambda(dx)$
- Restriction models w.r.t. $A_n \downarrow$: $\eta_n(dx) = \frac{1}{Z_n}\, 1_{A_n}(x)\, \lambda(dx)$
Probability mass transport models

Reminder — two types of transformations, given a potential function $G(x) \ge 0$ and a Markov transition $M(x, dy)$:
- Boltzmann-Gibbs transformation (= Bayes-type updating rule): $\Psi_G : \eta(dx) \mapsto \Psi_G(\eta)(dx) = \frac{1}{\eta(G)}\, G(x)\, \eta(dx)$
- Markov transport equation: $M : \eta(dx) \mapsto (\eta M)(dx) = \int \eta(dx')\, M(x', dx)$
Key observation — Boltzmann-Gibbs transform = nonlinear Markov transport:
$\Psi_G(\eta) = \eta\, S_{G, \eta}$
with the Markov transition
$S_{G, \eta}(x, dx') = \epsilon G(x)\, \delta_x(dx') + \left( 1 - \epsilon G(x) \right) \Psi_G(\eta)(dx')$
for any $\epsilon \in [0, 1]$ s.t. $\epsilon\, \|G\| \le 1$.

Proof: $S_{G, \eta}(f)(x) = \epsilon G(x) f(x) + (1 - \epsilon G(x))\, \Psi_G(\eta)(f)$, hence
$\eta(S_{G, \eta}(f)) = \epsilon\, \eta(Gf) + (1 - \epsilon\, \eta(G))\, \frac{\eta(Gf)}{\eta(G)} = \frac{\eta(Gf)}{\eta(G)} = \Psi_G(\eta)(f)$.
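On a finite state space the identity $\Psi_G(\eta) = \eta S_{G,\eta}$ can be checked numerically; the following sketch (with illustrative $\eta$, $G$, $\epsilon$ chosen here) computes both sides as vector operations:

```python
def boltzmann_gibbs(eta, G):
    """Psi_G(eta)(j) = G(j) eta(j) / eta(G) on a finite state space."""
    w = [g * m for g, m in zip(G, eta)]
    z = sum(w)
    return [x / z for x in w]

def apply_selection_kernel(eta, G, eps):
    """Compute eta S_{G,eta} with
    S(i, dx) = eps G(i) delta_i(dx) + (1 - eps G(i)) Psi_G(eta)(dx)."""
    psi = boltzmann_gibbs(eta, G)
    d = len(eta)
    out = [0.0] * d
    for i in range(d):
        out[i] += eta[i] * eps * G[i]       # stay-put part
        mass = eta[i] * (1.0 - eps * G[i])  # redistributed part
        for j in range(d):
            out[j] += mass * psi[j]
    return out

eta = [0.2, 0.5, 0.3]
G = [1.0, 2.0, 0.5]
eps = 0.4  # eps * max(G) = 0.8 <= 1
lhs = boltzmann_gibbs(eta, G)
rhs = apply_selection_kernel(eta, G, eps)
```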
Boltzmann-Gibbs measures

$\eta_n(dx) := \frac{1}{Z_n}\, e^{-\beta_n V(x)}\, \lambda(dx)$ with $\beta_n \uparrow$.

For any MCMC transition $M_n$ with target $\eta_n$: $\eta_n = \eta_n M_n$.

Updating of the temperature parameter: $\eta_{n+1} = \Psi_{G_n}(\eta_n)$ with $G_n = e^{-(\beta_{n+1} - \beta_n) V}$.
Proof: $e^{-\beta_{n+1} V} = e^{-(\beta_{n+1} - \beta_n) V}\, e^{-\beta_n V}$.

Consequence: $\eta_{n+1} = \eta_{n+1} M_{n+1} = \Psi_{G_n}(\eta_n)\, M_{n+1}$.
Restriction models

$\eta_n(dx) := \frac{1}{Z_n}\, 1_{A_n}(x)\, \lambda(dx)$ with $A_n \downarrow$.

For any MCMC transition $M_n$ with target $\eta_n$: $\eta_n = \eta_n M_n$.

Updating of the subset: $\eta_{n+1} = \Psi_{G_n}(\eta_n)$ with $G_n = 1_{A_{n+1}}$.
Proof: $1_{A_{n+1}} = 1_{A_{n+1}}\, 1_{A_n}$ (since $A_{n+1} \subset A_n$).

Consequence: $\eta_{n+1} = \eta_{n+1} M_{n+1} = \Psi_{G_n}(\eta_n)\, M_{n+1}$.
Product models

$\eta_n(dx) := \frac{1}{Z_n} \left\{ \prod_{p=0}^{n} h_p(x) \right\} \lambda(dx)$ with $h_p \ge 0$.

For any MCMC transition $M_n$ with target $\eta_n$: $\eta_n = \eta_n M_n$.

Updating of the product: $\eta_{n+1} = \Psi_{G_n}(\eta_n)$ with $G_n = h_{n+1}$.
Proof: $\prod_{p=0}^{n+1} h_p = h_{n+1} \left\{ \prod_{p=0}^{n} h_p \right\}$.

Consequence: $\eta_{n+1} = \eta_{n+1} M_{n+1} = \Psi_{G_n}(\eta_n)\, M_{n+1}$.
Feynman-Kac models — updating-prediction models, path space models, application domains & extensions, a little stochastic analysis
Feynman-Kac models

The solution of any measure valued equation of the form $\eta_n = \Psi_{G_{n-1}}(\eta_{n-1})\, M_n$ is given by a normalized Feynman-Kac model
$\eta_n(f) = \gamma_n(f) / \gamma_n(1)$
with the positive measure $\gamma_n$ defined by
$\gamma_n(f) = \mathbb{E}\left( f(X_n) \prod_{p=0}^{n-1} G_p(X_p) \right)$
where $X_n$ stands for the Markov chain with transitions $P(X_n \in dx_n \mid X_{n-1} = x_{n-1}) = M_n(x_{n-1}, dx_n)$.
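On a finite state space the unnormalized flow $\gamma_n$ can be computed exactly by the matrix recursion $\gamma_{n+1} = (\gamma_n \cdot G)\, M$ (cf. the matrix notation slide); the two-state chain and potential below are illustrative choices added here:

```python
def feynman_kac_flow(eta0, M, G, n):
    """Exact FK flow on a finite state space: the unnormalized measures obey
    gamma_{n+1}(j) = sum_i gamma_n(i) G(i) M(i, j), gamma_0 = eta0, and
    eta_n = gamma_n / gamma_n(1)."""
    d = len(eta0)
    gamma = list(eta0)
    for _ in range(n):
        weighted = [gamma[i] * G[i] for i in range(d)]
        gamma = [sum(weighted[i] * M[i][j] for i in range(d)) for j in range(d)]
    z = sum(gamma)                 # gamma_n(1) = Z_n
    eta = [x / z for x in gamma]   # normalized measure eta_n
    return gamma, eta, z

# Illustrative 2-state chain; the potential G penalizes state 1.
eta0 = [0.5, 0.5]
M = [[0.9, 0.1], [0.2, 0.8]]
G = [1.0, 0.5]
gamma, eta, z = feynman_kac_flow(eta0, M, G, 5)
```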
Path space models

If we take the historical process $\mathbf{X}_n = (X_0, \dots, X_n) \in \mathbf{E}_n = E^{n+1}$ and $\mathbf{G}_n(\mathbf{X}_n) = G_n(X_n)$, then we have
$\boldsymbol{\gamma}_n(f_n) = \mathbb{E}\left( f_n(\mathbf{X}_n) \prod_{p=0}^{n-1} G_p(X_p) \right)$ and $\boldsymbol{\eta}_n = \mathbb{Q}_n$
with the Feynman-Kac model on path space
$d\mathbb{Q}_n := \frac{1}{\mathcal{Z}_n} \left\{ \prod_{0 \le p < n} G_p(X_p) \right\} \underbrace{d\mathbb{P}_n}_{= \mathrm{Law}(X_0, \dots, X_n)}$

Evolution equation: $\mathbb{Q}_n = \Psi_{\mathbf{G}_{n-1}}(\mathbb{Q}_{n-1})\, \mathbf{M}_n$ with the Markov transition $\mathbf{M}_n$ of the historical process $\mathbf{X}_n$.
Application domains & extensions

Confinements: random walk $X_n \in \mathbb{Z}^d$, $X_0 = 0$, and $G_n := 1_{[-L, L]}$, $L > 0$:
$\mathbb{Q}_n = \mathrm{Law}\left( (X_0, \dots, X_n) \mid X_p \in [-L, L],\ 0 \le p < n \right)$
and $\mathcal{Z}_n = \gamma_n(1) = \mathrm{Proba}\left( X_p \in [-L, L],\ 0 \le p < n \right)$.

Self avoiding walks: $\mathbf{X}_n = (X_0, \dots, X_n)$ and $\mathbf{G}_n(\mathbf{X}_n) = 1_{X_n \notin \{X_0, \dots, X_{n-1}\}}$:
$\mathbb{Q}_n = \mathrm{Law}\left( (X_0, \dots, X_n) \mid X_p \neq X_q,\ 0 \le p < q < n \right)$
and $\mathcal{Z}_n = \gamma_n(1) = \mathrm{Proba}\left( X_p \neq X_q,\ 0 \le p < q < n \right)$.
Filtering: $M_n(x_{n-1}, dx_n) = p(x_n \mid x_{n-1})\, dx_n$ and $G_n(x_n) = p(y_n \mid x_n)$, and then
$\mathbb{Q}_{n+1} = \mathrm{Law}\left( (X_0, \dots, X_{n+1}) \mid Y_0 = y_0, \dots, Y_n = y_n \right)$
$\mathcal{Z}_{n+1} = \gamma_{n+1}(1) = p(y_0, \dots, y_n)$.

Hidden Markov chain models = product models:
$\underbrace{dP(\theta \mid (y_0, \dots, y_n))}_{\eta_n(d\theta)} \propto \prod_{p=0}^{n} \underbrace{p(y_p \mid \theta, (y_0, \dots, y_{p-1}))}_{h_p(\theta)}\; \underbrace{dP(\theta)}_{\lambda(d\theta)}$
Continuous time models: take $X_n := X_{[t_n, t_{n+1})}$ (the path of the process on a time mesh) and
$G_n(X_n) = \exp\left( \int_{t_n}^{t_{n+1}} V_s(X_s)\, ds \right)
\;\Rightarrow\; \prod_{0 \le p < n} G_p(X_p) = \exp\left( \int_{t_0}^{t_n} V_s(X_s)\, ds \right)$

or, for a potential of the form $\exp\left( \int_{t_0}^{t_n} \left[ V_s(X_s)\, ds + W_s(X_s)\, dB_s \right] \right)$, using a simple Euler scheme with $X_{t_p} \simeq X_p$:
$\prod_{0 \le p < n} e^{V_{t_p}(X_p)\, \Delta t + W_{t_p}(X_p)\, \sqrt{\Delta t}\; N_p(0, 1)}$
A little analysis, with 3 key formulae

Time marginal measures = path space measures: taking $\mathbf{X}_n := (X_0, \dots, X_n)$ and $\mathbf{G}_n(\mathbf{X}_n) = G_n(X_n)$ gives $\boldsymbol{\eta}_n = \mathbb{Q}_n$.

Normalizing constants (= free energy models):
$\mathcal{Z}_n = \mathbb{E}\left( \prod_{0 \le p < n} G_p(X_p) \right) = \prod_{0 \le p < n} \eta_p(G_p)$

Proof (using $\mathcal{Z}_n = \gamma_n(1)$):
$\gamma_{n+1}(1) = \mathbb{E}\left( G_n(X_n) \prod_{0 \le p \le n-1} G_p(X_p) \right) = \gamma_n(G_n) = \eta_n(G_n)\, \gamma_n(1)$.
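The product formula $\mathcal{Z}_n = \prod_p \eta_p(G_p)$ can be verified against brute-force path enumeration on a small finite chain (the two-state example is an illustrative choice added here):

```python
from itertools import product

def z_by_paths(eta0, M, G, n):
    """Z_n = E[prod_{p<n} G(X_p)] by exact enumeration of every path."""
    d = len(eta0)
    total = 0.0
    for path in product(range(d), repeat=n + 1):
        prob = eta0[path[0]]
        for p in range(n):
            prob *= M[path[p]][path[p + 1]]
        weight = 1.0
        for p in range(n):
            weight *= G[path[p]]
        total += prob * weight
    return total

def z_by_product(eta0, M, G, n):
    """Z_n = prod_{p<n} eta_p(G) via the recursion eta_{p+1} = Psi_G(eta_p) M."""
    d = len(eta0)
    eta, z = list(eta0), 1.0
    for _ in range(n):
        zg = sum(eta[i] * G[i] for i in range(d))
        z *= zg
        psi = [eta[i] * G[i] / zg for i in range(d)]
        eta = [sum(psi[i] * M[i][j] for i in range(d)) for j in range(d)]
    return z

eta0 = [0.5, 0.5]
M = [[0.9, 0.1], [0.2, 0.8]]
G = [1.0, 0.5]
z_paths = z_by_paths(eta0, M, G, 4)
z_prod = z_by_product(eta0, M, G, 4)
```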
The last key — backward Markov models

$\mathbb{Q}_n(d(x_0, \dots, x_n)) \propto \eta_0(dx_0)\, Q_1(x_0, dx_1) \cdots Q_n(x_{n-1}, dx_n)$
with $Q_n(x_{n-1}, dx_n) := G_{n-1}(x_{n-1})\, M_n(x_{n-1}, dx_n)$.

If we assume the density hypothesis $Q_n(x_{n-1}, dx_n) = H_n(x_{n-1}, x_n)\, \nu_n(dx_n)$ and set
$\eta_{n+1}(dx) = \frac{1}{\eta_n(G_n)}\, \eta_n(H_{n+1}(\cdot, x))\, \nu_{n+1}(dx)$
$M_{n+1, \eta_n}(x_{n+1}, dx_n) = \frac{\eta_n(dx_n)\, H_{n+1}(x_n, x_{n+1})}{\eta_n(H_{n+1}(\cdot, x_{n+1}))}$
then we find the backward equation
$\eta_{n+1}(dx_{n+1})\, M_{n+1, \eta_n}(x_{n+1}, dx_n) = \frac{1}{\eta_n(G_n)}\, \eta_n(dx_n)\, Q_{n+1}(x_n, dx_{n+1})$.
The last key (continued)

$\mathbb{Q}_n(d(x_0, \dots, x_n)) \propto \eta_0(dx_0)\, Q_1(x_0, dx_1) \cdots Q_n(x_{n-1}, dx_n)$ and
$\eta_{n+1}(dx_{n+1})\, M_{n+1, \eta_n}(x_{n+1}, dx_n) \propto \eta_n(dx_n)\, Q_{n+1}(x_n, dx_{n+1})$

⇒ Backward Markov chain model:
$\mathbb{Q}_n(d(x_0, \dots, x_n)) = \eta_n(dx_n)\, M_{n, \eta_{n-1}}(x_n, dx_{n-1}) \cdots M_{1, \eta_0}(x_1, dx_0)$
with the dual/backward Markov transitions
$M_{n+1, \eta_n}(x_{n+1}, dx_n) \propto \eta_n(dx_n)\, H_{n+1}(x_n, x_{n+1})$.
Interacting Monte Carlo models — interacting/adaptive MCMC methods, mean field interacting particle models, the 4 particle estimates
Interacting Monte Carlo models

Objective: solve a nonlinear measure valued equation $\eta_{n+1} = \Phi_{n+1}(\eta_n)$.

Two classes of interacting Monte Carlo models:
- Interacting/adaptive MCMC methods
- Mean field / interacting particle systems
Interacting/Adaptive MCMC methods

At every time $n$, run an interacting sequence of MCMC algorithms:
$X_n^1,\ X_n^2,\ \dots,\ X_n^j,\ \dots$ [target $\eta_n$]
$X_{n+1}^1,\ X_{n+1}^2,\ \dots,\ X_{n+1}^j,\ X_{n+1}^{j+1},\ \dots$ [target $\eta_{n+1}$]
s.t. the elementary transition $X_{n+1}^j \xrightarrow{M^{[n+1]}_\eta} X_{n+1}^{j+1}$ depends on the occupation measure $\eta = \eta_n^j := \frac{1}{j} \sum_{1 \le i \le j} \delta_{X_n^i} \xrightarrow[j \to \infty]{} \eta_n$.

A single fixed point compatibility condition: $\Phi_{n+1}(\eta)\, M^{[n+1]}_\eta = \Phi_{n+1}(\eta)$.

References:
- A Functional Central Limit Theorem for a Class of Interacting MCMC Models, EJP (2009), joint work with Bercu & Doucet.
- Sequentially Interacting Markov chain Monte Carlo, AoS (2010), joint work with Brockwell & Doucet.
- Interacting MCMC methods for solving nonlinear measure-valued equations, AAP (2010), joint work with Doucet.
- Fluctuations of Interacting MCMC Models, SPA (2012), joint work with Bercu & Doucet.
Mean field interacting particle models

Key idea: the solution of any measure valued process $\eta_n = \Phi_n(\eta_{n-1})$ can be seen as the law $\eta_n = \mathrm{Law}(\overline{X}_n)$ of a Markov chain $\overline{X}_n$, for some transitions
$P\left( \overline{X}_n \in dx_n \mid \overline{X}_{n-1} = x_{n-1} \right) = K_{n, \eta_{n-1}}(x_{n-1}, dx_n)$.

Notice that $\overline{X}_n$ = a perfect sampler.

Example — the Feynman-Kac updating-prediction mapping:
$\Phi_n(\eta) = \Psi_{G_{n-1}}(\eta)\, M_n = \eta\, \underbrace{S_{G_{n-1}, \eta}\, M_n}_{K_{n, \eta}} = \eta\, K_{n, \eta}$.
Mean field particle interpretation

We approximate the exact/perfect transitions $\overline{X}_n \to \overline{X}_{n+1} \sim K_{n+1, \eta_n}(\overline{X}_n, dx_{n+1})$ by running a Markov chain $\xi_n = (\xi_n^1, \dots, \xi_n^N) \in E_n^N$ s.t.
$\eta_n^N := \frac{1}{N} \sum_{1 \le i \le N} \delta_{\xi_n^i} \xrightarrow[N \to \infty]{} \eta_n$
with particle transitions ($1 \le i \le N$)
$\xi_n^i \to \xi_{n+1}^i \sim K_{n+1, \eta_n^N}(\xi_n^i, dx_{n+1})$.
Discrete generation mean field particle model

Schematic picture: $\xi_n = (\xi_n^1, \dots, \xi_n^N) \in E_n^N \xrightarrow{K_{n+1, \eta_n^N}} \xi_{n+1} = (\xi_{n+1}^1, \dots, \xi_{n+1}^N) \in E_{n+1}^N$, coordinate-wise.

Rationale: $\eta_n^N \simeq_{N \uparrow \infty} \eta_n \;\Rightarrow\; K_{n+1, \eta_n^N} \simeq_{N \uparrow \infty} K_{n+1, \eta_n} \;\Rightarrow\; \xi_{n+1}^i$ almost i.i.d. copies of $\overline{X}_{n+1} \;\Rightarrow\; \eta_{n+1}^N \simeq_{N \uparrow \infty} \eta_{n+1}$.
Updating-prediction models ⇒ genetic particle model:
selection $\xi_n \xrightarrow{S_{G_n, \eta_n^N}} \hat{\xi}_n$, then mutation $\hat{\xi}_n \xrightarrow{M_{n+1}} \xi_{n+1}$ (coordinate-wise on $(\xi_n^1, \dots, \xi_n^N)$).

Accept/reject/recycling/selection transition:
$S_{G_n, \eta_n^N}(\xi_n^i, dx) := \epsilon_n G_n(\xi_n^i)\, \delta_{\xi_n^i}(dx) + \left( 1 - \epsilon_n G_n(\xi_n^i) \right) \sum_{j=1}^{N} \frac{G_n(\xi_n^j)}{\sum_{k=1}^{N} G_n(\xi_n^k)}\, \delta_{\xi_n^j}(dx)$

Ex.: $G_n = 1_A$, $\epsilon_n = 1$ ⇒ accept iff $G_n(\xi_n^i) = 1_A(\xi_n^i) = 1$.

FK mean field particle models = sequential Monte Carlo, population Monte Carlo, genetic algorithms, particle filters, pruning, spawning, reconfiguration, quantum Monte Carlo, go-with-the-winner, ...
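A minimal selection-mutation particle sketch (using plain multinomial selection, i.e. the $\epsilon_n = 0$ instance of the transition above); the confined random walk and Gaussian-type potential are illustrative choices added here:

```python
import math
import random

def genetic_particle_model(M_sample, G, xi0, n_steps, rng):
    """Mean field particle approximation of the FK flow: multinomial
    selection proportional to G (the eps_n = 0 case of S_{G, eta^N}),
    then mutation with the Markov kernel. Also accumulates the
    free energy estimate Z_n^N = prod_p eta_p^N(G_p)."""
    xi = list(xi0)
    N = len(xi)
    log_z = 0.0
    for _ in range(n_steps):
        weights = [G(x) for x in xi]
        log_z += math.log(sum(weights) / N)          # eta_n^N(G_n)
        xi = rng.choices(xi, weights=weights, k=N)   # selection
        xi = [M_sample(x, rng) for x in xi]          # mutation
    return xi, math.exp(log_z)

# Illustrative model: random walk on Z softly confined near 0.
rng = random.Random(5)
G = lambda x: math.exp(-x * x / 10.0)
M_sample = lambda x, r: x + r.choice((-1, 1))
xi, Z = genetic_particle_model(M_sample, G, [0] * 2000, 20, rng)
mean_position = sum(xi) / len(xi)
```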
Graphical illustration (sequence of figures): evolution of the particle approximation $\eta_n^N := \frac{1}{N} \sum_{1 \le i \le N} \delta_{\xi_n^i} \simeq \eta_n$ through successive selection-mutation generations.
The 4 particle estimates

Genealogical tree evolution (figure: $(N, n) = (3, 3)$).

Individuals in the current population = almost i.i.d. samples w.r.t. the FK marginal measures $\eta_n$:
$\eta_n^N := \frac{1}{N} \sum_{1 \le i \le N} \delta_{\xi_n^i} \xrightarrow[N \to \infty]{} \eta_n$ = solution of a nonlinear measure valued process.
Two more particle estimates

Ancestral lines = almost i.i.d. sampled paths w.r.t. $\mathbb{Q}_n$:
$\left( \xi_{0,n}^i, \xi_{1,n}^i, \dots, \xi_{n,n}^i \right)$ := ancestral line of the $i$-th current individual $\xi_n^i = \xi_{n,n}^i$, and
$\frac{1}{N} \sum_{1 \le i \le N} \delta_{(\xi_{0,n}^i, \xi_{1,n}^i, \dots, \xi_{n,n}^i)} \xrightarrow[N \to \infty]{} \mathbb{Q}_n$.

Unbiased particle free energy functions:
$\mathcal{Z}_n^N = \prod_{0 \le p < n} \eta_p^N(G_p) \xrightarrow[N \to \infty]{} \mathcal{Z}_n = \prod_{0 \le p < n} \eta_p(G_p)$.
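The unbiasedness of $\mathcal{Z}_n^N$ can be illustrated on a finite chain, comparing the average of independent particle estimates with the exact product $\prod_p \eta_p(G_p)$ (the two-state model is an illustrative choice added here):

```python
import random

def z_exact(eta0, M, G, n):
    """Exact Z_n = prod_{p<n} eta_p(G) via eta_{p+1} = Psi_G(eta_p) M."""
    d = len(eta0)
    eta, z = list(eta0), 1.0
    for _ in range(n):
        zg = sum(eta[i] * G[i] for i in range(d))
        z *= zg
        psi = [eta[i] * G[i] / zg for i in range(d)]
        eta = [sum(psi[i] * M[i][j] for i in range(d)) for j in range(d)]
    return z

def z_particle(eta0, M, G, N, n, rng):
    """One N-particle selection-mutation run, returning Z_n^N."""
    states = list(range(len(eta0)))
    xi = rng.choices(states, weights=eta0, k=N)
    z = 1.0
    for _ in range(n):
        w = [G[x] for x in xi]
        z *= sum(w) / N                                          # eta_p^N(G)
        xi = rng.choices(xi, weights=w, k=N)                     # selection
        xi = [rng.choices(states, weights=M[x])[0] for x in xi]  # mutation
    return z

rng = random.Random(6)
eta0, M, G = [0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], [1.0, 0.5]
exact = z_exact(eta0, M, G, 4)
avg = sum(z_particle(eta0, M, G, 200, 4, rng) for _ in range(300)) / 300
```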
... and the last particle measure

$\mathbb{Q}_n^N(d(x_0, \dots, x_n)) := \eta_n^N(dx_n)\, M_{n, \eta_{n-1}^N}(x_n, dx_{n-1}) \cdots M_{1, \eta_0^N}(x_1, dx_0)$
with the random particle matrices
$M_{n+1, \eta_n^N}(x_{n+1}, dx_n) \propto \eta_n^N(dx_n)\, H_{n+1}(x_n, x_{n+1})$.

Example — normalized additive functionals: for $f_n(x_0, \dots, x_n) = \frac{1}{n+1} \sum_{0 \le p \le n} f_p(x_p)$,
$\mathbb{Q}_n^N(f_n) = \frac{1}{n+1} \sum_{0 \le p \le n} \underbrace{\eta_n^N\, M_{n, \eta_{n-1}^N} \cdots M_{p+1, \eta_p^N}(f_p)}_{\text{matrix operations}}$.