Mean field simulation for Monte Carlo integration, Part II: Feynman-Kac models
P. Del Moral, INRIA Bordeaux & Inst. Maths. Bordeaux & CMAP Polytechnique
Lectures, INLN CNRS & Nice Sophia Antipolis Univ., 2012

Some hyper-refs:
- Mean field simulation for Monte Carlo integration. Chapman & Hall, Maths & Stats [600p.] (May 2013).
- Feynman-Kac formulae. Genealogical & interacting particle systems with applications. Springer [573p.] (2004).
- Particle approximations of Lyapunov exponents connected to Schrödinger operators and Feynman-Kac semigroups. ESAIM-P&S (2003) (joint work with L. Miclo).
- Coalescent tree based functional representations for some Feynman-Kac particle models. Annals of Applied Probability (2009) (joint work with F. Patras, S. Rubenthaler).
- On the concentration of interacting processes. Foundations & Trends in Machine Learning [170p.] (2012) (joint work with P. Hu & L.M. Wu).

More references on the website http://www.math.u-bordeaux1.fr/~delmoral/index.html [+ Links]
Contents

Introduction: Mean field simulation; Interacting jumps models
Feynman-Kac models: Path integration models; The 3 keys formulae; Stability properties
Some illustrations (Part III): Markov restrictions; Multilevel splitting; Absorbing Markov chains; Quasi-invariant measures; Gradient of Markov semigroups
Some bad tempting ideas
Interacting particle interpretations: Nonlinear evolution equation; Mean field particle models; Graphical illustration; Island particle models
Concentration inequalities: Current population models; Particle free energy/genealogical tree models; Backward particle models
Introduction: Mean field simulation; Interacting jumps models
Feynman-Kac models
Some illustrations (Part III)
Some bad tempting ideas
Interacting particle interpretations
Concentration inequalities
Introduction

Mean field simulation = universal adaptive & interacting sampling technique.

Part I: 2 types of stochastic interacting particle models:
- Diffusive particle models with mean field drifts [McKean-Vlasov style]
- Interacting jump particle models [Boltzmann & Feynman-Kac style]
Part II: Interacting jumps models

Interacting jumps = recycling transitions = discrete generation models (geometric jump times)
Equivalent particle algorithms

Sequential Monte Carlo    | Sampling             | Resampling
Particle filters          | Prediction           | Updating
Genetic algorithms        | Mutation             | Selection
Evolutionary population   | Exploration          | Branching-selection
Diffusion Monte Carlo     | Free evolutions      | Absorption
Quantum Monte Carlo       | Walkers motions      | Reconfiguration
Sampling algorithms       | Transition proposals | Accept-reject-recycle

More lively buzzwords: bootstrapping, spawning, cloning, pruning, replenishing, splitting, condensation, resampled Monte Carlo, enrichment, go with the winner, subset simulation, rejection and weighting, look-ahead sampling, pilot exploration, ...
A single stochastic model: particle interpretation of Feynman-Kac path integrals
Introduction
Feynman-Kac models: Path integration models; The 3 keys formulae; Stability properties
Some illustrations (Part III)
Some bad tempting ideas
Interacting particle interpretations
Concentration inequalities
Feynman-Kac models

FK models = Markov chain X_n ∈ E_n + functions G_n : E_n → [0, ∞):

  dQ_n := (1/Z_n) { ∏_{0≤p<n} G_p(X_p) } dP_n   with   P_n = Law(X_0, ..., X_n)

Flow of n-marginals:

  η_n(f) = γ_n(f)/γ_n(1)   with   γ_n(f) := E[ f(X_n) ∏_{0≤p<n} G_p(X_p) ]

Evolution equations: with M_n the Markov transition of X_n and Q_{n+1}(x, dy) = G_n(x) M_{n+1}(x, dy),

  γ_{n+1} = γ_n Q_{n+1}   and   η_{n+1} = Ψ_{G_n}(η_n) M_{n+1}
The 3 keys formulae

Time marginal measures = path space measures:

  γ_n(f_n) = E[ f_n(𝐗_n) ∏_{0≤p<n} 𝐆_p(𝐗_p) ]
  with the path chain 𝐗_n := (X_0, ..., X_n) and 𝐆_n(𝐗_n) := G_n(X_n)  ⇒  η_n = Q_n

Normalizing constants (= free energy models):

  Z_n = E[ ∏_{0≤p<n} G_p(X_p) ] = ∏_{0≤p<n} η_p(G_p)
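On a finite state space both sides of the normalizing-constant identity can be evaluated exactly with matrix algebra. A minimal sketch (the kernel M and potentials G below are random illustrative choices, not from the lectures) checking Z_n = E[∏ G_p(X_p)] = ∏ η_p(G_p):

```python
import numpy as np

rng = np.random.default_rng(4)
K, n = 6, 8
M = rng.random((K, K)); M /= M.sum(axis=1, keepdims=True)   # Markov kernel
G = rng.random((n, K))                                       # potentials G_p > 0
eta0 = np.ones(K) / K

# Z_n via the unnormalized flow gamma_{p+1} = gamma_p Q_{p+1},
# with Q_{p+1}(x, dy) = G_p(x) M(x, dy)
gamma = eta0.copy()
for p in range(n):
    gamma = (gamma * G[p]) @ M
Zn_direct = gamma.sum()

# Same quantity as the product of the normalized one-step terms eta_p(G_p)
eta, Zn_prod = eta0.copy(), 1.0
for p in range(n):
    ZG = eta @ G[p]
    Zn_prod *= ZG                    # eta_p(G_p)
    eta = (eta * G[p]) @ M / ZG      # eta_{p+1} = Psi_{G_p}(eta_p) M
```

The two computations are algebraically identical, which is exactly the telescoping behind the unbiased particle estimate of Z_n later in the lecture.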
The last key: backward Markov models

With Q_n(d(x_0, ..., x_n)) ∝ η_0(dx_0) Q_1(x_0, dx_1) ... Q_n(x_{n-1}, dx_n), where Q_n(x_{n-1}, dx_n) := G_{n-1}(x_{n-1}) M_n(x_{n-1}, dx_n).

If we assume (hyp) Q_n(x_{n-1}, dx_n) = H_n(x_{n-1}, x_n) ν_n(dx_n) and set

  η_{n+1}(dx) = (1/η_n(G_n)) η_n(H_{n+1}(·, x)) ν_{n+1}(dx)

  M_{n+1,η_n}(x_{n+1}, dx_n) = η_n(dx_n) H_{n+1}(x_n, x_{n+1}) / η_n(H_{n+1}(·, x_{n+1}))

then we find the backward equation

  η_{n+1}(dx_{n+1}) M_{n+1,η_n}(x_{n+1}, dx_n) = (1/η_n(G_n)) η_n(dx_n) Q_{n+1}(x_n, dx_{n+1})
The last key (continued)

Combining Q_n(d(x_0, ..., x_n)) ∝ η_0(dx_0) Q_1(x_0, dx_1) ... Q_n(x_{n-1}, dx_n) with η_{n+1}(dx_{n+1}) M_{n+1,η_n}(x_{n+1}, dx_n) ∝ η_n(dx_n) Q_{n+1}(x_n, dx_{n+1}) yields the backward Markov chain model:

  Q_n(d(x_0, ..., x_n)) = η_n(dx_n) M_{n,η_{n-1}}(x_n, dx_{n-1}) ... M_{1,η_0}(x_1, dx_0)

with the dual/backward Markov transitions

  M_{n+1,η_n}(x_{n+1}, dx_n) ∝ η_n(dx_n) H_{n+1}(x_n, x_{n+1})
Stability properties

Transition/excursion/path spaces:

  𝐗_n = (X_n, X_{n+1})    𝐗_n = X_{[T_n, T_{n+1}]}    𝐗_n = (X_0, ..., X_n)

Continuous time models (Langevin diffusions):

  𝐗_n = X_{[t_n, t_{n+1}]}   &   G_n(𝐗_n) = exp( - ∫_{t_n}^{t_{n+1}} V_t(X_t) dt )

OR Euler schemes (Langevin diffusions ≈ Metropolis-Hastings moves) OR fully continuous time particle models (Schrödinger operators).

Important observation:

  γ_t(1) = E[ exp( - ∫_0^t V_s(X_s) ds ) ]

  d/dt γ_t(f) = γ_t(L^V_t(f))   with   L^V_t = L_t - V_t

  ⇒ γ_t(1) = exp( - ∫_0^t η_s(V_s) ds )   with   η_t = γ_t / γ_t(1)
Stability properties

Change of probability measures / importance sampling (IS) / sequential Monte Carlo methods (SMC): for any target probability measures of the form

  Q_{n+1}(d(x_0, ..., x_{n+1})) ∝ Q_n(d(x_0, ..., x_n)) × Q_{n+1}(x_n, dx_{n+1})
                                ∝ η_0(dx_0) Q_1(x_0, dx_1) ... Q_{n+1}(x_n, dx_{n+1})

and any Markov transition M_{n+1} s.t. Q_{n+1}(x_n, ·) ≪ M_{n+1}(x_n, ·):

  G_n(x_n, x_{n+1}) = [Target at time (n+1)] / [Target at time (n) × Twisted transition]
                    = dQ_{n+1}(x_n, ·)/dM_{n+1}(x_n, ·) (x_{n+1})

⇒ Feynman-Kac model with 𝐗_n = (X_n, X_{n+1}):

  Q_n = (1/Z_n) { ∏_{0≤p<n} G_p(𝐗_p) } dP_n   with   P_n = Law(𝐗_0, ..., 𝐗_n)
Introduction
Feynman-Kac models
Some illustrations (Part III): Markov restrictions; Multilevel splitting; Absorbing Markov chains; Quasi-invariant measures; Gradient of Markov semigroups
Some bad tempting ideas
Interacting particle interpretations
Concentration inequalities
Markov restrictions

Confinements: X_n a random walk on Z^d, A ⊂ Z^d, and G_n := 1_A:

  Q_n = Law( (X_0, ..., X_n) | X_p ∈ A, 0 ≤ p < n )
  Z_n = Proba( X_p ∈ A, 0 ≤ p < n )

Self-avoiding walks (SAW): 𝐗_n = (X_p)_{0≤p≤n} and G_n(𝐗_n) = 1_{X_n ∉ {X_0, ..., X_{n-1}}}:

  Q_n = Law( (X_0, ..., X_n) | X_p ≠ X_q, 0 ≤ p < q < n )
  Z_n = Proba( X_p ≠ X_q, 0 ≤ p < q < n )
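For the confinement example, Z_n is computable in closed form on a finite box by iterating the killed (sub-stochastic) transition matrix, which gives a ground truth for the particle estimates discussed later. A sketch for the 1D walk (the box {-3, ..., 3} is an illustrative choice, not from the slides):

```python
import numpy as np

# Confinement of a simple random walk on Z in A = {-3,...,3} (illustrative box).
# Z_n = Proba(X_p in A, 0 <= p < n) is the mass left after iterating the
# killed (sub-stochastic) transition matrix n-1 times, starting from X_0 = 0.
a = 3
K = 2 * a + 1                       # states -a, ..., a
M = np.zeros((K, K))
for i in range(K):
    for j in (i - 1, i + 1):
        if 0 <= j < K:              # steps leaving A are killed (mass lost)
            M[i, j] = 0.5

def Z(n):
    v = np.zeros(K)
    v[a] = 1.0                      # start at 0 (index a), which lies in A
    for _ in range(n - 1):          # constrain times 1, ..., n-1
        v = v @ M
    return v.sum()
```

Z(n) decays geometrically at a rate given by the top eigenvalue of the killed matrix, which is the quasi-stationary picture of the absorbing-chain slides below.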
Multilevel splitting

Decreasing level sets A_n ↓, with B a non-critical recurrent subset:

  T_n := inf { t ≥ T_{n-1} : X_t ∈ (A_n ∪ B) }

Excursion-valued Feynman-Kac model: 𝐗_n = (X_t)_{t∈[T_n, T_{n+1}]} and G_n(𝐗_n) = 1_{A_{n+1}}(X_{T_{n+1}}):

  Q_n = Law( X_{[0, T_n]} | X hits A_{n-1} before B )
  Z_n = P( X hits A_{n-1} before B )
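A minimal discrete sketch of the splitting idea (a simplification, not the excursion model above): estimate the rare event "a simple walk started at 1 hits level L before 0" by the product of level-crossing fractions. For the symmetric walk the exact answer is 1/L, so the product of stage estimates can be checked; here every stage survivor sits exactly at the crossed level, so the next stage simply restarts from there. All numerical choices (L = 10, N = 1000) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
L, N = 10, 1000                      # target level and walkers per stage

def hit_next_level(start, top):
    """Run N walkers from `start` until each hits `top` or 0; return positions."""
    x = np.full(N, start)
    active = (x > 0) & (x < top)
    while active.any():
        x[active] += rng.choice((-1, 1), size=active.sum())
        active = (x > 0) & (x < top)
    return x

est = 1.0
for level in range(2, L + 1):
    x = hit_next_level(level - 1, level)
    frac = (x == level).mean()       # stage estimate of P(hit `level` before 0)
    est *= frac                      # product of conditional probabilities
# est approximates P(hit L before 0 | start at 1) = 1/L for the symmetric walk
```

Each stage event has probability at least 1/2, so every factor is estimated accurately; the naive direct simulation would need on the order of L samples per success.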
Absorbing Markov chains

X^c_n ∈ E^c_n: absorption w.p. (1 - G_n), then exploration with M_{n+1}:

  Q_n = Law( (X^c_0, ..., X^c_n) | T^abs ≥ n )   &   Z_n = Proba( T^abs ≥ n )

Quasi-invariant measures: (G_n, M_n) = (G, M) with M μ-reversible:

  (1/n) log P( T^abs ≥ n ) →_{n→∞} log λ,  λ = top spectral value of Q(x, dy) = G(x) M(x, dy)   [Frobenius theo.]

  Q(h) = λ h  ⇒  h eigenfunction (ground state)

  P( X^c_n ∈ dx | T^abs > n ) →_{n→∞} (1/μ(h)) h(x) μ(dx)
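The Frobenius picture is easy to check numerically on a small chain: build the sub-stochastic kernel Q = G·M and compare its top eigenvalue with the decay rate of P(T^abs ≥ n). The 7-state walk below is an illustrative instance whose top eigenvalue is cos(π/8); the two-step log-ratio of survival probabilities is used so that the oscillating mode at eigenvalue -λ cancels:

```python
import numpy as np

# Random walk on {0,...,6}, absorbed when it steps outside:
# sub-stochastic kernel Q(x, dy) = G(x) M(x, dy)
K = 7
Q = np.zeros((K, K))
for i in range(K):
    for j in (i - 1, i + 1):
        if 0 <= j < K:
            Q[i, j] = 0.5

lam = np.max(np.linalg.eigvals(Q).real)        # top eigenvalue (Frobenius)

# Survival probabilities P(T_abs >= n) decay at rate lambda:
mu = np.ones(K) / K                            # uniform initial law
surv = lambda n: (mu @ np.linalg.matrix_power(Q, n)).sum()
rate = 0.5 * np.log(surv(22) / surv(20))       # two-step log decay ~ log(lambda)
```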
Doob h-processes

X^h with

  Q_n(d(x_0, ..., x_n)) ∝ P( (X^h_0, ..., X^h_n) ∈ d(x_0, ..., x_n) ) h^{-1}(x_n)

  M_h(x, dy) = (1/λ) h^{-1}(x) Q(x, dy) h(y) = M(x, dy) h(y) / M(h)(x)

Invariant measure: μ_h = μ_h M_h.

Additive functionals: F_n(x_0, ..., x_n) = (1/(n+1)) ∑_{0≤p≤n} f(x_p)  ⇒  Q_n(F_n) →_{n→∞} μ_h(f)

If G = G^θ depends on some θ ∈ R and f := ∂_θ log G^θ:

  ∂_θ log λ^θ  ←_{n→∞}  (1/(n+1)) ∂_θ log Z^θ_{n+1} = Q_n(F_n)
Gradient of Markov semigroups

  X_{n+1}(x) = F_n(X_n(x), W_n)   (X_0(x) = x ∈ R^d)   and   P_n(f)(x) := E( f(X_n(x)) )

First variational equation:

  ∂X_{n+1}/∂x (x) = A_n(X_n(x), W_n) · ∂X_n/∂x (x)   with   A^{(i,j)}_n(x, w) = ∂F^i_n(·, w)/∂x_j (x)

Random process on the sphere (U_0 = u_0 ∈ S^{d-1}):

  U_{n+1} = A_n(X_n, W_n) U_n / ‖A_n(X_n, W_n) U_n‖ = (∂X_{n+1}/∂x)(x) u_0 / ‖(∂X_{n+1}/∂x)(x) u_0‖

Feynman-Kac model 𝐗_n = (X_n, U_n, W_n) and G_n(x, u, w) = ‖A_n(x, w) u‖:

  ∂P_{n+1}(f)(x) · u_0 = E[ F(𝐗_{n+1}) ∏_{0≤p≤n} G_p(𝐗_p) ]

with F(𝐗_{n+1}) = ∇f(X_{n+1}) · U_{n+1} and ∏_{0≤p≤n} G_p(𝐗_p) = ‖(∂X_{n+1}/∂x)(x) u_0‖
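In one dimension the formula reduces to weighting each path by the product of derivatives A_p = ∂F_p/∂x along the trajectory (the sphere S^0 degenerates to a sign, so the signed product plays the role of U_n times ∏ G_p). A sketch with the illustrative dynamics X_{n+1} = sin(X_n) + W_n (not from the lectures), checked against a common-random-number finite difference:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 5, 2000
W = 0.1 * rng.normal(size=(N, n))       # frozen noise (common random numbers)

def flow(x0):
    """Iterate X_{k+1} = sin(X_k) + W_k and the 1D variational equation."""
    X = np.full(N, float(x0))
    jac = np.ones(N)
    for k in range(n):
        jac *= np.cos(X)                # A_k = d/dx sin(x) evaluated at X_k
        X = np.sin(X) + W[:, k]
    return X, jac                       # jac = dX_n/dx along each path

f = np.tanh
x0 = 0.3
X, jac = flow(x0)
grad = np.mean((1.0 - f(X) ** 2) * jac)     # E[ f'(X_n) * dX_n/dx ]

h = 1e-5                                    # finite-difference check, same W
fd = np.mean((f(flow(x0 + h)[0]) - f(flow(x0 - h)[0])) / (2 * h))
```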
Introduction
Feynman-Kac models
Some illustrations (Part III)
Some bad tempting ideas
Interacting particle interpretations
Concentration inequalities
Bad tempting ideas

I.i.d. weighted samples X^i_n:

  Z_n := E[ ∏_{0≤p<n} G_p(X_p) ]  ≈  Z^N_n := (1/N) ∑_{i=1}^N ∏_{0≤p<n} G_p(X^i_p)

or, in terms of killing-absorption models,

  Z_n = P( T ≥ n )  ≈  Z^N_n := (1/N) ∑_{1≤i≤N} 1_{T^i ≥ n}

Example: X_n a simple RW on Z, G_n = 1_{[-10,10]} (killed at the boundary):

  ∃ n = n(ω) :  Z^N_n = 0

and the relative variance blows up:

  N E( [Z^N_n / Z_n - 1]^2 ) = 1/Z_n - 1   with   Z_n = Proba( X_p ∈ A, 0 ≤ p < n ) = P( T ≥ n ) ↓ 0
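The degeneracy is easy to reproduce. The sketch below shrinks the box to A = {-5, ..., 5} so the effect shows up sooner (all sizes are illustrative) and runs the naive i.i.d. killed-walk estimator next to the exact value obtained from the killed transition matrix: the empirical estimate is exactly 0 long before the true Z_n is.

```python
import numpy as np

# Naive i.i.d. estimator of Z_n = P(X_p in A, 0 <= p < n) for a simple
# random walk on Z killed outside A = {-5,...,5}.
rng = np.random.default_rng(1)
N, n, a = 1000, 1000, 5
x = np.zeros(N, dtype=int)
alive = np.ones(N, dtype=bool)
for _ in range(n - 1):
    x[alive] += rng.choice((-1, 1), size=alive.sum())
    alive &= np.abs(x) <= a
naive = alive.mean()               # hits 0 long before the true Z_n vanishes

# Exact Z_n via powers of the killed transition matrix: strictly positive.
K = 2 * a + 1
M = np.zeros((K, K))
for i in range(K):
    for j in (i - 1, i + 1):
        if 0 <= j < K:
            M[i, j] = 0.5
v = np.zeros(K); v[a] = 1.0
for _ in range(n - 1):
    v = v @ M
exact = v.sum()
```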
Our objective

Find an unbiased estimate Z^N_n s.t.

  N E( [Z^N_n / Z_n - 1]^2 ) ≤ c · n

using the multiplicative formula

  E[ ∏_{0≤p<n} G_p(X_p) ] = ∏_{0≤p<n} η_p(G_p)

and estimating/learning each of the (larger) terms in the product:

  η_p(G_p) ≈ η^N_p(G_p)   with   η^N_p = (1/N) ∑_{1≤i≤N} δ_{ξ^i_p}
Introduction
Feynman-Kac models
Some illustrations (Part III)
Some bad tempting ideas
Interacting particle interpretations: Nonlinear evolution equation; Mean field particle models; Graphical illustration; Island particle models
Concentration inequalities
Flow of n-marginals [X_n Markov with transitions M_n]

  η_n(f) = γ_n(f)/γ_n(1)   with   γ_n(f) := E[ f(X_n) ∏_{0≤p<n} G_p(X_p) ]   (γ_n(1) = Z_n)

Nonlinear evolution equation:

  η_{n+1} = Ψ_{G_n}(η_n) M_{n+1}   and   Z_{n+1} = η_n(G_n) Z_n

Nonlinear measure-valued process = law of a Markov chain X̄_n (perfect sampler):

  η_{n+1} = Φ_{n+1}(η_n) = η_n (S_{n,η_n} M_{n+1}) = η_n K_{n+1,η_n} = Law(X̄_{n+1})
Examples related to product models

  η_n(dx) := (1/Z_n) { ∏_{p=0}^n h_p(x) } λ(dx)   with   h_p ≥ 0

2 illustrations:

  h_p(x) = e^{-(β_{p+1} - β_p) V(x)} with β_p ↑  ⇒  η_n(dx) = (1/Z_n) e^{-β_n V(x)} λ(dx)

  h_p(x) = 1_{A_{p+1}}(x) with A_p ↓  ⇒  η_n(dx) = (1/Z_n) 1_{A_n}(x) λ(dx)

For any MCMC transitions M_n with target η_n, we have

  η_{n+1} = η_{n+1} M_{n+1} = Ψ_{h_{n+1}}(η_n) M_{n+1}  ⇒  Feynman-Kac model
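A sketch of the first illustration (tempering) as an SMC sampler: take λ = N(0,1) as reference and V(x) = x²/2, so η_n ∝ e^{-β_n x²/2} λ(dx) and Z_n = (1 + β_n)^{-1/2} in closed form, which the particle product ∏ η^N_p(h_{p+1}) should recover. The schedule, particle number, and the random-walk Metropolis move M_n (which leaves the current η_n invariant) are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4000
betas = np.linspace(0.0, 3.0, 31)           # beta_0 = 0, ..., beta_n = 3
V = lambda x: 0.5 * x ** 2

x = rng.normal(size=N)                      # eta_0 = lambda = N(0,1)
logZ = 0.0
for b0, b1 in zip(betas[:-1], betas[1:]):
    w = np.exp(-(b1 - b0) * V(x))           # h_p(x) = exp(-(b_{p+1}-b_p)V(x))
    logZ += np.log(w.mean())                # accumulates log eta_p(h_{p+1})
    x = x[rng.choice(N, size=N, p=w / w.sum())]          # selection
    # MCMC mutation: random-walk Metropolis with target exp(-(1+b1)x^2/2)
    y = x + 0.5 * rng.normal(size=N)
    accept = rng.random(N) < np.exp(-(1 + b1) * (V(y) - V(x)))
    x = np.where(accept, y, x)

Z_est = np.exp(logZ)                        # should be close to (1+3)**-0.5 = 0.5
```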
Mean field particle model

McKean Markov chain model η_{n+1} = η_n K_{n+1,η_n} = Law(X̄_{n+1})  ⇒  Markov chain ξ_n = (ξ^i_n)_{1≤i≤N} ∈ E^N_n:

  ξ^i_n → ξ^i_{n+1} ~ K_{n+1,η^N_n}(ξ^i_n, dx)   with   η^N_n = (1/N) ∑_{1≤i≤N} δ_{ξ^i_n}

and the (unbiased) particle normalizing constants

  Z^N_{n+1} = η^N_n(G_n) Z^N_n = ∏_{0≤p≤n} η^N_p(G_p)
Mean field particle model

Mean field FK simulation: ξ^i_n → ξ^i_{n+1} ~ K_{n+1,η^N_n} = S_{n,η^N_n} M_{n+1}

- Sequential particle simulation technique (SMC): G_n-acceptance-rejection with recycling + M_{n+1}-propositions
- Genetic type branching particle algorithm (GA): ξ_n --(G_n selection)--> ξ̂_n --(M_{n+1} mutation)--> ξ_{n+1}
- Reconfiguration Monte Carlo (particles = walkers) (QMC): (Selection, Mutation) = (Reconfiguration, Exploration)
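A sketch of the selection-mutation loop on the confinement model of the earlier slides (G_n = 1_A, M_{n+1} = one simple random walk step; box and sizes illustrative), with the unbiased product estimate Z^N_n = ∏ η^N_p(G_p) compared against the exact value from the killed transition matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
a, n, N = 5, 100, 5000
xi = np.zeros(N, dtype=int)                 # all particles at X_0 = 0
logZ = 0.0
for p in range(n):
    w = (np.abs(xi) <= a).astype(float)     # potential G_p = 1_A
    logZ += np.log(w.mean())                # log eta^N_p(G_p)
    if p < n - 1:
        xi = xi[rng.choice(N, size=N, p=w / w.sum())]   # selection
        xi = xi + rng.choice((-1, 1), size=N)           # mutation

# Exact Z_n from the killed transition matrix, for comparison
K = 2 * a + 1
M = np.zeros((K, K))
for i in range(K):
    for j in (i - 1, i + 1):
        if 0 <= j < K:
            M[i, j] = 0.5
v = np.zeros(K); v[a] = 1.0
for _ in range(n - 1):
    v = v @ M
logZ_exact = np.log(v.sum())
```

Where the naive i.i.d. estimator of the "bad tempting ideas" slide returns exactly 0 at this horizon, the interacting estimate stays within a modest relative error of the true (tiny) value.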
Continuous time Feynman-Kac particle models

Master equation:

  η_t(·) = γ_t(·)/γ_t(1) = Law(X̄_t)   with   d/dt η_t(f) = η_t(L_{t,η_t}(f))

(ex.: V_t = U_t ≥ 0)

  L_{t,η_t}(f)(x) = L_t(f)(x) + U_t(x) ∫ (f(y) - f(x)) η_t(dy)
                    [free exploration]  [acceptance rate × interacting jump law]

  ⇔ L_{t,η_t}(f) = L_t(f) - U_t f + U_t η_t(f)
                    [Schrödinger's op.]  [normalizing-stabilizing term]

Particle model: survival-acceptance rates ⇒ recycling jumps
Graphical illustration (sequence of figures): η_n ≈ η^N_n := (1/N) ∑_{1≤i≤N} δ_{ξ^i_n}
How to use the full ancestral tree model?

Under the hypothesis G_{n-1}(x_{n-1}) M_n(x_{n-1}, dx_n) = H_n(x_{n-1}, x_n) ν_n(dx_n):

  Q_n(d(x_0, ..., x_n)) = η_n(dx_n) M_{n,η_{n-1}}(x_n, dx_{n-1}) ... M_{1,η_0}(x_1, dx_0)
  with M_{n,η_{n-1}}(x_n, dx_{n-1}) ∝ η_{n-1}(dx_{n-1}) H_n(x_{n-1}, x_n)

Particle approximation = random stochastic matrices:

  Q^N_n(d(x_0, ..., x_n)) = η^N_n(dx_n) M_{n,η^N_{n-1}}(x_n, dx_{n-1}) ... M_{1,η^N_0}(x_1, dx_0)

Ex.: additive functionals f̄_n(x_0, ..., x_n) = (1/(n+1)) ∑_{0≤p≤n} f_p(x_p):

  Q^N_n(f̄_n) := (1/(n+1)) ∑_{0≤p≤n} η^N_n M_{n,η^N_{n-1}} ... M_{p+1,η^N_p}(f_p)   [matrix operations]
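A sketch of the backward estimate Q^N_n(f̄_n) on the confinement model, where H_{p+1}(x, y) = 1_A(x) · 1_{|x-y|=1} (up to the constant 1/2, which cancels in the normalized backward transitions), compared with the exact smoothed value from forward-backward recursions on the finite grid. Horizon, box, and particle number are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
a, T, N = 5, 29, 2000               # times 0..T, potentials G_p = 1_A at p < T
f = lambda x: x.astype(float) ** 2  # additive functional f_p(x) = x^2

# Forward pass: store the prediction clouds eta^N_p
clouds, xi = [], np.zeros(N, dtype=int)
for p in range(T + 1):
    clouds.append(xi.copy())
    if p < T:
        w = (np.abs(xi) <= a).astype(float)
        xi = clouds[p][rng.choice(N, size=N, p=w / w.sum())]     # selection
        xi = xi + rng.choice((-1, 1), size=N)                    # mutation

# Backward pass: m_p(i) = smoothed weight of particle xi^i_p under Q^N
m = np.full(N, 1.0 / N)             # terminal marginal of Q^N is eta^N_T
est = np.dot(m, f(clouds[T]))
for p in range(T - 1, -1, -1):
    xp, xq = clouds[p], clouds[p + 1]
    H = ((np.abs(xp)[None, :] <= a)
         & (np.abs(xq[:, None] - xp[None, :]) == 1)).astype(float)
    B = H / H.sum(axis=1, keepdims=True)   # backward transitions M_{p+1,eta^N_p}
    m = m @ B
    est += np.dot(m, f(clouds[p]))
est /= T + 1                        # particle estimate of Q^N_n(f-bar)

# Exact value by forward-backward recursions on the grid {-a-1,...,a+1}
S = np.arange(-a - 1, a + 2); K = len(S)
M = np.zeros((K, K))
for i in range(K):
    for j in (i - 1, i + 1):
        if 0 <= j < K:
            M[i, j] = 0.5
g = (np.abs(S) <= a).astype(float)
alpha = [np.zeros(K)]; alpha[0][a + 1] = 1.0        # X_0 = 0
for p in range(T):
    alpha.append((alpha[p] * g) @ M)
beta = [None] * (T + 1); beta[T] = np.ones(K)
for p in range(T - 1, -1, -1):
    beta[p] = g * (M @ beta[p + 1])
exact = np.mean([np.dot(alpha[p] * beta[p], S.astype(float) ** 2)
                 / np.dot(alpha[p], beta[p]) for p in range(T + 1)])
```

The backward pass is the O(N²) "matrix operations" of the slide: each B is a random stochastic matrix built from the stored particle clouds.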
4 particle estimates

- Individuals ξ^i_n, almost iid with law η_n:  η^N_n = (1/N) ∑_{1≤i≤N} δ_{ξ^i_n}
- Path space models: ancestral lines, almost iid with law Q_n:  η̄^N_n := (1/N) ∑_{1≤i≤N} δ_{ancestral line_n(i)}
- Backward particle model:  Q^N_n(d(x_0, ..., x_n)) = η^N_n(dx_n) M_{n,η^N_{n-1}}(x_n, dx_{n-1}) ... M_{1,η^N_0}(x_1, dx_0)
- Normalizing constants (unbiased):  Z_{n+1} = ∏_{0≤p≤n} η_p(G_p)  ≈  Z^N_{n+1} = ∏_{0≤p≤n} η^N_p(G_p)
Island models

Reminder (the unbiased property):

  E[ f_n(X_n) ∏_{0≤p<n} G_p(X_p) ] = E[ η^N_n(f_n) ∏_{0≤p<n} η^N_p(G_p) ] = E[ F_n(𝒳_n) ∏_{0≤p<n} 𝒢_p(𝒳_p) ]

with the island evolution Markov chain model

  𝒳_n := η^N_n   and   𝒢_n(𝒳_n) = η^N_n(G_n) = 𝒳_n(G_n)

⇒ particle model with (𝒳_n, 𝒢_n(𝒳_n)) = interacting island particle model
Some key advantages

Mean field models = stochastic linearization/perturbation technique:

  η^N_n = η^N_{n-1} K_{n,η^N_{n-1}} + (1/√N) V^N_n

with V^N_n → V_n independent centered Gaussian fields.

- η_n = η_{n-1} K_{n,η_{n-1}} stable ⇒ non-propagation of local sampling errors ⇒ uniform control w.r.t. the time horizon
- No burn-in, no need to study the stability of MCMC models
- Stochastic adaptive grid approximation
- Nonlinear system with positive, beneficial interactions
- Simple and natural sampling algorithm
- Local conditional iid samples + stability of nonlinear semigroups ⇒ new concentration and empirical process theory
Introduction
Feynman-Kac models
Some illustrations (Part III)
Some bad tempting ideas
Interacting particle interpretations
Concentration inequalities: Current population models; Particle free energy/genealogical tree models; Backward particle models
Current population models

Constants (c_1, c_2) related to (bias, variance), c a finite constant; test functions/observables ‖f_n‖ ≤ 1; (x ≥ 0, n ≥ 0, N ≥ 1). When E_n = R^d:

  F_n(y) := η_n( 1_{(-∞, y]} )   and   F^N_n(y) := η^N_n( 1_{(-∞, y]} )   with y ∈ R^d

The probability of any of the following events is greater than 1 - e^{-x}:

  | [η^N_n - η_n](f_n) | ≤ (c_1/N)(1 + x + √x) + (c_2/√N) √x

  sup_{0≤p≤n} | [η^N_p - η_p](f_p) | ≤ c √( x log(n+e) / N )

  ‖F^N_n - F_n‖ ≤ c √( d (x+1) / N )
Particle free energy/genealogical tree models

Constants (c_1, c_2) related to (bias, variance); (x ≥ 0, n ≥ 0, N ≥ 1). The probability of any of the following events is greater than 1 - e^{-x}:

  | (1/n) log Z^N_n - (1/n) log Z_n | ≤ (c_1/N)(1 + x + √x) + (c_2/√N) √x

  | [η̄^N_n - Q_n](f̄_n) | ≤ c_1 (n+1)/N (1 + x + √x) + c_2 √( (n+1) x / N )

with η̄^N_n = genealogical tree model (empirical measure in path space)
Backward particle models

Constants (c_1, c_2) related to (bias, variance). For any normalized additive functional f̄_n with ‖f_p‖ ≤ 1, (x ≥ 0, n ≥ 0, N ≥ 1), the probability of the following event is greater than 1 - e^{-x}:

  | [Q^N_n - Q_n](f̄_n) | ≤ (c_1/N)(1 + x + √x) + c_2 √( x / (N(n+1)) )