Sampling multimodal densities in high dimensional sampling space


1 Sampling multimodal densities in high dimensional sampling space. Gersende FORT, LTCI, CNRS & Telecom ParisTech, Paris, France. Journées MAS, Toulouse, August 2014

2 Introduction. Sample from a target distribution π dλ on X ⊆ R^l, where π is (possibly) known only up to a normalizing constant. Hereafter, to simplify the notation, π is assumed to be normalized. In this context, π is multimodal and the dimension l is large. Research guided by computational Bayesian statistics: π is the a posteriori distribution, known up to a normalizing constant. Needed: algorithms to explore π, to compute expectations w.r.t. π, ...

3 Introduction Talk based on joint works with Eric Moulines, Amandine Schreck (Telecom ParisTech) Pierre Priouret (Paris VI) Benjamin Jourdain, Tony Lelièvre, Gabriel Stoltz (ENPC) Estelle Kuhn (INRA)

4 Introduction. Outline: Introduction (Usual Monte Carlo samplers; The proposal mechanism; Adaptive Monte Carlo samplers; Conclusion); Tempering-based Monte Carlo samplers; Biasing Potential-based Monte Carlo sampler; Convergence Analysis

5 Introduction / Usual Monte Carlo samplers. Markov chain Monte Carlo (MCMC): sample a Markov chain (X_k)_k having π as its unique invariant distribution. Approximation: π_n = (1/n) Σ_{k=1}^n δ_{X_k}. Example: Hastings-Metropolis algorithm with proposal kernel q(x,y): given X_k, sample Y ~ q(X_k, ·); accept-reject mechanism: X_{k+1} = Y with probability 1 ∧ [π(Y) q(Y, X_k)] / [π(X_k) q(X_k, Y)], and X_{k+1} = X_k otherwise.

6 Introduction / Usual Monte Carlo samplers. Markov chain Monte Carlo (MCMC): sample a Markov chain (X_k)_k having π as its unique invariant distribution. Approximation: π_n = (1/n) Σ_{k=1}^n δ_{X_k}. Example: Hastings-Metropolis algorithm with proposal kernel q(x,y): given X_k, sample Y ~ q(X_k, ·); accept-reject mechanism: X_{k+1} = Y with probability 1 ∧ [π(Y) q(Y, X_k)] / [π(X_k) q(X_k, Y)], and X_{k+1} = X_k otherwise. Importance Sampling (IS): sample i.i.d. points (X_k)_k with density q, a proposal distribution chosen by the user. Approximation: π_n = (1/n) Σ_{k=1}^n [π(X_k)/q(X_k)] δ_{X_k}.
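The Hastings-Metropolis step described above can be sketched in a few lines of Python (an illustrative sketch only; the function and variable names are ours, not from the talk). With a symmetric random-walk proposal, the acceptance ratio reduces to π(Y)/π(X_k):

```python
import numpy as np

def metropolis_hastings(log_pi, x0, sigma, n_steps, rng):
    """Random-walk Hastings-Metropolis with symmetric Gaussian proposal,
    so the acceptance ratio reduces to pi(Y)/pi(X_k)."""
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    chain = np.empty((n_steps, x.size))
    for t in range(n_steps):
        y = x + sigma * rng.standard_normal(x.size)   # Y ~ q(X_k, .)
        # accept with probability 1 ^ pi(Y)/pi(X_k), computed in log scale
        if np.log(rng.uniform()) < log_pi(y) - log_pi(x):
            x = y
        chain[t] = x                                  # otherwise stay at X_k
    return chain

rng = np.random.default_rng(0)
# target: standard 1-d Gaussian, known up to its normalizing constant
chain = metropolis_hastings(lambda x: -0.5 * float(x @ x), 0.0, 2.0, 20000, rng)
```

The empirical mean and variance of the chain then approximate those of π.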

7 Introduction / The proposal mechanism. The proposal mechanism: MCMC. Toy example: Hastings-Metropolis algorithm with Gaussian proposal kernel q(x,y) ∝ exp(−(1/2)(y−x)^T Σ^{-1}(y−x)). Acceptance-rejection ratio: 1 ∧ π(Y)/π(X_k). Fig.: for three different values of Σ, [top] plot of the chain (in R); [bottom] autocorrelation function.

8 Introduction / The proposal mechanism. The proposal mechanism: Importance Sampling (1/2). Toy example: compute ∫_R x π(x) dx when π is the Student t(3) density, π(x) ∝ (1 + x²/3)^{-2}. Consider in turn the proposal q equal to a Student t(1) and then to a normal N(0,1). [Figures: number of samples for q ~ t(1) and for q ~ N(0,1); plot of the densities q (green, blue) and π (red); boxplots computed from independent runs of the algorithm.]

9 Introduction / The proposal mechanism. The proposal mechanism: Importance Sampling (2/2). The efficiency of the algorithm depends on the proposal distribution q: if a few weights are large and all the others are negligible, the approximation is likely to be inaccurate. Monitoring the convergence: there exist criteria measuring the proportion of ineffective draws, such as the coefficient of variation, the effective sample size and the normalized perplexity.
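The three degeneracy criteria named above can be computed directly from the normalized weights (a sketch under our own conventions; the talk does not fix formulas, but these are the standard definitions, with ESS = n/(1+CV²)):

```python
import numpy as np

def is_diagnostics(log_w):
    """Degeneracy criteria for importance sampling, computed from the
    unnormalized log-weights log(pi(X_k)/q(X_k))."""
    w = np.exp(log_w - np.max(log_w))             # stabilized weights
    wn = w / w.sum()                              # normalized weights, mean 1/n
    n = wn.size
    ess = 1.0 / np.sum(wn ** 2)                   # effective sample size, in [1, n]
    cv2 = n * np.sum((wn - 1.0 / n) ** 2)         # squared coefficient of variation
    entropy = -np.sum(wn * np.log(wn))            # Shannon entropy of the weights
    perp = np.exp(entropy) / n                    # normalized perplexity, in (0, 1]
    return ess, cv2, perp
```

With uniform weights ESS = n, CV² = 0 and perplexity = 1; a single dominating weight drives ESS towards 1.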

10 Introduction / Adaptive Monte Carlo samplers. To fix some design parameters and make the samplers more efficient, adaptive Monte Carlo samplers were proposed. Adaptive algorithms: - The optimal design parameters are defined as the solutions of an optimality criterion; in practice, this criterion cannot be solved explicitly. - Based on the past history of the sampler, solve an approximation of this criterion and compute the design parameters for the current run of the sampler. - Repeat the scheme: adaptation/sampling.

11 Introduction / Adaptive Monte Carlo samplers. Adaptive MC sampler: example of adaptive MCMC (1/2). Adaptive Hastings-Metropolis algorithm with Gaussian proposal distribution q_Σ(x,y) ∝ exp(−(1/2)(y−x)^T Σ^{-1}(y−x)). Design parameter: the covariance matrix Σ. Optimality criterion: by using the scaling approach for Markov chains, it is advocated to take Σ = (2.38)²/l × (covariance of π) — pioneering work: Roberts, Gelman, Gilks (1997). Iterative algorithm, Haario, Saksman, Tamminen (2001): Adaptation: update the covariance matrix Σ_t = (2.38)²/l × Σ_t^{(π)}, where Σ_t^{(π)} is the current empirical covariance of the chain. Sampling: one step of a Hastings-Metropolis algorithm with proposal q_{Σ_t} to sample X_{t+1}.
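The adaptation step above amounts to a recursive update of the running mean and covariance of the chain. A minimal sketch (our own naming; the step size γ_t = 1/(t+1) is one common choice, and this recursion only approximates the exact empirical covariance up to an O(γ²) term):

```python
import numpy as np

def am_update(mu, cov, x, t):
    """One recursive update of the running mean/covariance after the t-th
    draw x (t >= 0), in the spirit of Haario, Saksman and Tamminen."""
    gamma = 1.0 / (t + 1)
    dx = x - mu
    mu = mu + gamma * dx
    cov = cov + gamma * (np.outer(dx, dx) - cov)
    # scaling rule: Sigma_t = (2.38)^2 / l * empirical covariance
    return mu, cov, (2.38 ** 2 / x.size) * cov

rng = np.random.default_rng(1)
mu, cov = np.zeros(2), np.eye(2)
for t, x in enumerate(rng.standard_normal((5000, 2))):
    mu, cov, prop_cov = am_update(mu, cov, x, t)
```

After many draws from a standard 2-d Gaussian, `cov` is close to the identity and `prop_cov` to (2.38²/2)·I.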

12 Introduction / Adaptive Monte Carlo samplers. Adaptive MC sampler: example of adaptive MCMC (2/2). Nevertheless, this recipe is not designed for every context. Example: multimodality. Target distribution: a mixture of Gaussians in R²; the means of the Gaussians are indicated with a red cross. [left] 5·10⁶ i.i.d. draws; [right] adaptive Hastings-Metropolis, 5·10⁶ draws.

13 Introduction / Adaptive Monte Carlo samplers. Adaptive MC sampler: example of Adaptive Importance Sampling (1/2). Design parameter: the proposal distribution. Optimality criterion: choose the proposal density q among a (parametric) family Q as the solution of argmin_{q∈Q} ∫ log(π(x)/q(x)) π(x) λ(dx) = argmax_{q∈Q} ∫ log q(x) π(x) λ(dx). Iterative algorithm: O. Cappé, A. Guillin, J.M. Marin, C. Robert (2004). Adaptation: update the sampling distribution q_t = argmax_{q∈Q} (1/n) Σ_{k=1}^n [π(X_k^{(t-1)}) / q_{t-1}(X_k^{(t-1)})] log q(X_k^{(t-1)}). Sampling: draw points (X_k^{(t)})_k from q_t + importance reweighting π_n = (1/n) Σ_{k=1}^n [π(X_k^{(t)}) / q_t(X_k^{(t)})] δ_{X_k^{(t)}}.
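When Q is the Gaussian family, the weighted maximization in the adaptation step has a closed form: the maximizer is the weighted empirical mean and covariance of the current sample. A sketch (our own names; this is the Gaussian special case, not the general algorithm):

```python
import numpy as np

def fit_gaussian_proposal(x, log_w):
    """Solves argmax_q sum_k w_k log q(X_k) over the Gaussian family:
    the maximizer is the weighted empirical mean and covariance.
    x: (n, dim) array of draws; log_w: unnormalized log-weights."""
    w = np.exp(log_w - np.max(log_w))
    w = w / w.sum()                    # normalized importance weights
    mu = w @ x                         # weighted mean, shape (dim,)
    d = x - mu
    cov = (w[:, None] * d).T @ d       # weighted covariance
    return mu, cov

# with equal weights this reduces to the plain empirical mean/covariance
mu, cov = fit_gaussian_proposal(np.array([[0.0, 0.0], [2.0, 2.0]]), np.zeros(2))
```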

14 Introduction / Adaptive Monte Carlo samplers. Adaptive MC sampler: example of Adaptive Importance Sampling (2/2). Nevertheless, it is known that such Importance Sampling techniques are not robust to the dimension: when sampling on R^l with l > 5, the degeneracy of the importance ratios π(X_k)/q(X_k) cannot be avoided.

15 Introduction / Conclusion. Usual adaptive Monte Carlo samplers are not robust (enough) to: multimodality of the target distribution π — how to jump from mode to mode; large dimension of the sampling space. Importance Sampling: the ratio π(x)/q(x) degenerates. MCMC: the acceptance ratio π(y)q(y,x) / [π(x)q(x,y)] = π(y)/π(x) when q is a symmetric kernel. New Monte Carlo samplers combine tempering techniques and/or biasing-potential techniques with sampling steps.

16 Tempering-based Monte Carlo samplers Outline Introduction Tempering-based Monte Carlo samplers The Equi-Energy sampler Biasing Potential-based Monte Carlo sampler Convergence Analysis

17 Tempering-based Monte Carlo samplers. Tempering: the idea. Learn a well-fitted proposal mechanism by considering tempered versions π^{1/T} (T > 1) of the target distribution π. [Figure: a multimodal target density and its flatter tempered version.] Hereafter, an example where tempering is plugged into an MCMC sampler.
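Why tempering helps can be checked on a one-line example (our own illustration, not from the talk): dividing log π by T > 1 raises the relative weight of the low-density barrier between two modes, so a chain crosses it more easily.

```python
import numpy as np

def log_tempered(log_pi, T):
    """pi^(1/T) up to normalization: dividing log pi by T > 1 flattens
    the density and lowers the barriers between modes."""
    return lambda x: log_pi(x) / T

# bimodal example: equal mixture of N(-2, 1) and N(2, 1), up to a constant
def log_pi(x):
    return np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)

# ratio (density at the barrier x = 0) / (density at a mode x = 2)
cold = np.exp(log_pi(0.0) - log_pi(2.0))
hot = np.exp(log_tempered(log_pi, 4.0)(0.0) - log_tempered(log_pi, 4.0)(2.0))
```

Here `hot` = `cold`^(1/4): the tempered barrier is much less pronounced.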

18 Tempering-based Monte Carlo samplers / The Equi-Energy sampler. Example: Equi-Energy sampler (1/6). Kou, Zhou and Wong (2006). In the MCMC proposal mechanism, allow picking a point from an auxiliary process designed to have better mixing properties. [Diagram: auxiliary process (Y_t)_t with target π^β, feeding the process of interest (X_t)_t with target π.]

19 Tempering-based Monte Carlo samplers / The Equi-Energy sampler. Example: Equi-Energy sampler (2/6). Algorithm: at iteration t, given the current state X_t and the samples Y_1, ..., Y_t from the auxiliary process: with probability 1−ɛ, draw X_{t+1} from an MCMC kernel with invariant distribution π; with probability ɛ, choose a point Y_l among the auxiliary samples in the same energy level as X_t and accept/reject the move X_{t+1} = Y_l. [Figure: target and tempered distributions, energy-ring boundaries, a local move and an equi-energy jump from the current state.]
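One iteration of this scheme can be sketched as follows (a simplified illustration with our own names: we draw the jump proposal uniformly from all auxiliary samples instead of restricting to the energy ring of X_t, and π^β denotes the tempered auxiliary target):

```python
import numpy as np

def ee_step(x, aux, log_pi, beta, eps, local_step, rng):
    """One equi-energy iteration (sketch, energy rings omitted)."""
    if rng.uniform() > eps:
        return local_step(x)                     # local MCMC move targeting pi
    y = aux[rng.integers(len(aux))]              # equi-energy jump proposal
    # accept with ratio pi(y) pi^beta(x) / (pi(x) pi^beta(y))
    if np.log(rng.uniform()) < (1.0 - beta) * (log_pi(y) - log_pi(x)):
        return y
    return x
```

With β close to 1 the auxiliary and target laws agree and jumps are almost always accepted; with β small, the acceptance test corrects for the flatter auxiliary law.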

20 Tempering-based Monte Carlo samplers / The Equi-Energy sampler. Example: Equi-Energy sampler (3/6), numerical illustration. π is a mixture of Gaussian distributions, with 4 auxiliary processes π^{β_4}, ..., π^{β_1}, where 0 < β_4 < ... < β_1 < 1. [Figure: for each temperature, the target density at that temperature, the draws, and the means of the components; the last panel shows the target π, a mixture of 2-dim Gaussians.]

21 Tempering-based Monte Carlo samplers / The Equi-Energy sampler. Example: Equi-Energy sampler (4/6), numerical illustration. Schreck, F. and Moulines (2013). Problem: motif sampling in biological sequences. Objective: find where motifs (subsequences of length w) are in a DNA sequence of length L. Observation: (s_1, ..., s_L) with s_l ∈ {A, C, G, T}. Quantity of interest: the motif positions, collected in (a_1, ..., a_L) with a_j ∈ {0, ..., w}.

22 Tempering-based Monte Carlo samplers / The Equi-Energy sampler. Example: Equi-Energy sampler (4/6), numerical illustration. Schreck, F. and Moulines (2013). Result: EE with 4 auxiliary chains and 3 energy rings.

23 Tempering-based Monte Carlo samplers / The Equi-Energy sampler. Example: Equi-Energy sampler (5/6), design parameters. Design parameters: the probability of interaction ɛ; the number of auxiliary processes and the scale of the β_i; the energy rings; the MCMC kernels for the local moves. Convergence analysis: Andrieu, Jasra, Doucet and Del Moral (2007, 2008); F., Moulines, Priouret (2011); F., Moulines, Priouret and Vandekerkhove (2013). Adaptive version of the Equi-Energy sampler: Schreck, F. and Moulines (2013); Baragatti, Grimaud and Pommeret (2013).

24 Tempering-based Monte Carlo samplers / The Equi-Energy sampler. Example: Equi-Energy sampler (6/6), transition kernel. Let us describe the conditional distribution of X_{t+1} given the past: P_{θ_t}(X_t, A) = (1−ɛ) P(X_t, A) + ɛ ∫_A g(X_t, y) θ_t(dy) [proposition with selection], where θ_t = (1/t) Σ_{k=1}^t δ_{Y_k}.

25 Tempering-based Monte Carlo samplers / The Equi-Energy sampler. Example: Equi-Energy sampler (6/6), transition kernel. Let us describe the conditional distribution of X_{t+1} given the past: P_{θ_t}(X_t, A) = (1−ɛ) P(X_t, A) + ɛ ∫_A (1 ∧ [π(y) g(y, X_t) π^β(X_t)] / [π(X_t) g(X_t, y) π^β(y)]) [acceptance-rejection] g(X_t, y) θ_t(dy) [proposition with selection], where θ_t = (1/t) Σ_{k=1}^t δ_{Y_k}.

27 Biasing Potential-based Monte Carlo sampler Outline Introduction Tempering-based Monte Carlo samplers Biasing Potential-based Monte Carlo sampler Wang-Landau samplers Convergence Analysis

28 Biasing Potential-based Monte Carlo sampler. The idea. Among the Importance Sampling Monte Carlo samplers: π_n = (1/n) Σ_{t=1}^n [π(X_t)/q*(X_t)] δ_{X_t}, where (X_t)_t approximates q*. Idea from the molecular dynamics field; see e.g. Chopin, Lelièvre and Stoltz (2012) for the extension to computational statistics. Choose a proposal distribution of the form q*(x) ∝ π(x) exp(−A(ξ(x))), where A(ξ(x)) is a biasing potential depending on a few directions of metastability ξ(x), and such that q* is less multimodal than π. Example: consider a partition of X into d strata X_1, ..., X_d and set ξ(x) = i for any x ∈ X_i.

29 Biasing Potential-based Monte Carlo sampler / Wang-Landau samplers. Example: Wang-Landau algorithms (1/4). Wang and Landau (2001) — a very popular algorithm in the molecular dynamics field. Wang-Landau type algorithms learn the proposal distribution q* adaptively: at the same time, they learn the proposal distribution, sample points X_k approximating it, and compute the associated importance weights. At iteration t: 1. Proposal distribution: form an approximation q_t of q*; draw X_t approximating q_t, and compute its associated importance weight π(X_t)/q_t(X_t). 2. Approximate the target: π_n = (1/n) Σ_{t=1}^n [π(X_t)/q_t(X_t)] δ_{X_t}.

30 Biasing Potential-based Monte Carlo sampler / Wang-Landau samplers. Wang-Landau algorithms (2/4). Key idea: q* is obtained by locally biasing the target distribution: q*(x) ∝ Σ_{i=1}^d [π(x)/θ*(i)] I_{X_i}(x), where X_1, ..., X_d is a partition of the sampling space X and the weights θ*(i) are given by θ*(i) = ∫_{X_i} π(x) dx. With this biasing strategy, the proposal distribution visits each stratum X_i with the same frequency: ∫_{X_i} q*(x) dx = 1/d.

31 Biasing Potential-based Monte Carlo sampler / Wang-Landau samplers. Wang-Landau algorithms (2/4). Key idea: q* is obtained by locally biasing the target distribution: q*(x) ∝ Σ_{i=1}^d [π(x)/θ*(i)] I_{X_i}(x), where X_1, ..., X_d is a partition of the sampling space X and the weights θ*(i) are given by θ*(i) = ∫_{X_i} π(x) dx. With this biasing strategy, the proposal distribution visits each stratum X_i with the same frequency: ∫_{X_i} q*(x) dx = 1/d. Unfortunately: the θ*(i) are unknown and have to be learnt on the fly; exact sampling under q* is not possible, but it can be replaced by an MCMC step.

32 Biasing Potential-based Monte Carlo sampler / Wang-Landau samplers. Wang-Landau algorithms (3/4). Wang-Landau algorithm: at iteration t, given the current point X_t and the current bias θ_t = (θ_t(1), ..., θ_t(d)): 1. Draw a new point X_{t+1} via MCMC with invariant distribution q_{θ_t}(x) ∝ Σ_{i=1}^d [π(x)/θ_t(i)] I_{X_i}(x). 2. Update the bias θ_{t+1}. 3. In parallel, update the approximation of π: π_n ∝ (1/n) Σ_{t=1}^n d [Σ_{i=1}^d θ_t(i) I_{X_i}(X_t)] δ_{X_t}.
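The full loop can be sketched in Python (an illustration under our own conventions: random-walk MH for step 1, the first update rule of the next slide for step 2, and step sizes γ_t = 1/(t+1)):

```python
import numpy as np

def wang_landau(log_pi, stratum, d, x0, sigma, n_steps, rng):
    """Wang-Landau sketch: random-walk MH targeting pi(x)/theta(stratum(x)),
    with the stratum weights theta learnt on the fly."""
    theta = np.full(d, 1.0 / d)          # current bias, a probability vector
    x = float(x0)
    visits = np.zeros(d)
    for t in range(1, n_steps + 1):
        y = x + sigma * rng.standard_normal()
        log_ratio = (log_pi(y) - np.log(theta[stratum(y)])
                     - log_pi(x) + np.log(theta[stratum(x)]))
        if np.log(rng.uniform()) < log_ratio:    # MH step with target q_theta
            x = y
        i = stratum(x)
        visits[i] += 1.0
        gamma, ti = 1.0 / (t + 1), theta[i]
        theta = theta - gamma * ti * theta       # penalize the visited stratum
        theta[i] = ti + gamma * ti * (1.0 - ti)  # theta stays a probability vector
    return theta, visits / n_steps

rng = np.random.default_rng(2)
# standard Gaussian target, two strata x < 0 and x >= 0 (true weights 1/2, 1/2)
theta, occ = wang_landau(lambda x: -0.5 * x * x,
                         lambda x: 0 if x < 0.0 else 1,
                         2, 0.5, 1.0, 20000, rng)
```

On this symmetric example both the learnt bias and the occupation frequencies settle near (1/2, 1/2), as the flat-histogram property predicts.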

33 Biasing Potential-based Monte Carlo sampler / Wang-Landau samplers. Wang-Landau algorithms (4/4). To learn θ* on the fly: different strategies in the literature, based on Stochastic Approximation algorithms with controlled Markov chain dynamics (X_t)_t: θ_{t+1}(i) = θ_t(i) + γ_{t+1} H_i(θ_t, X_{t+1}), where H_i is chosen so that: it penalizes the stratum currently visited, i.e. H_i(θ_t, X_{t+1}) > 0 iff X_{t+1} ∈ X_i; and the mean field function θ ↦ ∫ H(θ, x) q_θ(x) dx admits θ* as its unique root.

34 Biasing Potential-based Monte Carlo sampler / Wang-Landau samplers. Wang-Landau algorithms (4/4). To learn θ* on the fly: different strategies in the literature, based on Stochastic Approximation algorithms with controlled Markov chain dynamics (X_t)_t: θ_{t+1}(i) = θ_t(i) + γ_{t+1} H_i(θ_t, X_{t+1}), where H_i is chosen so that: it penalizes the stratum currently visited, i.e. H_i(θ_t, X_{t+1}) > 0 iff X_{t+1} ∈ X_i; and the mean field function θ ↦ ∫ H(θ, x) q_θ(x) dx admits θ* as its unique root. Two examples of updating rules: (i) if X_{t+1} ∈ X_i, set θ_{t+1}(i) = θ_t(i) + γ_{t+1} θ_t(i)(1 − θ_t(i)) and θ_{t+1}(k) = θ_t(k) − γ_{t+1} θ_t(i) θ_t(k) for k ≠ i; (ii) set S_{t+1}(j) = S_t(j) + γ θ_t(j) I_{X_j}(X_{t+1}), then θ_{t+1}(j) = S_{t+1}(j) / Σ_{r=1}^d S_{t+1}(r).

35 Biasing Potential-based Monte Carlo sampler / Wang-Landau samplers. Transition kernel. The conditional distribution of X_{t+1} given the past is an MCMC kernel with invariant distribution q_{θ_t}, denoted by P_{θ_t}. Example: Hastings-Metropolis with Gaussian proposal distribution: P_θ(x, A) = ∫_A (1 ∧ [π(y) θ(str(x))] / [π(x) θ(str(y))]) N(x, Σ)[dy] + δ_x(A) ∫ (1 − 1 ∧ [π(y) θ(str(x))] / [π(x) θ(str(y))]) N(x, Σ)[dy], where str(x) is the index of the stratum containing x.

36 Biasing Potential-based Monte Carlo sampler / Wang-Landau samplers. Numerical illustration, a toy example. Target distribution: a mixture of Gaussians in R²; the means of the Gaussians are indicated with a red cross. Wang-Landau algorithm: 5 strata, obtained by partitioning the energy levels. 5·10⁶ draws approximating q*: the sampler was able to jump the deep valleys and draw points around all the modes.

37 Biasing Potential-based Monte Carlo sampler / Wang-Landau samplers. Numerical illustration: structure of a protein. In biophysics: the structure of a protein from its sequence. AB model: two types of monomers, A (hydrophobic) and B (hydrophilic), linked by rigid bonds of unit length to form (2D) chains. Given a sequence, what is the optimal shape of the N monomers? Minimize the energy function H(x), where x = (x_{1,2}, x_{2,3}, ..., x_{N−2,N−1}) ∈ [−π, π]^{N−2} and H(x) = (1/4) Σ_{i=1}^{N−2} (1 − cos(x_{i,i+1})) + 4 Σ_{i=1}^{N−2} Σ_{j=i+2}^{N} (r_ij^{−12} − C(σ_i, σ_j) r_ij^{−6}); x_{i,i+1} is the angle between the i-th and (i+1)-th bond vectors, r_ij is the distance between monomers i and j, and C(σ_i, σ_j) = 1 (resp. 1/2 and −1/2) between monomers AA (resp. BB and AB).
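The energy H can be evaluated directly from the bond angles (an illustrative sketch with our own names; we use the standard AB-model couplings C(AA)=1, C(BB)=1/2, C(AB)=−1/2, since the slide's values are garbled in the transcription):

```python
import numpy as np

# couplings C(sigma_i, sigma_j) of the standard 2D AB model (assumption)
COUP = {('A', 'A'): 1.0, ('B', 'B'): 0.5, ('A', 'B'): -0.5, ('B', 'A'): -0.5}

def ab_energy(angles, seq):
    """AB-model energy of a 2D chain of N = len(seq) monomers; `angles`
    holds the N-2 bond angles x_{i,i+1}, bonds have unit length."""
    n = len(seq)
    dirs = np.concatenate(([0.0], np.cumsum(angles)))         # bond directions
    bonds = np.stack([np.cos(dirs), np.sin(dirs)], axis=1)    # N-1 unit bonds
    pos = np.vstack([np.zeros(2), np.cumsum(bonds, axis=0)])  # monomer positions
    e = 0.25 * np.sum(1.0 - np.cos(angles))                   # bending term
    for i in range(n):
        for j in range(i + 2, n):                             # non-bonded pairs
            r = np.linalg.norm(pos[j] - pos[i])
            e += 4.0 * (r ** -12 - COUP[(seq[i], seq[j])] * r ** -6)
    return e
```

For a straight 'AAA' chain (one zero angle), the only contribution is the Lennard-Jones term of the pair at distance 2.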

38 Biasing Potential-based Monte Carlo sampler / Wang-Landau samplers. Numerical illustration: structure of a protein. The minimization of H(x) is cast as a sampling problem: min H(x) ↔ max π_n(x), with π_n(x) ∝ exp(−β_n H(x)) for β_n > 0 large.

39 Biasing Potential-based Monte Carlo sampler / Wang-Landau samplers. Numerical illustration: structure of a protein. (left) WL: initial configuration, with energy .945; (center) WL: optimal configuration found, with energy 3.95; (right) optimal configuration in the literature, with energy 3.94.

40 Biasing Potential-based Monte Carlo sampler / Wang-Landau samplers. Design parameters (1/4). Choice of the biasing potential A(ξ(x)), i.e., in the Wang-Landau algorithms: the number of strata and the strata themselves; the update strategy for the bias vector θ_t; the MCMC kernels with target distribution q_{θ_t}. Convergence analysis: Liang (2005); Liang, Liu and Carroll (2007); Atchadé and Liu (2010); Jacob and Ryder (2014); F., Jourdain, Kuhn, Lelièvre and Stoltz (2014a); F., Jourdain, Lelièvre and Stoltz (submitted). Efficiency analysis: F., Jourdain, Kuhn, Lelièvre and Stoltz (2014b); F., Jourdain, Lelièvre and Stoltz (submitted). Adaptive Wang-Landau: Bornn, Jacob, Del Moral and Doucet (2013).

41 Biasing Potential-based Monte Carlo sampler / Wang-Landau samplers. Design parameters (2/4). Role on the limiting behavior of the sampler: convergence occurs whatever the number of strata and the choice of the strata, for many MCMC samplers and many update strategies of the bias vector. Role on the transient phase of the sampler: for example, how long is the exit time from a mode? Let us illustrate the role of some design parameters on the exit time from a mode when: π(x_1, x_2) ∝ exp(−β U(x_1, x_2)) I_{[−R,R]}(x_2); d strata (see the right plot); the chains are initialised in the left mode.

42 Biasing Potential-based Monte Carlo sampler / Wang-Landau samplers. Design parameters (3/4). [Fig.: trajectories of x_1 for two values of β; [left] Wang-Landau, d = 48 strata; [right] Hastings-Metropolis; the red line is at x_1 = 0.]

43 Biasing Potential-based Monte Carlo sampler / Wang-Landau samplers. Design parameters (4/4). F., Jourdain, Lelièvre, Stoltz (2014). We compute the mean exit times t_β from the left mode (time to reach the right mode, x_1 > 0) for different values of d and: [left] a fixed proposal scale σ in the MCMC samplers; [right] a proposal scale σ/d in the MCMC samplers. We observe t_β = C(β, σ, d) exp(βμ), with a slope μ independent of β. [Figure: mean exit time versus β on a log scale, for several values of d; the same slope fits all the curves.]

44 Convergence Analysis Outline Introduction Tempering-based Monte Carlo samplers Biasing Potential-based Monte Carlo sampler Convergence Analysis Controlled Markov chains Sufficient conditions for the cvg in distribution Convergence results

45 Convergence Analysis / Controlled Markov chains. Controlled Markov chains (1/2). These new samplers combine adaptation/interaction and sampling: the draws (X_t)_t are from a controlled Markov chain, E[h(X_{t+1}) | F_t] = ∫ h(y) P_{θ_t}(X_t, dy), where (P_θ, θ ∈ Θ) is a family of Markov kernels having an invariant distribution π_θ. Examples: 1. Wang-Landau: the conditional distribution of X_{t+1} given F_t is an MCMC kernel with invariant distribution q_{θ_t}(x) ∝ Σ_{i=1}^d [π(x)/θ_t(i)] I_{X_i}(x); here π_θ depends on θ and its expression is known. 2. Equi-Energy: the conditional distribution of X_{t+1} given F_t is an MCMC kernel indexed by the empirical distribution θ_t of the auxiliary process; here π_θ exists but its expression is unknown. 3. Adaptive Hastings-Metropolis: the conditional distribution of X_{t+1} given F_t is an MCMC kernel with invariant distribution π and proposal distribution N(X_t, θ_t); here, all the kernels have the same invariant distribution.

46 Convergence Analysis / Controlled Markov chains. Controlled Markov chains (2/2). Question: let (P_θ, θ ∈ Θ) be a family of Markov kernels having the same invariant distribution π. Let (θ_t)_t be an F_t-adapted random process, and draw X_{t+1} | F_t ~ P_{θ_t}(X_t, ·). Does (X_t)_t converge (say, in distribution) to π?

47 Convergence Analysis / Controlled Markov chains. Controlled Markov chains (2/2). No. Question: let (P_θ, θ ∈ Θ) be a family of Markov kernels having the same invariant distribution π. Let (θ_t)_t be an F_t-adapted random process, and draw X_{t+1} | F_t ~ P_{θ_t}(X_t, ·). Does (X_t)_t converge (say, in distribution) to π? Counterexample on {0, 1}: if X_t = 0, draw X_{t+1} ~ P_1(X_t, ·); if X_t = 1, draw X_{t+1} ~ P_2(X_t, ·), where P_l = [t_l, 1−t_l; 1−t_l, t_l]. We have π P_l = π with π ∝ (1, 1), but the transition matrix of (X_t)_t is P = [t_1, 1−t_1; 1−t_2, t_2], with invariant distribution π ∝ (1−t_2, 1−t_1).
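This counterexample can be checked numerically in a few lines (our own illustration of the construction above, with t_1 = 0.9 and t_2 = 0.1 chosen for concreteness):

```python
import numpy as np

def P(t):
    """Symmetric 2-state kernel: doubly stochastic, so the uniform
    distribution (1/2, 1/2) is invariant for every t."""
    return np.array([[t, 1.0 - t], [1.0 - t, t]])

t1, t2 = 0.9, 0.1
# state-dependent choice: row of P(t1) from state 0, row of P(t2) from state 1
K = np.vstack([P(t1)[0], P(t2)[1]])

# invariant distribution of the resulting (homogeneous) chain
evals, evecs = np.linalg.eig(K.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()
```

Each P(t) preserves (1/2, 1/2), yet the state-dependent mixture has invariant distribution (1−t_2, 1−t_1)/(2−t_1−t_2) = (0.9, 0.1): adaptation alone can destroy the target.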

48 Convergence Analysis / Sufficient conditions for the cvg in distribution (1/3). [Diagram: adaptive MCMC path X_{t−N} → X_{t−N+1} → ... → X_t, driven by the kernels P_{θ_{t−N}}(X_{t−N}, ·), ..., P_{θ_{t−1}}(X_{t−1}, ·) with parameters θ_{t−N}, ..., θ_t; frozen chain: the kernel P_{θ_{t−N}} iterated N times from X_{t−N}; in the limit, X_t distributed under π_{θ_{t−N}}.]

49 Convergence Analysis / Sufficient conditions for the cvg in distribution (2/3). Decompose: E[h(X_t) | past_{t−N}] − ∫ h(y) π_θ(dy) = { E[h(X_t) | past_{t−N}] − ∫ h(y) P^N_{θ_{t−N}}(X_{t−N}, dy) } + { ∫ h(y) P^N_{θ_{t−N}}(X_{t−N}, dy) − ∫ h(y) π_{θ_{t−N}}(dy) } + { ∫ h(y) π_{θ_{t−N}}(dy) − ∫ h(y) π_θ(dy) }. Diminishing adaptation condition — roughly speaking, dist(P_θ, P_θ') ≲ dist(θ, θ'): if θ_t and θ_t' are close, then the transition kernels P_{θ_t} and P_{θ_t'} are close also. Containment condition — roughly speaking, lim_{N→∞} dist(P^N_θ, π_θ) = 0 at some rate depending smoothly on θ. Regularity in θ of π_θ, so that lim_t θ_t = θ* implies dist(π_{θ_t}, π_{θ*}) → 0.

50 Convergence Analysis / Sufficient conditions for the cvg in distribution (3/3). F., Moulines, Priouret (2011). Assume: A. (Containment condition) There exists π_θ s.t. π_θ P_θ = π_θ, and for any ɛ > 0 there exists a non-decreasing positive sequence {r_ɛ(n), n ≥ 0} such that lim sup_n r_ɛ(n)/n = 0 and lim sup_n E[ || P^{r_ɛ(n)}_{θ_{n−r_ɛ(n)}}(X_{n−r_ɛ(n)}, ·) − π_{θ_{n−r_ɛ(n)}} ||_TV ] ≤ ɛ. B. (Diminishing adaptation) For any ɛ > 0, lim_n Σ_{j=1}^{r_ɛ(n)} E[ sup_x || P_{θ_{n−r_ɛ(n)+j}}(x, ·) − P_{θ_{n−r_ɛ(n)}}(x, ·) ||_TV ] = 0. C. (Convergence of the invariant distributions) (π_{θ_n})_n converges weakly to π almost surely. Then, for any bounded and continuous function f, lim_n E[f(X_n)] = π(f).

51 Convergence Analysis / Convergence results. The literature provides sufficient conditions for: convergence in distribution of (X_t)_t; a strong law of large numbers for (X_t)_t; a central limit theorem for (X_t)_t. G.O. Roberts, J.S. Rosenthal. Coupling and Ergodicity of Adaptive Markov Chain Monte Carlo Algorithms. J. Appl. Prob. (2007). G. Fort, E. Moulines, P. Priouret. Convergence of Adaptive MCMC Algorithms: Ergodicity and Law of Large Numbers. Ann. Stat. (2011). G. Fort, E. Moulines, P. Priouret, P. Vandekerkhove. A Central Limit Theorem for Adaptive and Interacting Markov Chains. Bernoulli (2013). These conditions were successfully applied to establish the convergence of adaptive Hastings-Metropolis, (adaptive) Equi-Energy, Wang-Landau, ...

52 Convergence Analysis / Convergence results. Example: application to Wang-Landau (1/2). Theorem (F., Jourdain, Kuhn, Lelièvre, Stoltz (2014a)). Under the assumptions of the next slide, for any bounded measurable function f: lim_t E[f(X_t)] = ∫ f(x) q*(x) dλ(x), and lim_T (1/T) Σ_{t=1}^T f(X_t) = ∫ f(x) q*(x) dλ(x) almost surely.

53 Convergence Analysis / Convergence results. Example: application to Wang-Landau (2/2). Theorem (F., Jourdain, Kuhn, Lelièvre, Stoltz (2014a)). Assume: 1. The target distribution π dλ satisfies 0 < inf_X π ≤ sup_X π < ∞ and inf_i π(X_i) > 0. 2. For any θ, P_θ is a Hastings-Metropolis kernel with invariant distribution proportional to Σ_{i=1}^d [π(x)/θ(i)] I_{X_i}(x) and proposal distribution q(x,y) dλ(y) such that inf q > 0. 3. The step-size sequence is non-increasing, positive, with Σ_t γ_t = ∞ and Σ_t γ_t² < ∞.

54 Convergence Analysis / Convergence results. Example: application to Wang-Landau (2/2). Sketch of proof. (1) The containment condition: there exist ρ ∈ (0,1) and C such that sup_θ sup_x || P^t_θ(x, ·) − π_θ ||_TV ≤ C ρ^t. (2) The diminishing adaptation condition: there exists C such that, for any θ, θ', sup_x || P_θ(x, ·) − P_{θ'}(x, ·) ||_TV ≤ C Σ_{i=1}^d |θ(i) − θ'(i)|; and the update of the parameter satisfies |θ_{t+1} − θ_t| ≤ C γ_{t+1} for some C. (3) Convergence of π_{θ_n}: requires proving the convergence of a Stochastic Approximation algorithm with controlled Markov chain dynamics.


More information

Controlled sequential Monte Carlo

Controlled sequential Monte Carlo Controlled sequential Monte Carlo Jeremy Heng, Department of Statistics, Harvard University Joint work with Adrian Bishop (UTS, CSIRO), George Deligiannidis & Arnaud Doucet (Oxford) Bayesian Computation

More information

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods Pattern Recognition and Machine Learning Chapter 11: Sampling Methods Elise Arnaud Jakob Verbeek May 22, 2008 Outline of the chapter 11.1 Basic Sampling Algorithms 11.2 Markov Chain Monte Carlo 11.3 Gibbs

More information

Pseudo-marginal MCMC methods for inference in latent variable models

Pseudo-marginal MCMC methods for inference in latent variable models Pseudo-marginal MCMC methods for inference in latent variable models Arnaud Doucet Department of Statistics, Oxford University Joint work with George Deligiannidis (Oxford) & Mike Pitt (Kings) MCQMC, 19/08/2016

More information

Randomized Quasi-Monte Carlo for MCMC

Randomized Quasi-Monte Carlo for MCMC Randomized Quasi-Monte Carlo for MCMC Radu Craiu 1 Christiane Lemieux 2 1 Department of Statistics, Toronto 2 Department of Statistics, Waterloo Third Workshop on Monte Carlo Methods Harvard, May 2007

More information

The Wang-Landau algorithm in general state spaces: Applications and convergence analysis

The Wang-Landau algorithm in general state spaces: Applications and convergence analysis The Wang-Landau algorithm in general state spaces: Applications and convergence analysis Yves F. Atchadé and Jun S. Liu First version Nov. 2004; Revised Feb. 2007, Aug. 2008) Abstract: The Wang-Landau

More information

SAMPLING ALGORITHMS. In general. Inference in Bayesian models

SAMPLING ALGORITHMS. In general. Inference in Bayesian models SAMPLING ALGORITHMS SAMPLING ALGORITHMS In general A sampling algorithm is an algorithm that outputs samples x 1, x 2,... from a given distribution P or density p. Sampling algorithms can for example be

More information

6 Markov Chain Monte Carlo (MCMC)

6 Markov Chain Monte Carlo (MCMC) 6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution

More information

Markov Chain Monte Carlo Lecture 6

Markov Chain Monte Carlo Lecture 6 Sequential parallel tempering With the development of science and technology, we more and more need to deal with high dimensional systems. For example, we need to align a group of protein or DNA sequences

More information

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling Professor Erik Sudderth Brown University Computer Science October 27, 2016 Some figures and materials courtesy

More information

Sequential Monte Carlo Methods in High Dimensions

Sequential Monte Carlo Methods in High Dimensions Sequential Monte Carlo Methods in High Dimensions Alexandros Beskos Statistical Science, UCL Oxford, 24th September 2012 Joint work with: Dan Crisan, Ajay Jasra, Nik Kantas, Andrew Stuart Imperial College,

More information

Examples of Adaptive MCMC

Examples of Adaptive MCMC Examples of Adaptive MCMC by Gareth O. Roberts * and Jeffrey S. Rosenthal ** (September 2006; revised January 2008.) Abstract. We investigate the use of adaptive MCMC algorithms to automatically tune the

More information

Sequential Monte Carlo Samplers for Applications in High Dimensions

Sequential Monte Carlo Samplers for Applications in High Dimensions Sequential Monte Carlo Samplers for Applications in High Dimensions Alexandros Beskos National University of Singapore KAUST, 26th February 2014 Joint work with: Dan Crisan, Ajay Jasra, Nik Kantas, Alex

More information

Markov chain Monte Carlo methods in atmospheric remote sensing

Markov chain Monte Carlo methods in atmospheric remote sensing 1 / 45 Markov chain Monte Carlo methods in atmospheric remote sensing Johanna Tamminen johanna.tamminen@fmi.fi ESA Summer School on Earth System Monitoring and Modeling July 3 Aug 11, 212, Frascati July,

More information

Introduction to Markov Chain Monte Carlo & Gibbs Sampling

Introduction to Markov Chain Monte Carlo & Gibbs Sampling Introduction to Markov Chain Monte Carlo & Gibbs Sampling Prof. Nicholas Zabaras Sibley School of Mechanical and Aerospace Engineering 101 Frank H. T. Rhodes Hall Ithaca, NY 14853-3801 Email: zabaras@cornell.edu

More information

Inference in state-space models with multiple paths from conditional SMC

Inference in state-space models with multiple paths from conditional SMC Inference in state-space models with multiple paths from conditional SMC Sinan Yıldırım (Sabancı) joint work with Christophe Andrieu (Bristol), Arnaud Doucet (Oxford) and Nicolas Chopin (ENSAE) September

More information

Kernel Adaptive Metropolis-Hastings

Kernel Adaptive Metropolis-Hastings Kernel Adaptive Metropolis-Hastings Arthur Gretton,?? Gatsby Unit, CSML, University College London NIPS, December 2015 Arthur Gretton (Gatsby Unit, UCL) Kernel Adaptive Metropolis-Hastings 12/12/2015 1

More information

17 : Markov Chain Monte Carlo

17 : Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo

More information

Surveying the Characteristics of Population Monte Carlo

Surveying the Characteristics of Population Monte Carlo International Research Journal of Applied and Basic Sciences 2013 Available online at www.irjabs.com ISSN 2251-838X / Vol, 7 (9): 522-527 Science Explorer Publications Surveying the Characteristics of

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet. Stat 535 C - Statistical Computing & Monte Carlo Methods Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Introduction to Markov chain Monte Carlo The Gibbs Sampler Examples Overview of the Lecture

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos & Aarti Singh Contents Markov Chain Monte Carlo Methods Goal & Motivation Sampling Rejection Importance Markov

More information

18 : Advanced topics in MCMC. 1 Gibbs Sampling (Continued from the last lecture)

18 : Advanced topics in MCMC. 1 Gibbs Sampling (Continued from the last lecture) 10-708: Probabilistic Graphical Models 10-708, Spring 2014 18 : Advanced topics in MCMC Lecturer: Eric P. Xing Scribes: Jessica Chemali, Seungwhan Moon 1 Gibbs Sampling (Continued from the last lecture)

More information

Kernel adaptive Sequential Monte Carlo

Kernel adaptive Sequential Monte Carlo Kernel adaptive Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) December 7, 2015 1 / 36 Section 1 Outline

More information

Computer Vision Group Prof. Daniel Cremers. 11. Sampling Methods: Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 11. Sampling Methods: Markov Chain Monte Carlo Group Prof. Daniel Cremers 11. Sampling Methods: Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative

More information

Markov Chain Monte Carlo Methods for Stochastic Optimization

Markov Chain Monte Carlo Methods for Stochastic Optimization Markov Chain Monte Carlo Methods for Stochastic Optimization John R. Birge The University of Chicago Booth School of Business Joint work with Nicholas Polson, Chicago Booth. JRBirge U of Toronto, MIE,

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Particle Filtering Approaches for Dynamic Stochastic Optimization

Particle Filtering Approaches for Dynamic Stochastic Optimization Particle Filtering Approaches for Dynamic Stochastic Optimization John R. Birge The University of Chicago Booth School of Business Joint work with Nicholas Polson, Chicago Booth. JRBirge I-Sim Workshop,

More information

dans les modèles à vraisemblance non explicite par des algorithmes gradient-proximaux perturbés

dans les modèles à vraisemblance non explicite par des algorithmes gradient-proximaux perturbés Inférence pénalisée dans les modèles à vraisemblance non explicite par des algorithmes gradient-proximaux perturbés Gersende Fort Institut de Mathématiques de Toulouse, CNRS and Univ. Paul Sabatier Toulouse,

More information

I. Bayesian econometrics

I. Bayesian econometrics I. Bayesian econometrics A. Introduction B. Bayesian inference in the univariate regression model C. Statistical decision theory D. Large sample results E. Diffuse priors F. Numerical Bayesian methods

More information

Consistency of the maximum likelihood estimator for general hidden Markov models

Consistency of the maximum likelihood estimator for general hidden Markov models Consistency of the maximum likelihood estimator for general hidden Markov models Jimmy Olsson Centre for Mathematical Sciences Lund University Nordstat 2012 Umeå, Sweden Collaborators Hidden Markov models

More information

Winter 2019 Math 106 Topics in Applied Mathematics. Lecture 9: Markov Chain Monte Carlo

Winter 2019 Math 106 Topics in Applied Mathematics. Lecture 9: Markov Chain Monte Carlo Winter 2019 Math 106 Topics in Applied Mathematics Data-driven Uncertainty Quantification Yoonsang Lee (yoonsang.lee@dartmouth.edu) Lecture 9: Markov Chain Monte Carlo 9.1 Markov Chain A Markov Chain Monte

More information

Approximate Bayesian Computation and Particle Filters

Approximate Bayesian Computation and Particle Filters Approximate Bayesian Computation and Particle Filters Dennis Prangle Reading University 5th February 2014 Introduction Talk is mostly a literature review A few comments on my own ongoing research See Jasra

More information

Simulation - Lectures - Part III Markov chain Monte Carlo

Simulation - Lectures - Part III Markov chain Monte Carlo Simulation - Lectures - Part III Markov chain Monte Carlo Julien Berestycki Part A Simulation and Statistical Programming Hilary Term 2018 Part A Simulation. HT 2018. J. Berestycki. 1 / 50 Outline Markov

More information

Chapter 7. Markov chain background. 7.1 Finite state space

Chapter 7. Markov chain background. 7.1 Finite state space Chapter 7 Markov chain background A stochastic process is a family of random variables {X t } indexed by a varaible t which we will think of as time. Time can be discrete or continuous. We will only consider

More information

Bayesian Methods with Monte Carlo Markov Chains II

Bayesian Methods with Monte Carlo Markov Chains II Bayesian Methods with Monte Carlo Markov Chains II Henry Horng-Shing Lu Institute of Statistics National Chiao Tung University hslu@stat.nctu.edu.tw http://tigpbp.iis.sinica.edu.tw/courses.htm 1 Part 3

More information

The two subset recurrent property of Markov chains

The two subset recurrent property of Markov chains The two subset recurrent property of Markov chains Lars Holden, Norsk Regnesentral Abstract This paper proposes a new type of recurrence where we divide the Markov chains into intervals that start when

More information

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R. Ergodic Theorems Samy Tindel Purdue University Probability Theory 2 - MA 539 Taken from Probability: Theory and examples by R. Durrett Samy T. Ergodic theorems Probability Theory 1 / 92 Outline 1 Definitions

More information

Quantitative Non-Geometric Convergence Bounds for Independence Samplers

Quantitative Non-Geometric Convergence Bounds for Independence Samplers Quantitative Non-Geometric Convergence Bounds for Independence Samplers by Gareth O. Roberts * and Jeffrey S. Rosenthal ** (September 28; revised July 29.) 1. Introduction. Markov chain Monte Carlo (MCMC)

More information

Adaptive HMC via the Infinite Exponential Family

Adaptive HMC via the Infinite Exponential Family Adaptive HMC via the Infinite Exponential Family Arthur Gretton Gatsby Unit, CSML, University College London RegML, 2017 Arthur Gretton (Gatsby Unit, UCL) Adaptive HMC via the Infinite Exponential Family

More information

Infinite-State Markov-switching for Dynamic. Volatility Models : Web Appendix

Infinite-State Markov-switching for Dynamic. Volatility Models : Web Appendix Infinite-State Markov-switching for Dynamic Volatility Models : Web Appendix Arnaud Dufays 1 Centre de Recherche en Economie et Statistique March 19, 2014 1 Comparison of the two MS-GARCH approximations

More information

Bayesian Inference and MCMC

Bayesian Inference and MCMC Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the

More information

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 15-7th March Arnaud Doucet

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 15-7th March Arnaud Doucet Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 15-7th March 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Mixture and composition of kernels. Hybrid algorithms. Examples Overview

More information

Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation. Luke Tierney Department of Statistics & Actuarial Science University of Iowa

Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation. Luke Tierney Department of Statistics & Actuarial Science University of Iowa Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation Luke Tierney Department of Statistics & Actuarial Science University of Iowa Basic Ratio of Uniforms Method Introduced by Kinderman and

More information

Multilevel Sequential 2 Monte Carlo for Bayesian Inverse Problems

Multilevel Sequential 2 Monte Carlo for Bayesian Inverse Problems Jonas Latz 1 Multilevel Sequential 2 Monte Carlo for Bayesian Inverse Problems Jonas Latz Technische Universität München Fakultät für Mathematik Lehrstuhl für Numerische Mathematik jonas.latz@tum.de November

More information

LOWER BOUNDS ON THE CONVERGENCE RATES OF ADAPTIVE MCMC METHODS. By Scott C. Schmidler and Dawn B. Woodard Duke University and Cornell University

LOWER BOUNDS ON THE CONVERGENCE RATES OF ADAPTIVE MCMC METHODS. By Scott C. Schmidler and Dawn B. Woodard Duke University and Cornell University Submitted to the Annals of Statistics LOWER BOUNDS ON THE CONVERGENCE RATES OF ADAPTIVE MCMC METHODS By Scott C. Schmidler and Dawn B. Woodard Duke University and Cornell University We consider the convergence

More information

Markov Chain Monte Carlo Methods for Stochastic

Markov Chain Monte Carlo Methods for Stochastic Markov Chain Monte Carlo Methods for Stochastic Optimization i John R. Birge The University of Chicago Booth School of Business Joint work with Nicholas Polson, Chicago Booth. JRBirge U Florida, Nov 2013

More information

SMC 2 : an efficient algorithm for sequential analysis of state-space models

SMC 2 : an efficient algorithm for sequential analysis of state-space models SMC 2 : an efficient algorithm for sequential analysis of state-space models N. CHOPIN 1, P.E. JACOB 2, & O. PAPASPILIOPOULOS 3 1 ENSAE-CREST 2 CREST & Université Paris Dauphine, 3 Universitat Pompeu Fabra

More information

Markov Chains and MCMC

Markov Chains and MCMC Markov Chains and MCMC Markov chains Let S = {1, 2,..., N} be a finite set consisting of N states. A Markov chain Y 0, Y 1, Y 2,... is a sequence of random variables, with Y t S for all points in time

More information

A regeneration proof of the central limit theorem for uniformly ergodic Markov chains

A regeneration proof of the central limit theorem for uniformly ergodic Markov chains A regeneration proof of the central limit theorem for uniformly ergodic Markov chains By AJAY JASRA Department of Mathematics, Imperial College London, SW7 2AZ, London, UK and CHAO YANG Department of Mathematics,

More information

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain

More information

A framework for adaptive Monte-Carlo procedures

A framework for adaptive Monte-Carlo procedures A framework for adaptive Monte-Carlo procedures Jérôme Lelong (with B. Lapeyre) http://www-ljk.imag.fr/membres/jerome.lelong/ Journées MAS Bordeaux Friday 3 September 2010 J. Lelong (ENSIMAG LJK) Journées

More information

Bayesian Computation in Color-Magnitude Diagrams

Bayesian Computation in Color-Magnitude Diagrams Bayesian Computation in Color-Magnitude Diagrams SA, AA, PT, EE, MCMC and ASIS in CMDs Paul Baines Department of Statistics Harvard University October 19, 2009 Overview Motivation and Introduction Modelling

More information

arxiv: v4 [math.st] 15 Jan 2014

arxiv: v4 [math.st] 15 Jan 2014 The Annals of Applied Probability 2014, Vol. 24, No. 1, 34 53 DOI: 10.1214/12-AAP913 c Institute of Mathematical Statistics, 2014 arxiv:1110.4025v4 [math.st] 15 Jan 2014 THE WANG LANDAU ALGORITHM REACHES

More information

CONVERGENCE OF THE EQUI-ENERGY SAMPLER AND ITS APPLICATION TO THE ISING MODEL

CONVERGENCE OF THE EQUI-ENERGY SAMPLER AND ITS APPLICATION TO THE ISING MODEL Statistica Sinica 21 (2011), 1687-1711 doi:http://dx.doi.org/10.5705/ss.2009.282 CONVERGENCE OF THE EQUI-ENERGY SAMPLER AND ITS APPLICATION TO THE ISING MODEL Xia Hua and S. C. Kou Massachusetts Institute

More information

Coupling and Ergodicity of Adaptive MCMC

Coupling and Ergodicity of Adaptive MCMC Coupling and Ergodicity of Adaptive MCMC by Gareth O. Roberts* and Jeffrey S. Rosenthal** (March 2005; last revised March 2007.) Abstract. We consider basic ergodicity properties of adaptive MCMC algorithms

More information

Adaptively scaling the Metropolis algorithm using expected squared jumped distance

Adaptively scaling the Metropolis algorithm using expected squared jumped distance Adaptively scaling the Metropolis algorithm using expected squared jumped distance Cristian Pasarica Andrew Gelman January 25, 2005 Abstract Using existing theory on efficient jumping rules and on adaptive

More information

Asymptotics and Simulation of Heavy-Tailed Processes

Asymptotics and Simulation of Heavy-Tailed Processes Asymptotics and Simulation of Heavy-Tailed Processes Department of Mathematics Stockholm, Sweden Workshop on Heavy-tailed Distributions and Extreme Value Theory ISI Kolkata January 14-17, 2013 Outline

More information

Computer intensive statistical methods

Computer intensive statistical methods Lecture 13 MCMC, Hybrid chains October 13, 2015 Jonas Wallin jonwal@chalmers.se Chalmers, Gothenburg university MH algorithm, Chap:6.3 The metropolis hastings requires three objects, the distribution of

More information

Auxiliary Particle Methods

Auxiliary Particle Methods Auxiliary Particle Methods Perspectives & Applications Adam M. Johansen 1 adam.johansen@bristol.ac.uk Oxford University Man Institute 29th May 2008 1 Collaborators include: Arnaud Doucet, Nick Whiteley

More information

Semi-Parametric Importance Sampling for Rare-event probability Estimation

Semi-Parametric Importance Sampling for Rare-event probability Estimation Semi-Parametric Importance Sampling for Rare-event probability Estimation Z. I. Botev and P. L Ecuyer IMACS Seminar 2011 Borovets, Bulgaria Semi-Parametric Importance Sampling for Rare-event probability

More information

On Reparametrization and the Gibbs Sampler

On Reparametrization and the Gibbs Sampler On Reparametrization and the Gibbs Sampler Jorge Carlos Román Department of Mathematics Vanderbilt University James P. Hobert Department of Statistics University of Florida March 2014 Brett Presnell Department

More information

A Repelling-Attracting Metropolis Algorithm for Multimodality

A Repelling-Attracting Metropolis Algorithm for Multimodality A Repelling-Attracting Metropolis Algorithm for Multimodality arxiv:1601.05633v5 [stat.me] 20 Oct 2017 Hyungsuk Tak Statistical and Applied Mathematical Sciences Institute Xiao-Li Meng Department of Statistics,

More information

Markov Chain Monte Carlo Inference. Siamak Ravanbakhsh Winter 2018

Markov Chain Monte Carlo Inference. Siamak Ravanbakhsh Winter 2018 Graphical Models Markov Chain Monte Carlo Inference Siamak Ravanbakhsh Winter 2018 Learning objectives Markov chains the idea behind Markov Chain Monte Carlo (MCMC) two important examples: Gibbs sampling

More information

Sampling from complex probability distributions

Sampling from complex probability distributions Sampling from complex probability distributions Louis J. M. Aslett (louis.aslett@durham.ac.uk) Department of Mathematical Sciences Durham University UTOPIAE Training School II 4 July 2017 1/37 Motivation

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo Chapter 5 Markov Chain Monte Carlo MCMC is a kind of improvement of the Monte Carlo method By sampling from a Markov chain whose stationary distribution is the desired sampling distributuion, it is possible

More information

ABC methods for phase-type distributions with applications in insurance risk problems

ABC methods for phase-type distributions with applications in insurance risk problems ABC methods for phase-type with applications problems Concepcion Ausin, Department of Statistics, Universidad Carlos III de Madrid Joint work with: Pedro Galeano, Universidad Carlos III de Madrid Simon

More information

Stochastic Approximation in Monte Carlo Computation

Stochastic Approximation in Monte Carlo Computation Stochastic Approximation in Monte Carlo Computation Faming Liang, Chuanhai Liu and Raymond J. Carroll 1 June 26, 2006 Abstract The Wang-Landau algorithm is an adaptive Markov chain Monte Carlo algorithm

More information

University of Toronto Department of Statistics

University of Toronto Department of Statistics Learn From Thy Neighbor: Parallel-Chain and Regional Adaptive MCMC by V. Radu Craiu Department of Statistics University of Toronto and Jeffrey S. Rosenthal Department of Statistics University of Toronto

More information

On the Containment Condition for Adaptive Markov Chain Monte Carlo Algorithms

On the Containment Condition for Adaptive Markov Chain Monte Carlo Algorithms On the Containment Condition for Adaptive Markov Chain Monte Carlo Algorithms Yan Bai, Gareth O. Roberts, and Jeffrey S. Rosenthal Last revised: July, 2010 Abstract This paper considers ergodicity properties

More information

Computer Vision Group Prof. Daniel Cremers. 14. Sampling Methods

Computer Vision Group Prof. Daniel Cremers. 14. Sampling Methods Prof. Daniel Cremers 14. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric

More information