Stochastic Approximation Monte Carlo and Its Applications


1 Stochastic Approximation Monte Carlo and Its Applications
Faming Liang
Department of Statistics, Texas A&M University

2
1. Liang, F., Liu, C. and Carroll, R.J. (2007) Stochastic approximation in Monte Carlo computation. JASA, 102.
2. Liang, F. (2007) Annealing stochastic approximation Monte Carlo for neural network training. Machine Learning, 68.
3. Liang, F. (2007) Continuous contour Monte Carlo for marginal density estimation with an application to a spatial statistical model. JCGS, 16(3).
4. Liang, F. (2007) Improving SAMC using smoothing methods: theory and applications. Annals of Statistics, to appear.
5. Cheon, S. and Liang, F. (2008) Phylogenetic tree reconstruction using stochastic approximation Monte Carlo. BioSystems, 91.
6. Liang, F. (2007) Improving stochastic approximation Markov chain Monte Carlo by trajectory averaging. Submitted to Bernoulli.

3
7. Liang, F., Chen, M.-H. and Ibrahim, J.G. (2007) SAMC for Monte Carlo integration with applications to high dimensional regression problems. Submitted to Statistica Sinica.

4 Stochastic Approximation Monte Carlo: Problems
Two motivating examples
Example 1: Suppose we are interested in sampling from the mixture Gaussian distribution
$$f(x) = \tfrac{1}{3} N_2(\mu_1, \Sigma_1) + \tfrac{1}{3} N_2(\mu_2, \Sigma_2) + \tfrac{1}{3} N_2(\mu_3, \Sigma_3),$$
where $\mu_1 = (-8, -8)^T$, $\mu_2 = (6, 6)^T$, $\mu_3 = (0, 0)^T$, $\Sigma_1 = (\ldots)$, $\Sigma_2 = (\ldots)$, and $\Sigma_3 = (\ldots)$.

5 Figure 1, panels (a) to (d). Plot (a) shows the contour plot of the density.

6 Example 2: Consider minimizing the following function on $[-1.1, 1.1]^2$:
$$U(x, y) = -\big(x\sin(20y) + y\sin(20x)\big)^2 \cosh\big(\sin(10x)\,x\big) - \big(x\cos(10y) - y\sin(10x)\big)^2 \cosh\big(\cos(20y)\,y\big),$$
whose global minimum value is …, attained at $(x, y) = (\ldots, \ldots)$ and $(1.0445, \ldots)$.
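To get a feel for this rugged landscape, the sketch below evaluates $U$ on a grid over $[-1.1, 1.1]^2$. It assumes the minus signs lost in the extracted formula are restored as $U = -(\ldots)^2\cosh(\ldots) - (x\cos(10y) - y\sin(10x))^2\cosh(\ldots)$, and the grid spacing of 0.005 is an arbitrary choice. Substituting $x \to -x$ leaves both squared terms and both cosh factors unchanged, which is why the global minimum is attained at a mirror-image pair of points. A grid search only approximates the minimizer; it is not one of the methods discussed in these slides.

```python
import numpy as np

def energy(x, y):
    """Example 2 energy U(x, y), with the minus signs restored (an assumption)."""
    term1 = (x * np.sin(20 * y) + y * np.sin(20 * x)) ** 2 * np.cosh(np.sin(10 * x) * x)
    term2 = (x * np.cos(10 * y) - y * np.sin(10 * x)) ** 2 * np.cosh(np.cos(20 * y) * y)
    return -term1 - term2

# Evaluate U on a 441 x 441 grid over [-1.1, 1.1]^2 (spacing 0.005).
grid = np.linspace(-1.1, 1.1, 441)
X, Y = np.meshgrid(grid, grid)        # X varies along columns, Y along rows
U = energy(X, Y)
row, col = np.unravel_index(np.argmin(U), U.shape)
print(f"grid minimum U = {U[row, col]:.4f} at (x, y) = ({X[row, col]:.3f}, {Y[row, col]:.3f})")
```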

7 Figure 2, panels (a) and (b): Grid and contour representation of a function defined on $[-1.1, 1.1]^2$.

8 The above examples can be formulated as simulating from a Boltzmann distribution,
$$f(x) = c\,\psi(x), \quad x \in \mathcal{X}, \qquad (1)$$
where $c$ is a normalizing constant, $\psi(x) = \exp(-U(x)/\tau)$, $\tau$ is called the temperature, and $U(x)$ is called the energy function.
Two basic MCMC algorithms:
(1) The Metropolis-Hastings algorithm (Metropolis et al., 1953; Hastings, 1970).
(2) The Gibbs sampler (Geman and Geman, 1984).

9 Metropolis-Hastings algorithm
(a) Propose a new state $y$ from a proposal distribution $T(x_t \to y)$, where $x_t$ denotes the state of the Markov chain at time $t$.
(b) Accept $y$ with probability
$$\min\left\{\frac{\psi(y)\,T(y \to x_t)}{\psi(x_t)\,T(x_t \to y)},\ 1\right\}.$$
If it is accepted, set $x_{t+1} = y$; otherwise, set $x_{t+1} = x_t$.
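The two steps translate directly into code. The sketch below works on the log scale to avoid overflow of $\psi$; the function names, the Gaussian random-walk proposal, and the standard-normal target ($U(x) = x^2/2$, $\tau = 1$) in the usage lines are illustrative assumptions.

```python
import numpy as np

def mh_accept_prob(log_psi, log_T, x, y):
    """min{psi(y) T(y -> x) / (psi(x) T(x -> y)), 1}, computed on the log scale.

    log_T(a, b) is the log proposal density of moving from a to b (additive
    constants may be dropped, since they cancel in the ratio).
    """
    log_r = log_psi(y) + log_T(y, x) - log_psi(x) - log_T(x, y)
    return float(np.exp(min(log_r, 0.0)))

def metropolis_hastings(log_psi, propose, log_T, x0, n_iter, rng):
    """Run n_iter MH iterations from x0 and return the chain x_0, ..., x_n."""
    x = x0
    chain = [x]
    for _ in range(n_iter):
        y = propose(x, rng)                              # (a) propose y ~ T(x_t -> .)
        if rng.uniform() < mh_accept_prob(log_psi, log_T, x, y):
            x = y                                        # (b) accepted: x_{t+1} = y
        chain.append(x)                                  # rejected: x_{t+1} = x_t
    return np.array(chain)

# Usage: psi(x) = exp(-U(x)/tau) with U(x) = x^2 / 2 and tau = 1 (a standard normal),
# explored with a Gaussian random walk, which is symmetric, so log_T can be constant.
tau = 1.0
log_psi = lambda x: -(0.5 * x ** 2) / tau
propose = lambda x, rng: x + rng.normal(0.0, 1.0)
log_T = lambda a, b: 0.0
chain = metropolis_hastings(log_psi, propose, log_T, 0.0, 20_000, np.random.default_rng(1))
```

With a symmetric proposal the ratio reduces to $\psi(y)/\psi(x_t)$, the original Metropolis rule.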

10 Difficulty
The energy landscape of these systems has a multitude of local minima separated by high energy barriers. The sampler tends to get trapped in one of the local energy minima indefinitely, rendering the simulation ineffective.
Typical problems in scientific computation:
1. Protein folding.
2. Phylogenetic tree reconstruction.
3. Neural networks.
4. Some spatial statistical problems, e.g., the Ising model and disease mapping.

11 Stochastic Approximation Monte Carlo: Literature review
Strategies for improving MCMC
1. The use of auxiliary variables:
Swendsen-Wang algorithm (Swendsen and Wang, 1987)
Parallel tempering (Geyer, 1991)
Simulated tempering (Marinari and Parisi, 1992)
Evolutionary Monte Carlo (Liang and Wong, 2001)
Strength and weakness: The temperature is typically treated as an auxiliary variable. Simulations at high temperatures broaden the sampling space and thus help the system escape from local energy minima.

12 2. The use of past samples:
Multicanonical algorithm (Berg and Neuhaus, 1991)
1/k-ensemble algorithm (Hesselbo and Stinchcombe, 1995; Liang, 2004)
Wang-Landau (WL) algorithm (Wang and Landau, 2001; Liang, 2005)
Dynamic weighting (Wong and Liang, 1997)
Dynamically weighted importance sampling (Liang, 2002)

13 Strength and weakness:
Dynamic weighting: the variability of the weights is too high.
Multicanonical and related algorithms: they are usually used for discrete systems. In the multicanonical algorithm, the trial distribution is defined as
$$f^*(x) \propto \frac{1}{\#\{y : U(y) = U(x)\}},$$
where $x$ and $y$ take values in a discrete set. There is no rigorous theory to support their convergence.

14 Stochastic Approximation Monte Carlo: Algorithm
Basic idea
Partition the sample space into subregions $E_1, \ldots, E_m$ with $\bigcup_{i=1}^m E_i = \mathcal{X}$ and $E_i \cap E_j = \emptyset$ for $i \neq j$. Let $g_i = \int_{E_i} \psi(x)\,dx$, and choose $\pi = (\pi_1, \ldots, \pi_m)$ with $\pi_i > 0$ and $\sum_i \pi_i = 1$. Sample from the distribution
$$p_\theta(x) \propto \sum_{i=1}^m \frac{\psi(x)}{e^{\theta^{(i)}}}\, I(x \in E_i).$$
Under $p_\theta$, subregion $E_i$ has probability proportional to $g_i e^{-\theta^{(i)}}$. Hence, if $\theta^{(i)} = \log(g_i/\pi_i)$ for all $i$, sampling from $p_\theta(x)$ results in a random walk in the space of subregions (viewing each subregion as a single point), with each subregion sampled with probability $\pi_i$. Therefore, sampling from $p_\theta(x)$ can avoid the local-trap problem encountered in sampling from $f(x)$.
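A small discrete check of this claim, with a sum in place of the integral: the sketch computes the subregion probabilities under $p_\theta$ and confirms that $\theta^{(i)} = \log(g_i/\pi_i)$ gives subregion $E_i$ probability $\pi_i$. The six states, their $\psi$ values, the partition, and $\pi$ are hypothetical. It also illustrates that adding the same constant to every $\theta^{(i)}$ leaves $p_\theta$ unchanged.

```python
import numpy as np

# Hypothetical discrete example: 6 states, unnormalized psi, m = 3 subregions.
psi = np.array([1.0, 5.0, 2.0, 50.0, 0.5, 20.0])
J = np.array([0, 0, 1, 1, 2, 2])             # J[x] = index of the subregion containing state x
m = 3
pi = np.array([0.2, 0.3, 0.5])               # desired sampling distribution

g = np.bincount(J, weights=psi, minlength=m)  # g_i = sum of psi over E_i

def region_probs(theta):
    """P(x in E_i) when x ~ p_theta(x), proportional to psi(x) exp(-theta^{(J(x))})."""
    p = psi * np.exp(-theta[J])
    p /= p.sum()
    return np.bincount(J, weights=p, minlength=m)

theta_star = np.log(g / pi)                   # theta^{(i)} = log(g_i / pi_i)
print(region_probs(theta_star))               # equals pi up to rounding
```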

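15 Algorithm setting
Condition (A1): The sequence $\{a_k\}_{k=0}^\infty$ is positive and non-increasing, and satisfies
$$\sum_{k=1}^\infty a_k = \infty, \qquad \lim_{k\to\infty} k\,a_k = \infty, \qquad \lim_{k\to\infty}\big(a_{k+1}^{-1} - a_k^{-1}\big) = 0, \qquad (2)$$
and, for some $\tau \in (0, 1)$,
$$\sum_{k=1}^\infty \frac{a_k^{(1+\tau)/2}}{\sqrt{k}} < \infty. \qquad (3)$$
For example, $a_k = 1/k^\eta$ with $\eta \in (1/2, 1)$ satisfies (2); $\eta = 1$ is excluded, since then $k a_k = 1$. Condition (3) then holds for any $\tau > 1/\eta - 1$.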
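For the polynomial gain $a_k = k^{-\eta}$ the conditions can be checked directly. The derivation below assumes that (3) carries the factor $1/\sqrt{k}$, which is what the stated threshold $\tau > 1/\eta - 1$ requires; the second bound uses the mean value theorem, $(k+1)^\eta - k^\eta = \eta\,\xi^{\eta-1}$ for some $\xi \in (k, k+1)$.

```latex
% Gain sequence a_k = k^{-\eta} with 1/2 < \eta < 1.
\sum_{k=1}^{\infty} a_k = \sum_{k=1}^{\infty} k^{-\eta} = \infty, \qquad
k\,a_k = k^{1-\eta} \to \infty, \qquad
a_{k+1}^{-1} - a_k^{-1} = (k+1)^{\eta} - k^{\eta} \le \eta\,k^{\eta-1} \to 0,

\sum_{k=1}^{\infty} \frac{a_k^{(1+\tau)/2}}{\sqrt{k}}
  = \sum_{k=1}^{\infty} k^{-\frac{\eta(1+\tau)+1}{2}} < \infty
  \iff \frac{\eta(1+\tau)+1}{2} > 1
  \iff \tau > \frac{1}{\eta} - 1 .

% Example: \eta = 0.6 requires \tau > 2/3. At \eta = 1 one gets k a_k = 1 and
% a_{k+1}^{-1} - a_k^{-1} = 1, so the second and third conditions in (2) fail.
```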

16 Algorithm
1. (Sampling) Draw a sample $x_{k+1}$ with a single MH iteration whose invariant distribution is
$$p_{\theta_k}(x) = \frac{1}{Z_k}\sum_{i=1}^m \frac{\psi(x)}{e^{\theta_k^{(i)}}}\, I(x \in E_i).$$
2. (Weight updating) Set
$$\theta^* = \theta_k + a_{k+1} H(\theta_k, x_{k+1}),$$
where $H(\theta_k, x_{k+1}) = e_{x_{k+1}} - \pi$ and $e_{x_{k+1}} = \big(I(x_{k+1} \in E_1), \ldots, I(x_{k+1} \in E_m)\big)$.
3. (Varying truncation) If $\theta^* \in \Theta$, set $\theta_{k+1} = \theta^*$. Otherwise, set $\theta_{k+1} = \theta^* + c$, where $c$ is chosen such that $\theta^* + c \in \Theta$. Adding the same constant $c$ to every component of $\theta$ leaves $p_\theta(x)$ unchanged.
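The three steps can be sketched on a toy version of the 10-state example from the demonstration slide (partition $E_1 = \{8\}$, $E_2 = \{2\}$, $E_3 = \{5, 6\}$, $E_4 = \{3, 9\}$, $E_5 = \{1, 4, 7, 10\}$, uniform $\pi$). Several choices here are assumptions, not taken from the slides: the masses are hypothetical, the proposal moves to a neighboring state on a ring (so it is symmetric), the gain is $a_k = (t_0/\max(t_0, k))^\eta$ with $t_0 = 10$ and $\eta = 1$, and the truncation step is replaced by the constant shift that keeps $\theta^{(m)} = 0$, with no bounded $\Theta$ enforced. After a long run, $\theta$ should be close to $\log(g_i/\pi_i) - \log(g_m/\pi_m)$, in line with Theorem 1.

```python
import numpy as np

# Hypothetical unnormalized masses for states 1..10, chosen only so that each
# subregion of the partition below groups states of equal mass.
PSI = {1: 1, 2: 100, 3: 2, 4: 1, 5: 3, 6: 3, 7: 1, 8: 200, 9: 2, 10: 1}
LOG_PSI = {s: np.log(v) for s, v in PSI.items()}
REGION = {8: 0, 2: 1, 5: 2, 6: 2, 3: 3, 9: 3, 1: 4, 4: 4, 7: 4, 10: 4}  # J(x), 0-based
M = 5
PI = np.full(M, 1.0 / M)                       # uniform desired sampling distribution

def samc(n_iter, t0=10.0, eta=1.0, seed=0):
    """Return theta after n_iter SAMC iterations, normalized so theta[M-1] = 0."""
    rng = np.random.default_rng(seed)
    steps = rng.choice(np.array([-1, 1]), size=n_iter)   # symmetric ring proposal
    log_u = np.log(rng.uniform(size=n_iter))
    theta = np.zeros(M)
    x = 1
    for k in range(1, n_iter + 1):
        # 1. Sampling: one MH step with invariant p_theta(x), proportional to
        #    psi(x) exp(-theta[J(x)]).
        y = (x - 1 + int(steps[k - 1])) % 10 + 1
        log_r = (LOG_PSI[y] - theta[REGION[y]]) - (LOG_PSI[x] - theta[REGION[x]])
        if log_u[k - 1] < log_r:
            x = y
        # 2. Weight updating: theta <- theta + a_k (e_x - pi) with the assumed gain a_k.
        a = (t0 / max(t0, k)) ** eta
        theta -= a * PI
        theta[REGION[x]] += a
        # 3. Shift every component by the same constant so that theta[M-1] = 0;
        #    p_theta is unchanged by such a shift.
        theta -= theta[M - 1]
    return theta

g = np.zeros(M)
for s, j in REGION.items():
    g[j] += PSI[s]                              # g_i = total mass of E_i
target = np.log(g / PI) - np.log(g[M - 1] / PI[M - 1])
theta_hat = samc(200_000)
print(np.round(theta_hat, 2), np.round(target, 2))
```

A plain MH sampler with the same ring proposal would rarely cross between states 8 and 2 through the low-mass states in between; the weights $\theta$ push the SAMC chain out of whichever subregion it has been overvisiting.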

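17 Lyapunov condition on $h(\theta)$
Let $\langle x, y \rangle$ denote the Euclidean inner product.
(A2) The function $h: \Theta \to \mathbb{R}^d$ is continuous, and there exists a continuously differentiable function $v: \Theta \to [0, \infty)$ such that
(i) For any integer $M > 0$, the level set $V_M = \{\theta \in \Theta : v(\theta) \le M\} \subset \Theta$ is compact.
(ii) There exists $M_0 > 0$ such that $\mathcal{L} = \{\theta \in \Theta : \langle \nabla v(\theta), h(\theta) \rangle = 0\} \subset \operatorname{int}(V_{M_0})$, and $\langle \nabla v(\theta), h(\theta) \rangle < 0$ for any $\theta \in \Theta \setminus V_{M_0}$, where $\operatorname{int}(A)$ denotes the interior of a set $A$.
(iii) For all $\theta \in \Theta$, $\langle \nabla v(\theta), h(\theta) \rangle \le 0$, and $\operatorname{int}(v(\mathcal{L})) = \emptyset$.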

18 Stability condition on $h(\theta)$
(A3) The mean field function $h(\theta)$ is measurable and locally bounded. There exist a stable matrix $F$ (i.e., all eigenvalues of $F$ have negative real parts), $\gamma > 0$, and $\rho \in (\tau, 1]$ such that for any point $\theta_0 \in \mathcal{L}$,
$$\|h(\theta) - F(\theta - \theta_0)\| \le c_1 \|\theta - \theta_0\|^{1+\rho}, \quad \forall\, \theta \in \{\theta : \|\theta - \theta_0\| \le \gamma\},$$
where $c_1$ is a constant.

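19 Drift condition
(A4) For any $\theta \in \Theta$, the transition kernel $P_\theta$ is irreducible and aperiodic. In addition, there exist a function $V: \mathcal{X} \to [1, \infty)$ and constants $\alpha \ge 2$ and $\beta \in (0, 1]$ such that for any compact subset $\mathcal{K} \subset \Theta$:
(i) There exist a set $C \subset \mathcal{X}$, an integer $l$, constants $0 < \lambda < 1$ and $b, \varsigma, \delta > 0$, and a probability measure $\nu$ such that
$$\sup_{\theta \in \mathcal{K}} P_\theta^l V^\alpha(x) \le \lambda V^\alpha(x) + b\, I(x \in C), \quad \forall x \in \mathcal{X}, \qquad (4)$$
$$\sup_{\theta \in \mathcal{K}} P_\theta V^\alpha(x) \le \varsigma V^\alpha(x), \quad \forall x \in \mathcal{X}, \qquad (5)$$
$$\inf_{\theta \in \mathcal{K}} P_\theta^l(x, A) \ge \delta\, \nu(A), \quad \forall x \in C,\ \forall A \in \mathcal{B}. \qquad (6)$$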

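20 (ii) There exists a constant $c$ such that for all $x \in \mathcal{X}$,
$$\sup_{\theta \in \mathcal{K}} \|H(\theta, x)\| \le c\, V(x), \qquad (7)$$
$$\sup_{(\theta, \theta') \in \mathcal{K} \times \mathcal{K}} \|H(\theta, x) - H(\theta', x)\| \le c\, V(x)\, \|\theta - \theta'\|^\beta. \qquad (8)$$
(iii) There exists a constant $c_2$ such that for all $(\theta, \theta') \in \mathcal{K} \times \mathcal{K}$,
$$\|P_\theta g - P_{\theta'} g\|_V \le c_2 \|g\|_V \|\theta - \theta'\|^\beta, \quad \forall g \in L_V, \qquad (9)$$
$$\|P_\theta g - P_{\theta'} g\|_{V^\alpha} \le c_2 \|g\|_{V^\alpha} \|\theta - \theta'\|^\beta, \quad \forall g \in L_{V^\alpha}. \qquad (10)$$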

21 Theoretical results
Lemma 1. Assume the drift condition (A4) and $\sup_{x \in \mathcal{X}} V(x) < \infty$. Let $\epsilon_k = H(\theta_k, x_{k+1}) - h(\theta_k)$. There exist $\mathbb{R}^d$-valued random processes $\{e_k\}_{k \ge 1}$, $\{\nu_k\}_{k \ge 1}$, and $\{\varsigma_k\}_{k \ge 1}$ defined on a probability space $(\Omega, \mathcal{F}, P)$ such that
(i) $\epsilon_k = e_k + \nu_k + \varsigma_k$.
(ii) $\{e_k\}$ is a martingale difference sequence, and $\frac{1}{\sqrt{n}} \sum_{k=1}^n e_k \to N(0, Q)$ in distribution, where $Q = \lim_{k \to \infty} E(e_k e_k^T)$.
(iii) $E\|\nu_k\| = O\big(a_k^{(1+\tau)/2}\big)$, where $\tau$ is given in condition (A1).
(iv) $\big\|\sum_{k=0}^n a_k \varsigma_k\big\| = O(a_n)$.

22 Theorem 1 (Convergence) (Liang et al., 2007). Let $\alpha_\sigma$ denote the iteration at which the $\sigma$-th truncation occurs in the SAMC simulation. Assume that conditions (A1) and (A2) hold, and that there exists a drift function $V(x)$ such that $\sup_{x \in \mathcal{X}} V(x) < \infty$ and the drift condition (A4) holds. Then there exists a number $\sigma$ such that $\alpha_\sigma < \infty$ a.s., $\alpha_{\sigma+1} = \infty$ a.s., and $\{\theta_k\}$ given by the SAMC algorithm has no truncation for $k \ge \alpha_\sigma$, i.e.,
$$\theta_{k+1} = \theta_k + a_{k+1} H(\theta_k, x_{k+1}), \quad k \ge \alpha_\sigma,$$
and
$$\theta_k^{(i)} \to \begin{cases} c + \log\big(\int_{E_i} \psi(x)\,dx\big) - \log(\pi_i + \nu), & \text{if } E_i \neq \emptyset, \\ -\infty, & \text{if } E_i = \emptyset, \end{cases} \qquad (11)$$
where $\nu = \sum_{j \in \{i : E_i = \emptyset\}} \pi_j / (m - m_0)$ and $m_0$ is the number of empty subregions. The constant $c$ can be determined by imposing an extra constraint on $\theta_k$, e.g., $\theta_k^{(m)} = 0$ for all $k \ge 0$.
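The limit (11) can be evaluated directly when the subregion masses are known, which is handy for checking a SAMC run on a toy problem. The sketch below treats a subregion with zero mass as empty (an assumption; the theorem defines emptiness as $E_i = \emptyset$) and fixes $c$ through $\theta^{(m)} = 0$, so it assumes the last subregion is non-empty. The helper name `samc_limit` is hypothetical.

```python
import numpy as np

def samc_limit(g, pi):
    """Limit of theta_k in (11), normalized so that theta^{(m)} = 0.

    g[i]: integral of psi over E_i (0 marks an empty subregion);
    pi: desired sampling distribution. Assumes the last subregion is non-empty.
    """
    g = np.asarray(g, dtype=float)
    pi = np.asarray(pi, dtype=float)
    empty = g == 0
    m, m0 = len(g), int(empty.sum())
    nu = pi[empty].sum() / (m - m0)       # mass of the empty subregions, spread evenly
    theta = np.full(m, -np.inf)           # empty subregions: theta^{(i)} -> -infinity
    theta[~empty] = np.log(g[~empty]) - np.log(pi[~empty] + nu)
    return theta - theta[-1]              # choose c so that theta^{(m)} = 0
```

For example, `samc_limit([2.0, 0.0, 1.0, 1.0], [0.25] * 4)` spreads the empty subregion's 0.25 over the other three, giving each an effective target of 1/3.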

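23 Theorem 2 (Averaging normality) (Liang, 2007, submitted). Under the conditions of Theorem 1 together with condition (A3), the trajectory average
$$\bar\theta_k = \frac{1}{k} \sum_{i=1}^k \theta_i$$
is asymptotically efficient; that is,
$$\sqrt{k}\,(\bar\theta_k - \theta_0) \to N(0, S) \quad \text{as } k \to \infty,$$
where $\theta_0$ is the limit of $\theta_k$, $S = F^{-1} Q (F^{-1})^T$, $F$ is the stable matrix in (A3), and $Q$ is as defined in Lemma 1.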
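The averaged estimator does not require storing the whole trajectory: a running mean can be updated online with $\bar\theta_k = \bar\theta_{k-1} + (\theta_k - \bar\theta_{k-1})/k$. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def trajectory_average(thetas):
    """Running averages theta_bar_k = (1/k) sum_{i=1}^k theta_i for k = 1..n.

    Uses the recursion theta_bar_k = theta_bar_{k-1} + (theta_k - theta_bar_{k-1}) / k,
    which needs only O(d) memory when run alongside the SAMC iterations.
    """
    thetas = np.asarray(thetas, dtype=float)
    out = np.empty_like(thetas)
    bar = np.zeros(thetas.shape[1:])
    for k, th in enumerate(thetas, start=1):
        bar = bar + (th - bar) / k
        out[k - 1] = bar
    return out
```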

24 Theorem 3 (IWIW property) (Liang et al., 2007, submitted). If the desired sampling distribution is uniform over all the subregions, i.e., $\pi_1 = \cdots = \pi_m = 1/m$, then SAMC is invariant with respect to the importance weights (IWIW).
This theorem implies that the integral $E_f h(x)$ can be estimated online by
$$\widehat{E}_f h(x) = \frac{\sum_{k=1}^n w_k h(x_k)}{\sum_{k=1}^n w_k}, \qquad w_k = \sum_{i=1}^m e^{\theta_k^{(i)}} I(x_k \in E_i).$$
As $n \to \infty$, $\widehat{E}_f h(x) \to E_f h(x)$, for the same reason that the usual importance sampling estimate converges.
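A minimal sketch of the online estimate: each sample carries the log-weight $\theta_k^{(i)}$ of the subregion it falls in. The helper name and its inputs are illustrative; subtracting the largest log-weight before exponentiating is a numerical safeguard that cancels in the ratio.

```python
import numpy as np

def iw_estimate(h_vals, log_w):
    """Estimate E_f h(x) by sum_k w_k h(x_k) / sum_k w_k.

    h_vals[k] = h(x_k); log_w[k] = theta_k^{(i)} for the subregion E_i containing x_k,
    so that w_k = exp(log_w[k]). Subtracting max(log_w) before exponentiating avoids
    overflow and cancels in the ratio.
    """
    h_vals = np.asarray(h_vals, dtype=float)
    log_w = np.asarray(log_w, dtype=float)
    w = np.exp(log_w - log_w.max())
    return float(np.dot(w, h_vals) / w.sum())
```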

25 Implementation issues
1. Sample space partition. The partition can be made according to the goal and the complexity of the problem. For example:
(a) Importance sampling: partition according to the energy function, with a maximum energy difference of 2 within each subregion.
(b) Model selection: partition according to the model index.
2. Desired sampling distribution.
(a) Set $\pi$ to bias the sampling toward low-energy regions if the aim is to minimize the energy function.
(b) Set $\pi$ to be uniform if the aim is estimation.

26 3. Choices of $\eta$, $t_0$, and the number of iterations. The diagnostic statistic is
$$\epsilon_f(E_i) = \begin{cases} \dfrac{\hat\pi_i - (\pi_i + \nu)}{\pi_i + \nu} \times 100\%, & \text{if } E_i \neq \emptyset, \\ 0, & \text{if } E_i = \emptyset, \end{cases} \qquad (12)$$
for $i = 1, \ldots, m$, where $\hat\pi_i$ is the realized sampling frequency of subregion $E_i$. If $\max_i |\epsilon_f(E_i)|$ is large, say, greater than 10%, the convergence of the run should be questioned. In this case, SAMC should be re-run with more iterations, a larger value of $t_0$, or a smaller value of $\eta$.
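The diagnostic is easy to compute once the realized subregion frequencies are recorded. The sketch follows (12) and the 10% rule of thumb; the helper name and the example frequencies are made up for illustration.

```python
import numpy as np

def samc_diagnostic(pi_hat, pi, empty):
    """eps_f(E_i) from (12), in percent.

    pi_hat: realized sampling frequencies of the subregions;
    pi: desired sampling distribution; empty: boolean mask of empty subregions.
    """
    pi_hat = np.asarray(pi_hat, dtype=float)
    pi = np.asarray(pi, dtype=float)
    empty = np.asarray(empty, dtype=bool)
    nu = pi[empty].sum() / (len(pi) - empty.sum())   # mass of empty subregions, spread evenly
    eps = np.zeros(len(pi))                          # empty subregions keep eps = 0
    target = pi[~empty] + nu
    eps[~empty] = (pi_hat[~empty] - target) / target * 100.0
    return eps

eps = samc_diagnostic([0.25, 0.40, 0.35, 0.0], [0.25] * 4, [False, False, False, True])
rerun = np.max(np.abs(eps)) > 10.0                   # the rule of thumb above
```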

27 SAMC Applications: Demonstration
Table 1: The unnormalized mass function of the 10-state distribution (columns: $x$, $f(x)$).
Table 2: Comparison of SAMC and MH for the 10-state example, where the Bias and Standard Error (of the Bias) were calculated based on 100 independent runs. Columns: Algorithm, Bias ($\times 10^{-3}$), Standard Error ($\times 10^{-3}$), CPU time (seconds); rows: SAMC, MH.
The sample space was partitioned according to the mass function into five subregions: $E_1 = \{8\}$, $E_2 = \{2\}$, $E_3 = \{5, 6\}$, $E_4 = \{3, 9\}$, and $E_5 = \{1, 4, 7, 10\}$. The desired sampling distribution is uniform over the 5 subregions.

28 Figure 3: Computational results for the 10-state example. (a) Autocorrelation plot of the MH samples (ACF vs. lag). (b) Autocorrelation plot of the SAMC samples (ACF vs. lag). (c) Log-weight of the SAMC samples vs. iterations (in thousands).

29 Importance Sampling: Spatial Autologistic Models
Let $s = \{s_i : i \in D\}$ denote the observed binary data, where $s_i \in \{-1, +1\}$ is called a spin and $D$ is the set of indices of the spins. Let $|D|$ denote the total number of spins in $D$, and $N(i)$ denote the set of neighbors of spin $i$. The likelihood function of the model is
$$f(s \mid \alpha, \beta) = \frac{1}{\varphi(\alpha, \beta)} \exp\Big\{\alpha \sum_{i \in D} s_i + \frac{\beta}{2} \sum_{i \in D} s_i \Big(\sum_{j \in N(i)} s_j\Big)\Big\}, \quad (\alpha, \beta) \in \Omega, \qquad (13)$$
where
$$\varphi(\alpha, \beta) = \sum_{\text{all possible } s} \exp\Big\{\alpha \sum_{j \in D} s_j + \frac{\beta}{2} \sum_{i \in D} s_i \Big(\sum_{j \in N(i)} s_j\Big)\Big\}.$$
When $\beta$ is large, say, 0.5, the configuration $s$ tends to have large clusters of the same orientation, which fluctuate very slowly.

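30 Methods to resolve the difficulty of evaluating the normalizing constant:
(a) Working with a pseudo-likelihood function (Besag, 1975):
$$PL(\alpha, \beta \mid s) = \prod_{i \in D} \frac{\exp\{s_i(\alpha + \beta \sum_{j \in N(i)} s_j)\}}{\exp\{\alpha + \beta \sum_{j \in N(i)} s_j\} + \exp\{-\alpha - \beta \sum_{j \in N(i)} s_j\}}. \qquad (14)$$
The resulting estimate is called the MPLE.
(b) Working with a Monte Carlo log-likelihood (up to a constant) (Geyer and Thompson, 1992):
$$L_n(\alpha, \beta \mid s) = \alpha \sum_{i \in D} s_i + \frac{\beta}{2} \sum_{i \in D} s_i \Big(\sum_{j \in N(i)} s_j\Big) - \log\Big[\frac{1}{n} \sum_{k=1}^n \frac{\psi(\alpha, \beta, s^{(k)})}{\psi(\alpha^*, \beta^*, s^{(k)})}\Big], \qquad (15)$$
where $\psi(\alpha, \beta, s)$ is the unnormalized likelihood in (13) and $s^{(1)}, \ldots, s^{(n)}$ are simulated from $f(s \mid \alpha^*, \beta^*)$ at a reference point $(\alpha^*, \beta^*)$. The resulting estimate is called the MCMLE.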
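A minimal sketch of the log pseudo-likelihood (14) for spins $s_i \in \{-1, +1\}$ on a rectangular lattice. The four-nearest-neighbor system with a free boundary is an assumption for illustration (a county map is not a regular grid), and the helper names are hypothetical. Each factor of (14) equals $\exp(s_i t_i)/(e^{t_i} + e^{-t_i})$ with $t_i = \alpha + \beta \sum_{j \in N(i)} s_j$, and the denominator is computed stably with `np.logaddexp`. Maximizing this function over $(\alpha, \beta)$ with a generic optimizer would give the MPLE.

```python
import numpy as np

def neighbour_sum(s):
    """Sum of the 4 nearest neighbours of each site of a 2-D lattice (free boundary)."""
    p = np.pad(s, 1)                              # zero padding: missing neighbours add 0
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]

def log_pseudo_likelihood(alpha, beta, s):
    """log PL(alpha, beta | s) from (14) for a lattice of spins in {-1, +1}."""
    t = alpha + beta * neighbour_sum(s)           # t_i = alpha + beta * sum_{j in N(i)} s_j
    return float(np.sum(s * t - np.logaddexp(t, -t)))
```

Flipping every spin and the sign of $\alpha$ leaves the value unchanged, a quick sanity check on any implementation.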

31 A natural choice for the trial distribution is a mixture distribution of the form
$$p_{\text{mix}}(s) = \frac{1}{m} \sum_{j=1}^m p(s \mid \alpha_j, \beta_j), \qquad (16)$$
where the parameter values $(\alpha_1, \beta_1), \ldots, (\alpha_m, \beta_m)$ are prespecified. To complete this idea, the key is to estimate $\varphi(\alpha_1, \beta_1), \ldots, \varphi(\alpha_m, \beta_m)$ (up to a common multiplicative constant).

32 Table 3: Comparison of the accuracy of the SAMC estimate and single-MCMLEs for the U.S. cancer data. Columns (estimate): single-MCMLE, SAMC; rows: RMSE($T_1^{sim}$), RMSE($T_2^{sim}$). Here $T_1 = \sum_i s_i$, $T_2 = \sum_i s_i (\sum_{j \in N(i)} s_j)/2$, and
$$\mathrm{RMSE}(T_i^{sim}) = \sqrt{\sum_{k=1}^5 \big(T_i^{sim,k} - T_i^{obs}\big)^2 / 5}, \quad i = 1, 2,$$
where $T_i^{sim,k}$ denotes the value of $T_i^{sim}$ calculated based on the $k$-th estimate of $(\alpha, \beta)$.

33 Figure 4: The U.S. cancer mortality rate data (panels: true observations; fitted mortality rate). (a) The mortality map of liver and gallbladder cancer (including bile ducts) for white males during the decade … . The black squares denote the counties with a high cancer mortality rate, and the white squares denote the counties with a low cancer mortality rate. (b) Fitted cancer mortality rates. The cancer mortality rate of each county is represented by the gray level of the corresponding square.

34 Figure 5: Computational results of SAMC. (a) Estimate of $\log \varphi(\alpha, \beta)$ on a lattice with $\alpha \in \{-0.5, -0.45, \ldots, 0.5\}$ and $\beta \in \{0, 0.05, \ldots, 0.5\}$. (b) Estimate of $P(s_i = +1 \mid \alpha, \beta)$ on a lattice with $\alpha \in \{-0.49, -0.47, \ldots, 0.49\}$ and $\beta \in \{0.01, 0.03, \ldots, 0.49\}$.

35 Figure 6: Comparison of SAMC and RJMCMC. Plots (a) and (b) show, respectively, the sample paths of $\alpha$ and $\beta$ in a run of SAMC. Plots (c) and (d) show, respectively, the sample paths of $\alpha$ and $\beta$ in a run of RJMCMC. (x-axis: iteration, from 0 to $10^8$.)

36 Kernel Smoothing: SSAMC
Motivation for smoothing SAMC
Intuitively, $x_t$ may contain some information about its neighboring subregions, so visits to the neighboring subregions should also be penalized to some extent in the next iteration. The efficiency of SAMC can be improved by including a smoothing step at each iteration, which distributes the information contained in each sample to its neighboring subregions. The new algorithm is called smoothing SAMC, or SSAMC for short.

37 Motivating examples
For many problems, $E_1, \ldots, E_m$ can be regarded as a sequence of naturally ordered categories. For example:
Model selection: The model space $\mathcal{X}$ can be partitioned according to the model index, and the subregions can be naturally ordered by the number of parameters in each model.
Function optimization: The solution space $\mathcal{X}$ can be partitioned according to the objective function, and the subregions can also be naturally ordered by the value of the objective function.

38 Suppose that $x_k^{(1)}, \ldots, x_k^{(\kappa)}$ are samples generated using an MH kernel with the invariant distribution $p_{\theta_k}(x)$. Since $\kappa$ is usually a small number, say, 10 to 20, the samples form a sparse frequency vector $e_k = (e_k^{(1)}, \ldots, e_k^{(m)})$ with $e_k^{(i)} = \sum_{l=1}^\kappa I(x_k^{(l)} \in E_i)$.

39 The frequency estimate can be improved by a smoothing method. The Nadaraya-Watson (NW) kernel estimator works as follows:
$$\hat p_k^{(i)} = \frac{\sum_{j=1}^m W\big(\frac{\Lambda(i-j)}{m h_k}\big)\, e_k^{(j)}/\kappa}{\sum_{j=1}^m W\big(\frac{\Lambda(i-j)}{m h_k}\big)}, \qquad (17)$$
where $W(z)$ is a kernel function with bandwidth $h_k$, and $\Lambda$ is a rough estimate of the range of $\lambda(x)$, $x \in \mathcal{X}$. Assuming that $W(z)$ has bounded support, one can show that $\hat p_k^{(i)} - e_k^{(i)}/\kappa = O(h_k)$.
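Estimator (17) is a kernel-weighted average of the frequencies of neighboring subregions. The sketch below assumes an Epanechnikov kernel for $W$ (it has the bounded support the $O(h_k)$ statement relies on); any bounded-support kernel would do, and the bandwidth $h_k$ and $\Lambda$ are left as free inputs. Function names are illustrative.

```python
import numpy as np

def epanechnikov(z):
    """Epanechnikov kernel, supported on |z| < 1 (an assumed choice of W)."""
    return np.where(np.abs(z) < 1.0, 0.75 * (1.0 - z ** 2), 0.0)

def nw_smooth(e, kappa, h, Lam):
    """Nadaraya-Watson estimate (17) of the subregion sampling frequencies.

    e: length-m count vector (e_k^{(1)}, ..., e_k^{(m)}) from kappa MH samples;
    h: bandwidth h_k; Lam: rough estimate of the range of lambda(x).
    """
    e = np.asarray(e, dtype=float)
    m = len(e)
    idx = np.arange(m)
    W = epanechnikov(Lam * (idx[:, None] - idx[None, :]) / (m * h))   # W[i, j]
    return (W @ (e / kappa)) / W.sum(axis=1)
```

With all $\kappa = 10$ samples in the middle of five subregions, `nw_smooth([0, 0, 10, 0, 0], 10, h=1.0, Lam=2.5)` spreads the unit frequency as 0.3, 0.4, 0.3 over the three central subregions, whereas a bandwidth small enough that $\Lambda/(m h_k) \ge 1$ returns $e_k/\kappa$ unchanged.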

40 Algorithm SSAMC
(a) (Sampling) Simulate samples $x_k^{(1)}, \ldots, x_k^{(\kappa)}$ using the MH algorithm with the invariant distribution $p_{\theta_k}(x)$, drawing each $x_k^{(l)}$ with the proposal distribution $q(x_k^{(l-1)}, \cdot)$, where $x_k^{(0)} = x_{k-1}^{(\kappa)}$.
(b) (Smoothing) Calculate $\hat p_k = (\hat p_k^{(1)}, \ldots, \hat p_k^{(m)})$ using a kernel smoothing method.
(c) (Weight updating) Set
$$\theta^* = \theta_k + a_{k+1}(\hat p_k - \pi). \qquad (18)$$
If $\theta^* \in \Theta$, set $\theta_{k+1} = \theta^*$; otherwise, set $\theta_{k+1} = \theta^* + c$, where $c$ can be any number that satisfies $\theta^* + c \in \Theta$.

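41 Change-Point Identification: SSAMC
Notation
Let $Z = (z_1, z_2, \ldots, z_n)$ denote a sequence of independent observations. Let $\vartheta$ be a binary indicator vector whose ones mark the change-point positions, and let $\vartheta^{(k)}$ denote a configuration of $\vartheta$ with $k$ ones, which represents a model with $k$ change points. Let $\eta^{(k)} = (\vartheta^{(k)}, \mu_1, \sigma_1^2, \ldots, \mu_{k+1}, \sigma_{k+1}^2)$. Let $\mathcal{X}_k$ denote the space of models with $k$ change points, $\vartheta^{(k)} \in \mathcal{X}_k$, and $\mathcal{X} = \bigcup_{k=0}^n \mathcal{X}_k$.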

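42 Assuming appropriate prior distributions, integrating out the parameters $\mu_1, \sigma_1^2, \ldots, \mu_{k+1}, \sigma_{k+1}^2$ from the full posterior distribution, and taking a logarithm, we have
$$\log P(\vartheta^{(k)} \mid Z) = a_k + \sum_{i=1}^{k+1} \Big\{-\frac{c_i - c_{i-1} - 1}{2} \log 2\pi - \frac{1}{2} \log(c_i - c_{i-1}) + \log \Gamma\Big(\frac{c_i - c_{i-1} - 1}{2} + \alpha\Big) - \Big(\frac{c_i - c_{i-1} - 1}{2} + \alpha\Big) \log\Big[\beta + \frac{1}{2} \sum_{j=c_{i-1}+1}^{c_i} z_j^2 - \frac{\big(\sum_{j=c_{i-1}+1}^{c_i} z_j\big)^2}{2(c_i - c_{i-1})}\Big]\Big\}, \qquad (19)$$
where $c_1 < \cdots < c_k$ are the change-point positions, $c_0 = 0$, and $c_{k+1} = n$.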

43 Figure 7: Comparison of the true change-point pattern (horizontal lines) and its MAP estimate (vertical lines). (x-axis: time; y-axis: observation.)

44 Table 4: The estimated posterior distribution $P(\mathcal{X}_k \mid Z)$ for the change-point identification example. Columns: $k$, followed by prob(%) and SD for each of SSAMC, SAMC, and RJMCMC. SD: standard deviation of the estimates.

45 SAMC Applications: Stochastic Optimization
Table 5: Comparison of SAMC and simulated annealing. Columns: Algorithm, Mean, Standard Error, Minimum, Maximum, Proportion; rows: SAMC, Annealing-1, Annealing-2, Annealing-3. Annealing-1, Annealing-2, and Annealing-3 correspond to the runs with $t_{high} = 5$, $t_{high} = 2$, and $t_{high} = 1$, respectively.

46 Figure 8: Sample paths of SAMC and the Metropolis-Hastings algorithm in the $(x, y)$ plane. The circles show the global minimum locations. (a) The sample path of a SAMC run (panel title: GWL). (b) The sample path of a Metropolis-Hastings run at $t = 5$. (c) The sample path of a Metropolis-Hastings run at $t = 0.1$.

47 Annealing SAMC
The algorithm initiates the search in the entire sample space $\mathcal{X}_0 = \bigcup_{i=1}^m E_i$, and then iteratively searches in the set
$$\mathcal{X}_t = \bigcup_{i=1}^{\varpi(U_{\min}^{(t)} + \aleph)} E_i, \quad t = 1, 2, \ldots, \qquad (20)$$
where $\varpi(u)$ denotes the index of the subregion to which a sample $x$ with energy $u$ belongs, $U_{\min}^{(t)}$ is the best function value obtained up to iteration $t$, and $\aleph > 0$ is a user-specified parameter that determines the broadness of the sample space at each iteration. Since the sample space shrinks iteration by iteration, the algorithm is called annealing SAMC.
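The shrinking search region in (20) can be sketched for an energy-based partition. The cut points are hypothetical, and $E_1$ is taken to hold the lowest energies, which is what (20) implies by keeping $E_1, \ldots, E_{\varpi(\cdot)}$. Because $U^{(t)}_{\min}$ can only decrease and $\varpi$ is non-decreasing in $u$, the list of active subregions never grows as the run proceeds.

```python
import bisect

# Hypothetical energy cut points u_1 < ... < u_{m-1}. Subregion E_i (1-based) holds
# {x : u_{i-1} <= U(x) < u_i}, with u_0 = -inf and u_m = +inf, so E_1 has the lowest energies.
CUTS = [-8.0, -7.0, -6.0, -5.0, -4.0, -3.0]
m = len(CUTS) + 1

def subregion_index(u):
    """varpi(u): 1-based index of the subregion containing energy u."""
    return bisect.bisect_right(CUTS, u) + 1

def active_subregions(u_min, aleph):
    """Indices i of the subregions E_i that make up X_t in (20)."""
    return list(range(1, subregion_index(u_min + aleph) + 1))
```

For instance, with a best energy so far of $-7.5$ and $\aleph = 1$, the search is restricted to $E_1, E_2, E_3$.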

48 Table 6: Comparison of the ASAMC and SA algorithms for the two-spiral example. Columns: Algorithm, Mean, S.D., Minimum, Maximum, Succ, Iter ($\times 10^6$), Time; rows: ASAMC, SAMC, two SA settings, and BFGS, with time reported in minutes (m) or seconds (s). Succ denotes the number of runs (out of 20) that found a solution with energy less than 0.2.

49 Figure 9: Two-spiral problem: classification maps learned by an MLP with 30 hidden units. The black and white points show the training data for the two spirals. (a) The classification map learned in one run. (b) The classification map learned in 20 runs.

50 Advantages of SAMC over simulated annealing
1. Simulated annealing requires the temperature to decrease so slowly, at the rate $1/\log(t)$, that it cannot be implemented exactly in practice.
2. In SAMC, the gain factor $\gamma_t$ can decrease much faster, at the rate $1/t$. In an annealing version of SAMC, $x_t$ converges in distribution to the distribution proportional to $f(x) I(x \in E_\epsilon)$ as $t \to \infty$, where $E_\epsilon = \{x : U(x) < U_{\min} + \epsilon\}$.
Further work: the convergence rate of annealing SAMC.

51 SAMC Discussion: Other Applications
Importance sampling (Liang et al., 2007, JASA)
Marginal density estimation (Liang, 2007, JCGS)
Normalizing constant estimation (Liang, 2007, Encyclopedia of Artificial Intelligence)
Protein folding simulation (Liang, 2004, J. Chem. Phys.)
Phylogenetic tree reconstruction (Cheon and Liang, 2008, BioSystems)
Variable selection for high-dimensional regression (Liang, Chen and Ibrahim, 2007)

52 SAMC: High-Dimensional Regression
Figure 10: Progression of the best energy values in (a) SAMC and (b) MH runs for a high-dimensional regression problem with $n = 150$ and $p = 600$ (Liang et al., 2007). (y-axis: best energy value; x-axis: iterations.)

53 SAMC: Phylogeny Estimation
Figure 11: Comparison of the phylogenetic trees produced by SSAMC, BAMBE, and MrBayes for the simulated example. The respective log-likelihood values of the trees are (a) …, (b) …, (c) …, and (d) … .

54 Figure 12: Comparison of the MAP trees produced by SSAMC, MrBayes, and BAMBE for the African cichlid fish example. The respective log-likelihood values are …, …, and … .

55 Figure 13: CPU time (y-axis) of a single run (… iterations) of SSAMC, BAMBE, and MrBayes versus the number of taxa (x-axis).


More information

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods Pattern Recognition and Machine Learning Chapter 11: Sampling Methods Elise Arnaud Jakob Verbeek May 22, 2008 Outline of the chapter 11.1 Basic Sampling Algorithms 11.2 Markov Chain Monte Carlo 11.3 Gibbs

More information

A quick introduction to Markov chains and Markov chain Monte Carlo (revised version)

A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) Rasmus Waagepetersen Institute of Mathematical Sciences Aalborg University 1 Introduction These notes are intended to

More information

Some Results on the Ergodicity of Adaptive MCMC Algorithms

Some Results on the Ergodicity of Adaptive MCMC Algorithms Some Results on the Ergodicity of Adaptive MCMC Algorithms Omar Khalil Supervisor: Jeffrey Rosenthal September 2, 2011 1 Contents 1 Andrieu-Moulines 4 2 Roberts-Rosenthal 7 3 Atchadé and Fort 8 4 Relationship

More information

Lecture 8: The Metropolis-Hastings Algorithm

Lecture 8: The Metropolis-Hastings Algorithm 30.10.2008 What we have seen last time: Gibbs sampler Key idea: Generate a Markov chain by updating the component of (X 1,..., X p ) in turn by drawing from the full conditionals: X (t) j Two drawbacks:

More information

COMBINING STRATEGIES FOR PARALLEL STOCHASTIC APPROXIMATION MONTE CARLO ALGORITHM OF BIG DATA. A Dissertation FANG-YU LIN

COMBINING STRATEGIES FOR PARALLEL STOCHASTIC APPROXIMATION MONTE CARLO ALGORITHM OF BIG DATA. A Dissertation FANG-YU LIN COMBINING STRATEGIES FOR PARALLEL STOCHASTIC APPROXIMATION MONTE CARLO ALGORITHM OF BIG DATA A Dissertation by FANG-YU LIN Submitted to the Office of Graduate and Professional Studies of Texas A&M University

More information

Session 3A: Markov chain Monte Carlo (MCMC)

Session 3A: Markov chain Monte Carlo (MCMC) Session 3A: Markov chain Monte Carlo (MCMC) John Geweke Bayesian Econometrics and its Applications August 15, 2012 ohn Geweke Bayesian Econometrics and its Session Applications 3A: Markov () chain Monte

More information

Learning Deep Boltzmann Machines using Adaptive MCMC

Learning Deep Boltzmann Machines using Adaptive MCMC Ruslan Salakhutdinov Brain and Cognitive Sciences and CSAIL, MIT 77 Massachusetts Avenue, Cambridge, MA 02139 rsalakhu@mit.edu Abstract When modeling high-dimensional richly structured data, it is often

More information

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model UNIVERSITY OF TEXAS AT SAN ANTONIO Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model Liang Jing April 2010 1 1 ABSTRACT In this paper, common MCMC algorithms are introduced

More information

Learning the hyper-parameters. Luca Martino

Learning the hyper-parameters. Luca Martino Learning the hyper-parameters Luca Martino 2017 2017 1 / 28 Parameters and hyper-parameters 1. All the described methods depend on some choice of hyper-parameters... 2. For instance, do you recall λ (bandwidth

More information

Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation. Luke Tierney Department of Statistics & Actuarial Science University of Iowa

Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation. Luke Tierney Department of Statistics & Actuarial Science University of Iowa Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation Luke Tierney Department of Statistics & Actuarial Science University of Iowa Basic Ratio of Uniforms Method Introduced by Kinderman and

More information

The Wang-Landau algorithm in general state spaces: Applications and convergence analysis

The Wang-Landau algorithm in general state spaces: Applications and convergence analysis The Wang-Landau algorithm in general state spaces: Applications and convergence analysis Yves F. Atchadé and Jun S. Liu First version Nov. 2004; Revised Feb. 2007, Aug. 2008) Abstract: The Wang-Landau

More information

Parallel Tempering I

Parallel Tempering I Parallel Tempering I this is a fancy (M)etropolis-(H)astings algorithm it is also called (M)etropolis (C)oupled MCMC i.e. MCMCMC! (as the name suggests,) it consists of running multiple MH chains in parallel

More information

Infinite-State Markov-switching for Dynamic. Volatility Models : Web Appendix

Infinite-State Markov-switching for Dynamic. Volatility Models : Web Appendix Infinite-State Markov-switching for Dynamic Volatility Models : Web Appendix Arnaud Dufays 1 Centre de Recherche en Economie et Statistique March 19, 2014 1 Comparison of the two MS-GARCH approximations

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture February Arnaud Doucet

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture February Arnaud Doucet Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 13-28 February 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Limitations of Gibbs sampling. Metropolis-Hastings algorithm. Proof

More information

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling 10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel

More information

Markov Chain Monte Carlo Inference. Siamak Ravanbakhsh Winter 2018

Markov Chain Monte Carlo Inference. Siamak Ravanbakhsh Winter 2018 Graphical Models Markov Chain Monte Carlo Inference Siamak Ravanbakhsh Winter 2018 Learning objectives Markov chains the idea behind Markov Chain Monte Carlo (MCMC) two important examples: Gibbs sampling

More information

Lecture V: Multicanonical Simulations.

Lecture V: Multicanonical Simulations. Lecture V: Multicanonical Simulations. 1. Multicanonical Ensemble 2. How to get the Weights? 3. Example Runs (2d Ising and Potts models) 4. Re-Weighting to the Canonical Ensemble 5. Energy and Specific

More information

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning for for Advanced Topics in California Institute of Technology April 20th, 2017 1 / 50 Table of Contents for 1 2 3 4 2 / 50 History of methods for Enrico Fermi used to calculate incredibly accurate predictions

More information

Discrete solid-on-solid models

Discrete solid-on-solid models Discrete solid-on-solid models University of Alberta 2018 COSy, University of Manitoba - June 7 Discrete processes, stochastic PDEs, deterministic PDEs Table: Deterministic PDEs Heat-diffusion equation

More information

Bayesian Phylogenetics:

Bayesian Phylogenetics: Bayesian Phylogenetics: an introduction Marc A. Suchard msuchard@ucla.edu UCLA Who is this man? How sure are you? The one true tree? Methods we ve learned so far try to find a single tree that best describes

More information

Bayesian Image Segmentation Using MRF s Combined with Hierarchical Prior Models

Bayesian Image Segmentation Using MRF s Combined with Hierarchical Prior Models Bayesian Image Segmentation Using MRF s Combined with Hierarchical Prior Models Kohta Aoki 1 and Hiroshi Nagahashi 2 1 Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology

More information

Computer intensive statistical methods

Computer intensive statistical methods Lecture 13 MCMC, Hybrid chains October 13, 2015 Jonas Wallin jonwal@chalmers.se Chalmers, Gothenburg university MH algorithm, Chap:6.3 The metropolis hastings requires three objects, the distribution of

More information

Computer Intensive Methods in Mathematical Statistics

Computer Intensive Methods in Mathematical Statistics Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of

More information

Computer Vision Group Prof. Daniel Cremers. 11. Sampling Methods: Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 11. Sampling Methods: Markov Chain Monte Carlo Group Prof. Daniel Cremers 11. Sampling Methods: Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative

More information

Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo

Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo Probabilistic Graphical Models Lecture 17: Markov chain Monte Carlo Andrew Gordon Wilson www.cs.cmu.edu/~andrewgw Carnegie Mellon University March 18, 2015 1 / 45 Resources and Attribution Image credits,

More information

Markov Chain Monte Carlo Methods

Markov Chain Monte Carlo Methods Markov Chain Monte Carlo Methods p. /36 Markov Chain Monte Carlo Methods Michel Bierlaire michel.bierlaire@epfl.ch Transport and Mobility Laboratory Markov Chain Monte Carlo Methods p. 2/36 Markov Chains

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo Markov Chain Monte Carlo Recall: To compute the expectation E ( h(y ) ) we use the approximation E(h(Y )) 1 n n h(y ) t=1 with Y (1),..., Y (n) h(y). Thus our aim is to sample Y (1),..., Y (n) from f(y).

More information

Adaptive Markov Chain Monte Carlo: Theory and Methods

Adaptive Markov Chain Monte Carlo: Theory and Methods Chapter Adaptive Markov Chain Monte Carlo: Theory and Methods Yves Atchadé, Gersende Fort and Eric Moulines 2, Pierre Priouret 3. Introduction Markov chain Monte Carlo (MCMC methods allow to generate samples

More information

Markov Chains and MCMC

Markov Chains and MCMC Markov Chains and MCMC Markov chains Let S = {1, 2,..., N} be a finite set consisting of N states. A Markov chain Y 0, Y 1, Y 2,... is a sequence of random variables, with Y t S for all points in time

More information

Inference in state-space models with multiple paths from conditional SMC

Inference in state-space models with multiple paths from conditional SMC Inference in state-space models with multiple paths from conditional SMC Sinan Yıldırım (Sabancı) joint work with Christophe Andrieu (Bristol), Arnaud Doucet (Oxford) and Nicolas Chopin (ENSAE) September

More information

Bayesian Classification and Regression Trees

Bayesian Classification and Regression Trees Bayesian Classification and Regression Trees James Cussens York Centre for Complex Systems Analysis & Dept of Computer Science University of York, UK 1 Outline Problems for Lessons from Bayesian phylogeny

More information

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm Qiang Liu and Dilin Wang NIPS 2016 Discussion by Yunchen Pu March 17, 2017 March 17, 2017 1 / 8 Introduction Let x R d

More information

arxiv: v1 [stat.co] 18 Feb 2012

arxiv: v1 [stat.co] 18 Feb 2012 A LEVEL-SET HIT-AND-RUN SAMPLER FOR QUASI-CONCAVE DISTRIBUTIONS Dean Foster and Shane T. Jensen arxiv:1202.4094v1 [stat.co] 18 Feb 2012 Department of Statistics The Wharton School University of Pennsylvania

More information

Chapter 11. Stochastic Methods Rooted in Statistical Mechanics

Chapter 11. Stochastic Methods Rooted in Statistical Mechanics Chapter 11. Stochastic Methods Rooted in Statistical Mechanics Neural Networks and Learning Machines (Haykin) Lecture Notes on Self-learning Neural Algorithms Byoung-Tak Zhang School of Computer Science

More information

Monte Carlo Methods. Leon Gu CSD, CMU

Monte Carlo Methods. Leon Gu CSD, CMU Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte

More information

Stochastic optimization Markov Chain Monte Carlo

Stochastic optimization Markov Chain Monte Carlo Stochastic optimization Markov Chain Monte Carlo Ethan Fetaya Weizmann Institute of Science 1 Motivation Markov chains Stationary distribution Mixing time 2 Algorithms Metropolis-Hastings Simulated Annealing

More information

University of Toronto Department of Statistics

University of Toronto Department of Statistics Norm Comparisons for Data Augmentation by James P. Hobert Department of Statistics University of Florida and Jeffrey S. Rosenthal Department of Statistics University of Toronto Technical Report No. 0704

More information

Randomized Quasi-Monte Carlo for MCMC

Randomized Quasi-Monte Carlo for MCMC Randomized Quasi-Monte Carlo for MCMC Radu Craiu 1 Christiane Lemieux 2 1 Department of Statistics, Toronto 2 Department of Statistics, Waterloo Third Workshop on Monte Carlo Methods Harvard, May 2007

More information

MCMC for non-linear state space models using ensembles of latent sequences

MCMC for non-linear state space models using ensembles of latent sequences MCMC for non-linear state space models using ensembles of latent sequences Alexander Y. Shestopaloff Department of Statistical Sciences University of Toronto alexander@utstat.utoronto.ca Radford M. Neal

More information

Optimally adjusted mixture sampling and locally weighted histogram analysis. Zhiqiang Tan 1 February 2014, revised October 2015

Optimally adjusted mixture sampling and locally weighted histogram analysis. Zhiqiang Tan 1 February 2014, revised October 2015 Optimally adjusted mixture sampling and locally weighted histogram analysis Zhiqiang Tan 1 February 2014, revised October 2015 Abstract. Consider the two problems of simulating observations and estimating

More information

Brief introduction to Markov Chain Monte Carlo

Brief introduction to Markov Chain Monte Carlo Brief introduction to Department of Probability and Mathematical Statistics seminar Stochastic modeling in economics and finance November 7, 2011 Brief introduction to Content 1 and motivation Classical

More information

Bagging During Markov Chain Monte Carlo for Smoother Predictions

Bagging During Markov Chain Monte Carlo for Smoother Predictions Bagging During Markov Chain Monte Carlo for Smoother Predictions Herbert K. H. Lee University of California, Santa Cruz Abstract: Making good predictions from noisy data is a challenging problem. Methods

More information

Kernel adaptive Sequential Monte Carlo

Kernel adaptive Sequential Monte Carlo Kernel adaptive Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) December 7, 2015 1 / 36 Section 1 Outline

More information

On Markov chain Monte Carlo methods for tall data

On Markov chain Monte Carlo methods for tall data On Markov chain Monte Carlo methods for tall data Remi Bardenet, Arnaud Doucet, Chris Holmes Paper review by: David Carlson October 29, 2016 Introduction Many data sets in machine learning and computational

More information

Hmms with variable dimension structures and extensions

Hmms with variable dimension structures and extensions Hmm days/enst/january 21, 2002 1 Hmms with variable dimension structures and extensions Christian P. Robert Université Paris Dauphine www.ceremade.dauphine.fr/ xian Hmm days/enst/january 21, 2002 2 1 Estimating

More information

Online appendix to On the stability of the excess sensitivity of aggregate consumption growth in the US

Online appendix to On the stability of the excess sensitivity of aggregate consumption growth in the US Online appendix to On the stability of the excess sensitivity of aggregate consumption growth in the US Gerdie Everaert 1, Lorenzo Pozzi 2, and Ruben Schoonackers 3 1 Ghent University & SHERPPA 2 Erasmus

More information

Perturbed Proximal Gradient Algorithm

Perturbed Proximal Gradient Algorithm Perturbed Proximal Gradient Algorithm Gersende FORT LTCI, CNRS, Telecom ParisTech Université Paris-Saclay, 75013, Paris, France Large-scale inverse problems and optimization Applications to image processing

More information

Bridge estimation of the probability density at a point. July 2000, revised September 2003

Bridge estimation of the probability density at a point. July 2000, revised September 2003 Bridge estimation of the probability density at a point Antonietta Mira Department of Economics University of Insubria Via Ravasi 2 21100 Varese, Italy antonietta.mira@uninsubria.it Geoff Nicholls Department

More information

Markov Chain Monte Carlo, Numerical Integration

Markov Chain Monte Carlo, Numerical Integration Markov Chain Monte Carlo, Numerical Integration (See Statistics) Trevor Gallen Fall 2015 1 / 1 Agenda Numerical Integration: MCMC methods Estimating Markov Chains Estimating latent variables 2 / 1 Numerical

More information

Optimally adjusted mixture sampling and locally weighted histogram analysis. Zhiqiang Tan 1. February 2014, revised October 2014

Optimally adjusted mixture sampling and locally weighted histogram analysis. Zhiqiang Tan 1. February 2014, revised October 2014 Optimally adjusted mixture sampling and locally weighted histogram analysis Zhiqiang Tan 1 February 2014, revised October 2014 Abstract. Consider the two problems of simulating observations and estimating

More information

Advanced Sampling Algorithms

Advanced Sampling Algorithms + Advanced Sampling Algorithms + Mobashir Mohammad Hirak Sarkar Parvathy Sudhir Yamilet Serrano Llerena Advanced Sampling Algorithms Aditya Kulkarni Tobias Bertelsen Nirandika Wanigasekara Malay Singh

More information

DISCUSSION PAPER EQUI-ENERGY SAMPLER WITH APPLICATIONS IN STATISTICAL INFERENCE AND STATISTICAL MECHANICS 1,2,3

DISCUSSION PAPER EQUI-ENERGY SAMPLER WITH APPLICATIONS IN STATISTICAL INFERENCE AND STATISTICAL MECHANICS 1,2,3 The Annals of Statistics 2006, Vol. 34, No. 4, 1581 1619 DOI: 10.1214/009053606000000515 Institute of Mathematical Statistics, 2006 DISCUSSION PAPER EQUI-ENERGY SAMPLER WITH APPLICATIONS IN STATISTICAL

More information

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference 1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Board of Governors or the Federal Reserve System. Bayesian Estimation of DSGE

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 15-7th March Arnaud Doucet

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 15-7th March Arnaud Doucet Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 15-7th March 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Mixture and composition of kernels. Hybrid algorithms. Examples Overview

More information

Monte Carlo and cold gases. Lode Pollet.

Monte Carlo and cold gases. Lode Pollet. Monte Carlo and cold gases Lode Pollet lpollet@physics.harvard.edu 1 Outline Classical Monte Carlo The Monte Carlo trick Markov chains Metropolis algorithm Ising model critical slowing down Quantum Monte

More information

Recent Developments of Iterative Monte Carlo Methods for Big Data Analysis

Recent Developments of Iterative Monte Carlo Methods for Big Data Analysis Recent Developments of Iterative Monte Carlo Methods for Big Data Analysis Faming Liang Texas A&M University February 10, 2014 Abstract Iterative Monte Carlo methods, such as MCMC, stochastic approximation,

More information

A note on Reversible Jump Markov Chain Monte Carlo

A note on Reversible Jump Markov Chain Monte Carlo A note on Reversible Jump Markov Chain Monte Carlo Hedibert Freitas Lopes Graduate School of Business The University of Chicago 5807 South Woodlawn Avenue Chicago, Illinois 60637 February, 1st 2006 1 Introduction

More information

The Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision

The Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet. Stat 535 C - Statistical Computing & Monte Carlo Methods Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Introduction to Markov chain Monte Carlo The Gibbs Sampler Examples Overview of the Lecture

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters

More information

Markov Chain Monte Carlo Simulations and Their Statistical Analysis An Overview

Markov Chain Monte Carlo Simulations and Their Statistical Analysis An Overview Markov Chain Monte Carlo Simulations and Their Statistical Analysis An Overview Bernd Berg FSU, August 30, 2005 Content 1. Statistics as needed 2. Markov Chain Monte Carlo (MC) 3. Statistical Analysis

More information

Rare Event Sampling using Multicanonical Monte Carlo

Rare Event Sampling using Multicanonical Monte Carlo Rare Event Sampling using Multicanonical Monte Carlo Yukito IBA The Institute of Statistical Mathematics This is my 3rd oversea trip; two of the three is to Australia. Now I (almost) overcome airplane

More information

Sampling multimodal densities in high dimensional sampling space

Sampling multimodal densities in high dimensional sampling space Sampling multimodal densities in high dimensional sampling space Gersende FORT LTCI, CNRS & Telecom ParisTech Paris, France Journées MAS Toulouse, Août 4 Introduction Sample from a target distribution

More information

Likelihood-free MCMC

Likelihood-free MCMC Bayesian inference for stable distributions with applications in finance Department of Mathematics University of Leicester September 2, 2011 MSc project final presentation Outline 1 2 3 4 Classical Monte

More information