Markov Chain Monte Carlo Lecture 6


An actively pursued research direction for alleviating the local-trap problem suffered by the Metropolis-Hastings (MH) algorithm is population-based MCMC, where a population of Markov chains are run in parallel, each equipped with a possibly different but related invariant distribution. Information exchange between different chains provides a means for the target chains to learn from past samples, and this in turn improves the convergence of the target chains. Mathematically, population-based MCMC may be described as follows. In order to simulate from a target distribution f(x), one simulates an augmented system with the invariant distribution

f(x_1, ..., x_N) = ∏_{i=1}^N f_i(x_i),    (1)

where (x_1, ..., x_N) ∈ X^N, N is called the population size, f_i(x) = f(x) for at least one i ∈ {1, 2, ..., N}, and the distributions different from f(x) are called the trial distributions in terms of importance sampling. Different ways of specifying the trial distributions and of updating the population of Markov chains lead to different algorithms, such as adaptive direction sampling (Gilks et al., 1994), conjugate gradient Monte Carlo (Liu, Liang and Wong, 2000), parallel tempering (Geyer, 1991; Hukushima and Nemoto, 1996), evolutionary Monte Carlo (Liang and Wong, 2000, 2001), sequential parallel tempering (Liang, 2003), and the equi-energy sampler (Kou, Zhou and Wong, 2006).

Adaptive direction sampling

Adaptive direction sampling (ADS) (Gilks et al., 1994) is an early population-based MCMC method, in which each distribution f_i(x) is identical to the target distribution and, at each iteration, one sample is randomly selected from the current population to undergo an update along a direction toward another sample randomly selected from the remaining set of the current population. An important form of ADS is the snooker algorithm:

1. Select one individual, say x_c^{(t)}, at random from the current population x^{(t)}. The point x_c^{(t)} is called the current point.

2. Select another individual, say x_a^{(t)}, from the remaining set of the current population, i.e., {x_i^{(t)} : i ≠ c}, and form a direction e_t = x_c^{(t)} − x_a^{(t)}. The individual x_a^{(t)} is called the anchor point.

3. Set y_c = x_a^{(t)} + r_t e_t, where r_t is a scalar sampled from the density

   f(r) ∝ |r|^{d−1} f(x_a^{(t)} + r e_t),    (2)

   where d is the dimension of x, and the factor |r|^{d−1} is derived from a transformation Jacobian (Roberts and Gilks, 1994).

4. Form the new population x^{(t+1)} by replacing x_c^{(t)} by y_c and leaving all other individuals unchanged (i.e., set x_i^{(t+1)} = x_i^{(t)} for i ≠ c).
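As a concrete illustration, the snooker update can be sketched in a few lines of Python. This is only a sketch under simplifying assumptions: the target density f is user-supplied (possibly unnormalized), `rng` is a `numpy.random.Generator` such as `np.random.default_rng(0)`, and the scalar r is drawn from (2) by a crude griddy (discretized inverse-CDF) approximation on a fixed grid rather than by an exact one-dimensional sampler.

```python
import numpy as np

def snooker_update(pop, f, rng, grid=np.linspace(-3.0, 3.0, 601)):
    """One ADS snooker move on a population `pop` (an N x d array).

    `f` is the (possibly unnormalized) target density; the scalar r is drawn
    from a density proportional to |r|^(d-1) f(x_a + r*e) by a crude griddy
    (discretized inverse-CDF) approximation on `grid`.
    """
    N, d = pop.shape
    c = rng.integers(N)                                  # current point index
    a = rng.choice([i for i in range(N) if i != c])      # anchor point index
    e = pop[c] - pop[a]                                  # direction e_t = x_c - x_a
    dens = np.array([abs(r) ** (d - 1) * f(pop[a] + r * e) for r in grid])
    r = rng.choice(grid, p=dens / dens.sum())            # draw r from density (2)
    new_pop = pop.copy()
    new_pop[c] = pop[a] + r * e                          # y_c = x_a + r*e_t
    return new_pop
```

In practice the draw of r is carried out exactly or with a Metropolis-type step rather than on a fixed grid; the grid here only keeps the sketch short. The validity of drawing r from (2) is the content of Lemma 0.1 below.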

To show that the sampler is proper, we need to show that, at equilibrium, the new sample y_c is independent of the remaining individuals x_i^{(t)}, i ≠ c, and is distributed as f(x). This fact follows directly from the following lemma, which is a generalized version of Lemma 3.1 of Roberts and Gilks (1994) and was proved by Liu, Liang and Wong (2000).

Lemma 0.1 (Liu, Liang and Wong, 2000) Suppose x ~ π(x) and y is any fixed point in a d-dimensional space. Let e = x − y. If r is drawn from the distribution f(r) ∝ |r|^{d−1} π(y + re), then x' = y + re follows the distribution π(x). If y is generated from a distribution independent of x, then x' is independent of y.

Conjugate gradient Monte Carlo (Liu, Liang and Wong, 2000)

Let x^{(t)} = (x_1^{(t)}, ..., x_N^{(t)}) denote the current population of samples. One iteration of the CGMC sampler consists of the following steps.

1. Select one individual, say x_c^{(t)}, at random from the current population x^{(t)}.

2. Select another individual, say x_a^{(t)}, at random from the remaining set of the population, i.e., {x_i^{(t)} : i ≠ c}. Starting from x_a^{(t)}, conduct a deterministic search, using the conjugate gradient method or the steepest descent method, to find a local mode of f(x). Denote the local mode by z_a^{(t)}, which is called the anchor point.

3. Set y_c = z_a^{(t)} + r_t e_t, where e_t = x_c^{(t)} − z_a^{(t)}, and r_t is a scalar sampled from the density

   f(r) ∝ |r|^{d−1} f(z_a^{(t)} + r e_t),    (3)

   where d is the dimension of x, and the factor |r|^{d−1} is derived from the transformation Jacobian.

4. Form the new population x^{(t+1)} by replacing x_c^{(t)} by y_c and leaving all other individuals unchanged (i.e., set x_i^{(t+1)} = x_i^{(t)} for i ≠ c).

The gradient-based optimization procedure performed in step 2 can be replaced by some other optimization procedure, for example, a short run of simulated annealing (Kirkpatrick et al., 1983). Since the local optimization step is usually expensive in computation, Liu, Liang and Wong (2000) proposed the multiple-try MH algorithm for the line sampling step, which enables effective use of the local modal information of the distribution and thus improves the convergence of the algorithm.
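The anchor-point search in step 2 is an ordinary deterministic optimization started from x_a^{(t)}. The sketch below is illustrative only: it assumes a user-supplied density `f` and its negative log-density `neg_log_f`, uses scipy's conjugate-gradient routine for the mode search, and reuses the same griddy approximation as in the ADS sketch for the line-sampling step (which Liu, Liang and Wong instead carry out with a multiple-try MH step).

```python
import numpy as np
from scipy.optimize import minimize

def cgmc_update(pop, f, neg_log_f, rng, grid=np.linspace(-3.0, 3.0, 601)):
    """One CGMC move (illustrative sketch only).

    Step 2 replaces the ADS anchor point by a local mode of f, found here by
    a conjugate-gradient search started from a randomly chosen individual;
    the line-sampling step reuses the griddy approximation.
    """
    N, d = pop.shape
    c = rng.integers(N)
    a = rng.choice([i for i in range(N) if i != c])
    z_a = minimize(neg_log_f, pop[a], method="CG").x     # local mode of f (anchor point)
    e = pop[c] - z_a                                     # e_t = x_c - z_a
    dens = np.array([abs(r) ** (d - 1) * f(z_a + r * e) for r in grid])
    r = rng.choice(grid, p=dens / dens.sum())            # draw r from density (3)
    new_pop = pop.copy()
    new_pop[c] = z_a + r * e                             # y_c = z_a + r*e_t
    return new_pop
```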

Sample MH Algorithm (Lewandowski and Liu, 2008)

In adaptive direction sampling and conjugate gradient Monte Carlo, when updating the population, one first selects an individual from the population and then updates the selected individual using the standard Metropolis-Hastings procedure. If the candidate state is of high quality relative to the whole population, one certainly wants to keep it in the population. However, the acceptance of the candidate state depends on the quality of the individual that is selected for updating. To improve the acceptance rate of high quality candidates and to improve the set {x_i^{(t)} : i = 1, ..., N} as a sample of size N from f(x), Lewandowski and Liu (2008) proposed the sampling Metropolis-Hastings (SMH) algorithm.

Sample MH Algorithm

Take one candidate draw x_0^{(t)} from a proposal distribution g(x) on X, and compute the acceptance probability

α_0^{(t)} = [ Σ_{i=1}^N g(x_i^{(t)})/f(x_i^{(t)}) ] / [ Σ_{i=0}^N g(x_i^{(t)})/f(x_i^{(t)}) − min_{0≤k≤N} g(x_k^{(t)})/f(x_k^{(t)}) ].

Draw U ~ Unif[0, 1], and set

S_{t+1} = {x_1^{(t+1)}, ..., x_N^{(t+1)}}
        = S_t,                                                                    if U > α_0^{(t)};
        = {x_1^{(t)}, ..., x_{i−1}^{(t)}, x_0^{(t)}, x_{i+1}^{(t)}, ..., x_N^{(t)}},   if U ≤ α_0^{(t)},

where i is chosen from {1, ..., N} with probability weights

( g(x_1^{(t)})/f(x_1^{(t)}), ..., g(x_N^{(t)})/f(x_N^{(t)}) ).

Thus, x^{(t+1)} and x^{(t)} differ by one element at most. It is easy to see that in the case of N = 1, SMH reduces to the traditional MH algorithm with independence proposals. The merit of SMH is that, to accept a candidate state, it compares the candidate with the whole population instead of a single individual randomly selected from the current population. Lewandowski and Liu (2008) show that, under mild conditions, SMH converges to the target distribution ∏_{i=1}^N f(x_i) for {x_1, ..., x_N}, and can be more efficient than the traditional MH algorithm and adaptive direction sampling.
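A minimal sketch of one SMH step is given below. The names `g_sample` and `g_density` are placeholders for a user-supplied independence proposal (a sampler returning a length-d numpy array and its density); the importance ratios g/f only need f up to a normalizing constant, which cancels in the acceptance probability and in the replacement weights.

```python
import numpy as np

def smh_step(pop, f, g_sample, g_density, rng):
    """One sampling Metropolis-Hastings (SMH) step; a minimal sketch.

    `pop` is an N x d array, `f` the (possibly unnormalized) target density,
    and `g_sample` / `g_density` draw from and evaluate the independence
    proposal g (all user-supplied; the names are assumptions of this sketch).
    """
    N = pop.shape[0]
    x0 = g_sample(rng)                                   # candidate draw x_0 ~ g
    w = np.array([g_density(z) / f(z) for z in np.vstack([x0[None, :], pop])])
    # acceptance probability: sum_{i=1..N} w_i / (sum_{i=0..N} w_i - min_k w_k)
    alpha = w[1:].sum() / (w.sum() - w.min())
    if rng.uniform() <= alpha:
        i = rng.choice(N, p=w[1:] / w[1:].sum())         # weights prop. to g(x_i)/f(x_i)
        pop = pop.copy()
        pop[i] = x0                                      # replace x_i by the candidate
    return pop
```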

Parallel tempering (Geyer, 1991)

Parallel tempering simulates in parallel a sequence of distributions

f_i(x) ∝ exp(−H(x)/T_i),  i = 1, ..., N,    (4)

where T_i is the temperature associated with the distribution f_i(x). The temperatures form a ladder T_1 > T_2 > ... > T_{N−1} > T_N ≡ 1, so that f_N(x) ≡ f(x) corresponds to the target distribution. The idea underlying this algorithm can be explained as follows: raising the temperature flattens the energy landscape of the distribution and thus eases the MH traversal of the sample space; the high density samples generated at the high temperature levels can be transmitted to the target temperature level through the exchange operations, and this in turn improves the convergence of the target Markov chain.

Let x^{(t)} = (x_1^{(t)}, ..., x_N^{(t)}) denote the current population of samples. One iteration of parallel tempering consists of the following steps.

1. Parallel MH step: Update each x_i^{(t)} to x_i^{(t+1)} using the MH algorithm.

2. State swapping step: Try to exchange x_i^{(t+1)} with its neighbors: set j = i − 1 or i + 1 according to the probabilities q_e(i, j), where q_e(i, i+1) = q_e(i, i−1) = 0.5 for 1 < i < N and q_e(1, 2) = q_e(N, N−1) = 1, and accept the swap with probability

   min{ 1, exp( [H(x_i^{(t+1)}) − H(x_j^{(t+1)})] [1/T_i − 1/T_j] ) }.    (5)
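The two steps can be sketched as follows. The Gaussian random-walk proposal and its scale are arbitrary choices made for this sketch, and only one randomly chosen neighbouring pair is proposed for swapping per iteration, for brevity.

```python
import numpy as np

def parallel_tempering_step(pop, H, temps, rng, step=0.5):
    """One parallel tempering iteration: parallel MH updates, then a swap try.

    `pop` is an N x d array, `H` the energy function (f_i prop. to exp(-H/T_i)),
    `temps` the temperature ladder T_1 > ... > T_N = 1.
    """
    pop = pop.copy()
    N, d = pop.shape
    # 1. Parallel MH step at each temperature level
    for i in range(N):
        prop = pop[i] + step * rng.normal(size=d)
        if rng.uniform() < np.exp(-(H(prop) - H(pop[i])) / temps[i]):
            pop[i] = prop
    # 2. State swapping step with a neighbouring level, accepted by rule (5)
    i = rng.integers(N)
    j = i + 1 if i == 0 else (i - 1 if i == N - 1 else i + rng.choice([-1, 1]))
    r = np.exp((H(pop[i]) - H(pop[j])) * (1.0 / temps[i] - 1.0 / temps[j]))
    if rng.uniform() < min(1.0, r):
        pop[[i, j]] = pop[[j, i]]                        # swap the two states
    return pop
```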

Evolutionary Monte Carlo (Liang and Wong, 2000, 2001)

The genetic algorithm (Holland, 1975) has been successfully applied to many hard optimization problems, such as the traveling salesman problem, protein folding, and machine learning, among others. It is known that its crossover operator is the key to the power of the genetic algorithm, which makes it possible to explore a far greater range of potential solutions to a problem than conventional optimization algorithms. Motivated by the genetic algorithm, Liang and Wong (2000, 2001) proposed the evolutionary Monte Carlo (EMC) algorithm, which incorporates the most attractive features of the genetic algorithm into the framework of Markov chain Monte Carlo. EMC works in a fashion similar to parallel tempering: a population of Markov chains are simulated in parallel, with each chain having a different temperature. The difference between the two algorithms is that EMC includes a genetic operator, namely the crossover operator, in its simulation. The numerical results indicate that the crossover operator improves the convergence of the simulation and that EMC can outperform parallel tempering in almost all scenarios.

Suppose the target distribution of interest is written in the form f(x) ∝ exp{−H(x)}, x ∈ X ⊆ R^d, where the dimension d > 1, and H(x) is called the fitness function in terms of genetic algorithms. Let x = {x_1, ..., x_N} denote a population of size N, with x_i drawn from the distribution with density f_i(x) ∝ exp{−H(x)/T_i}. In terms of genetic algorithms, x_i is called a chromosome or an individual, each element of x_i is called a gene, and a realization of the element is called a genotype. As in parallel tempering, the temperatures form a decreasing ladder T_1 > T_2 > ... > T_N ≡ 1, with f_N(x) being the target distribution.

Mutation

The mutation operator is defined as an additive Metropolis-Hastings move. One chromosome, say x_k, is randomly selected from the current population x. A new chromosome is generated by adding a random vector e_k, so that

y_k = x_k + e_k,    (6)

where the scale of e_k is chosen such that the operation has a moderate acceptance rate, e.g., 0.2 to 0.5, as suggested by Gelman, Roberts and Gilks (1996). The new population y = {x_1, ..., x_{k−1}, y_k, x_{k+1}, ..., x_N} is accepted with probability min(1, r_m), where

r_m = [f(y)/f(x)] [T(x|y)/T(y|x)] = exp{ −[H(y_k) − H(x_k)]/T_k } T(x|y)/T(y|x),    (7)

and T(·|·) denotes the transition probability between populations.
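A minimal sketch of the mutation move, assuming a symmetric Gaussian proposal so that the ratio T(x|y)/T(y|x) in (7) cancels; the proposal scale is a placeholder that should be tuned to give a moderate acceptance rate (roughly 0.2 to 0.5).

```python
import numpy as np

def emc_mutation(pop, H, temps, rng, scale=0.5):
    """EMC mutation: additive MH update of one randomly chosen chromosome.

    With a symmetric proposal e_k, the move is accepted with probability
    min(1, exp(-[H(y_k) - H(x_k)] / T_k)), as in (7).
    """
    pop = pop.copy()
    k = rng.integers(pop.shape[0])
    y_k = pop[k] + scale * rng.normal(size=pop.shape[1])  # y_k = x_k + e_k
    if rng.uniform() < np.exp(-(H(y_k) - H(pop[k])) / temps[k]):
        pop[k] = y_k
    return pop
```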

Crossover

One type of crossover operator that works for real-coded chromosomes is the so-called real crossover, which includes the k-point and uniform crossover operators. They were called real crossover by Wright (1991) to indicate that they are applied to real-coded chromosomes. In addition to the real crossover, Liang and Wong (2001a) proposed the snooker crossover operator, which works as follows:

1. Randomly select one chromosome, say x_i, from the current population x.

2. Select another chromosome, say x_j, from the sub-population x \ {x_i} with a probability proportional to exp{−H(x_j)/T_s}, where T_s is called the selection temperature.

3. Let e = x_i − x_j, and set y = x_j + r e, where r ∈ (−∞, ∞) is a random variable sampled from the density

   f(r) ∝ |r|^{d−1} f(x_j + r e).    (8)

4. Construct the new population by replacing x_i with the offspring y and leaving the other chromosomes unchanged.

Exchange

This operation is the same as that used in parallel tempering (Geyer, 1991; Hukushima and Nemoto, 1996). Given the current population x and the temperature ladder t, with (x, t) = (x_1, T_1, ..., x_N, T_N), one tries to make an exchange between x_i and x_j without changing the T's. The new population x' is accepted with probability min(1, r_e), where

r_e = [f(x')/f(x)] [T(x|x')/T(x'|x)] = exp{ (H(x_i) − H(x_j)) (1/T_i − 1/T_j) }.    (9)

Typically, the exchange is only performed on neighboring temperature levels.

The Algorithm

Based on the operators described above, the algorithm can be summarized as follows. Given an initial population x = {x_1, ..., x_N} and a temperature ladder t = {T_1, T_2, ..., T_N}, EMC iterates between the following two steps:

1. Apply either the mutation or the crossover operator to the population with probability q_m and 1 − q_m, respectively. The q_m is called the mutation rate.

2. Try to exchange x_i with x_j for N pairs (i, j), with i being sampled uniformly on {1, ..., N} and j = i ± 1 with probability q_e(i, j), where q_e(i, i+1) = q_e(i, i−1) = 0.5 and q_e(1, 2) = q_e(N, N−1) = 1.
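Putting the pieces together, one EMC iteration can be sketched as below. The `mutation` and `crossover` arguments are user-supplied operators, e.g. the `emc_mutation` sketch above and a snooker or real crossover built along the lines of the ADS sketch; they are assumptions of this sketch, not part of the original algorithm description.

```python
import numpy as np

def emc_iteration(pop, H, temps, rng, q_m=0.25, mutation=None, crossover=None):
    """One EMC iteration (sketch): mutation or crossover, then N exchange tries.

    `mutation` and `crossover` are user-supplied operators with the signature
    op(pop, H, temps, rng); q_m is the mutation rate.
    """
    N = pop.shape[0]
    # 1. Mutation with probability q_m, otherwise crossover
    if rng.uniform() < q_m:
        pop = mutation(pop, H, temps, rng)
    else:
        pop = crossover(pop, H, temps, rng)
    # 2. N exchange attempts between neighbouring temperature levels, rule (9)
    pop = pop.copy()
    for _ in range(N):
        i = rng.integers(N)
        j = i + 1 if i == 0 else (i - 1 if i == N - 1 else i + rng.choice([-1, 1]))
        r_e = np.exp((H(pop[i]) - H(pop[j])) * (1.0 / temps[i] - 1.0 / temps[j]))
        if rng.uniform() < min(1.0, r_e):
            pop[[i, j]] = pop[[j, i]]
    return pop
```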

Consider simulating from a 2D mixture normal distribution

f(x) = (1/(2πσ²)) Σ_{k=1}^{20} w_k exp{ −(1/(2σ²)) (x − µ_k)'(x − µ_k) },    (10)

where σ = 0.1 and w_1 = ... = w_20 = 0.05. The mean vectors µ_1, µ_2, ..., µ_20 (given in Table 1) were uniformly drawn from the rectangle [0, 10] × [0, 10]. Among them, components 2, 4, and 15 are well separated from the others. The distance between component 4 and its nearest neighboring component is 3.15, and the distance between component 15 and its nearest neighboring component (other than component 2) is 3.84; these are 31.5 and 38.4 times the standard deviation, respectively. Mixing the components across such long distances poses a great challenge for EMC.

Table 1: Mean vectors of the 20 components of the mixture normal distribution (Liang and Wong, 2001).

 k   µ_k1   µ_k2  |  k   µ_k1   µ_k2  |  k   µ_k1   µ_k2  |  k   µ_k1   µ_k2
 1   2.18   5.76  |  6   3.25   3.47  | 11   5.41   2.65  | 16   4.93   1.50
 2   8.67   9.59  |  7   1.70   0.50  | 12   2.70   7.88  | 17   1.83   0.09
 3   4.24   8.48  |  8   4.59   5.60  | 13   4.98   3.70  | 18   2.26   0.31
 4   8.41   1.68  |  9   6.91   5.81  | 14   1.14   2.39  | 19   5.54   6.86
 5   3.93   8.82  | 10   6.87   5.40  | 15   8.33   9.50  | 20   1.69   8.11
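For reference, the target density (10) and its energy H(x) = −log f(x) can be coded directly from Table 1; the names `MU`, `SIGMA`, `W`, `f`, and `H` below are conventions of this sketch.

```python
import numpy as np

# Mean vectors from Table 1 (components 1-20)
MU = np.array([
    [2.18, 5.76], [8.67, 9.59], [4.24, 8.48], [8.41, 1.68], [3.93, 8.82],
    [3.25, 3.47], [1.70, 0.50], [4.59, 5.60], [6.91, 5.81], [6.87, 5.40],
    [5.41, 2.65], [2.70, 7.88], [4.98, 3.70], [1.14, 2.39], [8.33, 9.50],
    [4.93, 1.50], [1.83, 0.09], [2.26, 0.31], [5.54, 6.86], [1.69, 8.11],
])
SIGMA, W = 0.1, 0.05           # common scale and equal weights w_1 = ... = w_20

def f(x):
    """Mixture density (10) evaluated at a 2-vector x."""
    sq = np.sum((x - MU) ** 2, axis=1)
    return np.sum(W / (2 * np.pi * SIGMA ** 2) * np.exp(-sq / (2 * SIGMA ** 2)))

def H(x):
    """Energy H(x) = -log f(x), the fitness function used by EMC."""
    return -np.log(f(x))
```

These are the functions the mutation, crossover, exchange, and parallel tempering sketches above would be applied to in this example.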

Table 2: Comparison of EMC and parallel tempering for the mixture normal example (Liang and Wong, 2001).

                         EMC-A            EMC-B             PT
parameter  true value   est.    SD       est.    SD       est.    SD
µ_1          4.48       4.48   0.004     4.44   0.026     3.78   0.032
µ_2          4.91       4.91   0.008     4.86   0.023     4.34   0.044
Σ_11         5.55       5.55   0.006     5.54   0.051     3.66   0.111
Σ_22         9.86       9.84   0.010     9.78   0.048     8.55   0.049
Σ_12         2.61       2.59   0.011     2.58   0.043     1.29   0.084

[Figure 1: The sample paths of the first 10,000 iterations at the temperature level T = 1 (x and y axes span [0, 10]). (a) Evolutionary Monte Carlo (evolutionary sampling). (b) Parallel tempering. (Liang and Wong, 2001a)]

[Figure 2: The plots of the whole set of samples (x and y axes span [0, 10]). (a) Evolutionary Monte Carlo (evolutionary sampling). (b) Parallel tempering. (Liang and Wong, 2001a)]