Markov Chain Monte Carlo Lecture 1

What are Monte Carlo Methods? The subject of Monte Carlo methods can be viewed as a branch of experimental mathematics in which one uses random numbers to conduct experiments. Typically the experiments are done on a computer using anywhere from hundreds to billions of random numbers. There are two categories of Monte Carlo experiments: (1) direct simulation of a random system; (2) addition of artificial randomness to a system.

Direct simulation of a random system. We study a real system that involves some randomness, and we wish to learn what behavior can be expected without actually watching the real system. We first formulate a mathematical model by identifying the key random (and nonrandom) variables that describe the given system. Then we run a random copy of this model on a computer, typically many times, each time with different realizations of the random variables. Finally, we collect data from these runs and analyze the data. Some applications: (a) operations research, e.g., hospital management; (b) reliability theory; (c) the physical, biological, and social sciences: models with complex nondeterministic time evolution, including particle motion, population growth, epidemics, and social phenomena.

Addition of artificial randomness to a system. One represents the underlying problem (which may be completely deterministic) as part of a different random system and then performs a direct simulation of this new system. Example: estimating π. (a) Markov chain Monte Carlo: for problems in statistical physics (e.g., proteins, quantum systems) and in Bayesian statistics (for complicated posterior distributions). (b) Optimization: e.g., solving the traveling salesman and knapsack problems by simulated annealing or genetic algorithms.

Figure 1: Monte Carlo estimation of π.
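A minimal sketch of the π experiment shown in Figure 1 (the sample size and variable names are illustrative, not from the lecture): sample points uniformly in the square $[-1,1]^2$ and use the fraction falling inside the unit circle as an estimate of $\pi/4$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.uniform(-1.0, 1.0, size=n)
y = rng.uniform(-1.0, 1.0, size=n)
inside = x**2 + y**2 <= 1.0          # points falling inside the unit circle
pi_hat = 4.0 * inside.mean()         # area ratio circle/square equals pi/4
print(pi_hat)
```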

Basic problems of Monte Carlo: (a) Draw random variables $X \sim \pi(x)$, sometimes with an unknown normalizing constant. (b) Estimate the integral $I = \int h(x)\pi(x)\,dx = E_\pi[h(X)]$.

Example: Estimate the integral $I = \int_0^1 h(x)\,dx$. Draw $u_1, \ldots, u_n$ i.i.d. from Uniform(0,1). By the law of large numbers, $\hat{I} = \frac{1}{n}[h(u_1) + \cdots + h(u_n)] \to I$ as $n \to \infty$. Error computation: the estimator is unbiased, $E(\hat{I}) = I$, with variance $\mathrm{Var}(\hat{I}) = \mathrm{Var}(h)/n = \frac{1}{n}\int_0^1 (h(x) - I)^2\,dx$.
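A short sketch of this estimator, with $h(x) = e^x$ chosen only as an illustration (so $I = e - 1$); the integrand and sample size are my choices, not from the lecture.

```python
import numpy as np

def h(x):
    return np.exp(x)                 # illustrative integrand, I = e - 1

rng = np.random.default_rng(0)
n = 100_000
u = rng.uniform(0.0, 1.0, size=n)    # u_1, ..., u_n iid Uniform(0,1)
vals = h(u)
I_hat = vals.mean()                        # unbiased: E(I_hat) = I
std_err = vals.std(ddof=1) / np.sqrt(n)    # estimates sqrt(Var(h)/n)
print(I_hat, std_err)
```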

Pseudo-random number generator. A sequence of pseudo-random numbers $(U_i)$ is a deterministic sequence of numbers in $[0,1]$ having the same relevant statistical properties as a sequence of random numbers. Von Neumann's middle-square method: start with 8653, square it, and take the middle 4 digits: 8653, 8744, 4575, 9306, 6016, 1922, ... Congruential generator: $X_i = aX_{i-1} \bmod M$ and $U_i = X_i/M$, with, say, $a = 7^5 = 16807$ and $M = 2^{31} - 1$. For other methods and statistical testing, see Ripley (1987) and Marsaglia and Zaman (1993).
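A sketch of the congruential generator with the quoted parameters $a = 7^5$ and $M = 2^{31} - 1$; the seed below is arbitrary.

```python
def lcg(seed, n, a=7**5, M=2**31 - 1):
    """Return n pseudo-random numbers U_i = X_i / M with X_i = a*X_{i-1} mod M."""
    x, us = seed, []
    for _ in range(n):
        x = (a * x) % M
        us.append(x / M)
    return us

print(lcg(seed=12345, n=5))
```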

Generating random variables from simple distributions. (a) The inversion method. Theorem 0.1 (Probability Integral Transformation). If $U \sim \mathrm{Uniform}[0,1]$ and $F$ is a cdf, then $X = F^{-1}(U) = \inf\{x : F(x) \ge U\}$ follows $F$. Example: generation of exponential random variables. $F(x) = 1 - \exp(-\lambda x)$. Setting $F(x) = U$ gives $x = -\frac{1}{\lambda}\log(1 - U)$. Since $1 - U$ is also Uniform[0,1], one can generate $U \sim \mathrm{Unif}[0,1]$ and set $x = -\frac{1}{\lambda}\log(U)$; then $x \sim \exp(\lambda)$.
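A sketch of the inversion method for the exponential example; the rate $\lambda$ and sample size are illustrative.

```python
import numpy as np

def sample_exponential(lam, n, rng):
    u = rng.uniform(0.0, 1.0, size=n)     # U ~ Unif[0, 1)
    return -np.log1p(-u) / lam            # x = -(1/lambda) * log(1 - U)

rng = np.random.default_rng(0)
x = sample_exponential(lam=2.0, n=100_000, rng=rng)
print(x.mean())                           # close to 1/lambda = 0.5
```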

(b) The rejection method. Generate $x$ from $g(x)$, the envelope distribution. Draw $u$ from Unif[0,1]. Accept $x$ if $u < \pi(x)/[c\,g(x)]$. The accepted $x$ follows $\pi(x)$. Efficiency: the acceptance rate is $\int \pi(x)\,dx \big/ \left[c \int g(x)\,dx\right] = 1/c$. Example: simulate from the truncated Gaussian $\pi(x) \propto \phi(x)\,\mathbf{1}\{x > k\}$. Set $g(x) = \phi(x)$. The efficiency is $1 - \Phi(k)$.
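A sketch of the truncated-Gaussian example: with envelope $g = \phi$, the rejection step $u < \pi(x)/[c\,g(x)]$ reduces to accepting exactly when $x > k$, so the acceptance rate is $1 - \Phi(k)$. The value of $k$ below is illustrative.

```python
import numpy as np

def truncated_normal_rejection(k, n, rng):
    """Draw n samples from pi(x) proportional to phi(x) * 1{x > k} using g = phi."""
    samples = []
    while len(samples) < n:
        x = rng.standard_normal()    # proposal from the envelope g = phi
        if x > k:                    # u < pi(x) / (c g(x)) reduces to this indicator
            samples.append(x)
    return np.array(samples)

rng = np.random.default_rng(0)
xs = truncated_normal_rejection(k=1.0, n=10_000, rng=rng)
print(xs.min())                      # every accepted draw exceeds k
```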

Figure 2: Illustration of the rejection method.

(c) Specialized methods. Generating standard Gaussian variables (the polar method). Generating Beta or Dirichlet variables (Gentle, 1998).

Variance Reduction Methods. Control variates. Suppose the quantity of interest is $\mu = E(X)$. Let $C$ be a random variable with $E(C) = \mu_C$ that is correlated with $X$, and let $X(b) = X + b(C - \mu_C)$. Then $\mathrm{Var}(X(b)) = \mathrm{Var}(X) + b^2\,\mathrm{Var}(C) + 2b\,\mathrm{Cov}(X, C)$. Choosing $b$ to minimize this yields the reduced variance $\mathrm{Var}(X(b)) = (1 - \rho_{XC}^2)\,\mathrm{Var}(X)$. Another scenario: $E(C) = \mu$. Then we can form $X(b) = bX + (1 - b)C$.
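A sketch of the control-variate construction; the choice $X = e^U$, $C = U$ with $\mu_C = 1/2$ is only an illustration, and here $b$ is estimated from the same sample (in practice one would often use a pilot run).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
u = rng.uniform(0.0, 1.0, size=n)
x = np.exp(u)                        # want mu = E(X) = e - 1
c = u                                # control variate with known mean mu_C = 0.5

b = -np.cov(x, c)[0, 1] / np.var(c, ddof=1)   # variance-minimizing coefficient
x_b = x + b * (c - 0.5)                       # X(b) = X + b (C - mu_C)

print(x.mean(), x_b.mean())                   # both estimate e - 1
print(x.var(ddof=1), x_b.var(ddof=1))         # variance shrinks by the factor (1 - rho^2)
```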

Antithetic variates: a way to produce negatively correlated Monte Carlo samples. Take $X = F^{-1}(U)$ and $X' = F^{-1}(1 - U)$; then $\mathrm{Cov}(X, X') \le 0$. Exercise (J.S. Liu, 2001): prove that $X$ and $X'$ are negatively correlated.
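A sketch of antithetic variates for estimating $E[e^U] = e - 1$ with $U \sim \mathrm{Unif}[0,1]$; the integrand is my illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
u = rng.uniform(0.0, 1.0, size=n)
x, x_anti = np.exp(u), np.exp(1.0 - u)    # evaluations at U and at the antithetic 1 - U
pairs = 0.5 * (x + x_anti)                # average each antithetic pair

print(np.cov(x, x_anti)[0, 1])            # negative covariance between the pair members
print(pairs.mean(), pairs.var(ddof=1))    # estimate of e - 1 with reduced variance
```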

Rao-Blackwellization. This method reflects a basic principle of Monte Carlo computation: one should carry out analytical computation as much as possible. A straightforward estimator of $I = E_\pi h(x)$ is $\hat{I} = \frac{1}{m}\{h(x^{(1)}) + \cdots + h(x^{(m)})\}$. Suppose that $x$ can be decomposed into two parts $(x_1, x_2)$ and that the conditional expectation $E[h(x) \mid x_2]$ can be computed analytically (integrating out $x_1$). An alternative estimator of $I$ is $\tilde{I} = \frac{1}{m}\{E[h(x) \mid x_2^{(1)}] + \cdots + E[h(x) \mid x_2^{(m)}]\}$. Both $\hat{I}$ and $\tilde{I}$ are unbiased because of the simple fact that $E_\pi h(x) = E_\pi[E\{h(x) \mid x_2\}]$.

From the identity $\mathrm{Var}\{h(x)\} = \mathrm{Var}\{E[h(x) \mid x_2]\} + E\{\mathrm{Var}[h(x) \mid x_2]\}$, we have $\mathrm{Var}(\hat{I}) = \frac{1}{m}\mathrm{Var}\{h(x)\} \ge \frac{1}{m}\mathrm{Var}\{E[h(x) \mid x_2]\} = \mathrm{Var}(\tilde{I})$.
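A sketch of this comparison, using an illustrative decomposition $x = (x_1, x_2)$ with $x_2 \sim N(0,1)$, $x_1 \mid x_2 \sim N(x_2, 1)$, and $h(x) = x_1$, so that $E[h(x) \mid x_2] = x_2$ is available analytically; none of these choices come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100_000
x2 = rng.standard_normal(m)          # x2 ~ N(0, 1)
x1 = x2 + rng.standard_normal(m)     # x1 | x2 ~ N(x2, 1)

I_hat = x1.mean()                    # plain estimator; per-sample variance Var(x1) = 2
I_tilde = x2.mean()                  # uses E[h(x) | x2] = x2; per-sample variance 1
print(I_hat, I_tilde)                # both are unbiased for E[x1] = 0
```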

Importance Sampling and Weighted Samples. Of interest is $\mu = \int h(x)\pi(x)\,dx$. We can (and sometimes prefer to) draw samples from $g(x)$, which concentrates on the region of importance and has a larger support set than that of $\pi(x)$. Let $X_1, \ldots, X_N$ be samples from $g(x)$; then $\hat{\mu} = \frac{1}{\sum_{i=1}^N w_i}\sum_{i=1}^N w_i\,h(X_i)$, where $w_i = \pi(X_i)/g(X_i)$. An alternative estimator is $\tilde{\mu} = \frac{1}{N}\sum_{i=1}^N w_i\,h(X_i)$, provided that the normalizing constants of $g$ and $\pi$ are known.

Importance Sampling. For the simpler estimator: $E(\tilde{\mu}) = \mu$ and $\mathrm{Var}(\tilde{\mu}) = \frac{1}{N}\mathrm{Var}_g\!\left\{\frac{h(X)\pi(X)}{g(X)}\right\} = \frac{1}{N}\mathrm{Var}_g\{Z\}$, where $Z = h(X)w(X)$. For the more useful one: the normalizing constants of $g$ and $\pi$ are not required, $E_g\,w(X) = 1$, and $E(\hat{\mu}) = E\!\left\{\frac{\bar{Z}}{\bar{W}}\right\} \approx \mu - \mathrm{Cov}(\bar{Z}, \bar{W}) + \mu\,\mathrm{Var}(\bar{W})$, with $\mathrm{Var}(\hat{\mu}) \approx \frac{1}{N}\left[\mu^2\,\mathrm{Var}_g(W) + \mathrm{Var}_g(Z) - 2\mu\,\mathrm{Cov}_g(W, Z)\right]$.

Hence, the mean squared error (MSE) of $\tilde{\mu}$ is $\mathrm{MSE}(\tilde{\mu}) = \frac{1}{N}\mathrm{Var}_g\{Z\}$, and that of $\hat{\mu}$ is $\mathrm{MSE}(\hat{\mu}) = [E_g(\hat{\mu}) - \mu]^2 + \mathrm{Var}_g(\hat{\mu}) = \mathrm{MSE}(\tilde{\mu}) + \frac{1}{N}\left[\mu^2\,\mathrm{Var}_g(W) - 2\mu\,\mathrm{Cov}_g(W, Z)\right] + O(N^{-2})$. Thus $\mathrm{MSE}(\hat{\mu})$ is smaller than $\mathrm{MSE}(\tilde{\mu})$ when $\mu^2\,\mathrm{Var}_g(W) - 2\mu\,\mathrm{Cov}_g(W, Z) < 0$.

Properly weighted samples. The set $\{(X_i, W_i)\}_{i=1}^N$ is said to be properly weighted with respect to $\pi$ if, for any $h(x)$, $\mu = E_\pi h(X)$ can be approximated as $\hat{\mu} = \frac{1}{\sum_{i=1}^N w_i}\sum_{i=1}^N w_i\,h(X_i)$. Mathematically, this says that $\frac{E[W\,h(X)]}{E(W)} = E_\pi h(X)$. Rule of thumb in importance sampling: $n_{\mathrm{eff}} \approx \frac{n}{1 + \mathrm{Var}_g(W)}$.

Summary of Importance Sampling. Of interest is $\mu = \int h(x)\pi(x)\,dx$. We can (and sometimes prefer to) draw samples from $g(x)$, which concentrates on the region of importance and has a larger support set than that of $\pi(x)$. Let $X_1, \ldots, X_N$ be samples from $g(x)$; then $\hat{\mu} = \frac{1}{\sum_{i=1}^N w_i}\sum_{i=1}^N w_i\,h(X_i)$, where $w_i = \pi(X_i)/g(X_i)$.
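A sketch that computes both estimators and the rule-of-thumb effective sample size; the target $\pi = N(0,1)$, trial $g = N(0, 2^2)$, and $h(x) = x^2$ (true value $\mu = 1$) are my illustrative choices, not from the lecture.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
N = 100_000
x = rng.normal(0.0, 2.0, size=N)                        # draws from the trial g = N(0, 4)
w = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 2.0)   # importance weights pi / g
h = x**2

mu_tilde = np.mean(w * h)                  # unbiased; needs the normalizing constants
mu_hat = np.sum(w * h) / np.sum(w)         # self-normalized; constants may be unknown
n_eff = N / (1.0 + np.var(w, ddof=1))      # rule-of-thumb effective sample size
print(mu_tilde, mu_hat, n_eff)             # both estimates are close to mu = 1
```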

Sequential Importance Sampling. Suppose $x = (x_1, \ldots, x_k)$; of interest is $\pi(x) = \pi(x_1)\,\pi(x_2 \mid x_1)\cdots\pi(x_k \mid x_1, \ldots, x_{k-1})$. The trial density can be decomposed similarly, $g(x) = g_1(x_1)\,g_2(x_2 \mid x_1)\cdots g_k(x_k \mid x_1, \ldots, x_{k-1})$, so that $w(x) = \frac{\pi(x)}{g(x)} = \frac{\pi(x_1)\,\pi(x_2 \mid x_1)\cdots\pi(x_k \mid x_1, \ldots, x_{k-1})}{g_1(x_1)\,g_2(x_2 \mid x_1)\cdots g_k(x_k \mid x_1, \ldots, x_{k-1})}$. The weight can therefore be computed sequentially. In practice, we create a sequence of approximations $\pi(x_1), \pi(x_1, x_2), \ldots$ to guide the importance sampling at each stage.

Sequential Importance Sampling: weight calculation. Generate $x_1 \sim g_1(x_1)$ and set the weight $w_1(x_1) = \frac{\pi_1(x_1)}{g_1(x_1)}$. Generate $x_2 \sim g_2(x_2 \mid x_1)$ and set $w_2(x_1, x_2) = w_1(x_1)\,\frac{\pi(x_2 \mid x_1)}{g_2(x_2 \mid x_1)}$. Continue until generating $x_k \sim g_k(x_k \mid x_1, \ldots, x_{k-1})$ and setting $w_k(x_1, \ldots, x_k) = w_{k-1}(x_1, \ldots, x_{k-1})\,\frac{\pi(x_k \mid x_1, \ldots, x_{k-1})}{g_k(x_k \mid x_1, \ldots, x_{k-1})}$.
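A minimal sketch of the sequential weight update for a single path; the target conditionals $\pi(x_t \mid x_{t-1}) = N(0.8\,x_{t-1}, 1)$ and the trial $g_t = N(0,1)$ are illustrative assumptions, not from the lecture.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def sis_one_path(k, rng):
    """Grow one path x_1, ..., x_k and accumulate its importance weight sequentially."""
    x_prev, w, path = 0.0, 1.0, []
    for _ in range(k):
        x_t = rng.standard_normal()                                  # draw from g_t = N(0, 1)
        w *= normal_pdf(x_t, 0.8 * x_prev, 1.0) / normal_pdf(x_t, 0.0, 1.0)
        path.append(x_t)
        x_prev = x_t
    return path, w

rng = np.random.default_rng(0)
path, weight = sis_one_path(k=5, rng=rng)
print(weight)
```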

Variations of sequential importance sampling (Grassberger, 1997). Set upper cutoff values $C_t$ and lower cutoff values $c_t$ for $t = 1, \ldots, k$. Enrichment: if $w_t > C_t$, the chain $x_t = (x_1, \ldots, x_t)$ is split into $r$ copies, each with weight $w_t/r$. Pruning: if $w_t < c_t$, flip a fair coin to decide whether to keep the chain; if it is kept, its weight $w_t$ is doubled to $2w_t$. This is a smart way to enrich the good partial conformations and prune the bad ones. A similar idea: resampling.
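A sketch of this enrichment/pruning step applied to one weighted chain; the cutoffs $C_t$, $c_t$ and the number of copies $r$ are tuning parameters supplied by the user, and the example call is purely illustrative.

```python
import numpy as np

def enrich_or_prune(chain, w, C_t, c_t, r, rng):
    """Apply the enrichment/pruning rule to one chain, returning a list of (chain, weight) pairs."""
    if w > C_t:                                   # enrichment: split into r copies
        return [(list(chain), w / r) for _ in range(r)]
    if w < c_t:                                   # pruning: keep with probability 1/2, doubling the weight
        return [(list(chain), 2.0 * w)] if rng.uniform() < 0.5 else []
    return [(list(chain), w)]

rng = np.random.default_rng(0)
print(enrich_or_prune(chain=[0.1, -0.4], w=5.0, C_t=2.0, c_t=0.1, r=3, rng=rng))
```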

Related works: sequential importance sampling with partial rejection control (Liu, Chen and Wong, 1998); dynamically weighted importance sampling (Liang, 2002).

Example: Polymer Simulation. HP model: HHHHHHPPPPHHHPPHHHHHHH. H: hydrophobic amino acid (nonpolar); P: hydrophilic amino acid (polar). Self-avoiding random walk: fold a protein on a 2D square lattice, adding one new position at a time, chosen among the $k$ ($k \le 3$) available neighboring positions. Question: what is the sampling distribution of the chain? For example, $g(\text{a straight chain}) = \frac{1}{4}\cdot\frac{1}{3}\cdot\frac{1}{3}\cdots$
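A sketch of growing a two-dimensional self-avoiding walk one monomer at a time while accumulating the sampling probability $g(\text{chain})$; the walk length and seed are illustrative, and a trapped walk is reported with probability 0.

```python
import numpy as np

def grow_saw(n_steps, rng):
    """Grow a self-avoiding walk on the 2-D lattice, returning (chain, g) or (None, 0.0) if trapped."""
    chain = [(0, 0)]
    g = 1.0
    for _ in range(n_steps):
        x, y = chain[-1]
        nbrs = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
        free = [p for p in nbrs if p not in chain]     # self-avoidance constraint
        if not free:
            return None, 0.0                           # dead end: the walk trapped itself
        nxt = free[rng.integers(len(free))]            # choose uniformly among the k free sites
        g *= 1.0 / len(free)                           # 1/4 on the first step, then 1/k
        chain.append(nxt)
    return chain, g

rng = np.random.default_rng(0)
chain, g = grow_saw(n_steps=10, rng=rng)
print(g)           # a straight three-step chain would have g = (1/4)(1/3)(1/3)
```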