Brief introduction to Markov Chain Monte Carlo


Brief introduction to Markov Chain Monte Carlo

Department of Probability and Mathematical Statistics seminar
Stochastic modeling in economics and finance
November 7, 2011

Contents
1. Introduction and motivation; classical iid sampling
2. Markov Chains: basic definitions, stationary distribution, limit theorems
3. Markov Chain Monte Carlo
4. ...

What is MCMC?

MCMC = Markov Chain + Monte Carlo.
- Monte Carlo is just a cool name for random simulation. Typically we generate iid samples from the given distribution.
- MCMC = an algorithm generating a Markov Chain with a prescribed stationary distribution, used as a tool to sample from this distribution. The samples are not independent. Used when iid sampling is not feasible.
- MCMC first appears in Metropolis et al. (1953): Equation of state calculations by fast computing machines, Journal of Chemical Physics.
- MCMC was originally used in statistical physics. Nowadays it is a general simulation tool (popular e.g. in Bayesian statistics).

Why do we need random sampling?

Because we need to do the following things:
- Create a random sample to see what the distribution looks like.
- Estimate the expectation or other parameters (e.g. VaR in operational risk) of a distribution which can't be computed analytically.
- Estimate the probability of a certain event. Special case: simulating p-values of statistical tests.
- Create simulated data to test statistical methods on.
- Use Monte Carlo integration as an alternative to classical numerical integration.
- Solve an optimization problem using a randomized algorithm.

Classical iid sampling

We want to estimate the expectation $\mu = E[g(X)]$. Create an iid random sample $X_1, X_2, \dots, X_n$ from $\mathcal{L}(X)$ and put
$$\hat\mu_n = \frac{1}{n} \sum_{i=1}^n g(X_i).$$
- The estimate is unbiased: $E(\hat\mu_n) = \mu$.
- The variance is reciprocal to $n$: $\operatorname{var}(\hat\mu_n) = \sigma^2/n$, where $\sigma^2 = \operatorname{var}[g(X)]$.
- Strong Law of Large Numbers: $\hat\mu_n \xrightarrow{a.s.} \mu$ as $n \to \infty$.
- Central Limit Theorem: $\sqrt{n}\,(\hat\mu_n - \mu) \xrightarrow{D} N(0, \sigma^2)$ as $n \to \infty$.
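A minimal Python/NumPy sketch of this estimator; the exponential target and $g(x) = x^2$ are illustrative choices only, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def iid_mc_estimate(g, sampler, n):
    """Plain Monte Carlo: mean of g over n iid draws, plus its standard error."""
    vals = g(sampler(n))
    mu_hat = vals.mean()
    se = vals.std(ddof=1) / np.sqrt(n)   # estimated sigma / sqrt(n)
    return mu_hat, se

# Example: estimate E[X^2] for X ~ Exp(1) (true value is 2).
mu_hat, se = iid_mc_estimate(lambda x: x**2,
                             lambda n: rng.exponential(1.0, n),
                             100_000)
print(f"estimate = {mu_hat:.3f} +/- {se:.3f}")
```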

Classical sampling methods

- Inverse sampling: when the cumulative distribution function can be inverted easily (for example, the exponential or logistic distribution). Uses U(0, 1) as input. Only for 1-dimensional cases.
- Rejection sampling: we need an auxiliary enveloping distribution from which it is easy to simulate. We then reject some realizations at random (using U(0, 1) as an additional input) according to the probability density ratio of the desired and the auxiliary distribution.
- Importance sampling: estimating the integral of $f$ as an expectation of $f/g$ with respect to $g$ (from which we can simulate). Choose $g$ as close to $f$ as possible.

A sketch of the first two methods follows this list.
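A minimal Python/NumPy sketch of inverse and rejection sampling; the Exp($\lambda$) and Beta(2, 2) targets are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Inverse sampling for Exp(lam): F^{-1}(u) = -log(1 - u) / lam.
def exp_inverse_sample(lam, n):
    u = rng.uniform(size=n)
    return -np.log1p(-u) / lam

# Rejection sampling for a density f enveloped by c * g:
# accept x ~ g with probability f(x) / (c * g(x)).
def rejection_sample(f, g_sampler, g_pdf, c, n):
    out = []
    while len(out) < n:
        x = g_sampler()
        if rng.uniform() <= f(x) / (c * g_pdf(x)):
            out.append(x)
    return np.array(out)

# Example: Beta(2, 2) density f(x) = 6x(1-x), enveloped by c * U(0, 1), c = max f = 1.5.
f = lambda x: 6.0 * x * (1.0 - x)
samples = rejection_sample(f, lambda: rng.uniform(), lambda x: 1.0, 1.5, 10_000)
print(samples.mean())   # should be close to 0.5
```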

Problems with classical sampling methods

Classical sampling methods are not always applicable, especially for high-dimensional complex problems.

Example: sampling uniformly from the n-dimensional unit ball via rejection sampling. As the enveloping distribution we use the uniform distribution on the n-dimensional cube $[-1, 1]^n$. What is the acceptance rate of such a generator? It is the ratio of the two volumes,
$$\frac{\pi^{n/2}}{\Gamma(n/2 + 1)\, 2^n} \longrightarrow 0 \quad \text{as } n \to \infty \;!!!$$
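The collapse of this acceptance rate is easy to see numerically; a small Python check (the chosen dimensions are arbitrary):

```python
import math

# Acceptance rate of rejection sampling from the unit ball using the cube [-1, 1]^n:
# ratio of volumes = pi^(n/2) / (Gamma(n/2 + 1) * 2^n).
for n in (1, 2, 5, 10, 20):
    rate = math.pi ** (n / 2) / (math.gamma(n / 2 + 1) * 2 ** n)
    print(f"n = {n:2d}: acceptance rate = {rate:.2e}")
# n = 20 already gives ~ 2.5e-08, i.e. about one accepted draw per 40 million proposals.
```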

Markov Chains - basic definitions

A Markov Chain is a stochastic process with the Markov property:
$$P\left(X_{t+1} = x_{t+1} \,\middle|\, \bigcap_{i=0}^{t} \{X_i = x_i\}\right) = P(X_{t+1} = x_{t+1} \mid X_t = x_t).$$
For our purposes we need discrete time and:
- discrete state space $\mathcal{X}$: transition probability matrix $P_{ij}$,
- continuous state space $\mathcal{X}$: transition kernel $P(x, x')$.

Homogeneity: the transition probabilities $P$ are independent of time. We will work with homogeneous chains only.

Irreducibility: every state is accessible in one or more steps from any other state with positive probability.

Stationary distribution, detailed balance equations

A collection $\{\pi_i,\ i \in \mathcal{X}\}$ is called a stationary distribution of a Markov Chain $X_t$ with transition probability matrix $P$ if $\sum_{i \in \mathcal{X}} \pi_i = 1$ and $\pi = \pi P$.

Detailed balance equations: let $X_t$ be a Markov Chain with transition probability matrix $P$. If $\{\pi_i,\ i \in \mathcal{X}\}$ satisfies $\sum_{i \in \mathcal{X}} \pi_i = 1$ and
$$\pi_i P_{ij} = \pi_j P_{ji} \quad \text{for all } i, j \in \mathcal{X},$$
then $\pi$ is a stationary distribution of $X_t$.
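As a quick numerical illustration (not from the slides), one can verify on a small matrix that detailed balance implies stationarity; the particular $P$ and $\pi$ below are arbitrary examples:

```python
import numpy as np

# A small transition matrix and a candidate pi satisfying detailed balance.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.5, 0.25])

flows = pi[:, None] * P                 # flows[i, j] = pi_i * P_ij
assert np.allclose(flows, flows.T)      # detailed balance: pi_i P_ij == pi_j P_ji
assert np.allclose(pi @ P, pi)          # ...which implies stationarity: pi P == pi
```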

Law of Large Numbers and CLT for Markov Chains

Let $X_t$ be a stationary Markov Chain with state space $\mathcal{X}$ and stationary distribution $\pi$. Suppose that $\mu = \int_{\mathcal{X}} g(x)\, d\pi(x)$ exists. Let's estimate $\mu$ by the sample average of $g(X_t)$ over time:
$$\hat\mu_n = \frac{1}{n} \sum_{t=1}^n g(X_t).$$
- Strong Law of Large Numbers: $\hat\mu_n \xrightarrow{a.s.} \mu$ as $n \to \infty$.
- Central Limit Theorem: $\sqrt{n}\,(\hat\mu_n - \mu) \xrightarrow{D} N(0, \sigma^2)$ as $n \to \infty$, where
$$\sigma^2 = \operatorname{var}[g(X_i)] + 2 \sum_{k=1}^{+\infty} \operatorname{cov}[g(X_i), g(X_{i+k})].$$

MCMC - basic principle

We want to simulate from a given (target) distribution $p$ on $\mathcal{X}$, which can be discrete or continuous.
- Start from an arbitrary element $x \in \mathcal{X}$ and take it as $X_0$.
- Construct a Markov Chain $\{X_t\}$, $t = 1, 2, \dots$, which will explore the distribution $p$ in successive time iterations.
- Design the transition probabilities of $\{X_t\}$ so that $p$ is a stationary distribution of $\{X_t\}$.
- We lose independence, which decreases the efficiency of the estimates compared to iid sampling.

Metropolis-Hastings algorithm I (discrete case)

Let $\mathcal{X}$ be a finite or countably infinite set. Let $p_i > 0$, $i \in \mathcal{X}$, be a probability distribution on $\mathcal{X}$ from which we want to simulate. We will construct an irreducible transition probability matrix $P$ so that $p$ will be a stationary distribution of a Markov Chain $\{X_t\}$ with state space $\mathcal{X}$ and transition matrix $P$.

Consider any irreducible transition probability matrix $Q$ on the state space $\mathcal{X}$. Metropolis et al. (1953) considered a symmetric $Q$, so that all the terms $Q_{ij}/Q_{ji}$ (so-called Hastings ratios) vanish from the following formulas.

Metropolis-Hastings algorithm II (discrete case)

Start $\{X_t\}$ in an arbitrary state $X_0 = x_0 \in \mathcal{X}$. Then iterate over time $t$:
1. Having $X_t$, choose a proposal $X' \in \mathcal{X}$ according to $Q$: $P(X' = x' \mid X_t = x_t) = Q_{x_t x'}$.
2. Calculate the so-called acceptance probability
$$\alpha = \min\left\{1,\ \frac{p(X')\, Q_{X' X_t}}{p(X_t)\, Q_{X_t X'}}\right\} \in (0, 1].$$
3. With probability $\alpha$ accept $X'$ and put $X_{t+1} = X'$. Otherwise (with probability $1 - \alpha$) reject $X'$ and keep $X_{t+1} = X_t$.
4. Set $t = t + 1$ and go back to step 1.
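A minimal Python/NumPy sketch of these four steps on a small finite state space; the target weights p and the proposal Q below are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(2)

def metropolis_hastings_discrete(p, Q, x0, n_steps):
    """M-H on states 0..K-1: p = target weights (need not be normalized),
    Q = proposal transition matrix, x0 = starting state."""
    K = len(p)
    chain = np.empty(n_steps, dtype=int)
    x = x0
    for t in range(n_steps):
        x_prop = rng.choice(K, p=Q[x])                  # step 1: propose from Q
        alpha = min(1.0, (p[x_prop] * Q[x_prop, x]) /
                         (p[x] * Q[x, x_prop]))         # step 2: acceptance probability
        if rng.uniform() < alpha:                       # step 3: accept or reject
            x = x_prop
        chain[t] = x                                    # step 4: next iteration
    return chain

# Example: unnormalized target weights (the normalizing constant is never needed).
p = np.array([1.0, 2.0, 4.0, 2.0, 1.0])
Q = np.full((5, 5), 0.2)                                # symmetric, irreducible proposal
chain = metropolis_hastings_discrete(p, Q, x0=0, n_steps=50_000)
print(np.bincount(chain, minlength=5) / len(chain))     # ~ p / p.sum() = [.1, .2, .4, .2, .1]
```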

Metropolis-Hastings algorithm III (discrete case)

The resulting transition probability matrix $P$ of $\{X_t\}$ is
$$P_{ij} = Q_{ij} \min\left\{1,\ \frac{p(j)\, Q_{ji}}{p(i)\, Q_{ij}}\right\}, \quad i \neq j, \qquad P_{ii} = 1 - \sum_{j \in \mathcal{X} \setminus \{i\}} P_{ij}, \quad i \in \mathcal{X}.$$
By simple algebraic manipulation we can show that the matrix $\{P_{ij}\}$ satisfies the detailed balance equations with $p$: indeed, for $i \neq j$, $p(i) P_{ij} = \min\{p(i) Q_{ij},\ p(j) Q_{ji}\}$, which is symmetric in $i$ and $j$. So $p$ is a stationary distribution of the constructed Markov Chain $\{X_t\}$, which is what we needed to prove.

We do not need to evaluate $p$ itself; we just need to compute the ratios $p(j)/p(i)$. So the normalizing constant of $p$ can be ignored.
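To make the detailed-balance claim concrete, one can build $P$ explicitly from $Q$ and $p$ and check both equations numerically; a sketch under an arbitrary example target (mh_transition_matrix is a hypothetical helper name):

```python
import numpy as np

def mh_transition_matrix(p, Q):
    """Assemble the M-H transition matrix P from target p and proposal Q."""
    K = len(p)
    P = np.zeros((K, K))
    for i in range(K):
        for j in range(K):
            if i != j:
                P[i, j] = Q[i, j] * min(1.0, (p[j] * Q[j, i]) / (p[i] * Q[i, j]))
        P[i, i] = 1.0 - P[i].sum()      # rejection mass stays at state i
    return P

p = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
Q = np.full((5, 5), 0.2)
P = mh_transition_matrix(p, Q)
flows = p[:, None] * P
assert np.allclose(flows, flows.T)      # detailed balance with p
assert np.allclose(p @ P, p)            # hence p is stationary
```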

Examples of applications

- Generate uniformly from the set of $m \times n$ integer contingency tables with given row and column marginals. An irreducible $Q$ can be defined as a uniform choice from the neighboring tables. Since $p$ is uniform, $p(j)/p(i) \equiv 1$ here.
- Generating from a Poisson distribution. Let $Q$ be a random walk over $\mathcal{X} = \{0, 1, 2, \dots\}$. The ratios $p(i \pm 1)/p(i)$ are easy to calculate.
- Exploring the feasible set of an integer programming maximization problem. We can choose $p$ increasing with the objective function and so push $\{X_t\}$ towards regions with higher values of the objective function.

A sketch of the Poisson example follows this list.
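A minimal Python sketch of the Poisson example; for $p(i) = e^{-\lambda}\lambda^i/i!$ the needed ratios are $p(i+1)/p(i) = \lambda/(i+1)$ and $p(i-1)/p(i) = i/\lambda$ (the function name and step rule are our own choices):

```python
import numpy as np

rng = np.random.default_rng(3)

def poisson_mh(lam, n_steps, x0=0):
    """Random-walk M-H on {0, 1, 2, ...}: propose x +/- 1 with prob 1/2 each.
    Only the ratios p(i+1)/p(i) and p(i-1)/p(i) are used."""
    x = x0
    chain = np.empty(n_steps, dtype=int)
    for t in range(n_steps):
        step = 1 if rng.uniform() < 0.5 else -1
        x_prop = x + step
        if x_prop >= 0:  # proposals below 0 have p = 0 and are always rejected
            ratio = lam / x_prop if step == 1 else x / lam
            if rng.uniform() < min(1.0, ratio):
                x = x_prop
        chain[t] = x
    return chain

chain = poisson_mh(lam=4.0, n_steps=100_000)
print(chain.mean(), chain.var())   # both should be close to lam = 4
```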

Metropolis-Hastings algorithm I (continuous case)

Let $\mathcal{X} \subseteq \mathbb{R}^m$ be the support of a probability density function $p(x)$, $p(x) > 0\ \forall x \in \mathcal{X}$, from which we want to simulate. We will construct a Markov Chain $\{X_t\}$ with state space $\mathcal{X}$ so that $p$ will be its stationary distribution.

Consider any proposal probability density function $q(x' \mid x)$ of $x' \in \mathcal{X}$, depending in general on $x \in \mathcal{X}$.

Metropolis-Hastings algorithm II (continuous case)

Start $\{X_t\}$ in an arbitrary state $X_0 = x_0 \in \mathcal{X}$. Then iterate over time $t$:
1. Having $X_t$, draw a proposal $X' \in \mathcal{X}$ from $q(x' \mid x_t)$.
2. Calculate the so-called acceptance probability
$$\alpha = \min\left\{1,\ \frac{p(X')\, q(X_t \mid X')}{p(X_t)\, q(X' \mid X_t)}\right\} \in (0, 1].$$
3. With probability $\alpha$ accept $X'$ and put $X_{t+1} = X'$. Otherwise (with probability $1 - \alpha$) reject $X'$ and keep $X_{t+1} = X_t$.
4. Set $t = t + 1$ and go back to step 1.

Metropolis-Hastings algorithm III (continuous case)

Similarly to the discrete case, we can show that the transition kernel $P(x' \mid x)$ of $\{X_t\}$ satisfies the detailed balance equations with $p$. So $p$ is a stationary distribution of the constructed Markov Chain $\{X_t\}$, which is what we needed to prove.

We do not need to evaluate $p(x)$ itself; we just need to compute the ratios $p(x')/p(x)$. So the normalizing constant of $p$ (which is often unknown) can be ignored.

- When $q(x' \mid x)$ does not depend on $x \in \mathcal{X}$, we talk about an independence chain (but it is still not iid!!!).
- When $q(x' \mid x) = q(x' - x)$, we talk about random walk MCMC. When in addition $q(\cdot)$ is symmetric, the formulas simplify again.

Metropolis-Hastings 2D random walk illustration (figure only)

Example - target N(0, 1)

- The target distribution is N(0, 1). Let's forget for a while that it is easy to simulate from N(0, 1) directly.
- Starting point $X_0 = 0$ (the mode of the target distribution).
- We use the normal random walk Metropolis-Hastings algorithm with a symmetric trial (jump) distribution $N(0, \sigma_J^2)$.
- Simple implementation on an Excel sheet.
- By changing the parameter $\sigma_J > 0$, we drive the acceptance rate of the proposals, the convergence speed and the efficiency.
- We use $\sigma_J \in \{0.1, 0.2, 0.5, 1, 2, 2.5, 3, 5, 8, 10\}$ and evaluate the results; a Python sketch of the same experiment follows.
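Since the slides implement the experiment in Excel, here is a rough Python/NumPy equivalent; the helper name rw_metropolis_normal and the printed diagnostics are our own choices. For the N(0, 1) target with a symmetric proposal, $\alpha = \min\{1, p(x')/p(x)\}$ and $\log p(x') - \log p(x) = (x^2 - x'^2)/2$:

```python
import numpy as np

rng = np.random.default_rng(4)

def rw_metropolis_normal(sigma_j, n_steps, x0=0.0):
    """Random walk M-H for a N(0, 1) target with N(0, sigma_j^2) jumps."""
    x = x0
    chain = np.empty(n_steps)
    n_accepted = 0
    for t in range(n_steps):
        x_prop = x + sigma_j * rng.standard_normal()
        log_alpha = 0.5 * (x * x - x_prop * x_prop)   # log p(x') - log p(x)
        if np.log(rng.uniform()) < log_alpha:
            x = x_prop
            n_accepted += 1
        chain[t] = x
    return chain, n_accepted / n_steps

for sigma_j in (0.1, 2.5, 10.0):
    chain, acc = rw_metropolis_normal(sigma_j, 400)
    acf1 = np.corrcoef(chain[:-1], chain[1:])[0, 1]
    print(f"sigma_J = {sigma_j:4.1f}: acceptance = {acc:5.1%}, ACF(1) = {acf1:.2f}")
```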

MCMC trajectory (400 observations): $\sigma_J = 0.1$
Very high acceptance rate. A random walk with mean reversion.

MCMC trajectory (400 observations): $\sigma_J = 2.5$
Moderate acceptance rate. Looks most like white noise.

MCMC trajectory (400 observations): $\sigma_J = 10$
Very low acceptance rate. Locally constant with rare large jumps.

Simulated histogram (400 observations): $\sigma_J = 0.1$
The chain didn't explore the whole target distribution (especially the tails).

Simulated histogram (400 observations): $\sigma_J = 2.5$
The resulting histogram looks nice.

Simulated histogram (400 observations): $\sigma_J = 10$
The chain is too rigid; the histogram consists of several peaks.

MCMC trajectory ACF: $\sigma_J = 0.1$
Very slow (linear) decay of the ACF.

MCMC trajectory ACF: $\sigma_J = 2.5$
$\mathrm{ACF}(k) \approx 0.6^k$.

MCMC trajectory ACF: $\sigma_J = 10$
$\mathrm{ACF}(k) \approx 0.83^k$.

Acceptance rate as a function of $\sigma_J$
The acceptance rate goes from 100 % down to 15 % as $\sigma_J$ goes from 0.1 to 10. At the moderate $\sigma_J = 2.5$ the acceptance rate is around 40 %.

ACF(1) as a function of $\sigma_J$
The ACF(1) plot has a U-shape, starting at 1 and ending at 0.83. The minimum value 0.6 is attained near $\sigma_J = 2.5$.

Efficiency as a function of $\sigma_J$
The efficiency forms a ∩-shape, starting at 0 %. The maximum value, around 30 %, is attained again near $\sigma_J = 2.5$.

Choosing the trial distribution

- The so-called trial (or proposal, candidate, jumping) distribution is crucial for achieving reasonable efficiency of MCMC estimates.
- We must be able to simulate from it easily.
- Its shape should ideally copy the shape of the target distribution.
- Its width (standard deviation) must be tuned to optimize the efficiency. A 20-50 % acceptance rate is recommended.
- For the 1-dimensional case, the optimal efficiency is usually around 30 %. Efficiency decreases with higher dimensions.

Convergence issues

- Since the constructed Markov Chain only approximates the target distribution in a limiting sense, we must check whether our Markov Chain has already converged to the target distribution. Some convergence diagnostics are available for this.
- The convergence speed can differ; be careful.
- Drop the beginning phase of the MC trajectory, before convergence is reached: the so-called burn-in stage.
- Run the simulation several times and compare the characteristics of the individual trajectories.
- Try running the simulation with different starting points and compare the results.

Error of the MCMC estimate

- Assessing the error of an MCMC estimate is not as straightforward as in iid sampling.
- We can use the formula with the process covariance function values to estimate the variance of its sample means.
- Typically the autocorrelation function $\mathrm{ACF}(k)$ of $g(X_t)$ decays to 0 like an exponential $\rho^k$ with $\rho > 0$, i.e. like the ACF of an AR(1) process. We try to make $\rho$ as close to 0 as possible (higher efficiency).
- We can estimate the sample mean error directly from the variation of shorter block (batch) sample means; see the sketch after this list.
- Tune the process to increase efficiency and so decrease the estimation error.
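A minimal Python/NumPy sketch of the batch-means idea; the function name and the batch count are illustrative, and the commented usage assumes the rw_metropolis_normal helper sketched earlier:

```python
import numpy as np

def batch_means_se(chain, n_batches=20):
    """Estimate the standard error of the chain's sample mean from the
    variation of batch (block) means."""
    n = len(chain) // n_batches
    means = chain[: n * n_batches].reshape(n_batches, n).mean(axis=1)
    # var(overall mean) ~ var(batch means) / n_batches, if batches are long
    # enough to be roughly independent.
    return means.std(ddof=1) / np.sqrt(n_batches)

# Usage with the sampler sketched in the N(0, 1) example:
# chain, _ = rw_metropolis_normal(sigma_j=2.5, n_steps=100_000)
# print(batch_means_se(chain))
```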

Further MCMC topics

There are many variants and modifications of MCMC one can meet when browsing the literature. Just to list some:
- Gibbs sampling
- Slice sampling
- Annealing (tempering)
- Langevin technique
- Hamiltonian hybrid algorithm
- Coupling from the past
- ...

References

- W. K. Hastings: Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57 (1970), 97-109.
- N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller and E. Teller: Equation of state calculations by fast computing machines. Journal of Chemical Physics 21 (1953), 1087-92.
- J. M. Hohendorff: An Introduction to Markov Chain Monte Carlo. Department of Statistics, University of Toronto, 2005. http://probability.ca/jeff/ftpdir/johannes.pdf
- T. Balún: MCMC metódy posteriórnej simulácie a ich aplikácia v ekonómii [MCMC methods of posterior simulation and their application in economics]. Diploma thesis, 2011. http://is.muni.cz/th/211335/prif_m/diplomka.pdf
- C. Geyer: Introduction to Markov Chain Monte Carlo. University of Minnesota, 2003. http://www.stat.umn.edu/geyer/mcmc/talk/mcmc.pdf
- L. Kroc: Introduction to MCMC. http://www.cs.cornell.edu/selman/cs475/lectures/intro-mcmc-lukas.pdf
- S. Lalley: Introduction to Markov Chain Monte Carlo. Department of Statistics, University of Chicago. http://galton.uchicago.edu/~lalley/courses/313/proppwilson.pdf
- I. Murray: Markov chain Monte Carlo. Machine Learning Summer School 2009. http://mlg.eng.cam.ac.uk/mlss09/mlss_slides/murray_1.pdf
- P. Lam: MCMC Methods: Gibbs Sampling and the Metropolis-Hastings Algorithm. Harvard University. http://www.people.fas.harvard.edu/~plam/teaching/methods/mcmc/mcmc_mprint.pdf
- K. M. Hanson: Tutorial on Markov Chain Monte Carlo. Los Alamos National Laboratory, 2000. http://kmh-lanl.hansonhub.com/talks/maxent00b.pdf
- M. Scullard: Reversible Markov Chains. 2008. http://www.math.ucsd.edu/~williams/courses/m28908/scullardmath289_mreversibility.pdf

Contacts

mobile: 604 799 879, e-mail: tomas.hanzak@post.cz, web: www.thanzak.sweb.cz

Department of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, Charles University in Prague, Sokolovská 83, 186 75 Praha 8.
e-mail: hanzak@karlin.mff.cuni.cz, web: www.karlin.mff.cuni.cz/kpms

MEDIARESEARCH, a.s., Českobratrská 1, 130 00 Praha 3.
mobile: 725 535 535, e-mail: tomas.hanzak@mediaresearch.cz, web: www.mediaresearch.cz