Markov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing: Bioinformatics Course Supplement


2001 Bioinformatics Course Supplement
SNU Biointelligence Lab, http://bi.snu.ac.kr/

Outline
- Markov Chain Monte Carlo (MCMC)
- Metropolis-Hastings Algorithm
- Metropolis Algorithm
- Gibbs Sampling
- Simulated Annealing

Introduction
- Markov chain Monte Carlo (MCMC) is Monte Carlo integration using Markov chains.
- Monte Carlo integration draws samples from the required distribution and then forms sample averages to approximate expectations.
- Markov chain Monte Carlo draws these samples by running a cleverly constructed Markov chain for a long time.
- MCMC is usually used for Bayesian inference.

Bayesian Inference (1)
- Most applications of MCMC are oriented toward Bayesian inference.
- Notation: D is the observed data and θ the model parameters; P(θ) is the prior distribution and P(D | θ) the likelihood.
- Full probability model: P(D, θ) = P(D | θ) P(θ).
- Having observed D, Bayes' theorem is used to obtain the posterior distribution of θ, the object of all Bayesian inference:

  P(\theta \mid D) = \frac{P(\theta)\, P(D \mid \theta)}{\int P(\theta)\, P(D \mid \theta)\, d\theta}
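To make Bayes' theorem above concrete, here is a minimal numerical sketch (our own illustration, not part of the original slides; the data values, the uniform prior, and the grid resolution are all assumptions). It evaluates the posterior of a Bernoulli success probability θ on a grid, approximating the normalizing integral in the denominator by a simple Riemann sum.

```python
import numpy as np

# Illustrative data D: 10 Bernoulli trials with 7 successes (assumed values).
n_trials, n_success = 10, 7

theta = np.linspace(0.0, 1.0, 1001)                # grid over the parameter theta
prior = np.ones_like(theta)                        # uniform prior P(theta)
likelihood = theta**n_success * (1.0 - theta)**(n_trials - n_success)  # P(D | theta)

# Bayes' theorem: posterior = prior * likelihood / integral(prior * likelihood d theta)
unnormalized = prior * likelihood
dtheta = theta[1] - theta[0]
posterior = unnormalized / (unnormalized.sum() * dtheta)   # Riemann-sum normalization

print("Posterior mean of theta:", (theta * posterior).sum() * dtheta)
```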

Bayesian Inference (2)
- Any features of the posterior distribution are legitimate objects of Bayesian inference, for example the posterior expectation of a function f(θ):

  E[f(\theta) \mid D] = \frac{\int f(\theta)\, P(\theta)\, P(D \mid \theta)\, d\theta}{\int P(\theta)\, P(D \mid \theta)\, d\theta}

- Difficulty: the integrations, especially in high dimensions, are generally impossible in closed form, and numerical evaluation is also difficult and inaccurate.
- Alternatives: analytic approximation (e.g. the Laplace approximation) and Monte Carlo integration, including MCMC.

Calculating Expectations (1)
- Terms: let X be a vector of k random variables with distribution π.
- In Bayesian applications X will comprise the model parameters θ, and π is the posterior distribution, i.e. P(θ | D) = P(D | θ) P(θ) / P(D); for frequentists π is the likelihood, i.e. P(D | θ).
- Task: evaluate the expectation of some function f,

  E[f(X)] = \frac{\int f(x)\, \pi(x)\, dx}{\int \pi(x)\, dx}

Calculating Expectations (2)
- Problem: the normalizing constant ∫ π(x) dx is unknown.
- Generality of X:
  - X may take values in k-dimensional Euclidean space;
  - X may contain discrete random variables;
  - X may be a mixture of discrete and continuous random variables;
  - k can itself be variable.

Monte Carlo Integration (1)
- Draw samples {X_t, t = 1, ..., n} from π.
- Approximate the expectation by the sample average

  E[f(X)] \approx \frac{1}{n} \sum_{t=1}^{n} f(X_t)

- When the samples {X_t} are independent, laws of large numbers ensure that the approximation can be made as accurate as desired by increasing the sample size n.
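As a concrete instance of the sample-average approximation above, this short Python sketch (our own minimal example; the choice of a standard normal π and of f(x) = x², whose exact expectation is 1, is an assumption for illustration) estimates E[f(X)] from independent draws.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Example integrand; under pi = N(0, 1) its exact expectation is 1.
    return x ** 2

n = 100_000
samples = rng.standard_normal(n)       # independent draws X_1, ..., X_n from pi

estimate = f(samples).mean()           # E[f(X)] ~ (1/n) * sum_t f(X_t)
print(f"Monte Carlo estimate of E[X^2]: {estimate:.4f} (exact value 1.0)")
```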

Monte Carlo Integration (2)
- Problem: drawing {X_t} independently from π is usually not feasible, since π can be quite non-standard.
- The {X_t} need not necessarily be independent: they can be generated by any process which draws samples throughout the support of π in the correct proportions.
- MCMC: one way of doing this is through a Markov chain having π as its stationary distribution.

Markov Chain (1)
- Consider a generated sequence {X_0, X_1, X_2, ...} in which X_{t+1} is sampled from a distribution P(X_{t+1} | X_t).
- This sequence is a Markov chain, and P(· | ·) is the transition kernel of the chain.
- Assume that the chain is time-homogeneous.

Markov Chain (2)
- Effect of X_0 on X_t: the conditional distribution P^{(t)}(X_t | X_0).
- Subject to regularity conditions, the chain will gradually forget its initial state, and P^{(t)}(· | X_0) will eventually converge to a unique stationary distribution φ.
- As t increases, the sampled points {X_t} will look increasingly like dependent samples from φ.

Markov Chain (3)
- After a sufficiently long burn-in of, say, m iterations, the points {X_t : t = m+1, ..., n} will be dependent samples approximately from φ.
- Use the output from the Markov chain to estimate E[f(X)], where X has distribution φ.
- Ergodic average:

  \bar{f} = \frac{1}{n - m} \sum_{t=m+1}^{n} f(X_t)
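Convergence to a unique stationary distribution is easy to observe for a small discrete chain. The sketch below (our own illustration; the 3-state transition matrix is arbitrary, not from the slides) shows P^{(t)}(· | X_0) approaching the same limit from two different starting states.

```python
import numpy as np

# Transition kernel of an illustrative 3-state Markov chain (each row sums to 1).
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])

for start in (0, 2):                   # two different initial states X_0
    dist = np.zeros(3)
    dist[start] = 1.0                  # point mass at X_0
    for _ in range(50):
        dist = dist @ P                # one step: P^(t+1)(. | X_0) = P^(t)(. | X_0) P
    print(f"X_0 = {start}: P^(50)(. | X_0) ~ {np.round(dist, 4)}")
```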

The Metropolis-Hastings Algorithm (1)
- How can we construct a Markov chain whose stationary distribution φ is precisely our distribution of interest π?
- At each time t, the next state X_{t+1} is chosen by first sampling a candidate point Y from a proposal distribution q(· | X_t).
- The candidate point Y is then accepted with probability α(X_t, Y), where

  \alpha(X, Y) = \min\left( 1, \frac{\pi(Y)\, q(X \mid Y)}{\pi(X)\, q(Y \mid X)} \right)

- If the candidate point is accepted, the next state becomes X_{t+1} = Y.
- If the candidate is rejected, the chain does not move, i.e. X_{t+1} = X_t.

The Metropolis-Hastings Algorithm (2)
- Initialize X_0; set t = 0.
- Repeat:
  - sample a point Y from q(· | X_t);
  - sample a Uniform(0, 1) random variable U;
  - if U ≤ α(X_t, Y), set X_{t+1} = Y; otherwise set X_{t+1} = X_t;
  - increment t.
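The following Python sketch is a minimal implementation of the pseudocode above (our own illustration: the bimodal target density, the Gaussian random-walk proposal, and all tuning constants are assumptions, not part of the slides). It works on the log scale for numerical stability and discards an initial burn-in before forming the ergodic average.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # Unnormalized log-density of an example target: mixture of N(-2, 1) and N(2, 1).
    return np.logaddexp(-0.5 * (x + 2.0) ** 2, -0.5 * (x - 2.0) ** 2)

def metropolis_hastings(log_pi, x0, n_iter, proposal_scale=1.0):
    """Random-walk Metropolis-Hastings with q(Y | X) = N(X, proposal_scale^2).

    The Gaussian random-walk proposal is symmetric, so the q terms cancel and
    alpha(X, Y) = min(1, pi(Y) / pi(X))."""
    x = x0
    chain = np.empty(n_iter)
    for t in range(n_iter):
        y = x + proposal_scale * rng.standard_normal()  # candidate Y ~ q(. | X_t)
        if np.log(rng.uniform()) < log_pi(y) - log_pi(x):
            x = y                                       # accept: X_{t+1} = Y
        chain[t] = x                                    # on rejection X_{t+1} = X_t
    return chain

chain = metropolis_hastings(log_target, x0=0.0, n_iter=50_000, proposal_scale=2.0)
burn_in = 5_000
print("Ergodic average of f(X) = X after burn-in:", chain[burn_in:].mean())
```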

Implementation Issues
- Canonical forms of the proposal distribution: any proposal distribution will ultimately deliver samples from the target distribution π.
- However, the rate of convergence to the stationary distribution will depend crucially on the relationship between q and π.
- For computational efficiency, q should be chosen so that it can be easily sampled and evaluated.

Metropolis Algorithm
- Symmetric proposals, q(Y | X) = q(X | Y), which generate the candidate Y conditionally on the current state X_t; the acceptance probability then simplifies to

  \alpha(X, Y) = \min\left( 1, \frac{\pi(Y)}{\pi(X)} \right)

- Random-walk Metropolis: q(Y | X) = q(|X - Y|).
- The scale of the proposal distribution may need to be chosen carefully:
  - a cautious proposal distribution generates small steps,
  - a bold proposal distribution generates large steps,
  - and both of these extremes should be avoided.

[Figure: sample paths of random-walk Metropolis chains with Gaussian proposals of three different scales, q(· | X) = N(X, 0.5), N(X, 0.1), and N(X, 100); stationary distribution N(0, 1).]
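The trade-off between cautious and bold proposals can also be seen numerically. The sketch below (our own illustration, using an N(0, 1) target as in the figure; the chain length and the three proposal scales are assumptions) reports the acceptance rate of a random-walk Metropolis chain for small, moderate, and large proposal standard deviations: tiny steps are almost always accepted but explore the support slowly, while very large steps are rarely accepted.

```python
import numpy as np

rng = np.random.default_rng(0)

def acceptance_rate(scale, n_iter=20_000):
    """Random-walk Metropolis targeting N(0, 1); returns the fraction of accepted moves."""
    x, accepted = 0.0, 0
    for _ in range(n_iter):
        y = x + scale * rng.standard_normal()
        # For the N(0, 1) target, log pi(x) = -x^2 / 2 up to an additive constant.
        if np.log(rng.uniform()) < 0.5 * (x ** 2 - y ** 2):
            x, accepted = y, accepted + 1
    return accepted / n_iter

for scale in (0.1, 0.5, 100.0):   # cautious, moderate, and bold proposal scales
    print(f"proposal sd = {scale:6.1f}: acceptance rate = {acceptance_rate(scale):.2f}")
```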

Gibbs Sampling (1)
- Single-component Metropolis-Hastings: divide X into components {X_1, X_2, ..., X_h} of possibly differing dimension, and update these components one by one.
- An iteration of the single-component Metropolis-Hastings algorithm comprises h updating steps.
- The i-th proposal distribution q_i generates a candidate Y_i only for the i-th component of X, and may depend on the current values of any of the components of X. Writing X_{-i} for the components of X other than X_i, the candidate is accepted with probability

  \alpha(X_{-i}, X_i, Y_i) = \min\left( 1, \frac{\pi(Y_i \mid X_{-i})\, q_i(X_i \mid Y_i, X_{-i})}{\pi(X_i \mid X_{-i})\, q_i(Y_i \mid X_i, X_{-i})} \right)

Gibbs Sampling (2)
- Gibbs sampling is a special case of single-component Metropolis-Hastings; most statistical applications of MCMC have used Gibbs sampling.
- The proposal for the i-th component is its full conditional distribution:

  q_i(Y_i \mid X_i, X_{-i}) = \pi(Y_i \mid X_{-i})

- The acceptance probability is then 1; that is, Gibbs sampler candidates are always accepted.
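Because the slides describe Gibbs sampling only abstractly, here is a minimal Python sketch (our own illustration; the bivariate normal target with correlation ρ is an assumption, chosen because its full conditionals π(X_i | X_{-i}) are normal and available in closed form). Each component is drawn in turn from its full conditional, and every candidate is accepted.

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_bivariate_normal(rho, n_iter, burn_in=1_000):
    """Gibbs sampler for a bivariate normal with zero means, unit variances, and
    correlation rho.  The full conditionals are themselves normal:
        X1 | X2 = x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for X2 | X1."""
    x1, x2 = 0.0, 0.0
    cond_sd = np.sqrt(1.0 - rho ** 2)
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        x1 = rng.normal(rho * x2, cond_sd)   # update component 1 from pi(x1 | x2)
        x2 = rng.normal(rho * x1, cond_sd)   # update component 2 from pi(x2 | x1)
        draws[t] = (x1, x2)
    return draws[burn_in:]                   # discard the burn-in iterations

samples = gibbs_bivariate_normal(rho=0.8, n_iter=20_000)
print("Estimated correlation (target 0.8):", np.corrcoef(samples.T)[0, 1])
```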

Simulated Annealing (1)
- Statistical mechanics: simulated annealing (SA) exploits an analogy between the way in which a metal cools and freezes into a minimum-energy crystalline structure (the annealing process) and the search for a minimum in a more general system.
- Boltzmann-Gibbs distribution: the probability of being in state s at temperature T is

  P(s) = P(x_1, \ldots, x_N) = \frac{e^{-f(s)/kT}}{Z}

Simulated Annealing (2)
- The implementation of the SA algorithm requires:
  - a representation of possible solutions,
  - a generator of random changes in solutions,
  - a means of evaluating the problem functions,
  - an annealing schedule: an initial temperature and rules for lowering it as the search progresses.
- [Flowchart: initialize → generate new state s → accept? (yes: update state) → lower the temperature → terminate? (no: repeat; yes: stop).]

Simulated Annealing (3)
- Advantages: SA can reach the Boltzmann-Gibbs equilibrium distribution in a reasonable time, where a generic MCMC method may fail.
- Another advantage of SA over other methods is its ability to avoid becoming trapped at a local optimum.
- While simulated annealing is usually used in combination with the Metropolis algorithm, it is in fact applicable to any MCMC method, and in particular to Gibbs sampling.
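As a concrete instance of the SA recipe above (solution representation, random-change generator, objective evaluation, annealing schedule), the sketch below is our own minimal example; the one-dimensional objective, the Gaussian move generator, and the geometric cooling schedule are all assumptions, not from the slides. It performs Metropolis-style moves while the temperature is lowered after each iteration.

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(x):
    # Example multimodal function to minimize; its global minimum is at x = 0.
    return x ** 2 + 4.0 * np.sin(5.0 * x) ** 2

def simulated_annealing(f, x0, n_iter=10_000, t_init=5.0, cooling=0.999, step=0.5):
    """Simulated annealing with Metropolis acceptance and geometric cooling.

    A random change is proposed, accepted with probability
    min(1, exp(-(f(y) - f(x)) / T)), and the temperature T is multiplied by
    `cooling` after every iteration (the annealing schedule)."""
    x, fx, temp = x0, f(x0), t_init
    best_x, best_f = x, fx
    for _ in range(n_iter):
        y = x + step * rng.standard_normal()          # generator of random changes
        fy = f(y)
        if fy < fx or rng.uniform() < np.exp((fx - fy) / temp):
            x, fx = y, fy                              # accept the new state
            if fx < best_f:
                best_x, best_f = x, fx
        temp *= cooling                                # lower the temperature
    return best_x, best_f

x_best, f_best = simulated_annealing(objective, x0=3.0)
print(f"Best solution found: x = {x_best:.4f}, f(x) = {f_best:.4f}")
```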