CS 3750 Advanced Machine Learning. Lecture 6: Monte Carlo methods — Markov chain Monte Carlo


CS 3750 Machine Learning. Lecture 6: Monte Carlo methods. Milos Hauskrecht, milos@cs.pitt.edu, 5329 Sennott Square.

Markov chain Monte Carlo

Importance sampling: samples are generated according to Q, and every sample from Q is reweighted according to a weight w, but Q may be very far from the target distribution. MCMC is a strategy for generating samples from the target distribution itself, including conditional distributions. MCMC: a Markov chain defines a sampling process that initially generates samples very different from the target distribution (e.g. the posterior) but gradually refines them so that they are closer and closer to the posterior.

MCMC

The construction of a Markov chain requires two basic ingredients: a transition matrix P and an initial distribution pi_0. Assume a finite set S = {1, ..., m} of states; then the transition matrix is

P = [ p_11 p_12 ... p_1m
      p_21 p_22 ... p_2m
      ...
      p_m1 p_m2 ... p_mm ]

where p_ij >= 0 for all i, j in S and sum_{j in S} p_ij = 1 for every i in S.

Markov Chain

A Markov chain defines a random process of selecting states. Chain dynamics: the initial state is selected based on pi_0; each subsequent state is selected based on the previous state and the transition matrix, P(X^{t+1} = x' | X^t = x), the probability of state x' being selected at time t+1 given state x at time t, where x, x' range over Dom(X) and T(x -> x') denotes the corresponding transition-matrix entry.
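The two ingredients above can be sketched directly. A minimal simulation, assuming a hypothetical 3-state transition matrix P and an initial distribution pi_0 concentrated on state 0 (neither is from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state chain: row i of P gives P(X_{t+1} = . | X_t = i).
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])
pi0 = np.array([1.0, 0.0, 0.0])  # initial distribution: start in state 0

# Stochastic-matrix requirement: p_ij >= 0 and each row sums to 1.
assert np.all(P >= 0) and np.allclose(P.sum(axis=1), 1.0)

def simulate(P, pi0, n_steps):
    """Draw a trajectory X_0, ..., X_n from the chain (P, pi0)."""
    x = rng.choice(len(pi0), p=pi0)        # initial state from pi_0
    path = [x]
    for _ in range(n_steps):
        x = rng.choice(P.shape[1], p=P[x])  # next state from row x of P
        path.append(x)
    return path

path = simulate(P, pi0, 10_000)
```

The trajectory `path` is exactly the "chain dynamics" described above: state t+1 depends only on state t through a row of P.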

MCMC

A Markov chain satisfies the Markov property:

P(X_n = j | X_0 = i_0, X_1 = i_1, ..., X_{n-1} = i_{n-1}) = P(X_n = j | X_{n-1} = i_{n-1})

Irreducibility: a Markov chain is called irreducible (or indecomposable) if there is a positive transition probability between every pair of states within a finite number of steps.

In irreducible chains there may still exist a periodic structure such that, for each state i, the set of possible return times to i when starting in i is a subset of {p, 2p, 3p, ...} containing all but a finite set of these elements. The smallest number p with this property is the period of the chain:

p_i = gcd{ n in N : p_ii^(n) > 0 }

Aperiodicity: an irreducible chain is called aperiodic (or acyclic) if the period p equals 1 or, equivalently, if for all pairs of states (i, j) there is an integer n_ij such that for all n >= n_ij the probability p_ij^(n) > 0.

If a Markov chain satisfies both irreducibility and aperiodicity, then it converges to an invariant distribution q. A Markov chain with transition matrix P has equilibrium distribution q iff q = qP. A sufficient, but not necessary, condition to ensure that a particular q is the invariant distribution of transition matrix P is the following reversibility (detailed balance) condition:

q(x) P(x -> x') = q(x') P(x' -> x)
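Detailed balance can be checked numerically. A minimal sketch, assuming a hypothetical reversible chain built as a random walk on a graph with symmetric edge weights W (a standard construction, not from the lecture): its invariant distribution is proportional to the row sums of W, and detailed balance holds term by term.

```python
import numpy as np

# Random walk on a weighted graph: symmetric weights W give a reversible chain.
W = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])
P = W / W.sum(axis=1, keepdims=True)   # transition matrix, rows sum to 1
q = W.sum(axis=1) / W.sum()            # candidate invariant distribution

# Detailed balance: q_i P_ij = W_ij / W.sum() is symmetric in (i, j) ...
flows = q[:, None] * P
assert np.allclose(flows, flows.T)

# ... which is sufficient for invariance q = qP.
assert np.allclose(q @ P, q)
```

Summing detailed balance over i gives sum_i q_i P_ij = q_j sum_i P_ji = q_j, which is why the condition is sufficient for q = qP.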

Markov Chain Monte Carlo

Objective: generate samples from the posterior distribution. Idea: a Markov chain defines a sampling process that initially generates samples very different from the target posterior but gradually refines them so that they are closer and closer to the posterior.

MCMC: P(X | e) is the query we want to compute, where e_1 and e_2 are known evidence variables. Sampling from the distribution P(X) is very different from sampling from the desired posterior P(X | e).

Markov Chain Monte Carlo

(State-space view: the chain visits a sequence of states x^1, x^2, x^3, x^4, ...)

Goal: a sample from P(X | e).
- Start from some P(X) and generate a sample x^1.
- From x^1 and the transition model, generate x^2 (apply T).
- Repeat for n steps, applying T at each step, until the chain's distribution approaches P(X | e).
- After n steps, the states x^{n+1}, x^{n+2}, ... are samples from the desired P(X | e).

MCMC

In general, an MCMC sampling process doesn't have to converge to a stationary distribution. A finite-state Markov chain has a unique stationary distribution iff the Markov chain is regular. Regular: there exists some k such that, for each pair of states x and x', the probability of getting from x to x' in exactly k steps is greater than 0. We want Markov chains that converge to a unique target distribution from any initial state. How do we build such Markov chains?

Gibbs Sampling

A simple method to define such a Markov chain for a Bayesian belief network (BBN); it can benefit from the structure (independences) in the network. (Example network over binary variables x_1, ..., x_6, each taking values T or F; evidence: x_5 = T, x_6 = T.)
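The regularity condition above is directly testable: some power P^k must have all entries strictly positive. A minimal sketch with two hypothetical 2-state chains (not from the lecture), one regular and one periodic:

```python
import numpy as np

def is_regular(P, max_k=None):
    """Check regularity: some power P^k has all entries strictly positive."""
    m = P.shape[0]
    max_k = max_k or m * m   # small search bound, enough for these examples
    Pk = np.eye(m)
    for _ in range(max_k):
        Pk = Pk @ P
        if np.all(Pk > 0):
            return True
    return False

# A regular chain: the self-loop at state 0 breaks any periodicity.
P_good = np.array([[0.5, 0.5],
                   [1.0, 0.0]])
# A period-2 chain: irreducible, but its powers alternate and are never
# all positive, so it has no unique limiting behavior from every start.
P_periodic = np.array([[0.0, 1.0],
                       [1.0, 0.0]])
```

Here `is_regular(P_good)` succeeds at k = 2 (P^2 has all positive entries), while the periodic chain fails for every k, illustrating why regularity, not just irreducibility, is required.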

Gibbs Sampling

Initial state x^0: x_1 = F, x_2 = T, x_3 = T, x_4 = T; the evidence x_5 = x_6 = T is fixed throughout.
- Update the value of x_4 by resampling it given the other variables; say the new value is x_4 = F.
- Update the value of x_3 in the same way, then the remaining unobserved variables, and repeat.
- After many such reassignments, the state x^n is a sample from the desired P(X_rest | e).

Keep resampling each variable using the values of the variables in its local neighborhood, the Markov blanket:

P(X_4 | x_2, x_3, x_5, x_6)
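The resampling loop above can be sketched on a toy model. The lecture's network x_1, ..., x_6 is not specified numerically, so this sketch assumes a hypothetical joint over just two binary variables (a, b), standing in for P(X_rest | e); each step resamples one variable from its exact conditional, which plays the role of the Markov-blanket conditional:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical joint P(a, b): rows index a in {0, 1}, columns index b.
joint = np.array([[0.30, 0.10],
                  [0.15, 0.45]])

def gibbs(n_samples, burn_in=500):
    """Gibbs sampler: resample each variable from its conditional in turn."""
    a, b = 0, 0
    samples = []
    for t in range(burn_in + n_samples):
        pa = joint[:, b] / joint[:, b].sum()   # P(a | b)
        a = rng.choice(2, p=pa)
        pb = joint[a, :] / joint[a, :].sum()   # P(b | a)
        b = rng.choice(2, p=pb)
        if t >= burn_in:                       # discard early, pre-mixing states
            samples.append((a, b))
    return np.array(samples)

samples = gibbs(20_000)
emp_pa1 = samples[:, 0].mean()  # empirical P(a = 1); exact marginal is 0.60
```

After the burn-in ("after many reassignments"), the empirical marginal of a should be close to the true marginal 0.15 + 0.45 = 0.60.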

Gibbs Sampling

Gibbs sampling takes advantage of the structure: the Markov blanket makes the variable independent of the rest of the network, e.g. P(X_4 | x_2, x_3, x_5, x_6).

Building a Markov Chain

A reversible Markov chain: a sufficient, but not necessary, condition to ensure that a particular q is the invariant distribution of transition matrix P is the reversibility (detailed balance) condition

q(x) P(x -> x') = q(x') P(x' -> x)

The Metropolis-Hastings algorithm builds a reversible Markov chain. It uses a proposal distribution to generate candidate states, and either accepts a candidate and takes a transition to state x', or rejects it and stays at the current state x.

Building a Markov Chain

The Metropolis-Hastings algorithm builds a reversible Markov chain. It uses a proposal distribution (similar to the proposal distribution in importance sampling) to generate candidates x' for x. A proposal distribution Q defines T_Q(x -> x'); for example, uniform over the values of the variables. The algorithm either accepts a proposal and takes a transition to state x', or rejects it and stays at the current state x, with acceptance probability A(x -> x').

Transition for MH:

T(x -> x') = T_Q(x -> x') A(x -> x')                                    if x' != x
T(x -> x)  = T_Q(x -> x) + sum_{x' != x} T_Q(x -> x') (1 - A(x -> x'))  otherwise

From the reversibility condition q(x) T(x -> x') = q(x') T(x' -> x) we get

A(x -> x') = min[ 1, ( q(x') T_Q(x' -> x) ) / ( q(x) T_Q(x -> x') ) ]
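The acceptance rule above can be sketched with the slide's own example proposal, uniform over the values of the variables. This is a minimal sketch assuming a hypothetical unnormalized target q over four states (not from the lecture); note that because the uniform proposal is symmetric, the T_Q terms in A(x -> x') cancel and only the ratio q(x')/q(x) remains:

```python
import numpy as np

rng = np.random.default_rng(2)

# Unnormalized target over 4 states; MH only ever evaluates ratios of q,
# so the normalizing constant is never needed.
q = np.array([1.0, 2.0, 3.0, 4.0])

def mh_chain(n_steps, burn_in=1_000):
    """Metropolis-Hastings with a uniform (symmetric) proposal."""
    x = 0
    counts = np.zeros(len(q))
    for t in range(burn_in + n_steps):
        x_prop = rng.integers(len(q))         # T_Q(x -> x') = 1/m for all x'
        accept = min(1.0, q[x_prop] / q[x])   # symmetric proposal: T_Q cancels
        if rng.random() < accept:
            x = x_prop                        # accept: move to x'
        # on rejection the chain stays at x, and x is counted again
        if t >= burn_in:
            counts[x] += 1
    return counts / n_steps

freq = mh_chain(50_000)   # should approximate q / q.sum() = [0.1, 0.2, 0.3, 0.4]
```

Counting the current state on rejection is essential: the self-loop mass T(x -> x) in the transition formula is what makes the chain leave q invariant.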

Building a Markov Chain

Comparing MH with Gibbs: Gibbs sampling is a special case of MH for which the acceptance probability is always 1. With the Gibbs proposal T_Q(x -> x') = P(x'_i | x_{-i}) and target q(x) = P(x_i | x_{-i}) P(x_{-i}):

A(x -> x') = min[ 1, ( q(x') T_Q(x' -> x) ) / ( q(x) T_Q(x -> x') ) ]
           = min[ 1, ( P(x'_i | x_{-i}) P(x_{-i}) P(x_i | x_{-i}) ) / ( P(x_i | x_{-i}) P(x_{-i}) P(x'_i | x_{-i}) ) ]
           = min[1, 1] = 1

MH algorithm assumptions:
- We cannot draw samples from q(x) directly.
- We can evaluate q(x) for any x.
- We use a Markov chain that moves from x towards a proposed x* with acceptance probability

min[ 1, ( q(x*) p(x* -> x) ) / ( q(x) p(x -> x*) ) ]

The transition kernel defined by this process satisfies the detailed balance condition.
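The cancellation above can be verified mechanically. A minimal sketch, assuming a hypothetical joint over two binary variables (a, b) as the target (not from the lecture): for every single-coordinate move under the Gibbs proposal, the MH acceptance ratio works out to exactly 1.

```python
import numpy as np

# Hypothetical target q(a, b): rows index a in {0, 1}, columns index b.
joint = np.array([[0.30, 0.10],
                  [0.15, 0.45]])

def gibbs_proposal_prob(a_new, b):
    """Gibbs proposal for coordinate a: T_Q((a, b) -> (a_new, b)) = P(a_new | b)."""
    return joint[a_new, b] / joint[:, b].sum()

# MH acceptance ratio q(x') T_Q(x' -> x) / (q(x) T_Q(x -> x')) for every
# single-coordinate move (a, b) -> (a_new, b):
for b in (0, 1):
    for a in (0, 1):
        for a_new in (0, 1):
            ratio = (joint[a_new, b] * gibbs_proposal_prob(a, b)) / \
                    (joint[a, b] * gibbs_proposal_prob(a_new, b))
            assert np.isclose(min(1.0, ratio), 1.0)  # Gibbs always accepts
```

The P(x_{-i}) factors and the conditionals cancel pairwise, which is the algebraic content of the derivation above.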

Mixing Time

Mixing time: the number of steps n we take until we collect a sample from the target distribution. After n steps, the states x^{n+1}, x^{n+2}, ... are samples from the desired P(X | e), generated using only local rules.

Summary

- The Markov chain Monte Carlo method attempts to generate samples from the posterior distribution.
- The Metropolis-Hastings algorithm is a general scheme for specifying a Markov chain.
- Gibbs sampling is a special case that takes advantage of the network structure (Markov blanket).