Probabilistic Graphical Models


School of Computer Science
Probabilistic Graphical Models

Approximate Inference: Markov Chain Monte Carlo

Eric Xing
Lecture 17, March 19, 2014

Recap of Monte Carlo

Monte Carlo methods are algorithms that:
Generate samples from a given probability distribution P(x).
Estimate expectations of functions, E_P[f(x)], under a distribution P(x).

Why is this useful?
We can use samples of P(x) to approximate P(x) itself.
This allows us to do graphical model inference when we can't compute P(x) exactly.
Expectations E_P[f(x)] reveal interesting properties of P(x), e.g. its means and variances.
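As a quick illustration of the recap above, here is a minimal sketch (a toy univariate Gaussian target of my own choosing, not from the lecture) of how samples from P are used to approximate an expectation E_P[f(x)]:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy target: P(x) = N(3, 2^2). Estimate E_P[f(x)] for f(x) = x^2 from samples.
    samples = rng.normal(loc=3.0, scale=2.0, size=100_000)
    estimate = np.mean(samples ** 2)      # Monte Carlo estimate of E_P[x^2]
    exact = 3.0**2 + 2.0**2               # for a Gaussian, E[x^2] = mu^2 + sigma^2 = 13
    print(estimate, exact)                # the estimate should be close to 13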

Limitations of Monte Carlo

Direct sampling:
Hard to get rare events in high-dimensional spaces.
Infeasible for MRFs, unless we know the normalizer Z.

Rejection sampling and importance sampling:
Do not work well if the proposal Q(x) is very different from P(x).
Yet constructing a Q(x) similar to P(x) can be difficult.
Making a good proposal usually requires knowledge of the analytic form of P(x), but if we had that, we wouldn't even need to sample!

Intuition: instead of a fixed proposal Q(x), what if we could use an adaptive proposal?
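To make the importance-sampling limitation concrete, here is a small sketch (a toy example I am adding; the specific Gaussians are assumptions, not the lecture's) showing that when Q is far from P, the normalized importance weights collapse onto a few samples and the effective sample size becomes tiny:

    import numpy as np

    rng = np.random.default_rng(0)

    def norm_pdf(x, mu, sigma):
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    # Target P = N(4, 1); proposal Q = N(0, 1) is badly mismatched.
    x = rng.normal(0.0, 1.0, size=10_000)                # samples drawn from Q
    w = norm_pdf(x, 4.0, 1.0) / norm_pdf(x, 0.0, 1.0)    # importance weights P(x)/Q(x)
    w /= w.sum()

    ess = 1.0 / np.sum(w ** 2)    # effective sample size of the weighted sample
    print(f"effective sample size: {ess:.1f} of {len(x)}")   # only a handful of effective samples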

Markov Chain Monte Carlo

MCMC algorithms feature adaptive proposals:
Instead of Q(x'), they use Q(x'|x), where x' is the new state being sampled and x is the previous sample.
As x changes, Q(x'|x) can also change, as a function of x'.

(Figure: importance sampling with a bad, fixed proposal Q(x), compared to MCMC with an adaptive proposal Q(x'|x) that follows the previous samples x1, x2, x3.)

Metropolis-Hastings

Let's see how MCMC works in practice; later, we'll look at the theoretical aspects.

The Metropolis-Hastings algorithm:
Draws a sample x' from Q(x'|x), where x is the previous sample.
The new sample x' is accepted or rejected with some probability A(x'|x). This acceptance probability is

A(x'|x) = min( 1, P(x') Q(x|x') / ( P(x) Q(x'|x) ) )

A(x'|x) is like a ratio of importance sampling weights:
P(x')/Q(x'|x) is the importance weight for x'; P(x)/Q(x|x') is the importance weight for x.
We divide the importance weight for x' by that of x.
Notice that we only need to compute P(x')/P(x), rather than P(x') or P(x) separately.
A(x'|x) ensures that, after sufficiently many draws, our samples will come from the true distribution P(x); we shall see why later in this lecture.

The MH Algorithm

Initialize starting state x(0), set t = 0.

Burn-in: while samples have not converged:
    x = x(t)
    t = t + 1
    sample x* ~ Q(x*|x)            // draw from proposal
    sample u ~ Uniform(0, 1)       // draw acceptance threshold
    if u < A(x*|x) = min( 1, P(x*) Q(x|x*) / ( P(x) Q(x*|x) ) ):
        x(t) = x*                  // transition
    else:
        x(t) = x                   // stay in current state

Take samples from P(x): reset t = 0, and for t = 1 to N, draw sample x(t+1) by repeating the same accept/reject step from x(t).

The MH Algorithm: Example

A(x'|x) = min( 1, P(x') Q(x|x') / ( P(x) Q(x'|x) ) )

Example: let Q(x'|x) be a Gaussian centered on x, and suppose we are trying to sample from a bimodal distribution P(x).

Initialize x(0).
Draw x(1), accept.
Draw x(2), accept.
Draw x', but reject; set x(3) = x(2). We reject because the importance weight P(x')/Q(x'|x(2)) is small while P(x(2))/Q(x(2)|x') is large, hence A(x'|x(2)) is close to zero.
Draw x(4), accept.
Draw x(5), accept.

The adaptive proposal Q(x'|x) allows us to sample both modes of P(x)!

(Figure: the bimodal target P(x), the Gaussian proposal Q(x'|x) re-centered on each successive sample, the accepted samples x(0) through x(5), and the rejected draw.)
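The walkthrough above is easy to reproduce in code. Below is a minimal sketch (my own illustration; the bimodal target and proposal width are assumptions rather than the lecture's exact figures) of Metropolis-Hastings with a Gaussian proposal centered on the previous sample. Because this proposal is symmetric, Q(x|x') = Q(x'|x), so the acceptance probability reduces to min(1, P(x*)/P(x)), and only an unnormalized P is needed.

    import numpy as np

    rng = np.random.default_rng(0)

    def p_unnorm(x):
        """Unnormalized bimodal target: mixture of N(-2, 0.5^2) and N(+2, 0.5^2)."""
        return np.exp(-0.5 * ((x + 2) / 0.5) ** 2) + np.exp(-0.5 * ((x - 2) / 0.5) ** 2)

    def metropolis_hastings(n_samples, x0=0.0, prop_std=2.0):
        x, samples, n_accept = x0, [], 0
        for _ in range(n_samples):
            x_star = rng.normal(x, prop_std)               # draw x* ~ Q(x*|x), Gaussian centered on x
            a = min(1.0, p_unnorm(x_star) / p_unnorm(x))   # symmetric Q, so A = min(1, P(x*)/P(x))
            if rng.uniform() < a:                          # accept with probability A
                x, n_accept = x_star, n_accept + 1
            samples.append(x)                              # on rejection we keep (repeat) the old x
        return np.array(samples), n_accept / n_samples

    samples, acc_rate = metropolis_hastings(50_000)
    kept = samples[5_000:]                                 # discard burn-in
    print(f"acceptance rate: {acc_rate:.2f}")
    print(f"fraction of samples in the x > 0 mode: {(kept > 0).mean():.2f}")   # near 0.5 once both modes are visited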

Theoretical Aspects of MCMC

The MH algorithm has a burn-in period. Why do we throw away samples from burn-in?
Why are the MH samples guaranteed to be from P(x)? The proposal Q(x'|x) keeps changing with the value of x; how do we know the samples will eventually come from P(x)?
What is the connection between Markov chains and MCMC?

Markov Chains

A Markov chain is a sequence of random variables x(1), ..., x(n) with the Markov property:

P(x(n) | x(1), ..., x(n-1)) = P(x(n) | x(n-1))

P(x(n) | x(n-1)) is known as the transition kernel. The next state depends only on the preceding state (recall HMMs!).
Note: the random variables x(t) can be vectors. We define x(t) to be the t-th sample of all variables in a graphical model, so x(t) represents the entire state of the graphical model at time t.
We study homogeneous Markov chains, in which the transition kernel P(x(t+1) | x(t)) is fixed with time. To emphasize this, we will call the kernel T(x'|x), where x is the previous state and x' is the next state.

MC Concepts

To understand MCs, we need to define a few concepts.

Probability distributions over states: π^(t)(x) is a distribution over the state of the system x at time t. When dealing with MCs, we don't think of the system as being in one state, but as having a distribution over states. (For graphical models, remember that x represents all variables.)

Transitions: recall that states transition from x(t) to x(t+1) according to the transition kernel T(x'|x). We can also transition entire distributions:

π^(t+1)(x') = Σ_x π^(t)(x) T(x'|x)

At time t, state x has probability mass π^(t)(x); the transition probability redistributes this mass to other states x'.

Stationary distributions: π(x) is stationary if it does not change under the transition kernel:

π(x') = Σ_x π(x) T(x'|x), for all x'
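Here is a small numerical sketch (a toy 3-state chain of my own choosing) of transitioning an entire distribution with a kernel T and checking for a stationary distribution:

    import numpy as np

    # Transition kernel for a 3-state chain: T[i, j] = T(x' = j | x = i); each row sums to 1.
    T = np.array([[0.5, 0.4, 0.1],
                  [0.2, 0.5, 0.3],
                  [0.1, 0.3, 0.6]])

    pi = np.array([1.0, 0.0, 0.0])        # initial distribution pi^(0): all mass on state 0
    for _ in range(100):
        pi = pi @ T                       # pi^(t+1)(x') = sum_x pi^(t)(x) T(x'|x)

    print(pi)                             # after many transitions the distribution stops changing
    print(np.allclose(pi, pi @ T))        # True: pi is (numerically) stationary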

MC Concepts

Stationary distributions are of great importance in MCMC. To understand them, we need to define some notions:

Irreducible: an MC is irreducible if you can get from any state x to any other state x' with probability > 0 in a finite number of steps, i.e. there are no unreachable parts of the state space.
Aperiodic: an MC is aperiodic if you can return to any state x at any time; periodic MCs have states that can only be revisited at multiples of some period (cycles).
Ergodic (or regular): an MC is ergodic if it is irreducible and aperiodic.

Ergodicity is important: it implies you can reach the stationary distribution π(x) no matter what the initial distribution π^(0)(x) is. All good MCMC algorithms must satisfy ergodicity, so that you cannot initialize in a way that will never converge.

MC Concepts

Reversible (detailed balance): an MC is reversible if there exists a distribution π(x) such that the detailed balance condition is satisfied:

π(x') T(x|x') = π(x) T(x'|x)

The probabilities of x and x' can be different, but the joint probability of the pair is the same in either direction of travel.

Reversible MCs always have a stationary distribution! Proof:

Σ_x π(x) T(x'|x) = Σ_x π(x') T(x|x')
                 = π(x') Σ_x T(x|x')
                 = π(x')

The last line is the definition of a stationary distribution!
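The proof above can also be checked numerically. This short sketch (a toy discrete example of my own) builds a kernel that satisfies detailed balance with respect to a chosen π, then confirms that π is stationary:

    import numpy as np

    pi = np.array([0.2, 0.3, 0.5])        # the distribution we want to be stationary
    n = len(pi)

    # Metropolis-style construction over a uniform base proposal; it satisfies detailed balance.
    T = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                T[i, j] = (1.0 / (n - 1)) * min(1.0, pi[j] / pi[i])
        T[i, i] = 1.0 - T[i].sum()        # leftover mass = probability of staying in state i

    flux = pi[:, None] * T                # flux[i, j] = pi(i) T(j|i)
    print(np.allclose(flux, flux.T))      # True: detailed balance holds
    print(np.allclose(pi @ T, pi))        # True: therefore pi is stationary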

Why does Metropolis-Hastings work?

Recall that we draw a sample x' according to Q(x'|x), and then accept or reject it according to A(x'|x). In other words, the transition kernel is

T(x'|x) = Q(x'|x) A(x'|x), for x' ≠ x

We can prove that MH satisfies detailed balance. Recall that

A(x'|x) = min( 1, P(x') Q(x|x') / ( P(x) Q(x'|x) ) )

Notice this implies the following: if A(x'|x) < 1, then P(x') Q(x|x') < P(x) Q(x'|x), and thus the reverse ratio exceeds 1, so A(x|x') = 1.

Why does Metropolis-Hastings work?

Now suppose A(x'|x) < 1 and A(x|x') = 1. We have

P(x) T(x'|x) = P(x) Q(x'|x) A(x'|x)
             = P(x) Q(x'|x) [ P(x') Q(x|x') / ( P(x) Q(x'|x) ) ]
             = P(x') Q(x|x')
             = P(x') Q(x|x') A(x|x')
             = P(x') T(x|x')

The last line is exactly the detailed balance condition. In other words, the MH algorithm leads to a stationary distribution P(x). Recall that we defined P(x) to be the true distribution of x; thus, the MH algorithm eventually converges to the true distribution!

Caveats

Although MH eventually converges to the true distribution P(x), we have no guarantees as to when this will occur. The burn-in period represents the un-converged part of the Markov chain; that is why we throw those samples away! Knowing when to halt burn-in is an art; we will look at some techniques later in this lecture.

Gibbs Sampling

Gibbs sampling (GS) is an MCMC algorithm that samples each random variable of a graphical model, one at a time. GS is a special case of the MH algorithm.

GS algorithms:
Are fairly easy to derive for many graphical models (e.g. mixture models, Latent Dirichlet allocation).
Have reasonable computation and memory requirements, because they sample one rv at a time.
Can be Rao-Blackwellized (integrate out some rvs) to decrease the sampling variance.

Gibbs Sampling

The GS algorithm:
1. Suppose the graphical model contains variables x_1, ..., x_n.
2. Initialize starting values for x_1, ..., x_n.
3. Do until convergence:
    Pick an ordering of the n variables (can be fixed or random).
    For each variable x_i in order:
        Sample x_i from P(x_i | x_1, ..., x_{i-1}, x_{i+1}, ..., x_n), i.e. the conditional distribution of x_i given the current values of all other variables.
        Update x_i with its new value.

When we update x_i, we immediately use its new value for sampling the other variables x_j.
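A minimal sketch of this algorithm (a bivariate Gaussian example of my own, not from the slides): each variable is resampled in turn from its conditional given the current value of the other, and the new value is used immediately.

    import numpy as np

    rng = np.random.default_rng(0)

    rho = 0.8    # target: zero-mean bivariate Gaussian with unit variances and correlation rho

    def gibbs(n_samples, x1=0.0, x2=0.0):
        samples = np.empty((n_samples, 2))
        for t in range(n_samples):
            x1 = rng.normal(rho * x2, np.sqrt(1 - rho ** 2))   # x1 ~ P(x1 | x2) = N(rho*x2, 1 - rho^2)
            x2 = rng.normal(rho * x1, np.sqrt(1 - rho ** 2))   # x2 ~ P(x2 | x1), using the new x1 immediately
            samples[t] = (x1, x2)
        return samples

    s = gibbs(50_000)[5_000:]             # discard burn-in
    print(np.corrcoef(s.T)[0, 1])         # should be close to rho = 0.8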

Markov Blankets

The conditional P(x_i | x_1, ..., x_{i-1}, x_{i+1}, ..., x_n) looks intimidating, but recall Markov blankets:
Let MB(x_i) be the Markov blanket of x_i; then

P(x_i | x_1, ..., x_{i-1}, x_{i+1}, ..., x_n) = P(x_i | MB(x_i))

For a BN, the Markov blanket of x_i is the set containing its parents, children, and co-parents.
For an MRF, the Markov blanket of x_i is its immediate neighbors.

Gibbs Sampling: An Example

Consider the (burglary) alarm network with variables B, E, A, J, M. Assume we sample the variables in the order B, E, A, J, M, and initialize all variables at t = 0 to False.

Sampling P(B | A, E): using Bayes' rule with the current values A = F, E = F, we compute

P(B = T | A = F, E = F) ∝ P(A = F | B = T, E = F) P(B = T) = 0.06 · 0.001 = 0.00006
P(B = F | A = F, E = F) ∝ P(A = F | B = F, E = F) P(B = F) = 0.999 · 0.999 = 0.998

and sample B = F.

Sampling P(E | A, B): using Bayes' rule with A = F, B = F, we compute

P(E = T | A = F, B = F) ∝ P(A = F | B = F, E = T) P(E = T) = 0.71 · 0.002 = 0.0014
P(E = F | A = F, B = F) ∝ P(A = F | B = F, E = F) P(E = F) = 0.999 · 0.998 = 0.997

and sample E = T.

Sampling P(A | B, E, J, M): using Bayes' rule with B = F, E = T, J = F, M = F, we compute

P(A = T | B = F, E = T, J = F, M = F) ∝ P(J = F | A = T) P(M = F | A = T) P(A = T | B = F, E = T) = 0.1 · 0.3 · 0.29 = 0.0087
P(A = F | B = F, E = T, J = F, M = F) ∝ P(J = F | A = F) P(M = F | A = F) P(A = F | B = F, E = T) = 0.95 · 0.99 · 0.71 = 0.6678

and sample A = F.

Sampling P(J | A): no need to apply Bayes' rule. With A = F we have

P(J = T | A = F) = 0.05
P(J = F | A = F) = 0.95

and sample J = T.

Sampling P(M | A): no need to apply Bayes' rule. With A = F we have

P(M = T | A = F) = 0.01
P(M = F | A = F) = 0.99

and sample M = F.

This completes t = 1; we then repeat the procedure to sample new values of B, E, A, J, M for t = 2, 3, 4, and so on:

t   B   E   A   J   M
0   F   F   F   F   F
1   F   T   F   T   F
2   F   T   T   T   T
3   T   F   T   F   T
4   T   F   T   F   F
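Each of the per-variable conditionals above comes straight from the CPTs of the variable's Markov blanket. The sketch below (using the standard burglary-alarm CPT values, which match the numbers worked out above) reproduces the first update, sampling B from P(B | A = F, E = F):

    import numpy as np

    rng = np.random.default_rng(0)

    # Standard alarm-network CPTs, as assumed above: P(B = T), and P(A = T | B, E).
    p_B = 0.001
    p_A_given_BE = {(True, True): 0.95, (True, False): 0.94,
                    (False, True): 0.29, (False, False): 0.001}

    def sample_B(a, e):
        """Sample B from P(B | A=a, E=e) ∝ P(A=a | B, E=e) P(B); B's Markov blanket is {A, E}."""
        def weight(b):
            p_a_true = p_A_given_BE[(b, e)]
            return (p_a_true if a else 1.0 - p_a_true) * (p_B if b else 1.0 - p_B)
        w_true, w_false = weight(True), weight(False)
        p_true = w_true / (w_true + w_false)     # here: 0.00006 / (0.00006 + 0.998)
        return rng.uniform() < p_true

    print(sample_B(a=False, e=False))            # almost surely False, as in the t = 1 row above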

Topic Models: Collapsed Gibbs (Tom Griffiths & Mark Steyvers)

Collapsed Gibbs sampling is a popular inference algorithm for topic models:
Integrate out the topic vectors π and the topics B.
Only need to sample the word-topic assignments z.

(Figure: the LDA graphical model, with hyperparameters α and β, per-document topic vectors π, K topics B, topic assignments z, and words w, over M documents of N words each.)

Algorithm: for all variables z = z_1, z_2, ..., z_n, draw z_i^(t+1) from P(z_i | z_{-i}, w), where z_{-i} = ( z_1^(t+1), ..., z_{i-1}^(t+1), z_{i+1}^(t), ..., z_n^(t) ).

Collapsed Gibbs Sampling

What is P(z_i | z_{-i}, w)? It is a product of two Dirichlet-Multinomial conditional distributions: a word-topic term and a doc-topic term.

Collapsed Gibbs Sampling

What is P(z_i | z_{-i}, w)? It is a product of two Dirichlet-Multinomial conditional distributions:

P(z_i = j | z_{-i}, w) ∝ [ ( n_{w_i, j}^{-i} + β ) / ( n_{·, j}^{-i} + Wβ ) ] · [ ( n_{d, j}^{-i} + α ) / ( n_{d, ·}^{-i} + Kα ) ]

where the counts (all excluding position i) are:
n_{w_i, j}^{-i}: # word positions a, excluding w_i, such that w_a = w_i and z_a = j (word-topic term, numerator)
n_{·, j}^{-i}: # word positions a, excluding w_i, such that z_a = j (word-topic term, denominator)
n_{d, j}^{-i}: # word positions a in the current document d, excluding w_i, such that z_a = j (doc-topic term, numerator)
n_{d, ·}^{-i}: # word positions a in the current document d, excluding w_i (doc-topic term, denominator)
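A compact sketch (my own, with hypothetical array names) of one collapsed Gibbs update for a single token, using the count interpretation above; n_wt and n_dt hold the word-topic and document-topic counts with position i already excluded:

    import numpy as np

    rng = np.random.default_rng(0)

    def resample_topic(w_i, d_i, n_wt, n_dt, alpha, beta):
        """Draw z_i from P(z_i = j | z_-i, w) for one token.

        n_wt[w, j]: # positions (excluding i) where word w is assigned topic j
        n_dt[d, j]: # positions (excluding i) in document d assigned topic j
        """
        W = n_wt.shape[0]                                      # vocabulary size
        word_topic = (n_wt[w_i] + beta) / (n_wt.sum(axis=0) + W * beta)
        doc_topic = n_dt[d_i] + alpha                          # its denominator is constant in j, so it cancels
        p = word_topic * doc_topic
        p /= p.sum()
        return rng.choice(len(p), p=p)

    # Toy usage: vocabulary of 5 words, 2 documents, 3 topics, random current counts.
    n_wt = rng.integers(0, 10, size=(5, 3)).astype(float)
    n_dt = rng.integers(0, 10, size=(2, 3)).astype(float)
    print(resample_topic(w_i=2, d_i=0, n_wt=n_wt, n_dt=n_dt, alpha=0.1, beta=0.01))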

Collapsed Gibbs Illustration

(Illustration, adapted from Griffiths & Steyvers: a column of word tokens w, such as MATHEMATICS, KNOWLEDGE, RESEARCH, WORK, SCIENTIFIC, and JOY, with their document indices d and topic assignments z. At iteration 1 the assignments are arbitrary; during iteration 2, each token's assignment z_i in turn is marked "?" and resampled from P(z_i | z_{-i}, w) given all the other current assignments; by iteration 1000 the assignments have settled into samples from the posterior.)

Gibbs Sampling is a special case of MH

The GS proposal distribution is

Q( (x_i', x_{-i}) | (x_i, x_{-i}) ) = P(x_i' | x_{-i})

where x_{-i} denotes all variables except x_i. Applying the MH acceptance probability to this proposal, we find that samples are always accepted:

A( (x_i', x_{-i}) | (x_i, x_{-i}) ) = min( 1, [ P(x_i', x_{-i}) Q( (x_i, x_{-i}) | (x_i', x_{-i}) ) ] / [ P(x_i, x_{-i}) Q( (x_i', x_{-i}) | (x_i, x_{-i}) ) ] )
                                    = min( 1, [ P(x_i', x_{-i}) P(x_i | x_{-i}) ] / [ P(x_i, x_{-i}) P(x_i' | x_{-i}) ] )
                                    = min( 1, [ P(x_i' | x_{-i}) P(x_{-i}) P(x_i | x_{-i}) ] / [ P(x_i | x_{-i}) P(x_{-i}) P(x_i' | x_{-i}) ] )
                                    = min(1, 1) = 1

This is exactly what GS does: GS is simply MH with a proposal that is always accepted!
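The cancellation above can be verified numerically. The following sketch (a random two-variable discrete joint of my own construction) computes the MH acceptance probability under the Gibbs proposal and confirms it is always 1:

    import numpy as np

    rng = np.random.default_rng(0)

    # A random joint P(x1, x2) over 4 x 3 discrete states.
    P = rng.random((4, 3))
    P /= P.sum()

    def gibbs_acceptance(x1, x2, x1_new):
        """MH acceptance for proposing x1 -> x1_new from Q = P(x1' | x2)."""
        cond = P[:, x2] / P[:, x2].sum()                 # P(x1 | x2): the Gibbs proposal
        ratio = (P[x1_new, x2] * cond[x1]) / (P[x1, x2] * cond[x1_new])
        return min(1.0, ratio)

    accepts = [gibbs_acceptance(x1, x2, x1_new)
               for x1 in range(4) for x2 in range(3) for x1_new in range(4)]
    print(np.allclose(accepts, 1.0))                     # True: Gibbs moves are always accepted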

Practical Aspects of MCMC

How do we know if our proposal Q(x'|x) is any good?
Monitor the acceptance rate.
Plot the autocorrelation function.

How do we know when to stop burn-in?
Plot the sample values vs. time.
Plot the log-likelihood vs. time.

Acceptance Rate

(Figure: a low-variance proposal Q(x'|x) versus a high-variance proposal Q(x'|x), overlaid on the target P(x).)

Choosing the proposal Q(x'|x) is a tradeoff:
Narrow, low-variance proposals have high acceptance, but take many iterations to explore P(x) fully, because the proposed x' are too close to x.
Wide, high-variance proposals have the potential to explore much of P(x), but many proposals are rejected, which slows down the sampler.
A good Q(x'|x) proposes distant samples x' with a sufficiently high acceptance rate.

Acceptance Rate

The acceptance rate is the fraction of proposed samples that MH accepts.
General guideline: proposals should have a ~0.5 acceptance rate [1].
Gaussian special case: if both P(x) and Q(x'|x) are Gaussian, the optimal acceptance rate is ~0.45 for D = 1 dimension and approaches ~0.23 as D tends to infinity [2].

[1] Müller, P., 1993. A Generic Approach to Posterior Integration and Gibbs Sampling.
[2] Roberts, G.O., Gelman, A., and Gilks, W.R., 1994. Weak Convergence and Optimal Scaling of Random Walk Metropolis Algorithms.

Autocorrelation Function

MCMC chains always show autocorrelation (AC). AC means that adjacent samples in time are correlated. We quantify AC with the autocorrelation function of an rv x:

R_x(k) = [ Σ_{t=1}^{n-k} (x_t - x̄)(x_{t+k} - x̄) ] / [ Σ_{t=1}^{n-k} (x_t - x̄)² ]

(Figure: sample traces with low autocorrelation versus high autocorrelation.)

Autocorrelation Function

The first-order AC, R_x(1), can be used to estimate the Sample Size Inflation Factor (SSIF):

s_x = ( 1 + R_x(1) ) / ( 1 - R_x(1) )

If we took n samples with SSIF s_x, then the effective sample size is n / s_x. High autocorrelation leads to a smaller effective sample size! We want proposals Q(x'|x) with low autocorrelation.
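Here is a short sketch (my own helpers, following the definitions above) that computes R_x(k), the SSIF, and the effective sample size from a chain of samples:

    import numpy as np

    def autocorr(x, k):
        """Lag-k autocorrelation R_x(k) of the chain x, as defined above."""
        x = np.asarray(x, dtype=float)
        xc = x - x.mean()
        return np.sum(xc[:len(x) - k] * xc[k:]) / np.sum(xc[:len(x) - k] ** 2)

    def effective_sample_size(x):
        r1 = autocorr(x, 1)
        ssif = (1.0 + r1) / (1.0 - r1)    # Sample Size Inflation Factor s_x
        return len(x) / ssif

    # Usage on a deliberately sticky AR(1) chain: high autocorrelation gives a small effective sample size.
    rng = np.random.default_rng(0)
    x = np.zeros(10_000)
    for t in range(1, len(x)):
        x[t] = 0.95 * x[t - 1] + rng.normal()
    print(autocorr(x, 1), effective_sample_size(x))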

Sample Values vs. Time

(Figure: traces of well-mixed chains versus poorly-mixed chains.)

Monitor convergence by plotting samples of rvs from multiple MH runs (chains):
If the chains are well-mixed (left), they are probably converged.
If the chains are poorly-mixed (right), we should continue burn-in.

Log-likelihood vs. Time

(Figure: a log-likelihood trace that has not converged versus one that has converged.)

Many graphical models are high-dimensional, so it is hard to visualize all rv chains at once. Instead, plot the complete log-likelihood vs. time. The complete log-likelihood is an rv that depends on all model rvs. Generally, the log-likelihood will climb, then eventually plateau.

Summary

Markov chain Monte Carlo methods use adaptive proposals Q(x'|x) to sample from the true distribution P(x).
Metropolis-Hastings allows you to specify any proposal Q(x'|x), but choosing a good Q(x'|x) requires care.
Gibbs sampling sets the proposal Q(x_i'|x) to the conditional distribution P(x_i'|x_{-i}):
The acceptance rate is always 1! But remember that high acceptance usually entails slow exploration.
In fact, there are better MCMC algorithms for certain models.
Knowing when to halt burn-in is an art.