Stationary Distributions: Application
Monday, October 05, 2015, 2:04 PM

Homework 2 will be posted by tomorrow morning, due Friday, October 16 at 5 PM.

To prepare to describe the conditions under which the stationary distribution serves as a limit distribution for a FSDT MC, we need to introduce one more characterization of the probability transition matrix.

The period of a state i of a FSDT MC is defined as

d(i) = gcd{ n ≥ 1 : (P^n)_{ii} > 0 },

the greatest common divisor of the set of integers n such that the chain can return from state i to state i in exactly n epochs. The period of a state is a class property, meaning it takes the same value for all states in a communication class. A state (and therefore a communication class) is said to be aperiodic whenever the period of the state/class is equal to 1.

The typical situation in which a Markov chain class fails to be aperiodic is when the Markov chain is somehow forced to cycle through certain subclasses of states in a synchronized fashion; there is a theorem that makes this statement mathematically precise (see Lawler Ch. 1). Random walks on graphs can sometimes fail to be aperiodic when the random walker is forced to move in each epoch. A somewhat common situation is bipartite graphs (meaning the graph can be decomposed into two subgraphs such that every edge connects a node in one subgraph to a node in the other). Then the period would in general be 2 for undirected graphs (and possibly larger for directed graphs). In the last two paragraphs, "subclass" does not mean communication class.
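As a quick numerical illustration (not from the notes), here is a minimal sketch that estimates the period of each state by collecting the epochs n ≤ n_max at which return to the state has positive probability and taking their gcd; the truncation n_max and the example chains are assumptions of the sketch.

```python
import numpy as np
from math import gcd
from functools import reduce

def periods(P, n_max=50):
    """Estimate d(i) = gcd{n >= 1 : (P^n)_ii > 0} for each state,
    truncating the set of possible return times at n_max."""
    P = np.asarray(P, dtype=float)
    k = P.shape[0]
    return_times = [[] for _ in range(k)]
    Pn = np.eye(k)
    for n in range(1, n_max + 1):
        Pn = Pn @ P
        for i in range(k):
            if Pn[i, i] > 0:
                return_times[i].append(n)
    return [reduce(gcd, ts) if ts else None for ts in return_times]

# Deterministic 2-cycle (a bipartite random walk): both states have period 2.
P_cycle = [[0.0, 1.0], [1.0, 0.0]]
print(periods(P_cycle))   # [2, 2]

# Adding a self-loop (the walker may stay put) makes the chain aperiodic.
P_lazy = [[0.5, 0.5], [1.0, 0.0]]
print(periods(P_lazy))    # [1, 1]
```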

Now we can make two important statements about how stationary distributions represent long-time properties of FSDT MCs:

1. Stationary distribution as a limit distribution: An irreducible, aperiodic FSDT MC has the property

lim_{n→∞} P(X_n = j) = π_j for every state j,

where π is the stationary distribution.

2. Law of Large Numbers for Markov Chains: An irreducible (but not necessarily aperiodic) FSDT MC has the property that, for any function f on the state space S,

lim_{N→∞} (1/N) Σ_{n=1}^{N} f(X_n) = Σ_{j∈S} f(j) π_j with probability 1.

Some noteworthy aspects of the LLN for FSDT MC:
- It extends the LLN from iid random variables to sequences of random variables obtained from Markov chain dynamics.
- The LHS is essentially a "long-run" average of the behavior of the FSDT MC, and the RHS gives a typically computable deterministic formula for this long-run behavior.
- Think of the function f as some cost/reward associated to the state of the FSDT MC.

In fact the LLN for FSDT MC expresses an ergodic property, meaning a relationship between a long-time average and an "ensemble average" with respect to some probability distribution of states (here the stationary distribution). Ergodicity is a property that is often exploited or assumed in practice because it's useful for computational statistics, in fact in both directions:
- If one can compute the RHS of the LLN for FSDT MC, then this gives a good formula for computing long-time averages without having to do a long-time simulation. For example, some Hamiltonian systems in physics have known stationary distributions (microcanonical, canonical, etc.).
- On the other hand, if the stationary distribution is hard to obtain or compute with, then one can approximate the RHS by taking long-time averages as indicated on the LHS.

Imagine we are trying to calculate the average of a function f on a large, high-dimensional state space S with respect to a probability distribution p for which it's hard to simulate random variables directly (high-dimensional, correlated, non-Gaussian):

Want: ⟨f⟩_p = Σ_{x∈S} f(x) p(x).

There is an approach called Markov Chain Monte Carlo (MCMC) which constructs a Markov chain to approximate this quantity, using the LLN for MC. The main challenges in doing so are (a minimal sketch follows this list):
- One must construct the Markov chain to have p as its stationary distribution. The most common way to do this is the Metropolis-Hastings algorithm, which takes a proposed Markov chain and corrects it (by acceptance/rejection steps) so that it has the right stationary distribution.
- The Markov chain must have "good ergodic" properties.
- The approach only makes sense if the Markov chain is not too expensive to simulate.
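As a concrete, hedged illustration: the sketch below runs the Metropolis algorithm (the symmetric-proposal special case of Metropolis-Hastings) for a target p on {0, ..., K-1} known only up to normalization, and uses the LLN for MC to estimate ⟨f⟩_p. The target weights, the nearest-neighbor proposal, and the run lengths are assumptions of the example, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized target p on {0, ..., K-1} (assumed for illustration).
K = 10
weights = np.exp(-0.5 * (np.arange(K) - 4.0) ** 2)

def metropolis_step(x):
    # Symmetric nearest-neighbor proposal: y = x +/- 1 with prob 1/2 each.
    y = x + rng.choice([-1, 1])
    if y < 0 or y >= K:          # off the state space: p(y) = 0, always reject
        return x
    # Accept with probability min(1, p(y)/p(x)); the normalization cancels,
    # and the accept/reject step enforces p as the stationary distribution.
    if rng.random() < min(1.0, weights[y] / weights[x]):
        return y
    return x

# Estimate <f>_p for f(x) = x via the LLN for MC, with a burn-in period.
f = lambda x: x
burn_in, N = 1_000, 200_000
x = 0
for _ in range(burn_in):
    x = metropolis_step(x)
total = 0.0
for _ in range(N):
    x = metropolis_step(x)
    total += f(x)
p = weights / weights.sum()
print("MCMC estimate:", total / N)
print("exact <f>_p: ", (np.arange(K) * p).sum())
```

Here the exact answer is available only because the toy state space is small; in the intended high-dimensional settings, the MCMC average is the computable side.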

Even though the LLN for MC guarantees eventual convergence, an important practical question is: how long does one need to wait until the LHS is actually close to the RHS? This is called the equilibration time (or relaxation time, etc.) of the Markov chain, and obtaining it analytically is very hard for complex systems. But for simpler systems it can be estimated by looking at the second largest eigenvalue of the probability transition matrix, for reasons I'll explain later. The reason this is important in practice is that usually the initial condition is not somehow natural/representative of the true system, and we rely on ergodicity/the LLN to tell us that if we simulate the system long enough, the initial condition doesn't matter much to the result. Along these lines, one often applies a "burn-in period" to a Markov chain simulation, to allow the initial condition to become more "natural," meaning closer to the stationary distribution, before taking statistics from the simulation. Also for MCMC, it's important that the Markov chain be constructed in such a way that the convergence to the stationary distribution is reasonably fast, i.e., the random moves have to be sufficiently aggressive but also not get rejected too often.

Note that Resnick calls a Markov chain "ergodic" if it is both aperiodic and irreducible, but this is a little confusing because periodicity does not interfere with ergodicity in the sense of the LLN.

Computing stationary distributions

One way is just to use the definition of the stationary distribution, i.e., find the left eigenvector of P with eigenvalue 1 (πP = π), and then normalize it appropriately. But there are alternatives that might be faster, particularly when the state space is large:
- Power method (for irreducible, aperiodic MCs): lim_{n→∞} μ^(0) P^n = π, where μ^(0) is any probability distribution vector. From this perspective, we see that the second largest eigenvalue λ_2 governs the rate of convergence, in that the difference between the LHS and RHS will be O(|λ_2|^n), with |λ_2| < 1. (A numerical sketch appears after this list.)
- π = 1^T (I − P + ONE)^{−1}, from Resnick Sec. 2.14, where 1 is a vector of 1's and ONE is a matrix of all 1's.
- Method of Kirchhoff (Haken, Synergetics, Secs. 4.6-4.8): uses some graphical/diagrammatic analysis of the Markov chain to get analytical formulas; I've only seen this work well in some biochemical reaction networks.
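A minimal sketch of the power method, with the example transition matrix and tolerances being assumptions of the sketch; it also reports |λ_2|, which controls the convergence rate discussed above.

```python
import numpy as np

def stationary_power(P, tol=1e-12, max_iter=100_000):
    """Power method: iterate mu <- mu P from any initial probability
    vector until the update stalls (assumes P irreducible, aperiodic)."""
    P = np.asarray(P, dtype=float)
    mu = np.full(P.shape[0], 1.0 / P.shape[0])   # uniform starting vector
    for _ in range(max_iter):
        nxt = mu @ P
        if np.abs(nxt - mu).max() < tol:
            return nxt
        mu = nxt
    return mu

# Example chain (assumed for illustration).
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])

pi = stationary_power(P)
print("pi =", pi, " check pi P = pi:", np.allclose(pi @ P, pi))

# The second-largest eigenvalue modulus governs the convergence rate:
# the error after n iterations decays like |lambda_2|**n.
lam2 = sorted(np.abs(np.linalg.eigvals(P)))[-2]
print("|lambda_2| =", lam2)
```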

The most important approach for obtaining analytical solutions for stationary distributions is to look for a detailed balance solution: simply try to find a solution π that satisfies

π_i p_{ij} = π_j p_{ji} for all i, j.

In general this system is overdetermined, and there will be no solution satisfying these conditions; but it's easy to check, and if such a detailed balance solution exists, then it is the stationary distribution (if the chain is irreducible). Systems with time-reversal symmetry, e.g. in statistical mechanics, typically have detailed balance solutions. So do random walks on graphs.
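To make the random-walk remark concrete (a standard fact, not worked out in the notes): for a random walk on a connected undirected graph with edge set E, the walker moves to a uniformly chosen neighbor, so p_{ij} = 1/deg(i) whenever {i, j} ∈ E. Then π_i = deg(i)/(2|E|) satisfies detailed balance, since

π_i p_{ij} = [deg(i)/(2|E|)] · [1/deg(i)] = 1/(2|E|) = π_j p_{ji},

and therefore π is the stationary distribution.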

Let's see how to apply the stationary distribution to answering long-run questions about a Markov chain model. We will consider an inspection protocol for some manufactured products. The inspection protocol is as follows:
- Every manufactured product is inspected until the inspector has seen M good products in a row.
- So long as the last M inspected products were good, the inspector only inspects every r-th product (i.e., skips r − 1 products and passes them along for shipping).
- But as soon as the inspector detects a defect, the inspector resumes inspecting every product until M good ones in a row have been seen again.
- Every good inspected product and every uninspected product is shipped, but inspected defective products are destroyed.

Practical questions (imagine trying to design M and r):
- Cost: fraction of manufactured items that are inspected
- Quality: fraction of shipped items that are defective
And it would be natural, if the company is established, to look at long-run values of these fractions.

Strategy for computing these fractions (as functions of M and r):
- Develop a Markov chain model for the inspections.
- Solve for the stationary distribution.
- Evaluate the long-run fractions of interest by using the LLN for FSDT MC.

The Markov chain model not only requires a description of how the inspector behaves, but also some stochastic rule for the state of the products. We'll take the simplest model, where the defect status of any product is independent of any other product; that is, the defects of the products are determined by a Bernoulli process with "success probability" q of having a defect.
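Here is a minimal end-to-end sketch of this strategy. The state encoding (full-inspection states counting consecutive good items, plus a skip-mode cycle), the per-product time index, and the symbols M, r, q are modeling choices of the sketch, not taken verbatim from the notes.

```python
import numpy as np

def inspection_chain(M, r, q):
    """Per-product transition matrix for the inspection protocol.
    States 0..M-1: full-inspection mode; state k means k good inspected
    products in a row.  States M..M+r-1: skip mode; the product is
    inspected only in state M, and states M+1..M+r-1 pass products
    through unseen, so only every r-th product gets inspected."""
    n = M + r
    P = np.zeros((n, n))
    for k in range(M):                    # full-inspection mode
        P[k, 0] = q                       # defect found: counter resets
        P[k, k + 1 if k < M - 1 else M] = 1 - q
    P[M, 0] = q                           # skip-mode inspection finds a defect
    P[M, M + 1 if r > 1 else M] = 1 - q   # good: run the skip cycle
    for j in range(1, r):                 # uninspected products in the cycle
        P[M + j, M + j + 1 if j < r - 1 else M] = 1.0
    return P

def long_run_fractions(M, r, q):
    P = inspection_chain(M, r, q)
    # Stationary distribution: normalized left eigenvector with eigenvalue 1.
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    pi = pi / pi.sum()
    frac_inspected = pi[:M + 1].sum()     # states 0..M inspect the product
    # Shipped items: everything except inspected defectives; the defective
    # shipped items are exactly the uninspected defectives.
    frac_shipped = 1.0 - q * frac_inspected
    frac_defective_shipped = q * (1.0 - frac_inspected) / frac_shipped
    return frac_inspected, frac_defective_shipped

cost, quality = long_run_fractions(M=3, r=5, q=0.05)
print(f"fraction inspected: {cost:.4f}, fraction of shipped defective: {quality:.4f}")
```

By the LLN for FSDT MC, these stationary-distribution averages are exactly the long-run fractions asked for above, so one can scan M and r to trade off inspection cost against shipped quality.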