Ch5. Markov Chain Monte Carlo


ST4231, Semester I

In general, it is very difficult to simulate the value of a random vector X whose component random variables are dependent. In this chapter we present a powerful approach for generating a random variable whose distribution is approximately that of X. This approach, called the Markov Chain Monte Carlo (MCMC) method, has the added significance of only requiring that the mass (or density) function of X be specified up to a multiplicative constant, and this, we will see, is of great importance in applications.

Outline of the chapter:

- Discrete Markov Chains
- Metropolis-Hastings Algorithm
- Gibbs Sampler
- Simulated Annealing
- MCMC in Bayesian Statistics

1 Discrete Markov Chains

This section sets notation and reviews some fundamental results for discrete Markov chains. We only discuss discrete-time Markov chains in this chapter. We will not discuss the theory of chains with uncountable state spaces, although we will use such chains in Sections 2, 3, and 4 for applications in statistics.

Definition 1.1 A Markov chain $X$ is a discrete-time stochastic process $\{X_0, X_1, \ldots\}$ with the property that the distribution of $X_t$ given all previous values of the process, $X_0, X_1, \ldots, X_{t-1}$, depends only upon $X_{t-1}$; i.e., for any set $A$,

$$P[X_t \in A \mid X_0, \ldots, X_{t-1}] = P[X_t \in A \mid X_{t-1}].$$

In a Markov chain, the transition from one state to another is probabilistic, but the production of an output symbol is deterministic.

One-Step Transition

Let $p_{ij} = P(X_{t+1} = j \mid X_t = i)$ denote the transition probability from state $i$ to state $j$ at time $t+1$. Since the $p_{ij}$ are conditional probabilities, all transition probabilities must satisfy two conditions:

$$p_{ij} \ge 0 \quad \text{for all } (i,j), \qquad \sum_j p_{ij} = 1 \quad \text{for all } i.$$

We will assume that the transition probabilities are fixed and do not change with time. In such a case the Markov chain is said to be homogeneous in time.

In the case of a system with a finite number of possible states $K$, for example, the transition probabilities constitute a $K \times K$ matrix:

$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1K} \\ p_{21} & p_{22} & \cdots & p_{2K} \\ \vdots & \vdots & & \vdots \\ p_{K1} & p_{K2} & \cdots & p_{KK} \end{pmatrix}.$$

This kind of matrix is called a stochastic matrix. Any stochastic matrix can serve as a matrix of transition probabilities.

Multi-Step Transition

The definition of the one-step transition probability may be generalized to cases where the transition from one state to another takes place in some fixed number of steps.

Let $p_{ij}(m)$ denote the $m$-step transition probability from state $i$ to state $j$:

$$p_{ij}(m) = P(X_{t+m} = j \mid X_t = i), \quad m = 1, 2, \ldots$$

We may view $p_{ij}(m+1)$ as the sum over all intermediate states $k$ through which the system passes in its transition from state $i$ to state $j$. It has the following recursive relation:

$$p_{ij}(m+1) = \sum_k p_{ik}(m)\, p_{kj}, \quad m = 1, 2, \ldots,$$

with $p_{ik}(1) = p_{ik}$. The recursive equation may be generalized as follows:

$$p_{ij}(m+n) = \sum_k p_{ik}(m)\, p_{kj}(n), \quad m, n = 1, 2, \ldots$$
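As a quick numerical illustration (not in the original notes), the recursion says that the $m$-step transition probabilities are the entries of the matrix power $P^m$, so $p_{ij}(m+n) = \sum_k p_{ik}(m)\, p_{kj}(n)$ is just $P^{m+n} = P^m P^n$. The 2-state matrix below is an arbitrary example used only to check this.

```python
import numpy as np

# An arbitrary 2-state stochastic matrix, used only to illustrate the recursion.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

m, n = 3, 5
lhs = np.linalg.matrix_power(P, m + n)                             # p_ij(m + n)
rhs = np.linalg.matrix_power(P, m) @ np.linalg.matrix_power(P, n)  # sum_k p_ik(m) p_kj(n)
print(np.allclose(lhs, rhs))                                       # True
```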

Suppose a Markov chain starts in state $i$. The state $i$ is said to be a recurrent state if the Markov chain returns to state $i$ with probability 1; that is,

$$f_i = P(\text{ever returning to state } i) = 1.$$

If the probability $f_i < 1$, state $i$ is said to be a transient state. If the Markov chain starts in a recurrent state, that state reoccurs an infinite number of times; if it starts in a transient state, that state reoccurs only a finite number of times.

The state $j$ of a Markov chain is said to be accessible from state $i$ if there is a finite sequence of transitions from $i$ to $j$ with positive probability. If the states $i$ and $j$ are accessible to each other, they are said to communicate with each other, and the communication is denoted by $i \leftrightarrow j$. Clearly, if $i \leftrightarrow j$ and $j \leftrightarrow k$, then $i \leftrightarrow k$. If two states of a Markov chain communicate with each other, they are said to belong to the same class. In general, the states of a Markov chain consist of one or more disjoint classes. If, however, all the states form a single class, the Markov chain is said to be indecomposable or irreducible. Reducible chains are of little practical interest in most areas of application.

Example 1 (Simple Random Walk on a Graph)

A graph is described by a set of vertices and a set of edges. We say that the vertices $i$ and $j$ are neighbors if there is an edge joining $i$ and $j$ (we can also say that $i$ and $j$ are adjacent). For simplicity, assume that there is at most one edge joining any given pair of vertices and that there is no edge that joins a vertex to itself. The number of neighbors of the vertex $i$ is called the degree of $i$ and is written $\deg(i)$.

In a simple random walk on a graph (Figure 1), the state space is the set of vertices of the graph. At each time $t$, the Markov chain looks at its current position, $X_t$, and chooses one of its neighbors uniformly at random. It then jumps to the chosen neighbor, which becomes $X_{t+1}$.

[Figure 1: A graph with five labeled vertices.]

More formally, the transition probabilities are given by

$$p_{ij} = \begin{cases} 1/\deg(i) & \text{if } i \text{ and } j \text{ are neighbors} \\ 0 & \text{otherwise (including } i = j). \end{cases}$$

The associated transition probability matrix is

$$P = \begin{pmatrix} 0 & 1/3 & 1/3 & 0 & 1/3 \\ 1/2 & 0 & 1/2 & 0 & 0 \\ 1/4 & 1/4 & 0 & 1/4 & 1/4 \\ 0 & 0 & 1/2 & 0 & 1/2 \\ 1/3 & 0 & 1/3 & 1/3 & 0 \end{pmatrix}.$$

The Markov chain is irreducible if and only if the graph is connected.

Consider an irreducible Markov chain that starts in a recurrent state $i$ at time $n = 0$. Let $T_i(k)$ denote the time that elapses between the $(k-1)$th and $k$th returns to state $i$. The mean recurrence time of state $i$ is defined as the expectation of $T_i(k)$ over the returns $k$.

The steady-state probability of state $i$, denoted by $\pi_i$, is defined as

$$\pi_i = \frac{1}{E(T_i(k))}.$$

If $E(T_i(k)) < \infty$, that is $\pi_i > 0$, the state $i$ is said to be a positive recurrent state. If $E(T_i(k)) = \infty$, that is $\pi_i = 0$, the state $i$ is said to be a null recurrent state.

In principle, ergodicity means that we may substitute time averages for ensemble averages. In the context of a Markov chain, ergodicity means that the long-term proportion of time spent by the chain in state $i$ corresponds to the steady-state probability $\pi_i$, which may be justified as follows.

The proportion of time spent in state $i$ after $k$ returns, denoted by $v_i(k)$, is defined by

$$v_i(k) = \frac{k}{\sum_{l=1}^{k} T_i(l)}.$$

The return times $T_i(l)$ form a sequence of independently and identically distributed random variables since, by definition, each return time is statistically independent of all previous return times. Moreover, in the case of a recurrent state $i$ the chain returns to state $i$ an infinite number of times. Hence, as the number of returns $k$ approaches infinity, the law of large numbers states that the proportion of time spent in state $i$ approaches the steady-state probability, as shown by

$$\lim_{k \to \infty} v_i(k) = \pi_i, \quad i = 1, 2, \ldots, K.$$

A sufficient, but not necessary, condition for a Markov chain to be ergodic is for it to be both irreducible and aperiodic.

Consider an ergodic Markov chain characterized by a stochastic matrix $P$. Let the row vector $\pi^{(n-1)}$ denote the state distribution vector of the chain at time $n-1$. The state distribution vector at time $n$ is defined by

$$\pi^{(n)} = \pi^{(n-1)} P.$$

By iteration, we obtain

$$\pi^{(n)} = \pi^{(n-1)} P = \pi^{(n-2)} P^2 = \cdots = \pi^{(0)} P^n,$$

where $\pi^{(0)}$ is the initial value of the state distribution vector.

Let $p_{ij}^{(n)}$ denote the $ij$-th element of $P^n$. Suppose that as time $n$ approaches infinity, $p_{ij}^{(n)}$ tends to $\pi_j$ independent of $i$, where $\pi_j$ is the steady-state probability of state $j$. Correspondingly, for large $n$, the matrix $P^n$ approaches the limiting form of a square matrix with identical rows, as shown by

$$\lim_{n \to \infty} P^n = \begin{pmatrix} \pi_1 & \pi_2 & \cdots & \pi_K \\ \pi_1 & \pi_2 & \cdots & \pi_K \\ \vdots & \vdots & & \vdots \\ \pi_1 & \pi_2 & \cdots & \pi_K \end{pmatrix} = \begin{pmatrix} \pi \\ \pi \\ \vdots \\ \pi \end{pmatrix}.$$

Since each row of this limiting matrix equals $\pi$, letting $n \to \infty$ in $\pi^{(n)} = \pi^{(0)} P^n$ gives $\pi = \big(\sum_{j=1}^{K} \pi_j^{(0)}\big)\pi$, that is,

$$\Big( \sum_{j=1}^{K} \pi_j^{(0)} - 1 \Big)\, \pi = 0.$$

Since, by definition, $\sum_{j=1}^{K} \pi_j^{(0)} = 1$, this condition is satisfied by the vector $\pi$ independent of the initial distribution.

The probability distribution $\{\pi_j\}_{j=1}^{K}$ is called an invariant or stationary distribution. It is so called because it persists forever once it is established. For an ergodic Markov chain, we may say the following.

1. Starting from an arbitrary initial distribution, the transition probabilities of a Markov chain will converge to a stationary distribution, provided that such a distribution exists.

2. The stationary distribution of the Markov chain is completely independent of the initial distribution if the chain is ergodic.

Example 2: Simple Random Walk on a Graph (continued)

Consider a simple random walk on a graph, as described in Example 1. We claim that the equilibrium distribution is $\pi_i = \deg(i)/Z$, where $Z = \sum_{k \in S} \deg(k)$. For the graph of Figure 1, $\pi = (3/14,\, 2/14,\, 4/14,\, 2/14,\, 3/14)$.
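As a concrete check (not part of the original notes), the limiting behavior of $P^n$ described above can be computed directly for the matrix of Example 1; the NumPy snippet below assumes that reconstruction of $P$.

```python
import numpy as np

# Transition matrix of the simple random walk on the 5-vertex graph of Example 1.
P = np.array([[0,   1/3, 1/3, 0,   1/3],
              [1/2, 0,   1/2, 0,   0  ],
              [1/4, 1/4, 0,   1/4, 1/4],
              [0,   0,   1/2, 0,   1/2],
              [1/3, 0,   1/3, 1/3, 0  ]])

# For large n, every row of P^n should approach pi = (3, 2, 4, 2, 3)/14.
Pn = np.linalg.matrix_power(P, 100)
print(np.round(Pn[0], 4))                         # one row of the (near-)limiting matrix
print(np.round(np.array([3, 2, 4, 2, 3]) / 14, 4))  # the claimed stationary distribution
```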

To summarize the descriptions above, we have the following definitions and theorems. Let $\tau_{ii}$ be the time of the first return to state $i$:

$$\tau_{ii} = \min\{t > 0 : X_t = i \mid X_0 = i\}.$$

Definition 1.2 $X$ is called irreducible if for all $i, j$, there exists a $t > 0$ such that $P_{ij}(t) > 0$.

Definition 1.3 An irreducible chain $X$ is recurrent if $P[\tau_{ii} < \infty] = 1$ for some (and hence for all) $i$. Otherwise, $X$ is transient. Another equivalent condition for recurrence is

$$\sum_t P_{ii}(t) = \infty \quad \text{for all } i.$$

Definition 1.4 An irreducible recurrent chain $X$ is called positive recurrent if $E[\tau_{ii}] < \infty$ for some (and hence for all) $i$. Otherwise, it is called null-recurrent. Another equivalent condition for positive recurrence is the existence of a stationary probability distribution for $X$, that is, there exists $\pi(\cdot)$ such that

$$\sum_i \pi(i)\, P_{ij}(t) = \pi(j) \qquad (1)$$

for all $j$ and $t \ge 0$.

If the probability distribution of $X_t$, say $\pi_i = P\{X_t = i\}$, $i = 0, \ldots, N$, is a stationary distribution, then

$$P\{X_{t+1} = j\} = \sum_{i=0}^{N} P\{X_{t+1} = j \mid X_t = i\}\, P\{X_t = i\} = \sum_{i=0}^{N} \pi_i P_{ij} = \pi_j.$$

Definition 1.5 An irreducible chain $X$ is called aperiodic if for some (and hence for all) $i$,

$$\gcd\{t > 0 : P_{ii}(t) > 0\} = 1.$$

Theorem 1.1 If $X$ is positive recurrent and aperiodic, then its stationary distribution $\pi(\cdot)$ is the unique probability distribution satisfying (1). We then say that $X$ is ergodic and the following consequences hold:

(i) $P_{ij}(t) \to \pi(j)$ as $t \to \infty$ for all $i, j$.

(ii) (Ergodic theorem) If $E_\pi[\,|h(X)|\,] < \infty$, then

$$P\Big\{ \bar{h}_N \to E_\pi[h(X)] \Big\} = 1, \quad \text{where } \bar{h}_N = \frac{1}{N}\sum_{t=1}^{N} h(X_t)$$

and $E_\pi[h(X)] = \sum_i h(i)\pi(i)$, the expectation of $h(X)$ with respect to $\pi(\cdot)$.

There is sometimes an easier way than solving the set of equations (1) to find the stationary probabilities. Suppose one can find positive numbers $x_j$, $j = 1, 2, \ldots, N$, such that

$$x_i P_{ij} = x_j P_{ji} \quad \text{for } i \ne j, \qquad \sum_{j=1}^{N} x_j = 1.$$

Then summing the preceding equations over all states $i$ yields

$$\sum_{i=1}^{N} x_i P_{ij} = x_j \sum_{i=1}^{N} P_{ji} = x_j,$$

which, since $\{\pi_j, j = 1, 2, \ldots, N\}$ is the unique solution of (1), implies that $\pi_j = x_j$.

When $\pi_i P_{ij} = \pi_j P_{ji}$ for all $i \ne j$, the Markov chain is said to be time reversible, because it can be shown, under this condition, that if the initial state is chosen according to the probabilities $\{\pi_j\}$, then starting at any time the sequence of states going backwards in time will also be a Markov chain with transition probabilities $P_{ij}$.

Example 3: Simple Random Walk on a Graph (continued)

Consider a simple random walk on a graph, as described in Example 1. We claim that the equilibrium distribution is

$$\pi_i = \deg(i)/Z, \quad \text{where } Z = \sum_{k \in S} \deg(k).$$

Verify that: if $i$ and $j$ are not neighbors, then $\pi_i p_{ij} = 0 = \pi_j p_{ji}$; if $i$ and $j$ are neighbors, then $\pi_i p_{ij} = 1/Z = \pi_j p_{ji}$. The detailed balance condition therefore holds, confirming the claim.
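The detailed balance check can also be done numerically. This snippet is not part of the original notes and assumes the transition matrix reconstructed in Example 1.

```python
import numpy as np

P = np.array([[0,   1/3, 1/3, 0,   1/3],
              [1/2, 0,   1/2, 0,   0  ],
              [1/4, 1/4, 0,   1/4, 1/4],
              [0,   0,   1/2, 0,   1/2],
              [1/3, 0,   1/3, 1/3, 0  ]])
pi = np.array([3, 2, 4, 2, 3]) / 14

# Detailed balance: pi_i * p_ij should equal pi_j * p_ji for every pair (i, j).
M = pi[:, None] * P          # M[i, j] = pi_i * p_ij
print(np.allclose(M, M.T))   # True
```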

Suppose now that we want to generate the value of a random variable $X$ having probability mass function $P\{X = j\} = p_j$, $j = 1, 2, \ldots, N$. If we could generate an irreducible aperiodic Markov chain with limiting probabilities $p_j$, $j = 1, 2, \ldots, N$, then we would be able to approximately generate such a random variable by running the chain for $n$ steps to obtain the value of $X_n$, where $n$ is large.

In addition, if our objective is to estimate $E[h(X)] = \sum_{j=1}^{N} h(j)\, p_j$, then we could also estimate this quantity by using the estimator $\frac{1}{n}\sum_{i=1}^{n} h(X_i)$. However, since the early states of the Markov chain can be strongly influenced by the initial state chosen, it is common in practice to discard the first $k$ states, for some suitably chosen value of $k$. That is, the estimator

$$\frac{1}{n-k}\sum_{i=k+1}^{n} h(X_i)$$

is utilized. It is difficult to know exactly how large a value of $k$ should be used, and usually one just uses one's intuition (which usually works fine because the convergence is guaranteed no matter what value is used).
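As a small end-to-end illustration (not from the original notes), the snippet below runs the random walk of Example 1, discards the first $k$ states as burn-in, and compares the resulting time average of an arbitrary test function $h$ with its exact stationary expectation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Neighbor lists for the 5-vertex graph of Example 1 (states relabeled 0..4 here).
neighbors = {0: [1, 2, 4], 1: [0, 2], 2: [0, 1, 3, 4], 3: [2, 4], 4: [0, 2, 3]}

def h(j):
    # An arbitrary test function, used only for illustration.
    return (j + 1) ** 2

n, k = 100_000, 1_000      # chain length and burn-in
x = 0                      # arbitrary initial state
total = 0.0
for t in range(1, n + 1):
    x = int(rng.choice(neighbors[x]))   # jump to a uniformly chosen neighbor
    if t > k:
        total += h(x)
estimate = total / (n - k)              # (1/(n-k)) * sum_{i=k+1}^n h(X_i)

# Exact stationary expectation: sum_j h(j) * deg(j) / 14.
deg = [len(neighbors[j]) for j in range(5)]
exact = sum(h(j) * deg[j] / sum(deg) for j in range(5))
print(estimate, exact)
```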

2 Metropolis-Hastings Algorithm

Let $b(j)$, $j = 1, \ldots, m$, be positive numbers, and let $B = \sum_{j=1}^{m} b(j)$. Suppose that $m$ is large and $B$ is difficult to calculate, and that we want to simulate a sequence of random variables with probability mass function $\pi(j) = b(j)/B$, $j = 1, 2, \ldots, m$. One way of simulating a sequence of random variables whose distributions converge to $\pi(j)$, $j = 1, 2, \ldots, m$, is to find a Markov chain that is easy to simulate and whose limiting probabilities are the $\pi(j)$. The Metropolis-Hastings algorithm does exactly this.

Let $Q$ be an irreducible Markov transition probability matrix on the integers $1, \ldots, m$, with $q(i,j)$ representing the row $i$, column $j$ element of $Q$. Now define a Markov chain $\{X_n, n \ge 0\}$ as follows. When $X_n = i$, a random variable $X$ such that $P\{X = j\} = q(i,j)$, $j = 1, \ldots, m$, is generated. If $X = j$, then $X_{n+1}$ is set equal to $j$ with probability $\alpha(i,j)$ and is set equal to $i$ with probability $1 - \alpha(i,j)$. Under these conditions, it is easy to see that the sequence of states will constitute a Markov chain with transition probabilities $P_{i,j}$ given by

$$P_{i,j} = q(i,j)\,\alpha(i,j), \quad j \ne i,$$

$$P_{i,i} = q(i,i) + \sum_{k \ne i} q(i,k)\,\big(1 - \alpha(i,k)\big).$$

Now this Markov chain will be time reversible and have stationary probabilities $\pi(j)$ if

$$\pi(i)\, P_{i,j} = \pi(j)\, P_{j,i} \quad \text{for } j \ne i,$$

which is equivalent to

$$\pi(i)\, q(i,j)\, \alpha(i,j) = \pi(j)\, q(j,i)\, \alpha(j,i).$$

It is now easy to check that this will be satisfied if we take

$$\alpha(i,j) = \min\!\left(\frac{\pi(j)\, q(j,i)}{\pi(i)\, q(i,j)},\, 1\right) = \min\!\left(\frac{b(j)\, q(j,i)}{b(i)\, q(i,j)},\, 1\right).$$

The following sums up the Metropolis-Hastings algorithm for generating a time-reversible Markov chain whose limiting probabilities are $\pi(j) = b(j)/B$, for $j = 1, 2, \ldots, m$.

1. Choose an irreducible Markov transition probability matrix $Q$ with transition probabilities $q(i,j)$, $i, j = 1, 2, \ldots, m$. Also, choose some integer value $k$ between 1 and $m$.

2. Let $n = 0$ and $X_0 = k$.

3. Generate a random variable $X$ such that $P\{X = j\} = q(X_n, j)$, and generate a random number $U$.

4. If $U < \dfrac{b(X)\, q(X, X_n)}{b(X_n)\, q(X_n, X)}$, then set $X_{n+1} = X$; otherwise set $X_{n+1} = X_n$.

5. Set $n = n + 1$ and go to step 3.

A sketch of this algorithm in code is given below.
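The five steps above translate almost directly into code. The following Python sketch (not from the original notes) implements them for a generic vector of positive weights $b$ and proposal matrix $Q$; the particular $b$ and uniform $Q$ at the bottom are placeholders chosen only to illustrate a run.

```python
import numpy as np

rng = np.random.default_rng(1)

def metropolis_hastings(b, Q, n_steps, x0):
    """Generate a chain with limiting pmf pi(j) = b(j)/B using proposal matrix Q.

    b : array of positive weights b(1..m) (indexed 0..m-1 here)
    Q : m x m irreducible proposal matrix, Q[i, j] = q(i, j)
    """
    m = len(b)
    x = x0
    chain = [x]
    for _ in range(n_steps):
        # Step 3: propose X ~ q(x, .) and draw U ~ Uniform(0, 1).
        y = rng.choice(m, p=Q[x])
        u = rng.random()
        # Step 4: accept with probability min(1, b(y) q(y, x) / (b(x) q(x, y))).
        if u < b[y] * Q[y, x] / (b[x] * Q[x, y]):
            x = y
        chain.append(x)
    return np.array(chain)

# Illustrative target: pi(j) proportional to b(j) = j + 1 on m = 5 states,
# with a uniform proposal q(i, j) = 1/m.
m = 5
b = np.arange(1, m + 1, dtype=float)
Q = np.full((m, m), 1.0 / m)
chain = metropolis_hastings(b, Q, n_steps=50_000, x0=0)
burn = 1_000
print(np.bincount(chain[burn:], minlength=m) / len(chain[burn:]))  # approx b / b.sum()
```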

Proposition 2.1 The Markov chain induced by the Metropolis-Hastings algorithm described above is irreducible and reversible with respect to $\pi$.

Proof: Irreducibility follows from the irreducibility of $Q$, since $P(i,j) > 0$ whenever $q(i,j) > 0$. For reversibility, we need to check $\pi(i)\, P(i,j) = \pi(j)\, P(j,i)$ for every $i$ and $j$. This is trivial if $i = j$, so assume $i \ne j$. Then

$$\pi(i)\, P(i,j) = \pi(i)\, q(i,j)\, \min\!\left(\frac{\pi(j)\, q(j,i)}{\pi(i)\, q(i,j)},\, 1\right) = \min\big(\pi(j)\, q(j,i),\ \pi(i)\, q(i,j)\big),$$

which is symmetric in $i$ and $j$.

Example 4: The Knapsack Problem

Suppose you have $m$ items. For each $i = 1, \ldots, m$, the $i$th item is worth $v_i$ dollars and weighs $w_i$ kilograms. You can put some of these items into a knapsack, but the total weight cannot be more than $b$ kilograms. What is the most valuable subset of items that will fit into the knapsack? This is an NP-complete problem: no algorithm is known that solves it in time polynomial in $m$.

Let $w = (w_1, \ldots, w_m) \in \mathbb{R}^m$ and $v = (v_1, \ldots, v_m) \in \mathbb{R}^m$ be the vectors of weights and values, respectively. Let $z = (z_1, \ldots, z_m) \in \{0,1\}^m$ be a vector of decision variables, where $z_i = 1$ if and only if item $i$ is put into the knapsack. Then for a particular decision $z$, the total weight of the chosen items is $\sum_{i=1}^{m} w_i z_i = w^\top z$. The set of all permissible ways to pack the knapsack is

$$S = \{z \in \{0,1\}^m : w^\top z \le b\}.$$

We call $S$ the set of feasible solutions. The original problem can be expressed as the following integer programming problem:

maximize $v^\top z$ subject to $z \in S$.

The problem can be attacked using the Metropolis-Hastings algorithm by sampling from the following distribution:

$$\pi(z) = \frac{1}{Z} \exp\{\beta\, v^\top z\}, \quad z \in S,$$

where $Z = \sum_{y \in S} \exp\{\beta\, v^\top y\}$ is the normalizing constant (which need not be computed), and $\beta$ is a user-specified parameter.

The Metropolis-Hastings algorithm for the problem is as follows. Given $X_t = z$:

(i) Choose $j \in \{1, \ldots, m\}$ uniformly at random.

(ii) Let $y = (z_1, \ldots, z_{j-1},\, 1 - z_j,\, z_{j+1}, \ldots, z_m)$.

(iii) If $y \notin S$, then set $X_{t+1} = X_t$ and stop.

(iv) If $y \in S$, then let

$$\alpha = \min\!\left\{1,\ \frac{\exp(\beta\, v^\top y)}{\exp(\beta\, v^\top z)}\right\} = \min\{1,\ \exp(\beta\, v^\top (y - z))\}.$$

Generate $U \sim U[0,1]$. If $U \le \alpha$, then set $X_{t+1} = y$; otherwise, set $X_{t+1} = X_t$.

Note that the value of $\alpha$ can be computed quickly since

$$v^\top(y - z) = v^\top(0, \ldots, 1 - 2z_j, \ldots, 0) = \begin{cases} -v_j & \text{if } z_j = 1 \\ v_j & \text{if } z_j = 0. \end{cases}$$

Therefore,

$$\alpha = \begin{cases} \exp(-\beta v_j) & \text{if } z_j = 1 \\ 1 & \text{if } z_j = 0. \end{cases}$$

In particular, a feasible attempt to add an item is always accepted. A code sketch is given below.
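A compact Python sketch of this sampler follows (it is not part of the original notes; the weights, values, capacity $b$, and $\beta$ below are made-up illustrative numbers). It also keeps track of the best feasible packing seen so far, since the original goal is the maximizing subset.

```python
import numpy as np

rng = np.random.default_rng(2)

def knapsack_mh(v, w, b, beta, n_steps):
    """Metropolis-Hastings sampler on feasible packings z with pi(z) ~ exp(beta * v.z)."""
    m = len(v)
    z = np.zeros(m, dtype=int)          # the empty knapsack is always feasible
    best_z, best_value = z.copy(), 0.0
    for _ in range(n_steps):
        j = rng.integers(m)             # (i) pick a coordinate uniformly at random
        y = z.copy()
        y[j] = 1 - y[j]                 # (ii) flip item j in or out
        if w @ y <= b:                  # (iii)/(iv) only feasible proposals may be accepted
            # alpha = exp(-beta*v[j]) when removing item j, and 1 when adding it
            alpha = 1.0 if z[j] == 0 else np.exp(-beta * v[j])
            if rng.random() <= alpha:
                z = y
        if v @ z > best_value:
            best_z, best_value = z.copy(), v @ z
    return best_z, best_value

# Illustrative data (hypothetical): 8 items, capacity 10 kg, moderate beta.
v = np.array([6.0, 5.0, 8.0, 9.0, 6.0, 7.0, 3.0, 2.0])
w = np.array([2.0, 3.0, 6.0, 7.0, 5.0, 9.0, 4.0, 1.0])
print(knapsack_mh(v, w, b=10.0, beta=3.0, n_steps=100_000))
```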

3 Gibbs Sampler

Like the Metropolis algorithm, the Gibbs sampler generates a Markov chain with the desired (Gibbs) distribution as the equilibrium distribution. However, the transition probability distribution is chosen in a special way (Geman and Geman, 1984).

Consider a $k$-dimensional random vector $X$ made up of the components $X_1, X_2, \ldots, X_k$. Suppose that we have knowledge of the conditional distribution of $X_m$, given the values of all other components of $X$, for $m = 1, 2, \ldots, k$. The Gibbs sampler proceeds by generating a value from the conditional distribution of each component of the random vector $X$, given the values of all other components of $X$.

Specifically, starting from an arbitrary configuration $\{x_1(0), x_2(0), \ldots, x_k(0)\}$, the Gibbs sampler iterates as follows. At iteration $i$:

- $x_1(i)$ is drawn from the conditional distribution of $X_1$, given $x_2(i-1), x_3(i-1), \ldots, x_k(i-1)$.
- $x_2(i)$ is drawn from the conditional distribution of $X_2$, given $x_1(i), x_3(i-1), \ldots, x_k(i-1)$.
- ...
- $x_m(i)$ is drawn from the conditional distribution of $X_m$, given $x_1(i), \ldots, x_{m-1}(i), x_{m+1}(i-1), \ldots, x_k(i-1)$.
- ...
- $x_k(i)$ is drawn from the conditional distribution of $X_k$, given $x_1(i), x_2(i), \ldots, x_{k-1}(i)$.

The Gibbs sampler is an iterative adaptive scheme. After $n$ iterations of its use, we arrive at the $k$ variates $X_1(n), \ldots, X_k(n)$.

For the Gibbs sampler, the following two points should be carefully noted:

1. Each component of the random vector $X$ is visited in the natural order, with the result that a total of $k$ new variates are generated on each iteration.

2. The new value of component $X_{m-1}$ is used immediately when a new value of $X_m$ is drawn, for $m = 2, 3, \ldots, k$.

Example Suppose we want to generate $(U, V, W)$ having the Dirichlet density

$$f(u, v, w) = k\, u^4 v^3 w^2 (1 - u - v - w),$$

where $u > 0$, $v > 0$, $w > 0$, $u + v + w < 1$. Let $\mathrm{Be}(a, b)$ represent a beta distribution with parameters $a$ and $b$. Then the appropriate conditional distributions are

$$U \mid (V, W) \sim (1 - V - W)\, Q, \quad Q \sim \mathrm{Be}(5, 2),$$
$$V \mid (U, W) \sim (1 - U - W)\, R, \quad R \sim \mathrm{Be}(4, 2),$$
$$W \mid (U, V) \sim (1 - U - V)\, S, \quad S \sim \mathrm{Be}(3, 2),$$

where $Q$ is independent of $(V, W)$, $R$ is independent of $(U, W)$, and $S$ is independent of $(U, V)$.

Therefore, to use the Gibbs sampler for this problem, the algorithm is as follows:

0. Initialize $u_0$, $v_0$ and $w_0$ such that $u_0 > 0$, $v_0 > 0$, $w_0 > 0$ and $u_0 + v_0 + w_0 < 1$.

1. Sample $q_1 \sim \mathrm{Be}(5, 2)$ and set $u_1 = (1 - v_0 - w_0)\, q_1$.

2. Sample $r_1 \sim \mathrm{Be}(4, 2)$ and set $v_1 = (1 - u_1 - w_0)\, r_1$.

3. Sample $s_1 \sim \mathrm{Be}(3, 2)$ and set $w_1 = (1 - u_1 - v_1)\, s_1$.

Then $(u_1, v_1, w_1)$ is the first cycle of the Gibbs sampler. To find the second cycle, we would simulate $Q_2$, $R_2$ and $S_2$ independently, with $Q_2 \sim \mathrm{Be}(5,2)$, $R_2 \sim \mathrm{Be}(4,2)$, $S_2 \sim \mathrm{Be}(3,2)$ and observed values $q_2$, $r_2$ and $s_2$. Then the second cycle is given by

$$u_2 = (1 - v_1 - w_1)\, q_2, \quad v_2 = (1 - u_2 - w_1)\, r_2, \quad w_2 = (1 - u_2 - v_2)\, s_2.$$

We compute the third and higher cycles in a similar fashion. As $n$, the number of cycles, goes to $\infty$, the distribution of $(U_n, V_n, W_n)$ converges to the desired Dirichlet distribution. A code sketch of this sampler appears below.
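A short Python sketch of these cycles (not in the original notes), relying only on NumPy Beta draws. The comparison at the end uses the fact that the target is the Dirichlet(5, 4, 3, 2) density, whose marginal means for $(U, V, W)$ are $(5, 4, 3)/14$.

```python
import numpy as np

rng = np.random.default_rng(3)

def dirichlet_gibbs(n_cycles, u=0.25, v=0.25, w=0.25):
    """Gibbs sampler for f(u,v,w) proportional to u^4 v^3 w^2 (1-u-v-w)."""
    samples = np.empty((n_cycles, 3))
    for i in range(n_cycles):
        u = (1 - v - w) * rng.beta(5, 2)   # U | (V, W)
        v = (1 - u - w) * rng.beta(4, 2)   # V | (U, W), using the new u immediately
        w = (1 - u - v) * rng.beta(3, 2)   # W | (U, V), using the new u and v
        samples[i] = (u, v, w)
    return samples

samples = dirichlet_gibbs(50_000)
burn = 1_000
print(samples[burn:].mean(axis=0))         # chain averages after burn-in
print(np.array([5, 4, 3]) / 14)            # exact marginal means of Dirichlet(5, 4, 3, 2)
```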

4 Simulated Annealing

Let $A$ be a finite set of vectors and let $V(x)$ be a nonnegative function defined on $x \in A$, and suppose that we are interested in finding its maximal value and at least one argument at which the maximal value is attained. That is, letting

$$V^* = \max_{x \in A} V(x) \quad \text{and} \quad \mu = \{x \in A : V(x) = V^*\},$$

we are interested in finding $V^*$ as well as an element of $\mu$.

To begin, let $\beta > 0$ and consider the following probability mass function on the set of values in $A$:

$$\pi_\beta(x) = \frac{e^{\beta V(x)}}{\sum_{x \in A} e^{\beta V(x)}}.$$

By multiplying the numerator and denominator of the preceding by $e^{-\beta V^*}$, and letting $|\mu|$ be the cardinality of $\mu$, we see that

$$\pi_\beta(x) = \frac{e^{\beta (V(x) - V^*)}}{|\mu| + \sum_{x \notin \mu} e^{\beta (V(x) - V^*)}}.$$

However, since $V(x) - V^* < 0$ for $x \notin \mu$, we obtain that as $\beta \to \infty$,

$$\pi_\beta(x) \to \frac{\delta(x, \mu)}{|\mu|},$$

where $\delta(x, \mu) = 1$ if $x \in \mu$ and is 0 otherwise. Hence, if we let $\beta$ be large and generate a Markov chain whose limiting distribution is $\pi_\beta(x)$, then most of the mass of this limiting distribution will be concentrated on points in $\mu$.

These observations prompt us to raise the question: why not simply apply the Metropolis algorithm to generate a population of configurations representative of the stochastic system at very low temperature? (In the physical analogy, $\beta$ plays the role of an inverse temperature, $\beta = 1/T$, so letting $\beta \to \infty$ corresponds to letting the temperature $T \to 0$.) We DO NOT advocate the use of such a strategy, because the rate of convergence of the Markov chain to thermal equilibrium is extremely slow at very low temperatures. Rather, the preferred method for improved computational efficiency is to operate the stochastic system at a high temperature, where convergence to equilibrium is fast, and then maintain the system at equilibrium as the temperature is carefully lowered.

That is, we use a combination of two related ingredients:

- A schedule that determines the rate at which the temperature is lowered.

- An algorithm, exemplified by the Metropolis algorithm, that iteratively finds the equilibrium distribution at each new temperature in the schedule by using the final state of the system at the previous temperature as the starting point for the new temperature.

This twofold scheme is the essence of simulated annealing (Kirkpatrick et al., 1983). The technique derives its name from an analogy with the annealing process in physics/chemistry, where we start the process at a high temperature and then lower the temperature slowly while maintaining thermal equilibrium.

The primary objective of simulated annealing is to find a global minimum of a cost function that characterizes large and complex systems. Simulated annealing differs from conventional iterative optimization algorithms in two important respects:

- The algorithm need not get stuck, since a transition out of a local minimum is always possible when the system operates at a nonzero temperature.

- Simulated annealing is adaptive in that gross features of the final state of the system are seen at higher temperatures, while fine details of the state appear at lower temperatures.

In simulated annealing, the temperature $T$ plays the role of a control parameter. The process will converge to a configuration of minimal energy provided that the temperature is decreased no faster than logarithmically, i.e., $T_k > c/\log(k)$, where $T_k$ denotes the value of the $k$th temperature. Unfortunately, such an annealing schedule is extremely slow, too slow to be of practical use. In practice, we must resort to a finite-time approximation of the asymptotic convergence of the algorithm, at the price that the algorithm is no longer guaranteed to find a global minimum with probability 1.

One popular annealing schedule (Kirkpatrick et al., 1983) is as follows.

- Initial value of the temperature. The initial value $T_0$ of the temperature is chosen high enough to ensure that virtually all proposed transitions are accepted by the simulated annealing algorithm.

- Decrement of the temperature. Ordinarily, the cooling is performed exponentially, and the changes made in the value of the temperature are small. In particular, the decrement function is defined by $T_k = \alpha T_{k-1}$, $k = 1, 2, \ldots$, where $\alpha$ is a constant smaller than, but close to, unity; typical values of $\alpha$ lie between 0.8 and 0.99. At each temperature, enough transitions are attempted so that there are 10 accepted transitions per experiment on the average.

- Final value of the temperature. The system is frozen and annealing stops if the desired number of acceptances is not achieved at three successive temperatures.

Example 5: The Knapsack Problem (continued). Revisit the problem with simulated annealing; a code sketch is given below.
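As an illustration of Example 5 (the notes do not work it out), the single-flip move from Example 4 can be reused inside a geometric cooling schedule. The Python sketch below maximizes $v^\top z$ over feasible packings; the schedule constants T0 and alpha, and the data, are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)

def knapsack_annealing(v, w, b, T0=10.0, alpha=0.9, moves_per_temp=200, n_temps=100):
    """Simulated annealing for the knapsack problem, maximizing v.z over feasible z."""
    m = len(v)
    z = np.zeros(m, dtype=int)                   # start from the empty (feasible) knapsack
    best_z, best_val = z.copy(), 0.0
    T = T0
    for _ in range(n_temps):
        for _ in range(moves_per_temp):
            j = rng.integers(m)
            y = z.copy()
            y[j] = 1 - y[j]                      # propose flipping item j
            if w @ y > b:
                continue                         # infeasible proposals are rejected
            delta = v @ y - v @ z                # change in total value
            # Metropolis rule at temperature T: always accept improvements,
            # accept a decrease with probability exp(delta / T).
            if delta >= 0 or rng.random() < np.exp(delta / T):
                z = y
                if v @ z > best_val:
                    best_z, best_val = z.copy(), v @ z
        T *= alpha                               # geometric cooling T_k = alpha * T_{k-1}
    return best_z, best_val

v = np.array([6.0, 5.0, 8.0, 9.0, 6.0, 7.0, 3.0, 2.0])   # same illustrative data as before
w = np.array([2.0, 3.0, 6.0, 7.0, 5.0, 9.0, 4.0, 1.0])
print(knapsack_annealing(v, w, b=10.0))
```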

5 MCMC in Bayesian Statistics

In this section, we will discuss one example in detail, using it to introduce the basic ideas of Bayesian statistics.

Example 6: Pump Failure. The nuclear power plant Farley 1 contains 10 pump systems. The $i$th pump has been running for time $t_i$ (in thousands of hours). During that time, it has failed $N_i$ times, as recorded in the data table.

[Table: $i$, $N_i$, $t_i$, and $N_i/t_i$ for pumps $i = 1, \ldots, 10$.]

We treat the $t_i$'s as nonrandom quantities. We assume that failures of the $i$th pump occur according to a Poisson process with rate $\lambda_i$, independent of the other nine pumps. This assumption means that $N_i \sim \mathrm{Poisson}(\lambda_i t_i)$, and the likelihood function is

$$P(N \mid \lambda) = \prod_{i=1}^{10} \frac{(\lambda_i t_i)^{N_i} \exp(-\lambda_i t_i)}{N_i!},$$

where $N = (N_1, \ldots, N_{10})$ and $\lambda = (\lambda_1, \ldots, \lambda_{10})$.

We further assume that $\lambda_i \sim \mathrm{Gamma}(\alpha, \beta)$, $i = 1, \ldots, 10$, with $\alpha = 1.8$ as in Gelfand and Smith (1990), and $\beta \sim \mathrm{Gamma}(\gamma, \delta)$, with $\gamma = 0.01$ and $\delta = 1$. Then $E(\beta) = 0.01$, so that $\beta$ tends to be small, which makes the $\mathrm{Gamma}(\alpha, \beta)$ distribution very flat. A flat prior distribution for $\lambda$ is desirable because it does not contain much a priori information, and so in principle it should affect the results less.

Thus, we have the following posterior distribution:

$$P(\lambda, \beta \mid N) = \frac{1}{P(N)} \prod_{i=1}^{10} \left[ \frac{\lambda_i^{N_i+\alpha-1} \exp\{-\lambda_i (t_i + \beta)\}\, t_i^{N_i}\, \beta^{\alpha}}{N_i!\, \Gamma(\alpha)} \right] \frac{\beta^{\gamma-1} e^{-\beta}}{\Gamma(\gamma)}$$

$$\propto \prod_{i=1}^{10} \left[ \lambda_i^{N_i+\alpha-1} \exp\{-\lambda_i (t_i + \beta)\}\, t_i^{N_i}\, \beta^{\alpha} \right] \beta^{\gamma-1} e^{-\beta}.$$

It is easy to see that

$$f(\lambda_i \mid \lambda_{-i}, \beta, N) \propto \lambda_i^{N_i+\alpha-1} \exp\{-\lambda_i (t_i + \beta)\},$$

which is $\mathrm{Gamma}(N_i + \alpha,\ t_i + \beta)$. Next,

$$f(\beta \mid \lambda, N) \propto \prod_{i=1}^{10} \left[ \exp(-\lambda_i \beta)\, \beta^{\alpha} \right] \beta^{\gamma-1} e^{-\beta} = \beta^{10\alpha+\gamma-1} \exp\Big\{-\beta \Big(\sum_{i=1}^{10} \lambda_i + 1\Big)\Big\},$$

which is $\mathrm{Gamma}\big(10\alpha + \gamma,\ \sum_{i=1}^{10} \lambda_i + 1\big)$. These full conditionals are exactly what the Gibbs sampler needs.

Suppose that Markov chain samples $X_t = (\lambda_1(t), \ldots, \lambda_{10}(t), \beta(t))$, $t = 0, 1, \ldots$, have been generated. Based on these, we can estimate $E(\lambda_1 \mid N)$ by

$$\frac{1}{n} \sum_{t=1}^{n} \frac{N_1 + \alpha}{t_1 + \beta(t)}.$$

Recall that

$$E(\lambda_1 \mid N, \beta) = \frac{N_1 + \alpha}{t_1 + \beta}.$$

A sketch of the corresponding Gibbs sampler is given below.
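To make the last step concrete, here is a minimal Python sketch of the Gibbs sampler implied by these full conditionals (it is not part of the original notes). Because the numerical table above did not survive transcription, the arrays N and t below are made-up placeholders that must be replaced by the actual pump data.

```python
import numpy as np

rng = np.random.default_rng(5)

def pump_gibbs(N, t, n_iter=20_000, alpha=1.8, gamma=0.01, delta=1.0):
    """Gibbs sampler for the pump model: lambda_i | beta ~ Gamma(alpha, beta), beta ~ Gamma(gamma, delta)."""
    N, t = np.asarray(N, float), np.asarray(t, float)
    k = len(N)
    beta = 1.0                                   # arbitrary starting value
    lam_draws = np.empty((n_iter, k))
    beta_draws = np.empty(n_iter)
    for it in range(n_iter):
        # lambda_i | beta, N ~ Gamma(N_i + alpha, t_i + beta)   (shape, rate)
        lam = rng.gamma(shape=N + alpha, scale=1.0 / (t + beta))
        # beta | lambda, N ~ Gamma(10*alpha + gamma, sum(lambda_i) + delta)
        beta = rng.gamma(shape=k * alpha + gamma, scale=1.0 / (lam.sum() + delta))
        lam_draws[it], beta_draws[it] = lam, beta
    return lam_draws, beta_draws

# Placeholder data: replace N and t with the actual failure counts and running times
# from the table above (these numbers are made up purely for illustration).
N = [3, 1, 4, 2, 7, 1, 2, 5, 1, 6]
t = [90.0, 15.0, 60.0, 120.0, 5.0, 30.0, 1.0, 1.0, 2.0, 10.0]
lam_draws, beta_draws = pump_gibbs(N, t)
burn = 1_000
# Rao-Blackwellized estimate of E(lambda_1 | N): average of (N_1 + alpha) / (t_1 + beta(t)).
print(np.mean((N[0] + 1.8) / (t[0] + beta_draws[burn:])))
```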
