Markov chain Monte Carlo


1 Markov chain Monte Carlo Feng Li School of Statistics and Mathematics Central University of Finance and Economics Revised on April 24, 2017

2 Today we are going to learn... 1 Markov Chains 2 Metropolis Algorithm 3 Metropolis-Hastings 4 Multiple variables Feng Li (SAM.CUFE.EDU.CN) Statistical Computing 2 / 66

3 Markov Chains The goal of today's lecture is to learn about the Metropolis-Hastings algorithm. The Metropolis-Hastings algorithm allows us to simulate from any distribution, as long as we have the kernel of the density of that distribution. To understand the Metropolis-Hastings algorithm, we must first learn a little about Markov chains.

4 Basic Probability Rules Law of conditional probability: Pr(A = a, B = b) = Pr(A = a | B = b) Pr(B = b) (1) More generally, conditioning on C: Pr(A = a, B = b | C = c) = Pr(A = a | B = b, C = c) × Pr(B = b | C = c) (2)

5 Basic Probability Rules Marginalizing (for a discrete variable): Pr(A = a) = Σ_b Pr(A = a, B = b) (3) More generally: Pr(A = a | C = c) = Σ_b Pr(A = a, B = b | C = c) (4)

6 Independence Two variables are independent if Pr(A = a, B = b) = Pr(A = a) Pr(B = b) (5) Dividing both sides by Pr(B = b) gives Pr(A = a | B = b) = Pr(A = a) (6)

7 Conditional Independence Two variables A and B are conditionally independent given C if Pr(A = a, B = b | C = c) = Pr(A = a | C = c) × Pr(B = b | C = c) (7) Dividing both sides by Pr(B = b | C = c) gives Pr(A = a | B = b, C = c) = Pr(A = a | C = c) (8)

8 A simple game Player A and Player B play a game. The probability that Player A wins each game is 0.6 and the probability that Player B wins each game is 0.4. They play the game N times, and each game is independent. Let X_i = 0 if Player A wins game i, and X_i = 1 if Player B wins game i. Also assume there is an initial game called Game 0 (X_0).

9 Some simple questions What is the probability that Player A wins Game 1 (X_1 = 0) if X_0 = 0 (Player A wins Game 0)? If X_0 = 1 (Player B wins Game 0)? What is the probability that Player A wins Game 2 (X_2 = 0) in each of the same two cases? Since each game is independent, all answers are 0.6.

10 A different game: a Markov chain Now assume that both players have a better chance of winning Game i+1 if they already won Game i: Pr(X_{i+1} = 0 | X_i = 0) = 0.8 (9) Pr(X_{i+1} = 1 | X_i = 1) = 0.7 (10) Assume nothing other than Game i has a direct effect on Game i+1. This is called the Markov property. Mathematically, Pr(X_{i+1} | X_i, X_{i-1}, ..., X_1, X_0) = Pr(X_{i+1} | X_i) (11)

11 Markov Property Another way to define the Markov property is to notice that X_{i+1} and X_{i-1}, ..., X_0 are independent conditional on X_i. This may be a model for the stock market: all the valuable information about tomorrow's stock price is contained in today's price. This is related to the Efficient Market Hypothesis, a popular theory in finance. Now back to the simple game.

12 Simulating from a Markov chain Now let's simulate a sequence X_1, X_2, ..., X_100 from the Markov chain. Initialize at x_0 = 0. Then inside a loop, code the following using if: if X_i = 0, then X_{i+1} = 0 with probability 0.8 and 1 with probability 0.2; if X_i = 1, then X_{i+1} = 0 with probability 0.3 and 1 with probability 0.7. Try it!
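The exercise above is meant for R; as an illustrative sketch of the same loop-with-if simulation (written here in Python, with hypothetical function and variable names), one might do:

```python
import random

random.seed(1)

def simulate_chain(n, x0=0, p00=0.8, p11=0.7):
    """Simulate n steps of the two-state game chain (0 = A wins, 1 = B wins)."""
    x = [x0]
    for _ in range(n):
        u = random.random()
        if x[-1] == 0:
            # stay in state 0 with probability p00, else move to 1
            x.append(0 if u < p00 else 1)
        else:
            # stay in state 1 with probability p11, else move to 0
            x.append(1 if u < p11 else 0)
    return x[1:]  # drop the initial game X_0

chain = simulate_chain(100)
```

The same structure translates directly to R with `if` and `runif`.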

13 Markov chain [Figure: line plot of the simulated sequence x_i]

14 Simple questions again What is the probability that Player A wins the first game (X_1 = 0) if X_0 = 0 (Player A wins the initial game)? If X_0 = 1 (Player B wins the initial game)? The answers are 0.8 and 0.3. What is the probability that Player A wins the second game (X_2 = 0) in each of the same two cases?

15 Solution Let X_0 = 0. Then Pr(X_2 = 0 | X_0 = 0) = Σ_{x_1 = 0,1} Pr(X_2 = 0, X_1 = x_1 | X_0 = 0) = Σ_{x_1 = 0,1} Pr(X_2 = 0 | X_1 = x_1, X_0 = 0) Pr(X_1 = x_1 | X_0 = 0) = Σ_{x_1 = 0,1} Pr(X_2 = 0 | X_1 = x_1) Pr(X_1 = x_1 | X_0 = 0) = 0.8 × 0.8 + 0.3 × 0.2 = 0.7 What if X_0 = 1?

16 Recursion Notice that the distribution of X_i depends on X_0; the sequence is no longer independent. How could you compute Pr(X_n = 0 | X_0 = 0) when n = 3, n = 5, n = 100? This is hard, but the Markov property does make things simpler: we can use a recursion to compute the probability that Player A wins any game.

17 Recursion Note that Pr(X_i = 0 | X_0 = 0) = Σ_{x_{i-1}} Pr(X_i = 0, X_{i-1} = x_{i-1} | X_0 = 0) = Σ_{x_{i-1}} Pr(X_i = 0 | X_{i-1} = x_{i-1}, X_0 = 0) Pr(X_{i-1} = x_{i-1} | X_0 = 0) = Σ_{x_{i-1}} Pr(X_i = 0 | X_{i-1} = x_{i-1}) Pr(X_{i-1} = x_{i-1} | X_0 = 0) We already applied this formula when i = 2. We can continue for i = 3, 4, 5, ..., n.

18 Recursion Pr(X_i = 0 | X_0 = 0) = Σ_{x_{i-1}} Pr(X_i = 0 | X_{i-1} = x_{i-1}) Pr(X_{i-1} = x_{i-1} | X_0 = 0) Start with Pr(X_1 = 0 | X_0 = 0) and Pr(X_1 = 1 | X_0 = 0). Use these in the formula with i = 2 to get Pr(X_2 = 0 | X_0 = 0) and Pr(X_2 = 1 | X_0 = 0). Use these in the formula with i = 3 to get Pr(X_3 = 0 | X_0 = 0), and so on.

19 Matrix Form It is much easier to do this calculation in matrix form (especially when X is not binary). Let P be the transition matrix: row X_{i-1} = 0 contains Pr(X_i = 0 | X_{i-1} = 0) and Pr(X_i = 1 | X_{i-1} = 0), and row X_{i-1} = 1 contains Pr(X_i = 0 | X_{i-1} = 1) and Pr(X_i = 1 | X_{i-1} = 1).

20 Matrix Form In our example, row X_{i-1} = 0 is (0.8, 0.2) and row X_{i-1} = 1 is (0.3, 0.7), so P = ( 0.8 0.2 ; 0.3 0.7 ) (12)

21 Matrix Form Let π_i be a 1 × 2 row vector which denotes the probabilities of each player winning Game i conditional on the initial game: π_i = (Pr(X_i = 0 | X_0), Pr(X_i = 1 | X_0)) (13) In our example, if X_0 = 0 then π_1 = (0.8, 0.2) (14), and if X_0 = 1 then π_1 = (0.3, 0.7) (15)

22 Recursion in Matrix form The recursion formula is π_i = π_{i-1} P (16) Therefore π_n = π_1 P × P × ... × P = π_1 P^{n-1} (17) Now code this up in R. What is Pr(X_n = 0 | X_0 = 0) when n = 3, n = 5, n = 100? Do the same when X_0 = 1.
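The matrix recursion above is a one-line loop in any language with matrix multiplication. A sketch (in Python/NumPy here rather than R; `pi_n` is an illustrative name) of the exercise:

```python
import numpy as np

# Transition matrix from the example: rows are X_{i-1} = 0, 1
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])

def pi_n(pi1, n):
    """Apply the recursion pi_i = pi_{i-1} P, starting from pi_1."""
    pi = np.asarray(pi1, dtype=float)
    for _ in range(n - 1):
        pi = pi @ P
    return pi

for n in (3, 5, 100):
    print(n, pi_n([0.8, 0.2], n), pi_n([0.3, 0.7], n))
```

For n = 3 and n = 5 the two starting rows give visibly different answers; by n = 100 both are essentially (0.6, 0.4).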

23 Convergence? For n = 3 and n = 5, the starting point made a big difference. For n = 100 it did not. Could this Markov chain be converging to something? Now write code to keep the values of π_i for i = 1, 2, ..., 100, then plot the first element π_{i1} against i.

24 Convergence [Figure: Pr(X_i = 0) plotted against i, once for X_0 = 0 and once for X_0 = 1; both curves settle at the same value]

25 More Questions What is Pr(X_100 = 0 | X_0 = 0)? What is Pr(X_100 = 0 | X_0 = 1)? What is Pr(X_1000 = 0 | X_0 = 0)? What is Pr(X_1000 = 0 | X_0 = 1)? The answer to all of these is 0.6. The X_i themselves do not converge; they keep changing between 0 and 1. The Markov chain, however, converges to a stationary distribution.

26 Simulation with a Markov chain Go back to your code for generating a Markov chain and generate a long chain. Exclude the first values of X_i (the burn-in) and keep the remaining values. How many X_i = 0? How many X_i = 1? We have discovered a new way to simulate from a distribution with Pr(X_i = 0) = 0.6 and Pr(X_i = 1) = 0.4.
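The frequency check above can be sketched as follows (Python rather than R; the chain length and burn-in length here are illustrative choices, not the slide's omitted values):

```python
import random

random.seed(2)

n, burn = 100_000, 1_000  # illustrative chain length and burn-in
x = 0
draws = []
for i in range(n):
    u = random.random()
    # one step of the two-state chain from the earlier slides
    x = (0 if u < 0.8 else 1) if x == 0 else (1 if u < 0.7 else 0)
    if i >= burn:
        draws.append(x)

freq0 = draws.count(0) / len(draws)
print(freq0)  # close to the stationary probability 0.6
```

The empirical frequency of 0s approaches the stationary probability 0.6 as the chain grows.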

27 Markov Chains Sometimes two different Markov chains converge to the same stationary distribution: see what happens with a different transition matrix P (18) Sometimes Markov chains do not converge to a stationary distribution at all. Some Markov chains can get stuck in an absorbing state: for example, what would the simple example look like if Pr(X_{i+1} = 0 | X_i = 0) = 1? Markov chains can be defined on continuous support as well; X_i can be continuous.

28 Some important points This is a very complicated way to generate from a simple distribution; for the binary example the direct method would be better. However, for other examples neither the direct method nor the accept/reject algorithm works. In these cases we can construct a Markov chain whose stationary distribution is our target distribution. All we need is the kernel of the density function, and an algorithm called the Metropolis algorithm.

29 Normalizing Constant and Kernel What are the normalizing constant and kernel of the Beta density? Beta(x; a, b) = [Γ(a + b) / (Γ(a)Γ(b))] x^{a-1} (1 - x)^{b-1} (19)

30 The Metropolis algorithm The Metropolis algorithm was developed in a 1953 paper by Metropolis, Rosenbluth, Rosenbluth, Teller and Teller. The aim is to simulate x ~ p(x), where p(x) is called the target density. We will need a proposal density q(x[old] → x[new]). For example, one choice of q is x[new] ~ N(x[old], 1) (20) This is called a random walk proposal.

31 Symmetric proposal An important property of q in the Metropolis algorithm is symmetry of the proposal: q(x[old] → x[new]) = q(x[new] → x[old]) (21) Later we will not need this assumption. Can you confirm this is true for x[new] ~ N(x[old], 1)? Can you simulate from this random walk (use x_0 = 0 as a starting value)?

32 Proof of symmetry of random walk The proposal satisfies q(x[old] → x[new]) = (2π)^{-1/2} exp{ -(1/2) (x[new] - x[old])² } (22) = (2π)^{-1/2} exp{ -(1/2) [-(x[new] - x[old])]² } (23) = (2π)^{-1/2} exp{ -(1/2) (x[old] - x[new])² } (24) = q(x[new] → x[old]) (25)

33 Accept and reject By itself the random walk will not converge to anything. To make sure this Markov chain converges to our target, we need to include the following. At step i+1, set x[old] = x[i]. Generate x[new] ~ N(x[old], 1) and compute α = min(1, p(x[new]) / p(x[old])) (26) Then set x[i+1] to x[new] with probability α (accept), or set x[i+1] to x[old] with probability 1 - α (reject).

34 Code it up Use the Metropolis algorithm with a random walk proposal to simulate a sample from the standard t distribution with 5 df. The target density kernel is p(x) ∝ [1 + x²/5]^{-3} (27) Simulate a Markov chain with 10^5 iterates (as in the trace plots that follow) using the random walk as a proposal. The first iterates are the burn-in and will be left out, because the Markov chain may not have converged yet. Use x_0 = 0 as a starting value.
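The exercise asks for R; a minimal sketch of the same random walk Metropolis sampler (in Python here, with illustrative names and a shorter chain length than the slides use) works on the log scale for numerical stability:

```python
import math
import random

random.seed(3)

def log_p(x):
    """Log-kernel of the standard t density with 5 df (constant not needed)."""
    return -3.0 * math.log1p(x * x / 5.0)

def metropolis_rw(n_iter, x0=0.0, sd=1.0):
    x = x0
    chain = []
    for _ in range(n_iter):
        prop = random.gauss(x, sd)          # random walk proposal
        log_alpha = log_p(prop) - log_p(x)  # log of p(new)/p(old)
        if math.log(random.random()) < log_alpha:
            x = prop                        # accept
        chain.append(x)                     # on reject, repeat the old value
    return chain

chain = metropolis_rw(20_000)
kept = chain[2_000:]  # drop burn-in
mean = sum(kept) / len(kept)
```

Note that rejected steps repeat the previous value; that repetition is what makes the stationary distribution come out right.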

35 Some diagnostics - Convergence There are a few diagnostics we can use to investigate the behaviour of the chain. One is a trace plot (including burn-in), which is simply a line plot of the iterates; plot this for your Markov chain. Another diagnostic is the Geweke diagnostic, which can be found in the R package coda. The Geweke diagnostic tests the equality of the means of two different parts of the chain (excluding burn-in). The test statistic has a standard normal distribution. Rejecting this test is evidence that the chain has not converged.

36 Trace Plot [Figure: trace plot over 10^5 iterates; initial value 0, proposal s.d. 1]

37 The effect of starting value Now rerun the code with a starting value of X_0 = 100. Does the chain still converge? Does it converge quicker or slower?

38 Trace Plot [Figure: trace plot over 10^5 iterates; initial value 0, proposal s.d. 1]

39 Trace Plot [Figure: trace plot over 10^5 iterates; initial value 100, proposal s.d. 1]

40 The effect of proposal variance Keep the starting value of X_0 = 100. Now change the standard deviation of the proposal to 3. Does the chain still converge? Does it converge quicker or slower?

41 Trace Plot [Figure: trace plot over 10^5 iterates; initial value 100, proposal s.d. 1]

42 Trace Plot [Figure: trace plot over 10^5 iterates; initial value 100, proposal s.d. 3]

43 Huge proposal variance Maybe you think the best strategy is to choose a huge standard deviation. Try a proposal standard deviation of 100 and plot a trace plot of the chain. The chain is rejecting many proposals; this must be inefficient. Change your code to compute the percentage of times a new iterate is accepted (excluding burn-in), using the same starting value. What is the acceptance rate when the proposal standard deviation is 1? What is the acceptance rate when the proposal standard deviation is 100?
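The acceptance-rate comparison can be sketched as follows (Python rather than R; `acceptance_rate` and the chain/burn-in lengths are illustrative choices):

```python
import math
import random

random.seed(4)

def log_p(x):
    # log-kernel of the t distribution with 5 df
    return -3.0 * math.log1p(x * x / 5.0)

def acceptance_rate(sd, n_iter=20_000, x0=0.0, burn=2_000):
    """Fraction of proposals accepted after burn-in, for a given proposal s.d."""
    x, accepted, counted = x0, 0, 0
    for i in range(n_iter):
        prop = random.gauss(x, sd)
        accept = math.log(random.random()) < log_p(prop) - log_p(x)
        if accept:
            x = prop
        if i >= burn:
            counted += 1
            accepted += accept
    return accepted / counted

rate_small = acceptance_rate(1.0)
rate_huge = acceptance_rate(100.0)
print(rate_small, rate_huge)  # the huge-s.d. proposal is rejected far more often
```

A proposal s.d. of 100 against a target whose scale is near 1 sends almost every proposal deep into the tails, so the acceptance rate collapses.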

44 Trace Plot [Figure: trace plot over 10^5 iterates; initial value 100, proposal s.d. 100]

45 Acceptance Rate If the proposal variance is too high, values will be proposed that are too far into the tails of the stationary distribution, and the Metropolis algorithm will mostly reject these values. The sample will still come from the correct target distribution, but this is a very inefficient way to sample. What happens if a very small proposal variance is used? Try a proposal standard deviation of 0.001. What is the acceptance rate?

46 Trace Plot [Figure: trace plot over 10^5 iterates; initial value 0, proposal s.d. 0.001]

47 Acceptance Rate The acceptance rate is almost 1. However, is this a good proposal? The jumps made by this proposal are too small and do not sample enough iterates from the tails of the distribution. If it runs long enough the Markov chain will provide a sample from the target distribution; however, it is very inefficient.

48 Random Walk proposal For a random walk proposal it is not good to have an acceptance rate that is too high or too low. What exactly is too high and too low? It depends on many things, including the target and the proposal. A rough rule is to aim for an acceptance rate between 20% and 70%. If your acceptance rate is outside this range, the proposal variance can be doubled or halved. There are better (but more complicated) ways to do this.

49 Monte Carlo Error Now that there is a sample X[1], X[2], ..., X[M] ~ p(x), what can it be used for? We can estimate the expected value E(X) by taking E(X) ≈ (1/M) Σ_{i=1}^{M} X[i] Note we use ≈ instead of =: there is some error, since we are estimating E(X) based on a sample. Luckily we can make this smaller by generating a bigger sample. We call this Monte Carlo error.

50 Measuring Monte Carlo Error One way to measure Monte Carlo error is the variance of the sample mean. For an independent sample, Var((1/M) Σ_{i=1}^{M} X[i]) = (1/M²) Σ_{i=1}^{M} Var(X[i]) = Var(X)/M But a sample from a Markov chain is correlated.

52 Measuring Monte Carlo Error For a correlated sample the covariance terms do not vanish: Var((1/M) Σ_{i=1}^{M} X[i]) = (1/M²) [Σ_{i=1}^{M} Var(X[i]) + 2 Σ_{i=1}^{M} Σ_{j>i} cov(X[i], X[j])] = Var(X)/M + (2/M²) Σ_{i=1}^{M} Σ_{j>i} cov(X[i], X[j]) A sample from a Markov chain is correlated.

53 Monte Carlo efficiency It is better to have lower correlation in the Markov chain. The efficiency of the chain can be measured using the effective sample size, which can be computed with the function effectiveSize in the R package coda. Obtain an effective sample size for your sample (excluding burn-in) with proposal s.d. 1 and with proposal s.d. 5. One of my answers was about 6000, and yours should be close to that.
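coda's effectiveSize uses a spectral-density estimate; a rough autocorrelation-based sketch of the same idea (in Python, with illustrative names, truncating the autocorrelation sum once it dies out, and demonstrated on AR(1) chains rather than your Metropolis output) is:

```python
import random

random.seed(5)

def effective_sample_size(x, max_lag=200):
    """Rough ESS estimate: M / (1 + 2 * sum of early positive autocorrelations)."""
    m = len(x)
    mean = sum(x) / m
    var = sum((v - mean) ** 2 for v in x) / m
    tau = 1.0
    for lag in range(1, max_lag):
        acf = sum((x[i] - mean) * (x[i + lag] - mean)
                  for i in range(m - lag)) / ((m - lag) * var)
        if acf < 0.05:   # truncate once correlation has died out
            break
        tau += 2.0 * acf
    return m / tau

def ar1(n, rho):
    """AR(1) chain: higher rho means higher autocorrelation."""
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + random.gauss(0.0, 1.0)
        out.append(x)
    return out

ess_low = effective_sample_size(ar1(5_000, 0.3))
ess_high = effective_sample_size(ar1(5_000, 0.9))
print(ess_low, ess_high)  # the strongly correlated chain has far fewer effective draws
```

The more correlated chain wastes most of its nominal sample size, which is exactly what ESS quantifies.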

54 Interpret Effective Sample Size What does it mean to say a Monte Carlo sample has an effective sample size (ESS) of just 6000? The sampling error of the correlated sample equals the sampling error of an independent sample of size M_eff. Mathematically, Var((1/M) Σ_{i=1}^{M} X[i]) = Var(X)/M + (2/M²) Σ_{i=1}^{M} Σ_{j>i} cov(X[i], X[j]) = Var(X)/M_eff It is a useful diagnostic for comparing two different proposal variances: a higher ESS implies a more efficient scheme.

55 Non-Symmetric proposal In 1970, Hastings proposed an extension to the Metropolis algorithm. This allows for the case when q(x[old] → x[new]) ≠ q(x[new] → x[old]) (28) The only thing that changes is the acceptance probability: α = min(1, [p(x[new]) q(x[new] → x[old])] / [p(x[old]) q(x[old] → x[new])]) (29) This is called the Metropolis-Hastings algorithm.

56 An interesting proposal Suppose we use the proposal x[new] ~ N(0, √(5/3)) (30) What is q(x[old] → x[new])? It is q(x[new]), where q(·) is the density of a N(0, √(5/3)). Is this symmetric? No, since generally q(x[new]) ≠ q(x[old]).

57 Metropolis Hastings Code this where p(·) is the standard t density with 5 df and q(·) is normal with mean 0 and standard deviation √(5/3). Inside a loop: generate x[new] ~ N(0, √(5/3)); set x[old] = x[i] and compute α = min(1, [p(x[new]) q(x[old])] / [p(x[old]) q(x[new])]) (31) Then set x[i+1] to x[new] with probability α (accept), or set x[i+1] to x[old] with probability 1 - α (reject). Try it!
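This independence-proposal sampler can be sketched as follows (Python rather than R; names and chain length are illustrative, and everything is on the log scale):

```python
import math
import random

random.seed(6)

SD_Q = math.sqrt(5.0 / 3.0)  # proposal s.d., matching the t5 standard deviation

def log_p(x):
    # log-kernel of the standard t density with 5 df
    return -3.0 * math.log1p(x * x / 5.0)

def log_q(x):
    # log-kernel of the N(0, sqrt(5/3)) proposal density
    return -0.5 * (x / SD_Q) ** 2

def independence_mh(n_iter, x0=0.0):
    x = x0
    chain = []
    for _ in range(n_iter):
        prop = random.gauss(0.0, SD_Q)  # proposal does not depend on current x
        log_alpha = (log_p(prop) + log_q(x)) - (log_p(x) + log_q(prop))
        if math.log(random.random()) < min(0.0, log_alpha):
            x = prop
        chain.append(x)
    return chain

chain = independence_mh(20_000)
mean = sum(chain) / len(chain)
```

Because the proposal is a good approximation to the target, the q terms in the ratio nearly cancel the p terms and the acceptance rate is very high.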

58 Comparison The effective sample size of this proposal is much higher than that of the best random walk proposal. Why does it work so well? The standard t distribution with 5 df has a mean of 0 and a standard deviation of √(5/3), so the N(0, √(5/3)) is a good approximation to the standard Student t with 5 df.

59 Laplace Approximation Using a Taylor expansion of ln p(x) around the point a, ln p(x) ≈ ln p(a) + [∂ln p(x)/∂x]|_{x=a} (x - a) + (1/2) [∂²ln p(x)/∂x²]|_{x=a} (x - a)² Let a be the point that maximises ln p(x), so the first-derivative term vanishes, and let b = -( [∂²ln p(x)/∂x²]|_{x=a} )^{-1} (32)

60 The approximation is ln p(x) ≈ ln p(a) - (x - a)²/(2b) Taking exponentials of both sides, p(x) ≈ k × exp[-(x - a)²/(2b)] So the distribution can be approximated by a normal distribution with mean a and variance b, where a and b can be found numerically if needed.

61 Exercise: Generating from skew normal The density of the skew normal is p(x) = 2φ(x)Φ(δx) (33) where φ(x) is the density of the standard normal and Φ(x) is the distribution function of the standard normal. Using a combination of optim, dnorm and pnorm, find the Laplace approximation of the skew normal when δ = 3. Use it to generate a sample from the skew normal distribution using the Metropolis-Hastings algorithm.
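A rough sketch of the Laplace-approximation step (Python standing in for optim/dnorm/pnorm; the grid search and finite-difference step size are illustrative stand-ins for a proper optimiser):

```python
import math

def phi(x):
    # standard normal density (dnorm)
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def Phi(x):
    # standard normal CDF (pnorm)
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

DELTA = 3.0

def log_p(x):
    # log skew-normal density with delta = 3
    return math.log(2.0) + math.log(phi(x)) + math.log(Phi(DELTA * x))

# Crude maximisation over a grid (optim would do this properly in R)
xs = [i / 1000.0 for i in range(-2000, 4001)]
a = max(xs, key=log_p)

# Curvature by central finite differences: b = -1 / (d2 log p / dx2 at a)
h = 1e-4
d2 = (log_p(a + h) - 2.0 * log_p(a) + log_p(a - h)) / (h * h)
b = -1.0 / d2

print(a, b)  # Laplace approximation: normal with mean a and variance b
```

The resulting N(a, b) can then serve as the independence proposal q(·) in the Metropolis-Hastings sampler from the previous slides.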

62 Indirect method v Metropolis Hastings Some similarities: both require a proposal; both are more efficient when the proposal is a good approximation to the target; both involve some form of accepting/rejecting. Some differences: the indirect method produces an independent sample, while MH samples are correlated; the indirect method requires p(x)/q(x) to be finite for all x; MH works better when x is a high-dimensional vector. Why?

63 Multiple variables Suppose we now want to sample from a bivariate distribution p(x, z). The ideas involved in this section work for more than two variables. It is possible to do a 2-dimensional random walk proposal; however, as the number of variables goes up, the acceptance rate becomes lower. Also, the Laplace approximation does not work as well in high dimensions, and indirect methods of simulation suffer from the same problem. We need a way to break the problem down.

64 Method of composition Markov chain methods allow us to break multivariate distributions down. If it is easy to generate from p(x), then the best way is the method of composition: generate x[i] ~ p(x), then z[i] ~ p(z | x = x[i]). Sometimes p(x) is difficult to get.

65 Gibbs Sampler If it is easy to simulate from the conditional distribution p(x | z), then that can be used as a proposal. What is the acceptance ratio? α = min(1, [p(x[new], z) p(x[old] | z)] / [p(x[old], z) p(x[new] | z)]) = min(1, [p(x[new] | z) p(z) p(x[old] | z)] / [p(x[old] | z) p(z) p(x[new] | z)]) = 1 Every proposal is accepted.

66 Gibbs Sampler This gives the Gibbs sampler: generate x[i+1] ~ p(x | z[i]), generate z[i+1] ~ p(z | x[i+1]), and repeat. x and z can be swapped around, and it works for more than two variables. Always make sure the conditioning variables are at the current state.
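The two-step scheme above can be sketched on a standard example not taken from the slides: a bivariate normal with correlation ρ, whose full conditionals are both normal (Python rather than R; names and lengths are illustrative):

```python
import math
import random

random.seed(7)

RHO = 0.8  # correlation of the illustrative bivariate normal target

def gibbs(n_iter, burn=1_000):
    """Gibbs sampler for a standard bivariate normal with correlation RHO.

    Both full conditionals are normal: x | z ~ N(RHO * z, 1 - RHO^2),
    and symmetrically for z | x.
    """
    x, z = 0.0, 0.0
    cond_sd = math.sqrt(1.0 - RHO * RHO)
    draws = []
    for i in range(n_iter):
        x = random.gauss(RHO * z, cond_sd)  # x[i+1] ~ p(x | z[i])
        z = random.gauss(RHO * x, cond_sd)  # z[i+1] ~ p(z | x[i+1]): current state!
        if i >= burn:
            draws.append((x, z))
    return draws

draws = gibbs(20_000)
n = len(draws)
corr = sum(x * z for x, z in draws) / n  # means are ~0, variances ~1
```

Note the second draw conditions on the freshly updated x[i+1], not the old x[i]; using stale values breaks the sampler.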

67 Metropolis within Gibbs Even if the individual conditional distributions are not easy to simulate from, Metropolis-Hastings can be used within each Gibbs step. This works very well because it breaks down a multivariate problem into smaller univariate problems. We will practice some of these algorithms in the context of Bayesian inference.

68 Summary You should be familiar with a Markov chain. You should understand that it can have a stationary distribution. You should have a basic understanding of Metropolis-Hastings and its special cases: random walk Metropolis, the Laplace approximation (as a proposal), and the Gibbs sampler.


More information

Markov Chain Monte Carlo Inference. Siamak Ravanbakhsh Winter 2018

Markov Chain Monte Carlo Inference. Siamak Ravanbakhsh Winter 2018 Graphical Models Markov Chain Monte Carlo Inference Siamak Ravanbakhsh Winter 2018 Learning objectives Markov chains the idea behind Markov Chain Monte Carlo (MCMC) two important examples: Gibbs sampling

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo Markov Chain Monte Carlo (and Bayesian Mixture Models) David M. Blei Columbia University October 14, 2014 We have discussed probabilistic modeling, and have seen how the posterior distribution is the critical

More information

16 : Markov Chain Monte Carlo (MCMC)

16 : Markov Chain Monte Carlo (MCMC) 10-708: Probabilistic Graphical Models 10-708, Spring 2014 16 : Markov Chain Monte Carlo MCMC Lecturer: Matthew Gormley Scribes: Yining Wang, Renato Negrinho 1 Sampling from low-dimensional distributions

More information

Metropolis Hastings. Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601. Module 9

Metropolis Hastings. Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601. Module 9 Metropolis Hastings Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601 Module 9 1 The Metropolis-Hastings algorithm is a general term for a family of Markov chain simulation methods

More information

Markov chain Monte Carlo methods in atmospheric remote sensing

Markov chain Monte Carlo methods in atmospheric remote sensing 1 / 45 Markov chain Monte Carlo methods in atmospheric remote sensing Johanna Tamminen johanna.tamminen@fmi.fi ESA Summer School on Earth System Monitoring and Modeling July 3 Aug 11, 212, Frascati July,

More information

Reminder of some Markov Chain properties:

Reminder of some Markov Chain properties: Reminder of some Markov Chain properties: 1. a transition from one state to another occurs probabilistically 2. only state that matters is where you currently are (i.e. given present, future is independent

More information

Bayesian Phylogenetics:

Bayesian Phylogenetics: Bayesian Phylogenetics: an introduction Marc A. Suchard msuchard@ucla.edu UCLA Who is this man? How sure are you? The one true tree? Methods we ve learned so far try to find a single tree that best describes

More information

Kobe University Repository : Kernel

Kobe University Repository : Kernel Kobe University Repository : Kernel タイトル Title 著者 Author(s) 掲載誌 巻号 ページ Citation 刊行日 Issue date 資源タイプ Resource Type 版区分 Resource Version 権利 Rights DOI URL Note on the Sampling Distribution for the Metropolis-

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture February Arnaud Doucet

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture February Arnaud Doucet Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 13-28 February 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Limitations of Gibbs sampling. Metropolis-Hastings algorithm. Proof

More information

MCMC Review. MCMC Review. Gibbs Sampling. MCMC Review

MCMC Review. MCMC Review. Gibbs Sampling. MCMC Review MCMC Review http://jackman.stanford.edu/mcmc/icpsr99.pdf http://students.washington.edu/fkrogsta/bayes/stat538.pdf http://www.stat.berkeley.edu/users/terry/classes/s260.1998 /Week9a/week9a/week9a.html

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

Markov chain Monte Carlo Lecture 9

Markov chain Monte Carlo Lecture 9 Markov chain Monte Carlo Lecture 9 David Sontag New York University Slides adapted from Eric Xing and Qirong Ho (CMU) Limitations of Monte Carlo Direct (unconditional) sampling Hard to get rare events

More information

CS281A/Stat241A Lecture 22

CS281A/Stat241A Lecture 22 CS281A/Stat241A Lecture 22 p. 1/4 CS281A/Stat241A Lecture 22 Monte Carlo Methods Peter Bartlett CS281A/Stat241A Lecture 22 p. 2/4 Key ideas of this lecture Sampling in Bayesian methods: Predictive distribution

More information

Eco517 Fall 2013 C. Sims MCMC. October 8, 2013

Eco517 Fall 2013 C. Sims MCMC. October 8, 2013 Eco517 Fall 2013 C. Sims MCMC October 8, 2013 c 2013 by Christopher A. Sims. This document may be reproduced for educational and research purposes, so long as the copies contain this notice and are retained

More information

Adaptive Monte Carlo methods

Adaptive Monte Carlo methods Adaptive Monte Carlo methods Jean-Michel Marin Projet Select, INRIA Futurs, Université Paris-Sud joint with Randal Douc (École Polytechnique), Arnaud Guillin (Université de Marseille) and Christian Robert

More information

Advanced Statistical Modelling

Advanced Statistical Modelling Markov chain Monte Carlo (MCMC) Methods and Their Applications in Bayesian Statistics School of Technology and Business Studies/Statistics Dalarna University Borlänge, Sweden. Feb. 05, 2014. Outlines 1

More information

Bayesian Inference and MCMC

Bayesian Inference and MCMC Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the

More information

Lecture 10: Powers of Matrices, Difference Equations

Lecture 10: Powers of Matrices, Difference Equations Lecture 10: Powers of Matrices, Difference Equations Difference Equations A difference equation, also sometimes called a recurrence equation is an equation that defines a sequence recursively, i.e. each

More information

Doing Bayesian Integrals

Doing Bayesian Integrals ASTR509-13 Doing Bayesian Integrals The Reverend Thomas Bayes (c.1702 1761) Philosopher, theologian, mathematician Presbyterian (non-conformist) minister Tunbridge Wells, UK Elected FRS, perhaps due to

More information

MCMC: Markov Chain Monte Carlo

MCMC: Markov Chain Monte Carlo I529: Machine Learning in Bioinformatics (Spring 2013) MCMC: Markov Chain Monte Carlo Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2013 Contents Review of Markov

More information

Generating the Sample

Generating the Sample STAT 80: Mathematical Statistics Monte Carlo Suppose you are given random variables X,..., X n whose joint density f (or distribution) is specified and a statistic T (X,..., X n ) whose distribution you

More information

STAT 4385 Topic 01: Introduction & Review

STAT 4385 Topic 01: Introduction & Review STAT 4385 Topic 01: Introduction & Review Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 Outline Welcome What is Regression Analysis? Basics

More information

Markov chain Monte Carlo

Markov chain Monte Carlo Markov chain Monte Carlo Peter Beerli October 10, 2005 [this chapter is highly influenced by chapter 1 in Markov chain Monte Carlo in Practice, eds Gilks W. R. et al. Chapman and Hall/CRC, 1996] 1 Short

More information

19 : Slice Sampling and HMC

19 : Slice Sampling and HMC 10-708: Probabilistic Graphical Models 10-708, Spring 2018 19 : Slice Sampling and HMC Lecturer: Kayhan Batmanghelich Scribes: Boxiang Lyu 1 MCMC (Auxiliary Variables Methods) In inference, we are often

More information

Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation. Luke Tierney Department of Statistics & Actuarial Science University of Iowa

Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation. Luke Tierney Department of Statistics & Actuarial Science University of Iowa Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation Luke Tierney Department of Statistics & Actuarial Science University of Iowa Basic Ratio of Uniforms Method Introduced by Kinderman and

More information

18 : Advanced topics in MCMC. 1 Gibbs Sampling (Continued from the last lecture)

18 : Advanced topics in MCMC. 1 Gibbs Sampling (Continued from the last lecture) 10-708: Probabilistic Graphical Models 10-708, Spring 2014 18 : Advanced topics in MCMC Lecturer: Eric P. Xing Scribes: Jessica Chemali, Seungwhan Moon 1 Gibbs Sampling (Continued from the last lecture)

More information

The Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision

The Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that

More information

Quantifying Uncertainty

Quantifying Uncertainty Sai Ravela M. I. T Last Updated: Spring 2013 1 Markov Chain Monte Carlo Monte Carlo sampling made for large scale problems via Markov Chains Monte Carlo Sampling Rejection Sampling Importance Sampling

More information

Introduction to Computational Biology Lecture # 14: MCMC - Markov Chain Monte Carlo

Introduction to Computational Biology Lecture # 14: MCMC - Markov Chain Monte Carlo Introduction to Computational Biology Lecture # 14: MCMC - Markov Chain Monte Carlo Assaf Weiner Tuesday, March 13, 2007 1 Introduction Today we will return to the motif finding problem, in lecture 10

More information

DAG models and Markov Chain Monte Carlo methods a short overview

DAG models and Markov Chain Monte Carlo methods a short overview DAG models and Markov Chain Monte Carlo methods a short overview Søren Højsgaard Institute of Genetics and Biotechnology University of Aarhus August 18, 2008 Printed: August 18, 2008 File: DAGMC-Lecture.tex

More information

STAT 425: Introduction to Bayesian Analysis

STAT 425: Introduction to Bayesian Analysis STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 2) Fall 2017 1 / 19 Part 2: Markov chain Monte

More information

Appendix 3. Markov Chain Monte Carlo and Gibbs Sampling

Appendix 3. Markov Chain Monte Carlo and Gibbs Sampling Appendix 3 Markov Chain Monte Carlo and Gibbs Sampling A constant theme in the development of statistics has been the search for justifications for what statisticians do Blasco (2001) Draft version 21

More information

MATH 56A SPRING 2008 STOCHASTIC PROCESSES

MATH 56A SPRING 2008 STOCHASTIC PROCESSES MATH 56A SPRING 008 STOCHASTIC PROCESSES KIYOSHI IGUSA Contents 4. Optimal Stopping Time 95 4.1. Definitions 95 4.. The basic problem 95 4.3. Solutions to basic problem 97 4.4. Cost functions 101 4.5.

More information

Business Statistics. Lecture 9: Simple Regression

Business Statistics. Lecture 9: Simple Regression Business Statistics Lecture 9: Simple Regression 1 On to Model Building! Up to now, class was about descriptive and inferential statistics Numerical and graphical summaries of data Confidence intervals

More information

Probability Theory and Simulation Methods

Probability Theory and Simulation Methods Feb 28th, 2018 Lecture 10: Random variables Countdown to midterm (March 21st): 28 days Week 1 Chapter 1: Axioms of probability Week 2 Chapter 3: Conditional probability and independence Week 4 Chapters

More information

Statistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling

Statistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling 1 / 27 Statistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling Melih Kandemir Özyeğin University, İstanbul, Turkey 2 / 27 Monte Carlo Integration The big question : Evaluate E p(z) [f(z)]

More information

The Ising model and Markov chain Monte Carlo

The Ising model and Markov chain Monte Carlo The Ising model and Markov chain Monte Carlo Ramesh Sridharan These notes give a short description of the Ising model for images and an introduction to Metropolis-Hastings and Gibbs Markov Chain Monte

More information

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling 10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel

More information

Markov chain Monte Carlo

Markov chain Monte Carlo Markov chain Monte Carlo Markov chain Monte Carlo (MCMC) Gibbs and Metropolis Hastings Slice sampling Practical details Iain Murray http://iainmurray.net/ Reminder Need to sample large, non-standard distributions:

More information

Convergence Rate of Markov Chains

Convergence Rate of Markov Chains Convergence Rate of Markov Chains Will Perkins April 16, 2013 Convergence Last class we saw that if X n is an irreducible, aperiodic, positive recurrent Markov chain, then there exists a stationary distribution

More information

Lecture 8: The Metropolis-Hastings Algorithm

Lecture 8: The Metropolis-Hastings Algorithm 30.10.2008 What we have seen last time: Gibbs sampler Key idea: Generate a Markov chain by updating the component of (X 1,..., X p ) in turn by drawing from the full conditionals: X (t) j Two drawbacks:

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos Contents Markov Chain Monte Carlo Methods Sampling Rejection Importance Hastings-Metropolis Gibbs Markov Chains

More information

Markov Chain Monte Carlo, Numerical Integration

Markov Chain Monte Carlo, Numerical Integration Markov Chain Monte Carlo, Numerical Integration (See Statistics) Trevor Gallen Fall 2015 1 / 1 Agenda Numerical Integration: MCMC methods Estimating Markov Chains Estimating latent variables 2 / 1 Numerical

More information

Graphical Models and Kernel Methods

Graphical Models and Kernel Methods Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.

More information

Answers and expectations

Answers and expectations Answers and expectations For a function f(x) and distribution P(x), the expectation of f with respect to P is The expectation is the average of f, when x is drawn from the probability distribution P E

More information

Bayesian Networks in Educational Assessment

Bayesian Networks in Educational Assessment Bayesian Networks in Educational Assessment Estimating Parameters with MCMC Bayesian Inference: Expanding Our Context Roy Levy Arizona State University Roy.Levy@asu.edu 2017 Roy Levy MCMC 1 MCMC 2 Posterior

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo Department of Statistics The University of Auckland https://www.stat.auckland.ac.nz/~brewer/ Emphasis I will try to emphasise the underlying ideas of the methods. I will not be teaching specific software

More information

LECTURE 15 Markov chain Monte Carlo

LECTURE 15 Markov chain Monte Carlo LECTURE 15 Markov chain Monte Carlo There are many settings when posterior computation is a challenge in that one does not have a closed form expression for the posterior distribution. Markov chain Monte

More information

Sampling Methods (11/30/04)

Sampling Methods (11/30/04) CS281A/Stat241A: Statistical Learning Theory Sampling Methods (11/30/04) Lecturer: Michael I. Jordan Scribe: Jaspal S. Sandhu 1 Gibbs Sampling Figure 1: Undirected and directed graphs, respectively, with

More information

Session 5B: A worked example EGARCH model

Session 5B: A worked example EGARCH model Session 5B: A worked example EGARCH model John Geweke Bayesian Econometrics and its Applications August 7, worked example EGARCH model August 7, / 6 EGARCH Exponential generalized autoregressive conditional

More information

Control Variates for Markov Chain Monte Carlo

Control Variates for Markov Chain Monte Carlo Control Variates for Markov Chain Monte Carlo Dellaportas, P., Kontoyiannis, I., and Tsourti, Z. Dept of Statistics, AUEB Dept of Informatics, AUEB 1st Greek Stochastics Meeting Monte Carlo: Probability

More information

General Construction of Irreversible Kernel in Markov Chain Monte Carlo

General Construction of Irreversible Kernel in Markov Chain Monte Carlo General Construction of Irreversible Kernel in Markov Chain Monte Carlo Metropolis heat bath Suwa Todo Department of Applied Physics, The University of Tokyo Department of Physics, Boston University (from

More information

Practical Numerical Methods in Physics and Astronomy. Lecture 5 Optimisation and Search Techniques

Practical Numerical Methods in Physics and Astronomy. Lecture 5 Optimisation and Search Techniques Practical Numerical Methods in Physics and Astronomy Lecture 5 Optimisation and Search Techniques Pat Scott Department of Physics, McGill University January 30, 2013 Slides available from http://www.physics.mcgill.ca/

More information

The Multivariate Gaussian Distribution [DRAFT]

The Multivariate Gaussian Distribution [DRAFT] The Multivariate Gaussian Distribution DRAFT David S. Rosenberg Abstract This is a collection of a few key and standard results about multivariate Gaussian distributions. I have not included many proofs,

More information

Math 456: Mathematical Modeling. Tuesday, April 9th, 2018

Math 456: Mathematical Modeling. Tuesday, April 9th, 2018 Math 456: Mathematical Modeling Tuesday, April 9th, 2018 The Ergodic theorem Tuesday, April 9th, 2018 Today 1. Asymptotic frequency (or: How to use the stationary distribution to estimate the average amount

More information

Statistical Distribution Assumptions of General Linear Models

Statistical Distribution Assumptions of General Linear Models Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions

More information

Markov Chain Monte Carlo Lecture 6

Markov Chain Monte Carlo Lecture 6 Sequential parallel tempering With the development of science and technology, we more and more need to deal with high dimensional systems. For example, we need to align a group of protein or DNA sequences

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

A quick introduction to Markov chains and Markov chain Monte Carlo (revised version)

A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) Rasmus Waagepetersen Institute of Mathematical Sciences Aalborg University 1 Introduction These notes are intended to

More information

Convex Optimization CMU-10725

Convex Optimization CMU-10725 Convex Optimization CMU-10725 Simulated Annealing Barnabás Póczos & Ryan Tibshirani Andrey Markov Markov Chains 2 Markov Chains Markov chain: Homogen Markov chain: 3 Markov Chains Assume that the state

More information

CSC 2541: Bayesian Methods for Machine Learning

CSC 2541: Bayesian Methods for Machine Learning CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 3 More Markov Chain Monte Carlo Methods The Metropolis algorithm isn t the only way to do MCMC. We ll

More information

Chapter 5 continued. Chapter 5 sections

Chapter 5 continued. Chapter 5 sections Chapter 5 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions

More information

Lecture 10. Announcement. Mixture Models II. Topics of This Lecture. This Lecture: Advanced Machine Learning. Recap: GMMs as Latent Variable Models

Lecture 10. Announcement. Mixture Models II. Topics of This Lecture. This Lecture: Advanced Machine Learning. Recap: GMMs as Latent Variable Models Advanced Machine Learning Lecture 10 Mixture Models II 30.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ Announcement Exercise sheet 2 online Sampling Rejection Sampling Importance

More information

Markov Chain Monte Carlo The Metropolis-Hastings Algorithm

Markov Chain Monte Carlo The Metropolis-Hastings Algorithm Markov Chain Monte Carlo The Metropolis-Hastings Algorithm Anthony Trubiano April 11th, 2018 1 Introduction Markov Chain Monte Carlo (MCMC) methods are a class of algorithms for sampling from a probability

More information