Markov chain Monte Carlo
Feng Li
School of Statistics and Mathematics
Central University of Finance and Economics
Revised on April 24, 2017
Today we are going to learn:
1. Markov Chains
2. The Metropolis Algorithm
3. Metropolis-Hastings
4. Multiple variables
Feng Li (SAM.CUFE.EDU.CN), Statistical Computing
Markov Chains
The goal of today's lecture is to learn about the Metropolis-Hastings algorithm. The Metropolis-Hastings algorithm allows us to simulate from any distribution as long as we have the kernel of the density of the distribution. To understand the Metropolis-Hastings algorithm, we must first learn a little bit about Markov chains.
Basic Probability Rules
Law of conditional probability:
Pr(A = a, B = b) = Pr(A = a | B = b) Pr(B = b)   (1)
More generally, conditioning on a third variable:
Pr(A = a, B = b | C = c) = Pr(A = a | B = b, C = c) × Pr(B = b | C = c)   (2)
Basic Probability Rules
Marginalizing (for a discrete variable):
Pr(A = a) = Σ_b Pr(A = a, B = b)   (3)
More generally:
Pr(A = a | C = c) = Σ_b Pr(A = a, B = b | C = c)   (4)
Independence
Two variables are independent if
Pr(A = a, B = b) = Pr(A = a) Pr(B = b)   (5)
Dividing both sides by Pr(B = b) gives
Pr(A = a | B = b) = Pr(A = a)   (6)
Conditional Independence
Two variables A and B are conditionally independent given C if
Pr(A = a, B = b | C = c) = Pr(A = a | C = c) × Pr(B = b | C = c)   (7)
Dividing both sides by Pr(B = b | C = c) gives
Pr(A = a | B = b, C = c) = Pr(A = a | C = c)   (8)
A simple game
Player A and Player B play a game. The probability that Player A wins each game is 0.6 and the probability that Player B wins each game is 0.4. They play the game N times. Each game is independent. Let
X_i = 0 if Player A wins game i
X_i = 1 if Player B wins game i
Also assume there is an initial game called Game 0 (X_0).
Some simple questions
What is the probability that Player A wins Game 1 (X_1 = 0) if
X_0 = 0 (Player A wins Game 0)?
X_0 = 1 (Player B wins Game 0)?
What is the probability that Player A wins Game 2 (X_2 = 0) if
X_0 = 0 (Player A wins Game 0)?
X_0 = 1 (Player B wins Game 0)?
Since each game is independent, all answers are 0.6.
A different game: a Markov chain
Now assume that both players have a better chance of winning Game i+1 if they already won Game i:
Pr(X_{i+1} = 0 | X_i = 0) = 0.8   (9)
Pr(X_{i+1} = 1 | X_i = 1) = 0.7   (10)
Assume nothing other than Game i has a direct effect on Game i+1. This is called the Markov property. Mathematically,
Pr(X_{i+1} | X_i, X_{i-1}, ..., X_1, X_0) = Pr(X_{i+1} | X_i)   (11)
Markov Property
Another way to define the Markov property is to notice that X_{i+1} and X_{i-1}, ..., X_0 are independent conditional on X_i. This may be a model for the stock market: all the valuable information about tomorrow's stock price is contained in today's price. This is related to the Efficient Market Hypothesis, a popular theory in finance. Now back to the simple game.
Simulating from a Markov chain
Now let's simulate a sequence X_1, X_2, ..., X_100 from the Markov chain. Initialize at x_0 = 0. Then inside a loop, code the following using if:
if X_i = 0 then X_{i+1} = 0 with probability 0.8, or 1 with probability 0.2
if X_i = 1 then X_{i+1} = 0 with probability 0.3, or 1 with probability 0.7
Try it.
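One way the loop above might look in R (a minimal sketch; the variable names are my own, the transition probabilities are the ones defined earlier):

```r
# Two-state Markov chain: Pr(X_{i+1}=0 | X_i=0) = 0.8, Pr(X_{i+1}=1 | X_i=1) = 0.7
set.seed(1)
n <- 100
x <- numeric(n + 1)
x[1] <- 0  # x_0 = 0, i.e. Player A wins Game 0
for (i in 1:n) {
  if (x[i] == 0) {
    x[i + 1] <- sample(c(0, 1), size = 1, prob = c(0.8, 0.2))
  } else {
    x[i + 1] <- sample(c(0, 1), size = 1, prob = c(0.3, 0.7))
  }
}
table(x)  # how often each state occurred
```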
[Figure: Markov chain — the simulated sequence x_i plotted against i]
Simple questions again
What is the probability that Player A wins the first game (i.e. X_1 = 0) if
X_0 = 0 (Player A wins initial game)?
X_0 = 1 (Player B wins initial game)?
The answers are 0.8 and 0.3.
What is the probability that Player A wins the second game (X_2 = 0) if
X_0 = 0 (Player A wins initial game)?
X_0 = 1 (Player B wins initial game)?
Solution
Let X_0 = 0. Then
Pr(X_2 = 0 | X_0 = 0) = Σ_{x_1 = 0,1} Pr(X_2 = 0, X_1 = x_1 | X_0 = 0)
  = Σ_{x_1 = 0,1} Pr(X_2 = 0 | X_1 = x_1, X_0 = 0) Pr(X_1 = x_1 | X_0 = 0)
  = Σ_{x_1 = 0,1} Pr(X_2 = 0 | X_1 = x_1) Pr(X_1 = x_1 | X_0 = 0)
  = 0.8 × 0.8 + 0.3 × 0.2 = 0.7
What if X_0 = 1?
Recursion
Notice that the distribution of X_i depends on X_0. The sequence is no longer independent. How could you compute Pr(X_n = 0 | X_0 = 0) when n = 3, when n = 5, when n = 100? This is hard, but the Markov property does make things simpler. We can use a recursion to compute the probability that Player A wins any game.
Recursion
Note that
Pr(X_i = 0 | X_0 = 0) = Σ_{x_{i-1}} Pr(X_i = 0, X_{i-1} = x_{i-1} | X_0 = 0)
  = Σ_{x_{i-1}} Pr(X_i = 0 | X_{i-1} = x_{i-1}, X_0 = 0) Pr(X_{i-1} = x_{i-1} | X_0 = 0)
  = Σ_{x_{i-1}} Pr(X_i = 0 | X_{i-1} = x_{i-1}) Pr(X_{i-1} = x_{i-1} | X_0 = 0)
We already applied this formula when i = 2. We can continue for i = 3, 4, 5, ..., n.
Recursion
Pr(X_i = 0 | X_0 = 0) = Σ_{x_{i-1}} Pr(X_i = 0 | X_{i-1} = x_{i-1}) Pr(X_{i-1} = x_{i-1} | X_0 = 0)
Start with Pr(X_1 = 0 | X_0 = 0)
Get Pr(X_1 = 1 | X_0 = 0)
Use these in the formula with i = 2
Get Pr(X_2 = 0 | X_0 = 0)
Get Pr(X_2 = 1 | X_0 = 0)
Use these in the formula with i = 3
Get Pr(X_3 = 0 | X_0 = 0)
...
Matrix Form
It is much easier to do this calculation in matrix form (especially when X is not binary). Let P be the transition matrix

              X_i = 0                       X_i = 1
X_{i-1} = 0   Pr(X_i = 0 | X_{i-1} = 0)    Pr(X_i = 1 | X_{i-1} = 0)
X_{i-1} = 1   Pr(X_i = 0 | X_{i-1} = 1)    Pr(X_i = 1 | X_{i-1} = 1)
Matrix Form
In our example:

              X_i = 0   X_i = 1
X_{i-1} = 0   0.8       0.2
X_{i-1} = 1   0.3       0.7

P = ( 0.8  0.2
      0.3  0.7 )   (12)
Matrix Form
Let π_i be a 1 × 2 row vector which denotes the probabilities of each player winning Game i conditional on the initial game:
π_i = (Pr(X_i = 0 | X_0), Pr(X_i = 1 | X_0))   (13)
In our example, if X_0 = 0,
π_1 = (0.8, 0.2)   (14)
and if X_0 = 1,
π_1 = (0.3, 0.7)   (15)
Recursion in Matrix Form
The recursion formula is
π_i = π_{i-1} P   (16)
Therefore
π_n = π_1 P × P × ... × P   (17)
Now code this up in R. What is Pr(X_n = 0 | X_0 = 0) when n = 3? n = 5? n = 100? Do the same when X_0 = 1.
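The matrix recursion can be sketched in R as follows (the function name `prob_game` is my own):

```r
# Transition matrix from the example
P <- matrix(c(0.8, 0.2,
              0.3, 0.7), nrow = 2, byrow = TRUE)

# pi_n given the initial condition: pi_0 = (1, 0) if X_0 = 0, (0, 1) if X_0 = 1
prob_game <- function(pi0, n) {
  pi <- matrix(pi0, nrow = 1)
  for (i in seq_len(n)) pi <- pi %*% P   # pi_i = pi_{i-1} P
  pi
}
prob_game(c(1, 0), 3)    # Pr(X_3 = 0 | X_0 = 0) is the first entry: 0.65
prob_game(c(1, 0), 100)  # both starting points end up near (0.6, 0.4)
prob_game(c(0, 1), 100)
```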
Convergence?
For n = 3 and n = 5, the starting point made a big difference. For n = 100 it did not make a big difference. Could this Markov chain be converging to something? Now write code to keep the values of π_i for i = 1, 2, ..., 100. Then plot the values of π_{i1} against i.
[Figure: Convergence — Pr(X_i = 0) plotted against i, for X_0 = 0 (top panel) and X_0 = 1 (bottom panel)]
More Questions
What is Pr(X_100 = 0 | X_0 = 0)? What is Pr(X_100 = 0 | X_0 = 1)? What is Pr(X_1000 = 0 | X_0 = 0)? What is Pr(X_1000 = 0 | X_0 = 1)? The answer to all of these is 0.6. The X_i themselves do not converge; they keep changing from 0 to 1. The Markov chain, however, converges to a stationary distribution.
Simulation with a Markov chain
Go back to your code for generating a Markov chain and generate a long chain. Exclude the first values of X_i as burn-in and keep the remaining values. How many X_i = 0? How many X_i = 1? We have discovered a new way to simulate from a distribution with Pr(X_i = 0) = 0.6 and Pr(X_i = 1) = 0.4.
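A sketch of this experiment in R. The slide's exact chain length and burn-in were lost in transcription, so 110,000 steps with the first 10,000 dropped are my own choices:

```r
set.seed(2)
n <- 110000
x <- numeric(n)
x[1] <- 0
for (i in 2:n) {
  p0 <- if (x[i - 1] == 0) 0.8 else 0.3  # Pr(X_i = 0 | X_{i-1})
  x[i] <- sample(c(0, 1), size = 1, prob = c(p0, 1 - p0))
}
kept <- x[-(1:10000)]  # discard the burn-in
mean(kept == 0)        # close to the stationary probability 0.6
```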
Markov Chains
Sometimes two different Markov chains converge to the same stationary distribution. See what happens when
P = ( )   (18)
Sometimes Markov chains do not converge to a stationary distribution at all. Some Markov chains can get stuck in an absorbing state. For example, what would the simple example look like if Pr(X_{i+1} = 0 | X_i = 0) = 1? Markov chains can be defined on continuous support as well; X_i can be continuous.
Some important points
This is a very complicated way to generate from a simple distribution; for the binary example the direct method would be better. However, for other examples, neither the direct method nor the accept/reject algorithm works. In these cases we can construct a Markov chain whose stationary distribution is our target distribution. All we need is the kernel of the density function, and an algorithm called the Metropolis algorithm.
Normalizing Constant and Kernel
What are the normalizing constant and kernel of the Beta density?
Beta(x; a, b) = [Γ(a + b) / (Γ(a)Γ(b))] x^(a-1) (1 - x)^(b-1)   (19)
The Metropolis algorithm
The Metropolis algorithm was developed in a 1953 paper by Metropolis, Rosenbluth, Rosenbluth, Teller and Teller. The aim is to simulate x ~ p(x), where p(x) is called the target density. We will need a proposal density q(x_old → x_new). For example, one choice of q is
x_new ~ N(x_old, 1)   (20)
This is called a random walk proposal.
Symmetric proposal
An important property of q in the Metropolis algorithm is symmetry of the proposal:
q(x_old → x_new) = q(x_new → x_old)   (21)
Later we will not need this assumption. Can you confirm this is true for x_new ~ N(x_old, 1)? Can you simulate from this random walk (use x_0 = 0 as a starting value)?
Proof of symmetry of random walk
The proposal satisfies
q(x_old → x_new) = (2π)^(-1/2) exp{ -(1/2) (x_new - x_old)² }   (22)
  = (2π)^(-1/2) exp{ -(1/2) [-(x_new - x_old)]² }   (23)
  = (2π)^(-1/2) exp{ -(1/2) (x_old - x_new)² }   (24)
  = q(x_new → x_old)   (25)
Accept and reject
By itself the random walk will not converge to anything. To make sure this Markov chain converges to our target, we need to include the following. At step i+1, set x_old = x^[i]. Generate x_new ~ N(x_old, 1) and compute
α = min( 1, p(x_new) / p(x_old) )   (26)
Then:
Set x^[i+1] to x_new with probability α (accept)
Set x^[i+1] to x_old with probability 1 - α (reject)
Code it up
Use the Metropolis algorithm with a random walk proposal to simulate a sample from the standard t distribution with 5 df. The target density kernel is
p(x) ∝ [1 + x²/5]^(-3)   (27)
Simulate a long Markov chain using the random walk as a proposal. The first block of iterates is the burn-in and will be left out because the Markov chain may not have converged yet. Use x_0 = 0 as a starting value.
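One way this exercise might be coded in R (a sketch; the chain length of 50,000 and burn-in of 5,000 are my own choices, since the slide's numbers were lost):

```r
set.seed(3)
# Log-kernel of the standard t density with 5 df (normalizing constant omitted)
log_p <- function(x) -3 * log(1 + x^2 / 5)

rw_metropolis <- function(n, x0, prop_sd) {
  x <- numeric(n)
  x[1] <- x0
  for (i in 2:n) {
    x_new <- rnorm(1, mean = x[i - 1], sd = prop_sd)
    log_alpha <- log_p(x_new) - log_p(x[i - 1])  # log of min(1, p_new/p_old)
    x[i] <- if (log(runif(1)) < log_alpha) x_new else x[i - 1]
  }
  x
}
draws <- rw_metropolis(50000, x0 = 0, prop_sd = 1)
kept  <- draws[-(1:5000)]   # drop the burn-in
c(mean(kept), var(kept))    # roughly 0 and 5/3 (the t_5 mean and variance)
```

Working on the log scale avoids overflow when the kernel values are tiny far out in the tails.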
Some diagnostics: convergence
There are a few diagnostics we can use to investigate the behaviour of the chain. One is a trace plot (including burn-in), which is simply a line plot of the iterates. Plot this for your Markov chain. Another diagnostic is the Geweke diagnostic, which can be found in the R package coda. The Geweke diagnostic tests the equality of the means of two different parts of the chain (excluding burn-in). The test statistic has a standard normal distribution. Rejecting this test is evidence that the chain has not converged.
[Figure: trace plot — initial value 0, proposal s.d. 1; iterations 0 to 1e+05]
The effect of starting value
Now rerun the code with a starting value of x_0 = 100. Does the chain still converge? Does it converge quicker or slower?
[Figure: trace plot — initial value 0, proposal s.d. 1; iterations 0 to 1e+05]
[Figure: trace plot — initial value 100, proposal s.d. 1; iterations 0 to 1e+05]
The effect of proposal variance
Keep the starting value of x_0 = 100. Now change the standard deviation of the proposal to 3. Does the chain still converge? Does it converge quicker or slower?
[Figure: trace plot — initial value 100, proposal s.d. 1; iterations 0 to 1e+05]
[Figure: trace plot — initial value 100, proposal s.d. 3; iterations 0 to 1e+05]
Huge proposal variance
Maybe you think the best strategy is to choose a huge standard deviation. Try a proposal standard deviation of 100 and plot a trace plot of the chain. The chain is rejecting many proposed iterates; this must be inefficient. Change your code to compute the percentage of times a new iterate is accepted (excluding burn-in), using the same initial value. What is the acceptance rate when the proposal standard deviation is 1? What is the acceptance rate when the proposal standard deviation is 100?
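A sketch of the acceptance-rate computation in R, for the t_5 target of the earlier exercise (function name and chain length are my own):

```r
set.seed(4)
log_p <- function(x) -3 * log(1 + x^2 / 5)  # t_5 log-kernel

acceptance_rate <- function(n, x0, prop_sd) {
  x <- x0
  accepted <- 0
  for (i in 1:n) {
    x_new <- rnorm(1, x, prop_sd)
    if (log(runif(1)) < log_p(x_new) - log_p(x)) {
      x <- x_new
      accepted <- accepted + 1
    }
  }
  accepted / n
}
r1   <- acceptance_rate(20000, 0, 1)    # moderate acceptance
r100 <- acceptance_rate(20000, 0, 100)  # almost everything rejected
c(r1, r100)
```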
[Figure: trace plot — initial value 100, proposal s.d. 100; iterations 0 to 1e+05]
Acceptance Rate
If the proposal variance is too high:
Values will be proposed that are too far into the tails of the stationary distribution.
The Metropolis algorithm will mostly reject these values.
The sample will still come from the correct target distribution, but this is a very inefficient way to sample.
What happens if a very small proposal variance is used? Try a proposal standard deviation of 0.001. What is the acceptance rate?
[Figure: trace plot — initial value 0, proposal s.d. 0.001; iterations 0 to 1e+05]
Acceptance Rate
The acceptance rate is almost 1. However, is this a good proposal? The jumps made by this proposal are too small and do not sample enough iterates from the tails of the distribution. If it runs long enough the Markov chain will provide a sample from the target distribution, but it is very inefficient.
Random Walk proposal
For a random walk proposal it is not good to have an acceptance rate that is too high or too low. What exactly is too high and too low? It depends on many things, including the target and the proposal. A rough rule is to aim for an acceptance rate between 20% and 70%. If your acceptance rate is outside this range, the proposal variance can be doubled or halved. There are better (but more complicated) ways to do this.
Monte Carlo Error
Now that there is a sample X^[1], X^[2], ..., X^[M] ~ p(x), what can it be used for? We can estimate the expected value E(X). This can be done by taking
E(X) ≈ (1/M) Σ_{i=1}^M X^[i]
Note we use ≈ instead of =. There is some error since we are estimating E(X) based on a sample. Luckily we can make this smaller by generating a bigger sample. We call this Monte Carlo error.
Measuring Monte Carlo Error
One way to measure Monte Carlo error is the variance of the sample mean. For an independent sample,
Var( (1/M) Σ_{i=1}^M X^[i] ) = (1/M²) Var( Σ_{i=1}^M X^[i] )
  = (1/M²) Σ_{i=1}^M Var(X^[i])
  = Var(X) / M
A sample from a Markov chain, however, is correlated.
Measuring Monte Carlo Error
For a correlated sample the covariance terms do not vanish:
Var( (1/M) Σ_{i=1}^M X^[i] ) = (1/M²) Var( Σ_{i=1}^M X^[i] )
  = (1/M²) [ Σ_{i=1}^M Var(X^[i]) + 2 Σ_{i=1}^M Σ_{j>i} cov(X^[i], X^[j]) ]
  = Var(X)/M + (2/M²) Σ_{i=1}^M Σ_{j>i} cov(X^[i], X^[j])
A sample from a Markov chain is correlated.
Monte Carlo efficiency
It is better to have lower correlation in the Markov chain. The efficiency of the chain can be measured using the effective sample size. The effective sample size can be computed using the function effectiveSize in the R package coda. Obtain an effective sample size for your sample (excluding burn-in) where
Proposal S.D. = 1
Proposal S.D. = 5
One of my answers was about 6000, and yours should be close to that.
Interpret Effective Sample Size
What does it mean to say a Monte Carlo sample has an effective sample size (ESS) of just 6000? The sampling error of the correlated sample is equal to the sampling error of an independent sample of size 6000. Mathematically,
Var( (1/M) Σ_{i=1}^M X^[i] ) = Var(X)/M + (2/M²) Σ_{i=1}^M Σ_{j>i} cov(X^[i], X^[j]) = Var(X) / M_eff
It is a useful diagnostic for comparing two different proposal variances. A higher ESS implies a more efficient scheme.
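The idea behind the ESS can be sketched in base R by truncating the sum of sample autocorrelations. The AR(1) test chain and the truncation rule below are my own illustration, not coda's exact algorithm:

```r
set.seed(5)
# AR(1) chain with autocorrelation 0.9; its theoretical ESS is M(1-rho)/(1+rho)
rho <- 0.9
M <- 100000
x <- numeric(M)
for (i in 2:M) x[i] <- rho * x[i - 1] + rnorm(1)

ess <- function(x, max_lag = 1000) {
  rho_k <- acf(x, lag.max = max_lag, plot = FALSE)$acf[-1]  # drop lag 0
  rho_k <- rho_k[cumsum(rho_k < 0) == 0]  # keep lags up to the first negative one
  length(x) / (1 + 2 * sum(rho_k))        # M_eff = M / (1 + 2 * sum of rho_k)
}
ess(x)  # close to 100000 * 0.1 / 1.9, i.e. about 5300
```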
Non-Symmetric proposal
In 1970, Hastings proposed an extension to the Metropolis algorithm. This allows for the case when
q(x_old → x_new) ≠ q(x_new → x_old)   (28)
The only thing that changes is the acceptance probability:
α = min( 1, [p(x_new) q(x_new → x_old)] / [p(x_old) q(x_old → x_new)] )   (29)
This is called the Metropolis-Hastings algorithm.
An interesting proposal
Suppose we use the proposal
x_new ~ N(0, √(5/3))   (30)
What is q(x_old → x_new)? It is q(x_new), where q(.) is the density of a N(0, √(5/3)). Is this symmetric? No, since generally q(x_new) ≠ q(x_old).
Metropolis-Hastings
Code this where p(.) is the standard t density with 5 df, and q(.) is normal with mean 0 and standard deviation √(5/3). Inside a loop:
Generate x_new ~ N(0, √(5/3))
Set x_old = x^[i] and compute
α = min( 1, [p(x_new) q(x_old)] / [p(x_old) q(x_new)] )   (31)
Set x^[i+1] to x_new with probability α (accept)
Set x^[i+1] to x_old with probability 1 - α (reject)
Try it.
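A sketch of this independence Metropolis-Hastings sampler in R (chain length is my own choice):

```r
set.seed(6)
log_p <- function(x) -3 * log(1 + x^2 / 5)                 # t_5 kernel
log_q <- function(x) dnorm(x, 0, sqrt(5 / 3), log = TRUE)  # proposal density

n <- 50000
x <- numeric(n)  # start at x_0 = 0
for (i in 2:n) {
  x_new <- rnorm(1, 0, sqrt(5 / 3))
  # log of alpha in (31): p(new) q(old) / [p(old) q(new)]
  log_alpha <- (log_p(x_new) + log_q(x[i - 1])) -
               (log_p(x[i - 1]) + log_q(x_new))
  x[i] <- if (log(runif(1)) < log_alpha) x_new else x[i - 1]
}
c(mean(x), var(x))  # roughly 0 and 5/3
```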
Comparison
The effective sample size of this proposal is much higher than that of the best random walk proposal. Why does it work so well? The standard t distribution with 5 df has a mean of 0 and a standard deviation of √(5/3). So the N(0, √(5/3)) is a good approximation to the standard t with 5 df.
Laplace Approximation
Using a Taylor expansion of ln p(x) around the point a:
ln p(x) ≈ ln p(a) + [∂ ln p(x)/∂x]|_{x=a} (x - a) + (1/2) [∂² ln p(x)/∂x²]|_{x=a} (x - a)²
Let a be the point that maximises ln p(x), and let
b = ( -∂² ln p(x)/∂x² |_{x=a} )^(-1)   (32)
The approximation is then
ln p(x) ≈ ln p(a) - (x - a)²/(2b)
Taking exponentials of both sides,
p(x) ≈ k × exp[ -(x - a)²/(2b) ]
Any distribution can be approximated by a normal distribution with mean a and variance b, where the values a and b can be found numerically if needed.
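In R, a and b can be found numerically with optim and its Hessian option. Here is a sketch for the t_5 kernel used earlier, for which a = 0 and b = 5/6 can be checked analytically:

```r
log_p <- function(x) -3 * log(1 + x^2 / 5)  # t_5 log-kernel

# Minimise -log p(x); the Hessian of -log p at the mode equals 1/b
fit <- optim(par = 1, fn = function(x) -log_p(x),
             method = "BFGS", hessian = TRUE)
a <- fit$par                 # the mode, about 0
b <- 1 / fit$hessian[1, 1]   # the variance, about 5/6
c(a, b)
```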
Exercise: Generating from the skew normal
The density of the skew normal is
p(x) = 2 φ(x) Φ(δx)   (33)
where φ(x) is the density of the standard normal and Φ(x) is the distribution function of the standard normal. Using a combination of optim, dnorm and pnorm, find the Laplace approximation of the skew normal when δ = 3. Use it to generate a sample from the skew normal distribution using the Metropolis-Hastings algorithm.
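One possible solution sketch in R (the chain length is my own choice; the exercise leaves it open):

```r
set.seed(7)
delta <- 3
# Log skew-normal density: log 2 + log phi(x) + log Phi(delta * x)
log_p <- function(x) log(2) + dnorm(x, log = TRUE) + pnorm(delta * x, log.p = TRUE)

# Laplace approximation via optim: mode a and variance b
fit <- optim(par = 0, fn = function(x) -log_p(x), method = "BFGS", hessian = TRUE)
a <- fit$par
b <- 1 / fit$hessian[1, 1]

# Independence Metropolis-Hastings with the N(a, sqrt(b)) proposal
log_q <- function(x) dnorm(x, a, sqrt(b), log = TRUE)
n <- 50000
x <- numeric(n)
x[1] <- a
for (i in 2:n) {
  x_new <- rnorm(1, a, sqrt(b))
  log_alpha <- log_p(x_new) + log_q(x[i - 1]) - log_p(x[i - 1]) - log_q(x_new)
  x[i] <- if (log(runif(1)) < log_alpha) x_new else x[i - 1]
}
mean(x)  # the skew-normal mean is sqrt(2/pi) * delta / sqrt(1 + delta^2)
```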
Indirect method v Metropolis-Hastings
Some similarities are:
Both require a proposal.
Both are more efficient when the proposal is a good approximation to the target.
Both involve some form of accepting/rejecting.
Some differences are:
The indirect method produces an independent sample; MH samples are correlated.
The indirect method requires p(x)/q(x) to be finite for all x.
MH works better when x is a high-dimensional vector.
Why?
Multiple variables
Suppose we now want to sample from a bivariate distribution p(x, z). The ideas involved in this section work for more than two variables. It is possible to do a 2-dimensional random walk proposal. However, as the number of variables goes up, the acceptance rate becomes lower. Also, the Laplace approximation does not work as well in high dimensions. Indirect methods of simulation suffer from the same problem. We need a way to break the problem down.
Method of composition
Markov chain methods allow us to break multivariate distributions down. If it is easy to generate from p(x), then the best way is the method of composition:
Generate x^[i] ~ p(x)
Generate z^[i] ~ p(z | x = x^[i])
Sometimes, however, p(x) is difficult to get.
Gibbs Sampler
If it is easy to simulate from the conditional distribution p(x | z), then that can be used as a proposal. What is the acceptance ratio?
α = min( 1, [p(x_new, z) p(x_old | z)] / [p(x_old, z) p(x_new | z)] )
  = min( 1, [p(x_new | z) p(z) p(x_old | z)] / [p(x_old | z) p(z) p(x_new | z)] )
  = 1
Gibbs Sampler
This gives the Gibbs sampler:
Generate x^[i+1] ~ p(x | z^[i])
Generate z^[i+1] ~ p(z | x^[i+1])
Repeat.
x and z can be swapped around. It works for more than two variables. Always make sure the conditioning variables are at their current state.
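A concrete sketch of the scheme above in R: a Gibbs sampler for a bivariate normal with correlation 0.8, whose full conditionals are both univariate normal. The example is my own illustration, not one from the slides:

```r
set.seed(8)
# Bivariate normal with unit variances and correlation rho; the full conditionals are
# x | z ~ N(rho * z, 1 - rho^2)  and  z | x ~ N(rho * x, 1 - rho^2)
rho <- 0.8
n <- 20000
x <- z <- numeric(n)
for (i in 2:n) {
  x[i] <- rnorm(1, rho * z[i - 1], sqrt(1 - rho^2))  # condition on the current z
  z[i] <- rnorm(1, rho * x[i],     sqrt(1 - rho^2))  # condition on the NEW x
}
c(cor(x, z), var(x), var(z))  # roughly 0.8, 1, 1
```

Note how each draw conditions on the most recent value of the other variable, as the slide instructs.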
Metropolis within Gibbs
Even if the individual conditional distributions are not easy to simulate from, Metropolis-Hastings can be used within each Gibbs step. This works very well because it breaks a multivariate problem down into smaller univariate problems. We will practice some of these algorithms in the context of Bayesian inference.
Summary
You should be familiar with a Markov chain.
You should understand that it can have a stationary distribution.
You should have a basic understanding of Metropolis-Hastings and its special cases:
Random Walk Metropolis
Laplace approximation (as a proposal)
Gibbs sampler
More informationKobe University Repository : Kernel
Kobe University Repository : Kernel タイトル Title 著者 Author(s) 掲載誌 巻号 ページ Citation 刊行日 Issue date 資源タイプ Resource Type 版区分 Resource Version 権利 Rights DOI URL Note on the Sampling Distribution for the Metropolis-
More informationStat 535 C - Statistical Computing & Monte Carlo Methods. Lecture February Arnaud Doucet
Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 13-28 February 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Limitations of Gibbs sampling. Metropolis-Hastings algorithm. Proof
More informationMCMC Review. MCMC Review. Gibbs Sampling. MCMC Review
MCMC Review http://jackman.stanford.edu/mcmc/icpsr99.pdf http://students.washington.edu/fkrogsta/bayes/stat538.pdf http://www.stat.berkeley.edu/users/terry/classes/s260.1998 /Week9a/week9a/week9a.html
More informationSTA 294: Stochastic Processes & Bayesian Nonparametrics
MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a
More informationMarkov chain Monte Carlo Lecture 9
Markov chain Monte Carlo Lecture 9 David Sontag New York University Slides adapted from Eric Xing and Qirong Ho (CMU) Limitations of Monte Carlo Direct (unconditional) sampling Hard to get rare events
More informationCS281A/Stat241A Lecture 22
CS281A/Stat241A Lecture 22 p. 1/4 CS281A/Stat241A Lecture 22 Monte Carlo Methods Peter Bartlett CS281A/Stat241A Lecture 22 p. 2/4 Key ideas of this lecture Sampling in Bayesian methods: Predictive distribution
More informationEco517 Fall 2013 C. Sims MCMC. October 8, 2013
Eco517 Fall 2013 C. Sims MCMC October 8, 2013 c 2013 by Christopher A. Sims. This document may be reproduced for educational and research purposes, so long as the copies contain this notice and are retained
More informationAdaptive Monte Carlo methods
Adaptive Monte Carlo methods Jean-Michel Marin Projet Select, INRIA Futurs, Université Paris-Sud joint with Randal Douc (École Polytechnique), Arnaud Guillin (Université de Marseille) and Christian Robert
More informationAdvanced Statistical Modelling
Markov chain Monte Carlo (MCMC) Methods and Their Applications in Bayesian Statistics School of Technology and Business Studies/Statistics Dalarna University Borlänge, Sweden. Feb. 05, 2014. Outlines 1
More informationBayesian Inference and MCMC
Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the
More informationLecture 10: Powers of Matrices, Difference Equations
Lecture 10: Powers of Matrices, Difference Equations Difference Equations A difference equation, also sometimes called a recurrence equation is an equation that defines a sequence recursively, i.e. each
More informationDoing Bayesian Integrals
ASTR509-13 Doing Bayesian Integrals The Reverend Thomas Bayes (c.1702 1761) Philosopher, theologian, mathematician Presbyterian (non-conformist) minister Tunbridge Wells, UK Elected FRS, perhaps due to
More informationMCMC: Markov Chain Monte Carlo
I529: Machine Learning in Bioinformatics (Spring 2013) MCMC: Markov Chain Monte Carlo Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2013 Contents Review of Markov
More informationGenerating the Sample
STAT 80: Mathematical Statistics Monte Carlo Suppose you are given random variables X,..., X n whose joint density f (or distribution) is specified and a statistic T (X,..., X n ) whose distribution you
More informationSTAT 4385 Topic 01: Introduction & Review
STAT 4385 Topic 01: Introduction & Review Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 Outline Welcome What is Regression Analysis? Basics
More informationMarkov chain Monte Carlo
Markov chain Monte Carlo Peter Beerli October 10, 2005 [this chapter is highly influenced by chapter 1 in Markov chain Monte Carlo in Practice, eds Gilks W. R. et al. Chapman and Hall/CRC, 1996] 1 Short
More information19 : Slice Sampling and HMC
10-708: Probabilistic Graphical Models 10-708, Spring 2018 19 : Slice Sampling and HMC Lecturer: Kayhan Batmanghelich Scribes: Boxiang Lyu 1 MCMC (Auxiliary Variables Methods) In inference, we are often
More informationMarkov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation. Luke Tierney Department of Statistics & Actuarial Science University of Iowa
Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation Luke Tierney Department of Statistics & Actuarial Science University of Iowa Basic Ratio of Uniforms Method Introduced by Kinderman and
More information18 : Advanced topics in MCMC. 1 Gibbs Sampling (Continued from the last lecture)
10-708: Probabilistic Graphical Models 10-708, Spring 2014 18 : Advanced topics in MCMC Lecturer: Eric P. Xing Scribes: Jessica Chemali, Seungwhan Moon 1 Gibbs Sampling (Continued from the last lecture)
More informationThe Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision
The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that
More informationQuantifying Uncertainty
Sai Ravela M. I. T Last Updated: Spring 2013 1 Markov Chain Monte Carlo Monte Carlo sampling made for large scale problems via Markov Chains Monte Carlo Sampling Rejection Sampling Importance Sampling
More informationIntroduction to Computational Biology Lecture # 14: MCMC - Markov Chain Monte Carlo
Introduction to Computational Biology Lecture # 14: MCMC - Markov Chain Monte Carlo Assaf Weiner Tuesday, March 13, 2007 1 Introduction Today we will return to the motif finding problem, in lecture 10
More informationDAG models and Markov Chain Monte Carlo methods a short overview
DAG models and Markov Chain Monte Carlo methods a short overview Søren Højsgaard Institute of Genetics and Biotechnology University of Aarhus August 18, 2008 Printed: August 18, 2008 File: DAGMC-Lecture.tex
More informationSTAT 425: Introduction to Bayesian Analysis
STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 2) Fall 2017 1 / 19 Part 2: Markov chain Monte
More informationAppendix 3. Markov Chain Monte Carlo and Gibbs Sampling
Appendix 3 Markov Chain Monte Carlo and Gibbs Sampling A constant theme in the development of statistics has been the search for justifications for what statisticians do Blasco (2001) Draft version 21
More informationMATH 56A SPRING 2008 STOCHASTIC PROCESSES
MATH 56A SPRING 008 STOCHASTIC PROCESSES KIYOSHI IGUSA Contents 4. Optimal Stopping Time 95 4.1. Definitions 95 4.. The basic problem 95 4.3. Solutions to basic problem 97 4.4. Cost functions 101 4.5.
More informationBusiness Statistics. Lecture 9: Simple Regression
Business Statistics Lecture 9: Simple Regression 1 On to Model Building! Up to now, class was about descriptive and inferential statistics Numerical and graphical summaries of data Confidence intervals
More informationProbability Theory and Simulation Methods
Feb 28th, 2018 Lecture 10: Random variables Countdown to midterm (March 21st): 28 days Week 1 Chapter 1: Axioms of probability Week 2 Chapter 3: Conditional probability and independence Week 4 Chapters
More informationStatistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling
1 / 27 Statistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling Melih Kandemir Özyeğin University, İstanbul, Turkey 2 / 27 Monte Carlo Integration The big question : Evaluate E p(z) [f(z)]
More informationThe Ising model and Markov chain Monte Carlo
The Ising model and Markov chain Monte Carlo Ramesh Sridharan These notes give a short description of the Ising model for images and an introduction to Metropolis-Hastings and Gibbs Markov Chain Monte
More information27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling
10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel
More informationMarkov chain Monte Carlo
Markov chain Monte Carlo Markov chain Monte Carlo (MCMC) Gibbs and Metropolis Hastings Slice sampling Practical details Iain Murray http://iainmurray.net/ Reminder Need to sample large, non-standard distributions:
More informationConvergence Rate of Markov Chains
Convergence Rate of Markov Chains Will Perkins April 16, 2013 Convergence Last class we saw that if X n is an irreducible, aperiodic, positive recurrent Markov chain, then there exists a stationary distribution
More informationLecture 8: The Metropolis-Hastings Algorithm
30.10.2008 What we have seen last time: Gibbs sampler Key idea: Generate a Markov chain by updating the component of (X 1,..., X p ) in turn by drawing from the full conditionals: X (t) j Two drawbacks:
More informationMarkov Chain Monte Carlo methods
Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos Contents Markov Chain Monte Carlo Methods Sampling Rejection Importance Hastings-Metropolis Gibbs Markov Chains
More informationMarkov Chain Monte Carlo, Numerical Integration
Markov Chain Monte Carlo, Numerical Integration (See Statistics) Trevor Gallen Fall 2015 1 / 1 Agenda Numerical Integration: MCMC methods Estimating Markov Chains Estimating latent variables 2 / 1 Numerical
More informationGraphical Models and Kernel Methods
Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.
More informationAnswers and expectations
Answers and expectations For a function f(x) and distribution P(x), the expectation of f with respect to P is The expectation is the average of f, when x is drawn from the probability distribution P E
More informationBayesian Networks in Educational Assessment
Bayesian Networks in Educational Assessment Estimating Parameters with MCMC Bayesian Inference: Expanding Our Context Roy Levy Arizona State University Roy.Levy@asu.edu 2017 Roy Levy MCMC 1 MCMC 2 Posterior
More informationMarkov Chain Monte Carlo
Department of Statistics The University of Auckland https://www.stat.auckland.ac.nz/~brewer/ Emphasis I will try to emphasise the underlying ideas of the methods. I will not be teaching specific software
More informationLECTURE 15 Markov chain Monte Carlo
LECTURE 15 Markov chain Monte Carlo There are many settings when posterior computation is a challenge in that one does not have a closed form expression for the posterior distribution. Markov chain Monte
More informationSampling Methods (11/30/04)
CS281A/Stat241A: Statistical Learning Theory Sampling Methods (11/30/04) Lecturer: Michael I. Jordan Scribe: Jaspal S. Sandhu 1 Gibbs Sampling Figure 1: Undirected and directed graphs, respectively, with
More informationSession 5B: A worked example EGARCH model
Session 5B: A worked example EGARCH model John Geweke Bayesian Econometrics and its Applications August 7, worked example EGARCH model August 7, / 6 EGARCH Exponential generalized autoregressive conditional
More informationControl Variates for Markov Chain Monte Carlo
Control Variates for Markov Chain Monte Carlo Dellaportas, P., Kontoyiannis, I., and Tsourti, Z. Dept of Statistics, AUEB Dept of Informatics, AUEB 1st Greek Stochastics Meeting Monte Carlo: Probability
More informationGeneral Construction of Irreversible Kernel in Markov Chain Monte Carlo
General Construction of Irreversible Kernel in Markov Chain Monte Carlo Metropolis heat bath Suwa Todo Department of Applied Physics, The University of Tokyo Department of Physics, Boston University (from
More informationPractical Numerical Methods in Physics and Astronomy. Lecture 5 Optimisation and Search Techniques
Practical Numerical Methods in Physics and Astronomy Lecture 5 Optimisation and Search Techniques Pat Scott Department of Physics, McGill University January 30, 2013 Slides available from http://www.physics.mcgill.ca/
More informationThe Multivariate Gaussian Distribution [DRAFT]
The Multivariate Gaussian Distribution DRAFT David S. Rosenberg Abstract This is a collection of a few key and standard results about multivariate Gaussian distributions. I have not included many proofs,
More informationMath 456: Mathematical Modeling. Tuesday, April 9th, 2018
Math 456: Mathematical Modeling Tuesday, April 9th, 2018 The Ergodic theorem Tuesday, April 9th, 2018 Today 1. Asymptotic frequency (or: How to use the stationary distribution to estimate the average amount
More informationStatistical Distribution Assumptions of General Linear Models
Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions
More informationMarkov Chain Monte Carlo Lecture 6
Sequential parallel tempering With the development of science and technology, we more and more need to deal with high dimensional systems. For example, we need to align a group of protein or DNA sequences
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More informationA quick introduction to Markov chains and Markov chain Monte Carlo (revised version)
A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) Rasmus Waagepetersen Institute of Mathematical Sciences Aalborg University 1 Introduction These notes are intended to
More informationConvex Optimization CMU-10725
Convex Optimization CMU-10725 Simulated Annealing Barnabás Póczos & Ryan Tibshirani Andrey Markov Markov Chains 2 Markov Chains Markov chain: Homogen Markov chain: 3 Markov Chains Assume that the state
More informationCSC 2541: Bayesian Methods for Machine Learning
CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 3 More Markov Chain Monte Carlo Methods The Metropolis algorithm isn t the only way to do MCMC. We ll
More informationChapter 5 continued. Chapter 5 sections
Chapter 5 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions
More informationLecture 10. Announcement. Mixture Models II. Topics of This Lecture. This Lecture: Advanced Machine Learning. Recap: GMMs as Latent Variable Models
Advanced Machine Learning Lecture 10 Mixture Models II 30.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ Announcement Exercise sheet 2 online Sampling Rejection Sampling Importance
More informationMarkov Chain Monte Carlo The Metropolis-Hastings Algorithm
Markov Chain Monte Carlo The Metropolis-Hastings Algorithm Anthony Trubiano April 11th, 2018 1 Introduction Markov Chain Monte Carlo (MCMC) methods are a class of algorithms for sampling from a probability
More information