Advanced Sampling Algorithms
1 + Advanced Sampling Algorithms
2 + Advanced Sampling Algorithms Mobashir Mohammad, Hirak Sarkar, Parvathy Sudhir, Yamilet Serrano Llerena, Aditya Kulkarni, Tobias Bertelsen, Nirandika Wanigasekara, Malay Singh
3 + Introduction Mobashir Mohammad 3
4 + Sampling Algorithms 4 Monte Carlo algorithms: give an approximate answer with a certain probability; may terminate with a failure signal; the correct answer is not guaranteed; running time is bounded. Las Vegas algorithms: always give an exact answer; may not terminate in a definite time; the answer is guaranteed; running time is not bounded.
5 + Monte Carlo Method 5 What are the chances of winning with a properly shuffled deck? Hard to compute, because winning or losing depends on a complex procedure of reorganizing the cards. Insight: why not play just a few hands, and see empirically how many we do in fact win? More generally: we can approximate a probability density function using only a few samples from that density.
6 + Simplest Monte Carlo Method 6 Finding the value of π by shooting darts. π/4 equals the area of a circle of diameter 1: π/4 = ∬_{x² + y² < 1, 0 ≤ x, y ≤ 1} dx dy. The integral is solved with Monte Carlo: randomly select a large number of points inside the unit square, then π ≈ 4 · N_inside_circle / N_total (exact as N_total → ∞).
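The dart-throwing estimate above takes only a few lines; a minimal sketch in Python (the function name and sample count are illustrative choices, not from the slides):

```python
import random

def estimate_pi(n_total, seed=0):
    """Estimate pi: sample points uniformly in the unit square and count
    the fraction landing inside the quarter circle x^2 + y^2 < 1."""
    rng = random.Random(seed)
    n_inside_circle = sum(
        1 for _ in range(n_total)
        if rng.random() ** 2 + rng.random() ** 2 < 1.0
    )
    return 4.0 * n_inside_circle / n_total

print(estimate_pi(100_000))  # close to 3.14159 for large n_total
```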
7 + Monte Carlo Principle 7 Given a very large set X and a distribution p(x) over it, we draw a set of N independent and identically distributed samples x_1, …, x_N. We can then approximate the distribution using these samples: p_N(x) = (1/N) Σ_{i=1}^{N} 1(x_i = x) → p(x) as N → ∞. We can also use these samples to compute expectations.
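A quick numerical check of this principle (the particular target distribution below is an illustrative assumption):

```python
import random
from collections import Counter

def empirical_distribution(samples):
    """p_N(x) = (1/N) * number of samples equal to x."""
    n = len(samples)
    return {x: c / n for x, c in Counter(samples).items()}

rng = random.Random(1)
# draw N i.i.d. samples from p = {0: 0.2, 1: 0.5, 2: 0.3}
samples = rng.choices([0, 1, 2], weights=[0.2, 0.5, 0.3], k=50_000)
p_hat = empirical_distribution(samples)  # approaches p as N grows
```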
8 + Sampling 8 Problem: exact inference from a probability distribution is hard. Example: determine the number of people likely to go shopping at a Nike store at Vivo City on a Sunday. Solution: use random samples to get an approximate inference. How: use a Markov chain to get random samples!
9 + Markov Chains Hirak Sarkar 9
10 + Introduction to Markov Chain 10 Real world systems contain uncertainty and evolve over time. Stochastic processes and Markov chains model such systems. A discrete time stochastic process is a sequence of random variables X_0, X_1, … and is typically denoted by {X_n}.
11 + Components of a Stochastic Process 11 The state space of a stochastic process is the set of all values that the {X_n} can take. Time: n = 1, 2, … The set of states is denoted by S = {s_1, s_2, …, s_k}; here there are k states, so each X_n takes one of k values.
12 + Markov Property and Markov Chain 12 Consider the special class of stochastic processes that satisfy the Markov property: the state X_n depends only on X_{n−1}, not on any X_i for i < n−1. Formally, P(X_n = i_n | X_0 = i_0, …, X_{n−1} = i_{n−1}) = P(X_n = i_n | X_{n−1} = i_{n−1}) = p(i_{n−1}, i_n). We call such a process a Markov chain. The matrix containing p(i_{n−1}, i_n) in cell (i_{n−1}, i_n) is called the transition matrix.
13 + Transition Matrix 13 The one-step transition matrix for a Markov chain with states S = {0, 1, 2} is P = [p_00 p_01 p_02; p_10 p_11 p_12; p_20 p_21 p_22], where p_ij = Pr(X_1 = j | X_0 = i). If P is the same at every point of time, the chain is called time homogeneous. Each row is a distribution: Σ_j p_ij = 1 and p_ij ≥ 0 for all i, j.
14 + Example: Random Walk 14 [figure: a graph, its adjacency matrix, and the corresponding transition matrix with entries 1 and 1/2]
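Turning an adjacency matrix into a random-walk transition matrix is just row normalization; a sketch, using a three-node graph assumed to match the A/B/C figures on the next slides:

```python
def adjacency_to_transition(adj):
    """p_ij = a_ij / degree(i): each row of the adjacency matrix is
    normalized so that it sums to 1."""
    return [[a / sum(row) for a in row] for row in adj]

# node A linked to B and C, while B and C link back only to A
adj = [[0, 1, 1],
       [1, 0, 0],
       [1, 0, 0]]
P = adjacency_to_transition(adj)
# P[0] == [0.0, 0.5, 0.5]; P[1] == [1.0, 0.0, 0.0]
```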
15-18 + What is a random walk [figures: a walk on three nodes A, B, C with edge probabilities 1 and 1/2, shown step by step at t = 0, 1, 2, 3]
19 + Probability Distributions 19 Let x_t(i) = P(X_t = i) be the probability that the surfer is at node i at time t. Then x_{t+1}(i) = Σ_j P(X_{t+1} = i | X_t = j) P(X_t = j) = Σ_j p_{j,i} x_t(j), i.e. x_{t+1} = x_t P = x_{t−1} P·P = … = x_0 P^{t+1}. What happens when the surfer keeps walking for a long time?
20 + Stationary Distribution 20 When the surfer keeps walking for a long time, the distribution does not change anymore, i.e. x_{T+1} = x_T. We denote this distribution by π.
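The fixed point x_{T+1} = x_T can be found numerically by iterating x_{t+1} = x_t P; a minimal sketch with an assumed two-state chain (the matrix values are illustrative):

```python
def step(x, P):
    """One step of x_{t+1}(i) = sum_j x_t(j) * p_{j,i}."""
    k = len(P)
    return [sum(x[j] * P[j][i] for j in range(k)) for i in range(k)]

P = [[0.9, 0.1],
     [0.5, 0.5]]
x = [1.0, 0.0]           # start in state 0 with certainty
for _ in range(100):     # iterate until the distribution stops changing
    x = step(x, P)
# x is now (approximately) the stationary distribution pi = (5/6, 1/6)
```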
21 + Basic Limit Theorem 21 [figure: the matrix powers P, P^5, P^10, P^20, whose rows converge to one identical row, the stationary distribution]
22 + Example 22
23 + Classification of States 23 State j is reachable from state i if the probability to go from i to j in n > 0 steps is greater than zero (equivalently, if the state transition diagram has a path from i to j). A subset S′ of the state space S is closed if p_ij = 0 for every i ∈ S′ and j ∉ S′. A state i is said to be absorbing if it is a single-element closed set. A closed set S′ of states is irreducible if every state j ∈ S′ is reachable from every state i ∈ S′. A Markov chain is said to be irreducible if its state space S is irreducible.
24 + Example 24 [figure: an irreducible Markov chain on states 0, 1, 2 with transitions p_00, p_01, p_10, p_12, p_22, and a reducible Markov chain on states 0-4 with absorbing state 4 and a closed irreducible set {2, 3}]
25 + Periodic and Aperiodic States 25 Suppose that the structure of the Markov chain is such that state i is visited only after a number of steps that is an integer multiple of an integer d > 1. Then the state is called periodic with period d. If no such integer exists (i.e., d = 1), the state is called aperiodic. [figure: example of a periodic state]
26 + Stationary Distribution 26 Example: start from π^(0) = (π_0^(0), π_1^(0)) = (0.5, 0.5) and compute π^(1) = π^(0) P. [figure: the transition matrix P and the resulting numeric values]
27 + Steady State Analysis 27 Recall that the probability of finding the MC at state i after the k-th step is given by π_i(k) = Pr(X_k = i). An interesting question is what happens in the long run, i.e. lim_{k→∞} π_i(k). This is referred to as the steady state, equilibrium, or stationary state probability. Questions: Do these limits exist? If they exist, do they converge to a legitimate probability distribution, i.e. Σ_i π_i = 1? How do we evaluate π_j for all j?
28 + Steady State Analysis 28 THEOREM: In a finite irreducible aperiodic Markov chain, a unique stationary state probability vector π exists such that π_j > 0 and lim_{k→∞} π_j(k) = π_j. The steady state distribution π is determined by solving π = πP and Σ_i π_i = 1.
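Checking that a candidate π solves π = πP and Σ_i π_i = 1 is direct; a sketch (the two-state chain and its stationary vector are illustrative assumptions):

```python
def is_stationary(pi, P, tol=1e-9):
    """True iff sum(pi) == 1 and pi P == pi, up to tolerance tol."""
    k = len(P)
    if abs(sum(pi) - 1.0) > tol:
        return False
    return all(
        abs(sum(pi[j] * P[j][i] for j in range(k)) - pi[i]) <= tol
        for i in range(k)
    )

P = [[0.9, 0.1],
     [0.5, 0.5]]
print(is_stationary([5 / 6, 1 / 6], P))  # True
print(is_stationary([0.5, 0.5], P))      # False
```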
29 + Sampling from a Markov Chain Parvathy 29
30 + Sampling from a Markov Chain 30 Sampling: create random samples from the desired probability distribution P. Construct a Markov chain whose steady state distribution π equals P. Run the Markov chain long enough; on convergence, its states are approximately distributed according to π, so random sampling of the Markov chain yields random samples of P. But: it is difficult to determine how long to run the Markov chain.
31 + Ising Model 31 Named after physicist Ernst Ising, invented by Wilhelm Lenz. Mathematical model of ferromagnetism. Consists of discrete variables representing magnetic dipole moments of atomic spins (+1 or −1).
32 + 1D Ising Model 32 Solved by Ernst Ising. N spins arranged in a ring (x_N adjacent to x_1). P(s) = e^{−βH(s)} / Z, where β is the inverse temperature, H(s) is the energy, and s = (x_1, …, x_N) is one possible configuration. Higher probability on states with lower energy.
33 + 2D Ising Model 33 Spins are arranged in a lattice; each spin interacts with its 4 neighbours. One of the simplest statistical models to show a phase transition.
34 + 2D Ising Model (2) 34 G = (V, E); x_i ∈ {+1, −1} is the value of the spin at location i; S = {s_1, s_2, …, s_n} is the state space. H(s) = −Σ_{(x_i, x_j) ∈ E} s(x_i) s(x_j). Similar spins subtract energy; opposite spins add energy.
35 + 2D Ising Model (3) 35 P(s) = e^{−βH(s)} / Z = e^{β Σ_{(x_i, x_j) ∈ E} s(x_i) s(x_j)} / Z. If β = 0, all spin configurations have the same probability; if β > 0, lower energy is preferred.
36 + Exact Inference is Hard 36 Posterior distribution over s, with Y an observed variable: P(s | Y) = p(s, Y) / p(Y) = p(s, Y) / Σ_s p(s, Y). Intractable computations: the joint probability distribution has 2^N possible combinations of s; the marginal probability distribution at a site; MAP estimation.
37 + The Big Question 37 Given a distribution π on S = {s_1, s_2, …, s_n}, simulate a random object with distribution π.
38 + Methods 38 Generate random samples to estimate a quantity; samples are generated Markov-chain style. Markov Chain Monte Carlo (MCMC). Propp-Wilson simulation: a Las Vegas variant. Sandwiching: an improvement on Propp-Wilson.
39 + MARKOV CHAIN MONTE CARLO METHOD (MCMC) YAMILET R. SERRANO LL. 39
40 + Recall 40 Given a probability distribution π on S = {s1, …, sk}, how do we simulate a random object with distribution π?
41 + Intuition 41 Given a probability distribution π on S = {s1, …, sk}, how do we simulate a random object with distribution π? The state space is high-dimensional.
42 + MCMC 42 1. Construct an irreducible and aperiodic Markov chain [X0, X1, …] whose stationary distribution is π. 2. If we run the chain with an arbitrary initial distribution, the Markov chain convergence theorem guarantees that the distribution of the chain at time n converges to π. 3. Hence, if we run the chain for a sufficiently long time n, the distribution of Xn will be very close to π, so Xn can be used as a sample.
43 + MCMC (2) 43 Generally, two types of MCMC algorithms: Metropolis-Hastings and Gibbs sampling.
44 + Metropolis Hastings Algorithm 44 Original method: Metropolis, Rosenbluth, Rosenbluth, Teller and Teller (1953). Generalized by Hastings in 1970. Rediscovered by Tanner and Wong (1987) and Gelfand and Smith (1990). It is one way to implement MCMC. [photo: Nicholas Metropolis]
45 + Metropolis Hastings Algorithm 45 Basic Idea GIVEN: A probability distribution π on S = {s1,, sk} GOAL : Approx. sample from π Start with a proposal distribution Q(x,y) Q(x,y) specifies transition of Markov Chain Q(x,y) plays the role of the transition matrix x: current state y: new proposal By accepting/rejecting the proposal, MH simulates a Markov Chain, whose stationary distribution is π
46 + Metropolis Hastings Algorithm 46 Algorithm. GIVEN: a probability distribution π on S = {s1, …, sk}. GOAL: approximately sample from π. Given the current sample x, draw y from the proposal distribution Q(x, ·), draw U ~ Uniform(0, 1), and update: the next state is y if U ≤ α(x, y) and stays x otherwise, where the acceptance probability is α(x, y) = min(1, [π(y) Q(y, x)] / [π(x) Q(x, y)]).
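The accept/reject loop above can be sketched for a small discrete target; the ring-neighbour proposal and the particular target weights are illustrative assumptions, not from the slides:

```python
import random

def metropolis_hastings(weights, n_steps, seed=0):
    """Metropolis-Hastings on states 0..k-1. The target pi is proportional
    to `weights`; the proposal moves to a uniformly chosen ring neighbour,
    which is symmetric, so alpha = min(1, pi(y)/pi(x))."""
    rng = random.Random(seed)
    k = len(weights)
    x = 0
    samples = []
    for _ in range(n_steps):
        y = (x + rng.choice([-1, 1])) % k       # propose a neighbour
        if rng.random() < min(1.0, weights[y] / weights[x]):
            x = y                               # accept
        samples.append(x)                       # a rejection keeps x
    return samples

samples = metropolis_hastings([1, 2, 3, 4], 200_000)[10_000:]
freq_3 = samples.count(3) / len(samples)        # approaches 4/10
```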
47 + Metropolis Hastings Algorithm 47 The Ising Model. Consider m sites around a circle. Each site i can have one of two spins x_i ∈ {−1, 1}. The target distribution: π(x) ∝ exp(β Σ_{i=1}^{m} x_i x_{i+1}), with x_{m+1} = x_1.
48 + Metropolis Hastings Algorithm The Ising Model. Target distribution: π(x) ∝ exp(β Σ_i x_i x_{i+1}). Proposal distribution: 1. randomly pick one of the m spins; 2. flip its sign. Acceptance probability (say the i-th spin is flipped): since the proposal is symmetric, α = min(1, π(y)/π(x)) = min(1, exp(−2β x_i (x_{i−1} + x_{i+1}))).
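A sketch of this sampler for the ring (the spin count, β and step count below are illustrative assumptions):

```python
import math
import random

def ising_ring_mh(m, beta, n_steps, seed=0):
    """Metropolis sampler for pi(x) ∝ exp(beta * sum_i x_i x_{i+1}) on a
    ring of m spins; the proposal flips one uniformly chosen spin."""
    rng = random.Random(seed)
    x = [rng.choice([-1, 1]) for _ in range(m)]
    for _ in range(n_steps):
        i = rng.randrange(m)
        neighbour_sum = x[(i - 1) % m] + x[(i + 1) % m]
        # pi(flip)/pi(current) = exp(-2 * beta * x_i * (x_{i-1} + x_{i+1}))
        if rng.random() < min(1.0, math.exp(-2.0 * beta * x[i] * neighbour_sum)):
            x[i] = -x[i]
    return x

state = ising_ring_mh(m=20, beta=1.0, n_steps=5_000)
```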
49 + Metropolis Hastings Algorithm 49 The Ising Model. [image taken from Monte Carlo Investigation of the Ising Model (2006), Tobin Fricke]
50 + Disadvantages of MCMC 50 No matter how large n is taken to be in the MCMC algorithm, there will still be some discrepancy between the distribution of the output and the target distribution π. To make this error small, we need to figure out how large n needs to be.
51 + Bounds on Convergence of MCMC Aditya Kulkarni 51
52 + Seminar on Probabilities 52 (Strasbourg) Since it is difficult to obtain asymptotic bounds on the convergence time of MCMC algorithms for Ising models, use quantitative bounds: use characteristics of the Ising model's MCMC algorithm to obtain quantitative bounds. [photo: David Aldous]
53 + What we need to ensure 53 As time goes on, we approach the stationary distribution monotonically: d(t) → 0 as t → ∞. When we stop, the sample should follow a distribution that is within some factor ε of the stationary distribution: d(t) ≤ ε.
54 + Total Variation Distance TV 54 Given two distributions p and q over a finite set of states S, the total variation distance is ‖p − q‖_TV = (1/2) ‖p − q‖_1 = (1/2) Σ_{x ∈ S} |p(x) − q(x)|.
55 + Convergence Time 55 Let X = (X_0, X_1, …) be a Markov chain on a state space V with stationary distribution π. We define d(t) as the worst total variation distance at time t: d(t) = max_{x ∈ V} ‖P(X_t = · | X_0 = x) − π‖_TV. The mixing time τ(ε) is the minimum time t such that d(t) is at most ε: τ(ε) = min {t : d(t) ≤ ε}. Define τ = τ(1/2e) = min {t : d(t) ≤ 1/2e}.
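These definitions can be computed exactly for a small chain by propagating point-mass initial distributions; the two-state chain below is an illustrative assumption:

```python
def tv_distance(p, q):
    """Total variation distance: (1/2) * sum_x |p(x) - q(x)|."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def dist_after(t, start, P):
    """Distribution of X_t given X_0 = start."""
    k = len(P)
    x = [1.0 if i == start else 0.0 for i in range(k)]
    for _ in range(t):
        x = [sum(x[j] * P[j][i] for j in range(k)) for i in range(k)]
    return x

def d(t, P, pi):
    """Worst-case TV distance to pi over all starting states."""
    return max(tv_distance(dist_after(t, s, P), pi) for s in range(len(P)))

def mixing_time(eps, P, pi):
    """tau(eps) = min { t : d(t) <= eps }."""
    t = 0
    while d(t, P, pi) > eps:
        t += 1
    return t

P = [[0.9, 0.1], [0.5, 0.5]]
pi = [5 / 6, 1 / 6]
print(mixing_time(0.01, P, pi))  # 5
```

For this chain d(t) decays geometrically by the second eigenvalue 0.4 per step, so τ(0.01) works out to 5.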
56 + Quantitative results 56 [figure: d(t) as a decreasing curve starting at 1, crossing the levels 1/2e and ε at times τ and τ(ε)]
57 + Lemma 57 Consider two random walks started from state i and state j. Define ρ(t) as the worst total variation distance between their respective probability distributions at time t. Then: a. ρ(t) ≤ 2 d(t); b. d(t) is decreasing.
58 + Upper bound on d(t): proof 58 From part (a) and the definition τ = τ(1/2e) = min {t : d(t) ≤ 1/2e}, we get ρ(τ) ≤ 2 d(τ) ≤ e^{−1}. From part (b) we deduce that an upper bound on ρ at a particular time t_0 gives an upper bound for later times: ρ(t) ≤ (ρ(t_0))^n for n t_0 ≤ t < (n + 1) t_0, hence ρ(t) ≤ (ρ(t_0))^{t/t_0 − 1}. Substituting τ for t_0: ρ(t) ≤ (ρ(τ))^{t/τ − 1}, so d(t) ≤ exp(1 − t/τ) for t ≥ 0.
59 + Upper bound on τ(ε): proof 59 Algebraic calculations: d(t) ≤ exp(1 − t/τ) ⇒ log d(t) ≤ 1 − t/τ ⇒ t ≤ τ (1 + log(1/d(t))). Hence τ(ε) ≤ τ (1 + log(1/ε)), and also τ(ε) ≤ 2eτ (1 + log(1/ε)).
60 + Upper bounds 60 d(t) ≤ min(1, exp(1 − t/τ)) for t ≥ 0; τ(ε) ≤ τ (1 + log(1/ε)) for 0 < ε < 1; τ(ε) ≤ 2eτ (1 + log(1/ε)) for 0 < ε < 1.
61 + Entropy of initial distribution 61 Measure of randomness: ent(μ) = −Σ_{x ∈ V} μ(x) log μ(x), where x is the initial state and μ is the initial probability distribution.
62 + A few more lemmas 1. Let (X_t) be a random walk associated with μ, and let μ_t be the distribution of X_t; then ent(μ_t) ≤ t · ent(μ). 2. If ν is a distribution on S such that ‖ν − π‖ ≤ ε, then ent(ν) ≥ (1 − ε) log |S|.
63 + Lower bound on d(t): proof 63 From lemma 2, applied to ν = μ_t with ε = d(t): ent(μ_t) ≥ (1 − d(t)) log |S|, i.e. d(t) ≥ 1 − ent(μ_t) / log |S|. Combining with lemma 1: d(t) ≥ 1 − t · ent(μ) / log |S|.
64 + Lower bound on τ(ε): proof 64 From lemma 1, ent(μ_t) ≤ t · ent(μ), so t ≥ ent(μ_t) / ent(μ). From lemma 2, ent(μ_{τ(ε)}) ≥ (1 − ε) log |S|, hence τ(ε) ≥ (1 − ε) log |S| / ent(μ).
65 + Lower bounds 65 d(t) ≥ 1 − t · ent(μ) / log |S|; τ(ε) ≥ (1 − ε) log |S| / ent(μ).
66 + Propp Wilson Tobias Bertelsen 66
67 + An exact version of MCMC 67 Problems with MCMC: A. it has an accuracy error, which depends on the starting state; B. we must know the number of iterations. James Propp and David Wilson proposed Coupling from the past (1996), a.k.a. the Propp-Wilson algorithm. Idea: solve both problems by running the chain from infinitely far in the past.
68 + An exact version of MCMC 68 Theoretical: runs all configurations infinitely; literally takes infinite time; impossible. Coupling from the past: runs all configurations for finite time; might take thousands of years; infeasible. Sandwiching: runs few configurations for finite time; takes seconds; practicable.
69 + Theoretical exact sampling 69 Recall the convergence theorem: we approach the stationary distribution as the number of steps goes to infinity. Intuitive approach: to sample perfectly, start a chain and run it forever; start at t = 0, sample at t = ∞. Problem: we never get a sample. Alternative approach: to sample perfectly, take a chain that has already been running for an infinite amount of time; start at t = −∞, sample at t = 0.
70 + Theoretical independence of starting state 70 A sample from a Markov chain in MCMC depends solely on the starting state X_{−∞} and the sequence of random numbers U. We want to be independent of the starting state: for a given sequence of random numbers U_{−∞}, …, U_{−1} we want to ensure that the starting state X_{−∞} has no effect on X_0.
71 + Theoretical independence of starting state 71 Collisions: for a given U_{−∞}, …, U_{−1}, if two Markov chains are at the same state at some time t, they will continue on together.
72 + Coupling from the past 72 At some finite past time t = −N, all past chains (which have already run infinitely long) have coupled into one. We want to continue that coupled chain to t = 0. But we don't know which state they will be in at t = −N, so we run chains from all states starting at −N instead of −∞.
73 + Coupling from the past 1. Let U = U_{−1}, U_{−2}, U_{−3}, … be the sequence of independent uniformly random numbers. 2. For N_j ∈ {1, 2, 4, 8, …}: 1. extend U to length N_j, keeping the earlier values the same; 2. start one chain from each state at t = −N_j; 3. for t from −N_j to zero, simulate the chains using U_t; 4. if all chains have converged to the same state s_i at t = 0, return s_i; 5. else repeat the loop.
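The loop above can be sketched for a toy update function; the particular chain ("go to the first state with probability 1/2, otherwise move one step up") is an illustrative assumption, matching the chain used in the later "using the same U" slides:

```python
import random

def phi(i, u, k):
    """Toy update function: go to state 0 if u < 1/2, else one step up
    (the top state k-1 stays put)."""
    return 0 if u < 0.5 else min(i + 1, k - 1)

def cftp(k, seed=0):
    """Coupling from the past: start one chain in every state at t = -N,
    doubling N and reusing the same random numbers until all chains have
    coalesced at t = 0; the coalesced state is an exact sample."""
    rng = random.Random(seed)
    us = []                           # us[j] drives the step from t = -(j+1)
    n = 1
    while True:
        while len(us) < n:
            us.append(rng.random())   # extend U, keeping old values fixed
        states = list(range(k))
        for t in range(n, 0, -1):     # simulate from t = -n up to t = 0
            u = us[t - 1]
            states = [phi(i, u, k) for i in states]
        if len(set(states)) == 1:
            return states[0]          # all chains converged: exact sample
        n *= 2

print(cftp(k=5, seed=0))
```

Since every state maps to state 0 with probability 1/2, this chain's stationary distribution puts mass 1/2 on state 0, which the exact samples reproduce across independent seeds.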
74 + Coupling from the past 74
75 + Questions 75 Why do we double the lengths? Worst case < 4 N_opt steps, where N_opt is the minimal N at which we can achieve convergence; compare to trying N ∈ {1, 2, 3, 4, 5, …}, which takes O(N_opt²) steps. Why do we have to use the same random numbers? Different samples might take longer or shorter to converge; we must evaluate the same sample in each iteration. The sample should depend only on U, not on the different N.
76 + Using the same U 76 We have k states with the following update function: φ(S_i, U) = S_1 if U < 1/2, else S_{i+1}; φ(S_k, U) = S_1 if U < 1/2, else S_k. The stationary distribution is π(S_i) = 1/2^i for i < k, and π(S_k) = 2/2^k.
77 + Using the same U 77 The probability of S_1 only depends on the last random number: P(S_1) = P(U_{−1} < 1/2) = 1/2. Suppose we generate a new U for each run: U¹, U², U³, … Then over the runs n = 1, 2, 4, 8 the accumulated probability of returning S_1 drifts: 50 %, 75 %, …, so the algorithm would be biased.
78 + Using the same U 78 The probability of S_1 only depends on the last random number: P(S_1) = P(U_{−1} < 1/2) = 1/2. If we instead use the same U for each run, the accumulated probability of returning S_1 stays at 50 % for n = 1, 2, 4, 8, …, so the sample is unbiased.
79 + Problem 79 In each step we update up to k chains, so the total execution time is O(N_opt · k). BUT k = 2^{L²} in the Ising model, which is worse than the naïve approach O(k).
80 + Sandwiching Nirandika Wanigasekara 80
81 + Sandwiching 81 Propp-Wilson algorithm: with many vertices, running k chains takes too long; impractical for large k. Choose a relatively small set of chains to run. Can we still get the same results? Try sandwiching.
82 + Sandwiching 82 Idea: find two chains bounding all other chains. If we have two such boundary chains, check whether those two chains converge; then all other chains have also converged.
83 + Sandwiching 83 To come up with the boundary chains we need a way to order the states: s_1 ≼ s_2 ≼ s_3 ≼ … A chain in a higher state must not cross a chain in a lower state. This results in a Markov chain obeying certain monotonicity properties.
84 + Sandwiching 84 Let's consider a fixed set of k states, state space S = {1, …, k}, and a transition matrix with P_{1,1} = P_{1,2} = 1/2, P_{k,k} = P_{k,k−1} = 1/2, and for i = 2, …, k−1: P_{i,i−1} = P_{i,i+1} = 1/2. All other entries are 0.
85 + Sandwiching 85 What is this Markov chain doing? At each integer time it takes one step up or one step down the ladder, with probability 1/2 each; at the top (state k) or the bottom (state 1) it stays where it is with probability 1/2. This is the ladder walk on k vertices. Its stationary distribution is π_i = 1/k for i = 1, …, k.
86 + Sandwiching 86 Propp-Wilson for this Markov chain, with valid update function φ: if u < 1/2 then step down, if u ≥ 1/2 then step up; negative starting times N_1, N_2, … = 1, 2, 4, 8, …; number of states k = 5.
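For the ladder walk this update is monotone, so Propp-Wilson only needs the bottom and top chains; a minimal sketch (function names are illustrative):

```python
import random

def ladder_phi(i, u, k):
    """Monotone update for the ladder walk on states 1..k:
    step down if u < 1/2, step up otherwise, truncated at the ends."""
    return max(i - 1, 1) if u < 0.5 else min(i + 1, k)

def sandwich_cftp(k, seed=0):
    """Propp-Wilson with sandwiching: since ladder_phi preserves the order
    of states, tracking only the chains started at 1 (bottom) and k (top)
    suffices; when they meet, every chain in between has coalesced too."""
    rng = random.Random(seed)
    us = []
    n = 1
    while True:
        while len(us) < n:
            us.append(rng.random())
        lo, hi = 1, k
        for t in range(n, 0, -1):      # run from t = -n up to t = 0
            u = us[t - 1]
            lo, hi = ladder_phi(lo, u, k), ladder_phi(hi, u, k)
        if lo == hi:                   # the sandwich closed
            return lo                  # exact sample from pi_i = 1/k
        n *= 2

counts = [0] * 6
for s in range(1000):
    counts[sandwich_cftp(5, seed=s)] += 1
# each of the states 1..5 appears roughly 200 times out of 1000
```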
87 + Sandwiching 87
88 + Sandwiching 88 The update function preserves the ordering between states: for all U ∈ [0, 1] and all i, j ∈ {1, …, k} such that i ≤ j, we have φ(i, U) ≤ φ(j, U). Hence it is sufficient to run only 2 chains rather than k.
89 + Sandwiching 89 Are these conditions always met? No, not always. But there are frequent instances where they are met, and the technique is especially useful when k is large. The Ising model is a good example of this.
90 + Ising Model Malay Singh 90
91 + 2D Ising Model 91 A grid with sites numbered from 1 to L², where L is the grid size. Each site i can have spin x_i ∈ {−1, +1}. V is the set of sites; S = {−1, 1}^V defines all possible configurations (states). The magnetisation of a state s is m(s) = Σ_i s(x_i) / L². The energy of a state s is H(s) = −Σ_{⟨i,j⟩} s(x_i) s(x_j).
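Both quantities are direct to compute; a sketch for an L x L grid with free boundaries (the boundary convention is an assumption; the slides' lattice may use periodic boundaries instead):

```python
def magnetisation(s):
    """m(s) = sum of all spins / L^2."""
    L = len(s)
    return sum(sum(row) for row in s) / (L * L)

def energy(s):
    """H(s) = - sum over nearest-neighbour pairs of s_i * s_j
    (free boundaries: each site interacts only with grid neighbours)."""
    L = len(s)
    h = 0
    for i in range(L):
        for j in range(L):
            if i + 1 < L:
                h -= s[i][j] * s[i + 1][j]   # vertical bond
            if j + 1 < L:
                h -= s[i][j] * s[i][j + 1]   # horizontal bond
    return h

all_up = [[+1] * 3 for _ in range(3)]
print(magnetisation(all_up), energy(all_up))  # 1.0 -12
```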
92 + Ordering of states 92 For two states s, s′ we say s ≼ s′ if s(x) ≤ s′(x) for all x ∈ V. Minimum (m = −1): s_min(x) = −1 for all x ∈ V. Maximum (m = 1): s_max(x) = +1 for all x ∈ V. Hence s_min ≼ s ≼ s_max for all s.
93 + The update function 93 We use the sequence of random numbers U_n, U_{n−1}, …, U_0. When updating the state X_n, we choose a site x in {1, …, L²} uniformly and set X_{n+1}(x) = +1 if U_{n+1} < exp(2β (K₊(x, s) − K₋(x, s))) / (exp(2β (K₊(x, s) − K₋(x, s))) + 1), and −1 otherwise.
94 + Maintaining ordering 94 Ordering after an update (from X_n to X_{n+1}): we choose the same site x to update in both chains; the spin of x at X_{n+1} depends on the update function. We want to check that exp(2β (K₊(x, s) − K₋(x, s))) / (exp(2β (K₊(x, s) − K₋(x, s))) + 1) ≤ exp(2β (K₊(x, s′) − K₋(x, s′))) / (exp(2β (K₊(x, s′) − K₋(x, s′))) + 1), which is equivalent to checking K₊(x, s) − K₋(x, s) ≤ K₊(x, s′) − K₋(x, s′).
95 + Maintaining ordering 95 As s ≼ s′ we have K₊(x, s) ≤ K₊(x, s′) and K₋(x, s) ≥ K₋(x, s′). Subtracting the second inequality from the first gives K₊(x, s) − K₋(x, s) ≤ K₊(x, s′) − K₋(x, s′).
96 + Ising Model L = 4 96 [figures: T = 3.5 with N = 512; T = 4.8 with N = 128]
97 + Ising Model L = 8 97 [figure: T = 5.9 with N = 512]
98 + Ising Model 98 [figure: simulation with T = 5.3]
99 + Summary 99 Exact sampling: Markov chain Monte Carlo, but when do we converge? Propp-Wilson with sandwiching to the rescue.
100 + Questions? 100
101 + Additional 101 Example 6.2
102 + Additional 102 Update function
103 + Additional 103 Proof for the ordering between states: need to find
More informationMarkov Chain Monte Carlo
Markov Chain Monte Carlo Recall: To compute the expectation E ( h(y ) ) we use the approximation E(h(Y )) 1 n n h(y ) t=1 with Y (1),..., Y (n) h(y). Thus our aim is to sample Y (1),..., Y (n) from f(y).
More informationNumerical Analysis of 2-D Ising Model. Ishita Agarwal Masters in Physics (University of Bonn) 17 th March 2011
Numerical Analysis of 2-D Ising Model By Ishita Agarwal Masters in Physics (University of Bonn) 17 th March 2011 Contents Abstract Acknowledgment Introduction Computational techniques Numerical Analysis
More informationLecture 5: Random Walks and Markov Chain
Spectral Graph Theory and Applications WS 20/202 Lecture 5: Random Walks and Markov Chain Lecturer: Thomas Sauerwald & He Sun Introduction to Markov Chains Definition 5.. A sequence of random variables
More informationInference in Bayesian Networks
Andrea Passerini passerini@disi.unitn.it Machine Learning Inference in graphical models Description Assume we have evidence e on the state of a subset of variables E in the model (i.e. Bayesian Network)
More informationComputer Vision Group Prof. Daniel Cremers. 11. Sampling Methods
Prof. Daniel Cremers 11. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric
More informationSimulated Annealing for Constrained Global Optimization
Monte Carlo Methods for Computation and Optimization Final Presentation Simulated Annealing for Constrained Global Optimization H. Edwin Romeijn & Robert L.Smith (1994) Presented by Ariel Schwartz Objective
More information17 : Markov Chain Monte Carlo
10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo
More informationMarkov Chain Monte Carlo. Simulated Annealing.
Aula 10. Simulated Annealing. 0 Markov Chain Monte Carlo. Simulated Annealing. Anatoli Iambartsev IME-USP Aula 10. Simulated Annealing. 1 [RC] Stochastic search. General iterative formula for optimizing
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate
More informationMCMC: Markov Chain Monte Carlo
I529: Machine Learning in Bioinformatics (Spring 2013) MCMC: Markov Chain Monte Carlo Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2013 Contents Review of Markov
More informationMATH 56A: STOCHASTIC PROCESSES CHAPTER 1
MATH 56A: STOCHASTIC PROCESSES CHAPTER. Finite Markov chains For the sake of completeness of these notes I decided to write a summary of the basic concepts of finite Markov chains. The topics in this chapter
More informationMarkov Chain Monte Carlo Lecture 6
Sequential parallel tempering With the development of science and technology, we more and more need to deal with high dimensional systems. For example, we need to align a group of protein or DNA sequences
More informationMarkov Chain Monte Carlo (MCMC)
Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can
More informationBayesian networks: approximate inference
Bayesian networks: approximate inference Machine Intelligence Thomas D. Nielsen September 2008 Approximative inference September 2008 1 / 25 Motivation Because of the (worst-case) intractability of exact
More informationThe Metropolis-Hastings Algorithm. June 8, 2012
The Metropolis-Hastings Algorithm June 8, 22 The Plan. Understand what a simulated distribution is 2. Understand why the Metropolis-Hastings algorithm works 3. Learn how to apply the Metropolis-Hastings
More informationOnline Social Networks and Media. Link Analysis and Web Search
Online Social Networks and Media Link Analysis and Web Search How to Organize the Web First try: Human curated Web directories Yahoo, DMOZ, LookSmart How to organize the web Second try: Web Search Information
More informationMarkov Chain Monte Carlo
1 Motivation 1.1 Bayesian Learning Markov Chain Monte Carlo Yale Chang In Bayesian learning, given data X, we make assumptions on the generative process of X by introducing hidden variables Z: p(z): prior
More informationMetropolis Monte Carlo simulation of the Ising Model
Metropolis Monte Carlo simulation of the Ising Model Krishna Shrinivas (CH10B026) Swaroop Ramaswamy (CH10B068) May 10, 2013 Modelling and Simulation of Particulate Processes (CH5012) Introduction The Ising
More informationMarkov Chain Monte Carlo Methods
Markov Chain Monte Carlo Methods p. /36 Markov Chain Monte Carlo Methods Michel Bierlaire michel.bierlaire@epfl.ch Transport and Mobility Laboratory Markov Chain Monte Carlo Methods p. 2/36 Markov Chains
More informationOn random walks. i=1 U i, where x Z is the starting point of
On random walks Random walk in dimension. Let S n = x+ n i= U i, where x Z is the starting point of the random walk, and the U i s are IID with P(U i = +) = P(U n = ) = /2.. Let N be fixed (goal you want
More informationMarkov and Gibbs Random Fields
Markov and Gibbs Random Fields Bruno Galerne bruno.galerne@parisdescartes.fr MAP5, Université Paris Descartes Master MVA Cours Méthodes stochastiques pour l analyse d images Lundi 6 mars 2017 Outline The
More informationDiscrete solid-on-solid models
Discrete solid-on-solid models University of Alberta 2018 COSy, University of Manitoba - June 7 Discrete processes, stochastic PDEs, deterministic PDEs Table: Deterministic PDEs Heat-diffusion equation
More informationDefinition A finite Markov chain is a memoryless homogeneous discrete stochastic process with a finite number of states.
Chapter 8 Finite Markov Chains A discrete system is characterized by a set V of states and transitions between the states. V is referred to as the state space. We think of the transitions as occurring
More informationBayesian GLMs and Metropolis-Hastings Algorithm
Bayesian GLMs and Metropolis-Hastings Algorithm We have seen that with conjugate or semi-conjugate prior distributions the Gibbs sampler can be used to sample from the posterior distribution. In situations,
More informationQuantifying Uncertainty
Sai Ravela M. I. T Last Updated: Spring 2013 1 Markov Chain Monte Carlo Monte Carlo sampling made for large scale problems via Markov Chains Monte Carlo Sampling Rejection Sampling Importance Sampling
More informationLect4: Exact Sampling Techniques and MCMC Convergence Analysis
Lect4: Exact Sampling Techniques and MCMC Convergence Analysis. Exact sampling. Convergence analysis of MCMC. First-hit time analysis for MCMC--ways to analyze the proposals. Outline of the Module Definitions
More informationThe parallel replica method for Markov chains
The parallel replica method for Markov chains David Aristoff (joint work with T Lelièvre and G Simpson) Colorado State University March 2015 D Aristoff (Colorado State University) March 2015 1 / 29 Introduction
More information16 : Markov Chain Monte Carlo (MCMC)
10-708: Probabilistic Graphical Models 10-708, Spring 2014 16 : Markov Chain Monte Carlo MCMC Lecturer: Matthew Gormley Scribes: Yining Wang, Renato Negrinho 1 Sampling from low-dimensional distributions
More informationStochastic optimization Markov Chain Monte Carlo
Stochastic optimization Markov Chain Monte Carlo Ethan Fetaya Weizmann Institute of Science 1 Motivation Markov chains Stationary distribution Mixing time 2 Algorithms Metropolis-Hastings Simulated Annealing
More informationCh5. Markov Chain Monte Carlo
ST4231, Semester I, 2003-2004 Ch5. Markov Chain Monte Carlo In general, it is very difficult to simulate the value of a random vector X whose component random variables are dependent. In this chapter we
More informationSession 3A: Markov chain Monte Carlo (MCMC)
Session 3A: Markov chain Monte Carlo (MCMC) John Geweke Bayesian Econometrics and its Applications August 15, 2012 ohn Geweke Bayesian Econometrics and its Session Applications 3A: Markov () chain Monte
More informationA quick introduction to Markov chains and Markov chain Monte Carlo (revised version)
A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) Rasmus Waagepetersen Institute of Mathematical Sciences Aalborg University 1 Introduction These notes are intended to
More informationLecture 11: Introduction to Markov Chains. Copyright G. Caire (Sample Lectures) 321
Lecture 11: Introduction to Markov Chains Copyright G. Caire (Sample Lectures) 321 Discrete-time random processes A sequence of RVs indexed by a variable n 2 {0, 1, 2,...} forms a discretetime random process
More informationAny live cell with less than 2 live neighbours dies. Any live cell with 2 or 3 live neighbours lives on to the next step.
2. Cellular automata, and the SIRS model In this Section we consider an important set of models used in computer simulations, which are called cellular automata (these are very similar to the so-called
More informationMCMC and Gibbs Sampling. Sargur Srihari
MCMC and Gibbs Sampling Sargur srihari@cedar.buffalo.edu 1 Topics 1. Markov Chain Monte Carlo 2. Markov Chains 3. Gibbs Sampling 4. Basic Metropolis Algorithm 5. Metropolis-Hastings Algorithm 6. Slice
More information16 : Approximate Inference: Markov Chain Monte Carlo
10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution
More informationPowerful tool for sampling from complicated distributions. Many use Markov chains to model events that arise in nature.
Markov Chains Markov chains: 2SAT: Powerful tool for sampling from complicated distributions rely only on local moves to explore state space. Many use Markov chains to model events that arise in nature.
More informationCS 343: Artificial Intelligence
CS 343: Artificial Intelligence Bayes Nets: Sampling Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.
More informationSC7/SM6 Bayes Methods HT18 Lecturer: Geoff Nicholls Lecture 2: Monte Carlo Methods Notes and Problem sheets are available at http://www.stats.ox.ac.uk/~nicholls/bayesmethods/ and via the MSc weblearn pages.
More informationSTOCHASTIC PROCESSES Basic notions
J. Virtamo 38.3143 Queueing Theory / Stochastic processes 1 STOCHASTIC PROCESSES Basic notions Often the systems we consider evolve in time and we are interested in their dynamic behaviour, usually involving
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationMarkov Chains and MCMC
Markov Chains and MCMC CompSci 590.02 Instructor: AshwinMachanavajjhala Lecture 4 : 590.02 Spring 13 1 Recap: Monte Carlo Method If U is a universe of items, and G is a subset satisfying some property,
More informationCS145: Probability & Computing Lecture 18: Discrete Markov Chains, Equilibrium Distributions
CS145: Probability & Computing Lecture 18: Discrete Markov Chains, Equilibrium Distributions Instructor: Erik Sudderth Brown University Computer Science April 14, 215 Review: Discrete Markov Chains Some
More informationMonte Caro simulations
Monte Caro simulations Monte Carlo methods - based on random numbers Stanislav Ulam s terminology - his uncle frequented the Casino in Monte Carlo Random (pseudo random) number generator on the computer
More informationLIMITING PROBABILITY TRANSITION MATRIX OF A CONDENSED FIBONACCI TREE
International Journal of Applied Mathematics Volume 31 No. 18, 41-49 ISSN: 1311-178 (printed version); ISSN: 1314-86 (on-line version) doi: http://dx.doi.org/1.173/ijam.v31i.6 LIMITING PROBABILITY TRANSITION
More informationRobert Collins CSE586, PSU Intro to Sampling Methods
Robert Collins Intro to Sampling Methods CSE586 Computer Vision II Penn State Univ Robert Collins A Brief Overview of Sampling Monte Carlo Integration Sampling and Expected Values Inverse Transform Sampling
More informationMarkov Chains. Andreas Klappenecker by Andreas Klappenecker. All rights reserved. Texas A&M University
Markov Chains Andreas Klappenecker Texas A&M University 208 by Andreas Klappenecker. All rights reserved. / 58 Stochastic Processes A stochastic process X tx ptq: t P T u is a collection of random variables.
More informationRandom Walks A&T and F&S 3.1.2
Random Walks A&T 110-123 and F&S 3.1.2 As we explained last time, it is very difficult to sample directly a general probability distribution. - If we sample from another distribution, the overlap will
More informationCSC 2541: Bayesian Methods for Machine Learning
CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 3 More Markov Chain Monte Carlo Methods The Metropolis algorithm isn t the only way to do MCMC. We ll
More information