A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics, Spring 2015


1 A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics, Spring 2015

Lecture 25: Markov Processes and Markov Chain Monte Carlo

Reading:
Chapter 29 of MacKay (Monte Carlo Methods)
Chapter 12 of Gregory (MCMC)
An Introduction to MCMC for Machine Learning (Andrieu et al. 2003, Machine Learning, 50, 5)
Genetic Algorithms: Principles of Natural Selection Applied to Computation (Stephanie Forrest, Science 1993, 261, 872)

2 Markov Processes

Markov processes are used for modeling as well as in statistical inference problems.
Markov processes are generally nth order: the current state of a system may depend on the n previous states. Most applications consider 1st-order processes.
Hidden Markov processes: a physical system may involve transitions between discrete states, but observables may reflect those states only indirectly (e.g. measurement noise, other physics, etc.).

3 Markov Chains and Markov Processes

Definitions: A Markov process has future samples determined only by the present state and by a transition probability from the present state to a future state. A Markov chain is one that has a countable number of states.

Transitions between states are described by an $n \times n$ stochastic matrix $Q$ with elements $q_{ij}$ comprising the probabilities for changing in a single time step from state $s_i$ to state $s_j$, with $i, j = 1, \ldots, n$. The state probability vector $P$ has elements comprising the ensemble probability of finding the system in each state. E.g. for a three-state system:

States $= \{s_1, s_2, \ldots, s_n\}$,
$$Q = \begin{pmatrix} q_{11} & q_{12} & q_{13} \\ q_{21} & q_{22} & q_{23} \\ q_{31} & q_{32} & q_{33} \end{pmatrix}.$$

Normalization across a row is $\sum_j q_{ij} = 1$ since the system must be in some state at any time. In a single time step the probability of staying in the $i$th state is the metastability $q_{ii}$, and the probability of residing in that state for a time $T$ is proportional to $q_{ii}^{T}$.

4 Example of a two-state Markov process

States $= \{s_1, s_2\}$, $Q = \begin{pmatrix} q_{11} & q_{12} \\ q_{21} & q_{22} \end{pmatrix}$. So
$$Q^2 = \begin{pmatrix} q_{11} & q_{12} \\ q_{21} & q_{22} \end{pmatrix}\begin{pmatrix} q_{11} & q_{12} \\ q_{21} & q_{22} \end{pmatrix} = \begin{pmatrix} q_{11}^2 + q_{12} q_{21} & q_{11} q_{12} + q_{12} q_{22} \\ q_{21} q_{11} + q_{22} q_{21} & q_{21} q_{12} + q_{22}^2 \end{pmatrix}.$$

We want $\lim_{t\to\infty} Q^t$. This gets messy very quickly even though there are only two independent quantities, since $q_{12} = 1 - q_{11}$ and $q_{21} = 1 - q_{22}$. But it can be shown that
$$Q^{\infty} = \begin{pmatrix} p_1 & p_2 \\ p_1 & p_2 \end{pmatrix}, \qquad \text{where} \qquad p_1 = \frac{T_1}{T_1 + T_2}, \qquad p_2 = \frac{T_2}{T_1 + T_2},$$
and $T_1 = (1 - q_{11})^{-1}$ and $T_2 = (1 - q_{22})^{-1}$. Thus the transition probabilities $q_{11}, q_{22}$ determine both the mean lifetime of each state, $T_1$ and $T_2$, and the probabilities $p_1$ and $p_2$ of finding the process in each state.
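A minimal Python sketch (with assumed values for $q_{11}$ and $q_{22}$) makes the convergence concrete:

import numpy as np

# Assumed transition probabilities, for illustration only.
q11, q22 = 0.9, 0.8
Q = np.array([[q11, 1.0 - q11],
              [1.0 - q22, q22]])

# Mean state lifetimes T_i = (1 - q_ii)^(-1) and occupation probabilities.
T1, T2 = 1.0 / (1.0 - q11), 1.0 / (1.0 - q22)
p = np.array([T1, T2]) / (T1 + T2)

# Q^t converges to a matrix whose rows equal the state probability vector.
Qinf = np.linalg.matrix_power(Q, 1000)
print(p)        # [0.6667 0.3333]
print(Qinf[0])  # matches p to numerical precision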

5 Two-state Markov Processes

6 The probability density function (PDF) for the duration of a given state is therefore a geometric series that sums to
$$f_T(T) = T_i^{-1}\,(1 - T_i^{-1})^{T-1}, \qquad T = 1, 2, \ldots, \qquad (1)$$
with mean and rms values
$$T_i = (1 - q_{ii})^{-1}, \qquad \sigma_{T_i}/T_i = q_{ii}^{1/2}. \qquad (2)$$

Asymptotic behavior as the number of steps $t \to \infty$: The transition matrix after $t$ steps is $Q^t$. Under the reasonable assumptions that all elements of $Q$ are non-negative and that all states are accessible in a finite number of steps, $Q^t$ converges to a steady-state form $Q^{\infty}$ as $t \to \infty$ that has identical rows. Each row of $Q^{\infty}$ is equal to the state probability vector $P_{\infty}$, the elements of which are the probabilities that a given time sample is in a particular state. $P_{\infty}$ also equals the normalized left eigenvector of $Q$ that has unity eigenvalue, i.e. $P_{\infty} Q = P_{\infty}$ (e.g. Papoulis). For $P_{\infty}$ to exist, the determinant $\det(Q - I) = 0$ (where $I$ is the identity matrix), but this is automatically satisfied for a stochastic matrix corresponding to a stationary process.

Convergence of $Q^t$ to a matrix with identical rows implies that the transition probabilities trend to those appropriate for an i.i.d. process when the time step $t$ is much larger than the mean lifetimes $T_i$ of any of the states. For a two-state system $P_{\infty}$ has elements $p_1 = (1 - q_{22})/(2 - q_{11} - q_{22})$ and $p_2 = 1 - p_1$.

7 Utility of Markov processes:

1. Modeling: Many processes in the lab and in nature are consistent with being Markov chains. The key elements are a set of discrete states and transitions that are random but occur according to a transition matrix.

2. Sampling: A Markov chain can define a trajectory in the relevant space which can be used to randomly but efficiently sample the space. The key aspect of Markov Chain Monte Carlo is that the trajectory conforms statistically to the asymptotic form of the transition matrix.

8 First-order Markov processes: exponential PDFs for state durations.
Pure two-state processes with different transition probabilities.
Two states with a periodic driving function → quasi-periodic state switching.

9 [Figure-only slide.]

10 State Changes in Pulsars

B yr of state changes (Young et al. 2013); Kramer et al. 2006.
State durations are widely but NOT exponentially distributed.
A strictly periodic forcing function (e.g. an orbit) can produce quasi-periodic state changes.
A stochastic resonance model can produce similar histograms.

11 Statistics are nice, but what are the physics?

Effective potential of a two-state system: state changes = stochastic jumps between wells.
A pulsar magnetosphere + accelerator is essentially a diode circuit with a return current.
Recent models (Liu, Spitkovsky, Timokhin, +) incorporate disks for the return current.
Stochastic resonance arises from periodic modulation of the potential.
Markov switching and stochastic resonance are seen in laboratory diode circuits.
Pulsars are more complicated because they are 2D circuits.
Periodic forcing in the equatorial disk can drive SR.

12 [Reproduced article page:] pubs.acs.org/JCTC

Identifying Metastable States of Folding Proteins
Abhinav Jain and Gerhard Stock*
Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, 79104 Freiburg, Germany

ABSTRACT: Recent molecular dynamics simulations of biopolymers have shown that in many cases the global features of the free energy landscape can be characterized in terms of the metastable conformational states of the system. To identify these states, a conceptually and computationally simple approach is proposed. It consists of (i) an initial preprocessing via principal component analysis to reduce the dimensionality of the data, followed by k-means clustering to generate up to 10^4 microstates, (ii) the most probable path algorithm to identify the metastable states of the system, and (iii) boundary corrections of these states via the introduction of cluster cores in order to obtain the correct dynamics. By adopting two well-studied model problems, hepta-alanine and the villin headpiece protein, the potential and the performance of the approach are demonstrated.

1. INTRODUCTION

While molecular dynamics (MD) simulations account for the structure and dynamics of biomolecules in microscopic detail, they generate huge amounts of data. To extract the essential information and reduce the complex and highly correlated biomolecular motion from 3N atomic coordinates to a few collective degrees of freedom, dimensionality reduction methods such as principal component analysis (PCA) are commonly employed. The resulting low-dimensional representation of the dynamics can then be used to construct the free energy landscape ΔG(V) = -k_B T ln P(V), where P is the probability distribution of the molecular system along the principal components V = {V_1, V_2, ...}. Characterized by its minima (which represent the metastable conformational states of the systems) and its barriers (which connect these states), the energy landscape allows us to account for the pathways and their kinetics occurring in a biomolecular process.

Recent simulations of peptides, proteins, and RNA have shown that in many cases the free energy landscape can be well characterized in terms of metastable conformational states. As an example, Figure 1A shows a two-dimensional free energy landscape of hepta-alanine (Ala_7) obtained from an 800 ns MD simulation with subsequent PCA of the φ, ψ backbone dihedral angles (see section 3). The purple circles on the contour plot readily indicate about 30 well-defined minima (or basins) of the energy surface. They correspond to metastable conformational states, which can be employed to construct a transition network of the dynamics of the system. The network can be analyzed to reveal the relevant pathways of the considered process, or to discuss general features of the system such as the topology (i.e., a hierarchical structure) of the energy landscape and network properties such as scale-freeness.

Also, in protein folding, metastable states have emerged as a new paradigm. Augmenting the funnel picture of folding, the presence of thermally populated metastable states may result in an ensemble of (rather than one or a few) folding pathways. Moreover, they can result in kinetic traps, which may considerably extend the average folding time. As an example, Figure 1B shows the free energy landscape of the villin headpiece subdomain, obtained from a PCA of extensive folding trajectories by Pande and co-workers (see section 4).
Due to the high dimensionality of the energy landscape, the two-dimensional projection only vaguely indicates the multiple minima of the protein. Although energy landscapes as in Figure 1 appear to easily provide the location of the energy minima, in general it turns out that metastable states are surprisingly difficult to identify, even for a seemingly simple system like Ala_7. To partition the conformational space into clusters of data points representing the states, one may use either geometric clustering methods such as k-means, which require only data in a metric space, or kinetic clustering methods, which additionally require dynamical information on the process. While geometrical methods are fast and easy to use, they show several well-known flaws. For example, since they usually require one to fix the number of clusters k beforehand, it easily happens that one combines two separate states into one (if k is chosen too small) or cuts one state into two (if k is chosen too large). Another problem is the appropriate definition of the border between two clusters. From a dynamical point of view, the correct border is clearly located at the top of the energy barrier between the two states. Using exclusively geometrical criteria, however, the middle between the two cluster centers appears as an obvious choice; see Figure 2A. As a consequence, conformational fluctuations in a single minimum of the energy surface may erroneously be taken as transitions to another energy minimum; see Figure 2B. The same problem may occur for systems with low energy barrier heights, say ΔG_B ≲ 3 k_B T.

Kinetic cluster algorithms may avoid these problems by using the dynamical information provided by the time evolution of the MD trajectory. In a first step, the conformational space is partitioned into disjoint microstates, which can be obtained, e.g., by geometrical clustering (see section 2.1). Employing these microstates, we calculate the transition matrix {T_mn} from the MD trajectory, where T_mn represents the probability that

Special Issue: Wilfred F. van Gunsteren Festschrift. Received: January 3, 2012. © 2012 American Chemical Society. dx.doi.org/10.1021/ct300077q, J. Chem. Theory Comput.

13 Journal of Chemical Theory and Computation, Article (continued)

Figure 2. Common problems in the identification of metastable conformational states, illustrated for a two-state model, which is represented by a schematic free energy curve along some reaction coordinate r. (A) Although the top of the energy barrier between the two states clearly represents the correct border, geometrical clustering methods may rather choose the geometrical middle between the two cluster centers. (B) Typical time evolution of an MD trajectory along r for the two-state model and the corresponding probability distribution P(r). Low barrier heights or an inaccurate definition of the separating barrier may cause intrastate fluctuations to be mistaken for interstate transitions. The overlapping region of the two states is indicated by Δr. The introduction of cluster cores (shaded areas) can correct for this.

Figure 1. Free energy landscape (in units of k_B T) of (A) hepta-alanine and (B) the villin headpiece as a function of the first two principal components. The metastable conformational states of the system show up as minima of the energy surface (see Tables 1 and 2 for the labeling).

...analyzing the eigenfunctions of the transition matrix, or by employing steepest-descent-type algorithms. The choice of the method may depend to a large extent on the application in mind. For example, an important purpose of kinetic clustering is the construction of discrete Markov state

14 Properties of Markov processes relevant to MCMC

1. The state probability vector evolves as $P_t = P_{t-1} Q$, where $Q$ is the transition matrix.
2. This implies $P_t = P_0 Q^t$.
3. As $t \to \infty$, $Q^t \to Q^{\infty}$, where $Q^{\infty}$ has equal rows, each equal to the asymptotic state probability vector, i.e. the ensemble probabilities for each state, which we write as $P_{\infty}$ (rather than $P_t$).
4. The eigenvectors of the single-step transition matrix $Q$ include one that equals the state probability vector; the associated eigenvalue is unity (see Papoulis).

15 Numerical example of a 3x3 matrix (Python output)

[Python output: mean state durations $T_1, T_2, T_3$ for a 3x3 transition matrix $Q$ whose first row was $[0.5,\ 0.1,\ 0.4]$, followed by successive powers $Q^2, Q^{10}, Q^{50}, \ldots$; the remaining numerical entries were lost in extraction.]

Convergence to equal rows = state probability vector.

16 Numerical example of a 3x3 matrix (Python output), continued

[Python output: $Q^t$ for large $t$ showing convergence to equal rows = state probability vector.]

Eigenvalue problem: $P$ = state probability vector (row vector), $Q$ = transition matrix. $PQ = P \Rightarrow P(Q - I) = 0$, so the eigenvector that has unit eigenvalue is equal to $P$.

[Python output: eigenvalues and eigenvectors of $Q$; normalizing the eigenvector with eigenvalue = 1 gives the state probability vector $P$. The state probabilities from $Q^t$ for large $t$ and from the eigenvector are the same.]
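A short Python sketch of this example shows both the convergence of $Q^t$ and the left-eigenvector calculation. Only the first row of the slide's matrix survived extraction, so rows 2 and 3 below are stand-ins:

import numpy as np

# Stand-in 3x3 stochastic matrix (rows sum to 1); only row 1 is from the slide.
Q = np.array([[0.50, 0.10, 0.40],
              [0.25, 0.60, 0.15],
              [0.30, 0.30, 0.40]])

# Convergence of Q^t to equal rows.
for t in (2, 10, 50):
    print(t, np.linalg.matrix_power(Q, t)[0])

# Left eigenvector with unit eigenvalue: solve P Q = P via eig of Q^T.
vals, vecs = np.linalg.eig(Q.T)
k = np.argmin(np.abs(vals - 1.0))
P = np.real(vecs[:, k])
P /= P.sum()   # normalize to a probability vector
print(P)       # equals any row of Q^t for large t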

17 MCMC

18 Markov Chain Monte Carlo

Monte Carlo methods: various statistical calculations are done by using random samples that represent the relevant domain. Stated generally, an integral
$$I = \int dx\, g(x)$$
can be approximated as a sum over samples $x_j$, $j = 1, \ldots, n$:
$$I \approx \frac{1}{n} \sum_{j=1}^{n} g(x_j).$$
For a simple domain (e.g. 1D, 2D) samples over a uniform grid can be used. However, for a high number of dimensions, and where the full extent of the function is not known, a more intelligently selected set of samples may yield faster convergence.

The error in the estimate of $I$ is
$$\sigma_{\hat I} \approx \frac{\sigma_g}{\sqrt{n}}, \qquad \text{where} \qquad \sigma_g^2 = \frac{1}{n}\sum_j g^2(x_j) - \left[\frac{1}{n}\sum_j g(x_j)\right]^2.$$
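A minimal sketch of this estimator, using an arbitrary example integrand g(x) = exp(-x^2) on [0, 1] with uniform samples:

import numpy as np

rng = np.random.default_rng(0)

g = lambda x: np.exp(-x**2)   # arbitrary example integrand
n = 100_000
x = rng.uniform(0.0, 1.0, n)  # uniform samples over the domain
gx = g(x)

I_hat = gx.mean()             # Monte Carlo estimate of I
err = gx.std() / np.sqrt(n)   # sigma_g / sqrt(n) error estimate
print(I_hat, '+/-', err)      # true value is about 0.7468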

19 Sampling methods: A common problem is the sampling of a posterior PDF in a Bayesian analysis with high dimensionality. In general it is much easier to get the shape of the PDF than it is to get the normalization, because the latter requires integration over the full parameter space. Also, even if the normalization were known, sampling in multiple dimensions is difficult.

Consider sampling from a function $P(x)$ that could be, for example, a posterior PDF where $x$ is a vector of parameters. If we don't know the normalization, we can write $P(x) = P^*(x)/Z$, where $P^*$ is the unnormalized function.

Uniform sampling: sample randomly but with uniform probability over each dimension of the parameter space. This is highly inefficient if probability is concentrated in islands in parameter space.

Importance sampling: samples are drawn from a different function $Q(x)$ whose support covers that of $P(x)$. $Q$ is chosen to be a function that is simpler to draw samples from. In the desired summation, samples are then weighted according to
$$w_j = \frac{P^*(x_j)}{Q(x_j)}, \qquad \hat I = \frac{\sum_j w_j\, g(x_j)}{\sum_j w_j}.$$
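A minimal sketch of self-normalized importance sampling, with an assumed unnormalized target P*(x) (a unit-variance Gaussian offset to mean 2, Z deliberately unused) and a broad Gaussian sampling density Q(x):

import numpy as np

rng = np.random.default_rng(1)

# Unnormalized target and Gaussian sampling density (both illustrative).
P_star = lambda x: np.exp(-0.5 * (x - 2.0)**2)
mu_q, sig_q = 0.0, 4.0
x = rng.normal(mu_q, sig_q, 100_000)
Q_pdf = np.exp(-0.5 * ((x - mu_q) / sig_q)**2) / (sig_q * np.sqrt(2.0 * np.pi))

w = P_star(x) / Q_pdf                 # importance weights w_j
I_hat = np.sum(w * x) / np.sum(w)     # weighted estimate of E[x] under P
print(I_hat)                          # ~ 2.0, the target mean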

20 Rejection sampling: A proposal density $Q(x)$ is used, scaled by a constant $c$ so that $cQ(x)$ is larger than $P^*(x)$ for all $x$. First a random $x_j$ is generated from $Q$. Then a uniform number $u \in [0, 1]$ is generated. If $u < P^*(x_j)/cQ(x_j)$, the sample is accepted; otherwise it is rejected. See Figure 29.8 of MacKay.

Metropolis-Hastings (MH) method: Unlike the rejection method, the MH method chooses a new sample based on the current value of $x$, i.e. a sequence of $x_j$ values is viewed as a time sequence of a Markov process or chain. The proposal density depends on the current state and is in fact related to the transition matrix of a Markov process.

The MH algorithm exploits the fact that a Markov process has a state probability vector that converges to a stable form if the process satisfies two conditions: (1) it is not periodic; and (2) all states are accessible from any other state. If these are satisfied, the Markov process will have stationary statistics.

The trick and beauty of the MH algorithm is that a well-chosen transition matrix will allow the Markov process to converge to a state probability vector equal to that of the PDF $P(x)$ even if the normalization of $P(x)$ is not known. In this context, $P(x)$ is called the target density.

The MH algorithm provides two things: (1) a time sequence of samples drawn from the target density $P(x)$; (2) the PDF $P(x)$ itself.
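A minimal sketch of rejection sampling for the same assumed target as above, with an envelope chosen by hand so that cQ(x) = exp(-(x-2)^2/18) >= P*(x) everywhere:

import numpy as np

rng = np.random.default_rng(2)

# Unnormalized target, maximum value 1 at x = 2 (illustrative).
P_star = lambda x: np.exp(-0.5 * (x - 2.0)**2)
# Gaussian proposal and constant c such that c*Q(x) >= P*(x) for all x.
mu_q, sig_q = 2.0, 3.0
c = sig_q * np.sqrt(2.0 * np.pi)
Q_pdf = lambda x: np.exp(-0.5 * ((x - mu_q) / sig_q)**2) / (sig_q * np.sqrt(2.0 * np.pi))

samples = []
while len(samples) < 10_000:
    xj = rng.normal(mu_q, sig_q)          # draw from Q
    u = rng.uniform()                     # u in [0, 1]
    if u < P_star(xj) / (c * Q_pdf(xj)):  # accept with probability P*/(cQ)
        samples.append(xj)

samples = np.asarray(samples)
print(samples.mean(), samples.std())      # ~ 2.0 and ~ 1.0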

21 [Figure, from MacKay: functions involved in importance sampling. We wish to estimate the expectation of $\phi(x)$ under $P(x) \propto P^*(x)$. We can generate samples from the simpler distribution $Q(x) \propto Q^*(x)$. We can evaluate $Q^*$ and $P^*$ at any point.]

22 (a) P (x) cq (x) (b) P (x) cq (x) u x x x Figure Rejection sampling. (a) The functions involved in rejection sampling. We desire samples from P (x) P (x). We are able to draw samples from Q(x) Q (x), and we know a value c such that c Q (x) > P (x) for all x. (b) A point (x, u) is generated at random in the lightly shaded area under the curve c Q (x). If this point also lies below P (x) then it is accepted. From MacKay

23 Detailed balance and the choice of acceptance probability

The Metropolis-Hastings algorithm hinges on three requirements for a Markov chain:

1. All states are accessible from any other state in a finite number of steps. This is often called irreducibility, because if some states were not accessible, the chain could be reduced in size.

2. The chain must not be periodic. If it were, it could get stuck in a limit cycle.

3. The chain asymptotes to a state probability vector that is stationary: it does not depend on time (as in our usual definition of stationarity). This also means that the asymptotic PDF is equal to the target PDF (e.g. the posterior PDF of Bayesian inference). The target PDF is a left eigenvector of the transition matrix $Q$ with unit eigenvalue: $PQ = P$.

4. For this to be true, detailed balance must hold, meaning that the probability flux between any two possible values of the chain is the same in both directions.

24 The eigenvalue equation can be written
$$P(x) = \sum_{x'} P(x')\, q(x' \to x)$$
where

$P(x)$ = state probability vector or target PDF; often written as $\pi(x)$. Gregory writes this as $P(X_t \mid D, I)$ for Bayesian inference contexts.

$q(x' \to x)$ = transition probability between states $x'$ and $x$. Not to be confused with the proposal density. Gregory writes this as $p(X_{t+1} \mid X_t)$.

Satisfying the eigenvalue equation requires that
$$P(x')\, q(x' \to x) = P(x)\, q(x \to x'). \qquad (*)$$

This can be demonstrated explicitly:
$$P(x) = \sum_{x'} P(x')\, q(x' \to x) \qquad \text{substitute using equation (*)}$$
$$= \sum_{x'} P(x)\, q(x \to x') \qquad \text{factor out } P(x)$$
$$= P(x) \sum_{x'} q(x \to x') \qquad \text{sum of destination probabilities} = 1$$
$$= P(x).$$

25 It is useful to separate the transition probability into two terms: one for the probability of moving to a new state, the other for staying in the same state:
$$q(x \to x') = \tilde q(x \to x') + r(x)\, \delta_{x,x'}.$$
These satisfy
$$\sum_{x'} q(x \to x') = 1 = \sum_{x'} \tilde q(x \to x') + \underbrace{\sum_{x'} r(x)\, \delta_{x,x'}}_{=\, r(x)}$$
so that
$$\sum_{x'} \tilde q(x \to x') = 1 - r(x).$$
Using this separation it can be shown that detailed balance still holds.

26 Metropolis Algorithm

[Diagram: from the current state $X_t$, a candidate state $X_{t+1}$ is accepted with acceptance probability $a$ and rejected with probability $1 - a$.]

Choose $a$ such that the probabilities of reaching different values of $X$ are given by the target PDF.
The target PDF is reached asymptotically, at a rate that depends on the proposal PDF used to generate trial values of $X_{t+1}$.
Detailed balance is achieved (as many transitions out of as into a given state), which also means that the Markov sequence is time reversible.

27 Determining the acceptance probability: On previous pages we used the true transition matrix $q(x \to x')$ that defines the Markov chain and that has the target PDF as its eigen-PDF. For MCMC problems, we are free to choose any transition matrix we like, but its performance may or may not be suitable for a particular application. As Gregory says, finding an ideal proposal distribution is an art.

So let a candidate transition matrix be $Q(x \to x')$, normalized in the usual way:
$$\sum_{x'} Q(x \to x') = 1.$$
Generally $Q$ will not satisfy detailed balance for the target PDF:
$$P(x)\, Q(x \to x') \ne P(x')\, Q(x' \to x).$$
We fix this by putting in a fudge factor $a(x \to x')$:
$$P(x)\, Q(x \to x')\, a(x \to x') = P(x')\, Q(x' \to x)$$
or
$$a(x \to x') = \frac{P(x')\, Q(x' \to x)}{P(x)\, Q(x \to x')}.$$
We don't want the factor to exceed unity, however, so we write
$$a(x \to x') = \min\left[1,\ \frac{P(x')\, Q(x' \to x)}{P(x)\, Q(x \to x')}\right].$$

28 MCMC exploits this convergence to the ensemble state probabilities. The simplest form of the algorithm:

1. Choose a proposal density $Q(y, x_t)$ that will be used to determine the value of $x_{t+1}$. Suppose that this proposal density is symmetric in its arguments.
2. Generate a value $y$ from the proposal density.
3. Calculate the test ratio $a = P(y)/P(x_t)$. The test ratio is the acceptance probability for the candidate sample $y$.
4. Choose a random number $u \in [0, 1]$.
5. If $a \ge 1$, accept the sample and set $x_{t+1} = y$.
6. If $a < 1$, accept $y$ if $u \le a$ and set $x_{t+1} = y$.
7. Otherwise set $x_{t+1} = x_t$ (i.e. the new value equals the previous value).
8. Each time step has a value.
9. The sampling steers the time sequence favorably toward regions of higher probability but allows the trajectory to move to regions of low probability.
10. Samples are correlated, as with a random-walk type process.
11. The burn-in time corresponds to the initial, transient portion of the time series $x_t$ that it takes the Markov process to converge. Often the autocorrelation function of the time sequence is used to diagnose the time series.
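A minimal Python sketch of steps 1-7, using an assumed Gaussian target (mean 2, unit variance) and a symmetric random-walk Gaussian proposal:

import numpy as np

rng = np.random.default_rng(3)

def metropolis(P_star, x0, sigma_prop, n):
    # Random-walk Metropolis with a symmetric Gaussian proposal.
    x = np.empty(n)
    x[0] = x0
    for t in range(n - 1):
        y = x[t] + rng.normal(0.0, sigma_prop)  # step 2: propose
        a = P_star(y) / P_star(x[t])            # step 3: test ratio
        # steps 4-7: accept if a >= 1, else accept with probability a
        x[t + 1] = y if rng.uniform() <= a else x[t]
    return x

P_star = lambda x: np.exp(-0.5 * (x - 2.0)**2)  # assumed unnormalized target
chain = metropolis(P_star, x0=0.0, sigma_prop=1.0, n=50_000)
burn = 1000                                     # discard the burn-in
print(chain[burn:].mean(), chain[burn:].std())  # ~ 2.0 and ~ 1.0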

29 [Figure, from MacKay: Metropolis-Hastings method in one dimension. The proposal distribution $Q(x'; x)$ is here shown as having a shape that changes as $x$ changes, though this is not typical of the proposal densities used in practice.]

30 For general, possibly asymmetric forms of the transition matrix, the test ratio is
$$a = \frac{P(y)\, Q(x_t; y)}{P(x_t)\, Q(y; x_t)}.$$
It reduces to the previous form when $Q$ is symmetric in its arguments.

This form preserves the detailed balance of the Markov process (meaning that statistically the same results are obtained under time reversal) that is required in order for the state probability vector to converge to the desired target PDF.

A system in thermal equilibrium has as many particles leaving a state as entering it. By analogy, a Markov process that has stationary statistics must also satisfy detailed balance. With the acceptance probability defined above, the Markov chain will satisfy detailed balance. See Gregory, Section 12.3 for a proof; also the paper by Andrieu et al. on the course web page.
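For a discrete chain this detailed-balance property can be checked numerically. The sketch below builds the full MH transition matrix K for an assumed three-state target P and an assumed asymmetric proposal matrix Q, then verifies P_i K_ij = P_j K_ji and P K = P:

import numpy as np

P = np.array([0.2, 0.5, 0.3])    # assumed target PDF on 3 states
Q = np.array([[0.1, 0.6, 0.3],   # assumed asymmetric proposal matrix
              [0.3, 0.4, 0.3],
              [0.5, 0.2, 0.3]])

n = len(P)
K = np.zeros((n, n))             # MH transition matrix
for i in range(n):
    for j in range(n):
        if i != j:
            a = min(1.0, P[j] * Q[j, i] / (P[i] * Q[i, j]))
            K[i, j] = Q[i, j] * a
    K[i, i] = 1.0 - K[i].sum()   # rejection probability stays at state i

F = P[:, None] * K               # probability flux P_i K_ij
print(np.allclose(F, F.T))       # True: detailed balance holds
print(P @ K)                     # equals P: the target is stationary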

31 Machine Learning, 50, 5-43, 2003. © 2003 Kluwer Academic Publishers. Manufactured in The Netherlands.

An Introduction to MCMC for Machine Learning

CHRISTOPHE ANDRIEU, C.Andrieu@bristol.ac.uk, Department of Mathematics, Statistics Group, University of Bristol, University Walk, Bristol BS8 1TW, UK
NANDO DE FREITAS, nando@cs.ubc.ca, Department of Computer Science, University of British Columbia, 2366 Main Mall, Vancouver, BC V6T 1Z4, Canada
ARNAUD DOUCET, doucet@ee.mu.oz.au, Department of Electrical and Electronic Engineering, University of Melbourne, Parkville, Victoria 3052, Australia
MICHAEL I. JORDAN, jordan@cs.berkeley.edu, Departments of Computer Science and Statistics, University of California at Berkeley, 387 Soda Hall, Berkeley, CA, USA

Abstract. The purpose of this introductory paper is threefold. First, it introduces the Monte Carlo method with emphasis on probabilistic machine learning. Second, it reviews the main building blocks of modern Markov chain Monte Carlo simulation, thereby providing an introduction to the remaining papers of this special issue. Lastly, it discusses new interesting research horizons.

Keywords: Markov chain Monte Carlo, MCMC, sampling, stochastic algorithms

1. Introduction

A recent survey places the Metropolis algorithm among the ten algorithms that have had the greatest influence on the development and practice of science and engineering in the 20th century (Beichl & Sullivan, 2000). This algorithm is an instance of a large class of sampling algorithms, known as Markov chain Monte Carlo (MCMC). These algorithms have played a significant role in statistics, econometrics, physics and computing science over the last two decades. There are several high-dimensional problems, such as computing the volume of a convex body in d dimensions, for which MCMC simulation is the only known general approach for providing a solution within a reasonable time (polynomial in d) (Dyer, Frieze, & Kannan, 1991; Jerrum & Sinclair, 1996).

While convalescing from an illness in 1946, Stan Ulam was playing solitaire. It then occurred to him to try to compute the chances that a particular solitaire laid out with 52 cards would come out successfully (Eckhardt, 1987). After attempting exhaustive combinatorial calculations, he decided to go for the more practical approach of laying out several solitaires at random and then observing and counting the number of successful plays. This idea of selecting a statistical sample to approximate a hard combinatorial problem by a much simpler problem is at the heart of modern Monte Carlo simulation.

32 [Page 16 of Andrieu et al.: Figure 5, the Metropolis-Hastings algorithm; Figure 6, target distribution and histogram of the MCMC samples at different iteration points.]

The MH algorithm is very simple, but it requires careful design of the proposal distribution $q(x^* \mid x)$. In subsequent sections, we will see that many MCMC algorithms arise by considering specific choices of this distribution. In general, it is possible to use suboptimal inference and learning algorithms to generate data-driven proposal distributions.

The transition kernel for the MH algorithm is
$$K_{\rm MH}\big(x^{(i+1)} \mid x^{(i)}\big) = q\big(x^{(i+1)} \mid x^{(i)}\big)\, \mathcal{A}\big(x^{(i)}, x^{(i+1)}\big) + \delta_{x^{(i)}}\big(x^{(i+1)}\big)\, r\big(x^{(i)}\big),$$

33 Toy examples of MCMC using Gaussian target and proposal PDFs

The target PDF is $N(\mu, \sigma^2)$. For a proposal PDF we use $N(\mu_p, \sigma_p^2)$ that is wide enough that values are generated that overlap with the target PDF. So we use $\mu_p = 0$ and $\sigma_p = 3(\mu^2 + \sigma^2)^{1/2}$.

In practice, of course, we would not know the parameters of the target PDF (otherwise what would be the point of doing MCMC?) and we might not know its support in parameter space. Experimentation may be required to ensure that the parameter space is adequately sampled.

34 Plots:

Histograms of MC points $x_t$, $t = 1, \ldots, N$, for different $N$ and different $\mu$ and $\sigma$.
Autocovariance functions of $x_t - \hat\mu_x$ for single realizations, which show the correlation time of the MC time series.

Lessons: the more the target and proposal PDFs differ, the longer it takes for the time series to show stationary statistics that conform to the target PDF. The burn-in time is thus longer in such cases because it is related to the autocorrelation time.

Example time series are shown for two of the cases to illustrate the burn-in time and the correlation time.
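A sketch that loosely reproduces these experiments (the slide's numerical values were largely lost, so the target parameters, proposal width, and chain lengths below are stand-ins). Because the proposal here is a zero-mean independence sampler rather than a symmetric random walk, the general test ratio from slide 30, with the proposal factors included, is used:

import numpy as np

rng = np.random.default_rng(4)

mu, sig = 1.38, 0.29                       # stand-in target N(mu, sig^2)
sig_p = 3.0 * np.sqrt(mu**2 + sig**2)      # broad zero-mean proposal width
P_star = lambda x: np.exp(-0.5 * ((x - mu) / sig)**2)
Q_shape = lambda x: np.exp(-0.5 * (x / sig_p)**2)  # proposal shape (norm cancels)

x = 0.0
for N in (8, 32, 128, 512, 2048, 8192):
    xs = np.empty(N)
    for t in range(N):
        y = rng.normal(0.0, sig_p)         # draw from the zero-mean proposal
        a = P_star(y) * Q_shape(x) / (P_star(x) * Q_shape(y))
        if rng.uniform() <= a:
            x = y
        xs[t] = x
    print(N, xs.mean(), xs.std())          # estimates approach mu and sig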

35 Histograms

Demonstrate how the distribution of MC points trends to the target PDF.
Target PDF = Gaussian with non-zero mean.
Proposal PDF = $N(0, \sigma_p^2)$ with $\sigma_p$ wide enough to span the target PDF.

36 [Figure: MCMC of an offset Gaussian target PDF using a zero-mean Gaussian proposal PDF. Histograms of the MC samples for increasing chain lengths (N = 8, 32, ...), with the proposal and target PDFs overlaid and the target parameters $\mu, \sigma$ and sample estimates $\hat\mu, \hat\sigma$ annotated on each panel.]

37 [Same figure as slide 36, with the target $\mu, \sigma$ annotations highlighted.]

38 [Same figure as slide 36, with the sample estimates $\hat\mu, \hat\sigma$ from the MC values highlighted.]

39 Four cases with different target PDFs

Even for target PDFs with large means, we obtain convergence.

40 [Figure: histograms for four cases of offset Gaussian target PDFs sampled with zero-mean Gaussian proposal PDFs. Note only 2 states for the 1st 8 MC samples.]

41 [Figure: narrow target PDF case; the histograms converge to the target as N grows.]

42 [Figure: broader target PDF case.]

43 [Figure: even broader target PDF, with the proposal PDF proportionately broader.]

44 [Figure: a sequence of progressively narrower target PDFs.]

45 [Figure: continuation of the narrowing-target sequence.]

46 [Figure: narrowest target PDF in the sequence.]

47 ACFs of MCMC-generated Time Series

The width of the ACF gives the correlation time of the time series.
Too long a correlation time → inefficient sampling of parameter space.
Longer correlation times correspond to proposal PDFs that have larger support relative to the support of the target PDF.
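A minimal sketch of this diagnostic: estimate the normalized ACF of a chain and its integrated correlation time, which sets the effective number of independent samples. An AR(1) series stands in for MCMC output here; any chain from the earlier sketches can be substituted:

import numpy as np

def acf(x, max_lag):
    # Normalized autocovariance of a time series; acf[0] = 1.
    x = np.asarray(x) - np.mean(x)
    var = np.dot(x, x) / len(x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / (len(x) * var)
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(5)

# AR(1) stand-in for a correlated MCMC time series.
rho, n = 0.9, 50_000
chain = np.empty(n)
chain[0] = 0.0
for t in range(1, n):
    chain[t] = rho * chain[t - 1] + rng.normal()

r = acf(chain, max_lag=200)
tau_int = 1.0 + 2.0 * r[1:].sum()   # integrated correlation time
print(tau_int, n / tau_int)         # tau ~ (1+rho)/(1-rho) = 19; eff. samples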

48 Time Series of MCMC Samples: case with a wide target PDF. [Figure: the chain $x_t$ versus time step.]

49 Time Series of MCMC Samples: case with a narrow target PDF. [Figure: two realizations of the chain, with the burn-in time indicated.]

50 [Figure: full ACF of the MCMC time series, and a zoom-in to the innermost lags (same case, different realization).]

51 Relatively wide target PDF. [Figure: ACVs versus lag for two realizations.]

52 Wider target PDF → narrower ACF. [Figure: ACVs versus lag.]

53 Narrower target PDF → wider ACF. [Figure: ACV versus lag.]

54 Narrower target PDF → wider ACF (continued). [Figure: ACV versus lag.]

55 Narrower target PDF → wider ACF (continued). [Figure: ACV versus lag.]

56 Unsuitable Proposal PDFs

57-68 [Figure-only slides illustrating unsuitable proposal PDFs.]


More information

Markov chain Monte Carlo

Markov chain Monte Carlo Markov chain Monte Carlo Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Revised on April 24, 2017 Today we are going to learn... 1 Markov Chains

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo Chapter 5 Markov Chain Monte Carlo MCMC is a kind of improvement of the Monte Carlo method By sampling from a Markov chain whose stationary distribution is the desired sampling distributuion, it is possible

More information

Multimodal Nested Sampling

Multimodal Nested Sampling Multimodal Nested Sampling Farhan Feroz Astrophysics Group, Cavendish Lab, Cambridge Inverse Problems & Cosmology Most obvious example: standard CMB data analysis pipeline But many others: object detection,

More information

Adaptive Monte Carlo methods

Adaptive Monte Carlo methods Adaptive Monte Carlo methods Jean-Michel Marin Projet Select, INRIA Futurs, Université Paris-Sud joint with Randal Douc (École Polytechnique), Arnaud Guillin (Université de Marseille) and Christian Robert

More information

Physics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester

Physics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester Physics 403 Numerical Methods, Maximum Likelihood, and Least Squares Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Quadratic Approximation

More information

arxiv:astro-ph/ v1 14 Sep 2005

arxiv:astro-ph/ v1 14 Sep 2005 For publication in Bayesian Inference and Maximum Entropy Methods, San Jose 25, K. H. Knuth, A. E. Abbas, R. D. Morris, J. P. Castle (eds.), AIP Conference Proceeding A Bayesian Analysis of Extrasolar

More information

Answers and expectations

Answers and expectations Answers and expectations For a function f(x) and distribution P(x), the expectation of f with respect to P is The expectation is the average of f, when x is drawn from the probability distribution P E

More information

CS 343: Artificial Intelligence

CS 343: Artificial Intelligence CS 343: Artificial Intelligence Bayes Nets: Sampling Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.

More information

Sampling Algorithms for Probabilistic Graphical models

Sampling Algorithms for Probabilistic Graphical models Sampling Algorithms for Probabilistic Graphical models Vibhav Gogate University of Washington References: Chapter 12 of Probabilistic Graphical models: Principles and Techniques by Daphne Koller and Nir

More information

Exercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters

Exercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters Exercises Tutorial at ICASSP 216 Learning Nonlinear Dynamical Models Using Particle Filters Andreas Svensson, Johan Dahlin and Thomas B. Schön March 18, 216 Good luck! 1 [Bootstrap particle filter for

More information

COMS 4771 Probabilistic Reasoning via Graphical Models. Nakul Verma

COMS 4771 Probabilistic Reasoning via Graphical Models. Nakul Verma COMS 4771 Probabilistic Reasoning via Graphical Models Nakul Verma Last time Dimensionality Reduction Linear vs non-linear Dimensionality Reduction Principal Component Analysis (PCA) Non-linear methods

More information

SAMPLING ALGORITHMS. In general. Inference in Bayesian models

SAMPLING ALGORITHMS. In general. Inference in Bayesian models SAMPLING ALGORITHMS SAMPLING ALGORITHMS In general A sampling algorithm is an algorithm that outputs samples x 1, x 2,... from a given distribution P or density p. Sampling algorithms can for example be

More information

Monte Carlo (MC) Simulation Methods. Elisa Fadda

Monte Carlo (MC) Simulation Methods. Elisa Fadda Monte Carlo (MC) Simulation Methods Elisa Fadda 1011-CH328, Molecular Modelling & Drug Design 2011 Experimental Observables A system observable is a property of the system state. The system state i is

More information

Neural Networks for Machine Learning. Lecture 11a Hopfield Nets

Neural Networks for Machine Learning. Lecture 11a Hopfield Nets Neural Networks for Machine Learning Lecture 11a Hopfield Nets Geoffrey Hinton Nitish Srivastava, Kevin Swersky Tijmen Tieleman Abdel-rahman Mohamed Hopfield Nets A Hopfield net is composed of binary threshold

More information

Markov Chains Handout for Stat 110

Markov Chains Handout for Stat 110 Markov Chains Handout for Stat 0 Prof. Joe Blitzstein (Harvard Statistics Department) Introduction Markov chains were first introduced in 906 by Andrey Markov, with the goal of showing that the Law of

More information

REVIEW FOR EXAM III SIMILARITY AND DIAGONALIZATION

REVIEW FOR EXAM III SIMILARITY AND DIAGONALIZATION REVIEW FOR EXAM III The exam covers sections 4.4, the portions of 4. on systems of differential equations and on Markov chains, and..4. SIMILARITY AND DIAGONALIZATION. Two matrices A and B are similar

More information

Sequential Monte Carlo Methods for Bayesian Computation

Sequential Monte Carlo Methods for Bayesian Computation Sequential Monte Carlo Methods for Bayesian Computation A. Doucet Kyoto Sept. 2012 A. Doucet (MLSS Sept. 2012) Sept. 2012 1 / 136 Motivating Example 1: Generic Bayesian Model Let X be a vector parameter

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

References. Markov-Chain Monte Carlo. Recall: Sampling Motivation. Problem. Recall: Sampling Methods. CSE586 Computer Vision II

References. Markov-Chain Monte Carlo. Recall: Sampling Motivation. Problem. Recall: Sampling Methods. CSE586 Computer Vision II References Markov-Chain Monte Carlo CSE586 Computer Vision II Spring 2010, Penn State Univ. Recall: Sampling Motivation If we can generate random samples x i from a given distribution P(x), then we can

More information

Who was Bayes? Bayesian Phylogenetics. What is Bayes Theorem?

Who was Bayes? Bayesian Phylogenetics. What is Bayes Theorem? Who was Bayes? Bayesian Phylogenetics Bret Larget Departments of Botany and of Statistics University of Wisconsin Madison October 6, 2011 The Reverand Thomas Bayes was born in London in 1702. He was the

More information

STAT 425: Introduction to Bayesian Analysis

STAT 425: Introduction to Bayesian Analysis STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 2) Fall 2017 1 / 19 Part 2: Markov chain Monte

More information