Computer intensive statistical methods
Lecture 11: Markov Chain Monte Carlo (cont.)
October 6, 2015
Jonas Wallin, Chalmers, Gothenburg University
The two stage Gibbs sampler

If the conditional distributions are easy to sample from, one can use the Gibbs sampler:

Start with $X^{(0)}_{1:2} = (X^{(0)}_1, X^{(0)}_2)$
for $l = 1, \ldots, N$ do
    draw $X^{(l)}_1 \sim \pi(x_1 \mid X^{(l-1)}_2)$
    draw $X^{(l)}_2 \sim \pi(x_2 \mid X^{(l)}_1)$
end for
return $X = (X^{(0)}_{1:2}, \ldots, X^{(N)}_{1:2})$

The output of the algorithm is a Markov chain.
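As an illustration (my own sketch, not from the slides), here is the two-stage Gibbs sampler for a standard bivariate normal target with correlation $\rho$, where both full conditionals are normal:

```python
import math
import random

def gibbs_bivariate_normal(rho, n_iter, seed=0):
    """Two-stage Gibbs sampler for a standard bivariate normal with
    correlation rho; each full conditional is N(rho * other, 1 - rho^2)."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    x1, x2 = 0.0, 0.0                     # X^(0)
    chain = [(x1, x2)]
    for _ in range(n_iter):
        x1 = rng.gauss(rho * x2, sd)      # draw X1 | X2 = x2
        x2 = rng.gauss(rho * x1, sd)      # draw X2 | X1 = x1
        chain.append((x1, x2))
    return chain

chain = gibbs_bivariate_normal(rho=0.8, n_iter=20000)
mean_x1 = sum(x for x, _ in chain) / len(chain)
```

The returned sequence is a realization of a Markov chain whose stationary distribution is the joint target.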
Repetition: Markov chains

A Markov chain on $\chi \subseteq \mathbb{R}^d$ is a family of random variables (a stochastic process) $(X_k)_{k \ge 0}$, where each $X_k$ takes values in $\chi$, and
$$P(X_{k+1} \in B \mid X_0, X_1, \ldots, X_k) = P(X_{k+1} \in B \mid X_k).$$
The density $q$ of the distribution of $X_{k+1}$ given $X_k = x$ is called the transition density of $(X_k)$. Consequently,
$$P(X_{k+1} \in B \mid X_k = x_k) = \int_B q(x_{k+1} \mid x_k)\, dx_{k+1}.$$
As a first example we considered an AR(1) process: $X_0 = 0$, $X_{k+1} = \alpha X_k + \epsilon_{k+1}$, where $\alpha$ is a constant and $(\epsilon_k)$ are i.i.d. variables.
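This AR(1) chain is easy to simulate; a small sketch (my own, with standard normal innovations) illustrating that for $|\alpha| < 1$ the chain settles around the stationary variance $1/(1 - \alpha^2)$:

```python
import random

def simulate_ar1(alpha, n, seed=1):
    """Simulate X_0 = 0, X_{k+1} = alpha * X_k + eps_{k+1}, eps ~ N(0, 1)."""
    rng = random.Random(seed)
    x, path = 0.0, [0.0]
    for _ in range(n):
        x = alpha * x + rng.gauss(0.0, 1.0)
        path.append(x)
    return path

path = simulate_ar1(alpha=0.5, n=50000)
burn = path[1000:]                      # discard the transient from X_0 = 0
var_hat = sum(v * v for v in burn) / len(burn)
# for alpha = 0.5 the stationary variance is 1 / (1 - 0.25) = 4/3
```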
Stationary Markov chains

A distribution $\pi(x)$ is said to be stationary if
$$\int q(x \mid z)\,\pi(z)\,dz = \pi(x) \quad \text{(global balance)}.$$

For a stationary distribution $\pi$ it holds that, with $f_0 = \pi$,
$$f_1(x_1) = \int q(x_1 \mid x_0) f_0(x_0)\,dx_0 = \int q(x_1 \mid x_0)\pi(x_0)\,dx_0 = \pi(x_1),$$
$$f_2(x_2) = \int q(x_2 \mid x_1) f_1(x_1)\,dx_1 = \int q(x_2 \mid x_1)\pi(x_1)\,dx_1 = \pi(x_2),$$
$$\ldots$$
$$f_n(x_n) = \pi(x_n) \quad \text{for all } n.$$

Thus, if the chain starts in the stationary distribution, it will always stay in the stationary distribution. In this case we also call the chain stationary.
Stationary distribution of the Gibbs sampler*

Theorem (Global balance, Gibbs). The joint density $\pi$ is a stationary distribution for the Markov chain $X^{(t)}$ generated by the Gibbs sampler.
Irreducibility

π-irreducibility. A Markov chain is said to be π-irreducible if for all points $x \in \chi$ and all measurable sets $A$ with $\pi(A) = \int_A \pi(x)\,dx > 0$ there exists some $t$ such that
$$\int_A q^t(y \mid x)\,dy > 0.$$
If this condition holds with $t = 1$, then the chain is said to be strongly π-irreducible.
π-irreducibility fails

π-irreducibility is not always satisfied for the Gibbs sampler. (Figure: density plots of a target distribution for which the Gibbs sampler is not π-irreducible.)
π-irreducibility: sufficient conditions

Positivity condition. A distribution with density $\pi(x_1, \ldots, x_d)$ satisfies the positivity condition if $\pi(x_1, \ldots, x_d) > 0$ for all $x_1, \ldots, x_d$ with $\pi(x_i) > 0$.

Proposition. If $\pi$ satisfies the positivity condition, the Gibbs sampler generates a π-irreducible (and recurrent) Markov chain.
π-irreducibility: sufficient conditions

The positivity condition is sufficient, but not necessary. (Figure: density plots of a target that violates positivity but for which the Gibbs sampler is still π-irreducible.)
Korsbetning

In 1361 the Danish king Valdemar Atterdag conquered Gotland and captured the rich Hanseatic town of Visby. Later the gravesite was excavated, and a total of 493 femurs (237 right, 256 left) were found. How many people were buried there?
Korsbetning: model

A reasonable (?) prior could be
$$N \sim U\{260, \ldots, 2000\}, \qquad p \sim \text{Beta}(2, 2).$$

The likelihood given the parameters is then
$$\pi(y \mid N, p) = \text{Bin}(y_1; N, p)\,\text{Bin}(y_2; N, p).$$
Korsbetning: Gibbs

To generate samples from the posterior we use a Gibbs sampler.

First, the conditional distribution for $p$ is given by
$$\pi(p \mid N, y) = \text{Beta}(p;\; y_1 + y_2 + 2,\; 2N - y_1 - y_2 + 2).$$

The conditional distribution of $N$ is
$$\pi(N \mid y, p) \propto \text{Bin}(y_1; N, p)\,\text{Bin}(y_2; N, p).$$
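A sketch of this Gibbs sampler in Python (my own implementation, not from the slides; $y_1 = 237$, $y_2 = 256$ are the femur counts, and the $N$-update samples the discrete conditional over the prior support $\{260, \ldots, 2000\}$):

```python
import math
import random

y1, y2 = 237, 256                       # right and left femurs
N_grid = range(260, 2001)               # prior support for N
rng = random.Random(0)

# n-dependent part of log Bin(y1; n, p) + log Bin(y2; n, p), precomputed
# once; the terms -lgamma(y1+1) - lgamma(y2+1) cancel in the normalization
log_coef = [2.0 * math.lgamma(n + 1)
            - math.lgamma(n - y1 + 1) - math.lgamma(n - y2 + 1)
            for n in N_grid]

def sample_N(p):
    """Draw N from pi(N | y, p) ∝ Bin(y1; N, p) Bin(y2; N, p) on the grid."""
    l1p = math.log(1.0 - p)
    # the common factor p^(y1 + y2) also cancels after normalization
    logw = [c + (2 * n - y1 - y2) * l1p for n, c in zip(N_grid, log_coef)]
    m = max(logw)
    w = [math.exp(v - m) for v in logw]
    u, s = rng.random() * sum(w), 0.0
    for n, wi in zip(N_grid, w):
        s += wi
        if s >= u:
            return n
    return N_grid[-1]

N = 1000                                # initial state
Ns = []
for _ in range(1000):
    p = rng.betavariate(y1 + y2 + 2, 2 * N - y1 - y2 + 2)   # p | N, y
    N = sample_N(p)                                          # N | y, p
    Ns.append(N)

post_mean_N = sum(Ns[200:]) / len(Ns[200:])
```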
Korsbetning: results

The posterior distribution of $N$ is shown below. (Figure: posterior density of $N$, the number of persons.)
Geometric ergodicity

Definition (Geometric ergodicity). $X^{(t)}$ is a geometrically ergodic Markov chain if there exist $\rho < 1$ and a function $C_x > 0$ such that, if $X^{(0)} = x$, then
$$\sup_{A \subseteq \chi} \left| P(X^{(t)} \in A) - \pi(A) \right| \le C_x \rho^t.$$
If there exists a constant $C > 0$ such that $C_x < C$ for all $x$, then the Markov chain is uniformly ergodic.

Uniform ergodicity of a Markov chain (often) implies that if $E_\pi[h^2(X)] < \infty$ then
$$\left| C(h(X^{(m)}), h(X^{(n)})) \right| \le C \rho^{|n - m|}.$$
Law of Large Numbers v2

Geometric ergodicity gives an LLN; we will however show a weak version of the LLN:

Theorem (Law of large numbers for Markov chains). Let $X^{(t)}$ be a stationary Markov chain with stationary distribution $\pi$, and $X^{(0)} \sim \pi$. If $h$ is a function such that
$$\left| C[h(X^{(k)}), h(X^{(i)})] \right| \le C\rho^{|k - i|} \quad \text{for } k \ne i,$$
then
$$\tau_T \stackrel{\text{def}}{=} \frac{1}{T}\sum_{i=1}^{T} h(X^{(i)}) \stackrel{P}{\longrightarrow} E_\pi(h(X)).$$
Proof of LLN

Proof. First, by stationarity,
$$E[\tau_T] = E\Big[\frac{1}{T}\sum_{i=1}^{T} h(X^{(i)})\Big] = \frac{1}{T}\sum_{i=1}^{T} E_\pi[h(X)] = E_\pi[h(X)].$$
Second, we bound the variance. Denote $S_T = \sum_{i=1}^{T} h(X^{(i)})$ (so that $\tau_T = S_T / T$) and $\mu = E_\pi[h(X)]$; then
$$V[\tau_T] = E\Big[\frac{1}{T^2}(S_T - T\mu)^2\Big] = \frac{1}{T^2}\sum_{i=1}^{T}\sum_{j=1}^{T} E[(h(X^{(i)}) - \mu)(h(X^{(j)}) - \mu)].$$
Proof of LLN, cont.

Proof. By ergodicity,
$$\frac{1}{T^2}\sum_{i=1}^{T}\sum_{j=1}^{T} E[(h(X^{(i)}) - \mu)(h(X^{(j)}) - \mu)] \le \frac{1}{T^2}\sum_{i=1}^{T}\sum_{j=1}^{T} C\rho^{|i - j|} \le \frac{1}{T^2}\sum_{j=1}^{T} C_1 = \frac{C_1}{T}.$$
Here $C_1, C$ are positive constants. The LLN now follows directly from Chebyshev's inequality.
CLT

Theorem (Central limit theorem). Let $X^{(t)}$ be a geometrically ergodic Markov chain with stationary distribution $\pi$. Then for any $h$ such that $E_\pi[|h(X)|^{2+\epsilon}] < \infty$, where $\epsilon > 0$,
$$\sqrt{T}\,(\tau_T - E_\pi(h(X))) \longrightarrow N(0, \sigma_h^2),$$
where, under some additional assumptions,
$$\sigma_h^2 = V_\pi[h(X_0)] + 2\sum_{i=1}^{\infty} C_\pi[h(X_0), h(X_i)].$$

As a side note, Häggström (2005) showed that the $\epsilon$ cannot be dropped.
Effective sample size, MCMC style

1. $\sigma_h^2$ is typically not known.
2. Often an AR(1) process is a good approximation of the behavior of an MCMC chain.
3. Approximate the Markov chain $h(X^{(t)})$ with an AR(1) process, where $C(h(X^{(t)}), h(X^{(t+h)})) = \rho^h$ (here $C$ is the correlation function).
4. Then the variance of the estimator satisfies
$$V\Big[\frac{1}{T}\sum_{i=1}^{T} h(X^{(i)})\Big] \approx \frac{1 + \rho}{1 - \rho} \cdot \frac{1}{T}\, V[h(X^{(t)})].$$

Then $T\,\frac{1 - \rho}{1 + \rho}$ is the effective sample size.
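A minimal sketch (my own) of this AR(1)-based effective sample size, estimating $\rho$ by the lag-1 sample autocorrelation:

```python
import random

def ess_ar1(x):
    """Effective sample size T * (1 - rho) / (1 + rho), with rho estimated
    by the lag-1 sample autocorrelation of the chain x."""
    T = len(x)
    mean = sum(x) / T
    c0 = sum((v - mean) ** 2 for v in x) / T
    c1 = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(T - 1)) / T
    rho = c1 / c0
    return T * (1.0 - rho) / (1.0 + rho)

rng = random.Random(0)

# near-independent chain: rho ~ 0, so ESS ~ T
iid = [rng.gauss(0.0, 1.0) for _ in range(5000)]
ess_iid = ess_ar1(iid)

# strongly autocorrelated AR(1) chain: ESS is much smaller than T
x, ar = 0.0, []
for _ in range(5000):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    ar.append(x)
ess_ar = ess_ar1(ar)
```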
Korsbetning: acf

Going back to the Gibbs sampler for $N, p$: the acf (acf in R) for $N$ looks as follows. (Figure: ACF of the $N$ chain against lag.)

From the fitted $\rho$, the factor $\frac{1 - \rho}{1 + \rho}$ tells how many independent samples one MCMC sample is worth.
Data augmentation MCMC

Suppose we augment with $z$; then we know that
$$\pi(\theta \mid y) = \int \pi(\theta, z \mid y)\,dz.$$
To solve this integral using MCMC, one samples the Markov chain $(\theta, z)^{(t)}$ but stores only $\theta^{(t)}$. Done! Of course, there is no free lunch: adding extra variables increases the variance.
Censored data

Censored data occur frequently in statistics, especially in survival analysis. Suppose $y$ follows a standard distribution $f(y \mid \theta)$ (with corresponding, unknown or intractable, probability $P_\theta(A) = P(y \in A \mid \theta)$), but the data are only observed up to $\alpha$.

The posterior distribution for $\theta$ is then
$$\pi(\theta \mid y) \propto \pi(\theta) \prod_{i=1}^{n-m} f(y_i \mid \theta) \prod_{j=1}^{m} P_\theta(y \ge \alpha),$$
where $m$ is the number of censored observations. This is typically not possible to sample from.

If one observed the censored components, sampling $\theta$ would be simple. Thus introduce the augmented variables $z_j$, which correspond to the actual values of the censored observations: $Z_j = y_j \mid y_j > \alpha$ (slight abuse of index notation).
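As an illustration (my own example, not from the slides), for an Exponential($\theta$) lifetime model the augmented draw $Z_j = y_j \mid y_j > \alpha$ can be sampled exactly: by the memoryless property, $Z_j$ is $\alpha$ plus a fresh Exponential($\theta$) draw:

```python
import random

rng = random.Random(0)

def draw_censored_exponential(theta, alpha, size):
    """Draw Z ~ Exp(theta) conditioned on Z > alpha; by the memoryless
    property this is alpha plus a fresh Exp(theta) draw."""
    return [alpha + rng.expovariate(theta) for _ in range(size)]

z = draw_censored_exponential(theta=2.0, alpha=1.0, size=10000)
mean_z = sum(z) / len(z)   # should be close to alpha + 1/theta = 1.5
```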
Censored regression (Tobit)

Censored regression is a regression model that is applied when the response is only observed within a certain region. This can be modeled as
$$y = X\beta + \epsilon, \qquad \epsilon \sim N(0, \sigma^2 I),$$
with
$$y_i = \begin{cases} z_i & \text{if } z_i \in [a, b], \\ b & \text{if } z_i \ge b, \\ a & \text{if } z_i \le a. \end{cases}$$

This model fits perfectly into data augmentation when augmenting with $z$.
Ignoring the censoring

To make inference from a Bayesian model we set
$$\pi(\beta) = N(\beta; 0, \Sigma_\beta), \qquad \pi(\sigma^2) = IG(\sigma^2; \alpha, \beta).$$
The first thing we do is remove the censored observations.
Example data set

Assume $X_i = [1, t_i]$. (Figure: scatter plot of $y$ against $t$.)
Censored regression example

To include the censored variables, we apply the following Gibbs sampler:
Sample $z_i \mid \beta, \sigma^2 \sim N_{z > b}(X_i\beta, \sigma^2)$ if $y_i = b$ (a normal truncated to $z > b$),
Sample $\beta, \sigma^2 \mid z$.
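A sketch (my own) of the $z$-step for an observation censored at the upper bound $b$: draw from $N(X_i\beta, \sigma^2)$ truncated to $z > b$ by naive rejection sampling. This is fine when $b$ is not far out in the tail; otherwise dedicated truncated-normal samplers are preferable:

```python
import random

rng = random.Random(0)

def truncated_normal_above(mu, sigma, b, max_tries=100000):
    """Draw z ~ N(mu, sigma^2) conditioned on z > b by rejection sampling."""
    for _ in range(max_tries):
        z = rng.gauss(mu, sigma)
        if z > b:
            return z
    raise RuntimeError("truncation point too far in the tail for naive rejection")

# e.g. the z-step for one censored observation with X_i beta = 0, sigma = 1, b = 0.5
draws = [truncated_normal_above(mu=0.0, sigma=1.0, b=0.5) for _ in range(5000)]
mean_draw = sum(draws) / len(draws)
```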
Posterior distribution

(Figure: posterior densities of $\beta_1$ and $\beta_2$.)
Hierarchical model: definition

Often the parameters in a model are connected by some structure. For a statistical model to be reasonable, it should be able to incorporate such structures.

Recall that we have
$$\pi(y, \theta) = \pi(y \mid \theta)\pi(\theta);$$
we can impose a prior on the prior by
$$\pi(y, \theta) = \pi(y \mid \theta_1)\pi(\theta_1 \mid \theta_0)\pi(\theta_0).$$
The prior $\pi(\theta_0)$ is often called a hyperprior.
Hierarchical model: example

A classical data set is the study of rat tumors: 71 experiments are conducted, and afterwards one studies the number of rats that developed a tumor.

One could estimate each experiment independently:
$$\pi(\theta \mid y) \propto \prod_{i=1}^{n} \text{Binom}(n_i, y_i, \theta_i)\pi(\theta_i);$$
however, this leads to problems, since the number of observations per experiment is small.

A simplistic approach is to assume one joint risk for all experiments:
$$\pi(\theta \mid y) \propto \pi(\theta) \prod_{i=1}^{n} \text{Binom}(n_i, y_i, \theta);$$
this is easy to fit, but too simplistic: it ignores the variation among the experiments.
Independent model DAG

$\theta_1 \;\; \theta_2 \;\; \ldots \;\; \theta_{70} \;\; \theta_{71} \;\; \theta_{72}$
$y_1 \;\; y_2 \;\; \ldots \;\; y_{70} \;\; y_{71}$

$$\pi(\theta \mid y) \propto \pi(\theta_{72}) \prod_{i=1}^{71} \text{Binom}(n_i, y_i, \theta_i)\pi(\theta_i).$$

It is hard to make inference and prediction for $\theta_{72}$ (a future experiment) or $y_{72}$ (a future observation).
Joint model DAG

$\theta$
$y_1 \;\; y_2 \;\; \ldots \;\; y_{70} \;\; y_{71}$

$$\pi(\theta \mid y) \propto \pi(\theta) \prod_{i=1}^{71} \text{Binom}(n_i, y_i, \theta).$$

Too simplistic: it does not allow for variation among the experiments.
Hierarchical model DAG

$\alpha, \beta$
$\theta_1 \;\; \theta_2 \;\; \ldots \;\; \theta_{70} \;\; \theta_{71} \;\; \theta_{72}$
$y_1 \;\; y_2 \;\; \ldots \;\; y_{70} \;\; y_{71}$

$$\pi(\alpha, \beta) \propto \exp(-\alpha - \beta), \qquad \pi(\theta_i) = \text{Beta}(\alpha, \beta), \qquad \pi(y \mid \theta) = \prod_{i=1}^{n} \text{Binom}(n_i, y_i, \theta_i),$$
$$\pi(\theta \mid y) \propto \int \prod_{i=1}^{71} \text{Binom}(n_i, y_i, \theta_i)\,\pi(\theta_i \mid \alpha, \beta)\,\pi(\alpha, \beta)\,d\alpha\,d\beta.$$
Hierarchical model DAG, compact

$\alpha \quad \beta$
$\theta_j \to y_j, \quad j = 1, \ldots, 71$

$$\pi(\theta \mid y) \propto \int \prod_{i=1}^{71} \text{Binom}(n_i, y_i, \theta_i)\,\pi(\theta_i \mid \alpha, \beta)\,\pi(\alpha, \beta)\,d\alpha\,d\beta.$$
Example: rats

The joint posterior distribution is given by
$$\pi(\theta, \alpha, \beta \mid y) \propto \underbrace{\prod_{i=1}^{n} \theta_i^{y_i}(1 - \theta_i)^{n_i - y_i}}_{\pi(y \mid \theta)} \; \underbrace{\prod_{i=1}^{n} \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \theta_i^{\alpha - 1}(1 - \theta_i)^{\beta - 1}}_{\pi(\theta \mid \alpha, \beta)} \; \underbrace{e^{-\alpha - \beta}}_{\pi(\alpha, \beta)}.$$

Iteration $t + 1$ of the Gibbs sampler is
$$\theta_j^{(t+1)} \mid \ldots \sim \text{Beta}(\alpha^{(t)} + y_j,\; \beta^{(t)} + n_j - y_j),$$
$$\alpha^{(t+1)} \mid \ldots \sim \pi(\alpha \mid \ldots) \propto \Big(\frac{\Gamma(\alpha + \beta^{(t)})}{\Gamma(\alpha)}\Big)^{n} e^{\sum_j \log(\theta_j^{(t+1)})\,\alpha}\, e^{-\alpha},$$
$$\beta^{(t+1)} \mid \ldots \sim \pi(\beta \mid \ldots) \propto \Big(\frac{\Gamma(\alpha^{(t+1)} + \beta)}{\Gamma(\beta)}\Big)^{n} e^{\sum_j \log(1 - \theta_j^{(t+1)})\,\beta}\, e^{-\beta}.$$

The densities of $\alpha$ and $\beta$ are both log-concave. For log-concave densities there are good rejection algorithms.
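The slides leave the $\alpha$- and $\beta$-updates to a rejection algorithm; as a simple stand-in (my own sketch, not adaptive rejection sampling), a one-dimensional conditional like $\pi(\alpha \mid \ldots)$ can be drawn approximately by inverse-CDF sampling on a grid of the unnormalized log-density:

```python
import math
import random

rng = random.Random(0)

def grid_sample(log_density, lo, hi, n_grid=500):
    """Approximate draw from an unnormalized 1-D log-density by
    discretizing [lo, hi] and inverting the empirical CDF."""
    step = (hi - lo) / n_grid
    xs = [lo + (i + 0.5) * step for i in range(n_grid)]
    logw = [log_density(x) for x in xs]
    m = max(logw)                        # stabilize before exponentiating
    w = [math.exp(v - m) for v in logw]
    u, s = rng.random() * sum(w), 0.0
    for x, wi in zip(xs, w):
        s += wi
        if s >= u:
            return x
    return xs[-1]

# illustrative log-concave target: Gamma(3, 2), log-density 2*log(a) - 2*a
draws = [grid_sample(lambda a: 2.0 * math.log(a) - 2.0 * a, 1e-6, 10.0)
         for _ in range(2000)]
mean_a = sum(draws) / len(draws)        # Gamma(3, 2) has mean 3/2
```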
Example: rats, posterior predictive

Below is the posterior predictive of $\theta_{72}$ for the hierarchical model vs. $\theta$ for the simplistic model. (Figure: the two posterior densities of $\theta$.)
Example: rats (acf)

However, it turns out that the acf for $\alpha, \beta$ is not good. (Figure: autocorrelation functions $C(h)$ against lag $h$ for the $\alpha$ and $\beta$ chains.)
DAG models and Markov Chain Monte Carlo methods a short overview Søren Højsgaard Institute of Genetics and Biotechnology University of Aarhus August 18, 2008 Printed: August 18, 2008 File: DAGMC-Lecture.tex
More informationIntroduction to Markov Chain Monte Carlo & Gibbs Sampling
Introduction to Markov Chain Monte Carlo & Gibbs Sampling Prof. Nicholas Zabaras Sibley School of Mechanical and Aerospace Engineering 101 Frank H. T. Rhodes Hall Ithaca, NY 14853-3801 Email: zabaras@cornell.edu
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationOn Reparametrization and the Gibbs Sampler
On Reparametrization and the Gibbs Sampler Jorge Carlos Román Department of Mathematics Vanderbilt University James P. Hobert Department of Statistics University of Florida March 2014 Brett Presnell Department
More informationPARAMETER ESTIMATION: BAYESIAN APPROACH. These notes summarize the lectures on Bayesian parameter estimation.
PARAMETER ESTIMATION: BAYESIAN APPROACH. These notes summarize the lectures on Bayesian parameter estimation.. Beta Distribution We ll start by learning about the Beta distribution, since we end up using
More informationLecture 7 and 8: Markov Chain Monte Carlo
Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani
More informationThe Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision
The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that
More informationThe Polya-Gamma Gibbs Sampler for Bayesian. Logistic Regression is Uniformly Ergodic
he Polya-Gamma Gibbs Sampler for Bayesian Logistic Regression is Uniformly Ergodic Hee Min Choi and James P. Hobert Department of Statistics University of Florida August 013 Abstract One of the most widely
More informationLecture 4: Dynamic models
linear s Lecture 4: s Hedibert Freitas Lopes The University of Chicago Booth School of Business 5807 South Woodlawn Avenue, Chicago, IL 60637 http://faculty.chicagobooth.edu/hedibert.lopes hlopes@chicagobooth.edu
More informationLearning in graphical models & MC methods Fall Cours 8 November 18
Learning in graphical models & MC methods Fall 2015 Cours 8 November 18 Enseignant: Guillaume Obozinsi Scribe: Khalife Sammy, Maryan Morel 8.1 HMM (end) As a reminder, the message propagation algorithm
More informationBeta statistics. Keywords. Bayes theorem. Bayes rule
Keywords Beta statistics Tommy Norberg tommy@chalmers.se Mathematical Sciences Chalmers University of Technology Gothenburg, SWEDEN Bayes s formula Prior density Likelihood Posterior density Conjugate
More informationStochastic Simulation
Stochastic Simulation Idea: probabilities samples Get probabilities from samples: X count x 1 n 1. x k total. n k m X probability x 1. n 1 /m. x k n k /m If we could sample from a variable s (posterior)
More informationBAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA
BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA Intro: Course Outline and Brief Intro to Marina Vannucci Rice University, USA PASI-CIMAT 04/28-30/2010 Marina Vannucci
More informationStat 516, Homework 1
Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball
More informationMarkov Chain Monte Carlo (MCMC) and Model Evaluation. August 15, 2017
Markov Chain Monte Carlo (MCMC) and Model Evaluation August 15, 2017 Frequentist Linking Frequentist and Bayesian Statistics How can we estimate model parameters and what does it imply? Want to find the
More informationControl Variates for Markov Chain Monte Carlo
Control Variates for Markov Chain Monte Carlo Dellaportas, P., Kontoyiannis, I., and Tsourti, Z. Dept of Statistics, AUEB Dept of Informatics, AUEB 1st Greek Stochastics Meeting Monte Carlo: Probability
More informationWeakness of Beta priors (or conjugate priors in general) They can only represent a limited range of prior beliefs. For example... There are no bimodal beta distributions (except when the modes are at 0
More informationKazuhiko Kakamu Department of Economics Finance, Institute for Advanced Studies. Abstract
Bayesian Estimation of A Distance Functional Weight Matrix Model Kazuhiko Kakamu Department of Economics Finance, Institute for Advanced Studies Abstract This paper considers the distance functional weight
More informationADVANCED FINANCIAL ECONOMETRICS PROF. MASSIMO GUIDOLIN
Massimo Guidolin Massimo.Guidolin@unibocconi.it Dept. of Finance ADVANCED FINANCIAL ECONOMETRICS PROF. MASSIMO GUIDOLIN a.a. 14/15 p. 1 LECTURE 3: REVIEW OF BASIC ESTIMATION METHODS: GMM AND OTHER EXTREMUM
More informationComputer Intensive Methods in Mathematical Statistics
Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 5 Sequential Monte Carlo methods I 31 March 2017 Computer Intensive Methods (1) Plan of today s lecture
More informationMarkov Chain Monte Carlo, Numerical Integration
Markov Chain Monte Carlo, Numerical Integration (See Statistics) Trevor Gallen Fall 2015 1 / 1 Agenda Numerical Integration: MCMC methods Estimating Markov Chains Estimating latent variables 2 / 1 Numerical
More informationPart III. A Decision-Theoretic Approach and Bayesian testing
Part III A Decision-Theoretic Approach and Bayesian testing 1 Chapter 10 Bayesian Inference as a Decision Problem The decision-theoretic framework starts with the following situation. We would like to
More informationMCMC algorithms for fitting Bayesian models
MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models
More informationLecture 6: Markov Chain Monte Carlo
Lecture 6: Markov Chain Monte Carlo D. Jason Koskinen koskinen@nbi.ku.dk Photo by Howard Jackman University of Copenhagen Advanced Methods in Applied Statistics Feb - Apr 2016 Niels Bohr Institute 2 Outline
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters
More informationGibbs Sampling in Latent Variable Models #1
Gibbs Sampling in Latent Variable Models #1 Econ 690 Purdue University Outline 1 Data augmentation 2 Probit Model Probit Application A Panel Probit Panel Probit 3 The Tobit Model Example: Female Labor
More informationMarkov Chain Monte Carlo Inference. Siamak Ravanbakhsh Winter 2018
Graphical Models Markov Chain Monte Carlo Inference Siamak Ravanbakhsh Winter 2018 Learning objectives Markov chains the idea behind Markov Chain Monte Carlo (MCMC) two important examples: Gibbs sampling
More informationMonte Carlo Methods in Bayesian Inference: Theory, Methods and Applications
University of Arkansas, Fayetteville ScholarWorks@UARK Theses and Dissertations 1-016 Monte Carlo Methods in Bayesian Inference: Theory, Methods and Applications Huarui Zhang University of Arkansas, Fayetteville
More information
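The two-stage Gibbs sampler described above can be made concrete on the simplest non-trivial target: a bivariate normal with correlation ρ, where both full conditionals are univariate normals and hence easy to sample from. The sketch below is illustrative (the function name and parameters are not from the lecture); it follows the algorithm exactly, alternating draws of X₁ | X₂ and X₂ | X₁.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_iter=5000, seed=0):
    """Two-stage Gibbs sampler targeting a bivariate normal with
    zero means, unit variances and correlation rho.
    The full conditionals are X1 | X2 = x2 ~ N(rho*x2, 1 - rho^2),
    and symmetrically for X2 | X1."""
    rng = np.random.default_rng(seed)
    x1, x2 = 0.0, 0.0               # starting point X^(0)
    s = np.sqrt(1.0 - rho**2)       # conditional standard deviation
    samples = np.empty((n_iter, 2))
    for l in range(n_iter):
        x1 = rho * x2 + s * rng.standard_normal()  # draw X1 | X2^(l-1)
        x2 = rho * x1 + s * rng.standard_normal()  # draw X2 | X1^(l)
        samples[l] = (x1, x2)
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_iter=20000)
# After discarding a burn-in, the empirical correlation is close to 0.8.
print(np.corrcoef(samples[5000:].T)[0, 1])
```

The output of the loop is a Markov chain whose stationary distribution is the target bivariate normal, so long-run averages over the samples approximate expectations under it.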