Bayesian modelling. Hans-Peter Helfrich. University of Bonn. Theodor-Brinkmann-Graduate School

Bayesian modelling. Hans-Peter Helfrich, University of Bonn, Theodor-Brinkmann-Graduate School. Slide footer throughout: H.-P. Helfrich (University of Bonn), Bayesian modelling, Brinkmann School, 22 slides.

Overview
1. Bayesian modelling
2. Examples
3. WinBUGS - A Bayesian modelling framework
4. Markov Chain Monte Carlo Algorithm
5. References

Bayesian modelling

Objectives [Van Oijen et al., 2005]: bridging the gaps between models and data; Bayesian calibration; inferring the parameters from the data (the outcome).

Bayesian model: parameters θ_1, ..., θ_n; model L(y | θ); data y = (y_1, ..., y_m).

Main steps (cf. http://www.stat.osu.edu/ sses/ps and pdf/stb.pdf):
1. Process model
2. Data model
3. Prior density distribution

Bayes theorem

Statistical inference can be done via Bayes' theorem. The posterior distribution is given, up to a normalizing factor, by

    p(θ | y) ∝ L(y | θ) p(θ).

The posterior density function contains all statistical information needed to provide mean values, medians, and credible intervals.

Sampling methods: in general, the normalizing factor cannot be calculated explicitly. Instead, sampling methods are used that generate samples distributed according to the posterior density. Mainly, two methods are used: Gibbs sampling and the Markov Chain Monte Carlo method.
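When θ is low-dimensional, the normalizing factor can also be approximated numerically on a grid, which makes its role concrete. A minimal Python sketch, assuming a N(θ, 1) data model with a flat prior (all names and numbers are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=20)   # simulated data, true theta = 2

theta = np.linspace(-2.0, 6.0, 801)           # parameter grid
dtheta = theta[1] - theta[0]

# log L(y | theta) for a N(theta, 1) model; additive constants cancel below
log_like = -0.5 * ((y[:, None] - theta[None, :]) ** 2).sum(axis=0)

unnorm = np.exp(log_like - log_like.max())    # proportional to the posterior (flat prior)
posterior = unnorm / (unnorm.sum() * dtheta)  # normalize numerically

post_mean = (theta * posterior).sum() * dtheta
```

The same grid then yields posterior summaries such as the mean or credible intervals without any sampling; the grid approach just does not scale to many parameters, which is why sampling methods are needed.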

Linear regression

Process model: we have the simple model

    x_i = a t_i + b,  i = 1, ..., n.

Data model: the theoretical outcomes x_1, ..., x_n are disturbed by random errors ε_1, ..., ε_n:

    y_i = x_i + ε_i,  i = 1, ..., n.

We assume that the random errors are independent and normally distributed:

    L(y | θ) = 1 / ((√(2π))^n σ^n) · exp(−(y_1 − x_1)² / (2σ²)) ··· exp(−(y_n − x_n)² / (2σ²)),

with the parameter vector θ = (a, b, σ).
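In practice this likelihood is evaluated on the log scale to avoid underflow. A small Python sketch of the log-likelihood for the linear model, with simulated data (the parameter values are illustrative):

```python
import numpy as np

def log_likelihood(theta, t, y):
    """Gaussian log-likelihood of the linear model x_i = a*t_i + b."""
    a, b, sigma = theta
    x = a * t + b                      # process model
    n = len(y)
    return (-n * np.log(np.sqrt(2 * np.pi) * sigma)
            - ((y - x) ** 2).sum() / (2 * sigma ** 2))

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50)
y = 2.0 * t + 1.0 + rng.normal(0.0, 0.1, size=50)   # data with a=2, b=1, sigma=0.1
```

Parameter values near the truth give a higher log-likelihood than clearly wrong ones, which is exactly what sampling methods exploit.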

Prior

For the parameters to be estimated, prior densities should be specified. If no information is available, we may choose the non-informative prior, i.e., p(θ) = const.

Several experiments: Bayesian methods allow updating the prior information. For the first data set y_1, Bayes' theorem gives

    p_1(θ | y_1) ∝ L_1(y_1 | θ) p(θ).

For the second data set y_2, we may take p_1(θ | y_1) as the prior density to obtain

    p_2(θ | y_2) ∝ L_2(y_2 | θ) p_1(θ | y_1),

and so on.
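This sequential updating can be checked numerically: updating with y_1 and then with y_2 yields the same posterior as a single update with both data sets combined. A grid-based Python sketch, again assuming a N(θ, 1) data model (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
theta = np.linspace(-2.0, 6.0, 801)
dtheta = theta[1] - theta[0]

def update(prior, y):
    """One Bayes update on the grid for a N(theta, 1) data model."""
    like = np.exp(-0.5 * (y[:, None] - theta) ** 2).prod(axis=0)
    post = prior * like
    return post / (post.sum() * dtheta)          # renormalize numerically

y1 = rng.normal(2.0, 1.0, size=10)
y2 = rng.normal(2.0, 1.0, size=10)

flat = np.ones_like(theta) / (theta[-1] - theta[0])   # non-informative prior
p1 = update(flat, y1)                  # posterior after the first data set
p2 = update(p1, y2)                    # second update uses p1 as the prior
p_all = update(flat, np.concatenate([y1, y2]))        # all data at once
```

Up to numerical rounding, p2 and p_all coincide, which is the point of the "and so on" above.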

Nonlinear models

Distance between two walls: an electronic measuring device measures the distance between two points by sending an electromagnetic beam. Assume we want to measure the perpendicular distance θ between two walls. In practice, the beam is not perpendicular to the walls; assume our measurement has a displacement e. By the theorem of Pythagoras, the measured length is then

    z = √(θ² + e²).

Model specification

Our model looks like this:

    e ~ N(0, σ_p²),  z = √(θ² + e²).

We may introduce for z an additional error caused by the measuring device:

    y ~ N(z, σ_m²),  e ~ N(0, σ_p²),  z = √(θ² + e²).

We can simulate the density distribution of the measurements with a random number generator. For example, in R the function rnorm() produces normally distributed samples.
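The same simulation can be written in Python with numpy (the counterpart of R's rnorm()). The sketch below takes θ = 10 m to match the plotted range on the next slides and uses the σ_p and σ_m values quoted there:

```python
import numpy as np

rng = np.random.default_rng(3)
theta = 10.0        # perpendicular distance in metres (illustrative)
sigma_p = 0.3       # spread of the displacement e
sigma_m = 0.005     # spread of the device error

e = rng.normal(0.0, sigma_p, size=100_000)   # displacement samples
z = np.sqrt(theta**2 + e**2)                 # beam length by Pythagoras
y = rng.normal(z, sigma_m)                   # add the device error

# The displacement can only lengthen the beam, so the simulated
# measurements are biased slightly above theta.
```

The sample standard deviation of y comes out near 0.008 m, in line with the σ values reported on the error-distribution slides.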

Distribution of the error. Figure: density of the measured distance (in m), error caused by displacement only; σ_p = 0.3, σ_m = 0, σ = 0.0064.

Distribution of the error. Figure: density of the measured distance (in m), error caused by displacement and by the device; σ_p = 0.3, σ_m = 0.005, σ = 0.0081.

WinBUGS - A Bayesian modelling framework

WinBUGS [Lunn et al., 2000] is a fully extensible modular framework for constructing and analysing Bayesian full probability models. Models may be specified either textually via the BUGS language or pictorially using a graphical interface called DoodleBUGS. BUGS is an acronym for Bayesian inference Using Gibbs Sampling.

Linear model: in WinBUGS, a linear model is specified by

    model {
        y ~ dnorm(x, tau)
        x <- a*t + b
    }

Linear regression

We extend the model by incorporating N observations and by assigning priors.

    model {
        for (j in 1:N) {
            y[j] ~ dnorm(x[j], tau)
            x[j] <- a*t[j] + b
        }
        a ~ dunif(0, a_max)
        b ~ dunif(0, b_max)
        tau ~ dunif(0, tau_max)
    }

For a, b, and τ we assign non-informative priors. dnorm(x[j], tau) specifies the Gaussian density, where τ denotes the precision 1/σ².

Nonlinear model

As before, the theorem of Pythagoras gives the length z = √(θ² + e²), where e denotes the displacement and θ the distance between the two walls.

Distance meter model, BUGS code:

    model {
        y ~ dnorm(z, tau_m)
        e ~ dnorm(0, tau_p)
        z <- sqrt(theta * theta + e * e)
        theta ~ dunif(a, b)
    }

Distance meter example

We extend the model by incorporating N observations.

    model {
        for (j in 1:N) {
            y[j] ~ dnorm(z[j], tau_m)
            e[j] ~ dnorm(0, tau_p)
            z[j] <- sqrt(theta * theta + e[j] * e[j])
        }
        theta ~ dunif(a, b)
    }

The displacement e, which is different for each measurement, changes the observed distance to a value z. The error distributions are specified as normal with precisions τ_p and τ_m. The last line assigns a uniform prior distribution to the unknown distance.

Visualization. Figure: directed graph of the model, drawn with DoodleBUGS.

Markov Chain Monte Carlo Algorithms [Hastings, 1970]

Markov chains [Gelman et al., 2004]: we consider a system that may be at time t in a certain state X(t). First we consider a finite set of states, say i = 1, ..., S; the states may be thought of as different locations.

Transition matrix P: the probability of moving from location i to location j is denoted by p_ij, and the matrix P = (p_ij) is called the transition matrix. A Markov chain is characterized by the assumption that the state X(t+1) is determined by X(t) and does not explicitly depend on former states.

Starting distribution: by π_1, ..., π_S we denote the probabilities of being at the locations 1, ..., S at time t.

Coming to location 1: the probability of moving from location i to location 1 is p_i1. Since we are at i with probability π_i, the probability of arriving at location 1 at time t + 1 is

    π_1 p_11 + π_2 p_21 + ... + π_S p_S1.

In a similar way, we get the probability of arriving at location j.

Distribution at time t + 1: the probability that location j is reached at time t + 1 is given by Σ_i π_i p_ij. In matrix notation, we have πP.
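The matrix notation πP is a single matrix-vector product and is easy to experiment with. A numpy sketch with a made-up 3-state chain (the entries of P are illustrative):

```python
import numpy as np

# Illustrative 3-state chain; each row of P is a probability distribution.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

pi = np.array([1.0, 0.0, 0.0])   # start at location 1 with certainty

for _ in range(50):
    pi = pi @ P                  # distribution at the next time step is pi P

# After many steps, pi no longer changes under pi @ P: the chain has
# reached a stationary distribution.
```

Iterating the update drives the distribution toward a stationary one, which motivates the definitions on the next slide.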

Irreducibility: for two steps, the transition matrix is given by P², for three steps by P³, and so on. A Markov chain is said to be irreducible if for every i and every j there exists some n such that p_ij^(n) is positive.

Stationarity: we call the distribution π stationary if π_j = Σ_i π_i p_ij. It means that the distribution does not change in further steps.

Construction of the transition matrix: for a given distribution π, we look for a transition matrix that has π as its stationary distribution.

Markov Chain Monte Carlo Algorithm

Jumping rule: we choose a transition matrix Q = (q_ij), called the jumping or proposal distribution, which must be symmetric. If we are at location i, we propose location j with probability q_ij; that is, each row of Q gives a probability distribution.

Two-stage algorithm: we choose a jumping matrix Q = (q_ij) and consider the algorithm
1. Assume that X(t) = i and select a state j using the distribution given by the i-th row of Q.
2. Take X(t+1) = j with probability α_ij and X(t+1) = i with probability 1 − α_ij, where α_ij = 1 if π_j/π_i ≥ 1, and α_ij = π_j/π_i if π_j/π_i < 1.

Markov Chain Monte Carlo Algorithm

Samples from a given distribution: assume we have a probability distribution (π_1, ..., π_S). The general form of our algorithm is the following:
1. Set k = 1 and X(k) = i, where i ∈ {1, ..., S} is chosen at random.
2. Draw a sample j at random with probability q_ij; set k ← k + 1.
3. Calculate the odds ratio r = π_j / π_i. If r ≥ 1, set X(k) = j; if r < 1, set X(k) = j with probability r and X(k) = i with probability 1 − r.
4. Set i = X(k).
5. Go to step 2.
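The five steps translate directly into code. A Python sketch for a small, made-up target distribution, using a uniform (hence symmetric) proposal over the states; the long-run state frequencies should approximate π:

```python
import numpy as np

rng = np.random.default_rng(4)
pi = np.array([0.1, 0.2, 0.3, 0.4])    # target distribution over S = 4 states

def step(i):
    """One Metropolis transition from state i."""
    j = rng.integers(len(pi))          # symmetric proposal: uniform over states
    r = pi[j] / pi[i]                  # odds ratio
    if r >= 1 or rng.random() < r:     # accept with probability min(1, r)
        return j
    return i                           # reject: stay at i

x = 0
counts = np.zeros(len(pi))
for _ in range(200_000):
    x = step(x)
    counts[x] += 1

freq = counts / counts.sum()           # long-run frequencies approximate pi
```

Note that only ratios π_j/π_i enter the algorithm, so π need only be known up to a normalizing factor; this is what makes the method useful for posterior distributions.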

Explanation of the acceptance and rejection rules

Reversibility: we consider matrices P that satisfy the reversibility condition π_i p_ij = π_j p_ji, which implies stationarity. We put p_ij = q_ij α_ij for j ≠ i. Since Σ_i p_ji = 1, summing the reversibility condition over i yields the stationarity property π_j = Σ_i π_i p_ij.

Theorem: the transition matrix with entries q_ij α_ij (for j ≠ i) satisfies the reversibility condition.

Generalization to a continuous density distribution: the algorithm can be extended to a continuous distribution. This yields the famous Metropolis algorithm.

Metropolis algorithm

Assume we have a probability distribution π(x) = π(x_1, x_2, ..., x_n) for all vectors x = (x_1, x_2, ..., x_n) in a given set U ⊂ R^n. Choose a symmetric function q(x, y) such that q(·, y) is a probability distribution for each y. The general form of our algorithm is the following:
1. Set k = 1 and X(k) = x, where x ∈ U is chosen at random.
2. Draw a sample y ~ q(·, x) at random; set k ← k + 1.
3. Calculate the odds ratio r = π(y)/π(x). If r ≥ 1, set X(k) = y; if r < 1, set X(k) = y with probability r and X(k) = x with probability 1 − r.
4. Set x = X(k).
5. Go to step 2.
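A direct Python transcription of these steps, with a standard normal as target and a symmetric Gaussian proposal (both chosen purely for illustration; only the ratio π(y)/π(x) is needed, so the normalizing constant of the target never appears):

```python
import numpy as np

rng = np.random.default_rng(5)

def target(x):
    """Unnormalized target density; here a standard normal."""
    return np.exp(-0.5 * x * x)

x = 0.0
samples = []
for _ in range(100_000):
    y = x + rng.normal(0.0, 1.0)      # draw y from the symmetric proposal q(., x)
    r = target(y) / target(x)         # odds ratio
    if r >= 1 or rng.random() < r:    # accept with probability min(1, r)
        x = y
    samples.append(x)                 # a rejection repeats the current state

samples = np.array(samples[10_000:])  # drop an initial burn-in period
```

The retained samples have mean near 0 and standard deviation near 1, as expected for the standard normal target.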

References

Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2004). Bayesian Data Analysis. Chapman & Hall.

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57:97-109.

Lunn, D. J., Thomas, A., Best, N., and Spiegelhalter, D. (2000). WinBUGS - a Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing, 10:325-337.

Van Oijen, M., Rougier, J., and Smith, R. (2005). Bayesian calibration of process-based forest models: bridging the gap between models and data. Tree Physiology, 25(7):915-927.