Bayesian modelling

Hans-Peter Helfrich
University of Bonn
Theodor-Brinkmann-Graduate School

H.-P. Helfrich (University of Bonn), Bayesian modelling, Brinkmann School
Overview

1 Bayesian modelling
2 Examples
3 WinBUGS - A Bayesian modelling framework
4 Markov Chain Monte Carlo Algorithm
5 References
Bayesian modelling

Objectives [Van Oijen et al., 2005]
Bridging the gaps between models and data.
Bayesian calibration: inferring the parameters from the data (outcome).

Bayesian model
Parameters θ_1, ..., θ_n
Model L(y|θ)
Data y = (y_1, ..., y_m)

Main steps (cf. http://www.stat.osu.edu/ sses/ps and pdf/stb.pdf)
1 Process model
2 Data model
3 Prior density distribution
Bayes theorem

Statistical inference can be done via Bayes theorem. The posterior distribution is given, up to a normalizing factor, by

    p(θ|y) ∝ L(y|θ) p(θ).

The posterior density function contains all statistical information for providing mean values, medians, and credible intervals.

Sampling methods
In general, the normalizing factor cannot be calculated explicitly. Instead, sampling methods are used that draw samples from the posterior distribution. Mainly, two methods can be used:
Gibbs sampling
Markov Chain Monte Carlo Method
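On a discrete parameter set the normalizing factor can be computed directly, which makes the relation p(θ|y) ∝ L(y|θ) p(θ) concrete. The following sketch uses made-up likelihood values purely for illustration.

```python
# Bayes' theorem on a discrete parameter grid: the posterior is
# L(y|theta) * p(theta) divided by the normalizing factor p(y).
# The likelihood values below are made up for illustration.
thetas = [0.2, 0.5, 0.8]
prior = [1 / 3, 1 / 3, 1 / 3]    # non-informative prior
likelihood = [0.1, 0.6, 0.3]     # L(y|theta) for some fixed data y

unnormalized = [L * p for L, p in zip(likelihood, prior)]
normalizer = sum(unnormalized)   # p(y), usually intractable in practice
posterior = [u / normalizer for u in unnormalized]

print(posterior)                 # sums to 1
```

For a continuous parameter the sum becomes an integral, which is exactly the quantity that sampling methods avoid computing.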
Linear regression

Process model
We have the simple model

    x_i = a t_i + b,   i = 1, ..., n.

Data model
The theoretical outcomes x_1, ..., x_n are disturbed by random errors ε_1, ..., ε_n:

    y_i = x_i + ε_i,   i = 1, ..., n.

We assume that the random errors are independent and normally distributed:

    L(y|θ) = 1/((√(2π))^n σ^n) · exp(-(y_1 - x_1)²/(2σ²)) · · · exp(-(y_n - x_n)²/(2σ²))

with the parameter vector θ = (a, b, σ).
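In practice the likelihood above is evaluated on the log scale, where the product of exponentials becomes a sum. A minimal sketch, with made-up data values for t and y:

```python
import math

def log_likelihood(theta, t, y):
    """Log of L(y|theta) for the linear model x_i = a*t_i + b
    with i.i.d. Gaussian errors; theta = (a, b, sigma)."""
    a, b, sigma = theta
    n = len(y)
    resid = [yi - (a * ti + b) for ti, yi in zip(t, y)]
    return (-n * math.log(math.sqrt(2 * math.pi) * sigma)
            - sum(r * r for r in resid) / (2 * sigma ** 2))

t = [0.0, 1.0, 2.0, 3.0]
y = [1.1, 2.9, 5.2, 6.8]   # made-up observations
print(log_likelihood((2.0, 1.0, 0.5), t, y))
```

Parameter values close to the data-generating line give a larger log likelihood than values far from it, which is what any sampler of the posterior will exploit.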
Prior

For the parameters to be estimated, prior densities should be specified. If no information is available, we may choose the non-informative prior, i.e., p(θ) = const.

Several experiments
Bayes methods allow updating the prior information. For the first data set y_1, we get by Bayes theorem

    p_1(θ|y_1) ∝ L_1(y_1|θ) p(θ).

For the second data set y_2, we may take p_1(θ|y_1) as prior density to obtain

    p_2(θ|y_2) ∝ L_2(y_2|θ) p_1(θ|y_1),

and so on.
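On a discrete grid the updating scheme can be checked directly: processing y_1 and then y_2 gives the same posterior as processing both data sets at once. The likelihood tables below are made up for illustration.

```python
# Sequential Bayesian updating on a discrete theta grid.
thetas = [0, 1, 2]
prior = [1 / 3, 1 / 3, 1 / 3]
L1 = [0.2, 0.5, 0.3]   # L1(y1|theta), made-up values
L2 = [0.6, 0.3, 0.1]   # L2(y2|theta), made-up values

def update(prior, lik):
    """One Bayes update: multiply prior by likelihood and renormalize."""
    post = [l * p for l, p in zip(lik, prior)]
    z = sum(post)
    return [u / z for u in post]

p1 = update(prior, L1)        # posterior after the first data set
p2 = update(p1, L2)           # p1 used as prior for the second data set
joint = update(prior, [a * b for a, b in zip(L1, L2)])
print(p2, joint)              # identical up to rounding
```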
Nonlinear models

Distance between two walls
An electronic measuring device measures the distance between two points by sending an electromagnetic beam. Assume we want to measure the perpendicular distance θ between two walls. In practice, the beam is not perpendicular to the walls; assume our measurement has a displacement e.

[Figure: right triangle with legs θ and e and hypotenuse z]

For a displacement e, we get by the theorem of Pythagoras the length

    z = √(θ² + e²).
Model specification

Our model looks like this:

    e ~ N(0, σ_p²)
    z = √(θ² + e²)

We may introduce for z an additional error caused by the measuring device:

    y ~ N(z, σ_m²)
    e ~ N(0, σ_p²)
    z = √(θ² + e²)

We can simulate the density distribution of the measurements with a random number generator. For example, we can use in R the function rnorm() for getting normally distributed samples.
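The same simulation can be written with any normal random number generator; here is a sketch in Python, where random.gauss plays the role of rnorm. The values θ = 10 m, σ_p = 0.3, σ_m = 0.005 follow the figures on the next slides.

```python
import math
import random

# Simulate the distance-meter measurements:
#   displacement  e ~ N(0, sigma_p^2)
#   beam length   z = sqrt(theta^2 + e^2)
#   device error  y ~ N(z, sigma_m^2)
random.seed(1)
theta, sigma_p, sigma_m = 10.0, 0.3, 0.005

def measure():
    e = random.gauss(0.0, sigma_p)
    z = math.sqrt(theta ** 2 + e ** 2)
    return random.gauss(z, sigma_m)

samples = [measure() for _ in range(10000)]
mean = sum(samples) / len(samples)
print(mean)   # slightly above theta, since e^2 >= 0 biases z upward
```

Note that the displacement can only lengthen the beam, so the simulated distances pile up just above θ rather than spreading symmetrically around it.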
Distribution of the error

[Figure: density of the simulated distances, 9.90 to 10.10 m]
Error caused by displacement: σ_p = 0.3, σ_m = 0, σ = 0.0064
Error caused by displacement and by device: σ_p = 0.3, σ_m = 0.005, σ = 0.0081
WinBUGS - A Bayesian modelling framework

WinBUGS [Lunn et al., 2000]
WinBUGS is a fully extensible modular framework for constructing and analysing Bayesian full probability models. Models may be specified either textually via the BUGS language or pictorially using a graphical interface called DoodleBUGS. BUGS is an acronym for Bayesian inference Using Gibbs Sampling.

Linear model
In WinBUGS, a linear model is specified by

    model {
        y ~ dnorm(x, tau)
        x <- a*t + b
    }
Linear regression

We extend the model by incorporating N observations and by assigning priors.

    model {
        for (j in 1:N) {
            y[j] ~ dnorm(x[j], tau)
            x[j] <- a*t[j] + b
        }
        a ~ dunif(0, a_max)
        b ~ dunif(0, b_max)
        tau ~ dunif(0, tau_max)
    }

For a, b, and τ we assign non-informative (uniform) priors. By dnorm(x[j], tau) the Gaussian density is specified, where τ denotes the precision 1/σ².
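The BUGS model above can also be read generatively: given a, b, and the precision τ, each y[j] is a normal draw with mean a·t[j] + b and standard deviation 1/√τ. A minimal sketch with made-up parameter values, mainly to illustrate the precision parameterization:

```python
import math
import random

# Generative reading of the BUGS linear model: simulate y[j] given a, b, tau.
random.seed(42)
a, b, tau = 2.0, 1.0, 25.0        # tau = 1/sigma^2, so sigma = 0.2
sigma = 1.0 / math.sqrt(tau)

t = [0.1 * j for j in range(20)]
y = [random.gauss(a * tj + b, sigma) for tj in t]
print(y[:3])
```

Keeping the precision/standard deviation conversion explicit avoids the most common mistake when moving between BUGS code and other software, which usually parameterizes the normal by σ.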
Nonlinear model

As before, we get by the theorem of Pythagoras the length

    z = √(θ² + e²),

where e denotes the displacement, and θ denotes the distance between the two walls.

Distance meter model: BUGS code

    model {
        y ~ dnorm(z, tau_m)
        e ~ dnorm(0, tau_p)
        z <- sqrt(theta * theta + e * e)
        theta ~ dunif(a, b)
    }
Distance meter example

We extend the model by incorporating N observations.

    model {
        for (j in 1:N) {
            y[j] ~ dnorm(z[j], tau_m)
            e[j] ~ dnorm(0, tau_p)
            z[j] <- sqrt(theta * theta + e[j] * e[j])
        }
        theta ~ dunif(a, b)
    }

The displacement e, which is different for each measurement, changes the observed distance to a value z. The distributions of the errors are specified by normal distributions with precisions τ_p and τ_m. In the last line, a uniform prior distribution of the unknown distance is given.
Visualization

[Figure: directed graph of the model, produced by DoodleBUGS]
Markov Chain Monte Carlo Algorithms [Hastings, 1970]

Markov Chains [Gelman et al., 2004]
We consider a system that may be at time t in a certain state X(t). First we consider a finite set of states, say i = 1, ..., S. The states may be thought of as different locations.

Transition matrix P
The probability that we come from location i to location j is denoted by p_ij. The matrix P = (p_ij) is called the transition matrix. A Markov chain is characterized by the assumption that the state X(t+1) is determined by X(t) and does not explicitly depend on former states.

Starting distribution
By π_1, ..., π_S we denote the probabilities of being at time t at locations 1, ..., S.
Coming to location 1
The probability of coming from location i to location 1 is p_i1. Since we are at i with probability π_i, we get

    π_1 p_11 + π_2 p_21 + ... + π_S p_S1

as the probability of coming to location 1 at time t+1. In a similar way, we get the probability of coming to location j.

Distribution at time t+1
The probability that location j is reached at time t+1 is given by Σ_i π_i p_ij. In matrix notation, we have πP.
Irreducibility
For two steps, the transition matrix is given by P², for three steps by P³, and so on. A Markov chain is said to be irreducible if for every i and every j there exists some n such that the n-step transition probability p_ij^(n) is positive.

Stationarity
We call the distribution π stationary if π_j = Σ_i π_i p_ij. This means that the distribution does not change in further steps.

Construction of the transition matrix
For a given distribution π, we are looking for a transition matrix that has π as its stationary distribution.
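The evolution π → πP and convergence to the stationary distribution can be seen directly on a small chain. The transition matrix below is made up for illustration; it is irreducible, so repeated steps converge to the unique stationary π.

```python
# Evolving a distribution with a transition matrix: one step is pi -> pi P.
P = [[0.9, 0.1, 0.0],
     [0.2, 0.6, 0.2],
     [0.0, 0.3, 0.7]]   # made-up irreducible chain, rows sum to 1

def step(pi, P):
    """One step of the chain: component j of pi P is sum_i pi_i p_ij."""
    S = len(pi)
    return [sum(pi[i] * P[i][j] for i in range(S)) for j in range(S)]

pi = [1.0, 0.0, 0.0]          # start deterministically in state 1
for _ in range(200):
    pi = step(pi, P)
print(pi)                      # approximately stationary: pi = pi P
```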
Markov Chain Monte Carlo Algorithm

Jumping rule
We choose a transition matrix Q = (q_ij), called the jumping or proposal distribution, which must be symmetric. If we are at location i, we propose location j with probability q_ij. That means each row of Q gives a probability distribution.

Two stage algorithm
We choose a jumping matrix Q = (q_ij) and consider the algorithm:
1 Assume that X(t) = i and select a state j using the distribution given by the i-th row of Q.
2 Take X(t+1) = j with probability α_ij and X(t+1) = i with probability 1 - α_ij, where

    α_ij = 1          if π_j/π_i ≥ 1,
    α_ij = π_j/π_i    if π_j/π_i < 1.
Markov Chain Monte Carlo Algorithm

Samples from a given distribution
Assume we have a probability distribution (π_1, ..., π_S). The general form of our algorithm is the following:
1 Set k = 1, X(k) = i, where i ∈ {1, ..., S} is chosen at random.
2 Draw a sample j at random with probability q_ij; set k ← k + 1.
3 Calculate the ratio r = π_j/π_i.
  If r ≥ 1, set X(k) = j.
  If r < 1, set X(k) = j with probability r and X(k) = i with probability 1 - r.
4 Set i = X(k).
5 Go to Step 2.
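The five steps above can be sketched directly on a small state space. The target π is made up for illustration, and the proposal draws each state uniformly, which is a symmetric Q.

```python
import random

# Metropolis sampling on a discrete state space.
random.seed(0)
pi = [0.1, 0.2, 0.3, 0.4]                 # made-up target distribution
S = len(pi)

x = 0                                     # step 1: arbitrary starting state
counts = [0] * S
for _ in range(100000):
    j = random.randrange(S)               # step 2: symmetric proposal q_ij = 1/S
    r = pi[j] / pi[x]                     # step 3: the ratio
    if r >= 1 or random.random() < r:
        x = j                             # accept; otherwise stay at x
    counts[x] += 1                        # steps 4-5: record and repeat

freq = [c / 100000 for c in counts]
print(freq)                                # close to pi
```

Only ratios of π enter the algorithm, so the normalizing factor is never needed, which is the whole point for Bayesian posteriors.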
Explanation of the acceptance and rejection rules

Reversibility
We consider matrices P that satisfy the reversibility condition

    π_i p_ij = π_j p_ji,

which implies stationarity. We put p_ij = q_ij α_ij for j ≠ i. Summing the reversibility condition over i and using Σ_i p_ji = 1 gives

    Σ_i π_i p_ij = Σ_i π_j p_ji = π_j,

so π is stationary.

Theorem
The transition matrix with off-diagonal entries q_ij α_ij satisfies the reversibility condition.

Generalization to a continuous density distribution
The algorithm can be extended to a continuous distribution. This gives the famous Metropolis algorithm.
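The theorem can be checked numerically: build P from a symmetric Q and the acceptance probabilities α_ij, put the rejected mass on the diagonal, and verify π_i p_ij = π_j p_ji for all pairs. Both π and Q below are made up for illustration.

```python
# Numerical check of reversibility for p_ij = q_ij * alpha_ij (j != i).
pi = [0.1, 0.2, 0.3, 0.4]                 # made-up target distribution
S = len(pi)
Q = [[1.0 / S] * S for _ in range(S)]     # symmetric proposal matrix

def alpha(i, j):
    return min(1.0, pi[j] / pi[i])        # acceptance probability

P = [[Q[i][j] * alpha(i, j) for j in range(S)] for i in range(S)]
for i in range(S):
    P[i][i] += 1.0 - sum(P[i])            # rejected mass stays at state i

# Detailed balance: pi_i p_ij == pi_j p_ji for all i, j.
print(all(abs(pi[i] * P[i][j] - pi[j] * P[j][i]) < 1e-12
          for i in range(S) for j in range(S)))   # prints True
```

The check works because for j ≠ i, π_i q_ij α_ij = q_ij min(π_i, π_j), which is symmetric in i and j when Q is symmetric.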
Metropolis algorithm

Assume we have a probability density π(x) = π(x_1, x_2, ..., x_n) for all vectors x = (x_1, x_2, ..., x_n) in a given set U ⊂ R^n. Choose a symmetric function q(x, y) such that q(·, y) is a probability density for each y. The general form of our algorithm is the following:
1 Set k = 1, X(k) = x, where x ∈ U is chosen at random.
2 Draw a sample y ~ q(·, x) at random; set k ← k + 1.
3 Calculate the ratio r = π(y)/π(x).
  If r ≥ 1, set X(k) = y.
  If r < 1, set X(k) = y with probability r and X(k) = x with probability 1 - r.
4 Set x = X(k).
5 Go to Step 2.
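A minimal one-dimensional sketch of the continuous algorithm: the target is a standard normal density known only up to its normalizing factor, and the proposal is a symmetric uniform jump around the current point.

```python
import math
import random

# Random-walk Metropolis for a continuous target density.
random.seed(3)

def unnormalized_pi(x):
    return math.exp(-0.5 * x * x)            # standard normal, constant dropped

x = 0.0
samples = []
for _ in range(50000):
    y = x + random.uniform(-1.0, 1.0)        # symmetric proposal q(y, x)
    r = unnormalized_pi(y) / unnormalized_pi(x)   # the ratio pi(y)/pi(x)
    if r >= 1 or random.random() < r:
        x = y                                # accept; otherwise keep x
    samples.append(x)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)   # near 0 and 1 for the standard normal
```

Successive samples are correlated, so in practice an initial burn-in portion of the chain is discarded and convergence is checked before the samples are summarized.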
References

Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2004). Bayesian Data Analysis. Chapman & Hall.

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57:97-109.

Lunn, D. J., Thomas, A., Best, N., and Spiegelhalter, D. (2000). WinBUGS - a Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing, 10:325-337.

Van Oijen, M., Rougier, J., and Smith, R. (2005). Bayesian calibration of process-based forest models: bridging the gap between models and data. Tree Physiology, 25(7):915-927.