BUGS Bayesian inference Using Gibbs Sampling


BUGS Bayesian inference Using Gibbs Sampling
Glen DePalma
Department of Statistics
May 30, 2013
www.stat.purdue.edu/~gdepalma

Bayesian Philosophy

"I [Pearl] turned Bayesian in 1971, as soon as I began reading Savage's monograph The Foundations of Statistical Inference [Savage, 1962]. The arguments were unassailable:
i. It is plain silly to ignore what we know,
ii. It is natural and useful to cast what we know in the language of probabilities, and
iii. If our subjective probabilities are erroneous, their impact will get washed out in due time, as the number of observations increases."

Note: Pearl later came to doubt the validity of (iii).

Bayesian Statistics

Becoming more and more popular due to the ease of simulation tools (R, SAS, BUGS, ...).

The main problem with frequentist statistics is the natural tendency to treat the p-value as if it were a Bayesian posterior probability that the null hypothesis is true (and hence 1 - p the probability that the alternative is true), or to treat a frequentist confidence interval as a Bayesian credible interval (and hence to assume there is a 95% probability that the true value lies within the 95% confidence interval computed from the particular sample of data we have).

"I challenge you to find, in any published statistical analysis outside of an introductory textbook, a confidence interval given the correct interpretation. If you can find even one instance where the confidence interval is not interpreted as a credible interval, then I will eat your hat." - William Briggs, 2012

The main objection to Bayesian statistics is the subjectivity of the prior (interestingly, the subjectivity of the likelihood in the frequentist approach is rarely mentioned).

Andrew Gelman Quotes

"The difference between significant and non-significant is not itself statistically significant."

"I've never in my professional life made a Type I error or a Type II error. But I've made lots of errors. How can this be? A Type 1 error occurs only if the null hypothesis is true (typically if a certain parameter, or difference in parameters, equals zero). In the applications I've worked on, in social science and public health, I've never come across a null hypothesis that could actually be true, or a parameter that could actually be zero. A Type 2 error occurs only if I claim that the null hypothesis is true, and I would certainly not do that, given my statement above!"

BUGS?

Much of Bayesian analysis is done using Markov chain Monte Carlo (MCMC) to sample from the posterior. What does BUGS do? You define the model by specifying the relationships between the variables; BUGS handles the MCMC.

Four (?) flavors:
1. WinBUGS (the original; development has ceased)
2. OpenBUGS (open source, still in development (?))
3. JAGS (Just Another Gibbs Sampler)
4. Stan (new; Hamiltonian Monte Carlo)

The syntax is very similar among the programs; the examples in this presentation use JAGS: http://mcmc-jags.sourceforge.net/

Example 1

Out of 2,430 parts produced, 219 were not shipped because they were defective. Our interest is in the probability of a defective part. Let us assume (however unlikely) that we know nothing a priori about the proportion of scrapped parts.

y | θ ~ Binomial(2430, θ)
θ ~ Beta(1, 1)

The exact posterior is Beta(219 + 1, 2430 - 219 + 1).
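
Because the posterior is available in closed form, it can be summarized directly in R as a check on any MCMC output (a small sketch added here, not on the original slide):

# Exact posterior: Beta(219 + 1, 2430 - 219 + 1) = Beta(220, 2212)
aPost <- 219 + 1
bPost <- 2430 - 219 + 1
aPost / (aPost + bPost)                 # posterior mean, roughly 0.090
qbeta(c(0.025, 0.975), aPost, bPost)    # exact 95% credible interval for theta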

MCMC

burnin <- 1000
nsim   <- 5000
theta  <- 0.5
n <- 2430
y <- 219
a <- 1
b <- 1
thetaSave <- rep(NA, nsim - burnin)

llk    <- function(theta) return(dbinom(y, n, theta, log = TRUE))
lprior <- function(theta) return(dbeta(theta, a, b, log = TRUE))

for (i in 1:nsim) {
  # Propose new theta
  thetaNew <- theta + runif(1, -0.05, 0.05)
  # Compute M-H acceptance (log scale); keep the proposal inside (0, 1)
  if (thetaNew > 0 && thetaNew < 1) {
    proposalLLK <- llk(thetaNew) + lprior(thetaNew) - llk(theta) - lprior(theta)
    if (proposalLLK > log(runif(1))) theta <- thetaNew
  }
  if (i > burnin) thetaSave[i - burnin] <- theta
}

plot((burnin + 1):nsim, thetaSave, type = "l", xlab = "Iteration", ylab = "Theta")
summary(thetaSave)

BUGS Code

model {
  # Model
  y ~ dbin(theta, n)
  # Prior
  theta ~ dbeta(a1, b1)
}
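
For completeness, here is one way this model can be run from R via the rjags package (a sketch added for this write-up; the object names and MCMC settings are illustrative, not from the original slides):

library(rjags)

modelString <- "
model {
  y ~ dbin(theta, n)
  theta ~ dbeta(a1, b1)
}
"

dat <- list(y = 219, n = 2430, a1 = 1, b1 = 1)

jm <- jags.model(textConnection(modelString), data = dat,
                 inits = list(theta = 0.5), n.chains = 2)
update(jm, 1000)                                        # burn-in
samp <- coda.samples(jm, variable.names = "theta", n.iter = 5000)
summary(samp)    # compare with the exact Beta(220, 2212) posterior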

Example 2 - One Sample Normal

Likelihood: y_i ~ N(µ, 1/τ), i = 1, ..., n
Priors: µ ~ N(0, .001) (BUGS precision parameterization), τ ~ Gamma(.001, .001)

Posterior (up to a normalizing constant):

p(µ, τ | y) ∝ [ ∏_{i=1}^{n} √(τ/(2π)) exp(−τ(y_i − µ)²/2) ]
             × √(.001/(2π)) exp(−.001 µ²/2)
             × (.001^.001 / Γ(.001)) τ^(.001 − 1) e^(−.001 τ)

We could work out the full-conditional posteriors of µ and τ analytically, but let's use BUGS!
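
For reference (not on the original slide), the standard semi-conjugate full conditionals that a Gibbs sampler such as BUGS exploits here are:

µ | τ, y ~ N( n τ ȳ / (n τ + .001), 1 / (n τ + .001) )
τ | µ, y ~ Gamma( .001 + n/2, .001 + (1/2) ∑_{i=1}^{n} (y_i − µ)² )

where ȳ is the sample mean and the normal is written in terms of its mean and variance.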

BUGS Code

model {
  # Model
  for (i in 1:n) {
    y[i] ~ dnorm(mu, tau)
  }
  # Priors
  mu  ~ dnorm(0, .001)
  tau ~ dgamma(.001, .001)
  # Extras: derived quantities
  sigma <- 1 / sqrt(tau)      # residual standard deviation
  prob  <- step(mu - 4.8)     # posterior P(mu > 4.8)
}

Example 3 - Regression

Model:

y_i ~ N(X_iᵀβ, σ²)

Likelihood:

L(β, σ²) = (2πσ²)^(−n/2) exp( −(1/(2σ²)) (Y − Xβ)ᵀ(Y − Xβ) )

Priors: We will use non-informative priors for β and σ² and let BUGS handle the details.

BUGS Code

model {
  # Likelihood
  for (i in 1:n) {
    loss[i] ~ dnorm(mu[i], tau)
    mu[i] <- beta[1] + beta[2]*air[i] + beta[3]*water[i] + beta[4]*acid[i]
  }
  # Priors
  for (i in 1:4) {
    beta[i] ~ dnorm(0, 0.001)
  }
  tau ~ dgamma(0.001, 0.001)
  sigma <- 1 / sqrt(tau)
  # Test beta[4]: posterior P(beta[4] > 0)
  beta4prob <- step(beta[4] - 0)
  # Predictions at air = 60.43, water = 21.1, acid = 86.29
  mu1 <- beta[1] + beta[2]*60.43 + beta[3]*21.1 + beta[4]*86.29
  pred1 ~ dnorm(mu1, tau)
}
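
The covariate names and the prediction point (the column means of R's built-in stackloss data) suggest the classic stack-loss example; under that assumption, a possible rjags driver (all R-side names and settings are mine) is:

library(rjags)

dat <- list(n     = nrow(stackloss),
            loss  = stackloss$stack.loss,
            air   = stackloss$Air.Flow,
            water = stackloss$Water.Temp,
            acid  = stackloss$Acid.Conc.)

# "regression.bug" is a hypothetical file containing the model code above
jm <- jags.model("regression.bug", data = dat, n.chains = 2)
update(jm, 2000)
samp <- coda.samples(jm, c("beta", "sigma", "beta4prob", "pred1"), n.iter = 10000)
summary(samp)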

Example 4 - One-way ANOVA

Model:

y_ij = µ_i + ε_ij
µ_i ~ N(0, .001)
ε_ij ~ N(0, 1/τ)

Lamb data:

y_ij = µ_i + ε_ij
µ_i ~ N(β_i, λ_i)
β_i ~ N(5, .01)
λ_i ~ Gamma(10, 1)
ε_ij ~ N(0, 1/τ)
i = 1, ..., 5

BUGS Code

model {
  # Likelihood
  for (i in 1:n) {
    y[i] ~ dnorm(mu[group[i]], tau)
  }
  # Hierarchical priors on the group means
  for (i in 1:5) {
    mu[i]     ~ dnorm(beta[i], lambda[i])
    beta[i]   ~ dnorm(5, .01)
    lambda[i] ~ dgamma(10, 1)
  }
  tau ~ dgamma(0.001, 0.001)
  sigma <- 1 / sqrt(tau)
  ######### differences in mus ############
  prob1 <- step(mu[2] - mu[1])    # posterior P(mu[2] > mu[1])
  prob2 <- step(mu[2] - mu[3])
  prob3 <- step(mu[2] - mu[4])
  prob4 <- step(mu[2] - mu[5])
}
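
This model expects the observations as a single vector with a parallel group index. The slides do not show the lamb data themselves, so the data list below uses made-up numbers purely to illustrate the required format:

# Hypothetical data in the format the model expects (values invented for illustration)
set.seed(1)
group <- rep(1:5, each = 4)                            # 5 groups, 4 lambs each
y     <- rnorm(length(group), mean = 5 + 0.2 * group)  # placeholder responses
dat   <- list(y = y, group = group, n = length(y))
# dat can then be passed to jags.model() exactly as in the earlier examples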

Example 5 - Agricultural Experiment - 2-Way ANOVA

Fertilizer was applied to four corn types using three different fertilizers (12 treatment cells, 4 observations per cell). Let us be real Bayesians and use prior knowledge we have gathered from talking to the farmers and industry experts.

Cell Means ANOVA Model:

Y_ij = θ_i + ε_ij,   i = 1, ..., 12,  j = 1, ..., 4,   ε_ij ~ N(0, σ²)

where θ_i ~ N( (1/4) ∑_{j=1}^{4} X_ij, σ²/4 )

The variance of each observation is known to be around 20 (σ² = 20); therefore the variance of the thetas equals 20/4 = 5. Theta is expected to be around 125, with a possible range of 115-140. Given this information, we use the following prior for theta:

θ_i ~ N(µ_i, 5)
µ_i ~ N(125, 225)

BUGS Code

## Cell means ANOVA model
model {
  for (i in 1:12) {
    for (j in 1:4) {
      # x[i, j]: observation j in treatment cell i
      x[i, j] ~ dnorm(mu[i], 1/20)
    }
    # Prior on the cell mean
    mu[i] ~ dnorm(125, 1/225)
  }
  # Extra!
  ### Compute main effects of fertilizer and corn
  corn[1] <- mean(mu[1:3])
  corn[2] <- mean(mu[4:6])
  corn[3] <- mean(mu[7:9])
  corn[4] <- mean(mu[10:12])
  fert[1] <- (mu[1] + mu[4] + mu[7] + mu[10]) / 4
  fert[2] <- (mu[2] + mu[5] + mu[8] + mu[11]) / 4
  fert[3] <- (mu[3] + mu[6] + mu[9] + mu[12]) / 4
  ### Prob each main effect is best
  for (i in 1:4) {
    probCorn[i] <- equals(corn[i], max(corn[1:4]))
  }
  for (i in 1:3) {
    probFert[i] <- equals(fert[i], max(fert[1:3]))
  }
  ### Prob each cell mean is best
  for (i in 1:12) {
    probTrt[i] <- equals(mu[i], max(mu[1:12]))
  }
  ### prob cell [4, 3] > cell [2, 1]
  prob1 <- step(mu[12] - mu[4])
}

Reporting a Bayesian Analysis

1. Motivate the use of Bayesian analysis
   Richer and more informative; no reliance on p-values
2. Clearly describe the model and its parameters
   The posterior distribution is a distribution over the parameters
3. Clearly describe and justify the prior
4. Mention MCMC details
5. Interpret the posterior
   Report summary statistics of the parameters that are theoretically meaningful
6. Robustness of the posterior to different priors
   Conduct the analysis with different priors as a sensitivity test
7. Posterior predictive check
   Generate data from the posterior; do they match the actual data?
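
As a concrete illustration of item 7 (added here, not on the original slide), a minimal posterior predictive check for Example 1, where thetaSave holds the posterior draws from the Metropolis-Hastings code earlier:

# Simulate replicated defect counts from the posterior predictive distribution
yRep <- rbinom(length(thetaSave), size = 2430, prob = thetaSave)
hist(yRep, main = "Posterior predictive defect counts")
abline(v = 219, lwd = 2)   # observed count
mean(yRep >= 219)          # posterior predictive tail probability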

Convergence

How do we know if we have converged to the correct posterior? Quick answer: we can't. Many fancy convergence diagnostics have been proposed, but none of the theorems about them prove the property you really want a diagnostic to have: they say that if the chain converges, the diagnostic will probably say the chain has converged, but they do not say that if the chain only pseudo-converges, the diagnostic will probably say it has not converged. Theorems that do claim to detect pseudo-convergence have unverifiable conditions that make them useless.
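
Even so, the standard diagnostics are worth running; a small sketch with the coda package, assuming samp is an mcmc.list from coda.samples as in the Example 1 driver above:

library(coda)
traceplot(samp)       # visual check of mixing
effectiveSize(samp)   # effective sample size per parameter
gelman.diag(samp)     # Gelman-Rubin R-hat (requires 2 or more chains)
autocorr.plot(samp)   # within-chain autocorrelation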

Convergence II

"Your humble author has a dictum that the least one can do is make an overnight run. What better way for your computer to spend its time? In many problems that are not too complicated, this is millions or billions of iterations. If you do not make runs like that, you are simply not serious about MCMC." - Charles J. Geyer

Summary

Frequentist statistics are a useful tool, but the future of effective data analysis is moving closer to Bayesian analysis.

BUGS makes it easy to do Bayesian analysis:
- Can handle a wide variety of models
- Presented only basic (but common!) examples
- Roughly 100 examples: http://www.openbugs.info/w/examples

However, BUGS can't do everything (though it is getting close):
- Models that require a specialized likelihood (not Normal, Binomial, ...)
- Reversible Jump Markov Chain Monte Carlo (RJMCMC)

It is still very important to learn how to fully program a Bayesian analysis. Frequentist methods still have a place...