BUGS: Bayesian inference Using Gibbs Sampling
Glen DePalma
Department of Statistics
May 30, 2013
www.stat.purdue.edu/~gdepalma
Bayesian Philosophy

[Pearl] turned Bayesian in 1971, as soon as I began reading Savage's monograph The Foundations of Statistical Inference [Savage, 1962]. The arguments were unassailable:
i. It is plain silly to ignore what we know,
ii. It is natural and useful to cast what we know in the language of probabilities, and
iii. If our subjective probabilities are erroneous, their impact will get washed out in due time, as the number of observations increases.

Note: He later would doubt the validity of (iii)
Bayesian Statistics

Becoming more and more popular due to the ease of simulation tools (R, SAS, BUGS, ...)

The main problem with frequentist statistics is the natural tendency to treat the p-value as if it were a Bayesian posterior probability that the null hypothesis is true (and hence 1 - p as the probability that the alternative is true), or to treat a frequentist confidence interval as a Bayesian credible interval (and hence to assume there is a 95% probability that the true value lies within the 95% confidence interval computed from the particular sample of data we have).

"I challenge you to find me, in any published statistical analysis outside of an introductory textbook, a confidence interval given the correct interpretation. If you can find even one instance where the confidence interval is not interpreted as a credible interval, then I will eat your hat." - William Briggs, 2012

The main objection to Bayesian statistics is the subjectivity of the prior (interestingly, the subjectivity of the likelihood in the frequentist approach is rarely mentioned).
Andrew Gelman Quotes

"The difference between significant and non-significant is not itself statistically significant."

"I've never in my professional life made a Type I error or a Type II error. But I've made lots of errors. How can this be? A Type 1 error occurs only if the null hypothesis is true (typically if a certain parameter, or difference in parameters, equals zero). In the applications I've worked on, in social science and public health, I've never come across a null hypothesis that could actually be true, or a parameter that could actually be zero. A Type 2 error occurs only if I claim that the null hypothesis is true, and I would certainly not do that, given my statement above!"
BUGS?

Much of Bayesian analysis is done using Markov chain Monte Carlo (MCMC) to sample from the posterior. What does BUGS do? You define the model by specifying the relationships between the variables; BUGS handles the MCMC.

Four (?) flavors:
1. WinBUGS (the original; development has ceased)
2. OpenBUGS (open source, still in development (?))
3. JAGS (Just Another Gibbs Sampler)
4. Stan (new, Hamiltonian Monte Carlo)

The syntax is very similar among the programs; examples in this presentation use JAGS: http://mcmc-jags.sourceforge.net/
Example 1

Out of 2,430 parts produced, 219 were not shipped due to being defective. Our interest is in the probability of a defective part. Let us assume (unrealistically) that we know nothing a priori about the proportion of scrapped parts.

y | θ ~ Binomial(2430, θ)
θ ~ Beta(1, 1)

Exact posterior: Beta(219 + 1, 2430 - 219 + 1) = Beta(220, 2212)
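Because the Beta(1, 1) prior is conjugate, the exact posterior above can be checked by simulation. A minimal Python sketch (standard library only; illustrative, not part of the original slides):

```python
import random
import statistics

random.seed(1)

n, y = 2430, 219          # parts produced, defective parts
a, b = 1, 1               # Beta(1, 1) prior

# Conjugacy: posterior is Beta(y + a, n - y + b) = Beta(220, 2212)
post_a, post_b = y + a, n - y + b
exact_mean = post_a / (post_a + post_b)   # 220 / 2432

# Monte Carlo check: draw from the posterior and compare means
draws = [random.betavariate(post_a, post_b) for _ in range(20000)]
mc_mean = statistics.mean(draws)

print(round(exact_mean, 4), round(mc_mean, 4))
```

The Monte Carlo mean agrees with the analytic posterior mean to roughly the sampling error of 20,000 draws.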
MCMC

burnin <- 1000
nsim <- 5000
theta <- 0.5
n <- 2430
y <- 219
a <- 1
b <- 1
thetaSave <- rep(NA, nsim - burnin)

llk <- function(theta) return(dbinom(y, n, theta, log = TRUE))
lprior <- function(theta) return(dbeta(theta, a, b, log = TRUE))

for (i in 1:nsim) {
  # Propose new theta
  thetaNew <- theta + runif(1, -0.05, 0.05)
  # Compute M-H acceptance (proposals outside (0, 1) are rejected)
  if (thetaNew > 0 && thetaNew < 1) {
    proposalLlk <- llk(thetaNew) + lprior(thetaNew) - llk(theta) - lprior(theta)
    if (proposalLlk > log(runif(1))) theta <- thetaNew
  }
  if (i > burnin) thetaSave[i - burnin] <- theta
}

plot((burnin + 1):nsim, thetaSave, type = "l", xlab = "Iteration", ylab = "Theta")
summary(thetaSave)
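For comparison, the same random-walk Metropolis scheme can be sketched in Python with only the standard library (the binomial coefficient cancels in the acceptance ratio, so it is dropped; this port is illustrative, not from the slides):

```python
import math
import random

random.seed(42)

n, y = 2430, 219
burnin, nsim = 1000, 5000

def log_post(theta):
    # log binomial likelihood up to a constant; the Beta(1, 1)
    # prior is flat on (0, 1), so its log density contributes 0
    return y * math.log(theta) + (n - y) * math.log(1 - theta)

theta = 0.5
saved = []
for i in range(nsim):
    # propose a new theta by a small uniform step
    theta_new = theta + random.uniform(-0.05, 0.05)
    if 0 < theta_new < 1:
        # Metropolis accept/reject on the log scale
        if log_post(theta_new) - log_post(theta) > math.log(random.random()):
            theta = theta_new
    if i >= burnin:
        saved.append(theta)

post_mean = sum(saved) / len(saved)
print(round(post_mean, 3))   # should sit near 220/2432
```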
BUGS Code

model{
  # Model
  y ~ dbin(theta, n)
  # Prior
  theta ~ dbeta(a1, b1)
}
Example 2 - One Sample Normal

Likelihood: y_i ~ N(µ, 1/τ)

Priors (BUGS precision parameterization):
µ ~ N(0, .001)
τ ~ Gamma(.001, .001)

Posterior:

∝ [ ∏_{i=1}^{n} √(τ/2π) exp(−τ(y_i − µ)²/2) ] · √(.001/2π) exp(−.001µ²/2) · (.001^.001 / Γ(.001)) τ^(.001−1) e^(−.001τ)

We could calculate the exact posteriors of µ and τ, but let's use BUGS!
BUGS Code

model{
  # Model
  for (i in 1:n) {
    y[i] ~ dnorm(mu, tau)
  }
  # Priors
  mu ~ dnorm(0, .001)
  tau ~ dgamma(.001, .001)
  # Extra
  sigma <- 1/sqrt(tau)
  prob <- step(mu - 4.8)
}
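Both full conditionals of this model are available in closed form, so the Gibbs sampler that BUGS would run can be written by hand. A Python sketch on simulated data (the data and the conditional-distribution derivations are illustrative assumptions, not from the slides):

```python
import random
import statistics

random.seed(7)

# Hypothetical data standing in for y (no data are given in the slides)
n = 50
y = [random.gauss(5.0, 2.0) for _ in range(n)]

# Hyperparameters matching the BUGS priors: mu ~ N(0, precision .001),
# tau ~ Gamma(.001, .001) (dnorm uses precision, dgamma uses rate)
prec0, a0, b0 = 0.001, 0.001, 0.001

mu, tau = 0.0, 1.0
mu_save = []
for i in range(6000):
    # mu | tau, y  ~  Normal with precision n*tau + prec0
    prec = n * tau + prec0
    mu = random.gauss(tau * sum(y) / prec, prec ** -0.5)
    # tau | mu, y  ~  Gamma(a0 + n/2, rate b0 + SS/2);
    # gammavariate's second argument is the scale = 1/rate
    ss = sum((yi - mu) ** 2 for yi in y)
    tau = random.gammavariate(a0 + n / 2, 1.0 / (b0 + ss / 2))
    if i >= 1000:
        mu_save.append(mu)

print(round(statistics.mean(mu_save), 2))  # close to the sample mean of y
```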
Example 3 - Regression

Model: y_i ~ N(X_i^T β, σ²)

Likelihood:

L(β, σ²) = (2πσ²)^(−n/2) exp( −(1/(2σ²)) (Y − Xβ)^T (Y − Xβ) )

Priors: We will use non-informative priors for β and σ² and let BUGS handle the details.
BUGS Code

model {
  for (i in 1:n) {
    loss[i] ~ dnorm(mu[i], tau)
    mu[i] <- beta[1] + beta[2]*air[i] + beta[3]*water[i] + beta[4]*acid[i]
  }
  # priors
  for (i in 1:4) {
    beta[i] ~ dnorm(0, 0.001)
  }
  tau ~ dgamma(0.001, 0.001)
  sigma <- 1/sqrt(tau)
  # test beta4
  beta4Prob <- step(beta[4] - 0)
  # predictions
  mu1 <- beta[1] + beta[2]*60.43 + beta[3]*21.1 + beta[4]*86.29
  pred1 ~ dnorm(mu1, tau)
}
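The same semi-conjugate structure yields a hand-rolled Gibbs sampler for regression. A simplified Python sketch with a single hypothetical predictor (the slides' air/water/acid data are not reproduced; all numbers, and the reduction to one predictor, are illustrative assumptions):

```python
import math
import random
import statistics

random.seed(3)

# Hypothetical data: y[i] ~ N(b0 + b1*x[i], 1/tau), true b0=2, b1=1.5
n = 100
x = [random.uniform(0, 10) for _ in range(n)]
y = [2.0 + 1.5 * xi + random.gauss(0, 1.0) for xi in x]

prec0, a0, b0g = 0.001, 0.001, 0.001   # vague priors as in the BUGS code

# Sufficient statistics for the 2x2 normal equations X'X and X'y
Sx, Sxx = sum(x), sum(xi * xi for xi in x)
Sy, Sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))

beta, tau = [0.0, 0.0], 1.0
slope_save = []
for it in range(4000):
    # beta | tau ~ N(m, P^-1), P = tau*X'X + prec0*I (2x2, done by hand)
    p11 = tau * n + prec0
    p12 = tau * Sx
    p22 = tau * Sxx + prec0
    det = p11 * p22 - p12 * p12
    b1v, b2v = tau * Sy, tau * Sxy
    m1 = (p22 * b1v - p12 * b2v) / det
    m2 = (p11 * b2v - p12 * b1v) / det
    # draw beta = m + A^{-T} z, with A the lower Cholesky factor of P
    a11 = math.sqrt(p11)
    a21 = p12 / a11
    a22 = math.sqrt(p22 - a21 * a21)
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    u2 = z2 / a22
    u1 = (z1 - a21 * u2) / a11
    beta = [m1 + u1, m2 + u2]
    # tau | beta ~ Gamma(a0 + n/2, rate b0g + SSR/2)
    ssr = sum((yi - beta[0] - beta[1] * xi) ** 2 for xi, yi in zip(x, y))
    tau = random.gammavariate(a0 + n / 2, 1.0 / (b0g + ssr / 2))
    if it >= 1000:
        slope_save.append(beta[1])

print(round(statistics.mean(slope_save), 2))  # near the true slope 1.5
```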
Example 4 - One-way ANOVA

Model:
y_ij = µ_i + ε_ij
µ_i ~ N(0, .001)
ε_ij ~ N(0, 1/τ)

Lamb data (i = 1...5):
y_ij = µ_i + ε_ij
µ_i ~ N(β_i, λ_i)
β_i ~ N(5, .01)
λ_i ~ Gamma(10, 1)
ε_ij ~ N(0, 1/τ)
BUGS Code

model {
  for (i in 1:n) {
    y[i] ~ dnorm(mu[group[i]], tau)
  }
  for (i in 1:5) {
    mu[i] ~ dnorm(beta[i], lambda[i])
    beta[i] ~ dnorm(5, .01)
    lambda[i] ~ dgamma(10, 1)
  }
  tau ~ dgamma(0.001, 0.001)
  sigma <- 1/sqrt(tau)
  ######### differences in mus ############
  prob1 <- step(mu[2] - mu[1])
  prob2 <- step(mu[2] - mu[3])
  prob3 <- step(mu[2] - mu[4])
  prob4 <- step(mu[2] - mu[5])
}
Example 5 - Agricultural Experiment - 2 Way ANOVA

Three different fertilizers were applied to four corn types (12 treatment cells, four replicates each). Let us be real Bayesians and use prior knowledge we have gathered from talking to the farmers and industry experts.

Cell Means ANOVA Model:

Y_ij = θ_i + ε_ij,  i = 1...12, j = 1...4
ε_ij ~ N(0, σ²)

so each cell mean θ̂_i = (1/4) Σ_{j=1}^{4} Y_ij ~ N(θ_i, σ²/4).

The variance of each observation is known to be around 20 (σ² = 20); therefore the variance of the thetas equals 5. Theta is expected to be around 125, with a possible range of 115-140. Given this information, we use the following prior for theta:

θ_i ~ N(µ_i, 5)
µ_i ~ N(125, 225)
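With σ² treated as known, each cell's update is a conjugate normal-normal calculation; marginalizing over µ_i, the prior on θ_i is N(125, 5 + 225) = N(125, 230). A single-cell Python sketch with hypothetical yields (the slides' corn data are not reproduced, so the numbers are illustrative):

```python
import statistics

# Hypothetical yields for one treatment cell (four replicates)
cell = [131.0, 128.5, 133.2, 129.8]
n = len(cell)
ybar = statistics.mean(cell)

# Known observation variance; marginal prior theta_i ~ N(125, 230)
sigma2 = 20.0
prior_mean, prior_var = 125.0, 5.0 + 225.0

# Precision-weighted posterior for theta_i (normal-normal conjugacy)
post_prec = n / sigma2 + 1.0 / prior_var
post_mean = (n * ybar / sigma2 + prior_mean / prior_var) / post_prec
post_var = 1.0 / post_prec

print(round(post_mean, 2), round(post_var, 2))
```

The posterior mean lands between the prior mean 125 and the cell mean, pulled almost entirely toward the data because the prior is diffuse relative to σ²/n.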
BUGS Code

## cell means anova model
model {
  for (i in 1:12) {
    for (j in 1:4) {
      # x[i,j] is an observation on theta (mu[i] plays the role of theta)
      x[i, j] ~ dnorm(mu[i], 1/20)
    }
    # prior on theta
    mu[i] ~ dnorm(125, 1/225)
  }
  # Extra!
  ### Compute main effects of fertilizer and corn
  corn[1] <- mean(mu[1:3])
  corn[2] <- mean(mu[4:6])
  corn[3] <- mean(mu[7:9])
  corn[4] <- mean(mu[10:12])
  fert[1] <- (mu[1] + mu[4] + mu[7] + mu[10])/4
  fert[2] <- (mu[2] + mu[5] + mu[8] + mu[11])/4
  fert[3] <- (mu[3] + mu[6] + mu[9] + mu[12])/4
  ### Prob main effects best
  for (i in 1:4) {
    probCorn[i] <- equals(corn[i], max(corn[1:4]))
  }
  for (i in 1:3) {
    probFert[i] <- equals(fert[i], max(fert[1:3]))
  }
  ### Prob cell means best
  for (i in 1:12) {
    probTrt[i] <- equals(mu[i], max(mu[1:12]))
  }
  ### prob cell [4,3] > cell [2,1]
  prob1 <- step(mu[12] - mu[4])
}
Reporting a Bayesian Analysis

1. Motivate the use of Bayesian analysis: richer and more informative; no reliance on p-values
2. Clearly describe the model and its parameters: the posterior distribution is a distribution over the parameters
3. Clearly describe and justify the prior
4. Mention MCMC details
5. Interpret the posterior: report summary statistics of the parameters that are theoretically meaningful
6. Check robustness of the posterior to different priors: conduct the analysis with different priors as a sensitivity test
7. Posterior predictive check: generate data from the posterior; do they match the actual data?
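Step 7 can be made concrete with Example 1's model: draw θ from the Beta(220, 2212) posterior, simulate replicated defect counts for 2,430 parts, and check whether the observed 219 looks typical of the replicates. A minimal Python sketch (standard library only; illustrative):

```python
import random

random.seed(11)

n, y_obs = 2430, 219
post_a, post_b = 220, 2212            # exact posterior from Example 1

def binomial_draw(n, p):
    # simple Bernoulli-sum binomial sampler (stdlib only)
    return sum(random.random() < p for _ in range(n))

# Posterior predictive replicates of the defect count
y_rep = []
for _ in range(300):
    theta = random.betavariate(post_a, post_b)
    y_rep.append(binomial_draw(n, theta))

# Where does the observed count fall among the replicates?
p_tail = sum(yr >= y_obs for yr in y_rep) / len(y_rep)
print(round(p_tail, 2))   # a value far from 0 or 1 suggests an adequate fit
```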
Convergence

How do we know if we have converged to the correct posterior? Quick answer: we can't.

Many fancy diagnostic theorems have been proposed, but none of them prove the property you really want a diagnostic to have. These theorems say that if the chain converges, then the diagnostic will probably say the chain has converged, but they do not say that if the chain pseudo-converges, then the diagnostic will probably say the chain did not converge. Theorems that do address pseudo-convergence have unverifiable conditions that make them useless.
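A common partial remedy is to run several chains from dispersed starting points and compare within-chain to between-chain variability, e.g. with the Gelman-Rubin statistic. A simplified Python sketch of the basic R-hat calculation (toy "chains" of independent draws, not real MCMC output; split-chain refinements omitted):

```python
import random
import statistics

random.seed(5)

def rhat(chains):
    # Gelman-Rubin potential scale reduction factor (basic version)
    m = len(chains)                # number of chains
    n = len(chains[0])             # draws per chain
    means = [statistics.mean(c) for c in chains]
    W = statistics.mean(statistics.variance(c) for c in chains)  # within
    B = n * statistics.variance(means)                           # between
    var_plus = (n - 1) / n * W + B / n
    return (var_plus / W) ** 0.5

# Two toy chains sampling the same N(0, 1) target (converged case)
good = [[random.gauss(0, 1) for _ in range(1000)] for _ in range(2)]
# One chain stuck near a different mode (pseudo-converged case)
bad = [[random.gauss(0, 1) for _ in range(1000)],
       [random.gauss(3, 1) for _ in range(1000)]]

print(round(rhat(good), 2), round(rhat(bad), 2))
```

R-hat near 1 is consistent with convergence; values well above 1 flag disagreement between chains, as in the second case.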
Convergence II

"Your humble author has a dictum that the least one can do is make an overnight run. What better way for your computer to spend its time? In many problems that are not too complicated, this is millions or billions of iterations. If you do not make runs like that, you are simply not serious about MCMC." - Charles J. Geyer
Summary

Frequentist statistics are a useful tool, but the future of effective data analysis is moving closer to Bayesian analyses.

BUGS makes it easy to do Bayesian analysis:
- Can handle a wide variety of models
- Presented only basic (but common!) examples
- Roughly 100 examples: http://www.openbugs.info/w/examples

However, BUGS can't do everything (but it is getting close):
- Models that require a specialized likelihood (not Normal, Binomial, ...)
- Reversible Jump Markov Chain Monte Carlo (RJMCMC)
- Still very important to learn how to fully program a Bayesian analysis

Frequentist methods still have a place...