MIT /30 Gelman, Carpenter, Hoffman, Guo, Goodrich, Lee,... Stan for Bayesian data analysis

1 MIT

2 Stan: a program for Bayesian data analysis with complex models
Andrew Gelman, Bob Carpenter, Matt Hoffman, Jiqiang Guo, Ben Goodrich, and Daniel Lee
Department of Statistics, Columbia University, New York, and others
12 Oct

3 Stan overview
- Fit open-ended Bayesian models
- Specify log posterior density in C++
- Code a distribution once, then use it everywhere
- Hamiltonian No-U-Turn sampler
- Autodiff
- Runs from R; postprocessing

4 Example from toxicology

5 Sparse data

6 Validation using predictive simulations

7 Public opinion: Health care reform

8 Public opinion: School vouchers

9 Public opinion: Death penalty
- Hierarchical time series model
- Open-ended

10 Tree rings

11 Speed dating
- Each person meets dates
- Rate each date on Attractiveness, Sincerity, Intelligence, Ambition, Fun to be with, Shared interests
- Outcomes
  - Do you want to see this person again? (Yes/No)
  - How much do you like this person? (1-10)
- How important is each of the 6 attributes?
- Logistic or linear regression
- Hierarchical model: coefficients vary by person

12 Steps of Bayesian data analysis
1. Model building
2. Inference
3. Model checking
4. Model understanding and improvement

13 Background on Bayesian computation
- Point estimates and standard errors
- Hierarchical models
- Posterior simulation
- Markov chain Monte Carlo (Gibbs sampler and Metropolis algorithm)
- Hamiltonian Monte Carlo
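As a point of reference for the Metropolis algorithm named on this slide, here is a minimal random-walk Metropolis sketch. The standard normal target, the function names, and the proposal scale are illustrative assumptions, not taken from the talk.

```python
import math
import random


def log_target(theta):
    """Unnormalized log density of a standard normal (assumed example target)."""
    return -0.5 * theta * theta


def metropolis(log_p, theta0, n_draws, step=1.0, seed=0):
    """Random-walk Metropolis with a symmetric normal proposal."""
    rng = random.Random(seed)
    theta, lp = theta0, log_p(theta0)
    draws = []
    for _ in range(n_draws):
        proposal = theta + rng.gauss(0.0, step)
        lp_prop = log_p(proposal)
        # Accept with probability min(1, p(proposal) / p(theta)).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
        draws.append(theta)
    return draws


draws = metropolis(log_target, theta0=0.0, n_draws=20000, step=2.4)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
```

The proposal scale `step` has to be chosen by hand for each problem, which is one way to read the deck's later remark that Metropolis is "too problem-specific".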

14 Hoffman and Gelman HMC Example Trajectory
- Blue ellipse is contour of target distribution
- Initial position at black solid circle
- Arrows indicate a U-turn in momentum

Figure 2: Example of a trajectory generated during one iteration of NUTS. The blue ellipse is a contour of the target distribution, the black open circles are the positions θ traced out by the leapfrog integrator and associated with elements of the set of visited states B, the black solid circle is the starting position, the red solid circles are positions associated with states that must be excluded from the set C of possible next samples because their joint probability is below the slice variable u, and the positions with a red "x" through them correspond to states that must be ...

15 Solving problems
- Problem: Gibbs too slow, Metropolis too problem-specific
  Solution: Hamiltonian Monte Carlo
- Problem: Interpreters too slow, won't scale
  Solution: Compilation
- Problem: Need gradients of log posterior for HMC
  Solution: Reverse-mode algorithmic differentiation
- Problem: Existing algo-diff slow, limited, unextensible
  Solution: Our own algo-diff
- Problem: Algo-diff requires fully templated functions
  Solution: Our own density library, Eigen linear algebra
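To make "reverse-mode algorithmic differentiation" concrete, the following is a toy sketch of the idea: record each operation with its local partial derivatives, then sweep backward once to obtain all gradients. The class `Var` and the functions `log` and `backward` are hypothetical names for illustration; this is not Stan's autodiff library.

```python
import math


class Var:
    """A scalar node that remembers its parents and local partial derivatives."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent_node, d(self)/d(parent))
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __radd__ = __add__
    __rmul__ = __mul__


def log(x):
    """Natural logarithm of a Var, with derivative 1/x."""
    return Var(math.log(x.value), ((x, 1.0 / x.value),))


def backward(output):
    """Reverse sweep: apply the chain rule in reverse topological order."""
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, partial in node.parents:
            parent.grad += partial * node.grad


a, b = Var(3.0), Var(2.0)
f = a * b + log(a)  # f(a, b) = a*b + log(a)
backward(f)         # now a.grad = df/da = b + 1/a and b.grad = df/db = a
```

One backward sweep yields the derivative with respect to every input, which is why reverse mode suits a scalar log posterior with many parameters.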

16 Radford Neal (2011) on Hamiltonian Monte Carlo
"One practical impediment to the use of Hamiltonian Monte Carlo is the need to select suitable values for the leapfrog stepsize, ε, and the number of leapfrog steps L ... Tuning HMC will usually require preliminary runs with trial values for ε and L ... Unfortunately, preliminary runs can be misleading ..."
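The two tuning parameters in the quotation, the stepsize ε and the number of leapfrog steps L, appear explicitly in a basic HMC update. The sketch below is one-dimensional with a standard normal target; the function names and the values `eps=0.2`, `L=10` are assumptions chosen for illustration.

```python
import math
import random


def leapfrog(theta, r, grad_log_p, eps, L):
    """Run L leapfrog steps of size eps from position theta and momentum r."""
    r = r + 0.5 * eps * grad_log_p(theta)       # initial half step for momentum
    for step in range(L):
        theta = theta + eps * r                  # full step for position
        if step < L - 1:
            r = r + eps * grad_log_p(theta)      # full step for momentum
    r = r + 0.5 * eps * grad_log_p(theta)       # final half step for momentum
    return theta, r


def hmc(log_p, grad_log_p, theta0, n_draws, eps, L, seed=0):
    """Basic HMC with unit mass and a Metropolis accept/reject correction."""
    rng = random.Random(seed)
    theta = theta0
    draws = []
    for _ in range(n_draws):
        r0 = rng.gauss(0.0, 1.0)                 # resample momentum
        theta_new, r_new = leapfrog(theta, r0, grad_log_p, eps, L)
        # Difference in joint log density of (position, momentum).
        log_accept = (log_p(theta_new) - 0.5 * r_new ** 2) \
            - (log_p(theta) - 0.5 * r0 ** 2)
        if math.log(1.0 - rng.random()) < log_accept:
            theta = theta_new
        draws.append(theta)
    return draws


def log_p(theta):
    return -0.5 * theta * theta                  # assumed target: standard normal


def grad_log_p(theta):
    return -theta


draws = hmc(log_p, grad_log_p, theta0=0.0, n_draws=5000, eps=0.2, L=10)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
```

Both `eps` and `L` are fixed by the caller here, which is exactly the tuning burden the quotation describes.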

17 The No-U-Turn Sampler
- Created by Matt Hoffman
- Run the HMC steps until they start to turn around (bend with an angle > 180°)
- Computationally efficient
- Requires no tuning of #steps
- Complications to preserve detailed balance
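The "turn around" test can be sketched as a stopping rule: keep taking leapfrog steps until the displacement from the starting point has a negative dot product with the current momentum, meaning further steps would reduce the distance travelled. The code below is only this naive forward-only rule on an assumed two-dimensional standard normal target. It deliberately omits the "complications to preserve detailed balance", so it is not a valid sampler and not the NUTS algorithm itself; all names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def steps_until_u_turn(theta0, r0, grad_log_p, eps, max_steps=1000):
    """Count leapfrog steps until (theta - theta0) . r first becomes negative."""
    theta, r = list(theta0), list(r0)
    for n in range(1, max_steps + 1):
        # One leapfrog step: half momentum, full position, half momentum.
        g = grad_log_p(theta)
        r = [ri + 0.5 * eps * gi for ri, gi in zip(r, g)]
        theta = [ti + eps * ri for ti, ri in zip(theta, r)]
        g = grad_log_p(theta)
        r = [ri + 0.5 * eps * gi for ri, gi in zip(r, g)]
        displacement = [t - t0 for t, t0 in zip(theta, theta0)]
        if dot(displacement, r) < 0.0:
            return n, theta                      # trajectory has started to double back
    return max_steps, theta


def grad_log_p(theta):
    """Gradient of an assumed standard normal log density."""
    return [-t for t in theta]


start = [1.0, 0.0]
n_steps, end = steps_until_u_turn(start, [0.0, 1.0], grad_log_p, eps=0.1)
distance = math.sqrt(sum((e - s) ** 2 for e, s in zip(end, start)))
```

On this circular orbit the rule stops after roughly half a revolution, when the point is about as far from the start as it can get, without the caller ever specifying a number of steps.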

18 Hoffman and Gelman NUTS Example Trajectory
- Blue ellipse is contour of target distribution
- Initial position at black solid circle
- Arrows indicate a U-turn in momentum

Figure 2: Example of a trajectory generated during one iteration of NUTS.

19 NUTS vs. Gibbs and Metropolis
- Two dimensions of highly correlated 250-dim distribution
- 1M samples from Metropolis, 1M from Gibbs (thin to 1K)
- 1K samples from NUTS, 1K independent draws

Figure 7: Samples generated by random-walk Metropolis, Gibbs sampling, and NUTS. The plots compare 1,000 independent draws from a highly correlated 250-dimensional distribution (right) with 1,000,000 samples (thinned to 1,000 samples for display) generated by random-walk Metropolis (left), 1,000,000 samples (thinned to 1,000 samples for display) generated by Gibbs sampling (second from left), and 1,000 samples generated by NUTS (second from right). Only the first two dimensions are shown here.

20 Hoffman and Gelman NUTS vs. Basic HMC
- 250-D normal and logistic regression models
- Vertical axis shows effective #sims (big is good)
- (Left) NUTS; (Right) HMC with increasing t = εL
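The "effective #sims" on the vertical axis can be illustrated with a simple autocorrelation-based estimate, n / (1 + 2 Σ ρ_k), truncating the sum at the first non-positive autocorrelation. This is a rough sketch under that truncation assumption; it is not claimed to be the estimator used for the plots or by Stan. The AR(1) test chain and its coefficient 0.9 are also illustrative choices.

```python
import random


def effective_sample_size(draws):
    """Estimate ESS as n / (1 + 2 * sum of initial positive autocorrelations)."""
    n = len(draws)
    mean = sum(draws) / n
    centered = [d - mean for d in draws]
    var = sum(c * c for c in centered) / n
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0.0:          # stop at the first non-positive autocorrelation
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


rng = random.Random(1)
n = 20000
iid = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Strongly autocorrelated chain: x[t] = 0.9 * x[t-1] + noise.
ar1 = [0.0]
for _ in range(n - 1):
    ar1.append(0.9 * ar1[-1] + rng.gauss(0.0, 1.0))

ess_iid = effective_sample_size(iid)
ess_ar1 = effective_sample_size(ar1)
```

Independent draws give an ESS close to n, while the correlated chain is worth far fewer draws, which is the sense in which "big is good".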

21 NUTS vs. Basic HMC II
- Hierarchical logistic regression and stochastic volatility
- Simulation time is step size ε times #steps L
- NUTS can beat optimally tuned HMC

Figure 6: Effective sample size (ESS) as a function of δ and (for HMC) simulation length εL for the multivariate normal, logistic regression, hierarchical logistic regression, and stochastic volatility models. Each point shows the ESS divided by the number of gradient evaluations for a separate experiment; lines denote the average of ...

22 Solving more problems in Stan
- Problem: Need ease of use of BUGS
  Solution: Compile directed graphical model language
- Problem: Need to tune parameters for HMC
  Solution: Auto tuning, adaptation
- Problem: Efficient up-to-proportion density calcs
  Solution: Density template metaprogramming
- Problem: Limited error checking, recovery
  Solution: Static model typing, informative exceptions
- Problem: Poor boundary behavior
  Solution: Calculate limits (e.g., $\lim_{x \to 0} x \log x$)
- Problem: Restrictive licensing (e.g., closed, GPL, etc.)
  Solution: Open-source, BSD license
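Two of these items lend themselves to a small sketch: returning the limit value at a boundary instead of evaluating log(0), and dropping density terms that do not depend on the parameter of interest. The function names below are hypothetical and the code only illustrates the ideas; it says nothing about how Stan's density library implements them.

```python
import math


def x_log_x(x):
    """Return x*log(x), using the limit value 0 at x == 0 rather than log(0)."""
    if x < 0.0:
        raise ValueError("x_log_x is only defined for x >= 0")
    if x == 0.0:
        return 0.0
    return x * math.log(x)


def normal_log_density(y, mu, sigma):
    """Full normal log density, including all normalizing terms."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
            - 0.5 * ((y - mu) / sigma) ** 2)


def normal_log_kernel(y, mu, sigma):
    """Same density up to proportion in mu: terms constant in mu are dropped."""
    return -0.5 * ((y - mu) / sigma) ** 2


# Differences in log density between two values of mu are all a sampler needs,
# and they agree whether or not the constant terms are computed.
full_diff = normal_log_density(1.3, 0.5, 2.0) - normal_log_density(1.3, -0.2, 2.0)
kernel_diff = normal_log_kernel(1.3, 0.5, 2.0) - normal_log_kernel(1.3, -0.2, 2.0)
```

A naive `0.0 * math.log(0.0)` raises an error in Python, so the explicit limit is what keeps the boundary case usable.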

23 A simple hierarchical model in Stan

    data {
      int<lower=0> J;          // number of schools
      real y[J];               // estimated treatment effects
      real<lower=0> sigma[J];  // s.e. of effect estimates
    }
    parameters {
      real mu;
      real<lower=0> tau;
      real eta[J];
    }
    transformed parameters {
      real theta[J];
      for (j in 1:J)
        theta[j] <- mu + tau * eta[j];
    }
    model {
      eta ~ normal(0, 1);
      y ~ normal(theta, sigma);
    }
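To show what log density this model block defines, here is a hedged plain-Python transcription: theta is built as mu + tau * eta, eta gets a standard normal prior, and y is normal around theta with known sigma. The data values are made up for illustration (the slide gives none), the function names are hypothetical, and the sketch ignores the change of variables that Stan applies internally to constrained parameters such as tau.

```python
import math


def normal_log_kernel(x, mean, sd):
    """Normal log density with the constant -0.5*log(2*pi) dropped."""
    z = (x - mean) / sd
    return -math.log(sd) - 0.5 * z * z


def log_posterior(mu, tau, eta, y, sigma):
    """Log posterior (up to a constant) of the hierarchical model above."""
    if tau < 0.0:
        return float("-inf")                      # tau is declared with lower=0
    theta = [mu + tau * e for e in eta]           # transformed parameters
    lp = sum(normal_log_kernel(e, 0.0, 1.0) for e in eta)       # eta ~ normal(0, 1)
    lp += sum(normal_log_kernel(yj, tj, sj)                     # y ~ normal(theta, sigma)
              for yj, tj, sj in zip(y, theta, sigma))
    return lp


# Hypothetical data for three schools (not from the talk).
y = [10.0, -2.0, 5.0]
sigma = [8.0, 12.0, 6.0]

lp_pooled = log_posterior(0.0, 0.0, [0.0, 0.0, 0.0], y, sigma)
lp_spread = log_posterior(4.0, 3.0, [1.0, -1.0, 0.5], y, sigma)
```

Writing theta through eta, as the Stan program does, means the sampler works with parameters that have a fixed unit-scale prior regardless of tau.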

24 Example: the Kumaraswamy distribution

$$p(\theta \mid a, b) = a\, b\, \theta^{a-1} \left(1 - \theta^{a}\right)^{b-1} \quad \text{for } a, b > 0 \text{ and } \theta \in (0, 1)$$

    model {
      // Put priors on a and b here if you want
      // Put in the rest of your model
      // Kumaraswamy log-likelihood
      lp <- lp + N*(log(a)+log(b))+(a-1)*sum_log_theta;
      for (i in 1:N)
        lp <- lp + (b-1)*log1m(pow(theta[i],a));
    }
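The log-likelihood increment in the model block is the sum of the log density over the N observations, with the terms that are shared across observations pulled out of the loop. A short Python sketch of that equivalence follows; the function names and the three data values are hypothetical.

```python
import math


def kumaraswamy_log_density(theta, a, b):
    """log p(theta | a, b) = log a + log b + (a-1) log theta + (b-1) log(1 - theta^a)."""
    return (math.log(a) + math.log(b) + (a - 1.0) * math.log(theta)
            + (b - 1.0) * math.log1p(-theta ** a))


def kumaraswamy_log_lik(thetas, a, b):
    """Same arrangement as the Stan snippet: shared terms first, then a loop."""
    n = len(thetas)
    sum_log_theta = sum(math.log(t) for t in thetas)
    lp = n * (math.log(a) + math.log(b)) + (a - 1.0) * sum_log_theta
    for t in thetas:
        lp += (b - 1.0) * math.log1p(-t ** a)     # log1p(-x) plays the role of log1m(x)
    return lp


thetas = [0.2, 0.55, 0.9]                          # hypothetical observations
direct = sum(kumaraswamy_log_density(t, 3.0, 2.0) for t in thetas)
factored = kumaraswamy_log_lik(thetas, 3.0, 2.0)
```

The two totals agree up to floating-point rounding, so the factored form is only a cheaper way to compute the same quantity.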

25 Check that it worked

R code:

    N <- ...
    a <- 3
    b <- 2
    theta <- rbeta(N,1,b)^(1/a)
    Kumaraswamy <- stan (file="kumaraswamy.stan", data=list(N=N,x=x))
    print (Kumaraswamy)

Result:

    Inference for Stan model: Kumaraswamy.
    4 chains: each with iter=2000; warmup=1000; thin=1; 2000 saved.
       mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
    a
    b
    lp
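The simulation step in the R code relies on the fact that if U follows a Beta(1, b) distribution, then U^(1/a) follows a Kumaraswamy(a, b) distribution, whose CDF is 1 - (1 - x^a)^b. The sketch below checks this empirically. The sample size of 20000, the seed, and the function names are assumptions, since the slide's value of N did not survive.

```python
import random


def kumaraswamy_cdf(x, a, b):
    """CDF of the Kumaraswamy(a, b) distribution on (0, 1)."""
    return 1.0 - (1.0 - x ** a) ** b


def simulate_kumaraswamy(n, a, b, seed=0):
    """Draw n values as Beta(1, b) variates raised to the power 1/a."""
    rng = random.Random(seed)
    return [rng.betavariate(1.0, b) ** (1.0 / a) for _ in range(n)]


a, b = 3.0, 2.0                                   # same a and b as the R code
draws = simulate_kumaraswamy(20000, a, b)         # assumed sample size
empirical = sum(d <= 0.5 for d in draws) / len(draws)
analytic = kumaraswamy_cdf(0.5, a, b)
```

If the empirical and analytic CDF values are close, the simulated data are a reasonable test input for recovering a and b from the fitted model.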

26 Scalability
- Big data
- Big models

27 What do we mean by Artificial Intelligence?

28 A possible model of Bayesian data analysis
- Computer does step 2 (Bayesian inference); homunculus does step 1 (model building) and step 3 (model checking)

29 Summary
[Figure: ESS per gradient versus delta for NUTS]
- Automatic Bayesian inference as good as (or better than) programming it yourself
- Open architecture, active user and developer communities (mc-stan.org)
- Goals:
  - Fit more models
  - Run faster
  - Tools for Bayesian data analysis

Hamiltonian Monte Carlo

Hamiltonian Monte Carlo within Stan
Daniel Lee
Columbia University, Statistics Department
bearlee@alum.mit.edu
BayesComp
mc-stan.org

Why MCMC?
- Have data.
- Have a rich statistical model.
- No analytic solution.