Introduction to BART with binary outcomes. Rodney Sparapani Medical College of Wisconsin Copyright (c) 2017 Rodney Sparapani

Size: px

Start display at page:

Johnathan Gibson
6 years ago
Views:

1 Introduction to BART with binary outcomes Rodney Sparapani Medical College of Wisconsin Copyright (c) 2017 Rodney Sparapani September 30: BART Bootcamp Biostatistics in the Modern Computing Era Medical College of Wisconsin, Milwaukee campus Funding for this research was provided, in part, by the Advancing Healthier Wisconsin Research and Education Program under awards and

2 Outline Motivation: chronic spine pain and obesity Probit regression with BART Logistic regression with BART Geweke convergence diagnostics for binary BART

3 Motivation: chronic spine pain and obesity We believe that obesity is a risk factor for chronic lower back/buttock pain Conversely, obesity is not a risk factor for chronic neck pain Data available from the National Health and Nutrition Examination Survey (NHANES) Arthritis Questionnaire 5106 subjects were surveyed Demographics: age and gender Anthropometrics available: weight (kg), height (cm), body mass index (kg/m 2 ), waist circumference (cm) Sampling weights to estimate for the US as a whole

4 Probit BART for binary outcomes Probit regression with latent variables: Albert & Chib 1993 JASA y i p i ind B(p i ) p i f = Φ(µ + f(x i )) where f prior BART { I(, 0) if y i = 0 z i y i, f N(f(x i ), 1) I(0, ) if y i = 1 f z i, y i d = f zi [y f] = N i=1 p yi i (1 p i) 1 yi Likelihood Continuous BART with unit variance, σ 2 = 1, and z i are the data

5 Friedman s partial dependence function Friedman 2001 AnnStat f(x) = f(x S, x C ) BART function where x = [x S, x C ] f(x S ) = E xc [f(x S, x C ) x S ] N 1 i f(x S, x ic ) f m (x S ) N 1 i f m (x S, x ic ) ˆf(x S ) M 1 m f m (x S )

6 pbart and mc.pbart input and output post <- pbart(x.train, y.train,..., ndpost=m, keepevery=1) or post <- mc.pbart(x.train, y.train,..., ndpost=m, keepevery=1, mc.cores=2, seed=99) Input matrices: x.train and, optionally, x.test: x i x 1 x 2.. x N Output object, post, of type pbart which is essentially a list Matrices: post$yhat.train and post$yhat.test: ŷ im = f m (x i ) ŷ ŷ N1... ŷ 1M... ŷ NM

7 predict.pbart input and output pred <- predict(post, x.test, mc.cores=1,...) Input matrices: x.test: x i x 1 x 2.. x Q Output matrix pred: ŷ im = f m (x i ) ŷ ŷ Q1... ŷ 1M... ŷ QM

8 Live demo: chronic spine pain and obesity system.file( demo/nhanes.pbart1.r, package= BART ) system.file( demo/nhanes.pbart2.r, package= BART )

9 Friedman s partial dependence function: Probability of chronic neck pain vs. BMI Φ(f(x)) BMI Unweighted NHANES chronic neck pain: M(blue) vs. F(red)

10 Friedman s partial dependence function: Probability of chronic lower back/buttock pain vs. BMI Φ(f(x)) BMI Unweighted NHANES chronic low back/buttock pain: M(blue) vs. F(red)

11 Logistic BART for binary outcomes Logistic regression: Holmes & Held 1993 Bayesian Analysis Devroye 1986 Non-uniform random variate generation y i p i ind B(p i ) p i f = Φ(f(x i )) where f prior BART z i y i, f N ( f(x i ), σi 2 ) { I(, 0) if y i = 0 I(0, ) if y i = 1 σ 2 i = 4ψ 2 i where ψ i Kolmogorov-Smirnov (see Devroye) Continuous BART with heteroskedastic variance and z i is the data

12 Geweke convergence diagnostics for binary BART Hastings 1970 Biometrika, Silverman 1986 Chapman and Hall ˆθ M = M 1 M θ m m=1 Bayesian estimator σ 2ˆθ = ] [ˆθ lim V M M Asymptotic variance Suppose θ m is an ARMA (p, q) γ(w) = (2π) 1 m= V [θ 0, θ m ] e imw Spectral density ˆσ 2ˆθ = ˆγ 2 (0) Variance estimator

13 Geweke convergence diagnostics for binary BART Geweke 1992 Bayesian Statistics Divide your chain into two segments: A and B m A = {1,..., M A } where M A = am m B = {M M B + 1,..., M} where M B = bm a + b < 1, Geweke suggests a = 0.1 and b = 0.5 ˆθ A = M 1 A m A θ m ˆθ B = M 1 B m B θ m ˆσ 2ˆθ A = ˆγ 2 m A (0) ˆσ2ˆθ B = ˆγ 2 m B (0) z = M(ˆθ A ˆθ B ) a 1ˆσ 2ˆθA + b 1ˆσ 2ˆθB N(0, 1)

14 Geweke convergence diagnostics for binary BART We have a z i corresponding to each θ i = h(f(x i )) In the BART R package, we created the gewekediag function which was adapted from the coda R package Plummer, Best et al system.file( demo/geweke.pbart2.r, package= BART ) & system.file( demo/geweke.pbart3.r, package= BART )

15 Geweke convergence diagnostics for binary BART: simulated data scenario system.file( demo/geweke.pbart2.r, package= BART ) & system.file( demo/geweke.pbart3.r, package= BART ) N = 100, 1000, K = 50 f(x i ) = sin(πx 1i x 2i ) + 2(x 3i 0.5) 2 + x x 5 z i N(f(x i ), 1) y i = I(z i > 0)

16 Geweke convergence diagnostics for binary BART: N = 100 partial dependence function acf x lag Φ(f(x)) z m N:100, k: i N:100, k:50

17 Geweke convergence diagnostics for binary BART: N = 1000 partial dependence function acf x lag Φ(f(x)) z m N:1000, k: i N:1000, k:50

18 Geweke convergence diagnostics for binary BART: N = partial dependence function acf x lag Φ(f(x)) z m N:10000, k: i N:10000, k:50

19 Geweke convergence diagnostics for binary BART: convergence countermeasures for N = system.file( demo/geweke.pbart3.r, package= BART ) Validation set is a 10% random sample of the training set Randomly partition the training set into 10 sets of N = 1000 Analyze the 10 training sets independently with BART Combine/stack the 10 validation set MCMC chains Perform convergence diagnostics and inference on the combined validation chains

20 Geweke convergence diagnostics for binary BART: N = without countermeasures partial dependence function acf x lag Φ(f(x)) z m N:10000, k: i N:10000, k:50

21 Geweke convergence diagnostics for binary BART: N = with countermeasures partial dependence function acf x lag Φ(f(x)) z m N:10000, k: i N:10000, k:50

22 Geweke convergence diagnostics for binary BART: N = 1000 without countermeasures partial dependence function acf x lag Φ(f(x)) z m N:1000, k: i N:1000, k:50

Bayesian Multivariate Logistic Regression

Bayesian Multivariate Logistic Regression Sean M. O Brien and David B. Dunson Biostatistics Branch National Institute of Environmental Health Sciences Research Triangle Park, NC 1 Goals Brief review of