Bayesian Methods for Highly Correlated Exposures: An Application to Disinfection By-products and Spontaneous Abortion
November 8, 2007
Outline
1 Introduction
2 Models
3 Simulations: Hierarchical Models and RFTS
4 Future/Current Research
DBPs and SAB - Right From the Start
Spontaneous Abortion (SAB)
- Pregnancy loss prior to 20 weeks gestation
- Very common (> 30% of all pregnancies)
- Relatively little known about its causes: maternal age, smoking, prior pregnancy loss, occupational exposures, caffeine
- Disinfection by-products (DBPs)?
Disinfection By-Products
- A vast array of DBPs are formed in the disinfection process
- We focus on 2 main types:
  - Trihalomethanes (THMs): CHCl3, CHBr3, CHCl2Br, CHClBr2
  - Haloacetic Acids (HAAs): ClAA, Cl2AA, Cl3AA, BrAA, Br2AA, Br3AA, BrClAA, Br2ClAA, BrCl2AA
DBPs and SABs
Early Studies
- Noted an increased risk of SAB with increased tap-water consumption
More Recent Studies
- Increased risk of SAB with exposure to THMs
- Notably, CHBrCl2 in Waller et al. (1998): OR = 2.0 (1.2, 3.5)
Specific Aim
- To estimate the effect of each of the 13 constituent DBPs (4 THMs and 9 HAAs) on SAB
The Problem
- DBPs are very highly correlated, e.g., ρ = 0.91 between Cl2AA and Cl3AA
RFTS - briefly
- 2507 women enrolled in three metropolitan areas in the U.S.
- Years: 2001-2004
- Recruitment: prenatal care practices (52%), health departments (32%), promotional mailings (3%), drug stores, referrals, etc. (13%)
Eligibility criteria
- 18 years of age or older
- Lived in an area served by 1 of the water utilities
- Not using assisted reproductive technology
- Positive pregnancy test
- Intended to carry to term
- Intended to remain in the area
Data Collection
Baseline Interview
- Demographic information, medical history, other confounders
Pregnancy loss
- Self report or chart abstraction
DBP concentration
- Disinfecting utilities
- Weekly samples at the two sites with high DBPs
- Every other week at the third site with low DBPs
Outline
2 Models
- Model P1 (semi-Bayes)
- Model P2 (fully Bayes)
- Dirichlet process prior model (SP1)
- DPP with selection component (SP2)
Preliminary Analysis
- Discrete time hazard model including all 13 DBPs
- Time to event: gestational weeks until loss
- DBP concentrations were measured weekly and included as time-varying covariates
- Allow for non-linear relationships (crudely) by categorizing the DBPs (now 32 coefficients)
Hazard Model
logit{pr(T_i = j | T_i >= j, .)} = α_j + γ_1 z_1i + ... + γ_p z_pi + β_1 x_1ij + ... + β_32 x_32ij
where
- the α_j are week-specific intercepts (weeks 5, ..., 20)
- z_1i, ..., z_pi are confounders: smoking, alcohol use, ethnicity, maternal age
- x_kij is the concentration of the k-th category of DBP for the i-th individual in the j-th week
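As a small illustrative sketch (the coefficient and covariate values below are hypothetical, not RFTS estimates), the week-specific hazard implied by this model is just the inverse logit of the linear predictor:

```python
import math

def discrete_hazard(alpha_j, gamma, z, beta, x_j):
    """Hazard of loss in gestational week j, given survival to week j.

    alpha_j : week-specific intercept
    gamma, z : confounder coefficients and values (smoking, alcohol, ...)
    beta, x_j : DBP-category coefficients and week-j concentrations
    """
    eta = alpha_j \
        + sum(g * zi for g, zi in zip(gamma, z)) \
        + sum(b * xi for b, xi in zip(beta, x_j))
    return 1.0 / (1.0 + math.exp(-eta))  # inverse logit

# With every term zero, the hazard is 0.5 (eta = 0); positive coefficients
# on positive exposures push the hazard above 0.5.
h_null = discrete_hazard(0.0, [], [], [], [])
h_exposed = discrete_hazard(-3.0, [0.2], [1.0], [0.5], [2.0])
```

In the actual analysis each subject contributes one such record per week survived, which is what makes weekly DBP concentrations usable as time-varying covariates.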
Results of frequentist analysis
- Several large but imprecise effects are seen
- 4 of 32 coefficients are statistically significant
- Imprecision makes us question the results
- Is there a better analytic approach?
Other common options
- Try all exposures in one model. Problem: unreliable estimates
- Combine variables in aggregate scores. Problem: difficult to interpret, can mask effects
- Analyze one variable at a time. Problem: uncontrolled confounding
Alternative Approaches
Bayesian Parametric Models
- Semi-Bayes (model P1)
- Fully Bayes (model P2)
Bayesian Semi-Parametric Models
- Dirichlet process priors (model SP1)
- Dirichlet process models with selection component (model SP2)
Model P1
- A simple two-level hierarchical model popularized by Greenland
- Has seen use in nutritional, genetic, occupational, and cancer epidemiology
- Despite the name, semi-Bayes models are Bayesian models; the name may refer to the asymptotic methods commonly used in fitting them
Model
y_i ~ N(x_i'β, σ²)
β_j ~ N(β_0, φ²)
Posterior
β | y ~ N(Ê, V̂)
V̂ = (X'X/σ² + I/φ²)⁻¹
Ê = V̂ (X'y/σ² + β_0/φ²)
Note: y_i could be dichotomous; we could then use a data augmentation scheme to impute a normal continuous latent variable y_i*
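A minimal numerical sketch of this closed-form posterior, on simulated stand-in data (not the RFTS data), with σ² and φ² treated as known:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in data: n observations, p exposures
n, p = 200, 5
X = rng.normal(size=(n, p))
beta_true = np.array([0.5, 0.5, 0.0, -0.5, 0.0])
sigma2, phi2, beta0 = 1.0, 0.314, 0.0
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n)

# Semi-Bayes posterior with sigma^2 and phi^2 fixed:
#   V = (X'X/sigma^2 + I/phi^2)^{-1},  E = V (X'y/sigma^2 + beta0/phi^2)
V = np.linalg.inv(X.T @ X / sigma2 + np.eye(p) / phi2)
E = V @ (X.T @ y / sigma2 + np.full(p, beta0) / phi2)

# Compare with the unshrunken OLS fit: the posterior mean is pulled
# toward the prior mean beta0 = 0 (ridge-style shrinkage)
ols = np.linalg.solve(X.T @ X, X.T @ y)
```

With β_0 = 0 this is exactly ridge regression with penalty σ²/φ², which is why the semi-Bayes estimates are shrunken versions of the frequentist ones.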
Shrinkage
- Bayesian view: a natural consequence of combining the prior with the data
- Frequentist view: introduce bias to reduce MSE (biased but more precise)
- The amount of shrinkage depends on the prior variance
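The frequentist bias-variance reading of shrinkage can be checked with a tiny simulation (illustrative values only: a modest true effect, a noisy unbiased estimator, and an arbitrary shrinkage weight of 0.5 toward the prior mean of 0):

```python
import random

random.seed(1)
mu_true, sd = 0.3, 1.0   # modest true effect, noisy estimator
w = 0.5                  # shrinkage weight toward the prior mean 0

n_sim = 20000
mse_raw, mse_shrunk = 0.0, 0.0
for _ in range(n_sim):
    est = random.gauss(mu_true, sd)   # unbiased "frequentist" estimate
    shrunk = w * est                  # biased toward 0, lower variance
    mse_raw += (est - mu_true) ** 2
    mse_shrunk += (shrunk - mu_true) ** 2
mse_raw /= n_sim
mse_shrunk /= n_sim
# The shrunken estimator trades a little bias for a large variance
# reduction, so its MSE is smaller when the true effect is modest.
```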
Shrinkage in model P1 (figure)
Problems with model P1
Hypothesis testing
- Semi-Bayes models have been advocated as a way to reduce problems of multiple comparisons
- Unfortunately, the reduction in type I error rate is typically small
Type I error rate with model P1 (figure)
More Problems with model P1
- Assumes the prior variance, φ², is known with certainty, giving constant shrinkage of all coefficients
- Sensitivity analyses address how results change under different prior variances
- But the data themselves contain information on the prior variance
Model P2
- Places a prior distribution on φ², reducing dependence on the prior variance
Model Specification
y_i ~ N(x_i'β, σ²)
β ~ N(β_0, φ²I)
φ² ~ IG(α_1, α_2)
- Could place a prior on µ in some instances as well
What's an inverse-gamma distribution? (figure)
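The slide presumably displayed the density; for reference, under the common shape-scale parametrization (conventions vary, so the hyperparameter values quoted later in the talk depend on the convention the authors adopted):

```latex
f(\phi^2 \mid \alpha_1, \alpha_2)
  = \frac{\alpha_2^{\alpha_1}}{\Gamma(\alpha_1)}
    \, (\phi^2)^{-\alpha_1 - 1}
    \exp\!\left(-\frac{\alpha_2}{\phi^2}\right),
  \qquad \phi^2 > 0,
```

with mean α_2/(α_1 − 1) for α_1 > 1 and variance α_2²/((α_1 − 1)²(α_1 − 2)) for α_1 > 2. Equivalently, φ² ~ IG(α_1, α_2) iff 1/φ² ~ Gamma(α_1, rate α_2).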
Properties of model P2
- The prior distribution on φ² allows it to be updated by the data
- As the variability of estimates around the prior mean increases, so does φ²
- As the variability of estimates around the prior mean decreases, so does φ²
- Result: adaptive shrinkage of all coefficients
Model Specification
y_i ~ N(x_i'β, σ²)
β ~ N(β_0, φ²I)
φ² ~ IG(α_1, α_2)
Conditional Posteriors
β | y, φ² ~ N(Ê, V̂)
φ² | β ~ IG(α_1 + p/2, α_2 + (β − β_0)'(β − β_0)/2)
V̂ = (X'X/σ² + I/φ²)⁻¹
Ê = V̂ (X'y/σ² + β_0/φ²)
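A compact Gibbs sampler alternating these two conditionals can be sketched as follows. The data are simulated stand-ins (not RFTS), and the hyperparameter values are those quoted later in the talk, interpreted under the standard shape-scale IG convention:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated stand-in data
n, p = 300, 8
X = rng.normal(size=(n, p))
beta_true = np.concatenate([np.full(4, 0.5), np.zeros(4)])
sigma2 = 1.0
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n)

beta0, a1, a2 = 0.0, 3.39, 1.33
phi2 = a2 / (a1 - 1)          # start phi^2 at its (conventional) prior mean

draws = []
for it in range(2000):
    # beta | y, phi^2 ~ N(E, V),  V = (X'X/sigma^2 + I/phi^2)^{-1}
    V = np.linalg.inv(X.T @ X / sigma2 + np.eye(p) / phi2)
    E = V @ (X.T @ y / sigma2 + np.full(p, beta0) / phi2)
    beta = rng.multivariate_normal(E, V)
    # phi^2 | beta ~ IG(a1 + p/2, a2 + (beta - beta0)'(beta - beta0)/2)
    resid = beta - beta0
    phi2 = 1.0 / rng.gamma(a1 + p / 2, 1.0 / (a2 + resid @ resid / 2))
    if it >= 500:              # discard burn-in
        draws.append(phi2)
post_mean_phi2 = float(np.mean(draws))
```

The inverse-gamma draw uses the fact that 1/φ² is Gamma-distributed; NumPy's `gamma` takes a scale, hence the reciprocal of the rate.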
Adaptive Shrinkage of model P2

Model | Prior variance | φ² given data | Shrinkage
P1    | Fixed          | Constant      | Constant
P2    | Random         | Updated       | Adaptive
The Problem with Model P2
- How sure are we of our parametric specification of the prior?
- Can we do better by grouping coefficients into clusters and then shrinking the cluster-specific coefficients separately?
- The amount of shrinkage would then vary by coefficient
Clustering Coefficients (figure)
DPP - Model SP1
- A popular Bayesian non-parametric approach
- Rather than specifying β_j ~ N(µ, φ²), we specify β_j ~ D
- D is an unknown distribution, so D itself needs a prior: D ~ DP(λ, D_0)
- D_0 is a base distribution, such as N(µ, φ²)
- λ is a precision parameter; as λ gets large, D converges to D_0
Formally: take a sample space Ω with the support of D_0 and chop Ω into disjoint Borel sets B_1, B_2, ..., B_J. Then the prior D ~ DP(λ, D_0) implies
(D(B_1), D(B_2), ..., D(B_J)) ~ Dirichlet(λD_0(B_1), λD_0(B_2), ..., λD_0(B_J))
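One way to see what a draw D ~ DP(λ, D_0) looks like is the stick-breaking construction (a standard representation, not shown in the talk): D is almost surely a discrete mixture of point masses whose weights come from repeatedly breaking a unit-length stick. A truncated sketch:

```python
import random

random.seed(7)

def stick_breaking_weights(lam, eps=1e-6):
    """Truncated stick-breaking weights for DP(lam, D0):
    w_k = v_k * prod_{l<k} (1 - v_l), with v_k ~ Beta(1, lam).
    Stops once the unassigned stick length falls below eps."""
    weights, remaining = [], 1.0
    while remaining > eps:
        v = random.betavariate(1.0, lam)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights

w_small = stick_breaking_weights(0.5)   # small lam: a few dominant atoms
w_large = stick_breaking_weights(50.0)  # large lam: mass spread over many atoms
```

Pairing each weight with an atom drawn from D_0 gives a realization of D; small λ concentrates mass on a handful of atoms, which is exactly what drives the clustering of coefficients.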
Random Sample from a DP (figure)
Dirichlet Process Priors
β_j ~ D, D ~ DP(λ, D_0), D_0 = N(µ, φ²)
This prior implies:
β_j | β_(−j) ~ (λ/(λ + k − 1)) D_0 + (1/(λ + k − 1)) Σ_{i≠j} δ_{β_i}
- β_j has a positive probability of being clustered with any other coefficient
- Expected number of clusters (asymptotically): λ log(1 + n/λ)
- Larger values of λ indicate more certainty about the distribution of β_j (more clusters)
- Smaller values of λ indicate less certainty about the distribution of β_j (fewer clusters)
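The expected-cluster formula can be checked by simulating the urn scheme implied by the conditional above: draw j starts a new cluster with probability λ/(λ + j − 1), else joins an existing one. The λ and n below are hypothetical illustration values:

```python
import math
import random

random.seed(3)

def polya_urn_clusters(n, lam):
    """Number of distinct values among n sequential draws: draw j is
    new with probability lam / (lam + j - 1)."""
    clusters = 0
    for j in range(1, n + 1):
        if random.random() < lam / (lam + j - 1):
            clusters += 1
    return clusters

n, lam = 1000, 5.0
sims = [polya_urn_clusters(n, lam) for _ in range(500)]
avg = sum(sims) / len(sims)
approx = lam * math.log(1 + n / lam)   # asymptotic expectation from the slide
# avg should land close to approx (~26.5 here)
```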
Dirichlet Process Prior (figure)
Clustering
- Consider two parameters, β_m and β_n, that are equal with probability p_mn
- If β_m = β_n (p_mn = 1), we can use both x_im and x_in to estimate the common parameter β_mn: twice as much data to estimate the parameter of interest
- More commonly p_mn < 1, so β_m adds some information when estimating β_n (and vice versa)
- Part of the reason this method performs so well is that estimating the probability of clustering is very cheap, but the payoff is potentially huge (think of degrees of freedom)
Posterior computation
y_i ~ N(α + x_i'β, σ²)
β_j ~ D; D ~ DP(λ, D_0); D_0 = N(µ, φ²)
Conditional Posterior
β_j | β_(−j), y ~ w_0j D̂_j0 + Σ_{k≠j} w_kj δ_{β_k}
w_0j ∝ λ ∫ N(y* | X_j β_j, σ²) N(β_j | µ, φ²) dβ_j
w_kj ∝ N(y* | X_j β_k, σ²)
D̂_j0 ∝ N(y* | X_j β_j, σ²) N(β_j | µ, φ²)
y*_i = y_i − α − x_i^(j)'β^(j), the residual with predictor j excluded
Model SP2
- A minor modification to the Dirichlet process prior model
- We may desire a more parsimonious model
- If some DBPs have no effect, we would prefer to eliminate them from the model
- Forward/backward selection results in inappropriate confidence intervals
We incorporate a selection model in the Dirichlet process base distribution:
D_0 = π δ_0 + (1 − π) N(µ, φ²)
- π is the probability that a coefficient has no effect
- (1 − π) is the probability that it is drawn from N(µ, φ²)
- A coefficient is equal to zero (no effect) with probability π
- A priori, we expect this to happen (π × 100)% of the time
- We place a prior distribution on π to allow the data to guide inference
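A quick sketch of draws from this spike-and-slab base distribution (the slab scale phi below is an assumed illustrative value, sqrt(0.314) from the prior-variance calculation elsewhere in the talk):

```python
import random

random.seed(11)

def draw_from_base(pi, mu=0.0, phi=0.56):
    """One draw from D0 = pi * delta_0 + (1 - pi) * N(mu, phi^2):
    with probability pi the coefficient is exactly zero (no effect),
    otherwise it is drawn from the normal slab."""
    if random.random() < pi:
        return 0.0
    return random.gauss(mu, phi)

pi = 0.5
draws = [draw_from_base(pi) for _ in range(10000)]
frac_zero = sum(1 for b in draws if b == 0.0) / len(draws)
# frac_zero should be near pi = 0.5
```

Because the DP makes D discrete, coefficients can cluster exactly at the atom δ_0, which is what performs the variable selection.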
DPP with variable selection (figure)
Outline
3 Simulations: Hierarchical Models and RFTS
Simulations
- Four hierarchical models: how do they compare?
- The increased complexity of these hierarchical models seems to make sense, but what does it gain us?
- Simulated datasets of size n = 500
MSE of Hierarchical Models (figure)
Model P1
logit{pr(t_i = j | t_i >= j, .)} = α_j + β_1 x_1i + ... + β_k x_ki
β_j ~ N(µ, φ²)
- Little prior evidence of effect: specify µ = 0
- Calculate φ² from the existing literature
- Largest observed effect: OR = 3.0
- φ² = ((ln(3) − ln(1/3))/(2 × 1.96))² = 0.314
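This prior-variance calculation can be reproduced directly: placing 95% prior probability on odds ratios between 1/3 and 3 means the log-OR interval spans about four prior standard deviations, and squaring the resulting SD recovers the quoted 0.314:

```python
import math

# Effects assumed to lie between OR = 1/3 and OR = 3 with 95% probability,
# centered at the null (mu = 0)
upper, lower = math.log(3.0), math.log(1.0 / 3.0)
phi = (upper - lower) / (2 * 1.96)   # interval half-width covers ~2 prior SDs
phi2 = phi ** 2
# phi2 ~= 0.314, the prior variance used in models P1 and P2
```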
Model P1 - results (figure)
Model P2
logit{pr(t_i = j | t_i >= j, .)} = α_j + β_1 x_1i + ... + β_k x_ki
β_j ~ N(µ, φ²)
φ² ~ IG(α_1, α_2)
- µ = 0
- φ² is random; choose α_1 = 3.39, α_2 = 1.33
- E(φ²) = 0.31 (as in model P1)
- V(φ²) = 0.07 (at the 95th percentile of φ², 95% of the β's will fall between OR = 6 and OR = 1/6, the most extreme results we believe possible)
Model P2 - results (figure)
Model SP1
logit{pr(t_i = j | t_i >= j, .)} = α_j + β_1 x_1i + ... + β_k x_ki
β_j ~ D
D ~ DP(λ, D_0)
D_0 = N(µ, φ²)
λ ~ G(ν_1, ν_2)
φ² ~ IG(α_1, α_2)
- µ = 0, α_1 = 3.39, α_2 = 1.33
- ν_1 = 1, ν_2 = 1, an uninformative prior for λ
Model SP1 - results (figure)
Model SP2
logit{pr(t_i = j | t_i >= j, .)} = α_j + β_1 x_1i + ... + β_k x_ki
β_j ~ D
D ~ DP(λ, D_0)
D_0 = π δ_0 + (1 − π) N(µ, φ²)
λ ~ G(ν_1, ν_2)
φ² ~ IG(α_1, α_2)
π ~ beta(ω_1, ω_2)
- µ = 0, α_1 = 3.39, α_2 = 1.33, ν_1 = 1, ν_2 = 1
- ω_1 = 1.5, ω_2 = 1.5, so E(π) = 0.5 and 95% CI = (0.01, 0.99)
Model SP2 - results (figure)
Outline
4 Future/Current Research
Hierarchical Models
- Semi-Bayes: assumes β is random
- Fully Bayes: assumes φ² is also random
- Dirichlet process: assumes the prior distribution itself is random
- Dirichlet process with selection component: assumes the prior distribution is random and allows coefficients to cluster at the null
- Performance (MSE) can improve with increasing complexity
Hierarchical Models
- Trade-off between performance and difficulty
- Semi- and fully Bayes models are easily implemented in WinBUGS (exact distribution)
- Dirichlet process priors require more programming skill, but often have better MSE
DBPs and SAB
- Model P1 provided the least shrinkage; the Dirichlet process models, the most
- These results are in contrast to previous research (sort of)
- Very little evidence of an effect of any constituent DBP on SAB
Dimension reduction in genomics
- A big problem in genotyping research: P >> N
- Typical frequentist models won't work in this situation
- Analysts generally rely on an FDR approach: a trade-off between the per-comparison error rate and the family-wise error rate... but you still need a model to generate your p-values
- We cluster SNP effects to reduce dimension and use a double exponential prior in our base distribution
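As a sketch of the FDR approach mentioned above, the standard Benjamini-Hochberg step-up procedure can be written in a few lines (the p-values below are made up for illustration):

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up FDR procedure: reject the k smallest
    p-values, where k is the largest rank with p_(k) <= (k/m) * q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k = rank
    rejected = set(order[:k])
    return [i in rejected for i in range(m)]

pvals = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
flags = benjamini_hochberg(pvals, q=0.05)
```

Note the step-up behavior: a p-value can be rejected even though it exceeds its own threshold, as long as a larger-ranked p-value passes. This controls the FDR rather than the family-wise error rate, but, as the slide notes, it still presupposes a model for the p-values.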
DP for dimension reduction (figure)
Non-parametric functional data analysis
- Functional data are common in epidemiology
- Use a DP prior to fit flexible random curves to the data
- Individual curves may be clustered over all or part of the curve to aid in inference and prediction
- We use a hierarchical DP prior to allow global or local clustering of curves
Functional data analysis (figure)
The End