Biostat Methods STAT 5820/6910
Handout #9a: Intro. to Meta-Analysis Methods

Meta-analysis describes a statistical approach to systematically combine results from multiple studies [identified following an exhaustive literature review] that have addressed the same research question.

Why multiple studies? If a question has been clearly settled, it may be unethical to conduct more RCTs. But sometimes RCTs:
- run concurrently
- have inadequate sample size to detect evidence of a treatment effect
- don't get published due to non-significance
- get lost in the literature
- address different sub-populations

An exhaustive literature review [non-trivial!] can often identify similar studies, and systematically combining their results is the meta-analysis objective. RCTs have clear protocols, often requiring such a literature review and research synthesis (meta-analysis) to justify a new RCT.

Meta-Analysis Methods (presented in this handout)
1. Combining p-values
   - Fisher's method
   - Stouffer's method
2. Combining effect sizes
   - Fixed Effects
   - Random Effects
   - Hierarchical Bayes
Approach 1 (simplest & oldest): use only p-values

Fisher's composite testing method
- Combine p_1, ..., p_k from k independent studies with a common H_0.
- Under the null, −2 Σ_{i=1}^k log(p_i) ~ χ²_{2k}.
- Fisher's null: the null in each study is true.
- Fisher's alternative: the null is false in at least one study.
- Fisher's method is known to be highly sensitive to very small (or very large) p-values.

Another way: Stouffer's method (based on a marginal note in the 1949 work The American Soldier)
- Transform each p-value p_i to a standard normal deviate Z_i (assume one-tailed tests): Z_i = Φ⁻¹(1 − p_i).
- Z_S = (Σ_{i=1}^k Z_i) / √k ~ N(0, 1) under the null.
- Focuses on a consensus test of the nulls from multiple studies.
- If the p-values p_1, ..., p_k all correspond to true nulls, their distribution will be Uniform(0, 1), with average near 1/2.
- If some (enough) p-values correspond to (sufficiently) false nulls, their distribution will be shifted toward 0, with average below 1/2.
- Not as sensitive to very small (or very large) p-values.
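The two combination rules above can be sketched directly from their formulas. This is a minimal illustration (function names and the example p-values are made up, not from the handout), assuming k independent one-tailed p-values:

```python
# Fisher's and Stouffer's p-value combination methods (minimal sketch).
import numpy as np
from scipy import stats

def fisher_combine(pvals):
    """Fisher: -2 * sum(log p_i) ~ chi-square with 2k df under the common null."""
    pvals = np.asarray(pvals, dtype=float)
    stat = -2.0 * np.sum(np.log(pvals))
    return stat, stats.chi2.sf(stat, df=2 * len(pvals))

def stouffer_combine(pvals):
    """Stouffer: Z_S = sum(Z_i) / sqrt(k) ~ N(0,1), with Z_i = Phi^{-1}(1 - p_i)."""
    pvals = np.asarray(pvals, dtype=float)
    z = stats.norm.isf(pvals)            # one-tailed: small p -> large Z
    z_s = z.sum() / np.sqrt(len(pvals))
    return z_s, stats.norm.sf(z_s)

pvals = [0.01, 0.20, 0.08, 0.15]         # illustrative values only
print(fisher_combine(pvals))
print(stouffer_combine(pvals))
```

Note how neither study here is individually impressive except the first, yet both combined tests reject at the 5% level; this is the "consensus" behavior described above.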
Effect Sizes: focus on magnitude of treatment effect

Let θ_i be the true effect size (a standardized treatment effect) in study i, estimated by θ̂_i.

Example: Two-sample mean comparison, H_0: μ_2 = μ_1
Define:

    θ_i = (μ_{2,i} − μ_{1,i}) / σ_i        θ̂_i = c_i (Ȳ_{2,i} − Ȳ_{1,i}) / S_{p,i}

where c_i is a bias correction factor such that E[θ̂_i] = θ_i (its exact form involves the Γ function):

    c_i ≈ 1 − 3 / (4(n_1 + n_2) − 9)

This θ̂_i is often referred to as d (or adjusted Hedges' g); it is not the same as Cohen's d.

Example: Difference of proportions, H_0: p_1 = p_0
Let p_j = P{Y = 1 | Trt = j}. We need a useful standardized treatment effect. Summarize each study as a 2×2 table:

              Y = 0   Y = 1
    Trt 1       a       b
    Trt 0       c       d

Consider the treatment effect in terms of the odds ratio:

    OR = [p_1 / (1 − p_1)] / [p_0 / (1 − p_0)]

Estimate this OR from the table counts: ÔR = (b c) / (a d).
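The standardized mean difference above is easy to compute from two-sample summary statistics. A minimal sketch (the function name and the numbers are illustrative, not from the handout), using the approximate correction factor c_i:

```python
# Bias-corrected standardized mean difference (adjusted Hedges' g).
import math

def hedges_g(ybar2, ybar1, s1, s2, n1, n2):
    # Pooled standard deviation S_p with df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = (ybar2 - ybar1) / sp                  # Cohen's d (uncorrected)
    c = 1 - 3 / (4 * (n1 + n2) - 9)           # approximate bias correction c_i
    return c * d                              # theta-hat_i

# Illustrative summary statistics for one study
print(hedges_g(ybar2=10.5, ybar1=9.0, s1=2.0, s2=2.2, n1=20, n2=25))
```

The correction matters most for small studies; here c ≈ 0.982, so the estimate shrinks Cohen's d only slightly.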
But what if a or d is 0? Or b or c, if we switch to the odds of Y = 0? Could add 1/2 to each cell to allow for this and reduce bias:

    θ̂_i = log [ (b_i + 1/2)(c_i + 1/2) / ((a_i + 1/2)(d_i + 1/2)) ]

Other approaches exist, such as the Peto Method (later). Use θ̂ = log of ÔR (possibly adjusted for zero counts); working with the odds ratio on the log scale makes the distribution of θ̂ closer to normal.

Approach 2: Combine effect sizes

Simplest way: Fixed Effects Model (weighted averages)

    θ̂ = Σ w_i θ̂_i / Σ w_i        Var[θ̂] = 1 / Σ w_i

Choose the weights to minimize Var[θ̂]: w_i = 1 / Var[θ̂_i]. If the θ̂_i are independent and normal, then

    (θ̂ − θ) / √Var[θ̂] ~ N(0, 1)

and an approximate 95% CI for θ is θ̂ ± 1.96 √Var[θ̂].

Example: Two-sample mean comparison

    θ̂_i = c_i (Ȳ_{2,i} − Ȳ_{1,i}) / S_{p,i}
    Var[θ̂_i] ≈ c_i² [ (1/n_1 + 1/n_2) + θ̂_i² / (2(n_1 + n_2 − 3.94)) ]

(Derivation of the variance involves the noncentral t distribution.)

Example: Difference of proportions (odds ratio comparison)

    θ̂_i = log(ÔR_i)
    Var[θ̂_i] ≈ 1/(a_i + 1/2) + 1/(b_i + 1/2) + 1/(c_i + 1/2) + 1/(d_i + 1/2)

(Derivation of the variance involves the delta method: Var[g(X)] ≈ (g′(X))² Var[X].)
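Putting the pieces together for the odds-ratio case: a minimal sketch of a fixed-effects combination of log odds ratios across k 2×2 tables, with the 1/2 continuity correction. The function name and the example counts are illustrative; tables are (a, b, c, d) in the handout's layout, so ÔR = bc/(ad):

```python
# Fixed-effects (inverse-variance weighted) combination of log odds ratios.
import math

def fixed_effects_log_or(tables):
    thetas, weights = [], []
    for a, b, c, d in tables:
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5   # continuity correction
        theta = math.log((b * c) / (a * d))               # theta-hat_i = log(OR-hat_i)
        var = 1 / a + 1 / b + 1 / c + 1 / d               # delta-method variance
        thetas.append(theta)
        weights.append(1.0 / var)                         # w_i = 1 / Var[theta-hat_i]
    w_sum = sum(weights)
    theta_hat = sum(w * t for w, t in zip(weights, thetas)) / w_sum
    se = math.sqrt(1.0 / w_sum)                           # Var[theta-hat] = 1 / sum(w_i)
    ci = (theta_hat - 1.96 * se, theta_hat + 1.96 * se)   # approximate 95% CI
    return theta_hat, se, ci

tables = [(20, 10, 15, 18), (30, 12, 25, 20)]             # illustrative counts only
print(fixed_effects_log_or(tables))
```

Exponentiating θ̂ and the CI endpoints returns the answer to the odds-ratio scale.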
Example (Steroid therapy). Look at this fixed effects model:

    θ̂_i = θ_i + ε_i = θ + ε_i,    ε_i ~ N(0, σ_i²)

This is the homogeneity assumption: all studies examined and provided estimates of the same parameter θ, and any differences between the estimates are attributable to sampling error ε alone. The weights are w_i = 1/σ̂_i² = (Var[θ̂_i])⁻¹.

Test H_0: θ_1 = ... = θ_k using

    Q = Σ w_i (θ̂_i − θ̂)² ~ χ²_{k−1}

In practice, this test has low power, so even if it is not significant, we may not be able to safely assume homogeneity. Instead, allow for slight (& unaccountable) differences among study results: a random effects model.

The Random Effects Model:

    θ̂_i = θ_i + ε_i = θ + δ_i + ε_i,    ε_i ~ N(0, σ_i²),    δ_i ~ N(0, τ²)

δ_i is the between-study random effect. The test of heterogeneity (above) is equivalent to testing H_0: τ² = 0. Estimate τ² and proceed as before, with weights

    w_i = 1 / (σ̂_i² + τ̂²) = (Var[θ̂_i])⁻¹
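The homogeneity test is a one-liner once the weights are in hand. A minimal sketch (function name and the example estimates/variances are illustrative, not the steroid-therapy data), assuming we have θ̂_i and σ̂_i² from k studies:

```python
# Cochran's Q test of homogeneity across k studies.
import numpy as np
from scipy import stats

def cochran_q(theta, var):
    theta = np.asarray(theta, dtype=float)
    var = np.asarray(var, dtype=float)
    w = 1.0 / var                                 # fixed-effects weights w_i
    theta_bar = np.sum(w * theta) / np.sum(w)     # weighted average theta-hat
    q = np.sum(w * (theta - theta_bar) ** 2)      # Q statistic
    p = stats.chi2.sf(q, df=len(theta) - 1)       # Q ~ chi-square(k-1) under H0
    return q, p

theta = [0.30, 0.10, 0.45, 0.25]                  # illustrative effect estimates
var = [0.04, 0.05, 0.03, 0.06]                    # illustrative variances
print(cochran_q(theta, var))
```

With these four studies Q ≈ 1.6 on 3 df (p ≈ 0.66): far from significant, yet with k = 4 the test has little power, which is exactly the caveat above.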
DerSimonian-Laird approach to estimating τ²: the method of moments (uses the quantity Q above)

    Q = Σ w_i (θ̂_i − θ̂)²

where the w_i are the fixed-effects weights 1/σ̂_i². Then

    E[Q] = τ² (Σ w_i − Σ w_i² / Σ w_i) + (k − 1)

(get this from the expected value of a quadratic form), so solving for τ² gives

    τ² = (E[Q] − (k − 1)) / (Σ w_i − Σ w_i² / Σ w_i)

    τ̂² = max{ (Q − (k − 1)) / (Σ w_i − Σ w_i² / Σ w_i), 0 }

A third model class is becoming more common: the Hierarchical Bayes Model

    θ̂_i = θ_i + ε_i = θ + δ_i + ε_i,    ε_i ~ N(0, σ_i²),    δ_i ~ N(0, τ²),    τ ~ π(τ)

This model is particularly powerful when also accounting for dependence among study results (R package metahdep).

In all three models (Fixed, Random, Hierarchical Bayes), we can also account for covariates (fundamental differences between studies), coded as numeric predictor variables. Let X_{i,l} = predictor variable l in study i, l = 1, ..., j; then

    θ_i = β_0 + β_1 X_{i,1} + β_2 X_{i,2} + ... + β_j X_{i,j}
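The DerSimonian-Laird estimate and the resulting random-effects average follow directly from the moment equation above. A minimal sketch (function name and the input values are illustrative, not from the handout), assuming θ̂_i and σ̂_i² from k studies:

```python
# DerSimonian-Laird method-of-moments estimate of tau^2, then the
# random-effects weighted average with weights 1 / (sigma_i^2 + tau^2).
import numpy as np

def dersimonian_laird(theta, var):
    theta = np.asarray(theta, dtype=float)
    var = np.asarray(var, dtype=float)
    w = 1.0 / var                                       # fixed-effects weights
    theta_fe = np.sum(w * theta) / np.sum(w)
    q = np.sum(w * (theta - theta_fe) ** 2)             # Cochran's Q
    k = len(theta)
    denom = np.sum(w) - np.sum(w ** 2) / np.sum(w)      # sum(w) - sum(w^2)/sum(w)
    tau2 = max((q - (k - 1)) / denom, 0.0)              # truncate negative values at 0
    w_re = 1.0 / (var + tau2)                           # random-effects weights
    theta_re = np.sum(w_re * theta) / np.sum(w_re)
    se_re = np.sqrt(1.0 / np.sum(w_re))
    return tau2, theta_re, se_re

theta = [0.80, 0.15, 0.60, -0.10, 0.45]                 # illustrative estimates
var = [0.05, 0.08, 0.04, 0.09, 0.06]                    # illustrative variances
print(dersimonian_laird(theta, var))
```

Note that τ̂² > 0 here inflates every study's variance by the same amount, so the random-effects weights are more nearly equal than the fixed-effects weights and the CI is wider.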