EPSE 594: Meta-Analysis: Quantitative Research Synthesis
Ed Kroc
University of British Columbia
ed.kroc@ubc.ca
January 24, 2019
Last time
- Composite effect sizes: inverse variance weighting
- The basic meta-analysis models: fixed effect; random effect
- Introduction to the homogeneity assumption
Today
- Review of fixed vs. random effect models
- Quantifying heterogeneity
- Prediction
- Introduction to meta-regression (subgroup analysis) [mixed effects models]
Fixed vs. random effects

Big difference between:
- an intervention that consistently increases, say, retention of course materials by 40% [fixed effect], and
- an intervention that increases retention by 40% on average, with a treatment effect ranging between 10% and 70% depending on the study/subpopulation [random effect].

Fixed effects models:
- generally take less data to produce precise estimates...
- ...but rely on stronger assumptions, so are more prone to spurious conclusions;
- weight larger studies more heavily.

Random effects models:
- more general, more conservative, usually more realistic...
- ...but generally require more data to produce precise estimates;
- weight all studies more equally, since we assume more variance in the model.
Fixed effects model

The basic fixed effects model:

  θ̂_k = θ + ε_k,

where θ̂_k is the estimated effect size from study k, θ is the fixed, true effect in the population, and ε_k ~ N(0, σ_k²) captures the sampling error from study k, with unknown variance σ_k².

Given our estimates (θ̂_k, σ̂_k²) of (θ_k, σ_k²), k = 1, ..., N, we can derive the maximum-likelihood estimate for the population effect θ to get:

  θ̂_FE = (Σ_{k=1}^N w_k θ̂_k) / (Σ_{k=1}^N w_k),   Var̂(θ̂_FE) ≈ 1 / (Σ_{k=1}^N w_k),

where w_k = 1/σ̂_k².
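As a concrete sketch, the fixed-effect estimate above can be computed directly from the study-level estimates and their estimated sampling variances. The effect sizes and variances below are made-up numbers, purely for illustration:

```python
import numpy as np

# Hypothetical effect-size estimates and sampling variances from N = 4 studies
theta_hat = np.array([0.30, 0.45, 0.20, 0.55])
var_hat   = np.array([0.04, 0.02, 0.05, 0.03])

w = 1.0 / var_hat                             # inverse-variance weights w_k
theta_fe = np.sum(w * theta_hat) / np.sum(w)  # weighted-average estimate of theta
var_fe   = 1.0 / np.sum(w)                    # approximate variance of theta_fe

print(theta_fe, var_fe)
```

Note that the most precise study (study 2, variance 0.02) pulls the pooled estimate toward its value, exactly as the inverse-variance weighting prescribes.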
Fixed effects model

A pictorial representation of the basic fixed effects model: θ̂_k = θ + ε_k. Note that in this graphic, v_k = σ_k².
Fixed effects model

The fixed effects model relies on the homogeneity assumption:

  θ_1 = ... = θ_N = θ,

where θ denotes the common population effect size.

We saw one way to test this assumption: via Cochran's Q statistic (an asymptotic chi-squared test):

  Q = Σ_{k=1}^N (θ̂_k − θ̂_FE)² / σ̂_k²,

where we conclude evidence against the null (against homogeneity) if Q > χ²_{1−α}(N − 1).
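Continuing the same hypothetical four-study example, the homogeneity test can be sketched as follows (the data are still illustrative made-up numbers):

```python
import numpy as np
from scipy.stats import chi2

# Same hypothetical data as the fixed-effect sketch
theta_hat = np.array([0.30, 0.45, 0.20, 0.55])
var_hat   = np.array([0.04, 0.02, 0.05, 0.03])

w = 1.0 / var_hat
theta_fe = np.sum(w * theta_hat) / np.sum(w)

# Cochran's Q: weighted squared deviations from the pooled estimate
Q = np.sum((theta_hat - theta_fe) ** 2 / var_hat)

df = len(theta_hat) - 1
crit = chi2.ppf(0.95, df)          # chi-squared critical value at alpha = 0.05
reject_homogeneity = Q > crit
```

For these particular numbers Q falls well below the critical value, so the test gives no evidence against homogeneity; with only four studies, of course, the test has little power.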
Random effects

More generally, a random effect is a quantity that varies over sample units. A fixed effect, on the other hand, remains fixed across sample units (Kreft & De Leeuw 1998).

In experimental design, we think of fixed effects as those that have values that are either (1) fixed by the experimenter, or (2) exhausted by the experimental design (Green & Tukey 1960).

Another common definition is that effects are fixed if they are interesting in themselves, and random if there is interest in some greater population (Searle, Casella & McCulloch 1992).

The distinction is not always obvious in practice. Moreover, the distinction is largely dependent on what you are interested in studying.
Random effects model

The basic random effects model:

  θ̂_k = θ + u_k + ε_k,

where θ̂_k and ε_k ~ N(0, σ_k²) are as before, but θ is now the mean effect in the population and u_k ~ N(0, τ²). Also, u_k is assumed to be independent of ε_k.

The random effects model does not assume homogeneity. So each θ̂_k estimates a (sub)population effect θ_k = θ + u_k for study k, with θ_k ~ N(θ, τ²).

So we can capture two sources of variation:
- sampling variation, as in the fixed effects model;
- structural variation, since the (true) effects themselves are random draws from some greater population of effects.
Random effects model

A pictorial representation of the basic random effects model: θ̂_k = θ + u_k + ε_k. Note that in this graphic, v_k = σ_k².
Random effects model: estimation

The random effects estimate of θ (now the mean effect in the population) is still a weighted average of the individual study effect estimates θ̂_k:

  θ̂_RE = (Σ_{k=1}^N w_k θ̂_k) / (Σ_{k=1}^N w_k),

but now w_k = 1/(σ̂_k² + τ̂²).

Note that this estimator takes into account both within-study variation, σ̂_k², and between-study variation, τ̂².

Analogous to before, the estimated (approximate asymptotic) variance of this estimator is

  Var̂(θ̂_RE) ≈ 1 / (Σ_{k=1}^N w_k) = 1 / (Σ_{k=1}^N 1/(σ̂_k² + τ̂²)).
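A sketch of the random-effects weighting, again with the same made-up study data; the value of τ̂² here is an assumed placeholder (in practice it would come from one of the estimators discussed below):

```python
import numpy as np

theta_hat = np.array([0.30, 0.45, 0.20, 0.55])
var_hat   = np.array([0.04, 0.02, 0.05, 0.03])
tau2_hat  = 0.02   # assumed between-study variance estimate, for illustration only

w_re = 1.0 / (var_hat + tau2_hat)               # random-effects weights
theta_re = np.sum(w_re * theta_hat) / np.sum(w_re)
var_re   = 1.0 / np.sum(w_re)
```

Adding τ̂² to every study's variance flattens the weights: precise studies lose some influence relative to the fixed-effect analysis, and the variance of the pooled estimate grows accordingly.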
Random effects model: between-study variation

One critical quantity of interest that was not present in the fixed effects case is an estimate τ̂² of the between-study variance τ². There are many different methods for estimating this. We won't go into the math, but the idea is similar to an ANOVA partition of between-study and within-study sums of squares.

There are also many different tests for H_0: τ² = 0. The most basic is an F-test that is directly analogous to an ANOVA F-test of a between-treatment effect.

The study of between-study variation is the study of heterogeneity, and it is often of critical interest.
Heterogeneity

We are interested in quantifying how much the true effect sizes vary in our (meta-)population; i.e., we would like to estimate τ².

But remember: sampling error is always present in each study; this is not heterogeneity. We would like a way of separating the sampling variability (natural sampling error) from the variability of the true effects (heterogeneity).

We accomplish this by:
(1) estimating the total amount of between-study variation;
(2) estimating the amount of between-study variation we would expect under the homogeneity assumption;
(3) treating the excess variation as a reflection of heterogeneity.
Heterogeneity

Start with Cochran's Q statistic:

  Q = Σ_{k=1}^N (θ̂_k − θ̂)² / σ̂_k²,

where θ̂ is the summary (combined) effect from our model. This is our (weighted) estimate of total between-study variation.

Now assume that homogeneity holds: θ_1 = ... = θ_N = θ. Then Q is (asymptotically) χ²(N − 1), and we know that the expected amount of between-study variation is then N − 1.

So a natural way to estimate excess variation (i.e., variation not attributable to sampling error) is to consider Q − (N − 1).
Heterogeneity

This is a natural way to estimate excess variation (i.e., variation not attributable to sampling error): Q − (N − 1).

But there are a couple of problems with this quantity:
- it is highly sensitive to the total number of studies, N, and to within-study sample size;
- it is not on an easily interpretable scale;
- it can be negative.
Heterogeneity

To fix these issues, we can use the moment-based estimate:

  τ̂² = T² = max{0, (Q − (N − 1)) / C},

where

  C = Σ_{k=1}^N w_k − (Σ_{k=1}^N w_k²) / (Σ_{k=1}^N w_k).

Notice that C depends only on the weights w_k: it rescales the excess variation so that T² lands on the variance scale of the effects. Now T is on the same scale (units) as the outcome/effect of interest. Also, T will not automatically increase with more studies, or with larger within-study sample sizes.
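The moment-based (DerSimonian-Laird) estimate can be sketched in a few lines. The data here are again hypothetical, chosen to be more heterogeneous than the earlier example so that the truncation at zero does not kick in:

```python
import numpy as np

# More spread-out hypothetical effect sizes, so the estimate comes out positive
theta_hat = np.array([0.10, 0.45, 0.80, 0.30])
var_hat   = np.array([0.04, 0.02, 0.05, 0.03])

w = 1.0 / var_hat                              # fixed-effect weights
theta_fe = np.sum(w * theta_hat) / np.sum(w)
Q = np.sum(w * (theta_hat - theta_fe) ** 2)    # Cochran's Q

N = len(theta_hat)
C = np.sum(w) - np.sum(w ** 2) / np.sum(w)     # scaling constant from the weights
tau2_dl = max(0.0, (Q - (N - 1)) / C)          # T^2, truncated at zero
```

With the earlier, more homogeneous data, Q − (N − 1) would be negative and the max{0, ...} truncation would return τ̂² = 0.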
Heterogeneity

Remember: since T² is a statistic (and so a random variable), it has an associated measure of uncertainty. Thus, we can derive estimates of Var(T²) and then also derive confidence intervals for T²; see Borenstein pp. 122-24 for formulas.

WARNING: Jamovi will output an estimate of the standard error (SE) of T², but do not just take ±2·SE to form a 95% confidence interval. This trick only works if the test statistic is approximately normally distributed (or if we can appeal to the CLT). In practice, T² is very far from normally distributed.
Heterogeneity

There are many ways to derive an estimate τ̂², but this method-of-moments (DerSimonian-Laird) estimator (T²) is the most common (see pictures in Borenstein pp. 115, 116).

There are many estimation options in Jamovi. It should usually be fine to use the DerSimonian-Laird or restricted maximum likelihood estimate (the latter has nice statistical properties). It is rarely advisable to use simple maximum likelihood to fit the RE model and estimate τ. Personally, I would recommend avoiding empirical Bayes as well, unless you know something about Bayesian analysis (workshop next month!).
Heterogeneity

In many cases, the exact choice of estimation method won't make a huge difference. However, exceptions do exist (and if you find yourself in such a situation, you should consult a statistician).

A classic example from Snedecor and Cochran (1967): the number of conceptions recorded from artificial insemination of six bulls.
Heterogeneity

Different estimation methods result in very different estimates of τ²:

  Estimation method         | Est. of τ² | Est. of SE(τ²) | I²
  --------------------------|------------|----------------|-----
  DerSimonian-Laird         |      64.80 |          64.07 | 66%
  Hedges                    |      89.68 |          84.67 | 73%
  Hunter-Schmidt            |      46.80 |          41.85 | 58%
  Sidik-Jonkman             |      89.89 |          60.13 | 73%
  Maximum likelihood (ML)   |      51.35 |          49.62 | 61%
  Restricted ML             |      72.83 |          69.07 | 69%
  Empirical Bayes           |      81.40 |          75.18 | 71%

However, note that Cochran's Q statistic is the same for all of these methods, since Q is computed independently of any estimate of τ².
Tests for heterogeneity

We have already seen Cochran's Q statistic as a test for homo-/heterogeneity. But many others exist. Two of the most common are:

- Reported in Jamovi:  H² = Q / (N − 1)
- Not reported in Jamovi:  R² = (τ² + σ²) / σ², where σ² is a typical within-study variance.

WARNING: Jamovi does report something called an R² when you specify a moderator in the meta-analysis (subgroup analysis or meta-regression), but, confusingly, that R² is not the same thing as this R² test of homogeneity.
Relative measures of heterogeneity

A very common relative measure of heterogeneity is captured by the I² statistic:

  I² = max{0, (Q − (N − 1)) / Q}.

Notice that I² compares our rough measure of excess variation to our measure of total variation (akin to the definition of reliability). In this way, it tries to do the same thing as the traditional R² in regression (yes, another R² statistic).

Notice that, like the traditional R², we always have 0 ≤ I² ≤ 1. I² is independent of the number of studies N, but is affected by the precision (sample sizes) of the studies.
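Since I² is built from the same Q statistic, it drops straight out of the earlier computation (same hypothetical, more heterogeneous data as in the T² sketch):

```python
import numpy as np

theta_hat = np.array([0.10, 0.45, 0.80, 0.30])
var_hat   = np.array([0.04, 0.02, 0.05, 0.03])

w = 1.0 / var_hat
theta_fe = np.sum(w * theta_hat) / np.sum(w)
Q = np.sum(w * (theta_hat - theta_fe) ** 2)
N = len(theta_hat)

# I^2: proportion of total variation attributable to between-study differences
I2 = max(0.0, (Q - (N - 1)) / Q)
```

Here roughly half of the observed variation in effect sizes is in excess of what sampling error alone would predict.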
Relative measures of heterogeneity

Again, the I² statistic is I² = max{0, (Q − (N − 1)) / Q}.

Rough interpretation: the closer I² is to 100%, the more variation is explained by the random effect (differences in true effect sizes, u_k) rather than just sampling error, ε_k; i.e., the more between-study variation there is relative to within-study variation.

WARNING: however, just as with R² in ANOVA/regression, this interpretation breaks down if the data (effect sizes) are not actually normally distributed about their mean.
Example 1

Recall the Cohen meta-analysis from last time: partial correlation coefficients between section mean instructor rating and section mean final exam score, controlling for student ability.

Again, note that R² in this Jamovi output is not the same R² statistic we previously defined. Jamovi's R² statistic will be relevant for subgroup analysis (meta-regression).
Example 2

Meta-analysis by Colditz et al. (1994). These 13 trials investigated the effect of Bacillus Calmette-Guérin (BCG) vaccination on the prevention of tuberculosis. The effect size measured is the relative risk. Study results are on a logarithmic scale.
Example 2 Vaccinated Not vaccinated Trial Disease No disease Disease No disease Risk ratio 1 4 119 11 128 0.39 2 6 300 29 274 0.19* 3 3 228 11 209 0.25* 4 62 13,536 248 12,619 0.23* 5 33 5,036 47 5,761 0.80 6 180 1,361 372 1,079 0.38* 7 8 2,537 10 619 0.20* 8 505 87,886 499 87,892 1.01 9 29 7,470 45 7,232 0.62* 10 17 1,699 65 1,600 0.25* 11 186 50,448 414 27,197 0.24* 12 5 2,493 3 2,338 1.56 13 27 16,886 29 17,825 0.98 Ed Kroc (UBC) EPSE 594 January 24, 2019 26 / 37
Example 2
Prediction

Notice that so far we have generated the following:
- an estimate of the mean true effect size (random effects model) or of the true effect size (fixed effects model);
- an estimate of the variation of the true effect sizes (random effects model).

Graphically, we have ways of representing the mean true effect size and its associated uncertainty (the diamond in a forest plot), but how can we easily express the information contained in our estimate of τ?

ANSWER: create a prediction interval.
Prediction

You may remember prediction intervals from regression. They look superficially similar to confidence intervals, but represent very different things.

Recall: a 95% confidence interval for a statistic is a measure of uncertainty about the value of that statistic. Technically, we say that if we were to repeat the same experiment (meta-experiment) many times, then about 95% of the resulting 95% confidence intervals would contain the true effect being estimated by the statistic.

In contrast, a 95% prediction interval represents our uncertainty about how the true effects are distributed about the statistic. In the context of meta-analysis, in about 95% of new studies, the true effect of that study will fall inside the 95% prediction interval (given that all modelling assumptions hold).
Prediction

A 100(1 − α)% prediction interval for the true effect sizes is given by:

  [θ − z_{1−α/2}·τ, θ + z_{1−α/2}·τ].

In practice, we need to substitute in our estimates for θ and τ, so this becomes an approximate 100(1 − α)% prediction interval.

Note: while the confidence interval is based upon the variance of the estimated mean effect, Var(θ̂), the prediction interval is based upon the between-study variance τ². Thus, as the number of studies increases, a confidence interval will automatically get tighter, since Var(θ̂) → 0... but the between-study variance τ² will stay (close to) fixed.

The more studies we have, generally, the better the estimate T² of τ² we will have. But this estimate will settle on some nonzero number (unless there is no actual between-study variation).
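The plug-in version of this interval is a one-liner. The pooled estimate and τ̂² below are hypothetical numbers; note also that more careful formulations (e.g. Borenstein's) additionally fold Var(θ̂) into the width and use a t quantile, whereas this sketch follows the simpler z·τ̂ form given above:

```python
from scipy.stats import norm

# Hypothetical plug-in values: a pooled RE estimate and a tau^2 estimate
theta_hat_re = 0.40
tau2_hat = 0.0315

z = norm.ppf(0.975)                    # about 1.96 for a 95% interval
half_width = z * tau2_hat ** 0.5       # z * tau_hat
pi = (theta_hat_re - half_width, theta_hat_re + half_width)
```

Even though the pooled estimate itself might be very precisely estimated, the prediction interval stays wide, because it describes the spread of the true effects, not our uncertainty about their mean.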
Prediction

Consider the diagram in Borenstein, p. 132. Note: the width of the CI gets smaller as the number of studies increases, while the width of the PI stays about the same.
Prediction

Prediction interval for the correlation between section mean instructor rating and section mean final exam score (controlling for student ability) from last time.
Subgroup analysis and meta-regression

We have now done the following:
- estimated a combined effect size with associated uncertainty (fixed or random effects model);
- tested the homogeneity assumption (fixed effects model);
- quantified heterogeneity (random effects model);
- predicted a plausible range of true effect sizes (random effects model).

But a natural question remains: if the true effect sizes are heterogeneous, then what explains this heterogeneity?
Subgroup analysis and meta-regression

Recall the basic random effects model:

  θ̂_k = θ + u_k + ε_k,

where θ̂_k and ε_k ~ N(0, σ_k²) are as always, θ is the mean effect in the population, and u_k ~ N(0, τ²) captures the between-study variation. Also, u_k is assumed to be independent of ε_k.

This basic model treats the between-study variation as a simple random component, totally explained by a simple normal random variable. However, in practice, we might often expect that such heterogeneity is explained by other study-wide factors.
Subgroup analysis and meta-regression

Possible examples of informative heterogeneity:
- Some studies in our meta-analysis may be experimental, while others are observational. We would likely expect both more variation and more bias to arise from the observational studies.
- Multiple studies could have been conducted at the same lab or by the same research group. Consequently, we would expect less variation between studies coming from the same lab/group.
- Studies could be conducted in different provinces or countries.
- Studies could be conducted at different times, so subject to different traditional protocols, government policies, etc.

How do we account for these things?
Subgroup analysis and meta-regression

ANSWER: work with a mixed effects model. We can simply add other explanatory factors to the basic random effects model, e.g.:

  θ̂_k = θ + Type_k + Lab_k + u_k + ε_k,

where Type_k accounts for whether the study was experimental or observational (binary) and Lab_k records which lab or research group conducted study k (categorical).

The above model would allow for what is called a subgroup analysis. This is just a special case of meta-regression.
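A minimal flavour of subgroup analysis, using the earlier made-up effect sizes and a hypothetical binary moderator (say, experimental vs. observational): pool within each subgroup, then measure how far the subgroup summaries sit from the overall summary, relative to their precision (a between-subgroups Q with df = number of subgroups − 1):

```python
import numpy as np

theta_hat = np.array([0.10, 0.45, 0.80, 0.30])
var_hat   = np.array([0.04, 0.02, 0.05, 0.03])
group     = np.array([0, 0, 1, 1])   # hypothetical moderator: 0 = experimental, 1 = observational

def fe_summary(t, v):
    """Inverse-variance pooled estimate and its variance."""
    w = 1.0 / v
    return np.sum(w * t) / np.sum(w), 1.0 / np.sum(w)

theta_all, _ = fe_summary(theta_hat, var_hat)

# Q_between: weighted squared deviations of subgroup summaries from the overall summary
Q_between = 0.0
for g in (0, 1):
    est, v = fe_summary(theta_hat[group == g], var_hat[group == g])
    Q_between += (est - theta_all) ** 2 / v
```

A large Q_between (compared to a chi-squared with 1 df here) would suggest that the moderator explains some of the heterogeneity; with these toy numbers the subgroups do not differ convincingly.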
Subgroup analysis and meta-regression

We can easily incorporate continuous explanatory variables (covariates) into the general mixed effects model. We can also easily incorporate other random effects if we would like (e.g., we could treat Lab as a random effect in the previous example). Downside: this will require lots of data.

We will explore the details of subgroup analysis (meta-regression) next time.