EPSE 594: Meta-Analysis: Quantitative Research Synthesis

EPSE 594: Meta-Analysis: Quantitative Research Synthesis
Ed Kroc, University of British Columbia
ed.kroc@ubc.ca
January 24, 2019

Last time:
- Composite effect sizes: inverse variance weighting
- The basic meta-analysis models: fixed effect and random effect
- Introduction to the homogeneity assumption

Today:
- Review of fixed vs. random effect models
- Quantifying heterogeneity
- Prediction
- Introduction to meta-regression (subgroup analysis) [mixed effects models]

Fixed vs. random effects

There is a big difference between:
- an intervention that consistently increases, say, retention of course materials by 40% [fixed effect], and
- an intervention that increases retention by 40% on average, with a treatment effect ranging between 10% and 70% depending on the study/subpopulation [random effect].

Fixed effects models:
- Generally take less data to produce precise estimates, but rely on stronger assumptions, so are more prone to spurious conclusions.
- Weight larger studies more heavily.

Random effects models:
- More general, more conservative, usually more realistic, but generally require more data to produce precise estimates.
- Weight all studies more equally, since we assume more variance in the model.

Fixed effects model

The basic fixed effects model:

$\hat{\theta}_k = \theta + \varepsilon_k,$

where $\hat{\theta}_k$ is the estimated effect size from study $k$, $\theta$ is the fixed, true effect in the population, and $\varepsilon_k \sim N(0, \sigma_k^2)$ captures the sampling error from study $k$, with unknown variance $\sigma_k^2$.

Given our estimates $(\hat{\theta}_k, \hat{\sigma}_k^2)$ of $(\theta_k, \sigma_k^2)$, $k = 1, \ldots, N$, we can derive the maximum-likelihood estimate for the population effect $\theta$ to get:

$\hat{\theta}_{FE} = \frac{\sum_{k=1}^N w_k \hat{\theta}_k}{\sum_{k=1}^N w_k}, \qquad \widehat{\mathrm{Var}}(\hat{\theta}_{FE}) \approx \frac{1}{\sum_{k=1}^N w_k},$

where $w_k = 1/\hat{\sigma}_k^2$.
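The inverse-variance weighted estimate above can be sketched in a few lines. The effect sizes and variances here are made-up illustration values, not from any study in the course:

```python
import math

# Hypothetical estimated effect sizes and sampling variances from N = 3 studies
theta_hat = [0.30, 0.45, 0.38]
var_hat = [0.04, 0.09, 0.02]

# Fixed-effect weights: w_k = 1 / sigma_hat_k^2
w = [1.0 / v for v in var_hat]

# Inverse-variance weighted average and its approximate variance
theta_fe = sum(wk * t for wk, t in zip(w, theta_hat)) / sum(w)
var_fe = 1.0 / sum(w)
se_fe = math.sqrt(var_fe)
```

Note how the most precise study (smallest variance) dominates the weighted average; this is exactly why fixed effects models weight larger studies more heavily.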

Fixed effects model

A pictorial representation of the basic fixed effects model: $\hat{\theta}_k = \theta + \varepsilon_k$. Note that in this graphic, $v_k = \sigma_k^2$. [Graphic not reproduced in this transcription.]

Fixed effects model

The fixed effects model relies on the homogeneity assumption:

$\theta_1 = \cdots = \theta_N = \theta,$

where $\theta$ denotes the common population effect size. We saw one way to test this assumption: via Cochran's $Q$ statistic (an asymptotic chi-squared test):

$Q = \sum_{k=1}^N \frac{(\hat{\theta}_k - \hat{\theta}_{FE})^2}{\hat{\sigma}_k^2},$

where we conclude evidence against the null (against homogeneity) if $Q > \chi^2_{1-\alpha}(N-1)$.
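The $Q$ test can be sketched with the same hypothetical numbers as before. With $N = 3$ studies the reference distribution has 2 degrees of freedom, and the chi-squared survival function then has the closed form $e^{-Q/2}$, which avoids needing a stats library:

```python
import math

theta_hat = [0.30, 0.45, 0.38]  # hypothetical effect estimates
var_hat = [0.04, 0.09, 0.02]    # hypothetical sampling variances

w = [1.0 / v for v in var_hat]
theta_fe = sum(wk * t for wk, t in zip(w, theta_hat)) / sum(w)

# Q = sum over studies of (theta_hat_k - theta_fe)^2 / sigma_hat_k^2
Q = sum((t - theta_fe) ** 2 / v for t, v in zip(theta_hat, var_hat))

df = len(theta_hat) - 1
# For df = 2, P(chi-squared > Q) = exp(-Q / 2) exactly
p_value = math.exp(-Q / 2)
reject_homogeneity = p_value < 0.05
```

For these particular illustration values, $Q$ is small relative to $N - 1$, so there is no evidence against homogeneity.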

Random effects

More generally, a random effect is a quantity that varies over sample units. A fixed effect, on the other hand, remains fixed across sample units (Kreft & De Leeuw 1998).

In experimental design, we think of fixed effects as those whose values are either (1) fixed by the experimenter, or (2) exhausted by the experimental design (Green & Tukey 1960). Another common definition is that effects are fixed if they are interesting in themselves, and random if there is interest in some greater population (Searle, Casella & McCulloch 1992).

The distinction is not always obvious in practice, and it depends largely on what you are interested in studying.

Random effects model

The basic random effects model:

$\hat{\theta}_k = \theta + u_k + \varepsilon_k,$

where $\hat{\theta}_k$ and $\varepsilon_k \sim N(0, \sigma_k^2)$ are as before, but $\theta$ is now the mean effect in the population and $u_k \sim N(0, \tau^2)$. Also, $u_k$ is assumed to be independent of $\varepsilon_k$.

The random effects model does not assume homogeneity. So each $\hat{\theta}_k$ estimates a (sub)population effect $\theta_k = \theta + u_k$ for study $k$, with $\theta_k \sim N(\theta, \tau^2)$.

So we can capture two sources of variation:
- Sampling variation, as in the fixed effects model.
- Structural variation, since the (true) effects themselves are random draws from some greater population of effects.

Random effects model

A pictorial representation of the basic random effects model: $\hat{\theta}_k = \theta + u_k + \varepsilon_k$. Note that in this graphic, $v_k = \sigma_k^2$. [Graphic not reproduced in this transcription.]

Random effects model: estimation

The random effects estimate of $\theta$ (now the mean effect in the population) is still a weighted average of the individual study effect estimates $\hat{\theta}_k$:

$\hat{\theta}_{RE} = \frac{\sum_{k=1}^N w_k \hat{\theta}_k}{\sum_{k=1}^N w_k},$

but now $w_k = 1/(\hat{\sigma}_k^2 + \hat{\tau}^2)$. Note that this estimator takes into account both within-study variation, $\hat{\sigma}_k^2$, and between-study variation, $\hat{\tau}^2$.

Analogous to before, the estimated (approximate asymptotic) variance of this estimator is

$\widehat{\mathrm{Var}}(\hat{\theta}_{RE}) \approx \frac{1}{\sum_{k=1}^N w_k} = \frac{1}{\sum_{k=1}^N 1/(\hat{\sigma}_k^2 + \hat{\tau}^2)}.$
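The only change from the fixed-effect sketch is the weights. Here the between-study variance estimate $\hat{\tau}^2$ is simply taken as given (it would come from an estimator such as DerSimonian-Laird, discussed below); all numbers remain illustrative:

```python
import math

theta_hat = [0.30, 0.45, 0.38]  # hypothetical effect estimates
var_hat = [0.04, 0.09, 0.02]    # hypothetical sampling variances
tau2_hat = 0.01                 # assumed between-study variance estimate

# Random-effects weights: w_k = 1 / (sigma_hat_k^2 + tau_hat^2)
w = [1.0 / (v + tau2_hat) for v in var_hat]

theta_re = sum(wk * t for wk, t in zip(w, theta_hat)) / sum(w)
var_re = 1.0 / sum(w)
se_re = math.sqrt(var_re)
```

Adding the same $\hat{\tau}^2$ to every weight denominator evens the weights out, which is why random effects models weight studies more equally than fixed effects models do.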

Random effects model: between-study variation

One critical quantity of interest that was not present in the fixed effects case is an estimate $\hat{\tau}^2$ of the between-study variance $\tau^2$.

There are many different methods for estimating this. We won't go into the math, but the idea is similar to an ANOVA partition of between-study and within-study sums of squares.

There are also many different tests for $H_0: \tau^2 = 0$. The most basic is an $F$-test that is directly analogous to an ANOVA $F$-test of a between-treatment effect.

The study of between-study variation is the study of heterogeneity, and it is often of critical interest.

Heterogeneity

We are interested in quantifying how much the true effect sizes vary in our (meta-)population; i.e., we would like to estimate $\tau^2$.

But remember: sampling error is always present in each study, and this is not heterogeneity. We would like a way of separating the sampling variability (natural sampling error) from the variability of the true effects (heterogeneity).

We accomplish this by:
(1) estimating the total amount of between-study variation;
(2) estimating the amount of between-study variation we would expect under the homogeneity assumption;
(3) treating the excess variation as a reflection of heterogeneity.

Heterogeneity

Start with Cochran's $Q$ statistic:

$Q = \sum_{k=1}^N \frac{(\hat{\theta}_k - \hat{\theta})^2}{\hat{\sigma}_k^2},$

where $\hat{\theta}$ is the summary (combined) effect from our model. This is our (weighted) estimate of total between-study variation.

Now assume that homogeneity holds: $\theta_1 = \cdots = \theta_N = \theta$. Then $Q$ is (asymptotically) $\chi^2(N-1)$, and the expected amount of between-study variation is then $N - 1$.

So a natural way to estimate excess variation (i.e., variation not attributable to sampling error) is to consider $Q - (N-1)$.

Heterogeneity

$Q - (N-1)$ is a natural estimate of excess variation (i.e., variation not attributable to sampling error). But it has a couple of problems:
- It is highly sensitive to the total number of studies, $N$, and to within-study sample size.
- It is not on an easily interpretable scale.
- It can be negative.

Heterogeneity

To fix these issues, we can use the moment-based estimate:

$\hat{\tau}^2 = T^2 = \max\left\{0, \frac{Q - (N-1)}{C}\right\},$

where

$C = \sum_{k=1}^N w_k - \frac{\sum_{k=1}^N w_k^2}{\sum_{k=1}^N w_k}.$

Notice that $C$ is really an estimate of the sample variance of the weights $w_k$, standardized by their sample mean. Now $T$ is on the same scale (units) as the outcome/effect of interest. Also, $T$ will not automatically increase with more studies, or with more within-study sample size.
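A minimal sketch of this method-of-moments (DerSimonian-Laird) computation, again on made-up data; here the spread of the effect estimates is deliberately larger so that $Q$ exceeds $N - 1$ and $T^2$ comes out positive:

```python
# Hypothetical effect estimates and sampling variances from N = 5 studies
theta_hat = [0.10, 0.45, 0.80, 0.05, 0.62]
var_hat = [0.04, 0.09, 0.02, 0.05, 0.03]
N = len(theta_hat)

# Fixed-effect weights and summary effect, needed to compute Q
w = [1.0 / v for v in var_hat]
theta_fe = sum(wk * t for wk, t in zip(w, theta_hat)) / sum(w)
Q = sum((t - theta_fe) ** 2 / v for t, v in zip(theta_hat, var_hat))

# C = sum(w) - sum(w^2) / sum(w); the max with 0 keeps T^2 nonnegative
C = sum(w) - sum(wk ** 2 for wk in w) / sum(w)
T2 = max(0.0, (Q - (N - 1)) / C)
```

If the data were nearly homogeneous, $Q - (N - 1)$ could be negative and the truncation at zero would kick in, returning $T^2 = 0$.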

Heterogeneity

Remember: since $T^2$ is a statistic (and so a random variable), it has an associated measure of uncertainty. Thus, we can derive estimates of $\mathrm{Var}(T^2)$ and then also derive confidence intervals for $T^2$; see Borenstein pp. 122-24 for formulas.

WARNING: Jamovi will output an estimate of the standard error (SE) of $T^2$, but do not just take $\pm 2 \cdot \mathrm{SE}$ to form a 95% confidence interval. This trick only works if the statistic is approximately normally distributed (or if we can appeal to the CLT). In practice, $T^2$ is very far from normally distributed.

Heterogeneity

There are many ways to derive an estimate $\hat{\tau}^2$, but this method-of-moments (DerSimonian-Laird) method, $T^2$, is the most common (see pictures in Borenstein pp. 115, 116).

There are many estimation options in Jamovi. It should usually be fine to use DerSimonian-Laird or the restricted maximum likelihood estimate (which has nice statistical properties). It is rarely advisable to use simple maximum likelihood to fit the RE model and estimate $\tau$. Personally, I would recommend avoiding empirical Bayes as well, unless you know something about Bayesian analysis (workshop next month!).

Heterogeneity

In many cases, the exact choice of estimation method won't make a huge difference. However, exceptions do exist (and if you find yourself in such a situation, you should consult a statistician).

A classic example from Snedecor and Cochran (1967): the number of conceptions recorded from artificial insemination of six bulls.

Heterogeneity

Different estimation methods result in very different estimates of $\tau^2$:

| Estimation method       | Est. of $\tau^2$ | Est. of SE($\tau^2$) | $I^2$ |
|-------------------------|------------------|----------------------|-------|
| DerSimonian-Laird       | 64.80            | 64.07                | 66%   |
| Hedges                  | 89.68            | 84.67                | 73%   |
| Hunter-Schmidt          | 46.80            | 41.85                | 58%   |
| Sidik-Jonkman           | 89.89            | 60.13                | 73%   |
| Maximum likelihood (ML) | 51.35            | 49.62                | 61%   |
| Restricted ML           | 72.83            | 69.07                | 69%   |
| Empirical Bayes         | 81.40            | 75.18                | 71%   |

However, note that Cochran's $Q$-statistic is the same for all of these methods, since $Q$ is computed independently of any estimate of $\tau^2$.

Tests for heterogeneity

We have already seen Cochran's $Q$-statistic as a test for homo-/heterogeneity, but many others exist. Two of the most common are:

Reported in Jamovi:

$H^2 = \frac{Q}{N-1}$

Not reported in Jamovi:

$R^2 = \frac{\tau^2 + \sigma^2}{\sigma^2},$

where $\sigma^2$ is a typical within-study variance.

WARNING: Jamovi does report something called an $R^2$ when you specify a moderator in the meta-analysis (subgroup analysis or meta-regression), but confusingly, that $R^2$ is not the same thing as this $R^2$ test of homogeneity.

Relative measures of heterogeneity

A very common relative measure of heterogeneity is captured by the $I^2$-statistic:

$I^2 = \max\left\{0, \frac{Q - (N-1)}{Q}\right\}.$

Notice that $I^2$ compares our rough measure of excess variation to our measure of total variation (akin to the definition of reliability). In this way, it tries to do the same thing as the traditional $R^2$ in regression (yes, another $R^2$ statistic).

Notice that, like the traditional $R^2$, we always have $0 \le I^2 \le 1$. $I^2$ is independent of the number of studies $N$, but is affected by the precision (sample sizes) of the studies.

Relative measures of heterogeneity

Rough interpretation of the $I^2$-statistic: the closer $I^2$ is to 100%, the more of the variation is explained by the random effect (differences in true effect sizes, $u_k$) rather than just sampling error, $\varepsilon_k$; i.e., the more between-study variation there is relative to within-study variation.

WARNING: just as with $R^2$ in ANOVA/regression, this interpretation breaks down if the data (effect sizes) are not actually normally distributed about their mean.
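Both relative measures fall out of $Q$ directly. A sketch with illustrative values (in practice $Q$ and $N$ come from the homogeneity test on the actual studies):

```python
# Illustrative inputs; Q and N would come from the actual meta-analysis
Q = 13.06  # Cochran's Q
N = 5      # number of studies

H2 = Q / (N - 1)                  # H^2 statistic
I2 = max(0.0, (Q - (N - 1)) / Q)  # I^2, a proportion in [0, 1]
```

Here roughly 69% of the observed variation would be attributed to between-study heterogeneity rather than sampling error.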

Example 1

Recall the Cohen meta-analysis from last time: partial correlation coefficients between section mean instructor rating and section mean final exam score, controlling for student ability.

Again, note that the $R^2$ in this Jamovi output is not the same $R^2$-statistic we previously defined. Jamovi's $R^2$-statistic will be relevant for subgroup analysis (meta-regression).

Example 2

Meta-analysis by Colditz et al. (1994). These 13 trials investigated the effect of Bacillus Calmette-Guérin (BCG) vaccination on the prevention of tuberculosis. The effect size measured is the relative risk. Study results are on a logarithmic scale.

Example 2

| Trial | Vaccinated: disease | Vaccinated: no disease | Not vaccinated: disease | Not vaccinated: no disease | Risk ratio |
|------:|--------------------:|-----------------------:|------------------------:|---------------------------:|-----------:|
| 1     | 4                   | 119                    | 11                      | 128                        | 0.39       |
| 2     | 6                   | 300                    | 29                      | 274                        | 0.19*      |
| 3     | 3                   | 228                    | 11                      | 209                        | 0.25*      |
| 4     | 62                  | 13,536                 | 248                     | 12,619                     | 0.23*      |
| 5     | 33                  | 5,036                  | 47                      | 5,761                      | 0.80       |
| 6     | 180                 | 1,361                  | 372                     | 1,079                      | 0.38*      |
| 7     | 8                   | 2,537                  | 10                      | 619                        | 0.20*      |
| 8     | 505                 | 87,886                 | 499                     | 87,892                     | 1.01       |
| 9     | 29                  | 7,470                  | 45                      | 7,232                      | 0.62*      |
| 10    | 17                  | 1,699                  | 65                      | 1,600                      | 0.25*      |
| 11    | 186                 | 50,448                 | 414                     | 27,197                     | 0.24*      |
| 12    | 5                   | 2,493                  | 3                       | 2,338                      | 1.56       |
| 13    | 27                  | 16,886                 | 29                      | 17,825                     | 0.98       |
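For effect sizes like these, the quantities fed into the meta-analysis are the log risk ratio and its approximate variance computed from each 2x2 table. A sketch using the counts from the first trial above (standard large-sample variance formula; the rounding may differ slightly from the table's reported ratio):

```python
import math

# Counts from trial 1: vaccinated (disease, no disease), not vaccinated (disease, no disease)
a, b = 4, 119
c, d = 11, 128

rr = (a / (a + b)) / (c / (c + d))  # risk ratio
log_rr = math.log(rr)               # pooling is done on the log scale

# Large-sample variance of the log risk ratio
var_log_rr = 1 / a - 1 / (a + b) + 1 / c - 1 / (c + d)
se_log_rr = math.sqrt(var_log_rr)
```

The log risk ratio and its variance from each trial would then enter the inverse-variance machinery exactly as the generic $\hat{\theta}_k$ and $\hat{\sigma}_k^2$ did earlier.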

Example 2

[Forest plot of the BCG meta-analysis; figure not reproduced in this transcription.]

Prediction

Notice that so far we have generated the following:
- an estimate of the mean true effect size (random effects model) or of the true effect size (fixed effects model);
- an estimate of the variation of the true effect sizes (random effects model).

Graphically, we have ways of representing the mean true effect size and its associated uncertainty (the diamond in a forest plot), but how can we easily express the information contained in our estimate of $\tau$?

ANSWER: create a prediction interval.

Prediction

You may remember prediction intervals from regression. They look superficially similar to confidence intervals, but represent very different things.

Recall: a 95% confidence interval for a statistic is a measure of uncertainty about the value of that statistic. Technically, we say that if we were to repeat the same experiment (meta-experiment) many times, then about 95% of the resulting 95% confidence intervals would contain the true effect being estimated by the statistic.

In contrast, a 95% prediction interval represents our uncertainty about how the true effects are distributed about the statistic. In the context of meta-analysis: in about 95% of new studies, the true effect of that study will fall inside the 95% prediction interval (given that all modelling assumptions hold).

Prediction

A $100(1-\alpha)\%$ prediction interval for the true effect sizes is given by:

$[\theta - z_{1-\alpha/2}\,\tau,\; \theta + z_{1-\alpha/2}\,\tau].$

In practice, we need to substitute in our estimates for $\theta$ and $\tau$, so this becomes an approximate $100(1-\alpha)\%$ prediction interval.

Note: while the confidence interval is based upon the variance of the estimated mean effect, $\mathrm{Var}(\hat{\theta})$, the prediction interval is based upon the between-study variance $\tau^2$. Thus, as the number of studies increases, a confidence interval will automatically get tighter, since $\mathrm{Var}(\hat{\theta}) \to 0$... but the between-study variance $\tau^2$ will stay (close to) fixed.

The more studies we have, generally, the better the estimate $T^2$ of $\tau^2$ we will have. But this estimate will settle on some nonzero number (unless there is no actual between-study variation).
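Plugging estimates into the interval formula is a one-liner each way. The estimates below are illustrative stand-ins for a fitted random effects model:

```python
import math

# Illustrative estimates; in practice these come from the random effects fit
theta_hat = 0.40           # estimated mean true effect
tau_hat = math.sqrt(0.09)  # estimated between-study SD, i.e. sqrt(T^2)
z = 1.959964               # z_{1 - alpha/2} for alpha = 0.05

# Approximate 95% prediction interval: theta_hat +/- z * tau_hat
pi_lower = theta_hat - z * tau_hat
pi_upper = theta_hat + z * tau_hat
```

Unlike a confidence interval, this interval does not shrink toward $\hat{\theta}$ as more studies accumulate, since its width is driven by $\hat{\tau}$ rather than by $\mathrm{Var}(\hat{\theta})$.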

Prediction

Consider the diagram from Borenstein p. 132 [not reproduced here]: the width of the CI gets smaller as the number of studies increases, while the width of the PI stays about the same.

Prediction

Prediction interval for the correlation between section mean instructor rating and section mean final exam score (controlling for student ability) from last time. [Jamovi output not reproduced in this transcription.]

Subgroup analysis and meta-regression

We have now done the following:
- Estimated a combined effect size with associated uncertainty (fixed or random effect model)
- Tested the homogeneity assumption (fixed effect model)
- Quantified heterogeneity (random effect model)
- Predicted a plausible range of true effect sizes (random effect model)

But a natural question remains: if true effect sizes are heterogeneous, then what explains this heterogeneity?

Subgroup analysis and meta-regression

Recall the basic random effects model:

$\hat{\theta}_k = \theta + u_k + \varepsilon_k,$

where $\hat{\theta}_k$ and $\varepsilon_k \sim N(0, \sigma_k^2)$ are as always, $\theta$ is the mean effect in the population, and $u_k \sim N(0, \tau^2)$ captures the between-study variation. Also, $u_k$ is assumed to be independent of $\varepsilon_k$.

This basic model treats the between-study variation as a simple random component, totally explained by a simple normal random variable. However, in practice, we might often expect such heterogeneity to be explained by other study-wide factors.

Subgroup analysis and meta-regression

Possible examples of informative heterogeneity:
- Some studies in our meta-analysis may be experimental, while others are observational. We would likely expect both more variation and more bias to arise from the observational studies.
- Multiple studies could have been conducted at the same lab or by the same research group. Consequently, we would expect less variation between studies coming from the same lab/group.
- Studies could be conducted in different provinces or countries.
- Studies could be conducted at different times, and so be subject to different traditional protocols, government policies, etc.

How do we account for these things?

Subgroup analysis and meta-regression

ANSWER: work with a mixed effects model. We can simply add other explanatory factors to the basic random effects model, e.g.:

$\hat{\theta}_k = \theta + \mathrm{Type}_k + \mathrm{Lab}_k + u_k + \varepsilon_k,$

where $\mathrm{Type}_k$ accounts for whether study $k$ was experimental or observational (binary) and $\mathrm{Lab}_k$ records which lab or research group conducted study $k$ (categorical).

The above model would allow for what is called a subgroup analysis. This is just a special case of meta-regression.
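The simplest flavor of subgroup analysis can be mimicked in miniature by pooling the effect sizes separately within each level of a moderator. This sketch uses hypothetical studies and plain fixed-effect (inverse-variance) pooling within each subgroup, which is a simplification of the full mixed effects machinery:

```python
# (moderator level, effect estimate, sampling variance) for hypothetical studies
studies = [
    ("experimental", 0.45, 0.04),
    ("experimental", 0.50, 0.05),
    ("observational", 0.20, 0.03),
    ("observational", 0.30, 0.06),
]

# Inverse-variance pooled estimate within each subgroup
pooled = {}
for group in sorted({g for g, _, _ in studies}):
    pairs = [(1.0 / v, t) for g, t, v in studies if g == group]
    pooled[group] = sum(w * t for w, t in pairs) / sum(w for w, _ in pairs)
```

Comparing the pooled subgroup estimates (and, in a real analysis, testing their difference) is what the $\mathrm{Type}_k$ term in the mixed effects model formalizes.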

Subgroup analysis and meta-regression

We can easily incorporate continuous explanatory variables (covariates) into the general mixed effects model. We can also easily incorporate other random effects if we would like (e.g., we could treat Lab as a random effect in the previous example). Downside: this will require lots of data.

We will explore the details of subgroup analysis (meta-regression) next time.