Previous lecture. Single variant association. Use genome-wide SNPs to account for confounding (population substructure)

Similar documents
On the relative efficiency of using summary statistics versus individual level data in meta-analysis

Previous lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing.

Association studies and regression


Good Confidence Intervals for Categorical Data Analyses. Alan Agresti

Stat 5101 Lecture Notes

An accurate test for homogeneity of odds ratios based on Cochran s Q-statistic

STAT331. Cox s Proportional Hazards Model

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Maximum-Likelihood Estimation: Basic Ideas

Confidence Distribution

A re-appraisal of fixed effect(s) meta-analysis

Extending the MR-Egger method for multivariable Mendelian randomization to correct for both measured and unmeasured pleiotropy

Summary and discussion of The central role of the propensity score in observational studies for causal effects

EPSE 594: Meta-Analysis: Quantitative Research Synthesis

Open Problems in Mixed Models

Hypothesis testing:power, test statistic CMS:

Meta-analysis. 21 May Per Kragh Andersen, Biostatistics, Dept. Public Health

TECHNICAL REPORT # 59 MAY Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study

8. Instrumental variables regression

Logistic regression: Miscellaneous topics

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches

Machine Learning Linear Classification. Prof. Matteo Matteucci

Power and sample size calculations for designing rare variant sequencing association studies.

Survival Analysis for Case-Cohort Studies

Computational Systems Biology: Biology X

Design of Multiregional Clinical Trials (MRCT) Theory and Practice

Binary choice 3.3 Maximum likelihood estimation

Lecture 9: Kernel (Variance Component) Tests and Omnibus Tests for Rare Variants. Summer Institute in Statistical Genetics 2017

STAT 135 Lab 2 Confidence Intervals, MLE and the Delta Method

UNIVERSITY OF CALIFORNIA, SAN DIEGO

Review of Statistics 101

Chapter 1 Statistical Inference

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Pubh 8482: Sequential Analysis

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Psychology 282 Lecture #4 Outline Inferences in SLR

Biostat Methods STAT 5820/6910 Handout #9a: Intro. to Meta-Analysis Methods

COMP90051 Statistical Machine Learning

Is the cholesterol concentration in blood related to the body mass index (bmi)?

Prediction intervals for random-effects meta-analysis: a confidence distribution approach

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 15-7th March Arnaud Doucet

A new strategy for meta-analysis of continuous covariates in observational studies with IPD. Willi Sauerbrei & Patrick Royston

Probability and Statistics

Prediction intervals for random-effects meta-analysis: a confidence distribution approach

Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics)

14.30 Introduction to Statistical Methods in Economics Spring 2009

Some New Aspects of Dose-Response Models with Applications to Multistage Models Having Parameters on the Boundary

Chapter 20: Logistic regression for binary response variables

Likelihood and Fairness in Multidimensional Item Response Theory

Score test for random changepoint in a mixed model

Plausible Values for Latent Variables Using Mplus

STAT 7030: Categorical Data Analysis

BTRY 4830/6830: Quantitative Genomics and Genetics

REPRODUCIBLE ANALYSIS OF HIGH-THROUGHPUT EXPERIMENTS

Testing Homogeneity Of A Large Data Set By Bootstrapping

Beta-binomial model for meta-analysis of odds ratios

Outline of GLMs. Definitions

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions

University of Bristol - Explore Bristol Research

Lecture : Probabilistic Machine Learning

A Robust Test for Two-Stage Design in Genome-Wide Association Studies

A3. Statistical Inference Hypothesis Testing for General Population Parameters

Lecture 7: Interaction Analysis. Summer Institute in Statistical Genetics 2017

Lecture 5: LDA and Logistic Regression

Lawrence D. Brown* and Daniel McCarthy*

STAT 536: Genetic Statistics

Foundations of Statistical Inference

Causal Inference: Discussion

Quantitative Introduction ro Risk and Uncertainty in Business Module 5: Hypothesis Testing

Linear Methods for Prediction

1 Preliminary Variance component test in GLM Mediation Analysis... 3

Bias Variance Trade-off

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5

Introduction to Machine Learning

Hypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006

Practical considerations for survival models

Poisson regression: Further topics

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Applied Statistics and Econometrics

Machine Learning Basics Lecture 2: Linear Classification. Princeton University COS 495 Instructor: Yingyu Liang

Regression techniques provide statistical analysis of relationships. Research designs may be classified as experimental or observational; regression

CS281 Section 4: Factor Analysis and PCA

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.

Variable selection and machine learning methods in causal inference

The Generalized Higher Criticism for Testing SNP-sets in Genetic Association Studies

Empirical likelihood-based methods for the difference of two trimmed means

Probabilistic Machine Learning. Industrial AI Lab.

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,

15-388/688 - Practical Data Science: Basic probability. J. Zico Kolter Carnegie Mellon University Spring 2018

Lecture 15 (Part 2): Logistic Regression & Common Odds Ratio, (With Simulations)

Lecture 4 January 23

Cross-Validation with Confidence

Statistical Distribution Assumptions of General Linear Models

Chapter 7, continued: MANOVA

Multiple regression. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar

Understanding Ding s Apparent Paradox

Econometric Analysis of Cross Section and Panel Data

Transcription:

Previous lecture Single variant association Use genome-wide SNPs to account for confounding (population substructure) Estimation of effect size and winner s curse

Meta-Analysis Today s outline P-value based methods. Fixed effect model. Meta vs. pooled analysis. Random effects model. Meta vs. pooled analysis New random effects analysis

Meta-Analysis Single study is under-powered because the effect of common variant is very modest (OR 1.4) Meta-analysis is an effective way to combine data from multiple independent studies

Meta-Analysis Methods P-value based Regression coefficient based Fixed effects model Random effects model

P-value Based Conduct meta-analysis using p-values Simple and widely used Fisher and Z -score based methods

P-value Based K studies T Fisher = Simple and works well K 2 log(p k ) χ 2 2K k=1 Direction of effect is not considered

P-value Based Stouffer s Z -score (one-sided right-tailed) Z k = Φ(1 p k ) Φ is the standard cumulative normal distribution function Z = K k=1 Z k K Z follows the standard normal distribution.

Fisher versus Z -score These two are not perfectly linear, but they follow a highly linear relationship over the range of Z values most observed (Z 1).

P-value Based Z -score can incorporate direction of effects and weighting scheme βk : effect for study k w k = n k Z k = Φ(1 p k /2) sign(β k ) K k=1 Z = w kz k N(0, 1) K k=1 w 2 k

P-value Based Easy to use Z -score is perhaps more popular as it works well to find a consistent signal from studies Cannot estimate the effect size

Regression Coefficient Based For each study k = 1,..., K g{e(y ik )} = X ik α k + G ik β k Estimate β k and its standard error, ( β k, σ k ) β k N(β k, σ 2 k )

Fixed Effect Model Assume β 1 = = β K Effect size estimation β = K k=1 w k β k K k=1 w k n( β β) N ( 0, n k w 2 k σ2 k ( k w k) 2 w k = 1 gives the minimal variance of β var( β k ) )

Fixed Effect Model Assumes all studies in the analysis have the same effect Each study can be considered as a random sample drawn from the population with true parameter value β. There is between-study heterogeneity Different definitions of phenotypes. Effect size may be higher (or lower) in certain subgroups (e.g., age, sex).

Assess Heterogeneity Cochran s Q test K Q = w k ( β k β) 2 k=1 Q should be large if there is heterogeneity. Under the null hypothesis of no heterogeneity Q χ 2 K 1 Under powered when there are fewer studies.

Assess Heterogeneity Measure the % of total variance explained by the between-study heterogeneity (Higgins and Thompson 2002) I 2 Q (K 1) = 100% Q > 50% indicates large heterogeneity Intuitive interpretation, simple to calculate, and can be accompanied by an uncertainty interval

An Example

Meta- vs. Pooled-Analysis Meta analysis: Combining summary statistics (e.g., β k ) of studies Pooled analysis: Combining original or individual-level of all studies

Relative Efficiency For k = 1,, K studies g{e(y ki )} = α k + β T X ki β is common to all K studies; α k is specific to the kth study The maximum likelihood estimator (MLE) β k for the kth study by maximizing n k L k = sup αk f (Y ki, X ki ; β, α k ) i=1

Relative Efficiency β is MLE of β by maximizing K n k L = sup {α1,,α K } f (Y ki, X ki ; β, α k ) k=1 i=1 K n k = sup αk f (Y ki, X ki ; β, α k ) = k=1 K k=1 L k i=1

Relative Efficiency Information Recall var( β) = 1 I(β) I(β) = 2 K β 2 log L = 2 β 2 log L k = var( β) = K k=1 k=1 1 1 var(β k ) = K I k (β) k=1 1 K k=1 I k(β) Using summary statistics has the same asymptotic efficiency as using original data, if β is the only common parameter across studies.

Relative Efficiency Common nuisance parameters, say γ. ( ) ( ) IK I k = ββ I K βγ Iββ I I = βγ I K β I K γγ { K var( β) = (I kββ I kβγ I 1 k=1 = (I ββ K k=1 kγγ I kγβ) I kβγ I 1 kγγ I kγβ) 1 I γβ I γγ } 1 (I ββ I βγ I 1 γγ I γβ ) 1 = var( β) Equality holds if and only of var(β k ) 1 cov( β k, γ k ) are the same among the K studies.

Random Effects Model Under the fixed effect model, β 1 = = β K = β The random effects model β k = β + ξ k (k = 1,, K ) ξ k N(0, τ 2 )

Random Effects Model Estimation of τ 2 DerSimonian & Laird (1986) method-of-moments estimator V k = var( β k β k ) var( β k ) = V k + τ 2 τ 2 = V 1 K Q (K 1) V 2 K / V 1 K Estimate β with inverse variance estimator ŵ k = var( β k ) SE( β) = β = K k=1 ŵk β k K k=1 ŵk 1 k ŵk

Random Effects Model Fixed effect model is more powerful, but ignores the heterogeneity between studies. Random effects model is probably more robust, but is under powered. The confidence intervals have poor coverage for small and moderate K.

Meta- vs Pooled-Analysis The maximum likelihood estimator β from pooled data can be obtained by maximizing K k=1 ) 1 f k (Y ki X ki, β + ξ k ) ( exp ξ2 k 2πτ 2 2τ 2 dξ k Challenging to establish the theoretical properties of β and β under the random effects model because n k K.

Asymptotic Distributions (Zeng and Lin 2015) Assumptions: For k = 1,, K, n k = π k n for some constant π k within a compact interval (0, ). τ 2 = 1 n σ2, σ 2 is a constant. The between-study variability is comparable to the within-study variability. Asymptotic distribution of MLE β n( β β0 ) d ( K n τ 2 d Ã; n K Vk v k k=1 π k v k + π k A ) K k=1 π 1/2 k v k + π k A Z k Z k are independently distributed N(0, v k + π k σ 2 0 ). A is a complicated form that involves {Z k }. It is a mixture of normal random variables with the mixing probabilities both being random and correlated with the normal random variables.

Asymptotic Distributions Asymptotic distribution of β n τ 2 d  n( β β0 ) d ( K k=1 π k v k + π k  ) 1 K k=1 π 1/2 k Z k v k + π k  As n, K, Kn 1/2 0, var( β) var( β), i.e., the weighted average estimator β is at least as efficient as the MLE When K = 100, 200, 400, the empirical relative efficiency is 1.037, 1.083, and 1.171. Perform statistical inference for β and β based on asymptotic distributions using resampling techniques Or simpler, use the profile likelihood to construct 95% confidence intervals (Hardy and Thompson, 1996)

95% Coverage Probabilities Solid: Zeng & Lin; Dashed: DerSimonian-Laird; dotted Jackson-Bowden resampling method; dot-dash: Hardy-Thompson profile method. Left: K=10 studies; Right: K=20

Testing with random effects model Random effects (RE) model gives less significant p values than the fixed effects model when the variants show varying effect sizes between studies Ironic because RE is designed specifically for the case in which there is heterogeneity All associations identified RE are usually identified by the fixed effect model Causal variants showing high between-study heterogeneity might not be discovered by either method

Revisit the Traditional RE First step: estimate the effect size and CI by taking heterogeneity into account. Second step: normalize β/se( β) and translate it into p-value. Effectively, RE assumes heterogeneity under the null hypothesis, i.e., β k N(0, σ 2 + τ 2 ). There should not be heterogeneity under the null because β 1 = = β K = 0.

New RE New null hypothesis H 0 : β = 0, τ 2 = 0. Model β = ( β 1,, β K ). β k N(0, σ 2 ) β follows a multivariate normal distribution β MVN(β1, Σ) The likelihood ratio test statistic. Σ = V + τ 2 I, V = diag(v 1,, V K ). S new = 2 log L 1 L 0 = L 1 = K k=1 K k=1 L 0 1 2πVk exp ( β 2 k 2V k ) ( 1 2π(Vk + τ 2 ) exp ( β k β) 2 2(V k + τ 2 ) )

New RE Basically we test both fixed and random effects together S new = S β + S τ 2 S β is equal to the Z-statistic based on the fixed effect model, and S τ 2 is the test statistic for testing τ 2 = 0 β is unconstrained, but τ 2 0. Hence, S new 1 2 χ2 1 + 1 2 χ2 0. However, asymptotics cannot be applied because of small K. The asymptotic p-value is conservative because of the tail of asymptotic distribution is thicker than that of the true distribution at the tail Resampling approach should be used to calculate p-values

Power Comparison

Interpretation and Prioritization In the usual meta-analysis where one collects similar studies and expects the common effect, the results found by the fixed effects model should be the top priority. An association showing large heterogeneity requires careful investigation (e.g., different pathways, LD pattern, different study populations). Effect size estimate and CI is the same as those in the current RE, but may be inconsistent between wide CI and statistically significant results.

Comparison between meta- vs pooled-analysis Meta Pooled Logistic Easy Difficult Time-efficient Cheap Bias Time-consuming Costly Publication Possible Unlikely Exposure assessment May not be comparable Comparable Confounders May not be consistent Consistent Efficiency Relatively large data Similar Similar Sparse data May be unstable Better Complex analysis (e.g. machine learning) Difficult Easy Long-term benefit Always requires coordination Better

Meta- vs pooled-analysis for sparse data β is unstable if data are sparse in each study. However, if the interest is only on testing H 0 : β = 0, there are ways to combine the studies with only summary statistics. Recall the score test: [ [ 0)] β β=0 log L(β, α I(β = 0 α 0 ) 0)] 1 β β=0 log L(β, α χ 2 1 I(β = 0 α 0 ) = { I ββ I βα I 1 ααi αβ } β=0, α0 [ ] β log L(β, α 0) β=0 = [ K k=1 β log L β=0 k(β, α 0 )] I ββ β=0, α0 = K β=0, α0. Similar for I βα, I αα, and I αβ k=1 I kββ

Summary P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing.

Recommended Reading DerSimonian R & Laird N (1986). Meta-analysis in clinical trials. Contr Clin Trials 7: 177-88. Han B and Eskin E (2011). Random-effects model aimed at discovering associations in meta-analysis of genome-wide association studies. Am J Hum Genet 88: 586 598. Hardy RJ & Thomson SG (1996) A likelihood approach to meta-analysis with random effects. Stat in Med 30: 619 29. Lin DY and Zeng D (2010). On the relative efficiency of using summary statistics vs. individual level data in meta-analysis. Biometrika 97: 321 32. Zeng D and Lin DY (2015). On random-effects meta-analysis. Biometrika 102: 281 294.