Statistical inference in Mendelian randomization: From genetic association to epidemiological causation

Similar documents
Mendelian randomization: From genetic association to epidemiological causation

Statistical inference in two-sample summary-data Mendelian randomization using robust adjusted profile score

arxiv: v2 [stat.ap] 7 Feb 2018

Instrumental variables & Mendelian randomization

Robust instrumental variable methods using multiple candidate instruments with application to Mendelian randomization

A Comparison of Robust Methods for Mendelian Randomization Using Multiple Genetic Variants

Mendelian randomization (MR)

Two-Sample Instrumental Variable Analyses using Heterogeneous Samples

Harvard University. A Note on the Control Function Approach with an Instrumental Variable and a Binary Outcome. Eric Tchetgen Tchetgen

Confounder Adjustment in Multiple Hypothesis Testing

Extending the MR-Egger method for multivariable Mendelian randomization to correct for both measured and unmeasured pleiotropy

Causal inference in biomedical sciences: causal models involving genotypes. Mendelian randomization genes as Instrumental Variables

Recent Challenges for Mendelian Randomisation Analyses

Combining multiple observational data sources to estimate causal eects

University of Bristol - Explore Bristol Research

Estimating direct effects in cohort and case-control studies

A Sampling of IMPACT Research:

Propensity Score Weighting with Multilevel Data

Causality and Experiments

Cross-Sectional Regression after Factor Analysis: Two Applications

Lecture 14 October 13

Empirical Bayes Moderation of Asymptotically Linear Parameters

Correlation and Simple Linear Regression

Extending the MR-Egger method for multivariable Mendelian randomization to correct for both measured and unmeasured pleiotropy

Causal exposure effect on a time-to-event response using an IV.

Varieties of Count Data

Previous lecture. Single variant association. Use genome-wide SNPs to account for confounding (population substructure)

Inference For High Dimensional M-estimates: Fixed Design Results

Casual Mediation Analysis

Survival Analysis for Case-Cohort Studies

6.3 How the Associational Criterion Fails

Control Function Instrumental Variable Estimation of Nonlinear Causal Effect Models

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

Econometrics with Observational Data. Introduction and Identification Todd Wagner February 1, 2017

Missing covariate data in matched case-control studies: Do the usual paradigms apply?

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall

Inference For High Dimensional M-estimates. Fixed Design Results

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data?

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification. Todd MacKenzie, PhD

Extending causal inferences from a randomized trial to a target population

Primal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing

Extending the results of clinical trials using data from a target population

Empirical Bayes Moderation of Asymptotically Linear Parameters

2. Linear regression with multiple regressors

Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square

Mendelian randomization as an instrumental variable approach to causal inference

Causal Mechanisms Short Course Part II:

Proportional Variance Explained by QLT and Statistical Power. Proportional Variance Explained by QTL and Statistical Power

6. Assessing studies based on multiple regression

Inference for High Dimensional Robust Regression

Specification Errors, Measurement Errors, Confounding

The GENIUS Approach to Robust Mendelian Randomization Inference

On the Choice of Parameterisation and Priors for the Bayesian Analyses of Mendelian Randomisation Studies.

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

Generalised method of moments estimation of structural mean models

Previous lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing.

Marginal Screening and Post-Selection Inference

EMERGING MARKETS - Lecture 2: Methodology refresher

Selecting (In)Valid Instruments for Instrumental Variables Estimation

Lecture 7 Introduction to Statistical Decision Theory

Module 17: Bayesian Statistics for Genetics Lecture 4: Linear regression

Association studies and regression

Estimating the Marginal Odds Ratio in Observational Studies

Propensity Score Methods for Causal Inference

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson

Confidence Intervals for Low-dimensional Parameters with High-dimensional Data

Generalised method of moments estimation of structural mean models

The propensity score with continuous treatments

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall

Median Cross-Validation

Review of Statistics 101

Probabilistic Index Models

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score

Sociology 6Z03 Review I

8. Instrumental variables regression

Lecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017

Statistical Data Analysis

Prentice Hall Stats: Modeling the World 2004 (Bock) Correlated to: National Advanced Placement (AP) Statistics Course Outline (Grades 9-12)

De-mystifying random effects models

Rerandomization to Balance Covariates

TGDR: An Introduction

Contents. Acknowledgments. xix

Causal Discovery by Computer

Causal Inference Basics

Spatial Regression. 3. Review - OLS and 2SLS. Luc Anselin. Copyright 2017 by Luc Anselin, All Rights Reserved

Applied Statistics and Econometrics

IV-estimators of the causal odds ratio for a continuous exposure in prospective and retrospective designs

Selective Inference for Effect Modification

Bayesian Hierarchical Models

DETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics

ROBUST ESTIMATION OF A CORRELATION COEFFICIENT: AN ATTEMPT OF SURVEY

Basics of Experimental Design. Review of Statistics. Basic Study. Experimental Design. When an Experiment is Not Possible. Studying Relations

Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13)

Stat 5101 Lecture Notes

Double Robustness. Bang and Robins (2005) Kang and Schafer (2007)

Causality II: How does causal inference fit into public health and what it is the role of statistics?

multilevel modeling: concepts, applications and interpretations

Transcription:

Statistical inference in Mendelian randomization: From genetic association to epidemiological causation Department of Statistics, The Wharton School, University of Pennsylvania March 1st, 2018 @ UMN Based on joint work with Jingshu Wang, Dylan Small (Penn). Jack Bowden, Gibran Hemani (University of Bristol).

Epidemiology = Why certain people are getting ill? Central problem: Causal inference from observational data. Methodological challenge: Correlation does not imply causation. Conventional solution: Conditioning on confounding variables. Matched case-control studies. Cohort studies. Problem: Biased if there is unmeasured confounder. 1/44

Motivating example Question What is the causal effect of Body Mass Index (BMI) on Systolic Blood Pressure (SBP)? Exposure variable X : BMI. Outcome variable Y : SBP. Confounders C: age, gender, lifestyle (food, smoking,... )... Fundamental problem Where is the end of this list? 2/44

MR: unbiased estimation without enumerating C Mendelian randomization (MR) is an alternative design of observational studies. Instead, MR uses additional instrumental variables Z. Not necessary to enumerate C. Brief history of MR Katan (1986): first proposal. Davey Smith and Ebrahim (2003): gaining attention. Hernán and Robins (2006): An Epidemiologist s dream? Pickrell et al. (2016): Surging interest in human genetics. Publications per year: only 5 in 2003, almost 400 in 2017. 3/44

This talk: MR using summary data Individual-level data (restricted access) Exposure variable X : BMI; Outcome variable Y : SBP; Instrumental variables Z 1,..., Z p : Genetic variants (SNPs). Summary-level data (public) Dataset BMI-FEM BMI-MAL SBP-UKBB Source GIANT (female) GIANT (male) UKBB Sample size 171977 152893 317754 GWAS lm(x Z j ) lm(x Z j ) lm(y Z j ) Coefficient ˆγ Selection j ˆΓj Std. Err. σ Xj σ Yj Step 1 Use BMI-FEM to select significant (p-value 5 10 8 ) and uncorrelated SNPs (p = 25). Step 2 Use BMI-MAL to obtain (ˆγ j, σ Xj ), j = 1,..., p. Step 3 Use SBP-UKBB to obtain (ˆΓ j, σ Yj ), j = 1,..., p. 4/44

Statistical problem Model: Errors-in-variables regression Observe mutually independent random variables: ˆγ j N(γ j, σ 2 Xj ) and ˆΓ j N(Γ j, σ 2 Yj ). For some real number β 0, Γ j β 0 γ j for almost all j. β 0 can be regarded as the causal effect of X on Y. 5/44

Example SNP effect on SBP 0.05 0.00 0.05 0.000 0.025 0.050 0.075 SNP effect on BMI Method Simple Overdispersion + Robust loss Figure: Scatter plot of ˆΓ j versus ˆγ j in the BMI-SBP example (p = 25). The goal is to fit a straight line across the origin. Observation: effects are tiny but many are significant. 6/44

Reasonable model? Key assumptions 1 Measurement error is normal. 2 (ˆγ 1,..., ˆγ p ) (ˆΓ 1,..., ˆΓ p ). 3 ˆγ j ˆγ k and ˆΓ j ˆΓ k if j k. 4 (Implicitly) γ 0. Otherwise no information for β 0. 5 ˆγ j is unbiased: E[ˆγ j ] = γ j. Pre-processing guarantees these assumptions 1 Large sample size CLT. 2 Two-sample MR: ˆγ and ˆΓ are computed from independent samples (GIANT males and UKBB). 3 Selecting uncorrelated and significant SNPs using another independent dataset (GIANT females). 7/44

MR estimates causal effect Two remaining questions 1 Why should the model Γ j β 0 γ j be linear? 2 Why is β 0 causal? Answer: MR is using SNPs as instrumental variables (IV). Heuristic Consider the following linear model: p X = γ j Z j + η X C + E X, j=1 Y = β 0 X + η Y C + E Y = p j=1 (β 0 γ j }{{} Γ j )Z j +... Key assumption: Z (C, E X, E Y ), i.e. Z are valid instruments. 8/44

Background: What is an IV? 2 C 1 Z X Y 3 Figure: Causal effect of X and Y is confounded by C. Z is an instrumental variable. Core IV assumptions 1 Z is associated with the exposure (X ). 2 Z is independent of the unmeasured confounder (C). 3 Z cannot have any direct effect on the outcome (Y ). 9/44

MR = using genetic variants as IVs 2 C 1 Z X Y 3 Examine the core IV assumptions for MR 1 Criterion 1 is not a problem. 2 Criterion 2 almost comes free due to Mendel s Second Law of random assortment. A minor concern is population stratification 3 Criterion 3 is not clear. There is a wide-spread phenomenon in genetics called pleiotropy, a.k.a. multi-function of genes. 10/44

Linear model (for summary data) is very robust Surprisingly, Γ j β 0 γ j is true if the individual-level model for (X, Y, Z, C) are non-linear. Heuristic Each SNP corresponds to a very small randomized trial. Maybe the relationship between Γ and γ is nonlinear, but it is locally linear near γ = 0. Can formalize this argument (see the paper) and show: β 0 lim x 0 E[Y (X + x) Y (X )]. x Y (x) is the counterfactual or potential outcome. The approximate linear model is still true even if the ˆΓ j are computed from glm(y Z j ). 11/44

Recap Two-sample summary-data MR (ˆγ j, ˆΓ j, σxj 2, σ2 Yj )p j=1 Genetic associations Inference = β 0 Epidemiological causation Statistical model 1 Observe mutually independent random variables: ˆγ j N(γ j, σ 2 Xj ) and ˆΓ j N(Γ j, σ 2 Yj ). 2 For some real number β 0, Γ j β 0 γ j for almost all j. Three concrete models for α j = Γ j β 0 γ j 1 No direct effect: α j 0. 2 Random effects model: α j N(0, τ0 2 ). 3 Contamination model: α j (1 ɛ)n(0, τ0 2 ) + ɛg. 12/44

Outline 1 2 : No direct effect 3 : Random direct effects 4 : Random direct effects with outliers 5 More examples 6 13/44

Model (No direct effect) The linear model Γ j = β 0 γ j is true for every j = 1,..., p. Log-likelihood of the data (up to additive constant): l(β, γ 1,..., γ p ) = 1 [ p (ˆγ j γ j ) 2 + 2 Profile likelihood j=1 l(β) = max γ l(β, γ) = 1 2 σ 2 Xj p j=1 Profile score (PS): ψ(β) = (d/dβ)l(β). p (ˆΓ j γ j β) 2 ]. j=1 (ˆΓ j βˆγ j ) 2 σxj 2 β2 + σyj 2. Estimator: ˆβPS = arg max l(β) solves ψ(β) = 0. β σ 2 Yj 15/44

Some theory Assumption σxj 2 = O(1/n) = σ2 Yj. γ 2 is bounded (even if p ). Heuristic p Linear structural model: X = γ j Z j + η X C + E X. j=1 γ 2 is like the variance of X explained by Z. Theorem (Consistency) Under, ˆβ PS p β0 if p/n 2 0. 16/44

Some theory (continued) Theorem Suppose n, and further p is fixed or p but γ 3 / γ 2 0, then V 2 V1 ( ˆβ PS β 0 ) d N(0, 1), where V 1 = p j=1 γ 2 j σ2 Yj + Γ2 j σ2 Xj + σ2 Xj σ2 Yj (σ 2 Xj β2 0 + σ2 Yj )2, V 2 = p j=1 γ 2 j σ2 Yj + Γ2 j σ2 Xj (σ 2 Xj β2 0 + σ2 Yj )2. V 1 and V 2 can be consistently estimated from data. Necessity IV selection: adding Z p+1 hurts if γ p+1 = 0. But adding even very weak SNPs usually reduces variance. 17/44

Diagnostic plots Residual Q-Q plot Under model 1, standardized residuals follow N(0, 1) if ˆβ = β 0 : ˆt j = Influence of a single SNP Asymptotic expansion: ˆΓ j ˆβˆγ j. ˆβ 2 σxj 2 + σ2 Yj ˆβ PS = 1 + o p(1) V 2 p j=1 (ˆΓ j β 0ˆγ j )(ˆΓ j σ 2 Xj β 0 + ˆγ j σ 2 Yj ) (σ 2 Xj β2 0 + σ2 Yj )2. In practice, we can also use leave-one-out. 18/44

Example (continued) Same three datasets, selected 160 uncorrelated SNPs that have p-values 10 4 in BMI-FEM. Diagnostic plots: 10 rs11191593 Standardized residual 5 0 Leave one out estimate 0.64 0.60 0.56 rs11191593 5 3 2 1 0 1 2 3 Quantile of standard normal 2 5 10 20 50 100 200 Instrument strength (F statistic) Clear overdispersion! 19/44

The most likely explanation of the lack of fit is pleiotropy (direct effects). Heuristic (Direct effect of Z j is α j ) p X = γ j Z j + η X C + E X, j=1 Y = β 0 X + p α j Z j + η Y C + E Y = j=1 p j=1 (β 0 γ j + α j }{{} Γ j )Z j +... Model (Random direct effects) Assume α j = Γ j β 0 γ j i.i.d. N(0, τ 2 0 ) for all j. τ 2 0 should be small (scales like 1/p). 21/44

Connection to population genetics We assumed the normal random effects model because of The observed overdiserpsion. Statistical convenience. This is consistent (or not inconsistent) with genetics: Fisher s (1918) infitestimal model. New perspective on pleiotropy: 1 1 Boyle, E. et al. (2017). An expanded view of complex traits: from polygenic to omnigenic. Cell 169, p1177 1186 22/44

Back to statistics: Failure of profile likelihood The profile likelihood of is given by l(β, τ 2 ) = 1 2 Easy to verify p j=1 (ˆΓ j βˆγ j ) 2 σ 2 Xj β2 + σ 2 Yj + τ 2 + log(σ2 Yj + τ 2 ), [ ] E β l(β 0, τ0 2 ) = 0. But the other score function is biased: τ 2 l(β, τ 2 ) = 1 2 p j=1 (ˆΓ j βˆγ j ) 2 (σ 2 Xj β2 + σ 2 Yj + τ 2 ) 2 1 σ 2 Yj + τ 2. This is not too surprising as we are profiling out p nuisance parameters γ 1,, γ p (the Neyman-Scott problem). 23/44

Adjusted profile score There exist many ways to modify the profile likelihood. We will take the approach of McCullagh and Tibshirani (1990). Adjusted profile score (APS) ψ 1 (β, τ 2 ) = β l(β, τ 2 ), p [ ψ 2 (β, τ 2 ) = j=1 σ 2 Xj (ˆΓ j βˆγ j ) 2 (σ 2 Xj β2 + σ 2 Yj + τ 2 ) 2 1 σ 2 Xj β2 + σ 2 Yj +τ 2 ]. Trivial root: β ± or τ 2. Let ˆβ APS be the non-trivial finite solution. 24/44

Some theory Theorem Assume that is true, (β 0, pτ 2 0 ) is in a bounded set B, p and p/n 2 0. Then 1 With probability going to 1 there exists a solution in B. p 2 All solutions in B are consistent: ˆβ APE β0 and pˆτ APS 2 pτ0 2 p 0. Can obtain similar asymptotic normality assuming p/n λ (0, ) (see the paper). 25/44

Example (continued) Same 160 SNPs. Standardized residual 5.0 2.5 0.0 rs11191593 Leave one out estimate 0.36 0.32 0.28 rs11191593 2.5 3 2 1 0 1 2 3 Quantile of standard normal 0.24 2 5 10 20 50 100 200 Instrument strength (F statistic) A clear outlier: rs11191593. Slightly underdispersed. 26/44

Model (Random direct effects with outliers) α j (1 ɛ)n(0, τ 2 0 ) + ɛg for some unknown G. Recall the expansion: ˆβ PS = 1 + o p(1) V 2 p j=1 (ˆΓ j β 0ˆγ j )(ˆΓ j σ 2 Xj β 0 + ˆγ j σ 2 Yj ) (σ 2 Xj β2 0 + σ2 Yj )2. Problem: a single SNP can have unbounded influence (same for APS). Our solution: robustify the adjusted profile score. 28/44

A method robust to outliers Standardized residual: t j (β, τ 2 ) = ˆΓ j βˆγ j σ 2 Xj β2 + σ 2 Yj + τ 2. Robust adjusted profile score (RAPS) ψ (ρ) 1 (β, τ 2 ) = ψ (ρ) 2 (β, τ 2 ) = p ρ (t j ) β t j, j=1 p j=1 σ 2 Xj t j ρ (t j ) δ σxj 2 β2 + σyj 2 + τ 2. ρ is robust loss, δ = E[T ρ (T )] for T N(0, 1). Reduces to solution 2 (APS) when ρ(t) = t 2 /2. 29/44

Some theory General theory is very difficult because we are solving nonlinear equations. Theorem (Local identifiability) In, E[ψ (ρ) (β 0, τ 2 0 )] = 0 and E[ ψ (ρ) ] has full rank. Asymptotic normality can be established assuming consistency and additional technical conditions (see the paper). 30/44

Example (continued) Same 160 SNPs. Huber s loss function. Standardized residual rs11191593 6 3 0 Leave one out estimate 0.41 0.39 0.37 0.35 rs11191593 3 3 2 1 0 1 2 3 Quantile of standard normal 2 5 10 20 50 100 200 Instrument strength (F statistic) Influence of the outlier is limited. 31/44

Example (continued) Same 160 SNPs. Tukey s biweight function. Standardized residual 6 3 0 rs11191593 Leave one out estimate 0.42 0.40 0.38 rs11191593 3 0.36 3 2 1 0 1 2 3 Quantile of standard normal 2 5 10 20 50 100 200 Instrument strength (F statistic) Influence of the outlier is essentially 0. 32/44

Comparison of all the methods Method Estimate Standard error Profile score (PS) 0.61 0.05 Adjusted PS (APS) 0.30 0.16 Robust APS (RAPS, Huber) 0.38 0.12 Robust APS (RAPS, Tukey) 0.40 0.11 Inverse variance weighted (IVW) 0.47 0.18 Weighted median 0.33 0.11 MR Egger 0.51 0.10 Existing meta-analysis methods for MR 1 IVW: weighted average of ˆΓ j /ˆγ j. 2 Weighted median of ˆΓ j /ˆγ j. 3 MR Egger: weighted least squares of ˆΓ j against ˆγ j. They all ignore measurement error in ˆγ j and weak IV bias. 33/44

Simulation setting Create simulated datasets to mimic the BMI-SBP example with known β 0. ind. ˆγ j N(γ j, σxj 2 ) where (γ j, σxj 2 ) come from the real data. Γ j are generated in 6 different ways (β 0 = 0.5): : Γ j = γ j β 0 (i.e. α j 0); : α j i.i.d. N(0, τ 2 0 ), where τ 0 = 2 (1/p) p σ Yj ; j=1 : Same as 2 but add 5 τ 0 to α 1. i.i.d. Heavy-tailed α: α j τ 0 Lap(1). Heteroskedastic α: Var(α j ) γj 2 and normal. Many outliers: Add 5 τ 0 to 10% randomly selected α j. 35/44

Simulation results 1 (p = 25) 1 2 3 4 5 6 IVW Egger W. Median method PS APS RAPS 0.5 0.0 0.5 1.0 1.5 0.5 0.0 0.5 1.0 1.5 0.5 0.0 0.5 1.0 1.5 0.5 0.0 0.5 1.0 1.5 0.5 0.0 0.5 1.0 1.5 0.5 0.0 0.5 1.0 1.5 β^ Black: boxplot of point estimate over 1000 realizations. Red: quartile using median standard error. 36/44

Simulation results 1 (p = 160) 1 2 3 4 5 6 IVW Egger W. Median method PS APS RAPS 0.5 0.0 0.5 1.0 1.5 0.5 0.0 0.5 1.0 1.5 0.5 0.0 0.5 1.0 1.5 0.5 0.0 0.5 1.0 1.5 0.5 0.0 0.5 1.0 1.5 0.5 0.0 0.5 1.0 1.5 β^ Black: boxplot of point estimate over 1000 realizations. Red: quartile using median standard error. 37/44

Another real data example: BMI-BMI BMI-GIANT: full dataset from the GIANT consortium (i.e. combining BMI-FEM and BMI-MAL), used to select SNPs. BMI-UKBB-1: half of the UKBB data, used as the exposure. BMI-UKBB-2: another half of UKBB data, used as the outcome. Because exposure and outcome, γ j Γ j. So true β 0 = 1 and there is no direct effect. 38/44

BMI-BMI results (GIANT, UKBB-1, UKBB-2) p sel # SNPs Mean F IVW W. Median W. Mode 1e-9 48 78.6 0.983 (0.026) 0.945 (0.039) 0.941 (0.042) 1e-8 58 69.2 0.983 (0.024) 0.945 (0.039) 0.939 (0.044) 1e-7 84 55.0 0.988 (0.024) 0.945 (0.036) 0.933 (0.041) 1e-6 126 44.1 0.986 (0.022) 0.944 (0.034) 0.931 (0.038) 1e-5 186 34.3 0.986 (0.019) 0.943 (0.033) 0.928 (0.039) 1e-4 287 26.1 0.981 (0.017) 0.941 (0.031) 0.929 (0.035) 1e-3 474 18.8 0.955 (0.015) 0.903 (0.027) 0.917 (0.231) 1e-2 812 12.7 0.928 (0.014) 0.879 (0.023) 0.739 (7.130) p sel # SNPs Median F Egger PS RAPS 1e-9 48 51.8 0.926 (0.055) 0.999 (0.023) 0.998 (0.026) 1e-8 58 42.0 0.928 (0.050) 0.999 (0.023) 0.998 (0.025) 1e-7 84 32.1 0.905 (0.048) 1.012 (0.021) 1.004 (0.025) 1e-6 126 27.4 0.881 (0.043) 1.017 (0.019) 1.009 (0.023) 1e-5 186 21.0 0.874 (0.036) 1.020 (0.018) 1.013 (0.020) 1e-4 287 15.8 0.921 (0.031) 1.023 (0.017) 1.018 (0.018) 1e-3 474 10.8 0.913 (0.027) 1.010 (0.016) 1.006 (0.016) 1e-2 812 5.6 0.909 (0.022) 1.010 (0.015) 1.005 (0.015) 39/44

Selection bias (UKBB-1, UKBB-1, UKBB-2) p sel # SNPs Mean F IVW W. Median W. Mode 1e-9 110 68.63 0.851 (0.02) 0.83 (0.025) 0.896 (0.046) 1e-8 168 57.00 0.823 (0.017) 0.8 (0.022) 0.885 (0.053) 1e-7 228 50.08 0.799 (0.016) 0.768 (0.019) 0.886 (0.058) 1e-6 305 43.92 0.761 (0.015) 0.736 (0.019) 0.865 (0.079) 1e-5 443 36.98 0.721 (0.013) 0.667 (0.016) 0.824 (0.12) 1e-4 652 30.68 0.678 (0.012) 0.616 (0.015) 0.593 (0.122) 1e-3 929 25.36 0.629 (0.011) 0.57 (0.014) 0.576 (0.096) 1e-2 1289 20.70 0.592 (0.01) 0.528 (0.013) 0.554 (0.093) p sel # SNPs Median F Egger PS RAPS 1e-9 110 49.20 1.071 (0.051) 0.871 (0.015) 0.862 (0.021) 1e-8 168 41.12 1.018 (0.046) 0.848 (0.014) 0.831 (0.018) 1e-7 228 37.12 1.016 (0.043) 0.824 (0.012) 0.803 (0.016) 1e-6 305 33.68 1.006 (0.041) 0.793 (0.011) 0.763 (0.016) 1e-5 443 28.74 0.957 (0.037) 0.762 (0.01) 0.716 (0.015) 1e-4 652 23.23 0.89 (0.033) 0.724 (0.009) 0.66 (0.014) 1e-3 929 19.12 0.823 (0.03) 0.687 (0.008) 0.594 (0.013) 1e-2 1289 15.26 0.749 (0.025) 0.657 (0.008) 0.541 (0.012) 40/44

This talk Paper: Statistical inference in two-sample Mendelian randomization using robust adjusted profile score. arxiv: 1801.09652. Software: R package mr.raps is currently on CRAN. Can be directly called from the TwoSampleMR platform (https://github.com/mrcieu/twosamplemr). 42/44

Additional references I Overview of MR (Epidemiology): Davey Smith, G. and Ebrahim, S. (2003). Mendelian randomization : can genetic epidemiology contribute to understanding environmental determinants of disease? International Journal of Epidemiology. Davey Smith, G. and Hemani, G. (2014). Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Human Molecular Genetics. Overview of MR (more Stats): Hernán, M. and Robins, J. (2006). Instruments for causal inference: an epidemiologist s dream?. Epidemiology. Didelez, V. and Sheehan N. (2007). Mendelian randomization as an instrumental variable approach to causal inference. Statistical Methods in Medical Research. Human genetics: Gamazon, E. et al. (2015). A gene-based association method for mapping traits using reference transcriptome data. Nature Genetics. Pickrell, J. et al. (2016) Detection and interpretation of shared genetic influences on 42 human traits. Nature Genetics. 43/44

Additional references II Robust MR methods (individual-level data): Kang, H. et al. (2016). Instrumental Variables Estimation with Some Invalid Instruments and its Application to Mendelian Randomization. JASA. Tchetgen Tchetgen, E. et al. (2017). The GENIUS approach to robust Mendelian randomization inference. arxiv:1709.07779. Robust MR methods (summary-level data): Bowden, J. et al. (2015). Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. International Journal of Epidemiology. Bowden, J. et al. (2016). Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genetic Epidemiology. Pleiotropy: Solovieff, N. et al. (2013). Pleiotropy in complex traits: challenges and strategies. Nature Reviews Genetics. Boyle, E. et al. (2017). An expanded view of complex traits: from polygenic to omnigenic. Cell. 44/44