Statistical inference in Mendelian randomization: From genetic association to epidemiological causation
|
|
- Lucas Cannon
- 5 years ago
- Views:
Transcription
1 Statistical inference in Mendelian randomization: From genetic association to epidemiological causation Department of Statistics, The Wharton School, University of Pennsylvania March 1st, UMN Based on joint work with Jingshu Wang, Dylan Small (Penn). Jack Bowden, Gibran Hemani (University of Bristol).
2 Epidemiology = Why certain people are getting ill? Central problem: Causal inference from observational data. Methodological challenge: Correlation does not imply causation. Conventional solution: Conditioning on confounding variables. Matched case-control studies. Cohort studies. Problem: Biased if there is unmeasured confounder. 1/44
3 Motivating example Question What is the causal effect of Body Mass Index (BMI) on Systolic Blood Pressure (SBP)? Exposure variable X : BMI. Outcome variable Y : SBP. Confounders C: age, gender, lifestyle (food, smoking,... )... Fundamental problem Where is the end of this list? 2/44
4 MR: unbiased estimation without enumerating C Mendelian randomization (MR) is an alternative design of observational studies. Instead, MR uses additional instrumental variables Z. Not necessary to enumerate C. Brief history of MR Katan (1986): first proposal. Davey Smith and Ebrahim (2003): gaining attention. Hernán and Robins (2006): An Epidemiologist s dream? Pickrell et al. (2016): Surging interest in human genetics. Publications per year: only 5 in 2003, almost 400 in /44
5 This talk: MR using summary data Individual-level data (restricted access) Exposure variable X : BMI; Outcome variable Y : SBP; Instrumental variables Z 1,..., Z p : Genetic variants (SNPs). Summary-level data (public) Dataset BMI-FEM BMI-MAL SBP-UKBB Source GIANT (female) GIANT (male) UKBB Sample size GWAS lm(x Z j ) lm(x Z j ) lm(y Z j ) Coefficient ˆγ Selection j ˆΓj Std. Err. σ Xj σ Yj Step 1 Use BMI-FEM to select significant (p-value ) and uncorrelated SNPs (p = 25). Step 2 Use BMI-MAL to obtain (ˆγ j, σ Xj ), j = 1,..., p. Step 3 Use SBP-UKBB to obtain (ˆΓ j, σ Yj ), j = 1,..., p. 4/44
6 Statistical problem Model: Errors-in-variables regression Observe mutually independent random variables: ˆγ j N(γ j, σ 2 Xj ) and ˆΓ j N(Γ j, σ 2 Yj ). For some real number β 0, Γ j β 0 γ j for almost all j. β 0 can be regarded as the causal effect of X on Y. 5/44
7 Example SNP effect on SBP SNP effect on BMI Method Simple Overdispersion + Robust loss Figure: Scatter plot of ˆΓ j versus ˆγ j in the BMI-SBP example (p = 25). The goal is to fit a straight line across the origin. Observation: effects are tiny but many are significant. 6/44
8 Reasonable model? Key assumptions 1 Measurement error is normal. 2 (ˆγ 1,..., ˆγ p ) (ˆΓ 1,..., ˆΓ p ). 3 ˆγ j ˆγ k and ˆΓ j ˆΓ k if j k. 4 (Implicitly) γ 0. Otherwise no information for β 0. 5 ˆγ j is unbiased: E[ˆγ j ] = γ j. Pre-processing guarantees these assumptions 1 Large sample size CLT. 2 Two-sample MR: ˆγ and ˆΓ are computed from independent samples (GIANT males and UKBB). 3 Selecting uncorrelated and significant SNPs using another independent dataset (GIANT females). 7/44
9 MR estimates causal effect Two remaining questions 1 Why should the model Γ j β 0 γ j be linear? 2 Why is β 0 causal? Answer: MR is using SNPs as instrumental variables (IV). Heuristic Consider the following linear model: p X = γ j Z j + η X C + E X, j=1 Y = β 0 X + η Y C + E Y = p j=1 (β 0 γ j }{{} Γ j )Z j +... Key assumption: Z (C, E X, E Y ), i.e. Z are valid instruments. 8/44
10 Background: What is an IV? 2 C 1 Z X Y 3 Figure: Causal effect of X and Y is confounded by C. Z is an instrumental variable. Core IV assumptions 1 Z is associated with the exposure (X ). 2 Z is independent of the unmeasured confounder (C). 3 Z cannot have any direct effect on the outcome (Y ). 9/44
11 MR = using genetic variants as IVs 2 C 1 Z X Y 3 Examine the core IV assumptions for MR 1 Criterion 1 is not a problem. 2 Criterion 2 almost comes free due to Mendel s Second Law of random assortment. A minor concern is population stratification 3 Criterion 3 is not clear. There is a wide-spread phenomenon in genetics called pleiotropy, a.k.a. multi-function of genes. 10/44
12 Linear model (for summary data) is very robust Surprisingly, Γ j β 0 γ j is true if the individual-level model for (X, Y, Z, C) are non-linear. Heuristic Each SNP corresponds to a very small randomized trial. Maybe the relationship between Γ and γ is nonlinear, but it is locally linear near γ = 0. Can formalize this argument (see the paper) and show: β 0 lim x 0 E[Y (X + x) Y (X )]. x Y (x) is the counterfactual or potential outcome. The approximate linear model is still true even if the ˆΓ j are computed from glm(y Z j ). 11/44
13 Recap Two-sample summary-data MR (ˆγ j, ˆΓ j, σxj 2, σ2 Yj )p j=1 Genetic associations Inference = β 0 Epidemiological causation Statistical model 1 Observe mutually independent random variables: ˆγ j N(γ j, σ 2 Xj ) and ˆΓ j N(Γ j, σ 2 Yj ). 2 For some real number β 0, Γ j β 0 γ j for almost all j. Three concrete models for α j = Γ j β 0 γ j 1 No direct effect: α j 0. 2 Random effects model: α j N(0, τ0 2 ). 3 Contamination model: α j (1 ɛ)n(0, τ0 2 ) + ɛg. 12/44
14 Outline 1 2 : No direct effect 3 : Random direct effects 4 : Random direct effects with outliers 5 More examples 6 13/44
15 Model (No direct effect) The linear model Γ j = β 0 γ j is true for every j = 1,..., p. Log-likelihood of the data (up to additive constant): l(β, γ 1,..., γ p ) = 1 [ p (ˆγ j γ j ) Profile likelihood j=1 l(β) = max γ l(β, γ) = 1 2 σ 2 Xj p j=1 Profile score (PS): ψ(β) = (d/dβ)l(β). p (ˆΓ j γ j β) 2 ]. j=1 (ˆΓ j βˆγ j ) 2 σxj 2 β2 + σyj 2. Estimator: ˆβPS = arg max l(β) solves ψ(β) = 0. β σ 2 Yj 15/44
16 Some theory Assumption σxj 2 = O(1/n) = σ2 Yj. γ 2 is bounded (even if p ). Heuristic p Linear structural model: X = γ j Z j + η X C + E X. j=1 γ 2 is like the variance of X explained by Z. Theorem (Consistency) Under, ˆβ PS p β0 if p/n /44
17 Some theory (continued) Theorem Suppose n, and further p is fixed or p but γ 3 / γ 2 0, then V 2 V1 ( ˆβ PS β 0 ) d N(0, 1), where V 1 = p j=1 γ 2 j σ2 Yj + Γ2 j σ2 Xj + σ2 Xj σ2 Yj (σ 2 Xj β2 0 + σ2 Yj )2, V 2 = p j=1 γ 2 j σ2 Yj + Γ2 j σ2 Xj (σ 2 Xj β2 0 + σ2 Yj )2. V 1 and V 2 can be consistently estimated from data. Necessity IV selection: adding Z p+1 hurts if γ p+1 = 0. But adding even very weak SNPs usually reduces variance. 17/44
18 Diagnostic plots Residual Q-Q plot Under model 1, standardized residuals follow N(0, 1) if ˆβ = β 0 : ˆt j = Influence of a single SNP Asymptotic expansion: ˆΓ j ˆβˆγ j. ˆβ 2 σxj 2 + σ2 Yj ˆβ PS = 1 + o p(1) V 2 p j=1 (ˆΓ j β 0ˆγ j )(ˆΓ j σ 2 Xj β 0 + ˆγ j σ 2 Yj ) (σ 2 Xj β2 0 + σ2 Yj )2. In practice, we can also use leave-one-out. 18/44
19 Example (continued) Same three datasets, selected 160 uncorrelated SNPs that have p-values 10 4 in BMI-FEM. Diagnostic plots: 10 rs Standardized residual 5 0 Leave one out estimate rs Quantile of standard normal Instrument strength (F statistic) Clear overdispersion! 19/44
20 The most likely explanation of the lack of fit is pleiotropy (direct effects). Heuristic (Direct effect of Z j is α j ) p X = γ j Z j + η X C + E X, j=1 Y = β 0 X + p α j Z j + η Y C + E Y = j=1 p j=1 (β 0 γ j + α j }{{} Γ j )Z j +... Model (Random direct effects) Assume α j = Γ j β 0 γ j i.i.d. N(0, τ 2 0 ) for all j. τ 2 0 should be small (scales like 1/p). 21/44
21 Connection to population genetics We assumed the normal random effects model because of The observed overdiserpsion. Statistical convenience. This is consistent (or not inconsistent) with genetics: Fisher s (1918) infitestimal model. New perspective on pleiotropy: 1 1 Boyle, E. et al. (2017). An expanded view of complex traits: from polygenic to omnigenic. Cell 169, p /44
22 Back to statistics: Failure of profile likelihood The profile likelihood of is given by l(β, τ 2 ) = 1 2 Easy to verify p j=1 (ˆΓ j βˆγ j ) 2 σ 2 Xj β2 + σ 2 Yj + τ 2 + log(σ2 Yj + τ 2 ), [ ] E β l(β 0, τ0 2 ) = 0. But the other score function is biased: τ 2 l(β, τ 2 ) = 1 2 p j=1 (ˆΓ j βˆγ j ) 2 (σ 2 Xj β2 + σ 2 Yj + τ 2 ) 2 1 σ 2 Yj + τ 2. This is not too surprising as we are profiling out p nuisance parameters γ 1,, γ p (the Neyman-Scott problem). 23/44
23 Adjusted profile score There exist many ways to modify the profile likelihood. We will take the approach of McCullagh and Tibshirani (1990). Adjusted profile score (APS) ψ 1 (β, τ 2 ) = β l(β, τ 2 ), p [ ψ 2 (β, τ 2 ) = j=1 σ 2 Xj (ˆΓ j βˆγ j ) 2 (σ 2 Xj β2 + σ 2 Yj + τ 2 ) 2 1 σ 2 Xj β2 + σ 2 Yj +τ 2 ]. Trivial root: β ± or τ 2. Let ˆβ APS be the non-trivial finite solution. 24/44
24 Some theory Theorem Assume that is true, (β 0, pτ 2 0 ) is in a bounded set B, p and p/n 2 0. Then 1 With probability going to 1 there exists a solution in B. p 2 All solutions in B are consistent: ˆβ APE β0 and pˆτ APS 2 pτ0 2 p 0. Can obtain similar asymptotic normality assuming p/n λ (0, ) (see the paper). 25/44
25 Example (continued) Same 160 SNPs. Standardized residual rs Leave one out estimate rs Quantile of standard normal Instrument strength (F statistic) A clear outlier: rs Slightly underdispersed. 26/44
26 Model (Random direct effects with outliers) α j (1 ɛ)n(0, τ 2 0 ) + ɛg for some unknown G. Recall the expansion: ˆβ PS = 1 + o p(1) V 2 p j=1 (ˆΓ j β 0ˆγ j )(ˆΓ j σ 2 Xj β 0 + ˆγ j σ 2 Yj ) (σ 2 Xj β2 0 + σ2 Yj )2. Problem: a single SNP can have unbounded influence (same for APS). Our solution: robustify the adjusted profile score. 28/44
27 A method robust to outliers Standardized residual: t j (β, τ 2 ) = ˆΓ j βˆγ j σ 2 Xj β2 + σ 2 Yj + τ 2. Robust adjusted profile score (RAPS) ψ (ρ) 1 (β, τ 2 ) = ψ (ρ) 2 (β, τ 2 ) = p ρ (t j ) β t j, j=1 p j=1 σ 2 Xj t j ρ (t j ) δ σxj 2 β2 + σyj 2 + τ 2. ρ is robust loss, δ = E[T ρ (T )] for T N(0, 1). Reduces to solution 2 (APS) when ρ(t) = t 2 /2. 29/44
28 Some theory General theory is very difficult because we are solving nonlinear equations. Theorem (Local identifiability) In, E[ψ (ρ) (β 0, τ 2 0 )] = 0 and E[ ψ (ρ) ] has full rank. Asymptotic normality can be established assuming consistency and additional technical conditions (see the paper). 30/44
29 Example (continued) Same 160 SNPs. Huber s loss function. Standardized residual rs Leave one out estimate rs Quantile of standard normal Instrument strength (F statistic) Influence of the outlier is limited. 31/44
30 Example (continued) Same 160 SNPs. Tukey s biweight function. Standardized residual rs Leave one out estimate rs Quantile of standard normal Instrument strength (F statistic) Influence of the outlier is essentially 0. 32/44
31 Comparison of all the methods Method Estimate Standard error Profile score (PS) Adjusted PS (APS) Robust APS (RAPS, Huber) Robust APS (RAPS, Tukey) Inverse variance weighted (IVW) Weighted median MR Egger Existing meta-analysis methods for MR 1 IVW: weighted average of ˆΓ j /ˆγ j. 2 Weighted median of ˆΓ j /ˆγ j. 3 MR Egger: weighted least squares of ˆΓ j against ˆγ j. They all ignore measurement error in ˆγ j and weak IV bias. 33/44
32 Simulation setting Create simulated datasets to mimic the BMI-SBP example with known β 0. ind. ˆγ j N(γ j, σxj 2 ) where (γ j, σxj 2 ) come from the real data. Γ j are generated in 6 different ways (β 0 = 0.5): : Γ j = γ j β 0 (i.e. α j 0); : α j i.i.d. N(0, τ 2 0 ), where τ 0 = 2 (1/p) p σ Yj ; j=1 : Same as 2 but add 5 τ 0 to α 1. i.i.d. Heavy-tailed α: α j τ 0 Lap(1). Heteroskedastic α: Var(α j ) γj 2 and normal. Many outliers: Add 5 τ 0 to 10% randomly selected α j. 35/44
33 Simulation results 1 (p = 25) IVW Egger W. Median method PS APS RAPS β^ Black: boxplot of point estimate over 1000 realizations. Red: quartile using median standard error. 36/44
34 Simulation results 1 (p = 160) IVW Egger W. Median method PS APS RAPS β^ Black: boxplot of point estimate over 1000 realizations. Red: quartile using median standard error. 37/44
35 Another real data example: BMI-BMI BMI-GIANT: full dataset from the GIANT consortium (i.e. combining BMI-FEM and BMI-MAL), used to select SNPs. BMI-UKBB-1: half of the UKBB data, used as the exposure. BMI-UKBB-2: another half of UKBB data, used as the outcome. Because exposure and outcome, γ j Γ j. So true β 0 = 1 and there is no direct effect. 38/44
36 BMI-BMI results (GIANT, UKBB-1, UKBB-2) p sel # SNPs Mean F IVW W. Median W. Mode 1e (0.026) (0.039) (0.042) 1e (0.024) (0.039) (0.044) 1e (0.024) (0.036) (0.041) 1e (0.022) (0.034) (0.038) 1e (0.019) (0.033) (0.039) 1e (0.017) (0.031) (0.035) 1e (0.015) (0.027) (0.231) 1e (0.014) (0.023) (7.130) p sel # SNPs Median F Egger PS RAPS 1e (0.055) (0.023) (0.026) 1e (0.050) (0.023) (0.025) 1e (0.048) (0.021) (0.025) 1e (0.043) (0.019) (0.023) 1e (0.036) (0.018) (0.020) 1e (0.031) (0.017) (0.018) 1e (0.027) (0.016) (0.016) 1e (0.022) (0.015) (0.015) 39/44
37 Selection bias (UKBB-1, UKBB-1, UKBB-2) p sel # SNPs Mean F IVW W. Median W. Mode 1e (0.02) 0.83 (0.025) (0.046) 1e (0.017) 0.8 (0.022) (0.053) 1e (0.016) (0.019) (0.058) 1e (0.015) (0.019) (0.079) 1e (0.013) (0.016) (0.12) 1e (0.012) (0.015) (0.122) 1e (0.011) 0.57 (0.014) (0.096) 1e (0.01) (0.013) (0.093) p sel # SNPs Median F Egger PS RAPS 1e (0.051) (0.015) (0.021) 1e (0.046) (0.014) (0.018) 1e (0.043) (0.012) (0.016) 1e (0.041) (0.011) (0.016) 1e (0.037) (0.01) (0.015) 1e (0.033) (0.009) 0.66 (0.014) 1e (0.03) (0.008) (0.013) 1e (0.025) (0.008) (0.012) 40/44
38 This talk Paper: Statistical inference in two-sample Mendelian randomization using robust adjusted profile score. arxiv: Software: R package mr.raps is currently on CRAN. Can be directly called from the TwoSampleMR platform ( 42/44
39 Additional references I Overview of MR (Epidemiology): Davey Smith, G. and Ebrahim, S. (2003). Mendelian randomization : can genetic epidemiology contribute to understanding environmental determinants of disease? International Journal of Epidemiology. Davey Smith, G. and Hemani, G. (2014). Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Human Molecular Genetics. Overview of MR (more Stats): Hernán, M. and Robins, J. (2006). Instruments for causal inference: an epidemiologist s dream?. Epidemiology. Didelez, V. and Sheehan N. (2007). Mendelian randomization as an instrumental variable approach to causal inference. Statistical Methods in Medical Research. Human genetics: Gamazon, E. et al. (2015). A gene-based association method for mapping traits using reference transcriptome data. Nature Genetics. Pickrell, J. et al. (2016) Detection and interpretation of shared genetic influences on 42 human traits. Nature Genetics. 43/44
40 Additional references II Robust MR methods (individual-level data): Kang, H. et al. (2016). Instrumental Variables Estimation with Some Invalid Instruments and its Application to Mendelian Randomization. JASA. Tchetgen Tchetgen, E. et al. (2017). The GENIUS approach to robust Mendelian randomization inference. arxiv: Robust MR methods (summary-level data): Bowden, J. et al. (2015). Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. International Journal of Epidemiology. Bowden, J. et al. (2016). Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genetic Epidemiology. Pleiotropy: Solovieff, N. et al. (2013). Pleiotropy in complex traits: challenges and strategies. Nature Reviews Genetics. Boyle, E. et al. (2017). An expanded view of complex traits: from polygenic to omnigenic. Cell. 44/44
Mendelian randomization: From genetic association to epidemiological causation
Mendelian randomization: From genetic association to epidemiological causation Qingyuan Zhao Department of Statistics, The Wharton School, University of Pennsylvania April 24, 2018 2 C (Confounder) 1 Z
More informationStatistical inference in two-sample summary-data Mendelian randomization using robust adjusted profile score
Statistical inference in two-sample summary-data Mendelian randomization using robust adjusted profile score Qingyuan Zhao, Jingshu Wang, Jack Bowden, and Dylan S. Small Department of Statistics, The Wharton
More informationarxiv: v2 [stat.ap] 7 Feb 2018
STATISTICAL INFERENCE IN TWO-SAMPLE SUMMARY-DATA MENDELIAN RANDOMIZATION USING ROBUST ADJUSTED PROFILE SCORE arxiv:1801.09652v2 [stat.ap] 7 Feb 2018 By Qingyuan Zhao, Jingshu Wang, Gibran Hemani, Jack
More informationInstrumental variables & Mendelian randomization
Instrumental variables & Mendelian randomization Qingyuan Zhao Department of Statistics, The Wharton School, University of Pennsylvania May 9, 2018 @ JHU 2 C (Confounder) 1 Z (Gene) X (HDL) Y (Heart disease)
More informationRobust instrumental variable methods using multiple candidate instruments with application to Mendelian randomization
Robust instrumental variable methods using multiple candidate instruments with application to Mendelian randomization arxiv:1606.03729v1 [stat.me] 12 Jun 2016 Stephen Burgess 1, Jack Bowden 2 Frank Dudbridge
More informationA Comparison of Robust Methods for Mendelian Randomization Using Multiple Genetic Variants
8 A Comparison of Robust Methods for Mendelian Randomization Using Multiple Genetic Variants Yanchun Bao ISER, University of Essex Paul Clarke ISER, University of Essex Melissa C Smart ISER, University
More informationMendelian randomization (MR)
Mendelian randomization (MR) Use inherited genetic variants to infer causal relationship of an exposure and a disease outcome. 1 Concepts of MR and Instrumental variable (IV) methods motivation, assumptions,
More informationTwo-Sample Instrumental Variable Analyses using Heterogeneous Samples
Two-Sample Instrumental Variable Analyses using Heterogeneous Samples Qingyuan Zhao, Jingshu Wang, Wes Spiller Jack Bowden, and Dylan S. Small Department of Statistics, The Wharton School, University of
More informationHarvard University. A Note on the Control Function Approach with an Instrumental Variable and a Binary Outcome. Eric Tchetgen Tchetgen
Harvard University Harvard University Biostatistics Working Paper Series Year 2014 Paper 175 A Note on the Control Function Approach with an Instrumental Variable and a Binary Outcome Eric Tchetgen Tchetgen
More informationConfounder Adjustment in Multiple Hypothesis Testing
in Multiple Hypothesis Testing Department of Statistics, Stanford University January 28, 2016 Slides are available at http://web.stanford.edu/~qyzhao/. Collaborators Jingshu Wang Trevor Hastie Art Owen
More informationExtending the MR-Egger method for multivariable Mendelian randomization to correct for both measured and unmeasured pleiotropy
Received: 20 October 2016 Revised: 15 August 2017 Accepted: 23 August 2017 DOI: 10.1002/sim.7492 RESEARCH ARTICLE Extending the MR-Egger method for multivariable Mendelian randomization to correct for
More informationCausal inference in biomedical sciences: causal models involving genotypes. Mendelian randomization genes as Instrumental Variables
Causal inference in biomedical sciences: causal models involving genotypes Causal models for observational data Instrumental variables estimation and Mendelian randomization Krista Fischer Estonian Genome
More informationRecent Challenges for Mendelian Randomisation Analyses
Recent Challenges for Mendelian Randomisation Analyses Vanessa Didelez Leibniz Institute for Prevention Research and Epidemiology & Department of Mathematics University of Bremen, Germany CRM Montreal,
More informationCombining multiple observational data sources to estimate causal eects
Department of Statistics, North Carolina State University Combining multiple observational data sources to estimate causal eects Shu Yang* syang24@ncsuedu Joint work with Peng Ding UC Berkeley May 23,
More informationUniversity of Bristol - Explore Bristol Research
Bowden, J., Del Greco M, F., Minelli, C., Davey Smith, G., Sheehan, N., & Thompson, J. (2017). A framework for the investigation of pleiotropy in twosample summary data Mendelian randomization. Statistics
More informationEstimating direct effects in cohort and case-control studies
Estimating direct effects in cohort and case-control studies, Ghent University Direct effects Introduction Motivation The problem of standard approaches Controlled direct effect models In many research
More informationA Sampling of IMPACT Research:
A Sampling of IMPACT Research: Methods for Analysis with Dropout and Identifying Optimal Treatment Regimes Marie Davidian Department of Statistics North Carolina State University http://www.stat.ncsu.edu/
More informationPropensity Score Weighting with Multilevel Data
Propensity Score Weighting with Multilevel Data Fan Li Department of Statistical Science Duke University October 25, 2012 Joint work with Alan Zaslavsky and Mary Beth Landrum Introduction In comparative
More informationCausality and Experiments
Causality and Experiments Michael R. Roberts Department of Finance The Wharton School University of Pennsylvania April 13, 2009 Michael R. Roberts Causality and Experiments 1/15 Motivation Introduction
More informationCross-Sectional Regression after Factor Analysis: Two Applications
al Regression after Factor Analysis: Two Applications Joint work with Jingshu, Trevor, Art; Yang Song (GSB) May 7, 2016 Overview 1 2 3 4 1 / 27 Outline 1 2 3 4 2 / 27 Data matrix Y R n p Panel data. Transposable
More informationLecture 14 October 13
STAT 383C: Statistical Modeling I Fall 2015 Lecture 14 October 13 Lecturer: Purnamrita Sarkar Scribe: Some one Disclaimer: These scribe notes have been slightly proofread and may have typos etc. Note:
More informationEmpirical Bayes Moderation of Asymptotically Linear Parameters
Empirical Bayes Moderation of Asymptotically Linear Parameters Nima Hejazi Division of Biostatistics University of California, Berkeley stat.berkeley.edu/~nhejazi nimahejazi.org twitter/@nshejazi github/nhejazi
More informationCorrelation and Simple Linear Regression
Correlation and Simple Linear Regression Sasivimol Rattanasiri, Ph.D Section for Clinical Epidemiology and Biostatistics Ramathibodi Hospital, Mahidol University E-mail: sasivimol.rat@mahidol.ac.th 1 Outline
More informationExtending the MR-Egger method for multivariable Mendelian randomization to correct for both measured and unmeasured pleiotropy
Extending the MR-Egger method for multivariable Mendelian randomization to correct for both measured and unmeasured pleiotropy arxiv:1708.00272v1 [stat.me] 1 Aug 2017 Jessica MB Rees 1 Angela Wood 1 Stephen
More informationCausal exposure effect on a time-to-event response using an IV.
Faculty of Health Sciences Causal exposure effect on a time-to-event response using an IV. Torben Martinussen 1 Stijn Vansteelandt 2 Eric Tchetgen 3 1 Department of Biostatistics University of Copenhagen
More informationVarieties of Count Data
CHAPTER 1 Varieties of Count Data SOME POINTS OF DISCUSSION What are counts? What are count data? What is a linear statistical model? What is the relationship between a probability distribution function
More informationPrevious lecture. Single variant association. Use genome-wide SNPs to account for confounding (population substructure)
Previous lecture Single variant association Use genome-wide SNPs to account for confounding (population substructure) Estimation of effect size and winner s curse Meta-Analysis Today s outline P-value
More informationInference For High Dimensional M-estimates: Fixed Design Results
Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49
More informationCasual Mediation Analysis
Casual Mediation Analysis Tyler J. VanderWeele, Ph.D. Upcoming Seminar: April 21-22, 2017, Philadelphia, Pennsylvania OXFORD UNIVERSITY PRESS Explanation in Causal Inference Methods for Mediation and Interaction
More informationSurvival Analysis for Case-Cohort Studies
Survival Analysis for ase-ohort Studies Petr Klášterecký Dept. of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, harles University, Prague, zech Republic e-mail: petr.klasterecky@matfyz.cz
More information6.3 How the Associational Criterion Fails
6.3. HOW THE ASSOCIATIONAL CRITERION FAILS 271 is randomized. We recall that this probability can be calculated from a causal model M either directly, by simulating the intervention do( = x), or (if P
More informationControl Function Instrumental Variable Estimation of Nonlinear Causal Effect Models
Journal of Machine Learning Research 17 (2016) 1-35 Submitted 9/14; Revised 2/16; Published 2/16 Control Function Instrumental Variable Estimation of Nonlinear Causal Effect Models Zijian Guo Department
More informationStatistical Methods III Statistics 212. Problem Set 2 - Answer Key
Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423
More informationWhen Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?
When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data? Kosuke Imai Princeton University Asian Political Methodology Conference University of Sydney Joint
More informationEconometrics with Observational Data. Introduction and Identification Todd Wagner February 1, 2017
Econometrics with Observational Data Introduction and Identification Todd Wagner February 1, 2017 Goals for Course To enable researchers to conduct careful quantitative analyses with existing VA (and non-va)
More informationMissing covariate data in matched case-control studies: Do the usual paradigms apply?
Missing covariate data in matched case-control studies: Do the usual paradigms apply? Bryan Langholz USC Department of Preventive Medicine Joint work with Mulugeta Gebregziabher Larry Goldstein Mark Huberman
More informationStructural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall
1 Structural Nested Mean Models for Assessing Time-Varying Effect Moderation Daniel Almirall Center for Health Services Research, Durham VAMC & Duke University Medical, Dept. of Biostatistics Joint work
More informationInference For High Dimensional M-estimates. Fixed Design Results
: Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and
More informationWhen Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?
When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data? Kosuke Imai Department of Politics Center for Statistics and Machine Learning Princeton University
More informationWhen Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data?
When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data? Kosuke Imai Department of Politics Center for Statistics and Machine Learning Princeton University Joint
More informationStatistics 203: Introduction to Regression and Analysis of Variance Course review
Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying
More informationCausal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification. Todd MacKenzie, PhD
Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification Todd MacKenzie, PhD Collaborators A. James O Malley Tor Tosteson Therese Stukel 2 Overview 1. Instrumental variable
More informationExtending causal inferences from a randomized trial to a target population
Extending causal inferences from a randomized trial to a target population Issa Dahabreh Center for Evidence Synthesis in Health, Brown University issa dahabreh@brown.edu January 16, 2019 Issa Dahabreh
More informationPrimal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing
Primal-dual Covariate Balance and Minimal Double Robustness via (Joint work with Daniel Percival) Department of Statistics, Stanford University JSM, August 9, 2015 Outline 1 2 3 1/18 Setting Rubin s causal
More informationExtending the results of clinical trials using data from a target population
Extending the results of clinical trials using data from a target population Issa Dahabreh Center for Evidence-Based Medicine, Brown School of Public Health Disclaimer Partly supported through PCORI Methods
More informationEmpirical Bayes Moderation of Asymptotically Linear Parameters
Empirical Bayes Moderation of Asymptotically Linear Parameters Nima Hejazi Division of Biostatistics University of California, Berkeley stat.berkeley.edu/~nhejazi nimahejazi.org twitter/@nshejazi github/nhejazi
More information2. Linear regression with multiple regressors
2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions
More informationLikelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square
Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square Yuxin Chen Electrical Engineering, Princeton University Coauthors Pragya Sur Stanford Statistics Emmanuel
More informationMendelian randomization as an instrumental variable approach to causal inference
Statistical Methods in Medical Research 2007; 16: 309 330 Mendelian randomization as an instrumental variable approach to causal inference Vanessa Didelez Departments of Statistical Science, University
More informationCausal Mechanisms Short Course Part II:
Causal Mechanisms Short Course Part II: Analyzing Mechanisms with Experimental and Observational Data Teppei Yamamoto Massachusetts Institute of Technology March 24, 2012 Frontiers in the Analysis of Causal
More informationProportional Variance Explained by QLT and Statistical Power. Proportional Variance Explained by QTL and Statistical Power
Proportional Variance Explained by QTL and Statistical Power Partitioning the Genetic Variance We previously focused on obtaining variance components of a quantitative trait to determine the proportion
More information6. Assessing studies based on multiple regression
6. Assessing studies based on multiple regression Questions of this section: What makes a study using multiple regression (un)reliable? When does multiple regression provide a useful estimate of the causal
More informationInference for High Dimensional Robust Regression
Department of Statistics UC Berkeley Stanford-Berkeley Joint Colloquium, 2015 Table of Contents 1 Background 2 Main Results 3 OLS: A Motivating Example Table of Contents 1 Background 2 Main Results 3 OLS:
More informationSpecification Errors, Measurement Errors, Confounding
Specification Errors, Measurement Errors, Confounding Kerby Shedden Department of Statistics, University of Michigan October 10, 2018 1 / 32 An unobserved covariate Suppose we have a data generating model
More informationThe GENIUS Approach to Robust Mendelian Randomization Inference
Original Article The GENIUS Approach to Robust Mendelian Randomization Inference Eric J. Tchetgen Tchetgen, BaoLuo Sun, Stefan Walter Department of Biostatistics,Harvard University arxiv:1709.07779v5 [stat.me]
More informationOn the Choice of Parameterisation and Priors for the Bayesian Analyses of Mendelian Randomisation Studies.
On the Choice of Parameterisation and Priors for the Bayesian Analyses of Mendelian Randomisation Studies. E. M. Jones 1, J. R. Thompson 1, V. Didelez, and N. A. Sheehan 1 1 Department of Health Sciences,
More informationTABLES AND FORMULAS FOR MOORE Basic Practice of Statistics
TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Exploring Data: Distributions Look for overall pattern (shape, center, spread) and deviations (outliers). Mean (use a calculator): x = x 1 + x
More informationGeneralised method of moments estimation of structural mean models
Generalised method of moments estimation of structural mean models Tom Palmer 1,2 Roger Harbord 2 Paul Clarke 3 Frank Windmeijer 3,4,5 1. MRC Centre for Causal Analyses in Translational Epidemiology 2.
More informationPrevious lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing.
Previous lecture P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing. Interaction Outline: Definition of interaction Additive versus multiplicative
More informationMarginal Screening and Post-Selection Inference
Marginal Screening and Post-Selection Inference Ian McKeague August 13, 2017 Ian McKeague (Columbia University) Marginal Screening August 13, 2017 1 / 29 Outline 1 Background on Marginal Screening 2 2
More informationEMERGING MARKETS - Lecture 2: Methodology refresher
EMERGING MARKETS - Lecture 2: Methodology refresher Maria Perrotta April 4, 2013 SITE http://www.hhs.se/site/pages/default.aspx My contact: maria.perrotta@hhs.se Aim of this class There are many different
More informationSelecting (In)Valid Instruments for Instrumental Variables Estimation
Selecting InValid Instruments for Instrumental Variables Estimation Frank Windmeijer a,e, Helmut Farbmacher b, Neil Davies c,e George Davey Smith c,e, Ian White d a Department of Economics University of
More informationLecture 7 Introduction to Statistical Decision Theory
Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7
More informationModule 17: Bayesian Statistics for Genetics Lecture 4: Linear regression
1/37 The linear regression model Module 17: Bayesian Statistics for Genetics Lecture 4: Linear regression Ken Rice Department of Biostatistics University of Washington 2/37 The linear regression model
More informationAssociation studies and regression
Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration
More informationEstimating the Marginal Odds Ratio in Observational Studies
Estimating the Marginal Odds Ratio in Observational Studies Travis Loux Christiana Drake Department of Statistics University of California, Davis June 20, 2011 Outline The Counterfactual Model Odds Ratios
More informationPropensity Score Methods for Causal Inference
John Pura BIOS790 October 2, 2015 Causal inference Philosophical problem, statistical solution Important in various disciplines (e.g. Koch s postulates, Bradford Hill criteria, Granger causality) Good
More informationBayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson
Bayesian variable selection via penalized credible regions Brian Reich, NC State Joint work with Howard Bondell and Ander Wilson Brian Reich, NCSU Penalized credible regions 1 Motivation big p, small n
More informationConfidence Intervals for Low-dimensional Parameters with High-dimensional Data
Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Cun-Hui Zhang and Stephanie S. Zhang Rutgers University and Columbia University September 14, 2012 Outline Introduction Methodology
More informationGeneralised method of moments estimation of structural mean models
Generalised method of moments estimation of structural mean models Tom Palmer 1,2 Roger Harbord 2 Paul Clarke 3 Frank Windmeijer 3,4,5 1. MRC Centre for Causal Analyses in Translational Epidemiology 2.
More informationThe propensity score with continuous treatments
7 The propensity score with continuous treatments Keisuke Hirano and Guido W. Imbens 1 7.1 Introduction Much of the work on propensity score analysis has focused on the case in which the treatment is binary.
More informationStructural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall
1 Structural Nested Mean Models for Assessing Time-Varying Effect Moderation Daniel Almirall Center for Health Services Research, Durham VAMC & Dept. of Biostatistics, Duke University Medical Joint work
More informationMedian Cross-Validation
Median Cross-Validation Chi-Wai Yu 1, and Bertrand Clarke 2 1 Department of Mathematics Hong Kong University of Science and Technology 2 Department of Medicine University of Miami IISA 2011 Outline Motivational
More informationReview of Statistics 101
Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods
More informationProbabilistic Index Models
Probabilistic Index Models Jan De Neve Department of Data Analysis Ghent University M3 Storrs, Conneticut, USA May 23, 2017 Jan.DeNeve@UGent.be 1 / 37 Introduction 2 / 37 Introduction to Probabilistic
More informationCausal Inference with General Treatment Regimes: Generalizing the Propensity Score
Causal Inference with General Treatment Regimes: Generalizing the Propensity Score David van Dyk Department of Statistics, University of California, Irvine vandyk@stat.harvard.edu Joint work with Kosuke
More informationSociology 6Z03 Review I
Sociology 6Z03 Review I John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review I Fall 2016 1 / 19 Outline: Review I Introduction Displaying Distributions Describing
More information8. Instrumental variables regression
8. Instrumental variables regression Recall: In Section 5 we analyzed five sources of estimation bias arising because the regressor is correlated with the error term Violation of the first OLS assumption
More informationLecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017
Lecture 2: Genetic Association Testing with Quantitative Traits Instructors: Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 29 Introduction to Quantitative Trait Mapping
More informationStatistical Data Analysis
DS-GA 0 Lecture notes 8 Fall 016 1 Descriptive statistics Statistical Data Analysis In this section we consider the problem of analyzing a set of data. We describe several techniques for visualizing the
More informationPrentice Hall Stats: Modeling the World 2004 (Bock) Correlated to: National Advanced Placement (AP) Statistics Course Outline (Grades 9-12)
National Advanced Placement (AP) Statistics Course Outline (Grades 9-12) Following is an outline of the major topics covered by the AP Statistics Examination. The ordering here is intended to define the
More informationDe-mystifying random effects models
De-mystifying random effects models Peter J Diggle Lecture 4, Leahurst, October 2012 Linear regression input variable x factor, covariate, explanatory variable,... output variable y response, end-point,
More informationRerandomization to Balance Covariates
Rerandomization to Balance Covariates Kari Lock Morgan Department of Statistics Penn State University Joint work with Don Rubin University of Minnesota Biostatistics 4/27/16 The Gold Standard Randomized
More informationTGDR: An Introduction
TGDR: An Introduction Julian Wolfson Student Seminar March 28, 2007 1 Variable Selection 2 Penalization, Solution Paths and TGDR 3 Applying TGDR 4 Extensions 5 Final Thoughts Some motivating examples We
More informationContents. Acknowledgments. xix
Table of Preface Acknowledgments page xv xix 1 Introduction 1 The Role of the Computer in Data Analysis 1 Statistics: Descriptive and Inferential 2 Variables and Constants 3 The Measurement of Variables
More informationCausal Discovery by Computer
Causal Discovery by Computer Clark Glymour Carnegie Mellon University 1 Outline 1. A century of mistakes about causation and discovery: 1. Fisher 2. Yule 3. Spearman/Thurstone 2. Search for causes is statistical
More informationCausal Inference Basics
Causal Inference Basics Sam Lendle October 09, 2013 Observed data, question, counterfactuals Observed data: n i.i.d copies of baseline covariates W, treatment A {0, 1}, and outcome Y. O i = (W i, A i,
More informationSpatial Regression. 3. Review - OLS and 2SLS. Luc Anselin. Copyright 2017 by Luc Anselin, All Rights Reserved
Spatial Regression 3. Review - OLS and 2SLS Luc Anselin http://spatial.uchicago.edu OLS estimation (recap) non-spatial regression diagnostics endogeneity - IV and 2SLS OLS Estimation (recap) Linear Regression
More informationApplied Statistics and Econometrics
Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple
More informationIV-estimators of the causal odds ratio for a continuous exposure in prospective and retrospective designs
IV-estimators of the causal odds ratio for a continuous exposure in prospective and retrospective designs JACK BOWDEN (corresponding author) MRC Biostatistics Unit, Institute of Public Health, Robinson
More informationSelective Inference for Effect Modification
Inference for Modification (Joint work with Dylan Small and Ashkan Ertefaie) Department of Statistics, University of Pennsylvania May 24, ACIC 2017 Manuscript and slides are available at http://www-stat.wharton.upenn.edu/~qyzhao/.
More informationBayesian Hierarchical Models
Bayesian Hierarchical Models Gavin Shaddick, Millie Green, Matthew Thomas University of Bath 6 th - 9 th December 2016 1/ 34 APPLICATIONS OF BAYESIAN HIERARCHICAL MODELS 2/ 34 OUTLINE Spatial epidemiology
More informationDETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics
DETAILED CONTENTS About the Author Preface to the Instructor To the Student How to Use SPSS With This Book PART I INTRODUCTION AND DESCRIPTIVE STATISTICS 1. Introduction to Statistics 1.1 Descriptive and
More informationROBUST ESTIMATION OF A CORRELATION COEFFICIENT: AN ATTEMPT OF SURVEY
ROBUST ESTIMATION OF A CORRELATION COEFFICIENT: AN ATTEMPT OF SURVEY G.L. Shevlyakov, P.O. Smirnov St. Petersburg State Polytechnic University St.Petersburg, RUSSIA E-mail: Georgy.Shevlyakov@gmail.com
More informationBasics of Experimental Design. Review of Statistics. Basic Study. Experimental Design. When an Experiment is Not Possible. Studying Relations
Basics of Experimental Design Review of Statistics And Experimental Design Scientists study relation between variables In the context of experiments these variables are called independent and dependent
More informationStat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13)
Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13) 1. Weighted Least Squares (textbook 11.1) Recall regression model Y = β 0 + β 1 X 1 +... + β p 1 X p 1 + ε in matrix form: (Ch. 5,
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationDouble Robustness. Bang and Robins (2005) Kang and Schafer (2007)
Double Robustness Bang and Robins (2005) Kang and Schafer (2007) Set-Up Assume throughout that treatment assignment is ignorable given covariates (similar to assumption that data are missing at random
More informationCausality II: How does causal inference fit into public health and what it is the role of statistics?
Causality II: How does causal inference fit into public health and what it is the role of statistics? Statistics for Psychosocial Research II November 13, 2006 1 Outline Potential Outcomes / Counterfactual
More informationmultilevel modeling: concepts, applications and interpretations
multilevel modeling: concepts, applications and interpretations lynne c. messer 27 october 2010 warning social and reproductive / perinatal epidemiologist concepts why context matters multilevel models
More information