Empirical Bayes Moderation of Asymptotically Linear Parameters


Empirical Bayes Moderation of Asymptotically Linear Parameters
Nima Hejazi, Division of Biostatistics, University of California, Berkeley
stat.berkeley.edu/~nhejazi | nimahejazi.org | twitter/@nshejazi | github/nhejazi
slides: goo.gl/6ou8yr

Preview

1. Linear models are the standard approach for analyzing microarray and next-generation sequencing data (e.g., the R package limma).
2. Moderated statistics help reduce false positives by using an empirical Bayes method to perform standard deviation shrinkage for test statistics.
3. Beyond linear models: we can assess evidence using parameters that are more scientifically interesting (e.g., the ATE) by way of TMLE.
4. The approach of moderated statistics easily extends to the case of asymptotically linear parameters.

Motivation: Let's meet the data

Observational study of the impact of occupational exposure (to benzene), with data collected on 125 subjects and roughly 22,000 biomarkers.

Biomarkers of interest are in the form of miRNA, assessed using the Illumina Human Ref-8 BeadChips platform.

Occupational exposure to benzene is reported as discrete values of interest to epidemiologists: none, < 1 ppm, and > 5 ppm.

Background (phenotype-level) information is available on each subject, including age, sex, and smoking status.

Data analysis? Linear models!

For each biomarker (b = 1, ..., B), fit a linear model:
$$E[y_b] = X \beta_b$$

Generally, there is a particular model coefficient in which we are interested (e.g., the effect of benzene on biomarker expression).

Controlling for baseline covariates, batch effects, and potential confounders happens by adding terms to the linear model.

Test the coefficient of interest using a standard t-test:
$$t_b = \frac{\hat{\beta}_b - \beta_{b,H_0}}{s_b}$$
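To make the setup concrete, here is a minimal R sketch of the per-biomarker linear modeling step. All object names are hypothetical placeholders (they are not objects from the analysis itself): `expr` is a B x n matrix of expression values, and `benzene`, `age`, `sex`, `smoke` are length-n covariate vectors, with `benzene` coded numerically for simplicity.

```r
## Hypothetical inputs: expr (B x n expression matrix); benzene, age, sex,
## smoke (length-n covariate vectors), with benzene coded numerically here.
pheno <- data.frame(benzene = benzene, age = age, sex = sex, smoke = smoke)

## Fit one linear model per biomarker and keep the ordinary t-statistic for
## the benzene coefficient.
t_stats <- apply(expr, 1, function(y_b) {
  dat <- data.frame(y = y_b, pheno)
  fit <- lm(y ~ benzene + age + sex + smoke, data = dat)
  coef(summary(fit))["benzene", "t value"]
})
```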

LIMMA: Linear Models for Microarray Data

When the sample size is small, $s_b^2$ may be so small that small differences $(\hat{\beta}_b - \beta_{b,H_0})$ lead to large $t_b$.

Uncertainty in the variance is an acute problem when the sample size is small. This results in false positives.

Smyth proposes getting around this with an empirical Bayes shrinkage of the $s_b^2$.

Test the coefficient of interest with a moderated t-test:
$$\tilde{t}_b = \frac{\hat{\beta}_b - \beta_{b,H_0}}{\tilde{s}_b}, \qquad \tilde{s}_b^2 = \frac{s_b^2 d_b + s_0^2 d_0}{d_b + d_0}$$

This eliminates large t-statistics that arise merely from very small $s_b$.
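For reference, a minimal sketch of the corresponding limma workflow, using lmFit and eBayes as documented in the package; the design variables are the same hypothetical placeholders as in the previous sketch.

```r
## Same hypothetical objects as above: expr (B x n), benzene, age, sex, smoke.
library(limma)

design <- model.matrix(~ benzene + age + sex + smoke)
fit <- lmFit(expr, design)       # per-biomarker linear models
fit <- eBayes(fit)               # empirical Bayes shrinkage of the s_b^2
topTable(fit, coef = "benzene")  # moderated t-tests for the benzene effect
```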

Beyond linear models

It's not always desirable to specify a functional form: perhaps we can do better than linear models?

Such models are a matter of convenience rather than honest scientific practice: does $\hat{\beta}_b$ really answer our questions?

We can do better by using parameters motivated by causal models (n.b., these reduce to variable importance measures in our case).

As long as the parameters we seek to estimate have asymptotically linear estimators, we can readily apply the approach of moderated statistics.

Target parameters for complex questions

Rather than being satisfied with $\hat{\beta}_b$ as an answer to our questions, let's consider a simple target parameter: the average treatment effect (ATE):
$$\Psi_b(P_0) = E_{W,0}\left[E_0[Y_b \mid A = a_{\text{high}}, W] - E_0[Y_b \mid A = a_{\text{low}}, W]\right]$$

There is no need to specify a functional form or to assume that we know the true data-generating distribution $P_0$.

Parameters like this can be estimated using targeted minimum loss-based estimation (TMLE).

Asymptotic linearity:
$$\Psi_b(P_n^*) - \Psi_b(P_0) = \frac{1}{n}\sum_{i=1}^{n} IC(O_i) + o_P\!\left(\frac{1}{\sqrt{n}}\right)$$

Targeted Minimum Loss-Based Estimation

TMLE produces a well-defined, unbiased, efficient substitution estimator of target parameters of a data-generating distribution.

It is an iterative procedure (though there is now a one-step version) that updates an initial estimate of the relevant part ($Q_0$) of the data-generating distribution ($P_0$).

Like the corresponding A-IPTW estimators, it removes the asymptotic residual bias of the initial estimator of the target parameter. If it uses a consistent estimator of $g_0$ (the nuisance parameter), it is doubly robust.

We can estimate the target parameter as:
$$\Psi_b(P_n^*) = \frac{1}{n}\sum_{i=1}^{n}\left[Q_n^{(b,1)}(A_i = a_h, W_i) - Q_n^{(b,1)}(A_i = a_l, W_i)\right]$$
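As an illustration only (not the R/biotmle interface described later), the ATE for a single biomarker could be estimated with the CRAN tmle package; `y_b`, a binary high-vs-low exposure indicator `A`, and a covariate data.frame `W` are assumed placeholders.

```r
## Hypothetical inputs: y_b (one biomarker's expression values), A (binary
## high-vs-low exposure indicator), W (data.frame of baseline covariates).
library(tmle)

fit_b <- tmle(Y = y_b, A = A, W = W, family = "gaussian")
psi_b <- fit_b$estimates$ATE$psi  # TMLE point estimate of the ATE
```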

Inference with influence curves

The influence curve for the estimator is:
$$IC_{b,n}(O_i) = \left(\frac{\mathbb{1}(A_i = a_h)}{g_n(a_h \mid W_i)} - \frac{\mathbb{1}(A_i = a_l)}{g_n(a_l \mid W_i)}\right)\left(Y_{b,i} - Q_n^{(b,1)}(A_i, W_i)\right) + Q_n^{(b,1)}(a_h, W_i) - Q_n^{(b,1)}(a_l, W_i) - \Psi_b(P_n^*)$$

Sample variance of the influence curve:
$$s^2(IC_n) = \frac{1}{n}\sum_{i=1}^{n} \left(IC_n(O_i)\right)^2$$

Use the sample variance to estimate the standard error:
$$se_n = \sqrt{\frac{s^2(IC_n)}{n}}$$

Use this for inference, that is, to derive uncertainty measures (p-values, confidence intervals).
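A small sketch of the Wald-style inference above, assuming `ic_b` holds the estimated influence curve values $IC_{b,n}(O_i)$ for one biomarker and `psi_b` the corresponding TMLE estimate (both hypothetical placeholders):

```r
## ic_b: estimated influence curve values for one biomarker (length n)
## psi_b: the TMLE point estimate for that biomarker
n    <- length(ic_b)
se_b <- sqrt(mean(ic_b^2) / n)                  # se_n = sqrt(s^2(IC_n) / n)
ci_b <- psi_b + c(-1, 1) * qnorm(0.975) * se_b  # 95% Wald confidence interval
p_b  <- 2 * pnorm(-abs(psi_b / se_b))           # p-value under H0: Psi_0 = 0
```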

Moderated statistics for target parameters

One can define a standard t-test statistic for an estimator of an asymptotically linear parameter (over $b = 1, \ldots, B$) as:
$$t_b = \frac{\sqrt{n}\left(\Psi_b(P_n^*) - \Psi_0\right)}{s_b(IC_{b,n})}$$

This naturally extends to the moderated t-statistic of Smyth:
$$\tilde{t}_b = \frac{\sqrt{n}\left(\Psi_b(P_n^*) - \Psi_0\right)}{\tilde{s}_b}, \qquad \tilde{s}_b^2 = \frac{s_b^2(IC_{b,n})\, d_b + s_0^2 d_0}{d_b + d_0},$$
where $\tilde{s}_b^2$ is the posterior estimate of the variance of the influence curve (see the sketch below).
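The moderated statistic itself can be sketched with limma's squeezeVar, which performs the empirical Bayes variance shrinkage. Here `s2_ic` (a length-B vector of influence curve sample variances) and `psi` (the B TMLE estimates) are assumed placeholders, and the degrees of freedom are a rough choice for illustration.

```r
## Hypothetical inputs: s2_ic (length-B vector of s_b^2(IC_{b,n})), psi
## (length-B vector of TMLE estimates), n observations per biomarker.
library(limma)

shrunk <- squeezeVar(s2_ic, df = n - 1)          # EB shrinkage toward s_0^2
t_mod  <- sqrt(n) * psi / sqrt(shrunk$var.post)  # moderated t, H0: Psi_0 = 0
df_mod <- (n - 1) + shrunk$df.prior              # moderated degrees of freedom
p_mod  <- 2 * pt(-abs(t_mod), df = df_mod)
```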

An influence curve transform

We need the estimate for each biomarker $b$ and the influence curve value for every observation for that biomarker, repeating for all $b = 1, \ldots, B$.

Essentially, transform the original data matrix so that the new entries are:
$$\tilde{Y}_{b,i} = IC_{b,n}(O_i; P_n^*) + \Psi_b(P_n^*)$$

Since $E[IC_{b,n}] = 0$ across the columns (units) for each $b$, the average will be the original estimate $\Psi_b(P_n^*)$.

For simplicity, assume the null value is $\Psi_0 = 0$ for all $b$. Then applying the moderated t-test to the $\tilde{Y}_{b,i}$ generates corrected, conservative test statistics $\tilde{t}_b$ (as sketched below).
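A minimal sketch of the transform-then-moderate idea, assuming `ic_mat` is a B x n matrix of estimated influence curve values and `psi` the length-B vector of TMLE estimates (hypothetical placeholders; the R/biotmle package wraps this logic with its own interface):

```r
## Hypothetical inputs: ic_mat (B x n influence curve values), psi (length-B
## TMLE estimates).
library(limma)

y_tilde <- sweep(ic_mat, 1, psi, FUN = "+")  # Y~_{b,i} = IC_{b,n}(O_i) + Psi_b

## With Psi_0 = 0, testing each biomarker reduces to a moderated one-sample
## t-test on the rows of y_tilde (intercept-only design).
design <- matrix(1, nrow = ncol(y_tilde), ncol = 1)
fit <- eBayes(lmFit(y_tilde, design))
topTable(fit, coef = 1)
```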

Why moderated statistics in this context?

Oftentimes, such data analyses are based on relatively small samples.

With standard implementations of these data-adaptive estimates, the standard errors can be non-robust.

Practically, significant estimates of variable importance measures may be driven by poorly estimated, and in particular underestimated, $s_b^2(IC_{b,n})$.

Moderated statistics shrink these $s_b^2(IC_{b,n})$ (making them bigger), thus taking biomarkers with small parameter estimates but very small $s_b^2(IC_{b,n})$ out of statistical significance.

Software implementation: R/biotmle

An R package that facilitates biomarker discovery by generalizing the moderated t-statistic of Smyth for use with asymptotically linear parameters.

Check it out on GitHub: nhejazi/biotmle
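A quick installation sketch; the GitHub location comes from the slide, while the availability of a Bioconductor release is an assumption to verify against the package documentation.

```r
## Development version from GitHub (repository named on the slide):
# install.packages("remotes")
remotes::install_github("nhejazi/biotmle")
```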

Data analysis with R/biotmle

Observational study of the impact of occupational exposure (to benzene), with data collected on 125 subjects and roughly 22,000 biomarkers.

Baseline covariates W: age, sex, smoking status; all were discretized.

Treatment A is the degree of benzene exposure: none, < 1 ppm, and > 5 ppm.

Outcome Y is miRNA expression, median-normalized.

Estimate the parameter:
$$\Psi_b(P_n^*) = E\left[E[Y_b \mid A = \max(A), W] - E[Y_b \mid A = \min(A), W]\right]$$

Apply the moderated t-test as previously discussed (a rough end-to-end sketch follows below).
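A rough end-to-end sketch of this analysis logic, again with the CRAN tmle package rather than the R/biotmle interface itself: loop the single-biomarker TMLE over all B biomarkers, collect estimates and influence curves, then pass them to the moderated-test sketch shown earlier. `expr`, `A` (here a binary contrast of the highest vs. lowest exposure levels, with subjects restricted to those two groups), and `W` are hypothetical placeholders, and the name of the influence curve component in the fitted object should be checked against the installed tmle version.

```r
## Hypothetical inputs: expr (B x n expression matrix), A (1 = "> 5 ppm",
## 0 = "none", subjects restricted to these two groups), W (data.frame of
## discretized baseline covariates).
library(tmle)

B <- nrow(expr)
n <- ncol(expr)
psi    <- numeric(B)
ic_mat <- matrix(NA_real_, nrow = B, ncol = n)

for (b in seq_len(B)) {
  fit_b       <- tmle(Y = expr[b, ], A = A, W = W, family = "gaussian")
  psi[b]      <- fit_b$estimates$ATE$psi
  ic_mat[b, ] <- fit_b$estimates$IC$IC.ATE  # check component name in your version
}
## psi and ic_mat can now be fed to the influence curve transform sketch above.
```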

Analysis results I: Uncorrected tests. [Figure: histogram of raw p-values, applying limma shrinkage to TMLE results; x-axis: magnitude of raw p-values, y-axis: count.]

Analysis results II: Corrected tests. [Figure: histogram of FDR-corrected p-values (Benjamini-Hochberg), applying limma shrinkage to TMLE results; x-axis: magnitude of BH-corrected p-values, y-axis: count.]

Analysis results III: Volcano plot.

Analysis results IV: Heatmap of IC estimates.

Review

1. Linear models are the standard approach for analyzing microarray and next-generation sequencing data (e.g., the R package limma).
2. Moderated statistics help reduce false positives by using an empirical Bayes method to perform standard deviation shrinkage for test statistics.
3. Beyond linear models: we can assess evidence using parameters that are more scientifically interesting (e.g., the ATE) by way of TMLE.
4. The approach of moderated statistics easily extends to the case of asymptotically linear parameters.

References I

Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Methodological), pages 289–300.

References II

Smyth, G. K. (2005). limma: Linear models for microarray data. In Bioinformatics and Computational Biology Solutions Using R and Bioconductor, pages 397–420. Springer.

Tuglus, C. and van der Laan, M. J. (2011). Targeted methods for biomarker discovery. In Targeted Learning, pages 367–382. Springer.

van der Laan, M. J. and Rose, S. (2011). Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Science & Business Media.

Acknowledgments

Alan Hubbard, University of California, Berkeley
Mark van der Laan, University of California, Berkeley

Slides: goo.gl/6ou8yr
stat.berkeley.edu/~nhejazi | nimahejazi.org | twitter/@nshejazi | github/nhejazi