Empirical Bayes Moderation of Asymptotically Linear Parameters

Size: px
Start display at page:

Download "Empirical Bayes Moderation of Asymptotically Linear Parameters"

Transcription

1 Empirical Bayes Moderation of Asymptotically Linear Parameters Nima Hejazi Division of Biostatistics University of California, Berkeley stat.berkeley.edu/~nhejazi nimahejazi.org github/nhejazi slides: goo.gl/6ou8yr

2 Preview 1. Linear models are the standard approach for analyzing microarray and next-generation sequencing data (e.g., R package limma ). 1

3 Preview 1. Linear models are the standard approach for analyzing microarray and next-generation sequencing data (e.g., R package limma ). 2. Moderated statistics help reduce false positives by using an empirical Bayes method to perform standard deviation shrinkage for test statistics. 1

4 Preview 1. Linear models are the standard approach for analyzing microarray and next-generation sequencing data (e.g., R package limma ). 2. Moderated statistics help reduce false positives by using an empirical Bayes method to perform standard deviation shrinkage for test statistics. 3. Beyond linear models: we can assess evidence using parameters that are more scientifically interesting (e.g., ATE) by way of TMLE. 1

5 Preview 1. Linear models are the standard approach for analyzing microarray and next-generation sequencing data (e.g., R package limma ). 2. Moderated statistics help reduce false positives by using an empirical Bayes method to perform standard deviation shrinkage for test statistics. 3. Beyond linear models: we can assess evidence using parameters that are more scientifically interesting (e.g., ATE) by way of TMLE. 4. The approach of moderated statistics easily extends to the case of asymptotically linear parameters. 1

6 Motivation: Let s meet the data Observational study of the impact of occupational exposure (to benzene), with data collected on 125 subjects and roughly 22, 000 biomarkers. 2

7 Motivation: Let s meet the data Observational study of the impact of occupational exposure (to benzene), with data collected on 125 subjects and roughly 22, 000 biomarkers. Biomarkers of interest are in the form of mirna, assessed using the Illumina Human Ref-8 BeadChips platform. 2

8 Motivation: Let s meet the data Observational study of the impact of occupational exposure (to benzene), with data collected on 125 subjects and roughly 22, 000 biomarkers. Biomarkers of interest are in the form of mirna, assessed using the Illumina Human Ref-8 BeadChips platform. Occupational exposure to benzene reported as discrete values of interest (to epidemiologists): none, < 1ppm, > 5ppm. 2

9 Motivation: Let s meet the data Observational study of the impact of occupational exposure (to benzene), with data collected on 125 subjects and roughly 22, 000 biomarkers. Biomarkers of interest are in the form of mirna, assessed using the Illumina Human Ref-8 BeadChips platform. Occupational exposure to benzene reported as discrete values of interest (to epidemiologists): none, < 1ppm, > 5ppm. Background (phenotype-level) information available on each subject, including age, sex, smoking status. 2

10 Data analysis? Linear models! For each biomarker (b = 1,...,B), fit a linear model: E[y b ] = Xβ b 3

11 Data analysis? Linear models! For each biomarker (b = 1,...,B), fit a linear model: E[y b ] = Xβ b Generally, we have a particular model coefficent in which we are interested (e.g., effect of benzene on biomarker expression). 3

12 Data analysis? Linear models! For each biomarker (b = 1,...,B), fit a linear model: E[y b ] = Xβ b Generally, we have a particular model coefficent in which we are interested (e.g., effect of benzene on biomarker expression). Controlling for baseline covariates, batch effects, and potential confounders happens by adding terms to the linear model. 3

13 Data analysis? Linear models! For each biomarker (b = 1,...,B), fit a linear model: E[y b ] = Xβ b Generally, we have a particular model coefficent in which we are interested (e.g., effect of benzene on biomarker expression). Controlling for baseline covariates, batch effects, and potential confounders happens by adding terms to the linear model. Test the coefficent of interest using a standard t-test: t b = ˆβ b β b,h0 s b 3

14 LIMMA: Linear Models for Microarray Data When the sample size is small, s 2 b may be so small that small differences ( ˆβ b β b,h0 ) lead to large t b. 4

15 LIMMA: Linear Models for Microarray Data When the sample size is small, s 2 b may be so small that small differences ( ˆβ b β b,h0 ) lead to large t b. Uncertainty in the variance is an acute problem when the sample size is small. 4

16 LIMMA: Linear Models for Microarray Data When the sample size is small, s 2 b may be so small that small differences ( ˆβ b β b,h0 ) lead to large t b. Uncertainty in the variance is an acute problem when the sample size is small. This results in false positives. Smyth proposes we get around this by an empirical Bayes shrinkage of the s 2 b. 4

17 LIMMA: Linear Models for Microarray Data When the sample size is small, s 2 b may be so small that small differences ( ˆβ b β b,h0 ) lead to large t b. Uncertainty in the variance is an acute problem when the sample size is small. This results in false positives. Smyth proposes we get around this by an empirical Bayes shrinkage of the s 2 b. Test the coefficent of interest with a moderated t-test: t b = ˆβ b β b,h0 s b, s 2 b = s2 b d b + s 2 0 d 0 d b + d 0 4

18 LIMMA: Linear Models for Microarray Data When the sample size is small, s 2 b may be so small that small differences ( ˆβ b β b,h0 ) lead to large t b. Uncertainty in the variance is an acute problem when the sample size is small. This results in false positives. Smyth proposes we get around this by an empirical Bayes shrinkage of the s 2 b. Test the coefficent of interest with a moderated t-test: t b = ˆβ b β b,h0 s b, s 2 b = s2 b d b + s 2 0 d 0 d b + d 0 Eliminates large t-statistics merely from very small s b. 4

19 Beyond linear models It s not always desirable to specify a functional form: perhaps we can do better than linear models? 5

20 Beyond linear models It s not always desirable to specify a functional form: perhaps we can do better than linear models? Such models are a matter of convenience and not honest scientific practice: does ˆβ b really answer our questions? 5

21 Beyond linear models It s not always desirable to specify a functional form: perhaps we can do better than linear models? Such models are a matter of convenience and not honest scientific practice: does ˆβ b really answer our questions? We can do better by using parameters motivated by causal models (n.b., these will reduce to variable importance measures in our case). 5

22 Beyond linear models It s not always desirable to specify a functional form: perhaps we can do better than linear models? Such models are a matter of convenience and not honest scientific practice: does ˆβ b really answer our questions? We can do better by using parameters motivated by causal models (n.b., these will reduce to variable importance measures in our case). As long as the parameters we seek to estimate have asymptotically linear estimators, we can readily apply the approach of moderated statistics. 5

23 Target parameters for complex questions Rather than being satisfied with ˆβ b as an answer to our questions, let s consider a simple target parameter: the average treatment effect (ATE): Ψ b (P 0 ) = E W,0 [E 0 [Y b A = a high,w] E 0 [Y b A = a low,w]] 6

24 Target parameters for complex questions Rather than being satisfied with ˆβ b as an answer to our questions, let s consider a simple target parameter: the average treatment effect (ATE): Ψ b (P 0 ) = E W,0 [E 0 [Y b A = a high,w] E 0 [Y b A = a low,w]] No need to specify a functional form or assume that we know the true data-generating distribution P 0. 6

25 Target parameters for complex questions Rather than being satisfied with ˆβ b as an answer to our questions, let s consider a simple target parameter: the average treatment effect (ATE): Ψ b (P 0 ) = E W,0 [E 0 [Y b A = a high,w] E 0 [Y b A = a low,w]] No need to specify a functional form or assume that we know the true data-generating distribution P 0. Parameters like this can be estimated using targeted minimum loss-based estimation (TMLE). 6

26 Target parameters for complex questions Rather than being satisfied with ˆβ b as an answer to our questions, let s consider a simple target parameter: the average treatment effect (ATE): Ψ b (P 0 ) = E W,0 [E 0 [Y b A = a high,w] E 0 [Y b A = a low,w]] No need to specify a functional form or assume that we know the true data-generating distribution P 0. Parameters like this can be estimated using targeted minimum loss-based estimation (TMLE). Asymptotic linearity: Ψ b (P n) Ψ b (P 0 ) = 1 n n i=1 IC(O i ) + o P ( 1 n ) 6

27 Targeted Minimum Loss-Based Estimation TMLE produces a well-defined, unbiased, efficient substitution estimator of target parameters of a data-generating distribution. 7

28 Targeted Minimum Loss-Based Estimation TMLE produces a well-defined, unbiased, efficient substitution estimator of target parameters of a data-generating distribution. Iterative procedure (though there is a one-step now) that updates an initial estimate of the relevant part (Q 0 ) of the data generating distribution (P 0 ). 7

29 Targeted Minimum Loss-Based Estimation TMLE produces a well-defined, unbiased, efficient substitution estimator of target parameters of a data-generating distribution. Iterative procedure (though there is a one-step now) that updates an initial estimate of the relevant part (Q 0 ) of the data generating distribution (P 0 ). Like corresponding A-IPTW estimators, removes asymptotic residual bias of initial estimator for the target parameter. If it uses a consistent estimator of g 0 (nuisance parameter), it is doubly robust. 7

30 Targeted Minimum Loss-Based Estimation TMLE produces a well-defined, unbiased, efficient substitution estimator of target parameters of a data-generating distribution. Iterative procedure (though there is a one-step now) that updates an initial estimate of the relevant part (Q 0 ) of the data generating distribution (P 0 ). Like corresponding A-IPTW estimators, removes asymptotic residual bias of initial estimator for the target parameter. If it uses a consistent estimator of g 0 (nuisance parameter), it is doubly robust. We can estimate the target parameter: Ψ b (P n) = 1 n n [Q (b,1) n i=1 (A i = a h,w i ) Q (b,1) n (A i = a l,w i )] 7

31 Inference with influence curves The influence ( curve for the estimator is: 1(Ai = a IC b,n (O i ) = h ) g n (a h W i ) 1(A ) i = a l ) g n (a l W i ) (b,1) (Y b,i Q n (A i,w i )) + Q (b,1) n (a l,w i ) Ψ b (P n) Q (b,1) n (a h,w i ) (1) 8

32 Inference with influence curves The influence ( curve for the estimator is: 1(Ai = a IC b,n (O i ) = h ) g n (a h W i ) 1(A ) i = a l ) g n (a l W i ) (b,1) (Y b,i Q n (A i,w i )) + Q (b,1) n (a l,w i ) Ψ b (P n) Sample variance of the influence curve: s 2 (IC n ) = 1 n n i=1 (IC n(o i )) 2 Q (b,1) n (a h,w i ) (1) 8

33 Inference with influence curves The influence ( curve for the estimator is: 1(Ai = a IC b,n (O i ) = h ) g n (a h W i ) 1(A ) i = a l ) g n (a l W i ) (b,1) (Y b,i Q n (A i,w i )) + Q (b,1) n (a l,w i ) Ψ b (P n) Sample variance of the influence curve: s 2 (IC n ) = 1 n n i=1 (IC n(o i )) 2 Q (b,1) n (a h,w i ) Use sample variance to estimate the standard error: s se n = 2 (IC n ) n (1) 8

34 Inference with influence curves The influence ( curve for the estimator is: 1(Ai = a IC b,n (O i ) = h ) g n (a h W i ) 1(A ) i = a l ) g n (a l W i ) (b,1) (Y b,i Q n (A i,w i )) + Q (b,1) n (a l,w i ) Ψ b (P n) Sample variance of the influence curve: s 2 (IC n ) = 1 n n i=1 (IC n(o i )) 2 Q (b,1) n (a h,w i ) Use sample variance to estimate the standard error: (1) s se n = 2 (IC n ) n Use this for inference that is, to derive uncertainty measures (i.e., p-values, confidence intervals). 8

35 Moderated statistics for target parameters One can define a standard t-test statistic for an estimator of an asymptotically linear parameter (over b = 1,...,B) as: n(ψb (P t b = n) Ψ 0 ) s b (IC b,n ) 9

36 Moderated statistics for target parameters One can define a standard t-test statistic for an estimator of an asymptotically linear parameter (over b = 1,...,B) as: n(ψb (P t b = n) Ψ 0 ) s b (IC b,n ) This naturally extends to the moderated t-statistic of Smyth: n(ψb (P t b = n) Ψ 0 ) s b where the posterior estimate of the variance of the influence curve is s 2 b = s2 b (IC b,n)d b + s 2 0 d 0 d b + d 0 9

37 An influence curve transform Need the estimate for each biomarker (b) and the IC for every observation for that biomarker, repeating for all b = 1,...,B. 10

38 An influence curve transform Need the estimate for each biomarker (b) and the IC for every observation for that biomarker, repeating for all b = 1,...,B. Essentially, transform original data matrix such that new entries are: Y b,i = IC b,n(o i ;P n ) + Ψ b (P n) 10

39 An influence curve transform Need the estimate for each biomarker (b) and the IC for every observation for that biomarker, repeating for all b = 1,...,B. Essentially, transform original data matrix such that new entries are: Y b,i = IC b,n(o i ;P n ) + Ψ b (P n) Since E[IC b,n ] = 0 across the columns (units) for each b, the average will be the original estimate Ψ b (P n). 10

40 An influence curve transform Need the estimate for each biomarker (b) and the IC for every observation for that biomarker, repeating for all b = 1,...,B. Essentially, transform original data matrix such that new entries are: Y b,i = IC b,n(o i ;P n ) + Ψ b (P n) Since E[IC b,n ] = 0 across the columns (units) for each b, the average will be the original estimate Ψ b (P n). For simplicity, let s assume the null value is Ψ 0 = 0 for all b. Then, applying the moderated t-test to Y b,i will generate corrected, conservative test statistics t b. 10

41 Why moderated statistics in this context? Often times, such data analyses are based on relatively small samples. 11

42 Why moderated statistics in this context? Often times, such data analyses are based on relatively small samples. To get a data-adaptive estimate, with standard implementation of these estimates, standard errors can be non-robust. 11

43 Why moderated statistics in this context? Often times, such data analyses are based on relatively small samples. To get a data-adaptive estimate, with standard implementation of these estimates, standard errors can be non-robust. Practically, significant estimates of variable importance measures may be driven by poorly and underestimated s 2 b (IC b,n). 11

44 Why moderated statistics in this context? Often times, such data analyses are based on relatively small samples. To get a data-adaptive estimate, with standard implementation of these estimates, standard errors can be non-robust. Practically, significant estimates of variable importance measures may be driven by poorly and underestimated s 2 b (IC b,n). Moderated statistics shrink these s 2 b (IC b,n) (making them bigger), thus taking biomarkers with small parameter estimates but very small s 2 b (IC b,n) out of statistical significance. 11

45 Software implementation: R/biotmle An R package that facilitates biomarker discovery by generalizing the moderated t-statistic of Smyth for use with asymptotically linear parameters. 12

46 Software implementation: R/biotmle An R package that facilitates biomarker discovery by generalizing the moderated t-statistic of Smyth for use with asymptotically linear parameters. Check it out on GitHub: nhejazi/biotmle 12

47 Data analysis with R/biotmle Observational study of the impact of occupational exposure (to benzene), with data collected on 125 subjects and roughly 22, 000 biomarkers. 13

48 Data analysis with R/biotmle Observational study of the impact of occupational exposure (to benzene), with data collected on 125 subjects and roughly 22, 000 biomarkers. Baseline covariates W: age, sex, smoking status; all were discretized. 13

49 Data analysis with R/biotmle Observational study of the impact of occupational exposure (to benzene), with data collected on 125 subjects and roughly 22, 000 biomarkers. Baseline covariates W: age, sex, smoking status; all were discretized. Treatment A is degree of Benzene exposure: none, < 1ppm, and > 5ppm. 13

50 Data analysis with R/biotmle Observational study of the impact of occupational exposure (to benzene), with data collected on 125 subjects and roughly 22, 000 biomarkers. Baseline covariates W: age, sex, smoking status; all were discretized. Treatment A is degree of Benzene exposure: none, < 1ppm, and > 5ppm. Outcome Y is mirna expression, median normalized. 13

51 Data analysis with R/biotmle Observational study of the impact of occupational exposure (to benzene), with data collected on 125 subjects and roughly 22, 000 biomarkers. Baseline covariates W: age, sex, smoking status; all were discretized. Treatment A is degree of Benzene exposure: none, < 1ppm, and > 5ppm. Outcome Y is mirna expression, median normalized. Estimate the parameter: Ψ b (P n) = E[E[Y b A = max(a),w] E[Y b A = min(a),w]] 13

52 Data analysis with R/biotmle Observational study of the impact of occupational exposure (to benzene), with data collected on 125 subjects and roughly 22, 000 biomarkers. Baseline covariates W: age, sex, smoking status; all were discretized. Treatment A is degree of Benzene exposure: none, < 1ppm, and > 5ppm. Outcome Y is mirna expression, median normalized. Estimate the parameter: Ψ b (P n) = E[E[Y b A = max(a),w] E[Y b A = min(a),w]] Apply moderated t-test as previously discussed. 13

53 Analysis results I: Uncorrected tests Histogram of raw p values (applying Limma shrinkage to TMLE results) count magnitude of raw p values 14

54 Analysis results II: Corrected tests Histogram of FDR corrected p values (BH) (applying Limma shrinkage to TMLE results) count magnitude of BH corrected p values 15

55 Analysis results III: Volcano plot 16

56 Analysis results IV: Heatmap of IC estimates 17

57 Review 1. Linear models are the standard approach for analyzing microarray and next-generation sequencing data (e.g., R package limma ). 18

58 Review 1. Linear models are the standard approach for analyzing microarray and next-generation sequencing data (e.g., R package limma ). 2. Moderated statistics help reduce false positives by using an empirical Bayes method to perform standard deviation shrinkage for test statistics. 18

59 Review 1. Linear models are the standard approach for analyzing microarray and next-generation sequencing data (e.g., R package limma ). 2. Moderated statistics help reduce false positives by using an empirical Bayes method to perform standard deviation shrinkage for test statistics. 3. Beyond linear models: we can assess evidence using parameters that are more scientifically interesting (e.g., ATE) by way of TMLE. 18

60 Review 1. Linear models are the standard approach for analyzing microarray and next-generation sequencing data (e.g., R package limma ). 2. Moderated statistics help reduce false positives by using an empirical Bayes method to perform standard deviation shrinkage for test statistics. 3. Beyond linear models: we can assess evidence using parameters that are more scientifically interesting (e.g., ATE) by way of TMLE. 4. The approach of moderated statistics easily extends to the case of asymptotically linear parameters. 18

61 References I Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the royal statistical society. Series B (Methodological), pages

62 References II Smyth, G. K. (2005). Limma: linear models for microarray data. In Bioinformatics and computational biology solutions using R and Bioconductor, pages Springer. Tuglus, C. and van der Laan, M. J. (2011). Targeted methods for biomarker discovery. In Targeted Learning, pages Springer. van der Laan, M. J. and Rose, S. (2011). Targeted learning: causal inference for observational and experimental data. Springer Science & Business Media.

63 Acknowledgments Alan Hubbard Mark van der Laan University of California, Berkeley University of California, Berkeley 21

64 Slides: goo.gl/6ou8yr stat.berkeley.edu/~nhejazi nimahejazi.org github/nhejazi 22

Empirical Bayes Moderation of Asymptotically Linear Parameters

Empirical Bayes Moderation of Asymptotically Linear Parameters Empirical Bayes Moderation of Asymptotically Linear Parameters Nima Hejazi Division of Biostatistics University of California, Berkeley stat.berkeley.edu/~nhejazi nimahejazi.org twitter/@nshejazi github/nhejazi

More information

Targeted Learning for High-Dimensional Variable Importance

Targeted Learning for High-Dimensional Variable Importance Targeted Learning for High-Dimensional Variable Importance Alan Hubbard, Nima Hejazi, Wilson Cai, Anna Decker Division of Biostatistics University of California, Berkeley July 27, 2016 for Centre de Recherches

More information

Fair Inference Through Semiparametric-Efficient Estimation Over Constraint-Specific Paths

Fair Inference Through Semiparametric-Efficient Estimation Over Constraint-Specific Paths Fair Inference Through Semiparametric-Efficient Estimation Over Constraint-Specific Paths for New Developments in Nonparametric and Semiparametric Statistics, Joint Statistical Meetings; Vancouver, BC,

More information

Statistical Inference for Data Adaptive Target Parameters

Statistical Inference for Data Adaptive Target Parameters Statistical Inference for Data Adaptive Target Parameters Mark van der Laan, Alan Hubbard Division of Biostatistics, UC Berkeley December 13, 2013 Mark van der Laan, Alan Hubbard ( Division of Biostatistics,

More information

Targeted Maximum Likelihood Estimation in Safety Analysis

Targeted Maximum Likelihood Estimation in Safety Analysis Targeted Maximum Likelihood Estimation in Safety Analysis Sam Lendle 1 Bruce Fireman 2 Mark van der Laan 1 1 UC Berkeley 2 Kaiser Permanente ISPE Advanced Topics Session, Barcelona, August 2012 1 / 35

More information

Linear Models and Empirical Bayes Methods for. Assessing Differential Expression in Microarray Experiments

Linear Models and Empirical Bayes Methods for. Assessing Differential Expression in Microarray Experiments Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments by Gordon K. Smyth (as interpreted by Aaron J. Baraff) STAT 572 Intro Talk April 10, 2014 Microarray

More information

Causal Inference Basics

Causal Inference Basics Causal Inference Basics Sam Lendle October 09, 2013 Observed data, question, counterfactuals Observed data: n i.i.d copies of baseline covariates W, treatment A {0, 1}, and outcome Y. O i = (W i, A i,

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2009 Paper 251 Nonparametric population average models: deriving the form of approximate population

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2011 Paper 290 Targeted Minimum Loss Based Estimation of an Intervention Specific Mean Outcome Mark

More information

Statistical testing. Samantha Kleinberg. October 20, 2009

Statistical testing. Samantha Kleinberg. October 20, 2009 October 20, 2009 Intro to significance testing Significance testing and bioinformatics Gene expression: Frequently have microarray data for some group of subjects with/without the disease. Want to find

More information

Lesson 11. Functional Genomics I: Microarray Analysis

Lesson 11. Functional Genomics I: Microarray Analysis Lesson 11 Functional Genomics I: Microarray Analysis Transcription of DNA and translation of RNA vary with biological conditions 3 kinds of microarray platforms Spotted Array - 2 color - Pat Brown (Stanford)

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2010 Paper 260 Collaborative Targeted Maximum Likelihood For Time To Event Data Ori M. Stitelman Mark

More information

Application of Time-to-Event Methods in the Assessment of Safety in Clinical Trials

Application of Time-to-Event Methods in the Assessment of Safety in Clinical Trials Application of Time-to-Event Methods in the Assessment of Safety in Clinical Trials Progress, Updates, Problems William Jen Hoe Koh May 9, 2013 Overview Marginal vs Conditional What is TMLE? Key Estimation

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2011 Paper 288 Targeted Maximum Likelihood Estimation of Natural Direct Effect Wenjing Zheng Mark J.

More information

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data?

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data? When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data? Kosuke Imai Department of Politics Center for Statistics and Machine Learning Princeton University Joint

More information

High-throughput Testing

High-throughput Testing High-throughput Testing Noah Simon and Richard Simon July 2016 1 / 29 Testing vs Prediction On each of n patients measure y i - single binary outcome (eg. progression after a year, PCR) x i - p-vector

More information

Data Adaptive Estimation of the Treatment Specific Mean in Causal Inference R-package cvdsa

Data Adaptive Estimation of the Treatment Specific Mean in Causal Inference R-package cvdsa Data Adaptive Estimation of the Treatment Specific Mean in Causal Inference R-package cvdsa Yue Wang Division of Biostatistics Nov. 2004 Nov. 8, 2004 1 Outlines Introduction: Data structure and Marginal

More information

SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION

SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION Johns Hopkins University, Dept. of Biostatistics Working Papers 3-3-2011 SIMPLE EXAMPLES OF ESTIMATING CAUSAL EFFECTS USING TARGETED MAXIMUM LIKELIHOOD ESTIMATION Michael Rosenblum Johns Hopkins Bloomberg

More information

e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls

e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls under the restrictions of the copyright, in particular

More information

The International Journal of Biostatistics

The International Journal of Biostatistics The International Journal of Biostatistics Volume 2, Issue 1 2006 Article 2 Statistical Inference for Variable Importance Mark J. van der Laan, Division of Biostatistics, School of Public Health, University

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data? When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data? Kosuke Imai Department of Politics Center for Statistics and Machine Learning Princeton University

More information

Targeted Maximum Likelihood Estimation for Adaptive Designs: Adaptive Randomization in Community Randomized Trial

Targeted Maximum Likelihood Estimation for Adaptive Designs: Adaptive Randomization in Community Randomized Trial Targeted Maximum Likelihood Estimation for Adaptive Designs: Adaptive Randomization in Community Randomized Trial Mark J. van der Laan 1 University of California, Berkeley School of Public Health laan@berkeley.edu

More information

Low-Level Analysis of High- Density Oligonucleotide Microarray Data

Low-Level Analysis of High- Density Oligonucleotide Microarray Data Low-Level Analysis of High- Density Oligonucleotide Microarray Data Ben Bolstad http://www.stat.berkeley.edu/~bolstad Biostatistics, University of California, Berkeley UC Berkeley Feb 23, 2004 Outline

More information

Statistics for Differential Expression in Sequencing Studies. Naomi Altman

Statistics for Differential Expression in Sequencing Studies. Naomi Altman Statistics for Differential Expression in Sequencing Studies Naomi Altman naomi@stat.psu.edu Outline Preliminaries what you need to do before the DE analysis Stat Background what you need to know to understand

More information

STAT 263/363: Experimental Design Winter 2016/17. Lecture 1 January 9. Why perform Design of Experiments (DOE)? There are at least two reasons:

STAT 263/363: Experimental Design Winter 2016/17. Lecture 1 January 9. Why perform Design of Experiments (DOE)? There are at least two reasons: STAT 263/363: Experimental Design Winter 206/7 Lecture January 9 Lecturer: Minyong Lee Scribe: Zachary del Rosario. Design of Experiments Why perform Design of Experiments (DOE)? There are at least two

More information

Integrated approaches for analysis of cluster randomised trials

Integrated approaches for analysis of cluster randomised trials Integrated approaches for analysis of cluster randomised trials Invited Session 4.1 - Recent developments in CRTs Joint work with L. Turner, F. Li, J. Gallis and D. Murray Mélanie PRAGUE - SCT 2017 - Liverpool

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2011 Paper 282 Super Learner Based Conditional Density Estimation with Application to Marginal Structural

More information

Selective Inference for Effect Modification

Selective Inference for Effect Modification Inference for Modification (Joint work with Dylan Small and Ashkan Ertefaie) Department of Statistics, University of Pennsylvania May 24, ACIC 2017 Manuscript and slides are available at http://www-stat.wharton.upenn.edu/~qyzhao/.

More information

False discovery rate and related concepts in multiple comparisons problems, with applications to microarray data

False discovery rate and related concepts in multiple comparisons problems, with applications to microarray data False discovery rate and related concepts in multiple comparisons problems, with applications to microarray data Ståle Nygård Trial Lecture Dec 19, 2008 1 / 35 Lecture outline Motivation for not using

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2015 Paper 334 Targeted Estimation and Inference for the Sample Average Treatment Effect Laura B. Balzer

More information

Robust Semiparametric Regression Estimation Using Targeted Maximum Likelihood with Application to Biomarker Discovery and Epidemiology

Robust Semiparametric Regression Estimation Using Targeted Maximum Likelihood with Application to Biomarker Discovery and Epidemiology Robust Semiparametric Regression Estimation Using Targeted Maximum Likelihood with Application to Biomarker Discovery and Epidemiology by Catherine Ann Tuglus A dissertation submitted in partial satisfaction

More information

Zhiguang Huo 1, Chi Song 2, George Tseng 3. July 30, 2018

Zhiguang Huo 1, Chi Song 2, George Tseng 3. July 30, 2018 Bayesian latent hierarchical model for transcriptomic meta-analysis to detect biomarkers with clustered meta-patterns of differential expression signals BayesMP Zhiguang Huo 1, Chi Song 2, George Tseng

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2009 Paper 248 Application of Time-to-Event Methods in the Assessment of Safety in Clinical Trials Kelly

More information

Matching Methods for Observational Microarray Studies

Matching Methods for Observational Microarray Studies Bioinformatics Advance Access published December 19, 2008 Matching Methods for Observational Microarray Studies Ruth Heller 1,, Elisabetta Manduchi 2 and Dylan Small 1 1 Department of Statistics, Wharton

More information

Multiple Testing. Hoang Tran. Department of Statistics, Florida State University

Multiple Testing. Hoang Tran. Department of Statistics, Florida State University Multiple Testing Hoang Tran Department of Statistics, Florida State University Large-Scale Testing Examples: Microarray data: testing differences in gene expression between two traits/conditions Microbiome

More information

Advanced Statistical Methods: Beyond Linear Regression

Advanced Statistical Methods: Beyond Linear Regression Advanced Statistical Methods: Beyond Linear Regression John R. Stevens Utah State University Notes 3. Statistical Methods II Mathematics Educators Worshop 28 March 2009 1 http://www.stat.usu.edu/~jrstevens/pcmi

More information

Data splitting. INSERM Workshop: Evaluation of predictive models: goodness-of-fit and predictive power #+TITLE:

Data splitting. INSERM Workshop: Evaluation of predictive models: goodness-of-fit and predictive power #+TITLE: #+TITLE: Data splitting INSERM Workshop: Evaluation of predictive models: goodness-of-fit and predictive power #+AUTHOR: Thomas Alexander Gerds #+INSTITUTE: Department of Biostatistics, University of Copenhagen

More information

2.1 Linear regression with matrices

2.1 Linear regression with matrices 21 Linear regression with matrices The values of the independent variables are united into the matrix X (design matrix), the values of the outcome and the coefficient are represented by the vectors Y and

More information

Modern Statistical Learning Methods for Observational Data and Applications to Comparative Effectiveness Research

Modern Statistical Learning Methods for Observational Data and Applications to Comparative Effectiveness Research Modern Statistical Learning Methods for Observational Data and Applications to Comparative Effectiveness Research Chapter 4: Efficient, doubly-robust estimation of an average treatment effect David Benkeser

More information

Statistical Applications in Genetics and Molecular Biology

Statistical Applications in Genetics and Molecular Biology Statistical Applications in Genetics and Molecular Biology Volume 6, Issue 1 2007 Article 28 A Comparison of Methods to Control Type I Errors in Microarray Studies Jinsong Chen Mark J. van der Laan Martyn

More information

Vector-Based Kernel Weighting: A Simple Estimator for Improving Precision and Bias of Average Treatment Effects in Multiple Treatment Settings

Vector-Based Kernel Weighting: A Simple Estimator for Improving Precision and Bias of Average Treatment Effects in Multiple Treatment Settings Vector-Based Kernel Weighting: A Simple Estimator for Improving Precision and Bias of Average Treatment Effects in Multiple Treatment Settings Jessica Lum, MA 1 Steven Pizer, PhD 1, 2 Melissa Garrido,

More information

36-463/663: Multilevel & Hierarchical Models

36-463/663: Multilevel & Hierarchical Models 36-463/663: Multilevel & Hierarchical Models (P)review: in-class midterm Brian Junker 132E Baker Hall brian@stat.cmu.edu 1 In-class midterm Closed book, closed notes, closed electronics (otherwise I have

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2004 Paper 164 Multiple Testing Procedures: R multtest Package and Applications to Genomics Katherine

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

SIGNAL RANKING-BASED COMPARISON OF AUTOMATIC DETECTION METHODS IN PHARMACOVIGILANCE

SIGNAL RANKING-BASED COMPARISON OF AUTOMATIC DETECTION METHODS IN PHARMACOVIGILANCE SIGNAL RANKING-BASED COMPARISON OF AUTOMATIC DETECTION METHODS IN PHARMACOVIGILANCE A HYPOTHESIS TEST APPROACH Ismaïl Ahmed 1,2, Françoise Haramburu 3,4, Annie Fourrier-Réglat 3,4,5, Frantz Thiessard 4,5,6,

More information

Deductive Derivation and Computerization of Semiparametric Efficient Estimation

Deductive Derivation and Computerization of Semiparametric Efficient Estimation Deductive Derivation and Computerization of Semiparametric Efficient Estimation Constantine Frangakis, Tianchen Qian, Zhenke Wu, and Ivan Diaz Department of Biostatistics Johns Hopkins Bloomberg School

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2008 Paper 241 A Note on Risk Prediction for Case-Control Studies Sherri Rose Mark J. van der Laan Division

More information

Combining multiple observational data sources to estimate causal eects

Combining multiple observational data sources to estimate causal eects Department of Statistics, North Carolina State University Combining multiple observational data sources to estimate causal eects Shu Yang* syang24@ncsuedu Joint work with Peng Ding UC Berkeley May 23,

More information

Bayesian regression tree models for causal inference: regularization, confounding and heterogeneity

Bayesian regression tree models for causal inference: regularization, confounding and heterogeneity Bayesian regression tree models for causal inference: regularization, confounding and heterogeneity P. Richard Hahn, Jared Murray, and Carlos Carvalho June 22, 2017 The problem setting We want to estimate

More information

Manual: R package HTSmix

Manual: R package HTSmix Manual: R package HTSmix Olga Vitek and Danni Yu May 2, 2011 1 Overview High-throughput screens (HTS) measure phenotypes of thousands of biological samples under various conditions. The phenotypes are

More information

Summary and discussion of: Controlling the False Discovery Rate via Knockoffs

Summary and discussion of: Controlling the False Discovery Rate via Knockoffs Summary and discussion of: Controlling the False Discovery Rate via Knockoffs Statistics Journal Club, 36-825 Sangwon Justin Hyun and William Willie Neiswanger 1 Paper Summary 1.1 Quick intuitive summary

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Step-down FDR Procedures for Large Numbers of Hypotheses

Step-down FDR Procedures for Large Numbers of Hypotheses Step-down FDR Procedures for Large Numbers of Hypotheses Paul N. Somerville University of Central Florida Abstract. Somerville (2004b) developed FDR step-down procedures which were particularly appropriate

More information

Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control

Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control Xiaoquan Wen Department of Biostatistics, University of Michigan A Model

More information

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Joe Schafer Office of the Associate Director for Research and Methodology U.S. Census

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2010 Paper 269 Diagnosing and Responding to Violations in the Positivity Assumption Maya L. Petersen

More information

MSA220/MVE440 Statistical Learning for Big Data

MSA220/MVE440 Statistical Learning for Big Data MSA220/MVE440 Statistical Learning for Big Data Lecture 9-10 - High-dimensional regression Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Recap from

More information

High-dimensional regression

High-dimensional regression High-dimensional regression Advanced Methods for Data Analysis 36-402/36-608) Spring 2014 1 Back to linear regression 1.1 Shortcomings Suppose that we are given outcome measurements y 1,... y n R, and

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2004 Paper 155 Estimation of Direct and Indirect Causal Effects in Longitudinal Studies Mark J. van

More information

Statistical Applications in Genetics and Molecular Biology

Statistical Applications in Genetics and Molecular Biology Statistical Applications in Genetics and Molecular Biology Volume 3, Issue 1 2004 Article 13 Multiple Testing. Part I. Single-Step Procedures for Control of General Type I Error Rates Sandrine Dudoit Mark

More information

Parametric Empirical Bayes Methods for Microarrays

Parametric Empirical Bayes Methods for Microarrays Parametric Empirical Bayes Methods for Microarrays Ming Yuan, Deepayan Sarkar, Michael Newton and Christina Kendziorski April 30, 2018 Contents 1 Introduction 1 2 General Model Structure: Two Conditions

More information

Targeted Maximum Likelihood Estimation for Dynamic Treatment Regimes in Sequential Randomized Controlled Trials

Targeted Maximum Likelihood Estimation for Dynamic Treatment Regimes in Sequential Randomized Controlled Trials From the SelectedWorks of Paul H. Chaffee June 22, 2012 Targeted Maximum Likelihood Estimation for Dynamic Treatment Regimes in Sequential Randomized Controlled Trials Paul Chaffee Mark J. van der Laan

More information

Causal inference in epidemiological practice

Causal inference in epidemiological practice Causal inference in epidemiological practice Willem van der Wal Biostatistics, Julius Center UMC Utrecht June 5, 2 Overview Introduction to causal inference Marginal causal effects Estimating marginal

More information

DEGseq: an R package for identifying differentially expressed genes from RNA-seq data

DEGseq: an R package for identifying differentially expressed genes from RNA-seq data DEGseq: an R package for identifying differentially expressed genes from RNA-seq data Likun Wang Zhixing Feng i Wang iaowo Wang * and uegong Zhang * MOE Key Laboratory of Bioinformatics and Bioinformatics

More information

A Sampling of IMPACT Research:

A Sampling of IMPACT Research: A Sampling of IMPACT Research: Methods for Analysis with Dropout and Identifying Optimal Treatment Regimes Marie Davidian Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall 1 Structural Nested Mean Models for Assessing Time-Varying Effect Moderation Daniel Almirall Center for Health Services Research, Durham VAMC & Duke University Medical, Dept. of Biostatistics Joint work

More information

Non-specific filtering and control of false positives

Non-specific filtering and control of false positives Non-specific filtering and control of false positives Richard Bourgon 16 June 2009 bourgon@ebi.ac.uk EBI is an outstation of the European Molecular Biology Laboratory Outline Multiple testing I: overview

More information

R Lab 6 - Inference. Harvard University. From the SelectedWorks of Laura B. Balzer

R Lab 6 - Inference. Harvard University. From the SelectedWorks of Laura B. Balzer Harvard University From the SelectedWorks of Laura B. Balzer Fall 2013 R Lab 6 - Inference Laura Balzer, University of California, Berkeley Maya Petersen, University of California, Berkeley Alexander Luedtke,

More information

Confounder Adjustment in Multiple Hypothesis Testing

Confounder Adjustment in Multiple Hypothesis Testing in Multiple Hypothesis Testing Department of Statistics, Stanford University January 28, 2016 Slides are available at http://web.stanford.edu/~qyzhao/. Collaborators Jingshu Wang Trevor Hastie Art Owen

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2010 Paper 259 Targeted Maximum Likelihood Based Causal Inference Mark J. van der Laan University of

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2005 Paper 191 Population Intervention Models in Causal Inference Alan E. Hubbard Mark J. van der Laan

More information

Doing Right By Massive Data: How To Bring Probability Modeling To The Analysis Of Huge Datasets Without Taking Over The Datacenter

Doing Right By Massive Data: How To Bring Probability Modeling To The Analysis Of Huge Datasets Without Taking Over The Datacenter Doing Right By Massive Data: How To Bring Probability Modeling To The Analysis Of Huge Datasets Without Taking Over The Datacenter Alexander W Blocker Pavlos Protopapas Xiao-Li Meng 9 February, 2010 Outline

More information

Expression Data Exploration: Association, Patterns, Factors & Regression Modelling

Expression Data Exploration: Association, Patterns, Factors & Regression Modelling Expression Data Exploration: Association, Patterns, Factors & Regression Modelling Exploring gene expression data Scale factors, median chip correlation on gene subsets for crude data quality investigation

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2014 Paper 327 Entering the Era of Data Science: Targeted Learning and the Integration of Statistics

More information

Construction and statistical analysis of adaptive group sequential designs for randomized clinical trials

Construction and statistical analysis of adaptive group sequential designs for randomized clinical trials Construction and statistical analysis of adaptive group sequential designs for randomized clinical trials Antoine Chambaz (MAP5, Université Paris Descartes) joint work with Mark van der Laan Atelier INSERM

More information

Estimation of the False Discovery Rate

Estimation of the False Discovery Rate Estimation of the False Discovery Rate Coffee Talk, Bioinformatics Research Center, Sept, 2005 Jason A. Osborne, osborne@stat.ncsu.edu Department of Statistics, North Carolina State University 1 Outline

More information

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population

More information

AN EVALUATION OF PARAMETRIC AND NONPARAMETRIC VARIANCE ESTIMATORS IN COMPLETELY RANDOMIZED EXPERIMENTS. Stanley A. Lubanski. and. Peter M.

AN EVALUATION OF PARAMETRIC AND NONPARAMETRIC VARIANCE ESTIMATORS IN COMPLETELY RANDOMIZED EXPERIMENTS. Stanley A. Lubanski. and. Peter M. AN EVALUATION OF PARAMETRIC AND NONPARAMETRIC VARIANCE ESTIMATORS IN COMPLETELY RANDOMIZED EXPERIMENTS by Stanley A. Lubanski and Peter M. Steiner UNIVERSITY OF WISCONSIN-MADISON 018 Background To make

More information

Cross-Sectional Regression after Factor Analysis: Two Applications

Cross-Sectional Regression after Factor Analysis: Two Applications al Regression after Factor Analysis: Two Applications Joint work with Jingshu, Trevor, Art; Yang Song (GSB) May 7, 2016 Overview 1 2 3 4 1 / 27 Outline 1 2 3 4 2 / 27 Data matrix Y R n p Panel data. Transposable

More information

Statistical tests for differential expression in count data (1)

Statistical tests for differential expression in count data (1) Statistical tests for differential expression in count data (1) NBIC Advanced RNA-seq course 25-26 August 2011 Academic Medical Center, Amsterdam The analysis of a microarray experiment Pre-process image

More information

Propensity Score Methods for Causal Inference

Propensity Score Methods for Causal Inference John Pura BIOS790 October 2, 2015 Causal inference Philosophical problem, statistical solution Important in various disciplines (e.g. Koch s postulates, Bradford Hill criteria, Granger causality) Good

More information

A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure

A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure arxiv:1706.02675v2 [stat.me] 2 Apr 2018 Laura B. Balzer, Wenjing Zheng,

More information

High-Throughput Sequencing Course

High-Throughput Sequencing Course High-Throughput Sequencing Course DESeq Model for RNA-Seq Biostatistics and Bioinformatics Summer 2017 Outline Review: Standard linear regression model (e.g., to model gene expression as function of an

More information

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall 1 Structural Nested Mean Models for Assessing Time-Varying Effect Moderation Daniel Almirall Center for Health Services Research, Durham VAMC & Dept. of Biostatistics, Duke University Medical Joint work

More information

Identification and Inference in Causal Mediation Analysis

Identification and Inference in Causal Mediation Analysis Identification and Inference in Causal Mediation Analysis Kosuke Imai Luke Keele Teppei Yamamoto Princeton University Ohio State University November 12, 2008 Kosuke Imai (Princeton) Causal Mediation Analysis

More information

Robust statistics. Michael Love 7/10/2016

Robust statistics. Michael Love 7/10/2016 Robust statistics Michael Love 7/10/2016 Robust topics Median MAD Spearman Wilcoxon rank test Weighted least squares Cook's distance M-estimators Robust topics Median => middle MAD => spread Spearman =>

More information

Exam: high-dimensional data analysis January 20, 2014

Exam: high-dimensional data analysis January 20, 2014 Exam: high-dimensional data analysis January 20, 204 Instructions: - Write clearly. Scribbles will not be deciphered. - Answer each main question not the subquestions on a separate piece of paper. - Finish

More information

Data Mining Stat 588

Data Mining Stat 588 Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 12: Frequentist properties of estimators (v4) Ramesh Johari ramesh.johari@stanford.edu 1 / 39 Frequentist inference 2 / 39 Thinking like a frequentist Suppose that for some

More information

Mediation Analysis for Health Disparities Research

Mediation Analysis for Health Disparities Research Mediation Analysis for Health Disparities Research Ashley I Naimi, PhD Oct 27 2016 @ashley_naimi wwwashleyisaacnaimicom ashleynaimi@pittedu Orientation 24 Numbered Equations Slides at: wwwashleyisaacnaimicom/slides

More information

Biostatistics Advanced Methods in Biostatistics IV

Biostatistics Advanced Methods in Biostatistics IV Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu Lecture 11 1 / 44 Tip + Paper Tip: Two today: (1) Graduate school

More information

Collaborative Targeted Maximum Likelihood Estimation. Susan Gruber

Collaborative Targeted Maximum Likelihood Estimation. Susan Gruber Collaborative Targeted Maximum Likelihood Estimation by Susan Gruber A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Biostatistics in the

More information

Limma = linear models for microarray data. Linear models and Limma. Expression measures. Questions of Interest. y ga = log 2 (R/G) y ga =

Limma = linear models for microarray data. Linear models and Limma. Expression measures. Questions of Interest. y ga = log 2 (R/G) y ga = Linear models and Limma Københavns Universitet, 19 August 2009 Mark D. Robinson Bioinformatics, Walter+Eliza Hall Institute Epigenetics Laboratory, Garvan Institute (with many slides taken from Gordon

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2015 Paper 341 The Statistics of Sensitivity Analyses Alexander R. Luedtke Ivan Diaz Mark J. van der

More information

Differential Expression Analysis Techniques for Single-Cell RNA-seq Experiments

Differential Expression Analysis Techniques for Single-Cell RNA-seq Experiments Differential Expression Analysis Techniques for Single-Cell RNA-seq Experiments for the Computational Biology Doctoral Seminar (CMPBIO 293), organized by N. Yosef & T. Ashuach, Spring 2018, UC Berkeley

More information

WU Weiterbildung. Linear Mixed Models

WU Weiterbildung. Linear Mixed Models Linear Mixed Effects Models WU Weiterbildung SLIDE 1 Outline 1 Estimation: ML vs. REML 2 Special Models On Two Levels Mixed ANOVA Or Random ANOVA Random Intercept Model Random Coefficients Model Intercept-and-Slopes-as-Outcomes

More information

Research Article Sample Size Calculation for Controlling False Discovery Proportion

Research Article Sample Size Calculation for Controlling False Discovery Proportion Probability and Statistics Volume 2012, Article ID 817948, 13 pages doi:10.1155/2012/817948 Research Article Sample Size Calculation for Controlling False Discovery Proportion Shulian Shang, 1 Qianhe Zhou,

More information

David M. Rocke Division of Biostatistics and Department of Biomedical Engineering University of California, Davis

David M. Rocke Division of Biostatistics and Department of Biomedical Engineering University of California, Davis David M. Rocke Division of Biostatistics and Department of Biomedical Engineering University of California, Davis March 18, 2016 UVA Seminar RNA Seq 1 RNA Seq Gene expression is the transcription of the

More information