
1 Targeted Learning Sherri Rose Associate Professor Department of Health Care Policy Harvard Medical School Slides: drsherrirose.com/short-courses Code: github.com/sherrirose/cncshortcourse April 24, 2017

2 Goals
1 Understand shortcomings of parametric regression-based techniques for the estimation of prediction and causal effect quantities
2 Be introduced to the ideas behind machine learning approaches as tools for confronting the curse of dimensionality
3 Become familiar with the properties and basic implementation of the super learner for prediction and TMLE for effect estimation

3 [Motivation]

4 [Slide shows the first page of: Ioannidis JPA (2005) Why most published research findings are false. PLoS Med 2(8): e124. Key argument: given the pre-study odds R of a true relationship, power 1 − β, and significance level α, the post-study probability that a claimed finding is true is the positive predictive value PPV = (1 − β)R / (R − βR + α); for most study designs and settings this makes a research claim more likely to be false than true.]


6

7 Electronic Health Databases The increasing availability of electronic medical records offers a new resource to public health researchers. The general usefulness of this type of data to answer targeted scientific research questions is an open question. We need novel statistical methods that have desirable statistical properties while remaining computationally feasible.

8 Electronic Health Databases FDA's Sentinel Initiative aims to monitor drugs and medical devices for safety over time; it already has access to 100 million people and their medical records. The $3 million Heritage Health Prize Competition, where the goal was to predict future hospitalizations using existing high-dimensional patient data.

9 Electronic Health Databases The Truven MarketScan database contains information on enrollment and claims from private health plans and employers. The Health Insurance Marketplace has enrolled over 10 million people.

10 High Dimensional Big Data Parametric Regression Often dozens, hundreds, or even thousands of potential variables

11 High Dimensional Big Data Parametric Regression Often dozens, hundreds, or even thousands of potential variables Impossible challenge to correctly specify the parametric regression

12 High Dimensional Big Data Parametric Regression Often dozens, hundreds, or even thousands of potential variables Impossible challenge to correctly specify the parametric regression May have more unknown parameters than observations

13 High Dimensional Big Data Parametric Regression Often dozens, hundreds, or even thousands of potential variables Impossible challenge to correctly specify the parametric regression May have more unknown parameters than observations The true functional form might be described by a complex function not easily approximated by main terms or interaction terms

14 Complications of Human Art in Big Data Statistics
1 Fit several parametric models; select a favorite one
2 Parametric model is misspecified
3 Target parameter is interpreted as if the parametric model is correct
4 Parametric model is often data-adaptively (or worse!) built, and this part of the estimation procedure is not accounted for in the variance

15 Estimation is a Science
1 Data: realizations of random variables with a probability distribution
2 Statistical Model: actual knowledge about the shape of the data-generating probability distribution
3 Statistical Target Parameter: a feature/function of the data-generating probability distribution
4 Estimator: an a priori-specified algorithm, benchmarked by a dissimilarity measure (e.g., MSE) with respect to the target parameter

16 Roadmap for Effect Estimation How do we translate the results from studies? How do we take the information in the data and draw effective conclusions?
Define the Research Question: Specify Data, Specify Model, Specify the Parameter of Interest
Estimate the Target Parameter
Inference: Standard Errors / CIs, Interpretation

17 Targeted Learning in Nonparametric Models Parametric MLE is not targeted for effect parameters; a subsequent targeted bias-reduction step is needed. Targeted Learning: Avoid reliance on human art and unrealistic parametric models. Define interesting parameters. Target the parameter of interest. Incorporate machine learning. Statistical inference.

18 Targeted Learning Super Learner Allows researchers to use multiple algorithms to outperform a single algorithm in nonparametric statistical models Builds weighted combination of estimators where weights are optimized based on loss-function specific cross-validation to guarantee best overall fit Targeted Maximum Likelihood Estimation With an initial estimate of the outcome regression, the second stage of TMLE updates this initial fit in a step targeted toward making an optimal bias-variance tradeoff for the parameter of interest

19 TMLE for Causal Effects TMLE: Double Robust Removes asymptotic residual bias of initial estimator for the target parameter, if it uses a consistent estimator of censoring/treatment mechanism g 0 If initial estimator was consistent for the target parameter, the additional fitting of the data in the targeting step may remove finite sample bias, and preserves consistency property of the initial estimator TMLE: Efficiency If the initial estimator and the estimator of g 0 are both consistent, then it is also asymptotically efficient according to semi-parametric statistical model efficiency theory

20 [Defining the Research Question]

21 Learning from Data Just what type of studies are we conducting? The often-quoted ideal experiment is one that cannot be conducted in real life. [Figure: in the IDEAL EXPERIMENT, each subject is seen both EXPOSED and UNEXPOSED; in the REAL-WORLD STUDY, each subject is observed under only one condition.]

22 Data Random variable O, observed n times, could be defined in a simple case as O = (W, A, Y) ~ P_0 if we are without common issues such as missingness and censoring. W: vector of covariates. A: exposure or treatment. Y: outcome. This data structure makes for effective examples, but data structures found in practice are frequently more complicated.

23 Data: Censoring & Missingness Define O = (W, A, T̃, Δ) ~ P_0. T: time to event Y. C: censoring time. T̃ = min(T, C): represents the T or C that was observed first. Δ = I(T̃ = T) = I(T ≤ C): indicator that T was observed at or before C. Define O = (W, A, Δ, ΔY) ~ P_0. Δ: indicator of missingness.

24 Model General case: Observe n i.i.d. copies of random variable O with probability distribution P_0. The data-generating distribution P_0 is also known to be an element of a statistical model M: P_0 ∈ M. A statistical model M is the set of possible probability distributions for P_0; it is a collection of probability distributions. If all we know is that we have n i.i.d. copies of O, this can be our statistical model, which we call a nonparametric statistical model.

25 Model A statistical model can be augmented with additional (nontestable causal) assumptions, allowing one to enrich the interpretation of Ψ(P_0). This does not change the statistical model.

26 Target Parameters Define the parameter of the probability distribution P as a function of P: Ψ(P).
ψ_RD = Ψ_RD(P) = E_W[E(Y | A = 1, W) − E(Y | A = 0, W)] = E(Y_1) − E(Y_0) = P(Y_1 = 1) − P(Y_0 = 1)
and
ψ_RR = P(Y_1 = 1) / P(Y_0 = 1)
ψ_OR = [P(Y_1 = 1) P(Y_0 = 0)] / [P(Y_1 = 0) P(Y_0 = 1)]
Y is the outcome, A the exposure, and W baseline covariates.
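To make these mappings concrete, a minimal R sketch (with made-up counterfactual outcome probabilities, purely for illustration) that evaluates all three parameters:

p1 <- 0.30  # hypothetical P(Y_1 = 1)
p0 <- 0.40  # hypothetical P(Y_0 = 1)
psi_RD <- p1 - p0                             # risk difference
psi_RR <- p1 / p0                             # relative risk
psi_OR <- (p1 * (1 - p0)) / ((1 - p1) * p0)   # odds ratio
c(RD = psi_RD, RR = psi_RR, OR = psi_OR)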

27 Effect Estimation vs Prediction Both effect and prediction research questions are inherently estimation questions, but they are distinct in their goals

28 Effect Estimation vs Prediction Both effect and prediction research questions are inherently estimation questions, but they are distinct in their goals Effect: Interested in estimating the effect of exposure on outcome adjusted for covariates

29 Effect Estimation vs Prediction Both effect and prediction research questions are inherently estimation questions, but they are distinct in their goals Effect: Interested in estimating the effect of exposure on outcome adjusted for covariates Prediction: Interested in generating a function to input covariates and predict a value for the outcome

30 [Prediction with Super Learning]

31 Prediction Standard practice involves assuming a parametric statistical model & using maximum likelihood to estimate the parameters in that statistical model

32 Prediction: The Goal Flexible algorithm to estimate the regression function E_0(Y | W). Y: outcome. W: covariates.

33 Prediction: Big Picture Machine learning aims to smooth over the data and make fewer assumptions.

34 Prediction: Big Picture A purely nonparametric model with high-dimensional data? p > n! Data sparsity.

35 Nonparametric Prediction Example: Local Averaging Local averaging of the outcome Y within covariate neighborhoods Neighborhoods are bins for observations that are close in value The number of neighborhoods will determine the smoothness of our regression function How do you choose the size of these neighborhoods?

36 Nonparametric Prediction Example: Local Averaging Local averaging of the outcome Y within covariate neighborhoods Neighborhoods are bins for observations that are close in value The number of neighborhoods will determine the smoothness of our regression function How do you choose the size of these neighborhoods? This becomes a bias-variance trade-off question Many small neighborhoods: high variance since some neighborhoods will be empty or contain few observations Few large neighborhoods: biased estimates if neighborhoods fail to capture the complexity of data
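A minimal R sketch of this trade-off on simulated data with one covariate; the data-generating function and bin counts below are arbitrary illustrative choices:

set.seed(1)
n <- 500
W <- runif(n)
Y <- sin(2 * pi * W) + rnorm(n, sd = 0.5)   # smooth truth plus noise

local_average <- function(W, Y, n_bins) {
  bins <- cut(W, breaks = n_bins)           # covariate neighborhoods
  means <- tapply(Y, bins, mean)            # average outcome within each bin
  unname(means[bins])                       # fitted value = own-bin average
}

fit_few  <- local_average(W, Y, 5)          # few large neighborhoods: more bias
fit_many <- local_average(W, Y, 100)        # many small neighborhoods: more variance
mean((fit_few  - sin(2 * pi * W))^2, na.rm = TRUE)
mean((fit_many - sin(2 * pi * W))^2, na.rm = TRUE)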

37 Prediction: A Problem If the true data-generating distribution is very smooth, a misspecified parametric regression might beat the nonparametric estimator. How will you know? We want a flexible estimator that is consistent, but in some cases it may lose to a misspecified parametric estimator because it is more variable.

38 Prediction: Options? Recent studies for prediction have employed newer algorithms (any mapping from data to a predictor)

39 Prediction: Options? Recent studies for prediction have employed newer algorithms. Researchers are then left with questions, e.g., when should I use random forest instead of standard regression techniques?

42 Prediction: Key Concepts Loss-Based Estimation: Use loss functions to define the best estimator of E_0(Y | W) & evaluate it. Cross-Validation: Available data is partitioned to train and validate our estimators. Flexible Estimation: Allow data to drive estimates, but in an honest (cross-validated) way. These are detailed topics; we'll cover core concepts.

43 Loss-Based Estimation Data structure is O = (W, Y) ~ P_0, with empirical distribution P_n which places probability 1/n on each observed O_i, i = 1, …, n. A loss function assigns a measure of performance to a candidate function Q̄ = E(Y | W) when applied to an observation O.

44 Formalizing the Parameter of Interest We define our parameter of interest, Q̄_0 = E_0(Y | W), as the minimizer of the expected squared error loss: Q̄_0 = arg min_Q̄ E_0 L(O, Q̄), where L(O, Q̄) = (Y − Q̄(W))^2. E_0 L(O, Q̄), which we want to be small, evaluates the candidate Q̄, and it is minimized at the optimal choice Q̄_0. Y: outcome, W: covariates.
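A quick illustration on simulated data (the two candidate functions are arbitrary): the empirical mean of the squared-error loss ranks candidates for Q̄_0:

set.seed(1)
n <- 500
W <- runif(n)
Y <- 2 * W^2 + rnorm(n, sd = 0.3)
Q_a <- function(w) 2 * w^2            # candidate close to the truth
Q_b <- function(w) rep(1, length(w))  # constant candidate
mean((Y - Q_a(W))^2)                  # smaller empirical risk
mean((Y - Q_b(W))^2)                  # larger empirical risk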

45 Ensembling: Cross-Validation Ensembling methods allow implementation of multiple algorithms Do not need to decide beforehand which single technique to use; can use several by incorporating cross validation Image credit: Rose (2010, 2016)

46 Ensembling: Cross-Validation Ensembling methods allow implementation of multiple algorithms. Do not need to decide beforehand which single technique to use; can use several by incorporating cross-validation. [Figure: the learning set is split into a training set and a validation set (fold 1).] Image credit: Rose (2010, 2016)

47 Ensembling: Cross-Validation In V-fold cross-validation, our observed data O_1, …, O_n is referred to as the learning set and partitioned into V sets of size ≈ n/V. For any given fold, V − 1 sets comprise the training set and the remaining set is the validation set. [Figure: the learning set split into a training set and a validation set, fold 1.] Image credit: Rose (2010, 2016)

48 Ensembling: Cross-Validation In V-fold cross-validation, our observed data O_1, …, O_n is referred to as the learning set and partitioned into V sets of size ≈ n/V. For any given fold, V − 1 sets comprise the training set and the remaining set is the validation set. [Figure: the validation set rotates across folds 1 through 10.] Image credit: Rose (2010, 2016)
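A minimal sketch of this partitioning, assuming simulated data and a single candidate algorithm (a main-terms logistic regression):

set.seed(1)
n <- 100
d <- data.frame(W1 = rnorm(n), W2 = rnorm(n))
d$Y <- rbinom(n, 1, plogis(d$W1 - d$W2))

V <- 10
fold <- sample(rep(1:V, length.out = n))   # split the learning set into V folds
cv_risk <- sapply(1:V, function(v) {
  train <- d[fold != v, ]                  # V - 1 folds form the training set
  valid <- d[fold == v, ]                  # the remaining fold is the validation set
  fit <- glm(Y ~ W1 + W2, data = train, family = binomial())
  pred <- predict(fit, newdata = valid, type = "response")
  mean((valid$Y - pred)^2)                 # squared-error loss on the validation set
})
mean(cv_risk)                              # cross-validated risk of the candidate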

49 Super Learner: Ensembling Build a collection of algorithms consisting of all weighted averages of the algorithms One of these weighted averages might perform better than one of the algorithms alone It is this principle that allows us to map a collection of algorithms into a library of weighted averages of these algorithms

50 Super Learner: Optimal Weight Vector It might seem that the implementation of such an estimator is problematic, since it requires minimizing the cross-validated risk over an infinite set of candidate algorithms (the weighted averages)

51 Super Learner: Optimal Weight Vector It might seem that the implementation of such an estimator is problematic, since it requires minimizing the cross-validated risk over an infinite set of candidate algorithms (the weighted averages). The contrary is true. The super learner is not more computer-intensive than the cross-validation selector (the single algorithm with the smallest cross-validated risk); only the relatively trivial calculation of the optimal weight vector needs to be completed.

52 Super Learner: Optimal Weight Vector Consider that the discrete super learner has already been completed. Propose a family of weighted combinations of the algorithms, indexed by the weight vector α. The family of weighted combinations includes only those α-vectors that sum to one, with each weight positive or zero.

53 Super Learner: Optimal Weight Vector Consider that the discrete super learner has already been completed. Propose a family of weighted combinations of the algorithms, indexed by the weight vector α. The family of weighted combinations includes only those α-vectors that sum to one, with each weight positive or zero. Selecting the weights that minimize the cross-validated risk is a minimization problem, formulated as a regression of the outcomes Y on the predicted values of the algorithms (Z).
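A minimal sketch of that minimization, assuming a matrix Z of cross-validated predictions (filled with toy stand-ins here) and using non-negative least squares followed by rescaling so the weights sum to one; the SuperLearner package's method.NNLS takes a similar approach:

library(nnls)   # non-negative least squares
set.seed(1)
n <- 200
Y <- rnorm(n)
Z <- cbind(alg_a = Y + rnorm(n, sd = 0.5),  # toy stand-ins for CV predictions
           alg_b = Y + rnorm(n, sd = 1.0),
           alg_c = rnorm(n))
fit <- nnls(Z, Y)            # minimize ||Y - Z alpha||^2 subject to alpha >= 0
alpha <- fit$x / sum(fit$x)  # rescale so the weights sum to one
alpha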

54 Collection of Algorithms [Figure: each algorithm a, b, …, p is fit on the data; cross-validated predicted values Z_{i,a}, …, Z_{i,p} and CV MSEs are obtained for each algorithm; a family of weighted combinations of the algorithms then yields the super learner function E_n[Y | Z] = α_{a,n} Z_a + α_{b,n} Z_b + … + α_{p,n} Z_p.] Image credit: Polley et al (2011)

55 Super Learner: Ensembling Due to its theoretical properties, the super learner performs asymptotically as well as the best choice among the family of weighted combinations of estimators. Thus, by adding more competitors, we only improve the performance of the super learner. The asymptotic equivalence remains true if the number of algorithms in the library grows very quickly with sample size.

56 Super Learner: Oracle Inequality B_n ∈ {0, 1}^n splits the sample into a training sample {i : B_n(i) = 0} and validation sample {i : B_n(i) = 1}. P^0_{n,B_n} and P^1_{n,B_n} denote the empirical distributions of the training and validation samples, respectively. Given candidate estimators P_n → Q̂_k(P_n), the loss-function-based cross-validation selector is:
k_n = K̂(P_n) = arg min_k E_{B_n} P^1_{n,B_n} L(Q̂_k(P^0_{n,B_n}))
The resulting estimator is given by Q̂(P_n) = Q̂_{K̂(P_n)}(P_n) and satisfies the following oracle inequality: for any δ > 0,
E_{B_n} P_0 {L(Q̂_{k_n}(P^0_{n,B_n})) − L(Q_0)} ≤ (1 + 2δ) E_{B_n} min_k P_0 {L(Q̂_k(P^0_{n,B_n})) − L(Q_0)} + 2C(δ) (1 + log K(n)) / (np)
van der Laan & Dudoit (2003)

57 Screening: Will Be Useful for Parsimony Often beneficial to screen variables before running algorithms Can be coupled with prediction algorithms to create new algorithms in the library

58 Screening: Will Be Useful for Parsimony Often beneficial to screen variables before running algorithms Can be coupled with prediction algorithms to create new algorithms in the library Clinical subsets

59 Screening: Will Be Useful for Parsimony Often beneficial to screen variables before running algorithms Can be coupled with prediction algorithms to create new algorithms in the library Clinical subsets Test each variable with the outcome, rank by p-value

60 Screening: Will Be Useful for Parsimony Often beneficial to screen variables before running algorithms Can be coupled with prediction algorithms to create new algorithms in the library Clinical subsets Test each variable with the outcome, rank by p-value Lasso

61 The Free Lunch No point in painstakingly deciding which estimators to include; add them all. Theory supports this approach, and finite sample simulations and data analyses confirm that it is very hard to overfit the super learner by augmenting the collection, while the benefits can be substantial.

62

63 Super Learner: Kaiser Permanente Database Nested case-control sample (n = 27,012). Outcome: death. Covariates: 184 medical flags, gender & age. The ensembling method outperformed all other algorithms. Generally weak signal with R² = 0.11. The observed data structure on a subject can be represented as O = (Y, Δ, ΔX), where X = (W, Y) is the full data structure, and Δ denotes the indicator of inclusion in the second-stage sample. How will this electronic database perform in comparison to a cohort study? van der Laan & Rose (2011)

64 Super Learner: Sonoma Cohort Study Cohort study of n = 2,066 residents of Sonoma, CA aged 54 and over. Outcome: death. Covariates: gender, age, self-rated health, leisure-time physical activity, smoking status, cardiac event history, and chronic health condition status. R² = 0.201. A two-fold improvement with less than 10% of the subjects & less than 10% the number of covariates. What possible conclusions can we draw? Rose (2013)

65 Super Learner: Sonoma Cohort Study [Figure: histograms of differences in predicted probabilities: (A) super learner vs. glm; (B) super learner vs. randomForest.]

66 Super Learner: Plan Payment Implications Over 50 million people in the United States are currently enrolled in an insurance program that uses risk adjustment, which redistributes funds based on health and encourages competition based on efficiency/quality. Results: machine learning finds novel insights, with potential to impact policy, including diagnostic upcoding and fraud. Rose (2016)

67 Super Learner: Predicting Unprofitability Hypothetical profit-maximizing health insurer: design a plan to attract profitable enrollees and deter unprofitable ones. Cannot discriminate based on pre-existing conditions, but can raise/lower out-of-pocket costs of drugs for some conditions. Distortions make it difficult for unprofitable groups to find acceptable coverage. We demonstrate that the drug formulary identifies unprofitable enrollees. Rose, Bergquist, Layton (2017)

68 Super Learner: Public Datasets Studied the super learner in publicly available data sets: sample sizes ranged from 200 to 654 observations; the number of covariates ranged from 3 to 18; all 13 data sets have a continuous outcome and no missing values. Polley et al (2011)

69 Super Learner: Public Datasets [Table: description of the 13 data sets (name, sample size n, number of covariates p, and source): ais (Cook and Weisberg 1994), diamond (Chu 2001), two cps data sets (Berndt 1991), cpu (Kibler et al 1989), FEV (Rosner 1999), Pima (Newman et al 1998), laheart (Afifi and Azen 1979), mussels (Cook 1998), enroll (Liu and Stengos 1999), fat (Penrose et al 1985), diabetes (Harrell 2001), house (Newman et al 1998).] Polley et al (2011)

70 Super Learner: Public Datasets Polley et al (2011)

71 Ensembling Literature The super learner is a generalization of the stacking algorithm (Wolpert 1992, Breiman 1996) and has optimality properties that led to the name super learner. LeBlanc & Tibshirani (1996) discussed the relationship of stacking algorithms to other algorithms. Additional methods for ensemble learning have also been developed (e.g., Tsybakov 2003; Juditsky et al 2005; Bunea et al 2006, 2007; Dalalyan & Tsybakov 2007, 2008). Refer to a review of ensemble methods (Dietterich 2000) for further background. van der Laan et al (2007): original super learner paper. For more references, see Chapter 3 of Targeted Learning.

72 [Super Learner Example Code]

73 Super Learner Packages SuperLearner (Polley): Main super learner R package. h2oEnsemble (LeDell): Java-based, designed for big data; uses the H2O R interface to run super learning. SAS macro (Brooks): SAS implementation available on GitHub. Eric Polley GitHub: github.com/ecpolley More: targetedlearningbook.com/software

74 Super Learner Sample Code

75 Super Learner Sample Code
install.packages("SuperLearner")
library(SuperLearner)

## Generate simulated data ##
set.seed(27)
n <- 500
data <- data.frame(W1 = runif(n, min = 0.5, max = 1),
                   W2 = runif(n, min = 0, max = 1),
                   W3 = runif(n, min = 0.25, max = 0.75),
                   W4 = runif(n, min = 0, max = 1))
data <- transform(data, W5 = rbinom(n, 1, 1/(1 + exp(1.5*W2 - W3))))
data <- transform(data, Y = rbinom(n, 1,
  1/(1 + exp(-(-2*W5 - 2*W1 + 4*W5*W1 - 1.5*W2 + sin(W4))))))

76 Super Learner Sample Code
## Specify a library of algorithms ##
SL.library <- c("SL.glm", "SL.mean", "SL.randomForest", "SL.glmnet")

77 Super Learner Sample Code
Could use various forms of screening to consider differing variable sets:
SL.library <- list(c("SL.glm", "screen.randomForest", "All"),
                   c("SL.mean", "screen.randomForest", "All"),
                   c("SL.randomForest", "screen.randomForest", "All"),
                   c("SL.glmnet", "screen.randomForest", "All"))
Or the same algorithm with different tuning parameters:
SL.glmnet.alpha0 <- function(..., alpha = 0){
  SL.glmnet(..., glmnet.alpha = alpha)}
SL.glmnet.alpha50 <- function(..., alpha = 0.50){
  SL.glmnet(..., glmnet.alpha = alpha)}
SL.library <- c("SL.glm", "SL.glmnet", "SL.glmnet.alpha50",
                "SL.glmnet.alpha0", "SL.randomForest")

78 Super Learner Sample Code
## Specify a library of algorithms ##
SL.library <- c("SL.glm", "SL.mean", "SL.randomForest", "SL.glmnet")

79 Super Learner Sample Code
## Run the super learner to obtain predicted values for the super
## learner as well as CV risk for algorithms in the library ##
set.seed(27)
fitdataSL <- SuperLearner(Y = data[,6], X = data[,1:5],
                          SL.library = SL.library, family = binomial(),
                          method = "method.NNLS", verbose = TRUE)

80 Super Learner Sample Code

81 Super Learner Sample Code

82 Super Learner Sample Code
## Run the cross-validated super learner to obtain its CV risk ##
set.seed(27)
fitSLdataCV <- CV.SuperLearner(Y = data[,6], X = data[,1:5], V = 10,
                               SL.library = SL.library, verbose = TRUE,
                               method = "method.NNLS", family = binomial())

83 Super Learner Sample Code
## Cross-validated risks ##
# CV risk for the super learner
mean((data[,6] - fitSLdataCV$SL.predict)^2)
# CV risks for algorithms in the library
fitdataSL

84 Super Learner Sample Code

85 When Learning a New Package

86 More on SuperLearner R Package SuperLearner (Polley): CRAN. Eric Polley GitHub: github.com/ecpolley More: targetedlearningbook.com/software

87 [(Causal) Effect Estimation]

88 Causal Model Assume a structural causal model (SCM) (Pearl 2009), comprised of endogenous variables X = (X_j : j) and exogenous variables U = (U_{X_j} : j). Each X_j is a deterministic function of other endogenous variables and an exogenous error U_{X_j}. The errors U are never observed. For each X_j we characterize its parents from among X with Pa(X_j).

89 Causal Model X_j = f_{X_j}(Pa(X_j), U_{X_j}), j = 1, …, J. The functional form of f_{X_j} is often unspecified. An SCM can be fully parametric, but we do not do that here as our background knowledge does not support the assumptions involved.

90 Causal Model We could specify the following SCM: W = f_W(U_W), A = f_A(W, U_A), Y = f_Y(W, A, U_Y). Recall that we assume for the full data: 1) for each X_j, X_j = f_{X_j}(Pa(X_j), U_{X_j}) depends on the other endogenous variables only through the parents Pa(X_j); 2) the exogenous variables have a particular joint distribution P_U, with U_A ⊥ U_Y | W. In our simple study, X = (W, A, Y), and Pa(A) = W. We know this due to the time ordering of the variables.

91 Causal Graph [Figure: four causal graphs (a)-(d) over nodes W, A, Y with exogenous errors U_W, U_A, U_Y, differing in the assumed dependence among the errors. Caption: Causal graphs with various assumptions about the distribution of P_U.]

92 A Note on Causal Assumptions We could alternatively use the Neyman-Rubin Causal Model and assume randomization (A ⊥ Y_a | W) and the stable unit treatment value assumption (SUTVA; no interference between subjects and the consistency assumption).

93 Positivity Assumption We need that each possible exposure level occurs with some positive probability within each stratum of W. For our data structure (W, A, Y) we are assuming: P_0(A = 1 | W = w) > 0 and P_0(A = 0 | W = w) > 0, for each possible w.
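In practice the assumption can be probed empirically by examining the distribution of the estimated exposure mechanism; a minimal sketch on simulated data (the 0.025 cutoff is an arbitrary illustrative choice):

set.seed(1)
n <- 1000
W1 <- rnorm(n)
A <- rbinom(n, 1, plogis(2 * W1))
g_fit <- glm(A ~ W1, family = binomial())   # estimate g_n(1 | W)
g1W <- predict(g_fit, type = "response")
summary(g1W)                                # scores near 0 or 1 flag trouble
mean(g1W < 0.025 | g1W > 0.975)             # share of near-violations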

94 Landscape: Effect Estimators An estimator is an algorithm that can be applied to any empirical distribution to provide a mapping from the empirical distribution to the parameter space. Maximum-Likelihood-Based Estimators. Estimating-Equation-Based Methods. The target parameters we discussed depend on P_0 through the conditional mean Q̄_0(A, W) = E_0(Y | A, W) and the marginal distribution Q_{W,0} of W. Thus we can also write Ψ(Q_0), where Q_0 = (Q̄_0, Q_{W,0}).

95 Landscape: Effect Estimators Maximum-Likelihood-Based Estimators will be of the type
ψ_n = Ψ(Q_n) = (1/n) Σ_{i=1}^n {Q̄_n(1, W_i) − Q̄_n(0, W_i)},
where this estimate is obtained by plugging Q_n = (Q̄_n, Q_{W,n}) into the mapping Ψ, with Q̄_n(A = a, W_i) = E_n(Y | A = a, W_i). Estimating-Equation-Based Methods: an estimating function is a function of the data O and the parameter of interest. If D(ψ)(O) is an estimating function, then we can define a corresponding estimating equation: 0 = Σ_{i=1}^n D(ψ)(O_i), with solution ψ_n satisfying Σ_{i=1}^n D(ψ_n)(O_i) = 0.

96 Maximum-Likelihood-Based Methods MLE using regression: outcome regression estimated with parametric methods and plugged into ψ_n = (1/n) Σ_{i=1}^n {Q̄_n(1, W_i) − Q̄_n(0, W_i)}

97 Maximum-Likelihood-Based Methods MLE using regression: outcome regression estimated with parametric methods and plugged into ψ_n = (1/n) Σ_{i=1}^n {Q̄_n(1, W_i) − Q̄_n(0, W_i)} STOP! When does this differ from traditional regression?

98 Maximum-Likelihood-Based Methods MLE using regression: continuous outcome example. True effect is −0.35. W1 = gender, W2 = medication use, A = high ozone exposure, Y = continuous measure of lung function.
Model 1: E(Y | A) = α_0 + α_1 A. Both effects: −0.23.
Model 2: E(Y | A, W) = α_0 + α_1 A + α_2 W1 + α_3 W2. Both effects: −0.36.
Model 3: E(Y | A, W) = α_0 + α_1 A + α_2 W1 + α_3 A·W2. Regression effect: −0.49; MLE effect: −0.34.
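The Model 3 distinction is easy to reproduce on simulated data (the coefficients below are arbitrary, not those behind the slide's numbers): with an A·W2 interaction, the coefficient on A is a conditional effect (here, the effect at W2 = 0), while the MLE plug-in averages Q̄_n(1, W) − Q̄_n(0, W) over the observed covariates:

set.seed(1)
n <- 10000
W1 <- rbinom(n, 1, 0.5)                    # e.g., gender
W2 <- rbinom(n, 1, 0.5)                    # e.g., medication use
A  <- rbinom(n, 1, 0.5)                    # randomized exposure for simplicity
Y  <- 1 - 0.2*A - 0.3*A*W2 + 0.4*W1 + rnorm(n)  # true marginal effect -0.35

fit <- lm(Y ~ A + W1 + A:W2)
coef(fit)["A"]                             # regression "effect": A at W2 = 0, about -0.2

d1 <- data.frame(A = 1, W1 = W1, W2 = W2)
d0 <- data.frame(A = 0, W1 = W1, W2 = W2)
mean(predict(fit, d1) - predict(fit, d0))  # MLE plug-in: about -0.35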

99 Maximum-Likelihood-Based Methods MLE using regression: binary outcomes.
P(Y = 1 | A, W) = expit(β_0 + β_1 A + β_2 W)
EY_a = P(Y_a = 1) = (1/n) Σ_{i=1}^n expit(β_0 + β_1 a + β_2 W_i)
[EY_1/(1 − EY_1)] / [EY_0/(1 − EY_0)] ≠ e^{β_1}

100 Medical Schools in Fragile States: Delivery of Care We found that fragile states lack the infrastructure to train sufficient numbers of medical professionals to meet their population health needs. Fragile states were 1.76 (95% CI: ) to 2.37 (95% CI: ) times more likely to have < 2 medical schools than non-fragile states. Mateen, McKenzie, Rose (2017)

101 Maximum-Likelihood-Based Methods MLE using machine learning: outcome regression estimated with machine learning and plugged into ψ_n = (1/n) Σ_{i=1}^n {Q̄_n(1, W_i) − Q̄_n(0, W_i)}

102 Noncommunicable Disease and Poverty Studied the relative risk of death from noncommunicable disease for three poverty measures in Matlab, Bangladesh. Implemented parametric and machine learning substitution estimators. Mirelman et al (2016)

103 Estimating Equation Methods IPW: Estimate the causal risk difference with
ψ_n = (1/n) Σ_{i=1}^n Y_i {I(A_i = 1) − I(A_i = 0)} / g_n(A_i, W_i)
This estimator is a solution of an IPW estimating equation that relies on an estimate of the treatment mechanism, playing the role of a nuisance parameter of the IPW estimating function.
A-IPW: One estimates Ψ(P_0) with
ψ_n = (1/n) Σ_{i=1}^n [{I(A_i = 1) − I(A_i = 0)} / g_n(A_i, W_i)] (Y_i − Q̄_n(A_i, W_i)) + (1/n) Σ_{i=1}^n {Q̄_n(1, W_i) − Q̄_n(0, W_i)}
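A minimal sketch of both estimators on simulated data, with GLM fits standing in for the nuisance estimators:

set.seed(1)
n <- 5000
W <- rnorm(n)
A <- rbinom(n, 1, plogis(0.5 * W))
Y <- rbinom(n, 1, plogis(-1 + A + W))

gfit <- glm(A ~ W, family = binomial())               # treatment mechanism g_n
gA <- ifelse(A == 1, fitted(gfit), 1 - fitted(gfit))  # g_n(A_i, W_i)

Qfit <- glm(Y ~ A + W, family = binomial())           # outcome regression
Q1 <- predict(Qfit, data.frame(A = 1, W = W), type = "response")
Q0 <- predict(Qfit, data.frame(A = 0, W = W), type = "response")

sgn <- ifelse(A == 1, 1, -1)                          # I(A = 1) - I(A = 0)
psi_ipw  <- mean(Y * sgn / gA)
psi_aipw <- mean(sgn / gA * (Y - fitted(Qfit))) + mean(Q1 - Q0)
mean(plogis(W) - plogis(-1 + W))                      # approximate truth, for comparison
c(IPW = psi_ipw, AIPW = psi_aipw)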

104 TMLE for Causal Effects Super Learner Allows researchers to use multiple algorithms to outperform a single algorithm in nonparametric statistical models Builds weighted combination of estimators where weights are optimized based on loss-function specific cross-validation to guarantee best overall fit Targeted Maximum Likelihood Estimation With an initial estimate of the outcome regression, the second stage of TMLE updates this initial fit in a step targeted toward making an optimal bias-variance tradeoff for the parameter of interest Produces a well-defined, unbiased, efficient substitution estimator

105 TMLE for Causal Effects TMLE: Double Robust Removes asymptotic residual bias of initial estimator for the target parameter, if it uses a consistent estimator of censoring/treatment mechanism g 0 If initial estimator was consistent for the target parameter, the additional fitting of the data in the targeting step may remove finite sample bias, and preserves consistency property of the initial estimator TMLE: Efficiency If the initial estimator and the estimator of g 0 are both consistent, then it is also asymptotically efficient according to semi-parametric statistical model efficiency theory

106 TMLE for Causal Effects TMLE: In Practice Allows the incorporation of machine learning methods for the estimation of both Q̄_0 and g_0, so that we do not make assumptions about the probability distribution P_0 that we do not believe. Thus, every effort is made to achieve minimal bias and the asymptotic semiparametric efficiency bound for the variance.

107 TMLE An estimator is asymptotically linear if the standardized TMLE can be written as the empirical mean of the influence curve plus a random variable that converges to zero as n goes to infinity:
√n (Ψ̂(P_n) − ψ_0) = (1/√n) Σ_{i=1}^n IC(O_i) + o_{P_0}(1)
Asymptotic linearity is desirable since it indicates that the estimator behaves like an empirical mean: as a consequence its bias converges to zero in sample size at a faster rate than 1/√n, and for large n it is normally distributed.

108 TMLE: Parametric Submodel With an initial estimator Q^0_n = (Q_{W,n}, Q̄^0_n) and g_n, let's describe a submodel {Q̄^0_n(ε) : ε} with a one-dimensional fluctuation parameter ε that goes through the initial estimate Q̄^0_n(A, W) at ε = 0. Let
Q̄^0_n(ε)(Y = 1 | A, W) = expit( log[Q̄^0_n / (1 − Q̄^0_n)](A, W) + ε H*_n(A, W) )
be a parametric submodel through the conditional distribution of Y, given A, W, where
H*_n(A, W) = I(A = 1)/g_n(A = 1 | W) − I(A = 0)/g_n(A = 0 | W)

109 TMLE: Parametric Submodel Classic result: the score of a coefficient in front of a covariate in a logistic linear regression, in a parametric statistical model for a conditional distribution of binary Y, equals the covariate times the residual. It follows that the score of ε of this univariate logistic regression submodel at ε = 0 equals the appropriate component of the efficient influence curve. The parametric family of fluctuations of Q̄^0_n is defined by a parametric regression including a clever covariate chosen so the derivative condition holds.

110 Example: TMLE for the Risk Difference Here ε_n is obtained by performing a regression of Y on H*_n(A, W), where Q̄^0_n(A, W) is used as an offset. We then update Q̄^0_n with
logit Q̄^1_n(A, W) = logit Q̄^0_n(A, W) + ε_n H*_n(A, W)
This converges in one step, so that the TMLE is given by Q̄*_n = Q̄^1_n. Lastly, one evaluates the target parameter ψ_n, where Q*_n = (Q̄^1_n, Q_{W,n}), by plugging Q̄^1_n and Q_{W,n} into the substitution estimator to get the TMLE of ψ_0:
ψ_n = (1/n) Σ_{i=1}^n {Q̄^1_n(1, W_i) − Q̄^1_n(0, W_i)}
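A minimal by-hand sketch of these steps on simulated data, with simple GLMs standing in for the initial estimators (in practice one would use super learning for both):

set.seed(1)
n <- 5000
W <- rnorm(n)
A <- rbinom(n, 1, plogis(0.4 * W))
Y <- rbinom(n, 1, plogis(-1 + A + W))

gfit <- glm(A ~ W, family = binomial())          # g_n
g1 <- fitted(gfit); g0 <- 1 - g1

Qfit <- glm(Y ~ A + W, family = binomial())      # initial outcome regression
QA <- fitted(Qfit)
Q1 <- predict(Qfit, data.frame(A = 1, W = W), type = "response")
Q0 <- predict(Qfit, data.frame(A = 0, W = W), type = "response")

HA  <- A / g1 - (1 - A) / g0                     # clever covariate H*_n(A, W)
eps <- coef(glm(Y ~ -1 + HA + offset(qlogis(QA)), family = binomial()))[1]

Q1star <- plogis(qlogis(Q1) + eps / g1)          # logit update at a = 1
Q0star <- plogis(qlogis(Q0) - eps / g0)          # logit update at a = 0
mean(Q1star - Q0star)                            # TMLE of the causal risk difference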

111 Targeted Learning in Nonparametric Models [Figure: schematic of targeted learning. Inputs: observed data O_1, …, O_n and the target parameter map Ψ(·). Within the statistical model (the set of possible probability distributions of the data) sit the true distribution P_0, an initial estimator P^0_n, and the targeted estimator P*_n. These map to values of the target parameter on the real line: the initial estimate Ψ(P^0_n), the targeted estimate Ψ(P*_n), and the true value (estimand) Ψ(P_0), with better estimates closer to the truth.]

112 Example: Sonoma Cohort Study Cohort study of n = 2,066 residents of Sonoma, CA aged 54 and over. Outcome was death. Covariates were gender, age, self-rated health, leisure-time physical activity, smoking status, cardiac event history, and chronic health condition status. The data structure is O = (W, A, Y), where Y = I(T ≤ 5 years) and T is time to the event death. No right censoring in this cohort.

113 Sonoma Study [Figure: data matrix with columns ID, W1, …, W12, A, Y; the super learner function (Step 1) adds columns Q̄^0_n(A_i, W_i), Q̄^0_n(1, W_i), Q̄^0_n(0, W_i) (Step 2).]

114 Sonoma Study: Estimating Q̄_0 [Figure: the data matrix from Steps 1-2, plus a super learner exposure mechanism function adding columns g_n(1 | W_i) and g_n(0 | W_i) (Step 3).]

115 Sonoma Study: Estimating Q̄_0 At this stage we could plug our estimates Q̄^0_n(1, W_i) and Q̄^0_n(0, W_i) for each subject into our substitution estimator of the risk difference:
ψ_{MLE,n} = Ψ(Q_n) = (1/n) Σ_{i=1}^n {Q̄^0_n(1, W_i) − Q̄^0_n(0, W_i)}

116 Sonoma Study: Estimating g_0 Our targeting step required an estimate of the conditional distribution of LTPA given covariates W. This estimate of P_0(A | W) ≡ g_0 is denoted g_n. We estimated predicted values using a super learner prediction function, adding two more columns to our data matrix: g_n(1 | W_i) and g_n(0 | W_i) (Step 3).

117 [Figure: the data matrix after Step 3 (columns g_n(1 | W_i), g_n(0 | W_i)) and Step 4, which adds the clever covariate columns H*_n(A_i, W_i), H*_n(1, W_i), H*_n(0, W_i).]

118 Sonoma Study: Determining a Submodel The targeting step used the estimate g_n in a clever covariate to define a parametric working model coding fluctuations of the initial estimator. This clever covariate H*_n(A, W) is given by
H*_n(A, W) = I(A = 1)/g_n(1 | W) − I(A = 0)/g_n(0 | W)

119 Sonoma Study: Determining a Submodel Thus, for each subject with A_i = 1 in the observed data, we calculated the clever covariate as H*_n(1, W_i) = 1/g_n(1 | W_i). Similarly, for each subject with A_i = 0 in the observed data, we calculated the clever covariate as H*_n(0, W_i) = −1/g_n(0 | W_i). We combined these values to form a single column H*_n(A_i, W_i) in the data matrix. We also added two columns H*_n(1, W_i) and H*_n(0, W_i). The values for these columns were generated by setting a = 0 and a = 1 (Step 4).

120 [Figure: the data matrix through Steps 3-5: exposure mechanism columns g_n(1 | W_i), g_n(0 | W_i) (Step 3); clever covariate columns H*_n(A_i, W_i), H*_n(1, W_i), H*_n(0, W_i) (Step 4); updated columns Q̄^1_n(1, W_i), Q̄^1_n(0, W_i) (Step 5).]

121 Sonoma Study: Updating Q̄^0_n We then ran a logistic regression of our outcome Y on the clever covariate, using as intercept the offset logit Q̄^0_n(A, W), to obtain the estimate ε_n, where ε_n is the resulting coefficient in front of the clever covariate H*_n(A, W). We next wanted to update the estimate Q̄^0_n into a new estimate Q̄^1_n of the true regression function Q̄_0:
logit Q̄^1_n(A, W) = logit Q̄^0_n(A, W) + ε_n H*_n(A, W)
This parametric working model incorporated information from g_n, through H*_n(A, W), into an updated regression.

122 Sonoma Study: Updating Q̄^0_n The TMLE of Q_0 was given by Q*_n = (Q̄^1_n, Q^0_{W,n}). With ε_n, we were ready to update our prediction function at a = 1 and a = 0 according to the logistic regression working model. We calculated
logit Q̄^1_n(1, W) = logit Q̄^0_n(1, W) + ε_n H*_n(1, W)
for all subjects, and then
logit Q̄^1_n(0, W) = logit Q̄^0_n(0, W) + ε_n H*_n(0, W)
for all subjects, and added a column for Q̄^1_n(1, W_i) and Q̄^1_n(0, W_i) to the data matrix. Updating Q̄^0_n is also illustrated in Step 5.

123 [Figure: the Step 4 clever covariate columns and Step 5 updated columns, followed by Step 6: ψ_n = (1/n) Σ_{i=1}^n [Q̄^1_n(1, W_i) − Q̄^1_n(0, W_i)].]

124 Sonoma Study: Targeted Substitution Estimator Our formula from the first step becomes
ψ_{TMLE,n} = Ψ(Q*_n) = (1/n) Σ_{i=1}^n {Q̄^1_n(1, W_i) − Q̄^1_n(0, W_i)}
This mapping was accomplished by evaluating Q̄^1_n(1, W_i) and Q̄^1_n(0, W_i) for each observation i, and plugging these values into the above equation. Our estimate of the causal risk difference for the mortality study was ψ_{TMLE,n} = −0.055.

125 [Figure: the Step 5 updated columns Q̄^1_n(1, W_i), Q̄^1_n(0, W_i) and Step 6: ψ_n = (1/n) Σ_{i=1}^n [Q̄^1_n(1, W_i) − Q̄^1_n(0, W_i)].]

126 Sonoma Study: Inference (Standard Errors) We then needed to calculate the influence curve for our estimator in order to obtain standard errors:
IC_n(O_i) = ( I(A_i = 1)/g_n(1 | W_i) − I(A_i = 0)/g_n(0 | W_i) ) (Y_i − Q̄^1_n(A_i, W_i)) + Q̄^1_n(1, W_i) − Q̄^1_n(0, W_i) − ψ_{TMLE,n},
where I is an indicator function: it equals 1 when the logical statement it evaluates, e.g., A_i = 1, is true.

127 Sonoma Study: Inference (Standard Errors) Note that this influence curve is evaluated for each O_i. With the influence curve of an estimator, one can now proceed with statistical inference as if the estimator minus its estimand equals the empirical mean of the influence curve.

128 Sonoma Study: Inference (Standard Errors) Next, we calculated the sample mean of these estimated influence curve values: ĪC_n = (1/n) Σ_{i=1}^n IC_n(O_i). For the TMLE we have ĪC_n = 0. Using this mean, we calculated the sample variance of the estimated influence curve values: S²(IC_n) = (1/n) Σ_{i=1}^n (IC_n(O_i) − ĪC_n)². Lastly, we used our sample variance to estimate the standard error of our estimator: σ_n = √(S²(IC_n)/n). This estimate of the standard error in the mortality study was σ_n = 0.012.

129 Sonoma Study: Inference (CIs and p-values) A 95% confidence interval: ψ_{TMLE,n} ± z_{0.975} σ_n, where z_α denotes the α-quantile of the standard normal density N(0, 1). A p-value for ψ_{TMLE,n} can be calculated as: 2[1 − Φ(|ψ_{TMLE,n}/σ_n|)], where Φ denotes the standard normal cumulative distribution function. The p-value was < 0.001 and the CI was [−0.078, −0.033].
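Continuing the by-hand simulated sketch from the risk difference example above, influence-curve-based inference takes a few more lines (var() uses the n − 1 denominator, a negligible difference here):

psi <- mean(Q1star - Q0star)
QAstar <- plogis(qlogis(QA) + eps * HA)   # updated fit at the observed (A_i, W_i)
IC <- HA * (Y - QAstar) + Q1star - Q0star - psi
se <- sqrt(var(IC) / n)                   # standard error from the influence curve
ci <- psi + c(-1, 1) * qnorm(0.975) * se  # 95% confidence interval
pval <- 2 * (1 - pnorm(abs(psi / se)))
round(c(psi = psi, se = se, lower = ci[1], upper = ci[2], p = pval), 4)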

130 Sonoma Study: Interpretation The interpretation of ψ_{TMLE,n} = −0.055, under causal assumptions, is that meeting or exceeding recommended levels of LTPA decreases 5-year mortality in an elderly population by 5.5 percentage points. This result was significant, with a p-value of < 0.001 and a confidence interval of [−0.078, −0.033].

131 Example: TMLE with Missingness SCM for a point treatment data structure with missing outcome: W = f_W(U_W), A = f_A(W, U_A), Δ = f_Δ(W, A, U_Δ), Y = f_Y(W, A, Δ, U_Y). We can now define counterfactuals Y_{1,1} and Y_{0,1} corresponding with interventions setting A and Δ. The additive causal effect EY_{1,1} − EY_{0,1} equals:
Ψ(P) = E[E(Y | A = 1, Δ = 1, W) − E(Y | A = 0, Δ = 1, W)]

132 Example: TMLE with Missingness Our first step is to generate an initial estimator P^0_n of P; we estimate E(Y | A, Δ = 1, W), possibly with super learning. We fluctuate this initial estimator with a logistic regression:
logit P^0_n(ε)(Y = 1 | A, Δ = 1, W) = logit P^0_n(Y = 1 | A, Δ = 1, W) + ε h(A, W),
where
h(A, W) = [1/Π(A, W)] ( A/g(1 | W) − (1 − A)/g(0 | W) )
with g(1 | W) = P(A = 1 | W) the treatment mechanism and Π(A, W) = P(Δ = 1 | A, W) the missingness mechanism. Let ε_n be the maximum likelihood estimator and P*_n = P^0_n(ε_n). The TMLE is given by Ψ(P*_n).
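The tmle package handles this data structure directly through its Delta argument; a minimal sketch on simulated data, assuming the installed version accepts NA outcomes where Delta = 0:

library(tmle)
set.seed(1)
n <- 2000
W1 <- rnorm(n)
A <- rbinom(n, 1, plogis(0.3 * W1))
Y <- rbinom(n, 1, plogis(-1 + A + W1))
Delta <- rbinom(n, 1, plogis(1 + 0.5 * A - 0.3 * W1))  # missingness mechanism
Y[Delta == 0] <- NA   # outcome unobserved when Delta = 0 (NA assumed acceptable)

fit <- tmle(Y = Y, A = A, W = data.frame(W1 = W1), Delta = Delta,
            family = "binomial")
fit$estimates$ATE     # additive effect with its CI and p-value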

133 TMLE Example: Impact of Medical Conditions Evaluate how much more enrollees with each medical condition cost after controlling for demographic information and other medical conditions

134 TMLE Example: Impact of Medical Conditions Evaluate how much more enrollees with each medical condition cost after controlling for demographic information and other medical conditions. [Slide shows: Roehrig C, Miller G, Lake C, Bryant J. National Health Spending By Medical Condition. Health Affairs 28(2): w358-w367 (2009), which found mental disorders and heart conditions to be the most costly.]

135 TMLE Example: Impact of Medical Conditions Evaluate how much more enrollees with each medical condition cost after controlling for demographic information and other medical conditions. [Slide shows the Roehrig et al. Health Affairs article alongside: Thorpe KE, Florence CS, Joski P. Which Medical Conditions Account For The Rise In Health Care Spending? Health Affairs, which found that the fifteen most costly medical conditions accounted for half of the overall growth in health care spending between 1987 and 2000.]

136 TMLE Example: Impact of Medical Conditions Truven MarketScan database, those with continuous coverage over the study period; 10.9 million people. Variables: age, sex, region, procedures, expenditures, etc. Enrollment and claims from private health plans and employers. Extracted a random sample of 1,000,000 people. Enrollees were eligible for insurance throughout this entire 24-month period and thus there is no drop-out due to death.

137 TMLE Example: Impact of Medical Conditions ψ = E_{W,M}[E(Y | A = 1, W, M) − E(Y | A = 0, W, M)] represents the effect of A = 1 versus A = 0 after adjusting for all other medical conditions M and baseline variables W. Interpretation: the difference in total annual expenditures when enrollees have the medical condition under consideration (i.e., A = 1). Y = total annual expenditures, A = medical condition category of interest.

138 TMLE Example: Impact of Medical Conditions
Leverage available big data and novel machine learning tools to improve conclusions and policy insights. Rose (2017)

139 TMLE Example: Impact of Medical Conditions
First investigation of the impact of medical conditions on health spending as a variable importance question using doubly robust estimators.
The five most expensive medical conditions were:
1 multiple sclerosis
2 congestive heart failure
3 lung, brain, and other severe cancers
4 major depression and bipolar disorders
5 chronic hepatitis
Differing results compared to parametric regression. What does this mean for incentives for prevention and care?

140 Effect of Drug-Eluting Stents
[Figure: expected outcome (1-year MACE %) by stent type, comparing estimates from TMLE, MLE, ridge, and random forest against the truth.] Rose and Normand (2017)

141 Hospital Profiling Spertus et al. (2016)

142 Effect Estimation Literature
Maximum-likelihood-based estimators: g-formula, Robins 1986
Estimating equations: Robins and Rotnitzky 1992, Robins 1999, Hernán et al. 2000, Robins et al. 2000, Robins 2000, Robins and Rotnitzky 2001
Additional bibliographic history found in Chapter 1 of van der Laan and Robins 2003
For even more references, see Chapter 4 of Targeted Learning

143 [TMLE Example Code]

144 TMLE Packages
tmle (Gruber): Main point-treatment TMLE package
ltmle (Schwab): Main longitudinal TMLE package
SAS code (Brooks): GitHub
Julia code (Lendle): GitHub
More: targetedlearningbook.com/software
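Both R packages install from CRAN in the usual way (a minimal sketch, not from the slides):

install.packages(c("tmle", "ltmle")) #pulls in dependencies such as SuperLearner
library(tmle)
library(ltmle)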

145 TMLE Sample Code
##Code lightly adapted from Schuler & Rose, 2017, AJE##
library(tmle)
set.seed(1)
N <- 1000 #sample size (the value on the original slide was lost in transcription)

##Generate simulated data##
#X1=Gender; X2=Therapy; X3=Antidepressant use
X1 <- rbinom(N, 1, prob=0.55)
X2 <- rbinom(N, 1, prob=0.30)
X3 <- rbinom(N, 1, prob=0.25)
W <- cbind(X1,X2,X3)
#Exposure=regular physical exercise
#(intercept and X1 coefficient were garbled in transcription; values below assumed)
A <- rbinom(N, 1, plogis(-0.5 + 0.5*X1 + 1*X2 + 1.5*X3))
#Outcome=CES-D score
Y <- 24 - 3*A + 3*X1 - 4*X2 - 6*X3 - 1.5*A*X3 + rnorm(N, mean=0, sd=4.5)
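Under this data-generating mechanism the true average treatment effect is E[Y(1) - Y(0)] = -3 - 1.5 x E(X3) = -3 - 1.5 x 0.25 = -3.375, i.e., approximately -3.38, which can be checked directly from the counterfactual outcomes (a quick verification, not from the slides):

#Counterfactual outcomes under A=1 and A=0; the noise term cancels in the mean
Y1 <- 24 - 3*1 + 3*X1 - 4*X2 - 6*X3 - 1.5*1*X3
Y0 <- 24 - 3*0 + 3*X1 - 4*X2 - 6*X3 - 1.5*0*X3
mean(Y1 - Y0) #approximately -3.38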

146 TMLE Sample Code
##Specify a library of algorithms##
SL.library <- c("SL.glm", "SL.step.interaction", "SL.glmnet",
                "SL.randomForest", "SL.gam", "SL.rpart")

147 TMLE Sample Code
Could use various forms of screening to consider differing variable sets:
SL.library <- list(c("SL.glm", "screen.randomForest", "All"),
                   c("SL.mean", "screen.randomForest", "All"),
                   c("SL.randomForest", "screen.randomForest", "All"),
                   c("SL.glmnet", "screen.randomForest", "All"))
Or the same algorithm with different tuning parameters (see the sketch after this slide):
SL.glmnet.alpha0 <- function(..., alpha = 0){
  SL.glmnet(..., alpha = alpha)}
SL.glmnet.alpha50 <- function(..., alpha = 0.50){
  SL.glmnet(..., alpha = alpha)}
SL.library <- c("SL.glm", "SL.glmnet", "SL.glmnet.alpha50",
                "SL.glmnet.alpha0", "SL.randomForest")
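Before handing any of these libraries to tmle(), the candidates can be sanity-checked by fitting the super learner directly and inspecting the cross-validated risks and weights (a minimal sketch, not from the slides; it reuses Y, A, and X1-X3 from slide 145):

library(SuperLearner)
X <- data.frame(A, X1, X2, X3) #predictors for the outcome regression
sl.fit <- SuperLearner(Y = Y, X = X, family = gaussian(), SL.library = SL.library)
sl.fit$cvRisk #cross-validated risk of each candidate algorithm
sl.fit$coef #weight each candidate receives in the super learner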

148 TMLE Sample Code
##Specify a library of algorithms##
SL.library <- c("SL.glm", "SL.step.interaction", "SL.glmnet",
                "SL.randomForest", "SL.gam", "SL.rpart")

149 TMLE Sample Code
##TMLE approach: Super Learning##
tmle.sl1 <- tmle(Y, A, W, Q.SL.library = SL.library,
                 g.SL.library = SL.library)
tmle.sl1
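Printing tmle.sl1 displays the estimates; individual components can also be extracted from the returned list (a sketch, assuming the tmle package's standard output structure):

est <- tmle.sl1$estimates$ATE
est$psi #point estimate of the additive treatment effect
est$CI #95% confidence interval
est$pvalue #two-sided p-value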

150 TMLE Sample Code [Output for tmle.sl1 shown on slide]

151 TMLE Sample Code True value is -3.38

152 TMLE Sample Code
##TMLE approach: GLM, main-terms (MT) misspecification of outcome##
#Misspecified outcome regression: Y ~ A + X1 + X2 + X3 (omits the A:X3 interaction)#
tmle.glm1 <- tmle(Y, A, W, Qform = Y~A+X1+X2+X3,
                  gform = A~X1+X2+X3)
tmle.glm1

153 TMLE Sample Code True value is -3.38

154 TMLE Sample Code
##TMLE approach: GLM, omitted-variable (OV) misspecification of outcome##
#Misspecified outcome regression: Y ~ A + X1 + X2 (omits X3)#
tmle.glm2 <- tmle(Y, A, W, Qform = Y~A+X1+X2,
                  gform = A~X1+X2+X3)
tmle.glm2

155 TMLE Sample Code True value is -3.38

156 TMLE Sample Code
##TMLE approach: GLM, omitted-variable (OV) misspecification of exposure##
#Misspecified exposure regression: A ~ X1 + X2 (omits X3)#
tmle.glm3 <- tmle(Y, A, W, Qform = Y~A+X1+X2+X3+A:X3,
                  gform = A~X1+X2)
tmle.glm3

157 TMLE Sample Code True value is -3.38
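Because TMLE is doubly robust, all four fits above should land near the truth: in each case at least one of the outcome and exposure regressions is correctly specified. A quick side-by-side comparison (a sketch, not from the slides):

#Collect the ATE estimates and compare to the true value of -3.38
ests <- c(SL = tmle.sl1$estimates$ATE$psi,
          glm.MT = tmle.glm1$estimates$ATE$psi,
          glm.OV.Q = tmle.glm2$estimates$ATE$psi,
          glm.OV.g = tmle.glm3$estimates$ATE$psi)
round(ests - (-3.38), 2) #deviation from the truth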

158 TMLE Sample Code
[Figure 2 of Schuler and Rose (2017): percent bias in the mean average treatment effect estimate across 1,000 simulated data sets for three estimation methods (targeted maximum likelihood estimation (TMLE), g-computation, and inverse probability weighting (IPW)), each paired with super learner or with main-terms (MT) and omitted-variable (OV) misspecified parametric regressions for the outcome and exposure.]

159 TMLE Sample Code Schuler and Rose (2017)

160 Targeted Learning Methods
van der Laan & Rose, Targeted Learning: Causal Inference for Observational and Experimental Data. New York: Springer, 2011.
targetedlearningbook.com
