Supporting Information for Estimating restricted mean. treatment effects with stacked survival models

Similar documents
11. Bootstrap Methods

Nuisance Parameter Estimation in Survival Models

ON THE CONSEQUENCES OF MISSPECIFING ASSUMPTIONS CONCERNING RESIDUALS DISTRIBUTION IN A REPEATED MEASURES AND NONLINEAR MIXED MODELLING CONTEXT

A Simulation Study on Confidence Interval Procedures of Some Mean Cumulative Function Estimators

Model Selection, Estimation, and Bootstrap Smoothing. Bradley Efron Stanford University

The Nonparametric Bootstrap

UNIVERSITÄT POTSDAM Institut für Mathematik

Bootstrap Confidence Intervals

Estimation of AUC from 0 to Infinity in Serial Sacrifice Designs

For right censored data with Y i = T i C i and censoring indicator, δ i = I(T i < C i ), arising from such a parametric model we have the likelihood,

Double Bootstrap Confidence Interval Estimates with Censored and Truncated Data

Continuous Time Survival in Latent Variable Models

Analytical Bootstrap Methods for Censored Data

Exact Inference for the Two-Parameter Exponential Distribution Under Type-II Hybrid Censoring

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions

ST745: Survival Analysis: Nonparametric methods

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Smooth nonparametric estimation of a quantile function under right censoring using beta kernels

inferences on stress-strength reliability from lindley distributions

Luke B Smith and Brian J Reich North Carolina State University May 21, 2013

Some Theoretical Properties and Parameter Estimation for the Two-Sided Length Biased Inverse Gaussian Distribution

A Non-parametric bootstrap for multilevel models

Bootstrap (Part 3) Christof Seiler. Stanford University, Spring 2016, Stats 205

The bootstrap. Patrick Breheny. December 6. The empirical distribution function The bootstrap

Multilevel Statistical Models: 3 rd edition, 2003 Contents

4 Resampling Methods: The Bootstrap

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA

The Bootstrap Suppose we draw aniid sample Y 1 ;:::;Y B from a distribution G. Bythe law of large numbers, Y n = 1 B BX j=1 Y j P! Z ydg(y) =E

Confidence Estimation Methods for Neural Networks: A Practical Comparison

Generalized Information Reuse for Optimization Under Uncertainty with Non-Sample Average Estimators

SUPPLEMENT TO PARAMETRIC OR NONPARAMETRIC? A PARAMETRICNESS INDEX FOR MODEL SELECTION. University of Minnesota

Data splitting. INSERM Workshop: Evaluation of predictive models: goodness-of-fit and predictive power #+TITLE:

STAT 135 Lab 5 Bootstrapping and Hypothesis Testing

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions

Introduction to Statistical Analysis

Bootstrap Simulation Procedure Applied to the Selection of the Multiple Linear Regressions

Bias evaluation and reduction for sample-path optimization

The exact bootstrap method shown on the example of the mean and variance estimation

Robust covariance estimator for small-sample adjustment in the generalized estimating equations: A simulation study

[Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements

Integrated likelihoods in survival models for highlystratified

IEOR E4703: Monte-Carlo Simulation

Estimation of the Conditional Variance in Paired Experiments

Defect Detection using Nonparametric Regression

Inferences on stress-strength reliability from weighted Lindley distributions

Locally Robust Semiparametric Estimation

Inferences for the Ratio: Fieller s Interval, Log Ratio, and Large Sample Based Confidence Intervals

Frequency Estimation of Rare Events by Adaptive Thresholding

Fast and robust bootstrap for LTS

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University

Block Bootstrap Prediction Intervals for Vector Autoregression

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

ECAS Summer Course. Quantile Regression for Longitudinal Data. Roger Koenker University of Illinois at Urbana-Champaign

Improving linear quantile regression for

The bootstrap and Markov chain Monte Carlo

A Monte-Carlo study of asymptotically robust tests for correlation coefficients

Finite Population Correction Methods

Part III. Hypothesis Testing. III.1. Log-rank Test for Right-censored Failure Time Data

Point and Interval Estimation for Gaussian Distribution, Based on Progressively Type-II Censored Samples

Professors Lin and Ying are to be congratulated for an interesting paper on a challenging topic and for introducing survival analysis techniques to th

Remedial Measures for Multiple Linear Regression Models

A Simulation Comparison Study for Estimating the Process Capability Index C pm with Asymmetric Tolerances

Statistical methods for evaluating the linearity in assay validation y,z

Analysis of Type-II Progressively Hybrid Censored Data

Discussant: Lawrence D Brown* Statistics Department, Wharton, Univ. of Penn.

STATISTICAL INFERENCE IN ACCELERATED LIFE TESTING WITH GEOMETRIC PROCESS MODEL. A Thesis. Presented to the. Faculty of. San Diego State University

The assumptions are needed to give us... valid standard errors valid confidence intervals valid hypothesis tests and p-values

SPECIAL TOPICS IN REGRESSION ANALYSIS

ON THE NUMBER OF BOOTSTRAP REPETITIONS FOR BC a CONFIDENCE INTERVALS. DONALD W. K. ANDREWS and MOSHE BUCHINSKY COWLES FOUNDATION PAPER NO.

Estimating Causal Effects of Organ Transplantation Treatment Regimes

Chapter 1 Statistical Inference

Permutation Tests. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods

Quantile Regression Methods for Reference Growth Charts

interval forecasting

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.

On Certain Indices for Ordinal Data with Unequally Weighted Classes

Lecture 13: Ensemble Methods

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

ITERATED BOOTSTRAP PREDICTION INTERVALS

Robust Outcome Analysis for Observational Studies Designed Using Propensity Score Matching

LARGE NUMBERS OF EXPLANATORY VARIABLES. H.S. Battey. WHAO-PSI, St Louis, 9 September 2018

R and Stata for Causal Mechanisms

Uncertainty Quantification for Inverse Problems. November 7, 2011

Modelling Survival Data using Generalized Additive Models with Flexible Link

Non-Parametric Bootstrap Mean. Squared Error Estimation For M- Quantile Estimators Of Small Area. Averages, Quantiles And Poverty

On the Accuracy of Bootstrap Confidence Intervals for Efficiency Levels in Stochastic Frontier Models with Panel Data

A comparison of inverse transform and composition methods of data simulation from the Lindley distribution

STAT440/840: Statistical Computing

Odds ratio estimation in Bernoulli smoothing spline analysis-ofvariance

STAT Section 2.1: Basic Inference. Basic Definitions

Resampling and the Bootstrap

Simulating Uniform- and Triangular- Based Double Power Method Distributions

How to make projections of daily-scale temperature variability in the future: a cross-validation study using the ENSEMBLES regional climate models

Uniform Inference for Conditional Factor Models with Instrumental and Idiosyncratic Betas

University of California, Berkeley

Preliminaries The bootstrap Bias reduction Hypothesis tests Regression Confidence intervals Time series Final remark. Bootstrap inference

Robustifying Trial-Derived Treatment Rules to a Target Population

ON THE FAILURE RATE ESTIMATION OF THE INVERSE GAUSSIAN DISTRIBUTION

Symmetric Tests and Condence Intervals for Survival Probabilities and Quantiles of Censored Survival Data Stuart Barber and Christopher Jennison Depar

Transcription:

Supporting Information for Estimating restricted mean treatment effects with stacked survival models Andrew Wey, David Vock, John Connett, and Kyle Rudser Section 1 presents several extensions to the simulation study in the main paper, while Section 2 proves that the mean-squared error of the treatment specific conditional survival functions place an upper bound on the mean-squared error of the restricted mean treatment effect. 1 Furthering the Simulation Study We present several extensions to the simulation study. Section 1.1 in the Supporting Information investigates the bias and MSE of the restricted mean estimators at a larger sample size (n = 9). Section 1.2 in the Supporting Information compares the bootstrap variance to the Monte Carlo variance. Section 1.3 in the Supporting Information considers different approaches (including a bias correction) to bootstrap estimation of confidence intervals. Section 1.4 in the Supporting Information investigates the influence of time point selection in minimizing the Brier Score for stacked survival models. 1.1 Larger Sample Size We present the results of the simulation scenarios from the main paper with a larger sample size (n = 9 rather than n = 3) as the more robust approaches (i.e., splines and random 1

survival forests) should perform better with a larger sample size. Tables 1 and 2 present the relative bias ([Eˆγ(τ) γ(τ)]/γ(τ)), the mean-squared error (MSE) ratio relative to the Cox estimator, and the integrated squared survival error (ISSE) ratio relative to the Cox estimator. For the exponential scenarios, both Stacked estimators show improved relative bias and MSE ratio compared to a smaller sample size. In addition, the Splines estimator performs relatively better in terms of MSE and, in particular, ISSE. In contrast, the Cox estimator maintains, as expected, the same relative bias in the exponential scenario with non-linear covariate effects despite the larger sample size. For the gamma scenarios, each estimator experiences a small increase in relative bias. This is likely due to the misspecification of each parametric and semi-parametric model. Both Stacked estimators, which perform very similarly at a larger sample size, still possess good MSE compared to the Cox model, while the Splines estimator improved the most in terms of the MSE and ISSE ratios compared to the scenarios with a smaller sample size. The correlation between the mean-squared error of the conditional survival function (i.e., Integrated Squared Survival Error ) with the MSE of the restricted mean treatment effect is stronger at a larger sample size, while the correlation is attenuated for the bias of the restricted mean treatment effect. In particular, Figure 1 illustrates that the Pearson correlation between ISSE and the MSE of the restricted mean treatment effect is close to 1, while the correlation between ISSE and the bias of the restricted mean treatment effect and ISSE is much higher at a larger sample size (.65 rather than.2). 1.2 Standard Errors: Bootstrap versus MC To help explain the poor performance of the confidence intervals for the Stacked estimator with random survival forests (RSF), we compare the bootstrapped variance to the expected variance from the Monte Carlo simulations (i.e., MC Var = MSE Bias 2 ). If the bootstrap variance is less than the Monte Carlo variance, then the bootstrap based confidence intervals 2

Table 1: Simulation results for the exponential distributed scenarios: N = 9, N SIM = 1, and a marginal censoring of 3%. Percent Relative Bias is the ratio of the bias and true restricted mean difference. MSE Ratio is the ratio of MSE relative to the Cox estimator. ISSE Ratio is the ratio of integrated squared survival error, which corresponds to the mean-squared of the conditional survival function, relative to the Cox estimator. γ(2) = 2.965 Non- γ(2) = 2.69 γ(5) = 12.318 Non- γ(5) = 7.929 Percent Estimator Relative Bias MSE Ratio ISSE Ratio Cox 1. 1. Splines -2 1.12 1.23 Stacked -2 1.3 1.4 Stacked (with RSF) -2 1.4 1.4 Cox 9 1. 1. Splines 1.61.81 Stacked 2.57.8 Stacked (with RSF) 2.56.81 Cox 1. 1. Splines 1.8 1.12 Stacked.99 1. Stacked (with RSF) 1 1.1.99 Cox 9 1. 1. Splines 2.58.83 Stacked 3.58.82 Stacked (with RSF) 2.56.82 3

Table 2: Simulation results for the gamma distributed scenarios: N = 9, N SIM = 1, and a marginal censoring of 3%. Percent Relative Bias is the ratio of the bias and true restricted mean difference. MSE Ratio is the ratio of MSE relative to the Cox estimator. ISSE Ratio is the ratio of integrated squared survival error, which corresponds to the meansquared of the conditional survival function, relative to the Cox estimator. γ(2) =.753 Non- γ(2) =.931 γ(5) = 6.599 Non- γ(5) = 6.47 Percent Estimator Relative Bias MSE Ratio ISSE Ratio Cox -14 1. 1. Splines -1.98 1.14 Stacked -6.81.87 Stacked (with RSF) -7.83.89 Cox -18 1. 1. Splines -11.85.86 Stacked -8.7.81 Stacked (with RSF) -9.72.81 Cox -8 1. 1. Splines -6 1.3 1.13 Stacked -3.78.93 Stacked (with RSF) -3.81.94 Cox -15 1. 1. Splines -8.68.83 Stacked -7.58.8 Stacked (with RSF) -7.57.8 4

Figure 1: An investigation into the relationship between restricted mean treatment effect performance and the quality of the conditional survival function estimation with a larger sample size (n = 9). Absolute Bias..5 1. 1.5 Pearson Correlation =.34 Mean Squared Error (MSE)..5 1. 1.5 Pearson Correlation =.93..5 1. 1.5 Integrated Squared Survival Error (ISSE)..2.4.6.8 1. Integrated Squared Survival Error (ISSE) are more likely to perform poorly in terms of coverage probability. As illustrated in Table 3, the Stacked estimator with RSF possesses a bootstrapped variance up to 1% lower than the expected Monte Carlo variance for the gamma distributed scenarios (i.e., the situations that the Stacked estimator with RSF achieved less than 9% coverage). 1.3 Confidence Interval Construction Due to the poor confidence interval performance of the Stacked estimator with random survival forests (RSF), we investigate two additional bootstrap approaches to confidence interval estimation. We first consider estimating the confidence intervals with a Normal approximation based only on B = 5 bootstrap replications. The second approach is the Bias-Corrected bootstrap (referred to as BC Bootstrap ), which modifies the percentile based confidence intervals with a bias correction (Efron, 1981; Efron and Tibshirani, 1993). Confidence interval performance is assessed with two measures: the ratio average confi- 5

Table 3: The ratio of the Monte Carlo variance and the bootstrapped variance for the Boot Var simulation scenarios in the main paper (i.e., ). In particular, a ratio greater than one MC Var indicates that the bootstrap overestimates the variance, while a ratio less than one indicates that the bootstrap underestimates the variance. N = 3 and N SIM = 1 with a marginal censoring of 3%. τ = 2 τ = 5 Exponential Scenarios Gamma Scenarios Estimator Non- Non- Cox 1.13 1.4 1.2 1.3 Splines 1.22 1.11 1.15 1.12 Stacked 1.18 1.13 1.26 1.24 Stacked (with RSF) 1.1.89 1.13 1.8 Cox 1.2 1.6.94.94 Splines 1.7 1.1.95.95 Stacked 1.9 1.13 1.5 1.1 Stacked (with RSF).84.77.88.84 dence interval length (ACL) compared to the Cox estimator and coverage probability. Tables 4 and 5 present the performance of the two confidence interval methods for the simulation scenarios presented in the main paper. As noted in Section 1.2, the Stacked estimator with random survival forests (RSF) occasionally possesses a smaller than expected bootstrap variance. Thus, the Normal based confidence intervals still fail to achieve nominal coverage. Although, the Normal based confidence intervals perform better than the percentile based approach from the main paper. The bias-corrected confidence intervals are associated with substantially worse performance than the percentile based approach for each estimator. For example, the Stacked estimator with RSF possesses coverage levels as low as 59%. Note that Efron and Tibshirani (1993) warn that bias-corrected bootstrap confidence intervals are potentially dangerous in practice. 6

Table 4: The confidence intervals results for the exponential distributed scenarios in the main paper: N = 3, N SIM = 1, and a marginal censoring of 3%. The Normal Bootstrap estimates the variance with B = 5 bootstrap replicates, while the BC Bootstrap estimates the confidence interval with B = 3 bootstrap replicates. ACL Ratio is the ratio of average confidence interval lengths compared to the Cox estimator. The γ(2) and γ(5) are the true restricted mean treatment effects for τ = 2 and τ = 5, respectively. γ(2) = 2.965 Non- γ(2) = 2.69 γ(5) = 12.318 Non- γ(5) = 7.929 Normal Bootstrap BC Bootstrap Estimator ACL Ratio Cov. ACL Ratio Cov. Cox 1..96 1..95 Splines 1.12.96 1.12.95 Stacked 1.5.95 1.3.91 Stacked (with RSF).97.95.91.75 Cox 1..93 1..94 Splines 1.4.95 1.5.95 Stacked.98.95.98.92 Stacked (with RSF).86.92.81.63 Cox 1..95 1..94 Splines 1.6.95 1.1.94 Stacked 1..95 1.1.9 Stacked (with RSF).42.92.77.42 Cox 1..91 1..93 Splines.99.94 1.6.94 Stacked.96.93.98.92 Stacked (with RSF).76.88.78.8 7

Table 5: The confidence interval results for the gamma distributed scenarios in the main paper: N = 3, N SIM = 1, and a marginal censoring of 3%. The Normal Bootstrap estimates the variance with B = 5 bootstrap replicates, while the BC Bootstrap estimates the confidence interval with B = 3 bootstrap replicates. ACL Ratio is the ratio of average confidence interval lengths compared to the Cox estimator. The γ(2) and γ(5) are the true restricted mean treatment effects for τ = 2 and τ = 5, respectively. γ(2) =.753 Non- γ(2) =.931 γ(5) = 6.599 Non- γ(5) = 6.47 Normal Bootstrap BC Bootstrap Estimator ACL Ratio Cov. ACL Ratio Cov. Cox 1..94 1..94 Splines 1.18.96 1.2.92 Stacked 1.5.97 1.5.84 Stacked (with RSF) 1.2.96.94.63 Cox 1..92 1..93 Splines 1.15.95 1.15.93 Stacked 1.4.97 1.4.87 Stacked (with RSF).99.95.9.59 Cox 1..92 1..93 Splines 1.17.93 1.13.93 Stacked 1.11.95 1.3.87 Stacked (with RSF) 1.29.92.86.56 Cox 1..9 1..92 Splines 1.9.94 1.7.94 Stacked 1.4.94 1..89 Stacked (with RSF) 1.21.91.84.59 8

1.4 Stacking Question: Selection of t s As noted by Wey et al. (213), the selection of time points for minimizing the Brier Score can have a substantial effect on the performance of stacked survival models. Wey et al. (213) recommend minimizing over nine equally spaced quantiles of the observed event distribution. However, when estimating restricted mean treatment effects, stacked survival models may work better by restricting to time points within the support of interest. For example, in the simulation scenarios, the largest time point of interest is τ = 5. Yet Table 6 shows that for the gamma distributed scenarios the recommended nine equally spaced quantiles of the observed event distribution results in approximately 4% of these points beyond τ = 5. Tables 7 and 8 compares the performance of stacking when the nine equally spaced quantiles of the observed event distribution are restricted to times less than τ = 5 to the performance of stacking over nine equally spaced quantiles of the unrestricted observed event distribution. Restricting the time points has a negligible effect on bias and MSE, but it is associated with up to 9% larger ISSE in the gamma scenarios (i.e., the scenarios expected to benefit the most from restricting the event distribution). Table 6: The average percent of t s for the stacked survival model that occur beyond τ = 2 and τ = 5 for the simulation scenarios in the main paper. For each simulation iteration, the t s were the nine equally spaced quantiles of the observed event distribution. Distribution Scenario τ = 2 τ = 5 Exponential 58% 23% Non- 31% 11% Gamma 87% 4% Non- 79% 41% 9

Table 7: Simulation results for restricting the minimization procedure for the exponential distributed scenarios: N = 3, N SIM = 1, and a marginal censoring of 3%. Relative Bias is the ratio of the bias and true restricted mean difference. MSE Ratio is the ratio of MSE relative to the Stacked estimator. ISSE Ratio is the ratio of integrated squared survival error, which corresponds to the mean-squared of the conditional survival function, relative to the Stacked estimator. The Stacked estimators use nine equally spaced quantiles of the unrestricted observed event distribution, while the Res. Stacked estimators use nine equally spaced quantiles of the observed event distribution restricted to τ = 5. γ(2) = 2.965 Non- γ(2) = 2.69 γ(5) = 12.318 Non- γ(5) = 7.929 Estimator Relative Bias MSE Ratio ISSE Ratio Stacked -.2 1. 1. Res. Stacked -.2.99 1. Stacked (with RSF) -.1 1. 1. Res. Stacked (with RSF) -.1.98 1. Stacked.5 1. 1. Res. Stacked.4 1.2 1.1 Stacked (with RSF).6 1. 1. Res. Stacked (with RSF).5 1. 1.1 Stacked.2 1. 1. Res. Stacked.1 1. 1.1 Stacked (with RSF).3 1. 1. Res. Stacked (with RSF).2.98 1. Stacked.6 1. 1. Res. Stacked.7 1.1 1.1 Stacked (with RSF).6 1. 1. Res. Stacked (with RSF).6.99 1.2 1

Table 8: Simulation results for the gamma distributed scenarios: N = 3, N SIM = 1, and a marginal censoring of 3%. Relative Bias is the ratio of the bias and true restricted mean difference. MSE Ratio is the ratio of MSE relative to the Stacked estimator. ISSE Ratio is the ratio of integrated squared survival error, which corresponds to the meansquared of the conditional survival function, relative to the Stacked estimator. The Stacked estimators use nine equally spaced quantiles of the unrestricted observed event distribution, while the Res. Stacked estimators use nine equally spaced quantiles of the observed event distribution restricted to τ = 5. γ(2) =.753 Non- γ(2) =.931 γ(5) = 6.599 Non- γ(5) = 6.47 Estimator Relative Bias MSE Ratio ISSE Ratio Stacked -.1 1. 1. Res. Stacked. 1.2 1.9 Stacked (with RSF) -.4 1. 1. Res. Stacked (with RSF) -.3.99 1.8 Stacked -.3 1. 1. Res. Stacked -.3 1.1 1.2 Stacked (with RSF) -.7 1. 1. Res. Stacked (with RSF) -.7 1.2 1.6 Stacked -.1 1. 1. Res. Stacked -.2 1.1 1.4 Stacked (with RSF) -.2 1. 1. Res. Stacked (with RSF) -.3 1.1 1.5 Stacked -.6 1. 1. Res. Stacked -.6 1.3 1.3 Stacked (with RSF) -.7 1. 1. Res. Stacked (with RSF) -.8 1.2 1.5 11

2 Influence of the Conditional Survival Function This section proves that the mean-squared error of the treatment specific conditional survival functions places an upper bound on the MSE of the restricted mean treatment effect. Similar to Wey et al. (213), we define the mean-squared error for a conditional survival function estimator of the a th treatment as the integral of the squared error at time t over (, τ): MSE τ {Ŝ(a) ( x)} = E τ [Ŝ(a) (t x) S (a) (t x)] 2 dt. Note that the expectation is with respect to the covariate distribution and the sampling distribution of the estimator. A significant difference between this investigation and Wey et al. (213) is the addition of treatment a. Since restricted mean treatment effect estimation requires two conditional survival functions (one for each treatment), we take the simple average of the mean-squared error for both treatments. That is, the main paper uses MSE τ {Ŝ( x)} = 1 2 {MSE τ{ŝ() ( x)} + MSE τ {Ŝ(1) ( x)}}, (1) as the mean-squared error for an estimator of the conditional survival function. We can then show that the mean squared error of restricted mean treatment effect is bounded by the mean-squared error of the conditional survival function: Theorem 1. Let the mean squared error of a restricted mean treatment effect be MSE[ˆγ(τ)] = E{ˆγ(τ) γ(τ)} 2, then MSE[ˆγ(τ)] 2τ MSE τ {Ŝ( x)}. The result - which is a consequence of a sequential application of Jensen s, the triangle, and 12

Schwarz inequalities - helps justify the strong association of the restricted mean squared error with the performance of the conditional survival function estimator. The bias is also bounded, but the limit is less tight due to a positive variance term. This results in a less strong, but still positive, association of bias with the mean-squared error of the conditional survival function. Proof: We need to first make a distinction between the sampling distribution for the estimator of the conditional survival function [which we call the learning sample (LS) distribution] and the covariate distribution X. It is important to note that the learning sample distribution is independent of the covariate distribution (and the survival time distribution). E{ˆγ(τ) γ(τ)} 2 = E LS { E X LS τ E X LS τ E LS E X LS { τ τ [Ŝ(1) (t x) Ŝ() (t x)]dt } 2 [S (1) (t x) S () (t x)]dt [Ŝ(1) (t x) S (1) (t x)]dt+ (2) [S () (t x) Ŝ() (t x)]dt τ E LS,X ({ [Ŝ(1) (t x) S (1) (t x)]dt} 2 + (3) τ ) { [S () (t x) Ŝ() (t x)]dt} 2 ( τ τ E LS,X [Ŝ(1) (t x) S (1) (t x)] 2 dt+ (4) τ ) [S () (t x) Ŝ() (t x)] 2 dt } 2 = τ {MSE τ {Ŝ() ( x)} + MSE τ {Ŝ(1) ( x)}} = 2τ MSE τ {Ŝ( x)}, where line (2) holds by Jensen s inequality, line (3) holds by the triangle inequality, and line (4) holds by Schwarz s inequality. 13

References Efron, B. (1981). Nonparametric standard errors and confidence intervals. The Canadian Journal of Statistics 9, 139 172. Efron, B. and Tibshirani, R. (1993). An introduction to the bootstrap. Chapman and Hall. Wey, A., Connett, J., and Rudser, K. (213). Stacking survival models. arxiv:139.7936v3. 14