Model Selection, Estimation, and Bootstrap Smoothing
Bradley Efron, Stanford University

Estimation After Model Selection

Usually:
(a) look at the data
(b) choose a model (linear, quadratic, cubic, ...?)
(c) fit estimates using the chosen model
(d) analyze as if the model had been pre-chosen

Today: include the model selection process in the analysis.

Question: What are the effects on standard errors, confidence intervals, etc.?

Two examples: one nonparametric, one parametric.

Cholesterol Data

n = 164 men took cholestyramine for 7 years
x = compliance measure (adjusted so that x ~ N(0, 1))
y = cholesterol decrease

Regression of y on x? [We wish to estimate $\mu_j = E\{y \mid x_j\}$, j = 1, 2, ..., n.]

[Figure: Cholesterol data, n = 164 subjects: cholesterol decrease plotted versus adjusted compliance. The green curve is the OLS cubic regression; red points indicate the 5 featured subjects.]

Regression Model and the C_p Selection Criterion

Model: $y_{n \times 1} = X_{n \times m}\,\beta_{m \times 1} + e_{n \times 1}$, with $e_i \sim (0, \sigma^2)$.

C_p criterion: $C_p = \|y - X\hat\beta\|^2 + 2m\sigma^2$, where $\hat\beta$ is the OLS estimate and m the degrees of freedom.

Model selection: from the possible models $X_1, X_2, X_3, \ldots$ choose the one minimizing C_p. Then use the OLS estimate from the chosen model.
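
The criterion is simple to compute. A minimal sketch, with simulated stand-in data since the cholesterol measurements are not reproduced here; degrees 1-6 play the role of models M1-M6:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 164
x = rng.standard_normal(n)                          # compliance-like covariate
y = 10 + 8 * x + 2 * x**3 + rng.normal(0, 22, n)    # hypothetical truth

def design(x, degree):
    """Polynomial design matrix with intercept: m = degree + 1 columns."""
    return np.vander(x, degree + 1, increasing=True)

def cp(X, y, sigma2):
    """C_p = ||y - X beta_hat||^2 + 2 m sigma^2, m = number of columns."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return rss + 2 * X.shape[1] * sigma2

sigma = 22.0  # the talk takes sigma from the full (sextic) model
scores = {d: cp(design(x, d), y, sigma**2) for d in range(1, 7)}
print(scores, "-> selected degree:", min(scores, key=scores.get))
```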

C_p for the Cholesterol Data

Model          df   C_p - 80000   (Boot %)
M1 (linear)     2       1132        (19%)
M2 (quad)       3       1412        (12%)
M3 (cubic)      4        667        (34%)
M4 (quartic)    5       1591         (8%)
M5 (quintic)    6       1811        (21%)
M6 (sextic)     7       2758         (6%)

(sigma = 22 from the full model M6)

Nonparametric Bootstrap Analysis

data = {(x_i, y_i), i = 1, 2, ..., n = 164} gave the original estimate $\hat\mu = X_3 \hat\beta_3$.

A bootstrap data set is data* = {(x*_j, y*_j), j = 1, 2, ..., n}, where the pairs (x*_j, y*_j) are drawn randomly and with replacement from data:

$$\text{data}^* \to C_p \to m^* \to \text{OLS} \to \hat\beta^*_{m^*} \to \hat\mu^* = X_{m^*}\hat\beta^*_{m^*}$$

I did this all B = 4000 times.
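
A sketch of this resampling loop, assuming numpy arrays x and y and the sigma above are in scope; it also records the count matrix Y*_ij needed by the accuracy theorem later. cp_select restates the C_p logic from the previous sketch so the block stands alone:

```python
import numpy as np

def cp_select(x, y, sigma2, max_degree=6):
    """Return the C_p-minimizing polynomial degree and its OLS coefficients."""
    best = None
    for d in range(1, max_degree + 1):
        X = np.vander(x, d + 1, increasing=True)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        score = np.sum((y - X @ beta) ** 2) + 2 * (d + 1) * sigma2
        if best is None or score < best[0]:
            best = (score, d, beta)
    return best[1], best[2]

def boot_reps(x, y, sigma2, B=4000, seed=1):
    """Pairs bootstrap with C_p selection: returns t*_i and counts Y*_ij."""
    n = len(x)
    rng = np.random.default_rng(seed)
    t = np.empty(B)           # t*_i: model-selected estimate for subject 1
    Y = np.zeros((B, n))      # Y*_ij: times point j appears in sample i
    for i in range(B):
        idx = rng.integers(0, n, n)        # draw n pairs with replacement
        np.add.at(Y[i], idx, 1)
        d, beta = cp_select(x[idx], y[idx], sigma2)
        t[i] = (np.vander(x[:1], d + 1, increasing=True) @ beta)[0]
    return t, Y
```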

[Figure: Histogram of the B = 4000 nonparametric bootstrap replications of the model-selected regression estimate for Subject 1; boot (mean, stdev) = (-2.63, 8.02); 76% of the replications fall below the original estimate 2.71. Red triangles mark the 2.5th and 97.5th bootstrap percentiles.]

[Figure: Smooth estimation model: y is the observed data; ellipses indicate the bootstrap distribution of y*; red curves are level surfaces of equal estimation for $\hat\theta = t(y)$.]


[Figure: Boxplots of the C_p bootstrap estimates for Subject 1, grouped by selected model, B = 4000 bootreps; red bars indicate the selection proportions for Models 1-6. Only 1/3 of the bootstrap replications chose Model 3.]

Bootstrap Confidence Intervals

Standard: $\hat\mu \pm 1.96\,\widehat{se}$
Percentile: $[\hat\mu^{*(.025)},\ \hat\mu^{*(.975)}]$
Smoothed standard: $\tilde\mu \pm 1.96\,\widetilde{se}$
BCa/ABC: corrects the percentiles for bias and changing se
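
The first three are one-liners given the replications t*_i. A minimal sketch; sd_smooth would be the delta-method value from the accuracy theorem below, with the naive bootstrap sd as a stand-in when it is not supplied:

```python
import numpy as np

def intervals95(t, mu_hat, sd_smooth=None):
    """Standard, percentile, and smoothed-standard 95% intervals."""
    se = t.std(ddof=1)                       # naive bootstrap se
    standard = (mu_hat - 1.96 * se, mu_hat + 1.96 * se)
    percentile = tuple(np.percentile(t, [2.5, 97.5]))
    s0 = t.mean()                            # smoothed estimate s(y)
    sd = se if sd_smooth is None else sd_smooth
    smoothed = (s0 - 1.96 * sd, s0 + 1.96 * sd)
    return standard, percentile, smoothed
```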

[Figure: 95% bootstrap confidence intervals for Subject 1: Standard (-13.0, 18.4); Percentile (-17.8, 13.5); Smoothed (-13.3, 8.0).]

Bootstrap Smoothing Idea

Replace the original estimator t(y) with the bootstrap average
$$s(y) = \sum_{i=1}^{B} t(y^*_i)\big/B.$$

- Model averaging
- Same as bagging ("bootstrap aggregation", Breiman)
- Removes discontinuities
- Reduces variance

Accuracy Theorem

Notation: $s_0 = s(y)$; $t_i = t(y^*_i)$, $i = 1, 2, \ldots, B$; $Y^*_{ij}$ = number of times the jth data point appears in the ith boot sample;
$$\mathrm{cov}_j = \sum_{i=1}^{B} Y^*_{ij}\,(t_i - s_0)\big/B \qquad [\text{covariance of } Y^*_{ij} \text{ with } t_i].$$

Theorem: The delta-method standard deviation estimate for $s_0$ is
$$\widetilde{sd} = \Big[\sum_{j=1}^{n} \mathrm{cov}_j^2\Big]^{1/2},$$
always less than $\big[\sum_{i=1}^{B} (t_i - s_0)^2/B\big]^{1/2}$, the boot stdev for t(y).
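
In code the theorem is two lines, given the t and Y arrays recorded by the bootstrap loop above; a sketch:

```python
import numpy as np

def smooth_sd(t, Y):
    """Delta-method sd for s_0 = mean(t): sqrt(sum_j cov_j^2)."""
    cov = Y.T @ (t - t.mean()) / len(t)      # cov_j, j = 1, ..., n
    return np.sqrt(np.sum(cov ** 2))

def naive_sd(t):
    """Plain bootstrap stdev of t(y); always >= smooth_sd(t, Y)."""
    return np.sqrt(np.mean((t - t.mean()) ** 2))
```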

Projection Interpretation

[Figure: projection interpretation of the smoothed estimate.]

[Figure: Standard deviation of the smoothed estimate relative to the original (red) for the five featured subjects; the green line is the stdev under the naive cubic model. Bottom numbers (7.9, 3.9, 4.1, 4.7, 6.8) show the original standard deviations for Subjects 1-5.]

How Many Bootstrap Replications Are Enough?

How accurate is $\widetilde{sd}$? Jackknife:
- Divide the 4000 bootreps $t_i$ into 20 groups of 200 each.
- Recompute $\widetilde{sd}$ with each group removed in turn.

The jackknife gave coefficient of variation of $\widetilde{sd}$ about 0.02 for all 164 subjects (could have stopped at B = 500).
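
A sketch of this grouped jackknife, reusing the t and Y arrays; the inner sd() restates the delta-method formula so the block stands alone:

```python
import numpy as np

def jackknife_cv(t, Y, n_groups=20):
    """Grouped jackknife coefficient of variation for smooth_sd."""
    B = len(t)
    def sd(idx):
        ti = t[idx]
        cov = Y[idx].T @ (ti - ti.mean()) / len(idx)
        return np.sqrt(np.sum(cov ** 2))
    groups = np.array_split(np.arange(B), n_groups)
    left_out = np.array([sd(np.setdiff1d(np.arange(B), g)) for g in groups])
    g = n_groups
    se_jack = np.sqrt((g - 1) / g * np.sum((left_out - left_out.mean()) ** 2))
    return se_jack / sd(np.arange(B))   # cv of sd-tilde
```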

How Stable Are the Standard Deviations?

The smoothed standard interval $\tilde\mu \pm 1.96\,\widetilde{sd}$ assumes that $\widetilde{sd}$ is stable.

Acceleration: $\tilde a = d\,\widetilde{sd}/d\tilde\mu$,
$$\tilde a \doteq \frac{1}{6} \sum_j \mathrm{cov}_j^3 \Big/ \Big(\sum_j \mathrm{cov}_j^2\Big)^{3/2}.$$

$|\tilde a| \le 0.02$ for all 164 subjects.
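
The acceleration comes from the same cov_j values used by the accuracy theorem; a minimal sketch:

```python
import numpy as np

def acceleration(t, Y):
    """BCa-style acceleration estimate from the cov_j values."""
    cov = Y.T @ (t - t.mean()) / len(t)
    return np.sum(cov ** 3) / (6 * np.sum(cov ** 2) ** 1.5)
```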

Model Probability Estimates

- 34% of the 4000 bootreps chose the cubic model.
- A "poor man's" Bayes posterior probability for the cubic.
- How accurate is that 34%? Apply the accuracy theorem to the indicator function for choosing the cubic.
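
A sketch, assuming the bootstrap loop also recorded the selected degree for each replication (a hypothetical degrees array); the boot % is the mean of the indicators and the same formula gives its standard error:

```python
import numpy as np

def model_prob_se(degrees, Y, degree=3):
    """Boot % for one model and its delta-method standard error."""
    ind = (np.asarray(degrees) == degree).astype(float)
    cov = Y.T @ (ind - ind.mean()) / len(ind)
    return ind.mean(), np.sqrt(np.sum(cov ** 2))
```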

Model           Boot %   ± Standard Error
M1 (linear)       19%         ±24
M2 (quad)         12%         ±18
M3 (cubic)        34%         ±24
M4 (quartic)       8%         ±14
M5 (quintic)      21%         ±27
M6 (sextic)        6%          ±6

The Supernova Data

data = {(x_j, y_j), j = 1, 2, ..., n = 39}
y_j = absolute magnitude of a Type Ia supernova
x_j = vector of 10 spectral energies (350-850 nm)

Full model: $y = X_{39 \times 10}\,\beta + e$, with $e_i \overset{\text{ind}}{\sim} N(0, 1)$.

Ordinary Least Squares Prediction

Full model: $y \sim N_{39}(X\beta, I)$.

OLS estimates: $\hat\mu_{OLS} = X\hat\beta_{OLS}$ $[\hat\beta_{OLS} = \arg\min_\beta \|y - X\beta\|^2]$.

Naive $R^2 = 0.82$ $[= \widehat{\mathrm{cor}}(\hat\mu_{OLS}, y)^2]$.
Adjusted $R^2 = 0.69$ $[= R^2 - (1 - R^2)\,2m/(n - m)]$, where m = 10 is the df.

[Figure: Adjusted absolute magnitudes for the 39 Type Ia supernovas plotted versus full-model OLS predictions from the 10 spectral measurements; naive R² (squared correlation) = .82, adjusted R² = .69. Red points are the 5 featured cases.]

Lasso Model Selection

The lasso estimate is the $\hat\beta$ minimizing
$$\|y - X\beta\|^2 + \lambda \sum_{k=1}^{p} |\beta_k|.$$

- Shrinks the OLS estimates toward zero (all the way to zero for some).
- Degrees of freedom: m = number of nonzero $\hat\beta_k$'s.
- Model selection: choose $\lambda$ (or m) to maximize the adjusted $R^2$; then $\hat\mu = X\hat\beta_m$.
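
A sketch of this selection rule using scikit-learn's Lasso (whose alpha rescales the slide's lambda by 2n), with simulated stand-in data since the supernova measurements are not reproduced here:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 39, 10
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + rng.standard_normal(n)

def adjusted_r2(y, fit, m, n):
    r2 = np.corrcoef(fit, y)[0, 1] ** 2          # naive R^2
    return r2 - (1 - r2) * 2 * m / (n - m)       # slide's adjustment

best = None
for alpha in np.logspace(-3, 1, 50):             # grid over the lasso path
    model = Lasso(alpha=alpha, fit_intercept=False).fit(X, y)
    m = np.count_nonzero(model.coef_)            # df = # nonzero coefficients
    if m == 0:
        continue
    score = adjusted_r2(y, model.predict(X), m, n)
    if best is None or score > best[0]:
        best = (score, alpha, model)
score, alpha, model = best
mu_hat = model.predict(X)        # mu_hat = X beta_hat_m at the selected lambda
```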

Lasso for the Supernova Data

lambda   m (# nonzero beta_k's)   Naive R^2   Adjusted R^2
 63.0             1                  .17          .12
 12.9             4                  .77          .72
  3.56            7                  .81          .73   (Selected)
  0.50            9                  .82          .71
  0              10                  .82          .69   (OLS)

Parametric Bootstrap Smoothing

Original estimates: $y \to \text{Lasso} \to m, \hat\beta_m \to \hat\mu = X\hat\beta_m$.

Full-model bootstrap: $y^* \sim N_{39}(\hat\mu_{OLS}, I)$, then $y^* \to m^*, \hat\beta^*_{m^*} \to \hat\mu^* = X\hat\beta^*_{m^*}$.

I did this all B = 4000 times, with $t_{ik} = \hat\mu^*_{ik}$.

Smoothed estimates: $s_k = \sum_{i=1}^{4000} t_{ik}\big/4000$ $[k = 1, 2, \ldots, 39]$.

Parametric Accuracy Theorem

Theorem: The delta-method standard deviation estimate for $s_k$ is
$$\widehat{sd}_k = \big[\widehat{\mathrm{cov}}_k'\, G\, \widehat{\mathrm{cov}}_k\big]^{1/2},$$
where $G = X'X$ and $\widehat{\mathrm{cov}}_k$ is the bootstrap covariance between $\hat\beta^*_{OLS}$ and $t^*_k$.

- Always less than the bootstrap estimate of stdev for $t^*_k$.
- Projection into $\mathcal{L}(\hat\beta^*_{OLS})$.
- Exponential families.
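
A sketch of the whole parametric pipeline and the theorem; select_lasso is an assumed helper (e.g. built from the adjusted-R² lasso sketch above) that returns the lasso-selected fitted values for a given response:

```python
import numpy as np

def parametric_smooth(X, y, select_lasso, B=4000, seed=2):
    """Parametric bootstrap smoothing with delta-method sd for each s_k."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    mu_ols = X @ beta_ols
    t = np.empty((B, n))                  # t_ik = mu*_ik
    betas = np.empty((B, X.shape[1]))     # beta_hat*_OLS for each y*
    for i in range(B):
        ystar = mu_ols + rng.standard_normal(n)   # y* ~ N(mu_hat_OLS, I)
        betas[i] = np.linalg.lstsq(X, ystar, rcond=None)[0]
        t[i] = select_lasso(X, ystar)             # lasso-selected fit
    s = t.mean(axis=0)                    # smoothed estimates s_k
    G = X.T @ X
    cov = (betas - betas.mean(axis=0)).T @ (t - s) / B   # p x n, cov_k in col k
    sd = np.sqrt(np.einsum('pk,pq,qk->k', cov, G, cov))  # [cov_k' G cov_k]^1/2
    return s, sd
```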

[Figure: Standard deviation of the smoothed estimate relative to the original (red) for the five featured supernovas; the green line uses bootstrap reweighting. Bottom numbers (0.37, 0.36, 0.53, 0.24, 0.60) show the original standard deviations for Supernovas 1-5.]

Better Confidence Intervals

- Smoothed standard intervals and percentile intervals have coverage errors of order $O(1/\sqrt{n})$.
- ABC intervals have errors of order $O(1/n)$: they correct for bias and acceleration (the change in stdev as the estimate varies).
- ABC uses local reweighting for the second-order correction.

[Figure: 95% intervals for the five selected supernovas, centered by subtracting the smoothed estimates; ABC in black, smoothed standard in red.]

Brute Force Simulation

- Sample 500 times: $y^* \sim N(\hat\mu_{OLS}, I)$; each gives $\hat\mu^*_{OLS}$.
- Resample B = 1000 times: $y^{**} \sim N(\hat\mu^*_{OLS}, I)$.
- Use ABC to get sd, bias, and acceleration.
- Calculate the ABC coverage of the one-sided interval $(-\infty, s_k]$ [$s_k$ the original smoothed estimate].
- Should be uniform on [0, 1].

[Figure: K = 500 brute-force simulations with B = 1000 bootreps each; histograms of the attained significance level (Asl) for Supernovas 1-4, which should be approximately uniform.]

References

Berk, R., Brown, L., Buja, A., Zhang, K. and Zhao, L. (2012). Valid post-selection inference. Submitted, Ann. Statist. http://stat.wharton.upenn.edu/~zhangk/posi-submit.pdf; conservative frequentist intervals à la Tukey, Scheffé.

Buja, A. and Stuetzle, W. (2006). Observations on bagging. Statist. Sinica 16: 323-351; more on bagging.

DiCiccio, T. J. and Efron, B. (1992). More accurate confidence intervals in exponential families. Biometrika 79: 231-245; ABC confidence intervals.

DiCiccio, T. J. and Efron, B. (1996). Bootstrap confidence intervals. Statist. Sci. 11: 189-228; with comments and a rejoinder by the authors.

Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning, 2nd ed. Springer Series in Statistics. New York: Springer; Section 8.7, bagging.

Hjort, N. L. and Claeskens, G. (2003). Frequentist model average estimators. J. Amer. Statist. Assoc. 98: 879-899; model selection asymptotics.