Model Selection, Estimation, and Bootstrap Smoothing. Bradley Efron Stanford University

Size: px

Start display at page:

Download "Model Selection, Estimation, and Bootstrap Smoothing. Bradley Efron Stanford University"

Christiana Mathews
5 years ago
Views:

1 Model Selection, Estimation, and Bootstrap Smoothing Bradley Efron Stanford University

2 Estimation After Model Selection Usually: (a) look at data (b) choose model (linear, quad, cubic...?) (c) fit estimates using chosen model (d) analyze as if pre-chosen Today: include model selection process in the analysis Question: Effects on standard errors, confidence intervals, etc.? Two Examples: nonparametric, parametric Model Selection Estimation Bootstrap Smoothing 1

3 Cholesterol Data n = 164 men took Cholestyramine for 7 years x = compliance measure (adjusted: x N(0, 1)) y = cholesterol decrease Regression y on x? [wish to estimate: µ j = E{y x j }, j = 1, 2,..., n] Model Selection Estimation Bootstrap Smoothing 2

4 Cholesterol data, n=164 subjects: cholesterol decrease plotted versus adjusted compliance; Green curve is OLS cubic regression; Red points indicate 5 featured subjects compliance cholesterol decrease Model Selection Estimation Bootstrap Smoothing 3

5 Regression Model C p Criterion C p Selection Criterion y n 1 = X n m β m 1 y X ˆβ 2 + 2mσ 2 + e n 1 [ ei (0, σ 2 ) ] ˆβ = OLS estimate, m = degrees of freedom Model Selection: From possible models X 1, X 2, X 3,... choose the one minimizing C p. Then use OLS estimate from chosen model. Model Selection Estimation Bootstrap Smoothing 4

6 C p for Cholesterol Data Model df C p (Boot %) M 1 (linear) (19%) M 2 (quad) (12%) M 3 (cubic) (34%) M 4 (quartic) (8%) M 5 (quintic) (21%) M 6 (sextic) (6%) (σ = 22 from full model M 6 ) Model Selection Estimation Bootstrap Smoothing 5

7 Nonparametric Bootstrap Analysis data = {(x i, y i ), i = 1, 2,..., n = 164} gave original estimate ˆµ = X 3 ˆβ 3 Bootstrap data set data = { (x j, y j ), j = 1, 2,..., n } where (x j, y j ) drawn randomly and with replacement from data: data Cp m OLS ˆβ m ˆµ = X m ˆβ m I did this all B = 4000 times. Model Selection Estimation Bootstrap Smoothing 6

8 B=4000 nonparametric bootstrap replications for the model selected regression estimate of Subject 1; boot (m,stdev)=( 2.63,8.02); 76% of the replications less than original estimate 2.71 Frequency ^ 2.71 ^ bootstrap estimates for subject 1 Red triangles are 2.5th and 97.5th boot percentiles Model Selection Estimation Bootstrap Smoothing 7

9 Smooth Estimation Model: ' y ' is observed data; Ellipses indicate bootstrap distribution for ' y* '; Red curves level surfaces of equal estimation for thetahat=t(y) y thetahat =t(y) Model Selection Estimation Bootstrap Smoothing 8

10 Model Selection Estimation Bootstrap Smoothing 9

11 Boxplot of Cp boot estimates for Subject 1; B=4000 bootreps; Red bars indicate selection proportions for Models 1 6 only 1/3 of the bootstrap replications chose Model 3 selected model subject 1 estimates Model3 ********* MODEL 3 Model Selection Estimation Bootstrap Smoothing 10

12 Bootstrap Confidence Intervals Standard: Percentile: ˆµ ± 1.96 ŝe [ ˆµ (.025), ˆµ (.975)] Smoothed Standard: µ ± 1.96 se BCa/ABC: corrects percentiles for bias and changing se Model Selection Estimation Bootstrap Smoothing 11

13 95% Bootstrap Confidence Intervals for Subject 1 Confidence Interval Standard ( 13.0,18.4) Percentile ( 17.8,13.5) Smoothed ( 13.3,8.0) Model Selection Estimation Bootstrap Smoothing 12

14 Bootstrap Smoothing Idea Replace original estimator t(y) with bootstrap average s(y) = B t ( ) / y i B i=1 Model averaging Same as bagging ( bootstrap aggregation Breiman) Removes discontinuities Reduces variance Model Selection Estimation Bootstrap Smoothing 13

15 Accuracy Theorem Notation s 0 = s(y), t i = t(y ), i = 1, 2,... B i Y = # of times jth data point appears in ith boot sample ij cov j = B i=1 Y ij (t i s ) / [ 0 B covariance Y ij with ] t i Theorem The delta method standard deviation estimate for s 0 is always B i=1 sd = n cov 2 j j=1 1/2 1/2 ( t i s ) 2 / 0 B, the boot stdev for t(y)., Model Selection Estimation Bootstrap Smoothing 14

16 Projection Interpretation 0 Model Selection Estimation Bootstrap Smoothing 15

17 Standard Deviation of smoothed estimate relative to original (Red) for five subjects; green line is stdev Naive Cubic Model Relative stdev * * * * * Subject number bottom numbers show original standard deviations Model Selection Estimation Bootstrap Smoothing 16

18 How Many Bootstrap Replications Are Enough? How accurate is sd? Jackknife Divide the 4000 bootreps t into 20 groups of 200 each i Recompute sd with each group removed in turn Jackknife gave coef variation ( sd ) 0.02 for all 164 subjects (could have stopped at B = 500) Model Selection Estimation Bootstrap Smoothing 17

19 How Stable Are The Standard Deviations? Smoothed standard interval µ ± 1.96 sd assumes sd stable Acceleration ã = d sd/d µ, ã 1 6 / ( cov 3 j cov 2 j ) 3/2 ã 0.02 for all 164 subjects Model Selection Estimation Bootstrap Smoothing 18

20 Model Probability Estimates 34% of the 4000 bootreps chose the cubic model Poor man s Bayes posterior prob for cubic How accurate is that 34%? Apply accuracy theorem to indicator function for choosing cubic Model Selection Estimation Bootstrap Smoothing 19

21 Model Boot % ± Standard Error M 1 (linear) 19% ±24 M 2 (quad) 12% ±18 M 3 (cubic) 34% ±24 M 4 (quartic) 8% ±14 M 5 (quintic) 21% ±27 M 6 (sextic) 6% ±6 Model Selection Estimation Bootstrap Smoothing 20

22 The Supernova Data data = { (x j, y j ), j = 1, 2,..., n = 39 } y j = absolute magnitude of Type Ia supernova x j = vector of 10 spectral energies ( nm) [ ] ind Full Model y = X β + e e i N(0, 1) Model Selection Estimation Bootstrap Smoothing 21

23 Ordinary Least Squares Prediction Full Model y N 39 (Xβ, I) OLS Estimates ˆµ OLS = X ˆβ OLS [ arg min y Xβ 2 ] Naive R 2 = 0.82 Adjusted R 2 = 0.69 [ = ĉor ( ˆµOLS, y ) 2 ] [ R 2 ( 1 R 2) ] m n m where m = 10 the df Model Selection Estimation Bootstrap Smoothing 22

24 Adjusted absolute magnitudes for 39 Type1A supernovas plotted versus OLS predictions from 10 spectral measurments; Naive R2 (squared correlation)=.82; Adjusted R2=.62 Red points are 5 featured cases Full Model OLS Prediction > Absolute Magnitude > Model Selection Estimation Bootstrap Smoothing 23

25 Lasso Model Selection Lasso estimate is ˆβ minimizing y Xβ 2 + λ p 1 β k Shrinks OLS estimates toward zero (all the way for some) Degrees of freedom m = number of nonzero ˆβ k s Model Selection: Choose λ (or m) to maximize adjusted R 2. Then ˆµ = X ˆβ m. Model Selection Estimation Bootstrap Smoothing 24

26 Lasso for the Supernova Data λ m (# nonzero ˆβ k s) Naive R 2 Adjusted R (Selected) (OLS) Model Selection Estimation Bootstrap Smoothing 25

27 Parametric Bootstrap Smoothing Original Estimates y Lasso m, ˆβ m ˆµ = X ˆβ m Full Model Bootstrap y N 39 ( ˆµ OLS, I) y m, ˆβ m ˆµ = X ˆβ m I did this all B = 4000 times. t ik = ˆµ ik Smoothed Estimates s k = 4000 i=1 t ik/ 4000 [k = 1, 2,..., 39] Model Selection Estimation Bootstrap Smoothing 26

28 Parametric Accuracy Theorem Theorem is The delta method standard deviation estimate for s k ŝd k = [ ĉov k G ĉov k ] 1/2, where G = X X and ĉov k is bootstrap covariance between ˆβ OLS and t k. Always less than the bootstrap estimate of stdev for t k Projection into L ( ) ˆβ OLS Exponential families Model Selection Estimation Bootstrap Smoothing 27

29 Standard Deviation of smoothed estimate relative to original (Red) for five Supernova; green line using Bootstrap reweighting Relative stdev * * * * * Supernova number bottom numbers show original standard deviations Model Selection Estimation Bootstrap Smoothing 28

30 Better Confidence Intervals Smoothed standard intervals and percentile intervals have coverage errors of order O ( 1 / ) n. ABC intervals have errors O(1/n): corrects for bias and acceleration (change in stdev as estimate varies). Uses local reweighting for 2nd order correction Model Selection Estimation Bootstrap Smoothing 29

31 95% intervals five selected SNs (subtracting smoothed ests); ABC black; smoothed standard red. centered 95%iInterval supernova index Model Selection Estimation Bootstrap Smoothing 30

32 Brute Force Simulation Sample 500 times: y N ( ˆµ OLS, I ) ; gives ˆµ OLS Resample B = 1000 times: y N ( ˆµ OLS, I) Use ABC to get sd, bias, acceleration Calculate ABC coverage of one-sided interval (, s k ) [s k the original smoothed estimate] Should be uniform [0, 1] Model Selection Estimation Bootstrap Smoothing 31

33 K=500 Brute force sims, B=1000 bootreps each; SN1 Supernova 2 Frequency ^ ^ Frequency ^ ^ Asl Asl Supernova 3 Supernova 4 Frequency ^ ^ Frequency ^ ^ Asl Asl Model Selection Estimation Bootstrap Smoothing 32

34 References Berk, R., Brown, L., Buja, A., Zhang, K. and Zhao, L. (2012). Valid post-selection inference. Submitted Ann. Statist. http: //stat.wharton.upenn.edu/ zhangk/posi-submit.pdf; conservative frequentist intervals à la Tukey, Scheffé. Buja, A. and Stuetzle, W. (2006). Observations on bagging. Statist. Sinica 16: , more on bagging. DiCiccio, T. J. and Efron, B. (1992). More accurate confidence intervals in exponential families. Biometrika 79: , ABC confidence intervals. DiCiccio, T. J. and Efron, B. (1996). Bootstrap confidence Model Selection Estimation Bootstrap Smoothing 33

35 intervals. Statist. Sci. 11: , with comments and a rejoinder by the authors. Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning. Springer Series in Statistics. New York: Springer, 2nd ed., Section 8.7, bagging. Hjort, N. L. and Claeskens, G. (2003). Frequentist model average estimators. J. Amer. Statist. Assoc. 98: , model selection asymptotics. Model Selection Estimation Bootstrap Smoothing 34

MODEL SELECTION, ESTIMATION, AND BOOTSTRAP SMOOTHING. Bradley Efron. Division of Biostatistics STANFORD UNIVERSITY Stanford, California

MODEL SELECTION, ESTIMATION, AND BOOTSTRAP SMOOTHING By Bradley Efron Technical Report 262 August 2012 Division of Biostatistics STANFORD UNIVERSITY Stanford, California MODEL SELECTION, ESTIMATION, AND