PoSI Valid Post-Selection Inference


PoSI Valid Post-Selection Inference
Andreas Buja, joint work with Richard Berk, Lawrence Brown, Kai Zhang, Linda Zhao
Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, USA
UF, 2014/01/18

Larger Problem: Non-Reproducible Empirical Findings

Indicators of a problem (from Berger, 2012, "Reproducibility of Science: P-values and Multiplicity"):
- Bayer Healthcare reviewed 67 in-house attempts at replicating findings in published research: fewer than 1/4 were viewed as replicated.
- Arrowsmith (2011, Nat. Rev. Drug Discovery 10): increasing failure rate in Phase II drug trials.
- Ioannidis (2005, PLoS Medicine): "Why Most Published Research Findings Are False".
- Simmons, Nelson, Simonsohn (2011, Psychol. Sci.): "False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant".

Many potential causes; two major ones:
- publication bias: the file drawer problem (Rosenthal 1979)
- statistical biases: researcher degrees of freedom (SNS 2011)

Statistical Biases (one among several)

Hypothesis: A statistical bias is due to an absence of accounting for model/variable selection.

Model selection is done on several levels:
- formal selection: AIC, BIC, Lasso, ...
- informal selection: residual plots, influence diagnostics, ...
- post hoc selection: "The effect size is too small in relation to the cost of data collection to warrant inclusion of this predictor."

Suspicions:
- All three modes of model selection may be used in much empirical research.
- Ironically, the most thorough and competent data analysts may also be the ones who produce the most spurious findings.
- If we develop valid post-selection inference for the adaptive Lasso, say, it won't solve the problem, because few empirical researchers would commit themselves a priori to one formal selection method and nothing else. (The meta-selection problem.)

The Problem of Post-Selection Inference

How can variable selection invalidate conventional inference?
- Conventional inference after variable selection ignores the fact that the model was obtained through a stochastic selection process.
- Stochastic variable selection distorts the sampling distributions of the post-selection parameter estimates: most selection procedures search for strong, hence highly significant looking, predictors.
- Some forms of the problem have been known for decades: Koopmans (1949); Buehler and Fedderson (1963); Brown (1967); Olshen (1973); Sen (1979); Sen and Saleh (1987); Dijkstra and Veldkamp (1988); Arabatzis et al. (1989); Hurvich and Tsai (1990); Regal and Hook (1991); Pötscher (1991); Chiou and Han (1995a,b); Giles (1992); Giles and Srivastava (1993); Kabaila (1998); Brockwell and Gordeon (2001); Leeb and Pötscher (2003, 2005, 2006a, 2006b, 2008a, 2008b); Kabaila (2005); Kabaila and Leeb (2006); Berk, Brown and Zhao (2009); Kabaila (2009).

Example: Length of Criminal Sentence

Question: What covariates predict the length of a criminal sentence best?
A small empirical study: N = 250 observations.
Response: log-length of sentence.
p = 11 covariates (predictors, explanatory variables): race, gender, initial age, marital status, employment status, seriousness of crime, psychological problems, education, drug related, alcohol usage, prior record.
What variables should be included?

Example: Length of Criminal Sentence (contd.)

All-subset search with BIC chooses a model M̂ with seven variables: initial age, gender, employment status, seriousness of crime, drug related, alcohol usage, prior record.
t-statistics of the selected covariates, in descending order:
t_alcohol = 3.95; t_prior record = 3.59; t_seriousness = 3.57; t_drugs = 3.31; t_employment = 3.04; t_initial age = 2.56; t_gender = 2.33.
Can we use the cutoff t_{.975, 250−8} = 1.97?
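As a quick numeric aside (not from the slides), the cutoff quoted above is just the 97.5% quantile of the t distribution with 250 − 8 residual degrees of freedom (intercept plus the 7 selected slopes):

```r
# Conventional two-sided 95% cutoff in the selected 7-variable model
qt(0.975, df = 250 - 8)   # ~ 1.97
```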

Linear Model Inference and Variable Selection

Y = Xβ + ɛ, where X is a fixed design matrix of size N × p, N > p, full rank, and ɛ ~ N_N(0, σ² I_N).

In textbooks:
1. Variables selected
2. Data seen
3. Inference produced

In common practice:
1. Data seen
2. Variables selected
3. Inference produced

Is this inference valid?

Evidence from a Simulation

Generate Y from the following linear model:
Y = βx + Σ_{j=1}^{10} γ_j z_j + ɛ, where p = 11, N = 250, and ɛ ~ N(0, I) (iid errors).
For simplicity: protect x and select only among z_1, ..., z_10; interest is in inference for β.
Model selection: all-subset search with BIC among z_1, ..., z_10, always including x.
Proper coverage of a 95% CI on the slope β of x under the chosen model requires that the t-statistic is approximately N(0, 1) distributed.
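A minimal R sketch of this simulation (an illustrative stand-in: the coefficient values and the collinearity between x and the z's are assumed here, since the slides' exact settings behind "More Details" are not reproduced). Each replication runs the all-subset BIC search with x protected and records whether the conventional 95% CI for the slope of x covers the generating β; the achieved coverage depends on the assumed design, and the slides report 83.5% for theirs.

```r
set.seed(1)
N <- 250; p_z <- 10
one_rep <- function(beta = 1, gamma = rep(0.15, p_z)) {
  x <- rnorm(N)
  Z <- 0.5 * x + matrix(rnorm(N * p_z), N, p_z,
                        dimnames = list(NULL, paste0("z", 1:p_z)))  # z's partly collinear with x
  y <- beta * x + drop(Z %*% gamma) + rnorm(N)
  best_bic <- Inf; best_fit <- NULL
  for (code in 0:(2^p_z - 1)) {                         # all 2^10 subsets of the z's
    keep <- which(bitwAnd(code, 2^(0:(p_z - 1))) > 0)
    dat  <- data.frame(y = y, x = x, Z[, keep, drop = FALSE])
    fit  <- lm(y ~ ., data = dat)                       # x is always included
    b    <- BIC(fit)
    if (b < best_bic) { best_bic <- b; best_fit <- fit }
  }
  est <- coef(best_fit)["x"]
  se  <- summary(best_fit)$coefficients["x", "Std. Error"]
  abs(est - beta) <= qt(0.975, df.residual(best_fit)) * se   # naive 95% CI covers beta?
}
mean(replicate(100, one_rep()))   # small run (a minute or two); compare with the nominal 0.95
```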

Evidence from a Simulation (contd.)

[Figure: marginal distribution of the post-selection t-statistics for the slope of x, with the actual density visibly wider than the nominal reference distribution.]
The overall coverage probability of the conventional post-selection CI is 83.5%, well below the nominal 95%.
For p = 30, the coverage probability can be as low as 39%.

The PoSI Procedure: Rough Outline

- We propose to construct Post-Selection Inference (PoSI) with guarantees for the coverage of CIs and the Type I errors of tests.
- We widen CIs and retention intervals to achieve correct/conservative post-selection coverage probabilities. This is the price we have to pay.
- The approach is a reduction of PoSI to simultaneous inference.
- Simultaneity is across all submodels and all slopes in them.
- As a result, we obtain valid PoSI for all variable selection procedures!
- But first we need some preliminaries on targets of inference and inference in wrong models.

PoSI: What's in a word? http://www.thefreedictionary.com/posies

Submodels: Notation, Parameters, Assumptions

Denote a submodel by the set of integers M = {j_1, j_2, ..., j_m} indexing its predictors: X_M = (X_{j_1}, X_{j_2}, ..., X_{j_m}) ∈ IR^{N×m}.
The LS estimator in the submodel M is β̂_M = (X_M^T X_M)^{-1} X_M^T Y ∈ IR^m.
What does β̂_M estimate, not assuming the truth of M?
Answer: its expectation, i.e., we ask for unbiasedness. With µ := E[Y] ∈ IR^N arbitrary,
β_M := E[β̂_M] = (X_M^T X_M)^{-1} X_M^T µ.
Once again: we do not assume that the submodel is correct, i.e., we allow µ ≠ X_M β_M. But X_M β_M is the best (least squares) approximation to µ in the column space of X_M.
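A small R illustration with assumed toy numbers (not from the slides): the submodel target β_M = (X_M^T X_M)^{-1} X_M^T µ is well defined for an arbitrary mean vector µ, with no assumption that µ = X_M β_M.

```r
set.seed(2)
N <- 50
X  <- cbind(x1 = rnorm(N), x2 = rnorm(N), x3 = rnorm(N))
mu <- sin(X[, 1]) + 0.5 * X[, 2] * X[, 3]     # E[Y] is not linear in the predictors at all
Y  <- mu + rnorm(N, sd = 0.5)

M   <- c(1, 2)                                 # submodel using x1 and x2 only
X_M <- X[, M, drop = FALSE]
beta_hat_M <- solve(crossprod(X_M), crossprod(X_M, Y))    # LS estimate in M
beta_M     <- solve(crossprod(X_M), crossprod(X_M, mu))   # its target, E[beta_hat_M]
data.frame(estimate = drop(beta_hat_M), target = drop(beta_M))
```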

Submodels versus the Full Model: Confusions

Abbreviate β̂ := β̂_{M_F} and β := β_{M_F}, where M_F = {1, 2, ..., p} is the full model.
Questions:
- How do submodel estimates β̂_M relate to full-model estimates β̂?
- How do submodel parameters β_M relate to full-model parameters β?
- Is β̂_M a subset of β̂, and β_M a subset of β?
Answer: Unless X is an orthogonal design, β̂_M is not a subset of β̂ and β_M is not a subset of β.
Reason: Slopes, both estimates and parameters, depend on what the other predictors are, in value and in meaning.
Message: β̂_M does not estimate full-model parameters! (Exception: the full model is causal or data generating; submodel estimates then suffer from "omitted variables bias".)

Slopes depend on... An Illustration

Survey of potential purchasers of a new high-tech gizmo:
- Response: LoP = likelihood of purchase (self-reported on a Likert scale)
- Predictor 1: Age
- Predictor 2: Income
Expectation: younger customers have higher LoP, that is, β_Age < 0.
Outcome of the analysis:
- Against expectations, a regression of LoP on Age alone indicates that older customers have higher LoP: β_Age > 0.
- But a regression of LoP on Age and Income indicates that, adjusted for Income, younger customers have higher LoP: β_{Age·Income} < 0.
Enabling factor: (partial) collinearity between Age and Income.
A case of Simpson's paradox: β_Age > 0 > β_{Age·Income}. The marginal and the Income-adjusted slope have very different values and different meanings.
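A toy R version of this sign flip. All coefficients below are assumed for illustration (the slides do not give the survey data): Income rises with Age, LoP rises with Income, and LoP falls with Age once Income is held fixed.

```r
set.seed(3)
n      <- 500
age    <- runif(n, 20, 70)
income <- 20 + 1.5 * age + rnorm(n, sd = 10)          # Age and Income partly collinear
lop    <- 2 - 0.05 * age + 0.06 * income + rnorm(n)   # true Income-adjusted Age slope < 0

coef(lm(lop ~ age))            # marginal Age slope: positive
coef(lm(lop ~ age + income))   # Income-adjusted Age slope: negative
```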

Adjustment, Estimates, Parameters, t-statistics

Notation and facts for the components of β̂_M and β_M, assuming j ∈ M:
- Let X_{j·M} be the predictor X_j adjusted for the other predictors in M:
  X_{j·M} := (I − H_{M∖{j}}) X_j, which is orthogonal to X_k for all k ∈ M∖{j}.
- Let β̂_{j·M} be the slope estimate and β_{j·M} the parameter for X_j in M:
  β̂_{j·M} := X_{j·M}^T Y / ‖X_{j·M}‖²,  β_{j·M} := X_{j·M}^T E[Y] / ‖X_{j·M}‖².
- Let t_{j·M} be the t-statistic for β̂_{j·M} and β_{j·M}:
  t_{j·M} := (β̂_{j·M} − β_{j·M}) / (σ̂ / ‖X_{j·M}‖) = X_{j·M}^T (Y − E[Y]) / (‖X_{j·M}‖ σ̂).
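A short R sketch on assumed toy data, verifying the adjustment formula numerically: the slope of x1 in the submodel {x1, x2} equals the simple slope of Y on the adjusted predictor X_{1·M}, and a single σ̂ from the full model (anticipating the next slide) yields its t-statistic.

```r
set.seed(4)
N <- 100
dat <- data.frame(x1 = rnorm(N), x2 = rnorm(N), x3 = rnorm(N))
dat$x2 <- dat$x2 + 0.6 * dat$x1                  # make the predictors partly collinear
dat$Y  <- 1 + dat$x1 - 0.5 * dat$x2 + 0.3 * dat$x3 + rnorm(N)

# submodel M = {x1, x2}, slope of j = x1
x_adj      <- residuals(lm(x1 ~ x2, data = dat))      # X_{1.M}: x1 adjusted for x2 (and intercept)
beta_1_M   <- sum(x_adj * dat$Y) / sum(x_adj^2)       # slope of x1 in submodel M
sigma_full <- summary(lm(Y ~ x1 + x2 + x3, data = dat))$sigma   # full-model sigma-hat
t_1_M      <- beta_1_M / (sigma_full / sqrt(sum(x_adj^2)))      # t for H0: beta_{1.M} = 0

c(by_adjustment = beta_1_M,
  by_lm = unname(coef(lm(Y ~ x1 + x2, data = dat))["x1"]))      # identical slopes
```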

Parameters One More Time

Once more: if the predictors are partly collinear (non-orthogonal), then M ≠ M′ implies β_{j·M} ≠ β_{j·M′}, in value and in meaning.
Motto: a difference in adjustment implies a difference in parameters.
It follows that there are up to p·2^{p−1} different parameters β_{j·M}!
However, they are intrinsically p-dimensional: β_M = (X_M^T X_M)^{-1} X_M^T X β, where X and β are from the full model. Hence each β_{j·M} is a linear combination of the full-model parameters β_1, ..., β_p.

Geometry of Adjustment

[Figure: the column space of X for p = 2 partly collinear predictors x_1 and x_2, showing the adjusted predictors x_{1·2} and x_{2·1} and the angle between x_1 and x_2.]

Error Estimates σ̂²

Important: to enable simultaneous inference for all t_{j·M},
- do not use the submodel error estimate σ̂²_M := ‖Y − X_M β̂_M‖² / (N − m) (the selected model M may well be 1st-order wrong);
- instead, for all models M, use σ̂² = σ̂²_Full from the full model;
- then t_{j·M} will have a t-distribution with the same dfs for all M and all j ∈ M.

What if even the full model is 1st-order wrong?
Answer: σ̂²_Full will be inflated and inference will be conservative. But better estimates are available if ...
- exact replicates exist: use σ̂² from the 1-way ANOVA of replicates;
- a model larger than the full model can be assumed 1st-order correct: use σ̂²_Large;
- a previous dataset provided a valid estimate: use σ̂²_previous;
- nonparametric estimates are available: use σ̂²_nonpar (Hall and Carroll 1989).

PS: In the fashionable p > N literature, what is their σ̂²?

Statistical Inference under First Order Incorrectness

Statistical inference, one parameter at a time: if r = the dfs in σ̂² and K = t_{1−α/2, r}, then the confidence intervals
  CI_{j·M}(K) := [ β̂_{j·M} ± K σ̂ / ‖X_{j·M}‖ ]
satisfy, each individually, P[ β_{j·M} ∈ CI_{j·M}(K) ] = 1 − α.

Achieved so far, under Y = µ + ɛ with ɛ ~ N_N(0, σ² I):
- No assumption is made that the submodels are 1st-order correct;
- Even the full model may be 1st-order incorrect, if a valid σ̂² is otherwise available;
- A single error estimate opens up the possibility of simultaneous inference across submodels.

Variable Selection

What is a variable selection procedure? A map Y ↦ M̂ = M̂(Y) from IR^N to P({1, ..., p}).
- M̂ divides the response space IR^N into up to 2^p subsets.
- In a fixed-predictor framework, selection purely based on X does not invalidate inference (example: deselect predictors based on VIF, H, ...).

Facing up to post-selection inference: confusers!
- Target of inference: the vector β_{M̂(Y)}, with components β_{j·M̂(Y)} for j ∈ M̂(Y).
- The target of inference is random.
- The target of inference has a random dimension: β_{M̂(Y)} ∈ IR^{|M̂(Y)|}.
- Conditional on j ∈ M̂, the target component β_{j·M̂(Y)} has random meanings.
- When j ∉ M̂, both β_{j·M̂} and β̂_{j·M̂} are undefined. Hence the coverage probability P[ β_{j·M̂} ∈ CI_{j·M̂}(K) ] is undefined.

Universal Post-Selection Inference

Candidates for meaningful coverage probabilities:
- P[ j ∈ M̂ and β_{j·M̂} ∈ CI_{j·M̂}(K) ]   (≤ P[ j ∈ M̂ ])
- P[ β_{j·M̂} ∈ CI_{j·M̂}(K) | j ∈ M̂ ]   (but P[ j ∈ M̂ ] = ???)
- P[ β_{j·M̂} ∈ CI_{j·M̂}(K) for all j ∈ M̂ ]
All are meaningful; the last will be our choice.

Overcoming the next difficulty:
Problem: None of the above coverage probabilities is known or can be estimated for most selection procedures M̂.
Solution: Ask for more! Universal post-selection inference, valid for all selection procedures, is doable.

Reduction to Simultaneous Inference

Lemma: For any variable selection procedure M̂ = M̂(Y), we have the following "significant triviality bound":
  max_{j ∈ M̂} |t_{j·M̂}| ≤ max_M max_{j ∈ M} |t_{j·M}|   for all Y and all µ ∈ IR^N.

Theorem: Let K be the 1 − α quantile of the max-max-|t| statistic of the lemma:
  P[ max_M max_{j ∈ M} |t_{j·M}| ≤ K ] = 1 − α.
Then we have the following universal PoSI guarantee:
  P[ β_{j·M̂} ∈ CI_{j·M̂}(K) for all j ∈ M̂ ] ≥ 1 − α   for every M̂.

PoSI Geometry: Simultaneity

[Figure: the column space of X for p = 2 partly collinear predictors x_1 and x_2; the PoSI polytope is the intersection of all |t|-bands.]

How Conservative is PoSI?

Is there a model selection procedure that requires the full PoSI protection?
Consider M̂ defined as follows: M̂ := argmax_M max_{j ∈ M} |t_{j·M}|.
- A polite name: Single Predictor Adjusted Regression (SPAR).
- A crude name: significance hunting, a special case of p-hacking (Simmons, Nelson, Simonsohn 2011).
SPAR requires the full PoSI protection, by construction!
How realistic is SPAR in describing real data analysts' behaviors?
- It ignores the goodness of fit of the selected model.
- It looks for the minimal achievable p-value / strongest effect.
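A brute-force R sketch of SPAR on assumed pure-noise data (toy sizes, not from the slides): it scans every submodel M and every j ∈ M, computes |t_{j·M}| with the full-model σ̂, and keeps the most significant-looking adjusted slope. Even with no true effects, the selected |t| will typically exceed the conventional per-coefficient cutoff.

```r
set.seed(5)
N <- 60; p <- 6
X <- matrix(rnorm(N * p), N, p)
Y <- rnorm(N)                                      # pure noise: no predictor matters

sigma_full <- summary(lm(Y ~ X))$sigma
r <- N - p - 1                                     # dfs behind sigma_full

best <- list(t = 0, M = NULL, j = NULL)
for (code in 1:(2^p - 1)) {
  M <- which(bitwAnd(code, 2^(0:(p - 1))) > 0)
  for (j in M) {
    x_adj <- if (length(M) > 1)
      residuals(lm(X[, j] ~ X[, setdiff(M, j)]))   # adjust for intercept + others in M
    else X[, j] - mean(X[, j])                     # adjust for the intercept only
    t_jM <- sum(x_adj * Y) / (sqrt(sum(x_adj^2)) * sigma_full)
    if (abs(t_jM) > abs(best$t)) best <- list(t = t_jM, M = M, j = j)
  }
}
c(spar_max_t = best$t, conventional_cutoff = qt(0.975, r))
```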

Computing PoSI

The simultaneity challenge: there are p·2^{p−1} statistics t_{j·M}.

  p          1    2    3    4    5     6     7     8       9       10
  # t_{j·M}  1    4    12   32   80    192   448   1,024   2,304   5,120

  p          11      12      13      14       15       16       17         18         19         20
  # t_{j·M}  11,264  24,576  53,248  114,688  245,760  524,288  1,114,112  2,359,296  4,980,736  10,485,760

- Monte Carlo approximation of K_PoSI in R, brute force, for p ≤ 20.
- Computations are specific to a design X: K_PoSI = K_PoSI(X, α, df).
- Computations depend only on the inner product matrix X^T X.
- The limiting factor is p (N may only matter for σ̂²).
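Below is a simplified, brute-force R sketch of this Monte Carlo approximation (an illustration, not the authors' package code; the function name posi_K and all settings are made up here). It enumerates the adjusted, normalized predictors X_{j·M} for every (M, j), simulates the null max-max-|t| statistic, and returns its 1 − α quantile.

```r
posi_K <- function(X, alpha = 0.05, nsim = 1000) {
  N <- nrow(X); p <- ncol(X); r <- N - p - 1
  # pre-compute the adjusted, unit-norm predictors x_{j.M} for all (M, j) pairs
  dirs <- list()
  for (code in 1:(2^p - 1)) {
    M <- which(bitwAnd(code, 2^(0:(p - 1))) > 0)
    for (j in M) {
      x_adj <- if (length(M) > 1)
        residuals(lm(X[, j] ~ X[, setdiff(M, j)])) else X[, j] - mean(X[, j])
      dirs[[length(dirs) + 1]] <- x_adj / sqrt(sum(x_adj^2))
    }
  }
  U <- do.call(cbind, dirs)                    # N x (p * 2^(p-1)) matrix of unit vectors
  maxt <- replicate(nsim, {
    eps   <- rnorm(N)                          # mu drops out of t_{j.M}; simulate pure noise
    sigma <- sqrt(sum(residuals(lm(eps ~ X))^2) / r)   # full-model sigma-hat with r dfs
    max(abs(crossprod(U, eps))) / sigma        # max-max |t| over all (M, j)
  })
  quantile(maxt, 1 - alpha)
}

# Example: K_PoSI for a small random design (values vary with the design and seed);
# the result is noticeably larger than the per-parameter cutoff qt(0.975, 34).
set.seed(6)
X <- matrix(rnorm(40 * 5), 40, 5)
posi_K(X)
```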