PoSI: Valid Post-Selection Inference


PoSI: Valid Post-Selection Inference
Andreas Buja, joint work with Richard Berk, Lawrence Brown, Kai Zhang, Linda Zhao
Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, USA
UF, 2014/01/18

Larger Problem: Non-Reproducible Empirical Findings

Indicators of a problem (from: Berger, 2012, "Reproducibility of Science: P-values and Multiplicity"):
- Bayer Healthcare reviewed 67 in-house attempts at replicating findings in published research: fewer than 1/4 were viewed as replicated.
- Arrowsmith (2011, Nat. Rev. Drug Discovery 10): increasing failure rate in Phase II drug trials.
- Ioannidis (2005, PLoS Medicine): "Why Most Published Research Findings Are False."
- Simmons, Nelson, Simonsohn (2011, Psychol. Sci.): "False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant."

Many potential causes; two major ones:
- publication bias: the file-drawer problem (Rosenthal 1979)
- statistical biases: researcher degrees of freedom (SNS 2011)

Statistical Biases (one among several)

Hypothesis: A statistical bias is due to an absence of accounting for model/variable selection.

Model selection is done on several levels:
- formal selection: AIC, BIC, Lasso, ...
- informal selection: residual plots, influence diagnostics, ...
- post hoc selection: "The effect size is too small in relation to the cost of data collection to warrant inclusion of this predictor."

Suspicions:
- All three modes of model selection may be used in much empirical research.
- Ironically, the most thorough and competent data analysts may also be the ones who produce the most spurious findings.
- If we develop valid post-selection inference for the adaptive Lasso, say, it won't solve the problem, because few empirical researchers would commit themselves a priori to one formal selection method and nothing else. (The "Meta-Selection Problem.")

The Problem of Post-Selection Inference

How can variable selection invalidate conventional inference?
- Conventional inference after variable selection ignores the fact that the model was obtained through a stochastic selection process.
- Stochastic variable selection distorts the sampling distributions of the post-selection parameter estimates: most selection procedures search for strong, hence highly significant looking, predictors.

Some forms of the problem have been known for decades: Koopmans (1949); Buehler and Fedderson (1963); Brown (1967); Olshen (1973); Sen (1979); Sen and Saleh (1987); Dijkstra and Veldkamp (1988); Arabatzis et al. (1989); Hurvich and Tsai (1990); Regal and Hook (1991); Pötscher (1991); Chiou and Han (1995a,b); Giles (1992); Giles and Srivastava (1993); Kabaila (1998); Brockwell and Gordon (2001); Leeb and Pötscher (2003; 2005; 2006a; 2006b; 2008a; 2008b); Kabaila (2005); Kabaila and Leeb (2006); Berk, Brown and Zhao (2009); Kabaila (2009).

Example: Length of Criminal Sentence

Question: What covariates predict the length of a criminal sentence best?
A small empirical study: N = 250 observations.
Response: log-length of sentence.
p = 11 covariates (predictors, explanatory variables): race, gender, initial age, marital status, employment status, seriousness of crime, psychological problems, education, drug related, alcohol usage, prior record.

What variables should be included?

Example: Length of Criminal Sentence (contd.)

All-subset search with BIC chooses a model M̂ with seven variables: initial age, gender, employment status, seriousness of crime, drug related, alcohol usage, prior record.

t-statistics of the selected covariates, in descending order:
t_alcohol = 3.95; t_prior record = 3.59; t_seriousness = 3.57; t_drugs = 3.31; t_employment = 3.04; t_initial age = 2.56; t_gender = ...

Can we use the cutoff t_{0.975, 250−8} = 1.97?
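
As a quick check of that cutoff, the conventional 97.5% t-quantile with 250 − 8 = 242 degrees of freedom is one line of R (the language the talk later mentions for its own computations):

```r
qt(0.975, df = 250 - 8)   # 1.9697, i.e., the cutoff 1.97 quoted above
```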

Linear Model Inference and Variable Selection

Y = Xβ + ε,   X = fixed design matrix, N × p, N > p, full rank,   ε ~ N_N(0, σ² I_N).

In textbooks:
1. Variables selected
2. Data seen
3. Inference produced

In common practice:
1. Data seen
2. Variables selected
3. Inference produced

Is this inference valid?

Evidence from a Simulation

Generate Y from the following linear model:
Y = β x + Σ_{j=1}^{10} γ_j z_j + ε,   where p = 11, N = 250, and ε ~ N(0, I) iid (further details in the talk's appendix).
- For simplicity: protect x and select only among z_1, ..., z_10; interest is in inference for β.
- Model selection: all-subset search with BIC among z_1, ..., z_10, always including x.
- Proper coverage of a 95% CI for the slope β of x under the chosen model requires that the t-statistic be approximately N(0, 1) distributed.

Evidence from a Simulation (contd.)

[Figure: marginal distribution of the post-selection t-statistic for x: actual density vs. nominal t distribution.]

The overall coverage probability of the conventional post-selection CI is 83.5% < 95%.
For p = 30, the coverage probability can be as low as 39%.
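
The flavor of this experiment is easy to reproduce; below is a minimal R sketch, ours rather than the talk's code. The talk's exact design (the values of β and the γ_j, and the collinearity structure from its appendix) is not reproduced: here β = γ_j = 0 and the design is generic random noise, which already exhibits the selection effect, though less dramatically than a collinear design would.

```r
set.seed(1)
N <- 250; p.z <- 10
x <- rnorm(N)                            # protected predictor of interest
Z <- matrix(rnorm(N * p.z), N, p.z)      # hypothetical competing predictors z1..z10
subsets <- unlist(lapply(1:p.z, function(m) combn(p.z, m, simplify = FALSE)),
                  recursive = FALSE)
t.x <- replicate(200, {                  # 200 replications keep the demo fast
  y <- rnorm(N)                          # beta = 0 and gamma_j = 0: pure noise
  fits <- c(list(lm(y ~ x)),             # all-subset search over the z's, x always in
            lapply(subsets, function(S) lm(y ~ x + Z[, S, drop = FALSE])))
  best <- fits[[which.min(sapply(fits, BIC))]]
  summary(best)$coefficients["x", "t value"]
})
mean(abs(t.x) <= qnorm(0.975))           # naive coverage: expect somewhat below 0.95
```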

The PoSI Procedure: Rough Outline

- We propose to construct Post-Selection Inference (PoSI) with guarantees for the coverage of CIs and the Type I errors of tests.
- We widen CIs and retention intervals to achieve correct/conservative post-selection coverage probabilities. This is the price we have to pay.
- The approach is a reduction of PoSI to simultaneous inference.
- Simultaneity is across all submodels and all slopes in them.
- As a result, we obtain valid PoSI for all variable selection procedures!

But first we need some preliminaries on targets of inference and inference in wrong models.

PoSI: What's in a word?

Submodels: Notation, Parameters, Assumptions

Denote a submodel by the set of integers M = {j_1, j_2, ..., j_m} indexing its predictors:
X_M = (X_{j_1}, X_{j_2}, ..., X_{j_m}) ∈ R^{N×m}.

The LS estimator in the submodel M is
β̂_M = (X_M^T X_M)^{-1} X_M^T Y ∈ R^m.

What does β̂_M estimate, not assuming the truth of M?
Answer: its expectation, i.e., we ask for unbiasedness. With μ := E[Y] ∈ R^N arbitrary(!),
β_M := E[β̂_M] = (X_M^T X_M)^{-1} X_M^T μ.

Once again: we do not assume that the submodel is correct, i.e., we allow μ ≠ X_M β_M. But X_M β_M is the best (least-squares) approximation to μ in the column space of X_M.

Submodels versus the Full Model: Confusions

Abbreviate β̂ := β̂_{M_F} and β := β_{M_F}, where M_F = {1, 2, ..., p} is the full model.

Questions:
- How do submodel estimates β̂_M relate to full-model estimates β̂?
- How do submodel parameters β_M relate to full-model parameters β?
- Is β̂_M a subset of β̂, and β_M a subset of β?

Answer: Unless X is an orthogonal design, β̂_M is not a subset of β̂ and β_M is not a subset of β.
Reason: Slopes, both estimates and parameters, depend on what the other predictors are, in value and in meaning.
Message: β̂_M does not estimate full-model parameters! (Exception: the full model is causal or data generating... submodel estimates then suffer from "omitted variables bias.")
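
A tiny numeric illustration of the subset confusion (our toy example, not from the talk): with partly collinear predictors, the slope on x1 changes when x2 is dropped, both as an estimate and as the parameter being estimated.

```r
set.seed(2)
N  <- 10000
x1 <- rnorm(N)
x2 <- 0.8 * x1 + 0.6 * rnorm(N)     # partly collinear with x1
y  <- x1 + x2 + rnorm(N)            # full-model slopes are (1, 1)
coef(lm(y ~ x1 + x2 - 1))["x1"]     # about 1.0: the full-model parameter
coef(lm(y ~ x1 - 1))["x1"]          # about 1.8 = 1 + 0.8: a different parameter
```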

Slopes depend on...: An Illustration

Survey of potential purchasers of a new high-tech gizmo:
- Response: LoP = Likelihood of Purchase (self-reported on a Likert scale)
- Predictor 1: Age
- Predictor 2: Income

Expectation: Younger customers have higher LoP, that is, β_Age < 0.

Outcome of the analysis:
- Against expectations, a regression of LoP on Age alone indicates that older customers have higher LoP: β_Age > 0.
- But a regression of LoP on Age and Income indicates that, adjusted for Income, younger customers have higher LoP: β_{Age·Income} < 0.

Enabling factor: (partial) collinearity between Age and Income.
A case of Simpson's paradox: β_Age > 0 > β_{Age·Income}. The marginal and the Income-adjusted slope have very different values and different meanings.

Adjustment, Estimates, Parameters, t-statistics

Notation and facts for the components of β̂_M and β_M, assuming j ∈ M:

Let X_{j·M} be the predictor X_j adjusted for the other predictors in M:
X_{j·M} := (I − H_{M∖{j}}) X_j   ⊥ X_k for all k ∈ M∖{j}.

Let β̂_{j·M} be the slope estimate and β_{j·M} the parameter for X_j in M:
β̂_{j·M} := X_{j·M}^T Y / ‖X_{j·M}‖²,   β_{j·M} := X_{j·M}^T E[Y] / ‖X_{j·M}‖².

Let t_{j·M} be the t-statistic for β̂_{j·M} and β_{j·M}:
t_{j·M} := (β̂_{j·M} − β_{j·M}) / (σ̂ / ‖X_{j·M}‖) = X_{j·M}^T (Y − E[Y]) / (‖X_{j·M}‖ σ̂).
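
These definitions translate directly into code. A small R sketch (function names are ours); submodels are treated without an intercept, as in the slides:

```r
## X_{j.M}: column j of X residualized on the other columns in M,
## i.e., (I - H_{M\{j}}) X_j.
adjusted <- function(X, M, j) {
  others <- setdiff(M, j)
  if (length(others) == 0) return(X[, j])
  residuals(lm(X[, j] ~ X[, others, drop = FALSE] - 1))
}

## t_{j.M} for hypothesized value beta.jM, with one common sigma.hat for all models.
t.jM <- function(X, y, M, j, sigma.hat, beta.jM = 0) {
  xj <- adjusted(X, M, j)
  beta.hat <- sum(xj * y) / sum(xj^2)    # = X_{j.M}^T Y / ||X_{j.M}||^2
  (beta.hat - beta.jM) / (sigma.hat / sqrt(sum(xj^2)))
}
```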

Parameters: One More Time

Once more: if the predictors are partly collinear (non-orthogonal), then for M ≠ M′,
β_{j·M} ≠ β_{j·M′}   in value and in meaning.
Motto: a difference in adjustment implies a difference in parameters.

It follows that there are up to p·2^{p−1} different parameters β_{j·M}!
However, they are intrinsically p-dimensional:
β_M = (X_M^T X_M)^{-1} X_M^T X β,   where X and β are from the full model.
Hence each β_{j·M} is a linear combination of the full-model parameters β_1, ..., β_p.

Geometry of Adjustment

[Figure: column space of X for p = 2 partly collinear predictors x_1, x_2, with the adjusted predictors x_{1·2} and x_{2·1}, the angle φ between them, the origin O, and projections P_1, P_2.]

Error Estimates σ̂²

Important: To enable simultaneous inference for all t_{j·M},
- do not use the submodel error estimate σ̂²_M := ‖Y − X_M β̂_M‖² / (N − m): the selected model M may well be 1st order wrong;
- instead, for all models M use σ̂² = σ̂²_Full from the full model;
- then t_{j·M} has a t-distribution with the same dfs for all M and all j ∈ M.

What if even the full model is 1st order wrong?
Answer: σ̂²_Full will be inflated and inference will be conservative. But better estimates are available if...
- exact replicates exist: use σ̂² from the 1-way ANOVA of the replicates;
- a model larger than the full model can be assumed 1st order correct: use σ̂²_Large;
- a previous dataset provided a valid estimate: use σ̂²_previous;
- nonparametric estimates are available: use σ̂²_nonpar (Hall and Carroll 1989).

PS: In the fashionable p > N literature, what is their σ̂²?
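
In code, the recommended common estimate is just the full-model residual variance; a one-function sketch (ours):

```r
## sigma.hat^2 from the full model: the single estimate shared by every t_{j.M}.
sigma2.full <- function(X, y) {
  r <- residuals(lm(y ~ X - 1))      # full model, no intercept, as in the slides
  sum(r^2) / (nrow(X) - ncol(X))     # dfs = N - p
}
```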

Statistical Inference under First Order Incorrectness

Statistical inference, one parameter at a time: if r = dfs in σ̂² and K = t_{1−α/2, r}, then the confidence intervals
CI_{j·M}(K) := [ β̂_{j·M} ± K σ̂ / ‖X_{j·M}‖ ]
satisfy P[ β_{j·M} ∈ CI_{j·M}(K) ] = 1 − α, each one separately.

Achieved so far, assuming only Y = μ + ε with ε ~ N_N(0, σ² I):
- No assumption is made that the submodels are 1st order correct;
- Even the full model may be 1st order incorrect if a valid σ̂² is otherwise available;
- A single error estimate opens up the possibility of simultaneous inference across submodels.
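
For a given constant K the interval is then a one-liner; a sketch reusing `adjusted()` and `sigma2.full()` from the snippets above. With K = qt(1 − α/2, r) this is the one-at-a-time interval; the PoSI constant computed later plugs into the same formula.

```r
## CI_{j.M}(K) = [ beta.hat_{j.M} +/- K * sigma.hat / ||X_{j.M}|| ]
ci.jM <- function(X, y, M, j, K) {
  xj        <- adjusted(X, M, j)
  beta.hat  <- sum(xj * y) / sum(xj^2)
  sigma.hat <- sqrt(sigma2.full(X, y))
  beta.hat + c(-1, 1) * K * sigma.hat / sqrt(sum(xj^2))
}
```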

Variable Selection

What is a variable selection procedure? A map
Y ↦ M̂ = M̂(Y),   R^N → P({1, ..., p}).
- M̂ divides the response space R^N into up to 2^p subsets.
- In a fixed-predictor framework, selection purely based on X does not invalidate inference (example: deselect predictors based on VIF, H, ...).

Facing up to post-selection inference: confusers!
- Target of inference: the vector β_{M̂(Y)}, with components β_{j·M̂(Y)} for j ∈ M̂(Y).
- The target of inference is random.
- The target of inference has a random dimension: β_{M̂(Y)} ∈ R^{|M̂(Y)|}.
- Conditional on j ∈ M̂, the target component β_{j·M̂(Y)} has random meanings.
- When j ∉ M̂, both β_{j·M̂} and β̂_{j·M̂} are undefined. Hence the coverage probability P[ β_{j·M̂} ∈ CI_{j·M̂}(K) ] is undefined.

Universal Post-Selection Inference

Candidates for meaningful coverage probabilities:
- P[ j ∈ M̂ & β_{j·M̂} ∈ CI_{j·M̂}(K) ]   (≤ P[ j ∈ M̂ ])
- P[ β_{j·M̂} ∈ CI_{j·M̂}(K) | j ∈ M̂ ]   (but P[ j ∈ M̂ ] = ???)
- P[ ∀ j ∈ M̂ : β_{j·M̂} ∈ CI_{j·M̂}(K) ]
All are meaningful; the last will be our choice.

Overcoming the next difficulty:
Problem: None of the above coverage probabilities are known or can be estimated for most selection procedures M̂.
Solution: Ask for more! Universal post-selection inference, valid for all selection procedures, is doable.

Reduction to Simultaneous Inference

Lemma (a "significant triviality bound"): For any variable selection procedure M̂ = M̂(Y),
max_{j ∈ M̂} |t_{j·M̂}| ≤ max_M max_{j ∈ M} |t_{j·M}|   for all Y and all μ ∈ R^N.

Theorem: Let K be the 1 − α quantile of the max-max-|t| statistic of the lemma:
P[ max_M max_{j ∈ M} |t_{j·M}| ≤ K ] = 1 − α.
Then we have the following universal PoSI guarantee:
P[ β_{j·M̂} ∈ CI_{j·M̂}(K) ∀ j ∈ M̂ ] ≥ 1 − α   for all M̂.

PoSI Geometry: Simultaneity

[Figure: as before, the column space of X for p = 2 partly collinear predictors x_1, x_2 with adjusted predictors x_{1·2}, x_{2·1}; the PoSI polytope is the intersection of all t-bands.]

How Conservative is PoSI?

Is there a model selection procedure that requires the full PoSI protection? Consider M̂ defined as follows:
M̂ := argmax_M max_{j ∈ M} |t_{j·M}|
- A polite name: Single Predictor Adjusted Regression (SPAR).
- A crude name: Significance Hunting, a special case of p-hacking (Simmons, Nelson, Simonsohn 2011).
SPAR requires the full PoSI protection by construction!

How realistic is SPAR in describing real data analysts' behaviors?
- It ignores the goodness of fit of the selected model.
- It looks for the minimal achievable p-value / strongest effect.
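
For concreteness, a few-line sketch of SPAR (ours, reusing `t.jM()` and `sigma2.full()` from the earlier snippets):

```r
## SPAR / significance hunting: return the submodel maximizing max_{j in M} |t_{j.M}|.
spar <- function(X, y) {
  p <- ncol(X)
  sigma.hat <- sqrt(sigma2.full(X, y))
  models <- unlist(lapply(1:p, function(m) combn(p, m, simplify = FALSE)),
                   recursive = FALSE)
  score <- sapply(models, function(M)
    max(sapply(M, function(j) abs(t.jM(X, y, M, j, sigma.hat)))))
  models[[which.max(score)]]
}
```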

Computing PoSI

The simultaneity challenge: there are p·2^{p−1} statistics t_{j·M}:

p           1   2   3    4    5    6     7     8      9      10
# t_{j·M}   1   4   12   32   80   192   448   1,024  2,304  5,120

p           11      12      13      14       15       16       17         18         19         20
# t_{j·M}   11,264  24,576  53,248  114,688  245,760  524,288  1,114,112  2,359,296  4,980,736  10,485,760

- Monte Carlo approximation of K_PoSI in R, brute force, for p ≤ 20 (see the sketch below).
- Computations are specific to a design X: K_PoSI = K_PoSI(X, α, df).
- Computations depend only on the inner product matrix X^T X.
- The limiting factor is p (N may only matter for σ̂²).
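
A brute-force Monte Carlo sketch of the computation (ours, not the authors' implementation or the later PoSI R package). Since the t_{j·M} are pivotal, one can simulate with μ = 0 and σ = 1, take the max-max-|t| over all submodels per draw, and read off the 1 − α quantile:

```r
K.posi <- function(X, alpha = 0.05, nsim = 2000) {
  N <- nrow(X); p <- ncol(X)
  models <- unlist(lapply(1:p, function(m) combn(p, m, simplify = FALSE)),
                   recursive = FALSE)
  ## Precompute unit-norm adjusted predictors: |t_{j.M}| = |u_{j.M}^T Y| / sigma.hat.
  U <- lapply(models, function(M) sapply(M, function(j) {
    others <- setdiff(M, j)
    xj <- if (length(others) == 0) X[, j] else
          residuals(lm(X[, j] ~ X[, others, drop = FALSE] - 1))
    xj / sqrt(sum(xj^2))
  }))
  maxes <- replicate(nsim, {
    y <- rnorm(N)                          # mu = 0, sigma = 1 w.l.o.g. (pivotality)
    sigma.hat <- sqrt(sum(residuals(lm(y ~ X - 1))^2) / (N - p))
    max(vapply(U, function(u) max(abs(crossprod(u, y))), 0)) / sigma.hat
  })
  unname(quantile(maxes, 1 - alpha))
}
## Example: K.posi(matrix(rnorm(250 * 6), 250, 6))   # feasible for small p
```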


More information

UNIVERSITÄT POTSDAM Institut für Mathematik

UNIVERSITÄT POTSDAM Institut für Mathematik UNIVERSITÄT POTSDAM Institut für Mathematik Testing the Acceleration Function in Life Time Models Hannelore Liero Matthias Liero Mathematische Statistik und Wahrscheinlichkeitstheorie Universität Potsdam

More information

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score Causal Inference with General Treatment Regimes: Generalizing the Propensity Score David van Dyk Department of Statistics, University of California, Irvine vandyk@stat.harvard.edu Joint work with Kosuke

More information

A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models

A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models Jingyi Jessica Li Department of Statistics University of California, Los

More information

School of Mathematical Sciences. Question 1. Best Subsets Regression

School of Mathematical Sciences. Question 1. Best Subsets Regression School of Mathematical Sciences MTH5120 Statistical Modelling I Practical 9 and Assignment 8 Solutions Question 1 Best Subsets Regression Response is Crime I n W c e I P a n A E P U U l e Mallows g E P

More information

y response variable x 1, x 2,, x k -- a set of explanatory variables

y response variable x 1, x 2,, x k -- a set of explanatory variables 11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate

More information

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson Bayesian variable selection via penalized credible regions Brian Reich, NC State Joint work with Howard Bondell and Ander Wilson Brian Reich, NCSU Penalized credible regions 1 Motivation big p, small n

More information

STAT 3A03 Applied Regression With SAS Fall 2017

STAT 3A03 Applied Regression With SAS Fall 2017 STAT 3A03 Applied Regression With SAS Fall 2017 Assignment 2 Solution Set Q. 1 I will add subscripts relating to the question part to the parameters and their estimates as well as the errors and residuals.

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010 1 Linear models Y = Xβ + ɛ with ɛ N (0, σ 2 e) or Y N (Xβ, σ 2 e) where the model matrix X contains the information on predictors and β includes all coefficients (intercept, slope(s) etc.). 1. Number of

More information

Confidence Intervals for Low-dimensional Parameters with High-dimensional Data

Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Cun-Hui Zhang and Stephanie S. Zhang Rutgers University and Columbia University September 14, 2012 Outline Introduction Methodology

More information

Impact of serial correlation structures on random effect misspecification with the linear mixed model.

Impact of serial correlation structures on random effect misspecification with the linear mixed model. Impact of serial correlation structures on random effect misspecification with the linear mixed model. Brandon LeBeau University of Iowa file:///c:/users/bleb/onedrive%20 %20University%20of%20Iowa%201/JournalArticlesInProgress/Diss/Study2/Pres/pres.html#(2)

More information

Consistent high-dimensional Bayesian variable selection via penalized credible regions

Consistent high-dimensional Bayesian variable selection via penalized credible regions Consistent high-dimensional Bayesian variable selection via penalized credible regions Howard Bondell bondell@stat.ncsu.edu Joint work with Brian Reich Howard Bondell p. 1 Outline High-Dimensional Variable

More information

This model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that

This model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that Linear Regression For (X, Y ) a pair of random variables with values in R p R we assume that E(Y X) = β 0 + with β R p+1. p X j β j = (1, X T )β j=1 This model of the conditional expectation is linear

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Supervised Learning: Regression I Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com Some of the

More information

Chapter 1. Linear Regression with One Predictor Variable

Chapter 1. Linear Regression with One Predictor Variable Chapter 1. Linear Regression with One Predictor Variable 1.1 Statistical Relation Between Two Variables To motivate statistical relationships, let us consider a mathematical relation between two mathematical

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information

The Simple Linear Regression Model

The Simple Linear Regression Model The Simple Linear Regression Model Lesson 3 Ryan Safner 1 1 Department of Economics Hood College ECON 480 - Econometrics Fall 2017 Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 1 / 77 Bivariate

More information

Introductory Econometrics

Introductory Econometrics Based on the textbook by Wooldridge: : A Modern Approach Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna November 23, 2013 Outline Introduction

More information

Econ 510 B. Brown Spring 2014 Final Exam Answers

Econ 510 B. Brown Spring 2014 Final Exam Answers Econ 510 B. Brown Spring 2014 Final Exam Answers Answer five of the following questions. You must answer question 7. The question are weighted equally. You have 2.5 hours. You may use a calculator. Brevity

More information

Chapter 3 Multiple Regression Complete Example

Chapter 3 Multiple Regression Complete Example Department of Quantitative Methods & Information Systems ECON 504 Chapter 3 Multiple Regression Complete Example Spring 2013 Dr. Mohammad Zainal Review Goals After completing this lecture, you should be

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Regression Diagnostics for Survey Data

Regression Diagnostics for Survey Data Regression Diagnostics for Survey Data Richard Valliant Joint Program in Survey Methodology, University of Maryland and University of Michigan USA Jianzhu Li (Westat), Dan Liao (JPSM) 1 Introduction Topics

More information

The general linear regression with k explanatory variables is just an extension of the simple regression as follows

The general linear regression with k explanatory variables is just an extension of the simple regression as follows 3. Multiple Regression Analysis The general linear regression with k explanatory variables is just an extension of the simple regression as follows (1) y i = β 0 + β 1 x i1 + + β k x ik + u i. Because

More information

Multicollinearity and A Ridge Parameter Estimation Approach

Multicollinearity and A Ridge Parameter Estimation Approach Journal of Modern Applied Statistical Methods Volume 15 Issue Article 5 11-1-016 Multicollinearity and A Ridge Parameter Estimation Approach Ghadban Khalaf King Khalid University, albadran50@yahoo.com

More information

Inference For High Dimensional M-estimates: Fixed Design Results

Inference For High Dimensional M-estimates: Fixed Design Results Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49

More information

A discussion on multiple regression models

A discussion on multiple regression models A discussion on multiple regression models In our previous discussion of simple linear regression, we focused on a model in which one independent or explanatory variable X was used to predict the value

More information

Business Statistics. Lecture 10: Correlation and Linear Regression

Business Statistics. Lecture 10: Correlation and Linear Regression Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form

More information

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Xiuming Zhang zhangxiuming@u.nus.edu A*STAR-NUS Clinical Imaging Research Center October, 015 Summary This report derives

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

Data Mining Stat 588

Data Mining Stat 588 Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

Lecture 18: Simple Linear Regression

Lecture 18: Simple Linear Regression Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength

More information

Specification Errors, Measurement Errors, Confounding

Specification Errors, Measurement Errors, Confounding Specification Errors, Measurement Errors, Confounding Kerby Shedden Department of Statistics, University of Michigan October 10, 2018 1 / 32 An unobserved covariate Suppose we have a data generating model

More information

STAT 540: Data Analysis and Regression

STAT 540: Data Analysis and Regression STAT 540: Data Analysis and Regression Wen Zhou http://www.stat.colostate.edu/~riczw/ Email: riczw@stat.colostate.edu Department of Statistics Colorado State University Fall 205 W. Zhou (Colorado State

More information

A Non-parametric bootstrap for multilevel models

A Non-parametric bootstrap for multilevel models A Non-parametric bootstrap for multilevel models By James Carpenter London School of Hygiene and ropical Medicine Harvey Goldstein and Jon asbash Institute of Education 1. Introduction Bootstrapping is

More information

Lecture 17 May 11, 2018

Lecture 17 May 11, 2018 Stats 300C: Theory of Statistics Spring 2018 Lecture 17 May 11, 2018 Prof. Emmanuel Candes Scribe: Emmanuel Candes and Zhimei Ren) 1 Outline Agenda: Topics in selective inference 1. Inference After Model

More information

Bayesian regression tree models for causal inference: regularization, confounding and heterogeneity

Bayesian regression tree models for causal inference: regularization, confounding and heterogeneity Bayesian regression tree models for causal inference: regularization, confounding and heterogeneity P. Richard Hahn, Jared Murray, and Carlos Carvalho June 22, 2017 The problem setting We want to estimate

More information

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata'

Business Statistics. Tommaso Proietti. Linear Regression. DEF - Università di Roma 'Tor Vergata' Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Linear Regression Specication Let Y be a univariate quantitative response variable. We model Y as follows: Y = f(x) + ε where

More information

Statistical Inference

Statistical Inference Statistical Inference Liu Yang Florida State University October 27, 2016 Liu Yang, Libo Wang (Florida State University) Statistical Inference October 27, 2016 1 / 27 Outline The Bayesian Lasso Trevor Park

More information

Correlation and Regression Bangkok, 14-18, Sept. 2015

Correlation and Regression Bangkok, 14-18, Sept. 2015 Analysing and Understanding Learning Assessment for Evidence-based Policy Making Correlation and Regression Bangkok, 14-18, Sept. 2015 Australian Council for Educational Research Correlation The strength

More information

Lecture 6 Multiple Linear Regression, cont.

Lecture 6 Multiple Linear Regression, cont. Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression

More information

ACE 564 Spring Lecture 8. Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information. by Professor Scott H.

ACE 564 Spring Lecture 8. Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information. by Professor Scott H. ACE 564 Spring 2006 Lecture 8 Violations of Basic Assumptions I: Multicollinearity and Non-Sample Information by Professor Scott H. Irwin Readings: Griffiths, Hill and Judge. "Collinear Economic Variables,

More information

Lecture 4 Multiple linear regression

Lecture 4 Multiple linear regression Lecture 4 Multiple linear regression BIOST 515 January 15, 2004 Outline 1 Motivation for the multiple regression model Multiple regression in matrix notation Least squares estimation of model parameters

More information

Simple and Multiple Linear Regression

Simple and Multiple Linear Regression Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where

More information

Categorical Predictor Variables

Categorical Predictor Variables Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively

More information

Introduction to Statistical modeling: handout for Math 489/583

Introduction to Statistical modeling: handout for Math 489/583 Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect

More information

Machine Learning Linear Regression. Prof. Matteo Matteucci

Machine Learning Linear Regression. Prof. Matteo Matteucci Machine Learning Linear Regression Prof. Matteo Matteucci Outline 2 o Simple Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression o Multi Variate Regession Model Least Squares

More information

Knockoffs as Post-Selection Inference

Knockoffs as Post-Selection Inference Knockoffs as Post-Selection Inference Lucas Janson Harvard University Department of Statistics blank line blank line WHOA-PSI, August 12, 2017 Controlled Variable Selection Conditional modeling setup:

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Matrix Data: Prediction Instructor: Yizhou Sun yzsun@ccs.neu.edu September 14, 2014 Today s Schedule Course Project Introduction Linear Regression Model Decision Tree 2 Methods

More information

Unpacking the Black-Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies

Unpacking the Black-Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies Unpacking the Black-Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies Kosuke Imai Princeton University January 23, 2012 Joint work with L. Keele (Penn State)

More information

Selective Inference for Effect Modification: An Empirical Investigation

Selective Inference for Effect Modification: An Empirical Investigation Observational Studies () Submitted ; Published Selective Inference for Effect Modification: An Empirical Investigation Qingyuan Zhao Department of Statistics The Wharton School, University of Pennsylvania

More information

Inference After Variable Selection

Inference After Variable Selection Department of Mathematics, SIU Carbondale Inference After Variable Selection Lasanthi Pelawa Watagoda lasanthi@siu.edu June 12, 2017 Outline 1 Introduction 2 Inference For Ridge and Lasso 3 Variable Selection

More information

(Where does Ch. 7 on comparing 2 means or 2 proportions fit into this?)

(Where does Ch. 7 on comparing 2 means or 2 proportions fit into this?) 12. Comparing Groups: Analysis of Variance (ANOVA) Methods Response y Explanatory x var s Method Categorical Categorical Contingency tables (Ch. 8) (chi-squared, etc.) Quantitative Quantitative Regression

More information

Variable Selection and Model Building

Variable Selection and Model Building LINEAR REGRESSION ANALYSIS MODULE XIII Lecture - 37 Variable Selection and Model Building Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur The complete regression

More information

Unpacking the Black-Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies

Unpacking the Black-Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies Unpacking the Black-Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies Kosuke Imai Princeton University February 23, 2012 Joint work with L. Keele (Penn State)

More information

Model comparison and selection

Model comparison and selection BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)

More information

ECNS 561 Multiple Regression Analysis

ECNS 561 Multiple Regression Analysis ECNS 561 Multiple Regression Analysis Model with Two Independent Variables Consider the following model Crime i = β 0 + β 1 Educ i + β 2 [what else would we like to control for?] + ε i Here, we are taking

More information

Multiple linear regression S6

Multiple linear regression S6 Basic medical statistics for clinical and experimental research Multiple linear regression S6 Katarzyna Jóźwiak k.jozwiak@nki.nl November 15, 2017 1/42 Introduction Two main motivations for doing multiple

More information

Final Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58

Final Review. Yang Feng.   Yang Feng (Columbia University) Final Review 1 / 58 Final Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Final Review 1 / 58 Outline 1 Multiple Linear Regression (Estimation, Inference) 2 Special Topics for Multiple

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

Exact Post Model Selection Inference for Marginal Screening

Exact Post Model Selection Inference for Marginal Screening Exact Post Model Selection Inference for Marginal Screening Jason D. Lee Computational and Mathematical Engineering Stanford University Stanford, CA 94305 jdl17@stanford.edu Jonathan E. Taylor Department

More information

Regression diagnostics

Regression diagnostics Regression diagnostics Kerby Shedden Department of Statistics, University of Michigan November 5, 018 1 / 6 Motivation When working with a linear model with design matrix X, the conventional linear model

More information

Econometrics in a nutshell: Variation and Identification Linear Regression Model in STATA. Research Methods. Carlos Noton.

Econometrics in a nutshell: Variation and Identification Linear Regression Model in STATA. Research Methods. Carlos Noton. 1/17 Research Methods Carlos Noton Term 2-2012 Outline 2/17 1 Econometrics in a nutshell: Variation and Identification 2 Main Assumptions 3/17 Dependent variable or outcome Y is the result of two forces:

More information

Linear Regression. Junhui Qian. October 27, 2014

Linear Regression. Junhui Qian. October 27, 2014 Linear Regression Junhui Qian October 27, 2014 Outline The Model Estimation Ordinary Least Square Method of Moments Maximum Likelihood Estimation Properties of OLS Estimator Unbiasedness Consistency Efficiency

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

Hypothesis testing Goodness of fit Multicollinearity Prediction. Applied Statistics. Lecturer: Serena Arima

Hypothesis testing Goodness of fit Multicollinearity Prediction. Applied Statistics. Lecturer: Serena Arima Applied Statistics Lecturer: Serena Arima Hypothesis testing for the linear model Under the Gauss-Markov assumptions and the normality of the error terms, we saw that β N(β, σ 2 (X X ) 1 ) and hence s

More information

Regression Discontinuity Designs

Regression Discontinuity Designs Regression Discontinuity Designs Kosuke Imai Harvard University STAT186/GOV2002 CAUSAL INFERENCE Fall 2018 Kosuke Imai (Harvard) Regression Discontinuity Design Stat186/Gov2002 Fall 2018 1 / 1 Observational

More information

Quantitative Methods I: Regression diagnostics

Quantitative Methods I: Regression diagnostics Quantitative Methods I: Regression University College Dublin 10 December 2014 1 Assumptions and errors 2 3 4 Outline Assumptions and errors 1 Assumptions and errors 2 3 4 Assumptions: specification Linear

More information

Empirical Economic Research, Part II

Empirical Economic Research, Part II Based on the text book by Ramanathan: Introductory Econometrics Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna December 7, 2011 Outline Introduction

More information