Inference Conditional on Model Selection with a Focus on Procedures Characterized by Quadratic Inequalities

1 Inference Conditional on Model Selection with a Focus on Procedures Characterized by Quadratic Inequalities Joshua R. Loftus

2 Outline: 1. Intro and background; 2. Framework: quadratic model selection events; 3. Implementation: forward stepwise with groups; 4. Conclusion

3 Intro and background: First the results (forward stepwise with groups). Simulation setup: $X$ with entries $x_{ij}$ i.i.d. $N(0, 1)$. Predictors are grouped a priori into groups of size 2: the first and second columns, the third and fourth, etc. $Y = X\beta + \epsilon$ with $\epsilon_i$ i.i.d. $N(0, 1)$; $\beta_1 = \cdots = \beta_4 = 2$ and $\beta_i = 0$ for $i > 4$, i.e. the group sparsity is 2. Normalize $X$ so that each group has Frobenius norm 1. Run forward stepwise for 4 steps; each step adds a group (two columns) to the model. Compute our $T\chi$ statistic and apply the appropriate (under the null) CDF transform. Repeat for 500 realizations.
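
The setup above can be sketched in a few lines of plain Python. This is only an illustration: the slide does not give the sample size or the number of groups, so `n` and `n_groups` are assumed values, and the choice to normalize `X` before forming `y` is also an assumption. The function name `simulate` is hypothetical.

```python
import math
import random


def simulate(n=50, n_groups=10, group_size=2, n_signal_cols=4, signal=2.0, seed=0):
    """Draw one (X, y, groups, beta) realization of the grouped-design simulation.

    n and n_groups are assumed values; the slide does not state them.
    """
    rng = random.Random(seed)
    p = n_groups * group_size
    # X has i.i.d. N(0, 1) entries, stored as a list of n rows.
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    # Groups are consecutive blocks of columns: (0, 1), (2, 3), ...
    groups = [list(range(g * group_size, (g + 1) * group_size))
              for g in range(n_groups)]
    # Rescale each group's block of columns to Frobenius norm 1.
    # Assumption: normalization happens before y is generated.
    for cols in groups:
        frob = math.sqrt(sum(X[i][j] ** 2 for i in range(n) for j in cols))
        for i in range(n):
            for j in cols:
                X[i][j] /= frob
    # The first n_signal_cols coefficients equal `signal`, the rest are zero.
    beta = [signal if j < n_signal_cols else 0.0 for j in range(p)]
    # y = X beta + eps with eps_i i.i.d. N(0, 1).
    y = [sum(X[i][j] * beta[j] for j in range(p)) + rng.gauss(0.0, 1.0)
         for i in range(n)]
    return X, y, groups, beta
```

With `group_size=2` and `n_signal_cols=4`, the first two groups carry signal, matching a group sparsity of 2.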

4 Intro and background: [Figure: ECDF of p-values in forward stepwise, one curve per step; x-axis: p-value.]

5 Intro and background: Null vs. non-null added variable. [Figure: p-values split by whether the null holds (FALSE / TRUE); x-axis: p-value.]

6 Intro and background: How did we adjust the classical $\chi^2$ test? Even if the sparsity were 0 (global null), the classical test at the first 4 steps would give p-values corresponding to the largest 4 out of 100 $\chi^2$ statistics. In the same situation our $T\chi$ p-values would be uniform (correct). Let's review the backstory (briefly).
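
To see why the unadjusted test fails, here is a small sketch of the first step under the global null. It is an idealization, not the slide's simulation: the 100 group statistics are assumed to be independent $\chi^2$ variables with 2 degrees of freedom (as they would be for orthogonal groups of size 2 with $\sigma = 1$), for which the survival function is $e^{-x/2}$. The function name is hypothetical.

```python
import math
import random


def naive_first_step_pvalue(n_groups, rng):
    """Classical chi-squared(2) p-value for the largest of n_groups null statistics."""
    # A chi-squared variable with 2 degrees of freedom is a sum of two squared N(0, 1).
    stats = [rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2
             for _ in range(n_groups)]
    # Survival function of chi-squared(2): P(X > x) = exp(-x / 2).
    return math.exp(-max(stats) / 2.0)


rng = random.Random(1)
pvals = [naive_first_step_pvalue(100, rng) for _ in range(500)]
# Fraction of naive p-values below 0.05; a valid test would give about 0.05.
frac_below_005 = sum(p < 0.05 for p in pvals) / len(pvals)
```

Under these assumptions the probability that the naive p-value falls below 0.05 is $1 - 0.95^{100} \approx 0.994$, so the naive p-values pile up near zero instead of being uniform.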

7 Intro and background: Inference with LASSO model selection. The covariance test: a test with 1 degree of freedom, corresponding to 1 variable entering the model. Knots in the LASSO path are random variables containing useful information. Issue: asymptotics break down for groups of variables. Pioneering work, a new approach to combining inference with model selection. Lockhart, R.; Taylor, J.; Tibshirani, R. J.; Tibshirani, R. A significance test for the lasso. Ann. Statist. 42 (2014).

8 Intro and background: Applying geometry of random fields. Enter geometry: the Kac-Rice test. Extends the covariance test to the group lasso. Successful for the first knot; led to some new directions. Non-asymptotic, with truncated distributions instead of Exp(1). Generalized to a global null hypothesis test for a large class of penalized regression problems (including, e.g., matrix completion). Power not well understood (Azaïs et al., 2015). Taylor, J.; Loftus, J.; Tibshirani, R. J. Tests in adaptive regression via the Kac-Rice formula. arXiv preprint (2014).

9 Intro and background: Affine model selection events. Inference conditional on selection. Focus on fixed-$\lambda$ LASSO (non-sequential). General framework for model selection events characterized by affine inequalities. Inference conditional on the selected model. Rigorous, interpretable hypotheses. Lee, J.; Sun, D.; Sun, Y.; Taylor, J. Exact post-selection inference, with application to the lasso. arXiv preprint (2015).

10 Intro and background: Selective inference and optimality. The exponential family context. UMPU tests by applying Lehmann-Scheffé. Define the selective type I error, the error criterion we want to control: $P_{M,H_0}(\text{reject } H_0 \mid (M, H_0) \text{ selected})$. Language and notation: model-hypothesis pairs. Fithian, W.; Sun, D.; Taylor, J. Optimal Inference After Model Selection. arXiv preprint (2014).

11 Intro and background: The goal of this work. Our choice is to control the selective type I error. Previous work established how to do this with affine model selection events. Can we extend it to some non-affine examples, e.g. quadratic?

12 Framework: quadratic model selection events: Background: the affine framework. Let $M : \mathbb{R}^n \to \mathcal{M}$ be a model selection map, with $\mathcal{M}$ denoting a space of potential models. We observe $\{M(y) = m\}$ and wish to condition on this event. For many model selection procedures (forward stepwise, LAR, LASSO, elastic net, marginal screening, ...) the event $\{M(y) = m\}$ can be written as $\{A(m) y \le b(m)\}$ for some $(A, b)$ which depend on $y$ only through $m$. What does this give us? $\underbrace{\mathcal{L}(y \mid M(y) = m)}_{\text{what we want}} = \underbrace{\mathcal{L}(y \mid A(m) y \le b(m))}_{\text{simple geometry}}$ on $\{M(y) = m\}$: an MVN constrained to a polytope.
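
As an illustration of what the polytope geometry buys, the sketch below computes the interval of values that a linear contrast $\eta^T y$ can take while staying inside $\{A y \le b\}$, holding the part of $y$ orthogonal to $\eta$ fixed (the polyhedral-lemma computation from the affine framework of Lee et al.). It assumes isotropic covariance, and the function name and plain-list data layout are hypothetical.

```python
import math


def truncation_interval(A, b, y, eta):
    """Range [lo, hi] of v = eta^T y compatible with A y <= b, with z = y - c v fixed.

    Assumes Cov(y) is proportional to the identity, so c = eta / ||eta||^2.
    """
    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    v_obs = dot(eta, y)
    eta_norm2 = dot(eta, eta)
    c = [e / eta_norm2 for e in eta]
    # z is the part of y that does not depend on eta^T y.
    z = [yi - ci * v_obs for yi, ci in zip(y, c)]
    lo, hi = -math.inf, math.inf
    for row, bj in zip(A, b):
        ac = dot(row, c)
        resid = bj - dot(row, z)
        # Each constraint reads ac * v <= resid.
        if ac > 0:
            hi = min(hi, resid / ac)
        elif ac < 0:
            lo = max(lo, resid / ac)
        # If ac == 0 the constraint does not involve v at all.
    return lo, hi, v_obs
```

For example, with the event $\{y_1 \ge |y_2|\}$ written as two rows of $A$ and $\eta = e_1$, the interval is $[|y_2|, \infty)$, and $\eta^T y$ conditional on selection is a Gaussian truncated to that interval.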

13 Framework: quadratic model selection events: Forward stepwise example, first step. A forward stepwise model after $s$ steps will have the form $m_s = (A_s, u_s)$, with $A_s$ an ordered set of active variables and $u_s$ their signs (included for computational purposes). Denoting $x^1_j = X_j / \|X_j\|$, at the first step $u_1 (x^1_{j_1})^T y \ge \pm (x^1_j)^T y$ for all $j \ne j_1$. So $A(m_1)$ has $2(p - 1)$ rows of the form $-u_1 (x^1_{j_1})^T \pm (x^1_j)^T$ and $b(m_1) = 0$. Taylor, J.; Lockhart, R.; Tibshirani, R. J.; Tibshirani, R. Exact Post-selection Inference for Forward Stepwise and Least Angle Regression. arXiv preprint (2014).
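
A minimal sketch of the first-step construction, assuming the sign convention $A(m_1) y \le 0$ (so each row is $\pm x^1_j$ minus $u_1 x^1_{j_1}$). The function name and the list-of-rows layout for `X` are hypothetical choices for illustration.

```python
import math


def first_step_constraints(X, y):
    """Return (j1, u1, A) for the first forward stepwise step.

    X is a list of n rows with p entries each. A has 2 * (p - 1) rows and
    satisfies A y <= 0 (assumed convention) exactly when j1 wins with sign u1.
    """
    n, p = len(X), len(X[0])
    # Unit-norm columns x_j = X_j / ||X_j||.
    cols = []
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        norm = math.sqrt(sum(v * v for v in col))
        cols.append([v / norm for v in col])
    corr = [sum(c * yi for c, yi in zip(col, y)) for col in cols]
    # The selected variable has the largest absolute correlation with y.
    j1 = max(range(p), key=lambda j: abs(corr[j]))
    u1 = 1.0 if corr[j1] >= 0 else -1.0
    A = []
    for j in range(p):
        if j == j1:
            continue
        for sign in (1.0, -1.0):
            # Row encodes: sign * x_j^T y - u1 * x_{j1}^T y <= 0.
            A.append([sign * cols[j][i] - u1 * cols[j1][i] for i in range(n)])
    return j1, u1, A
```

Every row of the returned `A` has a non-positive inner product with the observed `y`, which is a quick sanity check on the construction.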

14 Framework: quadratic model selection events: Forward stepwise example, later steps. Denoting $x^s_j = P_{A_{s-1}} X_j / \|P_{A_{s-1}} X_j\|$, with $P_{A_{s-1}}$ the projection orthogonal to $X_{A_{s-1}}$, we have at the $s$th step $(u_s x^s_{j_s} \pm x^s_j)^T y \ge 0$ for all $j \notin A_s$. Form $A(m_s)$ by appending the corresponding $2(p - s)$ rows to $A(m_{s-1})$, so $\{M_s(y) = m_s\} = \{z : A(m_s) z \le 0\}$. Inference is then based on the conditional law, if we can sample from the constrained MVN.

15 Framework: quadratic model selection events: Quadratic events: forward stepwise with groups. With predetermined groups of variables, FS can add entire groups in each step. At the first step, $X_{j_1}$ is a submatrix with $\ge 1$ columns. Now the event that $j_1$ is the first group is equivalent to $\|(I - X_{j_1} X_{j_1}^\dagger) y\|_2^2 \le \|(I - X_j X_j^\dagger) y\|_2^2$ for all $j \ne j_1$, or $y^T [X_{j_1} X_{j_1}^\dagger - X_j X_j^\dagger] y \ge 0$ for all $j \ne j_1$. Problem: model selection is no longer linear in $y$. (Drop signs from the definition of the model.) Loftus, J.; Taylor, J. A significance test for forward stepwise model selection. arXiv preprint (2014). Acknowledgement: Léonard Blier.

16 Framework: quadratic model selection events: Is this problem important? Interpretation with factor models and categorical variables, e.g. inference about a location on the genome vs. inference about one particular variant. Hierarchical models via overlapping groups, e.g. glinternet: $X = [\, X_1 \;\; X_2 \;\; X_1 \;\; X_2 \;\; X_{1:2} \,]$. Mathematically interesting: proofs can involve non-trivial applications of random field theory.

17 Framework: quadratic model selection events: Quadratic framework. Abstractly, we want to consider problems where $\{M(y) = m\} = \{y^T Q(m) y + A(m) y \le b(m)\}$. Issues to address case by case: getting a model selection procedure into this form (if possible); which things to condition on (power vs. computation); form of the test statistic and interpretation of hypotheses; complicated geometry of the selection region (how to sample?).

18 Implementation: forward stepwise with groups: Forward stepwise with groups, step 1. Before we introduce randomness, imagine we just have some fixed ordering of the groups, with $j_1$ the first in this order. Denote $P^1_j = X_j X_j^\dagger$. Define the event that $j_1$ is the first group as $E_{j_1} = \{z \in \mathbb{R}^n : z^T [P^1_{j_1} - P^1_j] z \ge k \operatorname{Tr}(P^1_{j_1} - P^1_j) \ \forall j \ne j_1\}$. Using AIC or BIC to compare groups of different size determines a multiplier $k$ of the penalty on model size.
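
The step-1 event can be checked numerically by rewriting each inequality as a comparison of penalized fits, $z^T P^1_j z - k \operatorname{Tr}(P^1_j)$. The sketch below avoids forming $P^1_j$ explicitly by using a Gram-Schmidt basis for each group's column space, so that $z^T P z$ is a sum of squared coefficients and $\operatorname{Tr}(P)$ is the basis size. Function names are hypothetical, and the default `k=2.0` is an assumed AIC-like value for $\sigma = 1$, not something the slides fix.

```python
import math


def orthonormal_basis(columns, tol=1e-10):
    """Gram-Schmidt basis for the span of the given column vectors."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            coef = sum(a * b for a, b in zip(q, v))
            v = [a - coef * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > tol:
            basis.append([a / norm for a in v])
    return basis


def penalized_fit(columns, z, k):
    """z^T P z - k * Tr(P), with P the projection onto span(columns)."""
    basis = orthonormal_basis(columns)
    quad = sum(sum(a * b for a, b in zip(q, z)) ** 2 for q in basis)
    return quad - k * len(basis)


def first_group(groups, z, k=2.0):
    """Selected group index j1 and the slack of each defining inequality.

    groups[j] is a list of column vectors. Each slack equals
    z^T [P_j1 - P_j] z - k Tr(P_j1 - P_j), which is >= 0 on the event E_j1.
    """
    scores = [penalized_fit(cols, z, k) for cols in groups]
    j1 = max(range(len(groups)), key=lambda j: scores[j])
    slacks = [scores[j1] - scores[j] for j in range(len(groups)) if j != j1]
    return j1, slacks
```

All returned slacks are non-negative by construction; a point `z` lies in $E_{j_1}$ for a different, fixed $j_1$ exactly when the corresponding slacks computed against that group are non-negative.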

19 Implementation: forward stepwise with groups: Forward stepwise with groups, step $s$. With $P_{A_{s-1}}$ the projection orthogonal to $X_{A_{s-1}}$, denote $X^s_j = P_{A_{s-1}} X_j$ and $P^s_j = X^s_j (X^s_j)^\dagger$. Now at step $s$, if $j_s$ is the next group we must append the conditions $E_{j_s \mid A_{s-1}} = \{z \in \mathbb{R}^n : z^T [P^s_{j_s} - P^s_j] z \ge k \operatorname{Tr}(P^s_{j_s} - P^s_j) \ \forall j \notin A_s\}$. Intersection of $p - s$ quadratic inequalities.

20 Implementation: forward stepwise with groups: Model selection = intersection of quadratic inequalities. Consider the event $E$ that a random $y$ yields the active set $A_s = \{j_1, \dots, j_s\}$. Denote $A_0 = \emptyset$, and $A_l = \{j_1, \dots, j_l\}$ the initial segment with the same order for any $l \le s$. Characterization of model selection: $E := \{M(y) = A_s\} = \bigcap_{l=1}^{s} E_{j_l \mid A_{l-1}}$. The selection event is the intersection of a list of quadratic inequalities. It depends on $y$ only through $A_s$.

21 Implementation: forward stepwise with groups: Inference conditional on selection. $E$ is the support of $\mathcal{L}(y \mid M(y) = A_s)$. Challenging to sample from. Closed form results for statistics supported on one-dimensional slices through $E$. Test based on $S = \|Py\|_2$ for some projection $P$ (think of $\chi$ statistics). Let $U = Py/S$ and $Z = y - Py$. Truncation interval: $M_S = \{t \ge 0 : M(Ut + Z) = A_s\}$. Quadratic model selection $\Rightarrow$ univariate quadratics in $t$. $M_S$ is the support of $\mathcal{L}(S \mid A_s, Z)$.
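
Along the slice $y(t) = Ut + Z$, an inequality $z^T Q z \ge c$ becomes $a t^2 + b t + c_0 \ge 0$ with $a = U^T Q U$, $b = 2 U^T Q Z$ and $c_0 = Z^T Q Z - c$ (for symmetric $Q$). The sketch below solves each univariate quadratic on $t \ge 0$ and intersects the resulting regions, which yields $M_S$ as a union of intervals. All function names are hypothetical, and boundary points of measure zero are ignored.

```python
import math


def line_coefficients(Q, U, Z, rhs):
    """Coefficients (a, b, c0) of t -> (U t + Z)^T Q (U t + Z) - rhs, Q symmetric."""
    def quad(u, v):
        return sum(u[i] * Q[i][j] * v[j]
                   for i in range(len(u)) for j in range(len(v)))
    return quad(U, U), 2.0 * quad(U, Z), quad(Z, Z) - rhs


def quadratic_region(a, b, c, tol=1e-12):
    """Intervals of t >= 0 on which a t^2 + b t + c >= 0."""
    inf = math.inf
    if abs(a) < tol:
        if abs(b) < tol:
            return [(0.0, inf)] if c >= 0 else []
        root = -c / b
        pieces = [(root, inf)] if b > 0 else [(-inf, root)]
    else:
        disc = b * b - 4.0 * a * c
        if disc <= 0:
            # No sign change: non-negative everywhere iff the parabola opens upward.
            pieces = [(-inf, inf)] if a > 0 else []
        else:
            r1, r2 = sorted(((-b - math.sqrt(disc)) / (2.0 * a),
                             (-b + math.sqrt(disc)) / (2.0 * a)))
            pieces = [(-inf, r1), (r2, inf)] if a > 0 else [(r1, r2)]
    # Clip to the half-line t >= 0.
    return [(max(lo, 0.0), hi) for lo, hi in pieces if hi > 0.0]


def intersect(region_a, region_b):
    """Intersection of two unions of intervals."""
    out = []
    for lo1, hi1 in region_a:
        for lo2, hi2 in region_b:
            lo, hi = max(lo1, lo2), min(hi1, hi2)
            if lo < hi:
                out.append((lo, hi))
    return sorted(out)


def truncation_set(coefs):
    """Set of t >= 0 satisfying every quadratic inequality in coefs."""
    region = [(0.0, math.inf)]
    for a, b, c in coefs:
        region = intersect(region, quadratic_region(a, b, c))
    return region
```

Unlike the affine case, the result need not be a single interval: an upward-opening quadratic with two positive roots removes a gap from the middle of the half-line.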

22 Implementation: forward stepwise with groups: Not your father's polyhedral lemma.

23 Implementation: forward stepwise with groups: Inference for the active groups. Tests for groups in $A_s$ proceed the usual way ($\chi^2$ test in regression). Let $P_j$ be the projection for adding group $j$ to $A_s \setminus j$, and $S = \|P_j y\|_2$. Unconditionally, $S$ is a $\chi$ random variable, and conditionally it is just truncated to $M_S$. Known degrees of freedom and scale. Assumptions: saturated model with known $\sigma$. Model assumption: $y \sim N(\mu, \sigma^2 I)$ with $\sigma^2$ known. Null hypothesis: $P\mu = 0$.
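
A sketch of the resulting truncated-$\chi$ p-value, specialized to 2 degrees of freedom (groups of two columns) so that the survival function has the closed form $P(\chi_2 > s) = e^{-s^2/2}$. It assumes an upper-tail p-value and a truncation set given as a list of disjoint intervals; the function names are hypothetical, and other degrees of freedom would need a general $\chi$ CDF.

```python
import math


def chi_sf_2df(s):
    """P(chi > s) for the chi distribution with 2 degrees of freedom."""
    return math.exp(-0.5 * s * s)


def truncated_chi_pvalue(s_obs, region, sigma=1.0):
    """P(S > s_obs | S in region) when S / sigma is chi with 2 degrees of freedom.

    region is a list of disjoint (lo, hi) intervals; hi may be math.inf.
    """
    num = 0.0
    den = 0.0
    for lo, hi in region:
        # Probability mass of the interval under the untruncated law.
        den += chi_sf_2df(lo / sigma) - chi_sf_2df(hi / sigma)
        # Part of the interval lying above the observed statistic.
        upper_lo = max(lo, s_obs)
        if upper_lo < hi:
            num += chi_sf_2df(upper_lo / sigma) - chi_sf_2df(hi / sigma)
    return num / den
```

With `region = [(0.0, math.inf)]` this reduces to the classical test; any truncation at the lower end makes a given `s_obs` less extreme, which is how the selection effect is corrected.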

24 Implementation: forward stepwise with groups: Simulation results again: when does the null hold? [Figure: p-values split by whether the null holds (FALSE / TRUE); x-axis: p-value.]

25 Implementation: forward stepwise with groups: Conditional on correct selection. [Figure: p-values split by whether the null holds (FALSE / TRUE); x-axis: p-value.]

26 Conclusion: Final remarks. R package: coming soon to a CRAN near you (with Rob and Ryan Tibshirani). To do: unknown $\sigma$ case, sampling from the selected model (more powerful), univariate tests/intervals. Drawback: computationally expensive; after $s$ steps we must store about $s(p - 1)$ projections, each $n \times n$, and multiply all of these by $s$ vectors. Can be done in parallel. Another quadratic model selection procedure: k-means. Forthcoming work with Léonard Blier and Jonathan Taylor.

27 Conclusion: Recap & conclusion. Selective inference for designs with groups of variables. Interpretable model selection, corrected version of classical regression tests. Computationally expensive.

28 Conclusion: Thanks for your attention! I owe you a coffee. Questions? (I'll be on the job market this year!)
