Variable selection with error control: Another look at Stability Selection


Slide 1: Variable selection with error control: Another look at Stability Selection
Richard J. Samworth and Rajen D. Shah, University of Cambridge
RSS Journal Webinar, 25 October 2017

Slide 2: High-dimensional data
Many modern applications, e.g. in genomics, can have the number of predictors p greatly exceeding the number of observations n. In these settings, variable selection is particularly important.
[Figure: (a) Microarray data; (b) Sparsity]

Slide 3: What is Stability Selection?
Stability Selection (Meinshausen & Bühlmann, 2010) is a very general technique designed to improve the performance of a variable selection algorithm. It is based on aggregating the results of applying a selection procedure to subsamples of the data.
A key feature of Stability Selection is the error control it provides, in the form of an upper bound on the expected number of falsely selected variables.

Slide 4: A general model for variable selection
Let $Z_1, \ldots, Z_n$ be i.i.d. random vectors. We think of the indices $S$ of some components of $Z_i$ as being signal variables, and the rest $N$ as noise variables.
E.g. $Z_i = (X_i, Y_i)$, with covariate vector $X_i \in \mathbb{R}^p$, response $Y_i \in \mathbb{R}$ and log-likelihood of the form $\sum_{i=1}^n L(Y_i, X_i^T \beta)$ with $\beta \in \mathbb{R}^p$. Then $S = \{k : \beta_k \neq 0\}$ and $N = \{k : \beta_k = 0\}$.
Thus $S \subseteq \{1, \ldots, p\}$ and $N = \{1, \ldots, p\} \setminus S$.
A variable selection procedure is a statistic $\hat{S}_n := \hat{S}_n(Z_1, \ldots, Z_n)$ taking values in the set of all subsets of $\{1, \ldots, p\}$.
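In code, a selection procedure is just a function mapping data to a subset of $\{1, \ldots, p\}$. Below is a minimal sketch of a lasso-based selector using the glmnet package; the budget `q` on the number of active variables is an illustrative choice, not the talk's prescription.

```r
## S-hat as a function: data in, subset of column indices out.
## Sketch only; `q` caps how far along the lasso path we go.
library(glmnet)

lasso_select <- function(X, y, q = 50) {
  fit <- glmnet(X, y)                       # full lasso path
  k <- max(which(fit$df <= q))              # last lambda with <= q active variables
  which(as.numeric(coef(fit)[-1, k]) != 0)  # drop intercept, return nonzero indices
}
```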

Slide 5: How does Stability Selection work?
For a subset $A = \{i_1, \ldots, i_{|A|}\} \subseteq \{1, \ldots, n\}$, write $\hat{S}(A) := \hat{S}_{|A|}(Z_{i_1}, \ldots, Z_{i_{|A|}})$.
Meinshausen and Bühlmann defined
$$\hat{\Pi}(k) := \binom{n}{\lfloor n/2 \rfloor}^{-1} \sum_{A \subseteq \{1,\ldots,n\},\, |A| = \lfloor n/2 \rfloor} 1_{\{k \in \hat{S}(A)\}}.$$
Stability Selection fixes $\tau \in [0, 1]$ and selects $\hat{S}^{\mathrm{SS}}_{n,\tau} = \{k : \hat{\Pi}(k) \geq \tau\}$.

Slide 6: Error control of Stability Selection
Assume that $\{1_{\{k \in \hat{S}_{\lfloor n/2 \rfloor}\}} : k \in N\}$ is exchangeable, and that $\hat{S}_{\lfloor n/2 \rfloor}$ is no worse than random guessing:
$$\frac{E(|\hat{S}_{\lfloor n/2 \rfloor} \cap S|)}{E(|\hat{S}_{\lfloor n/2 \rfloor} \cap N|)} \geq \frac{|S|}{|N|}.$$
Then, for $\tau \in (\tfrac{1}{2}, 1]$,
$$E(|\hat{S}^{\mathrm{SS}}_{n,\tau} \cap N|) \leq \frac{1}{2\tau - 1} \frac{(E|\hat{S}_{\lfloor n/2 \rfloor}|)^2}{p}.$$

Slide 7: Error control discussion
In principle, this theorem allows the user to choose $\tau$ based on the expected number of false positives they are willing to tolerate. However:
- The theorem requires two conditions, and the exchangeability assumption is very strong.
- There are too many subsets to evaluate $\hat{S}^{\mathrm{SS}}_{n,\tau}$ exactly when $n \geq 30$ (see the quick count below).
- The bound tends to be rather weak.
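The combinatorial explosion behind the second point is easy to verify in base R:

```r
## Number of subsets of size n/2 already at n = 30:
choose(30, 15)  # 155117520, so exact evaluation of Pi-hat is impractical
```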

Slide 8: Complementary Pairs Stability Selection (CPSS)
Let $\{(A_{2j-1}, A_{2j}) : j = 1, \ldots, B\}$ be randomly chosen independent pairs of subsets of $\{1, \ldots, n\}$ of size $\lfloor n/2 \rfloor$ such that $A_{2j-1} \cap A_{2j} = \emptyset$.
Define
$$\hat{\Pi}_B(k) := \frac{1}{2B} \sum_{j=1}^{2B} 1_{\{k \in \hat{S}(A_j)\}}$$
and select $\hat{S}^{\mathrm{CPSS}}_{n,\tau} := \{k : \hat{\Pi}_B(k) \geq \tau\}$.
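A minimal sketch of CPSS, assuming `select` is any selection procedure (e.g. the `lasso_select` sketched earlier); names and defaults here are illustrative.

```r
## Complementary Pairs Stability Selection: B random splits of {1, ..., n}
## into disjoint halves; selection proportions are taken over all 2B halves.
cpss <- function(X, y, select, B = 50, tau = 0.6) {
  n <- nrow(X); p <- ncol(X); half <- floor(n / 2)
  counts <- numeric(p)
  for (j in seq_len(B)) {
    idx <- sample(n)  # one random split into a complementary pair
    for (A in list(idx[1:half], idx[(half + 1):(2 * half)])) {
      sel <- select(X[A, , drop = FALSE], y[A])
      counts[sel] <- counts[sel] + 1
    }
  }
  pihat <- counts / (2 * B)  # Pi-hat_B(k) for each variable k
  which(pihat >= tau)        # the CPSS selected set
}
```

A call such as `cpss(X, y, lasso_select, B = 50, tau = 0.6)` then returns the stable set.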

Slide 9: Worst-case error control bounds
Define the selection probability of variable $k$ to be $p_{k,n} = P(k \in \hat{S}_n)$. We can divide our variables into those that have low and high selection probabilities: for $\theta \in [0, 1]$, let
$$L_\theta := \{k : p_{k,\lfloor n/2 \rfloor} \leq \theta\} \quad \text{and} \quad H_\theta := \{k : p_{k,\lfloor n/2 \rfloor} > \theta\}.$$
If $\tau \in (\tfrac{1}{2}, 1]$, then
$$E|\hat{S}^{\mathrm{CPSS}}_{n,\tau} \cap L_\theta| \leq \frac{\theta}{2\tau - 1}\, E|\hat{S}_{\lfloor n/2 \rfloor} \cap L_\theta|.$$
Moreover, if $\tau \in [0, \tfrac{1}{2})$, then, writing $\hat{N} := \{1, \ldots, p\} \setminus \hat{S}$ for the set of unselected variables,
$$E|\hat{N}^{\mathrm{CPSS}}_{n,\tau} \cap H_\theta| \leq \frac{1 - \theta}{1 - 2\tau}\, E|\hat{N}_{\lfloor n/2 \rfloor} \cap H_\theta|.$$

Slide 10: Illustration and discussion
Suppose $p = 1000$ and $q := E|\hat{S}_{\lfloor n/2 \rfloor}| = 50$. On average, CPSS with $\tau = 0.6$ selects no more than a quarter of the variables that have below-average selection probability under $\hat{S}_{\lfloor n/2 \rfloor}$ (checked in the snippet below).
- The theorem requires no exchangeability or random guessing conditions.
- It holds even when $B = 1$.
- If the exchangeability and random guessing conditions do hold, then we recover
$$E|\hat{S}^{\mathrm{CPSS}}_{n,\tau} \cap N| \leq \frac{1}{2\tau - 1} \Big(\frac{q}{p}\Big) E|\hat{S}_{\lfloor n/2 \rfloor} \cap L_{q/p}| \leq \frac{1}{2\tau - 1} \Big(\frac{q^2}{p}\Big).$$
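The arithmetic behind these numbers, as a quick R check:

```r
## Bounds for p = 1000, q = 50, tau = 0.6
p <- 1000; q <- 50; tau <- 0.6
theta <- q / p              # 0.05: the average selection probability
theta / (2 * tau - 1)       # 0.25: at most a quarter of E|S-hat with L_theta| survives
q^2 / ((2 * tau - 1) * p)   # 12.5: bound on the expected number of false selections
```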

Slide 11: Proof
Let
$$\tilde{\Pi}_B(k) := \frac{1}{B} \sum_{j=1}^{B} 1_{\{k \in \hat{S}(A_{2j-1})\}}\, 1_{\{k \in \hat{S}(A_{2j})\}}.$$
Note that $E\{\tilde{\Pi}_B(k)\} = p^2_{k,\lfloor n/2 \rfloor}$. Now
$$0 \leq \frac{1}{B} \sum_{j=1}^{B} \big(1 - 1_{\{k \in \hat{S}(A_{2j-1})\}}\big)\big(1 - 1_{\{k \in \hat{S}(A_{2j})\}}\big) = 1 - 2\hat{\Pi}_B(k) + \tilde{\Pi}_B(k).$$
Thus
$$P\{\hat{\Pi}_B(k) \geq \tau\} \leq P\{\tfrac{1}{2}(1 + \tilde{\Pi}_B(k)) \geq \tau\} = P\{\tilde{\Pi}_B(k) \geq 2\tau - 1\} \leq \frac{1}{2\tau - 1}\, p^2_{k,\lfloor n/2 \rfloor}.$$

Slide 12: Proof 2
It follows that
$$E|\hat{S}^{\mathrm{CPSS}}_{n,\tau} \cap L_\theta| = E\Big(\sum_{k : p_{k,\lfloor n/2 \rfloor} \leq \theta} 1_{\{k \in \hat{S}^{\mathrm{CPSS}}_{n,\tau}\}}\Big) = \sum_{k : p_{k,\lfloor n/2 \rfloor} \leq \theta} P(k \in \hat{S}^{\mathrm{CPSS}}_{n,\tau}) \leq \frac{1}{2\tau - 1} \sum_{k : p_{k,\lfloor n/2 \rfloor} \leq \theta} p^2_{k,\lfloor n/2 \rfloor} \leq \frac{\theta}{2\tau - 1}\, E|\hat{S}_{\lfloor n/2 \rfloor} \cap L_\theta|,$$
where the final inequality follows because
$$E|\hat{S}_{\lfloor n/2 \rfloor} \cap L_\theta| = E\Big(\sum_{k : p_{k,\lfloor n/2 \rfloor} \leq \theta} 1_{\{k \in \hat{S}_{\lfloor n/2 \rfloor}\}}\Big) = \sum_{k : p_{k,\lfloor n/2 \rfloor} \leq \theta} p_{k,\lfloor n/2 \rfloor}.$$

Slide 13: Bounds with no assumptions whatsoever
If $Z_1, \ldots, Z_n$ are not identically distributed, the same bound holds, provided in $L_\theta$ we redefine
$$p_{k,\lfloor n/2 \rfloor} := \binom{n}{\lfloor n/2 \rfloor}^{-1} \sum_{|A| = \lfloor n/2 \rfloor} P\{k \in \hat{S}_{\lfloor n/2 \rfloor}(A)\}.$$
Similarly, if $Z_1, \ldots, Z_n$ are not independent, the same bound holds, with $p^2_{k,\lfloor n/2 \rfloor}$ replaced by the average of $P\{k \in \hat{S}_{\lfloor n/2 \rfloor}(A_1) \cap \hat{S}_{\lfloor n/2 \rfloor}(A_2)\}$ over all complementary pairs $(A_1, A_2)$.

Slide 14: Can we improve on Markov's inequality?
[Figure: Typical and extremal pmfs of $\tilde{\Pi}_{25}(k)$ for a low selection probability variable $k$.]

Slide 15: Can we improve on Markov's inequality?
[Figure: Typical and extremal pmfs of $\tilde{\Pi}_{25}(k)$ for a low selection probability variable $k$.]

Slide 16: Improved bound under unimodality
Suppose that the distribution of $\tilde{\Pi}_B(k)$ is unimodal for each $k \in L_\theta$. If $\tau \in \{\tfrac{1}{2} + \tfrac{1}{B}, \tfrac{1}{2} + \tfrac{3}{2B}, \tfrac{1}{2} + \tfrac{2}{B}, \ldots, 1\}$, then
$$E|\hat{S}^{\mathrm{CPSS}}_{n,\tau} \cap L_\theta| \leq C(\tau, B)\, \theta\, E|\hat{S}_{\lfloor n/2 \rfloor} \cap L_\theta|,$$
where, when $\theta \leq 1/\sqrt{3}$,
$$C(\tau, B) = \begin{cases} \dfrac{1}{2(2\tau - 1 - 1/(2B))} & \text{if } \tau \in \big(\min\{\tfrac{1}{2} + \theta^2,\ \tfrac{1}{2} + \tfrac{1}{2B} + \tfrac{3}{4}\theta^2\},\ \tfrac{3}{4}\big], \\ \dfrac{4(1 - \tau + 1/(2B))}{1 + 1/B} & \text{if } \tau \in (\tfrac{3}{4}, 1]. \end{cases}$$
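A direct transcription of the displayed constant, valid only for $\tau$ in the stated ranges (and $\theta \leq 1/\sqrt{3}$); a sketch under those assumptions, not a general-purpose implementation:

```r
## C(tau, B) from the unimodal bound above; the caller must respect the
## validity ranges for tau stated in the theorem.
C_unimodal <- function(tau, B) {
  if (tau <= 3 / 4) {
    1 / (2 * (2 * tau - 1 - 1 / (2 * B)))
  } else {
    4 * (1 - tau + 1 / (2 * B)) / (1 + 1 / B)
  }
}
C_unimodal(0.9, 25)  # about 0.46, versus the worst-case factor 1/(2*0.9 - 1) = 1.25
```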

Slide 17: Extremal distribution under unimodality
[Figure: Typical and extremal pmfs of $\tilde{\Pi}_{25}(k)$ for a low selection probability variable $k$.]

Slide 18: Extremal distribution under unimodality
[Figure: Typical and extremal pmfs of $\tilde{\Pi}_{25}(k)$ for a low selection probability variable $k$.]

Slide 19: The r-concavity constraint
r-concavity provides a continuum of constraints that interpolate between unimodality and log-concavity.
- A non-negative function $f$ on an interval $I \subseteq \mathbb{R}$ is r-concave with $r < 0$ if $f^r$ is convex on $I$.
- A pmf $f$ on $\{0, 1/B, \ldots, 1\}$ is r-concave if the linear interpolant to $\{(i, f(i/B)) : i = 0, 1, \ldots, B\}$ is r-concave.
- The constraint becomes weaker as $r$ increases to 0.
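For a strictly positive pmf, one convenient operational check (an assumed reading of the definition, via discrete convexity of $f^r$ at the grid points) is that $f^r$ has non-negative second differences:

```r
## Check r-concavity (r < 0) of a strictly positive pmf on {0, 1/B, ..., 1}:
## f is taken to be r-concave when f^r is convex, checked here through
## second differences on the grid.
is_r_concave <- function(f, r = -1/2) {
  stopifnot(r < 0, all(f > 0))
  all(diff(f^r, differences = 2) >= -1e-8)  # tolerance for floating point
}
is_r_concave(dbinom(0:25, 25, 0.3))  # a log-concave pmf passes for any r < 0
```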

Slide 20: Further improvements under r-concavity
Suppose $\tilde{\Pi}_B(k)$ is r-concave for all $k \in L_\theta$. Then for $\tau \in (\tfrac{1}{2}, 1]$,
$$E|\hat{S}^{\mathrm{CPSS}}_{n,\tau} \cap L_\theta| \leq D(\theta^2, 2\tau - 1, B, r)\, |L_\theta|,$$
where $D$ can be evaluated numerically. Our simulations suggest $r = -1/2$ is a reasonable choice.

Slide 21: Extremal distribution under $-1/2$-concavity
[Figure: Typical and extremal pmfs of $\tilde{\Pi}_{25}(k)$ for a low selection probability variable $k$.]

Slide 22: Extremal distribution under $-1/2$-concavity
[Figure: Typical and extremal pmfs of $\tilde{\Pi}_{25}(k)$ for a low selection probability variable $k$.]

Slide 23: $r = -1/2$ is sensible
[Figure, left panel: Typical and extremal pmfs raised to the power $-1/2$. Right panel: Tail probabilities from 0.2 onwards, comparing worst case, unimodal, $-\tfrac{1}{2}$-concave and empirical.]

Slide 24: Reducing the threshold τ
Suppose $\hat{\Pi}_B(k)$ is $-1/4$-concave, and that $\tilde{\Pi}_B(k)$ is $-1/2$-concave, for all $k \in L_\theta$. Then
$$E|\hat{S}^{\mathrm{CPSS}}_{n,\tau} \cap L_\theta| \leq \min\{D(\theta^2, 2\tau - 1, B, -1/2),\ D(\theta, \tau, 2B, -1/4)\}\, |L_\theta|$$
for all $\tau \in (\theta, 1]$. (We take $D(\cdot, t, \cdot, \cdot) = 1$ for $t \leq 0$.)

Slide 25: Improved bounds
[Figure: Bounds on $E|\hat{S}^{\mathrm{CPSS}}_{n,\tau} \cap L_{q/p}|$ as a function of $\tau$, where $p = 1000$, $q = 50$, showing the M & B (dashes), worst case (dot-dash), unimodal and r-concave bounds, and the true value for a simulated example.]

Slide 26: Simulation study
Linear model $Y_i = X_i^T \beta + \varepsilon_i$ with $X_i \sim N_p(0, \Sigma)$; a sketch of one instance follows below.
- Toeplitz covariance $\Sigma_{ij} = \rho^{|i-j|}$.
- $\beta$ has sparsity $s$, with $s/2$ coefficients equally spaced within $[-1, -0.5]$ and $s/2$ equally spaced within $[0.5, 1]$.
- $n = 200$, $p = 1000$.
- Use the Lasso, and seek $E|\hat{S}^{\mathrm{CPSS}}_{n,\tau} \cap L_{q/p}| \leq l$. Fix $q = \sqrt{0.8\, l\, p}$, so that the worst-case bound $q^2/\{(2\tau - 1)p\}$ equals $l$ when we choose $\tau = 0.9$.
- Choose $\hat{\tau}$ from the r-concave bound; also consider the oracle $\tau^*$, and the oracle $\lambda^*$ for the Lasso $\hat{S}^{\lambda}_n$.
- Compare $\dfrac{E|\hat{S}^{\mathrm{CPSS}}_{n,0.9} \cap S|}{E|\hat{S}^{\mathrm{CPSS}}_{n,\tau^*} \cap S|}$, $\dfrac{E|\hat{S}^{\mathrm{CPSS}}_{n,\hat{\tau}} \cap S|}{E|\hat{S}^{\mathrm{CPSS}}_{n,\tau^*} \cap S|}$ and $\dfrac{E|\hat{S}^{\lambda^*}_n \cap S|}{E|\hat{S}^{\mathrm{CPSS}}_{n,\tau^*} \cap S|}$.
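A sketch of one draw from this design; $\rho$, $s$, the signal positions and the noise scale are illustrative choices where the slide leaves them unspecified (the study varies SNR and $s$):

```r
## One draw from the simulation design: Toeplitz-correlated Gaussian X,
## beta with s/2 negative and s/2 positive equally spaced coefficients.
set.seed(1)
n <- 200; p <- 1000
rho <- 0.5; s <- 8; snr <- 2           # illustrative settings
Sigma <- toeplitz(rho^(0:(p - 1)))     # Sigma_ij = rho^|i - j|
X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)
beta <- numeric(p)
pos <- round(seq(1, p, length.out = s))  # signal positions: arbitrary here
beta[pos] <- c(seq(-1, -0.5, length.out = s / 2),
               seq(0.5, 1, length.out = s / 2))
## Noise scale set so that var(X beta) / var(noise) = snr
## (one possible definition of SNR; an assumption, not from the slide).
sigma <- sqrt(drop(t(beta) %*% Sigma %*% beta) / snr)
y <- drop(X %*% beta + rnorm(n, sd = sigma))
```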

Slide 27: Simulation results
[Figure: Expected number of true positives using worst-case and r-concave bounds, and an oracle Lasso procedure (crosses), as a fraction of the expected number of true positives for an oracle CPSS procedure, plotted against SNR for varying sparsity $s$; the y-axis label gives the desired error control level $l$.]

Slide 28: Summary
- CPSS can be used with any variable selection procedure.
- We can bound the average number of low selection probability variables chosen by CPSS, with no conditions needed on the model or on the original selection procedure.
- Under mild conditions, e.g. unimodality or r-concavity, the bounds can be strengthened, yielding tight error control.
- This allows the user to choose the threshold τ in an effective way.
- R implementations: the stabs package (function stabsel) and mboost; see the usage sketch below.
Thank you for listening.
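For the packaged implementation, a hedged usage sketch (argument and field names recalled from the stabs documentation; treat them as assumptions and check ?stabsel):

```r
## CPSS via the stabs package: lasso base selector, complementary pairs
## sampling and the r-concave bound; q and PFER are illustrative values.
library(stabs)
fit <- stabsel(x = X, y = y, fitfun = glmnet.lasso,
               q = 50,                    # per-subsample selection budget
               PFER = 1,                  # target bound on false selections
               sampling.type = "SS",      # complementary pairs subsampling
               assumption = "r-concave")  # the bound from this work
fit$selected                              # the stable set (assumed field name)
```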
