1 VARIABLE SELECTION AND INDEPENDENT COMPONENT ANALYSIS, PLUS TWO ADVERTS
Richard Samworth, University of Cambridge
Joint work with Rajen Shah and Ming Yuan

2 My core research interests
A broad range of methodological and theoretical problems in nonparametric and high-dimensional statistics. Examples include shape-constrained nonparametric density estimation.

3 My core research interests: classification problems
[Figure: regret ratio plots for classification problems with d = 50]

4 Variable selection using Stability Selection
Stability Selection (Meinshausen and Bühlmann, 2010) is a very general technique designed to improve the performance of a variable selection algorithm. It is based on aggregating the results of applying a selection procedure to subsamples of the data. A particularly attractive feature of Stability Selection is the error control provided by an upper bound on the expected number of falsely selected variables.
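
As a concrete illustration of the subsample-and-aggregate idea, here is a minimal Python sketch (not the authors' implementation), assuming the lasso as the base selection procedure; the function name and tuning values are illustrative.

```python
# Minimal sketch of subsample-and-aggregate variable selection.
# Assumptions: lasso as the base procedure, 100 random subsamples of
# size floor(n/2); all names and tuning values are illustrative.
import numpy as np
from sklearn.linear_model import Lasso

def selection_frequencies(X, y, n_subsamples=100, alpha=0.1, seed=0):
    """Proportion of subsamples on which each variable is selected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        coef = Lasso(alpha=alpha).fit(X[idx], y[idx]).coef_
        counts += (coef != 0)           # variable k selected iff coef_k != 0
    return counts / n_subsamples        # estimated selection probabilities
```

Stability Selection then keeps the variables whose estimated selection probability is at least some threshold $\tau$.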

5 A general model for variable selection
Let $Z_1, \ldots, Z_n$ be i.i.d. random vectors. We think of the indices $S$ of some components of $Z_i$ as being signal variables, and others $N$ as being noise variables. E.g. $Z_i = (X_i, Y_i)$, with covariate $X_i \in \mathbb{R}^p$, response $Y_i \in \mathbb{R}$ and log-likelihood of the form
$$\sum_{i=1}^n L(Y_i, X_i^T \beta),$$
with $\beta \in \mathbb{R}^p$. Then $S = \{k : \beta_k \neq 0\}$ and $N = \{k : \beta_k = 0\}$. Thus $S \subseteq \{1, \ldots, p\}$ and $N = \{1, \ldots, p\} \setminus S$.
A variable selection procedure is a statistic $\hat{S}_n := \hat{S}_n(Z_1, \ldots, Z_n)$ taking values in the set of all subsets of $\{1, \ldots, p\}$.

6 How does Stability Selection work?
For a subset $A = \{i_1, \ldots, i_{|A|}\} \subseteq \{1, \ldots, n\}$, write $\hat{S}(A) := \hat{S}_{|A|}(Z_{i_1}, \ldots, Z_{i_{|A|}})$. Meinshausen and Bühlmann defined
$$\hat{\Pi}(k) = \binom{n}{\lfloor n/2 \rfloor}^{-1} \sum_{\substack{A \subseteq \{1, \ldots, n\} \\ |A| = \lfloor n/2 \rfloor}} \mathbb{1}_{\{k \in \hat{S}(A)\}}.$$
Stability Selection fixes $\tau \in [0, 1]$ and selects $\hat{S}^{\mathrm{SS}}_{n,\tau} = \{k : \hat{\Pi}(k) \geq \tau\}$.

7 Error control (Meinshausen and Bühlmann, 2010)
Assume that $\{\mathbb{1}_{\{k \in \hat{S}_{\lfloor n/2 \rfloor}\}} : k \in N\}$ is exchangeable, and that $\hat{S}_{\lfloor n/2 \rfloor}$ is not worse than random guessing:
$$\frac{E(|\hat{S}_{\lfloor n/2 \rfloor} \cap S|)}{E(|\hat{S}_{\lfloor n/2 \rfloor} \cap N|)} \geq \frac{|S|}{|N|}.$$
Then, for $\tau \in (\tfrac{1}{2}, 1]$,
$$E(|\hat{S}^{\mathrm{SS}}_{n,\tau} \cap N|) \leq \frac{1}{2\tau - 1} \frac{(E|\hat{S}_{\lfloor n/2 \rfloor}|)^2}{p}.$$
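
As a worked instance of this bound (with illustrative numbers): taking $p = 1000$, $E|\hat{S}_{\lfloor n/2 \rfloor}| = 50$ and $\tau = 0.9$ gives
$$E(|\hat{S}^{\mathrm{SS}}_{n,\tau} \cap N|) \leq \frac{1}{2(0.9) - 1} \cdot \frac{50^2}{1000} = \frac{2.5}{0.8} = 3.125,$$
i.e. on average at most about three noise variables are selected, provided the two conditions hold.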

8 Error control: discussion
In principle, this theorem helps the practitioner choose the tuning parameter $\tau$. However:
- The theorem requires two conditions, and the exchangeability assumption is very strong.
- There are too many subsets to evaluate $\hat{S}^{\mathrm{SS}}_{n,\tau}$ exactly when $n \geq 20$ (already $\binom{20}{10} = 184{,}756$).
- The bound tends to be rather weak.

9 Complementary Pairs Stability Selection (Shah and S., 2013)
Let $\{(A_{2j-1}, A_{2j}) : j = 1, \ldots, B\}$ be randomly chosen independent pairs of subsets of $\{1, \ldots, n\}$ of size $\lfloor n/2 \rfloor$ such that $A_{2j-1} \cap A_{2j} = \emptyset$. Define
$$\hat{\Pi}_B(k) := \frac{1}{2B} \sum_{j=1}^{2B} \mathbb{1}_{\{k \in \hat{S}(A_j)\}},$$
and select $\hat{S}^{\mathrm{CPSS}}_{n,\tau} = \{k : \hat{\Pi}_B(k) \geq \tau\}$.
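
A minimal Python sketch of the complementary-pairs scheme, assuming a user-supplied base selector select(X, y) returning a boolean mask of length p (the function and parameter names are illustrative):

```python
# Sketch of Complementary Pairs Stability Selection (CPSS): B random
# complementary pairs of disjoint subsamples, each of size floor(n/2).
import numpy as np

def cpss_frequencies(X, y, select, B=50, seed=0):
    """Pi_hat_B(k): selection proportion over the 2B subsamples."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(B):
        perm = rng.permutation(n)
        half = n // 2
        A1, A2 = perm[:half], perm[half:2 * half]  # disjoint complementary pair
        counts += select(X[A1], y[A1])
        counts += select(X[A2], y[A2])
    return counts / (2 * B)

# Select S_hat^CPSS = {k : Pi_hat_B(k) >= tau} for tau in (1/2, 1].
```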

10 Worst-case error control bounds
Let $p_{k,n} = P(k \in \hat{S}_n)$. For $\theta \in [0, 1]$, let $L_\theta = \{k : p_{k, \lfloor n/2 \rfloor} \leq \theta\}$ and $H_\theta = \{k : p_{k, \lfloor n/2 \rfloor} > \theta\}$. If $\tau \in (\tfrac{1}{2}, 1]$, then
$$E|\hat{S}^{\mathrm{CPSS}}_{n,\tau} \cap L_\theta| \leq \frac{\theta}{2\tau - 1} E|\hat{S}_{\lfloor n/2 \rfloor} \cap L_\theta|.$$
Moreover, if $\tau \in [0, \tfrac{1}{2})$, then, writing $\hat{N} := \{1, \ldots, p\} \setminus \hat{S}$ for the set of unselected variables,
$$E|\hat{N}^{\mathrm{CPSS}}_{n,\tau} \cap H_\theta| \leq \frac{1 - \theta}{1 - 2\tau} E|\hat{N}_{\lfloor n/2 \rfloor} \cap H_\theta|.$$

11 Illustration and discussion
Suppose $p = 1000$, and $q := E|\hat{S}_{\lfloor n/2 \rfloor}| = 50$. Then on average, CPSS with $\tau = 0.6$ selects no more than a quarter of the variables that have below-average selection probability under $\hat{S}_{\lfloor n/2 \rfloor}$.
- The theorem requires no exchangeability or random guessing conditions.
- It holds even when $B = 1$.
- If the exchangeability and random guessing conditions do hold, then we recover
$$E|\hat{S}^{\mathrm{CPSS}}_{n,\tau} \cap N| \leq \frac{1}{2\tau - 1} \Bigl(\frac{q}{p}\Bigr) E|\hat{S}_{\lfloor n/2 \rfloor} \cap L_{q/p}| \leq \frac{1}{2\tau - 1} \Bigl(\frac{q^2}{p}\Bigr).$$
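
The "quarter" in the illustration is just the constant from the first worst-case bound with $\theta = q/p$ (the average selection probability):
$$\frac{\theta}{2\tau - 1} = \frac{50/1000}{2(0.6) - 1} = \frac{0.05}{0.2} = 0.25.$$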

12 Improved bounds
The main inequality in the proof is Markov's inequality. By modelling the distribution more carefully, we can prove new versions of Markov's inequality and obtain improved bounds.

13 Improved bounds
[Figure: error bounds as a function of the threshold τ]

14 Summary
- CPSS can be used in conjunction with any variable selection procedure.
- We can bound the average number of low selection probability variables chosen by CPSS, under no conditions on the model or the original selection procedure.
- Under mild conditions, the bounds can be strengthened, yielding tight error control. This allows the practitioner to choose the threshold $\tau$ in an effective way.

15 What are ICA models?
ICA is a special case of a blind source separation problem, where from a set of mixed signals we aim to infer both the source signals and the mixing process; e.g. the cocktail party problem. It was pioneered by Comon (1994), and has become enormously popular in signal processing, machine learning, medical imaging...

16 Mathematical definition
In the simplest, noiseless case, we observe replicates $x_1, \ldots, x_n$ of
$$\underset{d \times 1}{X} = \underset{d \times d}{A}\, \underset{d \times 1}{S},$$
where the mixing matrix $A$ is invertible and $S$ has independent components. Our main aim is to estimate the unmixing matrix $W = A^{-1}$; estimation of the marginals $P_1, \ldots, P_d$ of $S = (S_1, \ldots, S_d)$ is a secondary goal. This semiparametric model is therefore related to PCA.
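
A toy instance of this model in Python (the mixing matrix and Laplace sources are illustrative choices; non-Gaussian sources are what make the model identifiable):

```python
# Simulate n replicates of X = A S with d = 2 independent sources.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 2
S = rng.laplace(size=(n, d))             # independent non-Gaussian sources
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])               # invertible d x d mixing matrix
X = S @ A.T                              # rows are observations x_i = A s_i
W = np.linalg.inv(A)                     # unmixing matrix we aim to recover
```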

17 Different previous approaches
- Postulate a parametric family for the marginals $P_1, \ldots, P_d$; optimise a contrast function involving $(W, P_1, \ldots, P_d)$. The contrast usually represents mutual information or maximum entropy, or non-Gaussianity (Eriksson et al., 2000; Karvanen et al., 2000).
- Postulate smooth (log-)densities for the marginals (Bach and Jordan, 2002; Hastie and Tibshirani, 2003; Samarov and Tsybakov, 2004; Chen and Bickel, 2006).

18 Our approach (S. and Yuan, 2012)
To avoid assumptions of existence of densities, and choice of tuning parameters, we propose to maximise the log-likelihood
$$\log \det W + \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^d \log f_j(w_j^T x_i)$$
over all $d \times d$ non-singular matrices $W = (w_1, \ldots, w_d)^T$ and univariate log-concave densities $f_1, \ldots, f_d$. A major attraction of this approach is that the log-concave ICA projection preserves the unmixing matrix $W$.
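
A sketch of evaluating this objective for a candidate $W$, assuming the fitted log-densities are supplied as callables (in the method itself these would be univariate log-concave maximum likelihood estimates; all names here are illustrative):

```python
# Evaluate log|det W| + (1/n) sum_i sum_j log f_j(w_j^T x_i)
# for a candidate unmixing matrix W and log-densities logf[0..d-1].
import numpy as np

def ica_loglik(W, logf, X):
    n = X.shape[0]
    Y = X @ W.T                          # column j holds w_j^T x_i, i = 1..n
    ll = np.log(abs(np.linalg.det(W)))   # slide writes log det W; abs guards sign
    for j, lf in enumerate(logf):
        ll += lf(Y[:, j]).sum() / n      # average log-density of component j
    return ll
```

Maximising this over $W$, with each $f_j$ profiled out over the log-concave class, gives the log-concave ICA estimator.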

19 References
Bach, F. and Jordan, M. I. (2002) Kernel independent component analysis. Journal of Machine Learning Research, 3.
Chen, A. and Bickel, P. J. (2006) Efficient independent component analysis. The Annals of Statistics, 34.
Comon, P. (1994) Independent component analysis, a new concept? Signal Processing, 36.
Hastie, T. and Tibshirani, R. (2003) ProDenICA: Product Density Estimation for ICA using tilted Gaussian density estimates. R package.
Mason, D. M. and Newton, M. A. (1992) A rank statistics approach to the consistency of a general bootstrap. The Annals of Statistics, 20.

20 References (continued)
Meinshausen, N. and Bühlmann, P. (2010) Stability selection (with discussion). Journal of the Royal Statistical Society, Series B, 72.
Præstgaard, J. and Wellner, J. A. (1993) Exchangeably weighted bootstraps of the general empirical process. The Annals of Probability, 21.
Samarov, A. and Tsybakov, A. (2004) Nonparametric independent component analysis. Bernoulli, 10.
Samworth, R. J. and Yuan, M. (2012) Independent component analysis via nonparametric maximum likelihood estimation. The Annals of Statistics, 40.
Shah, R. D. and Samworth, R. J. (2013) Variable selection with error control: another look at stability selection. Journal of the Royal Statistical Society, Series B, 75.
