Variable selection with error control: Another look at Stability Selection


Slide 1: Variable selection with error control: Another look at Stability Selection
Richard J. Samworth and Rajen D. Shah, University of Cambridge
RSS Journal Webinar, 25 October 2017

Slide 2: High-dimensional data
Many modern applications, e.g. in genomics, can have the number of predictors p greatly exceeding the number of observations n. In these settings, variable selection is particularly important.
[Figure: (a) Microarray data; (b) Sparsity]

Slide 3: What is Stability Selection?
Stability Selection (Meinshausen & Bühlmann, 2010) is a very general technique designed to improve the performance of a variable selection algorithm. It is based on aggregating the results of applying a selection procedure to subsamples of the data.
A key feature of Stability Selection is the error control it provides, in the form of an upper bound on the expected number of falsely selected variables.

Slide 4: A general model for variable selection
Let $Z_1, \ldots, Z_n$ be i.i.d. random vectors. We think of the indices $S$ of some components of $Z_i$ as being signal variables, and the rest $N$ as noise variables.
E.g. $Z_i = (X_i, Y_i)$, with covariate vector $X_i \in \mathbb{R}^p$, response $Y_i \in \mathbb{R}$ and log-likelihood of the form $\sum_{i=1}^n L(Y_i, X_i^T \beta)$ with $\beta \in \mathbb{R}^p$. Then $S = \{k : \beta_k \neq 0\}$ and $N = \{k : \beta_k = 0\}$.
Thus $S \subseteq \{1, \ldots, p\}$ and $N = \{1, \ldots, p\} \setminus S$.
A variable selection procedure is a statistic $\hat{S}_n := \hat{S}_n(Z_1, \ldots, Z_n)$ taking values in the set of all subsets of $\{1, \ldots, p\}$.
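In code, a selection procedure is just a function mapping data to a subset of $\{1, \ldots, p\}$. Below is a minimal sketch of a lasso-based selector using the glmnet package; the budget `q` on the number of active variables is an illustrative choice, not the talk's prescription.

```r
## S-hat as a function: data in, subset of column indices out.
## Sketch only; `q` caps how far along the lasso path we go.
library(glmnet)

lasso_select <- function(X, y, q = 50) {
  fit <- glmnet(X, y)                       # full lasso path
  k <- max(which(fit$df <= q))              # last lambda with <= q active variables
  which(as.numeric(coef(fit)[-1, k]) != 0)  # drop intercept, return nonzero indices
}
```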

Slide 5: How does Stability Selection work?
For a subset $A = \{i_1, \ldots, i_{|A|}\} \subseteq \{1, \ldots, n\}$, write $\hat{S}(A) := \hat{S}_{|A|}(Z_{i_1}, \ldots, Z_{i_{|A|}})$.
Meinshausen and Bühlmann defined
$$\hat{\Pi}(k) := \binom{n}{\lfloor n/2 \rfloor}^{-1} \sum_{A \subseteq \{1,\ldots,n\},\, |A| = \lfloor n/2 \rfloor} 1_{\{k \in \hat{S}(A)\}}.$$
Stability Selection fixes $\tau \in [0, 1]$ and selects $\hat{S}^{\mathrm{SS}}_{n,\tau} = \{k : \hat{\Pi}(k) \geq \tau\}$.

Slide 6: Error control of Stability Selection
Assume that $\{1_{\{k \in \hat{S}_{\lfloor n/2 \rfloor}\}} : k \in N\}$ is exchangeable, and that $\hat{S}_{\lfloor n/2 \rfloor}$ is no worse than random guessing:
$$\frac{E(|\hat{S}_{\lfloor n/2 \rfloor} \cap S|)}{E(|\hat{S}_{\lfloor n/2 \rfloor} \cap N|)} \geq \frac{|S|}{|N|}.$$
Then, for $\tau \in (\tfrac{1}{2}, 1]$,
$$E(|\hat{S}^{\mathrm{SS}}_{n,\tau} \cap N|) \leq \frac{1}{2\tau - 1} \frac{(E|\hat{S}_{\lfloor n/2 \rfloor}|)^2}{p}.$$

Slide 7: Error control discussion
In principle, this theorem allows the user to choose $\tau$ based on the expected number of false positives they are willing to tolerate. However:
- The theorem requires two conditions, and the exchangeability assumption is very strong.
- There are too many subsets to evaluate $\hat{S}^{\mathrm{SS}}_{n,\tau}$ exactly when $n \geq 30$ (see the quick count below).
- The bound tends to be rather weak.
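The combinatorial explosion behind the second point is easy to verify in base R:

```r
## Number of subsets of size n/2 already at n = 30:
choose(30, 15)  # 155117520, so exact evaluation of Pi-hat is impractical
```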

Slide 8: Complementary Pairs Stability Selection (CPSS)
Let $\{(A_{2j-1}, A_{2j}) : j = 1, \ldots, B\}$ be randomly chosen independent pairs of subsets of $\{1, \ldots, n\}$ of size $\lfloor n/2 \rfloor$ such that $A_{2j-1} \cap A_{2j} = \emptyset$.
Define
$$\hat{\Pi}_B(k) := \frac{1}{2B} \sum_{j=1}^{2B} 1_{\{k \in \hat{S}(A_j)\}}$$
and select $\hat{S}^{\mathrm{CPSS}}_{n,\tau} := \{k : \hat{\Pi}_B(k) \geq \tau\}$.
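A minimal sketch of CPSS, assuming `select` is any selection procedure (e.g. the `lasso_select` sketched earlier); names and defaults here are illustrative.

```r
## Complementary Pairs Stability Selection: B random splits of {1, ..., n}
## into disjoint halves; selection proportions are taken over all 2B halves.
cpss <- function(X, y, select, B = 50, tau = 0.6) {
  n <- nrow(X); p <- ncol(X); half <- floor(n / 2)
  counts <- numeric(p)
  for (j in seq_len(B)) {
    idx <- sample(n)  # one random split into a complementary pair
    for (A in list(idx[1:half], idx[(half + 1):(2 * half)])) {
      sel <- select(X[A, , drop = FALSE], y[A])
      counts[sel] <- counts[sel] + 1
    }
  }
  pihat <- counts / (2 * B)  # Pi-hat_B(k) for each variable k
  which(pihat >= tau)        # the CPSS selected set
}
```

A call such as `cpss(X, y, lasso_select, B = 50, tau = 0.6)` then returns the stable set.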

Slide 9: Worst-case error control bounds
Define the selection probability of variable $k$ to be $p_{k,n} = P(k \in \hat{S}_n)$. We can divide our variables into those that have low and high selection probabilities: for $\theta \in [0, 1]$, let
$$L_\theta := \{k : p_{k,\lfloor n/2 \rfloor} \leq \theta\} \quad \text{and} \quad H_\theta := \{k : p_{k,\lfloor n/2 \rfloor} > \theta\}.$$
If $\tau \in (\tfrac{1}{2}, 1]$, then
$$E|\hat{S}^{\mathrm{CPSS}}_{n,\tau} \cap L_\theta| \leq \frac{\theta}{2\tau - 1}\, E|\hat{S}_{\lfloor n/2 \rfloor} \cap L_\theta|.$$
Moreover, if $\tau \in [0, \tfrac{1}{2})$, then, writing $\hat{N} := \{1, \ldots, p\} \setminus \hat{S}$ for the set of unselected variables,
$$E|\hat{N}^{\mathrm{CPSS}}_{n,\tau} \cap H_\theta| \leq \frac{1 - \theta}{1 - 2\tau}\, E|\hat{N}_{\lfloor n/2 \rfloor} \cap H_\theta|.$$

Slide 10: Illustration and discussion
Suppose $p = 1000$ and $q := E|\hat{S}_{\lfloor n/2 \rfloor}| = 50$. On average, CPSS with $\tau = 0.6$ selects no more than a quarter of the variables that have below-average selection probability under $\hat{S}_{\lfloor n/2 \rfloor}$ (checked in the snippet below).
- The theorem requires no exchangeability or random guessing conditions.
- It holds even when $B = 1$.
- If the exchangeability and random guessing conditions do hold, then we recover
$$E|\hat{S}^{\mathrm{CPSS}}_{n,\tau} \cap N| \leq \frac{1}{2\tau - 1} \Big(\frac{q}{p}\Big) E|\hat{S}_{\lfloor n/2 \rfloor} \cap L_{q/p}| \leq \frac{1}{2\tau - 1} \Big(\frac{q^2}{p}\Big).$$
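The arithmetic behind these numbers, as a quick R check:

```r
## Bounds for p = 1000, q = 50, tau = 0.6
p <- 1000; q <- 50; tau <- 0.6
theta <- q / p              # 0.05: the average selection probability
theta / (2 * tau - 1)       # 0.25: at most a quarter of E|S-hat with L_theta| survives
q^2 / ((2 * tau - 1) * p)   # 12.5: bound on the expected number of false selections
```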

Slide 11: Proof
Let
$$\tilde{\Pi}_B(k) := \frac{1}{B} \sum_{j=1}^{B} 1_{\{k \in \hat{S}(A_{2j-1})\}}\, 1_{\{k \in \hat{S}(A_{2j})\}}.$$
Note that $E\{\tilde{\Pi}_B(k)\} = p^2_{k,\lfloor n/2 \rfloor}$. Now
$$0 \leq \frac{1}{B} \sum_{j=1}^{B} \big(1 - 1_{\{k \in \hat{S}(A_{2j-1})\}}\big)\big(1 - 1_{\{k \in \hat{S}(A_{2j})\}}\big) = 1 - 2\hat{\Pi}_B(k) + \tilde{\Pi}_B(k).$$
Thus
$$P\{\hat{\Pi}_B(k) \geq \tau\} \leq P\{\tfrac{1}{2}(1 + \tilde{\Pi}_B(k)) \geq \tau\} = P\{\tilde{\Pi}_B(k) \geq 2\tau - 1\} \leq \frac{1}{2\tau - 1}\, p^2_{k,\lfloor n/2 \rfloor}.$$

Slide 12: Proof 2
It follows that
$$E|\hat{S}^{\mathrm{CPSS}}_{n,\tau} \cap L_\theta| = E\Big(\sum_{k : p_{k,\lfloor n/2 \rfloor} \leq \theta} 1_{\{k \in \hat{S}^{\mathrm{CPSS}}_{n,\tau}\}}\Big) = \sum_{k : p_{k,\lfloor n/2 \rfloor} \leq \theta} P(k \in \hat{S}^{\mathrm{CPSS}}_{n,\tau}) \leq \frac{1}{2\tau - 1} \sum_{k : p_{k,\lfloor n/2 \rfloor} \leq \theta} p^2_{k,\lfloor n/2 \rfloor} \leq \frac{\theta}{2\tau - 1}\, E|\hat{S}_{\lfloor n/2 \rfloor} \cap L_\theta|,$$
where the final inequality follows because
$$E|\hat{S}_{\lfloor n/2 \rfloor} \cap L_\theta| = E\Big(\sum_{k : p_{k,\lfloor n/2 \rfloor} \leq \theta} 1_{\{k \in \hat{S}_{\lfloor n/2 \rfloor}\}}\Big) = \sum_{k : p_{k,\lfloor n/2 \rfloor} \leq \theta} p_{k,\lfloor n/2 \rfloor}.$$

Slide 13: Bounds with no assumptions whatsoever
If $Z_1, \ldots, Z_n$ are not identically distributed, the same bound holds, provided in $L_\theta$ we redefine
$$p_{k,\lfloor n/2 \rfloor} := \binom{n}{\lfloor n/2 \rfloor}^{-1} \sum_{|A| = \lfloor n/2 \rfloor} P\{k \in \hat{S}_{\lfloor n/2 \rfloor}(A)\}.$$
Similarly, if $Z_1, \ldots, Z_n$ are not independent, the same bound holds, with $p^2_{k,\lfloor n/2 \rfloor}$ replaced by the average of $P\{k \in \hat{S}_{\lfloor n/2 \rfloor}(A_1) \cap \hat{S}_{\lfloor n/2 \rfloor}(A_2)\}$ over all complementary pairs $(A_1, A_2)$.

Slide 14: Can we improve on Markov's inequality?
[Figure: Typical and extremal pmfs of $\tilde{\Pi}_{25}(k)$ for a low selection probability variable $k$.]

Slide 15: Can we improve on Markov's inequality?
[Figure: Typical and extremal pmfs of $\tilde{\Pi}_{25}(k)$ for a low selection probability variable $k$.]

Slide 16: Improved bound under unimodality
Suppose that the distribution of $\tilde{\Pi}_B(k)$ is unimodal for each $k \in L_\theta$. If $\tau \in \{\tfrac{1}{2} + \tfrac{1}{B}, \tfrac{1}{2} + \tfrac{3}{2B}, \tfrac{1}{2} + \tfrac{2}{B}, \ldots, 1\}$, then
$$E|\hat{S}^{\mathrm{CPSS}}_{n,\tau} \cap L_\theta| \leq C(\tau, B)\, \theta\, E|\hat{S}_{\lfloor n/2 \rfloor} \cap L_\theta|,$$
where, when $\theta \leq 1/\sqrt{3}$,
$$C(\tau, B) = \begin{cases} \dfrac{1}{2(2\tau - 1 - 1/(2B))} & \text{if } \tau \in \big(\min\{\tfrac{1}{2} + \theta^2,\ \tfrac{1}{2} + \tfrac{1}{2B} + \tfrac{3}{4}\theta^2\},\ \tfrac{3}{4}\big], \\ \dfrac{4(1 - \tau + 1/(2B))}{1 + 1/B} & \text{if } \tau \in (\tfrac{3}{4}, 1]. \end{cases}$$
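A direct transcription of the displayed constant, valid only for $\tau$ in the stated ranges (and $\theta \leq 1/\sqrt{3}$); a sketch under those assumptions, not a general-purpose implementation:

```r
## C(tau, B) from the unimodal bound above; the caller must respect the
## validity ranges for tau stated in the theorem.
C_unimodal <- function(tau, B) {
  if (tau <= 3 / 4) {
    1 / (2 * (2 * tau - 1 - 1 / (2 * B)))
  } else {
    4 * (1 - tau + 1 / (2 * B)) / (1 + 1 / B)
  }
}
C_unimodal(0.9, 25)  # about 0.46, versus the worst-case factor 1/(2*0.9 - 1) = 1.25
```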

Slide 17: Extremal distribution under unimodality
[Figure: Typical and extremal pmfs of $\tilde{\Pi}_{25}(k)$ for a low selection probability variable $k$.]

Slide 18: Extremal distribution under unimodality
[Figure: Typical and extremal pmfs of $\tilde{\Pi}_{25}(k)$ for a low selection probability variable $k$.]

Slide 19: The r-concavity constraint
r-concavity provides a continuum of constraints that interpolate between unimodality and log-concavity.
- A non-negative function $f$ on an interval $I \subseteq \mathbb{R}$ is r-concave with $r < 0$ if $f^r$ is convex on $I$.
- A pmf $f$ on $\{0, 1/B, \ldots, 1\}$ is r-concave if the linear interpolant to $\{(i, f(i/B)) : i = 0, 1, \ldots, B\}$ is r-concave.
- The constraint becomes weaker as $r$ increases to 0.
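For a strictly positive pmf, one convenient operational check (an assumed reading of the definition, via discrete convexity of $f^r$ at the grid points) is that $f^r$ has non-negative second differences:

```r
## Check r-concavity (r < 0) of a strictly positive pmf on {0, 1/B, ..., 1}:
## f is taken to be r-concave when f^r is convex, checked here through
## second differences on the grid.
is_r_concave <- function(f, r = -1/2) {
  stopifnot(r < 0, all(f > 0))
  all(diff(f^r, differences = 2) >= -1e-8)  # tolerance for floating point
}
is_r_concave(dbinom(0:25, 25, 0.3))  # a log-concave pmf passes for any r < 0
```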

Slide 20: Further improvements under r-concavity
Suppose $\tilde{\Pi}_B(k)$ is r-concave for all $k \in L_\theta$. Then for $\tau \in (\tfrac{1}{2}, 1]$,
$$E|\hat{S}^{\mathrm{CPSS}}_{n,\tau} \cap L_\theta| \leq D(\theta^2, 2\tau - 1, B, r)\, |L_\theta|,$$
where $D$ can be evaluated numerically. Our simulations suggest $r = -1/2$ is a reasonable choice.

Slide 21: Extremal distribution under $-1/2$-concavity
[Figure: Typical and extremal pmfs of $\tilde{\Pi}_{25}(k)$ for a low selection probability variable $k$.]

Slide 22: Extremal distribution under $-1/2$-concavity
[Figure: Typical and extremal pmfs of $\tilde{\Pi}_{25}(k)$ for a low selection probability variable $k$.]

Slide 23: $r = -1/2$ is sensible
[Figure, left panel: Typical and extremal pmfs raised to the power $-1/2$. Right panel: Tail probabilities from 0.2 onwards, comparing worst case, unimodal, $-\tfrac{1}{2}$-concave and empirical.]

Slide 24: Reducing the threshold τ
Suppose $\hat{\Pi}_B(k)$ is $-1/4$-concave, and that $\tilde{\Pi}_B(k)$ is $-1/2$-concave, for all $k \in L_\theta$. Then
$$E|\hat{S}^{\mathrm{CPSS}}_{n,\tau} \cap L_\theta| \leq \min\{D(\theta^2, 2\tau - 1, B, -1/2),\ D(\theta, \tau, 2B, -1/4)\}\, |L_\theta|$$
for all $\tau \in (\theta, 1]$. (We take $D(\cdot, t, \cdot, \cdot) = 1$ for $t \leq 0$.)

Slide 25: Improved bounds
[Figure: Bounds on $E|\hat{S}^{\mathrm{CPSS}}_{n,\tau} \cap L_{q/p}|$ as a function of $\tau$, where $p = 1000$, $q = 50$, showing the M & B (dashes), worst case (dot-dash), unimodal and r-concave bounds, and the true value for a simulated example.]

Slide 26: Simulation study
Linear model $Y_i = X_i^T \beta + \varepsilon_i$ with $X_i \sim N_p(0, \Sigma)$; a sketch of one instance follows below.
- Toeplitz covariance $\Sigma_{ij} = \rho^{|i-j|}$.
- $\beta$ has sparsity $s$, with $s/2$ coefficients equally spaced within $[-1, -0.5]$ and $s/2$ equally spaced within $[0.5, 1]$.
- $n = 200$, $p = 1000$.
- Use the Lasso, and seek $E|\hat{S}^{\mathrm{CPSS}}_{n,\tau} \cap L_{q/p}| \leq l$. Fix $q = \sqrt{0.8\, l\, p}$, so that the worst-case bound $q^2/\{(2\tau - 1)p\}$ equals $l$ when we choose $\tau = 0.9$.
- Choose $\hat{\tau}$ from the r-concave bound; also consider the oracle $\tau^*$, and the oracle $\lambda^*$ for the Lasso $\hat{S}^{\lambda}_n$.
- Compare $\dfrac{E|\hat{S}^{\mathrm{CPSS}}_{n,0.9} \cap S|}{E|\hat{S}^{\mathrm{CPSS}}_{n,\tau^*} \cap S|}$, $\dfrac{E|\hat{S}^{\mathrm{CPSS}}_{n,\hat{\tau}} \cap S|}{E|\hat{S}^{\mathrm{CPSS}}_{n,\tau^*} \cap S|}$ and $\dfrac{E|\hat{S}^{\lambda^*}_n \cap S|}{E|\hat{S}^{\mathrm{CPSS}}_{n,\tau^*} \cap S|}$.
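A sketch of one draw from this design; $\rho$, $s$, the signal positions and the noise scale are illustrative choices where the slide leaves them unspecified (the study varies SNR and $s$):

```r
## One draw from the simulation design: Toeplitz-correlated Gaussian X,
## beta with s/2 negative and s/2 positive equally spaced coefficients.
set.seed(1)
n <- 200; p <- 1000
rho <- 0.5; s <- 8; snr <- 2           # illustrative settings
Sigma <- toeplitz(rho^(0:(p - 1)))     # Sigma_ij = rho^|i - j|
X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)
beta <- numeric(p)
pos <- round(seq(1, p, length.out = s))  # signal positions: arbitrary here
beta[pos] <- c(seq(-1, -0.5, length.out = s / 2),
               seq(0.5, 1, length.out = s / 2))
## Noise scale set so that var(X beta) / var(noise) = snr
## (one possible definition of SNR; an assumption, not from the slide).
sigma <- sqrt(drop(t(beta) %*% Sigma %*% beta) / snr)
y <- drop(X %*% beta + rnorm(n, sd = sigma))
```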

Slide 27: Simulation results
[Figure: Expected number of true positives using worst-case and r-concave bounds, and an oracle Lasso procedure (crosses), as a fraction of the expected number of true positives for an oracle CPSS procedure, plotted against SNR for varying sparsity $s$; the y-axis label gives the desired error control level $l$.]

Slide 28: Summary
- CPSS can be used with any variable selection procedure.
- We can bound the average number of low selection probability variables chosen by CPSS, with no conditions needed on the model or on the original selection procedure.
- Under mild conditions, e.g. unimodality or r-concavity, the bounds can be strengthened, yielding tight error control.
- This allows the user to choose the threshold τ in an effective way.
- R implementations: the stabs package (function stabsel) and mboost; see the usage sketch below.
Thank you for listening.
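For the packaged implementation, a hedged usage sketch (argument and field names recalled from the stabs documentation; treat them as assumptions and check ?stabsel):

```r
## CPSS via the stabs package: lasso base selector, complementary pairs
## sampling and the r-concave bound; q and PFER are illustrative values.
library(stabs)
fit <- stabsel(x = X, y = y, fitfun = glmnet.lasso,
               q = 50,                    # per-subsample selection budget
               PFER = 1,                  # target bound on false selections
               sampling.type = "SS",      # complementary pairs subsampling
               assumption = "r-concave")  # the bound from this work
fit$selected                              # the stable set (assumed field name)
```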
