A Significance Test for the Lasso


A Significance Test for the Lasso
Lockhart R, Taylor J, Tibshirani R, and Tibshirani R
Ashley Petersen, May 14, 2013

Last time

Problem: with many clinical covariates, which are important to a given medical outcome? We want to choose the important variables and quantify how important they are.

Bad solution: forward stepwise regression, whose p-values are very anti-conservative.

Better solution: the lasso, with p-values from the newly proposed covariance test statistic.

Framework

Consider the regression setup with outcome vector y ∈ Rⁿ, covariate matrix X ∈ Rⁿˣᵖ, and

    y = Xβ + ε,  ε ~ N(0, σ²I).

The lasso estimator is obtained by finding the β that minimizes

    ½‖y − Xβ‖² + λ ∑ᵢ₌₁ᵖ |βᵢ|,

where λ is the lasso penalty.
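The objective above can be minimized by cyclic coordinate descent with soft-thresholding. The sketch below is our own minimal numpy illustration (function name `lasso_cd` is ours, not the authors' implementation), and it is not tuned for speed or convergence checking:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=500):
    """Minimize 0.5*||y - X b||^2 + lam * sum |b_i| by coordinate descent.
    Illustrative sketch; assumes all columns of X have nonzero norm."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual with coordinate j removed
            r = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r
            # soft-thresholding update for coordinate j
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b
```

For an orthonormal X this reduces to soft-thresholding of Xᵀy, e.g. `lasso_cd(np.eye(3), np.array([3.0, -1.0, 0.5]), 1.0)` returns `[2.0, 0.0, 0.0]`.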

Lasso solution path (λ₁ > λ₂ > λ₃ > λ₄ > ...)

[Figure: coefficient paths plotted against the constraint, with the knots λ₁, λ₂, λ₃, λ₄ marked.]

    β̂_lasso = argmin_β ½‖y − Xβ‖² + λ ∑ᵢ₌₁ᵖ |βᵢ|

Obtain a p-value for each covariate as it enters the model

[Figure: lasso solution path with the knots λ₁, λ₂, λ₃, λ₄ marked.]

Form of the test statistic¹

Forward stepwise regression:

    (RSS_null − RSS)/σ² = (‖y − ŷ_null‖² − ‖y − ŷ‖²)/σ²
                        = [2(yᵀŷ − yᵀŷ_null) + ‖ŷ_null‖² − ‖ŷ‖²]/σ²

Lasso:

    T_k = (yᵀŷ − yᵀŷ_null)/σ²

¹ Taking σ² as known (for now)

What is ŷ?

Testing that the variable that enters at λ₃ has β = 0:

    ŷ = X β̂(λ₄)

[Figure: lasso solution path with the knots λ₁, λ₂, λ₃, λ₄ marked.]

What about ŷ_null?

Testing that the variable that enters at λ₃ has β = 0:

    ŷ = X β̂(λ₄)
    ŷ_null = X_null β̂_null(λ₄)

[Figure: lasso solution path with the knots λ₁, λ₂, λ₃, λ₄ marked.]

Putting this together

The covariance test statistic for testing the predictor that enters at the kth step is

    T_k = (yᵀŷ − yᵀŷ_null)/σ² = (yᵀX β̂(λ_{k+1}) − yᵀX_null β̂_null(λ_{k+1}))/σ².
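For the first step the null fit is empty (ŷ_null = 0), and with an orthonormal design the lasso fit at λ₂ is just soft-thresholding of Xᵀy, so T₁ has a closed form. The sketch below (our own function names, assuming orthonormal columns and σ² known) computes T₁ this way:

```python
import numpy as np

def soft(z, lam):
    """Elementwise soft-thresholding operator."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def covariance_stat_step1(X, y, sigma2=1.0):
    """T_1 for the first variable to enter, assuming orthonormal columns of X.
    At step 1 the null fit is empty, so T_1 = y^T X beta_hat(lam_2) / sigma^2."""
    z = X.T @ y                       # correlations; the knots are the sorted |z|
    lams = np.sort(np.abs(z))[::-1]   # lam_1 > lam_2 > ...
    beta = soft(z, lams[1])           # lasso fit evaluated at lam_2
    return (y @ (X @ beta)) / sigma2
```

For this orthonormal case T₁ equals λ₁(λ₁ − λ₂)/σ²; e.g. with X = I₃ and y = (3, −1, 0.5) the function returns 3·(3 − 1) = 6.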

What exactly is the null?

Under the global null (β = 0),

    T₁ →d Exp(1)
    T₂ →d Exp(1/2)
    T₃ →d Exp(1/3)
    ...

for an orthogonal predictor matrix X. The asymptotic distributions are stochastically smaller for general X.
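The T₁ →d Exp(1) claim is easy to check by simulation. The sketch below (our own simulation, not the authors' code) uses the orthonormal-design shortcut: under the global null the knots are the sorted |Xᵀy| = |z| with z ~ N(0, I_p), and T₁ = λ₁(λ₁ − λ₂)/σ²:

```python
import numpy as np

rng = np.random.default_rng(0)
p, reps = 500, 2000
# Top two order statistics of |z|, z ~ N(0, I_p), per replication
Z = np.sort(np.abs(rng.standard_normal((reps, p))), axis=1)
lam1, lam2 = Z[:, -1], Z[:, -2]
t1 = lam1 * (lam1 - lam2)   # T_1 with sigma^2 = 1
print(t1.mean())            # should be near 1, the Exp(1) limit
```

The convergence is asymptotic in p, so the empirical mean is only approximately 1 at p = 500.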

Does it work for finite samples?

Simulation of the distribution of the test statistic for the first covariate to enter the model under the global null (β = 0), with n = 100, p = 10.

[Figure: QQ plots of the forward stepwise test statistic against χ² on 1 df, and of the lasso test statistic against Exp(1).]

What exactly is the null?

Under the weaker null in which there are k₀ truly active covariates (and they have entered the model),

    T_{k₀+1} →d Exp(1)
    T_{k₀+2} →d Exp(1/2)
    T_{k₀+3} →d Exp(1/3)
    ...

for an orthogonal predictor matrix X. The asymptotic distributions are stochastically smaller for general X.

See, it works...

Simulation of the distribution of the test statistics when the true β has three non-zero components, with n = 100, p = 10. The QQ reference line uses the Exp(θ) quantile function F⁻¹(p) = −θ log(1 − p).

[Figure: QQ plots of the test statistics for the 4th, 5th, and 6th predictors to enter, each against Exp(1) quantiles.]
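The quantile function quoted above is all that is needed to build the QQ reference points. A small sketch (function names and the (i − 0.5)/n plotting-position convention are our assumptions, not from the slides):

```python
import numpy as np

def exp_quantile(p, theta=1.0):
    """Inverse CDF of Exp(theta) with mean theta: F^{-1}(p) = -theta*log(1 - p)."""
    return -theta * np.log1p(-p)

def qq_reference(n, theta=1.0):
    """Theoretical Exp(theta) quantiles to plot against n sorted test statistics."""
    probs = (np.arange(1, n + 1) - 0.5) / n
    return exp_quantile(probs, theta)
```

For example, `exp_quantile(0.5)` is the Exp(1) median, log 2 ≈ 0.693.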

Simulation setup

Distribution of T₁ under the global null (β = 0)
n = 100 and p ∈ {10, 50, 200}
Varying correlation structure of the predictors (exchangeable, AR(1), block diagonal) with ρ ∈ {0, 0.2, 0.4, 0.6, 0.8}
Summaries: mean, variance, and tail probability of the distribution

The authors' results

[Results table from the paper.]

Some commentary...

"I don't have any applied or technical comments on the paper at hand (except for feeling strongly that Tables 2 and 3 should really really really be made into a graph... do we really care that a certain number is 315.216?)"

Andrew Gelman²

² via his blog

The authors' results

[Results table from the paper.]

Sampling distribution of simulation results

100 replications of the simulation for the given parameters. Note the large variance of each distribution; a larger number of replications is needed for an accurate estimate.

[Figure: histograms of the mean, variance, and tail probability across replications for n = 100, p = 10, ρ = 0.]

Lots of sampling distributions

[Figure: sampling-distribution histograms across the simulation settings.]

My results: mean

[Figure: mean of T₁ versus ρ for p = 10, 50, 200, under exchangeable, AR(1), and block diagonal correlation.]

My results: variance

[Figure: variance of T₁ versus ρ for p = 10, 50, 200, under exchangeable, AR(1), and block diagonal correlation.]

My results: tail probability

[Figure: tail probability of T₁ versus ρ for p = 10, 50, 200, under exchangeable, AR(1), and block diagonal correlation.]

What to do when σ² is unknown? (n > p)

Estimate it in the usual way:

    σ̂² = RSS/(n − p) = ‖y − X β̂_LS‖²/(n − p)

The asymptotic distribution is now F_{2, n−p}:
the numerator is Exp(1) = χ²₂/2,
the denominator is χ²_{n−p}/(n − p),
and the numerator and denominator are independent.
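The F_{2, n−p} claim can be sanity-checked by simulating the ratio directly. A small sketch (our own simulation, checking the structure stated on the slide rather than reproducing the authors' derivation):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, reps = 100, 10, 200_000
num = rng.chisquare(2, reps) / 2            # Exp(1) = chi^2_2 / 2
den = rng.chisquare(n - p, reps) / (n - p)  # independent chi^2_{n-p}/(n - p)
f_ratio = num / den                         # distributed as F(2, n - p)
# The mean of F(d1, d2) is d2/(d2 - 2) for d2 > 2
print(f_ratio.mean(), (n - p) / (n - p - 2))
```

With n − p = 90 the theoretical mean is 90/88 ≈ 1.023, which the empirical mean should approach.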

What to do when σ² is unknown? (n ≤ p)

Estimate it from the least squares fit on the model selected by cross-validation.
No rigorous theory here (fingers crossed!)

What's the big idea?

Use the covariance test statistic to obtain a p-value for each covariate as it enters the lasso model.
Compare the statistic to its asymptotic Exp(1) distribution to obtain p-values.
Performance is reasonable in finite samples.
This could possibly be extended to inference for all coefficients in the model at a specific lasso penalty.

What's next?

To do:
Obtain data for the p > n case (HIV data)
Finish simulations

Next time:
Real data examples
More on assumptions and theory