Some new ideas for post selection inference and model assessment


Robert Tibshirani, Stanford. WHOA!! 2018. Thanks to Jon Taylor and Ryan Tibshirani for helpful feedback.

Two topics:
1. How to improve post-selection inference for the lasso: Keli Liu, Jelena Markovic & RT (with further generalizations by Jon Taylor).
2. Maybe we're answering the wrong question in #1: post model-fitting exploration via Next-Door analysis: Leying Guan & RT.

Collaborators: Keli Liu, Jelena Markovic, Leying Guan.

Post-selection inference for the lasso. Data $(x_i, y_i)$, $i = 1, 2, \dots, N$; $x_i = (x_{i1}, x_{i2}, \dots, x_{ip})$; $X$ fixed. Model: $y_i = \beta_0 + \sum_j x_{ij}\beta_j + \epsilon_i$, with $\epsilon_i \sim N(0, \sigma^2)$.

The lasso:
$$\hat\beta = \operatorname*{arg\,min}_{\beta_0,\beta_1,\dots,\beta_p}\Big\{\sum_i \big(y_i - \beta_0 - \sum_j x_{ij}\beta_j\big)^2 + \lambda \sum_j |\beta_j|\Big\}$$
for some $\lambda \ge 0$.
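As a concrete illustration, here is a minimal sketch in Python (using numpy and scikit-learn on simulated data; the design, the value of λ, and all variable names are illustrative assumptions, not taken from the talk) of fitting the lasso at a fixed λ and recording the active set about which post-selection questions will be asked.

```python
# Minimal sketch: fit the lasso at a fixed tuning parameter and record the active set.
# Note: sklearn's Lasso minimizes (1/(2n))||y - Xb||^2 + alpha*||b||_1, so alpha plays
# the role of a rescaled lambda.  All settings below are illustrative.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -1.5, 0.0, 0.0, 1.0] + [0.0] * (p - 5))
y = X @ beta_true + rng.standard_normal(n)

lam = 0.1                                    # fixed tuning parameter (illustrative)
fit = Lasso(alpha=lam, fit_intercept=True).fit(X, y)
active = np.flatnonzero(fit.coef_)           # the selected variables
print("active set:", active, "signs:", np.sign(fit.coef_[active]))
```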

Review of the truncated Gaussian approach: polyhedral selection events. Response vector $y \sim N(\mu, \Sigma)$. Suppose we make a selection that can be written as $\{y : Ay \le b\}$ with $A$, $b$ not depending on $y$. This is true for forward stepwise regression, the lasso with fixed $\lambda$, least angle regression, and other procedures.

The polyhedral lemma [Lee et al.; Ryan Tibshirani et al.]: for any vector $\eta$,
$$F^{[V^-,\,V^+]}_{\eta^\top\mu,\;\sigma^2\eta^\top\eta}(\eta^\top y)\;\Big|\;\{Ay \le b\} \;\sim\; \mathrm{Unif}(0,1)$$
(a truncated Gaussian pivot), where $V^-$, $V^+$ are computable values that are functions of $\eta$, $A$, $b$ (and of $y$ only through its component orthogonal to $\eta$). Typically one chooses $\eta$ so that $\eta^\top y$ is the least-squares estimate of the partial regression coefficient for a selected variable.
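To make the lemma concrete, the following sketch (my own illustration following the construction of Lee et al., not code from the talk; the toy $A$, $b$, $\eta$ are made up) computes the truncation limits $V^-$, $V^+$ and the truncated-Gaussian pivot for a contrast $\eta^\top y$ given a polyhedral selection event $\{Ay \le b\}$ with $y \sim N(\mu, \sigma^2 I)$.

```python
# Sketch of the polyhedral lemma: given y ~ N(mu, sigma^2 I), a selection event
# {A y <= b}, and a contrast eta, compute V-, V+ and the truncated-Gaussian pivot,
# which is Unif(0,1) under the null eta'mu = null_value.
# The A, b, eta below are illustrative placeholders, not from the talk.
import numpy as np
from scipy.stats import norm

def truncated_gaussian_pivot(y, A, b, eta, sigma, null_value=0.0):
    s2 = sigma**2 * (eta @ eta)              # Var(eta'y)
    c = eta / (eta @ eta)                    # direction so that y = c*(eta'y) + z
    t = eta @ y
    z = y - c * t                            # part of y independent of eta'y
    Ac, Az = A @ c, A @ z
    lower = (b - Az)[Ac < 0] / Ac[Ac < 0]    # constraints bounding eta'y from below
    upper = (b - Az)[Ac > 0] / Ac[Ac > 0]    # constraints bounding eta'y from above
    v_minus = lower.max() if lower.size else -np.inf
    v_plus = upper.min() if upper.size else np.inf
    s = np.sqrt(s2)
    # CDF of N(null_value, s2) truncated to [v_minus, v_plus], evaluated at t
    num = norm.cdf((t - null_value) / s) - norm.cdf((v_minus - null_value) / s)
    den = norm.cdf((v_plus - null_value) / s) - norm.cdf((v_minus - null_value) / s)
    return v_minus, v_plus, num / den

# Toy selection event "y1 >= y2 and y1 >= 0", written as A y <= b:
A = np.array([[-1.0, 1.0], [-1.0, 0.0]])
b = np.zeros(2)
y = np.array([1.3, 0.4])
eta = np.array([1.0, 0.0])                   # inference on mu_1
print(truncated_gaussian_pivot(y, A, b, eta, sigma=1.0))
```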

[Figure: geometry of the polyhedral lemma; the selection region $\{Ay \le b\}$, the decomposition of $y$ along $\eta$ via $P_\eta y$, and the truncation limits $V^-(y)$ and $V^+(y)$ for $\eta^\top y$.]

Example: the lasso with fixed $\lambda$. HIV data: mutations that predict response to a drug. [Figure: selection intervals for the lasso with fixed tuning parameter $\lambda$; for each predictor, the naive interval and the selection-adjusted interval for its coefficient.]

A big shortcoming of this approach: intervals are often very wide, and can even be infinite. Why? We have conditioned on too much, leaving not enough variation for inference [Fithian, Taylor: "data carving"]. Jonathan Taylor & co-authors have worked to solve this problem by adding noise to the data before model fitting; this is clever and produces shorter intervals and more powerful tests. Here we show how the problem can be largely solved without randomization, to provide shorter intervals.

Forming a data-driven query: two costs.
1. Variable selection: the data is used to decide which variables are worthy of attention, e.g., running the lasso and focusing on the active set.
2. Target formation: having settled on a subset $M \subseteq \{1, \dots, p\}$ of variables for careful study, what should be the target of our estimation? Two choices: the full target $\beta^F_j$, $j \in M$, where $\beta^F = (X^\top X)^{-1} X^\top \mu$, or the partial target $\beta^{(M)} = (X_M^\top X_M)^{-1} X_M^\top \mu$ (a small numerical sketch of the two targets follows below).
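Here is a small numpy sketch contrasting the two targets (my own illustration: the design, true coefficients, and selected set M are assumptions, and μ is taken to be the simulated mean Xβ).

```python
# Sketch: the full target beta^F = (X'X)^{-1} X' mu versus the partial target
# beta^(M) = (X_M' X_M)^{-1} X_M' mu for a selected subset M.
# Illustrative simulated data; requires N > p for the full target to be defined.
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 6
X = rng.standard_normal((n, p))
X[:, 1] = 0.7 * X[:, 0] + 0.3 * X[:, 1]          # make x0 and x1 correlated
beta_true = np.array([1.0, 1.0, 0.5, 0.0, 0.0, 0.0])
mu = X @ beta_true                               # mean vector E[y] = X beta

M = [0, 2, 3]                                    # a hypothetical selected set (omits x1)
full_target = np.linalg.solve(X.T @ X, X.T @ mu)           # beta^F, equals beta_true here
XM = X[:, M]
partial_target = np.linalg.solve(XM.T @ XM, XM.T @ mu)     # beta^(M): projection of mu onto X_M
# The partial target for x0 absorbs part of the omitted, correlated x1,
# so it differs from the full-model coefficient of x0.
print("full target    :", np.round(full_target, 3))
print("partial target :", dict(zip(M, np.round(partial_target, 3))))
```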

Consequences. With the full target, our only cost is in #1. Our proposal: instead of conditioning on the entire active set and signs, we condition just on the event that a given variable $X_j$ was chosen [minimal conditioning: it's the event that leads us to ask a question about $X_j$]. This leads to a truncated Gaussian distribution on the union of two disjoint intervals, with exact coverage under Gaussian errors. With the partial target, we have to deal with both #1 and #2; details in a few slides.

Full-model coefficients, prostate cancer data. [Figure: selection intervals for lcavol, svi, lweight, age, lbph, pgg45 and gleason under four methods: Naïve (0.33), TZ_V (0.29), TZ_M (0.82), TZ_Ms (1.19).] Naïve ignores selection; TZ_V conditions just on the selected variable; TZ_M conditions on the active set; TZ_Ms conditions on the active set and signs (Lee et al.).

Partial targets. Idea: we choose a subset $\hat H \subseteq \hat M$ of high-value targets (details below). How we summarize the effect of a variable $j \in \hat M$ depends on whether $j$ is a high-value target.
High value: we summarize the effect of $j$ using $\beta^{\hat H}_j$, where $\beta^{\hat H} = (X_{\hat H}^\top X_{\hat H})^{-1} X_{\hat H}^\top \mu$. So our choice of target is fully adaptive for high-value targets.
Low value: if variable $j$ is selected by the lasso but is not deemed a high-value target, we summarize its effect via $\beta^{\hat H \cup \{j\}}_j$, where $\beta^{\hat H \cup \{j\}} = (X_{\hat H \cup \{j\}}^\top X_{\hat H \cup \{j\}})^{-1} X_{\hat H \cup \{j\}}^\top \mu$ and $X_{\hat H \cup \{j\}}$ is the matrix containing the high-value targets as well as variable $j$. The coefficient $\beta^{\hat H \cup \{j\}}_j$ is the effect of variable $j$ after partialing out the effect of the high-value targets, i.e., it asks whether variable $j$ contributes any explanatory power beyond the variables in $\hat H$.

Defining high- and low-value targets. Stable-t: take $\hat H$ to be those variables in $\hat M$ with t-statistics surpassing a Bonferroni-corrected threshold. We first fit an OLS model using all the variables in $\hat M$, i.e., $\hat\beta^{\hat M} = (X_{\hat M}^\top X_{\hat M})^{-1} X_{\hat M}^\top y$, and declare $j$ a high-value target if the t-statistic for $\hat\beta^{\hat M}_j$ is large, i.e., if
$$\frac{\big|\hat\beta^{\hat M}_j\big|}{\sigma\sqrt{\big[(X_{\hat M}^\top X_{\hat M})^{-1}\big]_{jj}}} > c$$
for some cutoff $c$. If we choose $c$ by Bonferroni, it has the form $\Phi^{-1}\!\big(1 - \tfrac{\alpha}{2p}\big) \approx \sqrt{2\log p}$ for large $p$. We again get a truncated Gaussian over a union of intervals, and exact coverage in finite samples.
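A short sketch of the stable-t rule (illustrative only: σ is treated as known, the data are simulated, and the λ used for the lasso is arbitrary) for splitting the lasso active set $\hat M$ into high- and low-value targets.

```python
# Sketch of the stable-t rule: fit OLS on the lasso active set M_hat and call
# variable j "high value" when |t_j| exceeds the Bonferroni cutoff
# c = Phi^{-1}(1 - alpha/(2p)).  sigma is assumed known; all settings illustrative.
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, p, sigma, alpha = 100, 20, 1.0, 0.10
X = rng.standard_normal((n, p))
beta_true = np.concatenate([[3.0, 2.0, 0.3], np.zeros(p - 3)])
y = X @ beta_true + sigma * rng.standard_normal(n)

M_hat = np.flatnonzero(Lasso(alpha=0.1).fit(X, y).coef_)     # lasso active set
XM = X[:, M_hat]
beta_M = np.linalg.solve(XM.T @ XM, XM.T @ y)                # OLS on the active set
se = sigma * np.sqrt(np.diag(np.linalg.inv(XM.T @ XM)))
t_stats = beta_M / se
cutoff = norm.ppf(1 - alpha / (2 * p))                       # Bonferroni cutoff, ~ sqrt(2 log p)
H_hat = M_hat[np.abs(t_stats) > cutoff]                      # high-value targets
low_value = np.setdiff1d(M_hat, H_hat)                       # selected but below the cutoff
print("active set:", M_hat, "\nhigh value:", H_hat, "\nlow value:", low_value)
```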

Partial-model coefficients, prostate cancer data. [Figure: selection intervals for lcavol, svi, lweight, age, lbph, pgg45 and gleason, split into high-value and low-value targets, under four methods: Naïve (0.30), TZ_stab-t (0.40), TZ_M (0.80), TZ_Ms (1.12).] Naïve ignores selection; TZ_V conditions just on the selected variable; TZ_M conditions on the active set; TZ_Ms conditions on the active set and signs (Lee et al.); TZ_stab-t uses stable-t for high-value target selection.

Simulation: n = 100, p = 250, pure noise. [Figure: boxplots of the lengths of 90% confidence intervals for partial regression coefficients under six methods: naive, Bonferroni (bonf), TZ_t, TZ_l1, TZ_M, TZ_Ms; annotated (length, coverage) pairs include (0.32, 0.00), (0.51, 0.47), (0.78, 0.92), (0.74, 0.91), (0.51, 0.92) and a length of 7.62, with proportions of infinite intervals t: 0.02, l1: 0.00, M: 0.09, Ms: 0.47.] Naive ignores selection; Bonf is Bonferroni; TZ_t uses stable-t for high-value target selection; TZ_l1 uses stable-$\ell_1$ for high-value target selection; TZ_M conditions on the active set; TZ_Ms conditions on the active set and signs (Lee et al.).
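The collapse of naive coverage in this pure-noise setting is easy to reproduce; the sketch below (my own illustration with arbitrary choices of λ and of the number of repetitions, not the simulation code behind the figure) checks the coverage of naive 90% OLS intervals for the partial regression coefficients of lasso-selected variables, all of which are truly zero here.

```python
# Sketch: under pure noise (all partial coefficients are 0), naive OLS intervals
# computed for lasso-selected variables badly under-cover.  Illustrative settings only.
import numpy as np
from scipy import stats
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n, p, alpha_level, lam = 100, 250, 0.10, 0.20
covered, total = 0, 0
for rep in range(200):
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)                         # pure noise: mu = 0
    sel = np.flatnonzero(Lasso(alpha=lam).fit(X, y).coef_)
    if sel.size == 0 or sel.size > n - 10:
        continue
    XM = X[:, sel]
    bhat, *_ = np.linalg.lstsq(XM, y, rcond=None)
    df = n - sel.size
    s2 = np.sum((y - XM @ bhat) ** 2) / df
    se = np.sqrt(s2 * np.diag(np.linalg.inv(XM.T @ XM)))
    half = stats.t.ppf(1 - alpha_level / 2, df) * se
    covered += np.sum(np.abs(bhat) <= half)            # interval contains the true value 0
    total += sel.size
print(f"naive 90% interval coverage under pure noise: {covered / total:.2f}")
```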

Wrapup. All of this is for $N > p$. The ideas are extended to the high-dimensional full-target case via ROSI: in preparation with Kevin Fry, Keli Liu, Jonathan Taylor and Rob Tibshirani. It gets good power as well, with application to large GWAS problems. This will be added to our selectiveInference R and Python packages.

Next-door analysis: motivation. Having fit a model by, e.g., the lasso, post-selection inference (as above) focuses on significance and confidence intervals for each chosen feature. But scientists will often have different questions:
- Is the chosen model the uniquely best one?
- Are there other models with similar prediction performance?
- Is a given predictor indispensable, or can it be swapped out for one or more other predictors?
These are model-centric, as opposed to feature-centric, questions. Our proposed solution is an application of the LOCO (leave-one-covariate-out) method of Lei et al. (the CMU group) [no data splitting; focus on models, not variables].

[Diagram: the chosen model {x1, x2, x3}, which has minimum CV error, surrounded by its leave-one-out neighbors with somewhat higher error: leave out x1 gives {x2, x3, x4}; leave out x2 gives {x1, x3}; leave out x3 gives {x1, x5}.]

Algorithm: Next-Door analysis for the lasso.
1. Fit the lasso with parameter $\hat\lambda$ chosen by cross-validation. Let the solution be $\hat\beta(\hat\lambda)$, and let $S$ be the active set, i.e., the set of predictors whose coefficients in $\hat\beta(\hat\lambda)$ are non-zero.
2. For each $j \in S$, solve the lasso problem with the coefficient of the $j$th predictor fixed at 0 (a code sketch of this step follows below):
$$\{\hat\beta_0, \hat\beta\}(\hat\lambda; j) = \operatorname*{arg\,min}_{\beta:\,\beta_j = 0}\; \tfrac{1}{2}\sum_i \Big(y_i - \beta_0 - \sum_{l \ne j} X_{il}\beta_l\Big)^2 + \hat\lambda \sum_{l \ne j} |\beta_l| \qquad (1)$$
Let $\hat\beta(\hat\lambda; j)$ be the resulting coefficients and $d_j$ the increase in validation error for this model relative to the base model.
3. Form an approximately unbiased estimate of $d_j$ and test whether predictor $j$ is indispensable: that is, test whether the increase in estimated prediction error $d_j$ is significantly larger than zero.
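Here is a minimal sketch of step 2 (my own illustration using scikit-learn: a single held-out validation split stands in for the cross-validation error, and the debiasing and significance test of step 3 are not implemented).

```python
# Sketch of Next-Door step 2: for each active predictor j, refit the lasso at the
# same lambda with beta_j forced to 0 (equivalently, with column j removed), and
# record the increase d_j in validation error over the base model.
# Illustrative only: a train/validation split stands in for CV error.
import numpy as np
from sklearn.linear_model import Lasso, LassoCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n, p = 200, 30
X = rng.standard_normal((n, p))
beta_true = np.concatenate([[2.0, 2.0, 1.0], np.zeros(p - 3)])
y = X @ beta_true + rng.standard_normal(n)

X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)
cv = LassoCV(cv=5).fit(X_tr, y_tr)                      # step 1: lambda by cross-validation
lam = cv.alpha_
S = np.flatnonzero(cv.coef_)                            # active set
base_err = np.mean((y_va - cv.predict(X_va)) ** 2)

d = {}
for j in S:                                             # step 2: next-door models
    keep = np.delete(np.arange(p), j)                   # beta_j = 0  <=>  drop column j
    fit_j = Lasso(alpha=lam).fit(X_tr[:, keep], y_tr)
    err_j = np.mean((y_va - fit_j.predict(X_va[:, keep])) ** 2)
    d[j] = err_j - base_err                             # increase in validation error
print("base validation error:", round(base_err, 3))
print("error increase per dropped predictor:", {k: round(v, 3) for k, v in d.items()})
```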

Details. We need to condition on the selection events: (1) the chosen model has minimum CV error, and (2) predictor $j$ is in the chosen model. We use tricks of Markovic and Taylor (adding noise in CV) and Xiaoying Tian (adding $\pm$ noise for $C_p$) to obtain approximately debiased prediction error estimates, and the bootstrap to get approximate type I error control.

Table: prostate cancer results. The leftmost column shows the fitted model from the lasso, and the remaining columns show the nearby models corresponding to the removal of each predictor (lcavol, lwt, svi, lcp, lbph, pgg45, age). [Table: for each model, which of lcavol, lwt, svi, lcp, lbph, pgg45 and age are included, together with its CV error, debiased error, test error, selection frequency, NextDoor p-value and post-selection feature p-value; the NextDoor p-value measures feature indispensability.]

Final comments. Paper on arXiv by Guan & Tibshirani. The NextDoor R package will soon be on CRAN. Idea: run glmnet to fit the model, then run NextDoor on the output to get a post-fitting summary report.
