High-dimensional regression with unknown variance


1 High-dimensional regression with unknown variance. Christophe Giraud, École Polytechnique, March 2012

2 Setting

Gaussian regression with unknown variance:
$$Y_i = f_i + \varepsilon_i, \qquad \varepsilon_i \ \text{i.i.d.}\ \mathcal{N}(0, \sigma^2),$$
where $f = (f_1, \dots, f_n)$ and $\sigma^2$ are unknown; we want to estimate $f$.

Ex 1: sparse linear regression. $f = X\beta$ with $\beta$ sparse in some sense and $X \in \mathbb{R}^{n \times p}$, possibly with $p > n$.

Ex 2: non-parametric regression. $f_i = F(x_i)$ with $F : \mathcal{X} \to \mathbb{R}$.
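A minimal simulation of this setting; the sizes $n, p, k$ and the noise level are hypothetical choices, not values from the talk:

```python
# Sketch: Y = X beta + eps, eps_i i.i.d. N(0, sigma^2), beta k-sparse, p > n.
import numpy as np

rng = np.random.default_rng(0)
n, p, k, sigma = 100, 500, 5, 1.0          # p > n: high-dimensional regime
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k] = 2.0                             # k-sparse coefficient vector
f = X @ beta                               # unknown mean vector f
Y = f + sigma * rng.standard_normal(n)     # sigma is unknown to the statistician
```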

3 A plethora of estimators

Sparse linear regression:
- coordinate sparsity: Lasso, Dantzig, Elastic-Net, Exponential-Weighting, projection on subspaces $\{V_\lambda : \lambda \in \Lambda\}$ given by PCA, Random Forest, etc.
- structured sparsity: Group-Lasso, Fused-Lasso, Bayesian estimators, etc.

Non-parametric regression: spline smoothing, Nadaraya kernel smoothing, kernel ridge estimators, nearest neighbors, $L_2$-basis projection, Sparse Additive Models, etc.

4 Important practical issues

Which estimator should be used?
- Sparse regression: Lasso? Random Forest? Exponential-Weighting?
- Non-parametric regression: kernel regression (which kernel?)? Spline smoothing?

Which tuning parameter? Which penalty level for the Lasso? Which bandwidth for kernel regression? Etc.

5 The objective

Difficulties:
- no procedure is universally better than the others;
- a sensible choice of the tuning parameters depends on some unknown characteristics of $f$ (sparsity, smoothness, etc.) and on the unknown variance $\sigma^2$.

Ideal objective: select the best estimator among a collection $\{\hat f_\lambda,\ \lambda \in \Lambda\}$. (Alternative objective: combine the estimators as well as possible.)

6 Impact of not knowing the variance

7 Impact of the unknown variance? Case of coordinate-sparse linear regression.

[Figure: minimax prediction risk over $k$-sparse signals as a function of $k$, comparing the regime where $\sigma$ or $k$ is known with the regime where both $\sigma^2$ and $k$ are unknown; the risk level $2k\log(p/k)$ and the sample size $n$ are marked, and an "ultra-high dimension" region appears in the unknown-variance case.]

8 Ultra-high dimensional phenomenon

Theorem (N. Verzelen, EJS 2012). When $\sigma^2$ is unknown, there exist designs $X$ of size $n \times p$ such that for any estimator $\hat\beta$, we have either
$$\sup_{\sigma^2 > 0} \mathbb{E}\big[\|X(\hat\beta - 0_p)\|^2\big] > C_1\, n\sigma^2,$$
or
$$\sup_{\beta_0\ k\text{-sparse},\ \sigma^2 > 0} \mathbb{E}\big[\|X(\hat\beta - \beta_0)\|^2\big] > C_2\, k \log\!\Big(\frac{p}{k}\Big) \exp\!\Big[C_3 \frac{k}{n} \log\Big(\frac{p}{k}\Big)\Big]\, \sigma^2.$$

Consequence: when $\sigma^2$ is unknown, the best we can expect is a bound of the form
$$\mathbb{E}\big[\|X(\hat\beta - \beta_0)\|^2\big] \le C \inf_\beta \big\{ \|X(\beta - \beta_0)\|^2 + \|\beta\|_0 \log(p)\, \sigma^2 \big\}$$
for any $\sigma^2 > 0$ and any $\beta_0$ fulfilling $1 \le \|\beta_0\|_0 \le C\, n/\log(p)$.

9 Some generic selection schemes

10 Some generic selection schemes

- Cross-validation: hold-out, V-fold CV, leave-q-out.
- Penalized empirical loss: penalized log-likelihood (AIC, BIC, etc.), plug-in criteria (with Mallows' $C_p$, etc.), slope heuristic.
- Approximation versus complexity penalization: LinSelect.

A minimal sketch of the V-fold scheme is given below.
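A hedged sketch of V-fold CV for tuning a penalty level (illustrative only; scikit-learn's LassoCV packages the same loop):

```python
# Minimal V-fold cross-validation over a grid of Lasso penalty levels.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

def vfold_cv_lasso(X, Y, alphas, V=10, seed=0):
    errs = np.zeros(len(alphas))
    for train, test in KFold(V, shuffle=True, random_state=seed).split(X):
        for j, a in enumerate(alphas):
            fit = Lasso(alpha=a).fit(X[train], Y[train])
            errs[j] += np.mean((Y[test] - fit.predict(X[test])) ** 2)
    return alphas[int(np.argmin(errs))]    # penalty with the smallest CV error
```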

11 LinSelect (Y. Baraud, C. G. & S. Huet)

Ingredients:
- a collection $\mathcal{S}$ of linear spaces (for approximation),
- a weight function $\Delta : \mathcal{S} \to \mathbb{R}^+$ (a measure of complexity).

Criterion (residuals + approximation + complexity):
$$\mathrm{Crit}(\hat f_\lambda) = \inf_{S \in \hat{\mathcal{S}}} \Big[ \|Y - \Pi_S \hat f_\lambda\|^2 + \|\hat f_\lambda - \Pi_S \hat f_\lambda\|^2 + \mathrm{pen}(S)\, \hat\sigma_S^2 \Big],$$
where $\hat{\mathcal{S}} \subset \mathcal{S}$ is possibly data-dependent, $\Pi_S$ is the orthogonal projector onto $S$, $\mathrm{pen}(S) \asymp \dim(S) \vee \Delta(S)$ when $\dim(S) \vee \Delta(S) \le 2n/3$, and
$$\hat\sigma_S^2 = \frac{\|Y - \Pi_S Y\|^2}{n - \dim(S)}.$$
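A schematic numpy sketch of this criterion (not the authors' code; `pen` is a hypothetical penalty function standing in for $\mathrm{pen}(S)$):

```python
# Evaluate the LinSelect criterion for one estimator f_hat over candidate spaces.
import numpy as np

def linselect_crit(Y, f_hat, bases, pen):
    # bases: list of (n x d) matrices with orthonormal columns spanning each S, d < n
    n, best = len(Y), np.inf
    for B in bases:
        d = B.shape[1]
        P = lambda v: B @ (B.T @ v)                      # projector Pi_S
        resid = np.sum((Y - P(f_hat)) ** 2)              # residual term
        approx = np.sum((f_hat - P(f_hat)) ** 2)         # approximation term
        sigma2_S = np.sum((Y - P(Y)) ** 2) / (n - d)     # variance estimate on S
        best = min(best, resid + approx + pen(d) * sigma2_S)
    return best
```

Selection then amounts to computing this value for each $\hat f_\lambda$ and keeping the minimizer.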

12 Non-asymptotic risk bound

Assumptions:
1. $1 \le \dim(S) \vee \Delta(S) \le 2n/3$ for all $S \in \mathcal{S}$,
2. $\sum_{S \in \mathcal{S}} e^{-\Delta(S)} \le 1$.

Theorem (Y. Baraud, C.G., S. Huet).
$$\mathbb{E}\big[\|f - \hat f_{\hat\lambda}\|^2\big] \le C\, \mathbb{E}\Big[ \inf_{\lambda \in \Lambda} \Big\{ \|f - \hat f_\lambda\|^2 + \inf_{S \in \hat{\mathcal{S}}} \big\{ \|\hat f_\lambda - \Pi_S \hat f_\lambda\|^2 + [\dim(S) \vee \Delta(S)]\,\sigma^2 \big\} \Big\} \Big].$$
The bound also holds in deviation.

13 Sparse linear regression

14 Instantiation of LinSelect

Estimators: linear regressors $\{\hat f_\lambda = X\hat\beta_\lambda : \lambda \in \Lambda\}$ (e.g. Lasso, Exponential-Weighting, etc.)

Approximation and complexity:
$$\mathcal{S} = \big\{ \mathrm{range}(X_J) : J \subset \{1, \dots, p\},\ 1 \le |J| \le n/(3\log p) \big\},$$
$$\Delta(S) = \log \binom{p}{\dim(S)} + \log(\dim(S)) \approx \dim(S)\log(p).$$

Subcollection $\hat{\mathcal{S}}$: we set $\hat S_\lambda = \mathrm{range}\big(X_{\mathrm{supp}(\hat\beta_\lambda)}\big)$ and define $\hat{\mathcal{S}} = \{\hat S_\lambda,\ \lambda \in \hat\Lambda\}$, where $\hat\Lambda = \{\lambda \in \Lambda : \hat S_\lambda \in \mathcal{S}\}$.
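A hypothetical sketch of this instantiation: build the data-dependent subcollection from the supports of a family of Lasso solutions (function names are illustrative):

```python
# Orthonormal bases for S_hat_lambda = range(X_J), J = supp(beta_lambda),
# keeping only supports small enough to be admissible in the collection S.
import numpy as np
from math import comb, log

def weight(d, p):
    # Delta(S) = log C(p, d) + log d, of order d*log(p)
    return log(comb(p, d)) + log(d)

def spaces_from_supports(X, beta_list):
    n, p = X.shape
    bases = []
    for b in beta_list:                       # one Lasso solution per lambda
        J = np.flatnonzero(b)
        if 1 <= len(J) <= n / (3 * log(p)):   # admissibility: |J| <= n/(3 log p)
            Q, _ = np.linalg.qr(X[:, J])      # orthonormal basis of range(X_J)
            bases.append(Q)
    return bases
```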

15 Case of the Lasso estimators

Lasso estimators:
$$\hat\beta_\lambda = \mathop{\mathrm{argmin}}_\beta \big\{ \|Y - X\beta\|^2 + 2\lambda \|\beta\|_1 \big\}, \qquad \lambda > 0.$$

Parameter tuning, theory: for $X$ with columns normalized to 1, $\lambda \approx \sigma\sqrt{2\log(p)}$.

Parameter tuning, practice: V-fold CV, BIC criterion.
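A hedged illustration of the two tuning routes, continuing the simulation sketch from the Setting slide (it reuses `X`, `Y`, `sigma`, `n`, `p` defined there). Scikit-learn minimizes $\|Y - X\beta\|^2/(2n) + \alpha\|\beta\|_1$, so for a design with standardized columns the theoretical level translates to roughly $\alpha \approx \sigma\sqrt{2\log(p)/n}$; the exact scale depends on the column normalization:

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV

# Theory: requires knowing sigma, the quantity that is unknown in this talk.
alpha_theory = sigma * np.sqrt(2 * np.log(p) / n)
beta_theory = Lasso(alpha=alpha_theory).fit(X, Y).coef_

# Practice: 10-fold cross-validation over an automatic grid of penalties.
model_cv = LassoCV(cv=10).fit(X, Y)
print(model_cv.alpha_, np.count_nonzero(model_cv.coef_))
```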

16 Recent criteria that are pivotal with respect to the variance

$\ell_1$-penalized log-likelihood (Städler, Bühlmann, van de Geer):
$$\big(\hat\beta^{LL}_\lambda, \hat\sigma^{LL}_\lambda\big) := \mathop{\mathrm{argmin}}_{\beta \in \mathbb{R}^p,\, \sigma > 0} \Big[ n\log(\sigma) + \frac{\|Y - X\beta\|_2^2}{2\sigma^2} + \lambda \frac{\|\beta\|_1}{\sigma} \Big].$$

$\ell_1$-penalized Huber loss (Belloni et al., Antoniadis):
$$\big(\hat\beta^{SR}_\lambda, \hat\sigma^{SR}_\lambda\big) := \mathop{\mathrm{argmin}}_{\beta \in \mathbb{R}^p,\, \sigma > 0} \Big[ \frac{n\sigma}{2} + \frac{\|Y - X\beta\|_2^2}{2\sigma} + \lambda \|\beta\|_1 \Big].$$

Equivalent to the Square-Root Lasso (introduced before):
$$\hat\beta^{SR}_\lambda = \mathop{\mathrm{argmin}}_{\beta \in \mathbb{R}^p} \Big[ \|Y - X\beta\|_2 + \frac{\lambda}{\sqrt n} \|\beta\|_1 \Big].$$

Sun & Zhang: optimization with a single LARS call.
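A sketch of the square-root / scaled Lasso via alternating minimization, in the spirit of Sun & Zhang's scaled Lasso but using scikit-learn's coordinate descent rather than their single LARS call (parameter names are illustrative):

```python
# Alternate a beta-step (Lasso at penalty lam0*sigma) and a sigma-step
# (RMS of the residuals), which jointly minimizes the Huber-type criterion.
import numpy as np
from sklearn.linear_model import Lasso

def sqrt_lasso(X, Y, lam0=None, n_iter=20):
    n, p = X.shape
    if lam0 is None:
        lam0 = np.sqrt(2 * np.log(p) / n)    # pivotal level: no sigma needed
    sigma = np.std(Y)                        # crude initialization
    for _ in range(n_iter):
        # For fixed sigma, the beta-step is a Lasso whose penalty is
        # proportional to sigma (sklearn's alpha parametrization).
        beta = Lasso(alpha=lam0 * sigma).fit(X, Y).coef_
        sigma_new = np.linalg.norm(Y - X @ beta) / np.sqrt(n)
        if abs(sigma_new - sigma) < 1e-8 * sigma:
            break
        sigma = sigma_new
    return beta, sigma                       # joint estimate of (beta, sigma)
```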

17 The compatibility constant

$$\kappa[\xi, T] = \min_{u \in \mathcal{C}(\xi, T)} \big\{ |T|^{1/2} \|Xu\|_2 / \|u_T\|_1 \big\}, \quad \text{where}\ \ \mathcal{C}(\xi, T) = \{u : \|u_{T^c}\|_1 < \xi \|u_T\|_1\}.$$

Restricted eigenvalue: for $k_* = n/(3\log(p))$ we set $\phi_* = \sup\big\{ \|Xu\|_2 / \|u\|_2 : u\ k_*\text{-sparse} \big\}$.

Theorem for the Square-Root Lasso (Sun & Zhang). For $\lambda = 2\sqrt{2\log(p)}$, if we assume that
$$\|\beta_0\|_0 \le C_1\, \kappa^2[4, \mathrm{supp}(\beta_0)]\, \frac{n}{\log(p)},$$
then, with high probability,
$$\|X(\hat\beta - \beta_0)\|_2^2 \le \inf_\beta \Big\{ \|X(\beta_0 - \beta)\|_2^2 + C_2\, \frac{\|\beta\|_0 \log(p)}{\kappa^2[4, \mathrm{supp}(\beta)]}\, \sigma^2 \Big\}.$$

18 The compatibility constant (continued)

(Compatibility constant $\kappa[\xi, T]$ and restricted eigenvalue $\phi_*$ as on the previous slide.)

Theorem for LinSelect Lasso. If we assume that
$$\|\beta_0\|_0 \le C_1\, \kappa^2[4, \mathrm{supp}(\beta_0)]\, \frac{n}{\phi_* \log(p)},$$
then, with high probability,
$$\|X(\hat\beta - \beta_0)\|_2^2 \le C \inf_\beta \Big\{ \|X(\beta_0 - \beta)\|_2^2 + C_2\, \frac{\|\beta\|_0\, \phi_* \log(p)}{\kappa^2[4, \mathrm{supp}(\beta)]}\, \sigma^2 \Big\}.$$

19 Numerical experiments (1/2): tuning the Lasso

165 examples extracted from the literature; each example $e$ is evaluated on the basis of 400 runs; comparison to the oracle $\hat\beta_{\lambda^*}$.

[Table: for each procedure $\ell$, the 0%, 50%, 75%, 90% and 95% quantiles of $R[\hat\beta_{\hat\lambda_\ell}; \beta_0] / R[\hat\beta_{\lambda^*}; \beta_0]$ over the examples $e = 1, \dots, 165$, for Lasso 10-fold CV, Lasso LinSelect and the Square-Root Lasso; the numerical entries did not survive transcription.]

20 Numerical experiments (2/2): computation time

    n     p     10-fold CV   LinSelect   Square-Root
    ...   ...   ... s        0.21 s      0.18 s
    ...   ...   ... s        0.43 s      0.4 s
    ...   ...   ... s        11 s        6.3 s

(The values of n, p and the 10-fold CV timings were lost in transcription.)

Packages: enet for 10-fold CV and LinSelect; lars for the Square-Root Lasso (procedure of Sun & Zhang).

21 Non-parametric regression

22 An important class of estimators

Linear estimators: $\hat f_\lambda = A_\lambda Y$ with $A_\lambda \in \mathbb{R}^{n \times n}$:
- spline smoothing or kernel ridge estimators with smoothing parameter $\lambda \in \mathbb{R}^+$,
- Nadaraya estimators $A_\lambda$ with smoothing parameter $\lambda \in \mathbb{R}^+$,
- $\lambda$-nearest neighbors, $\lambda \in \{1, \dots, k\}$,
- $L_2$-basis projection (on the first $\lambda$ elements), etc.

Selection criteria (with $\sigma^2$ unknown): cross-validation schemes (including GCV), Mallows' $C_L$ + plug-in / slope heuristic, LinSelect.

Two such smoother matrices are sketched below.
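A hedged sketch of two smoother matrices $A_\lambda$ for a one-dimensional design $x$ (the Gaussian kernel, bandwidth and ridge level are illustrative choices):

```python
# Two classical linear estimators f_hat = A @ Y built from a design vector x.
import numpy as np

def kernel_ridge_smoother(x, lam, bw=0.5):
    # A_lambda = K (K + lam*I)^{-1} for a Gaussian kernel Gram matrix K.
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * bw ** 2))
    return K @ np.linalg.inv(K + lam * np.eye(len(x)))

def nadaraya_smoother(x, bw):
    # Row-normalized kernel weights: an averaging matrix.
    W = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * bw ** 2))
    return W / W.sum(axis=1, keepdims=True)
```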


24 Slope heuristic (Arlot & Bach)

Procedure, for $\hat f_\lambda = A_\lambda Y$:
1. compute $\hat\lambda_0(\sigma') = \mathrm{argmin}_\lambda \big\{ \|Y - \hat f_\lambda\|^2 + \sigma'^2\, \mathrm{Tr}(2A_\lambda - A_\lambda^\top A_\lambda) \big\}$;
2. select $\hat\sigma$ such that $\mathrm{Tr}\big(A_{\hat\lambda_0(\hat\sigma)}\big) \in [n/10, n/3]$;
3. select $\hat\lambda = \mathrm{argmin}_\lambda \big\{ \|Y - \hat f_\lambda\|^2 + 2\hat\sigma^2\, \mathrm{Tr}(A_\lambda) \big\}$.

Main assumptions:
- $A_\lambda$ is a shrinkage or averaging matrix (covers all the classics);
- bias assumption: there exists $\lambda_1$ with $\mathrm{Tr}(A_{\lambda_1}) \le \sqrt n$ and $\|(I - A_{\lambda_1})f\|^2 \le \sigma^2 \sqrt{n \log(n)}$.

Theorem (Arlot & Bach). With high probability:
$$\|\hat f_{\hat\lambda} - f\|^2 \le (1 + \varepsilon) \inf_\lambda \|\hat f_\lambda - f\|^2 + C\,\varepsilon^{-1} \log(n)\, \sigma^2.$$

A schematic implementation is sketched below.
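A schematic numpy sketch of this three-step procedure (illustrative, not the authors' code; the grid of trial variances is a hypothetical choice):

```python
# Slope-heuristic selection among linear estimators f_lambda = A_lambda @ Y.
import numpy as np

def slope_heuristic(Y, A_list):
    n = len(Y)
    tr = np.array([np.trace(A) for A in A_list])                  # Tr(A_lambda)
    pen0 = np.array([np.trace(2 * A - A.T @ A) for A in A_list])  # minimal penalty
    resid = np.array([np.sum((Y - A @ Y) ** 2) for A in A_list])

    def lam0(s2):                      # step 1: minimizer for trial variance s2
        return np.argmin(resid + s2 * pen0)

    # step 2: pick sigma^2 so that the selected Tr(A) falls in [n/10, n/3]
    sigma2 = np.sum((Y - A_list[-1] @ Y) ** 2) / n     # fallback guess
    for s2 in np.geomspace(1e-4, 1e4, 200):
        if n / 10 <= tr[lam0(s2)] <= n / 3:
            sigma2 = s2
            break

    # step 3: final selection with the optimal penalty 2*sigma^2*Tr(A_lambda)
    return np.argmin(resid + 2 * sigma2 * tr)
```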

25 LinSelect

Approximation spaces: $\hat{\mathcal{S}} = \bigcup_\lambda \big\{ S^1_\lambda, \dots, S^{n/2}_\lambda \big\}$, where $S^k_\lambda$ is spanned by the $k$ last right-singular vectors of
$$A_\lambda^+ \Pi_\lambda : \mathrm{range}(A_\lambda) \to \mathrm{range}(A_\lambda),$$
where $A_\lambda^+$ is the inverse of the restriction of $A_\lambda$ to $\mathrm{range}(A_\lambda)$ and $\Pi_\lambda$ is induced by the projection onto $\mathrm{range}(A_\lambda)$.

Weight: $\Delta(S) = \beta(1 + \dim(S))$ with $\beta > 0$ such that $\sum_S e^{-\Delta(S)} \le 1$.

Corollary. When $\sigma_{n/2}(A_\lambda^+ \Pi_\lambda) \ge 1/2$ for all $\lambda \in \Lambda$, we have
$$\mathbb{E}\big[\|f - \hat f\|^2\big] \le C \inf_{\lambda \in \Lambda} \mathbb{E}\big[\|\hat f_\lambda - f\|^2\big].$$

26 LinSelect (continued)

Same approximation spaces and weight as on the previous slide.

Remark: when $A_\lambda$ is symmetric positive definite, $S^k_\lambda$ is spanned by the $k$ first eigenvectors of $A_\lambda$.


28 LinSelect (continued)

Corollary (in deviation). When $\sigma_{n/2}(A_\lambda^+ \Pi_\lambda) \ge 1/2$ for all $\lambda \in \Lambda$, we have with high probability
$$\|f - \hat f\|^2 \le C \inf_{\lambda \in \Lambda} \|\hat f_\lambda - f\|^2 + \log(n)\, \sigma^2.$$
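Per the remark above, for symmetric positive definite $A_\lambda$ the spaces $S^k_\lambda$ can be read off an eigendecomposition; a minimal sketch:

```python
# Nested approximation spaces S^1, ..., S^max_dim from a symmetric positive
# definite smoother A (leading eigenvectors first).
import numpy as np

def approximation_spaces(A, max_dim):
    vals, vecs = np.linalg.eigh(A)      # eigenvalues in ascending order
    vecs = vecs[:, ::-1]                # reorder: leading eigenvectors first
    return [vecs[:, :k] for k in range(1, max_dim + 1)]
```

These bases can be fed directly to a criterion such as the `linselect_crit` sketch given earlier.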

29 A review

High-dimensional regression with unknown variance, C.G., S. Huet & N. Verzelen, arXiv preprint (covering coordinate-sparsity, group-sparsity, variation-sparsity and multivariate regression).
