High-dimensional regression with unknown variance
1 High-dimensional regression with unknown variance
Christophe Giraud, École Polytechnique, March 2012
2 Setting
Gaussian regression with unknown variance: $Y_i = f_i + \varepsilon_i$ with $\varepsilon_i$ i.i.d. $\mathcal{N}(0, \sigma^2)$, where $f = (f_1, \dots, f_n)$ and $\sigma^2$ are unknown; we want to estimate $f$.
Ex 1: sparse linear regression, $f = X\beta$ with $\beta$ sparse in some sense and $X \in \mathbb{R}^{n \times p}$, possibly with $p > n$.
Ex 2: non-parametric regression, $f_i = F(x_i)$ with $F : \mathcal{X} \to \mathbb{R}$.
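Concretely, the setting can be simulated in a few lines; a minimal sketch, in which all sizes ($n$, $p$, $k$) and the signal values are illustrative choices, not taken from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k, sigma = 100, 500, 5, 2.0          # p > n: high-dimensional design

X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)             # columns normalized to 1

beta0 = np.zeros(p)
beta0[:k] = 3.0                            # k-sparse coefficient vector
f = X @ beta0                              # unknown mean f = X beta
Y = f + sigma * rng.standard_normal(n)     # Y_i = f_i + eps_i, eps_i ~ N(0, sigma^2)
```

The later sketches reuse these `X` and `Y`.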
3 A plethora of estimators
Sparse linear regression:
- coordinate sparsity: Lasso, Dantzig, Elastic-Net, Exponential Weighting, projection on subspaces $\{V_\lambda : \lambda \in \Lambda\}$ given by PCA, Random Forest, etc.
- structured sparsity: Group-Lasso, Fused-Lasso, Bayesian estimators, etc.
Non-parametric regression:
- spline smoothing, Nadaraya kernel smoothing, kernel ridge estimators, nearest neighbors, $L_2$-basis projection, Sparse Additive Models, etc.
4 Important practical issues
Which estimator should be used?
- Sparse regression: Lasso? Random Forest? Exponential Weighting?
- Non-parametric regression: kernel regression (with which kernel?)? spline smoothing?
Which tuning parameter?
- Which penalty level for the Lasso? Which bandwidth for kernel regression? etc.
5 The objective
Difficulties:
- No procedure is universally better than the others.
- A sensible choice of the tuning parameters depends on some unknown characteristics of $f$ (sparsity, smoothness, etc.) and on the unknown variance $\sigma^2$.
Ideal objective: select the best estimator among a collection $\{\hat f_\lambda,\ \lambda \in \Lambda\}$.
(Alternative objective: combine the estimators in the best possible way.)
6 Impact of not knowing the variance
7 Impact of the unknown variance?
Case of coordinate-sparse linear regression. [Figure: minimax prediction risk over $k$-sparse signals, as a function of $k$, comparing the case where $\sigma$ or $k$ is known with the case where both $\sigma^2$ and $k$ are unknown; the two curves separate in the ultra-high-dimensional regime $2k\log(p/k) \geq n$.]
8 Ultra-high dimensional phenomenon
Theorem (N. Verzelen, EJS 2012). When $\sigma^2$ is unknown, there exist designs $X$ of size $n \times p$ such that for any estimator $\hat\beta$, we have either
$$\sup_{\sigma^2 > 0} \mathbb{E}\big[\|X(\hat\beta - 0_p)\|^2\big] > C_1\, n\,\sigma^2,$$
or
$$\sup_{\beta_0\ k\text{-sparse},\ \sigma^2 > 0} \mathbb{E}\big[\|X(\hat\beta - \beta_0)\|^2\big] > C_2\, k \log\!\big(\tfrac{p}{k}\big) \exp\!\Big[C_3\, \tfrac{k}{n} \log\!\big(\tfrac{p}{k}\big)\Big]\, \sigma^2.$$
Consequence: when $\sigma^2$ is unknown, the best we can expect is
$$\mathbb{E}\big[\|X(\hat\beta - \beta_0)\|^2\big] \leq C \inf_{\beta}\big\{ \|X(\beta - \beta_0)\|^2 + \|\beta\|_0 \log(p)\, \sigma^2 \big\}$$
for any $\sigma^2 > 0$ and any $\beta_0$ fulfilling $1 \leq \|\beta_0\|_0 \leq C\, n/\log(p)$.
9 Some generic selection schemes
10 Cross-validation: hold-out, V-fold CV, leave-q-out.
Penalized empirical loss: penalized log-likelihood (AIC, BIC, etc.), plug-in criteria (with Mallows' $C_p$, etc.), slope heuristic.
Approximation versus complexity penalization: LinSelect.
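The first family is the easiest to make concrete; a hedged sketch of a V-fold CV selector, where `fit(X_train, Y_train, lam)` is a hypothetical fitting routine returning a predictor (not part of any specific library):

```python
import numpy as np

def v_fold_cv(X, Y, lambdas, fit, V=10, seed=0):
    """Return the lambda with smallest V-fold cross-validated squared loss."""
    n = len(Y)
    folds = np.random.default_rng(seed).permutation(n) % V   # balanced folds
    errs = np.zeros(len(lambdas))
    for v in range(V):
        test, train = folds == v, folds != v
        for j, lam in enumerate(lambdas):
            predict = fit(X[train], Y[train], lam)           # fit on V-1 folds
            errs[j] += np.sum((Y[test] - predict(X[test])) ** 2)
    return lambdas[int(np.argmin(errs))]
```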
11 LinSelect (Y. Baraud, C. G. & S. Huet)
Ingredients:
- a collection $\mathcal{S}$ of linear spaces (for approximation);
- a weight function $\Delta : \mathcal{S} \to \mathbb{R}_+$ (measure of complexity).
Criterion:
$$\mathrm{Crit}(\hat f_\lambda) = \inf_{S \in \hat{\mathcal{S}}} \Big[ \|Y - \Pi_S \hat f_\lambda\|^2 + \|\hat f_\lambda - \Pi_S \hat f_\lambda\|^2 + \mathrm{pen}(S)\, \hat\sigma_S^2 \Big],$$
i.e. residuals + approximation + complexity, where $\hat{\mathcal{S}} \subset \mathcal{S}$ is possibly data-dependent, $\Pi_S$ is the orthogonal projector onto $S$,
$$\mathrm{pen}(S) \asymp \dim(S) \vee 2\Delta(S) \quad \text{when } \dim(S) \vee 2\Delta(S) \leq 2n/3, \qquad \hat\sigma_S^2 = \frac{\|Y - \Pi_S Y\|_2^2}{n - \dim(S)}.$$
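A schematic of this criterion (not the authors' implementation): each candidate space $S$ is represented by an orthonormal basis, and the penalty function is a user-supplied stand-in of the order of magnitude stated above.

```python
import numpy as np

def linselect_crit(Y, f_lam, spaces, pen):
    """spaces: list of (B, Delta), B an n x d orthonormal basis of S."""
    n, best = len(Y), np.inf
    for B, Delta in spaces:
        d = B.shape[1]
        proj = lambda v: B @ (B.T @ v)                    # projector Pi_S
        sigma2_S = np.sum((Y - proj(Y)) ** 2) / (n - d)   # residual variance on S
        crit = (np.sum((Y - proj(f_lam)) ** 2)            # residuals
                + np.sum((f_lam - proj(f_lam)) ** 2)      # approximation
                + pen(d, Delta) * sigma2_S)               # complexity
        best = min(best, crit)
    return best

# A crude stand-in for pen(S), of the order suggested on the slide:
pen = lambda d, Delta: 1.1 * max(d, 2 * Delta)
```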
12 Non-asymptotic risk bound
Assumptions:
1. $1 \leq \dim(S) \vee 2\Delta(S) \leq 2n/3$ for all $S \in \mathcal{S}$;
2. $\sum_{S \in \mathcal{S}} e^{-\Delta(S)} \leq 1$.
Theorem (Y. Baraud, C.G., S. Huet).
$$\mathbb{E}\big[\|f - \hat f_{\hat\lambda}\|^2\big] \leq C\, \mathbb{E}\Big[\inf_{\lambda \in \Lambda}\Big\{ \|f - \hat f_\lambda\|^2 + \inf_{S \in \hat{\mathcal{S}}}\big\{ \|\hat f_\lambda - \Pi_S \hat f_\lambda\|^2 + [\dim(S) \vee \Delta(S)]\,\sigma^2 \big\}\Big\}\Big].$$
The bound also holds in deviation.
13 Sparse linear regression
14 Instantiation of LinSelect
Estimators: linear regressors $\{\hat f_\lambda = X \hat\beta_\lambda : \lambda \in \Lambda\}$ (e.g. Lasso, Exponential Weighting, etc.).
Approximation and complexity:
$$\mathcal{S} = \big\{ \mathrm{range}(X_J) : J \subset \{1, \dots, p\},\ 1 \leq |J| \leq n/(3\log p) \big\},$$
$$\Delta(S) = \log\binom{p}{\dim(S)} + \log(\dim(S)) \approx \dim(S)\log(p).$$
Subcollection $\hat{\mathcal{S}}$: we set $\hat S_\lambda = \mathrm{range}\big(X_{\mathrm{supp}(\hat\beta_\lambda)}\big)$ and define $\hat{\mathcal{S}} = \{\hat S_\lambda,\ \lambda \in \hat\Lambda\}$, where $\hat\Lambda = \{\lambda \in \Lambda : \hat S_\lambda \in \mathcal{S}\}$.
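For a given lasso-type solution, the pair $(\hat S_\lambda, \Delta(\hat S_\lambda))$ of this instantiation is directly computable; a sketch (the helper name is ours):

```python
import numpy as np
from scipy.special import gammaln

def space_and_weight(X, beta_hat):
    """Basis of S_hat = range(X_J), J = supp(beta_hat), and its weight Delta."""
    J = np.flatnonzero(beta_hat)
    d, p = len(J), X.shape[1]
    # the subcollection keeps only supports with 1 <= |J| <= n/(3 log p)
    assert 1 <= d <= X.shape[0] / (3 * np.log(p)), "support outside collection"
    B = np.linalg.qr(X[:, J])[0]                 # orthonormal basis of range(X_J)
    log_binom = gammaln(p + 1) - gammaln(d + 1) - gammaln(p - d + 1)
    Delta = log_binom + np.log(d)                # ~ dim(S) log p
    return B, Delta
```

These `(B, Delta)` pairs can be fed to the `linselect_crit` sketch given earlier.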
15 Case of the Lasso estimators
Lasso estimators:
$$\hat\beta_\lambda = \operatorname*{argmin}_\beta \big\{ \|Y - X\beta\|^2 + 2\lambda \|\beta\|_1 \big\}, \qquad \lambda > 0.$$
Parameter tuning, in theory: for $X$ with columns normalized to 1, $\lambda \sim \sigma\sqrt{2\log(p)}$.
Parameter tuning, in practice: $V$-fold CV, or the BIC criterion.
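A minimal sketch of the practical route, using scikit-learn's `LassoCV` on the `X`, `Y` from the simulation sketch above; note that scikit-learn parametrizes the penalty as $\alpha\|\beta\|_1$ against $\frac{1}{2n}\|Y - X\beta\|_2^2$, which differs from the slide's $2\lambda\|\beta\|_1$ only by a constant rescaling of the tuning parameter:

```python
from sklearn.linear_model import LassoCV

lasso = LassoCV(cv=10, fit_intercept=False).fit(X, Y)   # 10-fold CV tuning
print("CV-selected penalty level:", lasso.alpha_)
```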
16 Recent criteria, pivotal with respect to the variance
$\ell_1$-penalized log-likelihood (Städler, Bühlmann, van de Geer):
$$(\hat\beta_\lambda^{LL}, \hat\sigma_\lambda^{LL}) := \operatorname*{argmin}_{\beta \in \mathbb{R}^p,\ \sigma > 0} \Big[ n\log(\sigma) + \frac{\|Y - X\beta\|_2^2}{2\sigma^2} + \lambda \frac{\|\beta\|_1}{\sigma} \Big].$$
$\ell_1$-penalized Huber loss (Belloni et al., Antoniadis):
$$(\hat\beta_\lambda^{SR}, \hat\sigma_\lambda^{SR}) := \operatorname*{argmin}_{\beta \in \mathbb{R}^p,\ \sigma > 0} \Big[ \frac{n\sigma}{2} + \frac{\|Y - X\beta\|_2^2}{2\sigma} + \lambda \|\beta\|_1 \Big].$$
Equivalent to the Square-Root Lasso (introduced before):
$$\hat\beta_\lambda^{SR} = \operatorname*{argmin}_{\beta \in \mathbb{R}^p} \Big[ \|Y - X\beta\|_2 + \frac{\lambda}{\sqrt n} \|\beta\|_1 \Big].$$
Sun & Zhang: optimization with a single LARS call.
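The Huber-loss formulation suggests a simple alternating scheme: fix $\sigma$ and solve an ordinary lasso, then update $\sigma = \|Y - X\hat\beta\|_2/\sqrt{n}$. The sketch below follows that spirit, not Sun & Zhang's single-LARS-call implementation; it reuses `X`, `Y` from the simulation sketch, uses a fixed iteration count, and omits degeneracy guards (e.g. $\hat\sigma \to 0$).

```python
import numpy as np
from sklearn.linear_model import Lasso

def scaled_lasso(X, Y, lam0, n_iter=20):
    """Alternating minimization for the square-root / scaled lasso objective."""
    n = len(Y)
    sigma = np.std(Y)                       # crude initialization
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # beta-step: lasso at penalty level sigma * lam0 (sklearn solves
        # (1/2n)||Y - Xb||_2^2 + alpha ||b||_1, hence alpha = sigma * lam0)
        beta = Lasso(alpha=sigma * lam0, fit_intercept=False).fit(X, Y).coef_
        sigma = np.linalg.norm(Y - X @ beta) / np.sqrt(n)   # sigma-step
    return beta, sigma

# lam0 ~ sqrt(2 log(p) / n) is the usual order of magnitude for this scaling
beta_sr, sigma_sr = scaled_lasso(X, Y, lam0=np.sqrt(2 * np.log(X.shape[1]) / len(Y)))
```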
17 The compatibility constant
$$\kappa[\xi, T] = \min_{u \in \mathcal{C}(\xi, T)} \big\{ |T|^{1/2}\, \|Xu\|_2 / \|u_T\|_1 \big\}, \quad \text{where } \mathcal{C}(\xi, T) = \{u : \|u_{T^c}\|_1 < \xi \|u_T\|_1\}.$$
Restricted eigenvalue: for $k_* = n/(3\log(p))$ we set $\varphi_* = \sup\big\{ \|Xu\|_2 / \|u\|_2 : u\ k_*\text{-sparse} \big\}$.
Theorem for the Square-Root Lasso (Sun & Zhang). For $\lambda = 2\sqrt{2\log(p)}$, if we assume that
$$\|\beta_0\|_0 \leq C_1\, \kappa^2[4, \mathrm{supp}(\beta_0)]\, \frac{n}{\log(p)},$$
then, with high probability,
$$\|X(\hat\beta - \beta_0)\|_2^2 \leq \inf_{\beta} \Big\{ 2\,\|X(\beta_0 - \beta)\|_2^2 + C_2\, \frac{\|\beta\|_0 \log(p)}{\kappa^2[4, \mathrm{supp}(\beta)]}\, \sigma^2 \Big\}.$$
18 The compatibility constant
(Same compatibility constant $\kappa[\xi, T]$ and restricted eigenvalue $\varphi_*$ as on the previous slide.)
Theorem for the LinSelect Lasso. If we assume that
$$\|\beta_0\|_0 \leq C_1\, \kappa^2[4, \mathrm{supp}(\beta_0)]\, \frac{n}{\varphi_*\, \log(p)},$$
then, with high probability,
$$\|X(\hat\beta - \beta_0)\|_2^2 \leq C \inf_{\beta} \Big\{ \|X(\beta_0 - \beta)\|_2^2 + C_2\, \frac{\|\beta\|_0\, \varphi_*\, \log(p)}{\kappa^2[4, \mathrm{supp}(\beta)]}\, \sigma^2 \Big\}.$$
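Since $\kappa[\xi, T]$ is defined by a minimization over a cone, it is not trivial to compute exactly; as a purely illustrative probe (function name and sampling scheme are ours), one can sample points of the cone and take the smallest ratio, which only upper-bounds the true constant:

```python
import numpy as np

def kappa_mc(X, T, xi, n_draws=20000, seed=0):
    """Monte-Carlo upper bound on kappa[xi, T]; illustration, not certified."""
    rng = np.random.default_rng(seed)
    p, t = X.shape[1], len(T)
    Tc = np.setdiff1d(np.arange(p), T)
    best = np.inf
    for _ in range(n_draws):
        u = np.zeros(p)
        u[T] = rng.standard_normal(t)
        # draw u_{T^c} with ||u_{T^c}||_1 < xi ||u_T||_1 (cone constraint)
        v = rng.standard_normal(len(Tc))
        u[Tc] = v / np.sum(np.abs(v)) * xi * rng.uniform() * np.sum(np.abs(u[T]))
        best = min(best, np.sqrt(t) * np.linalg.norm(X @ u) / np.sum(np.abs(u[T])))
    return best
```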
19 Numerical experiments (1/2): tuning the Lasso
- 165 examples extracted from the literature;
- each example $e$ is evaluated on the basis of 400 runs;
- comparison to the oracle $\hat\beta_{\lambda^*}$.
[Table: for each procedure $\ell$ (Lasso 10-fold CV, Lasso LinSelect, Square-Root Lasso), the 0%, 50%, 75%, 90% and 95% quantiles over $e = 1, \dots, 165$ of the risk ratio $R\big[\hat\beta_{\hat\lambda_\ell};\, \beta_0\big] / R\big[\hat\beta_{\lambda^*};\, \beta_0\big]$; the numerical values did not survive extraction.]
20 Numerical experiments (2/2): computation time

n     p     10-fold CV   LinSelect   Square-Root
?     ?     ? s          0.21 s      0.18 s
?     ?     ? s          0.43 s      0.4 s
?     ?     ? s          11 s        6.3 s

(The n, p values and the 10-fold CV times were lost in extraction.)
Packages: enet for 10-fold CV and LinSelect; lars for the Square-Root Lasso (procedure of Sun & Zhang).
21 Non-parametric regression
22 An important class of estimators
Linear estimators: $\hat f_\lambda = A_\lambda Y$ with $A_\lambda \in \mathbb{R}^{n \times n}$:
- spline smoothing or kernel ridge estimators with smoothing parameter $\lambda \in \mathbb{R}_+$;
- Nadaraya estimators $A_\lambda$ with smoothing parameter $\lambda \in \mathbb{R}_+$;
- $\lambda$-nearest neighbors, $\lambda \in \{1, \dots, k\}$;
- $L_2$-basis projection (on the $\lambda$ first elements); etc.
Selection criteria (with $\sigma^2$ unknown): cross-validation schemes (including GCV); Mallows' $C_L$ + plug-in / slope heuristic; LinSelect.
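Among these criteria, GCV is fully explicit for a linear estimator, since it needs only $A_\lambda$ and not $\sigma^2$. A sketch with a Gaussian-kernel ridge smoother; the bandwidth, grid and test signal are illustrative choices:

```python
import numpy as np

def kernel_ridge_matrix(x, lam, bandwidth=0.5):
    """Smoother matrix A_lambda = K (K + lam I)^{-1} for a Gaussian kernel."""
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * bandwidth ** 2))
    return K @ np.linalg.inv(K + lam * np.eye(len(x)))

def gcv(Y, A):
    """GCV(lambda) = (||Y - A Y||^2 / n) / (1 - Tr(A)/n)^2, variance-free."""
    n = len(Y)
    return (np.sum((Y - A @ Y) ** 2) / n) / (1 - np.trace(A) / n) ** 2

x = np.linspace(0, 1, 80)
Y = np.sin(6 * x) + 0.3 * np.random.default_rng(1).standard_normal(80)
lams = np.logspace(-4, 1, 30)
best_lam = min(lams, key=lambda lam: gcv(Y, kernel_ridge_matrix(x, lam)))
```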
24 Slope heuristic (Arlot & Bach)
Procedure for $\hat f_\lambda = A_\lambda Y$:
1. compute $\hat\lambda_0(\sigma') = \operatorname*{argmin}_\lambda \big\{ \|Y - \hat f_\lambda\|^2 + \sigma'^2\, \mathrm{Tr}(2A_\lambda - A_\lambda^\top A_\lambda) \big\}$;
2. select $\hat\sigma$ such that $\mathrm{Tr}\big(A_{\hat\lambda_0(\hat\sigma)}\big) \in [n/10,\, n/3]$;
3. select $\hat\lambda = \operatorname*{argmin}_\lambda \big\{ \|Y - \hat f_\lambda\|^2 + 2\hat\sigma^2\, \mathrm{Tr}(A_\lambda) \big\}$.
Main assumptions:
- $A_\lambda$ is a shrinkage or averaging matrix (covers all classics);
- bias assumption: there exists $\lambda_1$ with $\mathrm{Tr}(A_{\lambda_1}) \leq \sqrt n$ and $\|(I - A_{\lambda_1})f\|^2 \leq \sigma^2 \sqrt{n \log(n)}$.
Theorem (Arlot & Bach). With high probability:
$$\|\hat f_{\hat\lambda} - f\|^2 \leq (1 + \varepsilon) \inf_\lambda \|\hat f_\lambda - f\|^2 + C\, \varepsilon^{-1} \log(n)\, \sigma^2.$$
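A sketch of steps 1 to 3 for a finite family of smoother matrices; the grid of candidate $\sigma'^2$ values is an implementation choice of ours, not part of the procedure:

```python
import numpy as np

def slope_heuristic(Y, A_list):
    """Return the index of the selected smoother and the calibrated sigma^2."""
    n = len(Y)

    def lam0(s2):
        # step 1: empirical loss plus minimal penalty s2 * Tr(2A - A^T A)
        crit = [np.sum((Y - A @ Y) ** 2) + s2 * np.trace(2 * A - A.T @ A)
                for A in A_list]
        return int(np.argmin(crit))

    # step 2: calibrate s2 so the step-1 selection has Tr(A) in [n/10, n/3]
    for s2 in np.logspace(-6, 6, 200):
        if n / 10 <= np.trace(A_list[lam0(s2)]) <= n / 3:
            break

    # step 3: final selection with the calibrated optimal penalty 2 s2 Tr(A)
    crit = [np.sum((Y - A @ Y) ** 2) + 2 * s2 * np.trace(A) for A in A_list]
    return int(np.argmin(crit)), s2
```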
25 LinSelect
Approximation spaces: $\hat{\mathcal{S}} = \bigcup_\lambda \{S_\lambda^1, \dots, S_\lambda^{n/2}\}$, where $S_\lambda^k$ is spanned by the $k$ last right-singular vectors of $A_\lambda^+ \Pi_\lambda : \mathrm{range}(A_\lambda) \to \mathrm{range}(A_\lambda)$; here $A_\lambda^+$ is the inverse of the restriction of $A_\lambda$ to $\mathrm{range}(A_\lambda)$, and $\Pi_\lambda$ is induced by the projection onto $\mathrm{range}(A_\lambda)$.
Weight: $\Delta(S) = \beta\,(1 + \dim(S))$ with $\beta > 0$ such that $\sum_S e^{-\Delta(S)} \leq 1$.
Corollary. When $\sigma_{n/2}\big(A_\lambda^+ \Pi_\lambda\big) \geq 1/2$ for all $\lambda \in \Lambda$, we have
$$\mathbb{E}\big[\|f - \hat f\|^2\big] \leq C \inf_{\lambda \in \Lambda} \mathbb{E}\big[\|\hat f_\lambda - f\|^2\big].$$
26 LinSelect
Approximation spaces: $\hat{\mathcal{S}} = \bigcup_\lambda \{S_\lambda^1, \dots, S_\lambda^{n/2}\}$, with $S_\lambda^k$ spanned by the $k$ last right-singular vectors of $A_\lambda^+ \Pi_\lambda$ as above.
Remark: when $A_\lambda$ is symmetric positive definite, $S_\lambda^k$ is spanned by the $k$ first eigenvectors of $A_\lambda$.
(Same weight $\Delta$ and corollary as on the previous slide.)
28 LinSelect
(Same approximation spaces, remark and weight as above.)
Corollary. When $\sigma_{n/2}\big(A_\lambda^+ \Pi_\lambda\big) \geq 1/2$ for all $\lambda \in \Lambda$, we have with high probability
$$\|f - \hat f\|^2 \leq C \inf_{\lambda \in \Lambda} \|\hat f_\lambda - f\|^2 + \log(n)\,\sigma^2.$$
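In the symmetric positive-definite case of the remark, the nested spaces $S_\lambda^1 \subset \dots \subset S_\lambda^{n/2}$ come straight out of an eigendecomposition; a sketch:

```python
import numpy as np

def approx_spaces(A, kmax):
    """Orthonormal bases of S^1 subset ... subset S^kmax for symmetric PSD A."""
    eigvals, eigvecs = np.linalg.eigh(A)     # eigenvalues in ascending order
    lead = eigvecs[:, ::-1]                  # leading eigenvectors first
    return [lead[:, :k] for k in range(1, kmax + 1)]

# These bases, paired with weights Delta(S) = beta * (1 + dim S), can be fed
# into the linselect_crit sketch given earlier.
```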
29 A review
High-dimensional regression with unknown variance, C.G., S. Huet & N. Verzelen, arXiv preprint (including coordinate sparsity, group sparsity, variation sparsity and multivariate regression).
More information