Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001)

1. Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties, by Jianqing Fan and Runze Li (2001). Presented by Yang Zhao, March 5.

2. Outline

3. Motivation
- Variable selection is fundamental to high-dimensional statistical modeling.
- Many approaches in use are stepwise selection procedures, which can be computationally expensive and ignore stochastic errors in the variable selection process.
- The theoretical properties of stepwise deletion and subset selection are somewhat hard to understand.
- The most severe drawback of best subset variable selection is its lack of stability, as analyzed by Breiman (1996).

4. Overview of the Proposed Method
- Penalized likelihood approaches are proposed, which simultaneously select variables and estimate coefficients.
- The penalty functions are symmetric, nonconcave on $(0, \infty)$, and have singularities at the origin to produce sparse solutions.
- Furthermore, the penalty functions are bounded by a constant to reduce bias, and satisfy certain conditions to yield continuous solutions.
- A new, widely applicable algorithm is proposed for optimizing penalized likelihood functions.
- Rates of convergence of the proposed penalized likelihood estimators are established.
- With proper choice of regularization parameters, the proposed estimators perform as well as the oracle procedure (as if the correct submodel were known).

5. Penalized Least Squares and Subset Selection
- Linear regression model: $y = X\beta + \varepsilon$, where $y$ is $n \times 1$ and $X$ is $n \times d$. Assume for now that the columns of $X$ are orthonormal, and denote $z = X^T y$.
- A form of penalized least squares is
  $\frac{1}{2}\|y - X\beta\|^2 + \lambda \sum_{j=1}^d p_j(|\beta_j|) = \frac{1}{2}\|y - X X^T y\|^2 + \frac{1}{2}\sum_{j=1}^d (z_j - \beta_j)^2 + \lambda \sum_{j=1}^d p_j(|\beta_j|).$
- The $p_j(\cdot)$ are not necessarily the same for all $j$, but we assume they are, for simplicity. From now on, denote $\lambda p(|\cdot|)$ by $p_\lambda(|\cdot|)$.
- The minimization problem is equivalent to minimizing componentwise, which leads us to consider the one-dimensional penalized least squares problem
  $\frac{1}{2}(z - \theta)^2 + p_\lambda(|\theta|). \quad (2.3)$
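
To make the componentwise reduction concrete, here is a minimal numerical sketch (hypothetical data; NumPy assumed): with orthonormal columns, the criterion separates, so each coordinate of the fit depends on the data only through $z_j = x_j^T y$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical orthonormal design: columns of Q from a QR decomposition.
n, d = 50, 5
Q, _ = np.linalg.qr(rng.normal(size=(n, d)))
beta_true = np.array([3.0, 1.5, 0.0, 0.0, 2.0])
y = Q @ beta_true + 0.5 * rng.normal(size=n)

# Each one-dimensional problem (2.3) sees the data only through z_j.
z = Q.T @ y
# Without a penalty, the componentwise minimizer is z itself (the OLS fit).
print(np.allclose(z, np.linalg.lstsq(Q, y, rcond=None)[0]))  # True
```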

6. Penalized Least Squares and Subset Selection (cont'd)
- Taking the hard thresholding penalty function $p_\lambda(|\theta|) = \lambda^2 - (|\theta| - \lambda)^2 I(|\theta| < \lambda)$, the hard thresholding rule is obtained: $\hat{\theta} = z\, I(|z| > \lambda)$.
- This coincides with best subset selection, and with stepwise deletion and addition, for orthonormal designs!
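
A one-line sketch of the hard thresholding rule (NumPy assumed; data hypothetical):

```python
import numpy as np

def hard_threshold(z, lam):
    """Hard thresholding: keep z_j as-is if |z_j| > lam, otherwise set it to 0."""
    z = np.asarray(z, dtype=float)
    return z * (np.abs(z) > lam)

# Small components are dropped (as in best subset selection); large ones kept unshrunk.
print(hard_threshold([0.3, -1.2, 2.5], lam=1.0))  # [ 0.  -1.2  2.5]
```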

7. What Makes a Good Penalty Function?
- Unbiasedness: the resulting estimator is nearly unbiased when the true unknown parameter is large, to avoid unnecessary modeling bias.
- Sparsity: the resulting estimator is a thresholding rule, which automatically sets small estimated coefficients to zero to reduce model complexity.
- Continuity: the resulting estimator is continuous in the data $z$, to avoid instability in model prediction.

8. Sufficient Conditions for Good Penalty Functions
- $p'_\lambda(|\theta|) = 0$ for large $|\theta|$ ⟹ Unbiasedness.
  Intuition: the first-order derivative of (2.3) with respect to $\theta$ is $\mathrm{sgn}(\theta)\{|\theta| + p'_\lambda(|\theta|)\} - z$. When $p'_\lambda(|\theta|) = 0$ for large $|\theta|$, the resulting estimator is $z$ when $|z|$ is sufficiently large, which is approximately unbiased.
- The minimum of the function $|\theta| + p'_\lambda(|\theta|)$ is positive ⟹ Sparsity.
- The minimum of the function $|\theta| + p'_\lambda(|\theta|)$ is attained at 0 ⟹ Continuity.
  Intuition: see the next slide.

9. Sufficient Conditions for Good Penalty Functions (cont'd)
- When $|z| < \min_{\theta \neq 0}\{|\theta| + p'_\lambda(|\theta|)\}$, the derivative of (2.3) is positive for all positive $\theta$ and negative for all negative $\theta$, so $\hat{\theta} = 0$.
- When $|z| > \min_{\theta \neq 0}\{|\theta| + p'_\lambda(|\theta|)\}$, two crossings may exist, as shown; the larger one is a penalized least squares estimator.
- Further, this implies that a sufficient and necessary condition for continuity is to require the minimum to be attained at 0.

10. Penalty Functions of Interest
- Hard thresholding penalty: $p_\lambda(|\theta|) = \lambda^2 - (|\theta| - \lambda)^2 I(|\theta| < \lambda)$.
- Soft thresholding penalty ($L_1$ penalty, LASSO): $p_\lambda(|\theta|) = \lambda|\theta|$.
- $L_2$ penalty (ridge regression): $p_\lambda(|\theta|) = \lambda|\theta|^2$.
- SCAD (Smoothly Clipped Absolute Deviation) penalty, defined through its derivative:
  $p'_\lambda(\theta) = \lambda\left\{ I(\theta \le \lambda) + \frac{(a\lambda - \theta)_+}{(a-1)\lambda}\, I(\theta > \lambda) \right\}$ for some $a > 2$ and $\theta > 0$.
- Note: of the three properties that make a penalty function good, only SCAD possesses all of them (and is therefore advocated by the authors); none of the other penalty functions satisfies the three sufficient conditions simultaneously.
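
The following sketch implements the four penalties (NumPy assumed). Since the slide defines SCAD through its derivative, the closed form used below is obtained by integrating $p'_\lambda$ piecewise; this integration is ours, not spelled out on the slide.

```python
import numpy as np

def hard_penalty(theta, lam):
    t = np.abs(theta)
    return lam**2 - (t - lam)**2 * (t < lam)

def soft_penalty(theta, lam):        # L1 / LASSO
    return lam * np.abs(theta)

def ridge_penalty(theta, lam):       # L2
    return lam * np.abs(theta)**2

def scad_penalty(theta, lam, a=3.7):
    # Piecewise integral of p'_lam: linear near 0, quadratic in the
    # middle region, constant beyond a * lam.
    t = np.abs(theta)
    mid = -(t**2 - 2 * a * lam * t + lam**2) / (2 * (a - 1))
    return np.where(t <= lam, lam * t,
                    np.where(t <= a * lam, mid, (a + 1) * lam**2 / 2))
```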

11. Penalty Functions of Interest (cont'd) (figure)

12. Penalty Functions of Interest (cont'd)
When the design matrix $X$ is orthonormal, closed-form solutions are obtained as follows.
- Hard thresholding rule: $\hat{\theta} = z\, I(|z| > \lambda)$.
- LASSO ($L_1$ penalty): $\hat{\theta} = \mathrm{sgn}(z)(|z| - \lambda)_+$.
- SCAD:
  $\hat{\theta} = \mathrm{sgn}(z)(|z| - \lambda)_+$ when $|z| \le 2\lambda$;
  $\hat{\theta} = \{(a-1)z - \mathrm{sgn}(z)a\lambda\}/(a-2)$ when $2\lambda < |z| \le a\lambda$;
  $\hat{\theta} = z$ when $|z| > a\lambda$.
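
A sketch of the soft and SCAD thresholding rules (NumPy assumed; the hard rule appears in an earlier sketch):

```python
import numpy as np

def soft_threshold(z, lam):
    """LASSO rule: shrink toward zero by lam, truncating at zero."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def scad_threshold(z, lam, a=3.7):
    """SCAD rule: soft thresholding near zero, no shrinkage for |z| > a*lam."""
    z = np.asarray(z, dtype=float)
    t = np.abs(z)
    return np.where(t <= 2 * lam, np.sign(z) * np.maximum(t - lam, 0.0),
                    np.where(t <= a * lam,
                             ((a - 1) * z - np.sign(z) * a * lam) / (a - 2),
                             z))

# Sparsity for small |z|, continuity at the knots, near-unbiasedness for large |z|.
print(scad_threshold(np.array([0.5, 1.5, 3.0, 10.0]), lam=1.0))
```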

13. Penalty Functions of Interest (cont'd) (figure)

14. More on SCAD
- The SCAD penalty has two unknown parameters, $\lambda$ and $a$. They could be chosen by grid search using cross-validation (CV) or generalized cross-validation (GCV), but that is computationally intensive.
- The authors applied a Bayesian risk analysis to help choose $a$; the Bayesian risk appears not to be very sensitive to the value of $a$. They found that $a = 3.7$ performs similarly to the value chosen by GCV.

15. Performance Comparison
- Refer to the last graph on the previous slide. SCAD performs favorably compared with the other two thresholding rules. It is in fact expected to perform the best, given that it retains all the good mathematical properties of the other two penalty functions.

16. Penalized Likelihood Formulation
From now on, assume that $X$ is standardized so that each column has mean 0 and variance 1, and is no longer orthonormal.
- A form of penalized least squares:
  $\frac{1}{2}(y - X\beta)^T(y - X\beta) + n\sum_{j=1}^d p_\lambda(|\beta_j|). \quad (3.1)$
- A form of penalized robust least squares:
  $\sum_{i=1}^n \psi(|y_i - x_i^T\beta|) + n\sum_{j=1}^d p_\lambda(|\beta_j|). \quad (3.2)$
- A form of penalized likelihood for generalized linear models (to be maximized):
  $\sum_{i=1}^n \ell_i(g(x_i^T\beta), y_i) - n\sum_{j=1}^d p_\lambda(|\beta_j|). \quad (3.3)$

17. Sampling Properties and Oracle Properties
- Let $\beta_0 = (\beta_{10}, \ldots, \beta_{d0})^T = (\beta_{10}^T, \beta_{20}^T)^T$, and assume that $\beta_{20} = 0$.
- Let $I(\beta_0)$ be the Fisher information matrix, and let $I_1(\beta_{10}, 0)$ be the Fisher information knowing $\beta_{20} = 0$.
- General setting: let $V_i = (X_i, Y_i)$, $i = 1, \ldots, n$. Let $L(\beta)$ be the log-likelihood function of the observations $V_1, \ldots, V_n$, and let $Q(\beta)$ be the penalized likelihood function $L(\beta) - n\sum_{j=1}^d p_\lambda(|\beta_j|)$.

18. Sampling Properties and Oracle Properties (cont'd)
Regularity conditions:
(A) The observations $V_i$ are iid with density $f(V, \beta)$. $f(V, \beta)$ has a common support, and the model is identifiable. Furthermore,
  $E_\beta\!\left(\frac{\partial \log f(V, \beta)}{\partial \beta_j}\right) = 0$ for $j = 1, \ldots, d$, and
  $I_{jk}(\beta) = E_\beta\!\left(\frac{\partial \log f(V, \beta)}{\partial \beta_j}\, \frac{\partial \log f(V, \beta)}{\partial \beta_k}\right) = E_\beta\!\left(-\frac{\partial^2 \log f(V, \beta)}{\partial \beta_j\, \partial \beta_k}\right).$
(B) The Fisher information matrix
  $I(\beta) = E\!\left\{\left(\frac{\partial}{\partial \beta} \log f(V, \beta)\right)\left(\frac{\partial}{\partial \beta} \log f(V, \beta)\right)^{\!T}\right\}$
  is finite and positive definite at $\beta = \beta_0$.
(C) There exists an open subset $\omega$ of $\Omega$ containing the true parameter point $\beta_0$ such that for almost all $V$ the density $f(V, \beta)$ admits all third derivatives for all $\beta \in \omega$. Furthermore, there exist functions $M_{jkl}$ such that
  $\left|\frac{\partial^3}{\partial \beta_j\, \partial \beta_k\, \partial \beta_l} \log f(V, \beta)\right| \le M_{jkl}(V)$ for all $\beta \in \omega$, where $m_{jkl} = E_{\beta_0}(M_{jkl}(V)) < \infty$ for all $j, k, l$.

19. Sampling Properties and Oracle Properties (cont'd)
Theorem 1. Let $V_1, \ldots, V_n$ be iid, each with a density $f(V, \beta)$ that satisfies conditions (A)-(C). If $\max\{|p''_{\lambda_n}(|\beta_{j0}|)| : \beta_{j0} \neq 0\} \to 0$, then there exists a local maximizer $\hat{\beta}$ of $Q(\beta)$ such that $\|\hat{\beta} - \beta_0\| = O_p(n^{-1/2} + a_n)$, where $a_n = \max\{|p'_{\lambda_n}(|\beta_{j0}|)| : \beta_{j0} \neq 0\}$.
Note: for the hard thresholding and SCAD penalty functions, $\lambda_n \to 0$ implies $a_n = 0$ (for $n$ sufficiently large), so the penalized likelihood estimator is root-$n$ consistent.

20. Sampling Properties and Oracle Properties (cont'd)
Lemma 1. Let $V_1, \ldots, V_n$ be iid, each with a density $f(V, \beta)$ that satisfies conditions (A)-(C). Assume that
  $\liminf_{n \to \infty}\, \liminf_{\theta \to 0^+}\, p'_{\lambda_n}(\theta)/\lambda_n > 0. \quad (3.5)$
If $\lambda_n \to 0$ and $\sqrt{n}\,\lambda_n \to \infty$, then with probability tending to 1, for any given $\beta_1$ satisfying $\|\beta_1 - \beta_{10}\| = O_p(n^{-1/2})$ and any constant $C$,
  $Q\{(\beta_1^T, 0)^T\} = \max_{\|\beta_2\| \le C n^{-1/2}} Q\{(\beta_1^T, \beta_2^T)^T\}.$
Aside: denote $\Sigma = \mathrm{diag}\{p''_{\lambda_n}(|\beta_{10}|), \ldots, p''_{\lambda_n}(|\beta_{s0}|)\}$ and $b = (p'_{\lambda_n}(|\beta_{10}|)\,\mathrm{sgn}(\beta_{10}), \ldots, p'_{\lambda_n}(|\beta_{s0}|)\,\mathrm{sgn}(\beta_{s0}))^T$, where $s$ is the number of components of $\beta_{10}$.
Theorem 2 (Oracle Property). Let $V_1, \ldots, V_n$ be iid, each with a density $f(V, \beta)$ that satisfies conditions (A)-(C). Assume that the penalty function $p_{\lambda_n}(|\theta|)$ satisfies condition (3.5). If $\lambda_n \to 0$ and $\sqrt{n}\,\lambda_n \to \infty$, then with probability tending to 1, the root-$n$ consistent local maximizer $\hat{\beta} = (\hat{\beta}_1^T, \hat{\beta}_2^T)^T$ in Theorem 1 must satisfy:
(a) Sparsity: $\hat{\beta}_2 = 0$.
(b) Asymptotic normality:
  $\sqrt{n}\,(I_1(\beta_{10}) + \Sigma)\{\hat{\beta}_1 - \beta_{10} + (I_1(\beta_{10}) + \Sigma)^{-1} b\} \to N\{0, I_1(\beta_{10})\}.$

23. Sampling Properties and Oracle Properties (cont'd)
Remarks:
(1) For the hard thresholding and SCAD penalty functions, if $\lambda_n \to 0$ then $a_n = 0$. Hence, by Theorem 2, when $\sqrt{n}\,\lambda_n \to \infty$, their corresponding penalized likelihood estimators possess the oracle property and perform as well as the maximum likelihood estimates for estimating $\beta_1$ knowing $\beta_2 = 0$.
(2) However, for the LASSO ($L_1$) penalty, $a_n = \lambda_n$. Hence root-$n$ consistency requires $\lambda_n = O_p(n^{-1/2})$, while the oracle property requires $\sqrt{n}\,\lambda_n \to \infty$. These two conditions cannot be satisfied simultaneously. The authors conjecture that the oracle property does not hold for the LASSO.
(3) For the $L_q$ penalty with $q < 1$, the oracle property continues to hold with a suitable choice of $\lambda_n$.

24.
- Tibshirani (1996) proposed an algorithm for solving the constrained least squares problem of the LASSO.
- Fu (1998) provided a shooting algorithm for the LASSO.
- LASSO2, submitted by Berwin Turlach, is available at Statlib.
- Here the authors propose a unified algorithm, which optimizes problems (3.1), (3.2), and (3.3) via local quadratic approximations.

25. The Algorithm
- Rewrite (3.1), (3.2), and (3.3) as
  $\ell(\beta) + n\sum_{j=1}^d p_\lambda(|\beta_j|), \quad (3.6)$
  where $\ell(\beta)$ is a general loss function (for (3.3), the negative log-likelihood).
- Given an initial value $\beta^0$ close to the minimizer of (3.6): if $\beta_j^0$ is very close to 0, set $\hat{\beta}_j = 0$; otherwise the penalty is locally approximated by a quadratic function, via
  $[p_\lambda(|\beta_j|)]' = p'_\lambda(|\beta_j|)\,\mathrm{sgn}(\beta_j) \approx \{p'_\lambda(|\beta_j^0|)/|\beta_j^0|\}\,\beta_j.$
  In other words,
  $p_\lambda(|\beta_j|) \approx p_\lambda(|\beta_j^0|) + \tfrac{1}{2}\{p'_\lambda(|\beta_j^0|)/|\beta_j^0|\}\{\beta_j^2 - (\beta_j^0)^2\}$ for $\beta_j \approx \beta_j^0$.
- In the robust penalized least squares case, $\psi(|y - x^T\beta|)$ can be approximated by $\{\psi(|y - x^T\beta^0|)/(y - x^T\beta^0)^2\}(y - x^T\beta)^2$.
- Assume the log-likelihood function is smooth, so that it can be locally approximated by a quadratic function; the Newton-Raphson algorithm then applies. The updating equation is
  $\beta^1 = \beta^0 - \{\nabla^2 \ell(\beta^0) + n\Sigma_\lambda(\beta^0)\}^{-1}\{\nabla \ell(\beta^0) + n U_\lambda(\beta^0)\},$
  where $\Sigma_\lambda(\beta^0) = \mathrm{diag}\{p'_\lambda(|\beta_1^0|)/|\beta_1^0|, \ldots, p'_\lambda(|\beta_d^0|)/|\beta_d^0|\}$ and $U_\lambda(\beta^0) = \Sigma_\lambda(\beta^0)\,\beta^0$.
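
A minimal sketch of one local quadratic approximation (LQA) step, specialized to the penalized least squares case, i.e. loss $\ell(\beta) = \frac{1}{2}\|y - X\beta\|^2$ in (3.6), with a SCAD penalty (NumPy assumed; the near-zero cutoff eps is our choice):

```python
import numpy as np

def scad_deriv(theta, lam, a=3.7):
    """SCAD derivative p'_lam(theta) for theta >= 0, as on slide 10."""
    theta = np.abs(theta)
    return lam * ((theta <= lam)
                  + np.maximum(a * lam - theta, 0.0) / ((a - 1) * lam) * (theta > lam))

def lqa_step(X, y, beta0, lam, eps=1e-8):
    """One Newton-Raphson/LQA update for (1/2)||y - X b||^2 + n * sum p_lam(|b_j|)."""
    n = len(y)
    keep = np.abs(beta0) > eps              # coefficients near zero are set to zero
    Xk, bk = X[:, keep], beta0[keep]
    Sigma = np.diag(scad_deriv(bk, lam) / np.abs(bk))
    grad = Xk.T @ (Xk @ bk - y)             # gradient of the least squares loss
    hess = Xk.T @ Xk                        # Hessian of the least squares loss
    U = Sigma @ bk                          # U_lam(beta0) = Sigma_lam(beta0) beta0
    beta1 = np.zeros_like(beta0)
    beta1[keep] = bk - np.linalg.solve(hess + n * Sigma, grad + n * U)
    return beta1
```

Iterating lqa_step until the coefficients stabilize gives the sparse fit; coordinates dropped by the eps cutoff stay at zero in subsequent steps.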

26. Standard Error Formula
- Obtained directly from the algorithm. Sandwich formula:
  $\widehat{\mathrm{cov}}(\hat{\beta}_1) = \{\nabla^2 \ell(\hat{\beta}_1) + n\Sigma_\lambda(\hat{\beta}_1)\}^{-1}\, \widehat{\mathrm{cov}}\{\nabla \ell(\hat{\beta}_1)\}\, \{\nabla^2 \ell(\hat{\beta}_1) + n\Sigma_\lambda(\hat{\beta}_1)\}^{-1},$
  for the nonvanishing components of $\hat{\beta}$.
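
A sketch of the sandwich formula for the least squares case, reusing scad_deriv from the LQA sketch above. It assumes iid errors, so that $\mathrm{cov}\{\nabla\ell(\beta)\} = \sigma^2 X^T X$ with $\sigma^2$ estimated from the residuals; both assumptions are ours, made for this sketch.

```python
import numpy as np

def sandwich_cov(X, y, beta_hat, lam, eps=1e-8):
    """Sandwich covariance for the nonzero components (least squares loss)."""
    n = len(y)
    keep = np.abs(beta_hat) > eps
    Xk, bk = X[:, keep], beta_hat[keep]
    resid = y - Xk @ bk
    sigma2 = resid @ resid / (n - keep.sum())            # residual variance estimate
    Sigma = np.diag(scad_deriv(bk, lam) / np.abs(bk))    # from the LQA sketch
    bread = np.linalg.inv(Xk.T @ Xk + n * Sigma)
    meat = sigma2 * (Xk.T @ Xk)                          # cov of the loss gradient
    return bread @ meat @ bread
```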

27. Prediction and Model Error
- Two regression situations: $X$ random and $X$ controlled. For ease of presentation, consider only the $X$-random case.
- $PE(\hat{\mu}) = E\{Y - \hat{\mu}(x)\}^2 = E\{Y - E(Y|x)\}^2 + E\{E(Y|x) - \hat{\mu}(x)\}^2.$
- The second component is the model error.
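
A quick Monte Carlo check of this decomposition under a hypothetical model $Y = 2x + \varepsilon$, $\varepsilon \sim N(0, 1)$, with a deliberately miscalibrated fit $\hat{\mu}(x) = 1.8x$:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200_000)
y = 2.0 * x + rng.normal(size=x.size)        # E(Y|x) = 2x, noise variance 1

def mu_hat(t):
    return 1.8 * t                           # hypothetical fitted regression function

pe = np.mean((y - mu_hat(x)) ** 2)           # prediction error
model_error = np.mean((2.0 * x - mu_hat(x)) ** 2)
print(pe, 1.0 + model_error)                 # agree up to Monte Carlo noise (~1.04)
```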

28. Simulation Results (tables/figures)

29. Simulation Results (cont'd)
Note: SD = median absolute deviation of $\hat{\beta}_1$ divided by 0.6745; $SD_m$ = median of $\hat{\sigma}(\hat{\beta}_1)$; $SD_{mad}$ = median absolute deviation of $\hat{\sigma}(\hat{\beta}_1)$ divided by 0.6745.

30. Simulation Results (cont'd) (tables/figures)

31. Real Data Analysis

32.
- A family of penalty functions was introduced, of which SCAD performs favorably.
- Rates of convergence of the proposed penalized likelihood estimators were established.
- With proper choice of regularization parameters, the estimators perform as well as the oracle procedure.
- The unified algorithm was demonstrated to be effective, and standard errors were estimated with good accuracy.
- The proposed approach can be applied to various statistical contexts.

33-35. Proofs

36. The End!
