Model Selection via Bayesian Information Criterion for Quantile Regression Models


Eun Ryung Lee, Hohsuk Noh, Byeong U. Park

final version for JASA, August 5, 2013

Abstract

Bayesian Information Criterion (BIC) is known to identify the true model consistently as long as the predictor dimension is finite. Recently, its moderate modifications have been shown to be consistent in model selection even when the number of variables diverges. Those works have been done mostly in mean regression, but rarely in quantile regression. The best known results about BIC for quantile regression are for linear models with a fixed number of variables. In this paper, we investigate how BIC can be adapted to high-dimensional linear quantile regression and show that a modified BIC is consistent in model selection when the number of variables diverges as the sample size increases. We also discuss how it can be used for choosing the regularization parameters of penalized approaches that are designed to conduct variable selection and shrinkage estimation simultaneously. Moreover, we extend the results to structured nonparametric quantile models with a diverging number of covariates. We illustrate our theoretical results via some simulated examples and a real data analysis on human eye disease.

Eun Ryung Lee is postdoctoral researcher, Department of Economics, University of Mannheim, Mannheim, Germany. Hohsuk Noh is Assistant Professor, Department of Statistics, Sookmyung Women's University, Seoul, South Korea. Byeong U. Park is Professor, Department of Statistics, Seoul National University, Seoul, South Korea. E. R. Lee acknowledges financial support from the Collaborative Research Center SFB 884 "Political Economy of Reforms", funded by the German Research Foundation (DFG). H. Noh acknowledges financial support from the European Research Council under the European Community's Seventh Framework Programme (FP7) ERC Grant agreement, and the IAP research network P7/06 of the Belgian Government (Belgian Science Policy). Research of B. U. Park was supported by the NRF Grant funded by the Korea government. The authors would like to thank two referees, the Associate Editor and the Co-editor for their valuable suggestions, which have significantly improved the paper. This research was partly done while the second author was visiting the first author at the University of Mannheim. The first and second authors appreciate the support from Prof. Enno Mammen and Prof. Ingrid Van Keilegom regarding their research visits.

Key words: high-dimension, linear quantile regression, nonparametric quantile regression, model selection consistency, regularization parameter selection, shrinkage method

1 Introduction

In regression analysis, model selection is necessary in that an underfitted model brings out seriously biased results, whereas an overfitted model leads to substantial loss in estimation efficiency. Hence finding a parsimonious model with good prediction ability is an important task. For a traditional linear mean regression model, the Bayesian Information Criterion (BIC) has been used for model selection because it has been well understood that best subset selection with the BIC identifies the true model consistently (Nishii, 1984; Shao, 1997). Recently, Wang et al. and Chen and Chen (2008) found that the ordinary BIC is too liberal for large model spaces and proposed modifications that work when the number of variables increases with the sample size. The selection consistency of BIC has been extended to the M-estimation setting by several authors such as Machado (1993) and Wu and Zen (1999), particularly when the predictor dimension is finite. However, best subset selection with such criteria is computationally expensive, especially when the number of variables is large. To tackle this problem, various shrinkage methods such as the Least Absolute Shrinkage and Selection Operator (LASSO) and the Smoothly Clipped Absolute Deviation (SCAD) penalty have been proposed and recognized as a promising way of doing estimation and variable selection simultaneously. An interesting point is that, along with the popularity of such shrinkage methods, BIC has gained attention as a useful way of determining the amount of shrinkage. The seminal work of Wang and Leng (2007), followed by Wang et al. (2007b) and Wang et al., suggested using a BIC-type criterion for choosing the shrinkage parameter in penalized least squares and established the model selection consistency of the resulting procedure. Zhang et al. showed that BIC can play the same role in penalized likelihood settings.

As shrinkage methods gained popularity in mean regression, many researchers have turned their interest to variable selection in quantile regression. Wang et al. (2007a) and Wu and Liu (2009) considered shrinkage estimators for variable selection in linear quantile regression models, and Li and Zhu (2008) proposed an efficient algorithm to compute the entire solution path of L_1-norm regularized quantile regression. Recently, Li et al. and Wang et al. (2012) investigated SCAD penalization for variable selection in high-dimensional median and quantile regression, respectively. In the context of

nonparametric models, Noh et al. and Noh and Lee (2012) proposed variable selection methods for varying coefficient models and additive models in the quantile regression setting, respectively. Although some of these works considered BIC-type criteria to select the shrinkage parameter, apparently there has been no sound theory about why we should consider BIC-type criteria for model selection in quantile regression models. This is the main motivation of the present work.

Model selection in quantile regression has two interesting features. One is that, considering median regression, it can be seen as a way of achieving robustness in model selection. Along this direction, Machado (1993) proposed a BIC for median regression and showed that it identifies the true model consistently if the number of variables is finite. The other feature is that when heterogeneity exists due to either heteroscedastic variance or other types of non-location-scale covariate effects, the sets of relevant covariates can vary depending on the segment of interest in the conditional distribution. Wang et al. (2012) showed that the SCAD-penalized quantile regression estimator is able to detect this heterogeneity of the relevance pattern in high dimension. However, to the best of our knowledge no theoretical work about BIC has been done in a quantile regression context for the case where the number of variables is diverging, which is encountered in many applications. From this observation, we are motivated to establish a more general and systematic theory of BIC in quantile regression including high-dimensional cases.

The rest of this paper is organized as follows. Section 2 briefly describes how a BIC for linear quantile regression is derived and proposes its modification in the case of a diverging number of variables. We also show the selection consistency of the modified BIC there. In Section 3, we propose BICs for covariate selection in some structured nonparametric models and show that they identify the true model consistently when the number of covariates diverges. Finally, we illustrate the theoretical results with some simulated examples and a real data analysis in Sections 4 and 5, respectively. All the theoretical details are given in the Appendix.

2 BIC for Linear Quantile Regression

In this section, we describe how a BIC for a linear quantile regression model can be derived and then propose its extension to the cases where the number of variables diverges as the sample size increases.

2.1 Motivation of BIC for linear quantile regression

Consider the following linear quantile regression model:

Y_i = X_i^T β + U_i,  i = 1, ..., n,   (2.1)

where (Y_i, X_i, U_i) are independent and identically distributed as (Y, X, U) ∈ R × R^p × R and P(U ≤ 0 | X = x) = τ for almost every x. Let X_i = (X_1^i, ..., X_p^i)^T and β = (β_1, ..., β_p)^T. The number of variables p = p_n is allowed to increase with the sample size n. We omit such dependence on n in the notation for simplicity whenever it causes no confusion. We consider situations where only d covariates among the X_j^i's are relevant, i.e., p − d of the coefficients are zero in the model (2.1). This brings out the statistical problem of selecting the relevant variables.

To derive a BIC for the model (2.1), suppose that the error U_i follows an asymmetric Laplace distribution whose density is given by

f(u) = (τ(1 − τ)/σ) exp{ −ρ_τ(u)/(2σ) }   (2.2)

with the check loss ρ_τ(u) = u(2τ − 2I(u < 0)), and that U_i is independent of X_i. Since P(U_i < 0) = τ, the conditional τ-quantile of Y_i given X_i = x_i is x_i^T β.

Before deriving a BIC, we introduce some notation. The generic notation S = {j_1, ..., j_d} ⊂ {1, ..., p} denotes an arbitrary candidate model, which includes X_{j_1}, ..., X_{j_d} as relevant predictors, and we let X_S = (X_{j_1}, ..., X_{j_d})^T. Additionally, we define |S| to be the cardinality d of S. Based on these notations and setup, we first note that the maximum likelihood estimator of (β_S, σ) for a model S is given as (β̂_S, σ̂_S), where β̂_S = argmin_{β ∈ R^{|S|}} Σ_{i=1}^n ρ_τ(Y_i − X_{S,i}^T β) and σ̂_S = (2n)^{-1} Σ_{i=1}^n ρ_τ(Y_i − X_{S,i}^T β̂_S). From the definition of BIC in the form of a penalized log-likelihood (Schwarz, 1978), we obtain the following BIC for linear quantile regression:

BIC^O_L(S) = log( Σ_{i=1}^n ρ_τ(Y_i − X_{S,i}^T β̂_S) ) + |S| (log n)/(2n).   (2.3)

The above BIC for τ = 1/2 has already been considered as a robust alternative to the traditional BIC by Machado (1993), who showed its robustness in consistent model selection over a range of error distributions. Actually, Wu and Zen (1999) considered another type of BIC for median regression in the context of M-estimation.
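To make the criterion concrete, the following minimal sketch (in Python, not from the paper) computes BIC^O_L(S) for one candidate model S; it assumes the design is held in a NumPy array and uses the quantile regression fitter from statsmodels for the unpenalized fit.

```python
import numpy as np
import statsmodels.api as sm

def check_loss(u, tau):
    # rho_tau(u) = u * (2*tau - 2*I(u < 0)), the (doubled) check loss in (2.2)
    return u * (2.0 * tau - 2.0 * (u < 0))

def bic_o_l(y, X, S, tau):
    """Ordinary BIC (2.3) for a candidate model S (a list of column indices of X)."""
    n = len(y)
    X_S = X[:, list(S)]
    beta_S = sm.QuantReg(y, X_S).fit(q=tau).params      # hat{beta}_S
    loss = np.sum(check_loss(y - X_S @ beta_S, tau))
    return np.log(loss) + len(S) * np.log(n) / (2.0 * n)
```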

If we set σ = 1 in (2.2) when deriving the BIC, it leads us to one of the BICs proposed by Wu and Zen (1999), which is

Σ_{i=1}^n ρ_τ(Y_i − X_{S,i}^T β̂_S) + |S| log n.   (2.4)

Wu and Zen (1999) showed that the BIC with τ = 1/2 at (2.4) is strongly consistent under certain conditions. The criterion (2.3) is invariant under scale changes of the response variable. To see this, note that changing Y_i to cY_i for some constant c > 0 results in changing β̂_S to cβ̂_S for each S and thus adds the same constant log c to the criterion for every S. This means the criterion (2.3) chooses the same model regardless of the dependent variable's underlying scale of units, whereas (2.4) does not. For this reason, we concentrate on the criterion (2.3) in this paper, although it is also possible to extend (2.4) to quantile regression.

2.2 Modified BIC for high-dimension

Regarding the BIC in (2.3), recently Lian (2012) proved its consistency in model selection when the number of variables p is finite. However, when p is diverging with the sample size it does not guarantee a consistent result. The reason is that the ordinary BIC is too liberal for model selection and tends to seriously overfit in such a situation, as observed in our simulations. A recent attempt to remedy this problem is to put more penalty on the complexity of the model in the BIC so that it is strengthened in overfit resistance. This type of modification has been shown to be successful in the context of mean regression by Wang et al. and Chen and Chen (2008). Adopting this idea, we consider the following modification of the BIC in (2.3) for the case where p is diverging:

BIC^H_L(S) = log( Σ_{i=1}^n ρ_τ(Y_i − X_{S,i}^T β̂_S) ) + |S| (log n)/(2n) C_n,   (2.5)

where C_n is some positive constant which diverges to infinity as n increases.

Additionally, to facilitate the theoretical analysis of the BIC in (2.5) in the high-dimensional setting where the number of variables exceeds the sample size, we consider setting an upper bound on the cardinality of S, say s_n, and searching for the best model among those sub-models whose cardinality is smaller than or equal to s_n. This idea of restricting model spaces was considered in Chen and Chen (2008). We define the modified BIC with restriction as

Ŝ = argmin_{S ∈ M(s_n)} BIC^H_L(S),   (2.6)

where M(s_n) = {S ⊂ {1, ..., p_n} : |S| ≤ s_n}. In Section 2.4, we will show that the modified BIC with restriction identifies the true model consistently when p_n = O(n^κ) and s_n = O(n^α), where κ is an arbitrary positive number and 0 < α < 1/2. Without the restriction on the model space, i.e., with s_n = p_n, the modified BIC may enjoy selection consistency only when p_n = O(n^κ) for some 0 < κ < 1/2. Hereafter, we will only consider the modified BIC with restriction and omit the term "with restriction" whenever it causes no confusion.
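For small p, the restricted search in (2.6) can be carried out by direct enumeration, as in the following sketch (hypothetical code, not the authors' implementation); for large p it is replaced by the path-based construction of Section 4.1.

```python
from itertools import combinations
import numpy as np
import statsmodels.api as sm

def bic_h_l(y, X, S, tau, C_n):
    """Modified BIC (2.5) for a candidate model S."""
    n = len(y)
    X_S = X[:, list(S)]
    beta_S = sm.QuantReg(y, X_S).fit(q=tau).params
    u = y - X_S @ beta_S
    loss = np.sum(u * (2.0 * tau - 2.0 * (u < 0)))           # check loss of (2.2)
    return np.log(loss) + len(S) * np.log(n) / (2.0 * n) * C_n

def restricted_best_subset(y, X, tau, s_n, C_n):
    """Exhaustive version of (2.6): minimize BIC^H_L over all S with |S| <= s_n."""
    p = X.shape[1]
    candidates = (S for k in range(1, s_n + 1) for S in combinations(range(p), k))
    return min(candidates, key=lambda S: bic_h_l(y, X, S, tau, C_n))

# e.g. restricted_best_subset(y, X, tau=0.5, s_n=3, C_n=np.log(X.shape[1]))
```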

2.3 Main challenges in high-dimension

Showing the validity of the modified BIC at (2.5) in quantile regression is much more difficult and challenging than in mean regression, due to an intrinsic feature of quantile regression. To show the consistency of BIC^H_L, a crucial and often difficult step is to prove that BIC^H_L is resistant to overfitting, in other words,

P( min_{S: S ⊇ S*, S ≠ S*} BIC^H_L(S) > BIC^H_L(S*) ) → 1 as n → ∞,   (2.7)

where S* is the true model. Note that this is not equivalent to proving P(BIC^H_L(S) > BIC^H_L(S*)) → 1 for each overfitted candidate model S (S ⊇ S*, S ≠ S*) separately, particularly when p is diverging, which leads to a difficulty in high dimension. One simple and effective idea for establishing (2.7) is to use the following universal lower bound, valid for every overfitted model S:

min_{S: S ⊇ S*, S ≠ S*} BIC^H_L(S) − BIC^H_L(S*)
  ≥ log( Σ_{i=1}^n ρ_τ(Y_i − X_{F,i}^T β̂_F) ) − log( Σ_{i=1}^n ρ_τ(Y_i − X_{S*,i}^T β̂_{S*}) ) + (log n)/(2n) C_n
  = A(β̂_F, β̂_{S*}) + (log n)/(2n) C_n,   (2.8)

where F = {1, ..., p} is the full model. It can be shown using standard arguments in quantile regression that, when p = O(n^κ) with 0 < κ < 1/2,

A(β̂_F, β̂_{S*}) = O_p(p/n).   (2.9)

For a proof of (2.9), see the Appendix. Consequently, the universal lower bound at (2.8) is useful enough to prove (2.7) as long as p is slowly diverging with an order less than or equal to log n, since it is typically assumed that C_n → ∞ as n → ∞. However, if the order of p is larger than log n, then the lower bound is too rough, so it is no longer useful. It is possible to prove that BIC^H_L with C_n of the order of p is overfit-resistant with such a bound. However, such a modification is too naive, hence it has a considerable risk of suffering from underfit inflation in many situations.

Because of the reason discussed above, for each overfitted model S one should obtain an S-specific lower bound for BIC^H_L(S) − BIC^H_L(S*), more precisely for Σ_{i=1}^n ρ_τ(Y_i − X_{S,i}^T β̂_S) − Σ_{i=1}^n ρ_τ(Y_i − X_{S*,i}^T β̂_{S*}), as Wang et al. did in linear mean regression. Because the estimators β̂_S and β̂_{S*} have no explicit forms due to the non-differentiability of the check function at zero, obtaining such a bound is not feasible without an approximation of n^{-1} Σ_{i=1}^n ρ_τ(Y_i − X_{S,i}^T β_S) − n^{-1} Σ_{i=1}^n ρ_τ(Y_i − X_{S,i}^T β_S*) that is uniform for all β_S with β_S − β_S* lying in some compact set C_S ⊂ R^{|S|} and for all S with S ⊇ S*. Thus, getting such a uniform approximation is a usual practice in quantile regression. However, even if we get an S-specific lower bound from a uniform approximation, it does not help us prove (2.7) unless the probability that β̂_S − β_S* ∈ C_S for all S: S ⊇ S* is large enough. Proving the latter is very challenging since the number of candidate models increases exponentially with p. One of the key elements in our work is to show that there exists a common compact set C ⊂ R^p such that a rescaled version of (β̂_S, 0_{S^c}) − (β_{S*}, 0_{S*^c}) for every S: S ⊇ S* lies in the set C with large probability, where 0_{S^c} is the zero vector in R^{|S^c|}; see the supplementary file for details.

2.4 Consistency of BIC^H_L in model selection

In this section, we show that the modified BIC at (2.5) identifies the true model consistently in high-dimensional quantile regression models and discuss how the established theory can be used in the context of shrinkage methods. The generic notations S* and Ŝ, respectively, denote the true model and the model selected by BIC^H_L. Recall that S* = {j : β_j ≠ 0, 1 ≤ j ≤ p} and d = |S*| under the model (2.1). We make the following assumptions in order to facilitate the theoretical analysis of BIC^H_L.

(A1) The conditional distribution F_{U|X}(·|x) of the error U, given X = x, has a density f_{U|X} which satisfies: (i) sup_{u,x} f_{U|X}(u|x) < ∞; (ii) there exist positive constants δ_1 and δ_2 such that inf_x inf_{|u| ≤ δ_1} f_{U|X}(u|x) ≥ δ_2.

(A2) The variables X_j satisfy:

(i) max_{1≤j≤p} |X_j| ≤ M < ∞;
(ii) the eigenvalues of Σ_S = E(X_S X_S^T) are uniformly bounded away from zero and infinity over S: |S| ≤ 2s_n, that is,

0 < l_min ≤ inf_{|S| ≤ 2s_n} l_min(Σ_S) ≤ sup_{|S| ≤ 2s_n} l_max(Σ_S) ≤ l_max < ∞,

where l_min(A) and l_max(A) denote the smallest and largest eigenvalues of a real-valued square matrix A, respectively.

(A3) d is fixed, p_n = O(n^κ) for some κ > 0, and d ≤ s_n = O(n^α) for some 0 ≤ α < 1/2.

(A4) C_n → ∞ and C_n log n / n → 0.

(A5) E|U| < ∞.

The boundedness condition (i) of (A2) is commonly used for high-dimensional analysis, as in Wang et al. (2012). The condition (ii) of (A2) is the sparse Riesz condition assumed by Zhang and Huang (2008) and Chen and Chen (2008), which is one of the well-known conditions to deal with the situation where the number of regression coefficients exceeds the number of observations (p > n). The assumption on d in (A3) is rather strong considering that recent works in high-dimensional data analysis try to allow d to grow with n. The main reason for considering fixed d is the technical difficulty in dealing with the maximum of |S \ S*|^{-1} { n^{-1} Σ_{i=1}^n ρ_τ(Y_i − X_{S,i}^T β̂_S) − n^{-1} Σ_{i=1}^n ρ_τ(Y_i − X_{S*,i}^T β̂_{S*}) } over all overfitted candidate models S: S ⊇ S*, S ≠ S*, which arises from the implicit form of the quantile regression estimator. The condition (A5) is assumed for conciseness of the proof. It can be relaxed to include some heavy-tailed distributions not satisfying (A5). For example, if we add to (A4) an additional assumption on C_n, namely (log n)^4 C_n^2 / n → 0, the established theory also holds for the case where U follows the Cauchy distribution, which does not have a first moment.

The following theorem demonstrates that the modified BIC has consistency in model selection in the high-dimensional linear quantile regression model.

Theorem 2.1 Under the model (2.1), suppose that (A1)-(A5) hold. Then, we have P(Ŝ = S*) → 1 as n → ∞.

Remark. Regarding the choice of C_n, any positive sequence that satisfies (A4) works in theory. In our simulations and real data analysis, C_n = log p is used and this choice seems to work reasonably well in a wide range of settings.

The BIC^H_L may also be used to select a regularization parameter in shrinkage estimation of quantile regression models. Let β̂_λ = (β̂_{λ,1}, ..., β̂_{λ,p})^T be a penalized estimator with a regularization parameter λ that determines the amount of shrinkage. Adopting the idea of the BIC for selecting λ that was developed in the mean regression setting, we suggest choosing λ for β̂_λ as

λ̂ = argmin_λ BIC^H_L(λ) = argmin_λ [ log( Σ_{i=1}^n ρ_τ(Y_i − X_i^T β̂_λ) ) + |Ŝ_λ| (log n)/(2n) C_n ],   (2.10)

where the argmin runs over all λ > 0 such that the subset Ŝ_λ = {j : β̂_{λ,j} ≠ 0, 1 ≤ j ≤ p} selected by the penalized estimator β̂_λ has cardinality |Ŝ_λ| ≤ s_n. It can be easily shown that, as long as there exists a sequence λ_n such that the shrinkage estimator β̂_{λ_n} satisfies

lim_{n→∞} P(Ŝ_{λ_n} = S*) = 1,   (2.11)

BIC^H_L(λ) at (2.10) works as a criterion for selecting λ for β̂_λ that gives model selection consistency with a diverging number of variables, i.e., P(Ŝ_{λ̂} = S*) → 1. The property (2.11) holds for some sequence λ_n if β̂_λ possesses the oracle property in the sense of Fan and Li (2001). One example of such an estimator is the SCAD-penalized median regression estimator when p = o(n^{1/2}) (Li et al.).
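The rule (2.10) can be sketched as follows; here an L1-penalized quantile regression from scikit-learn stands in for a generic penalized estimator β̂_λ (the paper's numerical work uses SCAD), and the λ grid, the support threshold and the choice C_n = log p are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

def select_lambda(y, X, tau, lambdas, C_n, s_n):
    """Pick lambda by BIC^H_L(lambda) as in (2.10)."""
    n = len(y)
    best = None
    for lam in lambdas:
        fit = QuantileRegressor(quantile=tau, alpha=lam, fit_intercept=False,
                                solver="highs").fit(X, y)
        S = np.flatnonzero(np.abs(fit.coef_) > 1e-8)        # hat{S}_lambda
        if len(S) == 0 or len(S) > s_n:
            continue                                        # outside the allowed range
        u = y - fit.predict(X)
        loss = np.sum(u * (2.0 * tau - 2.0 * (u < 0)))      # check loss of (2.2)
        crit = np.log(loss) + len(S) * np.log(n) / (2.0 * n) * C_n
        if best is None or crit < best[0]:
            best = (crit, lam, S)
    return best   # (criterion value, selected lambda, selected support)
```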

3 BIC for Nonparametric Quantile Regression

In this section we consider two nonparametric models and their estimation based on basis function approximation. We propose some extensions of the BIC criterion derived in Section 2 to these nonparametric models. We note that Huang and Yang (2004) developed a BIC for model selection in nonparametric mean regression models in the time series context. An extension of such work to the nonparametric quantile setting has not been done even in the case where the number of covariates is fixed. Here, we consider the situation where the number of variables p = p_n is allowed to increase with the sample size n. Although we consider two examples of nonparametric quantile models, the principle of our extension is quite general and applicable to other nonparametric quantile models.

3.1 BICs for structured nonparametric models

3.1.1 Varying coefficient model

We consider the varying coefficient quantile regression model

Y_i = X_i^T β(Z_i) + U_i = Σ_{j=0}^p X_j^i β_j(Z_i) + U_i,  i = 1, ..., n,   (3.1)

where β(z) = (β_0(z), ..., β_p(z))^T is a coefficient vector, X_i = (X_0^i, X_1^i, ..., X_p^i)^T with X_0^i ≡ 1, and P(U_i ≤ 0 | X_i = x_i, Z_i = z_i) = τ for almost every (x_i, z_i). Assume that the data (Y_i, X_i, Z_i) for 1 ≤ i ≤ n are independent and identically distributed as (Y, X, Z) ∈ R × R^{p+1} × R. We assume that only d covariates among X_1, ..., X_p are relevant in the model (3.1).

To fit the model (3.1), one may approximate each coefficient function β_j using basis functions {B_jl, l = 1, ..., q_j},

β_j(z) ≈ Σ_{l=1}^{q_j} γ_jl B_jl(z),  j = 0, ..., p,   (3.2)

and then estimate the γ_jl by minimizing Σ_{i=1}^n ρ_τ( Y_i − Σ_{j=0}^p Σ_{l=1}^{q_j} γ_jl X_j^i B_jl(Z_i) ). This gives the estimators β̂_j = Σ_{l=1}^{q_j} γ̂_jl B_jl for j = 0, ..., p. Statistical properties of the estimators β̂_j were established by Kim (2007).

To define a criterion, we need estimates of the coefficient functions for each candidate model. We always include the baseline function β_0 in the model (3.1). For each S = {j_1, ..., j_d} ⊂ {1, ..., p}, define β̂_{S,j} = Σ_{l=1}^{q_j} γ̂_{S,jl} B_jl for j ∈ S ∪ {0}, where

γ̂_S = argmin_γ Σ_{i=1}^n ρ_τ( Y_i − Σ_{j ∈ S ∪ {0}} Σ_{l=1}^{q_j} γ_jl X_j^i B_jl(Z_i) ).   (3.3)

Let N_S denote the number of basis functions B_jl, j ∈ S ∪ {0}, 1 ≤ l ≤ q_j, used to fit the model S with the approximation (3.2), i.e., N_S = q_0 + Σ_{j ∈ S} q_j. Following the idea in Section 2, we propose the following extension of BIC^H_L for the model (3.1):

BIC^H_VC(S) = log( Σ_{i=1}^n ρ_τ( Y_i − Σ_{j ∈ S ∪ {0}} X_j^i β̂_{S,j}(Z_i) ) ) + N_S (log n)/(2n) C_n,   (3.4)

where C_n is a positive constant which diverges to infinity as n increases. The diverging order of C_n will be specified in the asymptotic theory. As in linear quantile regression, we select the model which minimizes BIC^H_VC within the restricted model space M(s_n).
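A small sketch of (3.3)-(3.4): each coefficient function is expanded in a spline basis, the expanded columns X_j^i B_l(Z_i) are passed to a linear quantile regression, and BIC^H_VC is evaluated. A cubic truncated-power basis is used below purely for simplicity (it spans the same function space as B-splines); the knot placement and the use of statsmodels are illustrative assumptions. The criterion (3.6) for the additive model below is computed in the same way, with blocks B_l(X_j^i) in place of X_j^i B_l(Z_i).

```python
import numpy as np
import statsmodels.api as sm

def spline_basis(z, knots):
    # cubic truncated-power basis: 1, z, z^2, z^3, (z - kappa)_+^3 for each knot
    cols = [np.ones_like(z), z, z**2, z**3]
    cols += [np.clip(z - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)                     # n x q

def bic_h_vc(y, X, z, S, tau, C_n, knots):
    """BIC^H_VC in (3.4) for a candidate model S; the baseline beta_0 is always kept."""
    n = len(y)
    B = spline_basis(z, knots)                       # q basis functions in Z
    blocks = [B] + [X[:, [j]] * B for j in S]        # X_j^i * B_l(Z_i) blocks, j in S
    design = np.hstack(blocks)
    fit = sm.QuantReg(y, design).fit(q=tau)
    u = y - design @ fit.params
    loss = np.sum(u * (2.0 * tau - 2.0 * (u < 0)))
    N_S = design.shape[1]                            # N_S = (|S| + 1) q
    return np.log(loss) + N_S * np.log(n) / (2.0 * n) * C_n
```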

3.1.2 Additive model

In a similar spirit to the BIC in (3.4), we may also develop a BIC for the additive quantile regression model

Y_i = μ + Σ_{j=1}^p m_j(X_j^i) + U_i,  i = 1, ..., n,   (3.5)

where (Y_i, X_i, U_i), 1 ≤ i ≤ n, are independent and identically distributed as (Y, X, U) ∈ R × [0, 1]^p × R and P(U ≤ 0 | X = x) = τ for almost every x. For identification, we assume that ∫_0^1 m_j(x) dx = 0 for j = 1, ..., p. Also, we assume that only d covariates among the X_j's are relevant in the model (3.5). With a basis {B_jl}_{l=1}^{q_j} for approximating m_j, the estimators for a candidate model S are given by m̂_{S,j} = Σ_{l=1}^{q_j} γ̂_{S,jl} B_jl for j ∈ S, where

γ̂_S = argmin_γ Σ_{i=1}^n ρ_τ( Y_i − γ_0 − Σ_{j ∈ S} Σ_{l=1}^{q_j} γ_jl B_jl(X_j^i) ).

We always include the intercept μ in the model (3.5), and take μ̂_S = γ̂_{S,0}. Similarly to the BIC at (3.4), we propose to use the following BIC as a model selection criterion for the model (3.5):

BIC^H_AD(S) = log( Σ_{i=1}^n ρ_τ( Y_i − μ̂_S − Σ_{j ∈ S} m̂_{S,j}(X_j^i) ) ) + N_S (log n)/(2n) C_n,   (3.6)

where N_S = 1 + Σ_{j ∈ S} q_j. Here again, we find the optimal model over the model space M(s_n).

3.2 Consistency of the BICs in covariate selection

If one approximates a nonparametric model by a parametric model, then the problem of estimating the approximate model is essentially the same as the estimation of a parametric model. In that sense, the two estimation problems are methodologically similar. However, from the model selection point of view, the two are different since, for a nonparametric model, the selection of the coefficients of the variables in the approximate model is done groupwise. For example, in our varying coefficient model at (3.1), the variables {X_j B_jl(Z) : 1 ≤ l ≤ q_j} enter and leave the model together. From a theoretical point of view, one also needs to take care of the approximation error in the nonparametric setting.

To facilitate our theoretical analysis of BIC^H_VC and BIC^H_AD, we introduce some notation. Let X denote the predictor variable (X, Z) or X, and X^i its ith observation. The true index sets in the nonparametric models (3.1) and (3.5) are S* = {1 ≤ j ≤ p : ||β_j||_{L_2} ≠ 0} and S* = {1 ≤ j ≤ p : ||m_j||_{L_2} ≠ 0}, respectively, where ||·||_{L_2} denotes the L_2-norm. To present theoretical results for the two models in one framework, we use the following generic notations, whose definitions change according to the model. Let π_j = (B_j1, ..., B_jq_j)^T be the vector of basis functions used to approximate the jth coefficient function β_j in (3.1) or the jth additive component function m_j in (3.5). Suppose that the π_j^T γ_j* are the best approximations, in the sup-norm, of the β_j's or the m_j's. Let r_nj denote the approximation error, that is, r_nj = sup_z |π_j(z)^T γ_j* − β_j(z)| or r_nj = sup_x |π_j(x)^T γ_j* − m_j(x)|. With some slight abuse of notation we denote, by Π = (Π_0, ..., Π_p), both Π(x, z) = (π_0(z)^T, x_1 π_1(z)^T, ..., x_p π_p(z)^T)^T under the varying coefficient model (3.1) and Π(x) = (1, π_1(x_1)^T, ..., π_p(x_p)^T)^T under the additive model (3.5). Let Π^i be the value of Π at the ith observation X^i. Also, let Π_S = (Π_0^T, Π_{j_1}^T, ..., Π_{j_d}^T)^T where S = {j_1, ..., j_d}. Define r_n* = R_n(γ*), where γ* = (γ_0*^T, γ_1*^T, ..., γ_p*^T)^T and R_n(γ) is defined to be sup_{x,z} |Σ_{j=0}^p x_j β_j(z) − Π(x, z)^T γ| under (3.1) or sup_x |μ + Σ_{j=1}^p m_j(x_j) − Π(x)^T γ| under (3.5). Then, Π^T γ* is the best approximation of the true τth conditional quantile in the function space generated by the chosen basis functions. Using these notations, the problems of estimating γ for the two nonparametric models can be written in the single form of the minimization problem

γ̂_S = argmin_γ Σ_{i=1}^n ρ_τ( Y_i − Π_S^{iT} γ ).   (3.7)

For the approximation of the jth coefficient or additive component function, the number of basis functions, q_j, should tend to infinity as n → ∞, so the q_j depend on n. However, we omit such dependence whenever it causes no confusion. Under the mild condition that lim sup_n max_j q_j / min_j q_j < ∞, without loss of generality we can assume q_j = q for all j in our asymptotic analysis.

We make the following assumptions to establish the model selection consistency results of the BICs for the nonparametric quantile regression models.

(B1) The conditional distribution F_{U|X}(·|x) of the error U, given X = x, has a density f_{U|X} that satisfies (i) and (ii) of (A1).

(B2) Let q = q_n ≍ n^{1/(2r+1)}, where r is the constant in (iii) below and a_n ≍ b_n means that the ratio a_n/b_n is bounded away from zero and infinity. The matrices Π^i and Π_S satisfy:
(i) max_{1≤i≤n} max_{0≤j≤p} ||Π_j^i|| = O(√q);

(ii) all the eigenvalues of Σ_S = E(Π_S Π_S^T) are uniformly bounded away from zero and infinity over S: |S| ≤ 2s_n, that is,

0 < l_min ≤ inf_{|S| ≤ 2s_n} l_min(Σ_S) ≤ sup_{|S| ≤ 2s_n} l_max(Σ_S) ≤ l_max < ∞;

(iii) r_nj = O(q^{-r}) and r_n* = O(q^{-r}) for all j and some r > 1/2.

(B3) d is fixed, p = O(n^κ) for some κ > 0, and d ≤ s_n = O(n^α) for some 0 ≤ α < (2r − 1)/(4r + 2).

(B4) C_n → ∞ and C_n q log n / n → 0.

(B5) E|U| < ∞.

The conditions (i) and (iii) of (B2) are not strong and are satisfied in most nonparametric estimation problems based on spline basis approximation; see Kim (2007) and Horowitz and Lee (2005), for example. In particular, the constant r in (iii) depends on the degree of smoothness of the coefficient or component functions as well as on the basis functions. Specifically, when the dth order derivative of β_j or m_j for every j satisfies the Hölder condition of order γ, B-splines of order d + 1 give r = d + γ (Schumaker, 1981). The condition (ii) of (B2) is an extension of (ii) of (A2) to the nonparametric models, which Wei et al. have already used for the analysis of high-dimensional varying coefficient models. When the covariates are independent, the condition (ii) of (B2) holds for the B-spline basis that Kim (2007) and Horowitz and Lee (2005) used. With these assumptions, we obtain the model selection consistency of BIC^H_VC and BIC^H_AD when the number of covariates is diverging.

Theorem 3.1 Under the model (3.1) or (3.5), suppose that (B1)-(B5) hold. Then, we have P(Ŝ = S*) → 1 as n → ∞.

As we discussed at the end of Section 2 in the case of linear quantile regression, the above result justifies the use of BIC^H_VC and BIC^H_AD for the selection of a tuning parameter in nonparametric penalized quantile regression estimation. Additionally, our proof of Theorem 3.1 reveals that BIC^O_VC and BIC^O_AD, which are BIC^H_VC and BIC^H_AD with C_n = 1 respectively, work when the number of covariates p is finite.

Remark. In the structured nonparametric models, BIC may be used to determine the number of basis functions as well as to choose significant variables. In our work, we focus on the use of the

BICs in choosing significant covariates when the appropriate number of basis functions for each component or coefficient is given. When there is no need of covariate selection, the BICs can also be used to determine the number of basis functions in practice. Such examples can be found in He and Shi (1996), Doksum and Koo (2000), Horowitz and Lee (2005) and Kim (2007).

4 Simulation Studies

Simulation experiments are conducted to confirm our theoretical results. We consider three quantile regression models introduced in Sections 2 and 3, one of which is a parametric linear model and the others are nonparametric additive and varying coefficient models. For all models, the error ε_i is independent of the covariate vector X_i or (X_i, Z_i). We examine the performance of the modified BICs in various scenarios, changing the number of variables p relative to the sample size n. We consider the case (n, p) = (100, 10), where p is much smaller than n and thus the ordinary BICs BIC^O_L, BIC^O_VC and BIC^O_AD are applicable. For high-dimensional models, we consider (n, p) = (100, 100) and (100, 200) for the parametric model, and (n, p) = (150, 100) for the nonparametric models. In the latter scenarios, since p is comparable to or larger than n, the ordinary BICs are expected to fail in identifying the true model and the modified ones to excel. As for s_n, we set s_n = 50 for BIC^H_L and s_n = 20 for BIC^H_VC and BIC^H_AD. Here are the detailed descriptions of the models.

Model L (Linear model):

Y_i = 3X_1^i − 1.5X_2^i + 2X_5^i + 2X_8^i ε_i,  i = 1, ..., n,   (4.1)

where the covariates (X_1^i, ..., X_p^i) are generated from a multivariate normal distribution with mean 0 and covariance matrix Σ = (σ_{j_1, j_2}) with σ_{j_1, j_2} = 0.5^{|j_1 − j_2|}. Then, the eighth covariate X_8^i is replaced by a random number from the uniform distribution U[0.5, 1]. For ε_i, we consider the standard normal distribution and the t distribution with 2 degrees of freedom.

Model VC (Varying coefficient model):

Y_i = β_0(Z_i) + β_1(Z_i) X_1^i + β_2(Z_i) X_2^i + X_3^i ε_i(τ),  i = 1, ..., n,   (4.2)

where the covariates X_i = (X_1^i, ..., X_p^i) are generated from a multivariate normal distribution with mean 0 and covariance matrix Σ = (σ_{j_1, j_2}) with σ_{j_1, j_2} = 0.7^{|j_1 − j_2|}, and Z_i is drawn from the uniform distribution

U[0, 1], and ε_i(τ) = ε_i − F_ε^{-1}(τ). Here, the subtraction of F_ε^{-1}(τ) from ε_i is to make the τth quantile of ε_i(τ) equal to zero. As for the distribution F_ε, we consider N(0, 3). Finally, we set β_0(u) = 4u, β_1(u) = 2 sin(2πu), and β_2(u) = 1 + 3u(1 − u).

Model AD (Additive model):

Y_i = f_1(X_1^i) + f_2(X_2^i) + f_3(X_3^i) + σ_4(X_4^i) ε_i,  i = 1, ..., n,   (4.3)

where f_1(x) = 5x, f_2(x) = 4 sin(2πx)/(2 − sin(2πx)), f_3(x) = 6{0.1 sin(2πx) + 0.2 cos(2πx) + 0.3 sin²(2πx) + 0.4 cos³(2πx) + 0.5 sin³(2πx)} and σ_4(x) = 4.5(2x − 1)². The error ε follows a standard normal distribution and the covariates X_1^i, ..., X_p^i have a compound symmetry covariance structure: X_k = (W_k + tU)/(1 + t), k = 1, ..., p, where W_1, ..., W_p and U are i.i.d. from U[0, 1]. We set t =

4.1 Construction of sub-models

When p is large, it is not computationally feasible to calculate BIC^H_L, as well as BIC^H_VC and BIC^H_AD, for all S ∈ M(s_n). A simple solution to this problem is to construct a sequence of sub-models and choose the best model among them. A typical approach for this is to use the regularization path of a penalized estimator. Recently, many efficient methods for finding the path have been developed, such as the LARS algorithm (Efron et al., 2004) and coordinate-descent based algorithms (Mazumder et al., 2011; Breheny and Huang). Below we give some details about the model selection procedure with BIC^H_L at (2.5) and a sparse penalized regression method. For a given penalty p_λ indexed by λ > 0, compute the regularization path of a penalized estimator {β̂_λ : λ > 0}, where

β̂_λ = argmin_β Σ_{i=1}^n ρ_τ(Y_i − X_i^T β) + n Σ_{j=1}^p p_λ(|β_j|).   (4.4)

Let Γ = {λ > 0 : |Ŝ_λ| ≤ s_n}. Choose the best model Ŝ among the Ŝ_λ with λ ∈ Γ, i.e.,

Ŝ = argmin_{Ŝ_λ, λ ∈ Γ} BIC^H_L(Ŝ_λ) = argmin_{Ŝ_λ, λ ∈ Γ} [ log( Σ_{i=1}^n ρ_τ(Y_i − X_{Ŝ_λ,i}^T β̂_{Ŝ_λ}) ) + |Ŝ_λ| (log n)/(2n) C_n ].

Here, β̂_{Ŝ_λ} should not be confused with β̂_λ defined at (4.4). The former is the minimizer of Σ_{i=1}^n ρ_τ(Y_i − X_{Ŝ_λ,i}^T β_{Ŝ_λ}) over β_{Ŝ_λ} ∈ R^{|Ŝ_λ|}. In a similar manner, we construct sub-models for BIC^H_VC and BIC^H_AD using the penalized quantile regression estimators in the varying coefficient and additive models.

Throughout all numerical simulations, we use the SCAD-penalized quantile regression estimator to construct a sequence of sub-models; a code sketch of this construction is given at the end of this subsection. To implement the regularization path, we compute the penalized estimators over a fine grid of λ by applying the local linear approximation algorithm iteratively, starting from the zero initial vector. For numerical details, see Wang et al. and Noh et al. In the case of high-dimensional linear mean regression, Kim and Kwon (2012) and Zhang (2010) established the consistency of the solution path, which means that the regularization path includes the true model for a non-convex penalized estimator with either the SCAD penalty or the minimax concave penalty, under certain conditions. However, such results are not available in high-dimensional quantile regression. Because of this, in our Monte Carlo simulations we report the proportion of cases where the regularization path includes the true model (the column "incl." in Tables 1 and 2). We observe that the proportion of inclusion of the true model is generally high. Additionally, to see whether a failure of the BICs can be attributed to the exclusion of the true model from the path, we add the true model to the sequence of candidate sub-models whenever it is not in the path, and evaluate the model selection performance of the BICs. We find that the model selection performance does not seem to be significantly affected by whether or not we artificially add the true model to the candidate models; see the online supplementary file for the simulation results about this.
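The construction just described can be sketched as follows (hypothetical code, not the authors' implementation): the SCAD derivative supplies coefficient-specific weights, one local linear approximation (LLA) step is solved as a weighted L1-penalized quantile regression (here via column rescaling and scikit-learn's L1 quantile fitter, an assumed stand-in for the solver actually used), and the distinct supports along the λ grid form the candidate sub-models to be ranked by BIC^H_L.

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

def scad_derivative(x, lam, a=3.7):
    # derivative of the SCAD penalty on [0, infinity)
    x = np.abs(x)
    return lam * (x <= lam) + np.clip(a * lam - x, 0.0, None) / (a - 1.0) * (x > lam)

def lla_step(y, X, tau, lam, beta_init, a=3.7):
    """One local linear approximation step for the SCAD-penalized fit (4.4)."""
    w = scad_derivative(beta_init, lam, a)                 # per-coefficient weights
    scale = np.maximum(w, 1e-6 * lam) / lam                # avoid exactly zero weights
    fit = QuantileRegressor(quantile=tau, alpha=lam, fit_intercept=False,
                            solver="highs").fit(X / scale, y)
    return fit.coef_ / scale                               # undo the column rescaling

def candidate_submodels(y, X, tau, lambdas, s_n, n_iter=3):
    """Collect the supports visited along the regularization path."""
    supports = set()
    for lam in lambdas:
        beta = np.zeros(X.shape[1])                        # zero initial vector
        for _ in range(n_iter):                            # iterate the LLA step
            beta = lla_step(y, X, tau, lam, beta)
        S = tuple(np.flatnonzero(np.abs(beta) > 1e-8))
        if 0 < len(S) <= s_n:
            supports.add(S)
    return supports                                        # rank these by BIC^H_L
```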

4.2 Model selection consistency of the modified BICs

To calculate the BICs for the nonparametric models (Models VC and AD), we use cubic splines and set the number of knots to be 3, which we find enables the corresponding basis to approximate all the coefficient and component functions reasonably well in our simulations. For the assessment of model selection, we say that S is correct if S = S*; S overfits if S ⊇ S* and S ≠ S*; and S underfits if S ⊉ S*, and we report the percentage of each case out of 200 Monte Carlo replications. Additionally, we report the average number of correctly selected nonzero components in the column labeled NC and the average number of zero components incorrectly selected into the final model in the column labeled NIC. The standard deviations based on the 200 replications are presented in parentheses.

Tables 1 and 2 summarize the model selection results of BIC^O_L and BIC^H_L based on 200 replications when the error is normal and t-distributed, respectively. Note that, due to the heteroscedastic error of the model, the index set of the relevant variables changes depending on the quantile level τ: it is {1, 2, 5} for τ = 0.5 and {1, 2, 5, 8} for τ ≠ 0.5. When the predictor dimension is moderate (p = 10), both BICs seem to work reasonably well. However, in high-dimensional situations (p = 100 and 200) the ordinary BIC tends to overfit seriously, which was already observed in the mean regression setting by Chen and Chen (2008), whereas the modified BIC performs quite well in resisting overfitting without losing much efficiency in detecting the true nonzero variables overall. Additionally, we observe that when τ = 0.25 and 0.75 both BICs tend to underfit more than when τ = 0.5. This is because it is more difficult to identify the variable X_8 involved in the heteroscedasticity than the other variables. Detecting such heterogeneity of relevant variables seems to become even more difficult in high dimension, especially considering the results for t errors with p = 200 at τ = 0.25 and 0.75.

Concerning BIC^H_VC and BIC^H_AD, the model selection results are presented in Table 3. Note that S* = {1, 2} and S* = {1, 2, 3} for Models VC and AD, respectively, when τ = 0.5. Because we find that C_n = log p makes the corresponding BICs for both nonparametric models excessively overfit-resistant, resulting in underfit inflation, we also take the choices C_n = (log p)/2 and (log p)/3. Additionally, we present the results of the case where C_n = 1, which corresponds to the ordinary BIC. From Table 3, we learn the same lesson as in the parametric case: in high-dimensional situations the modified BICs for the nonparametric models manage to control the proliferation of overfitting without losing much of the sensitivity for detecting significant covariates. However, this advantage over the ordinary BICs is dependent on the choice of C_n, which is somewhat subjective in our framework. This issue deserves further investigation in the future.
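The summary measures used in the tables can be computed from a selected index set with a few lines (a sketch, not the authors' code):

```python
def selection_summary(S, S_star):
    """Classify a selected set S against the true set S_star and count NC / NIC."""
    S, S_star = set(S), set(S_star)
    if S == S_star:
        status = "correct"
    elif S >= S_star:
        status = "overfit"            # contains S_star but is strictly larger
    else:
        status = "underfit"           # misses at least one relevant variable
    nc = len(S & S_star)              # correctly selected nonzero components (NC)
    nic = len(S - S_star)             # irrelevant variables selected (NIC)
    return status, nc, nic
```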

Table 1: Model selection results when n = 100 with the normal errors.
[For each τ ∈ {0.25, 0.5, 0.75} and p ∈ {10, 100, 200}, the table reports the inclusion proportion of the true model in the path (incl.) and, for each of BIC^O_L and BIC^H_L, the percentages C, O, U together with NC and NIC; the numerical entries are not recoverable from this copy.]
Note: The numbers under C, O and U, respectively, are the percentages of the cases in which the selected index sets are correct, overfitted and underfitted.

Table 2: Model selection results when n = 100 with the t errors.
[Same layout as Table 1; the numerical entries are not recoverable from this copy.]
Note: The numbers under C, O and U, respectively, are the percentages of the cases in which the selected index sets are correct, overfitted and underfitted.

4.3 Effectiveness as a regularization parameter selector

To evaluate the performance of the BIC defined at (2.10) as a regularization parameter selector for penalized estimators, we use 200 replications of random samples of n = 100 and p = 50 from Model L. For illustration, we consider the one-step SCAD-penalized estimator

β̂_λ = (β̂_{λ,1}, ..., β̂_{λ,p})^T = argmin_β Σ_{i=1}^n ρ_τ(Y_i − X_i^T β) + n Σ_{j=1}^p ṗ_λ(|β̃_j|) |β_j|,   (4.5)

which is implemented in the spirit of Zou and Li (2008). Here, the function ṗ_λ is the derivative of the SCAD penalty function, defined on R_+ as

ṗ_λ(x) = λ I(x ≤ λ) + (aλ − x)_+ /(a − 1) I(x > λ)

for some constant a > 2, where I is the indicator function and β̃ = (β̃_1, ..., β̃_p)^T is the unpenalized estimator obtained with λ = 0 in (4.5). Table 4 gives model selection results of the estimator in (4.5) when the regularization parameter λ is chosen by BIC^H_L(λ) at (2.10) and by

BIC^O_L(λ) = log( Σ_{i=1}^n ρ_τ(Y_i − X_i^T β̂_λ) ) + |Ŝ_λ| (log n)/(2n).   (4.6)

We also present the results when λ is selected by cross-validation (5-fold in our simulation) based on the check loss ρ_τ, as was considered in Wu and Liu (2009). We observe that BIC^H_L(λ), as a regularization parameter selector, is as effective as BIC^H_L, as a model selection criterion, when the number of variables is comparable to the sample size, whereas the cross-validation and BIC^O_L(λ) lead to considerable overfitting.

5 Real Data Example

For illustration of the modified BIC for high-dimensional linear quantile regression, we use the data analyzed in Scheetz et al. (2006), which contain gene expression values of 31,042 probe sets on 120 rats. Gene expression levels were analyzed on a log scale with base 2. The main objective of this analysis is to study how the expression of gene TRIM32 (measured by one probe set), known to cause human hereditary diseases of the retina, depends on the expression of other genes. Following Scheetz et al. (2006), we exclude probes that were not expressed in the eye or were not sufficiently variable from the 31,042 probe sets: we remove each probe for which the maximum expression value among the 120 rats is less than the 25th percentile of the entire set of expression values, and choose probes that exhibited at least 2-fold variation in expression level among the 120 rats. After this process, we have 18,986 probes left.
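The probe screening just described amounts to two simple filters; a sketch follows, assuming the log2 expression values are held in a probes-by-rats NumPy array (variable names are illustrative).

```python
import numpy as np

def screen_probes(expr):
    """Keep probes that are expressed and sufficiently variable (Scheetz et al. rule)."""
    q25 = np.quantile(expr, 0.25)                  # 25th percentile of all expression values
    expressed = expr.max(axis=1) >= q25            # maximum expression over the 120 rats
    variable = (expr.max(axis=1) - expr.min(axis=1)) >= 1.0   # 2-fold variation on log2 scale
    return expr[expressed & variable]
```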

Table 3: Model selection results of BIC^H_VC and BIC^H_AD when n = 100 (τ = 0.5).
[For p ∈ {10, 100} and the choices C_n = 1, log p, (log p)/2 and (log p)/3, the table reports C, O, U, NC and NIC for each of BIC^H_VC and BIC^H_AD; the numerical entries are not recoverable from this copy.]
Note: The numbers under C, O and U, respectively, are the percentages of the cases in which the selected index sets are correct, overfitted and underfitted.

Table 4: Model selection results of the penalized estimator (τ = 0.5) when n = 100 and p = 50.

                 N(0, 1)                              t(2)
Method       C        O        U       NC   NIC      C      O        U        NC   NIC
BIC^O_L      56.0%    44.0%    0.0%    .    .        .      42.0%    8.5%     .    .
BIC^H_L      97.0%    0.5%     2.5%    .    .        .      5.0%     27.0%    .    .
CV           71.0%    28.0%    1.0%    .    .        .      33.5%    20.0%    .    .

[Entries marked "." are not recoverable from this copy.]
Note: The numbers under C, O and U, respectively, are the percentages of the cases in which the selected index sets are correct, overfitted and underfitted.

Among these probes, as in Huang et al. (2008), we select the 3000 genes with the largest variance in expression value, and then choose the top 300 genes whose expression values have the largest absolute correlation with the expression of gene TRIM32. We then apply several methods to find genes relevant to TRIM32 at different quantiles τ = 0.25, 0.5, 0.75. To be more specific, we consider the following linear quantile regression model as in Wang et al. (2012):

Q_τ(Y_i | X_i) = β_0(τ) + Σ_{j=1}^{300} β_j(τ) X_j^i,

where Q_τ(Y | X) denotes the τth conditional quantile of the expression of gene TRIM32 (Y) given the expressions of the other genes X = (X_1, ..., X_300)^T. The appropriateness of the linear quantile regression model for these data has been checked in Wang et al. (2012) via the simulation-based graphical method of Wei and He (2006). Since it is impossible to do an exhaustive search over 300 genes for the best model, we rely on the regularization path of the penalized estimator as in Section 4 and restrict the cardinality of the sub-models to 20. With the selected sub-models, we find relevant genes at the quantiles τ = 0.25, 0.5, 0.75 using BIC^H_L and the 5-fold cross-validation (CV1). For comparison, we also present the result of the SCAD-penalized estimator in Wang et al. (2012) with the shrinkage parameter λ selected by the 5-fold cross-validation (CV2). To assess the quality of the model selection results, we conduct 50 random partitions. For each partition, we randomly choose 80 rats as the training data and 40 rats as the test data. With the training data, we conduct model selection based on the three methods, and we report the average number of nonzero coefficients and the average prediction error calculated on the test data in Table 5. The standard errors based on the 50 replications are presented in parentheses. As a measure of prediction error, we use 40^{-1} Σ_{i=1}^{40} ρ_τ(Y_i − Ŷ_i). From Table 5, we observe that the model chosen by the modified BIC is much simpler than the ones chosen by cross-validation but shows comparable or even better prediction error performance across all the quantiles we consider, which suggests that the proposed BIC is a sensible criterion for model selection in high-dimensional linear quantile regression.
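A sketch of the remaining screening steps and of the prediction-error measure used in Table 5 follows; the rows of expr are the retained probes, y is the TRIM32 expression over the same rats, and all names are illustrative assumptions.

```python
import numpy as np

def build_design(expr, y, n_var=3000, n_cor=300):
    """Keep the 3000 highest-variance probes, then the 300 most correlated with y."""
    idx_var = np.argsort(expr.var(axis=1))[-n_var:]
    cors = np.array([abs(np.corrcoef(expr[i], y)[0, 1]) for i in idx_var])
    idx = idx_var[np.argsort(cors)[-n_cor:]]
    return expr[idx].T                             # n x 300 design matrix

def prediction_error(y_test, y_hat, tau):
    """Average check loss on the test rats, as reported in Table 5."""
    u = y_test - y_hat
    return np.mean(u * (2.0 * tau - 2.0 * (u < 0)))
```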

Table 5: Analysis of the gene expression dataset.
[For each quantile level τ = 0.25, 0.5, 0.75 and each method (BIC^H_L, CV1, CV2), the table reports the average number of nonzero coefficients and the average prediction error; the numerical entries are not recoverable from this copy.]

6 Discussion

Even though the BIC for linear quantile regression has been known to select the true model consistently when the predictor dimension is finite, it has not been understood how a diverging number of variables would affect the selection consistency of BIC in quantile regression. In this work we propose several extensions of BIC in quantile regression for the case of a diverging number of variables, both in parametric and nonparametric models, and prove their consistency in model selection.

Since we extend the BIC to nonparametric models via a basis approximation, it is natural to ask whether such an extension is also possible for the kernel smoothing approach. One can think of a BIC for nonparametric quantile regression in the kernel smoothing context by modifying the proposed nonparametric BICs based on the well-known correspondence h ≍ q^{-1} between the bandwidth h and the number of basis functions q. Additionally, it seems possible to obtain the extension from the BIC for parametric quantile regression using the effective sample size nh in kernel smoothing, instead of the original sample size n, as Wang and Xia (2009) did in varying coefficient mean regression models.

In our simulations, we find that the finite sample performance of the modified BICs for high-dimensional quantile regression seems to vary with the choice of C_n. Although we provide a quite wide range of C_n that works in theory, a practical choice of C_n can be subjective, so it deserves further research. As one of the referees commented, it is worthwhile to investigate whether a rigorous Bayesian formulation is available for quantile regression as an extension of the work of Chen and Chen (2008) in the mean regression setting. Based on such an investigation, one may discover another BIC with a Bayesian justification, which may provide a suitable choice of C_n for our modified BICs.

Our theoretical results about model selection consistency can be extended to the ultra-high dimensional case, depending on the cardinality restriction s_n of the candidate sub-models. To be specific, our modified BICs continue to have selection consistency when p = O(exp(n^κ)) for 0 < κ < 1/2 − α, where α is the constant in (A3) for the linear model, and the one in (B3) for the structured nonparametric models. This result only requires a stronger condition on C_n, namely C_n log n / log p → ∞ instead of C_n → ∞, in the conditions (A4) and (B4). Further, when the model dimension is very large, it is worthwhile to consider reducing the number of covariates using a recently developed screening procedure for quantile regression, such as He et al. (2013), before applying our proposed BICs to choose the final model. Finally, we remark that our modified BICs with a broad range of C_n can also be understood within the framework of the generalized information criterion.

Appendix

We present only the proof of Theorem 3.1, because in nonparametric quantile regression one should handle the model approximation error as well, so that it is more complicated than the proof in the linear quantile regression setting. The proof of Theorem 2.1, which considers the linear quantile model, can be found in the online supplementary file.

A.1 Proof of Theorem 3.1

Let M*(s_n) = {S ∈ M(s_n) : S ⊇ S*} and B_S(δ) = {θ ∈ R^{N_S} : ||θ|| ≤ δ}, where ||·|| is the Euclidean norm. Recall that N_S is the number of basis functions used to fit the model S, i.e., N_S = (|S| + 1)q under the varying coefficient model (3.1), or N_S = |S| q + 1 under the additive model (3.5). We denote the maximum of N_S over S ∈ M(s_n) by N. Let R_n = Π^T γ* − Q_τ(X) and R_n^i = Π^{iT} γ* − Q_τ(X^i), where Q_τ(x) is the τth conditional quantile of the response variable Y given X = x. Define l̂_max = sup_{|S| ≤ 2s_n} l_max(Σ̂_S) and l̂_min = inf_{|S| ≤ 2s_n} l_min(Σ̂_S), where Σ̂_S = n^{-1} Σ_{i=1}^n Π_S^i Π_S^{iT}. For a matrix A, let ||A||_M = sup_{u: ||u|| = 1} ||Au|| denote the operator norm of A. Using Theorem 1.6 of Tropp (2012) with the assumption (B2), one has that there exists a constant C > 0 such that for all t ≥ 0,

P( || n^{-1} Σ_{i=1}^n Π_j^i Π_k^{iT} − E(Π_j Π_k^T) ||_M > t ) ≤ Cq exp( −nt² / (Cq² + qt) ).

This gives sup_{0 ≤ j,k ≤ p} || n^{-1} Σ_{i=1}^n Π_j^i Π_k^{iT} − E(Π_j Π_k^T) ||_M = O_p( (log p / n)^{1/2} q ). Since

|| Σ̂_S − Σ_S ||_M = sup_{u,v: ||u|| = ||v|| = 1} | u^T (Σ̂_S − Σ_S) v | ≤ (2s_n + 1) sup_{0 ≤ j,k ≤ p} || n^{-1} Σ_{i=1}^n Π_j^i Π_k^{iT} − E(Π_j Π_k^T) ||_M = o_p(1)

uniformly over S with |S| ≤ 2s_n, we may assume without loss of generality that there exist positive constants c and C such that 0 < c < l̂_min ≤ l̂_max < C < ∞.

23 models. This result only requires a stronger condition on C n, which is C n log n/ log p instead of C n in the conditions A4 and B4. Further, when the model dimension is very large, it is worthwhile to consider reducing the number of the covariates using a recently developed screening procedure for quantile regression, such as He et al. 2013, before applying our proposed BICs to choose the final model. Finally, we remark that our modified BICs with a broad range of C n can be also understood in a framework of the generalized information criterion. Appendix We present only the proof of Theorem 3.1, because in nonparametric quantile regression one should handle the model approximation error as well so that it is more complicated than the one in the linear quantile regression setting. The proof of Theorem 2.1, which considers the linear quantile model, can be found in the online supplementary file. A.1 Proof of Theorem 3.1 Let M s n = { Ms n : } and B δ = {θ R N : θ δ}, where is the Euclidean norm. Recall that N is the number of the basis functions used to fit the model, i.e., N = q + 1 under the varying coefficient model 3.1, or N = q + 1 under the additive model 3.5. We denote the maximum of N over Ms n by N. Let R n = Π γ Q τ X and R i n = Π i γ Q τ X i, where Q τ x is the τth conditional quantile of the response variable Y given X = x. Define ˆl max = sup sn l max ˆΣ and ˆl min = inf sn l min ˆΣ where ˆΣ = n 1 n Πi Πi. For a matrix A, let A M = sup u: u =1 Au denote the operator norm of A. Using Theorem 1.6 of Tropp 2012 with the assumption B2, one has that there exists a constant C > 0 such that for all t 0, P n 1 Π i jπ i k EΠ jπ k > t M This gives sup 0 j,k p n Πi j Πi k /n EΠ jπ k sup u,v: u = v =1 u, ˆΣ Σ v Cq exp nt 2 Cq 2 + qt sup 0 j,k p n Πi j Πi k /n EΠ jπ k. M = O p log p/n 1/2 q. ince ˆΣ Σ M = s n, we may assume M without loss of generality that there exist positive constants c and C such that 0 < c < ˆl min ˆl max < C <. 23

24 Lemma A.1 uppose that B1,B2 and B3 hold. Then, for any sequence {L n } satisfying 1 L n N δ 0/10 for some δ 0 > 0 with N 2+δ 0 = on, we have sup sup M s n γ : γ γ Ln N /n N 1 { ρ τ U i Π i γ γ R i n ρ τ U i R i n + Π i γ γ 2τ 2IU i < 0 E U i X i ρ τ U i Π i γ γ R i n ρ τ U i R i n} = o p 1. A.1 Proof. The lemma can be proved by technical arguments similar to those in the proof of Lemma 3.2 in He and hi Let Z i = n 1/2 Π i and θ = n 1/2 N 1/2 γ γ. Note that A.1 is equivalent to sup sup M s n θ B Ln N 1 { ρ τ U i N 1/2 Zi θ Rn i ρ τ U i Rn i + N 1/2 Zi θ 2τ 2IU i < 0 E U i Z iρ τ U i N 1/2 Zi θ Rn i ρ τ U i Rn} i = o p 1. A.2 By the assumption B3, we can choose δ 0 > 0 such that N 2+δ 0 = on. With such δ 0, let d n = N δ 0/10 L 2 n max M s n max 1 i n Z i N 1/2 L n. ince dn MN 2+4δ 0/5 /n 1/2 = o1 for some constant M > 0, we may assume that d n W for some constant W > 0. Then, it suffices to show that for any ɛ > 0, where P sup sup M s n θ B 1 N 1 h θ > ɛ, d n W 0 as n, h θ = = h i θ { U i N 1/2 L nz i θ Rn i U i Rn i + N 1/2 L nz i θ 2τ 2IU i < 0 E U i Z i U i N 1/2 L nz i θ Rn i U i Rn }. i A.3 Take the minimal number of balls with radius q 0 = ɛn /24W n, say Γ 1,..., Γ Kn, that covers B 1. Actually, the balls and the radius depend on but we omit such dependence in notation for 24

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract Journal of Data Science,17(1). P. 145-160,2019 DOI:10.6339/JDS.201901_17(1).0007 WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION Wei Xiong *, Maozai Tian 2 1 School of Statistics, University of

More information

Variable Selection for Highly Correlated Predictors

Variable Selection for Highly Correlated Predictors Variable Selection for Highly Correlated Predictors Fei Xue and Annie Qu arxiv:1709.04840v1 [stat.me] 14 Sep 2017 Abstract Penalty-based variable selection methods are powerful in selecting relevant covariates

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

Analysis Methods for Supersaturated Design: Some Comparisons

Analysis Methods for Supersaturated Design: Some Comparisons Journal of Data Science 1(2003), 249-260 Analysis Methods for Supersaturated Design: Some Comparisons Runze Li 1 and Dennis K. J. Lin 2 The Pennsylvania State University Abstract: Supersaturated designs

More information

Single Index Quantile Regression for Heteroscedastic Data

Single Index Quantile Regression for Heteroscedastic Data Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University SMAC, November 6, 2015 E. Christou, M. G. Akritas (PSU) SIQR

More information

Smoothly Clipped Absolute Deviation (SCAD) for Correlated Variables

Smoothly Clipped Absolute Deviation (SCAD) for Correlated Variables Smoothly Clipped Absolute Deviation (SCAD) for Correlated Variables LIB-MA, FSSM Cadi Ayyad University (Morocco) COMPSTAT 2010 Paris, August 22-27, 2010 Motivations Fan and Li (2001), Zou and Li (2008)

More information

High-dimensional regression

High-dimensional regression High-dimensional regression Advanced Methods for Data Analysis 36-402/36-608) Spring 2014 1 Back to linear regression 1.1 Shortcomings Suppose that we are given outcome measurements y 1,... y n R, and

More information

Existence and Uniqueness

Existence and Uniqueness Chapter 3 Existence and Uniqueness An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect

More information

Semi-Penalized Inference with Direct FDR Control

Semi-Penalized Inference with Direct FDR Control Jian Huang University of Iowa April 4, 2016 The problem Consider the linear regression model y = p x jβ j + ε, (1) j=1 where y IR n, x j IR n, ε IR n, and β j is the jth regression coefficient, Here p

More information

High-dimensional regression with unknown variance

High-dimensional regression with unknown variance High-dimensional regression with unknown variance Christophe Giraud Ecole Polytechnique march 2012 Setting Gaussian regression with unknown variance: Y i = f i + ε i with ε i i.i.d. N (0, σ 2 ) f = (f

More information

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,

More information

The MNet Estimator. Patrick Breheny. Department of Biostatistics Department of Statistics University of Kentucky. August 2, 2010

The MNet Estimator. Patrick Breheny. Department of Biostatistics Department of Statistics University of Kentucky. August 2, 2010 Department of Biostatistics Department of Statistics University of Kentucky August 2, 2010 Joint work with Jian Huang, Shuangge Ma, and Cun-Hui Zhang Penalized regression methods Penalized methods have

More information

The deterministic Lasso

The deterministic Lasso The deterministic Lasso Sara van de Geer Seminar für Statistik, ETH Zürich Abstract We study high-dimensional generalized linear models and empirical risk minimization using the Lasso An oracle inequality

More information

TECHNICAL REPORT NO. 1091r. A Note on the Lasso and Related Procedures in Model Selection

TECHNICAL REPORT NO. 1091r. A Note on the Lasso and Related Procedures in Model Selection DEPARTMENT OF STATISTICS University of Wisconsin 1210 West Dayton St. Madison, WI 53706 TECHNICAL REPORT NO. 1091r April 2004, Revised December 2004 A Note on the Lasso and Related Procedures in Model

More information

Chap 1. Overview of Statistical Learning (HTF, , 2.9) Yongdai Kim Seoul National University

Chap 1. Overview of Statistical Learning (HTF, , 2.9) Yongdai Kim Seoul National University Chap 1. Overview of Statistical Learning (HTF, 2.1-2.6, 2.9) Yongdai Kim Seoul National University 0. Learning vs Statistical learning Learning procedure Construct a claim by observing data or using logics

More information

The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA

The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA Presented by Dongjun Chung March 12, 2010 Introduction Definition Oracle Properties Computations Relationship: Nonnegative Garrote Extensions:

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

A Consistent Information Criterion for Support Vector Machines in Diverging Model Spaces

A Consistent Information Criterion for Support Vector Machines in Diverging Model Spaces Journal of Machine Learning Research 17 (2016) 1-26 Submitted 6/14; Revised 5/15; Published 4/16 A Consistent Information Criterion for Support Vector Machines in Diverging Model Spaces Xiang Zhang Yichao

More information

VARIABLE SELECTION AND ESTIMATION WITH THE SEAMLESS-L 0 PENALTY

VARIABLE SELECTION AND ESTIMATION WITH THE SEAMLESS-L 0 PENALTY Statistica Sinica 23 (2013), 929-962 doi:http://dx.doi.org/10.5705/ss.2011.074 VARIABLE SELECTION AND ESTIMATION WITH THE SEAMLESS-L 0 PENALTY Lee Dicker, Baosheng Huang and Xihong Lin Rutgers University,

More information

Forward Regression for Ultra-High Dimensional Variable Screening

Forward Regression for Ultra-High Dimensional Variable Screening Forward Regression for Ultra-High Dimensional Variable Screening Hansheng Wang Guanghua School of Management, Peking University This version: April 9, 2009 Abstract Motivated by the seminal theory of Sure

More information

High-dimensional Ordinary Least-squares Projection for Screening Variables

High-dimensional Ordinary Least-squares Projection for Screening Variables 1 / 38 High-dimensional Ordinary Least-squares Projection for Screening Variables Chenlei Leng Joint with Xiangyu Wang (Duke) Conference on Nonparametric Statistics for Big Data and Celebration to Honor

More information

The Iterated Lasso for High-Dimensional Logistic Regression

The Iterated Lasso for High-Dimensional Logistic Regression The Iterated Lasso for High-Dimensional Logistic Regression By JIAN HUANG Department of Statistics and Actuarial Science, 241 SH University of Iowa, Iowa City, Iowa 52242, U.S.A. SHUANGE MA Division of

More information

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson Bayesian variable selection via penalized credible regions Brian Reich, NC State Joint work with Howard Bondell and Ander Wilson Brian Reich, NCSU Penalized credible regions 1 Motivation big p, small n

More information

On High-Dimensional Cross-Validation

On High-Dimensional Cross-Validation On High-Dimensional Cross-Validation BY WEI-CHENG HSIAO Institute of Statistical Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 11529, Taiwan hsiaowc@stat.sinica.edu.tw 5 WEI-YING

More information

ISyE 691 Data mining and analytics

ISyE 691 Data mining and analytics ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)

More information

Cross-Validation with Confidence

Cross-Validation with Confidence Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University UMN Statistics Seminar, Mar 30, 2017 Overview Parameter est. Model selection Point est. MLE, M-est.,... Cross-validation

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

Iterative Selection Using Orthogonal Regression Techniques

Iterative Selection Using Orthogonal Regression Techniques Iterative Selection Using Orthogonal Regression Techniques Bradley Turnbull 1, Subhashis Ghosal 1 and Hao Helen Zhang 2 1 Department of Statistics, North Carolina State University, Raleigh, NC, USA 2 Department

More information

arxiv: v2 [stat.me] 4 Jun 2016

arxiv: v2 [stat.me] 4 Jun 2016 Variable Selection for Additive Partial Linear Quantile Regression with Missing Covariates 1 Variable Selection for Additive Partial Linear Quantile Regression with Missing Covariates Ben Sherwood arxiv:1510.00094v2

More information

Outlier detection and variable selection via difference based regression model and penalized regression

Outlier detection and variable selection via difference based regression model and penalized regression Journal of the Korean Data & Information Science Society 2018, 29(3), 815 825 http://dx.doi.org/10.7465/jkdi.2018.29.3.815 한국데이터정보과학회지 Outlier detection and variable selection via difference based regression

More information

STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song

STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song Presenter: Jiwei Zhao Department of Statistics University of Wisconsin Madison April

More information

Robust Bayesian Variable Selection for Modeling Mean Medical Costs

Robust Bayesian Variable Selection for Modeling Mean Medical Costs Robust Bayesian Variable Selection for Modeling Mean Medical Costs Grace Yoon 1,, Wenxin Jiang 2, Lei Liu 3 and Ya-Chen T. Shih 4 1 Department of Statistics, Texas A&M University 2 Department of Statistics,

More information

Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001)

Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Presented by Yang Zhao March 5, 2010 1 / 36 Outlines 2 / 36 Motivation

More information

Learning Theory. Ingo Steinwart University of Stuttgart. September 4, 2013

Learning Theory. Ingo Steinwart University of Stuttgart. September 4, 2013 Learning Theory Ingo Steinwart University of Stuttgart September 4, 2013 Ingo Steinwart University of Stuttgart () Learning Theory September 4, 2013 1 / 62 Basics Informal Introduction Informal Description

More information

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Jianqing Fan Department of Statistics Chinese University of Hong Kong AND Department of Statistics

More information

Chapter 9. Non-Parametric Density Function Estimation

Chapter 9. Non-Parametric Density Function Estimation 9-1 Density Estimation Version 1.1 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least

More information

Cross-Validation with Confidence

Cross-Validation with Confidence Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University WHOA-PSI Workshop, St Louis, 2017 Quotes from Day 1 and Day 2 Good model or pure model? Occam s razor We really

More information

Median Cross-Validation

Median Cross-Validation Median Cross-Validation Chi-Wai Yu 1, and Bertrand Clarke 2 1 Department of Mathematics Hong Kong University of Science and Technology 2 Department of Medicine University of Miami IISA 2011 Outline Motivational

More information

On Mixture Regression Shrinkage and Selection via the MR-LASSO

On Mixture Regression Shrinkage and Selection via the MR-LASSO On Mixture Regression Shrinage and Selection via the MR-LASSO Ronghua Luo, Hansheng Wang, and Chih-Ling Tsai Guanghua School of Management, Peing University & Graduate School of Management, University

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

Comparisons of penalized least squares. methods by simulations

Comparisons of penalized least squares. methods by simulations Comparisons of penalized least squares arxiv:1405.1796v1 [stat.co] 8 May 2014 methods by simulations Ke ZHANG, Fan YIN University of Science and Technology of China, Hefei 230026, China Shifeng XIONG Academy

More information

arxiv: v1 [stat.me] 30 Dec 2017

arxiv: v1 [stat.me] 30 Dec 2017 arxiv:1801.00105v1 [stat.me] 30 Dec 2017 An ISIS screening approach involving threshold/partition for variable selection in linear regression 1. Introduction Yu-Hsiang Cheng e-mail: 96354501@nccu.edu.tw

More information

Inference For High Dimensional M-estimates: Fixed Design Results

Inference For High Dimensional M-estimates: Fixed Design Results Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57

More information

Statistical Inference

Statistical Inference Statistical Inference Liu Yang Florida State University October 27, 2016 Liu Yang, Libo Wang (Florida State University) Statistical Inference October 27, 2016 1 / 27 Outline The Bayesian Lasso Trevor Park

More information

Selection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models

Selection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models Selection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models Junfeng Shang Bowling Green State University, USA Abstract In the mixed modeling framework, Monte Carlo simulation

More information

Chapter 9. Non-Parametric Density Function Estimation

Chapter 9. Non-Parametric Density Function Estimation 9-1 Density Estimation Version 1.2 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

Model comparison and selection

Model comparison and selection BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)

More information

Linear Model Selection and Regularization

Linear Model Selection and Regularization Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In

More information

Asymptotic Statistics-III. Changliang Zou

Asymptotic Statistics-III. Changliang Zou Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (

More information

Smooth simultaneous confidence bands for cumulative distribution functions

Smooth simultaneous confidence bands for cumulative distribution functions Journal of Nonparametric Statistics, 2013 Vol. 25, No. 2, 395 407, http://dx.doi.org/10.1080/10485252.2012.759219 Smooth simultaneous confidence bands for cumulative distribution functions Jiangyan Wang

More information

Generalization theory

Generalization theory Generalization theory Daniel Hsu Columbia TRIPODS Bootcamp 1 Motivation 2 Support vector machines X = R d, Y = { 1, +1}. Return solution ŵ R d to following optimization problem: λ min w R d 2 w 2 2 + 1

More information

Or How to select variables Using Bayesian LASSO

Or How to select variables Using Bayesian LASSO Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection

More information

Semi-Nonparametric Inferences for Massive Data

Semi-Nonparametric Inferences for Massive Data Semi-Nonparametric Inferences for Massive Data Guang Cheng 1 Department of Statistics Purdue University Statistics Seminar at NCSU October, 2015 1 Acknowledge NSF, Simons Foundation and ONR. A Joint Work

More information

Feature selection with high-dimensional data: criteria and Proc. Procedures

Feature selection with high-dimensional data: criteria and Proc. Procedures Feature selection with high-dimensional data: criteria and Procedures Zehua Chen Department of Statistics & Applied Probability National University of Singapore Conference in Honour of Grace Wahba, June

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

Statistical Data Mining and Machine Learning Hilary Term 2016

Statistical Data Mining and Machine Learning Hilary Term 2016 Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes

More information

Lecture 14: Variable Selection - Beyond LASSO

Lecture 14: Variable Selection - Beyond LASSO Fall, 2017 Extension of LASSO To achieve oracle properties, L q penalty with 0 < q < 1, SCAD penalty (Fan and Li 2001; Zhang et al. 2007). Adaptive LASSO (Zou 2006; Zhang and Lu 2007; Wang et al. 2007)

More information

Does k-th Moment Exist?

Does k-th Moment Exist? Does k-th Moment Exist? Hitomi, K. 1 and Y. Nishiyama 2 1 Kyoto Institute of Technology, Japan 2 Institute of Economic Research, Kyoto University, Japan Email: hitomi@kit.ac.jp Keywords: Existence of moments,

More information

Statistics for high-dimensional data: Group Lasso and additive models

Statistics for high-dimensional data: Group Lasso and additive models Statistics for high-dimensional data: Group Lasso and additive models Peter Bühlmann and Sara van de Geer Seminar für Statistik, ETH Zürich May 2012 The Group Lasso (Yuan & Lin, 2006) high-dimensional

More information

Boosting Methods: Why They Can Be Useful for High-Dimensional Data

Boosting Methods: Why They Can Be Useful for High-Dimensional Data New URL: http://www.r-project.org/conferences/dsc-2003/ Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003) March 20 22, Vienna, Austria ISSN 1609-395X Kurt Hornik,

More information

Model Selection and Geometry

Model Selection and Geometry Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model

More information

Bayesian Grouped Horseshoe Regression with Application to Additive Models

Bayesian Grouped Horseshoe Regression with Application to Additive Models Bayesian Grouped Horseshoe Regression with Application to Additive Models Zemei Xu, Daniel F. Schmidt, Enes Makalic, Guoqi Qian, and John L. Hopper Centre for Epidemiology and Biostatistics, Melbourne

More information

Lecture 3: Introduction to Complexity Regularization

Lecture 3: Introduction to Complexity Regularization ECE90 Spring 2007 Statistical Learning Theory Instructor: R. Nowak Lecture 3: Introduction to Complexity Regularization We ended the previous lecture with a brief discussion of overfitting. Recall that,

More information

Additive Isotonic Regression

Additive Isotonic Regression Additive Isotonic Regression Enno Mammen and Kyusang Yu 11. July 2006 INTRODUCTION: We have i.i.d. random vectors (Y 1, X 1 ),..., (Y n, X n ) with X i = (X1 i,..., X d i ) and we consider the additive

More information

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Journal of Data Science 9(2011), 549-564 Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Masaru Kanba and Kanta Naito Shimane University Abstract: This paper discusses the

More information

COMPARISON OF GMM WITH SECOND-ORDER LEAST SQUARES ESTIMATION IN NONLINEAR MODELS. Abstract

COMPARISON OF GMM WITH SECOND-ORDER LEAST SQUARES ESTIMATION IN NONLINEAR MODELS. Abstract Far East J. Theo. Stat. 0() (006), 179-196 COMPARISON OF GMM WITH SECOND-ORDER LEAST SQUARES ESTIMATION IN NONLINEAR MODELS Department of Statistics University of Manitoba Winnipeg, Manitoba, Canada R3T

More information

Stability and the elastic net

Stability and the elastic net Stability and the elastic net Patrick Breheny March 28 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/32 Introduction Elastic Net Our last several lectures have concentrated on methods for

More information

Issues on quantile autoregression

Issues on quantile autoregression Issues on quantile autoregression Jianqing Fan and Yingying Fan We congratulate Koenker and Xiao on their interesting and important contribution to the quantile autoregression (QAR). The paper provides

More information

high-dimensional inference robust to the lack of model sparsity

high-dimensional inference robust to the lack of model sparsity high-dimensional inference robust to the lack of model sparsity Jelena Bradic (joint with a PhD student Yinchu Zhu) www.jelenabradic.net Assistant Professor Department of Mathematics University of California,

More information

Consistent high-dimensional Bayesian variable selection via penalized credible regions

Consistent high-dimensional Bayesian variable selection via penalized credible regions Consistent high-dimensional Bayesian variable selection via penalized credible regions Howard Bondell bondell@stat.ncsu.edu Joint work with Brian Reich Howard Bondell p. 1 Outline High-Dimensional Variable

More information

Identification and Estimation for Generalized Varying Coefficient Partially Linear Models

Identification and Estimation for Generalized Varying Coefficient Partially Linear Models Identification and Estimation for Generalized Varying Coefficient Partially Linear Models Mingqiu Wang, Xiuli Wang and Muhammad Amin Abstract The generalized varying coefficient partially linear model

More information

Shrinkage Methods: Ridge and Lasso

Shrinkage Methods: Ridge and Lasso Shrinkage Methods: Ridge and Lasso Jonathan Hersh 1 Chapman University, Argyros School of Business hersh@chapman.edu February 27, 2019 J.Hersh (Chapman) Ridge & Lasso February 27, 2019 1 / 43 1 Intro and

More information

M-estimation in high-dimensional linear model

M-estimation in high-dimensional linear model Wang and Zhu Journal of Inequalities and Applications 208 208:225 https://doi.org/0.86/s3660-08-89-3 R E S E A R C H Open Access M-estimation in high-dimensional linear model Kai Wang and Yanling Zhu *

More information

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28 Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:

More information

Analysis of Greedy Algorithms

Analysis of Greedy Algorithms Analysis of Greedy Algorithms Jiahui Shen Florida State University Oct.26th Outline Introduction Regularity condition Analysis on orthogonal matching pursuit Analysis on forward-backward greedy algorithm

More information

Machine Learning Linear Regression. Prof. Matteo Matteucci

Machine Learning Linear Regression. Prof. Matteo Matteucci Machine Learning Linear Regression Prof. Matteo Matteucci Outline 2 o Simple Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression o Multi Variate Regession Model Least Squares

More information

Stepwise Searching for Feature Variables in High-Dimensional Linear Regression

Stepwise Searching for Feature Variables in High-Dimensional Linear Regression Stepwise Searching for Feature Variables in High-Dimensional Linear Regression Qiwei Yao Department of Statistics, London School of Economics q.yao@lse.ac.uk Joint work with: Hongzhi An, Chinese Academy

More information

Statistical inference on Lévy processes

Statistical inference on Lévy processes Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline

More information

ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS

ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS Jian Huang 1, Joel L. Horowitz 2, and Shuangge Ma 3 1 Department of Statistics and Actuarial Science, University

More information

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of

More information

Bi-level feature selection with applications to genetic association

Bi-level feature selection with applications to genetic association Bi-level feature selection with applications to genetic association studies October 15, 2008 Motivation In many applications, biological features possess a grouping structure Categorical variables may

More information

Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model

Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model 1. Introduction Varying-coefficient partially linear model (Zhang, Lee, and Song, 2002; Xia, Zhang, and Tong, 2004;

More information

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Department of Mathematics Carl von Ossietzky University Oldenburg Sonja Greven Department of

More information

Tuning Parameter Selection in Regularized Estimations of Large Covariance Matrices

Tuning Parameter Selection in Regularized Estimations of Large Covariance Matrices Tuning Parameter Selection in Regularized Estimations of Large Covariance Matrices arxiv:1308.3416v1 [stat.me] 15 Aug 2013 Yixin Fang 1, Binhuan Wang 1, and Yang Feng 2 1 New York University and 2 Columbia

More information

Multicategory Vertex Discriminant Analysis for High-Dimensional Data

Multicategory Vertex Discriminant Analysis for High-Dimensional Data Multicategory Vertex Discriminant Analysis for High-Dimensional Data Tong Tong Wu Department of Epidemiology and Biostatistics University of Maryland, College Park October 8, 00 Joint work with Prof. Kenneth

More information

arxiv: v1 [stat.me] 23 Dec 2017 Abstract

arxiv: v1 [stat.me] 23 Dec 2017 Abstract Distribution Regression Xin Chen Xuejun Ma Wang Zhou Department of Statistics and Applied Probability, National University of Singapore stacx@nus.edu.sg stamax@nus.edu.sg stazw@nus.edu.sg arxiv:1712.08781v1

More information

Theoretical results for lasso, MCP, and SCAD

Theoretical results for lasso, MCP, and SCAD Theoretical results for lasso, MCP, and SCAD Patrick Breheny March 2 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/23 Introduction There is an enormous body of literature concerning theoretical

More information

INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING

INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING Statistica Sinica 24 (2014), 1001-1015 doi:http://dx.doi.org/10.5705/ss.2013.038 INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING Seunghwan Park and Jae Kwang Kim Seoul National Univeristy

More information

Frontier estimation based on extreme risk measures

Frontier estimation based on extreme risk measures Frontier estimation based on extreme risk measures by Jonathan EL METHNI in collaboration with Ste phane GIRARD & Laurent GARDES CMStatistics 2016 University of Seville December 2016 1 Risk measures 2

More information

Inference After Variable Selection

Inference After Variable Selection Department of Mathematics, SIU Carbondale Inference After Variable Selection Lasanthi Pelawa Watagoda lasanthi@siu.edu June 12, 2017 Outline 1 Introduction 2 Inference For Ridge and Lasso 3 Variable Selection

More information

The Australian National University and The University of Sydney. Supplementary Material

The Australian National University and The University of Sydney. Supplementary Material Statistica Sinica: Supplement HIERARCHICAL SELECTION OF FIXED AND RANDOM EFFECTS IN GENERALIZED LINEAR MIXED MODELS The Australian National University and The University of Sydney Supplementary Material

More information

P-Values for High-Dimensional Regression

P-Values for High-Dimensional Regression P-Values for High-Dimensional Regression Nicolai einshausen Lukas eier Peter Bühlmann November 13, 2008 Abstract Assigning significance in high-dimensional regression is challenging. ost computationally

More information

On Bayesian Computation

On Bayesian Computation On Bayesian Computation Michael I. Jordan with Elaine Angelino, Maxim Rabinovich, Martin Wainwright and Yun Yang Previous Work: Information Constraints on Inference Minimize the minimax risk under constraints

More information

On Model Selection Consistency of Lasso

On Model Selection Consistency of Lasso On Model Selection Consistency of Lasso Peng Zhao Department of Statistics University of Berkeley 367 Evans Hall Berkeley, CA 94720-3860, USA Bin Yu Department of Statistics University of Berkeley 367

More information

Class 2 & 3 Overfitting & Regularization

Class 2 & 3 Overfitting & Regularization Class 2 & 3 Overfitting & Regularization Carlo Ciliberto Department of Computer Science, UCL October 18, 2017 Last Class The goal of Statistical Learning Theory is to find a good estimator f n : X Y, approximating

More information

Discussion of High-dimensional autocovariance matrices and optimal linear prediction,

Discussion of High-dimensional autocovariance matrices and optimal linear prediction, Electronic Journal of Statistics Vol. 9 (2015) 1 10 ISSN: 1935-7524 DOI: 10.1214/15-EJS1007 Discussion of High-dimensional autocovariance matrices and optimal linear prediction, Xiaohui Chen University

More information