Large Sample Theory For OLS Variable Selection Estimators

Lasanthi C. R. Pelawa Watagoda and David J. Olive
Southern Illinois University
June 18, 2018

Author note: David J. Olive is Professor, Department of Mathematics, Southern Illinois University, Carbondale, IL 62901, USA. Lasanthi C. R. Pelawa Watagoda is Visiting Assistant Professor, Appalachian State University, Boone, NC, USA.

Abstract

This paper gives large sample theory for ordinary least squares variable selection estimators such as forward selection and backward elimination. This theory is useful for comparing these estimators with the elastic net, lasso, and ridge regression when the sample size is large compared to the number of predictors.

KEY WORDS: Elastic Net; Forward Selection; Lasso; Mixture Distributions; Relaxed Lasso; Ridge Regression.

1 INTRODUCTION

In this section we review the large sample theory for some shrinkage estimators and review the variable selection model. The following section will give large sample theory for OLS variable selection estimators. We assume the number of predictors, p, is fixed.

Suppose that the response variable Y_i and at least one predictor variable x_{i,j} are quantitative with x_{i,1} ≡ 1. Let x_i^T = (x_{i,1}, ..., x_{i,p}) = (1, u_i^T) and β = (β_1, ..., β_p)^T where β_1 corresponds to the intercept. Then the multiple linear regression model is

Y_i = β_1 + x_{i,2} β_2 + ... + x_{i,p} β_p + e_i = x_i^T β + e_i    (1)

for i = 1, ..., n. This model is also called the full model. Here n is the sample size, and we assume that the random variables e_i are independent and identically distributed (iid) with variance V(e_i) = σ². In matrix notation, these n equations become

Y = X β + e    (2)

where Y is an n × 1 vector of response variables, X is an n × p matrix of predictors, β is a p × 1 vector of unknown coefficients, and e is an n × 1 vector of unknown errors.
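As a concrete illustration (not part of the paper), the following minimal numpy sketch simulates data from model (2) with an intercept column and computes the OLS full model estimator of β. The sample size, coefficient values, and error scale are arbitrary choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 4
# design matrix with x_{i,1} = 1 and p - 1 quantitative predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, 0.0, -1.0])      # illustrative true coefficients
e = rng.normal(scale=0.5, size=n)           # iid errors with V(e_i) = 0.25
Y = X @ beta + e                            # model (2): Y = X beta + e

# OLS full model estimator, fitted values, and residuals
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
Y_hat = X @ beta_hat
r = Y - Y_hat
```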

The ith fitted value Ŷ_i = x_i^T β̂ and the ith residual r_i = Y_i − Ŷ_i, where β̂ is an estimator of β. Ordinary least squares (OLS) is often used for inference if n/p is large.

It is often convenient to use the centered response Z = Y − Ȳ 1, where Ȳ = (1/n) Σ_{i=1}^n Y_i, and the n × (p − 1) matrix of standardized nontrivial predictors W = (W_ij). For j = 1, ..., p − 1, let W_ij denote the (j + 1)th variable standardized so that Σ_{i=1}^n W_ij = 0 and Σ_{i=1}^n W_ij² = n. Note that the sample correlation matrix of the nontrivial predictors u_i is R_u = W^T W / n. Then regression through the origin is used for the model

Z = W η + e    (3)

where the vector of fitted values Ŷ = Ȳ 1 + Ẑ.

There are many methods for estimating β, including backward elimination and forward selection with OLS, the elastic net due to Zou and Hastie (2005), lasso due to Tibshirani (1996), and ridge regression; see Hoerl and Kennard (1970). We also used the variant of relaxed lasso that applies OLS to a constant and the predictors that had nonzero lasso coefficients, which is the LARS-OLS hybrid estimator of Efron, Hastie, Johnstone, and Tibshirani (2004), also called the relaxed lasso (φ = 0) estimator by Meinshausen (2007). Some large sample theory for these estimators will be summarized below.

These methods produce M models and use a criterion to select the final model (e.g., C_p or 10-fold cross validation (CV)). The number of models M depends on the method. Lasso and ridge regression have a parameter λ, and if λ = 0, then the OLS full model is used. These two methods also use a maximum value λ_M of λ and a grid of M λ values 0 ≤ λ_1 < λ_2 < ... < λ_{M−1} < λ_M. For lasso, λ_M is the smallest value of λ such that η̂_{λ_M} = 0. Hence η̂_{λ_i} ≠ 0 for i < M. See James, Witten, Hastie, and Tibshirani (2013, ch. 6).

Consider choosing η̂ to minimize the criterion

Q(η) = (1/a) (Z − W η)^T (Z − W η) + (λ_{1,n}/a) Σ_{i=1}^{p−1} |η_i|^j    (4)

where λ_{1,n} ≥ 0, a > 0, and j > 0 are known constants. Then j = 2 corresponds to ridge regression, j = 1 corresponds to lasso, and a = 1, 2, n, and 2n are common. The residual sum of squares RSS(η) = (Z − W η)^T (Z − W η), and λ_{1,n} = 0 corresponds to the OLS estimator η̂_OLS = (W^T W)^{−1} W^T Z. For model (4), Knight and Fu (2000) proved that i) η̂ is a consistent estimator of η if λ_{1,n} = o(n), so λ_{1,n}/n → 0 as n → ∞, ii) η̂ is a √n consistent estimator of η if λ_{1,n} = O(√n) (so λ_{1,n}/√n is bounded), and iii) η̂_OLS, lasso, and ridge regression are asymptotically equivalent if λ_{1,n}/√n → 0 as n → ∞.

Assume that the sample correlation matrix

R_u = W^T W / n →P V^{−1}    (5)

where V^{−1} = ρ_u, the population correlation matrix of the nontrivial predictors u_i, if the u_i are a random sample from a population. Under (5), if λ_{1,n}/n → 0, then

(W^T W + λ_{1,n} I_{p−1})/n →P V^{−1}, and n (W^T W + λ_{1,n} I_{p−1})^{−1} →P V.
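The centering and standardization above can be checked numerically. The sketch below (an illustration, not from the paper) builds W so that each column sums to 0 and has sum of squares n, and verifies that R_u = W^T W / n agrees with the sample correlation matrix of the nontrivial predictors; the data are simulated with arbitrary coefficient values.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 4
U = rng.normal(size=(n, p - 1))                  # nontrivial predictors u_i
Y = 1.0 + U @ np.array([2.0, 0.0, -1.0]) + rng.normal(size=n)

Z = Y - Y.mean()                                 # centered response
W = (U - U.mean(axis=0)) / U.std(axis=0)         # standardized predictors
assert np.allclose(W.sum(axis=0), 0.0)           # sum_i W_ij = 0
assert np.allclose((W ** 2).sum(axis=0), n)      # sum_i W_ij^2 = n

R_u = W.T @ W / n                                # sample correlation matrix
assert np.allclose(R_u, np.corrcoef(U, rowvar=False))

# regression through the origin for model (3); fitted values Yhat = Ybar + Zhat
eta_ols = np.linalg.solve(W.T @ W, W.T @ Z)
Y_hat = Y.mean() + W @ eta_ols
```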

Let H = W (W^T W)^{−1} W^T, and assume that max_{i=1,...,n} h_{ii} →P 0 as n → ∞. Then the OLS estimator satisfies

√n (η̂_OLS − η) →D N_{p−1}(0, σ² V).    (6)

The following identity from Gunst and Mason (1980, p. 342) is useful for ridge regression inference:

η̂_R = (W^T W + λ_{1,n} I_{p−1})^{−1} W^T Z
    = (W^T W + λ_{1,n} I_{p−1})^{−1} W^T W (W^T W)^{−1} W^T Z
    = (W^T W + λ_{1,n} I_{p−1})^{−1} W^T W η̂_OLS = A_n η̂_OLS
    = [I_{p−1} − λ_{1,n} (W^T W + λ_{1,n} I_{p−1})^{−1}] η̂_OLS = B_n η̂_OLS
    = η̂_OLS − (λ_{1,n}/n) n (W^T W + λ_{1,n} I_{p−1})^{−1} η̂_OLS

since A_n − B_n = 0.

The following identity from Efron and Hastie (2016, p. 308), for example, is useful for inference for the lasso estimator η̂_L:

−(1/n) W^T (Z − W η̂_L) + (λ_{1,n}/(2n)) s_n = 0, or W^T (Z − W η̂_L) = (λ_{1,n}/2) s_n,

where s_{in} ∈ [−1, 1] and s_{in} = sign(η̂_{i,L}) if η̂_{i,L} ≠ 0. Here sign(η_i) = 1 if η_i > 0 and sign(η_i) = −1 if η_i < 0. Note that s_n = s_{n,η̂_L} depends on η̂_L. Thus

η̂_L = (W^T W)^{−1} W^T Z − (λ_{1,n}/(2n)) n (W^T W)^{−1} s_n = η̂_OLS − (λ_{1,n}/(2n)) n (W^T W)^{−1} s_n.

Following Hastie, Tibshirani, and Wainwright (2015, p. 57), the elastic net estimator η̂_EN minimizes

Q_EN(η) = RSS(η) + λ_1 ‖η‖_2² + λ_2 ‖η‖_1    (7)

where λ_1 = (1 − α) λ_{1,n} and λ_2 = 2 α λ_{1,n} with 0 ≤ α ≤ 1. Following Jia and Yu (2010), by standard Karush-Kuhn-Tucker (KKT) conditions for convex optimality for Equation (7), η̂_EN is optimal if

2 W^T W η̂_EN − 2 W^T Z + 2 λ_1 η̂_EN + λ_2 s_n = 0, or (W^T W + λ_1 I_{p−1}) η̂_EN = W^T Z − (λ_2/2) s_n.

Hence

η̂_EN = η̂_R − n (W^T W + λ_1 I_{p−1})^{−1} (λ_2/(2n)) s_n.    (8)

Thus

η̂_EN = η̂_OLS − (λ_1/n) n (W^T W + λ_1 I_{p−1})^{−1} η̂_OLS − (λ_2/(2n)) n (W^T W + λ_1 I_{p−1})^{−1} s_n
     = η̂_OLS − n (W^T W + λ_1 I_{p−1})^{−1} [ (λ_1/n) η̂_OLS + (λ_2/(2n)) s_n ].
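The Gunst and Mason ridge identity above is easy to verify numerically. The following sketch (illustrative, not from the paper) computes η̂_R directly and via A_n η̂_OLS and B_n η̂_OLS on simulated data; the value of λ_{1,n} is arbitrary and chosen only for the check.

```python
import numpy as np

rng = np.random.default_rng(2)
n, q = 100, 3                                    # q = p - 1 nontrivial predictors
U = rng.normal(size=(n, q))
W = (U - U.mean(axis=0)) / U.std(axis=0)         # standardized predictors
Z = W @ np.array([2.0, 0.0, -1.0]) + rng.normal(size=n)
Z = Z - Z.mean()                                 # centered response

lam = 5.0                                        # lambda_{1,n}, arbitrary
G = W.T @ W + lam * np.eye(q)
eta_ols = np.linalg.solve(W.T @ W, W.T @ Z)
eta_R = np.linalg.solve(G, W.T @ Z)              # ridge estimator

A_n = np.linalg.solve(G, W.T @ W)                # (W'W + lam I)^{-1} W'W
B_n = np.eye(q) - lam * np.linalg.inv(G)         # I - lam (W'W + lam I)^{-1}
assert np.allclose(A_n, B_n)                     # A_n - B_n = 0
assert np.allclose(eta_R, A_n @ eta_ols)
assert np.allclose(eta_R, eta_ols - lam * np.linalg.solve(G, eta_ols))
```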

Note that if λ̂_{1,n}/√n →P τ and α̂ →P ψ, then λ̂_1/√n →P (1 − ψ)τ and λ̂_2/√n →P 2ψτ. Under these conditions,

√n (η̂_EN − η) = √n (η̂_OLS − η) − n (W^T W + λ̂_1 I_{p−1})^{−1} [ (λ̂_1/√n) η̂_OLS + (λ̂_2/(2√n)) s_n ].

The following theorem shows the elastic net, lasso, and ridge regression are asymptotically equivalent to the OLS full model if λ̂_{1,n}/√n →P 0, and will be useful for comparing these estimators and the OLS variable selection estimators. The theorem follows from results in Knight and Fu (2000) and Slawski, zu Castell, and Tutz (2010). Also see Zou and Zhang (2009). Let η̂_A be η̂_EN, η̂_L, or η̂_R. Note that c) follows from b) if ψ = 0, and d) follows from b) (using 2λ̂_{1,n}/√n →P 2τ) if ψ = 1. Recall that we are assuming that p is fixed.

Theorem 1. Assume that the conditions of the OLS theory (6) hold for the model Z = W η + e.
a) If λ̂_{1,n}/√n →P 0, then √n (η̂_A − η) →D N_{p−1}(0, σ² V).
b) If λ̂_{1,n}/√n →P τ ≥ 0, α̂ →P ψ ∈ [0, 1], and s_n →P s = s_η, then √n (η̂_EN − η) →D N_{p−1}(−V [(1 − ψ)τ η + ψ τ s], σ² V).
c) If λ̂_{1,n}/√n →P τ ≥ 0, then √n (η̂_R − η) →D N_{p−1}(−τ V η, σ² V).
d) If λ̂_{1,n}/√n →P τ ≥ 0 and s_n →P s = s_η, then √n (η̂_L − η) →D N_{p−1}(−(τ/2) V s, σ² V).

Next we describe variable selection, and then develop theory in Section 2. Variable selection is the search for a subset of predictor variables that can be deleted with little loss of information if n/p is large. Following Olive and Hawkins (2005), a model for variable selection can be described by

x^T β = x_S^T β_S + x_E^T β_E = x_S^T β_S    (9)

where x = (x_S^T, x_E^T)^T, x_S is an a_S × 1 vector, and x_E is a (p − a_S) × 1 vector. Given that x_S is in the model, β_E = 0 and E denotes the subset of terms that can be eliminated given that the subset S is in the model.

Let x_I be the vector of a terms from a candidate subset indexed by I, and let x_O be the vector of the remaining predictors (out of the candidate submodel). Suppose that S is a subset of I and that model (9) holds. Then

x^T β = x_S^T β_S = x_S^T β_S + x_{I/S}^T β_{(I/S)} + x_O^T 0 = x_I^T β_I    (10)

where x_{I/S} denotes the predictors in I that are not in S. Since this is true regardless of the values of the predictors, β_O = 0 if S ⊆ I.
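A small Monte Carlo sketch (illustrative, not from the paper) can be used to check Theorem 1 c): with λ̂_{1,n} = τ√n, the empirical mean of √n(η̂_R − η) should be close to −τVη and the empirical covariance close to σ²V. The correlation matrix ρ_u, τ, σ, η, and the simulation sizes below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
q, sigma, tau = 3, 1.0, 2.0
eta = np.array([1.0, -1.0, 0.5])
rho_u = np.array([[1.0, 0.5, 0.25],
                  [0.5, 1.0, 0.5],
                  [0.25, 0.5, 1.0]])             # V^{-1} = population correlation
V = np.linalg.inv(rho_u)
L = np.linalg.cholesky(rho_u)

n, reps = 2000, 2000
lam = tau * np.sqrt(n)                           # so lambda_hat_{1,n}/sqrt(n) = tau
vals = np.empty((reps, q))
for r in range(reps):
    U = rng.normal(size=(n, q)) @ L.T            # correlated nontrivial predictors
    W = (U - U.mean(axis=0)) / U.std(axis=0)
    Z = W @ eta + rng.normal(scale=sigma, size=n)
    Z = Z - Z.mean()
    eta_R = np.linalg.solve(W.T @ W + lam * np.eye(q), W.T @ Z)
    vals[r] = np.sqrt(n) * (eta_R - eta)

print("empirical mean:              ", vals.mean(axis=0))
print("theoretical mean -tau*V*eta: ", -tau * V @ eta)
print("empirical covariance:\n", np.cov(vals, rowvar=False))
print("theoretical covariance sigma^2*V:\n", sigma ** 2 * V)
```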

Forward selection forms a sequence of submodels I_1, ..., I_M where I_j uses j predictors including the constant. Let I_1 use x*_1 = x_1 ≡ 1: the model has a constant but no nontrivial predictors. To form I_2, consider all models I with two predictors including x*_1. Compute

Q_2(I) = SSE(I) = RSS(I) = r^T(I) r(I) = Σ_{i=1}^n r_i²(I) = Σ_{i=1}^n (Y_i − Ŷ_i(I))².

Let I_2 minimize Q_2(I) for the p − 1 models I that contain x*_1 and one other predictor. Denote the predictors in I_2 by x*_1, x*_2. In general, to form I_j, consider all models I with j predictors including the variables x*_1, ..., x*_{j−1}. Compute Q_j(I) = r^T(I) r(I) = Σ_{i=1}^n r_i²(I) = Σ_{i=1}^n (Y_i − Ŷ_i(I))². Let I_j minimize Q_j(I) for the p − j + 1 models I that contain x*_1, ..., x*_{j−1} and one other predictor not already selected. Denote the predictors in I_j by x*_1, ..., x*_j. Continue in this manner for j = 2, ..., M = p.

When there is a sequence of M submodels, the final submodel I_d needs to be selected. Let the candidate model I contain a terms, including a constant, and let x_I and β̂_I be a × 1 vectors. There are many criteria used to select the final submodel I_d. For a given data set, the quantities p, n, and σ̂² act as constants, and a criterion below may add a constant or be divided by a positive constant without changing the subset I_min that minimizes the criterion.

Let criteria C_S(I) have the form

C_S(I) = SSE(I) + a K_n σ̂².

These criteria need a good estimator of σ² and n/p large. The criterion C_p(I) = AIC_S(I) uses K_n = 2 while the BIC_S(I) criterion uses K_n = log(n). See Jones (1946) and Mallows (1973) for C_p. Typically σ̂² is the OLS full model MSE = Σ_{i=1}^n r_i² / (n − p) when n/p is large. Then σ̂² = MSE is a √n consistent estimator of σ² under mild conditions by Su and Cook (2012).

The following criteria also need n/p large. AIC is due to Akaike (1973) and BIC to Schwarz (1978):

AIC(I) = n log( SSE(I)/n ) + 2a, and
BIC(I) = n log( SSE(I)/n ) + a log(n).

Let p be fixed and let I_min be the submodel that minimizes the criterion using variable selection with OLS. Following Nishii (1984), the probability that model I_min from C_p or AIC underfits goes to zero as n → ∞. Hence P(S ⊆ I_min) → 1 as n → ∞. This result holds for all subsets regression and variable selection methods, such as forward selection and backward elimination, that produce a sequence of nested models including the full model. The above criteria can be applied to forward selection and relaxed lasso. The C_p criterion can also be applied to lasso. See Efron and Hastie (2016, pp. 221, 231).
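The forward selection algorithm and the C_p-type criterion C_S(I) = SSE(I) + 2 a σ̂² described above can be coded in a few lines. The sketch below (an illustration, not the authors' software) returns the selected columns I_min and the zero-padded coefficient vector; the simulated data at the end are only a usage example with arbitrary coefficients.

```python
import numpy as np

def forward_selection_cp(X, Y):
    """Forward selection for OLS; pick I_min minimizing C_S(I) = SSE(I) + 2*a*sigma2_hat."""
    n, p = X.shape

    def fit(cols):
        b = np.linalg.lstsq(X[:, cols], Y, rcond=None)[0]
        resid = Y - X[:, cols] @ b
        return resid @ resid, b                        # SSE(I), beta_hat_I

    sigma2_hat = fit(list(range(p)))[0] / (n - p)      # full model MSE
    selected = [0]                                     # I_1: constant only
    path = [(list(selected), fit(selected)[0])]
    while len(selected) < p:                           # form I_2, ..., I_p
        rest = [j for j in range(p) if j not in selected]
        best = min(rest, key=lambda j: fit(selected + [j])[0])
        selected.append(best)
        path.append((list(selected), fit(selected)[0]))

    crit = [sse + 2 * len(cols) * sigma2_hat for cols, sse in path]
    cols_min = path[int(np.argmin(crit))][0]
    beta_I = fit(cols_min)[1]
    beta_padded = np.zeros(p)                          # zero padding: beta_hat_{Imin,0}
    beta_padded[cols_min] = beta_I
    return sorted(cols_min), beta_padded

# usage on simulated data (illustrative values)
rng = np.random.default_rng(4)
n, p = 200, 6
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, 0.0, 0.0, -1.0, 0.0])
Y = X @ beta + rng.normal(size=n)
print(forward_selection_cp(X, Y))
```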

Section 2 gives large sample theory for OLS variable selection estimators such as forward selection.

2 Large sample theory for OLS variable selection estimators

Large sample theory for the elastic net, lasso, and ridge regression is simple using the KKT conditions since the optimization problem is convex. The optimization problem for variable selection is not convex, so new tools are needed. One technique is to consider variable selection models where the probability that the model selects the true set S goes to one. See Leeb and Pötscher (2005). A problem is that √n (β̂_{I_min} − β_S) is only defined if β̂_{I_min} has the same dimension as β_S. We will show that large sample theory becomes simple by using zero padding. If β̂_I is a × 1, form the p × 1 vector β̂_{I,0} from β̂_I by adding 0s corresponding to the omitted variables. For example, if p = 4 and β̂_{I_min} = (β̂_1, β̂_3)^T, then β̂_{I_min,0} = (β̂_1, 0, β̂_3, 0)^T. Since fewer than 2^p regression models I contain the true model S, and each such model gives a √n consistent estimator β̂_{I,0} of β, the probability that I_min picks one of these models goes to one as n → ∞. Hence β̂_{I_min,0} is a √n consistent estimator of β under model (9). Olive (2017a: p. 123, 2017b: p. 176) showed that β̂_{I_min,0} is a consistent estimator.

This section will use mixture distributions to find the limiting distribution of √n (β̂_{I_min,0} − β). Mixture distributions are useful for model and variable selection since β̂_{I_min,0} is a mixture distribution of the β̂_{I_j,0}, and the lasso estimator β̂_L is a mixture distribution of the β̂_{L,λ_i} for i = 1, ..., M.

A random vector u has a mixture distribution of random vectors u_j with probabilities π_j if u equals random vector u_j with probability π_j for j = 1, ..., J. Let u and the u_j be p × 1 random vectors. Then the cumulative distribution function (cdf) of u is

F_u(t) = Σ_{j=1}^J π_j F_{u_j}(t)    (11)

where the probabilities π_j satisfy 0 ≤ π_j ≤ 1 and Σ_{j=1}^J π_j = 1, J ≥ 2, and F_{u_j}(t) is the cdf of u_j. Suppose E(h(u)) and the E(h(u_j)) exist. Then

E(h(u)) = Σ_{j=1}^J π_j E[h(u_j)].    (12)

Hence

E(u) = Σ_{j=1}^J π_j E[u_j],    (13)

and

Cov(u) = E(u u^T) − E(u) E(u^T) = E(u u^T) − E(u) [E(u)]^T = Σ_{j=1}^J π_j E[u_j u_j^T] − E(u) [E(u)]^T
       = Σ_{j=1}^J π_j Cov(u_j) + Σ_{j=1}^J π_j E(u_j) [E(u_j)]^T − E(u) [E(u)]^T.    (14)

If E(u_j) = θ for j = 1, ..., J, then E(u) = θ and Cov(u) = Σ_{j=1}^J π_j Cov(u_j). Note that

E(u) [E(u)]^T = Σ_{j=1}^J Σ_{k=1}^J π_j π_k E(u_j) [E(u_k)]^T.    (15)

Now suppose that T_n is equal to the estimator T_{jn} with probability π_{jn} for j = 1, ..., J where Σ_j π_{jn} = 1, π_{jn} → π_j as n → ∞, and u_{jn} = √n (T_{jn} − θ) →D u_j with E(u_j) = 0 and Cov(u_j) = Σ_j. Then T_n has a mixture distribution of the T_{jn} with probabilities π_{jn}, and the cdf of T_n is F_{T_n}(z) = Σ_j π_{jn} F_{T_{jn}}(z) where F_{T_{jn}}(z) is the cdf of T_{jn}. Hence √n (T_n − θ) has a mixture distribution of the √n (T_{jn} − θ), and

√n (T_n − θ) →D u    (16)

where the cdf of u is F_u(z) = Σ_j π_j F_{u_j}(z) and F_{u_j}(z) is the cdf of u_j. Thus u is a mixture distribution of the u_j with probabilities π_j, E(u) = 0, and Cov(u) = Σ_u = Σ_j π_j Σ_j.

Applying the above results with large sample theory for OLS makes large sample theory for OLS variable selection simple. Assume the maximum leverage max_{i=1,...,n} x_{iI}^T (X_I^T X_I)^{−1} x_{iI} → 0 in probability as n → ∞ for each I with S ⊆ I. For the full OLS model, √n (β̂ − β) →D N_p(0, σ² V) where (X^T X)/n →P V^{−1}. See, for example, Olive (2017a, p. 39) and Sen and Singer (1993, p. 280).

For OLS variable selection with C_p, let β̂_{I_j} = (X_{I_j}^T X_{I_j})^{−1} X_{I_j}^T Y = D_j Y, T_n = β̂_{I_min,0}, and T_{jn} = β̂_{I_j,0} = D_{j,0} Y where D_{j,0} adds rows of zeroes to D_j corresponding to the x_i not in I_j. Let T_n = T_{kn} = β̂_{I_k,0} with probabilities π_{kn} where π_{kn} → π_k as n → ∞. Denote the π_k with S ⊆ I_k by π_j. The other π_k = 0 by Nishii (1984). Then

√n (β̂_{I_j} − β_{I_j}) →D N_{a_j}(0, σ² V_j) and u_{jn} = √n (β̂_{I_j,0} − β) →D u_j ∼ N_p(0, σ² V_{j,0})

where n (X_{I_j}^T X_{I_j})^{−1} →P V_j and V_{j,0} adds columns and rows of zeroes corresponding to the x_i not in I_j. Hence Σ_j = σ² V_{j,0} is singular unless I_j corresponds to the full model. Then (16) holds:

√n (β̂_{I_min,0} − β) →D u    (17)

where the cdf of u is F_u(z) = Σ_j π_j F_{u_j}(z). Thus u is a mixture distribution of the u_j with probabilities π_j, E(u) = 0, and Cov(u) = Σ_u = Σ_j π_j σ² V_{j,0}. The values of the π_j depend on the OLS variable selection method with C_p, such as backward elimination, forward selection, all subsets, and, if λ_1 = 0, the variant of relaxed lasso that computes the OLS submodel for the subset corresponding to λ_i for i = 1, ..., M.
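To illustrate the covariance of the limiting mixture distribution in (17), the sketch below (hypothetical values, not from the paper) takes an assumed limit V^{−1} of X^T X/n and assumed selection probabilities π_j for a few submodels containing S, forms each zero-padded V_{j,0}, and computes Σ_u = Σ_j π_j σ² V_{j,0}.

```python
import numpy as np

sigma2 = 1.0
# assumed limit of X^T X / n (first column is the constant); illustrative values
Vinv = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.5, 0.2],
                 [0.0, 0.5, 1.0, 0.3],
                 [0.0, 0.2, 0.3, 1.0]])
# submodels I_j containing S = {0, 1}, with assumed probabilities pi_j
submodels = {(0, 1): 0.6, (0, 1, 2): 0.3, (0, 1, 2, 3): 0.1}

def V_j0(cols, Vinv):
    """V_j is the limit of n (X_I^T X_I)^{-1}; V_{j,0} pads it with zero rows/columns."""
    cols = list(cols)
    Vj = np.linalg.inv(Vinv[np.ix_(cols, cols)])
    out = np.zeros_like(Vinv)
    out[np.ix_(cols, cols)] = Vj
    return out

# Cov(u) = Sigma_u = sum_j pi_j sigma^2 V_{j,0} for the mixture limit in (17)
Sigma_u = sum(pi * sigma2 * V_j0(I, Vinv) for I, pi in submodels.items())
print(Sigma_u)
```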

Let A be a g × p full rank matrix with 1 ≤ g ≤ p. Then

√n (A β̂_{I_min,0} − A β) →D A u

where A u has a mixture distribution of the A u_j ∼ N_g(0, σ² A V_{j,0} A^T) with probabilities π_j.

Two special cases are interesting. First, suppose π_d = 1 so u ≡ u_d ∼ N_p(0, Σ_d). This special case occurs for C_p if a_S = p so S is the full model, and for methods like BIC that choose I_min = S with probability going to one. The second special case occurs if, for each π_j > 0, A u_j ∼ N_g(0, A Σ_j A^T) = N_g(0, A Σ A^T). Then √n (A β̂_{I_min,0} − A β) →D A u ∼ N_g(0, A Σ A^T). This special case occurs for β̂_S if the nontrivial predictors are orthogonal or uncorrelated with zero mean so that X^T X/n → diag(d_1, ..., d_p) as n → ∞ where each d_i > 0. Then β̂_S has the same multivariate normal limiting distribution for I_min and for the OLS full model.

3 Conclusion

Results in Claeskens and Hjort (2008, pp. 101, 102, 232) suggest that the probability that AIC underfits goes to zero for many models. Hence with AIC variable selection, √n (β̂_{I_min,0} − β) →D u for many time series models, generalized linear models, and survival regression models.

Efron and Hastie (2016, p. 4) note that inference is needed to compare and assess methods. OLS variable selection estimators are √n consistent under mild conditions. The elastic net, lasso, and ridge regression are consistent estimators of β if λ_{1,n} = o(n) and √n consistent if λ_{1,n} = O(√n). These three estimators are asymptotically equivalent to the OLS full model if λ_{1,n}/√n → 0 as n → ∞. The OLS variable selection estimators have a limiting distribution that is a mixture distribution of the limiting distributions of the OLS full model and other models I_j such that S ⊆ I_j. Hence the OLS variable selection estimators can give more precise estimators of β than the OLS full model if a_S < p.

Usually λ̂_{1,n} is selected using a criterion such as k-fold CV, AIC, BIC, or GCV. It is not clear whether λ̂_{1,n} = o(n). For n/p large, often the lasso program chooses λ_1 > 0. Adding λ_1 = 0 if n ≥ 5p should improve the elastic net, lasso, and ridge regression estimators. Using a λ_i near n/log(n) may also be useful. For the elastic net and lasso, λ_M/n does not go to zero as n → ∞ since η̂ = 0 is not a consistent estimator. Hence λ_M is likely proportional to n, and using λ_i = i λ_M/M for i = 1, ..., M will not produce a consistent estimator.

In addition to large sample theory, shrinkage estimators can be compared with asymptotically optimal prediction intervals, even if n/p is not large. See Pelawa Watagoda and Olive (2018a). If n/p is large, Olive (2018, 2017a, 2017b) suggests a bootstrap confidence region that simulates well for OLS variable selection estimators and, in limited simulations, for lasso. Pelawa Watagoda and Olive (2018b) give some theory for this application.

Response plots of the fitted values Ŷ versus the response Y are useful for checking linearity of the multiple linear regression model and for detecting outliers. Residual plots should also be made.

REFERENCES

Akaike, H. (1973), Information Theory as an Extension of the Maximum Likelihood Principle, in Proceedings, 2nd International Symposium on Information Theory, eds. Petrov, B.N., and Csaki, F., Akademiai Kiado, Budapest.
Claeskens, G., and Hjort, N.L. (2008), Model Selection and Model Averaging, Cambridge University Press, New York, NY.
Efron, B., and Hastie, T. (2016), Computer Age Statistical Inference, Cambridge University Press, New York, NY.
Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004), Least Angle Regression (with discussion), The Annals of Statistics, 32.
Gunst, R.F., and Mason, R.L. (1980), Regression Analysis and Its Application, Marcel Dekker, New York, NY.
Hastie, T., Tibshirani, R., and Wainwright, M. (2015), Statistical Learning with Sparsity: the Lasso and Generalizations, CRC Press Taylor & Francis, Boca Raton, FL.
Hoerl, A.E., and Kennard, R. (1970), Ridge Regression: Biased Estimation for Nonorthogonal Problems, Technometrics, 12.
James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013), An Introduction to Statistical Learning With Applications in R, Springer, New York, NY.
Jia, J., and Yu, B. (2010), On Model Selection Consistency of the Elastic Net When p >> n, Statistica Sinica, 20.
Jones, H.L. (1946), Linear Regression Functions with Neglected Variables, Journal of the American Statistical Association, 41.
Knight, K., and Fu, W.J. (2000), Asymptotics for Lasso-Type Estimators, The Annals of Statistics, 28.
Leeb, H., and Pötscher, B.M. (2005), Model Selection and Inference: Facts and Fiction, Econometric Theory, 21.
Mallows, C. (1973), Some Comments on C_p, Technometrics, 15.
Meinshausen, N. (2007), Relaxed Lasso, Computational Statistics & Data Analysis, 52.
Nishii, R. (1984), Asymptotic Properties of Criteria for Selection of Variables in Multiple Regression, The Annals of Statistics, 12.
Olive, D.J. (2017a), Linear Regression, Springer, New York, NY.
Olive, D.J. (2017b), Robust Multivariate Analysis, Springer, New York, NY.
Olive, D.J. (2018), Applications of Hyperellipsoidal Prediction Regions, Statistical Papers, to appear.
Olive, D.J., and Hawkins, D.M. (2005), Variable Selection for 1D Regression Models, Technometrics, 47.
Pelawa Watagoda, L.C.R., and Olive, D.J. (2018a), Comparing Shrinkage Estimators With Asymptotically Optimal Prediction Intervals, unpublished manuscript.

Pelawa Watagoda, L.C.R., and Olive, D.J. (2018b), Bootstrapping Multiple Linear Regression After Variable Selection, unpublished manuscript at (siu.edu/olive/ppboottest.pdf).
Schwarz, G. (1978), Estimating the Dimension of a Model, The Annals of Statistics, 6.
Sen, P.K., and Singer, J.M. (1993), Large Sample Methods in Statistics: an Introduction with Applications, Chapman & Hall, New York.
Slawski, M., zu Castell, W., and Tutz, G. (2010), Feature Selection Guided by Structural Information, The Annals of Applied Statistics, 4.
Su, Z., and Cook, R.D. (2012), Inner Envelopes: Efficient Estimation in Multivariate Linear Regression, Biometrika, 99.
Tibshirani, R. (1996), Regression Shrinkage and Selection Via the Lasso, Journal of the Royal Statistical Society, B, 58.
Zou, H., and Hastie, T. (2005), Regularization and Variable Selection Via the Elastic Net, Journal of the Royal Statistical Society, B, 67.
Zou, H., and Zhang, H.H. (2009), On the Adaptive Elastic-Net with a Diverging Number of Parameters, The Annals of Statistics, 37.
