Inference After Variable Selection


1 Inference After Variable Selection. Lasanthi Pelawa Watagoda, Department of Mathematics, SIU Carbondale. June 12, 2017

2 Outline
1 Introduction
2 Inference For Ridge and Lasso
3 Variable Selection
  3.1 Criteria for final model selection
  3.2 R functions for variable selection
  3.3 New OLS Sub Model Theorem
4 Prediction Intervals
  4.1 Shorth PI
  4.2 Two new prediction intervals
5 Bootstrapping Hypothesis Tests
  5.1 Bootstrap a Confidence Region
  5.2 Prediction Region Method
6 Examples and Simulations

3 The Multiple Linear Regression model
Suppose that the response variable Y_i and at least one predictor variable x_{i,j} are quantitative with x_{i,1} ≡ 1. Let x_i^T = (x_{i,1}, ..., x_{i,p}) = (1, u_i^T) and β = (β_1, ..., β_p)^T where β_1 corresponds to the intercept. The multiple linear regression (MLR) model is
Y_i = β_1 + x_{i,2}β_2 + ... + x_{i,p}β_p + e_i = x_i^T β + e_i   (1)
for i = 1, ..., n. This model is also called the full model. Here n is the sample size and the random variable e_i is the ith error.

4 The Multiple Linear Regression model: Matrix Notation
In matrix notation, these n equations become
Y = Xβ + e,   (2)
where
Y is the n × 1 vector of dependent variables,
X is the n × p matrix of predictors,
β is the p × 1 vector of unknown coefficients,
e is the n × 1 vector of unknown errors,
ˆβ is an estimator of β,
Ŷ_i = x_i^T ˆβ is the ith fitted value, and
r_i = Y_i − Ŷ_i is the ith residual.
Ordinary least squares (OLS) is often used for inference if n/p is large.
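As a quick illustration of the notation above, the sketch below fits the full OLS model both with lm() and with the closed form ˆβ = (X^T X)^{-1} X^T Y. The simulated data and variable names are made up for the example and are reused in later sketches.

# simulate a small MLR data set (illustrative only)
set.seed(1)
n <- 100; p <- 4
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))  # first column is the constant
beta <- c(1, 1, 0, 0)
Y <- drop(X %*% beta + rnorm(n))

# OLS via the normal equations: betahat = (X^T X)^{-1} X^T Y
betahat <- solve(crossprod(X), crossprod(X, Y))
Yhat    <- X %*% betahat        # Yhat_i = x_i^T betahat
res_ols <- Y - Yhat             # r_i = Y_i - Yhat_i

# the same fit with lm() (drop the constant column since lm adds its own intercept)
ols <- lm(Y ~ X[, -1])
cbind(betahat, coef(ols))       # the two estimates agree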

5 The Multiple Linear Regression model: More Notation
The centered response is Z = Y − Ȳ where Ȳ = Ȳ 1.
Let W = (W_{ij}) be the n × (p − 1) matrix of standardized nontrivial predictors. For j = 1, ..., p − 1, let W_{ij} denote the (j + 1)th variable standardized so that Σ_{i=1}^n W_{ij} = 0 and Σ_{i=1}^n W_{ij}² = n. Hence
W_{ij} = (x_{i,j+1} − x̄_{j+1}) / ˆσ_{j+1}   where   ˆσ²_{j+1} = (1/n) Σ_{i=1}^n (x_{i,j+1} − x̄_{j+1})².
The sample correlation matrix of the nontrivial predictors u_i is R_u = W^T W / n.
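A minimal sketch of this standardization, using the simulated X and Y from the previous block: it builds W with the divide-by-n variance so that Σ_i W_{ij}² = n, and checks that W^T W / n matches the sample correlation matrix of the nontrivial predictors.

u <- X[, -1]                                  # nontrivial predictors
n <- nrow(u)
xbar <- colMeans(u)
sig  <- sqrt(colMeans(sweep(u, 2, xbar)^2))   # divide-by-n standard deviations
W <- sweep(sweep(u, 2, xbar), 2, sig, "/")    # standardized predictors
Z <- Y - mean(Y)                              # centered response

Ru <- crossprod(W) / n                        # W^T W / n
max(abs(Ru - cor(u)))                         # essentially zero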

6 The Multiple Linear Regression model: More Notation
Then regression through the origin is used for the model
Z = Wη + e,   (3)
where the vector of fitted values is Ŷ = Ȳ + Ẑ.

7 The Multiple Linear Regression model: Alternative Methods For Estimating β
Method                          Criterion
Forward Selection with OLS      Minimum C_p and EBIC
Principal Component Regression  10-fold cross validation (CV)
Partial Least Squares           10-fold cross validation (CV)
Ridge Regression                10-fold cross validation (CV)
Lasso                           10-fold cross validation (CV)
Relaxed Lasso                   10-fold cross validation (CV)

8 The Multiple Linear Regression model: Alternative Methods For Estimating β
All of these methods produce M models, and the number of models M depends on the method. One of the models is the full model (1) that uses all p − 1 nontrivial predictors. The full model is (approximately) fit with (ordinary) least squares (OLS). Some of the methods use ˆη = 0 and fit the model Y_i = β_1 + e_i.

9 The Multiple Linear Regression model: Alternative Methods For Estimating β
Lasso and ridge regression have a tuning parameter λ. When λ = 0, the full OLS model is used. These methods also use a maximum value λ_M of λ and a grid of M λ values
λ_1 = 0 < λ_2 < ... < λ_{M−1} < λ_M.
For lasso, λ_M is the smallest value of λ such that ˆη_{λ_M} = 0. Hence ˆη_{λ_i} ≠ 0 for i < M. For forward selection, PCR, and PLS, M ≤ p.

10 Minimizing Criterion for Ridge and Lasso
Consider choosing ˆη to minimize the criterion
Q(η) = (1/a) (Z − Wη)^T (Z − Wη) + (λ_{1,n}/a) Σ_{i=1}^{p−1} |η_i|^j   (4)
where λ_{1,n} ≥ 0, a > 0, and j > 0 are known constants. Here j = 2 corresponds to ridge regression, j = 1 corresponds to lasso, and a = 1, 2, n, and 2n are common. The residual sum of squares is RSS(η) = (Z − Wη)^T (Z − Wη), and λ_{1,n} = 0 corresponds to the OLS estimator ˆη_OLS = (W^T W)^{−1} W^T Z.
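The sketch below simply evaluates the criterion Q(η) of (4) for a candidate η, using the W and Z built earlier; the choice a = n and the λ values are arbitrary and only for illustration.

Qcrit <- function(eta, Z, W, lambda, a = nrow(W), j = 1) {
  resid <- Z - W %*% eta
  drop(crossprod(resid)) / a + (lambda / a) * sum(abs(eta)^j)
}

eta_ols <- solve(crossprod(W), crossprod(W, Z))   # lambda = 0 gives the OLS estimator
Qcrit(eta_ols, Z, W, lambda = 0)                  # = RSS(eta_ols)/a
Qcrit(eta_ols, Z, W, lambda = 10, j = 1)          # lasso criterion at eta_ols
Qcrit(eta_ols, Z, W, lambda = 10, j = 2)          # ridge criterion at eta_ols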

11 Limiting Distribution For Ridge Regression
Assume p is fixed. Knight and Fu (2000) prove
i) that ˆη is a consistent estimator of η if λ_{1,n} = o(n),
ii) that ˆη_OLS and ˆη are asymptotically equivalent if λ_{1,n} → ∞ too slowly as n → ∞, and
iii) that ˆη is a √n consistent estimator of η if λ_{1,n} = O(√n).
If λ_{1,n}/√n → τ ≥ 0, then
√n (ˆη_RR − η) →_D N_{p−1}(−τ V η, σ² V)   where   R_u = W^T W / n →_P V^{−1}   (5)
as n → ∞. If τ = 0, then OLS and ridge regression have the same limiting distribution.

12 Limiting Distribution For Ridge Regression
Note that V^{−1} = ρ_u if the u_i are a random sample from a population with a nonsingular population correlation matrix ρ_u. Under (5), if λ_{1,n}/n → 0 then
(W^T W + λ_{1,n} I_{p−1}) / n →_P V^{−1},   and   n (W^T W + λ_{1,n} I_{p−1})^{−1} →_P V.

13 Limiting Distribution For Ridge Regression
The following identity from Gunst and Mason (1980, p. 342) is useful for ridge regression inference:
ˆη_RR = (W^T W + λ_{1,n} I_{p−1})^{−1} W^T Z = (W^T W + λ_{1,n} I_{p−1})^{−1} W^T W ˆη_OLS = A_n ˆη_OLS
      = [I_{p−1} − λ_{1,n} (W^T W + λ_{1,n} I_{p−1})^{−1}] ˆη_OLS = B_n ˆη_OLS
since A_n − B_n = 0. If λ_{1,n}/√n → τ ≥ 0, then
√n (ˆη_RR − η) = √n (ˆη_RR − ˆη_OLS + ˆη_OLS − η)
             = √n (ˆη_OLS − η) − (λ_{1,n}/√n) n (W^T W + λ_{1,n} I_{p−1})^{−1} ˆη_OLS
             →_D N_{p−1}(0, σ² V) − τ V η ~ N_{p−1}(−τ V η, σ² V).
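A quick numerical check of the Gunst and Mason identity with the W, Z, and eta_ols from the earlier sketches (the value of λ_{1,n} is arbitrary): the ridge solution computed directly agrees with B_n ˆη_OLS.

lam <- 5
p1  <- ncol(W)                                   # p - 1
eta_rr <- solve(crossprod(W) + lam * diag(p1), crossprod(W, Z))
Bn <- diag(p1) - lam * solve(crossprod(W) + lam * diag(p1))
max(abs(eta_rr - Bn %*% eta_ols))                # essentially zero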

14 Limiting Distribution For Lasso
Theorem. Let ˆη_L be the lasso estimator. Then
√n (ˆη_L − η) →_D N_{p−1}(−(τ/2) V s, σ² V)
where R_u = W^T W / n →_P V^{−1} as n → ∞. If τ = 0, then OLS and lasso have the same limiting distribution.

15 Limiting Distribution For Lasso: New Proof
Proof. The following identity from Efron and Hastie (2016, p. 308), for example, is useful for inference for the lasso estimator ˆη_L:
−W^T (Z − W ˆη_L) + (λ_{1,n}/2) s_n = 0
where s_{in} ∈ [−1, 1] and s_{in} = sign(ˆη_{i,L}) if ˆη_{i,L} ≠ 0. Here sign(η_i) = 1 if η_i > 0 and sign(η_i) = −1 if η_i < 0. Note that s_n = s_{n,ˆη_L} depends on ˆη_L. Thus
ˆη_L = (W^T W)^{−1} W^T Z − n (W^T W)^{−1} (λ_{1,n}/(2n)) s_n.

16 Limiting Distribution For Lasso: New Proof
Proof (continued). If λ_{1,n}/√n → τ ≥ 0 and s_n →_P s = s_η, then
√n (ˆη_L − η) = √n (ˆη_L − ˆη_OLS + ˆη_OLS − η)
            = √n (ˆη_OLS − η) − (λ_{1,n}/(2√n)) n (W^T W)^{−1} s_n
            →_D N_{p−1}(0, σ² V) − (τ/2) V s ~ N_{p−1}(−(τ/2) V s, σ² V).

17 Variable Selection: Definition
Following Olive and Hawkins (2005), a model for variable selection can be described by
x^T β = x_S^T β_S + x_E^T β_E = x_S^T β_S   (6)
where x = (x_S^T, x_E^T)^T, x_S is a k_S × 1 vector, and x_E is a (p − k_S) × 1 vector. Given that x_S is in the model, β_E = 0 and E denotes the subset of terms that can be eliminated given that the subset S is in the model.

18 M Submodels using Forward Selection
Forward selection forms a sequence of submodels I_1, ..., I_M where I_j uses j predictors including the constant. Let I_1 use x*_1 = x_1 ≡ 1: the model has a constant but no nontrivial predictors. To form I_2, consider all models I with two predictors including x*_1. Compute
Q_2(I) = SSE(I) = RSS(I) = r^T(I) r(I) = Σ_{i=1}^n r_i²(I) = Σ_{i=1}^n (Y_i − Ŷ_i(I))².
Let I_2 minimize Q_2(I) over the p − 1 models I that contain x*_1 and one other predictor.

19 M Submodels using Forward Selection
Denote the predictors in I_2 by x*_1, x*_2. In general, to form I_j consider all models I with j predictors including the variables x*_1, ..., x*_{j−1}. Compute
Q_j(I) = r^T(I) r(I) = Σ_{i=1}^n r_i²(I) = Σ_{i=1}^n (Y_i − Ŷ_i(I))².
Let I_j minimize Q_j(I) over the p − j + 1 models I that contain x*_1, ..., x*_{j−1} and one other predictor not already selected. Denote the predictors in I_j by x*_1, ..., x*_j. Continue in this manner for j = 2, ..., M. Often M = min(⌈n/J⌉, p) for some integer J such as J = 5, 10, or 20.
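A minimal forward selection sketch with the leaps package, matching the table of R functions given later in the talk; the data are the simulated X and Y from the earlier blocks, and nvmax plays the role of M − 1 (the maximum number of nontrivial predictors).

library(leaps)
u <- X[, -1]                                     # nontrivial predictors
M <- min(ceiling(nrow(u) / 5), ncol(X))          # M = min(ceil(n/5), p)
fs <- regsubsets(x = u, y = Y, method = "forward", nvmax = M - 1)
sfs <- summary(fs)
sfs$cp                                           # Cp for I_2, ..., I_M
best <- which.min(sfs$cp)                        # submodel minimizing Cp
sfs$which[best, ]                                # variables in the selected submodel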

20 Criteria for final model selection
When there is a sequence of M submodels, the final submodel I_d needs to be selected. Let x_I and ˆβ_I be a × 1 vectors; hence the candidate model contains a terms, including a constant. Suppose the e_i are independent and identically distributed (iid) with variance V(e_i) = σ². Then there are many criteria used to select the final submodel I_d. A simple method is to take the model that uses d = M = min(⌈n/J⌉, p) variables. If p is fixed, this method will use the full OLS model once ⌈n/J⌉ ≥ p.

21 Criteria for final model selection: Criterion when there is a good estimator for σ²
Let criteria C_S(I) have the form
C_S(I) = SSE(I) + a K_n ˆσ².
These criteria need a good estimator of σ². The criterion C_p(I) = AIC_S(I) uses K_n = 2 while the BIC_S(I) criterion uses K_n = log(n). Typically ˆσ² is the full OLS model MSE = Σ_{i=1}^n r_i² / (n − p) when n/p is large. Then ˆσ² = MSE is a √n consistent estimator of σ² under mild conditions by Su and Cook (2012).

22 Criteria for final model selection: EBIC
The EBIC criterion given in Luo and Chen (2012) may work when n/p is not large. Let 0 ≤ γ ≤ 1 and |I| = a ≤ min(n, p) if ˆβ_I is a × 1. We may use a ≤ min(n/5, p). Then
EBIC(I) = n log(SSE(I)/n) + a log(n) + 2γ log[ C(p, a) ] = BIC(I) + 2γ log[ C(p, a) ],
where C(p, a) is the binomial coefficient "p choose a". This criterion can give good results if p = p_n = O(n^k) and γ > 1 − 1/(2k). Hence we will use γ = 1. The above criteria can be applied to forward selection and relaxed lasso. The C_p criterion can also be applied to lasso.
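A small helper that evaluates EBIC(I) from a submodel's SSE, following the formula above with lchoose() supplying the log binomial coefficient and γ = 1 as in the talk; applying it across the regsubsets fit from the previous sketch is shown for illustration.

ebic <- function(sse, a, n, p, gamma = 1) {
  n * log(sse / n) + a * log(n) + 2 * gamma * lchoose(p, a)
}

# EBIC for the forward selection submodels I_2, ..., I_M from the regsubsets fit
a_vals <- rowSums(sfs$which)        # number of terms in each submodel, including the constant
sapply(seq_along(sfs$rss), function(i) ebic(sfs$rss[i], a_vals[i], nrow(u), ncol(X)))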

23 R functions for variable selection
Method             Criterion     R function   Library
Forward Selection  Minimum C_p   regsubsets   leaps
PCR                10-fold CV    pcr          pls
PLS                10-fold CV    plsr         pls
RR                 10-fold CV    cv.glmnet    glmnet
Lasso (Relaxed)    10-fold CV    cv.glmnet    glmnet
Table: Methods used for Variable Selection
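For concreteness, a sketch of the ridge and lasso fits with cv.glmnet as listed in the table (10-fold CV is the default); the relaxed lasso step shown here simply refits OLS on the predictors that lasso kept, which is one common way to implement it and may differ in detail from the talk's implementation.

library(glmnet)
u <- X[, -1]
ridge <- cv.glmnet(u, Y, alpha = 0, nfolds = 10)   # ridge regression
lasso <- cv.glmnet(u, Y, alpha = 1, nfolds = 10)   # lasso
coef(lasso, s = "lambda.min")                      # lasso coefficients at the CV-chosen lambda

# relaxed lasso (one simple variant): OLS on the predictors lasso kept
cf <- as.matrix(coef(lasso, s = "lambda.min"))
keep <- which(cf[-1, 1] != 0)
rl <- if (length(keep) > 0) lm(Y ~ u[, keep, drop = FALSE]) else lm(Y ~ 1)
coef(rl)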

24 New OLS Sub Model Theorem
Theorem. Suppose the usual linear model Y = Xβ + e holds, with E(Y) = Xβ, E(e) = 0, and Cov(Y) = Cov(e) = σ² I. Partition X and β as
X = [X_I  X_0],   β = (β_I^T, β_0^T)^T,   so   Xβ = X_I β_I + X_0 β_0.
If ˆβ_I = (X_I^T X_I)^{−1} X_I^T Y = A Y, then
E(ˆβ_I) = β_I + (X_I^T X_I)^{−1} X_I^T X_0 β_0   and   Cov(ˆβ_I) = σ² (X_I^T X_I)^{−1}.

25 New OLS Sub Model Theorem: Proof
Proof. Consider an arbitrary submodel I. When the submodel contains the set of predictors S, then ˆβ_I works well and estimates β_I, but if I does not contain S then
E(ˆβ_I) = E( (X_I^T X_I)^{−1} X_I^T Y ) = E(A Y) = A E(Y) = A X β = A (X_I β_I + X_0 β_0)
        = (X_I^T X_I)^{−1} X_I^T X_I β_I + (X_I^T X_I)^{−1} X_I^T X_0 β_0
        = β_I + (X_I^T X_I)^{−1} X_I^T X_0 β_0.
When β_0 = 0, then E(ˆβ_I) = β_I. The above result shows why OLS does not work well if the submodel does not contain enough predictors.

26 New OLS Sub Model Theorem: Proof
Proof (continued). Now consider Cov(ˆβ_I):
Cov(ˆβ_I) = Cov(A Y) = A Cov(Y) A^T = A σ² I A^T = σ² A A^T
          = σ² (X_I^T X_I)^{−1} X_I^T [ (X_I^T X_I)^{−1} X_I^T ]^T
          = σ² (X_I^T X_I)^{−1} X_I^T X_I [ (X_I^T X_I)^T ]^{−1}
          = σ² ( (X_I^T X_I)^T )^{−1} = σ² (X_I^T X_I)^{−1}.
Together with the previous slide, this shows why OLS does not work well if the submodel does not contain enough predictors.

27 Prediction Intervals: Definition
Consider predicting a future test response variable Y_f given a p × 1 vector of predictors x_f and training data (x_1, Y_1), ..., (x_n, Y_n). A large sample 100(1 − δ)% prediction interval (PI) has the form [ˆL_n, Û_n] where P(ˆL_n ≤ Y_f ≤ Û_n) → 1 − δ as the sample size n → ∞.

28 Prediction Intervals: Shorth PI
The shorth(c) estimator is useful for making prediction intervals. Let Z_(1), ..., Z_(n) be the order statistics of Z_1, ..., Z_n. Then let the shortest closed interval containing at least c of the Z_i be
shorth(c) = [Z_(s), Z_(s+c−1)].   (7)
Let
k_n = ⌈n(1 − δ)⌉.   (8)
Frey (2013) showed that for large nδ and iid data, the shorth(k_n) PI has maximum undercoverage ≈ 1.12√(δ/n), and used the shorth(c) estimator as the large sample 100(1 − δ)% PI where
c = min(n, ⌈n[1 − δ + 1.12√(δ/n)]⌉).   (9)
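A sketch of the shorth(c) interval and the Frey PI applied to the OLS residuals from the earlier fit; the function name shorth_int is made up for the example.

shorth_int <- function(z, c) {
  z <- sort(z)
  n <- length(z)
  starts <- 1:(n - c + 1)
  widths <- z[starts + c - 1] - z[starts]
  s <- which.min(widths)                       # shortest closed interval with >= c points
  c(z[s], z[s + c - 1])
}

delta <- 0.05
res <- drop(Y - X %*% betahat)                 # full model residuals
cfrey <- min(length(res), ceiling(length(res) * (1 - delta + 1.12 * sqrt(delta / length(res)))))
shorth_int(res, cfrey)                         # Frey-type shorth PI for a residual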

29 Two new prediction intervals: The First New Prediction Interval
Let
q_n = min(1 − δ + 0.05, 1 − δ + d/n) for δ > 0.1, and q_n = min(1 − δ/2, 1 − δ + 10δd/n) otherwise.   (10)
If 1 − δ < 0.999 and q_n < 1 − δ + 0.001, set q_n = 1 − δ. Let
c = ⌈n q_n⌉,   (11)
and let
b_n = (1 + 15/n) √( (n + 2d)/(n − d) ) if d ≤ 8n/9, and b_n = 5(1 + 15/n) otherwise.   (12)
Compute the shorth(c) of the residuals = [r_(s), r_(s+c−1)] = [ξ̃_{δ1}, ξ̃_{1−δ2}]. Then the first new 100(1 − δ)% large sample PI for Y_f is
[ ˆm(x_f) + b_n ξ̃_{δ1},  ˆm(x_f) + b_n ξ̃_{1−δ2} ].   (13)
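A sketch of this PI built from the formulas above, assuming d is the number of variables (including the constant) used by the selected model and ˆm(x_f) is its fitted value at x_f; the helper name newpi1 and the future predictor vector xf are made up, and shorth_int comes from the previous sketch.

newpi1 <- function(mhat_f, res, d, delta = 0.05) {
  n <- length(res)
  qn <- if (delta > 0.1) min(1 - delta + 0.05, 1 - delta + d / n) else
                         min(1 - delta / 2, 1 - delta + 10 * delta * d / n)
  if (1 - delta < 0.999 && qn < 1 - delta + 0.001) qn <- 1 - delta
  cc <- ceiling(n * qn)
  bn <- if (d <= 8 * n / 9) (1 + 15 / n) * sqrt((n + 2 * d) / (n - d)) else 5 * (1 + 15 / n)
  xi <- shorth_int(res, cc)                    # shorth(c) of the residuals
  c(mhat_f + bn * xi[1], mhat_f + bn * xi[2])
}

xf <- c(1, 0.5, -1, 2)                         # a hypothetical future predictor vector
newpi1(mhat_f = sum(xf * betahat), res = res, d = length(betahat))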

30 Two new prediction intervals: The Second New Prediction Interval (Validation PI)
The second new PI randomly divides the data into two half sets H and V. H has n_H = ⌈n/2⌉ of the cases and V has the remaining n_V = n − n_H cases i_1, ..., i_{n_V}. The estimator ˆm_H(x) is computed using the training data set H. Then the validation residuals v_j = Y_{i_j} − ˆm_H(x_{i_j}) are computed for the j = 1, ..., n_V cases in the validation set V. Find the Frey PI [v_(s), v_(s+c−1)] of the validation residuals. Then the second new 100(1 − δ)% large sample PI for Y_f is
[ ˆm_H(x_f) + v_(s),  ˆm_H(x_f) + v_(s+c−1) ].   (14)
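A sketch of the validation PI using an OLS fit on the half set H; the splitting, fitting, and Frey shorth of the validation residuals follow the description above, with variable names chosen for the example and reusing X, Y, xf, delta, and shorth_int from earlier sketches.

set.seed(2)
nH <- ceiling(nrow(X) / 2)
H  <- sample(nrow(X), nH)                       # indices of the training half set
fitH <- lm(Y[H] ~ X[H, -1])                     # mhat_H computed from H only
v <- Y[-H] - cbind(1, X[-H, -1]) %*% coef(fitH) # validation residuals
nV <- length(v)
cV <- min(nV, ceiling(nV * (1 - delta + 1.12 * sqrt(delta / nV))))
vs <- shorth_int(drop(v), cV)
c(sum(xf * coef(fitH)) + vs[1], sum(xf * coef(fitH)) + vs[2])   # validation PI at x_f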

31 Bootstrapping Hypothesis Tests: Bootstrap a Confidence Region
To bootstrap a confidence region, Mahalanobis distances and prediction regions will be useful. Consider predicting a future test value z_f, given past training data z_1, ..., z_n where the z_i are r × 1 random vectors. A large sample 100(1 − δ)% prediction region is a set A_n such that P(z_f ∈ A_n) → 1 − δ as n → ∞. Let the r × 1 column vector T be a multivariate location estimator, and let the r × r symmetric positive definite matrix C be a dispersion estimator. Then the ith squared sample Mahalanobis distance is
D_i² = D_i²(T, C) = D²_{z_i}(T, C) = (z_i − T)^T C^{−1} (z_i − T)   (15)
for each observation z_i.

32 Bootstrapping Hypothesis Tests: Bootstrap a Confidence Region
Notice that the Euclidean distance of z_i from the estimate of center T is D_i(T, I_r) where I_r is the r × r identity matrix. The classical Mahalanobis distance D_i uses (T, C) = (z̄, S), the sample mean and sample covariance matrix, where
z̄ = (1/n) Σ_{i=1}^n z_i   and   S = (1/(n − 1)) Σ_{i=1}^n (z_i − z̄)(z_i − z̄)^T.   (16)

33 Bootstrapping Hypothesis Tests: Large sample 100(1 − δ)% nonparametric prediction region
Let (T, C) = (z̄, S), and let D_(U_n) be the 100 q_n th sample quantile of the D_i. Then the Olive (2013) large sample 100(1 − δ)% nonparametric prediction region for a future value z_f given iid data z_1, ..., z_n is
{z : D_z²(z̄, S) ≤ D²_(U_n)},   (17)
while the classical large sample 100(1 − δ)% prediction region is
{z : D_z²(z̄, S) ≤ χ²_{r,1−δ}}.   (18)
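A sketch of both prediction regions for bivariate data, taking q_n = 1 − δ for simplicity (the Olive (2013) region uses a slightly corrected quantile); mahalanobis() and qchisq() are base R, and the data and future value are made up.

set.seed(3)
z <- matrix(rnorm(200), ncol = 2)               # iid training data, r = 2
zbar <- colMeans(z); S <- cov(z)
D2 <- mahalanobis(z, zbar, S)                   # squared distances D_i^2
cut_np  <- quantile(D2, 0.95)                   # nonparametric cutoff with q_n = 0.95
cut_chi <- qchisq(0.95, df = 2)                 # classical cutoff

zf <- c(0.3, -1.2)                              # a hypothetical future value
mahalanobis(t(zf), zbar, S) <= c(cut_np, cut_chi)  # is z_f inside each region?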

34 Bootstrapping Hypothesis Tests: Prediction Region Method
The Olive (2017abce) prediction region method obtains a confidence region for µ by applying the nonparametric prediction region to the bootstrap sample T*_1, ..., T*_B. Let T̄* and S*_T be the sample mean and sample covariance matrix of the bootstrap sample. The prediction region method large sample 100(1 − δ)% confidence region for µ is
{w : D_w²(T̄*, S*_T) ≤ D²_(U_B)}   (19)
where D²_(U_B) is computed from D_i² = (T*_i − T̄*)^T [S*_T]^{−1} (T*_i − T̄*) for i = 1, ..., B. Note that the corresponding test for H_0: µ = µ_0 rejects H_0 if (T̄* − µ_0)^T [S*_T]^{−1} (T̄* − µ_0) > D²_(U_B).
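A sketch of the prediction region method test for a simple statistic (the 2 × 1 mean of the bivariate data z from the previous block), again using the uncorrected 1 − δ quantile for simplicity; the bootstrap here resamples cases, and the name prmtest is made up.

prmtest <- function(Tstar, mu0, delta = 0.05) {
  Tbar <- colMeans(Tstar); ST <- cov(Tstar)
  cutoff <- quantile(mahalanobis(Tstar, Tbar, ST), 1 - delta)  # D^2_(U_B)
  stat <- mahalanobis(t(mu0), Tbar, ST)
  c(statistic = stat, cutoff = cutoff, reject = stat > cutoff)
}

B <- 1000
Tstar <- t(replicate(B, colMeans(z[sample(nrow(z), replace = TRUE), ])))  # bootstrap means
prmtest(Tstar, mu0 = c(0, 0))               # test H0: mu = (0, 0)^T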

35 Bootstrapping Hypothesis Tests: Prediction Region Method for Bootstrapping FS
Following Olive (2017bce), we describe the prediction region method for bootstrapping forward selection, where 0s need to be added for omitted variables. Suppose n > 20p. If ˆβ_I is k_I × 1, form ˆβ_{I,0} from ˆβ_I by adding 0s corresponding to the omitted variables. Then ˆβ_{I,0} is a nonlinear estimator of β, and the residual bootstrap method can be applied.

36 Bootstrapping Hypothesis Tests: Prediction Region Method for Bootstrapping FS
Let ˆβ = ˆβ_{I_min,0} be formed from the forward selection model I_min that minimizes the C_p criterion. Instead of computing the least squares estimator from regressing Y*_i on X, perform variable selection on Y*_i and X, fit the model that minimizes the criterion, and add 0s corresponding to the omitted variables, resulting in estimators ˆβ*_1, ..., ˆβ*_B. Then test H_0: µ = Aβ = c using the prediction region method for r > 1 and the shorth if r = 1. Bootstrapping lasso and ridge regression is similar, but lasso already has the 0s and ridge regression usually does not produce 0 slope estimates.

37 Bootstrapping Hypothesis Tests: Residual Bootstrap
For the MLR model with n > 20p, the residual bootstrap makes sense: the residuals from the full model are sampled with replacement, resulting in a bootstrap sample r*_1, ..., r*_n. The values Y*_i = Ŷ_i + r*_i for i = 1, ..., n are collected into a vector Y*_j, which is regressed on X to get ˆβ*_j, for j = 1, ..., B.
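A sketch combining the residual bootstrap with forward selection as described above: each bootstrap response vector is run through regsubsets, the C_p-minimizing submodel is refit, and 0s are added for omitted variables. This is an illustrative loop using the simulated data from earlier, not the talk's actual code.

fs_cp_fit <- function(u, y) {
  s <- summary(regsubsets(x = u, y = y, method = "forward", nvmax = ncol(u)))
  keep <- which(s$which[which.min(s$cp), -1])          # selected nontrivial predictors
  b <- numeric(ncol(u) + 1)                            # beta-hat with 0s for omitted variables
  fit <- lm(y ~ u[, keep, drop = FALSE])
  b[c(1, keep + 1)] <- coef(fit)
  b
}

u <- X[, -1]; full <- lm(Y ~ u); res_full <- resid(full); yhat_full <- fitted(full)
B <- 200
betastar <- t(replicate(B, {
  ystar <- yhat_full + sample(res_full, replace = TRUE)   # residual bootstrap response
  fs_cp_fit(u, ystar)
}))
colMeans(betastar)                                        # bootstrap means of beta-hat*_{Imin,0}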

38 Examples and Simulations: Example for PI
The Hebbler (1847) data was collected from n = 26 districts in Prussia in 1843. We will study the relationship between Y = the number of women married to civilians in the district and the predictors
x_1 = constant,
x_2 = pop = the population of the district in 1843,
x_3 = mmen = the number of married civilian men in the district,
x_4 = mmilmen = the number of married men in the military in the district, and
x_5 = milwmn = the number of women married to husbands in the military in the district.

39 Examples and Simulations: Example for PI
Sometimes the person conducting the survey would not count a spouse if the spouse was not at home. Hence Y is highly correlated with but not equal to x_3. Similarly, x_4 and x_5 are highly correlated but not equal. We expect that Y = x_3 + e is a good model, but n/p = 5.2 is small.

40 Examples and Simulations: Example for PI
Forward selection selected the model with the minimum C_p while the other methods used 10-fold CV. PLS and PCR used the OLS full model. Forward selection used a constant and mmen. Ridge regression used all of the nontrivial predictors. Lasso and relaxed lasso used a constant, mmen, and pop. The resulting 90% PI lengths were compared across the methods.

41 Examples and Simulations: Example: Marry Data Response Plots
[Figure 1: response plots of Y versus Ŷ (yhat) for a) Forward Selection, b) Ridge Regression, c) Lasso, and d) Relaxed Lasso.]

42 Examples and Simulations: Example for PI
Figure 1 shows the response plots for forward selection, ridge regression, lasso, and relaxed lasso. The plots for the PLS = PCR = OLS full model were similar to those of forward selection and relaxed lasso. The plots suggest that the MLR model is appropriate since the plotted points scatter about the identity line. The 90% pointwise prediction bands are also shown, and consist of two lines parallel to the identity line. These bands are very narrow in Figure 1 a) and d).

43 Examples and Simulations: Simulations for New PI
For the model Y = Xβ + e, methods such as forward selection, PCR, PLS, ridge regression, relaxed lasso, and lasso each generate M fitted models I_1, ..., I_M, where M depends on the method, and we considered several methods for selecting the final submodel I_d. Only method i) needs n/p large.
i) Let I_d = I_min be the model that minimizes C_p for forward selection, relaxed lasso, or lasso.
ii) Let I_d = I_min be the model that minimizes EBIC for forward selection or relaxed lasso. For forward selection, we used M = min(⌈n/5⌉, p).
iii) Choose I_d using k-fold cross validation (CV). We used 10-fold CV.

44 Examples and Simulations: Simulations for New PI
The simulation used p = 20, 40, n, and 2n. Let k be the number of nonzero coefficients, including the constant; k = 1, 19, and p − 1 were used. The zero mean errors e_i were iid of five types: i) N(0,1) errors, ii) t_3 errors, iii) EXP(1) − 1 errors, iv) uniform(−1, 1) errors, and v) 0.9 N(0,1) + 0.1 N(0,100) errors. The simulation used 5000 runs, so an observed coverage in (0.94, 0.96) gives no reason to doubt that the PI has the nominal coverage of 0.95. Correlations were set up so that cor(x_i, x_j) = (2ψ + (m − 2)ψ²)/(1 + (m − 1)ψ²) for i ≠ j where x_i and x_j are nontrivial predictors. The simulation used ψ = 0, 1/√p, and 0.9.
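As an illustration of the correlation structure above, the sketch below builds the target pairwise correlation directly as a compound-symmetric matrix and draws the nontrivial predictors from a multivariate normal; the talk's actual construction of the predictors may differ, but the pairwise correlation matches the formula. MASS::mvrnorm is used, and make_x is a made-up name.

library(MASS)
make_x <- function(n, m, psi) {
  rho <- (2 * psi + (m - 2) * psi^2) / (1 + (m - 1) * psi^2)  # target cor(x_i, x_j)
  R <- matrix(rho, m, m); diag(R) <- 1
  mvrnorm(n, mu = rep(0, m), Sigma = R)
}

xsim <- make_x(n = 200, m = 19, psi = 0.9)
round(mean(cor(xsim)[upper.tri(cor(xsim))]), 2)   # average pairwise correlation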

45 Examples and Simulations: Simulation Results for New PI
Table: Large Sample 95% PI Coverages and Lengths, e_i ~ N(0, 1). Rows are indexed by the relation of n to p (n < p, n = p, n ≥ 10p), n, p, ψ, and k; for each of FS, lasso, RL, RR, PLS, and PCR the table reports the coverage (Cov) and average length (Len).

46 Examples and Simulations: Simulation Results for New PI
Table (continued): Large Sample 95% PI Coverages and Lengths, e_i ~ N(0, 1), with the same layout as the previous slide.

47 Examples and Simulations: Simulation Results for New PI
The tables show some simulation results for the first new PI, where forward selection used C_p for n ≥ 10p and EBIC for n < 10p. The other methods minimized 10-fold CV. For methods other than ridge regression, the maximum number of variables used was approximately min(⌈n/5⌉, p). For n ≥ 10p, coverages tended to be near or higher than the nominal value of 0.95. The lengths of the asymptotically optimal 95% PIs are i) 3.92 = 2(1.96), ii) 6.365, iii) 2.996, iv) 1.90 = 2(0.95), and v) 13.49.

48 Examples and Simulations: Simulation Results for New PI
The average PI length was often 1.4 to 1.8 times the asymptotically optimal length for n = 10p and close to the optimal length for n = 100p. An exception was that lasso and ridge regression often had far too long lengths for k ≥ 19 and ψ ≥ 0.5. For n ≤ p, good performance needed stronger regularity conditions. PLS tended to have severe undercoverage with small average length. The PCR length was often too long for ψ = 0.

49 Examples and Simulations: Simulation Results for New PI
When there was k = 1 active predictor, forward selection, lasso, and relaxed lasso often performed well. For k = 19, forward selection performed well, as did lasso and relaxed lasso for ψ = 0. For dense models with k = p − 1 and n/p not large, there was often undercoverage. Let d ≥ 1 be the number of active predictors in the selected model. With N(0, 1) errors, ψ = 0, and d < k, an asymptotic population 95% PI has length 3.92 √(k − d + 1). EBIC occasionally had undercoverage, especially for k = 19 or p − 1, which was usually more severe for ψ = 0.9 or 1/√p.

50 Examples and Simulations: Simulations for Bootstrapping Hypothesis Tests
A small simulation was done, using the same type of data as for the prediction interval simulation, with B = max(1000, n, 20p) and 5000 runs. The regression model used β = (1, 1, 0, 0)^T with n = 100 and p = 4. When ψ = 0, the design matrix X consisted of iid N(0,1) random variables, and the full model least squares confidence intervals for β_i should have length near 2 t_{96,0.975} σ/√n ≈ 2(1.96)σ/10 = 0.392σ when the iid zero mean errors have variance σ².

51 Examples and Simulations: Simulations for Bootstrapping Hypothesis Tests
The simulation computed the Frey shorth(c) interval for each β_i and used the prediction region method to test H_0: β_3 = β_4 = 0. The nominal coverage was 0.95 with δ = 0.05. Observed coverage between 0.94 and 0.96 would suggest coverage is close to the nominal value.

52 Examples and Simulations: Simulation Results for Bootstrapping
The regression models used the residual bootstrap on the full model least squares estimator and on the forward selection estimator ˆβ_{I_min,0}. Results are shown for iid errors e_i ~ N(0, 1).
Table: Bootstrapping OLS Regression and Forward Selection. Rows give the model (reg = full model OLS, vs = forward selection) for ψ = 0, 0.5, and 0.9; columns give the coverage (cov) and length (len) for β_1, ..., β_4 and for the test.

53 Examples and Simulations: Simulation Results for Bootstrapping
The cutoff will often be near √(χ²_{r,0.95}) if the statistic T is asymptotically normal. Note that √(χ²_{2,0.95}) = 2.448 is close to 2.45, the cutoff for the full model regression bootstrap test. The coverages were near 0.95 for the regression bootstrap on the full model. Suppose ψ = 0. Then ˆβ_S has the same limiting distribution for I_min and the full model. Note that the average lengths and coverages were similar for the full model and for forward selection I_min for β_1, β_2, and β_S = (β_1, β_2)^T.

54 Examples and Simulations: Simulation Results for Bootstrapping
Table: Bootstrap LASSO, ψ = 0. Rows are indexed by n and the error type; columns give the confidence interval coverage (cicov) and average length (avelen) for β_1, ..., β_4 and for the test.

55 THANK YOU
