Inference After Variable Selection
1 Inference After Variable Selection
Lasanthi Pelawa Watagoda
Department of Mathematics, SIU Carbondale
June 12, 2017
2 Outline
1 Introduction
2 Inference For Ridge and Lasso
3 Variable Selection
  3.1 Criteria for final model selection
  3.2 R functions for variable selection
  3.3 New OLS Sub Model Theorem
4 Prediction Intervals
  4.1 Shorth PI
  4.2 Two new prediction intervals
5 Bootstrapping Hypothesis Tests
  5.1 Bootstrap a Confidence Region
  5.2 Prediction Region Method
6 Examples and Simulations
2/53 Lasanthi Pelawa Watagoda SIU Carbondale
3 The Multiple Linear Regression model
Suppose that the response variable Y_i and at least one predictor variable x_{i,j} are quantitative, with x_{i,1} ≡ 1. Let x_i^T = (x_{i,1}, ..., x_{i,p}) = (1, u_i^T) and β = (β_1, ..., β_p)^T, where β_1 corresponds to the intercept. The multiple linear regression (MLR) model is

  Y_i = β_1 + x_{i,2} β_2 + ... + x_{i,p} β_p + e_i = x_i^T β + e_i    (1)

for i = 1, ..., n. This model is also called the full model. Here n is the sample size and the random variable e_i is the ith error.
4 The Multiple Linear Regression model: Matrix Notation
In matrix notation, these n equations become

  Y = Xβ + e,    (2)

where
  Y — n × 1 vector of dependent variables
  X — n × p matrix of predictors
  β — p × 1 vector of unknown coefficients
  e — n × 1 vector of unknown errors
  β̂ — estimator of β
  Ŷ_i = x_i^T β̂ — ith fitted value
  r_i = Y_i − Ŷ_i — ith residual
Ordinary least squares (OLS) is often used for inference when n/p is large.
5 The Multiple Linear Regression model: More Notation
The centered response is Z = Y − Ȳ, where Ȳ = Ȳ 1. Let W = (W_{ij}) be the n × (p − 1) matrix of standardized nontrivial predictors. For j = 1, ..., p − 1, let W_{ij} denote the (j + 1)th variable standardized so that Σ_{i=1}^n W_{ij} = 0 and Σ_{i=1}^n W_{ij}^2 = n. Hence

  W_{ij} = (x_{i,j+1} − x̄_{j+1}) / σ̂_{j+1}, where σ̂_{j+1}^2 = (1/n) Σ_{i=1}^n (x_{i,j+1} − x̄_{j+1})^2.

The sample correlation matrix of the nontrivial predictors u_i is R_u = W^T W / n.
6 The Multiple Linear Regression model: More Notation
Then regression through the origin is used for the model

  Z = Wη + e,    (3)

where the vector of fitted values is Ŷ = Ȳ + Ẑ.
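The standardization above is easy to sketch in code. A minimal pure-Python illustration (not from the source; the function name and toy data are assumptions) that checks the two defining constraints on each column of W:

```python
# Sketch: standardize nontrivial predictors as on the slides, so that each
# column of W has sum 0 and sum of squares n (divisor n, not n - 1).
import math

def standardize(cols):
    """cols: list of predictor columns (lists of floats).
    Returns the columns of the n x (p-1) matrix W."""
    n = len(cols[0])
    W = []
    for x in cols:
        xbar = sum(x) / n
        sigma = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / n)
        W.append([(xi - xbar) / sigma for xi in x])
    return W

x2 = [1.0, 2.0, 3.0, 4.0]                       # one nontrivial predictor
W = standardize([x2])
print(round(sum(W[0]), 10))                     # 0.0
print(round(sum(w * w for w in W[0]), 10))      # 4.0 (= n)
```

With a single standardized column, W^T W / n is 1, the 1 × 1 sample correlation matrix, matching R_u = W^T W / n.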
7 The Multiple Linear Regression model: Alternative Methods For Estimating β

  Method                           Criterion
  Forward Selection with OLS       Minimum C_p and EBIC
  Principal Component Regression   10-fold cross validation (CV)
  Partial Least Squares            10-fold CV
  Ridge Regression                 10-fold CV
  Lasso                            10-fold CV
  Relaxed Lasso                    10-fold CV
8 The Multiple Linear Regression model: Alternative Methods For Estimating β
All of these methods produce M models; the number of models M depends on the method. One of the models is the full model (1) that uses all p − 1 nontrivial predictors. The full model is (approximately) fit with ordinary least squares (OLS). Some of the methods use β̂ = 0 and fit the model Y_i = β_1 + e_i.
9 The Multiple Linear Regression model: Alternative Methods For Estimating β
Lasso and ridge regression have a tuning parameter λ. When λ = 0, the full OLS model is used. These methods also use a maximum value λ_M of λ and a grid of M λ values λ_1 = 0 < λ_2 < ... < λ_{M−1} < λ_M. For lasso, λ_M is the smallest value of λ such that β̂_{λ_M} = 0; hence β̂_{λ_i} ≠ 0 for i < M. For forward selection, PCR, and PLS, M ≤ p.
10 Minimizing Criterion for Ridge and Lasso
Consider choosing η̂ to minimize the criterion

  Q(η) = (1/a)(Z − Wη)^T (Z − Wη) + (λ_{1,n}/a) Σ_{i=1}^{p−1} |η_i|^j,    (4)

where λ_{1,n} ≥ 0, a > 0, and j > 0 are known constants. Here j = 2 corresponds to ridge regression, j = 1 corresponds to lasso, and a = 1, 2, n, and 2n are common. The residual sum of squares is RSS(η) = (Z − Wη)^T (Z − Wη), and λ_{1,n} = 0 corresponds to the OLS estimator η̂_OLS = (W^T W)^{−1} W^T Z.
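To make criterion (4) concrete, here is a hedged Python sketch for the single-predictor case (p − 1 = 1), where η is a scalar. The toy data and grid search are illustrative assumptions, not the author's computations:

```python
# Criterion (4) with one standardized predictor: j = 2 gives ridge, j = 1 lasso.
def Q(eta, Z, W, lam, a=1.0, j=2):
    rss = sum((z - w * eta) ** 2 for z, w in zip(Z, W))
    return rss / a + (lam / a) * abs(eta) ** j

Z = [-1.0, 0.0, 1.0]
W = [-1.0, 0.0, 1.0]
eta_ols = sum(w * z for w, z in zip(W, Z)) / sum(w * w for w in W)  # = 1.0

# With lam = 0 the criterion is minimized at the OLS slope:
assert Q(eta_ols, Z, W, lam=0) <= min(Q(e / 10, Z, W, lam=0) for e in range(-30, 31))

# A positive penalty shrinks the minimizer toward 0 (here exactly to 0.5):
grid = [e / 100 for e in range(-200, 201)]
eta_ridge = min(grid, key=lambda e: Q(e, Z, W, lam=2.0, j=2))
print(eta_ols, eta_ridge)
```

For this toy data the ridge minimizer solves 2(1 − η) = 2η after differentiating, giving η = 0.5, visibly closer to 0 than the OLS slope of 1.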
11 Limiting Distribution For Ridge Regression
Assume p is fixed. Knight and Fu (2000) prove i) that η̂ is a consistent estimator of η if λ_{1,n} = o(n); ii) that η̂_OLS and η̂ are asymptotically equivalent if λ_{1,n} → ∞ too slowly as n → ∞; and iii) that η̂ is a √n consistent estimator of η if λ_{1,n} = O(√n). If λ_{1,n}/√n → τ ≥ 0, then

  √n (η̂_RR − η) →D N_{p−1}(−τ V η, σ^2 V), where R_u = W^T W / n →P V^{−1}    (5)

as n → ∞. If τ = 0, then OLS and ridge regression have the same limiting distribution.
12 Limiting Distribution For Ridge Regression
Note that V^{−1} = ρ_u if the u_i are a random sample from a population with a nonsingular population correlation matrix ρ_u. Under (5), if λ_{1,n}/n → 0 then

  (W^T W + λ_{1,n} I_{p−1}) / n →P V^{−1}, and n (W^T W + λ_{1,n} I_{p−1})^{−1} →P V.
13 Limiting Distribution For Ridge Regression
The following identity from Gunst and Mason (1980, p. 342) is useful for ridge regression inference:

  η̂_RR = (W^T W + λ_{1,n} I_{p−1})^{−1} W^T Z = (W^T W + λ_{1,n} I_{p−1})^{−1} W^T W η̂_OLS = A_n η̂_OLS
        = [I_{p−1} − λ_{1,n} (W^T W + λ_{1,n} I_{p−1})^{−1}] η̂_OLS = B_n η̂_OLS,

since A_n − B_n = 0. If λ_{1,n}/√n → τ ≥ 0, then

  √n (η̂_RR − η) = √n (η̂_RR − η̂_OLS + η̂_OLS − η)
    = √n (η̂_OLS − η) − (λ_{1,n}/√n) n (W^T W + λ_{1,n} I_{p−1})^{−1} η̂_OLS
    →D N_{p−1}(0, σ^2 V) − τ V η ∼ N_{p−1}(−τ V η, σ^2 V).
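The Gunst and Mason identity can be checked numerically. A small Python/NumPy sketch (not from the source; the dimensions, seed, and λ value are arbitrary assumptions):

```python
# Numerical check of eta_RR = B_n eta_OLS with
# B_n = I - lam (W^T W + lam I)^{-1}.
import numpy as np

rng = np.random.default_rng(0)
n, q, lam = 50, 3, 2.5                       # q = p - 1 nontrivial predictors
W = rng.standard_normal((n, q))
Z = rng.standard_normal(n)

G = W.T @ W
eta_ols = np.linalg.solve(G, W.T @ Z)
eta_rr = np.linalg.solve(G + lam * np.eye(q), W.T @ Z)
B_n = np.eye(q) - lam * np.linalg.inv(G + lam * np.eye(q))
assert np.allclose(eta_rr, B_n @ eta_ols)    # the identity holds
print(eta_ols, eta_rr)                       # ridge shrinks the OLS slopes
```

The identity holds exactly because G, G^{−1}, and (G + λI)^{−1} all commute, being functions of the same matrix G.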
14 Limiting Distribution For Lasso
Theorem: Let η̂_L be the lasso estimator. Then

  √n (η̂_L − η) →D N_{p−1}(−(τ/2) V s, σ^2 V),

where R_u = W^T W / n →P V^{−1} as n → ∞. If τ = 0, then OLS and lasso have the same limiting distribution.
15 Limiting Distribution For Lasso: New Proof
Proof. The following identity from Efron and Hastie (2016, p. 308), for example, is useful for inference for the lasso estimator η̂_L:

  W^T (Z − W η̂_L) = (λ_{1,n}/2) s_n,

where s_{in} ∈ [−1, 1] and s_{in} = sign(η̂_{i,L}) if η̂_{i,L} ≠ 0. Here sign(η_i) = 1 if η_i > 0 and sign(η_i) = −1 if η_i < 0. Note that s_n = s_{n,η̂_L} depends on η̂_L. Thus

  η̂_L = (W^T W)^{−1} W^T Z − n (W^T W)^{−1} (λ_{1,n}/(2n)) s_n.
16 Limiting Distribution For Lasso: New Proof (continued)
Proof. If λ_{1,n}/√n → τ ≥ 0 and s_n →P s = s_η, then

  √n (η̂_L − η) = √n (η̂_L − η̂_OLS + η̂_OLS − η)
    = √n (η̂_OLS − η) − (λ_{1,n}/(2√n)) n (W^T W)^{−1} s_n
    →D N_{p−1}(0, σ^2 V) − (τ/2) V s ∼ N_{p−1}(−(τ/2) V s, σ^2 V).
17 Variable Selection
Definition. Following Olive and Hawkins (2005), a model for variable selection can be described by

  x^T β = x_S^T β_S + x_E^T β_E = x_S^T β_S,    (6)

where x = (x_S^T, x_E^T)^T, x_S is a k_S × 1 vector, and x_E is a (p − k_S) × 1 vector. Given that x_S is in the model, β_E = 0, and E denotes the subset of terms that can be eliminated given that the subset S is in the model.
18 M Submodels using Forward Selection
Forward selection forms a sequence of submodels I_1, ..., I_M where I_j uses j predictors including the constant. Let I_1 use x*_1 = x_1 ≡ 1: the model has a constant but no nontrivial predictors. To form I_2, consider all models I with two predictors including x*_1. Compute

  Q_2(I) = SSE(I) = RSS(I) = r(I)^T r(I) = Σ_{i=1}^n r_i^2(I) = Σ_{i=1}^n (Y_i − Ŷ_i(I))^2.

Let I_2 minimize Q_2(I) over the p − 1 models I that contain x*_1 and one other predictor.
19 M Submodels using Forward Selection
Denote the predictors in I_2 by x*_1, x*_2. In general, to form I_j, consider all models I with j predictors including x*_1, ..., x*_{j−1}. Compute

  Q_j(I) = r(I)^T r(I) = Σ_{i=1}^n r_i^2(I) = Σ_{i=1}^n (Y_i − Ŷ_i(I))^2.

Let I_j minimize Q_j(I) over the p − j + 1 models I that contain x*_1, ..., x*_{j−1} and one other predictor not already selected. Denote the predictors in I_j by x*_1, ..., x*_j. Continue in this manner for j = 2, ..., M. Often M = min(⌈n/J⌉, p) for some integer J such as J = 5, 10, or ...
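The forward selection steps above can be sketched as follows. This is an illustrative Python/NumPy implementation, not the regsubsets code; the function names and simulated data are assumptions:

```python
# At step j, add the predictor that most reduces RSS(I) = r(I)^T r(I).
import numpy as np

def rss(X, Y, cols):
    XI = X[:, cols]
    beta, *_ = np.linalg.lstsq(XI, Y, rcond=None)
    r = Y - XI @ beta
    return float(r @ r)

def forward_selection(X, Y, M):
    selected = [0]                        # column 0 is the constant x*_1 = 1
    models = [list(selected)]             # I_1
    while len(selected) < M:
        rest = [j for j in range(X.shape[1]) if j not in selected]
        best = min(rest, key=lambda j: rss(X, Y, selected + [j]))
        selected.append(best)
        models.append(list(selected))
    return models                         # I_1, ..., I_M

rng = np.random.default_rng(1)
n = 100
X = np.hstack([np.ones((n, 1)), rng.standard_normal((n, 4))])
Y = X[:, 1] * 3.0 + rng.standard_normal(n)    # only predictor 1 is active
models = forward_selection(X, Y, M=3)
print(models[1])    # the active predictor should enter first: [0, 1]
```

Each step refits OLS on every candidate set, so the cost is one least squares fit per remaining predictor per step.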
20 Criteria for final model selection
When there is a sequence of M submodels, the final submodel I_d needs to be selected. Let x_I and β̂_I be a × 1; hence the candidate model contains a terms, including a constant. Suppose the e_i are independent and identically distributed (iid) with variance V(e_i) = σ^2. Then there are many criteria used to select the final submodel I_d. A simple method is to take the model that uses d = M = min(⌈n/J⌉, p) variables. If p is fixed, this method will use the full OLS model once n/J ≥ p.
21 Criteria for final model selection: Criterion when there is a good estimator for σ^2
Let criteria C_S(I) have the form

  C_S(I) = SSE(I) + a K_n σ̂^2.

These criteria need a good estimator of σ^2. The criterion C_p(I) = AIC_S(I) uses K_n = 2, while the BIC_S(I) criterion uses K_n = log(n). Typically σ̂^2 is the full OLS model MSE = Σ_{i=1}^n r_i^2 / (n − p) when n/p is large. Then σ̂^2 = MSE is a √n consistent estimator of σ^2 under mild conditions by Su and Cook (2012).
22 Criteria for final model selection: EBIC
The EBIC criterion given in Luo and Chen (2012) may work when n/p is not large. Let 0 ≤ γ ≤ 1 and |I| = a ≤ min(n, p) if β̂_I is a × 1. We may use a ≤ min(n/5, p). Then

  EBIC(I) = n log(SSE(I)/n) + a log(n) + 2γ log[(p choose a)] = BIC(I) + 2γ log[(p choose a)].

This criterion can give good results if p = p_n = O(n^k) and γ > 1 − 1/(2k). Hence we will use γ = 1. The above criteria can be applied to forward selection and relaxed lasso. The C_p criterion can also be applied to lasso.
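The C_p and EBIC formulas translate directly into code. A minimal Python sketch; the input values are hypothetical, and in practice SSE(I) and σ̂² would come from actual fits:

```python
# C_S(I) with K_n = 2 gives C_p; EBIC adds the extended combinatorial penalty.
import math

def Cp(sse_I, a, sigma2_hat):
    return sse_I + 2 * a * sigma2_hat          # SSE(I) + a K_n sigma^2 hat

def EBIC(sse_I, a, n, p, gamma=1.0):
    return (n * math.log(sse_I / n) + a * math.log(n)
            + 2 * gamma * math.log(math.comb(p, a)))

# Hypothetical submodel: n = 100, p = 10, a = 3 terms, SSE(I) = 50.
print(Cp(50.0, 3, 1.0))                        # 56.0
print(round(EBIC(50.0, 3, 100, 10), 2))        # -45.92
```

Note how the 2γ log[(p choose a)] term grows quickly in a, so EBIC penalizes larger submodels much more heavily than BIC when p is large relative to n.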
23 R functions for variable selection

  Method             Criterion      R function   Library
  Forward Selection  Minimum C_p    regsubsets   leaps
  PCR                10-fold CV     pcr          pls
  PLS                10-fold CV     plsr         pls
  RR                 10-fold CV     cv.glmnet    glmnet
  Lasso (Relaxed)    10-fold CV     cv.glmnet    glmnet

Table: Methods used for Variable Selection
24 New OLS Sub Model Theorem
Theorem. Suppose the usual linear model Y = Xβ + e holds, with E(Y) = Xβ, E(e) = 0, and Cov(Y) = Cov(e) = σ^2 I. Partition X and β as

  X = [X_I X_0], β = (β_I^T, β_0^T)^T, so that Xβ = X_I β_I + X_0 β_0.

If β̂_I = (X_I^T X_I)^{−1} X_I^T Y = A Y, then

  E(β̂_I) = β_I + (X_I^T X_I)^{−1} X_I^T X_0 β_0 and Cov(β̂_I) = σ^2 (X_I^T X_I)^{−1}.
25 New OLS Sub Model Theorem: Proof
Proof. Assume I is an arbitrary submodel. When the submodel contains the set of predictors S, then β̂_I works well and estimates β_I, but if I does not contain S then

  E(β̂_I) = E((X_I^T X_I)^{−1} X_I^T Y) = E(AY) = A E(Y) = A X β = A (X_I β_I + X_0 β_0)
    = (X_I^T X_I)^{−1} X_I^T X_I β_I + (X_I^T X_I)^{−1} X_I^T X_0 β_0
    = β_I + (X_I^T X_I)^{−1} X_I^T X_0 β_0.

When β_0 = 0, E(β̂_I) = β_I. This result shows why OLS does not work well if the submodel does not contain enough predictors.
26 New OLS Sub Model Theorem: Proof (continued)
Proof. Now consider Cov(β̂_I):

  Cov(β̂_I) = Cov(AY) = A Cov(Y) A^T = A σ^2 I A^T = σ^2 A A^T
    = σ^2 (X_I^T X_I)^{−1} X_I^T X_I (X_I^T X_I)^{−1} = σ^2 (X_I^T X_I)^{−1}.
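The theorem's bias formula can be verified exactly in the noiseless case e = 0, where β̂_I = β_I + (X_I^T X_I)^{−1} X_I^T X_0 β_0 holds with no randomness at all. A Python/NumPy sketch (the dimensions and coefficient values are arbitrary illustrative choices):

```python
# Noiseless check: Y = X_I beta_I + X_0 beta_0, so the submodel OLS fit
# recovers beta_I plus exactly the omitted-variable bias term.
import numpy as np

rng = np.random.default_rng(2)
n = 30
X_I = np.hstack([np.ones((n, 1)), rng.standard_normal((n, 2))])
X_0 = rng.standard_normal((n, 2))               # omitted predictors
beta_I = np.array([1.0, 2.0, -1.0])
beta_0 = np.array([0.5, -0.5])
Y = X_I @ beta_I + X_0 @ beta_0                 # full model with e = 0

A = np.linalg.inv(X_I.T @ X_I) @ X_I.T
beta_I_hat = A @ Y
bias = np.linalg.inv(X_I.T @ X_I) @ X_I.T @ X_0 @ beta_0
assert np.allclose(beta_I_hat, beta_I + bias)
print(beta_I_hat - beta_I)                      # the omitted-variable bias
```

Setting beta_0 to zeros makes the bias vanish, matching the unbiasedness claim when the submodel contains S.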
27 Prediction Intervals
Definition. Consider predicting a future test response variable Y_f given a p × 1 vector of predictors x_f and training data (x_1, Y_1), ..., (x_n, Y_n). A large sample 100(1 − δ)% prediction interval (PI) has the form [L̂_n, Û_n], where P(L̂_n ≤ Y_f ≤ Û_n) → 1 − δ as the sample size n → ∞.
28 Prediction Intervals: Shorth PI
The shorth(c) estimator is useful for making prediction intervals. Let Z_(1), ..., Z_(n) be the order statistics of Z_1, ..., Z_n. Then let the shortest closed interval containing at least c of the Z_i be

  shorth(c) = [Z_(s), Z_(s+c−1)].    (7)

Let

  k_n = ⌈n(1 − δ)⌉.    (8)

Frey (2013) showed that for large nδ and iid data, the shorth(k_n) PI has maximum undercoverage 1.12√(δ/n), and used the shorth(c) estimator as the large sample 100(1 − δ)% PI where

  c = min(n, ⌈n[1 − δ + 1.12√(δ/n)]⌉).    (9)
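A minimal pure-Python sketch of the shorth(c) interval (7) and Frey's choice of c in (9); the function names and toy data are assumptions:

```python
# shorth(c): the shortest closed interval [Z_(s), Z_(s+c-1)] covering at
# least c of the order statistics; frey_c implements (9).
import math

def shorth(z, c):
    zs = sorted(z)
    s = min(range(len(zs) - c + 1), key=lambda i: zs[i + c - 1] - zs[i])
    return zs[s], zs[s + c - 1]

def frey_c(n, delta):
    return min(n, math.ceil(n * (1 - delta + 1.12 * math.sqrt(delta / n))))

z = [0.0, 0.1, 0.2, 0.3, 5.0]
print(shorth(z, 4))          # (0.0, 0.3): the outlier at 5.0 is excluded
print(frey_c(100, 0.05))     # 98
```

The example shows why the shorth is attractive for possibly skewed residuals: it slides the window to the densest part of the data rather than trimming both tails symmetrically.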
29 Two new prediction intervals: The First New Prediction Interval
Let

  q_n = min(1 − δ + 0.05, 1 − δ + d/n) for δ > 0.1, and
  q_n = min(1 − δ/2, 1 − δ + 10δd/n), otherwise.    (10)

If 1 − δ < 0.999 and q_n < 1 − δ + 0.001, set q_n = 1 − δ. Let

  c = ⌈n q_n⌉,    (11)

and let

  b_n = (1 + 15/n) √((n + 2d)/(n − d)) if d ≤ 8n/9, and b_n = 5(1 + 15/n), otherwise.    (12)

Compute the shorth(c) of the residuals = [r_(s), r_(s+c−1)] = [ξ̃_{δ1}, ξ̃_{1−δ2}]. Then the first new 100(1 − δ)% large sample PI for Y_f is

  [m̂(x_f) + b_n ξ̃_{δ1}, m̂(x_f) + b_n ξ̃_{1−δ2}].    (13)
30 Two new prediction intervals: The Second New Prediction Interval (Validation PI)
The second new PI randomly divides the data into two half sets H and V, where H has n_H = ⌈n/2⌉ of the cases and V has the remaining n_V = n − n_H cases i_1, ..., i_{n_V}. The estimator m̂_H(x) is computed using the training data set H. Then the validation residuals v_j = Y_{i_j} − m̂_H(x_{i_j}) are computed for the j = 1, ..., n_V cases in the validation set V. Find the Frey PI [v_(s), v_(s+c−1)] of the validation residuals. Then the second new 100(1 − δ)% large sample PI for Y_f is

  [m̂_H(x_f) + v_(s), m̂_H(x_f) + v_(s+c−1)].    (14)
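The validation PI is easy to sketch for the simplest case, the location model m(x) = µ, where m̂_H is the training-half mean. Everything here (the model choice, seed, and sample sizes) is an illustrative assumption, not the author's simulation code:

```python
# Validation PI (14): fit on half H, take the Frey shorth of the validation
# residuals from half V, and shift by the fitted value.
import math
import random

def shorth(z, c):
    zs = sorted(z)
    s = min(range(len(zs) - c + 1), key=lambda i: zs[i + c - 1] - zs[i])
    return zs[s], zs[s + c - 1]

def frey_c(n, delta):
    return min(n, math.ceil(n * (1 - delta + 1.12 * math.sqrt(delta / n))))

random.seed(3)
Y = [random.gauss(10.0, 1.0) for _ in range(200)]
H, V = Y[:100], Y[100:]                 # the generated data is already iid,
m_hat = sum(H) / len(H)                 # so consecutive halves act as a random split
v = [y - m_hat for y in V]              # validation residuals
lo, hi = shorth(v, frey_c(len(V), 0.05))
PI = (m_hat + lo, m_hat + hi)
print(PI)                               # should bracket most future Y values
```

Because m̂_H never sees V, the validation residuals behave like genuine out-of-sample errors, which is what lets this PI avoid the correction factor b_n used by the first PI.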
31 Bootstrapping Hypothesis Tests: Bootstrap a Confidence Region
To bootstrap a confidence region, Mahalanobis distances and prediction regions will be useful. Consider predicting a future test value z_f, given past training data z_1, ..., z_n where the z_i are r × 1 random vectors. A large sample 100(1 − δ)% prediction region is a set A_n such that P(z_f ∈ A_n) → 1 − δ as n → ∞. Let the r × 1 column vector T be a multivariate location estimator, and let the r × r symmetric positive definite matrix C be a dispersion estimator. Then the ith squared sample Mahalanobis distance is

  D_i^2 = D_i^2(T, C) = D_{z_i}^2(T, C) = (z_i − T)^T C^{−1} (z_i − T)    (15)

for each observation z_i.
32 Bootstrapping Hypothesis Tests: Bootstrap a Confidence Region
Notice that the Euclidean distance of z_i from the estimate of center T is D_i(T, I_r), where I_r is the r × r identity matrix. The classical Mahalanobis distance D_i uses (T, C) = (z̄, S), the sample mean and sample covariance matrix, where

  z̄ = (1/n) Σ_{i=1}^n z_i and S = (1/(n − 1)) Σ_{i=1}^n (z_i − z̄)(z_i − z̄)^T.    (16)
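The classical distances in (15) and (16) can be computed as follows. This is a Python/NumPy sketch with simulated data; the identity Σ D_i² = (n − 1)r, which follows from Σ (z_i − z̄)(z_i − z̄)^T = (n − 1)S, makes a handy sanity check:

```python
# Classical squared Mahalanobis distances with (T, C) = (zbar, S).
import numpy as np

def mahalanobis_sq(Z):
    zbar = Z.mean(axis=0)
    S = np.cov(Z, rowvar=False)          # divisor n - 1, matching (16)
    Sinv = np.linalg.inv(S)
    d = Z - zbar
    return np.einsum('ij,jk,ik->i', d, Sinv, d)

rng = np.random.default_rng(4)
Z = rng.standard_normal((500, 2))        # n = 500 cases, r = 2
D2 = mahalanobis_sq(Z)
print(D2.shape)                          # (500,)
print(round(float(D2.sum()), 6))         # (n - 1) r = 998 by the trace identity
```

Replacing Sinv with the identity matrix in the quadratic form recovers the squared Euclidean distances D_i²(z̄, I_r).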
33 Bootstrapping Hypothesis Tests: large sample 100(1 − δ)% nonparametric prediction region
Let (T, C) = (z̄, S), and let D_(U_n) be the 100 q_n-th sample quantile of the D_i. Then the Olive (2013) large sample 100(1 − δ)% nonparametric prediction region for a future value z_f given iid data z_1, ..., z_n is

  {z : D_z^2(z̄, S) ≤ D_(U_n)^2},    (17)

while the classical large sample 100(1 − δ)% prediction region is

  {z : D_z^2(z̄, S) ≤ χ_{r,1−δ}^2}.    (18)
34 Bootstrapping Hypothesis Tests: Prediction Region Method
The Olive (2017abce) prediction region method obtains a confidence region for µ by applying the nonparametric prediction region to the bootstrap sample T*_1, ..., T*_B. Let T̄* and S*_T be the sample mean and sample covariance matrix of the bootstrap sample. The prediction region method large sample 100(1 − δ)% confidence region for µ is

  {w : D_w^2(T̄*, S*_T) ≤ D_(U_B)^2},    (19)

where D_(U_B)^2 is computed from D_i^2 = (T*_i − T̄*)^T [S*_T]^{−1} (T*_i − T̄*) for i = 1, ..., B. Note that the corresponding test for H_0: µ = µ_0 rejects H_0 if (T̄* − µ_0)^T [S*_T]^{−1} (T̄* − µ_0) > D_(U_B)^2.
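A hedged sketch of the prediction region method, taking T = the sample mean of iid bivariate data as an illustrative statistic. The seed, B, and the cutoff are simplified assumptions: the slides' D_(U_B)² cutoff is approximated here by the plain 1 − δ sample quantile of the D_i², without any small-sample correction:

```python
# Prediction region method (19): apply the nonparametric prediction region
# to the bootstrap sample T*_1, ..., T*_B of a statistic (here, the mean).
import numpy as np

rng = np.random.default_rng(5)
n, r, B, delta = 200, 2, 1000, 0.05
x = rng.standard_normal((n, r)) + np.array([1.0, -1.0])   # true mu = (1, -1)

T_star = np.array([x[rng.integers(0, n, n)].mean(axis=0) for _ in range(B)])
Tbar = T_star.mean(axis=0)
S_T = np.cov(T_star, rowvar=False)
Sinv = np.linalg.inv(S_T)
d = T_star - Tbar
D2 = np.einsum('ij,jk,ik->i', d, Sinv, d)
cutoff = np.quantile(D2, 1 - delta)       # stands in for D^2_(U_B)

def in_region(w):
    dev = w - Tbar
    return dev @ Sinv @ dev <= cutoff

print(in_region(np.array([1.0, -1.0])))   # true mu is typically inside
print(in_region(np.array([5.0, 5.0])))    # a far-off mu_0 is rejected
```

The test of H_0: µ = µ_0 is then just a membership check: reject when µ_0 falls outside the region.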
35 Bootstrapping Hypothesis Tests: Prediction Region Method for Bootstrapping FS
Following Olive (2017bce), we describe the prediction region method for bootstrapping forward selection, where 0s need to be added for omitted variables. Suppose n > 20p. If β̂_I is k × 1, form β̂_{I,0} from β̂_I by adding 0s corresponding to the omitted variables. Then β̂_{I,0} is a nonlinear estimator of β, and the residual bootstrap method can be applied.
36 Bootstrapping Hypothesis Tests: Prediction Region Method for Bootstrapping FS
Let β̂ = β̂_{I_min,0} be formed from the forward selection model I_min that minimizes the C_p criterion. Instead of computing the least squares estimator from regressing Y*_j on X, perform variable selection on Y*_j and X, fit the model that minimizes the criterion, and add 0s corresponding to the omitted variables, resulting in estimators β̂*_1, ..., β̂*_B. Then test µ = Aβ = c using the prediction region method for r > 1 and the shorth if r = 1. Bootstrapping lasso and ridge regression is similar, but lasso already has the 0s while ridge regression usually does not produce 0 slope estimates.
37 Bootstrapping Hypothesis Tests: Residual Bootstrap
For the MLR model with n > 20p, the residual bootstrap makes sense: the residuals from the full model are sampled with replacement, resulting in a bootstrap sample r*_1, ..., r*_n. The Y*_i = Ŷ_i + r*_i for i = 1, ..., n are collected into a vector Y*_j, which is regressed on X to get β̂*_j for j = 1, ..., B.
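The residual bootstrap described above can be sketched in a few lines of Python/NumPy. The simulated design and the choice B = 200 are arbitrary illustrative assumptions:

```python
# Residual bootstrap for the full OLS model: resample full-model residuals,
# form Y* = Yhat + r*, and refit to get beta*_j for j = 1, ..., B.
import numpy as np

rng = np.random.default_rng(6)
n, B = 100, 200
X = np.hstack([np.ones((n, 1)), rng.standard_normal((n, 2))])
beta = np.array([1.0, 2.0, 0.0])
Y = X @ beta + rng.standard_normal(n)

H = np.linalg.inv(X.T @ X) @ X.T           # OLS "hat" operator for beta
beta_hat = H @ Y
Yhat = X @ beta_hat
r = Y - Yhat

beta_stars = []
for _ in range(B):
    r_star = r[rng.integers(0, n, n)]      # residuals sampled with replacement
    beta_stars.append(H @ (Yhat + r_star))
beta_stars = np.array(beta_stars)
print(beta_stars.shape)                    # (200, 3)
```

For the forward selection version on the previous slide, the line computing H @ (Yhat + r_star) would be replaced by a full variable-selection fit on each Y*, with 0s padded in for omitted variables.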
38 Examples and Simulations: Example for PI
The Hebbler (1847) data was collected from n = 26 districts in Prussia. We will study the relationship between Y = the number of women married to civilians in the district and the predictors
  x_1 = constant
  x_2 = pop = the population of the district in 1843
  x_3 = mmen = the number of married civilian men in the district
  x_4 = mmilmen = the number of married men in the military in the district
  x_5 = milwmn = the number of women married to husbands in the military in the district
39 Examples and Simulations: Example for PI
Sometimes the person conducting the survey would not count a spouse if the spouse was not at home. Hence Y is highly correlated with, but not equal to, x_3. Similarly, x_4 and x_5 are highly correlated but not equal. We expect that Y = x_3 + e is a good model, but n/p = 5.2 is small.
40 Examples and Simulations: Example for PI
Forward selection selected the model with the minimum C_p, while the other methods used 10-fold CV. PLS and PCR used the OLS full model, with PI length ... Forward selection used a constant and mmen, with PI length ... Ridge regression had PI length ... Lasso and relaxed lasso used a constant, mmen, and pop, with lasso PI length ... and relaxed lasso PI length ...
41 Examples and Simulations: Example: Marry Data Response Plots
[Figure 1: response plots of Y versus Ŷ for a) Forward Selection, b) Ridge Regression, c) Lasso, and d) Relaxed Lasso.]
42 Examples and Simulations: Example for PI
Figure 1 shows the response plots for forward selection, ridge regression, lasso, and relaxed lasso. The plots for PLS = PCR = OLS full model were similar to those of forward selection and relaxed lasso. The plots suggest that the MLR model is appropriate since the plotted points scatter about the identity line. The 90% pointwise prediction bands are also shown, and consist of two lines parallel to the identity line. These bands are very narrow in Figure 1 a) and d).
43 Examples and Simulations: Simulations for New PI
For the model Y = Xβ + e, methods such as forward selection, PCR, PLS, ridge regression, relaxed lasso, and lasso each generate M fitted models I_1, ..., I_M, where M depends on the method, and we considered several methods for selecting the final submodel I_d. Only method i) needs n/p large.
i) Let I_d = I_min be the model that minimizes C_p for forward selection, relaxed lasso, or lasso.
ii) Let I_d = I_min be the model that minimizes EBIC for forward selection or relaxed lasso. For forward selection, we used M = min(⌈n/5⌉, p).
iii) Choose I_d using k-fold cross validation (CV). We used 10-fold CV.
44 Examples and Simulations: Simulations for New PI
The simulation used p = 20, 40, n, and 2n. Let k be the number of nonzero coefficients, including the constant; k = 1, 19, and p − 1. The zero mean errors e_i were iid of five types: i) N(0, 1) errors, ii) t_3 errors, iii) EXP(1) − 1 errors, iv) uniform(−1, 1) errors, and v) 0.9 N(0, 1) + 0.1 N(0, 100) errors. The simulation used 5000 runs, so an observed coverage in (0.94, 0.96) gives no reason to doubt that the PI has the nominal coverage of 0.95. Correlations were set up so that cor(x_i, x_j) = (2ψ + (m − 2)ψ^2)/(1 + (m − 1)ψ^2) for i ≠ j, where x_i and x_j are nontrivial predictors. The simulation used ψ = 0, 0.9, and 1/√p.
45 Examples and Simulations: Simulation Results for New PI
Table: Large Sample 95% PI Coverages and Lengths, e_i ∼ N(0, 1). Columns: n, p, ψ, k, and coverage (Cov) and length (Len) for FS, lasso, RL, RR, PLS, and PCR, for cases with n < p, n = p, and n ≥ 10p. [The numerical table entries were not recovered.]
46 Examples and Simulations: Simulation Results for New PI
Table (continued): Large Sample 95% PI Coverages and Lengths, e_i ∼ N(0, 1). [The numerical table entries were not recovered.]
47 Examples and Simulations: Simulation Results for New PI
The tables show some simulation results for the first new PI, where forward selection used C_p for n ≥ 10p and EBIC for n < 10p. The other methods minimized 10-fold CV. For methods other than ridge regression, the maximum number of variables used was approximately min(⌈n/5⌉, p). For n ≥ 10p, coverages tended to be near or higher than the nominal value of 0.95. The lengths of the asymptotically optimal 95% PIs are i) 3.92 = 2(1.96), ii) 6.365, iii) 2.996, iv) 1.90 = 2(0.95), and v) ...
48 Examples and Simulations: Simulation Results for New PI
The average PI length was often 1.4 to 1.8 times the asymptotically optimal length for n = 10p, and close to the optimal length for n = 100p. An exception was that lasso and ridge regression often had far too long lengths for k ≥ 19 and ψ ≥ 0.5. For n ≤ p, good performance needed stronger regularity conditions. PLS tended to have severe undercoverage with small average length. PCR length was often too long for ψ = 0.
49 Examples and Simulations: Simulation Results for New PI
When there was k = 1 active predictor, forward selection, lasso, and relaxed lasso often performed well. For k = 19, forward selection performed well, as did lasso and relaxed lasso for ψ = 0. For dense models with k = p − 1 and n/p not large, there was often undercoverage. Let d − 1 be the number of active predictors in the selected model. With N(0, 1) errors, ψ = 0, and d < k, an asymptotic population 95% PI has length 3.92√(k − d + 1). EBIC occasionally had undercoverage, especially for k = 19 or p − 1, which was usually more severe for ψ = 0.9 or 1/√p.
50 Examples and Simulations: Simulations for Bootstrapping Hypothesis Testing
A small simulation was done, using the same type of data as for the prediction interval simulation, with B = max(1000, n, 20p) and 5000 runs. The regression model used β = (1, 1, 0, 0)^T with n = 100 and p = 4. When ψ = 0, the design matrix X consisted of iid N(0, 1) random variables, and the full model least squares confidence intervals for β_i should have length near 2 t_{96,0.975} σ/√n ≈ 2(1.96)σ/10 = 0.392σ when the iid zero mean errors have variance σ^2.
51 Examples and Simulations: Simulations for Bootstrapping Hypothesis Testing
The simulation computed the Frey shorth(c) interval for each β_i and used the prediction region method to test H_0: β_3 = β_4 = 0. The nominal coverage was 0.95 with δ = 0.05. Observed coverage between 0.94 and 0.96 would suggest coverage is close to the nominal value.
52 Examples and Simulations: Simulation Results for Bootstrapping
The regression models used the residual bootstrap on the full model least squares estimator and on the forward selection estimator β̂_{I_min,0}. Results are shown for iid errors e_i ∼ N(0, 1).
Table: Bootstrapping OLS Regression and Forward Selection. Rows: reg and vs (variable selection) for ψ = 0, 0.5, 0.9; columns: coverage (cov) and length (len) for β_1, ..., β_4 and the test. [The numerical table entries were not recovered.]
53 Examples and Simulations: Simulation Results for Bootstrapping
The cutoff will often be near √(χ^2_{r,0.95}) if the statistic T is asymptotically normal. Note that √(χ^2_{2,0.95}) = 2.448 is close to 2.45 for the full model regression bootstrap test. The coverages were near 0.95 for the regression bootstrap on the full model. Suppose ψ = 0. Then β̂_S has the same limiting distribution for I_min and the full model. Note that the average lengths and coverages were similar for the full model and forward selection I_min for β_1, β_2, and β_S = (β_1, β_2)^T.
54 Examples and Simulations: Simulation Results for Bootstrapping
Table: Bootstrap LASSO, ψ = 0. Rows indexed by n and error type; columns: CI coverage (cicov) and average length (avelen) for β_1, ..., β_4 and the test. [The numerical table entries were not recovered.]
55 THANK YOU
More informationKriging and Non Parametric Regrssion
Southern Illinois University Carbondale OpenSIUC Research Papers Graduate School Spring 5-15-2018 Kriging and Non Parametric Regrssion Chamila Ranaweera ranawera@siu.edu Follow this and additional works
More informationSTAT 540: Data Analysis and Regression
STAT 540: Data Analysis and Regression Wen Zhou http://www.stat.colostate.edu/~riczw/ Email: riczw@stat.colostate.edu Department of Statistics Colorado State University Fall 205 W. Zhou (Colorado State
More informationPrediction Intervals in the Presence of Outliers
Prediction Intervals in the Presence of Outliers David J. Olive Southern Illinois University July 21, 2003 Abstract This paper presents a simple procedure for computing prediction intervals when the data
More informationSimple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation.
Statistical Computation Math 475 Jimin Ding Department of Mathematics Washington University in St. Louis www.math.wustl.edu/ jmding/math475/index.html October 10, 2013 Ridge Part IV October 10, 2013 1
More informationApplied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013
Applied Regression Chapter 2 Simple Linear Regression Hongcheng Li April, 6, 2013 Outline 1 Introduction of simple linear regression 2 Scatter plot 3 Simple linear regression model 4 Test of Hypothesis
More informationStatistics 203: Introduction to Regression and Analysis of Variance Course review
Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying
More informationSimple Linear Regression
Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)
More informationOn High-Dimensional Cross-Validation
On High-Dimensional Cross-Validation BY WEI-CHENG HSIAO Institute of Statistical Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 11529, Taiwan hsiaowc@stat.sinica.edu.tw 5 WEI-YING
More informationLinear Model Selection and Regularization
Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In
More informationIntroduction. Chapter Overview
Chapter 1 Introduction This chapter provides a preview of the book, and some techniques useful for visualizing data in the background of the data. Some large sample theory is presented in Section 1.4,
More informationLinear model selection and regularization
Linear model selection and regularization Problems with linear regression with least square 1. Prediction Accuracy: linear regression has low bias but suffer from high variance, especially when n p. It
More informationCh 2: Simple Linear Regression
Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 6: Model complexity scores (v3) Ramesh Johari ramesh.johari@stanford.edu Fall 2015 1 / 34 Estimating prediction error 2 / 34 Estimating prediction error We saw how we can estimate
More informationStatistical Inference of Covariate-Adjusted Randomized Experiments
1 Statistical Inference of Covariate-Adjusted Randomized Experiments Feifang Hu Department of Statistics George Washington University Joint research with Wei Ma, Yichen Qin and Yang Li Email: feifang@gwu.edu
More informationModel Selection, Estimation, and Bootstrap Smoothing. Bradley Efron Stanford University
Model Selection, Estimation, and Bootstrap Smoothing Bradley Efron Stanford University Estimation After Model Selection Usually: (a) look at data (b) choose model (linear, quad, cubic...?) (c) fit estimates
More information1 The classical linear regression model
THE UNIVERSITY OF CHICAGO Booth School of Business Business 41912, Spring Quarter 2012, Mr Ruey S Tsay Lecture 4: Multivariate Linear Regression Linear regression analysis is one of the most widely used
More informationSUPPLEMENTARY APPENDICES FOR WAVELET-DOMAIN REGRESSION AND PREDICTIVE INFERENCE IN PSYCHIATRIC NEUROIMAGING
Submitted to the Annals of Applied Statistics SUPPLEMENTARY APPENDICES FOR WAVELET-DOMAIN REGRESSION AND PREDICTIVE INFERENCE IN PSYCHIATRIC NEUROIMAGING By Philip T. Reiss, Lan Huo, Yihong Zhao, Clare
More informationLinear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,
Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,
More informationBANA 7046 Data Mining I Lecture 2. Linear Regression, Model Assessment, and Cross-validation 1
BANA 7046 Data Mining I Lecture 2. Linear Regression, Model Assessment, and Cross-validation 1 Shaobo Li University of Cincinnati 1 Partially based on Hastie, et al. (2009) ESL, and James, et al. (2013)
More informationLECTURE 2 LINEAR REGRESSION MODEL AND OLS
SEPTEMBER 29, 2014 LECTURE 2 LINEAR REGRESSION MODEL AND OLS Definitions A common question in econometrics is to study the effect of one group of variables X i, usually called the regressors, on another
More informationSTAT 462-Computational Data Analysis
STAT 462-Computational Data Analysis Chapter 5- Part 2 Nasser Sadeghkhani a.sadeghkhani@queensu.ca October 2017 1 / 27 Outline Shrinkage Methods 1. Ridge Regression 2. Lasso Dimension Reduction Methods
More informationRegression Review. Statistics 149. Spring Copyright c 2006 by Mark E. Irwin
Regression Review Statistics 149 Spring 2006 Copyright c 2006 by Mark E. Irwin Matrix Approach to Regression Linear Model: Y i = β 0 + β 1 X i1 +... + β p X ip + ɛ i ; ɛ i iid N(0, σ 2 ), i = 1,..., n
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationFocused fine-tuning of ridge regression
Focused fine-tuning of ridge regression Kristoffer Hellton Department of Mathematics, University of Oslo May 9, 2016 K. Hellton (UiO) Focused tuning May 9, 2016 1 / 22 Penalized regression The least-squares
More informationLecture 14 Simple Linear Regression
Lecture 4 Simple Linear Regression Ordinary Least Squares (OLS) Consider the following simple linear regression model where, for each unit i, Y i is the dependent variable (response). X i is the independent
More informationThe prediction of house price
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationLecture 6: Linear Regression (continued)
Lecture 6: Linear Regression (continued) Reading: Sections 3.1-3.3 STATS 202: Data mining and analysis October 6, 2017 1 / 23 Multiple linear regression Y = β 0 + β 1 X 1 + + β p X p + ε Y ε N (0, σ) i.i.d.
More informationA Consistent Model Selection Criterion for L 2 -Boosting in High-Dimensional Sparse Linear Models
A Consistent Model Selection Criterion for L 2 -Boosting in High-Dimensional Sparse Linear Models Tze Leung Lai, Stanford University Ching-Kang Ing, Academia Sinica, Taipei Zehao Chen, Lehman Brothers
More informationHigh-dimensional Ordinary Least-squares Projection for Screening Variables
1 / 38 High-dimensional Ordinary Least-squares Projection for Screening Variables Chenlei Leng Joint with Xiangyu Wang (Duke) Conference on Nonparametric Statistics for Big Data and Celebration to Honor
More informationOutline Introduction OLS Design of experiments Regression. Metamodeling. ME598/494 Lecture. Max Yi Ren
1 / 34 Metamodeling ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University March 1, 2015 2 / 34 1. preliminaries 1.1 motivation 1.2 ordinary least square 1.3 information
More informationMultiple Linear Regression
Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from
More informationNonconcave Penalized Likelihood with A Diverging Number of Parameters
Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized
More informationLecture 11. Correlation and Regression
Lecture 11 Correlation and Regression Overview of the Correlation and Regression Analysis The Correlation Analysis In statistics, dependence refers to any statistical relationship between two random variables
More informationChapter 2 Multiple Regression I (Part 1)
Chapter 2 Multiple Regression I (Part 1) 1 Regression several predictor variables The response Y depends on several predictor variables X 1,, X p response {}}{ Y predictor variables {}}{ X 1, X 2,, X p
More informationLectures on Simple Linear Regression Stat 431, Summer 2012
Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population
More informationSimple Linear Regression
Simple Linear Regression ST 370 Regression models are used to study the relationship of a response variable and one or more predictors. The response is also called the dependent variable, and the predictors
More informationModeling Real Estate Data using Quantile Regression
Modeling Real Estate Data using Semiparametric Quantile Regression Department of Statistics University of Innsbruck September 9th, 2011 Overview 1 Application: 2 3 4 Hedonic regression data for house prices
More informationMath 5305 Notes. Diagnostics and Remedial Measures. Jesse Crawford. Department of Mathematics Tarleton State University
Math 5305 Notes Diagnostics and Remedial Measures Jesse Crawford Department of Mathematics Tarleton State University (Tarleton State University) Diagnostics and Remedial Measures 1 / 44 Model Assumptions
More informationBootstrapping Some GLMs and Survival Regression Models After Variable Selection
Bootstrapping Some GLMs and Survival Regression Models After Variable Selection Rasanji C. Rathnayake and David J. Olive Southern Illinois University January 1, 2019 Abstract Consider a regression model
More informationDay 4: Shrinkage Estimators
Day 4: Shrinkage Estimators Kenneth Benoit Data Mining and Statistical Learning March 9, 2015 n versus p (aka k) Classical regression framework: n > p. Without this inequality, the OLS coefficients have
More informationBootstrapping Analogs of the One Way MANOVA Test
Bootstrapping Analogs of the One Way MANOVA Test Hasthika S Rupasinghe Arachchige Don and David J Olive Southern Illinois University July 17, 2017 Abstract The classical one way MANOVA model is used to
More informationStatistics 203: Introduction to Regression and Analysis of Variance Penalized models
Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Jonathan Taylor - p. 1/15 Today s class Bias-Variance tradeoff. Penalized regression. Cross-validation. - p. 2/15 Bias-variance
More informationLecture 5: Clustering, Linear Regression
Lecture 5: Clustering, Linear Regression Reading: Chapter 10, Sections 3.1-3.2 STATS 202: Data mining and analysis October 4, 2017 1 / 22 .0.0 5 5 1.0 7 5 X2 X2 7 1.5 1.0 0.5 3 1 2 Hierarchical clustering
More informationProbability and Statistics Notes
Probability and Statistics Notes Chapter Seven Jesse Crawford Department of Mathematics Tarleton State University Spring 2011 (Tarleton State University) Chapter Seven Notes Spring 2011 1 / 42 Outline
More informationA Short Introduction to the Lasso Methodology
A Short Introduction to the Lasso Methodology Michael Gutmann sites.google.com/site/michaelgutmann University of Helsinki Aalto University Helsinki Institute for Information Technology March 9, 2016 Michael
More informationLecture 6: Linear Regression
Lecture 6: Linear Regression Reading: Sections 3.1-3 STATS 202: Data mining and analysis Jonathan Taylor, 10/5 Slide credits: Sergio Bacallado 1 / 30 Simple linear regression Model: y i = β 0 + β 1 x i
More informationLecture 11: Regression Methods I (Linear Regression)
Lecture 11: Regression Methods I (Linear Regression) Fall, 2017 1 / 40 Outline Linear Model Introduction 1 Regression: Supervised Learning with Continuous Responses 2 Linear Models and Multiple Linear
More informationRobust model selection criteria for robust S and LT S estimators
Hacettepe Journal of Mathematics and Statistics Volume 45 (1) (2016), 153 164 Robust model selection criteria for robust S and LT S estimators Meral Çetin Abstract Outliers and multi-collinearity often
More informationLecture 5: Clustering, Linear Regression
Lecture 5: Clustering, Linear Regression Reading: Chapter 10, Sections 3.1-3.2 STATS 202: Data mining and analysis October 4, 2017 1 / 22 Hierarchical clustering Most algorithms for hierarchical clustering
More informationBusiness Statistics. Tommaso Proietti. Model Evaluation and Selection. DEF - Università di Roma 'Tor Vergata'
Business Statistics Tommaso Proietti DEF - Università di Roma 'Tor Vergata' Model Evaluation and Selection Predictive Ability of a Model: Denition and Estimation We aim at achieving a balance between parsimony
More informationA Modern Look at Classical Multivariate Techniques
A Modern Look at Classical Multivariate Techniques Yoonkyung Lee Department of Statistics The Ohio State University March 16-20, 2015 The 13th School of Probability and Statistics CIMAT, Guanajuato, Mexico
More informationLecture 11: Regression Methods I (Linear Regression)
Lecture 11: Regression Methods I (Linear Regression) 1 / 43 Outline 1 Regression: Supervised Learning with Continuous Responses 2 Linear Models and Multiple Linear Regression Ordinary Least Squares Statistical
More informationOutline. Remedial Measures) Extra Sums of Squares Standardized Version of the Multiple Regression Model
Outline 1 Multiple Linear Regression (Estimation, Inference, Diagnostics and Remedial Measures) 2 Special Topics for Multiple Regression Extra Sums of Squares Standardized Version of the Multiple Regression
More informationMATH 644: Regression Analysis Methods
MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100
More informationLinear Regression Models. Based on Chapter 3 of Hastie, Tibshirani and Friedman
Linear Regression Models Based on Chapter 3 of Hastie, ibshirani and Friedman Linear Regression Models Here the X s might be: p f ( X = " + " 0 j= 1 X j Raw predictor variables (continuous or coded-categorical
More informationLecture 4 Multiple linear regression
Lecture 4 Multiple linear regression BIOST 515 January 15, 2004 Outline 1 Motivation for the multiple regression model Multiple regression in matrix notation Least squares estimation of model parameters
More informationHow the mean changes depends on the other variable. Plots can show what s happening...
Chapter 8 (continued) Section 8.2: Interaction models An interaction model includes one or several cross-product terms. Example: two predictors Y i = β 0 + β 1 x i1 + β 2 x i2 + β 12 x i1 x i2 + ɛ i. How
More informationLinear Methods for Regression. Lijun Zhang
Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived
More informationHigh-dimensional regression
High-dimensional regression Advanced Methods for Data Analysis 36-402/36-608) Spring 2014 1 Back to linear regression 1.1 Shortcomings Suppose that we are given outcome measurements y 1,... y n R, and
More informationConfidence Intervals for Low-dimensional Parameters with High-dimensional Data
Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Cun-Hui Zhang and Stephanie S. Zhang Rutgers University and Columbia University September 14, 2012 Outline Introduction Methodology
More informationRobust Variable Selection Methods for Grouped Data. Kristin Lee Seamon Lilly
Robust Variable Selection Methods for Grouped Data by Kristin Lee Seamon Lilly A dissertation submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree
More information2.2 Classical Regression in the Time Series Context
48 2 Time Series Regression and Exploratory Data Analysis context, and therefore we include some material on transformations and other techniques useful in exploratory data analysis. 2.2 Classical Regression
More informationCategorical Predictor Variables
Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively
More informationSimple Linear Regression
Simple Linear Regression In simple linear regression we are concerned about the relationship between two variables, X and Y. There are two components to such a relationship. 1. The strength of the relationship.
More informationBivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data.
Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Example: Some investors think that the performance of the stock market in January
More informationStatement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:.
MATHEMATICAL STATISTICS Homework assignment Instructions Please turn in the homework with this cover page. You do not need to edit the solutions. Just make sure the handwriting is legible. You may discuss
More informationTECHNICAL REPORT NO. 1091r. A Note on the Lasso and Related Procedures in Model Selection
DEPARTMENT OF STATISTICS University of Wisconsin 1210 West Dayton St. Madison, WI 53706 TECHNICAL REPORT NO. 1091r April 2004, Revised December 2004 A Note on the Lasso and Related Procedures in Model
More informationConfidence Intervals, Testing and ANOVA Summary
Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0
More informationEcon 582 Nonparametric Regression
Econ 582 Nonparametric Regression Eric Zivot May 28, 2013 Nonparametric Regression Sofarwehaveonlyconsideredlinearregressionmodels = x 0 β + [ x ]=0 [ x = x] =x 0 β = [ x = x] [ x = x] x = β The assume
More information