
Chapter 4

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods

4.1 Introduction

It is by now well established that the ridge regression estimator (here we take the ordinary ridge estimator (ORE) in particular) can be of great use for estimating the unknown regression coefficients in the presence of multicollinearity. But apart from producing parameter estimates with smaller MSE than the OLSE, it should also lend itself to more intricate inference problems, such as the construction of confidence intervals. The difficulty with the ORE is that its sampling distribution is unknown; hence we make use of resampling methods to obtain asymptotic confidence intervals for the regression coefficients based on the ORE. Recently, Firinguetti and Bobadilla [40] developed asymptotic confidence intervals for the regression coefficients based on the ORE and an Edgeworth expansion. Crivelli et al. [17] proposed a technique combining the bootstrap and the Edgeworth expansion to approximate the distribution of some ridge regression estimators and carried out simulation experiments. Alheety and Ramanathan [4] proposed a data-dependent confidence interval for the ridge parameter k.

Confidence intervals have become very popular in the applied statistician's collection of data-analytic tools. They combine point estimation and hypothesis testing into a single inferential statement. Recent advances in statistical methods and fast computing allow the construction of highly accurate approximate confidence intervals. A confidence interval for a population parameter θ is a sample-based range [ˆθ_1, ˆθ_2] reported for the unknown θ, with the property that θ lies within its bounds with a specified high probability. Of course, this probability is with respect to all possible samples, each sample giving rise to a confidence interval that depends on the chance mechanism involved in drawing the samples. There are two broad approaches: exact intervals, which can be constructed in special cases such as the ratio of normal means or a single binomial parameter, and the more commonly used approximate confidence intervals, also known as standard intervals (or normal-theory intervals), which have the form

    \hat{\theta} \pm z^{(\alpha)} \hat{\sigma},    (4.1.1)

where ˆθ is a point estimate of the parameter θ, ˆσ is an estimate of the standard deviation of ˆθ, and z^{(α)} is the 100α-th percentile of a standard normal variate (for example, z^{(0.95)} = 1.645).

The main drawback of standard intervals is that they rely on an asymptotic approximation that may not be accurate in practice. There has been considerable progress on improving the standard interval, involving bias corrections, parameter transformations, and so on; the resulting approximate confidence intervals have better coverage accuracy than the standard one. References in this area include McCullagh [67], Cox and Reid [16], Efron [34], Hall [49], and DiCiccio and Tibshirani [24]. Confidence intervals based on resampling methods like the bootstrap and the jackknife can be seen as automatic algorithms for carrying out these improvements. We have already discussed that the bootstrap and the jackknife are two powerful methods for variance estimation, which is why they can also be used to produce confidence intervals. We discuss the different methods for constructing bootstrap and jackknife confidence intervals in Sections 4.2 and 4.3 respectively; Section 4.4 contains the model and the estimators; Section 4.5 gives a simulation study comparing coverage probabilities and average confidence widths based on the ORE and the OLSE; and Section 4.6 consists of some concluding remarks.
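As a small point of reference for what follows, here is a minimal R sketch of the standard interval (4.1.1) for a sample mean; the data vector is an illustrative assumption, not taken from the thesis.

    # Standard (normal-theory) 95% interval: theta.hat +/- z * sigma.hat
    x <- c(4.2, 5.1, 3.8, 6.0, 4.9, 5.5, 4.4, 5.2)   # illustrative data
    theta.hat <- mean(x)                   # point estimate of theta
    sigma.hat <- sd(x) / sqrt(length(x))   # estimated standard deviation of theta.hat
    z <- qnorm(0.975)                      # z^(0.975) = 1.96 for a central 95% interval
    c(theta.hat - z * sigma.hat, theta.hat + z * sigma.hat)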

4.2 Bootstrap Confidence Intervals

Suppose that we draw a sample S = (x_1, x_2, ..., x_n) of size n from a population of size N and we are interested in some statistic ˆθ as an estimate of the corresponding population parameter θ. In classical inference, we derive the asymptotic sampling distribution of ˆθ when its exact distribution is intractable or has some drawbacks; the resulting approximation may be inaccurate if the assumptions about the population are wrong or the sample size is relatively small. On the other hand, as discussed earlier, the nonparametric bootstrap enables us to estimate the sampling distribution of a statistic empirically, without making any assumptions about the parent population and without deriving the sampling distribution explicitly. In this method, we draw a sample of size n from the elements of S, with replacement. This is the first bootstrap resample S*_1 = (x*_11, x*_12, ..., x*_1n); each element of the resample is selected from the original sample with probability 1/n. We repeat this resampling step a large number of times, say B, so that the r-th bootstrap resample is S*_r = (x*_r1, x*_r2, ..., x*_rn). Next, we compute the statistic ˆθ for each bootstrap resample, giving ˆθ*_r. The distribution of the ˆθ*_r around the original statistic ˆθ then mimics the sampling distribution of ˆθ around the population parameter θ. The average of the statistics computed from the bootstrap resamples estimates the expectation of the bootstrapped statistic,

    \bar{\theta}^* = \frac{1}{B} \sum_{r=1}^{B} \hat{\theta}^*_r.

Then B̂ = θ̄* − ˆθ gives an estimate of the bias of ˆθ, that is, of E(ˆθ) − θ. The bootstrap estimate of the variance of ˆθ is

    \hat{V}(\hat{\theta}^*) = \frac{1}{B-1} \sum_{r=1}^{B} (\hat{\theta}^*_r - \bar{\theta}^*)^2,

which estimates the sampling variance of ˆθ. With this variance estimate we can easily produce bootstrap confidence intervals. There are several methods for constructing bootstrap confidence intervals, each of which is briefly discussed below.
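A minimal R sketch of this resampling scheme, taking the sample mean as ˆθ; the data and the choice B = 2000 are illustrative assumptions. Later sketches in this section reuse x, theta.hat and theta.star from this block.

    set.seed(1)
    x <- rnorm(30, mean = 5, sd = 2)   # illustrative sample S of size n = 30
    B <- 2000                          # number of bootstrap resamples
    theta.hat  <- mean(x)              # original statistic
    theta.star <- replicate(B, mean(sample(x, replace = TRUE)))  # theta*_r, r = 1..B
    theta.bar  <- mean(theta.star)               # bootstrap expectation
    bias.hat   <- theta.bar - theta.hat          # bootstrap estimate of bias
    var.hat    <- sum((theta.star - theta.bar)^2) / (B - 1)  # bootstrap variance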

4.2.1 Normal Theory Method

The first method for constructing a bootstrap confidence interval is based on the assumption that the sampling distribution of ˆθ is normal. It uses the bootstrap estimate of the sampling variance to construct a 100(1 − α)% confidence interval of the form

    (2\hat{\theta} - \bar{\theta}^*) \pm z^{(1-\alpha/2)} \widehat{SE}(\hat{\theta}^*),

where \widehat{SE}(\hat{\theta}^*) = \sqrt{\hat{V}(\hat{\theta}^*)} is the bootstrap estimate of the standard error of ˆθ and z^{(1−α/2)} is the (1 − α/2) quantile of the standard normal distribution.

4.2.2 Bootstrap Percentile and Centered Bootstrap Percentile Method

Another method for the construction of bootstrap confidence intervals is the bootstrap percentile method, the most popular of all, primarily due to its simplicity and natural appeal. In this method, we use the empirical quantiles of the ˆθ*_r to form the confidence interval for θ,

    \hat{\theta}^*_{(\mathrm{lower})} < \theta < \hat{\theta}^*_{(\mathrm{upper})},

where ˆθ*_(1), ˆθ*_(2), ..., ˆθ*_(B) are the ordered bootstrap replicates of ˆθ, the lower confidence limit is the [(B + 1)α/2]-th of these, and the upper confidence limit is the [(B + 1)(1 − α/2)]-th; the square brackets indicate rounding to the nearest integer. For example, suppose there are 1000 bootstrap replications (i.e. B = 1000) of ˆθ, denoted (ˆθ*_1, ˆθ*_2, ..., ˆθ*_1000). After sorting them from smallest to largest, denote these bootstrap values by (ˆθ*_(1), ˆθ*_(2), ..., ˆθ*_(1000)). Then the bootstrap percentile confidence interval at the 95% level is [ˆθ*_(25), ˆθ*_(975)]. Here it is implicitly assumed that the distribution of ˆθ is symmetric around θ, as the method approximates the sampling distribution of (ˆθ − θ) by the bootstrap distribution of (ˆθ − ˆθ*), which is contrary to the bootstrap theory. The coverage error is substantial if the distribution of ˆθ is not nearly symmetric.

Now suppose the sampling distribution of (ˆθ − θ) is approximated by the bootstrap distribution of (ˆθ* − ˆθ), which is what bootstrap theory prescribes. Then the statement that (ˆθ − θ) lies within the range (B_0.025 − ˆθ, B_0.975 − ˆθ) carries a probability of 0.95, where B_s denotes the 100s-th percentile of the bootstrap distribution of ˆθ*. In the case of 1000 bootstrap replications, B_0.025 = ˆθ*_(25) and B_0.975 = ˆθ*_(975), and the above statement reduces to the statement that θ lies in the confidence interval

    (2\hat{\theta} - B_{0.975}) < \theta < (2\hat{\theta} - B_{0.025}).

This range is known as the centered bootstrap percentile confidence interval. The method can be applied to any statistic and works well when the bootstrap distribution is symmetric and centered around the observed statistic.
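Continuing the running sketch (theta.hat, theta.star and B from the previous block), the intervals above can be computed directly; the boot package's boot.ci function, used later in this chapter, automates these computations.

    alpha <- 0.05
    q <- unname(quantile(theta.star, c(alpha / 2, 1 - alpha / 2)))  # B_0.025, B_0.975

    # Normal theory method: bias-corrected centre +/- z * bootstrap SE
    se.star <- sd(theta.star)
    (2 * theta.hat - mean(theta.star)) + c(-1, 1) * qnorm(1 - alpha / 2) * se.star

    # Percentile method: empirical quantiles of the replicates
    q

    # Centered bootstrap percentile method: 2*theta.hat minus reversed quantiles
    c(2 * theta.hat - q[2], 2 * theta.hat - q[1])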

4.2.3 Bootstrap Studentized-t Method

Another method for constructing bootstrap confidence intervals is the studentized bootstrap, also called the bootstrap-t method. The studentized-t bootstrap confidence interval takes the same form as the normal confidence interval, except that instead of using quantiles from a normal distribution, a bootstrapped t-distribution is constructed from which the quantiles are computed (see Davison and Hinkley [21] and Efron and Tibshirani [35]). Bootstrapping a statistical function of the form

    t = (\hat{\theta} - \theta)/SE,

where SE is the sample estimate of the standard error of ˆθ, brings extra accuracy. This additional accuracy is due to the so-called one-term Edgeworth correction by the bootstrap (see Hall [50]). The basic example is the standard t statistic

    t = (\bar{x} - \mu)/(s/\sqrt{n}),

a special case with θ = µ (the population mean), ˆθ = x̄ (the sample mean), and s the sample standard deviation. The bootstrap counterpart of this is

    t^* = (\hat{\theta}^* - \hat{\theta})/SE^*,

where SE* is the standard error based on the bootstrap distribution. Denote the 100s-th bootstrap percentile of t* by b_s and consider the statement that b_0.025 < t < b_0.975; substituting t = (ˆθ − θ)/SE, we obtain the confidence limits for θ

    \hat{\theta} - SE \cdot b_{0.975} < \theta < \hat{\theta} - SE \cdot b_{0.025}.

This range is known as the bootstrap-t confidence interval for θ at the 95% confidence level. Unlike the normal approximation, this interval need not be symmetric about the original point estimate, since the bootstrap t-distribution reflects any asymmetry in the sampling distribution. It is best suited for bootstrapping a location statistic. The use of the studentized bootstrap is controversial to some extent: in some cases the endpoints of the intervals seem too wide, and the method can be sensitive to outliers. The method is also computationally very intensive if SE* must be calculated by a double bootstrap; but when SE* is easily available, it performs reliably in many examples, in both its standard and variance-stabilized forms (see Davison and Hinkley [21]).
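A minimal R sketch of the bootstrap-t interval for the running example; for the sample mean, SE is available in closed form as s/√n, so no double bootstrap is needed.

    n <- length(x)
    se.hat <- sd(x) / sqrt(n)                        # SE of the original estimate
    t.star <- replicate(B, {
      xb <- sample(x, replace = TRUE)
      (mean(xb) - theta.hat) / (sd(xb) / sqrt(n))    # t*_r = (theta*_r - theta.hat)/SE*_r
    })
    b <- unname(quantile(t.star, c(0.975, 0.025)))   # b_0.975, b_0.025
    theta.hat - se.hat * b                           # (lower, upper) bootstrap-t limits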

Percentile bootstrap endpoints are simple to calculate and can work well, especially if the sampling distribution is symmetric; they may not have the correct coverage when the sampling distribution of the statistic is skewed. Coverage of the percentile bootstrap can be improved by adjusting the endpoints for bias and non-constant variance. This is the BCa method, which we discuss in the next subsection.

4.2.4 Bias Corrected and Accelerated (BCa) Method

If the distribution is skewed or biased, some adjustment is needed. One method that has proved reliable in such cases is the BCa method, which tends to come closer to the true confidence interval than the percentile method. It is an automatic algorithm for producing highly accurate confidence limits from a bootstrap distribution. For a detailed review, see Efron [34], Hall [49], DiCiccio [23], and Efron and Tibshirani [35].

The BCa procedure sets approximate confidence intervals for θ from the percentiles of the bootstrap histogram. The parameter of interest is θ, ˆθ is an estimate of θ based on the observed data, and ˆθ* is a bootstrap replication of ˆθ obtained by resampling. Let Ĝ(c) be the cumulative distribution function of the B bootstrap replications of ˆθ*,

    \hat{G}(c) = \#\{\hat{\theta}^* < c\}/B.    (4.2.1)

The upper endpoint ˆθ_BCa[α] of the one-sided BCa interval at level α, i.e. θ ∈ (−∞, ˆθ_BCa[α]), is defined in terms of Ĝ and two parameters: z_0, the bias correction, and a, the acceleration; that is how BCa comes to stand for "bias corrected and accelerated". The BCa endpoint is given by

    \hat{\theta}_{BCa}[\alpha] = \hat{G}^{-1}\left( \Phi\left( z_0 + \frac{z_0 + z^{(\alpha)}}{1 - a(z_0 + z^{(\alpha)})} \right) \right).    (4.2.2)

Here Φ is the standard normal c.d.f. and z^{(α)} = Φ^{-1}(α). The central 90% BCa confidence interval is (ˆθ_BCa[0.05], ˆθ_BCa[0.95]). In (4.2.2), if a and z_0 are zero, then ˆθ_BCa[α] = Ĝ^{-1}(α), the 100α-th percentile of the bootstrap replications. Also, if Ĝ is normal, then ˆθ_BCa[α] = ˆθ + z^{(α)}ˆσ, the standard interval endpoint. In general, (4.2.2) makes three different corrections to the standard intervals, improving their coverage accuracy. Suppose that there exists a monotone increasing transformation φ = m(θ) such that ˆφ = m(ˆθ) is normally distributed for every choice of θ, but with a bias and a non-constant variance,

    \hat{\varphi} \sim N(\varphi - z_0 \sigma_{\varphi}, \sigma^2_{\varphi}), \qquad \sigma_{\varphi} = 1 + a\varphi.    (4.2.3)

The BCa algorithm estimates z_0 by

    \hat{z}_0 = \Phi^{-1}\left( \#\{\hat{\theta}^*(b) < \hat{\theta}\}/B \right),

that is, Φ^{-1} of the proportion of the bootstrap replications less than ˆθ.

The acceleration a measures how quickly the standard error is changing on the normalized scale. It is estimated using the empirical influence function of the statistic ˆθ = t(F̂),

    U_i = \lim_{\epsilon \to 0} \frac{t\big((1-\epsilon)\hat{F} + \epsilon \delta_i\big) - t(\hat{F})}{\epsilon}, \qquad i = 1, 2, \ldots, n.    (4.2.4)

Here δ_i is a point mass on x_i, so (1 − ε)F̂ + εδ_i is a version of F̂ putting extra weight on x_i and less weight on the other points. The estimate of a is

    \hat{a} = \frac{1}{6} \cdot \frac{\sum_{i=1}^{n} U_i^3}{\big(\sum_{i=1}^{n} U_i^2\big)^{3/2}}.    (4.2.5)

There is a simpler way of calculating the U_i and â: instead of (4.2.4), we can use the following jackknife influence function (see Hinkley [51]) in (4.2.5),

    U_i = (n-1)(\hat{\theta} - \hat{\theta}_{(i)}),    (4.2.6)

where ˆθ_(i) is the estimate of θ based on the reduced data set x_(i) = (x_1, x_2, ..., x_{i−1}, x_{i+1}, ..., x_n).
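A minimal R sketch of the BCa endpoints (4.2.2) for the running example, estimating z_0 from the replicates and the acceleration a via the jackknife influence values (4.2.6):

    # z0: bias correction from the proportion of replicates below theta.hat
    z0 <- qnorm(mean(theta.star < theta.hat))
    # a: acceleration via jackknife influence values U_i = (n-1)(theta.hat - theta_(i))
    theta.i <- sapply(seq_along(x), function(i) mean(x[-i]))
    U <- (length(x) - 1) * (theta.hat - theta.i)
    a <- sum(U^3) / (6 * sum(U^2)^(3/2))
    # central 95% BCa interval: G^{-1} applied to the adjusted levels
    alpha <- c(0.025, 0.975)
    adj <- pnorm(z0 + (z0 + qnorm(alpha)) / (1 - a * (z0 + qnorm(alpha))))
    quantile(theta.star, adj)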

In the next section, we discuss the confidence intervals based on the jackknife method.

4.3 Jackknife Confidence Intervals

The jackknife technique is generally used to reduce the bias of parameter estimates and to estimate their variance. Let n be the total sample size; the procedure is to estimate the parameter θ n times, each time deleting one data point. The resulting estimator is denoted by ˆθ_i, where the i-th data point has been excluded before calculating the estimate. The so-called pseudo-values are computed as

    \tilde{\theta}_i = n\hat{\theta} - (n-1)\hat{\theta}_i,

where ˆθ is the value of the parameter estimated from the entire data set. These pseudo-values are treated as independent and identically distributed normal random variables. Their average gives the jackknife estimator of θ,

    \tilde{\theta} = \frac{1}{n} \sum_{i=1}^{n} \tilde{\theta}_i = n\hat{\theta} - \frac{n-1}{n} \sum_{i=1}^{n} \hat{\theta}_i.

Also, the variance of these pseudo-values gives the estimate of the variance of ˆθ,

    \tilde{V} = \frac{1}{n(n-1)} \sum_{i=1}^{n} (\tilde{\theta}_i - \tilde{\theta})(\tilde{\theta}_i - \tilde{\theta})'.    (4.3.1)

Apart from reducing bias and estimating variance, the jackknife technique also offers a very simple method for constructing a confidence interval for θ (see Miller [71]). Using the consistent variance estimator given in (4.3.1), we get the confidence interval for θ (see Hinkley [51]) as

    \tilde{\theta} \pm t_{(1-\alpha/2;\; n-1)} \sqrt{\tilde{v}_{ii}},    (4.3.2)

where t_{(1−α/2; n−1)} is the upper α/2 · 100% point of the Student's t-distribution with n − 1 degrees of freedom, which is suitable even for small samples.
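A minimal R sketch of the pseudo-value construction and the interval (4.3.2), again for the sample mean of the illustrative vector x:

    n <- length(x)
    theta.hat   <- mean(x)
    theta.i     <- sapply(1:n, function(i) mean(x[-i]))        # leave-one-out estimates
    pseudo      <- n * theta.hat - (n - 1) * theta.i           # pseudo-values
    theta.tilde <- mean(pseudo)                                # jackknife estimator
    v.tilde <- sum((pseudo - theta.tilde)^2) / (n * (n - 1))   # variance (4.3.1)
    alpha <- 0.05
    theta.tilde + c(-1, 1) * qt(1 - alpha / 2, df = n - 1) * sqrt(v.tilde)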

The next section contains the model, the estimators, and the confidence intervals in terms of the unknown regression coefficients.

4.4 The Model and the Estimators

Let us consider the following multiple linear regression model

    y = X\beta + u,    (4.4.1)

where y is an (n × 1) vector of observations on the variable to be explained, X is an (n × p) matrix of n observations on p explanatory variables assumed to be of full column rank, β is a (p × 1) vector of regression coefficients associated with them, and u is an (n × 1) vector of disturbances whose elements are assumed to be independently and identically normally distributed with E(u) = 0 and Var(u) = σ²I.

The OLSE of β in model (4.4.1) is given by

    \hat{\beta}_{OLSE} = (X'X)^{-1} X'y.    (4.4.2)

Now, if the data suffer from multicollinearity, we use the ORE given by Hoerl and Kennard [52],

    \hat{\beta}_{ORE} = (X'X + kI)^{-1} X'y = (I - kA^{-1}) \hat{\beta},    (4.4.3)

where k > 0 and A = X'X + kI. The jackknife ridge estimator (JRE) corresponding to the ORE is calculated using the formula

    \hat{\beta}_{JRE} = [I - (kA^{-1})^2] \hat{\beta}.    (4.4.4)

Now for bootstrapping: first, for the model defined in (4.4.1), we fit the least squares regression to the full sample and calculate the standardized residuals u_i; we then draw an n-sized bootstrap sample with replacement, (u^(b)_1, u^(b)_2, ..., u^(b)_n), from the residuals, giving probability 1/n to each u_i. After this, we obtain the bootstrap y values using the resampled residuals, keeping the design matrix fixed, as shown below:

    y^{(b)} = X\hat{\beta}_{OLSE} + u^{(b)}.

We then regress these bootstrapped y values on the fixed X to obtain the bootstrap estimates of the regression coefficients. Thus the ORE from the first bootstrap sample is

    \hat{\beta}^*_{ORE(b_1)} = (X'X + kI)^{-1} X'y^{(b_1)}.

We repeat the above steps B times, where B is the number of bootstrap resamples. Based on these, the bootstrap estimator of β is given by

    \bar{\beta}^*_{ORE} = \frac{1}{B} \sum_{r=1}^{B} \hat{\beta}^*_{ORE(b_r)},    (4.4.5)

where r = 1, ..., B. The estimated bias is given by

    \mathrm{Bias}_{est} = \bar{\beta}^*_{ORE} - \hat{\beta}_{OLSE}.
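A minimal R sketch of this residual bootstrap for the ORE; the simulated X and y, the fixed k, and B = 1999 are illustrative assumptions (centring and standardizing of the residuals are omitted for brevity). A later sketch reuses X, y, k, p, B, beta.ols and beta.star from this block.

    set.seed(2)
    n <- 50; p <- 3
    X <- matrix(rnorm(n * p), n, p)                 # stand-in design matrix
    y <- X %*% c(1, 2, 3) + rnorm(n)                # stand-in response
    k <- 0.5                                        # illustrative ridge parameter
    beta.ols <- as.vector(solve(crossprod(X), crossprod(X, y)))
    res <- as.vector(y - X %*% beta.ols)            # OLS residuals
    B <- 1999
    beta.star <- replicate(B, {
      yb <- X %*% beta.ols + sample(res, replace = TRUE)               # y^(b), fixed X
      as.vector(solve(crossprod(X) + k * diag(p), crossprod(X, yb)))   # ORE on resample
    })                                              # p x B matrix of replicates
    beta.bar <- rowMeans(beta.star)                 # bootstrap estimator (4.4.5)
    bias.est <- beta.bar - beta.ols                 # estimated bias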

The estimated variance of the ridge estimator through the bootstrap is given as

    \mathrm{Var}_{est} = \frac{1}{B-1} \sum_{r=1}^{B} (\hat{\beta}^*_{ORE(b_r)} - \bar{\beta}^*_{ORE})^2.    (4.4.6)

Now, based on these estimators, we construct the confidence intervals for the regression coefficient β in the following subsection.

4.4.1 Confidence Intervals for Regression Coefficients using ORE

The different confidence intervals discussed in the last section take the following forms in terms of the regression coefficient β.

1. Normal Theory Method
A 95% confidence interval for β based on the ORE is

    (2\hat{\beta}_{ORE} - \bar{\beta}^*_{ORE}) - z^{(1-\alpha/2)} \sqrt{\mathrm{Var}_{est}} < \beta < (2\hat{\beta}_{ORE} - \bar{\beta}^*_{ORE}) + z^{(1-\alpha/2)} \sqrt{\mathrm{Var}_{est}},

where α = 0.05, Var_est is the bootstrap estimate of the variance of ˆβ*_ORE as defined in (4.4.6), and z^{(1−α/2)} is the (1 − α/2) quantile of the standard normal distribution.

2. Percentile Method
A 95% confidence interval based on 1000 bootstrap resamples would be

    \hat{\beta}^*_{ORE(25)} < \beta < \hat{\beta}^*_{ORE(975)},

where ˆβ*_ORE(r) is the r-th observation among the ordered bootstrap replicates of ˆβ_ORE.

3. Studentized-t Method
A 95% studentized t-interval would be

    \hat{\beta}_{ORE} - SE \cdot b_{0.975} < \beta < \hat{\beta}_{ORE} - SE \cdot b_{0.025},

where b_s is the 100s-th bootstrap percentile of t* and t* is as defined in Section 4.2.3.

4. BCa Method
A 95% confidence interval for β based on the ORE is

    \hat{\beta}_{BCa}[0.025] < \beta < \hat{\beta}_{BCa}[0.975],

where ˆθ_BCa[α] is as defined in (4.2.2). Another method, the ABC (approximate bootstrap confidence) method, has been proposed by Efron [34]; it gives an analytic adjustment to the BCa method for smoothly defined parameters in exponential families. ABC intervals are analytic approximations to the BCa intervals of Efron [34] and may be preferred in order to avoid the BCa's Monte Carlo calculations [see DiCiccio and Efron [25]; Efron and Tibshirani [35]; DiCiccio and Efron [26]]. We adopted this method in the linear model setup; however, we did not see a significant improvement in its performance over the BCa method, and hence it is not pursued in the numerical investigations carried out later.

5. Jackknife Method
A 95% jackknife confidence interval for β based on the ORE is

    \hat{\beta}_{JRE} - t_{(1-\alpha/2;\; n-p)} \sqrt{\tilde{v}_{ii}} < \beta < \hat{\beta}_{JRE} + t_{(1-\alpha/2;\; n-p)} \sqrt{\tilde{v}_{ii}},    (4.4.7)

where α = 0.05, t_{(1−α/2; n−p)} is the upper α/2 · 100% point of the Student's t-distribution with (n − p) degrees of freedom, and ṽ_ii is as given in (4.3.1) with (n − 1) in the denominator replaced by (n − p). A short sketch assembling two of these intervals from the bootstrap replicates of the previous section is given below.
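A minimal R sketch computing the normal theory and percentile intervals for a single coefficient from the objects beta.star, beta.ols, X, y, k, p and B of the earlier sketch; the studentized, BCa and jackknife versions follow the same pattern as the sketches in Sections 4.2.3, 4.2.4 and 4.3.

    alpha <- 0.05
    b1.star <- beta.star[1, ]                      # replicates of the first coefficient
    b1.ore  <- as.vector(solve(crossprod(X) + k * diag(p), crossprod(X, y)))[1]
    var.est <- sum((b1.star - mean(b1.star))^2) / (B - 1)             # (4.4.6)
    # 1. normal theory interval
    (2 * b1.ore - mean(b1.star)) + c(-1, 1) * qnorm(1 - alpha / 2) * sqrt(var.est)
    # 2. percentile interval
    quantile(b1.star, c(alpha / 2, 1 - alpha / 2))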

In order to compare these methods of constructing asymptotic confidence intervals based on the ORE and the OLSE, we calculate the coverage probability, defined as the proportion of confidence intervals that include the true parameter under repeated sampling from the same underlying population, and the confidence width, defined as the difference between the upper and lower confidence endpoints. In the next section, we carry out a simulation study to obtain the confidence widths and coverage probabilities of the confidence intervals developed using the ORE and the OLSE.

4.5 A Simulation Study

Having gone through the theoretical aspects of each method for constructing confidence intervals, we now apply the methods to simulated data and compare the performance of the ORE and the OLSE on the basis of their coverage probabilities and confidence widths. The model is

    y = X\beta + u,    (4.5.1)

where u ∼ N(0, 1). Here β is taken as the normalized eigenvector corresponding to the largest eigenvalue of X'X. The explanatory variables are generated from the following equation:

    x_{ij} = (1 - \rho^2)^{1/2} w_{ij} + \rho w_{ip}, \qquad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, p,

where the w_ij are independent standard normal pseudo-random numbers and ρ² is the correlation between x_ij and x_ij′ for j, j′ < p and j ≠ j′; when j or j′ = p, the correlation is ρ. We have taken ρ = 0.9 and 0.99 to investigate the effects of different degrees of collinearity, with sample sizes n = 25, 50 and 100. The feasible value of k is obtained from the optimal formula k = pσ²/β′β given by Hoerl et al. [53], so that

    \hat{k} = \frac{p\hat{\sigma}^2}{\hat{\beta}'\hat{\beta}}, \qquad \hat{\sigma}^2 = \frac{(y - X\hat{\beta})'(y - X\hat{\beta})}{n - p}.
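A minimal R sketch of one simulation draw under this design, previewing the boot.ci computation described next; the seed and the use of the boot package are our assumptions, and only the first coefficient is bootstrapped here.

    library(boot)
    set.seed(3)
    n <- 25; p <- 3; rho <- 0.9
    W <- matrix(rnorm(n * p), n, p)
    X <- sqrt(1 - rho^2) * W + rho * W[, p]        # collinear design as above
    beta <- eigen(crossprod(X))$vectors[, 1]       # eigenvector of largest eigenvalue
    y <- X %*% beta + rnorm(n)
    beta.ols <- as.vector(solve(crossprod(X), crossprod(X, y)))
    sigma2 <- sum((y - X %*% beta.ols)^2) / (n - p)
    k.hat <- p * sigma2 / sum(beta.ols^2)          # feasible ridge parameter
    res <- as.vector(y - X %*% beta.ols)
    ridge1 <- function(r, i) {                     # residual-bootstrap statistic
      yb <- X %*% beta.ols + r[i]                  # resampled residuals, fixed X
      solve(crossprod(X) + k.hat * diag(p), crossprod(X, yb))[1]
    }
    bt <- boot(res, ridge1, R = 1999)
    boot.ci(bt, type = c("norm", "perc", "bca"))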

For calculating the different bootstrap confidence intervals (normal, percentile, studentized and BCa), we have used the function boot.ci in R. The confidence limits through the jackknife are calculated using (4.4.7), and the variance of the jackknifed estimator is calculated using the bootstrap method for variance estimation. We have calculated the coverage probabilities and average confidence widths using 1999 bootstrap resamples, and the experiment is repeated 500 times. The coverage probability, say CP, is calculated using the formula

    CP = \frac{\#(\hat{\beta}_L < \beta < \hat{\beta}_U)}{N},

and the average confidence width, say CW, is calculated by

    CW = \frac{\sum_{i=1}^{N} (\hat{\beta}_U - \hat{\beta}_L)}{N},

where N is the simulation size and ˆβ_L and ˆβ_U are the lower and upper confidence interval endpoints respectively. The results for coverage probability and average confidence width at the 95% and 99% confidence levels with different values of n and ρ are summarized in Tables 4.1-4.4. Note that in all the tables, the column labelled OLSE gives the coverage probability and confidence width of the confidence intervals based on ˆβ_OLSE using the t-distribution.

From Tables 4.1 and 4.2, we find that the coverage probabilities of all the intervals improve with increasing n and become close to each other, which is due to the consistency of the estimators. It is evident from Tables 4.3 and 4.4 that the confidence intervals based on the ORE have shorter widths than the interval based on the OLSE. It is interesting to note that the coverage probabilities and confidence widths through the OLSE and through the jackknife are very close to each other. Also, from Tables 4.3 and 4.4, we find that with increasing collinearity among the explanatory variables, the difference between the confidence width of the interval based on the OLSE and those based on the ORE increases. Furthermore, with increasing sample size, the confidence width of all the intervals decreases.

According to Tables 4.1 and 4.2, all the bootstrap methods are generally conservative in terms of coverage probabilities; however, the jackknife method seems to give coverage probabilities closer to the target. In terms of confidence width, the resampling methods have smaller confidence widths than the OLSE, with the jackknife method having a larger confidence width than the bootstrap methods. Noting that the bootstrap methods are conservative with smaller confidence widths, they seem to have an advantage over the jackknife method.

4.6 Concluding Remarks

In the present chapter, we illustrated the use of different confidence intervals based on the bootstrap and jackknife resampling methods. We obtained the coverage probabilities and confidence widths based on the ORE using the different bootstrap methods and the jackknife method, and compared them with those of the OLSE, which we computed using t-intervals. The shorter confidence widths obtained through the ORE show its superiority over the OLSE in the case of multicollinearity. The next chapter deals with jackknifing and bootstrapping another biased estimator, given by Liu [64], to overcome a drawback of the ORE.

Table 4.1: Coverage probabilities through different methods at the 95% confidence level

  n    ρ      OLSE    Normal   Percentile   Studentized   BCa      Jackknife
  25   0.9    0.952   0.966    0.974        0.978         0.970    0.940
              0.964   0.966    0.960        0.976         0.964    0.922
              0.962   0.972    0.978        0.986         0.976    0.944
       0.99   0.952   0.970    0.980        0.986         0.972    0.940
              0.964   0.964    0.964        0.984         0.966    0.920
              0.964   0.986    0.986        0.994         0.990    0.936
  50   0.9    0.950   0.974    0.972        0.980         0.970    0.948
              0.952   0.968    0.976        0.984         0.976    0.936
              0.948   0.988    0.986        0.990         0.990    0.964
       0.99   0.950   0.984    0.984        0.986         0.984    0.948
              0.952   0.986    0.984        0.990         0.986    0.934
              0.948   0.994    0.994        0.996         0.994    0.968
  100  0.9    0.958   0.972    0.974        0.978         0.974    0.948
              0.962   0.958    0.954        0.956         0.952    0.930
              0.940   0.966    0.968        0.968         0.964    0.932
       0.99   0.958   0.984    0.984        0.984         0.984    0.946
              0.962   0.974    0.976        0.982         0.974    0.928
              0.944   0.978    0.978        0.980         0.978    0.926

Table 4.2: Coverage probabilities through different methods at the 99% confidence level

  n    ρ      OLSE    Normal   Percentile   Studentized   BCa      Jackknife
  25   0.9    0.992   0.992    0.990        0.994         0.990    0.984
              0.990   0.990    0.988        0.994         0.992    0.982
              0.996   1.000    0.998        1.000         0.996    0.994
       0.99   0.992   0.992    0.990        0.994         0.992    0.984
              0.990   0.990    0.992        0.996         0.992    0.978
              0.998   0.996    1.000        1.000         0.998    0.992
  50   0.9    0.998   0.994    0.994        0.994         0.994    0.988
              0.990   0.994    0.996        0.998         0.996    0.990
              0.992   0.994    0.996        0.996         0.994    0.994
       0.99   0.998   0.994    0.994        0.994         0.994    0.988
              0.990   1.000    0.998        1.000         0.998    0.990
              0.988   0.998    1.000        1.000         0.998    0.994
  100  0.9    0.988   0.992    0.992        0.992         0.992    0.988
              0.990   0.988    0.986        0.990         0.988    0.982
              0.994   0.990    0.988        0.994         0.986    0.976
       0.99   0.988   0.992    0.992        0.994         0.992    0.988
              0.990   0.996    0.994        0.998         0.996    0.980
              0.992   0.996    0.996        0.996         0.992    0.978

Table 4.3: Average confidence width through different methods at the 95% confidence level using OLSE and ORE

  n    ρ      OLSE        Normal   Percentile   Studentized   BCa      Jackknife
  25   0.9    2.101518    1.1330   1.1346       1.2801        1.1361   1.6494
              1.998510    1.1712   1.1732       1.3216        1.1738   1.6516
              2.144994    1.0454   1.0475       1.1809        1.0483   1.5851
       0.99   6.493567    2.3355   2.3407       2.6376        2.3437   3.6672
              6.175280    2.3954   2.4000       2.7049        2.4008   3.6976
              7.994216    2.2116   2.2146       2.5011        2.2178   3.7170
  50   0.9    1.407747    0.9846   0.9864       1.0449        0.9868   1.2858
              1.258123    0.9225   0.9241       0.9793        0.9245   1.1733
              1.188554    0.8173   0.8185       0.8671        0.8191   1.0739
       0.99   4.349856    1.7202   1.7245       1.8256        1.7251   2.6731
              3.887527    1.7090   1.7124       1.8153        1.7128   2.5918
              4.654095    1.5836   1.5854       1.6813        1.5869   2.5555
  100  0.9    0.9346568   0.7506   0.7518       0.7732        0.7518   0.8923
              0.8637652   0.7042   0.7053       0.7244        0.7057   0.8282
              0.9818960   0.7410   0.7423       0.7639        0.7430   0.9207
       0.99   2.888035    1.3929   1.3957       1.4355        1.3971   1.9978
              2.668984    1.3652   1.3669       1.4057        1.3687   1.9255
              3.754617    1.3487   1.3496       1.3918        1.3509   2.1479

Table 4.4: Average confidence width through different methods at the 99% confidence level using OLSE and ORE

  n    ρ      OLSE        Normal   Percentile   Studentized   BCa      Jackknife
  25   0.9    2.856330    1.4890   1.4908       1.7480        1.4960   2.2419
              2.716325    1.5392   1.5341       1.8025        1.5378   2.2449
              2.915421    1.3739   1.3774       1.6143        1.3819   2.1544
       0.99   8.825893    3.0694   3.0764       3.6005        3.0870   4.9844
              8.393285    3.1480   3.1353       3.6944        3.1443   5.0257
              10.865537   2.9065   2.9123       3.4147        2.9281   5.0521
  50   0.9    1.878560    1.2940   1.3010       1.4038        1.3030   1.7159
              1.678895    1.2124   1.2190       1.3119        1.2195   1.5658
              1.586059    1.0741   1.0794       1.1616        1.0804   1.4331
       0.99   5.804641    2.2607   2.2739       2.4573        2.2757   3.5671
              5.187688    2.2460   2.2573       2.4294        2.2602   3.4586
              6.210630    2.0813   2.0925       2.2534        2.0943   3.4102
  100  0.9    1.237342    0.987    0.991        1.029         0.991    1.181
              1.143492    0.925    0.931        0.962         0.933    1.096
              1.299879    0.974    0.979        1.017         0.979    1.219
       0.99   3.823313    1.831    1.839        1.908         1.841    2.645
              3.533324    1.794    1.803        1.867         1.810    2.549
              4.970534    1.772    1.782        1.853         1.782    2.844